List: Iceberg Datalake | Curated by Nathan Hanks | Medium

Apr 29, 2024

6 stories

Iceberg Datalake

In TDS Archive by Dario Radečić

DuckDB and AWS — How to Aggregate 100 Million Rows in 1 Minute

Process huge volumes of data with Python and DuckDB — An AWS S3 example.

Apr 25, 2024


In Ancestry Product & Technology by Thomas Cardenas

Solving the Small File Problem in Iceberg Tables

The Data Platform team at Ancestry has been maintaining a fully-refreshed 100-billion-row Apache Iceberg table for several months. A…

Aug 29, 2023


Thomas Cardenas

How to Reduce Full Table Scans during Merges in Apache Iceberg and Save Money

Sep 28, 2023


Cesar Cordoba

How to work with Iceberg format in AWS-Glue.

As the official guide can be overwhelming at times, this post is designed to cover all the main operations that one would want to…

Sep 6, 2023


Tabular

What’s new in Iceberg 1.1

Author: Ryan Blue at Tabular

Dec 9, 2022


In Dataminded by Jonathan Merlevede

Upserting Data using Spark and Iceberg

Use Spark and Iceberg’s MERGE INTO syntax to efficiently store daily, incremental snapshots of a mutable source table.

May 25, 2023


