InTDS ArchivebyDario RadečićDuckDB and AWS — How to Aggregate 100 Million Rows in 1 MinuteProcess huge volumes of data with Python and DuckDB — An AWS S3 example.Apr 25, 20248Apr 25, 20248
InAncestry Product & TechnologybyThomas CardenasSolving the Small File Problem in Iceberg TablesThe Data Platform team at Ancestry has been maintaining a fully-refreshed 100-billion-row Apache Iceberg table for several months. A…Aug 29, 20234Aug 29, 20234
Thomas CardenasHow to Reduce Full Table Scans during Merges in Apache Iceberg and Save MoneySep 28, 2023Sep 28, 2023
Cesar CordobaHow to work with Iceberg format in AWS-Glue.As the official guide might be overwhelming some times, this post has been designed to cover all the main operations that one would want to…Sep 6, 20234Sep 6, 20234
InDatamindedbyJonathan MerlevedeUpserting Data using Spark and IcebergUse Spark and Iceberg’s MERGE INTO syntax to efficiently store daily, incremental snapshots of a mutable source table.May 25, 20236May 25, 20236