Deep Dive into CDC with Iceberg - Workshop Series

Watch the Recordings On-Demand

Introduction

More and more companies choose Iceberg for its unmatched performance, cost savings, and schema and partition flexibility, which significantly enhance data consistency and operational efficiency in large-scale data lakes.

However, implementing change data capture (CDC) on Iceberg is tricky for several reasons. Frequent updates produce many small files, which complicates file management and degrades query performance. Writing frequent CDC updates and deletes while also optimizing the same tables often leads to conflicts, which can result in missing data. Additionally, Iceberg currently lacks support for global reordering of records, which is crucial for optimizing queries. Handling these complexities requires careful resource management and scheduling of maintenance tasks to keep data processing efficient and minimize data loss.
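To make the small-file problem concrete, here is a minimal sketch of how a maintenance job might decide which files to merge. It is plain Python and does not use Iceberg's API. The names `DataFile` and `plan_compaction`, the `small_fraction` threshold, and the first-fit-decreasing grouping are all assumptions chosen for illustration, not how any particular engine plans compaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry in a table's metadata."""
    path: str
    size_bytes: int


def plan_compaction(files, target_bytes, small_fraction=0.75):
    """Group small files into batches of at most target_bytes each.

    Files at or above target_bytes * small_fraction are treated as already
    well sized and are left alone (an assumed rule of thumb).
    """
    threshold = target_bytes * small_fraction
    small = sorted(
        (f for f in files if f.size_bytes < threshold),
        key=lambda f: f.size_bytes,
        reverse=True,
    )

    batches = []
    for f in small:
        # First-fit decreasing: place the file in the first batch with room.
        for batch in batches:
            if sum(x.size_bytes for x in batch) + f.size_bytes <= target_bytes:
                batch.append(f)
                break
        else:
            batches.append([f])

    # Rewriting a single file on its own gains nothing, so drop such batches.
    return [b for b in batches if len(b) > 1]


if __name__ == "__main__":
    files = [
        DataFile("a", 10), DataFile("b", 20), DataFile("c", 30),
        DataFile("d", 90), DataFile("e", 60), DataFile("f", 50),
    ]
    for batch in plan_compaction(files, target_bytes=100):
        print([f.path for f in batch])
```

Each returned batch would be rewritten as one larger file. In a real pipeline the rewrite commit still has to be coordinated with ongoing CDC writes, which is where the conflicts described above come from.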

That is why we created a series of three recorded technical sessions, each focusing on CDC from a specific database system to Iceberg tables. Watch these recordings to gain a comprehensive understanding of the challenges and solutions associated with CDC in a data lake environment.

What You’ll Learn

  • Fundamentals of CDC with Iceberg Tables: Understand the architecture and mechanisms of CDC in the context of Iceberg.
  • Hands-on Implementation: Practical exercises on setting up CDC pipelines for SQL Server, MongoDB, and PostgreSQL.
  • Handling Semi-Structured Data: Techniques for processing and managing semi-structured data in Iceberg tables.
  • Best Practices and Optimization: Strategies for optimizing data ingestion and managing Iceberg table metadata.

Jason Hall, Principal Solution Architect
Roy Hasson, VP Product
Ajay Chhawacharia, Data Architect

