Deep Dive into CDC with Iceberg - Workshop Series

Watch the Recordings On-Demand

Introduction

More and more companies are choosing Apache Iceberg for its performance, cost savings, and flexibility through schema evolution and partition evolution, which improve data consistency and operational efficiency in large-scale data lakes.

However, implementing change data capture (CDC) into Iceberg tables is tricky for several reasons. Frequent updates produce many small files, which complicates file management and degrades query performance. Running optimization jobs such as compaction while frequent CDC updates and deletes are being written often leads to commit conflicts, which can cause failed writes or missing data if they are not handled correctly. Additionally, Iceberg currently lacks support for global reordering of records, which is crucial for optimizing queries. Together, these issues require careful resource management and scheduling of maintenance tasks to keep data processing efficient and minimize data loss.
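The conflict problem can be illustrated with a toy model of optimistic concurrency, the general scheme Iceberg uses for commits: a writer records the table version it started from, and its commit succeeds only if that version is still current; otherwise it must re-plan and retry. The `Table`, `commit`, `cdc_append`, `naive_compact`, and `safe_compact` names are hypothetical, and Iceberg's real conflict validation is finer-grained than this whole-table version check. The sketch only shows why a rewrite that ignores concurrent commits can silently drop CDC data, and how a version check turns that loss into a retry.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table (hypothetical): a list of data files (each a list of rows) plus a version."""
    files: list = field(default_factory=list)
    version: int = 0


class CommitConflict(Exception):
    """Raised when the table changed after a writer read it."""


def commit(table, base_version, new_files):
    """Optimistic commit: swap in new_files only if no one committed since base_version."""
    if table.version != base_version:
        raise CommitConflict(f"based on v{base_version}, table is at v{table.version}")
    table.files = new_files
    table.version += 1


def cdc_append(table, rows):
    """Each CDC micro-batch lands as one more small file."""
    commit(table, table.version, table.files + [list(rows)])


def naive_compact(table, concurrent_write=None):
    """Rewrite all files into one, ignoring concurrent commits (unsafe)."""
    merged = [row for f in table.files for row in f]
    if concurrent_write:
        concurrent_write()  # another writer commits while we are rewriting
    table.files = [merged]  # blind overwrite drops anything committed after planning
    table.version += 1


def safe_compact(table, concurrent_write=None, max_retries=3):
    """Rewrite all files into one; on conflict, re-read the table and retry."""
    for attempt in range(max_retries):
        base = table.version
        merged = [row for f in table.files for row in f]
        if concurrent_write and attempt == 0:
            concurrent_write()  # simulate a CDC commit racing the first attempt
        try:
            commit(table, base, [merged])
            return attempt + 1  # number of attempts it took
        except CommitConflict:
            continue  # plan again against the newer table state
    raise CommitConflict(f"gave up after {max_retries} attempts")


def all_rows(table):
    return sorted(row for f in table.files for row in f)


t1 = Table()
cdc_append(t1, [1, 2])
cdc_append(t1, [3])
naive_compact(t1, concurrent_write=lambda: cdc_append(t1, [4]))
print(all_rows(t1))  # [1, 2, 3]: row 4 was silently lost

t2 = Table()
cdc_append(t2, [1, 2])
cdc_append(t2, [3])
attempts = safe_compact(t2, concurrent_write=lambda: cdc_append(t2, [4]))
print(all_rows(t2), attempts)  # [1, 2, 3, 4] 2
```

Each retry re-reads the table and rewrites more data, so when CDC commits arrive very frequently, compaction may keep losing the race; this is one reason maintenance tasks need to be scheduled with the ingestion workload in mind.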

That is why we created a series of three recorded technical sessions, each focusing on CDC from a specific database system to Iceberg tables. Watch these recordings to gain a comprehensive understanding of the challenges and solutions associated with CDC in a data lake environment.

What You’ll Learn

Fundamentals of CDC with Iceberg Tables: Understand the architecture and mechanisms of CDC in the context of Iceberg.
Hands-on Implementation: Practical exercises on setting up CDC pipelines for SQL Server, MongoDB, and PostgreSQL.
Handling Semi-Structured Data: Techniques for processing and managing semi-structured data in Iceberg tables.
Best Practices and Optimization: Strategies for optimizing data ingestion and managing Iceberg table metadata.
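One common way to make CDC ingestion cheaper, whichever database the changes come from, is to collapse each micro-batch to the latest change per primary key before writing, so every row is inserted, updated, or deleted at most once per commit. The minimal in-memory sketch below assumes a hypothetical event format `(lsn, op, key, row)`, where `lsn` is the source log position and `op` is `"c"`, `"u"`, or `"d"`; the dictionary stands in for the target table, and `latest_per_key` and `apply_batch` are illustrative names, not part of any library or of the approach presented in the sessions.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical CDC event: (log position, operation, primary key, row image or None)
Event = Tuple[int, str, Any, Optional[dict]]


def latest_per_key(events: Iterable[Event]) -> Dict[Any, Tuple[str, Optional[dict]]]:
    """Keep only the newest change for each primary key, by log position."""
    latest: Dict[Any, Tuple[int, str, Optional[dict]]] = {}
    for lsn, op, key, row in events:
        if key not in latest or lsn > latest[key][0]:
            latest[key] = (lsn, op, row)
    return {key: (op, row) for key, (_, op, row) in latest.items()}


def apply_batch(table: Dict[Any, dict], events: Iterable[Event]) -> None:
    """Apply one micro-batch as upserts and deletes keyed on the primary key."""
    for key, (op, row) in latest_per_key(events).items():
        if op == "d":
            table.pop(key, None)  # delete; ignore keys that were never written
        else:
            table[key] = row  # insert or update with the final row image


table = {1: {"name": "a"}}
batch = [
    (10, "u", 1, {"name": "b"}),
    (11, "c", 2, {"name": "x"}),
    (12, "u", 1, {"name": "c"}),
    (13, "d", 2, None),
]
apply_batch(table, batch)
print(table)  # {1: {'name': 'c'}}
```

Here four events collapse to two keyed changes, and key 2, created and deleted within the same batch, never reaches the table. Writing fewer, deduplicated changes per commit also leaves less small-file and delete overhead for later maintenance.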

Jason Hall

Principal Solution Architect

Roy Hasson

VP Product

Ajay Chhawacharia

Data Architect
