Building Iceberg Lakehouse with Spark and Upsolver

E-Learning Module

Watch this e-learning module for a technical deep dive into building and operating an Iceberg-based lakehouse.

You’ll start by learning how to create and query Iceberg tables using Apache Spark. Then, you’ll explore how data is organized in S3 and which properties to tune for the best performance.

What will be covered?

  • Spark in Lakehouse Architecture: Learn how Apache Iceberg integrates with Apache Spark, with emphasis on Spark's role in ETL and on reducing the cost of data transformation and storage compared to a traditional data warehouse.
  • Simple and Reliable Ingestion with Upsolver: Examine how Upsolver simplifies the ingestion of operational data into Iceberg tables, highlighting its no-code and ZeroETL approaches for efficient data movement.
  • Impacts of Data Management on Query Performance: Explore how small files, frequent and continuous updates and deletes, and manifest file churn affect query performance. Compare how Spark and Upsolver manage and optimize data, including how each handles schema evolution and transactional concurrency.
  • Best Practices for Implementing Lakehouse Architectures: Discuss best practices for deploying and managing a lakehouse architecture using Spark and Upsolver, with insights into optimizing storage, improving query speeds, and ensuring high-quality, reliable data.

 

Presented by:

Roy Hasson
VP Product

