Apache DataFusion. Putting Theory Into Practice by Matt Butrovich | DC Systems 004

Antithesis June 13, 2025

About

Antithesis is an autonomous testing platform that helps you ship more reliable software by catching bugs before they reach production, and making them perfectly reproducible when they do. We continuously run your software in a simulated environment under real-world conditions to uncover hidden issues and streamline debugging. Trusted by teams in fintech, databases, distributed systems, and web3, Antithesis helps engineering teams boost productivity, improve code quality, and release with confidence.

Video Description

What if you could speed up Apache Spark using an embeddable Rust-based query engine? In this talk, Matt from Apple walks through Apache DataFusion, an open-source, high-performance SQL engine for analytical workloads, and how it powers Comet, a native accelerator for Spark that cuts query runtimes nearly in half. You will learn how DataFusion fits into modern data systems, why German-style strings can make queries dramatically faster, and how Comet replaces Spark's query execution with native Rust code while preserving Spark semantics. Matt also shares performance benchmarks, design tradeoffs, and the practical challenges of keeping pace with Spark's evolving behavior. Whether you are building a database, optimizing a data pipeline, or just curious about columnar execution engines and system performance tricks, this talk offers a clear and engaging look into the future of fast, pluggable data infrastructure.

http://antithesis.com/
