Spark Optimization Playbook: Adaptive Query Execution (AQE) Tuning Guide

Source: DEV Community
Datanest Digital: Spark Optimization Playbook

AQE is Spark's runtime query re-optimization engine. It observes actual data statistics during execution and adjusts the query plan on the fly. This guide covers the main AQE features, when each one helps, and how to tune it.

What AQE Does

AQE re-optimizes the query plan at stage boundaries (after each shuffle), once the statistics of the completed stage are known. It can:

- Coalesce shuffle partitions: merge small post-shuffle partitions into larger ones.
- Switch join strategies: convert a sort-merge join to a broadcast join at runtime.
- Handle skewed joins: split skewed partitions and replicate the matching partition on the other side.
- Optimize skewed aggregations: split skewed groups across multiple tasks.

Enabling AQE

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuses the active session if one exists
# Master switch (on by default since Apache Spark 3.2.0, including DBR 12.2+)
spark.conf.set("spark.sql.adaptive.enabled", "true")
```

AQE is generally safe to enable and should stay on for all workloads. The only reason to turn it off is benchmarking, to isolate its impact.

Feature 1: Partition Coalescing
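As a rough mental model of what coalescing does, the following plain-Python sketch (no Spark needed) walks the post-shuffle partitions in order and greedily merges neighbours until adding the next one would exceed a target size. It is a simplified illustration under stated assumptions, not Spark's implementation: the function `coalesce_partitions` and the sample sizes are hypothetical, the target stands in for `spark.sql.adaptive.advisoryPartitionSizeInBytes` (64 MB by default), and Spark applies further rules (for example, lower bounds on partition count and size) that this version ignores.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target_size: int) -> List[List[int]]:
    """Group contiguous post-shuffle partition indices into coalesced tasks.

    Hypothetical helper: a simplified greedy model of AQE partition
    coalescing, not Spark's actual implementation.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        # Only neighbouring partitions are merged; nothing is ever split.
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical post-shuffle partition sizes in MB, with a 64 MB target.
sizes_mb = [10, 5, 40, 30, 2, 70, 3]
print(coalesce_partitions(sizes_mb, target_size=64))
# [[0, 1, 2], [3, 4], [5], [6]]
```

Seven uneven partitions become four tasks. The oversized partition 5 is left as-is: coalescing only merges partitions and never splits them, which is why splitting oversized partitions is left to the skew-handling features listed in the overview.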