Autoscaling: How Databricks Scales Compute Automatically
Autoscaling dynamically adjusts the number of worker nodes in a cluster based on workload demand, adding workers when tasks queue up and removing them when utilisation drops. This balances performance and cost automatically, ensuring you have enough compute for peak loads without paying for idle capacity during quiet periods. Configure autoscaling with a min and max worker count on every cluster.
- Understand how Databricks autoscaling detects demand and adjusts cluster size
- Configure autoscaling parameters for clusters and SQL warehouses
- Tune scaling behaviour for different workload patterns
Who this is for: Data engineers and platform administrators who want to optimise cluster sizing for variable workloads.
Part of the Databricks Compute section of the Databricks tutorial series.
Architecture / Concept Overview: Autoscaling: How Databricks Scales Compute Automatically
Databricks autoscaling monitors the ratio of pending tasks to available slots. When pending tasks exceed available capacity, the scheduler requests additional worker nodes up to the configured maximum. When workers are idle for a sustained period, the scheduler decommissions them down to the configured minimum. This happens automatically without user intervention.
*The autoscaler monitors demand and adds or removes workers within the configured min/max range.*
Autoscaling operates differently for all-purpose clusters (optimised for interactive responsiveness) and job clusters (optimised for cost efficiency).
*All-purpose clusters scale up eagerly for responsiveness; job clusters scale down aggressively for cost savings.*
*Autoscaling maintains at least the minimum workers and scales up to the maximum based on demand.*
Key Terms
- Autoscaling
- Automatic adjustment of worker count based on workload demand within a configured min/max range.
- Min Workers
- The minimum number of worker nodes the cluster always maintains.
- Max Workers
- The maximum number of worker nodes the cluster can scale to under peak demand.
- Scale-Up
- Adding worker nodes when pending tasks exceed available capacity.
- Scale-Down
- Removing idle worker nodes after a sustained period of low utilisation.
- Optimised Autoscaling
- Databricks-enhanced autoscaling that considers shuffle data locality and task completion rates.
Prerequisites and Setup
- A Databricks workspace with cluster creation permissions
- Understanding of your workload's demand variability (bursty vs steady)
- Knowledge of cloud instance availability and quotas for your selected instance type
- Cluster policies that set appropriate min/max ranges
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Recommended Value |
|---|---|---|
autoscale.min_workers | Always-running workers | 1-2 for dev, 2-4 for production |
autoscale.max_workers | Upper scaling limit | 2-3x expected peak demand |
autotermination_minutes | Idle timeout for the whole cluster | 15-30 minutes |
| Cluster type | Scaling behaviour | All-purpose: eager up, conservative down. Job: balanced |
| Instance pool | Speeds up scale-up | Use pools for sub-minute node acquisition |
| Spot instances | Cost savings on scale-up | Enable for workers on fault-tolerant workloads |
Monitoring, Cost, and Security Considerations
Monitoring
Track cluster sizing events (UPSIZE_COMPLETED, DOWNSIZE_COMPLETED) in the event log. Graph worker count over time to identify whether your min/max settings match actual demand patterns.
Cost Optimisation
- Autoscaling is the single most effective cost lever for variable workloads.
- Set min_workers to the baseline needed for the quietest periods.
- Set max_workers to handle peak load without significant queuing.
- Combine autoscaling with Spot instances for workers to reduce scale-up costs.
- Pair autoscaling with auto-termination to eliminate idle cluster costs entirely.
Security and Governance
- Autoscaling does not change security boundaries; new workers inherit the cluster's access mode and Unity Catalog permissions.
- Use cluster policies to cap max_workers so autoscaling cannot trigger excessive cloud resource consumption.
- Monitor cloud quotas to ensure autoscaling can acquire instances when needed.
Common Pitfalls and Recommended Patterns
- Setting
min_workerstoo high: you pay for idle capacity during low-demand periods. - Setting
max_workerstoo low: workloads queue instead of scaling, increasing latency. - Using fixed
num_workersinstead of autoscaling: this wastes resources during low-load periods or bottlenecks during peaks. - Not pairing autoscaling with auto-termination: the cluster scales down workers but stays running with min workers indefinitely.
- Ignoring the difference between all-purpose and job cluster scaling: job clusters scale down faster, which is better for batch workloads.
- Not monitoring scaling events: without observability, you cannot tune min/max values.
Frequently Asked Questions
How quickly does autoscaling add workers?
Scale-up takes 3-7 minutes for classic clusters (cloud VM provisioning). With instance pools, new workers join in under a minute. Serverless compute scales in seconds.
Does autoscaling work with Spot instances?
Yes. Spot instances are preferred for scale-up workers. If Spot capacity is unavailable, Databricks falls back to on-demand instances for the additional workers.
Can I disable autoscaling for a specific job?
Yes. Set num_workers to a fixed value instead of providing an autoscale block. This is appropriate for workloads with predictable, constant resource needs.
Does autoscaling affect running tasks?
Scale-down only removes workers that have finished their current tasks. In-progress tasks are not interrupted. Spark gracefully decommissions nodes to avoid data loss.
What is the difference between cluster autoscaling and SQL warehouse scaling?
Cluster autoscaling adjusts worker nodes within a single cluster. SQL warehouse scaling adds or removes entire clusters behind a load balancer to handle query concurrency.