Verify settings
Classic compute runs Apache Spark clusters on VMs in your own cloud account, giving you full control over instance types, networking, and runtime configurations. You create all-purpose clusters for interactive work and job clusters for automated pipelines, configuring workers, autoscaling, auto-termination, and Spark properties to match your workload. Classic is the right choice when you need custom VPCs, GPU nodes, or specific instance families.
- Create and configure all-purpose and job clusters via UI, CLI, and API
- Tune worker count, autoscaling, auto-termination, and Spark settings
- Understand when classic compute is preferred over serverless
Who this is for: Data engineers and platform administrators who need hands-on cluster management and custom infrastructure configurations.
Part of the Databricks Compute section of the Databricks tutorial series.
Architecture / Concept Overview: Verify settings
A classic Databricks cluster consists of a driver node and zero or more worker nodes running in your cloud VPC. The driver coordinates the Spark application while workers execute tasks in parallel. Databricks manages the Spark runtime, libraries, and cluster lifecycle, but the VMs run in your account, giving you network-level control and access to your existing cloud security posture.
*The control plane manages the driver, which distributes tasks across worker nodes reading from your cloud storage.*
Cluster provisioning follows a lifecycle from request to termination, with optional pool acceleration.
*A cluster moves from request through policy validation, VM acquisition, initialisation, running, and auto-termination.*
*All-purpose clusters serve interactive multi-user work; job clusters are ephemeral and terminate after a single run.*
Key Terms
- Driver Node
- The node that runs the Spark driver process, coordinates tasks, and returns results to the user.
- Worker Node
- A node that executes Spark tasks assigned by the driver, processing data in parallel.
- Spark Version
- The Databricks Runtime version (e.g., 15.4 LTS), which bundles a specific Spark release with optimised libraries.
- Node Type
- The cloud VM instance type (e.g., `m5d.xlarge`) that determines CPU, memory, and local storage.
- Init Script
- A shell script that runs on each node at startup for custom library installation or environment configuration.
- Auto-Termination
- Automatic cluster shutdown after a configurable idle period to save costs.
Prerequisites and Setup
- A Databricks workspace with classic compute enabled
- Cluster-create permission or an appropriate cluster policy
- Network configuration (VPC/VNet) if using private networking
- Knowledge of your cloud's instance type naming conventions
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Recommended Default |
|---|---|---|
| `spark_version` | Databricks Runtime version | Latest LTS |
| `node_type_id` | Cloud VM instance type | General-purpose (`m5d.xlarge`) |
| `driver_node_type_id` | Separate instance type for driver | Same as workers or one size larger |
| `num_workers` | Fixed worker count | Use autoscale instead |
| `autoscale.min_workers` | Minimum workers for autoscaling | 1 |
| `autoscale.max_workers` | Maximum workers for autoscaling | 8 (adjust per workload) |
| `autotermination_minutes` | Idle minutes before shutdown | 30 |
| `enable_photon` | Use Photon engine | `true` for SQL-heavy workloads |
| `spark_conf` | Custom Spark configuration | Enable AQE |
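The recommended defaults in the table can be assembled into a single cluster specification. The following is a minimal sketch, not a definitive payload: the field names are taken from the table, `cluster_name` and the runtime key `15.4.x-scala2.12` are assumptions, and "Enable AQE" is assumed to map to the Spark property `spark.sql.adaptive.enabled`. Photon is left out to keep the sketch short. Check the names against the API reference for your workspace before sending anything.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       min_workers=1, max_workers=8,
                       autotermination_minutes=30):
    """Build a cluster spec dict that mirrors the table's recommended defaults."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Same instance type for the driver, as the table suggests by default.
        "driver_node_type_id": node_type_id,
        # Autoscaling range instead of a fixed num_workers.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
        # Assumed property name for adaptive query execution (AQE).
        "spark_conf": {"spark.sql.adaptive.enabled": "true"},
    }


spec = build_cluster_spec(
    name="example-all-purpose",        # hypothetical cluster name
    spark_version="15.4.x-scala2.12",  # assumed key for the 15.4 LTS runtime
    node_type_id="m5d.xlarge",
)
print(json.dumps(spec, indent=2))
```

Keeping the spec in a function like this makes it easy to reuse the same defaults for every cluster and override only what a workload needs.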
Monitoring, Cost, and Security Considerations
Monitoring
Use the cluster event log to track starts, terminations, resizing, and failures. Query `system.compute.clusters` for cluster configuration history and `system.compute.node_timeline` for utilisation metrics across all clusters. Set up alerts for clusters that run longer than expected or fail to auto-terminate.
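The "runs longer than expected" alert can be sketched as a simple filter. This example assumes you have already exported cluster records into plain dictionaries; the field names (`name`, `state`, `started_at`) and the sample data are hypothetical and do not reflect the schema of any Databricks table or API.

```python
from datetime import datetime, timedelta, timezone


def find_long_running(clusters, now, max_runtime):
    """Return the names of clusters that are running longer than max_runtime."""
    flagged = []
    for cluster in clusters:
        running_for = now - cluster["started_at"]
        if cluster["state"] == "RUNNING" and running_for > max_runtime:
            flagged.append(cluster["name"])
    return flagged


# Hypothetical snapshot taken at 09:00 UTC.
now = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
clusters = [
    {"name": "adhoc-analytics", "state": "RUNNING",
     "started_at": now - timedelta(hours=14)},
    {"name": "nightly-etl", "state": "TERMINATED",
     "started_at": now - timedelta(hours=20)},
    {"name": "dev-sandbox", "state": "RUNNING",
     "started_at": now - timedelta(hours=2)},
]

print(find_long_running(clusters, now, timedelta(hours=8)))
# prints ['adhoc-analytics']
```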
Cost Optimisation
- Always enable auto-termination (15-30 minutes) on all-purpose clusters.
- Use autoscaling to avoid paying for peak capacity when load is low.
- Choose Spot/Preemptible instances for worker nodes on fault-tolerant workloads.
- Use instance pools to trade a small idle-instance cost for faster starts.
- Prefer job clusters over all-purpose clusters for scheduled workloads.
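To see why auto-termination matters, a back-of-the-envelope calculation helps. The hourly rate and cluster size below are made-up illustration values, not Databricks or cloud pricing; substitute your own combined VM and platform cost per node.

```python
def idle_cost(idle_hours, nodes, hourly_rate_per_node):
    """Cost of a cluster that sits idle: hours x nodes x rate per node."""
    return idle_hours * nodes * hourly_rate_per_node


RATE = 0.50  # hypothetical cost per node per hour
NODES = 5    # 1 driver + 4 workers

# Cluster left running idle overnight (14 hours) with no auto-termination.
overnight = idle_cost(14, NODES, RATE)

# Same cluster with a 30-minute auto-termination window.
with_auto_termination = idle_cost(0.5, NODES, RATE)

print(f"Idle overnight:        {overnight:.2f}")              # 35.00
print(f"With auto-termination: {with_auto_termination:.2f}")  # 1.25
```

Under these assumed numbers, one forgotten cluster costs about 28 times more per night than one that shuts down after 30 idle minutes.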
Security and Governance
- Classic clusters run in your VPC, enabling private networking, firewall rules, and VPN/Private Link.
- Use Unity Catalog table ACLs to control data access rather than instance profiles where possible.
- Restrict cluster configuration via cluster policies to prevent disabling security features.
- Use init scripts from secure, audited locations (workspace files or Unity Catalog volumes).
Common Pitfalls and Recommended Patterns
- Setting `num_workers` instead of `autoscale`: fixed sizing wastes resources during low-load periods.
- Using the latest non-LTS runtime: LTS versions receive patches longer and are more stable for production.
- Forgetting `autotermination_minutes`: clusters left running overnight are the top cause of cost overruns.
- Over-sizing the driver node: for most workloads the driver can be the same size as worker nodes.
- Installing packages via init scripts when `%pip install` suffices: init scripts add start time and complexity.
- Not tagging clusters: tags enable cost allocation and chargeback to teams.
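Some of these pitfalls can be caught mechanically before a cluster is created. The sketch below checks a cluster spec dictionary for three of them. The helper `lint_cluster_spec` is hypothetical, and `custom_tags` is assumed to be the key that holds cluster tags; the other keys come from the configuration table.

```python
def lint_cluster_spec(spec):
    """Return a list of warnings for common cluster configuration pitfalls."""
    warnings = []
    # Fixed sizing without an autoscale range.
    if "num_workers" in spec and "autoscale" not in spec:
        warnings.append("fixed num_workers: consider an autoscale range")
    # Missing or zero idle timeout means the cluster never shuts itself down.
    if not spec.get("autotermination_minutes"):
        warnings.append("autotermination_minutes is not set")
    # Untagged clusters cannot be attributed to a team for chargeback.
    if not spec.get("custom_tags"):
        warnings.append("no tags: cost allocation will not be possible")
    return warnings


# Hypothetical spec that triggers all three checks.
risky_spec = {"node_type_id": "m5d.xlarge", "num_workers": 8}

for warning in lint_cluster_spec(risky_spec):
    print("WARNING:", warning)
```

A check like this fits naturally into a CI step or a deployment script; cluster policies remain the enforcement mechanism inside the workspace itself.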
Frequently Asked Questions
How long does a classic cluster take to start?
Typically 3-7 minutes, depending on the cloud provider and instance availability. An instance pool with idle instances ready can reduce this to under a minute.
Can I use Spot instances for workers?
Yes. Databricks supports Spot instances (AWS), Spot VMs (Azure), and Preemptible VMs (GCP) for worker nodes. By default the driver runs on an on-demand instance so that a preemption does not take down the whole job.
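On AWS, a spot configuration might look like the sketch below. It assumes the cluster spec accepts an `aws_attributes` block in which `first_on_demand` sets how many nodes (starting with the driver) stay on-demand and `availability` selects spot with on-demand fallback. Verify both fields against the API reference for your cloud, since Azure and GCP use different attribute blocks. The helper name and the base spec are hypothetical.

```python
import json


def with_spot_workers(spec, on_demand_nodes=1):
    """Return a copy of spec whose workers use spot capacity (AWS assumed)."""
    updated = dict(spec)  # shallow copy so the caller's spec is not modified
    updated["aws_attributes"] = {
        # Keep the first node (the driver) on on-demand capacity.
        "first_on_demand": on_demand_nodes,
        # Fall back to on-demand if spot capacity is unavailable.
        "availability": "SPOT_WITH_FALLBACK",
    }
    return updated


# Hypothetical base spec for a fault-tolerant batch workload.
base_spec = {
    "cluster_name": "example-batch",
    "node_type_id": "m5d.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 8},
}

spot_spec = with_spot_workers(base_spec)
print(json.dumps(spot_spec["aws_attributes"], indent=2))
```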
What happens if a worker node is lost?
Spark automatically reschedules failed tasks on remaining workers. If autoscaling is enabled, Databricks replaces the lost node. For Spot instances, Databricks handles preemption transparently.
Should I use the same instance type for driver and workers?
For most workloads, yes. Use a larger driver only when the driver collects large result sets or coordinates thousands of tasks. Use driver_node_type_id to set it independently.