Verify settings

Classic compute runs Apache Spark clusters on VMs in your own cloud account, giving you full control over instance types, networking, and runtime configurations. You create all-purpose clusters for interactive work and job clusters for automated pipelines, configuring workers, autoscaling, auto-termination, and Spark properties to match your workload. Classic is the right choice when you need custom VPCs, GPU nodes, or specific instance families.

- Create and configure all-purpose and job clusters via UI, CLI, and API
- Tune worker count, autoscaling, auto-termination, and Spark settings
- Understand when classic compute is preferred over serverless

Who this is for: Data engineers and platform administrators who need hands-on cluster management and custom infrastructure configurations.

Part of the Databricks Compute section of the Databricks tutorial series.

Architecture / Concept Overview: Verify settings

A classic Databricks cluster consists of a driver node and zero or more worker nodes running in your cloud VPC. The driver coordinates the Spark application while workers execute tasks in parallel. Databricks manages the Spark runtime, libraries, and cluster lifecycle, but the VMs run in your account, giving you network-level control and access to your existing cloud security posture.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
CP[Control Plane]:::governance -->|manages| Driver[Driver Node]:::processing
Driver --> W1[Worker 1]:::processing
Driver --> W2[Worker 2]:::processing
Driver --> W3[Worker N]:::processing
W1 --> Storage[(Cloud Object Storage)]:::storage
W2 --> Storage
W3 --> Storage

*The control plane manages the driver, which distributes tasks across worker nodes reading from your cloud storage.*

Cluster provisioning follows a lifecycle from request to termination, with optional pool acceleration.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
Req[Create Request]:::source --> Validate[Policy Check]:::governance
Validate --> Acquire[Acquire VMs]:::ingestion
Acquire --> Init[Init Scripts]:::processing
Init --> Running[Running]:::serving
Running --> Idle[Idle Timeout]:::source
Idle --> Terminated[Terminated]:::source

*A cluster moves from request through policy validation, VM acquisition, initialisation, running, and auto-termination.*

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
AP[All-Purpose Cluster]:::processing -->|interactive| Notebooks[Notebooks]:::serving
AP -->|shared| MultiUser[Multi-User Access]:::source
JC[Job Cluster]:::processing -->|automated| SingleJob[Single Job Run]:::serving
JC -->|ephemeral| AutoTerm[Auto-Terminate on Completion]:::governance

*All-purpose clusters serve interactive multi-user work; job clusters are ephemeral and terminate after a single run.*
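To make the job-cluster side of that distinction concrete, here is a minimal sketch of a Jobs API 2.1 settings payload in which a task runs on an ephemeral cluster declared under `job_clusters`. The job name, cluster key, notebook path, and the helper `undefined_cluster_keys` are hypothetical and chosen only for illustration; the instance type and runtime reuse the examples from this page.

```python
# Sketch of a job definition that uses an ephemeral job cluster.
# All names and paths below are illustrative placeholders.
job_settings = {
    "name": "nightly-etl",
    "job_clusters": [
        {
            "job_cluster_key": "etl_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # Databricks Runtime 15.4 LTS
                "node_type_id": "m5d.xlarge",
                "autoscale": {"min_workers": 1, "max_workers": 4},
                # No autotermination_minutes here: a job cluster
                # terminates when the run completes.
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "etl_cluster",
            "notebook_task": {"notebook_path": "/Workspace/Shared/etl/ingest"},
        }
    ],
}


def undefined_cluster_keys(settings: dict) -> set:
    """Return job_cluster_key values used by tasks but never declared."""
    declared = {c["job_cluster_key"] for c in settings.get("job_clusters", [])}
    used = {
        t["job_cluster_key"]
        for t in settings.get("tasks", [])
        if "job_cluster_key" in t
    }
    return used - declared


missing = undefined_cluster_keys(job_settings)
```

An all-purpose cluster, by contrast, is created once and attached to notebooks by name, so it needs its own auto-termination setting.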

Key Terms

Driver Node: The node that runs the Spark driver process, coordinates tasks, and returns results to the user.
Worker Node: A node that executes Spark tasks assigned by the driver, processing data in parallel.
Spark Version: The Databricks Runtime version (e.g., 15.4 LTS), which bundles a specific Spark release with optimised libraries.
Node Type: The cloud VM instance type (e.g., m5d.xlarge) that determines CPU, memory, and local storage.
Init Script: A shell script that runs on each node at startup for custom library installation or environment configuration.
Auto-Termination: Automatic cluster shutdown after a configurable idle period to save costs.

Prerequisites and Setup

- A Databricks workspace with classic compute enabled
- Cluster-create permission or an appropriate cluster policy
- Network configuration (VPC/VNet) if using private networking
- Knowledge of your cloud's instance type naming conventions

Step-by-Step Implementation
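
As one possible way to create an all-purpose cluster through the API, the sketch below prepares a request to the Clusters API `clusters/create` endpoint using only the Python standard library. The workspace URL, token placeholder, cluster name, and the helper `build_create_request` are assumptions for illustration; the request is built but deliberately not sent.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, spec: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the Clusters API create endpoint."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/clusters/create",
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical workspace URL and cluster name; substitute your own values.
spec = {
    "cluster_name": "tutorial-all-purpose",
    "spark_version": "15.4.x-scala2.12",  # Databricks Runtime 15.4 LTS
    "node_type_id": "m5d.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 8},
    "autotermination_minutes": 30,
}
request = build_create_request(
    "https://example.cloud.databricks.com", "<personal-access-token>", spec
)
# To submit: urllib.request.urlopen(request); the JSON response
# includes the new cluster's "cluster_id".
```

The same specification can be entered in the UI form or passed to the CLI as JSON; only the transport differs.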

Configuration Reference

Verify settings configuration options
Parameter	Description	Recommended Default
`spark_version`	Databricks Runtime version	Latest LTS
`node_type_id`	Cloud VM instance type	General-purpose (m5d.xlarge)
`driver_node_type_id`	Separate instance type for driver	Same as workers or one size larger
`num_workers`	Fixed worker count	Use autoscale instead
`autoscale.min_workers`	Minimum workers for autoscaling	1
`autoscale.max_workers`	Maximum workers for autoscaling	8 (adjust per workload)
`autotermination_minutes`	Idle minutes before shutdown	30
`runtime_engine`	Execution engine (`PHOTON` or `STANDARD`)	`PHOTON` for SQL-heavy workloads
`spark_conf`	Custom Spark configuration	Enable AQE
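
The recommended defaults in this table can be collected into a single cluster specification. The following is a minimal sketch, assuming a hypothetical helper named `build_cluster_spec`; the field names follow the Clusters API, where Photon is requested through `runtime_engine`, and the runtime key `15.4.x-scala2.12` corresponds to 15.4 LTS.

```python
from typing import Optional


def build_cluster_spec(
    name: str,
    spark_version: str = "15.4.x-scala2.12",
    node_type_id: str = "m5d.xlarge",
    driver_node_type_id: Optional[str] = None,
    min_workers: int = 1,
    max_workers: int = 8,
    autotermination_minutes: int = 30,
    photon: bool = False,
) -> dict:
    """Build a cluster spec dict from the table's recommended defaults."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    spec = {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
        # Adaptive Query Execution, set explicitly as the table suggests.
        "spark_conf": {"spark.sql.adaptive.enabled": "true"},
    }
    if driver_node_type_id is not None:
        # Only set when the driver should differ from the workers.
        spec["driver_node_type_id"] = driver_node_type_id
    if photon:
        spec["runtime_engine"] = "PHOTON"
    return spec


sql_cluster = build_cluster_spec("sql-heavy", photon=True)
```

Keeping the defaults in one function makes it harder for individual clusters to drift from the recommended baseline.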

Monitoring, Cost, and Security Considerations

Monitoring

Use the cluster event log to track starts, terminations, resizing, and failures. Query the system.compute.clusters system table for configuration history across all clusters, and system.compute.node_timeline for node-level utilisation metrics. Set up alerts for clusters that run longer than expected or fail to auto-terminate.
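
A simple alert of that kind could be sketched as a filter over cluster records such as those returned by the Clusters API list endpoint. The helper `flag_clusters`, the 8-hour threshold, and the sample records are hypothetical; the sketch assumes each record carries `cluster_id`, `state`, `start_time` (epoch milliseconds), and optionally `last_restarted_time` and `autotermination_minutes`.

```python
import time
from typing import Optional

MS_PER_HOUR = 3_600_000


def flag_clusters(clusters: list, max_hours: float, now_ms: Optional[int] = None) -> list:
    """Return (cluster_id, reasons) for running clusters that look suspicious."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    flagged = []
    for cluster in clusters:
        if cluster.get("state") != "RUNNING":
            continue
        reasons = []
        # 0 or a missing value means the cluster never auto-terminates.
        if cluster.get("autotermination_minutes", 0) == 0:
            reasons.append("auto-termination disabled")
        # Prefer the most recent restart time when it is available.
        started = cluster.get("last_restarted_time", cluster["start_time"])
        hours = (now_ms - started) / MS_PER_HOUR
        if hours > max_hours:
            reasons.append(f"running for {hours:.1f} h")
        if reasons:
            flagged.append((cluster["cluster_id"], reasons))
    return flagged


# Illustrative records only; real data would come from the workspace.
sample = [
    {"cluster_id": "a", "state": "RUNNING", "start_time": 0, "autotermination_minutes": 30},
    {"cluster_id": "b", "state": "RUNNING", "start_time": 9 * MS_PER_HOUR, "autotermination_minutes": 0},
    {"cluster_id": "c", "state": "TERMINATED", "start_time": 0, "autotermination_minutes": 0},
]
alerts = flag_clusters(sample, max_hours=8, now_ms=10 * MS_PER_HOUR)
```

In practice the output would feed whatever notification channel your team already uses.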

Cost Optimisation

- Always enable auto-termination (15-30 minutes) on all-purpose clusters.

- Use autoscaling to avoid paying for peak capacity when load is low.

- Choose Spot/Preemptible instances for worker nodes on fault-tolerant workloads.

- Use instance pools to trade a small idle-instance cost for faster starts.

- Prefer job clusters over all-purpose clusters for scheduled workloads.
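
For the Spot recommendation above, the following sketch shows one way to add AWS Spot settings to an existing cluster specification. It is AWS-specific and assumes a hypothetical helper `with_spot_workers`; Azure and GCP use their own attribute blocks, which are not shown.

```python
import copy


def with_spot_workers(spec: dict, first_on_demand: int = 1) -> dict:
    """Return a copy of spec whose workers use AWS Spot with on-demand fallback."""
    updated = copy.deepcopy(spec)
    updated["aws_attributes"] = {
        # The first N nodes (the driver included) stay on-demand.
        "first_on_demand": first_on_demand,
        # Fall back to on-demand when Spot capacity is unavailable.
        "availability": "SPOT_WITH_FALLBACK",
        # Maximum Spot price as a percentage of the on-demand price.
        "spot_bid_price_percent": 100,
    }
    return updated


base = {
    "cluster_name": "fault-tolerant-batch",  # hypothetical name
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "m5d.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 8},
    "autotermination_minutes": 30,
}
spot_spec = with_spot_workers(base)
```

Keeping `first_on_demand` at 1 or higher places the driver on an on-demand instance, so a reclaimed Spot VM costs only worker capacity rather than the whole cluster.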

Security and Governance

- Classic clusters run in your VPC, enabling private networking, firewall rules, and VPN/Private Link.

- Use Unity Catalog privileges to control data access rather than instance profiles where possible.

- Restrict cluster configuration via cluster policies to prevent disabling security features.

- Use init scripts from secure, audited locations (workspace files or Unity Catalog volumes).
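
As an illustration of the cluster-policy point, a policy definition might look like the sketch below. The policy name, the allowed values, and the required `team` tag are hypothetical choices; the attribute paths and the `range`, `allowlist`, and `unlimited` types follow the cluster policy definition format, and the definition is passed to the policies API as a JSON string.

```python
import json

# Hypothetical limits; adjust the values to your organisation's standards.
policy_definition = {
    # A minimum of 10 prevents users from disabling auto-termination with 0.
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 60,
        "defaultValue": 30,
    },
    "spark_version": {
        "type": "allowlist",
        "values": ["15.4.x-scala2.12"],
        "defaultValue": "15.4.x-scala2.12",
    },
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5d.xlarge", "m5d.2xlarge"],
    },
    "autoscale.max_workers": {"type": "range", "maxValue": 8},
    # Any value is accepted, but the tag itself must be present.
    "custom_tags.team": {"type": "unlimited", "isOptional": False},
}

policy_request = {
    "name": "tutorial-classic-policy",
    "definition": json.dumps(policy_definition),
}
```

Users who are granted only this policy can then create clusters solely within these limits.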

Common Pitfalls and Recommended Patterns

- Setting num_workers instead of autoscale: fixed sizing wastes resources during low-load periods.
- Using the latest non-LTS runtime: LTS versions receive patches longer and are more stable for production.
- Forgetting autotermination_minutes: clusters left running overnight are the top cause of cost overruns.
- Over-sizing the driver node: for most workloads the driver can be the same size as worker nodes.
- Installing packages via init scripts when %pip install suffices: init scripts add start time and complexity.
- Not tagging clusters: tags enable cost allocation and chargeback to teams.
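
Several of these pitfalls can be detected mechanically before a cluster is created. The sketch below assumes a hypothetical helper `find_pitfalls` that inspects a cluster specification dict; it covers only the checks that are visible in the spec itself and is not an exhaustive linter.

```python
def find_pitfalls(spec: dict) -> list:
    """Return human-readable warnings for common cluster spec pitfalls."""
    warnings = []
    if "autoscale" not in spec:
        warnings.append("fixed num_workers; consider autoscale")
    # A missing value or 0 both mean the cluster never shuts itself down.
    if spec.get("autotermination_minutes", 0) == 0:
        warnings.append("auto-termination is unset or disabled")
    if not spec.get("custom_tags"):
        warnings.append("no custom_tags for cost allocation")
    driver = spec.get("driver_node_type_id")
    if driver and driver != spec.get("node_type_id"):
        warnings.append("driver type differs from workers; confirm it is needed")
    return warnings


# Illustrative specs only.
risky = {"cluster_name": "adhoc", "node_type_id": "m5d.xlarge", "num_workers": 8}
tidy = {
    "cluster_name": "team-etl",
    "node_type_id": "m5d.xlarge",
    "autoscale": {"min_workers": 1, "max_workers": 8},
    "autotermination_minutes": 30,
    "custom_tags": {"team": "data-eng"},
}
risky_warnings = find_pitfalls(risky)
tidy_warnings = find_pitfalls(tidy)
```

A check like this fits naturally into code review or CI for teams that keep cluster definitions in version control.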

Frequently Asked Questions

How long does a classic cluster take to start?

Typically 3-7 minutes depending on the cloud provider, instance availability, and whether you are using an instance pool (which reduces it to under a minute).

Can I use Spot instances for workers?

Yes. Databricks supports Spot instances (AWS), Spot VMs (Azure), and Preemptible VMs (GCP) for worker nodes. By default the driver runs on an on-demand instance, so that a reclaimed VM does not take the whole cluster down.

What happens if a worker node is lost?

Spark automatically reschedules failed tasks on remaining workers. If autoscaling is enabled, Databricks replaces the lost node. For Spot instances, Databricks handles preemption transparently.

Should I use the same instance type for driver and workers?

For most workloads, yes. Use a larger driver only when the driver collects large result sets or coordinates thousands of tasks. Use driver_node_type_id to set it independently.