Verify settings

Classic compute runs Apache Spark clusters on VMs in your own cloud account, giving you full control over instance types, networking, and runtime configurations. You create all-purpose clusters for interactive work and job clusters for automated pipelines, configuring workers, autoscaling, auto-termination, and Spark properties to match your workload. Classic is the right choice when you need custom VPCs, GPU nodes, or specific instance families.

  • Create and configure all-purpose and job clusters via UI, CLI, and API
  • Tune worker count, autoscaling, auto-termination, and Spark settings
  • Understand when classic compute is preferred over serverless

Who this is for: Data engineers and platform administrators who need hands-on cluster management and custom infrastructure configurations.

Part of the Databricks Compute section of the Databricks tutorial series.

Architecture / Concept Overview: Verify settings

A classic Databricks cluster consists of a driver node and zero or more worker nodes running in your cloud VPC. The driver coordinates the Spark application while workers execute tasks in parallel. Databricks manages the Spark runtime, libraries, and cluster lifecycle, but the VMs run in your account, so you keep network-level control and can apply your existing cloud security controls.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  CP[Control Plane]:::governance -->|manages| Driver[Driver Node]:::processing
  Driver --> W1[Worker 1]:::processing
  Driver --> W2[Worker 2]:::processing
  Driver --> W3[Worker N]:::processing
  W1 --> Storage[(Cloud Object Storage)]:::storage
  W2 --> Storage
  W3 --> Storage

*The control plane manages the driver, which distributes tasks across worker nodes reading from your cloud storage.*

Cluster provisioning follows a lifecycle from request to termination, with optional pool acceleration.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  Req[Create Request]:::source --> Validate[Policy Check]:::governance
  Validate --> Acquire[Acquire VMs]:::ingestion
  Acquire --> Init[Init Scripts]:::processing
  Init --> Running[Running]:::serving
  Running --> Idle[Idle Timeout]:::source
  Idle --> Terminated[Terminated]:::source

*A cluster moves from request through policy validation, VM acquisition, initialisation, running, and auto-termination.*
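
The lifecycle above can be observed from a client by polling the cluster state until it reaches RUNNING. The following is a minimal sketch, not a definitive implementation: `wait_until_running` and its `get_state` callable are hypothetical names, and the state strings are assumed to match the `state` field returned by the Clusters API. In practice `get_state` would wrap a call to the cluster "get" endpoint; here it is injected so the logic can be exercised without a workspace.

```python
import time
from typing import Callable

# Assumption: these names match the `state` values reported by the Clusters API.
STOPPED_STATES = {"TERMINATING", "TERMINATED", "ERROR"}


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 900.0,
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Poll until the cluster is RUNNING; return the distinct states seen in order."""
    seen: list[str] = []
    waited = 0.0
    while True:
        state = get_state()
        if not seen or seen[-1] != state:
            seen.append(state)  # record each transition once
        if state == "RUNNING":
            return seen
        if state in STOPPED_STATES:
            raise RuntimeError(f"cluster stopped before running: {state}")
        if waited >= timeout_s:
            raise TimeoutError(f"cluster still {state} after {timeout_s} s")
        sleep(poll_s)
        waited += poll_s
```

Passing `sleep` as a parameter is a design choice that keeps the helper testable; a caller can leave it at the default.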

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
  classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
  AP[All-Purpose Cluster]:::processing -->|interactive| Notebooks[Notebooks]:::serving
  AP -->|shared| MultiUser[Multi-User Access]:::source
  JC[Job Cluster]:::processing -->|automated| SingleJob[Single Job Run]:::serving
  JC -->|ephemeral| AutoTerm[Auto-Terminate on Completion]:::governance

*All-purpose clusters serve interactive multi-user work; job clusters are ephemeral and terminate after a single run.*
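
In a job definition, the difference between the two cluster types shows up as which field a task carries. The sketch below builds two Jobs API style payloads: one with a `new_cluster` block (an ephemeral job cluster) and one with an `existing_cluster_id` (an all-purpose cluster). The job name, task key, notebook path, cluster ID, and the runtime key `15.4.x-scala2.12` are illustrative assumptions, not values from this tutorial.

```python
def job_on_job_cluster(notebook_path: str) -> dict:
    """Task runs on a cluster created for the run and terminated when it completes."""
    return {
        "name": "nightly-etl",  # hypothetical job name
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": notebook_path},
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",  # assumed key for 15.4 LTS
                "node_type_id": "m5d.xlarge",
                "autoscale": {"min_workers": 1, "max_workers": 8},
            },
        }],
    }


def job_on_all_purpose(notebook_path: str, cluster_id: str) -> dict:
    """Task attaches to an already-defined all-purpose cluster."""
    return {
        "name": "adhoc-report",  # hypothetical job name
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": notebook_path},
            "existing_cluster_id": cluster_id,
        }],
    }
```

The job-cluster variant omits an auto-termination setting because the cluster's lifetime is tied to the run.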

Key Terms

Driver Node
The node that runs the Spark driver process, coordinates tasks, and returns results to the user.
Worker Node
A node that executes Spark tasks assigned by the driver, processing data in parallel.
Spark Version
The Databricks Runtime version (e.g., 15.4 LTS), which bundles a specific Spark release with optimised libraries.
Node Type
The cloud VM instance type (e.g., m5d.xlarge) that determines CPU, memory, and local storage.
Init Script
A shell script that runs on each node at startup for custom library installation or environment configuration.
Auto-Termination
Automatic cluster shutdown after a configurable idle period to save costs.

Prerequisites and Setup

  • A Databricks workspace with classic compute enabled
  • Cluster-create permission or an appropriate cluster policy
  • Network configuration (VPC/VNet) if using private networking
  • Knowledge of your cloud's instance type naming conventions

Step-by-Step Implementation
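
As a starting point for creating a cluster through the API, the sketch below assembles a cluster specification and wraps it in an HTTP request. It is an illustration under stated assumptions rather than the official client: the helper names are hypothetical, the endpoint path `/api/2.1/clusters/create`, the runtime key `15.4.x-scala2.12`, and the `team` tag are assumptions, and the host and token are placeholders. The request is only built here; sending it with `urllib.request.urlopen(req)` would require a real workspace and credentials.

```python
import json
import urllib.request


def build_cluster_spec(
    name: str,
    spark_version: str = "15.4.x-scala2.12",  # assumed runtime key for 15.4 LTS
    node_type_id: str = "m5d.xlarge",
    min_workers: int = 1,
    max_workers: int = 8,
    autotermination_minutes: int = 30,
) -> dict:
    """Return a spec that autoscales and auto-terminates, per the table defaults."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # autoscale is used instead of a fixed num_workers
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
        "spark_conf": {"spark.sql.adaptive.enabled": "true"},  # AQE
        "custom_tags": {"team": "data-eng"},  # hypothetical tag for chargeback
    }


def build_create_request(host: str, token: str, spec: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that would create the cluster."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/clusters/create",  # assumed endpoint path
        data=json.dumps(spec).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping the spec as plain data means the same dictionary can be reviewed, linted, or stored in version control before anything is created.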

    Configuration Reference

    Verify settings configuration options
    Parameter               | Description                       | Recommended Default
    spark_version           | Databricks Runtime version        | Latest LTS
    node_type_id            | Cloud VM instance type            | General-purpose (m5d.xlarge)
    driver_node_type_id     | Separate instance type for driver | Same as workers or one size larger
    num_workers             | Fixed worker count                | Use autoscale instead
    autoscale.min_workers   | Minimum workers for autoscaling   | 1
    autoscale.max_workers   | Maximum workers for autoscaling   | 8 (adjust per workload)
    autotermination_minutes | Idle minutes before shutdown      | 30
    enable_photon           | Use Photon engine                 | true for SQL-heavy workloads
    spark_conf              | Custom Spark configuration        | Enable AQE

    Monitoring, Cost, and Security Considerations

    Monitoring

    Use the cluster event log to track starts, terminations, resizing, and failures. Query system.compute.clusters for the configuration history of all clusters, and system.compute.node_timeline for per-node utilisation metrics. Set up alerts for clusters that run longer than expected or fail to auto-terminate.
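
One way to back such an alert is a small check over the cluster list. The sketch below is an assumption-laden illustration: `flag_risky_clusters` is a hypothetical helper, and the field names (`state`, `cluster_source`, `autotermination_minutes`, `last_restarted_time`, `start_time` in epoch milliseconds) are assumed to match what the Clusters API list call returns. The 12-hour threshold is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def flag_risky_clusters(
    clusters: list[dict],
    now: datetime,
    max_runtime: timedelta = timedelta(hours=12),  # arbitrary threshold
) -> list[tuple[str, str]]:
    """Return (cluster name, reason) pairs for running clusters worth a look."""
    findings: list[tuple[str, str]] = []
    for c in clusters:
        if c.get("state") != "RUNNING":
            continue
        name = c.get("cluster_name") or c.get("cluster_id", "unknown")
        # Job clusters end with their run, so only check the others.
        if c.get("cluster_source") != "JOB" and not c.get("autotermination_minutes"):
            findings.append((name, "auto-termination disabled"))
        started_ms = c.get("last_restarted_time") or c.get("start_time")
        if started_ms is not None:
            started = datetime.fromtimestamp(started_ms / 1000, tz=timezone.utc)
            if now - started > max_runtime:
                findings.append((name, "running longer than expected"))
    return findings
```

The function takes `now` as an argument so that the result is reproducible; a scheduled job would pass `datetime.now(timezone.utc)`.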

    Cost Optimisation

    - Always enable auto-termination (15-30 minutes) on all-purpose clusters.

    - Use autoscaling to avoid paying for peak capacity when load is low.

    - Choose Spot/Preemptible instances for worker nodes on fault-tolerant workloads.

    - Use instance pools to trade a small idle-instance cost for faster starts.

    - Prefer job clusters over all-purpose clusters for scheduled workloads.
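
Two of the options above, Spot workers and instance pools, are set on the cluster specification itself. The sketch below shows one plausible shape for each on AWS; the helper names are hypothetical, the `aws_attributes` values are assumptions about a sensible configuration rather than recommendations from this tutorial, and `pool-123` style IDs are placeholders. Azure and GCP use different attribute blocks, which are not covered here.

```python
def with_spot_workers(spec: dict, first_on_demand: int = 1) -> dict:
    """Return a copy of spec whose workers use Spot with on-demand fallback (AWS)."""
    out = dict(spec)
    out["aws_attributes"] = {
        # The first N nodes (the driver comes first) stay on-demand.
        "first_on_demand": first_on_demand,
        "availability": "SPOT_WITH_FALLBACK",
        "spot_bid_price_percent": 100,
    }
    return out


def with_instance_pool(spec: dict, pool_id: str) -> dict:
    """Return a copy of spec that draws nodes from a pool instead of naming a VM type."""
    # The pool defines the instance type, so node type fields are dropped.
    out = {k: v for k, v in spec.items()
           if k not in ("node_type_id", "driver_node_type_id")}
    out["instance_pool_id"] = pool_id
    return out
```

Both helpers return copies so that a base specification can be reused across environments.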

    Security and Governance

    - Classic clusters run in your VPC, enabling private networking, firewall rules, and VPN/Private Link.

    - Use Unity Catalog privileges (grants on catalogs, schemas, and tables) to control data access rather than instance profiles where possible.

    - Restrict cluster configuration via cluster policies to prevent disabling security features.

    - Use init scripts from secure, audited locations (workspace files or Unity Catalog volumes).
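
To make the cluster-policy point concrete, the sketch below pairs a small policy definition with a local checker. The policy dictionary follows the general shape of Databricks policy definitions (`range`, `allowlist`, and `forbidden` rule types), but the specific limits are assumptions. The `violations` function is a hypothetical, simplified illustration for reviewing specs offline; it is not how Databricks evaluates policies and it ignores defaults and nested attribute paths.

```python
# Assumed limits for illustration only.
POLICY = {
    "autotermination_minutes": {
        "type": "range", "minValue": 10, "maxValue": 60, "defaultValue": 30,
    },
    "node_type_id": {"type": "allowlist", "values": ["m5d.xlarge", "m5d.2xlarge"]},
    "num_workers": {"type": "forbidden"},  # forces autoscale instead of fixed size
}


def violations(policy: dict, spec: dict) -> list[str]:
    """Return human-readable problems for top-level attributes in spec."""
    problems: list[str] = []
    for attr, rule in policy.items():
        kind = rule["type"]
        if kind == "forbidden":
            if attr in spec:
                problems.append(f"{attr} is not allowed")
            continue
        if attr not in spec:
            continue  # a real policy would apply its default here
        value = spec[attr]
        if kind == "allowlist" and value not in rule["values"]:
            problems.append(f"{attr}={value!r} is not in the allowlist")
        elif kind == "range":
            low = rule.get("minValue", float("-inf"))
            high = rule.get("maxValue", float("inf"))
            if not low <= value <= high:
                problems.append(f"{attr}={value} is outside {low}..{high}")
    return problems
```

A check like this could run in code review, with the policy itself remaining the enforcement point in the workspace.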

    Common Pitfalls and Recommended Patterns

    • Setting num_workers instead of autoscale: fixed sizing wastes resources during low-load periods.
    • Using the latest non-LTS runtime: LTS versions receive patches longer and are more stable for production.
    • Forgetting autotermination_minutes: clusters left running overnight are the top cause of cost overruns.
    • Over-sizing the driver node: for most workloads the driver can be the same size as worker nodes.
    • Installing packages via init scripts when %pip install suffices: init scripts add start time and complexity.
    • Not tagging clusters: tags enable cost allocation and chargeback to teams.
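
Several of these pitfalls can be caught before a cluster is created by linting the specification. The following is a rough sketch: `lint_cluster_spec` is a hypothetical helper, the field names are assumed to match the Clusters API, and the messages are illustrative. It does not check for LTS runtimes, because that cannot be inferred reliably from the version key alone.

```python
def lint_cluster_spec(spec: dict) -> list[str]:
    """Return warnings for common sizing, cost, and tagging pitfalls."""
    warnings: list[str] = []
    if "autoscale" not in spec and spec.get("num_workers", 0) > 0:
        warnings.append("fixed num_workers set; consider autoscale")
    if not spec.get("autotermination_minutes"):
        warnings.append("autotermination_minutes missing or 0")
    if not spec.get("custom_tags"):
        warnings.append("no custom_tags; cost allocation will be harder")
    driver = spec.get("driver_node_type_id")
    if driver and driver != spec.get("node_type_id"):
        warnings.append("driver type differs from workers; confirm it is needed")
    return warnings
```

For example, `lint_cluster_spec({"num_workers": 4})` reports the fixed sizing, the missing auto-termination, and the missing tags.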

    Frequently Asked Questions

    How long does a classic cluster take to start?

    Typically 3-7 minutes, depending on the cloud provider and instance availability. Starting from an instance pool with idle instances can cut this to under a minute.

    Can I use Spot instances for workers?

    Yes. Databricks supports Spot (AWS), Low-Priority (Azure), and Preemptible (GCP) instances for worker nodes. By default the driver runs on an on-demand instance, so a reclaimed Spot node does not take the whole job down with it.

    What happens if a worker node is lost?

    Spark automatically reschedules failed tasks on remaining workers. If autoscaling is enabled, Databricks replaces the lost node. For Spot instances, Databricks handles preemption transparently.

    Should I use the same instance type for driver and workers?

    For most workloads, yes. Use a larger driver only when the driver collects large result sets or coordinates thousands of tasks. Use driver_node_type_id to set it independently.