What is Databricks Compute?

Databricks Compute refers to the managed processing resources — clusters, SQL warehouses, and serverless endpoints — that execute all workloads on the platform. The compute layer is fully separated from storage, allowing you to scale processing up or down independently without moving data. Understanding the compute types and their trade-offs is the first step to running efficient, cost-effective workloads.

  • Understand the separation of compute and storage in Databricks
  • Learn the three primary compute types and when each applies
  • See how the control plane and compute plane interact

Who this is for: Engineers, analysts, and administrators who are new to Databricks and need a mental model of how compute works.

Part of the Databricks Compute section of the Databricks tutorial series.

Architecture / Concept Overview: What is Databricks Compute?

Databricks uses a two-plane architecture. The control plane, hosted by Databricks, manages cluster lifecycle, notebook state, job scheduling, and access control. The compute plane, which runs either in your cloud account (classic) or in a Databricks-managed account (serverless), is where Spark drivers and executors actually process data. In classic mode, processing therefore happens inside your own cloud account; serverless trades that control for start times measured in seconds and no infrastructure to manage.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    User[User Request]:::source --> CP[Control Plane]:::governance
    CP -->|provisions| Classic[Classic Compute Plane]:::processing
    CP -->|provisions| Serverless[Serverless Compute Plane]:::serving
    Classic --> ObjStore[(Your Cloud Storage)]:::storage
    Serverless --> ObjStore

*The control plane orchestrates provisioning while classic and serverless compute planes read and write to your cloud storage.*

Each compute type serves a different access pattern. Interactive development uses all-purpose clusters, automated pipelines use job clusters, and SQL analytics uses SQL warehouses.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Dev[Developer]:::source --> APC[All-Purpose Cluster]:::processing
    Sched[Scheduler]:::source --> JC[Job Cluster]:::processing
    Analyst[Analyst / BI Tool]:::source --> SQL[SQL Warehouse]:::serving
    APC --> Delta[(Delta Lake)]:::storage
    JC --> Delta
    SQL --> Delta

*Three compute types serve three access patterns, all reading from the same Delta Lake storage layer.*
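
The mapping from access pattern to compute type can be written down as a small lookup. This is only an illustrative sketch: the `Workload` enum and the `choose_compute` helper are hypothetical names, not part of any Databricks API, and real decisions usually weigh more than one factor.

```python
from enum import Enum


class Workload(Enum):
    """Access patterns named in the text (hypothetical enum for illustration)."""
    INTERACTIVE = "interactive"    # notebook exploration and development
    SCHEDULED_JOB = "scheduled"    # automated pipelines
    SQL_ANALYTICS = "sql"          # dashboards, BI tools, ad-hoc SQL


# Mapping taken directly from the access patterns described above.
COMPUTE_FOR_WORKLOAD = {
    Workload.INTERACTIVE: "all-purpose cluster",
    Workload.SCHEDULED_JOB: "job cluster",
    Workload.SQL_ANALYTICS: "SQL warehouse",
}


def choose_compute(workload: Workload) -> str:
    """Return the compute type the text recommends for a workload."""
    return COMPUTE_FOR_WORKLOAD[workload]
```

For example, `choose_compute(Workload.SCHEDULED_JOB)` returns `"job cluster"`.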

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    subgraph Classic
        Driver[Driver Node]:::processing --> Worker1[Worker 1]:::processing
        Driver --> Worker2[Worker 2]:::processing
        Driver --> WorkerN[Worker N]:::processing
    end
    subgraph Serverless
        SDriver[Managed Driver]:::serving --> SWorker[Managed Workers]:::serving
    end
    UC[Unity Catalog]:::governance -.->|governs| Classic
    UC -.->|governs| Serverless

*Both classic and serverless clusters run driver-worker architectures, governed by Unity Catalog.*

Key Terms

Control Plane
The Databricks-hosted layer that manages cluster lifecycle, notebook state, job scheduling, and user authentication.
Compute Plane
The layer where Spark drivers and executors run, either in your cloud account (classic) or Databricks-managed (serverless).
All-Purpose Cluster
A long-lived interactive cluster designed for notebook exploration and development by multiple users.
Job Cluster
An ephemeral cluster created for a single automated job run and terminated immediately afterward.
SQL Warehouse
A managed SQL-optimised compute endpoint for running queries, powering dashboards, and connecting BI tools.
Serverless
Compute infrastructure fully managed by Databricks, with startup in seconds and no cluster configuration.

Prerequisites and Setup

  • A Databricks workspace on AWS, Azure, or GCP
  • Basic understanding of distributed computing concepts (driver, worker, executor)
  • Workspace admin role or cluster-create entitlement
  • Unity Catalog enabled for governance integration

Step-by-Step Implementation

    Configuration Reference

    Configuration options by compute type
    Parameter          | All-Purpose Cluster | Job Cluster          | SQL Warehouse
    -------------------|---------------------|----------------------|-----------------------
    Lifecycle          | Manual start/stop   | Auto-created per run | Auto-start on query
    Multi-user         | Yes                 | No (single job)      | Yes
    Auto-termination   | Configurable        | Automatic            | Auto-stop configurable
    Autoscaling        | Min/max workers     | Min/max workers      | T-shirt sizing
    Photon support     | Yes                 | Yes                  | Always on
    Serverless option  | Yes                 | Yes                  | Yes
    Typical start time | 3-7 min (classic)   | 3-7 min (classic)    | Seconds (serverless)
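
As a rough illustration of how the autoscaling, auto-termination, and Photon rows might translate into a classic cluster definition, the sketch below builds a spec as a plain Python dictionary. The field names follow the Databricks Clusters REST API as commonly documented, but treat them as assumptions to verify against your workspace; the helper name `autoscaling_cluster_spec` and the runtime and node-type values are placeholders.

```python
from typing import Any


def autoscaling_cluster_spec(
    name: str,
    min_workers: int,
    max_workers: int,
    autotermination_minutes: int = 30,
    photon: bool = True,
) -> dict[str, Any]:
    """Build a cluster spec dict with autoscaling and auto-termination set."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    if autotermination_minutes <= 0:
        raise ValueError("auto-termination must be enabled (> 0 minutes)")
    return {
        "cluster_name": name,
        # Placeholder values: pick a runtime and node type your workspace offers.
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "m5d.xlarge",
        # Min/max workers instead of a fixed cluster size.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": autotermination_minutes,
        "runtime_engine": "PHOTON" if photon else "STANDARD",
    }
```

The function only assembles and validates the dictionary; submitting it to a workspace (through the REST API, SDK, or CLI) is left out on purpose.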

    Monitoring, Cost, and Security Considerations

    Monitoring

    Query the system.compute.clusters and system.billing.usage system tables to track which clusters are running, how many DBUs they consume, and whether they are idle. Set up SQL alerts to notify you when a cluster has been idle for longer than its auto-termination window.
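
One way to start on the DBU side of this is a query that ranks clusters by consumption. The sketch below only builds the SQL text; the helper name `dbu_by_cluster_query` is hypothetical, and the column names (`usage_metadata.cluster_id`, `sku_name`, `usage_quantity`, `usage_date`) are assumed from the documented schema of `system.billing.usage`, so check them against your workspace before relying on the result.

```python
def dbu_by_cluster_query(days: int = 7, limit: int = 20) -> str:
    """Return SQL that ranks clusters by DBUs consumed over the last `days` days."""
    if days < 1 or limit < 1:
        raise ValueError("days and limit must be positive integers")
    return f"""
SELECT
  usage_metadata.cluster_id AS cluster_id,
  sku_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {int(days)})
  AND usage_metadata.cluster_id IS NOT NULL
GROUP BY usage_metadata.cluster_id, sku_name
ORDER BY dbus DESC
LIMIT {int(limit)}
""".strip()
```

In a Databricks notebook, where a `spark` session is already available, the string could be passed to `spark.sql(...)` and the result displayed or saved as the basis for an alert.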

    Cost Optimisation

    - Right-size clusters: start with small instances and autoscaling rather than large fixed clusters.

    - Prefer serverless for bursty or short-lived workloads to avoid paying for idle time.

    - Use job clusters instead of all-purpose clusters for production to guarantee auto-termination.

    - Monitor DBU consumption weekly and review the top-consuming clusters.
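
To make the cost of idle time concrete, here is a small worked calculation. All rates are invented for illustration and are not Databricks or cloud list prices; the function name `idle_cost` is likewise hypothetical.

```python
def idle_cost(
    idle_hours: float,
    dbus_per_hour: float,
    price_per_dbu: float,
    vm_cost_per_hour: float,
) -> float:
    """Cost of keeping a classic cluster running while it does no work."""
    # A running classic cluster accrues both DBU charges and cloud VM charges.
    return idle_hours * (dbus_per_hour * price_per_dbu + vm_cost_per_hour)


# Hypothetical numbers: 6 idle hours a day, 4 DBU/hour, $0.50 per DBU,
# and $1.00/hour of VM cost.
daily = idle_cost(idle_hours=6, dbus_per_hour=4, price_per_dbu=0.50, vm_cost_per_hour=1.00)
print(f"idle cost per day: ${daily:.2f}")  # prints "idle cost per day: $18.00"
```

Under these assumed rates, an all-purpose cluster left running between manual runs costs $18 a day for no work, which is the waste that job clusters and auto-termination are meant to remove.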

    Security and Governance

    - Classic compute runs in your own cloud network (a VPC on AWS or GCP, a VNet on Azure), giving you full network control.

    - Serverless compute uses encryption in transit and at rest with Databricks-managed keys or customer-managed keys.

    - Unity Catalog enforces data access policies regardless of which compute type runs the query.

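    - For user-level isolation on shared clusters, use Unity Catalog shared access mode, which Lakeguard enforces; credential passthrough is a legacy option that Databricks has deprecated in favour of Unity Catalog.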

    Common Pitfalls and Recommended Patterns

    • Confusing all-purpose and job clusters: all-purpose is for development, job clusters are for production automation.
    • Running production ETL on all-purpose clusters: this wastes DBUs on idle time between manual runs.
    • Over-provisioning workers: use autoscaling instead of fixed large clusters.
    • Ignoring auto-termination: every interactive cluster should terminate after 15-30 minutes of inactivity.
    • Assuming serverless works for all cases: custom networking and GPU workloads still require classic compute.
    • Skipping system table monitoring: you cannot optimise costs you do not measure.
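
Two of these pitfalls (missing auto-termination and fixed-size clusters) can be checked mechanically. The sketch below audits a cluster configuration held as a dictionary. It assumes the `autotermination_minutes` and `autoscale` keys used by the Clusters REST API, with `0` or a missing value meaning auto-termination is off; the function name `audit_cluster` and the 30-minute default are illustrative choices.

```python
from typing import Any


def audit_cluster(cluster: dict[str, Any], max_idle_minutes: int = 30) -> list[str]:
    """Return a list of warnings for one cluster configuration dict."""
    warnings: list[str] = []
    # Assumption: 0 or a missing value means auto-termination is disabled.
    idle = cluster.get("autotermination_minutes", 0)
    if idle == 0:
        warnings.append("auto-termination is disabled")
    elif idle > max_idle_minutes:
        warnings.append(f"auto-termination is {idle} min, above {max_idle_minutes}")
    # A spec without an autoscale block has a fixed number of workers.
    if "autoscale" not in cluster:
        warnings.append("fixed-size cluster; consider autoscaling")
    return warnings
```

For instance, `audit_cluster({"num_workers": 8})` returns both warnings, while a spec with `autotermination_minutes` of 20 and an `autoscale` block returns an empty list.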

    Frequently Asked Questions

    What is the difference between compute and a cluster?

    Compute is the umbrella term for all processing resources on Databricks (clusters, SQL warehouses, serverless endpoints). A cluster is one specific type of compute that runs the Apache Spark engine.

    Do I pay for compute when it is idle?

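    For classic clusters, yes: you pay cloud VM costs and DBUs while the cluster is running, even if idle. For serverless, idle cost is much smaller because Databricks releases the resources shortly after work stops; a serverless SQL warehouse, for example, keeps billing only until its auto-stop window elapses.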

    Can I use both classic and serverless in the same workspace?

    Yes. Most workspaces use a mix: serverless SQL warehouses for analysts, classic clusters for workloads needing custom networking or GPUs, and serverless notebooks for ad-hoc development.

    How do I choose the right instance type?

    Start with a general-purpose instance (e.g., m5d.xlarge on AWS) and monitor CPU and memory utilisation. Switch to memory-optimised for caching-heavy workloads or compute-optimised for CPU-bound transformations.
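
The rule of thumb above can be expressed as a simple heuristic. This is a sketch only: the 80% threshold and the function name `suggest_instance_family` are assumptions chosen for illustration, and real sizing should also look at spill, shuffle, and I/O behaviour.

```python
HIGH_UTILISATION_PCT = 80.0  # assumed threshold, tune to your own workloads


def suggest_instance_family(avg_cpu_pct: float, avg_mem_pct: float) -> str:
    """Suggest an instance family from average CPU and memory utilisation."""
    if not (0 <= avg_cpu_pct <= 100 and 0 <= avg_mem_pct <= 100):
        raise ValueError("utilisation must be a percentage between 0 and 100")
    # Memory pressure wins ties, since caching-heavy jobs degrade sharply when short of RAM.
    if avg_mem_pct >= HIGH_UTILISATION_PCT and avg_mem_pct >= avg_cpu_pct:
        return "memory-optimised"
    if avg_cpu_pct >= HIGH_UTILISATION_PCT:
        return "compute-optimised"
    return "general-purpose"
```

With these assumptions, a workload averaging 90% CPU and 40% memory maps to "compute-optimised", and one averaging 50% on both stays on "general-purpose".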