What is Databricks Compute?

Databricks Compute refers to the managed processing resources — clusters, SQL warehouses, and serverless endpoints — that execute all workloads on the platform. The compute layer is fully separated from storage, allowing you to scale processing up or down independently without moving data. Understanding the compute types and their trade-offs is the first step to running efficient, cost-effective workloads.

Understand the separation of compute and storage in Databricks
Learn the three primary compute types and when each applies
See how the control plane and compute plane interact

Who this is for: Engineers, analysts, and administrators who are new to Databricks and need a mental model of how compute works.

Part of the Databricks Compute section of the Databricks tutorial series.

Architecture / Concept Overview: What is Databricks Compute?

Databricks uses a two-plane architecture. The control plane, hosted by Databricks, manages cluster lifecycle, notebook state, job scheduling, and access control. The compute plane, which runs either in your cloud account (classic) or in a Databricks-managed account (serverless), is where Spark drivers and executors actually process data. In classic mode, data processing therefore stays inside your own cloud account and network, while serverless trades that control for near-instant start times and no infrastructure to manage.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    User[User Request]:::source --> CP[Control Plane]:::governance
    CP -->|provisions| Classic[Classic Compute Plane]:::processing
    CP -->|provisions| Serverless[Serverless Compute Plane]:::serving
    Classic --> ObjStore[(Your Cloud Storage)]:::storage
    Serverless --> ObjStore

*The control plane orchestrates provisioning while classic and serverless compute planes read and write to your cloud storage.*

Each compute type serves a different access pattern. Interactive development uses all-purpose clusters, automated pipelines use job clusters, and SQL analytics uses SQL warehouses.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Dev[Developer]:::source --> APC[All-Purpose Cluster]:::processing
    Sched[Scheduler]:::source --> JC[Job Cluster]:::processing
    Analyst[Analyst / BI Tool]:::source --> SQL[SQL Warehouse]:::serving
    APC --> Delta[(Delta Lake)]:::storage
    JC --> Delta
    SQL --> Delta

*Three compute types serve three access patterns, all reading from the same Delta Lake storage layer.*
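
The mapping from access pattern to compute type described above can be written down as a small lookup. This is only an illustrative sketch: the pattern labels (`interactive`, `scheduled`, `sql_analytics`) and the `choose_compute` helper are hypothetical names invented for this example, not part of any Databricks API.

```python
from enum import Enum


class ComputeType(Enum):
    ALL_PURPOSE_CLUSTER = "all-purpose cluster"
    JOB_CLUSTER = "job cluster"
    SQL_WAREHOUSE = "SQL warehouse"


# Hypothetical workload labels, used only for this illustration.
_PATTERN_TO_COMPUTE = {
    "interactive": ComputeType.ALL_PURPOSE_CLUSTER,   # notebook development
    "scheduled": ComputeType.JOB_CLUSTER,             # automated pipelines
    "sql_analytics": ComputeType.SQL_WAREHOUSE,       # dashboards and BI tools
}


def choose_compute(access_pattern: str) -> ComputeType:
    """Return the compute type the text associates with an access pattern."""
    try:
        return _PATTERN_TO_COMPUTE[access_pattern]
    except KeyError:
        raise ValueError(f"unknown access pattern: {access_pattern!r}") from None


if __name__ == "__main__":
    for pattern in _PATTERN_TO_COMPUTE:
        print(pattern, "->", choose_compute(pattern).value)
```

Real workloads rarely fall cleanly into one bucket, so treat the lookup as a starting point for the decision rather than a rule.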

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    subgraph Classic
        Driver[Driver Node]:::processing --> Worker1[Worker 1]:::processing
        Driver --> Worker2[Worker 2]:::processing
        Driver --> WorkerN[Worker N]:::processing
    end
    subgraph Serverless
        SDriver[Managed Driver]:::serving --> SWorker[Managed Workers]:::serving
    end
    UC[Unity Catalog]:::governance -.->|governs| Classic
    UC -.->|governs| Serverless

*Both classic and serverless clusters run driver-worker architectures, governed by Unity Catalog.*

Key Terms

Control Plane: The Databricks-hosted layer that manages cluster lifecycle, notebook state, job scheduling, and user authentication.
Compute Plane: The layer where Spark drivers and executors run, either in your cloud account (classic) or Databricks-managed (serverless).
All-Purpose Cluster: A long-lived interactive cluster designed for notebook exploration and development by multiple users.
Job Cluster: An ephemeral cluster created for a single automated job run and terminated immediately afterward.
SQL Warehouse: A managed SQL-optimised compute endpoint for running queries, powering dashboards, and connecting BI tools.
Serverless: Compute infrastructure fully managed by Databricks with instant startup and no cluster configuration.

Prerequisites and Setup

A Databricks workspace on AWS, Azure, or GCP
Basic understanding of distributed computing concepts (driver, worker, executor)
Workspace admin role or cluster-create entitlement
Unity Catalog enabled for governance integration

Step-by-Step Implementation

Configuration Reference

Databricks compute configuration options

| Parameter | All-Purpose Cluster | Job Cluster | SQL Warehouse |
| --- | --- | --- | --- |
| Lifecycle | Manual start/stop | Auto-created per run | Auto-start on query |
| Multi-user | Yes | No (single job) | Yes |
| Auto-termination | Configurable | Automatic | Auto-stop configurable |
| Autoscaling | Min/max workers | Min/max workers | T-shirt sizing |
| Photon support | Yes | Yes | Always on |
| Serverless option | Yes | Yes | Yes |
| Typical start time | 3-7 min (classic) | 3-7 min (classic) | Seconds (serverless) |
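
To make the all-purpose column concrete, here is a minimal sketch of a cluster specification expressed as a Python dictionary. The field names follow the Databricks Clusters API request body as commonly documented (`autoscale`, `autotermination_minutes`, `runtime_engine`), but verify them against the API reference for your workspace. The helper name `all_purpose_cluster_spec`, the default values, and the `spark_version` string are assumptions chosen for illustration.

```python
import json


def all_purpose_cluster_spec(name, min_workers=1, max_workers=4,
                             idle_minutes=30, photon=True):
    """Build an illustrative classic all-purpose cluster spec."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    if idle_minutes <= 0:
        raise ValueError("idle_minutes must be positive")
    return {
        "cluster_name": name,
        # Placeholder runtime version; pick one your workspace offers.
        "spark_version": "15.4.x-scala2.12",
        # General-purpose AWS instance used as the example in the FAQ.
        "node_type_id": "m5d.xlarge",
        # Autoscaling: min/max workers instead of a fixed size.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Auto-termination: stop after this many idle minutes.
        "autotermination_minutes": idle_minutes,
        "runtime_engine": "PHOTON" if photon else "STANDARD",
    }


if __name__ == "__main__":
    print(json.dumps(all_purpose_cluster_spec("dev-sandbox"), indent=2))
```

The resulting JSON is the kind of payload you would review before submitting it through the UI, CLI, or REST API; the sketch itself makes no network calls.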

Monitoring, Cost, and Security Considerations

Monitoring

Query the system.compute.clusters and system.billing.usage system tables to track which clusters exist, how many DBUs they consume, and whether they sit idle. Set up SQL alerts to notify you when a cluster has been idle for longer than its auto-termination window.
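
As a starting point, the sketch below builds a SQL statement that ranks clusters by DBU consumption over a recent window. The column names (`usage_date`, `usage_quantity`, `sku_name`, `usage_metadata.cluster_id`) are assumed from the documented schema of system.billing.usage and should be checked against your workspace. The helper `dbu_by_cluster_query` is a hypothetical name for this example.

```python
def dbu_by_cluster_query(days: int = 7, limit: int = 10) -> str:
    """Return SQL that lists the top DBU-consuming clusters for the last `days` days."""
    if days <= 0 or limit <= 0:
        raise ValueError("days and limit must be positive")
    # int() guards against accidental string interpolation into the SQL text.
    return f"""
SELECT
  usage_metadata.cluster_id AS cluster_id,
  sku_name,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {int(days)})
  AND usage_metadata.cluster_id IS NOT NULL
GROUP BY usage_metadata.cluster_id, sku_name
ORDER BY dbus DESC
LIMIT {int(limit)}
""".strip()


if __name__ == "__main__":
    print(dbu_by_cluster_query(days=7, limit=5))
```

In a notebook you could pass the returned string to `spark.sql(...)`, or paste it into the SQL editor and attach an alert to the result.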

Cost Optimisation

- Right-size clusters: start with small instances and autoscaling rather than large fixed clusters.

- Prefer serverless for bursty or short-lived workloads to avoid paying for idle time.

- Use job clusters instead of all-purpose clusters for production to guarantee auto-termination.

- Monitor DBU consumption weekly and review the top-consuming clusters.
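
A rough calculation shows why idle time matters on classic compute. The sketch assumes a simple cost model in which an idle cluster keeps accruing both DBU charges and cloud VM charges at a constant hourly rate. Every number in the example is invented for illustration and is not a real price.

```python
def idle_cost_per_day(idle_hours, dbu_per_hour, dbu_price, vm_price_per_hour):
    """Cost of keeping a classic cluster running but idle for `idle_hours`.

    Assumed model: hourly cost = DBUs/hour * price per DBU + VM price/hour.
    """
    if min(idle_hours, dbu_per_hour, dbu_price, vm_price_per_hour) < 0:
        raise ValueError("inputs must be non-negative")
    hourly_cost = dbu_per_hour * dbu_price + vm_price_per_hour
    return idle_hours * hourly_cost


if __name__ == "__main__":
    # Made-up figures: 6 idle hours a day, 4 DBU/hour, 0.50 per DBU, 1.00/hour for VMs.
    daily = idle_cost_per_day(6, 4, 0.5, 1.0)
    print(f"idle cost per day: {daily:.2f}")
    print(f"idle cost over 22 working days: {daily * 22:.2f}")
```

Plugging in your own rates and measured idle hours gives a quick estimate of what tighter auto-termination or a move to job clusters might save.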

Security and Governance

- Classic compute runs in your own cloud virtual network (a VPC on AWS or GCP, a VNet on Azure), giving you full network control.

- Serverless compute uses encryption in transit and at rest with Databricks-managed keys or customer-managed keys.

- Unity Catalog enforces data access policies regardless of which compute type runs the query.

- For user-level isolation on shared clusters, rely on Unity Catalog with Lakeguard; credential passthrough is a legacy option that predates Unity Catalog.

Common Pitfalls and Recommended Patterns

Confusing all-purpose and job clusters: all-purpose is for development, job clusters are for production automation.
Running production ETL on all-purpose clusters: this wastes DBUs on idle time between manual runs.
Over-provisioning workers: use autoscaling instead of fixed large clusters.
Ignoring auto-termination: every interactive cluster should terminate after 15-30 minutes of inactivity.
Assuming serverless works for all cases: workloads that need custom networking or specific GPU instances may still require classic compute.
Skipping system table monitoring: you cannot optimise costs you do not measure.
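
Several of these pitfalls can be caught with a simple check over cluster definitions. The sketch below inspects a cluster spec dictionary using the Clusters API field names `autotermination_minutes`, `autoscale`, and `num_workers`. The `lint_cluster` helper and its thresholds (30 idle minutes, 8 fixed workers) are hypothetical choices, and treating a missing `autotermination_minutes` as disabled is an assumption of this example.

```python
from typing import List


def lint_cluster(spec: dict, max_idle_minutes: int = 30,
                 max_fixed_workers: int = 8) -> List[str]:
    """Return warnings for auto-termination and over-provisioning pitfalls."""
    warnings = []

    # Assumption: a missing or zero value means auto-termination is off.
    idle = spec.get("autotermination_minutes", 0)
    if idle == 0:
        warnings.append("auto-termination is disabled")
    elif idle > max_idle_minutes:
        warnings.append(
            f"auto-termination of {idle} min exceeds {max_idle_minutes} min"
        )

    # A large fixed-size cluster with no autoscale block suggests over-provisioning.
    if "autoscale" not in spec and spec.get("num_workers", 0) > max_fixed_workers:
        warnings.append("large fixed-size cluster; consider autoscaling")

    return warnings


if __name__ == "__main__":
    risky = {"cluster_name": "legacy-etl", "num_workers": 16,
             "autotermination_minutes": 0}
    for message in lint_cluster(risky):
        print("WARN:", message)
```

A check like this could run over exported cluster definitions as part of a periodic review alongside the system table queries.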

Frequently Asked Questions

What is the difference between compute and a cluster?

Compute is the umbrella term for all processing resources on Databricks (clusters, SQL warehouses, serverless endpoints). A cluster is one specific type of compute that runs the Apache Spark engine.

Do I pay for compute when it is idle?

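For classic clusters, yes: you pay cloud VM costs and DBUs while the cluster is running, even if it is idle. For serverless, you are billed only while the compute is active, so idle cost ends shortly after the last query or command completes (for a serverless SQL warehouse, once its auto-stop timeout elapses).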

Can I use both classic and serverless in the same workspace?

Yes. Most workspaces use a mix: serverless SQL warehouses for analysts, classic clusters for workloads needing custom networking or GPUs, and serverless notebooks for ad-hoc development.

How do I choose the right instance type?

Start with a general-purpose instance (e.g., m5d.xlarge on AWS) and monitor CPU and memory utilisation. Switch to memory-optimised for caching-heavy workloads or compute-optimised for CPU-bound transformations.
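
The advice above can be turned into a rough decision rule. This is a heuristic sketch only: the `suggest_instance_family` helper and the 80% threshold are hypothetical, and sustained utilisation is used here as a stand-in for "caching-heavy" or "CPU-bound", which real workloads may not match.

```python
def suggest_instance_family(avg_cpu_pct: float, avg_mem_pct: float,
                            high: float = 80.0) -> str:
    """Suggest an instance family from average utilisation percentages."""
    for value in (avg_cpu_pct, avg_mem_pct):
        if not 0 <= value <= 100:
            raise ValueError("utilisation must be between 0 and 100")

    # Memory pressure dominates: lean towards memory-optimised instances.
    if avg_mem_pct >= high and avg_mem_pct >= avg_cpu_pct:
        return "memory-optimised"
    # CPU is the bottleneck: lean towards compute-optimised instances.
    if avg_cpu_pct >= high:
        return "compute-optimised"
    # Neither resource is saturated: stay on general-purpose.
    return "general-purpose"


if __name__ == "__main__":
    print(suggest_instance_family(avg_cpu_pct=45, avg_mem_pct=88))
    print(suggest_instance_family(avg_cpu_pct=92, avg_mem_pct=40))
    print(suggest_instance_family(avg_cpu_pct=50, avg_mem_pct=50))
```

Use the output as a prompt for a closer look at the cluster metrics, not as a final sizing decision.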