Databricks Compute
Databricks Compute is the collection of processing resources — clusters, SQL warehouses, serverless endpoints, and instance pools — that execute your workloads on the Databricks platform. Choosing the right compute type and configuration directly controls performance, cost, and security for every job, query, and notebook you run. This page is the pillar overview for the entire Compute section.
- Understand the compute options available on Databricks and when to use each
- Learn how serverless, classic, and SQL warehouse compute differ
- Navigate to focused tutorials on clusters, policies, Photon, autoscaling, and cost management
Who this is for: Data engineers, platform administrators, and analysts who provision or consume compute resources on Databricks.
Architecture / Concept Overview: Databricks Compute
Databricks separates storage from compute so you can scale processing independently of your data. The control plane manages cluster lifecycle, scheduling, and access control, while the compute plane runs in your cloud account (classic) or in a Databricks-managed account (serverless). Every workload — ETL pipelines, ad-hoc notebooks, SQL dashboards, or ML training — ultimately runs on one of these compute surfaces.
*The control plane orchestrates both classic and serverless compute, each reading and writing to Delta Lake storage.*
Different workloads map to different compute types. Interactive exploration uses all-purpose clusters, production pipelines use job clusters, and BI queries use SQL warehouses.
*Users choose all-purpose clusters for exploration, job clusters for production, and SQL warehouses for BI and SQL analytics.*
*Cluster policies govern what users can provision, instance pools speed up start times, and Photon accelerates queries.*
Key Terms
- All-Purpose Cluster: An interactive cluster shared by multiple users for notebooks and exploratory work.
- Job Cluster: A short-lived cluster that Databricks creates for a single job run and terminates automatically afterward.
- SQL Warehouse: A managed compute endpoint optimised for SQL queries, dashboards, and BI tool connections.
- Serverless Compute: Databricks-managed infrastructure that starts in seconds with no cluster configuration required.
- Instance Pool: A set of pre-provisioned cloud VMs that reduce cluster start-up time by keeping idle instances warm.
- Photon: A C++-based vectorised query engine that accelerates SQL and DataFrame workloads on Databricks.
- Cluster Policy: A set of rules that constrains cluster configuration options available to users.
- DBU (Databricks Unit): A normalised unit of compute consumption used for billing.
Prerequisites and Setup
- A Databricks workspace on AWS, Azure, or GCP
- Workspace admin or cluster-create permissions
- Unity Catalog enabled for governance features
- Familiarity with Spark or SQL basics
Step-by-Step Implementation
Configuration Reference
| Compute Type | Best For | Start Time | Billing Model |
|---|---|---|---|
| All-purpose cluster | Interactive notebooks, exploration | 3-7 min (classic) | DBU per hour while running |
| Job cluster | Scheduled ETL and ML training | 3-7 min (classic) | DBU per hour, auto-terminates |
| Serverless compute | Notebooks and jobs with instant start | Seconds | DBU per hour, pay only while active |
| SQL warehouse (serverless) | SQL analytics and BI dashboards | Seconds | DBU per hour with auto-stop |
| SQL warehouse (classic) | SQL with custom networking | 3-7 min | DBU per hour with auto-stop |
| Instance pool | Speeding up cluster starts | Seconds (from pool) | Cloud VM cost while idle |
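To make the "DBU per hour" billing model in the table concrete, the sketch below multiplies DBUs per hour by running hours and a price per DBU. The class name, function name, and all numbers are hypothetical placeholders rather than published rates, and the sketch ignores the separate cloud VM charges that classic compute also incurs.

```python
from dataclasses import dataclass


@dataclass
class ComputeUsage:
    """One compute resource's usage over a billing window (all values hypothetical)."""
    dbu_per_hour: float   # DBUs the resource consumes per running hour
    hours_running: float  # hours the resource was up, idle time included
    price_per_dbu: float  # USD per DBU; placeholder, not a published rate


def estimate_dbu_cost(usage: ComputeUsage) -> float:
    """Return the DBU charge in USD: DBUs/hour x hours x price per DBU."""
    return usage.dbu_per_hour * usage.hours_running * usage.price_per_dbu


# Same cluster, two scenarios: left running for 10 hours overnight versus
# 2 hours of work plus a 30-minute auto-termination window.
left_running = ComputeUsage(dbu_per_hour=8.0, hours_running=10.0, price_per_dbu=0.5)
auto_terminated = ComputeUsage(dbu_per_hour=8.0, hours_running=2.5, price_per_dbu=0.5)

print(estimate_dbu_cost(left_running))     # 40.0
print(estimate_dbu_cost(auto_terminated))  # 10.0
```

With these made-up inputs, the idle hours account for three quarters of the DBU charge, which is why the billing column distinguishes resources that auto-terminate or auto-stop.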
Monitoring, Cost, and Security Considerations
Monitoring
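Use system tables to track usage across all compute types: system.billing.usage records DBU consumption and system.compute.clusters records cluster configurations, while related tables such as system.compute.node_timeline and system.query.history cover node utilisation and query runtimes. Set up alerts for idle clusters and unexpectedly long-running jobs.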
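As a rough illustration of tracking DBU consumption from system tables, the sketch below builds a query that sums daily DBUs per SKU from system.billing.usage. The helper name `dbu_by_sku_query` is hypothetical, and the column names (`sku_name`, `usage_date`, `usage_quantity`, `usage_unit`) are an assumption about the table schema that should be checked against your workspace before use.

```python
def dbu_by_sku_query(days: int = 30) -> str:
    """Build a SQL query that totals DBUs per SKU per day over the last `days` days."""
    if days <= 0:
        raise ValueError("days must be a positive integer")
    # Column names are assumed from the billing usage schema; verify them first.
    return f"""
SELECT sku_name,
       usage_date,
       SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), {int(days)})
  AND usage_unit = 'DBU'
GROUP BY sku_name, usage_date
ORDER BY usage_date, sku_name
"""


query = dbu_by_sku_query(7)
print(query)

# In a Databricks notebook, where a `spark` session is already defined,
# the query could be run with:
#   spark.sql(query).show()
```

Keeping the query in a small builder function makes it easy to reuse the same text in a notebook, a scheduled job, or an alert definition.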
Cost Optimisation
- Enable auto-termination on every all-purpose cluster (30 minutes or less).
- Use serverless compute where possible to eliminate idle costs.
- Apply cluster policies to cap maximum workers and restrict expensive instance types.
- Use instance pools to trade a small idle-VM cost for faster start times, reducing developer wait time.
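The policy-related items above can be sketched as a cluster policy definition, written here as a Python dictionary and serialised to JSON. The attribute paths and rule types (`range`, `allowlist`) follow the cluster policy definition format as best understood here, and the limits and instance type names are hypothetical examples; check the policy reference for your cloud before applying anything like this.

```python
import json

# Hypothetical policy: cap auto-termination at 30 minutes, cap autoscaling
# at 8 workers, and allow only two (example) instance types.
policy_definition = {
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 30,
        "defaultValue": 30,
    },
    "autoscale.max_workers": {
        "type": "range",
        "maxValue": 8,
    },
    "node_type_id": {
        "type": "allowlist",
        "values": ["i3.xlarge", "i3.2xlarge"],  # example AWS types, an assumption
    },
}

# Policies are submitted as a JSON document; this prints the text to paste
# into the policy editor or send through an API client.
policy_json = json.dumps(policy_definition, indent=2)
print(policy_json)
```

Expressing limits as ranges rather than fixed values leaves users room to pick smaller clusters while still bounding the worst-case cost.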
Security and Governance
- Use Unity Catalog to control which data each compute resource can access.
- Run multi-user clusters in shared (standard) access mode with Unity Catalog, where Lakeguard enforces strong user isolation.
- Apply cluster policies so users cannot disable security features or over-provision resources.
- Use private networking (VPC peering or Private Link) for classic compute in regulated environments.
Common Pitfalls and Recommended Patterns
- Leaving all-purpose clusters running overnight: always set auto-termination to 30 minutes or less.
- Using all-purpose clusters for production jobs: use job clusters that auto-terminate after each run.
- Over-sizing clusters: start small with autoscaling and monitor utilisation before scaling up.
- Ignoring cluster policies: without policies, any user with cluster-creation rights can spin up expensive GPU clusters.
- Skipping instance pools: if teams complain about start times, pools are cheaper than always-on clusters.
- Running SQL workloads on Spark clusters: use SQL warehouses for better price-performance on SQL.
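Several of the pitfalls above can be checked mechanically before a cluster is created. The sketch below is a hypothetical pre-flight check, not a Databricks feature: `lint_cluster_spec` and its `cluster_kind` and `workload` arguments are invented for illustration, while the spec keys (`autotermination_minutes`, `autoscale`, `policy_id`) are assumed to mirror the fields of a cluster creation request.

```python
def lint_cluster_spec(spec: dict, cluster_kind: str, workload: str) -> list:
    """Return a list of findings for a cluster spec.

    cluster_kind: "all-purpose" or "job" (hypothetical labels).
    workload: "interactive", "production", or "sql" (hypothetical labels).
    """
    findings = []

    # Pitfall: all-purpose clusters left running with no or long auto-termination.
    minutes = spec.get("autotermination_minutes", 0)
    if cluster_kind == "all-purpose" and not (0 < minutes <= 30):
        findings.append("set autotermination_minutes to 30 or less")

    # Pitfall: production jobs on all-purpose clusters.
    if cluster_kind == "all-purpose" and workload == "production":
        findings.append("run production jobs on a job cluster")

    # Pitfall: SQL workloads on Spark clusters.
    if workload == "sql":
        findings.append("use a SQL warehouse for SQL workloads")

    # Pitfall: fixed, possibly over-sized worker counts.
    if "autoscale" not in spec:
        findings.append("start small with autoscale instead of a fixed size")

    # Pitfall: no cluster policy attached.
    if "policy_id" not in spec:
        findings.append("attach a cluster policy")

    return findings


risky = {"cluster_name": "nightly-etl", "num_workers": 16, "autotermination_minutes": 0}
tidy = {
    "cluster_name": "adhoc",
    "autoscale": {"min_workers": 1, "max_workers": 4},
    "autotermination_minutes": 30,
    "policy_id": "example-policy-id",  # placeholder identifier
}

for finding in lint_cluster_spec(risky, "all-purpose", "production"):
    print("-", finding)
print(lint_cluster_spec(tidy, "all-purpose", "interactive"))
```

A check like this is only advisory; cluster policies remain the mechanism that actually prevents a non-compliant cluster from being created.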
Frequently Asked Questions
What is the difference between a cluster and a SQL warehouse?
A cluster runs general Spark workloads (Python, Scala, R, SQL) while a SQL warehouse is optimised exclusively for SQL queries and BI tool connectivity with automatic scaling and concurrency management.
Should I use serverless or classic compute?
Use serverless for most workloads — it starts in seconds and eliminates infrastructure management. Choose classic when you need custom networking, specific instance types, or GPU nodes not yet available in serverless.
How do I control compute costs?
Apply cluster policies, enable auto-termination, use autoscaling, prefer serverless, and monitor DBU consumption via system tables. The managing and monitoring compute costs tutorial covers this in detail.
Can multiple users share a cluster?
Yes. All-purpose clusters and SQL warehouses support multi-user access. On clusters, use shared (standard) access mode with Unity Catalog, where Lakeguard provides strong isolation so users cannot see each other's data or intermediate results.
What is Photon and should I enable it?
Photon is a vectorised C++ query engine that accelerates SQL and DataFrame operations. Enable it on clusters and SQL warehouses running scan-heavy or aggregation-heavy workloads for significant speedups.