Managing and Monitoring Compute Costs
Databricks compute costs are driven by DBU (Databricks Unit) consumption multiplied by your contract's per-DBU rate, plus underlying cloud infrastructure charges for VMs, storage, and networking. The primary levers for cost control are auto-termination, autoscaling, cluster policies, right-sizing, and choosing serverless where appropriate. Monitor costs through system tables, billing dashboards, and custom alerts.
- Understand the DBU-based billing model and what drives compute costs
- Set up monitoring using system tables and billing dashboards
- Implement cost governance through policies, budgets, and alerts
Who this is for: Platform administrators, FinOps engineers, and engineering managers responsible for Databricks spend.
Part of the Databricks Compute section of the Databricks tutorial series.
Architecture / Concept Overview: Managing and Monitoring Compute Costs
Databricks billing has two components: DBU charges (billed by Databricks) and cloud infrastructure charges (billed by your cloud provider). DBU consumption depends on the compute type, instance size, runtime, and whether Photon is enabled. System tables provide granular billing data you can query directly to build dashboards, set alerts, and perform chargeback.
*Total cost equals Databricks DBU charges plus cloud infrastructure charges for VMs, storage, and networking.*
Different compute types are billed at different per-DBU prices, while instance size and Photon change how many DBUs a cluster consumes per hour. Both affect total cost.
*Each compute type has a different DBU rate; job clusters are cheaper per-DBU than all-purpose clusters.*
*Cost governance layers: policies enforce constraints, auto-termination prevents idle cost, autoscaling and right-sizing match resources to demand.*
Key Terms
- DBU (Databricks Unit): A normalised unit of compute consumption; billing equals DBUs consumed multiplied by the per-DBU rate.
- System Tables: Built-in Delta tables in system.billing and system.compute that provide granular usage and cost data.
- Chargeback: Allocating compute costs to specific teams or projects using cluster tags.
- Auto-Termination: Automatic cluster shutdown after an idle period, preventing unattended compute costs.
- Committed Use: Pre-purchased DBU capacity at a discounted per-DBU rate (annual or multi-year commitments).
- Cost Alert: A SQL-based alert that fires when DBU consumption exceeds a defined threshold.
Prerequisites and Setup
- Workspace admin or account admin permissions for billing data access
- System tables enabled (system.billing.usage, system.compute.clusters)
- A SQL warehouse for querying billing data
- Cost allocation tags defined for teams and projects
Step-by-Step Implementation
Configuration Reference
| Cost Lever | Implementation | Impact |
|---|---|---|
| Auto-termination | autotermination_minutes: 15-30 | Eliminates idle cluster costs |
| Autoscaling | autoscale: {min_workers: 1, max_workers: 8} | Matches resources to demand |
| Cluster policies | Restrict instance types and max workers | Prevents over-provisioning |
| Job clusters | Use for scheduled workloads | Lower per-DBU price; terminates when the job finishes |
| Serverless | Use for bursty workloads | No idle cost |
| Spot instances | Enable for workers | Often 60-90% cloud VM savings, depending on instance type and region; nodes can be reclaimed |
| Committed use | Pre-purchase DBU capacity | Discounted per-DBU rate |
| Tagging | custom_tags.team on every cluster | Enables chargeback |
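
Several of these levers can be enforced together in a single cluster policy. The sketch below builds a policy definition as a Python dictionary and serialises it to JSON. The attribute paths and the `range`, `allowlist`, and `unlimited` types follow the cluster policy definition format as commonly documented, but the limits, the AWS-style node type names, and the `team` tag key are illustrative assumptions. Check the definition against the policy reference for your workspace before applying it.

```python
import json


def build_cost_policy(max_workers=8, min_idle=15, max_idle=30,
                      node_types=("i3.xlarge", "i3.2xlarge")):
    """Return a cluster policy definition covering the main cost levers.

    All default values are illustrative assumptions, not recommendations.
    """
    return {
        # Auto-termination must stay inside the 15-30 minute band.
        "autotermination_minutes": {
            "type": "range",
            "minValue": min_idle,
            "maxValue": max_idle,
            "defaultValue": min_idle,
        },
        # Cap the upper bound of autoscaling.
        "autoscale.max_workers": {
            "type": "range",
            "maxValue": max_workers,
            "defaultValue": max_workers,
        },
        # Restrict instance types (hypothetical AWS node types).
        "node_type_id": {"type": "allowlist", "values": list(node_types)},
        # Require a team tag so usage can be charged back.
        "custom_tags.team": {"type": "unlimited", "isOptional": False},
    }


policy_json = json.dumps(build_cost_policy(), indent=2)
print(policy_json)
```

The resulting JSON is the kind of document you would paste into the policy editor or send through the policies API. Keeping it in code makes the limits reviewable in version control.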
Monitoring, Cost, and Security Considerations
Monitoring
Query system.billing.usage daily to track DBU consumption trends. Build dashboards broken down by team, compute type, and workspace. Set SQL alerts for daily spend exceeding budgeted amounts.
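As a starting point for such a dashboard, the sketch below builds the SQL text for a daily DBU summary. The column names (`usage_date`, `workspace_id`, `sku_name`, `custom_tags`, `usage_unit`, `usage_quantity`) reflect the system.billing.usage schema as commonly documented, so verify them in your workspace. The helper name `daily_dbu_query` and the `team` tag key are assumptions made for illustration.

```python
def daily_dbu_query(days=30, tag_key="team"):
    """Build a SQL statement summarising daily DBUs by workspace, SKU, and tag."""
    # Only allow simple identifiers so the tag key cannot inject SQL.
    if not tag_key.isidentifier():
        raise ValueError("tag_key must be a simple identifier")
    return f"""
SELECT
  usage_date,
  workspace_id,
  sku_name,
  coalesce(custom_tags['{tag_key}'], 'untagged') AS tag_value,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
  AND usage_date >= date_sub(current_date(), {int(days)})
GROUP BY 1, 2, 3, 4
ORDER BY usage_date, dbus DESC
""".strip()


print(daily_dbu_query(days=7))
```

Run the generated statement on a SQL warehouse, or save it as the query behind a dashboard tile or SQL alert.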
Cost Optimisation
- Auto-termination is the highest-impact, lowest-effort cost lever. Enforce it on every cluster via policy.
- Move production workloads from all-purpose clusters to job clusters, which are billed at a lower per-DBU price.
- Evaluate serverless for bursty workloads where idle time dominates the cost profile.
- Use committed-use discounts for predictable baseline consumption.
- Enable Spot/Preemptible instances for worker nodes on fault-tolerant workloads.
Security and Governance
- Restrict access to billing system tables to admins and FinOps teams.
- Use Unity Catalog row-level security if sharing cost dashboards across teams.
- Audit cluster creation events to find untagged or policy-exempt clusters.
- Implement approval workflows for clusters that exceed a DBU-per-day budget.
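The audit step can be sketched as a simple check over cluster configurations. The example assumes each cluster is a dictionary with `cluster_id`, `custom_tags`, `autotermination_minutes`, and `policy_id` keys, similar to what the Clusters API returns for all-purpose clusters, and treats `autotermination_minutes` of 0 as disabled. The function name, required tag, and 30-minute limit are hypothetical.

```python
def find_noncompliant_clusters(clusters, required_tags=("team",), max_idle_minutes=30):
    """Return {cluster_id: [problems]} for clusters that break governance rules."""
    findings = {}
    for cluster in clusters:
        problems = []
        tags = cluster.get("custom_tags") or {}
        for key in required_tags:
            if not tags.get(key):
                problems.append(f"missing tag: {key}")
        idle = cluster.get("autotermination_minutes") or 0
        if idle == 0:
            problems.append("auto-termination disabled")
        elif idle > max_idle_minutes:
            problems.append(f"auto-termination above {max_idle_minutes} minutes")
        if not cluster.get("policy_id"):
            problems.append("no cluster policy attached")
        if problems:
            findings[cluster["cluster_id"]] = problems
    return findings


# Made-up sample data for illustration.
sample = [
    {"cluster_id": "a", "custom_tags": {"team": "data"},
     "autotermination_minutes": 20, "policy_id": "p1"},
    {"cluster_id": "b", "custom_tags": {}, "autotermination_minutes": 0},
]
print(find_noncompliant_clusters(sample))
```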
Common Pitfalls and Recommended Patterns
- Not tagging clusters: without tags, cost allocation and chargeback are impossible.
- Running all-purpose clusters for production: job clusters are billed at a lower per-DBU price and terminate when the job finishes.
- Ignoring auto-termination: a single forgotten cluster can cost hundreds of dollars overnight.
- Relying solely on the cloud bill: the cloud bill lacks DBU-level granularity; use system tables for Databricks costs.
- Not setting cost alerts: by the time you notice a cost spike on the monthly bill, the money is spent.
- Over-committing DBU purchases: start with a conservative commitment and increase as you understand usage patterns.
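To make the alerting pattern concrete, here is a minimal sketch of the threshold check that a cost alert performs. It assumes you already have daily DBU totals keyed by ISO date string (for example, exported from a system.billing.usage query). The function name, the budget figure, and the sample numbers are invented for illustration.

```python
def days_over_budget(daily_dbus, daily_budget_dbus):
    """Return (day, dbus, overage) tuples for days above budget, oldest first."""
    return [
        (day, dbus, dbus - daily_budget_dbus)
        for day, dbus in sorted(daily_dbus.items())
        if dbus > daily_budget_dbus
    ]


# Made-up daily totals for illustration.
usage = {"2024-05-02": 1200.0, "2024-05-01": 800.0, "2024-05-03": 1500.0}
for day, dbus, overage in days_over_budget(usage, daily_budget_dbus=1000.0):
    print(f"{day}: {dbus} DBUs, {overage} over budget")
```

In practice the same comparison would live in a SQL alert so that it fires the day a spike happens rather than at month end.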
Frequently Asked Questions
How do I calculate the cost of a cluster?
Cost equals (DBUs per node-hour for the instance type x number of nodes x runtime hours x per-DBU price) plus cloud VM costs. This is an estimate; use system.billing.usage for actual historical costs.
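The formula can be written as a small estimator. This is a sketch under simplifying assumptions: every node has the same instance type, and storage and networking charges are ignored. All rates in the example are made-up numbers, not published prices.

```python
def estimate_cluster_cost(dbu_per_node_hour, nodes, hours,
                          price_per_dbu, vm_price_per_node_hour):
    """Estimate DBU and VM cost for a cluster of identical nodes."""
    dbus = dbu_per_node_hour * nodes * hours
    dbu_cost = dbus * price_per_dbu                    # billed by Databricks
    vm_cost = vm_price_per_node_hour * nodes * hours   # billed by the cloud provider
    return {"dbus": dbus, "dbu_cost": dbu_cost,
            "vm_cost": vm_cost, "total": dbu_cost + vm_cost}


# Hypothetical inputs: 5 nodes for 8 hours, 1 DBU per node-hour,
# $0.50 per DBU, $0.25 per VM node-hour.
estimate = estimate_cluster_cost(1.0, 5, 8, 0.50, 0.25)
print(estimate)
# {'dbus': 40.0, 'dbu_cost': 20.0, 'vm_cost': 10.0, 'total': 30.0}
```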
What is the difference between DBU cost and cloud cost?
DBU charges are billed by Databricks and cover the platform software. Cloud costs are billed by AWS/Azure/GCP and cover the underlying VMs, storage, and networking.
How do committed-use discounts work?
You pre-purchase a fixed number of DBUs per year at a discounted rate. If you exceed the commitment, overage is billed at the on-demand rate. If you under-utilise, the unused DBUs are lost.
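The trade-off between under- and over-committing can be illustrated with a short calculation. The sketch follows the rules stated above: the full commitment is paid regardless of usage, and any overage is billed at the on-demand rate. The function name, rates, and volumes are hypothetical; actual contract terms vary.

```python
def commitment_outcome(committed_dbus, used_dbus, committed_rate, on_demand_rate):
    """Compare actual usage against a pre-purchased DBU commitment."""
    overage = max(used_dbus - committed_dbus, 0)
    unused = max(committed_dbus - used_dbus, 0)
    # The commitment is paid in full; overage is billed on demand.
    total = committed_dbus * committed_rate + overage * on_demand_rate
    effective_rate = total / used_dbus if used_dbus else None
    return {"overage_dbus": overage, "unused_dbus": unused,
            "total_cost": total, "effective_rate": effective_rate}


# Hypothetical contract: 100,000 DBUs at $0.40, on-demand at $0.50.
over = commitment_outcome(100_000, 120_000, 0.40, 0.50)
under = commitment_outcome(100_000, 80_000, 0.40, 0.50)
print(over)
print(under)
```

In the under-utilised case the effective rate per DBU actually used rises to the on-demand rate, which is why a conservative initial commitment is the safer choice.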
Can I set a hard spending cap?
Databricks does not enforce hard spending caps, but you can use cluster policies, auto-termination, and budget alerts together to create an effective governance framework.
How do I charge costs back to teams?
Use custom_tags on clusters (e.g., team, project, cost_center) and join billing data with tag values in system.billing.usage.
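The aggregation behind chargeback can be sketched in a few lines. The example assumes usage rows shaped like system.billing.usage records, with a `custom_tags` map and a `usage_quantity` in DBUs. It applies a single per-DBU price for simplicity, although real prices differ by SKU. Rows without the tag fall into an "untagged" bucket so that unallocated spend stays visible. The function name and sample rows are invented.

```python
from collections import defaultdict


def chargeback_by_tag(usage_rows, price_per_dbu, tag_key="team"):
    """Sum estimated cost per tag value; untagged usage gets its own bucket."""
    totals = defaultdict(float)
    for row in usage_rows:
        owner = (row.get("custom_tags") or {}).get(tag_key, "untagged")
        totals[owner] += row["usage_quantity"] * price_per_dbu
    return dict(totals)


# Made-up usage rows for illustration.
rows = [
    {"custom_tags": {"team": "data"}, "usage_quantity": 10.0},
    {"custom_tags": {"team": "data"}, "usage_quantity": 5.0},
    {"custom_tags": {"team": "ml"}, "usage_quantity": 20.0},
    {"custom_tags": None, "usage_quantity": 4.0},
]
print(chargeback_by_tag(rows, price_per_dbu=0.5))
# {'data': 7.5, 'ml': 10.0, 'untagged': 2.0}
```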