Databricks Glossary: Essential Terms Every User Should Know
This glossary defines the core Databricks vocabulary (lakehouse, Delta Lake, Unity Catalog, DBU, Photon, clusters, SQL Warehouses, and more) so that every user shares one accurate mental model. Knowing these terms precisely prevents the most common misunderstandings about how the platform works and what each part costs. After reading, you will recognize each essential term, understand how the terms relate, and know where to apply them.
- Define the most important Databricks terms in plain, precise language
- See how core concepts (storage, compute, governance, AI) relate to one another
- Use the terms correctly when configuring and discussing the platform
Who this is for: New Databricks users and anyone needing a quick, accurate reference for platform terminology.
Part of the What is Databricks section in the Databricks tutorial series.
Architecture / Concept Overview
The essential terms cluster into four conceptual groups: storage (how data is kept), compute (how data is processed), governance (how data is controlled), and intelligence (how data drives AI). Seeing these groups helps you place any new term you encounter and understand which concern it addresses.
*The lakehouse organizes terminology into four groups: storage, compute, governance, and intelligence.*
Many terms appear together in a typical data flow, which clarifies how they relate in practice.
*Where common terms appear in a flow: Auto Loader ingests to Delta, Photon-powered compute transforms it, and a SQL Warehouse serves it.*
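The flow in the caption above can be sketched in SQL. This is a minimal sketch under stated assumptions, not a definitive pipeline: the volume path `/Volumes/main/demo/landing/products/` and the table name `main.demo.products_bronze` are hypothetical, and it assumes compute that supports streaming tables, where `read_files` used with `STREAM` ingests new files incrementally through Auto Loader.

```sql
-- Ingest: incrementally load new JSON files into a Delta-backed streaming table.
-- The path and table name are hypothetical placeholders.
CREATE OR REFRESH STREAMING TABLE main.demo.products_bronze AS
SELECT *
FROM STREAM read_files(
  '/Volumes/main/demo/landing/products/',
  format => 'json'
);

-- Serve: a SQL Warehouse answers BI-style queries against the ingested table.
SELECT COUNT(*) AS rows_ingested
FROM main.demo.products_bronze;
```

The transformation step between ingest and serve is left out here; it would normally be another table derived from the ingested one.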
Key Terms
- Lakehouse
- An architecture combining data-warehouse reliability and performance with data-lake openness and scale on a single storage layer.
- Delta Lake
- The open table format adding ACID transactions, time travel, schema enforcement, and performance optimizations to files on object storage.
- Unity Catalog
- The unified governance layer for permissions, lineage, discovery, and auditing across data and AI assets.
- DBU (Databricks Unit)
- A normalized unit of processing capacity consumed over time; the basis for Databricks consumption charges.
- Photon
- A vectorized C++ query engine that accelerates SQL and DataFrame workloads compared to standard Spark execution.
- Medallion architecture
- A layered design refining data through Bronze (raw), Silver (cleaned), and Gold (curated) tables.
- SQL Warehouse
- Compute optimized for BI and SQL analytics on lakehouse tables, with autoscaling and Photon.
- Mosaic AI
- The platform's suite for building, tuning, serving, and governing machine learning and generative AI.
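To make the Medallion term concrete, the layers can be sketched as Delta tables that each derive from the previous one. The table names (`orders_bronze`, `orders_silver`, `orders_gold`) and their columns are hypothetical, and the sketch assumes a raw `main.demo.orders_bronze` table already exists with `order_id`, `customer_id`, and `amount` columns.

```sql
-- Silver: cleaned data (drop rows without a key, remove duplicates, enforce types).
CREATE OR REPLACE TABLE main.demo.orders_silver AS
SELECT DISTINCT
  CAST(order_id AS BIGINT)       AS order_id,
  CAST(customer_id AS BIGINT)    AS customer_id,
  CAST(amount AS DECIMAL(12, 2)) AS amount
FROM main.demo.orders_bronze
WHERE order_id IS NOT NULL;

-- Gold: curated aggregate ready for BI.
CREATE OR REPLACE TABLE main.demo.orders_gold AS
SELECT customer_id,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM main.demo.orders_silver
GROUP BY customer_id;
```

Each layer is an ordinary Delta table, so the governance and time travel terms in this glossary apply to all three.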
Prerequisites and Setup
- A Databricks workspace to try the terms in context
- Unity Catalog access to a catalog and schema
- Basic SQL familiarity
- This glossary open as a reference while you work
Step-by-Step Implementation
See Delta Lake in action
Create a Delta table to make the most fundamental storage term concrete.
    -- SQL cell - create a Delta table (the lakehouse storage unit)
    CREATE TABLE main.demo.products (id INT, name STRING) USING DELTA;
Observe time travel
Inspect a table's history to understand the "version" and "time travel" terms.
    -- SQL cell - view the transaction history of a Delta table
    DESCRIBE HISTORY main.demo.products;
Apply governance with Unity Catalog
Grant access to see how governance terminology maps to real commands.
    -- SQL cell - a Unity Catalog grant
    GRANT SELECT ON TABLE main.demo.products TO `analysts`;
Measure consumption in DBUs
Query billing system tables to connect the DBU term to actual usage.
    -- SQL cell - DBUs consumed recently
    SELECT sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= current_date() - INTERVAL 7 DAYS
    GROUP BY sku_name;
Run a Photon-accelerated query
Execute an aggregate on Photon-capable compute to ground the performance terminology.
    -- SQL cell - scan-heavy query that benefits from Photon
    SELECT name, COUNT(*) AS n
    FROM main.demo.products
    GROUP BY name;
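One way to check whether Photon would handle this query is to inspect its plan. This is a hedged sketch: on Photon-enabled compute the `EXPLAIN` output typically includes operators whose names start with `Photon`, but the exact plan text depends on the runtime version and is not reproduced here.

```sql
-- Show the physical plan for the aggregate above.
-- On Photon-enabled compute, look for operator names prefixed with "Photon".
EXPLAIN
SELECT name, COUNT(*) AS n
FROM main.demo.products
GROUP BY name;
```

If no Photon operators appear, the compute may not have Photon enabled, or part of the query may have fallen back to standard Spark execution.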
Configuration Reference
| Term | Category | Relates to | One-line meaning |
|---|---|---|---|
| Delta Lake | Storage | Lakehouse, Medallion | Transactional open table format |
| Unity Catalog | Governance | Lineage, Grants | Unified access and governance |
| DBU | Cost | Compute, Billing | Unit of processing consumed |
| Photon | Compute | SQL Warehouse, Spark | Vectorized acceleration engine |
| Medallion | Pattern | Bronze/Silver/Gold | Layered data refinement |
| SQL Warehouse | Compute | BI, Photon | Compute tuned for SQL/BI |
Monitoring, Cost, and Security Considerations
Monitoring
Knowing the terms makes system tables readable: system.billing.usage measures DBUs, system.query.history tracks queries, and DESCRIBE HISTORY shows Delta operations. Precise vocabulary turns raw telemetry into actionable observability.
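As an illustration, recent long-running statements can be listed from the query history system table. This sketch assumes the `system.query` schema is enabled in your account and that you have `SELECT` on it; the column names follow the documented schema as best understood here and may change between releases.

```sql
-- Ten slowest statements from the last day.
SELECT executed_by,
       statement_type,
       total_duration_ms,
       start_time
FROM system.query.history
WHERE start_time >= current_timestamp() - INTERVAL 1 DAY
ORDER BY total_duration_ms DESC
LIMIT 10;
```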
Cost Optimisation
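The DBU is the cost term to master: spend is DBUs consumed multiplied by the rate for the compute type, so you control spend by consuming fewer DBUs (autoscaling, auto-stop, and Photon where its speedup outweighs its different DBU rate) or by choosing compute with a lower rate. Knowing the difference between compute terms such as SQL Warehouse and job cluster directly informs cheaper choices.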
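The relationship between DBUs and spend can be sketched by joining usage to list prices. This is an estimate under assumptions, not an invoice: it uses the published list rate in `system.billing.list_prices` (column names as documented at the time of writing) and ignores any discounts or commitments on your account.

```sql
-- Estimated list cost per SKU over the last 7 days: DBUs consumed x list rate.
SELECT u.sku_name,
       SUM(u.usage_quantity)                     AS dbus,
       SUM(u.usage_quantity * p.pricing.default) AS estimated_list_cost
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
  ON  u.sku_name = p.sku_name
  AND u.cloud = p.cloud
  AND u.usage_start_time >= p.price_start_time
  AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
WHERE u.usage_date >= current_date() - INTERVAL 7 DAYS
  AND u.usage_unit = 'DBU'
GROUP BY u.sku_name
ORDER BY estimated_list_cost DESC;
```

Two SKUs with the same DBU total can show different costs here, which is why a DBU count alone does not tell you what you spent.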
Security and Governance
Governance terms (Unity Catalog, grants, row filters, column masks, and lineage) are the language of platform security. Using them correctly keeps access requests and audits unambiguous.
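Row filters and column masks are the least self-explanatory of these terms, so a small sketch may help. The function names `mask_name` and `filter_products` and the "id below 100" rule are hypothetical, chosen only for illustration; the `analysts` group is the one used in the grant example earlier.

```sql
-- Column mask: only members of the `analysts` group see real product names.
CREATE OR REPLACE FUNCTION main.demo.mask_name(name STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('analysts') THEN name ELSE '***' END;

ALTER TABLE main.demo.products
  ALTER COLUMN name SET MASK main.demo.mask_name;

-- Row filter: non-members only see rows with id below 100 (arbitrary rule).
CREATE OR REPLACE FUNCTION main.demo.filter_products(id INT)
RETURNS BOOLEAN
RETURN is_account_group_member('analysts') OR id < 100;

ALTER TABLE main.demo.products
  SET ROW FILTER main.demo.filter_products ON (id);

-- Remove the demo policies when finished.
ALTER TABLE main.demo.products ALTER COLUMN name DROP MASK;
ALTER TABLE main.demo.products DROP ROW FILTER;
```

A grant decides whether a principal can query the table at all; the mask and filter decide what that principal sees once it can.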
Common Pitfalls and Recommended Patterns
- Confusing Delta Lake (format) with the lakehouse (architecture): they are related but distinct.
- Treating a DBU as a fixed price: a DBU is a unit of consumption, and cost is DBUs consumed multiplied by a rate that varies by compute type.
- Mixing up clusters and SQL Warehouses: clusters run general code, warehouses serve SQL/BI.
- Assuming Photon is automatic everywhere: confirm compute supports and enables it.
- Using "catalog" loosely: in Unity Catalog it is the top namespace level, not a vague synonym for database.
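The last pitfall is easiest to see by walking the namespace levels directly. This is a small sketch that assumes the `main` catalog and `demo` schema used in the earlier steps exist and are visible to you.

```sql
-- List the top-level containers (catalogs), then the schemas inside one of them.
SHOW CATALOGS;
SHOW SCHEMAS IN main;

-- Set defaults so unqualified names resolve to main.demo.
USE CATALOG main;
USE SCHEMA demo;
SELECT current_catalog() AS catalog_name, current_schema() AS schema_name;

-- These two queries now refer to the same table.
SELECT * FROM products;
SELECT * FROM main.demo.products;
```

The full name `main.demo.products` reads as catalog, then schema, then table, so "catalog" always means the first of the three parts.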
Frequently Asked Questions
What is the single most important term to learn first?
The lakehouse, because it frames everything else: storage, compute, governance, and AI all exist to deliver the lakehouse architecture on open data.
Is Delta Lake the same as the lakehouse?
No. Delta Lake is the open table format that makes storage transactional; the lakehouse is the broader architecture built on top of it.
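To see what "transactional" means in practice, the sketch below writes to the demo table and then reads an earlier version. It assumes version 0 is the commit created by the original `CREATE TABLE` and that no other writes have happened in between; check the history output for the actual version numbers in your workspace.

```sql
-- Each committed write creates a new table version.
INSERT INTO main.demo.products VALUES (1, 'widget');

-- List versions, then read the table as it was at an earlier version.
DESCRIBE HISTORY main.demo.products;
SELECT * FROM main.demo.products VERSION AS OF 0;
```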
What does DBU stand for and mean?
DBU stands for Databricks Unit, a normalized measure of processing capacity consumed over time, used as the basis for Databricks charges.
How does Photon relate to Spark?
Photon is a vectorized engine that accelerates compatible SQL and DataFrame operations within the Databricks Runtime, complementing rather than replacing the Spark APIs you write against.