What is Databricks?
Databricks is a unified, cloud-based Data Intelligence Platform built on the lakehouse architecture, combining data engineering, analytics, and AI on a single governed copy of your data in open formats. It unifies what used to require separate data lakes, warehouses, and ML tools into one platform spanning AWS, Azure, and GCP. This page is the pillar overview for the "What is Databricks" section and links to focused tutorials on each foundational topic.
- Understand what Databricks is and the lakehouse foundation it is built on
- See how the platform's pieces (storage, compute, governance, and AI) fit together
- Navigate to in-depth tutorials covering each core concept
Who this is for: Anyone new to Databricks who wants a clear starting point and a map of the fundamentals.
Architecture / Concept Overview: What is Databricks?
Databricks is a managed platform that brings data-warehouse reliability and performance directly to low-cost data-lake storage, an architecture known as the lakehouse. Data lives in your own cloud object storage in the open Delta Lake format, while Databricks provides unified compute, governance through Unity Catalog, and AI through Mosaic AI. The result is one place where engineers, analysts, and data scientists work on the same governed data without copying it between disconnected systems.
*Databricks unifies storage, compute, governance, and AI on one lakehouse so every workload runs on a single governed copy of data.*
The platform consolidates a previously fragmented stack into a single, governed environment.
*Databricks replaces separate lakes, warehouses, and ML tools with one unified lakehouse platform.*
Key Terms
- Databricks: A unified, cloud-based Data Intelligence Platform for data engineering, analytics, and AI built on the lakehouse architecture.
- Lakehouse: An architecture delivering warehouse reliability and performance directly on data-lake storage using open table formats.
- Delta Lake: The open table format providing ACID transactions, time travel, and performance features on object storage.
- Unity Catalog: The unified governance layer for permissions, lineage, discovery, and auditing across data and AI assets.
- DBU (Databricks Unit): A normalized unit of processing consumed over time; the basis for Databricks consumption charges.
- Mosaic AI: The platform's capabilities for building, tuning, serving, and governing machine learning and generative AI models.
Prerequisites and Setup
- A cloud account on AWS, Azure, or GCP with rights to create a Databricks workspace
- Permission to provision cloud object storage for the data layer
- A Databricks workspace with Unity Catalog enabled
- Basic familiarity with SQL and either Python or Scala
Step-by-Step Implementation
Start with the platform overview
Read how the two-plane architecture and core layers fit together.
Begin with: The Databricks Data Intelligence Platform Explained
Learn the storage foundation
Understand the lakehouse and why it replaces the lake-plus-warehouse split.
Continue with: What is a Data Lakehouse?
Compare against what you know
See how Databricks differs from traditional warehouses and from open-source Spark.
Compare: Databricks vs Traditional Data Warehouses; Databricks vs Apache Spark
Map the components and cost
Learn the building blocks, choose a cloud, and understand DBU-based pricing.
Then: Core Components; AWS vs Azure vs GCP; Pricing and DBU Explained
Know the people and the language
Finish with who uses the platform and the essential glossary.
Finish with: Key Personas; Databricks Glossary
Configuration Reference
| Topic | What it covers | Why it matters |
|---|---|---|
| Platform explained | Two-plane architecture and core layers | The accurate mental model |
| Data lakehouse | Open transactional storage | The foundation everything sits on |
| vs Warehouses / vs Spark | Comparisons | Positions Databricks against alternatives |
| Core components | Workspace, compute, storage, governance, AI | How pieces connect |
| Clouds & Pricing | AWS/Azure/GCP and DBUs | Deployment and cost decisions |
| Personas & Glossary | Roles and vocabulary | Shared understanding |
Monitoring, Cost, and Security Considerations
Monitoring
Databricks centralizes observability in system tables covering billing, query history, and audit logs. Starting from these built-in tables gives a single source of truth across every workspace and workload.
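As a starting point for querying the billing system table, the sketch below builds a SQL statement that totals recent usage per day and SKU. It assumes the `system.billing.usage` table with `usage_date`, `sku_name`, and `usage_quantity` columns is enabled in your account; the helper name `usage_by_sku_query` is hypothetical, and the column names should be checked against the current system tables reference before relying on them.

```python
def usage_by_sku_query(days: int = 30) -> str:
    """Return SQL that sums billed usage per day and SKU for the last `days` days.

    Assumes the system.billing.usage table and its usage_date, sku_name,
    and usage_quantity columns are available in the workspace.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT usage_date, sku_name, SUM(usage_quantity) AS total_usage\n"
        "FROM system.billing.usage\n"
        f"WHERE usage_date >= date_sub(current_date(), {int(days)})\n"
        "GROUP BY usage_date, sku_name\n"
        "ORDER BY usage_date, sku_name"
    )


# Print the generated statement so it can be reviewed before running it.
print(usage_by_sku_query(7))
```

In a Databricks notebook, the returned string could be passed to `spark.sql(...)`; outside a workspace it is only text, which makes it easy to review before running.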
Cost Optimisation
Costs are measured in DBUs that scale with compute size and runtime, so the main levers are autoscaling, auto-termination, and Photon acceleration. The dedicated pricing tutorial explains how to estimate and control spend in detail.
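To make the arithmetic concrete, here is a minimal sketch of how DBU consumption and runtime combine into a daily figure, and why auto-termination matters. Every number below (DBUs per hour, price per DBU, infrastructure cost) is a made-up placeholder rather than a published Databricks or cloud price, and the `Workload` and `estimate_daily_cost` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one compute workload."""
    dbu_per_hour: float    # DBUs the compute consumes per hour (assumed)
    hours_per_day: float   # hours the compute actually runs each day
    price_per_dbu: float   # assumed price per DBU, in USD
    infra_per_hour: float  # assumed cloud infrastructure cost per hour, in USD


def estimate_daily_cost(w: Workload) -> float:
    """Daily cost = DBU charge + underlying cloud infrastructure charge."""
    dbu_charge = w.dbu_per_hour * w.hours_per_day * w.price_per_dbu
    infra_charge = w.infra_per_hour * w.hours_per_day
    return dbu_charge + infra_charge


# Same cluster size, different runtime: always-on versus shut down when idle.
always_on = Workload(dbu_per_hour=8.0, hours_per_day=24.0,
                     price_per_dbu=0.5, infra_per_hour=2.0)
auto_terminated = Workload(dbu_per_hour=8.0, hours_per_day=6.0,
                           price_per_dbu=0.5, infra_per_hour=2.0)

print(estimate_daily_cost(always_on))        # 144.0
print(estimate_daily_cost(auto_terminated))  # 36.0
```

With these placeholder rates, cutting runtime from 24 to 6 hours cuts the estimate by the same factor of four, since both the DBU charge and the infrastructure charge scale with hours run.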
Security and Governance
Unity Catalog provides one governance model (permissions, lineage, row/column security, and auditing) across all assets. Keeping data in your own storage with private networking and secret scopes rounds out a secure baseline.
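As an illustration of the permission side of that model, the sketch below generates the three SQL grants a group typically needs to read one table in a three-level `catalog.schema.table` namespace. The table `main.sales.orders`, the group `analysts`, and the helper `read_grants` are hypothetical; the inputs are assumed to be trusted identifiers, and the exact privilege names should be confirmed against the Unity Catalog documentation for your workspace.

```python
def read_grants(table: str, principal: str) -> list[str]:
    """Return GRANT statements giving `principal` read access to one table.

    `table` must be a three-level name: catalog.schema.table.
    Inputs are treated as trusted identifiers (no escaping is attempted).
    """
    parts = table.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a name of the form catalog.schema.table")
    catalog, schema, _ = parts
    return [
        # Access to a table also requires usage rights on its parents.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {table} TO `{principal}`",
    ]


for statement in read_grants("main.sales.orders", "analysts"):
    print(statement)
```

Generating the statements as text keeps them reviewable and easy to put under version control before anyone runs them.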
Common Pitfalls and Recommended Patterns
- Treating Databricks as just managed Spark: it is a full platform with storage, governance, and AI.
- Skipping Unity Catalog: begin in a governed catalog to avoid governance debt.
- Leaving compute always-on: use autoscaling and auto-termination to control DBU spend.
- Copying data into many tools: keep one governed copy and connect tools to it.
- Reading topics out of order: start with the platform overview, then the lakehouse foundation.
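The always-on compute pitfall above can be avoided at cluster definition time. The following sketch builds a cluster specification with autoscaling, auto-termination, and Photon enabled. The field names (`autoscale`, `autotermination_minutes`, `runtime_engine`) follow the Databricks Clusters API as I understand it and should be verified against the current API reference; the runtime version and node type are left as placeholders, and `build_cluster_spec` is a hypothetical helper.

```python
import json


def build_cluster_spec(name: str, min_workers: int, max_workers: int,
                       idle_minutes: int) -> dict:
    """Build a cluster definition that scales with load and stops when idle."""
    if not 1 <= min_workers <= max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    if idle_minutes <= 0:
        raise ValueError("idle_minutes must be positive")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder: pick a supported runtime
        "node_type_id": "<node-type>",         # placeholder: cloud-specific instance type
        "runtime_engine": "PHOTON",            # request Photon acceleration
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,  # stop after this much idle time
    }


spec = build_cluster_spec("nightly-etl", min_workers=2, max_workers=8, idle_minutes=30)
print(json.dumps(spec, indent=2))
```

A payload like this could be submitted through the Clusters API, the CLI, or an infrastructure-as-code tool once the placeholders are filled in.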
Frequently Asked Questions
What is Databricks in one sentence?
Databricks is a unified, cloud-based Data Intelligence Platform built on the lakehouse architecture that combines data engineering, analytics, and AI on one governed copy of your data.
Is Databricks the same as Apache Spark?
No. Spark is one compute engine inside Databricks, which also adds Delta Lake, Unity Catalog, Photon, SQL Warehouses, orchestration, and Mosaic AI. See the dedicated comparison tutorial.
Which cloud does Databricks run on?
Databricks runs on AWS, Azure, and GCP with a consistent core experience; differences are mostly in storage, identity, and networking, covered in the cloud comparison tutorial.
How is Databricks priced?
Databricks is priced on consumption, measured in DBUs (Databricks Units), plus the underlying cloud infrastructure cost. The pricing tutorial breaks down the levers and how to control them.