What is Databricks?

Databricks is a unified, cloud-based Data Intelligence Platform built on the lakehouse architecture, combining data engineering, analytics, and AI on a single governed copy of your data in open formats. It unifies what used to require separate data lakes, warehouses, and ML tools into one platform spanning AWS, Azure, and GCP. This page is the pillar overview for the "What is Databricks" section and links to focused tutorials on each foundational topic.

  • Understand what Databricks is and the lakehouse foundation it is built on
  • See how the platform's pieces (storage, compute, governance, and AI) fit together
  • Navigate to in-depth tutorials covering each core concept

Who this is for: Anyone new to Databricks who wants a clear starting point and a map of the fundamentals.

Architecture / Concept Overview: What is Databricks?

Databricks is a managed platform that puts data-warehouse reliability and performance directly on low-cost data-lake storage; this combination is the lakehouse. Data lives in your own cloud object storage in the open Delta Lake format, while Databricks provides unified compute, governance through Unity Catalog, and AI through Mosaic AI. The result is one place where engineers, analysts, and data scientists work on the same governed data without copying it between disconnected systems.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Src[All Your Data]:::source --> Store[(Delta Lake on Object Storage)]:::storage
    Compute[Unified Compute: Spark and Photon]:::processing --> Store
    Gov[Unity Catalog Governance]:::governance -.governs.-> Store
    Store --> BI[Analytics and BI]:::serving
    Store --> AI[Mosaic AI and ML]:::serving
    Store --> DE[Data Engineering]:::serving

*Databricks unifies storage, compute, governance, and AI on one lakehouse so every workload runs on a single governed copy of data.*

The platform consolidates a previously fragmented stack into a single, governed environment.

%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef neutral fill:#2A2F3A,stroke:#7A828F,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    subgraph Before [Fragmented Stack]
        L[Data Lake]:::neutral
        W[Warehouse]:::neutral
        ML[Separate ML Tools]:::neutral
    end
    subgraph After [Databricks]
        One[One Lakehouse Platform]:::processing --> Out[BI, Engineering, AI]:::serving
    end
    Before --> After

*Databricks replaces separate lakes, warehouses, and ML tools with one unified lakehouse platform.*

Key Terms

Databricks
A unified, cloud-based Data Intelligence Platform for data engineering, analytics, and AI built on the lakehouse architecture.
Lakehouse
An architecture delivering warehouse reliability and performance directly on data-lake storage using open table formats.
Delta Lake
The open table format providing ACID transactions, time travel, and performance features on object storage.
Unity Catalog
The unified governance layer for permissions, lineage, discovery, and auditing across data and AI assets.
DBU (Databricks Unit)
A normalized unit of processing consumed over time; the basis for Databricks consumption charges.
Mosaic AI
The platform's capabilities for building, tuning, serving, and governing machine learning and generative AI models.

Prerequisites and Setup

  • A cloud account on AWS, Azure, or GCP with rights to create a Databricks workspace
  • Permission to provision cloud object storage for the data layer
  • A Databricks workspace with Unity Catalog enabled
  • Basic familiarity with SQL and either Python or Scala

Step-by-Step Implementation

  1. Start with the platform overview

    Read how the two-plane architecture (control plane and compute plane) and the core layers fit together.

    Begin with: The Databricks Data Intelligence Platform Explained
  2. Learn the storage foundation

    Understand the lakehouse and why it replaces the lake-plus-warehouse split.

    Continue with: What is a Data Lakehouse?
  3. Compare against what you know

    See how Databricks differs from traditional warehouses and from open-source Spark.

    Compare: Databricks vs Traditional Data Warehouses; Databricks vs Apache Spark
  4. Map the components and cost

    Learn the building blocks, choose a cloud, and understand DBU-based pricing.

    Then: Core Components; AWS vs Azure vs GCP; Pricing and DBU Explained
  5. Know the people and the language

    Finish with who uses the platform and the essential glossary.

    Finish with: Key Personas; Databricks Glossary

Configuration Reference

What is Databricks? configuration options
| Topic | What it covers | Why it matters |
|---|---|---|
| Platform explained | Two-plane architecture and core layers | The accurate mental model |
| Data lakehouse | Open transactional storage | The foundation everything sits on |
| vs Warehouses / vs Spark | Comparisons | Positions Databricks against alternatives |
| Core components | Workspace, compute, storage, governance, AI | How pieces connect |
| Clouds & Pricing | AWS/Azure/GCP and DBUs | Deployment and cost decisions |
| Personas & Glossary | Roles and vocabulary | Shared understanding |

Monitoring, Cost, and Security Considerations

Monitoring

Databricks centralizes observability in system tables covering billing, query history, and audit logs. Starting from these built-in tables gives a single source of truth across every workspace and workload.
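As a minimal sketch of starting from system tables, the helper below builds a SQL query that totals usage per SKU over a recent window. It assumes the `system.billing.usage` table, with `usage_date`, `sku_name`, and `usage_quantity` columns, is enabled for your account; the function name `dbu_usage_query` and the window length are illustrative and not part of any Databricks API.

```python
def dbu_usage_query(days: int = 30) -> str:
    """Build a SQL query that totals usage per SKU for the last `days` days.

    Assumes the system.billing.usage system table is available; check the
    column names in your own workspace before relying on this.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT sku_name, SUM(usage_quantity) AS total_usage\n"
        "FROM system.billing.usage\n"
        f"WHERE usage_date >= date_sub(current_date(), {int(days)})\n"
        "GROUP BY sku_name\n"
        "ORDER BY total_usage DESC"
    )


query = dbu_usage_query(7)
print(query)
# In a Databricks notebook, where a `spark` session is predefined,
# the query could be run with: spark.sql(query).show()
```

Building the query as a string keeps the sketch runnable anywhere; in practice the same SQL could be pasted directly into a SQL editor or dashboard.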

Cost Optimisation

Costs are measured in DBUs that scale with compute size and runtime, so the main levers are autoscaling, auto-termination, and Photon acceleration. The dedicated pricing tutorial explains how to estimate and control spend in detail.
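To make the effect of those levers concrete, here is a rough back-of-the-envelope estimate. It assumes the total is DBU charges plus cloud infrastructure charges for the same hours; every rate below is a hypothetical placeholder, not a real Databricks or cloud price, and `estimate_cost` is an illustrative helper rather than an official calculator.

```python
def estimate_cost(dbu_per_hour: float, hours: float,
                  price_per_dbu: float, infra_per_hour: float) -> float:
    """Estimate total cost as DBU charges plus cloud infrastructure charges."""
    dbu_cost = dbu_per_hour * hours * price_per_dbu
    infra_cost = infra_per_hour * hours
    return dbu_cost + infra_cost


# Hypothetical figures for illustration only (not real rates).
RATES = dict(dbu_per_hour=4.0, price_per_dbu=0.5, infra_per_hour=2.0)

# A cluster left running all month versus one that auto-terminates
# outside an 8-hour working day, 22 days a month.
always_on = estimate_cost(hours=24 * 30, **RATES)
auto_terminated = estimate_cost(hours=8 * 22, **RATES)

print(f"always-on:       {always_on:.2f}")
print(f"auto-terminated: {auto_terminated:.2f}")
```

With these made-up inputs the always-on cluster comes to 2880.00 and the auto-terminated one to 704.00, which illustrates why runtime is usually the first lever to check.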

Security and Governance

Unity Catalog provides one governance model (permissions, lineage, row/column security, and auditing) across all assets. Keeping data in your own storage with private networking and secret scopes rounds out a secure baseline.
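As a small sketch of what Unity Catalog permissions look like in practice, the helper below generates the `GRANT` statements a group would typically need to read a single table addressed by its three-level name (`catalog.schema.table`). The catalog, schema, table, and group names are hypothetical, and `grant_select` is an illustrative helper, not a Databricks function.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_select(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Return the GRANT statements a principal needs to read one table."""
    cat, sch, tbl = quote_ident(catalog), quote_ident(schema), quote_ident(table)
    who = quote_ident(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {cat} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {cat}.{sch} TO {who}",
        f"GRANT SELECT ON TABLE {cat}.{sch}.{tbl} TO {who}",
    ]


# Hypothetical names: catalog `main`, schema `sales`, table `orders`, group `analysts`.
statements = grant_select("main", "sales", "orders", "analysts")
for stmt in statements:
    print(stmt + ";")
```

Granting to a group rather than to individual users keeps the permission model easier to audit as teams change.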

Common Pitfalls and Recommended Patterns

  • Treating Databricks as just managed Spark: it is a full platform with storage, governance, and AI.
  • Skipping Unity Catalog: begin in a governed catalog to avoid governance debt.
  • Leaving compute always-on: use autoscaling and auto-termination to control DBU spend.
  • Copying data into many tools: keep one governed copy and connect tools to it.
  • Reading topics out of order: start with the platform overview, then the lakehouse foundation.
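The compute pattern above (autoscaling plus auto-termination) can be sketched as a cluster specification. The field names follow the Databricks Clusters API as commonly documented (`autoscale`, `autotermination_minutes`), but treat the exact payload as an assumption to verify against your workspace; the runtime version and node type are left as placeholders, and `cluster_spec` is an illustrative helper.

```python
import json


def cluster_spec(name: str, spark_version: str, node_type_id: str,
                 min_workers: int = 1, max_workers: int = 4,
                 idle_minutes: int = 30) -> dict:
    """Build a cluster definition that autoscales and shuts down when idle."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,    # placeholder: pick a supported runtime
        "node_type_id": node_type_id,      # placeholder: cloud-specific instance type
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,
    }


spec = cluster_spec("demo-cluster", "<runtime-version>", "<node-type>")
print(json.dumps(spec, indent=2))
```

A specification like this could be submitted through the REST API, the CLI, or infrastructure-as-code tooling; keeping it in version control makes the cost controls reviewable.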

Frequently Asked Questions

What is Databricks in one sentence?

Databricks is a unified, cloud-based Data Intelligence Platform built on the lakehouse architecture that combines data engineering, analytics, and AI on one governed copy of your data.

Is Databricks the same as Apache Spark?

No. Spark is one compute engine inside Databricks, which also adds Delta Lake, Unity Catalog, Photon, SQL Warehouses, orchestration, and Mosaic AI. See the dedicated comparison tutorial.

Which cloud does Databricks run on?

Databricks runs on AWS, Azure, and GCP with a consistent core experience; differences are mostly in storage, identity, and networking, covered in the cloud comparison tutorial.

How is Databricks priced?

Databricks is priced on consumption, measured in DBUs (Databricks Units), plus the cost of the underlying cloud infrastructure. The pricing tutorial breaks down the levers and how to control them.