The Databricks Data Intelligence Platform Explained
The Databricks Data Intelligence Platform is a unified, cloud-based lakehouse that combines data engineering, analytics, AI, and governance on a single copy of your data in open formats. It lets data engineers, analysts, and ML teams work on the same governed data without copying it between disconnected tools. After reading, you will understand the platform's two-plane architecture, its core layers, and how a request flows from raw source to governed insight.
- Explain the control plane vs compute plane split and why it matters for security and cost
- Describe the platform's core layers: storage (Delta Lake), governance (Unity Catalog), compute (Spark + Photon), and intelligence (Mosaic AI)
- Trace an end-to-end flow from ingestion through the medallion architecture to BI and AI serving
Who this is for: Data engineers, analytics engineers, and solutions architects new to Databricks who need an accurate mental model of the platform.
Part of the What is Databricks section in the Databricks tutorial series.
Architecture / Concept Overview: The Databricks Data Intelligence Platform Explained
The Databricks Data Intelligence Platform is built on a lakehouse foundation: it stores data in the open Delta Lake format on your own cloud object storage, then layers unified compute, governance, and AI on top. Architecturally it splits into a Databricks-managed control plane (web UI, job orchestration, query routing, metadata) and a compute plane where queries and jobs actually run. With classic compute, the compute plane runs in your cloud account close to your data, so raw data never has to leave your security boundary; with serverless compute, it runs in Databricks-managed infrastructure in the same region.
*Two-plane architecture: the managed control plane orchestrates work, while compute and your Delta Lake data stay in your own cloud account, all governed centrally by Unity Catalog.*
A typical workload moves through a layered "medallion" refinement, turning raw inputs into trustworthy, query-ready data.
*Data refinement across Bronze, Silver, and Gold tables, with each stage adding structure and quality before serving.*
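To make the Bronze, Silver, and Gold stages concrete, here is a minimal plain-Python sketch of the same refinement idea. It uses in-memory lists rather than Delta tables, and the sample rows, the `to_silver` and `to_gold` helpers, and the cleaning rules (drop duplicate `order_id` values, drop unparseable amounts) are assumptions chosen for illustration, not platform behavior.

```python
from collections import defaultdict

# Hypothetical raw order events as they might land in Bronze:
# everything is a string, and duplicates and bad values are kept as-is.
bronze = [
    {"order_id": "1", "order_date": "2024-01-01", "amount": "10.50"},
    {"order_id": "1", "order_date": "2024-01-01", "amount": "10.50"},  # duplicate
    {"order_id": "2", "order_date": "2024-01-01", "amount": "abc"},    # invalid amount
    {"order_id": "3", "order_date": "2024-01-02", "amount": "4.25"},
]


def to_silver(rows):
    """Clean and conform: skip duplicates and unparseable amounts, cast types."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        seen.add(row["order_id"])
        silver.append(
            {
                "order_id": int(row["order_id"]),
                "order_date": row["order_date"],
                "amount": amount,
            }
        )
    return silver


def to_gold(rows):
    """Aggregate a business-ready metric: revenue per order date."""
    revenue = defaultdict(float)
    for row in rows:
        revenue[row["order_date"]] += row["amount"]
    return dict(revenue)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each stage only reads from the one before it, which is what lets consumers trust Gold without knowing how messy Bronze was.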
Key Terms
- Data Intelligence Platform: Databricks' branding for a lakehouse that adds AI-driven understanding of your data (natural-language search, semantics, and automation) on top of unified storage, governance, and compute.
- Control plane: The Databricks-managed services (web app, REST APIs, orchestration, cluster manager) that coordinate work without holding your raw data.
- Compute plane: Where queries and jobs actually run; with classic compute it runs in your cloud account, and with serverless it runs in Databricks-managed infrastructure in the same region.
- Lakehouse: An architecture that delivers data-warehouse reliability and performance directly on data-lake storage using open table formats.
- Unity Catalog: The unified governance layer that manages permissions, lineage, discovery, and auditing across all data and AI assets.
- Mosaic AI: The platform's set of capabilities for building, tuning, serving, and governing machine learning and generative AI models.
Prerequisites and Setup
- A cloud account on AWS, Azure, or GCP with rights to create a Databricks workspace
- Permission to provision cloud object storage (S3, ADLS, or GCS) for the data layer
- A Databricks account/workspace with Unity Catalog enabled
- Basic familiarity with SQL and either Python or Scala
- Network access to the workspace URL and, for production, a plan for private connectivity
Step-by-Step Implementation
Create a workspace and enable Unity Catalog
Provision a workspace from your cloud marketplace or account console, then attach it to a Unity Catalog metastore for your region so all assets share one governance model.
```bash
# bash cell - inspect the active workspace with the Databricks CLI
databricks current-user me
databricks catalogs list
```
Define your governance namespace
Create a catalog and schema to hold the project's tables. The three-level `catalog.schema.table` namespace is how Unity Catalog isolates and secures data.
```sql
-- SQL cell - create a governed namespace
CREATE CATALOG IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS sales.analytics;
```
Ingest raw data into a Bronze table
Use Auto Loader to incrementally ingest files from cloud storage into a Delta table, which gives you schema tracking and exactly-once processing.
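Conceptually, incremental ingestion comes down to remembering which files have already been processed. The plain-Python sketch below illustrates that idea only; it is not how Auto Loader is implemented, and the `ingest_new_files` helper, the file names, and the in-memory `checkpoint` set are all hypothetical stand-ins. A real exactly-once guarantee also depends on committing data and progress together, which this sketch does not model.

```python
def ingest_new_files(available_files, checkpoint, bronze_rows):
    """Append rows only from files that are not yet recorded in the checkpoint.

    available_files: dict of file name -> list of records (stand-in for cloud storage)
    checkpoint:      set of file names already ingested (stand-in for checkpoint state)
    bronze_rows:     list acting as the Bronze table
    """
    new_files = sorted(name for name in available_files if name not in checkpoint)
    for name in new_files:
        bronze_rows.extend(available_files[name])
        checkpoint.add(name)  # record progress so a rerun skips this file
    return new_files


# Hypothetical landing folder with two files on the first run.
landing = {
    "orders_001.json": [{"order_id": 1}],
    "orders_002.json": [{"order_id": 2}],
}
checkpoint, bronze_rows = set(), []

first_run = ingest_new_files(landing, checkpoint, bronze_rows)

# A third file arrives; the second run picks up only that file.
landing["orders_003.json"] = [{"order_id": 3}]
second_run = ingest_new_files(landing, checkpoint, bronze_rows)
```

The Auto Loader cell that follows expresses the same intent declaratively, with the checkpoint location holding the progress state.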
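```python
# Python cell - incremental ingestion into Bronze
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Schema inference needs a schema location; reusing the checkpoint path is an assumption
    .option("cloudFiles.schemaLocation", "/Volumes/sales/_chk/orders/")
    .load("/Volumes/sales/landing/orders/")
    .writeStream
    .option("checkpointLocation", "/Volumes/sales/_chk/orders/")
    .toTable("sales.analytics.orders_bronze"))
```
Refine into Silver and Gold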
Clean and conform the data into Silver, then aggregate business-ready metrics into Gold for analytics.
```sql
-- SQL cell - curated Gold aggregate
CREATE OR REPLACE TABLE sales.analytics.daily_revenue_gold AS
SELECT order_date, SUM(amount) AS revenue
FROM sales.analytics.orders_silver
GROUP BY order_date;
```
Serve to BI and AI
Point a SQL Warehouse at the Gold tables for dashboards, and register features or models through Mosaic AI for downstream applications.
```sql
-- SQL cell - query served via a SQL Warehouse
SELECT order_date, revenue
FROM sales.analytics.daily_revenue_gold
ORDER BY order_date DESC
LIMIT 30;
```
Configuration Reference
| Parameter / Option | Type | Default | Description |
|---|---|---|---|
| Compute type | enum (classic / serverless) | classic | Whether compute runs in your cloud account or Databricks-managed serverless infrastructure |
| Unity Catalog metastore | string | none | The regional governance metastore the workspace attaches to |
| Default catalog | string | hive_metastore (the workspace catalog on workspaces enabled for Unity Catalog automatically) | The catalog used when a query omits the catalog name; set to a UC catalog for governed defaults |
| Auto Loader format | enum (json/csv/parquet/...) | none | Source file format for incremental ingestion |
| Photon acceleration | boolean | enabled on supported compute | Vectorized C++ engine that speeds up SQL and DataFrame workloads |
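
The effect of the default catalog setting is easiest to see by expanding a partially qualified name. The function below is a simplified model of that expansion, not Databricks' actual resolver: `resolve_table_name` is a hypothetical helper, it assumes plain dot-separated identifiers without backtick quoting, and the `default` schema name is an assumption.

```python
def resolve_table_name(name, default_catalog, default_schema):
    """Expand a one-, two-, or three-part table name to catalog.schema.table."""
    parts = name.split(".")
    if len(parts) == 3:
        return name  # already fully qualified
    if len(parts) == 2:
        return f"{default_catalog}.{name}"  # schema.table: fill in the catalog
    if len(parts) == 1:
        return f"{default_catalog}.{default_schema}.{name}"
    raise ValueError(f"expected at most three name parts, got {len(parts)}: {name!r}")


# The same query text lands in different places depending on the default catalog.
legacy = resolve_table_name("analytics.orders_silver", "hive_metastore", "default")
governed = resolve_table_name("analytics.orders_silver", "sales", "default")
```

Here `legacy` resolves into `hive_metastore` while `governed` resolves into the `sales` catalog, which is why setting a Unity Catalog default matters for queries that omit the catalog.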
Monitoring, Cost, and Security Considerations
Monitoring
Observe pipelines and queries through built-in system tables (billing, query history, audit logs) and job run history. Centralizing on system tables lets you build a single observability dashboard across all workspaces rather than stitching together per-tool logs.
Cost Optimization
Costs are measured in DBUs (Databricks Units) that scale with compute size and runtime. Prefer serverless or autoscaling SQL Warehouses for spiky BI traffic and enable auto-termination on interactive clusters. Photon reduces wall-clock time, which can lower total DBU consumption on scan-heavy workloads even where Photon-enabled compute is billed at a higher DBU rate.
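The arithmetic behind these recommendations can be sketched as DBUs per hour times hours times price per DBU. The rates below are made-up illustration values, not Databricks pricing, `estimate_cost` is a hypothetical helper, and the estimate leaves out any separate cloud provider charges for virtual machines and storage.

```python
def estimate_cost(dbu_per_hour, hours, price_per_dbu):
    """Estimated Databricks charge: DBUs consumed multiplied by the price per DBU."""
    return dbu_per_hour * hours * price_per_dbu


# Illustrative numbers only (assumptions, not real list prices).
PRICE_PER_DBU = 0.5

# A cluster left running all day versus one that auto-terminates after 6 busy hours.
always_on = estimate_cost(dbu_per_hour=4, hours=24, price_per_dbu=PRICE_PER_DBU)
auto_terminated = estimate_cost(dbu_per_hour=4, hours=6, price_per_dbu=PRICE_PER_DBU)

# A faster engine at a higher hourly DBU rate can still cost less if runtime drops enough.
accelerated = estimate_cost(dbu_per_hour=8, hours=2, price_per_dbu=PRICE_PER_DBU)
```

With these assumed inputs the always-on cluster costs four times the auto-terminated one, and the accelerated run is cheapest because its runtime falls by more than its hourly rate rises.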
Security and Governance
Unity Catalog centralizes access control, row/column security, lineage, and auditing across every workspace on a metastore. Keep data in your own storage, use private networking for the workspace, and manage credentials through secret scopes rather than embedding them in code.
Common Pitfalls and Recommended Patterns
- Treating the lakehouse as a raw dump: enforce the Bronze/Silver/Gold pattern so consumers query curated, reliable tables.
- Skipping Unity Catalog: starting in the legacy `hive_metastore` creates governance debt; begin in a UC catalog.
- Over-provisioning always-on clusters: use autoscaling and auto-termination to avoid idle DBU burn.
- Copying data into many tools: keep one governed copy and connect tools to it instead of exporting.
- Ignoring lineage: rely on Unity Catalog lineage to understand impact before changing upstream tables.
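
The lineage pattern in the last item amounts to a graph walk: before changing a table, list everything downstream of it. The sketch below shows that walk in plain Python. The `LINEAGE` edges are hard-coded assumptions based on the Bronze, Silver, and Gold tables in this tutorial, and `downstream_tables` is a hypothetical helper; in practice the edges would come from Unity Catalog's captured lineage.

```python
from collections import deque

# Hypothetical lineage edges: upstream table -> tables that read from it.
LINEAGE = {
    "sales.analytics.orders_bronze": ["sales.analytics.orders_silver"],
    "sales.analytics.orders_silver": ["sales.analytics.daily_revenue_gold"],
    "sales.analytics.daily_revenue_gold": [],
}


def downstream_tables(table, lineage):
    """Return every table reachable downstream of `table`, in breadth-first order."""
    seen, order, queue = {table}, [], deque([table])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_tables("sales.analytics.orders_bronze", LINEAGE)
```

A schema change to the Bronze table would therefore need to be checked against both the Silver and Gold tables before it ships.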
Frequently Asked Questions
Is Databricks just managed Apache Spark?
No. Spark is one compute engine within the platform, but Databricks adds Delta Lake storage, Unity Catalog governance, Photon, SQL Warehouses, orchestration, and Mosaic AI as an integrated system.
Where does my data physically live?
Your table data resides in your own cloud object storage in open Delta format. The control plane stores only metadata and orchestration state, not your raw records.
What is the difference between classic and serverless compute?
Classic compute runs in your cloud account and gives you full network control; serverless runs in Databricks-managed infrastructure for faster startup and less operational overhead. Both are governed identically by Unity Catalog.
Do I need to choose between data warehousing and data science?
No. The lakehouse supports SQL analytics, data engineering, and AI on the same governed tables, which removes the need for separate, siloed platforms.