The Databricks Data Intelligence Platform Explained

The Databricks Data Intelligence Platform is a unified, cloud-based lakehouse that combines data engineering, analytics, AI, and governance on a single copy of your data in open formats. It lets data engineers, analysts, and ML teams work on the same governed data without copying it between disconnected tools. After reading, you will understand the platform's two-plane architecture, its core layers, and how a request flows from raw source to governed insight.

Explain the control plane vs compute plane split and why it matters for security and cost
Describe the platform's core layers: storage (Delta Lake), governance (Unity Catalog), compute (Spark + Photon), and intelligence (Mosaic AI)
Trace an end-to-end flow from ingestion through the medallion architecture to BI and AI serving

Who this is for: Data engineers, analytics engineers, and solutions architects new to Databricks who need an accurate mental model of the platform.

Part of the What is Databricks section in the Databricks tutorial series.

Architecture / Concept Overview: The Databricks Data Intelligence Platform Explained

The Databricks Data Intelligence Platform is built on a lakehouse foundation. It stores data in open Delta Lake format on cloud object storage, then layers unified compute, governance, and AI on top. Architecturally, it splits into two planes:

- A Databricks-managed control plane (web UI, job orchestration, query routing, metadata).
- A compute plane, where queries and jobs actually run.

With classic compute, the compute plane runs in your own cloud account next to your data, so raw data is processed inside your security boundary. With serverless compute, it runs in Databricks-managed infrastructure in the same region.

```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
    classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    subgraph ControlPlane [Databricks Control Plane]
        UI[Workspace UI and APIs]:::processing
        Jobs[Job and Query Orchestration]:::processing
    end
    subgraph ComputePlane [Compute Plane in Your Cloud]
        Compute[Clusters and SQL Warehouses]:::processing
        Store[(Delta Lake on Object Storage)]:::storage
    end
    Gov[Unity Catalog Governance]:::governance
    Serve[BI, Apps, and AI Serving]:::serving
    UI --> Jobs --> Compute
    Compute --> Store
    Gov -.governs.-> Store
    Gov -.governs.-> Compute
    Store --> Serve
```

*Two-plane architecture: the managed control plane orchestrates work, while classic compute and your Delta Lake data stay in your own cloud account. Unity Catalog governs everything centrally.*
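The following Python sketch is deliberately simplified to make the split concrete. It illustrates the idea and is not Databricks code or APIs. The class and method names (`ControlPlane`, `ComputePlane`, `submit_sum`) are hypothetical. The coordinating component holds only table names and a job log. The scan runs where the rows live, and only a small result travels back.

```python
# Toy model of the two-plane split (illustrative only; not Databricks code or APIs).
from dataclasses import dataclass, field


@dataclass
class ComputePlane:
    # Stand-in for Delta tables in object storage that only the compute plane reads
    tables: dict[str, list[dict]]

    def run_sum(self, table: str, column: str) -> float:
        # The scan happens next to the data; only the aggregate is returned
        return sum(row[column] for row in self.tables[table])


@dataclass
class ControlPlane:
    # Holds metadata (known table names) and a job log, never the rows themselves
    known_tables: set[str]
    job_log: list[str] = field(default_factory=list)

    def submit_sum(self, compute: ComputePlane, table: str, column: str) -> float:
        if table not in self.known_tables:
            raise KeyError(f"unknown table: {table}")
        self.job_log.append(f"SUM({column}) ON {table}")
        return compute.run_sum(table, column)


compute = ComputePlane({"sales.analytics.orders": [{"amount": 10.0}, {"amount": 5.5}]})
control = ControlPlane(known_tables={"sales.analytics.orders"})
total = control.submit_sum(compute, "sales.analytics.orders", "amount")
print(total)  # 15.5
```

The control plane object never receives the row list. That separation is what the real architecture preserves at much larger scale.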

A typical workload moves through a layered "medallion" refinement, turning raw inputs into trustworthy, query-ready data.

```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
    classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
    Src[Source Systems]:::source --> Ing[Auto Loader Ingestion]:::ingestion
    Ing --> Bronze[(Bronze Raw)]:::storage
    Bronze --> Silver[(Silver Cleaned)]:::storage
    Silver --> Gold[(Gold Curated)]:::storage
    Gold --> BI[Dashboards and AI]:::serving
```

*Data refinement across Bronze, Silver, and Gold tables, with each stage adding structure and quality before serving.*

Key Terms

Data Intelligence Platform: Databricks' branding for a lakehouse that adds AI-driven understanding of your data (natural-language search, semantics, and automation) on top of unified storage, governance, and compute.
Control plane: The Databricks-managed services (web app, REST APIs, orchestration, cluster manager) that coordinate work without holding your raw data.
Compute plane: Where queries and jobs actually run; with classic compute it runs in your cloud account, and with serverless it runs in Databricks-managed infrastructure in the same region.
Lakehouse: An architecture that delivers data-warehouse reliability and performance directly on data-lake storage using open table formats.
Unity Catalog: The unified governance layer that manages permissions, lineage, discovery, and auditing across all data and AI assets.
Mosaic AI: The platform's set of capabilities for building, tuning, serving, and governing machine learning and generative AI models.

Prerequisites and Setup

A cloud account on AWS, Azure, or GCP with rights to create a Databricks workspace
Permission to provision cloud object storage (S3, ADLS, or GCS) for the data layer
A Databricks account/workspace with Unity Catalog enabled
Basic familiarity with SQL and either Python or Scala
Network access to the workspace URL and, for production, a plan for private connectivity

Step-by-Step Implementation

Create a workspace and enable Unity Catalog
Provision a workspace from your cloud marketplace or account console, then attach it to a Unity Catalog metastore for your region so all assets share one governance model.
```
# Shell - inspect the active workspace from a terminal with an authenticated Databricks CLI
databricks current-user me
databricks catalogs list
```
Define your governance namespace
Create a catalog and schema to hold the project's tables. The three-level catalog.schema.table namespace is how Unity Catalog isolates and secures data.
```
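-- SQL cell - create a governed namespace plus volumes for landing files and checkpoints
CREATE CATALOG IF NOT EXISTS sales;
CREATE SCHEMA IF NOT EXISTS sales.analytics;
CREATE SCHEMA IF NOT EXISTS sales.landing;
CREATE VOLUME IF NOT EXISTS sales.landing.orders;
CREATE VOLUME IF NOT EXISTS sales.analytics.checkpoints;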
```
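The following Python sketch illustrates how a partial table name expands into the three-level namespace. `qualify` is a hypothetical helper, not a Databricks API. In a real session, the current catalog and schema come from `USE CATALOG` / `USE SCHEMA` or from the workspace's default catalog. The second call shows why an unqualified default matters.

```python
# Hypothetical helper, not a Databricks API: qualify a table reference the way
# catalog.schema.table names are resolved against the session's current catalog/schema.
def qualify(name: str, current_catalog: str, current_schema: str) -> str:
    parts = name.split(".")  # simplification: ignores backtick-quoted names containing dots
    if len(parts) == 1:  # table -> current catalog and schema
        return f"{current_catalog}.{current_schema}.{parts[0]}"
    if len(parts) == 2:  # schema.table -> current catalog
        return f"{current_catalog}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:  # already fully qualified
        return name
    raise ValueError(f"expected 1 to 3 name parts, got {len(parts)}: {name!r}")


print(qualify("orders_bronze", "sales", "analytics"))  # sales.analytics.orders_bronze
# With a legacy default catalog, a two-part name resolves to the legacy metastore:
print(qualify("analytics.orders_bronze", "hive_metastore", "default"))  # hive_metastore.analytics.orders_bronze
```

Fully qualifying names in production code avoids depending on whichever default a session happens to have.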

Ingest raw data into a Bronze table

Use Auto Loader to incrementally ingest files from cloud storage into a Delta table, which gives you schema tracking and exactly-once processing.

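```
# Python cell - incremental ingestion into Bronze with Auto Loader
# Assumes JSON files land in the volume sales.landing.orders and that a volume
# sales.analytics.checkpoints exists for stream state (/Volumes/<catalog>/<schema>/<volume>/...)
checkpoint = "/Volumes/sales/analytics/checkpoints/orders_bronze/"
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", checkpoint)  # where Auto Loader tracks the inferred schema
    .load("/Volumes/sales/landing/orders/")
    .writeStream
    .option("checkpointLocation", checkpoint)  # stream progress, for exactly-once ingestion
    .trigger(availableNow=True)  # process all new files, then stop (suits scheduled jobs)
    .toTable("sales.analytics.orders_bronze"))
```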
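The role of the checkpoint can be illustrated with a toy model in plain Python. This is only a sketch of the idea and does not reflect how Auto Loader is implemented: its real checkpoint format and file discovery differ. `ingest_new_files` is a hypothetical function that records which files it has already committed, so a rerun reads only new arrivals.

```python
# Toy model of checkpointed incremental file ingestion (illustrative only).
# Auto Loader's real checkpoint format and file discovery work differently;
# this only shows why a persisted record of processed files prevents re-ingestion.
import json
import os
import tempfile


def ingest_new_files(landing_dir: str, checkpoint_file: str, bronze: list) -> int:
    seen = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            seen = set(json.load(f))  # files committed by earlier runs
    new_files = sorted(n for n in os.listdir(landing_dir) if n not in seen)
    for name in new_files:
        with open(os.path.join(landing_dir, name)) as f:
            bronze.extend(json.loads(line) for line in f if line.strip())  # JSON Lines
    # Progress is recorded only after the batch is appended. A crash between the two
    # steps would replay the batch in this toy; a Delta sink is designed to make such
    # replays idempotent.
    with open(checkpoint_file, "w") as f:
        json.dump(sorted(seen | set(new_files)), f)
    return len(new_files)


with tempfile.TemporaryDirectory() as tmp:
    landing = os.path.join(tmp, "landing")
    os.mkdir(landing)
    checkpoint = os.path.join(tmp, "checkpoint.json")
    bronze = []
    with open(os.path.join(landing, "part-001.json"), "w") as f:
        f.write('{"order_id": 1, "amount": 10}\n{"order_id": 2, "amount": 5}\n')
    first = ingest_new_files(landing, checkpoint, bronze)   # picks up part-001
    second = ingest_new_files(landing, checkpoint, bronze)  # nothing new to read
    with open(os.path.join(landing, "part-002.json"), "w") as f:
        f.write('{"order_id": 3, "amount": 7}\n')
    third = ingest_new_files(landing, checkpoint, bronze)   # only part-002

print(first, second, third, len(bronze))  # 1 0 1 3
```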

Refine into Silver and Gold

Clean and conform the data into Silver, then aggregate business-ready metrics into Gold for analytics.

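```
-- SQL cell - Silver: typed, de-duplicated orders (assumes bronze has order_id, order_date, amount)
CREATE OR REPLACE TABLE sales.analytics.orders_silver AS
SELECT DISTINCT order_id, CAST(order_date AS DATE) AS order_date, CAST(amount AS DECIMAL(18, 2)) AS amount
FROM sales.analytics.orders_bronze WHERE order_id IS NOT NULL AND amount IS NOT NULL;
-- SQL cell - Gold: curated daily revenue aggregate
CREATE OR REPLACE TABLE sales.analytics.daily_revenue_gold AS
SELECT order_date, SUM(amount) AS revenue
FROM sales.analytics.orders_silver
GROUP BY order_date;
```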
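The same Silver-then-Gold refinement can be traced on a handful of records with an in-memory Python sketch. It mirrors the logic of the SQL rather than running on Spark. The field names `order_id`, `order_date`, and `amount` are assumptions about the raw data, and `to_silver` / `to_gold` are hypothetical helpers.

```python
# Toy, in-memory version of the Silver and Gold steps (illustrative only).
# Field names order_id, order_date, amount are assumptions about the raw records.
from collections import defaultdict
from datetime import date
from decimal import Decimal


def to_silver(bronze: list[dict]) -> list[tuple]:
    """Drop incomplete rows, cast types, and remove exact duplicates (like SELECT DISTINCT)."""
    seen, silver = set(), []
    for r in bronze:
        if r.get("order_id") is None or r.get("amount") is None:
            continue
        row = (r["order_id"], date.fromisoformat(r["order_date"]), Decimal(str(r["amount"])))
        if row not in seen:
            seen.add(row)
            silver.append(row)
    return silver


def to_gold(silver: list[tuple]) -> dict:
    """Daily revenue, equivalent to SUM(amount) ... GROUP BY order_date."""
    revenue = defaultdict(Decimal)
    for _order_id, order_date, amount in silver:
        revenue[order_date] += amount
    return dict(sorted(revenue.items()))


bronze = [
    {"order_id": "1", "order_date": "2024-05-01", "amount": "10.50"},
    {"order_id": "1", "order_date": "2024-05-01", "amount": "10.50"},  # duplicate delivery
    {"order_id": "2", "order_date": "2024-05-01", "amount": "4.50"},
    {"order_id": None, "order_date": "2024-05-02", "amount": "99"},    # incomplete row
    {"order_id": "3", "order_date": "2024-05-02", "amount": "7"},
]
gold = to_gold(to_silver(bronze))
for day, revenue in gold.items():
    print(day, revenue)  # prints "2024-05-01 15.00", then "2024-05-02 7"
```

Bronze keeps every delivery as-is. Silver removes the duplicate and the incomplete row. Gold is small enough to serve directly to dashboards.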

Serve to BI and AI
Point a SQL Warehouse at the Gold tables for dashboards, and register features or models through Mosaic AI for downstream applications.
```
-- SQL cell - query served via a SQL Warehouse
SELECT order_date, revenue
FROM sales.analytics.daily_revenue_gold
ORDER BY order_date DESC
LIMIT 30;
```

Configuration Reference

Key configuration options

| Parameter / Option | Type | Default | Description |
|---|---|---|---|
| Compute type | enum (classic / serverless) | Depends on the compute resource and on serverless availability in your region | Whether compute runs in your cloud account (classic) or in Databricks-managed serverless infrastructure |
| Unity Catalog metastore | string | none | The regional governance metastore the workspace attaches to |
| Default catalog | string | `hive_metastore` in workspaces enabled for Unity Catalog manually; the workspace catalog in workspaces enabled automatically | The catalog used when a query omits the catalog name; set it to a Unity Catalog catalog for governed defaults |
| Auto Loader format (`cloudFiles.format`) | enum (json / csv / parquet / ...) | none (required) | Source file format for incremental ingestion |
| Photon acceleration | boolean | On for SQL warehouses and serverless compute; selectable on classic clusters | Vectorized C++ engine that speeds up SQL and DataFrame workloads |

Monitoring, Cost, and Security Considerations

Monitoring

Observe pipelines and queries through built-in system tables (billing, query history, audit logs) and job run history. Centralizing on system tables lets you build a single observability dashboard across all workspaces rather than stitching together per-tool logs.

Cost Optimisation

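Costs are measured in DBUs (Databricks Units), which scale with compute size and runtime. With classic compute, you also pay your cloud provider for the underlying virtual machines. To control DBU spend:

- Prefer serverless or autoscaling SQL Warehouses for spiky BI traffic.
- Enable auto-termination on interactive clusters.
- Use Photon where it pays off. It can shorten wall-clock time on scan-heavy workloads, but Photon-enabled classic compute is billed at a higher DBU rate, so it lowers total cost only when the speedup outweighs that premium.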
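The arithmetic behind these recommendations fits in a few lines of Python. Every rate here is a hypothetical placeholder rather than a Databricks list price, including the assumed 2x Photon DBU multiplier. The sketch also leaves out cloud VM charges for classic compute. It shows why idle time dominates interactive cost, and why Photon wins only when its speedup exceeds its rate premium.

```python
# Back-of-envelope DBU cost model (illustrative only). Every rate below is a
# hypothetical placeholder, not a Databricks list price; classic compute also
# incurs cloud VM charges that this sketch leaves out.
def dbu_cost(dbu_per_hour: float, hours: float, price_per_dbu: float,
             photon_multiplier: float = 1.0) -> float:
    """Cost = DBU rate x any Photon rate premium x runtime x price per DBU."""
    dbus = dbu_per_hour * photon_multiplier * hours
    return dbus * price_per_dbu


RATE = 4.0    # hypothetical DBUs per hour for a given cluster size
PRICE = 0.50  # hypothetical dollars per DBU

# Interactive cluster: 3 hours of real use (one continuous session) in a 10-hour day
always_on = dbu_cost(RATE, 10, PRICE)               # left running all day
auto_terminated = dbu_cost(RATE, 3 + 0.5, PRICE)    # use plus a 30-minute idle timeout

# Scheduled job: Photon at an assumed 2x DBU rate pays off only above a 2x speedup
baseline_job = dbu_cost(RATE, 2.0, PRICE)
photon_job = dbu_cost(RATE, 2.0 / 3, PRICE, photon_multiplier=2.0)  # assumed 3x faster

print(always_on, auto_terminated)                    # 20.0 7.0
print(round(baseline_job, 2), round(photon_job, 2))  # 4.0 2.67
```

With an assumed speedup below the multiplier (for example 1.5x), the same function would show Photon costing more than the baseline.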

Security and Governance

Unity Catalog centralizes access control, row/column security, lineage, and auditing across every workspace on a metastore. Keep data in your own storage, use private networking for the workspace, and manage credentials through secret scopes rather than embedding them in code.
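Reading a table in Unity Catalog requires more than a `SELECT` grant on that table. The principal also needs `USE CATALOG` and `USE SCHEMA` on the parent containers, and privileges granted on a parent are inherited by its children. In SQL, these grants look like `GRANT USE CATALOG ON CATALOG sales TO analysts`.

The following toy Python model captures only that rule. `can_select` and the grant tuples are hypothetical. The model ignores ownership, group membership, `ALL PRIVILEGES`, and row filters or column masks.

```python
# Toy model of Unity Catalog's read-access rule (illustrative only; ignores ownership,
# group membership, ALL PRIVILEGES, and row filters/column masks).
Grant = tuple[str, str, str]  # (principal, privilege, securable)


def can_select(grants: set[Grant], principal: str, table: str) -> bool:
    catalog, schema, _table = table.split(".")
    schema_fqn = f"{catalog}.{schema}"

    def has(privilege: str, *securables: str) -> bool:
        # A privilege granted on a parent container is inherited by its children
        return any((principal, privilege, s) in grants for s in securables)

    return (
        has("USE CATALOG", catalog)
        and has("USE SCHEMA", schema_fqn, catalog)
        and has("SELECT", table, schema_fqn, catalog)
    )


grants = {
    ("analysts", "USE CATALOG", "sales"),
    ("analysts", "USE SCHEMA", "sales.analytics"),
    ("analysts", "SELECT", "sales.analytics"),  # inherited by every table in the schema
    ("interns", "SELECT", "sales.analytics.daily_revenue_gold"),  # no USE CATALOG / USE SCHEMA
}
print(can_select(grants, "analysts", "sales.analytics.daily_revenue_gold"))  # True
print(can_select(grants, "interns", "sales.analytics.daily_revenue_gold"))   # False
```

Granting `SELECT` at the schema level keeps grants manageable as tables are added. The traversal privileges act as a gate in front of it.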

Common Pitfalls and Recommended Patterns

Treating the lakehouse as a raw dump: enforce the Bronze/Silver/Gold pattern so consumers query curated, reliable tables.
Skipping Unity Catalog: starting in the legacy hive_metastore creates governance debt; begin in a UC catalog.
Over-provisioning always-on clusters: use autoscaling and auto-termination to avoid idle DBU burn.
Copying data into many tools: keep one governed copy and connect tools to it instead of exporting.
Ignoring lineage: rely on Unity Catalog lineage to understand impact before changing upstream tables.
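Lineage-based impact analysis is essentially a graph traversal. The following Python sketch walks downstream from a table over (source, target) edges. The edge list is made up, including the hypothetical `customer_ltv_gold` table. In practice, you might read such edges from Unity Catalog lineage, for example the `system.access.table_lineage` system table. `downstream` is a hypothetical helper.

```python
# Hypothetical downstream-impact check over table-lineage edges (illustrative only).
from collections import deque


def downstream(edges: list[tuple[str, str]], start: str) -> list[str]:
    # Build an adjacency list from (source_table, target_table) pairs
    children: dict[str, set[str]] = {}
    for src, dst in edges:
        children.setdefault(src, set()).add(dst)
    # Breadth-first search; the seen set also protects against cycles
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


edges = [
    ("sales.analytics.orders_bronze", "sales.analytics.orders_silver"),
    ("sales.analytics.orders_silver", "sales.analytics.daily_revenue_gold"),
    ("sales.analytics.orders_silver", "sales.analytics.customer_ltv_gold"),  # hypothetical table
]
impacted = downstream(edges, "sales.analytics.orders_bronze")
print(impacted)
```

Every table in `impacted` is a candidate to re-test before changing the schema of `orders_bronze`.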

Frequently Asked Questions

Is Databricks just managed Apache Spark?

No. Spark is one compute engine within the platform, but Databricks adds Delta Lake storage, Unity Catalog governance, Photon, SQL Warehouses, orchestration, and Mosaic AI as an integrated system.

Where does my data physically live?

In a standard deployment, your table data resides in cloud object storage in your own account, in open Delta format. The control plane stores metadata and orchestration state rather than your raw records.

What is the difference between classic and serverless compute?

Classic compute runs in your cloud account and gives you full network control; serverless runs in Databricks-managed infrastructure for faster startup and less operational overhead. Both are governed identically by Unity Catalog.

Do I need to choose between data warehousing and data science?

No. The lakehouse supports SQL analytics, data engineering, and AI on the same governed tables, which removes the need for separate, siloed platforms.