Databricks on Azure
Who this is for:
Architecture / Concept Overview: Databricks on Azure
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
A[Azure Event Hubs / IoT Hub] -->|Stream| B[Azure Databricks Workspace]
C[Azure Data Lake Storage Gen2] -->|Batch| B
D[Azure SQL / Cosmos DB] -->|CDC| B
B -->|Transform| E[Delta Lake on ADLS Gen2]
E -->|Serve| F[Azure Synapse / Power BI]
E -->|Govern| G[Unity Catalog]
A:::source
C:::source
D:::source
B:::processing
E:::storage
F:::serving
G:::governance
```
*Azure Databricks data pipeline showing ingestion from Azure-native sources through processing and serving layers.*
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
SUB[Azure Subscription] --> RG[Resource Group]
RG --> WS[Databricks Workspace]
RG --> VNET[Virtual Network]
RG --> SA[Storage Account - ADLS Gen2]
WS --> CP[Control Plane - Databricks Managed]
WS --> DP[Data Plane - Customer VNet]
DP --> PUB[Public Subnet]
DP --> PRIV[Private Subnet]
SUB:::source
RG:::ingestion
WS:::processing
VNET:::storage
SA:::storage
CP:::governance
DP:::serving
PUB:::serving
PRIV:::serving
```
*Azure Databricks resource hierarchy showing the relationship between Azure subscription resources and the Databricks control/data plane split.*
Key Terms
Prerequisites and Setup
- An active Azure subscription with Owner or Contributor role on the target resource group
- Azure CLI installed and authenticated (`az login`)
- The `Microsoft.Databricks` resource provider registered in the subscription
- A Virtual Network with two dedicated subnets (minimum /26 each) if using VNet injection
- An Azure Data Lake Storage Gen2 account for workspace root storage
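
The CLI-related prerequisites above can be checked with a short script. This is a minimal sketch, not a definitive setup procedure: the `run` helper, the `DRY_RUN` switch, and the `usable_ips` function are illustrative names introduced here, and the script only prints the `az` commands unless `DRY_RUN=0` is set. The subnet arithmetic assumes Azure's documented reservation of 5 addresses in every subnet.

```shell
#!/bin/sh
# Sketch of the prerequisite checks. DRY_RUN=1 (the default here) only prints
# the az commands; set DRY_RUN=0 to run them against your subscription.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usable addresses in an Azure subnet with the given prefix length.
# Azure reserves 5 addresses in every subnet.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

check_prerequisites() {
  # Authenticate the CLI.
  run az login
  # Register the Databricks resource provider in the current subscription.
  run az provider register --namespace Microsoft.Databricks
  # Show the registration state of the provider.
  run az provider show --namespace Microsoft.Databricks \
    --query registrationState --output tsv
}

check_prerequisites
echo "Usable IPs in a /26 subnet: $(usable_ips 26)"
```

Provider registration can take a few minutes, so the `az provider show` call may need to be repeated until it reports `Registered`.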
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| `--sku` | Workspace pricing tier (`standard`, `premium`, `trial`) | `standard` | `premium` |
| `--enable-no-public-ip` | Disable public IPs on cluster nodes | `false` | `true` |
| `--managed-resource-group` | Resource group for Databricks-managed resources | auto-generated | explicit name |
| `--vnet` | Customer-managed VNet resource ID | none | set for production |
| `--private-subnet` | Subnet for private cluster communication | none | /26 or larger |
| `--public-subnet` | Subnet for Databricks control plane communication | none | /26 or larger |
| `--require-infrastructure-encryption` | Double encryption for DBFS root | `false` | `true` for sensitive data |
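
The parameters in the table can be combined into a single `az databricks workspace create` call. The sketch below uses the recommended values from the table and assumes the `databricks` Azure CLI extension is installed. All resource names, the region, and the VNet ID are hypothetical placeholders, and the `run` helper with its `DRY_RUN` switch is an illustrative convention rather than part of the CLI. By default the script only prints the command, so it can be reviewed before anything is created.

```shell
#!/bin/sh
# Sketch: create a VNet-injected workspace with the recommended settings.
# DRY_RUN=1 (the default here) prints the command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Placeholder values (assumptions): replace with your own names and IDs.
RESOURCE_GROUP="rg-databricks-demo"
WORKSPACE_NAME="dbw-demo"
LOCATION="westeurope"
VNET_ID="/subscriptions/<subscription-id>/resourceGroups/rg-databricks-demo/providers/Microsoft.Network/virtualNetworks/vnet-demo"
PRIVATE_SUBNET="snet-private"   # existing subnet, /26 or larger
PUBLIC_SUBNET="snet-public"     # existing subnet, /26 or larger

create_workspace() {
  run az databricks workspace create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$WORKSPACE_NAME" \
    --location "$LOCATION" \
    --sku premium \
    --enable-no-public-ip true \
    --managed-resource-group "${WORKSPACE_NAME}-managed-rg" \
    --vnet "$VNET_ID" \
    --private-subnet "$PRIVATE_SUBNET" \
    --public-subnet "$PUBLIC_SUBNET" \
    --require-infrastructure-encryption true
}

create_workspace
```

Naming the managed resource group explicitly, as the table recommends, keeps the Databricks-managed resources easy to find later. Run with `DRY_RUN=0` only after the placeholders have been replaced.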