Azure Databricks Architecture and Key Azure Integrations
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
CP[Control Plane - Databricks Managed] --> UI[Workspace UI]
CP --> SCHED[Job Scheduler]
CP --> NB[Notebook Service]
CP --> REST[REST API Gateway]
DP[Data Plane - Customer VNet] --> DRIVER[Driver Node]
DP --> WORKER[Worker Nodes]
DP --> DBFS[DBFS Mount]
DRIVER --> ADLS[ADLS Gen2]
DRIVER --> KV[Azure Key Vault]
CP ---|Secure Channel| DP
CP:::processing
UI:::serving
SCHED:::processing
NB:::processing
REST:::serving
DP:::storage
DRIVER:::processing
WORKER:::processing
DBFS:::storage
ADLS:::storage
KV:::governance
*Azure Databricks control plane and data plane architecture showing the separation of managed services from customer-owned compute resources.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
ADF[Azure Data Factory] -->|Trigger| DBX[Databricks Jobs]
EH[Event Hubs] -->|Stream| DBX
ADLS[ADLS Gen2] -->|Storage| DBX
DBX -->|Write| DELTA[Delta Tables]
DELTA -->|Serve| SYN[Synapse Serverless]
DELTA -->|Serve| PBI[Power BI DirectQuery]
KV[Key Vault] -->|Secrets| DBX
AAD["Microsoft Entra ID (formerly Azure AD)"] -->|Identity| DBX
ADF:::ingestion
EH:::source
ADLS:::storage
DBX:::processing
DELTA:::storage
SYN:::serving
PBI:::serving
KV:::governance
AAD:::governance
*Key Azure service integrations with Databricks showing data flow from ingestion through processing to downstream consumption.*
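The Event Hubs to Databricks edge in the diagram is commonly implemented through the Kafka-compatible endpoint of Event Hubs. The sketch below builds the reader options for Structured Streaming. The function name `event_hubs_kafka_options` and all sample values are hypothetical, and the `kafkashaded.` class prefix is an assumption that holds on Databricks runtimes; outside Databricks the unshaded `org.apache.kafka...` class name would normally be used.

```python
from typing import Dict


def event_hubs_kafka_options(
    namespace: str, event_hub: str, connection_string: str
) -> Dict[str, str]:
    """Build Kafka source options for an Event Hubs namespace (illustrative helper)."""
    # Event Hubs accepts SASL PLAIN with the literal user name "$ConnectionString"
    # and the connection string as the password.
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule "
        f'required username="$ConnectionString" password="{connection_string}";'
    )
    return {
        # The Kafka endpoint of Event Hubs listens on port 9093.
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        # The event hub name plays the role of the Kafka topic.
        "subscribe": event_hub,
    }
```

In a notebook, the connection string would typically come from a secret scope rather than a literal, for example `dbutils.secrets.get(scope=..., key=...)`, and the resulting dictionary can be passed to `spark.readStream.format("kafka").options(**opts).load()`.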
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
USER[User Request] -->|HTTPS| LB[Azure Load Balancer]
LB --> CP[Control Plane]
CP -->|NAT Relay| DRIVER[Driver in Customer VNet]
DRIVER -->|Private Endpoint| ADLS[ADLS Gen2]
DRIVER -->|Private Endpoint| KV[Key Vault]
DRIVER -->|ExpressRoute or VPN Gateway| ONPREM[On-Premises Network]
USER:::source
LB:::ingestion
CP:::processing
DRIVER:::processing
ADLS:::storage
KV:::governance
ONPREM:::source
*Network connectivity flow showing how user requests reach the data plane and how clusters access Azure services via private endpoints.*
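One way to sanity-check the private endpoint paths in this flow is to resolve a service hostname from a cluster and confirm that it maps to a private address. The following is a minimal sketch under that assumption; `resolves_privately` is a hypothetical helper, and the resolver is injectable so the check can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable


def resolves_privately(
    hostname: str, resolver: Callable[[str], str] = socket.gethostbyname
) -> bool:
    """Return True if the hostname resolves to a private (RFC 1918 style) address."""
    # With a working private endpoint and private DNS zone, the service FQDN
    # should resolve to an address inside the VNet rather than a public IP.
    address = ipaddress.ip_address(resolver(hostname))
    return address.is_private
```

Run from a notebook attached to a cluster, a call such as `resolves_privately("<storage-account>.dfs.core.windows.net")` returning `False` would suggest that traffic to that service is still leaving over a public endpoint.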
Key Terms
Prerequisites and Setup
- An Azure Databricks workspace (Premium tier recommended for full integration support)
- Azure Data Lake Storage Gen2 account with hierarchical namespace enabled
- Azure Key Vault instance for secrets management
- Service principals or managed identities configured for service-to-service authentication
- Network line-of-sight between the Databricks VNet and target Azure services (via private endpoints or service endpoints)
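With hierarchical namespace enabled, data in the storage account is usually addressed through `abfss://` URIs. As a small illustration, the helper below assembles such a URI; the function name `abfss_uri` and the container and account names are hypothetical.

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    # Normalize leading and trailing slashes so callers can pass either form.
    clean = path.strip("/")
    return f"{base}/{clean}" if clean else f"{base}/"


# Hypothetical names, for illustration only.
print(abfss_uri("raw", "examplelake", "/sales/2024/"))
# prints abfss://raw@examplelake.dfs.core.windows.net/sales/2024
```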
Step-by-Step Implementation
Configuration Reference
| Integration | Auth Method | Configuration Key | Notes |
|---|---|---|---|
| ADLS Gen2 | Service Principal | fs.azure.account.oauth2.* | Preferred for production workloads |
| ADLS Gen2 | Credential Passthrough | Cluster config flag | Legacy option, Premium tier only, per-user ACL; prefer Unity Catalog for new work |
| ADLS Gen2 | Unity Catalog | External Location | Recommended for new deployments |
| Key Vault | Secret Scope | Scope backend type | Supports both Key Vault and Databricks backends |
| Event Hubs | Connection String | Kafka protocol settings | Use Key Vault to store connection strings |
| Azure Data Factory | MSI or Service Principal | Linked Service | Use managed identity for passwordless auth |
| Synapse | JDBC/ODBC | Serverless SQL endpoint | Direct query over Delta tables in ADLS |
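To make the `fs.azure.account.oauth2.*` row concrete, the sketch below expands it into the per-account Spark settings commonly used for service principal access to ADLS Gen2. The helper name `adls_oauth_conf` and its argument values are hypothetical; the client secret is assumed to come from a Key Vault-backed secret scope rather than from source code.

```python
from typing import Dict


def adls_oauth_conf(
    storage_account: str, client_id: str, client_secret: str, tenant_id: str
) -> Dict[str, str]:
    """Build Spark settings for OAuth (service principal) access to ADLS Gen2."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}": (
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider"
        ),
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}": (
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
        ),
    }


# In a Databricks notebook (not runnable elsewhere), the settings could be applied as:
#   secret = dbutils.secrets.get(scope="example-scope", key="sp-client-secret")
#   for key, value in adls_oauth_conf("examplelake", "<app-id>", secret, "<tenant-id>").items():
#       spark.conf.set(key, value)
```

Scoping every key to the storage account host keeps the credentials from applying to other accounts accessed in the same session.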