Azure Databricks Architecture and Key Azure Integrations

    Who this is for:

    Architecture / Concept Overview: Azure Databricks Architecture and Key Azure Integrations

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        CP[Control Plane - Databricks Managed] --> UI[Workspace UI]
        CP --> SCHED[Job Scheduler]
        CP --> NB[Notebook Service]
        CP --> REST[REST API Gateway]
        DP[Data Plane - Customer VNet] --> DRIVER[Driver Node]
        DP --> WORKER[Worker Nodes]
        DP --> DBFS[DBFS Mount]
        DRIVER --> ADLS[ADLS Gen2]
        DRIVER --> KV[Azure Key Vault]
        CP ---|Secure Channel| DP
        CP:::processing
        UI:::serving
        SCHED:::processing
        NB:::processing
        REST:::serving
        DP:::storage
        DRIVER:::processing
        WORKER:::processing
        DBFS:::storage
        ADLS:::storage
        KV:::governance

    *Azure Databricks control plane and data plane architecture showing the separation of managed services from customer-owned compute resources.*
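As a small illustration of the control plane's REST API gateway, the sketch below builds (but does not send) an authenticated request that lists the clusters in a workspace. It assumes the `GET /api/2.0/clusters/list` endpoint and bearer-token authentication; the workspace URL and token shown in the usage comment are placeholders, and `clusters_list_request` is a hypothetical helper name.

```python
import urllib.request


def clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the control plane's cluster-list endpoint.

    workspace_url: base URL of the workspace (placeholder in the example below).
    token: a bearer token, for example a personal access token or an
           identity-provider token accepted by the workspace.
    """
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Example (placeholder values; sending requires network access and a real token):
# req = clusters_list_request("https://adb-1234567890123456.7.azuredatabricks.net", token)
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The request is answered by the control plane, while the clusters it describes run in the data plane inside the customer VNet.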

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        ADF[Azure Data Factory] -->|Trigger| DBX[Databricks Jobs]
        EH[Event Hubs] -->|Stream| DBX
        ADLS[ADLS Gen2] -->|Storage| DBX
        DBX -->|Write| DELTA[Delta Tables]
        DELTA -->|Serve| SYN[Synapse Serverless]
        DELTA -->|Serve| PBI[Power BI Direct Query]
        KV[Key Vault] -->|Secrets| DBX
        AAD[Azure AD] -->|Identity| DBX
        ADF:::ingestion
        EH:::source
        ADLS:::storage
        DBX:::processing
        DELTA:::storage
        SYN:::serving
        PBI:::serving
        KV:::governance
        AAD:::governance

    *Key Azure service integrations with Databricks showing data flow from ingestion through processing to downstream consumption.*

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        USER[User Request] -->|HTTPS| LB[Azure Load Balancer]
        LB --> CP[Control Plane]
        CP -->|NAT Relay| DRIVER[Driver in Customer VNet]
        DRIVER -->|Private Endpoint| ADLS[ADLS Gen2]
        DRIVER -->|Private Endpoint| KV[Key Vault]
        DRIVER -->|VNet Peering| ONPREM[On-Premises Network]
        USER:::source
        LB:::ingestion
        CP:::processing
        DRIVER:::processing
        ADLS:::storage
        KV:::governance
        ONPREM:::source

    *Network connectivity flow showing how user requests reach the data plane and how clusters access Azure services via private endpoints.*

    Key Terms

    Prerequisites and Setup

    • An Azure Databricks workspace (Premium tier recommended for full integration support)
    • Azure Data Lake Storage Gen2 account with hierarchical namespace enabled
    • Azure Key Vault instance for secrets management
    • Service principals or managed identities configured for service-to-service authentication
    • Network line-of-sight between the Databricks VNet and target Azure services (via private endpoints or service endpoints)
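The prerequisites above (an ADLS Gen2 account, a Key Vault for secrets, and a service principal) typically come together in a handful of Spark settings. The following is a minimal sketch, assuming OAuth 2.0 client-credentials access through the ABFS driver; the helper names `adls_oauth_conf` and `abfss_uri`, the secret scope name, and the storage account name are hypothetical.

```python
from typing import Dict


def adls_oauth_conf(storage_account: str, tenant_id: str,
                    client_id: str, client_secret: str) -> Dict[str, str]:
    """Return the Spark settings for service-principal access to one ADLS Gen2 account."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside a container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


# In a Databricks notebook (spark and dbutils exist only there; names are placeholders):
# secret = dbutils.secrets.get(scope="kv-scope", key="sp-client-secret")
# for key, value in adls_oauth_conf("examplelake", tenant_id, client_id, secret).items():
#     spark.conf.set(key, value)
# df = spark.read.format("delta").load(abfss_uri("raw", "examplelake", "events"))
```

Reading the client secret from a Key Vault-backed secret scope keeps it out of notebook source and cluster configuration.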

    Step-by-Step Implementation

      Configuration Reference

      Configuration options for the key Azure integrations
      Integration        | Auth Method            | Configuration Key          | Notes
      -------------------|------------------------|----------------------------|------------------------------------------------
      ADLS Gen2          | Service Principal      | fs.azure.account.oauth2.*  | Preferred for production workloads
      ADLS Gen2          | Credential Passthrough | Cluster config flag        | Premium tier only, per-user ACL
      ADLS Gen2          | Unity Catalog          | External Location          | Recommended for new deployments
      Key Vault          | Secret Scope           | Scope backend type         | Supports both Key Vault and Databricks backends
      Event Hubs         | Connection String      | Kafka protocol settings    | Use Key Vault to store connection strings
      Azure Data Factory | Linked Service         | MSI or Service Principal   | Use managed identity for passwordless auth
      Synapse            | JDBC/ODBC              | Serverless SQL endpoint    | Direct query over Delta tables in ADLS
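To make the Event Hubs row concrete, here is a hedged sketch of the Kafka protocol settings a Structured Streaming reader might use against the Event Hubs Kafka-compatible endpoint. It assumes SASL PLAIN authentication with the literal username `$ConnectionString`; the function name, namespace, hub name, and secret scope are placeholders, and the `kafkashaded.` prefix on the login module is an assumption that applies to Databricks runtimes (open-source Spark uses the unprefixed class).

```python
from typing import Dict

# Assumption: Databricks runtimes ship a shaded Kafka client, hence the prefix.
DATABRICKS_LOGIN_MODULE = (
    "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule"
)


def event_hubs_kafka_options(namespace: str, event_hub: str, connection_string: str,
                             login_module: str = DATABRICKS_LOGIN_MODULE) -> Dict[str, str]:
    """Return Kafka source options for reading one event hub over the Kafka protocol."""
    jaas = (
        f"{login_module} required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "subscribe": event_hub,  # the event hub name acts as the Kafka topic
    }


# In a Databricks notebook (placeholder scope, key, and names):
# conn = dbutils.secrets.get(scope="kv-scope", key="eh-connection-string")
# stream = (spark.readStream.format("kafka")
#           .options(**event_hubs_kafka_options("example-ns", "telemetry", conn))
#           .load())
```

Fetching the connection string from a secret scope at run time follows the table's note about storing connection strings in Key Vault.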

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions