Common Azure Databricks Setup Issues and Troubleshooting
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
ISSUE[Issue Detected] -->|Identify| CAT{Issue Category}
CAT -->|Provisioning| PROV[Workspace Deploy Failures]
CAT -->|Cluster| CLU[Cluster Launch Errors]
CAT -->|Network| NET[Connectivity Problems]
CAT -->|Auth| AUTH[Authentication Failures]
PROV -->|Fix| RESOLVE[Resolution Applied]
CLU -->|Fix| RESOLVE
NET -->|Fix| RESOLVE
AUTH -->|Fix| RESOLVE
RESOLVE -->|Validate| TEST[Test and Confirm]
ISSUE:::source
CAT:::ingestion
PROV:::processing
CLU:::processing
NET:::storage
AUTH:::governance
RESOLVE:::serving
TEST:::serving
*Troubleshooting workflow categorizing issues and routing to resolution paths.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
D1[Cluster Won't Start] --> C1{Check Subnet IPs}
C1 -->|Exhausted| F1[Expand Subnet CIDR]
C1 -->|Available| C2{Check NSG Rules}
C2 -->|Blocking| F2[Allow Databricks Service Tags]
C2 -->|OK| C3{Check Quota}
C3 -->|Exceeded| F3[Request Quota Increase]
C3 -->|OK| C4{Check Region Capacity}
C4 -->|Constrained| F4[Try Different VM Size/Region]
D1:::source
C1:::ingestion
F1:::serving
C2:::processing
F2:::serving
C3:::governance
F3:::serving
C4:::storage
F4:::serving
*Decision tree for diagnosing cluster startup failures.*
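The decision tree above can be sketched as a small function that runs the checks in the same order. This is a minimal sketch, not a definitive diagnostic tool: `ClusterStartContext` and all of its fields are hypothetical inputs you would collect yourself (for example from the Azure portal or cluster event logs). The only outside fact used is that Azure reserves five addresses in every subnet.

```python
import ipaddress
from dataclasses import dataclass

# Azure reserves 5 addresses in every subnet, so they are never assignable.
AZURE_RESERVED_IPS_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses Azure leaves assignable in a subnet."""
    total = ipaddress.ip_network(cidr, strict=True).num_addresses
    return max(total - AZURE_RESERVED_IPS_PER_SUBNET, 0)


@dataclass
class ClusterStartContext:
    # Hypothetical inputs, gathered manually or by your own tooling.
    subnet_cidr: str
    ips_in_use: int
    nodes_requested: int
    nsg_allows_databricks: bool
    cores_available: int
    cores_requested: int
    region_has_capacity: bool


def diagnose(ctx: ClusterStartContext) -> str:
    """Walk the checks in the same order as the decision tree."""
    free = usable_ips(ctx.subnet_cidr) - ctx.ips_in_use
    if free < ctx.nodes_requested:
        return "Expand Subnet CIDR"
    if not ctx.nsg_allows_databricks:
        return "Allow Databricks Service Tags"
    if ctx.cores_requested > ctx.cores_available:
        return "Request Quota Increase"
    if not ctx.region_has_capacity:
        return "Try Different VM Size/Region"
    return "No cause found by these checks"
```

For example, a `/26` subnet has 64 addresses, of which 59 are assignable. With 55 already in use, a request for 8 nodes fails the first check and the function returns "Expand Subnet CIDR".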
Key Terms
Prerequisites and Setup
- Azure CLI with access to the affected subscription and resource group
- Access to the Databricks workspace admin console (or API token)
- Log Analytics workspace configured with Databricks diagnostic logs (recommended)
- Familiarity with Azure Activity Log and Databricks cluster event logs
- Network tools: `nslookup`, `traceroute` (run from within a cluster via a notebook if needed)
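If shell tools are not convenient inside a notebook, roughly the same information as `nslookup` plus a basic reachability test can be gathered with Python's standard `socket` module. The sketch below assumes you run it in a notebook cell attached to the affected cluster. The function names are illustrative, and the hostname you pass in is whatever endpoint you are investigating.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the unique IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # DNS lookup failed (unknown name, or no resolver reachable).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(hostname: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to hostname:port succeeds within the timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return False
```

An empty list from `resolve_host` points at DNS. A resolved address with `can_connect` returning `False` points at a firewall, NSG, or routing problem. This does not replace `traceroute`, since it says nothing about where along the path traffic is dropped.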
Step-by-Step Implementation
Configuration Reference
| Error Code / Symptom | Root Cause | Resolution |
|---|---|---|
| `CLOUD_PROVIDER_LAUNCH_FAILURE` | Subnet IP exhaustion or VM quota exceeded | Expand subnet or request quota increase |
| `INIT_SCRIPT_FAILURE` | Init script error or timeout | Check init script logs in cluster log delivery path |
| `CLOUD_PROVIDER_RESOURCE_STOCKOUT` | Azure region capacity constraint | Use a different VM size or region |
| Workspace shows "Failed" provisioning | Missing resource provider or permission | Register provider, verify RBAC roles |
| 403 on workspace URL | IP access list blocking your IP | Update IP access list or access via Azure portal |
| Cluster stuck in "Pending" | NSG blocking control plane communication | Allow AzureDatabricks service tag on ports 443, 8443-8451 |
| Storage mount fails with 403 | Service principal missing RBAC on storage | Assign Storage Blob Data Contributor role |
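The error-code rows of the table above can also be kept as a lookup, so that a termination message copied from the cluster event log maps straight to a suggested next step. This is a minimal sketch: the dictionary contains only the three codes listed in the table, and `suggest_resolution` is a hypothetical helper, not part of any Databricks SDK.

```python
from typing import Optional, Tuple

# code -> (root cause, resolution), copied from the table above.
TERMINATION_GUIDE = {
    "CLOUD_PROVIDER_LAUNCH_FAILURE": (
        "Subnet IP exhaustion or VM quota exceeded",
        "Expand subnet or request quota increase",
    ),
    "INIT_SCRIPT_FAILURE": (
        "Init script error or timeout",
        "Check init script logs in cluster log delivery path",
    ),
    "CLOUD_PROVIDER_RESOURCE_STOCKOUT": (
        "Azure region capacity constraint",
        "Use a different VM size or region",
    ),
}


def suggest_resolution(message: str) -> Optional[Tuple[str, str, str]]:
    """Return (code, root cause, resolution) for the first known code in message."""
    for code, (cause, resolution) in TERMINATION_GUIDE.items():
        if code in message:
            return code, cause, resolution
    return None  # Unknown code: fall back to manual triage.
```

Substring matching is used so the helper works on a full log line as well as on a bare code. Messages that contain no listed code return `None` instead of a guess.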