- Condition: sts:ExternalId matches your Databricks account ID
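The condition above lives in the trust policy of the cross-account IAM role. Here is a minimal sketch. It assumes the trust policy has already been parsed into a Python dict; `aws iam get-role` prints it as JSON under `Role.AssumeRolePolicyDocument`. The hypothetical helper `has_external_id_condition` looks for an `Allow` statement on `sts:AssumeRole` whose `StringEquals` block pins `sts:ExternalId` to your Databricks account ID. It ignores other condition operators, wildcard actions such as `sts:*`, and the case-insensitivity of IAM key names. Treat a `False` result as a prompt to read the policy, not as a verdict.

```python
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """IAM accepts a single value or a list for Statement/Action; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def has_external_id_condition(trust_policy: dict, account_id: str) -> bool:
    """Return True if an Allow statement for sts:AssumeRole requires
    sts:ExternalId to equal account_id through a StringEquals condition."""
    for stmt in _as_list(trust_policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        # Simplification: exact, case-sensitive match on the action name only.
        if "sts:AssumeRole" not in _as_list(stmt.get("Action")):
            continue
        string_equals = stmt.get("Condition", {}).get("StringEquals", {})
        if account_id in _as_list(string_equals.get("sts:ExternalId")):
            return True
    return False


# Hypothetical example: the principal and the account ID are placeholders.
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": "00000000-0000-0000-0000-000000000000"}
            },
        }
    ],
}

if __name__ == "__main__":
    # prints True
    print(has_external_id_condition(EXAMPLE_TRUST_POLICY, "00000000-0000-0000-0000-000000000000"))
```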
Who this is for:
Architecture / Concept Overview:
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DEPLOY[Deployment Attempt] -->|Fail| DIAG{Diagnose}
DIAG -->|IAM| IAM_FIX[Fix IAM Policies/Trust]
DIAG -->|Network| NET_FIX[Fix VPC/Subnet/SG]
DIAG -->|Storage| S3_FIX[Fix S3 Bucket Config]
DIAG -->|Compute| EC2_FIX[Fix Quota/Instance Type]
DIAG -->|DNS| DNS_FIX[Fix DNS Resolution]
IAM_FIX --> RETRY[Retry Deployment]
NET_FIX --> RETRY
S3_FIX --> RETRY
EC2_FIX --> RETRY
DNS_FIX --> RETRY
RETRY -->|Success| DONE[Workspace Running]
DEPLOY:::source
DIAG:::ingestion
IAM_FIX:::governance
NET_FIX:::storage
S3_FIX:::storage
EC2_FIX:::processing
DNS_FIX:::serving
RETRY:::processing
DONE:::serving
```
*Troubleshooting workflow for Databricks on AWS deployment failures.*
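The workflow above is a loop: deploy, diagnose the failure into one of the five branches, apply the matching fix, and retry until the workspace is running. The sketch below expresses that loop in Python, with the deploy, diagnose, and fix steps passed in as callables. The function name `troubleshoot` is an illustrative assumption, as is the `max_fixes` bound, which the diagram does not have. Nothing here calls a real Databricks or AWS API.

```python
from typing import Callable, Dict, List, Optional

# Branch labels taken from the diagram's Diagnose edges.
CATEGORIES = ("IAM", "Network", "Storage", "Compute", "DNS")


def troubleshoot(
    deploy: Callable[[], Optional[str]],
    diagnose: Callable[[str], str],
    fixes: Dict[str, Callable[[], None]],
    max_fixes: int = 5,
) -> List[str]:
    """Deploy, diagnose, fix, and retry until deploy() succeeds.

    deploy() returns None on success or an error description on failure.
    diagnose(error) returns one of CATEGORIES.
    Returns the categories that were fixed, in order.
    """
    applied: List[str] = []
    error = deploy()  # Deployment Attempt
    while error is not None:
        if len(applied) >= max_fixes:
            raise RuntimeError(f"still failing after {max_fixes} fixes: {error}")
        category = diagnose(error)  # Diagnose
        if category not in CATEGORIES or category not in fixes:
            raise ValueError(f"no fix registered for {category!r}")
        fixes[category]()  # Fix IAM / Network / Storage / Compute / DNS
        applied.append(category)
        error = deploy()  # Retry Deployment
    return applied  # Workspace Running
```

In practice `deploy` might wrap a workspace-creation request, and `diagnose` might apply the symptom-to-cause mapping from the Configuration Reference table. Both are left to the caller here.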
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
TOP5[Top 5 Deployment Pitfalls] --> P1[1. Cross-Account Trust Missing External ID]
TOP5 --> P2[2. Subnet Has No Route to Internet]
TOP5 --> P3[3. S3 Bucket Policy Denies Databricks]
TOP5 --> P4[4. Security Group Blocks Control Plane]
TOP5 --> P5[5. EC2 Service Quota Exceeded]
P1 --> F1[STS AssumeRole fails silently]
P2 --> F2[Cluster nodes cannot reach control plane]
P3 --> F3[Workspace creation stuck or failed]
P4 --> F4[Cluster state stuck in Pending]
P5 --> F5[RunInstances fails with VcpuLimitExceeded]
TOP5:::source
P1:::governance
P2:::storage
P3:::storage
P4:::governance
P5:::processing
F1:::ingestion
F2:::ingestion
F3:::ingestion
F4:::ingestion
F5:::ingestion
```
*Top five deployment pitfalls and their downstream failure symptoms.*
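Pitfall 3 usually traces back to an explicit `Deny`, because in IAM policy evaluation an explicit deny overrides any allow. This is a minimal sketch. It assumes the bucket policy has been fetched with `aws s3api get-bucket-policy` and its `Policy` string parsed with `json.loads`. The hypothetical helper `deny_statements_matching` lists the `Deny` statements whose `Action` pattern would match a given S3 action. It approximates IAM wildcard matching with `fnmatch` and ignores `NotAction`, `Principal`, and `Condition`, so every statement it returns still needs a manual read.

```python
from fnmatch import fnmatchcase
from typing import Any, List


def _as_list(value: Any) -> List[Any]:
    """Policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def deny_statements_matching(policy: dict, action: str) -> List[dict]:
    """Return Deny statements whose Action patterns match `action`.

    IAM action matching is case-insensitive with * and ? wildcards; lowercasing
    both sides and using fnmatchcase approximates that (fnmatch also treats
    [...] as a character class, which IAM does not). NotAction, Principal and
    Condition are not evaluated.
    """
    hits = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Deny":
            continue
        patterns = _as_list(stmt.get("Action"))
        if any(fnmatchcase(action.lower(), str(p).lower()) for p in patterns):
            hits.append(stmt)
    return hits


# Hypothetical bucket policy; the bucket name and account ID are placeholders.
EXAMPLE_BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-root-bucket/*",
        },
        {
            # Common pattern: deny any request not made over TLS.
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-root-bucket",
                "arn:aws:s3:::example-root-bucket/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
    ],
}
```

In this example the `Deny` on `s3:*` matches every S3 action but only applies to non-TLS requests. That is why the `Condition` of each returned statement has to be checked by hand.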
Key Terms
Prerequisites and Setup
- AWS CLI configured with admin access to the affected account
- Access to the Databricks account console and workspace (if deployed)
- CloudTrail enabled and logging management events
- Familiarity with the IAM Policy Simulator and VPC Reachability Analyzer
- Access to Databricks cluster event logs and driver logs
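With CloudTrail in place, the first diagnostic for a stuck workspace is to search for failed `AssumeRole` calls. This is a minimal sketch. It assumes you exported events as JSON, for example with `aws cloudtrail lookup-events --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRole`. That command's response stores each record as a JSON string under `Events[].CloudTrailEvent`. The helper names and the optional role-name hint are hypothetical; the record fields (`eventSource`, `eventName`, `errorCode`, `errorMessage`, `requestParameters.roleArn`) follow the CloudTrail record format.

```python
import json
from typing import Iterable, List


def records_from_lookup_events(response: dict) -> List[dict]:
    """lookup-events returns each record as a JSON string in Events[].CloudTrailEvent."""
    return [json.loads(e["CloudTrailEvent"]) for e in response.get("Events", [])]


def assume_role_failures(records: Iterable[dict], role_hint: str = "") -> List[dict]:
    """Keep only failed sts:AssumeRole calls, optionally for roles whose ARN contains role_hint."""
    failures = []
    for rec in records:
        if rec.get("eventSource") != "sts.amazonaws.com" or rec.get("eventName") != "AssumeRole":
            continue
        if not rec.get("errorCode"):
            continue  # successful calls carry no errorCode
        role_arn = (rec.get("requestParameters") or {}).get("roleArn", "")
        if role_hint and role_hint not in role_arn:
            continue
        failures.append(
            {
                "eventTime": rec.get("eventTime"),
                "roleArn": role_arn,
                "errorCode": rec["errorCode"],
                "errorMessage": rec.get("errorMessage", ""),
            }
        )
    return failures
```

An `AccessDenied` entry against the cross-account role is the signal to recheck the `sts:ExternalId` condition in its trust policy.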
Step-by-Step Implementation
Configuration Reference
| Error Symptom | Root Cause | Diagnostic Command | Fix |
|---|---|---|---|
| Workspace stuck in PROVISIONING | Cross-account role trust failure | Check CloudTrail for AssumeRole errors | Fix External ID in trust policy |
| Cluster PENDING then TERMINATED | No outbound connectivity | Check route tables for NAT/IGW route | Add NAT Gateway and route |
| CLOUD_PROVIDER_LAUNCH_FAILURE | EC2 vCPU quota exceeded | `aws service-quotas get-service-quota --service-code ec2 --quota-code L-1216C47A` (On-Demand Standard instances) | Request a quota increase |
| S3 access denied in notebooks | Instance profile missing permissions | IAM Policy Simulator | Add the S3 actions to the instance profile |
| Workspace creation API returns 400 | Invalid network configuration | Validate that the subnet and security group IDs exist | Confirm the subnets are in the correct VPC |
| Cluster logs show TLS errors | Security group blocking 443 egress | Check egress rules | Allow 443 outbound to 0.0.0.0/0 |
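Several rows in the diagnostic column can be combined into a small offline preflight check. The sketch below assumes you have already captured JSON output from `aws ec2 describe-route-tables`, `aws ec2 describe-security-groups`, `aws ec2 describe-subnets`, and `aws service-quotas get-service-quota`, and it reads the field names those commands return. The helper names are hypothetical. The `vcpus_in_use` input is also an assumption, since you must obtain it separately. The checks cover only the rows in this table, not every Databricks networking requirement.

```python
from typing import List


def route_table_for_subnet(route_tables: List[dict], subnet_id: str) -> dict:
    """An explicit association wins; otherwise the subnet uses the VPC's main route table.
    Assumes route_tables were filtered to the workspace VPC."""
    main = None
    for rt in route_tables:
        for assoc in rt.get("Associations", []):
            if assoc.get("SubnetId") == subnet_id:
                return rt
            if assoc.get("Main"):
                main = rt
    if main is None:
        raise LookupError(f"no route table found for {subnet_id}")
    return main


def has_outbound_route(route_tables: List[dict], subnet_id: str) -> bool:
    """True if the subnet's route table has an active 0.0.0.0/0 route via a NAT gateway or IGW."""
    rt = route_table_for_subnet(route_tables, subnet_id)
    for route in rt.get("Routes", []):
        if route.get("DestinationCidrBlock") != "0.0.0.0/0" or route.get("State") != "active":
            continue
        if route.get("NatGatewayId") or (route.get("GatewayId") or "").startswith("igw-"):
            return True
    return False


def allows_https_egress(security_group: dict) -> bool:
    """True if some egress rule permits TCP 443 to 0.0.0.0/0 (protocol "-1" means all traffic)."""
    for perm in security_group.get("IpPermissionsEgress", []):
        proto = perm.get("IpProtocol")
        if proto == "-1":
            port_ok = True
        elif proto in ("tcp", "6"):
            port_ok = perm.get("FromPort", 0) <= 443 <= perm.get("ToPort", -1)
        else:
            port_ok = False
        open_to_all = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        if port_ok and open_to_all:
            return True
    return False


def subnets_outside_vpc(subnets: List[dict], subnet_ids: List[str], vpc_id: str) -> List[str]:
    """Return requested subnet IDs that do not exist or belong to a different VPC."""
    vpc_of = {s["SubnetId"]: s["VpcId"] for s in subnets}
    return [sid for sid in subnet_ids if vpc_of.get(sid) != vpc_id]


def quota_headroom(quota_value: float, vcpus_in_use: int, vcpus_requested: int) -> float:
    """vCPUs left after the request; a negative value means the launch will likely fail."""
    return quota_value - vcpus_in_use - vcpus_requested
```

Each function maps to one table row, so a failing check points directly at the Fix column for that row.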