Databricks on GCP
Who this is for:
Architecture / Concept Overview: Databricks on GCP
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
A[Pub/Sub / Dataflow] -->|Stream| B[Databricks Workspace]
C[Google Cloud Storage] -->|Batch| B
D[Cloud SQL / Spanner] -->|CDC| B
B -->|Transform| E[Delta Lake on GCS]
E -->|Query| F[BigQuery / Looker]
E -->|ML| G[Vertex AI]
E -->|Govern| H[Unity Catalog]
A:::source
C:::source
D:::source
B:::processing
E:::storage
F:::serving
G:::serving
H:::governance
```
*Databricks on GCP data pipeline showing ingestion from Google Cloud sources through Delta Lake transformation to serving and ML layers.*
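The CDC edge from Cloud SQL / Spanner implies an upsert-and-delete step before rows land in Delta Lake. In Databricks that step is typically a Delta Lake `MERGE INTO`; the sketch below models only its semantics in plain Python, with a dict standing in for the Delta table. The `ChangeEvent` shape (`op`, `key`, `seq`, `row`) is a hypothetical record format, not the output of any specific connector. Events are first reduced to the latest `seq` per key, because a Delta `MERGE` fails when several source rows match the same target row.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical CDC record: operation, primary key, commit sequence, row payload."""
    op: Literal["insert", "update", "delete"]
    key: int
    seq: int
    row: Optional[dict] = None


def apply_cdc(table: Dict[int, dict], events: List[ChangeEvent]) -> Dict[int, dict]:
    """Return a new snapshot with the events applied; the input table is not mutated."""
    # Keep only the latest event per key so each target row matches at most one source row.
    latest: Dict[int, ChangeEvent] = {}
    for ev in events:
        if ev.key not in latest or ev.seq > latest[ev.key].seq:
            latest[ev.key] = ev

    result = dict(table)
    for key, ev in latest.items():
        if ev.op == "delete":
            # Analogous to: WHEN MATCHED AND op = 'delete' THEN DELETE
            result.pop(key, None)
        else:
            # Upsert, analogous to: WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
            result[key] = dict(ev.row or {})
    return result


snapshot = {1: {"name": "a"}, 2: {"name": "b"}}
events = [
    ChangeEvent("update", 1, 1, {"name": "a2"}),
    ChangeEvent("delete", 2, 2),
    ChangeEvent("insert", 3, 3, {"name": "c"}),
    ChangeEvent("update", 1, 0, {"name": "stale"}),  # older than seq 1, so ignored
]
print(apply_cdc(snapshot, events))  # {1: {'name': 'a2'}, 3: {'name': 'c'}}
```

In this sketch an out-of-order event with a lower `seq` never overwrites a newer one, which is the property the deduplication step exists to guarantee.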
```mermaid
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
PROJ[GCP Project] --> VPC[Customer VPC]
PROJ --> SA[Service Account]
PROJ --> GKE[GKE Cluster - Databricks Managed]
VPC --> SUBNET[Regional Subnet]
VPC --> PSC[Private Service Connect]
SA --> WS[Databricks Workspace]
GKE --> WS
WS --> GCS[GCS Root Bucket]
PROJ:::source
VPC:::storage
SA:::governance
GKE:::processing
SUBNET:::serving
PSC:::serving
WS:::processing
GCS:::storage
```
*GCP resource topology for Databricks showing the GKE-based data plane, VPC networking, and service account trust model.*
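The service account is the trust anchor in this topology, and the configuration reference recommends a least-privilege custom role for it. A minimal audit sketch, assuming you have already exported the roles bound to the account and the permissions its custom role grants: it flags the basic `roles/owner` and `roles/editor` roles plus any gap between granted and required permissions. The `required` set below is a hypothetical placeholder; the actual permission list should come from the Databricks documentation for your workspace type.

```python
from typing import List, Set

# Basic (primitive) roles that grant far more than a workspace service account needs.
BROAD_ROLES = {"roles/owner", "roles/editor"}


def audit_service_account(
    bound_roles: Set[str],
    granted: Set[str],
    required: Set[str],
) -> List[str]:
    """Return human-readable findings; an empty list means no issues were detected."""
    findings = []
    for role in sorted(bound_roles & BROAD_ROLES):
        findings.append(f"broad basic role bound: {role}")
    for perm in sorted(granted - required):
        findings.append(f"excess permission: {perm}")
    for perm in sorted(required - granted):
        findings.append(f"missing permission: {perm}")
    return findings


# Hypothetical inputs for illustration only.
required = {"storage.objects.create", "storage.objects.get", "storage.objects.list"}
granted = required | {"storage.buckets.delete"}
print(audit_service_account({"roles/editor"}, granted, required))
# ['broad basic role bound: roles/editor', 'excess permission: storage.buckets.delete']
```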
Key Terms
Prerequisites and Setup
- A GCP project with billing enabled and the Databricks service activated
- `gcloud` CLI installed and authenticated as a user with the Owner or Editor role on the project
- APIs enabled: Compute Engine, GKE, Cloud Storage, IAM, and Databricks
- A Databricks account linked to GCP (via the Databricks account console or GCP Marketplace)
- A VPC with a subnet sized for GKE node pools (a /22 or larger is recommended)
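The API and subnet prerequisites above can be scripted. A minimal sketch, assuming a hypothetical project ID `my-project`: it builds one `gcloud services enable` command for the Compute Engine, GKE (`container.googleapis.com`), Cloud Storage, and IAM APIs, and checks that a proposed node subnet is at least a /22. The Databricks entry is left out because its service name is not stated here. The command is only printed, not executed.

```python
import ipaddress
import shlex

# Service names for the Google Cloud APIs listed in the prerequisites.
REQUIRED_SERVICES = [
    "compute.googleapis.com",    # Compute Engine
    "container.googleapis.com",  # GKE
    "storage.googleapis.com",    # Cloud Storage
    "iam.googleapis.com",        # IAM
]


def enable_services_command(project_id: str) -> str:
    """Build a single shell-safe `gcloud services enable` command for the project."""
    args = ["gcloud", "services", "enable", *REQUIRED_SERVICES, f"--project={project_id}"]
    return shlex.join(args)


def subnet_is_large_enough(cidr: str, max_prefixlen: int = 22) -> bool:
    """True if the subnet is a /22 or larger (prefix length 22 or shorter)."""
    return ipaddress.ip_network(cidr).prefixlen <= max_prefixlen


print(enable_services_command("my-project"))
print(subnet_is_large_enough("10.0.0.0/20"))  # True
print(subnet_is_large_enough("10.0.0.0/24"))  # False
```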
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Default | Recommended |
|---|---|---|---|
| GKE Master IP Range | CIDR block for the GKE control plane | Required (must be a /28) | A /28 that overlaps no other range |
| GCS Root Bucket | Workspace root storage | Required | CMEK-encrypted |
| Service Account | GCP identity used by Databricks | Required | Least-privilege custom role |
| VPC | Network for the GKE data plane | Databricks-managed or customer-managed | Customer-managed for production |
| Subnet Size | IP ranges for GKE nodes, pods, and services | /20 | /20 nodes, /20 pods, /20 services |
| Private Service Connect | Private connectivity to the control plane | Disabled | Enabled for production |
| Region | GCP region for the workspace | Required | Same region as the data |
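The ranges in this table must fit together: the GKE master range is a /28, the node, pod, and service ranges are recommended at /20, and no range may overlap another. A minimal pre-flight check using Python's standard `ipaddress` module; the `PLAN` values are hypothetical example CIDRs, not defaults taken from Databricks.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List

# Hypothetical example plan; substitute the ranges from your own VPC design.
PLAN = {
    "nodes": "10.0.0.0/20",
    "pods": "10.0.16.0/20",
    "services": "10.0.32.0/20",
    "gke_master": "10.0.48.0/28",
}


def validate_plan(plan: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the plan passes these checks."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []
    # The GKE control plane requires exactly a /28.
    if nets["gke_master"].prefixlen != 28:
        problems.append("gke_master must be a /28")
    # Node, pod, and service ranges should be at least the recommended /20.
    for name in ("nodes", "pods", "services"):
        if nets[name].prefixlen > 20:
            problems.append(f"{name} is smaller than the recommended /20")
    # No two ranges may overlap.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    return problems


print(validate_plan(PLAN))  # []
```

Running the check before creating the workspace is cheaper than discovering an overlap after GKE provisioning fails.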