Networking and Identity Configuration for Databricks on GCP
Who this is for:
Architecture / Concept Overview: Networking and Identity Configuration for Databricks on GCP
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
USERS[Corporate Users] -->|PSC| FE[Frontend PSC Endpoint]
FE -->|Private| CP[Databricks Control Plane]
CP -->|PSC| BE[Backend PSC Endpoint]
BE -->|Private| GKE[GKE Data Plane]
GKE -->|Workload Identity| SA[GCP Service Account]
SA -->|Access| GCS[GCS Buckets]
SA -->|Access| BQ[BigQuery]
GKE -->|VPC SC| PERIM[VPC Service Controls Perimeter]
USERS:::source
FE:::governance
CP:::processing
BE:::governance
GKE:::processing
SA:::governance
GCS:::storage
BQ:::serving
PERIM:::governance
*Fully private network architecture with PSC endpoints and Workload Identity-based data access.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
ID[Identity Architecture] --> USER_ID[User Identity]
ID --> SVC_ID[Service Identity]
ID --> WL_ID[Workload Identity]
USER_ID --> GOOG[Google Cloud Identity SSO]
USER_ID --> COND[IAM Conditions]
SVC_ID --> SA_WS[Workspace Service Account]
SVC_ID --> SA_DATA[Data Access Service Account]
WL_ID --> KSA[Kubernetes Service Account]
WL_ID --> BIND[IAM Binding to GCP SA]
KSA --> POD[Spark Pod Credential Access]
ID:::governance
USER_ID:::governance
SVC_ID:::governance
WL_ID:::governance
GOOG:::source
COND:::governance
SA_WS:::governance
SA_DATA:::governance
KSA:::processing
BIND:::governance
POD:::processing
*Identity model for Databricks on GCP showing user, service, and workload identity layers.*
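The workload identity branch of the diagram (Kubernetes Service Account plus an IAM binding to a GCP service account) can be sketched with the general GKE Workload Identity conventions. This is a minimal illustration, not the exact resources a Databricks workspace creates: the project, namespace, and account names below are hypothetical placeholders, and the helper functions are invented for this example.

```python
# Sketch of the generic GKE Workload Identity wiring between a Kubernetes
# service account (KSA) and a GCP service account (GSA).
# All project, namespace, and account names are hypothetical placeholders.

def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member string that identifies a KSA in the project's workload identity pool."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def workload_identity_binding(project_id: str, namespace: str, ksa_name: str) -> dict:
    """IAM policy binding to place on the GSA so the KSA may impersonate it."""
    return {
        "role": "roles/iam.workloadIdentityUser",
        "members": [workload_identity_member(project_id, namespace, ksa_name)],
    }


def ksa_annotation(gsa_name: str, project_id: str) -> dict:
    """Annotation set on the KSA that points it at the GSA it should act as."""
    return {
        "iam.gke.io/gcp-service-account": f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    }


if __name__ == "__main__":
    print(workload_identity_binding("example-project", "example-namespace", "example-ksa"))
    print(ksa_annotation("example-data-access", "example-project"))
```

With both halves in place, pods running under the KSA obtain short-lived credentials for the GSA, so no exported key file has to be distributed to the cluster.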
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
VPC[Customer VPC] --> SUBNET[Node Subnet /22]
VPC --> POD_R[Pod Range /14]
VPC --> SVC_R[Service Range /20]
VPC --> FW[Firewall Rules]
VPC --> ROUTER[Cloud Router + NAT]
ROUTER --> EGRESS[Controlled Egress]
VPC --> PSC_EP[PSC Endpoints]
PSC_EP --> CP_PSC[Control Plane PSC]
PSC_EP --> RELAY_PSC[Relay PSC]
VPC:::storage
SUBNET:::serving
POD_R:::serving
SVC_R:::serving
FW:::governance
ROUTER:::ingestion
EGRESS:::source
PSC_EP:::governance
CP_PSC:::processing
RELAY_PSC:::processing
*VPC network topology showing IP range allocation, Cloud NAT, and PSC endpoint placement.*
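To see how the /22, /14, and /20 ranges in the diagram relate to each other, the capacity arithmetic can be checked with Python's standard `ipaddress` module. The CIDR values below are hypothetical examples, and the sketch assumes the GKE default of one /24 pod block per node; a cluster configured with a different maximum number of pods per node would yield different figures.

```python
import ipaddress
from itertools import combinations

# GCP reserves four addresses in every primary IPv4 subnet range.
GCP_RESERVED_PER_PRIMARY_RANGE = 4
# Assumption: GKE default of a /24 pod block per node (110 max pods per node).
POD_PREFIX_PER_NODE = 24


def plan_capacity(node_cidr: str, pod_cidr: str, service_cidr: str) -> dict:
    """Validate that the three ranges do not overlap and report their capacity."""
    node_net = ipaddress.ip_network(node_cidr)
    pod_net = ipaddress.ip_network(pod_cidr)
    svc_net = ipaddress.ip_network(service_cidr)

    for a, b in combinations((node_net, pod_net, svc_net), 2):
        if a.overlaps(b):
            raise ValueError(f"ranges overlap: {a} and {b}")
    if pod_net.prefixlen > POD_PREFIX_PER_NODE:
        raise ValueError("pod range is smaller than a single per-node block")

    return {
        "node_addresses": node_net.num_addresses - GCP_RESERVED_PER_PRIMARY_RANGE,
        "nodes_supported_by_pod_range": 2 ** (POD_PREFIX_PER_NODE - pod_net.prefixlen),
        "service_addresses": svc_net.num_addresses,
    }


if __name__ == "__main__":
    # Hypothetical example ranges matching the prefix lengths in the diagram.
    print(plan_capacity("10.0.0.0/22", "10.4.0.0/14", "10.8.0.0/20"))
```

Under these assumptions the node subnet yields 1,020 usable addresses and the pod range covers 1,024 per-node blocks, so the two ranges are sized for roughly the same number of nodes.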
Key Terms
Prerequisites and Setup
- Databricks workspace deployed on GCP with a customer-managed VPC
- Organization-level access for VPC Service Controls configuration
- Cloud Identity or Google Workspace for user management
- Understanding of GKE networking (pods, services, node IP ranges)
- Network connectivity plan including PSC endpoints and Cloud NAT configuration
Step-by-Step Implementation
Configuration Reference
| Security Control | GCP Service | Configuration | Purpose |
|---|---|---|---|
| Private Connectivity | Private Service Connect | Forwarding rules + service attachments | Eliminates public control plane access |
| Keyless Auth | Workload Identity | KSA-to-GSA binding | No static service account keys; credentials are short-lived and rotated automatically |
| Data Perimeter | VPC Service Controls | Access policies + perimeters | Prevents data exfiltration |
| Egress Filtering | Firewall Rules | Priority-based deny/allow | Controls outbound traffic destinations |
| Private Google Access | Subnet Config | `--enable-private-ip-google-access` | Access Google APIs without public IPs |
| NAT | Cloud NAT | Router + NAT config | Outbound internet access for cluster nodes that have no public IPs |
| User SSO | Cloud Identity | SAML/OIDC federation | Centralized user lifecycle management |
| Attribute-Based Access | IAM Conditions | CEL expressions | Context-aware access restrictions |
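The "priority-based deny/allow" behavior in the Egress Filtering row can be illustrated with a small simulation of how VPC firewall rules are selected: the matching rule with the lowest priority number applies, a deny rule takes precedence over an allow rule at the same priority, and egress with no matching rule falls through to the implied allow. This is a simplified sketch that matches on destination CIDR only; the rule names and priorities are hypothetical, and the allowed range is the `restricted.googleapis.com` VIP range.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    name: str
    priority: int      # 0-65535, lower number means higher priority
    action: str        # "allow" or "deny"
    destination: str   # destination CIDR


def evaluate_egress(rules: list[EgressRule], dest_ip: str) -> tuple[str, str]:
    """Return (action, rule name) for traffic to dest_ip under the given rules."""
    ip = ipaddress.ip_address(dest_ip)
    matching = [r for r in rules if ip in ipaddress.ip_network(r.destination)]
    if not matching:
        # No user-defined rule matched: the implied allow-egress rule applies.
        return "allow", "implied-allow-egress"
    # Lowest priority number wins; on a tie, deny sorts ahead of allow.
    best = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return best.action, best.name


# Hypothetical rule set: allow the restricted Google APIs VIP, deny everything else.
RULES = [
    EgressRule("allow-restricted-google-apis", 1000, "allow", "199.36.153.4/30"),
    EgressRule("deny-all-egress", 1100, "deny", "0.0.0.0/0"),
]

if __name__ == "__main__":
    print(evaluate_egress(RULES, "199.36.153.5"))
    print(evaluate_egress(RULES, "8.8.8.8"))
```

A real rule set would also need allow rules for whatever destinations the workspace requires (for example the PSC endpoints), placed at a lower priority number than the catch-all deny.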