Networking and Identity Configuration for Databricks on GCP

    Who this is for:

    Architecture / Concept Overview: Networking and Identity Configuration for Databricks on GCP

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        USERS[Corporate Users] -->|PSC| FE[Frontend PSC Endpoint]
        FE -->|Private| CP[Databricks Control Plane]
        CP -->|PSC| BE[Backend PSC Endpoint]
        BE -->|Private| GKE[GKE Data Plane]
        GKE -->|Workload Identity| SA[GCP Service Account]
        SA -->|Access| GCS[GCS Buckets]
        SA -->|Access| BQ[BigQuery]
        GKE -->|VPC SC| PERIM[VPC Service Controls Perimeter]
        USERS:::source
        FE:::governance
        CP:::processing
        BE:::governance
        GKE:::processing
        SA:::governance
        GCS:::storage
        BQ:::serving
        PERIM:::governance

    *Fully private network architecture with PSC endpoints and Workload Identity-based data access.*

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        ID[Identity Architecture] --> USER_ID[User Identity]
        ID --> SVC_ID[Service Identity]
        ID --> WL_ID[Workload Identity]
        USER_ID --> GOOG[Google Cloud Identity SSO]
        USER_ID --> COND[IAM Conditions]
        SVC_ID --> SA_WS[Workspace Service Account]
        SVC_ID --> SA_DATA[Data Access Service Account]
        WL_ID --> KSA[Kubernetes Service Account]
        WL_ID --> BIND[IAM Binding to GCP SA]
        KSA --> POD[Spark Pod Credential Access]
        ID:::governance
        USER_ID:::governance
        SVC_ID:::governance
        WL_ID:::governance
        GOOG:::source
        COND:::governance
        SA_WS:::governance
        SA_DATA:::governance
        KSA:::processing
        BIND:::governance
        POD:::processing

    *Identity model for Databricks on GCP showing user, service, and workload identity layers.*
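The workload identity branch of the diagram (Kubernetes service account, IAM binding to a GCP service account) can be sketched in a few lines. This is a minimal illustration of the generic GKE Workload Identity naming convention, not Databricks' own provisioning logic, and a managed workspace may create these objects for you. The project ID, namespace, and account names are hypothetical placeholders, and the sketch assumes the GCP service account lives in the same project as the cluster.

```python
from dataclasses import dataclass

# Role that lets a Kubernetes service account impersonate a GCP service account.
WORKLOAD_IDENTITY_ROLE = "roles/iam.workloadIdentityUser"


@dataclass(frozen=True)
class WorkloadIdentityLink:
    """Describes one KSA-to-GSA link. All field values used below are hypothetical."""
    project_id: str
    namespace: str
    ksa_name: str
    gsa_name: str

    @property
    def gsa_email(self) -> str:
        # Assumes the GSA is in the same project as the GKE cluster.
        return f"{self.gsa_name}@{self.project_id}.iam.gserviceaccount.com"

    @property
    def member(self) -> str:
        # Workload Identity principal format: PROJECT.svc.id.goog[NAMESPACE/KSA]
        return (
            f"serviceAccount:{self.project_id}.svc.id.goog"
            f"[{self.namespace}/{self.ksa_name}]"
        )

    def iam_binding(self) -> dict:
        """IAM policy binding to add to the GCP service account."""
        return {"role": WORKLOAD_IDENTITY_ROLE, "members": [self.member]}

    def ksa_annotation(self) -> dict:
        """Annotation to place on the Kubernetes service account."""
        return {"iam.gke.io/gcp-service-account": self.gsa_email}


# Hypothetical names, for illustration only.
link = WorkloadIdentityLink(
    project_id="my-project",
    namespace="spark",
    ksa_name="spark-executor",
    gsa_name="databricks-data-access",
)
print(link.iam_binding())
print(link.ksa_annotation())
```

Both halves are needed: the IAM binding authorizes the Kubernetes identity, and the annotation tells the pod which GCP service account to request credentials for.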

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        VPC[Customer VPC] --> SUBNET[Node Subnet /22]
        VPC --> POD_R[Pod Range /14]
        VPC --> SVC_R[Service Range /20]
        VPC --> FW[Firewall Rules]
        VPC --> ROUTER[Cloud Router + NAT]
        ROUTER --> EGRESS[Controlled Egress]
        VPC --> PSC_EP[PSC Endpoints]
        PSC_EP --> CP_PSC[Control Plane PSC]
        PSC_EP --> RELAY_PSC[Relay PSC]
        VPC:::storage
        SUBNET:::serving
        POD_R:::serving
        SVC_R:::serving
        FW:::governance
        ROUTER:::ingestion
        EGRESS:::source
        PSC_EP:::governance
        CP_PSC:::processing
        RELAY_PSC:::processing

    *VPC network topology showing IP range allocation, Cloud NAT, and PSC endpoint placement.*
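To make the prefix lengths in the diagram concrete, the following sketch computes how many addresses each range provides and checks that the ranges do not overlap. Only the prefix lengths (/22, /14, /20) come from the diagram; the `10.x` base addresses are hypothetical and should be replaced with ranges from your own IP plan.

```python
import ipaddress

# Hypothetical CIDR blocks; only the prefix lengths are taken from the diagram.
RANGES = {
    "node_subnet": "10.0.0.0/22",
    "pod_range": "10.4.0.0/14",
    "service_range": "10.8.0.0/20",
}


def summarize(ranges: dict) -> dict:
    """Return the address count per range; raise if any two ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    names = list(nets)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if nets[first].overlaps(nets[second]):
                raise ValueError(f"{first} overlaps {second}")
    return {name: net.num_addresses for name, net in nets.items()}


sizes = summarize(RANGES)
for name, count in sizes.items():
    print(f"{name}: {count} addresses")
```

A /22 holds 1,024 addresses, a /14 holds 262,144, and a /20 holds 4,096, so the pod range is by far the largest allocation to reserve in the VPC.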

    Key Terms

    Prerequisites and Setup

    • Databricks workspace deployed on GCP with a customer-managed VPC
    • Organization-level access for VPC Service Controls configuration
    • Cloud Identity or Google Workspace for user management
    • Understanding of GKE networking (pods, services, node IP ranges)
    • Network connectivity plan including PSC endpoints and Cloud NAT configuration

    Step-by-Step Implementation

      Configuration Reference

      Networking and Identity Configuration for Databricks on GCP: configuration options

      | Security Control | GCP Service | Configuration | Purpose |
      |---|---|---|---|
      | Private Connectivity | Private Service Connect | Forwarding rules + service attachments | Eliminates public control plane access |
      | Keyless Auth | Workload Identity | KSA-to-GSA binding | No static keys, automatic rotation |
      | Data Perimeter | VPC Service Controls | Access policies + perimeters | Prevents data exfiltration |
      | Egress Filtering | Firewall Rules | Priority-based deny/allow | Controls outbound traffic destinations |
      | Private Google Access | Subnet Config | `--enable-private-ip-google-access` | Access Google APIs without public IPs |
      | NAT | Cloud NAT | Router + NAT config | Outbound internet for cluster nodes |
      | User SSO | Cloud Identity | SAML/OIDC federation | Centralized user lifecycle management |
      | Attribute-Based Access | IAM Conditions | CEL expressions | Context-aware access restrictions |
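The "priority-based deny/allow" entry for egress filtering can be illustrated with a small model of how VPC firewall rules are chosen: the matching rule with the lowest priority number wins, a deny beats an allow at the same priority, and egress is allowed by an implied rule when nothing matches. This is a simplified sketch that only considers destination CIDRs (real rules also match on protocol, port, and target). The rule names and priorities are hypothetical; `199.36.153.4/30` is the `restricted.googleapis.com` range, used here purely as an example destination.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    name: str       # hypothetical rule name
    priority: int   # 0-65535; a lower number takes precedence
    action: str     # "allow" or "deny"
    dest_cidr: str


def evaluate_egress(rules: list, dest_ip: str) -> str:
    """Return "allow" or "deny" for traffic to dest_ip under a simplified model."""
    ip = ipaddress.ip_address(dest_ip)
    matching = [r for r in rules if ip in ipaddress.ip_network(r.dest_cidr)]
    if not matching:
        return "allow"  # implied allow-egress rule at the lowest precedence
    # Lowest priority number first; on a tie, deny sorts before allow.
    winner = min(matching, key=lambda r: (r.priority, r.action != "deny"))
    return winner.action


# Illustrative rule set: allow restricted Google APIs, deny everything else.
rules = [
    EgressRule("allow-restricted-google-apis", 1000, "allow", "199.36.153.4/30"),
    EgressRule("deny-all-egress", 1100, "deny", "0.0.0.0/0"),
]

print(evaluate_egress(rules, "199.36.153.5"))  # inside the allowed range
print(evaluate_egress(rules, "8.8.8.8"))       # only the deny-all rule matches
```

The design point is that a broad deny rule only works as a default if every allow rule is given a numerically lower priority than it.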

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions