Databricks on GCP Overview

    Who this is for:

    Architecture / Concept Overview: Databricks on GCP Overview

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
      classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      CP[Databricks Control Plane] -->|Orchestrate| GKE[GKE Data Plane]
      PUB[Pub/Sub Streams] -->|Ingest| GKE
      GCS_SRC[GCS Data Lake] -->|Read| GKE
      BQ_SRC[BigQuery Tables] -->|Federated Query| GKE
      GKE -->|Write| DELTA[Delta Lake on GCS]
      DELTA -->|Serve| BQ[BigQuery BI Engine]
      DELTA -->|ML| VERTEX[Vertex AI]
      DELTA -->|Govern| UC[Unity Catalog]
      CP:::processing
      GKE:::processing
      PUB:::source
      GCS_SRC:::source
      BQ_SRC:::source
      DELTA:::storage
      BQ:::serving
      VERTEX:::serving
      UC:::governance

    *Databricks on GCP platform architecture showing GKE-based compute with GCP-native data service integrations.*
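As a rough illustration of the "Read" and "Write" edges in the diagram above, the following is a minimal sketch of a batch step that loads files from a GCS path and appends them to a Delta table stored on GCS. The function name `copy_gcs_to_delta`, the Parquet source format, and the bucket paths in the usage note are assumptions made for this example and do not come from the original page. The `spark` argument is expected to be an existing `SparkSession`, such as the one a Databricks notebook provides.

```python
def copy_gcs_to_delta(spark, source_path, target_path, source_format="parquet"):
    """Read files from a GCS path and append them to a Delta table on GCS.

    `spark` is an existing SparkSession (assumed to be provided by the
    cluster); `source_format` defaults to Parquet purely for illustration.
    """
    # Read the source files from the data lake (gs:// path).
    df = spark.read.format(source_format).load(source_path)

    # Append the rows to a Delta table located at the target gs:// path.
    df.write.format("delta").mode("append").save(target_path)

    # Return the DataFrame so the caller can inspect or reuse it.
    return df
```

On a cluster, a call might look like `copy_gcs_to_delta(spark, "gs://example-landing-bucket/raw/", "gs://example-lake-bucket/delta/events")`, where both bucket names are hypothetical. Append mode is only one option; an overwrite or merge may suit a given pipeline better.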

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
      classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      GCP_DBX[Databricks on GCP] --> DIFF[Key Differentiators]
      GCP_DBX --> USE[Use Cases]
      DIFF --> GKE_D[GKE-Based Compute - Fast Startup]
      DIFF --> PSC_D[Private Service Connect]
      DIFF --> WI_D[Workload Identity Federation]
      DIFF --> BQ_D[Native BigQuery Connector]
      USE --> ETL[Lakehouse ETL]
      USE --> ML[ML and Feature Engineering]
      USE --> STREAM[Real-Time Analytics]
      USE --> SQL_D[SQL Analytics and BI]
      GCP_DBX:::processing
      DIFF:::ingestion
      USE:::serving
      GKE_D:::processing
      PSC_D:::storage
      WI_D:::governance
      BQ_D:::serving
      ETL:::ingestion
      ML:::processing
      STREAM:::source
      SQL_D:::serving

    *Databricks on GCP differentiators and primary use cases.*

    Key Terms

    Prerequisites and Setup

    • A GCP project with billing enabled
    • Google Cloud SDK (gcloud) installed and authenticated
    • A Databricks account linked to GCP (via the GCP Marketplace or accounts.gcp.databricks.com)
    • Basic familiarity with GCP IAM, GCS, and networking concepts
    • Understanding of Kubernetes concepts (pods, nodes, service accounts)
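The tooling part of the checklist above can be verified with a short script. This is a minimal sketch, assuming the only command-line tool to look for is `gcloud`; the helper name `missing_tools` is made up for this example. It checks only that the executable is on `PATH`, not that it is authenticated or that billing is enabled.

```python
import shutil


def missing_tools(required=("gcloud",), which=shutil.which):
    """Return the names of required CLI tools that are not found on PATH.

    `which` is injectable so the check can be exercised without the
    tools actually being installed.
    """
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

Authentication still has to be confirmed separately, for example by running `gcloud auth list` and checking that an active account is shown.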

    Step-by-Step Implementation

      Configuration Reference

      Databricks on GCP Overview configuration options
      | Component | GCP Service | Purpose | Notes |
      | --- | --- | --- | --- |
      | Compute | GKE | Spark cluster execution | Faster startup than VM-based approaches |
      | Storage | GCS | Delta Lake, DBFS, cluster logs | Use regional buckets matching workspace |
      | Identity | Cloud Identity / Workspace | User SSO authentication | Federated via Google identity |
      | Service Identity | Service Accounts | Machine-to-machine access | Workload Identity for GKE pods |
      | Networking | VPC / PSC | Network isolation and private connectivity | Customer-managed VPC for production |
      | Encryption | Cloud KMS | Data at rest encryption | CMEK for GCS and persistent disks |
      | Monitoring | Cloud Monitoring | Cluster and resource metrics | Integrated with Databricks dashboards |
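The storage note in the table above (use regional buckets that match the workspace) lends itself to a simple pre-flight check. This is a minimal sketch, assuming the bucket location and workspace region are already known as strings; the function name `bucket_matches_workspace` is hypothetical, and the comparison is case-insensitive because bucket locations are commonly reported in upper case while region names are usually written in lower case.

```python
def bucket_matches_workspace(bucket_location, workspace_region):
    """Return True when a bucket location equals the workspace region.

    The comparison ignores case and surrounding whitespace. A multi-region
    location such as "US" is treated as a mismatch for a regional
    workspace such as "us-central1".
    """
    return bucket_location.strip().lower() == workspace_region.strip().lower()
```

For example, `bucket_matches_workspace("US-CENTRAL1", "us-central1")` evaluates to true, while a multi-region bucket in `"US"` paired with a `us-central1` workspace does not.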

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions