Alternatively, configure a specific service account
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DBX[Databricks Workspace] -->|Read/Write| GCS[GCS Delta Lake]
DBX -->|Read/Write| BQ[BigQuery Tables]
DBX -->|Train| MODEL[ML Model]
MODEL -->|Register| MLFLOW[MLflow Registry]
MLFLOW -->|Deploy| VERTEX[Vertex AI Endpoint]
GCS -->|External Table| BQ
BQ -->|BI| LOOKER[Looker / Looker Studio]
VERTEX -->|Predict| APP[Applications]
DBX:::processing
GCS:::storage
BQ:::serving
MODEL:::processing
MLFLOW:::governance
VERTEX:::serving
LOOKER:::serving
APP:::source
*End-to-end integration flow showing Databricks as the processing hub connecting GCS storage, BigQuery analytics, and Vertex AI serving.*
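
The Deploy edge of the flow above (MLflow Registry to Vertex AI Endpoint) can be sketched roughly as follows. This is a minimal sketch, assuming the `google-cloud-aiplatform` SDK is installed on the cluster and the model artifacts have already been exported to a GCS path. The function name `upload_to_vertex` and every argument value are hypothetical, and the serving container image is something you must supply for your model framework.

```python
def upload_to_vertex(project, location, display_name, artifact_uri, serving_image_uri):
    """Upload model artifacts stored in GCS to Vertex AI and return the Model.

    All parameter names here are illustrative, not part of the original guide.
    """
    # Vertex AI reads model artifacts from GCS, so reject local or DBFS paths early.
    if not artifact_uri.startswith("gs://"):
        raise ValueError("artifact_uri must be a GCS path starting with gs://")

    # Imported lazily so the validation above works even without the SDK installed.
    from google.cloud import aiplatform

    # Sets the default project and region for subsequent SDK calls.
    aiplatform.init(project=project, location=location)

    # Registers the artifacts as a Vertex AI Model resource.
    return aiplatform.Model.upload(
        display_name=display_name,
        artifact_uri=artifact_uri,
        serving_container_image_uri=serving_image_uri,
    )
```

The returned model can then be deployed to an endpoint, for example with `model.deploy(machine_type=...)`, using a machine type suited to your workload.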
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
GCS_INT[GCS Integration] --> DIRECT[Direct gs:// Access]
GCS_INT --> MOUNT[DBFS Mount]
GCS_INT --> UC_EXT[Unity Catalog External Location]
BQ_INT[BigQuery Integration] --> SPARK_BQ[Spark BigQuery Connector]
BQ_INT --> BQ_FED[BigQuery Federated - Delta Tables]
BQ_INT --> BQ_MAT[Materialized Views]
VERTEX_INT[Vertex AI Integration] --> MLFLOW_D[MLflow Model Export]
VERTEX_INT --> FEAT[Feature Store Sync]
VERTEX_INT --> PIPE[Vertex Pipelines]
GCS_INT:::storage
DIRECT:::storage
MOUNT:::storage
UC_EXT:::governance
BQ_INT:::serving
SPARK_BQ:::serving
BQ_FED:::serving
BQ_MAT:::serving
VERTEX_INT:::processing
MLFLOW_D:::processing
FEAT:::processing
PIPE:::processing
*Integration patterns for each GCP service showing multiple connectivity approaches.*
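
Of the GCS patterns above, direct `gs://` access is the simplest to illustrate. The following is a minimal sketch, assuming the cluster already has credentials for the bucket and that `spark` is the active SparkSession. The helper names and the bucket/zone/table layout are hypothetical.

```python
def gcs_delta_path(bucket, zone, table):
    """Build a URI of the form gs://<bucket>/<zone>/<table> (layout is an assumption)."""
    parts = [p.strip("/") for p in (bucket, zone, table)]
    if not all(parts):
        raise ValueError("bucket, zone, and table must be non-empty")
    return "gs://" + "/".join(parts)


def read_delta(spark, path):
    """Read a Delta table directly from a gs:// path."""
    return spark.read.format("delta").load(path)


def write_delta(df, path, mode="append"):
    """Write a DataFrame as Delta to a gs:// path."""
    df.write.format("delta").mode(mode).save(path)
```

In a notebook this might be used as `read_delta(spark, gcs_delta_path("my-lake", "bronze", "events"))`, where the bucket and table names are placeholders.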
Key Terms
Prerequisites and Setup
- Databricks workspace deployed on GCP with a running cluster
- GCS buckets created for your data lake zones
- BigQuery dataset created in the target project
- Service account with appropriate IAM roles for GCS, BigQuery, and Vertex AI
- Python packages: google-cloud-bigquery, google-cloud-aiplatform (installed on the cluster)
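
A quick way to confirm the Python package prerequisite is to check whether the modules are importable on the cluster. This is a minimal sketch using only the standard library. The mapping from pip package name to import name is stated here as an assumption about how these two packages are laid out, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# pip package name -> importable module name (assumed mapping)
REQUIRED = {
    "google-cloud-bigquery": "google.cloud.bigquery",
    "google-cloud-aiplatform": "google.cloud.aiplatform",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "google") is itself absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages())
```

In a Databricks notebook, anything reported as missing can typically be installed with `%pip install <package>` or added as a cluster library.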
Step-by-Step Implementation
Configuration Reference
| Integration | Configuration | Auth Method | Performance Notes |
|---|---|---|---|
| GCS Read/Write | gs:// path, format("delta") | Workload Identity / Service Account | Parallel reads via Spark partitions |
| BigQuery Read | format("bigquery"), table option | Service Account | Uses Storage Read API for high throughput |
| BigQuery Write | format("bigquery"), temp GCS bucket | Service Account | Stages data in GCS then bulk loads |
| BigQuery External | Delta Lake external table | BigQuery service account | Direct read from GCS, no copy needed |
| Vertex AI Deploy | aiplatform.Model.upload() | Service Account | Model artifacts must be in GCS |
| MLflow Registry | Automatic with Databricks | Workspace authentication | Tracks versions and deployment stage |
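
The BigQuery Read and BigQuery Write rows can be sketched as follows. This is a minimal sketch, assuming the Spark BigQuery connector is available on the cluster and the service account can access both the table and the staging bucket. The function names are hypothetical, and `table` is expected in `project.dataset.table` form.

```python
def read_bigquery(spark, table):
    """Read a BigQuery table into a DataFrame via the Spark BigQuery connector."""
    return spark.read.format("bigquery").option("table", table).load()


def write_bigquery(df, table, temp_bucket, mode="append"):
    """Write a DataFrame to BigQuery, staging the data in a GCS bucket first."""
    # The connector option takes a bare bucket name, so drop any gs:// prefix.
    if temp_bucket.startswith("gs://"):
        temp_bucket = temp_bucket[len("gs://"):]
    temp_bucket = temp_bucket.strip("/")

    (
        df.write.format("bigquery")
        .option("table", table)
        .option("temporaryGcsBucket", temp_bucket)
        .mode(mode)
        .save()
    )
```

Keeping the staging bucket as an explicit argument makes the dependency on GCS visible, since the write path stages data there before the bulk load.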