Install the Databricks CLI
Who this is for:
Architecture / Concept Overview: Install the Databricks CLI
Lakeflow connects raw data sources to analytics-ready tables in the Databricks Lakehouse through three integrated subsystems: Lakeflow Connect for ingestion, Declarative Pipelines for transformation, and Lakeflow Jobs for orchestration.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DB[(Databases)]:::source --> LC[Lakeflow Connect]:::ingestion
CS[Cloud Storage]:::source --> AL[Auto Loader]:::ingestion
SA[SaaS Apps]:::source --> LC
KF[Kafka / Event Hubs]:::source --> SS[Structured Streaming]:::ingestion
LC --> B[Bronze Layer]:::storage
AL --> B
SS --> B
B --> DP[Declarative Pipelines]:::processing
DP --> S[Silver Layer]:::storage
S --> DP2[Declarative Pipelines]:::processing
DP2 --> G[Gold Layer]:::storage
G --> BI[BI & Analytics]:::serving
*End-to-end Lakeflow pipeline from diverse sources through the medallion architecture to analytics.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
ORCH[Lakeflow Jobs]:::serving
ORCH --> T1[Task: Ingest]:::ingestion
ORCH --> T2[Task: Transform]:::processing
ORCH --> T3[Task: Validate]:::governance
ORCH --> T4[Task: Publish]:::serving
T1 --> T2
T2 --> T3
T3 --> T4
*Lakeflow Jobs orchestrating a multi-task workflow with sequential dependencies.*
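The sequential task graph in the diagram can be written down as a job definition. What follows is a minimal sketch, assuming the Jobs API payload shape (`tasks`, `task_key`, `depends_on`, `notebook_task`). The job name, task keys, and notebook paths are hypothetical placeholders rather than values from this guide, and compute settings are omitted.

```python
import json

# Hypothetical task keys mirroring the diagram: Ingest -> Transform -> Validate -> Publish.
TASK_ORDER = ["ingest", "transform", "validate", "publish"]


def build_job_payload(name, task_order, notebook_dir):
    """Build a job definition in which each task depends on the one before it."""
    tasks = []
    for i, key in enumerate(task_order):
        task = {
            "task_key": key,
            # Placeholder notebook path; replace with a real workspace path.
            "notebook_task": {"notebook_path": f"{notebook_dir}/{key}"},
        }
        if i > 0:
            # Sequential dependency: this task waits for the previous task.
            task["depends_on"] = [{"task_key": task_order[i - 1]}]
        tasks.append(task)
    return {"name": name, "tasks": tasks}


payload = build_job_payload("lakeflow_demo_job", TASK_ORDER, "/Workspace/demo")
print(json.dumps(payload, indent=2))
```

Keeping the dependency chain in one ordered list makes it easy to insert or reorder a stage without editing each `depends_on` entry by hand. The resulting JSON would still need cluster or serverless settings before it could be submitted.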
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
UC[Unity Catalog]:::governance
UC --> LIN[Lineage Tracking]:::governance
UC --> ACL[Access Controls]:::governance
UC --> AUD[Audit Logs]:::governance
UC --> DISC[Data Discovery]:::serving
*Unity Catalog provides governance across all Lakeflow assets.*
Key Terms
Prerequisites and Setup
- Databricks workspace with Unity Catalog enabled.
- Permissions: `CREATE CATALOG`, `CREATE SCHEMA`, `CREATE TABLE` on the target metastore.
- Network connectivity to external data sources (VPC peering, Private Link, or public endpoints).
- Databricks CLI installed for programmatic job management.
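The last prerequisite can be checked from a script before any job automation runs. This is a minimal sketch, assuming the CLI executable is named `databricks` and is expected on `PATH`; the helper name `databricks_cli_version` is hypothetical.

```python
import shutil
import subprocess


def databricks_cli_version(executable="databricks"):
    """Return the CLI's version output, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # Ask the CLI for its version; do not raise on a non-zero exit code.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    output = (result.stdout or result.stderr).strip()
    return output or None


version = databricks_cli_version()
if version is None:
    print("Databricks CLI not found on PATH; install it before managing jobs programmatically.")
else:
    print(f"Found Databricks CLI: {version}")
```

The check only confirms that an executable responds. It does not verify authentication or that the workspace has Unity Catalog enabled.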
Step-by-Step Implementation
Configuration Reference
| Parameter | Scope | Description | Default |
|---|---|---|---|
| cloudFiles.format | Auto Loader | Source file format | Required |
| cloudFiles.useNotifications | Auto Loader | Use cloud-native file notifications instead of directory listing | false |
| cloudFiles.schemaEvolutionMode | Auto Loader | How to handle new columns: addNewColumns, rescue, none | addNewColumns |
| pipelines.numActiveFlows | Declarative Pipelines | Max concurrent flows | 5 |
| pipelines.maxFlowRetryAttempts | Declarative Pipelines | Retry attempts on failure | 2 |
| schedule.quartz_cron_expression | Lakeflow Jobs | Cron expression for scheduled runs | None (manual) |
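To show how the Auto Loader and schedule parameters fit together, here is a minimal sketch that assembles them as plain Python dictionaries using the defaults from the table. The helper names `auto_loader_options` and `job_schedule` are hypothetical, the `timezone_id` field is an assumption about the schedule payload, and the cron value is only an illustration.

```python
# Schema evolution modes as listed in the table above.
ALLOWED_EVOLUTION_MODES = {"addNewColumns", "rescue", "none"}


def auto_loader_options(file_format, use_notifications=False,
                        schema_evolution_mode="addNewColumns"):
    """Return Auto Loader options as strings, applying the table's defaults."""
    if not file_format:
        raise ValueError("cloudFiles.format is required")
    if schema_evolution_mode not in ALLOWED_EVOLUTION_MODES:
        raise ValueError(f"unsupported schemaEvolutionMode: {schema_evolution_mode}")
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        "cloudFiles.schemaEvolutionMode": schema_evolution_mode,
    }


def job_schedule(cron_expression, timezone_id="UTC"):
    """Return a schedule block; timezone_id is an assumed companion field."""
    return {"quartz_cron_expression": cron_expression, "timezone_id": timezone_id}


options = auto_loader_options("json")
schedule = job_schedule("0 0 6 * * ?")  # illustrative Quartz cron: daily at 06:00
print(options)
print(schedule)
```

In a notebook, a dictionary like `options` would typically be passed to a streaming reader, for example `spark.readStream.format("cloudFiles").options(**options)`. Validating the values up front surfaces a mistyped mode before a stream starts.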