Install the Databricks CLI

    Who this is for:

    Architecture / Concept Overview: Install the Databricks CLI

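    Lakeflow sits at the heart of the Databricks Lakehouse, connecting raw data sources to analytics-ready tables through three integrated subsystems: Lakeflow Connect for ingestion, Lakeflow Declarative Pipelines for transformation, and Lakeflow Jobs for orchestration.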

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        DB[(Databases)]:::source --> LC[Lakeflow Connect]:::ingestion
        CS[Cloud Storage]:::source --> AL[Auto Loader]:::ingestion
        SA[SaaS Apps]:::source --> LC
        KF[Kafka / Event Hubs]:::source --> SS[Structured Streaming]:::ingestion
        LC --> B[Bronze Layer]:::storage
        AL --> B
        SS --> B
        B --> DP[Declarative Pipelines]:::processing
        DP --> S[Silver Layer]:::storage
        S --> DP2[Declarative Pipelines]:::processing
        DP2 --> G[Gold Layer]:::storage
        G --> BI[BI & Analytics]:::serving

    *End-to-end Lakeflow pipeline from diverse sources through the medallion architecture to analytics.*

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        ORCH[Lakeflow Jobs]:::serving
        ORCH --> T1[Task: Ingest]:::ingestion
        ORCH --> T2[Task: Transform]:::processing
        ORCH --> T3[Task: Validate]:::governance
        ORCH --> T4[Task: Publish]:::serving
        T1 --> T2
        T2 --> T3
        T3 --> T4

    *Lakeflow Jobs orchestrating a multi-task workflow with sequential dependencies.*
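
The sequential dependencies in the diagram can be sketched as a list of task entries, each naming the task it waits on. The `task_key` and `depends_on` fields follow the shape used by Jobs API task definitions; the task names (`ingest`, `transform`, `validate`, `publish`) and the `run_order` helper are hypothetical and only mirror the four boxes above. This is a local illustration of the ordering, not a job submission.

```python
from graphlib import TopologicalSorter

# Hypothetical task keys mirroring the four tasks in the diagram.
TASKS = [
    {"task_key": "ingest", "depends_on": []},
    {"task_key": "transform", "depends_on": [{"task_key": "ingest"}]},
    {"task_key": "validate", "depends_on": [{"task_key": "transform"}]},
    {"task_key": "publish", "depends_on": [{"task_key": "validate"}]},
]


def run_order(tasks):
    """Return task keys in an order that respects every depends_on edge."""
    # Map each task to the set of tasks that must finish before it starts.
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in tasks
    }
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(TASKS))
```

Because each task depends only on the one before it, there is a single valid order. A fan-out (for example, two transforms that both depend on `ingest`) would allow more than one.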

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        UC[Unity Catalog]:::governance
        UC --> LIN[Lineage Tracking]:::governance
        UC --> ACL[Access Controls]:::governance
        UC --> AUD[Audit Logs]:::governance
        UC --> DISC[Data Discovery]:::serving

    *Unity Catalog provides governance across all Lakeflow assets.*
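
As one illustration of the access-control branch, the sketch below builds the Unity Catalog `GRANT` statements a principal typically needs to read a single table: `USE CATALOG`, `USE SCHEMA`, and `SELECT`. The function name `read_grants` and the catalog, schema, table, and group names in the usage line are hypothetical. The helper only formats strings; it does no identifier validation and runs nothing against a workspace.

```python
def read_grants(catalog, schema, table, principal):
    """Build the GRANT statements needed for read access to one table."""
    who = f"`{principal}`"  # principals are quoted with backticks in SQL
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}",
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in read_grants("main", "gold", "daily_sales", "analysts"):
        print(statement)
```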

    Key Terms

    Prerequisites and Setup

    • Databricks workspace with Unity Catalog enabled.
    • Permissions: CREATE CATALOG on the metastore, CREATE SCHEMA on the target catalog, and CREATE TABLE on the target schema.
    • Network connectivity to external data sources (VPC peering, Private Link, or public endpoints).
    • Databricks CLI installed for programmatic job management.
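
The last prerequisite can be checked from a script before any job management is attempted. The sketch below assumes the CLI executable is named `databricks` and is on `PATH`, and that it answers `--version`; the helper name `cli_version` is hypothetical. It returns `None` rather than raising when the CLI is missing, so the caller decides how to react.

```python
import shutil
import subprocess


def cli_version(executable="databricks"):
    """Return the CLI's reported version string, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    # Some CLI builds print the version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip() or None


if __name__ == "__main__":
    version = cli_version()
    print(version if version else "Databricks CLI not found on PATH")
```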

    Step-by-Step Implementation

      Configuration Reference

      Configuration options for Auto Loader, Declarative Pipelines, and Lakeflow Jobs
      | Parameter | Scope | Description | Default |
      |---|---|---|---|
      | cloudFiles.format | Auto Loader | Source file format | Required |
      | cloudFiles.useNotifications | Auto Loader | Use cloud-native file notifications instead of directory listing | false |
      | cloudFiles.schemaEvolutionMode | Auto Loader | How to handle new columns: addNewColumns, rescue, none | addNewColumns |
      | pipelines.numActiveFlows | Declarative Pipelines | Max concurrent flows | 5 |
      | pipelines.maxFlowRetryAttempts | Declarative Pipelines | Retry attempts on failure | 2 |
      | schedule.quartz_cron_expression | Lakeflow Jobs | Cron expression for scheduled runs | None (manual) |
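
The three Auto Loader rows can be collected into an options dictionary before they are handed to a stream reader. The sketch below takes its defaults and accepted schema evolution modes from the table above; the helper name `auto_loader_options` is hypothetical. It runs without Spark and only builds and validates the dictionary.

```python
# Modes as listed in the table above.
SCHEMA_EVOLUTION_MODES = {"addNewColumns", "rescue", "none"}


def auto_loader_options(file_format, use_notifications=False,
                        schema_evolution_mode="addNewColumns"):
    """Build Auto Loader options, applying the defaults from the table."""
    if not file_format:
        raise ValueError("cloudFiles.format is required")
    if schema_evolution_mode not in SCHEMA_EVOLUTION_MODES:
        raise ValueError(f"unknown schema evolution mode: {schema_evolution_mode}")
    return {
        "cloudFiles.format": file_format,
        # Spark options are passed as strings, so render the flag as text.
        "cloudFiles.useNotifications": str(use_notifications).lower(),
        "cloudFiles.schemaEvolutionMode": schema_evolution_mode,
    }


if __name__ == "__main__":
    opts = auto_loader_options("json")
    print(opts)
    # On a cluster, the dictionary would typically be passed as:
    # spark.readStream.format("cloudFiles").options(**opts).load("<source path>")
```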

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions