Building Your First Declarative Pipeline

    Who this is for:

    Architecture / Concept Overview

    This tutorial builds a three-layer pipeline following the medallion architecture: Bronze (raw ingestion), Silver (cleaned and validated), and Gold (aggregated for analytics).

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
      classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      JSON[JSON Files in S3/ADLS]:::source --> AL[Auto Loader]:::ingestion
      AL --> RAW[raw_sales]:::storage
      RAW --> EXP[Expectations]:::governance
      EXP --> CLEAN[clean_sales]:::processing
      CLEAN --> AGG[daily_revenue]:::serving
      AGG --> DASH[Dashboards]:::serving

    *The pipeline reads JSON files, ingests them as raw_sales, validates into clean_sales, and aggregates into daily_revenue.*
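The flow in the diagram can be traced end to end with a small plain-Python sketch. This is not the Databricks pipeline API; it only mirrors the three layers so the data movement is easy to follow. The table names `raw_sales`, `clean_sales`, and `daily_revenue` come from the diagram, while the column names (`order_id`, `order_date`, `amount`), the sample records, and the two validation rules are assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample input: one JSON object per line, as it might land in cloud storage.
RAW_LINES = [
    '{"order_id": "A1", "order_date": "2024-01-01", "amount": 20.0}',
    '{"order_id": "A2", "order_date": "2024-01-01", "amount": 5.5}',
    '{"order_id": null, "order_date": "2024-01-02", "amount": 12.0}',
    '{"order_id": "A4", "order_date": "2024-01-02", "amount": -3.0}',
    '{"order_id": "A5", "order_date": "2024-01-02", "amount": 7.0}',
]


def raw_sales(lines):
    """Bronze: parse every record as-is, with no filtering."""
    return [json.loads(line) for line in lines]


# Silver-layer rules (assumed for this sketch): name -> row-level predicate.
EXPECTATIONS = {
    "valid_order_id": lambda r: r.get("order_id") is not None,
    "positive_amount": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] > 0,
}


def clean_sales(rows):
    """Silver: keep only rows that satisfy every expectation."""
    return [r for r in rows if all(check(r) for check in EXPECTATIONS.values())]


def daily_revenue(rows):
    """Gold: total amount per order_date."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["order_date"]] += r["amount"]
    return dict(totals)


bronze = raw_sales(RAW_LINES)
silver = clean_sales(bronze)
gold = daily_revenue(silver)
print(gold)
```

Bronze keeps all five records, including the two bad ones, so nothing is lost at ingestion. Validation happens only at the Silver step, and Gold aggregates whatever survives it.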

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
      classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
      DEV[Development Workflow]:::processing
      DEV --> W[Write Pipeline Code]:::source
      DEV --> V[Validate in Dev Mode]:::processing
      DEV --> T[Test with Sample Data]:::processing
      DEV --> D[Deploy to Production]:::serving
      DEV --> M[Monitor & Iterate]:::governance
      W --> V --> T --> D --> M

    *The development lifecycle for a Declarative Pipeline.*

    Key Terms

    Prerequisites and Setup

    • Databricks workspace with Unity Catalog enabled.
    • A catalog and three schemas (bronze, silver, gold) for the output tables.
    • Sample JSON files in a cloud storage landing zone.
    • A Databricks Repo or workspace folder for pipeline notebooks.
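As a rough sketch of the catalog and schema prerequisite, the helper below builds the SQL statements that would create one catalog with the three layer schemas. The catalog name `sales_demo` and the function name `setup_statements` are hypothetical. The code only produces strings; running them requires a workspace where you have permission to create catalogs and schemas.

```python
def setup_statements(catalog, schemas=("bronze", "silver", "gold")):
    """Return SQL statements that create a catalog and one schema per layer."""
    statements = [f"CREATE CATALOG IF NOT EXISTS {catalog}"]
    statements += [
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}" for schema in schemas
    ]
    return statements


# "sales_demo" is a placeholder catalog name, not one defined by this tutorial.
statements = setup_statements("sales_demo")
for stmt in statements:
    # In a Databricks notebook, each statement could be passed to spark.sql(stmt).
    print(stmt)
```

Using `IF NOT EXISTS` keeps the setup safe to re-run, which helps when several people share the same workspace.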

    Step-by-Step Implementation

      Configuration Reference

      Building Your First Declarative Pipeline configuration options
      Parameter                        Description                                 Default
      catalog                          Unity Catalog catalog for pipeline output   Required
      target                           Target schema for output tables             Required
      development                      Enable development mode                     false
      photon                           Enable Photon acceleration                  false
      continuous                       Run continuously vs triggered               false
      clusters.autoscale.min_workers   Minimum worker nodes                        1
      clusters.autoscale.max_workers   Maximum worker nodes                        5
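To make the options above concrete, here is a minimal sketch that assembles them into a settings document and prints it as JSON. The defaults match the table. The function name `build_pipeline_settings`, the `name` field, the placeholder values, and the exact nesting of the `clusters` entry (a list with a `"default"` label) are assumptions about how the dotted parameter names map onto JSON, so check them against your workspace's pipeline settings before relying on them.

```python
import json


def build_pipeline_settings(
    name,
    catalog,
    target,
    *,
    development=False,
    photon=False,
    continuous=False,
    min_workers=1,
    max_workers=5,
):
    """Assemble pipeline settings using the defaults from the table above."""
    # catalog and target are marked Required in the table.
    if not catalog or not target:
        raise ValueError("catalog and target are required")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "name": name,
        "catalog": catalog,
        "target": target,
        "development": development,
        "photon": photon,
        "continuous": continuous,
        # Assumed layout for clusters.autoscale.min_workers / max_workers.
        "clusters": [
            {
                "label": "default",
                "autoscale": {
                    "min_workers": min_workers,
                    "max_workers": max_workers,
                },
            }
        ],
    }


# Placeholder names; development mode switched on for a first test run.
settings = build_pipeline_settings(
    "first_pipeline", "sales_demo", "silver", development=True
)
print(json.dumps(settings, indent=2))
```

Validating the two required fields and the worker range up front surfaces mistakes before the settings are submitted anywhere.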

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions