Implementing CDC with Declarative Pipelines (AUTO CDC / SCD Type 1 and 2)
Who this is for:
Architecture / Concept Overview: Implementing CDC with Declarative Pipelines (AUTO CDC / SCD Type 1 and 2)
CDC pipelines consume a change feed — typically from a database's transaction log — and apply those changes to a target Delta table. Declarative Pipelines automate the merge logic, deduplication, ordering, and optionally track historical versions of each record.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
DB[(Source Database)]:::source --> WAL[Transaction Log / CDC]:::ingestion
WAL --> CF[Change Feed - Streaming Table]:::storage
CF --> AC[APPLY CHANGES INTO]:::processing
AC --> SCD1[SCD Type 1 - Current State]:::serving
AC --> SCD2[SCD Type 2 - Full History]:::serving
*CDC flow from source database through the change feed to SCD Type 1 or Type 2 target tables.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
SCD[SCD Comparison]:::processing
SCD --> T1[SCD Type 1]:::serving
SCD --> T2[SCD Type 2]:::storage
T1 --> T1A[Overwrites existing rows]:::serving
T1 --> T1B[Current state only]:::serving
T1 --> T1C[Simpler queries]:::serving
T2 --> T2A[Preserves row history]:::storage
T2 --> T2B[Adds validity columns]:::storage
T2 --> T2C[Supports time-travel queries]:::storage
*SCD Type 1 vs Type 2: trade-offs between simplicity and historical completeness.*
Key Terms
Prerequisites and Setup
- A Declarative Pipeline with Unity Catalog enabled.
- A streaming table containing the CDC change feed (from Lakeflow Connect, Debezium, or another CDC tool).
- The change feed must include: primary key columns, a sequence/ordering column, and optionally an operation type column.
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Required |
|---|---|---|
target | Name of the target streaming table | Yes |
source | Name of the source streaming table containing CDC events | Yes |
keys | List of primary key columns | Yes |
sequence_by | Column expression for ordering changes | Yes |
stored_as_scd_type | 1 for current-state-only, 2 for full history | No (default: 1) |
apply_as_deletes | Expression identifying delete operations | No |
except_column_list | Columns to exclude from the target | No |
track_history_column_list | Columns to track for SCD Type 2 (subset tracking) | No |
column_list | Explicit list of columns to include | No |