Upserts with MERGE: Change Data Capture (CDC) Patterns
Who this is for:
Architecture / Concept Overview: Upserts with MERGE: Change Data Capture (CDC) Patterns
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
CDC[CDC Source] -->|Extract Changes| LAND[Landing Zone]
LAND -->|Read Batch / Stream| STAGE[Staging DataFrame]
STAGE -->|MERGE INTO| TARGET[Delta Target Table]
TARGET -->|Change Data Feed| DOWN[Downstream Consumers]
CDC:::source
LAND:::ingestion
STAGE:::processing
TARGET:::storage
DOWN:::serving
*A CDC pipeline extracts change events from a source system, stages them, and applies them to the Delta target via MERGE.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
MERGE[MERGE Statement] --> MATCH[WHEN MATCHED]
MERGE --> NOTMATCH[WHEN NOT MATCHED]
MERGE --> NOTMATCH_SRC[WHEN NOT MATCHED BY SOURCE]
MATCH --> UPD[UPDATE SET ...]
MATCH --> DEL[DELETE]
NOTMATCH --> INS[INSERT *]
NOTMATCH_SRC --> DEL2[DELETE]
MERGE:::storage
MATCH:::processing
NOTMATCH:::ingestion
NOTMATCH_SRC:::governance
UPD:::processing
DEL:::processing
INS:::ingestion
DEL2:::governance
*The MERGE statement supports three clauses covering every possible CDC operation: update, insert, and delete.*
Key Terms
Prerequisites and Setup
- Databricks workspace with Unity Catalog enabled
- A target Delta table with a primary key or natural key for matching
- CDC events landing in a staging location (cloud storage, Kafka topic, or another Delta table)
MODIFYprivilege on the target table
Step-by-Step Implementation
Configuration Reference
| Property | Default | Description |
|---|---|---|
delta.enableChangeDataFeed | false | Records row-level changes for CDC consumers |
spark.databricks.delta.merge.optimizeInsertOnlyMerge.enabled | true | Optimises MERGE when only inserts are needed |
spark.databricks.delta.merge.repartitionBeforeWrite.enabled | true | Repartitions output for better file sizes |
delta.enableDeletionVectors | true | Marks rows as deleted without full file rewrites |
spark.databricks.delta.merge.enableLowShuffle | true | Reduces shuffle volume during MERGE |