Upserts with MERGE: Change Data Capture (CDC) Patterns

    Who this is for:

    Architecture / Concept Overview: Upserts with MERGE: Change Data Capture (CDC) Patterns

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% flowchart LR classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED CDC[CDC Source] -->|Extract Changes| LAND[Landing Zone] LAND -->|Read Batch / Stream| STAGE[Staging DataFrame] STAGE -->|MERGE INTO| TARGET[Delta Target Table] TARGET -->|Change Data Feed| DOWN[Downstream Consumers] CDC:::source LAND:::ingestion STAGE:::processing TARGET:::storage DOWN:::serving

    *A CDC pipeline extracts change events from a source system, stages them, and applies them to the Delta target via MERGE.*

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% graph TD classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED MERGE[MERGE Statement] --> MATCH[WHEN MATCHED] MERGE --> NOTMATCH[WHEN NOT MATCHED] MERGE --> NOTMATCH_SRC[WHEN NOT MATCHED BY SOURCE] MATCH --> UPD[UPDATE SET ...] MATCH --> DEL[DELETE] NOTMATCH --> INS[INSERT *] NOTMATCH_SRC --> DEL2[DELETE] MERGE:::storage MATCH:::processing NOTMATCH:::ingestion NOTMATCH_SRC:::governance UPD:::processing DEL:::processing INS:::ingestion DEL2:::governance

    *The MERGE statement supports three clauses covering every possible CDC operation: update, insert, and delete.*

    Key Terms

    Prerequisites and Setup

    • Databricks workspace with Unity Catalog enabled
    • A target Delta table with a primary key or natural key for matching
    • CDC events landing in a staging location (cloud storage, Kafka topic, or another Delta table)
    • MODIFY privilege on the target table

    Step-by-Step Implementation

      Configuration Reference

      Upserts with MERGE: Change Data Capture (CDC) Patterns configuration options
      PropertyDefaultDescription
      delta.enableChangeDataFeedfalseRecords row-level changes for CDC consumers
      spark.databricks.delta.merge.optimizeInsertOnlyMerge.enabledtrueOptimises MERGE when only inserts are needed
      spark.databricks.delta.merge.repartitionBeforeWrite.enabledtrueRepartitions output for better file sizes
      delta.enableDeletionVectorstrueMarks rows as deleted without full file rewrites
      spark.databricks.delta.merge.enableLowShuffletrueReduces shuffle volume during MERGE

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions