ACID Transactions on the Data Lake: How Delta Lake Delivers Them
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
W[Writer] -->|1. Write Parquet| OBJ[Object Storage]
W -->|2. Commit JSON| LOG[_delta_log/0000N.json]
LOG -->|3. Atomic Put-If-Absent| OBJ
R[Reader] -->|4. Read Log| LOG
LOG -->|5. Resolve File Set| OBJ
W:::ingestion
OBJ:::source
LOG:::governance
R:::serving
*A Delta write is a two-phase process: data files land first, then a single atomic log commit makes them visible to readers.*
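The commit step in the diagram can be sketched locally. This is a minimal illustration, not Delta Lake's implementation: it assumes a local directory stands in for object storage and uses Python's exclusive-create file mode as a stand-in for put-if-absent. The helper name `try_commit` and the Parquet file names are hypothetical.

```python
import json
import os
import tempfile


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to publish `actions` as commit `version`; return False if that version already exists."""
    os.makedirs(log_dir, exist_ok=True)
    # Delta log entries are named with a zero-padded 20-digit version number.
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        # Mode "x" raises FileExistsError if the file is already there,
        # which approximates put-if-absent on a local filesystem.
        with open(path, "x", encoding="utf-8") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
    except FileExistsError:
        return False
    return True


# Two writers have already written their Parquet files and now race for version 0.
log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
first = try_commit(log_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
second = try_commit(log_dir, 0, [{"add": {"path": "part-0001.parquet"}}])
```

Only the first call succeeds. The second writer's data file still exists in storage, but no commit references it, so readers never see it. A real object store has to guarantee that the commit file appears atomically with its full contents, which this local sketch does not attempt.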
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
ACID[ACID Properties] --> A[Atomicity]
ACID --> C[Consistency]
ACID --> I[Isolation]
ACID --> D[Durability]
A -->|All-or-nothing commits| IMPL1[Single JSON commit file]
C -->|Schema checked on write| IMPL2[Schema enforcement]
I -->|Snapshot reads| IMPL3[Optimistic concurrency]
D -->|Cloud storage replication| IMPL4[Object store durability]
ACID:::storage
A:::processing
C:::governance
I:::processing
D:::source
IMPL1:::ingestion
IMPL2:::governance
IMPL3:::serving
IMPL4:::source
*Each ACID property maps to a concrete mechanism in the Delta Lake protocol.*
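The snapshot reads behind the isolation branch can be illustrated by replaying the log up to a chosen version. The sketch below assumes commits are already parsed into Python dictionaries holding `add` and `remove` actions; the function name `resolve_file_set` and the file names are hypothetical, and checkpoints, metadata actions, and statistics are ignored.

```python
def resolve_file_set(commits: dict, as_of_version: int) -> set:
    """Replay add/remove actions in version order and return the live data files."""
    live = set()
    for version in sorted(commits):
        if version > as_of_version:
            break  # later commits are invisible to this snapshot
        for action in commits[version]:
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Version 1 rewrites a.parquet into c.parquet in a single commit.
commits = {
    0: [{"add": {"path": "a.parquet"}}, {"add": {"path": "b.parquet"}}],
    1: [{"remove": {"path": "a.parquet"}}, {"add": {"path": "c.parquet"}}],
}
snapshot_v0 = resolve_file_set(commits, 0)
snapshot_v1 = resolve_file_set(commits, 1)
```

A reader pinned to version 0 keeps seeing `a.parquet` and `b.parquet` even after version 1 lands. Because the remove and the add arrive in the same commit file, no snapshot can contain both `a.parquet` and `c.parquet`, which is the all-or-nothing behavior the atomicity branch describes.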
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
JOB_A[Job A: MERGE] -->|Reads v5| SNAP[Snapshot v5]
JOB_B[Job B: Append] -->|Reads v5| SNAP
JOB_B -->|Commits first| V6[Version 6]
JOB_A -->|Conflict check passes| V7[Version 7]
JOB_A:::processing
JOB_B:::ingestion
SNAP:::source
V6:::storage
V7:::storage
*Optimistic concurrency allows two concurrent writers to succeed as long as they do not modify overlapping data.*
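The conflict check in the diagram can be approximated as follows. This is a deliberately simplified sketch: it flags a conflict only when a commit that won the race removed a file the transaction read or also tried to remove. The rules Delta Lake actually applies depend on the isolation level and the operation type. The names `Txn` and `has_conflict` and the file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    read_version: int                                 # snapshot version the job started from
    files_read: set = field(default_factory=set)      # files the job scanned
    files_removed: set = field(default_factory=set)   # files the job wants to rewrite


def has_conflict(txn: Txn, winning_commits: dict) -> bool:
    """Return True if a commit newer than the txn's snapshot removed a file the txn depends on."""
    for version, actions in winning_commits.items():
        if version <= txn.read_version:
            continue  # already part of the snapshot the txn read
        for action in actions:
            if "remove" in action:
                path = action["remove"]["path"]
                if path in txn.files_read or path in txn.files_removed:
                    return True
    return False


# Job A (MERGE) read version 5 and scanned part-x.parquet.
job_a = Txn(read_version=5, files_read={"part-x.parquet"})
append_v6 = {6: [{"add": {"path": "part-new.parquet"}}]}   # Job B: blind append
delete_v6 = {6: [{"remove": {"path": "part-x.parquet"}}]}  # a competing rewrite

a_vs_append = has_conflict(job_a, append_v6)
a_vs_delete = has_conflict(job_a, delete_v6)
```

Against the append, Job A has no conflict and can commit as version 7, matching the diagram. Against a commit that removed `part-x.parquet`, the check fails and Job A would have to re-read the table and retry.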
Key Terms
Prerequisites and Setup
- A Databricks workspace with Unity Catalog enabled
- A cluster running Databricks Runtime 13.3 LTS or later
- Two notebooks or jobs that will write to the same table concurrently (for conflict demonstration)
- MODIFY privilege on the target table
Step-by-Step Implementation
Configuration Reference
| Property | Default | Description |
|---|---|---|
| delta.isolationLevel | WriteSerializable | Controls conflict detection strictness; Serializable is stricter |
| spark.databricks.delta.retryWriteConflict.enabled | true | Automatically retries conflicting commits |
| spark.databricks.delta.retryWriteConflict.limit | 3 | Maximum number of automatic retries |
| delta.logRetentionDuration | 30 days | Duration to retain commit history |
| delta.checkpoint.writeStatsAsJson | true | Includes file statistics in checkpoints for data skipping |
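
Properties prefixed with `delta.` are table properties, so one way to apply them is an `ALTER TABLE ... SET TBLPROPERTIES` statement. The sketch below only builds the statement text so it can be inspected before it runs; the helper name `set_tblproperties_sql` and the table name `main.demo.events` are hypothetical, and the commented-out `spark.sql` call assumes an active Spark session such as the one a Databricks notebook provides.

```python
def set_tblproperties_sql(table: str, props: dict) -> str:
    """Build an ALTER TABLE statement that sets the given Delta table properties."""
    assignments = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


stmt = set_tblproperties_sql(
    "main.demo.events",  # hypothetical three-level Unity Catalog name
    {
        "delta.isolationLevel": "Serializable",
        "delta.logRetentionDuration": "interval 30 days",
    },
)
# In a notebook with an active session, the statement could then be run with:
# spark.sql(stmt)
```

Properties prefixed with `spark.` are session-level settings rather than table properties, so they would be set on the Spark session configuration instead of through this statement.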