Reading and Writing Delta Tables with PySpark and SQL
Who this is for:
Architecture / Concept Overview: Reading and Writing Delta Tables with PySpark and SQL
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
SRC[Data Sources] -->|Batch Read| DF[DataFrame]
SRC -->|Streaming Read| SS[Structured Streaming]
DF -->|append / overwrite| DT[Delta Table]
SS -->|micro-batch / continuous| DT
DT -->|spark.read| AN[Analysts]
DT -->|SQL SELECT| BI[BI Tools]
SRC:::source
DF:::processing
SS:::ingestion
DT:::storage
AN:::serving
BI:::serving
*Delta tables accept both batch and streaming writes and serve reads through PySpark DataFrames or SQL queries.*
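The flow in the diagram can be sketched in PySpark roughly as follows. This is a minimal illustration rather than a reference implementation: the table name `main.demo.events`, the checkpoint path, and all function names are hypothetical, and the functions assume they are handed an existing `SparkSession` or DataFrame.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type-only imports; Spark is needed only when the functions are called
    from pyspark.sql import DataFrame, SparkSession
    from pyspark.sql.streaming import StreamingQuery

TABLE = "main.demo.events"  # hypothetical table name, used only for illustration


def append_batch(df: "DataFrame", table: str = TABLE) -> None:
    """Batch write: append the DataFrame's rows to the Delta table."""
    df.write.format("delta").mode("append").saveAsTable(table)


def append_stream(stream_df: "DataFrame", checkpoint: str, table: str = TABLE) -> "StreamingQuery":
    """Streaming write: start a query that appends each micro-batch to the table."""
    return (
        stream_df.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint)  # required so the stream can recover
        .toTable(table)
    )


def read_dataframe(spark: "SparkSession", table: str = TABLE) -> "DataFrame":
    """Read path 1: load the table as a PySpark DataFrame."""
    return spark.read.table(table)


def select_sql(table: str = TABLE, limit: int = 10) -> str:
    """Build the SQL text for a simple preview query."""
    return f"SELECT * FROM {table} LIMIT {limit}"


def read_sql(spark: "SparkSession", table: str = TABLE, limit: int = 10) -> "DataFrame":
    """Read path 2: query the same table with SQL."""
    return spark.sql(select_sql(table, limit))
```

In a Databricks notebook the predefined `spark` session can be passed directly, for example `read_sql(spark).show()`.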
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
WRITE[Write Modes] --> APPEND[append]
WRITE --> OVERWRITE[overwrite]
WRITE --> MERGE_W[merge]
WRITE --> REPLACE[replaceWhere]
READ[Read Patterns] --> FULL[Full scan]
READ --> FILTER[Predicate pushdown]
READ --> TT[Time travel]
READ --> CDF[Change Data Feed]
READ --> STREAM[readStream]
WRITE:::ingestion
APPEND:::processing
OVERWRITE:::processing
MERGE_W:::processing
REPLACE:::processing
READ:::serving
FULL:::source
FILTER:::source
TT:::source
CDF:::storage
STREAM:::storage
*Delta Lake supports multiple write modes and read patterns, each suited to different ingestion and consumption use cases.*
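A few of the write modes and read patterns from the diagram, sketched in PySpark and SQL. The helper names and the `tgt`/`src` aliases are illustrative choices, not part of any API, and the merge assumes the source is already available as a table or temporary view whose columns match the target.

```python
def merge_sql(target: str, source: str, key: str) -> str:
    """merge: update rows that match on `key`, insert the rest."""
    return (
        f"MERGE INTO {target} AS tgt USING {source} AS src "
        f"ON tgt.{key} = src.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def time_travel_sql(table: str, version: int) -> str:
    """Time travel: query the table as of an earlier version."""
    return f"SELECT * FROM {table} VERSION AS OF {int(version)}"


def replace_where(df, table: str, predicate: str) -> None:
    """replaceWhere: atomically overwrite only the rows matching `predicate`."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .saveAsTable(table)
    )


def read_filtered(spark, table: str, predicate: str):
    """Predicate pushdown: filter on read so Delta can skip non-matching files."""
    return spark.read.table(table).where(predicate)


def read_stream(spark, table: str):
    """readStream: treat the table as a streaming source."""
    return spark.readStream.table(table)
```

The two SQL builders return plain strings, so they can be inspected before being passed to `spark.sql(...)`. With `replaceWhere`, the DataFrame being written is expected to contain only rows that satisfy the predicate.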
Key Terms
Prerequisites and Setup
- Databricks workspace with a cluster running Databricks Runtime 13.3 LTS or later
- A Delta table to read from and write to
- SELECT privilege for reads; MODIFY privilege for writes
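If the privileges are missing, a table owner or administrator can grant them with SQL. The sketch below only builds the statements; the table name and the `analysts` and `etl-jobs` principals are hypothetical, and the `GRANT ... ON TABLE ... TO` form shown assumes a workspace where table privileges are managed through SQL grants.

```python
def grant_sql(privilege: str, table: str, principal: str) -> str:
    """Build a GRANT statement; the principal is backtick-quoted."""
    return f"GRANT {privilege} ON TABLE {table} TO `{principal}`"


# Hypothetical table and principals, for illustration only.
read_grant = grant_sql("SELECT", "main.demo.events", "analysts")
write_grant = grant_sql("MODIFY", "main.demo.events", "etl-jobs")

# To apply them from a notebook: spark.sql(read_grant); spark.sql(write_grant)
```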
Step-by-Step Implementation
Configuration Reference
| Property | Default | Description |
|---|---|---|
| mergeSchema | false | Allows schema evolution during write |
| overwriteSchema | false | Replaces the table schema on overwrite |
| replaceWhere | — | Predicate selecting the rows (typically whole partitions) to atomically replace on overwrite |
| maxRecordsPerFile | — | Limits rows per output file for size control |
| optimizeWrite | true | Coalesces output files for better read performance |
| readChangeFeed | false | Reads the table's Change Data Feed (row-level changes) instead of its current state |
| startingVersion | — | First table version to include in a CDF read |
| ignoreChanges | false | Lets a streaming read continue past commits that update or delete data; rewritten files are re-emitted, so downstream consumers may see duplicates |
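The options in the table are passed through the reader or writer `option`/`options` calls. The following is a sketch under stated assumptions: the function names are hypothetical, option values are given as strings, and the change feed read assumes the table already has Change Data Feed enabled.

```python
def append_with_options(df, table: str, **options: str) -> None:
    """Append with writer options, e.g. mergeSchema or maxRecordsPerFile."""
    df.write.format("delta").mode("append").options(**options).saveAsTable(table)


def overwrite_with_new_schema(df, table: str) -> None:
    """Overwrite the data and replace the table schema with the DataFrame's."""
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(table)
    )


def read_changes(spark, table: str, starting_version: int):
    """Batch-read the Change Data Feed from `starting_version` onward."""
    return (
        spark.read.option("readChangeFeed", "true")
        .option("startingVersion", starting_version)
        .table(table)
    )
```

For example, `append_with_options(df, "main.demo.events", mergeSchema="true", maxRecordsPerFile="1000000")` appends rows while allowing new columns and capping rows per file; the table name and values here are illustrative.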