Flows, Streaming Tables, and Materialized Views Explained
Who this is for:
Architecture / Concept Overview
Every Declarative Pipeline consists of flows that connect datasets (streaming tables, materialized views, or views). A flow is one transformation path: it reads from a source or an upstream dataset and writes to a target dataset. The dataset type determines how the result is stored and refreshed.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
SRC[Source Data]:::source --> F1[Flow 1: Ingest]:::ingestion
F1 --> ST[Streaming Table: raw_events]:::storage
ST --> F2[Flow 2: Clean]:::processing
F2 --> ST2[Streaming Table: clean_events]:::storage
ST2 --> F3[Flow 3: Aggregate]:::processing
F3 --> MV[Materialized View: hourly_stats]:::serving
*Three flows connecting source data through streaming tables to a materialized view.*
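One way to read the diagram is as a dependency graph: each flow names the dataset it reads and the dataset it writes, and a valid processing order follows from those dependencies. The sketch below models that idea in plain Python with the standard-library `graphlib` module. It is not the Databricks API; the `FLOWS` dictionary and `dataset_order` helper are hypothetical names used only for illustration, and the dataset names are taken from the diagram.

```python
from graphlib import TopologicalSorter

# Hypothetical description of the three flows in the diagram:
# each flow reads one dataset and writes one dataset.
FLOWS = {
    "ingest": {"reads": "source_data", "writes": "raw_events"},
    "clean": {"reads": "raw_events", "writes": "clean_events"},
    "aggregate": {"reads": "clean_events", "writes": "hourly_stats"},
}


def dataset_order(flows):
    """Return datasets ordered so that every input precedes its output."""
    sorter = TopologicalSorter()
    for flow in flows.values():
        # add(node, *predecessors): the written dataset depends on the one read.
        sorter.add(flow["writes"], flow["reads"])
    return list(sorter.static_order())


print(dataset_order(FLOWS))
# ['source_data', 'raw_events', 'clean_events', 'hourly_stats']
```

Because the order is derived from the declared inputs and outputs, the flows can be listed in any order in `FLOWS` and the result stays the same.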
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
COMP[Dataset Comparison]:::processing
COMP --> ST[Streaming Table]:::storage
COMP --> MV[Materialized View]:::serving
COMP --> VW[View]:::processing
ST --> SP1[Append-only]:::storage
ST --> SP2[Incremental processing]:::storage
ST --> SP3[Persisted to Delta]:::storage
MV --> MP1[Recomputed on change]:::serving
MV --> MP2[Supports aggregations]:::serving
MV --> MP3[Persisted to Delta]:::serving
VW --> VP1[Not persisted]:::processing
VW --> VP2[Re-evaluated each run]:::processing
VW --> VP3[Intermediate step only]:::processing
*Comparison of the three dataset types and their key characteristics.*
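The differences in the comparison can be made concrete with a toy model. The sketch below is plain Python, not pipeline code: `StreamingTable`, `MaterializedView`, and `view` are hypothetical stand-ins that only mimic the characteristics listed in the diagram (append-only incremental processing, recomputation on refresh, and no persistence). In particular, the toy materialized view always recomputes from its full input, whereas a real engine may refresh incrementally.

```python
from collections import Counter


class StreamingTable:
    """Toy model: append-only; each update processes only unseen source rows."""

    def __init__(self, transform):
        self.transform = transform
        self.rows = []    # persisted output
        self.offset = 0   # number of source rows already consumed

    def update(self, source_rows):
        new_rows = source_rows[self.offset:]
        self.rows.extend(self.transform(row) for row in new_rows)
        self.offset = len(source_rows)
        return len(new_rows)  # rows processed in this update


class MaterializedView:
    """Toy model: persisted result, recomputed from its full input on refresh."""

    def __init__(self, query):
        self.query = query
        self.result = None

    def refresh(self, input_rows):
        self.result = self.query(input_rows)
        return self.result


def view(query, input_rows):
    """Toy model of a view: nothing is stored; the query runs on every call."""
    return query(input_rows)


def count_by_hour(rows):
    return dict(Counter(row["hour"] for row in rows))


source = [{"hour": 9}, {"hour": 9}]
raw_events = StreamingTable(transform=dict)  # copy each row unchanged
hourly_stats = MaterializedView(query=count_by_hour)

first = raw_events.update(source)    # processes both existing rows
source.append({"hour": 10})
second = raw_events.update(source)   # processes only the newly appended row
stats = hourly_stats.refresh(raw_events.rows)

print(first, second, stats)
# 2 1 {9: 2, 10: 1}
```

The second `update` call touches one row rather than three, which is the incremental behavior attributed to streaming tables, while `hourly_stats` rebuilds its aggregate from all stored rows.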
Key Terms
Prerequisites and Setup
- A Databricks workspace with Unity Catalog enabled.
- An existing Declarative Pipeline or permission to create one.
- Familiarity with Python or SQL for defining pipeline datasets.
Step-by-Step Implementation
Configuration Reference
| Parameter | Applies To | Description | Default |
|---|---|---|---|
| `table_properties.quality` | All tables | Metadata tag for medallion layer | None |
| `spark.databricks.delta.optimizeWrite.enabled` | Streaming Tables | Auto-optimize file sizes on write | true |
| `pipelines.maxFlowRetryAttempts` | Flows | Retry count for failed flows | 2 |
| `continuous` | Pipeline | Enables continuous processing mode | false |
| `photon` | Pipeline | Enables Photon-accelerated execution | false |
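As a rough illustration of how the parameters above might be grouped, the sketch below builds a pipeline settings document as a Python dictionary and prints it as JSON. It assumes that `continuous` and `photon` are top-level pipeline settings and that the dotted keys go into a string-valued `configuration` map; the pipeline name `events_pipeline` is hypothetical. `table_properties.quality` is left out because the table lists it as applying to tables rather than to the pipeline. Check the exact placement against your workspace before relying on it.

```python
import json

# Sketch of pipeline settings using the defaults from the table above.
pipeline_settings = {
    "name": "events_pipeline",  # hypothetical pipeline name
    "continuous": False,        # triggered rather than continuous processing
    "photon": False,            # Photon-accelerated execution disabled
    "configuration": {
        # Assumption: configuration values are passed as strings.
        "pipelines.maxFlowRetryAttempts": "2",
        "spark.databricks.delta.optimizeWrite.enabled": "true",
    },
}

settings_json = json.dumps(pipeline_settings, indent=2)
print(settings_json)
```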