ETL with SQL: Streaming Tables and Materialized Views in Databricks SQL

    Who this is for:

    Architecture / Concept Overview: ETL with SQL: Streaming Tables and Materialized Views in Databricks SQL

    Streaming tables and materialised views operate within a DLT pipeline that Databricks manages behind the scenes. You define the transformations in SQL; Databricks handles orchestration, incremental processing, and error recovery.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% flowchart LR classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED SRC[Cloud Storage<br/>S3 / ADLS / GCS] -->|Auto Loader| BRONZE[Streaming Table<br/>Bronze — Raw Events] BRONZE -->|SQL Transform| SILVER[Materialized View<br/>Silver — Cleaned] SILVER -->|SQL Aggregate| GOLD[Materialized View<br/>Gold — Business Metrics] GOLD --> DASH[Dashboards & BI Tools] SRC:::source BRONZE:::ingestion SILVER:::processing GOLD:::storage DASH:::serving

    *Figure 1 — Medallion pipeline built entirely with SQL: streaming tables ingest, materialised views transform and aggregate.*

    The key difference between streaming tables and materialised views lies in how they process data.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% graph TD classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED ST[Streaming Table] --> ST1[Append-only ingestion] ST --> ST2[Processes new files only] ST --> ST3[Best for raw event data] MV[Materialized View] --> MV1[Full or incremental recompute] MV --> MV2[Handles updates and deletes] MV --> MV3[Best for aggregations and joins] ST:::ingestion ST1:::ingestion ST2:::ingestion ST3:::ingestion MV:::processing MV1:::processing MV2:::processing MV3:::processing

    *Figure 2 — Streaming tables handle append-only ingestion; materialised views handle aggregations and change-aware recomputation.*

    Key Terms

    Prerequisites and Setup

    • Unity Catalog enabled with a schema where you have CREATE TABLE and CREATE MATERIALIZED VIEW privileges
    • An external location configured for raw data ingestion (if using Auto Loader)
    • A SQL warehouse or serverless DLT pipeline
    • Source data in cloud storage (JSON, CSV, Parquet, or Delta format)

    Step-by-Step Implementation

      Configuration Reference

      ETL with SQL: Streaming Tables and Materialized Views in Databricks SQL configuration options
      SettingScopeDefaultNotes
      cloudFiles.formatAuto LoaderRequiredjson, csv, parquet, avro
      cloudFiles.inferColumnTypesAuto LoaderfalseSet true for schema inference
      cloudFiles.schemaLocationAuto LoaderRequired for schema evolutionCheckpoint directory for schema tracking
      Pipeline modeDLT pipelineTriggeredtriggered (batch) or continuous
      Pipeline target schemaDLT pipelineRequiredUnity Catalog schema for output tables
      Refresh schedulePipelineManualCron expression or interval

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions