Data Lineage: Tracking Data Flows Across Your Organisation
Who this is for:
Architecture / Concept Overview
Unity Catalog captures lineage at runtime — every notebook, job, pipeline, and SQL query that reads from or writes to a Unity Catalog table generates a lineage event.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
NB[Notebook] -->|Writes| SILVER[Silver Table]
JOB[Scheduled Job] -->|Writes| GOLD[Gold Table]
SQL[SQL Query] -->|Reads| GOLD
SILVER -->|Read by| JOB
UC[Unity Catalog<br/>Lineage Capture] -.->|Tracks| NB
UC -.->|Tracks| JOB
UC -.->|Tracks| SQL
NB:::source
JOB:::processing
SQL:::serving
SILVER:::storage
GOLD:::storage
UC:::governance
*Figure 1 — Unity Catalog transparently captures lineage from notebooks, jobs, and SQL queries at runtime.*
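The flow in Figure 1 can be pictured as a run that reads some tables and writes others, with each read-to-write pair becoming one lineage edge. The sketch below is a simplified conceptual model only, not Unity Catalog's internal implementation. The `LineageEdge` class, the `edges_from_run` helper, and the table names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One source -> target dependency observed during a run (hypothetical model)."""
    source: str
    target: str
    entity: str  # the notebook, job, or query that produced the edge


def edges_from_run(entity: str, reads: list[str], writes: list[str]) -> list[LineageEdge]:
    """Pair every table a run read with every table it wrote."""
    return [LineageEdge(r, w, entity) for r in reads for w in writes]


# The scheduled job in Figure 1 reads the silver table and writes the gold table.
# Table names are made up for this example.
job_edges = edges_from_run(
    "scheduled_job",
    reads=["main.sales.silver_orders"],
    writes=["main.sales.gold_daily_revenue"],
)
print(job_edges)
```

In this simplified model a run that only reads, such as the SQL query in Figure 1, produces no edge, because there is no written table to pair the read with.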
Lineage operates at two granularity levels: table-level (which tables feed which tables) and column-level (which columns flow into which columns).
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
TL[Table-Level Lineage] --> UP_TBL[Upstream Tables<br/>What feeds this table?]
TL --> DOWN_TBL[Downstream Tables<br/>What depends on this table?]
CL[Column-Level Lineage] --> UP_COL[Upstream Columns<br/>Which source columns feed this column?]
CL --> DOWN_COL[Downstream Columns<br/>Which columns depend on this column?]
TL:::governance
UP_TBL:::processing
DOWN_TBL:::serving
CL:::governance
UP_COL:::processing
DOWN_COL:::serving
*Figure 2 — Two levels of lineage granularity: table-level for understanding data flow, column-level for impact analysis.*
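Both questions in Figure 2 reduce to walking a graph of source-to-target edges, and the same walk works at either granularity. The following is a minimal sketch under that assumption: the `downstream` function and the sample edges are hypothetical, and in practice the edges would come from captured lineage rather than a hand-written list.

```python
from collections import defaultdict, deque


def downstream(edges, start):
    """Return every node reachable from `start` by following source -> target edges."""
    children = defaultdict(set)
    for source, target in edges:
        children[source].add(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            # Skip nodes already visited so cycles cannot loop forever.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


# Table-level edges: which tables feed which tables (names are illustrative).
table_edges = [
    ("main.sales.bronze_orders", "main.sales.silver_orders"),
    ("main.sales.silver_orders", "main.sales.gold_daily_revenue"),
]

# Column-level edges: which columns flow into which columns (names are illustrative).
column_edges = [
    ("silver_orders.amount", "gold_daily_revenue.total_amount"),
    ("silver_orders.order_date", "gold_daily_revenue.day"),
]

print(sorted(downstream(table_edges, "main.sales.bronze_orders")))
print(sorted(downstream(column_edges, "silver_orders.amount")))
```

Swapping source and target in each edge before calling `downstream` gives the upstream view, answering "what feeds this table or column?" with the same traversal.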
Key Terms
Prerequisites and Setup
- Unity Catalog enabled on the workspace
- Tables registered in Unity Catalog (lineage is not captured for legacy Hive metastore tables)
- Compute that supports lineage capture (SQL warehouses, jobs, notebooks using Unity Catalog-enabled clusters)
- Access to system tables for programmatic lineage queries
Step-by-Step Implementation
Configuration Reference
| Item | Description | Key Columns / Notes |
|---|---|---|
| system.access.table_lineage | Table-level lineage events | source_table_full_name, target_table_full_name, event_time |
| system.access.column_lineage | Column-level lineage events | source_column_name, target_column_name, event_time |
| Lineage retention | Default 1 year | Configure via account settings |
| Supported compute | SQL warehouses, jobs, notebooks | Must use Unity Catalog-enabled clusters |
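
As an illustration of a programmatic lineage query, the sketch below builds a SQL statement against `system.access.table_lineage` using only the columns listed in the table above. The helper name `upstream_tables_query`, the 30-day default window, and the example table name are assumptions made for this example. The function only produces a string, so it can be inspected before being run.

```python
import re

# Accept only unquoted three-level names (catalog.schema.table) so the value
# can be embedded in the SQL text safely.
_FULL_NAME = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def upstream_tables_query(target_table: str, days: int = 30) -> str:
    """Build SQL listing the distinct tables that fed `target_table` recently."""
    if not _FULL_NAME.fullmatch(target_table):
        raise ValueError(f"expected catalog.schema.table, got {target_table!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT DISTINCT source_table_full_name\n"
        "FROM system.access.table_lineage\n"
        f"WHERE target_table_full_name = '{target_table}'\n"
        "  AND source_table_full_name IS NOT NULL\n"
        f"  AND event_time >= current_timestamp() - INTERVAL {int(days)} DAYS"
    )


# Hypothetical table name, for illustration only.
query = upstream_tables_query("main.sales.gold_daily_revenue", days=7)
print(query)
```

In a Databricks notebook, where a `spark` session is already defined, the resulting string could be passed to `spark.sql(query)`, provided the caller has access to the system tables.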