Feature Engineering: Creating and Serving Features at Scale
Who this is for:
Architecture / Concept Overview
Features flow from raw data through transformation pipelines into a governed Feature Store, then fan out to both training and real-time serving.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
RAW[Raw Tables] -->|Transform| PIPE[Feature Pipelines]
PIPE -->|Write| FT[Feature Tables in UC]
FT -->|Batch| TRAIN[Training Datasets]
FT -->|Publish| ONLINE[Online Store]
ONLINE -->|Lookup| SERVE[Model Serving Endpoint]
TRAIN -->|Train| MODEL[ML Model]
MODEL -->|Deploy| SERVE
RAW:::source
PIPE:::ingestion
FT:::storage
TRAIN:::processing
ONLINE:::serving
SERVE:::serving
MODEL:::processing
*Feature pipeline from raw data to both batch training and online serving with a unified Feature Store.*
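The flow in the diagram can be sketched in plain Python. This is a minimal in-memory illustration, not the Feature Store API: the `customer_id`, `amount`, and `churned` columns, the `feature_pipeline` and `lookup_online` helpers, and the dict-backed stores are all hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical raw table: one row per purchase event.
raw_rows = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]


def feature_pipeline(rows):
    """Transform raw events into one feature row per customer_id."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
        counts[row["customer_id"]] += 1
    return {
        cid: {"total_spend": totals[cid], "purchase_count": counts[cid]}
        for cid in totals
    }


# "Write": the feature table, keyed by its primary key.
feature_table = feature_pipeline(raw_rows)

# "Batch": join labels to features to build a training dataset.
labels = [{"customer_id": 1, "churned": 0}, {"customer_id": 2, "churned": 1}]
training_rows = [{**label, **feature_table[label["customer_id"]]} for label in labels]

# "Publish": copy the same rows into a low-latency key-value store.
online_store = dict(feature_table)


def lookup_online(customer_id):
    """Return the feature row a serving endpoint would fetch at request time."""
    return online_store[customer_id]
```

Because training and serving both read from the same feature table, the model sees identically computed values in both paths, which is the point of routing everything through one store.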
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
UC[Unity Catalog] --> OFFLINE[Offline Store - Delta Table]
UC --> ONLINE_S[Online Store]
UC --> LINEAGE[Data Lineage]
OFFLINE --> PIT[Point-in-Time Lookups]
OFFLINE --> BATCH[Batch Scoring]
ONLINE_S --> RT[Real-Time Serving]
LINEAGE --> DISCOVER[Feature Discovery]
UC:::governance
OFFLINE:::storage
ONLINE_S:::serving
LINEAGE:::governance
PIT:::processing
BATCH:::processing
RT:::serving
DISCOVER:::source
*Feature Store components: offline Delta tables, online store, lineage, and discovery through Unity Catalog.*
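Point-in-time lookups are the least obvious component above: a label observed at time t should only be joined to feature values that were known at or before t, otherwise future information leaks into training. The sketch below shows the idea with the standard library only. The `history` structure and the `point_in_time_lookup` helper are hypothetical and do not reflect how the offline store is implemented.

```python
from bisect import bisect_right

# Hypothetical feature history per primary key, sorted by timestamp.
# Each entry is (event_time, feature_value).
history = {
    "user_1": [(10, 0.2), (20, 0.5), (30, 0.9)],
}


def point_in_time_lookup(key, as_of):
    """Return the latest feature value with timestamp <= as_of, or None."""
    rows = history.get(key, [])
    timestamps = [ts for ts, _ in rows]
    # Index of the first timestamp strictly greater than as_of.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx else None
```

A lookup at time 25 returns the value written at time 20 and ignores the later value at time 30; a lookup before the first write returns `None`.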
Key Terms
Prerequisites and Setup
- Unity Catalog enabled with a catalog and schema for feature tables.
- `CREATE TABLE` and `USE SCHEMA` privileges.
- Databricks Runtime for ML.
- For online serving: Model Serving enabled and an online store configured.
Step-by-Step Implementation
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
| `primary_keys` | (no default) | Columns that uniquely identify a feature row |
| `timestamp_keys` | `None` | Timestamp column used for point-in-time lookups |
| `description` | `""` | Human-readable description shown in Unity Catalog |
| `mode` (publish) | `"merge"` | Publish mode: `merge` (upsert) or `overwrite` |
| `exclude_columns` | `[]` | Columns to drop from the training set |
| `lookup_key` | (no default) | Columns to join on when looking up features |
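
To make the semantics of `mode`, `primary_keys`, `lookup_key`, and `exclude_columns` concrete, here is a small in-memory sketch. It mirrors the parameter names from the table, but the `publish` and `build_training_set` functions are hypothetical stand-ins written for this example, not the Databricks client methods.

```python
def publish(table, rows, primary_keys, mode="merge"):
    """Publish feature rows into a dict-backed table.

    merge: upsert rows by primary key, keeping keys absent from `rows`.
    overwrite: replace the table contents with `rows`.
    """
    if mode not in ("merge", "overwrite"):
        raise ValueError(f"unsupported mode: {mode}")
    if mode == "overwrite":
        table.clear()
    for row in rows:
        key = tuple(row[k] for k in primary_keys)
        table[key] = row
    return table


def build_training_set(labels, table, lookup_key, exclude_columns=()):
    """Join label rows to features on lookup_key, then drop exclude_columns."""
    out = []
    for label in labels:
        key = tuple(label[k] for k in lookup_key)
        joined = {**label, **table.get(key, {})}
        out.append({k: v for k, v in joined.items() if k not in exclude_columns})
    return out


table = {}
publish(table, [{"id": 1, "score": 0.1}, {"id": 2, "score": 0.2}], ["id"])

# merge: id 2 is updated in place, id 1 is left untouched.
publish(table, [{"id": 2, "score": 0.9}], ["id"], mode="merge")
merged_keys = sorted(table)
merged_score = table[(2,)]["score"]

# overwrite: only the newly published rows remain.
publish(table, [{"id": 3, "score": 0.3}], ["id"], mode="overwrite")
overwritten_keys = sorted(table)

# lookup_key picks the join column; exclude_columns drops it from the result.
training = build_training_set(
    [{"id": 3, "label": 1}], table, lookup_key=["id"], exclude_columns=["id"]
)
```

Excluding the join key from the training set is a common choice, since an identifier column usually carries no signal the model should learn from.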