Feature Engineering: Creating and Serving Features at Scale

    Who this is for:

    Architecture / Concept Overview

    Features flow from raw data through transformation pipelines into a governed Feature Store, then fan out to both training and real-time serving.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    flowchart LR
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        RAW[Raw Tables] -->|Transform| PIPE[Feature Pipelines]
        PIPE -->|Write| FT[Feature Tables in UC]
        FT -->|Batch| TRAIN[Training Datasets]
        FT -->|Publish| ONLINE[Online Store]
        ONLINE -->|Lookup| SERVE[Model Serving Endpoint]
        TRAIN -->|Train| MODEL[ML Model]
        MODEL -->|Deploy| SERVE
        RAW:::source
        PIPE:::ingestion
        FT:::storage
        TRAIN:::processing
        ONLINE:::serving
        SERVE:::serving
        MODEL:::processing

    *Feature pipeline from raw data to both batch training and online serving with a unified Feature Store.*
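The flow in the diagram can be sketched in plain Python to make the two consumption paths concrete. This is only an illustration under stated assumptions: the row layout (`customer_id`, `amount`), the helper names `build_feature_table` and `publish_online`, and the in-memory dict standing in for an online store are all hypothetical and are not Databricks APIs.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw rows; the column names are assumptions for illustration.
RAW_ORDERS = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 5.0},
]


def build_feature_table(raw_rows):
    """Transform step: aggregate raw rows into one feature row per primary key."""
    amounts = defaultdict(list)
    for row in raw_rows:
        amounts[row["customer_id"]].append(row["amount"])
    return [
        {"customer_id": cid, "order_count": len(vals), "avg_amount": mean(vals)}
        for cid, vals in sorted(amounts.items())
    ]


def publish_online(feature_rows, key="customer_id"):
    """Publish step: index feature rows by primary key for fast single-row lookups."""
    return {row[key]: row for row in feature_rows}


# The same feature rows feed both paths shown in the diagram.
feature_table = build_feature_table(RAW_ORDERS)  # offline copy, read in batch for training
online_store = publish_online(feature_table)     # online copy, read by key at serving time
```

Because both copies derive from one transformation, training and serving see the same feature definitions, which is the point of routing everything through a single feature table.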

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
    graph TD
        classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
        UC[Unity Catalog] --> OFFLINE[Offline Store - Delta Table]
        UC --> ONLINE_S[Online Store]
        UC --> LINEAGE[Data Lineage]
        OFFLINE --> PIT[Point-in-Time Lookups]
        OFFLINE --> BATCH[Batch Scoring]
        ONLINE_S --> RT[Real-Time Serving]
        LINEAGE --> DISCOVER[Feature Discovery]
        UC:::governance
        OFFLINE:::storage
        ONLINE_S:::serving
        LINEAGE:::governance
        PIT:::processing
        BATCH:::processing
        RT:::serving
        DISCOVER:::source

    *Feature Store components: offline Delta tables, online store, lineage, and discovery through Unity Catalog.*
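The point-in-time lookup served by the offline store is worth illustrating, since it is what prevents future feature values from leaking into training rows. The sketch below is a minimal pure-Python version, assuming each key's history is a list of `(timestamp, value)` pairs sorted by timestamp. The data and the function name `point_in_time_lookup` are hypothetical, and a real feature store performs this as a join over Delta tables rather than in memory.

```python
from bisect import bisect_right

# Hypothetical feature history: (timestamp, value) pairs per key, sorted by timestamp.
FEATURE_HISTORY = {
    "user_a": [(1, 10.0), (5, 12.0), (9, 15.0)],
}


def point_in_time_lookup(history, key, as_of):
    """Return the latest feature value whose timestamp is <= as_of, or None."""
    rows = history.get(key, [])
    timestamps = [ts for ts, _ in rows]
    # bisect_right counts the rows with timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx else None


# A label observed at time 6 may only see the value written at time 5.
value_at_6 = point_in_time_lookup(FEATURE_HISTORY, "user_a", as_of=6)
```

A label observed before any feature value exists gets `None` rather than a later value, which is the conservative behavior for training data.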

    Key Terms

    Prerequisites and Setup

    • Unity Catalog enabled with a catalog and schema for feature tables.
    • CREATE TABLE and USE SCHEMA privileges.
    • Databricks Runtime for ML.
    • For online serving: Model Serving enabled and an online store configured.

    Step-by-Step Implementation

      Configuration Reference

      Feature Engineering: Creating and Serving Features at Scale configuration options
      Parameter        Default   Description
      primary_keys     -         Columns uniquely identifying a feature row
      timestamp_keys   None      Timestamp column for point-in-time lookups
      description      ""        Human-readable description shown in Unity Catalog
      mode (publish)   "merge"   Publish mode: merge (upsert) or overwrite
      exclude_columns  []        Columns to drop from the training set
      lookup_key       -         Columns to join on when looking up features
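To make the semantics of `mode`, `lookup_key`, and `exclude_columns` concrete, here is a small pure-Python sketch. It only mimics the behavior the table describes. The functions `publish_rows` and `assemble_training_set` are hypothetical stand-ins written for this illustration, not the Feature Engineering client API, and the dict-based tables are an assumption.

```python
def publish_rows(online, rows, primary_keys, mode="merge"):
    """Return a new online table after publishing rows.

    mode="merge" upserts by primary key; mode="overwrite" replaces all existing rows.
    """
    if mode not in ("merge", "overwrite"):
        raise ValueError(f"unsupported mode: {mode}")
    result = dict(online) if mode == "merge" else {}
    for row in rows:
        key = tuple(row[k] for k in primary_keys)
        result[key] = row
    return result


def assemble_training_set(labels, features, lookup_key, exclude_columns=()):
    """Join label rows to feature rows on lookup_key, then drop excluded columns."""
    training_rows = []
    for label_row in labels:
        key = tuple(label_row[k] for k in lookup_key)
        joined = {**label_row, **features.get(key, {})}
        training_rows.append(
            {col: val for col, val in joined.items() if col not in exclude_columns}
        )
    return training_rows


# Usage: seed two rows, then publish an update for id=2 in each mode.
seed = publish_rows({}, [{"id": 1, "x": 1.0}, {"id": 2, "x": 2.0}], ["id"])
merged = publish_rows(seed, [{"id": 2, "x": 9.0}], ["id"], mode="merge")
overwritten = publish_rows(seed, [{"id": 2, "x": 9.0}], ["id"], mode="overwrite")
training = assemble_training_set(
    [{"id": 1, "label": 0}], merged, lookup_key=["id"], exclude_columns=["id"]
)
```

In this sketch `merged` keeps the untouched row for `id=1`, while `overwritten` contains only the newly published row. That difference is the main reason to prefer merge for incremental publishes.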

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions