Model Serving: Deploying Models as REST Endpoints

Who this is for:

Architecture / Concept Overview

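Model Serving sits between the model registry (Unity Catalog) and client applications. It loads registered models into serving endpoints that expose a REST API, and it handles autoscaling, versioning, and traffic splitting between model versions.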
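To make traffic splitting concrete, here is a plain-Python simulation of weighted routing between two served entities. It is a conceptual model only. It assumes each request is routed independently with probability proportional to its traffic percentage. It is not Databricks' internal routing code, and the entity names `v1` and `v2` are hypothetical.

```python
import random
from collections import Counter


def route_requests(routes: dict[str, int], n_requests: int, seed: int = 0) -> Counter:
    """Simulate routing n_requests across served entities.

    routes maps a (hypothetical) served-entity name to its traffic percentage.
    Each request is assigned independently, weighted by those percentages.
    """
    if sum(routes.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    names = list(routes)
    weights = [routes[name] for name in names]
    return Counter(rng.choices(names, weights=weights, k=n_requests))


counts = route_requests({"v1": 90, "v2": 10}, n_requests=10_000)
for name, count in sorted(counts.items()):
    print(f"{name}: {count / 10_000:.1%}")
```

Over many requests the observed split converges on 90/10, but a short window can deviate noticeably. Keep this in mind when you compare metrics between entities during an A/B test.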

Figure: A Unity Catalog Model is loaded into a Serving Endpoint. The endpoint routes requests to Served Entity v1 (90% of traffic) and Served Entity v2 (10%), and both return predictions to the Client Application. The endpoint also logs requests to an Inference Table (Delta), which a Lakehouse Monitor watches.

*Model Serving architecture: Unity Catalog models are deployed to auto-scaling endpoints with traffic routing and inference logging.*
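From the client's side, a prediction is an authenticated HTTPS POST. The sketch below builds that request with the standard library and does not send it. It assumes three things: the endpoint is reachable at `https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations`, a bearer token authenticates the call, and the model accepts tabular input in the `dataframe_records` format. The host, endpoint name, token, and feature names are hypothetical placeholders.

```python
import json
import urllib.request


def build_invocation_request(host: str, endpoint_name: str, token: str,
                             records: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a scoring request for a serving endpoint."""
    url = f"https://{host}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"dataframe_records": records}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_invocation_request(
    host="example.cloud.databricks.com",   # hypothetical workspace host
    endpoint_name="churn-model",           # hypothetical endpoint name
    token="dapi-REDACTED",                 # hypothetical; read from a secret store in practice
    records=[{"tenure_months": 12, "monthly_spend": 49.5}],  # hypothetical features
)
print(req.full_url)
# To score for real: pass req to urllib.request.urlopen and json.load the response.
```

Keep the token out of source code. Read it from an environment variable or a secret store at run time instead.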

Figure: A Serving Endpoint has three groups of settings. Configuration covers workload size, scale to zero, and GPU serving. Traffic Policy covers A/B traffic splits and canary rollouts. Monitoring covers latency metrics, error rate, and inference logging.

*Serving endpoint components: configuration, traffic management, and monitoring.*
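A canary rollout is usually a sequence of traffic-policy updates. Each update shifts a larger share of requests to the new entity while you watch latency and error-rate metrics. The sketch below generates that sequence as lists of routes. It assumes the REST route shape `{"served_model_name": ..., "traffic_percentage": ...}`. The step percentages and entity names are hypothetical. Each step would still need to be applied as an endpoint config update and gated on your own monitoring.

```python
def canary_schedule(stable: str, candidate: str,
                    steps: list[int]) -> list[list[dict]]:
    """Return one routes list per rollout step.

    steps holds the candidate's traffic percentage at each stage,
    e.g. [10, 25, 50, 100]; the stable entity receives the remainder.
    """
    if any(not 0 <= pct <= 100 for pct in steps):
        raise ValueError("percentages must be between 0 and 100")
    if steps != sorted(steps):
        raise ValueError("a canary should only ever increase candidate traffic")
    schedule = []
    for pct in steps:
        # Both routes always sum to 100 at every step.
        schedule.append([
            {"served_model_name": stable, "traffic_percentage": 100 - pct},
            {"served_model_name": candidate, "traffic_percentage": pct},
        ])
    return schedule


for step in canary_schedule("model-v1", "model-v2", [10, 25, 50, 100]):
    print(step)
```

At the final step the stable entity receives 0% of traffic. It can then be removed from the endpoint configuration once you trust the candidate.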

Key Terms

Prerequisites and Setup

Premium or Enterprise Databricks workspace.
A model registered in Unity Catalog.
EXECUTE privilege on the registered model for serving.
For GPU serving: GPU-capable workload types enabled in your region.
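Models registered in Unity Catalog are addressed by a three-level name, `catalog.schema.model`. A small pre-flight check like the one below can catch a bare or malformed model name before you attempt a deployment. The helper is hypothetical and not part of any Databricks API. It checks only the shape of the name, so it cannot confirm that the model exists or that you hold EXECUTE on it.

```python
def parse_uc_model_name(full_name: str) -> tuple[str, str, str]:
    """Split a Unity Catalog model name into (catalog, schema, model).

    Hypothetical pre-flight helper: checks only the three-level shape,
    not whether the model exists or whether the caller has EXECUTE on it.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(part.strip() for part in parts):
        raise ValueError(f"expected 'catalog.schema.model', got {full_name!r}")
    catalog, schema, model = parts
    return catalog, schema, model


print(parse_uc_model_name("main.ml_models.churn_classifier"))  # hypothetical model
```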

Step-by-Step Implementation

Configuration Reference

Serving endpoint configuration options

| Parameter | Default | Description |
|---|---|---|
| `workload_size` | `Small` | Compute tier: `Small`, `Medium`, or `Large` |
| `scale_to_zero_enabled` | `true` | Scale down to zero replicas when idle |
| `workload_type` | `CPU` | Set to `GPU_SMALL`, `GPU_MEDIUM`, or `GPU_LARGE` for GPU serving |
| `traffic_percentage` | `100` | Percentage of traffic routed to each served entity; the percentages across all routes must sum to 100 |
| `auto_capture_config.enabled` | `false` | Enable inference table logging |
| `environment_vars` | `{}` | Environment variables passed to the serving container |
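The parameters in the table map onto the JSON body of an endpoint-creation request. The sketch below assembles that body as a Python dict for two served entities with a 90/10 split and inference logging. It assumes the REST shape used by `POST /api/2.0/serving-endpoints`, with the `served_entities`, `traffic_config.routes`, and `auto_capture_config` fields. The endpoint, model, catalog, and schema names are hypothetical. Field names can differ between API versions, so check the payload against the current API reference before you send it.

```python
import json
from typing import Optional


def served_entity(name: str, model: str, version: str, *,
                  workload_size: str = "Small",
                  workload_type: str = "CPU",
                  scale_to_zero_enabled: bool = True,
                  environment_vars: Optional[dict] = None) -> dict:
    """One served entity; defaults mirror the configuration table above."""
    return {
        "name": name,
        "entity_name": model,          # three-level Unity Catalog name
        "entity_version": version,
        "workload_size": workload_size,
        "workload_type": workload_type,
        "scale_to_zero_enabled": scale_to_zero_enabled,
        "environment_vars": environment_vars or {},
    }


def endpoint_payload(endpoint_name: str, entities: list[dict],
                     traffic: dict[str, int],
                     inference_table: Optional[tuple[str, str, str]] = None) -> dict:
    """Assemble a create-endpoint body; inference_table is (catalog, schema, prefix)."""
    names = {e["name"] for e in entities}
    # The sketch's own consistency checks before anything is sent.
    unknown = set(traffic) - names
    if unknown:
        raise ValueError(f"routes reference undefined entities: {sorted(unknown)}")
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    config = {
        "served_entities": entities,
        "traffic_config": {
            "routes": [
                {"served_model_name": n, "traffic_percentage": p}
                for n, p in traffic.items()
            ]
        },
    }
    if inference_table is not None:
        catalog, schema, prefix = inference_table
        config["auto_capture_config"] = {
            "catalog_name": catalog,
            "schema_name": schema,
            "table_name_prefix": prefix,
            "enabled": True,
        }
    return {"name": endpoint_name, "config": config}


payload = endpoint_payload(
    "churn-endpoint",  # hypothetical endpoint name
    [
        served_entity("v1", "main.ml_models.churn_classifier", "1"),
        served_entity("v2", "main.ml_models.churn_classifier", "2"),
    ],
    traffic={"v1": 90, "v2": 10},
    inference_table=("main", "ml_logs", "churn"),  # hypothetical location
)
print(json.dumps(payload, indent=2))
```

To create the endpoint, send `json.dumps(payload)` as a POST to `https://<workspace-host>/api/2.0/serving-endpoints` with a bearer token. The Databricks SDK or CLI offers an equivalent call.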

Related Topics