AI Gateway: Governing and Monitoring Access to AI Models
Who this is for:
Architecture / Concept Overview
AI Gateway sits between applications and model providers as a single proxy. Every request passes through it, so rate limits, logging, and routing are applied in one place rather than in each application.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
APP1[App A] -->|Request| GW[AI Gateway]
APP2[App B] -->|Request| GW
APP3[Agent C] -->|Request| GW
GW -->|Rate Limit| RL[Rate Limiter]
GW -->|Log| LOG[Request Logger]
GW -->|Route| HOSTED[Databricks Models]
GW -->|Route| OPENAI[OpenAI]
GW -->|Route| ANTHRO[Anthropic]
GW -->|Failover| FALLBACK[Fallback Provider]
APP1:::source
APP2:::source
APP3:::source
GW:::governance
RL:::processing
LOG:::storage
HOSTED:::serving
OPENAI:::ingestion
ANTHRO:::ingestion
FALLBACK:::ingestion
*AI Gateway architecture: applications route through a single proxy with rate limiting, logging, and multi-provider failover.*
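The failover path in the diagram can be illustrated with a small sketch. This is not how AI Gateway is implemented internally; it is a minimal model of the idea, assuming each provider is a plain Python callable. The names `route_with_failover`, `ProviderError`, `flaky_primary`, and `fallback` are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List, Tuple


class ProviderError(Exception):
    """Raised by a provider callable when a request fails (hypothetical)."""


def route_with_failover(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    order: List[str],
    log: List[dict],
) -> Tuple[str, str]:
    """Try providers in the given order and return (name, response) for the first success.

    Every attempt, successful or not, is appended to `log`, mirroring the
    "Log" edge in the diagram.
    """
    for name in order:
        try:
            response = providers[name](prompt)
        except ProviderError as exc:
            log.append({"provider": name, "status": "error", "detail": str(exc)})
            continue  # fall through to the next provider
        log.append({"provider": name, "status": "ok"})
        return name, response
    raise ProviderError("all providers failed")


# Stand-in providers: the primary always fails, the fallback echoes the prompt.
def flaky_primary(prompt: str) -> str:
    raise ProviderError("primary unavailable")


def fallback(prompt: str) -> str:
    return f"echo: {prompt}"


request_log: List[dict] = []
chosen, answer = route_with_failover(
    "hello",
    {"primary": flaky_primary, "fallback": fallback},
    ["primary", "fallback"],
    request_log,
)
```

Because the applications only call the proxy, the provider order can change without touching application code.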
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
FEATURES[AI Gateway Features] --> RATE[Rate Limiting]
FEATURES --> LOGGING[Usage Logging]
FEATURES --> ROUTING[Intelligent Routing]
FEATURES --> GUARDRAILS_F[Guardrails]
RATE --> PER_USER[Per-User Limits]
RATE --> PER_EP[Per-Endpoint Limits]
RATE --> TOKEN_LIM[Token Budget Limits]
LOGGING --> TOKENS[Token Usage]
LOGGING --> LATENCY_L[Latency Metrics]
LOGGING --> PAYLOAD[Request/Response Payloads]
ROUTING --> FALLBACK_R[Provider Failover]
ROUTING --> LOAD_BAL[Load Balancing]
FEATURES:::governance
RATE:::processing
LOGGING:::storage
ROUTING:::serving
GUARDRAILS_F:::ingestion
PER_USER:::source
PER_EP:::source
TOKEN_LIM:::source
TOKENS:::source
LATENCY_L:::source
PAYLOAD:::source
FALLBACK_R:::source
LOAD_BAL:::source
*AI Gateway feature set: rate limiting, logging, routing, and guardrails.*
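To make the per-user limit concrete, here is a minimal fixed-window counter. It is a sketch of the general technique under simple assumptions (in-memory state, a single process, a caller-supplied clock), not a description of how AI Gateway enforces limits. `FixedWindowLimiter` and its parameters are hypothetical names.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `calls` requests per key within each fixed time window."""

    def __init__(self, calls: int, window_seconds: int):
        self.calls = calls
        self.window_seconds = window_seconds
        # (key, window index) -> number of accepted calls.
        # Old windows are never evicted in this sketch.
        self._counts = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        """Return True and count the call if `key` is under its limit at time `now`."""
        window = int(now // self.window_seconds)
        bucket = (key, window)
        if self._counts[bucket] >= self.calls:
            return False
        self._counts[bucket] += 1
        return True


# Two calls per minute per user; timestamps are seconds since an arbitrary start.
limiter = FixedWindowLimiter(calls=2, window_seconds=60)
results = [limiter.allow("alice", now=t) for t in (0, 10, 20, 61)]
other_user = limiter.allow("bob", now=20)
```

Here the third call from `alice` falls in the same window and is rejected, the call at 61 seconds starts a new window, and `bob` has an independent budget. Using the endpoint name as the key instead of the user name gives a per-endpoint limit with the same code.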
Key Terms
Prerequisites and Setup
- Databricks workspace (Premium or Enterprise).
- Admin access to configure AI Gateway routes and rate limits.
- Provider API keys stored in Databricks Secrets for external models.
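Keeping provider keys in Databricks Secrets means configuration refers to a key by scope and name rather than embedding the value. The sketch below builds a reference string in the `{{secrets/<scope>/<key>}}` form; the scope `ai_gateway` and key `openai_api_key` are hypothetical, and which configuration fields accept such a reference should be checked against the Databricks documentation for your endpoint type.

```python
def secret_ref(scope: str, key: str) -> str:
    """Return a secret reference of the form {{secrets/<scope>/<key>}}.

    The raw secret value never appears in the configuration; only this
    reference does.
    """
    for part in (scope, key):
        if not part or "/" in part:
            raise ValueError(f"invalid secret path component: {part!r}")
    return "{{secrets/" + scope + "/" + key + "}}"


# Hypothetical scope and key names, for illustration only.
openai_key_ref = secret_ref("ai_gateway", "openai_api_key")
```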
Step-by-Step Implementation
Configuration Reference
| Parameter | Default | Description |
|---|---|---|
| `rate_limits[].key` | (none) | Scope: user or endpoint |
| `rate_limits[].renewal_period` | (none) | Window: minute, hour, or day |
| `rate_limits[].calls` | (none) | Maximum calls per window |
| `usage_tracking_config.enabled` | false | Enable token usage tracking |
| `inference_table_config.enabled` | false | Log request/response payloads |
| `guardrails` | {} | Input/output safety filters |
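As a sketch of how the parameters in the table fit together, the helper below assembles them into one nested structure and rejects values outside the ranges the table lists. The function name `build_gateway_config` and the exact nesting are assumptions drawn from the parameter paths above, not a confirmed request schema; verify field names and accepted values against the current API reference before sending anything to a workspace.

```python
import json

# Accepted values as listed in the table above (assumed, not verified here).
ALLOWED_KEYS = {"user", "endpoint"}
ALLOWED_PERIODS = {"minute", "hour", "day"}


def build_gateway_config(rate_limits, track_usage=False, log_payloads=False, guardrails=None):
    """Assemble a gateway configuration dict from the table's parameters."""
    for limit in rate_limits:
        if limit["key"] not in ALLOWED_KEYS:
            raise ValueError(f"unknown rate limit key: {limit['key']!r}")
        if limit["renewal_period"] not in ALLOWED_PERIODS:
            raise ValueError(f"unknown renewal period: {limit['renewal_period']!r}")
        if not isinstance(limit["calls"], int) or limit["calls"] <= 0:
            raise ValueError("calls must be a positive integer")
    return {
        "rate_limits": [dict(limit) for limit in rate_limits],
        "usage_tracking_config": {"enabled": track_usage},
        "inference_table_config": {"enabled": log_payloads},
        "guardrails": guardrails or {},  # table default is an empty object
    }


gateway_config = build_gateway_config(
    [{"key": "user", "renewal_period": "minute", "calls": 100}],
    track_usage=True,
)
print(json.dumps(gateway_config, indent=2))
```

The defaults mirror the table: usage tracking and payload logging are off unless enabled explicitly, and guardrails start empty.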