Apache Kafka and Event Hubs: Real-Time Streaming Integrations
Who this is for:
Architecture / Concept Overview
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
PRODUCERS[Event Producers]:::source
KAFKA[Apache Kafka / MSK]:::ingestion
EVENTHUB[Azure Event Hubs]:::ingestion
SS[Structured Streaming]:::processing
CHECKPOINT[Checkpoint Store]:::governance
DELTA[Delta Lake Tables]:::storage
SERVE[Serving Layer / Alerts]:::serving
PRODUCERS --> KAFKA --> SS
PRODUCERS --> EVENTHUB --> SS
SS --> CHECKPOINT
SS --> DELTA --> SERVE
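*Structured Streaming consumes from Kafka or Event Hubs, records its progress in a checkpoint store, and writes results to Delta Lake. Checkpointing combined with Delta Lake's transactional writes is what provides end-to-end exactly-once semantics.*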
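The flow in the diagram can be sketched in PySpark roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the broker address, the topic, the checkpoint path, and the table name are all hypothetical, and `spark` is assumed to be the active `SparkSession` on a cluster where the Kafka and Delta connectors are available.

```python
from typing import Any, Dict


def kafka_read_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Build the minimal option set for the Spark Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only consulted on the first run; later runs resume from the checkpoint.
        "startingOffsets": "earliest",
    }


def start_kafka_to_delta(
    spark: Any, options: Dict[str, str], checkpoint_path: str, target_table: str
):
    """Read from Kafka and append to a Delta table (assumes an active SparkSession)."""
    events = (
        spark.readStream.format("kafka")
        .options(**options)
        .load()
        # Kafka delivers key and value as binary; cast them before use.
        .selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
    )
    return (
        events.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint stores consumed offsets so a restart neither skips nor repeats data.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

A call might look like `start_kafka_to_delta(spark, kafka_read_options("broker-1:9092", "events"), "/checkpoints/events", "bronze_events")`, where every argument value is a placeholder. Each streaming query needs its own checkpoint location.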
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
STREAM[Streaming Pipeline Stages]:::source
INGEST[Ingest / Deserialize]:::ingestion
TRANSFORM[Transform / Enrich]:::processing
WINDOW[Window Aggregations]:::processing
WRITE[Write to Delta]:::storage
MONITOR[Monitor & Alert]:::serving
STREAM --> INGEST --> TRANSFORM --> WINDOW --> WRITE --> MONITOR
*A streaming pipeline progresses linearly from ingestion through transformation, windowed aggregation, and persistence to Delta, followed by monitoring and alerting.*
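To make the stages concrete, the sketch below walks a few records through deserialize, enrich, and a tumbling-window count in plain Python. It is an illustration of the logic only, not Spark code: the record fields (`ts`, `status`), the five-minute window width, and the function names are assumptions. In Structured Streaming the corresponding steps would typically use `from_json`, `withColumn`, and `window` from `pyspark.sql.functions`.

```python
import json
from collections import Counter
from typing import Dict, Iterable

WINDOW_SECONDS = 300  # assumed five-minute tumbling window


def deserialize(raw: bytes) -> dict:
    """Ingest stage: decode a JSON payload as delivered by the broker."""
    return json.loads(raw.decode("utf-8"))


def enrich(event: dict) -> dict:
    """Transform stage: derive a flag from an existing field."""
    return {**event, "is_error": event["status"] >= 500}


def window_start(epoch_seconds: int, width: int = WINDOW_SECONDS) -> int:
    """Window stage: map a timestamp to the start of its tumbling window."""
    return epoch_seconds - (epoch_seconds % width)


def count_errors_per_window(raw_events: Iterable[bytes]) -> Dict[int, int]:
    """Run all stages and count error events per window start."""
    counts: Counter = Counter()
    for raw in raw_events:
        event = enrich(deserialize(raw))
        if event["is_error"]:
            counts[window_start(event["ts"])] += 1
    return dict(counts)
```

The write and monitor stages are left out here; in a real pipeline the aggregated result would be written to Delta with a checkpoint location, and a watermark would bound how long late events are accepted.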
Key Terms
Prerequisites and Setup
- Databricks workspace with a cluster running Databricks Runtime (DBR) 13.3 or later
- Apache Kafka cluster (Confluent, Amazon MSK, or self-managed) or an Azure Event Hubs namespace
- Network connectivity from Databricks to the broker (VPC or VNet peering, or public endpoints)
- SASL/SSL credentials (Kafka) or a SAS token (Event Hubs) for authentication
- Cloud storage path for checkpoint locations
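For the authentication prerequisite, a Kafka source secured with SASL/PLAIN over TLS could be configured along these lines. This is a sketch under stated assumptions: the function name and the credential values are hypothetical, and the right mechanism (PLAIN, SCRAM, IAM, and so on) depends on how your broker is set up.

```python
from typing import Dict

# Standard Kafka client login module. Databricks runtimes typically ship a
# shaded Kafka client, in which case the class name needs a "kafkashaded."
# prefix; treat the default below as an assumption to verify on your cluster.
PLAIN_LOGIN_MODULE = "org.apache.kafka.common.security.plain.PlainLoginModule"


def sasl_plain_options(
    username: str, password: str, login_module: str = PLAIN_LOGIN_MODULE
) -> Dict[str, str]:
    """Build Spark Kafka source options for SASL/PLAIN over TLS."""
    jaas = f'{login_module} required username="{username}" password="{password}";'
    return {
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
    }
```

Rather than hard-coding credentials in a notebook, they would normally be read from a secret scope, for example with `dbutils.secrets.get(scope=..., key=...)`, and passed into the function.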
Step-by-Step Implementation
Configuration Reference
| Parameter | Kafka Setting | Event Hubs Setting | Description |
|---|---|---|---|
| Broker Address | kafka.bootstrap.servers | Connection string (eventhubs.connectionString) | Endpoint of the message broker |
| Topic/Entity | subscribe | EntityPath in the connection string | Source topic or event hub name |
| Consumer Group | kafka.group.id | eventhubs.consumerGroup | Consumer group used to read the stream (optional for Kafka; Spark generates one by default) |
| Starting Position | startingOffsets | eventhubs.startingPosition | Where to begin reading when no checkpoint exists |
| Authentication | kafka.sasl.* | SAS token in connection string | Credentials for access |
| Rate Limit | maxOffsetsPerTrigger | maxEventsPerTrigger | Maximum number of records read per micro-batch |
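The two helper functions below show how the rows of the table might translate into option dictionaries for each connector. Treat this as a hedged sketch: the function names and argument values are hypothetical, the Kafka consumer group is passed with the `kafka.` prefix that the Spark Kafka source expects for client settings, and the JSON shape used for `eventhubs.startingPosition` follows the open-source Azure Event Hubs Spark connector's PySpark convention as I understand it, so verify it against the connector version you install.

```python
import json
from typing import Dict, Optional


def kafka_options(
    bootstrap_servers: str,
    topic: str,
    max_per_trigger: int,
    group_id: Optional[str] = None,
) -> Dict[str, str]:
    """Options for spark.readStream.format("kafka")."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "earliest",
        "maxOffsetsPerTrigger": str(max_per_trigger),
    }
    if group_id is not None:
        # Optional: Spark generates a unique group id when this is omitted.
        options["kafka.group.id"] = group_id
    return options


def event_hubs_options(
    connection_string: str, max_per_trigger: int, consumer_group: str = "$Default"
) -> Dict[str, str]:
    """Options for spark.readStream.format("eventhubs")."""
    # Assumed encoding for "start of stream" in the connector's PySpark usage.
    start_of_stream = {
        "offset": "-1",
        "seqNo": -1,
        "enqueuedTime": None,
        "isInclusive": True,
    }
    return {
        # Recent connector versions expect this value to be encrypted with
        # EventHubsUtils.encrypt before use; that step is omitted here.
        "eventhubs.connectionString": connection_string,
        "eventhubs.consumerGroup": consumer_group,
        "eventhubs.startingPosition": json.dumps(start_of_stream),
        "maxEventsPerTrigger": str(max_per_trigger),
    }
```

Either dictionary can be passed to the reader with `.options(**options)`. Starting with a conservative per-trigger limit and raising it while watching batch duration is a reasonable way to tune the rate limit.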