Store Kafka credentials
Who this is for:
Architecture / Concept Overview: Store Kafka credentials
Spark Structured Streaming treats Kafka topics and Event Hubs partitions as unbounded tables. The Kafka connector reads messages by offset, while Event Hubs uses the Kafka-compatible endpoint, allowing you to use the same Spark Kafka API for both systems.
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
flowchart LR
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
P1[Producer Apps]:::source --> KF[Apache Kafka]:::ingestion
P2[IoT Devices]:::source --> EH[Azure Event Hubs]:::ingestion
KF --> SS[Structured Streaming]:::processing
EH --> SS
SS --> CK[Checkpoint Store]:::governance
SS --> DT[Delta Lake Tables]:::storage
DT --> RT[Real-Time Dashboards]:::serving
DT --> ML[ML Feature Store]:::serving
*Kafka and Event Hubs feed into Structured Streaming, which writes to Delta Lake with checkpoint-based exactly-once guarantees.*
%%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%%
graph TD
classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED
MSG[Kafka Message]:::source
MSG --> KEY[Key - bytes]:::ingestion
MSG --> VAL[Value - bytes]:::ingestion
MSG --> TS[Timestamp]:::processing
MSG --> TOP[Topic]:::processing
MSG --> PAR[Partition]:::processing
MSG --> OFF[Offset]:::governance
VAL --> DES[Deserialize JSON/Avro/Protobuf]:::processing
DES --> DF[Spark DataFrame]:::storage
*Kafka message structure and the deserialization flow into a Spark DataFrame.*
Key Terms
Prerequisites and Setup
- Databricks Runtime 13.3 LTS or later.
- Network access from the cluster to the Kafka broker(s) or Event Hubs endpoint.
- Kafka broker addresses and authentication credentials stored in Databricks Secrets.
- For Event Hubs: the connection string with SAS key or Managed Identity configured.
Step-by-Step Implementation
Configuration Reference
| Parameter | Description | Default |
|---|---|---|
kafka.bootstrap.servers | Comma-separated list of Kafka broker addresses | Required |
subscribe | Comma-separated list of topics to subscribe to | Required (or subscribePattern) |
startingOffsets | Where to start reading: earliest, latest, or JSON offsets | latest |
maxOffsetsPerTrigger | Rate limit: max offsets per micro-batch | None (unlimited) |
kafka.security.protocol | Security protocol: PLAINTEXT, SSL, SASL_SSL, SASL_PLAINTEXT | PLAINTEXT |
kafka.sasl.mechanism | SASL mechanism: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512 | None |
failOnDataLoss | Fail the query if data loss is detected (offsets out of range) | true |
minPartitions | Minimum number of Spark partitions to read from Kafka | Matches Kafka partitions |