Store Kafka credentials

    Who this is for:

    Architecture / Concept Overview: Store Kafka credentials

    Spark Structured Streaming treats Kafka topics and Event Hubs partitions as unbounded tables. The Kafka connector reads messages by offset, while Event Hubs uses the Kafka-compatible endpoint, allowing you to use the same Spark Kafka API for both systems.

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% flowchart LR classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED P1[Producer Apps]:::source --> KF[Apache Kafka]:::ingestion P2[IoT Devices]:::source --> EH[Azure Event Hubs]:::ingestion KF --> SS[Structured Streaming]:::processing EH --> SS SS --> CK[Checkpoint Store]:::governance SS --> DT[Delta Lake Tables]:::storage DT --> RT[Real-Time Dashboards]:::serving DT --> ML[ML Feature Store]:::serving

    *Kafka and Event Hubs feed into Structured Streaming, which writes to Delta Lake with checkpoint-based exactly-once guarantees.*

    %%{init: {"theme":"base","themeVariables":{"background":"#0B0E14","primaryTextColor":"#E0E6ED","lineColor":"#5D6470","darkMode":true,"primaryColor":"#2E4A4A","secondaryColor":"#374151","secondaryTextColor":"#E0E6ED","tertiaryColor":"#111827","tertiaryTextColor":"#E0E6ED","edgeLabelBackground":"#1f2937"}}}%% graph TD classDef source fill:#3F4B59,stroke:#9CA3AF,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef ingestion fill:#5A4B36,stroke:#C9A86B,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef processing fill:#535072,stroke:#8E82B4,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef storage fill:#2E4A4A,stroke:#5FAFA8,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef serving fill:#3D5550,stroke:#6BB7AA,stroke-width:2px,rx:8,ry:8,color:#E0E6ED classDef governance fill:#5A3F52,stroke:#C28BB0,stroke-width:2px,rx:8,ry:8,color:#E0E6ED MSG[Kafka Message]:::source MSG --> KEY[Key - bytes]:::ingestion MSG --> VAL[Value - bytes]:::ingestion MSG --> TS[Timestamp]:::processing MSG --> TOP[Topic]:::processing MSG --> PAR[Partition]:::processing MSG --> OFF[Offset]:::governance VAL --> DES[Deserialize JSON/Avro/Protobuf]:::processing DES --> DF[Spark DataFrame]:::storage

    *Kafka message structure and the deserialization flow into a Spark DataFrame.*

    Key Terms

    Prerequisites and Setup

    • Databricks Runtime 13.3 LTS or later.
    • Network access from the cluster to the Kafka broker(s) or Event Hubs endpoint.
    • Kafka broker addresses and authentication credentials stored in Databricks Secrets.
    • For Event Hubs: the connection string with SAS key or Managed Identity configured.

    Step-by-Step Implementation

      Configuration Reference

      Store Kafka credentials configuration options
      ParameterDescriptionDefault
      kafka.bootstrap.serversComma-separated list of Kafka broker addressesRequired
      subscribeComma-separated list of topics to subscribe toRequired (or subscribePattern)
      startingOffsetsWhere to start reading: earliest, latest, or JSON offsetslatest
      maxOffsetsPerTriggerRate limit: max offsets per micro-batchNone (unlimited)
      kafka.security.protocolSecurity protocol: PLAINTEXT, SSL, SASL_SSL, SASL_PLAINTEXTPLAINTEXT
      kafka.sasl.mechanismSASL mechanism: PLAIN, SCRAM-SHA-256, SCRAM-SHA-512None
      failOnDataLossFail the query if data loss is detected (offsets out of range)true
      minPartitionsMinimum number of Spark partitions to read from KafkaMatches Kafka partitions

      Monitoring, Cost, and Security Considerations

      Common Pitfalls and Recommended Patterns

        Frequently Asked Questions