Azure Event Hubs Guide
Stream millions of events with Azure Event Hubs: namespaces, partitions, Capture, Kafka protocol, Schema Registry, and consumer groups.
Prerequisites
- Basic understanding of event-driven architecture and messaging
- Azure account with Event Hubs permissions
Introduction to Azure Event Hubs
Azure Event Hubs is a fully managed, real-time data streaming platform capable of receiving and processing millions of events per second. It serves as the "front door" for event pipelines, decoupling event producers from event consumers with high throughput, low latency, and seamless scaling. Event Hubs is used for telemetry ingestion, log aggregation, clickstream analytics, IoT data collection, and any scenario requiring reliable high-volume data streaming.
Event Hubs is built on a partitioned consumer model similar to Apache Kafka. In fact, Event Hubs provides a Kafka-compatible endpoint, allowing existing Kafka producers and consumers to connect without code changes. Events are organized into partitions for parallel processing, retained for a configurable period (up to 90 days on the Premium and Dedicated tiers), and consumed by one or more consumer groups that track their position independently. For retention beyond the configured window, Event Hubs Capture can archive events to storage indefinitely.
This guide covers the complete Event Hubs ecosystem: creating namespaces and event hubs, producing and consuming events, configuring partitions and consumer groups, using the Kafka protocol, enabling Event Hubs Capture for automatic archival, implementing Schema Registry for data governance, monitoring throughput and consumer lag, and designing event-driven architectures with Azure Functions and Stream Analytics.
Event Hubs Pricing Tiers
Event Hubs offers four pricing tiers:
- Basic: 1 consumer group, 1-day retention, no Kafka support ($0.015/million events)
- Standard: 20 consumer groups, 7-day retention, Kafka support ($0.028/million events plus TU charges)
- Premium: 100 consumer groups, 90-day retention, dedicated compute ($1.079/processing unit/hour)
- Dedicated: single-tenant clusters for extreme scale ($7.27/capacity unit/hour)
For most production workloads, Standard or Premium is recommended.
Creating an Event Hubs Namespace
An Event Hubs namespace is the management container for one or more event hubs. It provides a unique FQDN (fully qualified domain name), shared access policies, network configuration, and scaling settings. The namespace determines the pricing tier and throughput capacity for all event hubs within it.
# Create a resource group
az group create --name rg-event-hubs --location eastus
# Create an Event Hubs namespace (Standard tier)
az eventhubs namespace create \
--name production-events-ns \
--resource-group rg-event-hubs \
--location eastus \
--sku Standard \
--capacity 4 \
--enable-auto-inflate true \
--maximum-throughput-units 20 \
--enable-kafka true \
--zone-redundant true
# Create an event hub with 8 partitions
az eventhubs eventhub create \
--name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--partition-count 8 \
--message-retention 7 \
--cleanup-policy Delete
# Create a consumer group
az eventhubs eventhub consumer-group create \
--name order-processor \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
# Create additional consumer groups for different processors
az eventhubs eventhub consumer-group create \
--name analytics-pipeline \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
az eventhubs eventhub consumer-group create \
--name audit-logger \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
# List event hubs in the namespace
az eventhubs eventhub list \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--query '[].{Name: name, Partitions: partitionCount, Retention: messageRetentionInDays, Status: status}' \
--output table
Partition Count Planning
Each partition supports up to 1 MB/s ingress and 2 MB/s egress. The number of partitions determines your maximum parallel consumer count: within a consumer group, each partition is consumed by exactly one consumer instance. Choose a partition count equal to or greater than the maximum number of parallel consumers you expect. Partitions cannot be reduced after creation (only increased on Premium and Dedicated tiers), so plan for growth. For most workloads, 8-32 partitions is a good starting point.
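The sizing rules above can be sketched as a quick back-of-the-envelope calculation. This is a simplified model using only the per-partition limits quoted here; real sizing should also account for namespace-level throughput unit limits:

```python
import math

def estimate_partition_count(peak_ingress_mb_s: float,
                             peak_egress_mb_s: float,
                             max_parallel_consumers: int,
                             growth_factor: float = 2.0) -> int:
    """Rough partition sizing from the per-partition limits cited above:
    1 MB/s ingress and 2 MB/s egress per partition. Partitions cannot be
    reduced after creation, so a growth factor adds headroom."""
    needed = max(
        math.ceil(peak_ingress_mb_s / 1.0),   # ingress-bound partitions
        math.ceil(peak_egress_mb_s / 2.0),    # egress-bound partitions
        max_parallel_consumers,               # one active consumer per partition
    )
    return math.ceil(needed * growth_factor)

# 3 MB/s in, 4 MB/s out, 4 parallel consumers -> 8 partitions with 2x headroom
print(estimate_partition_count(3, 4, 4))
```

With a 2x growth factor this lands comfortably inside the 8-32 range suggested above for most workloads.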
Producing Events
Events are sent to Event Hubs using the Event Hubs SDK, AMQP protocol, HTTPS, or the Kafka producer protocol. Each event consists of a body (the payload, up to 1 MB), optional properties (key-value metadata), and an optional partition key (used to route events to a specific partition). Events with the same partition key always go to the same partition, preserving ordering for related events.
# Python producer using the Azure Event Hubs SDK
import asyncio
import json
from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData
from azure.identity.aio import DefaultAzureCredential
from datetime import datetime
async def produce_events():
# Use managed identity authentication
credential = DefaultAzureCredential()
producer = EventHubProducerClient(
fully_qualified_namespace="production-events-ns.servicebus.windows.net",
eventhub_name="order-events",
credential=credential
)
async with producer:
# Create a batch of events (most efficient approach)
event_batch = await producer.create_batch(partition_key="customer-42")
for i in range(100):
event = {
"eventType": "OrderCreated",
"orderId": f"ORD-2026-{i:05d}",
"customerId": "customer-42",
"items": [
{"sku": "WIDGET-001", "quantity": 2, "price": 29.99},
{"sku": "GADGET-042", "quantity": 1, "price": 149.99}
],
"total": 209.97,
"timestamp": datetime.utcnow().isoformat(),
"region": "us-east"
}
event_data = EventData(json.dumps(event))
event_data.properties = {
"event-type": "OrderCreated",
"version": "2.0",
"source": "order-service"
}
try:
event_batch.add(event_data)
except ValueError:
# Batch is full, send it and create a new one
await producer.send_batch(event_batch)
event_batch = await producer.create_batch(partition_key="customer-42")
event_batch.add(event_data)
# Send remaining events in the batch
if len(event_batch) > 0:
await producer.send_batch(event_batch)
print("Sent 100 events to order-events")
await credential.close()
asyncio.run(produce_events())
Using the Kafka Producer Protocol
# Kafka producer connecting to Event Hubs
from confluent_kafka import Producer
import json
kafka_config = {
'bootstrap.servers': 'production-events-ns.servicebus.windows.net:9093',
'security.protocol': 'SASL_SSL',
'sasl.mechanism': 'PLAIN',
'sasl.username': '$ConnectionString',
'sasl.password': 'Endpoint=sb://production-events-ns.servicebus.windows.net/;SharedAccessKeyName=sender;SharedAccessKey=<key>',
'client.id': 'order-service-producer',
'acks': 'all',
'retries': 3,
'linger.ms': 5,
'batch.size': 16384
}
producer = Producer(kafka_config)
def delivery_callback(err, msg):
if err:
print(f"Delivery failed: {err}")
else:
print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")
# Send events using standard Kafka API
for i in range(1000):
event = {
"eventType": "PageView",
"userId": f"user-{i % 100}",
"page": f"/products/{i % 50}",
"timestamp": "2026-03-14T10:30:00Z"
}
producer.produce(
topic="clickstream-events",
key=f"user-{i % 100}",
value=json.dumps(event),
callback=delivery_callback
)
producer.flush(timeout=30)
print("All events sent")
Consuming Events
Events are consumed using the Event Hubs SDK, Kafka consumer protocol, or through Azure services like Azure Functions, Stream Analytics, and Azure Databricks. The SDK provides an EventHubConsumerClient with checkpoint-based tracking that resumes from the last processed event after restarts. Checkpoints are stored in Azure Blob Storage.
# Python consumer with checkpoint store
import asyncio
import json
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
from azure.identity.aio import DefaultAzureCredential
async def on_event(partition_context, event):
"""Process a single event."""
body = json.loads(event.body_as_str())
print(f"Partition: {partition_context.partition_id}, "
f"Offset: {event.offset}, "
f"Sequence: {event.sequence_number}")
print(f" Event: {body.get('eventType')}, "
f"Order: {body.get('orderId')}")
# Process the event (your business logic here)
await process_order(body)
# Checkpoint after processing (enables resume on restart)
await partition_context.update_checkpoint(event)
async def on_error(partition_context, error):
"""Handle errors during event processing."""
if partition_context:
print(f"Error on partition {partition_context.partition_id}: {error}")
else:
print(f"General error: {error}")
async def consume_events():
credential = DefaultAzureCredential()
# Checkpoint store in Blob Storage
checkpoint_store = BlobCheckpointStore(
blob_account_url="https://checkpointstorage.blob.core.windows.net",
container_name="event-checkpoints",
credential=credential
)
consumer = EventHubConsumerClient(
fully_qualified_namespace="production-events-ns.servicebus.windows.net",
eventhub_name="order-events",
consumer_group="order-processor",
credential=credential,
checkpoint_store=checkpoint_store
)
async with consumer:
# Process events from all partitions
await consumer.receive(
on_event=on_event,
on_error=on_error,
starting_position="-1" # Start from beginning if no checkpoint
)
asyncio.run(consume_events())
Event Hubs Capture
Event Hubs Capture automatically delivers streaming data to Azure Blob Storage or Azure Data Lake Storage in Avro format. Capture runs continuously, creating files based on configurable time and size windows. This provides zero-code archival of all events for batch analytics, compliance, and disaster recovery without building a custom consumer.
# Enable Capture on an event hub
az eventhubs eventhub update \
--name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--enable-capture true \
--capture-interval 300 \
--capture-size-limit 314572800 \
--destination-name EventHubArchive.AzureBlockBlob \
--storage-account /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.Storage/storageAccounts/capturestorage \
--blob-container event-archive \
--archive-name-format "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}" \
--skip-empty-archives true
# Captured files are stored in Avro format at:
# event-archive/production-events-ns/order-events/0/2026/03/14/10/30/00.avro
# Read captured Avro files from Blob Storage with Python
# pip install avro-python3 azure-storage-blob
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential
import io
import json
blob_service = BlobServiceClient(
account_url="https://capturestorage.blob.core.windows.net",
credential=DefaultAzureCredential()
)
container = blob_service.get_container_client("event-archive")
# List captured files for a specific date
blobs = container.list_blobs(
name_starts_with="production-events-ns/order-events/0/2026/03/14/"
)
for blob in blobs:
# Download the Avro file
blob_client = container.get_blob_client(blob.name)
avro_bytes = blob_client.download_blob().readall()
# Read Avro records
reader = DataFileReader(io.BytesIO(avro_bytes), DatumReader())
for record in reader:
body = json.loads(record["Body"].decode())
print(f"Event: {body.get('eventType')}, Order: {body.get('orderId')}")
reader.close()
Capture File Sizes
Capture creates files based on the time window (1-15 minutes) or size window (10 MB - 500 MB), whichever occurs first. For high-throughput event hubs, files are typically created at the size limit. For low-throughput event hubs, files are created at the time interval with smaller sizes. Enable skip-empty-archives to avoid creating empty Avro files during periods of no activity.
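As a rough model of that trade-off, you can estimate which window fires first at a given per-partition ingress rate. This is a simplified sketch assuming steady throughput and ignoring Avro framing overhead; the defaults mirror the CLI example above (300-second interval, 300 MB size limit):

```python
def capture_trigger(ingress_mb_s: float,
                    interval_s: int = 300,
                    size_limit_mb: float = 300.0):
    """Return which Capture window fires first ("time" or "size")
    and after how many seconds, for a steady ingress rate."""
    seconds_to_size_limit = (size_limit_mb / ingress_mb_s
                             if ingress_mb_s > 0 else float("inf"))
    if seconds_to_size_limit < interval_s:
        return ("size", seconds_to_size_limit)
    return ("time", interval_s)

# At 2 MB/s the 300 MB size limit is hit after 150 s, before the 300 s window
print(capture_trigger(2.0))
# At 0.1 MB/s the time window fires first, producing a smaller ~30 MB file
print(capture_trigger(0.1))
```

This matches the guidance above: high-throughput hubs produce size-limited files, low-throughput hubs produce time-limited ones.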
Schema Registry
Event Hubs Schema Registry provides a centralized repository for managing event schemas. It ensures that producers and consumers agree on the structure of events, enables schema evolution with compatibility checking, and reduces payload size by storing schemas separately from events. Schema Registry supports Avro, JSON, and custom schema formats.
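The compatibility idea can be illustrated with a deliberately simplified check for Avro record schemas. This is an illustration only; the real Schema Registry validation covers many more cases (type promotions, aliases, nested records):

```python
def is_forward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified Avro forward-compatibility check: data written with the
    NEW schema must remain readable with the OLD one. Added fields are
    fine (the old reader ignores them); a removed field is only fine if
    the old schema declares a default for it."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    new_names = {f["name"] for f in new_schema["fields"]}
    for name, field in old_fields.items():
        if name not in new_names and "default" not in field:
            return False  # old reader cannot fill in the missing field
    return True

v1 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
]}
v2 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
    {"name": "region", "type": ["null", "string"], "default": None},
]}
print(is_forward_compatible(v1, v2))  # v2 only adds an optional field
```

Registering v2 against a schema group configured with Forward compatibility (as in the CLI below) would pass this kind of check.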
# Create a schema group
az eventhubs namespace schema-registry create \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--schema-group-name order-schemas \
--schema-compatibility Forward \
--schema-type Avro
# Register a schema
az eventhubs namespace schema-registry schema create \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--schema-group-name order-schemas \
--schema-name OrderCreated \
--schema-version 1 \
--content '{
"type": "record",
"name": "OrderCreated",
"namespace": "com.example.events",
"fields": [
{"name": "orderId", "type": "string"},
{"name": "customerId", "type": "string"},
{"name": "total", "type": "double"},
{"name": "timestamp", "type": "string"},
{"name": "region", "type": ["null", "string"], "default": null}
]
}'
Integration with Azure Services
Event Hubs integrates natively with multiple Azure services for event processing, analytics, and archival. These integrations enable building complete event-driven architectures without custom infrastructure.
Integration Options
| Service | Use Case | Processing Model |
|---|---|---|
| Azure Functions | Serverless event processing | Per-event or batched trigger |
| Stream Analytics | Real-time SQL-based analytics | Continuous query over event stream |
| Azure Databricks | Complex stream processing | Structured Streaming (Spark) |
| Azure Data Factory | Batch ETL from captured events | Pipeline with event trigger |
| Azure Data Explorer | Log and telemetry analytics | Direct ingestion with KQL queries |
| Logic Apps | Workflow automation | Event-triggered workflow |
# Azure Function triggered by Event Hubs
import azure.functions as func
import json
import logging
app = func.FunctionApp()
@app.event_hub_message_trigger(
arg_name="events",
event_hub_name="order-events",
connection="EventHubConnection",
consumer_group="order-processor",
cardinality="many"
)
def process_order_events(events: list[func.EventHubEvent]):
"""Process a batch of events from Event Hubs."""
for event in events:
body = json.loads(event.get_body().decode())
logging.info(f"Processing: {body.get('eventType')} - {body.get('orderId')}")
# Extract event metadata
partition_key = event.partition_key
sequence_number = event.sequence_number
enqueued_time = event.enqueued_time
logging.info(
f" Partition key: {partition_key}, "
f"Sequence: {sequence_number}, "
f"Enqueued: {enqueued_time}"
)
# Process the event
if body.get('eventType') == 'OrderCreated':
process_new_order(body)
elif body.get('eventType') == 'OrderCancelled':
process_cancellation(body)
logging.info(f"Processed batch of {len(events)} events")
Monitoring and Operations
Monitor Event Hubs using Azure Monitor metrics, diagnostic logs, and consumer lag tracking. Key metrics include incoming/outgoing messages, incoming/outgoing bytes, throttled requests, and consumer group lag. Set up alerts for critical conditions like high throttle rates, growing consumer lag, or approaching throughput limits.
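Consumer lag itself is the gap between the newest sequence number in each partition and the consumer group's last checkpoint. A minimal sketch of that arithmetic follows; the dict shape is illustrative (in practice the newest sequence numbers come from the consumer client's partition properties and the checkpoint values from the Blob Storage checkpoint store):

```python
def consumer_lag(partition_info: dict) -> dict:
    """Per-partition lag: last enqueued sequence number minus the last
    checkpointed sequence number, floored at zero."""
    return {
        pid: max(0, info["last_enqueued_sequence_number"]
                    - info["checkpoint_sequence_number"])
        for pid, info in partition_info.items()
    }

# Hypothetical snapshot for two partitions of order-events
snapshot = {
    "0": {"last_enqueued_sequence_number": 10_500,
          "checkpoint_sequence_number": 10_480},
    "1": {"last_enqueued_sequence_number": 9_900,
          "checkpoint_sequence_number": 9_900},
}
print(consumer_lag(snapshot))  # partition 0 is 20 events behind
```

A steadily growing lag total is the signal to alert on: it means consumers are not keeping up with producers.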
# Enable diagnostic logging
az monitor diagnostic-settings create \
--name eventhub-diagnostics \
--resource /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/log-analytics \
--logs '[{"category": "ArchiveLogs", "enabled": true}, {"category": "OperationalLogs", "enabled": true}, {"category": "AutoScaleLogs", "enabled": true}, {"category": "KafkaCoordinatorLogs", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'
# Create an alert for high throttling
az monitor metrics alert create \
--name eventhub-throttle-alert \
--resource-group rg-event-hubs \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--condition "total ThrottledRequests > 100" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action-group /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ops-team \
--description "Event Hubs throttling exceeds threshold"
# Check namespace metrics
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--metric "IncomingMessages" "OutgoingMessages" "ThrottledRequests" \
--interval PT1H \
--start-time "2026-03-14T00:00:00Z" \
--end-time "2026-03-14T23:59:59Z"
Throughput Units and Auto-Inflate
Each throughput unit (TU) in Standard tier provides 1 MB/s ingress (1000 events/s) and 2 MB/s egress. If your producers exceed the TU capacity, requests are throttled with HTTP 429 errors. Enable Auto-Inflate to automatically scale TUs up to a configured maximum when traffic increases. Auto-Inflate only scales up, not down. To scale down, manually reduce the TU count during off-peak hours. For Premium tier, processing units replace TUs with higher per-unit capacity.
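A quick sketch of TU sizing under those Standard-tier limits (simplified: it ignores burst traffic and per-partition skew, so the result is a floor, not a guarantee):

```python
import math

def required_throughput_units(ingress_mb_s: float,
                              ingress_events_s: float,
                              egress_mb_s: float) -> int:
    """Standard-tier TUs needed: each TU provides 1 MB/s or 1000 events/s
    ingress and 2 MB/s egress; the tightest constraint wins."""
    return max(
        math.ceil(ingress_mb_s / 1.0),        # ingress bandwidth
        math.ceil(ingress_events_s / 1000.0), # ingress event rate
        math.ceil(egress_mb_s / 2.0),         # egress bandwidth
    )

# 3 MB/s ingress at 5000 events/s with 8 MB/s egress -> 5 TUs
print(required_throughput_units(3, 5000, 8))
```

Set this as the namespace capacity and give Auto-Inflate a comfortably higher maximum (the CLI example above uses a ceiling of 20 TUs).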
Azure Event Hubs is the foundation of event-driven architectures on Azure. Start with a Standard namespace for most workloads, use partition keys for ordered processing of related events, implement consumer groups for independent processing pipelines, enable Capture for zero-code archival, and integrate with Azure Functions for serverless event processing. For Kafka-compatible workloads, connect existing Kafka producers and consumers to Event Hubs with minimal configuration changes.
Key Takeaways
- Event Hubs ingests millions of events per second with configurable retention up to 90 days.
- Kafka-compatible endpoints allow existing Kafka producers and consumers to connect without code changes.
- Event Hubs Capture automatically archives events to Blob Storage or Data Lake in Avro format.
- Schema Registry provides centralized schema management with compatibility checking for data governance.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.