Azure Event Hubs Guide
Stream millions of events with Azure Event Hubs: namespaces, partitions, Capture, Kafka protocol, Schema Registry, and consumer groups.
Prerequisites
- Basic understanding of event-driven architecture and messaging
- Azure account with Event Hubs permissions
Introduction to Azure Event Hubs
Azure Event Hubs is a fully managed, real-time data streaming platform capable of receiving and processing millions of events per second. It serves as the "front door" for event pipelines, decoupling event producers from event consumers with high throughput, low latency, and seamless scaling. Event Hubs is used for telemetry ingestion, log aggregation, clickstream analytics, IoT data collection, and any scenario requiring reliable high-volume data streaming.
Event Hubs is built on a partitioned consumer model similar to Apache Kafka. In fact, Event Hubs provides a Kafka-compatible endpoint, allowing existing Kafka producers and consumers to connect without code changes. Events are organized into partitions for parallel processing, retained for a configurable period (up to 90 days on the Premium and Dedicated tiers), and consumed by one or more consumer groups that track their position independently. For retention beyond the configured window, Event Hubs Capture can archive events to storage indefinitely.
This guide covers the complete Event Hubs ecosystem: creating namespaces and event hubs, producing and consuming events, configuring partitions and consumer groups, using the Kafka protocol, enabling Event Hubs Capture for automatic archival, implementing Schema Registry for data governance, monitoring throughput and consumer lag, and designing event-driven architectures with Azure Functions and Stream Analytics.
Event Hubs Pricing Tiers
Event Hubs offers four pricing tiers:
- Basic: 1 consumer group, 1-day retention, no Kafka support ($0.015/million events)
- Standard: 20 consumer groups, 7-day retention, Kafka support ($0.028/million events plus TU charges)
- Premium: 100 consumer groups, 90-day retention, dedicated compute ($1.079/processing unit/hour)
- Dedicated: single-tenant clusters for extreme scale ($7.27/capacity unit/hour)
For most production workloads, Standard or Premium is recommended.
Creating an Event Hubs Namespace
An Event Hubs namespace is the management container for one or more event hubs. It provides a unique FQDN (fully qualified domain name), shared access policies, network configuration, and scaling settings. The namespace determines the pricing tier and throughput capacity for all event hubs within it.
# Create a resource group
az group create --name rg-event-hubs --location eastus
# Create an Event Hubs namespace (Standard tier)
az eventhubs namespace create \
--name production-events-ns \
--resource-group rg-event-hubs \
--location eastus \
--sku Standard \
--capacity 4 \
--enable-auto-inflate true \
--maximum-throughput-units 20 \
--enable-kafka true \
--zone-redundant true
# Create an event hub with 8 partitions
az eventhubs eventhub create \
--name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--partition-count 8 \
--message-retention 7 \
--cleanup-policy Delete
# Create a consumer group
az eventhubs eventhub consumer-group create \
--name order-processor \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
# Create additional consumer groups for different processors
az eventhubs eventhub consumer-group create \
--name analytics-pipeline \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
az eventhubs eventhub consumer-group create \
--name audit-logger \
--eventhub-name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs
# List event hubs in the namespace
az eventhubs eventhub list \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--query '[].{Name: name, Partitions: partitionCount, Retention: messageRetentionInDays, Status: status}' \
--output table
Partition Count Planning
Each partition supports up to 1 MB/s ingress and 2 MB/s egress. The number of partitions determines your maximum parallel consumer count: within a consumer group, each partition is consumed by exactly one consumer instance. Choose a partition count equal to or greater than the maximum number of parallel consumers you expect. Partitions cannot be reduced after creation (only increased on Premium and Dedicated tiers), so plan for growth. For most workloads, 8-32 partitions is a good starting point.
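The sizing rules above can be sketched as a quick back-of-the-envelope calculation. This is a simplified model using only the per-partition limits quoted here; real sizing should also account for namespace-level throughput unit limits:

```python
import math

def estimate_partition_count(peak_ingress_mb_s: float,
                             peak_egress_mb_s: float,
                             max_parallel_consumers: int,
                             growth_factor: float = 2.0) -> int:
    """Rough partition sizing from the per-partition limits cited above:
    1 MB/s ingress and 2 MB/s egress per partition. Partitions cannot be
    reduced after creation, so a growth factor adds headroom."""
    needed = max(
        math.ceil(peak_ingress_mb_s / 1.0),   # ingress-bound partitions
        math.ceil(peak_egress_mb_s / 2.0),    # egress-bound partitions
        max_parallel_consumers,               # one active consumer per partition
    )
    return math.ceil(needed * growth_factor)

# 3 MB/s in, 4 MB/s out, 4 parallel consumers -> 8 partitions with 2x headroom
print(estimate_partition_count(3, 4, 4))
```

With a 2x growth factor this lands comfortably inside the 8-32 range suggested above for most workloads.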
Producing Events
Events are sent to Event Hubs using the Event Hubs SDK, AMQP protocol, HTTPS, or the Kafka producer protocol. Each event consists of a body (the payload, up to 1 MB), optional properties (key-value metadata), and an optional partition key (used to route events to a specific partition). Events with the same partition key always go to the same partition, preserving ordering for related events.
# Python producer using the Azure Event Hubs SDK
import asyncio
import json
from azure.eventhub.aio import EventHubProducerClient
from azure.eventhub import EventData
from azure.identity.aio import DefaultAzureCredential
from datetime import datetime
async def produce_events():
# Use managed identity authentication
credential = DefaultAzureCredential()
producer = EventHubProducerClient(
fully_qualified_namespace="production-events-ns.servicebus.windows.net",
eventhub_name="order-events",
credential=credential
)
async with producer:
# Create a batch of events (most efficient approach)
event_batch = await producer.create_batch(partition_key="customer-42")
for i in range(100):
event = {
"eventType": "OrderCreated",
"orderId": f"ORD-2026-{i:05d}",
"customerId": "customer-42",
"items": [
{"sku": "WIDGET-001", "quantity": 2, "price": 29.99},
{"sku": "GADGET-042", "quantity": 1, "price": 149.99}
],
"total": 209.97,
"timestamp": datetime.utcnow().isoformat(),
"region": "us-east"
}
event_data = EventData(json.dumps(event))
event_data.properties = {
"event-type": "OrderCreated",
"version": "2.0",
"source": "order-service"
}
try:
event_batch.add(event_data)
except ValueError:
# Batch is full, send it and create a new one
await producer.send_batch(event_batch)
event_batch = await producer.create_batch(partition_key="customer-42")
event_batch.add(event_data)
# Send remaining events in the batch
if len(event_batch) > 0:
await producer.send_batch(event_batch)
print("Sent 100 events to order-events")
await credential.close()
asyncio.run(produce_events())
Using the Kafka Producer Protocol
# Kafka producer connecting to Event Hubs
from confluent_kafka import Producer
import json
kafka_config = {
'bootstrap.servers': 'production-events-ns.servicebus.windows.net:9093',
'security.protocol': 'SASL_SSL',
'sasl.mechanism': 'PLAIN',
'sasl.username': '$ConnectionString',
'sasl.password': 'Endpoint=sb://production-events-ns.servicebus.windows.net/;SharedAccessKeyName=sender;SharedAccessKey=<key>',
'client.id': 'order-service-producer',
'acks': 'all',
'retries': 3,
'linger.ms': 5,
'batch.size': 16384
}
producer = Producer(kafka_config)
def delivery_callback(err, msg):
if err:
print(f"Delivery failed: {err}")
else:
print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")
# Send events using standard Kafka API
for i in range(1000):
event = {
"eventType": "PageView",
"userId": f"user-{i % 100}",
"page": f"/products/{i % 50}",
"timestamp": "2026-03-14T10:30:00Z"
}
producer.produce(
topic="clickstream-events",
key=f"user-{i % 100}",
value=json.dumps(event),
callback=delivery_callback
)
producer.flush(timeout=30)
print("All events sent")
Consuming Events
Events are consumed using the Event Hubs SDK, Kafka consumer protocol, or through Azure services like Azure Functions, Stream Analytics, and Azure Databricks. The SDK provides an EventHubConsumerClient with checkpoint-based tracking that resumes from the last processed event after restarts. Checkpoints are stored in Azure Blob Storage.
# Python consumer with checkpoint store
import asyncio
import json
from azure.eventhub.aio import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblobaio import BlobCheckpointStore
from azure.identity.aio import DefaultAzureCredential
async def on_event(partition_context, event):
"""Process a single event."""
body = json.loads(event.body_as_str())
print(f"Partition: {partition_context.partition_id}, "
f"Offset: {event.offset}, "
f"Sequence: {event.sequence_number}")
print(f" Event: {body.get('eventType')}, "
f"Order: {body.get('orderId')}")
# Process the event (your business logic here)
await process_order(body)
# Checkpoint after processing (enables resume on restart)
await partition_context.update_checkpoint(event)
async def on_error(partition_context, error):
"""Handle errors during event processing."""
if partition_context:
print(f"Error on partition {partition_context.partition_id}: {error}")
else:
print(f"General error: {error}")
async def consume_events():
credential = DefaultAzureCredential()
# Checkpoint store in Blob Storage
checkpoint_store = BlobCheckpointStore(
blob_account_url="https://checkpointstorage.blob.core.windows.net",
container_name="event-checkpoints",
credential=credential
)
consumer = EventHubConsumerClient(
fully_qualified_namespace="production-events-ns.servicebus.windows.net",
eventhub_name="order-events",
consumer_group="order-processor",
credential=credential,
checkpoint_store=checkpoint_store
)
async with consumer:
# Process events from all partitions
await consumer.receive(
on_event=on_event,
on_error=on_error,
starting_position="-1" # Start from beginning if no checkpoint
)
asyncio.run(consume_events())
Event Hubs Capture
Event Hubs Capture automatically delivers streaming data to Azure Blob Storage or Azure Data Lake Storage in Avro format. Capture runs continuously, creating files based on configurable time and size windows. This provides zero-code archival of all events for batch analytics, compliance, and disaster recovery without building a custom consumer.
# Enable Capture on an event hub
az eventhubs eventhub update \
--name order-events \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--enable-capture true \
--capture-interval 300 \
--capture-size-limit 314572800 \
--destination-name EventHubArchive.AzureBlockBlob \
--storage-account /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.Storage/storageAccounts/capturestorage \
--blob-container event-archive \
--archive-name-format "{Namespace}/{EventHub}/{PartitionId}/{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}" \
--skip-empty-archives true
# Captured files are stored in Avro format at:
# event-archive/production-events-ns/order-events/0/2026/03/14/10/30/00.avro
# Read captured Avro files from Blob Storage with Python
# pip install avro-python3 azure-storage-blob
import avro.schema
from avro.datafile import DataFileReader
from avro.io import DatumReader
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential
import io
import json
blob_service = BlobServiceClient(
account_url="https://capturestorage.blob.core.windows.net",
credential=DefaultAzureCredential()
)
container = blob_service.get_container_client("event-archive")
# List captured files for a specific date
blobs = container.list_blobs(
name_starts_with="production-events-ns/order-events/0/2026/03/14/"
)
for blob in blobs:
# Download the Avro file
blob_client = container.get_blob_client(blob.name)
avro_bytes = blob_client.download_blob().readall()
# Read Avro records
reader = DataFileReader(io.BytesIO(avro_bytes), DatumReader())
for record in reader:
body = json.loads(record["Body"].decode())
print(f"Event: {body.get('eventType')}, Order: {body.get('orderId')}")
reader.close()
Capture File Sizes
Capture creates files based on the time window (1-15 minutes) or size window (10 MB - 500 MB), whichever occurs first. For high-throughput event hubs, files are typically created at the size limit. For low-throughput event hubs, files are created at the time interval with smaller sizes. Enable skip-empty-archives to avoid creating empty Avro files during periods of no activity.
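As a rough model of that trade-off, you can estimate which window fires first at a given per-partition ingress rate. This is a simplified sketch assuming steady throughput and ignoring Avro framing overhead; the defaults mirror the CLI example above (300-second interval, 300 MB size limit):

```python
def capture_trigger(ingress_mb_s: float,
                    interval_s: int = 300,
                    size_limit_mb: float = 300.0):
    """Return which Capture window fires first ("time" or "size")
    and after how many seconds, for a steady ingress rate."""
    seconds_to_size_limit = (size_limit_mb / ingress_mb_s
                             if ingress_mb_s > 0 else float("inf"))
    if seconds_to_size_limit < interval_s:
        return ("size", seconds_to_size_limit)
    return ("time", interval_s)

# At 2 MB/s the 300 MB size limit is hit after 150 s, before the 300 s window
print(capture_trigger(2.0))
# At 0.1 MB/s the time window fires first, producing a smaller ~30 MB file
print(capture_trigger(0.1))
```

This matches the guidance above: high-throughput hubs produce size-limited files, low-throughput hubs produce time-limited ones.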
Schema Registry
Event Hubs Schema Registry provides a centralized repository for managing event schemas. It ensures that producers and consumers agree on the structure of events, enables schema evolution with compatibility checking, and reduces payload size by storing schemas separately from events. Schema Registry supports Avro, JSON, and custom schema formats.
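The compatibility idea can be illustrated with a deliberately simplified check for Avro record schemas. This is an illustration only; the real Schema Registry validation covers many more cases (type promotions, aliases, nested records):

```python
def is_forward_compatible(old_schema: dict, new_schema: dict) -> bool:
    """Simplified Avro forward-compatibility check: data written with the
    NEW schema must remain readable with the OLD one. Added fields are
    fine (the old reader ignores them); a removed field is only fine if
    the old schema declares a default for it."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    new_names = {f["name"] for f in new_schema["fields"]}
    for name, field in old_fields.items():
        if name not in new_names and "default" not in field:
            return False  # old reader cannot fill in the missing field
    return True

v1 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
]}
v2 = {"type": "record", "name": "OrderCreated", "fields": [
    {"name": "orderId", "type": "string"},
    {"name": "total", "type": "double"},
    {"name": "region", "type": ["null", "string"], "default": None},
]}
print(is_forward_compatible(v1, v2))  # v2 only adds an optional field
```

Registering v2 against a schema group configured with Forward compatibility (as in the CLI below) would pass this kind of check.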
# Create a schema group
az eventhubs namespace schema-registry create \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--schema-group-name order-schemas \
--schema-compatibility Forward \
--schema-type Avro
# Register a schema
az eventhubs namespace schema-registry schema create \
--namespace-name production-events-ns \
--resource-group rg-event-hubs \
--schema-group-name order-schemas \
--schema-name OrderCreated \
--schema-version 1 \
--content '{
"type": "record",
"name": "OrderCreated",
"namespace": "com.example.events",
"fields": [
{"name": "orderId", "type": "string"},
{"name": "customerId", "type": "string"},
{"name": "total", "type": "double"},
{"name": "timestamp", "type": "string"},
{"name": "region", "type": ["null", "string"], "default": null}
]
}'
Integration with Azure Services
Event Hubs integrates natively with multiple Azure services for event processing, analytics, and archival. These integrations enable building complete event-driven architectures without custom infrastructure.
Integration Options
| Service | Use Case | Processing Model |
|---|---|---|
| Azure Functions | Serverless event processing | Per-event or batched trigger |
| Stream Analytics | Real-time SQL-based analytics | Continuous query over event stream |
| Azure Databricks | Complex stream processing | Structured Streaming (Spark) |
| Azure Data Factory | Batch ETL from captured events | Pipeline with event trigger |
| Azure Data Explorer | Log and telemetry analytics | Direct ingestion with KQL queries |
| Logic Apps | Workflow automation | Event-triggered workflow |
# Azure Function triggered by Event Hubs
import azure.functions as func
import json
import logging
app = func.FunctionApp()
@app.event_hub_message_trigger(
arg_name="events",
event_hub_name="order-events",
connection="EventHubConnection",
consumer_group="order-processor",
cardinality="many"
)
def process_order_events(events: list[func.EventHubEvent]):
"""Process a batch of events from Event Hubs."""
for event in events:
body = json.loads(event.get_body().decode())
logging.info(f"Processing: {body.get('eventType')} - {body.get('orderId')}")
# Extract event metadata
partition_key = event.partition_key
sequence_number = event.sequence_number
enqueued_time = event.enqueued_time
logging.info(
f" Partition key: {partition_key}, "
f"Sequence: {sequence_number}, "
f"Enqueued: {enqueued_time}"
)
# Process the event
if body.get('eventType') == 'OrderCreated':
process_new_order(body)
elif body.get('eventType') == 'OrderCancelled':
process_cancellation(body)
logging.info(f"Processed batch of {len(events)} events")
Monitoring and Operations
Monitor Event Hubs using Azure Monitor metrics, diagnostic logs, and consumer lag tracking. Key metrics include incoming/outgoing messages, incoming/outgoing bytes, throttled requests, and consumer group lag. Set up alerts for critical conditions like high throttle rates, growing consumer lag, or approaching throughput limits.
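Consumer lag itself is the gap between the newest sequence number in each partition and the consumer group's last checkpoint. A minimal sketch of that arithmetic follows; the dict shape is illustrative (in practice the newest sequence numbers come from the consumer client's partition properties and the checkpoint values from the Blob Storage checkpoint store):

```python
def consumer_lag(partition_info: dict) -> dict:
    """Per-partition lag: last enqueued sequence number minus the last
    checkpointed sequence number, floored at zero."""
    return {
        pid: max(0, info["last_enqueued_sequence_number"]
                    - info["checkpoint_sequence_number"])
        for pid, info in partition_info.items()
    }

# Hypothetical snapshot for two partitions of order-events
snapshot = {
    "0": {"last_enqueued_sequence_number": 10_500,
          "checkpoint_sequence_number": 10_480},
    "1": {"last_enqueued_sequence_number": 9_900,
          "checkpoint_sequence_number": 9_900},
}
print(consumer_lag(snapshot))  # partition 0 is 20 events behind
```

A steadily growing lag total is the signal to alert on: it means consumers are not keeping up with producers.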
# Enable diagnostic logging
az monitor diagnostic-settings create \
--name eventhub-diagnostics \
--resource /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--workspace /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.OperationalInsights/workspaces/log-analytics \
--logs '[{"category": "ArchiveLogs", "enabled": true}, {"category": "OperationalLogs", "enabled": true}, {"category": "AutoScaleLogs", "enabled": true}, {"category": "KafkaCoordinatorLogs", "enabled": true}]' \
--metrics '[{"category": "AllMetrics", "enabled": true}]'
# Create an alert for high throttling
az monitor metrics alert create \
--name eventhub-throttle-alert \
--resource-group rg-event-hubs \
--scopes /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--condition "total ThrottledRequests > 100" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action-group /subscriptions/<sub-id>/resourceGroups/rg-monitoring/providers/Microsoft.Insights/actionGroups/ops-team \
--description "Event Hubs throttling exceeds threshold"
# Check namespace metrics
az monitor metrics list \
--resource /subscriptions/<sub-id>/resourceGroups/rg-event-hubs/providers/Microsoft.EventHub/namespaces/production-events-ns \
--metric "IncomingMessages" "OutgoingMessages" "ThrottledRequests" \
--interval PT1H \
--start-time "2026-03-14T00:00:00Z" \
--end-time "2026-03-14T23:59:59Z"
Throughput Units and Auto-Inflate
Each throughput unit (TU) in Standard tier provides 1 MB/s ingress (1000 events/s) and 2 MB/s egress. If your producers exceed the TU capacity, requests are throttled with HTTP 429 errors. Enable Auto-Inflate to automatically scale TUs up to a configured maximum when traffic increases. Auto-Inflate only scales up, not down. To scale down, manually reduce the TU count during off-peak hours. For Premium tier, processing units replace TUs with higher per-unit capacity.
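A quick sketch of TU sizing under those Standard-tier limits (simplified: it ignores burst traffic and per-partition skew, so the result is a floor, not a guarantee):

```python
import math

def required_throughput_units(ingress_mb_s: float,
                              ingress_events_s: float,
                              egress_mb_s: float) -> int:
    """Standard-tier TUs needed: each TU provides 1 MB/s or 1000 events/s
    ingress and 2 MB/s egress; the tightest constraint wins."""
    return max(
        math.ceil(ingress_mb_s / 1.0),        # ingress bandwidth
        math.ceil(ingress_events_s / 1000.0), # ingress event rate
        math.ceil(egress_mb_s / 2.0),         # egress bandwidth
    )

# 3 MB/s ingress at 5000 events/s with 8 MB/s egress -> 5 TUs
print(required_throughput_units(3, 5000, 8))
```

Set this as the namespace capacity and give Auto-Inflate a comfortably higher maximum (the CLI example above uses a ceiling of 20 TUs).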
Azure Event Hubs is the foundation of event-driven architectures on Azure. Start with a Standard namespace for most workloads, use partition keys for ordered processing of related events, implement consumer groups for independent processing pipelines, enable Capture for zero-code archival, and integrate with Azure Functions for serverless event processing. For Kafka-compatible workloads, connect existing Kafka producers and consumers to Event Hubs with minimal configuration changes.
Key Takeaways
- Event Hubs ingests millions of events per second with configurable retention up to 90 days.
- Kafka-compatible endpoints allow existing Kafka producers and consumers to connect without code changes.
- Event Hubs Capture automatically archives events to Blob Storage or Data Lake in Avro format.
- Schema Registry provides centralized schema management with compatibility checking for data governance.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.