Azure Cosmos DB Deep Dive
Master Cosmos DB: consistency levels, partitioning, global distribution, change feed, and cost optimization.
Prerequisites
- Basic understanding of NoSQL databases and distributed systems
- Azure account with Cosmos DB permissions
Azure Cosmos DB: Globally Distributed, Multi-Model Database
Azure Cosmos DB is a fully managed, globally distributed, multi-model database service designed for mission-critical applications requiring single-digit millisecond latency, automatic and elastic scalability, and guaranteed availability. It supports multiple APIs (NoSQL, MongoDB, Cassandra, Gremlin, Table) and offers five consistency levels that let you trade off between consistency and performance based on your application requirements.
Unlike traditional databases that require complex replication setups for global distribution, Cosmos DB provides turnkey global distribution. You add or remove regions with a single click, and your data is automatically replicated with the consistency level you choose. This makes it ideal for globally distributed applications, real-time personalization, IoT telemetry, gaming leaderboards, and e-commerce catalogs.
This guide covers the five consistency levels in depth, partition key strategy, request unit (RU) optimization, global distribution patterns, change feed for event-driven architectures, and cost optimization techniques.
Free Tier
Cosmos DB offers a free tier with 1000 RU/s provisioned throughput and 25 GB of storage per account. This is enough to build and test applications without any cost. The free tier is available for the lifetime of the account (one free tier account per subscription).
The Five Consistency Levels
Cosmos DB's most distinctive feature is its five well-defined consistency levels. Most databases offer only two options: strong consistency or eventual consistency. Cosmos DB provides three additional levels between these extremes, giving you fine-grained control over the consistency-latency-availability tradeoff.
| Level | Guarantee | Latency | Availability | Use Case |
|---|---|---|---|---|
| Strong | Linearizability (reads always return the latest write) | Higher (waits for quorum) | Lower during region outage | Financial transactions, inventory |
| Bounded Staleness | Reads lag writes by at most K versions or T seconds | Medium | High | Leaderboards, analytics with freshness SLA |
| Session | Read-your-own-writes within a session | Low | High | User profiles, shopping carts (default) |
| Consistent Prefix | Reads never see out-of-order writes | Low | High | Social feeds, activity timelines |
| Eventual | No ordering guarantee, lowest latency | Lowest | Highest | Vote counts, likes, non-critical counters |
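Bounded Staleness is the easiest level to misread: the staleness window is capped by whichever bound is reached first, so a replica read satisfies the guarantee only if it lags by no more than K versions and no more than T seconds. A small illustrative check in plain Python (not an SDK call; the K=100/T=5s numbers are just example bounds):

```python
def within_staleness_bound(versions_behind: int, seconds_behind: float,
                           max_versions: int = 100, max_seconds: float = 5.0) -> bool:
    """A replica read satisfies Bounded Staleness only if it lags the
    write region by at most K versions AND at most T seconds: whichever
    bound is reached first caps the staleness."""
    return versions_behind <= max_versions and seconds_behind <= max_seconds

# 50 versions and 2 seconds behind: within both bounds.
print(within_staleness_bound(50, 2.0))   # True
# 50 versions behind but 10 seconds stale: the time bound is violated.
print(within_staleness_bound(50, 10.0))  # False
```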
# Set default consistency level for an account
az cosmosdb update \
--name mycosmosdb \
--resource-group myapp-rg \
--default-consistency-level Session
# Override consistency per-request (SDK example)
# Python SDK:
# container.read_item(item="id1", partition_key="pk1",
# consistency_level=ConsistencyLevel.Strong)
# Check current consistency settings
az cosmosdb show \
--name mycosmosdb \
--resource-group myapp-rg \
--query '{DefaultConsistency:consistencyPolicy.defaultConsistencyLevel, MaxStaleness:consistencyPolicy.maxStalenessPrefix}'
Strong Consistency and Multi-Region
Strong consistency is not available for multi-region write (multi-master) accounts. If you need strong consistency with global distribution, use single-region write with multi-region read replicas. The write region provides strong consistency, and read regions serve reads at the chosen consistency level. For multi-region write accounts, Session consistency is the strongest option available.
Partition Key Strategy
Choosing the right partition key is the most important design decision for Cosmos DB. The partition key determines how your data is distributed across physical partitions, which directly impacts performance, scalability, and cost. A bad partition key creates hot partitions that throttle requests and waste provisioned throughput.
A good partition key has high cardinality (many distinct values), distributes requests evenly across partitions, and is used as a filter in your most common queries. The partition key should be a property that appears in every document and is always known when querying.
Partition Key Examples
| Scenario | Good Key | Bad Key | Why |
|---|---|---|---|
| E-commerce orders | /customerId | /orderDate | High cardinality, even distribution |
| IoT telemetry | /deviceId | /sensorType | Many devices vs few sensor types |
| Multi-tenant SaaS | /tenantId | /region | Tenant isolation, even distribution |
| Social media posts | /userId | /category | User-centric queries, many users |
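Because Cosmos DB hashes the partition key value to place data on physical partitions, one way to sanity-check a candidate key before committing to it is to hash a sample of its values and look at how they spread across buckets. This sketch uses Python's built-in hashlib as a stand-in for the service's internal hash:

```python
from collections import Counter
import hashlib

def bucket_distribution(values, buckets=10):
    """Hash each candidate partition-key value into a fixed number of
    buckets and count how many land in each, to spot skew."""
    counts = Counter()
    for v in values:
        digest = hashlib.md5(v.encode()).hexdigest()
        counts[int(digest, 16) % buckets] += 1
    return counts

# High-cardinality key (customerId): values spread across all buckets.
customers = bucket_distribution([f"customer-{i}" for i in range(1000)])
# Low-cardinality key (sensorType): everything piles into a few buckets.
sensors = bucket_distribution(["temperature", "humidity", "pressure"] * 333)

print(len(customers), len(sensors))  # e.g. 10 distinct buckets vs at most 3
```

A key whose sample leaves most buckets empty, or concentrates counts in one bucket, is a hot-partition risk.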
# Create a container with a partition key
az cosmosdb sql container create \
--account-name mycosmosdb \
--database-name myapp \
--name orders \
--resource-group myapp-rg \
--partition-key-path "/customerId" \
--throughput 4000
# Create a container with hierarchical partition keys (preview)
az cosmosdb sql container create \
--account-name mycosmosdb \
--database-name myapp \
--name events \
--resource-group myapp-rg \
--partition-key-path "/tenantId" "/userId" "/sessionId" \
--throughput 10000
Request Units (RU) and Throughput
Cosmos DB measures throughput in Request Units per second (RU/s). A Request Unit is a normalized measure of the compute, memory, and I/O required to process a request. A point read (fetching a single 1 KB item by id and partition key) costs exactly 1 RU. More complex operations cost more: a typical 1 KB write costs roughly 5-10 RU, and query costs vary with complexity, result size, and indexing.
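Since every response reports its own charge in the x-ms-request-charge header, a useful habit is to accumulate charges per logical operation to see where RU/s budget actually goes. The helper below is a plain-Python sketch that works on any sequence of response-header dicts; the numeric values are made-up examples:

```python
def total_request_charge(response_headers_list):
    """Sum the RU charge across a batch of responses, reading the
    x-ms-request-charge header that each Cosmos DB response carries."""
    return sum(float(h.get("x-ms-request-charge", 0.0))
               for h in response_headers_list)

# Simulated headers from three operations: a 1 RU point read,
# a small write, and a cross-partition query.
headers = [
    {"x-ms-request-charge": "1.0"},
    {"x-ms-request-charge": "6.5"},
    {"x-ms-request-charge": "15.25"},
]
print(total_request_charge(headers))  # 22.75
```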
# Set provisioned throughput
az cosmosdb sql container throughput update \
--account-name mycosmosdb \
--database-name myapp \
--name orders \
--resource-group myapp-rg \
--throughput 10000
# Enable autoscale (scales between 10% and max RU/s)
az cosmosdb sql container throughput migrate \
--account-name mycosmosdb \
--database-name myapp \
--name orders \
--resource-group myapp-rg \
--throughput-type autoscale
az cosmosdb sql container throughput update \
--account-name mycosmosdb \
--database-name myapp \
--name orders \
--resource-group myapp-rg \
--max-throughput 20000
# Check RU consumption for a query (SDK response headers)
# Python: response.get_response_headers()['x-ms-request-charge']
# JavaScript: response.requestCharge
# Monitor RU consumption
az cosmosdb sql container throughput show \
--account-name mycosmosdb \
--database-name myapp \
--name orders \
--resource-group myapp-rg
Serverless Mode
For development, testing, or bursty workloads, consider Cosmos DB Serverless mode. Instead of provisioning RU/s, you pay per RU consumed. This eliminates the need to right-size throughput and avoids paying for idle capacity. Serverless mode supports up to 5,000 RU/s per container and is limited to a single region.
Global Distribution
Cosmos DB supports turnkey global distribution with multi-region reads and optional multi-region writes. Data is automatically replicated to all configured regions with the consistency level you specify. Adding or removing regions takes minutes and requires no application downtime.
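On the application side, reads can be routed to the nearest replica by listing preferred regions when constructing the client. This is a hedged azure-cosmos sketch: preferred_locations is the keyword argument to verify against your SDK version, and the endpoint/key are placeholders. The ordering helper is plain Python:

```python
def region_priority(regions, preferred):
    """Pure helper: order available regions by a preference list,
    keeping unlisted regions at the end in their original order."""
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(regions, key=lambda r: rank.get(r, len(preferred)))

def make_client(endpoint, key):
    """Requires the azure-cosmos package and a real account."""
    from azure.cosmos import CosmosClient  # deferred so the helper above stays importable
    return CosmosClient(
        endpoint, key,
        preferred_locations=["West Europe", "East US"],  # nearest region first
    )

print(region_priority(["East US", "Southeast Asia", "West Europe"],
                      ["West Europe", "East US"]))
# ['West Europe', 'East US', 'Southeast Asia']
```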
# Add read regions
az cosmosdb update \
--name mycosmosdb \
--resource-group myapp-rg \
--locations regionName=eastus failoverPriority=0 isZoneRedundant=true \
--locations regionName=westeurope failoverPriority=1 isZoneRedundant=true \
--locations regionName=southeastasia failoverPriority=2 isZoneRedundant=false
# Enable multi-region writes (multi-master)
az cosmosdb update \
--name mycosmosdb \
--resource-group myapp-rg \
--enable-multiple-write-locations true
# Configure automatic failover
az cosmosdb update \
--name mycosmosdb \
--resource-group myapp-rg \
--enable-automatic-failover true
# Trigger a manual failover (for DR testing)
az cosmosdb failover-priority-change \
--name mycosmosdb \
--resource-group myapp-rg \
--failover-policies regionName=westeurope failoverPriority=0 \
--failover-policies regionName=eastus failoverPriority=1
Terraform Multi-Region Configuration
resource "azurerm_cosmosdb_account" "main" {
  name                       = "mycosmosdb"
  location                   = "eastus"
  resource_group_name        = azurerm_resource_group.main.name
  offer_type                 = "Standard"
  automatic_failover_enabled = true
  multi_region_write_enabled = false

  consistency_policy {
    consistency_level       = "Session"
    max_interval_in_seconds = 5
    max_staleness_prefix    = 100
  }

  geo_location {
    location          = "eastus"
    failover_priority = 0
    zone_redundant    = true
  }

  geo_location {
    location          = "westeurope"
    failover_priority = 1
    zone_redundant    = true
  }

  geo_location {
    location          = "southeastasia"
    failover_priority = 2
    zone_redundant    = false
  }

  tags = {
    Environment = "production"
  }
}
Change Feed
The Cosmos DB change feed provides an ordered log of changes (inserts and updates) to a container. It enables event-driven architectures by allowing you to react to data changes in real time. The change feed is available for every container at no additional cost and supports both push model (Azure Functions trigger) and pull model (SDK).
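For the pull model, the SDK lets you read the feed on your own schedule. The sketch below assumes the azure-cosmos Python package, placeholder endpoint/key and container names, and query_items_change_feed as the iterator method to verify against your SDK version; the filtering helper is plain Python:

```python
def confirmed_orders(changed_docs):
    """Pure helper: keep only change-feed documents whose status is 'confirmed'."""
    return [d for d in changed_docs if d.get("status") == "confirmed"]

def poll_change_feed(endpoint, key):
    """Pull-model sketch; requires the azure-cosmos package and a real account."""
    from azure.cosmos import CosmosClient  # deferred so the helper above stays importable
    client = CosmosClient(endpoint, key)
    container = client.get_database_client("myapp").get_container_client("orders")
    # is_start_from_beginning replays the feed from the start of the container;
    # in production you would persist a continuation token between polls instead.
    for doc in container.query_items_change_feed(is_start_from_beginning=True):
        for order in confirmed_orders([doc]):
            print("confirmed:", order["id"])
```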
# Azure Functions Cosmos DB trigger (Python)
import azure.functions as func
import logging

app = func.FunctionApp()

@app.cosmos_db_trigger(
    arg_name="documents",
    container_name="orders",
    database_name="myapp",
    connection="CosmosDBConnection",
    lease_container_name="leases",
    create_lease_container_if_not_exists=True,
)
def process_order_changes(documents: func.DocumentList):
    """Process changes from the orders container."""
    for doc in documents:
        order = doc.to_dict()
        logging.info(f"Order changed: {order['id']}, status: {order.get('status')}")
        if order.get("status") == "confirmed":
            # Trigger downstream processing
            send_confirmation_email(order)
            update_inventory(order)
            publish_analytics_event(order)

def send_confirmation_email(order):
    logging.info(f"Sending confirmation for order {order['id']}")

def update_inventory(order):
    logging.info(f"Updating inventory for order {order['id']}")

def publish_analytics_event(order):
    logging.info(f"Publishing analytics for order {order['id']}")
Indexing and Query Optimization
Cosmos DB automatically indexes all properties in every document by default. While this makes ad-hoc queries fast, it increases storage and RU consumption for writes. You can customize the indexing policy to include only the paths you query, significantly reducing write costs and storage.
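A custom policy like the JSON example that follows is applied when creating or replacing the container. Here is a hedged azure-cosmos sketch (placeholder account names; replace_container is the method to verify for your SDK version), with a small pure helper for building a minimal policy:

```python
def minimal_indexing_policy(paths):
    """Build a consistent-mode policy that indexes only the given
    top-level properties and excludes everything else."""
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": f"/{p}/?"} for p in paths],
        "excludedPaths": [{"path": "/*"}],
    }

def apply_policy(endpoint, key):
    """Requires the azure-cosmos package and a real account."""
    from azure.cosmos import CosmosClient, PartitionKey  # deferred import
    db = CosmosClient(endpoint, key).get_database_client("myapp")
    db.replace_container(
        "orders",
        partition_key=PartitionKey(path="/customerId"),
        indexing_policy=minimal_indexing_policy(["customerId", "status"]),
    )
```

Note that changing the indexing policy triggers a background index rebuild, which consumes RUs while it runs.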
{
  "indexingMode": "consistent",
  "includedPaths": [
    { "path": "/customerId/?" },
    { "path": "/status/?" },
    { "path": "/createdAt/?" },
    { "path": "/total/?" }
  ],
  "excludedPaths": [
    { "path": "/metadata/*" },
    { "path": "/lineItems/*" },
    { "path": "/*" }
  ],
  "compositeIndexes": [
    [
      { "path": "/customerId", "order": "ascending" },
      { "path": "/createdAt", "order": "descending" }
    ],
    [
      { "path": "/status", "order": "ascending" },
      { "path": "/total", "order": "descending" }
    ]
  ],
  "spatialIndexes": [
    { "path": "/location/*", "types": ["Point", "Polygon"] }
  ]
}
Cost Optimization Strategies
Cosmos DB costs are driven by provisioned RU/s and storage. The following strategies help optimize costs without sacrificing performance.
| Strategy | Impact | Implementation |
|---|---|---|
| Right-size RU/s | 30-60% savings | Use autoscale or analyze Metrics to right-size |
| Custom indexing | 20-40% write RU reduction | Exclude unqueried paths from indexing |
| Point reads over queries | 1 RU vs 3-100+ RU | Fetch by id + partition key when possible |
| Reserved capacity | Up to 65% discount | 1-year or 3-year reservation |
| TTL for auto-deletion | Storage savings | Set TTL on containers or documents |
| Serverless for dev/test | Pay only for usage | Use serverless mode for non-production |
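TTL can also be set per document: when the container's default TTL is enabled, a numeric ttl property on an item overrides the container setting for that item. A plain-Python sketch of stamping documents with an expiry before writing them:

```python
THIRTY_DAYS = 30 * 24 * 3600  # 2592000 seconds, matching the CLI example below

def with_ttl(doc: dict, seconds: int = THIRTY_DAYS) -> dict:
    """Return a copy of the document carrying a per-item ttl (in seconds).
    Cosmos DB deletes the item that long after its last write."""
    return {**doc, "ttl": seconds}

order = with_ttl({"id": "order-1", "customerId": "customer-42"})
print(order["ttl"])  # 2592000
```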
# Enable TTL on a container (auto-delete old documents)
az cosmosdb sql container update \
--account-name mycosmosdb \
--database-name myapp \
--name events \
--resource-group myapp-rg \
--analytical-storage-ttl -1 \
--default-ttl 2592000 # 30 days in seconds
# View RU consumption metrics
az monitor metrics list \
--resource "/subscriptions/SUB_ID/resourceGroups/myapp-rg/providers/Microsoft.DocumentDB/databaseAccounts/mycosmosdb" \
--metric "TotalRequestUnits" \
--interval PT1H \
--start-time "2026-03-13T00:00:00Z" \
--end-time "2026-03-14T00:00:00Z" \
--aggregation Total
Use Analytical Store for Analytics
Enable Cosmos DB Analytical Store (Azure Synapse Link) for analytical queries instead of running expensive aggregations against your transactional store. The analytical store is a column-oriented copy of your data optimized for analytical queries. It does not consume provisioned RU/s and can be queried directly from Azure Synapse Analytics using Spark or serverless SQL.
Key Takeaways
- Cosmos DB offers five consistency levels between strong and eventual for fine-grained tradeoffs.
- Partition key choice is the most critical design decision, impacting performance and cost.
- Request Units (RU/s) measure throughput; autoscale and serverless modes reduce waste.
- Change feed enables event-driven architectures with at-least-once delivery guarantees.
Frequently Asked Questions
What is the default consistency level in Cosmos DB?
Session consistency is the default. It guarantees read-your-own-writes within a session while keeping latency low and availability high, which suits most user-centric workloads.
How do I choose the right partition key?
Pick a property with high cardinality that distributes requests evenly and appears as a filter in your most common queries, such as /customerId for orders or /tenantId for multi-tenant data.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.