Cosmos DB Consistency Levels
Understand the five Cosmos DB consistency levels and their trade-offs for global distribution.
Prerequisites
- Azure subscription with Cosmos DB access
- Understanding of distributed database concepts
- Basic knowledge of CAP theorem
Cosmos DB Consistency Levels Explained
Azure Cosmos DB is a globally distributed, multi-model database service designed for applications that need single-digit millisecond response times at any scale, anywhere in the world. One of its most distinctive and powerful features is the ability to choose from five well-defined consistency levels, each representing a different trade-off between data freshness, availability, latency, and throughput.
Understanding these consistency levels is critical for building applications that behave correctly while maximizing performance and minimizing cost. Choosing the wrong consistency level can result in applications that show stale data (causing user confusion or business logic errors), or applications that pay a significant performance and cost penalty for guarantees they don't actually need. This guide explains each level in depth, provides practical code examples, and offers a framework for selecting the right consistency level for your specific use case.
The Consistency Spectrum
Traditional databases typically offer only two options: strong consistency (relational databases with synchronous replication) or eventual consistency (most NoSQL databases). Cosmos DB fills in the spectrum with three additional levels between these extremes, giving you fine-grained control over the consistency-performance trade-off.
| Level | Guarantee | Read Latency | Write Latency | Throughput | Read RU Cost |
|---|---|---|---|---|---|
| Strong | Linearizable (always latest committed write) | Highest | Highest | Lowest | 2x |
| Bounded Staleness | Reads lag by at most K versions or T time | High | High | Low | 2x |
| Session | Read-your-own-writes within a session | Low | Low | High | 1x |
| Consistent Prefix | Reads never see out-of-order writes | Low | Low | High | 1x |
| Eventual | No ordering guarantee; convergence over time | Lowest | Lowest | Highest | 1x |
Session Consistency Is the Default
Cosmos DB defaults to Session consistency, and Microsoft recommends it as the starting point for most applications. Session consistency provides read-your-own-writes within a client session (using a session token), which is what most applications actually need. It keeps latency low, throughput high, and reads cost 1x RU, half the cost of Strong or Bounded Staleness reads. Unless you have a specific requirement for stronger or weaker guarantees, start with Session.
Strong Consistency
Strong consistency provides linearizable reads. A read is guaranteed to return the most recent committed write. This is the equivalent of a single-region, single-replica database in terms of data freshness: readers always see the latest state, regardless of which replica they connect to.
How Strong Consistency Works
When you read with Strong consistency, Cosmos DB contacts a quorum of replicas to determine the latest committed version before returning the result. This quorum check adds latency and consumes additional RUs, but guarantees that you never read stale data.
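The quorum read can be pictured with a toy simulation. This is not the actual Cosmos DB replication protocol (replica counts, version numbers, and quorum sizes here are illustrative), but it shows why a majority read always observes the latest committed write: any read majority overlaps any write majority.

```javascript
// Toy model: each replica holds a value plus a monotonically
// increasing version (analogous to a log sequence number).
// A strong read consults a majority quorum and returns the
// value with the highest version seen within that quorum.
function quorumRead(replicas) {
  const quorumSize = Math.floor(replicas.length / 2) + 1;
  const quorum = replicas.slice(0, quorumSize); // any majority works
  return quorum.reduce((latest, r) =>
    r.version > latest.version ? r : latest
  );
}

// Version 7 was committed by a write majority (3 of 5 replicas),
// so only 2 replicas can still lag at version 6. Every read
// majority of 3 must therefore include at least one v7 replica.
const replicas = [
  { version: 7, value: "v7" },
  { version: 6, value: "v6" }, // lagging replica
  { version: 7, value: "v7" },
  { version: 6, value: "v6" }, // lagging replica
  { version: 7, value: "v7" },
];

console.log(quorumRead(replicas).value); // "v7"
```

This overlap between read and write majorities is what the extra latency and the 2x read RU charge pay for.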
Trade-offs and Constraints
- 2x RU cost for reads: Each read operation consumes twice the RUs compared to Session, Consistent Prefix, or Eventual consistency. For read-heavy workloads, this doubles your throughput cost.
- Higher write latency: Writes must be replicated to a quorum of replicas before being acknowledged, increasing write latency compared to weaker consistency levels.
- Not available with multi-region writes: Strong consistency is only available with single-region write configurations. If you enable multi-region writes, the strongest available option is Bounded Staleness.
- Reduced availability during outages: Quorum reads may fail during regional outages because the required number of replicas may not be reachable.
When to Use Strong Consistency
- Financial transactions where reading stale data could cause incorrect calculations
- Inventory management where overselling must be prevented
- Leader election or distributed locking scenarios
- Any application where correctness is more important than performance
# Set account-level consistency to Strong
az cosmosdb update \
--name mycosmosdb \
--resource-group myRG \
--default-consistency-level Strong
# Verify the consistency configuration
az cosmosdb show \
--name mycosmosdb \
--resource-group myRG \
--query '{ConsistencyLevel:consistencyPolicy.defaultConsistencyLevel, MaxStalenessPrefix:consistencyPolicy.maxStalenessPrefix, MaxIntervalInSeconds:consistencyPolicy.maxIntervalInSeconds}' \
--output table
# Note: Strong consistency requires single-region writes
# Verify write region configuration
az cosmosdb show \
--name mycosmosdb \
--resource-group myRG \
--query '{EnableMultipleWriteLocations:enableMultipleWriteLocations, WriteLocations:writeLocations[].locationName, ReadLocations:readLocations[].locationName}' \
--output json
Bounded Staleness
Bounded Staleness guarantees that reads lag behind writes by at most K versions (operations) or T time interval, whichever is reached first. Within the write region, Bounded Staleness behaves identically to Strong consistency. In secondary (read) regions, the staleness bound applies, meaning reads may be up to K versions or T seconds behind the latest write.
Configuring Staleness Bounds
The staleness bound is configured with two parameters:
- maxStalenessPrefix (K): Maximum number of versions (write operations) by which reads can lag behind writes. For multi-region accounts, Microsoft recommends at least 100,000.
- maxIntervalInSeconds (T): Maximum time in seconds that reads can lag. For multi-region accounts, Microsoft recommends at least 300 seconds (5 minutes).
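The interaction of the two bounds can be sketched with a small helper (illustrative only; the real enforcement happens inside the replication protocol). A secondary-region read satisfies Bounded Staleness only while its lag stays under both K operations and T seconds:

```javascript
// Toy check: does a secondary replica's lag still satisfy the
// configured Bounded Staleness window? The bound is violated as
// soon as EITHER limit (K versions or T seconds) is exceeded,
// at which point reads must wait for the replica to catch up.
function withinStalenessBounds(lag, bounds) {
  return (
    lag.versionsBehind <= bounds.maxStalenessPrefix &&
    lag.secondsBehind <= bounds.maxIntervalInSeconds
  );
}

const multiRegionBounds = { maxStalenessPrefix: 100000, maxIntervalInSeconds: 300 };

console.log(withinStalenessBounds({ versionsBehind: 500, secondsBehind: 12 }, multiRegionBounds));    // true
console.log(withinStalenessBounds({ versionsBehind: 500, secondsBehind: 301 }, multiRegionBounds));   // false: time bound exceeded
console.log(withinStalenessBounds({ versionsBehind: 100001, secondsBehind: 12 }, multiRegionBounds)); // false: version bound exceeded
```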
# Set consistency to Bounded Staleness with recommended multi-region values
az cosmosdb update \
--name mycosmosdb \
--resource-group myRG \
--default-consistency-level BoundedStaleness \
--max-staleness-prefix 100000 \
--max-interval 300
# For single-region accounts, tighter bounds are acceptable:
# --max-staleness-prefix 10 --max-interval 5
# This means reads are at most 10 versions or 5 seconds behind
# Bounded Staleness is the strongest level compatible with multi-region writes
az cosmosdb update \
--name mycosmosdb \
--resource-group myRG \
--enable-multiple-write-locations true \
--default-consistency-level BoundedStaleness
Bounded Staleness Cost Impact
Like Strong consistency, Bounded Staleness charges 2x RU for reads because Cosmos DB must check version information across replicas. In multi-region deployments with heavy read traffic, this can significantly increase costs. Model your read/write ratio carefully before choosing this level. If you need strong guarantees only within a single user session (which covers most web and mobile applications), Session consistency achieves this at 1x RU cost, a 50% savings on reads.
Session Consistency
Session consistency is the sweet spot for the vast majority of applications. It guarantees that within a single client session (identified by a session token), a client will always see its own writes and all writes that happened before its own writes. Other clients may see slightly stale data, but individual users experience fully consistent behavior within their own session.
Why Session Is Usually the Right Choice
Consider a typical web application: User A updates their profile name. With Session consistency, User A immediately sees their new name on the next page load (read-your-own-writes). User B might see the old name for a brief moment, but this is acceptable in virtually all real-world scenarios. User B doesn't know User A just changed their name and won't notice the brief delay.
How Session Tokens Work
- When a client performs a write, Cosmos DB returns a session token in the x-ms-session-token response header.
- The client passes this token with subsequent read requests to ensure it reads at least the data from its own writes.
- Azure SDKs (Java, .NET, Python, JavaScript) handle session token management automatically when using a single CosmosClient instance.
- In distributed architectures where reads and writes may go through different service instances (e.g., a microservices architecture with separate read and write services), you must propagate the session token explicitly between services.
const { CosmosClient } = require("@azure/cosmos");
const client = new CosmosClient({
endpoint: process.env.COSMOS_ENDPOINT,
key: process.env.COSMOS_KEY,
consistencyLevel: "Session"
});
const container = client.database("mydb").container("orders");
// --- Single Service Pattern (automatic token management) ---
// When using a single CosmosClient instance, session tokens
// are managed automatically by the SDK
async function createAndReadOrder() {
// Write an order
const { resource: order } = await container.items.create({
id: "order-123",
customerId: "cust-456",
total: 99.99,
status: "pending"
});
// This read is guaranteed to see the write above
// (SDK passes session token automatically)
const { resource: readOrder } = await container
.item("order-123", "cust-456")
.read();
console.log(readOrder.status); // Guaranteed: "pending"
}
// --- Distributed Service Pattern (manual token propagation) ---
// When reads and writes happen in different service instances,
// pass the session token explicitly
async function writeOrder(orderData) {
const { resource, headers } = await container.items.create(orderData);
const sessionToken = headers["x-ms-session-token"];
// Return the session token to the caller (e.g., via HTTP header)
return { order: resource, sessionToken };
}
async function readOrder(orderId, partitionKey, sessionToken) {
// Pass the session token from the write service
const { resource } = await container
.item(orderId, partitionKey)
.read({ sessionToken });
return resource; // Guaranteed to see the write
}
const express = require("express");
const { CosmosClient } = require("@azure/cosmos");
const app = express();
const client = new CosmosClient({ /* config */ });
const container = client.database("mydb").container("orders");
// POST /orders - Create an order and return session token
app.post("/orders", async (req, res) => {
const { resource, headers } = await container.items.create(req.body);
// Pass session token back to the client via response header
res.set("x-cosmos-session-token", headers["x-ms-session-token"]);
res.json(resource);
});
// GET /orders/:id - Read with optional session token for consistency
app.get("/orders/:id", async (req, res) => {
const options = {};
// If client passes a session token, use it for read-your-own-writes
const sessionToken = req.headers["x-cosmos-session-token"];
if (sessionToken) {
options.sessionToken = sessionToken;
}
const { resource } = await container
.item(req.params.id, req.query.partitionKey)
.read(options);
res.json(resource);
});
Session Token in Web Applications
In a typical web application with a single backend, the Cosmos DB SDK manages session tokens automatically, so no extra code is needed. You only need manual session token propagation in specific architectures: CQRS patterns with separate read/write services, serverless functions where each invocation may use a different SDK instance, or load-balanced services where requests from the same user may hit different backends. In these cases, pass the session token through HTTP headers or a shared cache (Redis).
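For the shared-cache approach, one simple pattern is a per-user token store. The sketch below uses an in-memory Map as a stand-in for Redis; the store, the user IDs, and the token value are illustrative, not part of the Cosmos SDK:

```javascript
// Stand-in for a shared cache such as Redis: maps userId -> the
// session token from that user's most recent write. Any backend
// instance can then attach the token to reads for that user.
const tokenStore = new Map();

function saveSessionToken(userId, token) {
  tokenStore.set(userId, token);
}

function readOptionsFor(userId) {
  const sessionToken = tokenStore.get(userId);
  // Only attach the option when a token exists; without one the
  // read simply uses the account's default behavior.
  return sessionToken ? { sessionToken } : {};
}

// After a write handled by backend instance A:
saveSessionToken("user-42", "0:-1#105"); // token is an opaque string; value is made up

// A later read handled by backend instance B picks the token up:
console.log(readOptionsFor("user-42")); // { sessionToken: "0:-1#105" }
console.log(readOptionsFor("user-99")); // {}
```

In production you would also set a TTL on the cache entry, since session tokens only matter for the short window in which a user might read their own recent write.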
Consistent Prefix
Consistent Prefix guarantees that reads never see out-of-order writes. If writes occur in the order A, B, C, a reader will see one of: nothing, A, A+B, or A+B+C. It will never see A+C (skipping B) or B+A (reordered). This is a weaker guarantee than Session (no read-your-own-writes) but stronger than Eventual (which has no ordering guarantee).
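The guarantee can be restated as: every observable state is a prefix of the global write order. A tiny checker makes this concrete (illustrative, not an SDK API):

```javascript
// Consistent Prefix: a replica's observed state must be some
// prefix of the global write order, never a reordering or a gap.
function isValidPrefixState(writeOrder, observed) {
  return observed.every((item, i) => item === writeOrder[i]);
}

const writes = ["A", "B", "C"];

console.log(isValidPrefixState(writes, []));         // true  (nothing replicated yet)
console.log(isValidPrefixState(writes, ["A", "B"])); // true  (a prefix)
console.log(isValidPrefixState(writes, ["A", "C"])); // false (skips B)
console.log(isValidPrefixState(writes, ["B", "A"])); // false (reordered)
```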
Use Cases for Consistent Prefix
- Activity feeds and timelines: Social media feeds where the order of posts matters but a slight delay in seeing the latest post is acceptable.
- Event sourcing: Event streams where consumers must process events in order but do not need to see the latest event immediately.
- Change data capture: Systems that read change feeds for data synchronization where ordering is essential but real-time freshness is not.
- Multi-step workflows: Where workflow state transitions must be observed in order (e.g., Pending → Processing → Completed, never Pending → Completed).
Eventual Consistency
Eventual consistency provides no ordering guarantees whatsoever. Replicas will eventually converge to the same state, but a read at any given moment may return any previously acknowledged write, possibly an older version. This level offers the lowest latency and highest throughput because reads can be served from any replica without checking version information.
When Eventual Is Appropriate
- Counters and aggregates: View counts, like counts, and statistics where approximate values are acceptable
- Non-critical telemetry: IoT sensor readings, application metrics, and log data
- Social media interactions: Likes, reactions, and shares where slight inconsistency is invisible to users
- Search indexes: Full-text search or recommendation systems that tolerate brief staleness
- Analytics queries: Analytical workloads that aggregate large datasets where individual record freshness is less important
Eventual Does Not Mean Slow
In practice, eventual consistency converges very quickly, typically within milliseconds for single-region accounts and within a few seconds for multi-region accounts. The term “eventual” refers to the guarantee (or lack thereof), not the actual time it takes. For many read-heavy workloads, Eventual consistency provides effectively the same user experience as Session while maximizing throughput and minimizing cost.
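Convergence can be illustrated with a toy anti-entropy loop: each round, replicas adopt any newer version they see from a neighbor, and after a handful of exchanges all replicas agree. This is a deliberate simplification of real replication, meant only to show why "eventual" arrives quickly:

```javascript
// Toy anti-entropy: each round, every replica adopts its
// neighbor's state if the neighbor has a newer version.
function gossipRound(replicas) {
  for (let i = 0; i < replicas.length; i++) {
    const neighbor = replicas[(i + 1) % replicas.length];
    if (neighbor.version > replicas[i].version) {
      replicas[i] = { ...neighbor };
    }
  }
}

const replicas = [
  { version: 3, value: "new" },
  { version: 1, value: "old" },
  { version: 1, value: "old" },
];

let rounds = 0;
while (new Set(replicas.map(r => r.version)).size > 1) {
  gossipRound(replicas);
  rounds++;
}
console.log(rounds);                                 // 2
console.log(replicas.every(r => r.value === "new")); // true: all replicas converged
```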
Per-Request Consistency Overrides
A powerful but often overlooked feature is the ability to override the default consistency level on individual read requests. The account-level setting is the default, but you can weaken it (never strengthen it) per operation. This enables scenarios where different operations within the same application use different consistency levels based on their specific requirements.
// Account default: Session consistency
// Critical operation: Use account default (Session)
// User reads their own order - needs read-your-own-writes
var orderResponse = await container.ReadItemAsync<Order>(
orderId,
new PartitionKey(customerId)
// No override - uses Session default
);
// Analytics query: Override to Eventual for better throughput
var analyticsOptions = new QueryRequestOptions
{
ConsistencyLevel = ConsistencyLevel.Eventual,
MaxItemCount = 100
};
var analyticsQuery = container.GetItemQueryIterator<PageView>(
"SELECT * FROM c WHERE c.type = 'pageview' AND c.timestamp > @since",
requestOptions: analyticsOptions
);
// Reads at 1x RU cost with maximum throughput
// Feed display: Override to Consistent Prefix
var feedOptions = new QueryRequestOptions
{
ConsistencyLevel = ConsistencyLevel.ConsistentPrefix,
MaxItemCount = 50
};
var feedQuery = container.GetItemQueryIterator<FeedItem>(
"SELECT * FROM c WHERE c.feedId = @feedId ORDER BY c.timestamp DESC",
requestOptions: feedOptions
);
// Guarantees in-order reads without per-session tracking
| Account Default | Can Override To | Cannot Override To |
|---|---|---|
| Strong | Bounded Staleness, Session, Consistent Prefix, Eventual | N/A (strongest level) |
| Bounded Staleness | Session, Consistent Prefix, Eventual | Strong |
| Session | Consistent Prefix, Eventual | Strong, Bounded Staleness |
| Consistent Prefix | Eventual | Strong, Bounded Staleness, Session |
| Eventual | N/A (weakest level) | Strong, Bounded Staleness, Session, Consistent Prefix |
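The override matrix above reduces to a single ordering rule: a per-request level is valid only if it is equal to or weaker than the account default. A small helper captures this (illustrative; the SDK and service perform this validation themselves):

```javascript
// Consistency levels ordered strongest -> weakest. A request may
// only override the account default to an equal or WEAKER level.
const LEVELS = ["Strong", "BoundedStaleness", "Session", "ConsistentPrefix", "Eventual"];

function canOverride(accountDefault, requested) {
  return LEVELS.indexOf(requested) >= LEVELS.indexOf(accountDefault);
}

console.log(canOverride("Session", "Eventual"));         // true  (weakening is allowed)
console.log(canOverride("Session", "Strong"));           // false (strengthening is not)
console.log(canOverride("BoundedStaleness", "Session")); // true
```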
Cost Optimization with Per-Request Overrides
Set your account default to Session (which covers most needs) and use Eventual or Consistent Prefix for specific read operations that do not need session guarantees. This is particularly effective for analytics queries, dashboard displays, and reporting workloads where slightly stale data is acceptable. The 2x RU penalty for Strong and Bounded Staleness reads makes per-request overrides a significant cost optimization lever for mixed workloads.
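Because Strong and Bounded Staleness double the read RU charge, the impact on provisioned throughput is easy to model. The figures below (1 RU per read, 6 RU per write for 1 KB documents) are illustrative estimates, not quoted prices:

```javascript
// Estimate average RU/s for a daily workload, applying the 2x
// read multiplier charged by Strong and Bounded Staleness reads.
function averageRuPerSecond({ readsPerDay, writesPerDay, readRu, writeRu, consistency }) {
  const readMultiplier =
    consistency === "Strong" || consistency === "BoundedStaleness" ? 2 : 1;
  const totalRu = readsPerDay * readRu * readMultiplier + writesPerDay * writeRu;
  return totalRu / 86400; // seconds per day
}

const workload = { readsPerDay: 10e6, writesPerDay: 1e6, readRu: 1, writeRu: 6 };

const strong = averageRuPerSecond({ ...workload, consistency: "Strong" });
const session = averageRuPerSecond({ ...workload, consistency: "Session" });

console.log(Math.round(strong));  // 301 RU/s average
console.log(Math.round(session)); // 185 RU/s average
console.log(((1 - session / strong) * 100).toFixed(0) + "% fewer RU/s"); // "38% fewer RU/s"
```

The read-heavier the workload, the larger this gap grows, since only the read term carries the multiplier.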
Consistency and Global Distribution
Consistency levels become especially important in globally distributed Cosmos DB accounts where data must be replicated across regions. The interaction between consistency choice and multi-region configuration directly affects availability SLAs, write latency, and cost.
Multi-Region Write Compatibility
| Consistency Level | Single-Region Write | Multi-Region Write | Conflict Resolution Needed |
|---|---|---|---|
| Strong | Yes | No (not supported) | N/A |
| Bounded Staleness | Yes | Yes (strongest option) | Yes (LWW or custom) |
| Session | Yes | Yes | Yes (LWW or custom) |
| Consistent Prefix | Yes | Yes | Yes (LWW or custom) |
| Eventual | Yes | Yes | Yes (LWW or custom) |
# Create a multi-region Cosmos DB account with Session consistency
az cosmosdb create \
--name mycosmosdb-global \
--resource-group myRG \
--default-consistency-level Session \
--locations regionName=eastus2 failoverPriority=0 isZoneRedundant=true \
--locations regionName=westeurope failoverPriority=1 isZoneRedundant=true \
--locations regionName=southeastasia failoverPriority=2 isZoneRedundant=true
# Enable multi-region writes (with conflict resolution)
az cosmosdb update \
--name mycosmosdb-global \
--resource-group myRG \
--enable-multiple-write-locations true
# Create a container with Last Write Wins conflict resolution
az cosmosdb sql container create \
--account-name mycosmosdb-global \
--database-name mydb \
--name orders \
--partition-key-path "/customerId" \
--throughput 4000 \
--conflict-resolution-policy '{"mode": "LastWriterWins", "conflictResolutionPath": "/_ts"}'
# Enable autoscale throughput for variable workloads
az cosmosdb sql container throughput migrate \
--account-name mycosmosdb-global \
--database-name mydb \
--name orders \
--throughput-type autoscale
Choosing the Right Consistency Level
| Scenario | Recommended Level | Reasoning |
|---|---|---|
| Financial transactions, inventory | Strong | Cannot tolerate stale reads; correctness is critical |
| Multi-region with strong guarantees | Bounded Staleness | Strongest option compatible with multi-region writes |
| User-facing web/mobile apps | Session | Users see their own changes; best cost/consistency balance |
| Event streams, activity feeds | Consistent Prefix | Order matters; slight delay is acceptable |
| Telemetry, analytics, social likes | Eventual | Stale reads are fine; maximize throughput and minimize cost |
| Mixed workload (API + analytics) | Session (with per-request overrides) | Session for user operations; Eventual for analytics queries |
| Gaming leaderboards | Eventual | Approximate scores are acceptable; high write throughput needed |
| E-commerce product catalog | Session | Sellers see their updates; buyers can tolerate brief delay |
Request Units (RU) and Cost Impact
The consistency level you choose directly impacts your Cosmos DB bill because Strong and Bounded Staleness reads consume 2x RUs compared to Session, Consistent Prefix, and Eventual reads. For read-heavy workloads, this cost difference is substantial.
Scenario: E-commerce application
- 10 million reads per day (1 KB average document)
- 1 million writes per day
- Read RU cost per operation: ~1 RU (for 1 KB document)
- Write RU cost per operation: ~6 RU (for 1 KB document)
Strong/Bounded Staleness reads:
Reads: 10M x 2 RU = 20M RU/day
Writes: 1M x 6 RU = 6M RU/day
Total: 26M RU/day = ~300 RU/s average
Session/Consistent Prefix/Eventual reads:
Reads: 10M x 1 RU = 10M RU/day
Writes: 1M x 6 RU = 6M RU/day
Total: 16M RU/day = ~185 RU/s average
Cost difference: ~38% reduction in provisioned RU/s
At $0.008 per 100 RU/s per hour:
Strong: 300 RU/s = ~$0.58/day = ~$17.50/month
Session: 185 RU/s = ~$0.36/day = ~$10.80/month
Savings: ~$6.70/month (38%)
For applications with higher read-to-write ratios, the savings
from Session vs Strong consistency are even more dramatic.
Autoscale Provisioned Throughput
Use autoscale throughput instead of manual provisioned throughput for workloads with variable traffic patterns. Autoscale automatically adjusts between 10% and 100% of your maximum configured RU/s, and you only pay for the throughput actually consumed. This is particularly effective when combined with the lower RU cost of Session or Eventual consistency, as your effective cost scales down proportionally during low-traffic periods.
Key Takeaways
1. Cosmos DB offers five consistency levels: Strong, Bounded Staleness, Session, Consistent Prefix, and Eventual.
2. Session consistency is the default and most popular, guaranteeing read-your-own-writes per session.
3. Stronger consistency reduces availability and increases latency; weaker consistency improves both.
4. Bounded Staleness provides strong consistency guarantees with configurable lag tolerance.
5. Choose consistency per operation based on application requirements, not as a one-size-fits-all global setting.
6. Multi-region writes support every level except Strong; Strong consistency requires a single write region.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.