DynamoDB Data Modeling
Design efficient DynamoDB tables with single-table design, GSIs, and access patterns.
Prerequisites
- Understanding of NoSQL database concepts
- AWS account with DynamoDB access
- Knowledge of access patterns for your application
DynamoDB Is Not a Relational Database
The most critical shift when working with DynamoDB is abandoning relational database thinking. In relational databases, you normalize data into tables, define relationships with foreign keys, and use JOINs at query time to combine data from multiple tables. In DynamoDB, there are no JOINs. There are no foreign keys. There is no query optimizer that figures out the best execution plan. You must model your data around your access patterns, denormalizing and pre-joining data at write time to support efficient reads.
DynamoDB guarantees single-digit millisecond performance at any scale, whether your table has 1 GB or 100 TB of data, whether you do 10 or 10 million requests per second. But this guarantee only holds if your data model is designed correctly. A poorly designed DynamoDB table can be slower, more expensive, and harder to work with than a relational database. The investment you make in upfront data modeling pays dividends in performance, cost, and operational simplicity at scale.
This guide covers the core data modeling techniques that make DynamoDB powerful: primary key design, single-table patterns, Global Secondary Indexes, and advanced patterns for complex access requirements.
Design for Access Patterns First
Before creating a DynamoDB table, write down every access pattern your application needs. Include read and write patterns, expected item sizes, query frequencies, and consistency requirements. Your table design, key schema, and index strategy should all be driven by these access patterns, not by the shape of your entities. The single biggest mistake teams make with DynamoDB is designing the table first and discovering access pattern limitations later.
Primary Key Design
Every DynamoDB table requires a primary key. You choose between two types:
- Simple primary key (partition key only): Used when items are uniquely identified by a single attribute. Good for key-value lookups where you always know the exact key.
- Composite primary key (partition key + sort key): Used when you need to model one-to-many relationships or query items within a partition. The partition key groups related items, and the sort key enables range queries within the group.
The partition key determines which physical partition stores your data. DynamoDB distributes data across partitions based on a hash of the partition key. An even distribution across partitions is critical for performance; a “hot partition” that receives disproportionate traffic can throttle your entire table.
Partition Key Selection Criteria
| Criteria | Good Partition Key | Bad Partition Key |
|---|---|---|
| Cardinality | High (millions of unique values) | Low (few distinct values like status or type) |
| Distribution | Even traffic across values | Skewed (one value gets 90% of traffic) |
| Access pattern | Known at query time | Requires scanning to find |
Composite Key Power
The sort key is what makes DynamoDB data modeling powerful. Within a partition, items are stored sorted by the sort key. This enables efficient range queries using comparison operators: begins_with, between, =, <, <=, >, and >=. By carefully designing your sort key, you can support multiple access patterns from a single table design.
{
  "TableName": "ECommerceApp",
  "KeySchema": [
    { "AttributeName": "PK", "KeyType": "HASH" },
    { "AttributeName": "SK", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "PK", "AttributeType": "S" },
    { "AttributeName": "SK", "AttributeType": "S" },
    { "AttributeName": "GSI1PK", "AttributeType": "S" },
    { "AttributeName": "GSI1SK", "AttributeType": "S" },
    { "AttributeName": "GSI2PK", "AttributeType": "S" },
    { "AttributeName": "GSI2SK", "AttributeType": "S" }
  ],
  "GlobalSecondaryIndexes": [
    {
      "IndexName": "GSI1",
      "KeySchema": [
        { "AttributeName": "GSI1PK", "KeyType": "HASH" },
        { "AttributeName": "GSI1SK", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "ALL" }
    },
    {
      "IndexName": "GSI2",
      "KeySchema": [
        { "AttributeName": "GSI2PK", "KeyType": "HASH" },
        { "AttributeName": "GSI2SK", "KeyType": "RANGE" }
      ],
      "Projection": { "ProjectionType": "KEYS_ONLY" }
    }
  ],
  "BillingMode": "PAY_PER_REQUEST"
}
Single-Table Design
Single-table design is the practice of storing multiple entity types in one DynamoDB table. This is the recommended pattern for most applications because it allows you to fetch related data in a single Query operation, reducing latency and cost compared to making multiple GetItem calls across separate tables.
The technique relies on overloading the partition key and sort key with generic names (PK and SK) and storing different entity types with different key prefixes. The prefix identifies the entity type, and the value after the prefix provides the unique identifier or relationship context.
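Key strings like these are easy to mistype, so many teams centralize prefixing in small helper functions. A minimal sketch (the helper names here are our own, not part of any library):

```python
def user_pk(user_id: str) -> str:
    """Partition key for all items in a user's item collection."""
    return f"USER#{user_id}"

def order_sk(order_id: str) -> str:
    """Sort key for an order stored under its user's partition."""
    return f"ORDER#{order_id}"

def order_item_key(order_id: str, product_id: str) -> dict:
    """Full primary key for a line item in an order's own item collection."""
    return {"PK": f"ORDER#{order_id}", "SK": f"ITEM#{product_id}"}
```

Calls then read naturally, e.g. `table.get_item(Key={'PK': user_pk('u123'), 'SK': 'PROFILE'})`, and the prefix convention lives in exactly one place.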
E-Commerce Example
| PK | SK | Entity | Key Attributes |
|---|---|---|---|
| USER#u123 | PROFILE | User Profile | name, email, createdAt |
| USER#u123 | ORDER#2024-001 | Order (user view) | total, status, orderDate |
| USER#u123 | ORDER#2024-002 | Order (user view) | total, status, orderDate |
| ORDER#2024-001 | META | Order Details | shippingAddr, paymentMethod |
| ORDER#2024-001 | ITEM#prod-a | Order Line Item | quantity, price, productName |
| ORDER#2024-001 | ITEM#prod-b | Order Line Item | quantity, price, productName |
| PRODUCT#prod-a | DETAILS | Product | name, price, category, inventory |
With this design, you can satisfy multiple access patterns with simple, efficient queries:
- Get user profile: GetItem: PK=USER#u123, SK=PROFILE
- List user's orders: Query: PK=USER#u123, SK begins_with “ORDER#”
- Get order with all items: Query: PK=ORDER#2024-001 (returns meta + all items)
- Get user profile + recent orders: Query: PK=USER#u123 (returns profile + all orders in one call)
- Get specific order for user: GetItem: PK=USER#u123, SK=ORDER#2024-001
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('ECommerceApp')

# Access Pattern 1: Get user profile (single item)
profile = table.get_item(
    Key={'PK': 'USER#u123', 'SK': 'PROFILE'}
)['Item']

# Access Pattern 2: List all orders for a user (newest first)
orders = table.query(
    KeyConditionExpression=(
        Key('PK').eq('USER#u123') &
        Key('SK').begins_with('ORDER#')
    ),
    ScanIndexForward=False,  # newest first (reverse sort)
    Limit=10                 # pagination
)['Items']

# Access Pattern 3: Get order with all line items
order_details = table.query(
    KeyConditionExpression=Key('PK').eq('ORDER#2024-001')
)['Items']
# Returns: [order_meta, item_prod_a, item_prod_b]

# Access Pattern 4: Get user profile AND orders in one query
user_data = table.query(
    KeyConditionExpression=Key('PK').eq('USER#u123')
)['Items']
# Returns: [profile, order_1, order_2, ...]
# Separate in application code by SK prefix

# Access Pattern 5: Paginated queries with ExclusiveStartKey
page_1 = table.query(
    KeyConditionExpression=(
        Key('PK').eq('USER#u123') &
        Key('SK').begins_with('ORDER#')
    ),
    Limit=5
)

# If more pages exist:
if 'LastEvaluatedKey' in page_1:
    page_2 = table.query(
        KeyConditionExpression=(
            Key('PK').eq('USER#u123') &
            Key('SK').begins_with('ORDER#')
        ),
        Limit=5,
        ExclusiveStartKey=page_1['LastEvaluatedKey']
    )
When to Use Multiple Tables
Single-table design is not always the best approach. Use separate tables when entities have vastly different access patterns (one needs strong consistency, the other needs eventual), different throughput requirements (one is write-heavy, the other is read-heavy), different TTL needs, or different backup requirements. Microservices should generally own their own tables to maintain service independence. Start with single-table design for related entities within a bounded context.
Global Secondary Indexes (GSIs)
Global Secondary Indexes provide alternate query patterns by projecting your data with a different key schema. Think of a GSI as an automatically maintained, eventually consistent copy of your table with different partition and sort keys. DynamoDB handles the replication from the base table to the GSI automatically.
In single-table design, you often use “overloaded” GSIs where the GSI key attributes hold different values depending on the entity type. This allows a single GSI to support multiple access patterns across different entity types.
GSI Overloading Pattern
| PK | SK | GSI1PK | GSI1SK | Entity |
|---|---|---|---|---|
| USER#u123 | ORDER#2024-001 | STATUS#PENDING | 2024-01-15 | Order |
| USER#u456 | ORDER#2024-002 | STATUS#SHIPPED | 2024-01-14 | Order |
| PRODUCT#prod-a | DETAILS | CATEGORY#electronics | PRODUCT#prod-a | Product |
| USER#u123 | PROFILE | EMAIL#user@example.com | USER#u123 | User |
This overloaded GSI enables additional access patterns:
- List pending orders sorted by date: Query GSI1: GSI1PK=STATUS#PENDING
- List products in a category: Query GSI1: GSI1PK=CATEGORY#electronics
- Look up user by email: Query GSI1: GSI1PK=EMAIL#user@example.com
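In boto3, querying an overloaded GSI differs from a base-table query only by the IndexName parameter. A sketch against the GSI1 schema above (the helper and function names are ours):

```python
def status_gsi1pk(status: str) -> str:
    """Build the overloaded GSI1 partition key for an order status."""
    return f"STATUS#{status.upper()}"

def orders_by_status(table, status, limit=25):
    """Query GSI1 for orders in a given status, newest first by date."""
    # Deferred import so the module can be loaded without boto3 installed
    from boto3.dynamodb.conditions import Key
    return table.query(
        IndexName='GSI1',
        KeyConditionExpression=Key('GSI1PK').eq(status_gsi1pk(status)),
        ScanIndexForward=False,  # GSI1SK holds the order date
        Limit=limit,
    )['Items']
```

Note that GSI queries are always eventually consistent; ConsistentRead is not supported on a Global Secondary Index.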
GSI Design Best Practices
| Practice | Why |
|---|---|
| Use on-demand billing | GSI throughput scales independently; on-demand avoids throttling |
| Project only needed attributes | Reduces GSI storage costs and replication overhead |
| Use sparse indexes | Only items with the GSI key appear; efficient filtered views |
| Monitor GSI throttling | GSI throttling back-pressures base table writes |
| Limit to 5 GSIs maximum | Each GSI adds write cost; more than 5 usually indicates design issues |
GSI Throttling Can Block Writes
GSIs have their own throughput capacity separate from the base table. If a GSI is throttled (in provisioned mode) or exceeds its burst capacity, writes to the base table that would propagate to that throttled GSI are also throttled. This means a poorly designed GSI can slow down your entire application. Always use on-demand billing for tables with GSIs unless you have very predictable and stable traffic. Monitor the ThrottledRequests metric on each GSI independently.
Transactions and Conditional Writes
DynamoDB supports ACID transactions across up to 100 items in a single transaction. Transactions ensure that a group of writes either all succeed or all fail, providing consistency guarantees similar to relational databases for operations that must be atomic.
import time
import boto3

client = boto3.client('dynamodb')

# Transactional write: create order AND update inventory atomically
def place_order(user_id, order_id, product_id, quantity, price):
    client.transact_write_items(
        TransactItems=[
            # Create the order record
            {
                'Put': {
                    'TableName': 'ECommerceApp',
                    'Item': {
                        'PK': {'S': f'USER#{user_id}'},
                        'SK': {'S': f'ORDER#{order_id}'},
                        'total': {'N': str(price * quantity)},
                        'status': {'S': 'PENDING'},
                        'orderDate': {'S': time.strftime('%Y-%m-%d')},
                        'GSI1PK': {'S': 'STATUS#PENDING'},
                        'GSI1SK': {'S': time.strftime('%Y-%m-%d')},
                    },
                    # Ensure order doesn't already exist
                    'ConditionExpression': 'attribute_not_exists(PK)',
                }
            },
            # Create the line item
            {
                'Put': {
                    'TableName': 'ECommerceApp',
                    'Item': {
                        'PK': {'S': f'ORDER#{order_id}'},
                        'SK': {'S': f'ITEM#{product_id}'},
                        'quantity': {'N': str(quantity)},
                        'price': {'N': str(price)},
                    }
                }
            },
            # Decrement inventory (with check)
            {
                'Update': {
                    'TableName': 'ECommerceApp',
                    'Key': {
                        'PK': {'S': f'PRODUCT#{product_id}'},
                        'SK': {'S': 'DETAILS'},
                    },
                    'UpdateExpression': 'SET inventory = inventory - :qty',
                    'ConditionExpression': 'inventory >= :qty',
                    'ExpressionAttributeValues': {
                        ':qty': {'N': str(quantity)},
                    },
                }
            }
        ]
    )
    # If the inventory check fails, the entire transaction rolls back:
    # no order is created and no inventory is decremented
Conditional Writes for Optimistic Locking
Conditional writes allow you to specify conditions that must be true for a write to succeed. This is the DynamoDB equivalent of optimistic locking and is essential for preventing lost updates in concurrent environments.
from boto3.dynamodb.conditions import Attr

class ConcurrentModificationError(Exception):
    """Raised when a conditional write loses a race with another writer."""

# Optimistic locking with a version number
def update_product_price(product_id, new_price, expected_version):
    try:
        table.update_item(
            Key={'PK': f'PRODUCT#{product_id}', 'SK': 'DETAILS'},
            UpdateExpression='SET price = :price, version = :new_ver',
            ConditionExpression=Attr('version').eq(expected_version),
            ExpressionAttributeValues={
                ':price': new_price,
                ':new_ver': expected_version + 1,
            },
            ReturnValues='ALL_NEW',
        )
    except table.meta.client.exceptions.ConditionalCheckFailedException:
        # Another process modified the item; re-read and retry
        raise ConcurrentModificationError(
            f'Product {product_id} was modified by another process'
        )
Advanced Patterns
Several advanced patterns extend the power of DynamoDB data modeling for complex access requirements that cannot be met with basic primary key and GSI designs.
Sparse Indexes
Sparse indexes take advantage of the fact that only items with the GSI key attributes appear in the index. If you only populate the GSI key on certain items, you create an efficient filtered view. For example, adding a GSI2PK attribute only to orders with status “FLAGGED” creates a GSI that contains only flagged orders, making them efficient to query without scanning all orders.
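A sketch of the corresponding flag/unflag writes under the table layout above (the function names are ours; GSI2 is the KEYS_ONLY index from the earlier table definition):

```python
def flag_order(table, user_id, order_id, order_date):
    """Adding the GSI2 key attributes makes this order appear in the
    sparse index; unflagged orders never carry GSI2PK/GSI2SK at all."""
    table.update_item(
        Key={"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"},
        UpdateExpression="SET GSI2PK = :pk, GSI2SK = :sk",
        ExpressionAttributeValues={":pk": "FLAGGED", ":sk": order_date},
    )

def unflag_order(table, user_id, order_id):
    """Removing the attributes drops the item out of the sparse index."""
    table.update_item(
        Key={"PK": f"USER#{user_id}", "SK": f"ORDER#{order_id}"},
        UpdateExpression="REMOVE GSI2PK, GSI2SK",
    )
```

Querying GSI2 then returns only flagged orders, with no filter expression and no wasted read capacity on the unflagged majority.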
Composite Sort Keys
Combine multiple attributes in the sort key to enable multi-condition queries:
# Composite sort key: STATUS#date enables filtering by both
# SK = "STATUS#SHIPPED#2024-01-15"
# SK = "STATUS#PENDING#2024-01-16"
# SK = "STATUS#DELIVERED#2024-01-10"

# Query all shipped orders in January 2024
shipped_jan = table.query(
    KeyConditionExpression=(
        Key('PK').eq('ORDERS_BY_STATUS') &
        Key('SK').between('STATUS#SHIPPED#2024-01-01', 'STATUS#SHIPPED#2024-01-31')
    )
)

# Query all pending orders (any date)
all_pending = table.query(
    KeyConditionExpression=(
        Key('PK').eq('ORDERS_BY_STATUS') &
        Key('SK').begins_with('STATUS#PENDING#')
    )
)
Write Sharding for Hot Partitions
For high-write partition keys (like counters or aggregates), append a random suffix to distribute writes across multiple partitions, then use scatter-gather reads to aggregate the results.
import random

SHARD_COUNT = 10

def increment_page_view(page_id):
    """Distribute writes across 10 shards to avoid a hot partition."""
    shard = random.randint(0, SHARD_COUNT - 1)
    table.update_item(
        Key={
            'PK': f'PAGEVIEW#{page_id}',
            'SK': f'SHARD#{shard}',
        },
        UpdateExpression='ADD view_count :inc',
        ExpressionAttributeValues={':inc': 1},
    )

def get_page_views(page_id):
    """Scatter-gather read across all shards."""
    result = table.query(
        KeyConditionExpression=(
            Key('PK').eq(f'PAGEVIEW#{page_id}') &
            Key('SK').begins_with('SHARD#')
        )
    )
    return sum(item.get('view_count', 0) for item in result['Items'])
TTL for Temporal Data
DynamoDB TTL automatically deletes items when a specified Unix timestamp attribute expires. This is essential for managing temporal data like sessions, temporary tokens, event logs, and cache entries without manual cleanup.
import time

# Create a session with automatic 1-hour expiration
table.put_item(
    Item={
        'PK': 'SESSION#abc123',
        'SK': 'METADATA',
        'userId': 'USER#u123',
        'createdAt': int(time.time()),
        'lastActivity': int(time.time()),
        'ttl': int(time.time()) + 3600,  # expires in 1 hour
    }
)

# Create a temporary invite with 7-day expiration
table.put_item(
    Item={
        'PK': 'INVITE#team-alpha',
        'SK': 'USER#u456',
        'invitedBy': 'USER#u123',
        'role': 'member',
        'ttl': int(time.time()) + (7 * 86400),  # 7 days
    }
)

# Note: DynamoDB TTL deletion is eventually consistent.
# Items may persist up to 48 hours after expiration.
# Always filter expired items in application code:
#   if item['ttl'] < current_time: skip
TTL Deletions Are Free
DynamoDB does not charge for TTL deletions. This makes TTL an excellent mechanism for managing data lifecycle costs. Use it for session data, event logs, temporary tokens, and any data that loses value after a period of time. Note that TTL deletions are eventually consistent; items may appear in queries for up to 48 hours after their TTL expires. Always include a filter in your application code to exclude expired items.
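Because expired items can linger for up to 48 hours, reads should treat an expired-but-undeleted item as already gone. A minimal sketch (`is_expired` and `get_session` are our names):

```python
import time

def is_expired(item, now=None):
    """True if the item's TTL has passed; items without a ttl never expire."""
    now = int(time.time()) if now is None else now
    return "ttl" in item and int(item["ttl"]) < now

def get_session(table, session_id):
    """Fetch a session, treating expired-but-undeleted items as missing."""
    resp = table.get_item(Key={"PK": f"SESSION#{session_id}", "SK": "METADATA"})
    item = resp.get("Item")
    if item is None or is_expired(item):
        return None
    return item
```

Centralizing the check in one helper keeps every read path honest about expiry, regardless of when DynamoDB's background deletion actually runs.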
Capacity Planning and Cost Optimization
DynamoDB offers two billing modes: on-demand (pay-per-request) and provisioned (pre-allocated throughput with optional auto-scaling). Choosing the right mode significantly impacts your costs. The prices below are illustrative list prices; they vary by region and change over time, so always check current AWS pricing.
| Feature | On-Demand | Provisioned |
|---|---|---|
| Cost per write (1 KB) | $1.25 per million | ~$0.65 per million (with auto-scaling) |
| Cost per read (4 KB) | $0.25 per million | ~$0.13 per million (with auto-scaling) |
| Burst handling | Automatic (up to 2x previous peak) | Burst credits (300 seconds of full capacity) |
| Scaling speed | Instant (within limits) | Minutes (auto-scaling reacts to metrics) |
| Best for | Unpredictable, spiky, or new workloads | Stable, predictable workloads (40%+ cheaper) |
# Register table for auto-scaling (provisioned mode)
aws application-autoscaling register-scalable-target \
--service-namespace dynamodb \
--resource-id "table/ECommerceApp" \
--scalable-dimension "dynamodb:table:WriteCapacityUnits" \
--min-capacity 5 \
--max-capacity 1000
# Set target tracking policy (target 70% utilization)
aws application-autoscaling put-scaling-policy \
--service-namespace dynamodb \
--resource-id "table/ECommerceApp" \
--scalable-dimension "dynamodb:table:WriteCapacityUnits" \
--policy-name "WriteAutoScaling" \
--policy-type "TargetTrackingScaling" \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
},
"ScaleInCooldown": 60,
"ScaleOutCooldown": 60
}'
# Check current consumed capacity
aws dynamodb describe-table \
--table-name ECommerceApp \
--query 'Table.{
Status: TableStatus,
ItemCount: ItemCount,
Size: TableSizeBytes,
BillingMode: BillingModeSummary.BillingMode,
ReadCapacity: ProvisionedThroughput.ReadCapacityUnits,
WriteCapacity: ProvisionedThroughput.WriteCapacityUnits
}'
DynamoDB Streams and Change Data Capture
DynamoDB Streams captures a time-ordered sequence of item-level modifications in a table and stores them for 24 hours. This enables event-driven architectures where changes to your DynamoDB data trigger downstream processing, such as updating search indexes, sending notifications, maintaining aggregates, or replicating data to other systems.
from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.data_classes import DynamoDBStreamEvent
from aws_lambda_powertools.utilities.data_classes.dynamo_db_stream_event import (
    DynamoDBRecord
)

logger = Logger()
tracer = Tracer()

@tracer.capture_lambda_handler
@logger.inject_lambda_context
def handler(event_dict, context):
    event = DynamoDBStreamEvent(event_dict)
    for record in event.records:
        process_record(record)

def process_record(record: DynamoDBRecord):
    # Powertools v2+ deserializes stream images into plain Python values,
    # so attributes are accessed directly rather than via {'S': ...} wrappers
    if record.event_name == 'INSERT':
        new_item = record.dynamodb.new_image
        sk = new_item.get('SK', '')
        if sk.startswith('ORDER#'):
            # New order created - send confirmation email
            # (send_order_confirmation is application code, not shown here)
            logger.info('New order', order_pk=new_item['PK'])
            send_order_confirmation(new_item)
    elif record.event_name == 'MODIFY':
        old_item = record.dynamodb.old_image
        new_item = record.dynamodb.new_image
        old_status = old_item.get('status')
        new_status = new_item.get('status')
        if old_status != new_status:
            logger.info(
                'Status changed',
                old_status=old_status,
                new_status=new_status,
            )
            # handle_status_change is application code, not shown here
            handle_status_change(new_item, old_status, new_status)
Streams for Search Integration
A common pattern is to use DynamoDB Streams with Lambda to sync data to OpenSearch for full-text search. DynamoDB handles the primary data storage with fast key-value access, while OpenSearch provides the search capabilities that DynamoDB lacks. This gives you the best of both worlds: DynamoDB's performance and scalability for transactional operations, and OpenSearch's query flexibility for search and analytics.
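A hedged sketch of such a sync Lambda: flatten the stream image into a plain document and index it under a deterministic ID. The index name, ID scheme, and helper names are our assumptions; `client` is assumed to be an opensearch-py OpenSearch client.

```python
def image_to_doc(image):
    """Flatten a DynamoDB stream image ({'name': {'S': 'x'}, ...})
    into a plain dict suitable for an OpenSearch document."""
    doc = {}
    for attr, typed in image.items():
        (dynamo_type, raw), = typed.items()
        if dynamo_type == "S":
            doc[attr] = raw
        elif dynamo_type == "N":
            doc[attr] = float(raw)
        elif dynamo_type == "BOOL":
            doc[attr] = raw
        # other DynamoDB types (L, M, SS, ...) omitted in this sketch
    return doc

def sync_record(client, record):
    """Index INSERT/MODIFY images; delete the document on REMOVE."""
    keys = record["dynamodb"]["Keys"]
    doc_id = f'{keys["PK"]["S"]}|{keys["SK"]["S"]}'  # deterministic doc ID
    if record["eventName"] in ("INSERT", "MODIFY"):
        client.index(index="ecommerce", id=doc_id,
                     body=image_to_doc(record["dynamodb"]["NewImage"]))
    else:  # REMOVE
        client.delete(index="ecommerce", id=doc_id)
```

Deriving the document ID from PK and SK makes the sync idempotent: replayed stream records simply overwrite the same document.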
Backup, Restore, and Global Tables
DynamoDB provides multiple mechanisms for data protection and multi-region availability.
Point-in-Time Recovery (PITR)
PITR provides continuous backups of your table data, allowing you to restore to any point in time within the last 35 days. Unlike on-demand backups, PITR requires no scheduling and provides second-level granularity. Enable PITR on all production tables.
Global Tables
DynamoDB Global Tables provide multi-region, multi-active replication. Writes to any region are automatically replicated to all other regions within seconds. This enables low-latency access for globally distributed applications and provides automatic failover if a region becomes unavailable.
# Create a Global Table (table must already exist in primary region)
# First, enable DynamoDB Streams
aws dynamodb update-table \
--table-name ECommerceApp \
--stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES
# Create replica in another region
aws dynamodb update-table \
--table-name ECommerceApp \
--replica-updates '[
{
"Create": {
"RegionName": "eu-west-1"
}
}
]'
# Enable PITR on the table (recommended for all production tables)
aws dynamodb update-continuous-backups \
--table-name ECommerceApp \
--point-in-time-recovery-specification PointInTimeRecoveryEnabled=true
Common Modeling Mistakes
Avoid these common DynamoDB data modeling mistakes that lead to poor performance, high costs, or architectural dead ends:
- Designing the table before defining access patterns. Always define access patterns first. The table schema should be derived from the access patterns, not the other way around.
- Using Scan operations in production. Scan reads every item in the table and is extremely expensive. If you find yourself needing Scan, your data model is missing an index.
- Hot partition keys. Using a low-cardinality attribute (like “status” or “date”) as a partition key concentrates traffic on a few partitions. Use high-cardinality attributes like user ID, device ID, or order ID.
- Large items. DynamoDB has a 400 KB item size limit and charges by the KB. Store large objects in S3 and put the S3 key in the DynamoDB item.
- Too many GSIs. Each GSI duplicates writes. Five GSIs means six writes for every base table write. Redesign your single-table layout before adding more GSIs.
- Not using batch operations. Use BatchGetItem and BatchWriteItem to reduce the number of API calls. Each batch can include up to 25 items (write) or 100 items (read).
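With the boto3 resource API, batch_writer handles the 25-item chunking and unprocessed-item retries for you; for reads you chunk the keys yourself to respect BatchGetItem's 100-key limit. A sketch (`bulk_load` and `chunk` are our names):

```python
def chunk(seq, size):
    """Split a list into pieces of at most `size` items
    (BatchGetItem accepts up to 100 keys, BatchWriteItem up to 25 items)."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def bulk_load(table, items):
    """Write many items; batch_writer batches the PutItem calls into
    BatchWriteItem requests and retries unprocessed items automatically."""
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
```

For bulk reads, pass each 100-key chunk to `batch_get_item` and merge the responses, re-requesting any `UnprocessedKeys` the service returns.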
Key Takeaways
Define all access patterns before designing your table. Use composite primary keys for one-to-many relationships. Single-table design reduces query latency by co-locating related data in the same partition. Use GSIs for alternate access patterns, but limit to 5 or fewer. Always use on-demand billing for unpredictable workloads and provisioned with auto-scaling for stable workloads (40% cheaper). Use transactions for operations that must be atomic across items. Leverage TTL for automatic cleanup of temporal data (free). Enable PITR on all production tables. DynamoDB rewards upfront design investment with consistent single-digit millisecond performance at any scale.
1. Design DynamoDB tables around access patterns, not entity relationships.
2. Single-table design reduces the number of tables and enables transactional operations.
3. Partition keys determine data distribution, so choose high-cardinality keys.
4. GSIs enable additional query patterns but consume separate throughput.
5. Use sparse indexes and composite sort keys for efficient querying.
6. On-demand capacity is best for unpredictable traffic; provisioned with auto-scaling for steady workloads.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.