ElastiCache: Redis vs Memcached
Choose between Redis and Memcached on AWS ElastiCache, covering caching strategies, cluster architecture, replication, and performance tuning.
Prerequisites
- Basic understanding of caching concepts
- Familiarity with AWS VPC networking
- Experience with application development in any language
Introduction to ElastiCache
Amazon ElastiCache is a fully managed in-memory data store and cache service that supports two popular engines: Redis and Memcached. In-memory caching is one of the most effective strategies for improving application performance. Instead of hitting your database for every request, you store frequently accessed data in memory where it can be retrieved in sub-millisecond time. ElastiCache handles the operational complexity of deploying, managing, and scaling in-memory caches, including patching, monitoring, failover, and backup.
Caching addresses a fundamental tension in application architecture: databases are optimized for durability and complex queries, but they are slow relative to memory access. A typical DynamoDB read takes 1-10 milliseconds, a PostgreSQL query takes 5-100 milliseconds, and a cross-region API call takes 50-300 milliseconds. An ElastiCache read takes 0.1-0.5 milliseconds, 10-1000x faster. For read-heavy workloads (which most applications are), adding a caching layer can dramatically reduce database load, lower latency, and save on database costs.
ElastiCache runs inside your VPC, providing network-level isolation. It integrates with CloudWatch for monitoring, SNS for notifications, and supports encryption at rest and in transit. Both Redis and Memcached support nodes ranging from small (cache.t3.micro) to very large (cache.r7g.16xlarge with 419 GB of memory).
ElastiCache Serverless
In late 2023, AWS launched ElastiCache Serverless for both Redis and Memcached. Serverless ElastiCache automatically scales capacity based on demand, eliminating the need to choose node types, manage clusters, or plan for capacity. You pay per GB of data stored and per ElastiCache Processing Unit (ECPU) consumed. This is the recommended starting point for new workloads unless you need fine-grained control over node placement or specific Redis configurations not supported in serverless mode.
Redis vs Memcached Comparison
Redis and Memcached are both in-memory key-value stores, but they have fundamentally different architectures and capabilities. Redis is a feature-rich data structure store that supports persistence, replication, pub/sub, Lua scripting, and complex data types. Memcached is a simpler, multi-threaded caching layer designed purely for caching with minimal overhead.
| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, sorted sets, streams, HyperLogLog, bitmaps, geospatial | Strings only (key-value) |
| Persistence | Yes (RDB snapshots, AOF) | No (data lost on restart) |
| Replication | Yes (primary-replica) | No |
| High availability | Yes (Multi-AZ with auto-failover) | No (client-side consistent hashing) |
| Cluster mode | Yes (data sharding across nodes) | Yes (client-side partitioning) |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Transactions | Yes (MULTI/EXEC) | No (CAS operations only) |
| Thread model | Single-threaded (I/O threads in Redis 6+) | Multi-threaded |
| Max item size | 512 MB | 1 MB (configurable to 128 MB) |
| Backup/restore | Yes (snapshots to S3) | No |
| Encryption | At rest & in transit | In transit only (TLS) |
| Ideal for | Complex caching, sessions, queues, leaderboards, real-time analytics | Simple caching with flat key-value data |
Choose Redis Unless You Have a Specific Reason Not To
Redis is the recommended choice for almost all use cases. It offers everything Memcached does plus persistence, replication, complex data structures, and high availability. Memcached still has a niche for very simple caching workloads where multi-threading provides a performance advantage on large nodes, or when you need the absolute simplest cache with no operational complexity. For new projects, start with Redis.
ElastiCache Cluster Architecture
ElastiCache Redis supports two deployment modes: cluster mode disabled (a single shard with one primary and up to 5 replicas) and cluster mode enabled (up to 500 shards, each with one primary and up to 5 replicas, within an overall limit of 500 nodes per cluster). Cluster mode provides horizontal scaling by partitioning data across shards using hash slots.
Cluster Mode Disabled
In cluster mode disabled, all data resides on a single primary node (shard). Replicas are read-only copies that provide high availability (automatic failover if the primary fails) and read scaling (your application can direct reads to replicas). This mode is simpler to manage and is sufficient when your dataset fits in a single node's memory (up to ~400 GB with r7g.16xlarge).
Cluster Mode Enabled
In cluster mode enabled, data is partitioned across multiple shards using 16,384 hash slots. Each key is assigned to a hash slot (via CRC16), and each shard owns a range of hash slots. This provides horizontal scaling: you can distribute data and write load across many shards. Online resharding lets you add or remove shards without downtime.
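Cluster-aware clients compute this key-to-slot mapping on every command. As a sketch (assuming ASCII keys; function names are illustrative), the slot calculation is a CRC16 (XModem variant, polynomial 0x1021) modulo 16,384, with special handling for Redis hash tags:

```typescript
// Compute the Redis Cluster hash slot for a key, the way a cluster
// client does internally: CRC16 (XModem variant) mod 16384.
function crc16(data: string): number {
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc ^= data.charCodeAt(i) << 8; // ASCII keys assumed
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

function hashSlot(key: string): number {
  // Hash tags: if the key contains a non-empty {...} section, only that
  // substring is hashed, so related keys can be forced onto the same
  // shard (e.g. user:{123}:cart and user:{123}:profile).
  const open = key.indexOf("{");
  if (open !== -1) {
    const close = key.indexOf("}", open + 1);
    if (close !== -1 && close > open + 1) {
      key = key.substring(open + 1, close);
    }
  }
  return crc16(key) % 16384;
}
```

Hash tags matter in practice because multi-key operations (MGET, transactions, Lua scripts) only work in cluster mode when all keys live in the same slot.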
# Create a Redis replication group, cluster mode disabled (single shard, 2 replicas)
aws elasticache create-replication-group \
--replication-group-id my-redis-cluster \
--replication-group-description "Production Redis cache" \
--engine redis \
--engine-version 7.1 \
--cache-node-type cache.r7g.large \
--num-cache-clusters 3 \
--multi-az-enabled \
--automatic-failover-enabled \
--cache-subnet-group-name my-cache-subnet-group \
--security-group-ids sg-0123456789abcdef0 \
--at-rest-encryption-enabled \
--transit-encryption-enabled \
--auth-token "YourStr0ngRedisP@ssword" \
--snapshot-retention-limit 7 \
--snapshot-window "03:00-04:00" \
--preferred-maintenance-window "sun:05:00-sun:06:00"
# Create a Redis replication group, cluster mode enabled (3 shards, 1 replica each)
aws elasticache create-replication-group \
--replication-group-id my-redis-sharded \
--replication-group-description "Sharded Redis cache" \
--engine redis \
--engine-version 7.1 \
--cache-node-type cache.r7g.large \
--num-node-groups 3 \
--replicas-per-node-group 1 \
--multi-az-enabled \
--automatic-failover-enabled \
--cache-subnet-group-name my-cache-subnet-group \
--security-group-ids sg-0123456789abcdef0 \
--at-rest-encryption-enabled \
--transit-encryption-enabled
# Create an ElastiCache Serverless cache
aws elasticache create-serverless-cache \
--serverless-cache-name my-serverless-redis \
--engine redis \
--cache-usage-limits '{
"DataStorage": {"Maximum": 10, "Unit": "GB"},
"ECPUPerSecond": {"Maximum": 15000}
}' \
--subnet-ids subnet-abc123 subnet-def456 \
--security-group-ids sg-0123456789abcdef0
Caching Strategies & Patterns
Choosing the right caching strategy is as important as choosing the right cache engine. Different strategies trade off between consistency, performance, and complexity. The right choice depends on your data's read/write ratio, tolerance for stale data, and consistency requirements.
Lazy Loading (Cache-Aside)
The most common caching pattern. The application checks the cache first. On a cache hit, it returns the cached data. On a cache miss, it reads from the database, writes the result to the cache, and returns it. Data is only loaded into the cache when requested, so the cache naturally fills with the most frequently accessed data.
import Redis from "ioredis";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
const redis = new Redis({
host: "my-redis-cluster.xxxxx.use1.cache.amazonaws.com",
port: 6379,
tls: {}, // Enable TLS for in-transit encryption
});
const dynamodb = new DynamoDBClient({ region: "us-east-1" });
// Lazy loading (cache-aside) pattern
async function getProduct(productId: string) {
const cacheKey = `product:${productId}`;
// 1. Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
console.log("Cache HIT:", productId);
return JSON.parse(cached);
}
// 2. Cache miss - read from database
console.log("Cache MISS:", productId);
const result = await dynamodb.send(new GetItemCommand({
TableName: "Products",
Key: { productId: { S: productId } },
}));
if (!result.Item) return null;
const product = {
productId: result.Item.productId.S,
name: result.Item.name.S,
price: parseFloat(result.Item.price.N!),
category: result.Item.category.S,
};
// 3. Write to cache with TTL (5 minutes)
await redis.setex(cacheKey, 300, JSON.stringify(product));
return product;
}
Write-Through
With write-through, every write goes to both the cache and the database simultaneously. This ensures the cache always has the latest data, eliminating stale reads. However, it adds latency to every write operation (two writes instead of one) and can fill the cache with data that may never be read. Use TTL expiration to evict unused data.
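A minimal sketch of write-through, using small cache/store interfaces as stand-ins for an ioredis client and a database client (names and the TTL value are illustrative, not part of any library API):

```typescript
// Write-through: every write goes to the database and then the cache,
// so a read immediately after a successful write sees fresh data.
interface Cache {
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}
interface Store {
  put(key: string, value: string): Promise<void>;
}

const PRODUCT_TTL = 300; // evict never-read entries after 5 minutes

async function saveProduct(
  db: Store,
  cache: Cache,
  productId: string,
  product: { name: string; price: number },
): Promise<void> {
  const serialized = JSON.stringify(product);
  await db.put(productId, serialized); // 1. durable write first
  await cache.set(`product:${productId}`, serialized, PRODUCT_TTL); // 2. then cache
}
```

Writing the database first means a cache failure leaves you with a stale-or-missing cache entry rather than cached data the database never received.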
Write-Behind (Write-Back)
Write-behind writes to the cache immediately and asynchronously flushes changes to the database in the background (often in batches). This provides the lowest write latency but introduces a risk of data loss if the cache fails before the data is persisted. This pattern is best for write-heavy workloads where some data loss is acceptable (e.g., analytics counters, session activity).
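The buffering-and-flush mechanics can be sketched as follows (a Map stands in for Redis, and the class and method names are illustrative; a production implementation would add retry and failure handling):

```typescript
// Write-behind: writes land in the cache immediately and are flushed
// to the database later in batches.
interface Store {
  putBatch(entries: Array<[string, string]>): Promise<void>;
}

class WriteBehindCache {
  private cache = new Map<string, string>();
  private dirty = new Map<string, string>(); // pending DB writes

  constructor(private db: Store) {}

  write(key: string, value: string): void {
    this.cache.set(key, value); // immediate, low-latency write
    this.dirty.set(key, value); // queued for async persistence
  }

  read(key: string): string | undefined {
    return this.cache.get(key);
  }

  // Called on a timer (e.g. setInterval) in a real system. If the
  // process dies before flush() runs, the dirty entries are lost;
  // that loss window is the inherent trade-off of this pattern.
  async flush(): Promise<number> {
    const batch = [...this.dirty.entries()];
    this.dirty.clear();
    if (batch.length > 0) await this.db.putBatch(batch);
    return batch.length;
  }
}
```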
Caching Strategy Comparison
| Strategy | Read Latency | Write Latency | Consistency | Best For |
|---|---|---|---|---|
| Lazy Loading | Fast (after first read) | Normal (DB only) | Eventual (stale until TTL) | Read-heavy, tolerance for stale data |
| Write-Through | Fast (always cached) | Slower (cache + DB) | Strong | Read-heavy, consistency required |
| Write-Behind | Fast (always cached) | Fastest (cache only) | Eventual (async DB write) | Write-heavy, some data loss OK |
| Read-Through | Fast (after first read) | Normal (DB only) | Eventual (stale until TTL) | Simplified app code with cache provider |
Cache Invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton). TTL-based expiration is the simplest and most reliable invalidation strategy. Event-driven invalidation (delete cache keys when the source data changes) provides better consistency but adds complexity. Avoid trying to keep the cache perfectly in sync with the database. It is usually better to accept brief staleness with a short TTL than to build a fragile invalidation system.
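Event-driven invalidation can be sketched like this, with small interfaces standing in for a Redis client and a database client (names are illustrative):

```typescript
// Event-driven invalidation: after a successful database update, delete
// the corresponding cache key so the next cache-aside read repopulates it.
interface Cache {
  del(key: string): Promise<void>;
}
interface Store {
  put(key: string, value: string): Promise<void>;
}

async function updateProduct(
  db: Store,
  cache: Cache,
  productId: string,
  product: { name: string; price: number },
): Promise<void> {
  // 1. Write the source of truth first.
  await db.put(productId, JSON.stringify(product));
  // 2. Delete (not overwrite) the cache entry; deleting avoids racing
  //    concurrent writers over which version lands in the cache.
  await cache.del(`product:${productId}`);
}
```

Deleting rather than updating the cached value is a deliberate choice: two concurrent updates that each SET the cache could leave the older value cached, while DEL simply forces a fresh read.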
Session Management with ElastiCache
Storing user sessions in ElastiCache is one of the most common use cases for Redis. Instead of storing sessions on individual web servers (which breaks when you scale to multiple servers), you store sessions in a centralized Redis instance that all servers can access. This enables stateless web servers that can be added, removed, or replaced without losing user sessions.
import Redis from "ioredis";
import crypto from "crypto";
const redis = new Redis({
host: "my-redis-cluster.xxxxx.use1.cache.amazonaws.com",
port: 6379,
tls: {},
});
interface Session {
userId: string;
email: string;
role: string;
loginTime: string;
lastActivity: string;
cart?: Array<{ productId: string; quantity: number }>;
}
const SESSION_TTL = 1800; // 30 minutes
// Create a new session
async function createSession(userData: Omit<Session, "loginTime" | "lastActivity">): Promise<string> {
const sessionId = crypto.randomUUID();
const session: Session = {
...userData,
loginTime: new Date().toISOString(),
lastActivity: new Date().toISOString(),
};
// Use Redis hash for structured session data
await redis.hmset(`session:${sessionId}`, {
userId: session.userId,
email: session.email,
role: session.role,
loginTime: session.loginTime,
lastActivity: session.lastActivity,
cart: JSON.stringify(session.cart || []),
});
// Set TTL - session expires after 30 minutes of inactivity
await redis.expire(`session:${sessionId}`, SESSION_TTL);
return sessionId;
}
// Retrieve and refresh a session
async function getSession(sessionId: string): Promise<Session | null> {
const data = await redis.hgetall(`session:${sessionId}`);
if (!data || Object.keys(data).length === 0) {
return null; // Session expired or does not exist
}
// Refresh TTL on every access (sliding expiration)
await redis.expire(`session:${sessionId}`, SESSION_TTL);
// Update last activity
await redis.hset(`session:${sessionId}`, "lastActivity", new Date().toISOString());
return {
userId: data.userId,
email: data.email,
role: data.role,
loginTime: data.loginTime,
lastActivity: data.lastActivity,
cart: data.cart ? JSON.parse(data.cart) : [],
};
}
// Destroy a session (logout)
async function destroySession(sessionId: string): Promise<void> {
await redis.del(`session:${sessionId}`);
}
Session Storage with Redis Hash
Using Redis hashes (HMSET/HGETALL) for sessions is more efficient than serializing the entire session as a JSON string. Hashes let you read or update individual fields without fetching and rewriting the entire session. For example, updating the shopping cart requires only HSET session:abc cart "[...]" rather than fetching the full session, parsing JSON, updating, re-serializing, and writing back.
Data Modeling for Redis
Redis is not just a key-value store. It supports rich data structures that enable sophisticated use cases beyond simple caching. Understanding when to use each data structure is key to getting the most out of Redis.
Common Redis Data Patterns
| Data Structure | Redis Type | Use Case | Key Example |
|---|---|---|---|
| Simple cache | String (GET/SET) | Cache API responses, HTML fragments | cache:api:/products/123 |
| Structured objects | Hash (HSET/HGET) | User profiles, sessions, configurations | user:12345 |
| Leaderboards | Sorted Set (ZADD/ZRANGE) | Gaming scores, trending items, rankings | leaderboard:daily |
| Activity feeds | List (LPUSH/LRANGE) | Recent notifications, activity logs | feed:user:12345 |
| Unique visitors | HyperLogLog (PFADD/PFCOUNT) | Cardinality estimation (unique counts) | visitors:2025-01-15 |
| Tags/relationships | Set (SADD/SMEMBERS) | Tags, followers, mutual friends | tags:product:123 |
| Rate limiting | String (INCR) + TTL | API rate limiting per user/IP | ratelimit:api:user:12345 |
| Event streaming | Stream (XADD/XREAD) | Message queues, event logs | events:orders |
import Redis from "ioredis";
import crypto from "crypto"; // used by acquireLock below
const redis = new Redis({ host: "my-redis.cache.amazonaws.com", port: 6379, tls: {} });
// --- Leaderboard with Sorted Sets ---
async function updateLeaderboard(userId: string, score: number) {
await redis.zadd("leaderboard:daily", score, userId);
}
async function getTopPlayers(count: number = 10) {
// ZREVRANGE returns highest scores first
return redis.zrevrange("leaderboard:daily", 0, count - 1, "WITHSCORES");
}
async function getPlayerRank(userId: string) {
// ZREVRANK returns 0-based rank (0 = highest score)
const rank = await redis.zrevrank("leaderboard:daily", userId);
return rank !== null ? rank + 1 : null;
}
// --- Rate Limiting with Sliding Window ---
async function checkRateLimit(userId: string, maxRequests: number, windowSeconds: number): Promise<boolean> {
const key = `ratelimit:${userId}`;
const now = Date.now();
const windowStart = now - (windowSeconds * 1000);
// Use a sorted set with timestamps as scores
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, windowStart); // Remove old entries
pipeline.zadd(key, now, `${now}-${Math.random()}`); // Add current request
pipeline.zcard(key); // Count requests in window
pipeline.expire(key, windowSeconds); // Set TTL for cleanup
const results = await pipeline.exec();
const requestCount = results?.[2]?.[1] as number;
return requestCount <= maxRequests;
}
// --- Distributed Lock ---
async function acquireLock(lockName: string, ttlMs: number): Promise<string | null> {
const lockId = crypto.randomUUID();
const result = await redis.set(
`lock:${lockName}`,
lockId,
"PX", ttlMs, // Expire in milliseconds
"NX" // Only set if not exists
);
return result === "OK" ? lockId : null;
}
async function releaseLock(lockName: string, lockId: string): Promise<boolean> {
// Lua script ensures atomic check-and-delete
const script = `
if redis.call("GET", KEYS[1]) == ARGV[1] then
return redis.call("DEL", KEYS[1])
else
return 0
end
`;
const result = await redis.eval(script, 1, `lock:${lockName}`, lockId);
return result === 1;
}
Replication & High Availability
ElastiCache Redis supports automatic replication with Multi-AZ failover. In this setup, a primary node handles all writes and replicates data asynchronously to one or more replica nodes in different Availability Zones. If the primary fails, ElastiCache automatically promotes a replica to primary, typically within 30-60 seconds.
Replication provides three benefits: high availability (automatic failover if the primary fails), read scaling (replicas can serve read requests, distributing read load), and data durability (data exists on multiple nodes in different AZs).
Failover Behavior
When Multi-AZ is enabled and the primary node fails, ElastiCache performs the following steps: (1) detects the failure via health checks, (2) selects a replica with the least replication lag, (3) promotes it to primary, (4) updates the DNS endpoint to point to the new primary, and (5) the old primary is replaced with a new replica when it recovers. The entire process typically takes 30-60 seconds. During failover, writes are unavailable but reads from replicas continue to work.
Replication Lag and Data Loss
Redis replication is asynchronous, so there is always a small lag between a write on the primary and its replication to replicas. During failover, any writes that were not yet replicated to the promoted replica are lost. For most caching use cases, this is acceptable because the cache can be rebuilt from the source database. For use cases where data loss is not acceptable (e.g., using Redis as a primary data store), consider using Redis with AOF persistence and accepting the higher write latency.
# Test failover behavior
aws elasticache test-failover \
--replication-group-id my-redis-cluster \
--node-group-id 0001
# View replication group details
aws elasticache describe-replication-groups \
--replication-group-id my-redis-cluster \
--query 'ReplicationGroups[0].{
Status: Status,
ClusterEnabled: ClusterEnabled,
MultiAZ: MultiAZ,
NodeGroups: NodeGroups[].{
NodeGroupId: NodeGroupId,
Status: Status,
Primary: PrimaryEndpoint.Address,
Reader: ReaderEndpoint.Address,
Members: NodeGroupMembers[].{
CacheClusterId: CacheClusterId,
CurrentRole: CurrentRole,
PreferredAZ: PreferredAvailabilityZone
}
}
}'
Performance Tuning & Monitoring
ElastiCache performance tuning involves choosing the right node type, configuring parameters correctly, and monitoring key metrics to identify bottlenecks before they impact your application.
Key Metrics to Monitor
| Metric | Warning Threshold | What It Means |
|---|---|---|
| CPUUtilization | > 65% (non-cluster), > 90% (cluster) | CPU is saturated; scale up or out |
| EngineCPUUtilization | > 80% | Redis engine thread is busy (single-threaded bottleneck) |
| DatabaseMemoryUsagePercentage | > 80% | Approaching memory limit; eviction may occur |
| CacheHitRate | < 80% | Low hit rate means cache is not effective |
| Evictions | > 0 (unexpected) | Cache is full and evicting data; increase memory |
| CurrConnections | Approaching maxclients | Connection exhaustion risk |
| ReplicationLag | > 1 second | Replica is behind primary, indicating high write load |
| NetworkBandwidthInAllowanceExceeded | > 0 | Network bandwidth limit hit; scale up node type |
# Set up CloudWatch alarms for critical ElastiCache metrics
aws cloudwatch put-metric-alarm \
--alarm-name "redis-high-memory" \
--namespace "AWS/ElastiCache" \
--metric-name "DatabaseMemoryUsagePercentage" \
--dimensions Name=CacheClusterId,Value=my-redis-cluster-001 \
--statistic Average \
--period 300 \
--evaluation-periods 3 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:ops-alerts"
aws cloudwatch put-metric-alarm \
--alarm-name "redis-evictions" \
--namespace "AWS/ElastiCache" \
--metric-name "Evictions" \
--dimensions Name=CacheClusterId,Value=my-redis-cluster-001 \
--statistic Sum \
--period 300 \
--evaluation-periods 1 \
--threshold 100 \
--comparison-operator GreaterThanThreshold \
--alarm-actions "arn:aws:sns:us-east-1:123456789012:ops-alerts"
# Check Redis SLOWLOG for slow commands
redis-cli -h my-redis.cache.amazonaws.com --tls SLOWLOG GET 10
Connection Pooling
Each Redis connection consumes memory on the server. Lambda functions are particularly problematic because each concurrent invocation creates a new connection, and Lambda can scale to hundreds or thousands of concurrent executions. Use connection pooling in long-running applications (ECS, EC2) and keep Lambda Redis connections lean. For Lambda, initialize the Redis client outside the handler and reuse it across warm invocations.
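The Lambda reuse pattern boils down to creating the client at module scope so it survives warm invocations. A sketch (the factory parameter exists only to make the pattern testable; in a real handler you would construct `new Redis({...})` directly at module scope, and the handler signature would be Lambda's usual `(event, context)`):

```typescript
// Connection reuse across warm Lambda invocations: the client is
// created lazily at module scope, so each execution environment holds
// exactly one Redis connection instead of one per event.
type RedisLike = { get(key: string): Promise<string | null> };

let client: RedisLike | undefined; // survives across warm invocations

function getClient(factory: () => RedisLike): RedisLike {
  if (!client) {
    client = factory(); // only runs on a cold start
  }
  return client;
}

async function handler(event: { key: string }, factory: () => RedisLike) {
  const redis = getClient(factory); // reused, not reconnected, per event
  return redis.get(event.key);
}
```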
Security & Encryption
ElastiCache runs inside your VPC, which means it is not accessible from the public internet by default. You control access using security groups (which resources can connect to the cache) and VPC subnet groups (which subnets the cache nodes are placed in).
For encryption, ElastiCache Redis supports both encryption at rest (data on disk is encrypted using AWS-managed or customer-managed KMS keys) and encryption in transit (TLS for all connections). Both should be enabled for production workloads. Note that encryption at rest must be enabled at cluster creation time and cannot be added to an existing cluster.
Redis AUTH and RBAC
Redis supports two authentication mechanisms in ElastiCache. The legacy AUTH token (a single password shared by all clients) provides basic access control. The newer Redis RBAC (Role-Based Access Control) provides user-level authentication with fine-grained command and key pattern permissions. RBAC is recommended for production environments where different applications or teams need different levels of access.
# Create a Redis user with RBAC
aws elasticache create-user \
--user-id app-readonly \
--user-name app-readonly \
--engine redis \
--access-string "on ~product:* ~cache:* +get +mget +hgetall +smembers -@all" \
--passwords "ReadOnlyP@ssword123"
aws elasticache create-user \
--user-id app-readwrite \
--user-name app-readwrite \
--engine redis \
--access-string "on ~* +@all -@dangerous" \
--passwords "ReadWriteP@ssword123"
# Create a user group and associate it with a cluster
aws elasticache create-user-group \
--user-group-id my-app-users \
--engine redis \
--user-ids default app-readonly app-readwrite
aws elasticache modify-replication-group \
--replication-group-id my-redis-cluster \
--user-group-ids-to-add my-app-users
Secrets Manager for Redis Credentials
Store your Redis AUTH tokens and RBAC passwords in AWS Secrets Manager with automatic rotation. Your application retrieves the password from Secrets Manager at startup, and when the secret rotates, the application picks up the new password on the next connection. This eliminates hardcoded passwords in application configuration and provides an audit trail of credential access.
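The fetch-once-at-startup pattern can be sketched as below. The fetch function is injected for illustration; with the AWS SDK it would wrap `GetSecretValueCommand` from `@aws-sdk/client-secrets-manager`, and the secret name `prod/redis/auth` is a hypothetical example:

```typescript
// Fetch the Redis AUTH token from Secrets Manager once and cache it
// for the process lifetime; subsequent connections reuse the value.
type SecretFetcher = (secretId: string) => Promise<string>;

let cachedToken: string | undefined;

async function getRedisAuthToken(fetchSecret: SecretFetcher): Promise<string> {
  if (cachedToken === undefined) {
    // With the AWS SDK this call would be roughly:
    //   const out = await sm.send(new GetSecretValueCommand({ SecretId: "prod/redis/auth" }));
    //   cachedToken = out.SecretString!;
    cachedToken = await fetchSecret("prod/redis/auth");
  }
  return cachedToken;
}
```

To pick up rotated secrets without a restart, a common refinement is to re-fetch the secret whenever a Redis connection fails authentication, rather than only at startup.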
Cost Optimization & Sizing
ElastiCache costs are driven by node type, number of nodes, and data transfer. Right-sizing your cluster is the most impactful cost optimization. Over-provisioning wastes money on unused capacity, while under-provisioning causes evictions and degraded performance.
Node Type Selection
| Node Family | Optimized For | Memory Range | Starting Price (On-Demand) |
|---|---|---|---|
| cache.t3 / cache.t4g | Dev/test, small workloads | 0.5 – 6.4 GB | $0.017/hour (t4g.micro) |
| cache.m7g | General purpose, balanced | 6.4 – 209 GB | $0.137/hour (m7g.large) |
| cache.r7g | Memory-intensive workloads | 13 – 419 GB | $0.257/hour (r7g.large) |
| cache.c7gn | Compute-intensive (high throughput) | 3 – 46 GB | $0.180/hour (c7gn.large) |
Cost Reduction Strategies
Reserved Nodes: Commit to 1 or 3 years for up to 55% savings compared to on-demand pricing. If your cache is always running (which most caches are), reserved nodes are almost always worth it.
Graviton nodes: The cache.t4g, cache.m7g, and cache.r7g families use AWS Graviton processors, which are approximately 20% cheaper and 20% more performant than equivalent Intel-based nodes.
Right-sizing: Monitor DatabaseMemoryUsagePercentage and EngineCPUUtilization. If memory usage is consistently below 50%, you are likely over-provisioned. If CPU is consistently below 10%, consider a smaller node type.
Data compression: Compress cached values before storing them in Redis. This reduces memory usage (and potentially lets you use a smaller node) at the cost of CPU time for compression and decompression. Libraries like lz4 or snappy provide fast compression with reasonable ratios.
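As a sketch, Node's built-in zlib can gzip values before they are cached (lz4 or snappy would be faster but require third-party packages; the helper names here are illustrative):

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Compress a value before caching it, and restore it after retrieval.
// Repetitive JSON (lists of similar objects) compresses especially well.
function compressValue(value: unknown): Buffer {
  return gzipSync(JSON.stringify(value));
}

function decompressValue<T>(data: Buffer): T {
  return JSON.parse(gunzipSync(data).toString("utf8")) as T;
}
```

Compressed values are binary, so store and fetch them as Buffers; with ioredis, that means passing the Buffer to set and reading it back with getBuffer rather than get.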
TTL discipline: Always set TTLs on cached data. Without TTLs, your cache fills up over time with stale data that is never evicted, requiring a larger (and more expensive) node.
ElastiCache Serverless for Variable Workloads
If your workload has unpredictable traffic patterns, quiet during off-hours and spiking during peak, ElastiCache Serverless can be more cost-effective than provisioned nodes. You pay only for the data stored and ECPUs consumed, with no provisioning decisions. For steady-state workloads with predictable capacity needs, provisioned nodes with reserved pricing are typically cheaper.
Key Takeaways
1. Redis supports complex data structures, persistence, replication, and Lua scripting.
2. Memcached is simpler and better for basic key-value caching with multi-threaded performance.
3. Lazy loading, write-through, and write-behind are the three primary caching strategies.
4. ElastiCache Serverless eliminates capacity planning with automatic scaling.
5. Cluster mode enabled (Redis) supports up to 500 shards for horizontal scaling.
6. Always deploy in private subnets with encryption in transit and at rest.
Written by CloudToolStack Team
Cloud engineers and architects with hands-on experience across AWS, Azure, and GCP. We write guides based on real-world production patterns, not just documentation rewrites.
Disclaimer: This guide is for educational purposes. Cloud services change frequently; always refer to official documentation for the latest information. AWS, Azure, and GCP are trademarks of their respective owners.