The simplest possible caching architecture: a single Redis instance sitting between the application service and PostgreSQL. All cache operations hit one node. No sharding, no replication, no eviction tuning. Demonstrates why horizontal scaling and redundancy become essential as traffic grows beyond single-node capacity.
Distributed caching is one of the most frequently asked system design interview questions because it sits at the intersection of performance optimization, data consistency, and fault tolerance. Companies like Redis Labs, AWS (ElastiCache), Memcached, Twitter, Meta, Netflix, and virtually every major tech company rely on distributed caches to reduce database load and achieve sub-millisecond read latency. Interviewers expect candidates to reason about cache-aside patterns, eviction policies, thundering herd prevention, and the trade-offs between cache consistency and performance.
The naive approach uses the simplest possible architecture: a single Redis instance sitting between an application service and a PostgreSQL database. All cache operations — GET, SET, DELETE — hit one Redis node. The cache-aside pattern governs the read path: the application checks the cache first, on miss queries the database, writes the result back to the cache (backfill), and returns to the caller. Writes go to the database first, then invalidate the corresponding cache entry.
At low traffic (under 10K ops/sec) and small working sets (under 64GB), this architecture works well. Redis handles GET operations in sub-millisecond time, the database sees only 15% of total traffic (cache miss rate), and the system is easy to reason about — one cache node, one database, linear request flow. There is no hash ring, no routing logic, and no replication to manage.
The problems emerge at scale. First, memory: a single Redis node is limited to the machine's physical memory (26GB on cache.r7g.xlarge, 64GB on cache.r7g.2xlarge). If the working set exceeds this, either keys are evicted (lowering hit rate) or Redis returns OOM errors (the default noeviction policy). Second, throughput: Redis is single-threaded for command processing, capping at approximately 50K ops/sec on modern hardware. Beyond this, CPU becomes the bottleneck. Third, availability: a single node is a single point of failure. If Redis crashes or is restarted, 100% of traffic falls through to the database. The database is sized for ~15% of traffic (the expected miss rate). At 100% miss rate, it is overwhelmed by 6-7x its capacity, causing cascading failure across the entire application.
The thundering herd problem is particularly severe in the naive approach. When the Redis node restarts or a popular key expires, hundreds of concurrent requests miss simultaneously and all hit the database independently. Without single-flight coalescing (which requires a proxy layer), 100 concurrent misses on the same key generate 100 identical database queries. The V1 consistent hash variant solves this with a CacheProxyService that coalesces concurrent misses into a single database query.
This template exists to make the single-node bottleneck visible and measurable. Run the simulation at increasing traffic levels and watch Redis CPU climb linearly. Simulate a node failure and observe the thundering herd overwhelming the database. The comparison with the V1 consistent hash variant quantifies the improvement: 3 shards provide 3x throughput, consistent hashing minimizes rebalance disruption, and single-flight coalescing prevents the thundering herd entirely.
The naive distributed cache is a five-component architecture: Client, AppLB (Application Load Balancer), AppService, CacheNode (single Redis instance), and BackendDB (PostgreSQL). There is no hash ring, no proxy layer, no replication, no event stream, and no eviction policy tuning.
All traffic arrives at the AppLB (AWS ALB), which distributes requests across AppService pods using round-robin. The AppLB adds approximately 1.5ms of routing latency and can handle up to 20K RPS — well above the system's actual limits, which are constrained by the single Redis node. The load balancer is never the bottleneck; Redis is.
AppService implements the cache-aside logic for three types of operations. Cache reads (75% of traffic): AppService sends a GET to CacheNode. On hit (~85%), the value is returned in under 1ms. On miss, AppService queries BackendDB, writes the result to CacheNode (backfill), and returns to the caller. Total miss penalty: approximately 20ms (12ms DB read + 5ms network + 3ms processing). Cache writes (15% of traffic): AppService sends a SET to CacheNode with a TTL. The caller is responsible for writing to BackendDB first in the cache-aside pattern. Cache deletes (5% of traffic): AppService sends a DEL to CacheNode, immediately removing the key. Subsequent reads miss and backfill from BackendDB with the updated value.
CacheNode is a single Redis instance (cache.r7g.xlarge, 26GB memory) running in one availability zone. It holds the entire cache keyspace — no sharding, no partitioning. The default noeviction policy means Redis returns errors when memory is full rather than evicting old keys. This is the safest default (no silent data loss) but causes application errors if the working set exceeds 26GB. There is no replication: if this node fails, all cached data is lost and 100% of traffic falls through to BackendDB.
BackendDB is Amazon RDS PostgreSQL (db.r7g.xlarge, 4 vCPU, 32GB RAM) with a single table (kv_store) storing the authoritative key-value data. At 85% cache hit rate with 10K peak RPS, BackendDB sees approximately 1.5K read ops/sec — well within its capacity. The critical risk is cache failure: if CacheNode goes down, BackendDB receives 100% of traffic (10K RPS), exceeding its approximately 2K RPS capacity by 5x. This is the thundering herd scenario that makes the naive approach unsuitable for production workloads.
The architecture has no async processing — no Kafka stream, no warmup worker, no invalidation pipeline. Cache invalidation is synchronous: the caller writes to BackendDB, then calls DELETE on CacheNode. If the DELETE fails (network issue, Redis restart), the cache serves stale data until the key's TTL expires (300 seconds). This staleness window is the fundamental consistency limitation of the naive approach.
This sequence diagram traces the three primary flows: cache hit (fast path), cache miss with backfill (slow path), and cache invalidation after database write. The critical insight is the absence of single-flight coalescing — when a popular key expires, every concurrent request independently queries the database, creating a thundering herd.
The second insight is the complete dependency on one Redis node. Every cache operation traverses the same network path to the same process on the same machine. Any failure on this path (network partition, process crash, OOM) immediately degrades all application reads to database-only mode.
Step-by-Step Walkthrough
Pseudocode
// CACHE-ASIDE READ — the fundamental pattern
async function cacheGet(key: string): Promise<string> {
// Step 1: Check cache
const cached = await redis.get(key);
if (cached !== null) {
return cached; // HIT — sub-millisecond
}
// Step 2: Cache miss — fall through to database
// NO single-flight coalescing — every concurrent miss hits DB
const value = await db.query(
"SELECT value FROM kv_store WHERE key = $1", [key]
); // ~12ms
// Step 3: Backfill cache (fire-and-forget)
await redis.set(key, value, "EX", 300); // TTL 300 seconds
return value;
}
// CACHE INVALIDATION — after database write
async function invalidate(key: string): Promise<void> {
// Write to DB first (caller's responsibility)
// Then invalidate cache
await redis.del(key);
// If this fails, stale data persists until TTL (300s)
}The data model is minimal: one PostgreSQL table (kv_store) as the source of truth and one Redis keyspace as the cache. The PostgreSQL table uses a B-tree index on the key column for O(log N) lookups. Redis stores the same key-value pairs with a 300-second TTL.
The critical observation is the memory asymmetry: PostgreSQL can store unlimited data on disk, but Redis is limited to 26GB in memory. When the working set exceeds Redis capacity, keys must be evicted or SET operations fail with OOM errors.
Step-by-Step Walkthrough
Pseudocode
-- PostgreSQL: Source of truth
CREATE TABLE kv_store (
key TEXT PRIMARY KEY,
value TEXT NOT NULL,
updated_at TIMESTAMPTZ DEFAULT now()
);
-- Redis: Cached copy (ephemeral)
-- SET key value EX 300
-- All keys on a single node — no hash slots, no sharding
-- Memory limit: 26GB (cache.r7g.xlarge)
-- Eviction: noeviction (OOM errors when full)
-- Cache-aside read pattern:
-- 1. GET key from Redis
-- 2. If miss: SELECT value FROM kv_store WHERE key = $1
-- 3. SET key value EX 300 in Redis (backfill)
-- Cache invalidation pattern:
-- 1. UPDATE kv_store SET value = $2 WHERE key = $1 (DB write)
-- 2. DEL key from Redis (cache invalidation)Choice
One Redis instance holds the entire cache keyspace
Rationale
At low traffic (under 10K ops/sec) and small working sets (under 64GB), a single node eliminates all routing complexity. Every key goes to the same place — no hash ring, no CRC16 computation, no slot table. The trade-off is a hard ceiling on both memory (26GB) and throughput (~50K ops/sec). The V1 variant introduces consistent hashing across 3 shards to remove this ceiling, tripling capacity and providing fault isolation.
Choice
Single node with no replica for failover
Rationale
Redis replication adds a replica that receives all writes asynchronously, providing failover capability and read scaling. For the naive approach, the added complexity of managing replica lag, promoting replicas on failure, and handling split-brain scenarios is not justified. The consequence is severe: node failure = 100% cold cache = thundering herd to the database. The V2 variant adds replicas with automatic promotion.
Choice
Redis defaults to noeviction — returns errors when memory is full
Rationale
Noeviction is the safest default because it never silently discards data. The application receives an explicit error when memory is full, allowing it to handle the situation (fall back to database, alert operators). The alternative — allkeys-lru — silently evicts the least recently used keys, which can cause unexpected cache misses on keys the application assumed were cached. Production systems tune this carefully; the naive variant deliberately uses the default to demonstrate the OOM failure mode.
Choice
Concurrent cache misses on the same key each generate an independent database query
Rationale
Single-flight coalescing requires a coordination layer (the CacheProxyService in V1) that deduplicates concurrent requests for the same key, sending only one query to the database and distributing the result to all waiting callers. The naive approach has no proxy — AppService talks directly to Redis. Without coalescing, 100 concurrent misses on the same key generate 100 identical database queries, amplifying the thundering herd effect by orders of magnitude.
Choice
Application manages cache reads and writes explicitly
Rationale
Cache-aside gives the application full control over what is cached and when. The application reads from cache, on miss reads from database and backfills cache. Writes go to database first, then invalidate cache. This pattern is resilient — if Redis is down, reads transparently fall through to the database. Write-through (synchronous cache + DB writes) doubles write latency and creates failure coupling. Read-through (cache manages DB access) hides complexity but makes debugging harder.
Target RPS
~10K sustained (ceiling at single Redis node)
Latency (p99)
<1ms cache hit, ~20ms cache miss (DB fallback)
Storage
26 GB (single node memory limit)
Availability
~99% (single Redis node, no redundancy)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Cache GET (Redis) | O(1) — hash table lookup | O(1) — single key-value pair | Redis GET is O(1) regardless of keyspace size. Sub-millisecond latency on cache hits. The single-threaded execution model means all operations are serialized — high concurrency does not improve throughput beyond ~50K ops/sec. |
| Cache SET (Redis) | O(1) — hash table insert/update | O(1) — single key-value pair | Redis SET with TTL is O(1). Memory allocation may trigger eviction under allkeys-lru (O(1) amortized) or OOM error under noeviction. Each SET also updates the key's TTL in Redis's expires hash table. |
| Database fallback on miss (PostgreSQL) | O(log N) — B-tree index lookup on key | O(1) — single row returned | PostgreSQL B-tree index on key provides O(log N) lookup. At 1M rows, this is approximately 20 index page reads — well cached in the buffer pool for frequently accessed keys. The 12ms average latency includes network round-trip and buffer pool access. |
| Cache backfill after miss | O(1) — Redis SET + network round-trip | O(1) — single key-value pair written to Redis | After a database read, the application writes the value back to Redis. This is a fire-and-forget SET — if it fails (Redis OOM, network issue), the next read simply misses again. No retry logic in the naive approach. |
Source of truth for all cached data. AppService queries this table on cache miss and backfills CacheNode. At 85% hit rate with 10K peak RPS, BackendDB sees approximately 1.5K read ops/sec. Single table with B-tree index on key. If CacheNode fails, all 10K RPS hit this table directly, overwhelming the database.
Indexes: PK on key (B-tree)
Write-once, read-many pattern. Under normal operation (85% cache hit rate), this table sees low read volume (~1.5K QPS). The danger is cache failure: 100% miss rate sends 10K QPS to a database sized for 1.5K, causing connection pool exhaustion within seconds.
All cached key-value pairs stored on a single Redis instance. No sharding — the entire keyspace resides on one node. 300-second default TTL with noeviction policy. Zipfian access pattern means the top 1% of keys account for ~50% of traffic.
Indexes: Redis hash table (O(1) GET/SET)
Memory limit: 26GB on cache.r7g.xlarge. At ~1KB per entry, this holds approximately 26M keys. If the working set exceeds this, the noeviction policy causes SET operations to return OOM errors. No persistence (RDB/AOF disabled) — cache restart means full cold start.
Redis node crashes or is restarted (complete cache loss)
Impact
100% of traffic falls through to BackendDB. The database is sized for ~1.5K RPS (15% miss rate) but receives 10K RPS — a 6.7x overload. Connection pool exhaustion occurs within seconds. Query latency spikes from 12ms to 500ms+. Cascading application failures as upstream services timeout.
Mitigation
Add Redis replication (V2) for automatic failover with no data loss. As a stopgap in V0, implement circuit breaker in AppService: when Redis is unavailable, rate-limit database queries and return cached stale responses or graceful degradation errors. Pre-warm the replacement Redis node from a recent RDB snapshot (if persistence was enabled).
Hot key overwhelms single Redis node CPU (viral product page)
Impact
One extremely popular key receives 10,000+ ops/sec. Since Redis is single-threaded, this key consumes a disproportionate share of CPU cycles, slowing all other operations. p99 latency rises from 1ms to 5-10ms across all keys, not just the hot key.
Mitigation
The V2 variant solves this with L1 in-process caching: hot keys detected by CacheProxy are promoted to AppService local caches, serving in sub-100us without hitting Redis. In V0, the application can implement a per-key local cache manually (e.g., Guava Cache with 5-second TTL) for known hot keys.
Working set exceeds Redis memory (26GB limit reached)
Impact
With noeviction policy, all SET operations fail with OOM errors. Cache backfill after miss fails, so the cache cannot recover — every read misses forever. With allkeys-lru, keys are evicted to make room, but hit rate drops from 85% to 50% or lower, significantly increasing database load.
Mitigation
Upgrade to a larger instance (cache.r7g.2xlarge, 52GB) as a short-term fix. Long-term: shard across multiple nodes (V1) to distribute memory across 3+ machines. Monitor Redis memory utilization and alert at 70% to prevent surprise OOM.
Cache invalidation fails (network error on DELETE after DB write)
Impact
The database has the updated value but the cache retains the old value. The application serves stale data until the key's 300-second TTL expires. For time-sensitive data (prices, inventory counts), 5 minutes of staleness can cause incorrect behavior (selling out-of-stock items, showing wrong prices).
Mitigation
Reduce TTL to 30-60 seconds to limit staleness window. Implement retry logic for failed DELETEs with exponential backoff. The V2 variant solves this definitively with Kafka-based invalidation: the invalidation event is durably persisted and replayed until successfully processed.
| Component | Failure | Impact | Mitigation |
|---|---|---|---|
| CacheNode (Redis) | OOM crash from noeviction policy with full memory | All SET operations fail. Cache cannot backfill after misses. Hit rate degrades to zero over time as existing keys expire (TTL). Database receives 100% of traffic. | Monitor Redis memory utilization. Alert at 70%. Switch to allkeys-lru eviction policy in production. Scale to sharded architecture (V1) when working set exceeds 60% of node memory. |
| CacheNode (Redis) | Process crash or node reboot | Complete cache loss. Thundering herd: all traffic falls through to database. Without persistence (RDB/AOF), restart means empty cache — full cold start takes minutes to hours via organic miss-and-backfill. | Enable RDB snapshots every 5 minutes for faster warm-up on restart. Add replication (V2) for automatic failover with no data loss. Implement circuit breaker in AppService to rate-limit database queries during cache unavailability. |
| BackendDB (PostgreSQL) | Connection pool exhaustion during cache failure | Database max_connections (200) exceeded. All new queries fail with 'too many connections' error. Both cache misses and direct database queries fail. Total system outage. | Use connection pooling via PgBouncer (transaction mode) to multiplex application connections. Increase max_connections as a stopgap. Long-term: add read replicas and implement query routing to distribute cache-miss load. |
| AppLB (Load Balancer) | All AppService health checks fail | ALB returns 502 Bad Gateway. All traffic fails. Users cannot access any cached or uncached data. | Multi-AZ deployment with at least 2 pods per AZ. Configure health check thresholds to tolerate transient failures (3 consecutive failures before marking unhealthy). |
Vertical scaling only for Redis (upgrade instance size from cache.r7g.xlarge to cache.r7g.2xlarge to cache.r7g.4xlarge). Horizontal scaling for AppService via pod count increase (2 -> 4 -> 8 pods). Auto-scaling trigger: CPU utilization > 70% for 3 consecutive minutes. The ceiling is approximately 50K ops/sec regardless of AppService pod count, because the single Redis node is the throughput bottleneck. Beyond this ceiling, architectural changes are required: consistent hash sharding across multiple Redis nodes (V1) or Redis Cluster with replication (V2).
Key metrics to monitor: (1) Redis memory utilization — alert at 70%, critical at 85%. This is the single most important metric for the naive approach. (2) Redis CPU utilization — alert at 60% (single-threaded, so 60% means approaching serialization bottleneck). (3) Cache hit rate — should be ~85% under normal operation. Drop below 70% indicates memory pressure or working set growth. (4) Cache miss rate to database — should be ~1.5K QPS at 10K peak. Spike indicates Redis failure or hit rate degradation. (5) Redis connected clients — alert at 80% of maxclients. (6) Key eviction rate (if using allkeys-lru) — any evictions indicate memory pressure. (7) Database connection pool utilization — alert at 70% of max_connections. Dashboard: Grafana with panels for Redis memory/CPU, hit rate time series, miss-to-DB throughput, database connection pool, and key distribution histogram. SLIs: cache hit p99 < 2ms, miss + backfill p99 < 30ms, hit rate > 80%.
At 10K ops/sec peak: Redis cache.r7g.xlarge (~$175/month), PostgreSQL db.r7g.xlarge (~$350/month), ECS Fargate 2 pods (~$140/month), ALB (~$30/month). Total: ~$695/month. This is the cheapest variant. Scaling vertically to cache.r7g.2xlarge ($350/month) doubles memory to 52GB but does not improve throughput beyond ~50K ops/sec. The V1 consistent hash variant at 100K ops/sec costs approximately $1,800/month but handles 10x the throughput — the per-operation cost decreases from $0.069/1K-ops to $0.018/1K-ops as you scale beyond the naive approach's ceiling.
Redis authentication: require a password (requirepass) to prevent unauthorized access. Redis does not support fine-grained ACLs in the naive setup — any client with the password has full access. Network security: place Redis in a private subnet, accessible only from AppService pods via security group rules. No public endpoint. TLS: enable in-transit encryption between AppService and Redis (adds ~0.5ms latency per operation). Data sensitivity: do not cache PII (personally identifiable information) or secrets in Redis without encryption at rest. Redis data is stored in plain text in memory — anyone with access to the host can inspect it. Rate limiting: AppLB rate limits per client IP to prevent cache-busting attacks (attackers sending random keys to cause 100% miss rate).
Rolling deployment for AppService — replace one pod at a time while the ALB routes traffic to the remaining pod. Redis changes (instance type upgrade, parameter group changes) require a brief maintenance window (typically 5-30 seconds for ElastiCache parameter group apply). Database migrations run during low-traffic windows with schema changes requiring table locks. Zero-downtime deployment achievable for application code changes but not for Redis instance type changes or PostgreSQL schema migrations.
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| V0: Naive (Single Redis Node) | T1 | <1ms hit, ~20ms miss | ~10K ops/sec | $695/month | Low | 99% (single node) |
| V1: Consistent Hash Sharding | T2 | <1ms hit, ~20ms miss | 100K ops/sec | $1,800/month | Medium | 99.5% (3 shards, no replication) |
| V2: Redis Cluster + Replication + Multi-Tier | T3 | <100us L1 hit, <1ms L2 hit | 500K ops/sec | $4,500/month | High | 99.99% (replicated, auto-failover) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Distributed caching tests five fundamental distributed systems concepts simultaneously: (1) data partitioning — how to distribute keys across nodes using consistent hashing or hash slots, (2) consistency — how to keep cache and database in sync (cache-aside vs write-through vs write-behind), (3) availability — how to survive node failures without thundering herd, (4) eviction — how to manage memory pressure with LRU/LFU policies, and (5) performance — how to achieve sub-millisecond latency with in-process L1 and distributed L2 tiers. Companies like Redis Labs, Twitter, Meta, Netflix, and AWS ask this question because caching is central to every large-scale system. The question scales from simple (single node, this variant) to complex (multi-tier L1/L2 with Kafka invalidation, the V2 variant), allowing interviewers to calibrate difficulty to the candidate's experience level.
Total cache failure. All cached data is lost. Every request that previously hit the cache (85% of traffic) now falls through to the database. The database is sized for approximately 15% of traffic (the expected miss rate). At 100% miss rate, it receives 6-7x more traffic than its capacity, causing connection pool exhaustion, query timeouts, and cascading failure across the application. This is the thundering herd problem. Recovery requires restarting Redis and waiting for the cache to warm up organically through miss-and-backfill, which takes minutes to hours depending on the working set size. The V1 variant distributes this risk: losing one of three shards affects only ~33% of keys. The V2 variant eliminates it: replicas promote automatically with no data loss.
The noeviction policy makes memory exhaustion explicit and loud. When Redis runs out of memory, SET commands return OOM errors that the application can catch, log, and handle (fall back to database-only mode, page the on-call engineer). With allkeys-lru, Redis silently evicts the least recently used keys to make room for new ones. This sounds convenient, but it can cause subtle bugs: a key the application assumed was cached gets evicted, leading to unexpected database load or stale data. In production, allkeys-lru is the correct choice — but it requires monitoring eviction rate, memory utilization, and hit rate to detect problems. The naive variant uses noeviction to demonstrate what happens when caching is treated as an afterthought.
Migrate when any of these thresholds are crossed: (1) working set exceeds 60% of single-node memory (15GB on cache.r7g.xlarge) — eviction pressure degrades hit rate, (2) Redis CPU exceeds 70% sustained — single-threaded command processing becomes the bottleneck, (3) cache availability SLA exceeds 99.5% — single node cannot provide higher availability without replication, (4) multiple services share the cache — cross-service consistency requires invalidation infrastructure (Kafka in V2). In practice, most production services hit threshold (1) or (3) first, typically at 10K-50K ops/sec with a working set of 10-30GB.
Vertical scaling (bigger instance) has three hard limits. Memory: the largest ElastiCache instance (cache.r7g.16xlarge) offers 419GB — sufficient for many workloads but insufficient for 100TB+ aggregate capacity at true scale. CPU: Redis is single-threaded for command processing, so a 64-core machine uses only 1 core for commands — CPU scales up but Redis does not use it. Failover: a single larger node is still a single point of failure. Rebooting a 419GB Redis node takes 10+ minutes for RDB load, during which all traffic hits the database. Sharding (V1) provides linear scaling of both memory and throughput. Sharding with replication (V2) adds fault tolerance on top.
Sign in to join the discussion.
Ready to design your own Distributed Cache?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator