Vetora logo
Hard10 componentsInterview: Very High

Distributed Cache — Consistent Hashing

Design a distributed in-memory cache with consistent hash ring routing, sharded Redis instances, cache-aside pattern, and single-flight coalescing to serve 100K ops/sec at sub-millisecond latency.

CachingConsistent HashingRedisLow Latency
Problem Statement

Distributed caching is one of the most frequently asked system design interview topics because it underpins the performance of virtually every large-scale web application. Companies like Facebook, Twitter, and Netflix rely on distributed caches to serve billions of requests per day with sub-millisecond latency. Without a caching layer, databases would buckle under the read load, and user-facing latency would spike from single-digit milliseconds into the hundreds.

The core challenge is distributing cache entries across multiple nodes while maintaining balanced load, minimizing disruption when nodes are added or removed, and preventing catastrophic failure scenarios like thundering herds. A naive modulo-based sharding strategy rehashes every key when the cluster size changes, causing a massive cache miss storm that can take down the origin database. Consistent hashing solves this by only remapping approximately 1/N of keys when a single node changes, but introduces its own complexity around virtual nodes, ring management, and hot key handling.

At production scale, a distributed cache cluster might handle 10 million operations per second across hundreds of terabytes of aggregate memory. The system must maintain a 92% or higher hit rate to keep database load manageable, deliver sub-millisecond p99 latency on cache hits, and gracefully degrade when individual shards fail. Additional concerns include cache warming on cold starts, TTL-based expiration, and write-through versus cache-aside consistency patterns.

This template models a Memcached/Redis-style cache with a proxy-based consistent hash ring, three independent Redis shards, an origin PostgreSQL database for cache misses, and a warmup worker for cold start mitigation. It covers the full spectrum of caching concerns that interviewers expect candidates to address.

Architecture Overview

The architecture follows a proxy-based consistent hashing model with a clear separation between the routing layer, the cache data layer, and the origin database. Client applications send cache GET, SET, and DELETE requests through an API Gateway that handles authentication and rate limiting, then through a load balancer that distributes traffic to the CacheProxyService fleet.

The CacheProxyService is the brain of the system. It maintains a consistent hash ring with 150 virtual nodes per physical shard (450 total), ensuring balanced key distribution within 5% variance. When a request arrives, the service hashes the key using MurmurHash3, locates the owning shard on the ring, and routes the request accordingly. On a cache hit (approximately 92% of reads), the value is returned in under 1 millisecond. On a miss, the proxy falls through to OriginDB (PostgreSQL), fetches the authoritative value, backfills the cache shard, and returns the result. Single-flight coalescing ensures that if 100 concurrent requests miss on the same key, only one database query is issued.

The three Redis shards (cache.r7g.xlarge, 26 GB each) are deployed across different availability zones for fault isolation. Each shard operates independently, holding roughly 33% of the keyspace. There is no cross-shard replication, which means a shard failure results in temporary cache misses for that partition until the replacement warms up. The AdminStream (Kafka) carries operational events such as shard additions and warmup requests. The WarmupWorker consumes these events and pre-populates cache shards by scanning OriginDB for hot keys, reducing the cold-start miss storm from minutes to seconds.

The write path follows the cache-aside pattern: the upstream application writes to OriginDB first, then the CacheProxyService invalidates or updates the corresponding cache entry. This approach is simpler and more resilient than write-through, because if the cache is temporarily unavailable, reads simply fall through to the database without data loss.

Architecture Preview
Loading architecture preview...
Key Design Decisions
Consistent Hashing over Modulo-N

Choice

Consistent hash ring with 150 virtual nodes per shard

Rationale

Modulo-N sharding rehashes all keys when the cluster size changes, causing a massive cache miss storm that can overwhelm the origin database. Consistent hashing only remaps approximately 1/N of keys when a shard is added or removed. With 150 virtual nodes per physical shard, key distribution stays balanced within 5% variance across the ring.

Proxy-Based Routing over Client-Side Hashing

Choice

Centralized CacheProxyService manages the hash ring

Rationale

Client-side hashing (the Memcached model) pushes ring management to every application, making it operationally expensive to keep hundreds of microservices in sync. The CacheProxyService centralizes ring management, connection pooling, and observability at the cost of one extra network hop (approximately 1ms per request). This trade-off is worthwhile for most organizations.

Cache-Aside over Write-Through

Choice

Reads check cache then fall through to DB; writes go to DB then invalidate cache

Rationale

Cache-aside is simpler and more resilient than write-through. If the cache is temporarily down, reads transparently fall through to the database without data loss. Write-through doubles write latency by requiring synchronous cache and database writes, creating tight failure coupling between the two layers.

Single-Flight Coalescing for Thundering Herd Prevention

Choice

Only one DB query per key per miss window; concurrent requests wait for the result

Rationale

When a popular key expires or is evicted, hundreds of concurrent requests may miss simultaneously. Without coalescing, all of those requests hit the database in parallel. Single-flight ensures only one database query is issued per key per miss window, reducing origin database load by up to 100x during cache stampede scenarios.

Scale & Performance

Target RPS

100,000 ops/sec (75K reads, 15K writes, 5K deletes, 5K admin)

Latency (p99)

< 1ms p99 on cache hits; < 20ms on cache misses with DB fallback

Storage

78 GB aggregate Redis memory (3 shards x 26 GB); origin DB on RDS PostgreSQL

Availability

99.95% — individual shard failures cause partial miss spikes, not full outage

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
What is consistent hashing and why is it used in distributed caches?

Consistent hashing is a technique that maps both cache keys and server nodes onto a circular hash space (a ring). Each key is assigned to the nearest node clockwise on the ring. The critical advantage over simple modulo-based hashing is minimal disruption during topology changes: when a node is added or removed, only approximately 1/N of keys need to be remapped rather than all of them. Virtual nodes (multiple points on the ring per physical server) improve balance. In this template, each shard has 150 virtual nodes, ensuring key distribution stays within 5% variance.

How does the cache-aside pattern work?

In the cache-aside pattern, the application checks the cache first on reads. On a cache hit, the value is returned immediately. On a miss, the application reads from the origin database, writes the result back to the cache (backfill), and returns it to the caller. For writes, the application writes to the database first, then invalidates or updates the cache entry. This pattern is simpler than write-through because the cache and database are loosely coupled: if the cache is unavailable, reads still work via the database fallback path.

What is the thundering herd problem and how does single-flight coalescing solve it?

The thundering herd problem occurs when a popular cache key expires and hundreds or thousands of concurrent requests simultaneously miss the cache, all hitting the origin database at once. This can overwhelm the database and cause cascading failures. Single-flight coalescing solves this by deduplicating concurrent misses for the same key: only the first request actually queries the database, while all subsequent requests for that key wait for the result. This reduces database load from N concurrent queries to exactly one per key per miss window.

Why are three separate Redis shards used instead of a Redis Cluster?

Using three independent Redis instances with proxy-based consistent hashing gives the system full control over key routing, connection management, and failover behavior. Redis Cluster automates sharding but introduces constraints such as hash slot migrations, limited cross-slot operations, and client driver complexity. The proxy model also allows non-Redis backends to be swapped in without changing clients. The trade-off is that the proxy adds a network hop, but this is typically under 1ms.

How does the system handle cache warming after a shard replacement?

When a cache shard is replaced (due to failure or maintenance), the new shard starts empty, causing a 100% miss rate for its portion of the keyspace. The WarmupWorker mitigates this by consuming admin events from the Kafka-based AdminStream (such as shard_added or warmup_requested), scanning the origin database for hot keys based on access frequency, and pre-loading them into the target shard. This reduces the cold-start miss storm from potentially minutes of elevated database load down to seconds.

Related Templates

Discussion

Sign in to join the discussion.

Ready to design your own Distributed Cache?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator