Vetora logo
Medium9 componentsInterview: Very High

Distributed Cache — Consistent Hash Sharding

Industry-standard distributed cache using consistent hashing for key-to-shard routing. CacheProxyService maintains a hash ring with 150 virtual nodes per shard for balanced distribution across 3 Redis shards. Single-flight coalescing prevents thundering herd. WarmupWorker pre-populates shards on cold start.

Consistent HashingRedisShardingCache ProxyStorage
Problem Statement

The consistent hash sharding approach to distributed caching is the industry standard used by Memcached, Twemproxy (nutcracker), and mcrouter at companies like Twitter, Meta, and Netflix. It solves the two fundamental problems with the naive single-node approach: the throughput ceiling and the rebalance disruption.

The key insight is the consistent hash ring. Instead of using modulo-N (key % N shards) to route keys, which re-hashes ALL keys when a shard is added or removed, consistent hashing maps both keys and shards onto a virtual ring. Each key is assigned to the nearest shard clockwise on the ring. When a shard is added or removed, only the keys between the new/removed shard and its predecessor are affected — approximately 1/N of the total keyspace. With 3 shards, adding a 4th shard re-hashes ~25% of keys instead of 100%.

The CacheProxyService is the routing brain. It maintains the hash ring with 150 virtual nodes per physical shard (450 total virtual nodes) to ensure even key distribution. For each incoming cache operation, it computes MurmurHash3(key), locates the nearest virtual node on the ring, and routes the request to the corresponding physical shard. The proxy also implements single-flight coalescing: when multiple concurrent requests miss on the same key, only one database query is issued — the other requests wait for the result. This prevents the thundering herd that devastates the naive approach during key expiration or cache cold start.

Three independent Redis shards (cache.r7g.xlarge, 26GB each) provide 78GB of aggregate memory and approximately 100K ops/sec of aggregate throughput — 3x the naive approach's ceiling. Each shard operates independently: failure of one shard affects only ~33% of the cached keyspace. The remaining 67% continues serving normally. This fault isolation is a dramatic improvement over the naive approach's all-or-nothing failure mode.

The WarmupWorker addresses the cold-start problem. When a shard is replaced (hardware failure, scaling event), it starts empty — 100% miss rate for its portion of the keyspace. Without warmup, these misses overwhelm the database. The WarmupWorker consumes admin events from a Kafka stream, scans the database for frequently accessed keys, and pre-loads them into the new shard. This reduces the cold-start miss storm from minutes to seconds.

The primary trade-off is the proxy hop. Every cache operation passes through CacheProxyService, adding approximately 1ms of latency compared to direct client-to-cache connections (the Memcached model). For latency-critical paths where every microsecond matters, client-side hash ring libraries (like ketama) eliminate this hop at the cost of operational complexity — every client must maintain an up-to-date ring configuration. The V2 variant's L1 in-process cache provides the best of both worlds: sub-100us for hot keys (no network hop) and sub-1ms for warm keys (proxy + Redis hop).

Interviewers expect candidates to explain why consistent hashing is superior to modulo-N, analyze the rebalance impact mathematically (1/N keys affected), discuss the role of virtual nodes in load balancing, and reason about the thundering herd prevention via single-flight coalescing.

Architecture Overview

The consistent hash sharding architecture uses nine components organized into four layers: traffic entry (AppClient, ApiGateway, MainLB), routing (CacheProxyService), data stores (CacheShard1, CacheShard2, CacheShard3, OriginDB), and async processing (AdminStream, WarmupWorker).

The traffic entry layer handles authentication and load distribution. AppClient sends cache requests to ApiGateway (AWS API Gateway), which authenticates service-to-service calls via mTLS (~2ms) and rate-limits at 120K RPS. MainLB (AWS ALB) distributes traffic to CacheProxyService pods using round-robin.

CacheProxyService is the routing brain — 8 pods, each with 200 threads, for 200K sustained RPS with headroom. For each incoming request, it computes MurmurHash3(key), walks the consistent hash ring (450 virtual nodes) to find the owning shard, and routes the request. On cache hit (~92%), the response returns in under 1ms. On cache miss, the proxy implements single-flight coalescing: if another request for the same key is already in flight (fetching from OriginDB), the current request joins the wait group instead of issuing a duplicate database query. When the first query completes, all waiting requests receive the result simultaneously. This coalescing reduces database load by up to 100x during stampede scenarios.

Three Redis shards hold the distributed keyspace. Each shard is an independent cache.r7g.xlarge instance (26GB memory) in a different availability zone for fault isolation: Shard1 in AZ-a, Shard2 in AZ-b, Shard3 in AZ-c. The consistent hash ring distributes keys with approximately 5% variance across shards (thanks to 150 virtual nodes per shard). Each shard operates independently — no cross-shard communication, no consensus protocol, no distributed locking.

OriginDB (PostgreSQL, db.r7g.xlarge) is the source of truth. At 92% cache hit rate with 100K peak RPS, OriginDB sees approximately 8K read ops/sec — well within its capacity. The database uses read replicas for cache-miss reads and reserves the primary for writes. The kv_store table is partitioned by key hash across 16 partitions for balanced I/O.

AdminStream (Amazon MSK/Kafka) carries administrative events: shard_added, shard_removed, warmup_requested, eviction_alert. WarmupWorker (5 ECS Fargate workers) consumes these events and pre-populates cache shards by scanning OriginDB for hot keys. This reduces cold-start miss storms from minutes to seconds after shard replacement or scaling events.

Architecture Preview
Loading architecture preview...
Request Flow — Consistent Hash Routing with Single-Flight Coalescing

This sequence diagram traces the cache read path through the consistent hash ring, showing both the hit path (fast, ~1ms) and the miss path with single-flight coalescing (prevents thundering herd). The critical insight is the coalescing: when 100 concurrent requests miss on the same key, only 1 database query is issued.

The second flow shows the warmup path: when a new shard is added, WarmupWorker pre-populates it with hot keys from OriginDB, reducing the cold-start miss storm from minutes to seconds.

Loading diagram...

Step-by-Step Walkthrough

  1. 1Client sends cache request. ApiGateway authenticates via mTLS (~2ms). MainLB routes to CacheProxyService (round-robin)
  2. 2CacheProxyService computes MurmurHash3(key), performs binary search on the 450-vnode ring, identifies the owning shard
  3. 3On cache hit (~92%): Redis returns value in sub-millisecond time. Total end-to-end: ~3ms including gateway + proxy hops
  4. 4On cache miss: CacheProxyService checks the single-flight map. If another request for this key is already querying the DB, join the wait group. Otherwise, initiate DB query
  5. 5Database query completes (~12ms). Result is written to the correct shard (backfill) and returned to ALL waiting callers simultaneously

Pseudocode

// CONSISTENT HASH ROUTING + SINGLE-FLIGHT
class CacheProxyService {
    ring: ConsistentHashRing;  // 450 virtual nodes
    singleFlight: Map<string, Promise<string>>;

    async get(key: string): Promise<string> {
        // Step 1: Determine target shard via consistent hash
        const shard = this.ring.getShard(murmurHash3(key));

        // Step 2: Check cache
        const cached = await shard.get(key);
        if (cached !== null) return cached;  // HIT

        // Step 3: Single-flight coalescing on miss
        if (this.singleFlight.has(key)) {
            return this.singleFlight.get(key);  // Join wait group
        }

        const promise = this.fetchAndBackfill(key, shard);
        this.singleFlight.set(key, promise);
        try {
            return await promise;
        } finally {
            this.singleFlight.delete(key);
        }
    }

    async fetchAndBackfill(key, shard): Promise<string> {
        const value = await originDB.query(
            "SELECT value FROM kv_store WHERE key = $1", [key]
        );
        await shard.set(key, value, "EX", 300);  // Backfill
        return value;
    }
}
Data Schema (Sharded Cache + Database)

The data model spans three layers: PostgreSQL (source of truth), 3 Redis shards (distributed cache), and Kafka (admin events). The consistent hash ring determines which shard holds each key. PostgreSQL is partitioned by key hash for balanced I/O.

The key observation is the ring-to-shard mapping: MurmurHash3(key) produces a 32-bit hash, which is mapped to a position on the consistent hash ring. The nearest virtual node clockwise determines the physical shard. With 150 virtual nodes per shard, distribution is balanced within 5% variance.

Loading diagram...

Step-by-Step Walkthrough

  1. 1PostgreSQL kv_store stores all authoritative data, partitioned by key hash across 16 partitions for balanced I/O
  2. 2Each Redis shard holds approximately 33% of the keyspace as determined by the consistent hash ring (MurmurHash3 + 450 virtual nodes)
  3. 3Keys are NOT replicated across shards — each key exists on exactly one shard. Shard failure loses ~33% of cached data
  4. 4Kafka cache-admin topic carries operational events (shard_added, warmup_requested) consumed by WarmupWorker for cold-start mitigation

Pseudocode

-- PostgreSQL: Source of truth (partitioned)
CREATE TABLE kv_store (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL,
    updated_at TIMESTAMPTZ DEFAULT now()
) PARTITION BY HASH (key);

-- Consistent hash ring routing (pseudocode)
-- MurmurHash3("user:12345") = 0xA3F2B1C0
-- Ring lookup: nearest vnode clockwise → Shard 2
-- Redis Shard 2: SET "user:12345" "..." EX 300

-- Virtual node distribution:
-- Shard 1: 150 vnodes → owns hash ranges [0x00..., 0x15..., ...]
-- Shard 2: 150 vnodes → owns hash ranges [0x05..., 0x1A..., ...]
-- Shard 3: 150 vnodes → owns hash ranges [0x0B..., 0x20..., ...]
-- Total: 450 vnodes on the ring, ~5% distribution variance
Key Design Decisions
Consistent Hashing vs Modulo-N

Choice

Consistent hash ring with 150 virtual nodes per shard instead of key % N

Rationale

Modulo-N (key % N shards) re-hashes ALL keys when a shard is added or removed. With 3 shards holding 10M keys, adding a 4th shard causes ~7.5M keys to be re-assigned — a massive cache miss storm. Consistent hashing only re-hashes ~1/N of keys (2.5M in this case). With 150 virtual nodes per physical shard (450 total), the key distribution is balanced within 5% variance, and shard additions/removals cause minimal disruption. This is the same principle used by Amazon DynamoDB's partition ring.

Proxy-Based Routing vs Client-Side Hashing

Choice

CacheProxyService handles all routing decisions centrally

Rationale

Client-side hashing (Memcached's ketama model) pushes ring management to every application service. With hundreds of microservices, keeping hash rings in sync across all clients is operationally expensive — a ring configuration update must propagate to every client simultaneously, or keys are routed to wrong shards. CacheProxyService centralizes ring management, connection pooling, and observability. The trade-off is one extra network hop (~1ms) per request. Twitter's Twemproxy and Meta's mcrouter use this same centralized proxy model.

Single-Flight Coalescing

Choice

Proxy deduplicates concurrent cache misses for the same key

Rationale

When a popular key expires or is evicted, hundreds of concurrent requests may miss simultaneously (thundering herd). Without coalescing, all 100 requests hit the database independently — 100 identical SELECT queries for the same key. Single-flight ensures only 1 DB query per key per miss window. The other 99 requests join a wait group and receive the result when the first query completes. This reduces database load by up to 100x during stampede scenarios and is the single most important improvement over the naive V0 approach.

WarmupWorker for Cold Start Mitigation

Choice

Dedicated worker pre-populates shards on startup or replacement

Rationale

A cold cache shard (after restart or replacement) has 100% miss rate for its ~33% of the keyspace. At 100K peak RPS, that means ~33K additional database queries per second until the shard warms up. WarmupWorker pre-loads hot keys based on access frequency data from OriginDB, reducing the cold-start impact from minutes to seconds. This is critical for maintaining latency SLOs during deployments and scaling events.

Independent Shards (No Cross-Shard Communication)

Choice

Each shard operates independently with no consensus protocol

Rationale

Redis shards do not communicate with each other — no gossip protocol, no leader election, no distributed locking. This eliminates consensus overhead and makes each shard's failure independent. The trade-off is no cross-shard operations: MGET across shards requires the proxy to fan out to all shards and merge results (latency proportional to the slowest shard). Transactions (MULTI/EXEC) are limited to keys on a single shard.

Scale & Performance

Target RPS

100K peak (3 shards x ~33K each)

Latency (p99)

<1ms cache hit, ~20ms cache miss (with single-flight coalescing)

Storage

78 GB aggregate (3 x 26 GB shards)

Availability

99.5% (3 independent shards, no replication)

Time & Space Complexity
OperationTimeSpaceNotes
Consistent hash ring lookupO(log V) — binary search on V virtual nodesO(V) — V virtual node entries in the ringWith 450 virtual nodes, the binary search is negligible (~9 comparisons). The ring is stored in a sorted array and searched via binary search. MurmurHash3 computation is O(key_length).
Cache GET (via proxy + Redis)O(1) — hash ring lookup + Redis hash table GETO(1) — single key-value pairTotal latency: ~1ms (0.1ms hash ring + 0.5ms network to proxy + 0.3ms Redis GET + 0.1ms return). The proxy hop adds ~1ms compared to direct client-to-Redis connections.
Single-flight coalesced missO(1) amortized — one DB query serves N concurrent waitersO(N) — N pending requests in the wait groupThe first miss triggers a database query (~12ms). Concurrent misses for the same key join a wait group. When the query completes, all N waiters receive the result. The database sees 1 query instead of N — a 100x reduction for hot keys.
Shard rebalance (add/remove shard)O(K/N) — re-hash K/N keys where K is total keys and N is shard countO(K/N) — migrated keys temporarily exist on both old and new shardsAdding a 4th shard to 3 shards re-hashes ~25% of keys. With 10M keys, approximately 2.5M keys migrate. Migration happens organically: keys on the old shard expire (TTL) and are backfilled to the new shard on next access.
Database Schema (HLD)
kv_store (PostgreSQL / OriginDB)

Source of truth for all cached data. CacheProxyService queries this table on cache miss (with single-flight coalescing) and backfills the appropriate shard. At 92% hit rate with 100K peak RPS, OriginDB sees approximately 8K read ops/sec. Partitioned by key hash across 16 partitions.

key TEXT PKvalue TEXT NOT NULLupdated_at TIMESTAMPTZ DEFAULT now()

Indexes: PK on key (B-tree, partitioned)

Strong consistency — single primary handles writes, read replicas serve cache-miss reads. No TTL on the database side — data persists indefinitely until explicitly deleted. The database is the authoritative source; the cache is ephemeral.

CacheShard 1/2/3 (Redis)

Three independent Redis instances, each holding ~33% of the keyspace as determined by the consistent hash ring. 150 virtual nodes per shard (450 total) ensure balanced key distribution within 5% variance.

key STRING (Redis key, routed by MurmurHash3 + consistent hash ring)value STRING (cached value)TTL INTEGER (300 seconds default)

Indexes: Redis hash table (O(1) GET/SET per shard)

Each shard holds approximately 20GB of data (26GB capacity with headroom). LRU eviction kicks in under memory pressure. No replication — if a shard fails, its ~33% of the keyspace is lost. Single-flight coalescing on the proxy prevents thundering herd during shard replacement.

cache-admin (Kafka topic)

Administrative events for cache cluster management. Consumed by WarmupWorker for shard warmup and scaling operations.

event_type TEXT (shard_added | warmup_requested | eviction_alert)shard_id TEXT (target shard identifier)timestamp BIGINT

Indexes: Partitioned by shard_id (8 partitions)

Low throughput — admin events are infrequent (during deployments, scaling events, or node replacements). WarmupWorker consumes events and pre-populates the target shard by scanning OriginDB for hot keys.

Event Contracts
cache_admincache-admin

Administrative events for cache cluster management, published on shard additions, removals, and warmup triggers. Consumed by WarmupWorker.

Key Schema

shard_id (string)

Value Schema

{ event_type: "shard_added" | "warmup_requested" | "eviction_alert", shard_id: string, timestamp: number }

What-If Scenarios

One of three shards fails (hardware failure, OOM crash)

Impact

~33% of the keyspace becomes unavailable. The consistent hash ring is updated to remove the failed shard. Keys formerly on the failed shard are now assigned to the next shard clockwise, which initially has 0% hit rate for those keys. Database load increases by ~33K QPS at peak. Single-flight coalescing limits the actual DB query increase to unique keys only.

Mitigation

WarmupWorker detects the shard_removed event and pre-populates the replacement shard with hot keys from OriginDB. Single-flight coalescing on CacheProxyService prevents the thundering herd. The V2 variant eliminates this entirely with replica auto-promotion — no data loss, no miss storm.

Hot key overwhelms a single shard (viral content)

Impact

One key receives 50K+ ops/sec, concentrated on a single shard (consistent hashing maps the key to exactly one shard). That shard's CPU saturates while the other two shards are idle. p99 latency for all keys on the affected shard spikes from 1ms to 10ms+.

Mitigation

The V2 variant's L1 in-process cache solves this: CacheProxy detects hot keys (>1000 ops/sec) and broadcasts them to AppService pods for local caching. In V1, the proxy can replicate hot keys across multiple shards (key:{shard_suffix}) at the cost of storage duplication.

CacheProxyService crash (all pods unavailable)

Impact

Total cache outage — all cache operations fail because the proxy is the only path to the Redis shards. 100% of traffic falls through to OriginDB, exceeding its capacity by 12x (100K RPS vs 8K capacity).

Mitigation

Run CacheProxyService as a multi-AZ deployment with at least 4 pods (2 per AZ). Health check interval: 5 seconds. Auto-scaling based on CPU and request queue depth. Circuit breaker in upstream services to fail fast when the proxy is unavailable.

Network partition between CacheProxyService and one shard

Impact

Requests routed to the partitioned shard timeout (30 second TCP timeout). The proxy marks the shard as unhealthy and removes it from the ring. Keys are re-routed to the next shard (cold miss + DB fallback). When the partition heals, the shard is re-added, but its data is stale (writes during the partition were missed).

Mitigation

Reduce health check timeout to 2 seconds for faster detection. Implement retry with failover to the next shard clockwise on the ring. On partition heal, the WarmupWorker triggers a warmup to backfill any stale keys.

Failure Modes & Resilience
ComponentFailureImpactMitigation
CacheProxyServiceThread exhaustion from slow shard responsesIf one shard is slow (GC pause, network issue), proxy threads waiting for responses pile up, eventually exhausting all 1600 threads (8 pods x 200). All cache operations stall — healthy shards are inaccessible because the proxy has no available threads.Implement per-shard connection pools with timeout (500ms). Use bulkhead pattern: dedicate separate thread pools per shard so a slow shard cannot exhaust threads needed for healthy shards. Set per-shard circuit breaker: open after 5 consecutive timeouts, close after 30 seconds of healthy responses.
CacheShard (Redis)Memory exhaustion causing OOM with LRU evictionUnder allkeys-lru eviction, Redis evicts the least recently used keys to make room. Hit rate drops from 92% to 60-70% as frequently accessed keys are evicted. Database load increases proportionally.Monitor Redis memory utilization per shard. Alert at 70%, critical at 85%. Scale by adding a 4th shard — consistent hashing re-distributes ~25% of keys automatically. Alternatively, increase instance size to cache.r7g.2xlarge (52GB).
OriginDB (PostgreSQL)Connection pool exhaustion during cold-start miss stormWarmupWorker scanning OriginDB for hot keys (parallel SELECT queries) plus organic miss-and-backfill queries exceed the 2000 connection limit. New queries fail with 'too many connections' error.Rate-limit WarmupWorker's parallel scan threads (5 concurrent queries max). Use PgBouncer for connection pooling in transaction mode. Implement query priority: cache-miss backfills take priority over warmup scans.
AdminStream (Kafka)Broker failure causing partition unavailabilityWarmupWorker cannot consume admin events. New shard additions are not detected — warmup does not trigger, causing extended cold-start miss storms.MSK with 3-way replication (replication factor = 3). In-sync replicas = 2. WarmupWorker also polls CacheProxyService's ring membership as a fallback — if a new shard appears in the ring but no Kafka event was received, trigger warmup manually.
Scaling Strategy

Horizontal scaling via shard count increase (3 -> 4 -> 6 -> 12 shards). Each shard addition increases capacity by ~33K ops/sec and ~26GB memory. Consistent hashing ensures only ~1/N keys re-hash on shard addition. CacheProxyService auto-scales based on CPU and request queue depth: add pods when CPU > 70% for 3 minutes. WarmupWorker scales based on AdminStream consumer lag. The architecture scales linearly to ~1M ops/sec with 30 shards without architectural changes. Beyond 1M ops/sec, the CacheProxyService becomes the bottleneck — at that point, the V2 variant's L1 in-process cache absorbs the hot traffic before it reaches the proxy.

Monitoring & Alerting

Key metrics to monitor: (1) Per-shard hit rate — should be ~92% each. Significant variance between shards indicates hash ring imbalance. (2) Single-flight coalescing rate — percentage of cache misses served from wait groups instead of independent DB queries. Higher is better. (3) Ring membership changes — alerts on shard additions/removals for operational awareness. (4) Proxy latency per shard — detect slow shards before they cause thread exhaustion. (5) WarmupWorker completion time — should complete within 60 seconds for a new shard. (6) Rebalance key migration count — track how many keys are re-hashed during ring changes. Dashboard: Grafana with panels for per-shard hit rate, single-flight coalescing ratio, proxy latency histogram per shard, ring membership timeline, and DB fallback QPS. SLIs: cache hit p99 < 2ms, coalesced miss p99 < 25ms, hit rate > 90%, proxy availability > 99.9%.

Cost Analysis

At 100K ops/sec peak: 3 Redis shards cache.r7g.xlarge (~$525/month), PostgreSQL db.r7g.xlarge (~$350/month), ECS Fargate 8 proxy pods + 5 warmup workers (~$520/month), MSK kafka.m7g.large (~$200/month), ALB + API Gateway (~$200/month). Total: ~$1,800/month. Per-operation cost: $0.018/1K-ops — 3.8x cheaper than the naive V0 at scale ($0.069/1K-ops). Adding a 4th shard (+$175/month) increases capacity by 33% to 133K ops/sec. The architecture scales linearly in cost with throughput.

Security Considerations

Service-to-service authentication: mTLS between ApiGateway and CacheProxyService, and between CacheProxyService and Redis shards. Redis AUTH password required on all shard connections. Network isolation: Redis shards in private subnets, accessible only from CacheProxyService security group. No public endpoints for any cache component. TLS in-transit encryption between all components. Data sensitivity: the cache stores application data — if the source data contains PII, the cached copy inherits the same sensitivity classification. Implement data classification tags on cache entries if compliance requires it. Access control: CacheProxyService acts as the sole gateway to Redis — applications cannot bypass the proxy to access shards directly, ensuring all access is authenticated and auditable.

Deployment Strategy

Blue-green deployment for CacheProxyService — deploy new version alongside old, switch traffic at ALB level after health check passes. Rolling deployment for WarmupWorker (2 workers at a time). Redis shard upgrades require a brief maintenance window (~30 seconds for ElastiCache parameter group apply). Shard additions: deploy new shard, add to ring on CacheProxyService, trigger warmup via AdminStream event. No application deployment needed for shard scaling — the proxy handles all routing changes. Canary deployment for proxy code changes: route 5% of traffic to new version, monitor hit rate and latency, promote to 100% after 15 minutes of stable metrics.

Real-World Examples
  • Twitter's Twemproxy (nutcracker) implements exactly this architecture: a proxy layer with consistent hashing to route keys to backend Memcached/Redis shards, with single-flight coalescing for thundering herd prevention
  • Meta's mcrouter serves billions of cache operations per second using a consistent hash ring to route keys to thousands of Memcached instances across multiple data centers
  • Netflix's EVCache uses a consistent hashing approach with zone-aware routing to distribute cache operations across AWS availability zones
  • AWS ElastiCache for Memcached uses client-side consistent hashing (no proxy) with the ketama algorithm for key-to-node routing
Solution Comparison
VariantTierLatencyThroughputCostComplexityReliability
V0: Naive (Single Redis Node)T1<1ms hit, ~20ms miss~10K ops/sec$695/monthLow99% (single node)
V1: Consistent Hash ShardingT2<1ms hit, ~20ms miss100K ops/sec$1,800/monthMedium99.5% (3 shards, no replication)
V2: Redis Cluster + Replication + Multi-TierT3<100us L1 hit, <1ms L2 hit500K ops/sec$4,500/monthHigh99.99% (replicated, auto-failover)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why consistent hashing instead of modulo-N for cache sharding?

Modulo-N (key % N) re-hashes ALL keys when N changes. Adding a 4th shard to a 3-shard cluster causes 75% of keys to map to a different shard — a massive cache miss storm that overwhelms the database. Consistent hashing maps keys and shards onto a virtual ring. Adding a shard only re-hashes the keys between the new shard and its predecessor — approximately 1/N of the total keyspace (25% for 3->4 shards). Virtual nodes (150 per shard) ensure even distribution despite the randomness of hash placement. This is the same algorithm used by Amazon DynamoDB, Apache Cassandra, and Memcached's ketama driver.

What is single-flight coalescing and why is it critical?

Single-flight coalescing prevents the thundering herd problem. When a popular key expires or is evicted, hundreds of concurrent requests miss on the same key simultaneously. Without coalescing, each miss independently queries the database — 100 concurrent misses generate 100 identical queries. Single-flight tracks in-flight cache miss requests by key. When a second request misses on a key that already has an in-flight database query, it joins a wait group instead of issuing a duplicate query. When the first query completes, all waiters receive the result. This reduces database load by the concurrency factor — potentially 100x or more for hot keys. Golang's singleflight package implements this pattern natively.

How do virtual nodes improve consistent hashing distribution?

With only 3 physical nodes on the hash ring, key distribution can be highly uneven — one node might own 50% of the ring by chance. Virtual nodes solve this by placing 150 hash points per physical node (450 total points on the ring). Each virtual node maps to its physical node. With 450 evenly distributed points, each physical node owns approximately 33% +/- 5% of the keyspace. The more virtual nodes, the more even the distribution — but at the cost of more memory for the ring data structure and slightly slower ring lookups (binary search over 450 entries vs 3).

What happens when a shard fails in the consistent hashing model?

When a shard fails, its ~33% of the keyspace becomes unavailable. The CacheProxyService detects the failure via health checks (typically within 5-10 seconds) and removes the shard from the hash ring. The affected keys are now assigned to the next shard clockwise on the ring, which initially has 0% hit rate for those keys — they must be fetched from OriginDB and backfilled. Single-flight coalescing limits the database load during this transition. Unlike the naive V0 approach (100% cache loss on failure), only ~33% of keys are affected. The V2 variant eliminates this entirely with replicas that promote automatically.

Why a WarmupWorker instead of just letting the cache warm up organically?

Organic warmup (miss-and-backfill) works but is slow and dangerous. A cold shard sees 100% miss rate for its ~33% of the keyspace. At 100K peak RPS, that means ~33K additional database queries per second. Even with single-flight coalescing, the number of unique keys missing is large enough to stress the database. WarmupWorker pre-loads hot keys proactively: it queries OriginDB for the most frequently accessed keys (derived from access logs or a key frequency counter) and pre-populates the new shard. This reduces the organic warmup period from 10-30 minutes to under 60 seconds.

Related Templates

Discussion

Sign in to join the discussion.

Ready to design your own Distributed Cache?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator