Vetora logo
🐎Caching

Cache Stampede (Thundering Herd)

A cache stampede occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously miss the cache and hit the database. Solutions include mutex/locking, probabilistic early expiration, stale-while-revalidate, and single-flight patterns.

Overview

A cache stampede, also known as the thundering herd problem, is one of the most dangerous failure modes in caching systems. It occurs when a popular cache entry expires and the flood of requests that were previously served by the cache simultaneously fall through to the database. If a key serves 10,000 requests per second and its TTL expires, all 10,000 requests in the next second will query the database with the same query, potentially overwhelming it with identical work. The database, designed to handle perhaps 1,000 queries per second total, is suddenly hit with 10x its capacity for a single key, causing cascading latency increases, timeouts, and potentially a full database outage.

The stampede problem is exacerbated by several factors. First, the most popular cache entries are the ones most likely to cause stampedes -- they serve the most traffic, so their expiry creates the largest surge. Second, the database query to regenerate the cache entry is often the most expensive query (that is why it was cached in the first place), so the database is hit with thousands of copies of its most expensive query simultaneously. Third, the stampede can cascade: the slow database responses cause the application to hold connections longer, exhausting connection pools, which causes timeouts in unrelated queries, which causes other cache entries to fail to refresh, which triggers more stampedes.

Four primary solutions address cache stampedes. (1) Mutex/lock: when a cache miss occurs, the first request acquires a lock (using Redis SETNX or similar) and refreshes the cache. Subsequent requests see the lock and either wait or return a stale value. This limits database load to one query per key per refresh cycle. (2) Probabilistic early expiration (XFetch): each request randomly decides whether to refresh the cache before TTL expires, with the probability increasing as TTL approaches. Statistically, one request will refresh early, preventing a synchronized miss. (3) Stale-while-revalidate: the cache serves stale data immediately while triggering a background refresh, eliminating the stampede entirely because no request ever experiences a true miss. (4) External refresh: a background cron job or scheduled task refreshes popular cache entries before they expire, removing the application from the refresh path entirely.

The single-flight pattern (Go's singleflight package, Node.js promise caching) is a request-level deduplication mechanism that complements these solutions. When multiple concurrent requests need the same data, single-flight ensures only one request actually executes the database query, and all other requests wait for and share the result. This is different from a distributed lock because it operates within a single process -- it deduplicates concurrent requests on the same server instance. Combined with a distributed mutex for cross-server deduplication, single-flight provides comprehensive stampede protection at both the local and distributed levels. The quantitative impact is significant: without protection, a popular key at 10,000 RPS generates 10,000 database queries in the first 100ms after expiry. With single-flight + mutex, it generates exactly 1.

Key Points
  • 1A cache stampede occurs when a popular key expires and thousands of concurrent requests simultaneously miss the cache, overwhelming the database with identical queries. The most popular keys cause the worst stampedes.
  • 2Mutex/lock pattern: the first request to miss the cache acquires a distributed lock (Redis SETNX) and refreshes the entry. Other requests either wait for the lock to release or serve a stale fallback. This limits DB load to one query per key per refresh.
  • 3Probabilistic early expiration (XFetch): each request computes a random value and refreshes the cache if it exceeds a threshold that increases as TTL approaches zero. Statistically, one request refreshes early, preventing synchronized expiry.
  • 4Stale-while-revalidate eliminates stampedes entirely by serving stale data immediately and refreshing asynchronously. No request ever experiences a true cache miss, so there is no thundering herd to prevent.
  • 5Single-flight (Go singleflight, promise deduplication) ensures only one concurrent request per key per process actually queries the database. Others wait for and share the result. This deduplicates at the instance level; pair with a distributed mutex for cluster-wide protection.
  • 6Quantitative impact: a key serving 10,000 RPS with hard TTL generates 10,000 DB queries in the first 100ms after expiry. With single-flight + distributed mutex, it generates exactly 1 query. The difference between 10,000 and 1 is the difference between a database outage and a non-event.
Simple Example

The Coffee Shop Analogy

A coffee shop brews a fresh pot every hour. At exactly 9:00 AM, the pot runs empty. The next 50 customers all arrive within minutes and each one asks the barista to brew a new pot. Without stampede protection, the barista tries to brew 50 pots simultaneously and the kitchen melts down. With a mutex: the first customer triggers a new brew, and a sign goes up saying 'Fresh pot brewing, please wait 5 minutes.' With stale-while-revalidate: the barista serves slightly old coffee from a thermos while brewing a fresh pot in the background -- nobody waits. With XFetch: one random customer at 8:55 AM asks for the pot to be refreshed early, so it is ready by 9:00 AM.

Real-World Examples

Facebook (Memcached Lease-Get)

Facebook's Memcached deployment, serving billions of requests per day, implements stampede prevention using a 'lease-get' mechanism. When a cache miss occurs, Memcached issues a short-lived lease token to the first requester. Subsequent requesters for the same key within the lease window receive a 'hot miss' response and are expected to wait briefly or use a stale value. Only the lease holder can write the refreshed value. This limits database queries to one per key per lease window, preventing stampedes across Facebook's massive cache infrastructure.

Varnish (Grace Mode)

Varnish HTTP cache implements stampede prevention through 'grace mode' -- a stale-while-revalidate mechanism. When a cached response's TTL expires, Varnish continues serving the stale response (within the grace period) while dispatching a single background request to the origin server. Concurrent requests for the same URL receive the stale response immediately. This eliminates thundering herd on popular pages entirely. Grace periods are typically 1-24 hours, providing resilience even during extended origin outages.

Reddit

Reddit experienced severe cache stampedes on front page posts, where a single popular post's cache expiry would trigger thousands of simultaneous database queries to regenerate the comment tree, vote counts, and ranking data. The team implemented a combination of probabilistic early refresh and single-flight deduplication. Hot posts are refreshed proactively by a background job before TTL expiry, and the single-flight pattern ensures that concurrent requests within the same application instance share a single database query result.

Trade-Offs
AspectDescription
Protection Strength vs Implementation ComplexityStale-while-revalidate provides the strongest protection (zero stampede) but requires tracking two TTLs and implementing background refresh. A distributed mutex adds external coordination (Redis lock). Single-flight is simpler but only deduplicates within a single process. The strongest protection combines all three at different layers but adds significant complexity.
Data Freshness vs Stampede RiskHard TTL provides the freshest data (always fetched on expiry) but creates stampede windows. Stale-while-revalidate and XFetch serve slightly stale data during the refresh window. For most applications, serving data that is a few seconds stale is vastly preferable to a database outage from an unprotected stampede.
Lock Contention vs Duplicate WorkA mutex approach prevents duplicate work (only one request refreshes) but introduces lock contention and potential deadlocks. If the lock holder crashes or times out, other requests are blocked. XFetch avoids locks entirely by relying on probability, but there is a small chance that zero requests refresh early (leaving the stampede unprotected) or multiple requests refresh simultaneously (wasted work but not catastrophic).
Proactive vs Reactive RefreshReactive approaches (mutex, XFetch, single-flight) respond to cache misses as they occur. Proactive refresh (cron job, background warmer) prevents misses entirely by refreshing popular keys before they expire. Proactive refresh eliminates stampedes completely but requires maintaining a list of keys to refresh and adds always-on infrastructure cost even during low-traffic periods.
Case Study

Facebook's Memcached Lease-Get for Stampede Prevention at Scale

Scenario

Facebook operates one of the world's largest Memcached deployments, serving trillions of cache requests per day across hundreds of thousands of cache servers. A single popular cache key (e.g., a viral post's engagement data) could receive millions of reads per second. When such a key expired, the thundering herd of cache misses would send millions of identical queries to the backend storage, overwhelming MySQL clusters and causing cascading failures across the social graph infrastructure.

Solution

Facebook designed the 'lease-get' protocol, published in their landmark 2013 paper 'Scaling Memcache at Facebook.' When the first request misses a cache key, Memcached issues a 64-bit lease token to that requester and marks the key as 'pending.' Subsequent requests for the same key within a 10-second window receive a notification that a lease is outstanding, and they either wait briefly (with exponential backoff) or return a slightly stale value from a secondary cache tier. Only the lease holder can SET the refreshed value, and the lease automatically expires after 10 seconds if the holder fails. This effectively serializes cache refreshes per key without requiring application-level distributed locks.

Outcome

The lease-get mechanism reduced peak database load during cache stampedes by over 99%. A key receiving 1 million reads per second would generate exactly 1 database query on cache miss instead of 1 million. This enabled Facebook to operate with much smaller database capacity because the worst-case load spike was eliminated. The pattern also prevented a class of stale-set bugs where concurrent reads would race to set outdated values. The 2013 paper documenting this approach has become one of the most cited references in distributed systems caching.

Common Mistakes
  • Not protecting popular cache keys against stampedes at all. If you have any key serving more than 100 RPS, cache expiry without protection will create a burst that can overwhelm the database. Implement at least single-flight deduplication for high-traffic keys.
  • Using a distributed mutex without a timeout. If the lock holder crashes or hangs, other requests wait indefinitely. Always set a short lock TTL (e.g., 5-10 seconds) that automatically releases the lock if the holder fails to refresh and release it.
  • Implementing XFetch with incorrect probability parameters. The refresh probability should increase as TTL approaches zero, not be a flat percentage. The original XFetch paper provides the formula: refresh if random() < delta * beta * log(random()), where delta is recomputation time and beta is a tuning parameter.
  • Conflating single-flight with distributed locking. Single-flight deduplicates concurrent requests within a single process instance. If you have 100 server instances, a cache miss still generates up to 100 database queries (one per instance) without a distributed lock or lease mechanism. Use both: single-flight within each instance, and a distributed mutex across instances.
Related Concepts

See Cache Stampede (Thundering Herd) in action

Explore system design templates that use cache stampede (thundering herd) and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Watch cache stampede happen in real-time and test single-flight mitigation

Metrics to watch
cache_hit_ratedb_query_ratep99_latency_ms
Run Simulation
Test Your Understanding

1A cache key serving 5,000 requests per second expires with hard TTL and no stampede protection. What is the approximate number of database queries generated in the first second?

2How does the stale-while-revalidate pattern prevent cache stampedes?

3What is the difference between single-flight deduplication and a distributed mutex for stampede prevention?

Deeper Reading