Time-to-live (TTL) determines when cached entries expire. Hard TTL removes entries immediately at expiry, soft TTL serves stale data while refreshing in the background, and jittered TTL adds randomness to prevent synchronized expiration stampedes.
Time-to-live (TTL) is the fundamental mechanism for controlling data freshness in caches. Every cached entry has a lifespan -- when that lifespan expires, the entry must be refreshed from the source of truth. The TTL value directly controls the trade-off between data freshness and cache efficiency: short TTLs ensure data is never far from current but reduce hit rates and increase database load; long TTLs maximize hit rates but allow stale data to persist. Choosing the right TTL and expiration strategy is one of the most impactful decisions in caching architecture.
Hard TTL is the simplest strategy: when an entry's TTL expires, it is immediately deleted (or marked for lazy deletion on the next access). The next request for that key results in a cache miss, triggering a fresh fetch from the database. Hard TTL is easy to implement and reason about, but it creates a predictable vulnerability: when a popular cache entry expires, all concurrent requests for that key simultaneously miss the cache and hit the database. This is the thundering herd problem, and it is particularly dangerous for entries that were set at the same time (e.g., during cache warming) because they all expire simultaneously.
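The lazy-deletion variant of hard TTL described above can be sketched in a few lines. This is a minimal illustration, not a production cache: the class name `HardTTLCache` and the injectable `clock` parameter are assumptions made here so the behavior can be exercised without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class HardTTLCache:
    """Hard-TTL cache with lazy deletion: expired entries are dropped on access."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # hypothetical hook so tests can control time
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # lazy deletion on the first access after expiry
            return None  # cache miss: the caller must fetch from the database
        return value


# Usage with a fake clock: the entry is a hit just before 300s and a miss at 300s.
now = [0.0]
cache = HardTTLCache(ttl_seconds=300, clock=lambda: now[0])
cache.set("user:1", "alice")
now[0] = 299.0
hit = cache.get("user:1")
now[0] = 300.0
miss = cache.get("user:1")
```

Every caller that sees `None` at the same moment goes to the database independently, which is exactly the stampede the paragraph describes.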
Soft TTL, also known as stale-while-revalidate, addresses the thundering herd by separating the concepts of 'stale' and 'expired.' An entry has two timestamps: a soft TTL (when it becomes stale) and a hard TTL (when it is truly expired). When a request arrives for a stale-but-not-expired entry, the cache serves the stale data immediately (fast response) and triggers an asynchronous background refresh, ideally only one per key so that concurrent requests do not each start their own. Requests that arrive after the refresh completes get the refreshed data. HTTP Cache-Control headers support this natively via the stale-while-revalidate directive. This pattern removes the cache miss latency penalty for popular keys: as long as the entry is refreshed before its hard TTL passes, the user never waits for a refresh.
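One way to picture the two-timestamp scheme is the sketch below. It is an illustration under stated assumptions rather than a reference implementation: `SoftTTLCache`, the `loader` callback, and the use of a small thread pool for background refreshes are all choices made here. Only one refresh per key is allowed in flight; cold misses are loaded synchronously and are not deduplicated.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple


class SoftTTLCache:
    """Serve stale entries between soft and hard TTL while refreshing in the background."""

    def __init__(self, loader: Callable[[str], Any], soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader          # fetches from the source of truth
        self.soft_ttl = soft_ttl      # age at which an entry becomes stale
        self.hard_ttl = hard_ttl      # age at which an entry is truly expired
        self.clock = clock
        self._lock = threading.Lock()
        self._store: Dict[str, Tuple[Any, float, float]] = {}  # key -> (value, stale_at, expires_at)
        self._inflight: Dict[str, Future] = {}                 # keys with a refresh in progress
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _load_and_store(self, key: str) -> Any:
        value = self.loader(key)
        now = self.clock()
        with self._lock:
            self._store[key] = (value, now + self.soft_ttl, now + self.hard_ttl)
        return value

    def _refresh(self, key: str) -> None:
        try:
            self._load_and_store(key)
        finally:
            with self._lock:
                self._inflight.pop(key, None)

    def get(self, key: str) -> Any:
        now = self.clock()
        with self._lock:
            entry = self._store.get(key)
            if entry is not None and now < entry[2]:
                value, stale_at, _ = entry
                if now >= stale_at and key not in self._inflight:
                    # Stale but not expired: schedule a single background refresh.
                    self._inflight[key] = self._pool.submit(self._refresh, key)
                return value  # fresh or stale, served without waiting
        # Missing or past the hard TTL: the caller waits for a synchronous load.
        return self._load_and_store(key)
```

With `soft_ttl=60` and `hard_ttl=120`, a request at age 90 returns the old value at once and a later request sees the refreshed one, provided the background load has finished by then.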
Jittered TTL adds random variance to expiration times to prevent synchronized expiry. Instead of setting TTL=300s for all entries, you set TTL=300s + random(-30s, +30s). For entries written at the same moment, this spreads expirations over a 60-second window (270s to 330s) instead of having all entries expire at exactly the same moment. Jitter is critical when cache warming or bulk loading populates many entries simultaneously -- without jitter, all those entries expire at once, creating a stampede. CDNs like Cloudflare use tiered TTLs with jitter across their caching layers: edge caches use shorter TTLs (30-60s) with jitter, regional tiers use medium TTLs (300s), and origin shields use longer TTLs (3600s). Sliding TTL is another variant that resets the expiration timer on every access, keeping hot entries alive indefinitely while cold entries naturally expire.
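The `300s + random(-30s, +30s)` rule can be written directly. The sketch below is illustrative: `jittered_ttl` and `busiest_second` are hypothetical helper names, and the seed is fixed only to make the comparison repeatable. It compares how many of 10,000 simultaneously written entries expire within the same second, with and without jitter.

```python
import random
from collections import Counter
from typing import Iterable, Optional


def jittered_ttl(base_seconds: float, jitter_seconds: float,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds plus a uniform random offset in [-jitter, +jitter]."""
    source = rng if rng is not None else random
    return base_seconds + source.uniform(-jitter_seconds, jitter_seconds)


def busiest_second(ttls: Iterable[float]) -> int:
    """Largest number of entries that expire within the same one-second bucket."""
    return max(Counter(int(t) for t in ttls).values())


rng = random.Random(42)  # seeded only so the demo is repeatable
uniform_ttls = [300.0] * 10_000
jittered_ttls = [jittered_ttl(300, 30, rng) for _ in range(10_000)]

print("uniform TTL, busiest second:", busiest_second(uniform_ttls))
print("jittered TTL, busiest second:", busiest_second(jittered_ttls))
```

With a uniform TTL all 10,000 entries land in one bucket; with jitter they are spread across roughly 60 buckets, so the worst single second carries only a small fraction of the load.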
The Milk Expiration Date
Think of TTL like expiration dates on milk cartons. Hard TTL: the milk is thrown away exactly on the expiration date, even if it is still fine. If everyone in the office bought milk on the same day, all the milk expires at once, and everyone rushes to the store simultaneously (thundering herd). Soft TTL: the milk is labeled 'best by' (soft TTL) and 'discard by' (hard TTL). After the 'best by' date, you still drink the milk but ask someone to pick up a fresh carton (background refresh). Jittered TTL: instead of all cartons showing the same date, each one has a slightly different expiration, so the office never runs out of all milk at once.
CDN Cache-Control Headers
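HTTP Cache-Control headers implement TTL strategies natively. max-age=300 marks a response as fresh for 300 seconds; after that it is stale and must be revalidated before reuse, which acts like a hard TTL. stale-while-revalidate=60 adds a 60-second soft TTL window after max-age expires, during which a cache may serve the stale content while it fetches fresh content from the origin in the background. stale-if-error=3600 allows serving stale content for up to an hour after it becomes stale if the origin is unreachable or returns an error. Each caching layer that receives the response (browser cache, CDN edge, CDN origin shield) can apply these directives, although support varies by cache.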
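To make the three windows concrete, the sketch below classifies a cached response by its age. It is a simplified illustration: `parse_cache_control` and `cache_state` are hypothetical helpers, the parser only handles `name=integer` directives, and real caches consider many more directives (for example `no-cache` or `s-maxage`).

```python
from typing import Dict


def parse_cache_control(header: str) -> Dict[str, int]:
    """Extract numeric directives such as max-age=300 from a Cache-Control value."""
    directives: Dict[str, int] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            directives[name.lower()] = int(value)
    return directives


def cache_state(age_seconds: int, directives: Dict[str, int]) -> str:
    """Classify a cached response; both stale windows start when max-age runs out."""
    max_age = directives.get("max-age", 0)
    swr = directives.get("stale-while-revalidate", 0)
    sie = directives.get("stale-if-error", 0)
    if age_seconds < max_age:
        return "fresh"                    # serve from cache
    if age_seconds < max_age + swr:
        return "stale-while-revalidate"   # serve stale, refresh in the background
    if age_seconds < max_age + sie:
        return "stale-if-error"           # revalidate first; serve stale only if the origin fails
    return "expired"                      # must fetch from the origin


header = "max-age=300, stale-while-revalidate=60, stale-if-error=3600"
directives = parse_cache_control(header)
states = [cache_state(age, directives) for age in (100, 330, 1000, 4000)]
```

For the header above, ages of 100, 330, 1000, and 4000 seconds fall into the fresh, stale-while-revalidate, stale-if-error, and expired windows respectively.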
Redis EXPIRE Command
Redis supports per-key TTL via the EXPIRE (seconds) and PEXPIRE (milliseconds) commands. Redis uses lazy expiration (a key is checked when it is accessed and deleted if it has expired) combined with periodic active expiration (a background task that by default runs 10 times per second, samples keys that have a TTL set, and deletes the ones that have expired). The combination ensures expired keys are eventually cleaned up without blocking the event loop for long. Redis does not natively support stale-while-revalidate, so application code must implement this pattern explicitly.
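The interplay of lazy and active expiration can be modeled in memory. The sketch below is a toy model loosely inspired by the description above, not Redis's actual implementation: `ExpiringStore`, `active_expire_cycle`, and the `sample_size` parameter are names and values assumed here for illustration.

```python
import random
import time
from typing import Any, Callable, Dict, Optional


class ExpiringStore:
    """Toy key-value store with lazy expiry on read plus sampled active expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 rng: Optional[random.Random] = None):
        self.clock = clock
        self.rng = rng or random.Random()
        self._data: Dict[str, Any] = {}
        self._expires: Dict[str, float] = {}  # only keys that have a TTL

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        self._data[key] = value
        if ttl is None:
            self._expires.pop(key, None)
        else:
            self._expires[key] = self.clock() + ttl

    def _expired(self, key: str) -> bool:
        deadline = self._expires.get(key)
        return deadline is not None and self.clock() >= deadline

    def _delete(self, key: str) -> None:
        self._data.pop(key, None)
        self._expires.pop(key, None)

    def get(self, key: str) -> Optional[Any]:
        if self._expired(key):
            self._delete(key)  # lazy expiration: cleaned up on access
            return None
        return self._data.get(key)

    def active_expire_cycle(self, sample_size: int = 20) -> int:
        """Sample keys that have a TTL and delete the expired ones; return how many."""
        candidates = list(self._expires)
        sample = self.rng.sample(candidates, min(sample_size, len(candidates)))
        removed = 0
        for key in sample:
            if self._expired(key):
                self._delete(key)
                removed += 1
        return removed
```

Lazy expiry alone would leave never-read keys in memory forever; a periodic call to `active_expire_cycle` reclaims them a sample at a time, which bounds the work done per cycle.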
Cloudflare
Cloudflare uses tiered TTLs across its CDN infrastructure. Edge data centers (close to users) cache with short TTLs (30-60 seconds) to minimize staleness. Regional tiers (shared across nearby edges) use medium TTLs (300 seconds). Origin shields (closest to the customer's origin server) use long TTLs (3600 seconds). Each tier adds jitter to prevent synchronized expiry. This tiered approach reduces origin traffic by 99%+ while keeping edge content relatively fresh.
| Aspect | Description |
|---|---|
| Data Freshness vs Cache Hit Rate | The fundamental TTL trade-off. Short TTLs (5-30 seconds) keep data nearly current but result in frequent cache misses, increasing database load. Long TTLs (5-60 minutes) maximize hit rates but allow data to be significantly stale. The optimal TTL depends on how frequently the underlying data changes and how much staleness the application can tolerate. |
| Simplicity (Hard TTL) vs Resilience (Soft TTL) | Hard TTL is trivial to implement but creates predictable stampede points. Soft TTL (stale-while-revalidate) eliminates stampedes and cache miss latency but adds complexity: the cache must track two timestamps, trigger background refreshes, and handle concurrent refresh requests. Most applications benefit from soft TTL on high-traffic keys. |
| Uniform TTL vs Per-Key TTL | A global TTL is simple to configure and reason about but applies the same freshness requirement to all data. Per-key TTL matches TTL to each data type's change frequency but increases configuration complexity and makes cache behavior harder to predict. A good middle ground is per-category TTL (e.g., user data: 5 min, product data: 1 min, static content: 24 hr). |
| Jitter Overhead vs Stampede Prevention | Jitter adds a small amount of randomness to TTL values, which slightly complicates debugging (you cannot predict exact expiry times) and marginally reduces average hit rates (some entries expire earlier than necessary). But it prevents potentially catastrophic synchronized expiry storms, making it a worthwhile trade-off for any cache with more than a few hundred entries. |
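The per-category middle ground from the table can be combined with jitter in one small helper. This is a sketch under assumptions: the `category:id` key naming, the `ttl_for` function, the default TTL, and the 10% jitter fraction are all illustrative choices; the base values mirror the example categories in the table.

```python
import random
from typing import Optional

# Base TTLs in seconds, following the example categories in the table above.
TTL_BY_CATEGORY = {
    "user": 5 * 60,          # user data: 5 min
    "product": 60,           # product data: 1 min
    "static": 24 * 60 * 60,  # static content: 24 hr
}
DEFAULT_TTL = 60  # hypothetical fallback for unknown categories


def ttl_for(key: str, jitter_fraction: float = 0.1,
            rng: Optional[random.Random] = None) -> float:
    """Pick the category TTL from a 'category:id' key and apply proportional jitter."""
    category = key.split(":", 1)[0]
    base = TTL_BY_CATEGORY.get(category, DEFAULT_TTL)
    source = rng if rng is not None else random
    return base * (1 + source.uniform(-jitter_fraction, jitter_fraction))
```

Proportional jitter scales with the base TTL, so a 24-hour entry is spread over hours while a 1-minute entry moves by only a few seconds.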
Cloudflare's Tiered TTL Strategy for Global Content Delivery
Scenario
Cloudflare serves over 20% of all web traffic through its global CDN. A single origin server might serve content to 200+ edge data centers worldwide. Without careful TTL management, every edge cache expiring simultaneously would create a coordinated stampede of 200+ requests hitting the customer's origin server at the same instant, potentially overwhelming it.
Solution
Cloudflare implemented a tiered caching architecture with cascading TTLs and jitter at every layer. Edge caches use short TTLs (30-60 seconds) with +/- 15% jitter, ensuring edge content stays relatively fresh while spreading expirations. Regional tiers aggregate requests from multiple edges with medium TTLs (5 minutes). Origin shields, the last layer before the customer's server, use long TTLs (1 hour). When an edge cache misses, it checks the regional tier before the origin shield, and only on a shield miss does the request reach the customer's origin. Each tier supports stale-while-revalidate to serve stale content during background refresh.
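The effect of cascading tiers on origin load can be simulated with a small model. This is not Cloudflare's implementation: the `Tier` class, the single regional tier and single shield, the one-request-per-edge-per-minute traffic pattern, and the omission of jitter and stale-while-revalidate are simplifying assumptions. The TTLs (60s, 300s, 3600s) and the 200 edges follow the figures in the text.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class Tier:
    """One cache layer; on a miss it asks the next layer up (its parent)."""

    def __init__(self, ttl: float, fetch_from_parent: Callable[[str], Any],
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.fetch_from_parent = fetch_from_parent
        self.clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                   # hit at this tier
        value = self.fetch_from_parent(key)   # miss: go one tier up
        self._store[key] = (value, now + self.ttl)
        return value


now = [0.0]
clock = lambda: now[0]
origin_requests: List[str] = []


def origin(key: str) -> str:
    origin_requests.append(key)  # count every request that reaches the origin
    return "content for " + key


shield = Tier(3600, origin, clock)
regional = Tier(300, shield.get, clock)
edges = [Tier(60, regional.get, clock) for _ in range(200)]

# Every edge requests the same object once a minute for one hour.
for second in range(0, 3600, 60):
    now[0] = float(second)
    for edge in edges:
        edge.get("/index.html")

print("edge requests:", 200 * 60, "origin requests:", len(origin_requests))
```

In this model the 12,000 edge requests in the first hour produce a single origin fetch, because each edge miss is absorbed by the regional tier or the shield.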
Outcome
The tiered TTL strategy reduces origin traffic by over 99% for popular content. Origin servers that would receive 200 simultaneous revalidation requests (one per edge) instead receive 1-3 requests per TTL window (aggregated through regional tiers). Jitter ensures that entries repopulated together, for example after a bulk cache purge following a deployment, do not all expire again at the same instant. The stale-while-revalidate behavior means users requesting popular content rarely experience the latency of a cold cache miss -- they typically get a response in single-digit milliseconds from the nearest edge.
1. What problem does jittered TTL solve that uniform TTL does not?
2. How does stale-while-revalidate differ from hard TTL expiration?
3. A cache entry has a 60-second TTL and receives 500 requests per second. How many cache hits occur per TTL window before the entry expires?