Caching

TTL Strategies (Hard, Soft, Jittered)

Time-to-live (TTL) determines when cached entries expire. Hard TTL removes entries immediately at expiry, soft TTL serves stale data while refreshing in the background, and jittered TTL adds randomness to prevent synchronized expiration stampedes.

Overview

Time-to-live (TTL) is the fundamental mechanism for controlling data freshness in caches. Every cached entry has a lifespan -- when that lifespan expires, the entry must be refreshed from the source of truth. The TTL value directly controls the trade-off between data freshness and cache efficiency: short TTLs ensure data is never far from current but reduce hit rates and increase database load; long TTLs maximize hit rates but allow stale data to persist. Choosing the right TTL and expiration strategy is one of the most impactful decisions in caching architecture.

Hard TTL is the simplest strategy: when an entry's TTL expires, it is immediately deleted (or marked for lazy deletion on the next access). The next request for that key results in a cache miss, triggering a fresh fetch from the database. Hard TTL is easy to implement and reason about, but it creates a predictable vulnerability: when a popular cache entry expires, all concurrent requests for that key simultaneously miss the cache and hit the database. This is the thundering herd problem, and it is particularly dangerous for entries that were set at the same time (e.g., during cache warming) because they all expire simultaneously.
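As a rough illustration of hard TTL with lazy deletion, the sketch below stores an expiry timestamp next to each value and drops the entry on the first access after that time. The class name `HardTTLCache`, the `get_or_load` helper, and the injectable `clock` parameter are assumptions made for this example, not part of any particular library.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class HardTTLCache:
    """In-memory cache where an entry is unusable the moment its TTL passes."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so behavior can be checked without sleeping
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazy deletion on first access after expiry
            return None
        return value

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Cache-aside read: on a miss, fetch from the source and cache it.

        Simplification: a cached None is indistinguishable from a miss here.
        """
        value = self.get(key)
        if value is None:
            value = loader()
            self.set(key, value)
        return value
```

Nothing in this sketch coordinates concurrent callers, so every request that arrives just after expiry would call `loader` itself. That is the thundering herd behavior described above.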

Soft TTL, also known as stale-while-revalidate, addresses the thundering herd by separating the concepts of 'stale' and 'expired.' An entry has two timestamps: a soft TTL (when it becomes stale) and a hard TTL (when it is truly expired). When a request arrives for a stale-but-not-expired entry, the cache serves the stale data immediately (fast response) and triggers an asynchronous background refresh. Once that refresh completes, subsequent requests get the refreshed data. HTTP Cache-Control headers support this natively via the stale-while-revalidate directive. This pattern removes the cache miss latency penalty for popular keys: as long as an entry is refreshed before its hard TTL passes, the user never waits for a refresh -- they always get a response immediately.
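A minimal sketch of the two-timestamp idea follows, assuming an in-memory store and a thread-based background refresh. `SoftTTLCache`, its `schedule` hook, and the `_refreshing` set (used so that only one refresh per key is in flight) are names invented for this example.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


def _run_in_thread(task: Callable[[], None]) -> None:
    threading.Thread(target=task, daemon=True).start()


class SoftTTLCache:
    """Serves stale entries between soft and hard TTL while refreshing them."""

    def __init__(self, soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic,
                 schedule: Callable[[Callable[[], None]], None] = _run_in_thread):
        if soft_ttl > hard_ttl:
            raise ValueError("soft_ttl must not exceed hard_ttl")
        self._soft, self._hard = soft_ttl, hard_ttl
        self._clock, self._schedule = clock, schedule
        # key -> (value, stale_at, expires_at)
        self._store: Dict[str, Tuple[Any, float, float]] = {}
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def _put(self, key: str, value: Any) -> None:
        now = self._clock()
        with self._lock:
            self._store[key] = (value, now + self._soft, now + self._hard)

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        with self._lock:
            entry = self._store.get(key)
        if entry is None or now >= entry[2]:
            value = loader()  # truly expired or absent: the caller waits
            self._put(key, value)
            return value
        value, stale_at, _ = entry
        if now >= stale_at:
            self._refresh_in_background(key, loader)
        return value  # fresh or stale, returned without waiting

    def _refresh_in_background(self, key: str, loader: Callable[[], Any]) -> None:
        with self._lock:
            if key in self._refreshing:
                return  # a refresh for this key is already in flight
            self._refreshing.add(key)

        def task() -> None:
            try:
                self._put(key, loader())
            finally:
                with self._lock:
                    self._refreshing.discard(key)

        self._schedule(task)
```

The hard-miss path in this sketch is not coalesced, so a cold or fully expired key can still be loaded by several callers at once; a per-key lock would be one way to close that gap.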

Jittered TTL adds random variance to expiration times to prevent synchronized expiry. Instead of setting TTL=300s for all entries, you set TTL=300s + random(-30s, +30s). This spreads expirations over a 60-second window instead of having all entries expire at exactly the same moment. Jitter is critical when cache warming or bulk loading populates many entries simultaneously -- without jitter, all those entries expire at once, creating a stampede. CDNs like Cloudflare use tiered TTLs with jitter across their caching layers: edge caches use shorter TTLs (60s) with jitter, regional tiers use medium TTLs (300s), and origin shields use longer TTLs (3600s). Sliding TTL is another variant that resets the expiration timer on every access, keeping hot entries alive indefinitely while cold entries naturally expire.
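The jitter calculation itself is small. The sketch below assumes symmetric, uniformly distributed jitter expressed as a fraction of the base TTL; the function name `jittered_ttl` and its parameters are illustrative.

```python
import random
from typing import Optional


def jittered_ttl(base_seconds: float, jitter_fraction: float = 0.1,
                 rng: Optional[random.Random] = None) -> float:
    """Return base_seconds plus or minus up to jitter_fraction of it."""
    if not 0 <= jitter_fraction < 1:
        raise ValueError("jitter_fraction must be in [0, 1)")
    source = rng if rng is not None else random  # module-level RNG by default
    spread = base_seconds * jitter_fraction
    return base_seconds + source.uniform(-spread, spread)


# Example: 300s with 10% jitter gives values between 270s and 330s,
# matching the TTL=300s + random(-30s, +30s) case in the text.
ttls = [jittered_ttl(300, 0.1) for _ in range(5)]
```

Passing a seeded `random.Random` instance makes the values reproducible, which can help when debugging expiry behavior.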

Key Points
  • Hard TTL deletes cache entries at a fixed time after creation. Simple to implement and reason about, but vulnerable to thundering herd when popular entries expire and all requests simultaneously hit the database.
  • Soft TTL (stale-while-revalidate) serves stale data immediately while refreshing in the background. The user never experiences cache miss latency for stale entries, and the cache is refreshed asynchronously without a stampede.
  • Jittered TTL adds random variance (e.g., +/- 10%) to prevent synchronized expiry. Essential when many entries are created simultaneously (cache warming, bulk loads) to spread expirations over time.
  • TTL selection requires balancing freshness vs hit rate. A 60-second TTL on a key accessed 100 times/sec means about 6,000 requests are served per TTL window from a single database fetch, but data could be up to 60 seconds stale. A 5-second TTL reduces staleness but serves only about 500 requests per fetch.
  • Per-key TTL allows different freshness requirements for different data types. User session tokens might use 30-minute TTL, product prices 5-minute TTL, and static metadata 24-hour TTL, each matching the data's change frequency.
  • Sliding TTL (reset on access) keeps hot entries alive indefinitely, which maximizes hit rate for popular keys but risks serving increasingly stale data. Best combined with a maximum TTL cap to bound staleness.
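The sliding TTL point can be sketched as follows, assuming an in-memory store. Each read pushes the expiry forward by `idle_ttl`, but never beyond `created_at + max_age`, which is the cap that bounds staleness. `SlidingTTLCache`, `idle_ttl`, and `max_age` are names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class SlidingTTLCache:
    """Resets expiry on every read, capped at max_age after creation."""

    def __init__(self, idle_ttl: float, max_age: float,
                 clock: Callable[[], float] = time.monotonic):
        self._idle_ttl = idle_ttl
        self._max_age = max_age
        self._clock = clock
        # key -> (value, created_at, expires_at)
        self._store: Dict[str, Tuple[Any, float, float]] = {}

    def set(self, key: str, value: Any) -> None:
        now = self._clock()
        expires_at = min(now + self._idle_ttl, now + self._max_age)
        self._store[key] = (value, now, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, created_at, expires_at = entry
        now = self._clock()
        if now >= expires_at:
            del self._store[key]  # idle too long, or the cap was reached
            return None
        # Slide the window forward, but never past the absolute cap.
        new_expiry = min(now + self._idle_ttl, created_at + self._max_age)
        self._store[key] = (value, created_at, new_expiry)
        return value
```

With `idle_ttl=10` and `max_age=25`, a key read every 8 seconds stays cached until 25 seconds after it was written, then expires regardless of how hot it is.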
Simple Example

The Milk Expiration Date

Think of TTL like expiration dates on milk cartons. Hard TTL: the milk is thrown away exactly on the expiration date, even if it is still fine. If everyone in the office bought milk on the same day, all the milk expires at once, and everyone rushes to the store simultaneously (thundering herd). Soft TTL: the milk is labeled 'best by' (soft TTL) and 'discard by' (hard TTL). After the 'best by' date, you still drink the milk but ask someone to pick up a fresh carton (background refresh). Jittered TTL: instead of all cartons showing the same date, each one has a slightly different expiration, so the office never runs out of all milk at once.

Real-World Examples

CDN Cache-Control Headers

HTTP Cache-Control headers implement TTL strategies natively. max-age=300 marks a response as fresh for 300 seconds; used on its own, it behaves like a hard TTL. stale-while-revalidate=60 adds a 60-second window after max-age expires, during which stale content is served while the CDN fetches fresh content from the origin. In the terms used above, max-age then plays the role of the soft TTL, and max-age plus the stale-while-revalidate window is the hard TTL. stale-if-error=3600 allows serving stale content for up to an hour if the origin is unreachable. These headers apply at each caching layer that honors them: browser cache, CDN edge, and CDN origin shield.
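To make the three directives concrete, the sketch below builds a Cache-Control header value and classifies a cached response by its age. The helper names `build_cache_control` and `classify_age` are hypothetical, and the classification ignores stale-if-error and other directives for simplicity.

```python
def build_cache_control(max_age: int, stale_while_revalidate: int = 0,
                        stale_if_error: int = 0) -> str:
    """Assemble a Cache-Control value from the directives discussed above."""
    parts = [f"max-age={max_age}"]
    if stale_while_revalidate:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if stale_if_error:
        parts.append(f"stale-if-error={stale_if_error}")
    return ", ".join(parts)


def classify_age(age_seconds: float, max_age: int,
                 stale_while_revalidate: int) -> str:
    """Describe how a cache could treat a response of the given age."""
    if age_seconds < max_age:
        return "fresh"  # serve from cache, no revalidation needed
    if age_seconds < max_age + stale_while_revalidate:
        return "stale-while-revalidate"  # serve stale, refresh in background
    return "expired"  # must revalidate or refetch before serving


header = build_cache_control(300, stale_while_revalidate=60, stale_if_error=3600)
# header == "max-age=300, stale-while-revalidate=60, stale-if-error=3600"
```

With these values, a response aged 310 seconds falls in the stale-while-revalidate window, and one aged 360 seconds or more is treated as expired.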

Redis EXPIRE Command

Redis supports per-key TTL via the EXPIRE (seconds) and PEXPIRE (milliseconds) commands. Redis uses lazy expiration (entries are checked on access) combined with periodic active expiration (by default, a background task runs 10 times per second, samples keys that have a TTL, and deletes the ones that have expired). The combination ensures expired keys are eventually cleaned up without blocking the event loop. Redis does not natively support stale-while-revalidate, so application code must implement this pattern explicitly.
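One possible way to layer stale-while-revalidate on top of Redis is to let Redis enforce the hard TTL and to store the soft deadline inside the value. The sketch below assumes a client object exposing `get(name)` and `set(name, value, ex=seconds)`, which is the shape of the redis-py client; the JSON envelope format and the helper names `cache_put` and `cache_get` are assumptions of this example.

```python
import json
import time
from typing import Any, Callable, Optional, Tuple


def cache_put(client: Any, key: str, value: Any, soft_ttl: int, hard_ttl: int,
              now: Callable[[], float] = time.time) -> None:
    """Store value with a soft deadline inside it and a hard TTL on the key."""
    envelope = json.dumps({"value": value, "fresh_until": now() + soft_ttl})
    client.set(key, envelope, ex=hard_ttl)  # Redis deletes the key at hard TTL


def cache_get(client: Any, key: str,
              now: Callable[[], float] = time.time) -> Optional[Tuple[Any, bool]]:
    """Return (value, is_stale), or None when the key is missing or expired."""
    raw = client.get(key)
    if raw is None:
        return None  # hard miss: the caller must load from the source
    envelope = json.loads(raw)
    is_stale = now() >= envelope["fresh_until"]
    return envelope["value"], is_stale
```

A caller would serve the value right away and, when `is_stale` is true, trigger a background reload followed by another `cache_put`. Deduplicating those reloads across processes is left out of this sketch.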

Cloudflare

Cloudflare uses tiered TTLs across its CDN infrastructure. Edge data centers (close to users) cache with short TTLs (30-60 seconds) to minimize staleness. Regional tiers (shared across nearby edges) use medium TTLs (300 seconds). Origin shields (closest to the customer's origin server) use long TTLs (3600 seconds). Each tier adds jitter to prevent synchronized expiry. This tiered approach reduces origin traffic by 99%+ while keeping edge content relatively fresh.

Trade-Offs
| Aspect | Description |
| --- | --- |
| Data Freshness vs Cache Hit Rate | The fundamental TTL trade-off. Short TTLs (5-30 seconds) keep data nearly current but result in frequent cache misses, increasing database load. Long TTLs (5-60 minutes) maximize hit rates but allow data to be significantly stale. The optimal TTL depends on how frequently the underlying data changes and how much staleness the application can tolerate. |
| Simplicity (Hard TTL) vs Resilience (Soft TTL) | Hard TTL is trivial to implement but creates predictable stampede points. Soft TTL (stale-while-revalidate) eliminates stampedes and cache miss latency but adds complexity: the cache must track two timestamps, trigger background refreshes, and handle concurrent refresh requests. Most applications benefit from soft TTL on high-traffic keys. |
| Uniform TTL vs Per-Key TTL | A global TTL is simple to configure and reason about but applies the same freshness requirement to all data. Per-key TTL matches TTL to each data type's change frequency but increases configuration complexity and makes cache behavior harder to predict. A good middle ground is per-category TTL (e.g., user data: 5 min, product data: 1 min, static content: 24 hr). |
| Jitter Overhead vs Stampede Prevention | Jitter adds a small amount of randomness to TTL values, which slightly complicates debugging (you cannot predict exact expiry times) and marginally reduces average hit rates (some entries expire earlier than necessary). But it prevents potentially catastrophic synchronized expiry storms, making it a worthwhile trade-off for any cache with more than a few hundred entries. |
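The per-category middle ground can be expressed as a small lookup table. The sketch below uses the example values from the table (user data 5 min, product data 1 min, static content 24 hr) and assumes keys are named `category:id`; that naming scheme, the default TTL for unknown categories, and the function name `ttl_for` are assumptions made for illustration.

```python
import random
from typing import Optional

# Base TTLs per category, in seconds (example values from the table above).
CATEGORY_TTL_SECONDS = {
    "user": 5 * 60,
    "product": 60,
    "static": 24 * 60 * 60,
}
DEFAULT_TTL_SECONDS = 60  # assumption: fallback for unknown categories


def ttl_for(key: str, jitter_fraction: float = 0.1,
            rng: Optional[random.Random] = None) -> float:
    """Pick the TTL for a key from its category prefix, then add jitter."""
    category = key.split(":", 1)[0]  # assumes "category:id" key names
    base = CATEGORY_TTL_SECONDS.get(category, DEFAULT_TTL_SECONDS)
    source = rng if rng is not None else random
    spread = base * jitter_fraction
    return base + source.uniform(-spread, spread)


# Example: a product key gets roughly 60s, a static asset roughly 24h.
product_ttl = ttl_for("product:1234")
static_ttl = ttl_for("static:logo.png")
```

Keeping the table in one place keeps cache behavior predictable while still letting each data type have its own freshness budget.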
Case Study

Cloudflare's Tiered TTL Strategy for Global Content Delivery

Scenario

Cloudflare serves over 20% of all web traffic through its global CDN. A single origin server might serve content to 200+ edge data centers worldwide. Without careful TTL management, every edge cache expiring simultaneously would create a coordinated stampede of 200+ requests hitting the customer's origin server at the same instant, potentially overwhelming it.

Solution

Cloudflare implemented a tiered caching architecture with cascading TTLs and jitter at every layer. Edge caches use short TTLs (30-60 seconds) with +/- 15% jitter, ensuring edge content stays relatively fresh while spreading expirations. Regional tiers aggregate requests from multiple edges with medium TTLs (5 minutes). Origin shields, the last layer before the customer's server, use long TTLs (1 hour). When an edge cache misses, it checks the regional tier before the origin shield, and only on a shield miss does the request reach the customer's origin. Each tier supports stale-while-revalidate to serve stale content during background refresh.
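The lookup order described above (edge, then regional tier, then origin shield, then origin) can be modeled with a few lines of code. This is a toy model of tiered caching in general, not Cloudflare's implementation; the `Tier` class and `tiered_get` function are invented for the example, and jitter and stale-while-revalidate are left out to keep it short.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class Tier:
    """One cache layer with its own hard TTL."""

    def __init__(self, name: str, ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.name, self._ttl, self._clock = name, ttl, clock
        self._store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None or self._clock() >= entry[1]:
            return None
        return entry[0]

    def put(self, key: str, value: Any) -> None:
        self._store[key] = (value, self._clock() + self._ttl)


def tiered_get(key: str, tiers: List[Tier],
               origin_fetch: Callable[[str], Any]) -> Any:
    """Check tiers nearest-first; only a miss in every tier reaches the origin."""
    missed: List[Tier] = []
    for tier in tiers:
        value = tier.get(key)
        if value is not None:
            break
        missed.append(tier)
    else:
        value = origin_fetch(key)  # loop finished without a hit in any tier
    for tier in missed:
        tier.put(key, value)  # backfill the tiers that missed
    return value


# Example wiring with the TTLs mentioned in the text.
tiers = [Tier("edge", 60), Tier("regional", 300), Tier("shield", 3600)]
```

Because the edge TTL is the shortest, most edge misses are absorbed by the regional tier or the shield, which is the aggregation effect the case study describes.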

Outcome

The tiered TTL strategy reduces origin traffic by over 99% for popular content. Origin servers that would receive 200 simultaneous revalidation requests (one per edge) instead receive 1-3 requests per TTL window (aggregated through regional tiers). Jitter ensures that even bulk cache purges (e.g., after a deployment) do not create synchronized expiry storms. The stale-while-revalidate behavior means users never experience the latency of a cold cache miss -- they always get a response in single-digit milliseconds from the nearest edge.

Common Mistakes
  • Using the same TTL for all data types. User session tokens, product prices, and static images have very different change frequencies and freshness requirements. Apply per-category TTLs that match each data type's update pattern.
  • Setting TTL to exactly round numbers (60s, 300s, 3600s) without jitter. When cache warming or bulk operations populate many entries with the same TTL, they all expire at the same moment. Always add jitter (+/- 10-20%) to prevent synchronized expiry.
  • Not implementing stale-while-revalidate for high-traffic keys. When a popular key with hard TTL expires, all concurrent requests miss the cache and hit the database simultaneously. Soft TTL eliminates this stampede by serving stale data during background refresh.
  • Setting TTL too short out of fear of stale data. A 5-second TTL on a key accessed 1,000 times/sec means about 5,000 requests are served per database fetch. Changing to 1 second increases database load for that key fivefold for only marginally fresher data. Measure the actual impact of staleness before shortening TTL.
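The arithmetic behind the TTL trade-off can be checked with a short calculation. The sketch below assumes a single hot key with a steady request rate and exactly one reload per TTL window (no stampede); the function name `ttl_tradeoff` and the returned field names are illustrative.

```python
from typing import Dict


def ttl_tradeoff(requests_per_second: float, ttl_seconds: float) -> Dict[str, float]:
    """Estimate hit ratio and reload rate for one key under a hard TTL.

    Assumes steady traffic and exactly one source reload per TTL window.
    """
    requests_per_window = requests_per_second * ttl_seconds
    if requests_per_window < 1:
        raise ValueError("fewer than one request per TTL window")
    return {
        "requests_per_window": requests_per_window,
        "hit_ratio": 1 - 1 / requests_per_window,  # one miss per window
        "reloads_per_second": 1 / ttl_seconds,
        "max_staleness_seconds": ttl_seconds,
    }


five = ttl_tradeoff(1000, 5)  # 5,000 requests per window
one = ttl_tradeoff(1000, 1)   # 1,000 requests per window
load_increase = one["reloads_per_second"] / five["reloads_per_second"]
```

Under these assumptions, going from a 5-second to a 1-second TTL raises reloads for the key from 0.2 to 1 per second, while maximum staleness drops by only 4 seconds.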
Test Your Understanding

1. What problem does jittered TTL solve that uniform TTL does not?

2. How does stale-while-revalidate differ from hard TTL expiration?

3. A cache entry has a 60-second TTL and receives 500 requests per second. How many cache hits occur per TTL window before the entry expires?

Deeper Reading