Vetora logo
🗑️Caching

Eviction Policies: LRU, LFU, ARC, TinyLFU

Eviction policies determine which cache entries to remove when the cache is full. LRU, LFU, ARC, and TinyLFU each optimize for different access patterns, balancing hit rate, memory overhead, and implementation complexity.

Overview

Every cache has a finite size, and when it is full, adding a new entry requires evicting an existing one. The eviction policy -- the algorithm that decides which entry to remove -- has a profound impact on cache hit rate and, by extension, system performance. A poorly chosen eviction policy can reduce hit rates by 10-30% compared to the optimal policy for a given workload, translating directly into increased database load, higher latency, and reduced throughput.

LRU (Least Recently Used) is the most widely deployed eviction policy. It evicts the entry that has not been accessed for the longest time, betting that recently accessed data is more likely to be accessed again (temporal locality). LRU can be implemented in O(1) using a doubly-linked list (for ordering) combined with a hash map (for lookup). Every access moves the entry to the head of the list; eviction removes from the tail. The clock algorithm is an efficient LRU approximation used in operating system page replacement, using a circular buffer with reference bits instead of a linked list. Redis implements several LRU variants: allkeys-lru (evict any key), volatile-lru (evict only keys with TTL), and an approximate LRU that samples a subset of keys to reduce overhead.

LFU (Least Frequently Used) evicts the entry with the fewest accesses, making it ideal for workloads with skewed access patterns where a small set of keys receives the vast majority of requests. However, naive LFU has a significant problem: entries that were popular in the past but are no longer accessed accumulate high frequency counts and resist eviction, polluting the cache with stale hot data. Solutions include frequency decay (halving counts periodically) and windowed LFU (counting accesses within a recent time window). ARC (Adaptive Replacement Cache), developed by IBM Research, solves the LRU-vs-LFU choice by maintaining two LRU lists -- one for recently accessed entries and one for frequently accessed entries -- and dynamically adjusting the balance between them based on observed access patterns.

TinyLFU, the algorithm behind the Caffeine caching library (the standard JVM cache), represents the state of the art. It uses a Count-Min Sketch -- a probabilistic data structure -- as an admission filter that estimates access frequency with minimal memory overhead. When a new entry arrives, TinyLFU compares its estimated frequency against the entry it would evict; the new entry is admitted only if it is predicted to be accessed more often. Window-TinyLFU (W-TinyLFU) adds a small admission window that uses LRU to handle burst access patterns, combining the strengths of both LRU and LFU. Benchmarks consistently show Caffeine's W-TinyLFU achieving near-optimal hit rates across diverse workloads, outperforming pure LRU by 5-15% and pure LFU by similar margins.

Key Points
  • 1LRU (Least Recently Used) evicts the entry not accessed for the longest time. Implemented in O(1) with a doubly-linked list + hash map. Simple, effective for temporal locality, and the default choice for most caches.
  • 2LFU (Least Frequently Used) evicts the entry with the fewest accesses. Better than LRU for skewed workloads (few hot keys, many cold keys) but requires frequency decay to avoid cache pollution from historically popular but now-cold entries.
  • 3ARC (Adaptive Replacement Cache) maintains two LRU lists and self-tunes the balance between recency and frequency based on observed access patterns. It adapts automatically to workload changes without manual configuration.
  • 4TinyLFU uses a Count-Min Sketch to estimate access frequency with minimal memory overhead (8 bits per entry). It admits new entries only if they are predicted to be more valuable than the entry they would evict, achieving near-optimal hit rates.
  • 5Segmented LRU divides the cache into hot, warm, and cold segments. Entries are promoted through segments on access and evicted from the cold segment. This provides better hit rates than simple LRU for mixed workloads.
  • 6The clock algorithm approximates LRU using a circular buffer with reference bits, avoiding the overhead of maintaining a linked list. It is the standard page replacement algorithm in operating systems like Linux and Windows.
Simple Example

The Bookshelf Analogy

You have a small bookshelf (cache) that holds 10 books, but you own 100 books stored in a closet (database). When the shelf is full and you want to add a new book, you need an eviction policy. LRU: remove the book you have not touched in the longest time. LFU: remove the book you have read the fewest times total. ARC: keep track of which strategy would have been better recently and adjust. TinyLFU: before putting a new book on the shelf, check if you are likely to read it more often than the book it would replace -- if not, leave the shelf as is.

Real-World Examples

Redis

Redis supports six eviction policies: noeviction (return errors when full), allkeys-lru, volatile-lru, allkeys-lfu, volatile-lfu, allkeys-random, and volatile-random. Redis uses an approximate LRU/LFU that samples a configurable number of keys (default 5) per eviction, trading perfect ordering for significantly lower overhead. The maxmemory-samples parameter controls the sample size -- increasing it improves accuracy but adds CPU cost per eviction.

Caffeine (W-TinyLFU)

Caffeine is the standard JVM caching library, used by Spring, Micronaut, and hundreds of production Java applications. Its W-TinyLFU policy maintains a small admission window (1% of cache size, LRU) and a main space (99%, segmented LRU). New entries enter the window; on window eviction, they compete against the main space victim using a Count-Min Sketch frequency estimate. Benchmarks show Caffeine achieving 3-15% higher hit rates than Guava Cache (LRU) across diverse workloads.

Memcached

Memcached uses a slab-class LRU eviction policy. Memory is divided into slab classes of different sizes, and each slab class maintains its own LRU list. When a slab class is full, the least recently used item in that class is evicted. This means a large item can only be evicted by another large item, preventing small hot items from evicting large cold items of different sizes. Modern Memcached adds segmented LRU within each slab class (HOT, WARM, COLD queues).

Trade-Offs
AspectDescription
Hit Rate vs Implementation ComplexitySimple LRU provides good hit rates with O(1) operations and minimal code complexity. TinyLFU achieves 5-15% higher hit rates but requires a Count-Min Sketch, admission logic, and segment management. For most applications, the marginal hit rate improvement from advanced policies does not justify the implementation complexity -- use a library like Caffeine instead of building from scratch.
Memory Overhead per EntryLRU requires two pointers per entry (doubly-linked list) plus a hash map entry. LFU adds a frequency counter. ARC doubles the metadata by maintaining two lists plus ghost entries. TinyLFU's Count-Min Sketch adds a fixed overhead (typically 8 bits per tracked entry) but scales better because the sketch size is independent of the cache size.
Adaptability vs PredictabilityARC and TinyLFU adapt automatically to workload changes, which is powerful but can make cache behavior harder to predict and debug. LRU's behavior is deterministic and easy to reason about: the least recently used entry is always evicted. For debugging cache issues, LRU's simplicity is an advantage.
Scan ResistanceLRU is vulnerable to scan pollution: a single full-table scan can flush the entire cache of hot entries. LFU and TinyLFU are scan-resistant because a one-time access does not give an entry enough frequency to displace frequently accessed entries. ARC handles scans by detecting the pattern and shifting toward frequency-based eviction.
Case Study

Caffeine's W-TinyLFU Replacing Guava Cache at a Large SaaS Platform

Scenario

A SaaS platform serving 50 million API requests per day used Guava Cache (LRU eviction) as its in-process L1 cache. Cache hit rates averaged 72%, with significant variance across workloads. Some API endpoints had highly skewed access patterns (few hot keys, many cold keys) where LRU performed poorly because cold keys accessed in bursts evicted hot keys, only to be evicted themselves shortly after.

Solution

The team replaced Guava Cache with Caffeine across all services -- a drop-in replacement requiring only a dependency swap and minor API changes. Caffeine's W-TinyLFU policy used a Count-Min Sketch to estimate access frequency, admitting new entries only if they were predicted to be accessed more frequently than the entry they would evict. The admission window handled burst patterns that pure LFU would have missed.

Outcome

Cache hit rates increased from 72% to 84% across all services, with some skewed-access endpoints seeing improvements from 60% to 88%. This 12-percentage-point average improvement reduced database query volume by approximately 40%, cutting database CPU utilization from 75% to 45%. The migration required no changes to business logic -- only the cache library was swapped. Caffeine's throughput was also higher than Guava's due to lock-free concurrent data structures.

Common Mistakes
  • Defaulting to LRU without considering the workload's access pattern. LRU performs poorly for skewed workloads (Zipfian distribution) where LFU or TinyLFU would achieve significantly higher hit rates. Profile your cache miss patterns before choosing a policy.
  • Implementing LRU with a simple sorted array or priority queue instead of a doubly-linked list + hash map. Array-based LRU requires O(n) operations per access, which becomes a bottleneck at scale. The linked list + hash map combination provides O(1) for both access and eviction.
  • Ignoring the scan pollution problem. A single batch job that iterates over the entire dataset can flush a warm LRU cache, degrading hit rates for hours. Use segmented LRU, LFU, or TinyLFU for workloads that mix point queries with scans.
  • Using time-based eviction (TTL) as a substitute for size-based eviction. TTL controls data freshness; eviction policies control memory usage. A cache needs both: TTL to prevent stale data and an eviction policy to prevent out-of-memory conditions.
Related Concepts

See Eviction Policies: LRU, LFU, ARC, TinyLFU in action

Explore system design templates that use eviction policies: lru, lfu, arc, tinylfu and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Compare LRU vs LFU eviction under varying workloads

Metrics to watch
cache_hit_ratioeviction_ratememory_usage_mbp99_latency_ms
Run Simulation
Test Your Understanding

1What data structures are used to implement LRU cache with O(1) time complexity for both access and eviction?

2Why is LRU vulnerable to cache pollution from full-table scans?

3What advantage does TinyLFU's Count-Min Sketch provide over traditional LFU frequency counters?

Deeper Reading