Cache

Storage

In-memory data store that accelerates reads by serving frequently accessed data without querying the primary database.

Overview

A cache is a high-speed, in-memory data store that sits between your application services and the primary database, serving frequently accessed data with sub-millisecond latency instead of the 5–50ms latency of a database query. Caching is one of the most effective performance optimization techniques in distributed systems: on read-heavy workloads, a well-configured cache can reduce database load by 80–95% and cut average response latency by an order of magnitude. In Vetora's simulator, the Cache component models hit rates, eviction policies, and the critical interaction between cache and database under varying load patterns.
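The latency and load figures above follow from simple weighted-average arithmetic. The sketch below is illustrative only: the function names and the sample numbers (0.5ms cache read, 20ms database query) are assumptions chosen from within the ranges quoted above, not values taken from Vetora.

```python
def effective_latency_ms(hit_rate: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency when every read checks the cache first.

    A hit costs one cache lookup; a miss costs the cache lookup plus
    the database query.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ms + (1.0 - hit_rate) * (cache_ms + db_ms)


def db_load_fraction(hit_rate: float) -> float:
    """Fraction of reads that still reach the database."""
    return 1.0 - hit_rate


# Assumed sample values: 0.5 ms cache read, 20 ms database query.
no_cache = effective_latency_ms(0.0, 0.5, 20.0)    # 20.5 ms, every read misses
warm_80 = effective_latency_ms(0.8, 0.5, 20.0)     # 4.5 ms
warm_95 = effective_latency_ms(0.95, 0.5, 20.0)    # about 1.5 ms
```

Under these assumed numbers, a 95% hit rate sends only 5% of reads to the database and brings the average from roughly 20ms down to roughly 1.5ms, which is the order-of-magnitude improvement described above.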

Cache-aside (lazy loading) is the most common caching pattern. The application first checks the cache for requested data. On a cache hit, it returns the cached value immediately. On a cache miss, it queries the database, stores the result in the cache for future requests, and returns it to the caller. This pattern is simple and widely used, but it has two challenges: the first request for any piece of data always hits the database (cold start), and cached data can become stale if the underlying database is updated without invalidating the cache.
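A minimal sketch of the cache-aside read path, assuming an in-process dictionary as a stand-in for a real cache such as Redis. The `CacheAside` class and the `load_from_db` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, Optional


class CacheAside:
    """Cache-aside reads with explicit invalidation (illustrative only)."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]) -> None:
        self._load_from_db = load_from_db  # hypothetical database accessor
        self._cache: Dict[str, str] = {}   # stand-in for Redis/Memcached
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:             # cache hit: return immediately
            self.hits += 1
            return self._cache[key]
        self.misses += 1                   # cache miss: go to the database
        value = self._load_from_db(key)
        if value is not None:
            self._cache[key] = value       # populate for future requests
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key so the next read reloads it from the database."""
        self._cache.pop(key, None)


db = {"user:1": "Ada"}                     # a dict playing the database
cache = CacheAside(db.get)
cache.get("user:1")                        # cold start: miss, loads from db
cache.get("user:1")                        # hit
db["user:1"] = "Grace"                     # database updated behind the cache
stale = cache.get("user:1")                # still "Ada": the staleness problem
cache.invalidate("user:1")
fresh = cache.get("user:1")                # "Grace" after invalidation
```

The last four lines reproduce both challenges named above: the first read always misses, and an update that bypasses the cache leaves a stale value until the key is invalidated or expires.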

Write-through caching writes data to both the cache and the database synchronously on every write operation. This keeps the cache up to date with every write that goes through it but adds write latency (both writes must complete before confirming success). Write-behind (write-back) caching writes to the cache immediately and asynchronously writes to the database later, reducing write latency but risking data loss if the cache fails before the async write completes. Vetora lets you experiment with all three patterns and observe their effects on consistency, latency, and database load.
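The difference between the two write patterns can be sketched with dictionaries standing in for the cache and the database. Both classes are hypothetical illustrations; a real write-behind cache would flush from a background worker rather than through an explicit `flush()` call.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteThroughCache:
    """Every write updates the database and the cache before returning."""

    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self.db[key] = value       # the caller waits for the database write
        self.cache[key] = value    # and for the cache write


class WriteBehindCache:
    """Writes land in the cache at once and reach the database later."""

    def __init__(self, db: Dict[str, str]) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}
        self.pending: Deque[Tuple[str, str]] = deque()

    def put(self, key: str, value: str) -> None:
        self.cache[key] = value              # fast path: cache only
        self.pending.append((key, value))    # queued for the database

    def flush(self) -> int:
        """Apply queued writes to the database; returns how many were applied."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.db[key] = value
            applied += 1
        return applied


wt_db: Dict[str, str] = {}
WriteThroughCache(wt_db).put("a", "1")       # wt_db already holds "a"

wb_db: Dict[str, str] = {}
wb = WriteBehindCache(wb_db)
wb.put("a", "1")
lost_if_crash_now = "a" not in wb_db         # True: the write exists only in the cache
flushed = wb.flush()                         # 1: the database catches up
```

The window between `put` and `flush` in the write-behind case is exactly where a cache failure would lose data.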

Eviction policies determine which cache entries are removed when the cache reaches its memory limit. LRU (Least Recently Used) evicts the entry that has not been accessed for the longest time — it works well for most workloads. LFU (Least Frequently Used) evicts the entry with the fewest total accesses, favoring popular items. TTL (Time-to-Live) based expiration removes entries after a fixed duration regardless of access patterns. Modern systems like Redis combine TTL with LRU/LFU for hybrid eviction.
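A minimal sketch of combining TTL expiration with LRU eviction, assuming the limit is a number of entries rather than bytes (a real cache would track memory use). `LruTtlCache` and the injectable `clock` are hypothetical, and this is not how Redis implements its policies.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class LruTtlCache:
    """Entries expire after ttl_seconds; the least recently used goes first when full."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:    # TTL: expired regardless of access
            del self._data[key]
            return None
        self._data.move_to_end(key)        # LRU: mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used


now = [0.0]                                 # fake clock for a repeatable demo
cache = LruTtlCache(max_entries=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("a", "1")
cache.put("b", "2")
cache.get("a")                              # "a" is now more recent than "b"
cache.put("c", "3")                         # cache full: "b" is evicted
evicted = cache.get("b")                    # None
kept = cache.get("a")                       # "1"
now[0] = 11.0                               # advance past the 10-second TTL
expired = cache.get("a")                    # None
```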

The thundering herd problem occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously miss the cache and hit the database. Solutions include request coalescing (only one request fetches from the database while others wait), probabilistic early expiration (randomly refreshing entries before they expire), and cache locking (a distributed lock prevents multiple fetchers). Cache warming — pre-populating the cache at startup or before peak hours — prevents cold-start stampedes.
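Request coalescing can be sketched with a lock and one event per in-flight key. This is an in-process illustration under assumed names (`CoalescingLoader`, `slow_loader`); coalescing across several application servers would need a distributed lock instead.

```python
import threading
import time
from typing import Callable, Dict, List


class CoalescingLoader:
    """One caller per key runs the loader; concurrent callers wait for its result."""

    def __init__(self, loader: Callable[[str], str]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._cache: Dict[str, str] = {}
        self._in_flight: Dict[str, threading.Event] = {}

    def get(self, key: str) -> str:
        while True:
            with self._lock:
                if key in self._cache:
                    return self._cache[key]
                event = self._in_flight.get(key)
                leader = event is None
                if leader:                        # first caller becomes the fetcher
                    event = threading.Event()
                    self._in_flight[key] = event
            if leader:
                try:
                    value = self._loader(key)     # the only database call
                    with self._lock:
                        self._cache[key] = value
                    return value
                finally:
                    with self._lock:
                        del self._in_flight[key]
                    event.set()                   # wake the waiting callers
            event.wait()                          # follower: wait, then re-check


calls: List[str] = []


def slow_loader(key: str) -> str:
    calls.append(key)                             # record each "database" hit
    time.sleep(0.05)                              # simulate a slow query
    return key.upper()


loader = CoalescingLoader(slow_loader)
results: List[str] = []
threads = [threading.Thread(target=lambda: results.append(loader.get("hot")))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Twenty concurrent requests, one database call.
```

If the loader raises, the leader's exception propagates to that caller only, and one of the waiting callers takes over as the next leader on its retry.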

When to Use

+Read-heavy workloads where the same data is requested repeatedly — cache hit rates above 80% provide dramatic latency improvement
+Database offloading to reduce query volume and connection pressure on the primary database
+Session storage for web application sessions, avoiding per-request database lookups
+Computed results that are expensive to generate — ML model predictions, aggregated analytics, rendered templates
+Rate limiter counters and distributed locks requiring low-latency atomic operations

Not Recommended

-Write-heavy workloads where data changes more often than it is read — cache invalidation overhead outweighs benefits
-Data that must always be perfectly consistent — caching inherently introduces a staleness window
-Large datasets that exceed available memory — caching only benefits hot data that fits in memory
-Unique or high-cardinality queries: if nearly every request is for different data, cache hit rates will be near zero

Key Parameters in Vetora

Parameter	Description	Typical Values
memorySizeMB	Total memory allocated to the cache. Determines how much data can be stored before eviction begins.	256MB–64GB depending on dataset size
evictionPolicy	Algorithm for selecting entries to remove when memory is full: LRU, LFU, or random.	LRU for general workloads, LFU for skewed access patterns
ttlSeconds	Default time-to-live for cached entries. Balances data freshness against hit rate — longer TTLs increase hits but also staleness.	60–3600 seconds
readLatencyMs	Time to retrieve a cached value on a hit. In-memory caches achieve sub-millisecond reads.	0.1–1ms for local cache, 1–5ms for network cache (Redis)
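As a rough aid for choosing memorySizeMB, the number of entries that fit before eviction begins can be estimated from the average entry size. The sketch below is back-of-the-envelope arithmetic, not Vetora's sizing model; the `max_entries` function and the 64-byte per-entry overhead are assumptions.

```python
def max_entries(memory_size_mb: int, avg_entry_bytes: int,
                overhead_bytes: int = 64) -> int:
    """Rough count of entries that fit in the cache before eviction starts.

    overhead_bytes is an assumed per-entry cost for keys and bookkeeping;
    real caches vary.
    """
    if avg_entry_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("sizes must be positive")
    return (memory_size_mb * 1024 * 1024) // (avg_entry_bytes + overhead_bytes)


small = max_entries(256, 1024)    # 246723 entries of about 1 KB in 256 MB
```

If the hot working set is larger than this estimate, expect eviction to start removing entries that are still being read.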

Real-World Examples

Redis

The most popular in-memory data structure store, supporting strings, hashes, lists, sets, sorted sets, streams, and more. Used as a cache, message broker, and session store by millions of applications worldwide.

Memcached

Simple, high-performance distributed memory caching system. Focuses purely on key-value caching with multi-threaded architecture. Used by Facebook (now Meta) for caching billions of objects.

Application-Level Caches (Caffeine, Guava)

In-process caching libraries that avoid network overhead entirely. Caffeine (Java) provides near-optimal cache hit rates using the Window TinyLFU algorithm. Used when sub-microsecond cache access is required.

Frequently Asked Questions

What is caching in system design and why is it important?

Caching stores frequently accessed data in a high-speed memory layer (like Redis or Memcached) to avoid repeated expensive database queries. It is important because it can reduce database load by 80–95%, cut response latency from 5–50ms (database) to under 1ms (cache), and enable your system to handle significantly higher throughput. In system design interviews, caching is one of the first optimizations to discuss when addressing read-heavy workloads, high-traffic APIs, and database scaling limitations.

What is the difference between cache-aside, write-through, and write-behind?

Cache-aside (lazy loading): the application checks the cache first, fetches from the database on a miss, and populates the cache. It is simple but can serve stale data. Write-through: writes go to both cache and database synchronously, so the cache stays in step with the database at the cost of added write latency. Write-behind (write-back): writes go to the cache immediately, then asynchronously to the database, giving the lowest write latency but risking data loss if the cache fails before the async write completes. Most systems use cache-aside for reads combined with explicit cache invalidation on writes.

What is the thundering herd problem in caching?

Thundering herd occurs when a popular cache entry expires and hundreds of concurrent requests simultaneously miss the cache, all hitting the database at once. This can overwhelm the database and cause cascading failures. Solutions include request coalescing (only one request fetches from DB while others wait for the result), cache locking (a distributed lock prevents parallel fetches), stale-while-revalidate (serve expired data while refreshing in the background), and probabilistic early expiration (randomly refresh before TTL expires to stagger expiration across requests).
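Probabilistic early expiration can be sketched as a single decision function. The version below follows one published approach, often described under the name XFetch; the function name, the `beta` tuning knob, and the injectable `rng` are illustrative assumptions, not part of any particular cache's API.

```python
import math
import random
from typing import Callable


def should_refresh_early(now: float, expires_at: float, recompute_seconds: float,
                         beta: float = 1.0,
                         rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this request should refresh the entry before its TTL ends.

    recompute_seconds is how long the last recomputation took. The closer
    the entry is to expiry, the more likely a request is to volunteer, so
    refreshes are spread out instead of all arriving at the expiry instant.
    """
    u = 1.0 - rng()                           # in (0, 1], so log(u) is defined
    early_by = -recompute_seconds * beta * math.log(u)   # non-negative random lead
    return now + early_by >= expires_at


# Deterministic checks using a fixed "random" source:
far_from_expiry = should_refresh_early(100.0, 110.0, 2.0, rng=lambda: 0.0)
at_expiry = should_refresh_early(110.0, 110.0, 2.0, rng=lambda: 0.0)
unlucky_draw = should_refresh_early(100.0, 110.0, 2.0,
                                    rng=lambda: 1.0 - math.exp(-10.0))
```

With a draw of 0.0 the lead time is zero, so the entry is refreshed only once it has actually expired. A rare draw that yields a lead of about 20 seconds triggers a refresh 10 seconds early. Raising `beta` makes early refreshes more frequent.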

How do you choose between LRU and LFU eviction policies?

LRU (Least Recently Used) evicts the entry not accessed for the longest time. It works well for workloads with temporal locality — recently accessed items are likely to be accessed again soon. LFU (Least Frequently Used) evicts the entry with the fewest total accesses, preserving popular items even if they have not been accessed recently. LFU is better for skewed workloads where a small set of items is consistently popular (product catalog top sellers, trending posts). Modern algorithms like W-TinyLFU (used in Caffeine) combine both approaches for near-optimal hit rates.
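The trade-off can be seen on a tiny access trace. The sketch below uses simplified textbook versions of both policies (hypothetical `lru_hits` and `lfu_hits` helpers), not Redis's approximations or Caffeine's W-TinyLFU, and counts hits on a trace with one consistently popular key mixed with one-off keys.

```python
from collections import Counter, OrderedDict
from typing import Iterable


def lru_hits(trace: Iterable[str], capacity: int) -> int:
    """Count cache hits when evicting the least recently used key."""
    cache: "OrderedDict[str, bool]" = OrderedDict()
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)            # refresh recency
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)     # evict least recently used
            cache[key] = True
    return hits


def lfu_hits(trace: Iterable[str], capacity: int) -> int:
    """Count cache hits when evicting the key with the fewest accesses."""
    counts: Counter = Counter()               # access counts of resident keys
    hits = 0
    for key in trace:
        if key in counts:
            hits += 1
        elif len(counts) >= capacity:
            victim = min(counts, key=counts.get)  # ties: oldest resident key
            del counts[victim]
        counts[key] += 1
    return hits


# "a" is consistently popular; "b".."e" are requested once each.
trace = ["a", "a", "a", "b", "c", "a", "d", "e", "a"]
lru = lru_hits(trace, capacity=2)             # 2: one-off keys push "a" out
lfu = lfu_hits(trace, capacity=2)             # 4: "a" survives on its count
```

On this skewed trace LFU keeps the popular key resident while LRU loses it to the one-off keys. The flip side of plain LFU is that a key with a high historical count can linger after it stops being popular.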

When should you use Redis vs. Memcached?

Use Redis when you need data structures beyond simple key-value (lists, sets, sorted sets, streams, Pub/Sub), persistence for cache durability, Lua scripting for atomic operations, or cluster mode for distributed deployment. Use Memcached when you need simple key-value caching with multi-threaded performance, minimal operational overhead, and the simplest possible interface. Redis is the default choice for most modern applications due to its versatility. Memcached excels in simple caching scenarios where its multi-threaded architecture provides higher throughput per node.

Related Components

Database (Storage)

Persistent data store supporting SQL or NoSQL models with ACID transactions, replication, sharding, ...

Service (Compute)

Application server or microservice that processes requests, runs business logic, and communicates wi...

CDN (Traffic)

Content Delivery Network that caches and serves content from edge locations close to users, reducing...

Load Balancer (Traffic)

Distributes incoming traffic across multiple server instances using algorithms like round-robin, lea...
