Write-Back (Write-Behind) Cache

Write-back caching writes to the cache only and asynchronously flushes to the database in batches. It provides the lowest write latency of any caching pattern but risks data loss if the cache crashes before flushing. Ideal for write-heavy workloads tolerant of small data loss windows.

Overview

Write-back caching, also known as write-behind, inverts the write-through approach by writing to the cache only and deferring the database write to a later, asynchronous process. When the application writes data, the cache accepts the write and immediately returns success. A background process then flushes dirty cache entries to the database in batches, typically on a timer (e.g., every 5 seconds) or when a batch size threshold is reached. This decoupling of the write acknowledgment from the database persistence provides the lowest possible write latency of any caching pattern.
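The write path described above can be sketched in a few lines. This is a minimal single-threaded illustration, not a production design: the class name `WriteBackCache`, the `flush_batch` callback (a stand-in for the database writer), and the `max_dirty` threshold are all assumptions made for the example. A real system would typically run the timer in a background thread or worker process.

```python
import time
from typing import Callable, Dict, Optional, Set


class WriteBackCache:
    """In-memory write-back cache that flushes dirty keys in batches."""

    def __init__(self, flush_batch: Callable[[Dict[str, object]], None],
                 flush_interval_s: float = 5.0, max_dirty: int = 1000) -> None:
        self._flush_batch = flush_batch          # hypothetical database writer
        self._flush_interval_s = flush_interval_s
        self._max_dirty = max_dirty
        self._data: Dict[str, object] = {}
        self._dirty: Set[str] = set()            # keys changed since last flush
        self._last_flush = time.monotonic()

    def put(self, key: str, value: object) -> None:
        """Accept the write in memory and return without touching the database."""
        self._data[key] = value
        self._dirty.add(key)
        if len(self._dirty) >= self._max_dirty:  # batch size threshold
            self.flush()

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def tick(self, now: Optional[float] = None) -> None:
        """Call periodically; flushes once the interval has elapsed."""
        now = time.monotonic() if now is None else now
        if now - self._last_flush >= self._flush_interval_s:
            self.flush()

    def flush(self) -> int:
        """Write the current value of every dirty key in one batch."""
        batch = {key: self._data[key] for key in self._dirty}
        if batch:
            self._flush_batch(batch)             # if this raises, keys stay dirty
            self._dirty.clear()
        self._last_flush = time.monotonic()
        return len(batch)


database: Dict[str, object] = {}                 # stand-in for the real database
cache = WriteBackCache(flush_batch=database.update, max_dirty=3)
cache.put("a", 1)
cache.put("a", 2)            # overwrites the dirty value; still one dirty key
cache.put("b", 3)
pending = dict(database)     # snapshot: nothing has reached the database yet
cache.put("c", 4)            # third dirty key reaches max_dirty and triggers a flush
```

Because the dirty set is cleared only after `flush_batch` returns, a failed flush leaves the keys marked dirty so that the next attempt retries them.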

The performance benefits of write-back are substantial. Because the application only waits for the cache write (typically sub-millisecond for Redis), write operations complete 5-10x faster than write-through or direct database writes. Batch coalescing further reduces database load: if the same key is updated 100 times in a 5-second window, only the final value needs to be written to the database, reducing 100 database writes to 1. For write-heavy workloads like gaming leaderboards, IoT telemetry ingestion, or real-time analytics counters, this coalescing effect can reduce database write volume by 90% or more.
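The coalescing arithmetic from the paragraph above can be checked with a small sketch. The helper name `coalesce` and the key `player:42` are made up for illustration. Note that this only applies when each write overwrites the previous value; increments or events that must all be persisted would need to be summed or queued instead.

```python
from typing import Dict, Iterable, Tuple


def coalesce(writes: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep only the last value written to each key within one flush window."""
    latest: Dict[str, int] = {}
    for key, value in writes:
        latest[key] = value   # later writes replace earlier ones
    return latest


# 100 updates to the same key inside one flush window
writes = [("player:42", score) for score in range(1, 101)]
batch = coalesce(writes)
reduction = 1 - len(batch) / len(writes)
print(len(writes), len(batch), f"{reduction:.0%}")  # prints: 100 1 99%
```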

The fundamental risk of write-back caching is data loss. If the cache crashes or loses power before dirty entries are flushed to the database, those writes are lost permanently. This is the same trade-off that CPU caches and operating system page caches make: they accept a small data loss window in exchange for dramatically better write performance. Mitigation strategies include write-ahead logging (WAL) in the cache layer, replication across cache nodes, and reducing the flush interval to minimize the loss window. Redis, for example, supports AOF (Append-Only File) persistence, which appends every write to a log on disk. How durable that is depends on the appendfsync policy: with always, Redis syncs the log to disk before acknowledging the write, at a noticeable latency cost; with the default everysec, up to about one second of writes can still be lost.
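To make the write-ahead logging mitigation concrete, here is a rough sketch of a cache that appends each write to a local log and syncs it before acknowledging, then replays the log on restart. The class name `LoggedCache`, the JSON-lines log format, and the `checkpoint` method are assumptions for illustration; this is not how Redis AOF is implemented internally.

```python
import json
import os
from typing import Dict


class LoggedCache:
    """Cache that logs every write to disk before acknowledging it."""

    def __init__(self, log_path: str) -> None:
        self._log_path = log_path
        self.data: Dict[str, object] = {}
        self._replay()                            # recover unflushed writes
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self._log_path):
            return
        with open(self._log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break                         # torn final record from a crash
                self.data[record["key"]] = record["value"]

    def put(self, key: str, value: object) -> None:
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())              # durable before we acknowledge
        self.data[key] = value

    def checkpoint(self) -> None:
        """Truncate the log once all dirty entries have reached the database."""
        self._log.close()
        self._log = open(self._log_path, "w", encoding="utf-8")

    def close(self) -> None:
        self._log.close()
```

The `os.fsync` call on every write is what buys the durability, and it is also where the extra latency comes from. Syncing less often (for example once per second) narrows the gap to plain write-back at the cost of a small loss window.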

Write-back also introduces ordering complexity. If the application writes key A then key B, the background flush might write B before A if they are batched differently. For workloads where ordering matters (event sourcing, audit logs), the flush process must maintain causal ordering or use sequence numbers. Additionally, read-your-own-writes consistency is naturally provided (the latest value is always in the cache), but other clients reading from the database may see stale data until the flush completes. Despite these complexities, write-back is the right choice for workloads where write throughput and latency dominate the requirements and small data loss windows are acceptable.
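One way to handle the ordering problem is to stamp every write with a sequence number and have the flush side reject anything older than what is already stored. The sketch below assumes a single process issuing sequence numbers; `VersionedStore`, `stamp`, and `flush_in_order` are hypothetical names, and the store is an in-memory stand-in for a database table with a sequence column.

```python
import itertools
from typing import Dict, Iterable, Tuple

Write = Tuple[str, int, object]          # (key, sequence number, value)


class VersionedStore:
    """Stand-in for a database table that keeps a sequence number per key."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, object]] = {}

    def apply(self, key: str, seq: int, value: object) -> bool:
        """Apply the write only if it is newer than the stored version."""
        current = self.rows.get(key)
        if current is not None and current[0] >= seq:
            return False                 # stale write arrived late; ignore it
        self.rows[key] = (seq, value)
        return True


_sequence = itertools.count(1)           # assumed: one writer process


def stamp(key: str, value: object) -> Write:
    """Attach the next sequence number at the moment the cache accepts the write."""
    return (key, next(_sequence), value)


def flush_in_order(store: VersionedStore, batch: Iterable[Write]) -> int:
    """Apply a batch in sequence order so cross-key ordering is preserved."""
    applied = 0
    for key, seq, value in sorted(batch, key=lambda write: write[1]):
        if store.apply(key, seq, value):
            applied += 1
    return applied


store = VersionedStore()
w1 = stamp("A", 1)
w2 = stamp("A", 2)
w3 = stamp("B", 7)
store.apply(*w2)                                     # newer write lands first
late_write_accepted = store.apply(*w1)               # older write is rejected
applied_count = flush_in_order(store, [w3, w1])      # only w3 is new
```

The per-key check gives last-writer-wins semantics, while sorting the batch by sequence number keeps writes to different keys in the order the application issued them within a single flush.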

Key Points
  • 1. Write-back writes to the cache only and returns success immediately. The database write happens asynchronously in a background process, providing sub-millisecond write latency regardless of database performance.
  • 2. Batch coalescing is a major advantage: multiple writes to the same key within a flush window are merged into a single database write. This can reduce database write volume by 90%+ for hot keys that are updated frequently.
  • 3. Data loss risk is the primary trade-off. If the cache crashes before dirty entries are flushed to the database, those writes are permanently lost. The maximum data loss window is roughly the flush interval (e.g., 5 seconds), and it grows if a flush fails or falls behind.
  • 4. Durability can be improved with write-ahead logging in the cache (Redis AOF, persistent queues) and replication across cache nodes, but these mitigations add latency that partially erodes the write-back performance advantage.
  • 5. Write ordering is not guaranteed by default. If ordering matters, the flush process must maintain sequence numbers or use ordered queues to ensure writes reach the database in the correct order.
  • 6. Read-your-own-writes is naturally consistent as long as reads go through the cache, because the latest value is always there. However, readers querying the database directly will see stale data until the next flush completes.
Simple Example

The Sticky Note Board

Imagine an office where instead of filing every document immediately, you stick it on a board (cache). A clerk comes by every 5 minutes to collect all the notes from the board and file them in the cabinet (database). Writing a note is instant -- you just stick it on the board. If five people update the same note in 5 minutes, the clerk only files the final version, saving 4 trips to the cabinet. But if the board falls off the wall before the clerk arrives, all unfiled notes are lost. This is write-back caching: blazingly fast writes with a small risk of data loss.

Real-World Examples

Linux Page Cache

The Linux kernel uses write-back for its page cache (dirty page writeback). When an application writes to a file, the kernel marks the page as dirty in memory and returns immediately. Kernel flusher threads (the pdflush daemon in older kernels, replaced by per-device flusher threads in 2.6.32) periodically write dirty pages to disk: by default they wake every 5 seconds and write back pages that have been dirty for longer than 30 seconds, or start earlier when dirty pages exceed a threshold. This is why an unexpected power loss can lose recent file writes unless the application calls fsync. The vm.dirty_writeback_centisecs and vm.dirty_expire_centisecs kernel parameters control the flusher wake-up interval and the dirty page lifetime.
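On a Linux machine, the two parameters named above are exposed as files under /proc/sys/vm and hold values in hundredths of a second. The following sketch reads them and converts to seconds; the function names and the `root` parameter (added so the code can be pointed at a test directory) are assumptions for the example.

```python
from pathlib import Path
from typing import Dict


def read_centisecs_as_seconds(name: str, root: str = "/proc/sys/vm") -> float:
    """Read a *_centisecs sysctl file and convert its value to seconds."""
    raw = Path(root, name).read_text().strip()
    return int(raw) / 100.0


def writeback_settings(root: str = "/proc/sys/vm") -> Dict[str, float]:
    """Return the flusher wake-up interval and dirty page lifetime in seconds."""
    return {
        "flusher_wakeup_s": read_centisecs_as_seconds("dirty_writeback_centisecs", root),
        "dirty_expire_s": read_centisecs_as_seconds("dirty_expire_centisecs", root),
    }
```

Calling `writeback_settings()` on a Linux host with default settings would typically report a 5-second wake-up interval and a 30-second dirty page lifetime, although distributions and administrators can change both.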

CPU L1/L2 Caches

Modern CPU caches (L1, L2) typically use write-back. When the CPU writes to a cache line, the line is marked dirty and the write is not propagated to main memory immediately. The cache line is written back to RAM when it is evicted (due to capacity or conflict) or when the coherence protocol requires it. This lets a store complete in a few cycles instead of the 100+ cycles of a main memory access. Coherence protocols such as MESI keep cores consistent by invalidating stale copies in other caches.
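The dirty-bit-and-evict behaviour can be modelled in software. This is a toy model under simplifying assumptions: a fully associative cache with LRU replacement and one value per address, which real hardware caches are not. The class name `WriteBackLRU` and the dict used as "main memory" are illustrative only.

```python
from collections import OrderedDict
from typing import Dict, List


class WriteBackLRU:
    """Toy write-back cache: dirty lines reach memory only when evicted."""

    def __init__(self, capacity: int, memory: Dict[int, int]) -> None:
        self.capacity = capacity
        self.memory = memory                       # stand-in for main memory
        self.lines: "OrderedDict[int, List]" = OrderedDict()  # addr -> [value, dirty]
        self.writebacks = 0

    def _make_room(self) -> None:
        if len(self.lines) >= self.capacity:
            addr, (value, dirty) = self.lines.popitem(last=False)  # evict LRU line
            if dirty:
                self.memory[addr] = value          # write back only dirty lines
                self.writebacks += 1

    def write(self, addr: int, value: int) -> None:
        if addr not in self.lines:
            self._make_room()
        self.lines[addr] = [value, True]           # mark the line dirty
        self.lines.move_to_end(addr)

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            self._make_room()
            self.lines[addr] = [self.memory.get(addr, 0), False]  # clean fill
        self.lines.move_to_end(addr)
        return self.lines[addr][0]


memory: Dict[int, int] = {}
l1 = WriteBackLRU(capacity=2, memory=memory)
l1.write(0x10, 1)
l1.write(0x10, 2)                  # same line; memory still untouched
l1.write(0x20, 3)
before_eviction = dict(memory)     # {} because nothing has been written back
l1.write(0x30, 4)                  # cache is full: evicts 0x10 and writes it back
```

Two writes to address 0x10 cost a single write to memory, and only at eviction time, which is the same coalescing effect the caching pattern relies on.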

Gaming Leaderboards (Redis to PostgreSQL)

Online games frequently use Redis sorted sets as a write-back cache for leaderboards. Player score updates write to Redis only (sub-millisecond), and a background job flushes the leaderboard state to PostgreSQL every 10-30 seconds for persistence and analytics. During peak gameplay, a popular leaderboard might receive 10,000 score updates per second. Batch coalescing reduces this to a few hundred PostgreSQL writes per flush, as each player's score is only written once with its latest value.

Trade-Offs
Aspect | Description
Write Latency vs Data Durability | Write-back provides the lowest write latency (sub-millisecond for cache-only writes) but introduces a data loss window. If the cache fails before flushing, writes since the last flush are lost. The flush interval trades durability against database load: shorter intervals reduce the loss window but increase database write frequency. Write latency itself is unaffected, because the application never waits for the flush.
Database Load Reduction vs Complexity | Batch coalescing can reduce database writes by 90%+ for hot keys, dramatically reducing database load and cost. However, the flush process adds operational complexity: it must handle retries, ordering, conflict resolution, and monitoring to ensure writes eventually reach the database.
Write Throughput vs Read Consistency (for DB readers) | Write-back maximizes write throughput by decoupling writes from the database. But any system reading directly from the database (analytics pipelines, batch jobs, other services) sees stale data until the flush completes. All readers must go through the cache for consistent reads.
Batch Efficiency vs Flush Latency | Larger batch sizes and longer flush intervals improve coalescing efficiency (more writes merged, fewer DB operations). But they also increase the staleness window for database readers and the data loss window in case of cache failure. Tuning these parameters requires understanding the workload's write patterns.
Case Study

Gaming Leaderboard with Redis Write-Back at Scale

Scenario

A multiplayer online game with 2 million concurrent players needed real-time leaderboard updates. Each match completion triggered a score update, generating approximately 50,000 writes per second to the leaderboard during peak hours. Direct PostgreSQL writes at this volume caused lock contention, replication lag, and p99 write latencies exceeding 200ms, degrading the player experience.

Solution

The team implemented a write-back caching pattern using Redis sorted sets as the primary leaderboard store. Score updates wrote to Redis only (ZADD command, sub-millisecond). A background worker flushed the complete leaderboard state to PostgreSQL every 30 seconds using batch UPSERT operations. The flush coalesced all updates for each player within the 30-second window into a single PostgreSQL row, reducing 50,000 writes/sec to approximately 2,000 rows per flush (one per unique player who scored in that window).
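The flush step described above might look roughly like the following. This is a sketch under several assumptions: SQLite (3.24 or newer, for upsert support) stands in for PostgreSQL so the example runs without a server, a plain dict stands in for the set of players whose Redis scores changed in the window, and the table and function names are invented. PostgreSQL accepts a similar INSERT ... ON CONFLICT ... DO UPDATE statement, though drivers such as psycopg use different placeholders.

```python
import sqlite3
from typing import Dict

UPSERT_SQL = (
    "INSERT INTO leaderboard (player_id, score) VALUES (?, ?) "
    "ON CONFLICT(player_id) DO UPDATE SET score = excluded.score"
)


def flush_scores(conn: sqlite3.Connection, dirty: Dict[str, int]) -> int:
    """Upsert the latest score of every player who scored in this window."""
    rows = list(dirty.items())
    with conn:                       # one transaction per flush
        conn.executemany(UPSERT_SQL, rows)
    dirty.clear()                    # only reached if the transaction committed
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaderboard (player_id TEXT PRIMARY KEY, score INTEGER NOT NULL)"
)

dirty_scores: Dict[str, int] = {}    # stand-in for scores changed in Redis
dirty_scores["p1"] = 10
dirty_scores["p1"] = 25              # coalesced: only the latest score is kept
dirty_scores["p2"] = 7
first_flush = flush_scores(conn, dirty_scores)    # two rows for three updates

dirty_scores["p2"] = 9
second_flush = flush_scores(conn, dirty_scores)   # one row updated in place

final_rows = conn.execute(
    "SELECT player_id, score FROM leaderboard ORDER BY player_id"
).fetchall()
```

Clearing the dirty map only after the transaction commits means a failed flush is retried in the next window rather than silently dropped.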

Outcome

Write latency dropped from a p99 above 200 ms (direct PostgreSQL writes) to about 0.3 ms (Redis). Database write volume decreased by 97% due to batch coalescing. The 30-second flush interval meant that a Redis failure could lose at most about 30 seconds of scores, which was acceptable for a gaming context. Redis replication (primary + replica) further reduced the practical risk. PostgreSQL was freed from write contention and could serve analytics queries efficiently.

Common Mistakes
  • Using write-back for data that absolutely cannot be lost (financial transactions, payment records, audit logs). The data loss window, however small, is unacceptable for these workloads. Use write-through or direct database writes instead.
  • Not implementing durability measures in the cache layer. At minimum, enable Redis AOF persistence or replicate to a standby node. Running write-back on a single unreplicated cache instance is asking for data loss.
  • Ignoring write ordering in the flush process. If the application writes A=1 then A=2, a naive batch flush might apply them out of order, leaving A=1 in the database. Use timestamps or sequence numbers to ensure last-writer-wins semantics.
  • Forgetting that database readers see stale data. If analytics or batch jobs query the database directly, they miss unflushed writes. Either route all reads through the cache or accept the staleness and document it for downstream consumers.
Test Your Understanding

1. What is the primary risk of write-back (write-behind) caching?

2. How does batch coalescing reduce database load in write-back caching?

3. Which real-world system uses write-back caching as its default behavior?
