Vetora logo
🗳️Replication

Quorums (R + W > N)

Quorums define the minimum number of replica acknowledgments needed for reads (R) and writes (W) in a replicated system. The quorum condition R + W > N guarantees that any read will overlap with the most recent write, providing tunable consistency without a single leader. Quorums power Cassandra, DynamoDB, and etcd.

Overview

Quorums are a fundamental coordination primitive in distributed systems that enable consistency without a single leader. The core idea is simple: if you write data to enough replicas (W) and read from enough replicas (R), and the sum R + W exceeds the total number of replicas (N), then the set of nodes you read from must overlap with the set of nodes you wrote to. This overlap guarantees that at least one node in your read set has the most recent write. By comparing version numbers or timestamps across the R responses, the client can identify and return the latest value.

The most common quorum configuration is N=3, W=2, R=2. With three replicas, writes wait for two acknowledgments and reads query two replicas. Since 2 + 2 = 4 > 3, there is always at least one node in the intersection. This balanced configuration provides both read and write fault tolerance: the system can tolerate one node failure for both reads and writes. Alternative configurations trade off differently. N=3, W=3, R=1 is read-optimized: reads are extremely fast (single replica) but writes require all three replicas to be available, making writes fragile. N=3, W=1, R=3 is write-optimized: writes are fast (single ACK) but reads must query all replicas and wait for all to respond, increasing read latency and reducing read availability. The choice depends entirely on the workload's read/write ratio and latency requirements.

Sloppy quorums extend the standard quorum model to improve availability during partial failures. In a strict quorum, only the N designated replicas for a key can participate in quorum operations. If fewer than W of those N replicas are reachable, writes fail. A sloppy quorum allows the write to succeed by using non-designated nodes as temporary participants. These temporary nodes store the data with a 'hint' indicating which designated node should eventually receive it -- a process called hinted handoff. When the designated node recovers, the temporary node forwards the data. Sloppy quorums are powerful for availability but break the R + W > N consistency guarantee because the 'N' is no longer the same set of nodes for reads and writes. A read using the designated N replicas may miss data that was written to a temporary node via sloppy quorum.

A common misconception is that quorums provide linearizability. They do not, by themselves. Consider a scenario where a write reaches W=2 of 3 replicas, and a subsequent read queries R=2 replicas, one of which has the new value and one which has the old value. The read detects the disagreement and picks the newer value (correct). But during the brief window between the write completing on the first replica and the second replica, another read might see only the old value on both queried replicas. This race condition means quorum reads can return stale data during concurrent writes. True linearizability requires additional mechanisms: read repair during the read (Dynamo-style), or a full consensus protocol like Raft or Paxos that establishes a total order of operations. Systems like etcd use Raft quorums specifically because Raft provides linearizability, not just quorum overlap.

Key Points
  • 1The quorum condition R + W > N guarantees that the read set and write set overlap by at least one node. This overlapping node has the most recent write, allowing the client to identify the latest value by comparing version numbers across responses.
  • 2Common configurations: N=3/W=2/R=2 (balanced fault tolerance), N=3/W=3/R=1 (read-optimized, writes need all nodes), N=3/W=1/R=3 (write-optimized, reads query all nodes). Each configuration trades off read latency, write latency, and fault tolerance differently.
  • 3Sloppy quorums allow writes to succeed on non-designated nodes during partial failures, improving availability via hinted handoff. However, they break the R + W > N guarantee because reads and writes may target different node sets, weakening consistency.
  • 4Quorums alone do not guarantee linearizability. During concurrent reads and writes, a quorum read can return a stale value if the read happens to query replicas that have not yet received the latest write. Linearizability requires read repair, consensus protocols (Raft, Paxos), or other synchronization mechanisms.
  • 5Read repair and anti-entropy complement quorums by converging replicas that have stale data. Without these mechanisms, a replica that missed a write remains stale until it participates in a quorum read that detects the discrepancy.
  • 6Cassandra's per-query consistency levels (ANY, ONE, QUORUM, ALL) implement tunable quorums, allowing developers to choose different R/W values for different operations within the same cluster. This is one of the most practical applications of quorum theory in production systems.
Simple Example

The Jury Verdict Analogy

Imagine a jury of 5 members (N=5) deciding a case. The prosecution needs 3 jurors to agree for a conviction (W=3). The defense can poll 3 jurors to check the verdict (R=3). Since 3 + 3 = 6 > 5, at least one juror in the defense's poll participated in the conviction vote. That juror knows the actual verdict, ensuring the defense always learns the true outcome. If the prosecution only needed 1 juror (W=1), the defense would need to poll all 5 (R=5) to be sure of finding that one juror, making the check much slower. This is the quorum trade-off: requiring more agreement at write time (W) reduces the number of checks needed at read time (R), and vice versa.

Real-World Examples

Apache Cassandra

Cassandra implements tunable per-query consistency levels that map directly to quorum configurations. With a replication factor of 3 (N=3), consistency level QUORUM requires W=2 for writes and R=2 for reads, satisfying R + W > N. Level ONE (W=1 or R=1) provides lower latency but weaker consistency. Level ALL (W=3 or R=3) provides strongest consistency but lowest availability (any single node failure blocks the operation). LOCAL_QUORUM restricts the quorum to the local datacenter, reducing cross-DC latency while maintaining intra-DC consistency. This per-query tuning is one of Cassandra's most powerful features for multi-tenant workloads.

Amazon DynamoDB

DynamoDB abstracts quorum mechanics behind two read consistency options: eventually consistent reads (default) and strongly consistent reads. Eventually consistent reads can return data from any single replica, providing low latency but potentially stale results (effectively R=1). Strongly consistent reads route to the Paxos leader of the replication group, guaranteeing the latest value (effectively a leader-read quorum). Writes always go to the leader and are replicated to two additional nodes, ensuring W=2 (majority of N=3). This simplified quorum model makes DynamoDB easier to use than Cassandra's raw tunable consistency while still providing the fundamental quorum guarantees.

etcd

etcd uses Raft consensus, which is a quorum-based protocol. In a 3-node etcd cluster, every write requires acknowledgment from 2 of 3 nodes (W=2) through the Raft leader. Reads can be served by the leader alone (linearizable reads, since the leader has the latest committed state) or from followers (with some staleness). Unlike Dynamo-style quorums, Raft quorums provide true linearizability because the Raft protocol establishes a total order of operations through leader-based log replication. This makes etcd suitable for coordination tasks (leader election, service discovery, distributed locks) where strict ordering is required.

Trade-Offs
AspectDescription
Read Latency vs Write LatencyIncreasing W (more write acknowledgments) decreases the required R (fewer read queries needed for consistency), and vice versa. W=3/R=1 makes reads fast but writes slow and fragile (all nodes must be up). W=1/R=3 makes writes fast but reads slow and must query all replicas. N=3/W=2/R=2 balances both but neither is minimized. Choose based on your workload's read/write ratio.
Fault Tolerance vs PerformanceA quorum system with N=3/W=2 tolerates 1 node failure for writes. Increasing N to 5 with W=3 tolerates 2 failures but adds storage overhead and replication traffic. Higher N improves fault tolerance but increases the total amount of data stored and replicated across the cluster, consuming more network bandwidth and storage.
Strict Quorum Consistency vs Sloppy Quorum AvailabilityStrict quorums maintain the R + W > N guarantee by requiring operations to use designated replicas only. Sloppy quorums improve availability by allowing non-designated nodes to participate, but break the consistency guarantee. The choice depends on whether your application can tolerate temporary inconsistency in exchange for higher write availability during partial failures.
Quorum Consistency vs LinearizabilityQuorum overlap (R + W > N) guarantees eventual detection of the latest write but does not provide linearizability. During concurrent reads and writes, quorum reads can return stale values. If your application requires linearizable reads, you need either a consensus protocol (Raft, Paxos) on top of quorums, or read repair with synchronization barriers, both of which add latency and complexity.
Case Study

Cassandra Quorum Tuning for a Real-Time Bidding Platform

Scenario

A real-time bidding (RTB) platform serving programmatic advertising needed sub-10ms read latency for bid decisions (reading user profiles to target ads) while maintaining strong consistency for budget tracking (ensuring ad campaigns do not overspend). The platform used Cassandra with a replication factor of 3 across two datacenters. Initially, all operations used QUORUM consistency, resulting in 15-25ms read latency for bid decisions (too slow for the 50ms auction deadline) and occasional budget overruns when consistency level was lowered to ONE for performance.

Solution

The team implemented per-query consistency tuning. Bid-decision reads (user profiles, segment data) used LOCAL_ONE consistency: a single local replica responded in 2-5ms, providing the speed needed for auction deadlines. The staleness risk was acceptable because user profiles change infrequently and a slightly outdated profile still produces relevant ad targeting. Budget writes and reads used LOCAL_QUORUM: budget decrements required local majority acknowledgment, and budget reads checked local majority, ensuring the R + W > N condition held within each datacenter. Cross-datacenter replication ran asynchronously, with a reconciliation service detecting and resolving budget conflicts between datacenters hourly.

Outcome

Bid-decision read latency dropped from 18ms (p50) to 3ms (p50), enabling the platform to meet auction deadlines with margin. Budget overruns dropped to near-zero because LOCAL_QUORUM writes and reads maintained consistency within each datacenter. The per-query consistency approach processed over 500,000 bid requests per second across the cluster while maintaining budget accuracy to within 0.01%. The hourly cross-DC budget reconciliation caught and corrected the small discrepancies from async replication.

Common Mistakes
  • Assuming R + W > N guarantees linearizability. Quorum overlap ensures you will eventually read the latest write, but during concurrent operations, reads can return stale values. If you need linearizability, use a consensus protocol (Raft, Paxos) or add read-repair barriers to your quorum reads.
  • Using the same consistency level for all queries. Different operations have different consistency requirements. User profile reads can tolerate eventual consistency (R=1) while financial transactions need quorum or stronger. Cassandra's per-query consistency levels exist precisely for this reason -- use them.
  • Forgetting that sloppy quorums break the R + W > N guarantee. When writes go to temporary nodes via hinted handoff, subsequent strict-quorum reads may not find the data until hinted handoff completes. Monitor hinted handoff queue depth as an indicator of temporary consistency gaps.
  • Overlooking the tail latency impact of quorum reads/writes. A quorum operation completes when the Kth-fastest replica responds (where K is W for writes, R for reads). This means latency is determined by the slowest node in the quorum, not the fastest. A single slow replica (GC pause, disk I/O) inflates latency for all quorum operations targeting that replica set.
Related Concepts

See Quorums (R + W > N) in action

Explore system design templates that use quorums (r + w > n) and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Tune quorum parameters (N, R, W) and observe consistency-latency trade-offs

Metrics to watch
read_latency_p50_mswrite_latency_p50_msstale_read_pct
Run Simulation
Test Your Understanding

1In a system with N=5, W=3, R=3, how many node failures can the system tolerate for writes?

2Why don't quorums alone guarantee linearizability?

3What does a sloppy quorum with hinted handoff sacrifice compared to a strict quorum?

Deeper Reading