1What does the CAP theorem state about distributed data stores during a network partition?
The CAP theorem states that a distributed data store can guarantee at most two of three properties -- Consistency, Availability, and Partition Tolerance -- during a network partition. Understanding CAP is essential for choosing the right trade-offs when designing distributed systems.
The CAP theorem, first conjectured by Eric Brewer in his 2000 PODC keynote and formally proven by Seth Gilbert and Nancy Lynch in 2002, establishes a fundamental impossibility result for distributed data stores. It states that any networked shared-data system can provide at most two of three guarantees simultaneously: Consistency (C), Availability (A), and Partition Tolerance (P). Because network partitions are inevitable in any distributed system, the practical choice during a partition is between consistency and availability.
Consistency in CAP means linearizability -- every read receives the most recent write or an error. This is stronger than the 'C' in ACID, which refers to database invariants. Availability means every request to a non-failing node receives a response, without guaranteeing it contains the latest write. Partition Tolerance means the system continues to operate despite arbitrary message loss or delay between nodes. A partition is not a theoretical edge case; it is a routine event in any system spanning multiple machines, racks, or data centers. Network cables are cut, switches fail, and cloud availability zones lose connectivity to each other.
The most common misconception about CAP is treating it as a permanent architectural choice -- 'pick 2 of 3.' In reality, partitions are rare events. When the network is healthy, a well-designed system can provide both consistency and availability. The theorem only forces a choice during an active partition. A CP system (like ZooKeeper) will reject writes or reads it cannot guarantee are consistent during a partition, sacrificing availability. An AP system (like DynamoDB in its default configuration) will continue serving requests during a partition, but different nodes may return different values for the same key, sacrificing consistency. After the partition heals, AP systems must reconcile divergent state, typically through conflict resolution strategies like last-writer-wins, vector clocks, or application-level merge functions.
The PACELC extension (proposed by Daniel Abadi in 2012) refines CAP by addressing system behavior when there is no partition. PACELC states: if there is a Partition, choose Availability or Consistency; Else, choose Latency or Consistency. This captures the reality that even in normal operation, replicating data synchronously (for consistency) adds latency compared to asynchronous replication (for lower latency). PACELC explains why systems like DynamoDB (PA/EL) and Cassandra (PA/EL) optimize for availability and latency, while Spanner (PC/EC) pays a latency cost for global consistency using TrueTime. Understanding PACELC helps designers make informed trade-offs for the 99.99% of the time when there is no partition, not just the rare failure scenarios CAP addresses.
The Two ATMs Problem
Imagine a bank with two ATMs connected by a network, both accessing the same account with a $500 balance. The network cable between them is cut (partition). A customer tries to withdraw $400 from ATM-A. The bank has two choices: (1) Allow the withdrawal -- the customer gets their money (available), but ATM-B still thinks the balance is $500, and a second $400 withdrawal there would overdraft the account (inconsistent). (2) Reject the withdrawal until the network is restored -- the account stays consistent, but the customer is denied service (unavailable). There is no option that gives both correct balances and uninterrupted service during the partition.
Amazon DynamoDB
DynamoDB is an AP system by default, prioritizing availability and partition tolerance. During a partition, DynamoDB continues accepting reads and writes on all reachable nodes using eventual consistency. Conflicts are resolved via last-writer-wins using vector clocks. DynamoDB also offers a strongly consistent read option that routes to the leader node, effectively providing CP behavior per-request at the cost of higher latency and reduced availability during partitions.
Google Spanner
Spanner is a CP system that provides globally consistent reads and writes using TrueTime, a GPS- and atomic-clock-synchronized time API. During a partition, Spanner will make affected data unavailable rather than serve inconsistent results. The trade-off is higher write latency (commits require cross-region Paxos consensus), but Spanner achieves 99.999% availability through redundant infrastructure, making the CP trade-off less painful in practice.
Apache ZooKeeper
ZooKeeper is a CP coordination service used for leader election, configuration management, and distributed locking. During a partition, if a ZooKeeper node is separated from the quorum (majority of nodes), it stops serving reads and writes to prevent returning stale coordination data. This CP choice is critical because stale leader-election data could cause split-brain, where two nodes both believe they are the leader.
| Aspect | Description |
|---|---|
| Consistency vs Availability | The core CAP trade-off. CP systems reject requests during partitions to maintain correctness, which can mean downtime for affected users. AP systems continue serving requests but may return stale or conflicting data, requiring conflict resolution mechanisms and application-level compensation logic. |
| Latency Impact of CP | Maintaining strong consistency in a distributed system typically requires synchronous replication or consensus protocols (Paxos, Raft). These add round-trip latency to every write and sometimes to reads. Cross-region consensus can add 50-200ms per operation, which may be unacceptable for latency-sensitive workloads. |
| Operational Complexity | AP systems require conflict resolution strategies (last-writer-wins, CRDTs, application-level merge) that add design and operational complexity. CP systems require quorum management, leader election, and careful partition detection. Neither choice eliminates complexity -- it shifts it to a different part of the system. |
| Application-Level Compensation | AP systems push consistency responsibility to the application layer. Developers must handle read-repair, anti-entropy processes, and business-logic-level conflict resolution. For example, Amazon's shopping cart merges conflicting cart states by taking the union of items, preferring to show extra items over losing additions. |
Amazon's Dynamo Paper (2007) -- Shopping Cart Availability
Scenario
Amazon's e-commerce platform required that customers could always add items to their shopping cart, even during infrastructure failures and network partitions. A lost 'add to cart' operation directly translated to lost revenue. The existing relational database approach sacrificed availability for consistency, causing cart errors during peak traffic events like holiday sales.
Solution
Amazon built Dynamo, an AP key-value store that prioritized availability and partition tolerance over strong consistency. Dynamo uses consistent hashing for data partitioning, vector clocks for conflict detection, and a 'sloppy quorum' with hinted handoff to remain writable even when some nodes are unreachable. When conflicting versions of a shopping cart are detected (because writes occurred on different sides of a partition), the application merges them by taking the union of all items -- it is better to show an item the customer removed than to lose an item they added.
Outcome
Dynamo achieved 99.9th percentile read latency under 300ms and provided 'always-writable' availability during multiple infrastructure failures. The shopping cart error rate dropped by orders of magnitude. The Dynamo paper (published at SOSP 2007) became one of the most influential papers in distributed systems, directly inspiring open-source systems like Cassandra, Riak, and Voldemort, and influencing the design of DynamoDB.
See CAP Theorem in action
Explore system design templates that use cap theorem and run traffic simulations to see how these concepts perform under real load.
Browse Templates1What does the CAP theorem state about distributed data stores during a network partition?
2A distributed cache continues serving stale data when some nodes cannot communicate. Which CAP trade-off has it made?
3Which real-world system is classified as CP, using synchronized clocks for global consistency?