Replication

Keeping copies of data in sync across machines for durability and availability.

Concepts

Leader-Follower (Primary-Replica) Replication [P0]

Leader-follower replication designates one node as the writable leader and replicates its changes to one or more read-only followers, which apply them in order from a replication log. It is the most widely deployed replication topology, powering PostgreSQL streaming replication, MySQL binlog replication, and MongoDB replica sets.
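
The log-shipping idea can be sketched in a few lines. This is a minimal in-memory illustration, not how any of the databases above implement it; the `Leader` and `Follower` classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Read-only replica that applies log entries in order (hypothetical class)."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries applied so far

    def apply_from(self, log: list) -> None:
        # Replay only the entries this follower has not applied yet.
        for key, value in log[self.applied:]:
            self.data[key] = value
        self.applied = len(log)


@dataclass
class Leader:
    """The only node that accepts writes; every change is appended to a log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))


leader = Leader()
followers = [Follower(), Follower()]

leader.write("user:1", "alice")
leader.write("user:1", "alicia")

# Ship the log; each follower converges to the leader's state.
for follower in followers:
    follower.apply_from(leader.log)
```

Because followers replay the same entries in the same order, they end up with the same state as the leader once they have caught up.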

Multi-Leader Replication [P0]

Multi-leader replication allows multiple nodes to accept writes independently, replicating changes to each other asynchronously. It enables multi-datacenter writes, offline-first applications, and collaborative editing, but introduces the fundamental challenge of write conflict resolution.
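
One common (and lossy) way to resolve such conflicts is last-write-wins. The sketch below assumes each write carries a timestamp and a node identifier as a tie-breaker; the `Versioned` type and `lww_merge` function are hypothetical names, and real systems may instead use version vectors, CRDTs, or application-level merging.

```python
from typing import NamedTuple


class Versioned(NamedTuple):
    value: str
    timestamp: float  # assumed to be comparable across nodes
    node_id: str      # tie-breaker when timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the higher (timestamp, node_id); discard the other."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


# Two leaders accepted conflicting writes to the same key.
us = Versioned("title: Draft", 100.0, "dc-us")
eu = Versioned("title: Final", 100.5, "dc-eu")

winner = lww_merge(us, eu)
```

The merge gives the same result regardless of argument order, so all leaders converge, but the losing write is silently dropped. That is the trade-off to weigh before choosing this strategy.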

Leaderless (Dynamo-Style) Replication [P0]

Leaderless replication allows any replica to accept reads and writes, using quorum-based coordination (R + W > N) so that reads and writes overlap on at least one replica without a designated leader. Popularized by Amazon's Dynamo paper, this model prioritizes availability during network partitions and underlies systems such as Cassandra and Riak. Amazon DynamoDB, despite the name, is a separate managed service and not a direct implementation of the Dynamo design.
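
A minimal sketch of a leaderless read and write path, assuming N = 3, W = 2, R = 2 and a client-supplied version number. The `Replica` class and the `quorum_write` and `quorum_read` helpers are hypothetical; real systems add hinted handoff, read repair, and richer versioning.

```python
class Replica:
    """One storage node holding a (version, value) pair per key."""

    def __init__(self):
        self.store = {}
        self.up = True

    def put(self, key, version, value):
        if not self.up:
            return False  # unreachable: no acknowledgment
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)
        return True


def quorum_write(replicas, key, version, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = sum(replica.put(key, version, value) for replica in replicas)
    return acks >= w


def quorum_read(replicas, key, r):
    """Collect answers from reachable replicas and return the newest value."""
    answers = [replica.store.get(key) for replica in replicas if replica.up]
    if len(answers) < r:
        raise RuntimeError("not enough replicas reachable for a read quorum")
    found = [entry for entry in answers if entry is not None]
    return max(found, key=lambda entry: entry[0])[1] if found else None


replicas = [Replica(), Replica(), Replica()]  # N = 3

replicas[2].up = False                         # one node is down during the write
write_ok = quorum_write(replicas, "cart", 1, "book", w=2)

replicas[2].up = True                          # it returns, still stale
replicas[0].up = False                         # a different node fails
value = quorum_read(replicas, "cart", r=2)     # replica 1 still has the write
```

The read succeeds even though one of the two responding replicas never saw the write, because the read set and the write set share replica 1.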

Synchronous vs Asynchronous Replication [P0]

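Synchronous replication waits for follower acknowledgment before confirming a write, so a confirmed write survives the loss of the leader, at the cost of higher write latency and stalled writes whenever a synchronous follower is unavailable. Asynchronous replication confirms a write as soon as the leader has committed it locally, providing lower latency but risking the loss of recently confirmed writes if the leader fails before they are replicated. Semi-synchronous mode balances the two by keeping one follower synchronous and the rest asynchronous.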
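
The latency trade-off can be made concrete with a toy model. It assumes the leader commits locally, then waits for the k fastest follower acknowledgments, where k is all followers for "sync", one for "semi-sync", and zero for "async". The function names and the millisecond figures are illustrative assumptions, not measurements.

```python
def acks_required(mode: str, follower_count: int) -> int:
    """Number of follower acknowledgments awaited before confirming a write."""
    if mode == "sync":
        return follower_count
    if mode == "semi-sync":
        return min(1, follower_count)
    if mode == "async":
        return 0
    raise ValueError(f"unknown mode: {mode}")


def commit_latency_ms(mode: str, local_ms: float, follower_rtts_ms: list) -> float:
    """Local commit time plus the round trip of the slowest awaited follower."""
    k = acks_required(mode, len(follower_rtts_ms))
    if k == 0:
        return local_ms
    return local_ms + sorted(follower_rtts_ms)[k - 1]


# One follower in the same rack (2 ms) and one in a remote region (40 ms).
rtts = [2.0, 40.0]
latencies = {mode: commit_latency_ms(mode, 1.0, rtts)
             for mode in ("sync", "semi-sync", "async")}
```

Under these assumed numbers, fully synchronous commits are bounded by the slowest follower, semi-synchronous commits by the fastest, and asynchronous commits by the leader alone.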

Quorums (R + W > N) [P0]

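Quorums define the minimum number of replica acknowledgments needed for reads (R) and writes (W) in a system with N replicas. The quorum condition R + W > N guarantees that every read set shares at least one replica with every write set, so a read reaches at least one replica holding the latest successfully written value (setting aside sloppy quorums and concurrent writes). This provides tunable consistency without a single leader. Cassandra and Riak expose R and W as per-request settings; consensus systems such as etcd also rely on majority quorums, though under an elected leader rather than with tunable R and W.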
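
The overlap property can be checked by brute force for small clusters. The sketch below enumerates every possible read set and write set and confirms that they always intersect exactly when R + W > N; `always_overlaps` is a hypothetical helper name.

```python
from itertools import combinations


def always_overlaps(n: int, r: int, w: int) -> bool:
    """True if every r-node read set intersects every w-node write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


strong = always_overlaps(3, 2, 2)  # R + W = 4 > 3
weak = always_overlaps(3, 1, 2)    # R + W = 3, not greater than N

# The enumeration agrees with the inequality for every small configuration.
matches_rule = all(
    always_overlaps(n, r, w) == (r + w > n)
    for n in range(1, 6)
    for r in range(1, n + 1)
    for w in range(1, n + 1)
)
```

With N = 3, choosing W = 2 and R = 1 leaves room for a read to land on the single replica that missed the write, which is why that configuration can return stale data.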

Replication Lag and Read-Your-Writes [P0]

Replication lag is the delay between a write being committed on the leader and being applied on followers. It causes stale reads, read-your-writes violations (a client cannot see a write it just made), and non-monotonic reads (a later read returns older data than an earlier one). Understanding and mitigating lag is essential for building correct applications on asynchronously replicated databases.
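
One way to preserve read-your-writes is to remember the log position of each session's last write and route reads only to replicas that have caught up to it. This is a sketch under that assumption; `Node`, `applied_lsn`, and `choose_read_node` are hypothetical names, and "LSN" here simply means a monotonically increasing log position.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    applied_lsn: int  # highest log position this node has applied


def choose_read_node(leader: Node, followers: list, session_last_write_lsn: int) -> Node:
    """Pick a follower that already reflects the session's last write, else the leader."""
    for follower in followers:
        if follower.applied_lsn >= session_last_write_lsn:
            return follower
    return leader  # no follower has caught up; the leader always has the write


leader = Node("leader", applied_lsn=10)
followers = [Node("f1", applied_lsn=7), Node("f2", applied_lsn=10)]

# The session's last write landed at log position 9.
fresh_enough = choose_read_node(leader, followers, session_last_write_lsn=9)

# If only the lagging follower is available, fall back to the leader.
fallback = choose_read_node(leader, followers[:1], session_last_write_lsn=9)
```

Sessions that have not written recently can still be served by lagging followers, so the leader only absorbs reads that would otherwise be stale for that client.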