1In a leader-follower replication setup, what happens when a client writes to the leader and immediately reads from a follower?
Leader-follower replication designates one node as the writable leader and replicates changes to N read-only followers via a replication log. It is the most widely deployed replication topology, powering PostgreSQL streaming replication, MySQL binlog replication, and MongoDB replica sets.
Leader-follower replication, also called primary-replica or master-slave replication, is the most common replication architecture in production databases. A single node -- the leader (or primary) -- accepts all write operations. After processing a write, the leader appends the change to a replication log (such as PostgreSQL's write-ahead log or MySQL's binary log) and streams these log entries to one or more follower (replica) nodes. Followers apply the log entries in order, maintaining a copy of the leader's data that can serve read queries. This architecture provides a clear write-ordering guarantee because all mutations flow through a single point of serialization.
The replication log is the heart of the system. In WAL-based (physical) replication, the leader ships the exact byte-level changes to its storage files. This is efficient but ties replicas to the same storage engine version and architecture. In logical (row-based) replication, the leader ships higher-level change records -- insert row X, update row Y, delete row Z -- which are decoupled from the physical storage format. Logical replication enables heterogeneous setups (different PostgreSQL versions, or replicating to a data warehouse) at the cost of slightly higher overhead. MySQL's GTID-based binlog replication uses globally unique transaction identifiers to track replication position, making failover and replica provisioning significantly simpler than older file-position-based approaches.
Failover is the most operationally complex aspect of leader-follower replication. When the leader fails, the system must detect the failure (typically via heartbeat timeouts), promote a follower to become the new leader, and redirect clients to the new leader. This process introduces several risks. If the timeout is too short, transient network glitches can trigger unnecessary failovers (flapping). If the timeout is too long, the system experiences extended write unavailability. The most dangerous failure mode is split-brain: if the old leader comes back online without realizing it has been replaced, two nodes accept writes simultaneously, causing divergent data that is extremely difficult to reconcile. Production systems mitigate split-brain using fencing tokens, STONITH (Shoot The Other Node In The Head), or consensus-based leader election via tools like etcd or ZooKeeper.
Read-after-write consistency is a persistent challenge with leader-follower replication. When a client writes to the leader and then immediately reads from a follower, the follower may not have received the latest change yet, causing the client to see stale data -- effectively losing their own write. Solutions include routing reads of recently-written data to the leader (read-your-writes guarantee), using synchronous replication for at least one follower (semi-synchronous mode), or including a version token with reads so that followers can wait until they have caught up to the required version before responding.
The Head Chef and Line Cooks
Think of a restaurant kitchen where the head chef (leader) is the only one who decides each dish's recipe and writes it on the order board. Three line cooks (followers) watch the board and replicate each dish exactly as written. Customers can ask any line cook what today's special is (read), but only the head chef can change the menu (write). If the head chef calls in sick, the sous chef (most experienced line cook) takes over -- but there is a brief period of confusion while the other cooks learn who is in charge. If the head chef comes back without knowing the sous chef took over, both might try to change the menu simultaneously, leading to conflicting dishes (split-brain).
PostgreSQL
PostgreSQL uses streaming replication based on its write-ahead log (WAL). The primary continuously streams WAL records to standby servers over TCP connections. PostgreSQL supports synchronous replication (the primary waits for at least one standby to flush WAL to disk before confirming a commit) and asynchronous replication (the primary confirms immediately). The pg_stat_replication view provides real-time visibility into replication lag, replay position, and standby state. Tools like Patroni automate failover using etcd or ZooKeeper for leader election.
MySQL
MySQL uses binary log (binlog) replication with Global Transaction Identifiers (GTIDs). The leader writes row-change events to the binlog, and followers pull these events and replay them. GTIDs assign a unique identifier to every transaction across the cluster, making it simple to determine which transactions a follower has and has not applied -- critical for automated failover. MySQL Group Replication extends this with Paxos-based consensus for automatic leader election and conflict detection, bridging toward multi-leader capabilities.
MongoDB
MongoDB's replica sets implement leader-follower replication with automatic failover. A replica set consists of a primary (leader) and one or more secondaries (followers). The primary records all mutations in the oplog (operations log), and secondaries tail this log to stay current. When the primary becomes unreachable, secondaries hold a Raft-inspired election to choose a new primary, typically completing failover in under 10 seconds. MongoDB supports configurable write concern (w:1 for async, w:majority for sync) and read preference (primary, secondary, nearest) for per-operation consistency tuning.
| Aspect | Description |
|---|---|
| Write Throughput Bottleneck | All writes must pass through a single leader node, making leader capacity the ceiling for write throughput. Unlike leaderless or multi-leader systems, adding more followers does not increase write capacity. For write-heavy workloads, this bottleneck often drives teams toward sharding (partitioning data across multiple leader-follower groups) or multi-leader replication. |
| Failover Complexity and Risk | Automated failover introduces risk of split-brain (two leaders accepting writes), data loss (if the new leader is behind the old leader), and client disruption (connections must be rerouted). Manual failover avoids some risks but increases recovery time. Most production deployments use consensus-based failover (Patroni, Orchestrator) to balance speed and safety. |
| Replication Lag vs Consistency | Asynchronous replication provides low write latency but allows followers to fall behind, causing stale reads. Synchronous replication eliminates lag but adds latency to every write (one network round trip to the sync follower) and reduces write availability -- if the sync follower is unreachable, writes block until timeout. Semi-synchronous is a practical middle ground but still has edge cases during failover. |
| Operational Overhead | Managing replication slots, monitoring lag, handling replica drift, provisioning new replicas from backups, and testing failover procedures all add operational burden. PostgreSQL replication slots prevent WAL recycling until all replicas have consumed the data, which can fill disk if a replica goes offline. These operational details often determine whether a team can run replication reliably in production. |
GitHub's MySQL Replication and Orchestrator-Based Failover
Scenario
GitHub runs one of the largest MySQL deployments outside of hyperscale cloud providers, serving millions of developers. Their primary MySQL cluster handles all repository metadata writes (issues, pull requests, user data) with dozens of read replicas serving the high read volume. Before automated failover, a primary failure required manual intervention, causing extended write outages during incidents and on-call burnout for database engineers.
Solution
GitHub built and open-sourced Orchestrator, a MySQL replication topology management and failover tool. Orchestrator continuously monitors the replication topology, detects primary failures via configurable health checks, and automatically promotes the most up-to-date replica to primary using GTID-based replication positioning. It handles replica re-pointing, DNS updates, and proxy reconfiguration as part of the failover workflow. GitHub also implemented a proxy layer (ProxySQL and later freno for throttling) that can redirect writes to the new primary within seconds of detection.
Outcome
Automated failover reduced GitHub's MySQL write outage duration from 20-45 minutes (manual) to under 30 seconds. Orchestrator's topology awareness prevented split-brain by fencing the old primary before promoting a new one. The tool has been adopted by companies including Booking.com, Shopify, and Slack for their MySQL replication management, becoming a de facto standard for MySQL high availability.
See Leader-Follower (Primary-Replica) Replication in action
Explore system design templates that use leader-follower (primary-replica) replication and run traffic simulations to see how these concepts perform under real load.
Browse Templates1In a leader-follower replication setup, what happens when a client writes to the leader and immediately reads from a follower?
2What is the primary risk during leader-follower failover when the old leader comes back online?
3MySQL's GTID-based replication improves upon older file-position-based replication primarily by: