Read Replicas

Discover how read replicas scale database read throughput by distributing queries across multiple copies of the data, enabling read-heavy applications to serve millions of users.

Overview

Read replicas are copies of a primary (leader) database that receive near-real-time updates through replication and serve read queries. The primary database handles all write operations (INSERT, UPDATE, DELETE), and these changes are replicated, usually asynchronously, to one or more read replicas. Read queries from the application are then directed to the replicas instead of the primary, distributing the read load across multiple database instances.

Most production workloads are read-heavy. A social media platform might have a 100:1 read-to-write ratio or higher, because each post is created once but read by hundreds or thousands of followers. An e-commerce catalog is updated a few times per day but browsed by millions of shoppers. For these workloads, the primary database's CPU, memory, and I/O are consumed primarily by read queries. Adding read replicas scales read throughput approximately linearly: two replicas can handle roughly 2x the read traffic of a single database, three replicas handle roughly 3x, and so on.
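As a rough back-of-the-envelope check on the scaling claim above, the following sketch estimates aggregate read capacity. The function name and the figure of 5,000 reads/s per instance are illustrative assumptions, not measurements. Real systems usually land somewhat below this ideal, because each replica also spends resources applying the replication stream.

```python
def read_capacity(per_instance_rps: float, replicas: int,
                  primary_serves_reads: bool = False) -> float:
    """Idealized aggregate read capacity, assuming load is spread perfectly evenly."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    serving_instances = replicas + (1 if primary_serves_reads else 0)
    return per_instance_rps * serving_instances


# Hypothetical figure: each instance sustains 5,000 reads/s.
for n in (1, 2, 3):
    print(n, "replica(s):", read_capacity(5_000, n), "reads/s")
```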

Replication can be synchronous or asynchronous. Synchronous replication guarantees that a write is committed on both the primary and the replica before acknowledging the write to the client. This provides strong consistency but increases write latency because each write must wait for the replica to confirm. Asynchronous replication commits the write on the primary immediately and sends it to replicas in the background. This minimizes write latency but introduces replication lag: the replica may be a few milliseconds to a few seconds behind the primary, meaning reads from the replica may return slightly stale data.
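The difference between the two modes can be illustrated with a toy in-memory model. This is a minimal sketch, not how any real database implements replication: the `Primary` class, its single replica, and the explicit `replicate()` call (which stands in for the background replication process) are all assumptions made for illustration.

```python
from collections import deque


class Primary:
    """Toy primary with one replica; replication is modeled as a queue of pending changes."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.data = {}
        self.replica_data = {}
        # Changes committed on the primary but not yet applied on the replica.
        self.pending = deque()

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        if self.synchronous:
            # Synchronous mode: the replica applies the change before the write returns.
            self.replicate()

    def replicate(self):
        """Apply all pending changes to the replica (a background task in a real system)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica_data[key] = value

    def read_from_replica(self, key):
        return self.replica_data.get(key)


async_db = Primary(synchronous=False)
async_db.write("stock:42", 7)
stale = async_db.read_from_replica("stock:42")  # None: the change has not reached the replica
async_db.replicate()
fresh = async_db.read_from_replica("stock:42")  # 7: the replica has caught up
```

In the asynchronous case the write returns immediately and the replica briefly serves the old state; in the synchronous case the write itself pays the cost of waiting for the replica.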

Managing replication lag is the central challenge of read replica architectures. Applications must decide which queries can tolerate stale data (catalog browsing, dashboard aggregations, search results) and which require strong consistency (account balance checks, order status immediately after placement). The common pattern is to route reads that require fresh data to the primary and all other reads to replicas. When the application specifically ensures that a user who just performed a write sees their own change on the next read, the pattern is called read-your-writes consistency.
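One way to express this routing decision in application code is to have each read declare the consistency it needs. The sketch below is a hypothetical illustration: the `Consistency` enum, the `ReadRouter` class, and the instance names are assumptions, and the round-robin choice among replicas is just one simple balancing strategy.

```python
import itertools
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # must reflect the latest committed write
    EVENTUAL = "eventual"  # may be slightly stale


class ReadRouter:
    """Send strong reads to the primary and spread eventual reads across replicas."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Round-robin iterator over replicas; None when no replicas are configured.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, consistency: Consistency) -> str:
        if consistency is Consistency.STRONG or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReadRouter("primary", ["replica-1", "replica-2"])
print(router.route(Consistency.EVENTUAL))  # catalog browsing can go to a replica
print(router.route(Consistency.STRONG))    # a balance check goes to the primary
```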

Key Points

1. Read replicas scale read throughput approximately linearly. Each replica handles a portion of read queries, reducing load on the primary database and enabling read-heavy workloads to serve millions of users.
2. Asynchronous replication minimizes write latency but introduces replication lag, meaning replicas may serve slightly stale data. The lag is typically milliseconds but can grow to seconds under heavy write load.
3. Read-your-writes consistency routes a user's reads to the primary for a short window after they perform a write, ensuring they see their own changes. Other users' reads go to replicas as usual.
4. Read replicas can be placed in different geographic regions to reduce read latency for globally distributed users. A replica in Europe serves European users from a nearby data center instead of sending their queries across the Atlantic to the US primary.
5. Replicas can serve as failover targets. If the primary fails, a replica can be promoted to primary, either manually or through automatic failover. This provides both read scalability and high availability from the same infrastructure.
6. Read replicas do not scale writes. All writes still go to the single primary database. If write throughput is the bottleneck, sharding or multi-primary replication is needed instead of (or in addition to) read replicas.
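The failover point above raises the question of which replica to promote. A minimal sketch, assuming each replica reports a hypothetical `applied_position` (how far into the primary's change log it has applied), is to pick the healthy replica that is furthest along. With asynchronous replication, any writes the chosen replica had not yet received are lost on promotion, so this choice limits the loss but does not eliminate it.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool
    applied_position: int  # hypothetical: last change-log position applied by this replica


def choose_promotion_candidate(replicas: list[Replica]) -> Replica:
    """Pick the healthy replica that has applied the most changes."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available for promotion")
    return max(candidates, key=lambda r: r.applied_position)


fleet = [
    Replica("replica-1", healthy=True, applied_position=1_040),
    Replica("replica-2", healthy=False, applied_position=1_050),  # most current, but down
    Replica("replica-3", healthy=True, applied_position=1_047),
]
print(choose_promotion_candidate(fleet).name)
```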

Simple Example

The Copy Center Analogy

Think of a popular textbook that every student in a university needs. Instead of having everyone line up at the single original (primary database), the university prints several copies and distributes them to different libraries across campus (read replicas). Students can read any copy at their nearest library without waiting. When the author releases an updated edition, the original is updated first, and new copies are distributed to the libraries (replication). There is a short period where some libraries still have the old edition (replication lag), but for most purposes, the content is close enough. If a student needs to verify the absolute latest version, they go to the main library where the original is kept (reading from primary).

Real-World Examples

Amazon RDS

Amazon's Relational Database Service supports up to 15 read replicas for MySQL and PostgreSQL, and up to 5 for Oracle. Amazon.com itself uses read replicas extensively for its product catalog -- the catalog is updated by sellers and internal systems (writes to primary), but browsed by millions of shoppers (reads from replicas across multiple regions). Aurora replicas share the primary's storage volume, so replica lag is typically well under 100 ms, making Aurora suitable for many read-after-write use cases.

Facebook

Facebook operates one of the largest MySQL deployments in the world, with hundreds of read replicas per primary database. Their TAO (The Associations and Objects) caching layer sits in front of the replicas, further reducing database load. The architecture handles trillions of read queries per day. Replication lag is managed through a custom consistency protocol that ensures users see their own recent actions immediately.

GitHub

GitHub uses MySQL with multiple read replicas per shard. When a developer pushes code or opens a pull request (write), the change goes to the primary. When other developers browse repositories, view diffs, or search code (reads), these queries are distributed across replicas. GitHub monitors replication lag in real time and will redirect reads to the primary if a replica falls too far behind, ensuring users do not see confusingly stale data.
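The general idea of lag-aware routing can be sketched as follows. This is not GitHub's implementation: the function name, the 1-second threshold, and the assumption that a monitoring system supplies per-replica lag in seconds are all hypothetical.

```python
import random


def pick_read_target(primary: str, replica_lag_s: dict[str, float],
                     max_lag_s: float = 1.0, rng=random) -> str:
    """Choose a replica whose lag is within the threshold; fall back to the primary otherwise."""
    eligible = [name for name, lag in replica_lag_s.items() if lag <= max_lag_s]
    if not eligible:
        # Every replica is too far behind, so accept extra primary load over stale reads.
        return primary
    return rng.choice(eligible)


# replica-2 is 30 s behind, so only replica-1 is eligible.
print(pick_read_target("primary", {"replica-1": 0.2, "replica-2": 30.0}))
```

Falling back to the primary protects freshness at the cost of primary load, so a real deployment would likely also cap how much traffic may be redirected this way.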

Trade-Offs

Aspect	Description
Read Scalability vs Consistency	Asynchronous read replicas provide linear read scalability but serve stale data during replication lag. Synchronous replicas provide consistent reads but increase write latency and reduce write throughput. The choice depends on whether the application can tolerate eventual consistency for read queries.
Infrastructure Cost vs Performance	Each read replica is a full database instance requiring compute, memory, and storage resources. The cost scales linearly with the number of replicas. For workloads with moderate read traffic, a single database with proper indexing and caching may be more cost-effective than maintaining multiple replicas.
Operational Overhead	Each replica must be monitored for replication lag, disk space, query performance, and health. Schema migrations must be coordinated across all replicas. Replica promotion for failover must be tested and automated. The operational burden grows with each additional replica.
Query Routing Complexity	The application or a database proxy must decide which queries go to the primary and which go to replicas. This routing logic must handle edge cases like read-your-writes consistency, transaction isolation, and replica failover. Incorrect routing can lead to stale reads or unnecessary primary load.
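To make the routing edge cases concrete, here is a minimal sketch of a statement-level router. It is deliberately naive: it classifies SQL by its leading keyword, keeps everything inside a transaction on the primary, and treats `SELECT ... FOR UPDATE` as a write. The `StatementRouter` class is hypothetical, and keyword inspection misses cases such as functions with side effects or data-modifying CTEs, which is part of why this logic is harder than it looks.

```python
class StatementRouter:
    """Route SQL by leading keyword; statements inside a transaction stay on the primary."""

    def __init__(self):
        self.in_transaction = False

    def route(self, sql: str) -> str:
        # Uppercase only for classification; the statement itself is not modified.
        words = sql.strip().rstrip(";").upper().split()
        if not words:
            raise ValueError("empty statement")
        keyword = words[0]
        if keyword in ("BEGIN", "START"):
            self.in_transaction = True
            return "primary"
        if keyword in ("COMMIT", "ROLLBACK"):
            self.in_transaction = False
            return "primary"
        if self.in_transaction:
            # Reads inside a transaction must see that transaction's own writes.
            return "primary"
        locking_read = words[-2:] == ["FOR", "UPDATE"]
        if keyword == "SELECT" and not locking_read:
            return "replica"
        return "primary"


router = StatementRouter()
print(router.route("SELECT name FROM products WHERE id = 1"))
print(router.route("UPDATE products SET stock = stock - 1 WHERE id = 1"))
```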

Case Study

Shopify's Read Replica Strategy for Black Friday Traffic

Scenario

Shopify hosts over 2 million online stores that collectively experience a massive traffic spike during Black Friday and Cyber Monday. The read-to-write ratio during this period exceeds 200:1 as millions of shoppers browse product pages, check inventory, and view cart totals, while the much smaller number of actual purchases generate writes. The primary MySQL databases for each shard were approaching their read capacity limits during previous peak events.

Solution

Shopify added multiple read replicas to each MySQL shard and implemented intelligent query routing in their Rails application. Product catalog reads, inventory availability checks (with a tolerance for seconds-old data), and store configuration reads were routed to replicas. Order creation, payment processing, and inventory deduction writes were routed to the primary. A custom middleware tracked recent writes per user session and temporarily routed that user's reads to the primary for 5 seconds after each write, implementing read-your-writes consistency without requiring synchronous replication.
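The session-pinning idea described above can be sketched in a few lines. This is an illustration of the technique only, not Shopify's middleware (which lives in their Rails application): the `SessionPinningRouter` class, its method names, and the in-memory dictionary are assumptions. The clock is injectable so the window can be tested without sleeping.

```python
import time


class SessionPinningRouter:
    """Pin a session's reads to the primary for a short window after each write."""

    def __init__(self, pin_seconds: float = 5.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock
        # session_id -> time of the session's most recent write.
        # A production version would also expire old entries to bound memory.
        self._last_write = {}

    def record_write(self, session_id: str) -> None:
        self._last_write[session_id] = self.clock()

    def route_read(self, session_id: str) -> str:
        last = self._last_write.get(session_id)
        if last is not None and self.clock() - last < self.pin_seconds:
            return "primary"
        return "replica"


router = SessionPinningRouter()
router.record_write("session-alice")
print(router.route_read("session-alice"))  # pinned to the primary right after a write
print(router.route_read("session-bob"))    # no recent write, so a replica is fine
```

The window only needs to exceed the worst replication lag the application expects; a fixed 5 seconds trades a little extra primary load for simplicity.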

Outcome

The read replica strategy allowed Shopify to handle 4x more read traffic during Black Friday 2023 compared to the previous year without adding primary database capacity. Replication lag remained under 100 milliseconds for 99.9% of the peak period. The primary databases experienced 60% less CPU utilization because the bulk of read queries were offloaded to replicas. The cost of the additional replicas was a fraction of what it would have cost to vertically scale the primary databases to handle the same read volume.

Common Mistakes

⚠ Sending all reads to replicas without considering replication lag. Critical reads (account balance after a deposit, order status after placement) must go to the primary to ensure the user sees their most recent changes. Only route reads to replicas when eventual consistency is acceptable.
⚠ Not monitoring replication lag. A replica that falls minutes behind the primary serves badly stale data and may cause confusing user experiences (items appearing in stock that are sold out, deleted comments still showing). Alert when lag exceeds an acceptable threshold and either redirect traffic or investigate the cause.
⚠ Using read replicas as a substitute for query optimization. If slow queries are overwhelming the primary, adding replicas just distributes the slow queries across more machines. Fix the queries first (add indexes, optimize joins, reduce result sets), then add replicas if read volume is still the bottleneck.
⚠ Neglecting replica promotion testing. If the primary fails and a replica must be promoted, the process should be well-tested and automated. An untested promotion procedure during a production incident leads to extended downtime and potential data loss.

Related Concepts

Database Sharding, Connection Pooling, Horizontal vs Vertical Scaling, Stateless Service Design, Auto-Scaling & Elasticity

Route read traffic to replicas and measure lag

Metrics to watch

replication_lag_ms, read_throughput_rps, stale_read_pct, replica_utilization_pct

Test Your Understanding

1. A user deposits money into their bank account and immediately checks their balance. The balance shows the old amount. What read consistency pattern would fix this?

2. An application has 3 read replicas behind a load balancer. During a peak write burst, replication lag on Replica 3 grows to 30 seconds while Replicas 1 and 2 remain under 1 second. What should the query routing layer do?
