1An application's write-heavy database is at CPU capacity on the largest available instance. Adding read replicas has no effect. What is the most appropriate next step?
Understand the two fundamental approaches to scaling systems: adding more machines (horizontal) versus upgrading existing machines (vertical). Learn when to use each strategy and the trade-offs involved.
Scaling is the ability of a system to handle increasing amounts of work by adding resources. There are two fundamental approaches: vertical scaling (scaling up) and horizontal scaling (scaling out). Vertical scaling means upgrading an existing machine with more powerful hardware -- faster CPUs, more RAM, larger SSDs. Horizontal scaling means adding more machines to your pool, distributing the workload across multiple servers that collectively handle the traffic.
Vertical scaling is the simpler approach. When your database is slow, you move it to a server with 256 GB of RAM instead of 64 GB. When your API server is CPU-bound, you upgrade to a machine with more cores. The application code does not need to change because it still runs on a single machine. However, vertical scaling has a hard ceiling: there is a maximum amount of CPU, RAM, and I/O that any single machine can provide, and the cost grows superlinearly as you approach the top-end hardware specifications.
Horizontal scaling, by contrast, adds capacity by deploying more instances of your application behind a load balancer. There is no theoretical upper limit -- you can keep adding machines as demand grows. However, horizontal scaling introduces significant complexity. Your application must be designed to run across multiple machines, which means handling state distribution, session management, data consistency, and inter-node communication. Not all workloads can be easily parallelized, and some operations (like transactions on a single database) may still require vertical scaling as a component of a larger horizontally-scaled system.
In practice, most production systems use a combination of both strategies. Individual components are vertically scaled to a practical limit, and the overall system is horizontally scaled by running multiple instances of each component. Understanding when and how to apply each approach is one of the most important skills in system design.
The Restaurant Analogy
Think of a busy restaurant. Vertical scaling is like replacing your chef with a faster, more experienced chef who can cook more dishes per hour. Eventually, even the best chef reaches their limit. Horizontal scaling is like hiring additional chefs and setting up more cooking stations. Each chef works independently on different orders, and a host (load balancer) distributes incoming orders across all available chefs. You can keep adding chefs as demand grows, but you need a system to coordinate them -- making sure the same order is not prepared twice and that each chef has access to the ingredients they need.
Netflix
Netflix horizontally scales its streaming service across thousands of stateless microservice instances running in AWS. Each service handles a portion of the traffic, and auto-scaling adjusts instance counts based on demand. However, their Cassandra database clusters are also vertically scaled with high-memory instances to maintain low-latency reads for the personalization engine.
Slack
Slack initially ran on a vertically-scaled MySQL database. As they grew to millions of users, they transitioned to a horizontally-sharded MySQL architecture called Vitess. The application tier was already horizontally scaled, but the database tier required significant re-architecture to move from vertical to horizontal scaling.
Stack Overflow
Stack Overflow famously handles millions of daily requests with a relatively small number of powerful, vertically-scaled servers. Their SQL Server database runs on high-end hardware with large amounts of RAM and fast NVMe storage. This demonstrates that vertical scaling can be effective when the workload fits within the ceiling of available hardware and the operational simplicity is valued.
| Aspect | Description |
|---|---|
| Complexity | Vertical scaling is operationally simple -- a single server to monitor, patch, and back up. Horizontal scaling introduces distributed system challenges: network partitions, data consistency, service discovery, and load balancing configuration. |
| Cost Efficiency | Horizontal scaling offers better cost efficiency at large scale because commodity hardware is cheaper per unit of compute than high-end servers. However, for small to medium workloads, a single powerful server can be more cost-effective than managing a cluster of smaller machines. |
| Availability | Horizontal scaling provides natural high availability -- if one machine fails, others continue serving traffic. Vertical scaling creates a single point of failure unless paired with a standby replica, which adds cost and complexity. |
| Data Consistency | Vertical scaling on a single database provides strong ACID consistency by default. Horizontal scaling typically requires accepting eventual consistency, implementing distributed transactions, or using consensus protocols, all of which add latency and operational burden. |
Twitter's Migration from Vertical to Horizontal Scaling
Scenario
In its early years, Twitter ran on a monolithic Ruby on Rails application backed by a single MySQL database. As the platform grew from thousands to millions of users, the database became the bottleneck. Vertical scaling was applied first -- upgrading to increasingly powerful database servers -- but traffic growth outpaced hardware improvements, leading to the infamous 'fail whale' error page during peak events.
Solution
Twitter re-architected its system from a monolith to a horizontally-scaled microservices architecture. They decomposed the application into independent services (timeline, user, tweet, social graph), each with its own data store. The tweet storage was sharded across a fleet of MySQL instances using Gizzard, their custom sharding framework. The timeline service used Redis clusters for pre-materialized feeds. Stateless API servers were horizontally scaled behind load balancers with auto-scaling based on traffic patterns.
Outcome
The horizontally-scaled architecture allowed Twitter to handle over 500 million tweets per day and serve billions of timeline reads. The fail whale became a rarity. Each service could be scaled independently based on its specific load characteristics, and individual component failures no longer brought down the entire platform. The transition took several years and required significant engineering investment, illustrating the substantial cost of migrating from vertical to horizontal scaling after initial design decisions have been made.
See Horizontal vs Vertical Scaling in action
Explore system design templates that use horizontal vs vertical scaling and run traffic simulations to see how these concepts perform under real load.
Browse Templates1An application's write-heavy database is at CPU capacity on the largest available instance. Adding read replicas has no effect. What is the most appropriate next step?
2According to Amdahl's Law, if 10% of a workload is inherently serial, what is the maximum speedup achievable by adding an infinite number of parallel processors?