Vetora logo
๐Ÿ“ˆScalability

Horizontal vs Vertical Scaling

Understand the two fundamental approaches to scaling systems: adding more machines (horizontal) versus upgrading existing machines (vertical). Learn when to use each strategy and the trade-offs involved.

Overview

Scaling is the ability of a system to handle increasing amounts of work by adding resources. There are two fundamental approaches: vertical scaling (scaling up) and horizontal scaling (scaling out). Vertical scaling means upgrading an existing machine with more powerful hardware -- faster CPUs, more RAM, larger SSDs. Horizontal scaling means adding more machines to your pool, distributing the workload across multiple servers that collectively handle the traffic.

Vertical scaling is the simpler approach. When your database is slow, you move it to a server with 256 GB of RAM instead of 64 GB. When your API server is CPU-bound, you upgrade to a machine with more cores. The application code does not need to change because it still runs on a single machine. However, vertical scaling has a hard ceiling: there is a maximum amount of CPU, RAM, and I/O that any single machine can provide, and the cost grows superlinearly as you approach the top-end hardware specifications.

Horizontal scaling, by contrast, adds capacity by deploying more instances of your application behind a load balancer. There is no theoretical upper limit -- you can keep adding machines as demand grows. However, horizontal scaling introduces significant complexity. Your application must be designed to run across multiple machines, which means handling state distribution, session management, data consistency, and inter-node communication. Not all workloads can be easily parallelized, and some operations (like transactions on a single database) may still require vertical scaling as a component of a larger horizontally-scaled system.

In practice, most production systems use a combination of both strategies. Individual components are vertically scaled to a practical limit, and the overall system is horizontally scaled by running multiple instances of each component. Understanding when and how to apply each approach is one of the most important skills in system design.

Key Points
  • 1Vertical scaling (scaling up) adds more power to a single machine -- more CPU, RAM, or storage. It requires no code changes but has a hardware ceiling and creates a single point of failure.
  • 2Horizontal scaling (scaling out) adds more machines to distribute load. It has no theoretical ceiling but requires stateless application design and introduces distributed system complexity.
  • 3Vertical scaling cost grows superlinearly: a machine with 2x the RAM often costs 3-4x the price. Horizontal scaling cost grows linearly -- 2x the machines costs 2x the price.
  • 4Database scaling is often the bottleneck. Reads can be horizontally scaled with replicas, but writes typically require vertical scaling or sharding, which adds significant complexity.
  • 5Stateless services are trivial to scale horizontally. Stateful services (databases, caches, session stores) require careful architectural decisions around data partitioning and replication.
  • 6Most cloud providers make horizontal scaling easier through auto-scaling groups, managed load balancers, and container orchestration platforms like Kubernetes.
Simple Example

The Restaurant Analogy

Think of a busy restaurant. Vertical scaling is like replacing your chef with a faster, more experienced chef who can cook more dishes per hour. Eventually, even the best chef reaches their limit. Horizontal scaling is like hiring additional chefs and setting up more cooking stations. Each chef works independently on different orders, and a host (load balancer) distributes incoming orders across all available chefs. You can keep adding chefs as demand grows, but you need a system to coordinate them -- making sure the same order is not prepared twice and that each chef has access to the ingredients they need.

Real-World Examples

Netflix

Netflix horizontally scales its streaming service across thousands of stateless microservice instances running in AWS. Each service handles a portion of the traffic, and auto-scaling adjusts instance counts based on demand. However, their Cassandra database clusters are also vertically scaled with high-memory instances to maintain low-latency reads for the personalization engine.

Slack

Slack initially ran on a vertically-scaled MySQL database. As they grew to millions of users, they transitioned to a horizontally-sharded MySQL architecture called Vitess. The application tier was already horizontally scaled, but the database tier required significant re-architecture to move from vertical to horizontal scaling.

Stack Overflow

Stack Overflow famously handles millions of daily requests with a relatively small number of powerful, vertically-scaled servers. Their SQL Server database runs on high-end hardware with large amounts of RAM and fast NVMe storage. This demonstrates that vertical scaling can be effective when the workload fits within the ceiling of available hardware and the operational simplicity is valued.

Trade-Offs
AspectDescription
ComplexityVertical scaling is operationally simple -- a single server to monitor, patch, and back up. Horizontal scaling introduces distributed system challenges: network partitions, data consistency, service discovery, and load balancing configuration.
Cost EfficiencyHorizontal scaling offers better cost efficiency at large scale because commodity hardware is cheaper per unit of compute than high-end servers. However, for small to medium workloads, a single powerful server can be more cost-effective than managing a cluster of smaller machines.
AvailabilityHorizontal scaling provides natural high availability -- if one machine fails, others continue serving traffic. Vertical scaling creates a single point of failure unless paired with a standby replica, which adds cost and complexity.
Data ConsistencyVertical scaling on a single database provides strong ACID consistency by default. Horizontal scaling typically requires accepting eventual consistency, implementing distributed transactions, or using consensus protocols, all of which add latency and operational burden.
Case Study

Twitter's Migration from Vertical to Horizontal Scaling

Scenario

In its early years, Twitter ran on a monolithic Ruby on Rails application backed by a single MySQL database. As the platform grew from thousands to millions of users, the database became the bottleneck. Vertical scaling was applied first -- upgrading to increasingly powerful database servers -- but traffic growth outpaced hardware improvements, leading to the infamous 'fail whale' error page during peak events.

Solution

Twitter re-architected its system from a monolith to a horizontally-scaled microservices architecture. They decomposed the application into independent services (timeline, user, tweet, social graph), each with its own data store. The tweet storage was sharded across a fleet of MySQL instances using Gizzard, their custom sharding framework. The timeline service used Redis clusters for pre-materialized feeds. Stateless API servers were horizontally scaled behind load balancers with auto-scaling based on traffic patterns.

Outcome

The horizontally-scaled architecture allowed Twitter to handle over 500 million tweets per day and serve billions of timeline reads. The fail whale became a rarity. Each service could be scaled independently based on its specific load characteristics, and individual component failures no longer brought down the entire platform. The transition took several years and required significant engineering investment, illustrating the substantial cost of migrating from vertical to horizontal scaling after initial design decisions have been made.

Common Mistakes
  • โš Premature horizontal scaling: Adding complexity by distributing a system across multiple machines before a single well-optimized server has reached its limits. Profile and optimize first; distribute second.
  • โš Ignoring the database: Horizontally scaling the application tier while leaving the database as a vertically-scaled single instance. The database invariably becomes the bottleneck, and retrofitting sharding is far more difficult than planning for it.
  • โš Stateful services behind load balancers: Deploying services that store session state in local memory behind a round-robin load balancer. When requests are routed to a different instance, the state is lost. Always externalize state to a shared store (Redis, database) before horizontally scaling.
  • โš Assuming linear performance gains: Expecting that doubling the number of servers will double throughput. Amdahl's Law dictates that the serial portion of a workload limits the maximum speedup from parallelization. Communication overhead between nodes also reduces effective throughput.
Related Concepts

See Horizontal vs Vertical Scaling in action

Explore system design templates that use horizontal vs vertical scaling and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Scale an e-commerce service horizontally under peak traffic

Metrics to watch
throughput_rpsp99_latency_mscpu_utilization_pctinstance_count
Run Simulation
Test Your Understanding

1An application's write-heavy database is at CPU capacity on the largest available instance. Adding read replicas has no effect. What is the most appropriate next step?

2According to Amdahl's Law, if 10% of a workload is inherently serial, what is the maximum speedup achievable by adding an infinite number of parallel processors?

Deeper Reading