Capacity planning is the discipline of determining the compute, storage, network, and memory resources needed to serve a system's expected workload at target performance levels. It combines traffic forecasting, performance modeling, and infrastructure provisioning to ensure systems can handle peak loads without degradation.
Capacity planning is one of the most practical skills in system design, bridging the gap between theoretical architecture and real infrastructure decisions. The core question is deceptively simple: how many servers, how much storage, and how much bandwidth do you need? The answer requires combining business requirements (expected users, growth rate, peak events), technical benchmarks (how much traffic each server can handle), and operational safety margins (headroom for spikes, failures, and maintenance).
The process starts with demand estimation. How many users will the system serve? What is the read-to-write ratio? What is the peak-to-average traffic ratio? For a social media feed, peak traffic might be 3-5x average (during major events). For an e-commerce site, Black Friday might be 10-20x normal. These multipliers determine whether you plan for 10,000 RPS or 200,000 RPS -- a 20x difference in infrastructure cost.
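The peak-to-average arithmetic can be sketched in a few lines of Python. This is a minimal sketch. It assumes demand is given as total requests per day, spread over 86,400 seconds. The function names and the 10,000 RPS average workload are hypothetical illustrations, not figures from a real system.

```python
SECONDS_PER_DAY = 86_400

def average_rps(requests_per_day: float) -> float:
    """Average requests per second if daily traffic were spread evenly."""
    return requests_per_day / SECONDS_PER_DAY

def peak_rps(requests_per_day: float, peak_to_average: float) -> float:
    """Peak RPS: the average rate scaled by the peak-to-average multiplier."""
    return average_rps(requests_per_day) * peak_to_average

# Hypothetical service averaging 10,000 RPS (864M requests/day).
daily_requests = 10_000 * SECONDS_PER_DAY
for label, multiplier in [("flat traffic", 1), ("social feed", 5), ("Black Friday", 20)]:
    print(f"{label}: plan for {peak_rps(daily_requests, multiplier):,.0f} RPS")
```

The same daily volume turns into 10,000, 50,000, or 200,000 RPS of required peak capacity, depending only on the multiplier.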
Next, you benchmark individual components. A single PostgreSQL instance might handle 10,000 simple queries per second. A Redis node might handle 100,000 operations per second. A single application server might handle 500 RPS for a typical API. You divide the peak demand by per-component capacity to determine how many instances you need, then add headroom -- typically 30-50% above peak to handle unexpected spikes, rolling deployments (where some instances are temporarily unavailable), and organic growth before the next planning cycle.
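Dividing peak demand by per-instance capacity and then adding headroom fits in a small helper. This is a sketch under stated assumptions. The 50,000 RPS peak is hypothetical, and the per-tier capacities are the rough benchmarks quoted above. It also (unrealistically) assumes that each request costs exactly one operation on every tier. The helper name `instances_needed` is illustrative.

```python
def instances_needed(peak_rps: int, per_instance_rps: int, headroom_pct: int = 30) -> int:
    """Instances to serve peak_rps with headroom_pct percent of spare capacity.

    Works in integers: with floats, rounding error can push an exact multiple
    slightly above a whole number, and the ceiling would then add a spurious
    extra instance.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    required = peak_rps * (100 + headroom_pct)   # demand incl. headroom, x100
    capacity = 100 * per_instance_rps            # per-instance capacity, x100
    return -(-required // capacity)              # integer ceiling division

# Rough benchmarks from the text; hypothetical 50,000 RPS peak, and an
# assumption that every request performs one operation on each tier.
benchmarks = {"app server": 500, "PostgreSQL": 10_000, "Redis": 100_000}
peak = 50_000
for tier, per_instance in benchmarks.items():
    print(f"{tier}: {instances_needed(peak, per_instance)} instances at 30% headroom, "
          f"{instances_needed(peak, per_instance, headroom_pct=50)} at 50%")
```

With these inputs the app tier needs 130 instances at 30% headroom, while PostgreSQL needs 7 and Redis needs 1. In practice, the number of queries each request issues would change the database figures.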
Storage capacity planning adds a time dimension. If each user generates 10KB of data per day and you have 10 million users, that is 100GB per day, or roughly 36.5TB per year, before replication. With 3x replication and 2 years of retention, you need roughly 220TB of storage. These calculations seem simple, but the details matter: data compression ratios, index overhead, WAL storage, backup retention, and tombstone accumulation all affect the real number. The best capacity plans are living documents, reviewed quarterly against actual usage and adjusted based on observed growth curves.
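The raw storage estimate in the previous paragraph can be reproduced as follows. This is a minimal sketch assuming decimal units (1KB = 1,000 bytes), matching the text. It deliberately leaves out the compression, index, WAL, backup, and tombstone adjustments listed above. The function name is hypothetical.

```python
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units, as in the text

def raw_storage_bytes(users: int, bytes_per_user_per_day: int,
                      retention_days: int, replication: int = 3) -> int:
    """Bytes needed to keep retention_days of data with `replication` copies.

    Deliberately ignores compression, index overhead, WAL, backups and
    tombstones; those adjustments are what make the real number differ.
    """
    daily_bytes = users * bytes_per_user_per_day
    return daily_bytes * retention_days * replication

daily = 10_000_000 * 10 * KB
total = raw_storage_bytes(10_000_000, 10 * KB, retention_days=2 * 365)
print(f"{daily / GB:.0f} GB/day, {daily * 365 / TB:.1f} TB/year, {total / TB:.0f} TB total")
```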
Planning a Photo-Sharing Service
You are building a photo-sharing service expecting 10 million daily active users. Each user uploads 2 photos/day (avg 2MB each) and views 50 photos/day. Write throughput: 10M * 2 = 20M photos/day = ~230 photos/sec. Read throughput: 10M * 50 = 500M reads/day = ~5,800 reads/sec. Storage per day: 20M * 2MB = 40TB. Per year with 3x replication: 40TB * 365 * 3 = 43.8PB. If a single storage node holds 10TB, you need ~4,380 storage nodes for one year of data. This back-of-envelope calculation reveals that storage -- not compute -- is the primary infrastructure cost.
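The photo-sharing estimate can be written out as a script, so each input can be changed independently. The inputs are exactly those in the scenario above. The sketch assumes decimal units and a uniform 24-hour spread of traffic. It ignores thumbnails, metadata, and compression, which the scenario does not mention.

```python
SECONDS_PER_DAY = 86_400
MB, TB, PB = 10**6, 10**12, 10**15  # decimal units

dau = 10_000_000
uploads_per_user, photo_size = 2, 2 * MB
views_per_user = 50
replication, node_capacity = 3, 10 * TB

write_rps = dau * uploads_per_user / SECONDS_PER_DAY   # ~231 photos/sec
read_rps = dau * views_per_user / SECONDS_PER_DAY      # ~5,787 reads/sec
daily_bytes = dau * uploads_per_user * photo_size      # 40 TB/day
yearly_bytes = daily_bytes * 365 * replication         # 43.8 PB with replicas
nodes = -(-yearly_bytes // node_capacity)              # ceiling division

print(f"writes: {write_rps:,.0f}/s, reads: {read_rps:,.0f}/s")
print(f"storage: {daily_bytes / TB:.0f} TB/day, {yearly_bytes / PB:.1f} PB/year, {nodes:,} nodes")
```

The read rate is 25 times the write rate, yet the node count is driven by accumulated bytes rather than requests per second.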
Netflix
Netflix capacity plans for 'peak of peak' -- the highest traffic point during the highest-traffic day of the year (typically a major series premiere on a Sunday evening). They provision enough capacity to handle this peak plus 30% headroom across multiple AWS regions. Their Titus container platform pre-warms instances before anticipated traffic spikes based on show premiere schedules.
Twitter/X
Twitter experiences massive traffic spikes during global events (Super Bowl, elections, breaking news). Their capacity planning models include a 'major event multiplier' of 10-20x normal traffic for tweet reads and 5-10x for tweet writes. During the 2014 World Cup, they pre-provisioned 10x normal capacity in anticipation of peak moments like goals.
Slack
Slack's capacity planning accounts for the 'Monday morning spike' -- when millions of users sign in simultaneously at 9 AM across time zones. They discovered that connection establishment (WebSocket handshake, authentication, channel list loading) is 5-10x more expensive than steady-state message delivery, requiring disproportionate capacity for the daily sign-in wave.
| Aspect | Description |
|---|---|
| Over-provisioning vs Under-provisioning | Over-provisioning wastes money but ensures reliability. Under-provisioning saves money but risks outages during traffic spikes. The cost of an outage (lost revenue, SLA penalties, brand damage) typically dwarfs the cost of extra servers. Most teams err on the side of over-provisioning by 30-50%. |
| Reserved vs On-Demand Capacity | Reserved instances (1-3 year commitments) cost 30-60% less than on-demand but lock you into a capacity estimate. The optimal strategy: reserve capacity for your baseline load (the minimum you always need) and use on-demand or spot instances for peak and burst traffic. |
| Horizontal vs Vertical Scaling | Vertical scaling (bigger machines) is simpler but has a ceiling and a single point of failure. Horizontal scaling (more machines) can grow much further but adds complexity (load balancing, data distribution, consensus). Capacity planning for horizontal systems must account for coordination overhead: adding an 11th node to a 10-node cluster does not add a full 1/10th more capacity. |
| Precision vs Speed of Planning | Detailed capacity models (simulating every component, modeling queueing delays) are accurate but time-consuming. Back-of-envelope estimates (quick multiplication with safety margins) are fast but rough. Use back-of-envelope for initial design and detailed modeling when the cost difference between estimates justifies the analysis time. |
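The reserved-versus-on-demand trade-off can be explored with a toy cost model. This is a sketch under stated assumptions. The demand curve is hypothetical, and the on-demand price is normalized to 1.0 per instance-hour. The 40% reserved discount falls inside the 30-60% range above. Reserved instances are billed every hour whether used or not. Upfront-payment terms and real provider pricing are not modeled.

```python
def daily_cost(hourly_demand: list[int], reserved: int,
               on_demand_price: float = 1.0, reserved_discount: float = 0.4) -> float:
    """One day's cost with `reserved` committed instances plus on-demand overflow.

    Reserved instances are billed every hour whether used or not; demand
    above the reserved count is billed at the on-demand price.
    """
    reserved_price = on_demand_price * (1 - reserved_discount)
    cost = 0.0
    for demand in hourly_demand:
        cost += reserved * reserved_price                    # committed, always paid
        cost += max(0, demand - reserved) * on_demand_price  # burst overflow
    return cost

# Hypothetical day: 20 instances for 12 hours, 40 for 8 hours, 60 at peak for 4.
demand = [20] * 12 + [40] * 8 + [60] * 4
for reserved in (0, 20, 40, 60):
    print(f"reserve {reserved:>2}: cost {daily_cost(demand, reserved):.0f}")
```

Here, reserving the 20-instance baseline is cheapest (608, compared with 800 all on-demand and 864 when reserving for peak). An extra reserved instance only pays off if it is busy for more than (1 - discount) of the hours, which is 60% at a 40% discount. The 40-instance level is needed for only 12 of 24 hours, so renting that capacity on demand is cheaper.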
Slack's 'Monday Morning Thundering Herd' Problem
Scenario
Every Monday morning, millions of Slack users reconnected simultaneously between 8 and 9 AM across time zones. The reconnection storm involved WebSocket handshakes, authentication token validation, channel membership queries, and message history fetches. This 'thundering herd' pattern generated 5-10x normal traffic within a 30-minute window, consistently degrading performance and occasionally causing partial outages.
Solution
Slack implemented a multi-layered capacity strategy: (1) pre-provisioned extra instances before the Monday morning window using scheduled auto-scaling; (2) introduced client-side jitter to spread reconnections over a wider window (instead of all clients reconnecting at exactly 9:00, they added random 0-60 second delays); (3) cached channel membership and recent messages aggressively so reconnections did not hit the database; (4) implemented connection coalescing where a single server-to-database connection served multiple user sessions.
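The effect of client-side jitter on the reconnection spike can be illustrated with a toy simulation. It is not Slack's implementation. The client count is hypothetical. Delays are drawn as whole seconds, uniformly from 0 to 60, to match the range described above. The simulation measures only the busiest one-second bucket of reconnect attempts.

```python
import random
from collections import Counter

def peak_per_second(n_clients: int, max_jitter_s: int, seed: int = 42) -> int:
    """Largest number of reconnect attempts landing in any one-second bucket.

    Each client waits a uniformly random whole number of seconds in
    [0, max_jitter_s] before reconnecting; max_jitter_s=0 means no jitter.
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    buckets = Counter(rng.randint(0, max_jitter_s) for _ in range(n_clients))
    return max(buckets.values())

clients = 120_000  # hypothetical clients whose sessions all resume at 9:00:00
print("no jitter:", peak_per_second(clients, 0))
print("0-60 s jitter:", peak_per_second(clients, 60))
```

Without jitter, every attempt lands in the same second. With a 61-second window, the busiest second sees roughly 1/61 of the clients plus some random variation. The peak that capacity must absorb therefore drops by well over an order of magnitude, without adding a single server.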
Outcome
Monday morning peak traffic was reduced from 10x to 3x normal load through client-side jitter. Pre-provisioned capacity handled the remaining spike without degradation. Database load during reconnection dropped 80% due to caching. The key insight was that capacity planning is not just about adding servers -- reshaping the demand curve (jitter) is often more effective and cheaper than provisioning for the unmodified peak.
1. A service expects 100M requests per day with a 5x peak-to-average ratio. If each server handles 2,000 RPS, how many servers are needed for peak traffic (before headroom)?
2. Why is planning for average traffic instead of peak traffic a critical mistake?