Capacity Planning

Capacity planning is the discipline of determining the compute, storage, network, and memory resources needed to serve a system's expected workload at target performance levels. It combines traffic forecasting, performance modeling, and infrastructure provisioning to ensure systems can handle peak loads without degradation.

Overview

Capacity planning is one of the most practical skills in system design, bridging the gap between theoretical architecture and real infrastructure decisions. The core question is deceptively simple: how many servers, how much storage, and how much bandwidth do you need? The answer requires combining business requirements (expected users, growth rate, peak events), technical benchmarks (how much traffic each server can handle), and operational safety margins (headroom for spikes, failures, and maintenance).

The process starts with demand estimation. How many users will the system serve? What is the read-to-write ratio? What is the peak-to-average traffic ratio? For a social media feed, peak traffic during major events might be 3-5x average. For an e-commerce site, Black Friday might be 10-20x normal. These multipliers determine whether you plan for 10,000 RPS or 200,000 RPS -- a 20x difference in required capacity and, to a first approximation, in infrastructure cost.
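As a rough illustration of this demand estimate, the sketch below converts a daily request count into average and peak RPS. The function names and the 864 million requests/day figure are assumptions, chosen only so that the average works out to the 10,000 RPS mentioned above.

```python
SECONDS_PER_DAY = 86_400


def average_rps(daily_requests: int) -> float:
    """Average requests per second over a full day."""
    return daily_requests / SECONDS_PER_DAY


def peak_rps(daily_requests: int, peak_to_average: float) -> float:
    """Peak requests per second for a given peak-to-average multiplier."""
    return average_rps(daily_requests) * peak_to_average


# Hypothetical workload: 864M requests/day averages out to 10,000 RPS.
daily = 864_000_000
for multiplier in (1, 3, 5, 20):
    print(f"{multiplier:>2}x peak: {peak_rps(daily, multiplier):,.0f} RPS")
```

Running the loop with different multipliers makes it easy to see how sensitive the plan is to the peak-to-average assumption, which is usually the least certain input.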

Next, you benchmark individual components. A single PostgreSQL instance might handle 10,000 simple queries per second. A Redis node might handle 100,000 operations per second. A single application server might handle 500 RPS for a typical API. You divide the peak demand by per-component capacity to determine how many instances you need, then add headroom -- typically 30-50% above peak to handle unexpected spikes, rolling deployments (where some instances are temporarily unavailable), and organic growth before the next planning cycle.
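The division-plus-headroom step can be sketched as follows. The per-instance figures are the illustrative ones from the paragraph above; the 50,000 RPS peak, the 50% headroom, and the assumption that each request costs one database query and one cache operation are hypothetical. The sketch also treats every tier as horizontally scalable, which is a simplification for a relational database.

```python
def instances_needed(peak_rps: int, per_instance_rps: int, headroom_pct: int) -> int:
    """Instances required for peak load plus headroom, rounded up.

    Integer arithmetic avoids floating-point surprises around the ceiling.
    """
    demand = peak_rps * (100 + headroom_pct)
    capacity = per_instance_rps * 100
    return -(-demand // capacity)  # ceiling division


# Assumed peak; per-instance capacities are the example figures from the text.
PEAK_RPS = 50_000
tiers = {"app server": 500, "PostgreSQL": 10_000, "Redis": 100_000}

plan = {
    name: instances_needed(PEAK_RPS, capacity, headroom_pct=50)
    for name, capacity in tiers.items()
}
print(plan)
```

Rounding up matters at the small end: a tier that needs 7.5 instances must be provisioned with 8, and a tier that needs 0.75 still needs one (and in practice at least two for redundancy).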

Storage capacity planning adds a time dimension. If each user generates 10KB of data per day and you have 10 million users, that is 100GB per day, or about 36.5TB per year, before replication. With 3x replication and 2 years of retention, you need about 219TB of storage. These calculations seem simple but the details matter -- data compression ratios, index overhead, WAL storage, backup retention, and tombstone accumulation all affect the real number. The best capacity plans are living documents, reviewed quarterly against actual usage and adjusted based on observed growth curves.
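The storage arithmetic above can be checked with a short sketch. It uses decimal units (1TB = 1,000GB); the function name and the 20% overhead used in the second call are assumptions for illustration, not measured values.

```python
def total_storage_tb(
    daily_gb: float,
    retention_days: int,
    replication_factor: int,
    overhead_fraction: float = 0.0,
) -> float:
    """Total storage in TB after retention, replication, and overhead."""
    raw_gb = daily_gb * retention_days
    return raw_gb * replication_factor * (1 + overhead_fraction) / 1_000


# 10 million users * 10KB/user/day = 100GB/day (decimal units).
daily_gb = 10_000_000 * 10 / 1_000_000

one_year_raw = total_storage_tb(daily_gb, 365, replication_factor=1)
two_years_3x = total_storage_tb(daily_gb, 730, replication_factor=3)
# Hypothetical 20% allowance for indexes, WAL, and similar overhead.
with_overhead = total_storage_tb(daily_gb, 730, 3, overhead_fraction=0.2)

print(f"1 year, unreplicated: {one_year_raw:.1f} TB")
print(f"2 years, 3x replication: {two_years_3x:.1f} TB")
print(f"2 years, 3x, +20% overhead: {with_overhead:.1f} TB")
```

Keeping overhead as an explicit parameter makes it easy to replace the guess with a ratio observed in production at the next quarterly review.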

Key Points
  1. Always plan for peak traffic, not average. A system that handles average load but collapses under peak is a system that fails when it matters most. Common peak-to-average ratios: 2-3x for enterprise SaaS, 5-10x for consumer apps, 10-50x for event-driven systems (ticket sales, flash sales).
  2. Use the formula: nodes_needed = (peak_QPS / per_node_QPS) * (1 + headroom_fraction). With peak_QPS=50,000, per_node_QPS=1,000, and 50% headroom: nodes = (50000/1000) * 1.5 = 75 nodes.
  3. Storage planning must account for replication, retention, and growth: total_storage = daily_data * retention_days * replication_factor * (1 + index_overhead). A 3x replication factor triples your raw storage needs.
  4. Network bandwidth is often the forgotten constraint. A service returning 50KB responses at 10,000 RPS needs 500MB/s = 4Gbps of network throughput. This exceeds a single 1Gbps NIC and requires load balancing across multiple hosts.
  5. Auto-scaling helps but is not a substitute for capacity planning. Auto-scaling reacts to load (with 2-10 minute lag for new instances); capacity planning ensures baseline resources are available. Use capacity planning for base load and auto-scaling for unexpected spikes.
  6. Review capacity plans quarterly against actual usage. Over-provisioning wastes money; under-provisioning causes outages. Track actual-vs-planned utilization to improve future estimates.
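The bandwidth point in the list above lends itself to a quick calculation. This sketch uses decimal units and ignores protocol overhead such as TCP and TLS framing; the 70% maximum NIC utilization is an assumed safety margin, not a figure from the text.

```python
import math


def required_gbps(response_kb: float, rps: float) -> float:
    """Outbound throughput in Gbps for a given response size and request rate."""
    bytes_per_second = response_kb * 1_000 * rps
    return bytes_per_second * 8 / 1e9


def hosts_for_bandwidth(total_gbps: float, nic_gbps: float,
                        max_utilization: float = 0.7) -> int:
    """Hosts needed so that no NIC runs above the assumed utilization cap."""
    return math.ceil(total_gbps / (nic_gbps * max_utilization))


throughput = required_gbps(response_kb=50, rps=10_000)
print(f"Required throughput: {throughput:.1f} Gbps")
print("Hosts with 1Gbps NICs:", hosts_for_bandwidth(throughput, nic_gbps=1.0))
print("Hosts with 10Gbps NICs:", hosts_for_bandwidth(throughput, nic_gbps=10.0))
```

Comparing this host count with the one derived from request throughput shows which resource is the binding constraint for the tier.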
Simple Example

Planning a Photo-Sharing Service

You are building a photo-sharing service expecting 10 million daily active users. Each user uploads 2 photos/day (avg 2MB each) and views 50 photos/day. Write throughput: 10M * 2 = 20M photos/day = ~230 photos/sec. Read throughput: 10M * 50 = 500M reads/day = ~5,800 reads/sec. Storage per day: 20M * 2MB = 40TB. Per year with 3x replication: 40TB * 365 * 3 = 43.8PB. If a single storage node holds 10TB, you need ~4,380 storage nodes for one year of data. This back-of-envelope calculation reveals that storage -- not compute -- is the primary infrastructure cost.
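The same back-of-envelope estimate, written out as a script so each step can be checked or re-run with different inputs. All input values come from the example above; the variable names are illustrative, and decimal units are assumed (1TB = 1,000,000MB).

```python
import math

SECONDS_PER_DAY = 86_400

# Inputs from the example.
daily_active_users = 10_000_000
uploads_per_user = 2
views_per_user = 50
photo_size_mb = 2
replication_factor = 3
node_capacity_tb = 10

# Throughput.
uploads_per_day = daily_active_users * uploads_per_user
reads_per_day = daily_active_users * views_per_user
write_rps = uploads_per_day / SECONDS_PER_DAY
read_rps = reads_per_day / SECONDS_PER_DAY

# Storage.
daily_tb = uploads_per_day * photo_size_mb / 1_000_000
yearly_tb = daily_tb * 365 * replication_factor
storage_nodes = math.ceil(yearly_tb / node_capacity_tb)

print(f"Writes: {write_rps:,.0f}/sec, reads: {read_rps:,.0f}/sec")
print(f"Storage: {daily_tb:,.0f} TB/day, {yearly_tb / 1_000:,.1f} PB/year replicated")
print(f"Storage nodes for one year: {storage_nodes:,}")
```

These are averages; applying a peak-to-average multiplier to the read and write rates would be the next refinement.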

Real-World Examples

Netflix

Netflix capacity plans for 'peak of peak' -- the highest traffic point during the highest-traffic day of the year (typically a major series premiere on a Sunday evening). They provision enough capacity to handle this peak plus 30% headroom across multiple AWS regions. Their Titus container platform pre-warms instances before anticipated traffic spikes based on show premiere schedules.

Twitter/X

Twitter experiences massive traffic spikes during global events (Super Bowl, elections, breaking news). Their capacity planning models include a 'major event multiplier' of 10-20x normal traffic for tweet reads and 5-10x for tweet writes. During the 2014 World Cup, they pre-provisioned 10x normal capacity in anticipation of peak moments like goals.

Slack

Slack's capacity planning accounts for the 'Monday morning spike' -- when millions of users sign in simultaneously at 9 AM across time zones. They discovered that connection establishment (WebSocket handshake, authentication, channel list loading) is 5-10x more expensive than steady-state message delivery, requiring disproportionate capacity for the daily sign-in wave.

Trade-Offs
Aspect | Description
Over-provisioning vs Under-provisioning | Over-provisioning wastes money but ensures reliability. Under-provisioning saves money but risks outages during traffic spikes. The cost of an outage (lost revenue, SLA penalties, brand damage) typically dwarfs the cost of extra servers. Most teams err on the side of over-provisioning by 30-50%.
Reserved vs On-Demand Capacity | Reserved instances (1-3 year commitments) cost 30-60% less than on-demand but lock you into a capacity estimate. The optimal strategy: reserve capacity for your baseline load (the minimum you always need) and use on-demand or spot instances for peak and burst traffic.
Horizontal vs Vertical Scaling | Vertical scaling (bigger machines) is simpler but has a ceiling and a single point of failure. Horizontal scaling (more machines) has no hard ceiling but adds complexity (load balancing, data distribution, consensus). Capacity planning for horizontal systems must account for coordination overhead -- adding an 11th node to a 10-node cluster adds less than 1/10th more capacity.
Precision vs Speed of Planning | Detailed capacity models (simulating every component, modeling queueing delays) are accurate but time-consuming. Back-of-envelope estimates (quick multiplication with safety margins) are fast but rough. Use back-of-envelope for initial design and detailed modeling when the cost difference between estimates justifies the analysis time.
Case Study

Slack's 'Monday Morning Thundering Herd' Problem

Scenario

Every Monday morning, millions of Slack users reconnected simultaneously between 8 and 9 AM across time zones. The reconnection storm involved WebSocket handshakes, authentication token validation, channel membership queries, and message history fetches. This 'thundering herd' pattern generated 5-10x normal traffic within a 30-minute window, consistently degrading performance and occasionally causing partial outages.

Solution

Slack implemented a multi-layered capacity strategy: (1) pre-provisioned extra instances before the Monday morning window using scheduled auto-scaling; (2) introduced client-side jitter to spread reconnections over a wider window (instead of all clients reconnecting at exactly 9:00, each client waited a random 0-60 seconds first); (3) cached channel membership and recent messages aggressively so reconnections did not hit the database; (4) implemented connection coalescing, where a single server-to-database connection served multiple user sessions.

Outcome

Monday morning peak traffic was reduced from 10x to 3x normal load through client-side jitter. Pre-provisioned capacity handled the remaining spike without degradation. Database load during reconnection dropped 80% due to caching. The key insight was that capacity planning is not just about adding servers -- reshaping the demand curve (jitter) is often more effective and cheaper than provisioning for the unmodified peak.
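To see why jitter flattens the demand curve, the toy simulation below compares the busiest one-second bucket with and without a random delay. It is a generic illustration, not Slack's implementation: the client count, the uniform delay distribution, and the function name are all assumptions, and it does not attempt to reproduce the 10x-to-3x figure above.

```python
import random
from collections import Counter


def peak_connections_per_second(num_clients: int, max_jitter_seconds: float,
                                seed: int = 0) -> int:
    """Largest number of reconnects landing in any one-second bucket.

    Each client waits a uniformly random delay in [0, max_jitter_seconds]
    before reconnecting; a delay of 0 means everyone reconnects at once.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    arrivals = Counter(
        int(rng.uniform(0, max_jitter_seconds)) for _ in range(num_clients)
    )
    return max(arrivals.values())


clients = 60_000  # hypothetical number of clients reconnecting
no_jitter = peak_connections_per_second(clients, max_jitter_seconds=0)
with_jitter = peak_connections_per_second(clients, max_jitter_seconds=60)

print(f"Peak without jitter: {no_jitter:,} connections/sec")
print(f"Peak with 0-60s jitter: {with_jitter:,} connections/sec")
```

Without jitter, every client lands in the same second, so the servers must absorb the full population at once. With a 60-second window, the per-second peak falls to roughly one sixtieth of that plus random variation, which is why reshaping demand can be cheaper than provisioning for it.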

Common Mistakes
  • Planning for average traffic instead of peak. If your system handles 1,000 RPS on average but peaks at 10,000 RPS during promotions, you need capacity for 10,000 RPS (plus headroom). Sizing for the average guarantees failure at peak.
  • Forgetting replication and redundancy in storage calculations. Raw data of 10TB becomes 30TB with 3x replication, 45TB with backups, and 60TB+ with WAL logs and indexes. Always multiply by your replication factor and add at least 50% for operational overhead.
  • Ignoring the thundering herd effect after failures or deployments. When a cache expires or a service restarts, all requests simultaneously hit the backend. Capacity plans must account for cache cold-start scenarios and staggered rollouts.
  • Assuming linear scaling when adding nodes. Due to coordination overhead, network contention, and shared state, adding 10% more nodes typically yields 7-8% more throughput, not 10%. Benchmark actual scaling efficiency before committing to capacity numbers.
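The last mistake in the list above can be made concrete with a small sketch that derives a scaling efficiency from two benchmark runs and uses it to size a cluster. The benchmark numbers are hypothetical, and extrapolating a single efficiency figure linearly is itself a simplification; real systems often lose efficiency as they grow.

```python
import math


def scaling_efficiency(nodes_a: int, tput_a: float,
                       nodes_b: int, tput_b: float) -> float:
    """Relative throughput gain divided by relative node gain (1.0 = linear)."""
    node_gain = (nodes_b - nodes_a) / nodes_a
    tput_gain = (tput_b - tput_a) / tput_a
    return tput_gain / node_gain


def nodes_for_target(nodes_a: int, tput_a: float,
                     target_tput: float, efficiency: float) -> int:
    """Nodes needed to reach target_tput, assuming a constant efficiency."""
    required_gain = target_tput / tput_a - 1
    return math.ceil(nodes_a * (1 + required_gain / efficiency))


# Hypothetical benchmark: 10 nodes -> 10,000 RPS, 11 nodes -> 10,750 RPS.
efficiency = scaling_efficiency(10, 10_000, 11, 10_750)
needed = nodes_for_target(10, 10_000, target_tput=12_000, efficiency=efficiency)

print(f"Measured scaling efficiency: {efficiency:.2f}")
print(f"Nodes needed for 12,000 RPS: {needed}")
```

With these assumed measurements, a linear estimate would suggest 12 nodes for 12,000 RPS, while the measured efficiency of about 0.75 pushes the answer to 13.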
Test Your Understanding

1. A service expects 100M requests per day with a 5x peak-to-average ratio. If each server handles 2,000 RPS, how many servers are needed for peak traffic (before headroom)?

2. Why is planning for average traffic instead of peak traffic a critical mistake?