Back-of-Envelope Estimation

Back-of-envelope estimation is the practice of making quick, approximate calculations to evaluate system design feasibility, compare architectural alternatives, and identify potential bottlenecks. It is a critical interview skill and an essential engineering practice for avoiding costly design mistakes before writing any code.

Overview

Back-of-envelope estimation is arguably the most underrated skill in system design. Before writing a single line of code, a five-minute calculation can reveal whether your architecture is feasible, which component will be the bottleneck, and how much infrastructure you need. Jeff Dean at Google popularized this practice with the list widely circulated as 'Numbers Every Programmer Should Know': a set of latency and throughput benchmarks that serve as building blocks for quick estimations.

The technique is closely related to Fermi estimation, named after the physicist Enrico Fermi, who was famous for making surprisingly accurate estimates from minimal information. The key insight is that order-of-magnitude accuracy is sufficient for most design decisions. Whether you need 10 servers or 15 does not matter at the design stage; what matters is whether you need 10 or 10,000. Back-of-envelope calculations with generous rounding give you this order-of-magnitude answer in minutes.

The process follows a structured pattern: (1) clarify the requirements (DAU, data size, read/write ratio), (2) estimate traffic volume (requests per second), (3) estimate storage needs (data per user * users * retention), (4) estimate bandwidth (response size * RPS), (5) estimate compute (RPS / per-server-capacity), and (6) identify the bottleneck (is it CPU, storage, bandwidth, or I/O?). Each step uses simple multiplication and division with rounded numbers.

Critical reference numbers to internalize: a day has approximately 100,000 seconds (actually 86,400, but 10^5 is close enough for estimation). One million requests per day is about 12 requests per second (about 10 with the 10^5 shortcut). A single server can typically handle 100-1,000 RPS for web applications. A hard drive holds 1-10 TB. A single SSD can do 10,000-100,000 IOPS. Network bandwidth on a modern server is 1-25 Gbps. These reference points, combined with the specific requirements of your system, let you derive infrastructure needs in minutes.
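
The daily-volume-to-RPS conversion can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function name `requests_per_second` and the `rounded` flag are made up for this example. It shows how far the 10^5 shortcut drifts from the exact 86,400 figure.

```python
SECONDS_PER_DAY_EXACT = 86_400
SECONDS_PER_DAY_ROUNDED = 10**5  # the estimation shortcut


def requests_per_second(requests_per_day: float, rounded: bool = True) -> float:
    """Convert a daily request count into an average requests-per-second figure.

    `rounded=True` uses the 10^5 shortcut; `rounded=False` uses 86,400.
    (Hypothetical helper for illustration.)
    """
    seconds = SECONDS_PER_DAY_ROUNDED if rounded else SECONDS_PER_DAY_EXACT
    return requests_per_day / seconds


if __name__ == "__main__":
    for per_day in (1e6, 1e9):
        quick = requests_per_second(per_day)
        exact = requests_per_second(per_day, rounded=False)
        print(f"{per_day:.0e} requests/day: shortcut {quick:,.0f} RPS, exact {exact:,.0f} RPS")
```

The shortcut understates the exact average by roughly 15%, which is well inside the order-of-magnitude tolerance the estimate is meant to have.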

Key Points

1. Memorize key latency numbers: L1 cache ~1ns, L2 cache ~4ns, RAM ~100ns, SSD random read ~100us, HDD seek ~10ms, same-datacenter round trip ~0.5ms, US coast-to-coast ~70ms. These differ by orders of magnitude and determine which storage/networking approach is viable.
2. Know common throughput benchmarks: a web server handles 100-1,000 RPS, Redis handles 100K ops/sec, PostgreSQL handles 5K-50K QPS (depending on query complexity), Kafka handles 1M+ messages/sec per broker, an SSD does 10K-100K IOPS.
3. Use the 'seconds in a day' shortcut: 86,400 seconds/day is approximately 10^5. So 1M requests/day = 10^6/10^5 = ~10 RPS. 1B requests/day = ~10,000 RPS. This conversion is the most common back-of-envelope step.
4. Always estimate storage with replication and growth: raw_data * replication_factor * retention_period * growth_multiplier. If each user generates 1KB/day, 100M users = 100GB/day = ~36TB/year raw, ~108TB/year with 3x replication, and ~216TB for 2-year retention, before applying any growth multiplier.
5. Calculate bandwidth separately from compute: a 10KB response at 10,000 RPS = 100MB/s = 800Mbps. This fits within a single 1Gbps NIC on paper but uses about 80% of it, leaving almost no headroom for peaks or protocol overhead. Multiple NICs, a faster NIC, or smaller responses may be needed.
6. Round aggressively to powers of 10. Use 10^3 instead of 1,024 for KB-to-MB conversions. Use 10^5 instead of 86,400 for seconds per day. The goal is order-of-magnitude accuracy, not precision.
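
The storage formula and the bandwidth check from the key points can be written out as two small helpers. This is a sketch under stated assumptions: the function names and the default `growth_multiplier` of 1.0 are illustrative, units are decimal (1KB = 10^3 bytes, 1TB = 10^12 bytes), and the 1.5x growth figure in the usage lines is a hypothetical input rather than a number from the text.

```python
def storage_tb(
    bytes_per_user_per_day: float,
    users: float,
    retention_days: float,
    replication_factor: float = 3.0,
    growth_multiplier: float = 1.0,  # assumption: 1.0 means "no growth applied"
) -> float:
    """raw_data * replication_factor * retention_period * growth_multiplier, in TB."""
    raw_bytes = bytes_per_user_per_day * users * retention_days
    return raw_bytes * replication_factor * growth_multiplier / 1e12


def nic_utilization(response_bytes: float, rps: float, nic_gbps: float = 1.0) -> float:
    """Fraction of a NIC's line rate consumed by response payloads alone."""
    bits_per_second = response_bytes * rps * 8
    return bits_per_second / (nic_gbps * 1e9)


if __name__ == "__main__":
    one_year = storage_tb(1e3, 100e6, retention_days=365)
    two_years = storage_tb(1e3, 100e6, retention_days=730)
    with_growth = storage_tb(1e3, 100e6, retention_days=730, growth_multiplier=1.5)
    print(f"1 year, 3x replicated:  {one_year:.0f} TB")
    print(f"2 years, 3x replicated: {two_years:.0f} TB")
    print(f"2 years, hypothetical 1.5x growth: {with_growth:.0f} TB")
    print(f"10KB at 10,000 RPS on 1Gbps: {nic_utilization(10e3, 10_000):.0%} utilized")
```

Keeping the growth multiplier as an explicit parameter makes it obvious when an estimate has silently left growth out.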

Simple Example

Estimating Twitter's Storage for Tweets

Twitter has approximately 500 million tweets per day. Each tweet is at most 280 characters (roughly 280 bytes of text; with metadata such as user ID, timestamp, and indexes, approximately 1KB per tweet). Daily storage: 500M * 1KB = 500GB/day. Yearly: 500GB * 365 = ~180TB/year. With 3x replication: 540TB/year. Over 5 years: 2.7PB. This tells us: (1) storage is substantial but manageable with modern infrastructure, (2) storage, rather than compute, is likely to be a major cost driver for tweet data, and (3) we will need distributed storage that can handle petabyte-scale data. The entire calculation takes 60 seconds.

Real-World Examples

Google

Jeff Dean's 'Numbers Every Programmer Should Know' became the canonical reference for back-of-envelope estimation. Google engineers use these numbers daily to evaluate whether a proposed design can meet latency requirements. For example, if a design requires reading from HDD on the critical path (10ms seek), it cannot meet a 5ms latency target -- the physical limitation eliminates the option without any prototyping.
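
A feasibility check of this kind can be sketched as a lookup table plus a sum. The values below are the approximate latency figures listed under Key Points; the dictionary keys and the two function names are invented for this example, and the model deliberately ignores queueing, serialization, and retries.

```python
# Approximate latencies in nanoseconds, taken from the Key Points list.
LATENCY_NS = {
    "l1_cache": 1,
    "l2_cache": 4,
    "ram": 100,
    "ssd_random_read": 100_000,
    "datacenter_round_trip": 500_000,
    "hdd_seek": 10_000_000,
    "us_coast_to_coast": 70_000_000,
}


def critical_path_ms(steps: list[str]) -> float:
    """Sum the latencies of sequential steps on the critical path, in milliseconds."""
    return sum(LATENCY_NS[step] for step in steps) / 1e6


def meets_target(steps: list[str], target_ms: float) -> bool:
    """True if the summed critical path fits inside the latency target."""
    return critical_path_ms(steps) <= target_ms


if __name__ == "__main__":
    hdd_design = ["datacenter_round_trip", "hdd_seek"]
    ssd_design = ["datacenter_round_trip", "ssd_random_read"]
    for name, design in (("HDD", hdd_design), ("SSD", ssd_design)):
        verdict = "fits" if meets_target(design, target_ms=5) else "misses"
        print(f"{name} path: {critical_path_ms(design):.1f} ms, {verdict} a 5 ms target")
```

Because the sum is a lower bound on real latency, a design that fails this check can be discarded immediately, while one that passes still needs measurement.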

Instagram

When Instagram was scaling to 1 billion users, engineers estimated photo storage needs: 100M daily uploads * 5MB average (multiple resolutions) = 500TB/day new data. This back-of-envelope calculation drove the decision to use a dedicated blob storage system (such as Amazon S3) rather than storing photos in a database, and informed the CDN caching strategy needed to serve reads.

WhatsApp

WhatsApp's engineering team estimated that their 2 billion users send an average of 65 billion messages per day. Quick math: 65B / 86,400 = ~750,000 messages/sec. At 1KB per message, that is 750MB/s of ingest throughput. This estimation drove their choice of Erlang (optimized for massive concurrency) and their storage architecture (Mnesia for recent messages, cold storage for older ones).

Trade-Offs

Aspect	Description
Speed vs Accuracy	Back-of-envelope estimates sacrifice precision for speed. Rounding 86,400 to 100,000 introduces 15% error, and assumptions about user behavior can be off by 2-5x. However, the 10-minute estimate that says 'you need 50-100 servers' is more useful than the 3-month benchmarking project that says 'you need exactly 73 servers' -- because by the time the benchmark completes, requirements have changed.
Simplicity vs Realism	Estimates assume uniform behavior (all users are equally active, traffic is evenly distributed), which is rarely true. Power-law distributions mean a small fraction of users can generate a disproportionate share of traffic (for example, 1% of users producing 30% of it). For initial design decisions, uniform assumptions are fine; for production capacity planning, you need actual traffic measurements.
Optimistic vs Conservative Assumptions	Optimistic estimates (no replication, no headroom, peak = average) lead to under-provisioning and outages. Conservative estimates (3x replication, 50% headroom, 10x peak factor) lead to over-spending. The best practice: estimate conservatively for infrastructure (build for peak + headroom) and optimistically for feasibility checks (if even the optimistic case does not work, abandon the approach).
Top-Down vs Bottom-Up Estimation	Top-down (start with total users, derive per-component load) is faster but may miss per-component constraints. Bottom-up (start with component benchmarks, calculate how many users each can serve) is more accurate but slower. Use top-down for initial feasibility, bottom-up for detailed capacity planning.
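
The gap between optimistic and conservative assumptions is easy to see in a server-count calculation. The sketch below is illustrative only: `servers_needed` is a made-up helper, "50% headroom" is interpreted here as running each server at half of its rated capacity, and the 10,000 RPS average and 1,000 RPS per server are example inputs.

```python
import math


def servers_needed(
    avg_rps: float,
    per_server_rps: float,
    peak_factor: float = 1.0,  # peak traffic as a multiple of average
    headroom: float = 0.0,     # assumption: fraction of each server's capacity left unused
) -> int:
    """Servers required to carry peak traffic while keeping `headroom` spare."""
    peak_rps = avg_rps * peak_factor
    usable_rps_per_server = per_server_rps * (1.0 - headroom)
    return math.ceil(peak_rps / usable_rps_per_server)


if __name__ == "__main__":
    optimistic = servers_needed(10_000, 1_000)
    conservative = servers_needed(10_000, 1_000, peak_factor=10, headroom=0.5)
    print(f"optimistic (peak = average, no headroom): {optimistic} servers")
    print(f"conservative (10x peak, 50% headroom):    {conservative} servers")
```

With these inputs the two answers differ by a factor of 20, which is why it helps to state which set of assumptions an estimate uses.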

Case Study

Using Estimation to Choose Between SQL and NoSQL

Scenario

A startup was designing a social media platform expecting 50M DAU within two years. The team debated whether to use PostgreSQL (familiar, relational) or Cassandra (distributed, write-optimized) for storing the activity feed. Each user's feed would contain the 200 most recent posts from people they follow, with an average of 200 followees per user.

Solution

Back-of-envelope estimation settled the debate. Feed reads: 50M DAU * 10 feed loads/day = 500M reads/day = ~6,000 reads/sec. Feed writes (fan-out-on-write): each post is written to all followers' feeds. If 5M users post once a day and each post fans out to 200 followers, that is 5M * 200 = 1B feed writes/day = ~12,000 writes/sec. Total: 6,000 reads + 12,000 writes = 18,000 ops/sec on average. A single PostgreSQL instance handles ~10,000 ops/sec, so 2-3 instances with read replicas would suffice for the first year. Cassandra's advantage (linear write scalability) would only matter at 100K+ writes/sec.
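
The case-study arithmetic can be reproduced with exact division to confirm that the rounded figures hold up. This is a sketch of the calculation, not the team's actual tooling: the function and parameter names are invented, and one post per posting user per day is an assumption the scenario implies but does not state.

```python
import math

SECONDS_PER_DAY = 86_400


def feed_load(
    dau: int = 50_000_000,
    feed_loads_per_user_per_day: int = 10,
    posting_users_per_day: int = 5_000_000,
    posts_per_posting_user: int = 1,  # assumption: one post per posting user per day
    followers_per_post: int = 200,
    ops_per_instance: int = 10_000,   # the ~10,000 ops/sec PostgreSQL figure from the text
) -> dict[str, float]:
    """Average read/write rates for fan-out-on-write and the instances they imply."""
    reads_per_sec = dau * feed_loads_per_user_per_day / SECONDS_PER_DAY
    feed_writes_per_day = posting_users_per_day * posts_per_posting_user * followers_per_post
    writes_per_sec = feed_writes_per_day / SECONDS_PER_DAY
    total = reads_per_sec + writes_per_sec
    return {
        "reads_per_sec": reads_per_sec,
        "writes_per_sec": writes_per_sec,
        "total_ops_per_sec": total,
        "instances": math.ceil(total / ops_per_instance),
    }


if __name__ == "__main__":
    for key, value in feed_load().items():
        print(f"{key}: {value:,.0f}")
```

The exact total comes out slightly below the rounded 18,000 ops/sec and leads to the same conclusion; applying a peak multiplier would be the natural next step before sizing hardware.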

Outcome

The team chose PostgreSQL for the initial launch, saving 6 months of development time (no need to learn Cassandra's data modeling). They planned a migration to Cassandra when write throughput exceeded 50,000/sec. The estimation prevented premature optimization and let them ship faster. When they did migrate 18 months later, the actual numbers matched the estimation within 2x, validating the approach.

Common Mistakes

⚠ Not memorizing the key reference numbers. Without knowing that an SSD random read is ~100us and an HDD seek is ~10ms, you cannot reason about whether a storage design meets latency requirements. Invest 30 minutes memorizing Jeff Dean's numbers.
⚠ Forgetting the peak-to-average ratio. If your estimation uses average traffic but the system must handle 10x peak, you are off by an order of magnitude. Always ask: 'What is the peak multiplier?' and apply it.
⚠ Using precise numbers when order-of-magnitude accuracy suffices. Writing '86,400 seconds/day' instead of '~100,000' slows down the calculation without improving the design decision. Round aggressively to powers of 10.
⚠ Estimating compute but forgetting storage and bandwidth. Many back-of-envelope calculations stop at 'we need X servers for compute' but ignore that the response payloads require Y Gbps of network bandwidth or Z petabytes of storage over 3 years.

Related Concepts

Capacity Planning, Latency vs Throughput, Load Testing & Benchmarking, Horizontal vs Vertical Scaling, CDN & Edge Caching

Test Your Understanding

1. A service receives 500 million requests per day. Approximately how many requests per second is this?

2. If an HDD seek takes ~10ms and you need sub-5ms response time for a key-value lookup, which conclusion is correct?
