Performance

Latency optimization, profiling, and capacity planning techniques.

Concepts

Latency is the time it takes for a single request to travel from the client to the server and back, while throughput is the number of requests a system can handle per unit of time. The two metrics are linked but often in tension: optimizing for one frequently comes at the cost of the other. Batching requests, for example, usually raises throughput but makes each individual request wait longer.
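One common way to relate the two metrics is Little's Law, which says that the average number of requests in flight equals throughput multiplied by average latency. The sketch below is illustrative only; the function names and the example numbers are assumptions, not part of the original text.

```python
def required_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average requests in flight = throughput * average latency."""
    return throughput_rps * latency_s


def max_throughput(concurrency: float, latency_s: float) -> float:
    """Upper bound on throughput when at most `concurrency` requests run at once."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return concurrency / latency_s


# Hypothetical service: 400 requests/s at 250 ms average latency.
print(required_concurrency(400, 0.25))  # 100.0 requests in flight
# Hypothetical worker pool: 64 slots at 125 ms per request.
print(max_throughput(64, 0.125))        # 512.0 requests/s
```

Read this way, a fixed pool of workers can only gain throughput by lowering per-request latency, and a fixed latency can only serve more traffic by adding concurrency.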

P99 & Tail Latency (P0)

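P99 (99th percentile) latency is the response time within which 99% of requests complete; the slowest 1 in 100 requests take at least this long. It is a threshold, not the worst case: the maximum can be far higher. Tail latency (the latency at p99, p99.9, and beyond) reveals problems that averages and medians hide. In distributed systems with fan-out, tail latency is amplified: a request that waits on many backends finishes only when the slowest one responds, so a single slow component makes the entire request slow.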
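A minimal sketch of both ideas follows: a nearest-rank percentile over latency samples, and the chance that a fan-out request touches at least one slow backend. The sample data is made up, the helper names are hypothetical, and the fan-out formula assumes the backend calls are independent.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def prob_request_hits_tail(fan_out, pct=99.0):
    """Chance that at least one of `fan_out` independent calls exceeds its own pct-th percentile."""
    return 1 - (pct / 100) ** fan_out


# Made-up latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [10] * 98 + [500] * 2

print("mean:", statistics.mean(latencies_ms))
print("median:", statistics.median(latencies_ms))
print("p99:", percentile(latencies_ms, 99))  # 500

# A request that fans out to 100 backends and waits for all of them.
print("P(at least one slow call):", round(prob_request_hits_tail(100), 3))
```

In this made-up data the mean and median stay close to the fast case while p99 exposes the slow requests. Under the independence assumption, a fan-out of 100 means roughly 63% of requests include at least one call from the slowest 1%.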

Capacity Planning (P0)

Capacity planning is the discipline of determining the compute, storage, network, and memory resources needed to serve a system's expected workload at target performance levels. It combines traffic forecasting, performance modeling, and infrastructure provisioning to ensure systems can handle peak loads without degradation.
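As a rough illustration, the forecasting and provisioning steps can be reduced to two small calculations. Everything here is an assumption made for the example: the compound growth model, the 60% default target utilization, the spare-server count, and the function names.

```python
import math


def forecast_peak_rps(current_peak_rps, monthly_growth, months):
    """Project peak traffic forward assuming compound monthly growth."""
    return current_peak_rps * (1 + monthly_growth) ** months


def servers_needed(peak_rps, per_server_rps, target_utilization=0.6, spare_servers=1):
    """Servers required so each runs at or below the target utilization at peak."""
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = per_server_rps * target_utilization
    return math.ceil(peak_rps / usable_rps) + spare_servers


# Hypothetical inputs: 1,000 rps peak today, 10% monthly growth, 12-month horizon.
future_peak = forecast_peak_rps(1_000, 0.10, 12)
print(round(future_peak))

# Each server is assumed to sustain 500 rps in a benchmark.
print(servers_needed(future_peak, per_server_rps=500))
```

The per-server figure would normally come from load tests or benchmarks rather than a guess, and the headroom reflects how much degradation the system can tolerate during a spike or a server failure.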

Profiling & Flame Graphs (P1)

Profiling is the practice of measuring where a program spends its time and resources (CPU, memory, I/O) to identify performance bottlenecks. Flame graphs are a visualization technique that makes profiling data intuitive by showing the call stack hierarchy and the relative cost of each function, enabling engineers to quickly pinpoint hot paths.
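A minimal sketch of CPU profiling with Python's standard `cProfile` and `pstats` modules is shown below. The workload functions are hypothetical stand-ins for real application code. The output is a text table rather than a flame graph; a flame graph tool would draw the same kind of call-stack data as stacked boxes whose widths are proportional to time spent.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow pure-Python loop: the hot path in this example."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    """Hypothetical request handler mixing one expensive and one cheap call."""
    return slow_sum_of_squares(200_000) + sum(range(1_000))


def profile(func):
    """Run func under cProfile and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


report = profile(handle_request)
print(report)
```

In the report, `slow_sum_of_squares` should account for nearly all of the cumulative time of `handle_request`, which is the kind of hot path a flame graph makes visible at a glance.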

Back-of-Envelope Estimation (P0)

Back-of-envelope estimation is the practice of making quick, approximate calculations to evaluate system design feasibility, compare architectural alternatives, and identify potential bottlenecks. It is a critical interview skill and an essential engineering practice for avoiding costly design mistakes before writing any code.
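The arithmetic involved is simple enough to sketch in a few lines. The workload numbers below (100 million requests per day, a 2x peak factor, 1 KB per write, 3x replication) are invented for illustration, as are the function names.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_requests):
    """Average queries per second, assuming traffic is spread evenly over the day."""
    return daily_requests / SECONDS_PER_DAY


def peak_qps(daily_requests, peak_factor=2.0):
    """Peak estimate as a multiple of the average; the factor is a guess to be validated."""
    return average_qps(daily_requests) * peak_factor


def storage_bytes_per_year(writes_per_day, bytes_per_write, replication=3):
    """Raw storage written in one year, including replicas."""
    return writes_per_day * bytes_per_write * 365 * replication


daily_requests = 100_000_000
print(round(average_qps(daily_requests)))  # 1157
print(round(peak_qps(daily_requests)))     # 2315

yearly = storage_bytes_per_year(10_000_000, 1_000)
print(yearly / 1e12, "TB")                 # 10.95 TB
```

Estimates at this precision are enough to tell whether a design needs one database node or a sharded cluster, which is usually the question being asked.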

CDN & Edge Caching (P1)

A Content Delivery Network (CDN) is a globally distributed network of proxy servers that caches content at locations close to end users. Edge caching extends this concept beyond static assets to include dynamic content, API responses, and even compute, reducing latency by serving requests from the nearest edge location rather than the distant origin server.
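The latency benefit depends heavily on the cache hit ratio. The sketch below uses a simplified model in which a miss pays the edge round trip plus an origin round trip; the model, the function names, and the example latencies are all assumptions made for illustration.

```python
def expected_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Average latency when hits are served at the edge and misses also go to origin."""
    if not 0 <= hit_ratio <= 1:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Every request pays the edge round trip; only misses pay the origin trip.
    return edge_ms + (1 - hit_ratio) * origin_ms


def origin_rps(total_rps, hit_ratio):
    """Traffic that still reaches the origin after the edge absorbs cache hits."""
    if not 0 <= hit_ratio <= 1:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1 - hit_ratio)


# Hypothetical numbers: 20 ms to the nearest edge, 200 ms to the origin.
for ratio in (0.0, 0.5, 0.9, 0.99):
    latency = expected_latency_ms(ratio, edge_ms=20, origin_ms=200)
    load = origin_rps(10_000, ratio)
    print(f"hit ratio {ratio:.2f}: ~{latency:.0f} ms average, ~{load:.0f} rps at origin")
```

Under this model, raising the hit ratio both lowers average latency and shrinks the load the origin has to be provisioned for.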

Load Testing & Benchmarking (P1)

Load testing is the practice of simulating realistic traffic against a system to measure performance under expected and peak conditions. Benchmarking is the related practice of measuring the maximum throughput and latency characteristics of individual components. Together, they validate capacity plans, identify bottlenecks, and establish performance baselines before production deployment.
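A toy load generator built on the standard library shows the basic measurement loop: issue requests at a fixed concurrency, time each one, and report throughput and latency percentiles. The target function here is a hypothetical stand-in that just sleeps; a real test would call the system under test and would normally use a dedicated load-testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_service_call():
    """Hypothetical stand-in for a real request; sleeps about 5 ms."""
    time.sleep(0.005)


def timed_call(target):
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    target()
    return time.perf_counter() - start


def run_load_test(target, total_requests, concurrency):
    """Issue total_requests calls with a fixed number of worker threads."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed_call, [target] * total_requests))
    elapsed = time.perf_counter() - started

    ordered = sorted(latencies)
    n = len(ordered)
    p50_rank = (50 * n + 99) // 100  # nearest-rank, integer ceiling of 0.50 * n
    p99_rank = (99 * n + 99) // 100  # nearest-rank, integer ceiling of 0.99 * n
    return {
        "requests": n,
        "throughput_rps": n / elapsed,
        "p50_s": ordered[p50_rank - 1],
        "p99_s": ordered[p99_rank - 1],
    }


result = run_load_test(fake_service_call, total_requests=200, concurrency=20)
print(result)
```

Repeating the run at increasing concurrency and watching where throughput stops rising, or where p99 starts climbing, is one simple way to locate a saturation point.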