Load Balancer

Traffic

Distributes incoming traffic across multiple server instances using algorithms like round-robin, least connections, or consistent hashing.

Overview

A Load Balancer is one of the most critical components in any distributed system, responsible for distributing incoming network traffic across multiple backend servers to ensure no single server bears too much load. Load balancers improve availability (if one server fails, traffic is automatically redirected), increase throughput (multiple servers handle requests in parallel), and enable horizontal scaling (you can add or remove servers without affecting clients). In Vetora's simulator, the Load Balancer component models multiple distribution algorithms, health checking, and session affinity to show how different configurations affect latency, throughput, and failure handling.

Load balancing algorithms determine how traffic is distributed. Round-robin is the simplest — requests cycle through servers in sequence. It works well when all servers are identical and requests are homogeneous. Least connections routes each request to the server with the fewest active connections, which naturally handles heterogeneous request durations — long-running requests don't cause one server to accumulate a disproportionate queue. Weighted round-robin assigns different weights to servers, useful when servers have different capacities (e.g., a mix of m5.large and m5.2xlarge instances). Consistent hashing maps requests to servers using a hash ring, ensuring that the same key (user ID, session ID) always routes to the same server — essential for stateful services and caches.
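The first three algorithms can be sketched in a few lines of Python. This is a minimal illustration rather than how Vetora or any production load balancer implements them; the class and function names and the server labels are hypothetical, and the weighted variant uses the naive "repeat each server by its weight" approach.

```python
import itertools


class RoundRobin:
    """Cycle through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def pick(self):
        # min() returns the first minimum, so ties go to the earliest server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a request finishes so the count reflects live connections.
        self.active[server] -= 1


def weighted_round_robin(weights):
    """Naive weighted round-robin: each server appears `weight` times per cycle."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


rr = RoundRobin(["a", "b", "c"])
rr_picks = [rr.pick() for _ in range(6)]

lc = LeastConnections(["a", "b"])
first, second = lc.pick(), lc.pick()  # one active connection on each server
lc.release("b")                       # "b" finishes its request early
third = lc.pick()                     # "b" now has the fewest connections

wrr = weighted_round_robin({"large": 1, "2xlarge": 3})
wrr_picks = [next(wrr) for _ in range(8)]
```

In the least-connections example the third request goes to "b" because its earlier request has already completed, which is the behavior that helps with uneven request durations.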

Layer 4 (L4) load balancers operate at the TCP/UDP transport layer, making routing decisions based on IP address and port without inspecting packet contents. They are extremely fast (typically sub-millisecond overhead) but cannot route based on HTTP path, headers, or cookies. Layer 7 (L7) load balancers operate at the application layer and understand protocols such as HTTP, HTTPS, and gRPC. They can route based on URL path (/api/v1 vs /api/v2), host header (api.example.com vs admin.example.com), cookie values, and request content. The tradeoff is higher processing overhead than L4 (roughly 1–5ms per request), because the load balancer must parse each request before routing it.
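To make the difference concrete, the sketch below contrasts the information each layer can use. It is an illustrative model only: the Rule type, the pool names, and both routing functions are assumptions made for this example, not part of any real load balancer's API.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    pool: str
    host: Optional[str] = None         # exact Host header match; None matches any
    path_prefix: Optional[str] = None  # URL path prefix match; None matches any


def route_l7(rules, host, path, default_pool):
    """L7-style choice: return the pool of the first rule matching host and path."""
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.pool
    return default_pool


def route_l4(pools, client_ip, client_port):
    """L4-style choice: only addressing information is available."""
    key = f"{client_ip}:{client_port}".encode("utf-8")
    return pools[zlib.crc32(key) % len(pools)]


rules = [
    Rule(pool="admin", host="admin.example.com"),
    Rule(pool="api-v2", host="api.example.com", path_prefix="/api/v2"),
    Rule(pool="api-v1", host="api.example.com", path_prefix="/api/v1"),
]

v2_pool = route_l7(rules, "api.example.com", "/api/v2/users", "web")
v1_pool = route_l7(rules, "api.example.com", "/api/v1/orders", "web")
fallback_pool = route_l7(rules, "www.example.com", "/", "web")
l4_choice = route_l4(["web-1", "web-2"], "203.0.113.9", 51514)
```

The L4 function never sees the path or host, so it could not send /api/v1 and /api/v2 to different pools.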

Health checks are how load balancers detect failed or degraded servers. Active health checks send periodic probes (HTTP GET to a /health endpoint) and remove unresponsive servers from the pool. Passive health checks monitor real traffic: if a server returns too many 5xx errors within a window, it is temporarily removed. Combining both approaches typically detects failures faster and more reliably than either alone, because passive checks react to errors on live traffic between probes, while active probes catch failures on servers that are currently receiving little or no traffic.
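A passive check can be modeled as a sliding window over recent 5xx responses. The sketch below is one possible shape for that logic; the class name, the window size, the error limit, and the ejection time are all assumed values chosen for illustration.

```python
from collections import deque


class PassiveHealthCheck:
    """Eject a server after too many 5xx responses inside a time window."""

    def __init__(self, window_s=10.0, max_errors=5, ejection_s=30.0):
        self.window_s = window_s
        self.max_errors = max_errors
        self.ejection_s = ejection_s
        self._errors = deque()       # timestamps of recent 5xx responses
        self._ejected_until = 0.0

    def record(self, status_code, now):
        """Record the status code of a real response observed at time `now`."""
        if not 500 <= status_code < 600:
            return
        self._errors.append(now)
        # Forget errors that have fallen out of the window.
        while self._errors and self._errors[0] <= now - self.window_s:
            self._errors.popleft()
        if len(self._errors) >= self.max_errors:
            self._ejected_until = now + self.ejection_s
            self._errors.clear()

    def is_available(self, now):
        return now >= self._ejected_until


check = PassiveHealthCheck(window_s=10.0, max_errors=3, ejection_s=30.0)
for t, status in [(0.0, 200), (1.0, 503), (2.0, 503)]:
    check.record(status, t)
available_before = check.is_available(2.0)   # only two errors so far
check.record(500, 3.0)                       # third error inside the window
available_after = check.is_available(3.0)
available_later = check.is_available(33.0)   # ejection period has elapsed
```

Timestamps are passed in explicitly so the behavior is easy to reason about; a real implementation would read a monotonic clock instead.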

Sticky sessions (session affinity) route all requests from the same client to the same backend server, typically using cookies or IP-based hashing. This is necessary for applications that store session state in server memory rather than in an external store. However, sticky sessions reduce the load balancer's effectiveness — they can create hot spots if one user generates disproportionate traffic, and they complicate server replacement during deployments or failures.
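Cookie-based affinity can be sketched as follows. The cookie name lb_server, the StickyBalancer class, and the server labels are hypothetical; the sketch only shows the pinning logic and what happens to a pinned client when its server leaves the pool.

```python
import itertools


class StickyBalancer:
    COOKIE = "lb_server"  # hypothetical cookie name

    def __init__(self, servers):
        self.servers = list(servers)
        self._rr = itertools.cycle(self.servers)

    def route(self, cookies):
        """Return (server, cookies_to_set) for a request carrying `cookies`."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.servers:
            return pinned, {}                 # honor the existing pin
        server = next(self._rr)               # new or orphaned client
        return server, {self.COOKIE: server}

    def remove(self, server):
        """Take a server out of the pool, e.g. during a deployment or failure."""
        self.servers.remove(server)
        self._rr = itertools.cycle(self.servers)


lb = StickyBalancer(["s1", "s2"])
first_server, client_cookies = lb.route({})      # first visit: pinned to s1
repeat_server, _ = lb.route(client_cookies)      # later requests stay on s1
other_server, _ = lb.route({})                   # a different client gets s2

lb.remove("s1")
moved_server, new_cookies = lb.route(client_cookies)  # pin is invalid, re-pinned
```

After s1 is removed the client is silently re-pinned to s2, but any session state that lived in s1's memory is gone, which is the deployment and failure complication described above.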

When to Use

+Any architecture with multiple instances of the same service — the load balancer is the distribution layer
+Achieving high availability — automatic failover when health checks detect unhealthy instances
+Horizontal scaling — add instances behind the load balancer to handle increased traffic without client changes
+TLS termination — offload CPU-intensive SSL/TLS processing from backend servers to the load balancer
+Canary deployments — route a small percentage of traffic to a new version before full rollout
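As an illustration of the canary case, a percentage split can be made deterministic by hashing a request or user identifier into one of 100 buckets. This is a sketch under assumptions: the pick_version function, the identifier format, and the 5% figure are invented for the example.

```python
import hashlib


def pick_version(request_id, canary_percent):
    """Send roughly `canary_percent` percent of identifiers to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


request_ids = [f"req-{i}" for i in range(10_000)]
canary_share = sum(
    pick_version(r, 5) == "canary" for r in request_ids
) / len(request_ids)
```

Hashing the identifier rather than drawing a random number means the same identifier always lands on the same version, so raising the percentage only moves additional identifiers onto the canary.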

Not Recommended

-Single-instance deployments — a load balancer with one backend adds latency without benefit
-Peer-to-peer architectures where clients communicate directly with each other
-Event-driven systems where work is distributed via message queues rather than request/response patterns

Key Parameters in Vetora

Parameter	Description	Typical Values
algorithm	Traffic distribution strategy: roundRobin, leastConnections, weightedRoundRobin, or consistentHashing.	leastConnections for heterogeneous workloads, consistentHashing for stateful services
healthCheckIntervalMs	How frequently the load balancer probes each backend server's health endpoint.	5,000–15,000ms (5–15 seconds)
maxConnectionsPerServer	Connection limit per backend server. Prevents any single server from being overwhelmed by too many concurrent connections.	1,000–10,000
stickySession	Whether to enable session affinity, routing all requests from the same client to the same server.	Disabled for stateless services, enabled for session-bound applications
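For readers who think in code, the four parameters can be written as a small validated structure. The parameter names and algorithm values come from the table above; everything else (the LoadBalancerConfig class, its defaults, and the validation rules) is a hypothetical sketch and not Vetora's actual configuration format.

```python
from dataclasses import dataclass

ALGORITHMS = {
    "roundRobin",
    "leastConnections",
    "weightedRoundRobin",
    "consistentHashing",
}


@dataclass
class LoadBalancerConfig:
    # Defaults are assumed mid-range picks from the "Typical Values" column.
    algorithm: str = "leastConnections"
    healthCheckIntervalMs: int = 10_000
    maxConnectionsPerServer: int = 5_000
    stickySession: bool = False

    def __post_init__(self):
        if self.algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {self.algorithm}")
        if self.healthCheckIntervalMs <= 0:
            raise ValueError("healthCheckIntervalMs must be positive")
        if self.maxConnectionsPerServer <= 0:
            raise ValueError("maxConnectionsPerServer must be positive")


# A stateful, session-bound service as described in the table.
stateful_cfg = LoadBalancerConfig(algorithm="consistentHashing", stickySession=True)
```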

Real-World Examples

NGINX

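High-performance L7 load balancer and reverse proxy serving over 30% of the web. Supports round-robin, least connections, IP hash, and generic hash algorithms. Open-source NGINX provides passive health checks (max_fails and fail_timeout); active health checks require the commercial NGINX Plus.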

HAProxy

Open-source L4/L7 load balancer known for extreme performance (millions of connections) and fine-grained health checking. Used by GitHub, Stack Overflow, and Instagram.

AWS ALB/NLB

Application Load Balancer (L7) for HTTP/gRPC routing with path-based rules and target groups. Network Load Balancer (L4) for ultra-low-latency TCP/UDP, designed to handle millions of requests per second.

Envoy Proxy

Cloud-native L7 proxy designed for microservices. Used as the data plane in service meshes like Istio. Supports advanced load balancing, circuit breaking, and observability.

Frequently Asked Questions

What is a Load Balancer and why is it important?

A Load Balancer distributes incoming network traffic across multiple backend servers to prevent any single server from becoming a bottleneck. It is important because it enables high availability (automatic failover when servers fail), horizontal scalability (add servers to handle more traffic), and improved performance (parallel request processing). In system design interviews, load balancers are a fundamental component that appears in virtually every architecture.

What is the difference between L4 and L7 load balancing?

Layer 4 (L4) load balancers route traffic based on IP address and TCP/UDP port without inspecting packet contents. They are extremely fast (sub-millisecond) but cannot make routing decisions based on HTTP paths, headers, or cookies. Layer 7 (L7) load balancers understand HTTP/HTTPS/gRPC and can route based on URL path, host header, cookies, and request body. L7 adds 1–5ms of overhead but enables content-based routing, TLS termination, and request transformation. Most modern architectures use L7 for HTTP traffic and L4 for non-HTTP protocols.

How do load balancer health checks work?

Health checks are how the load balancer determines whether each backend server is healthy and can accept traffic. Active health checks send HTTP GET requests to a /health endpoint every 5–15 seconds. If a server fails to respond or returns an error on 2–3 consecutive checks, it is removed from the pool. Passive health checks monitor responses to real traffic: if a server produces too many errors, it is temporarily marked unhealthy. Combining active and passive checks typically gives faster failure detection than either alone.
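The consecutive-failure rule can be sketched as a small state machine. The unhealthy threshold of 3 follows the 2–3 checks mentioned above; the healthy threshold for re-adding a server is an assumption that does not appear in the text, and the class name is hypothetical. Probe results are passed in as booleans so the sketch stays independent of any HTTP client.

```python
class ActiveHealthCheck:
    """Track one server's health from a stream of probe results."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold  # assumed recovery rule
        self.healthy = True
        self._streak = 0  # consecutive probes that disagree with current state

    def observe(self, probe_ok):
        """Feed one probe result; return whether the server is in the pool."""
        if probe_ok == self.healthy:
            self._streak = 0            # result agrees with state: reset streak
            return self.healthy
        self._streak += 1
        limit = self.healthy_threshold if probe_ok else self.unhealthy_threshold
        if self._streak >= limit:
            self.healthy = probe_ok     # flip state after enough evidence
            self._streak = 0
        return self.healthy


hc = ActiveHealthCheck(unhealthy_threshold=3, healthy_threshold=2)
probes = [True, False, False, True, False, False, False, True, True]
states = [hc.observe(ok) for ok in probes]
```

With a 10,000ms probe interval and a threshold of 3, a server that dies just after a successful probe is removed after roughly 3 × 10,000ms = 30,000ms, ignoring probe timeouts, which is why passive checks are often layered on top.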

When should you use consistent hashing in a load balancer?

Use consistent hashing when requests for the same key should consistently route to the same server. Common use cases include caching layers (ensuring cache hits by routing the same URLs to the same cache server), stateful services with in-memory session data, and sharded databases where each server handles a specific key range. Consistent hashing also minimizes redistribution when servers are added or removed: on average only about K/N keys are remapped (where K is the total number of keys and N is the number of servers), whereas modular hashing (hash mod N) remaps nearly all keys whenever N changes.
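The remapping claim can be checked with a small experiment. The sketch below builds a basic hash ring with virtual nodes and compares it with modular hashing when a fifth server is added to a pool of four. The HashRing class, the 100 virtual nodes per server, and the key and server names are assumptions made for this illustration.

```python
import bisect
import hashlib


def _hash(value):
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._ring[idx][1]


def modulo(n):
    """Modular hashing over n servers, for comparison."""
    return lambda key: f"server-{_hash(key) % n}"


def moved_fraction(before, after, keys):
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"user-{i}" for i in range(10_000)]
servers = [f"server-{i}" for i in range(4)]

ring_before = HashRing(servers)
ring_after = HashRing(servers + ["server-4"])

ring_moved = moved_fraction(ring_before.lookup, ring_after.lookup, keys)
mod_moved = moved_fraction(modulo(4), modulo(5), keys)
```

Under these assumptions ring_moved should come out near one fifth of the keys, all of them moving to the new server, while mod_moved should be around four fifths.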

What are sticky sessions and when should they be avoided?

Sticky sessions (session affinity) bind a client to a specific backend server for the duration of their session, typically using cookies or IP-based hashing. They are necessary when session state is stored in server memory. However, they should be avoided when possible because they create uneven load distribution (one heavy user can overload their pinned server), complicate deployments (you can't drain a server without disrupting sessions), and reduce availability (if the pinned server fails, the session is lost). The best practice is to externalize session state to Redis or a database, eliminating the need for sticky sessions.

Related Components

API Gateway (Traffic)

Centralized entry point that handles authentication, rate limiting, request routing, and protocol tr...

Service (Compute)

Application server or microservice that processes requests, runs business logic, and communicates wi...

Rate Limiter (Traffic)

Controls request throughput using algorithms like token bucket and sliding window to protect service...

CDN (Traffic)

Content Delivery Network that caches and serves content from edge locations close to users, reducing...

