What is the primary advantage of L4 load balancers over L7 load balancers?
L4 load balancers operate at the transport layer (TCP/UDP), routing based on IP and port without inspecting content. L7 load balancers operate at the application layer (HTTP), enabling content-based routing, SSL termination, and header manipulation. Most large-scale architectures use both in a tiered configuration.
Load balancers operate at different layers of the OSI network model, and the layer at which they operate fundamentally determines their capabilities, performance characteristics, and appropriate use cases. L4 (Layer 4, transport layer) and L7 (Layer 7, application layer) load balancers are not competing technologies -- they are complementary, and most large-scale architectures use both in a tiered configuration. Understanding the trade-offs between them is essential for designing systems that balance performance with routing intelligence.
L4 load balancers operate at the transport layer, making routing decisions based on TCP or UDP packet headers: source IP, destination IP, source port, and destination port. They do not inspect the payload, decrypt TLS, or parse HTTP headers. This simplicity is their strength: without the overhead of content inspection, L4 load balancers can handle millions of connections per second and tens of gigabits of throughput on modest hardware.
L4 load balancers route traffic using two primary mechanisms. NAT (Network Address Translation) rewrites the destination IP of incoming packets to the selected backend server, and the return traffic flows back through the load balancer for reverse NAT. DSR (Direct Server Return) rewrites only the inbound packet's destination MAC address and leaves the destination IP unchanged, so the backend server (which must be configured to accept traffic for that IP) can respond directly to the client without routing return traffic through the load balancer. DSR is critical for bandwidth-heavy workloads like video streaming, where return traffic (the actual video) is orders of magnitude larger than request traffic.
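The difference between the two forwarding modes can be sketched in a few lines of Python. This is an illustrative model only, not how any real load balancer is implemented: the `Packet` fields, the `VIP` address, and the `lb_bytes` helper are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, replace

VIP = "203.0.113.10"  # hypothetical virtual IP that clients connect to


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str


def nat_forward(pkt: Packet, backend_ip: str) -> Packet:
    """NAT mode: rewrite the destination IP to the chosen backend.

    The reply must come back through the load balancer so the source
    address can be translated back to the VIP.
    """
    return replace(pkt, dst_ip=backend_ip)


def dsr_forward(pkt: Packet, backend_mac: str) -> Packet:
    """DSR mode: rewrite only the destination MAC.

    The destination IP is still the VIP, so the backend can reply to the
    client directly, using the VIP as its source address.
    """
    return replace(pkt, dst_mac=backend_mac)


def lb_bytes(request_bytes: int, response_bytes: int, mode: str) -> int:
    """Bytes that cross the load balancer for one request/response pair."""
    if mode == "nat":
        return request_bytes + response_bytes
    if mode == "dsr":
        return request_bytes
    raise ValueError(f"unknown mode: {mode}")
```

With a 500-byte request and a 5 MB response, `lb_bytes(500, 5_000_000, "nat")` counts both directions while `lb_bytes(500, 5_000_000, "dsr")` counts only the request, which is the bandwidth saving the paragraph above describes.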
L7 load balancers operate at the application layer, fully understanding the protocol being load balanced (typically HTTP/HTTPS). They terminate TLS connections (decrypting at the load balancer), parse HTTP headers and URLs, inspect cookies, and make intelligent routing decisions based on request content. An L7 load balancer can route /api/* requests to API servers and /static/* requests to a CDN origin. It can read a session cookie to provide sticky sessions without IP-based hashing. It can add, modify, or remove HTTP headers (X-Forwarded-For, X-Request-ID). It can enforce rate limiting, integrate with a WAF (Web Application Firewall), compress responses, and cache static content. This intelligence comes at a cost: TLS termination, HTTP parsing, and content inspection require significantly more CPU than L4 packet forwarding, which typically means roughly 10-100x lower throughput per instance.
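A minimal sketch of content-based routing, assuming the request has already been decrypted and parsed. The pool names, the `ROUTES` table, and the `Request` type are hypothetical and are not taken from any particular proxy.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    path: str
    client_ip: str
    headers: dict = field(default_factory=dict)


# Hypothetical routing table: first matching path prefix wins.
ROUTES = [
    ("/api/", "api-pool"),
    ("/static/", "static-pool"),
]
DEFAULT_POOL = "web-pool"


def choose_pool(req: Request) -> str:
    """Pick a backend pool from the URL path, as an L7 balancer can."""
    for prefix, pool in ROUTES:
        if req.path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def add_forwarded_for(req: Request) -> Request:
    """Append the client IP to X-Forwarded-For before proxying upstream."""
    prior = req.headers.get("X-Forwarded-For")
    req.headers["X-Forwarded-For"] = (
        f"{prior}, {req.client_ip}" if prior else req.client_ip
    )
    return req
```

None of this is possible at L4, because the path and headers are only visible after TLS termination and HTTP parsing.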
The standard production architecture at scale uses L4 in front of L7. A tier of L4 load balancers (Linux IPVS, Google Maglev, AWS NLB) distributes TCP connections across a pool of L7 load balancers (Nginx, Envoy, HAProxy). The L4 tier handles raw connection distribution at wire speed using consistent hashing (ensuring that every packet of a given TCP connection reaches the same L7 instance), while the L7 tier provides intelligent HTTP routing, TLS termination, and request-level features. This separation of concerns allows the L4 tier to scale to millions of connections per second (it just forwards packets) while the L7 tier can be scaled independently to handle HTTP parsing and application logic. Google's architecture exemplifies this: Maglev (L4) distributes across GFE (Google Front End, L7) instances, which perform TLS termination, protocol negotiation (HTTP/2, HTTP/3), and route to backend services.
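The connection-affinity property of the L4 tier can be illustrated with a small sketch. It uses rendezvous (highest-random-weight) hashing as a simple stand-in for whatever consistent hashing scheme a real L4 balancer uses; the instance names and the `flow_key` format are assumptions made for the example.

```python
import hashlib


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             proto: str = "tcp") -> str:
    """Build a key from the connection 5-tuple (hypothetical format)."""
    return f"{proto}:{src_ip}:{src_port}->{dst_ip}:{dst_port}"


def _score(key: str, instance: str) -> int:
    """Deterministic pseudo-random weight for a (flow, instance) pair."""
    digest = hashlib.sha256(f"{key}|{instance}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_l7_instance(key: str, instances: list) -> str:
    """Rendezvous hashing: the instance with the highest score wins.

    Every packet of the same flow produces the same key and therefore
    the same choice, and removing an instance only remaps the flows
    that were assigned to it.
    """
    if not instances:
        raise ValueError("no L7 instances available")
    return max(instances, key=lambda inst: _score(key, inst))
```

Because the decision depends only on packet header fields, the L4 tier never needs to look at the payload or keep per-request state beyond the flow mapping.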
The Airport Security Analogy
An L4 load balancer is like a traffic controller at an airport entrance who routes cars to different parking garages based on their license plate number (IP address). The controller does not know what the passengers are carrying -- just where their car should go. Very fast, handles many cars per minute. An L7 load balancer is like the security checkpoint inside the terminal: it opens every bag (decrypts TLS), reads boarding passes (HTTP headers), checks passports (authentication), and directs passengers to the correct gate (backend server) based on their destination. Much slower per passenger, but makes intelligent routing decisions based on content.
Google Maglev
Maglev is Google's custom L4 load balancer. Routers use ECMP (Equal-Cost Multi-Path) routing to spread incoming packets across Maglev machines, and each Maglev machine uses consistent hashing to select a backend for every connection. Maglev handles over 10 million packets per second per machine using kernel-bypass networking (bypassing the Linux network stack entirely for raw packet processing). Its custom consistent hashing algorithm (Maglev hashing) provides near-perfect load distribution and minimal disruption when backends are added or removed. Maglev sits in front of Google's L7 layer (GFE).
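The lookup-table construction behind Maglev hashing can be sketched as follows. This follows the publicly described idea (each backend has its own permutation of table slots, and backends take turns claiming their next free preferred slot), but it is a simplified illustration rather than Google's implementation. The SHA-256 based hash helper, the table size of 251, and the backend names are assumptions.

```python
import hashlib


def _h(value: str, salt: str) -> int:
    """Salted hash helper (an assumption; any two independent hashes work)."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_lookup_table(backends: list, table_size: int = 251) -> list:
    """Fill a table of `table_size` slots (must be prime) with backend names."""
    if not backends or table_size <= len(backends):
        raise ValueError("need at least one backend and a larger table")
    # Each backend's permutation is offset, offset+skip, offset+2*skip, ... mod M.
    offsets = [_h(b, "offset") % table_size for b in backends]
    skips = [_h(b, "skip") % (table_size - 1) + 1 for b in backends]
    next_idx = [0] * len(backends)
    table = [None] * table_size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Walk backend i's permutation until a free slot is found.
            while True:
                slot = (offsets[i] + next_idx[i] * skips[i]) % table_size
                next_idx[i] += 1
                if table[slot] is None:
                    break
            table[slot] = backend
            filled += 1
            if filled == table_size:
                return table


def lookup(table: list, flow: str) -> str:
    """Map a flow key (for example a 5-tuple string) to a backend."""
    return table[_h(flow, "flow") % len(table)]
```

Because backends claim slots in round-robin order, each one ends up with an almost equal share of the table. When the backend set changes, most slots tend to keep their previous owner, which is where the low disruption comes from.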
Envoy
Envoy is a high-performance L7 proxy and load balancer originally built by Lyft, now widely used as a service mesh sidecar (for example, in Istio). Envoy provides HTTP/1.1, HTTP/2, and gRPC load balancing with advanced features: circuit breaking, outlier detection, zone-aware routing, automatic retries, and distributed tracing integration. As a sidecar proxy, Envoy runs alongside each microservice instance, providing L7 load balancing, observability, and security without modifying application code.
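To make the idea of outlier detection concrete, here is a minimal sketch that ejects a host after a run of consecutive 5xx responses. It is not Envoy's API or configuration; the class name, the threshold value, and the absence of timed re-admission are all simplifying assumptions.

```python
class OutlierDetector:
    """Eject hosts that return too many consecutive 5xx responses."""

    def __init__(self, consecutive_5xx_limit: int = 5):
        # The threshold is an assumed value chosen for illustration.
        self.consecutive_5xx_limit = consecutive_5xx_limit
        self._failures = {}
        self._ejected = set()

    def record(self, host: str, status: int) -> None:
        """Update the failure streak for a host after each response."""
        if status >= 500:
            self._failures[host] = self._failures.get(host, 0) + 1
            if self._failures[host] >= self.consecutive_5xx_limit:
                self._ejected.add(host)
        else:
            # Any non-5xx response resets the streak.
            self._failures[host] = 0

    def healthy_hosts(self, hosts: list) -> list:
        """Hosts that are still eligible for load balancing."""
        return [h for h in hosts if h not in self._ejected]
```

A production proxy would also return ejected hosts to the pool after a cool-down period; that step is omitted here to keep the sketch short.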
AWS NLB vs ALB
AWS offers both. Network Load Balancer (NLB) operates at L4, handling millions of requests per second with ultra-low latency and supporting TCP, UDP, and TLS passthrough. Application Load Balancer (ALB) operates at L7, supporting HTTP/HTTPS content-based routing, path-based routing, host-based routing, and integration with WAF and Cognito authentication. A common AWS pattern is NLB forwarding to ALB, or NLB for non-HTTP traffic (custom TCP or UDP protocols) and ALB for HTTP services. Note that gRPC runs over HTTP/2, so it is application-layer HTTP traffic even when it is passed through an NLB at L4.
| Aspect | Description |
|---|---|
| Throughput vs Routing Intelligence | L4 handles 10-100x more connections per second than L7 because it forwards packets without parsing content. L7 provides content-based routing, TLS termination, and request-level features but requires significantly more CPU per connection. The trade-off is raw throughput versus routing sophistication. |
| DSR Efficiency vs Return Path Visibility | DSR (L4) eliminates the load balancer from the response path, massively reducing LB bandwidth requirements. However, the LB cannot inspect, modify, or log responses. L7 sees both request and response, enabling response compression, header injection, and detailed access logging, but all return traffic flows through the LB. |
| TLS Termination Location | L7 terminates TLS at the load balancer, allowing it to inspect HTTP traffic and make routing decisions. This requires the LB to have the TLS certificate and adds CPU overhead for decryption. L4 passes encrypted traffic through to backends, which must handle TLS themselves. TLS termination at L7 is simpler to manage (one certificate location) but the LB becomes a potential decryption bottleneck. |
| Health Check Depth | L4 health checks verify TCP connectivity (can the server complete a TCP handshake?). A server might accept TCP connections while its application is deadlocked, unhealthy, or returning errors. L7 health checks issue HTTP requests and verify the response (status 200, correct body), catching application-level failures that L4 checks miss. |
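
The health check row can be illustrated with two small probes built on the Python standard library. The `/healthz` path and the one-second timeout are assumptions; real load balancers make both configurable.

```python
import http.client
import socket


def l4_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """L4 probe: healthy if a TCP connection can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def l7_check(host: str, port: int, path: str = "/healthz",
             timeout: float = 1.0) -> bool:
    """L7 probe: healthy only if an HTTP GET returns status 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```

A server that accepts connections but answers every request with an error would pass `l4_check` and fail `l7_check`, which is the gap described in the table.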
Google's Maglev + GFE Architecture -- L4 and L7 Working Together
Scenario
Google's infrastructure handles billions of requests per second across Search, YouTube, Gmail, and Cloud Platform. No single load balancer can handle both the raw packet throughput needed to distribute this traffic AND the intelligent HTTP routing (TLS termination, HTTP/2 negotiation, header inspection, backend selection) needed for application-level features. Google needed an architecture that could handle tens of millions of packets per second while still providing sophisticated application-layer routing.
Solution
Google developed a two-tier architecture. Maglev, a custom L4 load balancer, sits at the network edge. Maglev uses kernel-bypass networking (DPDK-style) to process over 10 million packets per second per machine. Incoming packets are distributed across Maglev instances using ECMP (Equal-Cost Multi-Path) routing. Maglev uses consistent hashing (Maglev hashing algorithm) to forward each TCP connection to a specific GFE (Google Front End) instance. GFE operates at L7: it terminates TLS, negotiates HTTP/2 or QUIC, inspects HTTP headers, and routes requests to the appropriate backend service. This separation allows Maglev to scale with hardware (adding NICs and machines) while GFE scales with CPU for application-layer processing.
Outcome
The Maglev + GFE architecture handles all of Google's incoming internet traffic. Maglev achieves line-rate packet processing (10+ Gbps per machine) with consistent hashing that disrupts less than 1% of connections when a Maglev instance is added or removed. GFE provides application-aware routing, TLS termination for hundreds of millions of concurrent HTTPS connections, and seamless HTTP/2 and HTTP/3 negotiation. The same two-tier pattern has been adopted by many organizations (IPVS + Nginx, NLB + ALB), establishing L4-in-front-of-L7 as the standard for large-scale load balancing.
Why is Direct Server Return (DSR) important for video streaming workloads?
Why do large-scale architectures use L4 in front of L7 rather than L7 alone?