Vetora

L4 vs L7 Load Balancers

L4 load balancers operate at the transport layer (TCP/UDP), routing based on IP and port without inspecting content. L7 load balancers operate at the application layer (HTTP), enabling content-based routing, TLS termination, and header manipulation. Most large-scale architectures use both in a tiered configuration.

Overview

Load balancers operate at different layers of the OSI network model, and the layer at which they operate fundamentally determines their capabilities, performance characteristics, and appropriate use cases. L4 (Layer 4, transport layer) and L7 (Layer 7, application layer) load balancers are not competing technologies -- they are complementary, and most large-scale architectures use both in a tiered configuration. Understanding the trade-offs between them is essential for designing systems that balance performance with routing intelligence.

L4 load balancers operate at the transport layer, making routing decisions based on TCP or UDP packet headers: source IP, destination IP, source port, and destination port. They do not inspect the payload, decrypt TLS, or parse HTTP headers. This simplicity is their strength: without the overhead of content inspection, L4 load balancers can handle millions of connections per second and tens of gigabits of throughput on modest hardware. L4 load balancers route traffic using two primary mechanisms. NAT (Network Address Translation) rewrites the destination IP of incoming packets to the selected backend server, and the return traffic flows back through the load balancer for reverse NAT. DSR (Direct Server Return) rewrites only the inbound packet's destination MAC address, allowing the backend server to respond directly to the client without routing return traffic through the load balancer. DSR is critical for bandwidth-heavy workloads like video streaming, where return traffic (the actual video) is orders of magnitude larger than request traffic.
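The difference between NAT and DSR forwarding can be sketched in a few lines. This is a simplified model, not how any particular load balancer is implemented: the `Packet` and `Backend` types, the SHA-256 flow hash, and the modulo selection are all assumptions made for illustration, and real L4 load balancers usually add connection tracking on top.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str      # the virtual IP (VIP) that clients connect to
    dst_port: int
    dst_mac: str
    protocol: str = "tcp"


@dataclass(frozen=True)
class Backend:
    ip: str
    mac: str


def pick_backend(pkt: Packet, backends: list[Backend]) -> Backend:
    """Hash the flow's 5-tuple so every packet of a connection maps to one backend."""
    key = f"{pkt.protocol}|{pkt.src_ip}|{pkt.src_port}|{pkt.dst_ip}|{pkt.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def forward_nat(pkt: Packet, backends: list[Backend]) -> Packet:
    """NAT mode: rewrite the destination IP, so replies must return via the LB."""
    backend = pick_backend(pkt, backends)
    return replace(pkt, dst_ip=backend.ip, dst_mac=backend.mac)


def forward_dsr(pkt: Packet, backends: list[Backend]) -> Packet:
    """DSR mode: rewrite only the destination MAC; dst_ip stays the VIP.

    The backend must also accept traffic addressed to the VIP, which lets it
    reply to the client directly with the VIP as the source address.
    """
    backend = pick_backend(pkt, backends)
    return replace(pkt, dst_mac=backend.mac)
```

Note that neither function looks at the payload: the routing decision uses header fields only, which is why this tier stays cheap per packet.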

L7 load balancers operate at the application layer, fully understanding the protocol being load balanced (typically HTTP/HTTPS). They terminate TLS connections (decrypting at the load balancer), parse HTTP headers and URLs, inspect cookies, and make intelligent routing decisions based on request content. An L7 load balancer can route /api/* requests to API servers and /static/* requests to a CDN origin. It can read a session cookie to provide sticky sessions without IP-based hashing. It can add, modify, or remove HTTP headers (X-Forwarded-For, X-Request-ID). It can enforce rate limiting, integrate with a WAF (Web Application Firewall), compress responses, and cache static content. This intelligence comes at a cost: TLS termination, HTTP parsing, and content inspection require significantly more CPU than L4 packet forwarding, resulting in 10-100x lower throughput per instance.
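As a rough illustration of content-based routing, the sketch below routes by path prefix, honors a sticky-session cookie, and appends to `X-Forwarded-For`. The pool names, backend addresses, and the `lb_backend` cookie name are hypothetical, and header lookup is simplified to exact-case dictionary keys; a real proxy such as Nginx or Envoy expresses this through its own configuration rather than code like this.

```python
import zlib
from http.cookies import SimpleCookie

# Hypothetical routing table and pools, for illustration only.
ROUTES = [("/api/", "api-servers"), ("/static/", "static-origin")]
DEFAULT_POOL = "web-servers"
POOLS = {
    "api-servers": ["10.0.1.1", "10.0.1.2"],
    "static-origin": ["10.0.2.1"],
    "web-servers": ["10.0.3.1", "10.0.3.2"],
}
STICKY_COOKIE = "lb_backend"  # assumed cookie name


def choose_pool(path: str) -> str:
    """Pick a backend pool from the URL path (first matching prefix wins)."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


def route(path: str, headers: dict[str, str], client_ip: str):
    """Return (backend, upstream_headers, set_cookie) for one HTTP request."""
    members = POOLS[choose_pool(path)]

    # Sticky session: reuse the backend named in the cookie if it is still valid.
    cookies = SimpleCookie(headers.get("Cookie", ""))
    sticky = cookies[STICKY_COOKIE].value if STICKY_COOKIE in cookies else None
    if sticky in members:
        backend = sticky
    else:
        backend = members[zlib.crc32(client_ip.encode()) % len(members)]

    # Header manipulation: record the client address for the backend.
    upstream = dict(headers)
    prior = headers.get("X-Forwarded-For")
    upstream["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip

    set_cookie = f"{STICKY_COOKIE}={backend}; Path=/; HttpOnly"
    return backend, upstream, set_cookie
```

Every step here needs the decrypted, parsed request, which is the work an L4 load balancer never does.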

The standard production architecture at scale uses L4 in front of L7. A tier of L4 load balancers (Linux IPVS, Google Maglev, AWS NLB) distributes TCP connections across a pool of L7 load balancers (Nginx, Envoy, HAProxy). The L4 tier handles raw connection distribution at wire speed using consistent hashing, which ensures that every packet of a given TCP connection reaches the same L7 instance, while the L7 tier provides intelligent HTTP routing, TLS termination, and request-level features. This separation of concerns allows the L4 tier to scale to millions of connections per second (it just forwards packets) while the L7 tier scales independently to handle HTTP parsing and application logic. Google's architecture exemplifies this: Maglev (L4) distributes across GFE (Google Front End, L7) instances, which perform TLS termination, protocol negotiation (HTTP/2, HTTP/3), and routing to backend services.

Key Points

1. L4 load balancers route based on IP and port only, without inspecting packet contents. They cannot make routing decisions based on HTTP URLs, headers, or cookies. Their simplicity allows handling millions of connections per second with minimal CPU overhead.
2. L7 load balancers terminate TLS, parse HTTP, and enable content-based routing (URL path, headers, cookies). This intelligence enables features like sticky sessions, A/B testing, rate limiting, and WAF, but at 10-100x lower throughput than L4.
3. Direct Server Return (DSR) eliminates the load balancer from the return path. The backend responds directly to the client, bypassing the LB. This is critical for video streaming and large file downloads where return traffic is 100-1000x larger than request traffic.
4. The L4-in-front-of-L7 architecture is standard at scale. L4 (IPVS, Maglev, NLB) distributes TCP connections to L7 (Nginx, Envoy, ALB), which performs HTTP routing and TLS termination. This separates connection distribution from application-layer intelligence.
5. Pure L4 packet forwarders do not terminate TLS because they never inspect packet contents. They forward encrypted traffic unchanged, so TLS must be terminated at an L7 tier or, if no L7 LB exists, at the backend server. Some managed products positioned at L4 (for example, AWS NLB with a TLS listener) can terminate TLS, but in that mode they proxy the connection rather than simply forwarding packets.
6. Health checks differ between L4 and L7. A pure L4 check verifies only TCP connectivity (the server answers with a SYN-ACK). L7 can issue HTTP health check requests and verify response status codes, headers, and body content, providing application-aware health monitoring.

Simple Example

The Airport Security Analogy

An L4 load balancer is like a traffic controller at an airport entrance who routes cars to different parking garages based on their license plate number (IP address). The controller does not know what the passengers are carrying -- just where their car should go. Very fast, handles many cars per minute. An L7 load balancer is like the security checkpoint inside the terminal: it opens every bag (decrypts TLS), reads boarding passes (HTTP headers), checks passports (authentication), and directs passengers to the correct gate (backend server) based on their destination. Much slower per passenger, but makes intelligent routing decisions based on content.

Real-World Examples

Google Maglev

Maglev is Google's custom L4 load balancer. Routers spread incoming packets across Maglev machines using ECMP (Equal-Cost Multi-Path) routing, and each Maglev machine then uses consistent hashing to pick a backend for every connection. Maglev handles over 10 million packets per second per machine using kernel-bypass networking (bypassing the Linux network stack entirely for raw packet processing). Its custom consistent hashing algorithm (Maglev hashing) provides near-perfect load distribution and minimal disruption when backends are added or removed. Maglev sits in front of Google's L7 layer (GFE).
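The lookup-table idea behind Maglev hashing can be sketched as follows. Each backend gets its own permutation of table slots, derived from an offset and a skip value, and backends take turns claiming their next free slot until the table is full. The SHA-256-based hash helper, the seed strings, and the small table size of 251 are assumptions made for this sketch; production tables are much larger, and this is not Google's implementation.

```python
import hashlib
from collections import Counter


def _hash(value: str, seed: str) -> int:
    """Stable 64-bit hash; stands in for the two hash functions the scheme needs."""
    digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_table(backends: list[str], size: int = 251) -> list[str]:
    """Populate a lookup table of `size` slots (size must be prime)."""
    if not backends:
        raise ValueError("at least one backend is required")
    # Each backend's permutation is (offset + j * skip) mod size for j = 0, 1, ...
    offsets = [_hash(b, "offset") % size for b in backends]
    skips = [_hash(b, "skip") % (size - 1) + 1 for b in backends]
    next_j = [0] * len(backends)
    table: list = [None] * size
    filled = 0
    while True:
        for i, backend in enumerate(backends):
            # Advance along backend i's permutation to its next free slot.
            slot = (offsets[i] + next_j[i] * skips[i]) % size
            while table[slot] is not None:
                next_j[i] += 1
                slot = (offsets[i] + next_j[i] * skips[i]) % size
            table[slot] = backend
            next_j[i] += 1
            filled += 1
            if filled == size:
                return table


def lookup(table: list[str], flow_key: str) -> str:
    """Map a flow (for example, its 5-tuple rendered as a string) to a backend."""
    return table[_hash(flow_key, "flow") % len(table)]


if __name__ == "__main__":
    table = build_table(["gfe-1", "gfe-2", "gfe-3"])
    print(Counter(table))  # slot counts per backend differ by at most one
```

Because backends claim slots in strict rotation, their slot counts never differ by more than one, which is where the near-even load distribution comes from.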

Envoy

Envoy is a high-performance L7 proxy and load balancer originally built by Lyft, now widely used as a service mesh sidecar (Istio). Envoy provides HTTP/1.1, HTTP/2, and gRPC load balancing with advanced features: circuit breaking, outlier detection, zone-aware routing, automatic retries, and distributed tracing integration. As a sidecar proxy, Envoy runs alongside each microservice instance, providing L7 load balancing, observability, and security without modifying application code.

AWS NLB vs ALB

AWS offers both. Network Load Balancer (NLB) operates at L4, handling millions of requests per second with ultra-low latency and supporting TCP, UDP, and TLS passthrough. Application Load Balancer (ALB) operates at L7, supporting HTTP/HTTPS content-based routing (path-based and host-based) and integration with WAF and Cognito authentication. A common AWS pattern is NLB forwarding to ALB, or NLB for non-HTTP traffic (custom TCP protocols, UDP) and ALB for HTTP services. Note that gRPC runs over HTTP/2, so it is not a non-HTTP protocol; it can sit behind either type depending on whether request-level routing is needed.

Trade-Offs

Aspect	Description
Throughput vs Routing Intelligence	L4 handles 10-100x more connections per second than L7 because it forwards packets without parsing content. L7 provides content-based routing, TLS termination, and request-level features but requires significantly more CPU per connection. The trade-off is raw throughput versus routing sophistication.
DSR Efficiency vs Return Path Visibility	DSR (L4) eliminates the load balancer from the response path, massively reducing LB bandwidth requirements. However, the LB cannot inspect, modify, or log responses. L7 sees both request and response, enabling response compression, header injection, and detailed access logging, but all return traffic flows through the LB.
TLS Termination Location	L7 terminates TLS at the load balancer, allowing it to inspect HTTP traffic and make routing decisions. This requires the LB to have the TLS certificate and adds CPU overhead for decryption. L4 passes encrypted traffic through to backends, which must handle TLS themselves. TLS termination at L7 is simpler to manage (one certificate location) but the LB becomes a potential decryption bottleneck.
Health Check Depth	L4 health checks verify TCP connectivity (can the server accept a SYN?). A server might accept TCP connections while its application is deadlocked, unhealthy, or returning errors. L7 health checks issue HTTP requests and verify the response (status 200, correct body), catching application-level failures that L4 checks miss.

Case Study

Google's Maglev + GFE Architecture -- L4 and L7 Working Together

Scenario

Google's infrastructure handles billions of requests per second across Search, YouTube, Gmail, and Cloud Platform. No single load balancer can handle both the raw packet throughput needed to distribute this traffic AND the intelligent HTTP routing (TLS termination, HTTP/2 negotiation, header inspection, backend selection) needed for application-level features. Google needed an architecture that could handle tens of millions of packets per second while still providing sophisticated application-layer routing.

Solution

Google developed a two-tier architecture. Maglev, a custom L4 load balancer, sits at the network edge. Maglev uses kernel-bypass networking (DPDK-style) to process over 10 million packets per second per machine. Incoming packets are distributed across Maglev instances using ECMP (Equal-Cost Multi-Path) routing. Maglev uses consistent hashing (Maglev hashing algorithm) to forward each TCP connection to a specific GFE (Google Front End) instance. GFE operates at L7: it terminates TLS, negotiates HTTP/2 or QUIC, inspects HTTP headers, and routes requests to the appropriate backend service. This separation allows Maglev to scale with hardware (adding NICs and machines) while GFE scales with CPU for application-layer processing.

Outcome

The Maglev + GFE architecture handles all of Google's incoming internet traffic. Maglev achieves line-rate packet processing (10+ Gbps per machine) with consistent hashing that disrupts less than 1% of connections when a Maglev instance is added or removed. GFE provides application-aware routing, TLS termination for hundreds of millions of concurrent HTTPS connections, and seamless HTTP/2 and HTTP/3 negotiation. The two-tier architecture has been replicated in many organizations (IPVS + Nginx, NLB + ALB) and validated the L4-in-front-of-L7 pattern as the standard for large-scale load balancing.

Common Mistakes

⚠ Using an L7 load balancer for raw TCP or UDP traffic that does not need content inspection. L7 adds unnecessary overhead for non-HTTP protocols. Use an L4 LB (NLB, IPVS) for database connections, custom TCP protocols, and UDP traffic. gRPC can also pass through L4 when header-based routing is not needed, but because it multiplexes many calls over long-lived HTTP/2 connections, an L4 LB balances connections rather than individual calls.
⚠ Expecting L4 load balancers to provide session affinity based on cookies or HTTP headers. L4 cannot inspect HTTP content. Session affinity at L4 is limited to source IP hashing, which breaks down when many clients share one address behind NAT. Use L7 for cookie-based or header-based session affinity.
⚠ Not considering DSR for bandwidth-heavy workloads. For video streaming or large file downloads, the response payload is 100-1000x larger than the request. Without DSR, all return traffic flows through the LB, making it a bandwidth bottleneck. DSR removes response traffic from the LB entirely.
⚠ Using a single tier (L7 only) at scale and running into throughput limits. A single L7 LB instance might handle 50K-100K requests per second. Placing an L4 tier in front distributes TCP connections across multiple L7 instances, scaling HTTP processing horizontally.
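The last two mistakes come down to simple arithmetic, sketched below. The workload figures in the example (10,000 requests per second, 1 KB requests, 1 MB responses, a 1,000,000 rps peak) and the 30% headroom are hypothetical; only the 50K requests per second per L7 instance and the roughly 1000x response-to-request ratio come from the text above.

```python
import math


def lb_bandwidth_gbps(rps: float, request_bytes: int, response_bytes: int,
                      dsr: bool) -> float:
    """Bandwidth the LB itself must carry; with DSR it sees only requests."""
    bytes_per_request = request_bytes + (0 if dsr else response_bytes)
    return rps * bytes_per_request * 8 / 1e9


def l7_instances_needed(peak_rps: float, per_instance_rps: float,
                        headroom: float = 0.3) -> int:
    """L7 instances required when each one is kept below (1 - headroom) of capacity."""
    return math.ceil(peak_rps / (per_instance_rps * (1 - headroom)))


if __name__ == "__main__":
    # Hypothetical download service: 1 KB requests, 1 MB responses (1000x larger).
    without_dsr = lb_bandwidth_gbps(10_000, 1_000, 1_000_000, dsr=False)
    with_dsr = lb_bandwidth_gbps(10_000, 1_000, 1_000_000, dsr=True)
    print(f"LB bandwidth without DSR: {without_dsr:.2f} Gbps")  # 80.08 Gbps
    print(f"LB bandwidth with DSR:    {with_dsr:.2f} Gbps")     # 0.08 Gbps

    # Hypothetical 1M rps peak at 50K rps per L7 instance with 30% headroom.
    print(l7_instances_needed(1_000_000, 50_000))  # 29
```

Under these assumptions DSR cuts the load balancer's bandwidth by about three orders of magnitude, and the L7 tier needs dozens of instances, which is why an L4 tier is placed in front to spread connections across them.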

Related Concepts

Load Balancing Algorithms, TCP vs UDP, HTTP/1.1 vs HTTP/2 vs HTTP/3, Horizontal vs Vertical Scaling, Rate Limiting

Test Your Understanding

1. What is the primary advantage of L4 load balancers over L7 load balancers?

2. Why is Direct Server Return (DSR) important for video streaming workloads?

3. Why do large-scale architectures use L4 in front of L7 rather than L7 alone?

Deeper Reading