CDN & Edge Caching

A Content Delivery Network (CDN) is a globally distributed network of proxy servers that caches content at locations close to end users. Edge caching extends this concept beyond static assets to include dynamic content, API responses, and even compute, reducing latency by serving requests from the nearest edge location rather than the distant origin server.

Overview

The speed of light imposes a hard floor on network latency: light travels through fiber optic cable at roughly 200km per millisecond, so every 200km of distance adds about 1ms each way. Over real cable routes this works out to about 70ms for a US coast-to-coast round trip and about 130ms for a transatlantic round trip. No amount of server optimization can overcome this physical limit. CDNs address the problem by shortening the distance: instead of every request traveling to a central origin server, content is cached at hundreds of edge locations worldwide, so the request only needs to travel to the nearest point of presence (PoP), typically 5-30ms away.
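The propagation floor can be worked out directly from distance. The sketch below assumes the 200km-per-millisecond figure for fiber; the distances are hypothetical straight-line values chosen for illustration, not measured cable routes.

```python
# Best-case round-trip time from fiber propagation delay alone.
FIBER_KM_PER_MS = 200.0  # light covers roughly 200 km of fiber per millisecond


def min_rtt_ms(distance_km: float) -> float:
    """Out and back, ignoring routing detours, queuing, and server processing."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Hypothetical straight-line distances, for illustration only.
for label, km in [("US coast to coast", 4_000), ("Transatlantic", 6_000), ("Nearby PoP", 100)]:
    print(f"{label}: at least {min_rtt_ms(km):.0f} ms")
```

Observed round trips are higher than these lower bounds because cables do not follow straight lines and each hop adds processing time.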

Traditional CDNs cache static assets: images, CSS, JavaScript bundles, fonts, and video files. These assets change infrequently and are identical for all users, making them ideal for caching. Modern CDNs extend caching to dynamic content through techniques like edge-side includes (ESI, where parts of a page are cached independently), surrogate keys (where related cache entries can be invalidated as a group), and stale-while-revalidate (serving cached content while asynchronously fetching a fresh version from origin).
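Stale-while-revalidate is easier to follow as code. The following is a minimal in-memory sketch, not any CDN's actual implementation: the class name SwrCache, the fetch callback (standing in for an origin request), and the run_pending method (standing in for the asynchronous background refresh) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Entry:
    body: str
    stored_at: float


class SwrCache:
    def __init__(self, fetch: Callable[[str], str], max_age: float, swr: float):
        self.fetch = fetch          # hypothetical stand-in for a request to origin
        self.max_age = max_age      # seconds an entry counts as fresh
        self.swr = swr              # extra seconds a stale entry may still be served
        self.entries: Dict[str, Entry] = {}
        self.pending: List[str] = []  # keys waiting for a background refresh

    def get(self, key: str, now: float) -> str:
        entry = self.entries.get(key)
        if entry is not None:
            age = now - entry.stored_at
            if age <= self.max_age:
                return entry.body                  # fresh hit
            if age <= self.max_age + self.swr:
                if key not in self.pending:
                    self.pending.append(key)       # refresh off the request path
                return entry.body                  # stale hit, served immediately
        # Miss, or too stale to serve: the caller waits for origin.
        body = self.fetch(key)
        self.entries[key] = Entry(body, now)
        return body

    def run_pending(self, now: float) -> None:
        """Simulates the background revalidation a real cache performs asynchronously."""
        for key in self.pending:
            self.entries[key] = Entry(self.fetch(key), now)
        self.pending.clear()
```

The point of the design is that only the third branch makes the user wait on origin; within the stale window the response is immediate and the refresh cost is paid in the background.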

Edge computing takes the CDN concept further by moving computation itself to the edge. Cloudflare Workers, AWS Lambda@Edge, and Deno Deploy execute application code at edge locations, enabling personalization, A/B testing, authentication, and API routing without a round trip to the origin. This blurs the line between CDN and application server: the edge can generate dynamic responses, transform content, and even query distributed databases like Cloudflare D1 or DynamoDB Global Tables.
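As one illustration of logic that can run at the edge without consulting the origin, A/B test assignment can be made deterministic by hashing the user and experiment identifiers. This is a language-agnostic sketch written in Python (edge platforms typically run JavaScript or WebAssembly); the function name ab_variant and its parameters are hypothetical.

```python
import hashlib


def ab_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Map a user to 'control' or 'treatment' with no shared state between PoPs."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


print(ab_variant("user-123", "new-checkout"))
```

Because the result depends only on the inputs, every edge location assigns the same user to the same variant without coordinating.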

Cache invalidation remains the fundamental challenge. Phil Karlton's famous quote -- 'There are only two hard things in Computer Science: cache invalidation and naming things' -- applies especially to CDNs because the cache is distributed across hundreds of locations. Invalidating a cached resource across all PoPs takes time (typically 1-10 seconds, but sometimes longer), during which different users may see different versions. Strategies include TTL-based expiration (simple but potentially stale), event-driven purging (fast but complex), and cache versioning via URL fingerprinting (append a hash to the URL so new versions get new cache keys, avoiding invalidation entirely).
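URL fingerprinting can be sketched in a few lines. The example below assumes a SHA-256 content hash truncated to six hex characters and inserted before the file extension; real build tools choose their own hash function, length, and naming scheme, and the function name fingerprint is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint(path: str, content: bytes, length: int = 6) -> str:
    """Return the path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = fingerprint("static/app.js", b"console.log('v1');")
new_url = fingerprint("static/app.js", b"console.log('v2');")
print(old_url, new_url)  # two different URLs, so two independent cache keys
```

Since a changed file always gets a new URL, the old cache entry never has to be purged; it simply stops being requested.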

Key Points
  • CDNs reduce latency by serving content from the nearest edge location. A request from Tokyo to a US-East origin takes ~150ms; from Tokyo to a Tokyo PoP takes ~5ms. For static assets, this is a 30x latency improvement with no application changes.
  • Cache-Control headers drive CDN behavior: max-age sets the TTL, s-maxage sets the CDN-specific TTL (distinct from browser cache), stale-while-revalidate serves cached content while fetching fresh content in the background, and no-cache forces revalidation on every request.
  • URL fingerprinting (e.g., app.a3f8b2c.js) eliminates the invalidation problem for static assets. Since the URL changes with the content, the old URL remains cached (harmless) and the new URL is fetched fresh. This enables very long TTLs (1 year) with instant updates.
  • Edge compute (Cloudflare Workers, Lambda@Edge) runs application code at edge locations. Isolate-based platforms such as Cloudflare Workers start in a few milliseconds or less, while Lambda@Edge, which runs full Lambda functions, can have noticeably longer cold starts. Use cases include A/B testing, geolocation-based routing, request authentication, header manipulation, and serving personalized content without an origin round trip.
  • CDN costs are based on bandwidth (data transferred from edge to users) and requests (per-request pricing). For high-traffic sites, CDN costs can be significant. Optimizing asset sizes (compression, image optimization) reduces both latency and CDN costs.
  • Multi-CDN strategies use DNS-based routing to direct traffic to the fastest or most available CDN for each user's location. This improves reliability (failover if one CDN has issues) and performance (choose the CDN with the best PoP for each region).
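To make the Cache-Control directives above concrete, the sketch below parses a header and reports the lifetime a shared cache such as a CDN would apply. It is a simplified reading of the rules (s-maxage takes precedence over max-age for shared caches; private and no-store prevent shared caching); the helper names are hypothetical and real CDNs layer their own defaults and overrides on top.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"') or None
    return directives


def shared_cache_ttl(header: str) -> Optional[int]:
    """Seconds a shared cache may reuse the response, or None if it has no usable lifetime."""
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d:
        return None                      # a CDN must not store this response
    if "no-cache" in d:
        return 0                         # storable, but revalidate before every reuse
    for name in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        value = d.get(name)
        if value is not None:
            return int(value)
    return None                          # no explicit lifetime given


print(shared_cache_ttl("public, max-age=60, s-maxage=600"))
print(shared_cache_ttl("public, max-age=31536000, immutable"))
print(shared_cache_ttl("private, max-age=60"))
```

The second header is the kind of policy that fingerprinted assets allow: a one-year lifetime, since a content change produces a new URL rather than a stale entry.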
Simple Example

Serving a Product Image from the Edge

An e-commerce site hosts product images on a US-East origin server. A customer in Sydney, Australia requests a product image. Without a CDN, the request travels ~15,000km to US-East and back -- approximately 200ms of network latency alone. With a CDN, the first request goes to origin (200ms), and the CDN caches the image at the Sydney PoP. Subsequent requests from Australian users are served from the Sydney PoP in ~5ms for as long as the image stays cached. If the product gets 10,000 views per day from Australia and the cached copy lasts the whole day, 9,999 are served at 5ms instead of 200ms. The origin sees only 1 request instead of 10,000, reducing both latency and origin load.
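The arithmetic behind this example can be checked with a short calculation. The sketch uses the figures above (10,000 requests, 5ms at the edge, 200ms to origin); the second scenario with 100 misses is a hypothetical added to show how sensitive the average is to evictions or a shorter TTL.

```python
def average_latency_ms(requests: int, misses: int, edge_ms: float, origin_ms: float) -> float:
    """Mean latency when hits are served from the edge and misses go to origin."""
    hits = requests - misses
    return (hits * edge_ms + misses * origin_ms) / requests


# One miss per day, as in the example above.
print(average_latency_ms(10_000, 1, edge_ms=5, origin_ms=200))    # about 5.02 ms

# Hypothetical: 100 misses per day (shorter TTL or evictions).
print(average_latency_ms(10_000, 100, edge_ms=5, origin_ms=200))  # 6.95 ms
```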

Real-World Examples

Netflix

Netflix operates Open Connect, a purpose-built CDN with appliances embedded directly in ISP networks. During off-peak hours, popular content is pre-positioned on these appliances so that during peak viewing, streams are served from hardware inside the user's ISP -- often zero network hops away. This is why Netflix streams rarely buffer even during peak hours: the content is already local.

Shopify

Shopify serves millions of storefronts through Cloudflare's CDN. They use edge caching with surrogate keys for intelligent invalidation: when a product price changes, only cache entries containing that product are purged (not the entire store). Combined with stale-while-revalidate, customers see sub-100ms page loads while the origin handles only uncached requests.
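Surrogate-key invalidation can be sketched with a small in-memory cache that records which URLs were tagged with which keys. This is an illustrative model only, not Shopify's or Cloudflare's implementation; the class name TaggedCache and the key names such as product-42 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set


class TaggedCache:
    def __init__(self) -> None:
        self.bodies: Dict[str, str] = {}
        self.urls_by_key: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: str, surrogate_keys: Iterable[str]) -> None:
        """Store a response and remember every surrogate key attached to it."""
        self.bodies[url] = body
        for key in surrogate_keys:
            self.urls_by_key[key].add(url)

    def get(self, url: str) -> Optional[str]:
        return self.bodies.get(url)

    def purge(self, surrogate_key: str) -> int:
        """Drop every cached URL tagged with the key; return how many were removed."""
        urls = self.urls_by_key.pop(surrogate_key, set())
        return sum(1 for url in urls if self.bodies.pop(url, None) is not None)


cache = TaggedCache()
cache.put("/products/42", "<html>shoe</html>", ["product-42"])
cache.put("/collections/shoes", "<html>list</html>", ["product-42", "product-7"])
cache.put("/products/7", "<html>boot</html>", ["product-7"])

# A price change on product 42 purges only the pages that include it.
print(cache.purge("product-42"))  # 2
print(cache.get("/products/7") is not None)  # True
```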

Vercel

Vercel's platform deploys Next.js applications to edge locations worldwide using incremental static regeneration (ISR). Pages are statically generated and cached at the edge, but can be regenerated in the background on a configurable schedule. This provides CDN-level performance (5-30ms response times) with near-real-time content updates -- a middle ground between fully static and fully dynamic.

Trade-Offs
Aspect | Description
Cache Freshness vs Latency | Longer TTLs mean higher cache hit rates and lower latency, but users may see stale content. Shorter TTLs keep content fresh but increase origin load and reduce the CDN's effectiveness. Stale-while-revalidate offers a compromise: serve the cached version immediately while fetching a fresh version in the background.
Edge Compute vs Origin Compute | Moving compute to the edge reduces latency but introduces constraints: edge environments have limited CPU time (typically 10-50ms per request), limited memory (around 128MB on platforms such as Cloudflare Workers), and usually no low-latency access to the full application database. Complex business logic that needs database joins or transaction support must stay at the origin.
CDN Cost vs Origin Cost | CDN bandwidth carries its own per-GB charge, roughly $0.02-0.08/GB. Whether that is more or less than origin egress depends on the provider: egress prices range from about $0.01/GB at low-cost hosts to several times that at the major clouds. In either case, the CDN reduces origin compute costs (fewer requests reach origin) and improves performance. The break-even point depends on cache hit ratio; as a rough guide, above a 90% hit rate the CDN typically saves money overall.
Global Consistency vs Regional Performance | A single origin ensures consistency (all users see the same data) but adds latency for distant users. Edge caching improves regional performance but introduces consistency windows where different PoPs serve different versions. For financial or transactional data, this inconsistency is unacceptable; for media content, it is invisible.
Case Study

Cloudflare's Edge Network Stopping a Record DDoS Attack

Scenario

In 2023, Cloudflare mitigated what was then the largest reported HTTP DDoS attack, peaking at 71 million requests per second (RPS). The attack targeted a customer's API endpoint and, had it reached the origin server, would have overwhelmed the backend. The customer's origin servers could handle approximately 50,000 RPS -- the attack was roughly 1,400x their capacity.

Solution

Cloudflare's edge network absorbed the attack across its 300+ PoPs worldwide. Each PoP independently detected and filtered malicious traffic using machine learning models running at the edge. Legitimate requests were forwarded to the origin; attack traffic was dropped at the edge without consuming origin resources. The edge's distributed nature meant no single location needed to handle the full 71M RPS -- the load was spread across the global network.

Outcome

The customer experienced zero downtime and no performance degradation during the attack. Origin server load remained normal (~30,000 RPS of legitimate traffic). This demonstrated that CDNs are not just performance tools but also security infrastructure. The edge network's distributed capacity (hundreds of Tbps globally) can absorb attacks that would overwhelm any single-origin deployment. The cost of the CDN was a fraction of what equivalent DDoS mitigation infrastructure would cost to self-host.

Common Mistakes
  • Setting Cache-Control headers incorrectly or not setting them at all. Without explicit caching directives, CDN behavior is unpredictable -- some CDNs cache by default, others do not. Always set Cache-Control with appropriate max-age, s-maxage, and revalidation directives.
  • Caching personalized or authenticated content at the edge. If a CDN caches a page containing user-specific data (account info, shopping cart), other users may see that data. Always use Cache-Control: private or no-store for authenticated responses.
  • Not using URL fingerprinting for static assets. Without fingerprinting, deploying a new version of app.js means purging the CDN cache and hoping all 300 PoPs update simultaneously. With fingerprinting (app.a3f8b2.js), old and new versions coexist peacefully.
  • Ignoring CDN cache hit ratio in performance monitoring. A CDN with a 30% hit rate provides minimal benefit and adds cost. Monitor cache hit ratio and investigate low rates -- common causes include overly short TTLs, query string variations that split one resource across many cache keys, and cookies that make otherwise identical responses uncacheable or unique per user.
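Query string variation is one of the easier causes of a low hit ratio to address. The sketch below normalizes a URL into a cache key by lowercasing the host, sorting query parameters, and dropping parameters that do not affect the response. The TRACKING_PARAMS list is an assumption for illustration; which parameters are safe to ignore depends on the application, and most CDNs expose their own cache-key configuration for this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change the response body.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}


def cache_key(url: str) -> str:
    """Normalize a URL so equivalent requests share one cache entry."""
    parts = urlsplit(url)
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    )
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), ""))


print(cache_key("https://Example.com/p?b=2&a=1&utm_source=mail"))
print(cache_key("https://example.com/p?a=1&b=2"))
```

Both calls produce the same key, so the two requests would be served from a single cached entry instead of two.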
Test Your Understanding

1. What does the Cache-Control directive 'stale-while-revalidate' do?

2. Why is URL fingerprinting (e.g., app.a3f8b2.js) preferred over cache purging for static assets?
