Medium | 10 components | Interview: Very High

Instagram — CDN + Presigned Upload (Async Processing)

Industry-standard Instagram architecture using presigned URL uploads (API server never touches image bytes), async image processing via Kafka workers (thumbnail + medium + large variants), Redis feed cache (pre-computed per-user feeds), and CloudFront CDN for global image delivery at 95% hit rate. The standard production approach for any media-heavy application at moderate scale.

Tags: CDN, Kafka, Redis, Async Processing, Instagram

Problem Statement

The CDN + presigned upload approach to Instagram represents the industry-standard architecture for media-heavy applications at production scale. It solves the three critical bottlenecks of the naive approach: API server bandwidth saturation from proxying image bytes, slow synchronous thumbnail generation blocking upload responses, and absence of CDN causing global latency penalties for image delivery.

The key architectural insight is that the API server should never touch image bytes. When a user uploads a photo, the client first calls POST /api/v1/media/upload, which returns a presigned S3 URL in approximately 2ms. The client then uploads the 2MB image directly to S3 using this presigned URL — the API server is completely bypassed for the actual data transfer. This eliminates the 1GB/sec bandwidth bottleneck (at 500 uploads/sec) that makes the naive approach unscalable. The API server's role is reduced to a lightweight URL-signing operation.

Once the image lands in S3, an event is published to Kafka (upload-complete topic). ImageWorker consumers download the original from S3 and create three resized variants: thumbnail (150px, ~15KB), medium (600px, ~80KB), and large (1080px, ~250KB) using libvips. Each variant is uploaded back to S3 and made available via CloudFront CDN. The entire processing pipeline is asynchronous — the upload response returns in under 200ms, and variants are available within 5-10 seconds. The client shows a loading placeholder during this brief interval.
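
As a rough illustration of the resize step, the sketch below computes output dimensions for the three variant widths named above while preserving aspect ratio. The function name `variant_sizes`, the no-upscaling rule, and the integer rounding are assumptions made for this example; the actual pixel work would be done by libvips, which is not called here.

```python
# Target widths from the text: thumbnail 150px, medium 600px, large 1080px.
VARIANT_WIDTHS = {"thumbnail": 150, "medium": 600, "large": 1080}


def variant_sizes(width: int, height: int) -> dict:
    """Return {variant_name: (width, height)}, preserving aspect ratio.

    Assumption: images narrower than a target width are not upscaled.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    sizes = {}
    for name, target in VARIANT_WIDTHS.items():
        out_w = min(width, target)
        # Floor division keeps the result deterministic; clamp to at least 1px.
        out_h = max(1, height * out_w // width)
        sizes[name] = (out_w, out_h)
    return sizes


if __name__ == "__main__":
    # A 4000x3000 original yields 150x112, 600x450 and 1080x810.
    print(variant_sizes(4000, 3000))
```

A worker would typically compute these sizes once per upload-complete event and then hand each target size to the image library.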

The feed system pre-computes per-user feeds in Redis. When a user creates a post, PostService writes the post record to PostgreSQL and pushes the post_id to each follower's feed cache in Redis. FeedReader serves feed requests by reading from Redis via LRANGE — at 92% cache hit rate, most feed reads complete in under 5ms from cache. On cache miss, FeedReader falls back to querying PostgreSQL.

CloudFront CDN serves all image reads from 200+ edge locations worldwide. At 95% cache hit rate, only 5% of image reads reach the S3 origin. Popular images (viral posts, celebrity content) stay hot at edge locations with long TTLs (86400s for immutable media). A user in Tokyo receives images in approximately 10ms from the nearest edge location instead of 180ms from the S3 origin.

The primary trade-offs of this approach are eventual consistency (image variants may not be ready for 5-10 seconds after upload), chronological feed ordering (no ML ranking), and celebrity write amplification (a user with 50M followers triggers 50M Redis cache writes per post). The V2 variant addresses the latter two: hybrid fanout eliminates celebrity write amplification, and ML ranking replaces chronological ordering. It also moves post storage to Cassandra for linear scaling.

Interviewers expect candidates to explain the presigned URL pattern (why the API server should not proxy image bytes), reason about async image processing trade-offs (eventual consistency vs upload latency), discuss CDN cache hit rates and their impact on origin load, and identify the celebrity write amplification problem as the motivation for the hybrid fanout approach in V2.

Architecture Overview

The CDN + presigned upload architecture uses ten components organized into five layers: traffic ingestion (ReadClient, UploadClient, ApiGateway, MainLB), application services (FeedReader, PostService), data stores (FeedCache/Redis, PostDB/PostgreSQL, MediaStore/S3), async pipeline (ImageStream/Kafka, ImageWorker), and content delivery (CloudFront CDN).

The upload path (write) handles 15% of traffic. UploadClient sends POST /api/v1/media/upload to the ApiGateway, which authenticates the JWT token (~3ms) and routes to PostService via MainLB. PostService generates a presigned S3 URL (~2ms) and returns it to the client. The client uploads the 2MB image directly to MediaStore (S3) — the API server never sees the image bytes. After successful S3 upload, PostService publishes an upload-complete event to ImageStream (Kafka). ImageWorker consumes the event, downloads the original from S3, creates three resized variants using libvips, uploads them back to S3, and updates PostDB with CDN URLs.

The read path handles 85% of traffic. ReadClient sends GET /api/v1/feed to the ApiGateway, which routes to FeedReader via MainLB. FeedReader checks FeedCache (Redis) for the pre-computed feed — at 92% hit rate, the feed is served from cache in ~2ms. The response contains post metadata with CDN URLs for image variants. The client fetches images directly from CloudFront CDN (~10ms at edge). On cache miss, FeedReader queries PostDB (PostgreSQL) and reconstructs the feed.
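
The cache-then-database read path described above can be sketched as a read-through lookup. This is a minimal illustration only: the `FeedReader` class shape, the `PAGE_SIZE` value, and the use of a plain dict in place of Redis and a callable in place of the PostgreSQL query are all assumptions, not the template's implementation.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 20  # hypothetical page size, not stated in the text


class FeedReader:
    """Read-through feed lookup with in-memory stand-ins for Redis and PostDB."""

    def __init__(self, cache: Dict[str, List[str]],
                 load_from_db: Callable[[str], List[str]]) -> None:
        self.cache = cache                # stand-in for FeedCache (Redis)
        self.load_from_db = load_from_db  # stand-in for the PostDB feed query
        self.hits = 0
        self.misses = 0

    def get_feed(self, user_id: str, offset: int = 0) -> List[str]:
        key = f"feed:{user_id}"
        feed = self.cache.get(key)
        if feed is None:
            # Cache miss: rebuild from the source of truth and repopulate.
            self.misses += 1
            feed = self.load_from_db(user_id)
            self.cache[key] = feed
        else:
            self.hits += 1
        return feed[offset:offset + PAGE_SIZE]
```

Counting hits and misses makes it easy to compare an observed hit rate against the 92% figure assumed in the text.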

FeedCache (Redis) stores pre-computed per-user feeds as Redis lists of post IDs, ordered newest-first by insertion. When a user creates a post, PostService writes the post_id to each follower's feed list in Redis. This fanout-on-write approach makes reading a feed page a single LRANGE call with no joins (effectively constant time for a fixed page size), but it creates O(followers) write amplification per post. That write amplification is the critical limitation that the V2 variant addresses with hybrid fanout.
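
To make the write amplification concrete, here is a small in-memory sketch of fanout-on-write. It does not talk to Redis: a bounded deque stands in for a Redis list, with `appendleft` playing the role of LPUSH followed by LTRIM, and `page` mimicking LRANGE's inclusive range. The class and function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Iterable, List

MAX_FEED_LEN = 500  # matches the LTRIM cap described for feed_cache


class InMemoryFeedCache:
    """Dict-of-deques stand-in for Redis lists keyed by feed:{user_id}."""

    def __init__(self) -> None:
        self._feeds = defaultdict(lambda: deque(maxlen=MAX_FEED_LEN))

    def push(self, user_id: str, post_id: str) -> None:
        # appendleft on a bounded deque drops the oldest entry when full,
        # which behaves like LPUSH followed by LTRIM 0 499.
        self._feeds[f"feed:{user_id}"].appendleft(post_id)

    def page(self, user_id: str, start: int, stop: int) -> List[str]:
        # Like LRANGE, both start and stop are inclusive.
        return list(self._feeds[f"feed:{user_id}"])[start:stop + 1]


def fan_out(cache: InMemoryFeedCache, follower_ids: Iterable[str],
            post_id: str) -> int:
    """Push post_id to every follower's feed; return the number of writes."""
    writes = 0
    for follower_id in follower_ids:
        cache.push(follower_id, post_id)
        writes += 1
    return writes
```

The return value of `fan_out` equals the follower count, which is exactly the O(followers) cost per post.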

PostDB (PostgreSQL) is the source of truth for posts, users, follows, and media metadata. Indexed by (author_id, created_at) for feed reconstruction on cache miss. 32 partitions with 3 replicas for durability.

ImageStream (Kafka) decouples the upload path from image processing. 16 partitions for ImageWorker parallelism. Messages are small (~512 bytes) containing media_id and S3 key — the actual image is read from S3 by the worker.

CloudFront CDN is configured with the S3 image bucket as its origin. 200+ edge locations provide sub-100ms image delivery worldwide. Long TTL (86400s) on immutable media means images are cached at edge for up to 24 hours. Cache invalidation is not needed because image URLs include the media_id and are immutable: a new upload creates a new URL.

Key Design Decisions

Presigned URL Upload (Direct-to-S3)

Choice

Client uploads directly to S3 via presigned URL instead of proxying through API

Rationale

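At 5K uploads/sec x 2MB avg = 10GB/sec of image data. Proxying through the API tier would require massive bandwidth and memory on the API servers. Presigned URLs let the client upload directly to S3, and the API server only pays the ~2ms cost of signing the URL. Direct-to-object-storage upload is a widely used production pattern for media-heavy services. The presigned URL includes an expiration (typically 15 minutes) and a size limit for security. With S3, the size limit is expressed as a content-length-range condition in a presigned POST policy, since a plain presigned PUT URL does not cap the object size by itself.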
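
One way to express the expiration, size limit, and key-prefix restriction is an S3 POST policy document. The sketch below only builds the policy as a Python dict using the standard library; signing it (SigV4) and returning the form fields to the client would normally be delegated to an AWS SDK and is not shown. The function name, bucket name, key prefix, and 10MB cap in the usage line are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_upload_policy(bucket: str, key_prefix: str, max_bytes: int,
                        ttl: timedelta = timedelta(minutes=15),
                        now: Optional[datetime] = None) -> dict:
    """Build an (unsigned) S3 POST policy limiting expiry, key prefix, and size."""
    now = now or datetime.now(timezone.utc)
    expires = now + ttl
    return {
        # ISO 8601 UTC timestamp after which the upload is rejected.
        "expiration": expires.strftime("%Y-%m-%dT%H:%M:%S.000Z"),
        "conditions": [
            {"bucket": bucket},
            # The object key must start with this prefix.
            ["starts-with", "$key", key_prefix],
            # Allowed object size in bytes, inclusive on both ends.
            ["content-length-range", 1, max_bytes],
        ],
    }


if __name__ == "__main__":
    policy = build_upload_policy("example-media-bucket", "uploads/user-123/",
                                 10 * 1024 * 1024)
    print(json.dumps(policy, indent=2))
```

Keeping the policy small is what keeps the signing response in the few-hundred-byte range mentioned in the text.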

Async Image Processing via Kafka

Choice

Kafka event triggers ImageWorker to create resized variants asynchronously

Rationale

Image resizing (thumbnail + medium + large) takes 200-500ms per image using libvips. Doing this synchronously in the upload path would push upload latency to 700ms+. Async via Kafka means the upload completes in ~100ms, and variants are ready within 5-10 seconds. The post may be visible before all variants are ready — the client shows a loading placeholder for the brief interval. Kafka provides at-least-once delivery guarantees for image processing jobs.

CDN for Image Delivery (CloudFront)

Choice

CloudFront CDN with S3 origin for global image delivery at 95% hit rate

Rationale

Instagram serves billions of images daily. Without CDN, all image traffic hits the S3 origin: at 500K feed reads/sec x 5 images per feed x 200KB per image = ~500GB/sec (about 4 Tbps). CDN edge caching absorbs 95%+ of this traffic, delivering images in under 100ms globally. CDN is the single most critical component for Instagram's read path. Image URLs are immutable (keyed by media_id), enabling very long TTLs without cache invalidation complexity.
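
The product of the three inputs can be checked directly. The sketch below uses decimal units (1KB = 1,000 bytes) and the 95% hit rate from the text; the function name `origin_load` is an assumption for this example.

```python
def origin_load(feed_reads_per_sec: int, images_per_feed: int,
                bytes_per_image: int, cdn_hit_rate: float) -> dict:
    """Estimate total image traffic and the share that reaches the origin."""
    total_bytes = feed_reads_per_sec * images_per_feed * bytes_per_image
    return {
        "total_gb_per_sec": total_bytes / 1e9,
        "total_tbps": total_bytes * 8 / 1e12,  # bytes -> bits
        "origin_gb_per_sec": total_bytes * (1 - cdn_hit_rate) / 1e9,
    }


if __name__ == "__main__":
    load = origin_load(500_000, 5, 200_000, 0.95)
    # Roughly 500 GB/sec in total (about 4 Tbps), of which about 25 GB/sec
    # would still reach the S3 origin at a 95% hit rate.
    print({name: round(value, 2) for name, value in load.items()})
```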

Redis Feed Cache (Fanout-on-Write)

Choice

Pre-computed per-user feeds in Redis, written on post creation

Rationale

Feed reads are the highest-volume operation (70% of traffic, 70K RPS peak). Pre-computing feeds in Redis (list of post IDs per user) enables O(1) feed reads via LRANGE. At 92% hit rate, only ~5.6K reads/sec hit the database. Without the cache, PostDB would need to handle 70K complex feed queries/sec involving JOIN operations.

Separate FeedReader and PostService

Choice

Independent services for reads and writes with different scaling characteristics

Rationale

Feed reads (70% of traffic) and post creation/uploads (15% of traffic) have very different access patterns and scaling needs. FeedReader is read-optimized (cache lookups), while PostService handles writes (DB inserts, S3 presigned URLs, Kafka events). Separating them allows independent scaling: 12 FeedReader pods vs 6 PostService pods. If PostService is degraded, feed reads continue unaffected.

Multiple Image Variants (Thumbnail, Medium, Large)

Choice

Three resized variants per uploaded image for bandwidth-appropriate delivery

Rationale

Mobile screens range from 320px to 4K. Serving a single 4K image to a 320px thumbnail wastes bandwidth (5MB vs 15KB). Three variants (150px thumbnail, 600px medium, 1080px large) let the client request the appropriate size based on viewport. CDN caches each variant independently. At 200KB average per variant, bandwidth savings are approximately 60% compared to serving only the original.

Scale & Performance

Target RPS

100K peak (70K feed reads + 15K post detail + 15K writes)

Latency (p99)

<200ms upload, <500ms feed (cache hit ~2ms), <100ms image (CDN)

Storage

~150TB/day growth (3 variants x 100M posts/day)

Availability

99.9% (multi-AZ, CDN edge redundancy)

Database Schema (HLD)

posts

Post records with CDN URLs for image variants. Written by PostService on post creation, updated by ImageWorker after async processing. Read by FeedReader on cache miss. Indexed by (author_id, created_at) for feed reconstruction.

post_id UUID PK
author_id UUID FK (indexed)
caption TEXT
media_urls JSONB (CDN URLs for thumbnail, medium, large)
like_count INTEGER DEFAULT 0
created_at TIMESTAMPTZ (indexed)

Indexes: PK on post_id, idx_posts_author_time ON (author_id, created_at DESC)

media_urls JSONB stores CDN URLs for all three variants: {thumbnail: 'https://cdn.../150.webp', medium: 'https://cdn.../600.webp', large: 'https://cdn.../1080.webp'}. ImageWorker updates this field after async processing. Until updated, the field contains placeholder URLs that 404 at CDN.

follows

Social graph edges. Read by PostService during fanout-on-write to enumerate followers. Read by FeedReader on cache miss to reconstruct the feed. Indexed on both follower_id and followee_id.

follow_id UUID PK
follower_id UUID (indexed)
followee_id UUID (indexed)
created_at TIMESTAMPTZ

Indexes: PK on follow_id, idx_follows_follower ON (follower_id), idx_follows_followee ON (followee_id), UNIQUE ON (follower_id, followee_id)

Read by PostService to enumerate followers for cache writes. At 1,200 posts/sec with avg 500 followers each, the followee_id index serves about 1,200 follower-list lookups/sec that together return roughly 600K follower rows/sec. Partitioned across 32 PostgreSQL shards.

feed_cache (Redis)

Pre-computed per-user feeds stored as Redis lists. Key: feed:{user_id}. Value: ordered list of post IDs with CDN URLs. Written by PostService on post creation (LPUSH to each follower's feed). Read by FeedReader on feed request (LRANGE).

KEY: feed:{user_id}
VALUE: List of {post_id, author_id, cdn_urls, created_at}
TTL: 300 seconds

Indexes: Sorted by insertion order (most recent first via LPUSH)

LTRIM keeps each feed list at max 500 entries. At 500M users with 20% DAU, approximately 100M feed keys are active. 92% hit rate. LRU eviction for inactive users. 6 Redis nodes, 18GB effective capacity.

Event Contracts

Upload Complete (topic: upload-complete)

Published by PostService after a client completes direct-to-S3 image upload via presigned URL. Consumed by ImageWorker to trigger async resizing into three variants (thumbnail 150px, medium 600px, large 1080px). Partitioned by media_id for per-image ordering. Expected 5K msg/sec at peak upload rate.

Key Schema

media_id (string) — partitioned by media_id for per-image ordering

Value Schema

{ "media_id": "string", "s3_key": "string", "file_type": "string", "post_id": "string | null" }
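
A possible in-code shape for this contract is sketched below: a dataclass mirroring the four fields, JSON serialization for the value, and a partition choice derived from the key. The class and function names are hypothetical. Note one deliberate simplification: Kafka's default partitioner hashes keys with murmur2, whereas this sketch uses CRC32 purely as a stable stand-in, so the partition numbers will not match a real cluster.

```python
import json
import zlib
from dataclasses import asdict, dataclass
from typing import Optional

NUM_PARTITIONS = 16  # ImageStream partition count from the text


@dataclass(frozen=True)
class UploadComplete:
    media_id: str
    s3_key: str
    file_type: str
    post_id: Optional[str] = None  # null until the post record exists

    def key(self) -> bytes:
        # The message key is the media_id, so one image maps to one partition.
        return self.media_id.encode("utf-8")

    def value(self) -> bytes:
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Assumption: CRC32 as an illustrative stable hash (Kafka itself uses murmur2).
    return zlib.crc32(key) % num_partitions
```

Because the partition depends only on the key, repeated events for the same media_id always land on the same partition, which is what gives per-image ordering.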

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
V0: Naive (API-Proxied Upload + No CDN)	T1	500-800ms upload, 200-500ms feed, 50-200ms image	~1K RPS total	$1,130/month	Low	99% (single DB)
V1: CDN + Presigned Upload (Async Processing)	T2	<200ms upload, <500ms feed (cache hit ~2ms), <100ms image (CDN)	100K RPS peak	$3,000/month	Medium	99.9% (multi-AZ)
V2: Async Media Pipeline (Hybrid Feed + Whale Detection)	T3	<150ms upload, <200ms feed, <50ms image (CDN)	1M RPS peak	$15,000/month	Very High	99.95% (multi-AZ, tiered CDN)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why presigned URLs instead of server-side upload proxying?

Presigned URLs eliminate the bandwidth bottleneck entirely. The API server generates a time-limited, size-limited URL (~2ms, ~500 bytes) that authorizes the client to upload directly to S3. The server never sees the image bytes. At 5K uploads/sec x 2MB = 10GB/sec, proxying would require 100+ API pods just for bandwidth. With presigned URLs, the same 6 pods handle 5K uploads/sec because they only process lightweight metadata requests. Security is maintained: the upload authorization expires after 15 minutes, enforces a maximum file size, and restricts the S3 key prefix. On S3, the size and key-prefix restrictions are expressed as conditions in a presigned POST policy (content-length-range and starts-with).
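
The bandwidth comparison is simple arithmetic and can be reproduced in a few lines. The sketch uses decimal units and the figures quoted in the answer above; the helper name is an assumption.

```python
def upload_bandwidth(uploads_per_sec: int, bytes_per_request: int) -> int:
    """Bytes per second the API tier must carry for a given payload size."""
    return uploads_per_sec * bytes_per_request


# Proxying: every 2MB image passes through the API tier.
proxied = upload_bandwidth(5_000, 2_000_000)
# Presigned: the API tier only returns a ~500-byte signed response.
presigned = upload_bandwidth(5_000, 500)

if __name__ == "__main__":
    print(f"proxied:   {proxied / 1e9:.1f} GB/sec")
    print(f"presigned: {presigned / 1e6:.1f} MB/sec")
    print(f"reduction: {proxied // presigned}x")
```

Spread over 100 pods, the proxied figure works out to about 100MB/sec per pod, which is the per-pod budget implied by the "100+ API pods" estimate.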

What happens during the 5-10 second window before image variants are ready?

The post is visible in feeds with CDN URLs that may 404 briefly until ImageWorker completes processing. The client handles this gracefully: a loading shimmer placeholder is shown for image slots that return 404. On retry (automatic, every 2 seconds), the CDN URL resolves once the variant is uploaded to S3. In practice, the 92nd percentile processing time is under 5 seconds, so most users never see the placeholder. This is the fundamental trade-off for not blocking uploads on image processing.
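
The client-side retry described here could look roughly like the sketch below. It is an assumption-laden illustration: `fetch_status` is a hypothetical callable that returns an HTTP status code for a URL, the attempt limit is invented, and the sleep function is injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_variant(fetch_status: Callable[[str], int], url: str,
                     interval_s: float = 2.0, max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the URL returns 200; return False if attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status(url)
        if status == 200:
            return True
        if status != 404:
            # Anything other than "not ready yet" is treated as a real error.
            raise RuntimeError(f"unexpected status {status} for {url}")
        if attempt < max_attempts - 1:
            # The client keeps showing the shimmer placeholder while waiting.
            sleep(interval_s)
    return False
```

A caller would render the placeholder first, then swap in the image once this returns True.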

How does the fanout-on-write model break down for celebrities?

A celebrity with 50M followers triggers 50M Redis LPUSH operations per post. At 1KB per entry and ~0.1ms per Redis write, the fanout takes approximately 5,000 seconds (83 minutes) if the writes are issued one after another, and it writes 50GB to Redis. During this fanout storm, FeedCache write throughput is consumed by one user's post, degrading feed freshness for everyone. This is the celebrity or 'whale' problem, and it is the motivation for hybrid fanout in the V2 variant, where celebrities' posts are pulled on read instead of pushed on write.
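
The fanout figures follow from two multiplications, shown below. The 5,000-second result assumes writes are issued strictly one at a time; the `parallelism` parameter is a hypothetical knob added to show how the duration would shrink with concurrent writers, and the function name is likewise an assumption.

```python
def fanout_cost(followers: int, entry_bytes: int, write_latency_us: int,
                parallelism: int = 1) -> tuple:
    """Return (seconds, gigabytes) for pushing one post to every follower."""
    seconds = followers * write_latency_us / 1e6 / parallelism
    gigabytes = followers * entry_bytes / 1e9
    return seconds, gigabytes


if __name__ == "__main__":
    # 50M followers, 1KB per entry, 0.1ms (100 microseconds) per write.
    seconds, gigabytes = fanout_cost(50_000_000, 1_000, 100)
    print(f"{seconds:.0f} s ({seconds / 60:.0f} min), {gigabytes:.0f} GB")
```

Even with substantial parallelism the data volume per post stays the same, which is why pulling celebrity posts at read time is attractive.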

Why Redis for feed cache instead of Memcached?

Redis supports list data structures (LPUSH, LRANGE, LTRIM) that map naturally to feed operations. A feed is a list of post IDs, and LRANGE returns a paginated slice in O(offset + count) time. Memcached only supports key-value pairs — representing a feed would require serializing/deserializing the entire list on every read and write. Redis also supports TTL-based expiry per key, cluster mode for horizontal scaling, and persistence for durability across restarts.

How does CloudFront CDN handle cache invalidation for images?

It does not need to. Image URLs are immutable: they contain the media_id, and a new upload creates a new URL (new media_id = new CDN key). Existing images never change once processed, so CDN TTL can be set very long (86400s or longer). This is a critical design advantage of immutable, ID-keyed URLs: cache invalidation, one of the two hard problems in computer science, is avoided almost entirely. The only scenario requiring invalidation is content moderation (NSFW removal), which uses the CloudFront invalidation API with batch processing.
