Hard | 12 components | Interview: Very High

Instagram — Async Media Pipeline (Hybrid Feed + Whale Detection)

Full production Instagram architecture with async media processing pipeline (4 image variants + WebP optimization), hybrid push/pull feed model (fanout-on-write for normal users, pull-on-read for celebrities with 10K+ followers), ML-based feed ranking, Cassandra for linear-scale post storage, and tiered CDN with 97% hit rate. Handles 1M+ RPS at Instagram scale.

Cassandra | Kafka | Hybrid Fanout | ML Ranking | CDN | Instagram

Problem Statement

The async media pipeline approach to Instagram represents the full production architecture required at Instagram-scale: 1,200 photo uploads per second, 500K feed reads per second, ~50 PB of accumulated media storage, and global image delivery under 50ms. This variant solves three problems that the V1 CDN approach cannot handle: celebrity write amplification, chronological feed limitations, and PostgreSQL scaling ceilings.

The celebrity write amplification problem is the most critical. In the V1 approach, when a user creates a post, PostService writes the post_id to every follower's feed cache in Redis. For a normal user with 500 followers, this is 500 Redis writes — fast and manageable. But for a celebrity with 50 million followers, a single post triggers 50 million Redis LPUSH operations. At ~0.1ms per write, this takes approximately 5,000 seconds (83 minutes) and writes 50GB to Redis. During this fanout storm, FeedCache write throughput is consumed by one user's post, degrading feed freshness for all users on the platform. This is the 'whale problem' — named because a few large accounts create disproportionate system load.
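The fanout figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming writes are issued sequentially and that each feed entry is about 1 KB (an assumption implied by the 50GB figure, not stated directly); the function name is illustrative.

```python
def fanout_cost(followers: int, write_ms: float = 0.1, entry_bytes: int = 1_000):
    """Return (seconds, gigabytes) for one sequential fanout-on-write pass.

    write_ms comes from the ~0.1ms-per-write figure in the text;
    entry_bytes (~1 KB per feed entry) is an assumption.
    """
    seconds = followers * write_ms / 1_000
    gigabytes = followers * entry_bytes / 1e9
    return seconds, gigabytes


normal_seconds, _ = fanout_cost(500)
celeb_seconds, celeb_gb = fanout_cost(50_000_000)

print(f"normal user: {normal_seconds * 1000:.0f} ms")
print(f"celebrity:   {celeb_seconds:.0f} s (~{celeb_seconds / 60:.0f} min), {celeb_gb:.0f} GB")
```

The four-orders-of-magnitude gap between the two cases is what motivates treating large accounts differently.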

The hybrid fanout model solves this elegantly. FanoutWorker checks the author's follower count before fanning out. If the author has fewer than 10,000 followers (99.5% of users), normal fanout-on-write proceeds — the post_id is pushed to each follower's feed cache. If the author has 10,000 or more followers (the 'whale' threshold), fanout is skipped entirely. Instead, when a user reads their feed, FeedService pulls recent posts from whale accounts they follow by querying PostDB directly. The feed is a merge of two sources: pre-computed fanout results from Redis (normal users) and real-time whale pulls from Cassandra (celebrities). This hybrid approach caps fanout cost at 10,000 writes maximum while maintaining sub-200ms feed read latency.
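The push-or-skip decision described above can be sketched as follows. This is an illustrative in-memory version: a dict of lists stands in for the Redis feed cache, and `handle_post_created` and `followers_of` are hypothetical names, not part of the described system.

```python
from collections import defaultdict

WHALE_THRESHOLD = 10_000  # threshold given in the text


def handle_post_created(event, followers_of, feed_cache):
    """Fan a post out to follower feeds unless the author is a whale.

    event        -- dict with post_id, author_id, follower_count
    followers_of -- callable returning follower ids for an author (hypothetical)
    feed_cache   -- mapping follower_id -> newest-first list (stand-in for Redis)
    Returns the number of feed writes performed.
    """
    if event["follower_count"] >= WHALE_THRESHOLD:
        return 0  # whale: skip fanout, followers pull at read time
    writes = 0
    for follower_id in followers_of(event["author_id"]):
        feed_cache[follower_id].insert(0, event["post_id"])  # stand-in for LPUSH
        writes += 1
    return writes


followers = {"alice": ["bob", "carol"], "celebrity": []}
cache = defaultdict(list)
print(handle_post_created(
    {"post_id": "p1", "author_id": "alice", "follower_count": 2},
    lambda author: followers[author], cache))
print(handle_post_created(
    {"post_id": "p2", "author_id": "celebrity", "follower_count": 50_000_000},
    lambda author: followers[author], cache))
```

In this sketch the whale branch never touches the follower list at all, which is the point: the only cost for a whale post is the threshold comparison.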

Cassandra replaces PostgreSQL as the post storage engine. At 1,200 writes/sec with billions of accumulated posts, PostgreSQL encounters three scaling problems: B-tree index depth grows logarithmically with row count (degrading write latency), range queries on multi-TB tables require expensive sequential scans, and horizontal scaling requires application-level sharding. Cassandra provides linear-scale writes (add nodes for more throughput), native time-series partitioning (partition by author_id, cluster by created_at DESC for efficient range queries), and tunable consistency (LOCAL_QUORUM for writes, LOCAL_ONE for fast reads).

The media processing pipeline generates four image variants per upload: thumbnail (150px, ~10KB WebP), medium (600px, ~60KB WebP), large (1080px, ~200KB WebP), and XL (2048px, ~400KB WebP). WebP format provides 25-34% smaller file sizes than JPEG at equivalent quality, saving approximately $150K/month in CDN egress costs at Instagram scale. JPEG fallbacks are generated for older browsers. ImageWorker runs 32 parallel workers consuming from 64 Kafka partitions, processing each image in ~500ms.

ML-based feed ranking replaces chronological ordering. FeedService scores each candidate post using precomputed feature vectors stored in Redis: predicted like probability, comment probability, save probability, and content affinity scores. Posts are ranked by a weighted combination of these signals rather than by timestamp alone. This increases engagement metrics (likes, comments, time-on-app) by 15-30% compared to chronological feeds, which is why Instagram, Facebook, and Twitter moved from chronological to ranked feeds, and why TikTok launched with a ranked feed from the start.

Interviewers expect candidates to explain the whale problem and hybrid fanout solution, reason about Cassandra's advantages for time-series post data, discuss the trade-offs of ML ranking vs chronological ordering, and analyze the image variant pipeline (why four sizes, why WebP, why async).

Architecture Overview

The async media pipeline architecture uses twelve components organized into five layers: traffic ingestion (ReadClient, UploadClient, ApiGateway, ReadLB, UploadLB), application services (FeedService, UploadService), data stores (FeedCache/Redis, PostDB/Cassandra, ObjectStorage/S3), async pipeline (UploadStream/Kafka, ImageWorker, FanoutWorker), and content delivery (CloudFront CDN).

The write path handles 20% of total traffic through a dedicated UploadLB. UploadService generates presigned S3 URLs for direct client-to-S3 image upload (2ms), persists post records to PostDB (Cassandra, ~10ms write), and publishes events to UploadStream (Kafka). Two Kafka topics drive the async pipeline: upload-complete triggers ImageWorker to generate four image variants, and post-created triggers FanoutWorker to distribute the post to followers' feed caches. FanoutWorker implements whale detection: if the author has fewer than 10K followers, it writes the post_id to each follower's feed cache in Redis; if 10K+, it skips fanout. This hybrid model caps write amplification at 10K Redis writes per post regardless of celebrity status.

The read path handles 80% of total traffic through a dedicated ReadLB. FeedService implements the hybrid feed model: it reads pre-computed fanout results from FeedCache (Redis LRANGE, ~2ms) and queries PostDB (Cassandra) for recent posts from whale accounts the user follows (~5ms per whale query). The two result sets are merged by timestamp and then ranked using ML scoring. Precomputed feature vectors per user are stored in a separate Redis cache space (rank:{user_id}), enabling sub-millisecond scoring of candidate posts. The final ranked feed contains post metadata with CDN URLs; the client fetches images directly from CloudFront.
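The merge step on the read path can be sketched with a k-way merge of newest-first lists. This is a minimal sketch, assuming each post is a dict with a numeric `created_at` (for example epoch seconds) and that both the Redis list and each whale query result are already sorted newest first; `merge_feed` is an illustrative name.

```python
import heapq
from itertools import islice


def merge_feed(fanout_posts, whale_results, limit=50):
    """Merge newest-first post lists into one newest-first candidate list.

    fanout_posts  -- posts read from the feed cache, newest first
    whale_results -- list of per-whale query results, each newest first
    """
    merged = heapq.merge(
        fanout_posts,
        *whale_results,
        key=lambda post: post["created_at"],
        reverse=True,  # inputs are sorted descending, so merge descending
    )
    return list(islice(merged, limit))


fanout = [{"post_id": "a", "created_at": 30}, {"post_id": "b", "created_at": 10}]
whales = [[{"post_id": "w", "created_at": 20}]]
print([p["post_id"] for p in merge_feed(fanout, whales)])
```

Because the merge is lazy, `islice` stops after `limit` candidates without consuming the rest of the inputs; ranking would then run on this candidate list.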

FeedCache is a 12-node Redis cluster with 36GB effective capacity. Two cache spaces: user feeds (feed:{user_id}) for fanout results, and ranking features (rank:{user_id}) for ML model inputs. 94% hit rate on feed reads. LRU eviction with 600-second TTL.

PostDB is Amazon Keyspaces (Cassandra-compatible) with 64 partitions and 3 replicas. Posts are partitioned by author_id and clustered by created_at DESC — this data model directly supports the whale pull pattern: SELECT * FROM posts WHERE author_id = ? AND created_at > ? LIMIT 20. Cassandra's LSM-tree storage engine handles 1,200 writes/sec without the B-tree degradation that PostgreSQL would experience at billions of rows.

ImageWorker runs 32 parallel workers consuming from 64 Kafka partitions on the upload-complete topic. Each worker downloads the original image from S3, creates four variants (thumbnail, medium, large, XL) in WebP format with JPEG fallbacks, uploads all variants to S3, and updates PostDB with CDN URLs. Processing time is ~500ms per image. The 32 workers can sustain about 64 images/sec (32 workers / 0.5s per image). Kafka buffering absorbs short bursts above that rate, but sustained load at the 1,200 uploads/sec target stated in the problem statement would require proportionally more workers and partitions.
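As a back-of-the-envelope check on worker sizing, the sustained rate of a worker pool and the pool size needed for a given arrival rate follow from the per-image processing time. This is a rough sketch that assumes one image per worker at a time and ignores download and upload variance; the function names are illustrative.

```python
import math


def sustained_rate(workers: int, seconds_per_image: float) -> float:
    """Images per second a pool can sustain, one image per worker at a time."""
    return workers / seconds_per_image


def workers_needed(arrivals_per_sec: float, seconds_per_image: float) -> int:
    """Concurrent workers required to keep up with a sustained arrival rate."""
    return math.ceil(arrivals_per_sec * seconds_per_image)


print(sustained_rate(32, 0.5))     # pool described in the text
print(workers_needed(1_200, 0.5))  # upload rate from the problem statement
```

Since a Kafka consumer group assigns each partition to at most one consumer, the partition count also caps how many workers can consume in parallel.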

FanoutWorker runs 16 workers consuming from the post-created topic. For normal users (under 10K followers), it enumerates followers from PostDB and executes Redis LPUSH for each follower's feed key. Average fanout time: ~50ms (500 followers x 0.1ms per Redis write). For whales (10K+ followers), fanout is skipped — the only cost is the follower_count check.

CloudFront CDN uses tiered caching: edge locations (200+ worldwide, ~5ms), regional edge caches (13 locations, ~15ms), and S3 origin (~50ms). Tiered caching achieves 97% overall hit rate — popular images stay hot at edge, less popular images are served from regional cache. Content negotiation at edge serves WebP to supported browsers, JPEG to others.

Request Flow — Hybrid Feed with Whale Detection

This sequence diagram traces the two primary flows: post creation with hybrid fanout, and feed reading with whale merge and ML ranking. The critical insight is the whale detection in FanoutWorker — a simple follower_count check determines whether to push (normal) or skip (whale). On the read side, FeedService merges two data sources: pre-computed fanout from Redis and real-time whale pulls from Cassandra.

The second insight is the ML ranking step. After merging fanout and whale results, FeedService scores each candidate post using precomputed feature vectors from Redis. This transforms the feed from a chronological list into a personalized, engagement-optimized ranking — the same approach used by Instagram, Facebook, and TikTok.

Step-by-Step Walkthrough

1. UploadService persists post to Cassandra and publishes post-created to Kafka. Response returns immediately; fanout is async.
2. FanoutWorker checks follower_count: under 10K triggers fanout-on-write (LPUSH to each follower's Redis feed); 10K+ skips fanout entirely (whale detection).
3. FeedService reads fanout results from Redis LRANGE (~2ms cache hit). This contains posts from normal users only.
4. FeedService queries Cassandra for recent posts from whale accounts the user follows (~5ms per whale query). Merges with fanout results by timestamp.
5. FeedService loads ML feature vector from Redis and scores each candidate post by predicted engagement. Returns ranked feed with CDN URLs.
6. Client fetches images from CloudFront CDN. 97% hit rate means most images are served from edge in ~5ms.

Key Design Decisions

Hybrid Fanout (Push for Normal, Pull for Whales)

Choice

Fanout-on-write for users under 10K followers, pull-on-read for 10K+

Rationale

Pure fanout-on-write creates O(F) write amplification where F = follower count. A celebrity with 50M followers triggers 50M Redis writes per post — taking 83 minutes and writing 50GB. Hybrid fanout caps cost at 10K writes maximum. FeedService merges pushed content (from Redis) with pulled whale content (from Cassandra) at read time, adding only ~5ms per whale query. Instagram and Twitter both use this hybrid approach in production to balance feed freshness with system stability.

Cassandra for Post Storage

Choice

Amazon Keyspaces (Cassandra-compatible) instead of PostgreSQL

Rationale

At billions of posts, PostgreSQL B-tree indexes degrade: index depth grows logarithmically, range queries require multi-page scans, and horizontal sharding requires application-level routing. Cassandra provides O(1) writes via LSM-tree (append-only), native time-series partitioning (partition by author_id, cluster by created_at DESC), and linear horizontal scaling (add nodes for throughput). Tunable consistency: LOCAL_QUORUM for writes, LOCAL_ONE for reads.

Four Image Variants with WebP

Choice

Thumbnail (150px), Medium (600px), Large (1080px), XL (2048px) in WebP + JPEG

Rationale

WebP achieves 25-34% smaller files than JPEG at equivalent quality. At 500K feed reads/sec x 5 images x 80KB average = 200GB/sec CDN egress. 30% reduction saves 60GB/sec bandwidth (~$150K/month in CDN costs). The XL variant serves high-DPI displays without upscaling artifacts. JPEG fallback maintained for Safari pre-14 and IE11 via Accept header negotiation at CDN edge.
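The egress estimate above is straightforward to reproduce. This is a small sketch using decimal units (1 GB = 1,000,000 KB) and the 30% savings figure from the text; the function name is illustrative.

```python
def cdn_egress_gb_per_sec(feed_reads_per_sec: int, images_per_feed: int, avg_image_kb: float) -> float:
    """Image egress in GB/s, using decimal units (1 GB = 1,000,000 KB)."""
    return feed_reads_per_sec * images_per_feed * avg_image_kb / 1_000_000


baseline = cdn_egress_gb_per_sec(500_000, 5, 80)
saved = baseline * 0.30  # 30% WebP saving, from the 25-34% range in the text

print(f"baseline egress: {baseline:.0f} GB/s, saved with WebP: {saved:.0f} GB/s")
```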

ML-Based Feed Ranking

Choice

Predicted engagement scoring instead of chronological ordering

Rationale

Chronological feeds are dominated by high-frequency posters regardless of content quality. ML ranking scores each post by predicted engagement: P(like), P(comment), P(save), and content affinity. Users see content they are most likely to engage with, increasing time-on-app by 15-30%. Feature vectors are precomputed offline and cached in Redis for sub-millisecond scoring. A/B testing via ranking_model parameter enables gradual rollout of model updates.
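The scoring step can be sketched as a dot product between the user's cached feature vector and each candidate's feature vector, followed by a sort. This is a simplified illustration: the actual model, weights, and feature layout are not specified in this document, and the names below are hypothetical.

```python
def score(user_vec, post_vec):
    """Dot product of a user feature vector and a post feature vector."""
    return sum(u * p for u, p in zip(user_vec, post_vec))


def rank_candidates(user_vec, candidates):
    """Return candidates sorted by descending score.

    candidates -- list of dicts with a 'features' list the same length as user_vec
    """
    return sorted(candidates, key=lambda c: score(user_vec, c["features"]), reverse=True)


user = [1.0, 0.0, 2.0]  # toy 3-dim vector; the text describes 40 dims
posts = [
    {"post_id": "a", "features": [0.0, 1.0, 0.0]},
    {"post_id": "b", "features": [1.0, 0.0, 1.0]},
    {"post_id": "c", "features": [0.5, 0.0, 0.0]},
]
print([p["post_id"] for p in rank_candidates(user, posts)])
```

With 40 dimensions and a few hundred candidates the arithmetic is negligible, so the cache lookup for the user vector is the cost worth optimizing.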

Separate Read and Write Load Balancers

Choice

Dedicated ReadLB and UploadLB with independent scaling

Rationale

Read traffic (80%) and write traffic (20%) have fundamentally different characteristics. Reads are small metadata responses from cache (~2ms). Writes involve presigned URL generation, Cassandra inserts, and Kafka event publishing (~80ms). Separate LBs enable independent circuit breaking: if the upload path degrades, feed reads continue unaffected. This CQRS-at-the-infrastructure-level pattern is standard at Instagram scale.

Tiered CDN Caching

Choice

Edge -> Regional -> Origin three-tier cache hierarchy

Rationale

Standard CDN edge-only caching achieves ~85% hit rate. Adding regional edge caches (13 locations between edge and origin) catches requests that miss at the edge but are popular within a geographic region, boosting overall hit rate to 97%. The 12-percentage-point improvement cuts origin traffic from 15% to 3% of requests: at 500K image reads/sec, that is 60K fewer origin requests per second (75K down to 15K), reducing S3 cost and latency for cache-miss requests.
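The origin-load effect of the hit-rate change can be checked directly. This is a minimal sketch using the hit rates and read rate quoted above; the function name is illustrative.

```python
def origin_requests_per_sec(image_reads_per_sec: float, hit_rate: float) -> float:
    """Requests per second that miss every cache tier and reach the origin."""
    return image_reads_per_sec * (1.0 - hit_rate)


edge_only = origin_requests_per_sec(500_000, 0.85)
tiered = origin_requests_per_sec(500_000, 0.97)

print(f"edge only: {edge_only:,.0f}/s, tiered: {tiered:,.0f}/s, "
      f"reduction: {edge_only - tiered:,.0f}/s")
```

Expressed relative to the edge-only baseline, origin traffic falls by 80%, which is a more useful way to read the 12-point hit-rate gain.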

Scale & Performance

Target RPS

1M peak (650K feed reads + 150K post detail + 200K writes)

Latency (p99)

<150ms upload, <200ms feed (hybrid merge), <50ms image (CDN)

Storage

~50 PB accumulated (4 variants x WebP + JPEG per image)

Availability

99.95% (multi-AZ, tiered CDN, Cassandra RF=3)

Time & Space Complexity

Operation	Time	Space	Notes
Feed read (hybrid merge)	O(W) where W = number of whale accounts followed	O(N + W*20) where N = fanout cache entries	Redis LRANGE is O(offset + count) for the fanout portion. Each whale pull is a Cassandra query (~5ms). Merge sort of two sorted lists is O(N + M). Typical user follows 0-3 whales, so total added latency is 0-15ms.
Fanout-on-write (normal user post)	O(F) where F = follower count (max 10K)	O(F) — one Redis LPUSH per follower	Capped at 10K writes. At 0.1ms per Redis write, max fanout time is ~1 second. Average: 500 followers x 0.1ms = 50ms. Async via Kafka — does not block post creation.
Image processing (4 variants)	O(1) per image, ~500ms wall-clock	O(image_size x 4) — four variants in memory during processing	CPU-bound on libvips resizing. Each variant is independent — could be parallelized across threads for ~200ms total, but sequential is simpler and 500ms is well within the 15-second SLO.
ML ranking (feed scoring)	O(C) where C = candidate posts (typically 50-200)	O(C) — score array	Dot product of 40-dim feature vector with each candidate's feature vector. At 200 candidates: 200 x 40 multiplications = 8,000 FLOPs — sub-microsecond on modern CPU. The dominant cost is Redis lookup for the feature vector (~0.5ms), not the computation.

Database Schema (HLD)

posts (Cassandra)

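Post records partitioned by author_id with clustering on created_at DESC. Supports efficient whale pull queries: SELECT * FROM posts WHERE author_id = ? AND created_at > ? LIMIT 20. Written by UploadService, updated by ImageWorker (media_urls). like_count and comment_count are counters; because Cassandra does not allow counter columns and regular non-key columns in the same table, they would have to live in a companion counter table with the same primary key rather than in posts itself.

post_id UUID
author_id UUID (PARTITION KEY)
created_at TIMESTAMP (CLUSTERING KEY DESC)
caption TEXT
media_urls MAP<TEXT, TEXT> (variant -> CDN URL)
like_count COUNTER (companion counter table)
comment_count COUNTER (companion counter table)
location TEXT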

Indexes: PRIMARY KEY (author_id, created_at) WITH CLUSTERING ORDER BY (created_at DESC)

Cassandra partition key = author_id groups all posts by author for efficient range scans. Clustering by created_at DESC means the most recent posts are read first (no sorting needed). At 1,200 writes/sec, Cassandra's LSM-tree handles the throughput without B-tree degradation. Compaction strategy: TimeWindowCompactionStrategy for time-series data.

feed_cache (Redis Cluster)

Pre-computed per-user feeds from fanout-on-write (normal users only). Key: feed:{user_id}. Written by FanoutWorker, read by FeedService. LRU eviction, 600s TTL. 12-node cluster, 36GB effective capacity.

KEY: feed:{user_id}
VALUE: List of {post_id, author_id, cdn_urls, created_at}
TTL: 600 seconds
MAX LENGTH: 500 entries (LTRIM after LPUSH)

Indexes: Sorted by insertion order (LPUSH = most recent first)

Only contains posts from non-whale accounts (under 10K followers). Whale posts are merged at read time by FeedService querying Cassandra. 94% hit rate for active users.
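The capped, newest-first list behaviour can be illustrated in memory. This sketch uses a plain Python list as a stand-in for the Redis list; in Redis the same effect comes from LPUSH followed by LTRIM with a range of 0 to 499. The helper name is hypothetical.

```python
MAX_FEED_LEN = 500  # matches the 500-entry cap described above


def lpush_capped(feed: list, entry, max_len: int = MAX_FEED_LEN) -> int:
    """Prepend an entry and drop anything beyond max_len; returns the new length.

    In-memory stand-in for Redis LPUSH followed by LTRIM key 0 (max_len - 1).
    """
    feed.insert(0, entry)  # newest entry goes to the head
    del feed[max_len:]     # trim the oldest entries from the tail
    return len(feed)


feed = []
for post_number in range(600):
    lpush_capped(feed, post_number)

print(len(feed), feed[0], feed[-1])
```

Trimming on every write keeps per-user memory bounded regardless of how many accounts a user follows.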

ranking_features (Redis Cluster)

ML ranking feature vectors per user. Precomputed by offline ML pipeline. Read by FeedService during feed ranking to score candidate posts.

KEY: rank:{user_id}
VALUE: Hash of {feature_vector: float[], model_version: string}
TTL: 3600 seconds

Feature vector includes: content category affinity (20 dims), engagement history (10 dims), recency weights (5 dims), social proximity scores (5 dims). Total: 40 floating-point values per user. Updated by offline pipeline every 6 hours.

users (Cassandra)

User profiles with follower count for whale detection. Read by FanoutWorker to check is_whale flag. Read by FeedService to enrich feed with author metadata.

user_id UUID PK
username TEXT
follower_count INT
is_whale BOOLEAN (follower_count >= 10,000)
created_at TIMESTAMP

Indexes: PRIMARY KEY (user_id)

is_whale flag updated on follow/unfollow when follower_count crosses the 10K threshold. Cached in Redis with 1-hour TTL to avoid per-request Cassandra lookups.
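The threshold-crossing update can be sketched as follows. This is an illustrative in-memory version: the user record is a dict, the cache is a dict, and the cache key format `is_whale:{user_id}` is a hypothetical name chosen for the example.

```python
WHALE_THRESHOLD = 10_000  # threshold given in the text


def apply_follower_delta(user: dict, delta: int, cache: dict) -> bool:
    """Apply a follow (+1) or unfollow (-1) and flip is_whale on a crossing.

    Returns True if the flag changed, in which case the cached flag is dropped
    so the next reader reloads it. The cache key format is hypothetical.
    """
    user["follower_count"] += delta
    is_whale_now = user["follower_count"] >= WHALE_THRESHOLD
    if is_whale_now == user["is_whale"]:
        return False
    user["is_whale"] = is_whale_now
    cache.pop(f"is_whale:{user['user_id']}", None)  # invalidate cached flag
    return True


user = {"user_id": "u1", "follower_count": 9_999, "is_whale": False}
cache = {"is_whale:u1": False}
print(apply_follower_delta(user, +1, cache), user["is_whale"], cache)
```

Invalidating only on a crossing keeps follow and unfollow traffic for accounts far from the threshold from touching the cache at all.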

Event Contracts

Upload Complete (topic: upload-complete)

Published by UploadService after a client completes direct-to-S3 image upload. Consumed by ImageWorker to trigger async resizing into four variants (thumbnail 150px, medium 600px, large 1080px, XL 2048px) in WebP + JPEG. 64 Kafka partitions for 32 workers. Expected 50K msg/sec at peak.

Key Schema

media_id (string) — partitioned by media_id for per-image ordering

Value Schema

{ "media_id": "string", "s3_key": "string", "file_type": "string", "post_id": "string | null" }

Post Created (topic: post-created)

Published by UploadService after post creation. Consumed by FanoutWorker to implement hybrid fanout: if follower_count < 10K, fanout-on-write pushes post_id to each follower's Redis feed cache. If >= 10K (whale), fanout is skipped — FeedService pulls whale posts on read. 64 partitions keyed by author_id.

Key Schema

author_id (string) — partitioned by author_id for per-author ordering

Value Schema

{ "post_id": "string", "author_id": "string", "follower_count": "integer", "created_at": "string" }
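A consumer-side view of the post-created contract might look like the sketch below. The dataclass mirrors the value schema above; `partition_for` uses CRC32 purely as a stand-in to show that keying by author_id sends all of an author's events to one partition, and it is not Kafka's actual default partitioner.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PostCreated:
    post_id: str
    author_id: str
    follower_count: int
    created_at: str


def parse_post_created(raw: bytes) -> PostCreated:
    """Decode a post-created message value into a typed event."""
    data = json.loads(raw)
    return PostCreated(
        post_id=data["post_id"],
        author_id=data["author_id"],
        follower_count=int(data["follower_count"]),
        created_at=data["created_at"],
    )


def partition_for(author_id: str, num_partitions: int = 64) -> int:
    """Illustrative key-to-partition mapping (CRC32 stand-in, not Kafka's hash)."""
    return zlib.crc32(author_id.encode("utf-8")) % num_partitions


raw = b'{"post_id": "p1", "author_id": "a1", "follower_count": 500, "created_at": "2024-01-01T00:00:00Z"}'
event = parse_post_created(raw)
print(event.author_id, event.follower_count, 0 <= partition_for(event.author_id) < 64)
```

Keying by author_id gives per-author ordering, at the cost that a very active author concentrates events on a single partition.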

What-If Scenarios

Celebrity with 50M followers posts a photo

Impact

In V1 (pure fanout): 50M Redis writes, 83 minutes, 50GB written. In V2 (hybrid): zero Redis writes, because the post is stored in Cassandra only. When the celebrity's 50M followers read their feeds, each triggers a Cassandra query: SELECT * FROM posts WHERE author_id = ? AND created_at > ? LIMIT 20. Each query takes ~5ms, but because posts are partitioned by author_id, all of these reads land on the single partition that holds the celebrity's posts (and its 3 replicas) rather than spreading across the 64 partitions, so read load concentrates instead of distributing. The post appears in all followers' feeds within one feed refresh cycle.

Mitigation

Hybrid fanout is the mitigation. The V2 architecture handles this scenario by design. Additional optimization: pre-cache whale posts in a dedicated Redis set (whale_posts:{author_id}) to avoid hitting Cassandra for every feed read that includes a whale.

Kafka UploadStream goes down for 5 minutes

Impact

Image processing stops: no new variants are generated. New posts are persisted in Cassandra (the write path to PostDB is not affected) but CDN URLs point to nonexistent variants (404 at CDN). Fanout stops: new posts from normal users are not pushed to followers' feed caches. Feeds become stale for 5 minutes. On Kafka recovery, consumers resume from last committed offset — all buffered events are processed, and system catches up within minutes.

Mitigation

Kafka multi-AZ deployment with min.insync.replicas=2 for durability. UploadService implements local event buffer with retry on Kafka publish failure. FeedService detects high consumer lag and temporarily switches to pull-only mode for all users.

Redis FeedCache cluster loses a node

Impact

If the failed node is a primary with no healthy replica, 1/12th of feed cache data is lost (one shard). Feed reads for affected users fall back to Cassandra (pull model), increasing PostDB read load by ~8%. Feed latency for affected users increases from ~2ms to ~20ms (Cassandra query instead of Redis LRANGE). Redis Cluster does not redistribute hash slots to surviving nodes on its own; the shard becomes available again when a replica is promoted or the slots are resharded.

Mitigation

Redis Cluster with automatic failover. Replica nodes promote to primary within 15 seconds. FeedService implements graceful degradation: on Redis connection error, fall back to Cassandra pull. Auto-scaling adds a replacement node within 5 minutes.

Failure Modes & Resilience

Component	Failure	Impact	Mitigation
FeedCache (Redis Cluster)	Node failure or memory exhaustion	Feed reads for affected shard fall back to Cassandra. Latency increases from 2ms to 20ms for 1/12th of users. FanoutWorker writes to the failed shard are buffered in Kafka until Redis recovers.	Redis Cluster with replica per shard. Automatic failover in 15 seconds. Memory alerts at 80% utilization. LRU eviction prevents OOM. Graceful degradation to Cassandra pull on Redis error.
PostDB (Cassandra)	Partition hotspot from viral author receiving millions of writes	All writes to the hotspot partition queue. Other partitions are unaffected. Like counter updates for viral posts may experience elevated latency (50ms instead of 10ms). Feed reads for posts on the hot partition see stale data.	Cassandra's consistent hashing distributes partitions across nodes. Counter columns handle concurrent increments via CRDT merge. If a single partition is too hot, virtual node rebalancing distributes the load. For extremely viral posts, a write buffer in Redis can absorb burst writes and batch them to Cassandra.
ImageWorker	All 32 workers crash simultaneously	Image processing stops. Kafka buffers upload-complete events (5M message queue limit). New posts appear in feeds with placeholder images (CDN URLs that 404). No data loss — events are durable in Kafka. On worker recovery, all buffered images are processed.	Auto-scaling replaces crashed workers within 2 minutes. Kafka retention set to 24 hours — workers can catch up after extended outages. Health check alerts trigger on worker count < 16.
FanoutWorker	Consumer lag exceeds 60 seconds	Feeds become stale — new posts from normal users do not appear in followers' cached feeds. Posts are still persisted in Cassandra, so no data loss. Users refreshing their feed see content up to 60 seconds old.	Auto-scaling adds workers when lag exceeds 30 seconds. FeedService detects high lag and switches to pull-only mode temporarily. Kafka consumer group rebalancing distributes load across available workers.

Scaling Strategy

Horizontal scaling for all components. FeedService auto-scales based on CPU (> 60% for 3 minutes). UploadService auto-scales based on request queue depth. ImageWorker scales based on Kafka consumer lag (> 30 seconds). FanoutWorker scales based on Kafka consumer lag. Cassandra scales by adding nodes (online, no downtime). Redis scales by adding shards (online resharding). CDN scales automatically. The architecture is designed for 10x growth without architectural changes — from 1M to 10M RPS by adding compute and storage nodes.

Monitoring & Alerting

Key metrics: (1) Feed latency (p50, p99) with breakdown: Redis cache lookup, whale Cassandra pull, ML ranking, total. Alert at p99 > 300ms. (2) Fanout consumer lag (seconds behind) — alert at 30s, critical at 60s. (3) Image processing throughput (images/sec) and lag — alert if processing rate drops below upload rate for 5 minutes. (4) CDN hit rate — alert if drops below 90% (indicates cache configuration issue). (5) Cassandra write latency per partition — alert at p99 > 50ms (hotspot detection). (6) Redis memory utilization per node — alert at 80%, critical at 90%. (7) Whale detection accuracy — monitor is_whale flag changes per day. Dashboard: Grafana with panels for feed latency histogram, fanout lag, image processing pipeline throughput, CDN hit rate breakdown (edge vs regional vs origin), Cassandra partition heat map, and Redis cluster memory utilization.

Cost Analysis

At Instagram scale (1M RPS peak): Cassandra/Keyspaces (~$5,000/month for 64 partitions RF=3), Redis Cluster 12 nodes cache.r7g.4xlarge (~$4,000/month), MSK Kafka kafka.m7g.xlarge (~$1,500/month), ECS Fargate services + workers (~$3,000/month), CloudFront CDN (~$8,000/month for ~200TB/day egress), S3 storage (~$20,000/month for 50PB with lifecycle). Total: ~$41,500/month. The per-user cost at 500M MAU is ~$0.000083/user/month (about $0.08 per 1,000 users). Compared to V1 at $3,000/month for 100K RPS, V2 handles 10x the throughput at roughly 14x the cost. Cost per unit of peak throughput therefore rises from about $0.03 to about $0.04 per RPS per month, but the system handles fundamentally different scale requirements (whales, ML ranking, global tiered CDN).
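The totals and ratios in this section can be recomputed from the line items. This sketch uses only the figures quoted above and makes no claim about real cloud pricing.

```python
# Monthly line items in USD, as quoted in the cost analysis above.
line_items = {
    "cassandra_keyspaces": 5_000,
    "redis_cluster": 4_000,
    "msk_kafka": 1_500,
    "ecs_fargate": 3_000,
    "cloudfront_cdn": 8_000,
    "s3_storage": 20_000,
}

total = sum(line_items.values())
per_user = total / 500_000_000   # 500M MAU
v1_per_rps = 3_000 / 100_000     # V1: $3,000/month at 100K RPS
v2_per_rps = total / 1_000_000   # V2: total at 1M RPS

print(f"total: ${total:,}/month")
print(f"per user: ${per_user:.6f}/month")
print(f"per RPS per month: V1 ${v1_per_rps:.3f}, V2 ${v2_per_rps:.4f}")
print(f"cost ratio V2/V1: {total / 3_000:.1f}x")
```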

Security Considerations

Presigned URL security: URLs expire after 15 minutes, enforce maximum file size (20MB), and restrict S3 key prefix to user-specific paths (preventing path traversal). Content moderation: ImageWorker includes NSFW detection (Amazon Rekognition) during variant generation — flagged images are quarantined before CDN distribution. JWT authentication with 3ms overhead per request. Rate limiting: per-user upload cap (100 uploads/hour), per-user interaction cap (1,000 likes/hour). CDN signed URLs for premium content. Cassandra encryption at rest (AWS KMS) and in transit (TLS). Redis AUTH with TLS for cache access.

Deployment Strategy

Blue-green deployment for FeedService and UploadService via ECS service updates. ImageWorker and FanoutWorker use rolling deployment (one worker at a time while Kafka rebalances). Cassandra schema changes use online DDL (ALTER TABLE ADD column). Redis Cluster supports online resharding for capacity changes. CDN configuration changes propagate to all edge locations within 15 minutes. ML ranking model updates via A/B testing: new model deployed to 5% of traffic, gradually increased to 100% over 48 hours if engagement metrics improve.

Real-World Examples

• Instagram's production architecture uses a similar hybrid fanout model with approximately 15K follower threshold for whale detection, along with Cassandra for post storage.
• Twitter's fanout service uses a similar hybrid approach: fanout-on-write for users under ~1M followers, pull-on-read for mega-accounts like celebrities and brands.
• Pinterest uses a Kafka-based image processing pipeline to generate multiple pin variants (thumbnails, closeups, original) with WebP optimization for bandwidth savings.
• TikTok uses tiered CDN caching with regional edge caches to achieve sub-50ms video thumbnail delivery globally.

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
V0: Naive (API-Proxied Upload + No CDN)	T1	500-800ms upload, 200-500ms feed, 50-200ms image	~1K RPS total	$1,130/month	Low	99% (single DB)
V1: CDN + Presigned Upload (Async Processing)	T2	<200ms upload, <500ms feed (cache hit ~2ms), <100ms image (CDN)	100K RPS peak	$3,000/month	Medium	99.9% (multi-AZ)
V2: Async Media Pipeline (Hybrid Feed + Whale Detection)	T3	<150ms upload, <200ms feed, <50ms image (CDN)	1M RPS peak	~$41,500/month	Very High	99.95% (multi-AZ, tiered CDN)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

How does whale detection work in practice?

FanoutWorker receives a post-created event from Kafka containing the author_id and follower_count. If follower_count < 10,000, normal fanout proceeds. If >= 10,000, fanout is skipped. The is_whale flag is stored in the users table and cached in Redis with a 1-hour TTL. When a user crosses 10K followers, the is_whale flag is updated in PostDB and the cache is invalidated. The transition is not instant: a user at 9,999 followers gets fanout; at 10,000 they switch to pull. This creates a brief inconsistency window where some followers receive fanout and new followers do not, but this resolves within one cache TTL cycle.

Why Cassandra instead of DynamoDB for post storage?

Both provide linear-scale writes and partition-based data distribution. Cassandra was chosen for: (1) native clustering column support: posts are clustered by created_at DESC within each author_id partition, enabling efficient range queries for whale pulls without a secondary index, (2) tunable consistency: LOCAL_ONE reads for feed (fast, eventual) vs LOCAL_QUORUM writes for durability, and (3) open protocol: CQL and its drivers are portable across Cassandra-compatible backends, which limits AWS lock-in. DynamoDB would work similarly: a composite primary key (author_id as partition key, created_at as sort key) supports the same time-range query without a GSI, but it offers only eventually consistent or strongly consistent reads and, in this design's assessment, less predictable pricing at high throughput.

How does ML feed ranking handle cold-start for new users?

New users have no engagement history, so feature vectors are empty. The ranking model falls back to chronological ordering (timestamp-based scoring) for the first 50 interactions. After 50 likes/comments/saves, the offline ML pipeline generates an initial feature vector based on content category preferences. The model improves with each subsequent interaction. Cold-start latency is approximately 2-3 days for a typical new user before personalized ranking takes effect.

What happens when FanoutWorker falls behind (Kafka consumer lag)?

If FanoutWorker consumer lag exceeds 30 seconds, feeds become stale — new posts from normal users do not appear in followers' cached feeds until the lag is consumed. Posts are still persisted in PostDB (Cassandra), so they are not lost. FeedService can detect high lag and temporarily switch all feeds to pull mode (query PostDB directly), trading higher read latency for feed freshness. Auto-scaling triggers add more FanoutWorker instances when lag exceeds a threshold.

How does the system handle viral posts that receive millions of likes simultaneously?

Cassandra counter columns handle like_count increments with eventual consistency; concurrent increments are merged without application-level locking. The design assumes a single post's counter can absorb on the order of 100K likes/sec, but every increment for a post targets the same partition, so that rate should be validated under load. The like event is also published to Kafka for real-time analytics (trending detection). FeedService reads like_count from PostDB on each feed request, so the displayed count may briefly lag the true count, because LOCAL_ONE reads can reach a replica that has not yet applied the latest increments. This is acceptable: users do not notice small count differences on posts with millions of likes.
