Industry-standard fanout-on-write architecture: CQRS-lite with API Gateway routing, Kafka for async fanout, Redis for pre-computed feeds, DynamoDB for durable storage. Feed reads are O(1) via Redis LRANGE at ~2ms latency.
The fanout-on-write (push model) architecture is the standard production approach used by social feed platforms handling millions of feed reads per second. It solves the fundamental problem that makes the naive fanout-on-read approach unworkable: the O(following_count) database queries per feed read that overwhelm the database at a few hundred concurrent users.
The key architectural shift is reversing where the computational cost lives. Instead of computing feeds at read time (O(N) queries per read, O(1) per write), feeds are pre-computed at write time: when a user posts a tweet, the system pushes the tweet ID into every follower's cached feed in Redis. This makes reads O(1) — a single Redis LRANGE returning 20 tweet IDs in ~2ms — at the cost of O(N) cache writes per tweet, where N is the author's follower count.
This cost reversal is a massive net win because social feeds have extremely asymmetric traffic: reads outnumber writes by 100:1 or more. Converting 1 expensive write into N cache writes that make 100+ reads free is one of the most powerful optimizations in distributed systems. The write cost is further amortized by making fanout asynchronous via Kafka — the user sees 'tweet posted' immediately while workers fan out in the background.
The architecture uses CQRS-lite: an API Gateway splits traffic by HTTP method. GET requests (feed reads, profile views) route to the read path — FeedReader service backed by Redis cache with DynamoDB fallback. POST requests (tweet posts, likes) route to the write path — TweetWriter service that persists to DynamoDB and publishes events to Kafka. FanoutWorkers consume from Kafka and complete the write-amplification loop by pushing tweet IDs into followers' cached feeds.
The primary limitation is the 'whale problem': a celebrity with 50 million followers triggers 50 million Redis writes per tweet. At 5 microseconds per write, that is 250 seconds of work — a 4-minute backlog per celebrity tweet. This pure push model does NOT solve the whale problem. The Hybrid Push+Pull variant addresses this by using fanout-on-read for high-follower accounts.
This architecture appears in virtually every system design interview for social media roles. Interviewers expect candidates to explain the read/write cost reversal, articulate the Kafka decoupling pattern, reason about cache consistency, and identify the whale problem as the motivation for the hybrid approach.
The fanout-on-write system uses 9 components organized into a CQRS-lite pipeline with separate read and write paths. The components are: ReadClient, WriteClient, API Gateway, Read Load Balancer, Write Load Balancer, FeedReader service, TweetWriter service, FeedCache (Redis), TweetStore (DynamoDB), FanoutStream (Kafka), and FanoutWorker.
All traffic enters through the API Gateway, which performs JWT authentication (~3ms), rate limiting (250K RPS cap), and CQRS routing. GET requests (feed reads, profile views — 76% of traffic) route to the Read Load Balancer, which distributes across 15 FeedReader pods. POST requests (tweet posts, likes — 24% of traffic) route to the Write Load Balancer, which distributes across 10 TweetWriter pods.
The read path is optimized for speed. FeedReader checks the user's pre-computed feed in FeedCache (Redis). At ~92% cache hit rate, most feeds are served from Redis in ~2ms via LRANGE on a bounded list of tweet IDs. On cache miss (new user, cold start, expired TTL), FeedReader falls back to TweetStore (DynamoDB) to reconstruct the feed from followed users' recent tweets and backfills the cache. The read path is the highest-throughput component: 15 pods handle 152K peak read RPS.
The write path handles persistence and fanout initiation. TweetWriter validates the tweet, persists to TweetStore (DynamoDB — durable write, ~50ms), and publishes a tweet_created event to FanoutStream (Kafka). The Kafka publish is fire-and-forget — the HTTP response returns as soon as the DB write succeeds. Likes update the like counter in DynamoDB but do NOT trigger fanout.
The fanout pipeline is fully asynchronous. FanoutWorkers consume tweet_created events from Kafka (64 partitions, partitioned by author_id for ordering). For each event, the worker looks up the author's follower list and writes the tweet ID into each follower's pre-computed feed in FeedCache via Redis LPUSH + LTRIM. At 200 average followers per tweet, each event generates ~200 cache writes. With 80 parallel workers and pipelined Redis writes, fanout completes within seconds for most users.
The architecture scales horizontally at each tier. FeedReader pods scale with read RPS. TweetWriter pods scale with write RPS. FanoutWorkers scale with fanout volume. FeedCache nodes scale with working set size. Each tier can be scaled independently without affecting the others — this is the core benefit of the CQRS split.
Choice
Pre-compute feeds at write time via async Kafka fanout
Rationale
Reverses the cost structure: reads become O(1) via Redis LRANGE (~2ms) at the cost of O(N) cache writes per tweet. Since reads outnumber writes by 100:1 in social feeds, this is a massive net win. The async Kafka pipeline means the user sees 'tweet posted' immediately while fanout happens in the background.
Choice
API Gateway routes GET to read path, POST to write path
Rationale
Twitter traffic is ~76% reads, 24% writes. Separating them enables independent scaling: 15 FeedReader pods for the read-heavy path vs 10 TweetWriter pods for writes. Each service is optimized for its access pattern. The shared API Gateway keeps the split transparent to clients — no separate URLs needed.
Choice
Each user's feed stored as a bounded list of tweet IDs in Redis
Rationale
Each feed is a list of the last 200 tweet IDs. Redis LPUSH on write, LRANGE on read — both O(1) for bounded lists. At ~92% hit rate, Redis absorbs the vast majority of feed reads. The bounded list (LTRIM to 200) prevents unbounded memory growth.
Choice
Tweet events published to Kafka, consumed by dedicated fanout workers
Rationale
If tweet POST waited for fanout to complete, a user with 100K followers would wait for 100K cache writes (~5 seconds). Kafka decouples the write acknowledgment from fanout processing. Partitioning by author_id ensures per-author ordering. Kafka also provides replay capability if workers crash.
Choice
Partition by author_user_id, sort by timestamp for range queries
Rationale
Tweets are immutable, accessed by ID or by author+time range. DynamoDB's partition-key model is ideal. 64 partitions with 3 replicas give both throughput and durability. Eventual consistency is fine since tweets are immutable.
Choice
All users fan out regardless of follower count
Rationale
This variant deliberately does NOT solve the whale problem to demonstrate it clearly. A celebrity with 50M followers triggers 50M Redis writes per tweet — a 4-minute fanout storm. The Hybrid variant solves this by classifying high-follower users as whales and storing their tweets separately.
Target RPS
200K+ feed reads/sec
Latency (p99)
<5ms feed reads (92% cache hit)
Storage
~10 TB/year (DynamoDB + Redis)
Availability
99.9% (replicated components)
Immutable tweet storage partitioned by author_user_id with tweet_id (Snowflake) as sort key. Supports efficient time-range queries per author for cache miss reconstruction. Write path: 20K PutItems/sec at peak. Read path: ~12K Query operations/sec from FeedReader cache misses.
Partition: author_id
64 partitions x 3 replicas. Eventual consistency (immutable tweets). ~60K ops/sec at peak across 64 partitions.
Pre-computed per-user timeline stored as a sorted list of the last 200 tweet IDs. Written by FanoutWorker via LPUSH + LTRIM on every new tweet from a followed user. Read by FeedReader via LRANGE for paginated timeline. 300s TTL with LRU eviction.
10M users x 200 tweet IDs x 8 bytes = ~16GB hot data. Working set ~20GB, capacity ~18.5GB, ~92% hit rate.
Emitted by TweetWriter on every new tweet. Partitioned by author_id across 64 partitions for per-author ordering. Consumed by FanoutWorker which pushes tweet IDs into followers' cached feeds. Expected throughput: ~20K events/sec at peak.
Key Schema
author_id: string (partition key for per-author ordering)
Value Schema
{ tweet_id: string (Snowflake ID), author_id: string, created_at: string (ISO timestamp) }
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Fanout-on-Read) | T1 | 150ms-3s feed reads | ~500 concurrent users | $500/month (single DB) | Low — no cache, no workers | 99% (single DB, no failover) |
| Fanout-on-Write (Push) | T2 | <5ms feed reads (Redis) | 200K+ feed reads/sec | $3,000/month (CQRS + Redis + Kafka) | Medium — Kafka, Redis, workers | 99.9% (replicated components) |
| Hybrid Push+Pull (Whale-Aware) | T3 | <10ms feed reads (dual-cache merge) | 1M+ feed reads/sec | $4,500/month (dual cache + workers) | High — whale detection, dual cache | 99.9% (replicated, whale-resilient) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
The whale problem occurs when a celebrity with millions of followers posts a tweet. In pure fanout-on-write, every follower's cached feed must be updated. A user with 50M followers triggers 50M Redis LPUSH operations. At 5 microseconds per write, that is 250 seconds of work — a 4-minute fanout backlog from a single tweet. This variant demonstrates the problem; the Hybrid variant solves it by using fanout-on-read for high-follower (whale) users.
At 152K peak read RPS and 92% cache hit rate, only ~12K requests/sec miss through to DynamoDB. Without the cache, all 152K would hit the database — requiring 10x more database capacity. The cache absorbs the vast majority of read load, making Redis the performance-critical component. Active users always have warm caches due to continuous fanout writes from followed users' tweets.
The API Gateway inspects the HTTP method on each request. GET requests matching /api/v1/* route to the Read Load Balancer (and on to FeedReader). POST requests matching /api/v1/* route to the Write Load Balancer (and on to TweetWriter). This is the simplest CQRS implementation — no separate URLs, no client changes. The gateway also handles JWT authentication and rate limiting as cross-cutting concerns.
The unfollow is recorded in the database immediately, but the unfollowed user's tweets remain in the follower's cached feed until TTL expiry (300 seconds) or until they are pushed out by newer tweets (LTRIM). There is no eager purge — the cache is eventually consistent. For most users this is imperceptible, but it means a recently unfollowed user's tweets may appear in the feed for up to 5 minutes.
At 200 average followers per tweet and 20K tweets/sec at peak, fanout generates 4M cache writes/sec. Each worker processes ~200 cache writes per event at 5ms per event (pipelined Redis LPUSH). That is 200 events/sec per worker. 4M writes / 200 writes per event = 20K events/sec total. 20K / 200 events/sec/worker = 100 workers needed at peak. 80 workers provide ~80% utilization at peak with Kafka consumer lag absorbing bursts.
Sign in to join the discussion.
Ready to design your own Twitter Feed?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator