Production-grade hybrid fanout: push for normal users (<100K followers), pull for whales (>=100K followers). Solves the celebrity fanout storm problem while preserving O(1) feed reads via dual-cache merge.
The hybrid push+pull architecture is the production-grade solution to the Twitter feed problem that solves the 'whale problem' — the fundamental limitation of pure fanout-on-write. When a celebrity with 50 million followers posts a tweet, pure fanout-on-write triggers 50 million Redis cache writes, creating a minutes-long backlog that delays feed updates for all users. The hybrid approach eliminates this by treating high-follower users differently from regular users.
The key insight is that follower count follows a power-law distribution. The vast majority of users (99%+) have fewer than 100,000 followers, and for these users, fanout-on-write works perfectly — the fanout cost is bounded and completes in seconds. Only a tiny fraction of users (~0.1%) are 'whales' with 100,000+ followers, but these whales would generate 90%+ of the fanout work in a pure push model. The hybrid approach handles each group with the strategy that works best for them.
Normal users (< 100K followers) use fanout-on-write: their tweets are pushed to followers' cached feeds via Kafka workers, exactly as in the standard approach. Feed reads for followers of only normal users are a single Redis LRANGE — O(1) at ~2ms. Whale users (>= 100K followers) skip fanout entirely: their tweets are written to a dedicated WhaleCache (Redis). At feed read time, the FeedReader merges the user's pre-computed feed (FeedCache, containing non-whale tweets) with recent tweets from whales the user follows (WhaleCache). This merge adds ~5ms to the read path but eliminates the catastrophic write amplification.
The numbers make the case compelling. A whale with 50M followers posting a tweet: in pure push, 50M Redis writes taking 250 seconds; in hybrid, 1 WhaleCache write taking 2ms. The read-side cost of the hybrid merge is small: an average user follows ~5 whales, so the merge adds 5 cache lookups (~2ms) plus a sort merge (~2ms) — negligible compared to the eliminated 250-second fanout storm.
The whale threshold (100K followers in this template) is a tunable parameter. Setting it lower reduces write amplification further but increases the number of whale cache lookups on the read path. Setting it higher reduces read-side merge work but allows larger fanout storms. The optimal threshold depends on the follower distribution, feed read frequency, and infrastructure capacity. In practice, Twitter and similar platforms set the threshold between 10K and 500K followers.
This is the architecture interviewers expect senior candidates to arrive at after discussing the naive and fanout-on-write approaches. The progression from pull to push to hybrid demonstrates deepening understanding of the cost trade-offs. Candidates who can articulate the whale problem, propose the hybrid solution, and reason about the threshold tuning stand out in system design interviews at Twitter/X, Meta, Threads, and other social platforms.
The hybrid system uses 11 components organized into a CQRS-lite architecture with whale-aware routing on both the read and write paths. The key additions over the pure fanout-on-write variant are the WhaleCache (dedicated Redis for whale tweets) and the whale classification logic in TweetWriter.
The read path has three data sources instead of two. FeedReader executes two parallel cache reads: (1) LRANGE on FeedCache for the user's pre-computed non-whale feed, and (2) MGET on WhaleCache for recent tweets from each whale the user follows. It then merges the two sorted lists by timestamp — a standard sorted-array merge in O(K) where K is the total tweet count. On FeedCache miss, it falls back to TweetStore (DynamoDB) to reconstruct the non-whale portion. WhaleCache lookups happen regardless of FeedCache hit/miss status because whale tweets are never stored in FeedCache.
The write path includes whale classification. When TweetWriter receives a new tweet, it persists to TweetStore (DynamoDB — all tweets, both normal and whale). Then it checks the author's follower count. If the author has fewer than 100K followers (normal user), TweetWriter publishes a tweet_created event to FanoutStream (Kafka) for async fanout to FeedCache. If the author has 100K or more followers (whale), TweetWriter writes the tweet directly to WhaleCache (Redis SET) — no Kafka event, no fanout. This is the critical optimization: whale tweets bypass the entire fanout pipeline.
The FanoutWorker pool is smaller than in the pure push variant (40 workers vs 80). With whales excluded, the maximum fanout per event is capped at 100K cache writes (the whale threshold). The 95th percentile is much lower — most users have fewer than 1,000 followers. The reduced worker pool saves infrastructure cost while providing ample capacity for normal-user fanout.
WhaleCache is a small, hot Redis cluster. With approximately 1,000 whale users and 200 recent tweets each, the cache holds ~200K keys consuming ~500MB of memory. The hit rate is ~99% because whale tweets are extremely hot — read by millions of followers. Each whale's recent tweets are stored under a key pattern whale:{user_id} and refreshed on every whale tweet post with a 300s TTL.
FeedCache stores pre-computed feeds for all users, but containing only non-whale tweet IDs. This is the same structure as the pure push variant: feed:{user_id} with up to 200 tweet IDs. The 90% hit rate is slightly lower than the pure push variant's 92% because whale tweets are excluded — some feeds that were previously warm (kept alive by frequent whale tweets) now require WhaleCache augmentation.
The TweetStore (DynamoDB) persists all tweets from both normal and whale users. It includes an is_whale boolean flag indicating the author's classification at write time. This supports cache miss reconstruction: when FeedReader rebuilds a feed from TweetStore, it can filter out whale tweets (which should come from WhaleCache instead).
This sequence diagram shows the two key flows unique to the hybrid architecture: (1) the write path with whale classification, where TweetWriter decides whether to trigger Kafka fanout (normal user) or write to WhaleCache (whale user), and (2) the read path with dual-cache merge, where FeedReader queries both FeedCache and WhaleCache and merges the results. The whale classification is the critical decision point that prevents 50M-write fanout storms.
Step-by-Step Walkthrough
Pseudocode
// Write path — whale-aware routing
async function postTweet(author_id, content):
tweet_id = generateSnowflakeId()
// 1. Persist to durable store (all tweets)
await tweetStore.putItem({
tweet_id, author_id, content,
created_at: now(), is_whale: isWhale(author_id)
}) // ~50ms
// 2. Route based on follower count
follower_count = await getFollowerCount(author_id)
if follower_count < 100_000:
// Normal user: async fanout via Kafka
await kafka.publish("tweet-created", {
tweet_id, author_id, created_at: now()
}) // ~5ms, fire-and-forget
else:
// Whale: write directly to WhaleCache
await whaleCache.lpush("whale:" + author_id, tweet_id)
await whaleCache.ltrim("whale:" + author_id, 0, 199)
// ~2ms — no fanout storm!
return { status: 201, tweet_id }
// Read path — dual-cache merge
async function getFeed(user_id):
// Parallel cache reads
[feedTweets, whaleIds] = await Promise.all([
feedCache.lrange("feed:" + user_id, 0, 19), // ~2ms
getFollowedWhaleIds(user_id) // from follows table
])
// Fetch whale tweets
whaleTweets = await whaleCache.mget(
whaleIds.map(id => "whale:" + id)
) // ~2ms for ~5 whales
// Merge two sorted lists by timestamp
merged = mergeSorted(feedTweets, whaleTweets.flat())
return merged.slice(0, 20) // Top 20
// Total: ~10ms (2ms + 2ms + 2ms merge + overhead)Choice
Users with <100K followers use fanout-on-write; users with >=100K followers store tweets in WhaleCache
Rationale
Pure push creates 50M Redis writes when a celebrity with 50M followers tweets — a 4-minute backlog. The hybrid approach eliminates this: whale tweets go to WhaleCache (1 write) and are merged at read time. The read-side cost is small: ~5 whales per user x 2ms per lookup = 10ms added to the read path. This trades a 250-second write storm for a 10ms read overhead — a 25,000x improvement.
Choice
Separate Redis cluster for whale users' tweets, keyed by whale user ID
Rationale
Storing whale tweets by author (whale:{user_id}) instead of by reader (feed:{user_id}) means each whale tweet is written once and read by all followers. With ~1000 whales and 200 tweets each, WhaleCache is only ~200K keys (~500MB) — tiny and extremely hot. Separating it from FeedCache prevents whale read patterns from affecting normal feed cache eviction.
Choice
Users with 100,000+ followers classified as whales at tweet-post time
Rationale
100K balances write cost (max 100K fanout writes per normal-user tweet, completing in ~500ms) against read cost (average user follows ~5 whales, adding ~10ms merge time). The threshold is checked at tweet-post time, not continuously — a user crossing the threshold changes classification on their next tweet. 300s TTL on both caches handles stale data from threshold transitions.
Choice
No retroactive migration when a user crosses the whale threshold
Rationale
When a user's follower count crosses 100K, the classification changes on their next tweet — there is no migration of existing cached feeds. Stale entries in FeedCache (containing pre-threshold tweets) expire via 300s TTL. This lazy approach avoids the complexity of eager migration (scanning all followers' caches to add/remove entries) at the cost of up to 5 minutes of stale data during the transition.
Choice
FeedReader merges FeedCache and WhaleCache results by timestamp
Rationale
Both sources return tweet IDs sorted by timestamp. Merging two sorted arrays is O(K) where K = total tweets — approximately 2ms of CPU for a 20-tweet feed. This is negligible compared to the network RTT for the cache lookups themselves. The merge produces a unified timeline indistinguishable from a pure push feed.
Choice
40 workers (vs 80 in pure push) since whales are excluded from fanout
Rationale
With whales excluded, the maximum fanout per event drops from unbounded to 100K, and the 95th percentile drops to under 1K. The reduced worker pool saves 50% on worker infrastructure while providing ample capacity. Kafka consumer lag absorbs any temporary bursts from users near the whale threshold.
Target RPS
1M+ feed reads/sec
Latency (p99)
<10ms feed reads (dual-cache merge)
Storage
~10 TB/year (DynamoDB + dual Redis)
Availability
99.9% (replicated, whale-resilient)
Immutable tweet storage for both normal and whale users. Partitioned by author_user_id. Includes an is_whale flag indicating the author's classification at write time, used during cache miss reconstruction to filter whale tweets (which should come from WhaleCache).
Partition: author_id
64 partitions x 3 replicas. Eventual consistency. The is_whale flag prevents double-counting during cache miss reconstruction.
Pre-computed per-user timeline containing only NON-WHALE tweet IDs. Whale tweets are excluded and served from WhaleCache instead. Structure identical to the pure push variant but content is filtered to exclude whale authors.
~90% hit rate (slightly lower than pure push because whale tweets no longer keep feeds warm). 300s TTL, LRU eviction.
Recent tweets from whale users (>=100K followers). One key per whale, storing their last 200 tweet IDs. Written by TweetWriter on whale tweet posts. Read by FeedReader via MGET for each whale the requesting user follows.
~1000 whales x 200 tweets = ~200K keys (~500MB). ~99% hit rate. Written by TweetWriter, not by FanoutWorker.
Emitted by TweetWriter for normal-user tweets only (<100K followers). Whale tweets bypass Kafka entirely and go directly to WhaleCache. Max fanout per event is capped at 100K (the whale threshold).
Key Schema
author_id: string (partition key for per-author ordering)
Value Schema
{ tweet_id: string (Snowflake ID), author_id: string, created_at: string (ISO timestamp) }
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Fanout-on-Read) | T1 | 150ms-3s feed reads | ~500 concurrent users | $500/month (single DB) | Low — no cache, no workers | 99% (single DB, no failover) |
| Fanout-on-Write (Push) | T2 | <5ms feed reads (Redis) | 200K+ feed reads/sec | $3,000/month (CQRS + Redis + Kafka) | Medium — Kafka, Redis, workers | 99.9% (replicated components) |
| Hybrid Push+Pull (Whale-Aware) | T3 | <10ms feed reads (dual-cache merge) | 1M+ feed reads/sec | $4,500/month (dual cache + workers) | High — whale detection, dual cache | 99.9% (replicated, whale-resilient) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
In pure fanout-on-write, a celebrity with 50M followers triggers 50M Redis writes per tweet — taking ~250 seconds. The hybrid approach classifies users with 100K+ followers as 'whales' and stores their tweets in a dedicated WhaleCache instead of fanning out. At feed read time, FeedReader merges the user's pre-computed non-whale feed with whale tweets from WhaleCache. This replaces a 250-second write storm with a ~5ms read-time merge.
The transition is lazy. When a user's follower count crosses 100K (in either direction), the new classification applies to their next tweet. There is no retroactive migration of existing cached feeds. Pre-threshold tweets in FeedCache (or WhaleCache) expire via 300s TTL. During the ~5 minute transition window, some followers may see slightly stale data, but this is imperceptible in practice.
An average user follows approximately 5 whales. The WhaleCache lookup for 5 whales is a Redis MGET — a single round-trip returning all 5 results in ~2ms. The CPU time for merging two sorted lists of 20 tweets is ~2ms. Total overhead: ~5ms added to the ~5ms FeedCache read, for a total of ~10ms. Users who follow zero whales pay no overhead — their feed is served from FeedCache alone.
If whale tweets were stored in FeedCache (per-reader), a whale with 50M followers would still require 50M cache entries — one per follower's feed. WhaleCache stores tweets per-author instead of per-reader: one entry per whale, read by all followers. This is the key insight — storing by author instead of by reader turns O(N) writes into O(1) writes at the cost of O(W) reads per feed request, where W is the number of whales the user follows.
Twitter's real architecture uses a similar hybrid approach, though with additional complexity: ML-based ranking (not just chronological), real-time engagement signals, multiple cache tiers, and global distribution. The whale threshold concept maps to Twitter's 'celebrity special-casing' in their fanout system. The core insight — split fanout strategy by follower count — is the same. This template captures the essential architecture without the production-specific complexity.
Sign in to join the discussion.
Ready to design your own Twitter Feed?
Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.
Open Simulator