Hard11 componentsInterview: High

Twitter Feed — Hybrid Push+Pull (Whale-Aware)

Q: How does the hybrid approach solve the whale problem?

In pure fanout-on-write, a celebrity with 50M followers triggers 50M Redis writes per tweet — taking ~250 seconds. The hybrid approach classifies users with 100K+ followers as 'whales' and stores their tweets in a dedicated WhaleCache instead of fanning out. At feed read time, FeedReader merges the user's pre-computed non-whale feed with whale tweets from WhaleCache. This replaces a 250-second write storm with a ~5ms read-time merge.

Q: What happens when a user crosses the 100K follower threshold?

The transition is lazy. When a user's follower count crosses 100K (in either direction), the new classification applies to their next tweet. There is no retroactive migration of existing cached feeds. Pre-threshold tweets in FeedCache (or WhaleCache) expire via 300s TTL. During the ~5 minute transition window, some followers may see slightly stale data, but this is imperceptible in practice.

Q: How much read latency does the whale merge add?

An average user follows approximately 5 whales. The WhaleCache lookup for 5 whales is a Redis MGET — a single round-trip returning all 5 results in ~2ms. The CPU time for merging two sorted lists of 20 tweets is ~2ms. Total overhead: ~5ms added to the ~5ms FeedCache read, for a total of ~10ms. Users who follow zero whales pay no overhead — their feed is served from FeedCache alone.

Q: Why not just cache-aside the whale tweets into FeedCache?

If whale tweets were stored in FeedCache (per-reader), a whale with 50M followers would still require 50M cache entries — one per follower's feed. WhaleCache stores tweets per-author instead of per-reader: one entry per whale, read by all followers. This is the key insight — storing by author instead of by reader turns O(N) writes into O(1) writes at the cost of O(W) reads per feed request, where W is the number of whales the user follows.

Q: How does this compare to Twitter/X's actual production architecture?

Twitter's real architecture uses a similar hybrid approach, though with additional complexity: ML-based ranking (not just chronological), real-time engagement signals, multiple cache tiers, and global distribution. The whale threshold concept maps to Twitter's 'celebrity special-casing' in their fanout system. The core insight — split fanout strategy by follower count — is the same. This template captures the essential architecture without the production-specific complexity.

Production-grade hybrid fanout: push for normal users (<100K followers), pull for whales (>=100K followers). Solves the celebrity fanout storm problem while preserving O(1) feed reads via dual-cache merge.

SocialHybrid FanoutWhale DetectionCQRSProduction-Grade

Try in Simulator

Problem Statement

The hybrid push+pull architecture is the production-grade solution to the Twitter feed problem that solves the 'whale problem' — the fundamental limitation of pure fanout-on-write. When a celebrity with 50 million followers posts a tweet, pure fanout-on-write triggers 50 million Redis cache writes, creating a minutes-long backlog that delays feed updates for all users. The hybrid approach eliminates this by treating high-follower users differently from regular users.

The key insight is that follower count follows a power-law distribution. The vast majority of users (99%+) have fewer than 100,000 followers, and for these users, fanout-on-write works perfectly — the fanout cost is bounded and completes in seconds. Only a tiny fraction of users (~0.1%) are 'whales' with 100,000+ followers, but these whales would generate 90%+ of the fanout work in a pure push model. The hybrid approach handles each group with the strategy that works best for them.

Normal users (< 100K followers) use fanout-on-write: their tweets are pushed to followers' cached feeds via Kafka workers, exactly as in the standard approach. Feed reads for followers of only normal users are a single Redis LRANGE — O(1) at ~2ms. Whale users (>= 100K followers) skip fanout entirely: their tweets are written to a dedicated WhaleCache (Redis). At feed read time, the FeedReader merges the user's pre-computed feed (FeedCache, containing non-whale tweets) with recent tweets from whales the user follows (WhaleCache). This merge adds ~5ms to the read path but eliminates the catastrophic write amplification.

The numbers make the case compelling. A whale with 50M followers posting a tweet: in pure push, 50M Redis writes taking 250 seconds; in hybrid, 1 WhaleCache write taking 2ms. The read-side cost of the hybrid merge is small: an average user follows ~5 whales, so the merge adds 5 cache lookups (~2ms) plus a sort merge (~2ms) — negligible compared to the eliminated 250-second fanout storm.

The whale threshold (100K followers in this template) is a tunable parameter. Setting it lower reduces write amplification further but increases the number of whale cache lookups on the read path. Setting it higher reduces read-side merge work but allows larger fanout storms. The optimal threshold depends on the follower distribution, feed read frequency, and infrastructure capacity. In practice, Twitter and similar platforms set the threshold between 10K and 500K followers.

This is the architecture interviewers expect senior candidates to arrive at after discussing the naive and fanout-on-write approaches. The progression from pull to push to hybrid demonstrates deepening understanding of the cost trade-offs. Candidates who can articulate the whale problem, propose the hybrid solution, and reason about the threshold tuning stand out in system design interviews at Twitter/X, Meta, Threads, and other social platforms.

Architecture Overview

The hybrid system uses 11 components organized into a CQRS-lite architecture with whale-aware routing on both the read and write paths. The key additions over the pure fanout-on-write variant are the WhaleCache (dedicated Redis for whale tweets) and the whale classification logic in TweetWriter.

The read path has three data sources instead of two. FeedReader executes two parallel cache reads: (1) LRANGE on FeedCache for the user's pre-computed non-whale feed, and (2) MGET on WhaleCache for recent tweets from each whale the user follows. It then merges the two sorted lists by timestamp — a standard sorted-array merge in O(K) where K is the total tweet count. On FeedCache miss, it falls back to TweetStore (DynamoDB) to reconstruct the non-whale portion. WhaleCache lookups happen regardless of FeedCache hit/miss status because whale tweets are never stored in FeedCache.

The write path includes whale classification. When TweetWriter receives a new tweet, it persists to TweetStore (DynamoDB — all tweets, both normal and whale). Then it checks the author's follower count. If the author has fewer than 100K followers (normal user), TweetWriter publishes a tweet_created event to FanoutStream (Kafka) for async fanout to FeedCache. If the author has 100K or more followers (whale), TweetWriter writes the tweet directly to WhaleCache (Redis SET) — no Kafka event, no fanout. This is the critical optimization: whale tweets bypass the entire fanout pipeline.

The FanoutWorker pool is smaller than in the pure push variant (40 workers vs 80). With whales excluded, the maximum fanout per event is capped at 100K cache writes (the whale threshold). The 95th percentile is much lower — most users have fewer than 1,000 followers. The reduced worker pool saves infrastructure cost while providing ample capacity for normal-user fanout.

WhaleCache is a small, hot Redis cluster. With approximately 1,000 whale users and 200 recent tweets each, the cache holds ~200K keys consuming ~500MB of memory. The hit rate is ~99% because whale tweets are extremely hot — read by millions of followers. Each whale's recent tweets are stored under a key pattern whale:{user_id} and refreshed on every whale tweet post with a 300s TTL.

FeedCache stores pre-computed feeds for all users, but containing only non-whale tweet IDs. This is the same structure as the pure push variant: feed:{user_id} with up to 200 tweet IDs. The 90% hit rate is slightly lower than the pure push variant's 92% because whale tweets are excluded — some feeds that were previously warm (kept alive by frequent whale tweets) now require WhaleCache augmentation.

The TweetStore (DynamoDB) persists all tweets from both normal and whale users. It includes an is_whale boolean flag indicating the author's classification at write time. This supports cache miss reconstruction: when FeedReader rebuilds a feed from TweetStore, it can filter out whale tweets (which should come from WhaleCache instead).

Architecture Preview

Loading architecture preview...

Open in Simulator

Request Flow — Hybrid Fanout

This sequence diagram shows the two key flows unique to the hybrid architecture: (1) the write path with whale classification, where TweetWriter decides whether to trigger Kafka fanout (normal user) or write to WhaleCache (whale user), and (2) the read path with dual-cache merge, where FeedReader queries both FeedCache and WhaleCache and merges the results. The whale classification is the critical decision point that prevents 50M-write fanout storms.

Loading diagram...

Step-by-Step Walkthrough

1User posts a tweet — API Gateway routes POST to TweetWriter via Write LB
2TweetWriter persists the tweet to TweetStore (DynamoDB PutItem, ~50ms durable write)
3TweetWriter checks author's follower count against the 100K whale threshold
4Normal user (<100K followers): publish tweet_created event to Kafka; FanoutWorker fans out to followers' FeedCache entries
5Whale user (>=100K followers): write directly to WhaleCache (Redis SET, ~2ms) — no Kafka, no fanout
6User opens their feed — API Gateway routes GET to FeedReader via Read LB
7FeedReader issues two parallel cache reads: LRANGE on FeedCache (non-whale tweets) and MGET on WhaleCache (whale tweets)
8FeedReader merges the two sorted lists by timestamp and returns the top 20 tweets (~10ms total)

Pseudocode

// Write path — whale-aware routing
async function postTweet(author_id, content):
    tweet_id = generateSnowflakeId()

    // 1. Persist to durable store (all tweets)
    await tweetStore.putItem({
        tweet_id, author_id, content,
        created_at: now(), is_whale: isWhale(author_id)
    })  // ~50ms

    // 2. Route based on follower count
    follower_count = await getFollowerCount(author_id)
    if follower_count < 100_000:
        // Normal user: async fanout via Kafka
        await kafka.publish("tweet-created", {
            tweet_id, author_id, created_at: now()
        })  // ~5ms, fire-and-forget
    else:
        // Whale: write directly to WhaleCache
        await whaleCache.lpush("whale:" + author_id, tweet_id)
        await whaleCache.ltrim("whale:" + author_id, 0, 199)
        // ~2ms — no fanout storm!

    return { status: 201, tweet_id }

// Read path — dual-cache merge
async function getFeed(user_id):
    // Parallel cache reads
    [feedTweets, whaleIds] = await Promise.all([
        feedCache.lrange("feed:" + user_id, 0, 19),  // ~2ms
        getFollowedWhaleIds(user_id)                   // from follows table
    ])

    // Fetch whale tweets
    whaleTweets = await whaleCache.mget(
        whaleIds.map(id => "whale:" + id)
    )  // ~2ms for ~5 whales

    // Merge two sorted lists by timestamp
    merged = mergeSorted(feedTweets, whaleTweets.flat())
    return merged.slice(0, 20)  // Top 20
    // Total: ~10ms (2ms + 2ms + 2ms merge + overhead)

Key Design Decisions

Hybrid Fanout (Push for Normal, Pull for Whales)

Choice

Users with <100K followers use fanout-on-write; users with >=100K followers store tweets in WhaleCache

Rationale

Pure push creates 50M Redis writes when a celebrity with 50M followers tweets — a 4-minute backlog. The hybrid approach eliminates this: whale tweets go to WhaleCache (1 write) and are merged at read time. The read-side cost is small: ~5 whales per user x 2ms per lookup = 10ms added to the read path. This trades a 250-second write storm for a 10ms read overhead — a 25,000x improvement.

Dedicated WhaleCache

Choice

Separate Redis cluster for whale users' tweets, keyed by whale user ID

Rationale

Storing whale tweets by author (whale:{user_id}) instead of by reader (feed:{user_id}) means each whale tweet is written once and read by all followers. With ~1000 whales and 200 tweets each, WhaleCache is only ~200K keys (~500MB) — tiny and extremely hot. Separating it from FeedCache prevents whale read patterns from affecting normal feed cache eviction.

100K Follower Whale Threshold

Choice

Users with 100,000+ followers classified as whales at tweet-post time

Rationale

100K balances write cost (max 100K fanout writes per normal-user tweet, completing in ~500ms) against read cost (average user follows ~5 whales, adding ~10ms merge time). The threshold is checked at tweet-post time, not continuously — a user crossing the threshold changes classification on their next tweet. 300s TTL on both caches handles stale data from threshold transitions.

Lazy Whale Transition

Choice

No retroactive migration when a user crosses the whale threshold

Rationale

When a user's follower count crosses 100K, the classification changes on their next tweet — there is no migration of existing cached feeds. Stale entries in FeedCache (containing pre-threshold tweets) expire via 300s TTL. This lazy approach avoids the complexity of eager migration (scanning all followers' caches to add/remove entries) at the cost of up to 5 minutes of stale data during the transition.

Sorted Merge at Read Time

Choice

FeedReader merges FeedCache and WhaleCache results by timestamp

Rationale

Both sources return tweet IDs sorted by timestamp. Merging two sorted arrays is O(K) where K = total tweets — approximately 2ms of CPU for a 20-tweet feed. This is negligible compared to the network RTT for the cache lookups themselves. The merge produces a unified timeline indistinguishable from a pure push feed.

Reduced Fanout Worker Pool

Choice

40 workers (vs 80 in pure push) since whales are excluded from fanout

Rationale

With whales excluded, the maximum fanout per event drops from unbounded to 100K, and the 95th percentile drops to under 1K. The reduced worker pool saves 50% on worker infrastructure while providing ample capacity. Kafka consumer lag absorbs any temporary bursts from users near the whale threshold.

Scale & Performance

Target RPS

1M+ feed reads/sec

Latency (p99)

<10ms feed reads (dual-cache merge)

Storage

~10 TB/year (DynamoDB + dual Redis)

Availability

99.9% (replicated, whale-resilient)

Database Schema (HLD)

tweets (DynamoDB)

Immutable tweet storage for both normal and whale users. Partitioned by author_user_id. Includes an is_whale flag indicating the author's classification at write time, used during cache miss reconstruction to filter whale tweets (which should come from WhaleCache).

tweet_id VARCHAR (Snowflake ID, sort key)author_id VARCHAR (partition key)content TEXT (max 280 chars)created_at TIMESTAMPTZmedia_ids VARCHAR[] (optional)is_whale BOOLEAN (author classification at write time)

Partition: author_id

64 partitions x 3 replicas. Eventual consistency. The is_whale flag prevents double-counting during cache miss reconstruction.

feed:{user_id} (FeedCache / Redis)

Pre-computed per-user timeline containing only NON-WHALE tweet IDs. Whale tweets are excluded and served from WhaleCache instead. Structure identical to the pure push variant but content is filtered to exclude whale authors.

tweet_ids LIST (ordered list of max 200 non-whale tweet IDs)updated_at STRING (ISO timestamp of last fanout write)

~90% hit rate (slightly lower than pure push because whale tweets no longer keep feeds warm). 300s TTL, LRU eviction.

whale:{user_id} (WhaleCache / Redis)

Recent tweets from whale users (>=100K followers). One key per whale, storing their last 200 tweet IDs. Written by TweetWriter on whale tweet posts. Read by FeedReader via MGET for each whale the requesting user follows.

tweet_ids LIST (ordered list of max 200 whale tweet IDs)author_id STRING (whale user ID)updated_at STRING (ISO timestamp of last whale tweet)

~1000 whales x 200 tweets = ~200K keys (~500MB). ~99% hit rate. Written by TweetWriter, not by FanoutWorker.

Event Contracts

tweet-created (normal users only)tweet-created

Emitted by TweetWriter for normal-user tweets only (<100K followers). Whale tweets bypass Kafka entirely and go directly to WhaleCache. Max fanout per event is capped at 100K (the whale threshold).

Key Schema

author_id: string (partition key for per-author ordering)

Value Schema

{ tweet_id: string (Snowflake ID), author_id: string, created_at: string (ISO timestamp) }

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Fanout-on-Read)	T1	150ms-3s feed reads	~500 concurrent users	$500/month (single DB)	Low — no cache, no workers	99% (single DB, no failover)
Fanout-on-Write (Push)	T2	<5ms feed reads (Redis)	200K+ feed reads/sec	$3,000/month (CQRS + Redis + Kafka)	Medium — Kafka, Redis, workers	99.9% (replicated components)
Hybrid Push+Pull (Whale-Aware)	T3	<10ms feed reads (dual-cache merge)	1M+ feed reads/sec	$4,500/month (dual cache + workers)	High — whale detection, dual cache	99.9% (replicated, whale-resilient)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

How does the hybrid approach solve the whale problem?

In pure fanout-on-write, a celebrity with 50M followers triggers 50M Redis writes per tweet — taking ~250 seconds. The hybrid approach classifies users with 100K+ followers as 'whales' and stores their tweets in a dedicated WhaleCache instead of fanning out. At feed read time, FeedReader merges the user's pre-computed non-whale feed with whale tweets from WhaleCache. This replaces a 250-second write storm with a ~5ms read-time merge.

What happens when a user crosses the 100K follower threshold?

The transition is lazy. When a user's follower count crosses 100K (in either direction), the new classification applies to their next tweet. There is no retroactive migration of existing cached feeds. Pre-threshold tweets in FeedCache (or WhaleCache) expire via 300s TTL. During the ~5 minute transition window, some followers may see slightly stale data, but this is imperceptible in practice.

How much read latency does the whale merge add?

An average user follows approximately 5 whales. The WhaleCache lookup for 5 whales is a Redis MGET — a single round-trip returning all 5 results in ~2ms. The CPU time for merging two sorted lists of 20 tweets is ~2ms. Total overhead: ~5ms added to the ~5ms FeedCache read, for a total of ~10ms. Users who follow zero whales pay no overhead — their feed is served from FeedCache alone.

Why not just cache-aside the whale tweets into FeedCache?

If whale tweets were stored in FeedCache (per-reader), a whale with 50M followers would still require 50M cache entries — one per follower's feed. WhaleCache stores tweets per-author instead of per-reader: one entry per whale, read by all followers. This is the key insight — storing by author instead of by reader turns O(N) writes into O(1) writes at the cost of O(W) reads per feed request, where W is the number of whales the user follows.

How does this compare to Twitter/X's actual production architecture?

Twitter's real architecture uses a similar hybrid approach, though with additional complexity: ML-based ranking (not just chronological), real-time engagement signals, multiple cache tiers, and global distribution. The whale threshold concept maps to Twitter's 'celebrity special-casing' in their fanout system. The core insight — split fanout strategy by follower count — is the same. This template captures the essential architecture without the production-specific complexity.

Related Templates

Twitter Feed — Naive (Fanout-on-Read)Twitter Feed — Fanout-on-Write (Push Model)

Discussion

Ready to design your own Twitter Feed?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator