Medium12 componentsInterview: High

Notification System — Multi-Channel Fanout

Q: Why add WebSocket when push notifications already work?

Push notifications take 2-5 seconds for delivery via FCM/APN and require the user to NOT be in the app. When a user is actively looking at the application, WebSocket delivers in ~10ms — 500x faster. This is critical for chat-like experiences, real-time alerts, and any notification the user should see immediately. Slack, Discord, and WhatsApp all use WebSocket for in-app delivery, falling back to push only when the user closes the app.

Q: How does presence tracking work and what happens on false positives?

WebSocketService updates a presence:{user_id} key in Redis on every connect, disconnect, and 30-second heartbeat. The key has a 60-second TTL. If the user's browser crashes without a clean disconnect, the key expires after 60 seconds (one missed heartbeat). During this window, FanoutWorker may route to in-app for a user who is no longer online — the WebSocket send will fail silently, and the notification is logged as undelivered. A compensating job can re-route failed in-app notifications to push.

Q: How does the system handle a broadcast to 1 million recipients?

NotifyService publishes a single raw event to RawStream with a segment ID. FanoutWorker resolves the segment to 1M user IDs, checks preferences for each (pipelined Redis MGET), and expands to ~2M channel events (1M users x ~2 channels each). At the 50K expansion/sec rate limit, full expansion takes ~40 seconds. Push delivery follows at pipeline pace. The API returns 202 instantly — the caller does not wait for expansion.

Q: Why not use a single Kafka cluster with partitioned topics?

You can use a single Kafka cluster with separate topics — the 'two Kafka clusters' in the architecture refers to two logical topic groups (raw-notifications and channel-specific topics), not necessarily separate physical clusters. In practice, most deployments use a single MSK cluster with separate topics. The key is that each topic has independent consumer groups, enabling independent scaling and backpressure per channel.

Q: How does this compare to the Async Pipeline variant?

The Async Pipeline variant uses a single Kafka topic with channel routing embedded in the message. All workers consume from the same topic and filter by channel. This simpler design works well for push and email but cannot support WebSocket (which needs a persistent connection server, not a stateless worker). The Multi-Channel Fanout adds: (1) in-app delivery via WebSocket, (2) presence-aware routing, (3) broadcast expansion, and (4) per-channel independent scaling. The trade-off is 12 components vs 8 and higher operational complexity.

Q: What is the WebSocket connection capacity and how does it scale?

WebSocketService runs 6 pods with 200 threads each, supporting ~200K concurrent WebSocket connections. Each connection consumes ~50KB of memory (buffer + state). Total memory: 200K x 50KB = ~10GB across 6 pods. Scaling is horizontal: adding pods increases connection capacity linearly. At 1M concurrent connections, you would need ~30 pods. Connection routing uses consistent hashing on user_id to ensure a user's connections always land on the same pod.

Production-grade notification system with a two-stage Kafka pipeline and WebSocket for sub-second in-app delivery. FanoutWorker expands per-recipient and per-channel, routing to dedicated channel streams. Presence tracking enables intelligent push-vs-in-app routing.

NotificationsKafkaWebSocketFanoutMulti-ChannelPresence Tracking

Try in Simulator

Problem Statement

The multi-channel fanout architecture is the production-grade approach to notification systems that adds three critical capabilities missing from the single-stage Async Pipeline variant: in-app real-time delivery via WebSocket, per-channel independent scaling, and broadcast notification expansion. This is the architecture used by platforms like Slack, Discord, and WhatsApp where sub-second in-app delivery is a core product requirement.

The most significant addition is the WebSocket channel for in-app notifications. The Async Pipeline variant supports only push (FCM/APN) and email (SendGrid/SES) — both of which require the user to not be actively using the app. When a user is online and looking at the application, the fastest delivery mechanism is a WebSocket push directly to the open browser tab or mobile app. WebSocketService maintains persistent connections with 30-second heartbeats, tracks user presence in a dedicated Redis cache (PresenceCache), and delivers notifications in under 200ms. This is 25x faster than push notifications (5 seconds) and 150x faster than email (30 seconds).

The two-stage Kafka pipeline is the architectural innovation that enables this multi-channel delivery. Stage 1: NotifyService publishes raw notification events to RawStream (Kafka) — each event contains the rendered content, recipient list, and requested channels. The API returns 202 in ~50ms. Stage 2: FanoutWorker consumes from RawStream and expands each event per-recipient and per-channel. For each recipient, it checks user preferences (UserPrefsCache) for opt-out and channel selection, checks presence (PresenceCache) for online/offline status, and publishes per-channel events to the appropriate ChannelStream topic (push-events, email-events, or inapp-events). This two-stage design keeps the API thin (one Kafka message per request) while enabling sophisticated per-recipient routing.

Presence-aware routing is a key differentiator. When the FanoutWorker expands a notification for a specific recipient, it checks PresenceCache to determine if the user is currently connected via WebSocket. If online, the notification is routed to inapp-events for sub-second WebSocket delivery. If offline, it goes to push-events for FCM/APN delivery. This avoids the annoyance of receiving a push notification when you are already looking at the app, and it reduces unnecessary push delivery costs. Users can configure per-channel preferences to receive both channels if desired.

The per-channel architecture enables independent scaling. PushWorker (10 instances) handles ~8K pushes/sec with 200ms per FCM call. EmailWorker (4 instances) handles ~5K emails/sec with batching. WebSocketService (6 pods) supports ~200K concurrent connections. Each channel has its own Kafka consumer group, its own DLQ, its own retry policy, and its own scaling triggers. A SendGrid outage does not affect push or in-app delivery. A spike in push volume does not delay email delivery. This isolation is impossible with a single-stage pipeline where all channels share the same Kafka topic and consumer group.

Broadcast notification expansion is another critical capability. When a notification targets a segment (e.g., all users in a region, all premium subscribers), the raw event contains a segment ID rather than individual user IDs. The FanoutWorker resolves the segment to individual users and expands accordingly — a single raw event can expand to millions of per-channel events. Rate limiting at the FanoutWorker level (50K expansions/sec) prevents this expansion from overwhelming the ChannelStream topics. The Async Pipeline variant does not support broadcast — it requires the caller to enumerate individual recipients.

This is the architecture that senior candidates should propose after discussing the Async Pipeline variant. The progression from sync to async to multi-channel fanout demonstrates deepening understanding of notification system trade-offs. The key additions — WebSocket for real-time, two-stage pipeline for fanout, presence for intelligent routing — are exactly what interviewers at communication platforms like Slack, Twilio, and Discord expect.

Architecture Overview

The multi-channel fanout system uses 12 components organized into a two-stage Kafka pipeline with per-channel delivery and WebSocket for real-time in-app notifications. The components are: Client, API Gateway, Load Balancer, NotifyService, UserPrefsCache (Redis), TemplateDB (PostgreSQL), RawStream (Kafka), FanoutWorker, PresenceCache (Redis), ChannelStream (Kafka with 3 topics), PushWorker, EmailWorker, and WebSocketService.

The API layer is similar to the Async Pipeline variant but optimized for higher throughput. Client requests enter through the API Gateway (JWT auth, 20K RPS rate limit), pass through the Load Balancer (40K RPS capacity, 2.5x headroom), and reach NotifyService (10 ECS Fargate pods, 100 threads each, 1000 total concurrent capacity). NotifyService validates the request, renders the template from TemplateDB with the caller's variables, and publishes a raw notification event to RawStream (Kafka). The API returns 202 Accepted in approximately 50ms. Notably, NotifyService does NOT check user preferences — preference resolution is deferred to the FanoutWorker where per-recipient expansion happens. This keeps the API thin and fast.

RawStream (Amazon MSK, 8 partitions) carries raw notification events from NotifyService to FanoutWorker. Each event contains the rendered content, recipient list (which may be individual user IDs or a segment ID for broadcast notifications), and requested channels. Partitioned by notification_id for ordering. This is Stage 1 of the two-stage pipeline — the raw, unexpanded notification event that has not yet been resolved to individual per-channel delivery tasks.

FanoutWorker (12 ECS Fargate instances, 4 vCPU/8 GB each) is the core expansion engine that transforms raw notifications into per-channel delivery events. For each raw event consumed from RawStream, it executes four steps: (1) resolves segment IDs to individual user IDs if the notification targets a user segment rather than explicit recipients; (2) checks per-user preferences in UserPrefsCache (Redis) via pipelined MGET for opt-out flags, quiet hours windows, and enabled channel selection; (3) checks presence in PresenceCache (Redis) to determine whether each user is currently online via WebSocket; (4) expands each event into per-recipient x per-channel events and publishes to the appropriate ChannelStream topic (push-events, email-events, or inapp-events). A single raw event targeting 1,000 recipients across 2 channels expands to approximately 2,000 channel events. Rate-limited to 50K expansions per second to prevent overwhelming ChannelStream during large broadcast notifications.

ChannelStream (Amazon MSK, 32 partitions distributed across 3 topics) carries per-channel events to the appropriate delivery workers. Three topics: push-events (16 partitions for the highest-volume channel), email-events (8 partitions), inapp-events (8 partitions). Each topic has its own independent consumer group, enabling independent scaling, independent backpressure handling, and independent dead-letter queues. This is Stage 2 of the pipeline — the expanded, routed notification events ready for final delivery to end users.

PushWorker (10 instances) consumes push-events and delivers via FCM for Android and APN for iOS (~200ms per external call). EmailWorker (4 instances) consumes email-events and delivers via SendGrid/SES (~500ms per batch, batching up to 10 emails per API call for throughput efficiency). Both implement idempotent delivery using a Redis dedup set and exponential backoff retry with DLQ for permanent failures. WebSocketService (6 ECS Fargate pods, 200 threads each, supporting ~200K concurrent connections) consumes inapp-events and pushes notifications over persistent WebSocket connections (~10ms delivery latency). It manages the full connection lifecycle: upgrade, heartbeat (30-second interval), reconnection, and graceful disconnect. On each connect and heartbeat, it updates PresenceCache.

PresenceCache (Amazon ElastiCache for Redis, dedicated 2-node cluster, 13 GB each) tracks which users are currently connected via WebSocket. Key pattern: presence:{user_id} with a 60-second TTL that auto-expires on missed heartbeats. This enables presence-aware routing in FanoutWorker: online users receive in-app delivery via WebSocket (sub-200ms), offline users receive push delivery via FCM/APN (2-5 seconds). The separate Redis cluster prevents volatile presence updates (written every 30 seconds per connected user) from evicting long-lived user preferences stored in UserPrefsCache with 1-hour TTL.

Architecture Preview

Loading architecture preview...

Open in Simulator

Request Flow — Multi-Channel Fanout with Presence

This sequence diagram shows the three phases: (1) API acceptance (~50ms), (2) fanout expansion with preference and presence checks, and (3) per-channel delivery. The critical addition over the Async Pipeline variant is the presence check that routes online users to WebSocket for sub-200ms delivery instead of 2-5 second push notifications.

Loading diagram...

Step-by-Step Walkthrough

1Client sends POST /api/v1/notifications with template_id, recipients, channels, and priority
2NotifyService renders the template from TemplateDB (~10ms)
3NotifyService publishes raw notification event to RawStream (Kafka, ~5ms)
4API returns 202 Accepted (~50ms total)
5FanoutWorker consumes the raw event from RawStream
6FanoutWorker checks user preferences in UserPrefsCache (Redis MGET, ~2ms per batch)
7FanoutWorker checks user presence in PresenceCache (Redis MGET, ~1ms per batch)
8FanoutWorker expands per-recipient x per-channel and publishes to ChannelStream topics
9PushWorker consumes from push-events and delivers via FCM/APN (~200ms)
10EmailWorker consumes from email-events and delivers via SendGrid/SES (~500ms)
11WebSocketService consumes from inapp-events and pushes via WebSocket (~10ms)

Pseudocode

// Phase 1 — API path (~50ms)
async function sendNotification(template_id, recipients, channels, variables, priority):
    template = await db.query("SELECT * FROM templates WHERE template_id = $1", [template_id])
    rendered = renderTemplate(template.body_template, variables)
    await kafka.publish("raw-notifications", {
        notification_id: generateUUID(),
        rendered_content: rendered, recipients, channels, priority
    })
    return { status: 202, notification_id }

// Phase 2 — Fanout expansion (FanoutWorker)
async function expandNotification(rawEvent):
    prefs = await redis.mget(rawEvent.recipients.map(r => "prefs:" + r))
    presence = await redis.mget(rawEvent.recipients.map(r => "presence:" + r))

    for each (recipient, pref, online) in zip(recipients, prefs, presence):
        if pref.opt_out || inQuietHours(pref): continue
        for each channel in intersect(rawEvent.channels, pref.channels):
            if channel == "push" && online: channel = "inapp"  // presence-aware routing
            topic = channel + "-events"
            await kafka.publish(topic, {
                notification_id: rawEvent.notification_id,
                recipient_id: recipient, channel,
                rendered_content: rawEvent.rendered_content
            })

// Phase 3 — WebSocket delivery (~10ms)
async function deliverInApp(event):
    connection = connections.get(event.connection_id)
    if connection && connection.isAlive():
        connection.send(JSON.stringify(event))  // ~10ms
        await db.execute("INSERT INTO delivery_log (...)", [...])

Key Design Decisions

Two-Stage Kafka Pipeline

Choice

RawStream for API decoupling, ChannelStream for per-channel delivery

Rationale

Stage 1 decouples the API from fanout expansion (API returns in ~50ms). Stage 2 decouples fanout from delivery (each channel scales independently). A single-stage pipeline would require all workers to understand preference logic, presence checking, and multi-channel routing. Two stages separate concerns: FanoutWorker handles expansion; channel workers handle delivery.

WebSocket for In-App Delivery

Choice

Persistent WebSocket connections with 30-second heartbeats for sub-200ms delivery

Rationale

Push notifications take 2-5 seconds (FCM network roundtrip). Email takes 30+ seconds. WebSocket delivers in ~10ms — 500x faster than push. For users actively in the app, this is the optimal channel. Presence tracking ensures WebSocket is used only when the user is online, falling back to push for offline users.

Presence-Aware Routing

Choice

FanoutWorker checks PresenceCache to route online users to in-app, offline to push

Rationale

Sending push notifications to users who are already looking at the app is redundant and annoying. Presence-aware routing checks a 60-second TTL key in Redis to determine if the user has an active WebSocket connection. Online users get in-app delivery; offline users get push. This reduces unnecessary push volume by ~35% and delivers a better user experience.

Dedicated PresenceCache

Choice

Separate Redis cluster for presence tracking, distinct from UserPrefsCache

Rationale

Presence data is volatile — updated every 30 seconds per connected user, with 60-second TTL. UserPrefsCache stores stable data with 1-hour TTL. Mixing volatile and stable data in one cache causes volatile updates to evict stable entries via LRU. Separate clusters prevent this interference and allow different capacity planning.

Per-Channel Kafka Topics

Choice

Three separate topics (push-events, email-events, inapp-events) instead of one

Rationale

Push, email, and in-app have different latency SLAs (200ms, 5s, 30s), worker pool sizes (10, 4, 6), and failure modes. Separate topics enable independent consumer groups with independent scaling, backpressure, and DLQ. A single topic would waste Kafka bandwidth as workers skip irrelevant messages.

Fanout at Worker Level

Choice

API publishes one raw event; FanoutWorker expands per-recipient x per-channel

Rationale

A broadcast to 100K recipients would create 100K+ Kafka messages at API time, overwhelming NotifyService and slowing the API for all callers. Moving expansion to FanoutWorker keeps the API thin (one message per request) and lets dedicated workers handle write amplification at their own pace, rate-limited to 50K expansions/sec.

Scale & Performance

Target RPS

15K+ notifications/sec (API), 50K delivery events/sec

Latency (p99)

<200ms API, <200ms in-app, <5s push, <30s email

Storage

~1 TB/year (PostgreSQL + dual Kafka retention)

Availability

99.9% (per-channel retry, DLQ, replicated components)

Database Schema (HLD)

templates

Stores notification templates with per-channel message bodies containing {{variable}} placeholders. Read-heavy, write-sparse. Fetched by NotifyService on every notification send for rendering.

template_id VARCHAR PKchannel VARCHAR (push, email, inapp)subject VARCHAR (email subject template, nullable)body_template TEXT (message body with {{variables}})

Indexes: PRIMARY KEY (template_id)

Small table (~1K rows). Templates are rendered at API time.

delivery_log

Append-only log tracking delivery status per notification-recipient-channel. Written by PushWorker, EmailWorker, and WebSocketService after each delivery attempt. Indexed by notification_id for status queries.

delivery_id UUID PKnotification_id UUID (indexed)recipient VARCHAR (user ID)channel VARCHAR (push, email, inapp)status VARCHAR (pending, sent, delivered, failed)delivered_at TIMESTAMPTZ (nullable)created_at TIMESTAMPTZ

Indexes: idx_delivery_notification ON (notification_id), idx_delivery_status ON (status, created_at)

Grows at ~15K rows/sec at peak (post-fanout expansion). 2 read replicas for status queries.

presence:{user_id} (PresenceCache / Redis)

Tracks WebSocket connection state per user. Updated on connect, disconnect, and 30-second heartbeat. 60-second TTL auto-expires stale entries. Read by FanoutWorker for presence-aware routing.

online BOOLEAN (connection active)connection_id STRING (WebSocket connection identifier)last_seen TIMESTAMP (last heartbeat time)

~200K keys at peak concurrent connections. ~50 bytes per key. Total ~10MB.

prefs:{user_id} (UserPrefsCache / Redis)

Cached user notification preferences. Read by FanoutWorker during per-recipient expansion. 98% hit rate, 3600s TTL. Invalidated on preference update.

channels ARRAY (enabled channels: push, email, inapp)quiet_hours_start INTEGER (UTC hour, nullable)quiet_hours_end INTEGER (UTC hour, nullable)opt_out BOOLEAN (global opt-out flag)

Key pattern: prefs:{user_id}. ~5M keys, ~200 bytes per key. Total ~1GB.

Event Contracts

raw-notificationsraw-notifications

Carries raw notification events from NotifyService to FanoutWorker. Each event contains rendered content, recipient list (or segment ID for broadcast), and requested channels. 8 partitions, partitioned by notification_id. ~15K events/sec at peak.

Key Schema

notification_id: string (partition key for ordering)

Value Schema

{ notification_id: string, rendered_content: string, recipients: string[], channels: string[], priority: string, template_id: string }

push-eventspush-events

Carries expanded push notification events from FanoutWorker to PushWorker. Each event targets a single recipient. 16 partitions, partitioned by recipient_id. ~8K events/sec at peak.

Key Schema

recipient_id: string (partition key for per-user ordering)

Value Schema

{ notification_id: string, recipient_id: string, channel: 'push', rendered_content: string, title: string, priority: string }

email-eventsemail-events

Carries expanded email events from FanoutWorker to EmailWorker. Each event targets a single recipient. 8 partitions, partitioned by recipient_id. ~5K events/sec at peak.

Key Schema

recipient_id: string (partition key for per-user ordering)

Value Schema

{ notification_id: string, recipient_id: string, channel: 'email', rendered_content: string, subject: string, priority: string }

inapp-eventsinapp-events

Carries expanded in-app notification events from FanoutWorker to WebSocketService. Each event targets a single online recipient. 8 partitions, partitioned by recipient_id. ~7K events/sec at peak.

Key Schema

recipient_id: string (partition key for per-user ordering)

Value Schema

{ notification_id: string, recipient_id: string, channel: 'inapp', rendered_content: string, connection_id: string }

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Synchronous Send)	T1	350-800ms API response	~500 RPS (thread pool ceiling)	$300/month (single DB + 4 pods)	Low — no queue, no workers	98% (no retry, single DB)
Async Pipeline (Priority Queue)	T2	<200ms API (202 Accepted)	10K+ RPS API, 10K delivery/sec	$2,000/month (Kafka + Redis + workers)	Medium — Kafka, Redis, workers	99.9% (retry + DLQ + replication)
Multi-Channel Fanout	T3	<200ms API, <200ms in-app	15K+ RPS API, 50K delivery/sec	$5,000/month (2x Kafka + WebSocket + Redis)	High — 12 components, WebSocket	99.9% (per-channel retry + DLQ)

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

Why add WebSocket when push notifications already work?

Push notifications take 2-5 seconds for delivery via FCM/APN and require the user to NOT be in the app. When a user is actively looking at the application, WebSocket delivers in ~10ms — 500x faster. This is critical for chat-like experiences, real-time alerts, and any notification the user should see immediately. Slack, Discord, and WhatsApp all use WebSocket for in-app delivery, falling back to push only when the user closes the app.

How does presence tracking work and what happens on false positives?

WebSocketService updates a presence:{user_id} key in Redis on every connect, disconnect, and 30-second heartbeat. The key has a 60-second TTL. If the user's browser crashes without a clean disconnect, the key expires after 60 seconds (one missed heartbeat). During this window, FanoutWorker may route to in-app for a user who is no longer online — the WebSocket send will fail silently, and the notification is logged as undelivered. A compensating job can re-route failed in-app notifications to push.

How does the system handle a broadcast to 1 million recipients?

NotifyService publishes a single raw event to RawStream with a segment ID. FanoutWorker resolves the segment to 1M user IDs, checks preferences for each (pipelined Redis MGET), and expands to ~2M channel events (1M users x ~2 channels each). At the 50K expansion/sec rate limit, full expansion takes ~40 seconds. Push delivery follows at pipeline pace. The API returns 202 instantly — the caller does not wait for expansion.

Why not use a single Kafka cluster with partitioned topics?

You can use a single Kafka cluster with separate topics — the 'two Kafka clusters' in the architecture refers to two logical topic groups (raw-notifications and channel-specific topics), not necessarily separate physical clusters. In practice, most deployments use a single MSK cluster with separate topics. The key is that each topic has independent consumer groups, enabling independent scaling and backpressure per channel.

How does this compare to the Async Pipeline variant?

The Async Pipeline variant uses a single Kafka topic with channel routing embedded in the message. All workers consume from the same topic and filter by channel. This simpler design works well for push and email but cannot support WebSocket (which needs a persistent connection server, not a stateless worker). The Multi-Channel Fanout adds: (1) in-app delivery via WebSocket, (2) presence-aware routing, (3) broadcast expansion, and (4) per-channel independent scaling. The trade-off is 12 components vs 8 and higher operational complexity.

What is the WebSocket connection capacity and how does it scale?

WebSocketService runs 6 pods with 200 threads each, supporting ~200K concurrent WebSocket connections. Each connection consumes ~50KB of memory (buffer + state). Total memory: 200K x 50KB = ~10GB across 6 pods. Scaling is horizontal: adding pods increases connection capacity linearly. At 1M concurrent connections, you would need ~30 pods. Connection routing uses consistent hashing on user_id to ensure a user's connections always land on the same pod.

Related Templates

Notification System — Naive (Synchronous Send)Notification System — Async Pipeline (Priority Queue)

Discussion

Ready to design your own Notification System?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator