Design a multi-channel notification platform supporting push, email, SMS, and in-app notifications with priority queues, rate limiting, and user preferences.
A notification system is a critical infrastructure component that every large-scale application needs, yet its design complexity is often underestimated. Unlike a simple message sender, a production notification platform must support multiple delivery channels (push notifications, email, SMS, in-app), respect user preferences and quiet hours, enforce rate limits to prevent notification fatigue, handle delivery failures with retries, and process millions of notifications per hour with varying priority levels.
At the scale of companies like Facebook, Twitter, or Uber, the notification system delivers billions of notifications daily across all channels. Each notification originates from one of dozens of internal services (new message, friend request, payment received, promotional campaign) and must be routed through the appropriate channel based on user preferences, notification type, urgency, and channel availability. A payment confirmation might require SMS + push + email for redundancy, while a promotional notification might only use push with email fallback.
The reliability requirements are stringent: critical notifications (security alerts, payment confirmations, two-factor codes) must have near-100% delivery rates, while marketing notifications can tolerate lower reliability. This creates a natural priority system in which critical notifications bypass rate limits and receive more aggressive retry policies. The system must also handle third-party delivery service outages (APNs, FCM, SendGrid, Twilio) gracefully by queuing messages and retrying when the service recovers.
This template models the complete notification platform: notification ingestion API, preference service, routing engine, priority queue system, channel-specific delivery workers (push, email, SMS, in-app), and delivery tracking service. The simulation demonstrates how priority queues affect delivery latency for critical notifications during traffic spikes, how rate limiting prevents notification fatigue, and how the system degrades gracefully during channel outages.
The notification system architecture follows a pipeline pattern: ingestion, enrichment, routing, queuing, delivery, and tracking. The Notification Ingestion API receives notification requests from internal services. Each request specifies the recipient, notification type, content template, and any dynamic parameters. The API validates the request, deduplicates (preventing the same notification from being sent twice within a time window), and publishes it to the processing pipeline.
The Preference Service enriches each notification with the recipient's channel preferences, quiet hours, and opt-out status. A user might configure: "Send order updates via push and email, promotional content via email only, and never send anything between 10pm and 7am." The service evaluates these rules and determines which channels to use for each notification. Notifications that violate quiet hours are held in a delay queue and dispatched when the window opens.
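The quiet-hours rule can be sketched as below. This is a minimal illustration, not the platform's actual implementation: the function names `in_quiet_hours` and `release_time` are hypothetical, and a fixed UTC offset stands in for a real time zone (a production system would more likely use named zones so that daylight saving time is handled).

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

def in_quiet_hours(now_utc: datetime, user_tz: tzinfo, start: time, end: time) -> bool:
    """Return True if now_utc falls inside the user's local quiet window."""
    local = now_utc.astimezone(user_tz).time()
    if start <= end:
        return start <= local < end       # same-day window, e.g. 13:00-15:00
    return local >= start or local < end  # window wraps midnight, e.g. 22:00-07:00

def release_time(now_utc: datetime, user_tz: tzinfo, end: time) -> datetime:
    """Return the next moment (in UTC) at which the quiet window ends."""
    local_now = now_utc.astimezone(user_tz)
    candidate = local_now.replace(hour=end.hour, minute=end.minute,
                                  second=0, microsecond=0)
    if candidate <= local_now:
        candidate += timedelta(days=1)    # today's end already passed
    return candidate.astimezone(timezone.utc)

# Usage: a user at UTC-5 who wants nothing between 10pm and 7am.
user_tz = timezone(timedelta(hours=-5))   # assumed fixed offset
quiet_start, quiet_end = time(22, 0), time(7, 0)
now = datetime(2024, 1, 15, 4, 30, tzinfo=timezone.utc)  # 23:30 local, Jan 14
held = in_quiet_hours(now, user_tz, quiet_start, quiet_end)
dispatch_at = release_time(now, user_tz, quiet_end)      # 07:00 local, Jan 15
```

A held notification would be placed in the delay queue with `dispatch_at` as its release timestamp.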
The Routing Engine determines the delivery path for each notification-channel pair. For push notifications, it resolves the recipient's device tokens (a user may have multiple devices). For email, it resolves the recipient's email address and selects the appropriate email service provider. For SMS, it determines the phone number and routes to the SMS gateway.
The Priority Queue System is the heart of the architecture. Notifications enter one of several priority lanes: CRITICAL (2FA codes, security alerts), HIGH (payment confirmations, direct messages), NORMAL (social interactions, content updates), and LOW (promotional, digest). Each lane has its own queue with dedicated consumer capacity. Critical notifications are processed immediately even under load, while low-priority notifications are rate-limited to prevent fatigue.
Channel-specific Delivery Workers consume from the queues and interact with external delivery services. The Push Worker sends notifications via APNs (iOS) and FCM (Android). The Email Worker renders HTML templates and sends via SendGrid or SES. The SMS Worker sends via Twilio. Each worker implements circuit breaker patterns — if the external service is down, the circuit opens and messages are re-queued for later delivery rather than dropped.
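A circuit breaker for a delivery worker might look roughly like the sketch below. The class, the threshold of 5 failures, and the 30-second reset timeout are assumptions made for illustration. The half-open state is simplified: once the timeout passes, requests are allowed until one of them fails again.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self):
        if self.opened_at is None:
            return True
        # After the timeout, let probe requests through (half-open).
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed probe

def deliver(task, send, breaker, requeue):
    """Send one task, re-queuing instead of dropping when the provider is unhealthy."""
    if not breaker.allow_request():
        requeue(task)          # circuit open: do not call the provider at all
        return "requeued"
    try:
        send(task)
    except Exception:
        breaker.record_failure()
        requeue(task)
        return "requeued"
    breaker.record_success()
    return "sent"
```

Each external provider (APNs, FCM, SendGrid, Twilio) would get its own breaker instance so that an outage in one does not hold back the others.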
The Delivery Tracking Service records the lifecycle of each notification: created, queued, sent, delivered, opened, clicked. This data feeds dashboards for ops monitoring and provides delivery receipts back to the originating service. It also enables A/B testing of notification content and timing strategies.
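The lifecycle can be modeled as an ordered list of states in which a notification only moves forward. The sketch below is an in-memory illustration; the `DeliveryTracker` class and the rule that late or duplicate events are ignored are assumptions, and a real tracker would persist events to a datastore.

```python
LIFECYCLE = ["created", "queued", "sent", "delivered", "opened", "clicked"]
RANK = {status: index for index, status in enumerate(LIFECYCLE)}

class DeliveryTracker:
    def __init__(self):
        self._history = {}  # notification_id -> list of (status, timestamp)

    def record(self, notification_id, status, timestamp):
        """Append a lifecycle event; return False if it is stale or a duplicate."""
        if status not in RANK:
            raise ValueError(f"unknown status: {status}")
        history = self._history.setdefault(notification_id, [])
        if history and RANK[status] <= RANK[history[-1][0]]:
            return False  # e.g. a late "queued" arriving after "sent"
        history.append((status, timestamp))
        return True

    def status(self, notification_id):
        history = self._history.get(notification_id)
        return history[-1][0] if history else None

# Usage
tracker = DeliveryTracker()
tracker.record("n1", "queued", 100.0)
tracker.record("n1", "sent", 100.4)
accepted = tracker.record("n1", "queued", 100.9)  # out-of-order event is ignored
```

Forward jumps (for example "sent" straight to "opened") are allowed here because some channels never report an explicit delivery receipt.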
The notification system is a multi-channel delivery pipeline that transforms a single notification request into one or more delivery attempts across push, email, SMS, and in-app channels. The pipeline is designed around two core principles: never drop a notification (critical messages like 2FA codes must be delivered), and never overwhelm a user (promotional notifications are rate-limited and respect quiet hours).
The Preference Service is the enrichment layer that turns a generic notification into a personalized delivery plan. It resolves the recipient's channel preferences ("push + email, no SMS"), checks quiet hours ("do not disturb 10pm-7am in the user's timezone"), and verifies opt-out status per notification category. The Routing Engine then resolves concrete endpoints for each enabled channel, such as device tokens (a single user may have 3 devices registered for push). The output is a set of delivery tasks, each targeting a specific channel and endpoint.
The Priority Queue System is the traffic shaper. CRITICAL notifications (2FA, security alerts) bypass rate limiting entirely and are processed immediately even under heavy load. HIGH (payments, direct messages), NORMAL (social), and LOW (promotional) each have dedicated consumer capacity with progressively lower throughput limits. Under load shedding, LOW is paused first while CRITICAL continues uninterrupted. Circuit breakers on each delivery worker prevent cascade failures when an external provider (APNs, SendGrid, Twilio) is degraded — messages are re-queued rather than dropped.
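The lane assignment and shedding order described above can be written down as a small lookup. The type names, the default lane for unknown types, and the `shed_level` parameter are assumptions made for this sketch.

```python
from enum import IntEnum

class Lane(IntEnum):
    CRITICAL = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3

# Hypothetical notification type names mapped to the lanes named in the text.
LANE_BY_TYPE = {
    "two_factor_code": Lane.CRITICAL,
    "security_alert": Lane.CRITICAL,
    "payment_confirmation": Lane.HIGH,
    "direct_message": Lane.HIGH,
    "social_interaction": Lane.NORMAL,
    "content_update": Lane.NORMAL,
    "promotional": Lane.LOW,
    "digest": Lane.LOW,
}

def lane_for(notification_type: str) -> Lane:
    # Unknown types default to NORMAL (an assumption, not stated in the text).
    return LANE_BY_TYPE.get(notification_type, Lane.NORMAL)

def lanes_to_pause(shed_level: int) -> list[Lane]:
    """Return the lanes to pause at a given load-shedding level (0 = none)."""
    sheddable = [Lane.LOW, Lane.NORMAL, Lane.HIGH]  # CRITICAL is never paused
    return sheddable[:max(0, shed_level)]
```

For example, `lanes_to_pause(1)` pauses only LOW, and no level ever includes CRITICAL.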
Step-by-Step Walkthrough
Pseudocode
// Notification pipeline: ingestion to multi-channel delivery.
// redis, preferenceService, routingEngine, priorityQueue, deliveryTracker,
// apns, fcm, circuitBreaker, and the helpers hash, uuid, renderTemplate,
// scheduleForLater, and requeue are assumed to be provided elsewhere.
async function sendNotification(userId, template, data, priority) {
  // 1. Deduplicate (idempotent). A single set-if-absent call is atomic, so
  //    two concurrent requests cannot both pass a separate get-then-set check.
  const idempotencyKey = hash(userId, template, data);
  const isFirst = await redis.set(`dedup:${idempotencyKey}`, "1", { nx: true, ttl: 3600 });
  if (!isFirst) {
    return { status: "duplicate", skipped: true };
  }

  // 2. Enrich with user preferences
  const prefs = await preferenceService.resolve(userId, template.category);
  if (prefs.optedOut) {
    return { status: "opted_out" };
  }
  if (prefs.inQuietHours && priority !== "CRITICAL") {
    return scheduleForLater(userId, template, data, prefs.quietHoursEnd);
  }

  // 3. Route to channels: one task per (channel, endpoint) pair
  const tasks = [];
  for (const channel of prefs.enabledChannels) {
    const endpoints = await routingEngine.resolve(userId, channel);
    for (const endpoint of endpoints) {
      tasks.push({
        channel, endpoint, template, data, priority,
        notificationId: uuid()
      });
    }
  }

  // 4. Enqueue by priority
  for (const task of tasks) {
    await priorityQueue[task.priority].enqueue(task);
    await deliveryTracker.record(task.notificationId, "queued");
  }
  return { status: "queued", taskCount: tasks.length };
}

// Push worker with circuit breaker
async function deliverPush(task) {
  const platform = task.endpoint.platform;
  // While the circuit is open, hold the task without calling the provider.
  if (circuitBreaker.isOpen(platform)) {
    await requeue(task, { backoff: "exponential" }); // Don't drop
    return;
  }
  try {
    const rendered = renderTemplate(task.template, task.data);
    if (platform === "ios") {
      await apns.send(task.endpoint.deviceToken, rendered);
    } else {
      await fcm.send(task.endpoint.deviceToken, rendered);
    }
    // Provider acceptance means "sent"; "delivered" requires a delivery receipt.
    await deliveryTracker.record(task.notificationId, "sent");
  } catch (error) {
    circuitBreaker.recordFailure(platform);
    await requeue(task, { backoff: "exponential", baseDelaySeconds: 1 });
  }
}
Choice
Separate priority lanes with dedicated consumer pools
Rationale
A single queue with priority reordering can still block critical notifications behind a large batch of low-priority messages (head-of-line blocking). Separate queues with dedicated consumers guarantee that critical notifications are processed independently of other traffic. Each lane's consumer pool is scaled according to its SLA: critical notifications get over-provisioned consumers to ensure sub-second processing even during traffic spikes.
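The effect of separate lanes can be demonstrated with a small `asyncio` simulation. This is a toy sketch under stated assumptions: one worker per lane, `asyncio.sleep(0)` standing in for the provider call, and in-process queues instead of a real message broker.

```python
import asyncio

LANES = ("CRITICAL", "HIGH", "NORMAL", "LOW")

async def lane_worker(lane, queue, completed):
    """Dedicated consumer: only ever reads from its own lane's queue."""
    while True:
        task = await queue.get()
        await asyncio.sleep(0)  # stand-in for the provider call; yields control
        completed.append((lane, task))
        queue.task_done()

async def run_demo(low_backlog=1000):
    queues = {lane: asyncio.Queue() for lane in LANES}
    completed = []
    workers = [asyncio.create_task(lane_worker(lane, queue, completed))
               for lane, queue in queues.items()]
    # A large promotional batch arrives first, then one critical notification.
    for i in range(low_backlog):
        queues["LOW"].put_nowait(f"promo-{i}")
    queues["CRITICAL"].put_nowait("2fa-code")
    await asyncio.gather(*(queue.join() for queue in queues.values()))
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return completed

completed = asyncio.run(run_demo())
position = completed.index(("CRITICAL", "2fa-code"))
```

Even though the critical notification was enqueued after the whole promotional backlog, `position` lands near the front of the completion order because its consumer never waits behind the LOW queue.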
Choice
Per-user, per-channel, per-notification-type rate limits with token bucket
Rationale
Notification fatigue is the primary reason users disable notifications entirely, which degrades the value of the channel for all notification types. Rate limiting at multiple granularities prevents this: per-user limits cap total notifications (e.g., max 10 push notifications per hour), per-channel limits prevent channel saturation, and per-type limits prevent a single notification category from dominating. Token bucket allows burst capacity while maintaining average rate compliance.
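A token bucket checked at several granularities might be sketched as follows. The class names, the in-memory bucket store, and the limit values are illustrative assumptions; a distributed deployment would keep bucket state in a shared store.

```python
import time

class TokenBucket:
    """Holds up to `capacity` tokens and refills at `refill_per_second`."""
    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full so bursts are allowed
        self.updated_at = clock()

    def available(self):
        now = self.clock()
        elapsed = now - self.updated_at
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        return self.tokens

    def take(self):
        self.tokens -= 1

class NotificationRateLimiter:
    def __init__(self, limits, clock=time.monotonic):
        self.limits = limits  # scope name -> (capacity, refill_per_second)
        self.clock = clock
        self.buckets = {}

    def _bucket(self, scope, key):
        if (scope, key) not in self.buckets:
            capacity, rate = self.limits[scope]
            self.buckets[(scope, key)] = TokenBucket(capacity, rate, self.clock)
        return self.buckets[(scope, key)]

    def allow(self, user_id, channel, notification_type, priority="NORMAL"):
        if priority == "CRITICAL":
            return True  # critical notifications bypass rate limiting
        buckets = [
            self._bucket("user", user_id),
            self._bucket("channel", (user_id, channel)),
            self._bucket("type", (user_id, notification_type)),
        ]
        # Spend tokens only if every granularity has one available.
        if all(bucket.available() >= 1 for bucket in buckets):
            for bucket in buckets:
                bucket.take()
            return True
        return False
```

A limit of 10 per hour corresponds to `capacity=10` and `refill_per_second=10 / 3600`: a user can receive a burst of 10, after which roughly one more is allowed every six minutes.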
Choice
Server-side template rendering with channel-specific formatters
Rationale
Notification content must be adapted for each channel: push notifications have strict character limits and require concise copy, emails support rich HTML with images and CTAs, SMS is plain text with URL shortening. A shared template engine with channel-specific formatters ensures consistent messaging while optimizing for each channel's constraints and capabilities.
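One way to picture a shared template with channel-specific formatters is the sketch below. The 120-character push budget, the `shorten` callback, and the email markup are assumptions for illustration; actual limits depend on the platform and provider.

```python
import html
from string import Template

PUSH_MAX_CHARS = 120  # assumed budget; real limits vary by platform

def render(template: str, params: dict) -> str:
    """Fill the shared template once; formatters adapt the result per channel."""
    return Template(template).substitute(params)

def format_push(text: str, url: str) -> str:
    if len(text) <= PUSH_MAX_CHARS:
        return text
    return text[:PUSH_MAX_CHARS - 1].rstrip() + "…"  # truncate to the budget

def format_sms(text: str, url: str, shorten=lambda u: u) -> str:
    # `shorten` is a placeholder for a URL-shortening service call.
    return f"{text} {shorten(url)}"

def format_email(text: str, url: str) -> str:
    body = html.escape(text)
    link = html.escape(url, quote=True)
    return f'<p>{body}</p><p><a href="{link}">View details</a></p>'

FORMATTERS = {"push": format_push, "sms": format_sms, "email": format_email}

# Usage
text = render("Your order $order_id has shipped.", {"order_id": "A123"})
url = "https://example.com/orders/A123"
messages = {channel: fmt(text, url) for channel, fmt in FORMATTERS.items()}
```

Because every channel starts from the same rendered `text`, the wording stays consistent while each formatter applies its own constraints.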
Choice
Circuit breaker with exponential backoff and dead letter queue
Rationale
External delivery services (APNs, FCM, Twilio, SendGrid) experience periodic outages. Circuit breakers prevent overwhelming a degraded service with retries. Exponential backoff spaces out retry attempts to give the service time to recover. A dead letter queue captures notifications that exceed the maximum retry count, enabling manual investigation and batch re-processing after the outage resolves.
Choice
Idempotency key with sliding window (1-hour TTL in Redis)
Rationale
Duplicate notifications are a common source of user complaints. Internal services may accidentally send the same notification twice due to retries or race conditions. A Redis-based deduplication layer checks each incoming notification against a sliding window of recently processed idempotency keys. Duplicates within the window are silently dropped. The 1-hour TTL balances deduplication coverage against memory cost.
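The deduplication check can be sketched with an in-memory store that mimics a key with a TTL. The names `idempotency_key` and `Deduplicator` are hypothetical. With Redis, the same check is typically a single `SET key value NX EX 3600` command, which is atomic. Note that in this sketch the TTL starts when a key is first seen, so each key has a fixed one-hour window.

```python
import hashlib
import json
import time

def idempotency_key(user_id, template_id, data):
    """Stable hash of the request; sort_keys makes dict ordering irrelevant."""
    payload = json.dumps([user_id, template_id, data],
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class Deduplicator:
    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._expires_at = {}  # key -> expiry time

    def first_seen(self, key):
        """Return True the first time a key is seen within the TTL window."""
        now = self.clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the window: drop it
        self._expires_at[key] = now + self.ttl_seconds
        return True

# Usage
dedup = Deduplicator()
key = idempotency_key("u1", "order_shipped", {"order_id": "A123", "items": 2})
first = dedup.first_seen(key)   # processed
second = dedup.first_seen(key)  # retry of the same request is skipped
```

A real store would also evict expired keys; here they are simply overwritten on the next sighting.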
Target RPS: 50K notifications/s
Latency (p99): <1s (critical), <30s (normal)
Storage: ~200 GB/year (delivery logs)
Availability: 99.99%
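A quick back-of-envelope check shows how the throughput and storage targets relate. The 200-byte log record size is an assumption introduced for this sketch, not a figure from the template.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

TARGET_RATE = 50_000              # notifications per second (from the targets)
LOG_STORAGE_BYTES = 200 * 10**9   # ~200 GB/year (from the targets)
ASSUMED_RECORD_BYTES = 200        # assumption: one compact log record

def notifications_per_year(rate_per_second):
    return rate_per_second * SECONDS_PER_YEAR

# If 50K/s were sustained all year, how many bytes per notification fit?
bytes_per_notification = LOG_STORAGE_BYTES / notifications_per_year(TARGET_RATE)

# Conversely, what average rate does the storage budget support at 200 B each?
implied_average_rate = LOG_STORAGE_BYTES / ASSUMED_RECORD_BYTES / SECONDS_PER_YEAR

print(f"{notifications_per_year(TARGET_RATE):.3e} notifications/year at 50K/s")
print(f"{bytes_per_notification:.3f} bytes per notification")
print(f"{implied_average_rate:.1f} notifications/s average at 200 B/record")
```

Under these assumptions the storage figure is consistent with the 50K/s target only if that target is a peak rate, or if delivery logs are sampled, aggregated, or kept for a short retention period.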
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
A multi-channel notification system uses a routing engine that evaluates each notification against the recipient's channel preferences and determines which channels to use. The notification is then forked into channel-specific delivery paths, each with its own queue, formatter, and delivery worker. A shared delivery tracking service records the status across all channels, enabling the system to implement fallback logic (e.g., send SMS if push delivery fails).
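The fallback logic mentioned above can be sketched as an ordered list of channels tried in turn. The function name, the `senders` mapping, and the tracker event shape are assumptions for illustration.

```python
def deliver_with_fallback(notification, channels, senders, tracker):
    """Try each channel in order; return the first that accepts the message."""
    for channel in channels:
        try:
            senders[channel](notification)
        except Exception as error:
            tracker.append((channel, "failed", str(error)))
            continue  # fall through to the next channel
        tracker.append((channel, "sent", ""))
        return channel
    return None  # every channel failed; the caller decides what happens next

# Usage with stand-in senders: push is down, SMS works.
def push_sender(notification):
    raise ConnectionError("push provider unavailable")

def sms_sender(notification):
    pass  # pretend the SMS gateway accepted the message

events = []
used = deliver_with_fallback(
    {"user_id": "u1", "text": "Payment received"},
    channels=["push", "sms"],
    senders={"push": push_sender, "sms": sms_sender},
    tracker=events,
)
```

Here `used` is `"sms"`, and `events` records the failed push attempt followed by the successful SMS.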
Notification fatigue prevention uses a multi-layered rate limiting strategy: (1) Per-user rate limits cap total notifications per time window. (2) Per-channel limits prevent any single channel from being overused. (3) Per-type limits prevent one category from dominating. (4) Smart batching combines related notifications into digests (e.g., '5 people liked your post' instead of 5 separate notifications). (5) ML-based send-time optimization delivers notifications when the user is most likely to engage.
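Smart batching, item (4) above, can be illustrated with a small grouping function. The event fields and the `VERBS` table are hypothetical; a real system would also bound the batching window in time.

```python
from collections import defaultdict

VERBS = {"like": "liked", "comment": "commented on"}  # hypothetical event types

def build_digests(events):
    """Collapse events with the same recipient, type, and target into one message."""
    groups = defaultdict(list)
    for event in events:
        key = (event["user_id"], event["type"], event["target"])
        groups[key].append(event["actor"])

    digests = []
    for (user_id, event_type, target), actors in groups.items():
        unique_actors = list(dict.fromkeys(actors))  # de-duplicate, keep order
        verb = VERBS[event_type]
        if len(unique_actors) == 1:
            text = f"{unique_actors[0]} {verb} your {target}"
        else:
            text = f"{len(unique_actors)} people {verb} your {target}"
        digests.append({"user_id": user_id, "text": text})
    return digests

# Usage: five likes and one comment become two notifications instead of six.
events = [
    {"user_id": "u1", "type": "like", "target": "post", "actor": name}
    for name in ["Ana", "Ben", "Chi", "Dev", "Eli"]
]
events.append({"user_id": "u1", "type": "comment", "target": "post", "actor": "Ana"})
digests = build_digests(events)
```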
Delivery failures are handled with a circuit breaker and retry pattern. When an external service (APNs, Twilio) returns an error or times out, the notification is re-queued with exponential backoff (1s, 2s, 4s, 8s, etc.). If the service appears to be down (consecutive failures exceed a threshold), the circuit breaker opens and all notifications for that channel are held in the queue without making requests to the degraded service. Notifications that exceed the maximum retry count are moved to a dead letter queue for manual review.
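The retry schedule and dead letter hand-off can be sketched as below. The 60-second cap, the default of 5 retries, and the task dictionary shape are assumptions. Production systems often add random jitter to the delay, which is omitted here to keep the schedule easy to read.

```python
def backoff_delay(attempt, base_seconds=1.0, cap_seconds=60.0):
    """Delay before retry number `attempt` (0-based): 1s, 2s, 4s, 8s, ... capped."""
    return min(cap_seconds, base_seconds * 2 ** attempt)

def handle_failure(task, retry_queue, dead_letter_queue, max_retries=5):
    """Re-queue a failed task with backoff, or dead-letter it after max_retries."""
    attempts = task.get("attempts", 0)
    if attempts >= max_retries:
        dead_letter_queue.append(task)  # kept for manual review and replay
        return None
    delay = backoff_delay(attempts)
    retry_queue.append(({**task, "attempts": attempts + 1}, delay))
    return delay

# Usage: a task that keeps failing walks the schedule, then is dead-lettered.
retry_queue, dead_letter_queue = [], []
task = {"notificationId": "n1", "channel": "sms"}
delays = []
while True:
    delay = handle_failure(task, retry_queue, dead_letter_queue)
    if delay is None:
        break
    delays.append(delay)
    task, _ = retry_queue.pop()
```

A shorter `base_seconds` for critical notifications would give them the more aggressive retry policy described for high-priority traffic.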
Push notifications are delivered via platform-specific services (APNs for iOS, FCM for Android) and appear on the device's lock screen or notification center even when the app is not open. In-app notifications are displayed within the application UI when the user is actively using the app, stored in the app's notification inbox. Emails are delivered via email providers (SendGrid, SES) to the user's inbox. Each channel has different reach, urgency, and content capabilities — push for immediate alerts, in-app for contextual updates, email for detailed information.
Critical notifications bypass normal rate limits and are processed through a dedicated high-priority queue with over-provisioned consumer capacity. The priority queue system assigns each notification to a lane (CRITICAL, HIGH, NORMAL, LOW) based on its type. Critical consumers are scaled to handle peak load with sub-second processing time. Additionally, critical notifications use the fastest available channel (SMS or push) and implement aggressive retry policies with shorter backoff intervals to maximize delivery reliability.