The simplest notification system: the service directly calls third-party APIs (FCM, SendGrid) synchronously and waits for the response before returning. No queue, no workers, no retry coordination. Demonstrates why synchronous delivery fails under spike traffic.
Designing a notification system is one of the most commonly asked system design interview questions because it touches on asynchronous processing, multi-channel delivery, retry strategies, and priority management. The naive synchronous approach is the natural starting point — it establishes the baseline that makes the improvements in async pipeline architectures measurable and concrete. Companies like Twilio, SendGrid, Slack, OneSignal, and virtually every mobile-first platform ask this question because notification delivery is a core infrastructure challenge.
The core challenge is delivering notifications across multiple channels (push, email, SMS) reliably and at scale. In the naive approach, the NotifyService receives a notification request, validates the payload, calls the appropriate third-party API (Firebase Cloud Messaging for push notifications, SendGrid for email) synchronously, waits for the response, records the delivery status in PostgreSQL, and returns the result to the caller. This is the simplest possible implementation — no message queues, no background workers, no retry coordination. The caller sends a request and gets back a definitive answer: the notification was delivered, or it was not.
The synchronous delivery model's fatal flaw is thread pool saturation. Each notification request holds a service thread for 200-500 milliseconds while waiting for the third-party API to respond. FCM calls average 200ms; SendGrid calls average 500ms. With 4 pods running 50 threads each (200 threads total) and a 300ms average hold time, the theoretical ceiling is approximately 666 requests per second. In practice, the ceiling is lower (~500 RPS) due to thread scheduling overhead and the occasional slow third-party response (p99 can reach 2 seconds). Above this threshold, requests queue behind occupied threads, and API latency climbs from 350ms to multiple seconds. The system does not fail gracefully — it degrades continuously, with every pending request contributing to the backlog.
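The ceiling quoted above is thread count divided by average hold time. A minimal sketch of that arithmetic follows; the function name `throughput_ceiling` is illustrative, not part of the template.

```python
def throughput_ceiling(threads: int, avg_hold_seconds: float) -> float:
    """Upper bound on requests/second when each request pins one thread."""
    if threads <= 0 or avg_hold_seconds <= 0:
        raise ValueError("threads and avg_hold_seconds must be positive")
    return threads / avg_hold_seconds


pods, threads_per_pod = 4, 50
total_threads = pods * threads_per_pod              # 200 threads

# Blended 300ms hold time from the text: 200 / 0.3 is roughly 666.7 RPS
blended = throughput_ceiling(total_threads, 0.300)

# Single-channel extremes, using the per-provider averages from the text
push_only = throughput_ceiling(total_threads, 0.200)    # FCM at ~200ms
email_only = throughput_ceiling(total_threads, 0.500)   # SendGrid at ~500ms

print(f"blended={blended:.1f} push_only={push_only:.1f} email_only={email_only:.1f}")
```

The channel mix matters: a workload that is mostly email sits much closer to the 500ms bound than the blended figure suggests.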
The second critical weakness is the lack of retry coordination. Third-party APIs experience transient failures: FCM returns 503 during capacity issues, SendGrid rate-limits at high volume, and APNs tokens expire and need refresh. In the synchronous model, these failures are returned directly to the caller as 502 errors. The notification is lost unless the caller implements its own retry logic with idempotency keys, exponential backoff, and maximum retry limits. This pushes retry complexity to every upstream service that sends notifications, violating the single-responsibility principle and leading to inconsistent retry behavior across callers. The Async Pipeline variant centralizes retry with exponential backoff and dead-letter queues, ensuring uniform delivery guarantees regardless of which service triggered the notification.
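To make the burden on callers concrete, here is a rough sketch of the retry wrapper each upstream service would have to write for itself. `TransientError`, `send_with_retry`, and the default delays are all hypothetical. The `send` callable is assumed to carry the same idempotency key on every attempt.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for a retryable failure (e.g. a 502 from NotifyService)."""


def send_with_retry(
    send: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it succeeds or max_attempts is exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up: the notification is lost unless handled upstream
            # Exponential backoff with full jitter: wait 0..min(max_delay, base * 2^(attempt-1))
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Every caller that copies this pattern picks its own limits and delays, which is the inconsistency the paragraph above describes.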
The third weakness is cascading failures during third-party outages. If SendGrid goes down for 5 minutes, every email notification request during that window fails with a 502 error and occupies a thread for the full timeout duration (typically 5-10 seconds). This thread exhaustion cascades to push notifications too, since they share the same thread pool. A single-channel outage becomes a total system outage — users cannot receive push notifications because all threads are stuck waiting for email timeouts. The Async Pipeline variant uses Kafka to buffer notifications during outages, delivering them when the third-party recovers, and separate worker pools per channel isolate failures.
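A back-of-the-envelope sketch shows how quickly a failing channel can pin a shared pool. The function name and the 100 RPS email rate in the example are assumptions for illustration. The model ignores threads already held by healthy traffic, so real exhaustion would come sooner.

```python
from typing import Optional


def seconds_until_pool_exhausted(
    threads: int, failing_rps: float, timeout_seconds: float
) -> Optional[float]:
    """Time for a failing channel to occupy every thread in a shared pool.

    Each failing request holds one thread for timeout_seconds, so the failing
    channel occupies at most failing_rps * timeout_seconds threads at once.
    Returns None when that steady-state occupancy stays below the pool size.
    """
    if failing_rps <= 0:
        return None
    steady_state_occupancy = failing_rps * timeout_seconds
    if steady_state_occupancy < threads:
        return None
    return threads / failing_rps


# Assumed example: 100 email RPS against a dead provider with a 5-second timeout
exhausted_after = seconds_until_pool_exhausted(200, 100, 5.0)
print(exhausted_after)  # 200 threads / 100 RPS = 2.0 seconds
```

With these assumed numbers the pool is gone long before the first timeout fires, which is why push traffic fails along with email.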
This template makes these failure modes visible and quantifiable. Run the simulation at increasing RPS and watch thread pool utilization climb toward 100%. Inject a simulated third-party outage and observe how the entire service degrades within seconds. The comparison with the Async Pipeline and Multi-Channel Fanout variants provides the concrete numbers to support the discussion of sync-vs-async trade-offs that interviewers expect. Understanding why synchronous delivery fails is the first step toward designing production-grade notification infrastructure.
The naive notification system is a five-component linear architecture: Client, API Gateway, Load Balancer, NotifyService, and PostgreSQL database. There is no message queue, no background workers, no cache layer, and no separation between the notification acceptance path and the delivery path. This simplicity is both the strength and the weakness of the design.
All traffic enters through the API Gateway (Amazon API Gateway, REST mode), which authenticates service-to-service JWT tokens (~3ms) and enforces per-service rate limits (2K RPS cap). Authenticated requests are forwarded to the Load Balancer (AWS Application Load Balancer), which distributes across NotifyService pods using round-robin. The Load Balancer supports 5K RPS — well above the system's actual ceiling, which is determined by thread pool saturation in NotifyService. Both notification sends and status queries flow through the same gateway, LB, and service — there is no CQRS split, no separate read/write paths, and no routing by operation type.
NotifyService is a stateless REST API running on 4 AWS ECS Fargate pods (2 vCPU, 4 GB each) with 50 threads per pod (200 total threads). It handles two operations: (1) notification send — validate the request payload, determine the target third-party API based on channel type, call FCM (for push, ~200ms) or SendGrid (for email, ~500ms) synchronously, wait for the complete response, record delivery status in PostgreSQL, and return the result to the caller; (2) status query — read delivery records from PostgreSQL by notification_id and return the current status. The critical bottleneck is the send operation: each thread is occupied for the full duration of the third-party call plus the database write, making thread count divided by per-request hold time the hard throughput ceiling. No connection pooling to third-party APIs is implemented — each request opens a fresh HTTP connection.
PostgreSQL (Amazon RDS, db.r7g.large) stores a single table: notifications. Each notification send results in an INSERT containing the notification_id (UUID primary key), channel, recipient, subject, body, status (sent or failed), error_message (on failure), created_at, and delivered_at timestamps. Status queries use a B-tree index on notification_id for O(log N) lookups. At low volume (under 500 writes per minute), the database is not a bottleneck — it handles both reads and writes with substantial headroom. A single read replica serves status queries to avoid loading the primary with read traffic. The table grows at approximately 30 MB per day at 500 notifications per minute, including index overhead.
The system has minimal redundancy. There is no cache layer — every status query hits PostgreSQL directly. There is no message queue — every notification is delivered synchronously in the request path. There is no deduplication mechanism — if a client retries a timed-out request, the recipient may receive duplicate notifications because the service has no way to check whether the first attempt succeeded. There is no user preference checking — the caller is entirely responsible for determining the correct channel, verifying the recipient has not opted out, and respecting quiet hours. These missing features are exactly what the Async Pipeline variant adds.
The concrete scaling ceiling is approximately 500 RPS sustained. At this point, 200 threads with a 300ms average hold time support 666 theoretical RPS, but thread scheduling overhead, garbage collection pauses, and occasional slow third-party responses (p99 reaching 2 seconds) reduce the practical limit. Above 500 RPS, thread pool utilization hits 100%, incoming requests queue behind occupied threads, and p99 latency climbs from 800ms to 5+ seconds. The system does not fail gracefully — it degrades continuously as more requests pile up, and there is no backpressure mechanism to reject excess load cleanly.
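The continuous degradation can be approximated with a simple fluid model: backlog grows at the gap between offered load and capacity, and a new arrival waits for that backlog to drain. This is a sketch under strong assumptions (unbounded queue, no client timeouts, constant rates). The function name is illustrative.

```python
def queue_wait_after(offered_rps: float, capacity_rps: float, seconds: float) -> float:
    """Approximate queueing delay (s) for a request arriving after `seconds` of load.

    Backlog grows at (offered - capacity) requests per second and drains at
    capacity_rps, so the wait is backlog / capacity.
    """
    if capacity_rps <= 0:
        raise ValueError("capacity_rps must be positive")
    backlog = max(0.0, offered_rps - capacity_rps) * seconds
    return backlog / capacity_rps


# 600 RPS offered against the ~500 RPS practical ceiling, sustained for 30 seconds
wait = queue_wait_after(600, 500, 30)
print(wait)  # (100 * 30) / 500 = 6.0 seconds of queueing before the send even starts
```

Because nothing sheds load, the wait keeps growing for as long as the overload lasts.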
This sequence diagram shows the synchronous notification delivery flow. The critical insight is the thread hold time: NotifyService blocks for 200-500ms per request waiting for FCM or SendGrid. With 200 threads, this limits throughput to ~666 RPS. Compare with the Async Pipeline variant where the API returns 202 in ~50ms.
Step-by-Step Walkthrough
Pseudocode
// Synchronous notification send: the request thread blocks on the third-party call
async function sendNotification(channel, recipient, body):
    notification_id = generateUUID()

    // Call the third-party API synchronously (200-500ms)
    if channel == "push":
        result = await fcm.send(recipient, body)        // ~200ms
    else if channel == "email":
        result = await sendgrid.send(recipient, body)   // ~500ms
    else:
        // Without this branch, result would be undefined for an unknown channel
        return { status: 400, error: "unsupported channel" }

    // Record delivery status
    await db.execute(
        "INSERT INTO notifications (notification_id, channel, recipient,
                                    body, status, created_at, delivered_at)
         VALUES ($1, $2, $3, $4, $5, now(), now())",
        [notification_id, channel, recipient, body, result.status]
    )  // ~30ms

    return { status: 200, notification_id, delivery: result.status }
// Total: ~350ms (push) or ~550ms (email)
Choice
Call FCM/SendGrid directly in the API request path and wait for the response
Rationale
Synchronous delivery is the simplest approach — no queues, no workers, no async state. The caller gets immediate confirmation of success or failure. The trade-off is that every request holds a thread for 200-500ms, making the thread pool the throughput bottleneck. At 200 threads and 300ms average hold, the ceiling is ~666 RPS.
Choice
No Kafka, no SQS, no async processing pipeline
Rationale
A message queue would decouple the API from delivery, allowing the API to return 202 in ~50ms while workers deliver in the background. The naive approach skips this for simplicity. Adding a queue is the core insight of the Async Pipeline variant and the first optimization interviewers expect candidates to propose.
Choice
Third-party failures are returned directly to the caller
Rationale
Retry coordination requires idempotency keys, retry state storage, exponential backoff logic, and dead-letter handling. For internal tooling with low volume, transient failures are rare and can be retried manually. The Async Pipeline variant adds retry coordination for production workloads.
Choice
One table for all notification records and delivery status
Rationale
A single PostgreSQL instance with one table (notifications) is the simplest storage model. At low volume (under 500 writes/min), the database handles both the INSERT on send and the SELECT for status queries without issue. The database grows at ~30 MB/day — years of headroom on a single instance.
Choice
Caller is responsible for channel selection and opt-out checks
Rationale
The naive approach does not check user preferences (opt-out, quiet hours, preferred channels) — it sends whatever the caller requests. This simplifies the service but pushes preference logic to every upstream caller. The Async Pipeline variant adds Redis-backed preference checking for centralized opt-out enforcement.
Target RPS
~500 sustained (thread pool ceiling)
Latency (p50 / p99)
350ms / 800ms (dominated by third-party call)
Storage
~30 MB/day at 500 notifications/min
Availability
~99% (single DB, no failover, no retry)
| Operation | Time | Space | Notes |
|---|---|---|---|
| Send notification (POST /api/v1/notifications) | O(1) — single third-party call + single DB INSERT | O(1) per notification (~1KB row) | Latency is dominated by third-party response time (200-500ms), not computation. |
| Check status (GET /api/v1/notifications/{id}/status) | O(log N) — B-tree index lookup on notification_id | O(1) per query | N = total notification records. Sub-10ms at any reasonable scale. |
Stores every notification request and its delivery outcome. Written on every send attempt (INSERT). Read for status queries. Indexed by notification_id for lookups. Grows at ~30 MB/day at 500 notifications/min. No partitioning needed at this scale.
Indexes: idx_notifications_id ON (notification_id)
Single table, single primary instance. At 500 writes/min, the table grows ~1 GB/month with indexes. B-tree on notification_id supports status lookups.
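The table described above could look roughly like the following. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere. The column types, the `CHECK` constraint, and the helper names are assumptions. In PostgreSQL, `notification_id` would be a `UUID` column and the timestamps would be `timestamptz`.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notifications (
        notification_id TEXT PRIMARY KEY,   -- primary key is backed by a B-tree index
        channel         TEXT NOT NULL,      -- 'push' or 'email'
        recipient       TEXT NOT NULL,
        subject         TEXT,
        body            TEXT NOT NULL,
        status          TEXT NOT NULL CHECK (status IN ('sent', 'failed')),
        error_message   TEXT,               -- populated only on failure
        created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        delivered_at    TEXT                -- NULL when delivery failed
    )
    """
)


def record_delivery(channel, recipient, body, status, error_message=None):
    """INSERT one row per send attempt and return its notification_id."""
    notification_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO notifications "
        "(notification_id, channel, recipient, body, status, error_message, delivered_at) "
        "VALUES (?, ?, ?, ?, ?, ?, CASE WHEN ? = 'sent' THEN CURRENT_TIMESTAMP END)",
        (notification_id, channel, recipient, body, status, error_message, status),
    )
    return notification_id


def get_status(notification_id):
    """Status query: a single indexed lookup by notification_id."""
    row = conn.execute(
        "SELECT status FROM notifications WHERE notification_id = ?",
        (notification_id,),
    ).fetchone()
    return row[0] if row else None


sent_id = record_delivery("push", "device-token-123", "hello", "sent")
failed_id = record_delivery("email", "user@example.com", "hello", "failed", "provider timeout")
print(get_status(sent_id), get_status(failed_id))
```

Unlike the pseudocode earlier, this sketch leaves `delivered_at` empty for failed sends. That is a design choice of the example, not something the template specifies.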
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Synchronous Send) | T1 | 350-800ms API response | ~500 RPS (thread pool ceiling) | $300/month (single DB + 4 pods) | Low — no queue, no workers | 98% (no retry, single DB) |
| Async Pipeline (Priority Queue) | T2 | <200ms API (202 Accepted) | 10K+ RPS API, 10K delivery/sec | $2,000/month (Kafka + Redis + workers) | Medium — Kafka, Redis, workers | 99.9% (retry + DLQ + replication) |
| Multi-Channel Fanout | T3 | <200ms API, <200ms in-app | 15K+ RPS API, 50K delivery/sec | $5,000/month (2x Kafka + WebSocket + Redis) | High — 12 components, WebSocket | 99.9% (per-channel retry + DLQ) |
This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.
Notification systems combine multi-channel delivery, asynchronous processing, priority management, retry strategies, and scalability in a single question. They appear at companies like Twilio, SendGrid, Slack, and virtually every company with mobile/web apps. Interviewers expect candidates to start with the synchronous approach, identify the thread pool bottleneck, propose async delivery via a message queue, and then discuss priority partitioning for transactional vs. marketing notifications.
Each notification request holds a thread for 200-500ms while waiting for FCM or SendGrid to respond. With 200 threads and a 300ms average hold time, the system can sustain approximately 666 RPS. Above this, requests queue behind occupied threads, latency climbs to seconds, and the service becomes unresponsive. The fix is async delivery: publish to Kafka and return 202 immediately, letting workers deliver in the background.
Add a message queue (Kafka, SQS, or RabbitMQ) to decouple the API from delivery. The API validates the request, publishes an event to the queue, and returns 202 Accepted in ~50ms. Background workers consume from the queue and deliver via third-party APIs with retry logic. This is the core of the Async Pipeline variant and reduces API latency from 350ms to 50ms while enabling retry coordination.
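The accept-then-deliver split can be sketched in a few lines. Here an in-process `queue.Queue` stands in for Kafka, SQS, or RabbitMQ, and `delivered.append` stands in for the slow third-party call. The names `accept` and `worker` are illustrative, and retry logic is deliberately left out.

```python
import queue
import threading
import uuid

events = queue.Queue()   # stand-in for the message queue
delivered = []           # stand-in for provider calls made by the worker


def accept(channel: str, recipient: str, body: str) -> dict:
    """API path: validate, enqueue, and return 202 without calling any provider."""
    if channel not in ("push", "email"):
        return {"status": 400, "error": "unsupported channel"}
    notification_id = str(uuid.uuid4())
    events.put({"notification_id": notification_id, "channel": channel,
                "recipient": recipient, "body": body})
    return {"status": 202, "notification_id": notification_id}


def worker(deliver) -> None:
    """Background path: consume events and perform the slow delivery."""
    while True:
        event = events.get()
        try:
            if event is None:   # sentinel: stop the worker
                break
            deliver(event)
        finally:
            events.task_done()


t = threading.Thread(target=worker, args=(delivered.append,), daemon=True)
t.start()
resp = accept("push", "device-token-123", "hello")   # returns immediately
events.join()        # wait until the worker has processed the event
events.put(None)     # ask the worker to stop
t.join()
print(resp["status"], len(delivered))
```

The request thread is released as soon as the event is enqueued, so API throughput no longer depends on provider latency.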
If SendGrid goes down, every email request holds a thread for the full timeout (5-10 seconds) before failing. This consumes threads that would otherwise handle push notifications. With a shared thread pool of 200 threads, a SendGrid outage can exhaust all threads within seconds, causing push notifications to fail too. A single-channel outage cascades to all channels. The Async Pipeline variant isolates channels with separate worker pools.
Deduplication requires an idempotency key (notification_id + recipient) checked before delivery. In the synchronous model, if a client retries a timed-out request, the service has no way to know if the first attempt succeeded — the original response was lost. The notification may be sent twice. The Async Pipeline variant uses a Redis dedup set with 24-hour TTL to prevent duplicate delivery, keyed by notification_id + recipient.
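A minimal sketch of the dedup check, assuming the caller supplies a stable `notification_id` across retries; the naive pseudocode generates a fresh UUID per request, so this would not work there. `DedupSet` is a hypothetical in-memory stand-in for the Redis set and is not thread-safe. With Redis itself, the same check is commonly done atomically with `SET key value NX EX seconds`.

```python
import time
from typing import Callable, Dict, Tuple


class DedupSet:
    """In-memory stand-in for a Redis dedup set with a per-key TTL."""

    def __init__(
        self,
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[Tuple[str, str], float] = {}

    def first_time(self, notification_id: str, recipient: str) -> bool:
        """Return True and remember the key if it was not seen within the TTL."""
        key = (notification_id, recipient)
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False            # duplicate: skip delivery
        self._expires_at[key] = now + self._ttl
        return True


dedup = DedupSet()
if dedup.first_time("order-42-shipped", "user@example.com"):
    print("deliver")
if not dedup.first_time("order-42-shipped", "user@example.com"):
    print("skip duplicate")
```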