Hard · 10 components · Interview: High

Email Service — Queue-Based Pipeline (Kafka + Workers)

Industry-standard multi-stage email pipeline: submit, render, suppress, send. Per-stage Kafka queues enable independent scaling and retries. API returns 202 Accepted in 15ms while workers handle template rendering and SMTP delivery asynchronously.

Email, Kafka, Pipeline, Redis, Async
Problem Statement

The queue-based email pipeline is the standard production architecture used by email delivery platforms handling millions of emails per day. It solves the fundamental problem that makes the naive synchronous approach unworkable: the 50-500ms SMTP blocking I/O that exhausts the thread pool at a few hundred emails per second.

The key architectural shift is decoupling the API response from SMTP delivery. Instead of blocking the caller until the SMTP handshake completes (200ms+ per email), the SubmitService validates the request, generates a message_id, publishes to a Kafka topic, and returns 202 Accepted in under 15ms. The actual email processing — template rendering, suppression checking, and SMTP delivery — happens asynchronously in dedicated worker pools, each consuming from their own Kafka topic.
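The submit path described above can be sketched as follows. This is a minimal illustration, not the service's actual code: the in-memory queue stands in for the Kafka producer, and the request fields (`to`, `subject`, `template_id`, `body_html`) are assumed names.

```python
import queue
import uuid

# Stand-in for the Kafka producer; a real deployment would publish to a topic.
submit_stream = queue.Queue()

def submit_email(request: dict) -> tuple[int, dict]:
    """Validate, assign a message_id, enqueue, and return without touching SMTP."""
    missing = [f for f in ("to", "subject") if not request.get(f)]
    if missing:
        return 400, {"error": f"missing fields: {', '.join(missing)}"}
    if "@" not in request["to"]:
        return 400, {"error": "invalid recipient address"}
    if not (request.get("template_id") or request.get("body_html")):
        return 400, {"error": "template_id or body_html required"}

    message_id = str(uuid.uuid4())
    # Publishing is the only side effect; rendering and delivery happen later.
    submit_stream.put({"message_id": message_id, **request})
    return 202, {"message_id": message_id, "status": "accepted"}
```

The handler never waits on rendering or SMTP, which is what keeps its latency independent of ISP response times.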

The multi-stage pipeline architecture recognizes that email delivery has distinct processing phases with different failure modes and scaling requirements. Template rendering is CPU-bound (10ms per email for variable substitution and HTML generation). Suppression checking is memory-bound (Bloom filter lookup against 10B addresses in ~1ms). SMTP delivery is I/O-bound (50-500ms per email, dependent on the receiving ISP's response time). Separating these into distinct stages connected by Kafka queues means a slow ISP does not block template rendering, a rendering bug does not affect delivery of already-rendered emails, and each stage can scale independently.

The suppression check is the critical compliance addition over the naive approach. A Bloom filter in Redis holds 10 billion suppressed email addresses (hard bounces, unsubscribes, spam complaints) in approximately 12GB of memory, providing O(1) lookups in ~1ms with a 0.1% false positive rate. Without this check, the sender IP's bounce rate would climb above 5% within days, triggering ISP blacklisting. The Bloom filter trades a small false positive rate (occasionally suppressing a valid address) for massive memory efficiency — a traditional hash set would require 100GB+ for 10B entries.
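As a rough check on the memory figure, the standard Bloom filter sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, can be evaluated directly. The sketch below assumes an optimally sized, fixed-capacity filter; a real Redis Bloom implementation may allocate differently.

```python
import math

def bloom_size(n: int, p: float) -> tuple[float, int]:
    """Return (size in GB, hash count) for an optimally sized Bloom filter."""
    bits = -n * math.log(p) / (math.log(2) ** 2)
    k = round(bits / n * math.log(2))
    return bits / 8 / 1e9, k

for p in (0.01, 0.001):
    gb, k = bloom_size(10_000_000_000, p)
    print(f"p={p:.3f}: ~{gb:.1f} GB, k={k}")
```

Under these formulas, 10B entries need roughly 12 GB with 7 hash functions at a 1% false positive rate, and roughly 18 GB with 10 hash functions at 0.1%, so the memory figure is best treated as approximate for the stated error rate.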

The per-stage Kafka queues provide natural backpressure and retry isolation. During a 165K/sec bulk campaign burst, rendered emails queue up in the render-to-send Kafka topic while send workers drain at their sustainable rate (limited by SMTP I/O). If SMTP delivery fails for a batch of emails (soft bounce from an ISP), only the send stage retries — the email is not re-submitted or re-rendered. If template rendering fails (missing template, invalid variables), only the render stage retries — already-rendered emails in the send queue continue processing.

This architecture handles 12K emails per second sustained (transactional traffic such as order confirmations and password resets) and absorbs 165K/sec bulk campaign bursts through Kafka buffering. The submit tier (15 pods) handles the API burst; render workers (40) process templates at 160K/sec; send workers (80) deliver at 128K/sec, with the Kafka queues absorbing the overflow. Because the send stage is the bottleneck, a 100M-email campaign drains in roughly 13 minutes (100M ÷ 128K/sec ≈ 780 seconds).

The primary limitation of this variant is the lack of IP reputation management. All emails go through a single IP pool without transactional/bulk separation. A high-bounce-rate bulk campaign degrades the sending IP's reputation, which also affects transactional emails (password resets, 2FA codes) that share the same IPs. The Multi-Stage Pipeline variant (IP Reputation + Webhooks) solves this with separate transactional and bulk delivery paths, per-IP reputation scoring, and DKIM/SPF/DMARC signing.

Email pipeline design appears in system design interviews at Amazon (SES), Mailchimp (Intuit), SendGrid (Twilio), Postmark, and any company operating email infrastructure. Interviewers expect candidates to explain why async decoupling is necessary, articulate the per-stage scaling rationale, and reason about the suppression list's role in deliverability.

Architecture Overview

The queue-based email system uses 10 components organized into a three-stage pipeline: submit, render, and send. Each stage has its own Kafka topic acting as a buffer and retry queue between stages.

All traffic enters through the API Gateway, which validates API keys (~3ms), enforces per-tenant rate limits (200K RPS cap), and routes to the SubmitService. The API Gateway is the single entry point for all email operations — transactional sends, bulk campaign submissions, and delivery status queries. Rate limiting at this layer prevents a single tenant from monopolizing outbound capacity during a campaign burst.

The SubmitService is a stateless REST API running on 15 pods with 100 threads each (1,500 concurrent connections). It handles three operations: (1) POST /api/v1/emails — validate request, generate message_id, publish to SubmitStream, return 202 Accepted; (2) POST /api/v1/campaigns — fan out a campaign's recipient list into individual messages published to SubmitStream; (3) GET /api/v1/messages/{id}/events — query delivery events from EventDB. The critical design choice is that the API response is decoupled from delivery — the client gets a message_id immediately and can poll for status later.

SubmitStream (Kafka, 64 partitions) is the first pipeline queue. It carries submitted emails from SubmitService to RenderWorker. During campaign bursts (165K/sec), SubmitStream absorbs the traffic spike while render workers process at their sustainable rate. Partitioned by message_id for even load distribution.

RenderWorker (40 instances) is the second pipeline stage. It consumes from SubmitStream, fetches the email template from TemplateCache (Redis, ~2ms lookup, 98% hit rate), hydrates per-recipient variables (name, order ID, etc.), and publishes the fully rendered email to RenderStream. For raw HTML emails (no template), this is a pass-through. Rendering is CPU-bound at ~10ms per email; 40 workers at 4K renders/sec each give 160K/sec total, which implies roughly 40 renders running in parallel on each worker.

RenderStream (Kafka, 64 partitions) is the second pipeline queue. It carries rendered emails to SendWorker. Partitioned by recipient_domain so all emails to the same ISP (gmail.com, yahoo.com) land on the same partition, enabling per-ISP send rate awareness.
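A minimal sketch of domain-keyed partitioning, assuming the producer keys each record by the lowercased recipient domain. CRC32 is used here only as a stable stand-in hash (Kafka's default partitioner hashes key bytes with murmur2), and the helper names are hypothetical.

```python
import zlib

NUM_PARTITIONS = 64  # RenderStream partition count from the text

def recipient_domain(address: str) -> str:
    """Extract the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()

def partition_for(address: str, partitions: int = NUM_PARTITIONS) -> int:
    """Map every address at the same domain to the same partition."""
    key = recipient_domain(address).encode("utf-8")
    return zlib.crc32(key) % partitions  # stand-in for the partitioner's hash
```

Any deterministic hash of the domain gives the same property: `alice@gmail.com` and `BOB@GMAIL.COM` resolve to one partition and therefore to one consumer.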

SendWorker (80 instances) is the final pipeline stage. It consumes from RenderStream, checks the recipient against SuppressionCache (Bloom filter of 10B addresses, ~1ms), and if not suppressed, delivers via SMTP (~50ms per email). Delivery events (sent, bounced, suppressed) are written to EventDB. Sending is I/O-bound at ~50ms per email; 80 workers at 1.6K sends/sec each give 128K/sec total, which implies roughly 80 SMTP deliveries in flight on each worker. During campaign bursts, the gap between render throughput (160K/sec) and send throughput (128K/sec) is absorbed by the RenderStream Kafka queue.
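The per-worker rates quoted for the render and send stages can be sanity-checked with Little's law (average work in flight = throughput × latency). This is back-of-the-envelope arithmetic using the figures from the text, not a measurement.

```python
def in_flight(rate_per_sec: float, latency_sec: float) -> float:
    """Little's law: average work in flight = arrival rate x time in system."""
    return rate_per_sec * latency_sec

render_per_worker = in_flight(4_000, 0.010)  # 4K renders/sec at ~10 ms each
send_per_worker = in_flight(1_600, 0.050)    # 1.6K sends/sec at ~50 ms each
render_total = 40 * 4_000                    # 40 render workers
send_total = 80 * 1_600                      # 80 send workers

print(f"render: {render_per_worker:.0f} in flight per worker, {render_total}/sec total")
print(f"send:   {send_per_worker:.0f} in flight per worker, {send_total}/sec total")
```

In other words, each render worker needs about 40 renders running concurrently and each send worker about 80 concurrent SMTP deliveries to reach the stated throughput.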

TemplateCache (Redis, 3 nodes) caches email templates for fast hydration. 100K templates at ~10KB each = ~1GB working set. 98% hit rate means render workers rarely need a database fallback. SuppressionCache (Redis, 6 nodes) holds the Bloom filter of 10B suppressed addresses in ~12GB of memory. EventDB (DynamoDB, 64 partitions) stores delivery event logs for status queries and analytics.

Key Design Decisions
Async Pipeline with Per-Stage Kafka Queues

Choice

Three Kafka topics connecting submit, render, and send stages

Rationale

Each stage has different failure modes and scaling needs. Template rendering is CPU-bound (~10ms). SMTP delivery is I/O-bound (~50ms, variable by ISP). Suppression lookup is memory-bound (~1ms). Separate Kafka topics mean a slow ISP does not block rendering, and a render bug does not affect already-rendered emails. Each stage retries independently. Large email platforms such as Amazon SES, Mailchimp, and SendGrid are commonly described as using this kind of staged, queue-based design.

202 Accepted Async Response

Choice

Return message_id immediately; deliver asynchronously via workers

Rationale

Synchronous SMTP blocks the caller for 50-500ms per email. At a 165K/sec burst, synchronous delivery would hold between roughly 8K (at 50ms) and 82.5K (at 500ms) SMTP connections open at once just for I/O wait. Async decoupling drops API latency to ~15ms regardless of SMTP speed, letting the submit tier absorb traffic spikes. Clients poll for delivery status via the events endpoint.

Bloom Filter Suppression List (10B entries)

Choice

Redis-backed Bloom filter checked before every SMTP delivery

Rationale

A database lookup per email at 165K/sec would require 165K read IOPS — expensive and high-latency. A Bloom filter (12GB in Redis) gives O(1) lookup in ~1ms. The 0.1% false positive rate means occasionally suppressing a valid address, which is acceptable: the cost of sending to a suppressed address (IP blacklisting) far exceeds the cost of not sending to a valid one.

Recipient Domain Partitioning on RenderStream

Choice

Kafka partitioned by recipient_domain for per-ISP awareness

Rationale

All emails to gmail.com land on the same Kafka partitions, which means the same SendWorker instances process them. This enables per-ISP send rate awareness — if Gmail starts throttling (421 responses), the affected workers slow down while workers handling Yahoo and Outlook traffic continue at full speed.

Separate TemplateCache and SuppressionCache

Choice

Two Redis clusters: one for templates (~1GB), one for suppression (~12GB)

Rationale

Templates are small and read-heavy (100K templates, 98% hit rate). Suppression data is large (12GB Bloom filter) and receives a steady stream of writes as new bounces are added. In a single shared Redis cluster, memory pressure from the Bloom filter could evict hot templates and degrade render performance. Separate clusters isolate the two failure domains.

Scale & Performance

Target RPS

12K/sec sustained, 165K/sec burst

Latency (p99)

<15ms API response, <5s end-to-end delivery

Storage

~2 TB/year (DynamoDB events + Redis caches)

Availability

99.9% (Kafka replay, per-stage retry)

Time & Space Complexity
| Operation | Time | Space | Notes |
|---|---|---|---|
| Submit email (POST /api/v1/emails) | O(1) validate + O(1) Kafka publish (~15ms total) | O(1) per message in Kafka topic | Decoupled from SMTP: the caller gets 202 Accepted immediately. No SMTP blocking. |
| Render email (RenderWorker) | O(T) template hydration where T = template variable count (~10ms) | O(1) per rendered email (~4KB in RenderStream) | CPU-bound. 40 workers × 4K/sec each = 160K/sec total. |
| Send email (SendWorker) | O(1) suppression check + O(1) SMTP send (~51ms total) | O(1) per delivery event in EventDB | I/O-bound on SMTP. 80 workers × 1.6K/sec each = 128K/sec total. Kafka buffers overflow. |
| Suppression check (Bloom filter) | O(k) where k = hash functions (~10 for 0.1% FPR), effectively O(1) at ~1ms | O(n) where n = 10B entries, ~12GB total | Constant-time per lookup. Memory-bound. No disk I/O. |
Database Schema (HLD)
delivery_events (DynamoDB)

Delivery event log tracking the full lifecycle of every email: submitted, rendered, sent, delivered, bounced, suppressed. Partition key is message_id; sort key is timestamp for time-ordered retrieval. Write-heavy: 12K emails/sec sustained at 3-5 events per email works out to roughly 36K-60K event writes/sec. Reads are per-message lookups for status queries.

message_id VARCHAR (partition key, UUID)
event_type VARCHAR (submitted/rendered/sent/bounced/delivered/suppressed)
recipient VARCHAR (recipient email address)
timestamp TIMESTAMPTZ (ISO event timestamp)
details VARCHAR (bounce reason, SMTP code, etc.)

Partition: message_id

64 partitions x 3 replicas. Eventual consistency. Each email generates 3-5 events across pipeline stages.

template:{template_id} (Redis)

Cached email templates for fast hydration by RenderWorker. ~100K templates at ~10KB each = ~1GB working set. 98% hit rate since templates are reused thousands of times across recipients. 1-hour TTL ensures template updates propagate.

subject_template STRING (subject line with {{variables}})
body_html STRING (HTML body with {{variables}})
body_text STRING (plain text fallback, optional)

LRU eviction targets rarely-used campaign templates. High-frequency transactional templates stay hot.
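Hydrating a cached template could look roughly like the sketch below. It assumes the `{{variable}}` placeholder syntax shown in the schema and a template stored as a dict with `subject_template` and `body_html` keys; the function names and the choice to HTML-escape body values are assumptions, not the platform's documented behavior.

```python
import html
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def hydrate(template: str, variables: dict, escape: bool = False) -> str:
    """Replace {{name}} placeholders; raise KeyError on a missing variable."""
    def substitute(match: re.Match) -> str:
        value = str(variables[match.group(1)])
        return html.escape(value) if escape else value
    return PLACEHOLDER.sub(substitute, template)

def render(template: dict, variables: dict) -> dict:
    """Produce the rendered subject and HTML body for one recipient."""
    return {
        "subject": hydrate(template["subject_template"], variables),
        "body_html": hydrate(template["body_html"], variables, escape=True),
    }
```

Raising on a missing variable makes the failure visible at the render stage, where it can be retried or dead-lettered without touching emails already queued for sending.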

suppress:{email_hash} (Redis Bloom Filter)

Bloom filter of 10B suppressed email addresses. Checked by SendWorker before every SMTP delivery. No TTL — suppressed addresses stay suppressed permanently. Written by bounce processing logic when hard bounces or spam complaints are received.

email_hash BLOOM_ENTRY (SHA-256 hash of lowercase email address)

~12GB for 0.1% false positive rate. 6-node Redis cluster. Write-heavy as new bounces are continuously added.
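To make the lookup mechanics concrete, here is a small in-process Bloom filter keyed on the SHA-256 hash of the lowercased address. It is an illustrative stand-in for the Redis-backed filter, and deriving the k bit positions from one digest by double hashing is an assumption of this sketch rather than something the text specifies.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, email: str):
        """Yield num_hashes bit positions derived from one SHA-256 digest."""
        digest = hashlib.sha256(email.strip().lower().encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step avoids a zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, email: str) -> None:
        for pos in self._positions(email):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, email: str) -> bool:
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(email)
        )
```

Normalizing case and whitespace before hashing matters: without it, `Bounced@Example.com` and `bounced@example.com` would hash to different bits and the suppression would be missed.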

Solution Comparison
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
|---|---|---|---|---|---|---|
| Naive (Synchronous SMTP) | T1 | 200-500ms per send (SMTP blocking) | ~100-500 emails/sec | $200/month (single DB + 5 pods) | Low (4 components, linear flow) | 99% (no retry, no redundancy) |
| Queue-Based Pipeline (Kafka + Workers) | T2 | <15ms API, <5s delivery (async) | 12K/sec sustained, 165K/sec burst | $2,500/month (Kafka + workers + caches) | Medium (10 components, per-stage queues) | 99.9% (Kafka replay, per-stage retry) |
| Multi-Stage Pipeline (IP Reputation + Webhooks) | T3 | <15ms API, <5s transactional delivery | 12K/sec trans + 165K/sec bulk | $5,000/month (dual streams, signing, webhooks) | High (12+ components, dual delivery paths) | 99.9% (IP failover, auto-quarantine) |

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
Why three Kafka topics instead of one?

Each pipeline stage has different retry semantics. A template render failure should retry at the render stage without re-submitting the email. An SMTP soft bounce should retry delivery without re-rendering. A single topic would force the entire email through all stages on every retry. Per-stage topics also provide backpressure isolation: if SMTP is slow during a campaign burst, rendered emails queue in RenderStream without affecting the submit or render stages.

Why is the Bloom filter acceptable despite false positives?

The 0.1% false positive rate means ~1 in 1,000 valid addresses may be incorrectly suppressed. At 1 billion emails per day, that is approximately 1 million emails not sent to valid recipients. However, the alternative — not checking suppression and sending to invalid addresses — results in ISP blacklisting that blocks ALL emails. A 0.1% suppression error is vastly preferable to 100% deliverability failure from blacklisting. Production systems add a DB confirmation lookup on Bloom-positive results to reduce false suppressions further.
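The confirmation step mentioned above can be sketched as a two-tier check. Both lookups are passed in as callables because the actual Bloom and database clients are not specified in the text; `bloom_might_contain` and `db_is_suppressed` are hypothetical names.

```python
from typing import Callable

def should_send(
    email: str,
    bloom_might_contain: Callable[[str], bool],
    db_is_suppressed: Callable[[str], bool],
) -> bool:
    """Return True when delivery may proceed."""
    address = email.strip().lower()
    if not bloom_might_contain(address):
        # Bloom filters have no false negatives: definitely not suppressed.
        return True
    # Bloom-positive may be a false positive, so ask the authoritative store.
    return not db_is_suppressed(address)
```

Only Bloom-positive addresses reach the database, so the extra read load is limited to the genuinely suppressed addresses plus the false positives.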

How does the pipeline handle a 165K/sec campaign burst?

The burst flows through three layers of absorption. SubmitService (15 pods, 300K RPS capacity) accepts and publishes all 165K/sec to SubmitStream immediately. RenderWorker (40 instances, 160K/sec) keeps pace with slight queuing in SubmitStream. SendWorker (80 instances, 128K/sec) is the bottleneck: the 32K/sec gap between render and send throughput queues in RenderStream. A 100M-email campaign takes about 625 seconds to render, so roughly 20M rendered emails accumulate in RenderStream and drain in under 3 minutes after rendering completes.
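The backlog and drain time follow directly from the stated stage rates. The calculation below assumes both stages run at their full rated throughput for the whole campaign and that sending starts as soon as rendering does.

```python
CAMPAIGN = 100_000_000
RENDER_RATE = 160_000   # emails/sec, 40 render workers
SEND_RATE = 128_000     # emails/sec, 80 send workers

render_seconds = CAMPAIGN / RENDER_RATE                    # time to render everything
peak_backlog = (RENDER_RATE - SEND_RATE) * render_seconds  # queued in RenderStream
drain_seconds = peak_backlog / SEND_RATE                   # after rendering completes
total_seconds = CAMPAIGN / SEND_RATE                       # send stage is the bottleneck

print(f"render: {render_seconds:.0f}s, backlog: {peak_backlog / 1e6:.0f}M, "
      f"drain: {drain_seconds / 60:.1f} min, total: {total_seconds / 60:.1f} min")
```

With these inputs rendering finishes after 625 seconds, the RenderStream backlog peaks at 20M emails, and the send stage needs a further 2.6 minutes, for about 13 minutes end to end.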

What happens when an ISP rate-limits the sender?

When an ISP like Gmail returns 421 (try again later), the SendWorker for that ISP's partition slows down. Because RenderStream is partitioned by recipient_domain, only the Gmail-bound workers are affected. Yahoo, Outlook, and other ISP workers continue at full speed. The Gmail emails queue up in their RenderStream partitions until the ISP lifts the rate limit. This is why domain-based partitioning matters.
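One way a send worker might implement this slowdown is a per-domain pause that doubles on each 421 and resets on success. The class below is a hypothetical sketch; the base and maximum delays are illustrative values, not figures from the text.

```python
class DomainThrottle:
    """Tracks a per-domain pause that grows on 421 responses and resets on success."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._delay: dict[str, float] = {}
        self._resume_at: dict[str, float] = {}

    def can_send(self, domain: str, now: float) -> bool:
        return now >= self._resume_at.get(domain, 0.0)

    def record(self, domain: str, smtp_code: int, now: float) -> None:
        if smtp_code == 421:
            # Double the previous pause (the first pause equals base_delay).
            previous = self._delay.get(domain, self.base_delay / 2)
            delay = min(previous * 2, self.max_delay)
            self._delay[domain] = delay
            self._resume_at[domain] = now + delay
        elif 200 <= smtp_code < 300:
            # A successful delivery clears the throttle for that domain.
            self._delay.pop(domain, None)
            self._resume_at.pop(domain, None)
```

Because state is keyed by domain, a throttled `gmail.com` never delays sends to `yahoo.com`, mirroring the isolation that domain-based partitioning provides at the queue level.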

Why does this variant lack IP reputation management?

This variant uses a single pool of sending IPs for all traffic. If a bulk campaign has a high bounce rate (bad recipient list), it degrades the shared IP pool's reputation, which also affects transactional emails (password resets, 2FA). The Multi-Stage Pipeline variant (IP Reputation + Webhooks) solves this by separating transactional and bulk delivery paths with dedicated IP pools, per-IP reputation scoring, and automatic quarantine of degraded IPs.

How are hard vs soft bounces handled?

Hard bounces (550 — invalid address) immediately add the address to the suppression Bloom filter. Soft bounces (421 — temporary failure) trigger retry with exponential backoff at the SendWorker level — the email stays in RenderStream for re-consumption. After 3 soft bounces on the same address over 72 hours, the address is promoted to hard bounce status and added to the suppression list.
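The bounce rules above could be expressed roughly as follows. This sketch simplifies by treating every 5xx reply as a hard bounce and every 4xx as a soft bounce (the text names only 550 and 421), uses a plain set as a stand-in for the suppression Bloom filter, and picks illustrative backoff values.

```python
from collections import defaultdict, deque

SOFT_LIMIT = 3               # soft bounces before promotion to hard bounce
WINDOW_SECONDS = 72 * 3600   # 72-hour window from the text

class BounceTracker:
    def __init__(self) -> None:
        self.suppressed: set[str] = set()  # stand-in for the Bloom filter
        self._soft: dict[str, deque] = defaultdict(deque)

    def record(self, address: str, smtp_code: int, now: float) -> str:
        """Return 'suppress', 'retry', or 'delivered' for one SMTP result."""
        address = address.strip().lower()
        if 500 <= smtp_code < 600:          # hard bounce: suppress immediately
            self.suppressed.add(address)
            return "suppress"
        if 400 <= smtp_code < 500:          # soft bounce: count within the window
            history = self._soft[address]
            history.append(now)
            while history and now - history[0] > WINDOW_SECONDS:
                history.popleft()
            if len(history) >= SOFT_LIMIT:
                self.suppressed.add(address)
                return "suppress"
            return "retry"
        return "delivered"

def retry_delay(attempt: int, base: float = 60.0, cap: float = 3600.0) -> float:
    """Exponential backoff: base * 2^attempt seconds, capped."""
    return min(base * 2 ** attempt, cap)
```

Soft bounces older than 72 hours fall out of the window, so an address that fails occasionally over several weeks is retried rather than suppressed.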
