Hard12 componentsInterview: Very High

E-Commerce Checkout — Distributed Saga + Outbox Pattern

Q: What is the outbox pattern and why is it necessary?

The outbox pattern writes an event to a local database outbox table in the same transaction as the business write. A separate relay process polls the outbox and publishes events to Kafka. This solves the dual-write problem: without it, a crash between the database write and the Kafka publish creates inconsistency (order exists but no payment event). With the outbox, the event is part of the database transaction — both succeed or both fail. The relay publishes with Kafka's idempotent producer for at-least-once delivery, and the consumer's idempotency check ensures exactly-once processing end-to-end.

Q: How does the TTL-based inventory reservation prevent deadlocks?

If the Saga Orchestrator crashes after sending ReserveInventory but before sending the compensation ReleaseInventory, the reserved stock would be permanently locked without TTL. The 5-minute TTL means the reservation automatically expires and stock is released by a background cron job — no intervention needed. This self-healing property is critical for production reliability. The 5-minute window is chosen to be longer than the expected saga completion time (3-5 seconds) but short enough to not impact flash-sale stock availability.

Q: How does the system handle a double-charge scenario?

Double-charges are prevented by two mechanisms: (1) the idempotency store ensures each ChargePayment command is processed exactly once, and (2) the Stripe API call includes an idempotency_key (derived from saga_id). Even if the PaymentService processes the same command twice (due to Kafka redelivery), both the Redis check and the Stripe idempotency guarantee prevent a second charge. If somehow a double-charge does occur (e.g., Redis failure + Stripe idempotency key mismatch), the DLQ captures the anomaly for manual resolution.

Q: What happens when the Saga Orchestrator itself crashes?

The Saga Orchestrator is stateless — saga state lives in PostgreSQL. When a new orchestrator pod starts, it does not need to recover in-flight sagas. Instead, it simply resumes consuming events from Kafka (from the last committed offset). If events arrive for sagas in unexpected states, the orchestrator checks the saga_state table and resumes from the correct step. TTL-based reservations ensure that sagas abandoned during the crash window self-heal within 5 minutes.

Q: When is this architecture overkill versus the simpler saga (v1)?

This architecture is overkill when: (1) you have fewer than 3 microservices (outbox overhead not justified), (2) traffic is under 5K orders/min (simpler saga handles this), (3) the team lacks distributed systems experience (12 components require strong ops capabilities), or (4) you don't need exactly-once guarantees (the simpler saga's 'at-least-once with compensation' is sufficient). For most e-commerce platforms under $10M/year in GMV, the v1 saga is simpler and adequate.

Q: How do you debug a failed saga across 4 services and 6 Kafka topics?

Distributed tracing is essential. Every command and event carries a correlation_id (the saga_id) and a trace_id (OpenTelemetry span). To debug a failed saga: (1) look up the saga_id in the orchestrator's saga_state table to see which step failed, (2) search Kafka topics for all messages with that saga_id to see the complete event sequence, (3) check the idempotency store to see if the failing command was partially processed, (4) check the DLQ for the failed message with full error details. Tools: Jaeger for distributed tracing, Kafka UI for topic inspection, PagerDuty for DLQ alerts.

Production-grade distributed saga orchestration for e-commerce checkout. Kafka command bus coordinates independent microservices (Inventory, Payment, Order), each with its own database. Outbox pattern ensures exactly-once event publishing. Idempotency store prevents double-processing. DLQ handles permanent failures.

Saga PatternOutboxMicroservicesDistributed Transactions

Try in Simulator

Problem Statement

The distributed saga with outbox pattern represents the most robust approach to e-commerce checkout — the architecture used by Amazon, Shopify, and other large-scale marketplaces when correctness, scalability, and fault tolerance are all non-negotiable. It exists because the simpler saga orchestrator (v1) has a subtle but critical flaw: the dual-write problem.

In the v1 approach, the Checkout Service writes to PostgreSQL (INSERT order) and then publishes to Kafka (order_placed event) in two separate steps. If the Kafka publish fails after the DB write, the order exists but no payment is triggered — a ghost order. If the service crashes between the two writes, the same thing happens. The outbox pattern solves this by writing the event to a local outbox table within the same database transaction as the business write, then a separate relay process polls the outbox and publishes to Kafka. This guarantees exactly-once event publishing: either both the business write and the event happen, or neither does.

The distributed saga extends this pattern across multiple independently deployed microservices. Each service (Inventory, Payment, Order) owns its own database — no shared state, no distributed locks. The Saga Orchestrator sends typed commands to Kafka topics, and each service processes commands, writes results to its local database + outbox, and the outbox relay publishes events back to Kafka. The orchestrator consumes these events and advances the saga state machine.

Compensation flows are the hardest part of this architecture. When payment fails, the orchestrator must send a ReleaseInventory command to reverse the reservation. The inventory service must process this command idempotently (the same release command might be delivered twice). Inventory reservations have a 5-minute TTL as a safety net — if the saga crashes and no compensation is sent, the reservation auto-releases. This self-healing property is essential for production reliability.

The idempotency store (Redis) tracks every processed command ID. Before processing any command, a service checks Redis: if the command_id exists, the command was already processed and is skipped. This makes every operation safely retryable — critical because Kafka guarantees at-least-once delivery, meaning services will occasionally receive duplicate commands during consumer rebalances or crash recovery.

This template is the most complex variant (12 components) and targets FAANG-level interview discussions. Candidates are expected to explain the outbox pattern, the compensation state machine, the idempotency strategy, the TTL-based reservation, and the DLQ handling. The simulation reveals how the system handles payment provider outages, inventory oversell races, and cascading failures across services.

Architecture Overview

The distributed saga architecture uses 12 components organized into four tiers: the edge layer (API Gateway), the orchestration layer (Saga Orchestrator + Kafka command bus), the service layer (Inventory, Payment, Order, Notification microservices), and the infrastructure layer (per-service databases + idempotency store + outbox relay + DLQ).

The API Gateway handles authentication (JWT, ~3ms), rate limiting (60K RPS cap), and routes checkout requests to the Saga Orchestrator. The orchestrator creates a saga record (saga_id, status=STARTED, state_machine_position) and publishes a ReserveInventory command to the Kafka checkout-commands topic. The HTTP response is 202 Accepted with the saga_id — the shopper polls for status asynchronously via GET /api/v1/saga/{id}/status.

The Kafka command bus is the central nervous system. It hosts 6 topics: checkout-commands (saga commands), inventory-events (reservation results), payment-events (charge results), order-events (order creation results), notification-commands (email/push triggers), and checkout-dlq (dead-letter queue for failed commands). Each topic has 16 partitions keyed by saga_id for ordering within a saga. Kafka's durability and replay capability mean no command is ever lost — even if a service is down for hours, commands queue and are processed when the service recovers.

The Inventory Service consumes ReserveInventory and ReleaseInventory commands. Before processing, it checks the idempotency store (Redis GET). For reservations, it creates a record in the reservations table with expires_at = now() + 5 minutes, and decrements available_count in the inventory table. Both operations happen within a single DB transaction that also INSERTs an event into the outbox table. The OutboxRelay polls the outbox and publishes InventoryReserved (or InsufficientStock) to Kafka. A background cron job releases expired reservations every 60 seconds.

The Payment Service consumes ChargePayment commands. After the idempotency check, it calls the external payment provider (Stripe/PayPal) with an idempotency key derived from the saga_id. Payment results (success/failure) are written to PaymentDB + outbox in a single transaction. The outbox relay publishes PaymentSucceeded or PaymentFailed to Kafka. Payment calls take 200-500ms but do not hold any database locks in other services — the saga orchestrator simply waits for the event.

The Order Service consumes CreateOrder commands and INSERTs the order record with status=CONFIRMED. On compensation (CancelOrder), it UPDATEs the status to CANCELLED.

The Saga Orchestrator consumes events from all service topics and advances the saga state machine: STARTED → (ReserveInventory) → INVENTORY_RESERVED → (ChargePayment) → PAYMENT_CHARGED → (CreateOrder) → ORDER_CREATED → (SendNotification) → COMPLETED. On failure events: INVENTORY_RESERVED → (PaymentFailed) → COMPENSATING → (ReleaseInventory) → COMPENSATED.

The DLQ captures commands that fail after 3 retries with exponential backoff. An operator reviews DLQ messages, fixes the underlying issue, and replays them. The DLQ prevents poison messages from blocking healthy processing.

Scaling is independent per service. InventoryService scales based on reservation command throughput. PaymentService scales based on external API call capacity (limited by Stripe's rate limits). OrderService scales based on write throughput. The Saga Orchestrator scales based on total saga volume. Kafka partitions can be increased for higher parallelism (up to 1 consumer per partition).

Architecture Preview

Loading architecture preview...

Open in Simulator

Saga Flow — Happy Path + Compensation

The distributed saga orchestrates a multi-service checkout through a sequence of commands and events flowing through Kafka. The happy path (reserve → pay → create order) takes ~3 seconds end-to-end. The compensation path (payment fails → release inventory) adds ~500ms. The key difference from the v1 saga is the outbox pattern: every service writes events to its local outbox table in the same DB transaction as the business write, and the outbox relay publishes to Kafka — eliminating the dual-write problem.

The compensation flow is the most important part of the diagram. When PaymentService publishes PaymentFailed, the Saga Orchestrator immediately sends ReleaseInventory. If the orchestrator crashes before sending the compensation, the 5-minute TTL on the reservation releases the stock automatically. This defense-in-depth approach ensures no inventory deadlocks.

Loading diagram...

Step-by-Step Walkthrough

1Shopper sends POST /checkout. The Saga Orchestrator creates a saga record (status=STARTED) and publishes a ReserveInventory command to Kafka. Returns 202 Accepted with saga_id in ~50ms
2InventoryService consumes the command, checks the idempotency store (Redis GET ~1ms), reserves stock in its database (INSERT reservation + UPDATE inventory in one transaction + INSERT outbox event). The OutboxRelay publishes InventoryReserved to Kafka
3The Saga Orchestrator consumes InventoryReserved, advances saga to INVENTORY_RESERVED, and publishes ChargePayment command
4PaymentService consumes ChargePayment, checks idempotency, calls Stripe with an idempotency_key (~350ms). Writes result to PaymentDB + outbox in one transaction. OutboxRelay publishes PaymentSucceeded or PaymentFailed
5On PaymentSucceeded: orchestrator publishes CreateOrder. OrderService creates the order record. OutboxRelay publishes OrderCreated. Orchestrator marks saga as COMPLETED. NotificationService sends confirmation email
6On PaymentFailed (COMPENSATION): orchestrator publishes ReleaseInventory. InventoryService deletes the reservation and restores available_count. Orchestrator marks saga as COMPENSATED
7If the orchestrator crashes during compensation, the 5-minute TTL on the inventory reservation automatically releases the stock — no intervention needed
8If any command fails after 3 retries, it routes to the DLQ for manual review. The saga remains in its current state until an operator resolves the issue

Pseudocode

// SAGA ORCHESTRATOR — State machine
class CheckoutSaga:
    states = {
        STARTED:            { command: "ReserveInventory",  next: "INVENTORY_RESERVED" },
        INVENTORY_RESERVED: { command: "ChargePayment",     next: "PAYMENT_CHARGED" },
        PAYMENT_CHARGED:    { command: "CreateOrder",       next: "ORDER_CREATED" },
        ORDER_CREATED:      { command: "SendNotification",  next: "COMPLETED" },
    }
    compensations = {
        PAYMENT_FAILED:     { command: "ReleaseInventory",  next: "COMPENSATED" },
        ORDER_FAILED:       { command: "RefundPayment",     next: "REFUND_PENDING" },
        REFUND_COMPLETED:   { command: "ReleaseInventory",  next: "COMPENSATED" },
    }

    async function onEvent(saga_id, event):
        saga = await db.get("saga_state", saga_id)

        if event.type in ["PaymentFailed", "OrderFailed"]:
            // Enter compensation path
            comp = compensations[event.type]
            await kafka.produce("checkout-commands", {
                command_type: comp.command, saga_id })
            await db.update("saga_state", saga_id, {
                status: "COMPENSATING", compensation_step: comp.next })
            return

        // Happy path: advance to next step
        current = states[saga.status]
        if event matches current.next:
            next_state = states[current.next]
            if next_state:
                await kafka.produce("checkout-commands", {
                    command_type: next_state.command, saga_id })
            await db.update("saga_state", saga_id, { status: current.next })

// INVENTORY SERVICE — With outbox pattern
async function handleReserveInventory(command):
    // Idempotency check
    if await redis.exists("idem:" + command.idempotency_key):
        return  // Already processed

    await db.transaction(async (tx) => {
        // Business write: reserve inventory
        for item in command.items:
            await tx.execute(
                "UPDATE inventory SET available = available - $1
                 WHERE sku_id = $2 AND available >= $1",
                [item.quantity, item.sku_id])
            await tx.execute(
                "INSERT INTO reservations (saga_id, sku_id, quantity, expires_at)
                 VALUES ($1, $2, $3, now() + interval '5 minutes')",
                [command.saga_id, item.sku_id, item.quantity])

        // Outbox write: event in same transaction
        await tx.execute(
            "INSERT INTO outbox (topic, partition_key, payload)
             VALUES ('inventory-events', $1, $2)",
            [command.saga_id, { event_type: "InventoryReserved", saga_id: command.saga_id }])
    })

    // Mark as processed
    await redis.setex("idem:" + command.idempotency_key, 86400, "1")

// OUTBOX RELAY — Polls and publishes
async function pollOutbox(database):
    events = await database.execute(
        "SELECT * FROM outbox WHERE published = FALSE
         ORDER BY created_at LIMIT 50")
    for event in events:
        await kafka.produce(event.topic, key=event.partition_key, value=event.payload)
        await database.execute(
            "UPDATE outbox SET published = TRUE, published_at = now()
             WHERE event_id = $1", [event.event_id])

Per-Service Database Schemas

Each microservice owns its own database — no shared tables, no cross-service JOINs. The outbox table appears in every database because every service uses the outbox pattern for exactly-once event publishing. The saga_state table in the orchestrator's database tracks the complete state machine for each checkout saga.

The reservation table is unique to the InventoryService and embodies the TTL-based reservation pattern. The expires_at column enables a cron job to automatically release expired reservations without any external trigger. This self-healing property makes the system resilient to orchestrator crashes and Kafka outages.

The payment table's idempotency_key column (UNIQUE constraint) provides database-level protection against double-charges, serving as a second line of defense after the Redis idempotency store.

Loading diagram...

Step-by-Step Walkthrough

1saga_state (Orchestrator DB): tracks the full state machine for each checkout. Status transitions: STARTED → INVENTORY_RESERVED → PAYMENT_CHARGED → ORDER_CREATED → COMPLETED. Compensation: COMPENSATING → COMPENSATED. Any pod can resume a saga by reading this table
2reservations (Inventory DB): TTL-based soft locks on inventory. expires_at = now() + 5 minutes. A cron job runs DELETE WHERE expires_at < now() every 60 seconds, automatically releasing abandoned reservations. This is the self-healing mechanism that prevents inventory deadlocks
3inventory (Inventory DB): stock counts. available_count is decremented on reservation. The UPDATE uses WHERE available_count >= quantity to prevent overselling — atomic check-and-decrement within a single SQL statement
4payments (Payment DB): payment records with a UNIQUE idempotency_key. If the same saga retries payment (due to Kafka redelivery), the UNIQUE constraint prevents a duplicate INSERT. Combined with Stripe's idempotency API, this provides two layers of double-charge protection
5orders (Order DB): only created after payment succeeds — no PENDING state. Status is CONFIRMED on creation. Can transition to CANCELLED only during compensation (rare edge case where order creation happens but a downstream step fails)
6outbox (all service DBs): the critical table that enables exactly-once event publishing. Written in the same DB transaction as the business write. Polled by the OutboxRelay every 100ms. The partial index on published=FALSE keeps poll queries fast

Key Design Decisions

Outbox Pattern for Exactly-Once Publishing

Choice

Write event to local outbox table in same DB transaction as business write, then relay to Kafka

Rationale

The dual-write problem: if a service writes to its database and then publishes to Kafka, a crash between the two steps creates inconsistency. The outbox pattern makes the event part of the database transaction — both succeed or both fail. The outbox relay polls and publishes with Kafka's idempotent producer, achieving exactly-once end-to-end. The cost is an additional polling process (OutboxRelay) and slightly higher end-to-end latency (100-200ms relay delay).

Saga Orchestration over Choreography

Choice

Centralized Saga Orchestrator manages the state machine

Rationale

With 4 services and 6 saga steps (3 forward + 3 compensation), choreography (each service reacts to events independently) makes the flow implicit and hard to debug. The orchestrator makes the flow explicit: one component knows the full state machine, handles all transitions, and can be unit-tested independently. The trade-off is a single point of coordination — but the orchestrator is stateless (saga state in PostgreSQL) and horizontally scalable.

TTL-Based Inventory Reservation

Choice

Reservations auto-expire after 5 minutes if the saga doesn't complete

Rationale

In a distributed system, the saga might not complete: the orchestrator crashes, Kafka has an outage, or the PaymentService is down for extended time. TTL-based reservation is self-healing — expired reservations automatically release stock without any compensation command. This prevents the inventory deadlock scenario where reserved stock is permanently unavailable. The 5-minute TTL balances between giving the saga time to complete and not holding stock too long.

Idempotency Store in Redis

Choice

Redis SET with 24h TTL for processed command IDs

Rationale

Kafka guarantees at-least-once delivery. After a consumer rebalance, a service may receive the same ChargePayment command twice. Without idempotency checks, the customer is double-charged. Redis stores processed command_ids with 24h TTL — before processing, each service checks Redis: if the ID exists, skip. After processing, SET the ID. This makes every operation safely retryable at the cost of one Redis GET per command (~1ms overhead).

Dead-Letter Queue for Permanent Failures

Choice

Route to DLQ after 3 retries with exponential backoff

Rationale

Infinite retries mask bugs: a command failing due to a code error (NullPointerException, schema mismatch) will never succeed no matter how many times you retry. After 3 attempts (1s, 5s, 25s delays), the command routes to the DLQ. An operator inspects it, identifies the root cause, fixes the code, and replays the DLQ. This separates transient failures (auto-retry) from permanent failures (human intervention).

Per-Service Databases

Choice

Each microservice owns its own PostgreSQL instance

Rationale

A shared database creates coupling: InventoryService schema changes could break OrderService queries. Per-service databases enforce strong service boundaries — each service can evolve its schema independently, use the optimal database configuration, and scale storage independently. The cost is that cross-service queries require API calls (no JOINs), and data consistency is eventual rather than immediate.

Scale & Performance

Target RPS

10K+ orders/min (55K total with browsing)

Latency (p99)

< 3s saga completion (end-to-end)

Storage

~1 TB/year across 3 service databases + Kafka

Availability

99.99% (per-service redundancy + Kafka durability + TTL self-healing)

Time & Space Complexity

Operation	Time	Space	Notes
Saga happy path (end-to-end)	O(k) for k items + O(1) per saga step	O(k) reservation records + O(1) per command	Total latency: ~3s (Kafka propagation ~5ms * 4 hops + inventory ~20ms + payment ~500ms + order ~20ms). k inventory reservations execute in parallel within the service.
Inventory reservation (single service)	O(k) — k reservation INSERTs + k inventory UPDATEs in one DB transaction	O(k) reservation rows with TTL	No row-level locks held beyond the DB transaction (~40ms). Compare with naive approach: locks held for 200-500ms.
Compensation path	O(k) to release k reservations + O(1) to refund payment	O(1) — compensation deletes reservation rows	Compensation adds 2 extra Kafka hops (~10ms) + service processing (~20ms). Total compensation: ~30ms on top of the failure detection time.
Outbox relay (per poll cycle)	O(b) where b = batch size (50)	O(b) Kafka produce batch	100ms poll interval. At 10K orders/min with 3 outbox events per order, the relay processes ~500 events/sec across 3 databases. Each poll: SELECT unpublished LIMIT 50 + Kafka batch produce.

Database Schema (HLD)

saga_state (Orchestrator DB)

Tracks the state machine position for each checkout saga. The orchestrator reads this table to determine the next command to send and updates it after each event is processed. States: STARTED, INVENTORY_RESERVED, PAYMENT_CHARGED, ORDER_CREATED, COMPLETED, COMPENSATING, COMPENSATED, FAILED.

saga_id UUID PKuser_id UUIDcart_items JSONBstatus TEXTcurrent_step TEXTcompensation_step TEXT (nullable)created_at TIMESTAMPTZupdated_at TIMESTAMPTZcompleted_at TIMESTAMPTZ (nullable)

Indexes: PK on saga_id, idx_status ON (status), idx_created ON (created_at DESC)

Saga completion time p99: ~3 seconds. Sagas in STARTED state for > 5 minutes indicate a stuck saga (alert). Average saga has 4 state transitions (STARTED → INVENTORY_RESERVED → PAYMENT_CHARGED → ORDER_CREATED → COMPLETED).

reservations (Inventory DB)

TTL-based inventory reservations. Each row represents a reserved quantity for a specific SKU in a specific saga. The expires_at column enables automatic release: a background cron job runs DELETE FROM reservations WHERE expires_at < now() every 60 seconds, restoring available_count for expired reservations.

reservation_id UUID PKsaga_id UUIDsku_id UUIDquantity INTEGERreserved_at TIMESTAMPTZexpires_at TIMESTAMPTZ

Indexes: PK on reservation_id, idx_saga ON (saga_id), idx_expires ON (expires_at)

TTL = 5 minutes (300 seconds). At 10K orders/min with 5 items per cart, ~50K active reservations at any time. Expired reservation cleanup runs every 60 seconds.

payments (Payment DB)

Payment records with idempotency protection. The idempotency_key (derived from saga_id) ensures that Stripe processes each payment exactly once even on retry. Status transitions: PENDING → SUCCESS or PENDING → FAILED. Refunds create a separate record in the refunds table linked by payment_id.

payment_id UUID PKsaga_id UUIDamount_cents BIGINTprovider TEXTprovider_txn_id TEXTidempotency_key TEXT UNIQUEstatus TEXTerror_code TEXT (nullable)created_at TIMESTAMPTZ

Indexes: PK on payment_id, idx_saga ON (saga_id), UNIQUE on idempotency_key

Idempotency_key = saga_id + step_number. Stripe returns cached result for duplicate idempotency keys within 24 hours. Payment provider timeout: 10 seconds (3 retries with exponential backoff before DLQ).

orders (Order DB)

Order records created by the Order Service when the saga reaches the ORDER_CREATED step. Status is CONFIRMED on creation (payment already succeeded at this point). Can transition to CANCELLED if the saga compensates after order creation (rare edge case).

order_id UUID PKsaga_id UUIDuser_id UUIDitems JSONBtotal_cents BIGINTshipping_address JSONBstatus TEXTcreated_at TIMESTAMPTZ

Indexes: PK on order_id, idx_saga ON (saga_id), idx_user ON (user_id, created_at DESC)

Order is only created after payment succeeds — no PENDING state in this variant (unlike v1 where orders are created before payment). Status: CONFIRMED → SHIPPED → DELIVERED, or CONFIRMED → CANCELLED.

outbox (per-service — Inventory, Payment, Order DBs)

Each service's database has an outbox table for exactly-once event publishing. When a service processes a command, it writes the business result AND an outbox event in the same DB transaction. The OutboxRelay polls this table and publishes to Kafka. After successful publish, the row is marked as published.

event_id UUID PKtopic TEXTpartition_key TEXTpayload JSONBcreated_at TIMESTAMPTZpublished BOOLEAN DEFAULT FALSEpublished_at TIMESTAMPTZ (nullable)

Indexes: idx_unpublished ON (published, created_at) WHERE published = FALSE

OutboxRelay polls every 100ms with batch size 50. Unpublished events older than 5 minutes trigger an alert. The partial index on unpublished=FALSE makes the poll query efficient even with millions of historical rows.

Event Contracts

ReserveInventory Commandcheckout-commands

Published by the Saga Orchestrator to request inventory reservation. Consumed by InventoryService.

Key Schema

saga_id (STRING)

Value Schema

{ command_type: 'ReserveInventory', saga_id, items: [{sku_id, quantity}], idempotency_key }

InventoryReserved Eventinventory-events

Published by InventoryService (via outbox) when reservation succeeds. Consumed by Saga Orchestrator to advance to payment step.

Key Schema

saga_id (STRING)

Value Schema

{ event_type: 'InventoryReserved', saga_id, reservation_id, items: [{sku_id, quantity, reserved_until}] }

PaymentSucceeded / PaymentFailed Eventpayment-events

Published by PaymentService (via outbox) after charging or failing to charge. On success, saga advances to order creation. On failure, saga enters compensation.

Key Schema

saga_id (STRING)

Value Schema

{ event_type: 'PaymentSucceeded' | 'PaymentFailed', saga_id, payment_id?, transaction_id?, error_code?, error_message? }

OrderCreated Eventorder-events

Published by OrderService (via outbox) when the order record is persisted. Consumed by Saga Orchestrator to finalize the saga and trigger notification.

Key Schema

saga_id (STRING)

Value Schema

{ event_type: 'OrderCreated', saga_id, order_id, user_id, total_cents }

Dead-Letter Queuecheckout-dlq

Commands that failed after 3 retries with exponential backoff. Contains the original command, the error details, and retry metadata. Reviewed by operators for manual resolution.

Key Schema

saga_id (STRING)

Value Schema

{ original_command, error_code, error_message, retry_count: 3, first_failure_at, last_failure_at }

What-If Scenarios

Payment provider timeout mid-saga — Stripe does not respond within 10 seconds

Impact

PaymentService retries 3 times with exponential backoff (1s, 5s, 25s). If all retries fail, the command routes to the DLQ. The saga remains in INVENTORY_RESERVED state. The inventory reservation TTL (5 minutes) eventually releases the stock. The customer sees 'Payment processing' for ~30s, then the saga times out and the customer is notified of the failure.

Mitigation

Circuit breaker on PaymentService stops calling Stripe after 5 consecutive timeouts. Sagas entering INVENTORY_RESERVED while the circuit is open immediately get a PaymentFailed event, triggering fast compensation instead of waiting for the full retry cycle.

Inventory oversell race — two sagas reserve the last unit simultaneously

Impact

Cannot happen. Each InventoryService instance processes commands sequentially per Kafka partition (saga_id determines partition). Two sagas for different saga_ids may process in parallel on different partitions, but they access different reservation records. If both target the same SKU, the database UPDATE (decrement available_count) uses a WHERE available_count >= quantity check — the second transaction fails and the saga compensates.

Mitigation

PostgreSQL's MVCC ensures the check-and-decrement is atomic within each transaction. The InventoryService returns InsufficientStock, and the saga orchestrator sends compensation commands.

Outbox relay goes down for 5 minutes

Impact

Events from InventoryService, PaymentService, and OrderService accumulate in their outbox tables. The Saga Orchestrator stops receiving events and sagas stall in their current state. Inventory reservations do NOT expire during this window (TTL is 5 minutes, matching the relay downtime). When the relay recovers, it drains all pending outbox events in order.

Mitigation

OutboxRelay runs with 3 replicas. If one dies, another takes over within seconds. Health check on relay latency — alert if no events published in 30 seconds.

Flash sale thundering herd — 10,000 users checkout simultaneously

Impact

API Gateway rate limits at 60K RPS — all 10,000 requests pass. Saga Orchestrator creates 10,000 sagas and publishes 10,000 ReserveInventory commands to Kafka. InventoryService (8 workers) processes ~500 commands/sec per worker = 4,000 commands/sec. All 10,000 reservations complete within 2.5 seconds. Payment processing (the slow step) creates a Kafka consumer lag, but payments process at their own pace without blocking new checkouts.

Mitigation

Kafka naturally absorbs the burst — commands queue in the topic and are processed at the service's sustained rate. No back-pressure propagates to the shopper. The 202 Accepted response returns in ~50ms regardless of the queue depth.

Failure Modes & Resilience

Component	Failure	Impact	Mitigation
Saga Orchestrator	Pod crash mid-saga	In-flight sagas stall until another pod resumes consuming events. Saga state in PostgreSQL ensures no duplicate processing.	Multiple orchestrator pods. Kafka consumer group rebalances partitions to surviving pods in <30s. TTL reservations self-heal abandoned sagas.
Kafka Command Bus	Partition leader offline	Commands for affected partitions queue on producers (up to buffer limit). Services processing other partitions continue normally.	Replication factor 3. ISR (in-sync replicas) elect new leader in <5s. Multi-AZ broker deployment.
Outbox Relay	Relay process crash	Events accumulate in outbox tables. Sagas stall. No data loss — events publish when relay recovers.	3 relay replicas with leader election. Health alert if unpublished events > 100. Manual relay restart as escalation.
Idempotency Store (Redis)	Cache failure	Services cannot check idempotency. Duplicate command processing possible (double-charge, double-reservation).	Redis cluster with replication. Failover in <1s. If Redis is down, services fall back to database-level idempotency checks (slower but correct).
Per-Service Database (e.g., Payment DB)	Database failure	Only PaymentService affected. Other services continue. Sagas with in-flight payments stall until recovery.	RDS Multi-AZ failover (~60s). TTL reservations self-heal sagas stuck waiting for payment. DLQ captures commands that fail during DB outage.

Scaling Strategy

Independent scaling per service. Saga Orchestrator: auto-scale on Kafka consumer lag (target: lag < 100 events). InventoryService: auto-scale on reservation command throughput (target: < 50ms processing time). PaymentService: scale based on Stripe rate limits (typically 100 TPS per account — add Stripe accounts for higher limits). OrderService: auto-scale on write throughput. Kafka: add partitions for higher parallelism (max consumers = partitions). Bottleneck analysis: PaymentService is typically the bottleneck due to external API latency (200-500ms). At 10K orders/min, 6 PaymentWorkers each processing ~28 payments/sec (1000ms / 350ms per payment) handle the load. Scale to 12 workers for 20K orders/min.

Monitoring & Alerting

Key metrics: (1) Saga completion rate — alert if < 99.5% (normal: 99.9%), (2) Saga completion time p99 — alert if > 10s (normal: 3s), (3) Outbox relay lag per service — alert if unpublished events > 100, (4) Kafka consumer lag per service — alert if > 1000 events, (5) Idempotency store hit rate — alert if > 1% (indicates duplicate commands), (6) DLQ message count — alert if > 0 (every DLQ message requires human review), (7) Inventory reservation expiration rate — alert if > 0.5% (indicates saga completion issues), (8) Payment provider success rate — alert if < 98%. Dashboard: saga state flow diagram with real-time counts per state, Kafka topic throughput and consumer lag, per-service health with DB connection pool and outbox queue depth, DLQ message list with age and error details.

Cost Analysis

12-component architecture: API Gateway (~$30/month), 8 Saga Orchestrator pods 4vCPU/8GB (~$800/month), Kafka MSK m7g.xlarge (~$400/month), 8 InventoryService workers (~$200/month), 6 PaymentService workers (~$150/month), 4 OrderService workers (~$100/month), 2 NotificationService Lambda (~$10/month), 3 PostgreSQL RDS r7g.large (~$750/month), Redis r7g.large (~$150/month), 3 OutboxRelay workers (~$75/month). Total: ~$2,800/month. About 3x the naive approach and 1.5x the v1 saga, but handles 100x more checkout throughput than naive and 4x more than v1. Cost per order: ~$0.00047 at 10K orders/min sustained. The operational complexity cost (on-call, monitoring, debugging) is the dominant expense — estimated at $5K-10K/month in engineering time for a team of 3-4.

Security Considerations

PCI compliance scope is minimized: only the PaymentService handles payment tokens. It runs in an isolated VPC subnet with no public internet access except to Stripe/PayPal API endpoints. Kafka messages carrying payment tokens are encrypted at rest (MSK encryption) and in transit (TLS). The idempotency store does NOT store payment data — only command IDs. The outbox tables in PaymentDB contain payment amounts but not raw card data (Stripe tokenization). The Saga Orchestrator sees the payment_method token but does not process it directly. Security audit scope: PaymentService + PaymentDB + Kafka payment-events topic. All other components are out of PCI scope.

Deployment Strategy

Independent service deployments with compatibility guarantees. Each service can be deployed independently as long as Kafka message schemas are backward-compatible (new fields are optional, no field removals). Deployment order for breaking changes: (1) deploy consumers first (backward-compatible), (2) deploy producers with new schema, (3) remove old schema support after all consumers are updated. Kafka topic schema registry enforces compatibility. Per-service deployment: ECS rolling update with connection draining. Outbox relay: graceful shutdown (finish current batch before stopping). Database migrations: forward-compatible schema changes (ADD COLUMN, CREATE INDEX CONCURRENTLY) applied before service deploy.

Real-World Examples

•Amazon's order processing pipeline uses saga orchestration with step functions coordinating inventory, payment, and fulfillment microservices
•Shopify's checkout API uses the outbox pattern with Kafka for exactly-once event publishing across their microservice mesh
•Uber uses saga orchestration for ride completion: driver payment, rider charge, and trip record creation — each in its own service with compensation
•Square's payment processing uses idempotency keys on every API call to prevent double-charges during network retries

Solution Comparison

Variant	Tier	Latency	Throughput	Cost	Complexity	Reliability
Naive (Monolith + ACID)	T1	800ms-3s p99	~100 orders/min	~$925/month	Low	99% (single DB)
Saga Orchestrator	T2	200-500ms p99	~2,500 orders/min	~$1,800/month	Medium	99.9%
Distributed Saga + Outbox	T4	< 3s end-to-end	10K+ orders/min	~$2,800/month	Very High	99.99%

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions

What is the outbox pattern and why is it necessary?

The outbox pattern writes an event to a local database outbox table in the same transaction as the business write. A separate relay process polls the outbox and publishes events to Kafka. This solves the dual-write problem: without it, a crash between the database write and the Kafka publish creates inconsistency (order exists but no payment event). With the outbox, the event is part of the database transaction — both succeed or both fail. The relay publishes with Kafka's idempotent producer for at-least-once delivery, and the consumer's idempotency check ensures exactly-once processing end-to-end.

How does the TTL-based inventory reservation prevent deadlocks?

If the Saga Orchestrator crashes after sending ReserveInventory but before sending the compensation ReleaseInventory, the reserved stock would be permanently locked without TTL. The 5-minute TTL means the reservation automatically expires and stock is released by a background cron job — no intervention needed. This self-healing property is critical for production reliability. The 5-minute window is chosen to be longer than the expected saga completion time (3-5 seconds) but short enough to not impact flash-sale stock availability.

How does the system handle a double-charge scenario?

Double-charges are prevented by two mechanisms: (1) the idempotency store ensures each ChargePayment command is processed exactly once, and (2) the Stripe API call includes an idempotency_key (derived from saga_id). Even if the PaymentService processes the same command twice (due to Kafka redelivery), both the Redis check and the Stripe idempotency guarantee prevent a second charge. If somehow a double-charge does occur (e.g., Redis failure + Stripe idempotency key mismatch), the DLQ captures the anomaly for manual resolution.

What happens when the Saga Orchestrator itself crashes?

The Saga Orchestrator is stateless — saga state lives in PostgreSQL. When a new orchestrator pod starts, it does not need to recover in-flight sagas. Instead, it simply resumes consuming events from Kafka (from the last committed offset). If events arrive for sagas in unexpected states, the orchestrator checks the saga_state table and resumes from the correct step. TTL-based reservations ensure that sagas abandoned during the crash window self-heal within 5 minutes.

When is this architecture overkill versus the simpler saga (v1)?

This architecture is overkill when: (1) you have fewer than 3 microservices (outbox overhead not justified), (2) traffic is under 5K orders/min (simpler saga handles this), (3) the team lacks distributed systems experience (12 components require strong ops capabilities), or (4) you don't need exactly-once guarantees (the simpler saga's 'at-least-once with compensation' is sufficient). For most e-commerce platforms under $10M/year in GMV, the v1 saga is simpler and adequate.

How do you debug a failed saga across 4 services and 6 Kafka topics?

Distributed tracing is essential. Every command and event carries a correlation_id (the saga_id) and a trace_id (OpenTelemetry span). To debug a failed saga: (1) look up the saga_id in the orchestrator's saga_state table to see which step failed, (2) search Kafka topics for all messages with that saga_id to see the complete event sequence, (3) check the idempotency store to see if the failing command was partially processed, (4) check the DLQ for the failed message with full error details. Tools: Jaeger for distributed tracing, Kafka UI for topic inspection, PagerDuty for DLQ alerts.

Related Templates

E-Commerce Checkout — Naive (Monolith + ACID)E-Commerce Checkout — Saga Orchestrator

Discussion

Ready to design your own E-Commerce Checkout?

Open the simulator, place components on the canvas, wire them up, and run a traffic simulation to see how your architecture performs under real load.

Open Simulator