Medium · 8 components · Interview: Very High

E-Commerce Checkout — Saga Orchestrator (Event-Driven)

Industry-standard saga orchestration pattern for e-commerce checkout. Checkout Service coordinates inventory reservation (Redis DECR), order persistence (PostgreSQL), and async payment processing (Kafka worker). Compensating transactions handle payment failures.

Saga Pattern · Kafka · Event-Driven · E-Commerce

Problem Statement

The saga orchestration pattern for e-commerce checkout exists because of a fundamental insight: payment processing should never happen inside a database transaction. The naive approach holds database row locks for 200-500ms while calling Stripe — this is the anti-pattern that kills scalability. The saga pattern separates the checkout flow into discrete, independently committable steps coordinated by an orchestrator, with compensating transactions for rollback.

When a shopper clicks 'Place Order,' the Checkout Service orchestrates a saga: (1) atomically reserve inventory via Redis DECR (sub-millisecond, no database locks), (2) persist the order to PostgreSQL (INSERT, 50ms), (3) publish an order_placed event to Kafka (5ms). The payment is processed asynchronously by a PaymentWorker that consumes the event. If payment fails, a compensating transaction releases the inventory reservation (Redis INCR). The shopper sees 'Order Placed' immediately — payment confirmation arrives via email within seconds.

This pattern is the industry standard used by Amazon (order pipeline), Shopify (checkout API), and most production e-commerce platforms. It appears in virtually every system design interview for e-commerce roles because it demonstrates several key distributed systems concepts: eventual consistency (order is 'pending' until payment completes), compensating transactions (inventory release on payment failure), idempotent operations (keyed by order_id to prevent double-charging), and event-driven architecture (Kafka as the coordination backbone).

The specific design decisions in this template — Redis for atomic inventory counts, Kafka for async payment, CDN for static assets — represent the standard mid-scale approach. Each decision addresses a specific limitation of the naive variant: Redis DECR replaces SELECT ... FOR UPDATE (no row locks), Kafka workers replace synchronous payment calls (no lock hold during payment), and CDN offloads 90% of browsing traffic from the origin servers.

The primary trade-off is eventual consistency. Between the Kafka publish and the PaymentWorker's processing, the order is in a 'pending' state — inventory is reserved but payment has not been attempted. If the PaymentWorker crashes or Kafka has a brief outage, orders accumulate in 'pending' state until processing resumes. For most e-commerce applications, this 1-5 second window of eventual consistency is acceptable — the shopper gets immediate feedback, and the payment confirmation follows shortly.

Architecture Overview

The saga-based e-commerce checkout uses 8 components organized into three tiers: the edge layer (CDN + API Gateway + Load Balancer), the application layer (Checkout Service + Product Service), and the data/async layer (Redis caches + PostgreSQL databases + Kafka + workers).

The edge layer handles the read-heavy browsing traffic. 90% of product browsing hits the CDN (CloudFront), which serves static assets (images, CSS, JS) with an 86,400-second TTL. The remaining 10% passes through the API Gateway (JWT auth, rate limiting at 60K RPS) and the Load Balancer to the application services. This three-layer edge reduces origin traffic by an order of magnitude during flash sales.

The Checkout Service is the saga orchestrator. On POST /api/v1/checkout, it executes a three-step saga: (1) DECR inventory counts in Redis for each line item (atomic, sub-millisecond, no database locks), (2) INSERT the order record into PostgreSQL within a local transaction, (3) publish an order_placed event to Kafka. Each step commits independently. If step 2 fails, step 1 is compensated (INCR Redis). If step 3 fails, both earlier steps are compensated: the order status is updated to FAILED and the Redis reservation is released (INCR). The saga completes in under 100ms (about 60ms for a typical 5-item cart), roughly an order of magnitude faster than the naive approach's 800ms.

The Product Service handles catalog reads. It maintains a separate Redis cache (ProductCache) with 85% hit rate and its own PostgreSQL database (ProductDB). This separation means flash-sale checkout traffic does not compete with product browsing for database resources — a critical improvement over the monolith.

Async payment processing happens via Kafka. The PaymentWorker consumes order_placed events and calls Stripe/PayPal (200-500ms). Payment is idempotent (keyed by order_id): if the worker crashes and replays the event, Stripe returns the original result without creating a new charge. On success, the worker updates the order status to PAID. On failure, it publishes a payment_result event with status 'failed', which triggers the compensation: InventoryCache INCR (release reservation) and order status update to PAYMENT_FAILED.
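The replay-safety argument above can be illustrated with a minimal sketch. `FakePaymentGateway` is a hypothetical in-memory stand-in, not Stripe's API; it only models the one behaviour the design relies on, namely that a second charge with the same idempotency key returns the stored result instead of creating a new charge.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FakePaymentGateway:
    """Hypothetical in-memory provider that honours idempotency keys."""
    results: dict = field(default_factory=dict)
    charges_created: int = 0

    def charge(self, amount_cents: int, idempotency_key: str) -> dict:
        # A replay with a known key returns the stored result unchanged.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        self.charges_created += 1
        result = {
            "payment_id": str(uuid.uuid4()),
            "amount_cents": amount_cents,
            "success": True,
        }
        self.results[idempotency_key] = result
        return result


def process_payment(gateway: FakePaymentGateway, event: dict) -> dict:
    # order_id doubles as the idempotency key, as in the design above.
    return gateway.charge(event["total_cents"], idempotency_key=event["order_id"])


gateway = FakePaymentGateway()
event = {"order_id": "order-123", "total_cents": 4999}
first = process_payment(gateway, event)
replay = process_payment(gateway, event)  # simulated redelivery after a worker crash
```

In this sketch the replay returns the same `payment_id` as the first call, and only one charge is ever recorded.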

The NotificationWorker consumes the same order_placed events and sends confirmation emails. It is low-priority — if it falls behind, order confirmations are delayed but checkout is unaffected. This decoupling is a key benefit of the event-driven architecture: adding new downstream consumers (analytics, fraud detection, recommendation engine) requires no changes to the Checkout Service.

Horizontal scaling is straightforward: add more Checkout Service pods for higher checkout throughput, add Kafka partitions and PaymentWorker instances for higher payment throughput, add Product Service pods for higher browsing capacity. Each scales independently. The CDN absorbs traffic spikes automatically.

Checkout Saga Flow — Happy Path + Compensation

The saga splits the monolithic transaction into three independently committable steps: inventory reservation (Redis DECR), order persistence (PostgreSQL INSERT), and event publishing (Kafka produce). Payment happens asynchronously outside the checkout response path. The compensation path reverses the reservation if payment fails.

The key insight is the dramatic latency improvement: the checkout response completes in ~60ms (DECR + INSERT + produce) versus ~800ms in the naive approach. The 200-500ms payment processing happens in the background — the shopper sees 'Order Placed' immediately and receives payment confirmation via email within seconds.

Step-by-Step Walkthrough

  1. Shopper sends POST /checkout. The request passes through the CDN (POST requests are not cached, so it is forwarded to the origin), API Gateway (JWT auth, ~3ms), and Load Balancer (~1.5ms) to the Checkout Service
  2. Checkout Service executes Redis DECR on each SKU's inventory counter. This is atomic and sub-millisecond, with no database locks. If DECR returns negative, the SKU is out of stock and the saga compensates immediately (INCR back)
  3. On successful reservation, the Checkout Service INSERTs the order record into PostgreSQL (status=PENDING) and UPDATEs the inventory table (source of truth). This is a local transaction: fast, with no contention with other services
  4. The Checkout Service publishes an order_placed event to Kafka (~5ms produce with acks=1). This completes the checkout saga's synchronous steps
  5. The shopper receives HTTP 202 Accepted with the order_id in ~60ms total, a 13x improvement over the naive approach's 800ms
  6. ASYNC: PaymentWorker consumes the order_placed event and calls Stripe (200-500ms). Payment is idempotent (keyed by order_id), so it is safe to retry on failure
  7. On payment success, the worker UPDATEs the order status to PAID. On failure, it executes the compensation: INCR Redis (release reservation) + UPDATE order status to PAYMENT_FAILED
  8. NotificationWorker consumes the same event and sends the order confirmation email, completely decoupled from the checkout path

Pseudocode

// CHECKOUT SAGA: fast synchronous path (~60ms)
async function handleCheckout(cart_items, payment_method):
    saga_steps_completed = []
    // Order total in cents, computed up front so the Kafka event can carry it
    cart_total = sum(item.quantity * item.price_cents for item in cart_items)

    try:
        // Step 1: Reserve inventory in Redis (atomic, sub-ms per SKU)
        for item in cart_items:
            // DECRBY the full quantity so Redis matches the PostgreSQL reservation in step 2
            remaining = await redis.decrby("inv:" + item.sku_id, item.quantity)
            if remaining < 0:
                await redis.incrby("inv:" + item.sku_id, item.quantity)  // Compensate
                throw OutOfStockError(item.sku_id)
            saga_steps_completed.push({ type: "RESERVE", sku_id: item.sku_id, quantity: item.quantity })

        // Step 2: Persist order + update inventory (local DB txn)
        order = await db.transaction(async (tx) => {
            order = await tx.insert("orders", { items: cart_items, total_cents: cart_total, status: "PENDING" })
            for item in cart_items:
                await tx.execute(
                    "UPDATE inventory SET reserved_count = reserved_count + $1 WHERE sku_id = $2",
                    [item.quantity, item.sku_id])
            return order
        })   // ~50ms
        saga_steps_completed.push({ type: "ORDER", order_id: order.id })

        // Step 3: Publish event to Kafka (triggers async payment)
        await kafka.produce("order_placed", {
            order_id: order.id, items: cart_items,
            payment_method, total_cents: cart_total
        })   // ~5ms

        return { status: 202, order_id: order.id }  // ~60ms total

    catch (error):
        // Compensate all completed steps in reverse order
        for step in saga_steps_completed.reverse():
            if step.type == "RESERVE":
                await redis.incrby("inv:" + step.sku_id, step.quantity)
            if step.type == "ORDER":
                await releaseOrder(step.order_id, cart_items, "FAILED")
        throw error

// Shared DB compensation: release reserved_count and set a terminal failure status
async function releaseOrder(order_id, items, status):
    await db.transaction(async (tx) => {
        for item in items:
            await tx.execute(
                "UPDATE inventory SET reserved_count = reserved_count - $1 WHERE sku_id = $2",
                [item.quantity, item.sku_id])
        await tx.execute("UPDATE orders SET status=$1 WHERE id=$2", [status, order_id])
    })

// ASYNC PAYMENT: background worker (200-500ms, not in checkout path)
async function processPayment(event):
    order = event.payload
    payment = await stripe.charge({
        amount: order.total_cents,
        payment_method: order.payment_method,
        idempotency_key: order.order_id  // Prevents double-charge
    })
    if payment.success:
        await db.execute("UPDATE orders SET status='PAID', payment_id=$1 WHERE id=$2",
            [payment.id, order.order_id])
    else:
        // COMPENSATION: release inventory in Redis and PostgreSQL, mark order failed
        for item in order.items:
            await redis.incrby("inv:" + item.sku_id, item.quantity)
        await releaseOrder(order.order_id, order.items, "PAYMENT_FAILED")

Data Model — Redis + PostgreSQL

The data model splits across two storage technologies. Redis holds real-time inventory counters for atomic DECR/INCR (fast path). PostgreSQL holds durable order, inventory, and product records (source of truth). The separation is a CQRS-like pattern: Redis is optimized for the write-heavy reservation path, PostgreSQL is optimized for the read-heavy order status and product browsing paths.

The Kafka topics act as the coordination layer between the synchronous checkout path and the asynchronous payment path. The order_placed topic is the 'handoff' that transfers responsibility from the Checkout Service to the PaymentWorker.

Step-by-Step Walkthrough

  1. Redis inventory counters (inv:{sku_id}) provide atomic DECR/INCR for sub-millisecond reservation. Warmed from PostgreSQL on startup with 600s TTL refresh
  2. PostgreSQL inventory table is the durable source of truth. Updated in the same local transaction as the order INSERT. The reserved_count field tracks items in pending sagas
  3. Orders table stores checkout results with saga-driven status transitions: PENDING (created during checkout), PAID (updated by PaymentWorker), or PAYMENT_FAILED (compensation). The nullable payment_id is populated only after successful payment
  4. Kafka order_placed topic is the asynchronous handoff between checkout (synchronous) and payment (asynchronous). Partitioned by order_id for ordering guarantees per order
  5. The data flow is: Redis DECR → PostgreSQL INSERT → Kafka publish (synchronous checkout) → Kafka consume → Stripe charge → PostgreSQL UPDATE (asynchronous payment)
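
The per-order ordering guarantee comes from key-based partitioning: every event with the same order_id lands on the same partition. The sketch below illustrates the idea only. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2), and the partition count of 12 is an assumption borrowed from the scaling example elsewhere in this template.

```python
import zlib

NUM_PARTITIONS = 12  # assumed partition count for the order_placed topic


def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition.

    Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
    The property that matters is the same: equal keys map to equal partitions.
    """
    return zlib.crc32(order_id.encode("utf-8")) % num_partitions


# Two events for the same order always share a partition,
# so a single consumer sees them in publish order.
p_first = partition_for("order-123")
p_second = partition_for("order-123")
```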
Key Design Decisions
Redis DECR for Inventory Reservation

Choice

Atomic DECR in Redis instead of SELECT ... FOR UPDATE in PostgreSQL

Rationale

Redis DECR is O(1) and completes in sub-millisecond — no row-level locks, no transaction contention. At 2,500 checkouts/sec with 5 items per cart, that is 12,500 DECR/sec, well within Redis capacity. The PostgreSQL database serves as the source of truth (updated in the same saga step), while Redis provides the fast-path for atomic reservations. The trade-off: Redis and PostgreSQL can briefly disagree on stock counts — an acceptable eventual consistency window.

Async Payment via Kafka Worker

Choice

Payment processing in a background worker consuming Kafka events

Rationale

Moving payment from the checkout hot path to an async worker eliminates the 200-500ms lock hold time that makes the naive approach fail. The checkout response time drops from 800ms to under 200ms. The shopper sees 'Order Placed' immediately. The trade-off is that order status is 'pending' until the PaymentWorker processes the event (1-5 seconds). For e-commerce, this is a well-understood pattern — Amazon shows 'Order Placed' before payment is confirmed.

Compensating Transactions for Failure Handling

Choice

INCR Redis to release inventory on payment failure

Rationale

In a saga, each step commits independently. If payment fails, the inventory reservation must be explicitly released. Redis INCR reverses the DECR. This is simpler than the naive approach (which rolls back the entire transaction) but requires explicit compensation logic for every step. Missing a compensation creates an inventory leak — reserved stock that is never released.

CDN for Static Assets

Choice

CloudFront CDN with 90% cache hit rate for product images/CSS/JS

Rationale

During flash sales, 90% of traffic is browsing (product images, category pages). CDN edge caching offloads 27K+ RPS from the origin, reducing API Gateway and service load by 10x. Without CDN, the backend would need 10x more capacity. The 86,400-second TTL works because static assets are versioned (hash-based filenames).

Separate Product and Checkout Services

Choice

Two independent services with their own databases and caches

Rationale

In the monolith, flash-sale checkout traffic starved product browsing. Separate services mean checkout can monopolize its Redis and PostgreSQL without affecting product queries. Each service scales independently: 8 Checkout Service pods for checkout throughput, 6 Product Service pods for browsing capacity.

Scale & Performance

Target RPS

2,500 checkouts/sec (50K total with browsing)

Latency (p99)

< 200ms checkout, < 500ms payment (async)

Storage

~500 GB/year (inventory + orders + products)

Availability

99.9% (Redis replication + Kafka durability + multi-pod services)

Time & Space Complexity
| Operation | Time | Space | Notes |
| --- | --- | --- | --- |
| Checkout saga (happy path) | O(k) where k = cart items (k DECR operations) | O(1) per saga step | Each DECR is O(1) in Redis. Total: k * 1ms + 50ms DB INSERT + 5ms Kafka produce ≈ 60ms for a 5-item cart. |
| Payment processing (async) | O(1) per order (single Stripe API call) | O(1) per payment | 200-500ms external call but NOT in the checkout response path. Worker processes independently. |
| Compensation (payment failure) | O(k): k INCR operations to release inventory | O(1) | Compensation mirrors the reservation: one INCR per SKU in the cart. Total: k * 1ms ≈ 5ms for a 5-item cart. |
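
The latency arithmetic in the table can be written out as a small worked example. The per-step budgets are the figures quoted above (1ms per DECR, 50ms INSERT, 5ms produce, 800ms naive checkout); the function names are illustrative only.

```python
REDIS_OP_MS = 1.0        # per-SKU DECR/INCR budget from the table
DB_INSERT_MS = 50.0      # PostgreSQL order INSERT
KAFKA_PRODUCE_MS = 5.0   # order_placed publish
NAIVE_CHECKOUT_MS = 800.0


def estimate_checkout_ms(cart_items: int) -> float:
    """Happy path: k DECRs + one INSERT + one produce."""
    return cart_items * REDIS_OP_MS + DB_INSERT_MS + KAFKA_PRODUCE_MS


def estimate_compensation_ms(cart_items: int) -> float:
    """Payment failure: one INCR per SKU in the cart."""
    return cart_items * REDIS_OP_MS


checkout_ms = estimate_checkout_ms(5)            # 5 + 50 + 5
compensation_ms = estimate_compensation_ms(5)    # 5 * 1
speedup = NAIVE_CHECKOUT_MS / checkout_ms        # 800 / 60
```

For a 5-item cart this gives 60ms for checkout, 5ms for compensation, and a speedup of about 13.3x over the 800ms baseline.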
Database Schema (HLD)
inventory (Redis)

Real-time inventory counts stored in Redis for atomic DECR/INCR operations. Each SKU has a counter key (inv:{sku_id}) with the current available stock. DECR for reservation, INCR for compensation. The PostgreSQL inventory table is the durable source of truth, updated in the same saga step.

key: inv:{sku_id}
value: INTEGER (stock count)
TTL: 600 seconds (warm from DB)

Redis DECR is atomic and O(1) — no locks, no contention. If Redis returns negative after DECR, the reservation failed (out of stock) and the saga compensates immediately.

orders (PostgreSQL)

Order records with saga-driven status transitions. Created with status=PENDING during checkout, updated to PAID by PaymentWorker on success, or PAYMENT_FAILED on failure. Unlike the naive approach, orders exist in intermediate states during the saga.

order_id UUID PK
user_id UUID
items JSONB
total_cents BIGINT
status TEXT
payment_id UUID (nullable)
created_at TIMESTAMPTZ

Indexes: PK on order_id, idx_user_orders ON (user_id, created_at DESC), idx_status ON (status)

Status transitions: PENDING → PAID → SHIPPED → DELIVERED, or PENDING → PAYMENT_FAILED. The PENDING state is new vs. the naive approach (which has no intermediate states).
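
The transitions listed above can be enforced with a small state machine. This is a sketch under the assumption that only the listed transitions are legal; the FAILED status used by the checkout-path compensation is left out for brevity, and `transition` is a hypothetical helper name.

```python
from enum import Enum


class OrderStatus(str, Enum):
    PENDING = "PENDING"
    PAID = "PAID"
    SHIPPED = "SHIPPED"
    DELIVERED = "DELIVERED"
    PAYMENT_FAILED = "PAYMENT_FAILED"


# Allowed next states for each status, taken from the transitions listed above.
ALLOWED_TRANSITIONS = {
    OrderStatus.PENDING: {OrderStatus.PAID, OrderStatus.PAYMENT_FAILED},
    OrderStatus.PAID: {OrderStatus.SHIPPED},
    OrderStatus.SHIPPED: {OrderStatus.DELIVERED},
    OrderStatus.DELIVERED: set(),
    OrderStatus.PAYMENT_FAILED: set(),
}


def transition(current: OrderStatus, new: OrderStatus) -> OrderStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Rejecting illegal moves in one place guards against a replayed or late event moving an order backwards, for example from PAID to PENDING.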

inventory (PostgreSQL)

Durable inventory records. Updated in the same saga step as the Redis DECR. Contains stock_count (total available) and reserved_count (pending payment). The difference (stock_count - reserved_count) is the true available stock.

sku_id UUID PK
stock_count INTEGER
reserved_count INTEGER
updated_at TIMESTAMPTZ

Indexes: PK on sku_id

Reserved_count tracks items reserved in pending sagas. A background job releases reservations for orders that stay in PENDING > 30 minutes (abandoned checkout safety net).
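
A minimal sketch of the two calculations described here: available stock as stock_count minus reserved_count, and the selection step of the abandoned-checkout job. Orders are modelled as plain dicts and the function names are hypothetical; a real job would query PostgreSQL and then release the reservations it finds.

```python
from datetime import datetime, timedelta, timezone
from typing import List

PENDING_TIMEOUT = timedelta(minutes=30)  # threshold stated above


def available_stock(stock_count: int, reserved_count: int) -> int:
    """True available stock per the inventory table definition."""
    return stock_count - reserved_count


def find_stale_pending(orders: List[dict], now: datetime) -> List[str]:
    """Return order_ids that have been PENDING for longer than the timeout."""
    return [
        o["order_id"]
        for o in orders
        if o["status"] == "PENDING" and now - o["created_at"] > PENDING_TIMEOUT
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"order_id": "a", "status": "PENDING", "created_at": now - timedelta(minutes=45)},
    {"order_id": "b", "status": "PENDING", "created_at": now - timedelta(minutes=5)},
    {"order_id": "c", "status": "PAID", "created_at": now - timedelta(hours=2)},
]
stale = find_stale_pending(orders, now)
```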

Event Contracts
Order Placed Event: order_placed

Published by Checkout Service after successful inventory reservation + order INSERT. Consumed by PaymentWorker (charge payment) and NotificationWorker (send confirmation email).

Key Schema

order_id (STRING)

Value Schema

{ order_id, user_id, items: [{sku_id, quantity, price_cents}], total_cents, payment_method, created_at }

Payment Result Event: payment_result

Published by PaymentWorker after charging payment. On success, Checkout Service updates order status to PAID. On failure, triggers compensation: Redis INCR (release inventory) + order status to PAYMENT_FAILED.

Key Schema

order_id (STRING)

Value Schema

{ order_id, status: 'success' | 'failed', transaction_id, error_code?, error_message? }
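
The two value schemas above could be modelled as typed records, sketched here with dataclasses. Field names follow the contracts; the field types, the ISO-8601 string for created_at, and the JSON encoding are assumptions, since the template does not specify a wire format. `to_kafka_record` is a hypothetical helper.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Tuple


@dataclass
class LineItem:
    sku_id: str
    quantity: int
    price_cents: int


@dataclass
class OrderPlaced:
    order_id: str
    user_id: str
    items: List[LineItem]
    total_cents: int
    payment_method: str
    created_at: str  # assumed ISO-8601 string


@dataclass
class PaymentResult:
    order_id: str
    status: str  # 'success' | 'failed'
    transaction_id: Optional[str] = None
    error_code: Optional[str] = None
    error_message: Optional[str] = None


def to_kafka_record(event) -> Tuple[bytes, bytes]:
    """Serialize an event to (key, value); order_id is the key in both contracts."""
    key = event.order_id.encode("utf-8")
    value = json.dumps(asdict(event)).encode("utf-8")
    return key, value


placed = OrderPlaced(
    order_id="order-123",
    user_id="user-9",
    items=[LineItem(sku_id="sku-1", quantity=2, price_cents=1500)],
    total_cents=3000,
    payment_method="pm_example",
    created_at="2024-01-01T12:00:00Z",
)
key, value = to_kafka_record(placed)
```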

What-If Scenarios

Payment provider (Stripe) has a 30-second outage

Impact

PaymentWorker backs up — orders accumulate in 'pending' state. Inventory remains reserved in Redis. Shoppers see 'Order Placed' but payment confirmation is delayed. After recovery, the worker drains the backlog and processes all pending payments. No orders are lost.

Mitigation

Circuit breaker on the PaymentWorker stops calling Stripe after 5 consecutive failures. Orders in pending state are retried when the circuit closes. Shoppers are notified of the delay via NotificationWorker.
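
The circuit breaker described in this mitigation can be sketched as a small class. The threshold of 5 consecutive failures comes from the text; the 30-second reset window and the half-open trial behaviour are assumptions, and the class is illustrative, not a specific library's API.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cool-down."""

    def __init__(self, failure_threshold=5, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s  # assumed cool-down, not from the template
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: after the cool-down, let a trial call through.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # (Re)open the circuit and restart the cool-down.
            self.opened_at = self.clock()
```

A worker would check `allow_request()` before calling the provider and leave the event uncommitted while the circuit is open, so the order stays pending and is retried once the circuit closes.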

Redis cache crashes and loses all inventory counts

Impact

DECR operations fail — all checkout attempts fail with 'service unavailable.' Product browsing continues (served from CDN + ProductCache). On Redis recovery, inventory counts are warmed from PostgreSQL (takes ~5 seconds for 10K SKUs). During the 5-second warmup, checkouts are blocked.

Mitigation

Redis cluster with 3 replicas (automatic failover in <1 second). Warmup from PostgreSQL on startup with single_flight to prevent thundering herd on cache miss.

Kafka consumer group rebalance during flash sale

Impact

PaymentWorker loses its assigned partitions for 30-60 seconds during rebalance. Events accumulate in Kafka but are not lost. After rebalance, the new consumer resumes from the last committed offset. Orders in 'pending' state for 30-60 seconds longer than normal.

Mitigation

Kafka consumer group with cooperative-sticky assignor minimizes rebalance duration. Multiple PaymentWorker instances ensure other partitions continue processing during rebalance.

Flash sale: 5,000 users checkout the same limited SKU (100 units available)

Impact

The first 100 Redis DECR operations succeed (stock goes from 100 to 0). The next 4,900 DECR operations return negative, so the saga immediately compensates (INCR back) and returns 'out of stock.' Total time for all 5,000 attempts: ~50ms. Redis executes commands one at a time on a single thread, which is what makes each DECR atomic, but each command takes only microseconds. Compare with the naive approach: 5,000 transactions queuing for row locks, 30+ minutes to process.

Mitigation

Redis DECR naturally handles this — atomic operations cannot oversell. The 4,900 'out of stock' responses complete in milliseconds, not minutes.
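
The flash-sale scenario can be simulated without Redis. In the sketch below, `AtomicCounter` is a hypothetical in-memory stand-in whose lock mimics the atomicity of Redis DECR/INCR; it is not a Redis client. Each of the 5,000 simulated shoppers tries to reserve one unit of a SKU with 100 units in stock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AtomicCounter:
    """In-memory stand-in for a Redis integer key with atomic DECR/INCR."""

    def __init__(self, value: int):
        self._value = value
        self._lock = threading.Lock()

    def decr(self) -> int:
        with self._lock:
            self._value -= 1
            return self._value

    def incr(self) -> int:
        with self._lock:
            self._value += 1
            return self._value

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def try_reserve(counter: AtomicCounter) -> bool:
    """Decrement first, then compensate if the result went negative."""
    if counter.decr() < 0:
        counter.incr()  # compensation: release the failed reservation
        return False
    return True


stock = AtomicCounter(100)
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(lambda _: try_reserve(stock), range(5000)))

successes = sum(results)
```

Because every decrement returns its own post-decrement value atomically, exactly 100 attempts succeed and the counter ends at 0, regardless of thread interleaving.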

Failure Modes & Resilience
| Component | Failure | Impact | Mitigation |
| --- | --- | --- | --- |
| Inventory Cache (Redis) | Cache failure / memory exhaustion | All checkout operations fail. Product browsing unaffected (separate cache). | Redis cluster with replication. Failover in <1s. Warmup from PostgreSQL on recovery. |
| Kafka / Order Events | Broker failure or partition offline | Checkout Service cannot publish order_placed events. Checkouts fail at step 3 (compensate: INCR Redis, mark order FAILED). | Kafka replication factor 3. Single broker failure has no impact. Multi-broker failure: circuit breaker stops checkouts, preventing inventory leak. |
| PaymentWorker | Worker crash or scaling to zero | Orders accumulate in 'pending' state indefinitely. Inventory reserved but not charged. Shoppers see 'Order Placed' but never receive payment confirmation. | Auto-scaling based on Kafka consumer lag. Alert on pending orders older than 5 minutes. Manual compensation process for stuck orders. |
| Checkout Service | Pod crash mid-saga (after DECR, before Kafka publish) | Inventory decremented in Redis but no order created and no event published. Orphaned reservation. | Background reconciliation job compares Redis reservations with PostgreSQL orders. Orphaned reservations (no matching order after 5 minutes) are released via INCR. |
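
The reconciliation job in the last row could take several forms; the template does not say how reservations are tracked. The sketch below is one simplified, count-based variant, assuming the Redis counter should equal stock_count minus reserved_count from PostgreSQL. It ignores the 5-minute grace period, which a real job would need so that in-flight sagas are not released. `find_orphaned_units` is a hypothetical name.

```python
from typing import Dict


def find_orphaned_units(
    redis_available: Dict[str, int],
    pg_inventory: Dict[str, dict],
) -> Dict[str, int]:
    """Return, per SKU, how many units Redis holds back beyond PostgreSQL's reservations.

    Assumption: the Redis counter is expected to equal stock_count - reserved_count.
    """
    orphaned = {}
    for sku_id, row in pg_inventory.items():
        expected = row["stock_count"] - row["reserved_count"]
        actual = redis_available.get(sku_id)
        if actual is not None and actual < expected:
            orphaned[sku_id] = expected - actual  # units to release via INCRBY
    return orphaned


pg_inventory = {
    "sku-1": {"stock_count": 100, "reserved_count": 10},
    "sku-2": {"stock_count": 50, "reserved_count": 0},
}
redis_available = {"sku-1": 88, "sku-2": 50}
drift = find_orphaned_units(redis_available, pg_inventory)
```

Here sku-1 shows 88 available in Redis against an expected 90, so two units are flagged as orphaned, while sku-2 is consistent.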
Scaling Strategy

Horizontal scaling at every tier. Checkout Service: auto-scale on CPU utilization (target 60%, scale-out at 70%). Product Service: auto-scale on request count. PaymentWorker: auto-scale on Kafka consumer lag (scale out when lag > 500 events). Redis: add read replicas for browsing cache. Kafka: add partitions for higher throughput (requires consumer rebalance). CDN: automatic edge scaling. Key bottleneck: Kafka partition count limits maximum parallelism — 12 partitions means max 12 PaymentWorker instances processing in parallel.
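
The partition ceiling on PaymentWorker parallelism can be made concrete with a small sizing function. The 12-partition cap and the 500-event lag figure come from the paragraph above; treating 500 events as a per-worker lag target is an assumption made for this sketch, as is the function name.

```python
import math


def desired_workers(
    consumer_lag: int,
    partitions: int = 12,
    lag_per_worker: int = 500,
    min_workers: int = 1,
) -> int:
    """Size the worker pool from consumer lag, capped by the partition count.

    Workers beyond the partition count would sit idle, because each partition
    is consumed by at most one member of a consumer group.
    """
    wanted = max(min_workers, math.ceil(consumer_lag / lag_per_worker))
    return min(wanted, partitions)
```

With these defaults, a lag of 1,200 events asks for 3 workers, and any lag above 6,000 events is capped at 12.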

Monitoring & Alerting

Key metrics: (1) Checkout saga completion rate — alert if < 99%, (2) PaymentWorker consumer lag — alert if > 1000 events (indicates worker scaling issue), (3) Redis DECR failure rate (out of stock vs. Redis errors), (4) order status distribution (% PENDING vs PAID vs FAILED — growing PENDING indicates payment worker issue), (5) compensation rate — high compensation rate indicates payment provider issues or fraud. Dashboard: real-time saga state flow diagram, Kafka consumer lag per partition, inventory stock levels per SKU, payment success/failure rate by provider.

Cost Analysis

8-component architecture: CDN (~$50/month), API Gateway (~$30/month), ALB (~$25/month), 8 Checkout Service pods 4vCPU/8GB (~$800/month), 6 Product Service pods 4vCPU/8GB (~$600/month), 2 Redis caches r7g.large (~$300/month), 2 PostgreSQL RDS r7g.large (~$500/month), Kafka MSK m7g.large (~$200/month), 4 PaymentWorker Fargate tasks (~$100/month), Lambda NotificationWorker (~$5/month). Total: ~$1,800/month. About 2x the naive approach but handles 25x more checkout throughput. Cost per order: ~$0.0012 at 2,500 orders/min sustained.

Security Considerations

Payment tokens (Stripe payment method IDs) flow through the Checkout Service to Kafka to PaymentWorker. PCI scope includes the Checkout Service, Kafka topic, and PaymentWorker — 3 components vs. the entire monolith in the naive approach. Kafka messages should be encrypted at rest (MSK encryption). Payment tokens should be short-lived (Stripe's payment intents expire). The PaymentWorker should run in an isolated VPC subnet with no public internet access except to Stripe's API endpoints. CDN should enforce HTTPS-only with HSTS headers.

Deployment Strategy

Independent service deployments. Checkout Service: ECS rolling update with drain connections (zero-downtime). Product Service: independent rolling update. PaymentWorker: ECS task update with graceful shutdown (finish current event before stopping). Kafka: MSK manages broker updates (rolling restart). Redis: ElastiCache managed updates (failover during patching). Database: RDS managed maintenance window. Canary deployments are possible for Checkout Service (route 10% of traffic to new version via ALB weighted target groups).

Real-World Examples
  • Amazon's order pipeline uses saga orchestration with SQS/SNS for async payment and fulfillment
  • Shopify's checkout API separates inventory reservation from payment processing with idempotent retry
  • Stripe's own payment links use async confirmation webhooks — the merchant gets immediate 'order placed' before payment settles
  • DoorDash uses Kafka-based saga orchestration for order → restaurant → driver coordination
Solution Comparison
| Variant | Tier | Latency | Throughput | Cost | Complexity | Reliability |
| --- | --- | --- | --- | --- | --- | --- |
| Naive (Monolith + ACID) | T1 | 800ms-3s p99 | ~100 orders/min | ~$925/month | Low | 99% (single DB) |
| Saga Orchestrator | T2 | 200-500ms p99 | ~2,500 orders/min | ~$1,800/month | Medium | 99.9% |
| Distributed Saga + Outbox | T4 | < 3s end-to-end | 10K+ orders/min | ~$2,800/month | Very High | 99.99% |

This template is for educational and illustration purposes only. It may not represent the optimal production design for this problem. Real-world systems involve additional considerations (compliance, specific cloud provider constraints, organizational requirements) not captured here. Use this as a starting point for discussion, not as a production blueprint.

Frequently Asked Questions
How does the saga handle a PaymentWorker crash mid-processing?

Kafka tracks the consumer offset — the position in the event stream where the worker last committed. If the worker crashes before committing the offset, the event is redelivered when the worker restarts (or when another worker picks up the partition). Because payment is idempotent (keyed by order_id), the redelivered event does not create a duplicate charge. The order stays in 'pending' state until the worker successfully processes it.

What happens if Redis and PostgreSQL disagree on inventory counts?

This can happen if the Redis DECR succeeds but the PostgreSQL UPDATE fails (saga step 2 fails and compensates). During the brief window before compensation, Redis shows lower stock than PostgreSQL. The saga compensates within milliseconds (INCR Redis). In the worst case, a transient disagreement means a shopper is told 'out of stock' when stock exists — underselling, not overselling. This is acceptable because overselling (charging a customer for stock that doesn't exist) is far worse.

Why is the checkout latency so much better than the naive approach?

The naive approach holds database row locks for 200-500ms during payment. The saga approach replaces this with three fast steps: Redis DECR (~2ms) + PostgreSQL INSERT (~50ms) + Kafka produce (~5ms) = ~57ms total. Payment happens asynchronously, completely outside the checkout response path. The more than 10x improvement over the naive approach's ~800ms comes from removing payment processing from the critical path, so the user gets instant feedback.

How does the CDN handle flash-sale traffic spikes?

CDN edge nodes are designed for traffic spikes and scale automatically with demand. During a flash sale, product images, CSS, and JS are served from edge caches without a round trip to the origin. Only 10% of requests (catalog data, checkout, order status) reach the origin. Without CDN, a 5x traffic spike would require 5x origin capacity. With CDN, the origin sees only a 1.5x increase (the non-cacheable portion).

When should you upgrade from this saga pattern to the distributed saga with outbox?

Upgrade when: (1) you need independent scaling per domain (inventory scales differently than payment), (2) you need stronger exactly-once guarantees (the outbox pattern prevents the dual-write problem), (3) you experience Kafka consumer rebalance issues at scale, or (4) you need compensation for multi-step operations across more than 3 services. Below 5K orders/min, this saga pattern is simpler and sufficient.
