Learn how the saga pattern manages distributed transactions across microservices using a sequence of local transactions with compensating actions, avoiding the need for distributed ACID transactions.
In a monolithic application, a business operation like 'place an order' can be a single ACID transaction: debiting the customer's account, reserving inventory, and creating the order record all succeed or all fail atomically. In a microservices architecture where each service owns its own database, this guarantee disappears. The Payment Service, Inventory Service, and Order Service each have their own database, and the only way to wrap one transaction across all three is distributed two-phase commit (2PC). 2PC is slow because participants hold locks until the coordinator decides, brittle because a coordinator failure can leave participants blocked, and unsupported by many modern datastores and message brokers.
The saga pattern, originally described by Hector Garcia-Molina and Kenneth Salem in 1987, solves this problem by decomposing the distributed transaction into a sequence of local transactions, each executed within a single service's database. If the payment succeeds but inventory reservation fails, a compensating transaction refunds the payment. Each step either advances the saga forward or triggers compensating actions to undo all previously completed steps, bringing the system back to a consistent state.
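The forward-or-compensate rule can be sketched in a few lines. This is a minimal in-process illustration, not a production design: the `Step` type, the `run_saga` function, and the lambda "services" are hypothetical names invented for the example, and real steps would be calls to separate services with their own databases.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # local transaction inside one service
    compensate: Callable[[], None]  # undoes the business effect of the action


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
            completed.append(step)
        except Exception:
            # The failed step made no change, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
    return True


log: List[str] = []


def reserve_inventory() -> None:
    # Simulated failure of the second local transaction.
    raise RuntimeError("out of stock")


steps = [
    Step("payment", lambda: log.append("charge"), lambda: log.append("refund")),
    Step("inventory", reserve_inventory, lambda: log.append("release")),
    Step("order", lambda: log.append("create order"), lambda: log.append("cancel order")),
]
succeeded = run_saga(steps)
print(succeeded, log)
```

Here the payment is charged, the inventory reservation fails, and the only compensation that runs is the refund, because the later steps never started.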
There are two coordination approaches. In choreography-based sagas, each service listens for events and reacts independently. The Order Service publishes 'OrderCreated'. The Payment Service hears it, charges the customer, and publishes 'PaymentCompleted'. The Inventory Service hears that event and reserves stock. If any step fails, the failing service publishes a failure event that triggers compensating actions in the upstream services. Choreography works well for simple sagas with 3-4 steps, but it becomes difficult to understand and debug as the number of steps grows, because the flow logic is distributed across services.
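A choreographed flow might look like the following sketch, which uses an in-memory event bus in place of a real message broker. Only 'OrderCreated' and 'PaymentCompleted' come from the text above; the `EventBus` class, the other event names, and the `STOCK` dictionary are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """Toy synchronous bus; a real system would use a durable broker."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict) -> None:
        self.history.append(event_type)
        for handler in list(self.handlers[event_type]):
            handler(payload)


bus = EventBus()
STOCK = {"widget": 0}  # hypothetical inventory: nothing in stock


def on_order_created(order: Dict) -> None:
    # Payment Service: assume the charge succeeds.
    bus.publish("PaymentCompleted", order)


def on_payment_completed(order: Dict) -> None:
    # Inventory Service: reserve stock or announce failure.
    if STOCK.get(order["sku"], 0) >= order["qty"]:
        STOCK[order["sku"]] -= order["qty"]
        bus.publish("InventoryReserved", order)
    else:
        bus.publish("InventoryReservationFailed", order)


def on_inventory_failed(order: Dict) -> None:
    # Payment Service compensates its own earlier step.
    bus.publish("PaymentRefunded", order)


def on_payment_refunded(order: Dict) -> None:
    # Order Service compensates by cancelling the order.
    bus.publish("OrderCancelled", order)


bus.subscribe("OrderCreated", on_order_created)
bus.subscribe("PaymentCompleted", on_payment_completed)
bus.subscribe("InventoryReservationFailed", on_inventory_failed)
bus.subscribe("PaymentRefunded", on_payment_refunded)

bus.publish("OrderCreated", {"order_id": "o-1", "sku": "widget", "qty": 1})
print(bus.history)
```

No single function describes the whole flow; it only emerges from the subscriptions, which is the debugging cost the paragraph above refers to.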
Orchestration-based sagas use a central saga coordinator (orchestrator) that explicitly defines the step sequence. The orchestrator tells each service what to do and what to do if it fails, like a workflow engine. The Order Saga Orchestrator calls the Payment Service, waits for the result, calls the Inventory Service, waits for the result, and if any step fails, it sends compensating commands to services that have already completed their steps. Orchestration is easier to understand, test, and monitor because the entire flow is visible in one place, but it introduces a coordination point that can become a bottleneck or single point of failure.
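For contrast, an orchestrated version keeps the step sequence and the compensating commands in one place. The sketch below is illustrative only: `FakeService`, `OrderSagaOrchestrator`, the command names, and the `trace` list are hypothetical, and a real orchestrator would send commands over the network and persist its state.

```python
from enum import Enum
from typing import List, Tuple


class SagaState(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    COMPENSATED = "compensated"


class FakeService:
    """Stand-in for a remote participant; records the commands it receives."""

    def __init__(self, name: str, fail_on: str = "") -> None:
        self.name = name
        self.fail_on = fail_on
        self.received: List[str] = []

    def handle(self, command: str) -> bool:
        self.received.append(command)
        return command != self.fail_on


class OrderSagaOrchestrator:
    def __init__(self, payment: FakeService, inventory: FakeService) -> None:
        # The whole flow is declared here: (service, command, compensating command).
        self.plan: List[Tuple[FakeService, str, str]] = [
            (payment, "charge", "refund"),
            (inventory, "reserve", "release"),
        ]
        self.state = SagaState.RUNNING
        self.trace: List[str] = []  # one entry per command sent, for visibility

    def run(self) -> SagaState:
        done: List[Tuple[FakeService, str]] = []
        for service, command, undo in self.plan:
            ok = service.handle(command)
            self.trace.append(f"{service.name}.{command}:{'ok' if ok else 'failed'}")
            if not ok:
                # Send compensating commands to completed steps, newest first.
                for prev_service, prev_undo in reversed(done):
                    prev_service.handle(prev_undo)
                    self.trace.append(f"{prev_service.name}.{prev_undo}:ok")
                self.state = SagaState.COMPENSATED
                return self.state
            done.append((service, undo))
        self.state = SagaState.COMPLETED
        return self.state


payment = FakeService("payment")
inventory = FakeService("inventory", fail_on="reserve")
saga = OrderSagaOrchestrator(payment, inventory)
final_state = saga.run()
print(final_state, saga.trace)
```

The `trace` list plays the role of the single point of visibility: the status of every step can be read from the orchestrator without querying the participants.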
The Travel Booking Analogy
Booking a vacation involves three separate services: flight booking, hotel reservation, and car rental. In a saga, you book the flight first (step 1), then reserve the hotel (step 2), then rent the car (step 3). If the car rental fails because no cars are available, you need to compensate: cancel the hotel reservation (undo step 2) and cancel the flight booking (undo step 1). Each booking system is independent with its own database -- there is no way to wrap all three in a single transaction. The saga ensures that either all three bookings succeed, or any completed bookings are cancelled, leaving you in a consistent state (no partial vacation bookings).
Uber
Uber uses an orchestration-based saga for trip processing, built on Cadence, the workflow engine Uber developed in-house and from which Temporal was later forked. The saga coordinates rider matching, driver notification, fare calculation, payment processing, and receipt generation. If payment fails after a completed ride, a compensating action adjusts the rider's balance and notifies the driver about payment issues. The orchestrator processes millions of trip sagas daily with full visibility into each step's status.
Airbnb
Airbnb's booking system uses sagas to coordinate across the Reservation Service, Payment Service, Host Notification Service, and Calendar Service. When a guest books a stay, the saga reserves the dates, processes the payment, and notifies the host. If the host declines within the acceptance window, compensating actions refund the payment and release the calendar dates. Their saga orchestrator handles approximately 1 million booking attempts per day across 220 countries.
DoorDash
DoorDash uses Cadence (the workflow engine from which Temporal was forked) to orchestrate order fulfillment sagas. A single order saga coordinates across 10+ services, including order validation, restaurant confirmation, driver assignment, payment authorization, delivery tracking, and final settlement. Each step has a timeout and a compensating action. If the restaurant cannot fulfill the order after payment is authorized, the saga triggers a payment reversal followed by a customer notification, in that order.
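Per-step timeouts with ordered compensation can be sketched with plain `asyncio`. This is not how Cadence or Temporal expose timeouts; the function names, the timeout values, and the `events` list are assumptions chosen to keep the example small and self-contained.

```python
import asyncio
from typing import Awaitable, Callable, List

events: List[str] = []


async def authorize_payment() -> None:
    events.append("payment authorized")


async def confirm_with_restaurant() -> None:
    # Simulates a restaurant that does not answer within the step's timeout.
    await asyncio.sleep(1.0)
    events.append("restaurant confirmed")


async def reverse_payment() -> None:
    events.append("payment reversed")


async def notify_customer() -> None:
    events.append("customer notified")


async def run_step(action: Callable[[], Awaitable[None]], timeout_s: float) -> bool:
    """Return True if the step finished within its timeout, False otherwise."""
    try:
        await asyncio.wait_for(action(), timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        return False


async def order_saga() -> str:
    if not await run_step(authorize_payment, 0.5):
        return "failed"  # nothing has completed yet, so nothing to compensate
    if not await run_step(confirm_with_restaurant, 0.05):
        # Compensate in a fixed order: money first, then the message.
        await reverse_payment()
        await notify_customer()
        return "compensated"
    return "completed"


outcome = asyncio.run(order_saga())
print(outcome, events)
```

One caveat the sketch glosses over: a timeout does not reveal whether the remote step actually took effect, so real compensating actions generally need to be safe to run even when the original step never happened.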
| Aspect | Description |
|---|---|
| Availability vs Consistency | Sagas avoid distributed locks, allowing each service to remain available and responsive even during multi-step transactions. The trade-off is that the system is in an inconsistent intermediate state between steps. A customer might briefly see a debited account before inventory is confirmed, requiring careful UX to manage expectations. |
| Choreography vs Orchestration | Choreography is more decoupled and avoids a central coordinator, but the flow logic is scattered across services, making it hard to understand, test, and debug. Orchestration centralizes the flow logic for visibility and testability, but introduces a coordination point that all participants depend on. |
| Simplicity vs Compensation Complexity | Sagas avoid the complexity of distributed 2PC transactions, but compensating transactions can be surprisingly difficult to implement. Some operations are not easily reversible (e.g., sending an email, shipping a package). Semantic compensation ('send a cancellation email') is often the only option, and designing these for all failure scenarios requires careful domain analysis. |
| Scalability vs Observability | Choreography-based sagas scale well because there is no central bottleneck, but observing the state of a saga requires correlating events across multiple services. Orchestration-based sagas have a single point of visibility (the orchestrator's state machine) but the coordinator must scale to handle all saga instances. |
Order Processing Saga at a Large Retailer
Scenario
A major online retailer's order processing involved five services: Order, Payment, Inventory, Shipping, and Loyalty. The original design used synchronous REST calls in a chain: Order -> Payment -> Inventory -> Shipping -> Loyalty. If any downstream service was slow or unavailable, the entire chain blocked. Payment timeouts caused duplicate charges, and partial failures left orders in inconsistent states -- charged but not shipped, or shipped but not charged. Error rates during peak traffic reached 5%, resulting in thousands of customer complaints daily.
Solution
The team implemented an orchestration-based saga using a dedicated Saga Orchestrator service backed by a durable state machine (Temporal). Each step became an asynchronous operation with explicit timeouts, retry policies, and compensating actions. The Payment step had a 'refund' compensator, the Inventory step had a 'release reservation' compensator, and the Shipping step had a 'cancel shipment' compensator. All participants were made idempotent using idempotency keys. The orchestrator persisted its state durably, so in-flight sagas could survive orchestrator restarts.
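The idempotency-key technique mentioned above can be illustrated as follows. The `PaymentService` class, the key format, and the in-memory dictionary are hypothetical; a real participant would presumably store the key and the charge in the same local database transaction so that the two cannot diverge.

```python
from typing import Dict


class PaymentService:
    """Toy participant that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.total_charged_cents = 0
        self._results: Dict[str, str] = {}  # idempotency key -> receipt id

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        # A retried request with a known key returns the stored result
        # instead of charging the customer a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.total_charged_cents += amount_cents
        receipt = f"receipt-{len(self._results) + 1}"
        self._results[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("order-42:charge", 1999)
retry = service.charge("order-42:charge", 1999)  # e.g. orchestrator retry after a timeout
print(first, retry, service.total_charged_cents)
```

Because the retry carries the same key, the orchestrator can resend a command after a timeout or restart without risking the duplicate charges described in the scenario.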
Outcome
Order processing error rates dropped from 5% to 0.01%. During peak traffic events (Black Friday), the saga-based system processed 50,000 orders per minute, with every order eventually reaching a consistent state: either fully completed or fully compensated. Partial failures were automatically compensated within seconds, eliminating the backlog of inconsistent orders that previously required manual intervention. The orchestrator's state machine provided full visibility into every order's saga progress, reducing customer support investigation time from 15 minutes to 30 seconds.
1. What is the primary difference between a saga and a distributed two-phase commit (2PC) transaction?
2. Why must saga participants be idempotent?