Vetora logo
⚖️Consistency & Transactions

Two-Phase Commit (2PC)

Two-phase commit (2PC) is a distributed transaction protocol that ensures all participating nodes either commit or abort a transaction atomically. A coordinator asks all participants to prepare (Phase 1), then instructs them to commit or abort based on their responses (Phase 2). 2PC guarantees atomicity but blocks if the coordinator fails after Phase 1.

Overview

Two-phase commit is the classic protocol for achieving atomic transactions across multiple databases, services, or resource managers. The problem it solves is straightforward: when a business operation spans multiple data stores (e.g., debiting Account A in Database 1 and crediting Account B in Database 2), either both changes must be applied or neither. Without an atomic commit protocol, a failure between the two writes could leave the system in an inconsistent state -- money debited but not credited.

2PC works in two phases. In Phase 1 (Prepare/Vote), the coordinator sends a 'prepare' message to all participants. Each participant validates the transaction against its local constraints, acquires necessary locks, writes a prepare record to its WAL (write-ahead log), and responds with either 'yes' (ready to commit) or 'no' (abort). Importantly, after voting 'yes,' a participant has promised to commit if asked -- it must hold its locks and not unilaterally abort. In Phase 2 (Commit/Abort), if all participants voted 'yes,' the coordinator writes a commit decision to its own WAL and sends 'commit' to all participants. If any voted 'no,' the coordinator sends 'abort.' Participants apply or rollback the transaction and release their locks.

The fundamental weakness of 2PC is blocking on coordinator failure. If the coordinator crashes after receiving all 'yes' votes but before sending the commit/abort decision, all participants are stuck: they have voted 'yes,' are holding locks, and cannot proceed without the coordinator's decision. They cannot unilaterally commit (the coordinator might have decided to abort) or abort (the coordinator might have decided to commit). The participants must wait until the coordinator recovers, potentially blocking other transactions that need the locked resources. This blocking behavior can last seconds to hours depending on recovery time.

Despite this limitation, 2PC is widely used for cross-database transactions in traditional enterprise systems via the XA protocol (X/Open Distributed Transaction Processing standard). The XA protocol defines a standard interface between a transaction manager (coordinator) and resource managers (participants -- databases, message queues). Java's JTA (Java Transaction API) and .NET's TransactionScope both use XA under the hood. However, 2PC is rarely used in microservices architectures because the blocking behavior, performance overhead, and tight coupling between services contradict microservice design principles. The Saga pattern has emerged as the preferred alternative.

Key Points
  • 12PC guarantees atomicity: either all participants commit or all abort. There is no state where some have committed and others have not (assuming no Byzantine failures and no permanent loss of the coordinator's WAL).
  • 2After voting 'yes,' a participant enters an 'in-doubt' state and must hold locks until the coordinator's decision arrives. This is the source of 2PC's blocking problem.
  • 3The coordinator's commit/abort decision is the 'point of no return.' Once written to the coordinator's WAL, the decision is final. Even if the coordinator crashes immediately after, upon recovery it will re-send the decision.
  • 43PC (three-phase commit) was designed to eliminate the blocking problem by adding a 'pre-commit' phase. However, 3PC does not work in asynchronous networks (it relies on timeout-based assumptions) and is rarely used in practice.
  • 5XA transactions use 2PC across heterogeneous databases. A Java application can atomically commit to both PostgreSQL and MySQL via a JTA transaction manager. The cost is increased latency (2 round-trips), held locks, and operational complexity.
  • 6The Saga pattern replaces 2PC in microservices by breaking a distributed transaction into a sequence of local transactions with compensating actions (undo operations). Sagas are non-blocking but provide weaker guarantees (eventual consistency, not atomicity).
Simple Example

Bank Transfer Between Two Databases

Transferring $100 from Bank A (Database 1) to Bank B (Database 2). The coordinator sends 'prepare: debit $100' to DB1 and 'prepare: credit $100' to DB2. DB1 checks Alice's balance (sufficient), locks the row, writes a prepare record, and responds 'yes.' DB2 does the same for Bob's account and responds 'yes.' The coordinator records 'commit' in its WAL and sends 'commit' to both. Both databases apply the changes and release locks. If DB1 had responded 'no' (insufficient funds), the coordinator would send 'abort' to both, and no money moves.

Real-World Examples

Google Spanner

Spanner uses an optimized variant of 2PC for cross-shard transactions. The Paxos leader of each shard acts as a participant. One shard is designated as the coordinator. Spanner reduces 2PC's latency by piggybacking the prepare phase on the Paxos replication of write intents. The coordinator also uses TrueTime to assign a globally consistent commit timestamp. Spanner's 2PC is less prone to blocking because each participant is itself a Paxos group -- if the coordinator node fails, its Paxos group elects a new leader that resumes the protocol.

MySQL + XA Transactions

MySQL supports XA (distributed) transactions via XA START, XA END, XA PREPARE, and XA COMMIT statements. An application transaction manager (like Atomikos or Narayana) coordinates 2PC across multiple MySQL instances. XA transactions in MySQL are significantly slower than local transactions due to the additional WAL fsync at prepare time and the lock-holding duration. MySQL's documentation recommends using XA only when cross-database atomicity is absolutely required.

Traditional Enterprise (JTA/Java EE)

Java EE application servers (WebLogic, JBoss, WebSphere) provide built-in JTA transaction managers that coordinate 2PC across JDBC connections, JMS queues, and other XA-compatible resources. A common pattern is a service that writes to a database and publishes a JMS message in a single transaction -- 2PC ensures both happen or neither does. This was the standard approach for decades but is being replaced by the outbox pattern and Saga in modern architectures.

Trade-Offs
AspectDescription
Atomicity vs Availability2PC guarantees atomic commit but blocks all participants if the coordinator fails during the in-doubt state. The system trades availability for atomicity. In latency-sensitive or high-availability systems, this blocking behavior is unacceptable. Sagas and eventual consistency approaches trade atomicity for availability.
Latency Overhead2PC requires 2 round-trips (prepare + commit) plus WAL fsyncs at each participant. A local transaction taking 5ms might take 50-100ms as a 2PC transaction. Cross-datacenter 2PC adds network latency (100-400ms round trip), making it impractical for geo-distributed transactions.
Lock DurationParticipants hold locks from prepare until commit/abort. During this window, other transactions that need the locked rows are blocked. If the coordinator is slow or fails, locks are held for an extended period, potentially causing cascading lock timeouts and throughput degradation.
Coupling vs Independence2PC tightly couples all participants: if any participant is unavailable, the entire transaction fails. This contradicts the microservice principle of independent deployability. Adding a new participant to a 2PC transaction requires all participants to be available simultaneously.
Case Study

From XA Transactions to the Outbox Pattern at Shopify

Scenario

Shopify's order processing system originally used XA transactions to atomically update the orders database and publish an event to the message queue. As traffic grew, XA transactions became a bottleneck: the 2PC overhead doubled write latency, lock contention increased during peak sales events (Black Friday), and the message broker's unavailability caused all order writes to fail.

Solution

Shopify replaced XA transactions with the outbox pattern. Instead of atomically writing to both the database and the message queue, the service writes the order AND the outbox message to the database in a single local transaction. A separate relay process reads the outbox table and publishes events to the message queue. If the relay fails, messages are retried from the outbox table (which serves as a durable buffer).

Outcome

Write latency dropped 60% (eliminated 2PC overhead). The message broker's availability no longer affected order writes. The outbox relay could be scaled independently and caught up quickly after outages. The trade-off was that event publication became eventually consistent (instead of atomically consistent), but consumers were already designed for at-least-once delivery with idempotency. The pattern was widely adopted across Shopify's microservices.

Common Mistakes
  • Using 2PC for microservice-to-microservice communication. 2PC couples services tightly and blocks on failures. Use Sagas (choreography or orchestration) for cross-service transactions, reserving 2PC for cross-database transactions within a single service.
  • Not implementing a recovery mechanism for the coordinator's crash. The coordinator's WAL must be on durable storage, and recovery must re-read the WAL and re-send commit/abort decisions to any participants in the in-doubt state.
  • Assuming 2PC prevents all failures. 2PC ensures atomicity (all or nothing) but does not prevent the transaction from failing. If any participant votes 'no,' the entire transaction aborts. The application must handle abort scenarios gracefully.
  • Forgetting that 2PC holds locks during the prepare-commit window. Under high contention, the lock duration can cause cascading timeouts. Minimize the work done between prepare and commit, and set aggressive coordinator recovery timeouts.
Related Concepts

See Two-Phase Commit (2PC) in action

Explore system design templates that use two-phase commit (2pc) and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Simulate 2PC coordinator failure during distributed transactions

Metrics to watch
prepare_latency_mscommit_latency_msblocking_duration_msabort_rate_pct
Run Simulation
Test Your Understanding

1What happens if the 2PC coordinator crashes after receiving all 'yes' votes but before sending the commit decision?

2Which pattern is the recommended alternative to 2PC in microservices architectures?

Deeper Reading