Distributed Transactions: Two-Phase Commit, Sagas, and Coordination Patterns
Distributed transaction coordination governs how multi-node systems achieve atomic, consistent outcomes across services that may fail independently, operate at different speeds, or sit on opposite sides of a network partition. This page covers the principal coordination protocols — two-phase commit, three-phase commit, and the Saga pattern — along with the classification boundaries that separate them, the failure modes that motivate each design, and the tradeoffs practitioners weigh when selecting among them. The treatment draws on published research and standards from NIST, the ACM, and The Open Group.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps (non-advisory)
- Reference table or matrix
Definition and scope
A distributed transaction is a unit of work that spans two or more independent resource managers — databases, message brokers, or microservices — and must satisfy atomicity: either all participants commit or all roll back. The challenge is that no single node has a global view of system state, and any participant may fail before signaling completion.
The ACID properties — Atomicity, Consistency, Isolation, Durability — defined in the database literature (Gray and Reuter, Transaction Processing: Concepts and Techniques, 1992) apply to distributed settings, but their enforcement requires explicit coordination protocols that have no equivalent in single-node transactions. The CAP theorem, as framed by Eric Brewer in his 2000 PODC keynote, establishes that a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition Tolerance — a constraint that directly shapes which transaction protocol is appropriate in a given deployment.
Distributed transactions arise across microservices architectures, distributed data storage, and event-driven workflows. At the system level, the landscape encompasses both synchronous blocking protocols (2PC, 3PC) and asynchronous compensating patterns (Sagas), each targeting a distinct point on the consistency-availability spectrum.
The X/Open Distributed Transaction Processing (DTP) model — standardized by The Open Group as X/Open XA — defines the interface contract between transaction managers and resource managers. XA is the basis for Java Transaction API (JTA) implementations and is referenced in NIST SP 1500-1 as a foundational interoperability specification in distributed data architectures.
Core mechanics or structure
Two-Phase Commit (2PC)
The two-phase commit protocol divides the commit decision into two sequential phases coordinated by a single designated coordinator node:
Phase 1 — Prepare (Voting Phase): The coordinator sends a PREPARE message to all participant nodes. Each participant executes the transaction up to the point of commit, writes a prepare record to its durable log, and responds with either VOTE-COMMIT (ready) or VOTE-ABORT (unable to commit).
Phase 2 — Commit or Abort: If all participants vote to commit, the coordinator writes a commit record and broadcasts COMMIT. If any participant votes to abort — or fails to respond within a timeout — the coordinator broadcasts ABORT. Each participant then finalizes or rolls back and acknowledges.
The critical invariant is that a participant that has voted VOTE-COMMIT is blocked: it holds locks and cannot proceed until it receives the coordinator's Phase 2 decision. This is the source of 2PC's blocking failure mode. If the coordinator crashes after Phase 1 voting completes but before broadcasting Phase 2, participants are indefinitely blocked — a property that makes 2PC unsuitable for systems requiring high availability under coordinator failure.
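The two phases can be sketched in a few lines. This is a minimal illustrative model, not an implementation: the `Participant` class and its methods are hypothetical stand-ins for resource managers, and durable logging, timeouts, and the blocking wait between phases are elided.

```python
class Participant:
    """Hypothetical resource manager in a 2PC round."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: write a prepare record to the durable log, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit  # True = VOTE-COMMIT, False = VOTE-ABORT

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): any VOTE-ABORT forces a global abort.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the decision. In a real system, prepared
    # participants are blocked (holding locks) until this message arrives.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision
```

The blocking failure mode lives in the gap between the two phases: if the coordinator crashes after collecting votes but before the broadcast loop, every prepared participant waits indefinitely.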
Three-Phase Commit (3PC)
Three-phase commit inserts a PRE-COMMIT phase between the vote and the final commit broadcast, with the goal of making the protocol non-blocking under coordinator failure. By ensuring that no participant can be uncertain about whether others have received the commit decision, 3PC allows a new coordinator to be elected and to determine the correct outcome. However, 3PC assumes a synchronous network model — an assumption that does not hold in real wide-area deployments — limiting its practical adoption relative to 2PC.
The Saga Pattern
The Saga pattern, formalized by Hector Garcia-Molina and Kenneth Salem in their 1987 ACM SIGMOD paper, decomposes a long-lived transaction into a sequence of local transactions, each of which publishes an event or message upon completion. If a step fails, a series of compensating transactions executes in reverse order to undo the effects of prior steps.
Sagas do not hold locks across service boundaries. Isolation between concurrent Sagas must be managed explicitly through semantic locking, versioning, or ordering constraints — the absence of ACID isolation is a defining characteristic, not an implementation detail. Two Saga orchestration models exist: choreography (each service reacts to events from the prior step, with no central controller) and orchestration (a central saga orchestrator issues commands to each participant and handles failures). The orchestration model maps naturally to event-driven architecture and CQRS and event sourcing implementations.
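An orchestration-style Saga can be sketched as a loop over (action, compensation) pairs. The pairs and the string results below are illustrative assumptions; a real orchestrator would persist progress and dispatch commands over messaging infrastructure.

```python
def run_saga(steps):
    """steps: list of (action, compensation) pairs, each a local transaction."""
    completed = []                       # compensations for finished steps
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Run compensating transactions in reverse order. The failed
            # step committed nothing locally, so it needs no compensation.
            for comp in reversed(completed):
                comp()
            return "compensated"
    return "completed"
```

Note what is absent: no locks are held between steps, so another Saga can observe the state left by `action()` before any compensation runs — the isolation gap described above.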
Causal relationships or drivers
The adoption of non-blocking coordination patterns is driven by three compounding forces:
- Independent deployability of services. In microservices architecture, services are owned by separate teams and deployed on independent release cycles. Introducing 2PC across service boundaries creates tight operational coupling — a coordinator failure in one team's service can block participants owned by other teams.
- Network partition frequency. Network partitions are not hypothetical edge cases in modern cloud infrastructure. AWS, GCP, and Azure each publish availability SLAs that acknowledge partial failure. The probability that a partition affects at least one node in a cluster grows with cluster size.
- Latency accumulation. 2PC requires at least two round-trip message exchanges between coordinator and all participants before a transaction can commit. In a deployment where participants are distributed across three geographic regions, each round trip may add 50–150 milliseconds of latency, compounding into commit latencies that violate service-level objectives. Latency and throughput constraints in high-volume systems make synchronous blocking protocols operationally expensive.
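The latency point can be made concrete with back-of-envelope arithmetic. The RTT figures are illustrative only; real round-trip times depend on provider, path, and load.

```python
def two_pc_commit_latency_ms(rtt_ms, round_trips=2):
    # 2PC needs at least two coordinator<->participant round trips
    # (PREPARE/vote, then COMMIT/ack). Each phase is gated by the
    # slowest participant, so the worst-case RTT dominates.
    return round_trips * rtt_ms

# At a 100 ms cross-region RTT, the protocol floor is 200 ms per
# commit, before disk flushes, queuing, or retries are counted.
```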
Consensus algorithms such as Raft address the coordinator's single point of failure by replicating its log across a quorum, but they do not eliminate the blocking behavior of 2PC itself; they merely make the coordinator more fault-tolerant.
Classification boundaries
Distributed transaction protocols separate along three axes: blocking behavior, isolation guarantee, and coordination model.
Blocking vs. Non-Blocking: 2PC is a blocking protocol — participants hold locks while awaiting the coordinator's decision. 3PC is non-blocking only under a synchronous network model with fail-stop failures; under network partition it can still block or reach inconsistent outcomes. Sagas are non-blocking because each local transaction commits independently.
Locking vs. Compensation: 2PC and 3PC use pessimistic locking — resources are reserved before commit. Sagas use compensating transactions — resources are modified immediately, and compensation logic reverses effects if a later step fails. Compensation is semantically different from rollback: a compensating transaction is a new forward transaction, not an undo operation; its effects may be visible to external observers before compensation runs.
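The distinction between compensation and rollback can be illustrated on an append-only ledger. The schema below is a hypothetical example: a rollback would erase the debit, whereas a compensating transaction appends a new, externally visible entry that reverses its financial effect while leaving the original in the history.

```python
def post(ledger, account, amount, kind):
    # Append-only: entries are never deleted or rewritten.
    ledger.append({"account": account, "amount": amount, "kind": kind})

def compensate_debit(ledger, account, amount):
    # A new forward transaction, not an undo: observers may have already
    # seen the debit before this credit lands.
    post(ledger, account, amount, "compensating-credit")

ledger = []
post(ledger, "acct-1", -50, "debit")
compensate_debit(ledger, "acct-1", 50)
# Net effect is zero, but both entries remain visible in the history.
```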
Centralized vs. Decentralized Coordination: 2PC and orchestration-based Sagas use a single coordinator or orchestrator. Choreography-based Sagas distribute coordination across services via message queues and event streaming. Zookeeper and coordination services provide infrastructure for leader election and distributed locking that supports centralized coordination patterns.
The XA standard governs 2PC implementations specifically. Sagas fall outside XA scope and are typically implemented at the application layer, often using frameworks that interact with event-driven architecture infrastructure.
Tradeoffs and tensions
The central tension in distributed transaction design is the consistency-availability tradeoff as bounded by the CAP theorem and its practical successor, the PACELC model (Daniel Abadi, 2012, published in IEEE Computer), which adds latency as a dimension: even in the absence of partition, systems face a tradeoff between consistency and latency.
2PC maximizes consistency at the cost of availability and throughput. A coordinator failure leaves participants blocked indefinitely without external recovery intervention. 2PC is appropriate when the transaction spans 2–3 resources owned within a single operational boundary, latency budgets permit blocking, and the resource managers all support XA.
Sagas maximize availability and throughput at the cost of isolation. Concurrent Sagas can observe each other's intermediate states — the "dirty reads" that ACID isolation prevents. Compensating transactions introduce complexity: compensation logic must be idempotent (see idempotency and exactly-once semantics) and must be designed to handle scenarios where downstream services have already acted on the committed state.
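Idempotent compensation is usually achieved by deduplicating on a saga/step identifier. The sketch below assumes a hypothetical in-memory dedup store and a `refund` operation; in production the store would be a durable table written in the same local transaction as the refund itself.

```python
processed = set()          # hypothetical dedup store; durable in production

def refund(step_id, amount, balance):
    """Apply the refund at most once per step_id."""
    if step_id in processed:
        return balance     # duplicate delivery: no-op, same result
    processed.add(step_id)
    return balance + amount

balance = refund("saga-7/refund", 30, 100)       # first delivery
balance = refund("saga-7/refund", 30, balance)   # redelivery is ignored
```

Because message brokers typically guarantee at-least-once delivery, a compensation command that is not idempotent would double-apply on redelivery.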
3PC resolves the coordinator blocking problem theoretically but introduces vulnerability to network partition, making it unsuitable for most real-world deployments, which cannot guarantee a synchronous communication model.
The adoption of Sagas in practice trades implementation simplicity for operational safety. Each compensating transaction must be explicitly designed, tested, and monitored — failure to implement compensation correctly produces orphaned partial states that are difficult to detect without distributed system observability tooling.
Common misconceptions
Misconception: Sagas guarantee eventual consistency automatically.
Sagas guarantee that all steps will either complete or be compensated — they do not guarantee that the system reaches the same final state that a serializable 2PC transaction would produce. Concurrent Sagas may interleave in ways that produce anomalies (such as lost updates) not present under serializable isolation. The consistency models achieved by Sagas depend heavily on how the application manages ordering, versioning, and semantic locking.
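One common application-level guard against such anomalies is a version check, a sketch of the versioning approach mentioned above (the record shape and scheme are hypothetical): a Saga step re-validates the version it read before writing, so an interleaved Saga's update cannot be silently overwritten.

```python
def update_if_version(record, expected_version, new_value):
    """Apply the write only if no concurrent Saga changed the record first."""
    if record["version"] != expected_version:
        return False       # conflict: caller must re-read, retry, or abort
    record["value"] = new_value
    record["version"] += 1
    return True
```

A Saga that read version 1, then lost the race to a concurrent Saga, gets `False` back instead of clobbering the other Saga's write.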
Misconception: 2PC is always reliable if the coordinator does not fail.
Even without coordinator failure, 2PC is vulnerable to participant failure after voting VOTE-COMMIT. A participant that crashes between Phase 1 and Phase 2 must recover its pre-commit log entry and contact the coordinator to learn the outcome — a recovery process that requires the coordinator to still be available. If the coordinator has also failed, the participant is permanently blocked without manual intervention.
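The recovery logic described above reduces to a small decision table. The log values and outcome strings below are hypothetical labels for illustration, not part of any standard.

```python
def recover(last_log_record, coordinator_outcome):
    """Participant restart after a crash.

    last_log_record: the participant's last durable record.
    coordinator_outcome: 'commit', 'abort', or None if unreachable.
    """
    if last_log_record != "prepared":
        return "abort"             # no prepare record survived: presumed abort
    if coordinator_outcome is None:
        return "blocked"           # in-doubt: cannot decide unilaterally
    return coordinator_outcome     # adopt the coordinator's decision
```

The middle branch is the failure mode in question: a prepared participant whose coordinator is also down has no safe unilateral choice.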
Misconception: XA transactions are deprecated in modern systems.
XA remains the active standard for 2PC in Java EE and Jakarta EE environments, where JTA-compliant application servers implement XA across JDBC data sources and JMS message brokers. The Jakarta EE specification (maintained by the Eclipse Foundation) includes JTA as a required API. XA is not deprecated; it is selectively inappropriate for high-availability microservices deployments where its blocking behavior conflicts with availability requirements.
Misconception: Choreography-based Sagas are simpler than orchestration-based Sagas.
Choreography distributes coordination logic across services, which reduces single-point-of-failure risk but makes the overall transaction flow implicit and difficult to trace. Distributed system observability tooling — distributed tracing, structured logging — is required to reconstruct the full transaction path in choreography deployments. Orchestration centralizes the transaction logic in a single component, making the flow explicit and testable, at the cost of introducing an orchestrator as a component that must be fault-tolerant.
Checklist or steps (non-advisory)
The following steps describe the operational sequence of a 2PC transaction under the X/Open XA model:
- Transaction Start: Application initiates a transaction via the transaction manager. The transaction manager assigns a global transaction identifier (GTID).
- Resource Enlistment: Each resource manager (e.g., database, message broker) is enlisted in the transaction, establishing the XA branch association.
- Application Work: Application performs read and write operations across enlisted resource managers. Locks are held per each resource manager's isolation policy.
- Prepare Request (Phase 1): Transaction manager issues `xa_prepare()` to each resource manager. Each resource manager flushes dirty pages, writes a prepare log record, and returns `XA_OK` or `XA_RBROLLBACK`.
- Vote Collection: Transaction manager collects all responses. If any resource manager returns a rollback vote or fails to respond within the configured timeout, the manager records an abort decision.
- Global Decision Commit or Rollback (Phase 2): Transaction manager writes the global commit or abort record to its own durable log.
- Commit or Rollback Broadcast: Transaction manager issues `xa_commit()` or `xa_rollback()` to each resource manager. Each resource manager applies the decision and releases locks.
- Acknowledgment and Cleanup: Each resource manager acknowledges. The transaction manager marks the GTID as complete and releases the transaction record.
- Heuristic Handling (if applicable): If a resource manager made a unilateral decision (heuristic commit or rollback) due to timeout, the transaction manager records a heuristic completion for manual resolution.
The distributed system failure modes that arise at steps 6–8 — specifically coordinator crash after writing the global decision but before all acknowledgments are received — constitute the known "in-doubt transaction" problem addressed in XA recovery specifications.
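The decision-point ordering in steps 4–6 can be sketched as follows. The Python interfaces here are hypothetical models, not the real XA C API: branches are callables returning a vote, and `tm_log` stands in for the transaction manager's durable decision record.

```python
XA_OK, XA_RBROLLBACK = "XA_OK", "XA_RBROLLBACK"

def xa_complete(gtid, branches, tm_log):
    # Steps 4-5: issue prepare to every branch and collect the votes.
    votes = [prepare(gtid) for prepare in branches]
    decision = "commit" if all(v == XA_OK for v in votes) else "rollback"
    # Step 6: record the global decision durably BEFORE broadcasting it.
    # This ordering is what makes in-doubt recovery resolvable: a restarted
    # manager can replay its log and re-drive Phase 2.
    tm_log[gtid] = decision
    return decision                # steps 7-8 would broadcast and await acks
```

If the manager crashes after writing `tm_log[gtid]` but before all branches acknowledge, the surviving log entry is exactly what XA recovery consults to resolve the in-doubt branches.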
Reference table or matrix
| Protocol | Blocking Under Coordinator Failure | Isolation Level | Network Model Required | XA Standard | Practical Adoption Context |
|---|---|---|---|---|---|
| Two-Phase Commit (2PC) | Yes — participants block indefinitely | Serializable (full ACID) | Asynchronous (tolerates loss) | Yes (X/Open XA) | Single-org multi-DB, JTA/JEE environments |
| Three-Phase Commit (3PC) | No (theoretical) | Serializable | Synchronous (assumption) | No | Academic reference; rarely deployed |
| Saga (Choreography) | No | No isolation between Sagas | Asynchronous | No | Event-driven microservices, high availability |
| Saga (Orchestration) | No (orchestrator is a single point of failure) | No isolation between Sagas | Asynchronous | No | Workflow engines, explicit process management |
| XA with Raft Coordinator | Reduced (coordinator replicated) | Serializable | Asynchronous | Yes (coordinator layer) | Distributed databases (e.g., CockroachDB model) |
For fault-tolerance behavior under the failure modes described above, see fault tolerance and resilience. Systems that rely on saga-based consistency must also address eventual consistency guarantees and the back-pressure and flow control mechanisms that prevent compensation queue saturation under load. Protocol selection interacts with distributed system design patterns at the architectural layer and with container orchestration at the deployment layer, particularly when transaction participants run as ephemeral pods with variable startup times.