What is the core problem the Outbox Pattern solves?
The Outbox Pattern ensures reliable message publishing by writing the message to an 'outbox' table in the same database transaction as the business operation. A separate process reads the outbox table and publishes messages to the message broker. This eliminates the dual-write problem where a database commit succeeds but the message publish fails (or vice versa), ensuring atomicity between state changes and event publication.
Consider a common scenario: an order service creates an order in its database and publishes an 'OrderCreated' event to Kafka. These are two separate operations -- a database write and a message broker publish. What happens if the database write succeeds but the Kafka publish fails? The order exists but downstream services never learn about it. What if the Kafka publish succeeds but the database write fails? Downstream services process a phantom order that does not exist.
This is the **dual-write problem**: writing to two separate systems (database + broker) cannot be made atomic without distributed transactions, which are slow, complex, and often unavailable. The Outbox Pattern solves this by reducing the dual write to a single write against one system, the database:
1. **Write phase**: In a single database transaction, the service writes the business entity (orders table) AND the outgoing event (outbox table). The outbox row contains the event type, payload, timestamp, and a status flag ('pending').
2. **Relay phase**: A separate process reads pending outbox rows and publishes them to the message broker. After successful publication, it marks the row as 'published' (or deletes it). If the relay crashes, it restarts and re-reads pending rows -- the outbox table is the source of truth.
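The write phase can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database in place of the service's real database; the `init_db` and `create_order` helpers and the exact column names are hypothetical, chosen to match the description above.

```python
import json
import sqlite3
import uuid


def init_db(conn):
    # Hypothetical minimal schema; column names follow the description above.
    conn.executescript(
        """
        CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL NOT NULL);
        CREATE TABLE outbox (
            id         TEXT PRIMARY KEY,
            event_type TEXT NOT NULL,
            payload    TEXT NOT NULL,
            status     TEXT NOT NULL DEFAULT 'pending',
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        """
    )


def create_order(conn, order_id, total):
    payload = json.dumps({"order_id": order_id, "total": total})
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the order row and the outbox row are written together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "OrderCreated", payload),
        )


conn = sqlite3.connect(":memory:")
init_db(conn)
create_order(conn, "order-1", 59.90)
print(conn.execute("SELECT event_type, status FROM outbox").fetchall())
```

Note that no broker call appears anywhere in `create_order`. The only thing that can fail is the database transaction, and it fails as a unit.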
Two relay strategies exist:
- **Polling publisher**: A background thread or cron job queries the outbox table for pending rows at regular intervals (e.g., every 100ms). Simple but adds latency equal to the polling interval.
- **Change Data Capture (CDC)**: A tool like Debezium tails the database's transaction log (WAL in PostgreSQL, binlog in MySQL) and publishes new outbox rows to Kafka in near-real-time. Lower latency, no polling overhead, but requires CDC infrastructure.
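A polling publisher might look roughly like the sketch below. It assumes SQLite as a stand-in database and a `publish` callable standing in for a real broker client; `relay_once`, `poll`, and the `seq` ordering column are hypothetical names introduced for illustration.

```python
import sqlite3
import time


def relay_once(conn, publish, batch_size=100):
    """Publish pending rows oldest-first; return the number published."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox "
        "WHERE status = 'pending' ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, event_type, payload in rows:
        # If publish() raises, the row stays 'pending' and is retried next pass.
        publish(seq, event_type, payload)
        with conn:
            conn.execute("UPDATE outbox SET status = 'published' WHERE seq = ?", (seq,))
    return len(rows)


def poll(conn, publish, interval_s=0.1, should_stop=lambda: False):
    """Run relay_once repeatedly, sleeping interval_s whenever the outbox is empty."""
    while not should_stop():
        if relay_once(conn, publish) == 0:
            time.sleep(interval_s)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT NOT NULL, payload TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
        [("OrderCreated", '{"order_id": "order-1"}'),
         ("OrderCreated", '{"order_id": "order-2"}')],
    )

sent = []
relay_once(conn, lambda seq, event_type, payload: sent.append((seq, event_type)))
```

Because the row is marked as published only after `publish` returns, a crash between those two steps causes the same event to be published again on the next pass. That is the source of the at-least-once guarantee.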
The outbox table is a lightweight transactional log within your existing database. It leverages the database's ACID guarantees to ensure that the business operation and the recording of the event are atomic: either both rows are committed or neither is. Publication itself happens asynchronously, so the trade-off is eventual delivery: the event reaches the broker after a short delay (milliseconds with CDC, up to seconds with polling).
This pattern is fundamental to event-driven microservices. It is described by Chris Richardson in his Microservices Patterns book and implemented by Debezium, Confluent's Outbox SMT, and AWS DMS.
Order Service Publishes Events Reliably
The order service receives a CreateOrder request. In a single database transaction, it: (1) INSERTs a row into the orders table, and (2) INSERTs a row into the outbox table with {event_type: 'OrderCreated', payload: {order_id, items, total}, status: 'pending'}. The transaction commits atomically. A Debezium connector tails the PostgreSQL WAL, detects the outbox insert, publishes the event to the 'orders' Kafka topic, and the outbox row is eventually deleted. If Debezium is down, the outbox rows accumulate safely in the database and are published when Debezium recovers.
Zalando
Zalando's open-source Nakadi event bus uses the outbox pattern across hundreds of microservices. Each service writes events to a local outbox table in the same transaction as business updates. A per-service publisher reads the outbox and publishes to Nakadi (their Kafka-based event bus). This ensures that every database change is reflected as an event, enabling reliable event-driven architecture across the company.
Debezium (Red Hat)
Debezium is the most widely used CDC platform for implementing the outbox pattern. Its 'outbox event router' SMT (Single Message Transform) extracts events from the outbox table, routes them to Kafka topics based on event type, and supports event deduplication via event IDs. Debezium tails PostgreSQL WAL, MySQL binlog, MongoDB oplog, and SQL Server CDC to achieve near-real-time event publishing.
Wix
Wix processes millions of website updates daily using the outbox pattern. When a user edits their website, the change is saved to the database with an outbox entry in the same transaction. A CDC pipeline publishes the change event to Kafka, which triggers CDN invalidation, search reindexing, and real-time collaboration updates. The outbox pattern ensures no edit is ever lost, even during broker outages.
| Aspect | Description |
|---|---|
| Consistency vs Complexity | The outbox pattern adds a table, a relay process, and cleanup logic. This is more complex than a naive dual-write (publish after commit). But the naive approach is fundamentally unreliable -- the complexity of the outbox pattern is the price of correctness. |
| Latency vs Simplicity | Polling adds latency equal to the polling interval (100ms-1s). CDC adds infrastructure complexity (Debezium cluster, WAL configuration, connector management) but achieves near-real-time delivery. Choose based on your latency requirements and operational maturity. |
| Database Load | Every business transaction includes an extra INSERT into the outbox table. Under high throughput, this adds write amplification. The outbox table must be indexed for the relay query (status + created_at). Periodic cleanup (DELETE published rows) adds maintenance I/O. For most workloads, this overhead is negligible. |
| At-Least-Once vs Exactly-Once | The outbox pattern guarantees at-least-once publication: if the relay crashes after publishing but before marking the row as published, it republishes the event on restart. Consumers must therefore handle duplicates. To get exactly-once *processing* end to end (often called effectively-once), combine the outbox pattern with consumer-side idempotency keys. |
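
Consumer-side idempotency can be sketched as follows. This is one possible approach, not the only one: it assumes each event carries a unique ID and that the consumer can record processed IDs in the same database transaction as its side effect. The `processed_events` table, `handle_once`, and `count_order` are hypothetical names.

```python
import sqlite3


def handle_once(conn, event_id, apply_effect):
    """Apply apply_effect(conn) at most once per event_id; return True if applied."""
    with conn:  # the dedup record and the effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)", (event_id,)
        )
        if cur.rowcount == 0:
            return False  # duplicate delivery: already processed
        apply_effect(conn)
    return True


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE order_counts (name TEXT PRIMARY KEY, n INTEGER NOT NULL);
    INSERT INTO order_counts VALUES ('orders_seen', 0);
    """
)


def count_order(c):
    # Example side effect: a counter that must not double-count redeliveries.
    c.execute("UPDATE order_counts SET n = n + 1 WHERE name = 'orders_seen'")


first = handle_once(conn, "evt-123", count_order)
second = handle_once(conn, "evt-123", count_order)  # simulated redelivery
```

Recording the event ID and applying the effect in one transaction mirrors the outbox idea on the consuming side: if the effect fails, the ID is not recorded, and the event can be retried safely.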
How Debezium Solved the Dual-Write Problem for Microservices
Scenario
A large e-commerce platform migrated from a monolith to microservices. Each service owned its database but needed to publish domain events for inter-service communication. Teams implemented the naive approach: commit to database, then publish to Kafka. Under load, 0.1% of publishes failed (Kafka timeouts, producer errors), creating 'ghost state' -- database records with no corresponding Kafka events. Downstream services (search, recommendations, analytics) had missing data. Reconciliation jobs ran nightly but could not fix real-time inconsistencies.
Solution
The platform adopted Debezium's outbox pattern. Each service writes to a local outbox table in the same transaction as the business update. Debezium connectors (one per database) tail the WAL and publish outbox events to Kafka. The outbox table schema: (id UUID, aggregateType VARCHAR, aggregateId VARCHAR, type VARCHAR, payload JSONB, created_at TIMESTAMP). Debezium's outbox event router SMT extracts and routes events based on aggregateType.
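To make the schema and the routing step concrete, here is a rough stand-in written against SQLite. It is not Debezium's code or configuration: the `route` function, the `outbox.event.` topic prefix, and the message shape are assumptions for illustration, and the PostgreSQL `UUID` and `JSONB` columns are stored as `TEXT`.

```python
import json
import sqlite3
import uuid

# SQLite stand-ins for the PostgreSQL types: UUID and JSONB are stored as TEXT.
OUTBOX_DDL = """
CREATE TABLE outbox (
    id            TEXT PRIMARY KEY,
    aggregateType TEXT NOT NULL,
    aggregateId   TEXT NOT NULL,
    type          TEXT NOT NULL,
    payload       TEXT NOT NULL,
    created_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def route(row, topic_prefix="outbox.event."):
    """Map one outbox row to a hypothetical broker message."""
    return {
        "topic": topic_prefix + row["aggregateType"],  # one topic per aggregate type
        "key": row["aggregateId"],  # keeps events for one aggregate in one partition
        "headers": {"id": row["id"], "type": row["type"]},  # id enables deduplication
        "value": json.loads(row["payload"]),
    }


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(OUTBOX_DDL)
with conn:
    conn.execute(
        "INSERT INTO outbox (id, aggregateType, aggregateId, type, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (str(uuid.uuid4()), "Order", "order-1", "OrderCreated",
         json.dumps({"order_id": "order-1", "total": 59.9})),
    )

message = route(conn.execute("SELECT * FROM outbox").fetchone())
```

Using `aggregateId` as the message key is a common choice because it preserves per-aggregate ordering on a partitioned broker such as Kafka.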
Outcome
Ghost state dropped from 0.1% to effectively zero. Nightly reconciliation jobs were eliminated. Event publishing latency was under 100ms (WAL tail to Kafka delivery). The platform processes 50M events/day through the outbox pipeline with 99.99% reliability. The Debezium connector cluster requires minimal operational overhead (3 tasks, automated offset tracking). The key lesson: 'If your service writes to a database and a broker, you have a dual-write problem. The outbox pattern is the standard solution.'