
Event Sourcing

Learn how event sourcing persists state as an immutable sequence of domain events rather than mutable rows, enabling complete audit trails, temporal queries, and the ability to reconstruct any past system state.

Overview

In traditional CRUD architectures, the database stores only the current state of each entity. When you update a user's email address, the old value is overwritten and lost forever. Event sourcing inverts this model: instead of storing current state, you store every change that has ever happened as an immutable event. The current state is derived by replaying the full sequence of events from the beginning. An account's balance is not a single row with value $500 -- it is the result of replaying 'AccountOpened($0)', 'FundsDeposited($1000)', 'FundsWithdrawn($300)', 'FundsWithdrawn($200)'.

The event store is an append-only log where each event is immutable -- once written, it is never modified or deleted. Events are domain-specific and meaningful: 'OrderPlaced', 'ItemShipped', 'PaymentCaptured', not generic 'row updated' operations. Each event carries a timestamp, a sequence number, and the aggregate ID it belongs to. To reconstruct the current state of an entity, the system loads all events for that aggregate and replays them in order through a fold function.
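The replay described above can be sketched in a few lines. This is a minimal in-memory illustration, not a production event store: the `Event`, `InMemoryEventStore`, `apply`, and `current_balance` names are hypothetical, and amounts are kept as whole dollars for simplicity. It uses the account events from the earlier balance example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from functools import reduce


@dataclass(frozen=True)  # frozen: an event cannot be modified once created
class Event:
    aggregate_id: str
    sequence: int        # position within this aggregate's stream
    timestamp: datetime
    kind: str            # domain-specific name, e.g. "FundsDeposited"
    amount: int          # whole dollars (simplifying assumption)


class InMemoryEventStore:
    """Append-only log: events can be appended and read, never changed."""

    def __init__(self):
        self._log = []

    def append(self, aggregate_id, kind, amount):
        # Next sequence number = events already stored for this aggregate + 1
        sequence = sum(1 for e in self._log if e.aggregate_id == aggregate_id) + 1
        event = Event(aggregate_id, sequence, datetime.now(timezone.utc), kind, amount)
        self._log.append(event)
        return event

    def load(self, aggregate_id):
        return [e for e in self._log if e.aggregate_id == aggregate_id]


def apply(balance, event):
    """Fold step: previous state plus one event gives the next state."""
    if event.kind == "AccountOpened":
        return event.amount
    if event.kind == "FundsDeposited":
        return balance + event.amount
    if event.kind == "FundsWithdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def current_balance(store, aggregate_id):
    # Replay every event for the aggregate, in order, starting from 0
    return reduce(apply, store.load(aggregate_id), 0)


store = InMemoryEventStore()
store.append("acct-1", "AccountOpened", 0)
store.append("acct-1", "FundsDeposited", 1000)
store.append("acct-1", "FundsWithdrawn", 300)
store.append("acct-1", "FundsWithdrawn", 200)
print(current_balance(store, "acct-1"))  # 500
```

Note that the balance is never stored anywhere; it exists only as the result of the fold.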

Replaying events from the beginning becomes expensive as the event count grows. Snapshots solve this: periodically, the system saves the current state (the snapshot) and records the event sequence number at which it was taken. To reconstruct state, the system loads the latest snapshot and replays only events that occurred after it. A snapshot every 100 events means at most 100 events need to be replayed on any read, regardless of the aggregate's total history.
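As a rough sketch of the snapshot mechanism, assume events are `(sequence, delta)` tuples with contiguous sequence numbers starting at 1, and that a snapshot is taken every 100 events. The `Snapshot` type, `SNAPSHOT_EVERY` constant, and `rebuild` helper are illustrative names, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    sequence: int  # sequence number of the last event folded into the state
    balance: int


SNAPSHOT_EVERY = 100  # illustrative interval


def apply(balance, event):
    _sequence, delta = event
    return balance + delta


def rebuild(log, snapshot=None):
    """Return (balance, number_of_events_replayed)."""
    balance = snapshot.balance if snapshot is not None else 0
    start = snapshot.sequence if snapshot is not None else 0
    # Sequence numbers are 1-based and contiguous, so slicing skips
    # everything the snapshot already covers.
    tail = log[start:]
    for event in tail:
        balance = apply(balance, event)
    return balance, len(tail)


log, snapshots, balance = [], [], 0
for sequence in range(1, 251):
    event = (sequence, 1)  # deposit one dollar per event
    log.append(event)
    balance = apply(balance, event)
    if sequence % SNAPSHOT_EVERY == 0:
        snapshots.append(Snapshot(sequence, balance))

latest = snapshots[-1]
print(rebuild(log, latest))  # (250, 50): only events 201..250 are replayed
print(rebuild(log))          # (250, 250): full replay gives the same state
```

A real store would typically query for events with a sequence number greater than the snapshot's rather than slicing a list, but the idea is the same.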

Event sourcing is particularly powerful when combined with event-driven architecture. The same events stored for state reconstruction can be published to downstream consumers -- search indexes, analytics pipelines, notification services -- creating a single source of truth that feeds the entire system. This reduces the need for the change data capture (CDC) or dual-write patterns that plague CRUD-based architectures. However, event sourcing introduces significant complexity: event schema evolution requires careful versioning, eventual consistency between the event store and read models must be managed, and developers need to think in terms of events rather than state -- a paradigm shift that takes time to internalize.

Key Points
  1. Events are immutable facts about what happened in the domain. They use past tense ('OrderPlaced', 'PaymentCaptured') and carry all data needed to reconstruct the state change they represent.
  2. Current state is derived, not stored. To get the current state of an aggregate, replay all its events through a fold function. Snapshots optimize this by caching intermediate state at periodic intervals.
  3. The event store is an append-only log. Events are never updated or deleted (only corrective events like 'OrderCancelled' can reverse the effect of prior events). This provides a complete audit trail, which can be made tamper-evident with techniques such as hash chaining.
  4. Event schema evolution is the hardest operational challenge. As the domain model evolves, events written with old schemas must still be deserializable. Upcasters, versioned event types, and schema registries are essential tooling.
  5. Event sourcing naturally supports temporal queries: 'What was the account balance on March 15?' is answered by replaying events up to that date. CRUD databases require separate audit tables or temporal extensions for this capability.
  6. Storage grows monotonically because events are never deleted. For high-volume aggregates, compaction strategies (archiving events older than a retention window) or snapshot-only reads become necessary.
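To make the schema-evolution point concrete, here is one possible shape for an upcaster. Everything in it is an assumption made for illustration: the 'CustomerRegistered' event, its v1 and v2 payloads, and the `UPCASTERS` registry are hypothetical, and real systems often delegate this to a serialization framework or schema registry.

```python
import json


def upcast_v1_to_v2(payload):
    """Hypothetical change: v1 stored one 'name', v2 splits it in two."""
    first, _, last = payload["name"].partition(" ")
    return {"first_name": first, "last_name": last, "email": payload["email"]}


# (event type, stored version) -> function producing the next version
UPCASTERS = {("CustomerRegistered", 1): upcast_v1_to_v2}
CURRENT_VERSION = {"CustomerRegistered": 2}


def deserialize(raw):
    record = json.loads(raw)
    kind, version, payload = record["type"], record["version"], record["payload"]
    # Apply upcasters one step at a time until the payload is current.
    while version < CURRENT_VERSION[kind]:
        payload = UPCASTERS[(kind, version)](payload)
        version += 1
    return kind, version, payload


old_event = json.dumps({
    "type": "CustomerRegistered",
    "version": 1,
    "payload": {"name": "Ada Lovelace", "email": "ada@example.com"},
})
print(deserialize(old_event))
```

The stored event is left untouched; only the in-memory representation is upgraded at read time, so application code only ever sees the current schema.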
Simple Example

The Bank Ledger Analogy

A traditional database stores your bank balance as a single number: $1,500. If you ask 'why is my balance $1,500?', the system cannot answer. An event-sourced system stores every transaction: 'Deposit $2,000 on Jan 1', 'Withdraw $300 on Jan 15', 'Deposit $100 on Feb 1', 'Withdraw $300 on Feb 10'. Your balance of $1,500 is computed by replaying these events. If there is a dispute about a charge, you can examine the exact sequence of events. If you need to know your balance on January 20th, you replay events up to that date and get $1,700. This is exactly how real bank ledgers have worked for centuries -- event sourcing applies this proven pattern to software systems.
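The ledger arithmetic above can be checked with a short temporal query. The year 2024 is an assumption (the example gives only months and days), and `LEDGER` and `balance_as_of` are illustrative names.

```python
from datetime import date

# The four transactions from the analogy, in chronological order.
LEDGER = [
    (date(2024, 1, 1), "Deposit", 2000),
    (date(2024, 1, 15), "Withdraw", 300),
    (date(2024, 2, 1), "Deposit", 100),
    (date(2024, 2, 10), "Withdraw", 300),
]


def balance_as_of(ledger, as_of):
    """Replay events up to and including `as_of`."""
    balance = 0
    for when, kind, amount in ledger:
        if when > as_of:
            break  # ledger is ordered, so nothing later can apply
        balance += amount if kind == "Deposit" else -amount
    return balance


print(balance_as_of(LEDGER, date(2024, 1, 20)))   # 1700
print(balance_as_of(LEDGER, date(2024, 12, 31)))  # 1500
```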

Real-World Examples

LMAX Exchange

LMAX, a financial trading platform, uses event sourcing in its business logic processor, which has been reported to handle 6 million orders per second on a single thread. The processor keeps all of its state in memory, while every input event (each order, trade, and cancellation) is written to a durable journal. The entire system state can be reconstructed by replaying the journal, typically starting from a recent snapshot, enabling fast recovery after failures. The event-sourced design also provides a complete audit trail required by financial regulators.

Walmart

Walmart's e-commerce inventory system uses event sourcing to track stock levels across 4,700+ stores and multiple fulfillment centers. Events like 'ItemReceived', 'ItemSold', 'ItemTransferred' flow through the system, enabling real-time inventory visibility. The event log serves as the source of truth for inventory reconciliation, reducing shrinkage discrepancies by providing a full audit trail of every inventory movement.

Klarna

Klarna, the buy-now-pay-later provider, uses event sourcing for its payment processing pipeline. Every payment state transition -- authorized, captured, refunded, disputed -- is stored as an immutable event. This provides the regulatory audit trail required across 45 markets, enables reconciliation with merchant and bank systems, and allows Klarna to replay events to rebuild read models when business requirements change.

Trade-Offs
Aspect | Description
Auditability vs Complexity | Event sourcing provides a complete, immutable audit trail by default -- invaluable for financial, healthcare, and regulatory domains. However, the programming model is more complex than CRUD: developers must think in terms of events and state transitions rather than direct state manipulation, and debugging requires understanding event sequences rather than inspecting current rows.
Temporal Queries vs Storage Cost | The ability to reconstruct state at any point in time is a unique capability of event sourcing. However, the event store grows without bound because events are never deleted. High-throughput systems can accumulate terabytes of events, requiring archival strategies and efficient snapshot mechanisms.
Flexibility vs Schema Evolution | Event sourcing makes it easy to add new read models or projections by replaying existing events -- no data migration needed. However, changing the structure of existing event types requires careful versioning and upcasting logic to ensure old events remain deserializable as the schema evolves.
Consistency vs Latency | Writes to the event store are strongly consistent and fast (append-only). However, read models (projections) derived from events are eventually consistent, introducing a lag between when an event is written and when it is reflected in query results. For many use cases this lag is sub-second, but it must be accounted for in the UX and system design.
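The last two trade-offs can be illustrated with a toy projection. This sketch assumes the log is a simple list of `(account_id, delta)` tuples; `BalanceProjection`, `catch_up`, `lag`, and `build_activity_counts` are hypothetical names. It shows a read model falling behind the log and catching up, and a second read model built later by replaying the same events.

```python
class BalanceProjection:
    """Read model (account_id -> balance) built by consuming the log."""

    def __init__(self):
        self.balances = {}
        self.position = 0  # number of log entries already applied

    def catch_up(self, log, max_events=None):
        end = len(log) if max_events is None else min(len(log), self.position + max_events)
        for account_id, delta in log[self.position:end]:
            self.balances[account_id] = self.balances.get(account_id, 0) + delta
        self.position = end

    def lag(self, log):
        # Events written to the log but not yet visible in this read model
        return len(log) - self.position


def build_activity_counts(log):
    """A read model added later: built by replaying the existing log."""
    counts = {}
    for account_id, _delta in log:
        counts[account_id] = counts.get(account_id, 0) + 1
    return counts


log = [("acct-1", 1000), ("acct-2", 50), ("acct-1", -300)]

projection = BalanceProjection()
projection.catch_up(log, max_events=2)        # consumer has only seen 2 events
print(projection.balances, projection.lag(log))  # {'acct-1': 1000, 'acct-2': 50} 1
projection.catch_up(log)                      # consumer catches up
print(projection.balances, projection.lag(log))  # {'acct-1': 700, 'acct-2': 50} 0

print(build_activity_counts(log))  # {'acct-1': 2, 'acct-2': 1}
```

Between the two `catch_up` calls, a query against the projection returns a stale balance for acct-1, which is the lag the table refers to.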
Case Study

Event Sourcing at a Large European Bank

Scenario

A major European bank needed to rebuild its core banking ledger to comply with new regulatory requirements mandating complete transaction traceability. The existing CRUD-based system maintained only current balances, and the audit tables were incomplete and inconsistent. Regulators required the ability to reconstruct the exact state of any account at any point in the past five years, with cryptographic proof that records had not been tampered with.

Solution

The bank adopted event sourcing for its ledger system. Every financial operation -- deposits, withdrawals, transfers, fee assessments, interest accruals -- was modeled as an immutable domain event stored in an append-only event store built on Apache Kafka with long-term archival to S3. Current balances were maintained as projections updated by event consumers. Snapshots were taken hourly for high-activity accounts to keep reconstruction times under 200ms. Event integrity was ensured through hash chaining, where each event's hash includes the previous event's hash.
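Hash chaining of the kind described can be sketched as follows. This is an illustration of the general technique, not the bank's implementation: the choice of SHA-256, the JSON canonicalization, and the `GENESIS`, `event_hash`, `append`, and `verify` names are all assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def event_hash(previous_hash, payload):
    # Canonical JSON so the same payload always hashes the same way
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    previous = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "payload": payload,
        "prev": previous,
        "hash": event_hash(previous, payload),
    })


def verify(chain):
    """Recompute every hash; any edited or reordered event breaks the chain."""
    previous = GENESIS
    for entry in chain:
        if entry["prev"] != previous:
            return False
        if entry["hash"] != event_hash(previous, entry["payload"]):
            return False
        previous = entry["hash"]
    return True


chain = []
append(chain, {"type": "Deposit", "amount": 2000})
append(chain, {"type": "Withdrawal", "amount": 300})
print(verify(chain))  # True

chain[0]["payload"]["amount"] = 20  # tamper with a historical event
print(verify(chain))  # False
```

Because each hash depends on the previous one, altering an old event would require recomputing every later hash, which is detectable if the latest hash is also anchored somewhere outside the store.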

Outcome

The system passed regulatory audits with full marks for traceability and data integrity. Any account state could be reconstructed at any historical point within 500ms. The event log also eliminated reconciliation discrepancies between the ledger and downstream reporting systems, which had previously required 15 FTEs to investigate manually. Processing throughput improved 3x over the CRUD system, since append-only writes are generally cheaper than update-in-place operations.

Common Mistakes
  • Storing events that are too granular or too coarse. Events should represent meaningful domain state transitions, not low-level field changes ('EmailFieldUpdated') or overly broad mutations ('UserUpdated' with a diff payload).
  • Neglecting event schema evolution from the start. Without versioned event types and upcaster infrastructure, changing event schemas after production deployment leads to deserialization failures and data corruption.
  • Treating the event store as a message queue. The event store is a database of facts, not a transport mechanism. Publishing events to downstream consumers should use a separate pub/sub system, with the event store as the authoritative source.
  • Replaying all events on every read without snapshots. For aggregates with thousands of events, this causes unacceptable read latency. Implement snapshots from day one for any aggregate expected to accumulate more than a few hundred events.
Test Your Understanding

1. In an event-sourced system, how is the current state of an aggregate determined?

2. What is the primary challenge of event schema evolution in event-sourced systems?
