Vetora logo
Data Engineering

Exactly-Once Semantics

Exactly-once semantics guarantees that each record in a data pipeline is processed exactly one time, producing the same result as if no failures occurred. Achieving this in distributed systems requires coordinating source offsets, processing state, and sink writes atomically -- a challenge solved by techniques like idempotent producers, transactional sinks, and distributed snapshots.

Overview

In distributed data processing, failures are not exceptional -- they are routine. Network partitions, process crashes, disk failures, and deployment restarts interrupt processing regularly. When a failure occurs, the system must decide: replay the failed records (risking duplicates) or skip them (risking data loss). The three delivery semantics -- at-most-once, at-least-once, and exactly-once -- represent different answers to this trade-off.

**At-most-once**: the producer sends each message once and does not retry on failure. Simple and fast, but messages can be permanently lost. Suitable only for non-critical data like debug logs or ephemeral metrics. **At-least-once**: the producer retries until the consumer acknowledges receipt. No data loss, but the same message may be delivered multiple times. This is the default for most messaging systems (Kafka, SQS, RabbitMQ). **Exactly-once**: each message is processed exactly one time, producing the same result as if no failures occurred. This is what financial systems, billing pipelines, and inventory management require.

True exactly-once in a distributed system requires **atomically coordinating three things**: (1) the source offset (which records have been consumed), (2) the processing state (aggregation counters, window contents), and (3) the sink output (what has been written downstream). If any of these is committed without the others, inconsistency results. Flink solves this with the **Chandy-Lamport distributed snapshot** algorithm: a barrier flows through the data stream, and when all operators have received the barrier, a consistent checkpoint of all state and source offsets is atomically committed. On failure, the system restores from the last checkpoint and replays from the saved source offset.

In practice, many systems achieve effectively exactly-once through **at-least-once delivery + idempotent processing**. The source retries messages (ensuring none are lost), and the consumer uses idempotency keys, upsert semantics, or deduplication to ensure that processing a message twice produces the same result as processing it once. This is simpler to implement than true exactly-once and sufficient for most use cases.

Key Points
  • 1At-most-once: send once, no retry. Fast but lossy. Suitable for non-critical metrics and logs where occasional data loss is acceptable.
  • 2At-least-once: retry until acknowledged. No data loss but possible duplicates. Default for Kafka (acks=all with retries), SQS, and most message brokers. Requires downstream deduplication or idempotent processing.
  • 3Exactly-once: each record processed exactly once despite failures. Requires coordinating source offsets, operator state, and sink writes atomically. Kafka's transactional API and Flink's distributed snapshots are the primary implementations.
  • 4Kafka achieves exactly-once with idempotent producers (each message has a sequence number; the broker deduplicates retried messages) and transactional consumers (read-process-write cycles are committed atomically: consumer offsets and produced output messages are committed in a single Kafka transaction).
  • 5Flink achieves exactly-once state consistency via the Chandy-Lamport algorithm: checkpoint barriers flow through the data stream, triggering consistent snapshots of all operator state. On failure, Flink restores from the last checkpoint and replays from the corresponding Kafka offset. The two-phase commit sink protocol extends exactly-once to external systems.
  • 6Idempotent processing is the pragmatic approach to exactly-once: design sinks to produce the same result whether a record is processed once or multiple times. Techniques include upsert (INSERT ON CONFLICT UPDATE), deduplication tables (track processed message IDs), and deterministic output keys.
Simple Example

Exactly-Once Payment Processing

A payment service consumes payment-request events from a Kafka topic. Each event has a unique payment_id. The consumer debits the payer's account and credits the payee's account. Without exactly-once, a retry after a crash could debit the payer twice. The system achieves exactly-once through idempotency: before processing, it checks a deduplication table (payment_id -> status). If the payment_id already exists, the message is skipped. If not, the consumer executes the debit, credit, and deduplication-table insert in a single database transaction, then commits the Kafka offset. If the consumer crashes after the database commit but before the Kafka commit, the message is redelivered -- but the deduplication table prevents double-processing.

Real-World Examples

Confluent / Apache Kafka

Kafka introduced exactly-once semantics (EOS) in version 0.11 (2017) with two mechanisms: (1) Idempotent producers assign a sequence number to each message; the broker rejects duplicates from retries, ensuring each message is written to the log exactly once. (2) Transactional producers/consumers wrap read-process-write cycles in transactions: the consumer reads from input topics, processes records, writes to output topics, and commits consumer offsets -- all atomically. If any step fails, the entire transaction is aborted and retried. This enables exactly-once stream processing within the Kafka ecosystem.

Alibaba (Apache Flink)

Alibaba's Flink deployment processes 40+ billion events per second during Singles' Day with exactly-once state consistency. Flink's checkpoint mechanism takes consistent distributed snapshots every 10 seconds, storing operator state and Kafka offsets to HDFS/S3. On failure, the affected operators restore from the last checkpoint and replay events from the saved Kafka offset. Their exactly-once Kafka sink uses Flink's two-phase commit protocol: data is pre-committed during the checkpoint and finalized when the checkpoint completes.

Stripe

Stripe achieves exactly-once payment processing through application-level idempotency. Every API request includes an Idempotency-Key header. Stripe's backend stores the result of the first request with that key and returns the cached result for any retries. This means clients can safely retry failed payment requests without risk of double-charging. Internally, payment state machines use deterministic transitions -- processing the same event twice has no additional effect because the state machine is already in the target state.

Trade-Offs
AspectDescription
Exactly-Once vs PerformanceExactly-once semantics add overhead: Kafka's transactional producer batches messages into transactions (adding commit latency of ~100ms), Flink's checkpointing pauses processing during barrier alignment (milliseconds to seconds depending on state size), and idempotent sinks require deduplication lookups on every write. At-least-once with idempotent processing is typically 10-30% faster and simpler to operate.
End-to-End vs Internal Exactly-OnceKafka and Flink guarantee exactly-once within their own systems (Kafka topic to Kafka topic, or Flink state). But end-to-end exactly-once (from external source through processing to external sink) requires the sink to support atomic writes or idempotent upserts. Writing to a non-transactional sink (e.g., sending an email) cannot be made exactly-once by the framework alone -- you need application-level deduplication.
Complexity vs CorrectnessExactly-once adds significant complexity: transactional producers, two-phase commit sinks, checkpoint tuning, and failure recovery testing. For non-critical pipelines (log aggregation, approximate analytics), at-least-once with approximate deduplication (e.g., HyperLogLog for unique counts) is much simpler and 'close enough.' Reserve exactly-once for financial, billing, and inventory pipelines where correctness is non-negotiable.
Case Study

Flink Exactly-Once at Uber for Financial Reconciliation

Scenario

Uber's financial reconciliation pipeline matches trip charges to rider payments, driver payouts, and platform fees. Any inconsistency -- a duplicate charge, a missed payout -- directly impacts riders and drivers. Their initial at-least-once pipeline produced duplicate records during failure recovery, requiring expensive manual reconciliation. At peak, the pipeline processed millions of financial events per hour across 1,000+ Kafka partitions.

Solution

Uber deployed a Flink pipeline with exactly-once checkpointing. Flink checkpoints were configured every 30 seconds, storing operator state (running reconciliation balances per trip) to S3. The Kafka source used committed offsets aligned with checkpoints. The sink wrote to an Iceberg table using Flink's two-phase commit protocol: data was pre-committed during the Flink checkpoint and atomically committed when the checkpoint completed. An end-of-day batch job verified reconciliation totals against the source of truth.

Outcome

Duplicate financial records dropped from ~0.1% (thousands per day) to effectively zero. Manual reconciliation effort was eliminated, saving the finance operations team hundreds of hours per month. Recovery time from failures dropped from 30+ minutes (manual intervention) to under 2 minutes (automatic checkpoint recovery). The exactly-once guarantee was verified by the end-of-day reconciliation job, which consistently showed zero discrepancies.

Common Mistakes
  • Assuming exactly-once delivery exists at the network level. Networks can duplicate, reorder, or lose packets. Exactly-once semantics are always built on top of at-least-once delivery (retries) plus deduplication (idempotent processing or transactional commits). Do not confuse delivery guarantees with processing guarantees.
  • Enabling Kafka exactly-once (transactional producer) without understanding the performance impact. Transactions add ~100ms commit latency and reduce throughput because messages are not visible to consumers until the transaction commits. For high-throughput, latency-sensitive pipelines, at-least-once with idempotent sinks may be a better trade-off.
  • Implementing deduplication with an in-memory set of processed message IDs. This loses deduplication state on restart, allowing duplicates immediately after recovery. Store deduplication keys in durable storage (database, Redis with persistence, or Flink's checkpointed state) with a TTL to bound storage growth.
  • Forgetting that exactly-once in Flink only covers state and Kafka offsets -- it does not automatically extend to external sinks. If your Flink job writes to PostgreSQL, you need either an idempotent upsert sink or a two-phase commit sink that coordinates with Flink's checkpoints. A plain JDBC INSERT will produce duplicates on checkpoint recovery.
Related Concepts

See Exactly-Once Semantics in action

Explore system design templates that use exactly-once semantics and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Verify exactly-once click counting with transactional processing

Metrics to watch
duplicate_counttransaction_overhead_mscheckpoint_interval_msthroughput_rps
Run Simulation
Test Your Understanding

1How do most production systems achieve exactly-once processing in practice?

2What does Flink's Chandy-Lamport checkpointing algorithm coordinate to achieve exactly-once state consistency?

Deeper Reading