Ordering Guarantees

Ordering guarantees determine whether messages are delivered and processed in the same order they were sent. Total ordering means all consumers see all messages in the same global sequence. Partition ordering means messages with the same key are ordered within a partition. No ordering means messages may arrive in any order. Stronger ordering guarantees reduce parallelism and throughput.

Overview

Message ordering is one of the most misunderstood aspects of distributed messaging. Engineers often assume messages will be processed in send order, but most messaging systems make no such guarantee by default -- and stronger guarantees come with significant performance costs.

**No ordering guarantee** means messages may arrive at consumers in any order, regardless of send order. SQS Standard queues provide 'best-effort ordering', which means roughly FIFO under normal conditions but with no guarantee under load. Standard SNS topics provide no ordering guarantee either. This is the most common default and supports the highest throughput because messages can be distributed across any available consumer without coordination.

**Partition ordering** (also called partial ordering or per-key ordering) guarantees that messages with the same partition key are delivered in order, while messages with different keys may be delivered in any relative order. Kafka guarantees ordering within a partition: all messages with the same key go to the same partition and are consumed in offset order. SQS FIFO queues guarantee ordering within a message group: messages with the same MessageGroupId are delivered in strict FIFO order, while different group IDs are independent. This is the practical sweet spot for most systems.
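The key-to-partition routing described above can be sketched as follows. This is a minimal illustration, not Kafka's actual partitioner: it assumes CRC32 as a stand-in hash (Kafka's default partitioner hashes the serialized key with murmur2), and `partition_for`, `route`, and `NUM_PARTITIONS` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages):
    """Append each (key, value) message to its partition's log, in send order."""
    partitions = defaultdict(list)
    for key, value in messages:
        partitions[partition_for(key)].append((key, value))
    return partitions


sent = [("acct-1", "deposit"), ("acct-2", "deposit"),
        ("acct-1", "withdraw"), ("acct-2", "withdraw")]
logs = route(sent)

# Every message for a key sits in one partition, in the order it was sent.
for key in ("acct-1", "acct-2"):
    log = logs[partition_for(key)]
    print(key, [value for k, value in log if k == key])
```

Because the mapping depends on the partition count, changing the number of partitions can move a key to a different partition, which is worth keeping in mind when per-key order must hold across a resize.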

**Total ordering** guarantees that all consumers see all messages in the same global sequence. This requires a single partition or a single queue with a single consumer. Kafka achieves total ordering with a single-partition topic, but this limits throughput to one consumer per group. Total ordering across multiple partitions requires a coordination service (e.g., ZooKeeper-based sequencer), which adds latency and reduces throughput.
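If a sequencer stamps every message with a global sequence number, a consumer can restore the global order by buffering early arrivals. The sketch below assumes gap-free numbering starting at 0; `Resequencer` is a hypothetical name, not part of any messaging library.

```python
class Resequencer:
    """Release messages in sequence-number order, buffering early arrivals."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending = {}  # seq -> payload, for messages that arrived early

    def accept(self, seq: int, payload):
        """Take one message; return the payloads that are now deliverable in order."""
        if seq < self.next_seq:
            return []  # already delivered: treat as a duplicate and drop it
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


reseq = Resequencer()
delivered = []
for seq, payload in [(1, "b"), (0, "a"), (3, "d"), (2, "c")]:
    delivered.extend(reseq.accept(seq, payload))
print(delivered)
```

The buffer makes the cost visible: one late message holds back everything stamped after it.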

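**Causal ordering** is a middle ground: messages that are causally related (A caused B) are delivered in causal order, while unrelated messages may arrive in any order. This is weaker than total ordering but stronger than no ordering. Vector clocks capture the happened-before relation and can distinguish causally related messages from concurrent ones; Lamport timestamps yield an order that is consistent with causality but cannot identify concurrency. Causal delivery built on vector clocks avoids the global coordination, and much of the throughput penalty, of total ordering.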
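A minimal sketch of how vector clocks classify two events, assuming each clock is a dict from node name to counter with missing entries treated as 0. The function name `compare` and the example node names are illustrative.

```python
def compare(a: dict, b: dict) -> str:
    """Classify vector clock a relative to b: before, after, equal, or concurrent."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b: deliver a first
    if b_le_a:
        return "after"       # b happened before a: deliver b first
    return "concurrent"      # causally unrelated: any delivery order is fine


post = {"alice": 1}                # Alice publishes a post
reply = {"alice": 1, "bob": 1}     # Bob replies after seeing the post
unrelated = {"carol": 1}           # Carol's independent event

print(compare(post, reply))
print(compare(reply, unrelated))
```

Only the "before" and "after" cases constrain delivery; concurrent messages can be processed in parallel.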

The fundamental tension: stronger ordering requires more coordination, which reduces parallelism and throughput. The key insight is to order only what needs to be ordered (per-entity state transitions) and leave everything else unordered for maximum throughput.

Key Points
  • No ordering (SQS Standard, SNS): messages may arrive in any order. Highest throughput. Suitable when order does not matter (e.g., analytics events, notifications).
  • Partition ordering (Kafka, SQS FIFO with message groups): messages with the same key are ordered within their partition/group. Different keys are independent. Best trade-off for most use cases.
  • Total ordering (single partition, single consumer): all messages globally ordered. Limits throughput to one consumer. Only needed when cross-entity ordering matters.
  • Kafka's ordering model: per-partition FIFO. Messages with the same key go to the same partition via hash(key) % partitions, as long as the partition count does not change. Within a partition, ordering is strict. Across partitions, no ordering guarantee.
  • SQS FIFO message groups: each MessageGroupId is an independent FIFO lane. Messages in different groups are unordered relative to each other. Consumer parallelism scales with the number of distinct groups, up to the queue's throughput quota.
  • Reordering can occur even with FIFO systems if a consumer processes messages in parallel or if retries deliver messages out of sequence. True FIFO processing requires serial consumption per partition/group.
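The idea of independent FIFO lanes with serial consumption per group can be modeled in a few lines. This is an in-memory sketch under simplifying assumptions (no redelivery, no visibility timeout); `GroupedFifo` and its methods are hypothetical and do not correspond to any real queue API.

```python
from collections import deque


class GroupedFifo:
    """Toy queue: one FIFO lane per group, at most one in-flight message per lane."""

    def __init__(self):
        self.lanes = {}         # group_id -> deque of message bodies
        self.in_flight = set()  # groups whose head message is being processed

    def send(self, group_id, body):
        self.lanes.setdefault(group_id, deque()).append(body)

    def receive(self):
        """Return (group_id, body) from the first idle, non-empty lane, or None."""
        for group_id, lane in self.lanes.items():
            if lane and group_id not in self.in_flight:
                self.in_flight.add(group_id)
                return group_id, lane.popleft()
        return None

    def ack(self, group_id):
        """Mark the group's in-flight message as done, unblocking its lane."""
        self.in_flight.discard(group_id)


q = GroupedFifo()
q.send("order-1", "placed")
q.send("order-1", "paid")
q.send("order-2", "placed")

first = q.receive()    # head of order-1's lane
second = q.receive()   # order-1 is busy, so order-2's lane is served
third = q.receive()    # nothing deliverable until order-1 is acknowledged
q.ack("order-1")
fourth = q.receive()   # order-1's next message
print(first, second, third, fourth)
```

Blocking a lane while one of its messages is in flight is what keeps processing serial within a group while different groups proceed in parallel.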
Simple Example

Bank Account Transaction Ordering

A bank account receives three transactions: (1) Deposit $100, (2) Withdraw $80, (3) Deposit $50. Processing in order: balance goes 0 → 100 → 20 → 70. If processed out of order as (2), (1), (3): the withdrawal fails (insufficient funds) when the balance is still 0. Partition ordering with account_id as the partition key ensures all transactions for the same account are processed in order, while transactions for different accounts are processed in parallel across partitions.
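The arithmetic above can be checked with a short sketch. The function name and the overdraft rule (a withdrawal larger than the current balance is rejected) are assumptions made for illustration.

```python
def apply_transactions(transactions, balance=0):
    """Apply (kind, amount) transactions in the given order; reject overdrafts."""
    rejected = []
    for kind, amount in transactions:
        if kind == "deposit":
            balance += amount
        elif kind == "withdraw":
            if amount > balance:
                rejected.append((kind, amount))  # insufficient funds
            else:
                balance -= amount
    return balance, rejected


in_order = [("deposit", 100), ("withdraw", 80), ("deposit", 50)]
out_of_order = [in_order[1], in_order[0], in_order[2]]

print(apply_transactions(in_order))      # (70, [])
print(apply_transactions(out_of_order))  # (150, [('withdraw', 80)])
```

The same three messages produce a different final balance and a spurious rejection, which is why per-account ordering matters here.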

Real-World Examples

Kafka (LinkedIn)

LinkedIn uses Kafka partition ordering for all user activity streams. The user_id is the partition key, ensuring all events for a given user (profile views, connections, posts) arrive in chronological order. This is critical for the feed ranking algorithm, which needs to know the sequence of interactions. Across users, events are unordered -- the feed service processes different users' events in parallel.

Amazon SQS FIFO

Amazon uses SQS FIFO queues for order processing. Each order's OrderId is the MessageGroupId, ensuring state transitions (placed → paid → shipped → delivered) for a single order are processed sequentially. Different orders are processed in parallel because they have different group IDs. This gives per-order FIFO with multi-order parallelism.
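The per-order grouping described above might look like the following when building a message for an SQS FIFO queue. The parameter names match boto3's `send_message`; the queue URL, the helper `build_fifo_message`, and the deduplication-ID scheme are hypothetical.

```python
import json


def build_fifo_message(queue_url: str, order_id: str, state: str) -> dict:
    """Build keyword arguments for an SQS FIFO send_message call."""
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps({"order_id": order_id, "state": state}),
        # All transitions of one order share a group, so they stay in FIFO order.
        "MessageGroupId": order_id,
        # Explicit deduplication ID; not needed if content-based deduplication is on.
        "MessageDeduplicationId": f"{order_id}:{state}",
    }


# Hypothetical queue URL; FIFO queue names must end in ".fifo".
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"

params = build_fifo_message(QUEUE_URL, "order-42", "paid")
print(params["MessageGroupId"])

# With boto3 installed and credentials configured, the message could be sent with:
#   import boto3
#   boto3.client("sqs").send_message(**params)
```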

Apache Pulsar

Apache Pulsar offers three ordering modes: per-partition (like Kafka), per-key (finer-grained: messages with different keys in the same partition may be delivered to different consumers, as with a Key_Shared subscription), and exclusive (total ordering with a single active consumer). Pulsar's per-key ordering allows higher parallelism than per-partition ordering because order is tracked per key rather than per physical partition.

Trade-Offs
| Aspect | Description |
| --- | --- |
| Ordering vs Throughput | Total ordering limits you to one consumer. Partition ordering allows N consumers (one per partition). No ordering allows unlimited horizontal scaling. If you need 100K msg/sec and strict total order, you have a fundamental design conflict -- redesign to use per-entity ordering instead. |
| Ordering vs Latency | Maintaining order may require head-of-line blocking: a slow message blocks all subsequent messages in the same partition. If message 5 takes 30 seconds, messages 6-100 wait. Unordered processing allows messages to be processed independently, reducing tail latency. |
| Partition Key Design | Choosing the right partition key is critical. Too broad (e.g., region_id): hot partitions, poor parallelism. Too narrow (e.g., request_id): uniform distribution but no ordering benefit. The sweet spot is the business entity that needs ordering: user_id, order_id, account_id. |
| Reordering Despite FIFO | Even with FIFO delivery, consumers can break ordering by processing messages in parallel threads, or by nacking and requeuing messages. True FIFO requires serial processing per partition or per key. Kafka assigns each partition to exactly one consumer within a consumer group, and Pulsar's exclusive and failover subscriptions likewise use a single active consumer, but neither prevents that consumer from parallelizing internally. |
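The head-of-line blocking example above (message 5 takes 30 seconds, messages 6-100 wait) can be worked through numerically. The sketch assumes every other message takes 100 ms and uses integer milliseconds; all figures besides the 30 seconds are illustrative.

```python
def serial_completion_times(durations_ms):
    """Completion time of each message when one consumer processes them in order."""
    times, clock = [], 0
    for duration in durations_ms:
        clock += duration
        times.append(clock)
    return times


# 100 messages at 100 ms each, except message 5, which takes 30 s.
durations_ms = [100] * 100
durations_ms[4] = 30_000

ordered = serial_completion_times(durations_ms)

# With independent (unordered) processing and enough workers,
# each message finishes after only its own duration.
unordered = durations_ms

print("message 6, ordered:  ", ordered[5], "ms")    # 30500 ms
print("message 6, unordered:", unordered[5], "ms")  # 100 ms
print("message 100, ordered:", ordered[99], "ms")   # 39900 ms
```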
Case Study

Confluent's Ordering vs Throughput Analysis for Financial Services

Scenario

A financial services company needed to process 500K stock trade events per second while maintaining strict ordering per stock symbol. Initially, they used a single Kafka partition for total ordering, which maxed out at 50K msg/sec -- 10x below their requirement. Increasing partitions would break cross-symbol ordering.

Solution

They realized that cross-symbol ordering was unnecessary: AAPL trades do not need to be ordered relative to GOOGL trades. Using the stock ticker as the partition key with 256 partitions gave per-symbol ordering (all AAPL trades in the same partition, in order) while distributing load across 256 partitions. Hot symbols (SPY, AAPL) got their own partitions, which requires a custom partitioner because plain key hashing cannot reserve a partition for one key; less-active symbols shared partitions.
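One way to give hot symbols dedicated partitions while hashing the rest is sketched below. The case study does not describe the actual partitioner, so the partition assignments, the CRC32 hash, and the name `symbol_partition` are all assumptions.

```python
import zlib

NUM_PARTITIONS = 256
# Hypothetical dedicated partitions for the hottest symbols.
HOT_SYMBOLS = {"SPY": 0, "AAPL": 1}
SHARED_START = len(HOT_SYMBOLS)  # shared partitions are 2..255


def symbol_partition(symbol: str) -> int:
    """Route hot symbols to their own partition; hash the rest into the shared range."""
    if symbol in HOT_SYMBOLS:
        return HOT_SYMBOLS[symbol]
    shared_count = NUM_PARTITIONS - SHARED_START
    return SHARED_START + zlib.crc32(symbol.encode("utf-8")) % shared_count


for symbol in ("SPY", "AAPL", "GOOGL", "MSFT"):
    print(symbol, symbol_partition(symbol))
```

Every symbol still maps to exactly one partition, so per-symbol ordering is preserved; only the load distribution changes.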

Outcome

Throughput increased from 50K to 800K msg/sec. Per-symbol ordering was maintained for regulatory compliance. Processing latency dropped from 200ms (single-partition bottleneck) to 5ms (parallel processing). The key insight: almost no system needs total ordering -- per-entity ordering with the right partition key gives both correctness and performance.

Common Mistakes
  • Assuming SQS Standard provides FIFO ordering. SQS Standard provides 'best-effort' ordering which means roughly FIFO under low load but with no guarantee. Under high throughput, messages may arrive out of order. Use SQS FIFO with message groups for guaranteed ordering.
  • Using a single Kafka partition for total ordering when per-key ordering suffices. A single partition limits throughput to one consumer. Ask: 'Do events for DIFFERENT entities need to be ordered relative to each other?' Almost always the answer is no.
  • Processing messages in parallel within a single partition consumer. If your consumer hands messages to a thread pool, ordering is lost even though Kafka delivered them in order. Process each partition's messages serially, or route messages to workers by key so that each key is still handled by one worker in order.
  • Choosing the wrong partition key. Using a high-cardinality but ordering-irrelevant key (request_id) gives uniform distribution but no ordering benefit. Using a low-cardinality key (region) creates hot partitions. Choose the business entity that needs ordering.
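The thread-pool mistake above has a common remedy: route each message to a worker chosen by its key, so different keys run in parallel while each key is handled serially. The sketch below assumes in-process threads and a CRC32 hash; `dispatch`, `worker`, and `NUM_WORKERS` are hypothetical names.

```python
import queue
import threading
import zlib

NUM_WORKERS = 3


def worker(inbox: queue.Queue, processed: dict, lock: threading.Lock):
    """Process messages from one inbox serially, recording the order per key."""
    while True:
        item = inbox.get()
        if item is None:  # sentinel: no more messages
            break
        key, value = item
        with lock:
            processed.setdefault(key, []).append(value)


def dispatch(messages):
    """Fan messages out to workers by key hash; return values per key in processed order."""
    processed, lock = {}, threading.Lock()
    inboxes = [queue.Queue() for _ in range(NUM_WORKERS)]
    threads = [threading.Thread(target=worker, args=(inbox, processed, lock))
               for inbox in inboxes]
    for thread in threads:
        thread.start()
    for key, value in messages:
        # Same key -> same inbox, so one worker sees that key's messages in order.
        inboxes[zlib.crc32(key.encode("utf-8")) % NUM_WORKERS].put((key, value))
    for inbox in inboxes:
        inbox.put(None)
    for thread in threads:
        thread.join()
    return processed


messages = [(f"user-{n % 5}", n) for n in range(20)]
result = dispatch(messages)
print(all(values == sorted(values) for values in result.values()))
```

A shared thread pool would give no such guarantee, because two messages for the same key could be picked up by different threads at the same time.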
Test Your Understanding

1. What is the practical trade-off of requiring total message ordering?

2. In Kafka, what guarantees ordering for messages with the same key?
