
Pub/Sub vs Message Queues

Pub/Sub (publish-subscribe) and message queues are two fundamental asynchronous messaging patterns. Pub/Sub fans out every message to all subscribers, enabling event-driven architectures where producers are fully decoupled from consumers. Message queues deliver each message to exactly one consumer, enabling work distribution and load leveling. Choosing the right pattern shapes your system's coupling, scalability, and failure semantics.

Overview

Asynchronous messaging decouples producers from consumers in time and space: the sender does not block waiting for the receiver, and the sender need not know who (or how many) receivers will process the message. The two foundational patterns are point-to-point queues and publish-subscribe topics.

In a **point-to-point queue** (e.g., SQS, RabbitMQ default exchange, Celery), each message is delivered to exactly one consumer. Multiple consumers compete for messages, providing natural load balancing. Once a consumer acknowledges a message, it is removed from the queue. This pattern excels at work distribution: image processing pipelines, email sending, order fulfillment -- any workload where each unit of work should be processed once.
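
A minimal in-process sketch of competing consumers. It uses Python's standard `queue.Queue` and threads as a stand-in for a broker such as SQS or RabbitMQ. The function name `run_competing_consumers` is hypothetical. Unlike a real broker, this queue has no acknowledgements or redelivery, because `get()` removes the item immediately.

```python
import queue
import threading
from collections import Counter


def run_competing_consumers(num_messages: int, num_workers: int) -> Counter:
    """Hand num_messages to num_workers competing workers.

    Returns a Counter mapping message id -> number of times it was processed.
    """
    work = queue.Queue()
    processed: Counter = Counter()
    lock = threading.Lock()

    def worker() -> None:
        while True:
            msg = work.get()  # blocks; each item is handed to exactly one worker
            if msg is None:  # sentinel: no more work for this worker
                return
            with lock:
                processed[msg] += 1  # stand-in for real work (e.g. resizing an image)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for i in range(num_messages):
        work.put(i)
    for _ in threads:
        work.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return processed


counts = run_competing_consumers(num_messages=100, num_workers=4)
print(len(counts), max(counts.values()))  # 100 1
```

Adding workers raises throughput without changing the producer. Each message is still processed once, whichever worker happens to take it.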

In **publish-subscribe** (e.g., SNS, Kafka topics, Redis Pub/Sub, Google Pub/Sub), each message is delivered to every subscriber. Publishers emit events without knowing who listens. New subscribers can be added without modifying the publisher. This pattern excels at event notification and domain event propagation: 'order placed' triggers inventory, shipping, analytics, and email simultaneously.
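
The fan-out behavior can be sketched with a toy in-memory broker. `InMemoryPubSub` and its methods are hypothetical names. Delivery here is synchronous and in-process, whereas real brokers deliver asynchronously over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class InMemoryPubSub:
    """Toy broker: every subscriber of a topic receives every message."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # New subscribers are added without touching any publisher code.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        # The publisher does not know who (or how many) will receive the message.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = InMemoryPubSub()
received: DefaultDict[str, List[str]] = defaultdict(list)
for service in ("inventory", "shipping", "analytics"):
    # Bind `service` as a default argument so each lambda keeps its own name.
    bus.subscribe("orders", lambda msg, s=service: received[s].append(msg))

delivered_to = bus.publish("orders", "order placed #1")
print(delivered_to)  # 3
```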

Kafka introduced a hybrid model: messages are published to a **topic** (pub/sub), but within each topic, **consumer groups** provide queue semantics -- each partition is consumed by exactly one member of the group. This allows both fan-out (multiple consumer groups) and load balancing (multiple consumers within a group) on the same stream of events.
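
Here is a simplified model of consumer groups. It assumes the producer has already routed messages to partitions and uses a plain round-robin assignment. Real Kafka supports several pluggable assignors, such as range and cooperative sticky. The function names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, members: List[str]) -> Dict[str, List[int]]:
    """Round-robin assignment: each partition goes to exactly one group member."""
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


def consume(topic: List[List[str]], groups: Dict[str, List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Return, per group and per member, the messages that member reads."""
    result: Dict[str, Dict[str, List[str]]] = {}
    for group, members in groups.items():
        assignment = assign_partitions(len(topic), members)
        result[group] = {
            member: [msg for p in parts for msg in topic[p]]
            for member, parts in assignment.items()
        }
    return result


# A topic with 4 partitions; each inner list is one partition's messages in order.
topic = [["m0", "m4"], ["m1", "m5"], ["m2"], ["m3"]]
groups = {"feed": ["feed-1", "feed-2"], "analytics": ["analytics-1"]}
reads = consume(topic, groups)

print(reads["feed"])       # {'feed-1': ['m0', 'm4', 'm2'], 'feed-2': ['m1', 'm5', 'm3']}
print(reads["analytics"])  # {'analytics-1': ['m0', 'm4', 'm1', 'm5', 'm2', 'm3']}
```

Both groups see every message, which gives fan-out. Within the `feed` group, the two members split the partitions, which gives load balancing. If a group had more members than partitions, the extra members would receive an empty assignment and sit idle. For this reason, the partition count caps a group's parallelism.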

The choice between pub/sub and queues is not binary. Most production systems combine both: an event bus (pub/sub) for domain events, feeding into dedicated queues for heavy processing tasks. The key question is: 'Should this message be processed by one consumer or by all interested consumers?'

Key Points
  • Point-to-point queues deliver each message to exactly one consumer. Competing consumers provide natural load balancing. SQS, RabbitMQ (default exchange), and Celery follow this pattern.
  • Pub/Sub broadcasts each message to all subscribers. Publishers are fully decoupled from consumers. SNS, Kafka topics (across consumer groups), Google Pub/Sub, and Redis Pub/Sub follow this pattern.
  • Kafka's consumer group model is a hybrid: pub/sub across groups (each group gets every message) plus queue semantics within a group (each partition assigned to one consumer). This enables both fan-out and parallel processing.
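  • Queues expose back-pressure through their depth: if consumers are slow, the queue grows, signaling the need to scale consumers, and a bounded queue can go further by blocking or rejecting producers. Pub/Sub typically does not apply back-pressure to publishers, so slow subscribers can fall behind.
  • Message ordering differs: many queues are FIFO per queue (or per message group in SQS FIFO), but SQS standard queues offer only best-effort ordering, and competing consumers can finish messages out of order. Pub/Sub ordering depends on the system: Kafka guarantees order per partition, while SNS standard topics provide no ordering guarantee (SNS FIFO topics order per message group).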
  • Durability varies: SQS and Kafka persist messages durably for a configurable retention period. Redis Pub/Sub is fire-and-forget: if no subscriber is connected, the message is lost. This is a critical distinction for reliability.
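
The durability difference can be illustrated with two toy channels. The first, like Redis Pub/Sub, delivers only to currently connected subscribers. The second, like a Kafka topic, appends messages to a retained log that readers consume by offset. Both classes are hypothetical simplifications, with no persistence to disk and no retention limits.

```python
from typing import Callable, Dict, List


class FireAndForgetChannel:
    """Redis-Pub/Sub-like: only currently connected subscribers get a message."""

    def __init__(self) -> None:
        self.subscribers: List[Callable[[str], None]] = []

    def publish(self, message: str) -> int:
        for deliver in self.subscribers:
            deliver(message)
        return len(self.subscribers)  # 0 means nobody received it; it is gone


class RetainedLog:
    """Kafka-topic-like: messages are kept; each reader tracks its own offset."""

    def __init__(self) -> None:
        self.entries: List[str] = []
        self.offsets: Dict[str, int] = {}

    def publish(self, message: str) -> int:
        self.entries.append(message)
        return len(self.entries) - 1  # offset of the appended message

    def poll(self, reader: str) -> List[str]:
        # Unknown readers start at offset 0, comparable to Kafka's
        # auto.offset.reset=earliest (Kafka's default is latest).
        start = self.offsets.get(reader, 0)
        batch = self.entries[start:]
        self.offsets[reader] = len(self.entries)  # "commit" the new position
        return batch


channel = FireAndForgetChannel()
log = RetainedLog()

channel.publish("order-1")  # no subscriber connected yet
log.publish("order-1")

late: List[str] = []
channel.subscribers.append(late.append)  # subscriber connects after order-1
channel.publish("order-2")
log.publish("order-2")

print(late)              # ['order-2']  (order-1 was lost)
print(log.poll("late"))  # ['order-1', 'order-2']  (replayed from the log)
```
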
Simple Example

E-Commerce Order Events

When a customer places an order, the order service publishes an 'OrderPlaced' event to a Pub/Sub topic. The inventory service, shipping service, email service, and analytics service each subscribe and process the event independently. No queue contention -- every service gets every event. Separately, the email service uses a point-to-point queue internally: it enqueues each email to send, and a pool of 10 worker threads compete to dequeue and send emails, providing load balancing. The order event fans out (pub/sub), but the actual email sending is distributed (queue).
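
The order example can be sketched in-process. The order event fans out to every subscriber, and the email subscriber simply enqueues work for a pool of 10 competing worker threads. The service names, the `publish_order_placed` function, and the use of `queue.Queue` in place of a real broker are illustrative assumptions.

```python
import queue
import threading
from typing import Callable, Dict, List

NUM_EMAIL_WORKERS = 10  # pool size taken from the example

email_queue = queue.Queue()
sent: List[str] = []
sent_lock = threading.Lock()
handled: Dict[str, List[str]] = {"inventory": [], "shipping": [], "analytics": []}


def email_worker() -> None:
    # Queue side: competing workers, each email is sent by exactly one of them.
    while True:
        order_id = email_queue.get()
        if order_id is None:  # shutdown sentinel
            return
        with sent_lock:
            sent.append(order_id)  # stand-in for calling an email provider


def make_recorder(name: str) -> Callable[[str], None]:
    return lambda order_id: handled[name].append(order_id)


# Pub/Sub side: every subscriber receives every OrderPlaced event.
subscribers: List[Callable[[str], None]] = [make_recorder(n) for n in handled]
subscribers.append(email_queue.put)  # the email service only enqueues the work


def publish_order_placed(order_id: str) -> None:
    for deliver in subscribers:
        deliver(order_id)


workers = [threading.Thread(target=email_worker) for _ in range(NUM_EMAIL_WORKERS)]
for w in workers:
    w.start()
for i in range(25):
    publish_order_placed(f"order-{i}")
for _ in workers:
    email_queue.put(None)
for w in workers:
    w.join()

print(len(sent), {k: len(v) for k, v in handled.items()})
# 25 {'inventory': 25, 'shipping': 25, 'analytics': 25}
```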

Real-World Examples

LinkedIn

LinkedIn uses Kafka as a unified pub/sub log for all system events. Activity events (profile views, connection requests, post interactions) are published to Kafka topics. Multiple consumer groups subscribe: the feed service, notification service, analytics pipeline, and anti-abuse system each consume independently. Within each consumer group, partitions are distributed across consumer instances for parallel processing.

Amazon

AWS offers SQS (point-to-point queue) and SNS (pub/sub) as separate services, often combined as 'SNS fan-out to SQS'. An SNS topic broadcasts events to multiple SQS queues, each consumed by a different microservice. This provides fan-out (every service gets the event) with per-service buffering and retry (each SQS queue is independent). It is the canonical AWS pattern for event-driven microservices.
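
The per-service buffering and retry of SNS-to-SQS fan-out can be sketched with one `deque` per subscribed service. `FanOutTopic`, `drain`, and the service names are hypothetical. Re-appending a failed message approximates an SQS visibility timeout expiring. The `max_attempts` cutoff approximates a redrive policy's `maxReceiveCount`, which moves a message to a dead-letter queue.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class FanOutTopic:
    """Hypothetical stand-in for an SNS topic with SQS queue subscriptions."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = {}

    def subscribe_queue(self, service: str) -> None:
        self.queues[service] = deque()

    def publish(self, event: str) -> None:
        for q in self.queues.values():
            q.append(event)  # each service gets its own independent copy


def drain(q: Deque[str], handler: Callable[[str], bool], max_attempts: int = 3) -> List[str]:
    """Process one service's queue; failures are retried, then dead-lettered.

    Assumes event strings are unique ids (attempts are counted per string).
    """
    dead_letters: List[str] = []
    attempts: Dict[str, int] = {}
    while q:
        event = q.popleft()
        if handler(event):
            continue  # success: removed from this service's queue only
        attempts[event] = attempts.get(event, 0) + 1
        if attempts[event] < max_attempts:
            q.append(event)  # redeliver later
        else:
            dead_letters.append(event)
    return dead_letters


topic = FanOutTopic()
for svc in ("billing", "search-indexer"):
    topic.subscribe_queue(svc)
topic.publish("order-1")
topic.publish("order-2")

flaky_calls = {"n": 0}


def flaky_indexer(event: str) -> bool:
    flaky_calls["n"] += 1
    return flaky_calls["n"] > 1  # first attempt fails, later attempts succeed


billing_dlq = drain(topic.queues["billing"], lambda e: True)
indexer_dlq = drain(topic.queues["search-indexer"], flaky_indexer)
print(billing_dlq, indexer_dlq, flaky_calls["n"])  # [] [] 3
```

A failure in one service's queue only delays that service. The other queues hold their own copies and are unaffected.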

Uber

Uber uses a combination of Kafka for high-throughput event streaming (trip events, location updates) and Cherami (their custom durable message queue) for task-oriented workloads. Trip events fan out to pricing, ETA, matching, and analytics via pub/sub. Individual tasks like sending push notifications or charging riders go through dedicated queues with competing consumers.

Trade-Offs

| Aspect | Description |
| --- | --- |
| Fan-Out vs Work Distribution | Pub/Sub sends every message to every subscriber, which is ideal when multiple services need the same event (e.g., 'order placed' triggers inventory, email, analytics). Queues send each message to one consumer, which is ideal when work should be done once (e.g., process this image, send this email). Using the wrong pattern causes either duplicate work or missed notifications. |
| Coupling vs Flexibility | Pub/Sub fully decouples publishers from subscribers: new consumers can subscribe without any change to the publisher. Queues create a tighter contract: the producer puts messages in a specific queue that a specific consumer reads. Pub/Sub enables easier evolution but makes it harder to know who is consuming your events. |
| Back-Pressure and Flow Control | Queues naturally apply back-pressure: a growing queue signals that consumers need to scale up. Pub/Sub typically does not slow down publishers when subscribers are slow. In Kafka, a slow consumer group simply falls behind (consumer lag) without affecting other groups or the producer. In Redis Pub/Sub, a subscriber whose output buffer exceeds the server's `client-output-buffer-limit` for pub/sub clients is disconnected, and messages published while it is disconnected are lost. |
| Ordering and Exactly-Once Guarantees | Strict ordering is easier in queues (single FIFO channel) than in pub/sub (messages may arrive at different subscribers in different orders). Exactly-once semantics are challenging in both but especially hard with fan-out, because each subscriber must independently deduplicate. Kafka offers exactly-once semantics for read-process-write pipelines that stay within Kafka. This relies on idempotent producers and transactions that commit consumer offsets atomically with the output, read by consumers configured with `isolation.level=read_committed`. Side effects outside Kafka (e.g., calling a payment API) still need idempotent handling. |
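
Consumer lag is the metric behind a Kafka group that "falls behind". For each partition, it is the log-end offset minus the group's committed offset. This sketch assumes plain dictionaries of offsets, whereas in practice they come from the broker. As a simplification, it also treats a partition with no committed offset as lagging from offset 0.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag = log-end offset - committed offset (missing commit -> 0)."""
    return {p: end - committed.get(p, 0) for p, end in log_end_offsets.items()}


log_end = {0: 1_000, 1: 980, 2: 1_020}  # next offset to be written, per partition
fast_group = {0: 1_000, 1: 980, 2: 1_019}
slow_group = {0: 400, 1: 350}  # partition 2 has no committed offset yet

print(consumer_lag(log_end, fast_group))  # {0: 0, 1: 0, 2: 1}
print(consumer_lag(log_end, slow_group))  # {0: 600, 1: 630, 2: 1020}
```

Both groups read the same log, and the slow group's lag does not change the producer's offsets or the fast group's position.
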
Case Study

Shopify's Event-Driven Commerce Platform

Scenario

Shopify needed to decouple its monolithic order processing pipeline. When a merchant received an order, a synchronous chain of operations (inventory check, payment capture, tax calculation, shipping label generation, email notification) created cascading failures. A slow tax API would block payment capture, which would block the entire order.

Solution

Shopify adopted an event-driven architecture using Kafka topics. The order service publishes 'order.created' events to a Kafka topic. Each downstream service (inventory, payments, tax, shipping, notifications) runs as an independent consumer group. Within each group, multiple consumers process partitions in parallel. For critical operations like payment capture, a dedicated queue combined with idempotency keys provides effectively-once processing: the queue may redeliver a message, but the idempotency key ensures the payment is captured only once.
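
Idempotency keys turn at-least-once delivery into effectively-once side effects. In this sketch, `PaymentCapture` and its fields are hypothetical, and the processed-key set lives in memory. A real service would record the key atomically with the charge, for example through a unique constraint in the same database transaction, or pass the key to the payment provider.

```python
from typing import Dict, Set


class PaymentCapture:
    """Hypothetical consumer: a redelivered message must not charge twice."""

    def __init__(self) -> None:
        self.processed_keys: Set[str] = set()  # in production: durable + atomic with the charge
        self.charged: Dict[str, int] = {}

    def handle(self, idempotency_key: str, order_id: str, amount_cents: int) -> bool:
        """Return True if a charge was made, False for a duplicate delivery."""
        if idempotency_key in self.processed_keys:
            return False  # duplicate: acknowledge without charging again
        self.charged[order_id] = self.charged.get(order_id, 0) + amount_cents
        self.processed_keys.add(idempotency_key)
        return True


consumer = PaymentCapture()
# The same message delivered twice, e.g. after a consumer crash before its ack.
deliveries = [("cap-42", "order-42", 1999), ("cap-42", "order-42", 1999)]
results = [consumer.handle(*d) for d in deliveries]
print(results, consumer.charged)  # [True, False] {'order-42': 1999}
```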

Outcome

Order processing latency dropped from seconds to milliseconds for the merchant-facing API (which now just publishes and returns). Cascading failures were eliminated: a slow tax API only affects tax calculation, not payments or inventory. Adding new services (fraud detection, analytics, loyalty points) required zero changes to the order service. Throughput scaled linearly by adding partitions and consumers.

Common Mistakes
  • ⚠ Using Redis Pub/Sub for critical events. Redis Pub/Sub is fire-and-forget: if no subscriber is connected when the message is published, it is permanently lost. Use it only for ephemeral notifications (cache invalidation, real-time dashboards). For durable events, use Kafka, SQS/SNS, or Google Pub/Sub.
  • ⚠ Building a fan-out system with queues by publishing the same message to N queues. This tightly couples the producer to every consumer. Instead, use a pub/sub topic and let each consumer subscribe independently.
  • ⚠ Assuming pub/sub guarantees delivery order across subscribers. Most pub/sub systems make no ordering guarantee across different subscribers. Even Kafka only guarantees order within a single partition. Design consumers to be order-independent or use partition keys to ensure related messages go to the same partition.
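  • ⚠ Ignoring consumer group rebalancing in Kafka. When a consumer joins or leaves a group, partitions are reassigned. Under the eager rebalance protocol, every consumer gives up its partitions and processing pauses until the rebalance completes. Use incremental cooperative rebalancing (e.g., the `CooperativeStickyAssignor`), which moves only the partitions that need to change. Also keep group membership stable: static membership via `group.instance.id` avoids rebalances on quick restarts.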
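
To keep related messages in order, a producer can hash a partition key so that every event for the same key lands on the same partition. This sketch uses `zlib.crc32` as a stable stand-in hash, because Python's built-in `hash()` is salted per process for strings. Kafka's default partitioner uses murmur2 for keyed records, so the actual partition numbers would differ.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NUM_PARTITIONS = 4


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash: the same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)
events = [
    ("order-7", "created"), ("order-9", "created"),
    ("order-7", "paid"), ("order-9", "paid"),
    ("order-7", "shipped"),
]
for key, event in events:
    partitions[partition_for(key)].append((key, event))  # append = log order


def events_for(key: str) -> List[str]:
    # One consumer reads that partition in append order, so per-key order holds.
    return [e for k, e in partitions[partition_for(key)] if k == key]


print(events_for("order-7"))  # ['created', 'paid', 'shipped']
```

Ordering is preserved per key, not across keys. Events for `order-7` and `order-9` may sit in different partitions and be processed in any relative order.
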
Test Your Understanding

1. In Kafka, how do consumer groups combine pub/sub and queue semantics?

2. Why is Redis Pub/Sub unsuitable for critical domain events?
