Push vs Pull

The push vs pull architecture trade-off determines whether data producers send data to consumers (push) or consumers request data from producers (pull). This decision affects latency, scalability, resource efficiency, and system coupling. Most production systems use a hybrid approach, and understanding when to use each pattern is essential for system design interviews.

Overview

The push vs pull trade-off is one of the most fundamental architectural decisions in distributed systems. In a push model, the producer of data actively sends updates to consumers as events occur -- think WebSocket connections, webhook callbacks, or message queue publishing. In a pull model, consumers periodically request data from the producer -- think HTTP polling, RSS feed readers, or batch ETL jobs. Each model has distinct strengths, and the right choice depends on latency requirements, consumer characteristics, and operational constraints.

Push architectures excel at low-latency delivery. When a new message arrives in a chat application, it should appear on all connected clients within milliseconds. Polling every second would introduce up to 1 second of latency and waste bandwidth on the many polls that return no new data. Push (via WebSocket or Server-Sent Events) eliminates this idle polling and delivers data the instant it is available. However, push requires the producer to maintain state about all consumers (which consumers exist, which are connected, what they have already received) and handle backpressure (what happens when a consumer cannot keep up with the push rate).
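
To make the producer-side state concrete, here is a minimal in-process sketch. `PushHub` and its methods are hypothetical names, and a plain callback stands in for a WebSocket send. The sketch tracks only which consumers are connected, not what each has already received, and it treats `ConnectionError` as the assumed signal of a dead connection.

```python
from typing import Callable, Dict, List


class PushHub:
    """Producer-side registry of connected consumers (hypothetical sketch)."""

    def __init__(self) -> None:
        # consumer_id -> send callback; the callback stands in for a WebSocket send
        self._subscribers: Dict[str, Callable[[str], None]] = {}

    def connect(self, consumer_id: str, send: Callable[[str], None]) -> None:
        self._subscribers[consumer_id] = send

    def disconnect(self, consumer_id: str) -> None:
        self._subscribers.pop(consumer_id, None)

    def connected(self) -> List[str]:
        return list(self._subscribers)

    def publish(self, message: str) -> int:
        """Push `message` to every connected consumer; return the delivery count."""
        delivered = 0
        # Iterate over a copy because a failed send mutates the registry.
        for consumer_id, send in list(self._subscribers.items()):
            try:
                send(message)
                delivered += 1
            except ConnectionError:
                # Assumed failure signal: the producer must clean up its own state.
                self.disconnect(consumer_id)
        return delivered


def broken_send(_: str) -> None:
    raise ConnectionError("socket closed")


inbox_a: List[str] = []
inbox_b: List[str] = []
hub = PushHub()
hub.connect("alice", inbox_a.append)
hub.connect("bob", inbox_b.append)
first = hub.publish("hello")     # delivered to both consumers immediately
hub.disconnect("bob")
hub.connect("carol", broken_send)
second = hub.publish("again")    # alice succeeds; carol's failure evicts her
```

Every connect, disconnect, and failed send changes state the producer owns. At millions of connections, this bookkeeping is the cost that push pays for its latency.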

Pull architectures excel at decoupling and consumer autonomy. Each consumer decides when and how fast to consume data, naturally solving the backpressure problem. If a consumer is slow, it simply polls less frequently or processes smaller batches. The producer does not need to know about consumers or manage connections. This makes pull architectures operationally simpler and more resilient to consumer failures. Kafka's design exemplifies this: producers write to a log, and consumers pull at their own pace using their own offset -- a failing consumer does not affect the producer or other consumers.
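
A minimal sketch of the pull side follows. It assumes an in-memory list as the log, a hypothetical stand-in for a Kafka partition rather than Kafka's API. The producer only appends. Each consumer keeps its own offset and picks its own batch size.

```python
from typing import List


class Log:
    """Append-only log: the producer appends and never tracks consumers."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, record: str) -> int:
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int) -> List[str]:
        return self._records[offset:offset + max_records]


class PullConsumer:
    """Each consumer owns its offset and chooses its own batch size."""

    def __init__(self, log: Log, batch_size: int) -> None:
        self.log = log
        self.offset = 0
        self.batch_size = batch_size

    def poll(self) -> List[str]:
        batch = self.log.read(self.offset, self.batch_size)
        self.offset += len(batch)  # advance past what was actually returned
        return batch


log = Log()
for i in range(5):
    log.append(f"event-{i}")

fast = PullConsumer(log, batch_size=10)
slow = PullConsumer(log, batch_size=2)
fast_batch = fast.poll()  # all five records in one poll
slow_batch = slow.poll()  # only event-0 and event-1; the rest waits in the log
```

The slow consumer never slows the producer or the fast consumer. It simply advances its own offset later.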

Most production systems use a hybrid approach. Twitter's timeline combines fan-out-on-write (push to timeline caches) for most accounts with fan-out-on-read (pull at read time) for celebrity accounts. Kafka combines push-based ingestion (producers push records to brokers) with pull-based consumption (consumers fetch from partitions at their own pace). Notification systems push urgent alerts but use pull for non-urgent summaries. The architectural skill is decomposing your system to apply push and pull where each is most appropriate.

Key Points
  • Push minimizes latency: data arrives at consumers the instant it is produced. This is essential for real-time applications (chat, live scores, stock tickers, collaborative editing). The trade-off is that the producer must manage consumer connections and handle delivery failures.
  • Pull maximizes consumer autonomy: each consumer controls its consumption rate, which naturally solves backpressure. This is ideal for heterogeneous consumers with different processing speeds (fast web servers, slow batch analytics, unreliable mobile clients).
  • Push requires producer-side consumer state: the producer must track which consumers exist, which are connected, and what each has received. This state is hard to manage at scale (millions of WebSocket connections) and couples the producer to its consumers.
  • Pull wastes resources on empty polls: if data changes infrequently, most poll requests return 'no new data.' This wastes bandwidth and CPU and can strain the producer. Long polling and conditional requests (ETags, If-Modified-Since) mitigate this at the cost of extra complexity.
  • Kafka's design is hybrid: producers push records to a durable, append-only commit log, and consumers pull from the log at their own pace. The log decouples the ingestion rate from the consumption rate, and each consumer group commits its own offsets. Producers write as soon as data exists, as in push, while consumers keep pull's autonomy.
  • Backpressure is the critical operational concern for push architectures. If a producer pushes faster than a consumer can process, the consumer's buffer grows without bound, leading to memory exhaustion or data loss. Strategies include bounded buffers with a drop-oldest policy, window- or credit-based flow control (as in TCP), and per-consumer rate limiting.
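
The drop-oldest strategy from the last point can be sketched with `collections.deque(maxlen=...)`, which evicts from the opposite end when full. `BoundedPushBuffer` is a hypothetical name. A real system would also export the drop count as a metric.

```python
from collections import deque
from typing import Deque, List


class BoundedPushBuffer:
    """Per-consumer buffer on the push path with a drop-oldest policy."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) discards from the left when full, i.e. drop-oldest
        self._queue: Deque[str] = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, item: str) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # the oldest item is about to be evicted
        self._queue.append(item)

    def drain(self, max_items: int) -> List[str]:
        """Called by the (possibly slow) consumer at its own pace."""
        out: List[str] = []
        while self._queue and len(out) < max_items:
            out.append(self._queue.popleft())
        return out


buf = BoundedPushBuffer(capacity=3)
for i in range(5):          # the producer outpaces the consumer
    buf.push(f"tick-{i}")
received = buf.drain(10)    # the consumer sees only the 3 newest items
```

Memory stays bounded at the buffer's capacity no matter how fast the producer pushes. The price is that the two oldest ticks are lost, which suits live prices or scores but not payments.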
Simple Example

News Feed: Push vs Pull vs Hybrid

A social media feed needs to show new posts to followers. Pure pull: each client polls /feed every 5 seconds. With 10M active users, that is 2M requests/second (10M / 5 s), mostly returning 'no new posts' (wasteful). A new post can wait up to 5 seconds (2.5 seconds on average) before a follower sees it. Pure push: when a user posts, the post is pushed to every follower's WebSocket connection. A user with 1M followers triggers 1M pushes per post, and if that user posts frequently, this overwhelms the push infrastructure. Hybrid (Twitter's approach): fan-out-on-write for users below a follower threshold (10K in this example), which pushes the post ID into each follower's precomputed timeline in Redis. Fan-out-on-read for celebrities above the threshold, whose posts are merged into the timeline at read time. Clients pull the precomputed timeline, which already contains the pushed posts. This hybrid gives low latency for most posts while handling high-follower accounts efficiently.
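
The hybrid fan-out can be sketched under simplifying assumptions. In-memory dicts stand in for Redis timelines, a logical counter stands in for timestamps, and `FeedService` and its methods are hypothetical. The threshold is a constructor parameter so that a tiny value can demonstrate the celebrity path.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Entry = Tuple[int, str]  # (logical timestamp, post ID)


class FeedService:
    """Hybrid fan-out sketch: push for regular authors, pull for celebrities."""

    def __init__(self, celebrity_threshold: int = 10_000) -> None:
        self.threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)      # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)      # user -> authors
        self.timelines: Dict[str, List[Entry]] = defaultdict(list)  # stands in for Redis
        self.posts_by_author: Dict[str, List[Entry]] = defaultdict(list)
        self._clock = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def post(self, author: str, post_id: str) -> None:
        self._clock += 1
        entry = (self._clock, post_id)
        self.posts_by_author[author].append(entry)
        if not self.is_celebrity(author):
            # Fan-out-on-write: push the post ID into each follower's timeline.
            for user in self.followers[author]:
                self.timelines[user].append(entry)

    def read_feed(self, user: str, limit: int = 20) -> List[str]:
        # Start from the precomputed (pushed) timeline; a set removes duplicates
        # if an author crossed the threshold after some posts were pushed.
        entries: Set[Entry] = set(self.timelines[user])
        # Fan-out-on-read: pull celebrity posts now and merge them in.
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.update(self.posts_by_author[author])
        newest_first = sorted(entries, reverse=True)
        return [post_id for _, post_id in newest_first[:limit]]


feed = FeedService(celebrity_threshold=2)  # tiny threshold for the demo
feed.follow("ann", "bob")
feed.follow("ann", "star")
feed.follow("carl", "star")   # star now has 2 followers: celebrity
feed.post("bob", "p1")        # pushed into ann's timeline
feed.post("star", "p2")       # stored once, merged at read time
```

A celebrity post costs one write regardless of follower count. Each read pays a small merge cost instead, which is the trade the hybrid accepts.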

Real-World Examples

Slack

Slack uses WebSocket push for real-time message delivery. When a user sends a message, it is pushed to all connected members of the channel via their WebSocket connections. For offline users, messages are stored and delivered on the next connection (pull-on-connect). Slack also uses a pull mechanism for loading message history: scrolling up in a channel triggers paginated HTTP requests for older messages. The real-time channel is push; the history channel is pull. This hybrid approach provides sub-second message delivery for online users without maintaining push state for offline users.

GitHub

GitHub uses webhooks (push) to notify integrations about repository events (pushes, PRs, issues). When a commit is pushed, GitHub sends an HTTP POST request to every registered webhook URL. However, webhook delivery is not guaranteed (the receiver may be down), so GitHub also exposes REST API endpoints that integrations can poll for recent events (pull), with conditional requests (ETags) so that polling unchanged data is cheap. GitHub Actions uses a pull-based model as well: runners long-poll GitHub for queued workflow jobs rather than having jobs pushed to them, so each runner pulls work at its own capacity, which enables auto-scaling.
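
The work-pulling pattern behind the runner model can be sketched as follows. This is a generic illustration with hypothetical `JobQueue` and `Runner` classes, not GitHub's runner protocol. Each runner claims jobs only while it has free slots, so load follows capacity and unclaimed jobs wait in the queue.

```python
from collections import deque
from typing import Deque, List, Optional


class JobQueue:
    """Queued jobs wait here until some runner asks for work."""

    def __init__(self) -> None:
        self._jobs: Deque[str] = deque()

    def enqueue(self, job: str) -> None:
        self._jobs.append(job)

    def claim(self) -> Optional[str]:
        return self._jobs.popleft() if self._jobs else None

    def depth(self) -> int:
        # Queue depth is a natural signal for auto-scaling the runner fleet.
        return len(self._jobs)


class Runner:
    def __init__(self, name: str, slots: int) -> None:
        self.name = name
        self.slots = slots
        self.running: List[str] = []

    def poll(self, jobs: JobQueue) -> List[str]:
        """Claim jobs only while this runner has free capacity."""
        claimed: List[str] = []
        while len(self.running) < self.slots:
            job = jobs.claim()
            if job is None:
                break
            self.running.append(job)
            claimed.append(job)
        return claimed

    def finish(self, job: str) -> None:
        self.running.remove(job)


jobs = JobQueue()
for i in range(5):
    jobs.enqueue(f"job-{i}")
small = Runner("small", slots=1)
large = Runner("large", slots=3)
small_claimed = small.poll(jobs)   # ["job-0"]
large_claimed = large.poll(jobs)   # ["job-1", "job-2", "job-3"]
small.finish("job-0")
later = small.poll(jobs)           # ["job-4"], pulled once a slot frees up
```

With many runners polling concurrently, `claim` would need to be atomic (for example, a transactional dequeue) so that two runners never take the same job.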

Apache Kafka

Kafka is a masterclass in hybrid push/pull design. Producers push records to Kafka brokers, which append them to partition logs. Consumers pull records from partitions at their own pace, tracking their position via offsets. A fast consumer processes records in near-real time, and a slow consumer processes them in batches; both read the same log. This separation of push (ingestion) from pull (consumption) is Kafka's key architectural insight. Consumer groups add coordination: when a consumer joins or leaves, partitions are rebalanced across the group. Because consumption is pull-based, a slow consumer only falls behind on the partitions assigned to it rather than stalling the entire pipeline, although a rebalance itself can briefly pause consumption for the group.
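
Partition assignment and per-partition lag can be sketched in simplified form. `assign_round_robin` deals partitions out in turn, which is similar in spirit to Kafka's round-robin assignor but is not its code. The offsets are made-up numbers chosen to show a slow consumer lagging only on its own partitions.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Deal partitions to group members in turn (simplified, not Kafka's code)."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def partition_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log-end offset minus the group's committed offset."""
    return {p: end - committed.get(p, 0) for p, end in log_end.items()}


assignment = assign_round_robin([0, 1, 2, 3], ["c1", "c2"])  # c1: [0, 2], c2: [1, 3]
log_end = {0: 100, 1: 100, 2: 100, 3: 100}
committed = {0: 100, 1: 40, 2: 100, 3: 35}   # made-up offsets: c2 is slow
lags = partition_lag(log_end, committed)      # lag only on c2's partitions 1 and 3

# c2 leaves the group: a rebalance hands its partitions to c1, which resumes
# from the committed offsets (40 and 35).
after_rebalance = assign_round_robin([0, 1, 2, 3], ["c1"])
```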

Trade-Offs

Latency vs Resource Efficiency

Push delivers data in milliseconds (bounded only by network latency). Pull introduces polling-interval latency (if you poll every 10 seconds, average new-data latency is 5 seconds). Reducing the polling interval improves latency but increases wasted requests. Long polling bridges this gap: the server holds the connection open until data is available, achieving push-like latency with pull semantics, but each open connection consumes server resources.

Producer Complexity vs Consumer Complexity

Push shifts complexity to the producer: it must manage consumer registrations, connection state, delivery guarantees, and backpressure. Pull shifts complexity to the consumer: it must manage polling schedules, deduplication, and detecting missed data. Choose based on who can absorb the complexity. If you control the producer (internal service), push is manageable. If consumers are external (third-party integrations), pull or webhooks with retry are simpler for them.

Scalability Characteristics

Push has O(consumers) cost per event: each event triggers one notification per consumer, so a single event to 1M consumers means 1M pushes. Pull has O(polls) cost per interval: the cost depends on poll frequency, not event frequency. With 1M consumers polling every 10 seconds, the cost is 100K polls/sec whether or not there are events. If every consumer receives every event, push is cheaper when fewer than one event arrives per poll interval, and pull is cheaper when many events arrive per poll interval.

Coupling vs Decoupling

Push creates tighter coupling: the producer must know about consumers and their connection state, and if a consumer changes its interface, the producer may need to adapt. Pull creates loose coupling: the producer exposes a stable API, and consumers integrate independently. For microservices and external integrations, pull's loose coupling reduces coordination overhead. Message brokers such as Kafka and RabbitMQ restore that decoupling for push-style producers by interposing a durable buffer: producers push into the broker without knowing who consumes, and consumers either pull (Kafka) or receive pushes throttled by a prefetch limit (RabbitMQ).
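
The scalability comparison above can be written out as two cost functions. The function names are hypothetical, and the model assumes that every consumer receives every event. Under that assumption, push is cheaper exactly when events_per_sec × poll_interval_s < 1.

```python
def push_deliveries_per_sec(events_per_sec: float, consumers: int) -> float:
    # Push: each event fans out to every consumer.
    return events_per_sec * consumers


def pull_requests_per_sec(consumers: int, poll_interval_s: float) -> float:
    # Pull: cost depends on poll frequency, not on how many events occur.
    return consumers / poll_interval_s


def push_is_cheaper(events_per_sec: float, poll_interval_s: float) -> bool:
    # events * C < C / T  <=>  events * T < 1 (fewer than one event per interval)
    return events_per_sec * poll_interval_s < 1


pull = pull_requests_per_sec(1_000_000, 10)      # 100,000 polls/sec, as in the text
rare = push_deliveries_per_sec(0.01, 1_000_000)  # one event per 100 s: about 10,000/sec
busy = push_deliveries_per_sec(1.0, 1_000_000)   # one event per second: 1,000,000/sec
```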
Case Study

Stripe Webhooks: Push with Pull Fallback

Scenario

Stripe needed to notify merchants about payment events (successful charges, refunds, disputes) in near-real-time. Merchants run diverse infrastructure: some have highly available webhook endpoints; others have unreliable hobby servers that go down regularly. A pure push (webhooks) approach would miss events when merchant servers are down. A pure pull (polling) approach would introduce unacceptable latency for time-sensitive events like payment confirmations.

Solution

Stripe implemented a push-primary, pull-secondary architecture. Primary: when a payment event occurs, Stripe pushes a webhook (an HTTP POST with a JSON payload) to the merchant's registered URL and retries failed deliveries with exponential backoff for up to 72 hours. Each event carries a unique event ID, so merchants can detect and discard duplicate deliveries. Secondary: Stripe provides an Events API where merchants can list recent events filtered by creation time (pull). Merchants are encouraged to periodically reconcile their state against the Events API to catch any missed webhooks. Webhook payloads are also signed so merchants can verify that they came from Stripe. Some integrations go further and re-fetch the referenced object from the API before acting on it, adding a pull step that doubles as verification and returns the latest state.
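
An exponential backoff schedule bounded by a 72-hour window might look like the sketch below. The base delay, growth factor, and per-attempt cap are assumptions for illustration, not Stripe's published schedule. Jitter is omitted to keep the output deterministic.

```python
from typing import List


def retry_schedule(base_s: float = 60.0, factor: float = 2.0,
                   max_delay_s: float = 6 * 3600.0,
                   window_s: float = 72 * 3600.0) -> List[float]:
    """Seconds after the first failed delivery at which each retry fires."""
    times: List[float] = []
    delay, elapsed = base_s, 0.0
    # Keep retrying while the next attempt still falls inside the window.
    while elapsed + delay <= window_s:
        elapsed += delay
        times.append(elapsed)
        delay = min(delay * factor, max_delay_s)  # grow, but cap each gap
    return times


schedule = retry_schedule()
# Early retries are minutes apart; later ones settle at the 6-hour cap.
```

Capping the per-attempt delay keeps a long outage from leaving a multi-day gap before the next attempt. Production senders usually also add random jitter so that many failed endpoints do not retry in lockstep.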

Outcome

The hybrid approach delivers events in near-real time through webhooks, while reconciliation against the Events API lets merchants recover any event a webhook missed, provided they reconcile within the API's event-retention window. Merchants with reliable infrastructure get near-real-time notifications. Merchants with unreliable infrastructure can rely on periodic polling. Because both paths carry the same event ID, merchants can process the push and the pull without double-processing. This pattern (push for speed, pull for reliability) is now common across payment and SaaS platform integrations.
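
The merchant side of this pattern can be sketched as one idempotent handler shared by the push and pull paths. `EventProcessor`, the event dict shape, and the in-memory `_seen` set are hypothetical. A production handler would record processed IDs in a durable store, in the same transaction as the business update.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Event = Dict[str, Any]  # hypothetical shape: {"id": ..., "data": {...}}


class EventProcessor:
    """One idempotent entry point for webhooks, retries, and reconciliation."""

    def __init__(self, apply: Callable[[Event], None]) -> None:
        self._apply = apply           # business logic, e.g. mark an order as paid
        self._seen: Set[str] = set()  # production: a durable table keyed by event ID

    def handle(self, event: Event) -> bool:
        """Apply an event at most once; return False for a duplicate."""
        if event["id"] in self._seen:
            return False
        self._apply(event)
        # Record the ID only after apply succeeds, so a failure mid-apply
        # leaves the event eligible for the next delivery or reconciliation.
        self._seen.add(event["id"])
        return True

    def reconcile(self, listed: Iterable[Event]) -> int:
        """Pull path: replay a listing of recent events; return how many were new."""
        return sum(1 for event in listed if self.handle(event))


paid: List[str] = []
proc = EventProcessor(lambda e: paid.append(e["data"]["order"]))
first = proc.handle({"id": "evt_1", "data": {"order": "A"}})  # webhook
dup = proc.handle({"id": "evt_1", "data": {"order": "A"}})    # retried webhook
# evt_2's webhook never arrived; the periodic pull recovers it.
recovered = proc.reconcile([
    {"id": "evt_1", "data": {"order": "A"}},
    {"id": "evt_2", "data": {"order": "B"}},
])
```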

Common Mistakes
  • Using short-interval polling when push is clearly better. Polling every 1 second for chat messages wastes most requests (the majority of polls return empty) and still adds up to 1 second of latency, worse than a WebSocket push. If the data changes unpredictably and consumers need real-time updates, push (or long polling) is the right choice.
  • Using push without a backpressure mechanism. Pushing events to consumers without rate limiting or flow control can overwhelm slow consumers, causing buffer overflow, memory exhaustion, or cascading failures. Always design push systems with bounded buffers, drop policies, or consumer-driven flow control.
  • Ignoring the at-least-once delivery semantics of webhooks. Network failures, timeouts, and retries mean webhook events may be delivered multiple times. Consumers must be idempotent -- processing the same event twice should have no additional effect. Failing to design for idempotency causes duplicate charges, duplicate notifications, or corrupt state.
  • Overlooking the hybrid approach. Pure push and pure pull both have significant weaknesses. Most production systems benefit from a hybrid: push for real-time delivery, pull for reliability and catch-up. Kafka, Stripe, and GitHub all use hybrid architectures. Default to hybrid unless the use case is clearly pure push (real-time gaming) or pure pull (batch ETL).
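
Long polling, suggested above as the alternative to short-interval polling, can be sketched with a `threading.Condition`. A poll request is parked until a message past the client's cursor exists or the timeout expires. `LongPollChannel` is a hypothetical in-process stand-in for an HTTP endpoint. In a real server, each parked request would hold an open connection.

```python
import threading
from typing import List


class LongPollChannel:
    """In-process stand-in for a long-polling HTTP endpoint (hypothetical)."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._cond = threading.Condition()

    def publish(self, message: str) -> None:
        with self._cond:
            self._messages.append(message)
            self._cond.notify_all()  # wake every parked poll request

    def poll(self, cursor: int, timeout: float) -> List[str]:
        """Return messages after `cursor`, waiting up to `timeout` seconds for one."""
        with self._cond:
            # Park until new data exists or the timeout expires; no busy polling.
            self._cond.wait_for(lambda: len(self._messages) > cursor, timeout=timeout)
            return self._messages[cursor:]


channel = LongPollChannel()
results: List[List[str]] = []
waiter = threading.Thread(target=lambda: results.append(channel.poll(cursor=0, timeout=5.0)))
waiter.start()                     # the request is held open, not answered empty
channel.publish("new message")     # wakes the parked request
waiter.join()
empty = channel.poll(cursor=1, timeout=0.1)  # nothing new: returns [] after 0.1 s
```

The client gets the message as soon as it is published, as with push. It still initiates every request and advances its own cursor, as with pull.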
Test Your Understanding

1. A monitoring system collects metrics from 10,000 servers. Each server generates a metric every second. Should the monitoring system push or pull metrics?

2. What is the main advantage of Kafka's pull-based consumer model over a traditional push-based message queue?
