Vetora

Long Polling vs Short Polling

Short polling sends periodic HTTP requests at fixed intervals, while long polling holds requests open until data is available or a timeout occurs. Both approximate server push using client-initiated requests, and both work over standard HTTP without WebSocket support.

Overview

Polling is the simplest mechanism for a client to receive updates from a server. In a standard HTTP request-response model, the server cannot initiate communication -- the client must ask for new data. Short polling and long polling are two strategies for the client to discover new data as quickly as possible while working within the constraints of HTTP. Understanding both patterns, their trade-offs, and their place relative to WebSockets and Server-Sent Events is important for system design because many real-world systems use polling either as a primary mechanism or as a fallback.

Short polling is the most straightforward approach: the client sends an HTTP request to the server at a fixed interval (e.g., every 5 seconds), and the server responds immediately with the current state or an empty response if nothing has changed. The client processes the response and waits for the interval to elapse before polling again. This is trivial to implement -- it is just a timer and a standard HTTP request -- but wasteful. In a typical application, the vast majority of poll responses contain no new data. If 1 million clients poll every 5 seconds, the server handles 200,000 requests per second, most returning empty responses. The polling interval also introduces a latency floor: on average, new data is delayed by half the polling interval, and in the worst case by the full interval.
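The arithmetic in the paragraph above can be checked in a few lines. This is a back-of-the-envelope sketch, assuming polls are spread evenly over time and that new data appears at a uniformly random moment within an interval; the function names are illustrative and not taken from any library.

```python
from typing import Tuple


def short_poll_request_rate(clients: int, interval_s: float) -> float:
    """Requests per second when every client polls once per interval."""
    return clients / interval_s


def short_poll_delay_s(interval_s: float) -> Tuple[float, float]:
    """(average, worst-case) delay added by waiting for the next poll."""
    return interval_s / 2, interval_s


# The example from the text: 1 million clients, one poll every 5 seconds.
rate = short_poll_request_rate(1_000_000, 5)
average, worst = short_poll_delay_s(5)
print(f"{rate:,.0f} requests/s, average delay {average}s, worst case {worst}s")
```

Halving the interval halves both delay figures but doubles the request rate, which is the core tension of short polling.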

Long polling (one of the techniques grouped under the name Comet) improves on short polling by eliminating most empty responses. The client sends an HTTP request, and instead of responding immediately, the server holds the request open until either: (1) new data is available, in which case the server responds with the data, or (2) a timeout expires (typically 30-60 seconds), in which case the server responds with an empty body. Upon receiving a response (either with data or after a timeout), the client immediately sends a new long-poll request. This approach delivers data with very low latency (as fast as the server can detect and respond, plus one reconnect for events that arrive between requests) and generates far fewer empty responses than short polling. However, long polling ties up a server connection (and often a thread) for each waiting client, creating scalability challenges for high-concurrency scenarios.
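The "hold until data or timeout" step can be sketched with Python's asyncio. This is a minimal in-process illustration, not a real HTTP server: the per-client `inbox` queue, the `hold_long_poll` helper, and the short timeouts in the demo are all assumptions made for the example.

```python
import asyncio
from typing import List, Optional


async def hold_long_poll(inbox: asyncio.Queue, timeout_s: float) -> Optional[str]:
    """Wait for the next message; return None if the timeout expires first."""
    try:
        return await asyncio.wait_for(inbox.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Controlled empty response: the client is expected to poll again.
        return None


async def demo() -> List[Optional[str]]:
    inbox: asyncio.Queue = asyncio.Queue()
    results: List[Optional[str]] = []

    # First poll: nothing arrives, so the hold ends with an empty result.
    results.append(await hold_long_poll(inbox, timeout_s=0.05))

    # Second poll: data is published while the request is being held.
    async def publish_later() -> None:
        await asyncio.sleep(0.01)
        await inbox.put("new message")

    publisher = asyncio.create_task(publish_later())
    results.append(await hold_long_poll(inbox, timeout_s=1.0))
    await publisher
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the waiting request is a suspended coroutine rather than a blocked thread, the same pattern is what lets event-driven servers hold many idle polls cheaply.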

The choice between polling, long polling, WebSockets, and SSE depends on update frequency, infrastructure constraints, and scalability requirements. Short polling is appropriate for low-frequency updates (every 30-60 seconds) where simplicity is paramount -- dashboards, status pages, and periodic data refreshes. Long polling is appropriate for near-real-time updates over standard HTTP infrastructure, especially as a fallback when WebSockets are blocked by corporate firewalls. WebSockets are superior for high-frequency bidirectional communication (chat, gaming, live trading). SSE is the best choice for server-to-client-only push over standard HTTP. Many production systems implement a hierarchy: WebSocket as primary, long polling as fallback, and short polling as a last resort.

Key Points

1. Short polling sends requests at fixed intervals regardless of whether data has changed. It is simple to implement but generates high server load from empty responses and introduces latency proportional to the polling interval.
2. Long polling holds the HTTP request open until data is available or a timeout occurs. This limits empty responses to at most one per timeout period and provides near-real-time data delivery, at the cost of tying up server connections and threads.
3. Long polling timeout must be shorter than load balancer and proxy timeouts. If the load balancer times out the connection before the server responds, the client sees an error instead of a controlled empty response. Typical values: server timeout 30s, load balancer timeout 60s.
4. Each long-poll connection occupies a server thread (in thread-per-connection models) or a connection slot (in async models). A server holding 50,000 long-poll connections needs either an event-driven architecture (Node.js, Netty, Go) or a very large thread pool.
5. The thundering herd problem applies to long polling: when data arrives for many waiting clients simultaneously, the server must process all responses and accept all new long-poll requests at once, creating a spike in CPU and network usage.
6. Short polling is self-healing: if the server crashes, the next poll request will fail and the client can retry. Long polling connections in flight during a crash are lost, and clients must detect the failure and reconnect.
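The timeout ordering in point 3 is easy to get wrong across separate config files, so it can help to assert it in one place. The sketch below is a hypothetical startup check: the function name, the hop names, and the 5-second safety margin are assumptions, not settings from any particular load balancer.

```python
from typing import Dict, List


def check_long_poll_timeouts(
    server_timeout_s: float,
    intermediary_timeouts_s: Dict[str, float],
    margin_s: float = 5.0,
) -> List[str]:
    """Return a list of problems; an empty list means the ordering is safe."""
    problems = []
    for hop, idle_timeout_s in intermediary_timeouts_s.items():
        # The server must answer (even with an empty body) before any hop
        # in front of it gives up on the idle connection.
        if server_timeout_s + margin_s > idle_timeout_s:
            problems.append(
                f"{hop} times out at {idle_timeout_s}s, too close to the "
                f"server's {server_timeout_s}s long-poll timeout"
            )
    return problems


# Values from the text: server 30s, load balancer 60s.
print(check_long_poll_timeouts(30, {"load_balancer": 60}))
# The misconfiguration to avoid: load balancer 30s, server 60s.
print(check_long_poll_timeouts(60, {"load_balancer": 30}))
```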

Simple Example

The Receptionist Desk Analogy

Imagine waiting for a package at a receptionist desk. Short polling is like walking up to the desk every 5 minutes and asking 'has my package arrived?' Most trips, the answer is 'not yet,' and you have wasted the trip. Long polling is like walking up and saying 'I will wait here until my package arrives or 30 minutes pass.' The receptionist holds you at the desk and hands you the package the moment it arrives, or sends you away after 30 minutes to come back and wait again. Long polling gets you the package faster with fewer wasted trips, but you are occupying a spot at the desk the entire time.

Real-World Examples

Facebook

Facebook's original chat system (2008) used long polling with the Comet pattern. Each active user held an open HTTP connection to Facebook's chat servers. When a message arrived, the server responded immediately, and the client reconnected. At peak, Facebook's Comet servers handled millions of concurrent long-poll connections. Facebook eventually migrated to MQTT (a lightweight pub/sub protocol) and later to WebSockets for better efficiency at their scale.

Slack

Slack uses long polling as a fallback mechanism when WebSocket connections cannot be established due to corporate firewalls, HTTP proxies, or restrictive network configurations. The long-polling fallback maintains the same API contract as the WebSocket channel: the client receives the same event payloads, just delivered via HTTP responses instead of WebSocket frames. This ensures the user experience degrades gracefully rather than failing entirely.

Atlassian JIRA

JIRA uses short polling for dashboard and board refresh functionality. When viewing a JIRA board, the client polls the server every 30-60 seconds for updated issue statuses, new comments, and workflow transitions. Short polling is appropriate here because updates are infrequent, real-time delivery is not critical for project management data, and the simplicity of periodic polling reduces client-side complexity.

Trade-Offs

Aspect	Description
Latency vs Server Load	Short polling has latency proportional to the interval (average: half the interval). Decreasing the interval improves latency but linearly increases server load. Long polling provides near-zero latency but ties up a server connection for each waiting client. The trade-off is between predictable periodic load (short) and event-driven load with persistent connections (long).
Implementation Simplicity vs Efficiency	Short polling is trivially implemented: setInterval + fetch. Long polling requires careful server-side connection management, timeout handling, and client reconnection logic. However, short polling can waste 80-95% of requests on empty responses when updates are infrequent, while long polling only generates responses when data exists or timeouts expire.
Infrastructure Compatibility vs Real-Time Performance	Both polling methods use ordinary HTTP requests and pass through most proxies, CDNs, and firewalls without special configuration, although long polling still depends on intermediaries tolerating idle requests for the length of its timeout. WebSockets require every intermediary to support the HTTP Upgrade handshake, and SSE requires long-lived streaming responses that are not buffered or cut off. When infrastructure is restrictive, polling may be the only option, but it cannot match WebSocket latency or efficiency.
Scalability Model Difference	Short polling scales with request rate: each poll is a stateless HTTP request handled and forgotten. Long polling scales with concurrent connections: each client occupies a persistent connection. A server with an event-driven architecture (Node.js, Go) handles long polling efficiently, but thread-per-connection servers (traditional Java servlets) struggle with high concurrency.
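To put rough numbers on the trade-offs in the table, the following sketch models per-client load under a simplifying assumption that is not in the original text: updates for each client arrive as a Poisson process, responses are instantaneous, and clients reconnect immediately. The function names and the example rate of one update every 50 seconds are illustrative.

```python
import math


def empty_poll_fraction(events_per_s: float, interval_s: float) -> float:
    """Share of short polls that find nothing new (no event in the interval)."""
    return math.exp(-events_per_s * interval_s)


def long_poll_requests_per_s(events_per_s: float, timeout_s: float) -> float:
    """Per-client request rate when each request ends at the next event or the timeout."""
    if events_per_s == 0:
        return 1 / timeout_s  # Only timeouts end requests.
    # Expected hold time is E[min(Exp(rate), timeout)] = (1 - e^(-rate*timeout)) / rate.
    expected_hold_s = (1 - math.exp(-events_per_s * timeout_s)) / events_per_s
    return 1 / expected_hold_s


rate = 0.02  # Assumed: one update every 50 seconds per client.
print(f"short poll (5s): {1 / 5:.3f} req/s, {empty_poll_fraction(rate, 5):.1%} empty")
print(f"long poll (30s timeout): {long_poll_requests_per_s(rate, 30):.3f} req/s")
```

Under these assumptions the long-poll request rate falls toward the event rate as the timeout grows, while the short-poll rate depends only on the interval.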

Case Study

Facebook Chat's Comet Architecture (2008)

Scenario

Facebook launched its in-app chat feature in 2008 and needed near-real-time message delivery to tens of millions of users. WebSockets did not exist yet (RFC 6455 was published in 2011), and short polling at the frequency needed for chat (every 1-2 seconds) would have generated billions of requests per hour, mostly empty. Facebook needed a solution that provided real-time feel over standard HTTP infrastructure.

Solution

Facebook implemented long polling using the Comet pattern. Each active chat user maintained a persistent HTTP connection to a pool of Comet servers. When User A sent a message to User B, the message was written to a channel server and the Comet server holding User B's connection was notified via an internal message bus. The Comet server immediately responded to User B's pending long-poll request with the new message, and User B's client immediately reconnected with a new long-poll request. Connections timed out after 60 seconds if no messages arrived, and the client immediately reconnected.

Outcome

Facebook Chat delivered messages with sub-second latency to hundreds of millions of users using only HTTP infrastructure. The Comet approach reduced server load by approximately 90% compared to the short-polling approach they prototyped. However, holding millions of concurrent HTTP connections created significant memory pressure on the Comet servers. Facebook eventually migrated to MQTT and then WebSockets as these technologies matured, but the long-polling architecture served as the foundation for one of the world's largest real-time messaging systems for several years.

Common Mistakes

⚠ Setting the short-polling interval too aggressively (e.g., every 500ms) for non-real-time data. Polling a dashboard every half-second generates enormous server load with negligible user benefit. Match the interval to how quickly users actually need updates.
⚠ Forgetting to set the long-polling server timeout shorter than the load balancer timeout. If the load balancer closes the connection at 30s but the server holds it for 60s, clients receive unexpected connection errors instead of controlled empty responses.
⚠ Using thread-per-connection servers for long polling at scale. Traditional Java servlet containers create a thread for each HTTP request. Holding 50,000 long-poll connections means 50,000 threads, which is unsustainable. Use async I/O (Netty, Node.js, Go goroutines) for long polling.
⚠ Not implementing reconnection backoff for long polling. If the server restarts, all clients receive errors simultaneously and reconnect immediately, creating a thundering herd. Exponential backoff with jitter spreads reconnection attempts over several seconds.
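The last item mentions exponential backoff with jitter; one common variant ("full jitter") can be sketched as follows. The base delay, the cap, and the function name are assumptions chosen for illustration and should be tuned to the system in question.

```python
import random


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0, rng=random) -> float:
    """Full-jitter delay: uniform between 0 and min(cap, base * 2**attempt)."""
    ceiling_s = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling_s)


# Delays a client might wait before its 1st..5th reconnect attempt.
# The values differ per client and per run, which is the point of jitter.
delays = [backoff_delay(attempt) for attempt in range(5)]
print([round(d, 2) for d in delays])
```

A client would reset `attempt` to zero after a successful long-poll response, so the backoff only applies while the server is actually failing.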

Related Concepts

WebSockets, HTTP/1.1 vs HTTP/2 vs HTTP/3, Rate Limiting, Stateless Service Design, Chat System Design

Test Your Understanding

1. What is the primary advantage of long polling over short polling?

2. Why must the long-polling server timeout be shorter than the load balancer timeout?

3. When is short polling a better choice than WebSockets?

Deeper Reading