WebSockets provide full-duplex, bidirectional communication over a single TCP connection, enabling real-time data exchange between clients and servers without the overhead of repeated HTTP request-response cycles.
WebSockets (RFC 6455) provide a standardized protocol for full-duplex, bidirectional communication between a client and server over a single, long-lived TCP connection. Unlike the traditional HTTP request-response model where the client must initiate every exchange, a WebSocket connection allows either side to send messages at any time once the connection is established. This eliminates the overhead of repeated HTTP headers, connection establishment, and the artificial constraint that the server can only respond to client-initiated requests. WebSockets are the backbone of real-time web applications including chat systems, live financial data feeds, collaborative editing tools, and multiplayer games.
The WebSocket protocol begins with an HTTP upgrade handshake. The client sends an HTTP GET request carrying Upgrade: websocket and Connection: Upgrade headers along with a random Sec-WebSocket-Key, and the server responds with a 101 Switching Protocols status code and a Sec-WebSocket-Accept header derived from that key. After the handshake, the connection transitions from HTTP to the WebSocket protocol, and both sides can send frames at any time: text or binary data frames, and ping, pong, or close control frames. WebSocket frames have a minimal 2-14 byte header (compared to hundreds of bytes for HTTP headers), making them efficient for high-frequency, small-message workloads. The ping/pong mechanism serves as a heartbeat: either side can send a ping frame, and the other must respond with a pong, so a missing pong reveals a connection that has died because of a network failure or client crash.
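To make the handshake and the frame header sizes concrete, here is a minimal sketch in Python. It derives the Sec-WebSocket-Accept value the way RFC 6455 specifies (SHA-1 of the client key concatenated with a fixed GUID, then base64) and computes the header size of a frame. The function names `accept_key` and `frame_header_size` are illustrative choices, not part of any library.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def frame_header_size(payload_len: int, masked: bool) -> int:
    """Return the frame header size in bytes for a given payload length."""
    size = 2  # byte 1: FIN + opcode, byte 2: mask bit + 7-bit length
    if payload_len > 0xFFFF:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, required on client-to-server frames
    return size


# Sample key and expected answer taken from the RFC 6455 handshake example.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
print(frame_header_size(10, masked=False))      # 2
print(frame_header_size(70_000, masked=True))   # 14
```

The two extremes match the 2-14 byte range quoted above: a small unmasked server frame needs 2 bytes, while a large masked client frame needs all 14.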
Scaling WebSockets presents unique challenges compared to stateless HTTP. Each WebSocket connection is a persistent, stateful TCP session that consumes memory (for send/receive buffers and application state), a file descriptor, and a slot in the server's connection table. A server handling 100,000 concurrent WebSocket connections might use 1-2 GB of memory for connection state alone (roughly 10-20 KB per connection). Load balancers must handle long-lived connections, and because each client stays pinned to the backend that accepted its connection, a message for that client has to reach that specific server. Two common approaches are sticky sessions (routing all traffic from a client, including reconnects, to the same backend) and a pub/sub backbone (like Redis Pub/Sub or Kafka) that broadcasts messages to all backends so that each server can deliver to its own connected clients. Horizontal scaling therefore requires a mechanism to fan out messages across servers, typically a centralized message bus or a gossip protocol.
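The fan-out idea can be sketched with an in-memory stand-in for the message bus. Everything here is hypothetical and for illustration only: `MessageBus` plays the role that Redis Pub/Sub or Kafka would play in production, and `Gateway` represents one WebSocket server that knows only about its own connected clients.

```python
from collections import defaultdict


class MessageBus:
    """Hypothetical in-memory stand-in for a pub/sub backbone."""

    def __init__(self):
        self._gateways = []

    def subscribe(self, gateway):
        self._gateways.append(gateway)

    def publish(self, channel, message):
        # Broadcast to every gateway; each decides what to deliver locally.
        for gateway in self._gateways:
            gateway.on_bus_message(channel, message)


class Gateway:
    """One WebSocket server holding a subset of the client connections."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.local = defaultdict(set)  # channel -> client ids connected here
        self.delivered = []            # (client_id, message) pairs "sent"
        bus.subscribe(self)

    def connect(self, client_id, channel):
        self.local[channel].add(client_id)

    def send_from_client(self, channel, message):
        # The sender's gateway does not deliver directly; it publishes.
        self.bus.publish(channel, message)

    def on_bus_message(self, channel, message):
        for client_id in sorted(self.local.get(channel, ())):
            self.delivered.append((client_id, message))


bus = MessageBus()
a, b = Gateway("a", bus), Gateway("b", bus)
a.connect("alice", "general")
b.connect("bob", "general")
a.send_from_client("general", "hi")
print(a.delivered, b.delivered)
```

Because the sender's gateway publishes instead of delivering directly, bob receives the message even though he is connected to a different server than alice.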
Alternatives to WebSockets exist for specific use cases. Server-Sent Events (SSE) provide a simpler, HTTP-based protocol for server-to-client push (but not client-to-server). SSE works over standard HTTP, is automatically reconnected by the browser, and passes through HTTP proxies and CDNs without special configuration. Long polling (holding HTTP requests open until data is available) serves as a fallback when WebSocket connections are blocked by corporate firewalls or proxies. For new applications, the choice between WebSockets and alternatives depends on whether full-duplex communication is genuinely needed or whether server-push alone (SSE) suffices.
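Part of what makes SSE simpler is its wire format: a `text/event-stream` response body made of plain-text fields, with each event terminated by a blank line. The helper below is a small sketch of that encoding; the name `format_sse` is an illustrative choice, not a library function.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Encode one Server-Sent Event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # lets the browser resume after reconnect
    if event is not None:
        lines.append(f"event: {event}")    # named event type
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")     # multi-line data uses repeated fields
    return "\n".join(lines) + "\n\n"       # blank line terminates the event


print(format_sse("price=42", event="tick", event_id="7"), end="")
```

A server streams these strings over an ordinary HTTP response, which is why SSE passes through proxies that understand nothing beyond HTTP.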
The Walkie-Talkie vs Mail Carrier Analogy
HTTP is like communicating by mail: the client writes a letter (request), sends it, and waits for a reply (response). To check for new messages, the client must keep sending letters asking 'anything new?' WebSockets are like walkie-talkies: once both sides tune to the same channel (handshake), either person can talk at any time without waiting for the other. The channel stays open, there is no envelope overhead per message, and both sides immediately hear what the other says. The trade-off is that each active walkie-talkie channel uses a radio frequency (server resources), so you need enough frequencies for everyone talking simultaneously.
Slack
Slack uses WebSockets for real-time message delivery, typing indicators, presence updates, and channel notifications. Each connected client maintains a persistent WebSocket connection to Slack's edge servers. Slack's backend uses a pub/sub system to fan out messages from the sender's server to the WebSocket servers of all channel members. When WebSocket connections fail (corporate firewalls, proxy issues), Slack falls back to long polling automatically.
Binance
Binance, the cryptocurrency exchange, uses WebSockets to deliver real-time market data (order book updates, trade executions, price tickers) to over 1 million concurrent connections. Each market data update is a small JSON frame (typically 100-500 bytes) sent at high frequency (10-100 updates per second per symbol). Binance uses connection multiplexing: a single WebSocket connection can subscribe to multiple data streams, reducing per-connection overhead.
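Multiplexing several streams over one connection implies that the client must route each incoming frame to the right consumer. The sketch below assumes a hypothetical JSON envelope with `stream` and `data` fields; that shape and the class name `StreamDemux` are assumptions made for illustration, not a description of any exchange's documented API.

```python
import json
from collections import defaultdict


class StreamDemux:
    """Route frames from one multiplexed connection to per-stream handlers."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, stream, handler):
        self._handlers[stream].append(handler)

    def dispatch(self, raw_frame: str) -> int:
        """Decode one text frame and call its handlers; return how many ran."""
        envelope = json.loads(raw_frame)  # assumed shape: {"stream": ..., "data": ...}
        handlers = self._handlers.get(envelope["stream"], [])
        for handler in handlers:
            handler(envelope["data"])
        return len(handlers)


trades, tickers = [], []
demux = StreamDemux()
demux.subscribe("btcusdt@trade", trades.append)
demux.subscribe("btcusdt@ticker", tickers.append)
demux.dispatch('{"stream": "btcusdt@trade", "data": {"price": "100.5"}}')
print(trades, tickers)
```

One socket then serves any number of subscriptions, so the per-connection cost (buffers, file descriptor, heartbeat) is paid once per client instead of once per stream.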
Figma
Figma uses WebSockets for real-time collaborative design editing, where multiple users edit the same design file simultaneously. Operations are sent as small binary WebSocket frames using an operation-based synchronization protocol (similar to operational transformation, OT, or conflict-free replicated data types, CRDTs). Figma's servers maintain document state and broadcast operations to all connected editors. The WebSocket connection is also used for cursor tracking and presence indicators showing who is viewing which part of the design.
| Aspect | Description |
|---|---|
| Real-Time Capability vs Server Resource Cost | WebSockets provide true real-time bidirectional communication but consume server resources proportional to the number of connected clients, not the number of active requests. A server with 500,000 idle WebSocket connections still uses memory and file descriptors for each, whereas with plain HTTP an idle client holds little or no server state between requests. |
| Protocol Efficiency vs Infrastructure Complexity | WebSocket frames have minimal overhead (2-14 bytes per frame vs hundreds for HTTP headers), but the infrastructure to support them is more complex: sticky sessions or pub/sub for load balancing, heartbeat monitoring for dead connection detection, and reconnection logic with backoff in clients. |
| Bidirectional Communication vs Proxy/Firewall Compatibility | WebSockets require HTTP Upgrade support from all intermediaries (proxies, load balancers, CDNs, firewalls). Some corporate environments block WebSocket connections or terminate them at the proxy. SSE and long polling pass through standard HTTP infrastructure without special configuration. |
| Connection Persistence vs Horizontal Scaling | WebSocket connections are stateful and tied to a specific server. Adding or removing backend servers requires connection migration or re-establishment. This makes rolling deployments and autoscaling more complex than with stateless HTTP, where requests can be routed to any server. |
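The heartbeat monitoring mentioned in the table can be sketched as a small bookkeeping class: record when each connection last answered a ping, and periodically reap connections that have been silent for too long. The class name `HeartbeatMonitor` and its methods are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```python
class HeartbeatMonitor:
    """Track the last pong per connection and find connections that went silent."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._last_pong = {}  # connection id -> time of last pong (seconds)

    def register(self, conn_id, now: float):
        self._last_pong[conn_id] = now

    def on_pong(self, conn_id, now: float):
        if conn_id in self._last_pong:
            self._last_pong[conn_id] = now

    def reap(self, now: float) -> list:
        """Remove and return connections whose last pong is older than the timeout."""
        dead = [c for c, t in self._last_pong.items() if now - t > self.timeout_s]
        for conn_id in dead:
            del self._last_pong[conn_id]
        return dead


monitor = HeartbeatMonitor(timeout_s=60.0)
monitor.register("c1", now=0.0)
monitor.register("c2", now=0.0)
monitor.on_pong("c1", now=50.0)   # c1 answered a ping, c2 never did
print(monitor.reap(now=70.0))     # ['c2']
```

In a real server, `reap` would run on a timer and the returned connections would be closed so that their buffers and file descriptors are released.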
Slack's WebSocket Architecture for Real-Time Messaging
Scenario
Slack needed to deliver messages in real-time to millions of concurrent users across thousands of channels. Each message sent in a channel must appear on every channel member's screen within milliseconds. Traditional HTTP polling would require each client to poll every few seconds, generating billions of empty requests per day and still introducing noticeable latency. The system also needed to handle typing indicators, presence updates, and read receipts -- all high-frequency, low-payload updates.
Solution
Slack established a persistent WebSocket connection for each connected client. When a user sends a message, it is written to the database and published to a distributed pub/sub system. Each WebSocket gateway server subscribes to channels relevant to its connected clients and pushes messages as WebSocket frames. For users behind corporate firewalls that block WebSocket upgrades, Slack falls back to long polling with a 30-second timeout. Heartbeat pings every 30 seconds detect dead connections and trigger cleanup. Exponential backoff with jitter prevents reconnection storms after server restarts.
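Exponential backoff with jitter can be sketched in a few lines. This version uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling. The function name and the default `base_s` and `cap_s` values are illustrative assumptions, not Slack's actual parameters.

```python
import random


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 30.0,
                  rng=random) -> float:
    """Return a reconnect delay in seconds for the given 0-based attempt."""
    ceiling = min(cap_s, base_s * (2 ** attempt))  # 1, 2, 4, ... capped at cap_s
    return rng.uniform(0.0, ceiling)               # jitter spreads clients out


# Delays for the first few reconnect attempts of one client.
for attempt in range(5):
    print(attempt, round(backoff_delay(attempt), 2))
```

Without the random component, every client disconnected by the same server restart would retry at the same instants, recreating the reconnection storm the backoff is meant to prevent.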
Outcome
Slack achieved sub-100ms message delivery latency for 99% of messages. The WebSocket architecture reduced server load by over 90% compared to the polling approach it replaced, because only actual messages generate traffic instead of millions of empty poll responses. The long-polling fallback ensures universal connectivity even in restrictive network environments, and heartbeat-based cleanup keeps connection tables lean.