
WebSockets

WebSockets provide full-duplex, bidirectional communication over a single TCP connection, enabling real-time data exchange between clients and servers without the overhead of repeated HTTP request-response cycles.

Overview

WebSockets (RFC 6455) provide a standardized protocol for full-duplex, bidirectional communication between a client and server over a single, long-lived TCP connection. Unlike the traditional HTTP request-response model where the client must initiate every exchange, a WebSocket connection allows either side to send messages at any time once the connection is established. This eliminates the overhead of repeated HTTP headers, connection establishment, and the artificial constraint that the server can only respond to client-initiated requests. WebSockets are the backbone of real-time web applications including chat systems, live financial data feeds, collaborative editing tools, and multiplayer games.

The WebSocket protocol begins with an HTTP upgrade handshake. The client sends a standard HTTP request with an Upgrade: websocket header, and the server responds with a 101 Switching Protocols status code. After the handshake, the connection transitions from HTTP to the WebSocket protocol, and both sides can send frames -- text, binary, ping, pong, or close -- at any time. WebSocket frames have a minimal 2-14 byte header (compared to hundreds of bytes for HTTP headers), making them extremely efficient for high-frequency, small-message workloads. A WebSocket ping/pong mechanism serves as a heartbeat: either side can send a ping frame, and the other must respond with a pong, enabling detection of dead connections due to network failures or client crashes.
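The handshake can be sketched in a few lines. In addition to the Upgrade header described above, RFC 6455 has the client send a random `Sec-WebSocket-Key`, and the server proves it understood the upgrade by returning a `Sec-WebSocket-Accept` value derived from that key. The function names below are illustrative only; a real server would also validate the request headers before responding.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key during the handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(sec_websocket_key: str) -> str:
    """Build the server's 101 response (illustrative helper, header checks omitted)."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

With the sample key from RFC 6455, `compute_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.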

Scaling WebSockets presents unique challenges compared to stateless HTTP. Each WebSocket connection is a persistent, stateful TCP session that consumes memory (for send/receive buffers and application state), a file descriptor, and a slot in the server's connection table. A server handling 100,000 concurrent WebSocket connections might use 1-2 GB of memory for connection state alone. Because a message for a given client must reach the backend that holds that client's connection, the routing layer needs one of two things: sticky sessions (routing all traffic from a client to the same backend) or a pub/sub backbone (like Redis Pub/Sub or Kafka) that broadcasts messages to all backends so any server can deliver to any connected client. Horizontal scaling therefore requires a mechanism to fan out messages across servers, typically through a centralized message bus or gossip protocol.
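The fan-out idea can be illustrated with a small in-memory sketch. `MessageBus` stands in for a pub/sub backbone such as Redis Pub/Sub, and `Gateway` stands in for a WebSocket server; both classes are hypothetical, and a plain list is used in place of a real socket's outbound queue.

```python
class MessageBus:
    """In-memory stand-in for a pub/sub backbone (hypothetical, not a real client)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, client_id, message):
        # Broadcast to every gateway; each one decides whether it holds the client.
        for callback in self._subscribers:
            callback(client_id, message)


class Gateway:
    """Stand-in for one WebSocket server holding a subset of client connections."""

    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.connections = {}  # client_id -> list used as a fake outbound queue
        bus.subscribe(self._on_bus_message)

    def connect(self, client_id):
        self.connections[client_id] = []

    def send_to(self, client_id, message):
        # The sender's gateway does not need to know where the recipient lives.
        self.bus.publish(client_id, message)

    def _on_bus_message(self, client_id, message):
        outbox = self.connections.get(client_id)
        if outbox is not None:  # only the gateway holding the connection delivers
            outbox.append(message)


bus = MessageBus()
gateway_a, gateway_b = Gateway("a", bus), Gateway("b", bus)
gateway_a.connect("alice")
gateway_b.connect("bob")
gateway_a.send_to("bob", "hi bob")  # originates on gateway a, delivered by gateway b
```

Broadcasting every message to every gateway is the simplest design; a production system would more likely subscribe each gateway only to the channels its own clients care about.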

Alternatives to WebSockets exist for specific use cases. Server-Sent Events (SSE) provide a simpler, HTTP-based protocol for server-to-client push (but not client-to-server). SSE works over standard HTTP, is automatically reconnected by the browser, and passes through HTTP proxies and CDNs without special configuration. Long polling (holding HTTP requests open until data is available) serves as a fallback when WebSocket connections are blocked by corporate firewalls or proxies. For new applications, the choice between WebSockets and alternatives depends on whether full-duplex communication is genuinely needed or whether server-push alone (SSE) suffices.
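To show how lightweight SSE is on the wire, here is a minimal sketch of formatting one event for a `text/event-stream` response. The helper name `format_sse` is an assumption for illustration; the `id:`, `event:`, and `data:` field names and the blank-line terminator come from the SSE format itself.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event as text for a text/event-stream body."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets the browser resume after reconnecting
    if event is not None:
        lines.append(f"event: {event}")  # named event type; omitted means "message"
    for part in data.split("\n"):
        lines.append(f"data: {part}")  # multi-line payloads need one data line each
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event
```

A server writes the returned string to an open HTTP response and keeps the response open for further events.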

Key Points
  1. The WebSocket handshake is an HTTP Upgrade request (101 Switching Protocols). After the handshake, the TCP connection switches to the WebSocket frame protocol with minimal 2-14 byte headers instead of full HTTP headers per message.
  2. RFC 6455 defines six frame types: text (UTF-8), binary (arbitrary bytes), continuation (later fragments of a split message), ping (heartbeat request), pong (heartbeat response), and close (connection teardown). Ping/pong is essential for detecting dead connections caused by network failures or NAT timeout.
  3. Each WebSocket connection holds state: TCP socket, send/receive buffers, and application-level state. A server with 1 million connections needs careful memory management -- each connection typically uses 1-10 KB of memory depending on buffer configuration, and application-level state adds to that.
  4. Load balancing WebSocket connections requires either sticky sessions (hash client IP or connection ID to a specific backend) or a pub/sub backbone (Redis Pub/Sub, Kafka) to fan messages across all backends. Round-robin load balancing on its own breaks message routing, because the server that receives a message is often not the one holding the recipient's connection.
  5. Reconnection should use exponential backoff with jitter to prevent a thundering herd: if a server restarts, all clients reconnecting simultaneously can overwhelm the new server. Random jitter spreads reconnection attempts over time.
  6. Server-Sent Events (SSE) is a simpler alternative when only server-to-client push is needed. SSE works over standard HTTP, supports automatic reconnection, and passes through CDNs and proxies without configuration changes.
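The reconnection advice above can be sketched as a delay calculator. This uses the "full jitter" variant (a uniform random delay between zero and the exponential ceiling), which is one common choice rather than the only one; the function name and the `base` and `cap` defaults are assumptions for illustration.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how many seconds to wait before reconnect attempt number `attempt` (0-based)."""
    source = rng if rng is not None else random
    ceiling = min(cap, base * (2 ** attempt))  # exponential growth, bounded by cap
    return source.uniform(0.0, ceiling)  # full jitter spreads clients across the window
```

A client would sleep for `reconnect_delay(attempt)` seconds before each retry and reset `attempt` to zero once a connection succeeds.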
Simple Example

The Walkie-Talkie vs Mail Carrier Analogy

HTTP is like communicating by mail: the client writes a letter (request), sends it, and waits for a reply (response). To check for new messages, the client must keep sending letters asking 'anything new?' WebSockets are like walkie-talkies: once both sides tune to the same channel (handshake), either person can talk at any time without waiting for the other. The channel stays open, there is no envelope overhead per message, and both sides immediately hear what the other says. The trade-off is that each active walkie-talkie channel uses a radio frequency (server resources), so you need enough frequencies for everyone talking simultaneously.

Real-World Examples

Slack

Slack uses WebSockets for real-time message delivery, typing indicators, presence updates, and channel notifications. Each connected client maintains a persistent WebSocket connection to Slack's edge servers. Slack's backend uses a pub/sub system to fan out messages from the sender's server to the WebSocket servers of all channel members. When WebSocket connections fail (corporate firewalls, proxy issues), Slack falls back to long polling automatically.

Binance

Binance, the cryptocurrency exchange, uses WebSockets to deliver real-time market data (order book updates, trade executions, price tickers) to over 1 million concurrent connections. Each market data update is a small JSON frame (typically 100-500 bytes) sent at high frequency (10-100 updates per second per symbol). Binance uses connection multiplexing: a single WebSocket connection can subscribe to multiple data streams, reducing per-connection overhead.

Figma

Figma uses WebSockets for real-time collaborative design editing, where multiple users edit the same design file simultaneously. Operations are sent as small binary WebSocket frames using an operation-based synchronization protocol (similar to OT or CRDTs). Figma's servers maintain document state and broadcast operations to all connected editors. The WebSocket connection is also used for cursor tracking and presence indicators showing who is viewing which part of the design.

Trade-Offs
| Aspect | Description |
| --- | --- |
| Real-Time Capability vs Server Resource Cost | WebSockets provide true real-time bidirectional communication but consume server resources proportional to the number of connected clients, not the number of active requests. A server with 500,000 idle WebSocket connections still uses memory and file descriptors for each, unlike HTTP where idle clients consume no server resources. |
| Protocol Efficiency vs Infrastructure Complexity | WebSocket frames have minimal overhead (2-14 bytes per frame vs hundreds for HTTP headers), but the infrastructure to support them is more complex: sticky sessions or pub/sub for load balancing, heartbeat monitoring for dead connection detection, and reconnection logic with backoff in clients. |
| Bidirectional Communication vs Proxy/Firewall Compatibility | WebSockets require HTTP Upgrade support from all intermediaries (proxies, load balancers, CDNs, firewalls). Some corporate environments block WebSocket connections or terminate them at the proxy. SSE and long polling pass through standard HTTP infrastructure without special configuration. |
| Connection Persistence vs Horizontal Scaling | WebSocket connections are stateful and tied to a specific server. Adding or removing backend servers requires connection migration or re-establishment. This makes rolling deployments and autoscaling more complex than with stateless HTTP, where requests can be routed to any server. |
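The 2-14 byte header range follows from the frame layout in RFC 6455: 2 fixed bytes, an optional 2- or 8-byte extended payload length, and a 4-byte masking key on client-to-server frames. A minimal encoder sketch (the function name is illustrative) makes the sizes concrete:

```python
import struct
from typing import Optional


def encode_frame_header(
    opcode: int,
    payload_len: int,
    mask_key: Optional[bytes] = None,
    fin: bool = True,
) -> bytes:
    """Encode only the header of a WebSocket frame (payload and masking not applied)."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)  # FIN bit plus 4-bit opcode
    mask_bit = 0x80 if mask_key is not None else 0x00
    if payload_len < 126:
        header = struct.pack("!BB", first, mask_bit | payload_len)
    elif payload_len < 65536:
        header = struct.pack("!BBH", first, mask_bit | 126, payload_len)  # 16-bit length
    else:
        header = struct.pack("!BBQ", first, mask_bit | 127, payload_len)  # 64-bit length
    if mask_key is not None:
        if len(mask_key) != 4:
            raise ValueError("mask key must be exactly 4 bytes")
        header += mask_key
    return header
```

An unmasked text frame (opcode `0x1`) carrying 5 bytes needs a 2-byte header, while a masked frame carrying 70,000 bytes needs the full 14.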
Case Study

Slack's WebSocket Architecture for Real-Time Messaging

Scenario

Slack needed to deliver messages in real-time to millions of concurrent users across thousands of channels. Each message sent in a channel must appear on every channel member's screen within milliseconds. Traditional HTTP polling would require each client to poll every few seconds, generating billions of empty requests per day and still introducing noticeable latency. The system also needed to handle typing indicators, presence updates, and read receipts -- all high-frequency, low-payload updates.
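The "billions of empty requests" figure is easy to sanity-check with back-of-the-envelope arithmetic. The client count and poll interval below are illustrative assumptions, not Slack's actual numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(clients: int, poll_interval_s: float) -> int:
    """Total HTTP polls per day if every client polls at a fixed interval."""
    return int(clients * SECONDS_PER_DAY / poll_interval_s)


# Assumed workload: 1 million connected clients, each polling every 5 seconds.
daily_polls = polls_per_day(1_000_000, 5)  # 17,280,000,000 requests per day
```

Under these assumptions the servers would field about 17 billion polls a day, most of them returning nothing new.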

Solution

Slack established a persistent WebSocket connection for each connected client. When a user sends a message, it is written to the database and published to a distributed pub/sub system. Each WebSocket gateway server subscribes to channels relevant to its connected clients and pushes messages as WebSocket frames. For users behind corporate firewalls that block WebSocket upgrades, Slack falls back to long polling with a 30-second timeout. Heartbeat pings every 30 seconds detect dead connections and trigger cleanup. Exponential backoff with jitter prevents reconnection storms after server restarts.

Outcome

Slack achieved sub-100ms message delivery latency for 99% of messages. The WebSocket architecture reduced server load by over 90% compared to the polling approach it replaced, because only actual messages generate traffic instead of millions of empty poll responses. The long-polling fallback ensures universal connectivity even in restrictive network environments, and heartbeat-based cleanup keeps connection tables lean.

Common Mistakes
  • Not implementing heartbeat ping/pong to detect dead connections. Without heartbeats, connections severed by network failures or NAT timeouts remain in the server's connection table indefinitely, consuming memory and file descriptors until the server runs out of resources.
  • Using round-robin load balancing for WebSocket connections. WebSocket connections are stateful -- a message intended for a client must reach the specific server that holds that client's connection. Either use sticky sessions or implement a pub/sub backbone to fan messages across all servers.
  • Reconnecting immediately after a connection drops without exponential backoff. If a server restarts, all clients reconnecting simultaneously create a thundering herd that can overwhelm the new server. Always use exponential backoff with random jitter for reconnection.
  • Choosing WebSockets when Server-Sent Events would suffice. If the application only needs server-to-client push (notifications, live feeds, dashboards), SSE is simpler, works over standard HTTP, supports automatic reconnection, and passes through CDNs and proxies without WebSocket-specific configuration.
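The first mistake above is avoidable with a small amount of bookkeeping. The sketch below tracks when each connection last answered a ping and reports the ones that have gone quiet. `HeartbeatTracker`, its defaults, and the explicit `now` arguments are assumptions for illustration; a real server would use a monotonic clock and close the underlying sockets for reaped connections.

```python
class HeartbeatTracker:
    """Track last-pong times and find connections that stopped responding."""

    def __init__(self, interval: float = 30.0, missed_allowed: int = 2):
        # A connection is considered dead after this many seconds without a pong.
        self.timeout = interval * missed_allowed
        self.last_pong = {}  # conn_id -> timestamp of the most recent pong

    def register(self, conn_id, now: float) -> None:
        self.last_pong[conn_id] = now

    def on_pong(self, conn_id, now: float) -> None:
        if conn_id in self.last_pong:
            self.last_pong[conn_id] = now

    def reap(self, now: float) -> list:
        """Remove and return the IDs of connections that exceeded the timeout."""
        dead = [cid for cid, seen in self.last_pong.items() if now - seen > self.timeout]
        for cid in dead:
            del self.last_pong[cid]  # frees the slot in the connection table
        return dead
```

Calling `reap` on the same schedule as the ping loop keeps the connection table bounded by the number of clients that are actually alive.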
Test Your Understanding

1. How does a WebSocket connection begin?

2. Why is round-robin load balancing problematic for WebSocket connections?

3. What is the purpose of WebSocket ping/pong frames?
