1Why are server-assigned sequence numbers preferred over client timestamps for message ordering in a chat system?
A thorough interview walkthrough for designing a real-time chat system like WhatsApp or Slack. Covers WebSocket transport, message storage and ordering, delivery guarantees, presence, group chat, and offline message delivery.
Designing a real-time chat system is a classic system design interview question that tests your understanding of bidirectional communication protocols, delivery guarantees, distributed state management, and real-time infrastructure. The problem spans from low-level transport decisions to high-level architectural patterns, making it a rich playground for demonstrating breadth and depth.
The transport layer is the first key decision. HTTP is request-response and inherently unidirectional from the client's perspective -- the server cannot push messages to the client without the client polling. WebSocket provides a full-duplex, persistent connection between client and server, allowing the server to push new messages to the client the instant they arrive. Server-Sent Events (SSE) is a lighter-weight alternative that supports server-to-client push but not client-to-server messages, making it less suitable for chat. For a chat system, WebSocket is the standard choice. The client establishes a WebSocket connection on app launch and keeps it open for the session. The server maintains a mapping of user_id to WebSocket connection, enabling targeted message delivery.
Message storage requires careful schema design. Messages are stored in a table partitioned by conversation_id to ensure that all messages in a conversation are co-located for efficient retrieval. Each message has a unique message_id (UUID or Snowflake ID), the conversation_id, sender_id, content, timestamp, and delivery status. Message ordering within a conversation is critical and subtle. Client-generated timestamps are unreliable because device clocks drift. Instead, the server assigns a monotonically increasing sequence number per conversation when it receives a message. This guarantees a total ordering within each conversation regardless of clock skew. For multi-server deployments, a per-conversation sequence generator (backed by a database counter or a distributed sequence service) ensures consistent ordering.
Delivery guarantees are fundamental to a chat system. Users expect that messages they send will be delivered, and delivered exactly once. The system provides at-least-once delivery: when a server receives a message, it persists it, then pushes it to the recipient's WebSocket connection. The recipient's client sends an acknowledgment (ACK) back. If the server does not receive an ACK within a timeout, it retries delivery. This can cause duplicate deliveries if the ACK was lost in transit, so the client must deduplicate by message_id. The combination of server-side retry and client-side dedup achieves effectively-once delivery semantics.
Offline message delivery handles the case where the recipient is not connected when a message is sent. The system stores undelivered messages in a persistent queue (or simply marks them as undelivered in the messages table). When the recipient reconnects, the server pushes all undelivered messages in order. The presence system determines whether a user is online by tracking WebSocket connection state and periodic heartbeats. A user is considered online if the server has received a heartbeat within the last N seconds (typically 30-60 seconds). Presence information is published to subscribers (friends, conversation members) so that the UI can show online/offline status indicators.
The Walkie-Talkie vs Postal Mail Analogy
HTTP-based chat is like postal mail: you send a letter and then keep checking your mailbox for a reply (polling). WebSocket-based chat is like a walkie-talkie: once the channel is open, either party can speak at any time and the other hears instantly. The walkie-talkie channel stays open (persistent connection) so there is no delay in establishing communication each time. If the other person's walkie-talkie is off (they are offline), the message is stored at the base station (message queue) and delivered when they turn on their device and check in.
WhatsApp originally built its backend on Erlang, chosen for its ability to handle millions of concurrent lightweight processes (one per WebSocket connection). A single server handled up to 2 million simultaneous connections. Messages use end-to-end encryption via the Signal Protocol, meaning WhatsApp servers never see plaintext message content. Message delivery uses a store-and-forward model: messages are persisted until the recipient acknowledges receipt, then deleted from the server.
Discord
Discord handles over 4 billion messages per day across millions of servers (group chats). Their infrastructure evolved from a Python monolith to an Elixir-based gateway service for WebSocket connections and a Rust-based data services layer for performance-critical paths. Discord uses eventual consistency for message delivery: messages are written to a Cassandra cluster partitioned by channel_id and time bucket, with a ScyllaDB migration for better tail latency.
Slack
Slack started as a PHP application and migrated to a Hack/HHVM backend for better performance. Slack's channel-based model stores messages per channel rather than per conversation, enabling features like channel search and threaded replies. Their real-time messaging layer uses WebSocket connections managed by a dedicated gateway tier that routes messages to the correct recipient connections across a fleet of edge servers.
| Aspect | Description |
|---|---|
| WebSocket vs Long Polling | WebSocket provides true bidirectional real-time communication but requires stateful servers that maintain persistent connections, complicating load balancing and server scaling. Long polling is simpler and works with stateless servers but introduces latency (up to the polling interval) and wastes bandwidth on empty responses. |
| At-Least-Once vs At-Most-Once Delivery | At-least-once delivery (retry until ACK) guarantees message delivery but requires client-side deduplication to handle duplicate messages. At-most-once delivery (fire and forget) is simpler but risks losing messages during network failures. Chat systems universally choose at-least-once because message loss is unacceptable. |
| Server-Assigned vs Client-Assigned Ordering | Server-assigned sequence numbers guarantee a total ordering per conversation but require a centralized or coordinated sequence generator. Client-assigned timestamps are distributed and require no coordination but are unreliable due to clock skew, potentially causing messages to appear out of order. |
| Store-and-Forward vs Relay-Only | Store-and-forward persists every message on the server, enabling message history, multi-device sync, and offline delivery. Relay-only forwards messages in real-time without server-side persistence, reducing storage costs but losing message history and making offline delivery impossible without a separate queue. |
Discord's Migration from MongoDB to Cassandra for Message Storage
Scenario
Discord initially stored messages in MongoDB, which worked well during early growth. As the platform scaled to billions of messages, MongoDB struggled with the write volume and the query pattern of fetching messages for a channel sorted by time. Hot channels with millions of messages caused individual MongoDB collections to grow beyond the efficient working set size, leading to increased query latency and memory pressure.
Solution
Discord migrated message storage to Cassandra, partitioned by channel_id and time bucket (e.g., 10-day buckets). This partition scheme co-located messages in the same channel and time range for efficient reads (loading recent messages for a channel is a single partition scan), while distributing the write load across the cluster. Each message includes a Snowflake-style ID that encodes the timestamp, ensuring chronological ordering within partitions. Read-heavy channels benefit from Cassandra's linear read scalability, while write-heavy channels benefit from its append-optimized storage engine.
Outcome
The Cassandra migration reduced p99 message fetch latency from 40ms to under 5ms for recent messages. The time-bucketed partition scheme kept partition sizes manageable (capped at approximately 100MB per bucket), preventing the unbounded growth that plagued MongoDB. Write throughput scaled linearly as new Cassandra nodes were added. Discord later migrated from Cassandra to ScyllaDB (a C++ rewrite of Cassandra) for even better tail latency performance, achieving sub-millisecond p99 reads for the hot path.
See Interview Walkthrough: Chat System in action
Explore system design templates that use interview walkthrough: chat system and run traffic simulations to see how these concepts perform under real load.
Browse Templates1Why are server-assigned sequence numbers preferred over client timestamps for message ordering in a chat system?
2How does a chat system achieve effectively-once message delivery?