Vetora logo
💬Interview Toolkit

Interview Walkthrough: Chat System

A thorough interview walkthrough for designing a real-time chat system like WhatsApp or Slack. Covers WebSocket transport, message storage and ordering, delivery guarantees, presence, group chat, and offline message delivery.

Overview

Designing a real-time chat system is a classic system design interview question that tests your understanding of bidirectional communication protocols, delivery guarantees, distributed state management, and real-time infrastructure. The problem spans from low-level transport decisions to high-level architectural patterns, making it a rich playground for demonstrating breadth and depth.

The transport layer is the first key decision. HTTP is request-response and inherently unidirectional from the client's perspective -- the server cannot push messages to the client without the client polling. WebSocket provides a full-duplex, persistent connection between client and server, allowing the server to push new messages to the client the instant they arrive. Server-Sent Events (SSE) is a lighter-weight alternative that supports server-to-client push but not client-to-server messages, making it less suitable for chat. For a chat system, WebSocket is the standard choice. The client establishes a WebSocket connection on app launch and keeps it open for the session. The server maintains a mapping of user_id to WebSocket connection, enabling targeted message delivery.

Message storage requires careful schema design. Messages are stored in a table partitioned by conversation_id to ensure that all messages in a conversation are co-located for efficient retrieval. Each message has a unique message_id (UUID or Snowflake ID), the conversation_id, sender_id, content, timestamp, and delivery status. Message ordering within a conversation is critical and subtle. Client-generated timestamps are unreliable because device clocks drift. Instead, the server assigns a monotonically increasing sequence number per conversation when it receives a message. This guarantees a total ordering within each conversation regardless of clock skew. For multi-server deployments, a per-conversation sequence generator (backed by a database counter or a distributed sequence service) ensures consistent ordering.

Delivery guarantees are fundamental to a chat system. Users expect that messages they send will be delivered, and delivered exactly once. The system provides at-least-once delivery: when a server receives a message, it persists it, then pushes it to the recipient's WebSocket connection. The recipient's client sends an acknowledgment (ACK) back. If the server does not receive an ACK within a timeout, it retries delivery. This can cause duplicate deliveries if the ACK was lost in transit, so the client must deduplicate by message_id. The combination of server-side retry and client-side dedup achieves effectively-once delivery semantics.

Offline message delivery handles the case where the recipient is not connected when a message is sent. The system stores undelivered messages in a persistent queue (or simply marks them as undelivered in the messages table). When the recipient reconnects, the server pushes all undelivered messages in order. The presence system determines whether a user is online by tracking WebSocket connection state and periodic heartbeats. A user is considered online if the server has received a heartbeat within the last N seconds (typically 30-60 seconds). Presence information is published to subscribers (friends, conversation members) so that the UI can show online/offline status indicators.

Key Points
  • 1WebSocket provides full-duplex persistent connections essential for real-time chat. The server maintains a user_id-to-connection mapping for targeted message delivery, requiring a connection registry service in a multi-server deployment.
  • 2Messages are stored partitioned by conversation_id for query locality. Each message gets a server-assigned monotonic sequence number within its conversation to guarantee ordering regardless of client clock skew.
  • 3At-least-once delivery with client-side deduplication provides effectively-once semantics. The server retries unacknowledged messages, and the client ignores duplicates by checking the message_id against its local history.
  • 4Group chat fans out messages to all group members. For small groups (under 500 members), the server writes a single message record and pushes notifications to each member's WebSocket connection. For large groups, an async fanout queue prevents write-path latency from scaling with group size.
  • 5Presence detection uses WebSocket connection state and heartbeats. A user is marked offline if no heartbeat is received within a configurable timeout. Presence updates are published to relevant subscribers to update UI indicators.
  • 6End-to-end encryption (E2EE) encrypts messages on the sender's device and decrypts on the recipient's device, meaning the server stores only ciphertext. This requires key exchange protocols (Signal Protocol) and complicates features like server-side search and message history on new devices.
Simple Example

The Walkie-Talkie vs Postal Mail Analogy

HTTP-based chat is like postal mail: you send a letter and then keep checking your mailbox for a reply (polling). WebSocket-based chat is like a walkie-talkie: once the channel is open, either party can speak at any time and the other hears instantly. The walkie-talkie channel stays open (persistent connection) so there is no delay in establishing communication each time. If the other person's walkie-talkie is off (they are offline), the message is stored at the base station (message queue) and delivered when they turn on their device and check in.

Real-World Examples

WhatsApp

WhatsApp originally built its backend on Erlang, chosen for its ability to handle millions of concurrent lightweight processes (one per WebSocket connection). A single server handled up to 2 million simultaneous connections. Messages use end-to-end encryption via the Signal Protocol, meaning WhatsApp servers never see plaintext message content. Message delivery uses a store-and-forward model: messages are persisted until the recipient acknowledges receipt, then deleted from the server.

Discord

Discord handles over 4 billion messages per day across millions of servers (group chats). Their infrastructure evolved from a Python monolith to an Elixir-based gateway service for WebSocket connections and a Rust-based data services layer for performance-critical paths. Discord uses eventual consistency for message delivery: messages are written to a Cassandra cluster partitioned by channel_id and time bucket, with a ScyllaDB migration for better tail latency.

Slack

Slack started as a PHP application and migrated to a Hack/HHVM backend for better performance. Slack's channel-based model stores messages per channel rather than per conversation, enabling features like channel search and threaded replies. Their real-time messaging layer uses WebSocket connections managed by a dedicated gateway tier that routes messages to the correct recipient connections across a fleet of edge servers.

Trade-Offs
AspectDescription
WebSocket vs Long PollingWebSocket provides true bidirectional real-time communication but requires stateful servers that maintain persistent connections, complicating load balancing and server scaling. Long polling is simpler and works with stateless servers but introduces latency (up to the polling interval) and wastes bandwidth on empty responses.
At-Least-Once vs At-Most-Once DeliveryAt-least-once delivery (retry until ACK) guarantees message delivery but requires client-side deduplication to handle duplicate messages. At-most-once delivery (fire and forget) is simpler but risks losing messages during network failures. Chat systems universally choose at-least-once because message loss is unacceptable.
Server-Assigned vs Client-Assigned OrderingServer-assigned sequence numbers guarantee a total ordering per conversation but require a centralized or coordinated sequence generator. Client-assigned timestamps are distributed and require no coordination but are unreliable due to clock skew, potentially causing messages to appear out of order.
Store-and-Forward vs Relay-OnlyStore-and-forward persists every message on the server, enabling message history, multi-device sync, and offline delivery. Relay-only forwards messages in real-time without server-side persistence, reducing storage costs but losing message history and making offline delivery impossible without a separate queue.
Case Study

Discord's Migration from MongoDB to Cassandra for Message Storage

Scenario

Discord initially stored messages in MongoDB, which worked well during early growth. As the platform scaled to billions of messages, MongoDB struggled with the write volume and the query pattern of fetching messages for a channel sorted by time. Hot channels with millions of messages caused individual MongoDB collections to grow beyond the efficient working set size, leading to increased query latency and memory pressure.

Solution

Discord migrated message storage to Cassandra, partitioned by channel_id and time bucket (e.g., 10-day buckets). This partition scheme co-located messages in the same channel and time range for efficient reads (loading recent messages for a channel is a single partition scan), while distributing the write load across the cluster. Each message includes a Snowflake-style ID that encodes the timestamp, ensuring chronological ordering within partitions. Read-heavy channels benefit from Cassandra's linear read scalability, while write-heavy channels benefit from its append-optimized storage engine.

Outcome

The Cassandra migration reduced p99 message fetch latency from 40ms to under 5ms for recent messages. The time-bucketed partition scheme kept partition sizes manageable (capped at approximately 100MB per bucket), preventing the unbounded growth that plagued MongoDB. Write throughput scaled linearly as new Cassandra nodes were added. Discord later migrated from Cassandra to ScyllaDB (a C++ rewrite of Cassandra) for even better tail latency performance, achieving sub-millisecond p99 reads for the hot path.

Common Mistakes
  • Using HTTP polling or long polling for real-time messaging when WebSocket is the appropriate technology. HTTP polling introduces unnecessary latency and bandwidth waste. WebSocket should be the default choice for any system requiring server-initiated message delivery.
  • Relying on client-generated timestamps for message ordering. Device clocks are unreliable -- they can drift, be manually set incorrectly, or differ by seconds across devices. Always use server-assigned monotonic IDs or sequence numbers for message ordering within a conversation.
  • Designing the system without offline message delivery. Not all users are online at all times. The system must queue messages for offline users and deliver them in order when the user reconnects. This is a fundamental requirement that interviewers expect you to address.
  • Treating group chat as a simple extension of 1:1 chat without considering fanout cost. In a group of 1,000 members, every message triggers 999 delivery operations. For large groups, asynchronous fanout through a message queue prevents the send path from blocking on delivery to all members.
Related Concepts

See Interview Walkthrough: Chat System in action

Explore system design templates that use interview walkthrough: chat system and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Simulate a chat system under load with the chat template

Metrics to watch
message_delivery_latency_mswebsocket_connectionsthroughput_rps
Run Simulation
Test Your Understanding

1Why are server-assigned sequence numbers preferred over client timestamps for message ordering in a chat system?

2How does a chat system achieve effectively-once message delivery?

Deeper Reading