Design WhatsApp
Building real-time WebSocket gateways, message delivery receipts, and a Cassandra-backed chat history store.
What you'll learn
- WebSockets — Why Not HTTP Polling?
- Presence Service (Redis Routing Table)
- Cassandra for Message Storage
- Message Delivery Receipts (3-State FSM)
- Group Messaging via Kafka Fan-Out
- End-to-End Encryption (Signal Protocol)
Visual System Topology
Figure: WhatsApp real-time messaging architecture.
Offline path: save to Cassandra → Kafka → Notification Service → APNs / FCM
Concept Overview
A real-time messaging system at WhatsApp scale serves 2B+ users sending 100B messages per day, or roughly 1.16 million messages per second on average.
Functional Requirements:
- 1-on-1 and group messaging (up to 1,024 members per group)
- Message delivery status: Sent ✓, Delivered ✓✓, Read ✓✓ (blue)
- User presence (online/offline/last seen)
- Media sharing (images, video, voice, documents)
- End-to-end encryption (every message)
- Push notifications for offline users
Non-Functional Requirements:
- < 100ms message delivery for online users
- 99.99% availability (~52 minutes downtime/year)
- Horizontal scalability for billions of messages/day
- Durability — no message loss even on server crash
Capacity Estimation (2B users, 500M DAU):
- Messages/day: 100B → ~1.16M/sec
- Text storage/day: 100B × 100 bytes = 10 TB/day
- Media (20% of messages, 100KB avg): 20B × 100KB = 2 PB/day
- Active WebSocket connections: 500M persistent TCP connections
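The arithmetic behind these estimates can be checked with a few lines of Python. This is a minimal sketch: the inputs are the figures from the list above, the variable names are illustrative, and sizes use decimal units (1 KB = 1,000 bytes).

```python
# Back-of-envelope capacity figures; inputs are the estimates from the list above.
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 100e9   # 100B messages/day
avg_text_bytes = 100       # average text payload in bytes
media_fraction = 0.20      # 20% of messages carry media
avg_media_bytes = 100e3    # 100 KB average media size

messages_per_sec = messages_per_day / SECONDS_PER_DAY
text_tb_per_day = messages_per_day * avg_text_bytes / 1e12
media_pb_per_day = messages_per_day * media_fraction * avg_media_bytes / 1e15

print(f"{messages_per_sec / 1e6:.2f}M messages/sec")
print(f"{text_tb_per_day:.0f} TB of text per day")
print(f"{media_pb_per_day:.0f} PB of media per day")
```

These are daily averages; peak traffic would be higher, so real provisioning needs headroom on top of these numbers.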
Key Architectural Pillars
WebSockets — Why Not HTTP Polling?
HTTP polling (the client asks "any messages?" every N seconds) wastes bandwidth on empty responses and adds up to N seconds of latency. A WebSocket is a persistent, bidirectional connection over TCP, so the server can push a message the instant it arrives. A typical chat server is sized for 50K–100K simultaneous WebSocket connections. WhatsApp runs on Erlang/OTP (BEAM VM), where each connection is handled by one lightweight Erlang process; with OS and VM tuning, WhatsApp has reported more than 2 million concurrent connections on a single server.
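To put a number on the polling overhead, the sketch below estimates the request load if every connected client polled instead of holding a WebSocket. The 5-second poll interval is an assumption chosen for illustration; the client count and message rate come from the capacity estimates. `polling_cost` is a hypothetical helper, not part of any library.

```python
def polling_cost(clients: int, poll_interval_s: float, messages_per_s: float) -> dict:
    """Estimate the request load and latency that periodic polling would add."""
    polls_per_s = clients / poll_interval_s
    return {
        "polls_per_s": polls_per_s,
        # Requests issued per message actually delivered.
        "polls_per_message": polls_per_s / messages_per_s,
        # A message arrives at a uniformly random point within the poll interval,
        # so on average it waits half an interval before the next poll picks it up.
        "avg_added_latency_s": poll_interval_s / 2,
    }


cost = polling_cost(
    clients=500_000_000,            # active connections from the capacity estimate
    poll_interval_s=5.0,            # assumed poll interval
    messages_per_s=100e9 / 86_400,  # 100B messages/day
)
print(cost)
```

Under these assumptions the servers would answer tens of polls for every message delivered, and the average added latency alone would far exceed the 100 ms delivery target.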
Presence Service (Redis Routing Table)
A dedicated Presence Service maintains a mapping of user_id → (server_id, socket_id) in Redis. When User A connects via WebSocket, the chat server writes A's entry {user_id, server_id, socket_id} to Redis and refreshes it on every heartbeat. When A disconnects cleanly, the entry is deleted; if the connection drops silently, the entry expires once heartbeats stop for longer than the timeout. Any chat server can then route a message to the server holding an online user's connection with a single Redis lookup.
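The routing table's behavior can be sketched with an in-memory stand-in for Redis. `PresenceTable`, its method names, and the 30-second TTL are assumptions for illustration. In a real deployment, `heartbeat` would map to a Redis write with an expiry and `disconnect` to a delete.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PresenceTable:
    """In-memory stand-in for the Redis routing table (hypothetical API)."""

    def __init__(self, ttl_s: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl_s = ttl_s
        self._clock = clock
        # user_id -> (server_id, socket_id, expires_at)
        self._entries: Dict[str, Tuple[str, str, float]] = {}

    def heartbeat(self, user_id: str, server_id: str, socket_id: str) -> None:
        """Register or refresh a connection and push its expiry forward."""
        self._entries[user_id] = (server_id, socket_id, self._clock() + self._ttl_s)

    def disconnect(self, user_id: str) -> None:
        """Clean disconnect: drop the entry immediately."""
        self._entries.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[Tuple[str, str]]:
        """Return (server_id, socket_id) if the user is online, else None."""
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, socket_id, expires_at = entry
        if self._clock() >= expires_at:
            # Heartbeats stopped: treat the user as offline and evict lazily.
            del self._entries[user_id]
            return None
        return server_id, socket_id


# Usage with a fake clock so the expiry is easy to see.
now = [0.0]
presence = PresenceTable(ttl_s=30.0, clock=lambda: now[0])
presence.heartbeat("alice", "chat-7", "sock-42")
print(presence.lookup("alice"))  # entry is present
now[0] = 31.0                    # no heartbeat for longer than the TTL
print(presence.lookup("alice"))  # entry has expired
```

The injectable clock is a testing convenience; the point of the sketch is that a missed heartbeat removes the route without any explicit disconnect message.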
Cassandra for Message Storage
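A single relational database becomes a bottleneck for chat: millions of concurrent writes contend on locks and index updates in one primary, and time-ordered reads depend on ever-growing indexes. Cassandra's wide-column model suits the workload: writes are append-only, rows within a partition are stored pre-sorted so reading the latest messages of a conversation is a sequential read, and write throughput scales close to linearly as nodes are added. Schema: partition key = conversation_id, clustering key = message_id (a time-based UUID, so rows are ordered by send time).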
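The effect of this schema can be modeled in plain Python: one sorted list per conversation, ordered by the timestamp embedded in the message_id. This is a toy model of the data layout, not a Cassandra client. `MessageStore` and `timeuuid_at` are hypothetical names introduced for the example.

```python
import bisect
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple


def timeuuid_at(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID carrying the given 60-bit timestamp (demo helper)."""
    fields = (
        timestamp & 0xFFFFFFFF,      # time_low
        (timestamp >> 32) & 0xFFFF,  # time_mid
        (timestamp >> 48) & 0x0FFF,  # time_hi (version bits are set by version=1)
        0x80, 0, 0,                  # clock sequence and node, fixed for the demo
    )
    return uuid.UUID(fields=fields, version=1)


class MessageStore:
    """Toy model: partition key = conversation_id, clustering key = message_id."""

    def __init__(self) -> None:
        # conversation_id -> rows kept sorted by (timestamp, message_id bytes)
        self._partitions: Dict[str, List[Tuple[int, bytes, str]]] = defaultdict(list)

    def append(self, conversation_id: str, message_id: uuid.UUID, body: str) -> None:
        """Insert a row at its sorted position within the conversation's partition."""
        row = (message_id.time, message_id.bytes, body)
        bisect.insort(self._partitions[conversation_id], row)

    def latest(self, conversation_id: str, limit: int) -> List[str]:
        """Newest-first slice of one partition; other partitions are never touched."""
        if limit <= 0:
            return []
        rows = self._partitions.get(conversation_id, [])
        return [body for _, _, body in reversed(rows[-limit:])]


store = MessageStore()
store.append("alice:bob", timeuuid_at(200), "second")
store.append("alice:bob", timeuuid_at(100), "first")  # arrives late, still sorts first
store.append("carol:dave", timeuuid_at(150), "other chat")
print(store.latest("alice:bob", 10))
```

Because ordering is fixed at write time, fetching the most recent page of a conversation needs no sort and no cross-partition scan, which is the property the schema is chosen for.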
Message Delivery Receipts (3-State FSM)
Three delivery states: (1) Sent ✓: the server received the message and saved it to Cassandra. (2) Delivered ✓✓: the recipient's device received the message and sent an ACK. (3) Read ✓✓ (blue): the recipient opened the conversation. The Delivered and Read transitions follow the same path: the recipient's device notifies its chat server, which updates the status in Cassandra and pushes a receipt event to the sender's WebSocket.
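The three states can be sketched as a small forward-only state machine. The names `Receipt` and `advance` are illustrative. The rule that a late or duplicate event never moves the status backward is an assumption about how reordered receipts should be handled, not something the design above specifies.

```python
from enum import IntEnum


class Receipt(IntEnum):
    SENT = 1       # server stored the message
    DELIVERED = 2  # recipient's device acknowledged it
    READ = 3       # recipient opened the conversation


def advance(current: Receipt, event: Receipt) -> Receipt:
    """Apply a receipt event; the status only moves forward.

    Stale or duplicate events (for example a DELIVERED ACK arriving after
    READ) leave the status unchanged.
    """
    return event if event > current else current


status = Receipt.SENT
status = advance(status, Receipt.DELIVERED)
status = advance(status, Receipt.READ)
status = advance(status, Receipt.DELIVERED)  # late duplicate, ignored
print(status.name)
```

Making the update a "keep the maximum" operation also makes it idempotent, so a receipt that is retried after a network failure is harmless.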
Group Messaging via Kafka Fan-Out
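For 1-on-1 messages, the sender's chat server pushes directly to the recipient's chat server. For a group with up to 1,024 members, doing that fan-out synchronously on the sender's request path could block for seconds. Instead, the message is saved to Cassandra and published to a Kafka topic keyed by group_id, so all messages for a group land on the same partition and keep their order. Fan-out workers consume partitions in parallel; for each member, a worker looks up the member's chat server in the Redis presence table and forwards the message there, or hands it to the push notification path if the member is offline.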
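The worker-side logic can be sketched without a Kafka client. In the sketch below, `partition_for` uses CRC32 as a stand-in for Kafka's own key hashing, and `plan_fan_out` takes a plain dict in place of the Redis presence lookup. Both function names and the dict shape are assumptions for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def partition_for(group_id: str, num_partitions: int) -> int:
    """Stable hash so every message for a group lands on the same partition."""
    return zlib.crc32(group_id.encode("utf-8")) % num_partitions


def plan_fan_out(
    sender_id: str,
    member_ids: Iterable[str],
    presence: Dict[str, Optional[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split recipients into per-chat-server batches (online) and a push list (offline).

    `presence` maps user_id -> server_id, with None or a missing key meaning offline.
    """
    by_server: Dict[str, List[str]] = defaultdict(list)
    offline: List[str] = []
    for member in member_ids:
        if member == sender_id:
            continue  # the sender already has the message
        server_id = presence.get(member)
        if server_id is None:
            offline.append(member)
        else:
            by_server[server_id].append(member)
    return dict(by_server), offline


presence = {"bob": "chat-1", "carol": "chat-2", "dave": "chat-1", "erin": None}
batches, push_list = plan_fan_out("alice", ["alice", "bob", "carol", "dave", "erin"], presence)
print(batches)    # members grouped by the chat server holding their connection
print(push_list)  # members to reach through APNs / FCM
```

Grouping recipients by chat server means a worker can send one batched request per server rather than one per member, which matters when many members of a large group share a server.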
End-to-End Encryption (Signal Protocol)
Every message is encrypted on the sender's device before transmission, so the server only ever sees ciphertext and cannot read content. Session setup uses an Elliptic-Curve Diffie-Hellman key agreement (X3DH in the Signal Protocol) to establish a shared secret. The Double Ratchet then derives a fresh key for every message, and message bodies are encrypted with AES-256. This gives forward secrecy: compromising the current keys does not let an attacker decrypt past messages.
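The forward-secrecy property comes from deriving keys through a one-way chain. The sketch below shows only a symmetric key chain of the kind the Signal Protocol's ratchet uses, built from HMAC-SHA256 in the standard library. It is not the full protocol: the Diffie-Hellman exchange is replaced by a placeholder secret, and no message is actually encrypted. `ratchet_step` is an illustrative name.

```python
import hashlib
import hmac
from typing import List, Tuple


def ratchet_step(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Derive (next_chain_key, message_key) from the current chain key.

    Distinct constant inputs keep the two outputs independent of each other.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Placeholder for the shared secret that the Diffie-Hellman exchange would produce.
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys: List[bytes] = []
for _ in range(3):
    # Each message consumes one key, then the chain moves forward.
    chain_key, message_key = ratchet_step(chain_key)
    message_keys.append(message_key)

print([key.hex()[:16] for key in message_keys])
```

Because HMAC cannot be run backward, someone who captures the current `chain_key` can derive future keys in this chain but not the earlier message keys. The full Double Ratchet also mixes in fresh Diffie-Hellman outputs so that future keys recover from a compromise as well.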
