WhatsApp Architecture Case Study
How WhatsApp handles 100+ billion messages daily with remarkable efficiency.
Architecture Overview
WhatsApp is known for an unusually efficient architecture: it handles massive scale with a relatively small engineering team.
┌─────────────────────────────────────────────────────────────┐
│                       Mobile Clients                        │
│                (iOS, Android, Web, Desktop)                 │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                       Load Balancers                        │
│                  (Geographic Distribution)                  │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────┼───────────────────────────────────┐
│                         │                                   │
▼                         ▼                                   ▼
┌───────────────┐  ┌───────────────┐  ┌───────────────────────┐
│  Connection   │  │    Message    │  │     Media Storage     │
│    Servers    │  │    Routing    │  │       (S3/CDN)        │
│ (XMPP/Noise)  │  │    Servers    │  │                       │
└───────────────┘  └───────────────┘  └───────────────────────┘
Core Technology Stack
Erlang/OTP
WhatsApp’s backend is primarily built on Erlang, chosen for:
- Concurrency: Lightweight processes (millions per server)
- Fault Tolerance: “Let it crash” philosophy
- Hot Code Swapping: Update without downtime
- Distributed Computing: Built-in distribution
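%% Example: a minimal Erlang message-handling process
-module(message_handler).
-export([start/0, handle/1]).

%% Spawn the handler process and return its pid.
start() ->
    spawn(fun() -> loop() end).

%% Receive loop: route each send request, exit cleanly on stop.
loop() ->
    receive
        {send, _Message, _To} = Request ->
            handle(Request),
            loop();
        stop ->
            ok
    end.

%% Handle one request; exported so it can also be called directly.
handle({send, Message, To}) ->
    route_message(Message, To).

%% Illustrative stand-in for real routing: forward to the recipient process.
route_message(Message, To) ->
    To ! {message, Message},
    ok.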
FreeBSD Operating System
- Highly tuned for networking
- Reported to perform better than Linux for WhatsApp's workload
- Custom kernel optimizations
Key Components
1. Connection Management
- Protocol: Custom protocol based on XMPP (simplified)
- Encryption: Signal Protocol (end-to-end)
- Connections: Long-lived TCP connections
- Compression: Efficient binary protocol
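To illustrate why a compact binary framing suits long-lived TCP connections, here is a minimal sketch in Python. The frame layout (1-byte type, 2-byte length, payload) and the names `encode_frame`, `decode_frame`, and `MSG_TEXT` are assumptions made for this example; WhatsApp's actual wire format is not described in this document.

```python
import json
import struct

# Hypothetical frame layout: 1-byte message type, 2-byte big-endian length, payload.
HEADER = struct.Struct("!BH")
MSG_TEXT = 1


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed 3-byte header."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length field")
    return HEADER.pack(msg_type, len(payload)) + payload


def decode_frame(buffer: bytes):
    """Return (msg_type, payload, remaining_bytes), or None if the frame is incomplete."""
    if len(buffer) < HEADER.size:
        return None
    msg_type, length = HEADER.unpack_from(buffer)
    end = HEADER.size + length
    if len(buffer) < end:
        return None
    return msg_type, buffer[HEADER.size:end], buffer[end:]


text = "hello".encode("utf-8")
frame = encode_frame(MSG_TEXT, text)
as_json = json.dumps({"type": "text", "body": "hello"}).encode("utf-8")
print(len(frame), len(as_json))  # the binary frame is the smaller of the two
```

Returning `None` for an incomplete frame lets a reader keep appending bytes from the socket until a whole frame is available, which is the usual way to parse a stream-oriented connection.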
2. Message Flow
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  Sender  │───▶│  Server  │───▶│  Server  │───▶│ Receiver │
│  Client  │    │  (Home)  │    │  (Dest)  │    │  Client  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
                             │
                             ▼
                      ┌──────────────┐
                      │ Mnesia/MySQL │
                      │  (Offline)   │
                      └──────────────┘
Message States:
- Single checkmark: Delivered to server
- Double checkmark: Delivered to recipient
- Blue checkmarks: Read by recipient
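The three states above form a one-way progression, which can be sketched as a small state machine. This is an illustrative Python sketch: the `PENDING` state and the `apply_receipt` helper are assumptions added for the example, not part of any documented WhatsApp API.

```python
from enum import IntEnum


class MessageStatus(IntEnum):
    PENDING = 0    # assumed initial state: not yet acknowledged by the server
    SENT = 1       # single checkmark
    DELIVERED = 2  # double checkmark
    READ = 3       # blue checkmarks


def apply_receipt(current: MessageStatus, receipt: MessageStatus) -> MessageStatus:
    """Advance monotonically; late or duplicate receipts never move the status backwards."""
    return max(current, receipt)


status = MessageStatus.PENDING
# Receipts can arrive out of order on an unreliable network.
for receipt in (MessageStatus.SENT, MessageStatus.READ, MessageStatus.DELIVERED):
    status = apply_receipt(status, receipt)
print(status.name)
```

Treating the status as an ordered value keeps the client display stable even when a "delivered" receipt arrives after a "read" receipt.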
3. Data Storage
| Component | Storage | Purpose |
|---|---|---|
| Messages (offline) | Mnesia → MySQL | Store until delivered |
| User profiles | MySQL | Account data |
| Media files | Amazon S3 | Images, videos, documents |
| Keys | Local device | End-to-end encryption keys |
4. Media Handling
┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │───▶│   Upload   │───▶│     S3     │
│  Uploads   │    │   Server   │    │  Storage   │
└────────────┘    └────────────┘    └────────────┘
                                          │
                                          ▼
┌────────────┐    ┌────────────┐    ┌────────────┐
│   Client   │◀───│    CDN     │◀───│  Generate  │
│ Downloads  │    │            │    │    URL     │
└────────────┘    └────────────┘    └────────────┘
Scalability Strategies
1. Server Efficiency
- About 2 million concurrent connections per server (Erlang’s strength)
- Custom memory management
- Optimized garbage collection
2. Database Optimization
- Read replicas for scaling reads
- Sharding by user ID
- Minimal data storage (messages deleted after delivery)
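Sharding by user ID can be sketched as a stable hash of the ID reduced modulo the number of shards. This is a minimal Python sketch; the shard count and the function name `shard_for_user` are assumptions, and the document does not say which hashing scheme WhatsApp uses.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same answer on every machine.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


users = ["alice", "bob", "carol"]
placement = {user: shard_for_user(user) for user in users}
print(placement)
```

Plain modulo placement moves most users when the shard count changes; consistent hashing is a common alternative when shards are added frequently.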
3. Caching
┌─────────────┐     ┌─────────────┐
│   Request   │────▶│  Memcached  │ (Hit: Return)
└─────────────┘     └──────┬──────┘
                           │ (Miss)
                           ▼
                    ┌─────────────┐
                    │    MySQL    │
                    └─────────────┘
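The hit/miss flow in the diagram is the cache-aside pattern. A minimal Python sketch follows, with an in-memory dictionary standing in for Memcached and a plain callable standing in for the MySQL query; the class name `CacheAside` and its methods are illustrative assumptions.

```python
class CacheAside:
    """Cache-aside helper: check the cache first, fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._cache = {}                  # stands in for Memcached
        self._load_from_db = load_from_db
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load_from_db(key)   # stands in for a MySQL query
        self._cache[key] = value          # populate the cache for later reads
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)


profiles = {"u1": {"name": "Alice"}}      # stands in for a database table
cache = CacheAside(profiles.get)
first = cache.get("u1")                   # miss: loaded from the "database"
second = cache.get("u1")                  # hit: served from the cache
print(cache.hits, cache.misses)
```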
End-to-End Encryption
Signal Protocol Implementation
┌─────────────────────────────────────────────────────┐
│                    Key Exchange                     │
├─────────────────────────────────────────────────────┤
│  1. Identity Key (long-term)                        │
│  2. Signed Pre-Key (medium-term)                    │
│  3. One-Time Pre-Keys (single use)                  │
└─────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│              Double Ratchet Algorithm               │
├─────────────────────────────────────────────────────┤
│  - Forward secrecy                                  │
│  - Break-in recovery                                │
│  - Per-message keys                                 │
└─────────────────────────────────────────────────────┘
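The "per-message keys" and "forward secrecy" properties come from a symmetric key chain, which can be sketched in a few lines of Python. This sketch covers only that chain step, using HMAC-SHA256 with separate constant inputs as the Double Ratchet specification suggests. It omits the key exchange and the Diffie-Hellman ratchet that provides break-in recovery, and the starting secret is a placeholder rather than the result of a real key agreement.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes):
    """Derive (next_chain_key, message_key) from the current chain key.

    Distinct constant inputs keep the two outputs independent, and because
    HMAC is one-way, an old chain key cannot be recovered from a newer one.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Placeholder starting secret for the demo only.
chain_key = hashlib.sha256(b"demo shared secret").digest()

message_keys = []
for _ in range(3):
    chain_key, message_key = kdf_chain_step(chain_key)
    message_keys.append(message_key)

print(len(set(message_keys)))  # number of distinct per-message keys derived
```

A production implementation should rely on a vetted Signal Protocol library rather than hand-written primitives.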
Group Messaging
- Sender Keys for efficiency
- Each member has a unique sender key
- Server cannot decrypt messages
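The efficiency gain from Sender Keys can be shown with simple counting. In the Python sketch below, the pairwise approach encrypts every message once per other member, while the sender-key approach distributes the sender's key once to each other member and then encrypts each message a single time. The group size and message count are hypothetical, and the function names are assumptions for this example.

```python
def pairwise_encryptions(members: int, messages: int) -> int:
    """Encrypt every message separately for each of the other members."""
    return (members - 1) * messages


def sender_key_encryptions(members: int, messages: int) -> int:
    """Send the sender key once to each other member, then encrypt each message once."""
    return (members - 1) + messages


members, messages = 256, 100  # hypothetical group size and message count
print(pairwise_encryptions(members, messages), sender_key_encryptions(members, messages))
```

The one-time distribution cost is paid again whenever the sender key has to be replaced, for example after a member leaves the group.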
Performance Metrics
| Metric | Value |
|---|---|
| Daily Messages | 100+ billion |
| Monthly Active Users | 2+ billion |
| Engineers (2014) | ~50 |
| Servers (2014) | ~550 |
| Messages/second | 1+ million |
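The messages-per-second figure is consistent with the daily total, as a quick calculation in Python shows. It uses only the "100+ billion" value from the table and gives an average, not a peak.

```python
DAILY_MESSAGES = 100e9           # "100+ billion" from the table
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

average_per_second = DAILY_MESSAGES / SECONDS_PER_DAY
print(f"{average_per_second:,.0f} messages/second on average")
```

The result is roughly 1.16 million messages per second, which matches the "1+ million" row; peak load is presumably higher than this average.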
Design Principles
1. Simplicity
- Focus on core messaging functionality
- Minimal features, maximum reliability
- Simple user experience
2. Efficiency
- Binary protocol (not JSON/XML)
- Minimal server storage
- Optimized network usage
3. Privacy
- End-to-end encryption by default
- Minimal data collection
- Messages not retained on servers after delivery
4. Reliability
- Delivery retried until the recipient’s device acknowledges receipt
- Offline message queuing
- Automatic reconnection
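Automatic reconnection is commonly implemented with exponential backoff and jitter, so that many clients do not reconnect at the same moment after an outage. The Python sketch below is one way to do that. The base delay, cap, attempt count, and the name `backoff_delays` are assumptions; the document does not describe how WhatsApp clients actually schedule reconnects.

```python
import random


def backoff_delays(base: float = 1.0, cap: float = 60.0, attempts: int = 6, rng=random.random):
    """Yield one delay per attempt: a random point in an exponentially growing, capped window."""
    for attempt in range(attempts):
        window = min(cap, base * (2 ** attempt))
        yield rng() * window


# Deterministic run for illustration: fixing rng at 1.0 shows the upper bound of each window.
upper_bounds = list(backoff_delays(rng=lambda: 1.0, attempts=8))
print(upper_bounds)

# Normal use: each delay is drawn uniformly from [0, window).
delays = list(backoff_delays())
```

A client would sleep for each delay in turn, stop at the first successful connection, and start again from the first window after the next disconnect.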
Lessons for Architects
1. Choose the Right Technology
Erlang was well suited to WhatsApp’s needs:
- Concurrent connections
- Fault tolerance
- Low latency
2. Optimize Ruthlessly
- Every byte counts
- Profile and measure
- Custom solutions when needed
3. Keep It Simple
- Fewer features, done well
- Minimal dependencies
- Clear architecture
4. Plan for Scale
- Design for millions from day one
- Horizontal scaling capability
- Efficient resource usage
C# Equivalent Patterns
Connection Handling (SignalR)
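using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

public class ChatHub : Hub
{
    // Push a message to every connection that belongs to the given user ID.
    public async Task SendMessage(string user, string message)
    {
        await Clients.User(user).SendAsync("ReceiveMessage", message);
    }

    // Track presence by adding each new connection to an "Online" group.
    public override async Task OnConnectedAsync()
    {
        await Groups.AddToGroupAsync(Context.ConnectionId, "Online");
        await base.OnConnectedAsync();
    }
}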
Message Queue Pattern
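using System.Threading.Tasks;

// Minimal message shape and queue contract, assumed for illustration.
public record Message(string RecipientId, string Body);
public interface IMessageQueue { Task EnqueueForDelivery(Message message); }

public abstract class MessageService
{
    private readonly IMessageQueue _queue;
    protected MessageService(IMessageQueue queue) => _queue = queue;

    public async Task SendMessageAsync(Message message)
    {
        if (await IsUserOnline(message.RecipientId))
        {
            await DeliverDirectly(message);            // recipient connected: push now
        }
        else
        {
            await _queue.EnqueueForDelivery(message);  // hold until the recipient reconnects
        }
    }

    // Presence lookup and direct delivery are left to the concrete implementation.
    protected abstract Task<bool> IsUserOnline(string recipientId);
    protected abstract Task DeliverDirectly(Message message);
}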
Sources
Arhitectura/WhatsApp architecture.gif