Design Twitter
Handling write spikes and rendering active timelines instantly using hybrid fan-out models.
What you'll learn
- Hybrid Fan-Out (Push + Pull by Follower Count)
- Redis Timeline Cache (Sorted Sets)
- Kafka Fan-Out Worker Architecture
- Trending Topics (Count-Min Sketch)
- Tweet Storage (Cassandra + Caching)
- Search (Elasticsearch + Real-Time Indexing)
Visual System Topology
Twitter/X — Hybrid Fan-Out Architecture
Celebrity posts (>500K followers): stored in DB only → pulled on feed load → merged with Redis timeline
Concept Overview
Twitter/X is a microblogging platform, sized here at 500M+ DAU and 500M tweets/day. Its core design challenge is generating a personalized home timeline in < 200ms for each of millions of concurrent users, when every user follows hundreds of accounts and the platform as a whole ingests thousands of tweets per second.
Functional Requirements:
- Post tweets (up to 280 chars, with media/polls)
- Home timeline (posts from followed accounts, reverse chronological + ranked)
- Follow/unfollow users
- Like, retweet, reply, quote tweet
- Trending topics
- Search tweets and users
- Direct messages
Non-Functional Requirements:
- < 200ms home timeline load
- 99.9% availability
- Eventual consistency acceptable (seeing a tweet 1–2 seconds late is fine)
- Millions of concurrent users
Capacity Estimation (500M DAU):
- Tweets/day: 500M / 86,400 s ≈ 5,787 tweets/sec
- Avg followers per user: 200
- Fan-out writes: 5,787 × 200 ≈ 1.16M Redis writes/sec (normal users only)
- Timeline reads: 500M × 5 opens = 2.5B/day ≈ 29K reads/sec
- Tweet storage: 500M × 300 bytes = 150 GB/day
- Redis timeline memory: 100M active users × 800 tweets × 8B per tweet ID = 640 GB (raw IDs only, before Redis per-entry overhead)
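The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the figures stated in the list; the variable names are illustrative, and the 8 bytes per timeline entry counts the raw tweet ID only, so real Redis memory would be higher.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the capacity estimation list.
dau = 500_000_000
tweets_per_day = 500_000_000
avg_followers = 200
timeline_opens_per_user = 5
bytes_per_tweet = 300
active_timelines = 100_000_000
timeline_cap = 800
bytes_per_entry = 8  # raw tweet ID only (assumption: no per-entry overhead)

tweets_per_sec = tweets_per_day / SECONDS_PER_DAY
fanout_writes_per_sec = tweets_per_sec * avg_followers
timeline_reads_per_sec = dau * timeline_opens_per_user / SECONDS_PER_DAY
tweet_storage_gb_per_day = tweets_per_day * bytes_per_tweet / 1e9
redis_timeline_gb = active_timelines * timeline_cap * bytes_per_entry / 1e9

print(f"tweets/sec:          {tweets_per_sec:,.0f}")
print(f"fan-out writes/sec:  {fanout_writes_per_sec:,.0f}")
print(f"timeline reads/sec:  {timeline_reads_per_sec:,.0f}")
print(f"tweet storage/day:   {tweet_storage_gb_per_day:,.0f} GB")
print(f"Redis timeline size: {redis_timeline_gb:,.0f} GB")
```

Note that the write path (about 1.16M Redis writes/sec) is roughly 40 times heavier than the read path (about 29K reads/sec), which is the trade the push model makes to keep reads cheap.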
Key Architectural Pillars
Hybrid Fan-Out (Push + Pull by Follower Count)
Push model (fan-out on write): when User A tweets, workers push the tweet_id into each follower's Redis timeline, so a timeline read is a single cache lookup. The problem is high-follower accounts: one tweet from @elonmusk (100M followers) triggers 100M Redis ZADD operations.
Pull model (fan-out on read): on timeline load, fetch recent tweets from every followed account. The problem is read cost: a user following 500 accounts needs 500 DB queries per load, which is too slow for a < 200ms target.
Hybrid: push for users with < 500K followers (99.9% of users) and pull for celebrities (≥ 500K followers). Celebrity tweets are stored in the DB only and merged into the precomputed Redis timeline at read time.
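The hybrid rule can be sketched in memory as follows. This is an illustration under stated assumptions, not Twitter's implementation: the class name HybridFeed, its methods, and the plain dictionaries standing in for Redis and the tweet DB are all hypothetical. The threshold defaults to the 500K figure from the text and is a parameter so the behavior can be tried with small numbers.

```python
import heapq
from collections import defaultdict


class HybridFeed:
    """Toy model: push for normal authors, pull-and-merge for celebrities."""

    def __init__(self, celebrity_threshold=500_000):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.pushed = defaultdict(list)     # user -> [(ts, tweet_id)], stand-in for Redis
        self.authored = defaultdict(list)   # author -> [(ts, tweet_id)], stand-in for the DB

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.threshold

    def post(self, author, tweet_id, ts):
        self.authored[author].append((ts, tweet_id))   # always stored
        if not self.is_celebrity(author):              # push path only
            for follower in self.followers[author]:
                self.pushed[follower].append((ts, tweet_id))

    def home_timeline(self, user, limit=20):
        # Precomputed timeline plus one newest-first list per followed celebrity.
        sources = [sorted(self.pushed[user], reverse=True)]
        for author in self.following[user]:
            if self.is_celebrity(author):
                sources.append(sorted(self.authored[author], reverse=True))
        seen, result = set(), []
        for _, tweet_id in heapq.merge(*sources, reverse=True):
            if tweet_id not in seen:   # an author may have crossed the threshold
                seen.add(tweet_id)
                result.append(tweet_id)
            if len(result) == limit:
                break
        return result
```

With a threshold of 3 and an author followed by three users, that author's tweets never enter the pushed lists; they only appear when home_timeline merges them in at read time.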
Redis Timeline Cache (Sorted Sets)
Each active user's home timeline is a Redis Sorted Set: key = timeline:{user_id}, member = tweet_id, score = timestamp (Unix epoch). For each new tweet, the fan-out worker runs ZADD timeline:{follower_id} {timestamp} {tweet_id}. On read, ZREVRANGE timeline:{user_id} 0 19 returns the 20 most recent tweet IDs in O(log N + M), where N is the timeline size and M = 20 is the number of IDs returned. Each timeline is capped at 800 tweets: after a write, ZREMRANGEBYRANK timeline:{user_id} 0 -801 trims everything older than the newest 800. Timelines of inactive users (30+ days without activity) are evicted.
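To make the sorted-set semantics concrete, here is a small in-memory stand-in. It is not a Redis client: the class TimelineCache and its methods are hypothetical, and each method notes in a comment which Redis command it imitates. It does not reproduce Redis's O(log N) cost, only the ordering and trimming behavior.

```python
class TimelineCache:
    """In-memory imitation of per-user Redis Sorted Sets (illustration only)."""

    def __init__(self, cap=800):
        self.cap = cap
        self._zsets = {}  # key -> {member: score}

    @staticmethod
    def _rank_key(zset):
        # Redis orders by score, then by member for equal scores.
        return lambda member: (zset[member], member)

    def push(self, user_id, tweet_id, timestamp):
        zset = self._zsets.setdefault(f"timeline:{user_id}", {})
        # Imitates: ZADD timeline:{user_id} {timestamp} {tweet_id}
        zset[tweet_id] = timestamp
        # Imitates: ZREMRANGEBYRANK timeline:{user_id} 0 -(cap + 1)
        overflow = len(zset) - self.cap
        if overflow > 0:
            for member in sorted(zset, key=self._rank_key(zset))[:overflow]:
                del zset[member]

    def latest(self, user_id, count=20):
        # Imitates: ZREVRANGE timeline:{user_id} 0 {count - 1}
        zset = self._zsets.get(f"timeline:{user_id}", {})
        return sorted(zset, key=self._rank_key(zset), reverse=True)[:count]


cache = TimelineCache(cap=3)
for i in range(1, 6):
    cache.push("u1", 100 + i, timestamp=i)
print(cache.latest("u1", count=2))
```

Because a sorted set stores each member once, pushing the same tweet_id twice updates its score rather than duplicating the entry, which makes retried fan-out writes harmless.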
Kafka Fan-Out Worker Architecture
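When a tweet is posted: (1) the tweet is saved to Cassandra/MySQL (primary storage), (2) a Kafka event {tweet_id, user_id, timestamp} is published, keyed by user_id, (3) fan-out workers consume the event from Kafka, (4) the worker fetches the author's follower list from the Social Graph DB, (5) for each follower, the worker issues a ZADD to that follower's Redis timeline.
Workers run in parallel as one consumer group. Within a group, Kafka assigns each partition to exactly one worker, so throughput scales by adding partitions and workers up to the partition count. Kafka also acts as a buffer: if workers fall behind (e.g., during a large fan-out), events accumulate in the log as consumer lag instead of being dropped, and workers catch up once the spike passes.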
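The pipeline can be sketched without a broker by modelling each partition as a queue. Everything here is a stand-in: FanOutPipeline, partition_for, and the partition count of 4 are hypothetical, deques replace Kafka partitions, and a dictionary replaces the Social Graph DB. The sketch shows two properties described above: events for one author always land in the same partition, and unprocessed events wait in the queue as lag.

```python
import zlib
from collections import defaultdict, deque

NUM_PARTITIONS = 4  # illustrative; real topics would use many more


def partition_for(user_id: str) -> int:
    # Stable hash so one author's events always map to one partition.
    return zlib.crc32(user_id.encode()) % NUM_PARTITIONS


class FanOutPipeline:
    def __init__(self, followers):
        self.followers = followers                      # stand-in for the Social Graph DB
        self.partitions = [deque() for _ in range(NUM_PARTITIONS)]
        self.timelines = defaultdict(list)              # stand-in for Redis timelines

    def publish(self, tweet_id, user_id, timestamp):
        event = {"tweet_id": tweet_id, "user_id": user_id, "timestamp": timestamp}
        self.partitions[partition_for(user_id)].append(event)

    def lag(self, partition):
        return len(self.partitions[partition])

    def run_worker(self, partition, max_events=None):
        """Drain up to max_events from one partition; returns events processed."""
        queue = self.partitions[partition]
        processed = 0
        while queue and (max_events is None or processed < max_events):
            event = queue.popleft()
            for follower in self.followers.get(event["user_id"], ()):
                # In the real design this is a ZADD to the follower's timeline.
                self.timelines[follower].append((event["timestamp"], event["tweet_id"]))
            processed += 1
        return processed
```

Calling run_worker with a small max_events imitates a slow worker: the remaining events stay queued and lag stays above zero until a later call drains them.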
Trending Topics (Count-Min Sketch)
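Trending topics require counting hashtag/term frequency in a sliding time window (last 1 hour). Storing exact counts for every word in every tweet is too expensive. Instead, a Count-Min Sketch (a probabilistic data structure) approximates counts in fixed memory with configurable error bounds; it can overestimate a count because of hash collisions but never underestimates it. A Kafka stream of all tweets feeds a Flink/Spark Streaming job that updates the sketch every 10 seconds. Because the sketch cannot enumerate the terms it has seen, the top-K trending topics are maintained alongside it, typically as a small heap of candidate terms ranked by their estimated counts.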
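A minimal Count-Min Sketch looks like the following. The width, depth, and the use of salted BLAKE2b as the hash family are assumptions chosen for illustration, and the sliding one-hour window is left out; the sketch only shows the update, the estimate, and a top-K pass over a separately tracked candidate set.

```python
import hashlib
import heapq


class CountMinSketch:
    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1):
        for row, col in self._cells(item):
            self.rows[row][col] += count

    def estimate(self, item: str) -> int:
        # The minimum across rows is the least collision-inflated counter.
        return min(self.rows[row][col] for row, col in self._cells(item))


def top_k(sketch, candidates, k):
    """Rank a separately tracked candidate set by estimated count."""
    return heapq.nlargest(k, candidates, key=sketch.estimate)


sketch = CountMinSketch()
for tag, n in (("#ai", 50), ("#rust", 20), ("#go", 5)):
    sketch.add(tag, n)
print(top_k(sketch, ["#ai", "#rust", "#go"], k=2))
```

Memory use is width × depth counters regardless of how many distinct terms arrive, which is the property that makes the structure attractive here. A wider table lowers the collision error; more rows lower the probability of a bad estimate.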
Tweet Storage (Cassandra + Caching)
Tweets are stored in Cassandra. For point lookups the partition key is tweet_id; for a user's own timeline page the partition key is user_id with timestamp as the clustering key, so one partition returns that user's tweets in time order. Cassandra handles the append-only write pattern of 5,787 tweets/sec with no write contention. For the most recent tweets (< 7 days old), which dominate read traffic, a Redis String cache (tweet_id → tweet_content_json) provides O(1) reads. The cache hit rate is expected to exceed 95%, since most timeline reads fetch tweets from the last 24 hours.
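The read path described above is a cache-aside lookup. The sketch below uses dictionaries as stand-ins for both Cassandra and Redis; TweetReader, the created_at field, and the explicit now argument are hypothetical and exist to keep the example deterministic. A real Redis cache would also set a TTL on each key so entries expire, which this sketch omits.

```python
import json

SEVEN_DAYS_S = 7 * 24 * 3600  # cache only tweets newer than this, per the text


class TweetReader:
    def __init__(self, db):
        self.db = db        # stand-in for Cassandra: tweet_id -> dict
        self.cache = {}     # stand-in for Redis Strings: tweet_id -> JSON text
        self.hits = 0
        self.misses = 0

    def get(self, tweet_id, now):
        cached = self.cache.get(tweet_id)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        # Cache miss: fall back to primary storage.
        self.misses += 1
        tweet = self.db.get(tweet_id)
        # Populate the cache only for recent tweets.
        if tweet is not None and now - tweet["created_at"] < SEVEN_DAYS_S:
            self.cache[tweet_id] = json.dumps(tweet)
        return tweet

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Tracking hits and misses in the reader makes it straightforward to check whether a hit-rate target such as 95% actually holds for a given traffic pattern.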
Search (Elasticsearch + Real-Time Indexing)
Tweet search (full-text search by content, hashtags, and mentioned users) uses Elasticsearch. Every tweet event published to Kafka is consumed by an Indexer service that writes to Elasticsearch in near-real-time (< 5 second delay). Elasticsearch handles full-text search with relevance ranking, hashtag search, user search, date range filtering, and geographic search (geo-tagged tweets). At 5,787 tweets/sec, ingestion has to be spread across multiple indexer instances and index shards, with writes sent in batches rather than one document per request.
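One common way to keep indexing within a delay budget is to buffer events and flush them in batches, either when the batch is full or when the oldest buffered event has waited long enough. The sketch below shows that policy only. BulkIndexer, send_bulk, and the default limits are hypothetical; in a real indexer send_bulk would wrap a call to Elasticsearch's bulk endpoint, which is not shown here.

```python
class BulkIndexer:
    """Buffers documents and flushes by size or age (illustration only)."""

    def __init__(self, send_bulk, max_docs=500, max_wait_s=1.0):
        self.send_bulk = send_bulk      # hypothetical callable taking a list of docs
        self.max_docs = max_docs
        self.max_wait_s = max_wait_s    # keep well under the 5 second target
        self.buffer = []
        self.oldest_ts = None

    def add(self, doc, now):
        if not self.buffer:
            self.oldest_ts = now        # age is measured from the first buffered doc
        self.buffer.append(doc)
        self.maybe_flush(now)

    def maybe_flush(self, now):
        if not self.buffer:
            return False
        full = len(self.buffer) >= self.max_docs
        stale = now - self.oldest_ts >= self.max_wait_s
        if full or stale:
            self.send_bulk(self.buffer)
            self.buffer = []
            self.oldest_ts = None
            return True
        return False


sent = []
indexer = BulkIndexer(sent.append, max_docs=2, max_wait_s=5.0)
indexer.add({"tweet_id": 1}, now=0.0)
indexer.add({"tweet_id": 2}, now=1.0)   # batch is full, so it is flushed
print(len(sent))
```

The age check matters during quiet periods: without it, a partially filled batch could sit in memory indefinitely and break the near-real-time guarantee. A periodic timer calling maybe_flush covers that case.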
