Advanced20 min readReal-world Case Studies

Design Twitter

Handling write spikes and rendering active timelines instantly using hybrid fan-out models.

What you'll learn

Hybrid Fan-Out (Push + Pull by Follower Count)
Redis Timeline Cache (Sorted Sets)
Kafka Fan-Out Worker Architecture
Trending Topics (Count-Min Sketch)
Tweet Storage (Cassandra + Caching)
Search (Elasticsearch + Real-Time Indexing)

TL;DR

Handling write spikes and rendering active timelines instantly using hybrid fan-out models.

Visual System Topology

Twitter/X — Hybrid Fan-Out Architecture

Client

──►

API Gateway

──►

Kafka Event Bus

Fan-Out Workers push to follower feeds

Redis Timeline Cache sorted sets by timestamp

Search + Trending Elasticsearch + count-min

Tweet Store Cassandra / MySQL

Social Graph DB follows / followers

Celebrity Pull merged at read time

Normal user posts: Kafka → fan-out workers → ZADD to followers' Redis timeline sorted sets
Celebrity posts (>500K followers): stored in DB only → pulled on feed load → merged with Redis timeline

Concept Overview

Twitter/X is a microblogging platform serving 500M+ DAU with 500M tweets/day. Its core design challenge: generating a personalized home timeline for millions of users in < 200ms when each user follows hundreds of accounts tweeting thousands of times per second.

Functional Requirements:

Post tweets (up to 280 chars, with media/polls)
Home timeline (posts from followed accounts, reverse chronological + ranked)
Follow/unfollow users
Like, retweet, reply, quote tweet
Trending topics
Search tweets and users
Direct messages

Non-Functional Requirements:

< 200ms home timeline load
99.9% availability
Eventual consistency acceptable (seeing a tweet 1–2 seconds late is fine)
Millions of concurrent users

Capacity Estimation (500M DAU):

Tweets/day: 500M = 5,787 tweets/sec
Avg followers per user: 200
Fan-out writes: 5,787 × 200 = 1.15M Redis writes/sec (normal users only)
Timeline reads: 500M × 5 opens = 2.5B/day = 29K reads/sec
Tweet storage: 500M × 300 bytes = 150 GB/day
Redis timeline memory: 100M active users × 800 tweets × 8B = 640 GB

Key Architectural Pillars

Hybrid Fan-Out (Push + Pull by Follower Count)

Push model (Fan-out Write): When User A tweets, workers push tweet_id to each follower's Redis timeline. Timeline read = O(1). Problem: @elonmusk (100M followers) tweets → 100M Redis ZADD operations. That's 100M writes for ONE tweet. Pull model (Fan-out Read): On timeline load, fetch recent tweets from each followed account. Problem: 500 follows → 500 DB queries per load = too slow. Hybrid: Push for users with < 500K followers (99.9% of users). Pull for celebrities (≥ 500K followers). Merge celebrity tweets into timeline at read time.

Example: Regular user tweets (250 followers): fan-out workers ZADD tweet_id to 250 Redis sorted sets = 250 writes. @nasa tweets (60M followers): tweet stored in DB, NOT pushed. When follower loads timeline: Redis LRANGE for 20 tweets + SELECT recent tweets FROM nasa_tweets → merge → return.

Redis Timeline Cache (Sorted Sets)

Each active user's home timeline is a Redis Sorted Set: key = timeline:{user_id}, value = tweet_id, score = timestamp (Unix epoch). Fan-out worker: ZADD timeline:{follower_id} {timestamp} {tweet_id} for each new tweet. On read: ZREVRANGE timeline:{user_id} 0 19 returns 20 most recent tweet IDs in O(log N). Each timeline is capped at 800 tweets (ZREMRANGEBYRANK to trim oldest). Inactive users (30+ days) have feeds evicted.

Example: Timeline load: ZREVRANGE timeline:{user_id} 0 19 → 20 tweet_ids in < 1ms → batch fetch tweet content from Cassandra or cache → merge celebrity tweets → return JSON. Total pipeline: < 200ms for 99th percentile.

Kafka Fan-Out Worker Architecture

When a tweet is posted: (1) Tweet saved to Cassandra/MySQL (primary storage), (2) Kafka event published {tweet_id, user_id, timestamp}, (3) Fan-out workers (sharded by user_id) consume from Kafka, (4) Worker fetches follower list from Social Graph DB, (5) For each follower: ZADD to their Redis timeline. Workers are parallelized — multiple workers per Kafka partition. Kafka provides backpressure: if workers are slow (e.g., celebrity fan-out), Kafka absorbs the lag.

Example: Kafka partition key = tweet.user_id. Fan-out workers sharded by same key. Worker receives tweet event → queries followers from Social Graph → issues 250 Redis ZADD commands in a pipelined batch (not one-by-one). Batching reduces Redis RTT overhead significantly.

Tweet Storage (Cassandra + Caching)

Tweets are stored in Cassandra: partition_key = tweet_id (or user_id for user timeline page), clustering_key = timestamp. Cassandra handles the append-only write pattern of 5,787 tweets/sec with no write contention. For the most recent tweets (< 7 days old) that dominate read traffic, a Redis String cache (tweet_id → tweet_content_json) provides O(1) reads. Cache hit rate > 95% since most timeline reads fetch tweets from the last 24 hours.

Example: Timeline load fetches 20 tweet_ids from Redis timeline cache → multi-GET from Redis tweet cache (95% hit) → remaining 5% fetched from Cassandra → return. Total Cassandra reads per timeline: ~1 on average vs 20 without tweet caching.

Search (Elasticsearch + Real-Time Indexing)

Tweet search (full-text search by content, hashtags, mentioned users) uses Elasticsearch. Every tweet event published to Kafka is consumed by an Indexer service that writes to Elasticsearch in near-real-time (< 5 second delay). Elasticsearch handles: full-text search with relevance ranking, hashtag search, user search, date range filtering, and geographic search (geo-tagged tweets). At 5,787 tweets/sec, Elasticsearch ingestion must be highly parallelized.

Example: Search "#WorldCup": Elasticsearch query → match hashtag "#WorldCup" in last 1 hour → sort by engagement (likes + retweets) → return top 20. Elasticsearch handles this with < 100ms latency across 500M+ tweet index.