ReviseAlgo Logo
Advanced20 min readReal-world Case Studies

Design Twitter

Handling write spikes and rendering active timelines instantly using hybrid fan-out models.

What you'll learn

  • Hybrid Fan-Out (Push + Pull by Follower Count)
  • Redis Timeline Cache (Sorted Sets)
  • Kafka Fan-Out Worker Architecture
  • Trending Topics (Count-Min Sketch)
  • Tweet Storage (Cassandra + Caching)
  • Search (Elasticsearch + Real-Time Indexing)

TL;DR

Handling write spikes and rendering active timelines instantly using hybrid fan-out models.

Visual System Topology

Twitter/X — Hybrid Fan-Out Architecture

Client
API Gateway
Kafka Event Bus
Fan-Out Workers push to follower feeds
Redis Timeline Cache sorted sets by timestamp
Search + Trending Elasticsearch + count-min
Tweet Store Cassandra / MySQL
Social Graph DB follows / followers
Celebrity Pull merged at read time
Normal user posts: Kafka → fan-out workers → ZADD to followers' Redis timeline sorted sets
Celebrity posts (>500K followers): stored in DB only → pulled on feed load → merged with Redis timeline

Concept Overview

Twitter/X is a microblogging platform serving 500M+ DAU with 500M tweets/day. Its core design challenge: generating a personalized home timeline for millions of users in < 200ms when each user follows hundreds of accounts tweeting thousands of times per second.

Functional Requirements:

  • Post tweets (up to 280 chars, with media/polls)
  • Home timeline (posts from followed accounts, reverse chronological + ranked)
  • Follow/unfollow users
  • Like, retweet, reply, quote tweet
  • Trending topics
  • Search tweets and users
  • Direct messages

Non-Functional Requirements:

  • < 200ms home timeline load
  • 99.9% availability
  • Eventual consistency acceptable (seeing a tweet 1–2 seconds late is fine)
  • Millions of concurrent users

Capacity Estimation (500M DAU):

  • Tweets/day: 500M = 5,787 tweets/sec
  • Avg followers per user: 200
  • Fan-out writes: 5,787 × 200 = 1.15M Redis writes/sec (normal users only)
  • Timeline reads: 500M × 5 opens = 2.5B/day = 29K reads/sec
  • Tweet storage: 500M × 300 bytes = 150 GB/day
  • Redis timeline memory: 100M active users × 800 tweets × 8B = 640 GB

Key Architectural Pillars

1

Hybrid Fan-Out (Push + Pull by Follower Count)

Push model (Fan-out Write): When User A tweets, workers push tweet_id to each follower's Redis timeline. Timeline read = O(1). Problem: @elonmusk (100M followers) tweets → 100M Redis ZADD operations. That's 100M writes for ONE tweet. Pull model (Fan-out Read): On timeline load, fetch recent tweets from each followed account. Problem: 500 follows → 500 DB queries per load = too slow. Hybrid: Push for users with < 500K followers (99.9% of users). Pull for celebrities (≥ 500K followers). Merge celebrity tweets into timeline at read time.

Example: Regular user tweets (250 followers): fan-out workers ZADD tweet_id to 250 Redis sorted sets = 250 writes. @nasa tweets (60M followers): tweet stored in DB, NOT pushed. When follower loads timeline: Redis LRANGE for 20 tweets + SELECT recent tweets FROM nasa_tweets → merge → return.
2

Redis Timeline Cache (Sorted Sets)

Each active user's home timeline is a Redis Sorted Set: key = timeline:{user_id}, value = tweet_id, score = timestamp (Unix epoch). Fan-out worker: ZADD timeline:{follower_id} {timestamp} {tweet_id} for each new tweet. On read: ZREVRANGE timeline:{user_id} 0 19 returns 20 most recent tweet IDs in O(log N). Each timeline is capped at 800 tweets (ZREMRANGEBYRANK to trim oldest). Inactive users (30+ days) have feeds evicted.

Example: Timeline load: ZREVRANGE timeline:{user_id} 0 19 → 20 tweet_ids in < 1ms → batch fetch tweet content from Cassandra or cache → merge celebrity tweets → return JSON. Total pipeline: < 200ms for 99th percentile.
3

Kafka Fan-Out Worker Architecture

When a tweet is posted: (1) Tweet saved to Cassandra/MySQL (primary storage), (2) Kafka event published {tweet_id, user_id, timestamp}, (3) Fan-out workers (sharded by user_id) consume from Kafka, (4) Worker fetches follower list from Social Graph DB, (5) For each follower: ZADD to their Redis timeline. Workers are parallelized — multiple workers per Kafka partition. Kafka provides backpressure: if workers are slow (e.g., celebrity fan-out), Kafka absorbs the lag.

Example: Kafka partition key = tweet.user_id. Fan-out workers sharded by same key. Worker receives tweet event → queries followers from Social Graph → issues 250 Redis ZADD commands in a pipelined batch (not one-by-one). Batching reduces Redis RTT overhead significantly.
4

Trending Topics (Count-Min Sketch)

Trending topics require counting hashtag/term frequency in a sliding time window (last 1 hour). Storing exact counts for every word in every tweet is too expensive. Instead: Count-Min Sketch (probabilistic data structure) approximates counts with configurable error bounds using minimal memory. A Kafka stream of all tweets feeds a Flink/Spark Streaming job that updates the sketch every 10 seconds. Top-K trending topics extracted from the sketch.

Example: Count-Min Sketch uses w hash functions × d counters each. Update: HASHTAG → hash_1(tag) through hash_w(tag) → increment those d counters. Query: min(counter at each hash position) = estimated frequency. Memory: 4MB handles billions of unique terms with < 1% error. SQL COUNT(*) would require TBs.
5

Tweet Storage (Cassandra + Caching)

Tweets are stored in Cassandra: partition_key = tweet_id (or user_id for user timeline page), clustering_key = timestamp. Cassandra handles the append-only write pattern of 5,787 tweets/sec with no write contention. For the most recent tweets (< 7 days old) that dominate read traffic, a Redis String cache (tweet_id → tweet_content_json) provides O(1) reads. Cache hit rate > 95% since most timeline reads fetch tweets from the last 24 hours.

Example: Timeline load fetches 20 tweet_ids from Redis timeline cache → multi-GET from Redis tweet cache (95% hit) → remaining 5% fetched from Cassandra → return. Total Cassandra reads per timeline: ~1 on average vs 20 without tweet caching.
6

Search (Elasticsearch + Real-Time Indexing)

Tweet search (full-text search by content, hashtags, mentioned users) uses Elasticsearch. Every tweet event published to Kafka is consumed by an Indexer service that writes to Elasticsearch in near-real-time (< 5 second delay). Elasticsearch handles: full-text search with relevance ranking, hashtag search, user search, date range filtering, and geographic search (geo-tagged tweets). At 5,787 tweets/sec, Elasticsearch ingestion must be highly parallelized.

Example: Search "#WorldCup": Elasticsearch query → match hashtag "#WorldCup" in last 1 hour → sort by engagement (likes + retweets) → return top 20. Elasticsearch handles this with < 100ms latency across 500M+ tweet index.

AI Tutor

Ask about the topic

Sign in Required

Please sign in to use the AI tutor

Sign In
Design Twitter - Module 10: Real-world Case Studies | System Design | Revise Algo