Design Instagram
Uploading images, caching stories globally, CDN lookups, and hybrid read-write push-pull home feeds.
What you'll learn
- S3 + CDN for Global Media Delivery
- Hybrid Fan-Out (Push + Pull by Follower Count)
- Redis Feed Cache (Sorted Sets)
- Async Image Processing Pipeline
- PostgreSQL for Social Graph + Post Metadata
- Explore Page (Elasticsearch + ML)
Visual System Topology
[Figure: Instagram photo platform architecture. Diagram note: celebrity posts are pulled on feed load and merged with the Redis feed, not pushed to 400M follower lists.]
Concept Overview
Instagram is a photo/video sharing platform serving 2B+ users. It is read-heavy: for every post uploaded, there are roughly 100 feed views and image loads.
Functional Requirements:
- Upload photos and videos (posts, reels, stories)
- Home feed of posts from followed users
- Like, comment, follow/unfollow
- Stories (24-hour ephemeral content)
- Explore page (content discovery from accounts the user does not follow)
- Push notifications
Non-Functional Requirements:
- < 200ms feed load time
- 99.9% availability
- Eventual consistency acceptable for feed (seeing a post 1–2 seconds late is fine)
- Media must be durable (S3 is designed for eleven nines, 99.999999999%, of object durability)
Capacity Estimation (500M DAU, 2B total users):
- Uploads: 100M posts/day = 1,157 posts/sec
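- Feed reads: 500M × 5 opens = 2.5B/day ≈ 29K reads/sec (about 25 feed opens per upload; each open loads several images, so total reads per write are higher, in line with the ~100:1 estimate above)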
- Storage per post: 200KB compressed image + metadata
- Storage/day: 100M × 200KB = 20 TB/day media
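- CDN traffic: 2.5B views × 200KB = 500 TB/day (counting one image per feed view; feeds that load several images scale this up)
- Redis feed memory: 100M active users × 200 post IDs × 8 B per ID = 160 GB (raw IDs only; sorted-set overhead is extra)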
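The estimates above are plain arithmetic, so they are easy to re-derive when an input changes. A minimal sketch, assuming decimal units (1 KB = 1,000 bytes) and one image per feed open; the variable names are illustrative only:

```python
SECONDS_PER_DAY = 86_400

dau = 500_000_000             # daily active users
posts_per_day = 100_000_000   # uploads per day
feed_opens_per_user = 5       # feed opens per user per day
image_bytes = 200_000         # 200 KB compressed image (decimal units)
cached_feeds = 100_000_000    # active users with a feed held in Redis
ids_per_feed = 200            # post IDs kept per cached feed
bytes_per_id = 8              # one 64-bit post ID

uploads_per_sec = posts_per_day / SECONDS_PER_DAY
feed_reads_per_day = dau * feed_opens_per_user
feed_reads_per_sec = feed_reads_per_day / SECONDS_PER_DAY
feed_opens_per_upload = feed_reads_per_day / posts_per_day
storage_tb_per_day = posts_per_day * image_bytes / 1e12
cdn_tb_per_day = feed_reads_per_day * image_bytes / 1e12  # one image per open
redis_gb = cached_feeds * ids_per_feed * bytes_per_id / 1e9

print(f"uploads/sec:           {uploads_per_sec:,.0f}")
print(f"feed reads/sec:        {feed_reads_per_sec:,.0f}")
print(f"feed opens per upload: {feed_opens_per_upload:.0f}")
print(f"media storage TB/day:  {storage_tb_per_day:.0f}")
print(f"CDN traffic TB/day:    {cdn_tb_per_day:.0f}")
print(f"Redis feed memory GB:  {redis_gb:.0f}")
```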
Key Architectural Pillars
S3 + CDN for Global Media Delivery
Uploaded images are stored in AWS S3 (object storage). Serving global users straight from S3 is too slow, because every request would have to travel to a single AWS region. A CDN (CloudFront/Fastly) caches images at hundreds of edge locations worldwide, so users download from the nearest edge (< 50ms) instead of the origin S3 bucket. The CDN serves 95%+ of image requests; the origin is hit only on cache misses, typically the first request for an image at each edge location.
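To make the edge/origin split concrete, here is a toy model of one edge node in front of the origin. It is only a sketch: the `Origin` and `EdgeCache` classes are hypothetical stand-ins, real CDNs add TTLs, cache-control headers, and tiered caches, and the LRU policy is an assumption.

```python
from collections import OrderedDict


class Origin:
    """Stand-in for the S3 bucket; counts how often it is asked for an object."""

    def __init__(self, objects):
        self.objects = objects
        self.requests = 0

    def get(self, key):
        self.requests += 1
        return self.objects[key]


class EdgeCache:
    """One CDN edge node: a small LRU cache that falls back to the origin."""

    def __init__(self, origin, capacity):
        self.origin = origin
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        body = self.origin.get(key)          # cache miss: go to the origin
        self.cache[key] = body
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict the least recently used
        return body


origin = Origin({"img/1.webp": b"image-1", "img/2.webp": b"image-2"})
edge = EdgeCache(origin, capacity=2)
for _ in range(50):
    edge.get("img/1.webp")
    edge.get("img/2.webp")

hit_ratio = edge.hits / (edge.hits + edge.misses)
print(origin.requests, f"{hit_ratio:.0%}")   # 2 origin requests, 98% hit ratio
```

In this run the origin is contacted once per image, and every later request is served from the edge.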
Hybrid Fan-Out (Push + Pull by Follower Count)
Two basic strategies exist, and each breaks down at one extreme:
- Push (fan-out on write): when User A posts, push the post_id into every follower's Redis feed. A feed read is then a single lookup of a precomputed list. Problem: @selenagomez (400M followers) means 400M Redis writes per post, a write explosion.
- Pull (fan-out on read): on feed load, fetch recent posts from each followed account. Problem: following 500 accounts means up to 500 DB queries per load, which is slow.
- Hybrid: push for authors with fewer than 1M followers, pull for celebrities, and merge the two at read time.
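The hybrid rule can be sketched in a few lines. This is an in-memory illustration, not a production design: the dictionaries stand in for Redis and the post store, all function names are hypothetical, and `follower_count` is kept separately so the demo can simulate a celebrity without creating millions of followers.

```python
import heapq
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000      # from the text: push below 1M followers

followers = defaultdict(set)         # author -> users who follow them
following = defaultdict(set)         # user -> authors they follow
follower_count = defaultdict(int)    # author -> follower count
feeds = defaultdict(list)            # user -> pushed (timestamp, post_id) entries
posts_by_author = defaultdict(list)  # author -> (timestamp, post_id), oldest first


def follow(user, author):
    following[user].add(author)
    followers[author].add(user)
    follower_count[author] += 1


def is_celebrity(author):
    return follower_count[author] >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    posts_by_author[author].append((ts, post_id))
    if is_celebrity(author):
        return                        # pull path: no fan-out on write
    for user in followers[author]:    # push path: one write per follower
        feeds[user].append((ts, post_id))


def load_feed(user, limit=20):
    candidates = set(feeds[user])     # pushed entries
    for author in following[user]:
        if is_celebrity(author):      # pulled entries, merged on read
            candidates.update(posts_by_author[author][-limit:])
    return [post_id for _, post_id in heapq.nlargest(limit, candidates)]


follow("alice", "bob")
follow("alice", "star")
follower_count["star"] = 400_000_000  # simulated celebrity
publish("bob", "p1", ts=100)
publish("star", "p2", ts=200)
publish("bob", "p3", ts=300)
print(load_feed("alice"))             # ['p3', 'p2', 'p1']
```

The celebrity post never touches alice's stored feed; it appears only because `load_feed` merges it in at read time.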
Redis Feed Cache (Sorted Sets)
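Each active user's home feed is a Redis Sorted Set: key = feed:{user_id}, member = post_id, score = timestamp. On post, one ZADD per follower adds the post_id to that follower's feed. On feed load, ZREVRANGE (ZRANGE with REV in Redis 6.2+) returns the 20 most recent post_ids in O(log N + M) time, which is effectively constant because N is capped. Eviction: keep only the most recent 800 posts per feed, for example by trimming with ZREMRANGEBYRANK after each write. Feeds of inactive users (30+ days offline) are evicted and rebuilt on next login via Pull.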
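A minimal sketch of these operations, assuming `r` is a redis-py client (`redis.Redis`). The function names and the `PAGE_SIZE` constant are illustrative, and trimming on every write is one possible policy rather than the only one.

```python
import time

FEED_CAP = 800    # from the text: keep only the most recent 800 entries
PAGE_SIZE = 20    # entries returned per feed page


def feed_key(user_id):
    return f"feed:{user_id}"


def push_post(r, follower_ids, post_id, ts=None):
    """Add post_id to each follower's feed, then trim the feed to FEED_CAP."""
    ts = time.time() if ts is None else ts
    pipe = r.pipeline()               # batch the commands into one round trip
    for uid in follower_ids:
        key = feed_key(uid)
        pipe.zadd(key, {post_id: ts})                  # ZADD feed:{uid} ts post_id
        # Ranks run from oldest (0) to newest (-1), so this removes everything
        # except the newest FEED_CAP members.
        pipe.zremrangebyrank(key, 0, -(FEED_CAP + 1))
    pipe.execute()


def read_feed(r, user_id, page=0):
    """Return one page of post IDs, newest first."""
    start = page * PAGE_SIZE
    return r.zrevrange(feed_key(user_id), start, start + PAGE_SIZE - 1)


# Usage against a real server (not executed here):
#   import redis
#   r = redis.Redis(decode_responses=True)   # return str instead of bytes
#   push_post(r, ["u1", "u2"], "post:123")
#   read_feed(r, "u1")
```

Because the cap bounds N, both the write and the read stay cheap regardless of how long a user has been active.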
Async Image Processing Pipeline
Raw uploads are processed asynchronously after the upload completes: (1) Validation (file type, size limit), (2) Compression to WebP (~30% smaller than JPEG), (3) Resize to three sizes (thumbnail, medium, full), (4) Store all sizes to S3, (5) Write CDN URLs to PostgreSQL post metadata. The user sees an immediate "uploading" state while processing finishes in the background, so the upload request never blocks on it.
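The shape of such a pipeline can be sketched with a queue and a background worker. Everything below is an assumption made for illustration: the size limit, the variant widths, the `cdn.example.com` host, and the dictionaries standing in for S3 and PostgreSQL. Actual decoding and WebP encoding are left as a comment, so the image dimensions are passed in rather than read from the file.

```python
import asyncio

MAX_UPLOAD_BYTES = 20 * 1024 * 1024   # assumed limit, not from the text
VARIANT_WIDTHS = {"thumbnail": 150, "medium": 640, "full": 1080}  # assumed
MAGIC = {b"\xff\xd8\xff": "jpeg", b"\x89PNG\r\n\x1a\n": "png"}


def validate(data):
    """Step 1: reject oversized files and unknown file types."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    for magic, kind in MAGIC.items():
        if data.startswith(magic):
            return kind
    raise ValueError("unsupported file type")


def plan_variants(width, height):
    """Step 3: target sizes that keep the aspect ratio and never upscale."""
    plan = {}
    for name, target_width in VARIANT_WIDTHS.items():
        w = min(width, target_width)
        plan[name] = (w, max(1, round(height * w / width)))
    return plan


async def process_upload(post_id, data, width, height, store, metadata):
    validate(data)
    urls = {}
    for name, (w, h) in plan_variants(width, height).items():
        # Steps 2-3: a real worker would resize to (w, h) and encode WebP here.
        key = f"posts/{post_id}/{name}.webp"
        store[key] = (w, h)                           # step 4: stand-in for S3 PUT
        urls[name] = f"https://cdn.example.com/{key}"
    metadata[post_id] = {"status": "ready", "urls": urls}   # step 5


async def worker(queue, store, metadata):
    while True:
        job = await queue.get()
        try:
            await process_upload(*job, store, metadata)
        except ValueError as exc:
            metadata[job[0]] = {"status": "rejected", "reason": str(exc)}
        finally:
            queue.task_done()


async def main():
    queue, store, metadata = asyncio.Queue(), {}, {}
    task = asyncio.create_task(worker(queue, store, metadata))
    # The upload handler only records the post and enqueues the job.
    metadata["p1"] = {"status": "processing"}
    await queue.put(("p1", b"\xff\xd8\xff" + b"\x00" * 100, 4000, 3000))
    metadata["p2"] = {"status": "processing"}
    await queue.put(("p2", b"not an image", 10, 10))
    await queue.join()                                # wait for the demo jobs
    task.cancel()
    return store, metadata


store, metadata = asyncio.run(main())
print(metadata["p1"]["status"], metadata["p2"]["status"])
```

The "processing" status written before enqueueing is what lets the client show the "uploading" state immediately.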
PostgreSQL for Social Graph + Post Metadata
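The follows table (follower_id, following_id, created_at) is stored in PostgreSQL with an index on each of the two ID columns, one per access pattern: "who does user A follow?" (by follower_id) and "who follows user A?" (by following_id, used for fan-out). At Instagram scale this table has billions of rows and requires horizontal sharding by user_id. Sharding on one of the two columns keeps one of these lookups on a single shard and turns the other into a cross-shard query unless each edge is stored twice. Instagram has publicly described running both sharded PostgreSQL and Cassandra in production.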
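The two access patterns map onto two indexes. The sketch below uses Python's built-in sqlite3 as a stand-in for PostgreSQL so that it runs without a server; the index name, the composite primary key, and the modulo `shard_for` helper are illustrative assumptions, not Instagram's actual schema.

```python
import sqlite3

NUM_SHARDS = 64   # hypothetical shard count


def shard_for(user_id):
    """Naive shard routing; real systems usually map logical shards to hosts."""
    return user_id % NUM_SHARDS


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE follows (
    follower_id  INTEGER NOT NULL,
    following_id INTEGER NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, following_id)  -- serves "who does A follow?"
);
-- serves "who follows A?", the lookup that fan-out needs
CREATE INDEX idx_follows_following ON follows (following_id, follower_id);
""")
db.executemany(
    "INSERT INTO follows (follower_id, following_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (4, 2)],
)


def following_of(user_id):
    rows = db.execute(
        "SELECT following_id FROM follows WHERE follower_id = ? "
        "ORDER BY following_id",
        (user_id,),
    )
    return [row[0] for row in rows]


def followers_of(user_id):
    rows = db.execute(
        "SELECT follower_id FROM follows WHERE following_id = ? "
        "ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]


print(following_of(1))   # [2, 3]
print(followers_of(2))   # [1, 4]
```

Putting both columns in each index lets either query be answered from the index alone.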
Explore Page (Elasticsearch + ML)
The Explore page discovers content from accounts you don't follow using: (1) Elasticsearch for hashtag and caption text search, (2) Collaborative filtering ML model for personalization ("users like you also engaged with X"), (3) Trending algorithm based on engagement velocity (high like rate in last hour). Post metadata is asynchronously indexed into Elasticsearch after upload.
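As one illustration of the trending signal, engagement velocity can be approximated by counting engagement events per post inside a sliding one-hour window. This is a sketch under that assumption; the `trending` function and its event format are hypothetical, and a production ranker would likely also normalize by post age or audience size.

```python
from collections import Counter

WINDOW_SECONDS = 3600   # "last hour" from the text


def trending(events, now, top_n=3):
    """Rank posts by number of engagement events in the last hour.

    events: iterable of (post_id, unix_timestamp) like or comment events.
    Returns a list of (post_id, events_in_window), highest first.
    """
    recent = Counter(
        post_id for post_id, ts in events if 0 <= now - ts <= WINDOW_SECONDS
    )
    return recent.most_common(top_n)


events = [
    ("a", 9_000), ("a", 9_500), ("b", 9_900),
    ("a", 9_990), ("c", 1_000), ("b", 9_950),
]
print(trending(events, now=10_000))   # [('a', 3), ('b', 2)]
```

Post "c" drops out because its only event is older than the window, which is the property that separates "trending" from "popular overall".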
