Design YouTube
Uploading large files, managing queues of parallel video-transcoding workers, indexing metadata, and delivering video through edge CDNs.
What you'll learn
- Chunked Upload (Resumable)
- Kafka-Triggered Transcoding Pipeline
- HLS / DASH — Adaptive Bitrate Streaming
- CDN Architecture for Video Delivery
- Approximate View Counts (Kafka + Batch)
- Video Metadata Storage (Bigtable)
Visual System Topology
Figure: YouTube video platform architecture. Streaming path: viewer → CDN edge (serves the .m3u8 manifest) → ABR player → segments fetched from the nearest edge.
Concept Overview
YouTube ingests 500+ hours of video every minute and serves over 1 billion hours of watch time daily. The system splits into two independent pipelines: the upload/transcoding pipeline (compute-heavy, asynchronous) and the streaming pipeline (bandwidth-heavy, CDN-driven).
Functional Requirements:
- Upload videos up to 12 hours long at resolutions up to 4K
- Stream videos with smooth adaptive playback
- Search videos by title, description, tags
- Personalized recommendations
- Like, comment, subscribe
- Creator analytics (views, watch time, ad revenue)
Non-Functional Requirements:
- < 0.5% rebuffering rate — smooth playback is the #1 UX metric
- Support millions of concurrent streams
- Eventual consistency for view counts (exact real-time counts not required)
- Video content must be durable — never lose a creator's upload
Capacity Estimation:
- Uploads: 500 hr/min ≈ 8.3 hr/sec → ~30 GB/sec raw video ingest (assuming ~3.6 GB per hour of raw upload, i.e. ~8 Mbps)
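- Transcoding cost: 1 hr of raw video → 5 quality levels × ~1 CPU-hr each = ~5 CPU-hours per hour of video; at 500 hr/min that is 2,500 CPU-hours of work arriving every minute, i.e. ~150,000 cores busy around the clock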
- Storage: 500 hr/min × 60 × 24 × 365 × 10 yr × 1 GB/hr × 5 qualities ≈ 13 exabytes of transcoded video (before replication)
- Concurrent streams: 1B watch-hours/day ÷ 24 hr/day ≈ 42M concurrent streams on average (session length cancels out: 2B sessions/day of 30 min ÷ 86,400 sec × 1,800 sec/session ≈ 42M)
- Bandwidth: ~42M streams × 3 Mbps avg ≈ 125 Tbps, served almost entirely from CDN
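The arithmetic behind these estimates can be reproduced with a short script. This is a sketch using the inputs listed above; the constant names are illustrative, and RAW_GB_PER_HOUR (about 3.6 GB per hour of raw upload, roughly 8 Mbps) is an assumed value chosen to reproduce the ~30 GB/sec ingest figure.

```python
# Back-of-envelope capacity model. Inputs mirror the estimates in the text;
# RAW_GB_PER_HOUR is an ASSUMPTION (~8 Mbps raw upload bitrate).
UPLOAD_HOURS_PER_MIN = 500           # hours of video uploaded per minute
RAW_GB_PER_HOUR = 3.6                # assumed size of one hour of raw upload
QUALITY_LEVELS = 5
CPU_HOURS_PER_LEVEL = 1.0            # per hour of video, per quality level
STORED_GB_PER_HOUR_PER_LEVEL = 1.0
RETENTION_YEARS = 10
WATCH_HOURS_PER_DAY = 1e9
AVG_STREAM_MBPS = 3.0


def estimate() -> dict[str, float]:
    upload_hours_per_sec = UPLOAD_HOURS_PER_MIN / 60
    cpu_hours_per_min = UPLOAD_HOURS_PER_MIN * QUALITY_LEVELS * CPU_HOURS_PER_LEVEL
    hours_stored = UPLOAD_HOURS_PER_MIN * 60 * 24 * 365 * RETENTION_YEARS
    # Little's law: average concurrency = watch-hours per day / hours per day.
    concurrent = WATCH_HOURS_PER_DAY / 24
    return {
        "ingest_gb_per_sec": upload_hours_per_sec * RAW_GB_PER_HOUR,
        "cpu_hours_per_min": cpu_hours_per_min,
        # CPU-hours arriving per minute x 60 = cores needed to keep up 24/7.
        "busy_cores": cpu_hours_per_min * 60,
        "storage_eb": hours_stored * STORED_GB_PER_HOUR_PER_LEVEL * QUALITY_LEVELS / 1e9,
        "concurrent_streams": concurrent,
        "bandwidth_tbps": concurrent * AVG_STREAM_MBPS / 1e6,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>20}: {value:,.1f}")
```

Run as a script, it reports about 30 GB/sec of ingest, 150,000 busy transcoding cores, ~13.1 EB of stored renditions over ten years, ~41.7M average concurrent streams, and ~125 Tbps of egress. Because concurrency is derived from total watch time, the 30-minute session length does not affect the result.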
Key Architectural Pillars
Chunked Upload (Resumable)
A 10 GB 4K video should not be sent as a single HTTP request: any network drop would force the upload to restart from the beginning. Instead, the client splits the video into 5–10 MB chunks, each tagged with a sequence number, and the Upload Service reassembles the chunks in S3. If the connection drops mid-upload, the client asks the server for the last confirmed chunk and resumes from there. Google Cloud Storage's resumable upload protocol is one production implementation of this pattern.
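The resume logic can be sketched in memory. This is a minimal illustration, not GCS's or S3's actual protocol: UploadSession, upload(), and the send callback are hypothetical names, the server-side state is a plain dict standing in for S3, and the demo uses 1 KB chunks instead of the 5–10 MB used in practice.

```python
from dataclasses import dataclass, field
from typing import Callable

DEFAULT_CHUNK_SIZE = 5 * 1024 * 1024  # lower end of the 5-10 MB range


@dataclass
class UploadSession:
    """Hypothetical server-side state for one resumable upload."""
    total_chunks: int
    received: dict[int, bytes] = field(default_factory=dict)

    def put_chunk(self, seq: int, payload: bytes) -> None:
        # Idempotent: re-sending a chunk after a retry just overwrites it.
        self.received[seq] = payload

    def next_missing(self) -> int:
        # First sequence number not yet confirmed; the client resumes here.
        seq = 0
        while seq in self.received:
            seq += 1
        return seq

    def assemble(self) -> bytes:
        if self.next_missing() < self.total_chunks:
            raise ValueError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


def split(data: bytes, chunk_size: int) -> list[bytes]:
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


SendFn = Callable[[UploadSession, int, bytes], None]


def upload(data: bytes, session: UploadSession, send: SendFn,
           chunk_size: int = DEFAULT_CHUNK_SIZE) -> None:
    """Send chunks starting from the server's first missing sequence number."""
    chunks = split(data, chunk_size)
    for seq in range(session.next_missing(), len(chunks)):
        send(session, seq, chunks[seq])  # may raise on a network drop


if __name__ == "__main__":
    video = bytes(range(256)) * 100          # 25,600-byte stand-in for a video
    size = 1024                              # tiny chunks for the demo
    session = UploadSession(total_chunks=-(-len(video) // size))
    attempts = [0]

    def flaky_send(s: UploadSession, seq: int, payload: bytes) -> None:
        attempts[0] += 1
        if attempts[0] == 10:                # simulate a dropped connection
            raise ConnectionError("network drop")
        s.put_chunk(seq, payload)

    try:
        upload(video, session, flaky_send, size)
    except ConnectionError:
        print("dropped; resuming at chunk", session.next_missing())
    upload(video, session, flaky_send, size)  # chunks 0-8 are not re-sent
    assert session.assemble() == video
```

The key design choice is that the server, not the client, is the source of truth for progress: after a drop the client only needs the first missing sequence number, and re-sending a chunk is harmless.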
Kafka-Triggered Transcoding Pipeline
Transcoding (converting raw video to multiple resolutions) is extremely CPU-intensive: 1 hour of 4K video takes 5–10 CPU-hours. The pipeline: upload complete → Kafka event (video_id, s3_raw_path) → Transcoding Workers pick up event → 5 parallel workers each transcode to one resolution (360p, 480p, 720p, 1080p, 4K) → HLS segment files stored back to S3. Kafka decouples the upload from the transcoding workload.
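The fan-out step can be sketched as a worker that reads one upload-complete event and encodes the five renditions in parallel. This is a sketch under stated assumptions: a stdlib queue.Queue stands in for the Kafka topic, transcode() is a hypothetical placeholder that returns a made-up playlist path instead of running an encoder, and only the event fields video_id and s3_raw_path come from the text.

```python
import json
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# The five renditions named in the text.
RESOLUTIONS = ["360p", "480p", "720p", "1080p", "4K"]


def transcode(video_id: str, raw_path: str, resolution: str) -> str:
    """Hypothetical stand-in for the CPU-heavy encode of one rendition.

    A real worker would read raw_path, encode HLS segments, and upload them;
    here we only return an illustrative path for the rendition's playlist.
    """
    return f"{video_id}/{resolution}/index.m3u8"


def transcode_all(event: dict, pool: ThreadPoolExecutor) -> dict[str, str]:
    futures = {
        res: pool.submit(transcode, event["video_id"], event["s3_raw_path"], res)
        for res in RESOLUTIONS
    }
    # Wait for all five renditions before treating the event as done.
    return {res: f.result() for res, f in futures.items()}


def run_worker(topic: "queue.Queue[Optional[bytes]]",
               results: dict[str, dict[str, str]]) -> None:
    """Drain events until a None sentinel; the queue stands in for Kafka."""
    with ThreadPoolExecutor(max_workers=len(RESOLUTIONS)) as pool:
        while (message := topic.get()) is not None:
            event = json.loads(message)  # {"video_id": ..., "s3_raw_path": ...}
            results[event["video_id"]] = transcode_all(event, pool)
            # With a real Kafka consumer, the offset would be committed here,
            # only after every rendition succeeded (at-least-once processing).


if __name__ == "__main__":
    topic: "queue.Queue[Optional[bytes]]" = queue.Queue()
    topic.put(json.dumps({"video_id": "abc123",
                          "s3_raw_path": "s3://raw-bucket/abc123.mp4"}).encode())
    topic.put(None)
    results: dict[str, dict[str, str]] = {}
    run_worker(topic, results)
    print(results["abc123"]["720p"])
```

Threads are enough in this sketch because a real encoder would typically run as a separate process; committing the Kafka offset only after all renditions finish means a crashed worker's event is redelivered rather than lost.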
HLS / DASH — Adaptive Bitrate Streaming
Videos are split into 2–10 second segments. In HLS, a master manifest (.m3u8) lists one variant playlist per quality level, and each variant playlist lists that level's segment URLs; DASH uses an equivalent .mpd manifest. The player starts at low quality (fastest to buffer), monitors download throughput, and switches quality segment by segment: high bandwidth → upgrade to 1080p; bandwidth drops → downgrade to 360p. This is Adaptive Bitrate (ABR) streaming, and it is why YouTube rarely buffers even on variable connections.
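A minimal sketch of both halves follows: generating an HLS master playlist with the standard #EXTM3U and #EXT-X-STREAM-INF tags, and a simple throughput-based rendition picker. The bitrate ladder values, the 0.8 safety factor, and the moving-average weight are illustrative assumptions, and throughput-based selection is only one of several ABR strategies real players use.

```python
from typing import Optional

# Hypothetical bitrate ladder: (name, resolution, peak bits per second).
LADDER = [
    ("360p", "640x360", 800_000),
    ("480p", "854x480", 1_400_000),
    ("720p", "1280x720", 2_800_000),
    ("1080p", "1920x1080", 5_000_000),
    ("4K", "3840x2160", 16_000_000),
]


def master_playlist() -> str:
    """Build an HLS master playlist pointing at one variant playlist per rung."""
    lines = ["#EXTM3U"]
    for name, resolution, bps in LADDER:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps},RESOLUTION={resolution}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


class ThroughputABR:
    """Simple throughput-based rendition picker (one of many ABR strategies)."""

    def __init__(self, safety: float = 0.8, alpha: float = 0.3) -> None:
        self.safety = safety      # only spend 80% of the measured bandwidth
        self.alpha = alpha        # weight of the newest sample in the moving average
        self.estimate_bps: Optional[float] = None

    def observe(self, segment_bytes: int, seconds: float) -> None:
        """Record how long the last segment took to download."""
        sample = segment_bytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = self.alpha * sample + (1 - self.alpha) * self.estimate_bps

    def next_rendition(self) -> str:
        if self.estimate_bps is None:
            return LADDER[0][0]   # no measurement yet: start at the lowest rung
        budget = self.safety * self.estimate_bps
        choice = LADDER[0][0]
        for name, _, bps in LADDER:   # ladder is sorted by bitrate
            if bps <= budget:
                choice = name
        return choice
```

For example, after a segment arrives at 8 Mbps the picker selects 1080p (5 Mbps fits within the 6.4 Mbps budget); a single slow segment then steps it down to 720p rather than straight to 360p, because the moving average damps one-off dips.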
CDN Architecture for Video Delivery
YouTube's CDN caches video segment files at hundreds of edge nodes worldwide. Pre-staging: YouTube's ML predicts which newly uploaded videos will go viral (by early engagement signals) and proactively pushes their segments to CDN edges before the traffic spike. Long-tail: Old niche videos not in CDN cache are fetched from S3 origin on first request and cached for subsequent viewers in that region.
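The two delivery paths, pull-through on a cache miss and proactive pre-staging, can be sketched as a single edge node. EdgeCache, its methods, and the segment key format are hypothetical, and LRU eviction is an assumed policy chosen for simplicity; production CDNs use a variety of eviction schemes.

```python
from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Hypothetical CDN edge node: an LRU cache in front of an origin store."""

    def __init__(self, capacity: int, fetch_origin: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_origin = fetch_origin
        self.segments: "OrderedDict[str, bytes]" = OrderedDict()
        self.misses = 0               # viewer requests that had to go to origin

    def _store(self, key: str, data: bytes) -> None:
        self.segments[key] = data
        self.segments.move_to_end(key)
        while len(self.segments) > self.capacity:
            self.segments.popitem(last=False)   # evict least recently used

    def get(self, key: str) -> bytes:
        if key in self.segments:                # hit: serve from the edge
            self.segments.move_to_end(key)
            return self.segments[key]
        self.misses += 1                        # long-tail miss: pull from origin
        data = self.fetch_origin(key)
        self._store(key, data)                  # later viewers in the region hit
        return data

    def prestage(self, keys: list[str]) -> None:
        """Push segments predicted to go viral before viewers request them."""
        for key in keys:
            if key not in self.segments:
                self._store(key, self.fetch_origin(key))
```

Pre-staged segments are served as hits on their first viewer request, so the origin absorbs the fetch before the traffic spike instead of during it.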
Approximate View Counts (Kafka + Batch)
YouTube shows "1.2M views" not "1,234,567 views." Exact real-time counts are impractical at scale: 1M concurrent viewers incrementing one counter row causes severe write contention on a single hot row. Instead, each view event is published to Kafka → Flink/Spark Streaming aggregates events in 10-second windows → ONE counter write per window. Exact counts are reconciled nightly in batch for creator analytics.
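The windowing idea can be illustrated without a stream processor. This sketch is not the Flink or Spark API: it groups a batch of (video_id, timestamp) events into 10-second tumbling windows the way such a job would, then issues one increment per video per window. The truncating display rule in abbreviate() is an assumption for illustration.

```python
from collections import Counter
from typing import Iterable

WINDOW_SECONDS = 10


def aggregate(events: Iterable[tuple[str, float]]) -> Counter:
    """Group view events into 10-second tumbling windows per video."""
    counts: Counter = Counter()
    for video_id, ts in events:
        window_start = int(ts) - int(ts) % WINDOW_SECONDS
        counts[(video_id, window_start)] += 1
    return counts


def flush(counts: Counter, store: dict[str, int]) -> int:
    """Apply one increment per (video, window) instead of one per view."""
    for (video_id, _window), n in counts.items():
        store[video_id] = store.get(video_id, 0) + n
    return len(counts)          # number of counter writes issued


def abbreviate(n: int) -> str:
    """Display form such as 1.2M; truncates (does not round) to one decimal."""
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            whole, tenth = divmod(n * 10 // threshold, 10)
            return f"{whole}.{tenth}{suffix}" if tenth else f"{whole}{suffix}"
    return str(n)
```

With this scheme, write load on the counter store scales with the number of actively watched videos per window rather than with the number of views, which is what removes the hot-row contention.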
Video Metadata Storage (Bigtable)
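YouTube runs on Google infrastructure and uses Bigtable (a wide-column NoSQL database) for video metadata: title, description, creator_id, upload_time, processing_status, CDN manifest URLs for each quality level, and approximate stats. Bigtable serves fast single-row lookups by row key (video_id). Because rows are stored sorted by row key, a creator's video list can be read with one efficient range scan, provided those rows are keyed by creator_id (for example, an index row per video keyed by creator_id and upload time). Tables are split into tablets spread across many servers, so Bigtable scales writes horizontally without the single-primary bottleneck of a traditional SQL database.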
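The row-key design can be sketched with an in-memory sorted table. This does not use the Bigtable client library: SortedTable is a hypothetical stand-in that keeps rows ordered by key, and the key formats (video#<id> for metadata, creator#<id>#<reversed timestamp>#<video_id> for the index) are illustrative assumptions. Reversing the timestamp so newer rows sort first is a common Bigtable row-key pattern.

```python
import bisect
from typing import Optional

MAX_TS = 10**10 - 1   # upper bound for epoch-second timestamps in the key


class SortedTable:
    """In-memory stand-in for a Bigtable table: rows kept sorted by row key."""

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._rows: dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)   # point lookup by row key

    def scan_prefix(self, prefix: str, limit: Optional[int] = None) -> list[tuple[str, dict]]:
        out = []
        i = bisect.bisect_left(self._keys, prefix)   # jump to first key >= prefix
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            if limit is not None and len(out) >= limit:
                break
            i += 1
        return out


def video_key(video_id: str) -> str:
    return f"video#{video_id}"


def creator_prefix(creator_id: str) -> str:
    return f"creator#{creator_id}#"   # trailing '#' keeps "c1" from matching "c10"


def add_video(table: SortedTable, meta: dict) -> None:
    table.put(video_key(meta["video_id"]), meta)
    # Reversed, zero-padded timestamp so a creator's newest uploads sort first;
    # the trailing video_id keeps same-second uploads from colliding.
    reverse_ts = f"{MAX_TS - meta['upload_time']:010d}"
    index_key = f"{creator_prefix(meta['creator_id'])}{reverse_ts}#{meta['video_id']}"
    table.put(index_key, {"video_id": meta["video_id"]})
```

A watch page does one get(video_key(...)), while a channel page does one scan_prefix(creator_prefix(...), limit=N) that returns the newest N uploads already in order.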
