Design YouTube

Uploading large files, running queues of parallel video transcoding workers, indexing metadata, and serving streams from edge CDNs.

What you'll learn

Chunked Upload (Resumable)
Kafka-Triggered Transcoding Pipeline
HLS / DASH — Adaptive Bitrate Streaming
CDN Architecture for Video Delivery
Approximate View Counts (Kafka + Batch)
Video Metadata Storage (Bigtable)

Visual System Topology

YouTube: Video Platform Architecture

Upload Service: chunked multipart → S3
Transcoding Pipeline: Kafka → parallel workers
CDN (Streaming): HLS/DASH at edge nodes
S3 (Raw + Encoded): origin storage
Metadata DB: Bigtable / Spanner
Analytics + Search: view counts / recommendations

Paths: upload path | stream path

Upload: Creator → Upload Service → S3 raw → Kafka → 5 parallel workers → 5 quality levels → S3 encoded → CDN staged
Stream: Viewer → CDN edge (m3u8 manifest) → ABR player → segments from nearest edge

Concept Overview

YouTube ingests 500+ hours of uploaded video every minute and serves 1 billion+ hours of watch time daily. The system has two independent pipelines: the upload/transcoding pipeline (compute-heavy, async) and the streaming pipeline (bandwidth-heavy, CDN-driven).

Functional Requirements:

Upload videos up to 12 hours long at up to 4K resolution
Stream videos with smooth adaptive playback
Search videos by title, description, tags
Personalized recommendations
Like, comment, subscribe
Creator analytics (views, watch time, ad revenue)

Non-Functional Requirements:

< 0.5% rebuffering rate — smooth playback is the #1 UX metric
Support millions of concurrent streams
Eventual consistency for view counts (exact real-time counts not required)
Video content must be durable — never lose a creator's upload

Capacity Estimation:

Uploads: 500 hr/min = 8.3 hr/sec → ~30 GB/sec raw video ingest (implies ~3.6 GB per raw hour, about 8 Mbps)
Transcoding cost: 1 hr raw → 5 quality levels × ~1 CPU-hr each = 5 CPU-hours per hour of video
Storage: 500 hr/min × 60 × 24 × 365 × 10 yr × 1 GB/hr × 5 qualities ≈ 13 Exabytes
Concurrent streams: 1B hr/day ÷ 24 hr/day ≈ 41.7M concurrent on average (session length does not change the average)
Bandwidth: 41.7M streams × 3 Mbps avg ≈ 125 Tbps, served almost entirely from CDN
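
The estimates above are easy to recompute. The sketch below redoes the arithmetic from the stated inputs only. It reads average concurrency as daily watch-hours divided by 24, which is an assumption about how to interpret the 1B hr/day figure, and the constant names are illustrative.

```python
# Back-of-envelope capacity check using the inputs stated in the estimate.
UPLOAD_HR_PER_MIN = 500
WATCH_HR_PER_DAY = 1_000_000_000
ENCODED_GB_PER_HR = 1        # per quality level
QUALITY_LEVELS = 5
RETENTION_YEARS = 10
AVG_STREAM_MBPS = 3

upload_hr_per_sec = UPLOAD_HR_PER_MIN / 60

# Hours of video accumulated over the retention period.
stored_hours = UPLOAD_HR_PER_MIN * 60 * 24 * 365 * RETENTION_YEARS
storage_eb = stored_hours * ENCODED_GB_PER_HR * QUALITY_LEVELS / 1e9  # GB -> EB

# Average concurrency: watch-hours consumed per wall-clock hour.
concurrent_streams = WATCH_HR_PER_DAY / 24
bandwidth_tbps = concurrent_streams * AVG_STREAM_MBPS / 1e6           # Mbps -> Tbps

print(f"upload rate : {upload_hr_per_sec:.1f} hr of video per second")
print(f"storage     : {storage_eb:.2f} EB")
print(f"concurrency : {concurrent_streams / 1e6:.1f} M streams")
print(f"bandwidth   : {bandwidth_tbps:.0f} Tbps")
```

Peak concurrency would be higher than this average, so a real plan would add headroom on top of these figures.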

Key Architectural Pillars

Chunked Upload (Resumable)

A 10 GB 4K video should not be uploaded as a single HTTP request, because any network drop would force the upload to start over. The client splits the video into 5–10 MB chunks, each with a sequence number, and the Upload Service reassembles the chunks in S3. If the connection drops mid-upload, the client resumes from the last confirmed chunk. Google Cloud Storage's resumable upload protocol is a production example of this pattern.

Example: Client uploads chunks 1–6 → connection drops mid-chunk 7 → client resumes from chunk 7. Creator sees a progress bar rather than a failed upload. This is non-negotiable for large 4K video files.
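
The resume behaviour in this example can be sketched in a few lines. This is a minimal in-memory simulation, not the GCS or S3 API: `UploadSession`, `split_into_chunks`, `upload`, and the `fail_at` parameter are hypothetical names, and the demo uses tiny 4-byte chunks so it runs instantly.

```python
from dataclasses import dataclass, field
from typing import Optional

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, inside the 5-10 MB range from the text


@dataclass
class UploadSession:
    """Hypothetical server-side record of which chunks have been confirmed."""
    total_chunks: int
    confirmed: set = field(default_factory=set)
    writes: int = 0

    def accept(self, seq: int, data: bytes) -> None:
        # A real service would persist `data` durably before confirming.
        self.confirmed.add(seq)
        self.writes += 1

    def next_missing(self) -> Optional[int]:
        for seq in range(1, self.total_chunks + 1):
            if seq not in self.confirmed:
                return seq
        return None


def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def upload(session: UploadSession, chunks: list, fail_at: Optional[int] = None) -> None:
    """Send every unconfirmed chunk in order; simulate a drop at `fail_at`."""
    for seq in range(1, session.total_chunks + 1):
        if seq in session.confirmed:
            continue  # already acknowledged, do not resend
        if seq == fail_at:
            raise ConnectionError(f"connection dropped during chunk {seq}")
        session.accept(seq, chunks[seq - 1])


# Demo: 10 tiny chunks, connection drops mid-chunk 7, then the client resumes.
chunks = split_into_chunks(bytes(range(40)), chunk_size=4)
session = UploadSession(total_chunks=len(chunks))
resume_from = None
try:
    upload(session, chunks, fail_at=7)
except ConnectionError:
    resume_from = session.next_missing()
upload(session, chunks)  # second attempt skips chunks 1-6
```

After the second call, `session.writes` equals the number of chunks, which shows that no confirmed chunk was sent twice.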

Kafka-Triggered Transcoding Pipeline

Transcoding (converting raw video to multiple resolutions) is extremely CPU-intensive: 1 hour of 4K video takes 5–10 CPU-hours. The pipeline: upload complete → Kafka event (video_id, s3_raw_path) → Transcoding Workers pick up event → 5 parallel workers each transcode to one resolution (360p, 480p, 720p, 1080p, 4K) → HLS segment files stored back to S3. Kafka decouples the upload from the transcoding workload.

Example: One 1-hour video → 5 workers run in parallel → each transcodes one quality level simultaneously → total wall-clock time ~1 hour (not 5 hours). Without parallelism: 5 hours of delay before the video is watchable.
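
A rough sketch of the fan-out step follows. It assumes the upload-complete event carries `video_id` and `s3_raw_path` as described above; `transcode`, `handle_upload_complete`, and the output prefix layout are hypothetical. A thread pool stands in for the worker fleet here. In a real pipeline each rendition would be a separate Kafka consumer process or machine, since Python threads do not parallelize CPU-bound encoding.

```python
from concurrent.futures import ThreadPoolExecutor

RENDITIONS = ["360p", "480p", "720p", "1080p", "4K"]


def transcode(video_id: str, s3_raw_path: str, rendition: str) -> str:
    """Stand-in for a real encoder invocation; returns a hypothetical output prefix."""
    return f"encoded/{video_id}/{rendition}/"


def handle_upload_complete(event: dict) -> dict:
    """Fan one upload-complete event out to one task per rendition."""
    with ThreadPoolExecutor(max_workers=len(RENDITIONS)) as pool:
        futures = {
            rendition: pool.submit(
                transcode, event["video_id"], event["s3_raw_path"], rendition
            )
            for rendition in RENDITIONS
        }
        # Wait for every rendition to finish before returning.
        return {rendition: fut.result() for rendition, fut in futures.items()}


event = {"video_id": "abc123", "s3_raw_path": "raw/abc123.mp4"}
outputs = handle_upload_complete(event)
```

Because the tasks are independent, wall-clock time is bounded by the slowest rendition rather than the sum of all five.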

HLS / DASH — Adaptive Bitrate Streaming

Videos are split into 2–10 second segments. An HLS master manifest (.m3u8) lists one media playlist per quality level, and each media playlist lists that level's segment URLs (DASH does the same with an .mpd manifest). The player starts at low quality (fastest to buffer), monitors download speed, and switches quality segment by segment: high bandwidth → upgrade to 1080p; bandwidth drops → downgrade to 360p. This is Adaptive Bitrate (ABR) streaming, and it is why YouTube rarely buffers even on variable connections.

Example: Player downloads m3u8 manifest → starts 360p_seg_001.ts → measures: 5Mbps download speed → switches to 1080p_seg_002.ts → watches uninterrupted. No buffering spinner, no manual quality selection required.
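
The switching decision can be illustrated with a simple throughput-based rule: pick the highest rendition whose bitrate fits within a safety margin of the measured download speed. The bitrate ladder and the 80% margin below are assumed values for illustration, not YouTube's actual numbers, and real players also weigh buffer level.

```python
# Hypothetical bitrate ladder in kbps, lowest quality first.
LADDER = [("360p", 800), ("480p", 1400), ("720p", 2800), ("1080p", 4000), ("4K", 16000)]
SAFETY_PERCENT = 80  # only spend 80% of measured throughput


def pick_rendition(measured_kbps: int, ladder=LADDER, safety_percent=SAFETY_PERCENT) -> str:
    """Return the highest rendition whose bitrate fits the throughput budget."""
    budget = measured_kbps * safety_percent // 100
    choice = ladder[0][0]  # always fall back to the lowest rung
    for name, kbps in ladder:
        if kbps <= budget:
            choice = name
    return choice


def next_segment(measured_kbps: int, index: int) -> str:
    """Name of the next segment to request, following the naming in the example."""
    return f"{pick_rendition(measured_kbps)}_seg_{index:03d}.ts"


print(next_segment(5000, 2))  # 5 Mbps measured after the first segment
```

With this ladder, a 5 Mbps measurement selects 1080p for the second segment, matching the example, and a drop to 1 Mbps would fall back to 360p.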

CDN Architecture for Video Delivery

YouTube's CDN caches video segment files at hundreds of edge nodes worldwide. Pre-staging: YouTube's ML predicts which newly uploaded videos will go viral (by early engagement signals) and proactively pushes their segments to CDN edges before the traffic spike. Long-tail: Old niche videos not in CDN cache are fetched from S3 origin on first request and cached for subsequent viewers in that region.

Example: Popular video: first viewer → CDN fetches from S3, caches locally → next 10,000 viewers in same region get CDN cache hits at < 20ms. YouTube CDN serves > 99% of streaming traffic; S3 origin only handles < 1% of requests.
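
Both behaviours, pull-through caching on first request and proactive pre-staging, fit in a small edge-cache model. This is a sketch under assumptions: `EdgeCache`, `origin_fetch`, and the segment keys are hypothetical, and LRU eviction is one plausible policy rather than a documented one.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy edge node: LRU cache in front of an origin fetch function."""

    def __init__(self, origin, capacity: int):
        self.origin = origin        # callable: segment key -> bytes
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _put(self, key: str, data: bytes) -> None:
        self.store[key] = data
        self.store.move_to_end(key)
        while len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)
            return self.store[key]
        self.misses += 1
        data = self.origin(key)     # long-tail path: fetch from origin once
        self._put(key, data)
        return data

    def prestage(self, keys) -> None:
        """Push segments to the edge before any viewer asks for them."""
        for key in keys:
            self._put(key, self.origin(key))


origin_calls = []


def origin_fetch(key: str) -> bytes:
    origin_calls.append(key)
    return b"segment-bytes-for-" + key.encode()


edge = EdgeCache(origin_fetch, capacity=100)
for _ in range(10_001):             # first viewer plus the next 10,000
    edge.get("popular/1080p/seg_001.ts")
edge.prestage(["viral/720p/seg_001.ts"])
edge.get("viral/720p/seg_001.ts")   # hit without a viewer-triggered miss
```

In this run the origin is contacted twice in total: once for the first viewer and once for the pre-staged segment.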

Approximate View Counts (Kafka + Batch)

YouTube shows "1.2M views" not "1,234,567 views." Exact real-time counts are impractical at this scale: 1M concurrent viewers each incrementing one counter row would serialize on that row's write lock. Instead, each view event is published to Kafka, Flink or Spark Streaming aggregates events in 10-second windows, and each window produces one counter write per video. Exact counts are reconciled in a nightly batch for creator analytics.

Example: At 1M view events/sec for one video: events → Kafka partitions → aggregation workers batch → one DB update per 10 seconds instead of 10M individual row writes. View count accuracy: within roughly 1% in real time, exact in daily reports.
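
The windowed aggregation can be sketched without Kafka or Flink. The code below is a plain-Python stand-in for a tumbling-window job, not the Flink or Spark API: `aggregate`, `flush`, and the `(timestamp_ms, video_id)` event shape are assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_MS = 10_000  # 10-second tumbling windows


def aggregate(events, window_ms: int = WINDOW_MS) -> dict:
    """Group (timestamp_ms, video_id) events into per-window view counts."""
    windows = defaultdict(Counter)
    for ts_ms, video_id in events:
        window_start = ts_ms - ts_ms % window_ms
        windows[window_start][video_id] += 1
    return windows


def flush(windows: dict, counters: dict) -> int:
    """Apply one increment per (window, video) pair; return the number of writes."""
    writes = 0
    for window_start in sorted(windows):
        for video_id, delta in windows[window_start].items():
            counters[video_id] = counters.get(video_id, 0) + delta
            writes += 1
    return writes


# Demo: 25,000 view events for one video spread over 25 seconds.
events = [(ms, "v1") for ms in range(25_000)]
windows = aggregate(events)
counters = {}
writes = flush(windows, counters)
print(f"{len(events)} events -> {writes} counter writes, total {counters['v1']}")
```

The 25,000 events collapse into three counter writes, one per window, while the final total stays exact for the events that were processed.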

Video Metadata Storage (Bigtable)

YouTube runs on Google infrastructure and uses Bigtable (a wide-column NoSQL DB) for video metadata: title, description, creator_id, upload_time, processing_status, CDN manifest URLs for each quality level, and approximate stats. Bigtable supports fast single-row lookups by row key (video_id) and efficient range scans over contiguous row keys, so creator video lists need a row key or index table that is prefixed by creator_id. It scales horizontally by splitting rows across nodes, which avoids the write bottleneck of a single-primary SQL database.

Example: Bigtable row key = video_id. Columns: metadata:title, metadata:description, cdn:360p_manifest_url, cdn:1080p_manifest_url, stats:view_count_approx. A GET /video/{id} request hits Bigtable once → returns all needed metadata in one call.
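
The row layout can be modelled with plain dictionaries to show the access patterns. This sketch does not use the Bigtable client library. The `creator_index` key format (`creator_id#upload_time#video_id`) is an assumed design for supporting range scans, and the example.com URLs are placeholders.

```python
from bisect import bisect_left

videos = {}          # row key: video_id -> {column: value}
creator_index = []   # sorted row keys "creator_id#upload_time#video_id"


def put_video(video_id, creator_id, upload_time, title, manifests, views):
    row = {
        "metadata:title": title,
        "metadata:creator_id": creator_id,
        "metadata:upload_time": upload_time,
        "stats:view_count_approx": views,
    }
    for quality, url in manifests.items():
        row[f"cdn:{quality}_manifest_url"] = url
    videos[video_id] = row
    key = f"{creator_id}#{upload_time}#{video_id}"
    creator_index.insert(bisect_left(creator_index, key), key)  # keep keys sorted


def get_video(video_id):
    """Single keyed read, as GET /video/{id} would perform."""
    return videos.get(video_id)


def list_creator_videos(creator_id):
    """Prefix scan over the sorted index, oldest upload first."""
    prefix = f"{creator_id}#"
    result = []
    for key in creator_index[bisect_left(creator_index, prefix):]:
        if not key.startswith(prefix):
            break
        result.append(key.rsplit("#", 1)[1])
    return result


put_video("v1", "c1", "2024-01-02T00:00:00Z", "Second upload",
          {"360p": "https://cdn.example.com/v1/360p.m3u8"}, 1200)
put_video("v2", "c2", "2024-01-01T12:00:00Z", "Other creator",
          {"360p": "https://cdn.example.com/v2/360p.m3u8"}, 40)
put_video("v3", "c1", "2024-01-01T00:00:00Z", "First upload",
          {"1080p": "https://cdn.example.com/v3/1080p.m3u8"}, 98000)
```

Here `get_video("v3")` returns every column in one read, and `list_creator_videos("c1")` walks only the contiguous keys for that creator.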