Advanced | 20 min read | Real-world Case Studies

Design YouTube

Uploading large video files, managing queues of parallel transcoding workers, indexing video metadata, and serving playback from edge CDNs.

What you'll learn

  • Chunked Upload (Resumable)
  • Kafka-Triggered Transcoding Pipeline
  • HLS / DASH — Adaptive Bitrate Streaming
  • CDN Architecture for Video Delivery
  • Approximate View Counts (Kafka + Batch)
  • Video Metadata Storage (Bigtable)

Visual System Topology

YouTube: Video Platform Architecture

Components:
  • Upload Service: chunked multipart upload → S3
  • Transcoding Pipeline: Kafka → parallel workers
  • CDN (Streaming): HLS/DASH at edge nodes
  • S3 (Raw + Encoded): origin storage
  • Metadata DB: Bigtable / Spanner
  • Analytics + Search: view counts / recommendations

Upload path: Creator → Upload Service → S3 raw → Kafka → 5 parallel workers → 5 quality levels → S3 encoded → staged to CDN
Stream path: Viewer → CDN edge (m3u8 manifest) → ABR player → segments from nearest edge

Concept Overview

YouTube ingests 500+ hours of uploaded video every minute and serves over 1 billion hours of watch time daily. The system has two independent pipelines: the upload/transcoding pipeline (compute-heavy, asynchronous) and the streaming pipeline (bandwidth-heavy, CDN-driven).

Functional Requirements:

  • Upload videos up to 12 hours long at resolutions up to 4K
  • Stream videos with smooth adaptive playback
  • Search videos by title, description, tags
  • Personalized recommendations
  • Like, comment, subscribe
  • Creator analytics (views, watch time, ad revenue)

Non-Functional Requirements:

  • < 0.5% rebuffering rate — smooth playback is the #1 UX metric
  • Support millions of concurrent streams
  • Eventual consistency for view counts (exact real-time counts not required)
  • Video content must be durable — never lose a creator's upload

Capacity Estimation:

  • Uploads: 500 hr/min ≈ 8.3 hr/sec → ~30 GB/sec raw video ingest (implies ~3.6 GB per hour of raw upload)
  • Transcoding cost: 1 hr raw → 5 quality levels × ~1 CPU-hr each ≈ 5 CPU-hours per hour of uploaded video
  • Storage: 500 hr/min × 60 × 24 × 365 × 10 yr × 1 GB/hr × 5 qualities ≈ 1.3 × 10^10 GB ≈ 13 Exabytes
  • Concurrent streams: 1B watch-hours/day ÷ 24 hr/day ≈ 41.7M concurrent streams on average (by Little's law, average session length cancels out)
  • Bandwidth: 41.7M streams × 3 Mbps avg ≈ 125 Tbps, served almost entirely from CDN
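The estimates above can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch: the constants restate the bullet assumptions, and RAW_GB_PER_HOUR is an assumed value back-derived from the ~30 GB/sec ingest figure rather than a published number.

```python
# Back-of-envelope capacity math. Inputs restate the bullets above;
# RAW_GB_PER_HOUR is the value implied by ~30 GB/s ingest (an assumption).
UPLOAD_HOURS_PER_MIN = 500
RAW_GB_PER_HOUR = 3.6
ENCODED_GB_PER_HOUR_PER_QUALITY = 1.0
QUALITY_LEVELS = 5
CPU_HOURS_PER_RENDITION = 1.0
RETENTION_YEARS = 10
WATCH_HOURS_PER_DAY = 1e9
AVG_BITRATE_MBPS = 3.0


def estimate() -> dict:
    upload_hours_per_sec = UPLOAD_HOURS_PER_MIN / 60
    ingest_gb_per_sec = upload_hours_per_sec * RAW_GB_PER_HOUR
    # Transcoding work generated per wall-clock hour, i.e. cores kept busy 24/7.
    cpu_hours_per_hour = UPLOAD_HOURS_PER_MIN * 60 * QUALITY_LEVELS * CPU_HOURS_PER_RENDITION
    hours_uploaded = UPLOAD_HOURS_PER_MIN * 60 * 24 * 365 * RETENTION_YEARS
    storage_gb = hours_uploaded * ENCODED_GB_PER_HOUR_PER_QUALITY * QUALITY_LEVELS
    storage_eb = storage_gb / 1e9  # 1 EB = 10^9 GB
    # Little's law: average concurrency = total watch time / length of the period.
    concurrent_streams = WATCH_HOURS_PER_DAY / 24
    bandwidth_tbps = concurrent_streams * AVG_BITRATE_MBPS / 1e6  # 1 Tbps = 10^6 Mbps
    return {
        "ingest_gb_per_sec": ingest_gb_per_sec,
        "cpu_cores_busy": cpu_hours_per_hour,
        "storage_eb": storage_eb,
        "concurrent_streams": concurrent_streams,
        "bandwidth_tbps": bandwidth_tbps,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>20}: {value:,.1f}")
```

With these inputs the sketch gives about 30 GB/s ingest, 150,000 CPU-hours of transcoding per hour (roughly 150,000 cores kept busy), ~13.1 EB of storage, ~41.7M average concurrent streams and ~125 Tbps average egress. Peak-hour load would be higher than these daily averages.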

Key Architectural Pillars

1. Chunked Upload (Resumable)

A 10 GB 4K video cannot be uploaded reliably as one HTTP request: any network drop forces a restart from the first byte. Instead, the client splits the video into 5–10 MB chunks, each with a sequence number, and the Upload Service reassembles them in S3 (for example via S3 multipart upload). If the connection drops mid-upload, the client asks the server for the last confirmed chunk and resumes from there. Google Cloud Storage's resumable upload protocol is a production implementation of this pattern.

Example: Client uploads chunks 1–6 → connection drops mid-chunk 7 → client resumes from chunk 7. Creator sees a progress bar rather than a failed upload. This is non-negotiable for large 4K video files.
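The resume logic can be sketched in memory. This is a minimal illustration, not the GCS or S3 protocol: UploadSession, split and upload are hypothetical names, the server state is a plain dict, and chunks are numbered from 0 (so the example's "chunk 7" is index 6).

```python
from __future__ import annotations

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the low end of the 5-10 MB range


class UploadSession:
    """Hypothetical server-side state for one resumable upload (in memory)."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.chunks: dict[int, bytes] = {}

    def put_chunk(self, seq: int, data: bytes) -> None:
        # Idempotent: re-sending an already confirmed chunk just overwrites it.
        self.chunks[seq] = data

    def next_missing(self) -> int:
        # After a reconnect, the client asks where to resume.
        seq = 0
        while seq in self.chunks:
            seq += 1
        return seq

    def complete(self) -> bytes:
        # Reassemble in sequence order once every chunk is confirmed.
        if self.next_missing() < self.total_chunks:
            raise RuntimeError("upload incomplete")
        return b"".join(self.chunks[i] for i in range(self.total_chunks))


def split(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload(session: UploadSession, chunks: list[bytes], fail_at: int | None = None) -> None:
    """Send chunks starting from the server's resume point; optionally simulate a drop."""
    for seq in range(session.next_missing(), len(chunks)):
        if seq == fail_at:
            raise ConnectionError(f"connection dropped during chunk {seq}")
        session.put_chunk(seq, chunks[seq])


# Demo with small chunks so it runs instantly.
video = bytes(range(256)) * 1000           # 256,000-byte stand-in for a large file
chunks = split(video, size=32 * 1024)      # 8 chunks, numbered 0-7
session = UploadSession(len(chunks))
try:
    upload(session, chunks, fail_at=6)     # drops while sending the 7th chunk
except ConnectionError:
    pass
resume_from = session.next_missing()       # chunks 0-5 were confirmed
upload(session, chunks)                    # resumes at chunk 6, not 0
assert session.complete() == video
```

Asking the server for its resume point, rather than trusting the client's own progress counter, keeps the two sides consistent when a chunk was sent but its acknowledgement was lost.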

2. Kafka-Triggered Transcoding Pipeline

Transcoding (converting raw video to multiple resolutions) is extremely CPU-intensive: 1 hour of 4K video takes 5–10 CPU-hours. The pipeline: upload complete → Kafka event (video_id, s3_raw_path) → Transcoding Workers pick up event → 5 parallel workers each transcode to one resolution (360p, 480p, 720p, 1080p, 4K) → HLS segment files stored back to S3. Kafka decouples the upload from the transcoding workload.

Example: One 1-hour video → 5 workers run in parallel → each transcodes one quality level simultaneously → total wall-clock time ~1 hour (not 5 hours). Without parallelism: 5 hours of delay before the video is watchable.
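The fan-out step can be illustrated with a thread pool. In this sketch the event dict stands in for the Kafka message payload (video_id, s3_raw_path), threads stand in for separate worker processes, and transcode is a hypothetical placeholder that sleeps instead of encoding; the Kafka consumer loop itself is omitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RENDITIONS = ["360p", "480p", "720p", "1080p", "4K"]


def transcode(video_id: str, raw_path: str, rendition: str) -> str:
    """Hypothetical stand-in for one encoder job (e.g. an ffmpeg run).

    Sleeps instead of encoding; a real worker would read raw_path,
    emit HLS segments for this rendition and upload them to S3.
    """
    time.sleep(0.2)
    return f"encoded/{video_id}/{rendition}/index.m3u8"


def handle_upload_event(event: dict) -> dict:
    """Fan one 'upload complete' event out to one worker per rendition."""
    with ThreadPoolExecutor(max_workers=len(RENDITIONS)) as pool:
        futures = {
            r: pool.submit(transcode, event["video_id"], event["s3_raw_path"], r)
            for r in RENDITIONS
        }
        # Wall-clock time is set by the slowest rendition, not the sum.
        return {r: f.result() for r, f in futures.items()}


event = {"video_id": "abc123", "s3_raw_path": "s3://raw-bucket/abc123.mp4"}
start = time.perf_counter()
playlists = handle_upload_event(event)
elapsed = time.perf_counter() - start   # about 0.2 s, versus ~1.0 s if run sequentially
```

The bucket and path names are illustrative. In a production pipeline each rendition would more likely be published as its own Kafka message so that independent worker machines can pick them up and retry them separately.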

3. HLS / DASH — Adaptive Bitrate Streaming

Videos are split into 2–10 second segments. An HLS manifest (.m3u8) lists segment URLs at each quality level. The player starts at low quality (fastest to buffer), monitors download speed, and switches quality segment-by-segment. High bandwidth → upgrade to 1080p. Bandwidth drops → downgrade to 360p. This is Adaptive Bitrate (ABR) streaming — the reason YouTube rarely buffers even on variable connections.
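The manifest structure described above can be generated with plain string formatting. This sketch follows the HLS playlist tags (#EXTM3U, #EXT-X-STREAM-INF, #EXTINF, #EXT-X-ENDLIST), but the bitrate ladder, the 6-second segment length and the file layout are assumptions, and it omits attributes such as CODECS that real master playlists usually carry.

```python
# Illustrative bitrate ladder: (rendition, width, height, peak bits/sec).
# These numbers are assumptions for the sketch, not YouTube's real ladder.
LADDER = [
    ("360p", 640, 360, 800_000),
    ("480p", 854, 480, 1_400_000),
    ("720p", 1280, 720, 2_800_000),
    ("1080p", 1920, 1080, 5_000_000),
    ("4K", 3840, 2160, 16_000_000),
]


def master_playlist(ladder) -> str:
    """Top-level .m3u8: one entry per quality level, pointing at its media playlist."""
    lines = ["#EXTM3U"]
    for name, width, height, bps in ladder:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bps},RESOLUTION={width}x{height}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


def media_playlist(segment_urls, segment_seconds: int = 6) -> str:
    """Per-rendition .m3u8 listing fixed-length segments for a finished (VOD) video."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for url in segment_urls:
        lines.append(f"#EXTINF:{segment_seconds:.1f},")
        lines.append(url)
    lines.append("#EXT-X-ENDLIST")  # no more segments will be added
    return "\n".join(lines) + "\n"


print(master_playlist(LADDER))
print(media_playlist([f"1080p_seg_{i:03d}.ts" for i in range(1, 4)]))
```

The player fetches the master playlist once, then reads whichever media playlist matches its current quality choice; switching quality means requesting the next segment from a different media playlist.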

Example: Player downloads m3u8 manifest → starts 360p_seg_001.ts → measures 5 Mbps download speed → switches to 1080p_seg_002.ts → watches uninterrupted. No buffering spinner, no manual quality selection required.
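The switching decision in this example can be sketched as a simple throughput-based rule. This is one common ABR heuristic, not YouTube's actual player logic: the ladder bitrates, the 0.85 safety factor and the EWMA smoothing weight are all assumptions chosen for illustration.

```python
from __future__ import annotations

# Illustrative ladder in kbps (assumed values, not YouTube's real encodings).
LADDER_KBPS = {"360p": 700, "480p": 1_200, "720p": 2_500, "1080p": 4_000, "4K": 15_000}


def measured_kbps(segment_bytes: int, download_seconds: float) -> float:
    return segment_bytes * 8 / 1000 / download_seconds


class AbrController:
    """Throughput-based ABR: pick the best rendition that fits the bandwidth estimate."""

    def __init__(self, safety: float = 0.85, alpha: float = 0.5) -> None:
        self.safety = safety      # spend only 85% of estimated bandwidth (headroom)
        self.alpha = alpha        # EWMA weight given to the newest sample
        self.estimate_kbps: float | None = None

    def next_rendition(self) -> str:
        if self.estimate_kbps is None:
            return "360p"          # no measurement yet: start low for fast startup
        budget = self.estimate_kbps * self.safety
        fitting = [name for name, kbps in LADDER_KBPS.items() if kbps <= budget]
        return max(fitting, key=LADDER_KBPS.get) if fitting else "360p"

    def on_segment_downloaded(self, segment_bytes: int, seconds: float) -> None:
        sample = measured_kbps(segment_bytes, seconds)
        if self.estimate_kbps is None:
            self.estimate_kbps = sample
        else:
            # Smooth the estimate so one fast or slow segment does not cause flapping.
            self.estimate_kbps = self.alpha * sample + (1 - self.alpha) * self.estimate_kbps


player = AbrController()
choices = [player.next_rendition()]              # "360p"
player.on_segment_downloaded(625_000, 1.0)       # 5,000 kbps measured
choices.append(player.next_rendition())          # "1080p"
player.on_segment_downloaded(625_000, 5.0)       # bandwidth drops to 1,000 kbps
choices.append(player.next_rendition())          # "720p" (estimate 3,000 kbps)
player.on_segment_downloaded(625_000, 5.0)
choices.append(player.next_rendition())          # "480p" (estimate 2,000 kbps)
```

The safety margin and smoothing trade responsiveness for stability: a lower safety factor rebuffers less but leaves quality on the table. Production players often also weigh current buffer level, which this sketch ignores.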
4. CDN Architecture for Video Delivery

YouTube's CDN caches video segment files at hundreds of edge nodes worldwide. Pre-staging: YouTube's ML predicts which newly uploaded videos will go viral (by early engagement signals) and proactively pushes their segments to CDN edges before the traffic spike. Long-tail: Old niche videos not in CDN cache are fetched from S3 origin on first request and cached for subsequent viewers in that region.

Example: Popular video: first viewer → CDN fetches from S3, caches locally → next 10,000 viewers in same region get CDN cache hits at < 20ms. YouTube CDN serves > 99% of streaming traffic; S3 origin only handles < 1% of requests.
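The hit/miss behaviour of a single edge node can be sketched as an LRU cache in front of an origin fetch. EdgeCache, fetch_from_s3 and the segment keys are hypothetical; real CDN edges use more elaborate admission and eviction policies, and the prestage method only mimics the idea of pushing predicted-popular segments ahead of demand.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable


class EdgeCache:
    """Hypothetical CDN edge node: an LRU cache of video segments in front of origin."""

    def __init__(self, capacity: int, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch_from_origin = fetch_from_origin
        self.segments: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _store(self, key: str, data: bytes) -> None:
        self.segments[key] = data
        self.segments.move_to_end(key)
        if len(self.segments) > self.capacity:
            self.segments.popitem(last=False)  # evict the least recently used segment

    def get(self, key: str) -> bytes:
        if key in self.segments:
            self.hits += 1
            self.segments.move_to_end(key)     # mark as most recently used
            return self.segments[key]
        self.misses += 1
        data = self.fetch_from_origin(key)     # long-tail miss: go back to origin
        self._store(key, data)
        return data

    def prestage(self, keys: list[str]) -> None:
        """Push segments predicted to be popular before any viewer requests them."""
        for key in keys:
            if key not in self.segments:
                self._store(key, self.fetch_from_origin(key))


origin_requests: list[str] = []


def fetch_from_s3(key: str) -> bytes:
    origin_requests.append(key)                # count how often origin is hit
    return f"<segment {key}>".encode()


edge = EdgeCache(capacity=1_000, fetch_from_origin=fetch_from_s3)
for _ in range(10_001):                        # first viewer misses, the rest hit
    edge.get("popular/1080p_seg_001.ts")
edge.prestage(["viral/360p_seg_001.ts"])
edge.get("viral/360p_seg_001.ts")              # hit: already staged at the edge
```

After the loop, 10,001 viewer requests for the popular segment cost a single origin fetch, which is the effect behind the high CDN offload ratio described above.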

5. Approximate View Counts (Kafka + Batch)

YouTube shows "1.2M views," not "1,234,567 views." Exact real-time counts are impractical at scale: 1M concurrent viewers incrementing one counter row causes severe write-lock contention. Instead, each view event is published to Kafka → Flink/Spark Streaming aggregates events in 10-second windows → one counter write per video per window. Exact counts are reconciled in a nightly batch job for creator analytics.

Example: At a peak of 1M view events/sec: events → Kafka partitions → aggregation workers batch them → one DB update per video every 10 seconds. View count accuracy: within 1% in real time, exact in daily reports. This is the standard production approach for high-write counters.
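The windowed aggregation can be sketched as a tumbling-window counter. This is a simplified stand-in for a Flink or Spark Streaming job: ViewCountAggregator and flush_to_db are hypothetical names, events are assumed to arrive in timestamp order (real engines use watermarks to handle late events), and the "database" is an in-memory Counter.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

WINDOW_SECONDS = 10


class ViewCountAggregator:
    """Tumbling-window counter: one flush per video per closed window.

    Stand-in for a Flink/Spark Streaming job. Assumes events arrive in
    timestamp order; real engines use watermarks to handle late events.
    """

    def __init__(self, flush: Callable[[str, int], None]) -> None:
        self.flush = flush
        self.window_start: int | None = None
        self.counts: Counter[str] = Counter()

    def on_event(self, video_id: str, ts: float) -> None:
        window = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
        if self.window_start is not None and window != self.window_start:
            self.close()                    # a new window started: emit the old one
        self.window_start = window
        self.counts[video_id] += 1

    def close(self) -> None:
        for video_id, n in self.counts.items():
            self.flush(video_id, n)         # one DB write per video per window
        self.counts.clear()


db: Counter[str] = Counter()
db_writes: list[str] = []


def flush_to_db(video_id: str, n: int) -> None:
    db[video_id] += n
    db_writes.append(video_id)


agg = ViewCountAggregator(flush_to_db)
for i in range(1_000):
    agg.on_event("v1", i * 0.01)           # 1,000 views within the first 10 s window
agg.on_event("v1", 10.5)                   # first event of the next window
agg.close()                                # flush the final partial window
```

Here 1,001 view events turn into just two database writes, one per closed window, which is the write-amplification reduction the section describes.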
6. Video Metadata Storage (Bigtable)

YouTube runs on Google infrastructure and uses Bigtable (a wide-column NoSQL database) for video metadata: title, description, creator_id, upload_time, processing_status, CDN manifest URLs for each quality level, and approximate stats. Bigtable serves fast single-row lookups by row key (video_id), and because rows are stored sorted by key, it also supports efficient range scans, such as listing a creator's videos from a table keyed by creator_id. It scales horizontally without the write bottleneck of a single-node SQL database.

Example: Bigtable row key = video_id. Columns: metadata:title, metadata:description, cdn:360p_manifest_url, cdn:1080p_manifest_url, stats:view_count_approx. A GET /video/{id} request hits Bigtable once → returns all needed metadata in one call.
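The row layout can be modelled with a small sorted table. This is an in-memory emulation, not the Bigtable client API: SortedTable, publish, the creator_videos index table, the reversed-timestamp key scheme and the cdn.example.com URLs are all assumptions made for illustration.

```python
from __future__ import annotations

import bisect
from typing import Iterator


class SortedTable:
    """In-memory stand-in for a Bigtable table (NOT the Bigtable client API).

    Rows are kept sorted by row key; each row maps 'family:qualifier' -> value.
    """

    def __init__(self) -> None:
        self.keys: list[str] = []
        self.rows: dict[str, dict[str, str]] = {}

    def put(self, row_key: str, cells: dict[str, str]) -> None:
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)
            self.rows[row_key] = {}
        self.rows[row_key].update(cells)

    def get(self, row_key: str) -> dict[str, str] | None:
        return self.rows.get(row_key)          # single-row lookup by key

    def scan_prefix(self, prefix: str) -> Iterator[tuple[str, dict[str, str]]]:
        i = bisect.bisect_left(self.keys, prefix)   # rows with a prefix are contiguous
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            yield self.keys[i], self.rows[self.keys[i]]
            i += 1


MAX_TS = 10**10
videos = SortedTable()           # row key = video_id
creator_videos = SortedTable()   # row key = creator_id#reversed_timestamp (hypothetical index)


def publish(video_id: str, creator_id: str, title: str, upload_ts: int) -> None:
    videos.put(video_id, {
        "metadata:title": title,
        "metadata:creator_id": creator_id,
        "cdn:360p_manifest_url": f"https://cdn.example.com/{video_id}/360p/index.m3u8",
        "cdn:1080p_manifest_url": f"https://cdn.example.com/{video_id}/1080p/index.m3u8",
        "stats:view_count_approx": "0",
    })
    # Reversed, zero-padded timestamp so a creator's newest upload sorts first.
    creator_videos.put(f"{creator_id}#{MAX_TS - upload_ts:010d}", {"ref:video_id": video_id})


publish("vid_a", "creator42", "First upload", 1_700_000_000)
publish("vid_b", "creator42", "Second upload", 1_700_000_500)
publish("vid_c", "creator7", "Someone else", 1_700_000_100)

row = videos.get("vid_b")        # GET /video/vid_b -> one lookup, all columns
newest_first = [r["ref:video_id"] for _, r in creator_videos.scan_prefix("creator42#")]
```

The "#" separator keeps a scan for creator42 from also matching a hypothetical creator420, and reversing the timestamp turns "newest videos first" into a plain forward range scan.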

Design YouTube - Module 10: Real-world Case Studies | System Design | Revise Algo