ReviseAlgo Logo
Advanced20 min readReal-world Case Studies

Design TinyURL

Building a highly available URL shortener with unique base62 hash range distributions.

What you'll learn

  • Base62 Encoding
  • Approach 1 — Hashing (MD5/SHA-256 + Base62)
  • Approach 2 — Unique ID Generation (Recommended)
  • Redirection: 301 vs 302
  • Caching Strategy
  • Custom Aliases
  • Link Expiration
  • Database Choice: NoSQL over SQL

TL;DR

Building a highly available URL shortener with unique base62 hash range distributions.

Visual System Topology

URL Shortener — Full Architecture

Client Browser / App
Load Balancer Round-robin
API Servers Stateless × N
URL Generator Base62 / Snowflake ID
POST /shorten
Redirect Service 301 / 302 redirect
GET /{'{key}'}
Analytics Service Async via Kafka
click events
Redis Cache shortKey → longURL
90% cache hit rate
NoSQL DB DynamoDB / Cassandra
sharded by short_key
ID Generator Snowflake / ZooKeeper
range-based
Shorten: Client → LB → API → URL Generator (get ID) → Base62 encode → write DB → return short URL
Redirect: Client → LB → API → check Redis → (cache miss) → query DB → 301 redirect → browser caches it

Concept Overview

A URL Shortener (like TinyURL or Bitly) takes a long URL and returns a compact alias that redirects users to the original destination.

Functional Requirements:

  • Generate a unique short URL for any long URL
  • Redirect users to the original URL when the short URL is accessed
  • Optional: custom aliases (user-defined short keys)
  • Optional: link expiration (URLs become inactive after a set period)
  • Optional: click analytics (tracking usage per link)

Non-Functional Requirements:

  • High Availability: 99.9% uptime — a broken short link is a broken user experience
  • Low Latency: URL creation and redirects should complete in milliseconds
  • Scalability: handle millions of requests per day with horizontal scaling
  • Durability: shortened URLs should remain valid for years
  • Security: prevent abuse (phishing, spam), rate-limit creation endpoints

Capacity Estimation (1M writes/day, 100:1 read-to-write ratio):

  • Writes/sec: ~12 avg, ~120 peak
  • Redirects/sec: ~1,200 avg, ~12,000 peak
  • Storage per URL: 127 bytes (7B short key + 100B long URL + timestamps + click count)
  • Storage per year: ~46 GB (365M URLs × 127 bytes)
  • Cache needed for hot 20% of URLs: ~25 MB (fits easily in Redis)
  • With 90% cache hit rate, only ~120 RPS actually reach the database

Key Architectural Pillars

1

Base62 Encoding

Short keys are built from [A-Z, a-z, 0-9] — 62 characters. A 7-character Base62 string can represent 62⁷ ≈ 3.5 billion unique URLs. The approach: take a numeric ID → convert to Base62 → that becomes the short key. Example: ID 47770830013755 (decimal) → "DZFbb43" (Base62).

Example: ID 1 → "1", ID 62 → "10", ID 3844 → "100" in Base62. The mapping is deterministic and compact.
2

Approach 1 — Hashing (MD5/SHA-256 + Base62)

Hash the long URL using MD5, take the first 6 bytes (48 bits), convert to decimal, then Base62-encode. Fast and stateless but has two problems: (1) identical long URLs always produce the same short key (collision on purpose, but users sharing the same short URL is unexpected), and (2) rare hash collisions can occur where two different long URLs produce the same short key. Resolution: re-hash with a different seed, or append an incremental suffix.

Example: MD5("https://example.com/long") = "1b3aabf5266b..." → first 6 bytes "1b3aabf5266b" → decimal 47770830013755 → Base62 "DZFbb43"
3

Approach 2 — Unique ID Generation (Recommended)

Each new URL is assigned an auto-incrementing unique ID (via a distributed ID service like Twitter Snowflake or a range-based ZooKeeper coordinator). The ID is Base62-encoded into the short key. This is collision-free by design since IDs are always unique. Drawback: sequential IDs are predictable — mitigate by shuffling bits or adding randomness before encoding.

Example: URL #1000000 → Base62("1000000") = "4c92". Range-based: ZooKeeper gives server A range [1-1000], server B range [1001-2000] — no coordination needed per request.
4

Redirection: 301 vs 302

301 (Permanent Redirect): browser caches the mapping and skips the server on future visits. Pros: dramatically reduces server load. Cons: click analytics become impossible since the browser never calls the server again. 302 (Temporary Redirect): server is always contacted on every click. Pros: full analytics tracking. Cons: higher server load. Choose based on whether analytics matter.

Example: Bitly uses 301 for anonymous links (performance) and 302 for tracked links (analytics dashboards).
5

Caching Strategy

The redirect path (read) vastly outnumbers URL creation (write) at 100:1. Cache shortKey→longURL mappings in Redis. Apply 80-20 rule: 20% of URLs generate 80% of traffic — cache only the hot set. Use LRU eviction. With a 90% cache hit rate, only 120 RPS reach the database instead of 1,200.

Example: Redis lookup: O(1). Cache miss → DynamoDB lookup → write back to Redis. Cache TTL should match URL expiration to avoid serving expired links from cache.
6

Custom Aliases

Users can specify a custom short key (e.g., short.ly/my-brand). Validation pipeline: (1) uniqueness check in DB, (2) character validation — alphanumeric + hyphens only, (3) reserved words check (block "admin", "api", "help"). Store alongside system-generated keys in the same table with a flag indicating it is user-defined.

Example: Reserved alias list: ["api", "admin", "help", "login", "signup", "static"]. If alias is taken, return 409 Conflict with suggestions like "my-brand-2".
7

Link Expiration

URLs can expire after a set time. Two mechanisms: (1) Real-time check — on every redirect, compare current time against expiration_date. Return HTTP 410 Gone if expired. (2) Background cleanup — a cron job periodically deletes expired entries from the DB and evicts them from Redis. Default TTL can be 1 year if unspecified.

Example: HTTP 410 Gone is preferred over 404 Not Found for expired links — it signals "this existed and is now intentionally gone."
8

Database Choice: NoSQL over SQL

URL shorteners are essentially a massive key-value store (shortKey → longURL). NoSQL (DynamoDB, Cassandra) outperforms SQL here because: single-key lookups are the primary operation, no complex joins are needed, horizontal sharding is trivial with shortKey as the partition key, and write throughput at millions/day requires distributed nodes. Schema: URLs table (short_key PK, long_url, user_id, created_at, expires_at, click_count) + Users table.

Example: DynamoDB partition key = short_key gives O(1) lookups at any scale. Cassandra offers tunable consistency: use QUORUM for writes, ONE for reads (fast redirects).

AI Tutor

Ask about the topic

Sign in Required

Please sign in to use the AI tutor

Sign In
Design TinyURL - Module 10: Real-world Case Studies | System Design | Revise Algo