Design TinyURL
Building a highly available URL shortener with unique base62 hash range distributions.
What you'll learn
- Base62 Encoding
- Approach 1 — Hashing (MD5/SHA-256 + Base62)
- Approach 2 — Unique ID Generation (Recommended)
- Redirection: 301 vs 302
- Caching Strategy
- Custom Aliases
- Link Expiration
- Database Choice: NoSQL over SQL
TL;DR
Building a highly available URL shortener with unique base62 hash range distributions.
Visual System Topology
URL Shortener — Full Architecture
Redirect: Client → LB → API → check Redis → (cache miss) → query DB → 301 redirect → browser caches it
Concept Overview
A URL Shortener (like TinyURL or Bitly) takes a long URL and returns a compact alias that redirects users to the original destination.
Functional Requirements:
- Generate a unique short URL for any long URL
- Redirect users to the original URL when the short URL is accessed
- Optional: custom aliases (user-defined short keys)
- Optional: link expiration (URLs become inactive after a set period)
- Optional: click analytics (tracking usage per link)
Non-Functional Requirements:
- High Availability: 99.9% uptime — a broken short link is a broken user experience
- Low Latency: URL creation and redirects should complete in milliseconds
- Scalability: handle millions of requests per day with horizontal scaling
- Durability: shortened URLs should remain valid for years
- Security: prevent abuse (phishing, spam), rate-limit creation endpoints
Capacity Estimation (1M writes/day, 100:1 read-to-write ratio):
- Writes/sec: ~12 avg, ~120 peak
- Redirects/sec: ~1,200 avg, ~12,000 peak
- Storage per URL: 127 bytes (7B short key + 100B long URL + timestamps + click count)
- Storage per year: ~46 GB (365M URLs × 127 bytes)
- Cache needed for hot 20% of URLs: ~25 MB (fits easily in Redis)
- With 90% cache hit rate, only ~120 RPS actually reach the database
Key Architectural Pillars
Base62 Encoding
Short keys are built from [A-Z, a-z, 0-9] — 62 characters. A 7-character Base62 string can represent 62⁷ ≈ 3.5 billion unique URLs. The approach: take a numeric ID → convert to Base62 → that becomes the short key. Example: ID 47770830013755 (decimal) → "DZFbb43" (Base62).
Approach 1 — Hashing (MD5/SHA-256 + Base62)
Hash the long URL using MD5, take the first 6 bytes (48 bits), convert to decimal, then Base62-encode. Fast and stateless but has two problems: (1) identical long URLs always produce the same short key (collision on purpose, but users sharing the same short URL is unexpected), and (2) rare hash collisions can occur where two different long URLs produce the same short key. Resolution: re-hash with a different seed, or append an incremental suffix.
Approach 2 — Unique ID Generation (Recommended)
Each new URL is assigned an auto-incrementing unique ID (via a distributed ID service like Twitter Snowflake or a range-based ZooKeeper coordinator). The ID is Base62-encoded into the short key. This is collision-free by design since IDs are always unique. Drawback: sequential IDs are predictable — mitigate by shuffling bits or adding randomness before encoding.
Redirection: 301 vs 302
301 (Permanent Redirect): browser caches the mapping and skips the server on future visits. Pros: dramatically reduces server load. Cons: click analytics become impossible since the browser never calls the server again. 302 (Temporary Redirect): server is always contacted on every click. Pros: full analytics tracking. Cons: higher server load. Choose based on whether analytics matter.
Caching Strategy
The redirect path (read) vastly outnumbers URL creation (write) at 100:1. Cache shortKey→longURL mappings in Redis. Apply 80-20 rule: 20% of URLs generate 80% of traffic — cache only the hot set. Use LRU eviction. With a 90% cache hit rate, only 120 RPS reach the database instead of 1,200.
Custom Aliases
Users can specify a custom short key (e.g., short.ly/my-brand). Validation pipeline: (1) uniqueness check in DB, (2) character validation — alphanumeric + hyphens only, (3) reserved words check (block "admin", "api", "help"). Store alongside system-generated keys in the same table with a flag indicating it is user-defined.
Link Expiration
URLs can expire after a set time. Two mechanisms: (1) Real-time check — on every redirect, compare current time against expiration_date. Return HTTP 410 Gone if expired. (2) Background cleanup — a cron job periodically deletes expired entries from the DB and evicts them from Redis. Default TTL can be 1 year if unspecified.
Database Choice: NoSQL over SQL
URL shorteners are essentially a massive key-value store (shortKey → longURL). NoSQL (DynamoDB, Cassandra) outperforms SQL here because: single-key lookups are the primary operation, no complex joins are needed, horizontal sharding is trivial with shortKey as the partition key, and write throughput at millions/day requires distributed nodes. Schema: URLs table (short_key PK, long_url, user_id, created_at, expires_at, click_count) + Users table.
