Advanced | 20 min read | Real-world Case Studies

Design Uber

Streaming driver location coordinates over WebSockets and matching riders to nearby drivers dynamically via a geospatial index (Redis GEO / geohash cells).

What you'll learn

WebSockets for Real-Time GPS
Redis GEO for Geospatial Indexing
Geohash / S2 Cells for Spatial Partitioning
Driver-Rider Matching Algorithm
Surge Pricing (Supply/Demand Ratio)
Kafka + PostgreSQL for Trip Lifecycle

TL;DR

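Drivers stream GPS coordinates over persistent WebSocket connections into a Redis GEO index. The Matching Service queries that index for nearby available drivers, surge pricing is computed per geohash cell from the demand/supply ratio, and trip events are published to Kafka, with PostgreSQL holding the persistent trip record.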

Visual System Topology

Uber — Real-Time Ride Matching Architecture

Driver App (GPS every 4s) ──► Location Service (WebSocket server) ──► Geospatial Index (Redis GEO / Geohash)

Rider App: requests ride
Matching Service: GEORADIUS query
Surge Pricing: supply/demand ratio
Kafka: trip events log
PostgreSQL: trips, users, drivers
Cassandra: location history

Match flow: Rider requests → GEORADIUS query on Redis → nearest available drivers → offer sent → first accept → trip starts
Surge: demand/supply ratio per geohash cell → multiplier applied → recalculated every 30 sec

Concept Overview

Uber is a real-time, geo-distributed ride-matching platform. Its core challenge: match each rider to a nearby driver within 2 seconds, using live GPS data that arrives every 4 seconds from each of 1M+ active drivers.

Functional Requirements:

Rider requests a ride (pickup location, destination)
Match rider to nearest available driver in < 2 seconds
Real-time GPS tracking of the driver's location
Dynamic pricing (surge when demand > supply)
Trip management (start, complete, cancel)
Ratings, payments, driver earnings

Non-Functional Requirements:

Driver location updates: 250K updates/sec (1M drivers × once/4 sec)
Matching latency: < 2 seconds from ride request to driver offer
99.9% availability
Location data freshness: < 5 seconds stale

Capacity Estimation:

Rides/month: 100M ≈ 3.3M/day ≈ 38.6 rides/sec on average
Active drivers at peak: 1M globally
GPS updates: 1M drivers × 1 update/4 sec = 250K writes/sec to Redis
Rider requests: ≈ 39/sec on average globally; peak rates are higher but remain far below the location update rate
Location storage (Cassandra history): 250K updates/sec × 100 bytes (assumed record size) = 25 MB/sec ≈ 2.2 TB/day
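
The arithmetic behind these estimates can be checked with a short script. This is only a back-of-the-envelope sketch: the 30-day month and the 100-byte record size are assumptions taken from the figures above, not measured values.

```python
SECONDS_PER_DAY = 24 * 60 * 60

rides_per_month = 100_000_000
active_drivers = 1_000_000
update_interval_s = 4
bytes_per_update = 100  # assumed size of one stored location record

# Ride requests: monthly volume spread evenly over a 30-day month.
rides_per_day = rides_per_month / 30
rides_per_sec = rides_per_day / SECONDS_PER_DAY

# Location writes: every active driver reports once per interval.
location_writes_per_sec = active_drivers / update_interval_s

# History storage: every location write is also appended to the history store.
history_bytes_per_sec = location_writes_per_sec * bytes_per_update
history_tb_per_day = history_bytes_per_sec * SECONDS_PER_DAY / 1e12

print(f"rides/sec (average):   {rides_per_sec:.1f}")
print(f"location writes/sec:   {location_writes_per_sec:,.0f}")
print(f"history MB/sec:        {history_bytes_per_sec / 1e6:.0f}")
print(f"history TB/day:        {history_tb_per_day:.2f}")
```

The gap between the two rates (tens of requests per second versus hundreds of thousands of location writes per second) is what makes the location write path the dominant scaling concern.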

Key Architectural Pillars

WebSockets for Real-Time GPS

Drivers send GPS coordinates every 4 seconds. Sending each update as a separate HTTP request over a new connection would pay connection setup (TCP and TLS handshakes) plus per-request header overhead 250K times per second, which is wasteful. WebSockets maintain one persistent connection per driver throughout their shift. The Location Service receives the stream of {driver_id, lat, lng, heading, speed} updates and writes them to Redis GEO in real time. Riders also receive driver location updates via WebSocket for the live map view.

Example: Driver goes online → WebSocket connection established → phone sends {lat: 37.774, lng: -122.418} every 4s over the same connection → Location Service updates Redis and Cassandra → Rider app receives live position via their WebSocket connection.
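
As a rough illustration of what the Location Service might do with each incoming frame, the sketch below parses one JSON message and builds the arguments for a GEOADD call. The names `LocationUpdate`, `parse_location_update`, and `to_geoadd_args` are hypothetical, the JSON field names follow the update shape described above, and the WebSocket transport and Redis client are deliberately left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationUpdate:
    driver_id: str
    lat: float
    lng: float
    heading: float
    speed: float


def parse_location_update(raw: str) -> LocationUpdate:
    """Parse one JSON frame received from a driver's connection."""
    data = json.loads(raw)
    update = LocationUpdate(
        driver_id=str(data["driver_id"]),
        lat=float(data["lat"]),
        lng=float(data["lng"]),
        heading=float(data.get("heading", 0.0)),  # optional in this sketch
        speed=float(data.get("speed", 0.0)),      # optional in this sketch
    )
    # Reject impossible coordinates before they reach the index.
    if not (-90.0 <= update.lat <= 90.0 and -180.0 <= update.lng <= 180.0):
        raise ValueError(f"coordinates out of range: {update.lat}, {update.lng}")
    return update


def to_geoadd_args(update: LocationUpdate, key: str = "active_drivers") -> tuple:
    """Build the command arguments; GEOADD takes longitude before latitude."""
    return ("GEOADD", key, update.lng, update.lat, update.driver_id)


frame = '{"driver_id": "driver_42", "lat": 37.774, "lng": -122.418, "heading": 90, "speed": 12.5}'
print(to_geoadd_args(parse_location_update(frame)))
```

Validating at the edge keeps malformed GPS readings out of the shared index, where a single bad point could otherwise surface in matching results.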

Redis GEO for Geospatial Indexing

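A plain SQL table with lat/lng columns requires WHERE lat BETWEEN X AND Y AND lng BETWEEN A AND B. Without a spatial index this becomes a slow range scan at scale, even with a composite index, because a B-tree can narrow only one of the two ranges efficiently. Redis GEO stores driver locations in a sorted set, using a geohash-derived integer as the score. The GEORADIUS command returns all drivers within a radius of a point in O(N+log(M)), where N is the number of elements inside the bounding box around the search area and M is the number of items in the index. (GEORADIUS is deprecated as of Redis 6.2 in favor of GEOSEARCH, which supports the same kind of query.) At 1M active drivers, a query with a small radius typically completes in < 10ms.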

Example: Driver location stored: GEOADD active_drivers -122.418 37.774 "driver_42" (longitude first, then latitude). Rider requests at (37.77, -122.41): GEORADIUS active_drivers -122.41 37.77 5 km ASC COUNT 10 → returns up to 10 of the nearest drivers, sorted by distance, within milliseconds.
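
To make the semantics of the radius query concrete, here is a minimal in-memory stand-in written in plain Python. It is not how Redis implements the command: this sketch scans every driver and computes the haversine distance, whereas Redis narrows the search through its geohash-scored sorted set. The function names and the sample driver positions are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_drivers(drivers: dict, lat: float, lng: float, radius_km: float, count: int) -> list:
    """Return up to `count` (driver_id, distance_km) pairs within the radius, nearest first."""
    hits = []
    for driver_id, (d_lat, d_lng) in drivers.items():
        dist = haversine_km(lat, lng, d_lat, d_lng)
        if dist <= radius_km:
            hits.append((dist, driver_id))
    hits.sort()  # ascending by distance, like the ASC option
    return [(driver_id, round(dist, 2)) for dist, driver_id in hits[:count]]


# Hypothetical driver positions as (lat, lng).
active_drivers = {
    "driver_42": (37.774, -122.418),
    "driver_7": (37.780, -122.410),
    "driver_oakland": (37.8044, -122.2712),
}

print(nearest_drivers(active_drivers, 37.77, -122.41, radius_km=5, count=10))
```

The linear scan is fine for a handful of points but would not survive 1M drivers, which is the reason for delegating this query to a purpose-built geospatial index.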

Geohash / S2 Cells for Spatial Partitioning

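The Earth's surface is divided into cells at multiple precision levels. A geohash string encodes a rectangular region, and each added character shrinks it: a 4-character cell such as "9q8y" is roughly 39 km × 20 km at the equator, and a 5-character cell such as "9q8yy" is roughly 5 km × 5 km. Drivers in the same cell are nearby. Benefits: (1) Cells that share a prefix are close together (geohash "9q8yy" and "9q8yz" are adjacent), although the converse does not always hold: two points on either side of a cell boundary can have different prefixes, so proximity queries usually check the neighboring cells as well, (2) Cell-based surge pricing calculation (demand/supply ratio per cell), (3) Sharding: each geohash cell can be handled by a different server. Uber has reportedly used Google S2 cells, which are more uniform in size than geohash cells, and later open-sourced its own hexagonal grid system, H3.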

Example: San Francisco geohash "9q8y" covers most of the city. Zoomed in, "9q8yy" is a cell roughly 4 to 5 km on a side that includes the Mission District. Surge pricing: compare the riders who requested in this cell in the last 5 minutes with the drivers currently available in it → ratio > 1.5 → 1.5x surge multiplier applied.
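
The prefix property is easier to see with the encoding written out. The following is a compact sketch of the standard geohash algorithm: it repeatedly halves the longitude and latitude ranges, interleaves the resulting bits (longitude first), and emits one base-32 character per 5 bits. The function name `geohash_encode` is chosen here for illustration.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lng: float, precision: int = 5) -> str:
    """Encode a coordinate as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lng = True  # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits = (bits << 1) | 1
                lng_lo = mid
            else:
                bits <<= 1
                lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(37.7749, -122.4194, precision=4))
print(geohash_encode(37.7749, -122.4194, precision=5))
```

Because each extra character only refines the same halving process, a longer hash always starts with the shorter hash of the enclosing cell, which is what makes prefix-based grouping and sharding possible.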

Driver-Rider Matching Algorithm

When a rider requests, the Matching Service: (1) Queries Redis GEORADIUS for drivers within 5km, (2) Filters by availability (no active trip, online) and car type, (3) Estimates ETA using real-time traffic data for each candidate, (4) Ranks by ETA + rating score, (5) Sends ride offer to top 3 drivers simultaneously. First driver to accept gets the trip. Offer expires in 10 seconds (driver must respond). If rejected by all 3, expand radius and retry.

Example: Rider at 37.77N, -122.41W requests UberX. GEORADIUS returns 15 drivers within 5km. Filter: 10 are available. ETA estimate via routing API: 3 drivers are < 3 minutes away. Offer sent to all 3. Driver_12 accepts in 4 seconds → trip assigned.
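
The filter, rank, and offer steps can be sketched as below. Several things here are assumptions made for illustration: the `Candidate` fields, the scoring formula (ETA in minutes plus a small penalty for lower ratings), and the `TripOffer` class. The source only states that candidates are ranked by ETA and rating and that the first driver to accept gets the trip.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    driver_id: str
    available: bool
    car_type: str
    eta_min: float  # assumed to come from a routing/traffic service
    rating: float   # 1.0 to 5.0


def rank_score(c: Candidate, rating_weight: float = 0.5) -> float:
    """Lower is better. Hypothetical blend of ETA and rating."""
    return c.eta_min + rating_weight * (5.0 - c.rating)


def select_offers(candidates: list, car_type: str, max_offers: int = 3) -> list:
    """Filter by availability and car type, then return the best driver ids."""
    eligible = [c for c in candidates if c.available and c.car_type == car_type]
    eligible.sort(key=rank_score)
    return [c.driver_id for c in eligible[:max_offers]]


class TripOffer:
    """Tracks one ride offer sent to several drivers; only the first accept wins."""

    def __init__(self, offered_to: list):
        self.offered_to = set(offered_to)
        self.assigned_driver = None

    def accept(self, driver_id: str) -> bool:
        if driver_id not in self.offered_to or self.assigned_driver is not None:
            return False
        self.assigned_driver = driver_id
        return True


candidates = [
    Candidate("driver_1", True, "UberX", 2.0, 4.9),
    Candidate("driver_2", False, "UberX", 1.0, 4.9),   # on an active trip
    Candidate("driver_3", True, "UberXL", 1.5, 4.7),   # wrong car type
    Candidate("driver_4", True, "UberX", 2.5, 4.5),
    Candidate("driver_5", True, "UberX", 2.8, 5.0),
    Candidate("driver_12", True, "UberX", 2.6, 4.8),
]

offer = TripOffer(select_offers(candidates, "UberX"))
print(sorted(offer.offered_to))
print(offer.accept("driver_12"), offer.accept("driver_1"))
```

In a distributed deployment the first-accept check would need to be atomic across servers, for example through a conditional write in a shared store, since two accepts can arrive at different Matching Service instances at nearly the same time.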

Surge Pricing (Supply/Demand Ratio)

Surge pricing is calculated per geohash cell every 30 seconds: (demand_rate: ride requests per 5 min in cell) ÷ (supply_rate: available drivers in cell) = ratio. If ratio > threshold → surge multiplier applied (1.2x, 1.5x, 2.0x, up to configured max). Calculated by a Surge Pricing Service that reads aggregated metrics from Kafka (rider requests) and Redis GEO (available drivers per cell). Stored in Redis: surge:{cell_id} → multiplier with 30-second TTL.

Example: Rainy Friday night in Downtown Manhattan (geohash "dr5r"): 200 ride requests in the last 5 min but only 20 available drivers. Ratio = 200 ÷ 20 = 10. Surge = 2.5x (capped at the configured max). All new ride requests in this cell display the 2.5x multiplier before the rider confirms.
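
A per-cell calculation along these lines might look like the sketch below. The tier table and the 2.5x cap are hypothetical values chosen to line up with the examples in this article; a real deployment would tune them, and the zero-supply handling is likewise an assumption.

```python
# (ratio threshold, multiplier) pairs in ascending order. Hypothetical values.
SURGE_TIERS = [(1.2, 1.2), (1.5, 1.5), (2.0, 2.0), (2.5, 2.5)]
MAX_MULTIPLIER = 2.5  # configured cap, assumed


def surge_multiplier(requests_last_5_min: int, available_drivers: int) -> float:
    """Map one cell's demand/supply ratio to a capped surge multiplier."""
    if requests_last_5_min == 0:
        return 1.0  # no demand, no surge
    if available_drivers == 0:
        return MAX_MULTIPLIER  # demand with no supply: apply the cap
    ratio = requests_last_5_min / available_drivers
    multiplier = 1.0
    for threshold, tier_multiplier in SURGE_TIERS:
        if ratio > threshold:
            multiplier = tier_multiplier  # keep the highest tier exceeded
    return min(multiplier, MAX_MULTIPLIER)


print(surge_multiplier(200, 20))  # the Manhattan example: ratio 10
print(surge_multiplier(32, 20))   # ratio 1.6
print(surge_multiplier(10, 20))   # ratio 0.5
```

Using discrete tiers rather than the raw ratio keeps displayed prices stable when the ratio fluctuates slightly between 30-second recalculations.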

Kafka + PostgreSQL for Trip Lifecycle

Every significant trip event (requested, driver_assigned, trip_started, trip_completed, payment_processed, rated) is published to Kafka. Downstream consumers: Analytics Service (build dashboards), Billing Service (finalize fare), Notification Service (send receipts), ML Training (improve ETA models). PostgreSQL stores the persistent trip record with all state transitions. This event-driven design decouples the core trip flow from all the downstream operations.

Example: Trip completed event → Kafka → (Billing calculates final fare based on distance/time) + (Notification sends email receipt) + (Analytics updates driver earnings report) + (ML training logs actual vs estimated ETA for model improvement). All in parallel, none blocking the trip completion.
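
The fan-out pattern can be illustrated with a tiny in-memory event bus standing in for Kafka. This is only a sketch of the decoupling idea: the bus below calls handlers synchronously in one process, whereas Kafka consumers read the topic independently and at their own pace. The class name, the handler functions, and the fare rates are all hypothetical.

```python
from collections import defaultdict


class InMemoryEventBus:
    """Toy stand-in for a Kafka topic: keeps a log and notifies subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # append-only record of published events

    def subscribe(self, event_type: str, handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.log.append((event_type, payload))
        for handler in self._subscribers[event_type]:
            handler(payload)


fares = {}
receipts = []

# Hypothetical fare rates, for illustration only.
BASE_FARE, PER_KM, PER_MIN = 2.0, 1.5, 0.25


def billing_handler(event: dict) -> None:
    """Billing: compute the final fare from distance and time."""
    fares[event["trip_id"]] = BASE_FARE + PER_KM * event["distance_km"] + PER_MIN * event["duration_min"]


def notification_handler(event: dict) -> None:
    """Notification: queue a receipt for the rider."""
    receipts.append(f"receipt for trip {event['trip_id']} to rider {event['rider_id']}")


bus = InMemoryEventBus()
bus.subscribe("trip_completed", billing_handler)
bus.subscribe("trip_completed", notification_handler)

bus.publish("trip_completed", {"trip_id": "t1", "rider_id": "r9", "distance_km": 8.0, "duration_min": 20})
print(fares, receipts)
```

The trip service only publishes the event and never calls Billing or Notification directly, so adding another consumer (for example, an analytics job) requires no change to the code that completes the trip.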