Advanced · 20 min read · Real-world Case Studies

Design Amazon

Decoupling shopping carts, order checkout, payment APIs, and distributed transaction locking.

What you'll learn

  • Microservices Architecture
  • DynamoDB for Shopping Cart
  • Saga Pattern for Distributed Transactions
  • Idempotency Keys for Payment Safety
  • Inventory Locking (Optimistic vs Pessimistic)
  • Elasticsearch for Product Search

Visual System Topology

[Figure: Amazon e-commerce microservices architecture]

Components shown in the diagram:

  • Client
  • API Gateway
  • Kafka Event Bus
  • Catalog Service: Elasticsearch
  • Cart Service: DynamoDB
  • Order Service: PostgreSQL + Saga
  • Payment Service: idempotency keys
  • Inventory Service: optimistic locking
  • Fulfillment Service: warehouse routing

Saga order flow: Order created → reserve inventory → charge payment → assign fulfillment → notify
Compensation: payment fails → release inventory reservation → cancel order (refund only if a charge was already captured)

Concept Overview

Amazon is one of the world's largest e-commerce platforms. Its design must be highly available under peak load (Black Friday), strongly consistent for payments, and able to handle thousands of order transactions per second.

Functional Requirements:

  • Product catalog browsing and search
  • Shopping cart management (session-based, persistent)
  • Order placement and payment processing
  • Inventory management and stock reservation
  • Order tracking and fulfillment routing
  • Reviews, ratings, and recommendations

Non-Functional Requirements:

  • 99.99% availability during Black Friday (4 nines)
  • < 200ms product search response time
  • Exactly-once payment processing (never double-charge)
  • Horizontal scalability for 10x traffic spikes

Capacity Estimation (300M users, 20M DAU):

  • Orders/day: 1.5M normal, 15M on Black Friday peak
  • Peak order rate: 15M ÷ 86,400 s ≈ 174 orders/sec on average; × 10 safety factor ≈ 1,740 orders/sec at peak
  • Product catalog: 350M+ products
  • Product searches: 20M DAU × 10 searches = 200M/day = 2,315 searches/sec
  • Cart operations: 20M × 5 adds/day = 100M/day = 1,157 cart writes/sec
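
The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper name per_second is hypothetical, and the 10x safety factor is an assumption inferred from the 1,740 orders/sec figure (15M ÷ 86,400 alone gives about 174).

```python
SECONDS_PER_DAY = 86_400

def per_second(events_per_day: float, safety_factor: float = 1.0) -> float:
    """Convert a daily volume into an average per-second rate, optionally padded."""
    return events_per_day / SECONDS_PER_DAY * safety_factor

dau = 20_000_000

# Assumed 10x safety factor on top of the Black Friday daily average.
peak_orders = per_second(15_000_000, safety_factor=10)
searches = per_second(dau * 10)    # 10 searches per daily active user
cart_writes = per_second(dau * 5)  # 5 cart adds per daily active user

print(f"peak orders/sec: {peak_orders:,.0f}")
print(f"searches/sec:    {searches:,.0f}")
print(f"cart writes/sec: {cart_writes:,.0f}")
```

These are daily averages; real traffic is burstier than a uniform spread over 86,400 seconds, which is why a safety factor is applied to the order rate.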

Key Architectural Pillars

1. Microservices Architecture

Amazon pioneered the "two-pizza team" rule: each service is owned by a team small enough to be fed by two pizzas. Services: Catalog (product data), Cart (session state), Order (order lifecycle), Payment (transaction processing), Inventory (stock levels), Fulfillment (warehouse routing), Notification (email/SMS). Each service has its own database (Database-per-Service pattern), so a slowdown in the Catalog search store does not contend with, or block, Order processing.

Example: Amazon reportedly runs hundreds of microservices. A single page such as checkout can fan out to 100+ service calls in parallel (recommendations, inventory check, payment validation, address verification, etc.), all coordinated in under 1 second.
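
The parallel fan-out described above can be sketched with asyncio. This is an illustration under assumptions, not Amazon's implementation: the service names, the simulated latencies, and the functions call_service and render_checkout are all hypothetical, and the network calls are replaced by sleeps.

```python
import asyncio

async def call_service(name: str, latency_s: float) -> tuple[str, str]:
    """Stand-in for a network call to one downstream service."""
    await asyncio.sleep(latency_s)
    return name, "ok"

async def render_checkout() -> dict[str, str]:
    # Hypothetical services and latencies, for illustration only.
    calls = [
        call_service("recommendations", 0.03),
        call_service("inventory_check", 0.02),
        call_service("payment_validation", 0.05),
        call_service("address_verification", 0.04),
    ]
    # gather() runs the calls concurrently, so total time tracks the slowest
    # call rather than the sum; wait_for() enforces the 1-second page budget.
    results = await asyncio.wait_for(asyncio.gather(*calls), timeout=1.0)
    return dict(results)

checkout = asyncio.run(render_checkout())
print(checkout)
```

Because the calls overlap, the page waits roughly as long as the slowest dependency, which is what makes a sub-second budget feasible with many downstream services.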

2. DynamoDB for Shopping Cart

The shopping cart maps naturally to a key-value store: key = user_id, value = {list of items, quantities, prices}. DynamoDB fits well: O(1) get/put by user_id, very low read/write latency (< 5ms), horizontal scaling to millions of writes/sec, and on-demand capacity mode to absorb Black Friday spikes largely without manual capacity changes. A SQL database would need complex sharding to sustain this access pattern at the same scale.

Example: Cart schema: partition_key = user_id, items = [{product_id, quantity, price_snapshot, added_at}, ...]. PutItem on add-to-cart (it replaces the whole cart item; UpdateItem suits changing a single entry), GetItem on checkout, DeleteItem after purchase. No joins needed, because the cart is purely user-scoped data.
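
The access pattern above can be sketched with an in-memory stand-in for the key-value table. This is not the DynamoDB API: CartStore, CartItem, and their methods are hypothetical names, the dict replaces the table, and the price is stored as integer cents (an assumption) to avoid floating-point rounding.

```python
import time
from dataclasses import dataclass, field

@dataclass
class CartItem:
    product_id: str
    quantity: int
    price_snapshot_cents: int  # price at add time, in cents (assumed unit)
    added_at: float = field(default_factory=time.time)

class CartStore:
    """In-memory stand-in for a key-value table keyed by user_id."""

    def __init__(self) -> None:
        self._table: dict[str, list[CartItem]] = {}

    def add_item(self, user_id: str, item: CartItem) -> None:
        cart = self._table.setdefault(user_id, [])
        for existing in cart:
            if existing.product_id == item.product_id:
                existing.quantity += item.quantity  # merge repeated adds
                return
        cart.append(item)

    def get_cart(self, user_id: str) -> list[CartItem]:
        # Single lookup by key: no joins, no scans.
        return list(self._table.get(user_id, []))

    def clear_cart(self, user_id: str) -> None:
        self._table.pop(user_id, None)

store = CartStore()
store.add_item("user-1", CartItem("sku-42", 1, 1999))
store.add_item("user-1", CartItem("sku-42", 2, 1999))
store.add_item("user-1", CartItem("sku-7", 1, 500))

cart = store.get_cart("user-1")   # read at checkout
store.clear_cart("user-1")        # delete after purchase
```

Every operation touches exactly one key, which is the property that lets a key-value store partition carts across nodes by user_id.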

3. Saga Pattern for Distributed Transactions

In a monolith, order checkout is one ACID database transaction. In microservices, it spans 5 services. Two-Phase Commit (2PC) is a poor fit here because it holds locks across services, so one slow participant blocks everyone. The Saga pattern breaks the transaction into a sequence of local transactions, each publishing an event that triggers the next step. If any step fails, compensating transactions undo the steps that already completed.

Example: Order saga: Reserve inventory → Charge payment → Assign fulfillment → Send confirmation. If "Charge payment" fails: publish "payment-failed" event → Inventory Service runs compensating transaction "release reservation" → Order status set to FAILED. No distributed lock needed.
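
The compensation logic can be sketched as a small in-process orchestrator. Note the assumptions: the text describes an event-driven (choreographed) saga, while this sketch uses a central run_saga function for brevity; the step functions, the order dict fields, and PaymentDeclined are all hypothetical.

```python
from typing import Callable

# A step is (name, action, compensation).
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]

class PaymentDeclined(Exception):
    pass

def run_saga(order: dict, steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: list[str] = []
    completed: list[Step] = []
    try:
        for step in steps:
            name, action, _ = step
            action(order)
            log.append(f"done:{name}")
            completed.append(step)
        order["status"] = "CONFIRMED"
    except Exception as exc:
        log.append(f"failed:{type(exc).__name__}")
        for name, _, compensate in reversed(completed):
            compensate(order)
            log.append(f"compensated:{name}")
        order["status"] = "FAILED"
    return log

def reserve_inventory(order: dict) -> None:
    order["reserved"] = True

def release_inventory(order: dict) -> None:
    order["reserved"] = False

def charge_payment(order: dict) -> None:
    if order["card"] == "declined":
        raise PaymentDeclined()
    order["charged"] = True

def refund_payment(order: dict) -> None:
    order["charged"] = False

def assign_fulfillment(order: dict) -> None:
    order["warehouse"] = "W1"

def unassign_fulfillment(order: dict) -> None:
    order.pop("warehouse", None)

STEPS: list[Step] = [
    ("reserve_inventory", reserve_inventory, release_inventory),
    ("charge_payment", charge_payment, refund_payment),
    ("assign_fulfillment", assign_fulfillment, unassign_fulfillment),
]

order = {"id": "o-1", "card": "declined"}
saga_log = run_saga(order, STEPS)
print(saga_log, order["status"])
```

Only steps that completed are compensated, so a declined payment releases the reservation but never issues a refund. In a real system each compensation must itself be retryable, since it can also fail.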

4. Idempotency Keys for Payment Safety

Payment processing must be effectively exactly-once: double-charging a customer is a severe business and legal problem. If a network timeout occurs after the payment succeeds but before the response reaches the client, the client may retry and cause a second charge. Solution: the client generates a unique idempotency_key (UUID) per checkout attempt and reuses it on every retry of that attempt. The Payment Service stores {idempotency_key → result} in Redis. On a retry with the same key, it returns the stored result and does not charge again.

Example: Client sends {idempotency_key: "uuid-123", amount: 99.99}. Payment succeeds, stored in Redis. Network drops. Client retries with same uuid-123. Payment Service finds it in Redis → returns "success" immediately, no new charge. Without this: customer sees two charges on their credit card.
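
The retry scenario can be sketched with a dict standing in for Redis. The PaymentService class, its charge method, and the result fields are hypothetical, and amounts are in integer cents by assumption.

```python
import uuid

class PaymentService:
    """Toy payment service; a dict stands in for the Redis result store."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        cached = self._results.get(idempotency_key)
        if cached is not None:
            return cached  # retry: replay the stored result, no new charge
        self.charges_made += 1  # the only place money actually moves
        result = {
            "status": "success",
            "amount_cents": amount_cents,
            "charge_id": f"ch-{self.charges_made}",
        }
        self._results[idempotency_key] = result
        return result

service = PaymentService()
key = str(uuid.uuid4())              # generated once per checkout attempt

first = service.charge(key, 9999)    # response is lost to a network drop
retry = service.charge(key, 9999)    # client retries with the same key
print(first == retry, service.charges_made)
```

The check-then-store sequence here is not atomic, so two concurrent retries could both miss the cache. With a shared store, the key would need to be claimed atomically (for example with Redis SET and its NX option) before charging.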

5. Inventory Locking (Optimistic vs Pessimistic)

Pessimistic locking: lock the inventory row when a user adds to cart, so no other user can buy the same last item. Problem: locks are held for seconds during checkout, causing massive contention at scale. Optimistic locking: let multiple users proceed and check stock at payment time using a version number: SELECT stock, version FROM inventory WHERE product_id = X → UPDATE inventory SET stock = stock - 1, version = version + 1 WHERE product_id = X AND version = {saved_version} → if the update changes 0 rows, another user bought it first (conflict). Re-read and retry, or show "out of stock." (The table name inventory is illustrative.)

Example: Optimistic: User A and B both see 1 item in stock. A's UPDATE sets version 1→2, stock 1→0 (success). B's UPDATE: WHERE version = 1 → 0 rows updated (version already 2) → B gets "out of stock." No locks held, highly concurrent.
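
The User A / User B race can be reproduced with an in-memory SQLite database. This is a minimal sketch: the inventory table layout and the helpers read_stock and try_purchase are assumptions, and the extra stock > 0 guard is an addition not present in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "product_id TEXT PRIMARY KEY, stock INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO inventory VALUES ('sku-42', 1, 1)")
conn.commit()

def read_stock(product_id: str) -> tuple[int, int]:
    """Return (stock, version) without taking any lock."""
    return conn.execute(
        "SELECT stock, version FROM inventory WHERE product_id = ?",
        (product_id,),
    ).fetchone()

def try_purchase(product_id: str, seen_version: int) -> bool:
    """Decrement stock only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE inventory SET stock = stock - 1, version = version + 1 "
        "WHERE product_id = ? AND version = ? AND stock > 0",
        (product_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows means someone else won the race

# Both users read the same snapshot: 1 item in stock, version 1.
_, version_a = read_stock("sku-42")
_, version_b = read_stock("sku-42")

a_ok = try_purchase("sku-42", version_a)  # succeeds: version 1 → 2, stock 1 → 0
b_ok = try_purchase("sku-42", version_b)  # fails: version is no longer 1
final = read_stock("sku-42")
print(a_ok, b_ok, final)
```

The conflict is detected by the row count of a single UPDATE, so no lock is held between the read and the write.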

6. Elasticsearch for Product Search

The product catalog has 350M+ items with rich attributes (name, brand, category, specs, price, ratings). SQL LIKE queries are too slow for full-text search at this scale. Elasticsearch provides full-text search with relevance ranking, faceted filtering (category, price range, rating), autocomplete suggestions, and near-real-time index updates. Product metadata is synced asynchronously from PostgreSQL to Elasticsearch after each update.

Example: Search "wireless headphones under $100 with 5 stars": Elasticsearch query with full-text match + price range filter + rating filter → returns ranked results in < 50ms for 350M products. An equivalent SQL LIKE '%...%' query cannot use an ordinary index and would need a full table scan.
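
The example search can be expressed as an Elasticsearch Query DSL body, built here as a plain Python dict. The field names name, price, and rating and the helper build_search_body are assumptions about the index mapping; sending the body to a cluster is left out so the sketch runs without one.

```python
import json

def build_search_body(text: str, max_price: float, min_rating: float) -> dict:
    """Build a bool query: scored full-text match plus unscored filters."""
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score.
                "must": [{"match": {"name": text}}],
                # "filter" clauses only include/exclude and can be cached.
                "filter": [
                    {"range": {"price": {"lt": max_price}}},
                    {"range": {"rating": {"gte": min_rating}}},
                ],
            }
        },
        "size": 20,
    }

search_body = build_search_body("wireless headphones", max_price=100, min_rating=5)
print(json.dumps(search_body, indent=2))
```

Keeping price and rating in filter context rather than must means they narrow the result set without distorting the text relevance ranking.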
