Advanced | 20 min read | Real-world Case Studies

Design Amazon

Decoupling shopping carts, order checkout, payment APIs, and distributed transaction locking.

What you'll learn

Microservices Architecture
DynamoDB for Shopping Cart
Saga Pattern for Distributed Transactions
Idempotency Keys for Payment Safety
Inventory Locking (Optimistic vs Pessimistic)
Elasticsearch for Product Search


Visual System Topology

Amazon: E-Commerce Microservices Architecture

Client ──► API Gateway ──► Kafka Event Bus

Catalog Service: Elasticsearch
Cart Service: DynamoDB
Order Service: PostgreSQL + Saga
Payment Service: idempotency keys
Inventory Service: optimistic locking
Fulfillment Service: warehouse routing

Saga order flow: Order created → reserve inventory → charge payment → assign fulfillment → notify
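Compensation: payment fails → release inventory reservation → cancel order (no refund, since nothing was charged). A refund is the compensating step only when a later step, such as fulfillment assignment, fails after payment succeeded.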

Concept Overview

Amazon is the world's largest e-commerce platform, requiring a system that is simultaneously highly available (Black Friday!), strongly consistent for payments, and able to handle thousands of order transactions per second.

Functional Requirements:

Product catalog browsing and search
Shopping cart management (session-based, persistent)
Order placement and payment processing
Inventory management and stock reservation
Order tracking and fulfillment routing
Reviews, ratings, and recommendations

Non-Functional Requirements:

99.99% availability during Black Friday (4 nines)
< 200ms product search response time
Exactly-once payment processing (never double-charge)
Horizontal scalability for 10x traffic spikes

Capacity Estimation (300M users, 20M DAU):

Orders/day: 1.5M normal, 15M on Black Friday peak
Peak order rate: 15M ÷ 86,400 ≈ 174 orders/sec average; × 10 safety factor for peak-hour concentration ≈ 1,740 orders/sec at peak
Product catalog: 350M+ products
Product searches: 20M DAU × 10 searches = 200M/day = 2,315 searches/sec
Cart operations: 20M × 5 adds/day = 100M/day = 1,157 cart writes/sec
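The capacity figures above can be reproduced with a few lines of arithmetic. The ×10 peak factor is inferred from the numbers (15M orders/day averages about 174 orders/sec, and 1,740 is ten times that); the search and cart rates assume traffic spread evenly over 24 hours, as the estimates above do.

```python
SECONDS_PER_DAY = 86_400


def per_second(per_day: float, peak_factor: float = 1.0) -> float:
    """Convert a daily volume into a per-second rate, optionally scaled by a peak factor."""
    return per_day / SECONDS_PER_DAY * peak_factor


DAU = 20_000_000

# Black Friday orders with an assumed 10x peak-to-average factor.
peak_orders = per_second(15_000_000, peak_factor=10)
# 10 searches per daily active user, spread evenly over the day.
searches = per_second(DAU * 10)
# 5 add-to-cart writes per daily active user, spread evenly over the day.
cart_writes = per_second(DAU * 5)

print(f"peak orders/sec: {peak_orders:,.0f}")
print(f"searches/sec:    {searches:,.0f}")
print(f"cart writes/sec: {cart_writes:,.0f}")
```

Running it prints 1,736, 2,315, and 1,157 per second; the first rounds to the ~1,740 figure used above.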

Key Architectural Pillars

Microservices Architecture

Amazon pioneered the "two-pizza team" rule: each service is owned by a team small enough to be fed by two pizzas. Services: Catalog (product data), Cart (session state), Order (order lifecycle), Payment (transaction processing), Inventory (stock levels), Fulfillment (warehouse routing), Notification (email/SMS). Each service owns its own database (Database-per-Service pattern), so a slow Catalog search query cannot exhaust connections or hold locks in the Order database. Synchronous calls between services can still propagate latency, which is why they need timeouts.

Example: Amazon runs hundreds of microservices. The checkout page makes 100+ service calls in parallel (recommendations, inventory check, payment validation, address verification, etc.), and the results are aggregated in under one second.
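A rough sketch of how a page can fan out to many services in parallel while staying inside a latency budget: each call gets its own timeout, and an optional widget (here, recommendations) is dropped if it is too slow. The service names, latencies, the 200 ms timeout, and the degrade-on-timeout policy are illustrative assumptions; asyncio.sleep stands in for real network calls.

```python
import asyncio
from typing import Dict, List


async def call_service(name: str, latency_s: float) -> Dict:
    """Hypothetical downstream call; sleeping simulates network latency."""
    await asyncio.sleep(latency_s)
    return {"service": name, "ok": True}


async def fetch_with_timeout(name: str, latency_s: float, timeout_s: float) -> Dict:
    # Per-call deadline: one slow dependency cannot stall the whole page.
    try:
        return await asyncio.wait_for(call_service(name, latency_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"service": name, "ok": False}  # render the page without this widget


async def build_checkout_page(timeout_s: float = 0.2) -> List[Dict]:
    calls = {
        "recommendations": 0.5,  # deliberately slow to exercise the timeout path
        "inventory-check": 0.01,
        "payment-validation": 0.02,
        "address-verification": 0.01,
    }
    # All calls run concurrently; total latency is roughly the slowest call, capped by the timeout.
    return list(await asyncio.gather(
        *(fetch_with_timeout(name, lat, timeout_s) for name, lat in calls.items())
    ))


results = asyncio.run(build_checkout_page())
for r in results:
    print(r)
```

asyncio.gather returns results in the order the calls were listed, so the page can assemble sections deterministically even though they complete out of order.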

DynamoDB for Shopping Cart

A shopping cart is naturally a key-value record: key = user_id, value = {list of items, quantities, prices}. DynamoDB fits this well: constant-time get/put by user_id, single-digit-millisecond read/write latency, horizontal scaling to millions of writes/sec across partitions, and on-demand capacity mode, which absorbs most Black Friday spikes without pre-provisioning capacity. A relational database would need manual sharding to sustain the same write volume for such a simple access pattern.

Example: Cart schema: partition key = user_id, items = [{product_id, quantity, price_snapshot, added_at}, ...]. UpdateItem (or PutItem for the whole cart) on add-to-cart, GetItem on checkout, DeleteItem after purchase. No joins are needed because the cart is purely user-scoped data.
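The cart schema can be made concrete by building the parameters of a DynamoDB UpdateItem request for add-to-cart. This sketch assumes a hypothetical table named Carts with user_id as the partition key and the line items stored in a list attribute called items; it uses the low-level attribute-value format (S, N, L, M) and the real list_append and if_not_exists expression functions, but does not call AWS.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional

CART_TABLE = "Carts"  # hypothetical table name


def build_add_to_cart_request(
    user_id: str,
    product_id: str,
    quantity: int,
    price: Decimal,
    now: Optional[datetime] = None,
) -> dict:
    """Low-level DynamoDB UpdateItem parameters that append one line item to a user's cart."""
    added_at = (now or datetime.now(timezone.utc)).isoformat()
    line_item = {
        "M": {
            "product_id": {"S": product_id},
            "quantity": {"N": str(quantity)},
            "price_snapshot": {"N": str(price)},  # price at add time, not the live price
            "added_at": {"S": added_at},
        }
    }
    return {
        "TableName": CART_TABLE,
        "Key": {"user_id": {"S": user_id}},  # partition key: one cart item per user
        # Create the list on the first add, otherwise append to the existing list.
        "UpdateExpression": "SET #items = list_append(if_not_exists(#items, :empty), :new)",
        "ExpressionAttributeNames": {"#items": "items"},
        "ExpressionAttributeValues": {
            ":empty": {"L": []},
            ":new": {"L": [line_item]},
        },
    }


request = build_add_to_cart_request("user-42", "B0EXAMPLE", 2, Decimal("19.99"))
print(request["UpdateExpression"])
```

With boto3, these parameters could be passed as boto3.client("dynamodb").update_item(**request). Appending means adding the same product twice produces two lines; merging quantities would instead need a map keyed by product_id or a read before the write.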

Saga Pattern for Distributed Transactions

In a monolith, order checkout is one database transaction (ACID). In microservices it spans five services, each with its own database. Two-Phase Commit (2PC) is a poor fit: participants hold locks until the coordinator decides, so one slow or failed service blocks all the others. The Saga pattern instead breaks the transaction into a sequence of local transactions, each publishing an event when it completes. If any step fails, compensating transactions undo the steps that already committed.

Example: Order saga: Reserve inventory → Charge payment → Assign fulfillment → Send confirmation. If "Charge payment" fails: publish "payment-failed" event → Inventory Service runs compensating transaction "release reservation" → Order status set to FAILED. No distributed lock needed.
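The compensation logic can be sketched as a small orchestrator that runs each local transaction in order and, on failure, runs the compensations of the already-completed steps in reverse. This is an in-process, orchestration-style simplification: the text describes choreography over Kafka events, and the step functions here are hypothetical stand-ins that mutate a dict.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # local transaction in one service
    compensate: Callable[[dict], None]  # undo for that local transaction


@dataclass
class OrderSaga:
    steps: List[SagaStep]
    log: List[str] = field(default_factory=list)

    def run(self, order: dict) -> str:
        completed: List[SagaStep] = []
        for step in self.steps:
            try:
                step.action(order)
            except Exception as exc:
                self.log.append(f"failed:{step.name}:{exc}")
                # The failed step never committed; undo only the completed ones, newest first.
                for done in reversed(completed):
                    done.compensate(order)
                    self.log.append(f"compensated:{done.name}")
                order["status"] = "FAILED"
                return order["status"]
            self.log.append(f"done:{step.name}")
            completed.append(step)
        order["status"] = "CONFIRMED"
        return order["status"]


# Hypothetical service operations.
def reserve_inventory(o: dict) -> None:
    o["reserved"] = True

def release_inventory(o: dict) -> None:
    o["reserved"] = False

def charge_payment(o: dict) -> None:
    if o.get("card_declined"):
        raise RuntimeError("card declined")
    o["charged"] = True

def refund_payment(o: dict) -> None:
    o["charged"] = False

def assign_fulfillment(o: dict) -> None:
    o["warehouse"] = "WH-1"

def unassign_fulfillment(o: dict) -> None:
    o.pop("warehouse", None)

def send_confirmation(o: dict) -> None:
    o["notified"] = True

def no_compensation(o: dict) -> None:
    pass  # last step: nothing runs after it, so it is never compensated


steps = [
    SagaStep("reserve_inventory", reserve_inventory, release_inventory),
    SagaStep("charge_payment", charge_payment, refund_payment),
    SagaStep("assign_fulfillment", assign_fulfillment, unassign_fulfillment),
    SagaStep("send_confirmation", send_confirmation, no_compensation),
]

saga = OrderSaga(steps)
order = {"id": "order-1", "card_declined": True}
print(saga.run(order))
print(saga.log)
```

With a declined card, only the inventory reservation is released and the order ends as FAILED, matching the flow described above; no lock is held across services at any point.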

Idempotency Keys for Payment Safety

Payment processing must be exactly-once: double-charging a customer is a severe business and legal problem. If a network timeout occurs after the payment succeeds but before the response returns, the client may retry and cause a second charge. Solution: the client generates a unique idempotency_key (UUID) per checkout attempt. The Payment Service atomically claims the key in Redis before charging (e.g., SET with the NX option, so only the first request wins) and then stores {idempotency_key → result}. On a retry with the same key, it returns the cached result instead of charging again.

Example: Client sends {idempotency_key: "uuid-123", amount: 99.99}. Payment succeeds, stored in Redis. Network drops. Client retries with same uuid-123. Payment Service finds it in Redis → returns "success" immediately, no new charge. Without this: customer sees two charges on their credit card.
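A minimal sketch of the claim-then-charge flow, assuming an in-memory dict guarded by a lock in place of Redis (the lock-protected check-and-insert plays the role of SET ... NX). The class and the _charge_card method are hypothetical; a production version would also need key expiry and a policy for charges whose outcome is unknown.

```python
import threading
import uuid
from typing import Dict


class PaymentService:
    """Idempotent charging sketch; a lock-guarded dict stands in for Redis."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}
        self._lock = threading.Lock()
        self.charges_made = 0  # how many times money actually moved

    def _charge_card(self, amount: float) -> dict:
        # Hypothetical call to a card processor.
        with self._lock:
            self.charges_made += 1
        return {"status": "success", "amount": amount, "charge_id": str(uuid.uuid4())}

    def pay(self, idempotency_key: str, amount: float) -> dict:
        with self._lock:
            existing = self._results.get(idempotency_key)
            if existing is not None:
                # Either the final result or {"status": "pending"} while the first attempt runs.
                return existing
            # Claim the key before charging, analogous to Redis SET key value NX.
            self._results[idempotency_key] = {"status": "pending"}
        result = self._charge_card(amount)  # only the request that claimed the key gets here
        with self._lock:
            self._results[idempotency_key] = result
        return result


svc = PaymentService()
key = str(uuid.uuid4())  # generated once per checkout attempt by the client
first = svc.pay(key, 99.99)
retry = svc.pay(key, 99.99)  # client retries after the response was lost
print(first == retry, svc.charges_made)  # True 1
```

Claiming the key before calling the processor is what protects against two concurrent retries; a plain "check cache, then charge, then store" sequence would let both retries pass the check.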

Inventory Locking (Optimistic vs Pessimistic)

Pessimistic locking: lock the inventory row (e.g., SELECT ... FOR UPDATE) when a user adds to cart, so no other user can buy the same last item. Problem: locks are held for seconds during checkout, causing massive contention at scale. Optimistic locking: let multiple users proceed and check stock at payment time using a version number. Read SELECT stock, version FROM inventory WHERE product_id = X, then run UPDATE inventory SET stock = stock - 1, version = version + 1 WHERE product_id = X AND version = {saved_version}. If the update changes 0 rows, another user bought it first (conflict): retry or show "out of stock."

Example: Optimistic: User A and B both see 1 item in stock. A's UPDATE sets version 1→2, stock 1→0 (success). B's UPDATE: WHERE version = 1 → 0 rows updated (version already 2) → B gets "out of stock." No locks held, highly concurrent.
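The A-versus-B race can be reproduced with Python's built-in sqlite3 module. An in-memory SQLite database stands in for PostgreSQL here, and the inventory table and column names are illustrative; the versioned UPDATE and the "0 rows changed means conflict" check are the same in either database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, stock INTEGER NOT NULL, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('X', 1, 1)")  # last item in stock, version 1
conn.commit()


def read_stock(product_id: str) -> tuple:
    return conn.execute(
        "SELECT stock, version FROM inventory WHERE product_id = ?", (product_id,)
    ).fetchone()


def try_purchase(product_id: str, saved_version: int) -> bool:
    """Decrement stock only if the row is unchanged since it was read. True on success."""
    cur = conn.execute(
        "UPDATE inventory SET stock = stock - 1, version = version + 1 "
        "WHERE product_id = ? AND version = ? AND stock > 0",
        (product_id, saved_version),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means another buyer won the race


# Users A and B both read stock=1, version=1 before either writes.
_, version_a = read_stock("X")
_, version_b = read_stock("X")

a_ok = try_purchase("X", version_a)
b_ok = try_purchase("X", version_b)
print(a_ok, b_ok, read_stock("X"))  # True False (0, 2)
```

The extra stock > 0 guard is a defensive addition so the counter can never go negative even if a caller skips the version check.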

Elasticsearch for Product Search

The product catalog has 350M+ items with rich attributes (name, brand, category, specs, price, ratings). SQL LIKE '%term%' queries cannot use a regular B-tree index and offer no relevance ranking, so they are too slow for full-text search at this scale. Elasticsearch provides full-text search with relevance ranking over an inverted index, faceted filtering (category, price range, rating), autocomplete suggestions, and near-real-time index updates. Product metadata is synced asynchronously from PostgreSQL to Elasticsearch after every update.

Example: Search "wireless headphones under $100 with 5 stars": an Elasticsearch query combines a full-text match with a price range filter and a rating filter and returns ranked results in < 50ms across 350M products. An equivalent SQL LIKE '%wireless headphones%' query would need a full table scan.
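The search above can be expressed as an Elasticsearch Query DSL body, built here as a plain Python dict. The field names (name, price, rating, brand, category) and an index called products are assumptions about the mapping, and the terms aggregations assume brand and category are mapped as keyword fields. The body would be sent as the JSON of a POST /products/_search request.

```python
import json


def build_search_body(text: str, max_price: float, min_rating: float, size: int = 20) -> dict:
    """Query DSL body: scored full-text match, non-scoring filters, and facet counts."""
    return {
        "size": size,
        "query": {
            "bool": {
                # 'must' clauses contribute to the relevance score.
                "must": [{"match": {"name": {"query": text, "operator": "and"}}}],
                # 'filter' clauses only include or exclude documents and are not scored.
                "filter": [
                    {"range": {"price": {"lt": max_price}}},
                    {"range": {"rating": {"gte": min_rating}}},
                ],
            }
        },
        # Facet counts for the sidebar.
        "aggs": {
            "brands": {"terms": {"field": "brand"}},
            "categories": {"terms": {"field": "category"}},
        },
    }


body = build_search_body("wireless headphones", max_price=100, min_rating=5)
print(json.dumps(body, indent=2))
```

Putting the price and rating constraints in filter context rather than must keeps them out of relevance scoring and lets Elasticsearch cache them, which helps keep repeated faceted queries fast.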