Distributed Locks
Avoiding concurrent transactional race states using global lock databases (Redlock, Chubby, ZooKeeper).
What you'll learn
- Redis SETNX + Expiry (Basic Pattern)
- Fencing Tokens
- Redlock Algorithm
- Consensus-Based Locks (etcd, ZooKeeper)
- Lock Granularity
- Lock Lease vs. Renewal
TL;DR
Avoiding concurrent transactional race states using global lock databases (Redlock, Chubby, ZooKeeper).
Visual System Topology
Distributed Locks Execution Topology
Concept Overview
A distributed lock is a mutual exclusion mechanism that coordinates access to shared resources across multiple independent processes in a distributed system. Unlike a single-process mutex, a distributed lock must be acquired via a network call to a shared store, making it subject to network partitions, clock skew, and process crashes.
Distributed locks are notoriously difficult to implement correctly. A naive implementation based on "check-then-set" in Redis is vulnerable to race conditions. Even "correct" implementations like Redlock are controversial because clock jumps, GC pauses, and network delays can cause the lock holder to believe it holds the lock while it has actually expired.
Common use cases: preventing duplicate cron job execution, limiting concurrent payment processing, coordinating leader election in small clusters, and ensuring exactly-once processing of expensive operations.
Key Architectural Pillars
Redis SETNX + Expiry (Basic Pattern)
The simplest distributed lock: SET key value NX PX timeout (set only if not exists, with expiry). The expiry prevents lock starvation if the lock holder crashes. The key problem: if the holder is paused longer than the TTL (GC pause, network issue), the lock expires and another process acquires it, creating two "holders."
Fencing Tokens
A monotonically increasing token issued by the lock service with each successful acquisition. All downstream operations (database writes, file writes) must include the token; the storage system rejects operations with stale tokens. This prevents a GC-paused old lock holder from corrupting state after its lock expired.
Redlock Algorithm
Martin Kleppmann's critique of Redlock: with 5 independent Redis nodes, acquire the lock from a majority (3+) within a TTL window. If acquired from majority, the lock is valid for (TTL - time_elapsed). Controversial because it doesn't use fencing tokens and can be violated by GC pauses.
Consensus-Based Locks (etcd, ZooKeeper)
Production-grade distributed locks use consensus algorithms (Raft in etcd, ZAB in ZooKeeper) to manage locks. The lock is a key in the consensus store. etcd leases provide TTL-based expiry with automatic renewal. Fencing tokens are built-in (etcd revision numbers are monotonically increasing).
Lock Granularity
Coarse-grained locks (single global lock) eliminate parallelism. Fine-grained locks (per-resource locks) maximize parallelism but increase lock management overhead. Sharding the lock key space achieves a middle ground.
Lock Lease vs. Renewal
Short TTLs reduce blast radius when a holder crashes (lock recovers quickly) but require periodic renewal for long-running operations. Long TTLs allow long operations without renewal but block progress for long periods if the holder crashes. Heartbeat-based renewal is the standard solution.
