Vetora logo
🔐Consensus & Coordination

Distributed Locks

Distributed locks provide mutual exclusion across multiple processes or machines, ensuring that only one client can hold a lock at a time. They are essential for coordinating access to shared resources in distributed systems but carry fundamental trade-offs between safety, liveness, and performance.

Overview

Distributed locks extend the concept of mutual exclusion from single-machine programming to distributed systems. While a local mutex protects a critical section within one process, a distributed lock protects a resource accessed by processes running on different machines -- a database row, a file, an external API with rate limits, or a scheduled job that should run on exactly one node. The fundamental challenge is that a lock holder can become unreachable (crash, network partition, GC pause) while still believing it holds the lock.

There are three main approaches to implementing distributed locks. First, single-node locks using Redis SET NX EX (set if not exists, with expiration). This is simple and fast (~1ms) but unsafe: if the Redis node crashes, the lock state is lost, and two clients may proceed simultaneously. Second, quorum-based locks like Redlock, which acquire locks on a majority of independent Redis nodes. Martin Kleppmann's analysis demonstrated that Redlock still has safety issues due to clock drift and GC pauses. Third, consensus-backed locks using ZooKeeper (ephemeral znodes), etcd (lease-based locks), or Consul (session-based locks). These provide the strongest safety guarantees because the lock state is replicated via Raft or ZAB consensus.

The most insidious problem with distributed locks is the 'paused client' scenario. A client acquires a lock, then experiences a long GC pause (or network delay). The lock's TTL expires, another client acquires the lock and modifies the shared resource, and then the first client resumes, still believing it holds the lock, and overwrites the changes. TTL-based expiration prevents deadlocks (a crashed client's lock eventually releases) but cannot prevent this split-brain scenario. The solution is fencing tokens: every lock acquisition returns a monotonically increasing token, and the resource server rejects any operation with a token lower than the highest it has seen.

Choosing the right distributed lock depends on your safety requirements. For efficiency locks (preventing duplicate work, where occasional double-execution is acceptable), a single Redis lock with TTL is sufficient and fast. For correctness locks (preventing data corruption, where even a single double-execution causes harm), you need a consensus-backed lock with fencing tokens. Many systems use the wrong tier -- applying a Redis lock to a correctness-critical resource, or paying the latency cost of ZooKeeper for an efficiency optimization.

Key Points
  • 1Distributed locks must handle lock-holder crashes. A TTL (time-to-live) on the lock ensures that a crashed holder's lock eventually expires. But TTL introduces a safety gap: the lock expires while the holder is merely slow, not crashed.
  • 2Fencing tokens are the gold standard for safety. Each lock acquisition returns a monotonically increasing number. The resource server rejects operations with stale (lower) tokens. This prevents paused or partitioned clients from corrupting data even if they believe they still hold the lock.
  • 3Single-node Redis locks (SET NX EX) offer ~1ms acquisition but no safety if Redis crashes or partitions. Suitable for efficiency locks (idempotent work deduplication) but not correctness locks (financial transactions).
  • 4Consensus-backed locks (ZooKeeper, etcd, Consul) replicate lock state via Raft or ZAB, surviving node failures. Acquisition takes 5-50ms depending on cluster configuration. These are appropriate for correctness-critical resources.
  • 5Redlock attempts to bridge the gap by acquiring locks on a majority of independent Redis instances. Kleppmann's analysis showed it is still vulnerable to clock drift and process pauses. Use fencing tokens regardless of the lock mechanism.
  • 6Auto-renewal (extending TTL while the holder is alive) reduces but does not eliminate the paused-client problem. A GC pause can exceed the renewal interval, and the renewal message can be delayed by a network partition.
Simple Example

Two Cron Jobs Competing for a Task

Two servers each run a cron job that processes a payment queue every minute. Without coordination, both servers process the same payments, causing double charges. Server A acquires a distributed lock on key 'payment-processor' with a 30-second TTL. If A gets the lock, it processes payments and releases the lock when done. If B tries to acquire the lock and fails, it skips this cycle. If A crashes mid-processing, the lock expires after 30 seconds, and B picks up the next cycle. The TTL prevents deadlock, but if A is merely slow (not crashed), both A and a new lock holder could process simultaneously -- hence the need for fencing tokens on the payment API.

Real-World Examples

Apache ZooKeeper

ZooKeeper provides distributed locks via ephemeral sequential znodes. To acquire a lock, a client creates an ephemeral znode under a lock path (e.g., /locks/payment/lock-000001). The client with the lowest sequence number holds the lock. If the holder crashes, its ephemeral znode is automatically deleted (ZooKeeper detects session timeout), and the next client in sequence acquires the lock. This approach provides fair (FIFO) ordering and automatic cleanup without TTL-based expiration.

etcd (Kubernetes Leader Election)

Kubernetes uses etcd leases for leader election, which is effectively a distributed lock. A controller creates a Lease object with a TTL and attaches it to an Endpoints resource. The holder periodically renews the lease. If the holder crashes, the lease expires and another controller acquires it. etcd's Raft-backed storage ensures the lease state survives etcd node failures. The lease TTL is typically 15 seconds, with renewals every 10 seconds.

Stripe (Idempotency Keys as Soft Locks)

Stripe uses idempotency keys as a form of distributed locking for payment operations. When a client sends a request with an idempotency key, Stripe atomically records the key and blocks concurrent requests with the same key until the first request completes. This prevents double charges even if the client retries due to network timeouts. The lock is released when the request completes (or after a TTL), and subsequent retries return the cached response.

Trade-Offs
AspectDescription
Safety vs AvailabilityA consensus-backed lock (ZooKeeper, etcd) provides safety -- at most one holder at a time -- but becomes unavailable if a majority of consensus nodes fail. A single-node Redis lock is highly available (one node must be reachable) but unsafe if that node crashes with unreplicated lock state.
TTL DurationShort TTLs (5s) minimize the window where a crashed holder blocks progress but increase the risk of premature expiration during GC pauses or network hiccups. Long TTLs (60s) are safer against false expirations but block other processes longer when the holder genuinely crashes. There is no universally correct TTL -- it depends on the expected critical section duration and the system's pause characteristics.
Performance vs CorrectnessRedis locks complete in ~1ms; ZooKeeper locks take ~10-50ms; cross-DC consensus locks take ~100-200ms. For high-throughput workloads processing thousands of lock-acquire/release cycles per second, the latency difference is significant. The choice depends on whether double-execution causes data corruption (need correctness) or merely wastes compute (need efficiency).
Complexity of FencingFencing tokens require the protected resource to participate in the protocol by rejecting stale tokens. This means modifying the resource server's API to accept and validate tokens -- adding complexity to every system that the lock protects. Many teams skip fencing because of this complexity and accept the small risk of double-execution.
Case Study

Martin Kleppmann's Analysis of Redlock

Scenario

Redis creator Salvatore Sanfilippo proposed Redlock, a distributed lock algorithm that acquires locks on a majority (3 of 5) of independent Redis instances. The claim was that Redlock provided stronger safety than a single Redis lock while maintaining Redis's performance. The distributed systems community needed to evaluate whether Redlock was safe enough for correctness-critical applications.

Solution

Martin Kleppmann published 'How to do distributed locking' (2016), analyzing Redlock's safety properties. He demonstrated two failure scenarios: (1) a client acquires the lock, then a long GC pause causes the lock's TTL to expire, and another client acquires it -- both clients believe they hold the lock; (2) clock drift on one of the 5 Redis nodes causes it to expire the lock early, breaking the quorum safety assumption. Kleppmann argued that for correctness-critical locks, you must use fencing tokens regardless of the lock mechanism, and if you use fencing tokens, a simple single-node lock is sufficient.

Outcome

The Kleppmann-Sanfilippo debate became one of the most educational discussions in distributed systems. The consensus among practitioners is: (1) for efficiency locks, use simple Redis SET NX EX; (2) for correctness locks, use ZooKeeper/etcd with fencing tokens; (3) Redlock adds complexity without providing the safety guarantees it claims. The debate highlighted that distributed lock safety requires reasoning about worst-case timing scenarios, not just normal operation.

Common Mistakes
  • Assuming a lock guarantees mutual exclusion across process pauses. A client can hold a lock, experience a GC pause longer than the TTL, and resume after another client has acquired the lock. Without fencing tokens, both clients corrupt the shared resource.
  • Not setting a TTL on the lock. Without TTL, a crashed lock holder blocks all other processes permanently (deadlock). Every distributed lock must have an expiration, even if it means accepting the paused-client risk.
  • Using Redis locks for correctness-critical operations. Redis replication is asynchronous -- if the master crashes after SET NX but before replicating to a replica, the lock state is lost and a second client can acquire the same lock.
  • Implementing Redlock without understanding its limitations. Redlock's safety depends on all Redis nodes having accurate, synchronized clocks and no GC pauses exceeding the lock's TTL. These assumptions rarely hold in practice.
  • Releasing a lock you no longer hold. If a client's lock expired due to TTL and another client acquired it, the first client's DELETE removes the second client's lock. Always verify the lock value (e.g., a unique token) before releasing.
Related Concepts

See Distributed Locks in action

Explore system design templates that use distributed locks and run traffic simulations to see how these concepts perform under real load.

Browse Templates

Test distributed lock contention during flash sales

Metrics to watch
lock_acquisition_mslock_contention_pctdeadlock_countthroughput_rps
Run Simulation
Test Your Understanding

1What problem do fencing tokens solve in distributed locking?

2Why is a single-node Redis lock unsafe for correctness-critical operations?

Deeper Reading