Algorithms and patterns for distributed agreement -- Raft, Paxos, ZAB.
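Paxos is a family of protocols for achieving consensus among a group of unreliable processes. Originally described by Leslie Lamport, Paxos guarantees safety (agreement on a single value) under non-Byzantine failures such as crashes and lost, delayed, or reordered messages. Making progress additionally requires, among other conditions, that a majority of processes can communicate. Paxos remains the theoretical foundation for most production consensus systems.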
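The two phases of single-decree Paxos can be sketched in a few lines. This is a minimal in-process illustration, not a production implementation: the `Acceptor` class and `propose` function are hypothetical names, acceptors are called synchronously instead of over a network, and ballots are assumed to be unique integers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Acceptor:
    """Acceptor state for single-decree Paxos (illustrative names)."""
    promised: int = -1                  # highest ballot promised so far
    accepted_ballot: int = -1           # ballot of the accepted value, if any
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, self.accepted_ballot, self.accepted_value

    def accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    """Run one proposer round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [(b, v) for ok, b, v in promises if ok]
    if len(granted) < majority:
        return None
    # Safety rule: adopt the value with the highest accepted ballot, if any.
    _, prior_value = max(granted, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

Once a value has been chosen, a later proposer with a higher ballot ends up re-proposing that same value, which is what keeps the decision stable.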
Raft is a consensus algorithm designed for understandability. It provides the same safety guarantees as Multi-Paxos but decomposes the problem into leader election, log replication, and safety -- making it significantly easier to implement correctly. Raft powers etcd, CockroachDB, TiKV, and Consul.
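As an illustration of the leader-election piece, the sketch below shows how a Raft node might decide whether to grant a vote. It assumes an in-memory node with hypothetical names (`RaftNode`, `handle_request_vote`) and omits persistence, timers, and networking.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RaftNode:
    node_id: str
    current_term: int = 0
    voted_for: Optional[str] = None
    log: List[Tuple[int, str]] = field(default_factory=list)  # (term, command)

    def last_log(self) -> Tuple[int, int]:
        """Return (last_term, last_index); index is 1-based, 0 if the log is empty."""
        if not self.log:
            return 0, 0
        return self.log[-1][0], len(self.log)

    def handle_request_vote(self, term, candidate_id, last_log_index, last_log_term):
        """Return (current_term, vote_granted)."""
        if term < self.current_term:
            return self.current_term, False       # stale candidate
        if term > self.current_term:
            self.current_term = term              # newer term: forget old vote
            self.voted_for = None
        my_term, my_index = self.last_log()
        # The candidate's log must be at least as up to date as ours:
        # compare last terms first, then log length.
        up_to_date = (last_log_term, last_log_index) >= (my_term, my_index)
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False
```

Granting at most one vote per term, and only to candidates whose log is at least as up to date, is what prevents two leaders from being elected in the same term.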
Gossip protocols (epidemic protocols) disseminate information through a cluster by having each node periodically exchange state with a random peer. They provide probabilistic convergence, extreme scalability, and resilience to failures -- ideal for membership detection, failure detection, and eventually consistent data propagation.
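The periodic exchange with a random peer can be simulated in a single process. The sketch below is a toy under stated assumptions: state is a dictionary of versioned entries, each exchange is push-pull, and the function names (`merge`, `gossip_round`, `rounds_to_converge`) are hypothetical.

```python
import random


def merge(a, b):
    """Keep the entry with the higher version for every key."""
    out = dict(a)
    for key, (version, value) in b.items():
        if key not in out or version > out[key][0]:
            out[key] = (version, value)
    return out


def gossip_round(states, rng):
    """Each node does one push-pull exchange with a random peer (needs >= 2 nodes)."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge(states[node], states[peer])
        states[node] = merged
        states[peer] = dict(merged)


def rounds_to_converge(n, seed=0, max_rounds=100):
    """Count rounds until an update known only to node 0 reaches all n nodes."""
    rng = random.Random(seed)
    states = {i: {} for i in range(n)}
    states[0] = {"config": (1, "v1")}
    for r in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(s.get("config") == (1, "v1") for s in states.values()):
            return r
    return None  # did not converge within max_rounds
```

Running `rounds_to_converge` for growing values of `n` gives a feel for the slow growth in round count that makes gossip attractive for large clusters.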
Vector clocks are a mechanism for tracking causality in distributed systems. Each process maintains a vector of logical timestamps -- one per process -- enabling the system to determine whether two events are causally related or concurrent. Vector clocks are a common basis for conflict detection in eventually consistent databases.
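A minimal sketch of the comparison makes the "causally related or concurrent" distinction concrete. Clocks are represented here as plain dictionaries from process name to counter, and the helper names (`increment`, `merge`, `compare`) are illustrative rather than taken from any particular database.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    out = dict(clock)
    out[node] = out.get(node, 0) + 1
    return out


def merge(a, b):
    """Element-wise maximum, applied on message receipt before incrementing."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


a1 = increment({}, "A")               # event on process A
b1 = increment({}, "B")               # independent event on process B
b2 = increment(merge(b1, a1), "B")    # B receives A's message

print(compare(a1, b2))  # before
print(compare(a1, b1))  # concurrent
```

Two clocks that are "concurrent" correspond to writes that neither saw the other, which is the case a database would surface as a conflict.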
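Lamport timestamps are the simplest logical clock mechanism for ordering events in a distributed system. Each process maintains a counter that it increments on every local event and, on message receipt, sets to one more than the maximum of its own value and the timestamp carried by the message. The resulting timestamps are consistent with the happened-before relation: if one event happened before another, its timestamp is smaller. The converse does not hold, so Lamport timestamps alone cannot distinguish causally related events from concurrent ones.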
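The counter rules fit in a few lines. The class below is a minimal sketch with hypothetical method names (`tick`, `receive`); it assumes the sender attaches its current timestamp to every message.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or send: advance the counter and return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time):
        """Receive: move past both the local counter and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                  # local event on p, timestamp 1
sent = p.tick()           # p sends a message stamped 2
got = q.receive(sent)     # q's clock jumps from 0 to 3
print(sent, got)          # 2 3
```

The receive always gets a larger timestamp than the matching send, which is how the happened-before order is preserved across processes.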
Distributed locks provide mutual exclusion across multiple processes or machines, ensuring that only one client can hold a lock at a time. They are essential for coordinating access to shared resources in distributed systems but carry fundamental trade-offs between safety, liveness, and performance.
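One way to see the safety trade-off is a lease-based lock paired with fencing tokens, a technique introduced here for illustration and not described in the text above. The sketch assumes an in-memory `LockService` and `Storage` (both hypothetical) and passes time in explicitly so the behaviour is deterministic.

```python
class LockService:
    """In-memory stand-in for a lock service issuing leases with fencing tokens."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0
        self.token = 0

    def acquire(self, client, now, ttl):
        """Return a fencing token, or None if another client's lease is still live."""
        lease_live = self.holder is not None and now < self.expires_at
        if lease_live and self.holder != client:
            return None
        self.holder = client
        self.expires_at = now + ttl
        self.token += 1            # tokens increase with every grant
        return self.token


class Storage:
    """Shared resource that rejects writes carrying an outdated token."""

    def __init__(self):
        self.highest_token = 0
        self.data = None

    def write(self, token, value):
        if token < self.highest_token:
            return False           # writer's lease was superseded
        self.highest_token = token
        self.data = value
        return True


lock, store = LockService(), Storage()
t1 = lock.acquire("a", now=0, ttl=10)    # client a holds the lease
t2 = lock.acquire("b", now=11, ttl=10)   # a's lease expired, b takes over
store.write(t2, "from b")                # accepted
store.write(t1, "from a")                # rejected: a still believes it holds the lock
```

The lease keeps the system live when a holder crashes, while the token check on the resource protects against a paused client that wakes up after its lease has expired.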
Leader election is the process of designating a single node in a distributed system as the coordinator for a specific task. It is fundamental to many distributed algorithms -- from database replication to job scheduling -- and can be implemented via consensus protocols, lease-based mechanisms, or distributed lock services.
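The lease-based variant can be sketched as a single check-and-update on a shared record. The `Lease` type and `campaign` function are hypothetical, and the sketch assumes the store performs the check-and-update atomically, as a compare-and-set or transaction would.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    leader: Optional[str] = None
    expires_at: float = 0.0


def campaign(lease: Lease, node: str, now: float, ttl: float) -> bool:
    """Try to become or remain leader; True if `node` holds the lease afterwards.

    Assumption: this whole function runs atomically against the shared store.
    """
    vacant = lease.leader is None or now >= lease.expires_at
    if vacant or lease.leader == node:
        lease.leader = node
        lease.expires_at = now + ttl   # acquiring and renewing look the same
        return True
    return False


lease = Lease()
campaign(lease, "n1", now=0, ttl=5)   # n1 becomes leader
campaign(lease, "n2", now=2, ttl=5)   # refused: n1's lease is live
campaign(lease, "n1", now=4, ttl=5)   # n1 renews until t=9
campaign(lease, "n2", now=9, ttl=5)   # n1 stopped renewing, n2 takes over
print(lease.leader)                   # n2
```

Every node calls `campaign` on a timer shorter than the TTL: the current leader renews, and the others take over only after renewals stop.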
Service discovery is the mechanism by which services in a distributed system locate each other's network addresses. It replaces hardcoded IP addresses with dynamic, health-aware resolution -- essential in cloud-native environments where instances are ephemeral and addresses change frequently.
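Health-aware resolution can be illustrated with a toy registry in which instances stay visible only while they keep sending heartbeats. The `Registry` class, its method names, and the example addresses are assumptions for illustration; real registries replicate this state and expose it over DNS or an API.

```python
class Registry:
    """Toy in-memory service registry with TTL-based liveness."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.instances = {}  # service name -> {address: last heartbeat time}

    def heartbeat(self, service, address, now):
        """Register an instance or refresh its liveness."""
        self.instances.setdefault(service, {})[address] = now

    def resolve(self, service, now):
        """Return addresses whose last heartbeat is within the TTL."""
        seen = self.instances.get(service, {})
        return sorted(a for a, t in seen.items() if now - t <= self.ttl)


registry = Registry(ttl=10)
registry.heartbeat("api", "10.0.0.1:80", now=0)
registry.heartbeat("api", "10.0.0.2:80", now=5)

print(registry.resolve("api", now=8))    # ['10.0.0.1:80', '10.0.0.2:80']
print(registry.resolve("api", now=12))   # ['10.0.0.2:80']
```

Callers ask for a service by name on every request, so an instance that stops heartbeating drops out of the results without any client configuration change.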