distributed-consensus

Distributed consensus algorithms and logical time for cloud and multi-node systems. Covers Lamport clocks, vector clocks, FLP impossibility, Paxos (basic, multi, fast), Raft, Viewstamped Replication, Byzantine fault tolerance basics, quorum reads/writes (N/R/W), leader election, and TLA+ specification style. Use when designing replicated state machines, picking a consensus protocol, reasoning about split-brain and quorum loss, or writing formal specs for distributed coordination.
Tibsfox/gsd-skill-creator · ★ 61 · AI & Automation · score 80
Install: claude install-skill Tibsfox/gsd-skill-creator
# Distributed Consensus

Consensus is the problem of getting a group of processes that fail independently to agree on a single value, or on a sequence of values, in the presence of network delays, message loss, and process crashes. Consensus is the foundation on which replicated state machines, leader election, distributed locks, configuration management, and strongly consistent databases are built. This skill catalogs the core results, algorithms, and design heuristics a cloud-systems practitioner needs to reason about coordination primitives without reinventing them.

**Agent affinity:** lamport (consensus theory, logical clocks, TLA+), decandia (quorum mechanics in Dynamo-style stores), dean (Paxos/Spanner experience in production systems)

**Concept IDs:** cloud-multi-service-coordination, cloud-procedure-execution, cloud-requirements-tracing

## Why Consensus is Hard

Distributed systems fail in ways that single-node systems do not. Messages arrive late, arrive out of order, arrive twice, or never arrive. Processes crash, restart, and come back with stale state. Networks partition and heal. Clocks drift. Two observers watching the same sequence of events can see them in different orders and both be telling the truth about what they saw. Building reliable systems on this substrate requires algorithms that are correct under the worst combinations of these failures, not just the common cases.

The core insight, due to Lamport, is that "time" in a distributed system is not a