Install: `claude install-skill proyecto26/system-design-skills`
# Distributed logging
Move logs from thousands of processes into one searchable place, fast enough to
debug a live incident and cheap enough to keep for months. Getting it wrong is a
classic "ignore failure" miss. The logging pipeline is itself a distributed system
that buckles under the very traffic spike in which you need it most, and a naive
design either drops the evidence or takes down the app it instruments.
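A common way to keep the pipeline from taking down the app is to put a bounded, non-blocking buffer between the code that logs and the thread that ships. When the shipper falls behind, new lines are dropped and counted instead of blocking request threads or growing memory without limit. Below is a minimal Python sketch that assumes an in-process buffer. `DroppingLogBuffer`, `offer`, and `drain` are hypothetical names and do not belong to any particular agent's API:

```python
from __future__ import annotations

import queue
import threading


class DroppingLogBuffer:
    """Hypothetical bounded hand-off between app threads and a log shipper.

    offer() never blocks: if the buffer is full, the line is dropped and
    counted, so a slow pipeline cannot stall the application.
    """

    def __init__(self, capacity: int) -> None:
        self._q: queue.Queue[str] = queue.Queue(maxsize=capacity)
        self._dropped = 0
        self._lock = threading.Lock()

    def offer(self, line: str) -> bool:
        """Enqueue without blocking; return False if the line was dropped."""
        try:
            self._q.put_nowait(line)
            return True
        except queue.Full:
            with self._lock:
                self._dropped += 1
            return False

    def drain(self, max_lines: int) -> list[str]:
        """Called by the shipper thread: take up to max_lines buffered lines."""
        batch: list[str] = []
        while len(batch) < max_lines:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch

    @property
    def dropped(self) -> int:
        """How many lines were lost because the buffer was full."""
        with self._lock:
            return self._dropped


if __name__ == "__main__":
    buf = DroppingLogBuffer(capacity=3)
    print([buf.offer(f"line {i}") for i in range(5)])  # [True, True, True, False, False]
    print(buf.dropped)  # 2
```

This sketch drops on purpose. It gives up some evidence to keep the app available, and the `dropped` counter at least records that evidence went missing (a real shipper would export it as a metric). Blocking would keep every line, but a slow pipeline would then make the app slow too.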
## When to reach for this
More than one process emits logs and someone needs to search them together; an
incident requires correlating a request across services; log volume has outgrown
`grep` on a box; or compliance demands retention. The pipeline buys central search,
cross-service correlation, and a durable record decoupled from any single host.
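Cross-service correlation usually means that every line carries a request (correlation) ID, minted at the edge and passed along with each downstream call. Below is a minimal sketch using Python's standard `logging` and `contextvars` modules, assuming one JSON object per line. The field names, `JsonFormatter`, and `handle_request` are hypothetical. The sketch does not cover how the ID travels between services, for example in the W3C Trace Context `traceparent` header or a custom header:

```python
from __future__ import annotations

import contextvars
import json
import logging
import sys
import uuid

# Hypothetical: the ID of the request the current thread/task is serving.
request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, tagged with the request ID."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "service": self.service,
            "request_id": request_id.get(),
            "msg": record.getMessage(),
        })


def handle_request(logger: logging.Logger, incoming_id: str | None) -> None:
    # Reuse the caller's ID if one was propagated; otherwise mint one here.
    token = request_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("charging card")
    finally:
        request_id.reset(token)  # don't leak the ID into the next request


if __name__ == "__main__":
    log = logging.getLogger("checkout")
    log.setLevel(logging.INFO)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service="checkout"))
    log.addHandler(handler)
    handle_request(log, incoming_id="4bf92f3577b34da6")  # ID from upstream
    handle_request(log, incoming_id=None)  # edge request: fresh ID
```

A search of the central store for one `request_id` value then returns that request's lines from every service that logged it.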
## When NOT to
A single service on one host where `journald` + log rotation is enough — a full
pipeline is pure operational overhead (YAGNI). Numeric time-series questions ("what
is p99 latency", "is error rate up") belong to metrics, not log scans — that is
`observability`'s job; logs answer "what exactly happened to *this* request". Don't
ship every debug line at full volume until the numbers show the volume is worth the
cost; sample first.
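One way to "sample first" without losing errors is to keep every WARNING-or-worse line and to sample lower-severity lines per request by hashing the request ID. A kept request then keeps all of its lines, and every service reaches the same decision without coordinating. Below is a minimal sketch that assumes the request ID is attached to each record (e.g. via `logger.debug(..., extra={"request_id": rid})`). `keep_line`, `RequestSampler`, and `debug_rate` are hypothetical names:

```python
import hashlib
import logging


def keep_line(level: int, request_id: str, debug_rate: float) -> bool:
    """Decide whether to ship one log line (hypothetical policy).

    WARNING and above always ship. Lower-severity lines are sampled per
    request, so a sampled request keeps its whole story.
    """
    if level >= logging.WARNING:
        return True
    # Map the ID to a stable float in [0, 1) using the top 53 bits of a hash
    # (exactly representable as a double). The same ID gives the same answer
    # in every service.
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < debug_rate


class RequestSampler(logging.Filter):
    """Filter wrapper; expects records to carry a `request_id` attribute."""

    def __init__(self, debug_rate: float) -> None:
        super().__init__()
        self.debug_rate = debug_rate

    def filter(self, record: logging.LogRecord) -> bool:
        rid = getattr(record, "request_id", "")
        return keep_line(record.levelno, rid, self.debug_rate)
```

Attach it with `handler.addFilter(RequestSampler(debug_rate=0.01))`. The catch is that the decision is made before anyone knows whether the request will fail, so the debug lines leading up to an error are kept only at the sampled rate.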
## Clarify first
- **Volume and peak** — lines/sec and bytes/sec, average and peak (→ `back-of-the-envelope`). This sizes every stage.
- **Structured or free-text** — can producers emit JSON now, or is there legacy text to parse?
- **Query latency need** — interactive search in sec