observability-design
Install: claude install-skill dtsong/my-claude-setup
# Observability Design
## Purpose
Designs a comprehensive observability strategy covering metrics, logging, tracing, alerting, and SLI/SLO definitions, and produces a monitoring architecture that enables rapid incident detection, diagnosis, and resolution.
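One way to picture the deliverable is as a structured plan with one section per concern. The sketch below is hypothetical: the class and field names (`ObservabilityPlan`, `AlertRule`, `SLO`) are illustrative and not part of this skill. It only shows how the pieces named above might be recorded together and checked for gaps before the design is considered complete.

```python
from dataclasses import dataclass, field

# Hypothetical data model; all names are illustrative, not defined by the skill.


@dataclass
class SLO:
    """A service level objective measured over a rolling window."""
    sli: str              # e.g. "fraction of requests served in < 300 ms"
    target: float         # e.g. 0.999
    window_days: int = 30


@dataclass
class AlertRule:
    name: str
    condition: str        # human-readable threshold expression
    severity: str         # e.g. "page" or "ticket"
    escalation: list[str] = field(default_factory=list)


@dataclass
class ObservabilityPlan:
    service: str
    metrics: list[str] = field(default_factory=list)
    log_events: list[str] = field(default_factory=list)
    trace_spans: list[str] = field(default_factory=list)
    alerts: list[AlertRule] = field(default_factory=list)
    slos: list[SLO] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of plan sections that are still empty."""
        sections = {
            "metrics": self.metrics,
            "log_events": self.log_events,
            "trace_spans": self.trace_spans,
            "alerts": self.alerts,
            "slos": self.slos,
        }
        return [name for name, items in sections.items() if not items]
```

For example, a plan for a `checkout` service that lists only `http_requests_total` reports `['log_events', 'trace_spans', 'alerts', 'slos']` as still missing.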
## Scope Constraints
Reads system architecture documentation, existing monitoring configurations, and service definitions for observability analysis. Does not modify files, deploy monitoring agents, or access production telemetry data directly.
## Inputs
- System architecture (services, databases, APIs, third-party dependencies)
- Current monitoring setup (existing tools, dashboards, alerts)
- Reliability requirements (SLA commitments, uptime targets)
- Team structure (on-call rotation, escalation paths)
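Uptime targets in the reliability requirements become actionable once they are converted into an error budget, the amount of unavailability the target tolerates. A minimal sketch, assuming the target is a fraction of wall-clock time and a 30-day rolling window (an assumption; use whatever window the SLA actually specifies). The function name `allowed_downtime_minutes` is illustrative.

```python
def allowed_downtime_minutes(target: float, window_days: int = 30) -> float:
    """Minutes of full unavailability an uptime target permits over a window.

    Assumes the target is a fraction of wall-clock time (e.g. 0.999)
    and the window length is in days; both are illustrative assumptions.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1.0 - target) * window_minutes


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.4%} over 30 days -> {minutes:.1f} min of downtime")
```

Under these assumptions, 99.9% over 30 days allows roughly 43.2 minutes of downtime, while 99.99% allows only about 4.3 minutes, which is why each extra "nine" usually demands faster detection and a tighter escalation path.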
## Input Sanitization
No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.
## Procedure
### Progress Checklist
- [ ] Step 1: Define observability pillars
- [ ] Step 2: Design metric collection
- [ ] Step 3: Define alert thresholds and escalation
- [ ] Step 4: Plan structured logging
- [ ] Step 5: Design distributed tracing
- [ ] Step 6: Specify dashboard requirements
- [ ] Step 7: Define SLIs/SLOs
### Step 1: Define Observability Pillars
Establish the three pillars for this system:
- **Metrics**: What to measure — request rate, error rate, latency, saturation, business KPIs
- **Logs**: What to record — request lifecycle, state changes, errors, audit events