
observability-design

Use when designing monitoring, alerting, logging, tracing, and SLI/SLO strategies for services or systems. Covers metric collection, structured logging, distributed tracing, dashboard design, and error budget management. Do not use for deployment pipeline design (use deployment-plan) or infrastructure cost modeling (use cost-analysis).
dtsong/agentic-council · ★ 0 · DevOps & Infrastructure · score 78
Install: claude install-skill dtsong/agentic-council
# Observability Design

## Purpose

Design a comprehensive observability strategy covering metrics, logging, tracing, alerting, and SLI/SLO definitions. Produces a monitoring architecture that enables rapid incident detection, diagnosis, and resolution.

## Scope Constraints

Reads system architecture documentation, existing monitoring configurations, and service definitions for observability analysis. Does not modify files, deploy monitoring agents, or access production telemetry data directly.

## Inputs

- System architecture (services, databases, APIs, third-party dependencies)
- Current monitoring setup (existing tools, dashboards, alerts)
- Reliability requirements (SLA commitments, uptime targets)
- Team structure (on-call rotation, escalation paths)

## Input Sanitization

No user-provided values are used in commands or file paths. All inputs are treated as read-only analysis targets.

## Procedure

### Progress Checklist

- [ ] Step 1: Define observability pillars
- [ ] Step 2: Design metric collection
- [ ] Step 3: Define alert thresholds and escalation
- [ ] Step 4: Plan structured logging
- [ ] Step 5: Design distributed tracing
- [ ] Step 6: Specify dashboard requirements
- [ ] Step 7: Define SLIs/SLOs

### Step 1: Define Observability Pillars

Establish the three pillars for this system:

- **Metrics**: what to measure (request rate, error rate, latency, saturation, business KPIs)
- **Logs**: what to record (request lifecycle, state changes, errors, audit events)