nexus-reliabilitylisted
Install: claude install-skill aayushostwal/nexus
# Nexus Production Incident Investigator
Time-boxed, evidence-driven workflow for investigating production incidents and evaluating release readiness.
---
## Compatibility
- Sub-skills: `release-readiness.md` | Materials: `checklists/incident-checklist.md`, `anti-patterns/common-mistakes.md`, `validation/output-validation.md`
- Required: Read, Bash, Grep | Optional: WebSearch | Hands off to: `nexus:debugging` (code RCA), `nexus:planning` (post-mortem actions)
---
## Core Principle
**Stabilize first. Investigate second. Never skip the timeline.**
---
## Incident Classification
| Severity | Definition | Response target |
|----------|-----------|-----------------|
| **P0** | Complete outage or data loss affecting all/most users | Immediate; page all on-call |
| **P1** | Major feature broken or >25% requests failing | < 5 min acknowledgement; page on-call lead |
| **P2** | Degraded performance or partial failure | < 15 min acknowledgement; one engineer |
| **P3** | Minor anomaly, no user impact | Next business hour |
If you cannot classify immediately, default to P1 and downgrade after Phase 1.
---
## Investigation Workflow
### Phase 1 — Triage (0–5 min)
1. **Confirm the signal** — cross-check alert against at least one independent source (logs, dashboard, manual repro). Never declare from a single data point.
2. **Classify severity** using the table above.
3. **Identify blast radius** — which services, user segments, and whether data integrity is at risk.
4. **Open