ci-failure-triagelisted
Install: claude install-skill IgorGanapolsky/ThumbGate
# CI Failure Triage — Read Before You Conclude
## The failure this prevents
Calling a red check "transient / flaky / orphaned / stale" WITHOUT reading its
body. This session: a CodeQL check failed; it was dismissed as a "transient 4s
orphaned check-run" — twice — when it was reporting **3 real security
vulnerabilities** (2 critical command-injection + 1 high XSS). The conclusion came
before the evidence. (2026 failure-triage practice = taxonomy → read → cluster →
gate, a *repeatable detection system*, not vibes:
https://latitude.so/blog/ai-agent-failure-modes-detection-playbook)
## Hard rule
**You may not use the words "transient", "flaky", "stale", "orphaned", or
"unrelated" about a check until you have read its failure body and quoted the
actual error.** A duration (e.g. "4s") is a hint, never proof.
## Protocol (in order — do not skip)
1. **Identify the exact failing check + its commit.**
```bash
head=$(gh pr view <N> --json headRefOid -q .headRefOid)
gh pr checks <N> | grep -viP "\t(pass|skipping)\t"
```
2. **Read the failure body. This is the step that gets skipped.**
- GitHub Actions job: `gh run view --job <id> --log-failed | tail -40`
- CodeQL / code-scanning: read the ALERTS, not just the check:
```bash
gh api "repos/<owner>/<repo>/code-scanning/alerts?state=open&per_page=100" \
--jq '.[] | "\(.rule.id) | \(.rule.security_severity_level) | \(.most_recent_instance.location.path):\(.most_recent_instance.location.start_line) | r