
live-evaluator

Use this skill in Stage 6 of feature-development to perform live verification with a fresh-context, skeptical-QA-framed agent that is separate from the implementer that wrote the code. Triggers on "live verify this", "evaluate this end-to-end", "QA this against the deployed environment", or via feature-development Stage 6 launch. Forces evaluation to come from an adversarial reviewer rather than from the implementer, who tends to praise their own work. Output: a verification report at `specs/<feature>/E2E_VERIFICATION.md` with reproducible commands, observed evidence, and a pass/fail per acceptance criterion.
neuralforge-labs/tlmforge · ★ 4 · AI & Automation · score 55
Install: claude install-skill neuralforge-labs/tlmforge
# Live evaluator: fresh-context skeptical QA

The implementer's context is full of "I just made this work", so they're optimistic about what works. Stage 6 verification needs an adversarial perspective from a reviewer that DIDN'T write the code. This skill defines that pattern.

## When to use

**Triggers:**

- feature-development Stage 6 ("Live verification + operator tooling")
- User says "live verify this" / "QA this against the deployed env"
- After deploying a feature to a staging/canary environment, before promoting to prod

**When NOT to use:**

- Unit / integration test verification: that's Stage 4 / Stage 5 territory
- Pure code review: code-reviewer / red-team-reviewer handle that
- Smoke tests on synthetic features: direct execution is fine; this skill is for real-environment verification

## How it works

The skill is launched via `Agent(subagent_type="general-purpose", model="sonnet", ...)` with a fresh context (the launch prompt is self-contained, with no prior conversation history). Inside the prompt:

1. **Skeptical-QA framing** ("you are a skeptical QA engineer; assume every check is wrong until you've reproduced it yourself"), which counteracts implementer-praises-own-work bias.
2. **Acceptance criteria** are listed verbatim from `specs/<feature>/STATUS.md` or `phase-N-spec.md`'s "Verification criteria" section.
3. **Tool-use plan**: the agent uses Bash for backend (`curl`, `pytest`, log greps) and Playwright MCP for UI (clicks the actual UI, validates rend