
exp-eval

Experiment verdict gate — Review LLM independently judges results → 4 verdict paths → auto-update claims confidence, ideas status, graph edges
Lambenthan/empiricalwiki · ★ 45 · AI & Automation · score 80
Install: claude install-skill Lambenthan/empiricalwiki
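
As a rough illustration of the four verdict paths named in the description above, the mapping from verdict to wiki updates can be sketched as a lookup table. This is a minimal sketch, not the skill's actual implementation: the names `Verdict`, `Action`, `ACTIONS`, and `plan_updates` are hypothetical, and pairing `supported` with a `supports` edge and `not_supported` with an `invalidates` edge is an assumption. Fields the summary does not specify are left as `None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially_supported"
    NOT_SUPPORTED = "not_supported"
    INCONCLUSIVE = "inconclusive"


@dataclass(frozen=True)
class Action:
    # Hypothetical record of what a verdict triggers; None = not stated in the summary.
    claim_direction: Optional[str]  # "up" or "down"
    idea_status: Optional[str]      # "validated" or "failed"
    edge_type: Optional[str]        # assumed pairing with "supports" / "invalidates"
    next_step: Optional[str]


ACTIONS = {
    Verdict.SUPPORTED: Action("up", "validated", "supports", None),
    Verdict.PARTIALLY_SUPPORTED: Action(None, None, None, "supplementary experiments"),
    Verdict.NOT_SUPPORTED: Action("down", "failed", "invalidates", None),
    Verdict.INCONCLUSIVE: Action(None, None, None, "debug"),
}


def plan_updates(verdict: str) -> Action:
    """Look up the planned updates for a verdict string (raises ValueError if unknown)."""
    return ACTIONS[Verdict(verdict)]
```

For example, `plan_updates("not_supported")` yields an action that lowers claim confidence and marks the idea as failed, while `plan_updates("inconclusive")` only suggests debugging as the next step.
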
# /exp-eval

> Convert completed experiment results into wiki knowledge updates.
> Review LLM acts as an impartial judge (following cross-model-review), independently evaluating how experimental results affect the target claim.
> Four verdict paths: supported → claim↑ + idea validated / partially_supported → supplementary experiments /
> not_supported → claim↓ + idea failed / inconclusive → debug.
> Auto-updates claims confidence and evidence, ideas status, and graph edges.

## Inputs

- `experiment`: slug from wiki/experiments/ (status must be `completed`)
- `--auto` (optional): automatic mode; do not pause for user confirmation before wiki updates (used when called by /research)

## Outputs

- `wiki/claims/{slug}.md`: updated confidence, status, evidence list
- `wiki/ideas/{slug}.md`: updated status (validated/failed), pilot_result, failure_reason
- `wiki/experiments/{slug}.md`: `## Claim updates` section filled in
- `wiki/graph/edges.jsonl`: new supports/invalidates edges added
- `wiki/graph/context_brief.md`: rebuilt
- `wiki/graph/open_questions.md`: rebuilt
- `wiki/log.md`: appended log entry
- **VERDICT_REPORT** (printed to terminal): verdict result, wiki change summary, next step suggestions

## Wiki Interaction

### Reads

- `wiki/experiments/{slug}.md`: experiment results (outcome, key_result, metrics, full Results section)
- `wiki/claims/{target-claim}.md`: target claim current state (status, confidence, evidence list)
- `wiki/ideas/{linked-idea}.md`: linke