regen-eval-baselinelisted
Install: claude install-skill sumithr/sumo-qa
# regen-eval-baseline
Captures a promptfoo run for one sumo-qa skill and stores its JSON output in `docs/qa/runs/eval-baselines/` (gitignored). The deterministic work lives in `scripts/run_baseline.py`; this document is the guide for picking inputs and reading the output.
## When to use
Trigger this skill when the user wants a per-skill eval snapshot. Common phrasings: "baseline this skill", "snapshot the eval", "run the eval for skill X", "capture before/after for the rewrite I just made". The user invokes it explicitly with `/regen-eval-baseline`; it doesn't auto-trigger.
This is single-skill on purpose. Full-sweep regeneration belongs on `npm run eval:all`, which also reads the same `tests/evals/promptfoo/skill-*.yaml` files but runs them sequentially without snapshotting.
## Inputs
Pass **exactly one** config selector — either `--skill` (the base config) or `--config` (a suffixed scenario / `.ab.yaml` control / explicit path) — plus an optional `--label`.
1. **`--skill <name>`** — the base config `tests/evals/promptfoo/skill-<name>.yaml`. Resolves that file **exactly**; it never cross-matches a longer suffixed sibling (e.g. `--skill reviewing-before-merge` drives `skill-reviewing-before-merge.yaml`, NOT `skill-reviewing-before-merge-adversarial.yaml`). If the named config is absent the script lists the available configs.
2. **`--config <selector>`** — for a suffixed scenario config or an `.ab.yaml` A/B control. Accepts a bare stem (`skill-reviewing-before-merge-adv