plan-quality-evallisted
Install: claude install-skill bakw00ds/yakos
# Plan Quality Eval
## Purpose
Score a plan file against a 6-dimension structural rubric using a 3-judge
cross-vendor LLM panel. Surfaces plans that are too vague, badly
decomposed, or missing assumptions before specialists start work —
when fixing the plan is cheap. Writes a structured NDJSON record so
aggregate scoring trends are trackable over time.
Phase 1 = **manual invocation only**. The auto-fire hook (pre-dispatch
gate) comes in Phase 2. Per-project score thresholds come in Phase 2.
Outcome telemetry aggregation comes in Phase 3.
## Scope
- Reads a plan from the path passed as the first argument (typically
`work/current/plan.md`).
- Parses it into canonical JSON via `extract-plan.sh`.
- Dispatches `plan-judge` 3× in parallel — one per vendor runtime
(`claude`/`haiku`, `codex`/`gpt-5-nano`, `gemini`/`gemini-2.5-flash`).
- Computes per-dimension medians and a weighted aggregate.
- Applies family-drop: if the planner model family matches a judge family,
that judge is dropped and the panel operates at size 2.
- Applies dissent flag: if max–min ≥ 0.5 on any dimension, sets
`dissent: true` and `recommended_action: surface_to_operator` regardless
of aggregate score.
- Cost ceiling: $0.15 per run. Hard-fails if estimated cost exceeds.
- Writes one NDJSON record to `~/.yakos-state/plan-quality-log.ndjson`.
- Emits a human-readable markdown report to stdout.
## When to use
- Before dispatching specialists on a new plan — catch structural problems
when they ar