plan-quality-evallisted

Score work/current/plan.md against a 6-dimension rubric using a 3-judge cross-vendor LLM panel; write a structured record to ~/.yakos-state/plan-quality-log.ndjson
bakw00ds/yakos · ★ 2 · AI & Automation · score 81

Install: claude install-skill bakw00ds/yakos

# Plan Quality Eval ## Purpose Score a plan file against a 6-dimension structural rubric using a 3-judge cross-vendor LLM panel. Surfaces plans that are too vague, badly decomposed, or missing assumptions before specialists start work — when fixing the plan is cheap. Writes a structured NDJSON record so aggregate scoring trends are trackable over time. Phase 1 = **manual invocation only**. The auto-fire hook (pre-dispatch gate) comes in Phase 2. Per-project score thresholds come in Phase 2. Outcome telemetry aggregation comes in Phase 3. ## Scope - Reads a plan from the path passed as the first argument (typically `work/current/plan.md`). - Parses it into canonical JSON via `extract-plan.sh`. - Dispatches `plan-judge` 3× in parallel — one per vendor runtime (`claude`/`haiku`, `codex`/`gpt-5-nano`, `gemini`/`gemini-2.5-flash`). - Computes per-dimension medians and a weighted aggregate. - Applies family-drop: if the planner model family matches a judge family, that judge is dropped and the panel operates at size 2. - Applies dissent flag: if max–min ≥ 0.5 on any dimension, sets `dissent: true` and `recommended_action: surface_to_operator` regardless of aggregate score. - Cost ceiling: $0.15 per run. Hard-fails if estimated cost exceeds. - Writes one NDJSON record to `~/.yakos-state/plan-quality-log.ndjson`. - Emits a human-readable markdown report to stdout. ## When to use - Before dispatching specialists on a new plan — catch structural problems when they ar