← ClaudeAtlas

bosskuai-eval-driven-agent-improvementlisted

Use this for agent eval design, routing tests, retrieval tests, LLM quality cases, regression harnesses, scorecards, and continuous agent improvement.
wankimmy/Bossku-AI · ★ 5 · AI & Automation · score 70
Install: claude install-skill wankimmy/Bossku-AI
# Bosskuai Eval Driven Agent Improvement Use this for agent eval design, routing tests, retrieval tests, LLM quality cases, regression harnesses, scorecards, and continuous agent improvement. ## Fast Path 1. Turn recurring agent failures into small deterministic or LLM-judged eval cases. 2. Separate routing, retrieval, workflow, quality, token, and safety evals. 3. Track pass rate and false confidence, not just green checks. 4. Avoid overfitting exact phrases; add fresh generalization cases. ## Default Checks - Turn recurring agent failures into small deterministic or LLM-judged eval cases. - Separate routing, retrieval, workflow, quality, token, and safety evals. - Track pass rate and false confidence, not just green checks. - Avoid overfitting exact phrases; add fresh generalization cases. - Require changelog entry for behavior changes. ## When To Open The Playbook Open `../../references/playbooks/bosskuai-eval-driven-agent-improvement-playbook.md` only when the task needs detailed workflow, implementation examples, or release-grade depth. ## Output Quality - Start with the verdict or action. - Separate confirmed facts, assumptions, and risks. - Include exact files, commands, tests, metrics, or rollback triggers when relevant. - Do not claim legal, security, or cost certainty without evidence. ## References - `../../references/playbooks/bosskuai-eval-driven-agent-improvement-playbook.md` - `../../references/checklists/eval-driven-agent-improvement-checklist.md`