eval-improve
Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.
Quality Score: 88/100
Details
- Author: mnemon-dev
- Repository: mnemon-dev/mnemon
- Created: 3 months ago
- Last Updated: today
- Language: Go
- License: Apache-2.0
Similar Skills
Semantically similar based on skill content, not just the same category
eval-analyze
Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.
eval-plan
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
eval-triage-and-improvement
Use this skill when AI agent evaluations have come back and the user needs to interpret scores, diagnose root causes of underperforming test cases, find remediation steps, or analyze patterns to improve their agent. Works with any agent platform: Copilot Studio is the primary worked example here, but the triage framework applies equally to custom harnesses, LangChain/LangGraph, AutoGen, Semantic Kernel, OpenAI Assistants, and other agent runtimes. Always use this skill when the user mentions: "eval failed", "why did this fail", "triage", "diagnose failure", "low pass rate", "fix evaluation results", "not passing", "failing test cases", "evaluation results", "improve my eval scores", or any situation where eval scores need interpretation and action.
eval-harness
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles