eval-analyze
Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.
Quality Score: 88/100 (Solid)
Details
- Author: mnemon-dev
- Repository: mnemon-dev/mnemon
- Created: 3 months ago
- Last Updated: today
- Language: Go
- License: Apache-2.0
Similar Skills
Semantically similar based on skill content, not just the same category
eval-analysis
Analyze eval results to understand signal quality and guide prompt/config changes. Use when the user says "eval-analysis", "analyze eval results", "what do the evals show", "eval regression", or asks about eval signal enrichment.
eval-improve
Turn stable Mnemon harness eval findings into scoped project, loop, adapter, docs, or eval asset improvements.
eval-plan
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
eval-result-interpreter
Analyzes AI agent evaluation results, primarily from Copilot Studio (the worked example here, via its CSV export) but also from custom harnesses or any evaluator that produces per-case pass/fail rows, using Microsoft's Triage & Improvement Playbook. Returns a SHIP / ITERATE / BLOCK verdict with root cause classification, diagnostic triage, prioritized remediation, and pattern analysis.
evaluate
This skill should be used when the user asks to "analyze results", "improve strategy", "run PDCA", "evaluate effectiveness", "check response rates", or wants to evaluate sales performance and improve strategy. Automatically analyzes and improves strategy, targeting, and messaging based on response rate data.