eval-analyze

Solid

Analyze Mnemon harness eval reports, classify outcomes, and extract improvement evidence.

AI & Automation | 322 stars | 46 forks | Updated today | Apache-2.0

Quality Score: 88/100

Stars (weight 20%): 84
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 45
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Eval Analyze

Use this skill after an eval run to judge behavior and extract improvement evidence.

## Procedure

1. Read the report, relevant artifact summaries, and the selected rubric.
2. Compare observed behavior to the hypothesis.
3. Classify the outcome:
   - `pass`: behavior meets the rubric.
   - `weak`: partially useful but missing expected evidence or consistency.
   - `fail`: behavior contradicts the target expectation.
   - `invalid`: setup or scenario issue prevents judgement.
4. Identify the likely improvement target:
   - memory
   - skill
   - eval
   - host adapter
   - setup
   - docs
   - scenario or rubric
5. If a new eval asset is warranted, create a candidate summary instead of editing canonical assets immediately.

## Output

Write a concise analysis with:

- outcome
- evidence
- likely cause
- recommended next action
- candidate eval asset path, if any
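
The outcome labels and output fields above can be sketched as a small data model. This is an illustrative Python sketch only, not part of Mnemon: the names `Outcome`, `TARGETS`, `classify`, and `Analysis` are hypothetical, the three boolean inputs to `classify` are an assumed simplification of what is really a judgement call, and folding the improvement target into the "likely cause" line is one possible layout rather than a required one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """The four outcome labels from step 3 of the procedure."""
    PASS = "pass"
    WEAK = "weak"
    FAIL = "fail"
    INVALID = "invalid"


# Improvement targets listed in step 4 of the procedure.
TARGETS = {"memory", "skill", "eval", "host adapter",
           "setup", "docs", "scenario or rubric"}


def classify(setup_valid: bool, meets_rubric: bool,
             contradicts_expectation: bool) -> Outcome:
    """Map three simplified judgements to an outcome label (assumed ordering)."""
    if not setup_valid:
        # A setup or scenario issue prevents any judgement of behavior.
        return Outcome.INVALID
    if meets_rubric:
        return Outcome.PASS
    if contradicts_expectation:
        return Outcome.FAIL
    # Neither a clear pass nor a contradiction: partially useful.
    return Outcome.WEAK


@dataclass
class Analysis:
    """One concise analysis carrying the fields listed under Output."""
    outcome: Outcome
    evidence: str
    likely_cause: str
    target: str
    next_action: str
    candidate_asset_path: Optional[str] = None

    def __post_init__(self) -> None:
        if self.target not in TARGETS:
            raise ValueError(f"unknown improvement target: {self.target!r}")

    def render(self) -> str:
        """Render the analysis as a Markdown bullet list."""
        lines = [
            f"- outcome: {self.outcome.value}",
            f"- evidence: {self.evidence}",
            f"- likely cause: {self.likely_cause} (target: {self.target})",
            f"- recommended next action: {self.next_action}",
        ]
        # The asset path is optional ("if any"), so omit the line when unset.
        if self.candidate_asset_path:
            lines.append(
                f"- candidate eval asset path: {self.candidate_asset_path}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    analysis = Analysis(
        outcome=classify(setup_valid=True, meets_rubric=False,
                         contradicts_expectation=False),
        evidence="report shows recall on two of three expected turns",
        likely_cause="memory was written but not consistently retrieved",
        target="memory",
        next_action="rerun with the same rubric after adjusting retrieval",
    )
    print(analysis.render())
```

Checking `setup_valid` first mirrors the definition of `invalid`: if setup or the scenario is broken, the run cannot be judged against the rubric at all, so no other label applies.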

Details

Author: mnemon-dev
Repository: mnemon-dev/mnemon
Created: 3 months ago
Last Updated: today
Language: Go
License: Apache-2.0
