eval-run
Execute or supervise a planned Mnemon harness eval run in an isolated HostAgent workspace.
Quality Score: 88/100
Details
- Author: mnemon-dev
- Repository: mnemon-dev/mnemon
- Created: 3 months ago
- Last Updated: today
- Language: Go
- License: Apache-2.0
Similar Skills
Semantically similar based on skill content — not just same category
eval-plan
Design a scenario-driven Mnemon harness eval with target, hypothesis, HostAgent, loop configuration, evidence, and rubric.
eval-runner
Run eval scenarios to benchmark Mycelium effectiveness. Execute tasks using reflexion loop, validate against success criteria, record metrics.
eval-harness
Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles
genesis-evals
Use this skill to run the genesis maintainer-side eval suite against a target model (default: claude-opus-4.7). Activate when validating a genesis PR, when changing the genesis catalogue (architectural-patterns, primitives, design-patterns, refactor-patterns, composition-substrate, pattern-tradeoffs, SKILL.md), or when the operator asks to "run evals" or "regenerate the eval matrix". This skill orchestrates parallel cold sub-agent spawns via the harness's task tool, scores deterministically, and converges P>=0.8 / N>=0.8 / R==1.0 within max 3 iteration loops. This skill is contributor-only -- it lives under dev/skills/ (OUTSIDE .apm/) and is NOT shipped inside the user-facing skills/genesis/ bundle (BUNDLE LEAKAGE discipline). See "Why this lives outside .apm/" below.