evals-context

Solid

Provides context about the structure of the Roo Code evals system in this monorepo. Use when a task mentions "evals", "evaluation", "eval runs", or "eval exercises", or involves working with the evals infrastructure. Helps distinguish between the evals execution system (packages/evals, apps/web-evals) and the public website evals display page (apps/web-roo-code/src/app/evals).

AI & Automation 2,279 stars 168 forks Updated 3 weeks ago Apache-2.0

Quality Score: 91/100

Stars (20%): 100
Recency (20%): 90
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Evals Codebase Context

## When to Use This Skill

Use this skill when the task involves:

- Modifying or debugging the evals execution infrastructure
- Adding new eval exercises or languages
- Working with the evals web interface (apps/web-evals)
- Modifying the public evals display page on roocode.com
- Understanding where evals code lives in this monorepo

## When NOT to Use This Skill

Do NOT use this skill when:

- Working on unrelated parts of the codebase (extension, webview-ui, etc.)
- The task is purely about the VS Code extension's core functionality
- Working on the main website pages that don't involve evals

## Key Disambiguation: Two "Evals" Locations

This monorepo has **two distinct evals-related locations** that can cause confusion:

| Component                  | Path                               | Purpose                                                        |
| -------------------------- | ---------------------------------- | -------------------------------------------------------------- |
| **Evals Execution System** | `packages/evals/`                  | Core eval infrastructure: CLI, DB schema, Docker configs       |
| **Evals Management UI**    | `apps/web-evals/`                  | Next.js app for creating/monitoring eval runs (localhost:3446) |
| **Website Evals Page**     | `apps/web-roo-code/src/app/evals/` ...
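
The path-based disambiguation above can be sketched as a small lookup. This is only an illustration, not part of the skill or the Roo Code repository: the function name `classify_evals_path` and the label strings are hypothetical, and only the three path prefixes come from the table. The listing truncates the purpose of the third row, so its label here is a placeholder based on the description at the top of the page.

```python
from pathlib import PurePosixPath
from typing import Optional

# Prefixes are the three paths listed in the table above.
# The label strings are illustrative assumptions, not official names.
EVALS_LOCATIONS = {
    "packages/evals": "evals execution system",
    "apps/web-evals": "evals management UI",
    "apps/web-roo-code/src/app/evals": "website evals page",
}


def classify_evals_path(path: str) -> Optional[str]:
    """Return the evals component a repo-relative path belongs to, or None."""
    candidate = PurePosixPath(path)
    for prefix, label in EVALS_LOCATIONS.items():
        # is_relative_to compares whole path components (Python 3.9+),
        # so "packages/evals-extra" does not match "packages/evals".
        if candidate.is_relative_to(prefix):
            return label
    return None
```

With this sketch, a path under `apps/web-roo-code/src/app/evals/` maps to the website page, while any other path under `apps/web-roo-code/` returns `None`. That mirrors the skill's guidance not to apply it to website pages that don't involve evals.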

Details

Author: foryourhealth111-pixel
Repository: foryourhealth111-pixel/Vibe-Skills
Created: 3 months ago
Last Updated: 3 weeks ago
Language: Python
License: Apache-2.0

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

eval-engineer

Use when a user is unsure which Eval Engineer command to run for AI agents/RAG apps, needs onboarding/status for a .galileo workspace, or asks where to start.

33 Updated 2 weeks ago
Galileo-Agent-Labs
Testing & QA Listed

run-evals

Run and iterate on existing skill evaluations. Use when evals/evals.json already exists and the user wants to run evals, re-evaluate after skill changes, check results, compare iterations, add/modify eval cases, or gate CI with thresholds. Triggers on "run evals", "re-eval", "how did it do", "check results", "compare iterations", "run benchmarks", or any eval-related request when evals already exist.

1 Updated 5 days ago
matantsach
AI & Automation Listed

genesis-evals

Use this skill to run the genesis maintainer-side eval suite against a target model (default: claude-opus-4.7). Activate when validating a genesis PR, when changing the genesis catalogue (architectural-patterns, primitives, design-patterns, refactor-patterns, composition-substrate, pattern-tradeoffs, SKILL.md), or when the operator asks to "run evals" or "regenerate the eval matrix". This skill orchestrates parallel cold sub-agent spawns via the harness's task tool, scores deterministically, and converges P>=0.8 / N>=0.8 / R==1.0 within max 3 iteration loops. This skill is contributor-only -- it lives under dev/skills/ (OUTSIDE .apm/) and is NOT shipped inside the user-facing skills/genesis/ bundle (BUNDLE LEAKAGE discipline). See "Why this lives outside .apm/" below.

38 Updated 1 week ago
danielmeppiel