agent-evallisted
Install: claude install-skill goharabbas321/zeoel-framework
# Agent Eval Skill
A lightweight CLI tool for comparing coding agents head-to-head on reproducible tasks. Every "which coding agent is best?" comparison runs on vibes — this tool systematizes it.
## When to Activate
- Comparing coding agents (Claude Code, Aider, Codex, etc.) on your own codebase
- Measuring agent performance before adopting a new tool or model
- Running regression checks when an agent updates its model or tooling
- Producing data-backed agent selection decisions for a team
## Installation
> **Note:** Install agent-eval from its repository after reviewing the source.
## Core Concepts
### YAML Task Definitions
Define tasks declaratively. Each task specifies what to do, which files to touch, and how to judge success:
```yaml
name: add-retry-logic
description: Add exponential backoff retry to the HTTP client
repo: ./my-project
files:
- src/http_client.py
prompt: |
Add retry logic with exponential backoff to all HTTP requests.
Max 3 retries. Initial delay 1s, max delay 30s.
judge:
- type: pytest
command: pytest tests/test_http_client.py -v
- type: grep
pattern: "exponential_backoff|retry"
files: src/http_client.py
commit: "abc1234" # pin to specific commit for reproducibility
```
### Git Worktree Isolation
Each agent run gets its own git worktree — no Docker required. This provides reproducibility isolation so agents cannot interfere with each other or corrupt the base repo.
### Metrics Collected
| Metric | What It Measures |
|-