agent-eval
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
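The four metrics named in the description can be illustrated with a small aggregation sketch. It assumes a hypothetical record shape (one object per trial with `agent`, `task`, `passed`, `costUsd`, and `seconds`) and an assumed definition of consistency: the share of tasks on which every repeated trial gave the same pass/fail outcome. `summarizeAgents` is an illustrative name, not part of agent-eval, and the skill's actual data format and formulas may differ.

```javascript
// Hypothetical per-trial record (assumed shape, not agent-eval's real format):
// { agent: "aider", task: "t1", passed: true, costUsd: 0.1, seconds: 30 }

function summarizeAgents(results) {
  // Group trial records by agent name.
  const byAgent = new Map();
  for (const r of results) {
    if (!byAgent.has(r.agent)) byAgent.set(r.agent, []);
    byAgent.get(r.agent).push(r);
  }

  const summary = {};
  for (const [agent, runs] of byAgent) {
    const n = runs.length;
    const passes = runs.filter((r) => r.passed).length;
    const totalCost = runs.reduce((sum, r) => sum + r.costUsd, 0);
    const totalTime = runs.reduce((sum, r) => sum + r.seconds, 0);

    // Consistency (assumed definition): collect the distinct pass/fail
    // outcomes per task; a task is stable if all its trials agree.
    const outcomesByTask = new Map();
    for (const r of runs) {
      if (!outcomesByTask.has(r.task)) outcomesByTask.set(r.task, new Set());
      outcomesByTask.get(r.task).add(r.passed);
    }
    const stableTasks = [...outcomesByTask.values()].filter((s) => s.size === 1).length;

    summary[agent] = {
      passRate: passes / n,          // fraction of trials that passed
      avgCostUsd: totalCost / n,     // mean cost per trial
      avgSeconds: totalTime / n,     // mean wall-clock time per trial
      consistency: stableTasks / outcomesByTask.size,
    };
  }
  return summary;
}

// Usage with made-up sample data: two tasks, two trials each.
const sample = [
  { agent: "aider", task: "t1", passed: true, costUsd: 0.1, seconds: 30 },
  { agent: "aider", task: "t1", passed: false, costUsd: 0.3, seconds: 50 },
  { agent: "aider", task: "t2", passed: true, costUsd: 0.2, seconds: 40 },
  { agent: "aider", task: "t2", passed: true, costUsd: 0.2, seconds: 40 },
];
console.log(summarizeAgents(sample));
```

Running several trials per task is what makes the consistency figure meaningful: a single trial per task would always report full consistency.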
Quality Score: 96/100
Details
- Author: affaan-m
- Repository: affaan-m/everything-claude-code
- Created: 4 months ago
- Last Updated: yesterday
- Language: JavaScript
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
agent-eval
Head-to-head comparison of coding agents (Claude Code, Aider, Codex, etc.) on custom tasks with pass rate, cost, time, and consistency metrics
agent-evaluation
Evaluate LLM agents and tool-using workflows—task success, tool accuracy, latency/cost, safety, and regression suites. Use when shipping agent features, comparing prompts/models, or debugging agent failures.
eval-agent
Run evaluation tests against an agent to assess quality and archetype resistance
agent-eval
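[Agent Evaluation] Evaluates the output quality of AI agents. Trigger: when the user says "评估 agent" (evaluate agent), "测试 agent 质量" (test agent quality), "agent eval", or "检查 agent 输出" (check agent output).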