agent-benchmark
Framework for measuring and tracking agent response quality over time. Detects regressions before they reach production. Use when evaluating agent changes, auditing quality, or establishing performance baselines.
Quality Score: 86/100
Details
- Author: vibeeval
- Repository: vibeeval/vibecosystem
- Created: 2 months ago
- Last Updated: 1 month ago
- Language: C#
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
agent-benchmark-suite
Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
agent-performance-benchmarker
Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
agent-evaluation
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks