phoenix-evals
FeaturedBuild and run evaluators for AI/LLM applications using Phoenix.
Install
Quality Score: 99/100
Skill Content
Details
- Author
- github
- Repository
- github/awesome-copilot
- Created
- 11 months ago
- Last Updated
- today
- Language
- Python
- License
- MIT
Similar Skills
Semantically similar based on skill content — not just same category
phoenix-observability
Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.
phoenix-observability
Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.
ai-evals
Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.
phoenix-arize-setup
Arize Phoenix observability platform setup for LLM debugging and evaluation
eval-driven-dev
Set up eval-based QA for Python LLM applications: instrument the app, build golden datasets, write and run eval tests, and iterate on failures. ALWAYS USE THIS SKILL when the user asks to set up QA, add tests, add evals, evaluate, benchmark, fix wrong behaviors, improve quality, or do quality assurance for any Python project that calls an LLM model.