phoenix-evals

Featured

Build and run evaluators for AI/LLM applications using Phoenix.

AI & Automation · 34,233 stars · 4,188 forks · Updated today · MIT license

Quality Score: 99/100

Stars (20% weight): 100
Recency (20% weight): 100
Frontmatter (20% weight)
Documentation (15% weight): 100
Issue Health (10% weight)
License (10% weight): 100
Description (5% weight): 100

Skill Content

# Phoenix Evals

Build evaluators for AI/LLM applications. Code first, LLM for nuance, validate against humans.

## Quick Reference

| Task | Files |
| ---- | ----- |
| Setup | [setup-python](references/setup-python.md), [setup-typescript](references/setup-typescript.md) |
| Decide what to evaluate | [evaluators-overview](references/evaluators-overview.md) |
| Choose a judge model | [fundamentals-model-selection](references/fundamentals-model-selection.md) |
| Use pre-built evaluators | [evaluators-pre-built](references/evaluators-pre-built.md) |
| Build code evaluator | [evaluators-code-python](references/evaluators-code-python.md), [evaluators-code-typescript](references/evaluators-code-typescript.md) |
| Build LLM evaluator | [evaluators-llm-python](references/evaluators-llm-python.md), [evaluators-llm-typescript](references/evaluators-llm-typescript.md), [evaluators-custom-templates](references/evaluators-custom-templates.md) |
| Batch evaluate DataFrame | [evaluate-dataframe-python](references/evaluate-dataframe-python.md) |
| Run experiment | [experiments-running-python](references/experiments-running-python.md), [experiments-running-typescript](references/experiments-running-typescript.md) |
| Create dataset | [experiments-datasets-python](references/experiments-datasets-python.md), [experiments-datasets-typescript](references/experiments-datasets-typescript.md) |
| Generate synthetic data | [experiments-synthetic-python](references/experiments-synthetic-python.md), [e...
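The "code first, validate against humans" principle can be sketched in plain Python. This is a minimal illustration under stated assumptions and is not Phoenix's evaluator API. `EvalResult`, `json_format_evaluator`, and `agreement_rate` are hypothetical names. The JSON-format check is an assumed example standing in for whatever deterministic check an application actually needs. Phoenix's real interfaces are covered in the referenced files such as evaluators-code-python.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class EvalResult:
    # Hypothetical result shape, not Phoenix's own type.
    label: str        # "pass" or "fail"
    score: float      # 1.0 for pass, 0.0 for fail
    explanation: str


def json_format_evaluator(output: str, required_keys: tuple[str, ...] = ("answer",)) -> EvalResult:
    """Deterministic code evaluator (hypothetical): is the output a JSON object with the required keys?"""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as exc:
        return EvalResult("fail", 0.0, f"invalid JSON: {exc.msg}")
    if not isinstance(parsed, dict):
        return EvalResult("fail", 0.0, "JSON is not an object")
    missing = [key for key in required_keys if key not in parsed]
    if missing:
        return EvalResult("fail", 0.0, f"missing keys: {missing}")
    return EvalResult("pass", 1.0, "valid JSON with required keys")


def agreement_rate(evaluator_labels: list[str], human_labels: list[str]) -> float:
    """Fraction of examples where the evaluator's label matches the human label."""
    if len(evaluator_labels) != len(human_labels):
        raise ValueError("label lists must have the same length")
    if not human_labels:
        raise ValueError("need at least one human-labeled example")
    matches = sum(e == h for e, h in zip(evaluator_labels, human_labels))
    return matches / len(human_labels)


if __name__ == "__main__":
    # Made-up outputs and human labels, for illustration only.
    outputs = ['{"answer": "42"}', "not json", '{"reason": "x"}', "[1, 2]"]
    human = ["pass", "fail", "pass", "fail"]
    labels = [json_format_evaluator(o).label for o in outputs]
    print(labels)                          # ['pass', 'fail', 'fail', 'fail']
    print(agreement_rate(labels, human))   # 0.75
```

In this sketch, agreement with human labels on a sample is a rough signal of whether a code check can be trusted. Here the evaluator rejects an output the human accepted. That kind of disagreement suggests either refining the check or routing the nuanced cases to an LLM judge.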

Details

Author: github
Repository: github/awesome-copilot
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content, not just on category.

AI & Automation Featured

phoenix-observability

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.

27,705 Updated today

davila7

AI & Automation Solid

phoenix-observability

9,182 Updated 1 month ago

Orchestra-Research

AI & Automation Listed

ai-evals

Help users create and run AI evaluations. Use when someone is building evals for LLM products, measuring model quality, creating test cases, designing rubrics, or trying to systematically measure AI output quality.

0 Updated today

TindanLawrence

AI & Automation Solid

phoenix-arize-setup

Arize Phoenix observability platform setup for LLM debugging and evaluation

1,160 Updated today

a5c-ai

AI & Automation Featured

eval-driven-dev

Set up eval-based QA for Python LLM applications: instrument the app, build golden datasets, write and run eval tests, and iterate on failures. ALWAYS USE THIS SKILL when the user asks to set up QA, add tests, add evals, evaluate, benchmark, fix wrong behaviors, improve quality, or do quality assurance for any Python project that calls an LLM model.

34,233 Updated today

github