Galileo-Agent-Labs
Organization
Bring Galileo-powered eval workflows into Claude Code and Codex.
Indexed Skills (8)
eval-cost
Use when the user asks to make an AI app cheaper or faster, reduce tokens, latency, model/tool/retrieval/rerank/self-check/retry/evaluator cost, or compare cost before/after.
eval-dataset
Use when the user asks to turn a failure into an eval, create/review/accept/reject dataset cases, or convert Galileo traces, metric gaps, or production examples into cases.
eval-diagnose
Use when Galileo evidence is available and the user asks why a trace, session, log stream, experiment, metric, or AI app behavior failed, regressed, or became unsafe.
eval-engineer
Use when a user is unsure which Eval Engineer command to run for AI agents/RAG apps, needs onboarding/status for a .galileo workspace, or asks where to start.
eval-fetch
Use when a user asks to fetch Galileo evidence, provides Galileo URLs or IDs, says "fetch this Galileo link", or needs traces, sessions, experiments, or log streams saved locally.
eval-setup
Use when the user asks to set up Eval Engineer, check .galileo readiness, create workspace scaffolding, or configure editable files, verification commands, app type, or evidence paths.
eval-audit
Use when the user asks for an AI app audit, launch readiness review, safety/security review, OWASP agentic risk check, metric coverage review, or production RCA gap review.
eval-measure
Use when the user asks if an AI app is measured correctly, needs Galileo metrics, expected-output contracts, metric profiles, eval gates, or measurement before optimizing.
The bio shown is the top-scored skill's repo description, used as a fallback; real GitHub bios will land in a future update.