Galileo-Agent-Labs

eval-cost

Use when the user asks to make an AI app cheaper or faster; to reduce tokens, latency, or model, tool, retrieval, rerank, self-check, retry, or evaluator cost; or to compare cost before and after a change.

eval-dataset

Use when the user asks to turn a failure into an eval, create/review/accept/reject dataset cases, or convert Galileo traces, metric gaps, or production examples into cases.

eval-diagnose

Use when Galileo evidence is available and the user asks why a trace, session, log stream, experiment, metric, or AI app behavior failed, regressed, or became unsafe.

eval-engineer

Use when the user is unsure which Eval Engineer command to run for an AI agent or RAG app, needs onboarding or status for a .galileo workspace, or asks where to start.

eval-fetch

Use when the user asks to fetch Galileo evidence, provides Galileo URLs or IDs, says "fetch this Galileo link", or needs traces, sessions, experiments, or log streams saved locally.

eval-setup

Use when the user asks to set up Eval Engineer, check .galileo readiness, create workspace scaffolding, or configure editable files, verification commands, app type, or evidence paths.

eval-audit

Use when the user asks for an AI app audit, launch readiness review, safety/security review, OWASP agentic risk check, metric coverage review, or production RCA gap review.

eval-measure

Use when the user asks if an AI app is measured correctly, needs Galileo metrics, expected-output contracts, metric profiles, eval gates, or measurement before optimizing.