Install: claude install-skill varunk130/AI-Eval-Skills
# Cost-Quality Frontier
Most eval suites today answer the question *"which model is most accurate?"* The production question is harder: *"which model gives me the best quality I can afford, at the latency budget my product allows?"* This skill takes existing eval runs and augments each result with cost and latency, then plots the candidates on a Pareto frontier so the trade-off is visible at a glance.
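As a rough illustration of the augmentation step, the sketch below attaches a cost to one eval result row. The field names `cost_usd`, `input_tokens`, `output_tokens`, `latency_ms`, `model`, and `pricing_source` come from the schema described in this document; the `Pricing` class, the `augment_row` helper, the `score` field, and the per-million-token prices are hypothetical placeholders, not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical price sheet for one model, in USD per 1M tokens."""
    input_per_mtok: float
    output_per_mtok: float
    source: str


def augment_row(row: dict, pricing: Pricing) -> dict:
    """Return a copy of an eval result row with cost fields attached."""
    cost = (
        row["input_tokens"] * pricing.input_per_mtok
        + row["output_tokens"] * pricing.output_per_mtok
    ) / 1_000_000
    return {**row, "cost_usd": round(cost, 6), "pricing_source": pricing.source}


# Illustrative row only; the numbers are made up.
row = {
    "model": "model-a",
    "score": 0.82,
    "input_tokens": 1200,
    "output_tokens": 300,
    "latency_ms": 640,
}
augmented = augment_row(row, Pricing(3.0, 15.0, "example price sheet"))
print(augmented["cost_usd"], augmented["pricing_source"])
```

Recording `pricing_source` next to each cost keeps the numbers auditable when provider prices change.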
## Core Principle
**Quality, cost, and latency are a single decision, not three.** A model that is 2 points more accurate but costs 6× as much and adds 800 ms of p95 latency is rarely the right pick. A model that is 1 point worse but cheaper *and* faster usually is, but only if you can see the frontier. This skill makes the frontier explicit.
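To make "non-dominated" concrete, here is a minimal sketch of a two-axis Pareto filter. It assumes quality is higher-is-better and the second axis (cost or latency) is lower-is-better. The `Candidate` type, the `pareto_frontier` function, the `p95_latency_ms` field name, and all numbers are illustrative assumptions rather than the skill's actual implementation.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    model: str
    quality: float           # higher is better
    cost_per_1k_runs: float  # lower is better, USD
    p95_latency_ms: float    # lower is better


def pareto_frontier(candidates: list[Candidate], axis: str) -> list[Candidate]:
    """Keep candidates not dominated on (quality, axis).

    `a` dominates `b` when it is at least as good on both dimensions
    and strictly better on at least one.
    """
    def dominates(a: Candidate, b: Candidate) -> bool:
        a_val, b_val = getattr(a, axis), getattr(b, axis)
        return (
            a.quality >= b.quality
            and a_val <= b_val
            and (a.quality > b.quality or a_val < b_val)
        )

    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up aggregates for four hypothetical models.
candidates = [
    Candidate("model-a", 0.90, 6.0, 1400),
    Candidate("model-b", 0.88, 1.0, 600),
    Candidate("model-c", 0.85, 1.5, 500),
    Candidate("model-d", 0.80, 0.4, 700),
]

cost_frontier = [c.model for c in pareto_frontier(candidates, "cost_per_1k_runs")]
latency_frontier = [c.model for c in pareto_frontier(candidates, "p95_latency_ms")]
print(cost_frontier)     # ['model-a', 'model-b', 'model-d']
print(latency_frontier)  # ['model-a', 'model-b', 'model-c']
```

In this toy data, `model-c` drops off the cost frontier because `model-b` is both better and cheaper, yet it stays on the latency frontier because it is the fastest. That is why the two frontiers are worth inspecting separately. The pairwise check is quadratic in the number of candidates, which is fine for the handful of models a typical comparison involves.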
---
## What You'll Get
| Artifact | Description |
|----------|-------------|
| **Augmented Eval Schema** | Every result row gains `cost_usd`, `input_tokens`, `output_tokens`, `latency_ms`, `model`, `pricing_source` columns |
| **Per-Model Aggregates** | Quality (mean + 95% CI), p50/p95 latency, cost-per-eval-run, cost-per-1k-runs, total tokens consumed |
| **Pareto Frontier Plot** | ASCII / Markdown table of which models are non-dominated on (quality, cost) and (quality, latency) |
| **Quality-per-Dollar Score** | Single composite: `quality_score / cost_per_1k_runs`, with a 95% CI from bootstrap |
| **Decision Matrix** | "If your latency budget is X and your cost ceiling is Y, the best model is Z", worked out for several common (X, Y) regimes |
| **Sen