finops-review

Analyze the dispatch-log for per-feature spend, cache hit rate, and model routing — surface optimization opportunities
bakw00ds/yakos · ★ 2 · Code & Development · score 81

Install: claude install-skill bakw00ds/yakos

# FinOps Review

## Purpose

Look at the dispatch-log with a finance hat on. Answer:

- Where is the money going? Per feature, per agent, per model.
- What fraction of input tokens are hitting cache? (Anything below ~70% on a stable system prompt is a smell.)
- Is the routing sensible? Are opus calls doing work that haiku / gpt-5-nano / gemini-flash could do for 1/30th the cost?
- Are there workloads on the realtime API that should be on the batch API (50% discount, 24h SLA)?

The output is a list of *opportunities*, ranked by estimated monthly savings, not a blame report.

Owned by the `ai-finops` agent.

## Scope

- Reads `~/.yakos-state/dispatch-log*.ndjson` (current + rotated).
- Joins with the agent registry (model alias per agent) and the runtime billing snapshot to compute per-call cost.
- Computes:
  - **Spend by feature.** Tag-based: each dispatch carries a `feature_tag` (set by the lead or inferred from the calling agent's domain).
  - **Cache hit rate per system prompt.** Grouped by `system_prompt_hash`. Low hit rates point at unstable prompts (date-stamped headers, shuffled examples, etc.).
  - **Model routing audit.** For each agent, the distribution of model choices. Opus on a `cheap`-eligible agent is flagged.
  - **Batch-eligible candidates.** Workloads with high volume + low latency-sensitivity (offline rubric scoring, summarization backfills, etc.) that are running on the realtime API.
- Output is a markdown report with thr