cost-latency-optimizer

Use this when an agent is too slow or too expensive. It applies the four levers (prompt caching, model routing, batching, streaming) highest-leverage first and lints a cache-breakpoint plan for the mistakes that silently bust the cache. Triggers on "too expensive", "latency", "speed up", "cut cost", "prompt caching", "which model", "routing".
Luis247911/universal-ai-workspace-foundation · ★ 0 · AI & Automation · score 78

Install: claude install-skill Luis247911/universal-ai-workspace-foundation

# cost-latency-optimizer

Cuts cost and latency with four levers, applied highest-leverage first. The bundled tool lints a **prompt-cache breakpoint plan**, because the #1 caching mistake (a breakpoint after volatile content) silently busts the cache on every request and costs more than no cache at all.

## When to use

- An agent's bill or p95 latency is too high.
- You added prompt caching but it isn't hitting.
- Choosing which model handles which request (cheap-first with fallback).

## The four levers (detail in `reference.md`)

1. **Prompt caching**: highest leverage. Order the prefix most-stable-first, cache the stable part.
2. **Routing**: send easy requests to a cheap model, hard ones to a strong model; fall back on error.
3. **Batching**: for non-interactive bulk work, trade latency for throughput/discount.
4. **Streaming**: does not cut cost, but cuts *perceived* latency; use for interactive UX.

## Run it

```
python -m harness.router cache-lint          # a sound plan -> "cache plan OK"
python -m harness.router cache-lint --bad    # a plan with mistakes -> warnings, non-zero exit
python -m harness.router pick --alias fast   # which deployment a route resolves to
python -m harness.router explain             # the fallback chains per error class
```

## The cache rule (what the linter enforces)

- Order segments most-stable-first: **tools → system → messages**.
- Never place a breakpoint *after* volatile content (a timestamp, the latest user turn).
- S