cost-aware-llm-pipeline

Solid

Cost optimization patterns for LLM API usage: model routing by task complexity, budget tracking, retry logic, and prompt caching.

AI & Automation 201,447 stars 30,903 forks Updated yesterday MIT

Quality Score: 95/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

## When to Activate

* Building applications that call LLM APIs (Claude, GPT, etc.)
* Processing batches of items with varying complexity
* Keeping API spend within a budget
* Optimizing cost on complex tasks without sacrificing quality

## Core Concepts

### 1. Model Routing by Task Complexity

Automatically select the cheaper model for simple tasks and reserve the expensive model for complex ones.

```python
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"

_SONNET_TEXT_THRESHOLD = 10_000  # chars
_SONNET_ITEM_THRESHOLD = 30  # items


def select_model(
    text_length: int,
    item_count: int,
    force_model: str | None = None,
) -> str:
    """Select model based on task complexity."""
    if force_model is not None:
        return force_model
    if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
        return MODEL_SONNET  # Complex task
    return MODEL_HAIKU  # Simple task (3-4x cheaper)
```

### 2. Immutable Cost Tracking

Track cumulative spend with frozen dataclasses. Each API call returns a new tracker; existing state is never mutated.

```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


@dataclass(frozen=True, slots=True)
class CostTracker:
    budget_limit: float = 1.00
    records: tuple[CostRecord, ...] = ()

    def add(self, record: CostRecord) -> "CostTracker":
        """Return new tracker with added record (never mutates self)."""
        return CostTracker(
            budget_limit=self.budget_limit,
            records=(*self.records, record),
        )

    @property
    def total_cost(self) -> float:
        ...
```

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: yesterday
Language: JavaScript
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

cost-aware-pipeline

Cost-aware LLM pipeline patterns for optimal model routing, narrow retry strategies, and prompt caching. Reduces API costs 40-70% through intelligent model selection, targeted retries, and cache-friendly prompt structures. Use when: (1) Building multi-model pipelines, (2) Optimizing API costs, (3) Designing retry strategies for LLM calls, (4) Implementing prompt caching, (5) Choosing between haiku/sonnet/opus for sub-tasks.

11 Updated today
stevengonsalvez
AI & Automation Listed

llm-cost-optimizer

Analyze and reduce LLM API costs through model routing, caching, and prompt optimization. TRIGGER when: user asks about LLM costs, API spend reduction, token optimization, model routing, or prompt caching. DO NOT TRIGGER when: user asks about model quality comparison, fine-tuning, or general prompt engineering.

1 Updated 1 week ago
DROOdotFOO
AI & Automation Featured

langchain-cost-tuning

Optimize LangChain API costs with token tracking, model tiering, caching, prompt compression, and budget enforcement. Trigger: "langchain cost", "langchain tokens", "reduce langchain cost", "langchain billing", "langchain budget", "token optimization".

2,274 Updated today
jeremylongshore
AI & Automation Solid

llm-cost-optimizer

Use when you need to reduce LLM API spend, control token usage, route between models by cost/quality, implement prompt caching, or build cost observability for AI features. Triggers: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching'. NOT for RAG pipeline design (use rag-architect). NOT for prompt writing quality (use senior-prompt-engineer).

16,782 Updated 3 days ago
alirezarezvani
AI & Automation Featured

langfuse-cost-tuning

Monitor and optimize LLM costs using Langfuse analytics and dashboards. Use when tracking LLM spending, identifying cost anomalies, or implementing cost controls for AI applications. Trigger with phrases like "langfuse costs", "LLM spending", "track AI costs", "langfuse token usage", "optimize LLM budget".

2,274 Updated today
jeremylongshore