Install: claude install-skill Claudient/Claudient
# Prompt Caching
## When to activate
Claude API usage with repeated large context blocks — system prompts, prepended documents, large tool definitions — or when the user wants to reduce API costs or latency for workloads that reuse the same context across multiple calls.
## When NOT to use
- Single-shot API calls with no repeated context
- Conversations where the system prompt changes every request
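- Contexts below the model's minimum cacheable length: 1024 tokens for most Opus and Sonnet models, 2048 tokens for Claude 3 and 3.5 Haiku (newer models may differ, so check the current documentation). Below the threshold, the `cache_control` marker has no effect and the request is processed without caching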
## Instructions
### How Prompt Caching Works
A cache breakpoint marks the end of a cacheable prefix: everything in the request up to and including the marked content block. When Anthropic's infrastructure sees an identical prefix again, it reads that prefix from cache instead of reprocessing it.
- **Cache write cost:** 1.25× standard input token price
- **Cache read cost:** 0.1× standard input token price
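- **Break-even point:** the first cache read. One write plus one read costs 1.25 + 0.1 = 1.35× the standard price, versus 2× for two uncached calls; a prefix that is written but never read again costs 25% more than not caching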
- **Default TTL:** 5 minutes
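- **Extended TTL:** 1 hour. Set `ENABLE_PROMPT_CACHING_1H=1` as an environment variable (beta); when calling the Messages API directly, request it per block with `"cache_control": {"type": "ephemeral", "ttl": "1h"}`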
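As a rough sketch of what the write and read multipliers above mean for total spend, the following compares the relative cost of the repeated prefix with and without caching. The function names are illustrative, and the model assumes every repeat call arrives before the cache entry expires, so each call after the first is a cache read.

```python
# Price multipliers relative to the standard input-token price,
# taken from the cache write and cache read costs listed above.
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.10


def uncached_cost(calls: int) -> float:
    """Relative cost of sending the same prefix `calls` times with no caching."""
    return float(max(calls, 0))


def cached_cost(calls: int) -> float:
    """Relative cost with caching: one cache write, then cache reads.

    Assumption: every call after the first hits the cache (no TTL expiry).
    """
    if calls <= 0:
        return 0.0
    return CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (calls - 1)


if __name__ == "__main__":
    for calls in (1, 2, 5, 10):
        print(
            f"{calls:>2} calls: uncached {uncached_cost(calls):.2f}x, "
            f"cached {cached_cost(calls):.2f}x"
        )
```

Under these assumptions a single call is more expensive with caching (1.25× versus 1×), while two or more calls within the TTL are cheaper, and the gap widens with each additional read. Expired cache entries add further writes, so real savings depend on how closely spaced the calls are.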
### cache_control Syntax
Add `"cache_control": {"type": "ephemeral"}` to the last content block you want included in the cache prefix:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "<your large system prompt here — must be >1024 tokens to cache>",
            "cach