gemini-prompt-cachelisted
Install: claude install-skill baronguyen001/ai-automation-skills
# Gemini Prompt Cache
Use this skill when the same large instruction block - rules, a response schema, few-shot examples - is sent on every Gemini request. Caching that prefix once turns those repeated input tokens into a much cheaper cached-token rate, which matters a lot at volume. This wraps Gemini context caching: create the cache once, then point every generate call at the cached handle.
## When to invoke
- User says: "cache my gemini system prompt" / "cut gemini input cost" / "context caching" / "reuse a long prompt prefix"
- Code in the conversation uses: a fixed multi-thousand-token system instruction repeated across many `generate_content` calls.
## When NOT to invoke
- The system prompt is short, or it changes on every call (caching a one-off prefix loses money on the cache write).
- The user wants output-side savings (use [[gemini-flash-budget]] to drop thinking tokens instead).
## Concrete example
User input:
```text
I send the same 4k-token extraction ruleset to Gemini on every row. Stop paying full price for it.
```
Output:
```python
# Copy assets/cache.py into your project, then:
from cache import create_cache, generate_with_cache
prefix = create_cache(BIG_RULESET, model="gemini-2.5-flash", ttl_seconds=1800)
for row in rows:
print(generate_with_cache(prefix, row)) # ruleset billed at cached rate
```
The helper reads `GEMINI_API_KEY` from the environment and requires `google-genai`.
## Pattern to apply
1. Separate the stable prefix (rules, sc