Install: claude install-skill hsigstad/research-kit
# llmkit — LLM Extraction Framework
Shared package at `~/research/packages/llmkit/`, installed in editable mode (`pip install -e ~/research/packages/llmkit/`). It provides deterministic, auditable LLM extraction with Pydantic validation and file-backed caching, and is designed for academic research, where referees need to understand exactly what happened.
## Core API
```python
from llmkit import LLMCache, ExtractionSchema, extract, audit_sample
from llmkit.cache import text_hash, content_hash
from llmkit.extract import ExtractionResult
```
### `LLMCache(directory: Path)`
File-backed cache. Each entry is a JSON file with three sections: `_cache_meta`, `input_text`, `extraction`.
```python
from pathlib import Path

# doc_id, text_hash (the hash of the input text), model, and extraction
# are assumed to be defined by the calling code.
cache = LLMCache(Path("cache_dir"))

key = cache.key(doc_id, text_hash, model)  # composite key (back-compat)
key = cache.key(doc_id, text_hash, model, schema_name="my_task")  # schema-aware key (recommended for new tasks)

hit = cache.get(key)            # look up by composite key
hit = cache.get_by_doc(doc_id)  # legacy fallback (doc_id.json)

cache.put(key, extraction, doc_id=..., text_hash=..., ...)
entries = cache.iter_entries()
```
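To make the three-section entry layout concrete, here is a minimal standalone sketch that writes and reads one such JSON file using only the standard library. It is not llmkit's implementation: the helper names `write_entry` and `read_entry`, the `<key>.json` file naming, and the fields placed inside `_cache_meta` are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_entry(directory: Path, key: str, entry: dict) -> Path:
    """Write one entry as <key>.json (the file naming is an assumption)."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{key}.json"
    path.write_text(json.dumps(entry, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


def read_entry(directory: Path, key: str) -> Optional[dict]:
    """Return the stored entry, or None when no file exists for this key."""
    path = directory / f"{key}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical entry with the three top-level sections named above.
# The metadata fields and values are illustrative, not llmkit's actual schema.
entry = {
    "_cache_meta": {"doc_id": "doc-001", "text_hash": "abc123", "model": "example-model"},
    "input_text": "Example document text.",
    "extraction": {"amount": 1200, "currency": "BRL"},
}

with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    write_entry(cache_dir, "doc-001_abc123", entry)
    loaded = read_entry(cache_dir, "doc-001_abc123")
    missing = read_entry(cache_dir, "no-such-key")
```

Storing the input text next to the extraction means a reader can open a single file and see both what the model was given and what it returned.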
`schema_name` is **opt-in** so existing caches (procure, EJ) stay valid. When you pass it, `schema_name` is mixed into the key — preventing collisions when two extraction tasks run against the same `(doc_id, text)` on the same model. New tasks should pass it; established tasks can migrate by renaming cache files (the new key is computable from each file's `_cache_meta`).
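The collision argument can be illustrated with a small standalone sketch. The real key derivation inside `LLMCache.key` is not shown in this document, so the function `sketch_key`, its use of SHA-256, the separator, and the 16-character truncation are all assumptions. The only point is that adding `schema_name` to the hashed parts separates two tasks that share the same `(doc_id, text, model)`.

```python
import hashlib
from typing import Optional


def sketch_key(doc_id: str, text_hash: str, model: str,
               schema_name: Optional[str] = None) -> str:
    """Hypothetical key derivation; not llmkit's actual algorithm."""
    parts = [doc_id, text_hash, model]
    if schema_name is not None:
        # Opt-in: only mixed in when provided, so legacy keys are unchanged.
        parts.append(schema_name)
    joined = "\x1f".join(parts)  # unit separator avoids ambiguous concatenation
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def new_key_from_meta(meta: dict, schema_name: str) -> str:
    """Recompute a schema-aware key from stored metadata (assumed field names)."""
    return sketch_key(meta["doc_id"], meta["text_hash"], meta["model"], schema_name)


legacy = sketch_key("doc-001", "abc123", "example-model")
task_a = sketch_key("doc-001", "abc123", "example-model", schema_name="task_a")
task_b = sketch_key("doc-001", "abc123", "example-model", schema_name="task_b")

# Migration sketch: the new key is derived from metadata already on disk.
meta = {"doc_id": "doc-001", "text_hash": "abc123", "model": "example-model"}
migrated = new_key_from_meta(meta, "task_a")
```

Under these assumptions, `task_a` and `task_b` differ from each other and from `legacy`, while `migrated` equals `task_a`, which is what makes a rename-based migration possible without re-running the model.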
##