llmkit

Reference for the llmkit LLM extraction framework — cache design, Pydantic validation, audit workflow, and per-project setup. Use when writing code that does LLM-based structured extraction, setting up new extraction tasks, or working with cached LLM outputs.

Install: claude install-skill hsigstad/research-kit

# llmkit: LLM Extraction Framework

Shared package at `~/research/packages/llmkit/` (`pip install -e`). Provides deterministic, auditable LLM extraction with Pydantic validation and file-backed caching. Designed for academic research where referees need to understand exactly what happened.

## Core API

```python
from llmkit import LLMCache, ExtractionSchema, extract, audit_sample
from llmkit.cache import text_hash, content_hash
from llmkit.extract import ExtractionResult
```

### `LLMCache(directory: Path)`

File-backed cache. Each entry is a JSON file with three sections: `_cache_meta`, `input_text`, `extraction`.

```python
cache = LLMCache(Path("cache_dir"))
key = cache.key(doc_id, text_hash, model)  # composite key (back-compat)
key = cache.key(doc_id, text_hash, model, schema_name="my_task")  # schema-aware key (recommended for new tasks)
hit = cache.get(key)  # by composite key
hit = cache.get_by_doc(doc_id)  # legacy fallback (doc_id.json)
cache.put(key, extraction, doc_id=..., text_hash=..., ...)
entries = cache.iter_entries()
```

`schema_name` is **opt-in** so existing caches (procure, EJ) stay valid. When you pass it, `schema_name` is mixed into the key, preventing collisions when two extraction tasks run against the same `(doc_id, text)` on the same model. New tasks should pass it; established tasks can migrate by renaming cache files (the new key is computable from each file's `_cache_meta`).

##