shrink-vector-storelisted

Shrink an embedding/RAG vector store 4–32× via int8 or binary quantization with a float rescore pass, preserving recall and provenance metadata. Use when a vector store is too large to be laptop-resident, query cost/latency is too high, or embeddings need to be quantized for FAISS/Qdrant/usearch. Do NOT use for embed-time ingestion failures (e.g. a provider 'too many tokens' 400 — that is an upstream chunking bug, not a storage-size problem) and do NOT enable the dark TurboQuant 4-bit path without the gate below.
VictorGjn/agent-skills · ★ 1 · AI & Automation · score 67

Install: claude install-skill VictorGjn/agent-skills

# Shrink Vector Store Reduce a stored embedding index's footprint with **scalar (int8)** or **binary** quantization plus an **asymmetric float rescore**, so recall stays high while the store shrinks 4–32×. Vectors change; provenance metadata does not. ## Prerequisites - `numpy` (only hard dependency). Python 3. - The original float32 vectors (or a representative sample ≥10k) **and** a query set — required to measure recall; this skill cannot validate blind. ## Decision (commit to one) | Goal | Choice | Shrink | Recall | |------|--------|--------|--------| | Safe default | **int8 + float rescore** | ~4× | ≈ float (rescore recovers it) | | Maximum shrink | **binary + int8 rescore** | up to ~32× | ~96–99% with rescore | | Extra trim (optional) | + Matryoshka dim-trim | stacks | depends on model support | Pick **int8+rescore** unless the user explicitly needs maximum shrink. Do not offer both and ask — choose, state which, proceed. ## Workflow 1. **Confirm scope.** Verify it is a storage-size problem, not an ingestion failure (see above). If ingestion, stop and explain. 2. **Locate the float baseline.** You need the original float32 vectors (or a sample ≥10k) and a query set to measure recall against. 3. **Quantize + build index.** Run the reference script: ``` python scripts/quantize_embeddings.py --vectors vecs.npy --queries q.npy \ --precision int8 --rescore --out store/ ``` `--precision {int8,binary}`; `--rescore` keeps a small float/int8 co