Install: claude install-skill VictorGjn/agent-skills
# Shrink Vector Store
Reduce a stored embedding index's footprint with **scalar (int8)** or
**binary** quantization plus an **asymmetric float rescore**, so recall
stays high while the store shrinks 4–32×. Vectors change; provenance
metadata does not.
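As a rough illustration of the int8 path described above, the sketch below quantizes float32 vectors to int8 with one global scale, runs a coarse integer search, then rescores the oversampled candidates asymmetrically (float query against dequantized codes). The function names, the single global scale, and the `oversample` factor are assumptions made for this example; the reference script may do this differently.

```python
import numpy as np

def quantize_int8(vectors, scale=None):
    """Symmetric scalar quantization: float32 -> int8 codes plus one scale."""
    vectors = np.asarray(vectors, dtype=np.float32)
    if scale is None:
        # Assumption: one global scale; per-dimension ranges are also common.
        scale = float(np.abs(vectors).max()) / 127.0 or 1.0
    codes = np.clip(np.round(vectors / scale), -127, 127).astype(np.int8)
    return codes, scale

def search(query, codes, scale, k=10, oversample=4):
    """Coarse int8 pass, then asymmetric rescore with the float query."""
    query = np.asarray(query, dtype=np.float32)
    q_codes, _ = quantize_int8(query, scale)
    # Stage 1: integer dot products, accumulated in int32 to avoid overflow.
    coarse = codes.astype(np.int32) @ q_codes.astype(np.int32)
    n_cand = min(k * oversample, len(codes))
    cand = np.argpartition(-coarse, n_cand - 1)[:n_cand]
    # Stage 2: float query against the dequantized candidates only.
    fine = (codes[cand].astype(np.float32) * scale) @ query
    return cand[np.argsort(-fine)[:k]]

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 64)).astype(np.float32)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)

codes, scale = quantize_int8(vecs)
top = search(vecs[7], codes, scale, k=5)
ratio = vecs.nbytes // codes.nbytes  # float32 is 4 bytes/dim, int8 is 1
```

Only the `k * oversample` candidates are touched in the second stage, so the rescore cost stays small relative to the coarse pass.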
## Prerequisites
- `numpy` (only hard dependency). Python 3.
- The original float32 vectors (or a representative sample of at least 10k
vectors) **and** a query set. Both are required to measure recall; this
skill cannot validate blind.
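To make the recall requirement concrete, here is a minimal sketch of recall@k: the exact top-k neighbours from the float baseline are compared with the top-k returned by a compressed index. The helper names `top_k` and `recall_at_k` are hypothetical, and the "approximate" scores use a crude rounding stand-in rather than the skill's actual quantizer.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest scores in each row (unordered within the k)."""
    return np.argpartition(-scores, k - 1, axis=1)[:, :k]

def recall_at_k(exact_ids, approx_ids):
    """Mean fraction of exact top-k neighbours also returned by the approximate search."""
    hits = [len(set(e) & set(a)) for e, a in zip(exact_ids, approx_ids)]
    return float(np.mean(hits)) / exact_ids.shape[1]

rng = np.random.default_rng(0)
vecs = rng.standard_normal((2000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)

exact = top_k(queries @ vecs.T, k=10)
# Stand-in for a quantized index: coarsely rounded vectors (illustration only).
approx = top_k(queries @ np.round(vecs * 4).T, k=10)

recall = recall_at_k(exact, approx)
```

Without the float baseline there is nothing to compute `exact` from, which is why the skill cannot validate a quantized store on its own.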
## Decision (commit to one)
| Goal | Choice | Shrink | Recall |
|------|--------|--------|--------|
| Safe default | **int8 + float rescore** | ~4× | ≈ float (rescore recovers it) |
| Maximum shrink | **binary + int8 rescore** | up to ~32× | ~96–99% with rescore |
| Extra trim (optional) | + Matryoshka dim-trim | stacks | depends on model support |

Pick **int8+rescore** unless the user explicitly needs maximum shrink.
Do not offer both and ask — choose, state which, proceed.
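The shrink factors in the table follow from bytes per dimension: float32 uses 4 bytes, int8 uses 1 byte, and binary uses 1 bit. The sketch below checks that arithmetic and shows one common way to do binary quantization (sign bits packed with `np.packbits`, compared by Hamming distance). The function names and the sign threshold are assumptions for illustration, not necessarily what the reference script does.

```python
import numpy as np

def quantize_binary(vectors):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    return np.packbits(np.asarray(vectors) > 0, axis=1)

def hamming_distances(query_bits, doc_bits):
    """Number of differing bits between one packed query and each packed row."""
    xor = np.bitwise_xor(doc_bits, query_bits)
    return np.unpackbits(xor, axis=1).sum(axis=1)

dim = 1024
rng = np.random.default_rng(0)
vecs = rng.standard_normal((100, dim)).astype(np.float32)
bits = quantize_binary(vecs)

float_bytes = vecs.nbytes                               # 4 bytes per dimension
int8_bytes = vecs.size * np.dtype(np.int8).itemsize     # 1 byte per dimension
binary_bytes = bits.nbytes                              # 1 bit per dimension

int8_shrink = float_bytes // int8_bytes
binary_shrink = float_bytes // binary_bytes
dists = hamming_distances(bits[3], bits)
```

Any copy kept for rescoring adds to the stored size, which is why the binary row reads "up to" ~32× rather than a flat 32×.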
## Workflow
1. **Confirm scope.** Verify it is a storage-size problem, not an
ingestion failure (see above). If ingestion, stop and explain.
2. **Locate the float baseline.** You need the original float32 vectors
(or a sample ≥10k) and a query set to measure recall against.
3. **Quantize + build index.** Run the reference script:
```
python scripts/quantize_embeddings.py --vectors vecs.npy --queries q.npy \
--precision int8 --rescore --out store/
```
`--precision {int8,binary}`; `--rescore` keeps a small float/int8 co