Install: claude install-skill oaustegard/claude-skills
# bm25
Ranked content search over any text corpus. One CLI, in-memory BM25 index
per process, with a session-local disk cache so repeat invocations against
the same corpus load in tens of milliseconds instead of rebuilding.
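
For readers unfamiliar with BM25, the ranking idea can be sketched in a few lines of plain Python. This is an illustration only, not the skill's implementation (the skill delegates indexing and scoring to `bm25s`). The whitespace tokenizer, the `k1`/`b` defaults, and the function names `tokenize` and `bm25_rank` are assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bm25_rank(docs, query, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which is never negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "csrf middleware validates the token on every post request",
    "session backend stores data in the cache or database",
    "queryset filter returns a new queryset",
]
ranking = bm25_rank(docs, "csrf middleware")
```

Documents that share no terms with the query score zero, which is why a ranked search over a large corpus can skip most files cheaply.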
## Setup
```bash
uv pip install --system --break-system-packages bm25s
```
Installation takes under a second on a warm uv cache. `bm25s` is the only dependency.
## Usage
```bash
BM25=/mnt/skills/user/bm25/scripts/bm25.py
# Local directory
python3 $BM25 ./repo 'csrf middleware'
# Multiple queries against the same in-memory index (build once, query many)
python3 $BM25 ./repo 'csrf middleware' 'session backend' 'queryset filter'
# GitHub repo, fetched as a tarball (one HTTP call)
python3 $BM25 'github.com/django/django' 'atomic transaction'
python3 $BM25 'github.com/django/django@stable/5.0.x' 'atomic transaction'
# Project knowledge or uploads
python3 $BM25 project 'RAG scaling laws'
python3 $BM25 uploads 'tax loss harvesting'
# Filters
python3 $BM25 ./repo 'auth flow' --exclude 'tests/*' --exclude '*/tests/*'
python3 $BM25 ./repo 'config' --include '*.py' --include '*.toml'
# Interactive (REPL — single corpus, many queries)
python3 $BM25 ./repo --interactive
# JSON output for piping
python3 $BM25 ./repo 'auth flow' --json
```
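
The filter examples above pass both `tests/*` and `*/tests/*` to `--exclude`. A rough sketch of why two patterns might be needed, assuming the filters are `fnmatch`-style globs applied to corpus-relative paths (this README does not state the matching rules, and `keep_path` is a hypothetical helper, not part of the skill):

```python
from fnmatch import fnmatchcase


def keep_path(rel_path, include=(), exclude=()):
    """Decide whether a corpus-relative path survives the filters.

    Assumed semantics: any matching exclude pattern drops the file;
    if include patterns are given, at least one of them must match.
    """
    if any(fnmatchcase(rel_path, pat) for pat in exclude):
        return False
    if include:
        return any(fnmatchcase(rel_path, pat) for pat in include)
    return True


# 'tests/*' only matches paths that start with 'tests/' ...
top_level_only = keep_path("app/tests/test_x.py", exclude=["tests/*"])
# ... so nested test directories need the second pattern as well.
both_patterns = keep_path(
    "app/tests/test_x.py", exclude=["tests/*", "*/tests/*"]
)
```

Under these assumed semantics `*` also matches `/`, so `--include '*.py'` would select Python files at any depth.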
## Corpus types
| Spec | Meaning |
|------|---------|
| `./path` or `/abs/path` | Local directory |
| `uploads` | `/mnt/user-data/uploads/` |
| `project` | `/mnt/project/` |
| `github.com/owner/repo[@ref]` | Tarball fetch via GitHub API |
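
The mapping in the table can be sketched as a small resolver. This is a guess at the shape of the logic, not the script's actual code: the `Corpus` dataclass and `resolve_corpus` are hypothetical names, and the use of GitHub's `/repos/{owner}/{repo}/tarball/{ref}` REST endpoint is an assumption based on the "Tarball fetch via GitHub API" description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Corpus:
    kind: str                  # "local" or "github"
    location: str              # directory path, or tarball URL
    ref: Optional[str] = None  # branch, tag, or commit for GitHub corpora


# Aliases taken from the corpus types table.
ALIASES = {
    "uploads": "/mnt/user-data/uploads/",
    "project": "/mnt/project/",
}


def resolve_corpus(spec):
    """Turn a corpus spec string into a Corpus description."""
    if spec in ALIASES:
        return Corpus("local", ALIASES[spec])
    if spec.startswith("github.com/"):
        # Split on the first '@' only, so refs such as 'stable/5.0.x'
        # keep their slashes.
        repo_part, _, ref = spec[len("github.com/"):].partition("@")
        owner, _, repo = repo_part.partition("/")
        if not owner or not repo or "/" in repo:
            raise ValueError(f"expected github.com/owner/repo[@ref], got {spec!r}")
        url = f"https://api.github.com/repos/{owner}/{repo}/tarball"
        if ref:
            url += f"/{ref}"
        return Corpus("github", url, ref or None)
    # Anything else is treated as a local directory path.
    return Corpus("local", spec)


pinned = resolve_corpus("github.com/django/django@stable/5.0.x")
```

Omitting `@ref` leaves the URL without a ref segment, in which case GitHub serves the repository's default branch.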