local-vault

Build and query a local Markdown knowledge base ("vault"). Two functions: (1) CONVERT raw files (PDF, Word/docx, PowerPoint/pptx, Excel/xlsx, csv/tsv, images, html, md/txt, json/yaml/code, audio/video) into clean Markdown with retrieval-friendly frontmatter; local-first (pandoc / python-pptx / openpyxl / pymupdf4llm / whisper), with cloud OCR (MinerU) only as a fallback. (2) ANSWER questions over the resulting vault with retrieval discipline: self-monitor coverage, flag missing/lossy content, and propose Maps-of-Content (MOCs). Triggers: "build/sync my local knowledge base", "convert these files to markdown for AI", "整理我的资料库" ("organize my document library"), "把文件转成 md 给 AI 读" ("convert files to md for AI to read"), "本地知识库" ("local knowledge base"), "读我的本地 vault 回答" ("answer by reading my local vault"), "这个主题我的资料里怎么说" ("what do my materials say about this topic"). Not for: one-off web research, or files that are already in a single doc you can read directly.
genli-ai/market-research-skills · ★ 39 · Data & Documents · score 83

Install: claude install-skill genli-ai/market-research-skills

# local-vault

Turn a folder of raw files into a **Markdown vault** that an LLM can grep, and then answer questions over that vault responsibly.

**Mental model:** `SOURCE` = raw files (source of truth). `VAULT` = one `.md` per source file, carrying retrieval frontmatter (abstract / tags / synonyms) + a `source` backlink. The vault is the layer the LLM reads; the raw files are where the user goes to verify.

There are **two distinct jobs**; figure out which the user wants:

- **A. Convert / sync**: they dropped files in and want them in the vault → run the pipeline (`scripts/sync.py`).
- **B. Retrieve / answer**: they want answers from an existing vault → follow the *Retrieval & feedback protocol* below. Do **not** run the pipeline for this.

---

## A. Convert / sync

### One-time setup (do this for the user if not already done)

1. **Python deps** (user-level, no venv):
   ```
   python3 -m pip install --user requests python-dotenv pypdf pymupdf4llm openpyxl python-pptx
   ```
2. **pandoc** (for docx/rtf/odt/epub): `brew install pandoc` (macOS) / distro pkg.
3. **ffmpeg** (only for audio/video transcription): `brew install ffmpeg` (macOS) / distro pkg. The **whisper engine is auto-selected by platform**: `mlx-whisper` on Apple Silicon (GPU), `faster-whisper` elsewhere (cross-platform CPU/CUDA). It is **auto-installed after the user consents** at the first-run prompt (no manual pip needed). On that first run with audio/video present, the tool shows the mod