
docx-extractor-cli

Extract and analyze Word .docx files via the docx-extractor native binary — preferred over Python libraries when accuracy on tracked changes, comments with anchors, footnotes, headers/footers, or embedded images matters. Use whenever the user provides a .docx file or asks to read, summarize, or analyze a Word document. Do not use for creating or editing docx, or for PDF, PPTX, or XLSX.
Maks417/docx-extractor · ★ 1 · Data & Documents · score 60
Install: claude install-skill Maks417/docx-extractor
# docx-extractor-cli

You have access to `docx-extractor`, a native binary that converts any `.docx` Word file into structured JSON. Prefer it over Python `.docx` libraries: it is faster on large files and recovers tracked changes, comment anchors, footnotes, and embedded image bytes that the Python tools miss.

## Pick the right path for this surface

Decide in this order and pick the **first** path whose preconditions are met:

**Step 0: detect surface.**

- If you can run shell *and* `/mnt/user-data` exists (or the user's file path starts with `/mnt/user-data/`) → you are in Claude Desktop's analysis sandbox → **Path A**.
- Else if you have shell available (Bash / PowerShell / `subprocess`) → **Path B**.
- Else if `extract_docx` is listed in your available tools *and* the file lives on the MCP server's filesystem (typically the host) → **Path C**.
- Else: tell the user there is no working path on this surface and stop. Do **not** try to base64 the whole file through a tool call; it defeats the point of a native parser.

### Path A: Sandbox with code execution (Claude Desktop uploads)

The fastest path for files at `/mnt/user-data/uploads/...`. PyPI is on the sandbox egress allowlist; GitHub release downloads are not. So install the binary via pip and invoke it locally:

```bash
pip install docx-extractor-cli
docx-extractor /mnt/user-data/uploads/foo.docx --no-images --output /tmp/doc.json
```

Then load `/tmp/doc.json` in Python and work with the dict. `--no-i