document-handling

Use when reading/writing docx/pdf/pptx/xlsx files. Prefer programmatic libs over shelling out.
liujiarui0918/claude-code-codex-strongest · ★ 0 · Data & Documents · score 62
Install: claude install-skill liujiarui0918/claude-code-codex-strongest
# Handling Office and PDF documents

`.docx`, `.xlsx`, `.pptx`, and `.pdf` are binary container formats. `Read` will not show you their text directly, and shelling out to `cat` returns garbage. Always use a programmatic library through a short Python (or Node) script.

## When to trigger

- The user asks to read, extract, modify, or generate any of: `.docx`, `.xlsx`, `.xls`, `.pptx`, `.pdf`.
- A workflow expects "the spreadsheet" or "the slides" as input.
- You see a binary Office/PDF path passed as a path parameter.

## Library selection

Pick the smallest library that does the job. Python is usually the right host because the libraries are mature and well-documented.

### `.docx` (Word)

- **Read + write**: `python-docx`. Stable, handles paragraphs, tables, styles, headers.
- Node alternatives (`docx`, `mammoth`) exist but are weaker for round-tripping. Use `python-docx` unless you are already deep in a Node project.
- For complex layout or comments preservation, consider exporting to markdown via `pandoc` first.

### `.xlsx` / `.xls` (Excel)

- **Read + write**: `openpyxl` (xlsx only). Cell-level access, formulas, styles.
- **Bulk read**: `pandas.read_excel` with the `openpyxl` engine. Fastest path when you only need data, not formatting.
- **`.xls` legacy**: `xlrd` (read only, version 1.2.0).
- **Node**: `exceljs` is the closest equivalent to openpyxl.

### `.pdf`

- **Read text**: `pdfplumber` (best layout fidelity) or `pypdf` (simpler, faster for plain text).
- **Read ta