
processing-documents

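Processes PDF, DOCX, XLSX, PPTX, HWP, and HWPX documents, including analysis, summarization, and format conversion. Use for requests such as "문서 분석" (document analysis), "PDF 변환" (PDF conversion), "Excel 추출" (Excel extraction), "문서 요약" (document summarization), "HWP 변환" (HWP conversion), "HWPX 변환" (HWPX conversion), or "한글 문서" (Hangul document), or whenever working with office documents.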
Open330/agt · ★ 1 · Data & Documents · score 64
Install: claude install-skill Open330/agt
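# Document Processor

Analyze, summarize, and convert office documents.

## Supported Formats

| Format | Read | Write | Tools |
|--------|------|-------|-------|
| PDF | ✅ | ✅ | pdfplumber, pypdf |
| DOCX | ✅ | ✅ | python-docx |
| XLSX | ✅ | ✅ | openpyxl |
| PPTX | ✅ | ✅ | python-pptx |
| HWP | ✅ | ❌ | pyhwp (`hwp5`), olefile |
| HWPX | ✅ | ❌ | zipfile, xml.etree |

## Quick Reference

### PDF Text Extraction

```python
import pdfplumber

with pdfplumber.open("doc.pdf") as pdf:
    # extract_text() may return None or "" for pages with no text layer (e.g. scans)
    text = "\n".join(p.extract_text() or "" for p in pdf.pages)
```

### Excel Reading

```python
import openpyxl

# data_only=True returns cached formula results instead of formula strings
# (None for formulas that were never calculated and saved by Excel)
wb = openpyxl.load_workbook("data.xlsx", data_only=True)
ws = wb.active
data = [list(row) for row in ws.iter_rows(values_only=True)]
```

### Word Document

```python
from docx import Document

doc = Document("report.docx")
# Body paragraphs only; text inside tables is reached through doc.tables
text = "\n".join(p.text for p in doc.paragraphs)
```

### HWP Document (Hangul)

```python
# Using pyhwp (hwp5) - primary library
from hwp5.xmlmodel import Hwp5File

hwp = Hwp5File("document.hwp")
for section in hwp.bodytext:
    # Process paragraphs in each section
    pass
```

```python
# Using olefile - low-level OLE access fallback (HWP 5.0 binary format)
import struct
import zlib
import olefile

ole = olefile.OleFileIO("document.hwp")
# FileHeader properties DWORD (byte offset 36), bit 0: body streams are compressed
props = struct.unpack_from("<I", ole.openstream("FileHeader").read(), 36)[0]
# HWP stores text in "BodyText/Section0", "BodyText/Section1", etc. (sort numerically)
streams = sorted((s for s in ole.listdir() if s[0] == "BodyText"),
                 key=lambda s: int(s[1][len("Section"):]))
for stream_path in streams:
    data = ole.openstream(stream_path).read()
    if props & 1:
        data = zlib.decompress(data, -15)  # raw deflate stream, no zlib header
    # `data` is a sequence of tagged records, not plain text, so decoding it whole
    # yields garbage; UTF-16LE paragraph text sits in HWPTAG_PARA_TEXT records (tag 67)
ole.close()
```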
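The section bytes read from each `BodyText/SectionN` stream are a sequence of tagged records rather than plain text, so extracting text means walking those records. The sketch below is one way to do that with only the standard library. It assumes the HWP 5.0 layout as described in Hancom's published format document: a 32-bit little-endian record header packing tag ID (bits 0-9), level (bits 10-19), and size (bits 20-31, where `0xFFF` means the real size follows as a 4-byte integer); `HWPTAG_PARA_TEXT` = 67; and control codes 1-23 other than 10 and 13 each spanning 8 UTF-16 units. The helpers `iter_records`, `para_text_to_str`, and `extract_section_text` are hypothetical names, not part of pyhwp or olefile, and password-protected or distribution documents are not handled.

```python
import struct
import zlib
from typing import Iterator, Tuple

# Assumed from the HWP 5.0 spec: HWPTAG_BEGIN (0x10) + 51
HWPTAG_PARA_TEXT = 67

# Control codes 1-23 except 10 (line break) and 13 (paragraph break) are
# inline/extended controls that occupy 8 WCHARs (16 bytes) in PARA_TEXT.
WIDE_CONTROLS = set(range(1, 24)) - {10, 13}


def iter_records(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (tag_id, level, payload) for each record in a decompressed section."""
    pos = 0
    while pos + 4 <= len(data):
        header = struct.unpack_from("<I", data, pos)[0]
        pos += 4
        tag_id = header & 0x3FF          # bits 0-9
        level = (header >> 10) & 0x3FF   # bits 10-19
        size = header >> 20              # bits 20-31
        if size == 0xFFF:                # real size follows as a DWORD
            size = struct.unpack_from("<I", data, pos)[0]
            pos += 4
        yield tag_id, level, data[pos:pos + size]
        pos += size


def para_text_to_str(payload: bytes) -> str:
    """Decode a PARA_TEXT payload (UTF-16LE), skipping embedded control blocks."""
    buf = bytearray()
    i = 0
    while i + 2 <= len(payload):
        code = struct.unpack_from("<H", payload, i)[0]
        if code in WIDE_CONTROLS:
            if code == 9:                # tab is an inline control; keep it
                buf += "\t".encode("utf-16-le")
            i += 16                      # skip the whole 8-WCHAR control
            continue
        if code in (10, 13):             # line break / paragraph break
            buf += "\n".encode("utf-16-le")
        elif code >= 32:                 # ordinary character (incl. surrogates)
            buf += payload[i:i + 2]
        # other single-WCHAR controls (0, 24-31) are dropped in this sketch
        i += 2
    return buf.decode("utf-16-le", errors="replace")


def extract_section_text(raw: bytes, compressed: bool = True) -> str:
    """Return the plain text of one BodyText/SectionN stream."""
    data = zlib.decompress(raw, -15) if compressed else raw
    return "".join(
        para_text_to_str(payload)
        for tag_id, _level, payload in iter_records(data)
        if tag_id == HWPTAG_PARA_TEXT
    )
```

To use it with olefile, pass each `ole.openstream(stream_path).read()` result to `extract_section_text`, setting `compressed` from bit 0 of the properties field at byte offset 36 of the `FileHeader` stream. Keeping the record walker separate from the text decoder makes it easy to pick out other record tags later.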