docx-advanced-patterns

Advanced python-docx patterns for nested tables, complex cells, and content extraction beyond the .text property. Covers techniques for forms, checklists, and complex layouts.
belumume/claude-skills · ★ 47 · Data & Documents · score 71
Install: claude install-skill belumume/claude-skills
# DOCX Advanced Patterns Skill

Specialized patterns for python-docx that handle complex document structures not covered by basic `.text` extraction.

## When to Use This Skill

Invoke this skill when working with DOCX files that have:

- Nested tables within table cells
- Forms with checkbox options
- Complex multi-row cell layouts
- Checklists with embedded options
- Cell content that doesn't appear with `.text` property

**Use alongside** the official `docx` skill for comprehensive document handling.

## Core Pattern: Nested Table Extraction

### Problem

python-docx's `cell.text` property only extracts direct paragraph text - it **does not** traverse nested tables within cells.

**Symptom:**

```python
cell.text  # Returns: '' or '\n'
# But cell visually contains content!
```

### Detection

Check if a cell contains nested tables:

```python
if cell.tables:
    print(f"Found {len(cell.tables)} nested table(s)")
    # Cell has nested content - need special extraction
```

### Solution (Simple)

```python
def extract_cell_content_with_nested_tables(cell):
    """
    Extract all text from a cell, including text from nested tables.

    Args:
        cell: python-docx _Cell object

    Returns:
        str: Combined text from cell paragraphs and nested tables
    """
    text_parts = []

    # Get direct paragraph text (not inside nested tables)
    for para in cell.paragraphs:
        para_text = para.text.strip()
        if para_text:
            text_parts.append(para_text)