office-to-md
Install: claude install-skill Azistoteles/WebCode
# Office Document to Markdown Converter
Convert various Office document formats to structured Markdown with text, table, and image extraction.
## File Description
- `enhanced_parser.py` - Core document parser
- `doc_converter.py` - DOC to DOCX converter (requires LibreOffice)
- `requirements.txt` - Python dependencies
## Install Dependencies
```bash
pip install -r requirements.txt
```
### Additional Dependencies for DOC Format
The legacy .doc format also requires LibreOffice:
```bash
# Windows: install LibreOffice from the official website
# https://www.libreoffice.org/download/
# Linux
sudo apt install libreoffice
# Mac
brew install --cask libreoffice
```
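`doc_converter.py` handles the DOC to DOCX step, but its interface is not documented here. As a rough illustration only, the sketch below shows the kind of LibreOffice call such a converter typically wraps: `soffice --headless --convert-to docx --outdir <dir> <file>`. The function names `build_soffice_command` and `convert_doc_to_docx` are hypothetical and are not the API of `doc_converter.py`. On Windows, `soffice` is often not on `PATH`, so pass the full path to `soffice.exe` via the `soffice` argument.

```python
import shutil
import subprocess
from pathlib import Path


def build_soffice_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice headless command that converts a .doc file to .docx."""
    return [
        soffice,
        "--headless",              # run without opening the GUI
        "--convert-to", "docx",    # target format; LibreOffice picks the Word filter
        "--outdir", str(out_dir),  # directory that receives the converted file
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir=".", soffice=None):
    """Hypothetical helper: convert doc_path with LibreOffice, return the .docx path."""
    # Use an explicit binary if given, otherwise look LibreOffice up on PATH
    soffice = soffice or shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the conversion fails
    subprocess.run(build_soffice_command(doc_path, out_dir, soffice), check=True)
    # LibreOffice keeps the input file's base name and swaps the extension
    return out_dir / (Path(doc_path).stem + ".docx")
```

The resulting `.docx` can then be passed to the parser like any other Word file.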
## Quick Start
### Python Code
```python
from enhanced_parser import EnhancedDocumentParser

# Initialize parser
parser = EnhancedDocumentParser(
    image_base_url="http://localhost:5000",
    image_save_dir="./static/images",
    filter_headers_footers=True,  # Filter headers and footers
)

# Parse document
result = parser.parse_document("document.docx")
if result["success"]:
    print(result["markdown"])
    print(f"Extracted {result['images_count']} images")
```
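To convert a whole folder, the single-file call can be wrapped in a loop. This is a minimal sketch that assumes only what the example shows: `parse_document()` takes a file path and returns a dict with `success` and `markdown` keys. `convert_folder` is a hypothetical helper, not part of `enhanced_parser.py`; it accepts any object with a `parse_document()` method, so it can be exercised without the real parser.

```python
from pathlib import Path


def convert_folder(parser, src_dir, out_dir, pattern="*.docx"):
    """Parse every file in src_dir matching pattern and write <name>.md into out_dir.

    Returns two lists of source paths: (converted, failed).
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for path in sorted(src_dir.glob(pattern)):
        result = parser.parse_document(str(path))
        if result.get("success"):
            # Write the Markdown next to its siblings, keeping the source base name
            md_path = out_dir / (path.stem + ".md")
            md_path.write_text(result["markdown"], encoding="utf-8")
            converted.append(path)
        else:
            # Keep going; report failures to the caller instead of raising
            failed.append(path)
    return converted, failed
```

With the parser configured as above, `convert_folder(parser, "./docs", "./markdown")` would write `./markdown/<name>.md` for each `.docx` in `./docs`.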
### Start API Service
```bash
# Start the service with app.py from the project root
python app.py
# Visit http://localhost:5000/analyzer to upload files
```
## Supported Formats
| Format | Extensions | Notes |
|--------|-----------|-------|
| Word | .docx, .doc | .doc requires LibreOffice |
| Excel | .xlsx, .xls | Supports multiple worksheets and date formats |
| PowerPo