mineru

MinerU document parsing API - convert PDF/DOC/PPT/images to Markdown/JSON. Supports OCR, formula recognition, table extraction, and batch processing.
LeoLin990405/grimoire-skill · ★ 6 · Data & Documents · score 74
Install: claude install-skill LeoLin990405/grimoire-skill
# MinerU API Skill

## Overview

MinerU converts PDF, DOC, DOCX, PPT, PPTX, PNG, JPG, JPEG, HTML into machine-readable Markdown/JSON. Supports OCR (109 languages), formula/table recognition, cross-page table merging, and batch processing.

**Two modes:**

- **Cloud API**: `https://mineru.net/api/v4` (no GPU required, token-based)
- **Local API**: `mineru-api --port 8000` (self-hosted, requires GPU or CPU backend)

## Authentication (Cloud API)

- **Token file**: `~/.config/mineru/token`
- **Header**: `Authorization: Bearer <token>`
- **Get token**: https://mineru.net/apiManage/token

```bash
mkdir -p ~/.config/mineru
echo "YOUR_TOKEN" > ~/.config/mineru/token
chmod 600 ~/.config/mineru/token
```

## Limits (Cloud API)

| Item | Limit |
|------|-------|
| Single file size | 200MB max |
| Single file pages | 600 pages max |
| Daily priority pages | 2000 pages/account |
| Batch upload | 200 files/request |
| Token validity | 90 days |

## Model Versions

| Model | Use Case | Speed | Notes |
|-------|----------|-------|-------|
| `hybrid` | **Default since v2.7.0**: best of pipeline + vlm | Medium | Recommended for most use |
| `pipeline` | General documents, CPU-friendly | Fast | Pure CPU support |
| `vlm` | Complex layouts, higher accuracy | Slower | Needs GPU (10GB+ VRAM) |
| `MinerU-HTML` | HTML output, preserves formatting | Medium | For web content |

## API Endpoints (Cloud)

Base URL: `https://mineru.net/api/v4`

### 1. Create Extraction Task (Single File)

```
POST /extr