markdrop

Professional AI skill and usage instructions for the Markdrop package, a Python tool for converting PDFs to Markdown/HTML with AI-powered image/table descriptions.
shoryasethia/markdrop · ★ 204 · Data & Documents · score 80

Install: claude install-skill shoryasethia/markdrop

# Markdrop Skill

Welcome to the `markdrop` skill. `markdrop` is a Python package and CLI tool that converts PDF documents into structured Markdown and interactive HTML, and uses AI vision models to interpret and describe extracted images and tables.

If you are an AI agent or a user who needs to process PDFs and augment them with text or image descriptions, this document is your complete guide to using `markdrop` efficiently and accurately.

## 1. Capabilities

- **PDF to Markdown/HTML**: Retains formatting, extracts images, and detects tables via Microsoft Table Transformer and Docling. Supports both local file paths and direct PDF URLs.
- **AI Vision Descriptions**: Query GEMINI, OPENAI, ANTHROPIC, GROQ, OPENROUTER, or LITELLM to generate rich descriptions of images and tables.
- **Batch Processing**: Describe entire directories of images in a single command, using multiple LLM backends simultaneously.
- **Extensible Configuration**: Precise override control over which structural text models and vision models are used, as well as prompts, resolution scales, and output features.

## 2. API Keys Setup

Before using AI features, API keys must be available in the root `.env` file or as environment variables. If deploying programmatically, you can either run the built-in CLI command or inject the keys into `os.environ`:

```bash
markdrop setup gemini     # -> GEMINI_API_KEY
markdrop setup openai     # -> OPENAI_API_KEY
markdrop setup anthropic  #