smolvlm

Local vision-language model for image analysis using SmolVLM-2B
tdimino/claude-code-minoan · ★ 32 · AI & Automation · score 85

Install: claude install-skill tdimino/claude-code-minoan

# SmolVLM - Local Image Analysis

Analyze images locally using SmolVLM-2B, a state-of-the-art compact vision-language model optimized for Apple Silicon via mlx-vlm.

## Quick Usage

### Describe an Image

```bash
python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png
```

### Ask a Question About an Image

```bash
python ~/.claude/skills/smolvlm/scripts/view_image.py /path/to/image.png "What text is visible?"
```

### Specific Tasks

```bash
# Extract text (OCR)
python ~/.claude/skills/smolvlm/scripts/view_image.py screenshot.png "Extract all text"

# UI analysis
python ~/.claude/skills/smolvlm/scripts/view_image.py ui.png "Describe the UI elements"

# Detailed description
python ~/.claude/skills/smolvlm/scripts/view_image.py photo.jpg --detailed
```

## Effective Prompts

### General Description

- `"Describe this image"` - Basic description
- `"Describe this image in detail, including colors, composition, and any text"` - Comprehensive

### Text Extraction (OCR)

- `"Extract all visible text from this image"`
- `"What text appears in this screenshot?"`
- `"Read the text in this document"`

### UI/Screenshot Analysis

- `"Describe the user interface elements"`
- `"What buttons and controls are visible?"`
- `"Identify the application and its current state"`

### Visual Question Answering

- `"How many [objects] are in this image?"`
- `"What color is the [object]?"`
- `"Is there a [object] in this image?"`

### Code/Technical

- `"What programming language is shown?"
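The Quick Usage commands analyze one image per call. To apply the same prompt to a whole folder of screenshots, a small shell wrapper around that documented CLI is enough. This is a minimal sketch, not part of the skill: `analyze_dir` and the `SMOLVLM_SCRIPT` override are hypothetical names, and it assumes only what the examples above show, namely that `view_image.py` takes an image path followed by an optional prompt.

```bash
# Hypothetical helper (not shipped with the skill): run one prompt over every
# PNG/JPEG in a directory using the documented view_image.py CLI.
# SMOLVLM_SCRIPT is an assumed override; it defaults to the documented path.
# The body runs in a subshell ( ... ) so shopt changes and `exit` stay local.
analyze_dir() (
  dir="${1:?usage: analyze_dir DIR [PROMPT]}"
  prompt="${2:-Describe this image}"
  script="${SMOLVLM_SCRIPT:-$HOME/.claude/skills/smolvlm/scripts/view_image.py}"

  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; exit 1; }

  # nullglob: patterns with no matches expand to nothing
  # nocaseglob: *.png also matches .PNG, *.jpg also matches .JPG
  shopt -s nullglob nocaseglob
  for img in "$dir"/*.png "$dir"/*.jpg "$dir"/*.jpeg; do
    echo "== $img"
    # Report a failed image and continue instead of aborting the batch
    python "$script" "$img" "$prompt" || echo "failed: $img" >&2
  done
)
```

For example, `analyze_dir ~/Desktop/screenshots "Extract all visible text from this image"` prints a `== path` header before each answer. Each call reloads the model, so this favors simplicity over throughput.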