Install: claude install-skill rvanbaalen/skills
# OCR Document Processor
Extract text from images, scanned PDFs, and photographs using Optical Character Recognition (OCR). Supports multiple languages, structured output formats, and intelligent document parsing.
## Core Capabilities
- **Image OCR**: Extract text from PNG, JPEG, TIFF, and BMP images
- **PDF OCR**: Process scanned PDFs page by page
- **Multi-language**: Supports 100+ languages
- **Structured Output**: Plain text, Markdown, JSON, or HTML
- **Table Detection**: Extract tabular data to CSV/JSON
- **Batch Processing**: Process multiple documents at once
- **Quality Assessment**: Confidence scoring for OCR results
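Batch processing starts with gathering the files to process. The following is a minimal sketch of that step, not the skill's own implementation: `find_ocr_inputs` and `SUPPORTED_EXTENSIONS` are hypothetical names, and the extension set is only inferred from the formats listed above.

```python
from pathlib import Path

# Extensions inferred from the capability list; the exact set the skill
# accepts is an assumption.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".pdf"}


def find_ocr_inputs(directory):
    """Return supported files under `directory`, sorted for a stable order."""
    root = Path(directory)
    return sorted(
        path
        for path in root.rglob("*")  # walk subdirectories as well
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Each returned path could then be handed to the processor one at a time, for example as `OCRProcessor(str(path))`.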
## Quick Start
```python
from scripts.ocr_processor import OCRProcessor

# Simple text extraction
processor = OCRProcessor("document.png")
text = processor.extract_text()
print(text)

# Extract to structured format
result = processor.extract_structured()
print(result['text'])
print(result['confidence'])
print(result['blocks'])  # Text blocks with positions
```
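One way to act on the confidence information is to separate blocks that can be trusted from blocks that need a manual check. The sketch below assumes each entry in `result['blocks']` is a dict with `text` and `confidence` keys and that confidence runs from 0 to 1. None of that is confirmed by this document, so adjust the keys and threshold to the actual output. `split_by_confidence` is a hypothetical helper.

```python
def split_by_confidence(blocks, threshold=0.8):
    """Partition blocks into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for block in blocks:
        # Assumption: a block without a score is treated as unreliable.
        score = block.get("confidence", 0.0)
        (accepted if score >= threshold else needs_review).append(block)
    return accepted, needs_review


# Hypothetical sample data, shaped like the assumed result['blocks'].
sample_blocks = [
    {"text": "Invoice #1042", "confidence": 0.97},
    {"text": "T0tal: 1,2S0.00", "confidence": 0.54},
    {"text": "Thank you"},
]

accepted, needs_review = split_by_confidence(sample_blocks)
print([b["text"] for b in accepted])
print([b["text"] for b in needs_review])
```

Keeping the threshold as a parameter makes it easy to be stricter for documents such as invoices, where a single misread digit matters.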
## Core Workflow
### 1. Basic Text Extraction
```python
from scripts.ocr_processor import OCRProcessor

# From image
processor = OCRProcessor("scan.png")
text = processor.extract_text()

# From PDF
processor = OCRProcessor("scanned.pdf")
text = processor.extract_text()  # All pages

# Specific pages
text = processor.extract_text(pages=[1, 2, 3])
```
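When page selections come from user input, it can help to convert a compact spec such as `"1-3,5"` into the list that `pages=` expects. This is a sketch only: `parse_page_spec` is a hypothetical helper, not part of the skill, and it assumes page numbers start at 1, which the example above suggests but does not state.

```python
def parse_page_spec(spec):
    """Turn a spec such as "1-3,5" into a sorted list of unique page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    # Assumption: pages are 1-based, so zero is rejected.
    if any(page < 1 for page in pages):
        raise ValueError("page numbers are assumed to start at 1")
    return sorted(pages)
```

A call such as `processor.extract_text(pages=parse_page_spec("1-3,5"))` would then request pages 1, 2, 3, and 5.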
### 2. Structured Extraction
```python
# Get detailed results
result = processor.extract_structured()
# Result contains:
# - text: Full extra