structured-output-extraction

Builds a reliable LLM-powered extraction pipeline that turns messy inputs (PDFs, emails, transcripts, HTML) into a strict JSON schema, with validation, an automated correction loop, and observability. Use when designing an extraction pipeline for unstructured documents or hardening one that fails too often.
hotak92/vibecoded-orchestrator · ★ 3 · Data & Documents · score 72
Install: claude install-skill hotak92/vibecoded-orchestrator
# Structured Output Extraction (Opus)

**Purpose**: Design a pipeline that turns messy unstructured input (PDFs, emails, transcripts, HTML, customer-support tickets) into validated structured data matching a strict schema. Covers schema design, prompt structure, validate-correct loop, fallback strategies, observability, and cost/latency budgeting.

**Model**: Opus 4.7

## When to invoke autonomously

Invoke when:

1. **New extraction task**: "Pull line items from invoice PDFs", "Extract action items from meeting transcripts", "Parse resumes into structured candidate records".
2. **Hardening an existing pipeline**: "Our extraction breaks on 20% of inputs; fix it."
3. **Schema design**: "What's the right JSON schema for this extraction task?"
4. **Cost reduction**: "Extraction costs are blowing the budget. How do we shrink them?"

**Don't invoke for**:

- Structured-to-structured transformation (just code it).
- Strict OCR-only tasks (use a vision model directly, no schema design needed).
- Agentic tool-use workflows (use `@ai-agentic-architect` or the function-calling reliability KG node).

## Usage

```
/structured-output-extraction design for [task] with [input type]
/structured-output-extraction audit [path to existing extractor]
/structured-output-extraction harden [failure mode]
```

## The pipeline

```
Input → Preprocess → Prompt → LLM → Parse → Validate → [Correct]* → Persist │