harvest-structured

Solid

Structured data extraction - tables, pricing, products, API endpoints with schema

AI & Automation 496 stars 41 forks Updated 1 month ago MIT

Quality Score: 87/100

Stars (20%): 90
Recency (20%): 75
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Harvest Structured

Extract structured data from web pages using user-defined schemas. Turns messy HTML into clean JSON/CSV - pricing tables, product listings, API endpoint docs, comparison matrices.

## Usage

```
/scrape <url> --schema "<field descriptions>"
```

## Examples

```bash
# Extract pricing data
/scrape https://example.com/pricing --schema "plan_name, price, features[], cta_text"

# Extract product listings
/scrape https://store.example.com/products --schema "name, price, rating, reviews_count, image_url"

# Extract API endpoints
/scrape https://docs.api.com/reference --schema "method, path, description, parameters[], response_code"
```

## Schema Definition

Define fields as comma-separated names. Use `[]` for arrays:

```
name          → Single text value
price         → Single value (auto-detects currency)
features[]    → Array of items
description   → Long text
url           → Auto-detects links
image_url     → Auto-detects image sources
```

## How It Works

1. Fetch page content
2. Parse schema definition
3. Use CSS selectors or LLM extraction to match fields
4. Validate extracted data against schema
5. Output as JSON (default) or CSV

## Output Format

### JSON (default)

```json
[
  {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing"
  }
]
```

### CSV

```csv
plan_name,price,features,source_url
Pro,"$29/mo","Unlimited project...
```
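As a rough illustration of steps 2, 4, and 5 under How It Works, the following Python sketch parses a schema string, checks a record against it, and serializes the result. It is not the skill's implementation: `Field`, `parse_schema`, `validate`, and `to_csv` are hypothetical names, and joining array values with `; ` in CSV is an assumption, since the CSV sample in the source is truncated.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One schema field (hypothetical type, not part of the skill)."""
    name: str
    is_array: bool


def parse_schema(schema: str) -> list[Field]:
    """Split a comma-separated schema string; a trailing [] marks an array field."""
    fields = []
    for raw in schema.split(","):
        token = raw.strip()
        if not token:
            continue
        is_array = token.endswith("[]")
        fields.append(Field(token[:-2] if is_array else token, is_array))
    return fields


def validate(record: dict, fields: list[Field]) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema.

    Keys not named in the schema (such as source_url) are ignored.
    """
    problems = []
    for field in fields:
        if field.name not in record:
            problems.append(f"missing field: {field.name}")
        elif field.is_array != isinstance(record[field.name], list):
            expected = "a list" if field.is_array else "a single value"
            problems.append(f"{field.name}: expected {expected}")
    return problems


def to_csv(records: list[dict], fields: list[Field]) -> str:
    """Serialize records to CSV; array values are joined with '; ' (an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f.name for f in fields])
    for record in records:
        row = []
        for f in fields:
            value = record.get(f.name, "")
            row.append("; ".join(map(str, value)) if f.is_array else value)
        writer.writerow(row)
    return buffer.getvalue()


# Example using the pricing record shown in the JSON output above.
fields = parse_schema("plan_name, price, features[]")
record = {
    "plan_name": "Pro",
    "price": "$29/mo",
    "features": ["Unlimited projects", "Priority support", "API access"],
    "source_url": "https://example.com/pricing",
}
if not validate(record, fields):
    print(json.dumps([record], indent=2))
    print(to_csv([record], fields))
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatched field at once, which is useful when extraction results are only partially correct.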

Details

Author
vibeeval
Repository
vibeeval/vibecosystem
Created
2 months ago
Last Updated
1 month ago
Language
C#
License
MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

harvest-single

Single page smart extraction - articles, docs, blog posts to clean markdown

496 Updated 1 month ago
vibeeval
AI & Automation Featured

web-scraper

Intelligent multi-strategy web scraping. Extracts structured data from web pages (tables, lists, prices). Pagination, monitoring, and CSV/JSON export.

39,350 Updated today
sickn33
AI & Automation Listed

ghost-scraper

Extracts structured data from websites — static HTML, JavaScript-rendered SPAs, paginated listings, and API-backed pages. Handles anti-bot detection awareness, rate limiting, and robots.txt compliance. Use this skill whenever the user wants to scrape a website, extract data from a URL, pull product listings, harvest structured data, reverse-engineer a site's API, or deal with dynamic JS-rendered content. Also triggers on "get me data from this site," "extract prices from," "crawl these pages," or any request involving web data extraction, even casual ones like "can you pull info from this URL."

1 Updated 4 days ago
mturac
AI & Automation Listed

web-scraper

Generate browser console scripts to scrape paginated websites. Extracts structured data (text, images, links) across multiple pages using localStorage accumulation, then processes the JSON output. Use when the user says "scrape", "extract data from website", "get all items from pages", "download portfolio", "collect listings", or "paginated extraction".

2 Updated today
jqaisystems
AI & Automation Solid

harvest-competitive

Competitive intelligence - extract features, pricing, tech stack from competitor sites

496 Updated 1 month ago
vibeeval