harvest-single

Solid

Single page smart extraction - articles, docs, blog posts to clean markdown

AI & Automation 496 stars 41 forks Updated 1 month ago MIT

Quality Score: 87/100

Stars (weight 20%): 90
Recency (weight 20%): 75
Frontmatter (weight 20%): 70
Documentation (weight 15%): 66
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Harvest Single Page

Extract and clean content from a single web page. Auto-detects content type (article, documentation, API reference, blog post) and produces clean, structured markdown.

## Usage

```
/harvest <url>
```

## Examples

```bash
# Extract a blog post
/harvest https://blog.example.com/best-practices-2024

# Extract API documentation page
/harvest https://docs.stripe.com/api/charges

# Extract a GitHub README
/harvest https://github.com/owner/repo
```

## How It Works

1. Fetch URL content via WebFetch or crawl4ai
2. Detect content type (article, docs, API ref, blog, wiki)
3. Extract main content, strip navigation/ads/footers
4. Preserve code blocks, tables, images
5. Add metadata header (source, date, word count)
6. Save to `.claude/cache/agents/harvest/`

## Output Format

```markdown
# [Page Title]

> Source: [URL]
> Extracted: [timestamp]
> Type: [article|docs|api|blog|wiki]
> Words: [count]

[Clean extracted content in markdown]

## Links Found
- [Link text](URL)
```

## Fallback Chain

1. crawl4ai Docker (port 11235) - preferred
2. WebFetch tool - built-in fallback
3. curl + html2text - last resort

## When to Use

- Quick grab of a single page's content
- Extracting a specific doc page for reference
- Saving an article for later analysis
- Getting clean markdown from messy HTML
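
The Fallback Chain and the metadata header from Output Format could be sketched roughly as follows. This is only an illustration, not the skill's actual implementation: `fetch_with_fallback`, `build_document`, and the `Fetcher` type are hypothetical names, and the real fetchers (crawl4ai, WebFetch, curl + html2text) are represented by plain callables that the caller supplies.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: any callable that takes a URL and returns page text.
Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str, fetchers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, str]:
    """Try each (name, fetcher) pair in order.

    Returns (name, content) from the first fetcher that returns
    non-empty text; raises RuntimeError if every fetcher fails.
    """
    errors = []
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception as exc:  # any failure moves on to the next fetcher
            errors.append(f"{name}: {exc}")
            continue
        if content.strip():
            return name, content
        errors.append(f"{name}: empty response")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))


def build_document(
    title: str,
    url: str,
    content_type: str,
    body: str,
    now: Optional[datetime] = None,
) -> str:
    """Prepend the metadata header (source, timestamp, type, word count)."""
    now = now or datetime.now(timezone.utc)
    word_count = len(body.split())  # whitespace-separated word count
    return "\n".join(
        [
            f"# {title}",
            "",
            f"> Source: {url}",
            f"> Extracted: {now.isoformat(timespec='seconds')}",
            f"> Type: {content_type}",
            f"> Words: {word_count}",
            "",
            body,
        ]
    )
```

A caller would pass the fetchers in preference order, for example `[("crawl4ai", fetch_a), ("webfetch", fetch_b), ("curl", fetch_c)]`, where each function is a stand-in for however that tool is actually invoked. Keeping the chain as data means the order can change without touching the loop.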

Details

Author: vibeeval
Repository: vibeeval/vibecosystem
Created: 2 months ago
Last Updated: 1 month ago
Language: C#
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

harvest-deep-crawl

Multi-page deep crawling - documentation sites, wikis, knowledge bases

496 Updated 1 month ago
vibeeval
AI & Automation Listed

doc-harvester

Mirror an external platform's documentation site into the current project as local markdown files, so the coding agent can cross-check the official docs offline while building. Use this whenever a documentation or wiki URL appears together with a request to study, learn, fetch, download, mirror, save, scrape, or "vendor" the docs — including phrasings like "изучи документацию", "скачай документацию по ссылке", "собери вики", "study these docs", "pull the API docs into the repo", "I'm integrating with X, get its docs". Use it even if the user just pastes a docs link and says "learn this" without naming a tool, and even if they only give a deep link to one page. Prefer this skill over ad-hoc WebFetch whenever the goal is to capture a whole documentation set rather than read a single page.

0 Updated 1 week ago
arturayupov
AI & Automation Solid

harvest-structured

Structured data extraction - tables, pricing, products, API endpoints with schema

496 Updated 1 month ago
vibeeval
Data & Documents Listed

web-fetch

Fetches web content with intelligent content extraction, converting HTML to clean markdown. Use for documentation, articles, and reference pages at http/https URLs.

335 Updated today
aiskillstore
AI & Automation Solid

harvest-adaptive

Adaptive content summarization - auto-detect content type and produce relevant summary

496 Updated 1 month ago
vibeeval