data-scraper-agent

Solid

Build a fully automated, AI-powered data collection agent for any public source: job boards, prices, news, GitHub, sports, and more. It scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion, Sheets, or Supabase, and learns from user feedback. It runs at no cost on GitHub Actions (free for public repositories). Use when the user wants to monitor, collect, or track any public data automatically.

AI & Automation · 201,447 stars · 30,903 forks · Updated yesterday · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Data Scraper Agent

Build a production-ready, AI-powered data collection agent for any public data source. Runs on a schedule, enriches results with a free LLM, stores to a database, and improves over time.

**Stack: Python · Gemini Flash (free) · GitHub Actions (free) · Notion / Sheets / Supabase**

## When to Activate

- User wants to scrape or monitor any public website or API
- User says "build a bot that checks...", "monitor X for me", "collect data from..."
- User wants to track jobs, prices, news, repos, sports scores, events, listings
- User asks how to automate data collection without paying for hosting
- User wants an agent that gets smarter over time based on their decisions

## Core Concepts

### The Three Layers

Every data scraper agent has three layers:

```
COLLECT  →  ENRICH        →  STORE
   │           │               │
Scraper     AI (LLM)        Database
runs on     scores/         Notion /
schedule    summarises      Sheets /
            & classifies    Supabase
```

### Free Stack

| Layer | Tool | Why |
|---|---|---|
| **Scraping** | `requests` + `BeautifulSoup` | No cost, covers 80% of public sites |
| **JS-rendered sites** | `playwright` (free) | When HTML scraping fails |
| **AI enrichment** | Gemini Flash via REST API | Free: 500 req/day, 1M tokens/day |
| **Storage** | Notion API | Free tier, great UI for review |
| **Schedule** | GitHub Actions cron | Free for public repos |
| **Learning** | JSON feedback file in repo | Zero infra, persists in git |

### AI Model Fallback Chain

Buil...
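As a rough illustration of the collect → enrich → store layering described under "The Three Layers", the following minimal Python sketch wires three pluggable functions into one run. It is not taken from the skill itself: the `Item` fields, `run_pipeline`, and the three `*_stub` functions are hypothetical names chosen for this example, and the keyword-based score is only a stand-in for an LLM call so that the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Item:
    """One collected record; field names are illustrative, not from the skill."""
    title: str
    url: str
    score: Optional[int] = None  # filled in by the ENRICH layer
    summary: str = ""


def run_pipeline(
    collect: Callable[[], Iterable[Item]],
    enrich: Callable[[Item], Item],
    store: Callable[[List[Item]], None],
) -> int:
    """Run COLLECT -> ENRICH -> STORE once; return how many items were stored."""
    enriched = [enrich(item) for item in collect()]
    store(enriched)
    return len(enriched)


# --- Hypothetical stand-ins so the sketch runs without network access ---

def collect_stub() -> List[Item]:
    # A real collector would fetch and parse pages (e.g. requests + BeautifulSoup).
    return [
        Item("Senior Python Engineer", "https://example.com/jobs/1"),
        Item("Office Manager", "https://example.com/jobs/2"),
    ]


def enrich_stub(item: Item) -> Item:
    # Placeholder for an LLM call: a keyword match stands in for model scoring.
    item.score = 90 if "python" in item.title.lower() else 10
    item.summary = f"{item.title} ({item.url})"
    return item


stored: List[Item] = []


def store_stub(items: List[Item]) -> None:
    # A real store would write to Notion, Sheets, or Supabase instead of a list.
    stored.extend(items)


count = run_pipeline(collect_stub, enrich_stub, store_stub)
print(count, [(item.title, item.score) for item in stored])
```

Keeping each layer behind a plain callable means a scheduled job only has to call `run_pipeline` once per run, and any single layer (for example the storage backend) can be swapped without touching the other two.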

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: yesterday
Language: JavaScript
License: MIT

Integrates with

Anthropic (AI) · Supabase (Cloud) · GitHub (DevTools) · Playwright (Testing) · Notion (Productivity) · REST API (API)

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

ghost-scraper

Extracts structured data from websites — static HTML, JavaScript-rendered SPAs, paginated listings, and API-backed pages. Handles anti-bot detection awareness, rate limiting, and robots.txt compliance. Use this skill whenever the user wants to scrape a website, extract data from a URL, pull product listings, harvest structured data, reverse-engineer a site's API, or deal with dynamic JS-rendered content. Also triggers on "get me data from this site," "extract prices from," "crawl these pages," or any request involving web data extraction, even casual ones like "can you pull info from this URL."

1 Updated 4 days ago

mturac

AI & Automation Listed

agentic-browser-automation

Build AI-powered browser agents that autonomously navigate, scrape, fill forms, and extract data from dynamic websites using LLM reasoning and Playwright.

2 Updated 2 days ago

mouadja02

AI & Automation Listed

web-scraper

Generate browser console scripts to scrape paginated websites. Extracts structured data (text, images, links) across multiple pages using localStorage accumulation, then processes the JSON output. Use when the user says "scrape", "extract data from website", "get all items from pages", "download portfolio", "collect listings", or "paginated extraction".

2 Updated today

jqaisystems

AI & Automation Featured

daily-news-report

Scrapes content based on a preset URL list, filters high-quality technical information, and generates daily Markdown reports.

39,350 Updated today

sickn33

AI & Automation Solid

scrapling

Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.

175,435 Updated today

NousResearch