fetching-blocked-urls

Solid

Retrieve clean markdown from URLs when web_fetch fails. Converts pages via the Jina AI reader service with automatic retry. Use when web_fetch or curl hits a 403, a block, a paywall, a timeout, a JavaScript-rendering error, or empty content, or when the user explicitly suggests using Jina.

Data & Documents 134 stars 7 forks Updated yesterday MIT

Quality Score: 82/100

Score components (weights): Stars 20%, Recency 20%, Frontmatter 20%, Documentation 15%, Issue Health 10%, License 10%, Description 5%

Skill Content

# Fetching Blocked URLs

Retrieve readable content from URLs that web_fetch cannot access. Jina AI's reader service renders JavaScript, bypasses soft blocks, and returns clean markdown.

## Activation Triggers

Invoke this skill immediately when web_fetch returns:

- 403 Forbidden or access denied
- Paywall or login wall indicators
- Empty, garbled, or truncated content
- JavaScript-heavy SPA failures
- Timeout errors

## Core Command

```bash
curl -s --max-time 30 "https://r.jina.ai/TARGET_URL"
```

The service returns markdown with page title, body text, and preserved links.

## Retry Pattern

Jina's backend has ~10% intermittent failures. Use retry logic to achieve 99%+ success:

```bash
for attempt in 1 2 3; do
  result=$(curl -s --max-time 30 "https://r.jina.ai/TARGET_URL" 2>&1)
  echo "$result" | grep -q "upstream connect error" || { echo "$result"; break; }
  [ "$attempt" -lt 3 ] && sleep 1
done
```

## Workflow Integration

1. **Primary**: Use web_fetch (native tool)
2. **Fallback**: This skill with retry when web_fetch fails
3. **Escalate**: Request user assistance only after retry exhaustion

Attempt this fallback before asking users to copy-paste content manually.

## Output Format

Jina returns structured markdown:

- `Title:` page title
- `URL Source:` original URL
- `Markdown Content:` extracted body text, links preserved

## Limitations

- Long pages may truncate
- Sites blocking all scrapers remain inaccessible
- Login-required content limited to public portions

-...
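The retry pattern and output format described earlier in this skill could be wrapped in a pair of reusable shell functions. The following is a minimal sketch, not part of the original skill: the function names `jina_fetch` and `jina_field` are hypothetical, and treating an empty response as a failed attempt is an assumption added here so that exhausted retries can be detected and escalated to the user.

```bash
#!/usr/bin/env bash
# Sketch only: jina_fetch and jina_field are hypothetical helper names.

# Fetch a URL through the Jina reader, retrying on the transient
# "upstream connect error" response. Prints the markdown and returns 0
# on success; returns 1 once every attempt has failed.
jina_fetch() {
  local url="$1" max_attempts="${2:-3}" attempt=1 result
  while [ "$attempt" -le "$max_attempts" ]; do
    result=$(curl -s --max-time 30 "https://r.jina.ai/${url}" 2>&1)
    # Assumption: an empty body also counts as a failed attempt.
    if [ -n "$result" ] && ! printf '%s\n' "$result" | grep -q "upstream connect error"; then
      printf '%s\n' "$result"
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep 1
    fi
    attempt=$((attempt + 1))
  done
  echo "jina_fetch: all ${max_attempts} attempts failed for ${url}" >&2
  return 1
}

# Read Jina output on stdin and print the value of the first header
# line with the given name, for example "Title" or "URL Source".
jina_field() {
  local name="$1"
  sed -n "s/^${name}: //p" | head -n 1
}
```

With these definitions, `jina_fetch "https://example.com" | jina_field "Title"` would print only the page title, and a non-zero exit status from `jina_fetch` marks the point at which asking the user for help becomes appropriate.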

Details

Author: oaustegard
Repository: oaustegard/claude-skills
Created: 9 months ago
Last Updated: yesterday
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Solid

fetch-url-as-markdown

Fetch a web page (URL) and return clean Markdown via local trafilatura, with Exa MCP as a fallback for JS-rendered or anti-bot pages. Use when the user asks to read, fetch, scrape, summarize, or quote a URL — prefer this over the built-in WebFetch tool. Don't use for binary files (PDFs, images, archives) or for fetching API/JSON endpoints.

110 Updated today

CodeAlive-AI

Web & Frontend Listed

web-research-cascade

A robust web-fetch cascade so that blocked primary sources still get read instead of being silently skipped. Use this skill whenever a URL is fetched, a page needs to be researched, or WebFetch/WebSearch returns a 403/401/429/"unable to fetch"/empty body/CAPTCHA. Also use it for any analysis or comparison research where primary sources matter (not just secondary blogs), and on "read this page", "what does X say", "research Y", "look at this". Cascade: WebFetch, then Jina Reader, then a Firecrawl script (sparingly), then Chrome (browser MCP). Also covers Reddit (threads, subreddits, search) and single X/Twitter tweets, which need their own route instead of the generic cascade. Core rule: a good but blocked source is opened via the cascade, NEVER swapped for a weaker error-free one.

1 Updated 1 week ago

belschak

Web & Frontend Solid

web-fetch

Fetches web content as clean markdown by preferring markdown-native responses and falling back to selector-based HTML extraction. Use for documentation, articles, and reference pages at http/https URLs.

402 Updated today

aiskillstore