web-scraping
Install: claude install-skill NewAbra/auto-co-meta
# Web scraping methodology
Patterns for reliable, ethical web scraping with fallback strategies and anti-bot handling.
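One concrete part of scraping ethically is checking a site's `robots.txt` before fetching. The sketch below is an assumption about how that check might look, not something this methodology prescribes. It uses the standard-library `urllib.robotparser`; the helper names `build_parser` and `is_allowed`, the user agent `example-bot`, and the sample rules are all hypothetical. The rules are parsed from a string so the example runs offline; in practice they would be downloaded from the target host's `/robots.txt`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example needs no network.
# In real use, download the file from the target host before scraping it.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(is_allowed(parser, "https://example.com/articles/1"))
print(is_allowed(parser, "https://example.com/private/data"))
print(parser.crawl_delay("example-bot"))
```

A scraper built this way could skip any URL for which `is_allowed` returns `False`, and sleep for the reported crawl delay between requests to the same host.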
## Scraping cascade architecture
Implement multiple extraction strategies with automatic fallback:
```python
from abc import ABC, abstractmethod
from typing import Optional
import requests
from bs4 import BeautifulSoup
import trafilatura
# For .py files: synchronous Playwright API
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
# For .ipynb files: asynchronous Playwright API (notebooks already run an event loop)
import asyncio
from playwright.async_api import async_playwright


class ScrapingResult:
    def __init__(self, content: str, title: str, method: str):
        self.content = content
        self.title = title
        self.method = method  # Track which method succeeded


class Scraper(ABC):
    @abstractmethod
    def fetch(self, url: str) -> Optional[ScrapingResult]: ...


class TrafilaturaScraper(Scraper):
    """Fast, lightweight extraction for standard articles."""

    def fetch(self, url: str) -> Optional[ScrapingResult]:
        try:
            # fetch_url returns None when the download fails
            downloaded = trafilatura.fetch_url(url)
            if not downloaded:
                return None
            content = trafilatura.extract(
                downloaded,
                include_comments=False,
                include_tables=True,
                favor_recall=True
            )
            # Treat empty or very short extractions (< 100 characters) as a failure
            if not content or len(content) < 100:
                return None
            # Extract title separately
            soup = BeautifulSoup(downlo