seo-crawlability

Audit and generate robots.txt and check general crawl access for a page: verify robots.txt reachability and syntax, detect Disallow rules that block CSS/JS or important content, sanity-check Crawl-delay, confirm a Sitemap directive, and assert overall crawl access for Googlebot/Bingbot. Module M1. Feeds the Search SEO score.
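As a rough illustration of the overall-access check described above, the sketch below resolves a robots.txt body for Googlebot and Bingbot using Python's standard `urllib.robotparser`. It is not the skill's own implementation: the `ROBOTS_TXT` sample, the `example.com` URLs, and the `crawl_access` and `sitemap_urls` helper names are all hypothetical. Note also that `urllib.robotparser` applies rules in file order (first match wins), which can differ from Google's longest-match precedence when `Allow` and `Disallow` rules overlap.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; a real audit would fetch /robots.txt instead.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: Googlebot
Disallow: /assets/js/

Sitemap: https://example.com/sitemap.xml
"""


def _parse(robots_txt):
    """Build a parser from an in-memory robots.txt body (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_access(robots_txt, url, agents=("Googlebot", "Bingbot")):
    """Report, per user agent, whether `url` is fetchable and any Crawl-delay."""
    parser = _parse(robots_txt)
    return {
        agent: {
            "allowed": parser.can_fetch(agent, url),
            "crawl_delay": parser.crawl_delay(agent),
        }
        for agent in agents
    }


def sitemap_urls(robots_txt):
    """Return the Sitemap URLs declared in robots.txt (empty list if none)."""
    return _parse(robots_txt).site_maps() or []


if __name__ == "__main__":
    print(crawl_access(ROBOTS_TXT, "https://example.com/assets/js/app.js"))
    print(sitemap_urls(ROBOTS_TXT))
```

In this sample the Googlebot group blocks a JavaScript path (the asset-blocking case), while Bingbot falls back to the `*` group and inherits its `Crawl-delay`. `site_maps()` requires Python 3.8 or later.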
Hainrixz/claude-seo-ai · ★ 14 · AI & Automation · score 81

Install: claude install-skill Hainrixz/claude-seo-ai

# seo-crawlability (M1)

Crawl access is the precondition for every other search signal: if Googlebot/Bingbot can't fetch the page and its assets, nothing else ranks. This module covers general-purpose crawl access only. AI-specific bot directives (GPTBot, Claude-SearchBot, etc.) and `llms.txt` live in `seo-ai-crawlers` (M14/M21); see `references/ai-crawlers.md` for that boundary.

## Audits

Working from the PageSnapshot (`rendered_dom` if present, else `raw_html`) plus a fetch of `/robots.txt`:

1. **Reachability**: `/robots.txt` returns 200 (a 404 means "allow all" but is worth flagging; a 5xx can suspend crawling).
2. **Syntax**: each line is a valid directive (`User-agent`, `Disallow`, `Allow`, `Sitemap`, `Crawl-delay`); flag unknown tokens, missing `User-agent` group headers, and BOM/encoding issues.
3. **Asset blocking**: any `Disallow` that blocks CSS/JS, fonts, or `/wp-includes/`-style paths; this breaks rendering and is a leading cause of "page looks broken to Google" (cross-check with M-render).
4. **Content blocking**: `Disallow` rules that hide important indexable paths from `Googlebot`/`Bingbot`.
5. **Crawl-delay sanity**: a large `Crawl-delay` (or one applied to the global group) can starve crawl budget; note that Googlebot ignores `Crawl-delay` but Bingbot honors it.
6. **Sitemap directive**: presence of at least one absolute `Sitemap:` URL.
7. **Overall access**: resolve the effective ruleset for `Googlebot` and `Bingbot` against the audited URL: does it end