Install: `claude install-skill Hainrixz/claude-seo-ai`
# seo-ai-crawlers (M14)
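Controls whether AI search engines can crawl and cite the page, and whether they can read it without executing JavaScript. The distinction between training crawlers, search/retrieval crawlers, and user-triggered fetchers is central: each uses its own user-agent token in `robots.txt`, and blocking a search/retrieval bot where only a training bot was meant to be blocked keeps the page out of AI citations. Reference: `references/ai-crawlers.md`.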
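The training/search/fetch split can be checked mechanically. The following is a minimal sketch that assumes Python's standard-library `urllib.robotparser`. The names `AI_AGENTS`, `classify`, `access_report`, and `blocked_citation_agents` are hypothetical, and the agent lists are copied from the audits below, not from an authoritative registry. One caveat: `RobotFileParser` applies the first matching rule in file order and does not expand `*` or `$` inside paths, whereas RFC 9309 uses longest match. Treat its answers as an approximation for complex files.

```python
from typing import Dict, List, Optional
from urllib.robotparser import RobotFileParser

# Hypothetical starting set, mirroring the buckets in the classification audit.
# Not exhaustive: extend it from references/ai-crawlers.md.
AI_AGENTS: Dict[str, List[str]] = {
    "training": ["GPTBot", "ClaudeBot", "Google-Extended", "Applebot-Extended", "CCBot"],
    "search": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "fetch": ["ChatGPT-User", "Claude-User", "Perplexity-User"],
}


def classify(agent: str) -> Optional[str]:
    """Return the bucket for a user-agent token (case-insensitive), or None."""
    needle = agent.lower()
    for bucket, names in AI_AGENTS.items():
        if needle in (n.lower() for n in names):
            return bucket
    return None


def access_report(robots_txt: str, url: str) -> Dict[str, Dict[str, bool]]:
    """For every known agent, report whether robots.txt allows it to fetch url."""
    parser = RobotFileParser()
    # parse() also records the rules as loaded, so can_fetch() evaluates them
    parser.parse(robots_txt.splitlines())
    return {
        bucket: {name: parser.can_fetch(name, url) for name in names}
        for bucket, names in AI_AGENTS.items()
    }


def blocked_citation_agents(report: Dict[str, Dict[str, bool]]) -> List[str]:
    """Search/retrieval agents that are disallowed, i.e. that lost citation access."""
    return [name for name, allowed in report["search"].items() if not allowed]


if __name__ == "__main__":
    # Block a training bot only; everything else falls through to "*".
    robots = (
        "User-agent: GPTBot\nDisallow: /\n\n"
        "User-agent: *\nAllow: /\n"
    )
    report = access_report(robots, "https://example.com/pricing")
    print(report["training"]["GPTBot"], report["search"]["OAI-SearchBot"])  # False True
    print(blocked_citation_agents(report))  # []
```

`Googlebot` and `Bingbot`, which the citation-access audit also checks, could be added as a further bucket in the same way.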
## Audits
Working from the PageSnapshot (`rendered_dom` if present, else `raw_html`) plus the site `robots.txt`:
1. **Citation access**: are the retrieval/citation bots (`OAI-SearchBot`, `Claude-SearchBot`, `PerplexityBot`, `Bingbot`) actually allowed, rather than caught by a broad `Disallow: /` or a wildcard block? Confirm that `Googlebot` is not blocked and that the split between `Googlebot` (search) and `Google-Extended` (Gemini training control) is configured correctly.
2. **User-agent classification**: bucket every AI agent in `robots.txt` into **training** (GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, CCBot), **search/retrieval** (OAI-SearchBot, Claude-SearchBot, PerplexityBot), and **user-triggered fetch** (ChatGPT-User, Claude-User, Perplexity-User). Match user-agents case-insensitively; treat the table in `references/ai-crawlers.md` as a starting set, not exhaustive.
3. **Renderability for non-JS crawlers**: use the M4 (seo-crawl-render) render result, since most AI crawlers do not execute JS. If primary content appears only in `rendered_dom` and is absent from `raw_html`, flag it as invisible to AI retrieval.
4. **llms.txt / llms-full.txt** (also covers M21): presence at the site root, valid Markdown structure (H1 title, summary blockquote, sectioned link lists), and that li