seo-robots-ailisted
Install: claude install-skill YogeshKu7877/claude-seo-skills
# AI Crawler Robots.txt Audit
Analyzes a site's robots.txt specifically for AI crawler access policies.
Complements `/seo-technical` (which does a broad robots.txt check) with
deep AI-specific analysis.
@skills/seo/references/ai-crawlers-guide.md
## AI Crawler Registry
| Bot Name | Owner | Purpose |
|---|---|---|
| GPTBot | OpenAI | Training data + ChatGPT web search |
| OAI-SearchBot | OpenAI | ChatGPT search only (not training) |
| ChatGPT-User | OpenAI | ChatGPT browsing (real-time) |
| ClaudeBot | Anthropic | Training data collection |
| anthropic-ai | Anthropic | Anthropic web crawler |
| PerplexityBot | Perplexity | AI search engine |
| Google-Extended | Google | Gemini / AI training (not Search) |
| Bytespider | ByteDance | TikTok / AI training |
| CCBot | Common Crawl | Open dataset used by many AI models |
| Applebot-Extended | Apple | Apple Intelligence training |
| cohere-ai | Cohere | AI model training |
| FacebookBot | Meta | Meta AI training |
| Meta-ExternalAgent | Meta | Meta AI browsing agent |
| Amazonbot | Amazon | Alexa / AI training |
| Diffbot | Diffbot | AI knowledge graph |
| ImagesiftBot | ImagesiftBot | AI image training |
| Omgili | Webz.io | AI data feeds |
## Inputs
- `url`: The website URL to audit (will fetch `/robots.txt` from site root)
- Normalize to domain root: `example.com/page` → `https://example.com/robots.txt`
## Execution
1. **Fetch robots.txt**: WebFetch `<domain>/robots.txt`
- If 404 → report "No robots.txt found — all c