Complete reference for configuring robots.txt to control AI crawler access to your website. Covers all major AI bots: OpenAI (GPTBot, ChatGPT-User, OAI-SearchBot), Anthropic (ClaudeBot), Google (Google-Extended), Perplexity, ByteDance, Meta, and more.
Maximizes your visibility in AI search results and AI-generated answers
```
# robots.txt - Allow AI Bots
# Generated by NORAD.io (https://norad.io/guides/robots-txt-ai-bots)

# OpenAI
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /

User-agent: claude-web
Allow: /

User-agent: anthropic-ai
Allow: /

# Google AI
User-agent: Google-Extended
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Other AI Crawlers
User-agent: Bytespider
Allow: /

User-agent: CCBot
Allow: /

User-agent: meta-externalagent
Allow: /

User-agent: cohere-ai
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
```
Prevents AI training on your content — but also removes you from AI search results
```
# robots.txt - Block All AI Bots
# WARNING: This removes your content from ChatGPT, Claude, Perplexity answers

# OpenAI
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

# Anthropic
User-agent: ClaudeBot
Disallow: /

User-agent: claude-web
Disallow: /

User-agent: anthropic-ai
Disallow: /

# Google AI (blocks Gemini training, NOT regular Google Search)
User-agent: Google-Extended
Disallow: /

# Perplexity
User-agent: PerplexityBot
Disallow: /

# ByteDance
User-agent: Bytespider
Disallow: /

# Common Crawl
User-agent: CCBot
Disallow: /

# Meta AI
User-agent: meta-externalagent
Disallow: /

# Cohere
User-agent: cohere-ai
Disallow: /
```
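The allow-all and block-all files above differ only in the rule applied to each user-agent token, so they can be generated from one list instead of being maintained by hand. The following is a minimal sketch, not part of any official tooling: the function name `build_robots_txt` and its parameters are hypothetical, and the token list is copied from the examples above.

```python
# User-agent tokens taken from the allow-all / block-all examples above.
AI_USER_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot",
    "ClaudeBot", "claude-web", "anthropic-ai",
    "Google-Extended", "PerplexityBot", "Bytespider",
    "CCBot", "meta-externalagent", "cohere-ai",
]


def build_robots_txt(agents, allow, sitemap=None):
    """Return robots.txt text with one group per user-agent token.

    Hypothetical helper: allow=True emits "Allow: /" for every agent,
    allow=False emits "Disallow: /".
    """
    rule = "Allow: /" if allow else "Disallow: /"
    groups = [f"User-agent: {agent}\n{rule}" for agent in agents]
    text = "\n\n".join(groups) + "\n"
    if sitemap:
        # The Sitemap line is independent of any user-agent group.
        text += f"\nSitemap: {sitemap}\n"
    return text


# Print the block-all variant for the full token list.
print(build_robots_txt(AI_USER_AGENTS, allow=False))
```

Passing `allow=True` together with `sitemap="https://yoursite.com/sitemap.xml"` produces the allow-all variant with the sitemap line appended.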
Allow search/citation bots, keep training crawlers out of premium and member areas, and block unwanted crawlers
```
# robots.txt - Selective AI Bot Access
# Allow citation/search bots, restrict training crawlers

# ALLOW: Search and citation bots (appear in AI answers)
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# RESTRICT: Training-only crawlers (protect content)
User-agent: GPTBot
Disallow: /premium/
Disallow: /members/
Allow: /blog/
Allow: /docs/
Allow: /

User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Allow: /blog/
Allow: /docs/
Allow: /

User-agent: Google-Extended
Disallow: /premium/
Allow: /

# BLOCK: Aggressive/unwanted crawlers
User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://yoursite.com/sitemap.xml
```
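Before deploying a selective policy, it can help to check how a parser actually resolves it. The sketch below uses Python's standard `urllib.robotparser` on a shortened copy of the selective file; the helper name `is_allowed` and the test paths are illustrative assumptions. One caveat: Python's parser applies the first matching rule in file order, while RFC 9309 says the most specific (longest) matching rule wins. The groups above list their `Disallow` lines first, so both interpretations agree here.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the selective policy above (three representative groups).
SELECTIVE_ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /premium/
Disallow: /members/
Allow: /blog/
Allow: /docs/
Allow: /

User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt, user_agent, path):
    """Hypothetical helper: parse robots.txt text and test one agent/path pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed(SELECTIVE_ROBOTS_TXT, "OAI-SearchBot", "/premium/report"))  # True
print(is_allowed(SELECTIVE_ROBOTS_TXT, "GPTBot", "/blog/post"))              # True
print(is_allowed(SELECTIVE_ROBOTS_TXT, "GPTBot", "/premium/report"))         # False
print(is_allowed(SELECTIVE_ROBOTS_TXT, "Bytespider", "/blog/post"))          # False
```

A user-agent with no matching group and no `User-agent: *` group is treated as allowed, which is why unlisted bots are unaffected by this file.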
All known AI bot User-Agent directives for robots.txt as of February 2026:
| robots.txt directive | Organization | Type | Respects robots.txt |
|---|---|---|---|
| User-agent: GPTBot | OpenAI | Training | ✅ Yes |
| User-agent: ChatGPT-User | OpenAI | Search/Browse | ✅ Yes |
| User-agent: OAI-SearchBot | OpenAI | Search | ✅ Yes |
| User-agent: ClaudeBot | Anthropic | Training | ✅ Yes |
| User-agent: claude-web | Anthropic | Browse | ✅ Yes |
| User-agent: anthropic-ai | Anthropic | Training | ✅ Yes |
| User-agent: Google-Extended | Google | AI Training | ✅ Yes |
| User-agent: PerplexityBot | Perplexity AI | Search Index | ⚠️ Disputed |
| User-agent: Bytespider | ByteDance | Training | ⚠️ Partial |
| User-agent: CCBot | Common Crawl | Archive | ✅ Yes |
| User-agent: meta-externalagent | Meta | Training | ✅ Yes |
| User-agent: cohere-ai | Cohere | Training | ✅ Yes |
| User-agent: Amazonbot | Amazon | Alexa/AI | ✅ Yes |
| User-agent: YouBot | You.com | Search | ✅ Yes |
| User-agent: AI2Bot | Allen AI | Research | ✅ Yes |
| User-agent: Applebot | Apple | Siri/Search | ✅ Yes |
robots.txt is advisory: legitimate crawlers respect it, but malicious ones ignore it. To enforce a block, use IP blocking or authentication at the server.
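As a rough illustration of IP-based enforcement, the sketch below wraps a WSGI application and answers `403 Forbidden` for clients inside a blocklist of networks. Everything here is an assumption for illustration: the names `is_blocked` and `block_by_ip` are hypothetical, and the CIDR ranges are reserved documentation ranges, not real crawler addresses. Real ranges would have to come from each crawler operator's published lists or from your own logs.

```python
from ipaddress import ip_address, ip_network

# Placeholder networks from the reserved documentation ranges (RFC 5737).
# These are NOT real crawler addresses; substitute ranges you have verified.
BLOCKED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if client_ip falls inside any blocked network."""
    addr = ip_address(client_ip)
    return any(addr in net for net in networks)


def block_by_ip(app, networks=BLOCKED_NETWORKS):
    """Hypothetical WSGI middleware: 403 for blocked IPs, pass others through."""
    def middleware(environ, start_response):
        # REMOTE_ADDR is the direct peer; behind a reverse proxy this is the
        # proxy's address, so the real client IP must be obtained another way.
        if is_blocked(environ.get("REMOTE_ADDR", "0.0.0.0"), networks):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

In practice this kind of rule usually lives in the web server, CDN, or firewall rather than in application code; the middleware form is used here only because it is self-contained.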
Google-Extended controls only Gemini AI training. Regular Google Search crawling is governed by a separate user-agent token, Googlebot, so blocking Google-Extended does not affect it.
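The separation between the two tokens can be checked with Python's standard `urllib.robotparser`. This is a small sketch with an assumed test path (`/docs/`); it shows only how a robots.txt parser resolves the rules, not how Google itself processes them.

```python
from urllib.robotparser import RobotFileParser

# A file that blocks only the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Google-Extended group applies only to that token.
print(parser.can_fetch("Google-Extended", "/docs/"))  # False
# Googlebot has no matching group, so it remains allowed.
print(parser.can_fetch("Googlebot", "/docs/"))        # True
```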
Use NORAD to see which bots visit your site and how often. Data-driven decisions are better than blanket blocks.
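For a rough first look at the same question from raw server logs, the sketch below counts requests per AI bot by matching user-agent tokens in access-log lines. It assumes the common "combined" log format, where the user-agent is the last quoted field; the function name `count_ai_bot_hits` and the sample log lines are made up for illustration. Because user-agent strings can be spoofed, treat the counts as an estimate.

```python
import re
from collections import Counter

# Subset of the user-agent tokens from the table above.
AI_BOT_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Bytespider", "CCBot", "meta-externalagent",
    "Amazonbot", "Applebot",
]

# Combined log format ends with: "referer" "user-agent"
UA_PATTERN = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')


def count_ai_bot_hits(log_lines, tokens=AI_BOT_TOKENS):
    """Count log lines whose user-agent contains a known AI bot token."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format
        user_agent = match.group("ua").lower()
        for token in tokens:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


# Fabricated sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Feb/2026:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.11 - - [01/Feb/2026:10:00:05 +0000] "GET /docs/ HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.10 - - [01/Feb/2026:10:00:09 +0000] "GET /docs/ HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Feb/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for bot, count in count_ai_bot_hits(SAMPLE_LOG).most_common():
    print(f"{bot}: {count}")
```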
The AI crawler landscape changes monthly. Bookmark this page — we update it as new bots emerge.