CLAUDEBOT

LOW RISK · 🔍 SEARCH & AI CRAWLER

Anthropic's web crawler for training Claude AI models

ORGANIZATION
Anthropic
FIRST SEEN
2023-11
RESPECTS ROBOTS.TXT
✓ YES
DOCUMENTATION
support.anthropic.com

📡 CLAUDEBOT USER-AGENT STRING

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

This is the User-Agent header sent by ClaudeBot in HTTP requests. Use this to identify ClaudeBot in your server access logs.
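A minimal sketch of log-side identification: a case-insensitive substring check for the `ClaudeBot` token in the User-Agent header. The exact version suffix and contact address may change between releases, so matching on the bot name alone is the more durable test.

```python
def is_claudebot_ua(user_agent: str) -> bool:
    """Return True if a User-Agent string identifies itself as ClaudeBot."""
    # Match the bot name only; version and contact details can vary.
    return "claudebot" in user_agent.lower()

# Example User-Agent as shown above (exact string may differ by version).
ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
      "compatible; ClaudeBot/1.0; +claudebot@anthropic.com)")
print(is_claudebot_ua(ua))  # True
```

Note that User-Agent headers are trivially spoofed, so this check alone only tells you what a client claims to be; pair it with the IP verification described below.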

📋 ABOUT CLAUDEBOT

ClaudeBot is Anthropic's official web crawler, used to gather publicly available content from the internet for training Claude, Anthropic's family of AI assistants. First observed in late 2023, ClaudeBot systematically fetches web pages to build the training datasets that power Claude's conversational and analytical abilities.

ClaudeBot operates in compliance with robots.txt directives and Anthropic provides clear opt-out mechanisms for website owners who prefer their content not be used for AI training. The crawler identifies itself transparently in the User-Agent string and crawls at moderate rates to avoid impacting site performance. It does not execute JavaScript, focusing purely on HTML content extraction.

NORAD.io monitors ClaudeBot activity globally, tracking crawl volumes, geographic patterns, and compliance with site policies. Through the NORAD radar network, site operators gain real-time insight into when and how frequently ClaudeBot accesses their content, enabling informed decisions about AI training data access controls.

🎯 HOW TO DETECT CLAUDEBOT

  • Look for 'ClaudeBot' in the User-Agent header
  • Check the contact address in the User-Agent string (+claudebot@anthropic.com)
  • Does not execute JavaScript or load external resources
  • Respects Crawl-delay directives in robots.txt
  • Typically crawls from AWS IP ranges

🌐 CLAUDEBOT KNOWN IP RANGES

160.79.104.0/23
13.56.0.0/16

Use these CIDR ranges to verify ClaudeBot identity at the network level. Always combine with User-Agent verification for accurate detection.
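Combining both signals can be sketched with the standard-library `ipaddress` module. The ranges below are the ones listed on this page; verify them against Anthropic's current documentation before enforcing anything, since published ranges can change.

```python
import ipaddress

# CIDR ranges listed above; confirm against current official documentation.
CLAUDEBOT_RANGES = [
    ipaddress.ip_network("160.79.104.0/23"),
    ipaddress.ip_network("13.56.0.0/16"),
]

def ip_in_claudebot_ranges(ip: str) -> bool:
    """Return True if the source IP falls within a known ClaudeBot range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLAUDEBOT_RANGES)

def verify_claudebot(ip: str, user_agent: str) -> bool:
    """A request is treated as genuine ClaudeBot only if both checks pass."""
    return "claudebot" in user_agent.lower() and ip_in_claudebot_ranges(ip)

print(ip_in_claudebot_ranges("160.79.104.10"))  # True
print(ip_in_claudebot_ranges("8.8.8.8"))        # False
```

Requiring both the network match and the User-Agent match filters out spoofed User-Agents as well as unrelated traffic that happens to originate from shared cloud ranges.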

🔄 CRAWL BEHAVIOR

Moderate crawl rate with polite intervals between requests. Respects robots.txt and crawl-delay directives. Primarily fetches HTML content without JavaScript execution.

PURPOSE

Collects publicly available web content to train Anthropic's Claude family of AI models. Used for pre-training data collection to improve Claude's knowledge and capabilities.

🤖 ROBOTS.TXT CONFIGURATION

User-agent: ClaudeBot
Disallow: /private/
Disallow: /api/

# To block completely:
# User-agent: ClaudeBot
# Disallow: /

ClaudeBot respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
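You can preview how rules like the example above are interpreted using Python's standard-library `urllib.robotparser`, which applies the same prefix-matching logic a compliant crawler uses. The paths below are illustrative.

```python
import urllib.robotparser

# The example robots.txt rules from this page.
rules = """\
User-agent: ClaudeBot
Disallow: /private/
Disallow: /api/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Paths under a Disallow prefix are blocked for ClaudeBot; others are allowed.
print(rp.can_fetch("ClaudeBot", "/private/data.html"))  # False
print(rp.can_fetch("ClaudeBot", "/blog/post.html"))     # True
```

This is a quick way to sanity-check a robots.txt change before deploying it, rather than waiting to observe the crawler's behavior in your logs.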


PROTECT YOUR WEBSITE

Deploy SiteTrust to monitor and control AI bot access to your site with the Agent Passport Standard.

INSTALL SITETRUST →