AI BOT & CRAWLER DATABASE
40 bot types tracked across the NORAD.io global radar network. Complete reference with User-Agent strings, IP ranges, detection guides, and robots.txt configurations.
WHAT ARE AI BOTS?
AI bots are automated programs that access websites to collect data for artificial intelligence systems. They fall into three broad groups: search engine crawlers such as Googlebot and Bingbot, which index content for search results; AI training crawlers such as GPTBot and ClaudeBot, which collect data for training large language models (LLMs); and AI assistant browsers such as ChatGPT-User and Perplexity-User, which fetch pages in real time during AI conversations.
The rise of generative AI has dramatically increased bot traffic across the web. In 2025-2026, AI-related crawlers account for a growing share of website traffic, and on some content-heavy sites their requests can exceed those of human visitors. Knowing which AI bots access your site, and controlling that access through robots.txt, User-Agent detection, and IP-level policies, is essential for modern web operations.
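As one illustration of the robots.txt control mentioned above, the following Python sketch builds a robots.txt body that disallows a chosen set of crawlers and checks the result with the standard-library parser. The list of blocked tokens and the helper names `build_robots_txt` and `is_allowed` are assumptions made for this example, not recommendations; robots.txt is also advisory, so it only affects bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Assumption: which crawlers to block is a site-specific policy choice.
# These tokens come from the tracked-bots table; confirm each one against
# the operator's current documentation before relying on it.
BLOCKED_TOKENS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]


def build_robots_txt(blocked_tokens):
    """Return robots.txt text that disallows the given bots and allows all others."""
    lines = []
    for token in blocked_tokens:
        # One group per bot, terminated by a blank line.
        lines += [f"User-agent: {token}", "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, product_token, url="https://example.com/page"):
    """Evaluate a robots.txt body for a bot's product token (not its full UA string)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(product_token, url)


robots_txt = build_robots_txt(BLOCKED_TOKENS)
print(robots_txt)
print("GPTBot allowed:", is_allowed(robots_txt, "GPTBot"))
print("Googlebot allowed:", is_allowed(robots_txt, "Googlebot"))
```

The check is done with the short product token because the standard-library parser compares against the text before the first `/` in the string it is given, so a full `Mozilla/5.0 ...` User-Agent would not match a bot-specific group.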
NORAD.io monitors all major AI bots globally, providing real-time visibility into crawl activity, behavioral patterns, and compliance with site access policies. Each bot profile below includes the complete User-Agent string, known IP ranges, detection tips, and robots.txt configuration examples.
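Because a User-Agent header can be set to any value, a claimed bot identity is usually cross-checked against the operator's published IP ranges. The sketch below shows that check with Python's `ipaddress` module. The bot names and CIDR blocks are hypothetical placeholders (reserved documentation ranges), not real crawler ranges; in practice they would be loaded from each operator's published list.

```python
import ipaddress

# Hypothetical data for illustration only: these are reserved documentation
# ranges, not any real operator's crawler addresses.
KNOWN_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "2001:db8::/32"],
    "OtherBot": ["198.51.100.0/25"],
}


def compile_ranges(ranges):
    """Parse CIDR strings once so each lookup is a cheap membership test."""
    return {
        name: [ipaddress.ip_network(cidr) for cidr in cidrs]
        for name, cidrs in ranges.items()
    }


def ip_matches_bot(ip, claimed_bot, compiled):
    """Return True if `ip` falls inside a known range for the claimed bot."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in compiled.get(claimed_bot, []))


compiled = compile_ranges(KNOWN_RANGES)
print(ip_matches_bot("192.0.2.15", "ExampleBot", compiled))
print(ip_matches_bot("203.0.113.9", "ExampleBot", compiled))
```

A request whose User-Agent names a bot but whose source address is outside that bot's ranges is a candidate for stricter handling; what that handling is remains a policy decision for the site.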
📋 ALL TRACKED BOTS
| BOT | OPERATOR | RISK |
|---|---|---|
| GPTBot | OpenAI | LOW |
| ClaudeBot | Anthropic | LOW |
| ChatGPT-User | OpenAI | LOW |
| Googlebot | Google | LOW |
| Google-Extended | Google | LOW |
| Bingbot | Microsoft | LOW |
| PerplexityBot | Perplexity AI | LOW |
| Bytespider | ByteDance | MEDIUM |
| CCBot | Common Crawl | LOW |
| Amazonbot | Amazon | LOW |
| FacebookBot | Meta | LOW |
| AhrefsBot | Ahrefs | LOW |
| SemrushBot | Semrush | LOW |
| Applebot | Apple | LOW |
| YandexBot | Yandex | LOW |
| Headless Chrome | Unknown | HIGH |
| Playwright | Microsoft | HIGH |
| Scrapy | Open Source | MEDIUM |
| Python Requests | Unknown | MEDIUM |
| DuckDuckBot | DuckDuckGo | LOW |
| Puppeteer | Google | HIGH |
| cURL | Open Source | MEDIUM |
| Selenium | Open Source | HIGH |
| Twitterbot | X (Twitter) | LOW |
| LinkedInBot | LinkedIn | LOW |
| Baiduspider | Baidu | LOW |
| Sogou Spider | Sogou | LOW |
| MJ12bot | Majestic | LOW |
| DotBot | Moz | LOW |
| Anthropic-AI | Anthropic | LOW |
| OAI-SearchBot | OpenAI | LOW |
| Perplexity-User | Perplexity AI | LOW |
| Claude-Web | Anthropic | LOW |
| Meta-ExternalAgent | Meta | MEDIUM |
| Cohere-AI | Cohere | LOW |
| AI2Bot | Allen Institute for AI | LOW |
| YouBot | You.com | LOW |
| PetalBot | Huawei | LOW |
| DataForSeoBot | DataForSEO | LOW |
| PhantomJS | Open Source | HIGH |
🔍 SEARCH & AI CRAWLERS
Bots from search engines and AI companies that systematically crawl web content for indexing and AI model training.
OpenAI's web crawler used for training GPT models and improving AI capabilities
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Anthropic's web crawler for training Claude AI models
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Google's primary web crawler for search indexing, the most important bot for SEO
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.c…

Google's AI training crawler for Gemini, separate from Googlebot search indexing
Mozilla/5.0 (compatible; Google-Extended; +https://developer…

Microsoft Bing's web crawler for search indexing and AI features
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/b…

ByteDance's aggressive web crawler, one of the most active bots on the internet
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, …

Common Crawl's open web archival bot, the largest open dataset of web content
CCBot/2.0 (https://commoncrawl.org/faq/)…

Amazon's web crawler for Alexa answers and Amazon search features
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/…

Meta's crawler that fetches pages for link previews on Facebook and Instagram
facebookexternalhit/1.1 (+http://www.facebook.com/externalhi…

Apple's web crawler for Siri, Spotlight search, and Apple Intelligence features
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/…

Yandex search engine crawler, Russia's largest search engine
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/b…

DuckDuckGo's web crawler for its privacy-focused search engine
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)…

Twitter/X's crawler for generating link preview cards in tweets
Twitterbot/1.0…

LinkedIn's crawler for generating link previews in posts and messages
LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient …

Baidu's web crawler, China's largest search engine
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.…

Sogou search engine crawler, China's third-largest search engine (Tencent-owned)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmast…

OpenAI's search grounding crawler, fetches pages for ChatGPT Search and SearchGPT results
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Meta's AI training crawler, collects web data for Meta's LLaMA and AI models
Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; +https://de…

Cohere's web crawler for training enterprise AI and retrieval-augmented generation models
Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com/craw…

Allen Institute for AI's crawler for academic AI research and open models
Mozilla/5.0 (compatible; AI2Bot/1.0; +https://allenai.org/cr…

You.com's web crawler for its AI-powered search engine and assistant
Mozilla/5.0 (compatible; YouBot/1.0; +https://about.you.com/…

Huawei's web crawler for Petal Search, Huawei's search engine for HarmonyOS devices
Mozilla/5.0 (compatible; PetalBot;+https://webmaster.petalse…

🤖 AI ASSISTANTS
Bots that fetch pages in real-time during AI assistant conversations (ChatGPT browsing, Perplexity search, Claude web access).
ChatGPT's real-time web browsing mode that fetches pages during conversations
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Perplexity AI's web indexing crawler for its AI-powered search engine
Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplex…

Anthropic's AI assistant web browsing capability for real-time information retrieval
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Perplexity's real-time page fetcher, retrieves content during user search sessions
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

Anthropic's dedicated web browsing agent for Claude's tool-use web access
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatibl…

📊 SEO & DATA SCRAPERS
Crawlers from SEO tools and data companies that analyze site structure, backlinks, and content for marketing intelligence.
Ahrefs SEO tool crawler, maps backlinks and site structure for SEO analysis
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/r…

Semrush's SEO analysis crawler for competitive intelligence and site auditing
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrus…

Python web scraping framework, the most popular open-source scraping tool
Scrapy/2.11 (+https://scrapy.org)…

Python HTTP library, the most common library used for automated web access
python-requests/2.31.0…

Command-line HTTP client, the most ubiquitous tool for automated HTTP requests
curl/8.5.0…

Majestic's SEO crawler, builds the world's largest link intelligence database
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/…

Moz's SEO analysis crawler for domain authority and link analysis
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplor…

DataForSEO's crawler for SEO data API services and SERP analysis
Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://datafor…

⚡ AUTOMATION AGENTS
Browser automation tools and headless browsers used for testing, scraping, and automated web interactions.
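A minimal sketch of User-Agent screening for this category follows, assuming that only the self-identifying markers `HeadlessChrome` and `PhantomJS` (both of which appear in the strings tracked on this page) are checked. The function name `automation_marker` is hypothetical. The example also shows the limit of the approach: the strings this page lists for Playwright and Selenium are ordinary Chrome strings, so a User-Agent check alone does not flag them, and other signals would be needed.

```python
# Markers that appear literally in some automation User-Agent strings.
# Assumption: only self-identifying tools are caught; anything that sends a
# stock browser string passes this check unnoticed.
AUTOMATION_MARKERS = ("HeadlessChrome", "PhantomJS")


def automation_marker(user_agent):
    """Return the first automation marker found in the User-Agent, or None."""
    for marker in AUTOMATION_MARKERS:
        if marker in user_agent:
            return marker
    return None


headless = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36")
stock_chrome = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

print(automation_marker(headless))
print(automation_marker(stock_chrome))
```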
Automated headless Chrome browsers, commonly used for scraping, testing, and bot activity
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, l…

Microsoft's cross-browser automation framework, used for testing and scraping
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, l…

Google's Node.js browser automation library, widely used for scraping and testing
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, l…

The original browser automation framework, still widely used for testing and scraping
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, l…

Legacy headless browser, deprecated but still seen in scraping and legacy automation
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML…

📡 ALL AI BOT USER-AGENT STRINGS
Quick reference list of all tracked User-Agent strings. Use these to identify bots in your server access logs and configure detection rules.
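One way to apply the strings in this section to access logs is sketched below in Python: each User-Agent is lowercased and searched for a bot's product token, and the matches are tallied. The token table is a small subset chosen for illustration, and the names `BOT_TOKENS`, `identify_bot`, and `count_bots` are hypothetical. A match only shows what a client claims to be, since any client can send any User-Agent.

```python
from collections import Counter

# Lowercased product token -> display name. A subset of the tracked bots,
# chosen for illustration; extend it from the strings in this section.
BOT_TOKENS = {
    "gptbot": "GPTBot",
    "claudebot": "ClaudeBot",
    "chatgpt-user": "ChatGPT-User",
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
    "perplexitybot": "PerplexityBot",
    "bytespider": "Bytespider",
    "ccbot": "CCBot",
}


def identify_bot(user_agent):
    """Return the display name of the first matching bot token, or None."""
    ua = user_agent.lower()
    for token, name in BOT_TOKENS.items():
        if token in ua:
            return name
    return None


def count_bots(user_agents):
    """Tally claimed bot identities across an iterable of User-Agent strings."""
    names = (identify_bot(ua) for ua in user_agents)
    return Counter(name for name in names if name is not None)


sample = [
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "CCBot/2.0 (https://commoncrawl.org/faq/)",
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/121.0",
]
print(count_bots(sample))
```

Matching on a lowercased substring keeps the rule tolerant of version changes such as `GPTBot/1.2` becoming a later release, at the cost of trusting whatever the client sends.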
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +https://claudebot.ai)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Google-Extended; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)
CCBot/2.0 (https://commoncrawl.org/faq/)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36 (compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Scrapy/2.11 (+https://scrapy.org)
python-requests/2.31.0
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/120.0.0.0 Safari/537.36
curl/8.5.0
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Twitterbot/1.0
LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Anthropic-AI; +https://anthropic.com)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai)
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-Web/1.0; +https://anthropic.com/claude-web)
Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; +https://developers.facebook.com/docs/sharing/webmasters/crawler)
Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com/crawler)
Mozilla/5.0 (compatible; AI2Bot/1.0; +https://allenai.org/crawler)
Mozilla/5.0 (compatible; YouBot/1.0; +https://about.you.com/youbot/)
Mozilla/5.0 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)
Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1

PROTECT YOUR WEBSITE
Deploy SiteTrust to monitor and control AI bot access to your site with the Agent Passport Standard.