GOOGLE-EXTENDED
LOW RISK
🔍 SEARCH & AI CRAWLER
Google's AI training crawler for Gemini, separate from Googlebot search indexing
📡 GOOGLE-EXTENDED USER-AGENT STRING
Mozilla/5.0 (compatible; Google-Extended; +https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)
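This is the User-Agent header listed for Google-Extended. Use it to look for Google-Extended in your server access logs. Note that Google documents Google-Extended as a robots.txt product token and states that crawling is done with its existing user-agent strings, so this exact string may not appear in your logs.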
📋 ABOUT GOOGLE-EXTENDED
Google-Extended is a distinct crawler token introduced by Google in September 2023 to give website owners granular control over how their content is used for AI training. While Googlebot crawls for Google Search indexing, Google-Extended specifically collects data for training Google's Gemini AI models and improving other Google AI products.
The critical distinction is that blocking Google-Extended in robots.txt does not affect your site's Google Search visibility. This separation allows website operators to continue appearing in Google Search results while opting out of having their content used to train Google's AI models. Google-Extended shares the same IP infrastructure as Googlebot, so identification relies on the User-Agent string rather than IP ranges.
NORAD.io tracks Google-Extended separately from Googlebot to give site operators clear visibility into AI training crawl activity versus search indexing. This distinction is essential for content licensing decisions and AI data governance policies.
🎯 HOW TO DETECT GOOGLE-EXTENDED
- User-Agent token is 'Google-Extended'; check robots.txt compliance separately from Googlebot
- Shares IP ranges with Googlebot, so IP-based detection alone is insufficient
- The key differentiator is the User-Agent string
- Blocking Google-Extended has no impact on Google Search rankings
- May appear in server logs alongside regular Googlebot requests from the same IPs
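The checks above can be sketched as a small log classifier. This is a minimal illustration, not a definitive detector: it assumes access logs in combined log format (User-Agent as the last quoted field) and assumes the 'Google-Extended' token actually appears in the logged User-Agent. The function names `extract_user_agent` and `classify_user_agent` are hypothetical.

```python
import re

# Tokens to look for in the User-Agent value. Matching is case-insensitive
# because header casing can vary between log sources.
GOOGLE_EXTENDED_TOKEN = re.compile(r"google-extended", re.IGNORECASE)
GOOGLEBOT_TOKEN = re.compile(r"googlebot", re.IGNORECASE)


def extract_user_agent(log_line: str) -> str:
    """Return the last double-quoted field of a log line.

    Assumption: combined log format, where that field is the User-Agent.
    Returns an empty string if the line has no quoted fields.
    """
    quoted_fields = re.findall(r'"([^"]*)"', log_line)
    return quoted_fields[-1] if quoted_fields else ""


def classify_user_agent(user_agent: str) -> str:
    """Return 'google-extended', 'googlebot', or 'other'."""
    # Check Google-Extended first so it is never reported as Googlebot.
    if GOOGLE_EXTENDED_TOKEN.search(user_agent):
        return "google-extended"
    if GOOGLEBOT_TOKEN.search(user_agent):
        return "googlebot"
    return "other"


if __name__ == "__main__":
    sample = (
        '66.249.64.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
        '"Mozilla/5.0 (compatible; Google-Extended; '
        '+https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers)"'
    )
    print(classify_user_agent(extract_user_agent(sample)))
```

Because a User-Agent header can be spoofed by any client, a match here is a hint rather than proof; pair it with the IP range check described in the next section.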
🌐 GOOGLE-EXTENDED KNOWN IP RANGES
66.249.64.0/19
64.233.160.0/19
Use these CIDR ranges to verify Google-Extended identity at the network level. Always combine with User-Agent verification for accurate detection.
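As a rough sketch of the network-level check, the snippet below tests whether a client IP falls inside the two CIDR ranges listed above, using Python's standard `ipaddress` module. The name `in_known_ranges` is hypothetical, and the range list is only what this page shows; it may not be complete or current.

```python
import ipaddress

# CIDR ranges copied from this page. Treat the list as an assumption:
# it may be incomplete or out of date.
KNOWN_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
    ipaddress.ip_network("64.233.160.0/19"),
]


def in_known_ranges(ip: str) -> bool:
    """Return True if `ip` parses as an address inside one of KNOWN_RANGES."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IPv4 or IPv6 address.
        return False
    return any(address in network for network in KNOWN_RANGES)


if __name__ == "__main__":
    for candidate in ("66.249.66.1", "64.233.170.5", "8.8.8.8", "not-an-ip"):
        print(candidate, in_known_ranges(candidate))
```

Since these ranges are shared with Googlebot, a match only indicates Google infrastructure; the User-Agent check is still needed to separate Google-Extended from search crawling.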
🔄 CRAWL BEHAVIOR
Google-Extended shares infrastructure with Googlebot but uses a separate User-Agent token for robots.txt control, so blocking it does not affect Google Search indexing. Crawl rates are moderate.
Collects web content specifically for training Google's Gemini AI models and improving Vertex AI products. Separate from search indexing, giving site owners independent control over AI training use.
🤖 ROBOTS.TXT CONFIGURATION
# Block AI training but keep search indexing:
User-agent: Google-Extended
Disallow: /
# This does NOT affect Googlebot search crawling
Google-Extended respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
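Before deploying the rule, it can help to confirm that it blocks Google-Extended while leaving Googlebot untouched. The sketch below parses the same rule with Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are illustrative assumptions, and Python's parser is not Google's own, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
# Block AI training but keep search indexing:
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print("Google-Extended allowed:", is_allowed(ROBOTS_TXT, "Google-Extended", url))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With only this rule present, no group applies to Googlebot, so the parser falls back to allowing it.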
⚠️ RELATED THREATS
Attempts to override bot instructions via malicious content embedded in web pages
Data Exfiltration: Bots attempting to extract sensitive data from websites, including PII and credentials
Credential Stuffing: Automated login attempts using leaked credentials from data breaches
Aggressive Content Scraping: Bots aggressively scraping content beyond robots.txt limits and ToS