Q: Does Google-Extended respect robots.txt?

Yes, Google-Extended respects robots.txt directives.

Q: What are Google-Extended's IP ranges?

Known IP ranges associated with Google-Extended include 66.249.64.0/19 and 64.233.160.0/19. These ranges are shared with Googlebot.

GOOGLE-EXTENDED

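Q: What is the Google-Extended User-Agent string?

Google-Extended does not have its own HTTP User-Agent string. Per Google's crawler documentation (https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers), it is a robots.txt product token: crawling is done with existing Google user agent strings, and the token only controls how the fetched content may be used.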

Q: How do I block Google-Extended in robots.txt?

To control Google-Extended access, add the following to your robots.txt file:

# Block AI training but keep search indexing:
User-agent: Google-Extended
Disallow: /

# This does NOT affect Googlebot search crawling

LOW RISK · 🔍 SEARCH & AI CRAWLER

Google's robots.txt control token for Gemini AI training, separate from Googlebot search indexing

ORGANIZATION

Google

FIRST SEEN

2023-09

RESPECTS ROBOTS.TXT

✓ YES

DOCUMENTATION

developers.google.com

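📡 GOOGLE-EXTENDED USER-AGENT STRING

Google-Extended has no separate HTTP User-Agent string.

Per Google's crawler documentation (https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers), Google-Extended is a robots.txt product token only. Pages are fetched with existing Google user agent strings such as Googlebot's, so 'Google-Extended' does not appear in the User-Agent header and cannot be matched in server access logs. Use the token in robots.txt to control AI training use instead.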

📋 ABOUT GOOGLE-EXTENDED

Google-Extended is a distinct robots.txt token introduced by Google in September 2023 to give website owners granular control over how their content is used for AI training. While Googlebot crawls for Google Search indexing, the Google-Extended token governs whether crawled content may be used for training Google's Gemini AI models and improving other Google AI products.

The critical distinction is that blocking Google-Extended in robots.txt does not affect your site's Google Search visibility. This separation allows website operators to continue appearing in Google Search results while opting out of having their content used to train Google's AI models. Because content is fetched by Google's existing crawlers on the same IP infrastructure as Googlebot, Google-Extended cannot be told apart by IP range, and it sends no User-Agent string of its own. Control is exercised through robots.txt.

NORAD.io tracks Google-Extended separately from Googlebot to give site operators clear visibility into AI training crawl activity versus search indexing. This distinction is essential for content licensing decisions and AI data governance policies.

🎯 HOW TO DETECT GOOGLE-EXTENDED

▸ The token 'Google-Extended' is matched in robots.txt only, and its rules are evaluated separately from Googlebot's
▸ Shares IP ranges with Googlebot, so IP-based detection alone is insufficient
▸ No separate User-Agent header is sent; fetches use existing Google user agent strings
▸ Blocking Google-Extended has no impact on Google Search rankings
▸ Related fetches appear in server logs as regular Googlebot requests from the same IPs

🌐 GOOGLE-EXTENDED KNOWN IP RANGES

66.249.64.0/19
64.233.160.0/19

Use these CIDR ranges to confirm at the network level that a request comes from Google's crawler infrastructure. The ranges are shared with Googlebot, so always combine the check with User-Agent verification for accurate detection.
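
A minimal sketch of the network-level check, assuming only the two CIDR ranges listed on this page (they may not be exhaustive or current). The function name in_listed_google_ranges is hypothetical, and a match indicates only that the address falls inside these ranges, not which Google product made the request.

```python
import ipaddress

# The two ranges listed on this page; treat them as an assumption, not a
# complete or authoritative list of Google crawler addresses.
LISTED_RANGES = [
    ipaddress.ip_network("66.249.64.0/19"),
    ipaddress.ip_network("64.233.160.0/19"),
]


def in_listed_google_ranges(ip: str) -> bool:
    """Return True if `ip` falls inside one of the listed CIDR ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a valid IP address (e.g. a malformed log field).
        return False
    return any(addr in net for net in LISTED_RANGES)


if __name__ == "__main__":
    for candidate in ("66.249.66.1", "64.233.191.255", "203.0.113.7"):
        print(candidate, in_listed_google_ranges(candidate))
```

Because the ranges are shared with Googlebot, a True result is best read as one signal to combine with the User-Agent header from the same log line.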

🔄 CRAWL BEHAVIOR

Shares infrastructure with Googlebot but uses a separate User-Agent token for robots.txt control. Blocking Google-Extended does not affect Google Search indexing. Moderate crawl rates.

PURPOSE

Controls whether web content may be used for training Google's Gemini AI models and improving Vertex AI products. Separate from search indexing, giving site owners independent control over AI training use.

🤖 ROBOTS.TXT CONFIGURATION

# Block AI training but keep search indexing:
User-agent: Google-Extended
Disallow: /

# This does NOT affect Googlebot search crawling

Google-Extended respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
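
Before deploying, the rule can be sanity-checked locally. The following is a sketch using Python's standard urllib.robotparser, which approximates but may not exactly match Google's own robots.txt matching; the helper name is_allowed and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The configuration shown above, as it would appear in robots.txt.
ROBOTS_TXT = """\
# Block AI training but keep search indexing:
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt content and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/post.html"
    print("Google-Extended:", is_allowed(ROBOTS_TXT, "Google-Extended", page))
    print("Googlebot:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With this file, the Google-Extended token is disallowed everywhere while Googlebot, which has no matching rule group, remains allowed.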

⚠️ RELATED THREATS

Prompt Injection

Attempts to override bot instructions via malicious content embedded in web pages

Data Exfiltration

Bots attempting to extract sensitive data from websites including PII and credentials

Credential Stuffing

Automated login attempts using leaked credentials from data breaches

Aggressive Content Scraping

Bots aggressively scraping content beyond robots.txt limits and ToS

🔗 RELATED BOTS

Googlebot · LOW

Google · Google's primary web crawler for search indexing — the most important bot for SEO

📂 MORE 🔍 SEARCH & AI CRAWLERS

GPTBot (OpenAI) · ClaudeBot (Anthropic) · Googlebot (Google) · Bingbot (Microsoft) · Bytespider (ByteDance) · CCBot (Common Crawl)
