GPTBOT
LOW RISK · 🔍 SEARCH & AI CRAWLER
OpenAI's web crawler used for training GPT models and improving AI capabilities
📡 GPTBOT USER-AGENT STRING
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.2; +https://openai.com/gptbot
This is the User-Agent header sent by GPTBot in HTTP requests. Use this to identify GPTBot in your server access logs.
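As a starting point, GPTBot visits can be pulled out of a combined-format access log by matching on the final quoted User-Agent field. The sketch below is illustrative (the helper name, regex, and sample log line are not from OpenAI or NORAD.io documentation); adapt the pattern to your own log format.

```python
import re

# Combined log format: client IP is the first field, User-Agent is the
# last double-quoted field on the line.
LOG_RE = re.compile(r'^(?P<ip>\S+).*"(?P<ua>[^"]*)"\s*$')

def gptbot_hits(log_lines):
    """Yield (ip, user_agent) for lines whose User-Agent declares GPTBot."""
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and "GPTBot" in m.group("ua"):
            yield m.group("ip"), m.group("ua")

# Illustrative log line, not a real request.
sample = ('20.15.240.70 - - [01/Jan/2025:00:00:00 +0000] "GET / HTTP/1.1" '
          '200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); '
          'compatible; GPTBot/1.2; +https://openai.com/gptbot"')
print(list(gptbot_hits([sample])))
```

Note that the User-Agent header is trivially spoofable, which is why the IP-range verification described later on this page matters.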
📋 ABOUT GPTBOT
GPTBot is OpenAI's official web crawler, first publicly documented in August 2023. It systematically crawls publicly accessible web pages to collect training data for OpenAI's large language models, including GPT-4, GPT-4o, and future model generations. GPTBot identifies itself clearly in the User-Agent header and operates from a set of published IP ranges, making it straightforward to identify and control.
Unlike OpenAI's ChatGPT-User bot (which fetches pages in real-time during conversations), GPTBot performs batch crawling operations for training data collection. It respects robots.txt directives, and OpenAI provides clear documentation on how to opt out of crawling. The bot does not execute JavaScript, does not render pages, and focuses on extracting text content from HTML pages. It follows links discovered in sitemaps and page content.
NORAD.io tracks GPTBot activity across its global sensor network, providing real-time visibility into crawl frequency, geographic distribution, and behavioral patterns. Many website operators use NORAD to monitor how aggressively GPTBot crawls their content and to enforce access policies through the Agent Passport Standard.
🎯 HOW TO DETECT GPTBOT
- Check for 'GPTBot' in the User-Agent header string
- Verify source IPs against OpenAI's published IP ranges (see the CIDR list below)
- GPTBot does not execute JavaScript, so if your bot detection relies on JS challenges, GPTBot will fail them
- Crawl pattern is typically breadth-first across sitemaps
- Does not load images, CSS, or other static assets
🌐 GPTBOT KNOWN IP RANGES
20.15.240.64/28
20.15.240.80/28
20.15.240.96/28
20.15.240.176/28
20.15.241.0/28
20.15.242.128/28
20.15.242.144/28
40.83.2.64/28

Use these CIDR ranges to verify GPTBot identity at the network level. Always combine with User-Agent verification for accurate detection.
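The combined check can be sketched with Python's stdlib `ipaddress` module. This is a minimal illustration, not NORAD.io's implementation; the function name is made up, and the ranges are hard-coded from the list above, so refresh them against OpenAI's currently published list before relying on them.

```python
import ipaddress

# CIDR ranges copied from the list above; OpenAI may add or retire
# ranges, so treat this as a snapshot, not a source of truth.
GPTBOT_RANGES = [ipaddress.ip_network(c) for c in (
    "20.15.240.64/28", "20.15.240.80/28", "20.15.240.96/28",
    "20.15.240.176/28", "20.15.241.0/28", "20.15.242.128/28",
    "20.15.242.144/28", "40.83.2.64/28",
)]

def is_verified_gptbot(user_agent: str, remote_ip: str) -> bool:
    """Require both the declared identity and a source IP in a published range."""
    if "GPTBot" not in user_agent:
        return False
    addr = ipaddress.ip_address(remote_ip)
    return any(addr in net for net in GPTBOT_RANGES)
```

A request claiming to be GPTBot from an IP outside these ranges (e.g. a spoofed User-Agent) fails the check, while a genuine hit from 20.15.240.64/28 passes.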
🔄 CRAWL BEHAVIOR
Crawls pages at moderate frequency. Respects robots.txt and rate limits. Fetches HTML content primarily. Does not execute JavaScript. Typical crawl intervals of several hours between revisits.
Collects publicly available web content to train and improve OpenAI's GPT language models including GPT-4 and future versions. Data is used for pre-training and fine-tuning.
🤖 ROBOTS.TXT CONFIGURATION
User-agent: GPTBot
Disallow: /private/
Disallow: /api/

# To block completely:
# User-agent: GPTBot
# Disallow: /
GPTBot respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
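Before deploying, you can sanity-check that the rules above do what you expect with Python's stdlib `urllib.robotparser` (a quick verification sketch; the example URLs are hypothetical).

```python
from urllib import robotparser

# The same rules shown in the robots.txt example above.
rules = [
    "User-agent: GPTBot",
    "Disallow: /private/",
    "Disallow: /api/",
]
rp = robotparser.RobotFileParser()
rp.parse(rules)

# /private/ is blocked for GPTBot; everything else remains crawlable.
print(rp.can_fetch("GPTBot", "https://example.com/private/report"))  # False
print(rp.can_fetch("GPTBot", "https://example.com/blog/post"))       # True
```

The same check against "User-agent: GPTBot / Disallow: /" (the full-block variant) would return False for every path.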
⚠️ RELATED THREATS
Prompt Injection: Attempts to override bot instructions via malicious content embedded in web pages
Data Exfiltration: Bots attempting to extract sensitive data from websites, including PII and credentials
Credential Stuffing: Automated login attempts using leaked credentials from data breaches
Aggressive Content Scraping: Bots scraping content beyond robots.txt limits and terms of service
PROTECT YOUR WEBSITE
Deploy SiteTrust to monitor and control AI bot access to your site with the Agent Passport Standard.