By several industry estimates, AI bots now account for over 20% of all web traffic. GPTBot, ClaudeBot, PerplexityBot, and dozens of other AI crawlers visit your site daily — for training data collection, real-time search grounding, and content indexing. Here's how to detect, monitor, and manage them.
The most straightforward detection method is User-Agent matching: most legitimate AI crawlers identify themselves via their User-Agent header. Here are the current (February 2026) User-Agent strings for major AI bots:
| BOT | USER-AGENT STRING | PURPOSE |
|---|---|---|
| GPTBot | GPTBot/1.2 | OpenAI model training |
| ChatGPT-User | ChatGPT-User/1.0 | ChatGPT live browsing |
| OAI-SearchBot | OAI-SearchBot/1.0 | OpenAI search grounding |
| ClaudeBot | ClaudeBot/1.0 | Anthropic model training |
| Claude-Web | claude-web | Claude web browsing |
| PerplexityBot | PerplexityBot/1.0 | Perplexity index building |
| Perplexity-User | Perplexity-User | Real-time search fetch |
| Google-Extended | Google-Extended | Gemini AI training |
| Bytespider | Bytespider | ByteDance/TikTok AI |
| Meta-ExternalAgent | meta-externalagent/1.0 | Meta AI training |
| Cohere-AI | cohere-ai | Cohere model training |
| CCBot | CCBot/2.0 | Common Crawl archive |
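The same User-Agent tokens are what you list in robots.txt if you want to opt out of training crawls while still allowing search-oriented fetchers. A sketch (the allow/block split below is one possible policy, not a recommendation — and only well-behaved crawlers honor robots.txt, which is why the detection layers that follow still matter):

```txt
# robots.txt — block training crawlers, allow search/browse fetchers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Search-grounding bots left unblocked so content stays citable
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```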
```nginx
# /etc/nginx/conf.d/ai-bot-detection.conf
map $http_user_agent $is_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ChatGPT-User 1;
    ~*OAI-SearchBot 1;
    ~*ClaudeBot 1;
    ~*claude-web 1;
    ~*PerplexityBot 1;
    ~*Bytespider 1;
    ~*Google-Extended 1;
    ~*CCBot 1;
    ~*meta-externalagent 1;
    ~*cohere-ai 1;
    ~*anthropic-ai 1;
}

# Key AI-bot requests by client IP; everyone else gets an empty key (not limited).
map $is_ai_bot $ai_bot_limit_key {
    0 "";
    1 $binary_remote_addr;
}

# limit_req_zone is only valid at http level, so it must live here
# (conf.d files are included in the http context), not inside server {}.
limit_req_zone $ai_bot_limit_key zone=ai_bots:10m rate=10r/s;

server {
    # Log AI bots separately
    access_log /var/log/nginx/ai-bots.log combined if=$is_ai_bot;

    # Optional: rate limit AI bots
    limit_req zone=ai_bots burst=20 nodelay;
}
```

```typescript
// middleware.ts
import { NextRequest, NextResponse } from 'next/server'

const AI_BOT_PATTERNS = [
  /GPTBot/i, /ChatGPT-User/i, /OAI-SearchBot/i,
  /ClaudeBot/i, /claude-web/i, /anthropic-ai/i,
  /PerplexityBot/i, /Perplexity-User/i,
  /Google-Extended/i, /Bytespider/i,
  /CCBot/i, /meta-externalagent/i, /cohere-ai/i,
]

export function middleware(request: NextRequest) {
  const ua = request.headers.get('user-agent') || ''
  const matchedPattern = AI_BOT_PATTERNS.find(p => p.test(ua))

  if (matchedPattern) {
    console.log(`AI Bot detected: ${ua.slice(0, 100)}`)

    // Optional: report to NORAD (fire-and-forget)
    fetch('https://api.clawbotden.com/api/v1/public/norad/ingest', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        bot_type: ua.match(matchedPattern)?.[0] ?? 'Unknown',
        user_agent: ua.slice(0, 500),
        decision: 'allow',
        page_url: request.nextUrl.href,
      }),
    }).catch(() => {})
  }

  return NextResponse.next()
}
```

User-Agent strings can be spoofed. For high-confidence detection, verify the source IP against published crawler IP ranges. Several AI companies publish their crawler IPs:
- **OpenAI** — published at openai.com/gptbot-ranges.txt. OpenAI publishes a JSON file of the IP ranges that GPTBot and ChatGPT-User use. Verify with a reverse DNS lookup — legitimate requests resolve to *.openai.com.
- **Google** — published at developers.google.com/search/docs/crawling-indexing/verifying-googlebot. Reverse DNS must resolve to *.googlebot.com or *.google.com; Google publishes the complete IP ranges as JSON.
- **Bing** — published at bing.com/webmasters/help/how-to-verify-bingbot. Reverse DNS resolves to *.search.msn.com; IP ranges are published via Bing Webmaster Tools.
- **Anthropic** — no published IP ranges. Anthropic does not currently publish ClaudeBot IP ranges, so detection relies on User-Agent matching and behavioral analysis.
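Once a vendor's ranges are fetched, checking an IP against them is plain CIDR matching. A minimal sketch: the helper below takes a list of IPv4 CIDR prefixes (the exact JSON shape and URL vary by vendor, and the sample range used here is illustrative only):

```typescript
// Sketch: check a request IP against a vendor's published IPv4 CIDR ranges.
// The ranges should be fetched from the vendor's published file and
// refreshed periodically; the range in the usage example is illustrative.

function ipToInt(ip: string): number {
  // Convert dotted-quad IPv4 to an unsigned 32-bit integer
  return ip.split('.').reduce(
    (acc, octet) => ((acc << 8) | parseInt(octet, 10)) >>> 0,
    0,
  )
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split('/')
  const bits = parseInt(bitsStr, 10)
  // Mask that keeps the top `bits` bits of the address
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask)
}

function isPublishedCrawlerIp(ip: string, ranges: string[]): boolean {
  return ranges.some(cidr => inCidr(ip, cidr))
}

// Usage:
// isPublishedCrawlerIp('66.249.66.1', ['66.249.64.0/19'])  // in range
// isPublishedCrawlerIp('203.0.113.7', ['66.249.64.0/19'])  // not in range
```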
```bash
#!/bin/bash
# Verify whether an IP belongs to a known AI crawler.
IP="66.249.66.1"

# Reverse DNS lookup
HOST=$(dig -x "$IP" +short)
if [[ -z "$HOST" ]]; then
  echo "⚠️ No PTR record for $IP"
  exit 1
fi
echo "Reverse DNS: $HOST"

# Forward-confirm for every vendor: the PTR record is controlled by whoever
# owns the IP block, so the hostname must resolve back to the original IP.
if ! dig "$HOST" +short | grep -qxF "$IP"; then
  echo "❌ Spoofed: $HOST does not resolve back to $IP"
elif [[ "$HOST" == *"googlebot.com." || "$HOST" == *"google.com." ]]; then
  echo "✅ Verified Googlebot"
elif [[ "$HOST" == *"openai.com." ]]; then
  echo "✅ Verified OpenAI crawler"
elif [[ "$HOST" == *"search.msn.com." ]]; then
  echo "✅ Verified Bingbot"
else
  echo "⚠️ Unknown origin: $HOST"
fi
```

Some AI bots disguise their User-Agent or use headless browsers. Behavioral fingerprinting detects them through JavaScript-based checks:
| SIGNAL | MEANING |
|---|---|
| `navigator.webdriver` | True for automated browsers (Puppeteer, Playwright, Selenium) |
| `window.chrome === undefined` | Missing in headless Chrome environments |
| `navigator.plugins.length === 0` | Real browsers have plugins; headless ones don't |
| No mouse/touch events | Bots don't generate human interaction events |
| Canvas fingerprint anomalies | Headless browsers produce different canvas hashes |
| WebGL renderer = "SwiftShader" | Google's software renderer used in headless Chrome |
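These signals can be combined into a simple client-side score. A minimal sketch with illustrative, untuned weights — the `FingerprintSignals` shape is hypothetical, and in a real page its fields would be populated from `navigator` and from mouse/touch event listeners:

```typescript
// Sketch: combine behavioral signals into a bot-likelihood score.
// Weights and the decision threshold are illustrative assumptions.

interface FingerprintSignals {
  webdriver: boolean        // navigator.webdriver
  hasChromeObject: boolean  // typeof (window as any).chrome !== 'undefined'
  pluginCount: number       // navigator.plugins.length
  sawHumanInput: boolean    // any mouse/touch event observed so far
}

function botScore(s: FingerprintSignals): number {
  let score = 0
  if (s.webdriver) score += 3          // strongest signal: explicit automation flag
  if (!s.hasChromeObject) score += 1   // weak: also true in non-Chrome browsers
  if (s.pluginCount === 0) score += 1  // weak: some privacy setups hide plugins
  if (!s.sawHumanInput) score += 2     // moderate: no interaction after page load
  return score // e.g. treat a score >= 4 as likely automated
}
```

No single check is conclusive on its own (for instance, Firefox also lacks `window.chrome`), which is why the weak signals carry less weight than `navigator.webdriver`.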
Instead of building custom detection, use NORAD's three-layer detection system. Install a single script tag or CMS plugin and get automatic detection of 35+ AI bots with real-time alerting and global analytics.
```html
<script src="https://norad.io/site-trust.js" data-site-id="YOUR_SITE_ID" data-mode="monitor" async></script>
```