BAIDUSPIDER
LOW RISK
🔍 SEARCH & AI CRAWLER
Baidu's web crawler (China's largest search engine)
📡 BAIDUSPIDER USER-AGENT STRING
Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
This is the User-Agent header sent by Baiduspider in HTTP requests. Use this to identify Baiduspider in your server access logs.
📋 ABOUT BAIDUSPIDER
Baiduspider is the web crawler for Baidu, China's largest search engine with approximately 60% domestic market share. Baiduspider indexes web content for Baidu Search, Baidu News, Baidu Image, and increasingly for Baidu's AI products including Ernie Bot, Baidu's large language model.
Baiduspider crawls from Chinese IP ranges and primarily targets Chinese-language content, though it does crawl international websites. For websites targeting the Chinese market, Baiduspider indexing is essential for visibility. The crawler respects robots.txt directives and can be verified through reverse DNS resolution to baidu.com or baidu.jp domains.
NORAD.io monitors Baiduspider activity globally, tracking crawl volumes and geographic patterns. For sites not targeting Chinese audiences, unexpected Baiduspider activity may be noteworthy. NORAD helps site operators make informed decisions about allowing or restricting Baiduspider access based on their audience and content strategy.
🎯 HOW TO DETECT BAIDUSPIDER
- ▸User-Agent contains 'Baiduspider/2.0'
- ▸Crawls from Chinese IP ranges (180.76.x.x, 119.63.x.x, 106.12.x.x, 182.61.x.x)
- ▸Verify via reverse DNS — should resolve to *.baidu.com or *.baidu.jp
- ▸Multiple variants: Baiduspider-image, Baiduspider-video, Baiduspider-news
- ▸May crawl non-Chinese sites less frequently
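The User-Agent and reverse DNS checks listed above can be sketched in Python. This is a minimal illustration, not an official Baidu procedure: the function names are hypothetical, the case-insensitive "baiduspider" match is an assumption chosen to cover the listed variants, and the forward-confirmation step (resolving the hostname back to the IP) is an extra safeguard that this page does not itself describe.

```python
import socket

# Hostname suffixes this page names for genuine Baiduspider traffic.
BAIDU_SUFFIXES = (".baidu.com", ".baidu.jp")


def claims_baiduspider(user_agent: str) -> bool:
    """True if the User-Agent header claims to be Baiduspider (any variant)."""
    return "baiduspider" in user_agent.lower()


def is_baidu_hostname(hostname: str) -> bool:
    """True if a reverse-DNS hostname ends in .baidu.com or .baidu.jp."""
    host = hostname.rstrip(".").lower()
    return host.endswith(BAIDU_SUFFIXES)


def verify_reverse_dns(ip: str) -> bool:
    """Reverse-resolve the IP, check the suffix, then forward-confirm.

    The forward lookup is an added assumption: it guards against a
    spoofed PTR record that merely claims a baidu.com name.
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_baidu_hostname(hostname):
        return False
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

A request would be treated as genuine only when both claims_baiduspider(ua) and verify_reverse_dns(ip) return True. DNS lookups are slow, so results are usually cached per IP.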
🌐 BAIDUSPIDER KNOWN IP RANGES
180.76.0.0/16
119.63.192.0/21
106.12.0.0/15
182.61.0.0/16
Use these CIDR ranges to verify Baiduspider identity at the network level. Always combine with User-Agent verification for accurate detection.
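A network-level check against these ranges might look like the following sketch, which uses Python's standard ipaddress module. The function name in_baidu_ranges is hypothetical, and the list only reflects the four ranges shown on this page; it is not guaranteed to be complete or current.

```python
import ipaddress

# CIDR ranges as listed on this page; treat as a snapshot, not a full list.
BAIDU_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "180.76.0.0/16",
        "119.63.192.0/21",
        "106.12.0.0/15",
        "182.61.0.0/16",
    )
]


def in_baidu_ranges(ip: str) -> bool:
    """True if the address falls inside one of the listed ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are never treated as Baiduspider.
        return False
    return any(addr in net for net in BAIDU_NETWORKS)
```

Note that a /15 spans two adjacent /16 blocks, so 106.12.0.0/15 covers both 106.12.x.x and 106.13.x.x, while the /21 covers only 119.63.192.0 through 119.63.199.255.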
🔄 CRAWL BEHAVIOR
Systematic crawling with moderate to high request rates. Respects robots.txt. Can be aggressive on large sites. Primarily targets Chinese-language content but crawls globally. Does not render JavaScript.
Baiduspider indexes web content for Baidu Search, the dominant search engine in China with approximately 60% market share. The crawled content also supports Baidu's AI assistant, Ernie Bot.
🤖 ROBOTS.TXT CONFIGURATION
User-agent: Baiduspider
Allow: /

# To block:
# User-agent: Baiduspider
# Disallow: /
Baiduspider respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
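Before deploying a rule, it can help to confirm how a standards-following parser reads it. The sketch below uses Python's urllib.robotparser to evaluate the allow and block variants shown above. The helper name can_baiduspider_fetch and the example.com URL are illustrative assumptions, and Baiduspider's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two variants from the configuration above, as robots.txt text.
ALLOW_ALL = "User-agent: Baiduspider\nAllow: /\n"
BLOCK_ALL = "User-agent: Baiduspider\nDisallow: /\n"


def can_baiduspider_fetch(robots_txt: str, url: str) -> bool:
    """Evaluate robots.txt text for the Baiduspider user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Baiduspider", url)


if __name__ == "__main__":
    page = "https://example.com/page"  # hypothetical URL
    print(can_baiduspider_fetch(ALLOW_ALL, page))
    print(can_baiduspider_fetch(BLOCK_ALL, page))
```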
⚠️ RELATED THREATS
Attempts to override bot instructions via malicious content embedded in web pages
Data Exfiltration: Bots attempting to extract sensitive data from websites, including PII and credentials
Credential Stuffing: Automated login attempts using leaked credentials from data breaches
Aggressive Content Scraping: Bots aggressively scraping content beyond robots.txt limits and ToS