AI2BOT

LOW RISK · 🔍 SEARCH & AI CRAWLER

Allen Institute for AI's crawler for academic AI research and open models

ORGANIZATION
Allen Institute for AI
FIRST SEEN
2023-09
RESPECTS ROBOTS.TXT
✓ YES
DOCUMENTATION
allenai.org

📡 AI2BOT USER-AGENT STRING

Mozilla/5.0 (compatible; AI2Bot/1.0; +https://allenai.org/crawler)

This is the User-Agent header sent by AI2Bot in HTTP requests. Use this to identify AI2Bot in your server access logs.
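As a minimal sketch of that log check, the Python below filters access-log lines by User-Agent. It assumes the server writes the common "combined" log format, where the User-Agent is the last quoted field on each line; the helper names `is_ai2bot` and `ai2bot_lines` are illustrative, not part of any tool mentioned here. A User-Agent header can be set by any client, so a match is a hint rather than proof of origin.

```python
import re

# Assumption: "combined" log format, where the User-Agent is the final quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_ai2bot(user_agent: str) -> bool:
    """Case-insensitive substring match on the 'AI2Bot' token."""
    return "ai2bot" in user_agent.lower()


def ai2bot_lines(log_lines):
    """Yield log lines whose trailing quoted field looks like an AI2Bot User-Agent."""
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match and is_ai2bot(match.group(1)):
            yield line
```

For a real log, passing an open file object works because files iterate line by line, for example `for hit in ai2bot_lines(open("access.log")): print(hit, end="")`, where `access.log` is a placeholder path.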

📋 ABOUT AI2BOT

AI2Bot is the web crawler operated by the Allen Institute for AI (AI2), a non-profit research institute founded by Paul Allen. AI2 develops open-source AI models and tools, including OLMo, a family of language models released together with their weights, training code, and training data. AI2Bot crawls web content to build training datasets for these open research models.

AI2's approach to web data collection emphasizes transparency and reproducibility. Unlike many commercial AI developers, AI2 publishes detailed information about its training datasets (such as Dolma) and releases its models as open source. This transparency extends to AI2Bot's crawling practices, which are documented and designed to respect site operators' preferences.

NORAD.io monitors AI2Bot as part of the growing ecosystem of AI training crawlers. Although AI2Bot generates less traffic than commercial AI crawlers, tracking its activity helps site operators see the full range of organizations using their content for AI development, including the academic and open-source AI community.

🎯 HOW TO DETECT AI2BOT

  • User-Agent contains 'AI2Bot'
  • Lower crawl volume than commercial AI crawlers
  • Batch crawling patterns — periodic rather than continuous
  • Focuses on English-language content
  • Associated with academic/non-profit AI research
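The volume and batch-pattern signals above can be inspected by counting AI2Bot requests per day. The following is a rough sketch that assumes combined-format logs with timestamps such as `[10/Oct/2024:13:55:36 +0000]` and English month abbreviations; `daily_ai2bot_hits` is a hypothetical helper name. Days with many hits separated by quiet gaps would be consistent with periodic batch crawling, though this page gives no thresholds for what counts as a batch.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: the timestamp is the first bracketed field, as in combined log format.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")


def daily_ai2bot_hits(log_lines):
    """Return {ISO date: request count} for lines mentioning AI2Bot."""
    counts = Counter()
    for line in log_lines:
        if "ai2bot" not in line.lower():
            continue
        match = TIMESTAMP.search(line)
        if not match:
            continue
        try:
            ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines whose timestamp does not fit the assumed layout
        counts[ts.date().isoformat()] += 1
    return dict(sorted(counts.items()))
```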

🔄 CRAWL BEHAVIOR

AI2Bot crawls at moderate rates and focuses on research-relevant content. It respects robots.txt, crawls in periodic batches rather than continuously, and primarily targets English-language content.

PURPOSE

AI2Bot collects web data for training open AI models developed by the Allen Institute for AI (AI2), including OLMo and other open-source language models. AI2 focuses on open, reproducible AI research.

🤖 ROBOTS.TXT CONFIGURATION

User-agent: AI2Bot
Allow: /

# AI2 produces open-source AI models
# To block:
# User-agent: AI2Bot
# Disallow: /

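AI2Bot respects robots.txt directives. The rules above allow AI2Bot to crawl the whole site; to block it instead, replace them with the commented-out lines (without the leading "#"). Place the rules in the robots.txt file at the root of your domain.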
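Before deploying a rule set, it can be checked locally. The sketch below uses Python's standard `urllib.robotparser` to test whether a given robots.txt body would permit the `AI2Bot` user agent to fetch a URL. The helper name `ai2bot_allowed` and the `example.com` URL are placeholders, and the result reflects how Python's parser reads the rules, which may differ in edge cases from how AI2Bot itself interprets them.

```python
from urllib.robotparser import RobotFileParser


def ai2bot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets 'AI2Bot' fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("AI2Bot", url)


# The two configurations described on this page.
ALLOW_RULES = "User-agent: AI2Bot\nAllow: /\n"
BLOCK_RULES = "User-agent: AI2Bot\nDisallow: /\n"

if __name__ == "__main__":
    print(ai2bot_allowed(ALLOW_RULES, "https://example.com/page"))
    print(ai2bot_allowed(BLOCK_RULES, "https://example.com/page"))
```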
