META-EXTERNALAGENT

MEDIUM RISK | 🔍 SEARCH & AI CRAWLER

Meta's AI training crawler: collects web data for Meta's LLaMA models and other AI products

ORGANIZATION
Meta
FIRST SEEN
2024-01
RESPECTS ROBOTS.TXT
✓ YES
DOCUMENTATION
developers.facebook.com

📡 META-EXTERNALAGENT USER-AGENT STRING

Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; +https://developers.facebook.com/docs/sharing/webmasters/crawler)

This is the User-Agent header sent by Meta-ExternalAgent in HTTP requests. Use this to identify Meta-ExternalAgent in your server access logs.

📋 ABOUT META-EXTERNALAGENT

Meta-ExternalAgent is Meta's dedicated AI training crawler, used to collect web content for training the LLaMA family of large language models and other Meta AI products. This crawler is distinct from FacebookBot (facebookexternalhit), which fetches pages for link previews on Facebook and Instagram.

Meta introduced Meta-ExternalAgent as a separate User-Agent token to give website operators independent control over AI training data collection versus social media link previews. This mirrors the approach taken by Google (Google-Extended vs Googlebot) and OpenAI (GPTBot vs ChatGPT-User). Blocking Meta-ExternalAgent prevents your content from being used in LLaMA training without affecting Facebook or Instagram link preview functionality.

NORAD.io classifies Meta-ExternalAgent as medium risk due to the scale of Meta's AI training operations and the volume of data collected. NORAD tracks Meta-ExternalAgent separately from FacebookBot to help site operators implement precise access policies — allowing social media functionality while controlling AI training data access.

🎯 HOW TO DETECT META-EXTERNALAGENT

  • User-Agent contains 'Meta-ExternalAgent'
  • Distinct from 'facebookexternalhit' (link preview bot)
  • Crawls systematically rather than on-demand
  • Shares Meta's IP infrastructure with FacebookBot
  • Blocking Meta-ExternalAgent does not affect Facebook/Instagram link previews
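
As a rough illustration of the checks listed above, the following Python sketch sorts User-Agent strings into AI training traffic, link preview traffic, and everything else. The function names and category labels are hypothetical, and case-insensitive substring matching is an assumption rather than documented Meta behavior.

```python
from collections import Counter

# Tokens taken from the detection list above; matching is case-insensitive
# by assumption, since header casing can vary between clients and proxies.
AI_TRAINING_TOKEN = "meta-externalagent"
LINK_PREVIEW_TOKEN = "facebookexternalhit"


def classify_user_agent(user_agent: str) -> str:
    """Return a hypothetical category label for one User-Agent string."""
    ua = user_agent.lower()
    if AI_TRAINING_TOKEN in ua:
        return "ai-training"
    if LINK_PREVIEW_TOKEN in ua:
        return "link-preview"
    return "other"


def count_by_category(user_agents):
    """Tally how many requests fall into each category."""
    return Counter(classify_user_agent(ua) for ua in user_agents)


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; "
        "+https://developers.facebook.com/docs/sharing/webmasters/crawler)",
        "facebookexternalhit/1.1",
        "Mozilla/5.0 (X11; Linux x86_64)",
    ]
    print(count_by_category(sample))
```

A User-Agent header is trivial to spoof, so a match here only indicates that a request claims to be Meta-ExternalAgent.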

🌐 META-EXTERNALAGENT KNOWN IP RANGES

69.63.176.0/20
66.220.144.0/20

Use these CIDR ranges to verify Meta-ExternalAgent identity at the network level. Always combine with User-Agent verification for accurate detection.
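
A minimal sketch of that combined check, using Python's standard `ipaddress` module. The two networks are the ones listed on this page and may be incomplete or out of date; the function name is hypothetical.

```python
import ipaddress

# CIDR ranges as listed on this page (assumption: the list may not be
# exhaustive and can change over time).
KNOWN_RANGES = [
    ipaddress.ip_network("69.63.176.0/20"),
    ipaddress.ip_network("66.220.144.0/20"),
]


def is_verified_meta_externalagent(remote_ip: str, user_agent: str) -> bool:
    """True only if both the User-Agent token and the source IP match."""
    if "meta-externalagent" not in user_agent.lower():
        return False
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address in the log or request: treat as unverified.
        return False
    return any(addr in network for network in KNOWN_RANGES)
```

A request that carries the Meta-ExternalAgent token but arrives from outside these ranges fails the check, which is a common sign of a spoofed User-Agent.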

🔄 CRAWL BEHAVIOR

Systematic crawling for AI training data. Moderate to high request rates. Separate from FacebookBot (link previews). Respects robots.txt with its own User-Agent token. Does not execute JavaScript.

PURPOSE

Collects web content for training Meta's AI models, including LLaMA, Llama 2, and Llama 3, and for Meta AI assistant products. It is distinct from FacebookBot, which generates link previews.

🤖 ROBOTS.TXT CONFIGURATION

# Block AI training but keep Facebook link previews:
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: facebookexternalhit
Allow: /

Meta-ExternalAgent respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
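
Before deploying, the rules can be sanity-checked locally with Python's standard `urllib.robotparser`. This sketch parses the same rules from a string and asks which agents may fetch a URL; the `example.com` URL and the helper name are placeholders. It passes the bare token rather than the full User-Agent header because that is how this parser matches agents.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the configuration above.
ROBOTS_TXT = """\
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: facebookexternalhit
Allow: /
"""


def is_allowed(agent_token: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Check whether the given agent token may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    for token in ("Meta-ExternalAgent", "facebookexternalhit"):
        print(token, is_allowed(token, url))
```

This only confirms how the rules parse. It does not prove that any particular crawler honors them.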
