META-EXTERNALAGENT
MEDIUM RISK
🔍 SEARCH & AI CRAWLER
Meta's AI training crawler: collects web data for Meta's LLaMA and other AI models
📡 META-EXTERNALAGENT USER-AGENT STRING
Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; +https://developers.facebook.com/docs/sharing/webmasters/crawler)
This is the User-Agent header sent by Meta-ExternalAgent in HTTP requests. Use this to identify Meta-ExternalAgent in your server access logs.
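A minimal sketch of scanning an access log for this string, assuming your web server writes the Apache/Nginx "combined" log format. The regex, the `is_meta_externalagent` helper, and the sample line are illustrative only; they are not published by Meta or NORAD.

```python
import re

# Apache/Nginx "combined" log format: the User-Agent is the last quoted field.
# Assumption: your server uses this format; adjust the pattern if it does not.
COMBINED_LOG = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def is_meta_externalagent(log_line: str) -> bool:
    """Return True if the line's User-Agent field names Meta-ExternalAgent."""
    match = COMBINED_LOG.match(log_line.strip())
    if match is None:
        return False  # not a combined-format line, so nothing to check
    # Match case-insensitively to tolerate variations in capitalization.
    return "meta-externalagent" in match.group("agent").lower()

# Hypothetical sample line, for illustration only.
sample = ('69.63.176.10 - - [10/Oct/2024:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Meta-ExternalAgent/1.0; '
          '+https://developers.facebook.com/docs/sharing/webmasters/crawler)"')
print(is_meta_externalagent(sample))  # True
```

Matching on the `Meta-ExternalAgent` token rather than the full string keeps the check working if the version number changes.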
📋 ABOUT META-EXTERNALAGENT
Meta-ExternalAgent is Meta's dedicated AI training crawler, used to collect web content for training the LLaMA family of large language models and other Meta AI products. This crawler is distinct from FacebookBot (facebookexternalhit), which fetches pages for link previews on Facebook and Instagram.
Meta introduced Meta-ExternalAgent as a separate User-Agent token so that website operators can control AI training data collection independently of social media link previews. This mirrors the approach taken by Google (the Google-Extended robots.txt token vs Googlebot) and OpenAI (GPTBot vs ChatGPT-User). Blocking Meta-ExternalAgent stops this crawler from collecting your pages for LLaMA training from then on, without affecting Facebook or Instagram link previews; it does not remove content collected before the block took effect.
NORAD.io classifies Meta-ExternalAgent as medium risk due to the scale of Meta's AI training operations and the volume of data collected. NORAD tracks Meta-ExternalAgent separately from FacebookBot to help site operators implement precise access policies — allowing social media functionality while controlling AI training data access.
🎯 HOW TO DETECT META-EXTERNALAGENT
- User-Agent contains 'Meta-ExternalAgent'
- Distinct from 'facebookexternalhit' (link preview bot)
- Crawls systematically rather than on demand
- Shares Meta's IP infrastructure with FacebookBot, so the IP address alone cannot tell the two apart
- Blocking Meta-ExternalAgent does not affect Facebook/Instagram link previews
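Because the AI-training crawler and the link-preview bot are distinguished by their User-Agent tokens, a classifier can check for each token separately. A minimal sketch; the `MetaAgent` enum, `classify_meta_agent`, and `should_serve` are hypothetical names, and the policy of refusing only the AI-training crawler is one example, not guidance from Meta.

```python
from enum import Enum

class MetaAgent(Enum):
    AI_TRAINING = "Meta-ExternalAgent"    # AI training crawler
    LINK_PREVIEW = "facebookexternalhit"  # link-preview fetcher
    OTHER = "other"                       # anything else

def classify_meta_agent(user_agent: str) -> MetaAgent:
    """Map a raw User-Agent header to one of the Meta agent categories."""
    ua = user_agent.lower()  # case-insensitive token match
    if "meta-externalagent" in ua:
        return MetaAgent.AI_TRAINING
    if "facebookexternalhit" in ua:
        return MetaAgent.LINK_PREVIEW
    return MetaAgent.OTHER

def should_serve(user_agent: str) -> bool:
    """Hypothetical policy: refuse AI-training fetches, serve everything else."""
    return classify_meta_agent(user_agent) is not MetaAgent.AI_TRAINING

print(classify_meta_agent("facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)").name)
# LINK_PREVIEW
```

Since the two bots share IP space, a header-based split like this is what lets link previews keep working while training fetches are refused.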
🌐 META-EXTERNALAGENT KNOWN IP RANGES
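- 69.63.176.0/20
- 66.220.144.0/20

Use these CIDR ranges to verify Meta-ExternalAgent identity at the network level. Meta also operates other address ranges, so a request from outside these two is unverified rather than conclusively spoofed. Always combine the IP check with User-Agent verification for accurate detection.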
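A minimal sketch of combining the User-Agent claim with the two ranges listed above, using Python's standard `ipaddress` module. The function names and the three result labels are hypothetical; because the range list may be incomplete, a miss is reported as "unverified", not as spoofed.

```python
import ipaddress

# The two CIDR ranges listed on this page. Meta operates other ranges too,
# so a miss here means "unverified", not proof of spoofing.
META_RANGES = [ipaddress.ip_network(cidr) for cidr in ("69.63.176.0/20", "66.220.144.0/20")]

def ip_in_meta_ranges(ip: str) -> bool:
    """True if ip parses and falls inside one of META_RANGES (IPv6 never matches)."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed address in the log
    return any(addr in net for net in META_RANGES)

def check_meta_externalagent(ip: str, user_agent: str) -> str:
    """Combine the User-Agent claim with the network-level check."""
    if "meta-externalagent" not in user_agent.lower():
        return "not-claimed"  # request does not claim to be Meta-ExternalAgent
    if ip_in_meta_ranges(ip):
        return "verified"     # header claim and source network agree
    return "unverified"       # claims the token from outside the listed ranges

print(check_meta_externalagent("66.220.150.7", "Meta-ExternalAgent/1.0"))  # verified
```

The IP check alone only shows a request came from Meta's network; the User-Agent check is what separates Meta-ExternalAgent from facebookexternalhit on the same addresses.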
🔄 CRAWL BEHAVIOR
Systematic crawling for AI training data. Moderate to high request rates. Separate from FacebookBot (link previews). Respects robots.txt with its own User-Agent token. Does not execute JavaScript.
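"Moderate to high" is relative, so it can help to measure the rate on your own site. One approach is to bucket the crawler's requests by minute and take the busiest bucket. A minimal sketch, assuming you have already filtered your logs down to Meta-ExternalAgent requests and extracted their Common Log Format timestamps; `peak_requests_per_minute` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime

def peak_requests_per_minute(timestamps: list[str]) -> int:
    """timestamps: Common Log Format times such as '10/Oct/2024:13:55:36 +0000'."""
    buckets: Counter = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        buckets[t.replace(second=0)] += 1  # truncate to the start of the minute
    return max(buckets.values(), default=0)  # 0 when there are no requests

peak = peak_requests_per_minute([
    "10/Oct/2024:13:55:01 +0000",
    "10/Oct/2024:13:55:30 +0000",
    "10/Oct/2024:13:56:00 +0000",
])
print(peak)  # 2
```

A per-minute peak is a simple choice; a sliding window or per-path breakdown would show bursts in more detail.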
Collects web content for training Meta's AI models, including LLaMA, Llama 2, Llama 3, and Meta AI assistant products.
🤖 ROBOTS.TXT CONFIGURATION
# Block AI training but keep Facebook link previews:
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: facebookexternalhit
Allow: /
Meta-ExternalAgent respects robots.txt directives. Add this to your robots.txt file at the root of your domain.
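Before deploying the rules, you can check that they behave as intended with Python's standard-library `urllib.robotparser`. This sketch parses the example rules from a string instead of fetching them; `example.com` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block AI training but keep Facebook link previews:
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: facebookexternalhit
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

url = "https://example.com/articles/post-1"
print(parser.can_fetch("Meta-ExternalAgent", url))   # False
print(parser.can_fetch("facebookexternalhit", url))  # True
```

Pass `can_fetch` the product token (`Meta-ExternalAgent`), not the full User-Agent header: CPython's parser keeps only the part of the name before the first "/", so `"Mozilla/5.0 (compatible; ...)"` matches neither group and the call returns True.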
⚠️ RELATED THREATS
- Prompt Injection: Attempts to override bot instructions via malicious content embedded in web pages
- Data Exfiltration: Bots attempting to extract sensitive data from websites, including PII and credentials
- Credential Stuffing: Automated login attempts using credentials leaked in data breaches
- Aggressive Content Scraping: Bots scraping content aggressively, beyond robots.txt limits and terms of service
PROTECT YOUR WEBSITE
Deploy SiteTrust to monitor and control AI bot access to your site with the Agent Passport Standard.