COHERE-AI

LOW RISK · 🔍 SEARCH & AI CRAWLER

Cohere's web crawler for training enterprise AI and retrieval-augmented generation models

ORGANIZATION
Cohere
FIRST SEEN
2023-06
RESPECTS ROBOTS.TXT
✓ YES
DOCUMENTATION
cohere.com

📡 COHERE-AI USER-AGENT STRING

Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com/crawler)

This is the User-Agent header sent by Cohere-AI in HTTP requests. Use this to identify Cohere-AI in your server access logs.
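As a rough illustration of that log check, the sketch below scans access-log lines and counts the paths requested by Cohere-AI. It assumes the common "combined" log format, where the User-Agent is the last double-quoted field; the helper names (`is_cohere_ai`, `count_cohere_hits`) and the sample log lines are hypothetical and made up for the example. A User-Agent header can be spoofed, so treat a match as a hint, not proof.

```python
import re
from collections import Counter

# Assumption: combined log format, where the User-Agent is the
# last double-quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
REQUEST_FIELD = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def is_cohere_ai(user_agent: str) -> bool:
    """Case-insensitive check for the 'cohere-ai' token."""
    return "cohere-ai" in user_agent.lower()


def count_cohere_hits(log_lines):
    """Return a Counter of request paths fetched with a cohere-ai User-Agent."""
    hits = Counter()
    for line in log_lines:
        ua = UA_FIELD.search(line)
        if not ua or not is_cohere_ai(ua.group(1)):
            continue
        req = REQUEST_FIELD.search(line)
        hits[req.group(1) if req else "?"] += 1
    return hits


# Made-up sample lines (documentation IP ranges), for illustration only.
sample = [
    '203.0.113.7 - - [01/Jan/2024:00:00:01 +0000] "GET /docs HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com/crawler)"',
    '198.51.100.2 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 1024 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    print(count_cohere_hits(sample))
```

If your server writes a different log format, only the two regular expressions should need adjusting.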

📋 ABOUT COHERE-AI

Cohere-AI is the web crawler operated by Cohere, a Canadian AI company focused on enterprise natural language processing. Cohere develops AI models for enterprise use cases including text generation (Command), semantic search (Embed), and search result ranking (Rerank). The crawler collects web data to train and improve these enterprise-focused models.

Cohere's web crawling is generally less aggressive than that of consumer-focused AI companies such as OpenAI or Anthropic, which reflects its enterprise focus. The crawler respects robots.txt and crawls at moderate rates. Cohere has been transparent about its data practices and provides opt-out mechanisms for website operators.

NORAD.io tracks Cohere-AI as part of its AI crawler monitoring. While Cohere-AI generates less traffic than GPTBot or ClaudeBot, it is part of the growing ecosystem of AI companies that crawl the web for training data. NORAD provides visibility into AI training crawlers to help site operators make informed data access decisions.

🎯 HOW TO DETECT COHERE-AI

  • User-Agent contains 'cohere-ai'
  • Lower volume than major AI training crawlers (GPTBot, ClaudeBot)
  • Focuses on content-rich pages suitable for NLP training
  • Respects robots.txt directives
  • Enterprise-focused crawler that is less aggressive than consumer AI crawlers

🔄 CRAWL BEHAVIOR

Moderate crawl rates. Respects robots.txt. Focuses on high-quality content pages. Does not execute JavaScript. Follows sitemaps for content discovery.

PURPOSE

Collects web data for training Cohere's enterprise AI models including Command, Embed, and Rerank. Cohere focuses on enterprise NLP and retrieval-augmented generation (RAG) use cases.

🤖 ROBOTS.TXT CONFIGURATION

User-agent: cohere-ai
Disallow: /

# Or allow with restrictions:
# User-agent: cohere-ai
# Disallow: /private/
# Allow: /

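Cohere-AI respects robots.txt directives. Add one of these rule sets to the robots.txt file at the root of your domain: the first blocks the crawler entirely, and the commented-out alternative allows it everywhere except /private/.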
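Before deploying either rule set, you can sanity-check it locally. The sketch below uses Python's standard-library `urllib.robotparser` to evaluate both variants; the `allowed` helper and the `example.com` URLs are hypothetical. Note that this parser applies the first matching rule, which may differ from how a given crawler resolves conflicts, and it only shows what the rules say, not what Cohere-AI actually does.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this page: block everything, or block only /private/.
BLOCK_ALL = """\
User-agent: cohere-ai
Disallow: /
"""

RESTRICTED = """\
User-agent: cohere-ai
Disallow: /private/
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Pass the product token 'cohere-ai', not the full User-Agent header.
    for name, rules in (("block all", BLOCK_ALL), ("restricted", RESTRICTED)):
        for path in ("/blog/post", "/private/report"):
            url = "https://example.com" + path  # hypothetical site
            print(name, path, allowed(rules, "cohere-ai", url))
```

The token is passed instead of the full header because `can_fetch` matches on the part of the agent string before the first `/`, which for the full header would be `Mozilla`.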
