Independent comparison — not affiliated with any listed provider
Last updated: April 2026

AI Crawler Directory — Every Bot, What It Does, How to Allow It

Complete directory of every known AI crawler: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and more. Includes what each crawler does, which platform it serves, and recommended robots.txt configuration.

Oliver Mackman
AI Search Analyst

This directory covers 12 widely recognised AI crawlers operated by OpenAI, Anthropic, Google, Perplexity, Meta, Apple, Amazon, ByteDance, and Common Crawl. Each serves a different purpose, from training AI models to powering real-time search, and blocking the wrong one can remove your site from that platform's AI search results entirely. Below you will find what each crawler does and whether you should allow it.

Complete AI crawler directory

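| Crawler | Platform | Purpose | robots.txt token | Recommended |
| --- | --- | --- | --- | --- |
| GPTBot | OpenAI | Collects training data for GPT models | GPTBot | Allow (with caution) |
| OAI-SearchBot | OpenAI | Real-time search indexing for ChatGPT search | OAI-SearchBot | Allow |
| ChatGPT-User | OpenAI | Fetches pages when users share URLs in ChatGPT or use custom GPTs | ChatGPT-User | Allow |
| ClaudeBot | Anthropic | Training data and web retrieval for Claude | ClaudeBot | Allow |
| PerplexityBot | Perplexity | Indexing for Perplexity AI search | PerplexityBot | Allow |
| Google-Extended | Google | Control token, not a separate crawler: governs whether content crawled by Google is used to train Gemini models and to ground Gemini responses; does not affect Google Search or AI Overviews | Google-Extended | Allow |
| GoogleOther | Google | Generic crawler used by Google product teams, for example for research and development | GoogleOther | Allow |
| CCBot | Common Crawl | Builds the open Common Crawl dataset, which many AI models are trained on | CCBot | Optional |
| Bytespider | ByteDance / TikTok | Training data for ByteDance AI models | Bytespider | Optional |
| Amazonbot | Amazon | Indexing for Alexa answers and Rufus shopping AI | Amazonbot | Allow (ecommerce) |
| FacebookBot | Meta | Crawls public pages to improve Meta's AI language models (link previews come from a separate crawler, facebookexternalhit) | FacebookBot | Allow |
| Applebot-Extended | Apple | Control token, not a separate crawler: governs whether content crawled by Applebot is used to train the models behind Apple Intelligence | Applebot-Extended | Allow |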

Understanding the three types of AI crawler

Training crawlers

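These crawlers collect content to train AI models. Blocking them stops future crawls from being used in model training (it does not retroactively remove content already collected) and does not affect current AI search results. Examples: GPTBot, CCBot, Bytespider, and Google-Extended (which also governs grounding in Gemini, so blocking it can reduce Gemini visibility).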

Search indexing crawlers

These crawlers index your content for real-time AI search. Blocking them removes your site from that platform's AI search results entirely. Examples: OAI-SearchBot, PerplexityBot, Amazonbot.

User-triggered crawlers

These crawlers fetch pages when a user specifically requests it (e.g., sharing a URL in ChatGPT). Blocking them prevents the AI from reading pages users share. Example: ChatGPT-User.

Recommended robots.txt configuration

Two configurations follow. Option 1 allows every crawler in the directory except the optional CCBot and Bytespider, for maximum AI search visibility. Option 2 keeps the crawlers that power AI search results while blocking training-only crawlers, for sites that prefer not to contribute to model training.

Option 1: Allow all AI crawlers (recommended)

# AI Search Crawlers — Allow All
# This ensures maximum AI search visibility

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: Applebot-Extended
Allow: /
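One side effect of Option 1 is easy to miss. Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys only the group that names it, so adding a User-agent: GPTBot group with Allow: / replaces, rather than adds to, any rules you already have under User-agent: * for that crawler. The Python sketch below uses the standard library's urllib.robotparser to show this, assuming a hypothetical existing Disallow: /admin/ rule and the placeholder domain example.com. The helper name check is illustrative, and robotparser only approximates how each operator parses robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a site-wide private area, plus one of the
# per-bot "Allow: /" groups from Option 1.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""


def check(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the crawler token `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/admin/settings"
    # GPTBot matches its own group, so the "*" Disallow no longer applies to it.
    print("GPTBot:", check(ROBOTS_TXT, "GPTBot", url))
    # CCBot has no named group, so it falls back to "User-agent: *".
    print("CCBot:", check(ROBOTS_TXT, "CCBot", url))
```

If you rely on User-agent: * to keep crawlers out of private paths, repeat those Disallow lines inside each named AI crawler group.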

Option 2: Allow search crawlers, block training-only crawlers

# AI Search Crawlers — Allow search, block training
# Maintains AI search visibility while limiting training use

# ALLOW — these power AI search results
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Amazonbot
Allow: /

# Google-Extended is a control token, not a crawler: it governs Gemini training and grounding
User-agent: Google-Extended
Allow: /

User-agent: FacebookBot
Allow: /

# BLOCK — these are training-only
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
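When Option 2 is maintained by hand, it is easy for a token to slip into the wrong section. The sketch below is a hypothetical helper, not a standard tool: render_robots turns the allow and block lists from Option 2 into robots.txt groups (without the comments), and verify checks each token with Python's urllib.robotparser against the placeholder URL https://example.com/.

```python
from urllib.robotparser import RobotFileParser

# Token lists taken from Option 2 above; adjust them to match your policy.
ALLOW = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
         "Amazonbot", "Google-Extended", "FacebookBot"]
BLOCK = ["GPTBot", "CCBot", "Bytespider"]


def render_robots(allow: list[str], block: list[str]) -> str:
    """Build one robots.txt group per token, separated by blank lines."""
    groups = [f"User-agent: {token}\nAllow: /" for token in allow]
    groups += [f"User-agent: {token}\nDisallow: /" for token in block]
    return "\n\n".join(groups) + "\n"


def verify(robots_txt: str, tokens: list[str],
           url: str = "https://example.com/") -> dict[str, bool]:
    """Report whether each token may fetch `url` under robots_txt.

    robotparser matches on the product token (e.g. "GPTBot"), not on a
    full User-Agent header, so pass bare tokens here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


if __name__ == "__main__":
    robots_txt = render_robots(ALLOW, BLOCK)
    print(robots_txt)
    for token, allowed in verify(robots_txt, ALLOW + BLOCK).items():
        print(f"{token:16} {'allowed' if allowed else 'blocked'}")
```

Generating the file from two lists keeps each token in exactly one place, which makes a misplaced entry easier to spot in review.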

Key decisions and trade-offs

Should you block GPTBot?

GPTBot collects training data for future GPT models. Blocking it will not remove you from current ChatGPT search results, which rely on OAI-SearchBot. However, allowing GPTBot means your content may influence how future GPT models understand your industry and brand, which could benefit long-term AI visibility. Many AI search agencies recommend allowing it.

Does blocking Google-Extended affect Google Search?

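No. Google-Extended is a robots.txt control token, separate from Googlebot, and has no crawler of its own. Blocking it stops Google from using your content to train Gemini models and to ground Gemini responses, but it does not affect your Google Search rankings or your eligibility for AI Overviews, which are governed by Googlebot. The trade-off is potentially less visibility in Gemini responses.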
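Both answers above rest on the same rule: robots.txt groups apply per token, so disallowing a training token leaves the search crawler from the same company untouched. A minimal Python sketch with urllib.robotparser makes this concrete for a hypothetical robots.txt that blocks GPTBot and Google-Extended, using the placeholder URL https://example.com/pricing. It models robots.txt matching only; how Google applies the Google-Extended token is decided on Google's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of training only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/pricing"
for token in ["GPTBot", "OAI-SearchBot", "Google-Extended", "Googlebot"]:
    # Tokens with no matching group (and no "*" group) are allowed by default.
    print(token, parser.can_fetch(token, url))
```

Only the two named tokens are refused; OAI-SearchBot and Googlebot fall through to the default and remain allowed.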

What about CCBot and Common Crawl?

Common Crawl is an open dataset used to train many AI models. Blocking CCBot prevents your content from appearing in the Common Crawl dataset. The trade-off: many smaller AI models and research projects use Common Crawl data, so blocking it reduces your content's reach across the broader AI ecosystem.

How to check which AI crawlers are visiting your site

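Check your server access logs for the user-agent strings listed in the directory. Many hosting platforms and CDNs (Cloudflare, Vercel, Netlify) also provide bot traffic reports that can identify AI crawler visits. Google Search Console's Crawl Stats report covers only Google's own crawlers, not third-party AI crawlers, and Google-Extended never appears in logs because it has no user-agent string of its own. User-agent strings can be spoofed, so verify unexpected traffic against the IP ranges or reverse-DNS checks that operators such as OpenAI and Google publish.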
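For a first pass over a raw access log, a short Python script can tally requests whose User-Agent header contains one of the tokens from the directory table. This is a sketch under stated assumptions: it expects the Apache/Nginx combined log format (User-Agent as the last quoted field), matches tokens case-insensitively as substrings, and the function name count_ai_crawlers is hypothetical. It counts self-declared user agents only, and it skips Google-Extended and Applebot-Extended because neither sends requests under its own name.

```python
import re
import sys
from collections import Counter

# User-Agent substrings for the crawlers in the directory. Google-Extended and
# Applebot-Extended are left out: they are robots.txt tokens, not request agents.
AI_CRAWLER_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
    "GoogleOther", "CCBot", "Bytespider", "Amazonbot", "FacebookBot",
]

# In the combined log format the User-Agent is the last double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawlers(lines):
    """Count log lines per AI crawler token, matching case-insensitively."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # not a combined-format line
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_ai_crawlers.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for token, hits in count_ai_crawlers(log).most_common():
            print(f"{token:16} {hits}")
```

Run it against a day or a week of logs and compare the totals with your CDN's bot report; large gaps usually point to a different log format or to traffic that needs IP verification.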

For more on configuring your site for AI crawlers, see our guide on robots.txt configuration for AI search.

Frequently asked questions

How many AI crawlers are there?

This directory tracks 12 AI crawlers from major platforms: three from OpenAI (GPTBot, OAI-SearchBot, ChatGPT-User), two from Google (Google-Extended, GoogleOther), and one each from Anthropic, Perplexity, Amazon, Meta, Apple, ByteDance, and Common Crawl. Other operators run crawlers too, and new ones are announced periodically as AI platforms expand.

Will blocking AI crawlers hurt my SEO?

Blocking AI crawlers does not directly affect traditional SEO rankings. Google has confirmed that blocking Google-Extended does not impact Google Search rankings. However, blocking AI search crawlers (like OAI-SearchBot or PerplexityBot) will prevent your content from appearing in those AI platforms' search results — which is an increasingly important traffic source.

Should I allow all AI crawlers?

For maximum AI search visibility, yes. Allowing all 12 crawlers gives your content the best chance of appearing across ChatGPT, Gemini, Perplexity, Claude, and other AI platforms. The main reason to block specific crawlers is concern about your content being used for AI model training; in that case, block training crawlers (GPTBot, CCBot, Bytespider) while allowing search crawlers (OAI-SearchBot, PerplexityBot).


Oliver Mackman

AI Search Analyst, SEOCompare

Oliver leads SEOCompare's editorial and comparison research. With over a decade in digital marketing, he oversees agency evaluation, tool testing, and AI search data analysis.

Last reviewed: 7 April 2026
