Guide · Updated 2026

Robots.txt for AI Search: What You Must Change in 2026

Robots.txt was designed in the era of traditional search crawlers like Googlebot. But in 2026, dozens of AI crawlers index your site to train models and retrieve content for ChatGPT, Perplexity, Claude, and Google's AI Overviews. Most sites haven't updated their robots.txt since AI crawlers emerged, and many are silently blocking the very bots that decide whether they get cited. Here's how to fix it.

Why robots.txt matters for AI search (and why most sites get it wrong)

Traditional robots.txt guides teach you to allow Googlebot and optionally block scrapers. But since 2023, every major AI system has its own crawler: OpenAI uses GPTBot and ChatGPT-User, Anthropic uses ClaudeBot, Perplexity uses PerplexityBot, Meta uses Meta-ExternalAgent, Apple uses Applebot-Extended, and Google's AI systems use both Googlebot and Google-Extended. A misconfigured robots.txt — including a wildcard disallow (User-agent: * / Disallow: /) — blocks every one of these crawlers from reading and indexing your content. If AI crawlers can't read your site, you cannot appear in AI-generated answers. Period.
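For instance, the single most common misconfiguration is a blanket wildcard block. A generic sketch (not taken from any specific site) looks like this:

```
# This pair of rules blocks EVERY crawler that lacks its own
# user-agent group -- including all the AI bots listed below.
User-agent: *
Disallow: /
```

Any AI crawler without an explicit group of its own inherits these wildcard rules and never sees your content.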

The AI crawlers you need to allow (2026 complete list)

Add these explicit allow rules to your robots.txt to ensure full AI indexing coverage:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

Only restrict pages that genuinely shouldn't be indexed by AI: internal dashboards, checkout flows, admin routes, and private user data. Everything else should be accessible.

How robots.txt affects Google AI Overviews specifically

Google uses two distinct product tokens here: Googlebot, the traditional Search crawler, and Google-Extended, which controls whether your content is used for Gemini model training and grounding. AI Overviews are a feature of Google Search, so inclusion there is governed primarily by Googlebot and standard Search controls; per Google's documentation, Google-Extended does not affect Search inclusion or ranking. Blocking Google-Extended — intentionally or accidentally — still matters: your pages will keep ranking in blue-link results, but your content is excluded from Gemini and related generative products. This is a hidden split that very few sites are aware of. To check: open your robots.txt and search for 'Google-Extended'. If its group contains 'Disallow: /', change it to 'Allow: /'.

llms.txt — the emerging standard for AI-readable site maps

Beyond robots.txt, a new standard called llms.txt is gaining traction. Modelled on robots.txt, it's a markdown file served at yourdomain.com/llms.txt that tells AI systems which pages are most important, what your site is about, and how your content is structured. Early adopters report improved citation rates, reasoning that AI systems can prioritise their crawling more effectively, though the standard is young and the evidence is still largely anecdotal. Format: a brief markdown file listing your key pages with one-line descriptions. OptiAISEO's AEO audit checks for llms.txt presence and helps you generate one.
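A minimal sketch of what such a file might look like (the site name, pages, and descriptions below are hypothetical placeholders, not a prescribed format):

```markdown
# Example Company

> One-sentence summary of what the site offers and who it is for.

## Key pages

- [Pricing](https://yourdomain.com/pricing): Plans and feature comparison
- [AEO Guide](https://yourdomain.com/guides/aeo): How answer engine optimisation works
- [Docs](https://yourdomain.com/docs): Product documentation and reference
```

The convention from the llms.txt proposal is an H1 title, a short blockquote summary, and sections of links with one-line descriptions.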

How to audit your robots.txt for AI readiness (step by step)

Step 1: Open yourdomain.com/robots.txt in a browser.
Step 2: Look for any 'Disallow: /' rules under 'User-agent: *' — these block every crawler that lacks its own user-agent group, including AI bots.
Step 3: Search for each AI crawler name listed above. If a crawler isn't named, it inherits the wildcard rules.
Step 4: Add an explicit user-agent group with 'Allow: /' for every AI crawler you want to index your site.
Step 5: Re-validate using the robots.txt report in Google Search Console (the standalone robots.txt tester was retired in 2023), and check your OptiAISEO AEO audit score — it now includes an AI crawler accessibility check.
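Steps 2 through 4 can also be automated. As a sketch, Python's standard-library urllib.robotparser can evaluate a robots.txt against each AI user agent (the file content below is a made-up example; in practice you would fetch your live robots.txt):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only --
# replace with a fetch of https://yourdomain.com/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""

AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Google-Extended", "Applebot-Extended",
    "Meta-ExternalAgent",
]

def check_ai_access(robots_txt: str, url: str = "https://example.com/") -> dict:
    """Map each AI crawler name to True (allowed) or False (blocked) for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_CRAWLERS}

if __name__ == "__main__":
    for agent, allowed in check_ai_access(ROBOTS_TXT).items():
        print(f"{agent}: {'allowed' if allowed else 'BLOCKED'}")
```

Note that crawlers without a named group fall back to the wildcard rules, which is exactly the inheritance behaviour Step 3 warns about.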

Ready to put this into practice?

Audit your robots.txt for AI search

No account required

Frequently asked questions

Does blocking AI crawlers hurt my Google rankings?
Blocking GPTBot and ClaudeBot doesn't affect Google blue-link rankings — Googlebot is separate. Per Google's documentation, blocking Google-Extended doesn't affect Search rankings either, but it does keep your content out of Gemini training and grounding, a growing source of AI visibility.
Should I allow all AI crawlers to train on my content?
This is a business decision. Allowing training crawlers such as GPTBot means your content may be used to train future AI models. Allowing retrieval crawlers (ChatGPT-User, OAI-SearchBot, PerplexityBot) means your content can be cited in real-time answers. Most SEO-focused sites allow both. Publishers with proprietary content sometimes restrict training but allow retrieval.
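As an illustration, a publisher taking that middle path might use rules like these (GPTBot is OpenAI's training crawler, while ChatGPT-User and OAI-SearchBot handle real-time retrieval; the exact split you choose is a policy decision, not a fixed recipe):

```
# Opt out of model training
User-agent: GPTBot
Disallow: /

# Allow real-time retrieval and citation
User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```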
What is Google-Extended and should I allow it?
Google-Extended is Google's product token for its generative AI products: it controls whether your content is used to improve Gemini (formerly Bard) and related models, including grounding of their answers. Per Google's documentation it doesn't affect Search inclusion or ranking, but blocking it keeps your content out of Gemini responses. Unless you have a specific legal or licensing reason to block it, allowing Google-Extended is strongly recommended for organic AI visibility.
How do I check if my robots.txt is blocking AI crawlers?
Visit yourdomain.com/robots.txt and look for wildcard rules (User-agent: *) with Disallow: / or Disallow: rules covering your key pages. Then check whether GPTBot, ClaudeBot, and PerplexityBot appear with explicit Allow rules. OptiAISEO's AEO audit automates this check and flags any AI crawler blocks.
