Robots.txt for AI Search: What You Must Change in 2026
Robots.txt was designed for Googlebot. But in 2026, dozens of AI crawlers visit your site to gather training data and retrieve content for ChatGPT, Perplexity, Claude, and Google's AI Overviews. Most sites haven't updated their robots.txt since AI crawlers emerged, and many are silently blocking the very bots that decide whether they get cited. Here's how to fix it.
Why robots.txt matters for AI search (and why most sites get it wrong)
Traditional robots.txt guides teach you to allow Googlebot and optionally block scrapers. But since 2023, every major AI system has its own crawler: OpenAI uses GPTBot and ChatGPT-User, Anthropic uses ClaudeBot, Perplexity uses PerplexityBot, Meta uses Meta-ExternalAgent, Apple uses Applebot-Extended, and Google's AI systems use both Googlebot and Google-Extended. A misconfigured robots.txt — including a wildcard disallow (User-agent: * / Disallow: /) — blocks every one of these crawlers from reading and indexing your content. If AI crawlers can't read your site, you cannot appear in AI-generated answers. Period.
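You can see this effect directly with Python's standard-library robots.txt parser. A minimal sketch, assuming a hypothetical wildcard-only robots.txt (the URLs are placeholders):

```python
from urllib import robotparser

# Hypothetical robots.txt containing only a wildcard disallow: every
# crawler that lacks its own rule group is locked out, AI bots included.
rules = """\
User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Both an AI crawler and Googlebot are refused.
print(rp.can_fetch("GPTBot", "https://example.com/article"))     # False
print(rp.can_fetch("Googlebot", "https://example.com/article"))  # False
```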
The AI crawlers you need to allow (2026 complete list)
Add these explicit allow rules to your robots.txt to ensure full AI indexing coverage:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Meta-ExternalAgent
Allow: /

Only restrict pages that genuinely shouldn't be indexed by AI: internal dashboards, checkout flows, admin routes, and private user data. Everything else should be accessible.
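One subtlety worth verifying: explicit per-bot groups override the wildcard group, so you can keep a restrictive 'User-agent: * / Disallow: /' for unknown scrapers while opening the door to named AI crawlers. A quick sketch with Python's robots.txt parser (bot names from the list above; example.com is a placeholder):

```python
from urllib import robotparser

# The wildcard blocks everything by default; named groups carve out exceptions.
rules = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

for bot in ("GPTBot", "PerplexityBot", "RandomScraper"):
    verdict = "allowed" if rp.can_fetch(bot, "https://example.com/") else "blocked"
    print(f"{bot}: {verdict}")
```

The named AI crawlers come back allowed while the unlisted scraper falls through to the wildcard block.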
How robots.txt affects Google AI Overviews specifically
Google separates classic search crawling from AI access. Googlebot handles traditional indexing, while Google-Extended (a product token you control in robots.txt rather than a crawler with its own user agent) governs whether Google's AI products can use your content. If you've blocked Google-Extended in your robots.txt, intentionally or accidentally, your pages can still rank in blue-link results but risk being left out of Google's AI-generated answers. This is a hidden split that very few sites are aware of. To check: open your robots.txt and search for 'Google-Extended'. If its group contains 'Disallow: /', change it to 'Allow: /' immediately.
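The manual check above can be scripted as a simple line scan. This is a sketch that only looks for an explicit Google-Extended group (a full robots.txt parser would also handle multi-agent groups and wildcard rules); the function name is our own:

```python
import re

def google_extended_disallowed(robots_txt: str) -> bool:
    """Return True if an explicit Google-Extended group contains 'Disallow: /'.

    Deliberately simple, mirroring the manual 'search your robots.txt' check:
    it does not evaluate wildcard groups or Allow/Disallow precedence.
    """
    in_group = False
    for line in robots_txt.splitlines():
        line = line.strip()
        if re.match(r"(?i)user-agent:\s*google-extended$", line):
            in_group = True
        elif re.match(r"(?i)user-agent:", line):
            in_group = False  # a new group starts; stop watching
        elif in_group and re.match(r"(?i)disallow:\s*/$", line):
            return True
    return False

print(google_extended_disallowed("User-agent: Google-Extended\nDisallow: /"))  # True
print(google_extended_disallowed("User-agent: Google-Extended\nAllow: /"))     # False
```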
llms.txt — the emerging standard for AI-readable site maps
Beyond robots.txt, a new standard called llms.txt is gaining traction. Modelled on robots.txt, it's a plain-text file at yourdomain.com/llms.txt that tells AI systems which pages are most important, what your site is about, and how your content is structured. Early adopters report improved citation rates because AI systems can prioritise their crawling more effectively. Format: a brief markdown file listing your key pages with one-line descriptions. OptiAISEO's AEO audit checks for llms.txt presence and helps you generate one.
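There is no single canonical schema yet, so treat the following as a sketch of the shape early adopters use; the site name, pages, and descriptions are placeholders:

```markdown
# Example Co

> Example Co sells widgets and publishes practical guides on widget maintenance.

## Key pages

- [Pricing](https://example.com/pricing): Plans and feature comparison
- [Docs](https://example.com/docs): Setup guides and API reference
- [Blog](https://example.com/blog): How-to guides and product updates
```

Keep it short: a title, a one-paragraph summary, and a handful of links with one-line descriptions is the whole file.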
How to audit your robots.txt for AI readiness (step by step)
Step 1: Open yourdomain.com/robots.txt in a browser.
Step 2: Look for any 'Disallow: /' rules under 'User-agent: *'. These block every crawler, AI bots included, that lacks its own rule group.
Step 3: Search for each AI crawler name listed above. If a crawler is absent, it inherits the wildcard rules.
Step 4: Add explicit 'Allow: /' rules for every AI crawler you want to index your site.
Step 5: Re-validate using the robots.txt report in Google Search Console, and check your OptiAISEO AEO audit score. It now includes an AI crawler accessibility check.
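The steps above can be rolled into one script. A sketch that parses a robots.txt body (fetch it however you prefer) and reports access for each AI crawler from the 2026 list; the function name and sample input are our own:

```python
from urllib import robotparser

AI_BOTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "Applebot-Extended", "Meta-ExternalAgent",
]

def audit(robots_txt: str, test_url: str = "https://example.com/") -> dict:
    """Return {bot_name: allowed?} for each AI crawler against a robots.txt body."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {bot: rp.can_fetch(bot, test_url) for bot in AI_BOTS}

# Example: a wildcard disallow with a single explicit exception for GPTBot.
report = audit("User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /")
for bot, ok in report.items():
    print(f"{bot}: {'allowed' if ok else 'BLOCKED'}")
```

In this sample, only GPTBot comes back allowed; every other AI crawler inherits the wildcard block, which is exactly the silent failure mode Step 2 warns about.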