
robots.txt for AI Crawlers

robots.txt for AI crawlers is the practice of using the standard /robots.txt file to allow or block specific AI user agents, such as GPTBot, ClaudeBot and PerplexityBot, along with control tokens such as Google-Extended, in order to control which AI systems may access a site's content for training or real-time grounding.

Also known as: AI robots.txt, robots.txt AI rules, AI crawler access policy

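robots.txt is a long-established web convention, formalized as the Robots Exclusion Protocol in RFC 9309 (2022), that lets a site tell crawlers which paths they may or may not fetch. Compliance is voluntary: the file states a policy but does not technically enforce it. AI companies have adopted the convention for their own crawlers and publish the user-agent names they use, so a site that wants to manage AI access lists those user agents in robots.txt with explicit Allow or Disallow rules.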
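To see how a crawler that honors robots.txt would read such rules, Python's standard-library `urllib.robotparser` can evaluate a file held in memory. The file below is hypothetical (example.com and its paths are placeholders). This parser matches user-agent names by case-insensitive substring and applies the first matching rule in file order, whereas RFC 9309 picks the most specific (longest) matching rule; listing Allow lines before a broad Disallow keeps both interpretations in agreement. Individual AI crawlers implement their own matching, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /blog/
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in-memory text; no network fetch

checks = [
    ("GPTBot", "https://example.com/blog/post"),            # whole site disallowed
    ("ClaudeBot", "https://example.com/blog/post"),         # Allow: /blog/ matches first
    ("ClaudeBot", "https://example.com/pricing"),           # falls through to Disallow: /
    ("UnlistedBot", "https://example.com/private/report"),  # no named group, so * applies
    ("UnlistedBot", "https://example.com/blog/post"),       # * group has no matching rule
]
for agent, url in checks:
    print(f"{agent:<12} {url:<36} allowed={parser.can_fetch(agent, url)}")
```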

Two decisions matter. The first is whether to allow training crawlers, which fetch content for inclusion in future model training. Blocking them keeps the site's content out of the data that crawler collects from then on, but it does not stop current AI products from answering questions about the brand. The second is whether to allow real-time crawlers, which fetch pages on demand to ground specific answers. Blocking them usually removes the site from that AI product's answer surface entirely.
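A policy that blocks training crawlers while keeping real-time crawlers can be written as two groups, each listing several User-agent lines (RFC 9309 lets one group cover multiple agents). Which agent belongs in which group is an assumption here, based on how vendors describe their agents at the time of writing; OAI-SearchBot and PerplexityBot, for instance, are search-indexing crawlers rather than strictly on-demand fetchers, so they are grouped with the real-time agents. The check uses `urllib.robotparser`, and `/pricing` is a placeholder path.

```python
from urllib.robotparser import RobotFileParser

# Assumed split for illustration; confirm each agent's purpose in vendor docs.
TRAINING = ["GPTBot", "ClaudeBot", "CCBot"]
REALTIME = ["OAI-SearchBot", "PerplexityBot", "Perplexity-User"]

ROBOTS_TXT = """\
# Training crawlers: keep content out of future training sets.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /

# Search and real-time crawlers: stay fetchable so answers can cite the site.
User-agent: OAI-SearchBot
User-agent: PerplexityBot
User-agent: Perplexity-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
for agent in TRAINING + REALTIME:
    print(f"{agent:<16} may fetch /pricing: {parser.can_fetch(agent, '/pricing')}")
```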

Common user agents to consider include GPTBot, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot, Perplexity-User, Google-Extended (a control token for AI use rather than a separate crawler), CCBot (Common Crawl, whose datasets many downstream models use), Applebot-Extended and Bytespider. The list changes over time, so the file needs occasional review to stay effective.
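Because the list drifts, part of the periodic review can be scripted. The sketch below, built on `urllib.robotparser`, reports for a watch-list of agents whether each may fetch a given path and whether any User-agent line names it exactly. The watch-list is a snapshot of the names in this entry, and `named_agents`, `audit` and the sample file are hypothetical helpers for illustration.

```python
from urllib.robotparser import RobotFileParser

# Snapshot of the agents named in this entry; vendors add and rename agents over time.
WATCH_LIST = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot", "anthropic-ai", "PerplexityBot",
    "Perplexity-User", "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider",
]

def named_agents(robots_txt: str) -> set:
    """Collect the lower-cased names that appear on User-agent lines."""
    names = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            names.add(value.strip().lower())
    return names

def audit(robots_txt: str, path: str = "/") -> dict:
    """Map each watched agent to 'allowed' or 'blocked' for `path`, noting
    agents that no User-agent line names exactly."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    named = named_agents(robots_txt)
    report = {}
    for agent in WATCH_LIST:
        status = "allowed" if parser.can_fetch(agent, path) else "blocked"
        if agent.lower() not in named:
            status += " (not named explicitly)"
        report[agent] = status
    return report

SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, status in audit(SAMPLE).items():
    print(f"{agent:<18} {status}")
```

Agents flagged as not named explicitly are the ones most likely to be affected when a vendor renames or adds a crawler, so they are a reasonable starting point for each review.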

Key points

  • robots.txt controls which AI crawlers may access which paths on a site.
  • In practice, AI crawlers divide into training crawlers and real-time crawlers.
  • Blocking real-time crawlers usually removes the site from AI answer surfaces.
  • Crawler user-agent names change, so the file needs periodic review.

Frequently asked questions

How do I block AI crawlers in robots.txt?

Add a User-agent line naming the crawler (for example, User-agent: GPTBot) followed by a Disallow rule (Disallow: /). Repeat for each crawler you want to block. Common targets include GPTBot, ClaudeBot, PerplexityBot, Google-Extended and CCBot.
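The steps in this answer can be automated with a small generator that writes one group per crawler and then checks the result with `urllib.robotparser`. The agent list mirrors the common targets named above, and `block_rules` is a hypothetical helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser

# Mirrors the common targets named in the answer; adjust as needed.
AGENTS_TO_BLOCK = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

def block_rules(agents: list) -> str:
    """Emit one 'User-agent: X / Disallow: /' group per crawler, blank-line separated."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"

robots_txt = block_rules(AGENTS_TO_BLOCK)
print(robots_txt)

# Sanity check: every listed agent is blocked, while an unlisted one is not.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
assert all(not parser.can_fetch(agent, "/") for agent in AGENTS_TO_BLOCK)
assert parser.can_fetch("Googlebot", "/")
```

A single group with several User-agent lines followed by one Disallow: / has the same effect and keeps the file shorter; separate groups make it easier to relax one crawler later.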

Will blocking AI crawlers remove my site from ChatGPT or Perplexity?

Blocking real-time crawlers usually removes the site from that product's answer surface, because the product can no longer fetch your pages on demand. Blocking training crawlers does not affect current answers, but it stops that crawler from collecting the content for future model versions.

What is Google-Extended?

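Google-Extended is a robots.txt product token that controls whether content Google crawls from a site may be used for its Gemini models (formerly Bard), for both training and grounding, separately from its main search index. It has no crawler of its own: Google fetches pages with its existing user agents and reads the token as a usage control. Disallowing Google-Extended opts out of that AI use while keeping classic search indexing. Google states that the token does not affect inclusion in Google Search, so Search features such as AI Overviews are governed by the regular Googlebot controls rather than by Google-Extended.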
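The opt-out can be sketched as a Google-Extended group that disallows everything next to a Googlebot group that stays open. `urllib.robotparser` only shows how the rules resolve for each token; since Google-Extended has no separate fetcher, Google applies it when deciding how content already crawled by Googlebot may be used. The `/docs/` path is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Opt out of Gemini use via the Google-Extended token while leaving Googlebot alone.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
print("Google-Extended:", parser.can_fetch("Google-Extended", "/docs/"))  # False
print("Googlebot:", parser.can_fetch("Googlebot", "/docs/"))              # True
```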


Related terms

AI Crawler
An AI crawler is an automated user agent operated by an AI company that fetches public web pages to use either for training large language models or for real-time grounding inside AI answers, with named examples including GPTBot, ClaudeBot, PerplexityBot, Google-Extended and CCBot.
llms.txt
llms.txt is a proposed plain-text file placed at the root of a website that gives large language models a concise, curated map of the site's most important pages and content sections, so AI systems can find the right pages without having to crawl the entire site.
Generative Engine Optimization (GEO)
Generative Engine Optimization (GEO) is the practice of shaping web content, structure and authority signals so that generative AI engines such as ChatGPT, Perplexity and Google AI Overviews recommend or cite a brand in their synthesized answers.
Brand Visibility (AI)
Brand visibility in AI refers to how often and how prominently a brand appears in answers produced by AI engines such as ChatGPT, Perplexity, Gemini and Google AI Overviews, measured across the queries that matter to the brand.