robots.txt for AI Bots

Configure your robots.txt so that GPTBot, ClaudeBot, PerplexityBot and other AI crawlers can access your content, which is a precondition for being cited in AI-generated answers.

What is robots.txt?

robots.txt is a plain-text file placed at the root of your domain (e.g. https://example.com/robots.txt). It tells web crawlers which pages or sections of your site they may or may not access. The file follows the Robots Exclusion Protocol (standardized as RFC 9309), which the major search engines and most well-known AI crawlers honor. Compliance is voluntary: the file is a request to crawlers, not an access control.

Each entry (group) consists of one or more User-agent lines identifying the bots it applies to, followed by one or more Allow or Disallow directives.
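As a minimal sketch of how such an entry is read, the snippet below feeds a single group to Python's standard-library parser, `urllib.robotparser`. The rules and the example.com URLs are hypothetical and chosen only for illustration. Python's parser applies the first matching rule in file order, while RFC 9309 prefers the longest match; for this group both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical group: one User-agent line followed by two directives.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether the named bot may fetch two different (hypothetical) URLs.
blog_allowed = parser.can_fetch("GPTBot", "https://example.com/blog/post")
private_allowed = parser.can_fetch("GPTBot", "https://example.com/private/report")

print(blog_allowed, private_allowed)  # prints: True False
```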

Why AI Bots Need Explicit Access

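Most AI crawlers, including GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot, respect robots.txt. A crawler obeys the group that names it and falls back to the User-agent: * group otherwise. A blanket Disallow: / under User-agent: * therefore blocks every AI bot that has no group of its own, and the blocked content cannot be crawled or cited in AI-generated answers. If no rule matches a bot at all, access is allowed by default, so explicit Allow groups matter mainly for overriding a broader restriction and for documenting your intent.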
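The sketch below illustrates this fallback behaviour using Python's standard-library `urllib.robotparser`. The three rule sets and the helper name `allowed` are made up for this example, and a real crawler's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Case 1: a blanket block that applies to every bot without its own group.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Case 2: the same blanket block, plus a named group that overrides it.
NAMED_EXCEPTION = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

# Case 3: no wildcard group and no group naming the AI bot.
NO_MATCHING_RULE = """\
User-agent: Googlebot
Disallow: /tmp/
"""

print(allowed(BLANKET_BLOCK, "GPTBot"))       # False: falls back to the * group
print(allowed(NAMED_EXCEPTION, "GPTBot"))     # True: its own group wins
print(allowed(NAMED_EXCEPTION, "ClaudeBot"))  # False: still covered by *
print(allowed(NO_MATCHING_RULE, "GPTBot"))    # True: nothing matches, so allowed
```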

Many websites have robots.txt files originally written to control search engine crawlers. These files often contain broad restrictions that unintentionally block AI bots. Auditing and updating your robots.txt is one of the highest-impact steps you can take for AI visibility.

Two Types of AI Crawlers — and Why the Difference Matters

Not all AI bots work the same way. There are two fundamentally different mechanisms, and your robots.txt settings produce very different effects for each.

1. Live Crawlers (Retrieval)

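These bots fetch your page in real time when a user asks the AI a question. Your content can then appear directly in the answer, often with a citation link. Examples, as grouped in the template below: ChatGPT-User, OAI-SearchBot, PerplexityBot, Perplexity-User, Claude-Web, Claude-SearchBot.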

Effect of robots.txt changes: near-immediate. Once a bot has re-read your robots.txt, allowing it makes your pages eligible for citation right away, and blocking it removes them from live answers just as quickly.

2. Training Crawlers

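These bots collect data to train future versions of an AI model. When a model answers without live retrieval, it draws on training data that is often 6–12 months old or older. Blocking a training bot today therefore does not change what the model already knows; it affects what the next model version will know. Examples, as grouped in the template below: GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Amazonbot, Applebot-Extended, cohere-ai, anthropic-ai.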

Effect of robots.txt changes: delayed. Your changes only show up when the AI company releases its next trained model — which can take months.

Practical advice: Allow retrieval bots in any case — they bring real-time visibility with citations. Decide about training bots based on your content strategy: allow them if you want your work included in future AI knowledge; block them if you want to retain control over how your content is used.
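One way to turn this advice into a file is to generate robots.txt from two lists. The sketch below is illustrative only: the function name `build_robots_txt` and the shortened bot lists are assumptions, with the retrieval/training grouping taken from this guide's template rather than from any vendor specification.

```python
# Shortened, illustrative lists; the grouping follows this guide's template.
RETRIEVAL_BOTS = ["ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "Perplexity-User"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]


def build_robots_txt(allow_training: bool, sitemap_url: str) -> str:
    """Build a robots.txt that always allows retrieval bots and
    allows or blocks training bots depending on `allow_training`."""
    groups = ["User-agent: *\nAllow: /"]

    # Retrieval bots are allowed in any case.
    for bot in RETRIEVAL_BOTS:
        groups.append(f"User-agent: {bot}\nAllow: /")

    # Training bots follow the content-strategy decision.
    training_rule = "Allow: /" if allow_training else "Disallow: /"
    for bot in TRAINING_BOTS:
        groups.append(f"User-agent: {bot}\n{training_rule}")

    groups.append(f"Sitemap: {sitemap_url}")
    # Blank lines separate groups, as in the template.
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(
    allow_training=False,
    sitemap_url="https://example.com/sitemap.xml",
)
print(robots_txt)
```

Calling the function with `allow_training=True` instead yields an allow-everything file along the lines of the template in the next section.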

Complete robots.txt Example

Copy this template and adapt it for your domain. The Sitemap line at the bottom helps crawlers discover your pages efficiently.

User-agent: *
Allow: /

# ── Live Crawlers (Retrieval — immediate effect) ──

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: Claude-SearchBot
Allow: /

# ── Training Crawlers (delayed effect until next model) ──

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: anthropic-ai
Allow: /

Sitemap: https://example.com/sitemap.xml

Replace https://example.com/sitemap.xml with your actual sitemap URL. If you have multiple sitemaps, add one Sitemap: line per file.
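To check that several Sitemap lines are picked up, the sketch below parses a small file with Python's standard-library `urllib.robotparser` (its `site_maps()` method requires Python 3.8 or newer). The second sitemap URL is a hypothetical placeholder.

```python
from urllib.robotparser import RobotFileParser

# Two Sitemap lines, one per sitemap file; the second URL is hypothetical.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the declared sitemap URLs in file order,
# or None if the file declares no sitemap at all.
sitemaps = parser.site_maps()
print(sitemaps)
```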

How to Verify Your robots.txt
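One possible way to verify a file is to parse it and ask, for each AI user agent, whether a given page may be fetched. The sketch below uses Python's standard library; the helper names `audit` and `fetch_robots_txt`, the shortened agent list, and the example.com URLs are assumptions made for illustration. Python's parser applies the first matching rule in file order, so results for files that mix overlapping Allow and Disallow paths can differ from a crawler that follows RFC 9309's longest-match rule.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

# Illustrative subset of the user agents named in this guide.
AI_AGENTS = ("ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot")


def audit(robots_txt: str, page_url: str, agents=AI_AGENTS) -> dict:
    """Map each agent to True (may fetch page_url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


def fetch_robots_txt(site_root: str) -> str:
    """Download robots.txt from the root of a site (needs network access)."""
    url = site_root.rstrip("/") + "/robots.txt"
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline example with a hypothetical file that blocks only GPTBot.
SAMPLE = """\
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /
"""

result = audit(SAMPLE, "https://example.com/page")
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'BLOCKED'}")
```

To audit a live site, pass the downloaded text instead of the sample, for example `audit(fetch_robots_txt("https://example.com"), "https://example.com/")`.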

Common Mistakes

Official Sources
