robots.txt for AI Bots
Configure your robots.txt correctly so GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers can access your content — and so your pages can be cited in AI-generated answers.
What is robots.txt?
robots.txt is a plain-text file placed at the root of your domain (e.g. https://example.com/robots.txt). It instructs web crawlers which pages or sections of your site they may or may not access. The file follows the Robots Exclusion Protocol, standardized as RFC 9309 and supported by all major search engines and AI crawlers.
Each entry consists of a User-agent line identifying the bot, followed by one or more Allow or Disallow directives.
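For example, a minimal file with one bot-specific group and a wildcard group might look like this (the /admin/ path is purely illustrative):

```text
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
```

A bot follows the most specific group that names it and falls back to the * group otherwise.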
Why AI Bots Need Explicit Access
Most AI crawlers — including GPTBot (OpenAI), ClaudeBot (Anthropic), and PerplexityBot — respect robots.txt. If your file blocks them, whether through a blanket Disallow: / under User-agent: * or through restrictive wildcard rules the bots fall back to because no group names them, your content will not be indexed and will remain invisible in AI-generated answers.
Many websites have robots.txt files originally written to control search engine crawlers. These files often contain broad restrictions that unintentionally block AI bots. Auditing and updating your robots.txt is one of the highest-impact steps you can take for AI visibility.
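One way to audit a file is with Python's standard-library robots.txt parser. A quick sketch, where the rules and URL are illustrative of a typical legacy configuration:

```python
from urllib.robotparser import RobotFileParser

# A legacy robots.txt written only with search engines in mind:
# Googlebot is allowed, everyone else is blocked outright.
legacy_rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(legacy_rules.splitlines())

# GPTBot and ClaudeBot have no group of their own, so they fall
# back to the restrictive wildcard group and are blocked.
for bot in ("Googlebot", "GPTBot", "ClaudeBot"):
    allowed = parser.can_fetch(bot, "https://example.com/blog/post")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# → Googlebot: allowed
# → GPTBot: blocked
# → ClaudeBot: blocked
```

Running this against your own robots.txt contents for each AI bot name quickly reveals unintended blocks.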
Complete robots.txt Example
Copy this template and adapt it for your domain. The Sitemap line at the bottom helps crawlers discover your pages efficiently.
User-agent: *
Allow: /
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Amazonbot
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: cohere-ai
Allow: /
User-agent: anthropic-ai
Allow: /
Sitemap: https://example.com/sitemap.xml
Replace https://example.com/sitemap.xml with your actual sitemap URL. If you have multiple sitemaps, add one Sitemap: line per file.
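For example, with separate sitemaps for pages and posts (hypothetical URLs):

```text
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
```

Sitemap lines are independent of user-agent groups and can appear anywhere in the file.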
How to Verify Your robots.txt
- Google Search Console — Use the robots.txt report (the successor to the retired robots.txt Tester) to check which rules apply to any given URL.
- curl — Run curl -s https://yourdomain.com/robots.txt from a terminal to confirm the file is served correctly and with a 200 status code.
- AI Visibility Scanner — Our free scanner checks your robots.txt as part of a full 19-point AI visibility audit.
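Beyond confirming the file is served, you can sanity-check its directives locally. A rough sketch — not a full RFC 9309 parser, and the directive list is an assumption covering the common fields:

```python
# Common directives; real-world files may use vendor extensions.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}

def lint_robots_txt(text: str) -> list[str]:
    """Return human-readable warnings for suspicious lines."""
    warnings = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank lines are fine
        if ":" not in line:
            warnings.append(f"line {lineno}: no 'field: value' separator")
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_DIRECTIVES:
            warnings.append(f"line {lineno}: unknown directive {field!r}")
    return warnings

print(lint_robots_txt("User-agent: GPTBot\nAllow: /\nDisalow: /tmp/"))
# → ["line 3: unknown directive 'disalow'"]
```

A misspelled directive like Disalow is silently ignored by crawlers, so a typo can quietly disable a rule you think is in effect.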
Common Mistakes
- Blanket block: User-agent: * followed by Disallow: / blocks all bots — including every AI crawler. This is often left over from development or staging environments.
- Missing Sitemap line: Without a Sitemap declaration, crawlers have to discover pages through links alone. Always include your sitemap URL.
- Wrong file location: robots.txt must be at the root of your domain (/robots.txt), not in a subdirectory. A file at /blog/robots.txt is ignored by crawlers.
- Case sensitivity: Path rules are case-sensitive: Disallow: /Blog/ does not match /blog/. (User-agent names, by contrast, are matched case-insensitively under RFC 9309.)
- No-index via meta tag: robots.txt controls crawl access; it does not control indexing. Use <meta name="robots" content="noindex"> tags or X-Robots-Tag HTTP headers to prevent specific pages from being indexed. Note that a crawler must be able to fetch a page to see those signals, so don't also block that page in robots.txt.
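For illustration, the header-based signal in a raw HTTP response looks like this (how you add the header depends on your server configuration):

```text
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```

The header form is especially useful for non-HTML resources such as PDFs, where a meta tag cannot be embedded.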
Official Sources
- Google — robots.txt specification and syntax
- OpenAI — GPTBot documentation
- Anthropic — ClaudeBot and anthropic-ai crawler docs
- Perplexity — How to get indexed by PerplexityBot