---
description: Which AI bots to block vs. allow in robots.txt. Protect your IP from training crawlers while staying visible in ChatGPT and Perplexity.
title: "AI Bot robots.txt Guide: Block vs. Allow GPTBot & ClaudeBot"
image: https://www.mersel.ai/blog-covers/Software%20code%20testing-cuate.svg
---


# AI Bot robots.txt Guide: Block vs. Allow GPTBot & ClaudeBot


Mersel AI Team

March 13, 2026


## Should I Block or Allow AI Bots Like GPTBot and ClaudeBot on My Website?

**Block AI training crawlers. Allow AI search crawlers. That single distinction is the entire strategic framework.** Blanket blocking removes your brand from ChatGPT and Perplexity results entirely. Blanket allowing hands your proprietary content to model training datasets with no attribution, no backlinks, and no referral traffic in return.

This matters right now because the number of active AI bots has doubled since August 2023, and Cloudflare, which protects roughly 20% of all websites globally, began blocking AI crawlers by default on new domains in 2024. Many technical SEO teams have perfectly configured `robots.txt` files that are being silently overridden at the CDN layer. The result is accidental invisibility in the exact AI systems your buyers use to build their vendor shortlists.

In this guide, you will get the exact `robots.txt` configuration to implement today, a step-by-step process for auditing your CDN and rendering stack, and a clear framework for when to use `llms.txt` to further structure your content for AI extraction.


## Key Takeaways

* **Training crawlers and search crawlers are different bots from the same company.** `GPTBot` trains OpenAI's models; `OAI-SearchBot` powers ChatGPT's live search results. Blocking one has zero effect on the other.
* **Approximately 27% of B2B SaaS and ecommerce websites are accidentally blocking major LLM crawlers** due to CDN-level rules, often without knowing it, according to research cited by ziptie.dev.
* **69% of AI crawlers cannot execute JavaScript**, according to research by Vercel and MERJ. If your site relies on client-side rendering, AI bots see a blank page regardless of your `robots.txt` settings.
* **Blocking `GPTBot` has no measurable impact on Google Search rankings**, based on publisher network analysis reviewed by Playwire, but blocking `OAI-SearchBot` removes you from ChatGPT search answers entirely.
* **AI-referred traffic converts 4.4x better than standard organic search**, according to data aggregated by Superlines, making visibility in AI search results a high-value pipeline source.
* **`llms.txt` adoption sits at around 10% of domains**, according to Ahrefs, but it is a zero-risk, low-effort signal that guides AI agents toward your highest-value content.

## Why This Problem Keeps Getting Worse

Gartner projects that traditional search engine volume will drop 25% by 2026 as generative AI platforms absorb informational queries. That shift is already visible in referral data: 60% of all Google searches end without a click, and organic click-through rates drop by up to 61% when a Google AI Overview appears for a query.

The buyers who do click from AI-generated answers are significantly more qualified. They have already consumed an AI-curated summary, evaluated alternatives, and arrived at your site with intent. But you only capture that traffic if AI search bots can read and cite your content in the first place.

Most organizations are failing at this for three reasons that have nothing to do with content quality.

**Reason 1: They are treating all AI bots as one entity.** A brand manager reads a headline about AI scrapers and adds a blanket `Disallow: /` for every user agent with "AI" or "Bot" in the name. This blocks `OAI-SearchBot` alongside `GPTBot`, removing the brand from ChatGPT's live search results entirely.

**Reason 2: Their CDN is overriding their `robots.txt` before bots even read it.** Cloudflare's AI blocking feature operates at the edge, returning a 403 Forbidden error to AI crawlers before the request reaches the origin server. A perfectly configured `robots.txt` is irrelevant when the firewall never lets the bot through.

**Reason 3: Their site is invisible to AI bots for rendering reasons.** Unlike Googlebot, which runs a full Chromium engine, major AI crawlers do not execute JavaScript. A React or Vue single-page application delivers a blank `<div id="root"></div>` to AI bots. Your content simply does not exist for them. To understand the full scope of how AI bots discover and read web pages, see our guide on [what an AI bot crawler actually is and how it works](/blog/what-is-an-ai-bot-crawler).

## The Core Framework: Training Crawlers vs. Search Crawlers

Every major AI company operates at least two distinct crawlers with completely separate functions. Confusing them is the root cause of most AI visibility failures.

| Category | User agents | What they do | Directive |
| --- | --- | --- | --- |
| Training crawlers | `GPTBot` (OpenAI), `ClaudeBot` (Anthropic), `Google-Extended`, `CCBot` (Common Crawl) | Harvest data for model weights. No attribution, no links, no referral traffic. | **Block** |
| Search and citation crawlers | `OAI-SearchBot`, `ChatGPT-User` (OpenAI), `PerplexityBot`, `Claude-User` / `Claude-SearchBot` (Anthropic) | Retrieve content to answer live user queries. Cite sources. Send traffic. | **Allow** |

_The table above shows the two categories of AI crawlers, often from the same parent company with entirely separate bots and independent controls. Training crawlers absorb content into model weights with no attribution. Search crawlers retrieve live content to cite in user-facing answers. Blocking the wrong category has the opposite of the intended effect._

OpenAI states this explicitly in its developer documentation: "OAI-SearchBot is used to surface websites in search results in ChatGPT's search features. Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers." Separately, OpenAI confirms that `GPTBot` is "used to crawl content that may be used in training" and that blocking it is entirely independent from search visibility.

"The key insight that most SEO teams miss is that these are independent systems," according to technical documentation from xseek.io. "A webmaster can block `GPTBot` to protect their IP while allowing `OAI-SearchBot` to remain visible in ChatGPT search results."

## Step-by-Step Implementation Guide

### Step 1: Configure Your `robots.txt` with Selective Access

Place this file at the root of your domain (`https://yourdomain.com/robots.txt`). The structure below explicitly separates search bots from training bots, which is the foundation everything else builds on.

```
# --------------------------------------------------------
# 1. ALLOW AI Search & Retrieval (For GEO / Visibility)
# --------------------------------------------------------
# OpenAI Search and User-Triggered Fetches
User-agent: OAI-SearchBot
Allow: /
User-agent: ChatGPT-User
Allow: /
 
# Anthropic Real-Time Fetches
User-agent: Claude-User
Allow: /
User-agent: Claude-SearchBot
Allow: /
 
# Perplexity AI Search
User-agent: PerplexityBot
Allow: /
 
# You.com Search
User-agent: YouBot
Allow: /
 
# --------------------------------------------------------
# 2. BLOCK AI Bulk Training Data Crawlers (IP Protection)
# --------------------------------------------------------
# OpenAI Training
User-agent: GPTBot
Disallow: /
 
# Anthropic Training
User-agent: ClaudeBot
Disallow: /
 
# Google Generative AI Training (Does not impact Googlebot)
User-agent: Google-Extended
Disallow: /
 
# Common Crawl (Used by many open-source LLMs)
User-agent: CCBot
Disallow: /
 
# Meta/Facebook Training
User-agent: Meta-ExternalAgent
Disallow: /
User-agent: FacebookBot
Disallow: /
 
# ByteDance/TikTok
User-agent: Bytespider
Disallow: /
 
# Apple Training
User-agent: Applebot-Extended
Disallow: /
 
# --------------------------------------------------------
# 3. Standard Search Engines (Unchanged)
# --------------------------------------------------------
User-agent: *
Allow: /
```

Changes to `robots.txt` typically take about 24 hours for OpenAI's systems to process and reflect in search behavior. One critical note on Anthropic: avoid the deprecated user agent strings `Claude-Web` and `anthropic-ai`. These are no longer active, and sites relying on them for blocking are not actually blocking the current `ClaudeBot`.
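Before deploying, you can sanity-check the directives locally. The sketch below uses Python's standard `urllib.robotparser` against an abbreviated copy of the file above (the bot names are real; the file is truncated for brevity):

```python
from urllib.robotparser import RobotFileParser

# Abbreviated version of the robots.txt above: search bot allowed,
# training bot blocked, everything else allowed.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The search crawler can fetch any page; the training crawler cannot.
print(parser.can_fetch("OAI-SearchBot", "/pricing"))  # True
print(parser.can_fetch("GPTBot", "/pricing"))         # False
```

Running this against your real file (via `RobotFileParser.set_url` and `read`) catches ordering and typo mistakes before a crawler ever does.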

### Step 2: Audit and Disable CDN-Level AI Blocking

Once your `robots.txt` is configured, the next step is to verify your CDN is not silently overriding it. This is the step most teams skip, and it accounts for a significant share of accidental AI invisibility.

If you use Cloudflare, navigate to Security > Bots (or the "Control AI Crawlers" section in your dashboard). Either disable the "Block AI training bots" option or configure WAF rules that explicitly allowlist `OAI-SearchBot` and `PerplexityBot` by user agent string. Also verify that Cloudflare's "Manage your robots.txt" feature is disabled if you want your origin server's custom file to take precedence.

Research cited by ziptie.dev indicates that approximately 27% of B2B SaaS and ecommerce websites are accidentally blocking major LLM crawlers at the CDN layer. If your site sits behind Cloudflare, Fastly, or a managed SaaS platform like Shopify or Wix that handles edge security, audit this before assuming your `robots.txt` is working.
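The failure mode is easy to reproduce locally. This sketch stands up a throwaway HTTP server that mimics an edge rule returning 403 to `GPTBot` while letting `OAI-SearchBot` through, then probes it the way you would probe your production site with spoofed `User-Agent` headers. The server and its blocklist are illustrative, not any real CDN's behavior:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical edge rule: 403 for one AI user agent, 200 for everything
# else, regardless of what robots.txt at the origin says.
BLOCKED_AT_EDGE = ("GPTBot",)

class EdgeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        code = 403 if any(bot in ua for bot in BLOCKED_AT_EDGE) else 200
        self.send_response(code)
        self.end_headers()

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), EdgeHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}/"

def status_for(user_agent):
    """Return the HTTP status the server gives this user agent."""
    req = urllib.request.Request(base_url, headers={"User-Agent": user_agent})
    try:
        return urllib.request.urlopen(req).status
    except urllib.error.HTTPError as err:
        return err.code

search_status = status_for("OAI-SearchBot")   # 200: reaches the origin
training_status = status_for("GPTBot")        # 403: blocked at the edge
server.shutdown()
```

Pointing `status_for` at your own domain with each AI user agent string is a quick first-pass audit; a 403 for a search bot means the edge, not your `robots.txt`, is making the decision.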

### Step 3: Verify Bot Authentication Against IP Ranges

Because malicious scrapers frequently spoof user agent strings, a sophisticated `robots.txt` alone is not a complete defense against unauthorized harvesting. OpenAI publishes JSON feeds of its legitimate IP address ranges (`openai.com/gptbot.json` and `openai.com/searchbot.json`), and Anthropic documents verification details for its crawlers as well. You can use these feeds within WAF configurations or bot management platforms to authenticate legitimate AI search crawlers while blocking spoofed requests that claim to be `OAI-SearchBot` but originate from unauthorized IP ranges.
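A minimal verification sketch using Python's standard `ipaddress` module. The CIDR blocks below are RFC 5737 documentation placeholders, not real crawler ranges; in production you would load and periodically refresh the published feeds:

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation space). In production,
# populate this list from the published JSON feeds and refresh it.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/25"]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in PUBLISHED_RANGES]

def is_verified_crawler(client_ip: str) -> bool:
    """True only if the request IP falls inside a published crawler range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in NETWORKS)

# A request claiming to be OAI-SearchBot from outside the ranges is spoofed.
print(is_verified_crawler("192.0.2.44"))   # True
print(is_verified_crawler("203.0.113.9"))  # False
```

Wired into a WAF or middleware layer, this check lets you serve verified search crawlers while rejecting impostors that merely copy the user agent string.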

### Step 4: Fix the JavaScript Rendering Problem

Research by Vercel and MERJ reveals that 69% of AI crawlers cannot execute JavaScript. This is not a minor edge case. If your marketing site, product pages, or blog are rendered client-side using React, Vue, or Angular, an AI crawler visits your page and reads a blank `<div id="root"></div>`. Your carefully written content is invisible regardless of your `robots.txt` configuration.

The fix is server-side rendering (SSR). Frameworks like Next.js and Nuxt deliver fully rendered HTML in the initial response, which AI crawlers can parse as simple HTTP clients. Beyond the rendering layer, use semantic HTML structure (`<article>`, `<section>`, `<h1>`, `<h2>`) rather than nested `<div>` structures, and implement JSON-LD schema markup for Organization, Product, FAQPage, and Article types. Schema markup gives AI bots an explicit map of entity relationships so they do not have to infer them from prose. For a complete walkthrough of these structural requirements, see our guide on [how to structure your website for AI visibility](/blog/how-to-structure-my-website-for-ai-visibility).
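One rough way to see the problem from a crawler's point of view: fetch the raw HTML with no JavaScript execution and check how much readable text survives once tags are stripped. The heuristic and its word threshold below are illustrative, not a standard:

```python
import re

def looks_client_rendered(html: str, min_words: int = 20) -> bool:
    """Crude check: does the raw HTML carry meaningful text without JS?"""
    # Drop script/style bodies, then every remaining tag.
    no_code = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html,
                     flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", no_code)
    return len(text.split()) < min_words

# A client-rendered SPA shell: nearly no text in the initial response.
spa = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
# A server-rendered page: the content is present in the HTML itself.
ssr = ("<html><body><article><h1>AI Bot Guide</h1><p>"
       + "content " * 40 + "</p></article></body></html>")

print(looks_client_rendered(spa))  # True: a non-JS bot sees a blank page
print(looks_client_rendered(ssr))  # False
```

Running this against `curl`-fetched copies of your key pages is a fast proxy for what the 69% of non-rendering AI crawlers actually receive.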

### Step 5: Deploy an `llms.txt` File

Once the rendering and access layers are working correctly, `llms.txt` is a low-effort, zero-risk addition that guides AI agents to your highest-value pages. Place it at `yourdomain.com/llms.txt` in Markdown format. According to Ahrefs, adoption sits at around 10% of domains, so implementing it now is a differentiation signal even if direct citation correlation is still being studied.

```
# [Brand Name] - AI Agent Documentation
 
> [Brand Name] is a leading provider of [Category] for [Target Audience].
 
## Core Products
- [Product A]: Use case description. [/product-a]
- [Product B]: Use case description. [/product-b]
 
## Key Comparisons and Use Cases
- [Brand] vs [Competitor]: [/comparisons/competitor]
- Use Cases: [/use-cases]
 
## Contact
- Pricing: [/pricing]
- Sales: [/contact]
```

A secondary `llms-full.txt` file can concatenate all critical documentation into a single machine-readable file, which is particularly useful for AI agents operating within limited context windows.
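As a sketch of that concatenation step, the snippet below assumes a hypothetical `docs/` directory of Markdown files and simply joins them, using each filename as a section heading; adapt the paths and ordering to your own content structure:

```python
from pathlib import Path

def build_llms_full(docs_dir: str) -> str:
    """Concatenate every Markdown file in docs_dir into one llms-full.txt body."""
    sections = []
    for md_file in sorted(Path(docs_dir).glob("*.md")):
        # Filename becomes a heading so agents can navigate the combined file.
        sections.append(f"## {md_file.stem}\n\n{md_file.read_text().strip()}")
    return "\n\n".join(sections)

# Example usage (paths are assumptions, not a fixed convention):
# Path("llms-full.txt").write_text(build_llms_full("docs"))
```

Regenerating the file in your build or deploy pipeline keeps it from drifting out of date as documentation changes.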

**Why this sequence is correct:** You cannot benefit from `llms.txt` if AI bots are blocked at the CDN before they reach it. You cannot benefit from well-structured schema if the JavaScript rendering layer is hiding your content from bots. And you cannot benefit from any of these optimizations if your `robots.txt` is blocking the search crawlers you need to allow. The sequence flows from access to rendering to structure, each layer depending on the one before it.

This infrastructure work sits at the core of what we call generative engine optimization. For a broader view of how these signals combine to drive AI citation visibility, the Mersel AI guide on [generative engine optimization](https://www.mersel.ai/generative-engine-optimization) covers the full framework.

## When DIY Implementation Falls Short

The `robots.txt` configuration above is straightforward to copy. The harder parts are what follow it.

**CDN audit depth.** Most marketing teams do not have direct access to Cloudflare WAF rules or know which managed security rules are running at the edge. Identifying the specific rule silently blocking `PerplexityBot` often requires a backend engineer and server-level logging to confirm the 403 is happening.

**Rendering architecture changes.** Moving from client-side rendering to SSR is not a `robots.txt` edit. It is a development project. For teams with active sprint backlogs and no dedicated engineering bandwidth, this work tends to be deprioritized indefinitely, leaving the entire content investment invisible to AI crawlers.

**Keeping user agents current.** The list of active AI bot strings changes. Anthropic deprecated `Claude-Web` without broad announcement. New crawlers are launched as AI platforms expand their search features. Maintaining an accurate blocklist requires ongoing monitoring that most SEO teams do not have a process for.

**Verifying the system is actually working.** The typical way to confirm your configuration is correct is to review server logs for bot-specific 200 vs. 403 response codes, cross-reference against AI citation tracking, and monitor AI referral traffic in GA4. Without that closed loop, teams often assume their configuration is working when AI bots are still being silently blocked.
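That log review can be scripted. The sketch below tallies response codes per AI user agent from combined-log-format lines; the sample lines are fabricated for illustration:

```python
import re
from collections import Counter

AI_BOTS = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot",
           "Claude-User", "GPTBot", "ClaudeBot")

# Matches the tail of a combined-log-format line: status, size, referrer, UA.
TAIL = re.compile(r'" (\d{3}) \d+ "[^"]*" "([^"]*)"$')

def bot_status_counts(log_lines):
    """Tally (bot, status) pairs so blocked search bots stand out."""
    counts = Counter()
    for line in log_lines:
        match = TAIL.search(line)
        if not match:
            continue
        status, ua = match.groups()
        for bot in AI_BOTS:
            if bot in ua:
                counts[bot, status] += 1
    return counts

# Fabricated sample lines in combined log format.
sample = [
    '203.0.113.7 - - [13/Mar/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0; compatible; OAI-SearchBot/1.0"',
    '203.0.113.8 - - [13/Mar/2026:10:02:00 +0000] "GET /blog HTTP/1.1" '
    '403 0 "-" "Mozilla/5.0; compatible; GPTBot/1.1"',
]
counts = bot_status_counts(sample)
# A 403 count for OAI-SearchBot here would mean the edge is blocking
# the search traffic you intended to allow.
```

Run weekly against your access logs, this turns "we think the config works" into a concrete per-bot scorecard.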

## The Managed Path: What Full-Stack AI Crawler Optimization Looks Like

The Mersel AI approach addresses the gap between knowing the right `robots.txt` configuration and actually being visible to AI search engines in production.

The infrastructure layer deploys behind your existing site. AI crawlers like `OAI-SearchBot` and `PerplexityBot` receive a clean, server-side rendered, schema-rich version of your brand. Entity definitions are explicit. Product relationships are mapped with JSON-LD. The `llms.txt` file is configured and maintained. Human visitors see nothing different. No engineering sprints are required, and existing SEO, design, and UX are untouched.

One honest limitation to name: Mersel AI is a fully managed service, not a self-serve dashboard. Teams that need real-time prompt monitoring with direct UI access will find self-serve platforms like Profound or AthenaHQ more appropriate for that specific need. Mersel is built for teams that want the infrastructure deployed and the content published without pulling engineers or content managers into a new discipline they do not have bandwidth to own.

Alongside the infrastructure work, Mersel's content engine maps the actual prompts your buyers are typing into ChatGPT and Perplexity right now, the questions at the bottom of the funnel like "best alternative to \[competitor\] for a Series A SaaS company." Publish-ready articles go directly into your CMS on a continuous cadence, connected to a feedback loop from Google Search Console and GA4. Posts get updated based on what is actually earning citations, not assumptions.

A mid-market B2B fintech client (unified finance OS, approximately 20 employees) reached a Category Share of Voice of 10.8% from a starting point of 3.1% in 92 days, with 94 AI citations across competitive fintech prompts and 20% of demo requests influenced by AI search. For a broader view of how AI referral traffic flows translate into pipeline, see our guide on [AI traffic analysis](/blog/how-to-measure-ai-visibility).

## FAQ

**Does blocking GPTBot hurt my Google Search rankings?**

No. Blocking `GPTBot` has no impact on Google Search rankings, according to publisher network analysis reviewed by Playwire. `GPTBot` is an OpenAI training crawler, entirely separate from Googlebot. Your Google rankings are determined by Googlebot's crawl and Google's ranking algorithm, neither of which is affected by your `GPTBot` directive. You can block `GPTBot` and `Google-Extended` simultaneously without touching your Google Search visibility.

**What happens if I block OAI-SearchBot by accident?**

According to OpenAI's developer documentation, "Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers." This means your content will not appear in ChatGPT's real-time search results even if `GPTBot` has previously crawled and indexed your content for training purposes. The two systems operate independently. Accidental blocking of `OAI-SearchBot` is one of the most common and highest-impact AI visibility errors.

**How do I know if my Cloudflare settings are blocking AI search bots?**

Log into the Cloudflare dashboard and navigate to Security > Bots or the "Control AI Crawlers" section. Check whether the AI scraper blocking feature is enabled. Then review your server logs for 403 response codes returned to `OAI-SearchBot`, `PerplexityBot`, or `Claude-User`. According to research cited by ziptie.dev, approximately 27% of B2B SaaS and ecommerce sites are unknowingly blocking major LLM crawlers at the CDN layer, making this audit a high-priority check even if your `robots.txt` is correctly configured.

**Do AI bots respect robots.txt at all?**

Major AI companies publicly commit to honoring `robots.txt` directives for their named crawlers. OpenAI and Anthropic both document this in their developer resources, and both publish JSON feeds of their legitimate IP address ranges for verification. However, `robots.txt` is an honor system. Malicious scrapers frequently spoof user agent strings and ignore `robots.txt` entirely. For content you genuinely need to protect, bot management platforms and WAF-level IP range authentication provide a stronger enforcement layer than `robots.txt` alone.

**Is llms.txt worth implementing if adoption is still low?**

Yes, for two reasons. First, it is a zero-risk, low-effort implementation that takes less than an hour to set up. Second, AI agents and LLM-powered search tools are increasingly designed to look for this file as a structured entry point to a site's content hierarchy. According to Ahrefs, only around 10% of domains have implemented `llms.txt`, so deploying it now is a meaningful differentiation signal. Direct correlation to increased citation frequency is still being studied, but there is no downside to giving AI systems a clean, structured map of your most important pages.

## Sources

1. [Gartner: Search Engine Volume Will Drop 25% by 2026](https://www.gartner.com/en/newsroom/press-releases/2024-02-19-gartner-predicts-search-engine-volume-will-drop-25-percent-by-2026-due-to-ai-chatbots-and-other-virtual-agents)
2. [Stronger Content: Gartner Search Engine Volume Decrease](https://strongercontent.com/gartner-search-engine-volume-to-decrease-by-25-thanks-to-ai/)
3. [Ahrefs: AI Bot Block Rates](https://ahrefs.com/blog/ai-bot-block-rates/)
4. [Superlines: AI Search Statistics](https://www.superlines.io/articles/ai-search-statistics/)
5. [Ziptie.dev: Technical SEO for AI Crawlability](https://ziptie.dev/blog/technical-seo-for-ai-crawlability/)
6. [Playwire: AI Scraping vs. Traditional SEO Crawling](https://www.playwire.com/blog/ai-scraping-vs-traditional-seo-crawling-what-publishers-need-to-know-about-blocking-ai)
7. [Vercel: The Rise of the AI Crawler](https://vercel.com/blog/the-rise-of-the-ai-crawler)
8. [SearchEngineWorld: Tracking OpenAI ChatGPT Bots](https://www.searchengineworld.com/tracking-openai-chatgpt-bots-a-fresh-guide-for-webmasters-site-owners-and-seos)
9. [OpenAI: Developer Documentation on Bots](https://developers.openai.com/api/docs/bots)
10. [Almcorp: Anthropic Claude Bots robots.txt Strategy](https://almcorp.com/blog/anthropic-claude-bots-robots-txt-strategy/)
11. [Lowtouch.ai: Cloudflare AI Data War](https://www.lowtouch.ai/cloudflare-just-fired-the-first-shot-in-the-ai-data-war/)
12. [llmrefs.com: Cloudflare Blocks AI Crawlers](https://llmrefs.com/blog/cloudflare-blocks-ai-crawlers)
13. [Searchviu: AI Crawlers JavaScript Rendering](https://www.searchviu.com/en/ai-crawlers-javascript-rendering/)
14. [Ahrefs: What Is llms.txt?](https://ahrefs.com/blog/what-is-llms-txt/)
15. [llmstxt.org: The llms.txt Standard](https://llmstxt.org/)

## Ready to See Your Real AI Traffic?

Your `robots.txt` might be configured correctly and your site still invisible to AI search bots. The CDN audit, the rendering check, and the citation tracking are where most teams discover the actual problem.

[Book a call with the Mersel AI team](/contact) to see exactly which AI crawlers are reaching your site, which prompts your buyers are using right now, and what is standing between your content and AI citations.

## Related Reading

* [How to Translate Human Website Content for AI Crawlers](/blog/how-to-translate-human-website-content-for-ai-crawlers)
* [Do I Need Code Changes for Generative Engine Optimization?](/blog/do-i-need-code-changes-for-generative-engine-optimization)
* [How to Update Your Knowledge Graph for LLMs](/blog/how-to-update-your-knowledge-graph-for-llms)

