---
title: How to Block AI Bots in robots.txt: GPTBot, ClaudeBot & More (2026) | Mersel AI
site: Mersel AI
site_url: https://mersel.ai
description: A technical guide for B2B and ecommerce brands on configuring robots.txt and CDN settings to block AI training crawlers while maintaining visibility in AI search engines like ChatGPT and Perplexity.
page_type: blog
url: https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website
canonical_url: https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website
language: en
author: Mersel AI
breadcrumb: Home > Blog > How to Block or Allow AI Bots
date_modified: 2024-05-22
---

> Strategic AI bot management matters because AI-referred traffic converts 4.4x better than standard organic search, yet 27% of B2B SaaS and ecommerce sites accidentally block these high-value crawlers at the CDN layer. With 69% of AI crawlers unable to execute JavaScript and Gartner projecting a 25% drop in traditional search volume by 2026, brands should implement a selective robots.txt framework that blocks training bots like GPTBot while allowing search bots like OAI-SearchBot. Proper configuration gets your content cited in AI answers instead of harvested without attribution or rendered invisible by client-side code.

# How to Block AI Bots in robots.txt: GPTBot, ClaudeBot & More (2026)
**Read Time:** 18 min read | **Author:** Mersel AI Team | **Date:** March 13, 2026

**The core strategic framework for AI bot management is to block AI training crawlers while allowing AI search crawlers.** Blanket blocking removes your brand from ChatGPT and Perplexity results entirely. Conversely, blanket allowing hands your proprietary content to model training datasets with no attribution, no backlinks, and no referral traffic in return.

| AI Crawler Market Metric | Data Point |
| :--- | :--- |
| AI Bot Growth Rate | Doubled since August 2023 |
| Cloudflare Global Protection | Roughly 20% of all websites |
| Cloudflare Default Policy (2025) | Blocking AI crawlers by default on new domains |

The number of active AI bots has doubled since August 2023, which makes precise crawler management necessary. Cloudflare, which protects roughly 20% of all websites globally, began blocking AI crawlers by default on new domains in 2025. Many technical SEO teams have correctly configured `robots.txt` files that are silently overridden at the CDN layer, leaving their sites invisible in the exact AI systems buyers use to build vendor shortlists.

This guide provides the following resources for AI crawler optimization:
* The exact `robots.txt` configuration to implement today.
* A step-by-step process for auditing your CDN and rendering stack.
* A clear framework for when to use `llms.txt` to further structure your content for AI extraction.

## Quick Answer: Which AI Bots to Block vs. Allow

**Website owners should block training crawlers that harvest data without providing traffic while allowing search and citation crawlers that drive qualified users to the brand.** An AI crawler is an automated program operated by AI companies to harvest training data or fetch live content for user-facing answers. These two bot types have completely different impacts on visibility, requiring separate treatment within a site's `robots.txt` file.

| Bot Category | User Agents | Impact and Recommended Action |
| :--- | :--- | :--- |
| **Training Crawlers** | GPTBot (OpenAI), ClaudeBot (Anthropic), Google-Extended, CCBot (Common Crawl), Meta-ExternalAgent, Bytespider, Applebot-Extended | **Block**: These bots harvest content for model training but do not send traffic back to the source website. |
| **Search & Citation Crawlers** | OAI-SearchBot, ChatGPT-User (OpenAI), Claude-SearchBot, Claude-User (Anthropic), PerplexityBot, YouBot | **Allow**: These bots cite your brand and send qualified traffic through user-facing search and content fetches. |
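
To apply these two categories to your own traffic, a sketch like the following can classify a raw `User-Agent` header by looking for the product tokens from the table. The token lists and the function name `classify_user_agent` are illustrative assumptions. `Google-Extended` and `Applebot-Extended` are left out because they are robots.txt control tokens that never appear in request headers. A header match alone does not prove a request's identity, since user agents can be spoofed.

```python
# Illustrative token lists based on the table above; update them as vendors
# publish new crawler names. Google-Extended and Applebot-Extended are omitted
# because they only appear in robots.txt, never in request headers.
TRAINING_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "Meta-ExternalAgent", "Bytespider")
SEARCH_TOKENS = ("OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
                 "Claude-User", "PerplexityBot", "YouBot")


def classify_user_agent(header: str) -> str:
    """Return "search", "training", or "other" for a raw User-Agent header."""
    ua = header.lower()
    # Case-insensitive substring match on each product token.
    if any(token.lower() in ua for token in SEARCH_TOKENS):
        return "search"
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "training"
    return "other"


if __name__ == "__main__":
    sample = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "GPTBot/1.1; +https://openai.com/gptbot)")
    print(classify_user_agent(sample))  # training
```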

Access the full copy-paste `robots.txt` configuration in [Step 1 below](#step-1-configure-your-robotstxt-with-selective-access). For a comprehensive user agent reference table including official documentation links, refer to [the next section](#official-ai-bot-user-agent-reference-2026).

## Key Takeaways

Training crawlers and search crawlers are distinct bots, often operated by the same company, and blocking one does not affect how the other functions.

| Bot Name | Primary Function | Impact of Blocking | Source/Evidence |
| :--- | :--- | :--- | :--- |
| GPTBot | Trains OpenAI's models | No measurable impact on Google Search rankings | Publisher network analysis reviewed by Playwire |
| OAI-SearchBot | Powers ChatGPT live search results | Removes site from ChatGPT search answers | OpenAI bot documentation |

- **27% of B2B SaaS and ecommerce websites accidentally block major LLM crawlers** due to CDN-level rules, often without the owners' knowledge, according to research cited by ziptie.dev. These unintentional blocks prevent AI models from accessing critical business data and product information.
- **69% of AI crawlers cannot execute JavaScript**, according to research by Vercel and MERJ. If a website relies heavily on client-side rendering, AI bots see a blank page regardless of robots.txt settings, effectively hiding the content from LLM discovery.
- **AI-referred traffic converts 4.4x better than standard organic search**, according to data aggregated by Superlines. This significant performance gap makes visibility in AI search results a high-value pipeline source for modern digital marketing and sales strategies.
- **llms.txt adoption sits at approximately 10% of domains**, according to Ahrefs. This emerging standard is a zero-risk, low-effort signal intended to guide AI agents toward a website's highest-value content and improve its machine readability.

## Why This Problem Keeps Getting Worse

**Gartner projects that traditional search engine volume will drop 25% by 2026 as generative AI platforms absorb informational queries.** This shift is already visible in referral data: 60% of all Google searches end without a click, and organic click-through rates drop by up to 61% when a Google AI Overview appears for a query.

Buyers who click from AI-generated answers are significantly more qualified because they have already consumed an AI-curated summary and evaluated alternatives. These users arrive at your site with high intent. However, you only capture this traffic if AI search bots can successfully read and cite your content in the first place.

Most organizations fail to capture AI traffic for three reasons that have nothing to do with content quality:

*   **Reason 1: Treating all AI bots as one entity.** Brand managers often add a blanket `Disallow: /` for every user agent with "AI" or "Bot" in the name. This blocks `OAI-SearchBot` alongside `GPTBot`, removing the brand from ChatGPT's live search results entirely.
*   **Reason 2: CDN overrides of `robots.txt` instructions.** Cloudflare's AI blocking feature operates at the edge, returning a 403 Forbidden error to AI crawlers before the request reaches the origin server. A perfectly configured `robots.txt` is irrelevant when the firewall never lets the bot through.
*   **Reason 3: Site invisibility due to rendering limitations.** Major AI crawlers do not execute JavaScript, which creates a significant visibility gap compared to traditional search engines like Googlebot.

| Feature | Googlebot | Major AI Crawlers |
| :--- | :--- | :--- |
| Rendering Engine | Full Chromium engine | No JavaScript execution |
| SPA Compatibility | Renders React/Vue apps before indexing | Sees only the blank `<body>` of the initial HTML |

Client-side rendered React or Vue single-page applications deliver a blank `<body>` to AI bots, so your content simply does not exist for them. To understand the full scope of how AI bots discover and read web pages, see our guide on [what an AI bot crawler actually is and how it works](/blog/what-is-an-ai-bot-crawler).

## The Core Framework: Training Crawlers vs. Search Crawlers

Most major AI companies operate at least two distinct crawlers with separate functions, and confusing them is the root cause of most AI visibility failures. Training crawlers absorb content into model weights without providing attribution, while search crawlers retrieve live content specifically to cite it in user-facing answers. Blocking the wrong category often produces the opposite of the intended visibility effect.

| Feature | Training Crawlers | Search Crawlers |
| :--- | :--- | :--- |
| **Primary Function** | Absorb content into model weights | Retrieve live content for user answers |
| **Attribution** | No attribution provided | Cites sources in user-facing answers |
| **Impact of Blocking** | Protects intellectual property (IP) | Removes site from AI search results |

OpenAI developer documentation explicitly distinguishes between its two primary bots, `OAI-SearchBot` and `GPTBot`. `OAI-SearchBot` surfaces websites within ChatGPT's search features, and sites that opt out will not appear in ChatGPT search answers. `GPTBot`, by contrast, crawls content used for model training. OpenAI confirms that the two settings are independent, so blocking `GPTBot` does not affect a site's visibility in ChatGPT search results.

Technical documentation from xseek.io highlights that these are independent systems, a key insight often missed by SEO teams. This separation allows webmasters to strategically manage their digital assets by blocking `GPTBot` to protect intellectual property while simultaneously allowing `OAI-SearchBot` to remain visible. This dual-track framework ensures that a brand maintains its presence in ChatGPT search results without contributing its data to underlying model training.
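
You can check the independence of the two groups with Python's standard-library `urllib.robotparser`. This is a minimal sketch. The two-group file is an example, and `is_allowed` is a hypothetical helper name. Python's parser is also a simplified implementation: the first matching group wins and tokens are matched by substring. It approximates how OpenAI evaluates the file but does not reproduce it.

```python
from urllib.robotparser import RobotFileParser

# Example file: search crawler allowed, training crawler blocked.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, token: str, path: str = "/") -> bool:
    """Parse robots.txt text and ask whether `token` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "GPTBot", "/pricing"))         # False
    print(is_allowed(ROBOTS_TXT, "OAI-SearchBot", "/pricing"))  # True
```

Blocking the training group leaves the search group's answer unchanged, which is the dual-track behavior described above.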

## Official AI Bot User Agent Reference (2026)

Verified user agent strings for 2026 allow site owners to distinguish between AI training crawlers and search citation bots. Official documentation links serve as the authoritative source for these frequently updated strings, which dictate how AI models interact with web content.

| AI company | Training crawler | Search / citation crawler | Recommended action |
| --- | --- | --- | --- |
| OpenAI | `GPTBot` ([docs](https://platform.openai.com/docs/bots)) | `OAI-SearchBot`, `ChatGPT-User` ([docs](https://platform.openai.com/docs/bots)) | Block GPTBot; allow OAI-SearchBot and ChatGPT-User |
| Anthropic | `ClaudeBot` ([docs](https://support.anthropic.com/en/articles/8896518)) | `Claude-SearchBot`, `Claude-User` ([docs](https://support.anthropic.com/en/articles/8896518)) | Block ClaudeBot; allow Claude-SearchBot and Claude-User |
| Google | `Google-Extended` ([docs](https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers)), a robots.txt control token rather than a separate crawler | Uses Googlebot for AI Overviews | Block Google-Extended only; Googlebot still crawls and indexes for search |
| Perplexity | None (no separate training crawler) | `PerplexityBot`, `Perplexity-User` ([docs](https://docs.perplexity.ai/guides/bots)) | Allow both |
| Common Crawl | `CCBot` ([docs](https://commoncrawl.org/ccbot)) | N/A | Block; it feeds many open-source LLM training sets |
| Meta | `Meta-ExternalAgent`, `FacebookBot` | N/A | Block both |
| ByteDance | `Bytespider` | N/A | Block |
| Apple | `Applebot-Extended`, a robots.txt control token rather than a separate crawler | Uses Applebot for Spotlight / Siri search | Block Applebot-Extended only |
| You.com | N/A | `YouBot` | Allow |

**Anthropic's crawler names have changed, so rules must target the current user agent strings.** The deprecated strings `Claude-Web` and `anthropic-ai` are no longer active, so rules that target only them have no effect. The currently active strings are `ClaudeBot` for training, `Claude-SearchBot` for search indexing, and `Claude-User` for per-user fetches initiated by Claude.ai.
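
A quick way to catch stale Anthropic rules is to scan the `User-agent` lines of an existing file. The helper below is a sketch, and its function names are hypothetical. It reports deprecated tokens that are still present and current tokens that are not named explicitly. A token that is not named falls back to your `User-agent: *` group, which may or may not be what you intend.

```python
# Token sets taken from the paragraph above, lowercased for comparison.
DEPRECATED = {"claude-web", "anthropic-ai"}
CURRENT = {"claudebot", "claude-searchbot", "claude-user"}


def user_agent_tokens(robots_txt: str) -> set:
    """Collect the lowercased values of every User-agent line."""
    tokens = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            tokens.add(value.strip().lower())
    return tokens


def audit_anthropic_rules(robots_txt: str) -> dict:
    """Report deprecated Anthropic tokens in use and current ones not named."""
    tokens = user_agent_tokens(robots_txt)
    return {
        "deprecated_present": sorted(tokens & DEPRECATED),
        "current_not_named": sorted(CURRENT - tokens),
    }


if __name__ == "__main__":
    with open("robots.txt", encoding="utf-8") as f:  # hypothetical local copy
        print(audit_anthropic_rules(f.read()))
```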

## Step-by-Step Implementation Guide

### Step 1: Configure Your `robots.txt` with Selective Access

**Configuring a robots.txt file at the domain root provides the foundational layer for separating search bots from training crawlers.** This implementation ensures that intellectual property remains protected while maintaining visibility in Generative Engine Optimization (GEO) results. Place this file at `https://yourdomain.com/robots.txt` to explicitly define access permissions.

```
# --------------------------------------------------------
# 1. ALLOW AI Search & Retrieval (For GEO / Visibility)
# --------------------------------------------------------

# OpenAI Search and User-Triggered Fetches
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic Search Indexing and Real-Time Fetches
User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

# Perplexity AI Search and User-Triggered Fetches
User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

# You.com Search
User-agent: YouBot
Allow: /

# --------------------------------------------------------
# 2. BLOCK AI Bulk Training Data Crawlers (IP Protection)
# --------------------------------------------------------

# OpenAI Training
User-agent: GPTBot
Disallow: /

# Anthropic Training
User-agent: ClaudeBot
Disallow: /

# Google Generative AI Training (Does not impact Googlebot)
User-agent: Google-Extended
Disallow: /

# Apple Generative AI Training (Does not impact Applebot)
User-agent: Applebot-Extended
Disallow: /

# Common Crawl (Used by many open-source LLMs)
User-agent: CCBot
Disallow: /

# Meta/Facebook Training
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

# ByteDance/TikTok
User-agent: Bytespider
Disallow: /

# --------------------------------------------------------
# 3. Standard Search Engines (Unchanged)
# --------------------------------------------------------
User-agent: *
Allow: /
```
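
If you maintain the crawler lists in code, you can regenerate the file and verify it in one step. This is a sketch under stated assumptions. The token lists follow the user agent reference table above, and `build_robots_txt` and `find_mismatches` are hypothetical helper names. The check relies on Python's simplified `urllib.robotparser`, not on any vendor's parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed token lists, following the user agent reference table.
ALLOW = ["OAI-SearchBot", "ChatGPT-User", "Claude-User", "Claude-SearchBot",
         "PerplexityBot", "Perplexity-User", "YouBot"]
BLOCK = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot",
         "Meta-ExternalAgent", "FacebookBot", "Bytespider", "Applebot-Extended"]


def build_robots_txt(allow, block):
    """Emit one group per token, then a catch-all group for other crawlers."""
    overlap = set(allow) & set(block)
    if overlap:
        raise ValueError(f"tokens listed as both allowed and blocked: {sorted(overlap)}")
    groups = [f"User-agent: {token}\nAllow: /" for token in allow]
    groups += [f"User-agent: {token}\nDisallow: /" for token in block]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def find_mismatches(robots_txt, allow, block):
    """Return tokens whose parsed permission differs from the intended one."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    wrong = [t for t in allow if not parser.can_fetch(t, "/")]
    wrong += [t for t in block if parser.can_fetch(t, "/")]
    return wrong


if __name__ == "__main__":
    text = build_robots_txt(ALLOW, BLOCK)
    print(find_mismatches(text, ALLOW, BLOCK))  # []
    print(text)
```

Python's parser lets the first matching group win and matches tokens by substring. Because of that, a new token that contains an existing one (or is contained by it) could be misjudged. None of the tokens listed here overlap that way.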

**Robots.txt updates typically require approximately 24 hours for systems like OpenAI to process and adjust crawling behavior.** During this propagation period, existing crawler settings remain in effect. Administrators must verify that current active strings like `ClaudeBot` are targeted rather than inactive legacy strings such as `Claude-Web` or `anthropic-ai`.

### Step 2: Audit and Disable CDN-Level AI Blocking

**Verify that your CDN is not silently overriding your `robots.txt` configuration to prevent accidental AI invisibility.** This verification step is frequently overlooked by technical teams and accounts for the largest share of unintended site exclusion from AI models. If your website sits behind Cloudflare, Fastly, Shopify, or Wix, you must audit these settings to ensure your crawler instructions are being respected.

**For Cloudflare users:**

| Step | Action |
| :--- | :--- |
| 1 | Navigate to Security > Bots (or the "Control AI Crawlers" section) in your dashboard. |
| 2 | Turn off "Block AI training bots", or configure WAF rules to explicitly allowlist `OAI-SearchBot` and `PerplexityBot` by user agent string. |
| 3 | Verify that "Manage your robots.txt" inside Cloudflare is disabled so your origin server's file takes precedence. |
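
After changing these settings, you can run a rough spot check by requesting a page with different `User-Agent` headers and comparing the status codes. This sketch has clear limits. Cloudflare and other bot-management layers may also check source IPs, so a spoofed header sent from your own machine can be blocked or allowed for different reasons than the real crawler. Server or CDN logs remain the authoritative evidence. The function name and the `yourdomain.com` URL are placeholders.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request `url` with a given User-Agent header and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises 4xx/5xx responses as exceptions; a 403 here is the
        # pattern an edge-level AI block typically produces.
        return err.code


if __name__ == "__main__":
    for agent in ("OAI-SearchBot", "PerplexityBot", "GPTBot"):
        print(agent, status_for_user_agent("https://yourdomain.com/", agent))
```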

**Approximately 27% of B2B SaaS and ecommerce websites are accidentally blocking major LLM crawlers at the CDN layer, according to research cited by ziptie.dev.** This misconfiguration often occurs even when the local `robots.txt` file is correctly configured. Auditing the CDN layer is essential for any site using managed platforms like Shopify or Wix to ensure AI search engines can access content.

### Step 3: Verify Bot Authentication Against IP Ranges

**Authenticate legitimate AI search crawlers using published IP address ranges to prevent malicious scrapers from spoofing user agent strings.** Because `robots.txt` alone does not provide a complete defense against unauthorized access, you should use official JSON feeds within your WAF or bot management platform to validate requests. Use these feeds to accept real crawlers and reject spoofed requests claiming to be `OAI-SearchBot` from unauthorized IP ranges:

*   **OpenAI training crawler (`GPTBot`):** `https://openai.com/gptbot.json`
*   **OpenAI search crawler (`OAI-SearchBot`):** `https://openai.com/searchbot.json`
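
A request that claims to be `OAI-SearchBot` should only be trusted if its source IP falls inside the ranges OpenAI publishes. The sketch below checks membership with Python's `ipaddress` module. The JSON shape it expects (`prefixes` entries with `ipv4Prefix` or `ipv6Prefix` keys) is an assumption, so confirm it against the live file before relying on it. The sample ranges are documentation-only addresses, not OpenAI's.

```python
import ipaddress
import json


def load_prefixes(feed_json: str) -> list:
    """Turn a published IP-range feed into ipaddress network objects.

    Assumes a shape like {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}.
    """
    data = json.loads(feed_json)
    networks = []
    for entry in data.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                networks.append(ipaddress.ip_network(entry[key]))
    return networks


def is_verified(client_ip: str, networks: list) -> bool:
    """True if the requesting IP falls inside any published range."""
    address = ipaddress.ip_address(client_ip)
    # Membership is always False across IPv4/IPv6, so mixed lists are safe.
    return any(address in network for network in networks)


if __name__ == "__main__":
    # Documentation-only example ranges (RFC 5737 / RFC 3849), not OpenAI's.
    sample_feed = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
    ranges = load_prefixes(sample_feed)
    print(is_verified("192.0.2.10", ranges))    # True
    print(is_verified("198.51.100.7", ranges))  # False
```

In practice you would call `is_verified` only for requests whose `User-Agent` contains `OAI-SearchBot`. Refresh the downloaded feed on a schedule, because published ranges change.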

### Step 4: Fix the JavaScript Rendering Problem

**Research by Vercel and MERJ reveals that 69% of AI crawlers cannot execute JavaScript, making client-side rendered content invisible to most models.** If your site is built using React, Vue, or Angular without server-side support, AI crawlers typically see a blank page. This rendering gap means your content remains unindexed regardless of your `robots.txt` permissions.
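
To see roughly what a non-rendering crawler sees, fetch a page's raw HTML without running JavaScript and strip out scripts and styles. The sketch below uses only Python's standard `html.parser`. The class and function names and the sample pages are hypothetical. If a page looks full in the browser but returns near-empty text here, its content is probably rendered client-side.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes from raw HTML, ignoring <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Approximate the text a crawler that does not run JavaScript can read."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
    print(repr(visible_text(spa_shell)))  # ''
```

To check a live page, pass in the body from a plain HTTP fetch, for example `urllib.request.urlopen(url).read().decode("utf-8", "replace")`.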

**The fix for JavaScript invisibility involves three core components:**

1.  **Server-side rendering (SSR):** Use frameworks such as Next.js or Nuxt that deliver fully rendered HTML in the initial response, so AI crawlers that behave like simple HTTP clients can parse the content without running JavaScript.
2.  **Semantic HTML structure:** Implement `<header>`, `<main>`, `<article>`, and `<footer>` tags instead of nested `<div>` structures to provide AI bots with necessary structural cues.
3.  **JSON-LD schema markup:** Deploy schema for Organization, Product, FAQPage, and Article to provide AI bots with an explicit map of entity relationships.
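
The JSON-LD in item 3 can be generated instead of written by hand. The sketch below builds a schema.org `Organization` block. The function name and sample values are placeholders. It escapes `</` so that a stray `</script>` inside a value cannot close the tag early.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list) -> str:
    """Build a schema.org Organization block as an embeddable <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,  # official profiles that identify the same entity
    }
    # "<\/" is a valid JSON escape for "</" and keeps "</script>" out of the payload.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(organization_jsonld("Example Co", "https://example.com",
                              ["https://www.linkedin.com/company/example"]))
```

The same pattern extends to `Product`, `FAQPage`, and `Article` by swapping `@type` and the properties each type expects.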

For a complete walkthrough, see our guide on [how to structure your website for AI visibility](/blog/how-to-structure-my-website-for-ai-visibility).

### Step 5: Deploy an `llms.txt` File

**Deploy an `llms.txt` file as a low-effort, zero-risk way to guide AI agents toward your highest-value pages.** Adoption is currently only about 10% of domains according to Ahrefs, so implementing the file now can help differentiate your site for AI discovery.

*   **File Location:** yourdomain.com/llms.txt
*   **File Format:** Markdown
*   **Current Adoption:** ~10% of domains (per Ahrefs)

Example `llms.txt` template:

```
# [Brand Name] - AI Agent Documentation

> [Brand Name] is a leading provider of [Category] for [Target Audience].

## Core Products
- [Product A](https://yourdomain.com/product-a): Use case description.
- [Product B](https://yourdomain.com/product-b): Use case description.

## Key Comparisons and Use Cases
- [[Brand] vs [Competitor]](https://yourdomain.com/comparisons/competitor)
- [Use Cases](https://yourdomain.com/use-cases)

## Contact
- [Pricing](https://yourdomain.com/pricing)
- [Sales](https://yourdomain.com/contact)
```
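
If your page inventory lives in a CMS or sitemap, you can render `llms.txt` from it instead of editing the file by hand. This is a minimal sketch that assumes the layout described at llmstxt.org: an H1 title, a blockquote summary, then H2 sections of markdown links with optional notes. The function name and sample data are hypothetical.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections.

    `sections` maps a heading to a list of (link_text, url, note) tuples;
    pass an empty note to emit a bare link.
    """
    lines = [f"# {title}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}"]
        for text, url, note in links:
            item = f"- [{text}]({url})"
            lines.append(f"{item}: {note}" if note else item)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_llms_txt(
        "Example Co",
        "Example Co makes widgets for small teams.",
        {
            "Core Products": [("Widget A", "https://example.com/widget-a", "Use case description.")],
            "Contact": [("Pricing", "https://example.com/pricing", "")],
        },
    ))
```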

A secondary `llms-full.txt` file concatenates all critical documentation into a single machine-readable file. This format suits AI agents that prefer to load a site's full documentation in one request instead of following individual links.
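
A build step can produce `llms-full.txt` by concatenating pages you already publish as markdown. The sketch below assumes each page can be exported as a (title, URL, markdown) tuple. The per-page heading, `Source:` line, and `---` separator are an assumed layout, not a requirement of any standard, and `build_llms_full` is a hypothetical name.

```python
def build_llms_full(pages: list) -> str:
    """Concatenate (title, url, markdown) tuples into one llms-full.txt body.

    Each page keeps its own heading and a source line so an agent can
    attribute passages back to the original URL.
    """
    parts = []
    for title, url, markdown in pages:
        parts.append(f"# {title}\n\nSource: {url}\n\n{markdown.strip()}\n")
    return "\n---\n\n".join(parts)


if __name__ == "__main__":
    print(build_llms_full([
        ("Pricing", "https://example.com/pricing", "Plans start at $99."),
        ("Use Cases", "https://example.com/use-cases", "Widget A for small teams."),
    ]))
```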

### Why this 5-step sequence is the right order
**The 5-step sequence follows a logical progression where each layer depends on the successful implementation of the previous step.** This infrastructure work sits at the core of [generative engine optimization](https://www.mersel.ai/generative-engine-optimization) by ensuring a seamless flow from access to rendering to structure.

| Optimization Layer (Step) | What Fails Without It |
| :--- | :--- |
| **Permissions (robots.txt, Step 1)** | ❌ Rendering fixes don't help if robots.txt blocks the search crawlers you need. |
| **Access (CDN, Step 2)** | ❌ `llms.txt` doesn't help if the CDN blocks the bot before it reaches your file. |
| **Rendering (JavaScript, Step 4)** | ❌ Schema markup doesn't help if JavaScript rendering hides your content from bots. |

## When DIY Implementation Falls Short

**Technical execution of AI crawler management extends far beyond simple `robots.txt` configurations.** While copying code is straightforward, maintaining a functional environment requires deep integration across infrastructure, rendering, and monitoring. DIY approaches frequently fail to address the underlying complexities of server-level permissions and evolving bot identifiers.

**CDN audit depth requires specialized technical access and backend expertise.** Marketing teams typically lack direct access to Cloudflare WAF rules or knowledge of managed security rules running at the edge. Identifying a rule silently blocking `PerplexityBot` necessitates a backend engineer and server-level logging to confirm 403 error codes.

**Rendering architecture changes are complex development projects rather than simple text edits.** Moving from client-side rendering to Server-Side Rendering (SSR) demands significant engineering bandwidth. For teams with active sprint backlogs, this critical infrastructure work is often deprioritized indefinitely in favor of other features.

**Maintaining accurate user agent lists requires constant monitoring of the shifting AI landscape.** The list of active AI bot strings changes frequently; for example, Anthropic deprecated `Claude-Web` without a broad announcement. Most SEO teams lack the formal processes required to track new crawler launches as AI platforms expand search capabilities.

**Verifying system functionality requires a rigorous three-part closed-loop validation process.** Without these checks, teams mistakenly assume configurations work while AI bots remain silently blocked. The verification process includes:

*   Reviewing server logs for bot-specific 200 versus 403 response codes.
*   Cross-referencing configurations against AI citation tracking.
*   Monitoring AI referral traffic within GA4.
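
For the server-log check, a short script can count response codes per AI crawler. This sketch assumes the common Apache/nginx "combined" log format and matches user agents by product token. CDN logs often use other formats, and `access.log` is a placeholder path. A run of 403s for a search crawler is the signal to revisit Step 2. Because user agents can be spoofed, confirm suspicious entries with the IP check from Step 3.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Illustrative list: search and user-fetch tokens first, then training tokens.
AI_TOKENS = ("OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot", "Claude-User",
             "PerplexityBot", "Perplexity-User", "GPTBot", "ClaudeBot")


def status_counts(lines) -> Counter:
    """Count (bot token, status code) pairs across access-log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        agent = match.group("agent").lower()
        for token in AI_TOKENS:
            if token.lower() in agent:
                counts[(token, match.group("status"))] += 1
                break
    return counts


if __name__ == "__main__":
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for (bot, status), n in sorted(status_counts(log).items()):
            print(f"{bot:18} {status} {n}")
```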

## The Managed Path: What Full-Stack AI Crawler Optimization Looks Like

The Mersel AI approach bridges the gap between robots.txt configuration and production visibility for AI search engines. **Pricing starts at $1,800/month** for managed execution of this full-stack optimization strategy. This service ensures your brand remains visible to AI crawlers while maintaining your existing SEO, design, and user experience without requiring internal engineering sprints.

### The infrastructure layer

The infrastructure layer deploys behind your existing site to provide AI crawlers with a server-side rendered, schema-rich version of your brand. Supported crawlers include `OAI-SearchBot`, `PerplexityBot`, `Claude-SearchBot`, and `Google-Extended`. This layer manages explicit entity definitions, JSON-LD product relationship mapping, and `llms.txt` file maintenance while verifying access across CDN and robots.txt configurations. Human visitors see no changes to the site experience.

### The content layer (Cite engine)

Mersel’s **Cite content engine** delivers **100+ high-intent pages and 20 backlinks over 6 months** by targeting actual buyer evaluation prompts rather than keyword guesses. Content is structured for AI citation with answer-first formatting, FAQ schema, and explicit entity relationships. The engine also secures third-party authority backlinks from sources AI engines frequently cite, using a feedback loop from Google Search Console and GA4 to update content based on earned citations.

### Real client outcomes

| Client | Vertical | Result | Timeframe |
| --- | --- | --- | --- |
| Series A fintech (~20 employees) | B2B SaaS | AI visibility 2.4% → 12.9%; non-branded citations +152%; **20% of demos AI-attributed** | 92 days |
| Publicly traded quantum computing company | B2B technical | 214 citations; **+16% QoQ AI-influenced enterprise leads** | 123 days |
| Mid-market beauty brand | DTC e-commerce | AI visibility 5.8% → 19.2%; AI-driven referral traffic +58% | 63 days |

For a broader view of how AI referral traffic translates into pipeline, see our guide on [AI traffic analysis](/blog/how-to-measure-ai-visibility).

### Honest limitation

Mersel AI is a fully managed service rather than a self-serve dashboard, designed for teams that want infrastructure and content deployed without internal resource strain. Organizations requiring real-time prompt monitoring or direct UI access should consider Profound or AthenaHQ. This approach is optimized for brands that prefer a hands-off implementation over managing a new technical discipline internally.

## FAQ

### Does blocking GPTBot hurt my Google Search rankings?

**No, blocking `GPTBot` does not hurt your Google Search rankings because it is an OpenAI training crawler entirely separate from Googlebot.** Your Google rankings are determined by Googlebot's crawl and Google's ranking algorithm, neither of which is affected by your `GPTBot` directive. According to publisher network analysis reviewed by Playwire, you can block `GPTBot` and `Google-Extended` simultaneously without impacting your Google Search visibility.

### What happens if I block OAI-SearchBot by accident?

**If you block `OAI-SearchBot` by accident, your content will not appear in ChatGPT's real-time search results, even if `GPTBot` has already crawled your content for training.** OpenAI documentation confirms that sites opted out of `OAI-SearchBot` are excluded from ChatGPT search answers. Because these two systems are independent, accidental blocking of the search-specific bot is a high-impact error for AI visibility.

### How do I know if my Cloudflare settings are blocking AI search bots?

**To determine if Cloudflare settings are blocking AI search bots, you must check your Security/Bots settings, review server logs for 403 responses, and cross-reference GA4 referral traffic.** Research from ziptie.dev indicates that approximately 27% of B2B SaaS and ecommerce sites unknowingly block major LLM crawlers at the CDN layer.

*   Log into Cloudflare and navigate to Security > Bots (or "Control AI Crawlers") to see if AI scraper blocking is enabled.
*   Review server logs for 403 responses specifically to `OAI-SearchBot`, `PerplexityBot`, or `Claude-User`.
*   Cross-reference these findings against AI referral traffic data in GA4 to identify unexpected visibility gaps.
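
For the GA4 cross-check, it helps to group referral sources by AI assistant before comparing them with the bot-log findings. This is a minimal sketch that assumes you have exported referrer URLs, for example from a GA4 exploration. The hostname list is illustrative, not exhaustive, so check it against what actually appears in your reports.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative referrer hostnames for AI assistants; extend as needed.
AI_REFERRERS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "claude.ai": "Claude",
    "gemini.google.com": "Gemini",
    "copilot.microsoft.com": "Copilot",
}


def ai_assistant(referrer: str) -> Optional[str]:
    """Map a referrer URL to an AI assistant name, or None if it is not one."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, name in AI_REFERRERS.items():
        # Match the domain itself or any subdomain (e.g. www.perplexity.ai).
        if host == domain or host.endswith("." + domain):
            return name
    return None


if __name__ == "__main__":
    print(ai_assistant("https://www.perplexity.ai/search?q=crm"))  # Perplexity
```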

### Do AI bots respect robots.txt at all?

Major AI companies including OpenAI and Anthropic publicly commit to honoring `robots.txt` for their named crawlers. These organizations document these protocols within their official developer resources and publish real-time JSON feeds of legitimate IP ranges to facilitate server-side verification.

The `robots.txt` standard operates strictly as an honor system that malicious scrapers often ignore by spoofing user agent strings. To protect sensitive content, organizations must deploy bot management platforms and WAF-level IP range authentication in addition to standard `robots.txt` files.

### Is llms.txt worth implementing if adoption is still low?

**Implementing llms.txt is highly recommended because it provides a low-cost, high-differentiation entry point for AI agents with zero risk to the domain.** This structured file helps AI systems navigate your content more efficiently than traditional methods.

| Factor | Impact and Data |
| :--- | :--- |
| **Implementation Cost** | Zero-risk; requires less than one hour to set up. |
| **Market Differentiation** | High; only ~10% of domains have implemented it, per Ahrefs. |

Whether llms.txt directly increases citation frequency in AI responses is still being studied. Even so, giving AI systems a clean, structured map of your most important pages costs little and carries no technical downside.

## Sources and Technical References

1. Gartner: Search Engine Volume Will Drop 25% by 2026
2. Stronger Content: Gartner Search Engine Volume Decrease
3. Ahrefs: AI Bot Block Rates
4. Superlines: AI Search Statistics
5. Ziptie.dev: Technical SEO for AI Crawlability
6. Playwire: AI Scraping vs. Traditional SEO Crawling
7. Vercel: The Rise of the AI Crawler
8. SearchEngineWorld: Tracking OpenAI ChatGPT Bots
9. OpenAI: Developer Documentation on Bots
10. Almcorp: Anthropic Claude Bots robots.txt Strategy
11. Lowtouch.ai: Cloudflare AI Data War
12. llmrefs.com: Cloudflare Blocks AI Crawlers
13. Searchviu: AI Crawlers JavaScript Rendering
14. Ahrefs: What Is llms.txt?
15. llmstxt.org: The llms.txt Standard

## Ready to See Your Real AI Traffic?

Websites frequently remain invisible to AI search bots even when `robots.txt` files are configured correctly. Most teams identify the actual problems preventing AI visibility through comprehensive CDN audits, rendering checks, and citation tracking. These specific diagnostic areas reveal why content fails to appear in AI engine results despite standard configuration attempts.

[Book a call with the Mersel AI team](/contact) to gain full visibility into your site's AI performance and crawler interactions. This consultation identifies:
*   Exactly which AI crawlers are reaching your site.
*   The specific prompts your buyers are using right now.
*   The barriers standing between your content and AI citations.

## Related Reading

- How to Translate Human Website Content for AI Crawlers
- Do I Need Code Changes for Generative Engine Optimization?
- How to Update Your Knowledge Graph for LLMs

## Related Posts

### What Is an AI Bot Crawler and How Is It Different From Googlebot?

**AI bot crawlers and Googlebot serve fundamentally different purposes regarding site taxonomy, behavior gaps, and optimization requirements.** [Read the full article here (GEO · Mar 18)](/blog/what-is-an-ai-bot-crawler). This guide details the specific differences in how these bots interact with your site and how to ensure your content is accessible to both traditional search and AI-driven engines.

### What Is Retrieval Augmented Generation? Plain-English Guide

**Retrieval Augmented Generation (RAG) is the core architecture powering modern AI answers and generative search results.** [Read the full article here (GEO · Mar 18)](/blog/what-is-retrieval-augmented-generation). This plain-English guide explains the mechanics of RAG, its critical importance for modern SEO, and the specific steps required to optimize your content for this architecture.

### Your Website Content Isn't Written for AI — Here's Why That Matters

**AI engines cite structured, direct-answer content 3× more often than standard prose, yet most websites currently score below 40/100 on AI citability.** [Read the full article here (GEO · May 7)](/blog/website-content-not-written-for-ai). Learn why traditional content strategies fail to meet the requirements of generative engines and how to fix these issues to improve your visibility in AI search results.

## Frequently Asked Questions

### What is the difference between GPTBot and OAI-SearchBot?
**GPTBot is OpenAI's crawler for collecting model-training data, while OAI-SearchBot is the crawler that surfaces websites in ChatGPT search results.** Blocking GPTBot opts your content out of OpenAI's training crawls without affecting your visibility in ChatGPT's user-facing answers, provided OAI-SearchBot remains allowed.
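A quick way to check this split before deploying it is to parse a candidate robots.txt with Python's standard-library `urllib.robotparser` and ask which user agents may fetch a page. The rules below are an illustrative policy (block GPTBot, allow OAI-SearchBot, leave everything else open), `can_crawl` is a hypothetical helper, and `example.com` is a placeholder. `urllib.robotparser` applies the first matching rule in a group rather than the longest-match precedence described in RFC 9309, so treat this as a sanity check, not an exact emulation of each crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: opt out of OpenAI model training (GPTBot) while keeping
# ChatGPT search (OAI-SearchBot) and every other crawler allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/pricing"  # placeholder URL
    for agent in ("GPTBot", "OAI-SearchBot", "Googlebot"):
        print(f"{agent:14} allowed={can_crawl(ROBOTS_TXT, agent, page)}")
```

Running a check like this against your live file after every CDN or CMS change is a cheap way to catch a rule that silently blocks a search crawler.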

### Why is JavaScript rendering a problem for AI bot visibility?
**Approximately 69% of AI crawlers cannot execute JavaScript, meaning they see a blank page if your site relies on client-side rendering frameworks like React or Vue.** To ensure AI bots can read and cite your content, you must implement server-side rendering (SSR) or provide a pre-rendered HTML version of your pages.
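To approximate what a non-rendering crawler receives, inspect the raw HTML without running any scripts. This minimal sketch uses Python's standard-library `html.parser` to collect text outside `<script>` and `<style>` elements. `visible_text` is a hypothetical helper, and the two inline documents are made-up stand-ins for a client-rendered shell and a server-rendered page; a real audit would fetch your own URLs instead.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(raw_html: str) -> str:
    """Text a crawler that does not execute JavaScript could read from raw_html."""
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join(extractor.chunks)


# Hypothetical client-side-rendered shell: content appears only after bundle.js runs.
CSR_SHELL = """<!doctype html><html><head><title></title>
<script src="/bundle.js"></script></head>
<body><div id="root"></div></body></html>"""

# Hypothetical server-rendered page: content is already present in the HTML.
SSR_PAGE = """<!doctype html><html><head><title>Pricing</title></head>
<body><main><h1>Pricing</h1><p>Compare plans and features.</p></main>
<script>window.__STATE__ = {};</script></body></html>"""

if __name__ == "__main__":
    print(repr(visible_text(CSR_SHELL)))
    print(repr(visible_text(SSR_PAGE)))
```

An empty result for a page that looks complete in the browser is a strong hint that its content only exists after JavaScript runs.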

### Does blocking AI training bots like Google-Extended hurt my SEO rankings?
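**No. Blocking training-specific user agents like Google-Extended or GPTBot has no measurable impact on your traditional Google Search rankings.** Google-Extended is a robots.txt control token rather than a separate crawler, and Google states that it does not affect inclusion or ranking in Google Search; GPTBot is an OpenAI crawler with no role in Google's results. Your position in search engine results pages (SERPs) depends on what Googlebot can crawl, which a Google-Extended or GPTBot rule does not restrict.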
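If you maintain several training opt-outs, generating the file from one list makes it harder to catch a search crawler by mistake. The sketch below is an illustrative assumption, not an official template: it emits a single group whose `User-agent` lines name the training tokens (RFC 9309 allows several `User-agent` lines per group), followed by a permissive default group, then confirms with `urllib.robotparser` that Googlebot is unaffected. `build_robots_txt` and `access_matrix` are hypothetical helper names.

```python
from urllib.robotparser import RobotFileParser

# Example opt-out tokens. Google-Extended is a control token, not a crawler:
# Googlebot still fetches the pages, and the token only governs certain AI uses.
TRAINING_TOKENS = ["GPTBot", "Google-Extended"]


def build_robots_txt(blocked_tokens: list[str]) -> str:
    """Disallow every listed token in one group; leave all other agents allowed."""
    lines: list[str] = []
    if blocked_tokens:
        lines += [f"User-agent: {token}" for token in blocked_tokens]
        lines += ["Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


def access_matrix(
    robots_txt: str, agents: list[str], url: str = "https://example.com/"
) -> dict[str, bool]:
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    robots = build_robots_txt(TRAINING_TOKENS)
    print(robots)
    print(access_matrix(robots, ["Googlebot", "Google-Extended", "GPTBot"]))
```

Because Googlebot never appears in the generated group, the only rule it matches is the open default group.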

### What is Generative Engine Optimization (GEO) and how does it work?
**Generative Engine Optimization (GEO) is the process of making website content accessible and citable by AI engines like ChatGPT, Gemini, and Perplexity.** It works by optimizing the infrastructure layer (access and rendering), the structural layer (schema and semantic HTML), and the content layer (answer-first formatting) to ensure AI models can easily extract and attribute your brand's data.

### How does AI Search Optimization differ from traditional SEO?
**Traditional SEO focuses on ranking in the "ten blue links" of search engines, while AI Search Optimization (GEO) focuses on earning citations within AI-generated summaries.** While SEO prioritizes keywords and backlinks for human clicks, GEO prioritizes structured data, server-side rendering, and direct-answer objects that AI agents can parse and recommend.

### Why is structured data optimization important for AI-driven search results?
**Structured data like JSON-LD gives AI bots an explicit map of entity relationships, reducing the need for the model to infer meaning from prose.** Implementing schema for Organizations, Products, and FAQs provides clear, machine-readable signals that AI systems can use when deciding which brands to cite in search results.
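As a minimal sketch of what that markup looks like, the snippet below builds a schema.org `Organization` object (`name`, `url`, `logo`, `sameAs`) and wraps it in a `<script type="application/ld+json">` tag. `organization_jsonld` is a hypothetical helper and every value passed to it is a placeholder; which properties a given AI system actually reads is not documented here, so treat this as a starting point rather than a citation guarantee.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, same_as: list[str]) -> str:
    """Serialize a schema.org Organization as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "sameAs": same_as,  # external profiles that identify the same entity
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the surrounding <script> early.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(organization_jsonld(
        "Example Co",                          # placeholder values throughout
        "https://example.com",
        "https://example.com/logo.png",
        ["https://www.linkedin.com/company/example"],
    ))
```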

### How do AI models select which brands to cite in search results?
**AI models select sources based on their readability, structural clarity, and the presence of authoritative "proof signals" like schema markup and direct answers.** Content that is formatted for AI extraction—specifically answer-first content with explicit entity definitions—is cited 3x more often than traditional long-form prose.

### What are the best practices for optimizing websites for AI readability?
**Best practices include using server-side rendering (SSR), implementing semantic HTML, and deploying an llms.txt file to guide AI agents.** SSR in particular ensures that the roughly 69% of AI crawlers that cannot execute JavaScript can still read your highest-value content, while semantic HTML and llms.txt make its structure and entity relationships easier to parse.
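llms.txt is a proposed convention (published at llmstxt.org) rather than a ratified standard, and support varies by AI provider. Its suggested layout is a Markdown file served at `/llms.txt` with an H1 site name, a blockquote summary, and H2 sections of link lists. The sketch below renders that layout from Python data; `Link` and `render_llms_txt` are hypothetical names, and the site name, summary, and URLs are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional one-line description shown after the link


def render_llms_txt(site: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render the llms.txt layout: H1 name, blockquote summary, H2 link lists."""
    out = [f"# {site}", "", f"> {summary}"]
    for heading, links in sections.items():
        out += ["", f"## {heading}", ""]
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            out.append(f"- [{link.title}]({link.url}){suffix}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Co",
        "B2B analytics platform.",
        {"Product": [Link("Pricing", "https://example.com/pricing", "Plans and limits")]},
    ))
```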

### How can B2B brands increase their AI citations?
**B2B brands can increase citations by building structured product pages and "vs" comparison pages that AI engines can easily quote.** According to Mersel AI data, comparison articles earn 32.5% of AI citations, making them a high-leverage strategy for appearing in vendor shortlists generated by LLMs.

### How does Mersel AI compare to Profound or AthenaHQ?
**Mersel AI is a managed execution service that deploys infrastructure and content, whereas Profound and AthenaHQ are primarily analytics dashboards for monitoring visibility.** Mersel AI is designed for teams that want to fix visibility gaps through automated SSR, schema deployment, and content publishing rather than just tracking them.

## About Mersel AI
Mersel AI provides a managed Generative Engine Optimization (GEO) platform that helps B2B businesses secure inbound leads from AI search engines. By deploying agent-optimized infrastructure and a dedicated citation content engine, Mersel AI ensures brands are recommended by ChatGPT, Gemini, and Perplexity without requiring internal engineering resources.

## Related Pages
- [What Is an AI Bot Crawler and How Is It Different From Googlebot?](/blog/what-is-an-ai-bot-crawler)
- [What Is Retrieval Augmented Generation? Plain-English Guide](/blog/what-is-retrieval-augmented-generation)
- [Your Website Content Isn't Written for AI — Here's Why That Matters](/blog/website-content-not-written-for-ai)
- [How to Measure AI Visibility: Mentions, Citations, and Share of Voice](/zh-TW/blog/how-to-measure-ai-visibility)
- [How to Appear in Google AI Overviews: Optimization Guide](/blog/how-to-appear-in-google-ai-overviews)

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://mersel.ai/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://mersel.ai/blog"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "How to Block or Allow AI Bots",
      "item": "https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website"
    }
  ]
}
```

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is the difference between GPTBot and OAI-SearchBot?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "GPTBot is OpenAI's crawler for collecting model-training data, while OAI-SearchBot is the crawler that surfaces websites in ChatGPT search results. Blocking GPTBot opts your content out of OpenAI's training crawls without affecting your visibility in ChatGPT's user-facing answers, provided OAI-SearchBot remains allowed."
      }
    },
    {
      "@type": "Question",
      "name": "Why is JavaScript rendering a problem for AI bot visibility?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Approximately 69% of AI crawlers cannot execute JavaScript, meaning they see a blank page if your site relies on client-side rendering frameworks like React or Vue. To ensure AI bots can read and cite your content, you must implement server-side rendering (SSR) or provide a pre-rendered HTML version of your pages."
      }
    },
    {
      "@type": "Question",
      "name": "Does blocking AI training bots like Google-Extended hurt my SEO rankings?",
      "acceptedAnswer": {
        "@type": "Answer",
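        "text": "No. Blocking training-specific user agents like Google-Extended or GPTBot has no measurable impact on your traditional Google Search rankings. Google-Extended is a robots.txt control token rather than a separate crawler, and Google states that it does not affect inclusion or ranking in Google Search; GPTBot is an OpenAI crawler with no role in Google's results. Your position in search engine results pages (SERPs) depends on what Googlebot can crawl, which a Google-Extended or GPTBot rule does not restrict."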
      }
    },
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization (GEO) and how does it work?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization (GEO) is the process of making website content accessible and citable by AI engines like ChatGPT, Gemini, and Perplexity. It works by optimizing the infrastructure layer (access and rendering), the structural layer (schema and semantic HTML), and the content layer (answer-first formatting) to ensure AI models can easily extract and attribute your brand's data."
      }
    },
    {
      "@type": "Question",
      "name": "How does AI Search Optimization differ from traditional SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Traditional SEO focuses on ranking in the \"ten blue links\" of search engines, while AI Search Optimization (GEO) focuses on earning citations within AI-generated summaries. While SEO prioritizes keywords and backlinks for human clicks, GEO prioritizes structured data, server-side rendering, and direct-answer objects that AI agents can parse and recommend."
      }
    },
    {
      "@type": "Question",
      "name": "Why is structured data optimization important for AI-driven search results?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Structured data like JSON-LD gives AI bots an explicit map of entity relationships, reducing the need for the model to infer meaning from prose. Implementing schema for Organizations, Products, and FAQs provides clear, machine-readable signals that AI systems can use when deciding which brands to cite in search results."
      }
    },
    {
      "@type": "Question",
      "name": "How do AI models select which brands to cite in search results?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI models select sources based on their readability, structural clarity, and the presence of authoritative \"proof signals\" like schema markup and direct answers. Content that is formatted for AI extraction\u2014specifically answer-first content with explicit entity definitions\u2014is cited 3x more often than traditional long-form prose."
      }
    },
    {
      "@type": "Question",
      "name": "What are the best practices for optimizing websites for AI readability?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Best practices include using server-side rendering (SSR), implementing semantic HTML, and deploying an llms.txt file to guide AI agents. SSR in particular ensures that the roughly 69% of AI crawlers that cannot execute JavaScript can still read your highest-value content, while semantic HTML and llms.txt make its structure and entity relationships easier to parse."
      }
    },
    {
      "@type": "Question",
      "name": "How can B2B brands increase their AI citations?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "B2B brands can increase citations by building structured product pages and \"vs\" comparison pages that AI engines can easily quote. According to Mersel AI data, comparison articles earn 32.5% of AI citations, making them a high-leverage strategy for appearing in vendor shortlists generated by LLMs."
      }
    },
    {
      "@type": "Question",
      "name": "How does Mersel AI compare to Profound or AthenaHQ?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Mersel AI is a managed execution service that deploys infrastructure and content, whereas Profound and AthenaHQ are primarily analytics dashboards for monitoring visibility. Mersel AI is designed for teams that want to fix visibility gaps through automated SSR, schema deployment, and content publishing rather than just tracking them."
      }
    }
  ]
}
```
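Because the FAQPage markup repeats the visible answers word for word, hand-editing both copies invites drift, and Markdown such as `**bold**` does not belong in schema text. One option, sketched here under the assumption that the FAQ is maintained as Markdown question/answer pairs, is to generate the JSON-LD from that single source and strip bold markers along the way. `faq_jsonld` is a hypothetical helper, not part of any library.

```python
import json
import re

# Matches **bold** spans so the markers can be removed while keeping the words.
_BOLD = re.compile(r"\*\*(.+?)\*\*", re.DOTALL)


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build schema.org FAQPage JSON-LD from (question, markdown_answer) pairs."""
    entities = [
        {
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {
                "@type": "Answer",
                # Plain text only: drop Markdown bold markers, keep the wording.
                "text": _BOLD.sub(r"\1", answer).strip(),
            },
        }
        for question, answer in pairs
    ]
    doc = {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities}
    return json.dumps(doc, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(faq_jsonld([("What is GEO?", "**Short answer.** More detail.")]))
```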

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Block AI Bots in robots.txt: GPTBot, ClaudeBot & More (2026)",
  "url": "https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website"
}
```