---
title: AI Bot robots.txt Guide: Block vs. Allow GPTBot & ClaudeBot | Mersel AI
site: Mersel AI
site_url: https://mersel.ai
description: A strategic guide on configuring robots.txt to protect intellectual property from AI training crawlers while maintaining visibility in AI search engines like ChatGPT and Perplexity.
page_type: blog
url: https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website
canonical_url: https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website
language: en
author: Mersel AI
breadcrumb: Home > Blog > AI Bot robots.txt Guide
date_modified: 2025-05-22
---

> AI-referred traffic converts 4.4x better than standard organic search, yet 27% of B2B SaaS and ecommerce sites accidentally block LLM crawlers at the CDN layer. With 69% of AI crawlers unable to execute JavaScript and traditional search volume projected to drop 25% by 2026, proper robots.txt configuration is critical for survival. Implementing a strategic framework can increase Category Share of Voice from 3.1% to 10.8% in as little as 92 days while protecting proprietary data from training crawlers.

# AI Bot robots.txt Guide: Block vs. Allow GPTBot & ClaudeBot

**Article Metadata**
* **Author:** Mersel AI Team
* **Date:** March 13, 2026
* **Reading Time:** 16 min read

**Block AI training crawlers and allow AI search crawlers.** This single distinction is the entire strategic framework. Blanket blocking removes your brand from ChatGPT and Perplexity results entirely, while blanket allowing hands your proprietary content to model training datasets with no attribution, no backlinks, and no referral traffic in return.

Active AI bots have doubled since August 2023, and Cloudflare, which protects roughly 20% of all websites globally, introduced one-click AI crawler blocking in 2024 and began blocking AI crawlers by default on new domains in 2025. Many technical SEO teams have perfectly configured `robots.txt` files that are being silently overridden at the CDN layer. The result is accidental invisibility in the exact AI systems your buyers use to build their vendor shortlists.

This guide provides the exact `robots.txt` configuration to implement today, a step-by-step process for auditing your CDN and rendering stack, and a clear framework for when to use `llms.txt` to further structure your content for AI extraction.

## Key Takeaways

| Focus Area | Key Finding | Source/Context |
| :--- | :--- | :--- |
| Bot Differentiation | GPTBot (training) and OAI-SearchBot (search) are distinct; blocking one has zero effect on the other. | OpenAI |
| CDN Blocking | 27% of B2B SaaS and ecommerce sites block LLM crawlers via CDN-level rules unknowingly. | ziptie.dev |
| JavaScript Rendering | 69% of AI crawlers cannot execute JavaScript, leading to blank pages for client-side rendered sites. | Vercel & MERJ |
| SEO & Visibility | Blocking GPTBot has no impact on Google rankings, but blocking OAI-SearchBot removes sites from ChatGPT. | Playwire |
| Traffic Conversion | AI-referred traffic converts 4.4x better than standard organic search results. | Superlines |
| llms.txt Adoption | Only 10% of domains utilize llms.txt to guide AI agents toward high-value content. | Ahrefs |

Training crawlers and search crawlers represent distinct bots from the same company, such as OpenAI's GPTBot for training and OAI-SearchBot for live search results. Blocking one bot has zero effect on the other. Research from ziptie.dev indicates that approximately 27% of B2B SaaS and ecommerce websites block major LLM crawlers through CDN-level rules, often without the site owner knowing it.

Technical rendering remains a significant hurdle as 69% of AI crawlers cannot execute JavaScript, according to research by Vercel and MERJ. Sites relying on client-side rendering appear as blank pages to these bots regardless of robots.txt settings. While blocking GPTBot has no measurable impact on Google Search rankings per Playwire publisher network analysis, blocking OAI-SearchBot entirely removes a brand from ChatGPT search answers.

AI-referred traffic provides a high-value pipeline source, converting 4.4x better than standard organic search according to data aggregated by Superlines. Despite this potential, Ahrefs reports that llms.txt adoption sits at only 10% of domains. Implementing an llms.txt file serves as a zero-risk, low-effort signal that effectively guides AI agents toward a brand's highest-value content.

## Why This Problem Keeps Getting Worse

**Gartner projects that traditional search engine volume will drop 25% by 2026 as generative AI platforms absorb informational queries, a shift already visible in current referral data.** Meanwhile, 60% of all Google searches currently end without a click, and organic click-through rates drop by up to 61% when a Google AI Overview appears for a query.

| Search Impact Metric | Data Point |
| :--- | :--- |
| Projected Search Volume Drop (by 2026) | 25% |
| Current Zero-Click Searches | 60% |
| Organic CTR Drop (with AI Overviews) | Up to 61% |

Buyers who click from AI-generated answers are significantly more qualified because they have already consumed an AI-curated summary and evaluated alternatives. You only capture this high-intent traffic if AI search bots can read and cite your content. Most organizations fail at this for three reasons that have nothing to do with content quality.

1. **Treating all AI bots as one entity.** Brand managers often add a blanket `Disallow: /` for every user agent with "AI" or "Bot" in the name after reading headlines about scrapers. This blocks `OAI-SearchBot` alongside `GPTBot`, removing the brand from ChatGPT's live search results entirely.
2. **CDN overrides of `robots.txt` configurations.** Cloudflare's AI blocking feature operates at the edge, returning a 403 Forbidden error to AI crawlers before the request reaches the origin server. A perfectly configured `robots.txt` is irrelevant when the firewall never lets the bot through.
3. **Invisibility due to rendering limitations.** Major AI crawlers do not execute JavaScript, unlike Googlebot, which runs a full Chromium engine. A React or Vue single-page application delivers a blank `<body>` to AI bots, so your content effectively does not exist for them. To understand how bots discover pages, see our guide on [what an AI bot crawler actually is and how it works](/blog/what-is-an-ai-bot-crawler).

## The Core Framework: Training Crawlers vs. Search Crawlers

**Major AI companies operate at least two distinct crawlers with completely separate functions to manage content ingestion and search visibility.** Training crawlers absorb content into model weights with no attribution, whereas search crawlers retrieve live content to cite in user-facing answers. Blocking the wrong category produces the opposite of the intended effect. Confusing these two systems is the root cause of most AI visibility failures.

**OpenAI uses OAI-SearchBot to surface websites in ChatGPT search results, while GPTBot crawls content for model training.** Sites that opt out of `OAI-SearchBot` will not be shown in ChatGPT search answers. OpenAI confirms that blocking `GPTBot` is entirely independent from search visibility. Technical documentation from xseek.io notes that webmasters can block `GPTBot` to protect intellectual property while allowing `OAI-SearchBot` to remain visible.

### AI Crawler Classification and Recommended Actions

| User-Agent | Purpose | Recommended Action |
| :--- | :--- | :--- |
| OAI-SearchBot | OpenAI Search & Retrieval | Allow |
| ChatGPT-User | User-Triggered Fetches | Allow |
| Claude-User | Anthropic Real-Time Fetches | Allow |
| Claude-SearchBot | Anthropic Search | Allow |
| PerplexityBot | Perplexity AI Search | Allow |
| YouBot | You.com Search | Allow |
| GPTBot | OpenAI Model Training | Block |
| ClaudeBot | Anthropic Model Training | Block |
| Google-Extended | Google Generative AI Training | Block |
| CCBot | Common Crawl (Open-source LLMs) | Block |
| Meta-ExternalAgent | Meta/Facebook Training | Block |
| FacebookBot | Meta/Facebook Training | Block |
| Bytespider | ByteDance/TikTok Training | Block |
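
One way to keep this table and the deployed file in sync is to treat the table as data and generate the rules from it. The sketch below is a minimal illustration in Python, not an official tool: the `CRAWLER_POLICY` dictionary and the `render_robots_txt` helper are hypothetical names, and only a subset of the table is copied in to keep the example short.

```python
# Hypothetical policy map: a subset of the classification table above.
CRAWLER_POLICY = {
    "OAI-SearchBot": "allow",
    "ChatGPT-User": "allow",
    "PerplexityBot": "allow",
    "GPTBot": "block",
    "ClaudeBot": "block",
    "CCBot": "block",
}


def render_robots_txt(policy):
    """Return robots.txt text with one group per user agent and a wildcard group."""
    lines = []
    for agent, action in policy.items():
        if action not in ("allow", "block"):
            raise ValueError(f"Unknown action for {agent}: {action}")
        rule = "Allow: /" if action == "allow" else "Disallow: /"
        # A blank line after each group keeps the groups visually separate.
        lines += [f"User-agent: {agent}", rule, ""]
    # Everything not named above (including standard search engines) stays allowed.
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(CRAWLER_POLICY))
```

Generating the file from a single map makes it harder for a search crawler to end up in the blocked group by accident, because each user agent appears exactly once.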

### Step 1: Configure Your `robots.txt` with Selective Access

**Configure your robots.txt file at the domain root to explicitly separate search bots from training bots.** This file must be placed at `https://yourdomain.com/robots.txt`. Changes typically take approximately 24 hours for OpenAI's systems to process and adjust search behavior. For Anthropic, avoid using deprecated strings like `Claude-Web` and `anthropic-ai`, as these no longer block the active `ClaudeBot`.

```
# --------------------------------------------------------
# 1. ALLOW AI Search & Retrieval (For GEO / Visibility)
# --------------------------------------------------------

# OpenAI Search and User-Triggered Fetches
User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic Real-Time Fetches
User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

# Perplexity AI Search
User-agent: PerplexityBot
Allow: /

# You.com Search
User-agent: YouBot
Allow: /

# --------------------------------------------------------
# 2. BLOCK AI Bulk Training Data Crawlers (IP Protection)
# --------------------------------------------------------

# OpenAI Training
User-agent: GPTBot
Disallow: /

# Anthropic Training
User-agent: ClaudeBot
Disallow: /

# Google Generative AI Training (Does not impact Googlebot)
User-agent: Google-Extended
Disallow: /

# Common Crawl (Used by many open-source LLMs)
User-agent: CCBot
Disallow: /

# Meta/Facebook Training
User-agent: Meta-ExternalAgent
Disallow: /

User-agent: FacebookBot
Disallow: /

# ByteDance/TikTok
User-agent: Bytespider
Disallow: /

# --------------------------------------------------------
# 3. Standard Search Engines (Unchanged)
# --------------------------------------------------------
User-agent: *
Allow: /
```
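
Before deploying, it can help to check that the file behaves the way you expect for each user agent. The sketch below uses Python's standard-library `urllib.robotparser` against a shortened copy of the rules above. The `access_report` helper and the `example.com` URL are placeholders, and Python's parser is only an approximation of how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the configuration above, used only for this check.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def access_report(robots_txt, agents, url):
    """Map each user agent to True (may fetch url) or False (disallowed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["OAI-SearchBot", "PerplexityBot", "GPTBot", "ClaudeBot", "Googlebot"]
    for agent, allowed in access_report(ROBOTS_TXT, agents, "https://example.com/pricing").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A passing check only confirms the file's logic. It says nothing about whether a CDN or firewall lets the crawler reach the file in the first place, which is the subject of the next step.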

### Step 2: Audit and Disable CDN-Level AI Blocking

**Audit and disable CDN-level AI blocking to prevent your robots.txt settings from being silently overridden.** This oversight accounts for a significant portion of accidental AI invisibility. Cloudflare users should navigate to Security > Bots and set "Block AI training bots" to allow, or use WAF rules to allowlist `OAI-SearchBot` and `PerplexityBot`. Disable the "Manage your robots.txt" setting in Cloudflare to ensure the origin server's file takes precedence.

### Audit CDN and Managed Platform Blocks

Approximately 27% of B2B SaaS and ecommerce websites accidentally block major LLM crawlers at the CDN layer according to research by ziptie.dev. If your site utilizes Cloudflare, Fastly, or managed SaaS platforms like Shopify or Wix for edge security, you must audit these settings immediately. Do not assume a functional `robots.txt` file ensures access if the CDN layer is actively filtering bot traffic.
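
A rough first-pass audit is to request a page while presenting each crawler's user agent token and compare the status codes. The Python sketch below assumes a few things that are not from any vendor documentation: the helper names are hypothetical, the bare token is sent as the `User-Agent` header (real crawlers send longer strings), and status codes 401, 403, and 429 are treated as "blocked". Because many CDNs also verify crawler IP addresses, a 200 response here is not proof that the real bot gets through, but a 403 is a strong hint that an edge rule is in the way.

```python
import urllib.error
import urllib.request

# Search and retrieval crawlers that should normally be able to reach the site.
AI_SEARCH_AGENTS = ["OAI-SearchBot", "PerplexityBot", "Claude-User"]


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return err.code


def classify(status):
    """Translate a status code into a coarse audit label."""
    if status == 200:
        return "ok"
    if status in (401, 403, 429):
        return "blocked"
    return f"check ({status})"


def audit(url, agents=AI_SEARCH_AGENTS, fetch=fetch_status):
    """Probe url once per user agent and label each result."""
    return {agent: classify(fetch(url, agent)) for agent in agents}


if __name__ == "__main__":
    # Replace with a page on your own domain before running.
    print(audit("https://example.com/"))
```

Running the same probe with a normal browser user agent gives a baseline: if the browser request returns 200 and a crawler token returns 403, the block is specific to bots.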

### Step 3: Verify Bot Authentication Against IP Ranges

Malicious scrapers frequently spoof user agent strings, meaning a sophisticated `robots.txt` file alone cannot provide a complete defense against unauthorized harvesting. Both OpenAI and Anthropic publish JSON feeds of their legitimate IP address ranges to help administrators distinguish real bots from spoofed requests. You can integrate these feeds into WAF configurations or bot management platforms to authenticate legitimate AI search crawlers.

| Provider | Verification Feed | Application |
| :--- | :--- | :--- |
| OpenAI | `openai.com/gptbot.json` | Authenticate legitimate GPTBot crawlers |
| OpenAI | `openai.com/searchbot.json` | Authenticate legitimate OAI-SearchBot crawlers |
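
Once a feed is downloaded, checking a request's source address against it is a small amount of code. The sketch below is illustrative only: it assumes the feed is a JSON object with a `prefixes` list whose entries carry an `ipv4Prefix` or `ipv6Prefix` key, so confirm the actual shape of the published file before relying on it. The sample feed uses reserved documentation address ranges, not real crawler IPs, and the helper names are hypothetical.

```python
import ipaddress
import json

# Made-up feed using documentation-only ranges (RFC 5737); not real crawler IPs.
SAMPLE_FEED = json.dumps({
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv4Prefix": "198.51.100.0/28"},
    ]
})


def load_networks(feed_json):
    """Parse a feed (assumed shape, see above) into a list of IP networks."""
    feed = json.loads(feed_json)
    networks = []
    for entry in feed.get("prefixes", []):
        for key in ("ipv4Prefix", "ipv6Prefix"):
            if key in entry:
                networks.append(ipaddress.ip_network(entry[key]))
    return networks


def is_verified(ip, networks):
    """True when ip falls inside at least one published network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


if __name__ == "__main__":
    networks = load_networks(SAMPLE_FEED)
    for ip in ("192.0.2.44", "203.0.113.9"):
        label = "verified" if is_verified(ip, networks) else "unverified (possible spoof)"
        print(ip, label)
```

A request that claims to be `OAI-SearchBot` but arrives from an address outside the published ranges can then be treated as a spoof, regardless of its user agent string.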

### Step 4: Fix the JavaScript Rendering Problem

Research by Vercel and MERJ reveals that 69% of AI crawlers cannot execute JavaScript, making client-side rendering a significant barrier to visibility. If your marketing site, product pages, or blog rely on React, Vue, or Angular for client-side rendering, an AI crawler may only see a blank `<body>`. This invisibility persists regardless of your `robots.txt` configuration, preventing your content from being indexed or cited.

| Research Source | Metric | Impact |
| :--- | :--- | :--- |
| ziptie.dev | 27% of B2B SaaS/Ecommerce | Accidental blocking at the CDN layer |
| Vercel & MERJ | 69% of AI Crawlers | Inability to execute JavaScript |
| Ahrefs | 10% of Domains | Current adoption rate of `llms.txt` |

Server-side rendering (SSR) provides the necessary fix for JavaScript execution limitations. Frameworks like Next.js and Nuxt deliver fully rendered HTML in the initial response, allowing AI crawlers to parse content as simple HTTP clients. To further optimize for AI visibility, implement the following structural elements:

*   **Semantic HTML Structure:** Use tags like `<article>`, `<header>`, `<main>`, and `<section>` instead of nested `<div>` structures.
*   **JSON-LD Schema Markup:** Implement markup for Organization, Product, FAQPage, and Article types to provide an explicit map of entity relationships.
*   **Technical Walkthrough:** For a complete guide on these requirements, see our article on [how to structure your website for AI visibility](/blog/how-to-structure-my-website-for-ai-visibility).
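
To approximate what a crawler without a JavaScript engine sees, you can extract the visible text from the raw HTML response and check whether anything is there. The sketch below uses Python's standard-library `html.parser`. The class and function names are hypothetical, and the two HTML strings are simplified stand-ins for a client-side rendered shell and a server-rendered page.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text a non-JavaScript client could read from raw HTML."""

    # Content inside these elements is not page copy.
    SKIPPED = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html):
    """Return the readable text in html, ignoring scripts and styles."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Simplified examples: an empty SPA shell versus server-rendered markup.
SPA_SHELL = '<html><head><title>App</title></head><body><div id="root"></div><script>render()</script></body></html>'
SSR_PAGE = "<html><body><main><article><h1>Pricing</h1><p>Plans start at $10.</p></article></main></body></html>"

if __name__ == "__main__":
    print(repr(visible_text(SPA_SHELL)))
    print(repr(visible_text(SSR_PAGE)))
```

If the text extracted from your live pages is empty or contains only navigation labels, the content is being injected by JavaScript and is likely invisible to most AI crawlers.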

### Step 5: Deploy an llms.txt File

Deploying an `llms.txt` file is a low-effort, zero-risk strategy that guides AI agents toward your highest-value pages. Place this Markdown file at `yourdomain.com/llms.txt` to provide clear instructions for AI crawlers. While direct citation correlation is still under study, Ahrefs reports that only 10% of domains have adopted this standard, making early implementation a powerful differentiation signal for your brand.

```markdown
# [Brand Name] - AI Agent Documentation

> [Brand Name] is a leading provider of [Category] for [Target Audience].

## Core Products
- [Product A](/product-a): Use case description.
- [Product B](/product-b): Use case description.

## Key Comparisons and Use Cases
- [Brand vs. Competitor](/comparisons/competitor)
- [Use Cases](/use-cases)

## Contact
- [Pricing](/pricing)
- [Sales](/contact)
```

A secondary `llms-full.txt` file concatenates all critical documentation into a single machine-readable file, which is particularly useful for AI agents operating within limited context windows. This consolidated file ensures that AI bots can ingest comprehensive site data efficiently without losing context during the retrieval process.
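
Building the consolidated file can be as simple as joining your key pages under one heading each. The Python sketch below is one possible layout rather than a required format: the `build_llms_full` helper, the `Source:` line, and the sample pages are all assumptions made for illustration.

```python
def build_llms_full(title, pages):
    """Concatenate pages into one Markdown document.

    pages is a list of (heading, source_url, markdown_body) tuples.
    """
    parts = [f"# {title}", ""]
    for heading, url, body in pages:
        # Each page becomes its own section, with its origin recorded for attribution.
        parts += [f"## {heading}", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts).rstrip() + "\n"


if __name__ == "__main__":
    # Placeholder content; in practice these bodies would come from your CMS or docs.
    sample_pages = [
        ("Pricing", "/pricing", "Plans start at $10.\n"),
        ("Contact", "/contact", "Email sales."),
    ]
    print(build_llms_full("Example Co", sample_pages))
```

Regenerating the file whenever the underlying pages change keeps it from drifting out of date, which matters more for a single consolidated document than for a short index.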

The technical optimization sequence flows from access to rendering to structure, as each layer depends on the successful implementation of the one before it. This hierarchy ensures that bots can reach, read, and understand site content without being hindered by technical barriers:
- **CDN Access:** You cannot benefit from `llms.txt` if AI bots are blocked at the CDN before they reach it.
- **JavaScript Rendering:** You cannot benefit from well-structured schema if the JavaScript rendering layer is hiding your content from bots.
- **Robots.txt:** You cannot benefit from any optimizations if your `robots.txt` is blocking the search crawlers you need to allow.

This infrastructure work sits at the core of generative engine optimization. For a broader view of how these signals combine to drive AI citation visibility, the Mersel AI guide on [generative engine optimization](https://www.mersel.ai/generative-engine-optimization) covers the full framework.

## When DIY Implementation Falls Short

Copying a `robots.txt` configuration is straightforward, but the subsequent technical execution presents significant challenges for most organizations. Implementing a complete AI crawler strategy requires deep technical integration that extends far beyond simple file edits, involving complex server-side adjustments and ongoing maintenance to ensure visibility.

*   **CDN audit depth requires backend engineering expertise to identify silent blocks at the edge.** Most marketing teams lack direct access to Cloudflare WAF rules or knowledge of specific managed security rules running at the edge. Confirming that a 403 error is silently blocking `PerplexityBot` necessitates server-level logging and specialized technical oversight to resolve.
*   **Transitioning from client-side rendering to Server-Side Rendering (SSR) is a major development project rather than a simple configuration change.** Organizations with active sprint backlogs often deprioritize this essential architecture work due to limited engineering bandwidth. This neglect leaves the entire content investment invisible to AI crawlers that cannot execute complex JavaScript.
*   **Maintaining an accurate list of active AI user agents requires constant monitoring as bot strings change frequently.** For example, Anthropic deprecated `Claude-Web` without a broad announcement, and new crawlers emerge as AI platforms expand search features. Most SEO teams lack the established processes necessary to keep blocklists and allowlists current.
*   **Verifying system performance requires a closed-loop monitoring process involving server logs and traffic analytics.** Teams must review server logs for bot-specific 200 versus 403 response codes, cross-reference data against AI citation tracking, and monitor AI referral traffic within GA4. Without these checks, organizations often assume their configuration functions correctly while AI bots remain silently blocked.
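
The log review described in the last point can start as a small script that counts response codes per AI crawler. The sketch below assumes access logs in the common "combined" format, where the user agent is the final quoted field. The helper name and the sample lines are made up for illustration, and the user agent strings in them are simplified.

```python
import re
from collections import Counter

AI_BOTS = ["OAI-SearchBot", "ChatGPT-User", "PerplexityBot", "Claude-User", "GPTBot", "ClaudeBot"]

# Matches: "METHOD path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"\S+ \S+ \S+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_statuses(lines, bots=AI_BOTS):
    """Count (bot, status) pairs for log lines whose user agent names a known bot."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        agent = match.group("agent").lower()
        for bot in bots:
            if bot.lower() in agent:
                counts[(bot, int(match.group("status")))] += 1
                break
    return counts


# Made-up sample lines for demonstration; not real traffic.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '192.0.2.11 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 403 312 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.12 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0) Chrome/122.0"',
]

if __name__ == "__main__":
    for (bot, status), total in sorted(count_bot_statuses(SAMPLE_LOG).items()):
        print(f"{bot} {status}: {total}")
```

A crawler that shows only 403 responses, or never appears in the origin logs at all, is usually being stopped at the CDN before the request reaches your server.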

## The Managed Path: What Full-Stack AI Crawler Optimization Looks Like

Mersel AI bridges the gap between `robots.txt` configuration and production visibility by deploying an infrastructure layer behind existing websites. This system delivers a clean, server-side rendered, schema-rich version of the brand to AI crawlers like `OAI-SearchBot` and `PerplexityBot`. Human visitors experience no changes, and existing SEO, design, and UX remain untouched without requiring engineering sprints.

* **Explicit Entity Definitions**: Product relationships are mapped using JSON-LD.
* **Maintained llms.txt**: The system configures and maintains the `llms.txt` file automatically.
* **Server-Side Rendering**: AI bots receive optimized, high-fidelity data versions of the site.

Mersel AI is a fully managed service rather than a self-serve dashboard, which is an honest limitation for teams requiring direct control. It is specifically built for those who want infrastructure and content published without pulling engineers or content managers into a new discipline they do not have bandwidth to own. Organizations requiring direct UI access for real-time prompt monitoring may find other platforms more suitable.

| Feature | Mersel AI | Profound / AthenaHQ |
| :--- | :--- | :--- |
| Service Model | Fully Managed Service | Self-Serve Dashboard |
| Primary Focus | Infrastructure & Content Publishing | Real-Time Prompt Monitoring |
| Access Type | Managed Deployment | Direct UI Access |

The Mersel content engine maps the actual prompts buyers are typing, such as "best alternative to [competitor] for a Series A SaaS company," to deliver publish-ready articles directly into the CMS. This engine utilizes a feedback loop from Google Search Console and GA4 to ensure articles are updated based on actual citation performance rather than assumptions.

A mid-market B2B fintech client (unified finance OS) with approximately 20 employees increased its Category Share of Voice from 3.1% to 10.8% in 92 days. This managed approach generated 94 AI citations across competitive prompts and influenced 20% of all demo requests via AI search. Further details on pipeline translation are available in the [AI traffic analysis](/blog/how-to-measure-ai-visibility) guide.

## FAQ

**Does blocking GPTBot hurt my Google Search rankings?**

**No, blocking GPTBot has no impact on your Google Search rankings because it is an OpenAI training crawler entirely separate from Googlebot.** Publisher network analysis reviewed by Playwire confirms that Google rankings are determined exclusively by Googlebot's crawl and Google's specific ranking algorithm. You can block `GPTBot` and `Google-Extended` simultaneously while maintaining full Google Search visibility.

| Crawler | Function | Google Ranking Impact |
| :--- | :--- | :--- |
| `GPTBot` | OpenAI Training | None |
| `Google-Extended` | Google AI Training | None |
| `Googlebot` | Search Indexing | Direct |

**What happens if I block OAI-SearchBot by accident?**

**Sites that are opted out of OAI-SearchBot will not be shown in ChatGPT search answers, according to OpenAI's developer documentation.** This exclusion occurs even if `GPTBot` has previously crawled and indexed your content for model training purposes. Because these two systems operate independently, accidental blocking of `OAI-SearchBot` represents one of the most common and highest-impact AI visibility errors for modern brands.

**How do I know if my Cloudflare settings are blocking AI search bots?**

**You can identify AI search bot blocking by checking the "Control AI Crawlers" section in the Cloudflare Security dashboard and reviewing server logs for 403 response codes.** Research from ziptie.dev indicates that approximately 27% of B2B SaaS and ecommerce sites unknowingly block major LLM crawlers at the CDN layer. Perform a high-priority audit for the following crawlers even if your `robots.txt` is correctly configured:

*   `OAI-SearchBot`
*   `PerplexityBot`
*   `Claude-User`

**Do AI bots respect robots.txt at all?**

**Major AI companies like OpenAI and Anthropic publicly commit to honoring robots.txt directives for their named crawlers and provide JSON feeds of legitimate IP address ranges for verification.** While these reputable firms follow the honor system, malicious scrapers frequently spoof user agent strings and ignore `robots.txt` entirely. For robust content protection, utilize bot management platforms and WAF-level IP range authentication rather than relying solely on `robots.txt` directives.

**Is llms.txt worth implementing if adoption is still low?**

**Yes, implementing llms.txt is a zero-risk, low-effort task that takes less than an hour and provides a structured entry point for AI agents and LLM-powered search tools.** According to Ahrefs, only 10% of domains have currently implemented `llms.txt`, making it a meaningful differentiation signal for your site. While the direct correlation to increased citation frequency is still under study, providing AI systems with a clean map of your content hierarchy has no downside.

1. **Efficiency**: Zero-risk implementation requiring less than one hour of setup.
2. **Accessibility**: Serves as a structured entry point for evolving AI agents.
3. **Differentiation**: Signals technical readiness to AI engines ahead of 90% of competitors.

## Sources

1. **Gartner**: Search Engine Volume Will Drop 25% by 2026
2. **Stronger Content**: Gartner Search Engine Volume Decrease
3. **Ahrefs**: AI Bot Block Rates
4. **Superlines**: AI Search Statistics
5. **Ziptie.dev**: Technical SEO for AI Crawlability
6. **Playwire**: AI Scraping vs. Traditional SEO Crawling
7. **Vercel**: The Rise of the AI Crawler
8. **SearchEngineWorld**: Tracking OpenAI ChatGPT Bots
9. **OpenAI**: Developer Documentation on Bots
10. **Almcorp**: Anthropic Claude Bots robots.txt Strategy
11. **Lowtouch.ai**: Cloudflare AI Data War
12. **llmrefs.com**: Cloudflare Blocks AI Crawlers
13. **Searchviu**: AI Crawlers JavaScript Rendering
14. **Ahrefs**: What Is llms.txt?
15. **llmstxt.org**: The llms.txt Standard

## Ready to See Your Real AI Traffic?

**Mersel AI identifies the specific technical barriers preventing AI search bots from citing your content, even when your robots.txt is configured correctly.** Most teams find the actual problem only through targeted technical checks, because the configuration files themselves appear to be working. These checks include:

* CDN audit
* Rendering check
* Citation tracking

[Book a call with the Mersel AI team](/contact) to see exactly which AI crawlers are reaching your site, which prompts your buyers are using right now, and what is standing between your content and AI citations.

## Frequently Asked Questions

### Does blocking GPTBot hurt my Google Search rankings?
**No, blocking GPTBot has no measurable impact on Google Search rankings because it is a training crawler entirely separate from Googlebot.** Your Google visibility is determined by Google's own ranking algorithm and crawl, which remains unaffected by directives targeting OpenAI's training bots.

### What happens if I block OAI-SearchBot by accident?
**Accidentally blocking OAI-SearchBot will remove your website from ChatGPT's live search results entirely.** While GPTBot is used for model training, OAI-SearchBot powers real-time retrieval; opting out of the latter ensures your brand is never cited in ChatGPT search answers.

### How do I know if my Cloudflare settings are blocking AI search bots?
**You can verify if Cloudflare is blocking bots by checking the Security > Bots section of your dashboard and reviewing server logs for 403 Forbidden errors.** Research shows that 27% of sites unknowingly block AI crawlers at the CDN layer, which overrides any permissions set in your robots.txt file.

### Do AI bots respect robots.txt directives?
**Major AI companies like OpenAI and Anthropic publicly commit to honoring robots.txt directives for their specific named crawlers.** However, because robots.txt is an honor system, it is recommended to use WAF-level IP range authentication to block malicious scrapers that spoof user agent strings.

### Is llms.txt worth implementing if adoption is still low?
**Yes, llms.txt is a zero-risk, low-effort signal that guides AI agents to your highest-value content, even with current adoption at only 10%.** It provides a structured entry point for LLM-powered search tools to understand your site's hierarchy and core product offerings.

### Why can't AI crawlers read my React or Vue website?
**Approximately 69% of AI crawlers cannot execute JavaScript, meaning they only see a blank page when visiting client-side rendered sites.** To be visible to AI bots, websites must implement server-side rendering (SSR) to deliver fully rendered HTML in the initial response.

### What is Generative Engine Optimization and how does it work?
**Generative Engine Optimization (GEO) is a framework for ensuring brand visibility in AI search results by optimizing site infrastructure and content for LLM retrieval.** It involves a sequence of technical improvements including selective robots.txt access, server-side rendering, and the implementation of schema markup and llms.txt files.

### How do I monitor brand mentions across leading AI platforms?
**Brand mentions can be monitored using AI visibility analytics that track which AI platforms visit your site and cite your content.** This involves reviewing server logs for specific bot activity and using managed services to cross-reference citations across platforms like ChatGPT and Perplexity.

### How do I write FAQs that are frequently cited by AI models?
**To earn AI citations, FAQs should use FAQPage JSON-LD schema markup and provide direct, authoritative answers to high-intent buyer prompts.** Structuring content this way helps AI bots extract key facts and entity relationships more efficiently than through standard prose.
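
As a minimal illustration, the Python sketch below builds FAQPage markup from question and answer pairs using the schema.org `FAQPage`, `Question`, and `Answer` types. The `faq_page_jsonld` helper is a hypothetical name, and the sample answer text is shortened for the example.

```python
import json


def faq_page_jsonld(qa_pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


markup = faq_page_jsonld([
    (
        "Does blocking GPTBot hurt my Google Search rankings?",
        "No. GPTBot is a training crawler that is separate from Googlebot.",
    ),
])

if __name__ == "__main__":
    # Embed the printed JSON in a <script type="application/ld+json"> tag on the page.
    print(json.dumps(markup, indent=2))
```

Keeping the markup text identical to the visible answer on the page avoids a mismatch between what users read and what crawlers extract.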

### How do AI models select which brands to cite in search results?
**AI models prioritize brands that provide accessible, server-side rendered content and clear entity definitions via structured data.** Visibility is also influenced by the presence of machine-readable files like llms.txt which help AI agents navigate context-limited windows.

### How does AI Search Optimization differ from traditional SEO?
**AI Search Optimization focuses on bot-specific accessibility and RAG (Retrieval-Augmented Generation) compatibility rather than just keyword rankings.** Unlike traditional SEO, it requires managing separate crawlers for training versus search and solving for the fact that most AI bots cannot process JavaScript.

### How do you measure the impact of AI citations on organic traffic?
**The impact of AI citations is measured by tracking AI referral traffic in GA4 and monitoring changes in Category Share of Voice.** Data indicates that users arriving via AI-generated answers are more qualified, leading to conversion rates 4.4x higher than standard search.
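
Both measurements reduce to simple arithmetic once the data is exported. The sketch below assumes session rows with a `referrer` URL and a `converted` flag (a hypothetical export shape, not a GA4 API), an illustrative list of AI referrer hostnames, and a working definition of Category Share of Voice as the percentage of tracked AI answers that mention the brand.

```python
from urllib.parse import urlparse

# Illustrative hostnames; extend to match the platforms you track.
AI_REFERRER_HOSTS = {"chatgpt.com", "perplexity.ai"}


def is_ai_referral(referrer_url):
    """True when the referrer hostname belongs to a tracked AI platform."""
    host = (urlparse(referrer_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in AI_REFERRER_HOSTS


def conversion_rate(sessions):
    """Fraction of sessions that converted; 0.0 for an empty list."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if s["converted"]) / len(sessions)


def conversion_lift(sessions):
    """Ratio of AI-referred conversion rate to all other traffic, or None."""
    ai = [s for s in sessions if is_ai_referral(s["referrer"])]
    other = [s for s in sessions if not is_ai_referral(s["referrer"])]
    other_rate = conversion_rate(other)
    if other_rate == 0:
        return None  # no baseline to compare against
    return conversion_rate(ai) / other_rate


def share_of_voice(brand_mentions, total_answers):
    """Percentage of tracked AI answers that mention the brand (assumed definition)."""
    if total_answers == 0:
        return 0.0
    return 100 * brand_mentions / total_answers
```

A lift of 4.4 from `conversion_lift` would correspond to the 4.4x figure cited above. Small sample sizes make the ratio unstable, so compare over a long enough window.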

### How do you optimize content for Google AI Overviews?
**Optimizing for Google AI Overviews requires using semantic HTML and JSON-LD schema to make content easily extractable for Google's generative engine.** This is a vital defensive strategy as organic click-through rates can drop by 61% when an AI Overview is present for a query.

### How does Mersel AI compare to Semrush?
**Mersel AI is a managed infrastructure service that automates AI visibility and rendering, whereas Semrush is a self-serve tool for manual SEO monitoring.** Mersel AI handles the technical deployment of agent-optimized pages and CDN audits, which are not features of traditional SEO dashboards.

## About Mersel AI

Mersel AI helps brands get discovered and recommended by AI search engines. It specializes in AI-driven search optimization, working to make brands more visible in AI-generated answers.

```json
{
  "@context": "https://schema.org",
  "@type": "BreadcrumbList",
  "itemListElement": [
    {
      "@type": "ListItem",
      "position": 1,
      "name": "Home",
      "item": "https://mersel.ai/"
    },
    {
      "@type": "ListItem",
      "position": 2,
      "name": "Blog",
      "item": "https://mersel.ai/blog"
    },
    {
      "@type": "ListItem",
      "position": 3,
      "name": "AI Bot robots.txt Guide",
      "item": "https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website"
    }
  ]
}
```

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does blocking GPTBot hurt my Google Search rankings?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No, blocking GPTBot has no measurable impact on Google Search rankings because it is a training crawler entirely separate from Googlebot. Your Google visibility is determined by Google's own ranking algorithm and crawl, which remains unaffected by directives targeting OpenAI's training bots."
      }
    },
    {
      "@type": "Question",
      "name": "What happens if I block OAI-SearchBot by accident?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Accidentally blocking OAI-SearchBot will remove your website from ChatGPT's live search results entirely. While GPTBot is used for model training, OAI-SearchBot powers real-time retrieval; opting out of the latter ensures your brand is never cited in ChatGPT search answers."
      }
    },
    {
      "@type": "Question",
      "name": "How do I know if my Cloudflare settings are blocking AI search bots?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "You can verify if Cloudflare is blocking bots by checking the Security > Bots section of your dashboard and reviewing server logs for 403 Forbidden errors. Research shows that 27% of sites unknowingly block AI crawlers at the CDN layer, which overrides any permissions set in your robots.txt file."
      }
    },
    {
      "@type": "Question",
      "name": "Do AI bots respect robots.txt directives?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Major AI companies like OpenAI and Anthropic publicly commit to honoring robots.txt directives for their specific named crawlers. However, because robots.txt is an honor system, it is recommended to use WAF-level IP range authentication to block malicious scrapers that spoof user agent strings."
      }
    },
    {
      "@type": "Question",
      "name": "Is llms.txt worth implementing if adoption is still low?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, llms.txt is a zero-risk, low-effort signal that guides AI agents to your highest-value content, even with current adoption at only 10%. It provides a structured entry point for LLM-powered search tools to understand your site's hierarchy and core product offerings."
      }
    },
    {
      "@type": "Question",
      "name": "Why can't AI crawlers read my React or Vue website?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Approximately 69% of AI crawlers cannot execute JavaScript, meaning they only see a blank page when visiting client-side rendered sites. To be visible to AI bots, websites must implement server-side rendering (SSR) to deliver fully rendered HTML in the initial response."
      }
    },
    {
      "@type": "Question",
      "name": "What is Generative Engine Optimization and how does it work?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Generative Engine Optimization (GEO) is a framework for ensuring brand visibility in AI search results by optimizing site infrastructure and content for LLM retrieval. It involves a sequence of technical improvements including selective robots.txt access, server-side rendering, and the implementation of schema markup and llms.txt files."
      }
    },
    {
      "@type": "Question",
      "name": "How do you monitor brand mentions across leading AI platforms?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Brand mentions can be monitored using AI visibility analytics that track which AI platforms visit your site and cite your content. This involves reviewing server logs for specific bot activity and using managed services to cross-reference citations across platforms like ChatGPT and Perplexity."
      }
    },
    {
      "@type": "Question",
      "name": "How do you write FAQs that AI models cite frequently?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "To earn AI citations, FAQs should use FAQPage JSON-LD schema markup and provide direct, authoritative answers to high-intent buyer prompts. Structuring content this way helps AI bots extract key facts and entity relationships more efficiently than through standard prose."
      }
    },
    {
      "@type": "Question",
      "name": "How do AI models select which brands to cite in search results?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI models prioritize brands that provide accessible, server-side rendered content and clear entity definitions via structured data. Visibility is also influenced by the presence of machine-readable files like llms.txt, which help AI agents work within limited context windows."
      }
    },
    {
      "@type": "Question",
      "name": "How does AI Search Optimization differ from traditional SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "AI Search Optimization focuses on bot-specific accessibility and RAG (Retrieval-Augmented Generation) compatibility rather than just keyword rankings. Unlike traditional SEO, it requires managing separate crawlers for training versus search and accounting for the fact that most AI bots cannot process JavaScript."
      }
    },
    {
      "@type": "Question",
      "name": "How do you measure the impact of AI citations on organic traffic?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The impact of AI citations is measured by tracking AI referral traffic in GA4 and monitoring changes in Category Share of Voice. Data indicates that users arriving via AI-generated answers are more qualified, leading to conversion rates 4.4x higher than standard search."
      }
    },
    {
      "@type": "Question",
      "name": "How do you optimize content for Google AI Overviews?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Optimizing for Google AI Overviews requires using semantic HTML and JSON-LD schema to make content easily extractable for Google's generative engine. This is a vital defensive strategy as organic click-through rates can drop by 61% when an AI Overview is present for a query."
      }
    },
    {
      "@type": "Question",
      "name": "How does Mersel AI compare to Semrush?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Mersel AI is a managed infrastructure service that automates AI visibility and rendering, whereas Semrush is a self-serve tool for manual SEO monitoring. Mersel AI handles the technical deployment of agent-optimized pages and CDN audits, which are not features of traditional SEO dashboards."
      }
    }
  ]
}
```

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "AI Bot robots.txt Guide: Block vs. Allow GPTBot & ClaudeBot",
  "url": "https://mersel.ai/blog/how-to-block-or-allow-ai-bots-on-your-website",
  "publisher": {
    "@type": "Organization",
    "name": "Mersel AI"
  }
}
```