
Is robots.txt Blocking AI on Your Site?

Learn how your robots.txt file may be preventing AI assistants from accessing and citing your content.

If you're seeing this page, our analysis detected that your robots.txt file is blocking crawlers from accessing your site. This may be preventing both our analysis tool and AI assistants like ChatGPT, Claude, and Perplexity from reading your content.

The Problem: Overly Restrictive robots.txt

Your robots.txt file tells crawlers which parts of your site they're allowed to access. However, if configured too restrictively, it can block legitimate AI assistants from accessing your content.

Common blocking patterns we detected:

  • Wildcard Disallow: Disallow: / under User-agent: *
  • Specific AI Bot Blocking: Disallow rules for GPTBot, ClaudeBot, etc.
  • Spyglasses Blocking: Disallow rules specifically for our crawler

Why This Matters

When AI assistants check your robots.txt and find they're blocked:

  • ChatGPT cannot cite your content in responses
  • Claude cannot reference your information when answering questions
  • Perplexity cannot include you in search results
  • Google Gemini skips your pages in AI-powered overviews

Unlike malicious crawlers, AI assistants from major companies respect robots.txt. Blocking them removes you from the future of search.

How to Check Your robots.txt

View Your Current File

Visit: https://yoursite.com/robots.txt
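
Or fetch it from a script; a minimal sketch (the domain is a placeholder):

import urllib.request

# Print your live robots.txt (replace the domain with your own).
with urllib.request.urlopen("https://yoursite.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))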

Look for problematic patterns:

❌ Problem: Blocking Everything

User-agent: *
Disallow: /

This blocks all crawlers from all pages. Unless you intentionally want to hide your site from all search engines and AI, this is too restrictive.
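
You can confirm how a crawler interprets this file with Python's standard-library robots.txt parser; a minimal sketch (the domain is a placeholder):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# GPTBot has no group of its own here, so it falls back to the
# wildcard group and is blocked like every other crawler.
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))  # False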

❌ Problem: Blocking AI Assistants

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

This specifically prevents AI assistants from accessing your content.

❌ Problem: Blocking Analysis Tools

User-agent: Spyglasses
Disallow: /

This prevents us (and similar tools) from analyzing your AI visibility.

How to Fix It

Option 1: Allow Everything

If you don't have sensitive content to protect:

User-agent: *
Allow: /

Or simply serve an empty robots.txt, or no file at all; either one allows everything.

Option 2: Selectively Allow AI Assistants

If you need to block some crawlers but want AI visibility:

# Allow major AI assistants
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Spyglasses
Allow: /

# Block others
User-agent: *
Disallow: /

Note: A crawler obeys only the most specific User-agent group that matches it, so GPTBot follows its own Allow rule here and never reads the wildcard Disallow.
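
To sanity-check the group matching, here is a minimal sketch using Python's urllib.robotparser (the domain is a placeholder):

from urllib import robotparser

OPTION_2 = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(OPTION_2.splitlines())

# GPTBot matches its own group, so its Allow rule applies...
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))     # True
# ...while an unlisted crawler falls through to the wildcard block.
print(rp.can_fetch("RandomBot", "https://yoursite.com/about"))  # False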

Option 3: Allow Most, Block Sensitive Areas

If you have specific pages to protect:

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Allow: /

This allows access to public content while protecting sensitive areas.
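
To verify the path rules behave as intended, here is a minimal sketch using Python's standard-library parser. (Python resolves rules by first match in file order, while Google uses longest match; both give the same answer for this file.)

from urllib import robotparser

OPTION_3 = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(OPTION_3.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/admin/settings"))  # False
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))           # True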

AI Crawler User Agents to Allow

Here are the major AI assistants you should consider allowing:

AI Assistant          User Agent        Company
ChatGPT               GPTBot            OpenAI
ChatGPT (Citations)   ChatGPT-User      OpenAI
Claude                ClaudeBot         Anthropic
Claude (Web)          anthropic-ai      Anthropic
Perplexity            PerplexityBot     Perplexity
Google Gemini         Google-Extended   Google
Bing Copilot          Bingbot           Microsoft
Spyglasses            Spyglasses        Spyglasses

Comprehensive AI Allowlist

# OpenAI (ChatGPT)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Google (Gemini, AI Overviews)
User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /

# Microsoft (Copilot)
User-agent: Bingbot
Allow: /

# Spyglasses
User-agent: Spyglasses
Allow: /

# Allow general crawlers
User-agent: *
Allow: /

Testing Your robots.txt

1. Validate Syntax

Use Google Search Console's robots.txt report (it replaced the retired robots.txt Tester):

  1. Open Google Search Console
  2. Go to Settings → robots.txt report
  3. Confirm Google fetched your file successfully
  4. Review any parsing errors or warnings

2. Test Specific User Agents

Use our testing tool or manually test:

# Fetch robots.txt as GPTBot (also catches server-level user-agent blocking)
curl -A "GPTBot" https://yoursite.com/robots.txt

# Fetch a specific page the same way
curl -A "GPTBot" https://yoursite.com/about

3. Request a New Analysis

After updating your robots.txt:

  1. Return to Spyglasses AI Visibility Report
  2. Request a new analysis
  3. We'll verify that crawlers can now access your site

Common Scenarios

E-commerce Sites

User-agent: *
# Block cart and checkout (private)
Disallow: /cart/
Disallow: /checkout/
# Block search results (duplicate content)
Disallow: /search
# Allow everything else including product pages
Allow: /

SaaS/Software Companies

User-agent: *
# Block admin and user dashboards
Disallow: /app/
Disallow: /dashboard/
# Allow public marketing pages, docs, blog
Allow: /

Content/Media Sites

User-agent: *
# Allow everything - you want maximum visibility
Allow: /

Local Businesses

User-agent: *
# Allow everything - AI citations drive local discovery
Allow: /

Best Practices

Do's

Do allow AI assistants unless you have a specific reason not to
Do use specific Disallow rules for sensitive areas
Do test your robots.txt after changes
Do monitor crawler behavior in your analytics
Do keep your robots.txt simple and maintainable

Don'ts

Don't use Disallow: / under User-agent: * unless intentional
Don't block AI crawlers without understanding the impact
Don't use robots.txt for access control (use authentication instead)
Don't block content you want to be discovered
Don't copy robots.txt from other sites without understanding it

Understanding Crawler Respect Levels

Not all crawlers respect robots.txt equally:

Crawler Type                               Respects robots.txt
Major AI Assistants (GPT, Claude, etc.)    ✅ Yes, strictly
Search Engines (Google, Bing)              ✅ Yes, strictly
Malicious Bots                             ❌ Often ignore it
Aggressive Scrapers                        ❌ Often ignore it

Key Insight: The crawlers you want to access your site (AI assistants, search engines) respect robots.txt. Malicious crawlers often ignore it. Blocking legitimate crawlers only hurts your visibility.

What About AI Training?

Some companies have added robots.txt rules to prevent AI training:

User-agent: GPTBot
Disallow: /

Important distinction:

  • GPTBot: OpenAI's crawler for gathering model training data
  • ChatGPT-User: fetches pages only when a ChatGPT user browses to them (not used for training)

If you want to block training but allow ChatGPT citations:

# Block training
User-agent: GPTBot
Disallow: /

# Allow ChatGPT responses
User-agent: ChatGPT-User
Allow: /
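
You can verify the split the same way; a minimal sketch with Python's urllib.robotparser:

from urllib import robotparser

SPLIT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(SPLIT.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/post"))        # False: no training crawl
print(rp.can_fetch("ChatGPT-User", "https://yoursite.com/post"))  # True: citations still work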

However, consider that being cited in AI responses is valuable for brand awareness and traffic, even if your content contributes to training.

Monitoring and Maintenance

Set Up Alerts

  1. Monitor robots.txt access in your web analytics
  2. Set up alerts for unexpected changes to robots.txt
  3. Track AI crawler traffic to verify access

Regular Reviews

  • Review your robots.txt quarterly
  • Update when adding new site sections
  • Remove obsolete rules
  • Test after any changes

Analytics Integration

Track AI crawler visits (a log-scanning sketch follows these steps):

  1. Set up custom segments in Google Analytics
  2. Filter by user agent for AI crawlers
  3. Monitor pages accessed and patterns
  4. Verify crawlers respect your rules
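
A minimal log-scanning sketch (the access.log path and plain-text format are assumptions; adjust for your server):

from collections import Counter

AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
             "PerplexityBot", "Google-Extended", "Bingbot", "Spyglasses"]

counts = Counter()
with open("access.log") as log:  # hypothetical path to your server's access log
    for line in log:
        # In common log formats the user agent appears verbatim in each line.
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] += 1
                break

for agent, hits in counts.most_common():
    print(f"{agent:16} {hits}")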

Need More Help?