
Is robots.txt Blocking AI on Your Site?

Learn how your robots.txt file may be preventing AI assistants from accessing and citing your content.

If you're seeing this page, our analysis detected that your robots.txt file is blocking crawlers from accessing your site. This may be preventing both our analysis tool and AI assistants like ChatGPT, Claude, and Perplexity from reading your content.

The Problem: Overly Restrictive robots.txt

Your robots.txt file tells crawlers which parts of your site they're allowed to access. However, if configured too restrictively, it can block legitimate AI assistants from accessing your content.

Common blocking patterns we detected:

  • Wildcard Disallow: Disallow: / under User-agent: *
  • Specific AI Bot Blocking: Disallow rules for GPTBot, ClaudeBot, etc.
  • Spyglasses Blocking: Disallow rules specifically for our crawler

Why This Matters

When AI assistants check your robots.txt and find they're blocked:

  • ChatGPT cannot cite your content in responses
  • Claude cannot reference your information when answering questions
  • Perplexity cannot include you in search results
  • Google Gemini skips your pages in AI-powered overviews

Unlike malicious crawlers, AI assistants from major companies respect robots.txt. Blocking them removes you from the future of search.

How to Check Your robots.txt

View Your Current File

Visit: https://yoursite.com/robots.txt
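
Or fetch it from a script; a minimal sketch (the domain is a placeholder):

import urllib.request

# Print your live robots.txt (replace the domain with your own).
with urllib.request.urlopen("https://yoursite.com/robots.txt") as resp:
    print(resp.read().decode("utf-8"))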

Look for problematic patterns:

❌ Problem: Blocking Everything

User-agent: *
Disallow: /

This blocks all crawlers from all pages. Unless you intentionally want to hide your site from all search engines and AI, this is too restrictive.
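
You can confirm how a crawler interprets this file with Python's standard-library robots.txt parser; a minimal sketch (the domain is a placeholder):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# GPTBot has no group of its own here, so it falls back to the
# wildcard group and is blocked like every other crawler.
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))  # False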

❌ Problem: Blocking AI Assistants

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

This specifically prevents AI assistants from accessing your content.

❌ Problem: Blocking Analysis Tools

User-agent: Spyglasses
Disallow: /

This prevents us (and similar tools) from analyzing your AI visibility.

How to Fix It

Option 1: Allow Everything

If you don't have sensitive content to protect:

User-agent: *
Allow: /

Or simply serve an empty robots.txt, or no file at all; either one allows everything.

Option 2: Selectively Allow AI Assistants

If you need to block some crawlers but want AI visibility:

# Allow major AI assistants
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Spyglasses
Allow: /

# Block others
User-agent: *
Disallow: /

Note: A crawler obeys only the most specific User-agent group that matches it, so GPTBot follows its own Allow rule here and never reads the wildcard Disallow.
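
To sanity-check the group matching, here is a minimal sketch using Python's urllib.robotparser (the domain is a placeholder):

from urllib import robotparser

OPTION_2 = """\
User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(OPTION_2.splitlines())

# GPTBot matches its own group, so its Allow rule applies...
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))     # True
# ...while an unlisted crawler falls through to the wildcard block.
print(rp.can_fetch("RandomBot", "https://yoursite.com/about"))  # False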

Option 3: Allow Most, Block Sensitive Areas

If you have specific pages to protect:

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Allow: /

This allows access to public content while protecting sensitive areas.
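
To verify the path rules behave as intended, here is a minimal sketch using Python's standard-library parser. (Python resolves rules by first match in file order, while Google uses longest match; both give the same answer for this file.)

from urllib import robotparser

OPTION_3 = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(OPTION_3.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/admin/settings"))  # False
print(rp.can_fetch("GPTBot", "https://yoursite.com/about"))           # True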

AI Crawler User Agents to Allow

Here are the major AI assistants you should consider allowing:

AI Assistant          User Agent        Company
ChatGPT               GPTBot            OpenAI
ChatGPT (Citations)   ChatGPT-User      OpenAI
Claude                ClaudeBot         Anthropic
Claude (Web)          anthropic-ai      Anthropic
Perplexity            PerplexityBot     Perplexity
Google Gemini         Google-Extended   Google
Bing Copilot          Bingbot           Microsoft
Spyglasses            Spyglasses        Spyglasses

Comprehensive AI Allowlist

# OpenAI (ChatGPT)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Google (Gemini, AI Overviews)
User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /

# Microsoft (Copilot)
User-agent: Bingbot
Allow: /

# Spyglasses
User-agent: Spyglasses
Allow: /

# Allow general crawlers
User-agent: *
Allow: /

Testing Your robots.txt

1. Validate Syntax

Use Google Search Console's robots.txt report (it replaced the retired robots.txt Tester):

  1. Open Google Search Console
  2. Go to Settings → robots.txt report
  3. Confirm Google fetched your file successfully
  4. Review any parsing errors or warnings

2. Test Specific User Agents

Use our testing tool or manually test:

# Fetch robots.txt as GPTBot (also catches server-level user-agent blocking)
curl -A "GPTBot" https://yoursite.com/robots.txt

# Fetch a specific page the same way
curl -A "GPTBot" https://yoursite.com/about

3. Request a New Analysis

After updating your robots.txt:

  1. Return to Spyglasses AI Visibility Report
  2. Request a new analysis
  3. We'll verify that crawlers can now access your site

Common Scenarios

E-commerce Sites

User-agent: *
# Block cart and checkout (private)
Disallow: /cart/
Disallow: /checkout/
# Block search results (duplicate content)
Disallow: /search
# Allow everything else including product pages
Allow: /

SaaS/Software Companies

User-agent: *
# Block admin and user dashboards
Disallow: /app/
Disallow: /dashboard/
# Allow public marketing pages, docs, blog
Allow: /

Content/Media Sites

User-agent: *
# Allow everything - you want maximum visibility
Allow: /

Local Businesses

User-agent: *
# Allow everything - AI citations drive local discovery
Allow: /

Best Practices

Do's

Do allow AI assistants unless you have a specific reason not to
Do use specific Disallow rules for sensitive areas
Do test your robots.txt after changes
Do monitor crawler behavior in your analytics
Do keep your robots.txt simple and maintainable

Don'ts

Don't use Disallow: / under User-agent: * unless intentional
Don't block AI crawlers without understanding the impact
Don't use robots.txt for access control (use authentication instead)
Don't block content you want to be discovered
Don't copy robots.txt from other sites without understanding it

Understanding Crawler Respect Levels

Not all crawlers respect robots.txt equally:

Crawler Type                               Respects robots.txt
Major AI Assistants (GPT, Claude, etc.)    ✅ Yes, strictly
Search Engines (Google, Bing)              ✅ Yes, strictly
Malicious Bots                             ❌ Often ignore it
Aggressive Scrapers                        ❌ Often ignore it

Key Insight: The crawlers you want to access your site (AI assistants, search engines) respect robots.txt. Malicious crawlers often ignore it. Blocking legitimate crawlers only hurts your visibility.

What About AI Training?

Some companies have added robots.txt rules to prevent AI training:

User-agent: GPTBot
Disallow: /

Important distinction:

  • GPTBot: OpenAI's crawler for gathering model training data
  • ChatGPT-User: fetches pages only when a ChatGPT user browses to them (not used for training)

If you want to block training but allow ChatGPT citations:

# Block training
User-agent: GPTBot
Disallow: /

# Allow ChatGPT responses
User-agent: ChatGPT-User
Allow: /
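
You can verify the split the same way; a minimal sketch with Python's urllib.robotparser:

from urllib import robotparser

SPLIT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(SPLIT.splitlines())

print(rp.can_fetch("GPTBot", "https://yoursite.com/post"))        # False: no training crawl
print(rp.can_fetch("ChatGPT-User", "https://yoursite.com/post"))  # True: citations still work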

However, consider that being cited in AI responses is valuable for brand awareness and traffic, even if your content contributes to training.

Monitoring and Maintenance

Set Up Alerts

  1. Monitor robots.txt access in your web analytics
  2. Set up alerts for unexpected changes to robots.txt
  3. Track AI crawler traffic to verify access

Regular Reviews

  • Review your robots.txt quarterly
  • Update when adding new site sections
  • Remove obsolete rules
  • Test after any changes

Analytics Integration

Track AI crawler visits (a log-scanning sketch follows these steps):

  1. Set up custom segments in Google Analytics
  2. Filter by user agent for AI crawlers
  3. Monitor pages accessed and patterns
  4. Verify crawlers respect your rules
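
A minimal log-scanning sketch (the access.log path and plain-text format are assumptions; adjust for your server):

from collections import Counter

AI_AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "anthropic-ai",
             "PerplexityBot", "Google-Extended", "Bingbot", "Spyglasses"]

counts = Counter()
with open("access.log") as log:  # hypothetical path to your server's access log
    for line in log:
        # In common log formats the user agent appears verbatim in each line.
        for agent in AI_AGENTS:
            if agent in line:
                counts[agent] += 1
                break

for agent, hits in counts.most_common():
    print(f"{agent:16} {hits}")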

Need More Help?