Is robots.txt Blocking AI on Your Site?
Learn how your robots.txt file may be preventing AI assistants from accessing and citing your content.
If you're seeing this page, our analysis detected that your robots.txt file is blocking crawlers from accessing your site. This may be preventing both our analysis tool and AI assistants like ChatGPT, Claude, and Perplexity from reading your content.
The Problem: Overly Restrictive robots.txt
Your robots.txt file tells crawlers which parts of your site they're allowed to access. However, if configured too restrictively, it can block legitimate AI assistants from accessing your content.
Common blocking patterns we detected:
- Wildcard Disallow: `Disallow: /` under `User-agent: *`
- Specific AI Bot Blocking: Disallow rules for GPTBot, ClaudeBot, etc.
- Spyglasses Blocking: Disallow rules specifically for our crawler
Why This Matters
When AI assistants check your robots.txt and find they're blocked:
- ChatGPT cannot cite your content in responses
- Claude cannot reference your information when answering questions
- Perplexity cannot include you in search results
- Google Gemini skips your pages in AI-powered overviews
Unlike malicious crawlers, AI assistants from major companies respect robots.txt. Blocking them removes you from the future of search.
How to Check Your robots.txt
View Your Current File
Visit: https://yoursite.com/robots.txt
Look for problematic patterns:
❌ Problem: Blocking Everything
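A minimal example of this pattern looks like:

```
User-agent: *
Disallow: /
```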
This blocks all crawlers from all pages. Unless you intentionally want to hide your site from all search engines and AI, this is too restrictive.
❌ Problem: Blocking AI Assistants
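For example, rules like these single out AI crawlers by name:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```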
This specifically prevents AI assistants from accessing your content.
❌ Problem: Blocking Analysis Tools
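For example:

```
User-agent: Spyglasses
Disallow: /
```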
This prevents us (and similar tools) from analyzing your AI visibility.
How to Fix It
Option 1: Allow All Crawlers (Recommended)
If you don't have sensitive content to protect:
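A minimal allow-all robots.txt looks like this:

```
User-agent: *
Allow: /
```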
Or simply have an empty robots.txt or no file at all—this allows everything.
Option 2: Selectively Allow AI Assistants
If you need to block some crawlers but want AI visibility:
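One possible layout gives each AI crawler its own group and reserves the catch-all group for the crawlers you want to block (adjust the list of bots to your needs):

```
# Allow AI assistants
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Spyglasses
Allow: /

# Block all other crawlers
User-agent: *
Disallow: /
```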
Note: Crawlers follow the most specific User-agent group that matches them, so a bot named in its own group uses those rules instead of the catch-all `*` group.
Option 3: Allow Most, Block Sensitive Areas
If you have specific pages to protect:
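For example (the paths below are placeholders; substitute the directories you actually need to protect):

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
```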
This allows access to public content while protecting sensitive areas.
AI Crawler User Agents to Allow
Here are the major AI assistants you should consider allowing:
| AI Assistant | User Agent | Company |
|---|---|---|
| ChatGPT | GPTBot | OpenAI |
| ChatGPT (Citations) | ChatGPT-User | OpenAI |
| Claude | ClaudeBot | Anthropic |
| Claude (Web) | anthropic-ai | Anthropic |
| Perplexity | PerplexityBot | Perplexity |
| Google Gemini | Google-Extended | Google |
| Bing Copilot | Bingbot | Microsoft |
| Spyglasses | Spyglasses | Spyglasses |
Comprehensive AI Allowlist
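If you want to spell out permission for every crawler in the table above, a robots.txt along these lines works (keep or adapt the final catch-all group to match your existing rules):

```
# OpenAI
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Google
User-agent: Google-Extended
Allow: /

# Microsoft
User-agent: Bingbot
Allow: /

# Spyglasses
User-agent: Spyglasses
Allow: /

# All other crawlers
User-agent: *
Allow: /
```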
Testing Your robots.txt
1. Validate Syntax
Use a robots.txt validator:
- Open Google Search Console and check its robots.txt report (it replaced the older standalone robots.txt Tester)
- Confirm your file was fetched without errors or syntax warnings
- Use a third-party robots.txt testing tool to check specific user agents and URLs
2. Test Specific User Agents
Use our testing tool or manually test:
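For a quick manual check, you can use curl from the command line (replace yoursite.com with your domain; the user agent string here is simplified):

```bash
# Check whether your robots.txt mentions any AI crawlers
curl -s https://yoursite.com/robots.txt | grep -iE "gptbot|claudebot|perplexitybot|google-extended|spyglasses"

# Send a request identifying as GPTBot to spot server-level (CDN/WAF) blocking
curl -I -A "GPTBot" https://yoursite.com/
```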
3. Request a New Analysis
After updating your robots.txt:
- Return to Spyglasses AI Visibility Report
- Request a new analysis
- We'll verify that crawlers can now access your site
Common Scenarios
E-commerce Sites
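A common pattern is to keep product and category pages open while shielding transactional paths (the paths below are placeholders; adjust them to your store's structure):

```
User-agent: *
Allow: /
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
```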
SaaS/Software Companies
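Marketing pages, docs, and blog posts usually benefit from AI visibility, while the application itself stays blocked (adjust the paths to match your app):

```
User-agent: *
Allow: /
Disallow: /app/
Disallow: /dashboard/
Disallow: /api/
```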
Content/Media Sites
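Most content sites can allow everything except administrative areas (the admin path below is a placeholder, e.g. a WordPress-style backend):

```
User-agent: *
Allow: /
Disallow: /wp-admin/
```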
Local Businesses
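Local business sites rarely have anything to hide from crawlers, so an allow-all file is usually the right choice:

```
User-agent: *
Allow: /
```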
Best Practices
Do's
✅ Do allow AI assistants unless you have a specific reason not to
✅ Do use specific Disallow rules for sensitive areas
✅ Do test your robots.txt after changes
✅ Do monitor crawler behavior in your analytics
✅ Do keep your robots.txt simple and maintainable
Don'ts
❌ Don't use Disallow: / under User-agent: * unless intentional
❌ Don't block AI crawlers without understanding the impact
❌ Don't use robots.txt for access control (use authentication instead)
❌ Don't block content you want to be discovered
❌ Don't copy robots.txt from other sites without understanding it
Understanding Crawler Respect Levels
Not all crawlers respect robots.txt equally:
| Crawler Type | Respects robots.txt |
|---|---|
| Major AI Assistants (GPT, Claude, etc.) | ✅ Yes, strictly |
| Search Engines (Google, Bing) | ✅ Yes, strictly |
| Malicious Bots | ❌ Often ignore it |
| Aggressive Scrapers | ❌ Often ignore it |
Key Insight: The crawlers you want to access your site (AI assistants, search engines) respect robots.txt. Malicious crawlers often ignore it. Blocking legitimate crawlers only hurts your visibility.
What About AI Training?
Some companies have added robots.txt rules to prevent AI training:
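For example, rules like these block the crawlers and tokens associated with model training:

```
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```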
Important distinction:
- GPTBot: Used by OpenAI for both training AND ChatGPT responses
- ChatGPT-User: Used only for ChatGPT browsing (not training)
If you want to block training but allow ChatGPT citations:
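A robots.txt along these lines blocks the training crawler while leaving browsing and citation access open:

```
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
```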
However, consider that being cited in AI responses is valuable for brand awareness and traffic, even if your content contributes to training.
Monitoring and Maintenance
Set Up Alerts
- Monitor robots.txt access in your web analytics
- Set up alerts for unexpected changes to robots.txt
- Track AI crawler traffic to verify access
Regular Reviews
- Review your robots.txt quarterly
- Update when adding new site sections
- Remove obsolete rules
- Test after any changes
Analytics Integration
Track AI crawler visits:
- Set up custom segments in Google Analytics
- Filter by user agent for AI crawlers
- Monitor pages accessed and patterns
- Verify crawlers respect your rules
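If you also have access to raw server logs, a quick shell check can confirm AI crawlers are reaching your pages (the log path is a placeholder; this assumes a standard Nginx/Apache combined log format where the request path is the seventh field):

```bash
# Count requests from major AI crawlers
grep -icE "gptbot|claudebot|anthropic-ai|perplexitybot|bingbot" /var/log/nginx/access.log

# List the pages GPTBot requested most often
grep -i "gptbot" /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head
```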