# Is robots.txt Blocking AI on Your Site?

> Learn how your robots.txt file may be preventing AI assistants from accessing and citing your content.

If you're seeing this page, our analysis detected that your `robots.txt` file is blocking crawlers from accessing your site. This may be preventing both our analysis tool and **AI assistants like ChatGPT, Claude, and Perplexity** from reading your content.

## The Problem: Overly Restrictive robots.txt

Your `robots.txt` file tells crawlers which parts of your site they're allowed to access. However, if configured too restrictively, it can block legitimate AI assistants from accessing your content.

Common blocking patterns we detected:

- **Wildcard Disallow**: `Disallow: /` under `User-agent: *`
- **Specific AI Bot Blocking**: Disallow rules for GPTBot, ClaudeBot, etc.
- **Spyglasses Blocking**: Disallow rules specifically for our crawler

## Why This Matters

When AI assistants respect your `robots.txt` and find they're blocked:

- **ChatGPT cannot cite your content** in responses
- **Claude cannot reference your information** when answering questions
- **Perplexity cannot include you** in search results
- **Google Gemini skips your pages** in AI-powered overviews

Unlike malicious crawlers, AI assistants from major companies **respect robots.txt**. Blocking them removes your content from the answers those assistants generate.

## How to Check Your robots.txt

### View Your Current File

Visit: `https://yoursite.com/robots.txt`

Look for problematic patterns:

### ❌ Problem: Blocking Everything

```txt
User-agent: *
Disallow: /
```

This blocks **all crawlers** from **all pages**. Unless you intentionally want to hide your site from all search engines and AI, this is too restrictive.

### ❌ Problem: Blocking AI Assistants

```txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

This specifically prevents AI assistants from accessing your content.

### ❌ Problem: Blocking Analysis Tools

```txt
User-agent: Spyglasses
Disallow: /
```

This prevents Spyglasses from analyzing your AI visibility. Other analysis tools use their own user agents and are affected only by rules that match them.
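The patterns above can also be spotted programmatically. The following is a minimal sketch, not the check Spyglasses itself runs: the function name `find_full_blocks` is hypothetical, and it only flags groups that contain a bare `Disallow: /`, without evaluating any `Allow` rules that might override it.

```python
def find_full_blocks(robots_txt: str) -> list[str]:
    """Return the user agents whose group contains a bare `Disallow: /`."""
    blocked: list[str] = []
    agents: list[str] = []   # user agents of the group currently being read
    in_rules = False         # True once the current group has a rule line

    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                blocked.extend(a for a in agents if a not in blocked)
    return blocked


SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: Spyglasses
Disallow: /
"""

print(find_full_blocks(SAMPLE))
```

For the sample file, the wildcard group is not flagged because it only disallows `/admin/`, while the three named agents are.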

## How to Fix It

### Option 1: Allow All Crawlers (Recommended)

If you don't have sensitive content to protect:

```txt
User-agent: *
Allow: /
```

Alternatively, an **empty robots.txt**, or no file at all, allows everything.

### Option 2: Selectively Allow AI Assistants

If you need to block some crawlers but want AI visibility:

```txt
# Allow major AI assistants
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Spyglasses
Allow: /

# Block others
User-agent: *
Disallow: /
```

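**Note**: Each crawler follows the most specific `User-agent` group that matches its name and falls back to the `*` group only when no named group matches, so the order of the groups does not matter. Within a group, crawlers that follow RFC 9309 (including Googlebot) apply the longest matching rule, although some older parsers use the first matching rule instead.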
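To see how a named group takes priority over the `*` group, here is a small sketch using Python's standard-library `urllib.robotparser`. It is one parser among many, so individual crawlers may differ in edge cases. The helper `can_crawl` and the agent name `SomeOtherBot` are made up for illustration, and the file is a shortened version of the example above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str,
              url: str = "https://yoursite.com/") -> bool:
    """Evaluate robots.txt text for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "ClaudeBot", "SomeOtherBot"):
    print(agent, can_crawl(ROBOTS_TXT, agent))

# Moving the wildcard group to the top gives the same answers.
REORDERED = "User-agent: *\nDisallow: /\n\nUser-agent: GPTBot\nAllow: /\n"
print("GPTBot (reordered)", can_crawl(REORDERED, "GPTBot"))
```

The named agents are allowed, while any agent without its own group falls through to the `*` group and is blocked.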

### Option 3: Allow Most, Block Sensitive Areas

If you have specific pages to protect:

```txt
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /internal/
Allow: /
```

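This keeps compliant crawlers out of the listed directories while leaving public content open. Because robots.txt is advisory, put genuinely sensitive areas behind authentication as well.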
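The reason the broad `Allow: /` does not cancel the narrower `Disallow` lines is the longest-match rule from RFC 9309. The sketch below illustrates that rule under simplifying assumptions: it handles plain path prefixes only (no `*` or `$` wildcards), and the function name `is_path_allowed` is hypothetical.

```python
def is_path_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, prefix) rules.

    The longest matching prefix wins; on a tie, Allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty pattern matches nothing
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len, allowed = len(prefix), is_allow
    return allowed


OPTION_3_RULES = [
    ("Disallow", "/admin/"),
    ("Disallow", "/private/"),
    ("Disallow", "/internal/"),
    ("Allow", "/"),
]

for path in ("/admin/users", "/blog/post"):
    print(path, is_path_allowed(OPTION_3_RULES, path))
```

Under this rule `/admin/users` is blocked because `/admin/` is a longer match than `/`, and the result is the same whatever order the rules appear in.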

## AI Crawler User Agents to Allow

Here are the major AI assistants you should consider allowing:

| AI Assistant | User Agent | Company |
|--------------|-----------|---------|
| ChatGPT (Training) | `GPTBot` | OpenAI |
| ChatGPT (Citations) | `ChatGPT-User` | OpenAI |
| Claude | `ClaudeBot` | Anthropic |
| Claude (Web) | `anthropic-ai` | Anthropic |
| Perplexity | `PerplexityBot` | Perplexity |
| Google Gemini | `Google-Extended` | Google |
| Bing Copilot | `Bingbot` | Microsoft |
| Spyglasses | `Spyglasses` | Spyglasses |

### Comprehensive AI Allowlist

```txt
# OpenAI (ChatGPT)
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

# Anthropic (Claude)
User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Perplexity
User-agent: PerplexityBot
Allow: /

# Google (Gemini, AI Overviews)
User-agent: Google-Extended
Allow: /

User-agent: Googlebot
Allow: /

# Microsoft (Copilot)
User-agent: Bingbot
Allow: /

# Spyglasses
User-agent: Spyglasses
Allow: /

# Allow general crawlers
User-agent: *
Allow: /
```
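If you would rather maintain the list of agents in one place and generate the file, a short script can do it. This is only a sketch: `AI_USER_AGENTS` and `build_allowlist` are hypothetical names, and the tokens are the ones listed in the table above.

```python
# Hypothetical grouping; the tokens come from the table in this article.
AI_USER_AGENTS = {
    "OpenAI (ChatGPT)": ["GPTBot", "ChatGPT-User"],
    "Anthropic (Claude)": ["ClaudeBot", "anthropic-ai"],
    "Perplexity": ["PerplexityBot"],
    "Google (Gemini, AI Overviews)": ["Google-Extended", "Googlebot"],
    "Microsoft (Copilot)": ["Bingbot"],
    "Spyglasses": ["Spyglasses"],
}


def build_allowlist(groups: dict[str, list[str]]) -> str:
    """Render one `Allow: /` group per agent, then a wildcard group."""
    lines: list[str] = []
    for label, agents in groups.items():
        lines.append(f"# {label}")
        for agent in agents:
            lines += [f"User-agent: {agent}", "Allow: /", ""]
    lines += ["# Allow general crawlers", "User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"


print(build_allowlist(AI_USER_AGENTS))
```

Adding a new crawler then means adding one entry to the dictionary and regenerating the file.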

## Testing Your robots.txt

### 1. Validate Syntax

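Use the [robots.txt report in Google Search Console](https://support.google.com/webmasters/answer/6062598), which replaced the legacy robots.txt Tester:

1. Open Google Search Console
2. Go to **Settings** and open the **robots.txt** report
3. Check when your robots.txt was last fetched and whether Google reported any errors or warnings
4. Request a recrawl after you change the file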

### 2. Test Specific User Agents

Use our testing tool or manually test:

```bash
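# Fetch robots.txt as GPTBot. curl does not evaluate the rules; this only
# confirms that the server or CDN serves the file to that user agent.
curl -A "GPTBot" https://yoursite.com/robots.txt

# Print only the HTTP status for a specific page. A 403 suggests a
# firewall or CDN block rather than a robots.txt rule.
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://yoursite.com/about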
```
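The same server-level check can be scripted. The sketch below uses only Python's standard library. The helper names `fetch_status` and `interpret` are hypothetical, and the status-code groupings are a rough heuristic rather than a definitive diagnosis. Like curl, it does not evaluate robots.txt rules; it only shows how the server responds to a given user agent.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request `url` with the given User-Agent and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses are raised as HTTPError


def interpret(status: int) -> str:
    """Rough reading of a status code (a heuristic, not a diagnosis)."""
    if 200 <= status < 300:
        return "reachable"
    if status in (401, 403, 429):
        return "blocked or rate-limited by the server, CDN, or firewall"
    if status == 404:
        return "not found"
    return f"unexpected status {status}"


print(interpret(200))
print(interpret(403))
```

A call such as `interpret(fetch_status("https://yoursite.com/about", "GPTBot"))` performs the live check. Connection failures raise `urllib.error.URLError`, which this sketch leaves unhandled.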

### 3. Request a New Analysis

After updating your robots.txt:

1. Return to [Spyglasses AI Visibility Report](https://spyglasses.io/ai-visibility-report)
2. Request a new analysis
3. We'll verify that crawlers can now access your site

## Common Scenarios

### E-commerce Sites

```txt
User-agent: *
# Block cart and checkout (private)
Disallow: /cart/
Disallow: /checkout/
# Block search results (duplicate content)
Disallow: /search
# Allow everything else including product pages
Allow: /
```

### SaaS/Software Companies

```txt
User-agent: *
# Block admin and user dashboards
Disallow: /app/
Disallow: /dashboard/
# Allow public marketing pages, docs, blog
Allow: /
```

### Content/Media Sites

```txt
User-agent: *
# Allow everything - you want maximum visibility
Allow: /
```

### Local Businesses

```txt
User-agent: *
# Allow everything - AI citations drive local discovery
Allow: /
```

## Best Practices

### Do's

✅ **Do** allow AI assistants unless you have a specific reason not to  
✅ **Do** use specific `Disallow` rules for sensitive areas  
✅ **Do** test your robots.txt after changes  
✅ **Do** monitor crawler behavior in your analytics  
✅ **Do** keep your robots.txt simple and maintainable

### Don'ts

❌ **Don't** use `Disallow: /` under `User-agent: *` unless intentional  
❌ **Don't** block AI crawlers without understanding the impact  
❌ **Don't** use robots.txt for access control (use authentication instead)  
❌ **Don't** block content you want to be discovered  
❌ **Don't** copy robots.txt from other sites without understanding it

## Understanding Crawler Respect Levels

Not all crawlers respect robots.txt equally:

| Crawler Type | Respects robots.txt |
|--------------|-------------------|
| Major AI Assistants (GPT, Claude, etc.) | ✅ Yes, strictly |
| Search Engines (Google, Bing) | ✅ Yes, strictly |
| Malicious Bots | ❌ Often ignore it |
| Aggressive Scrapers | ❌ Often ignore it |

**Key Insight**: The crawlers you *want* to access your site (AI assistants, search engines) respect robots.txt. Malicious crawlers often ignore it. Blocking legitimate crawlers only hurts your visibility.

## What About AI Training?

Some companies have added robots.txt rules to prevent AI training:

```txt
User-agent: GPTBot
Disallow: /
```

**Important distinction**:

- **GPTBot**: Used by OpenAI to crawl content that may be used to train its models
- **ChatGPT-User**: Used when a ChatGPT user asks it to visit a page (not for training)

If you want to block training but allow ChatGPT citations:

```txt
# Block training
User-agent: GPTBot
Disallow: /

# Allow ChatGPT responses
User-agent: ChatGPT-User
Allow: /
```

However, consider that being cited in AI responses is valuable for brand awareness and traffic, even if your content contributes to training.

## Monitoring and Maintenance

### Set Up Alerts

1. Monitor requests for `/robots.txt` in your server access logs
2. Set up alerts for unexpected changes to robots.txt
3. Track AI crawler traffic to verify access
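One simple way to detect unexpected changes is to store a fingerprint of the file and compare it on a schedule. This is a minimal sketch with hypothetical helper names (`fingerprint`, `has_changed`); fetching the file and sending the alert are left out.

```python
import hashlib


def fingerprint(robots_txt: str) -> str:
    """Hash the file, ignoring line-ending and trailing-whitespace noise."""
    normalized = "\n".join(line.rstrip() for line in robots_txt.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint: str, current_robots_txt: str) -> bool:
    """True when the current file no longer matches the stored fingerprint."""
    return fingerprint(current_robots_txt) != previous_fingerprint


baseline = fingerprint("User-agent: *\nAllow: /\n")

# Same rules with Windows line endings: not reported as a change.
print(has_changed(baseline, "User-agent: *\r\nAllow: /\r\n"))

# A rule flipped to Disallow: reported as a change.
print(has_changed(baseline, "User-agent: *\nDisallow: /\n"))
```

Normalizing before hashing avoids false alarms when an editor or deploy step rewrites line endings without changing any rule.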

### Regular Reviews

- Review your robots.txt quarterly
- Update when adding new site sections
- Remove obsolete rules
- Test after any changes

### Analytics Integration

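Track AI crawler visits. Most crawlers do not execute JavaScript, so they rarely appear in client-side tools such as Google Analytics. Use your server or CDN access logs instead:

1. Export access logs from your web server or CDN
2. Filter by user agent for AI crawlers
3. Monitor pages accessed and patterns
4. Verify crawlers respect your rules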
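The user-agent filtering step can be sketched against raw server logs. The example assumes lines in the common "combined" access log format, where the user agent is the last quoted field. The sample lines, their simplified user-agent strings, and the names `AI_CRAWLER_TOKENS` and `count_ai_hits` are all made up for illustration.

```python
import re
from collections import Counter

# Tokens to look for; extend with the agents from the table in this article.
AI_CRAWLER_TOKENS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# The combined log format ends with: "referer" "user agent"
USER_AGENT_RE = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"\s*$')


def count_ai_hits(log_lines) -> Counter:
    """Count requests per AI crawler token found in the user-agent field."""
    hits: Counter = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # not in combined format; skip
        user_agent = match.group("ua").lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits


# Fabricated sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /about HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [10/Oct/2025:13:55:40 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Oct/2025:14:02:11 +0000] "GET /blog HTTP/1.1" 200 8843 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.44 - - [10/Oct/2025:14:05:00 +0000] "GET / HTTP/1.1" 200 1024 "https://example.com/" "Mozilla/5.0 (Windows NT 10.0)"',
]

print(count_ai_hits(SAMPLE_LOG))
```

User-agent strings can be spoofed, so treat these counts as an indicator rather than proof of which crawler made a request.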

## Need More Help?

- [Understanding AI Visibility](/docs/help/what-is-ai-visibility)
- [Technical SEO for AI](/docs/help/technical-seo-for-ai)
- [Is Your Site Visible to AI Assistants?](/docs/help/is-your-site-visible-to-ai-assistants)
- [Cloudflare Blocking AI Access](/docs/help/cloudflare-is-blocking-access-to-your-site)
- [robots.txt Specification](https://www.robotstxt.org/)
- [Contact Support](mailto:support@spyglasses.io)
