Is AI Blocking Your Best Media Placements?

We analyzed the robots.txt policies of the 51 largest U.S. news websites by monthly traffic. The results should matter to anyone who values earned media placements. 61% of these sites block at least one AI assistant, most often Claude or ChatGPT, from citing their content. That means when a user asks one of the blocked AI chat tools a question, content from those sites won't appear in the answer. Another 18% block AI model trainers but still allow citations. Only 22% have no AI restrictions at all. Together, the sites that block AI assistants represent 2.68 billion monthly visits, 71% of the traffic across all 51 sites. For PR professionals and marketers, this raises an important question: if a placement lands on a site that blocks AI, does it still reach the growing audience that discovers brands through AI chat?

Key Takeaways

31 of the top 51 U.S. news sites (61%) block AI assistants from citing their content in chat responses
These blocked sites account for 71% of total monthly traffic across all 51 sites (2.68 billion visits)
8 of the top 10 news sites by traffic block AI assistants
Anthropic (Claude) is the most commonly blocked AI assistant, restricted on 30 of 51 sites. OpenAI (ChatGPT) is blocked on 24
78% of all sites (40 of 51) block at least one AI model trainer, meaning their content won't contribute to those companies' future training data
Blocking AI doesn't erase the value of earned media. Authority, credibility, and traditional search benefits still compound. But AI visibility is a growing channel worth considering

How We Ran This Study

Spyglasses tracks over 600 known AI agents, bots, crawlers, and scrapers as part of its AI Traffic Analytics platform. Our free tool, Is This Site Blocking AI?, evaluates a website's robots.txt policy in real time and identifies any AI assistants or model trainers that the site blocks.

For this study we took the top 51 websites classified as "News" in Similarweb's dataset, sorted by monthly visits as of January 2026. We then fetched each site's robots.txt file and evaluated it against two categories of AI bots. The first is AI Assistants, the bots that fetch content for AI chat tools. The second is AI Model Trainers, the bots that crawl content to train future AI models.

We classified each site using a simple system:

🔴 Blocks AI assistants - one or more AI assistant bots are blocked. Content from this site won't be cited in AI chat responses from the blocked platforms.
🟡 Blocks model trainers only - AI assistants can still cite content, but the site blocks bots that crawl for model training purposes.
🟢 All clear - no AI-related blocks found.

A few notes on methodology. The robots.txt standard allows partial blocks, where a site blocks access to specific directories while allowing everything else. We treated partial blocks as blocks for this study. One site, drudgereport.com, does not publish a robots.txt file in a known location. We treated this as "all clear," which is consistent with RFC 9309, the Robots Exclusion Protocol standard: when robots.txt is unavailable, crawlers may access any resource on the site. Some AI assistants, such as Microsoft's Bing Copilot, xAI's Grok, and DeepSeek, do not use a consistent, recognizable identifier, and they do not respect robots.txt rules. We excluded these from the study.
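The classification system above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Spyglasses' implementation. The user-agent tokens and platform names below are a small hypothetical subset of the 600+ agents the platform tracks. Only groups that name a bot explicitly are checked, and the wildcard (*) group is ignored. Any non-empty Disallow rule in a bot's own group counts as a block, mirroring the study's treatment of partial blocks. Passing None stands in for a site with no robots.txt, which the study treated as all clear.

```python
from __future__ import annotations

# Hypothetical, illustrative token lists: NOT Spyglasses' catalog of 600+ agents.
# The token-to-platform mappings are assumptions for this sketch.
ASSISTANT_TOKENS = {
    "ChatGPT-User": "OpenAI (ChatGPT)",
    "Claude-User": "Anthropic",
    "PerplexityBot": "Perplexity AI",
    "MistralAI-User": "Mistral",
}
TRAINER_TOKENS = {
    "GPTBot": "OpenAI",
    "ClaudeBot": "Anthropic",
    "CCBot": "Common Crawl",
    "Google-Extended": "Google",
    "Bytespider": "ByteDance",
}


def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each lowercased User-agent token to the non-empty Disallow paths in its group(s)."""
    groups: dict[str, list[str]] = {}
    agents: list[str] = []
    seen_rule = False  # True once the current group has an Allow/Disallow line
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif key in ("allow", "disallow"):
            seen_rule = True
            if key == "disallow" and value:  # an empty Disallow allows everything
                for agent in agents:
                    groups[agent].append(value)
    return groups


def blocked(groups: dict[str, list[str]], tokens: dict[str, str]) -> list[str]:
    """Platforms whose own group has at least one Disallow rule (partial blocks count)."""
    return sorted({name for token, name in tokens.items() if groups.get(token.lower())})


def classify(robots_txt: str | None) -> str:
    # A missing robots.txt (None) is treated as "all clear", as in the study.
    if robots_txt is None:
        return "all clear"
    groups = parse_groups(robots_txt)
    # Assumption: the "*" group is ignored so that ordinary path rules meant for
    # all crawlers are not counted as AI-specific blocks.
    if blocked(groups, ASSISTANT_TOKENS):
        return "blocks AI assistants"
    if blocked(groups, TRAINER_TOKENS):
        return "blocks model trainers only"
    return "all clear"


SAMPLE = """\
User-agent: *
Disallow: /search

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /premium/
"""
print(classify(SAMPLE))  # the partial Claude-User block is enough to count
```

In this sample, the partial block on Claude-User alone places the site in the "blocks AI assistants" class, even though the training crawlers are the ones blocked site-wide.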

What Did We Find?

Of the 51 sites studied, 31 (61%) block at least one AI assistant. These sites account for 2.68 billion of the 3.75 billion total monthly visits, or 71% of all traffic.

Only 11 sites (22%) have no AI restrictions at all. Nine sites (18%) take a middle path; they block model trainers but still allow AI assistants to cite their content.

Here's the breakdown:

Classification	Sites	% of Sites	Monthly Traffic	% of Traffic
🔴 Blocks AI Assistants	31	61%	2.68B	71%
🟡 Blocks Trainers Only	9	18%	417M	11%
🟢 All Clear	11	22%	653M	17%
Total	51	100%	3.75B	100%
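The percentages in this table can be reproduced from the per-class sums of the Monthly Visits column in the full table later in this article (2,680.4M, 416.9M, and 652.9M). This short Python check also shows why the site shares add up to 101%: each share is rounded on its own.

```python
# Per-class totals: site counts, and monthly visits in millions summed from the full table.
CLASSES = {
    "Blocks AI Assistants": (31, 2680.4),
    "Blocks Trainers Only": (9, 416.9),
    "All Clear": (11, 652.9),
}

total_sites = sum(sites for sites, _ in CLASSES.values())
total_traffic = sum(visits for _, visits in CLASSES.values())

# Each share is rounded independently, as in the table.
site_pcts = {name: round(100 * sites / total_sites) for name, (sites, _) in CLASSES.items()}
traffic_pcts = {name: round(100 * visits / total_traffic) for name, (_, visits) in CLASSES.items()}

for name in CLASSES:
    print(f"{name}: {site_pcts[name]}% of sites, {traffic_pcts[name]}% of traffic")
print(f"Total: {total_sites} sites, {total_traffic / 1000:.2f}B monthly visits")
print(f"Rounded site shares sum to {sum(site_pcts.values())}%")
```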

Which AI Assistants Are Blocked Most Often?

Not all AI platforms are treated equally. Anthropic's Claude is the most commonly blocked assistant, restricted on 30 of the 51 sites. OpenAI's ChatGPT is close behind at 24 sites. Mistral is blocked on 8 sites, Google's AI assistant on 5, Perplexity on just 2, and LangChain on 1 (startribune.com).
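These counts can be checked against the full table later in this article. This Python sketch copies the "Blocked AI Assistants" column for the 31 sites that block at least one assistant, then tallies each platform.

```python
from collections import Counter

# "Blocked AI Assistants" column for the 31 sites classified as blocking AI
# assistants, copied verbatim from the full table in this article.
BLOCKED_ASSISTANTS = {
    "nytimes.com": "Anthropic, OpenAI (ChatGPT)",
    "cnn.com": "Anthropic, OpenAI (ChatGPT)",
    "msn.com": "Anthropic, Mistral, OpenAI (ChatGPT)",
    "people.com": "Anthropic, OpenAI (ChatGPT)",
    "finance.yahoo.com": "Anthropic, OpenAI (ChatGPT)",
    "bbc.com": "Anthropic, OpenAI (ChatGPT)",
    "usatoday.com": "Anthropic, Mistral, OpenAI (ChatGPT)",
    "apnews.com": "Anthropic",
    "nbcnews.com": "Anthropic, Mistral, OpenAI (ChatGPT)",
    "nypost.com": "Anthropic",
    "cnbc.com": "Anthropic, OpenAI (ChatGPT)",
    "news.yahoo.com": "Anthropic, OpenAI (ChatGPT)",
    "wsj.com": "Anthropic",
    "dailymail.co.uk": "Anthropic, Google, Mistral, OpenAI (ChatGPT)",
    "washingtonpost.com": "Anthropic",
    "npr.org": "Anthropic, OpenAI (ChatGPT)",
    "abcnews.go.com": "OpenAI (ChatGPT)",
    "thehill.com": "Anthropic, OpenAI (ChatGPT)",
    "politico.com": "Anthropic, Google, Mistral, OpenAI (ChatGPT), Perplexity AI",
    "reuters.com": "Anthropic, Google, Mistral",
    "newsweek.com": "Anthropic, OpenAI (ChatGPT)",
    "buzzfeed.com": "Anthropic, OpenAI (ChatGPT)",
    "huffpost.com": "Anthropic, OpenAI (ChatGPT)",
    "newsbreak.com": "Anthropic, OpenAI (ChatGPT)",
    "theatlantic.com": "Anthropic",
    "bloomberg.com": "Anthropic, OpenAI (ChatGPT)",
    "usnews.com": "Anthropic, Google, Mistral, OpenAI (ChatGPT)",
    "variety.com": "Anthropic, OpenAI (ChatGPT)",
    "sfgate.com": "Anthropic",
    "startribune.com": "Anthropic, Google, LangChain, Mistral, OpenAI (ChatGPT), Perplexity AI",
    "al.com": "Anthropic, OpenAI (ChatGPT)",
}

# Split each cell on commas and count how many sites block each platform.
counts = Counter(
    name.strip()
    for names in BLOCKED_ASSISTANTS.values()
    for name in names.split(",")
)

for assistant, n in counts.most_common():
    print(f"{assistant}: blocked on {n} of 51 sites")
```

Counter.most_common() returns the platforms from most to least blocked. Anthropic comes first and LangChain last, matching the ranking above.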

This means Claude users are the most likely to miss content from top news outlets. ChatGPT users face a similar gap. Perplexity users, on the other hand, can access content from nearly all of these sites.

For PR professionals, this matters because different audiences prefer different AI tools. A placement that's invisible in Claude might still appear in Perplexity or Google's AI results.

What About AI Model Training?

Even sites that allow AI citations may still block model trainers. 78% of all 51 sites (40 total) block at least one model training bot.

This distinction is important. A site that blocks only model trainers can still be cited in AI chat responses today. But because its content is excluded from future training data, the AI won't "learn" from it long-term. A brand mention on that site might appear when an AI searches the web in real time, but it won't be part of what the AI already knows when a user asks a question without web search enabled.
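Here is a minimal sketch of a trainers-only (🟡) policy, checked with Python's standard-library urllib.robotparser. The user-agent tokens are assumptions based on the vendors' published crawler names: GPTBot and ClaudeBot for training, ChatGPT-User and Claude-User for real-time fetching, and CCBot for Common Crawl. Verify them against current vendor documentation before relying on them. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical "blocks model trainers only" policy. The user-agent tokens are
# assumptions; check each vendor's crawler documentation for current names.
TRAINERS_ONLY_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(TRAINERS_ONLY_POLICY.splitlines())

url = "https://example.com/2026/01/brand-feature"  # placeholder article URL
for agent in ("GPTBot", "ClaudeBot", "CCBot", "ChatGPT-User", "Claude-User"):
    # Training crawlers match their own groups and are refused; real-time
    # assistant fetchers have no group of their own and fall through to "*".
    print(f"{agent}: {'allowed' if parser.can_fetch(agent, url) else 'blocked'}")
```

Run as written, the three training crawlers print "blocked" and the two assistant fetchers print "allowed". A brand mention on such a site stays citable but never enters those companies' training data.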

For a deeper look at how AI assistants and model trainers differ, see the glossary at the end of this article.

How Do the Biggest Sites Stack Up?

The top of the traffic chart tells a clear story. Of the ten most-visited news sites, eight block AI assistants. The only green lights are Fox News (252M monthly visits) and Google News (96M).

The most aggressive blockers restrict nearly every AI assistant we evaluated. Startribune.com blocks six: Anthropic, Google, LangChain, Mistral, OpenAI, and Perplexity. Politico blocks five, including Perplexity, which 49 of the 51 sites allow.

Here's the full table. You can review any site's policy yourself by opening its robots.txt file (for example, nytimes.com/robots.txt).

Policy	Site	Monthly Visits	Blocked AI Assistants	Blocked AI Model Trainers
🔴	nytimes.com	494.7M	Anthropic, OpenAI (ChatGPT)	Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	cnn.com	308.4M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, OpenAI
🟢	foxnews.com	252.2M
🔴	msn.com	151.4M	Anthropic, Mistral, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	people.com	147.2M	Anthropic, OpenAI (ChatGPT)	Anthropic, Common Crawl, OpenAI
🔴	finance.yahoo.com	135.6M	Anthropic, OpenAI (ChatGPT)	Anthropic, ByteDance, Common Crawl, OpenAI
🔴	bbc.com	124.6M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	usatoday.com	117.9M	Anthropic, Mistral, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	apnews.com	98.1M	Anthropic	Amazon, Anthropic, Apple, Common Crawl, OpenAI
🟢	news.google.com	96.1M
🟢	substack.com	95.5M
🟡	theguardian.com	92.1M		Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta
🔴	nbcnews.com	89.4M	Anthropic, Mistral, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟡	cbsnews.com	87.6M		OpenAI
🔴	nypost.com	87.4M	Anthropic	Anthropic, Apple, ByteDance, Common Crawl, Meta
🔴	cnbc.com	87.3M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🔴	news.yahoo.com	73.9M	Anthropic, OpenAI (ChatGPT)	Anthropic, ByteDance, Common Crawl, OpenAI
🔴	wsj.com	72.2M	Anthropic	Anthropic, Apple, ByteDance, Common Crawl, Meta
🔴	dailymail.co.uk	70.4M	Anthropic, Google, Mistral, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	washingtonpost.com	69.2M	Anthropic	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta
🔴	npr.org	63.7M	Anthropic, OpenAI (ChatGPT)	Anthropic, Apple, ByteDance, Common Crawl, OpenAI
🟡	forbes.com	60.1M		Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🔴	abcnews.go.com	53.5M	OpenAI (ChatGPT)	ByteDance, Common Crawl, OpenAI
🟡	businessinsider.com	52.3M		Anthropic, ByteDance, Common Crawl
🔴	thehill.com	51.0M	Anthropic, OpenAI (ChatGPT)	Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	politico.com	48.1M	Anthropic, Google, Mistral, OpenAI (ChatGPT), Perplexity AI	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	reuters.com	46.1M	Anthropic, Google, Mistral	Anthropic, Apple, ByteDance, Common Crawl, Google, Meta
🔴	newsweek.com	45.2M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, OpenAI
🟢	drudgereport.com	44.8M
🔴	buzzfeed.com	40.0M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟢	the-sun.com	39.0M
🔴	huffpost.com	34.2M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟡	axios.com	31.8M		Amazon, ByteDance, Common Crawl
🔴	newsbreak.com	31.1M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟢	indiatimes.com	28.1M
🟢	breitbart.com	28.1M
🔴	theatlantic.com	26.4M	Anthropic	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta
🟡	patch.com	24.8M		Common Crawl
🟡	thegatewaypundit.com	24.1M		Amazon, Common Crawl
🔴	bloomberg.com	23.9M	Anthropic, OpenAI (ChatGPT)	Amazon, ByteDance, Common Crawl, Google, Meta, OpenAI
🟡	newsmax.com	23.2M		OpenAI
🟡	latimes.com	20.9M		Anthropic, ByteDance, OpenAI
🔴	usnews.com	20.4M	Anthropic, Google, Mistral, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟢	independent.co.uk	19.5M
🔴	variety.com	18.5M	Anthropic, OpenAI (ChatGPT)	Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🔴	sfgate.com	18.5M	Anthropic	Anthropic, Apple, ByteDance, Common Crawl
🟢	ksl.com	17.1M
🟢	thedailybeast.com	16.8M
🔴	startribune.com	16.3M	Anthropic, Google, LangChain, Mistral, OpenAI (ChatGPT), Perplexity AI	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Google, Meta, OpenAI
🔴	al.com	15.8M	Anthropic, OpenAI (ChatGPT)	Amazon, Anthropic, Apple, ByteDance, Common Crawl, Meta, OpenAI
🟢	theepochtimes.com	15.7M

Does This Mean Earned Media Is Less Valuable?

No. And this is an important caveat.

A placement in the New York Times, CNN, or the Wall Street Journal still carries enormous value. These outlets have high domain authority. They build credibility with human audiences. They influence traditional search rankings. A Tier 1 placement can shape perception, drive direct traffic, and signal trust to both customers and investors.

What this data shows is that there's now an additional dimension to consider. If 61% of top news sites block AI assistants, people who rely on the blocked AI tools may never see that placement. They won't encounter it when they ask those tools for recommendations, research, or news summaries.

The authority and credibility that come from being featured in major media compound over time. AI visibility is one factor among many.

Should PR Teams Factor AI Policy Into Placement Strategy?

It depends on the goals. If the primary objective is brand credibility and traditional media impact, the site's AI policy is secondary. A feature in Reuters or Bloomberg carries weight regardless of robots.txt settings.

But if the goal includes visibility in AI-powered discovery channels, then yes, it's worth knowing. A placement on Forbes (which blocks trainers but allows assistants) can be cited in AI chat results. A placement on the New York Times (which blocks Anthropic's and OpenAI's assistants) won't be cited by Claude or ChatGPT, though it could still surface in Perplexity or Google's AI results.

This doesn't mean avoiding blocked sites. It means understanding the trade-offs and planning accordingly. A strong PR strategy might target a mix of both: high-authority blocked sites for traditional credibility, and AI-friendly sites for discoverability.

What Can Marketers Do With This Information?

There are a few practical steps.

First, check the AI policies of the sites where you've already been placed. You can open a site's robots.txt file directly (for example, nytimes.com/robots.txt), or use Spyglasses' free Is This Site Blocking AI? tool for a quick classification.

Second, factor AI access into your earned media scoring. Not as the primary metric, but as a weighted signal alongside domain authority, audience fit, and other criteria.
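One way to make "weighted signal" concrete is a simple linear score. This is a hypothetical sketch, not a recommended formula. The weights, the credit assigned to each policy class, the 0-100 domain authority input, and the two example outlets are all assumptions, so replace them with your own criteria.

```python
from dataclasses import dataclass

# Hypothetical weights: AI access is deliberately the smallest term, so it acts
# as a signal rather than the primary metric.
WEIGHTS = {"authority": 0.5, "audience_fit": 0.35, "ai_access": 0.15}

# Hypothetical credit for each policy class used in this study.
AI_ACCESS_CREDIT = {
    "all_clear": 1.0,          # assistants and trainers allowed
    "trainers_only": 0.8,      # citable by assistants, excluded from training data
    "blocks_assistants": 0.3,  # not citable by at least one AI assistant
}


@dataclass
class Placement:
    outlet: str
    domain_authority: float  # 0-100, from whatever SEO tool your team uses
    audience_fit: float      # 0-1, your team's own judgment
    ai_policy: str           # a key of AI_ACCESS_CREDIT


def placement_score(p: Placement) -> float:
    """Weighted 0-1 score; higher means a more valuable placement."""
    return (
        WEIGHTS["authority"] * p.domain_authority / 100
        + WEIGHTS["audience_fit"] * p.audience_fit
        + WEIGHTS["ai_access"] * AI_ACCESS_CREDIT[p.ai_policy]
    )


# Hypothetical outlets and inputs, for illustration only.
outlet_a = Placement("Outlet A", domain_authority=95, audience_fit=0.9, ai_policy="blocks_assistants")
outlet_b = Placement("Outlet B", domain_authority=80, audience_fit=0.7, ai_policy="trainers_only")

for p in (outlet_a, outlet_b):
    print(f"{p.outlet}: {placement_score(p):.3f}")
```

With these made-up inputs, Outlet A scores about 0.835 and Outlet B about 0.765. The high-authority outlet that blocks AI assistants still ranks first. With these weights, AI access can shift a score by at most 0.105, so it only flips the order when the other factors are close.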

Third, consider diversifying. If your existing placements are concentrated on sites that block AI, look for opportunities on AI-friendly outlets that still have strong traffic and authority. Substack (95.5M monthly visits, all clear), Forbes (60.1M, allows assistants), and The Guardian (92.1M, allows assistants) are strong options.

Will These Policies Change?

Maybe. Robots.txt policies are not permanent. Sites update them as business models evolve and licensing deals are struck with AI companies. The economics of AI traffic are still being figured out.

Several major publishers have signed licensing agreements with OpenAI and other AI companies. These deals may lead to more permissive policies over time. On the other hand, some outlets may tighten restrictions as they seek to protect subscription revenue and content exclusivity.

This is a snapshot as of early 2026. We plan to re-run this study periodically and track how policies shift.

Glossary

What is robots.txt?

Robots.txt is a text file that sits at the root of a website (e.g., nytimes.com/robots.txt). It tells web crawlers and bots what they're allowed to access and what they should stay away from. It has been the de facto standard for controlling bot behavior since 1994 and was formalized as RFC 9309 in 2022. It's not a firewall; it's more like a "please don't" sign. But reputable bots, including those from Google, OpenAI, and Anthropic, follow these rules.
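The "please don't" nature of robots.txt is easiest to see from the crawler's side. This minimal sketch uses Python's urllib.robotparser to run the check a well-behaved bot performs before requesting a page. The robots.txt contents, the example.com URLs, and the SomeOtherBot name are hypothetical. The sketch also shows a partial block, the kind this study counted as a block. It illustrates one subtlety of the standard too: a bot named in its own group follows only that group, not the * rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a partial block for one assistant bot, plus a
# wildcard group that applies to every crawler without its own group.
ROBOTS_TXT = """\
User-agent: Claude-User
Disallow: /subscriber/

User-agent: *
Disallow: /admin/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())


def polite_fetch_allowed(agent: str, url: str) -> bool:
    """The check a well-behaved crawler runs before requesting a URL.

    Nothing enforces it: a crawler that skips this check can still request
    the page, which is why robots.txt works as a request, not a firewall.
    """
    return rules.can_fetch(agent, url)


for agent in ("Claude-User", "SomeOtherBot"):  # SomeOtherBot is a made-up name
    for path in ("/news/story", "/subscriber/story", "/admin/"):
        allowed = polite_fetch_allowed(agent, "https://example.com" + path)
        print(f"{agent} {path}: {'allowed' if allowed else 'disallowed'}")
```

Claude-User is kept out of /subscriber/ but may fetch everything else, including /admin/, because its own group replaces the * rules. SomeOtherBot gets the opposite result on those two paths.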

What are AI Assistants?

AI assistants are the bots that tools like ChatGPT, Claude, Gemini, Perplexity, and Copilot use to fetch content from websites in real time. When you see an AI tool say "Searching the web...", this is what's happening. These bots visit web pages, pull content, and bring it back so the AI can include it in its response. If a site blocks an AI assistant bot, that AI tool won't cite the site's content when answering user questions. Most major AI assistant bots respect robots.txt directives. The exceptions noted in our methodology, such as Bing Copilot, Grok, and DeepSeek, were excluded from this study.

What are AI Model Trainers?

AI model trainers are bots that crawl websites to gather content for training future AI models. This is different from real-time citation. Training data becomes part of what the AI "knows" without needing to search the web. There are valid reasons for a publisher to block this type of use. If your brand placement is on a site that blocks model trainers, the page can still be cited in an AI chat response. But it won't contribute to the AI's baseline knowledge of your brand.