Do I Need an LLMs.txt for My Website?

Spyglasses Team

7/10/2025

#LLMs.txt #AI crawlers #web standards #data protection #AI policy

The robots.txt file revolutionized how websites control search engine crawlers. Now, a proposed standard called LLMs.txt promises to do the same for AI systems. But after implementing LLMs.txt files on dozens of websites and monitoring AI traffic for months, we can confidently report that major AI companies aren't using it.

Through our work with Spyglasses, we've tracked AI system behavior across hundreds of websites. Not once have we seen OpenAI, Google, Anthropic, or other major AI companies request or access LLMs.txt files. Not for model training, not for building AI search engines, and not when gathering sources for AI chat answers. The gap between the proposed standard and actual AI behavior is wider than most people expect.

This doesn't mean LLMs.txt is useless—but it does mean you need to understand what it actually does versus what it promises to do before deciding whether to implement it.

What Is LLMs.txt and How Is It Supposed to Work?

LLMs.txt is a proposed standard that lets website owners specify how AI systems should interact with their content. Similar to robots.txt for search engines, it's a plain text file placed in your website's root directory that contains instructions for AI crawlers and systems.

The proposal suggests that AI systems should check for LLMs.txt files before accessing website content and follow the specified rules. These rules might include (see the sketch after this list):

  • Which AI systems are allowed to access your content
  • What parts of your site AI can or cannot crawl
  • How AI systems should attribute your content
  • Specific licensing terms for AI usage
  • Contact information for AI-related inquiries
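
To make that concrete, here is a purely hypothetical sketch of what such a file might contain. The proposal does not fix a single canonical syntax, so the directive names below are illustrative, borrowed from the robots.txt style rather than taken from any official specification:

  # llms.txt (hypothetical example; not an official syntax)
  User-Agent: *                # applies to all AI systems
  Disallow: /members/          # keep AI out of this section
  Allow: /blog/                # public blog content may be used
  Attribution: required        # cite and link the source page
  License: https://example.com/ai-content-license
  Contact: ai-policy@example.com

  User-Agent: ResearchCrawler  # a hypothetical academic crawler
  Allow: /

Keep in mind that no major AI system currently requests this file, so a sketch like this documents your intent rather than enforcing anything.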

The idea is compelling: a simple, standardized way to control AI access to your website. But the reality is more complex than the proposal suggests.

Why Major AI Companies Aren't Using LLMs.txt

Our analysis of real AI traffic reveals that major AI companies operate very differently from the behavior the LLMs.txt proposal assumes. Here's what we've observed:

They Already Use Existing Standards: Major AI companies like OpenAI, Google, and Anthropic already respect robots.txt files. When they need to understand what content is available on a website, they use established standards like sitemap.xml to discover crawlable content. These companies have built their infrastructure around existing web standards, not new experimental ones.

There are certainly some AI trainers that don't (check our list of bots to see which ones), but for the most part, these existing protocols will meet your needs.

Training vs. Inference: Most AI companies separate training data collection from real-time inference. Training crawlers that build AI models operate at massive scale and don't check individual website policies before accessing content. Real-time AI systems that answer user questions often rely on existing search APIs rather than directly crawling websites.

Scale Challenges: AI systems that serve millions of users need consistent, reliable access to information. Checking individual website policies for every request would create massive performance and reliability issues.

No Legal Framework Checking: Contrary to what some assume, AI model trainers don't check legal documentation, terms of service, or licensing agreements before accessing sites. While large publishers may have the leverage to negotiate licensing deals, most businesses don't have this option and shouldn't count on legal protections alone.

What Actually Works If You Want to Control AI Access

Since LLMs.txt doesn't actually prevent AI systems from accessing your content, here are two scenarios to consider:

Scenario 1: You Probably Don't Want to Block AI Systems

If your website benefits from being discovered and recommended, as most e-commerce stores, local businesses, SaaS documentation, and content sites do, blocking AI systems will likely hurt more than help. These systems can drive the discovery and recommendations that bring you new customers.

Instead of blocking AI access, focus on optimizing your content so AI systems can understand and recommend your business accurately. This means clear product descriptions, structured data, and comprehensive information that helps AI systems represent your offerings correctly.
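
For instance, schema.org Product markup in JSON-LD gives AI systems (and search engines) a machine-readable description of what you sell. This is a minimal sketch with placeholder names and values:

  <script type="application/ld+json">
  {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Acme Standing Desk",
    "description": "Height-adjustable standing desk with a bamboo top and programmable presets.",
    "brand": { "@type": "Brand", "name": "Acme" },
    "offers": {
      "@type": "Offer",
      "price": "499.00",
      "priceCurrency": "USD",
      "availability": "https://schema.org/InStock",
      "url": "https://example.com/products/standing-desk"
    }
  }
  </script>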

Scenario 2: You Want to Actually Block AI Systems

If you're concerned about AI companies using your content and want to take action that actually works, use technologies that AI companies acknowledge and respect:

robots.txt: All major AI companies already respect robots.txt files. If you want to block their crawlers, add the relevant user agents to your robots.txt file. Compliance is voluntary, but this is the mechanism these companies have actually built their crawlers around.
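
For example, the entries below block several widely documented AI user agents. These tokens (GPTBot for OpenAI, ClaudeBot for Anthropic, Google-Extended for Google's AI training, CCBot for Common Crawl, PerplexityBot for Perplexity) are published by the vendors, but the list changes over time, so check current documentation before relying on it:

  User-agent: GPTBot
  Disallow: /

  User-agent: ClaudeBot
  Disallow: /

  User-agent: Google-Extended
  Disallow: /

  User-agent: CCBot
  Disallow: /

  User-agent: PerplexityBot
  Disallow: /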

Technical Solutions: Tools like Spyglasses can detect AI traffic and give you granular control over which AI systems access your content. You can block specific AI crawlers, redirect them to specialized content, or track their behavior to make informed decisions.

Server-Side Controls: Use server-side user-agent blocking, IP blocking, or authentication requirements for sensitive content. These methods actively prevent access rather than relying on AI systems to voluntarily check and follow policies.
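
As a sketch of the user-agent approach, here is a minimal Express (Node.js/TypeScript) middleware that returns 403 to known AI crawlers. The pattern list is illustrative, and user-agent strings can be spoofed, so treat this as one layer of control rather than a guarantee:

  import express from "express";

  // Illustrative patterns; keep this list current in a real deployment.
  const BLOCKED_AI_AGENTS = [/GPTBot/i, /ClaudeBot/i, /CCBot/i, /PerplexityBot/i];

  const app = express();

  app.use((req, res, next) => {
    const userAgent = req.get("user-agent") ?? "";
    if (BLOCKED_AI_AGENTS.some((pattern) => pattern.test(userAgent))) {
      // Refuse the request outright instead of relying on voluntary compliance.
      res.status(403).send("Automated AI access is not permitted on this site.");
      return;
    }
    next();
  });

  app.listen(3000, () => {
    console.log("Listening on port 3000");
  });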

The key insight is that effective AI access control requires enforcement mechanisms that AI companies actually use. LLMs.txt lacks this enforcement, while robots.txt and technical solutions provide it.

Where AI Search Is Heading

The LLMs.txt proposal represents an important conversation about AI ethics and website owner rights. Even if major AI companies aren't currently adopting it, the discussion highlights the need for clearer standards around AI content usage.

Future AI systems might adopt LLMs.txt or similar standards, especially as legal and regulatory pressure increases. Smaller AI companies or research organizations might be more likely to respect these standards than major commercial platforms.

Effective AI governance requires both technical standards and enforcement mechanisms. LLMs.txt provides the standard but lacks the broad compliance, voluntary as it is, that makes robots.txt effective for search engines.

Should You Implement LLMs.txt?

The answer depends on your goals and expectations. If you want to:

  • Make a statement about AI ethics: LLMs.txt can document your position on AI usage
  • Prepare for future adoption: Early implementation might benefit you if standards evolve
  • Show due diligence: Demonstrating proactive AI governance might have legal or business value

But if you want to:

  • Actually control AI access: Technical solutions are more effective than LLMs.txt
  • Improve AI visibility: Focus on content optimization rather than access restrictions
  • Reduce implementation effort: Your time might be better spent on other AI strategies

The reality is that most businesses will get better results from understanding and optimizing for AI systems than from trying to block them. The AI systems that can help your business grow are the same ones that LLMs.txt might try to restrict.

Rather than betting on a standard that major AI companies aren't currently using and show little sign of adopting, focus on strategies that work with the AI ecosystem as it exists today. Monitor how AI systems actually interact with your content, optimize for the interactions that benefit your business, and use proven technical controls for the access you want to restrict.

The conversation about AI website standards is important, but don't let it distract you from the practical work of succeeding in a zero-click, AI-centric world.
