AI Placement Value Score Methodology
What the AI Placement Value Score Measures
The AI Placement Value Score (AIPVS) answers a simple question: if you secure a media placement on this domain, how much combined value does it deliver in both traditional search and AI-powered discovery?
Traditional metrics like Domain Authority, Domain Rating, and estimated traffic tell you how strong a publication is in organic search. But they don't account for the fact that many major publications now block AI assistants like ChatGPT, Gemini, and Perplexity from accessing their content.
A placement that looks great on paper — high authority, large audience — might be completely invisible to AI. The AIPVS combines organic authority with an assessment of AI accessibility to produce a single, comparable score between 0 and 100.
The Three Layers
The score is built from three independent layers, each measuring a different dimension of placement value.
Layer 1: Organic Authority Baseline
This layer captures what traditional SEO tools already measure: the underlying authority and visibility of the domain.
We pull domain-level metrics from DataForSEO, a leading SEO data provider. The signals include:
- Web graph importance ranking. A score based on the quality and quantity of other websites linking to this domain, computed from a dataset of over 250 million host-level nodes. This is our strongest single signal because it captures link quality, cascading authority, and overall web reputation.
- Estimated organic traffic. How much search traffic the domain receives from Google. Higher traffic correlates with greater potential AI exposure.
- Ranking quality. Not just how many search results the domain appears in, but how many are top-3 positions. Consistent top positions signal stronger authority than appearing on page two.
- Commercial value. What advertisers would pay for equivalent traffic through Google Ads. An industry-standard proxy for how commercially valuable the domain's audience is.
- Category relevance. How well the domain's content category matches the brand being evaluated. A placement in a relevant publication carries more weight than one in an unrelated vertical. When no brand context is provided, we use a neutral baseline.
These signals are normalized and combined into a 0–100 score representing the domain's organic authority.
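The normalize-and-combine step can be sketched as a weighted blend. The signal names and weights below are illustrative assumptions, not the production values; each signal is assumed to already be normalized to the 0–1 range.

```python
# Illustrative sketch of the Layer 1 blend. Weights are invented for
# illustration (web graph rank weighted highest, as the strongest signal).

SIGNAL_WEIGHTS = {
    "web_graph_rank": 0.40,
    "organic_traffic": 0.20,
    "ranking_quality": 0.15,
    "commercial_value": 0.15,
    "category_relevance": 0.10,
}

def organic_baseline(signals: dict) -> float:
    """Combine pre-normalized (0-1) signals into a 0-100 authority score."""
    blended = sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())
    return round(blended * 100, 1)

score = organic_baseline({
    "web_graph_rank": 0.9,
    "organic_traffic": 0.6,
    "ranking_quality": 0.7,
    "commercial_value": 0.5,
    "category_relevance": 0.5,  # neutral baseline: no brand context given
})  # -> 71.0
```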
If you already have Ahrefs, Moz, or SEMrush data: You don't need to adopt our organic baseline. Through our API, you can supply your own domain authority score and we'll apply only the AI layers. You can also request just the raw AI Impact Multiplier, a single number between 0.40 and 1.00 representing what fraction of a domain's potential value is accessible to AI.
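The bring-your-own-authority path amounts to a single multiplication. The function name and signature below are hypothetical, for illustration only; the 0.40–1.00 range is the multiplier range described above.

```python
# Hypothetical sketch of the bring-your-own-authority path: scale a
# domain authority score you already trust by the AI Impact Multiplier.

def apply_ai_multiplier(own_authority: float, ai_multiplier: float) -> float:
    """Scale an external 0-100 authority score by the AI Impact Multiplier."""
    if not 0.40 <= ai_multiplier <= 1.00:
        raise ValueError("multiplier is defined on [0.40, 1.00]")
    return round(own_authority * ai_multiplier, 1)

# e.g. an Ahrefs DR of 80 on a domain where 55% of potential value is
# accessible to AI
adjusted = apply_ai_multiplier(80, 0.55)  # -> 44.0
```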
Layer 2: AI Citation Accessibility
This layer measures whether AI assistants can actually cite content from this domain in real-time responses.
When someone asks ChatGPT or Perplexity a question, these assistants browse the web using dedicated crawlers, similar to how Google's crawler visits websites to build its search index. Website owners can block specific crawlers through their robots.txt file.
We check every major AI platform's browsing crawler against the domain's access policies:
- ChatGPT (OpenAI's browsing agent)
- Gemini and Google AI Mode (Google's AI retrieval system)
- Perplexity (designed from the ground up to cite sources)
- Microsoft Copilot (uses Bing's infrastructure)
- Claude (Anthropic's browsing agent)
- And several others
For each platform, we determine whether the crawler has full access, partial access (allowed on some pages but not others), or is completely blocked.
We go beyond the published policy. Many websites use content delivery networks and security services that silently block AI crawlers even when robots.txt technically allows them. Our verification system tests for both the published policy and the actual infrastructure behavior.
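The published-policy half of that check can be sketched with Python's standard-library robots.txt parser. The robots.txt content and the user-agent strings below are examples only; the real check covers each platform's specific browsing agent, and infrastructure-level blocking requires live fetches, which this sketch does not attempt.

```python
# A minimal sketch of the published-policy check: parse a domain's
# robots.txt and ask whether each crawler user agent may fetch the root.
# (CDN/security-layer blocking is a separate, live-traffic test.)
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

AI_CRAWLERS = ["GPTBot", "PerplexityBot", "ClaudeBot"]  # example agents

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

access = {bot: parser.can_fetch(bot, "/") for bot in AI_CRAWLERS}
# GPTBot is blocked; the others fall through to the wildcard allow rule
```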
Each platform's access status is weighted by its share of the AI market. Being blocked by a platform used by 65% of users matters more than being blocked by one used by 2%. Weights are updated quarterly as the landscape shifts.
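The market-share weighting can be sketched as follows. The shares, access values, and statuses are made-up examples, not our actual quarterly weights.

```python
# Hypothetical sketch of share-weighted citation accessibility: each
# platform's access status (1.0 full, 0.5 partial, 0.0 blocked) is
# scaled by its assumed share of AI assistant usage.

PLATFORM_SHARE = {"chatgpt": 0.60, "gemini": 0.15, "perplexity": 0.10,
                  "copilot": 0.10, "claude": 0.05}

ACCESS_VALUE = {"full": 1.0, "partial": 0.5, "blocked": 0.0}

def citation_accessibility(status_by_platform: dict) -> float:
    """Return a 0-1 share-weighted AI citation accessibility score."""
    return sum(PLATFORM_SHARE[p] * ACCESS_VALUE[s]
               for p, s in status_by_platform.items())

score = citation_accessibility({
    "chatgpt": "blocked", "gemini": "full", "perplexity": "full",
    "copilot": "partial", "claude": "full",
})
# blocking the dominant platform drags the score down sharply
```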
What we intentionally exclude: Google AI Overviews are not included. Google AI Overviews use Google's standard search crawler, which virtually no website blocks (doing so would remove them from Google Search entirely). Since it isn't a differentiating factor, including it would artificially inflate scores.
Per-platform transparency: Beyond the composite score, we show exactly which AI platforms can and cannot access each domain. This lets you make platform-specific decisions based on your audience.
Layer 3: AI Training Influence
This layer assesses how much a placement contributes to a brand's presence in the data AI models learn from.
AI assistants don't just browse the web in real time. They also carry baseline knowledge from their training data, the vast collection of web content gathered before the model was built. If a brand is mentioned on a highly connected, well-regarded website that AI models learn from, that brand becomes part of the AI's baseline understanding. This is a longer-term, more durable form of AI visibility.
We use publicly available web structure data from Common Crawl (a nonprofit that maintains an open web archive) to assess how important a domain is in the training data ecosystem. Domains that sit at the center of the web's link graph, meaning they are connected to many other important sites, are far more likely to be included in training data.
How we weight training crawlers
AI companies use dedicated crawlers to collect web content for training their models. We check whether each domain allows or blocks these training crawlers, and we weight each crawler by its downstream model influence — the combined market reach of the AI models that rely on that crawler's data.
This is different from how we weight citation crawlers in Layer 2. Citation crawlers map neatly to specific AI assistants with known user bases. Training crawlers are messier: some feed a single model family, while others (like Common Crawl) provide shared data used across much of the AI industry.
Our weighting considers three factors for each crawler:
- Which models does it feed? A training crawler that feeds the models behind the most-used AI assistant carries more weight than one feeding a niche product.
- How broad is its influence? Common Crawl's CCBot is weighted not by any single model's market share, but by the breadth of models that depend on it. Published research shows the majority of major LLMs use Common Crawl data in pre-training. Blocking it affects your presence across most of the AI ecosystem, not just one platform.
- Does usage data tell the full story? For some platforms, consumer web traffic understates actual influence. Enterprise-heavy platforms and open-source model families that power thousands of downstream applications receive a modest premium to reflect their real-world impact beyond what traffic numbers alone would suggest.
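The three factors above can be combined in a simple multiplicative sketch. Every number and crawler label here is invented for illustration; the point is the shape of the calculation, not the actual weights.

```python
# Illustrative training-crawler weighting: base model reach, scaled by
# breadth (shared sources like CCBot feed many model families) and by a
# modest premium for enterprise/open-source influence beyond raw traffic.

def crawler_weight(model_reach: float, breadth: float, premium: float = 1.0) -> float:
    """Combine reach, breadth of dependent models, and an influence premium."""
    return model_reach * breadth * premium

weights = {
    "crawler_a": crawler_weight(0.60, 1.0),               # one major model family
    "shared_bot": crawler_weight(0.20, 3.0),              # shared across many models
    "open_bot": crawler_weight(0.05, 1.0, premium=1.5),   # open-source downstream use
}
total = sum(weights.values())
normalized = {name: round(w / total, 3) for name, w in weights.items()}
```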
We track training crawlers from OpenAI, Google, Anthropic, Meta, and Common Crawl, along with smaller crawlers from ByteDance, Amazon, and Apple. Weights are based on public data including Cloudflare Radar's AI crawler traffic analysis, Similarweb's AI market share tracking, and published training data research from the Mozilla Foundation and others.
These weights are updated quarterly as the AI landscape shifts, and changes are documented in our methodology changelog.
What we acknowledge: This is the most forward-looking and inherently uncertain layer. Training data inclusion is not publicly documented by most AI companies. We label this component as an estimate and are transparent about the uncertainty. It is most useful as a directional indicator, distinguishing between a domain that is almost certainly in training data (a major national publication) and one that almost certainly isn't (a brand-new blog).
Brand-specific metrics: When a brand domain is provided, this layer also computes a relative uplift score: how much the placement domain's web graph importance exceeds the brand's own. A small brand featured on a highly connected publication gains disproportionate training influence. This metric requires a brand domain for comparison and is available through our API.
How the Layers Combine
The three layers are combined into a single 0–100 score. The organic baseline sets the ceiling. A domain can't score higher than its underlying authority supports. The AI layers then adjust that ceiling based on how much of the domain's potential value is accessible to AI.
A domain that allows all AI access retains nearly all of its organic value. A domain that blocks most AI crawlers sees a significant reduction, though it always retains a meaningful portion of its score because its traditional search value still exists.
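The ceiling-and-multiplier idea can be sketched in a few lines. The blend of the two AI layers is a made-up example; the 0.40 floor mirrors the lower bound of the AI Impact Multiplier range described earlier, so a fully blocked domain keeps a meaningful portion of its score.

```python
# Simplified composite sketch: the organic baseline is the ceiling, and
# the AI layers scale it by a multiplier floored at 0.40. The 60/40 blend
# of citation vs. training accessibility is illustrative only.

def composite_score(organic: float, citation: float, training: float) -> float:
    """organic is 0-100; citation and training are 0-1 accessibility scores."""
    ai_fraction = 0.6 * citation + 0.4 * training   # illustrative blend
    multiplier = 0.40 + 0.60 * ai_fraction          # maps [0,1] -> [0.40, 1.00]
    return round(organic * multiplier, 1)

composite_score(80, citation=1.0, training=1.0)   # full access: 80.0
composite_score(80, citation=0.0, training=0.0)   # fully blocked: 32.0
```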
The composite score maps to four interpretive tiers:
| Tier | Score | Label | Interpretation |
|---|---|---|---|
| 1 | 75–100 | Premium | Strong organic authority and excellent AI access. Placements deliver maximum combined value. |
| 2 | 50–74 | Strong | Good authority with meaningful AI value. Some restrictions may limit full potential. |
| 3 | 25–49 | Moderate | Decent organic presence but significant AI gaps. Traditional value persists, but AI impact is limited. |
| 4 | 0–24 | Limited | Low authority and/or severely restricted AI access. Evaluate whether placement cost is justified. |
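The tier table translates directly into a small lookup helper:

```python
# A direct translation of the tier table into a lookup function.

def tier_for(score: float) -> str:
    """Map a 0-100 composite score to its interpretive tier label."""
    if score >= 75:
        return "Premium"
    if score >= 50:
        return "Strong"
    if score >= 25:
        return "Moderate"
    return "Limited"

tier_for(82)   # -> "Premium"
tier_for(41)   # -> "Moderate"
```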
Confidence and Data Completeness
Every score includes a confidence indicator: exact or estimated.
- Exact means the domain was found in the Common Crawl web graph dataset, and its web graph importance ranking and organic metrics are based on observed data.
- Estimated means some signals were unavailable (for example, a very new or niche domain may not appear in Common Crawl). When signals are missing, their scoring weight is redistributed proportionally across available signals. The score remains directionally useful but carries more uncertainty.
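Proportional redistribution means dropping the absent signal and renormalizing the remaining weights so they still sum to 1. The weights below are illustrative.

```python
# Sketch of proportional weight redistribution for missing signals.

def redistribute(weights: dict, available: set) -> dict:
    """Renormalize weights over the available signals only."""
    kept = {name: w for name, w in weights.items() if name in available}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}

weights = {"web_graph_rank": 0.5, "organic_traffic": 0.3, "ranking_quality": 0.2}
adjusted = redistribute(weights, {"organic_traffic", "ranking_quality"})
# web_graph_rank unavailable: 0.3 -> 0.6 and 0.2 -> 0.4
```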
The per-platform access breakdown is always based on live or recently cached data, regardless of confidence level.
What This Score Is — and What It's Not
What it is:
- A directional indicator for comparing placement opportunities and tracking trends over time.
- Transparent in methodology. This document explains how scores are calculated and what data sources we use.
- Regularly updated. Platform weights are reviewed quarterly as the AI market evolves. A domain's score may change over time as market shares shift, even if nothing about the domain itself changes.
- Complementary to existing tools. Designed to add the AI dimension that SEO platforms don't provide, not to replace them.
What it's not:
- An exact dollar valuation. Like every domain authority metric in the industry (Moz DA, Ahrefs DR, SEMrush Authority Score), the AIPVS is an estimated score useful for comparison and trend analysis, not a precise financial calculation.
- A guarantee of AI visibility. A high AIPVS means conditions are favorable for AI citation. It doesn't guarantee any specific AI platform will cite a placement for any specific query.
- Static. The AI landscape is moving fast. We update platform weights quarterly and expect the methodology itself to evolve as we gather validation data.
Our Commitment to Accuracy
The SEO industry has established a useful precedent: metrics like Ahrefs Domain Rating and SEMrush Traffic Value have become industry standards despite independent studies showing 49–68% median error rates for traffic estimation. They work because they're transparent, directional, and consistently applied, allowing meaningful comparisons even without absolute precision.
We follow the same philosophy. We will:
- Publish our methodology openly (this document is a start).
- Conduct validation studies comparing our scores against observed AI citation patterns.
- Publish error rates and correlation findings honestly.
- Maintain a changelog showing when and why platform weights change.
Data Sources
- DataForSEO. Domain-level organic authority metrics including estimated traffic, ranking positions, and commercial value. Updated at least monthly.
- Common Crawl. Open web graph data with 250M+ host nodes, used for web graph importance and training influence assessment. Updated quarterly.
- Similarweb. AI platform market share data used for weighting citation crawler access. Publicly reported figures reviewed quarterly.
- Cloudflare Radar. AI crawler traffic data by user agent, crawl purpose, and industry vertical. Used for training crawler traffic share and trend analysis in Layer 3 weighting.
- Live robots.txt analysis. Direct verification of AI crawler access policies and infrastructure-level blocking.