The five signals AI engines use to cite pages
Reachability, entity clarity, substance, authority, freshness — documented vs observed.
AI engines do not publish their citation ranking. What they do publish — combined with what is operationally observable across the four major engines — points to five page-level signals operators can verifiably move. This page describes each signal, separates documented from observed, and gives the concrete operator action that affects each.
By Martin Yarnold · Updated
The five signals, in fix order
Each signal is presented with its vendor documentation status (where any exists), the behaviour observed in Sentinel's measurement, and the concrete operator action tied to it. Reachability comes first because every other signal is irrelevant if the engine cannot fetch the page; entity clarity comes second because a page the engine can read but cannot disambiguate produces unreliable citations.
Reachability
Entity clarity
Substantive content
Authority signals
Freshness signals
Frequently asked questions
Why these five signals?
These are the five operator-controllable signals that consistently appear in vendor crawler documentation, observable AI engine behaviour, or both. They are not the engines' full ranking model — that is not published — but they are the structural surface operators can verifiably move. Other signals exist (the engines' internal retrieval scoring, query-time embedding choices, model-version effects) but they are either invisible to operators or not operator-controllable, so they belong in a different conversation.
Which of the five signals are vendor-published?
Reachability is partially vendor-published: every major engine publishes its bot user agents and documents robots.txt compliance for autonomous crawlers; user-triggered fetcher behaviour is documented per vendor (OpenAI's ChatGPT-User and Perplexity's Perplexity-User get separate treatment in their respective docs). Entity clarity, substantive content, authority signals, and freshness are not vendor-published as ranking inputs — they are operational hygiene that observably correlates with citation rate without being officially documented as a citation contract.
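As a concrete sketch, a robots.txt that explicitly allows the documented autonomous crawlers could look like the block below. The user-agent tokens shown (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot) are the ones the vendors publish at the time of writing; verify them against each vendor's current crawler documentation before relying on them, since tokens are added and renamed.

```
# Illustrative robots.txt fragment; confirm current tokens in vendor docs.
User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /
```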
Which signal matters most?
Reachability, by a wide margin. Every other signal is irrelevant if the engine cannot fetch the page. The most common citation failure mode is structural: WAF edge blocks, robots.txt rules, or slow response times that stop the engine from reading the page at all. After reachability, entity clarity matters most, because a page the engine can read but cannot disambiguate from generic content is cited less consistently than a clearly identified entity. Substantive content, authority, and freshness are closer to lateral signals — they shift citation probability at the margin once reachability and entity clarity are solved.
How do I measure where I am weak on these signals?
Reachability is measurable from server logs and curl tests against the named bot user agents. Entity clarity is measurable by validating JSON-LD coverage against a schema validator and grepping for canonical, hreflang, and Organization @id consistency. Substantive content is harder to measure directly — proxy it with word count, FAQ schema coverage, and the ratio of unique to boilerplate content. Authority signals are measurable by author-byline coverage, internal-link counts, and external-source citation density. Freshness is measurable by the datePublished/dateModified spread across the site. Promagen Sentinel runs all five layers weekly per page on the monitored set.
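A minimal reachability probe along those lines, in Python: it fetches a page with each bot's user-agent string and reports status and latency. This only detects UA-conditioned blocking (a WAF serving 403s to bot user agents); the engines crawl from their own published IP ranges, so a pass here is necessary but not sufficient. The UA values are abbreviated stand-ins for the full strings in vendor docs, and the URL is a placeholder.

```python
# Minimal sketch: probe a URL with AI-crawler user agents and report
# status and latency. Detects UA-conditioned WAF blocks only; engines
# crawl from published IP ranges, so a pass here is necessary, not
# sufficient, for real reachability.
import time
import requests

BOT_UAS = {
    # Abbreviated stand-ins; use the full UA strings from vendor docs.
    "openai": "GPTBot",
    "anthropic": "ClaudeBot",
    "perplexity": "PerplexityBot",
}

def probe(url: str, slow_ms: float = 3000.0) -> None:
    for engine, ua in BOT_UAS.items():
        start = time.monotonic()
        try:
            resp = requests.get(url, headers={"User-Agent": ua}, timeout=10)
            elapsed_ms = (time.monotonic() - start) * 1000
            ok = resp.status_code == 200 and elapsed_ms < slow_ms
            print(f"{engine:11s} {resp.status_code} {elapsed_ms:7.0f}ms "
                  f"{'ok' if ok else 'check'}")
        except requests.RequestException as exc:
            print(f"{engine:11s} FAIL {exc}")

if __name__ == "__main__":
    probe("https://example.com/page-to-check")  # hypothetical URL
```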
What about backlinks — are they an AI citation signal?
No vendor publishes backlinks as an AI citation input. Operationally, sites with strong inbound link profiles are often cited more, but the likely mechanism is correlation with the underlying signals (such sites also tend to have better reachability, schema, and content depth) rather than backlinks themselves being a documented input. Treat link building as an indirect lever for AI citation: improving the underlying entity quality is what moves the needle, and that improvement also tends to attract better links.
How do these signals shift over time?
The signals themselves are stable across model versions; the relative weighting between them is what shifts. AI engine releases change retrieval behaviour quietly — the same page can become more or less consistently cited from one model version to the next without any operator-side change. This is why Sentinel measures citation rate as a per-engine time series rather than a single absolute number: the trend is the actionable signal, not the per-week absolute count. Operators monitoring this can detect engine-side shifts within a week or two of model release.
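To make "the trend, not the absolute count" concrete, here is a small sketch of the comparison: the mean citation rate of the most recent weeks against a prior baseline, flagging drift when the gap is large. The window sizes and the 20% threshold are illustrative choices, not Sentinel's actual parameters.

```python
# Sketch: flag per-engine citation-rate drift in a weekly time series.
# Window sizes and the 20% threshold are illustrative, not Sentinel's.
from statistics import mean

def drift(weekly_rates: list[float], recent: int = 2, baseline: int = 6,
          threshold: float = 0.20) -> bool:
    """True when the mean of the last `recent` weeks departs from the
    mean of the preceding `baseline` weeks by more than `threshold`."""
    if len(weekly_rates) < recent + baseline:
        return False  # not enough history to call a trend
    base = mean(weekly_rates[-(recent + baseline):-recent])
    now = mean(weekly_rates[-recent:])
    return base > 0 and abs(now - base) / base > threshold

# Hypothetical series: citation rate dips after an engine model release.
print(drift([0.42, 0.44, 0.41, 0.43, 0.45, 0.42, 0.29, 0.27]))  # True
```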
What order should I fix these in?
In order: (1) reachability — WAF and robots.txt allow rules for the autonomous crawlers; (2) entity clarity — JSON-LD coverage and canonical/hreflang correctness; (3) substantive content — depth and uniqueness on the pages that matter commercially; (4) authority signals — author byline + primary-source citations + internal linking on those same pages; (5) freshness — keep important pages updated with real edits, not date-bumped boilerplate. Skip steps and the diagnosis becomes ambiguous; fix in order and each layer's effect is measurable.
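For step 2, the target state is one stable Organization @id that every page's JSON-LD points at, so engines resolve the publisher to the same entity everywhere. A minimal sketch, with placeholder URLs:

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://example.com/pricing",
  "publisher": {
    "@type": "Organization",
    "@id": "https://example.com/#organization",
    "name": "Example Co",
    "url": "https://example.com/"
  }
}
```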
How does Sentinel help with the five signals?
Sentinel measures all five signals weekly per page on the monitored site set: reachability via bot fetch tests, entity clarity via schema validation, substantive content via depth proxies, authority via byline + internal-link coverage, and freshness via datePublished/dateModified analysis. The output is a per-signal pass/fail per page plus a citation-rate time series across the four major engines. The combination shows which signal is weakest on which page, and whether weakness correlates with citation drift.
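As a hedged illustration of what that per-page output could look like, here is a hypothetical record shape in Python. It is not Sentinel's published schema, and the engine keys and values are placeholders.

```python
# Hypothetical shape for a per-page weekly report; illustrative only,
# not Sentinel's published schema.
from dataclasses import dataclass, field

SIGNALS = ("reachability", "entity_clarity", "substance", "authority", "freshness")

@dataclass
class PageWeek:
    url: str
    week: str                                                  # ISO week, e.g. "2025-W14"
    signals: dict[str, bool] = field(default_factory=dict)     # pass/fail per signal
    citation_rate: dict[str, float] = field(default_factory=dict)  # per engine

    def weakest(self) -> list[str]:
        return [s for s in SIGNALS if not self.signals.get(s, False)]

report = PageWeek(
    url="https://example.com/pricing",
    week="2025-W14",
    signals={"reachability": True, "entity_clarity": False, "substance": True,
             "authority": True, "freshness": False},
    citation_rate={"chatgpt": 0.31, "perplexity": 0.47, "gemini": 0.12, "claude": 0.22},
)
print(report.weakest())  # ['entity_clarity', 'freshness']
```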