{"pageUrl":"https://promagen.com/the-five-signals-ai-engines-use-to-cite-pages","lastModified":"2026-05-10","provenanceHash":"sha256:880bd42dc8afe2388d1140c24f6266e333c3cb72b19bb1e3d512dbe5b7dbd74b","provenanceNote":"Reachability documentation references each vendor's published crawler docs as of 2026-05-10. Entity clarity, substantive content, authority signals, and freshness are operational observations from Sentinel's per-engine measurement against a fixed query set; they are not vendor-documented citation contracts. The five-signal model is operator-runnable; per-site weakness profiles require a Sentinel snapshot.","claims":[{"id":"claim-reachability-vendor-documented","statement":"Reachability is partially vendor-documented: every major AI engine publishes its bot user agents and documents robots.txt compliance for autonomous crawlers (GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, Googlebot, Applebot, Bingbot, CCBot). User-triggered fetchers (ChatGPT-User, Claude-User, Perplexity-User) are documented separately per vendor; OpenAI and Perplexity both call out that robots.txt may not apply to user-initiated requests in the same way as to autonomous crawling.","evidenceUrl":"https://platform.openai.com/docs/bots","lastVerified":"2026-05-10","hash":"sha256:888f62ae8d6d7f2c1015d2ab5ebeca9e2a17bbb0e1c7a44778d02f17de518384"},{"id":"claim-other-four-signals-not-vendor-documented","statement":"Entity clarity, substantive content, authority signals, and freshness are not vendor-documented as AI citation inputs by any of the four major engines. They are operational hygiene that observably correlates with citation rate. Specific claims about ranking weights or per-signal influence on AI citation are inference from observable output, not vendor contract.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:7a3a0a7e8d73959f570783e8164db4ca9aaa1067505524b5120b16b871211fa6"},{"id":"claim-reachability-highest-leverage","statement":"Reachability is the highest-leverage operator-controllable signal. Every other signal is irrelevant if the engine cannot fetch the page. WAF and edge bot blocks are the most common cause of \"engine cannot see us\" diagnoses on commercial sites; sub-300ms TTFB is the operational threshold at which retrieval-driven engines reliably wait for response.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:dddd4792a30d0bba66ce455c6c9c03bb2d04ae3167ffc3c4bf102ccbdb6015dd"},{"id":"claim-backlinks-not-published-input","statement":"No major AI engine vendor publishes backlinks as an AI citation input. Sites with strong inbound link profiles are observably cited more, but the likely mechanism is correlation with the underlying signals (such sites also tend to have better reachability, schema, and content depth) rather than backlinks themselves being a documented input.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:d0eb001371bc2b086c4c17ab68bac90432bebd5b00fb0057ee690696d037e028"},{"id":"claim-fix-order-matters","statement":"The five-signal fix order is structural: reachability must be solved before entity clarity is measurable, entity clarity before substantive content effects are isolatable, and so on. Skipping steps makes diagnosis ambiguous because each layer is a precondition for the layer above.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:99d5b7608191ef2605da641cfcb85fa00bc54fb98c0540997e3fe6eecf52c142"}]}