How to fix orphan pages for AI discovery
Three relevant inbound links. Three places that make sense.
Orphan pages are pages AI engines often cannot reach because no cleanly-linked navigation path leads to them. The fix is direct: give every page intended for AI citation at least three inbound internal links from relevant sources. This page covers detection, the linking pattern that works, and the operational rule that prevents new orphans from shipping.
By Martin Yarnold · Updated
The three-source rule
Three good link sources
Anti-pattern
Frequently asked questions
How do you define an orphan page?
A page with fewer than three inbound internal links from other pages on the same site. Three is the conventional threshold (used by Promagen Sentinel's orphan-risk component) because it captures "reachable from at least one section, plus redundancy if one link breaks." Sites that use a different threshold should be consistent; the number matters less than the binary classification you can act on. Sitemaps and llms.txt do not count as inbound links for orphan purposes — both are discovery aids, not navigation.
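The classification can be sketched in a few lines, assuming the internal-link graph is already available as a mapping from each page to the pages it links to. The function names and sample paths here are hypothetical illustrations, not Sentinel's implementation.

```python
from collections import defaultdict

ORPHAN_THRESHOLD = 3  # the conventional threshold described above


def inbound_counts(outlinks):
    """Count the distinct pages linking to each page; self-links are ignored."""
    sources = defaultdict(set)
    for source, targets in outlinks.items():
        sources[source]  # make sure pages with zero inbound links still appear
        for target in targets:
            if target != source:
                sources[target].add(source)
    return {page: len(srcs) for page, srcs in sources.items()}


def find_orphans(outlinks, threshold=ORPHAN_THRESHOLD):
    """Return pages with fewer than `threshold` inbound internal links."""
    counts = inbound_counts(outlinks)
    return sorted(page for page, n in counts.items() if n < threshold)


# Hypothetical graph built from on-page links only. Sitemap and llms.txt
# entries are deliberately left out because they do not count as inbound links.
site = {
    "/": ["/guides", "/pricing", "/guides/orphans"],
    "/guides": ["/", "/pricing", "/guides/orphans", "/guides/new-page"],
    "/pricing": ["/", "/guides", "/guides/orphans"],
    "/guides/orphans": ["/", "/guides", "/pricing"],
    "/guides/new-page": ["/", "/guides", "/pricing"],
}

print(find_orphans(site))  # ['/guides/new-page']
```

Counting distinct source pages rather than raw anchor tags means three links from the same page count once, which is closer to the "redundancy if one link breaks" intent.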
How do I find my orphan pages?
Two methods. (1) Crawler-based: tools like Screaming Frog or Sitebulb produce an internal-link graph; the orphan list is the pages with fewer than three inbound links. (2) Build-time: if your site is statically generated (for example with Next.js, Hugo, or Astro), parse the build output to compute the link graph yourself. Promagen Sentinel runs this analysis weekly as part of its orphan-risk component (10% of the composite health score).
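For the build-time method, one possible sketch uses only the Python standard library to extract internal links from rendered HTML. It assumes the build output has already been read into a mapping of URL path to HTML string; the names `LinkCollector`, `internal_links`, `build_link_graph` and the `https://example.com` origin are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(html, page_url, site_origin):
    """Return the set of same-origin paths that a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        absolute = urlsplit(urljoin(page_url, href))
        if f"{absolute.scheme}://{absolute.netloc}" == site_origin:
            # Drop fragments and trailing slashes so /guides/ and /guides match.
            links.add(absolute.path.rstrip("/") or "/")
    return links


def build_link_graph(pages, site_origin):
    """Map each page path to the internal paths it links to (no self-links)."""
    graph = {}
    for path, html in pages.items():
        targets = internal_links(html, site_origin + path, site_origin)
        targets.discard(path)
        graph[path] = targets
    return graph


# Hypothetical build output: path -> rendered HTML.
pages = {
    "/": '<a href="/guides/">Guides</a> <a href="https://other.example/x">Ext</a>',
    "/guides": '<a href="/">Home</a> <a href="/guides/orphans#fix">Fix</a>'
               ' <a href="mailto:hi@example.com">Mail</a>',
    "/guides/orphans": '<a href="#top">Top</a>',
}
graph = build_link_graph(pages, "https://example.com")
```

External links, `mailto:` links, and same-page fragment links are excluded, so the resulting graph contains only navigation between pages on the same site.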
What is the fix pattern?
Link from three places that make sense for the page's topic: (1) the most relevant section in your main navigation or a hub page; (2) a related-content block on at least one well-trafficked page (article footers, sidebar widgets); (3) a relevant breadcrumb or category page. Avoid the anti-pattern of "link from a junk page just to hit three" — engines penalise low-quality link sources, and the orphan-risk component is a proxy for reachability, not a counter to be gamed.
Are some pages supposed to be orphans?
A few. Legal pages (terms, privacy) often live only in the footer and are deliberately low-attention; they do not need rich internal-link surfaces. Off-brand pages (e.g. /vaillant-cooling-resistor on Promagen) are orphans by design. The general rule: if a page is intended to be cited by AI engines or surfaced in search, it should not be an orphan. If its purpose is legal compliance or off-brand isolation, orphan status is fine.
How do I tell orphan-by-bug from orphan-by-design?
Ask: did anyone intentionally decide this page should be orphan? If the answer is "no, it was just published without thinking about discovery", it is a bug. If the answer is "yes, we explicitly want this isolated for X reason", it is by design. Documentation matters here — Promagen's Vaillant page is documented as off-brand orphan in its source code (frontend/src/app/vaillant-cooling-resistor/page.tsx header comment). Document your intentional orphans so they do not get "fixed" by a well-meaning future engineer.
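One way to make that documentation machine-checkable is a small registry of intentional orphans, each with a stated reason, that an audit consults before reporting. This is a sketch under assumptions: the registry, the `triage_orphans` helper, and the /terms and /privacy paths are hypothetical, and Promagen itself records the decision in a source-file header comment rather than in a registry like this.

```python
# Hypothetical registry: page path -> documented reason for staying orphan.
INTENTIONAL_ORPHANS = {
    "/terms": "legal compliance, footer only",
    "/privacy": "legal compliance, footer only",
    "/vaillant-cooling-resistor": "off-brand isolation",
}


def triage_orphans(orphans, intentional=INTENTIONAL_ORPHANS):
    """Split detected orphans into bugs and documented by-design orphans.

    A registry entry with an empty reason is treated as undocumented,
    so the page is reported as a bug until someone writes the reason down.
    """
    by_design = {p: intentional[p] for p in orphans if intentional.get(p)}
    bugs = [p for p in orphans if p not in by_design]
    return bugs, by_design


detected = ["/guides/new-page", "/privacy", "/vaillant-cooling-resistor"]
bugs, by_design = triage_orphans(detected)
print(bugs)  # ['/guides/new-page']
```

Requiring a non-empty reason keeps the registry from becoming a silent ignore list: a page is exempt only when someone has written down why.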
How do I prevent new pages from becoming orphans?
Operational rule: the checklist for shipping any new page includes "linked from at least three relevant places." Build this into your PR template, your CMS publish flow, or your site-generator template. The cost of fixing orphan risk at publish time is a few minutes; the cost of catching it weeks later in a Sentinel audit is rework plus lost discovery time. Promagen Sentinel's weekly orphan-risk component catches drift; preventing it at publish is cheaper than chasing it.
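The publish-time rule could also be enforced automatically. The following is a minimal sketch of a gate that a CI job or publish hook might call, assuming the list of newly added pages and the site's link graph are supplied by your own tooling; `publish_gate` and the sample paths are hypothetical.

```python
def publish_gate(new_pages, outlinks, threshold=3):
    """Return one message per new page with too few inbound internal links."""
    failures = []
    for page in new_pages:
        # Distinct source pages that link to this page, excluding itself.
        sources = sorted(
            src for src, targets in outlinks.items()
            if page in targets and src != page
        )
        if len(sources) < threshold:
            failures.append(
                f"{page}: {len(sources)} inbound link(s) from {sources}, "
                f"need {threshold}"
            )
    return failures


# Hypothetical link graph after the change: source page -> linked pages.
outlinks = {
    "/": ["/guides", "/guides/launch"],
    "/guides": ["/guides/launch", "/guides/draft"],
    "/blog/news": ["/guides/launch"],
}
failures = publish_gate(["/guides/launch", "/guides/draft"], outlinks)
for message in failures:
    print(message)
```

In a CI job, a non-empty `failures` list would typically translate into a non-zero exit code so the change cannot merge until the links are added or the page is documented as an intentional orphan.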