
Why your site is invisible to ChatGPT
The five most common reasons — with a diagnostic test for each.

Most "ChatGPT can't see us" diagnoses come down to five structural blockers: WAF edge blocks, robots.txt rules, slow server response, missing schema, and orphan pages. Each one is verifiable in minutes; the fix order matters because each step is a precondition for the next. This page describes the five in order, with a diagnostic test you can run yourself.

By Martin Yarnold · Updated
Diagnose all five
Sentinel runs the five-layer diagnosis weekly — robots.txt, edge blocks, TTFB, schema, and orphan-page detection — and surfaces the failures by page.
See how Sentinel measures it →

The five reasons, in fix order

Each step is a precondition for the next. An engine cannot read your schema if its bot is blocked at the edge, cannot retrieve your slow page if the request times out, and cannot rank a page it cannot reach via internal links. Fix in order; do not skip.

1. robots.txt blocks GPTBot or OAI-SearchBot

Diagnose: Fetch /robots.txt and grep for GPTBot and OAI-SearchBot. If either has a Disallow on paths you care about, ChatGPT's autonomous crawlers cannot read them. ChatGPT-User is user-triggered, and OpenAI documents that robots.txt may not apply to user-initiated requests in the same way — so a Disallow aimed at ChatGPT-User specifically is not a reliable way to control autonomous crawling.
Fix: Remove the Disallow for GPTBot and OAI-SearchBot for the paths you want ChatGPT to be able to read. The most common cause is a stale "block all bots" rule from a pre-launch robots.txt that was never relaxed for production.
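The check above can be scripted with Python's standard-library robots.txt parser. A minimal sketch — the function name and sample rules are illustrative, not part of any vendor tooling:

```python
import urllib.robotparser

AI_CRAWLERS = ["GPTBot", "OAI-SearchBot"]

def blocked_paths(robots_txt: str, paths: list[str]) -> dict[str, list[str]]:
    """Return, per crawler, the paths this robots.txt disallows."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: [p for p in paths if not parser.can_fetch(bot, p)]
            for bot in AI_CRAWLERS}

# A stale pre-launch "block all bots" rule — the most common culprit:
stale_rules = "User-agent: *\nDisallow: /\n"
print(blocked_paths(stale_rules, ["/pricing", "/blog/guide"]))
# {'GPTBot': ['/pricing', '/blog/guide'], 'OAI-SearchBot': ['/pricing', '/blog/guide']}
```

Fetch the live file yourself and pass its text in; an empty list for each bot means the paths you care about are crawlable.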
2. Cloudflare / WAF blocks the bots before robots.txt

Diagnose: curl your key pages with the GPTBot user-agent. A 403, 429, or 503 from the edge means the WAF is blocking the bot before robots.txt is even fetched. This is the most common cause of "robots.txt looks fine but ChatGPT still can't see us."
Fix: Add explicit allow rules for GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot, and Bingbot in your WAF / firewall config. Most WAFs default-block "suspected bot" traffic; the AI crawlers self-identify clearly and should be allow-listed explicitly rather than left to default behaviour.
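The same edge test, sketched in Python rather than curl. The user-agent string is shortened for illustration (match OpenAI's full published token in production), and the helper names are ours:

```python
import urllib.error
import urllib.request

EDGE_BLOCK_STATUSES = {403, 429, 503}

def classify_status(status: int) -> str:
    """Interpret the status an edge/WAF returns to a bot-identified request."""
    if status in EDGE_BLOCK_STATUSES:
        return "edge-blocked"   # rejected before robots.txt is ever consulted
    if 200 <= status < 300:
        return "reachable"
    return "investigate"        # redirects, 404s, etc. need a closer look

def fetch_status_as(url: str, user_agent: str = "GPTBot") -> int:
    """Request a page while self-identifying as an AI crawler; return the status."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code         # urllib raises on 4xx/5xx; the code is what we want

print(classify_status(403))  # edge-blocked
```

Run fetch_status_as against your key pages; "edge-blocked" on any of them means the WAF rules need the explicit allows above.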
3. Server is too slow — bots time out

Diagnose: Measure TTFB on key pages. ChatGPT Search and Perplexity fetch on-demand at answer time; pages with TTFB slower than ~300ms get skipped more often than faster pages. The 300ms threshold is operational, not vendor-published — it is the practical break point beyond which retrieval-driven engines stop waiting reliably.
Fix: Standard server-performance work: caching, CDN coverage, server-side rendering optimisation. The specific change that helps most depends on the stack; the operational target is consistent sub-300ms TTFB on the pages you want cited.
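A rough TTFB probe over plain HTTP (wrap the socket with ssl for HTTPS); this is a sketch of the measurement only, with illustrative names, not a production monitor:

```python
import socket
import time
from urllib.parse import urlsplit

def ttfb_ms(url: str, timeout: float = 5.0) -> float:
    """Milliseconds from sending a GET until the server's first response byte."""
    parts = urlsplit(url)
    host, port = parts.hostname, parts.port or 80
    path = parts.path or "/"
    request = (f"GET {path} HTTP/1.1\r\nHost: {host}\r\n"
               "Connection: close\r\n\r\n").encode()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        start = time.perf_counter()
        sock.sendall(request)
        sock.recv(1)  # blocks until the first byte of the response arrives
        return (time.perf_counter() - start) * 1000

def meets_target(ttfb: float, target_ms: float = 300.0) -> bool:
    """Check against the operational sub-300ms target described above."""
    return ttfb < target_ms

print(meets_target(120.0), meets_target(850.0))  # True False
```

Measure several times across your key pages; the target is consistent sub-300ms, not a single lucky sample.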
4. Missing or invalid JSON-LD schema

Diagnose: View page source on key pages and look for application/ld+json script tags. Validate any present JSON-LD against a schema validator. ChatGPT can read pages without schema, but disambiguation is much harder — multiple pages about similar topics blur into one entity in the engine's view, making per-page citation less reliable.
Fix: At minimum: Organization (one node, referenced consistently), Article or WebPage on the page, BreadcrumbList for site structure. For commercial pages: Product, FAQPage with stable @id anchors. The aim is entity disambiguation, not maximum schema coverage.
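The first half of this diagnostic — pulling application/ld+json blocks out of a page and flagging ones that are not even valid JSON — can be scripted with the standard library (full schema validation still needs a validator). The class and function names are ours:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_ld = False
        self.blocks: list[str] = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld = True
            self.blocks.append("")
    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ld = False
    def handle_data(self, data):
        if self._in_ld:
            self.blocks[-1] += data

def jsonld_types(html: str) -> list:
    """Return the @type of each JSON-LD block, or 'INVALID' if it won't parse."""
    parser = JSONLDExtractor()
    parser.feed(html)
    types = []
    for block in parser.blocks:
        try:
            node = json.loads(block)
            types.append(node.get("@type", "?") if isinstance(node, dict) else "graph")
        except json.JSONDecodeError:
            types.append("INVALID")
    return types

page = '<script type="application/ld+json">{"@type": "Article"}</script>'
print(jsonld_types(page))  # ['Article']
```

An empty result on a key page means the page ships no schema at all; 'INVALID' means the block exists but will be ignored.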
5. Orphan pages with no internal links

Diagnose: Crawl your site and identify pages with zero inbound internal links. Cross-check against your sitemap — pages not in the sitemap and not linked from any discoverable page are effectively undiscoverable by autonomous crawling. Sentinel's orphan-risk component flags this weekly.
Fix: Add at least one inbound internal link from a discoverable page (homepage, hub page, or related editorial), plus a sitemap entry. Pages worth ranking are worth linking from somewhere; pages not worth linking from anywhere probably should not exist.
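Given a crawl of internal links and a sitemap URL list (both produced by whatever crawler you already use), the orphan check itself is a set difference. A hypothetical sketch:

```python
def orphan_pages(sitemap_urls: list[str],
                 internal_links: dict[str, set[str]]) -> list[str]:
    """Sitemap pages that no crawled page links to."""
    # Union of every link target found anywhere on the site
    linked = set().union(*internal_links.values()) if internal_links else set()
    return sorted(set(sitemap_urls) - linked)

crawl = {"/": {"/pricing", "/blog"}, "/blog": {"/blog/guide"}}
print(orphan_pages(["/pricing", "/blog/guide", "/old-landing"], crawl))
# ['/old-landing']
```

Anything this returns is in the sitemap but unreachable by link-following — exactly the pages autonomous crawlers deprioritise.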

Frequently asked questions

How do I check if my site is blocking ChatGPT's crawlers?

Fetch /robots.txt directly and grep for GPTBot and OAI-SearchBot. If either has a Disallow rule for the paths you care about, ChatGPT cannot include those pages in autonomous crawling. ChatGPT-User is a user-triggered fetcher; OpenAI's bot docs note that robots.txt rules may not apply to user-initiated requests in the same way as to autonomous crawling, so blocking ChatGPT-User in robots.txt is not a reliable way to prevent user-initiated fetches. The fix for the autonomous-crawler block is to remove the Disallow rule for GPTBot and OAI-SearchBot for any path you want ChatGPT to be able to read.

What if Cloudflare or a WAF is blocking the bots before robots.txt?

Many WAFs default-block traffic that looks like a bot, regardless of the bot's legitimacy. Run a curl with the GPTBot user-agent against your site's key pages and check the HTTP status. A 403 or 429 from your WAF means the bot is blocked at the edge, not at robots.txt. The fix is to add explicit allow rules for the major AI crawler user agents in your WAF config: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot, Bingbot at minimum. Edge blocks are the most common cause of "ChatGPT can't see us but our robots.txt looks fine" diagnoses.

How fast does my site need to be?

Sub-300ms time-to-first-byte is the practical target for retrieval-driven AI engines. ChatGPT Search and Perplexity both fetch on-demand at answer time; if your server takes 1s+ to respond, the engine often skips you and uses faster alternatives. Sub-300ms is not a vendor-published number; it is the operational threshold below which retrieval-driven engines reliably wait for a response across query types. Sites slower than that get retrieved less consistently for retrieval-augmented answers.

Which schema do I need for AI engines to disambiguate my pages?

At minimum: Organization on every page (one node, identified by @id, referenced by Article.publisher), Article or WebPage on the page itself, and BreadcrumbList for site structure. For commercial pages: Product (with Offer, AggregateRating, Brand) for product pages, FAQPage for FAQ sections with stable @id anchors, Person for author bylines on editorial content. The point is entity disambiguation — the engine needs enough structured data to identify what the page is about and how it connects to the rest of the site, not the maximum possible schema coverage.

Why are orphan pages a problem?

AI crawlers follow the same discovery path as search crawlers: sitemaps, robots.txt, and internal links. A page with no internal links and no sitemap entry is undiscoverable by autonomous crawling — the engine cannot reach it unless a user pastes the URL directly via a user-triggered fetcher (ChatGPT-User). Orphan pages on commercial sites are common because new pages get shipped without being linked from existing pages. The fix is to ensure every important page has at least one inbound internal link from a discoverable page, plus a sitemap entry. The Sentinel orphan-risk component flags this on a weekly cadence.

What should I fix first?

In order: (1) WAF / edge bot blocks — these zero out everything else; (2) robots.txt allows for GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot; (3) sub-300ms TTFB on key pages; (4) JSON-LD on every page Sentinel monitors; (5) internal links to orphan pages. The order matters because each step is a precondition for the next: an engine cannot read your schema if its bot is blocked at the edge, cannot wait for your slow page if the request times out, cannot rank a page it cannot reach via internal links. Skip steps and the diagnosis becomes ambiguous.

How do I test that ChatGPT can see my site after fixing?

Three layers: (1) server-log inspection — confirm GPTBot and OAI-SearchBot are appearing in your access logs with 200 responses for the paths you fixed; (2) in-product fetch test — paste your URL into ChatGPT and ask "summarise this page" — successful summarisation proves ChatGPT-User can reach and parse it; (3) citation queries — run 5-10 queries your buyers ask and check whether your domain appears in the cited sources. Layer 3 is the slowest signal because citation rate moves on weekly cadences; layers 1 and 2 give immediate feedback.
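Layer 1 can be scripted against combined-format access logs. The regex below is a simplification that assumes the bot token appears in the logged user-agent field, and the helper name is ours:

```python
import re

BOT_HIT = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
    r'.*?(?P<bot>GPTBot|OAI-SearchBot)'
)

def bot_hits(log_lines):
    """Yield (bot, path, status) for AI-crawler requests in access-log lines."""
    for line in log_lines:
        match = BOT_HIT.search(line)
        if match:
            yield match.group("bot"), match.group("path"), int(match.group("status"))

sample = ['1.2.3.4 - - [10/May/2026:12:00:00 +0000] '
          '"GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 GPTBot/1.1"']
print(list(bot_hits(sample)))  # [('GPTBot', '/pricing', 200)]
```

200s on the fixed paths confirm the blockers are gone; 403/429 entries mean the edge is still rejecting the bots.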

How does Sentinel help with this?

Sentinel runs the five-layer diagnosis weekly: robots.txt verification, edge-block detection, TTFB measurement, schema coverage, and orphan-page detection across the monitored site. Per-week, per-page, the report shows pass/fail on each layer plus the citation rate trend across the four major engines. The honest framing: Sentinel cannot make ChatGPT cite you, but it can prove the structural blockers are removed and surface citation-rate drift before it becomes a commercial problem.

Get a free Sentinel snapshot →

Bot user agents and robots.txt behaviour reference OpenAI's published crawler documentation as of 10 May 2026; the sub-300ms TTFB threshold is operational, not vendor-published. ChatGPT, GPTBot, OAI-SearchBot, ChatGPT-User are trademarks of OpenAI. Promagen Ltd is independent of OpenAI.
