Why your site is invisible to ChatGPT
The five most common reasons — with a diagnostic test for each.
Most "ChatGPT can't see us" diagnoses come down to five structural blockers: WAF edge blocks, robots.txt rules, slow server response, missing schema, and orphan pages. Each one is verifiable in minutes; the fix order matters because each step is a precondition for the next. This page describes the five in order, with a diagnostic test you can run yourself.
By Martin Yarnold · Updated
The five reasons, in fix order
Each step is a precondition for the next. An engine cannot read your schema if its bot is blocked at the edge, will not wait for a slow page once the request times out, and cannot rank a page it cannot reach via internal links. Fix in order; do not skip.
1. Cloudflare / WAF blocks the bots before robots.txt
2. robots.txt blocks GPTBot or OAI-SearchBot
3. Server is too slow — bots time out
4. Missing or invalid JSON-LD schema
5. Orphan pages with no internal links
Frequently asked questions
How do I check if my site is blocking ChatGPT's crawlers?
Fetch /robots.txt directly and grep for GPTBot and OAI-SearchBot. If either has a Disallow rule for the paths you care about, ChatGPT cannot include those pages in autonomous crawling. ChatGPT-User is a user-triggered fetcher; OpenAI's bot docs note that robots.txt rules may not apply to user-initiated requests in the same way as to autonomous crawling, so blocking ChatGPT-User in robots.txt is not a reliable way to prevent user-initiated fetches. The fix for the autonomous-crawler block is to remove the Disallow rule for GPTBot and OAI-SearchBot for any path you want ChatGPT to be able to read.
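A minimal sketch of that check using Python's standard-library robots.txt parser; the domain and paths below are placeholders to swap for your own.

```python
# Sketch: check whether robots.txt allows GPTBot and OAI-SearchBot on key paths.
# SITE and PATHS are placeholders; replace them with your own domain and pages.
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"      # placeholder domain
PATHS = ["/", "/pricing/", "/blog/"]  # placeholder: the paths you care about

parser = RobotFileParser(f"{SITE}/robots.txt")
parser.read()  # fetches and parses the live robots.txt

for bot in ("GPTBot", "OAI-SearchBot"):
    for path in PATHS:
        verdict = "allowed" if parser.can_fetch(bot, f"{SITE}{path}") else "DISALLOWED"
        print(f"{bot:15s} {path:12s} {verdict}")
```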
What if Cloudflare or a WAF is blocking the bots before robots.txt?
Many WAFs default-block traffic that looks like a bot, regardless of the bot's legitimacy. Run a curl with the GPTBot user-agent against your site's key pages and check the HTTP status. A 403 or 429 from your WAF means the bot is blocked at the edge, not at robots.txt. The fix is to add explicit allow rules for the major AI crawler user agents in your WAF config: GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot, Bingbot at minimum. Edge blocks are the most common cause of "ChatGPT can't see us but our robots.txt looks fine" diagnoses.
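The same test as the curl check, sketched in Python: send a request with a GPTBot-style User-Agent and read the status code. The URL is a placeholder, and the User-Agent below is a simplified stand-in; for a faithful test, paste in the full GPTBot string from OpenAI's bot documentation.

```python
# Sketch: probe a key page with a GPTBot-style User-Agent and report the status.
# URL is a placeholder; UA is a simplified stand-in for the full published string.
import urllib.error
import urllib.request

URL = "https://www.example.com/pricing/"   # placeholder: one of your key pages
UA = "GPTBot"                              # replace with the full UA from OpenAI's bot docs

req = urllib.request.Request(URL, headers={"User-Agent": UA})
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(resp.status)                 # 200: the edge lets the bot through
except urllib.error.HTTPError as err:
    print(err.code)                        # 403 or 429: blocked at the WAF/edge, not robots.txt
except urllib.error.URLError as err:
    print("request failed:", err.reason)   # timeouts or resets can also indicate edge filtering
```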
How fast does my site need to be?
Sub-300ms time-to-first-byte is the practical target for retrieval-driven AI engines. ChatGPT Search and Perplexity both fetch on demand at answer time; if your server takes a second or more to respond, the engine often skips you and uses faster alternatives. Sub-300ms is not a vendor-published number; it is the operational threshold below which retrieval-driven engines reliably wait for your response across query types. Sites slower than that get retrieved less consistently for retrieval-augmented answers.
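A rough way to measure this yourself, assuming a single uncached request is representative (a real check would sample several key pages over time). The URL is a placeholder.

```python
# Sketch: approximate time-to-first-byte for one page using the standard library.
# The figure includes DNS, TLS and connection time, which a crawler also pays.
import time
import urllib.request

URL = "https://www.example.com/"           # placeholder: a key page to test

start = time.perf_counter()
with urllib.request.urlopen(URL, timeout=10) as resp:
    resp.read(1)                           # force the first byte of the body
ttfb_ms = (time.perf_counter() - start) * 1000
print(f"approx TTFB: {ttfb_ms:.0f} ms (practical target: under 300 ms)")
```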
Which schema do I need for AI engines to disambiguate my pages?
At minimum: Organization on every page (one node, identified by @id, referenced by Article.publisher), Article or WebPage on the page itself, and BreadcrumbList for site structure. For commercial pages: Product (with Offer, AggregateRating, Brand) for product pages, FAQPage for FAQ sections with stable @id anchors, Person for author bylines on editorial content. The point is entity disambiguation — the engine needs enough structured data to identify what the page is about and how it connects to the rest of the site, not the maximum possible schema coverage.
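A minimal sketch of that baseline, generated here from a Python dict so the @id reference is easy to see. Every value is a placeholder; the shape is what matters: one Organization node, referenced from Article.publisher by @id, plus a BreadcrumbList.

```python
# Sketch: the minimum JSON-LD shape described above, with placeholder values.
# One Organization node, referenced (not duplicated) by the Article's publisher.
import json

ORG_ID = "https://www.example.com/#organization"   # placeholder: your canonical org @id

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Ltd",
            "url": "https://www.example.com/",
        },
        {
            "@type": "Article",
            "headline": "Example article headline",
            "publisher": {"@id": ORG_ID},          # reference to the single Organization node
        },
        {
            "@type": "BreadcrumbList",
            "itemListElement": [
                {"@type": "ListItem", "position": 1, "name": "Home",
                 "item": "https://www.example.com/"},
                {"@type": "ListItem", "position": 2, "name": "Guides",
                 "item": "https://www.example.com/guides/"},
            ],
        },
    ],
}

# Paste the output into a <script type="application/ld+json"> tag on the page.
print(json.dumps(graph, indent=2))
```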
Why are orphan pages a problem?
AI crawlers follow the same discovery path as search crawlers: sitemaps, robots.txt, and internal links. A page with no internal links and no sitemap entry is undiscoverable by autonomous crawling; the engine cannot reach it unless a user pastes the URL into ChatGPT directly, which fetches it via the user-triggered fetcher (ChatGPT-User). Orphan pages are common on commercial sites because new pages get shipped without being linked from existing pages. The fix is to ensure every important page has at least one inbound internal link from a discoverable page, plus a sitemap entry. The Sentinel orphan-risk component flags this on a weekly cadence.
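A crude version of that check, assuming your sitemap lists the pages that matter and that the hub pages listed below are where internal links should come from. It is a sketch, not a full crawl, and it assumes a plain sitemap rather than a sitemap index.

```python
# Sketch: flag sitemap URLs that none of the hub pages link to.
# SITE and HUB_PAGES are placeholders; href extraction is deliberately crude.
import re
import urllib.request
from xml.etree import ElementTree

SITE = "https://www.example.com"            # placeholder domain
HUB_PAGES = ["/", "/blog/", "/products/"]   # placeholder: pages that should link out

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

# URLs declared in the sitemap (parsed from bytes so the XML encoding header is honoured)
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ElementTree.fromstring(fetch(f"{SITE}/sitemap.xml"))
sitemap_urls = {loc.text.strip() for loc in tree.findall(".//sm:loc", ns)}

# hrefs found on the hub pages
linked = set()
for path in HUB_PAGES:
    html = fetch(f"{SITE}{path}").decode("utf-8", "replace")
    for href in re.findall(r'href="([^"]+)"', html):
        linked.add(href if href.startswith("http") else SITE + href)

for url in sorted(sitemap_urls - linked):
    print("not linked from any hub page:", url)
```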
What should I fix first?
In order: (1) WAF / edge bot blocks — these zero out everything else; (2) robots.txt allows for GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot; (3) sub-300ms TTFB on key pages; (4) JSON-LD on every page Sentinel monitors; (5) internal links to orphan pages. The order matters because each step is a precondition for the next: an engine cannot read your schema if its bot is blocked at the edge, will not wait for a slow page once the request times out, and cannot rank a page it cannot reach via internal links. Skip steps and the diagnosis becomes ambiguous.
How do I test that ChatGPT can see my site after fixing?
Three layers: (1) server-log inspection — confirm GPTBot and OAI-SearchBot are appearing in your access logs with 200 responses for the paths you fixed; (2) in-product fetch test — paste your URL into ChatGPT and ask "summarise this page" — successful summarisation proves ChatGPT-User can reach and parse it; (3) citation queries — run 5-10 queries your buyers ask and check whether your domain appears in the cited sources. Layer 3 is the slowest signal because citation rate moves on weekly cadences; layers 1 and 2 give immediate feedback.
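For layer 1, a small sketch that counts bot hits and status codes in an access log. The log path and the combined log format are assumptions, so adjust the status-code extraction to whatever your server writes.

```python
# Sketch: count GPTBot / OAI-SearchBot requests per status code in an access log.
# LOG_PATH is a placeholder and the status regex assumes the combined log format.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"      # placeholder: your access log
BOTS = ("GPTBot", "OAI-SearchBot")

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in BOTS:
            if bot in line:
                status = re.search(r'" (\d{3}) ', line)   # status follows the quoted request
                if status:
                    hits[(bot, status.group(1))] += 1

for (bot, code), count in sorted(hits.items()):
    print(f"{bot:15s} {code}  x{count}")
```

Note that requests blocked at the CDN edge never reach the origin log, so an empty result here can mean either no crawls or an edge block; cross-check against the WAF's own logs.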
How does Sentinel help with this?
Sentinel runs the five-layer diagnosis weekly: robots.txt verification, edge-block detection, TTFB measurement, schema coverage, and orphan-page detection across the monitored site. For each week and each monitored page, the report shows pass/fail on each layer plus the citation-rate trend across the four major engines. The honest framing: Sentinel cannot make ChatGPT cite you, but it can prove the structural blockers are removed and surface citation-rate drift before it becomes a commercial problem.