How to test if ChatGPT can read your site

Three layers. 30 seconds to weekly.

Three tests separate "ChatGPT does not know you exist" from "ChatGPT knows you but is not citing you." Run them in order: cheap log grep first, in-product fetch test second, weekly citation query measurement third. This page covers each test with concrete examples and what the result actually means.

By Martin Yarnold · Updated
Automated tracking
Sentinel runs layers 1 and 2 every Monday and provides the structure for layer 3 against your fixed query set.

Three tests in order of cost

| Test | What you do | What the result means |
| --- | --- | --- |
| 1. Log grep (5 min) | grep server logs for GPTBot, OAI-SearchBot, ChatGPT-User over the last 30 days. | Frequent 200s. Investigate any 5xx, timeout, 4xx, or zero hits. |
| 2. In-product fetch (30 sec) | Paste URL into ChatGPT Search, ask for summary with citation. | Accurate summary with URL cited. Wrong summary = parse issue. "Cannot access" = blocked. |
| 3. Citation queries (weekly) | Run a fixed 10–30 query set; record whether your domain is cited per query. | Citation rate per query as a time series. Trend matters more than absolute count. |

Frequently asked questions

What is the fastest test that tells me something useful?

Open ChatGPT with web search enabled, paste your URL, and ask "fetch this URL and summarise what the page is about, citing the page where appropriate." If ChatGPT returns an accurate summary with the URL cited as a source, the page is reachable today. If it says it cannot access the page, returns a wrong paraphrase, or summarises stale content, you have a reachability or parsing problem. This takes 30 seconds and is the highest-signal single test.

Which ChatGPT mode should I use for the test?

ChatGPT Search (web access enabled, not pure model knowledge). The distinction matters: pure model-knowledge responses can summarise from training data without fetching the page live, so "ChatGPT knows about my page" is not the same as "ChatGPT can fetch my page right now." Force the live-fetch path by pasting the URL with an explicit instruction to read the URL and summarise.

How do I grep server logs for OpenAI bots?

Three user agents to look for: GPTBot, OAI-SearchBot, ChatGPT-User. A simple grep pattern like `grep -E "GPTBot|OAI-SearchBot|ChatGPT-User" /var/log/nginx/access.log` surfaces relevant hits. On hosts that rotate logs, that file usually covers only the most recent period, so also search the rotated files (for example with `zgrep` over the compressed ones) to cover the full 30 days. Look at the date range, frequency, and response codes. Frequent 200s = healthy crawl. Frequent 5xx or timeouts = your server is silently failing AI bots. Zero hits = robots.txt block, CDN/WAF rule, or low priority from the engine.
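To go one step beyond the grep and break hits down by bot and response class, a minimal Python sketch follows. It assumes the nginx/Apache "combined" log format; the function name `tally_bot_hits`, the regular expression, and the sample log lines (including their simplified user-agent strings and documentation-range IPs) are illustrative assumptions, not real traffic.

```python
import re
from collections import Counter, defaultdict

# The three OpenAI user agents named above.
BOTS = ("GPTBot", "OAI-SearchBot", "ChatGPT-User")

# Assumption: "combined" log format, where the status code follows the
# quoted request line and the user agent is the last quoted field.
LINE_RE = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')


def tally_bot_hits(lines):
    """Return {bot: {status_class: hits}} for log lines that match a bot."""
    tally = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not in the assumed format; skip rather than guess
        ua = match.group("ua")
        for bot in BOTS:
            if bot in ua:
                status_class = match.group("status")[0] + "xx"
                tally[bot][status_class] += 1
                break
    return {bot: dict(counts) for bot, counts in tally.items()}


# Invented sample lines for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [03/Mar/2025:06:25:11 +0000] "GET / HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [03/Mar/2025:06:25:14 +0000] "GET /pricing HTTP/1.1" 200 811 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [03/Mar/2025:07:01:02 +0000] "GET /docs HTTP/1.1" 503 0 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
    '198.51.100.7 - - [03/Mar/2025:07:05:40 +0000] "GET / HTTP/1.1" 200 5123 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"',
]

if __name__ == "__main__":
    hits = tally_bot_hits(SAMPLE_LINES)
    for bot in BOTS:
        print(bot, hits.get(bot, "zero hits"))
```

On the sample lines this reports two 2xx hits for GPTBot, one 5xx hit for OAI-SearchBot, and zero hits for ChatGPT-User. To run it on a real log, pass an open file object instead of `SAMPLE_LINES`.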

How do I run the citation test against multiple queries?

Pick 10–30 queries your most important buyers ask. Examples for B2B: "best [your category] in [your geography]", "[competitor] alternatives", "how to [problem your product solves]". Run each against ChatGPT Search and record whether your domain appears in the cited sources. Repeat weekly. The output is a citation-rate-per-query time series specific to ChatGPT. Promagen Sentinel automates this for a fixed query set on Monday mornings.
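The weekly record can be kept as plain rows of (week, query, cited) and reduced to a time series. The sketch below is one way to do that in Python, under stated assumptions: the function names, the query labels, and the sample observations are all hypothetical, and the cited/not-cited values are entered by hand after each run in ChatGPT Search.

```python
from collections import defaultdict


def weekly_citation_rate(observations):
    """Share of queries in which the domain was cited, per week."""
    cited = defaultdict(int)
    total = defaultdict(int)
    for week, _query, was_cited in observations:
        total[week] += 1
        cited[week] += int(was_cited)
    return {week: cited[week] / total[week] for week in sorted(total)}


def per_query_series(observations):
    """Per-query history: {query: [(week, was_cited), ...]} in week order."""
    series = defaultdict(list)
    for week, query, was_cited in sorted(observations):
        series[query].append((week, was_cited))
    return dict(series)


# Invented observations: (ISO week-start date, query label, domain cited?)
OBSERVATIONS = [
    ("2025-03-03", "query A", False),
    ("2025-03-03", "query B", True),
    ("2025-03-03", "query C", False),
    ("2025-03-03", "query D", False),
    ("2025-03-10", "query A", True),
    ("2025-03-10", "query B", True),
    ("2025-03-10", "query C", False),
    ("2025-03-10", "query D", False),
]

if __name__ == "__main__":
    print(weekly_citation_rate(OBSERVATIONS))
    print(per_query_series(OBSERVATIONS)["query A"])
```

Keeping the query set fixed is what makes the weekly numbers comparable; adding or dropping queries mid-series changes the denominator and hides the trend.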

What if the log and fetch tests pass but I still don't get citations?

Reachability and citation eligibility are necessary but not sufficient. A page can be reachable and parse correctly yet still lose at the retrieval-ranking stage. Common causes: thin or off-topic content for the query intent, weaker authority than competitors for the topic, missing freshness signals (such as stale pages with no updated date), or simply lower priority in the engine's internal ranking, which you cannot directly observe. At that point the fix is content depth and topic authority, not technical hygiene.

Do these tests work for Claude, Perplexity, Gemini too?

Yes. The same three layers apply, with different log-visible bots and answer surfaces. Log-visible HTTP user agents to grep for: Claude (ClaudeBot, Claude-User, Claude-SearchBot), Perplexity (PerplexityBot, Perplexity-User), and, on the Gemini side, Googlebot as the crawler. Google-Extended is a robots.txt usage-control token, not a log-visible user agent, so it is audited by reviewing robots.txt rather than logs. Answer surfaces: Claude.ai with web access, Perplexity default mode, the Gemini app, and AI Overviews / AI Mode in Google Search. The methodology is engine-agnostic; per-engine test details are documented in /can-ai-engines-find-my-website.
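Because some of these tokens only show up in robots.txt, a quick robots.txt review can be scripted alongside the log check. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you have already downloaded. The `ENGINE_TOKENS` grouping, the `blocked_tokens` helper, and the sample file are assumptions for illustration, and Python's matching rules may not be identical to how each engine interprets robots.txt, so treat the output as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this page, grouped by engine (grouping is illustrative).
ENGINE_TOKENS = {
    "ChatGPT": ["GPTBot", "OAI-SearchBot", "ChatGPT-User"],
    "Claude": ["ClaudeBot", "Claude-User", "Claude-SearchBot"],
    "Perplexity": ["PerplexityBot", "Perplexity-User"],
    "Gemini-side": ["Googlebot", "Google-Extended"],
}


def blocked_tokens(robots_txt, url="/"):
    """Return {engine: [tokens disallowed for url]} per Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = {}
    for engine, tokens in ENGINE_TOKENS.items():
        denied = [t for t in tokens if not parser.can_fetch(t, url)]
        if denied:
            blocked[engine] = denied
    return blocked


# Invented robots.txt content for illustration only.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_tokens(SAMPLE_ROBOTS))
```

For the sample file, only GPTBot and Google-Extended are reported as blocked; every other token falls through to the `User-agent: *` group and is allowed.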

OpenAI bot names (GPTBot, OAI-SearchBot, ChatGPT-User) reference platform.openai.com/docs/bots. Test methodology describes operational behaviour observable in ChatGPT's product UI; no claim is made about ChatGPT's internal retrieval or ranking algorithms. ChatGPT is a trademark of OpenAI. Promagen Ltd is independent of OpenAI.

provenance: sha256:56eb5c0094b4b7bd