Can AI engines find my website? A three-layer practical check.

AI visibility starts with crawl reachability. Before you optimise content, schema, or anything else, prove the engines can actually reach your pages. This guide gives a three-layer test you can run today (server logs, in-product fetch, citation queries) plus a per-engine table of the documented bot names and answer-surface tests.

By Martin Yarnold · Updated
Reachability, weekly
Sentinel runs the log-side and structural reachability checks every Monday and flags any engine that stopped reaching your pages.
See how Sentinel measures it →

Per-engine test methods

Bot names below are sourced from each vendor's published crawler documentation. Answer-surface tests describe operational behaviour observable in each engine's UI; engines do not publish their citation algorithms, so these are practical tests against output, not claims about internal ranking.

| Engine | Bots in logs | Answer-surface test |
| --- | --- | --- |
| ChatGPT | GPTBot, OAI-SearchBot, ChatGPT-User | Web-search mode: paste URL, ask for summary. Check for source attribution in answer. |
| Claude | ClaudeBot, Claude-User, Claude-SearchBot | Enable web access (claude.ai). Paste URL, ask for summary. Look for inline citation. |
| Perplexity | PerplexityBot, Perplexity-User | Default mode: query targeted at content. Numbered citations appear inline; check for URL. |
| Gemini | Googlebot (no separate Gemini UA) | Gemini app: ask about content. Check for source card. Google-Extended is a robots.txt control for Gemini training/grounding, not a log-visible bot. |
| Apple Intelligence | Applebot (no separate Apple-Intelligence UA) | Applebot-Extended is a robots.txt control for Apple Intelligence usage, not a log-visible bot. |
| Bing / Copilot | Bingbot | Bing search → "Copilot Search". Sources panel shows cited URLs. |

Three layers, in order of cost

Layer 1 — Server logs

Grep your access logs for the published bot user agents. Presence proves crawl. Absence is diagnostic — check robots.txt, CDN, WAF.
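The grep above can be sketched in Python for logs in the common Apache/Nginx combined format. This is a minimal sketch, not a hardened log parser: it assumes the user agent is the last double-quoted field on each line, and the bot list is the one from the per-engine table above.

```python
import re
from collections import Counter

# Log-visible bot names from the per-engine table above; matched
# case-insensitively as substrings of the user-agent field.
AI_BOTS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Applebot", "Bingbot", "CCBot", "Googlebot",
]

def count_ai_bot_hits(log_lines):
    """Count hits per AI bot across combined-format access log lines."""
    hits = Counter()
    for line in log_lines:
        # In combined format the user agent is the last quoted field.
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        ua = quoted[-1]
        for bot in AI_BOTS:
            if bot.lower() in ua.lower():
                hits[bot] += 1
    return hits
```

A nonzero count for a bot proves crawl for that engine; a zero count feeds straight into the diagnostic checklist in the FAQ below.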

Layer 2 — In-product fetch

Paste a URL into each engine in web-search mode and ask for a summary. Accurate summary = reachable. Wrong summary = parsing issue. "Cannot access" = blocked.

Layer 3 — Citation queries

Ask the engine targeted questions about your domain/product. Cited = engine has indexed and trusts the page. Not cited = page reachable but not retrieval-ranked.

Frequently asked questions

How do I know whether an AI engine has found my site at all?

Three layers, in order of cost.

1. Server logs: check for user-agent strings matching the engines' published log-visible bot names (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Applebot, Bingbot, CCBot, Googlebot). Presence in logs proves crawl. Robots.txt usage-control tokens, Google-Extended (Gemini training/grounding) and Applebot-Extended (Apple Intelligence usage), are NOT log-visible UAs and should be audited separately via robots.txt review.
2. In-product fetch: ask the engine in its own product UI. Paste a URL and ask "what is this page about". If it summarises the actual content, it parsed the page successfully.
3. Citation queries: run targeted queries about your brand and product and see whether the engine cites your domain.

All three together form a coverage check that survives a single engine misbehaving.

Which exact bot user agents should I look for in server logs?

For ingestion testing: GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-User, Claude-SearchBot (Anthropic); PerplexityBot, Perplexity-User (Perplexity); Applebot (Apple); Bingbot (Bing search + Copilot retrieval); CCBot (Common Crawl, used by many AI labs); Googlebot (Google Search indexing — and the same UA underlies AI features in Search). Note: Google-Extended and Applebot-Extended are NOT log-visible user agents — they are robots.txt usage-control tokens with no HTTP fetcher of their own. Looking for them in access logs returns zero hits even when the corresponding AI usage is allowed.
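Because the usage-control tokens never show up in logs, the robots.txt side of the audit has to be done against the file itself. The sketch below uses Python's standard `urllib.robotparser`, which treats the tokens like any other user-agent group, to report what a given robots.txt allows for each agent; the agent lists are illustrative subsets, not the full set of names above.

```python
from urllib.robotparser import RobotFileParser

# HTTP-fetching crawler UAs (log-visible) and robots.txt-only
# usage-control tokens (never log-visible) — illustrative subsets.
FETCHER_UAS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Applebot", "Bingbot"]
USAGE_TOKENS = ["Google-Extended", "Applebot-Extended"]

def audit_robots(robots_txt, path="/"):
    """Return {agent: allowed?} for a robots.txt body and a path."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return {ua: rp.can_fetch(ua, path)
            for ua in FETCHER_UAS + USAGE_TOKENS}
```

Run it against your live robots.txt body and a representative content path; a `False` for a fetcher UA explains empty logs, while a `False` for a usage-control token only restricts how the vendor may use the content, not whether its crawler appears in your logs.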

I see zero AI bot hits in my logs. What's wrong?

Five common causes. (1) robots.txt blocks the bot — most common, easy fix. (2) CDN or WAF rate-limits or fingerprints AI bots — check Cloudflare, Vercel firewall, AWS WAF rules. (3) Server returns 5xx or times out for AI crawlers specifically (some bots are less patient than Googlebot — 5–10 second hard timeout). (4) The bot has not yet discovered your domain — check whether you have any inbound links from sites the engine already crawls. (5) The engine genuinely does not crawl small-to-medium sites at high frequency — log volume of 0 is occasionally just low priority.
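Causes (2) and (3) can be narrowed down by fetching a page yourself while presenting an AI-bot user agent. The sketch below is a rough probe, not a definitive test: the UA string shown is illustrative and should be copied from the vendor's crawler docs, and a WAF that fingerprints on IP range or TLS can still block the real bot while passing this probe.

```python
import urllib.request
import urllib.error

def classify_status(status):
    """Map an HTTP status to a likely cause from the checklist above."""
    if status in (401, 403, 429):
        return "blocked"       # cause (1) or (2): robots/WAF/rate limit
    if 500 <= status < 600:
        return "server error"  # cause (3): 5xx for AI crawlers
    if 200 <= status < 300:
        return "reachable"
    return "inconclusive"

def probe_as_bot(url, ua="Mozilla/5.0 (compatible; GPTBot/1.1)", timeout=10):
    """Fetch a URL with a bot-like user agent and classify the response.

    The UA value is illustrative — take the exact string from the
    vendor's published crawler documentation before relying on it.
    """
    req = urllib.request.Request(url, headers={"User-Agent": ua})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as e:
        return classify_status(e.code)
    except urllib.error.URLError:
        return "unreachable"   # cause (3): timeout / connection failure
```

Treat "reachable" as necessary but not sufficient: it rules out a UA-based block on that path, nothing more.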

How do I test whether ChatGPT can see a specific page right now?

Open ChatGPT, ensure web search is enabled, paste the full URL, and ask "fetch this URL and summarise what the page is about, citing the page where appropriate." If the engine produces an accurate summary with the URL cited, the page is reachable. If it paraphrases incorrectly or says it cannot access the page, the page is invisible to ChatGPT in that moment. Repeat for the other engines using their respective web-search modes. None of this is officially documented retrieval behaviour — it is an operational test that reflects whatever the engine actually does at query time.

How often should I run these checks?

Monthly at minimum, weekly if AI citations are part of your revenue model. AI engines re-index on their own cadence, change retrieval ranking quietly, and run training cuts that drop sources without warning. A page that passed all three checks in January can fail all three in April. Promagen Sentinel automates the log-side and structural checks on a weekly cadence; the query-side check is harder to automate because each engine's UI changes frequently and the answer surface drifts.

Which AI engine is easiest to verify against?

Perplexity, by design. Perplexity's answer interface shows numbered source citations on every answer, so a "did Perplexity find my page" test is unambiguous — either the URL appears as a numbered source or it does not. ChatGPT (in Search mode) shows source attribution but inconsistently. Claude shows citations when invoked with retrieval. Gemini cites in AI Overviews more visibly than in the standalone Gemini app. Start with Perplexity for the clearest signal, then extend to the others.

Get a free Sentinel snapshot →

Bot names are sourced from each vendor's published crawler documentation. Answer-surface test methods describe observable behaviour and do not claim any engine's internal retrieval algorithm. ChatGPT, Claude, Perplexity, Gemini, Bing, Copilot are trademarks of their respective owners. Promagen Ltd is independent of these companies.
