{"pageUrl":"https://promagen.com/ai-crawler-behaviour-benchmark-2026-q2","lastModified":"2026-05-10","provenanceHash":"sha256:ec288a2a1d4a0cb133bf5d42c2795ceff999d85daf36c14a038b9876b5f20f1e","provenanceNote":"Bot user agents and documented robots.txt behaviour are sourced from each vendor's published crawler documentation as of 2026-05-10. Q2 2026 observations describe Sentinel's server-log measurements across a fixed monitoring set; observed crawl patterns are not vendor-confirmed contracts. The autonomous-crawler vs user-triggered-fetcher distinction follows OpenAI and Perplexity's own bot documentation. The robots.txt-only token category (Google-Extended, Applebot-Extended) follows Google and Apple's own documentation.","claims":[{"id":"claim-twelve-named-bots","statement":"Twelve named AI crawler user agents are documented across the four major AI engines plus Microsoft, Apple, and Common Crawl: GPTBot, OAI-SearchBot, ChatGPT-User (OpenAI); ClaudeBot, Claude-User, Claude-SearchBot (Anthropic); PerplexityBot, Perplexity-User (Perplexity); Googlebot (Google); Applebot (Apple); Bingbot (Microsoft); CCBot (Common Crawl). Each is documented at its vendor's public crawler docs.","evidenceUrl":"https://promagen.com/robots.txt","lastVerified":"2026-05-10","hash":"sha256:73f8fbde52e1118472b11356b73b9436952b22a4a1928b5975e075e9bba45d0b"},{"id":"claim-autonomous-vs-user-triggered","statement":"Autonomous crawlers (GPTBot, OAI-SearchBot, ClaudeBot, Claude-SearchBot, PerplexityBot, Googlebot, Applebot, Bingbot, CCBot) honour robots.txt allow/disallow rules per their vendor docs. User-triggered fetchers (ChatGPT-User, Claude-User, Perplexity-User) are documented as user-initiated; OpenAI's bot documentation calls out that robots.txt rules may not apply to user-initiated requests in the same way as to autonomous crawling, and Perplexity has stated Perplexity-User generally does not treat robots.txt as binding because the fetch is user-initiated.","evidenceUrl":"https://platform.openai.com/docs/bots","lastVerified":"2026-05-10","hash":"sha256:03595bff8410ce098dbb2e7953203eb061f40fe24fca31869b1540ab5470c013"},{"id":"claim-robots-txt-tokens-not-log-visible","statement":"Google-Extended and Applebot-Extended are robots.txt usage-control tokens, not log-visible HTTP user agents. Google documents Google-Extended as a control for whether crawled content may be used for Gemini Apps / Vertex AI Gemini training and grounding (the crawling itself is performed by Googlebot). Apple documents Applebot-Extended as a control for whether Apple may use Applebot-crawled data for Apple Intelligence (the crawling itself is performed by Applebot). Looking for either string in access logs returns zero hits even when the corresponding AI usage is allowed.","evidenceUrl":"https://developers.google.com/crawling/docs/crawlers-fetchers/google-common-crawlers","lastVerified":"2026-05-10","hash":"sha256:2f7bed8c0c183706fb90972712277596a01488af9036bbb27789905ce25d69a9"},{"id":"claim-google-extended-not-ai-overviews-control","statement":"Disallowing Google-Extended does not remove pages from Google Search, and does not by itself remove pages from AI Overviews / AI Mode in Search. AI Overviews and AI Mode are Search features controlled separately via Googlebot plus preview controls (nosnippet, data-nosnippet, max-snippet, noindex). The most common operator misconception in Q2 2026 remained treating Google-Extended as the AI Overviews kill switch.","evidenceUrl":"https://developers.google.com/search/docs/appearance/ai-features","lastVerified":"2026-05-10","hash":"sha256:0e5fd94031695796def0836434df9f327a626ebee7b2219156ff85c05166914b"},{"id":"claim-q2-observations-not-vendor-confirmed","statement":"Q2 2026 observed crawl patterns (hit rates, frequency shifts, ramp of OAI-SearchBot relative to GPTBot, multi-quarter slowdown of CCBot) are Sentinel's server-log measurements against a fixed monitoring set during April–June 2026. They are not vendor-confirmed contracts.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:c8fa67f4bb8a9c7f0930680b1d2128c31d8cfe4bb252c0383c11bdfdf6797ced"}]}