{"pageUrl":"https://promagen.com/what-is-an-ai-crawler","lastModified":"2026-05-10","provenanceHash":"sha256:e125f466f7493871c6c96b29658323567e7efb1c4beed76c12ef70ded7f1d162","provenanceNote":"AI crawler list is sourced from Promagen's own robots.txt configuration at frontend/src/app/robots.txt/route.ts. The listed fetcher bots and robots.txt usage-control tokens are explicitly named in the production robots.txt as of 2026-05-10.","claims":[{"id":"claim-bot-distinctions","statement":"AI engine companies operate multiple distinct crawler bots — e.g. OpenAI runs GPTBot for training, ChatGPT-User for user-triggered fetches, and OAI-SearchBot for ChatGPT Search citations. They must be controlled independently in robots.txt.","evidenceUrl":"https://promagen.com/robots.txt","lastVerified":"2026-05-10","hash":"sha256:47bf27ba3a924175825c3f3879727d4ccbf9b2ee81c6b00c72bd9730c131d180"},{"id":"claim-google-extended-distinct","statement":"Google-Extended is a robots.txt usage-control token (not an HTTP fetcher) that opts a site out of Gemini Apps generative training and Vertex AI grounding. It does NOT control Google Search's AI Overviews or AI Mode — those are governed by Googlebot plus snippet preview controls (nosnippet, max-snippet, data-nosnippet). Disallowing Google-Extended does not remove pages from Google Search.","evidenceUrl":"https://developers.google.com/search/docs/crawling-indexing/google-extended","lastVerified":"2026-05-10","hash":"sha256:11071a4273e203386d26e61354bf6a743962b78cf143b2e86ce8214235fbdc72"},{"id":"claim-applebot-extended-distinct","statement":"Applebot-Extended is a robots.txt usage-control token (not an HTTP fetcher) that opts a site out of Apple Intelligence training. Disallowing Applebot-Extended does not block Applebot search and Siri crawling, which uses the separate Applebot user agent.","evidenceUrl":"https://support.apple.com/en-us/119829","lastVerified":"2026-05-10","hash":"sha256:22395ecbe6b4573a9087ba001d563afc477ef57bf10f1e8475a531425f9170ae"},{"id":"claim-blocking-training-costs-visibility","statement":"Blocking AI training crawlers (GPTBot, ClaudeBot, CCBot) removes site content from future model knowledge and from future AI engine answers about the site's domain. This is the opposite of AI visibility.","evidenceUrl":"https://promagen.com/sentinel/weekly","lastVerified":"2026-05-10","hash":"sha256:e66296cd08de42120f5e673e3d766d7e6bcfb64231e8fdd019605dfc4447b570"}]}