{"pageUrl":"https://promagen.com/which-ai-engines-respect-robots-txt","lastModified":"2026-05-10","provenanceHash":"sha256:feab70ed427b9a798bd1b07209221a922bd920d3ec228fbce4d50598912d919b","provenanceNote":"Bot names, vendor mapping, and robots.txt compliance status are sourced from each vendor's published crawler documentation. The two-bucket distinction (log-visible HTTP crawler UAs vs robots.txt usage-control tokens) follows Apple and Google's own documentation: Applebot-Extended and Google-Extended carry no HTTP fetcher of their own. Verify against current vendor docs before quoting in compliance or legal materials.","claims":[{"id":"claim-openai-autonomous-bots-robots","statement":"OpenAI publicly documents GPTBot and OAI-SearchBot as autonomous crawlers (training and ChatGPT Search retrieval respectively) that honour robots.txt allow/disallow rules.","evidenceUrl":"https://platform.openai.com/docs/bots","lastVerified":"2026-05-10","hash":"sha256:e6452070833c8ad1edc947cd263261b674fd456e2f25efe4ec85589fcf910d5b"},{"id":"claim-openai-chatgpt-user-fetcher","statement":"OpenAI publicly documents ChatGPT-User as a user-triggered fetcher invoked by user action in ChatGPT. OpenAI's bot documentation calls out that robots.txt rules may not apply to user-initiated requests in the same way as to autonomous crawling.","evidenceUrl":"https://platform.openai.com/docs/bots","lastVerified":"2026-05-10","hash":"sha256:1bb0d1b47e0d4be10f2d2e6ebfde5a8a4943581db718da67b6a42aec190ac2b4"},{"id":"claim-perplexity-bot-vs-user","statement":"Perplexity publicly documents PerplexityBot as its autonomous citation indexer, controlled by robots.txt allow/disallow directives. Perplexity-User is a user-triggered fetcher; Perplexity has stated that Perplexity-User generally does not treat robots.txt as binding because the request is user-initiated.","evidenceUrl":"https://docs.perplexity.ai/guides/bots","lastVerified":"2026-05-10","hash":"sha256:a9ee5328d89cdd2d1bfab490e1805b732832a893df108788038b85980f52e8d6"},{"id":"claim-google-extended-scope","statement":"Google documents Google-Extended as a robots.txt usage-control token (not a log-visible user agent) that controls whether crawled content may be used for Gemini Apps and Vertex AI Gemini training and grounding. The crawling itself is performed by Googlebot. Disallowing Google-Extended does not remove pages from Google Search; AI Overviews and AI Mode in Search are controlled separately via Googlebot plus preview controls such as nosnippet, data-nosnippet, max-snippet, and noindex.","evidenceUrl":"https://developers.google.com/crawling/docs/crawlers-fetchers/google-common-crawlers","lastVerified":"2026-05-10","hash":"sha256:197f611e17427390f96ef2d51dfc8c532cfec85c4ef8ad0557952778edadcaeb"},{"id":"claim-anthropic-bots-robots","statement":"Anthropic publicly documents ClaudeBot, Claude-User, and Claude-SearchBot as the named user agents that honour robots.txt, with site owners able to control crawler access via standard robots.txt directives.","evidenceUrl":"https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler","lastVerified":"2026-05-10","hash":"sha256:03f6c4f361432427ab57fe351c03d75fd2b8f93c6805a9aec62cb89a87168b62"},{"id":"claim-applebot-extended-scope","statement":"Apple documents Applebot-Extended as a robots.txt-only directive (not a log-visible user agent) that controls whether Apple may use data crawled by Applebot for Apple Intelligence and generative-AI training. The crawling itself is performed by Applebot; Applebot-Extended carries no HTTP fetcher of its own.","evidenceUrl":"https://support.apple.com/en-us/119829","lastVerified":"2026-05-10","hash":"sha256:26e4d7cbc00038226a8feb5369b6294f5addf47212db7b76d7d0164848a2a968"}]}