What is llms.txt?
The proposed machine-readable summary AI engines can ingest if they support the format.
llms.txt is a markdown file at site root that publishes a curated, machine-readable summary AI engines can ingest if they choose to support the format. This page explains the spec, gives an honest read of vendor-support status (largely unconfirmed at the major engines as of 2026), and shows where llms.txt fits next to robots.txt and sitemap.xml — without overstating what it can do for visibility on its own.
By Martin Yarnold · Updated

llms.txt next to robots.txt and sitemap.xml
The three artefacts solve different problems. A site that takes AI visibility seriously publishes all three. None replaces the others.
| Artefact | Audience | Contents | Cadence |
|---|---|---|---|
| robots.txt | All crawlers | Allow/Disallow rules, sitemap pointer, user-agent rules. | Changes rarely. |
| sitemap.xml | Search engines | Complete URL inventory with lastmod, priority, changefreq. | Regenerated on every deploy. |
| llms.txt | AI engines | Curated site summary, citation targets, pricing, contact, key pages. | Updated when site identity or top pages change. |
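For concreteness, a sketch of how the three artefacts relate on disk. Only robots.txt cross-references another file (via the Sitemap directive); llms.txt is found by convention at the root path. The domain and rules below are illustrative, not a recommendation:

```
# /robots.txt — access control; changes rarely
User-agent: *
Allow: /

# Sitemap pointer: the only cross-reference among the three files
Sitemap: https://example.com/sitemap.xml

# There is no pointer to llms.txt here. Engines that support the
# format fetch it by convention at https://example.com/llms.txt
```

Note the asymmetry: sitemap.xml is discoverable from robots.txt, but llms.txt relies entirely on engines knowing to look for it at root.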
What goes in — and what doesn't
Include
- A one-paragraph summary of what the site is and who it serves.
- The 5–15 pages that matter most, grouped by audience where useful.
- Stable, citable facts: pricing (for B2B SaaS), contact details, key product pages.
Exclude
- Exhaustive URL inventories; that is sitemap.xml's job.
- Facts that change faster than the file's refresh cadence, such as commerce pricing.
- Access-control rules; those belong in robots.txt.
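Drawing those rules together, a minimal sketch of an llms.txt for a hypothetical B2B SaaS product, following the Answer.AI spec's shape (H1 title, blockquote summary, H2 sections of link lists). Every name and URL below is invented for illustration:

```markdown
# Acme Analytics

> Acme Analytics is a self-serve product-analytics platform for B2B SaaS
> teams. Plans start at $49/month; see /pricing for current tiers.

## Docs
- [Quickstart](https://acme.example/docs/quickstart): install and first event in 10 minutes
- [API reference](https://acme.example/docs/api): REST endpoints and auth

## Buyers
- [Pricing](https://acme.example/pricing): plan tiers and limits
- [Security overview](https://acme.example/security): compliance and data handling

## Contact
- [Sales](https://acme.example/contact): demos and enterprise enquiries
```

Note what is absent: no blog archive, no changelog history, no exhaustive page list. Each section names only the handful of pages worth an engine's attention.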
Frequently asked questions
How is llms.txt different from robots.txt?
robots.txt is access control — it tells crawlers which paths they may fetch. llms.txt is content — it tells AI engines what the site is, what it offers, and where to find the most important pages. They serve different layers: robots.txt is the gate, llms.txt is the introduction. Both should exist on a site that takes AI visibility seriously, and they don't replace each other.
How is llms.txt different from sitemap.xml?
sitemap.xml is an exhaustive list of URLs intended for search engine indexing. llms.txt is a curated, human-readable summary intended for AI engine ingestion. A site with 50,000 pages should have a sitemap.xml listing every one and an llms.txt naming only the 5–15 pages that matter most. Treat sitemap as machine inventory and llms.txt as editorial summary.
Which AI engines actually read llms.txt?
llms.txt is an emerging convention published in September 2024 by Answer.AI, not a proven universal crawler signal. Anthropic publishes an llms.txt-style documentation index at claude.com/docs/llms.txt — a notable public adoption of the format for their own docs — but official Anthropic crawler documentation describes ClaudeBot, Claude-User, and Claude-SearchBot using robots.txt as the access-control signal and does not document llms.txt as a guaranteed external crawl or retrieval input. No major engine publicly documents external llms.txt ingestion as a guaranteed signal as of 2026. Treat vendor commitments as emerging or unconfirmed unless vendor docs explicitly say otherwise. The practical recommendation still stands: it costs little to publish a well-formed llms.txt as structured context, and the upside if the format becomes canonical is real.
Can llms.txt be too long?
Yes. The spec suggests keeping it focused — a few hundred lines at most, with the most important content near the top. Engines that choose to consume llms.txt have finite attention; a 5,000-line file buries the signal even for engines that support the format. Aim for an executive summary at the top, then sections by audience (developers, researchers, buyers), each with 5–15 of the most useful page links. If you find yourself adding everything, you're writing a sitemap, not an llms.txt.
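The length guidance above can be checked mechanically before publishing. A sketch in Python; the thresholds encode this article's rule of thumb (a few hundred lines, 5–15 links per section), not any spec-mandated limit:

```python
def check_llms_txt(text: str, max_lines: int = 300, max_links: int = 15) -> list[str]:
    """Flag signs that an llms.txt draft is drifting toward sitemap territory."""
    warnings = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} lines; keep it to a few hundred at most")

    # Count markdown list-item links ("- [title](url)") per H2 section.
    section, counts = "(top)", {}
    for line in lines:
        if line.startswith("## "):
            section = line[3:].strip()
        elif line.lstrip().startswith("- ["):
            counts[section] = counts.get(section, 0) + 1

    for name, n in counts.items():
        if n > max_links:
            warnings.append(f"section {name!r} has {n} links; curate down to 5-15")
    return warnings
```

Run against a draft, an empty list means the file is within the suggested bounds; anything else is a prompt to cut, not a hard failure.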
Should I include pricing in llms.txt?
For B2B SaaS, yes — pricing is one of the most common queries asked of AI engines about a product. If an engine consumes your llms.txt, an explicit pricing section gives it a stable, controlled fact to draw on; this assumes engine-side support, which remains emerging. For commerce, no — pricing changes too often. The rule of thumb: include facts that are stable for the file's natural refresh cadence (typically monthly). Anything that changes faster belongs in a database the engine can't see anyway.
Does llms.txt replace JSON-LD structured data?
No. They're complementary. JSON-LD structured data lives on individual pages and describes what each page is about — a well-established, broadly supported signal. llms.txt proposes a site-level summary at root and is much newer, with vendor support still emerging. Engines that support both can use them together: structured data to parse the page being read, llms.txt to understand which pages on the site are worth reading. Schema is the load-bearing signal today; llms.txt is the optional layer on top.
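To make the division of labour concrete: a minimal page-level JSON-LD block of the kind that stays on each page, while llms.txt sits once at root. The organisation name and URL are invented for illustration:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Acme Analytics",
  "url": "https://acme.example/"
}
</script>
```

This block describes the one page it is embedded in; llms.txt would separately tell an engine which of the site's pages carry blocks worth parsing.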