What is llms.txt?

The proposed machine-readable summary AI engines can ingest if they support the format.

llms.txt is a markdown file at site root that publishes a curated, machine-readable summary AI engines can ingest if they choose to support the format. This page explains the spec, gives an honest read of vendor-support status (largely unconfirmed at the major engines as of 2026), and shows where llms.txt fits next to robots.txt and sitemap.xml — without overstating what it can do for visibility on its own.

By Martin Yarnold
See ours
Promagen publishes its own llms.txt at promagen.com/llms.txt. Sentinel measures llms.txt presence as part of its weekly audit.
See how Sentinel measures it →

llms.txt next to robots.txt and sitemap.xml

The three artefacts solve different problems. A site that takes AI visibility seriously publishes all three. None replaces the others.

| Artefact | Audience | Contents | Cadence |
| --- | --- | --- | --- |
| robots.txt | All crawlers | Allow/Disallow rules, sitemap pointer, user-agent rules | Changes rarely |
| sitemap.xml | Search engines | Complete URL inventory with lastmod, priority, changefreq | Regenerated on every deploy |
| llms.txt | AI engines | Curated site summary, citation targets, pricing, contact, key pages | Updated when site identity or top pages change |
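As a concrete illustration of how the three artefacts relate, a minimal robots.txt can point crawlers at the sitemap, while llms.txt needs no pointer at all (the domain and paths below are placeholders):

```text
# robots.txt — access control for all crawlers (illustrative)
User-agent: *
Allow: /

# Pointer to the machine inventory
Sitemap: https://example.com/sitemap.xml

# No pointer needed for llms.txt: engines that support the
# convention fetch it directly from /llms.txt at the site root.
```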

What goes in — and what doesn't

Include

- One-paragraph site description
- Pricing (B2B SaaS)
- Top 5–15 public pages with titles and descriptions
- Citation guidance for AI engines
- Contact link
- Sitemap and robots pointers

Exclude

- Complete URL inventory (use sitemap.xml)
- Access rules (use robots.txt)
- Per-page content (use page schema)
- Time-sensitive announcements that go stale faster than the file's refresh cadence
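Putting the include and exclude rules together, a minimal llms.txt might look like the sketch below. The site name, URLs, and prices are placeholders; the layout follows the llmstxt.org convention of an H1 title, a blockquote summary, and H2 sections of annotated links:

```markdown
# Example SaaS

> Example SaaS is a weekly AI-visibility audit for B2B sites.
> Pricing starts at $49/month. Contact: https://example.com/contact

## Key pages

- [Pricing](https://example.com/pricing): Plans and billing details
- [Docs](https://example.com/docs): Setup guide and API reference
- [About](https://example.com/about): Company background and team

## Machine-readable companions

- [Sitemap](https://example.com/sitemap.xml): Complete URL inventory
- [Robots](https://example.com/robots.txt): Crawler access rules
```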

Frequently asked questions

How is llms.txt different from robots.txt?

robots.txt is access control — it tells crawlers which paths they may fetch. llms.txt is content — it tells AI engines what the site is, what it offers, and where to find the most important pages. They serve different layers: robots.txt is the gate, llms.txt is the introduction. Both should exist on a site that takes AI visibility seriously, and they don't replace each other.

How is llms.txt different from sitemap.xml?

sitemap.xml is an exhaustive list of URLs intended for search engine indexing. llms.txt is a curated, human-readable summary intended for AI engine ingestion. A site with 50,000 pages should have a sitemap.xml listing every one and an llms.txt naming only the 5–15 pages that matter most. Treat sitemap as machine inventory and llms.txt as editorial summary.

Which AI engines actually read llms.txt?

llms.txt is an emerging convention published in September 2024 by Answer.AI, not a proven universal crawler signal. Anthropic publishes an llms.txt-style documentation index at claude.com/docs/llms.txt — a notable public adoption of the format for their own docs — but official Anthropic crawler documentation describes ClaudeBot, Claude-User, and Claude-SearchBot using robots.txt as the access-control signal and does not document llms.txt as a guaranteed external crawl or retrieval input. No major engine publicly documents external llms.txt ingestion as a guaranteed signal as of 2026. Treat vendor commitments as emerging or unconfirmed unless vendor docs explicitly say otherwise. The practical recommendation still stands: it costs little to publish a well-formed llms.txt as structured context, and the upside if the format becomes canonical is real.

Can llms.txt be too long?

Yes. The spec suggests keeping it focused — a few hundred lines at most, with the most important content near the top. Engines that choose to consume llms.txt have finite attention; a 5,000-line file buries the signal even for engines that support the format. Aim for an executive summary at the top, then sections by audience (developers, researchers, buyers), each with 5–15 of the most useful page links. If you find yourself adding everything, you're writing a sitemap, not an llms.txt.
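The focus rule above is easy to automate. A minimal sketch in Python, assuming an in-memory llms.txt string; the 300-line and 50-link thresholds are illustrative judgment calls, not part of the spec:

```python
def lint_llms_txt(text: str, max_lines: int = 300) -> list[str]:
    """Return a list of warnings for an llms.txt candidate."""
    warnings = []
    lines = text.strip().splitlines()

    # Length: engines have finite attention; a long file buries the signal.
    if len(lines) > max_lines:
        warnings.append(f"too long: {len(lines)} lines (max {max_lines})")

    # Structure: the spec convention is a single H1 title on the first line.
    if not lines or not lines[0].startswith("# "):
        warnings.append("missing H1 title on the first line")

    # Curation: count link entries; a sitemap-sized list defeats the purpose.
    links = [line for line in lines if line.lstrip().startswith("- [")]
    if len(links) > 50:
        warnings.append(f"{len(links)} links: this is a sitemap, not an llms.txt")

    return warnings
```

A well-formed short file passes with no warnings; a 400-line dump or a file without a leading H1 gets flagged.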

Should I include pricing in llms.txt?

For B2B SaaS, yes — pricing is one of the most common queries asked of AI engines about a product. If an engine consumes your llms.txt, an explicit pricing section gives it a stable, controlled fact to draw on; this assumes engine-side support, which remains emerging. For commerce, no — pricing changes too often. The rule of thumb: include facts that are stable for the file's natural refresh cadence (typically monthly). Anything that changes faster belongs in a database the engine can't see anyway.

Does llms.txt replace JSON-LD structured data?

No. They're complementary. JSON-LD structured data lives on individual pages and describes what each page is about — a well-established, broadly supported signal. llms.txt proposes a site-level summary at root and is much newer, with vendor support still emerging. Engines that support both can use them together: structured data to parse the page being read, llms.txt to understand which pages on the site are worth reading. Schema is the load-bearing signal today; llms.txt is the optional layer on top.
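To make the contrast concrete: JSON-LD is embedded in each page's HTML and describes that one page, while llms.txt sits once at the site root. A minimal schema.org block for a product page might look like this (the names and prices are placeholders):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  "name": "Example SaaS",
  "applicationCategory": "BusinessApplication",
  "offers": {
    "@type": "Offer",
    "price": "49.00",
    "priceCurrency": "USD"
  }
}
</script>
```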

Get a free Sentinel snapshot →

Spec reference: https://llmstxt.org/. Anthropic publishes an llms.txt-style documentation index at https://claude.com/docs/llms.txt. As of 2026 no major AI engine — including Anthropic — publicly documents external llms.txt ingestion as a guaranteed crawl or retrieval signal; treat vendor support as emerging or unconfirmed unless vendor docs explicitly say otherwise.
