What Is llms.txt?

llms.txt is a Markdown file placed at the root of a website (/llms.txt) that gives large language models a curated, clean map of the site's most important content. Proposed by Answer.AI co-founder Jeremy Howard in September 2024, it exists because LLM context windows are too small to ingest an entire website and raw HTML is full of navigation, ads, and scripts that bury what a model actually needs (Answer.AI, 2024). It is a proposed convention, not a ratified standard, and adoption by AI systems remains limited.

How llms.txt Works

The format is deliberately simple. A valid llms.txt opens with an H1 carrying the site or project name (the one required element), followed by an optional blockquote that summarizes what the site is, then optional H2 sections that each hold a bulleted list of links written as [name](url): optional note (llmstxt.org). Because it is plain Markdown, both people and models can read it without a special parser.
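
The structure described above is small enough to check with a few lines of code. The following is a minimal sketch in Python, not a reference parser: the sample file, the example.com URLs, and the function name parse_llms_txt are all made up for illustration, and the regular expression only handles the simple link form described here.

```python
import re

# Hypothetical llms.txt content, used only to exercise the parser.
SAMPLE = """\
# Example Project

> Example Project is a made-up library used here for illustration.

## Docs

- [Quick start](https://example.com/quickstart.html.md): install and first run
- [API reference](https://example.com/api.html.md)

## Examples

- [Todo app](https://example.com/examples/todo.html.md): end-to-end walkthrough
"""

# Matches "- [name](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Split an llms.txt string into title, summary, and H2 link sections.

    Raises ValueError if the first non-empty line is not an H1,
    since the H1 is the one required element.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("# "):
        raise ValueError("llms.txt must open with an H1 title")

    title = lines[0][2:].strip()
    summary_parts = []
    sections = {}
    current = None  # name of the H2 section being read, if any

    for line in lines[1:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_parts.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["name"], match["url"], match["note"])
                )

    summary = " ".join(summary_parts) or None
    return {"title": title, "summary": summary, "sections": sections}


parsed = parse_llms_txt(SAMPLE)
```

Here parsed["sections"] maps each H2 heading to a list of (name, url, note) tuples, with note set to None when a link has no description. A file that does not start with an H1 is rejected outright, while a missing blockquote is tolerated.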

The spec also asks sites to publish a clean Markdown version of each page at the same URL with .md appended (for example, /pricing.html.md), so a model that follows a link from llms.txt lands on prose instead of a rendered HTML page (llmstxt.org). A separate community convention, /llms-full.txt, concatenates a site's full documentation into a single file. That name comes from tools and adopters such as Mintlify, not from Howard's original spec, which instead describes tool-generated context files such as llms-ctx.txt and llms-ctx-full.txt (Answer.AI, 2024).
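
The .md naming rule can be expressed as a small URL transformation. This is a sketch only: markdown_variant is a hypothetical helper name, and the handling of directory-style URLs (appending index.html.md) follows a reading of the proposal's note about URLs without file names, so check it against llmstxt.org before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_variant(url):
    """Return the URL where a clean Markdown copy of `url` would live.

    Assumptions (illustrative, not an official algorithm):
    - a page URL gets ".md" appended to its path;
    - a URL without a file name gets "index.html.md" appended;
    - a path already ending in ".md" is left alone.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html.md"
    elif not path.endswith(".md"):
        path += ".md"
    # Query string and fragment are carried over unchanged.
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


pricing_md = markdown_variant("https://example.com/pricing.html")
docs_md = markdown_variant("https://example.com/docs/")
```

With these inputs, pricing_md is https://example.com/pricing.html.md and docs_md is https://example.com/docs/index.html.md.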

llms.txt vs robots.txt and sitemap.xml

These three root files do different jobs. robots.txt controls access: it tells crawlers what they may and may not fetch. Compliance is voluntary, but the major AI crawlers state that they honor it. sitemap.xml lists a site's URLs so search engines can discover and index them. llms.txt does neither. It is a curated, hand-picked subset of clean content meant for a model to read at inference time, not an access rule and not an exhaustive index (Search Engine Land, 2025).
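
To make the access-control side of that contrast concrete, the sketch below uses Python's standard urllib.robotparser module to evaluate a robots.txt policy. The rules, the ExampleAIBot user agent, and the URLs are invented for illustration. Nothing comparable exists for llms.txt, because the file carries no allow or deny semantics.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one made-up AI crawler is kept out of /private/,
# every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(agent, url):
    """Return True if the robots.txt rules above let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


blocked = may_fetch("ExampleAIBot", "https://example.com/private/report.html")
allowed = may_fetch("ExampleAIBot", "https://example.com/llms.txt")
other = may_fetch("OtherBot", "https://example.com/private/report.html")
```

Here blocked is False while allowed and other are True. Note that the crawler's permission to fetch /llms.txt is itself decided by robots.txt, which is why the two files complement rather than replace each other.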

Use Cases

  • Documentation sites. The clearest fit. Dev-tool docs expose an llms.txt so a coding assistant can pull accurate API references instead of guessing. Mintlify auto-generates one for the docs sites it hosts, which is part of why so many developer tools have one (Ahrefs, 2026).
  • Curating what a model sees. A site can point models at canonical, current pages and leave out duplicate, thin, or outdated URLs.
  • Cheaper context loading. Pointing an assistant at one clean file costs fewer tokens than feeding it a crawl of rendered HTML.
  • AI search and answer optimization. Teams adopt it hoping to shape how assistants summarize their brand, as part of the broader generative engine optimization effort.

Best Practices

Keep expectations grounded first. As of mid-2026, llms.txt is not an official standard and the major AI systems have not confirmed using it. Google's Gary Illyes said Google "doesn't support llms.txt and isn't planning to" (Search Engine Land, 2025), and John Mueller noted that "no AI system currently uses llms.txt" (Search Engine Roundtable, 2025). Ahrefs found that of roughly 38,000 domains with a valid file, 97% received zero requests for it in May 2026 (Ahrefs, 2026). Publish one because it is cheap and well-formed content never hurts, not because it guarantees AI traffic.

When you do write one:

  • Lead with a tight blockquote summary and link only to your best, current pages.
  • Serve a clean .md version of each linked page so a model that follows a link gets prose, not a JavaScript shell.
  • Keep it in sync with the site. A stale llms.txt is worse than none.
  • Do not treat it as access control. If you need to allow or block AI crawlers, that still belongs in robots.txt and your User-Agent rules, not llms.txt.
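
One way to keep the file in sync is to generate it from the same page list that drives the site rather than editing it by hand. The sketch below assumes such a list exists as a plain Python dictionary. The function name render_llms_txt, the section names, and the example.com URLs are illustrative, not part of any tool named in this article.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt string from structured data.

    `sections` maps an H2 heading to a list of (name, url, note) tuples;
    `note` may be None or empty when a link needs no description.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    # Trim trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical page list; in practice this would come from the site's
# own navigation or build configuration.
pages = {
    "Docs": [
        ("Quick start", "https://example.com/quickstart.html.md", "install and first run"),
        ("API reference", "https://example.com/api.html.md", None),
    ],
}

llms_txt = render_llms_txt(
    "Example Docs",
    "Documentation for a made-up product, used here for illustration.",
    pages,
)
```

Running this as part of the site build means a page that is removed from the list disappears from llms.txt in the same deploy, which addresses the staleness risk without any manual step.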

The underlying principle, that models work better on clean Markdown than on raw HTML, is also why retrieval pipelines increasingly fetch pages as Markdown rather than parsing a rendered DOM. Massive's Web Render API returns any public page as Markdown (format=markdown) for exactly this reason, and Massive's own documentation publishes an llms.txt index plus per-page .md variants.

Conclusion

llms.txt is a low-cost, sensible idea: hand models clean, curated Markdown instead of making them parse a whole site. Whether it becomes load-bearing depends on AI providers choosing to read it, which most have not done yet. Treat it as good hygiene for an AI-readable web, not as a ranking lever.

Frequently Asked Questions

Is llms.txt an official standard?

No. It is a proposed convention published at llmstxt.org by Answer.AI's Jeremy Howard in 2024. No standards body has ratified it and no major AI provider has formally adopted it (Search Engine Journal, 2026).

Do AI systems actually use llms.txt?

Not in any confirmed way as of mid-2026. Google says it does not use the file, and an Ahrefs study found 97% of domains with a valid llms.txt got zero requests for it in May 2026 (Ahrefs, 2026).

How is llms.txt different from robots.txt?

robots.txt controls which crawlers may access which paths. llms.txt does not control access at all. It points models to a curated set of clean content to read, so use robots.txt, not llms.txt, to allow or block AI bots.

How do I create an llms.txt file?

Place it at your domain root as /llms.txt, written in Markdown: an H1 site name, a blockquote summary, then H2 sections listing your key links (llmstxt.org).

Who publishes an llms.txt today?

Mostly documentation-heavy tech companies. Anthropic, Cloudflare, Mintlify, and Tinybird publish one, and Mintlify auto-generates them for the docs sites it hosts (Ahrefs, 2026).