llms.txt, defined
An LLM's context window is too small to swallow your entire website, and raw HTML — packed with navigation, ads and JavaScript — is messy to read. llms.txt solves both problems by letting *you*, the site owner, say "here is the good stuff, already curated." It's a plain Markdown file that lists your most important pages, each with a short description, so a model answering a question about your category has a clean starting point instead of guessing from scraped HTML.
It was proposed in September 2024 by Jeremy Howard, and the spec lives at llmstxt.org. Critically, it's aimed at **inference time** — the moment an AI engine like ChatGPT, Perplexity or Claude is composing an answer — not at training the model or at getting you indexed in Google.
A treasure map, not a fence
Search Engine Land put it well: llms.txt isn't robots.txt. robots.txt is a fence that blocks or permits crawlers. llms.txt is a treasure map that says "start digging here." It recommends; it never restricts.
What an llms.txt file looks like
The format is deliberately Markdown, not XML, so both humans and models can read it. The spec defines a strict order, but only the first element is required:
1. **An H1 with your site or project name.** The only required element. Everything below it is optional but recommended.
2. **A blockquote summary.** One short paragraph (prefixed with `>`) describing what your site or brand is.
3. **Optional context sections.** Plain Markdown paragraphs or lists adding detail a model would find useful. Any block type is allowed except headings.
4. **H2 "file list" sections.** Bulleted Markdown links in the form `[Page name](URL): a one-line note`. This is your curated list of best pages.
One section name carries special meaning. An `## Optional` H2 marks links a model **can safely skip** when it needs a shorter context. It's the only section name the spec gives defined semantics.
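To make the structure concrete, here is a minimal sketch in Python: a made-up llms.txt for a hypothetical "Example Co" (every name and URL in it is invented for illustration), plus a small parser that reads the four elements and can drop the `## Optional` section when context is tight. It is an illustration of the layout described above, not a reference implementation of the spec.

```python
import re

# Hypothetical llms.txt content; the company, pages and URLs are invented.
SAMPLE = """\
# Example Co

> Example Co makes invoicing software for freelancers.

Prices are listed in USD. The docs cover the REST API only.

## Product

- [Pricing](https://example.com/pricing): Plans and what each one includes
- [Invoicing](https://example.com/features/invoicing): Core feature overview

## Optional

- [Changelog](https://example.com/changelog): Release history
"""

# One "file list" entry: "- [name](url)" with an optional ": note" suffix.
LINK = re.compile(r"^- \[(?P<name>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text):
    """Split an llms.txt document into title, summary and link sections."""
    title, summary, sections, current = None, None, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()          # the required H1
        elif line.startswith("## "):
            current = line[3:].strip()        # start of an H2 file list
            sections[current] = []
        elif line.startswith(">") and summary is None and current is None:
            summary = line[1:].strip()        # the blockquote summary
        elif current is not None:
            match = LINK.match(line)
            if match:
                sections[current].append(match.groupdict())
    return {"title": title, "summary": summary, "sections": sections}


def links_for_context(parsed, include_optional=True):
    """Return link URLs, skipping the Optional section when asked to."""
    urls = []
    for name, links in parsed["sections"].items():
        if name == "Optional" and not include_optional:
            continue
        urls.extend(link["url"] for link in links)
    return urls


parsed = parse_llms_txt(SAMPLE)
```

Calling `links_for_context(parsed, include_optional=False)` returns only the two Product URLs, which is the behaviour the `## Optional` convention is meant to allow.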
llms.txt vs llms-full.txt
A companion convention, llms-full.txt, concatenates your entire documentation into one large Markdown file — handy for pasting straight into a coding assistant. llms.txt is the curated index of links; llms-full.txt is the whole library. Many docs platforms also let you append .md to a page URL (e.g. /pricing → /pricing.md) to serve a clean Markdown version of that page.
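The `.md` convention is simple enough to sketch. The helper below (`markdown_url` is a hypothetical name) derives the URL where a Markdown twin of a page would live if your platform serves one. The handling of directory-style URLs, mapping them to `index.html.md`, is an assumption based on the llmstxt.org proposal; check what your own platform does before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Return the URL of the page's Markdown version, if the site serves one."""
    parts = urlsplit(page_url)
    path = parts.path
    if path.endswith(".md"):
        return page_url  # already a Markdown URL
    if path == "" or path.endswith("/"):
        # Assumption: URLs without a file name get index.html.md appended.
        path = (path or "/") + "index.html.md"
    else:
        path = path + ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

For example, `markdown_url("https://example.com/pricing")` gives `https://example.com/pricing.md`. Whether that URL actually resolves depends entirely on your docs platform.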
How to create one
It's a plain text file — you can write it in any editor, or generate it. The hard part isn't the syntax; it's the curation. Resist the urge to dump every URL. Hand-pick the 5–10 pages that actually define your brand:
- Your **homepage** and **about/company** page — who you are.
- Your **pricing** page — how you're bought.
- Your **core product or feature** pages — what you do.
- Your **key docs or guides** — how you're used.
- For each, write a **one-line description** so the model knows why it matters.
Save it as `llms.txt` and place it at your domain root so it resolves at `https://yourdomain.com/llms.txt`. That's it — curation beats completeness every time.
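If you would rather generate the file from a hand-picked list than type the Markdown yourself, a short script is enough. The sketch below assumes a hypothetical "Example Co"; the section names, page names, URLs and descriptions are placeholders to swap for your own. The curation still happens in the `PAGES` list, which is the part no script can do for you.

```python
# Hypothetical curated pages: (section, page name, URL, one-line description).
PAGES = [
    ("Company", "Home", "https://example.com/", "What Example Co does and who it is for"),
    ("Company", "About", "https://example.com/about", "Team, history and contact details"),
    ("Product", "Pricing", "https://example.com/pricing", "Plans and what each one includes"),
    ("Product", "Invoicing", "https://example.com/features/invoicing", "Core feature overview"),
    ("Docs", "Quickstart", "https://example.com/docs/quickstart", "Send a first invoice in five minutes"),
]


def build_llms_txt(site_name, summary, pages):
    """Render an H1, a blockquote summary and one H2 file list per section."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    sections = {}
    for section, name, url, note in pages:
        # dicts keep insertion order, so sections appear in first-seen order
        sections.setdefault(section, []).append(f"- [{name}]({url}): {note}")
    for section, links in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        lines.extend(links)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


text = build_llms_txt(
    "Example Co",
    "Example Co makes invoicing software for freelancers.",
    PAGES,
)
```

Write `text` to a file named `llms.txt` in your web root and it will be served from the domain root like any other static file.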
Don't want to hand-write it? Run a free, one-page AI audit — it checks your AI-crawler access, entity recognition and answer-readiness, the things that actually move AI visibility, in about 15 seconds.
llms.txt vs robots.txt vs sitemap.xml
These three files are easy to confuse, but they do completely different jobs. They coexist — llms.txt replaces neither of the others.
| File | Its job | In one word |
|---|---|---|
| robots.txt | Tells crawlers which paths they may or may not access | Exclusion |
| sitemap.xml | Lists every indexable URL so search engines discover them all | Discovery |
| llms.txt | Points AI models to your best content for answering questions | Curation |
robots.txt and sitemap.xml are about *access* and *indexing for search*. llms.txt is about *understanding and curation for AI answers*. robots.txt blocks or permits; llms.txt only recommends. Neither file is technically enforced, but well-behaved crawlers conventionally honor robots.txt, whereas nothing obliges an AI provider to read llms.txt, and any of them can ignore it entirely. If you actually want to control which bots reach your site, that's a job for robots.txt and AI-crawler rules, not llms.txt.
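To see what "AI-crawler rules" look like in practice, here is a small sketch that checks a robots.txt against specific bots using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are hypothetical; the rules shown are one possible policy, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep GPTBot out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, url):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


gptbot_private = allowed("GPTBot", "https://example.com/private/report")
gptbot_pricing = allowed("GPTBot", "https://example.com/pricing")
claudebot_private = allowed("ClaudeBot", "https://example.com/private/report")
```

Here `gptbot_private` is `False` while the other two are `True`. Nothing in llms.txt can express a rule like this; it has no syntax for allow or disallow at all.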
Does llms.txt actually work? A reality check
This is where honesty matters more than hype. As of early 2026, the evidence for llms.txt moving the needle is thin — and you should know that before you over-invest.
- **Adoption is low and AI bots rarely fetch it.** In an SE Ranking study of roughly 300,000 domains, only about 10% had an llms.txt file. Across 62,000+ AI-bot visits, the file was targeted in only about 0.1% of them — major bots like GPTBot, ClaudeBot and PerplexityBot showed essentially no requests for it.
- **No measured citation lift.** A 10-site before/after study found no independent effect of llms.txt on whether LLMs cited those sites; the gains that did appear traced back to content, PR and technical fixes, not the file.
- **Google doesn't use it.** Google's John Mueller has said no AI service has confirmed using llms.txt — and that you can tell from server logs they don't even check for it. Google's AI Overviews and AI Mode draw from the regular Search index, so the file does nothing for Google.
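You can run the same server-log check on your own site. The sketch below assumes access logs in the common "combined" format and matches bots by user-agent substring; the log lines are fabricated samples (documentation IP ranges, invented timestamps), and the bot list is just the three names mentioned above.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: "METHOD path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Fabricated sample lines, for illustration only.
SAMPLE_LOG = """\
203.0.113.5 - - [10/Jan/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0; compatible; GPTBot/1.2"
203.0.113.6 - - [10/Jan/2026:10:00:05 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "Mozilla/5.0; compatible; ClaudeBot/1.0"
203.0.113.7 - - [10/Jan/2026:10:00:09 +0000] "GET /docs HTTP/1.1" 200 9000 "-" "Mozilla/5.0; compatible; PerplexityBot/1.0"
198.51.100.9 - - [10/Jan/2026:10:00:12 +0000] "GET /llms.txt HTTP/1.1" 200 812 "-" "Mozilla/5.0 (Macintosh) Safari/605.1.15"
"""


def llms_txt_share(log_text):
    """Count AI-bot requests and how many of them asked for /llms.txt."""
    visits, hits = Counter(), Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if not match:
            continue
        agent = match["agent"].lower()
        bot = next((b for b in AI_BOTS if b.lower() in agent), None)
        if bot is None:
            continue  # not one of the tracked AI bots
        visits[bot] += 1
        if match["path"].split("?")[0] == "/llms.txt":
            hits[bot] += 1
    total = sum(visits.values())
    return {
        "ai_bot_visits": total,
        "llms_txt_hits": sum(hits.values()),
        "share": sum(hits.values()) / total if total else 0.0,
        "hits_by_bot": dict(hits),
    }


report = llms_txt_share(SAMPLE_LOG)
```

User-agent strings can be spoofed, so treat the counts as a rough signal. If `llms_txt_hits` stays at zero over a few weeks of real logs, that is consistent with the findings above.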
So why publish one?
Because it's cheap, low-risk infrastructure, and the strongest use case — documentation sites feeding coding assistants — genuinely works. Real adopters include Anthropic, Hugging Face, Perplexity, Zapier, Cursor and Windsurf. Publish it, keep your expectations grounded, and don't treat it as a ranking or citation lever.
How to tell if it's helping
Since the file itself has no proven citation effect, the only way to know whether *anything* you do is working is to measure the outcome that matters: are AI engines actually citing you? That means tracking your **mention rate** and **share of voice** across ChatGPT, Perplexity, Gemini and Claude, and watching the first-party AI-crawler and referral traffic landing on your site. SourceWatch measures exactly this — so you can see whether your AI visibility moves after you publish an llms.txt, instead of assuming it did. If you'd rather skip the manual file, our llms.txt generator builds a clean, curated one from your site in seconds.
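For a sense of what those two metrics mean, here is a rough sketch of how they could be computed from a sample of AI answers. The definitions are assumptions made for illustration (mention rate as the share of sampled answers that name your brand, share of voice as your mentions divided by all tracked brand mentions), and the brands and answers are invented. It does not describe how any particular tool calculates them.

```python
from collections import Counter

# Hypothetical sample: which brands each engine named for one tracked prompt.
ANSWERS = [
    {"engine": "ChatGPT", "brands": ["Acme", "Globex"]},
    {"engine": "Perplexity", "brands": ["Globex"]},
    {"engine": "Gemini", "brands": ["Acme", "Globex", "Initech"]},
    {"engine": "Claude", "brands": []},
]


def mention_rate(answers, brand):
    """Fraction of sampled answers that mention the brand at least once."""
    if not answers:
        return 0.0
    return sum(brand in a["brands"] for a in answers) / len(answers)


def share_of_voice(answers, brand):
    """The brand's mentions as a fraction of all tracked brand mentions."""
    counts = Counter(b for a in answers for b in a["brands"])
    total = sum(counts.values())
    return counts[brand] / total if total else 0.0


acme_rate = mention_rate(ANSWERS, "Acme")
acme_voice = share_of_voice(ANSWERS, "Acme")
```

With this sample, Acme appears in 2 of 4 answers (a mention rate of 0.5) and accounts for 2 of 6 brand mentions (a share of voice of about 0.33). Comparing the same numbers before and after publishing an llms.txt is what tells you whether anything moved.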