The llms.txt template — copy, paste, adapt
Here is the canonical structure, matching the official llms.txt spec exactly. Save it as a file named `llms.txt`, fill in your details, and upload it to the root of your domain so it resolves at `https://yourdomain.com/llms.txt`. The only part you truly must include is the H1 title on the first line — everything below it is optional.
The blank template (matches the spec)
`# Title`, then optionally `> Optional description goes here`, then optional free-text details, then optional `## Section name` lists where each entry is a Markdown link with an optional note: `- [Link title](https://link_url): Optional link details`. A section literally named `## Optional` is special: AI tools may skip those URLs when they need a shorter context. That is the whole format.
A filled-in example you can adapt — point the links at your real, clean pages (docs, pricing, policies). Plain `.md` versions of those pages are ideal, because the whole point is to hand machines content that is easy to parse:
A worked example
`# Acme Co` · `> Acme Co builds invoicing software for freelancers. Plans start at $9/mo.` · then a `## Docs` list: `- Getting started: 5-minute setup guide`, `- API reference: full REST endpoint list`, `- Pricing: current plans and limits` · and a `## Optional` list: `- Changelog: release history`. In the file itself, each title is a Markdown link to the page, for example `- [Getting started](https://yourdomain.com/docs/getting-started.md): 5-minute setup guide`. Lead with your clearest, most citable pages.
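The worked example can also be generated from structured data instead of typed by hand. The following is a minimal sketch, not an official tool: the `Link` and `Section` classes, the `render_llms_txt` helper, and every URL under `https://yourdomain.com` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


@dataclass
class Section:
    name: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(title: str, summary: str, sections: list[Section]) -> str:
    """Render an llms.txt document: H1, optional blockquote, then H2 link lists."""
    lines = [f"# {title}", ""]
    if summary:
        lines += [f"> {summary}", ""]
    for section in sections:
        lines += [f"## {section.name}", ""]
        for link in section.links:
            entry = f"- [{link.title}]({link.url})"
            if link.note:
                entry += f": {link.note}"
            lines.append(entry)
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical URLs on a placeholder domain; swap in your real pages.
BASE = "https://yourdomain.com"
acme = render_llms_txt(
    "Acme Co",
    "Acme Co builds invoicing software for freelancers. Plans start at $9/mo.",
    [
        Section("Docs", [
            Link("Getting started", f"{BASE}/docs/getting-started.md", "5-minute setup guide"),
            Link("API reference", f"{BASE}/docs/api.md", "full REST endpoint list"),
            Link("Pricing", f"{BASE}/pricing.md", "current plans and limits"),
        ]),
        Section("Optional", [
            Link("Changelog", f"{BASE}/changelog.md", "release history"),
        ]),
    ],
)

if __name__ == "__main__":
    print(acme)
```

Writing the `acme` string to a file named `llms.txt` gives the same structure as the example, with the `## Optional` section last so the most citable pages come first.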
1. **Write one H1 with your site name.** The first line is `# Your Brand`. This is the only required section: a file with just this line is already valid.
2. **Add a one-line summary (recommended).** A blockquote: `> What you do, who it is for, and the single most important fact (e.g. pricing).` This is the context an AI tool reads first.
3. **List your most important pages under H2 sections.** Group links by theme (`## Docs`, `## Product`, `## Policies`). Each line is a Markdown link plus an optional note: `- [Page name](https://yourdomain.com/page.md): short note`. Link to clean, canonical pages, ideally `.md` versions.
4. **Put the nice-to-haves under `## Optional`.** Anything an AI can safely skip when space is tight (changelogs, archives) goes here. This is a real part of the spec, not a convention.
5. **Upload it to your domain root.** It must resolve at `/llms.txt`, the same place `robots.txt` lives. That is it.
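Before uploading, the five steps can be sanity-checked with a few lines of code. This is a rough sketch under stated assumptions, not a full spec validator: the `check_llms_txt` function and the `LINK_ITEM` pattern are hypothetical, and the pattern assumes list entries use the spec's Markdown link syntax, `- [title](url): note`.

```python
import re

# Assumed shape of a list entry: "- [title](url)" with an optional ": note".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the basic shape is fine."""
    problems = []
    lines = text.splitlines()
    non_empty = [ln for ln in lines if ln.strip()]

    # Step 1: the file must open with a single H1.
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line must be an H1 such as '# Your Brand'")
    if sum(1 for ln in lines if ln.startswith("# ")) > 1:
        problems.append("more than one H1 found")

    # Steps 3 and 4: list items under any H2 should be Markdown links.
    in_section = False
    for number, ln in enumerate(lines, start=1):
        if ln.startswith("## "):
            in_section = True
        elif in_section and ln.startswith("- ") and not LINK_ITEM.match(ln):
            problems.append(f"line {number}: list item is not '- [title](url): note'")
    return problems
```

A file containing only `# Your Brand` passes, which matches the rule that the H1 is the only required part. The check does not fetch the links or confirm that the file resolves at `/llms.txt`; that still needs a request against your own domain.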
Want the deeper explainer on the format, where it came from, and how it differs from robots.txt and sitemaps? See the full "What is llms.txt?" definition.
What llms.txt actually is (and isn’t)
llms.txt was proposed by Jeremy Howard of Answer.AI on September 3, 2024. The idea is simple: large language models work with a limited context window, and most web pages are cluttered with navigation, scripts and ads that waste it. llms.txt gives an AI a curated, flattened index — in clean Markdown — of the pages you most want it to read, so it spends its attention on your actual content instead of your chrome.
It is easy to confuse with two files you already know. It is neither:
| File | Job | Audience |
|---|---|---|
| robots.txt | Grants or denies crawler access — the on/off switch | All crawlers (search + AI) |
| sitemap.xml | Lists every URL for link-following crawlers to discover | Search engine indexers |
| llms.txt | Recommends your priority content as a curated prose map | AI tools that read Markdown context |
It points; it does not gatekeep
robots.txt controls whether a bot may read you. A sitemap lists everything so a crawler can follow links. llms.txt does neither: it recommends a short, prioritized set of clean pages for a machine that reads prose. It has no enforcement power; it is a courtesy index, not a permission system. If AI-crawler access is your real concern, that is a robots.txt job, not an llms.txt one.
The honest truth: Google doesn’t use it, and pickup is low
Most llms.txt generators sell the file as an SEO win. It is not one, and pretending otherwise wastes your time. Here is what the people who run the engines and the people who studied the data actually say.
Google has said no — by design
At Google Search Central Live in July 2025, Gary Illyes said Google will not crawl or use llms.txt. John Mueller compared it to the long-discredited `keywords` meta tag — a signal the site owner controls, and therefore one that is trivially gamed and easily ignored. Google’s position is that "normal SEO" is what feeds AI Overviews and AI Mode, not a separate AI file.
"AFAIK none of the AI services have said they’re using llms.txt … to me it’s comparable to the keywords meta tag." (John Mueller, Google)
Adoption and impact are still thin
When SE Ranking studied roughly 300,000 domains, only about one in ten had an llms.txt at all — and crucially, having one showed no measurable link to getting cited by AI:
10.13% of ~300,000 domains had an llms.txt, with no correlation found between its presence and AI citations. GPTBot fetches the file occasionally, but "not often." (SE Ranking)
None of this means the file is worthless. It means you should ship it for the right reason, with the right expectations — not as a ranking hack.
Where llms.txt genuinely helps right now
The real, current value of llms.txt is not in consumer AI search — it is in AI coding assistants and agents. Tools like Cursor, Continue, Cline and other MCP-connected agents read llms.txt to load a clean, curated map of a product’s docs into their context. That is why the companies shipping llms.txt today are overwhelmingly developer-facing.
- **Your audience builds with AI dev tools.** If developers use Cursor or an MCP agent against your docs, an llms.txt gives them a fast, clean entry point — Anthropic, Cloudflare, Vercel, Stripe, Coinbase and Pinecone all ship one for exactly this reason.
- **You want clean, machine-readable context.** Even without guaranteed crawler pickup, a curated `/llms.txt` is a low-effort signal of intent and a tidy index you control.
- **It costs almost nothing.** Two minutes with the template above. The downside is essentially zero; the question is only whether it is the *highest-leverage* two minutes you could spend (it usually isn’t — see below).
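To make the agent use case concrete, here is a hedged sketch of how a tool might consume the file: collect the linked URLs and drop the `## Optional` section when context is tight. It does not describe how Cursor, Continue, Cline or any other product actually works; `select_urls`, the `LINK` pattern and the sample URLs are hypothetical.

```python
import re

# Matches the start of a "- [title](url)" list entry.
LINK = re.compile(r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)")


def select_urls(llms_txt: str, include_optional: bool = True) -> list[str]:
    """Collect link URLs in file order, skipping '## Optional' when asked to."""
    urls = []
    section = None
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK.match(line)
        if match and section is not None:
            if section == "Optional" and not include_optional:
                continue
            urls.append(match.group("url"))
    return urls


# Placeholder file content with made-up URLs.
SAMPLE = """\
# Acme Co

> Acme Co builds invoicing software for freelancers.

## Docs

- [Getting started](https://yourdomain.com/docs/getting-started.md): 5-minute setup guide
- [API reference](https://yourdomain.com/docs/api.md): full REST endpoint list

## Optional

- [Changelog](https://yourdomain.com/changelog.md): release history
"""

if __name__ == "__main__":
    print(select_urls(SAMPLE, include_optional=False))
```

With `include_optional=False` the changelog link is left out, which is the behavior the `## Optional` section exists to allow.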
Why structured, citable content is the real lever
The peer-reviewed GEO study (Aggarwal et al., KDD 2024) tested 10,000 queries and found that adding **quotations (+42.5%)**, **statistics (+32.8%)** and **citations (+27.9%)** lifted a source’s visibility in AI answers by up to ~40% — while keyword stuffing did nothing. That is about what is *in* your content, not which file points to it. A well-written llms.txt is only as useful as the clean, quotable, stat-rich pages it links to. Get those right first.
The higher-leverage move: make sure AI crawlers can reach you
Here is the thing an llms.txt cannot fix, and the thing that actually decides whether you show up in AI answers: **can the AI crawlers read your content at all?** A perfect llms.txt is pointless if your robots.txt is quietly blocking the bots that build the answer engines, or if your pages need JavaScript to render and the crawler only sees an empty shell.
- **robots.txt rules** — the single most common silent killer. Disallow OAI-SearchBot, Claude-SearchBot or PerplexityBot and you vanish from ChatGPT, Claude and Perplexity answers, llms.txt or not.
- **Blocked AI user-agents** — a firewall or CDN rule that drops AI crawlers does the same damage, and it usually happens without anyone deciding to.
- **Render-blocking** — if your content only appears after client-side JavaScript runs, many crawlers never see it. They read the HTML you ship, not the page a browser paints.
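The robots.txt check in the first bullet can be approximated locally with Python's standard-library `urllib.robotparser`. This is a sketch only: the `blocked_agents` helper and the sample rules are hypothetical, and the standard-library parser may interpret edge cases differently from how each vendor's crawler does.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the list above.
AI_AGENTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"]


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Made-up rules: one AI crawler is blocked, everything else is allowed.
SAMPLE_ROBOTS = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print(blocked_agents(SAMPLE_ROBOTS, "https://yourdomain.com/docs/"))
```

This covers only the robots.txt rules. A firewall or CDN rule that drops AI user-agents, or content that needs client-side JavaScript to appear, will not show up in this check.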
Before you spend time on llms.txt, check the thing that actually gates AI visibility. SourceWatch’s free single-page audit checks AI-crawler access — robots.txt rules, blocked AI user-agents, and render-blocking — in about 15 seconds, no card required.
Fix access first, then ship the llms.txt as a clean finishing touch. In that order, the two-minute file is a nice addition rather than a distraction from the work that moves the needle.
You shipped llms.txt. Now measure whether it changed anything
The reason most teams can’t tell whether llms.txt "worked" is that they never measured the outcome it’s supposed to influence — getting read and cited by AI. That is the gap SourceWatch is built to close, with two capabilities most tools in this space don’t have.
1. First-party AI traffic capture, verified — not guessed
Most AI-visibility tools estimate your reach by firing synthetic prompts at the models. SourceWatch does that across every engine, but it also measures the half almost no one captures: the real AI crawlers and AI-referral visitors hitting your own pages. Every bot and visitor is verified against each vendor’s **published IP ranges** before it counts — so when GPTBot or ClaudeBot fetches your `/llms.txt` or your docs, you see it as a fact, not a spoofable user-agent string. That is the only honest way to tell whether opening the door actually let the right bots in.
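The general idea of checking a request against published IP ranges can be sketched with the standard-library `ipaddress` module. This illustrates the technique only and is not SourceWatch's implementation: `is_verified` is a hypothetical helper, and the CIDR blocks below are reserved documentation ranges used as placeholders, not any vendor's real list.

```python
import ipaddress

# Placeholder CIDR blocks taken from reserved documentation ranges.
# Real ranges must come from each vendor's own published list.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24", "2001:db8:1::/48"],
    "ClaudeBot": ["198.51.100.0/24"],
}


def is_verified(bot: str, remote_ip: str, ranges=PUBLISHED_RANGES) -> bool:
    """True only if remote_ip falls inside a range listed for the claimed bot."""
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # Not a parseable IP address at all.
    for cidr in ranges.get(bot, []):
        network = ipaddress.ip_network(cidr)
        if ip.version == network.version and ip in network:
            return True
    return False
```

A request whose user-agent claims `GPTBot` but whose source address falls outside the listed ranges returns `False`, which is why this check is harder to spoof than the user-agent string alone.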
2. An MCP server for Claude Code
SourceWatch plugs into Claude Code through an MCP server, so your assistant can read your AI-visibility gaps, the most-cited sources in your category, and the real queries the models ran — then act on them by auditing pages and drafting answer-first content briefs, in the same loop. If you’re the kind of team that ships an llms.txt, you’re probably already working in an AI IDE; this meets you there.
Straight about scope
A live, automated llms.txt generator is on the way — for now this page hands you the template above. SourceWatch generates content **briefs, not finished drafts**. The public **REST API is coming soon**; today the programmatic surface is the MCP server. The free single-page audit checks one URL; a full-site scan and ongoing tracking run on the 14-day trial (card optional, unlimited seats). No ranking guarantees, no fake ROI promises.