The AEO Report

Original research

No. 07 · Discoverability · Original scrape data

llms.txt in the wild.

What 62 popular dev, SaaS, AI, and SEO sites are actually shipping — and the three things 90% are getting right.

David Corey, Editor · Published 2026-05-21

We scraped llms.txt from 62 well-known tech sites: AI labs, dev tools, SaaS platforms, SEO incumbents. The goal: see what's actually being shipped now that Google Lighthouse 13.3 has added llms.txt as a formal audit under a new "Agentic Browsing" category.

Important framing up front. The Lighthouse audit is for agentic browsing readiness — not Google Search ranking. Google Search has separately stated llms.txt isn't needed for AI features. So this isn't "Google now uses llms.txt for SEO." It's "Google now formally measures whether your site is legible to AI agents." Different surface, different optimization, same file.

Adoption rate: 63%. 39 of the 62 sites had a valid llms.txt file. Higher than expected. Solid coverage among AI labs and dev tool companies. Notable gaps among legacy SEO incumbents (Semrush, Backlinko, and SEJ all lacked a working file).

The numbers

Adoption is solid. Quality varies wildly.

Adoption rate: 39 of 62 sites (63%) have a valid llms.txt
Has the > summary blockquote: 68%, meaning 32% silently violate the spec's most basic structural requirement
Has at least one ## section: 92%; most files have at least some topic structure
Has an H1: 90%
Zero structure (bare list of links): 7%; rarer than expected, but the offenders are large sites
Size range: 546 chars (Continue.dev) to 284,256 chars (PostHog), a 520× spread
Median size: ~12K chars

The good ones

Five files that nail it.

Each of these treats llms.txt as a curated map for an agent — not a sitemap dump.

  1. docs.railway.app — 37K chars · 15 sections · 236 links

    Every link carries a one-sentence description of what an agent will find at that URL. 99.6% description ratio. Reads like a hand-curated guide, not an export.

  2. clerk.com — 19K chars · 5 sections · 78 links

    Tight, complete, 100% described. Proof that you don't need 100K chars to ship a useful file. Five sections is enough when each one earns its place.

  3. planetscale.com — 100K chars · 3 sections · 503 links

    Big file, but organized. 75% description ratio. Shows that "large" isn't the same as "lazy" — they grouped 500 links under 3 well-named sections rather than dumping them flat.

  4. frase.io — 10K chars · 10 sections · 62 links

    Short, focused, 100% described. The cleanest small-file example in the sample.

  5. docs.stripe.com — 93K chars · 26 sections · 528 links

    Follows Stripe's docs information architecture exactly. 92% description ratio. The file teaches the agent the same mental model Stripe's human readers use.

The weak ones

Three patterns to avoid.

1. The wall of bare links

Sentry ships a 20K-char file with zero ## sections and only 17 links. Moz ships 62K chars, zero sections, 139 bare links. Both files look complete by size, but to an AI agent crawling for navigation, they're barely better than a sitemap with markdown wrappers. No curation, no descriptions, no signal about what matters.

2. The too-thin file

Continue.dev's file is 546 characters total — basically just a description and 5 links. Probably auto-generated from minimal config and forgotten about. Buttondown ships a 996-char file with no sections. These look like the team enabled an "llms.txt generator" plugin and never refined the output.

3. The silent spec violation

32% of files in this sample skip the > summary blockquote. That's the line the original spec defines as the single-sentence description of your whole site — what an agent pulls when it has space for only one quote. Skipping it means a spec-respecting agent has nothing to quote, and may fall back to the meta description or the first <p> tag, which is rarely what you'd choose.
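To make the check concrete, here is a minimal Python sketch of one way to test for the summary blockquote. It assumes the summary is the first non-blank line after the H1 and starts with `>`. The function name `find_summary` and the sample file are hypothetical, and this is not necessarily how the scrape or any real agent detects the line.

```python
from typing import Optional


def find_summary(text: str) -> Optional[str]:
    """Return the blockquote text directly under the H1, or None if absent.

    Assumption: the summary is the first non-blank line after the H1.
    """
    seen_h1 = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines never decide the outcome
        if not seen_h1:
            # Skip anything before the H1, then mark it as found.
            if line.startswith("# "):
                seen_h1 = True
            continue
        # First non-blank line after the H1: a summary only if it is a blockquote.
        if line.startswith(">"):
            return line.lstrip(">").strip()
        return None
    return None


# Hypothetical sample file used only for illustration.
sample = "# Acme\n\n> Acme is a billing API for small teams.\n\n## Docs\n"
print(find_summary(sample))
```

A file that jumps straight from the H1 to a `##` section returns `None`, which is the case the 32% figure describes.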

Practical takeaways

Five rules if you're shipping or refactoring an llms.txt.

  1. Include the > blockquote right under the H1. Make it dense. Assume the agent uses only this line.

  2. Group links under ## headers by topic. 5–15 sections is the sweet spot. Zero sections = bare list. 50+ sections (PostHog ships 55) creates navigation overhead.

  3. Every link gets a one-sentence description unless the title alone is unambiguous (comparison pages, vendor pairs). Aim for 80%+ description ratio.

  4. Keep total size under ~30K chars unless you have a genuine docs-tree reason to go big. Agents may not parse the whole file. Front-load value.

  5. No HTML, no JavaScript, no markdown tables. Plain markdown links + descriptions. The spec is intentionally minimal. Don't over-engineer.
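The five rules above can be turned into a rough self-check. The Python sketch below is one possible linter, not an official validator: the function name `lint_llms_txt`, the regexes, and the exact warning strings are assumptions, and the thresholds (50 sections, 80% descriptions, 30K chars) are simply the numbers quoted in the rules.

```python
import re

# Matches "- [title](url)" and captures whatever follows the link.
LINK_BULLET = re.compile(r"^- \[[^\]]+\]\([^)]+\)(.*)$")
# Rough HTML tag detector (also flags <script> blocks).
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def lint_llms_txt(text: str) -> list:
    """Return a list of warnings; an empty list means no rule was tripped."""
    warnings = []
    lines = text.splitlines()
    non_blank = [ln.strip() for ln in lines if ln.strip()]

    # Rule 1: blockquote right under the H1.
    if not (len(non_blank) >= 2
            and non_blank[0].startswith("# ")
            and non_blank[1].startswith(">")):
        warnings.append("missing summary blockquote under the H1")

    # Rule 2: links grouped under ## headers, without going overboard.
    sections = sum(1 for ln in lines if ln.startswith("## "))
    if sections == 0:
        warnings.append("no ## sections (bare list)")
    elif sections >= 50:
        warnings.append("50+ sections creates navigation overhead")

    # Rule 3: aim for an 80%+ description ratio.
    bullets = [m for m in (LINK_BULLET.match(ln.strip()) for ln in lines) if m]
    described = 0
    for m in bullets:
        rest = m.group(1).strip()
        if rest.startswith(":") and rest[1:].strip():
            described += 1
    if bullets and described / len(bullets) < 0.8:
        warnings.append("description ratio below 80%")

    # Rule 4: keep the file under roughly 30K characters.
    if len(text) > 30_000:
        warnings.append("file larger than ~30K chars")

    # Rule 5: no HTML, no markdown tables.
    if HTML_TAG.search(text):
        warnings.append("contains HTML")
    if any(ln.lstrip().startswith("|") for ln in lines):
        warnings.append("contains a markdown table")

    return warnings


# Hypothetical file that follows all five rules.
good = (
    "# Acme\n\n> Acme is a hypothetical billing API.\n\n## Docs\n\n"
    "- [Quickstart](https://acme.example/start): Install and send a first request\n"
    "- [Auth](https://acme.example/auth): API keys and scopes\n"
)
print(lint_llms_txt(good))
```

Treat the output as a prompt for manual review rather than a verdict: rule 3 explicitly allows undescribed links when the title alone is unambiguous, and a large docs tree can justify going past 30K chars.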

What we don't yet know

Three honest open questions.

Does file size matter at the upper end?

PostHog ships 284K chars, the largest file in the sample, and is widely cited in AI search results. But correlation is not causation here: PostHog may be cited because of its content rather than its llms.txt structure.

Is the > blockquote actually pulled by any major LLM today?

We don't have observable evidence. The spec defines it, but Lighthouse audits for the H1 and not specifically for the summary blockquote. Whether ChatGPT, Claude, or Perplexity actually use that line is an open question.

Is one big file better than a hub plus per-section files?

Several big docs sites are now experimenting with the split approach (/docs/llms.txt, /api/llms.txt). No clear winner. The spec doesn't address it. We may know more once Lighthouse adds more audits to the Agentic Browsing category.

The dataset

Raw scrape data — 39 valid files.

Probed via curl against https://{domain}/llms.txt with a 6-second timeout. Validated as HTTP 200, content greater than 100 characters, content does not start with < (filters HTML 404s). Sections counted as lines matching ^## . Description ratio computed as the percentage of bullet lines containing a : after the link (a proxy for "has a description"). Licensed CC0 — use freely.
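For readers who want to reproduce the probe, here is a rough Python equivalent of the validation step. The original scrape used curl; this sketch swaps in the standard library's `urllib.request` instead, and the function names `is_valid_llms_txt` and `probe` are hypothetical.

```python
import urllib.request
from typing import Optional


def is_valid_llms_txt(status: int, body: str) -> bool:
    """Apply the three checks described above to one response."""
    return (
        status == 200            # HTTP 200 only
        and len(body) > 100      # more than 100 characters of content
        and not body.startswith("<")  # filters HTML 404 pages
    )


def probe(domain: str, timeout: float = 6.0) -> Optional[str]:
    """Fetch https://{domain}/llms.txt; return the body if valid, else None."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers URLError, HTTPError (4xx/5xx), and timeouts.
        return None
    return body if is_valid_llms_txt(status, body) else None
```

Calling `probe("docs.stripe.com")` would return the file body or `None`. Results will differ from the table below if a site has changed its file, and `urllib` follows redirects by default, which may not match the original curl settings.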

domain                chars    sections  links  summary
docs.railway.app      37,214   15        236    yes
planetscale.com       100,038  3         503    yes
clerk.com             19,087   5         78     yes
frase.io              9,693    10        62     yes
docs.stripe.com       93,238   26        528    no
nuxt.com              51,916   4         320    yes
neon.tech             27,721   19        185    yes
posthog.com           284,256  55        2,594  yes
python.langchain.com  200,238  2         1,384  yes
docs.anthropic.com    166,613  3         1,544  no
vercel.com            166,728  —         —      —
mantine.dev           41,182   12        414    no
weaviate.io           36,687   —         —      —
pinecone.io           35,651   12        242    yes
bun.sh                33,272   2         317    no
sentry.io             20,420   0         17     no
athenahq.ai           12,329   2         78     yes
scrunchai.com         12,307   5         81     no
qdrant.tech           11,748   2         61     yes
cursor.com            9,476    20        167    no
linear.app            9,443    2         141    yes
framer.com            8,148    6         40     yes
intercom.com          8,047    14        63     yes
nextjs.org            7,464    4         34     yes
notion.com            6,930    7         49     yes
resend.com            5,601    14        45     yes
composio.dev          4,759    12        26     no
honeycomb.io          4,734    13        18     yes
amplitude.com         2,774    7         6      no
prisma.io             2,409    2         13     yes
crewai.com            2,160    4         27     yes
otterly.ai            2,033    4         13     yes
svelte.dev            1,676    —         —      —
railway.app           1,405    3         9      yes
supabase.com          1,258    2         19     no
continue.dev          546      1         5      yes
moz.com               61,882   0         139    no
tryprofound.com       13,019   —         —      —
buttondown.email      996      0         4      no

Dashes (—) indicate the file was valid but our parser failed on that specific metric (typically because the file format threw off the regex assumptions). The file itself is still listed because it counts as adopted.

Methodology

We probed 62 well-known dev tool, SaaS, AI lab, and SEO platform domains via plain curl with a 6-second timeout. Each successful response was validated as HTTP 200, content length greater than 100 characters, and not starting with < (which would indicate an HTML 404 page rather than a real markdown file). Section count is the number of lines matching ^## . Link count is the number of https?:// matches in the file. Description ratio is the percentage of ^- bullet lines containing a colon after the link, used as a proxy for "this link has a description." All measurements are point-in-time on May 21, 2026, and may not reflect current state if a site has updated its file since.
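The three metrics can be approximated in a few lines of Python. This is a sketch of the stated definitions, not the original parser: the function name `measure`, the exact "colon after the link" regex, and the sample text are assumptions.

```python
import re


def measure(text: str) -> dict:
    """Compute section count, link count, and description ratio for one file."""
    lines = text.splitlines()

    # Sections: lines matching ^## (two hashes followed by a space).
    sections = sum(1 for ln in lines if re.match(r"^## ", ln))

    # Links: every http:// or https:// occurrence anywhere in the file.
    links = len(re.findall(r"https?://", text))

    # Description ratio: share of "- " bullets with a colon after the
    # closing parenthesis of a markdown link. This is only a proxy; a bullet
    # with two links can be counted as described because of the second URL.
    bullets = [ln for ln in lines if ln.startswith("- ")]
    described = [b for b in bullets if re.search(r"\]\([^)]*\).*:", b)]
    ratio = round(100 * len(described) / len(bullets), 1) if bullets else 0.0

    return {"sections": sections, "links": links, "description_ratio": ratio}


# Hypothetical sample: two sections, three links, two of them described.
sample = (
    "# Acme\n\n> Summary.\n\n## Docs\n\n"
    "- [Start](https://acme.example/start): First steps\n"
    "- [API](https://acme.example/api)\n\n"
    "## Blog\n\n"
    "- [News](https://acme.example/news): Updates\n"
)
print(measure(sample))
```

Because link count is a raw `https?://` match, it also counts URLs that appear outside bullets, which is one reason link totals can exceed the number of bullet lines.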

This dataset is published under the CC0 1.0 license — public domain, no attribution required. Use freely.