How many popular dev and SaaS sites have an llms.txt file?

Across 62 well-known dev, SaaS, AI, and SEO sites scraped in May 2026, 39 had a valid llms.txt file — a 63% adoption rate. Coverage is highest among AI labs and dev tool companies; lowest among legacy SEO platforms.

What is the summary block the llms.txt spec defines?

A markdown blockquote line starting with > that comes directly after the H1. Per Jeremy Howard's September 2024 spec, this is the short summary an AI agent should pull as the description of the whole site. The spec marks only the H1 as strictly required, but the blockquote is the next element it defines. 32% of files in this scrape skip it.

Does file size matter for llms.txt?

Less than structural quality. The smallest valid file in this sample was 546 chars (Continue.dev), the largest 284,256 chars (PostHog) — a 520× range. Median is roughly 12K chars. Front-loading value into the first 5K chars matters more than total length.

What does the Google Lighthouse llms.txt audit check?

Lighthouse 13.3 added an Agentic Browsing category that checks four things: the file is present (marked N/A if missing), the file has an H1, the file is not too short, and the file contains at least one link. The audit is for agentic browsing readiness, not Google Search ranking.
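
The four checks can be approximated in a few lines. This is a rough sketch, not Lighthouse's actual implementation: the function name `agentic_checks`, the `min_chars` default, and both regular expressions are assumptions made for illustration, since the audit's real cutoff for "too short" is not stated here.

```python
import re
from typing import Optional


def agentic_checks(text: Optional[str], min_chars: int = 100) -> dict:
    """Approximate the four checks. Pass None when no file was found."""
    if text is None:
        # File missing: the remaining checks do not apply.
        return {"present": False, "has_h1": None,
                "long_enough": None, "has_link": None}
    return {
        "present": True,
        # An H1 is a line that starts with a single "#" followed by a space.
        "has_h1": re.search(r"^# \S", text, flags=re.MULTILINE) is not None,
        # min_chars is a placeholder, not Lighthouse's real cutoff.
        "long_enough": len(text.strip()) >= min_chars,
        # Any markdown link of the form [label](target) counts.
        "has_link": re.search(r"\[[^\]]+\]\([^)\s]+\)", text) is not None,
    }
```

Returning `None` for the dependent checks mirrors the "N/A if missing" behavior described above, so a missing file is not reported as three separate failures.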

What are the most common llms.txt mistakes?

Three patterns dominate: (1) skipping the summary blockquote the spec defines (32% of files), (2) shipping a bare wall of links with no section structure (7%), and (3) shipping the file but never describing what each link contains, leaving AI agents to guess from URL structure alone.

The AEO Report

Original research

No. 07 · Discoverability · Original scrape data

llms.txt in the wild.

What 62 popular dev, SaaS, AI, and SEO sites are actually shipping — and the three things 90% are getting right.

David Corey, Editor Published 2026-05-21

Scraped llms.txt from 62 well-known tech sites — AI labs, dev tools, SaaS platforms, SEO incumbents. The goal: see what's actually being shipped now that Google Lighthouse 13.3 added llms.txt as a formal audit under a new "Agentic Browsing" category.

Important framing up front. The Lighthouse audit is for agentic browsing readiness — not Google Search ranking. Google Search has separately stated llms.txt isn't needed for AI features. So this isn't "Google now uses llms.txt for SEO." It's "Google now formally measures whether your site is legible to AI agents." Different surface, different optimization, same file.

Adoption rate: 63%. 39 of the 62 sites had a valid llms.txt file, higher than expected. Coverage is solid among AI labs and dev tool companies, with notable gaps among legacy SEO incumbents (Semrush, Backlinko, SEJ: none had a working file).

The numbers

Adoption is solid. Quality varies wildly.

Adoption rate: 39 of 62 sites (63%) have a valid llms.txt
Has the > summary blockquote: 68%, meaning 32% omit the summary the spec places directly under the H1
Has at least one ## section: 92% — most files have at least some topic structure
Has an H1: 90%
Zero structure (bare list of links): 7% — rarer than expected, but the offenders are large sites
Size range: 546 chars (Continue.dev) → 284,256 chars (PostHog), a roughly 520× spread
Median size: ~12K chars

The good ones

Five files that nail it.

Each of these treats llms.txt as a curated map for an agent — not a sitemap dump.

docs.railway.app · 37K chars · 15 sections · 236 links

Every link carries a one-sentence description of what an agent will find at that URL. 99.6% description ratio. Reads like a hand-curated guide, not an export.

clerk.com · 19K chars · 5 sections · 78 links

Tight, complete, 100% described. Proof that you don't need 100K chars to ship a useful file. Five sections is enough when each one earns its place.

planetscale.com · 100K chars · 3 sections · 503 links

Big file, but organized. 75% description ratio. Shows that "large" isn't the same as "lazy": they grouped about 500 links under 3 well-named sections rather than dumping them flat.

frase.io · 10K chars · 10 sections · 62 links

Short, focused, 100% described. The cleanest small-file example in the sample.

docs.stripe.com · 93K chars · 26 sections · 528 links

Follows Stripe's docs information architecture exactly. 92% description ratio. The file teaches the agent the same mental model Stripe's human readers use.

The weak ones

Three patterns to avoid.

1. The wall of bare links

Sentry ships a 20K-char file with zero ## sections and only 17 links. Moz ships 62K chars, zero sections, 139 bare links. Both files look complete by size, but to an AI agent looking for navigation cues, they're barely better than a sitemap with markdown wrappers. No curation, no descriptions, no signal about what matters.

2. The too-thin file

Continue.dev's file is 546 characters total — basically just a description and 5 links. Probably auto-generated from minimal config and forgotten about. Buttondown ships a 996-char file with no sections. These look like the team enabled an "llms.txt generator" plugin and never refined the output.

3. The missing summary

32% of files in this sample skip the > summary blockquote. That's the line the original spec defines as the short description of your whole site: what an agent pulls when it has space for only one quote. The spec marks only the H1 as strictly required, so these files are technically valid, but skipping the blockquote means a spec-respecting agent has nothing to quote and may fall back to the meta description or the first <p> tag, which is rarely what you'd choose.
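
To make the "nothing to quote" case concrete, here is a minimal sketch of how an agent might pull the summary line. The helper name `extract_summary` is hypothetical, and real agents may parse the file differently, if they read it at all. It only reads the first line of the blockquote and assumes nothing but blank lines may sit between the H1 and the summary.

```python
from typing import Optional


def extract_summary(text: str) -> Optional[str]:
    """Return the blockquote that directly follows the H1, or None."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Find the first H1 line ("# Title"); "## ..." does not match.
    h1_index = next(
        (i for i, ln in enumerate(lines) if ln.startswith("# ")), None
    )
    if h1_index is None:
        return None
    # Skip blank lines after the H1, then expect a "> ..." line.
    for ln in lines[h1_index + 1:]:
        if not ln:
            continue
        if ln.startswith(">"):
            return ln[1:].strip()
        return None  # Something else came first: no summary to quote.
    return None
```

A file that opens with the H1 and goes straight into a `##` section returns `None`, which is the situation roughly a third of the sample puts an agent in.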

Practical takeaways

Five rules if you're shipping or refactoring an llms.txt.

01. Include the > blockquote right under the H1. Make it dense. Assume the agent uses only this line.
02. Group links under ## headers by topic. 5–15 sections is the sweet spot. Zero sections = bare list. 50+ sections (PostHog ships 55) creates navigation overhead.
03. Every link gets a one-sentence description unless the title alone is unambiguous (comparison pages, vendor pairs). Aim for 80%+ description ratio.
04. Keep total size under ~30K chars unless you have a genuine docs-tree reason to go big. Agents may not parse the whole file. Front-load value.
05. No HTML, no JavaScript, no markdown tables. Plain markdown links + descriptions. The spec is intentionally minimal. Don't over-engineer.
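
The five rules translate fairly directly into a lint pass. The sketch below is one possible reading, not an official validator: the function name `lint_llms_txt` and every regular expression are assumptions, and the thresholds (5–15 sections, 80% descriptions, 30K chars) are simply the rules of thumb above. It cannot judge the rule 03 exception for titles that are unambiguous on their own.

```python
import re


def lint_llms_txt(text: str) -> list[str]:
    """Return one warning per rule of thumb the file appears to break."""
    warnings = []
    lines = text.splitlines()
    non_blank = [ln.strip() for ln in lines if ln.strip()]

    # Rule 01: a "> summary" line should directly follow the H1.
    has_summary = (
        len(non_blank) >= 2
        and non_blank[0].startswith("# ")
        and non_blank[1].startswith(">")
    )
    if not has_summary:
        warnings.append("01: no summary blockquote directly under the H1")

    # Rule 02: 5-15 "## " sections.
    sections = sum(1 for ln in lines if ln.startswith("## "))
    if not 5 <= sections <= 15:
        warnings.append(f"02: {sections} sections (suggested range is 5-15)")

    # Rule 03: at least 80% of link bullets carry a ": description".
    bullets = [ln for ln in lines if re.match(r"^\s*[-*] \[", ln)]
    described = [ln for ln in bullets if re.search(r"\]\([^)]*\)\s*:\s*\S", ln)]
    if bullets and len(described) / len(bullets) < 0.8:
        warnings.append("03: fewer than 80% of links have a description")

    # Rule 04: stay under roughly 30K characters.
    if len(text) > 30_000:
        warnings.append(f"04: {len(text):,} chars (suggested ceiling is ~30K)")

    # Rule 05: no HTML tags and no markdown table rows.
    has_html = re.search(r"</?[a-zA-Z][a-zA-Z0-9]*(\s[^>]*)?>", text) is not None
    has_table = any(ln.lstrip().startswith("|") for ln in lines)
    if has_html or has_table:
        warnings.append("05: HTML or a markdown table found")

    return warnings
```

An empty list means the file passes all five heuristics. Warnings are prefixed with the rule number so they can be matched back to the list above.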

What we don't yet know

Three honest open questions.

Does file size matter at the upper end?

PostHog ships 284K chars, the largest in the sample, and is widely cited in AI search results. But correlation is not causation here: they may be cited because of their content rather than their llms.txt structure.

Is the `>` blockquote actually pulled by any major LLM today?

We don't have observable evidence. The spec defines it, but Lighthouse audits for the H1 and not specifically the summary blockquote. Whether ChatGPT, Claude, or Perplexity actually use that line is an open question.

Is one big file better than a hub plus per-section files?

Several big docs sites are now experimenting with the split approach (/docs/llms.txt, /api/llms.txt). No clear winner. The spec doesn't address it. We may know more once Lighthouse adds more audits to the Agentic Browsing category.

The dataset

Raw scrape data — 39 valid files.

Probed via curl against https://{domain}/llms.txt with a 6-second timeout. Validated as HTTP 200, content greater than 100 characters, content does not start with < (filters HTML 404s). Sections counted as lines matching ^## . Description ratio computed as the percentage of bullet lines containing a : after the link (a proxy for "has a description"). Licensed CC0 — use freely.

domain	chars	sections	links	summary
docs.railway.app	37,214	15	236	yes
planetscale.com	100,038	3	503	yes
clerk.com	19,087	5	78	yes
frase.io	9,693	10	62	yes
docs.stripe.com	93,238	26	528	no
nuxt.com	51,916	4	320	yes
neon.tech	27,721	19	185	yes
posthog.com	284,256	55	2,594	yes
python.langchain.com	200,238	2	1,384	yes
docs.anthropic.com	166,613	3	1,544	no
vercel.com	166,728	—	—	—
mantine.dev	41,182	12	414	no
weaviate.io	36,687	—	—	—
pinecone.io	35,651	12	242	yes
bun.sh	33,272	2	317	no
sentry.io	20,420	0	17	no
athenahq.ai	12,329	2	78	yes
scrunchai.com	12,307	5	81	no
qdrant.tech	11,748	2	61	yes
cursor.com	9,476	20	167	no
linear.app	9,443	2	141	yes
framer.com	8,148	6	40	yes
intercom.com	8,047	14	63	yes
nextjs.org	7,464	4	34	yes
notion.com	6,930	7	49	yes
resend.com	5,601	14	45	yes
composio.dev	4,759	12	26	no
honeycomb.io	4,734	13	18	yes
amplitude.com	2,774	7	6	no
prisma.io	2,409	2	13	yes
crewai.com	2,160	4	27	yes
otterly.ai	2,033	4	13	yes
svelte.dev	1,676	—	—	—
railway.app	1,405	3	9	yes
supabase.com	1,258	2	19	no
continue.dev	546	1	5	yes
moz.com	61,882	0	139	no
tryprofound.com	13,019	—	—	—
buttondown.email	996	0	4	no

Dashes (—) indicate the file was valid but our parser failed on that specific metric (typically because the file format threw off the regex assumptions). The file itself is still listed because it counts as adopted.
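
The headline size figures can be re-derived from the chars column. The snippet below is a small worked check using only the 39 values in the table above and the 62 probed sites. The variable names are illustrative.

```python
from statistics import median

# Character counts copied from the table above, in table order (39 files).
chars = [
    37214, 100038, 19087, 9693, 93238, 51916, 27721, 284256, 200238,
    166613, 166728, 41182, 36687, 35651, 33272, 20420, 12329, 12307,
    11748, 9476, 9443, 8148, 8047, 7464, 6930, 5601, 4759, 4734, 2774,
    2409, 2160, 2033, 1676, 1405, 1258, 546, 61882, 13019, 996,
]
sites_probed = 62

adoption_pct = round(100 * len(chars) / sites_probed)  # 39 of 62
median_chars = median(chars)                           # middle of 39 values
size_ratio = max(chars) / min(chars)                   # PostHog vs Continue.dev

print(f"adoption: {adoption_pct}%")                    # adoption: 63%
print(f"median: {median_chars:,} chars")               # median: 11,748 chars
print(f"largest/smallest: {size_ratio:.1f}x")          # largest/smallest: 520.6x
```

The median lands on qdrant.tech's 11,748 chars, which is where the "~12K" figure comes from.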

Methodology

Probed 62 well-known dev tool, SaaS, AI lab, and SEO platform domains via plain curl with a 6-second timeout. Each successful response was validated as HTTP 200, content length greater than 100 characters, and not starting with < (which would indicate an HTML 404 page rather than a real markdown file). Section count is the number of lines matching ^## . Link count is the number of https?:// matches in the file. Description ratio is the percentage of ^- bullet lines containing a colon after the link, used as a proxy for "this link has a description." All measurements are point-in-time on May 21, 2026, and may not reflect current state if a site has updated their file since.
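
For anyone reproducing the scrape, the steps above can be sketched in Python. This is an approximation and not the script used for this report: the original probe used curl, so `urllib` is a stand-in, and the function names (`probe`, `is_valid_llms_txt`, `measure`) are hypothetical. The description-ratio regex is one interpretation of "a colon after the link", requiring the colon right after the link's closing parenthesis.

```python
import re
from urllib.request import urlopen


def probe(domain: str) -> tuple:
    """Fetch https://{domain}/llms.txt with a 6-second timeout."""
    try:
        with urlopen(f"https://{domain}/llms.txt", timeout=6) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers HTTP errors, DNS failures, refused connections, and timeouts.
        return None, ""


def is_valid_llms_txt(status, body: str) -> bool:
    """HTTP 200, more than 100 characters, and not an HTML error page."""
    return status == 200 and len(body) > 100 and not body.startswith("<")


def measure(body: str) -> dict:
    """Compute the per-file metrics described in the methodology."""
    lines = body.splitlines()
    bullets = [ln for ln in lines if ln.startswith("- ")]
    # Assumption: "described" means a colon directly after "[label](url)".
    described = [ln for ln in bullets if re.search(r"\]\([^)]*\)\s*:", ln)]
    return {
        "chars": len(body),
        "sections": sum(1 for ln in lines if re.match(r"^## ", ln)),
        "links": len(re.findall(r"https?://", body)),
        "description_ratio": (
            100 * len(described) / len(bullets) if bullets else None
        ),
    }
```

A typical run would call `probe(domain)`, pass the result through `is_valid_llms_txt`, and only then call `measure`. Files that use `*` bullets or indented lists would not match the `- ` prefix, which is one way a valid file can end up with unparsed metrics.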

This dataset is published under the CC0 1.0 license — public domain, no attribution required. Use freely.