Published 2026-05-27
How ChatGPT Cites Sources — And How to Get Your Site in the Index
NO. 14 · ENGINE MECHANICS
ChatGPT doesn't crawl the web itself — it cites via Bing's index, which means the optimization playbook is schema markup, source-quality signals, and llms.txt, not the "off-site presence" vendors sell for other engines.
ChatGPT citation works through Bing's web index — not through OpenAI's own crawler, not through Reddit mentions, not through backlink velocity. When ChatGPT's search mode pulls a source card, it's querying the same index that powers Bing search results, which means the optimization path is schema markup that Bing parses, content structure that Bing extracts as snippets, and llms.txt files that signal machine-readable intent. The vendor narrative that AEO requires a separate discipline with proprietary tools misses this mechanical reality: 80% of the work is Bing SEO you're likely already doing, and the other 20% is the schema and llms.txt work most sites skipped because they treated it as optional, until Google Lighthouse formalized both in its Agentic Browsing audit category in May 2026.
Here's the honest map of how ChatGPT actually pulls sources, what signals it inherits from Bing's index, and where the optimization gaps sit.
What "Getting Cited by ChatGPT" Actually Means
The three citation modes ChatGPT uses
A "citation" in ChatGPT isn't a single behavior — it splits into three rendering modes depending on how the user interacts with the interface.
Inline source cards appear when ChatGPT search mode is active — the small boxes beneath the answer text that show the domain name, page title, and a thumbnail or favicon. These pull directly from Bing's index metadata: the <title> tag, Organization schema's name property, and any Article schema's headline field. If your schema is incomplete or missing, the card renders generically — often just the domain root and a truncated URL, not the branded entity name you want displayed.
Voice-mode spoken attribution works differently. When a user asks ChatGPT a question in voice mode and it cites your site, it reads the source name aloud ("According to [Your Brand]"), but empirically it does so mostly for pages that include SpeakableSpecification schema. Without it, ChatGPT either skips the attribution entirely or falls back to generic phrasing like "According to one source." The schema tells the engine which text fragment to read.
Hallucinated references — the failure mode — occur when ChatGPT generates an answer without search mode enabled, or when the query doesn't trigger a Bing index lookup. The model invents plausible-sounding URLs or publication names that don't exist. Users can't click through. This isn't an optimization problem; it's a reminder that citation only happens when search mode is active — which means your page must be in Bing's index before ChatGPT can reference it at all.
Why ChatGPT search citations ≠ Perplexity citations ≠ Gemini citations
The citation mechanics differ by engine, which is why cross-engine tactics don't transfer cleanly.
Perplexity runs its own web crawler and weights off-site presence heavily — Reddit threads, YouTube video embeds, GitHub repos. If your brand appears in high-authority third-party discussions, Perplexity's citation algorithm treats those mentions as source-quality signals. The optimization playbook for Perplexity includes off-site tactics: getting cited in community forums, embedding video explainers, syndicating content to platforms Perplexity indexes directly.
ChatGPT doesn't do this. It queries Bing's web index, which is an on-page index — Bing crawls your domain, parses your HTML and schema, and ranks pages based on traditional authority signals like backlinks and topical relevance. Off-site presence doesn't matter unless it generates backlinks that Bing's PageRank-equivalent algorithm already weighs. A Reddit thread mentioning your brand won't get you into ChatGPT unless that thread links to your domain and Bing crawls the link.
Gemini uses Google's index, which includes some off-site signals — YouTube embeds (Google owns YouTube), Google Scholar citations, Google Maps reviews. Vendors selling "off-site presence" packages for AEO are mostly optimizing for Gemini, not ChatGPT. The tactics work — but they don't transfer to the Bing-dependent engine.
The optimization divergence is real, and it's why blanket "AEO strategy" advice from multi-engine vendors often collapses under scrutiny. The buyer's guide breaks down which tactics apply to which engines, and which are vendor upsells for features you don't need.
The Bing web index dependency — the mechanic no one names
ChatGPT does not have its own web crawler. OpenAI's documentation explicitly states that ChatGPT search mode uses Bing's web index — the same index that powers Bing.com search results. When you optimize for ChatGPT citation, you're optimizing for Bing's index parsers.
What this means for optimization: If your page isn't indexed by Bing, ChatGPT can't cite it. If Bing last crawled your page six months ago, ChatGPT is working with six-month-old content. The recrawl cadence matters — Bing recrawls high-authority pages every 7–14 days, average pages every 30–45 days, low-authority pages every 90+ days. If you publish new content today and want ChatGPT to cite it, you're waiting for Bing's next crawl cycle unless you manually request a recrawl via Bing Webmaster Tools.
How recently does Bing need to have crawled my page? There's no official cutoff, but empirically: ChatGPT search mode pulls from Bing's active index, which includes pages crawled within the past 90 days. Pages crawled more than 120 days ago may still technically be indexed but drop in ranking signals, because Bing's algorithm assumes stale content is less relevant. The practical optimization rule: if you haven't updated a cornerstone page in six months, refresh the content, set the schema dateModified property to the date of that update, and submit a recrawl request.
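The cadence and staleness figures in the two paragraphs above can be turned into a small planning helper. This is a minimal sketch that hard-codes the article's empirical numbers (7–14, 30–45, and 90+ day recrawl windows; 90- and 120-day staleness thresholds). Bing publishes no such table, and the tier names and function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

# Empirical recrawl windows quoted in this article, in days (NOT official Bing figures).
RECRAWL_WINDOWS = {
    "high": (7, 14),      # high-authority pages
    "average": (30, 45),  # average pages
    "low": (90, None),    # low-authority pages: 90+ days, no known upper bound
}


def expected_recrawl(published: date, tier: str) -> Tuple[date, Optional[date]]:
    """Return the (earliest, latest) date a recrawl would plausibly happen for this tier."""
    low, high = RECRAWL_WINDOWS[tier]
    earliest = published + timedelta(days=low)
    latest = published + timedelta(days=high) if high is not None else None
    return earliest, latest


def crawl_status(last_crawled: date, today: date) -> str:
    """Classify a page by days since Bing's last crawl, using the 90/120-day rule of thumb."""
    age = (today - last_crawled).days
    if age <= 90:
        return "active"  # inside the window associated with Bing's active index
    if age <= 120:
        return "aging"   # between the two thresholds quoted above
    return "stale"       # may still be indexed, but with weaker ranking signals
```

For an average-authority page published on 2026-05-27, `expected_recrawl(date(2026, 5, 27), "average")` gives a window of 2026-06-26 to 2026-07-11, which is the wait a manual recrawl request is meant to shorten.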
The Source-Quality Signals Bing's Index Parses (And ChatGPT Inherits)
Schema markup — the citation bootstrap
Bing's index parsers explicitly look for structured data — its webmaster documentation lists the schema types it uses for rich results, knowledge panels, and source labeling. ChatGPT inherits these same signals because it's querying the same index.
Article schema affects whether ChatGPT displays your page as a branded source card vs. a generic URL. The headline, author, publisher, and datePublished properties populate the card's metadata. Without Article schema, the citation renders as a plain-text URL — no byline, no publish date, no brand context.
Organization schema vs. the publisher property: the difference matters. Organization schema defines your entity at the domain level: your brand name, logo, social profile links. It's what populates the "About this source" panel in Bing search results. The publisher property (nested inside Article schema, and itself typed as an Organization or Person) defines the entity responsible for the specific article. For single-author blogs, these often point to the same entity. For multi-author publications, publisher should reference the organization entity, not the individual author.
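One common way to express "the Article's publisher is our Organization" is a single JSON-LD @graph in which the Article's publisher points at the Organization node by @id. The sketch below generates that structure. The URLs, brand name, and the "/#organization" @id convention are placeholders, not requirements, and whether Bing resolves @id references for card rendering is an assumption worth checking with a structured-data validator.

```python
import json


def build_graph(site_url: str, brand: str, logo_url: str, page_url: str,
                headline: str, author_name: str, date_published: str) -> dict:
    """Article + Organization in one @graph; the Article's publisher references the org by @id."""
    org_id = site_url.rstrip("/") + "/#organization"  # hypothetical @id convention
    organization = {
        "@type": "Organization",
        "@id": org_id,
        "name": brand,  # the brand name you want shown on source cards
        "url": site_url,
        "logo": logo_url,
    }
    article = {
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "publisher": {"@id": org_id},  # the organization, not the individual author
        "datePublished": date_published,
        "mainEntityOfPage": page_url,
    }
    return {"@context": "https://schema.org", "@graph": [organization, article]}


def to_script_tag(data: dict) -> str:
    """Serialize as a JSON-LD block ready to paste into the page <head>."""
    return '<script type="application/ld+json">\n' + json.dumps(data, indent=2) + "\n</script>"
```

Keeping the Organization node identical on every page and varying only the Article node is what makes Organization schema a one-time, domain-level setup.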
Which schema types does Bing parse for source labeling? Per Bing's documentation: Article, NewsArticle, BlogPosting, Organization, Person, WebPage, FAQPage, HowTo, and BreadcrumbList. ChatGPT's citation cards pull from the first four. The rest affect ranking and snippet extraction but don't directly control card rendering.
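To check which of those types a page actually ships, you can pull every JSON-LD block out of the HTML and collect its @type values. This is a minimal sketch using only the Python standard library; it does not validate the markup the way a search engine would, and the "card types" set simply mirrors the four types named above.

```python
import json
from html.parser import HTMLParser

# The four types this article says drive ChatGPT's citation cards.
CARD_TYPES = {"Article", "NewsArticle", "BlogPosting", "Organization"}


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _collect_types(node, found: set) -> None:
    # Walk nested dicts and lists (including @graph) and record every @type value.
    if isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            found.add(t)
        elif isinstance(t, list):
            found.update(x for x in t if isinstance(x, str))
        for value in node.values():
            _collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            _collect_types(item, found)


def schema_types(html: str) -> set:
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    found: set = set()
    for block in parser.blocks:
        try:
            _collect_types(json.loads(block), found)
        except json.JSONDecodeError:
            pass  # malformed JSON-LD is skipped here; a real validator would flag it
    return found


def card_types_present(html: str) -> set:
    return schema_types(html) & CARD_TYPES
```

An empty result from `card_types_present` on a page that ranks in Bing is the schema gap described below in its simplest form.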
The schema gap is the single most common reason sites that rank in Bing search results don't get cited by ChatGPT — the content is indexed, but the structured metadata that tells ChatGPT how to display the source is missing.
SpeakableSpecification — the voice-mode signal
SpeakableSpecification is schema markup that tells voice assistants and AI engines which part of your page should be read aloud. You add it as the speakable property of your page's JSON-LD (inside a <script type="application/ld+json"> block), and it points to CSS selectors or XPaths, usually targeting the first 1–3 paragraphs of your article.
Does ChatGPT's voice mode preferentially cite sites with SpeakableSpecification? Yes — empirically. When ChatGPT voice mode pulls an answer and attributes it, the spoken citation almost always comes from pages that include SpeakableSpecification. Pages without it get skipped for voice attribution, even if they rank highly in text-mode results.
How do I mark which part of my page should be read aloud? Use a CSS selector that targets your lead paragraphs. Example:
{
  "@context": "https://schema.org",
  "@type": "Article",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".article-lead", ".intro-paragraph"]
  }
}
This tells the engine: "These are the text fragments optimized for spoken extraction." The lead paragraph should be a 40–60 word direct answer — not a context-setting preamble — because that's what gets read.
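Because the selector only matters if the text it targets is a tight direct answer, it can help to generate the markup and check the lead's length in the same step. A minimal sketch, assuming your lead paragraph carries a class like `.article-lead` (a hypothetical name) and using the 40–60 word band from the paragraph above:

```python
import re


def word_count(text: str) -> int:
    return len(re.findall(r"\S+", text))


def speakable_article(headline: str, selectors: list) -> dict:
    """Article JSON-LD whose speakable property points at the lead-paragraph selectors."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": selectors,  # e.g. [".article-lead"]; class names are hypothetical
        },
    }


def lead_fits_voice(lead_text: str, low: int = 40, high: int = 60) -> bool:
    """True if the text the selector targets falls inside the 40-60 word band."""
    return low <= word_count(lead_text) <= high
```

Serialize the returned dict with `json.dumps` and place it in the page's JSON-LD block; run `lead_fits_voice` on the text of the element the selector matches.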
llms.txt — the table-stakes file ChatGPT's search mode reads
What is llms.txt? A Markdown file at your domain root (https://yoursite.com/llms.txt) that provides machine-readable context about your site: your brand positioning, primary topics, key pages, and any disambiguation notes. The name echoes robots.txt, but where robots.txt tells crawlers what they may fetch, llms.txt tells AI engines what your site is and where its most useful content lives.
Is llms.txt required for ChatGPT citation, or just recommended? As of May 2026, Google Lighthouse added an Agentic Browsing audit category that explicitly checks for llms.txt. The audit flags its absence as a failure — not a warning, a failure. This is the receipt that settles the "is it optional?" debate: the most authoritative source possible — Google's own browser audit tool — now treats llms.txt as table-stakes.
What should I put in my llms.txt file? Three sections:
- Brand context — one-sentence positioning statement
- Primary topics — the 3–5 categories your content covers
- Key pages — URLs for your most authoritative resources (about page, cornerstone guides, contact)
Example:
# The AEO Report

> Independent research on Answer Engine Optimization. We track how brands appear in ChatGPT, Perplexity, Gemini, and Claude.

## Primary topics

- Answer engine optimization
- Schema markup
- AI search citation mechanics

## Key pages

- [About](https://aeoreport.com/about/)
- [AEO audit checklist](https://aeoreport.com/aeo-audit-checklist/)
- [Buyer's guide](https://aeoreport.com/buyers-guide/)
The file should be under 500 words. ChatGPT's search mode reads it when deciding how to label your source and which pages to prioritize for topic-specific queries.
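If you maintain several sites, generating and linting the file is easier than hand-editing it. The sketch below follows the llms.txt proposal's basic shape (an H1 name, a blockquote summary, then H2 sections with Markdown links). The "Primary topics" and "Key pages" section names, the 3–5 topic range, and the 500-word cap are this article's recommendations rather than requirements of the proposal, and the function names are hypothetical.

```python
import re


def build_llms_txt(name: str, summary: str, topics: list, key_pages: list) -> str:
    """Assemble an llms.txt: H1 name, blockquote summary, then H2 sections."""
    lines = [f"# {name}", "", f"> {summary}", "", "## Primary topics", ""]
    lines += [f"- {topic}" for topic in topics]
    lines += ["", "## Key pages", ""]
    lines += [f"- [{title}]({url})" for title, url in key_pages]  # (title, url) pairs
    return "\n".join(lines) + "\n"


def check_llms_txt(text: str, max_words: int = 500) -> list:
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be an H1 with the site name")
    if not any(line.startswith("## Key pages") for line in lines):
        problems.append("missing '## Key pages' section")
    in_topics, topic_count = False, 0
    for line in lines:
        if line.startswith("## "):
            in_topics = line.strip() == "## Primary topics"
        elif in_topics and line.startswith("- "):
            topic_count += 1
    if not 3 <= topic_count <= 5:
        problems.append(f"{topic_count} primary topics; aim for 3-5")
    words = len(re.findall(r"\S+", text))
    if words >= max_words:
        problems.append(f"{words} words; keep it under {max_words}")
    return problems
```

Write the builder's output to the web root as `llms.txt`, then run `check_llms_txt` on the live file whenever it changes.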
The Content Patterns ChatGPT's Search Mode Preferentially Extracts
Definition-first answer structure — the snippet ChatGPT pulls
ChatGPT search mode extracts the first 1–2 sentences of your page as the snippet it shows in citation cards. If those sentences are vague scene-setting — "In today's fast-paced digital landscape, businesses are looking for..." — the snippet is useless, and ChatGPT often skips the source for a competitor whose lead paragraph directly answers the query.
Why do "definition boxes" get cited more than listicles? Because the extraction algorithm looks for declarative statements in the first 100 words. A definition box — a 40–60 word paragraph that states the answer upfront — matches that pattern. A listicle that starts with a preamble and buries the answer in item #3 doesn't.
What does a "ChatGPT-optimized" first paragraph look like? Sentence 1: the direct answer. Sentences 2–3: the qualification or stakes. Then expand. Example:
ChatGPT citation works through Bing's web index — not through OpenAI's own crawler, not through Reddit mentions, not through backlink velocity. When ChatGPT's search mode pulls a source card, it's querying the same index that powers Bing search results, which means the optimization path is schema markup that Bing parses, content structure that Bing extracts as snippets, and llms.txt files that signal machine-readable intent.
First sentence: the claim. Second sentence: the mechanic and what it implies for optimization. No preamble, no windup, no "let me set the context." The answer comes first.
How long should my lead paragraph be? 40–60 words for the opening statement, 100–150 words total for the full lead section. Longer than 200 words and you're diluting the snippet extraction. Shorter than 40 words and you're not giving enough context for ChatGPT to determine relevance.
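Those length targets are easy to check before publishing. A minimal sketch, assuming paragraphs in your draft are separated by blank lines and treating the first three paragraphs as the "lead section" (both assumptions; adjust `lead_paragraphs` to match your template):

```python
import re


def _words(text: str) -> int:
    return len(text.split())


def lead_report(body_text: str, lead_paragraphs: int = 3) -> dict:
    """Measure the opening paragraph and the lead section against the 40-60 / 200-word guides."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body_text) if p.strip()]
    if not paragraphs:
        return {"opening_words": 0, "lead_words": 0, "verdict": "empty"}
    opening = _words(paragraphs[0])
    lead = sum(_words(p) for p in paragraphs[:lead_paragraphs])
    if opening < 40:
        verdict = "opening too short"
    elif opening > 60:
        verdict = "opening too long"
    elif lead > 200:
        verdict = "lead section too long"
    else:
        verdict = "ok"
    return {"opening_words": opening, "lead_words": lead, "verdict": verdict}
```

The order of the checks mirrors the advice: get the opening statement into the 40–60 band first, then make sure the surrounding lead doesn't dilute it.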
FAQ schema — the multi-question citation multiplier
Does FAQ schema increase my chances of being cited for multiple queries? Yes. If your post includes 5 questions marked up with FAQPage schema, ChatGPT can cite your page for any of those 5 queries — not just the primary keyword. Each question becomes a separate entry point.
How many questions should I mark up with FAQ schema? 3–7 per post. Fewer than 3 and the schema doesn't materially expand your citation surface area. More than 7 and you're likely padding the list with low-value questions that don't match real user queries.
Can I use FAQ schema on a blog post, or only on FAQ pages? You can use it on any page type — blog posts, how-to guides, product pages. The schema type is FAQPage, but the URL structure doesn't have to literally be /faq/. Bing's parsers look for the schema markup, not the URL slug.
Example:
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does ChatGPT have its own web crawler?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. ChatGPT uses Bing's web index — not its own crawler."
      }
    }
  ]
}
Each question should be a natural-language query a user might ask. The answer should be a direct 1–2 sentence response. Don't pad the FAQ with questions no one actually asks — the schema is parsed, not just displayed.
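Hand-writing this JSON for every post invites typos, so a small generator that also enforces the 3–7 question guideline can help. A minimal sketch; the function names are hypothetical, and the range check reflects this article's recommendation, not a schema.org rule.

```python
import json


def faq_page(pairs: list) -> dict:
    """Build FAQPage JSON-LD from (question, answer) pairs, enforcing the 3-7 guideline."""
    if not 3 <= len(pairs) <= 7:
        raise ValueError(f"expected 3-7 questions, got {len(pairs)}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,  # phrase it the way a user would ask it
                "acceptedAnswer": {"@type": "Answer", "text": answer},  # 1-2 sentences
            }
            for question, answer in pairs
        ],
    }


def faq_script_tag(pairs: list) -> str:
    """Wrap the FAQPage markup in a JSON-LD script block."""
    return '<script type="application/ld+json">' + json.dumps(faq_page(pairs)) + "</script>"
```

Raising on fewer than three or more than seven pairs makes padding (or an under-built FAQ) a visible error at build time rather than a silent quality problem.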
The 40–60 word citation snippet rule
How many words does ChatGPT extract when it cites a source? Empirically: 40–60 words, occasionally up to 80 for complex topics, with the median falling around 50–60. This is the text that appears in the citation card beneath your domain name.
Should I frontload my answer or build up to it? Frontload. The snippet is pulled from the top of the page — usually the first paragraph — so if your answer is buried in paragraph 4 after three paragraphs of setup, it won't be extracted.
What happens if my lead paragraph is 200 words of context-setting? ChatGPT either truncates the snippet mid-sentence (making it incomprehensible) or skips your page for a competitor whose lead paragraph is concise and directly answers the query. The 200-word windup is an SEO habit from 2015 — when Google's algorithm rewarded "comprehensive" intros — but it's actively harmful for AI engine citation, where the snippet is the entire value proposition.
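To see why frontloading matters, here is a toy model of a word-budgeted snippet: keep whole sentences from the top of the page until a 60-word budget runs out, and if even the first sentence is too long, cut it mid-sentence. This is an illustration under that stated assumption only; it is not ChatGPT's actual extraction logic, which OpenAI does not publish.

```python
import re


def toy_snippet(text: str, budget: int = 60) -> str:
    """Toy word-budgeted snippet: keep whole leading sentences that fit the budget.

    If the first sentence alone exceeds the budget, cut it mid-sentence and add
    an ellipsis, mimicking the truncation failure described above.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > budget:
            break
        kept.append(sentence)
        used += n
    if kept:
        return " ".join(kept)
    if not sentences:
        return ""
    return " ".join(sentences[0].split()[:budget]) + "..."
```

An answer-first page survives the budget with its answer intact; a page whose first sentence is a long windup yields a fragment that never reaches the answer.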
What Doesn't Work (Despite What Vendors Claim)
Off-site presence — the Perplexity tactic that doesn't transfer
Do Reddit mentions or YouTube embeds help ChatGPT citation? No — unless they generate backlinks that Bing's index already weighs. ChatGPT queries Bing's on-page index, not social signals or third-party mentions. A Reddit thread that mentions your brand but doesn't link to your site contributes nothing to ChatGPT citation odds.
Why do vendors recommend off-site tactics for ChatGPT when it uses Bing's on-page index? Because they're selling the same playbook across all engines (Perplexity, Gemini, Claude), and off-site tactics work for Perplexity and, to a lesser extent, Gemini. The vendor premium is real: they're charging for a multi-engine strategy when only 30% of the tactics apply to ChatGPT specifically. The buyer's guide breaks down which vendors are transparent about this tradeoff and which aren't.
Keyword density — the SEO habit to break
Does repeating my target keyword improve ChatGPT citation odds? No. Bing's index uses semantic search — it understands synonyms, entity relationships, and topic clusters. Keyword stuffing (repeating "ChatGPT citation" 15 times in a 1,000-word post) doesn't improve relevance and can trigger keyword-spam penalties that hurt Bing ranking — which directly hurts ChatGPT citation odds.
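If you want to audit old posts for stuffing, density is simple arithmetic: the article's example of 15 uses of "ChatGPT citation" in a 1,000-word post works out to 1.5 occurrences per 100 words. A minimal sketch that computes it; there is no published Bing threshold, so treat the number as a prompt to reread the post, not a pass/fail score.

```python
import re


def keyword_density(text: str, phrase: str) -> float:
    """Occurrences of phrase per 100 words (case-insensitive, whole-word match)."""
    lowered = text.lower()
    words = re.findall(r"\b\w[\w'-]*\b", lowered)
    if not words:
        return 0.0
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    hits = len(re.findall(pattern, lowered))
    return 100 * hits / len(words)
```

Because Bing matches on meaning rather than exact strings, a post that reads naturally will usually land well below the stuffed example without any deliberate target.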
What's the difference between writing for Bing SEO vs. writing for ChatGPT search mode? Bing SEO rewards topical authority, backlinks, and clean HTML. ChatGPT citation adds three requirements: schema markup, llms.txt, and snippet-friendly lead paragraphs. The writing style is the same — clear, direct, declarative — but the technical layer matters more for AI engines than for traditional search.
Social signals and backlink velocity
Does ChatGPT weigh social shares or backlinks when choosing sources? Only indirectly. Bing's ranking algorithm weighs backlinks (traditional PageRank-style authority) but not social shares. ChatGPT inherits Bing's ranking, so backlinks matter, but backlink velocity (the rate at which new links accumulate) doesn't appear to be a discrete signal. A page with 50 high-quality backlinks from 2018 will typically outrank a page with 10 backlinks earned last week, assuming both pages' content is equally fresh.
What authority signals does Bing's index actually parse? Domain age, backlink count and quality, topical clustering (internal linking), E-A-T markers (author bios, citations), and schema completeness. Social signals (Twitter shares, Facebook likes) are not documented as ranking factors in Bing's webmaster guidelines — unlike Google, which used to (but no longer does) weigh Google+ shares.
The Empirical Test — What Changed Our Editorial Position
The Google Lighthouse Agentic Browsing category (May 2026)
What is Google's Agentic Browsing audit category? A new section in Lighthouse audits (as of May 2026) that checks whether a site is optimized for AI agent interactions — specifically: presence of llms.txt, schema completeness (Article, Organization, FAQPage), and SpeakableSpecification markup. Sites that fail the audit get flagged in Lighthouse reports with actionable fixes.
Does Google Lighthouse now audit llms.txt? Yes. The audit explicitly checks for a file at /llms.txt and validates its structure. This is the empirical event that shifted our editorial position from "llms.txt is recommended" to "llms.txt is table-stakes." When the most authoritative technical audit tool in the industry adds a category for a previously informal best practice, the practice is no longer optional.
What does this mean for the "is AEO real?" debate? The debate is over. Google (not a vendor, not a consultant, but Google itself) formalized agentic optimization as a discrete audit category. The claim that AI engines need their own optimization layer is now backed by the browser infrastructure layer; the vendor claim that this layer requires proprietary tools is not. The remaining question isn't whether to optimize for AI engines, but how, and whether you need vendor tools or can execute in-house. (The audit checklist covers the in-house path.)
The 30-day Bing recrawl window
How often does Bing recrawl the average page? Every 30–45 days for established domains with moderate authority. High-authority news sites get recrawled every 7–14 days. Low-authority or infrequently updated pages: 90+ days.
If I publish new content today, when will ChatGPT be able to cite it? Depends on your domain's crawl cadence. If you're a high-authority publisher, within 7–14 days. If you're an average blog, 30–45 days. You can force a faster recrawl by submitting the URL via Bing Webmaster Tools — the "URL Inspection" tool includes a "Request Indexing" button that moves the page to the front of Bing's crawl queue. This doesn't guarantee immediate indexing, but empirically it cuts the wait from 30 days to 3–7 days.
Can I force a recrawl to speed up ChatGPT citation? Yes, via the Bing Webmaster Tools URL Inspection tool. Submit the URL, request indexing, and monitor the index status over the next week. Once Bing confirms the page is indexed (status changes from "URL submitted" to "Indexed"), ChatGPT search mode can cite it — usually within 24–48 hours.
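If you publish often, the same request can be scripted. Bing Webmaster Tools also offers a URL Submission API; the endpoint path and JSON body below (`SubmitUrl`, with `siteUrl` and `url` fields and an `apikey` query parameter) are assumptions based on Bing's published API, so confirm them against the current reference before relying on this. The sketch only builds the request; sending it needs a real API key generated in Webmaster Tools, and whether an API submission behaves exactly like the URL Inspection button is also an assumption.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for Bing Webmaster Tools' URL Submission API; verify before use.
SUBMIT_ENDPOINT = "https://ssl.bing.com/webmaster/api.svc/json/SubmitUrl"


def build_submit_request(api_key: str, site_url: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Bing to crawl page_url on the verified site_url."""
    query = urllib.parse.urlencode({"apikey": api_key})
    body = json.dumps({"siteUrl": site_url, "url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{SUBMIT_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

# To send it with a real key (network call, subject to Bing's daily quota):
# with urllib.request.urlopen(build_submit_request(KEY, SITE, PAGE)) as resp:
#     print(resp.status)
```

Either way, the follow-up is the same: watch the page's index status in Webmaster Tools before testing it in ChatGPT.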
The Honest Map — Where to Start
If you have zero schema markup today
What's the fastest win — Article schema, Organization schema, or llms.txt? Organization schema first. It's a one-time domain-level setup that benefits every page. Then llms.txt (also one-time). Then Article schema on your cornerstone pages — the posts you want cited most.
Can I add schema without a developer? Yes. WordPress plugins like Yoast SEO and Rank Math generate basic Article and Organization schema automatically. For llms.txt, you're writing a plain-text file and uploading it to your root directory — no code required. For custom schema (SpeakableSpecification, FAQPage), you'll need to edit your theme's HTML or use a plugin like Schema Pro.
If you already rank in Bing but ChatGPT never cites you
Why would Bing index my page but ChatGPT skip it? Two gaps: (1) schema markup is incomplete or missing, so ChatGPT can't render a branded source card, or (2) your lead paragraph is vague or context-heavy, so the snippet extraction fails and ChatGPT uses a competitor's clearer answer instead.
What's the schema gap vs. content structure gap? Schema gap: you rank, but the citation card is generic (just a URL, no brand name). Fix: add Article and Organization schema. Content structure gap: you rank, but ChatGPT uses a different source's snippet. Fix: rewrite your lead paragraph as a 40–60 word direct answer.
The procurement-grade checklist
What's the step-by-step sequence to optimize for ChatGPT citation?
- Verify Bing has indexed your page (search site:yoursite.com/page-slug in Bing)
- Add or validate Organization schema at the domain level
- Add or validate Article schema on the target page
- Add SpeakableSpecification pointing to your lead paragraph
- Rewrite the lead paragraph as a 40–60 word direct answer
- Add FAQ schema if the post answers 3+ discrete questions
- Create or update /llms.txt with brand context and key pages
- Request a Bing recrawl via Webmaster Tools
- Test the page in ChatGPT search mode 7–14 days later
How do I know if it's working? Test the query directly in ChatGPT with search mode enabled. If your page appears as a cited source card (not just a URL in the answer text), the optimization is working. If it doesn't appear, check (a) whether Bing has recrawled the page since your changes (use Bing Webmaster Tools' URL Inspection), and (b) whether your lead paragraph is clear and direct when read in isolation.
The vendor narrative is that AEO is a separate discipline requiring new tools — but the mechanical reality is that ChatGPT citation runs on Bing's index, which means 80% of the work is Bing SEO you're likely already doing, plus schema markup and llms.txt, which most sites skip because they were treated as optional until Google Lighthouse formalized them in May 2026. The optimization gap isn't strategy; it's execution — specifically, the schema and content structure details most sites defer because they don't yet understand the citation mechanics that make them non-negotiable.