How to Track AI Mentions

To track AI mentions, run a fixed library of prompts through ChatGPT, Perplexity, Gemini and Claude on a schedule — each prompt several times, because the answers change run to run — and log seven things: whether your brand appears, how often, where in the answer, the sentiment, your share of voice versus competitors, any wrong facts, and the sources the engine cited. Your normal analytics can't see any of this: Google Analytics and Search Console only measure what happens *after* a click, so AI mentions are invisible to them. This guide is the complete playbook — what to track, how often, manually versus with a tool, and what to actually do with what you find.

TL;DR

**AI mentions are invisible to GA4 and Search Console** — they only see post-click behavior, and an AI mention happens before any click. You need a different method entirely.
**Track seven metrics:** mention rate, share of voice, citation rate, sentiment, position, accuracy/hallucinations, and the prompts + sources themselves.
**AI is non-deterministic**, so one prompt run is a noisy snapshot. Run each prompt 3–10 times and report the average, scoring meaning, not exact words.
**Track every engine separately.** Only ~11% of cited domains overlap between ChatGPT and Perplexity for the same query — and AI answers overlap Google's top 10 by only ~8–12%. Ranking #1 on Google does not mean you're in the answer.
**Cadence:** weekly visibility checks, monthly strategic reviews. Start with 20–30 core prompts (brand *and* category), grow to 50–100.
**Then act:** fix wrong facts first, earn third-party citations, add quotations + statistics + cited sources (the proven GEO levers), and build the comparison pages you're losing.

Why tracking AI mentions matters now

AI answer engines increasingly decide which brands get recommended before a buyer ever clicks a link. Someone asks ChatGPT for "the best [your category] tool," it names three or four brands, and the user acts on that answer. If you're named, you win business you'll never see attributed in your analytics. If you're not, you lose silently. Tracking AI mentions is how you make that invisible layer visible.

The shift is measurable. Semrush's clickstream analysis (over a billion lines of US data) found outbound referral traffic from ChatGPT grew 206% in 2025, and ChatGPT now has roughly 800 million weekly active users. Adobe found US AI-referral web traffic grew more than tenfold between July 2024 and February 2025 — and over that same window the gap between AI-referred and other visitors' conversion rates shrank from 43% to just 9%. In plain terms: AI traffic is growing fast and converting nearly as well as everything else.

+206% growth in ChatGPT referral traffic in 2025 (Semrush clickstream, 1B+ lines of US data)

Your existing analytics are blind to this

GA4 and Google Search Console only see what happens after a click — sessions, conversions, queries that led to a visit. An AI mention happens inside the answer, before any click, and often without one. So a brand can be recommended thousands of times a month and see nothing in its dashboards. This is the honest reason AI mention tracking is a separate discipline, not a tab in your existing reports.

And you can't assume your Google performance carries over. Semrush found only an 8–12% overlap between what appears in AI answers and what ranks well in traditional search; Ahrefs, analyzing 15,000 queries, found just 12% of AI-cited URLs overlap Google's top 10. You can rank #1 on Google and be completely invisible in AI. Semrush goes further and calls 62% of brands "technically invisible" to generative AI. The only way to know which group you're in is to measure it.

What to track: the seven metrics

AI mention tracking isn't one number — it's a small dashboard. Here are the seven metrics that matter, in priority order. The first three are your scoreboard; the rest tell you why the scoreboard reads the way it does.

1. Mention rate (coverage)

The percentage of your tracked prompt runs in which your brand appears at all. If you show up in 60 of 200 prompt runs, your mention rate is 30%. This is your headline number, the "are we in the conversation?" metric. The critical nuance: because AI is non-deterministic, a single run is a noisy snapshot. Best practice (per AI-evaluation guidance) is to run each prompt 3–10 times and report the mean, scoring semantic equivalence rather than exact-string matches: your full brand name, an abbreviation, a slight misspelling, and "the tool from yourdomain.com" should all count as one mention.
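Here's a minimal sketch of that calculation. The brand name, alias list and sample answers are made up for illustration, and case-insensitive substring matching is only a crude stand-in for real semantic scoring.

```python
# Hypothetical alias list: every spelling that should count as one mention.
ALIASES = ["Acme Analytics", "AcmeAnalytics", "Acme Analytcs", "acme.example"]


def mentions_brand(answer: str, aliases: list[str]) -> bool:
    """True if any alias appears in the answer, ignoring case."""
    lowered = answer.lower()
    return any(alias.lower() in lowered for alias in aliases)


def mention_rate(runs: list[str], aliases: list[str]) -> float:
    """Fraction of prompt runs in which the brand appears at least once."""
    if not runs:
        return 0.0
    hits = sum(mentions_brand(answer, aliases) for answer in runs)
    return hits / len(runs)


# Four runs of the same prompt (invented answers).
runs = [
    "Top picks: Acme Analytics, Foo and Bar.",
    "Consider Foo or Bar.",
    "The tool from acme.example is popular.",
    "Foo is the leader.",
]
print(mention_rate(runs, ALIASES))  # 0.5
```

Averaging over runs is what turns a noisy yes/no into a rate you can trend week over week.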

2. Share of voice

Your mentions versus your competitors' across the same set of prompts. This is the win/lose number for your category — being mentioned 30% of the time means nothing if your top rival is at 70%. As rough industry benchmarks (practitioner aggregates, not peer-reviewed): a strong category share of voice is around 15–25%, category leaders sit at 35–50%, branded prompts should return 50–80%, and 30–60% is strong on non-branded prompts.
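One common way to compute it, sketched under assumptions: each brand counts at most once per run, and share of voice is that brand's count divided by all brand mentions across the same runs. Definitions vary between tools, and the brand names and answers below are invented.

```python
from collections import Counter


def share_of_voice(runs: list[str], brands: dict[str, list[str]]) -> dict[str, float]:
    """Each brand's share of all brand mentions across the same runs."""
    counts = Counter()
    for answer in runs:
        lowered = answer.lower()
        for brand, aliases in brands.items():
            if any(alias.lower() in lowered for alias in aliases):
                counts[brand] += 1  # at most one mention per brand per run
    total = sum(counts.values())
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}


# Hypothetical brands and answers.
brands = {"Acme": ["acme"], "Foo": ["foo"], "Bar": ["bar"]}
runs = ["Acme and Foo lead.", "Foo is best.", "Foo or Bar.", "Acme is solid."]
sov = share_of_voice(runs, brands)
print(sov["Foo"])  # 0.5
```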

3. Citation rate

The percentage of queries where your domain is actually cited or linked — distinct from a mention. A mention is being named; a citation is the engine using your page as a linked source. Citation is the stronger authority signal and the one that sends real traffic. As industry benchmarks for B2B SaaS: 8–15% is minimal presence, 20–30% is traction, and 40–50%+ signals category leadership; practitioners often quote a "healthy" range of 10–25%.
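If you log the URLs each answer cites, citation rate falls out of a few lines. This sketch assumes one list of cited URLs per run and treats subdomains as your own domain; `acme.example` is a placeholder.

```python
from urllib.parse import urlparse


def cited_domains(urls: list[str]) -> set[str]:
    """Normalize cited URLs to lowercase hostnames without a leading 'www.'."""
    domains = set()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def citation_rate(runs: list[list[str]], own_domain: str) -> float:
    """Fraction of runs whose cited sources include own_domain or a subdomain."""
    if not runs:
        return 0.0

    def cites_us(urls: list[str]) -> bool:
        return any(d == own_domain or d.endswith("." + own_domain)
                   for d in cited_domains(urls))

    return sum(cites_us(urls) for urls in runs) / len(runs)


runs = [
    ["https://www.acme.example/pricing", "https://reddit.com/r/x"],
    ["https://docs.acme.example/a"],
    ["https://en.wikipedia.org/wiki/X"],
    [],  # an answer with no citations still counts as a run
]
print(citation_rate(runs, "acme.example"))  # 0.5
```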

4. Sentiment

How you're framed when you do appear — positive, neutral, or negative. The engine can name you as the category leader, as a "cheaper alternative to X," or as the option with a caveat ("but users report slow support"). Being mentioned negatively can be worse than not being mentioned, so framing is its own metric.

5. Position

Where you land in the answer: first brand named, or an afterthought in a list of seven. The peer-reviewed GEO study (Aggarwal et al., KDD 2024) formalizes this with a "position-adjusted word count" that weights mentions by how prominently and early they appear. Practically: first-named brands get clicked and remembered; brands buried at the bottom rarely do.
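A simple proxy you can log by hand or in code is the order of first appearance among the brands you track. This is not the GEO study's position-adjusted word count, just a lightweight stand-in; the brand names are hypothetical.

```python
from typing import Optional


def first_mention_rank(answer: str, brands: list[str], target: str) -> Optional[int]:
    """1-based rank of target by order of first appearance; None if absent."""
    lowered = answer.lower()
    first_seen = {b: lowered.find(b.lower()) for b in brands}
    # Keep only brands that appear, ordered by where they first show up.
    order = sorted((b for b in brands if first_seen[b] >= 0), key=first_seen.get)
    return order.index(target) + 1 if target in order else None


answer = "For most teams we suggest Foo, then Acme, with Bar as a budget pick."
print(first_mention_rank(answer, ["Acme", "Foo", "Bar"], "Acme"))  # 2
```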

6. Accuracy and hallucination flags

Whether the engine is telling the truth about you — right pricing, real features, correct founding date. AI confidently inventing facts about brands is a real, recurring risk, which is why accuracy deserves its own tracked metric. A wrong price or a feature you don't offer, repeated across thousands of answers, does active damage. Catch it early and fix it at the source (more on that below).
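As a rough illustration of automated flagging, the sketch below pulls dollar amounts out of an answer and flags any that are not on your real price list. It is deliberately crude (it cannot tell whose price a number belongs to), and the prices are invented.

```python
import re


def price_flags(answer: str, true_prices: list[float]) -> list[float]:
    """Dollar amounts mentioned in the answer that are not real prices."""
    found = {float(m) for m in re.findall(r"\$(\d+(?:\.\d{1,2})?)", answer)}
    return sorted(found - set(true_prices))


# Hypothetical price list: $29 and $79 plans.
print(price_flags("Acme starts at $29/month, Pro is $99.", [29.0, 79.0]))  # [99.0]
```

Anything flagged still needs a human read before you treat it as a hallucination.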

7. The prompts and sources themselves

Not a score, but the most actionable raw material you have: which exact prompts trigger you (and which don't), and which sources the engine cited to build each answer. The cited sources are a literal to-do list — they tell you which third-party pages, Reddit threads, and review sites you need to win to change the answer.

The honest hierarchy

If you only track one thing, track mention rate. If you track three, add share of voice and citation rate. Everything else — sentiment, position, accuracy, sources — explains and directs the top three. Don't drown in metrics before you have a baseline on the headline number.

The evidence behind what moves the needle

Tracking only matters if you can act on it — and the strongest evidence for what actually changes an AI answer is the peer-reviewed GEO paper (Aggarwal, Murahari et al., accepted at KDD 2024). The authors built GEO-bench, roughly 10,000 queries across nine datasets, and measured how different content techniques changed a page's visibility inside generated answers.

up to +40% visibility lift in generative-engine answers from the best GEO techniques (Aggarwal et al., KDD 2024)

The headline result: well-chosen techniques boosted visibility by up to 40%. Here are the top-performing tactics by relative improvement in position-adjusted word count — and note what *didn't* work.

| Technique | Relative visibility lift | Verdict |
| --- | --- | --- |
| Add quotations from credible experts | +27.8% | Biggest lever |
| Add statistics | +25.9% | Strong |
| Improve fluency / clarity | +25.1% | Strong |
| Cite your sources | +24.9% | Strong |
| Keyword stuffing | No improvement | Doesn't work |

This is why tracking sources and position pays off: once you can see which competitors get cited ahead of you and which sources feed those answers, you know exactly where to add quotations, statistics and citations — the levers proven to move visibility. Measurement without these levers is just watching the scoreboard; the levers without measurement is guessing. You want both.

Why you have to track every engine separately

The single most common tracking mistake is watching one engine and assuming the rest look the same. They don't. The engines pick sources differently, prefer different domains, and overlap surprisingly little.

**ChatGPT search** uses Bing's live index plus partner content, and surfaces a handful of clickable links per answer. But it only triggers its live search feature on roughly 34.5% of queries (down from ~46% in late 2024) — so most answers still come from training data, not the live web. That means a fresh page can be invisible in ChatGPT for a while even when it's perfectly crawlable.
**Perplexity** runs a live web search on *every* prompt — it's citation-first by design — selecting ~3–4 sources through a reranking pipeline that weighs relevance, content quality, domain authority and freshness. It averages the most citations per response of any platform, and runs a Publishers' Program with revenue share and analytics.
**Gemini and Google AI Overviews** lean on Google's index and notably favor YouTube among sources.
**Claude** tends to favor blogs and editorial content.

The overlap is smaller than you'd guess

Source preferences diverge hard: ChatGPT favors Wikipedia, Perplexity favors Reddit, Google AI Overviews favor YouTube, Claude favors blogs. For the same query, only about 11% of cited domains overlap between ChatGPT and Perplexity. A single blended "AI visibility" number hides exactly where you're losing — you can dominate one engine and be absent from the next. Track each one on its own.
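To check overlap on your own prompts, compare the sets of domains each engine cited. The sketch below uses Jaccard overlap (shared domains divided by all distinct domains); that choice is an assumption, since the ~11% figure's exact method isn't given here, and the domain sets are invented.

```python
def domain_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard overlap: shared domains divided by all distinct domains."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical cited domains for the same prompt on two engines.
chatgpt = {"wikipedia.org", "acme.example", "g2.com"}
perplexity = {"reddit.com", "g2.com", "youtube.com"}
print(domain_overlap(chatgpt, perplexity))  # 0.2
```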

This is also why "I checked ChatGPT once and I was there" is not tracking. You need every engine, multiple runs, on a schedule — which is precisely where doing it by hand falls apart.

How often to track, and which prompts to use

Cadence: weekly checks, monthly reviews

The industry-standard rhythm is weekly visibility checks plus a monthly strategic review. Daily tracking is only worth it for high-risk, fast-moving categories or active crisis monitoring. The reasoning is concrete: AI models and indexes refresh on roughly monthly cycles, and 40–60% of cited sources change month to month (per Search Engine Land). Weekly catches real trends without drowning you in the run-to-run noise that non-determinism creates.

Don't monitor AI like social media

24/7, real-time mention monitoring is the right model for Twitter/X — it's the wrong model for AI answers. The engines don't update minute to minute; they update on roughly monthly cycles. Watching a dashboard hourly tells you nothing but noise. Weekly is the signal.

Your prompt library: brand AND category

Start with 20–30 core prompts spread across funnel stages, then grow to 50–100 as you learn which ones matter. The mistake to avoid: tracking only brand-name queries. AI frequently describes brands by category, feature, or problem rather than by name, so you need prompts like:

**Branded** — "What is [your brand]?", "Is [your brand] any good?", "[your brand] pricing" — expect a high share of voice here (50–80% is the benchmark).
**Category** — "best [category] tools," "top [category] software for small business" — the prompts where you're competing for the recommendation.
**Comparison** — "[you] vs [competitor]," "alternatives to [competitor]" — high-intent, and where comparison pages pay off.
**Problem / use-case** — "how do I [solve the problem your product solves]?" — where AI names a category before it names a brand.
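The four prompt types above can be kept as templates and expanded into a fixed library. This is a sketch only: the template wording comes from the list above, while the brand, category, competitor and problem values are placeholders.

```python
TEMPLATES = {
    "branded": ["What is {brand}?", "Is {brand} any good?", "{brand} pricing"],
    "category": ["best {category} tools", "top {category} software for small business"],
    "comparison": ["{brand} vs {competitor}", "alternatives to {competitor}"],
    "problem": ["how do I {problem}?"],
}


def build_prompts(brand, category, competitors, problems):
    """Expand the templates into a list of (kind, prompt) pairs."""
    prompts = []
    for kind, templates in TEMPLATES.items():
        for template in templates:
            if "{competitor}" in template:
                variants = [template.format(brand=brand, competitor=c) for c in competitors]
            elif "{problem}" in template:
                variants = [template.format(problem=p) for p in problems]
            else:
                variants = [template.format(brand=brand, category=category)]
            prompts.extend((kind, v) for v in variants)
    return prompts


library = build_prompts("Acme", "analytics", ["Foo", "Bar"], ["track signups"])
print(len(library))  # 10
```

Keeping the library fixed from week to week is what makes the trend comparable.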

Not sure which prompts you already show up for? Start with a free AI SEO audit — it checks whether AI engines can read your site and surfaces where you stand in about 15 seconds.

Manual tracking vs. a tool: where the line is

You don't need a tool to start. Manual prompting is the free baseline, and you should run it at least once to feel the data. But it breaks down fast, and it's worth being honest about exactly where.

The manual baseline you can run today

Take each priority prompt, run it 3–5 times on each engine (start with ChatGPT and Perplexity), and log six columns in a spreadsheet: did you appear, your position in the answer, the sentiment, which competitors showed up, which sources the engine cited, and any wrong facts. Repeat weekly. The trend across weeks, not any single run, is the signal. This is genuinely useful, and it costs nothing but time.
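If you'd rather keep that log as a CSV than a spreadsheet, a sketch like this works. The six tracked columns are the ones listed above; the date, engine, prompt and run columns, and the "yes"/"no" convention, are assumptions added so rows can be grouped.

```python
import csv
import io

FIELDS = ["date", "engine", "prompt", "run",          # bookkeeping (assumed)
          "appeared", "position", "sentiment",        # the six tracked columns
          "competitors", "sources", "wrong_facts"]


def write_log(rows, fh):
    """Write log rows (dicts keyed by FIELDS) as CSV with a header."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def mention_rate_by_week(fh):
    """Mention rate per (date, engine), averaged over every logged run."""
    totals, hits = {}, {}
    for row in csv.DictReader(fh):
        key = (row["date"], row["engine"])
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + (row["appeared"] == "yes")
    return {key: hits[key] / totals[key] for key in totals}


# Usage with an in-memory file; swap in open("log.csv", "w", newline="") for disk.
rows = [
    {"date": "2025-01-06", "engine": "chatgpt", "prompt": "best analytics tools",
     "run": 1, "appeared": "yes", "position": 2, "sentiment": "positive",
     "competitors": "Foo;Bar", "sources": "g2.com", "wrong_facts": ""},
    {"date": "2025-01-06", "engine": "chatgpt", "prompt": "best analytics tools",
     "run": 2, "appeared": "no", "position": "", "sentiment": "",
     "competitors": "Foo", "sources": "reddit.com", "wrong_facts": ""},
]
buffer = io.StringIO()
write_log(rows, buffer)
buffer.seek(0)
print(mention_rate_by_week(buffer))  # {('2025-01-06', 'chatgpt'): 0.5}
```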

Where manual breaks down

**Non-determinism.** A real mention rate needs each prompt sampled many times across runs and averaged. Doing that by hand, for dozens of prompts, weekly, is not realistic.
**Multi-engine, multi-model.** ChatGPT, Claude, Perplexity, Gemini and Google AI Overviews each answer and cite differently, and the models update underneath you. Covering all of them by hand every week is unrealistic.
**No memory, no trend line.** Manual spot-checks don't store history, so you can't tie a score change to a specific action you took. Ahrefs puts it plainly: manual tracking is "time-consuming" and "will miss most AI mentions."
**No sentiment or hallucination flagging at scale.** Reading every answer for tone and factual errors across hundreds of runs is exactly the kind of work that doesn't survive a busy week.

That's the natural pivot to automated tracking: it runs your prompt library across every engine on a schedule, multi-samples each prompt for a true mention rate, captures and parses every response, flags sentiment, hallucinations and competitor co-occurrence, and trends all of it over time so you can connect what you changed to what moved.

There's also a second, higher-confidence signal most manual workflows miss entirely: your own server logs. When an AI engine reads or cites your site, its crawler hits your pages and its answers send real referral clicks. That first-party traffic is ground truth — not a synthetic sample from a handful of test prompts — though it has to be verified, because crawler user-agents are trivial to spoof. SourceWatch measures both sides: the mentions and share of voice across ChatGPT, Perplexity, Gemini and Claude, *and* the real (verified vs. spoofed) AI-crawler and AI-referral traffic landing on your site. There's also an MCP server, so you can pull your AI visibility straight into Claude Code.
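To make the verified-versus-spoofed idea concrete, here is a sketch that classifies one log hit by user-agent token and then checks the IP against a range list. The tokens are user-agent names the vendors publish; the IP ranges are placeholder documentation blocks, not real crawler ranges, and this is not a description of how SourceWatch does it.

```python
import ipaddress

# User-agent tokens published by OpenAI, Perplexity and Anthropic.
AI_CRAWLER_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
                     "PerplexityBot", "ClaudeBot"]

# PLACEHOLDER ranges (RFC 5737 documentation blocks), not real crawler IPs.
# In practice you would load the ranges each vendor publishes.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24"],
}


def classify_hit(user_agent: str, ip: str):
    """Return (crawler, status) for one log line's user agent and client IP."""
    ua = user_agent.lower()
    crawler = next((t for t in AI_CRAWLER_TOKENS if t.lower() in ua), None)
    if crawler is None:
        return None, "not_ai_crawler"
    ranges = PUBLISHED_RANGES.get(crawler)
    if not ranges:
        return crawler, "unverifiable"  # no range list to check against
    addr = ipaddress.ip_address(ip)
    in_range = any(addr in ipaddress.ip_network(r) for r in ranges)
    return crawler, ("verified" if in_range else "spoofed")


print(classify_hit("Mozilla/5.0 (compatible; GPTBot/1.1)", "192.0.2.10"))
print(classify_hit("Mozilla/5.0 (compatible; GPTBot/1.1)", "203.0.113.5"))
```

With the placeholder ranges, the first hit is classified as verified and the second as spoofed, because the user agent claims GPTBot but the IP falls outside the listed block.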

See your mention rate and share of voice across every major AI engine — multi-sampled, tracked over time, with the sources behind each answer.

What to do with what you find

Tracking is only half the job. Once you can see your mention rate, share of voice, sentiment, sources and the gaps, here's the order to act in — highest-leverage first.

1. Fix hallucinations first

Correct any wrong facts at the authoritative source: your own site, your Wikipedia entry, and high-authority third parties. A wrong price repeated across thousands of answers does the most damage, so it's the first thing to kill.

2. Earn third-party citations

Reviews, PR, and listicles dominate AI source citations more than your own pages do. Use the cited-sources data to see exactly which third-party pages the engines trust, and go earn a place on them.

3. Apply the proven GEO levers

On your priority pages, add expert quotations, original statistics, and cited sources, the techniques the GEO study showed can lift visibility by up to 40%. Structure content as direct, self-contained answers.

4. Cover the gap prompts

Build comparison ("vs") pages, FAQs and how-to guides for the exact queries where competitors appear and you don't. Your prompt-library data tells you which ones.

5. Show up on AI's favorite domains

Earn a credible presence on Reddit, YouTube, Quora and Wikipedia, the domains the engines cite most. Match the channel to the engine (Reddit for Perplexity, YouTube for Google AI Overviews).

6. Consider llms.txt (emerging, optional)

llms.txt is a 2024 proposal (Jeremy Howard, llmstxt.org) for a root /llms.txt file that gives LLMs clean, curated site context. It's worth knowing about as an emerging tactic, but it is not a proven ranking factor, so treat it as optional polish, not a priority.

7. Re-measure and attribute

Change one thing, then watch the trend across the next few weekly checks. Because you're tracking over time, you can tie the score change back to the specific action, which is the whole point of tracking in the first place.
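For the re-measure step, a before/after comparison of weekly mention rates is the simplest attribution check. This sketch assumes one mention rate per week and a known week in which the change shipped; the three-week window and the sample numbers are arbitrary, and a difference like this suggests an effect rather than proving one.

```python
from statistics import mean
from typing import Optional


def lift_after_change(weekly_rates: list[float], change_week: int,
                      window: int = 3) -> Optional[float]:
    """Mean rate in the weeks after a change minus the mean in the weeks before.

    change_week is the index of the first weekly check after the change shipped.
    Returns None when either side has no data.
    """
    before = weekly_rates[max(0, change_week - window):change_week]
    after = weekly_rates[change_week:change_week + window]
    if not before or not after:
        return None
    return mean(after) - mean(before)


# Hypothetical weekly mention rates; the change shipped before week index 3.
rates = [0.20, 0.22, 0.21, 0.30, 0.31, 0.29]
print(round(lift_after_change(rates, change_week=3), 2))  # 0.09
```

Changing one thing at a time keeps that difference readable; overlapping changes blur which one moved the number.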

Track whether ChatGPT, Perplexity, Gemini and Claude cite your brand — and your share of voice versus competitors — so you can see every change land.

Common AI mention tracking mistakes

AI tracking is new enough that a lot of confident advice is wrong. The mistakes that produce misleading data:

**Tracking only your brand name.** You'll miss every category, feature and problem query — which is where the buying decisions actually get made.
**Trusting a single prompt run.** Non-determinism means one run gives false negatives and false positives. Multi-sample (3–10 runs) and average.
**Watching only one engine.** Citations barely overlap across engines (~11% between ChatGPT and Perplexity). One engine is a blind spot, not a sample.
**Treating AI like Google rankings.** Only ~8–12% of what AI answers surface overlaps with what ranks at the top of Google. Your rank does not predict your mention.
**Ignoring accuracy until it spreads.** A hallucinated fact compounds. Flag it the first week, not after it's in a thousand answers.
**Monitoring 24/7 like social media.** Wrong cadence. AI updates roughly monthly; weekly catches the signal, hourly catches only noise.
**Relying on GA4 / Search Console alone.** They can't see pre-click AI mentions, full stop. They're a complement to AI tracking, not a substitute.

The reassuring part

None of this is exotic. It's the same discipline as good analytics: pick the right metrics, sample enough to trust them, segment by channel, and watch the trend instead of the spike. The only genuinely new wrinkles are non-determinism (so you sample) and per-engine divergence (so you don't blend). Get those two right and you're ahead of most of the market.

Where to go next

This is the map. Each metric and method has a deeper playbook of its own:

How to track brand mentions — the brand-monitoring angle, mentions vs. citations in depth.
How to track AI search visibility — the end-to-end visibility measurement workflow.
Share of voice in AI search — the win/lose category metric, explained in full.
How to show up in AI search — the optimization companion: how to *earn* the mentions you're tracking.
How to rank in ChatGPT — the ChatGPT-and-Bing-specific pillar.
What is AI visibility? — the four signals and the definitions, in one place.

Frequently asked questions

How do I track AI mentions of my brand?

Build a fixed library of 20–30 prompts (brand, category, comparison and problem queries), run each one 3–10 times through ChatGPT, Perplexity, Gemini and Claude on a weekly schedule, and log seven things: whether you appeared, how often, your position in the answer, the sentiment, your share of voice vs. competitors, any wrong facts, and the sources the engine cited. Score meaning, not exact words, and watch the trend across weeks rather than any single run.

Can I see AI mentions in Google Analytics or Search Console?

No. GA4 and Google Search Console only measure what happens after a click — sessions, conversions, and queries that led to a visit. An AI mention happens inside the answer, before any click, so it's invisible to them. You can complement them with AI mention tracking, but they can't do this job on their own.

Source: Google Analytics Help — How Analytics works

Why do I get different answers every time I ask the same question?

AI answer engines are non-deterministic — the same prompt produces different wording, and sometimes different brands, on each run. That's why one run is a noisy snapshot and not a real measurement. Run each prompt 3–10 times and report the average, scoring semantic equivalence (different spellings and phrasings of your name all count) rather than exact-string matches.

Do I need to track each AI engine separately?

Yes. The engines pick sources differently and overlap surprisingly little — only about 11% of cited domains overlap between ChatGPT and Perplexity for the same query, and ChatGPT favors Wikipedia while Perplexity favors Reddit, Google AI Overviews favor YouTube, and Claude favors blogs. A single blended number hides exactly where you're losing, so measure ChatGPT, Perplexity, Gemini and Claude on their own.

Does ranking #1 on Google mean I'll appear in AI answers?

Not reliably. Semrush found only an 8–12% overlap between AI answers and traditional search rankings, and Ahrefs found just 12% of AI-cited URLs overlap Google's top 10. You can rank #1 on Google and still be invisible in AI answers, which is exactly why AI mentions need to be tracked separately from search rank.

Source: Semrush — ChatGPT search insights (clickstream study)

How often should I track AI mentions?

Weekly visibility checks plus a monthly strategic review is the industry-standard cadence. AI models and indexes refresh on roughly monthly cycles, and 40–60% of cited sources change month to month, so weekly catches real trends without drowning in run-to-run noise. Daily tracking is only worth it for high-risk, fast-moving categories or active crisis monitoring — and 24/7 social-style monitoring is the wrong model entirely.

What's a good AI share of voice or citation rate?

As rough industry benchmarks (practitioner aggregates, not peer-reviewed research): a strong category share of voice is around 15–25%, with category leaders at 35–50%; branded prompts should return 50–80%. For citation rate in B2B SaaS, 8–15% is minimal presence, 20–30% is traction, and 40–50%+ signals category leadership. Treat these as directional targets, not hard rules — your category baseline is what matters.

What actually improves my AI mentions once I'm tracking them?

The peer-reviewed GEO study (KDD 2024) tested techniques across roughly 10,000 queries and found the biggest levers are adding expert quotations (+27.8%), adding statistics (+25.9%), improving fluency (+25.1%), and citing your sources (+24.9%), with the best techniques lifting visibility by up to 40%. Keyword stuffing did not help. Beyond content, fix wrong facts at the source first, then earn third-party citations on the domains the engines already trust.

Source: GEO: Generative Engine Optimization (arXiv, KDD 2024)

See whether ChatGPT, Perplexity, Gemini & Claude cite your brand — multi-sampled, per engine, with share of voice vs competitors.

Connect your first site and watch SourceWatch score your AI visibility in minutes.