How do I measure AI search visibility?
There is no industry-standard metric for AI search visibility yet. The four most useful signals right now are citation share-of-voice (how often you appear vs competitors on category prompts), sentiment (how the engine talks about you), average rank when cited (first mention vs fifth), and query coverage (how many of your tracked prompts mention you at all). Each one answers a different question, and you need all four to read the picture honestly.
Why this is not solved yet
SEO has a settled vocabulary: rankings, impressions, clicks, CTR, conversion. AI search does not. The engines are non-deterministic, the consumer apps differ from the APIs, and there is no equivalent of Google Search Console handing you reliable data. Vendors are converging on roughly the same metric set, but naming and methodology vary.
The practical implication: do not get attached to a single number. Build a dashboard that shows several signals together, and treat any movement that shows up on only one metric as suspicious until a second metric confirms it.
Citation share-of-voice
The closest thing AI search has to an SEO ranking. Pick a set of category prompts a real customer would ask. Run them across ChatGPT, Perplexity, Gemini, Claude, and DeepSeek on a schedule, with multiple variants per prompt. Count how often each brand is mentioned.
Concrete example. Take the prompt "best CRM for small teams". Phrase it 20 different ways: "top CRMs under 50 dollars", "which CRM should a 10-person agency use", "what is the best CRM for solopreneurs", and so on. Run each variant five times across all five engines. That is 500 samples. Count how many of them mention HubSpot, Salesforce, Pipedrive, and your brand. The percentage of samples that name you is your category citation share-of-voice.
Why the variants matter: a single prompt phrasing can be a fluke. Twenty variants with five runs each gives you a defensible average.
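If you want to roll this yourself, here is a minimal sampling sketch in Python. The ask_engine() stub is hypothetical: wire it to whichever API clients you actually use, and remember that the consumer apps can answer differently from the raw APIs.

```python
import re
from collections import defaultdict
from itertools import product

ENGINES = ["chatgpt", "perplexity", "gemini", "claude", "deepseek"]
BRANDS = ["HubSpot", "Salesforce", "Pipedrive", "YourBrand"]
VARIANTS = [
    "best CRM for small teams",
    "top CRMs under 50 dollars",
    "which CRM should a 10-person agency use",
    # ...17 more phrasings of the same buyer question
]
RUNS_PER_VARIANT = 5

def ask_engine(engine: str, prompt: str) -> str:
    # Placeholder: swap in a real API client per engine (for example the
    # OpenAI SDK for ChatGPT). Record which surface each sample came from,
    # since app and API answers can differ.
    return "For a small team, HubSpot or Pipedrive are the usual picks."

def mentions(brand: str, answer: str) -> bool:
    # Whole-word, case-insensitive match. A real pipeline also needs
    # alias handling ("Hubspot CRM", product names) on top of this.
    return re.search(rf"\b{re.escape(brand)}\b", answer, re.IGNORECASE) is not None

samples = 0
hits: dict[str, int] = defaultdict(int)
for engine, variant, _run in product(ENGINES, VARIANTS, range(RUNS_PER_VARIANT)):
    answer = ask_engine(engine, variant)
    samples += 1
    for brand in BRANDS:
        if mentions(brand, answer):
            hits[brand] += 1

# Citation share-of-voice: percentage of all samples that name the brand.
for brand in BRANDS:
    print(f"{brand}: {100 * hits[brand] / samples:.1f}% of {samples} samples")
```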
Sentiment
Being cited is necessary but not sufficient. The engine can mention you and frame you as expensive, niche, or beta-quality. Sentiment scoring runs the cited passages through a classifier (positive, neutral, negative) so you can see whether the citation was a recommendation, a comparison, or a warning.
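A minimal scoring sketch, assuming you have already extracted the cited passages. The model named here is one public three-class option on Hugging Face, not the only choice; an LLM prompted to label each passage works the same way.

```python
from transformers import pipeline

# One public three-class model (labels: negative / neutral / positive).
# Any classifier slots in here, including an LLM prompted to tag each
# passage as recommendation / comparison / warning.
clf = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)

cited_passages = [
    "HubSpot is the easiest option for a ten-person team to adopt.",
    "YourBrand is cheaper, but it lacks the reporting depth of Salesforce.",
]

for passage, result in zip(cited_passages, clf(cited_passages)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {passage}")
```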
Watch for the cases where citation share goes up but sentiment trends negative. That usually means a new comparison article ranks for the head term and you are getting cited as the cheaper but weaker option. Volume looks good. Reality is bad.
Average rank when cited
When ChatGPT lists "top five CRMs", being mentioned at position one is a different outcome from being mentioned at position five. Track the average position your brand occupies in cited lists.
Most GEO tools surface this as "average rank" or similar. It is the AI-search equivalent of moving from page two of Google to position three. Worth tracking weekly. Small changes here usually trail content moves by two to four weeks on the live-browsing engines.
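A sketch of the rank extraction, assuming the engine answered with a numbered list. Real answers also arrive as bullets or prose, so a production parser needs to handle more formats than this.

```python
import re

def rank_when_cited(answer: str, brand: str) -> int | None:
    """Position of the brand in a numbered-list answer, or None if absent."""
    for line in answer.splitlines():
        m = re.match(r"\s*(\d+)[.)]\s+(.*)", line)
        if m and re.search(rf"\b{re.escape(brand)}\b", m.group(2), re.IGNORECASE):
            return int(m.group(1))
    return None

answers = [
    "1. Salesforce - enterprise standard\n2. HubSpot - best free tier\n"
    "3. Pipedrive - simplest pipeline\n4. YourBrand - cheapest per seat",
    "1. HubSpot\n2. YourBrand\n3. Zoho CRM",
]

# Average rank counts only the answers that cite you at all.
ranks = [r for a in answers if (r := rank_when_cited(a, "YourBrand")) is not None]
print(sum(ranks) / len(ranks))  # 3.0 here: positions 4 and 2
```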
Query coverage
Of all the prompts you track, what percentage mention your brand at all? This is the breadth metric. A brand can have great share-of-voice on three prompts and zero coverage on the next 27. Coverage tells you whether your visibility is concentrated or broad.
Aspirationally, you want both: high share on the prompts you care about, and broad coverage across the long tail. Most brands start with neither. The first GEO win is usually pushing coverage from zero to thirty or forty percent on the head terms; share-of-voice gains follow.
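Coverage falls out of the same samples you collected for share-of-voice. A sketch, with the whole-word matcher repeated so it runs standalone:

```python
import re

def mentions(brand: str, answer: str) -> bool:
    # Same whole-word matcher as in the share-of-voice sketch.
    return re.search(rf"\b{re.escape(brand)}\b", answer, re.IGNORECASE) is not None

def query_coverage(samples_by_prompt: dict[str, list[str]], brand: str) -> float:
    """Percent of tracked prompts where the brand appears in at least
    one sampled answer for that prompt."""
    covered = sum(
        any(mentions(brand, answer) for answer in answers)
        for answers in samples_by_prompt.values()
    )
    return 100 * covered / len(samples_by_prompt)

samples_by_prompt = {
    "best CRM for small teams": ["HubSpot and YourBrand both fit."],
    "top CRMs under 50 dollars": ["Pipedrive is the usual pick."],
    "best CRM for solopreneurs": ["HubSpot's free tier wins here."],
}
print(query_coverage(samples_by_prompt, "YourBrand"))  # 33.3...: 1 of 3 prompts
```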
How to put it together
A working dashboard answers four questions per week:
- Did our share-of-voice on tracked prompts go up or down?
- Did sentiment hold or shift?
- Did our average rank when cited improve?
- Did query coverage expand or stay flat?
If three of four are up and one is flat, that is a good week. If share is up but sentiment is down, you have a content problem to fix. If coverage is up but rank is down, you are picking up long-tail mentions but the head terms are still owned by competitors.
One honest caveat. Because the engines are non-deterministic, weekly noise is real. Trust trend lines over single-week deltas. Two weeks of consistent direction beats a single big jump.
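One way to encode that rule: only call a metric up or down after two consecutive weekly moves in the same direction, and label everything else noise. A sketch with illustrative numbers:

```python
def trend(series: list[float], weeks: int = 2) -> str:
    """'up' or 'down' only after `weeks` consecutive moves in the same
    direction; anything else is 'noise'."""
    if len(series) < weeks + 1:
        return "noise"
    window = series[-(weeks + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    if all(d > 0 for d in deltas):
        return "up"
    if all(d < 0 for d in deltas):
        return "down"
    return "noise"

# Illustrative numbers only. Note that avg_rank improves as it goes DOWN.
weekly = {
    "share_of_voice": [12.0, 13.5, 15.1],  # % of samples citing you
    "sentiment":      [0.40, 0.42, 0.41],  # mean classifier score
    "avg_rank":       [4.2, 3.8, 3.5],     # position in cited lists
    "coverage":       [30.0, 30.0, 33.0],  # % of prompts citing you
}
for metric, series in weekly.items():
    print(f"{metric}: {trend(series)}")
```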
What to ignore
Vanity metrics that look impressive but do not predict revenue:
- "Total mentions" with no denominator. 500 mentions across what? Useless without a per-prompt rate.
- Engine-specific brag-stats. Being cited 10 times by Claude is meaningless if your buyers are using Perplexity.
- Citation count without sentiment. The tobacco industry got cited a lot too.
Pick the four metrics above, sample them honestly across the engines your customers actually use, and let the trend lines do the talking.