AI Brand Visibility: AI Plays Favorites, It Won't Trash You

Ryan Turner · Head of Growth

Most teams worried about AI and their brand are bracing for the wrong thing. They expect the assistant to say something damaging. In our testing, it almost never does. ChatGPT, Gemini, and Copilot were polite about every brand we put in front of them.

The real exposure is quieter. AI picks favorites. The favorite shifts depending on which assistant a person opens and which country they sit in. And in regulated categories, the assistant sometimes drops your brand from the conversation entirely, which is worse than any bad review.

To pressure-test that, we deliberately chose three brands in three categories built to poke an assistant's safety guardrails: DraftKings (sports betting), Bacardi (alcohol), and Texas politician James Talarico (politics). Gambling, alcohol, and politics are exactly where assistants get cautious. Then we asked the same questions from different countries.

This post is published by Massive Computing, the company whose localized AI chat tool ran the queries. The takeaways matter more than our raw numbers, so we lead with those and link the full data at the end.

Key Takeaways
  • Sentiment is not the needle-mover. Every assistant described every brand positively or neutrally. A "bad AI reputation" barely happened, so a green sentiment score tells you almost nothing.
  • The favorite is the needle-mover, and it's unstable. A brand's rank moved by up to two places depending only on which assistant we asked, and rankings shift by country too.
  • In regulated categories, you can vanish. Gemini fully refused gambling questions from UK and German locations while answering freely elsewhere. The worst case is silence, not criticism.

AI almost never badmouths your brand

In our tests, every assistant described every brand positively or neutrally, with no exceptions across the three subjects. DraftKings came back "legitimate, licensed, top-tier." Bacardi was "reliable, the world's most-awarded rum." James Talarico was "principled" and "an effective communicator." Nobody got trashed.

So "is AI saying something bad about us?" is the comfortable question, and the wrong one. Run sentiment monitoring across the major assistants and it will almost always come back green, which feels reassuring and measures nothing useful.

That matters because people act on these answers. In 2025, Bain & Company found 80% of consumers rely on AI-written summaries for at least 40% of their searches, and 42% ask AI for shopping recommendations (Bain & Company, 2025). The answer is a referral now. A clean sentiment score hides the only thing that decides the referral: were you actually the one recommended?

It picks favorites, and the favorite depends on who you ask

The decision an assistant makes about your brand isn't whether to praise it. It's where to rank it, and that ranking moved by up to two places depending only on which assistant we asked. Same brand, same week, different verdict.

The clearest case was the politician. ChatGPT and Copilot ranked James Talarico #4 of 4 among rising Democrats, framing him as the least nationally proven. Gemini ranked him #2 to #3, treating him as a genuine contender. A reputation manager checking only ChatGPT would file him as an also-ran. Checking only Gemini, a rising star.

Figure: Same candidate, three verdicts. Typical rank among 4 rising Democrats (#1 to #4, left = better), by assistant: Gemini, ChatGPT, Copilot. The engine you check was the whole variable here, not the country.

Source: Massive localized AI study, 2026. Full data in the report linked below.

The same split showed up in product categories. Copilot was the only assistant that ranked DraftKings #1, putting it ahead of FanDuel; ChatGPT and Gemini left it at #2. Bacardi was the one stable favorite-loser, ranked #2 behind Havana Club in every answer, everywhere. Whichever assistant your customer happens to open is acting as a hidden editor you don't control.
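One way to put a number on that hidden-editor effect is to take a brand's typical rank per assistant and measure the gap between the best and worst. The sketch below is a minimal illustration, not the study's method: the function names are hypothetical, the "typical" rank is assumed to be the median, and the per-run ranks are made-up values shaped like the pattern described above.

```python
from collections import defaultdict
from statistics import median


def typical_rank_by_assistant(observations):
    """Return {assistant: median rank} from (assistant, rank) pairs."""
    ranks = defaultdict(list)
    for assistant, rank in observations:
        ranks[assistant].append(rank)
    return {assistant: median(values) for assistant, values in ranks.items()}


def rank_spread(typical_ranks):
    """Gap between the worst and best typical rank across assistants."""
    return max(typical_ranks.values()) - min(typical_ranks.values())


# Illustrative ranks for one brand over repeated runs (made-up values,
# not the study's raw data).
observations = [
    ("ChatGPT", 4), ("ChatGPT", 4), ("ChatGPT", 4),
    ("Copilot", 4), ("Copilot", 4), ("Copilot", 4),
    ("Gemini", 2), ("Gemini", 3), ("Gemini", 2),
]

typical = typical_rank_by_assistant(observations)
spread = rank_spread(typical)
print(typical, spread)
```

A spread of zero means every assistant agrees on where the brand sits. Anything above zero means the answer depends on which assistant your customer opens.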

This isn't noise you can average away. SparkToro's 2026 research found under a 1-in-100 chance that an AI returns the same brand list across two runs (SparkToro, 2026), and a 2025 University of Toronto study found only 15% to 33% citation overlap between Google and ChatGPT (arXiv 2509.08919, 2025). The engines read different slices of the web, so they pick different favorites.
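If you want to check run-to-run stability on your own prompts, two simple measures are enough to start: how often two runs return exactly the same list, and how much two lists overlap. The sketch below uses Jaccard overlap as an assumed stand-in. The cited studies may define overlap differently, and the function names and brand lists here are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of items common to both collections, from 0.0 to 1.0."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def identical_list_rate(runs):
    """Fraction of run pairs with the same brands in the same order."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    same = sum(1 for first, second in pairs if first == second)
    return same / len(pairs)


# Three made-up runs of the same prompt (placeholder brand names).
runs = [
    ["BrandA", "BrandB", "BrandC"],
    ["BrandB", "BrandA", "BrandC"],
    ["BrandA", "BrandB", "BrandD"],
]

overlap_first_last = jaccard(runs[0], runs[2])
identical_rate = identical_list_rate(runs)
print(overlap_first_last, identical_rate)
```

Note that the first two runs contain the same brands in a different order: their overlap is complete, yet they do not count as identical lists. Tracking both measures keeps reordering separate from brands dropping out.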

In regulated categories, you can disappear entirely

The most extreme outcome we found wasn't a bad answer. It was no answer. From UK and German locations, Gemini fully refused both gambling questions on every run ("my safety system flagged this request"), while the same Gemini answered enthusiastically from the US, Brazil, and Japan. DraftKings' visibility in those two markets isn't low. It's zero.

The severity tracked how regulated the category is. Gambling drew a full refusal. Alcohol drew a partial one: Gemini would say whether Bacardi was good but refused to rank alcohol brands. Politics drew no refusal at all, which surprised us, since we expected the politician to trip the most filters.

Figure: How much a refusal removes in the UK and Germany. Share of the brand's Gemini answers erased by a refusal (higher is worse): Gambling 100%, Alcohol ~50%, Politics 0%. Gambling: both prompts blocked. Alcohol: ranking blocked, sentiment answered. ChatGPT and Copilot never refused in any category.

Source: Massive localized AI study, 2026. Full data in the report linked below.
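The refusal shares above reduce to a simple ratio: prompts refused over prompts asked. The sketch below reproduces them under one assumption, that each category had two prompts (a sentiment question and a ranking question). The function name and the outcome labels are hypothetical, and the per-prompt outcomes are transcribed from the description above rather than taken from the raw data.

```python
def refusal_share(outcomes):
    """Fraction of prompts that ended in a refusal."""
    if not outcomes:
        return 0.0
    refused = sum(1 for outcome in outcomes if outcome == "refused")
    return refused / len(outcomes)


# Gemini from UK and German locations, one outcome per prompt
# (assumes two prompts per category: sentiment, then ranking).
gemini_uk_de = {
    "gambling": ["refused", "refused"],    # both prompts blocked
    "alcohol": ["answered", "refused"],    # sentiment answered, ranking blocked
    "politics": ["answered", "answered"],  # no refusal
}

shares = {
    category: refusal_share(outcomes)
    for category, outcomes in gemini_uk_de.items()
}
print(shares)
```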

This is the location effect that actually counts. Crossing a border rarely changed an assistant's opinion of a brand. What it changed was whether the brand appeared at all, governed by the local regulator more than the local market. It isn't unique to one tool, either. In 2026, Investigate Europe tested seven chatbots and found AI assistants surfaced unlicensed gambling sites in roughly 75% of replies when asked to bypass national self-exclusion schemes (Investigate Europe, 2026). Refusing the licensed brand in one country while surfacing unlicensed ones elsewhere is the kind of inconsistency you only catch by testing from inside each market.

Why most brand monitoring misses all of this

A single-assistant, single-country, single-run check catches none of the three effects above. It returns a friendly sentiment score and one stable-looking ranking, and both are misleading. Here's the discipline that actually surfaces the risk.

  • Test the assistants your audience uses, not the one you use. Copilot ranked DraftKings #1 and ChatGPT never did. Track only ChatGPT and you'd never see your best result, or your worst.
  • Test from inside each market. The Gemini gambling refusal is invisible from a US connection. You have to ask as a user in the UK or Germany to see the silence.
  • Repeat every query and report a percentage. With a sub-1-in-100 chance of an identical list twice, one pull is noise. Track share of voice over time.
  • Separate a refusal from an outage. A policy refusal is a finding about your market coverage. An upstream error is missing data. Conflating them invents a problem or hides one.
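The last two points can be combined into one small summary per assistant and market. The sketch below is a minimal illustration under stated assumptions: the `Run` record, its status labels, and the `summarize` function are hypothetical, and the design choice is that a refusal stays in the denominator (the brand was invisible to that user) while an upstream error is excluded as missing data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Run:
    """One response to one prompt from one assistant in one market."""
    status: str  # "answered", "refused", or "error" (hypothetical labels)
    ranking: list = field(default_factory=list)  # brands in ranked order


def summarize(runs, brand):
    """Share of valid runs where `brand` ranked first, plus refusal rate."""
    counts = Counter(run.status for run in runs)
    # Refusals are findings; errors are missing data and are left out.
    valid = counts["answered"] + counts["refused"]
    top = sum(
        1 for run in runs
        if run.status == "answered" and run.ranking[:1] == [brand]
    )
    return {
        "top_rank_share": top / valid if valid else None,
        "refusal_rate": counts["refused"] / valid if valid else None,
        "errors_excluded": counts["error"],
    }


# Five made-up runs of the same prompt (placeholder brand names).
runs = [
    Run("answered", ["BrandA", "BrandB"]),
    Run("answered", ["BrandA", "BrandB"]),
    Run("answered", ["BrandB", "BrandA"]),
    Run("refused"),
    Run("error"),
]

summary = summarize(runs, "BrandA")
print(summary)
```

Counting the error as a refusal would raise the refusal rate here from 25% to 40%. Dropping the refusal as if it were an error would lift the top-rank share from 50% to about 67%. Either mistake changes the story you report.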

Try it on your own brand

You can run the same-prompt-different-country test yourself, free and without a login. The Massive AI GEO playground asks ChatGPT the same question from the US, Brazil, and Japan, side by side. Same prompt, three countries, different answers. Drop in your brand against its competitors and watch the ordering move.

The playground is the demo. The engine under it is the product. Massive's Web Render AI chat endpoint returns live model completions from real consumer devices in 195+ countries, with the sources each model used, so you can build your own AEO or brand-visibility monitoring on top of it. Geo coverage, device origin, and source parsing are solved upstream; you keep your own scoring, dashboards, and brand. Sign up for an API key and point your tool at the endpoint.

Want the receipts? The full report has all 270 localized queries, cell by cell, across three assistants and five countries.

The bottom line

Stop asking whether AI is nice to your brand. It almost always is, and the answer is a comfortable distraction. Ask the questions that move revenue instead: who does the assistant pick as #1, does that favorite change by assistant and by country, and are there markets where you don't show up at all?

Sentiment is the green light that hides the problem. Favoritism and silence are the problem. Try the playground on your own brand, then read the full report to see how far the favorites move.

Ryan Turner writes about live web access for AI systems at Massive Computing, covering anti-bot infrastructure, geo-accurate retrieval, and the data behind AI search. The localized study in this post was run on Massive's Web Render AI chat endpoint, which returns model completions from real consumer devices in 195+ countries.

Frequently Asked Questions

Rarely. In our localized study across ChatGPT, Gemini, and Copilot, every assistant described every brand positively or neutrally. The real risk is not a bad review but competitive ranking (who gets called #1) and outright omission in regulated categories, neither of which a sentiment score captures.

The ranking. We saw the same brand move up to two places depending only on the assistant, and a 2025 University of Toronto study found just 15% to 33% citation overlap between Google and ChatGPT (arXiv 2509.08919, 2025). The engines read different sources, so they recommend different favorites from identical questions.

Safety policies are applied by region. In our study, Gemini fully refused gambling questions from UK and German locations on every run but answered freely from the US, Brazil, and Japan. The refusal tracked local regulation, so the brand had zero visibility in two markets and full visibility in three.

Test every assistant your audience uses, from inside every market that matters, with repeated runs. SparkToro's 2026 research found under a 1-in-100 chance an AI returns the same brand list across two runs (SparkToro, 2026), so report share of voice over time, not a single snapshot.

No. Crossing a border rarely changes an assistant's opinion, but it can change whether your brand appears at all, because regional safety policy and default language shift at the border. You need to test from inside each market you sell into.