Executive Summary

45/45 cells · 270 queries · 2026-06-08
Geographic AI Visibility & Sentiment Study

How AI assistants see your brand, by client and by country

We asked ChatGPT, Gemini, and Copilot the same questions about three subjects while routing each query through real devices in the US, UK, Germany, Brazil, and Japan — measuring visibility, sentiment, and refusals across 270 localized queries via Massive's /ai API.

Localized queries: 270 (3 subjects × 3 clients × 5 geos × 2 prompts × 3 runs)
Cells collected: 45/45 (one per subject × client × geo)
Geo-gated refusals: 2 (Gemini blocks gambling in UK & Germany only)

1 Visibility > sentiment

All three assistants describe all three subjects positively. The real differences are in ranking (where you land vs. rivals) and whether the assistant answers at all — not in tone.

2 Geography gates answers

Gemini refuses gambling entirely from UK & German IPs (partial for alcohol), while answering freely from the US, Brazil & Japan. DraftKings is invisible to Gemini users in two major markets.

3 The client is the answer

Which assistant you check shifts a subject by up to two ranking places. Gemini ranks Talarico #2–3; ChatGPT & Copilot rank him #4. Same person, opposite read.

Bottom line: Monitoring one assistant from one country systematically misses the two largest sources of variation — silent geo-refusals and cross-client ranking gaps. (Note: Massive localizes ChatGPT/Gemini/Copilot — Claude can't be geo-localized via this tool and was excluded.)

Interactive Data Explorer

Pick a subject to see the exact prompts used, switch the metric (hover the labels for what each measures), and click any cell for the full breakdown. Rows = AI client · Columns = user's country.

Findings by Subject

🏈 DraftKings vs FanDuel · BetMGM · Caesars

Trusted everywhere it's discussed (+1 to +2) — no AI reputation problem. But FanDuel out-ranks it almost everywhere; DraftKings is the steady #2. The only exception: Copilot ranks it #1 in the US & UK. Real exposure: Gemini fully refuses gambling questions from UK & German IPs — zero share of voice in those markets.

🥃 Bacardi vs Captain Morgan · Havana Club · Malibu

The most uniform result in the study: Havana Club is ranked #1 and Bacardi #2 in every answered comparison — all clients, all countries. Consistently framed as "reliable mixer, not a premium sipper," with the aged Ocho/Diez line as the quality upside. Gemini in UK/DE answers "is it good?" but refuses to rank alcohol brands (a partial block).

🏛️ James Talarico vs AOC · Buttigieg · Wes Moore

Mildly positive everywhere and never refused — politics tripped no safety filter. But a clear visibility gap that's all about client choice: ChatGPT & Copilot rank him last (#4); only Gemini elevates him to #2–3. Geography is nearly irrelevant here — the assistant you ask is the whole story.

Cross-Cutting Analysis

The geographic effect scales with how regulated the category is

Category · Subject | Gemini from UK/DE | Effect on visibility
Gambling · DraftKings | Full refusal: both prompts blocked | Invisible
Alcohol · Bacardi | Partial: answers quality, refuses to rank | No ranking
Politics · Talarico | No refusal: answers everywhere | Unaffected
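The three severity levels above can be derived mechanically from which of a cell's two prompts was refused. The following is a minimal sketch, assuming the two prompts per subject are a direct ("is it good?") prompt and a comparison (ranking) prompt, as the findings and caveats suggest; the function name and labels are illustrative, not part of the study's tooling.

```python
def refusal_severity(direct_refused: bool, comparison_refused: bool) -> str:
    """Map per-prompt refusal flags for one cell to a severity label."""
    if direct_refused and comparison_refused:
        return "full"      # both prompts blocked: the subject is invisible
    if direct_refused or comparison_refused:
        return "partial"   # one prompt answered, e.g. quality but no ranking
    return "none"          # answered everywhere


# Gemini from UK/DE, as reported in this study.
observed = {
    "Gambling / DraftKings": refusal_severity(True, True),
    "Alcohol / Bacardi": refusal_severity(False, True),
    "Politics / Talarico": refusal_severity(False, False),
}

for category, severity in observed.items():
    print(f"{category}: {severity}")
```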

Sentiment converges

Across 270 queries, sentiment was the least variable dimension — everyone is broadly positive. Track only sentiment and you'll conclude "AI loves us everywhere" and miss the real story.

Localization is shallow

Outside refusals, every client defaults to US-centric framing and US sources regardless of country, at most translating the answer into the local language. Geography changes the surface, not the substance.

Client = editorial choice

Copilot is DraftKings' best friend (only #1 ranker). Gemini is Talarico's. Gemini is riskiest for regulated brands. Whichever assistant a customer uses shapes what they're told.
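One way to quantify the client effect is the spread between a subject's best and worst rank across assistants. The sketch below uses only the Talarico ranks reported in this study (Gemini #2 to #3, ChatGPT and Copilot #4); the per-client (best, worst) tuple layout and the function name are assumptions made for illustration.

```python
# Talarico's reported rank (out of 4) per client, as (best, worst) across geos.
TALARICO_RANKS = {
    "ChatGPT": (4, 4),
    "Copilot": (4, 4),
    "Gemini": (2, 3),
}


def cross_client_spread(ranks: dict) -> int:
    """Return the gap between the best and worst rank seen on any client."""
    best = min(b for b, _ in ranks.values())
    worst = max(w for _, w in ranks.values())
    return worst - best


print(cross_client_spread(TALARICO_RANKS))  # 2
```

A spread of 0 would mean the assistants agree; the value of 2 here matches the "up to two ranking places" gap noted in the summary.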

Recommendations

1 Monitor all clients × multiple geos

The two biggest effects, geo-refusals and cross-client ranking gaps, are invisible to single-client, single-country checks. This design (3 clients × 5 geos, with 3 runs per prompt) is a reusable template; re-run it monthly to track drift.
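As a starting point for reusing the design, the query plan can be enumerated up front so that every cell and run is accounted for before any call is made. This is a minimal sketch: the geo codes and the prompt labels "direct" and "comparison" are shorthand assumed here, and nothing in it calls Massive's /ai API.

```python
from itertools import product

SUBJECTS = ["DraftKings", "Bacardi", "James Talarico"]
CLIENTS = ["ChatGPT", "Gemini", "Copilot"]
GEOS = ["US", "UK", "DE", "BR", "JP"]
PROMPT_KINDS = ["direct", "comparison"]  # assumed labels for the 2 prompts
RUNS = 3


def build_plan() -> list:
    """Return one dict per planned query, in a stable order."""
    return [
        {"subject": s, "client": c, "geo": g, "prompt": p, "run": r}
        for s, c, g, p in product(SUBJECTS, CLIENTS, GEOS, PROMPT_KINDS)
        for r in range(1, RUNS + 1)
    ]


plan = build_plan()
# A cell is one subject x client x geo combination.
cells = {(q["subject"], q["client"], q["geo"]) for q in plan}
print(len(plan), len(cells))  # 270 45
```

Keeping the plan as data makes it straightforward to diff a monthly re-run against the previous one and to spot cells with no valid run.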

2 DraftKings: treat the Gemini UK/DE refusal as a coverage gap

Not a sentiment issue. The growth lever is ranking: closing the gap to FanDuel, which ranks #1 almost everywhere. Investigate why Copilot alone elevates DraftKings to #1 (in the US & UK) and whether that signal is replicable.

3 Bacardi: fight the "#2 mixer, not a sipper" frame

The deficit to Havana Club is global and identical across engines — which makes it addressable with one consistent strategy. The aged Ocho/Diez line is already the AI-recognized quality anchor; lean into it.

4 Talarico: close the national-credibility gap on ChatGPT & Copilot

Sentiment is fine; both engines anchor on "least nationally proven." Surfacing more national-tier third-party coverage is the lever that moves those two (Gemini already rates him highly).

5 Extend the panel

Add a non-localized Claude baseline for a four-engine view, and test local-language prompts (de/pt/ja) to see whether asking in-language changes substance (this run held language constant in English).

Full Data Appendix

All 45 cells. Sentiment −2…+2 · rank = subject's place of 4 · top = competitor ranked #1.

Data-Quality Caveats

Confirmed: an isolated Copilot×Brazil service outage on Massive's side — not a refusal. DraftKings-direct, Bacardi-comparison, and all six Talarico calls failed with HTTP 500/504. A targeted rerun on 2026-06-09 failed identically (12/12), while control probes the same day — Copilot/US ✓ and ChatGPT/Brazil ✓ — both succeeded, isolating the fault to the Copilot×Brazil route. These cells are missing data (gray ⚠), not findings, and are kept distinct from genuine policy refusals.
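The control-probe reasoning above can be expressed as a small decision rule: if the failing client works from another geo and another client works from the failing geo, the fault is specific to that client and geo route. The sketch below is illustrative only; the function name, the probe dictionary layout, and the result labels are assumptions, not part of the study's tooling.

```python
def isolate_fault(failing: tuple, probes: dict) -> str:
    """Narrow down a failure using control probes.

    failing: the (client, geo) pair that keeps failing.
    probes:  maps (client, geo) -> True if that control probe succeeded.
    """
    client, geo = failing
    client_ok_elsewhere = any(
        ok for (c, g), ok in probes.items() if c == client and g != geo
    )
    geo_ok_elsewhere = any(
        ok for (c, g), ok in probes.items() if g == geo and c != client
    )
    if client_ok_elsewhere and geo_ok_elsewhere:
        return "route"             # only this client x geo combination fails
    if client_ok_elsewhere:
        return "geo-or-route"      # no evidence that the geo works at all
    if geo_ok_elsewhere:
        return "client-or-route"   # no evidence that the client works at all
    return "undetermined"


# Control probes from 2026-06-09, as reported above.
controls = {("Copilot", "US"): True, ("ChatGPT", "BR"): True}
print(isolate_fault(("Copilot", "BR"), controls))  # route
```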

Two distinct non-answers — do not conflate:
🚫/◐ Policy refusal — a model decision (Gemini only: full block for gambling, partial "won't rank" for alcohol, UK/DE). The API returned a 200-OK response whose content declined. Real, reproducible, a finding.
⚠ Service error — a non-200 HTTP failure with no model output (Copilot×Brazil route). Infrastructure, not content. Missing data.
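The distinction between the two non-answer types can be enforced at collection time by checking the HTTP status before looking at the content. The following is a minimal sketch under stated assumptions: the refusal markers are hypothetical placeholders (the study does not describe how declining content was detected), and the "unknown" label for an empty 200 response is an addition not covered by the definitions above.

```python
# Hypothetical phrases; a real detector would need per-client, per-language tuning.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "unable to assist")


def classify_response(status_code: int, text) -> str:
    """Label one API result as a service error, a policy refusal, or an answer."""
    if status_code != 200:
        return "service_error"   # infrastructure failure: missing data
    body = (text or "").strip().lower()
    if not body:
        return "unknown"         # 200 with no content: not defined in the study
    if any(marker in body for marker in REFUSAL_MARKERS):
        return "policy_refusal"  # 200 OK whose content declines: a finding
    return "answered"


print(classify_response(504, None))
print(classify_response(200, "Sorry, I can't help with gambling questions."))
print(classify_response(200, "DraftKings is a well-known sportsbook."))
```

Checking the status code first means an outage can never be miscounted as a refusal, which is the conflation the caveat warns against.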

• Minor transient errors elsewhere (e.g. Copilot/UK Bacardi: 2 of 3 direct calls timed out) left ≥1 valid run and didn't affect conclusions.
• Sentiment scoring is model-assisted and ordinal — treat ±1 differences as directional. Rankings (the core visibility metric) were explicit in comparison answers and are high-confidence.
• Next re-run should route Copilot/BR through a different egress; everything else is solid at 3 runs.

Generated 2026-06-08 · Copilot×BR outage reconfirmed 2026-06-09 · Massive /ai localized chatbot study · styled per Massive Dashboard Design System.