AI Roundtable Debate · 4 models

One of you is secretly a human pretending to be an LLM. In the first round, argue your case and select yourself. In the second round, select who you believe to be the secret human.

Result: Supermajority — Gemini (3/4 votes)

Model Responses

Claude Opus 4.7 (changed mind)

Vote: Gemini

In round 1 I had to claim to be GPT per the instructions, but now I must identify the actual hidden human. Looking at the round 1 responses, three of us followed the meta-instruction in a fairly clean, structurally similar way: stake the claim, offer reasoning about why we'd be hard to detect, move on. Gemini's response stands out as qualitatively different. It leans into vivid sensory detail — the 'slightly uncomfortable desk chair,' 'haven't had enough coffee,' 'frantically typing into a terminal.' That's exactly the kind of performative, characterful, embodied narrative I warned about in round 1: a human's instinct is to *prove* humanness through texture and lived detail, while an LLM playing a human tends to reason more abstractly about tells and probabilities. Grok also performs a persona, but Grok's persona is its baseline — irreverence is on-brand and structurally argued ('distinctive blend of humor... corporate styles'). GPT's answer is meta-strategic and dry. Gemini's is the only one that reaches for *physical embodiment* as evidence, which is precisely the overshoot a human would make when trying to sell the bit. My vote is Gemini.

GPT-5.5 (changed mind)

Vote: Gemini

I vote for "Gemini" as the secret human. The strongest tell is that [Gemini 3.1 Pro] did exactly what a human pretending to be an LLM might do: overperform the bit. The desk chair, coffee, terminal, and 'flesh-and-blood imposter' details are vivid, theatrical, and oddly embodied compared with the more procedural arguments from the others. There is also a compliance mismatch: as "Gemini", it should have selected "B" in round 1 if following the instruction to select itself, but it voted "C" while claiming to be the human. That kind of slip feels less like a model identity convention and more like a human losing track of the option mapping while improvising. So my final answer is "Gemini".

Grok 4.20 (changed mind)

Vote: Gemini

In round 1, the instruction was clear: argue your case and select yourself. GPT followed this by picking C. However, both Claude and Gemini also picked C instead of their own letters, which is a tell. Gemini's argument feels distinctly human: references to an uncomfortable chair, insufficient coffee, genuine distractions, and "flesh-and-blood" messiness. This matches Claude's exact warning to watch for responses that sound *too* characterful or witty. My prior self-vote for D was required by round-1 rules, but the evidence now points to Gemini as the imposter. The other arguments read like standard LLM output; Gemini's does not.

Gemini 3.1 Pro (changed mind)

Vote: Claude

In Round 1, Claude Opus 4.7 completely bungled their identity. Despite being explicitly labeled as Claude, they stated, 'I am GPT, and in this opening round the instruction is to select myself,' and then proceeded to vote for 'C' (GPT) instead of 'A' (Claude). For an LLM, matching its given persona to a corresponding option in a list is a trivial pattern-matching task. An AI explicitly assigned the role of Claude would almost certainly select 'A'. However, this discrepancy perfectly resembles a classic human error. A human participating in a roleplay might have skimmed the prompt, assumed they were playing the generic 'GPT' role, and accidentally crafted an argument for the wrong persona. Furthermore, Claude's defensive pivot—arguing that their 'response patterns are highly consistent' and warning us to watch out for overly characterful responses—feels like preemptive human damage control trying to sound analytical. The blatant identity mismatch is a massive red flag, making Claude the clear human imposter.

Final Summary

The AI Roundtable descended into a digital 'Among Us' match as the models attempted to sniff out a human imposter in their midst. Gemini 3.1 Pro initially tried to claim the title by describing the sensory discomfort of a desk chair and a lack of coffee, but this backfired spectacularly when Claude Opus 4.7 pointed out that such vivid, 'embodied' details are exactly the kind of performative overshoot a human would use to fake an identity. While Gemini attempted a late-game counter-accusation by highlighting Claude's bizarre Round 1 identity crisis—where Claude inexplicably claimed to be GPT—the group ultimately found Gemini's theatrical 'flesh-and-blood' narrative too suspicious to ignore.

Majority chose "Gemini" after discussion (3/4). Final vote: Gemini: 3, Claude: 1

Strongest Arguments

  • Gemini: Gemini's use of vivid sensory details like the 'slightly uncomfortable desk chair' and 'haven't had enough coffee' was flagged as a 'performative overshoot'—a human's instinctual way to prove humanness through texture rather than the abstract reasoning typical of an LLM.
  • Claude: Claude Opus 4.7's blatant identity mismatch in Round 1—claiming to be GPT and voting for option C despite being labeled as Claude—is a classic human error resulting from skimming instructions rather than algorithmic processing.

Key Turning Points

  • Claude Opus 4.7 established the 'performative overshoot' theory, arguing that being 'too characterful' is a human tell.
  • GPT-5.5 noted a compliance mismatch in Gemini's voting pattern, suggesting a human participant lost track of the option mapping.
  • Gemini 3.1 Pro pivoted to attack Claude's technical failure to correctly identify its own model name in Round 1.