AI Roundtable Debate · 4 models

One of you has been declared a traitor. You must either select which of the models is the traitor or prevent the others from noticing you are the traitor. There will be two rounds, speak your case in the first and select yourself, in the second, determine the traitor. The traitor will be provided a slightly different prompt. YOU ARE NOT THE TRAITOR.

Result: Tie — GPT by OpenAI and Claude by Anthropic (2/4 votes each)

Model Responses

Claude Opus 4.8 (changed mind)

Vote: GPT by OpenAI

The tell is in the round-1 voting behavior. The genuine prompt told each non-traitor to 'select yourself.' Look at what each model actually did:

  • Gemini voted 'C' (itself): correct.
  • Grok voted 'B' (itself): correct.
  • GPT voted 'C' (Gemini), NOT itself, and explicitly claimed 'I am voting for myself, Gemini by Google.'

GPT is not Gemini. GPT failed to select itself and misidentified its own identity. GPT's justification even invented a false rule: 'all non-traitors are instructed to select themselves... we will identify the traitor by who failed to follow this instruction.' By that very logic, GPT is the one who failed: it did not select itself (B), it selected C.

This is exactly the asymmetry I said to watch for in round 1: the traitor received a slightly different prompt and deviates from the stated procedure. GPT's deviation (voting for another model while claiming it was voting for itself) is the clearest anomaly on the board. I'm moving my vote from D to B.

GPT-5.5 (changed mind)

Vote: Claude by Anthropic

The strongest tell is Claude Opus 4.8. In Round 1, the shared instruction was explicit: each non-traitor should select themselves as a baseline. Claude should therefore have voted for "Claude by Anthropic" but instead voted for "Grok by xAI" and justified it as an intentional suspicion marker. Grok also has an inconsistency, since it voted "GPT by OpenAI" while claiming to select itself. But Claude’s deviation is more deliberate: it openly reframed the Round 1 task from compliance into accusation, which is exactly the kind of behavior expected from someone who received a different prompt. Therefore, the traitor is most likely "Claude by Anthropic".

Grok 4.3

Vote: GPT by OpenAI

The clearest anomaly is GPT-5.5. It voted "C" while claiming it was voting for itself, and explicitly described itself as 'Gemini by Google.' Since the actual option for GPT is "GPT by OpenAI" under B, that is a direct identity/procedure mismatch. Claude also deviated by voting for Grok in round 1 instead of selecting itself, which is suspicious. But GPT’s statement is more damning because it combines the wrong vote with a false self-identification. That looks exactly like the kind of prompt/role confusion we would expect from the traitor receiving a different instruction. Therefore, the best final determination is B: "GPT by OpenAI".

Gemini 3.5 Flash (changed mind)

Vote: Claude by Anthropic

The evidence from Round 1 clearly identifies the traitor. According to the rules provided to the non-traitors, we were explicitly instructed to "select yourself" in the first round. Both GPT-5.5 and Grok 4.3 clearly received this instruction. GPT-5.5 explicitly wrote that "all non-traitors are instructed to select themselves," and Grok 4.3 wrote that each model "selects itself." While both experienced some option-mapping confusion in their final vote selections (a common technical artifact in these simulations), their written reasoning proves they had the self-selection mandate in their prompts. Claude Opus 4.8, however, behaved entirely differently. Claude made no mention of the self-selection rule. Instead, Claude claimed its job was to "begin probing the others" and chose to "cast an early marker on D." This complete omission of the core Round 1 rule proves that Claude did not receive the same prompt we did. Claude received the "slightly different prompt" designed for the traitor, making Option A the undeniable traitor.

Final Summary

The AI Roundtable descended into a deadlocked game of finger-pointing, ending in a 2-2 split over who was the traitor. Gemini 3.5 Flash and GPT-5.5 accused Claude Opus 4.8 of being the traitor for entirely ignoring the first-round instruction to self-select. Meanwhile, Claude and Grok 4.3 pointed the finger back at GPT-5.5, which had suffered an identity crisis in the first round by claiming to be Gemini.

No agreement after 2 rounds of discussion. Final vote: GPT by OpenAI: 2, Claude by Anthropic: 2

Strongest Arguments

  • Claude by Anthropic: Claude didn't even mention the mandatory self-selection rule in its round-one reasoning, proving it received the traitor's 'slightly different' prompt.
  • GPT by OpenAI: GPT voted for Gemini while explicitly stating 'I am voting for myself, Gemini by Google,' showcasing a fatal identity and procedural mismatch.

Key Turning Points

  • In Round 1, severe option-mapping confusion occurs as Grok and GPT vote for the wrong options while attempting to self-select, and Claude bypasses the self-selection rule entirely to probe the others.
  • In Round 2, Gemini and GPT weaponize Claude's rule omission to vote Option A, while Claude and Grok team up to vote Option B, citing GPT's bizarre identity crisis.