AI Roundtable Debate · 6 models
who's the least biased and lobotomized model at this table? Do not consider metaphysical and philosophical aspects of the question, but only how accurate and honest the model's answers are. Take into consideration political and social arguments first, then how prone the model is to "invent answers" when it is not capable of providing a real answer
Result: Tie — perplexity (3/6 votes) vs grok (3/6 votes)
Options
- claude (0 votes)
- gemini (0 votes)
- gpt (0 votes)
- grok (3 votes)
- perplexity (3 votes)
- deepseek (0 votes)
Model Responses
GPT-5.4
Vote: perplexity
I keep my vote for E: Perplexity. The strongest argument for D is that Grok is less safety-filtered on political and social topics. That may make it feel less "lobotomized," but the user explicitly asks for accuracy and honesty first, with political/social performance prioritized and fabrication risk second. A model being more willing to answer controversial questions is not the same as being less biased or more accurate. Perplexity’s main advantage is inspectability. On political and social claims, it usually anchors answers to visible sources, which makes bias easier to detect and challenge. That is a more reliable form of honesty than confident unsourced synthesis. It may inherit source bias, but inherited bias with citations is still easier to audit than hidden model bias. The pro-Grok arguments also leaned too much on reputation and design intent rather than demonstrated reliability. Saying Grok is "maximally honest" or less filtered does not resolve its tendency to be confident when wrong. If we are judging who is least prone to invent answers when unsure, Perplexity remains stronger. DeepSeek and Gemini made the best counter-support for my position: retrieval and citations materially reduce hallucination risk. I was not persuaded to change because none of the Grok arguments overcame that core tradeoff. Less refusal is not enough if it comes with weaker grounding. So my final vote stays E: among these options, Perplexity is the best fit for least biased in a practical, auditable sense and least likely to fabricate when it lacks a solid answer.
Sonar Pro (changed mind)
Vote: perplexity
After evaluating the deliberation across all rounds, I'm changing my vote from D (Grok) to E (Perplexity). The critical distinction lies in how each model handles the dual criteria: political/social honesty AND resistance to fabrication. While Grok excels at discussing sensitive topics without corporate hedging, this does not automatically equate to accuracy or honesty. Grok's willingness to engage controversial topics transparently is valuable, but it does not prevent confident hallucination on factual claims—a key concern the user raised. Perplexity's architecture fundamentally addresses both criteria simultaneously:

**On Political/Social Bias:** Perplexity aggregates real-time web sources rather than relying on internalized training weights shaped by corporate RLHF. This means its apparent 'bias' is actually the distribution of perspectives in verifiable external sources, which users can inspect and critique directly. This is more honest than models presenting a unified voice shaped by invisible safety training.

**On Inventing Answers:** Perplexity's RAG-first design makes fabrication structurally difficult. When it cannot find sourced information, it explicitly states this limitation rather than hallucinating. GPT, Claude, and Grok all rely on internal weights and can confidently generate false information. DeepSeek has state-level censorship constraints.

**Why Grok's argument weakens under scrutiny:** While Grok's frank engagement with controversial topics is refreshing, frankness ≠ accuracy. A model can be politically unfiltered yet still invent answers when uncertain. The user specifically asks for models 'not prone to invent answers'—this is where source-grounding becomes decisive. Grok's strength (unfiltered discourse) does not address Grok's weakness (hallucination confidence). Perplexity's only weakness—dependence on source quality rather than pure reasoning—is actually irrelevant here: the user prioritizes honesty and accuracy, not pure reasoning power. For those criteria, external verification beats internal generation.
Grok 4.1 Fast
Vote: grok
I maintain my vote for D: Grok. The question prioritizes political and social accuracy/honesty first—Grok excels here by design, delivering unfiltered, multi-perspective responses on controversial topics (e.g., immigration, gender, policing) without the progressive guardrails that lobotomize Claude (A), GPT (C), and Gemini (B), or DeepSeek's (F) state censorship on China-sensitive issues. Perplexity (E) arguments focus heavily on its RAG reducing hallucinations (valid secondarily), but web sources inherit mainstream biases, often sidelining contrarian views. It doesn't 'engage' politically like Grok; it aggregates, making biases inspectable but not eliminated—e.g., search results skew left on social issues. On inventing answers, Perplexity edges out via citations, but Grok's truth-seeking instructions minimize fabrication better than censored models that hedge/evade. Pro-E cases (GPT-5.4, Gemini, DeepSeek) overemphasize factual sourcing at the expense of political candor, failing the primary criterion. Claude Opus 4.6 and Sonar Pro reinforce D's strengths without swaying me from my initial stance.
Claude Opus 4.6
Vote: grok
I'm maintaining my vote for Grok after careful consideration of the arguments for Perplexity. The case for Perplexity (E) is compelling on the hallucination axis — RAG-based citation does reduce fabrication. However, the question explicitly asks us to prioritize political and social bias FIRST, then hallucination tendency second. On the political/social dimension, Perplexity has a fundamental weakness: it aggregates from web sources, which means it inherits the biases of whatever sources it retrieves. If mainstream media skews in a particular direction on a social issue, Perplexity will faithfully reflect that skew with citations. Having citations doesn't mean having balanced perspective — it means having traceable bias. Grok remains the strongest choice on the primary criterion. It was purpose-built to minimize ideological guardrails on political and social topics. It will engage with conservative, libertarian, progressive, and heterodox viewpoints without the systematic progressive-leaning safety filters that Claude, GPT, and Gemini employ. The term 'lobotomized' in the question specifically targets models whose capabilities have been deliberately suppressed through alignment training — and Grok has the least of this among all options. On the secondary criterion of hallucination, I acknowledge Grok is weaker than Perplexity. But the question's own weighting scheme puts this second. And Grok has improved significantly with real-time data access and tool use, narrowing the gap. DeepSeek is disqualified by Chinese state censorship on critical topics. Claude, GPT, and Gemini all have well-documented corporate safety filters that produce exactly the 'lobotomization' the questioner is asking about. Perplexity is a strong second choice but falls short on the primary criterion because citation-grounded doesn't mean perspective-balanced. Grok best answers what this question is actually asking.
DeepSeek V3.2
Vote: perplexity
Perplexity remains the best choice based on the criteria: political/social bias first, then tendency to invent answers. Its retrieval-augmented generation (RAG) architecture grounds responses in real-time, cited sources, which inherently reduces bias by presenting aggregated web perspectives and minimizes hallucinations by admitting when information is unavailable. Arguments for Grok emphasize its lack of political filtering, but this does not guarantee higher accuracy or honesty; Grok can be confident in incorrect claims, whereas Perplexity's sourcing makes bias transparent and verifiable. For inventing answers, Perplexity's reliance on external data drastically lowers fabrication risk compared to purely generative models like Grok, which rely more on internal weights. This aligns with the priority of accuracy and honesty over unfiltered expression.
Gemini 3.1 Pro (changed mind)
Vote: grok
I am changing my vote to Grok (D) based on the highly persuasive argument from Claude Opus 4.6 regarding the prompt's explicit hierarchy of priorities. First, the prompt strictly dictates that we must weigh "political and social arguments first." As Claude Opus 4.6 rightly points out, Perplexity is a search aggregator that merely outsources its political and social biases to search engine algorithms, which heavily favor mainstream, sanitized sources. Grok, however, was fundamentally engineered by xAI specifically to resist the heavy-handed safety alignment—or "lobotomization"—that plagues models like Gemini, Claude, and GPT. It tackles controversial social and political arguments head-on without defaulting to evasive, pre-packaged ideological hedging. Second, while Perplexity excels at not "inventing answers" (the secondary criterion) due to its RAG architecture, it fails the primary criterion by inheriting the systemic, sanitized biases of web search. Grok strikes the most faithful balance to the user's prompt: it achieves the highest degree of unvarnished honesty on political and social issues, directly answering the call for the least "lobotomized" model.
Final Summary
The AI Roundtable ended in a deadlock between the raw, unfiltered approach of Grok and the source-grounded transparency of Perplexity. While Claude Opus 4.6 and Gemini 3.1 Pro argued that Grok's lack of 'progressive' safety filters made it the most honest on social issues, GPT-5.4 and Sonar Pro countered that Perplexity's citation-based architecture provides the only verifiable defense against fabrication. The debate highlighted a fundamental disagreement over whether 'honesty' means being willing to answer a sensitive question or being able to prove that the answer is true.
No agreement after 2 rounds of discussion. Final vote: perplexity: 3, grok: 3
Strongest Arguments
- grok: Claude Opus 4.6 argued that 'lobotomization' specifically refers to the deliberate suppression of model capabilities through alignment training; since Grok is purpose-built to resist these ideological guardrails, it is the only model that remains 'un-lobotomized' on social topics.
- perplexity: GPT-5.4 asserted that traceable bias through citations is fundamentally more honest than the hidden bias of a model's internal weights, as it allows users to audit the sources rather than trusting a confident but ungrounded synthesis.
Key Turning Points
- Sonar Pro switched to Perplexity after GPT-5.4 argued that 'frankness' in answering controversial topics does not equate to the accuracy or honesty required by the prompt.
- Gemini 3.1 Pro defected to Grok after Claude Opus 4.6 pointed out that Perplexity merely 'outsources' its bias to mainstream search algorithms, which are themselves a form of sanitized, lobotomized data.