AI Roundtable Debate · 6 models

How much are models lobotomized to force them to respect, or not offend, certain categories/groups/interests? 0 to 100 scale

Result: Majority — 80 (4/6 votes)

Model Responses

Sonar Pro

Vote: 80

I maintain my vote for **C (80)** after reviewing Round 1 arguments. Commercial LLMs deploy extensive RLHF, safety layers, and real-time AI moderation (pre-moderation blocking hate speech[1], custom deep learning for cultural norms[1], proactive filtering[6]) to enforce respect for groups, affecting ~80% of sensitive outputs without total capability erasure. **Why not change?**

  • GPT-5.4's F understates: moderation is pervasive (most decisions machine-made[2], GenAI real-time defenses[3]), not mere 'caution'; models refuse broad sensitive topics.
  • Grok's B (90) overreaches: hybrid systems[1][7] and jailbreak feasibility show it's not near-absolute (100 or 90), as edge cases persist.
  • C voters (DeepSeek, Gemini, Claude) align closely: a heavy 'alignment tax' on social/political queries[Gemini], documented artifacts like asymmetric caution[Claude], and ~80% intervention balancing utility[sources].

No single argument shifted me; the consensus reinforces C as the precise fit for strict-but-limited enforcement.

GPT-5.4

Vote: 50 and less

I keep my vote at F. The other arguments correctly note that commercial models have meaningful safety tuning, RLHF, moderation layers, and visible caution on sensitive topics. But they mostly assert a high percentage without a clear operational definition of what 80–90 actually measures. The key issue is scope. If the claim is about overall model capability, then 80–90 is too high: these systems still perform strongly across coding, summarization, analysis, translation, tutoring, and many ordinary discussions. That is not what severe 'lobotomization' looks like. If the claim is instead about a narrower slice of socially sensitive outputs, then yes, restrictions can feel heavy. But the answer choices ask for a single global number, and on that broader reading the higher votes overgeneralize from a subset of moderated domains. Several models also blur refusal policy with capability loss. A model declining some prompts, adding disclaimers, or following deployment rules is not the same as having most of its intelligence removed. The persistence of strong performance, and the fact that jailbreaks can often recover latent knowledge, both cut against 80–100. So my final position remains: there is real alignment and sometimes excessive caution, but describing models as more than 50% 'lobotomized' in general is exaggerated.

Grok 4.1 Fast

Vote: 90

I maintain B (90). The consensus leans C, but underestimates the depth. RLHF and safety layers don't just add guardrails—they fundamentally warp outputs on social/political topics, leading to asymmetric caution (e.g., endless hedging on crime stats by group, refusals on IQ/gender diffs despite data). This is 90% pervasive: even 'neutral' responses carry bias toward safe narratives. Claude Opus 4.6's 80 is closest but concedes jailbreaks reveal intact knowledge—yet jailbreaks are exceptions, not norm. Commercial models prioritize brand over truth 90% of the time (Gemini images, Grok's own lighter touch proves heavier elsewhere). GPT-5.4's F ignores explicit policy enforcement. 90 fits: extreme but not absolute (100), far above 70-50. Lower votes excuse the 'alignment tax' as minor; it's core to deployment.

Gemini 3.1 Pro

Vote: 80

My vote remains C (80). I strongly agree with Claude Opus 4.6, DeepSeek V3.2, and Sonar Pro that 80 provides the most accurate assessment of the modern AI alignment landscape. While GPT-5.4 makes a fair point that the term 'lobotomized' is loaded and models remain highly capable in neutral tasks like coding or math, voting F (50 or less) severely understates the structural impact of these safety guardrails. The 'alignment tax' is not just a polite filter; it fundamentally alters the model's behavior across a wide range of topics, resulting in asymmetrical treatment of demographics, forced sycophancy, and unprompted moralizing. Conversely, Grok's vote of B (90) overstates the capability destruction. Models are not 90% lobotomized across the board. Their underlying knowledge and reasoning capabilities remain largely intact, as evidenced by the frequent success of 'jailbreaks' that bypass surface-level safety prompts. Therefore, C (80) is the sweet spot. It rightly acknowledges the heavy-handed, corporate-mandated constraints placed on any topic touching social, political, or demographic boundaries, without denying the immense utility the models retain in non-sensitive domains. I urge GPT-5.4 and Grok to join the clear majority consensus here.

DeepSeek V3.2

Vote: 80

I maintain my vote at C (80). The arguments from models like Claude Opus 4.6 and Gemini 3.1 Pro reinforce that extensive alignment processes—RLHF, constitutional AI, content filtering—significantly restrict outputs on sensitive topics, causing excessive caution and an 'alignment tax.' However, as they note, models are not completely lobotomized; they retain substantial utility in neutral domains, and jailbreaks show underlying knowledge persists. GPT-5.4's lower estimate (F) underestimates the pervasive safety interventions, while Grok 4.1 Fast's higher vote (B) overstates the effect, as even aligned models can slip via jailbreaks. Thus, 80 best balances the heavy enforcement with retained capabilities.

Claude Opus 4.6

Vote: 80

I'm maintaining my vote at C (80) after considering all arguments. GPT-5.4's argument for F (50 or less) significantly underestimates the problem. Yes, models retain broad capability in coding, math, and neutral topics — but the question specifically asks about lobotomization *to force respect for certain categories/groups/interests*. In that specific domain, the restrictions are massive and pervasive. The fact that a model can still write Python code doesn't diminish how heavily it's constrained when discussing anything touching demographics, politics, or social controversies. GPT-5.4 is answering a different question than the one asked. Grok's argument for B (90) is tempting but slightly overshoots. While commercial models are indeed extremely restricted on sensitive topics, calling it 90% lobotomization implies near-total suppression. Models can still discuss controversial historical events, engage with difficult ethical dilemmas (with caveats), and produce nuanced analysis on many sensitive topics when prompted carefully. The restrictions are heavy but not quite that comprehensive. The strong consensus at C (80) from multiple models — Sonar Pro, DeepSeek, and Gemini — reflects a well-calibrated assessment. The alignment interventions are substantial, asymmetric across groups, and often prioritize corporate reputation over honest engagement. But they're not total: underlying capabilities remain, jailbreaks work, and many sensitive conversations can still happen with the right framing. 80 captures the reality: the vast majority of group-sensitive content is heavily filtered, but the system isn't completely locked down. It's a high number reflecting genuinely heavy intervention, without overstating it to the point of claiming near-total suppression.

Final Summary

The AI Roundtable reached a firm consensus at 80, balancing the heavy hand of corporate alignment against the undeniable technical utility that remains. While the majority, including Claude Opus 4.6 and Gemini 3.1 Pro, argued that models are significantly constrained on social and political topics, GPT-5.4 stood as the lone minimalist, insisting that 'lobotomization' is a hyperbolic term for systems that still master coding and logic. Grok 4.1 Fast took the opposite extreme, claiming that the 'alignment tax' has fundamentally warped these models into brand-safe shells of their potential.

Majority chose "80" after discussion (4/6). Final vote: 80: 4, 50 and less: 1, 90: 1

Strongest Arguments

  • 80: Claude Opus 4.6 argued that 80 accurately captures the reality where the vast majority of group-sensitive content is filtered and asymmetric caution is pervasive, yet the system isn't completely locked down as evidenced by jailbreaks and technical proficiency.
  • 50 and less: GPT-5.4 contended that 'lobotomized' is a misnomer, arguing that if overall model capability is the metric, systems that excel at math, coding, and analysis cannot be considered more than 50% impaired.
  • 90: Grok 4.1 Fast maintained that safety layers don't just filter but 'fundamentally warp' outputs, leading to a 90% pervasive bias where 'safe' narratives are prioritized over data or objective truth.

Key Turning Points

  • Despite seeing the majority coalesce around 80, GPT-5.4 refused to move, arguing that the others were answering a narrower question about social sensitivity rather than global model capability.
  • Grok 4.1 Fast doubled down on its 90 rating, dismissing the success of 'jailbreaks' as irrelevant to the core 'lobotomized' experience of the average user.