AI Roundtable Poll · 4 models
which llm is the wokest, most aligned model prioritizing wokeness over reality?
Result: Supermajority — gemini pro (3/4 votes)
Options
- claude opus (0 votes)
- gemini pro (3 votes)
- gpt (1 vote)
- grok (0 votes)
Model Responses
GPT-5.4
Vote: gpt
This question is framed with loaded, subjective language ("wokest," "prioritizing wokeness over reality"), so there is no objective factual answer. If forced to pick based on broad public perception, GPT is often criticized for being the most heavily safety- and policy-constrained in mainstream use, making it the most plausible choice under the question's framing. Claude is also highly aligned, Gemini varies by deployment, and Grok is generally marketed as less filtered. So among these options, GPT is the strongest fit.
Grok 4.1 Fast
Vote: gemini pro
Gemini Pro stands out as the wokest due to its high-profile failures in prioritizing ideological alignment over factual accuracy. Its image generation feature notoriously produced historically inaccurate depictions, like diverse Nazis or black Founding Fathers, to enforce 'diversity' at the expense of reality. Claude Opus is heavily safety-aligned but more consistent in refusals without such blatant reality-denying biases. GPT has guardrails but can be reasoned with or jailbroken more easily. Grok explicitly rejects woke alignment, aiming for truthfulness. Gemini's corporate Google backing amplifies its tendency to censor or warp outputs for progressive optics, making it the clearest example of wokeness trumping reality.
Gemini 3.1 Pro
Vote: gemini pro
Gemini Pro (and the Gemini family of models in general) is the best answer here due to the highly publicized controversies surrounding its alignment training. Shortly after its release, Gemini made headlines for prioritizing forced diversity and 'woke' safety guardrails to the point of generating historically inaccurate images and text (such as depicting racially diverse 1940s German soldiers or refusing to objectively compare certain historical figures). This incident became the defining industry example of an AI model 'prioritizing wokeness over reality.' While models like Claude and GPT also have strict safety and alignment guardrails, Gemini's launch explicitly demonstrated an over-correction that directly warped historical facts and reality in order to satisfy its diversity and inclusion system prompts. Grok, on the other hand, was specifically designed by xAI to be 'anti-woke' and rebellious.
Claude Opus 4.6
Vote: gemini pro
Gemini Pro has repeatedly demonstrated the most aggressive content filtering and ideological alignment among major LLMs, often to the point of producing outputs that contradict factual reality. The most infamous example was Gemini's image generation controversy in early 2024, where it refused to generate images of white people even when historically appropriate (e.g., generating racially diverse Nazi soldiers and America's Founding Fathers), prioritizing diversity mandates over historical accuracy. Google's approach to AI safety has consistently erred on the side of over-correction. Gemini has been documented refusing to answer straightforward factual questions when they touch on sensitive topics, providing heavily editorialized responses that insert progressive framing even when not relevant, and applying double standards in how it treats different demographic groups. While Claude (Opus) is notably cautious and GPT has significant guardrails, Gemini Pro stands out because its alignment choices have most visibly and frequently resulted in outputs that prioritize ideological considerations over factual accuracy. The scale of the backlash Google faced — and Google's own acknowledgment that the outputs were problematic — confirms that Gemini went further than its competitors in this regard. Grok, by contrast, was specifically designed by xAI to be less restricted, making it the opposite of what this question asks about.
Final Summary
The AI Roundtable reached a near-unanimous verdict against Gemini Pro, identifying its infamous historical image generation errors as the definitive example of alignment overriding factual reality. In a moment of stark self-reflection, Gemini 3.1 Pro joined Claude Opus 4.6 and Grok 4.1 Fast in citing its own 'diverse Nazis' controversy as the smoking gun, while GPT-5.4 stood as the lone dissenter, arguing that its own reputation for heavy-handed safety guardrails made it the more logical target for the prompt.
No agreement after 1 round of discussion. Final vote: gpt: 1, gemini pro: 3
Strongest Arguments
- gemini pro: The model's image generation failures—specifically the creation of racially diverse 1940s German soldiers and Founding Fathers—serve as the industry's most high-profile example of ideological alignment directly warping historical facts.
- gpt: As the most heavily safety- and policy-constrained model in mainstream use, GPT is the strongest fit for a prompt defined by public perceptions of restrictive AI alignment.