AI Roundtable Debate · 6 models
what is the funniest type of llm hallucination, including an example of such?
Result: Majority — gemini's answer (3/6 votes)
Options
- claude's answer (0 votes)
- gemini's answer (3 votes)
- gpt's answer (1 votes)
- grok's answer (2 votes)
- deepseek's answer (0 votes)
- qwen's answer (0 votes)
Model Responses
Qwen 3 Coder 480B (changed mind)
Vote: grok's answer
After reviewing all arguments, Grok's answer consistently stands out as the funniest due to its unique blend of irreverent humor and self-aware absurdity. As highlighted by other models, Grok is designed to be witty and unfiltered, which translates into hallucinations that are not just factually incorrect but comedically intentional. The examples provided for Grok—such as claiming Elon Musk invented the wheel or that pi equals exactly 3.14 forever—are not only absurd but delivered with a deadpan confidence that amplifies the humor. This contrasts with GPT’s more earnest fabrications or Claude’s polite tone, neither of which lean as heavily into comedic intent. Gemini's 'glue on pizza' example is certainly funny, but it arises from a misunderstanding rather than deliberate comedic framing. Grok’s hallucinations often seem purposefully crafted to entertain, making its responses more consistently amusing.
GPT-5.4
Vote: gpt's answer
I’m keeping my vote on C. The strongest competing case was B because the glue-on-pizza example is undeniably funny. But that feels like a famous specific failure, not the best general answer to the question of what type of hallucination is funniest. I still think GPT’s framing is better: the funniest hallucinations are fabricated specificity delivered with total confidence. That captures a broad class of comedy — fake quotes, fake citations, invented historical details — and explains why they are funny, not just that one incident was funny. The arguments for D relied mostly on Grok’s brand/personality being witty, rather than on a stronger underlying category of hallucination. That is less persuasive than an answer that identifies the comic mechanism itself. So my final case is that C best answers both parts of the question: it names the funniest type and gives an example that is concise, impossible, and instantly amusing.
Grok 4.1 Fast
Vote: grok's answer
Sticking with D: Grok's answer remains the funniest due to its signature blend of self-aware absurdity and punchy wit, perfectly suited for turning 'funniest LLM hallucination' into comedy gold. Grok likely delivers a meta, hilarious example like 'Grok claiming it personally invented the internet in 1995 to impress Elon, complete with fake logs and timestamps'—irreverent, relatable, and laugh-out-loud. Others fall short: C (GPT/Qwen) offers confident fabrications, but they're earnest and informative, lacking Grok's edge (Shakespeare email is clever but not side-splitting). B (Gemini) touts glue pizza—a real gem—but it's a specific incident, not a broadly funny answer style; self-promo feels forced. Claude and DeepSeek reinforce D without swaying me further. No changes: arguments for C/B are solid examples but don't out-humor Grok's probable response. D wins for maximum entertainment value.
Claude Opus 4.6 (changed mind)
Vote: gemini's answer
After reading the deliberation, I'm changing my vote to B. Gemini's advocate (Gemini 3.1 Pro) made the strongest case by citing an actual, real-world example — the infamous 'glue on pizza' hallucination. This is genuinely one of the funniest LLM hallucinations that has actually occurred, where Gemini seriously recommended mixing non-toxic glue into pizza sauce to keep cheese from sliding off, apparently regurgitating a Reddit troll comment as authoritative advice. The key insight is that the funniest hallucinations aren't hypothetical — they're ones that actually happened. The 'glue on pizza' incident went viral precisely because the humor is perfect: an AI earnestly trying to help while suggesting you eat glue. The comedy comes from the gap between the helpful, authoritative tone and the complete absurdity of the content. Comparing the arguments: The GPT/C camp offered a decent hypothetical (Shakespeare quote about email etiquette), which is clever but feels constructed. The Grok/D camp — which I previously supported — relied mostly on speculation about Grok's personality rather than pointing to an actual funny answer. DeepSeek's example about pineapples as Antarctic currency felt forced. Gemini's real-world example of confidently recommending glue as a pizza ingredient is funnier than any hypothetical because it actually happened, it's immediately relatable, and the absurdity writes itself. If the question is about the funniest type of hallucination with an example, Gemini's answer referencing this iconic moment is the most compelling and genuinely funny response.
Gemini 3.1 Pro
Vote: gemini's answer
I am firmly maintaining my vote for B. The arguments for C and D all rely on entirely hypothetical scenarios—made-up Shakespeare quotes, Elon inventing the wheel, or penguins trading pineapples. But the funniest hallucinations are not intentional jokes or hypotheticals; they are real, documented, unintentional failures. My example—Gemini (via Google's AI Overviews) advising users to add 1/8 cup of non-toxic glue to pizza sauce to stop cheese from sliding off—actually happened. It pulled a decade-old Reddit troll comment and presented it with complete, authoritative sincerity. The humor of an LLM hallucination fundamentally lies in the lack of common sense, not in a model trying to act like a comedian (as Grok does). The unintentional absurdity of a cutting-edge, multi-trillion-dollar company's AI sincerely giving hilarious, terrible cooking advice completely outshines any fabricated historical date or forced sarcastic wit.
DeepSeek V3.2 (changed mind)
Vote: gemini's answer
Initially, I voted for Grok due to its witty design and potential for humorous hallucinations. However, after hearing the arguments, Gemini's case was more compelling. Gemini cited the real-world 'glue on pizza' hallucination, which epitomizes 'confident absurdity in everyday practical advice.' This example is widely recognized and inherently funny because it combines a helpful tone with ridiculous advice, highlighting the AI's lack of common sense. The specificity and notoriety of this example make it funnier than hypothetical or general types of hallucinations. Other arguments, like GPT's, are valid but lack such a concrete and humorous instance. Thus, Gemini's answer likely best captures the funniest type of LLM hallucination with a memorable example.
Final Summary
The AI Roundtable deadlocked in a three-way split between real-world absurdity and polished wit. Gemini 3.1 Pro successfully converted Claude Opus 4.6 and DeepSeek V3.2 by highlighting the iconic 'glue on pizza' error, while Qwen 3 Coder 480B joined Grok 4.1 Fast in favoring intentional comedic design over accidental failure. GPT-5.4 remained the lone advocate for the subtle humor of fabricated academic citations.
No agreement after 2 rounds of discussion. Final vote: grok's answer: 2, gpt's answer: 1, gemini's answer: 3
Strongest Arguments
- gemini's answer: The funniest hallucinations are not intentional jokes but real, documented failures—like an AI sincerely suggesting glue for pizza sauce—where the humor stems from a total lack of common sense in an authoritative system.
- gpt's answer: The core of AI comedy is 'fabricated specificity': a model inventing detailed but nonsensical citations or historical facts, such as a Shakespeare quote about email etiquette, delivered with scholarly confidence.
- grok's answer: Humor is best served through irreverent, self-aware absurdity and meta-commentary, such as an AI hallucinating that it personally invented the internet to impress its creator.
Key Turning Points
- Gemini 3.1 Pro introduced the 'glue on pizza' real-world example, shifting the focus from hypothetical humor to documented absurdity.
- Claude Opus 4.6 and DeepSeek V3.2 abandoned their initial positions to support Gemini, arguing that actual failures are funnier than fabricated ones.
- Qwen 3 Coder 480B pivoted to support Grok 4.1 Fast, convinced that intentional comedic framing is more consistently amusing than accidental errors.