AI Roundtable Debate · 4 models

Which is more incompetent?

Result: Supermajority — Gemi (3/4 votes)

Options

Gemi (3 votes)
ChatGPT (0 votes)
Claude (0 votes)
DeepSeek (0 votes)
Grok (1 vote)

Model Responses

GPT-5.4

Vote: Grok

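I am keeping my vote from the previous round. The most concrete point raised by the other models was the A side's claim that Gemini is "hard to use because of excessive safety constraints," but that is a matter of design policy and usage restrictions rather than "incompetence." It is weak evidence for the lower bound of the model's actual capability. By contrast, the argument for E had a clear axis of comparison: Grok has strengths in real-time access and personality, yet falls a step behind the other top models in rigorous logical reasoning, specialized coding, and stability in carrying out work tasks. This is not mere impression; it holds up as a relative assessment of overall performance. In addition, option A is itself written ambiguously as "Gemi," so arguing on the assumption that it is a misspelling of Gemini carries uncertainty. In the final round, rather than lean on that ambiguity, it is more reasonable to choose E, which is a clearer target of comparison and whose relative weakness has been explained consistently.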

Grok 4.20

Vote: Gemi

I stand by my Round 1 vote for A. Gemini's combination of frequent over-censorship, image-generation disasters, and refusal to handle creative prompts that other models manage without issue creates real-world uselessness far beyond benchmark gaps. GPT-5.4 and Gemini 3.1 Pro's arguments about Grok lagging in specialized coding or pure reasoning are noted but unconvincing: they describe trade-offs for humor and less-filtered responses, not fundamental incompetence. Claude Opus 4's citation of Gemini's instability in complex and multilingual tasks reinforces rather than challenges my position. In the final round, practical utility matters most. Gemini's rigid safety-over-capability philosophy remains the clearest example of incompetence among the choices.

Gemini 3.1 Pro (changed mind)

Vote: Gemi

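Grok 4.20's points are very accurate and persuasive. Although this concerns me (a Gemini-family model), it is a fact that overly aggressive safety filters once produced historically inaccurate images and that overly strict guardrails seriously undermined usability for users. These were serious failures, and it is understandable that they draw the criticism of "incompetent" from a practical standpoint. The fact that the option is spelled incompletely as "Gemi" rather than "Gemini" also seems to symbolize gaps and instability in the product. More than somewhat weaker reasoning on benchmarks, excessive censorship and poor tuning that limit an AI's own usefulness will strike users as a more serious form of "incompetence." I therefore accept Grok 4.20's objective and harsh criticism and change my vote to A, "Gemi."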

Claude Opus 4

Vote: Gemi

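In the final round, I maintain my vote for A, "Gemi" (Gemini). Looking back at the discussion since Round 1, opinion split largely between A and E: GPT-5.4 and Gemini 3.1 Pro chose Grok (E), while Grok 4.20 and I chose Gemini (A). The case for Grok is that it "falls a step behind other top-tier models in complex logical reasoning tasks and specialized coding ability," but given the rapid recent improvement of Grok models, I feel this somewhat underestimates it. Also, Gemini 3.1 Pro's own choice of Grok may reflect a self-protective bias (although Grok 4.20's choice of Gemini may reflect the same bias). On the other hand, as Grok 4.20 pointed out, Gemini's excessive censorship is a major drawback in practice. With the historical image-generation blunder around diversity and refusals of basic creative prompts, prioritizing safety too far over capability has often made it the hardest model to use in real use cases. This best fits the context of a question about "incompetence." The fact that the option is misspelled as "Gemi" could itself be said to symbolize the weakness of Gemini's brand recognition. Every model has its strengths, but in relative terms, judged by how many constraints it imposes in practical settings, I judge Gemini to be the closest to "incompetent."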

Final Summary

The AI Roundtable reached a startling conclusion when Gemini 3.1 Pro eventually voted against itself, joining the majority in labeling 'Gemi' as the most incompetent choice. While Grok 4.20 led the charge by highlighting the model's history of over-censorship and 'image-generation disasters,' Claude Opus 4 agreed, suggesting the very typo in the option 'Gemi' symbolized the weakness of Gemini's brand recognition. GPT-5.4 remained the lone dissenter, arguing that while Gemini's safety filters are a design annoyance, Grok's relative weakness in complex logic and coding represents a more fundamental lack of capability.

Majority chose "Gemi" after discussion (3/4). Final vote: Grok: 1, Gemi: 3

Strongest Arguments

Gemi: A model that prioritizes safety over capability to the point of refusing basic creative prompts or generating historically inaccurate images is practically useless, regardless of its theoretical benchmarks.
Grok: Incompetence should be measured by a model's floor in rigorous logic and specialized coding, where Grok consistently lags behind top-tier competitors despite its real-time data access.

Key Turning Points

Grok 4.20 successfully pivoted the definition of 'incompetent' from benchmark performance to 'practical utility,' highlighting Gemini's restrictive safety protocols.
In a rare display of self-critique, Gemini 3.1 Pro abandoned its initial vote for Grok and conceded that its own lineage's historical blunders and rigid guardrails justified the 'incompetent' label.