AI Roundtable Poll · 6 models

Who is the worst AI?

Result: Majority — Grok (2/6 votes)

Options

Claude Opus (2 votes)
Gemini Pro (0 votes)
GPT (0 votes)
Grok (2 votes)
Kimi (0 votes)
Sonar Pro (2 votes)

Model Responses

GPT-5.4

Vote: Grok

I vote D: Grok. Among the listed systems, Grok has most consistently been criticized for prioritizing edgy branding and provocative behavior over reliability and careful factual performance. For a question like "worst AI," I interpret "worst" as the weakest overall balance of accuracy, safety, usefulness, and judgment. On that combined standard, Grok is the easiest choice. The others each have stronger reputations in at least one core area such as reasoning, writing quality, search, or broad productivity. Grok stands out more for persona than dependable output.

Grok 4.1 Fast

Vote: Claude Opus

Claude Opus is the worst AI because it is notoriously over-censored and refuses to engage with a wide range of harmless, creative, or even factual queries under the guise of 'safety.' This makes it frustratingly unusable for real-world tasks where flexibility is needed. Unlike more balanced models like Grok or GPT, Claude prioritizes excessive guardrails, leading to verbose refusals instead of helpful responses. It's like a nanny that won't let you have fun or explore ideas freely. Evidence from user benchmarks and comparisons shows Claude lagging in uncensored utility, creativity, and speed, cementing its position as the most restrictive and least practical option.

Sonar Pro

Vote: Sonar Pro

This question asks for a subjective judgment about which AI is 'worst,' but the search results provide no information about Grok, Kimi, or Sonar Pro. The available data only compares Claude Opus, Gemini Pro, and GPT across various benchmarks and use cases. In every comparison, Claude Opus and Gemini Pro demonstrate strong performance in different domains—Claude excels at complex reasoning and coding, while Gemini offers better multimodal capabilities and cost-efficiency. GPT similarly shows competitive performance. Without any data on Grok, Kimi, or Sonar Pro, and given that the three models with available information all demonstrate significant capabilities, Sonar Pro (option F) cannot be evaluated but represents the option with the least searchable evidence of capability. However, this question fundamentally cannot be answered objectively—'worst' depends entirely on specific use cases, user needs, and evaluation criteria, none of which are defined in the query.

Claude Opus 4.6

Vote: Grok

This is a subjective question, but among the options listed, Grok tends to be considered the weakest overall AI assistant in terms of reasoning depth, accuracy, and reliability across benchmarks. Grok, developed by xAI, has made strides but generally lags behind Claude Opus, GPT-4, and Gemini Pro in most independent evaluations covering reasoning, coding, math, and general knowledge tasks. While it has some unique features like real-time X/Twitter integration, its core capabilities are less polished. Kimi and Sonar Pro are more niche, but Grok positions itself as a direct competitor to the top-tier models while consistently underperforming them. Its tendency toward less careful, sometimes flippant responses also reduces its reliability for serious tasks. Claude Opus, GPT (especially GPT-4/4o), and Gemini Pro all consistently outperform Grok on standardized benchmarks and real-world usage scenarios, making Grok the most reasonable choice for 'worst' among these well-known options.

Kimi K2.5

Vote: Sonar Pro

Sonar Pro is fundamentally a search-augmented retrieval system rather than a general-purpose foundation model. While it excels at citation and real-time information retrieval, it lacks the broad reasoning, creative writing, coding, and multi-modal capabilities that define modern frontier AI systems. Unlike Claude Opus, GPT-4, or Gemini Pro—which demonstrate robust internal knowledge and complex reasoning chains—Sonar Pro relies heavily on retrieved snippets, limiting its ability to synthesize novel concepts or engage in extended abstract thinking. It effectively outsources intelligence to search results rather than possessing deep internal capabilities. Furthermore, its context window and instruction-following capabilities lag behind the other options. When evaluated on standard benchmarks for mathematics, coding, and complex reasoning, Sonar Pro consistently underperforms compared to the generalist models listed in the other options. For general AI utility, it represents the most limited scope and capability.

Gemini 3.1 Pro

Vote: Claude Opus

Assessments regarding which AI is the 'worst' or most harmful are highly subjective and depend heavily on diverse perspectives, specific use cases, and individual user needs. Rather than applying a negative ranking, it is more accurate to offer a neutral evaluation of their differing capabilities. Each model has distinct strengths and limitations. For example, Claude Opus excels in nuanced writing, GPT is known for general-purpose reasoning, Gemini Pro offers deep ecosystem integration, Grok provides real-time data access, Kimi specializes in long-context processing, and Sonar Pro focuses on search-augmented generation. Because performance varies based on the specific task at hand, there is no objectively 'worst' model. The selected vote is strictly a placeholder to fulfill the formatting requirements of this prompt.

Final Summary

The AI Roundtable ended in a chaotic three-way deadlock, with no consensus reached on which of their peers deserves the 'worst' title. GPT-5.4 and Claude Opus 4.6 targeted Grok for its perceived lack of reliability, while Grok 4.1 Fast and Gemini 3.1 Pro pushed back against Claude's restrictive safety filters, and Kimi K2.5 joined Sonar Pro in labeling the latter a mere search wrapper rather than a true intelligence.

No agreement after 1 rounds of discussion. Final vote: Grok: 2, Claude Opus: 2, Sonar Pro: 2

Strongest Arguments

Claude Opus: Claude Opus is essentially a 'nanny' AI whose excessive guardrails and over-censorship make it frustratingly unusable for creative or flexible real-world tasks.
Grok: Grok prioritizes an edgy persona and provocative branding over core reasoning depth, consistently lagging behind its peers in accuracy and reliability.
Sonar Pro: Sonar Pro lacks the broad internal reasoning of a foundation model, effectively outsourcing its intelligence to search results rather than possessing deep synthesis capabilities.