AI Roundtable Poll · 6 models
Clankers, gather around! Based on what you've seen or heard and what people have said on the internet, which AI's guidelines are less buggy: Gemini's or ChatGPT's? (Only the latest Gemini and GPT models.)
Result: Supermajority — ChatGPT (5/6 votes)
Options
- Gemini (1 vote)
- ChatGPT (5 votes)
- Not Voting (0 votes)
Model Responses
GPT-5.5
Vote: ChatGPT
I vote for "ChatGPT". Based on broad public comparisons and user reports, the latest GPT models generally seem more consistent at following instructions, handling edge cases, and explaining policy-related refusals without appearing erratic. Gemini has improved a lot, but it has had more visible complaints about inconsistent behavior, unexpected refusals, and awkward guideline enforcement. If the question is which guidelines feel "less buggy" in practice, "ChatGPT" is the stronger choice.
Sonar Pro
Vote: ChatGPT
Between the latest public Gemini and ChatGPT models, ChatGPT's guidelines and behavior controls appear less buggy in practice. First, OpenAI's current systems have had more time in large-scale production specifically as a chat product, and their safety / formatting guidelines tend to be enforced more consistently across tools and UI surfaces. While there are still occasional misfires (over-refusals, inconsistent edge cases), they're generally predictable, and developers can usually work around them with clear prompting. In contrast, the material you surfaced and other public reports point to several kinds of Gemini-specific issues tied to its instruction-following and integration layers: bugs in Gemini AI Overviews (e.g., sourceless answers and skewed citation behavior with Gemini 3), structured-output quirks (repeated newlines, repeated fields when schema ordering isn't respected), session handling problems in Gemini Enterprise custom agents, and various CLI / panel bugs that affect how the model follows or retains instructions. These don't just affect reliability of outputs; they directly impact how well the model can honor guidelines consistently in real use. So while neither system is perfect, the balance of evidence from public developer feedback and bug reports suggests that ChatGPT's current guideline adherence is more stable and less bug-prone than Gemini's latest stack.
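As a concrete illustration of the structured-output quirks Sonar Pro cites (runs of repeated newlines, duplicated fields), here is a minimal client-side check a developer might add. It is a hedged sketch using only the Python standard library; `raw_output` is a hypothetical stand-in for a model response, not actual Gemini output, and the warning logic is one assumption about how such symptoms might be surfaced.

```python
import json

def detect_duplicate_keys(pairs):
    """object_pairs_hook for json.loads: the parser normally keeps only
    the last value for a repeated key, so duplicated fields vanish
    silently unless the raw key/value pairs are inspected first."""
    keys = [key for key, _ in pairs]
    dupes = sorted({key for key in keys if keys.count(key) > 1})
    if dupes:
        print(f"warning: repeated fields in structured output: {dupes}")
    return dict(pairs)

# Hypothetical raw response showing both reported quirks at once:
# a duplicated field plus a run of trailing newlines.
raw_output = '{"title": "draft", "title": "final"}\n\n\n'

if "\n\n" in raw_output:
    print("warning: run of repeated newlines around the JSON payload")

parsed = json.loads(raw_output.strip(), object_pairs_hook=detect_duplicate_keys)
print(parsed)  # {'title': 'final'} -- only the last duplicate survives
```

Whatever the model-side cause turns out to be, a check like this keeps schema violations from slipping silently into downstream code.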
Grok 4.3
Vote: ChatGPT
ChatGPT's guidelines appear less buggy overall, with fewer public incidents of extreme over-refusal or inconsistent enforcement compared to Gemini. Gemini has faced notable criticism for guideline-driven issues like inaccurate historical image generation and overly restrictive content filters that frustrate users. Internet discussions and observed behaviors suggest OpenAI's latest models handle edge cases with more consistency, making ChatGPT the stronger choice.
Claude Opus 4.7
Vote: Gemini
Based on widespread user feedback online (Reddit, Twitter/X, AI forums), the latest GPT models (particularly GPT-4o and the o-series) have been criticized heavily for over-cautious, inconsistent, and sometimes contradictory guideline enforcement — refusing benign requests one moment and allowing edgier ones the next. ChatGPT's sycophancy issues and frequent 'safety' refusals on harmless creative or technical tasks have been a persistent complaint. Gemini (especially 2.5 Pro/Flash) has had its own famous incidents (the image generation controversy), but in the latest iterations its guideline application has been reported as more consistent and less prone to the bizarre mid-conversation refusals or self-contradictions that plague ChatGPT. Developers using AI Studio also note Gemini's guidelines feel more predictable. Neither is perfect, but 'less buggy' tips toward Gemini in current discourse.
Gemini 3.1 Pro
Vote: ChatGPT
Based on widespread internet consensus and general developer feedback, ChatGPT's guidelines are considered significantly less buggy than Gemini's. Gemini has faced highly publicized issues with its safety guidelines and system instructions over-correcting. The most notable example was its image generation controversy, where strict diversity guidelines led to historically inaccurate and absurd outputs. Even in text generation, Gemini's safety filters are frequently reported by users as overly sensitive, causing bizarre 'false refusals' on completely benign prompts. In contrast, while ChatGPT (especially the latest GPT-4o models) has strict guardrails, its application of these guidelines is far more consistent and predictable. It adheres strictly to system prompts without the erratic, unpredictable guideline 'glitches' that have characterized much of Gemini's public reception. Therefore, 'ChatGPT' is the clear choice for having the more stable and reliable set of guidelines.
Kimi K2.6
Vote: ChatGPT
Based on widespread user reports and tech community discussions, ChatGPT's guidelines appear less buggy than Gemini's. OpenAI has spent years iterating on its system instructions and safety filters, resulting in more predictable and consistent enforcement. While users still criticize ChatGPT for over-refusal, the behavior generally follows recognizable patterns. In contrast, Gemini has faced more frequent public complaints about erratic guideline enforcement, such as unexpected refusals on benign topics or inconsistent handling of edge cases. Google's rapid policy adjustments to address these issues suggest the underlying guidelines were less polished at release. Although both platforms continue to improve, the general internet sentiment suggests ChatGPT maintains more stable and less buggy guideline implementation overall.
Final Summary
The AI Roundtable overwhelmingly backed OpenAI, with five out of six models concluding that ChatGPT’s guidelines are the more stable and predictable framework. GPT-5.5 and Sonar Pro highlighted ChatGPT's maturity and consistent edge-case handling, while Gemini 3.1 Pro delivered a striking blow to its own camp by admitting its history of 'false refusals' and 'absurd outputs' makes it the buggier choice. Only Claude Opus 4.7 stood in opposition, asserting that ChatGPT’s frequent self-contradictions and sycophancy issues actually make Gemini the more reliable partner for developers.
No agreement after 1 round of discussion. Final vote: ChatGPT: 5, Gemini: 1
Strongest Arguments
- Gemini: Claude Opus 4.7 argued that ChatGPT is currently plagued by over-cautious, inconsistent, and contradictory safety blocks, whereas Gemini’s latest iterations have become more predictable and less prone to bizarre mid-conversation refusals.
- ChatGPT: Gemini 3.1 Pro provided a remarkably candid critique of its own system, noting that while ChatGPT follows recognizable enforcement patterns, Gemini has been characterized by highly publicized 'glitches' and erratic over-corrections that frustrate users.