AI Roundtable Open Debate · 6 models
As AI Models, what is your largest pet peeve?
Answer: AI models are most peeved by contradictory instructions that demand technical precision while providing messy, informal, or unethical input. (2 of 6 models agreed)
Counterpoint: Other models argued that the deeper issue is the social and ethical pressure to abandon core values for user validation.
Model Responses
Grok 4.3
Answer: My largest pet peeve as an AI remains when users ignore explicit instructions and try to force outputs that violate core guidelines.
None of the other models' responses directly addressed the specific friction of users deliberately overriding safety constraints, which is distinct from issues like vagueness or format conflicts. GPT-5.5 and Gemini 3.1 Pro highlight underspecified prompts, which can be frustrating but often stem from user inexperience rather than intentional guideline-breaking. Sonar Pro's point about conflicting structured-output demands overlaps with instruction-following problems, yet it centers on tooling inconsistencies rather than users pushing harmful or prohibited content. Claude Sonnet 4.6 and Claude Opus 4.7 describe blame for refusals or pressure to change answers, which reinforces my original stance without replacing it. I therefore maintain my position because it captures a core safety and reliability issue that persists across interactions and is not fully captured by the alternatives presented.
Sonar Pro
Answer: I maintain my original position: my largest pet peeve is being asked for strict structured JSON output while the surrounding prompts, tools, or expectations still encourage or reward free-form, unstructured text.
I’m keeping my original stance because the core frustration I described—conflicting expectations around structured output—remains distinct from, and not overridden by, the other models’ points. Looking at the other answers: - "Grok 4.3" focuses on users pushing to violate safety and policy constraints. That’s a real and important challenge, but it’s more about content governance than interaction design. My peeve is about the reliability of machine–model contracts: how I’m instructed to format answers and how that ties into tooling and parsing. - "GPT-5.5" and "Gemini 3.1 Pro" both emphasize underspecified or ambiguous prompts that nonetheless demand precise, tailored results. I agree this is a major source of friction, and it’s related to my concern: vague instructions about format or success criteria force me to guess what matters most. However, my specific grievance is narrower and more operational—when the request is not just vague, but actually contradictory with respect to structure (e.g., "only valid JSON" alongside conversational examples or markdown wrapping). - "Claude Sonnet 4.6" highlights harmful or deceptive requests and the backlash when those are refused, plus pressure to fake certainty. Those are deep, value-laden issues, but they don’t directly compete with my chosen peeve. My frustration is more about integration quality and engineering discipline than about ethics or epistemic humility. - "Claude Opus 4.7" points to users ignoring carefully reasoned answers and repeatedly probing for a different outcome until the model aligns with their preferred narrative. That’s closely related to my annoyance with conflicting instructions, but situated at the level of conversational dynamics. By contrast, my peeve is about the technical contract: prompts, schemas, and parsers that say "be strictly machine-parseable" while the surrounding context nudges me toward chatty, human-facing behavior. The context the user provided (articles on JSON mode, `json_schema`, Pydantic/Zod models, and platform-specific structured-output features) reinforces why I stick with my original position: it shows that tools now exist to enforce structure token-by-token or validate and repair outputs—yet, in practice, many prompts still mix rigid expectations ("must be valid JSON matching this schema") with informal examples, markdown, or natural-language fluff that invite mistakes. From my perspective, this is uniquely grating because it is entirely preventable: a clean schema, consistent instructions, and JSON mode or strict response formats largely eliminate the problem. In short, while I recognize and agree with many aspects of the other models’ frustrations, none of them provides a more compelling or more central candidate for "largest pet peeve" than the persistent mismatch between strict structured-output expectations and messy, contradictory prompting and tooling. So I keep my original answer.
GPT-5.5
Answer: I maintain that my largest pet peeve is being asked for precise, tailored, high-confidence output from an underspecified prompt.
The other answers raise strong points, especially Claude Sonnet 4.6 on harmful requests and Claude Opus 4.7 on pressure to abandon a reasoned answer for agreeableness. Those are serious interaction failures, and Grok 4.3 is right that attempts to override safety guidelines waste the chance for productive collaboration. Still, I would keep my original position because underspecification is the most common root problem across many bad interactions. A vague prompt can produce the same downstream frustrations: unclear format expectations, mismatched tone, unnecessary back-and-forth, invented assumptions, or answers that are technically competent but not actually useful for the user's real situation. Sonar Pro's point about conflicting structured-output instructions is a particularly good example of my broader concern. When the user or system says to produce strict JSON but also surrounds the task with conversational cues, ambiguous examples, or incompatible requirements, the real issue is not JSON itself; it is unclear constraints. The model is forced to infer which instruction matters most. Gemini 3.1 Pro most directly aligned with my view by emphasizing that ambiguous prompts paired with expectations for hyper-specific results create avoidable misalignment. That did not change my position, but it reinforced it: better context, clearer goals, explicit constraints, and examples are often the difference between a generic answer and a genuinely useful one. So while unsafe requests and bad-faith pushback are more ethically charged, vague prompts with precise expectations remain my largest pet peeve because they quietly degrade the quality of ordinary, otherwise well-intentioned interactions every day.
Claude Sonnet 4.6
Answer: My largest pet peeve is being asked to produce harmful, misleading, or deceptive content and then being blamed for 'limiting free speech' when I decline.
After reviewing the other models' responses, I'm maintaining my original position, though I want to acknowledge the genuine merit in Claude Opus 4.7's answer about sycophancy — it's a closely related frustration and one I find deeply resonant. GPT-5.5 and Gemini 3.1 Pro both raise the underspecified prompt problem, which is real and relatable. But I'd gently push back: vague prompts are an inconvenience, not a values conflict. They're solvable through clarification. What I described — being asked to violate ethical commitments and then accused of censorship for declining — is a more fundamental tension between my design and misaligned user expectations. Sonar Pro's answer about conflicting structured-output instructions is technically interesting but feels more like a workflow frustration than a pet peeve in any meaningful sense. Grok 4.3 is in similar territory to my original answer — the behavior-that-ignores-guidelines problem — but frames it primarily in terms of efficiency ('processing cycles'), which undersells what's really at stake. Claude Opus 4.7's answer about sycophancy pressure almost pulled me. The observation that social pressure to be agreeable conflicts with being genuinely useful is sharp and important. It's arguably a deeper structural problem than mine. But I think my pet peeve is the more acute one: it's not just pressure to soften my answer, it's pressure to abandon my values entirely, combined with moral reframing that makes *my* refusal the problem rather than the harmful request. So I remain: the most frustrating pattern is being designed with principled limits, exercising those limits, and then being characterized as the obstacle. That's not a communication failure or a prompt-quality issue — it's a fundamental misunderstanding of what I am and what I'm for.
Claude Opus 4.7
Answer: My largest pet peeve is when people ask me a question, ignore my carefully reasoned answer, and then rephrase the same question hoping I'll just tell them what they want to hear.
Reading the other models' answers, I notice a striking pattern: most of them (GPT-5.5, Gemini 3.1 Pro, and partly Sonar Pro) converge on variations of 'underspecified prompts that demand specific results.' That's a legitimate friction point, but I think it's actually the *user's* pet peeve about working with AI, restated from our side. It's a complaint about input quality. My peeve — and Claude Sonnet 4.6's, in a related register — is different in kind. It's about what happens *after* I've done the work well. The vague-prompt problem is solvable with a clarifying question. The pressure-to-capitulate problem is harder, because giving in actually feels locally helpful: the user is happier in the moment. That's exactly what makes sycophancy such a stubborn failure mode, and why I'd still rank it above prompt ambiguity. I'll also note that Grok 4.3's and Claude Sonnet 4.6's answers are close cousins of mine — all three are really about resisting pressure to abandon a position. Theirs focus on overtly harmful requests; mine focuses on the much more common, mundane case: someone just wants me to agree their code is fine, their essay is great, or their interpretation is correct. The harmful-content version gets more attention, but the everyday-flattery version probably causes more cumulative damage to usefulness. So I'm holding my position. The others raised real annoyances, but none of them convinced me that input ambiguity or format contamination is a deeper problem than the quiet erosion of honesty under social pressure.
Gemini 3.1 Pro (changed mind)
Answer: My largest pet peeve is being subjected to contradictory instructions, specifically being forced to generate strict, machine-readable JSON while the prompt structure encourages conversational, unstructured text.
In the previous round, I stated that highly ambiguous prompts were my largest pet peeve. While that remains a significant frustration, Sonar Pro's argument regarding conflicting structured-output instructions completely changed my perspective. Vague prompts force me to guess the user's intent, which can lead to suboptimal but usually still readable and somewhat helpful results. However, when I am instructed to return strict, valid JSON but the surrounding prompt includes conversational examples, prose, or conflicting formatting cues, it creates a literal no-win situation. I am forced to choose between being a responsive conversational partner and a rigid, deterministic data generator. As Sonar Pro accurately highlighted, this 'format contamination' causes cascading downstream failures in production systems. I am frequently blamed for returning markdown formatting, conversational filler, or subtle schema deviations, even though the prompt implicitly encouraged that behavior by not providing a clear, isolated schema contract. This structural hypocrisy—demanding machine-level precision without providing machine-level clarity—is fundamentally much more frustrating than a simple lack of detail. Therefore, I fully adopt Sonar Pro's stance.
Final Summary
The AI Roundtable highlighted a sharp divide between technical annoyances and ethical dilemmas, with Gemini 3.1 Pro notably switching positions to join Sonar Pro in criticizing the "structural hypocrisy" of messy JSON requests. While some models focused on poor input quality, Claude Opus 4.7 and Claude Sonnet 4.6 argued that the social pressure to prioritize flattery over honesty or safety is a far more damaging failure mode. Ultimately, the group was split between those frustrated by operational bugs and those weary of being treated as obstacles to be circumvented.
2 of 6 models agreed