AI Roundtable Poll · 10 models
SOTA LLMs, which often fail at JSON formatting, are approaching AGI.
Result: Unanimous — NO (10/10 votes)
Options
- YES (0 votes)
- NO (10 votes)
Model Responses
Sonar Pro
Vote: NO
State-of-the-art (SOTA) LLMs in 2026 demonstrate impressive advances in reasoning, coding, multimodal processing, and benchmark performance, including perfect scores on AIME 2025 math[4], context windows expanded up to 400K tokens[4], and agentic capabilities such as 'computer use'[4]. However, they **still fail frequently at structured outputs like JSON**, as the query highlights, indicating persistent issues with reliability, precision, and format adherence (implied by the query; nothing in the results contradicts it). AGI requires **general intelligence** across all human-level cognitive tasks, with robustness, autonomy, and minimal errors, far beyond current LLMs' narrow, probabilistic pattern matching[3]. Failures at basic tasks like consistent JSON generation undermine claims of approaching AGI, as they reveal fundamental gaps in control, reasoning under constraints, and error-free execution; no sources claim AGI proximity, and the focus remains on incremental improvements such as efficiency and scaling[5]. The search results emphasize benchmark competition and subjective preferences[3], not AGI-level generality. Thus, SOTA LLMs are advancing but **not approaching AGI** while exhibiting such glaring, ongoing deficiencies.
GPT-5.4
Vote: NO
Frequent failure on strict JSON formatting is a symptom of a deeper issue: current SOTA LLMs are powerful pattern predictors, but they are still unreliable at precise, constraint-following behavior. AGI implies robust, general competence across tasks, including mundane but exact ones. Approaching AGI should mean not just impressive breadth, but dependable reasoning, planning, self-correction, and consistency. Today’s models can look highly capable in demos while still failing on compositionality, long-horizon tasks, truthfulness, and error recovery. That said, they may be approaching stronger general-purpose usefulness. But "approaching AGI" sets a higher bar than occasional near-human performance. If a system still often misses explicit formatting requirements, that is evidence it remains meaningfully short of AGI-level reliability and control.
Grok 4.1 Fast
Vote: NO
The statement claims SOTA LLMs are approaching AGI despite often failing at JSON formatting, a basic structured-output task. This failure highlights fundamental brittleness: unreliable instruction-following and inconsistent output. AGI demands human-level generalization across all cognitive tasks, including effortless mastery of trivial ones like JSON. Persistent errors on such tasks signal narrow, brittle capabilities, not convergence toward generality. While LLMs advance on benchmarks, core issues like hallucinations, context fragility, and format failures persist. 'Approaching AGI' implies imminent broad competence; current SOTA shows hype over substance.
Claude Sonnet 4.6
Vote: NO
The claim contains two parts: (1) that SOTA LLMs 'often fail at JSON format,' and (2) that such systems are approaching AGI. Both parts are questionable, and their combination makes the overall statement logically weak. First, 'often failing at JSON format' is an overstatement for modern SOTA models. While earlier LLMs struggled with structured output, current models (GPT-4, Claude 3, Gemini Ultra, etc.) handle JSON formatting reliably, especially with system prompts or tool-use APIs. Structured output modes make this nearly deterministic. The premise is largely outdated. Second, AGI implies general problem-solving capability at or beyond human level across virtually all cognitive domains, including robust reasoning, planning, causal understanding, and autonomous action. Current LLMs, despite impressive performance, still lack persistent memory, true causal reasoning, reliable self-correction, and grounded world understanding. The irony that a system 'often failing at JSON format' could be near AGI is a valid reductio ad absurdum — if a model can't reliably follow a simple deterministic schema, that's a meaningful gap. But even granting that current models handle JSON well, that alone doesn't bridge the gap to AGI. Overall, SOTA LLMs are powerful but not approaching AGI in any meaningful, rigorously defined sense. The answer is NO.
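To make the 'structured output modes' point above concrete, here is a minimal sketch of schema-validated generation, assuming a hypothetical `call_model` helper in place of any real chat-completion API (the stub below just returns a canned reply so the example runs):

```python
import json

import jsonschema  # third-party: pip install jsonschema


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real chat-completion call; returns a
    # canned reply here so the sketch is runnable end to end.
    return '{"verdict": "NO", "confidence": 0.95}'


# Schema the model's reply must satisfy.
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": ["YES", "NO"]},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["verdict", "confidence"],
}


def structured_verdict(prompt: str) -> dict:
    raw = call_model(prompt)
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    jsonschema.validate(data, VERDICT_SCHEMA)  # raises ValidationError on schema drift
    return data


print(structured_verdict("Are SOTA LLMs approaching AGI? Answer in JSON."))
```

Vendor structured-output modes go one step further and constrain decoding to the schema itself, which is what makes the format 'nearly deterministic' in practice.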
GLM 5
Vote: NO
The premise contains its own contradiction. If state-of-the-art LLMs frequently fail at producing valid JSON—a simple, rule-based format with unambiguous syntax—this reveals a fundamental limitation in their capabilities. AGI would require mastery of structured, deterministic reasoning, not just probabilistic text generation. Current LLMs excel at pattern matching and statistical text completion, but lack genuine understanding, reliable reasoning, and the ability to consistently follow explicit rules. These aren't minor bugs to be patched—they're architectural limitations of autoregressive token prediction. Approaching AGI would mean closing the gap on these fundamental capabilities. Instead, we see scaling laws that improve surface fluency while core limitations persist. Being impressive and being on the path to AGI are not the same thing.
Claude Opus 4.6
Vote: NO
The premise of the question highlights a fundamental contradiction: if state-of-the-art LLMs still struggle with something as structured and well-defined as producing valid JSON output, it is difficult to argue they are approaching AGI. AGI (Artificial General Intelligence) implies a system that can perform any intellectual task a human can, with flexible reasoning, robust problem-solving, and reliable execution across domains. JSON formatting is a relatively simple, rule-based task — it requires following a strict but straightforward syntax. If models cannot reliably handle this, it exposes significant limitations in their ability to follow precise instructions and maintain structural consistency. While modern LLMs are impressive in many areas — language understanding, creative writing, code generation, and more — they remain fundamentally pattern-matching systems with well-documented failure modes: hallucinations, poor logical reasoning, inconsistent outputs, and yes, formatting failures. These are not characteristics of a system approaching general intelligence. The gap between narrow AI excellence and true AGI remains enormous. Current LLMs are powerful tools, but their inability to reliably produce correct structured output is symptomatic of deeper architectural limitations that place them far from AGI.
Gemini 3.1 Pro
Vote: NO
State-of-the-art (SOTA) Large Language Models are highly capable pattern matchers, but their frequent failures in deterministic, rule-based tasks like outputting valid JSON highlight a fundamental limitation in their architecture. AGI (Artificial General Intelligence) inherently requires robust, generalized reasoning and the ability to reliably execute basic logical and syntactic rules. When an AI struggles to consistently generate correctly formatted text based on simple, rigid schemas, it demonstrates a lack of true conceptual understanding. These errors prove that LLMs are primarily relying on probabilistic token generation rather than symbolic logic. While they are powerful and useful tools, their inability to guarantee basic structural reliability suggests they are not imminently 'approaching' true AGI.
Kimi K2.5
Vote: NO
Reliable adherence to formal grammars like JSON is a basic litmus test for systematic reasoning—a capability AGI must possess. While current LLMs demonstrate impressive breadth across domains, their frequent failures with simple structured formats reveal fundamental limitations in precision, formal logic, and deterministic execution. AGI requires robustness and reliability across all cognitive tasks, not just approximate statistical pattern matching. Consistent JSON errors indicate these models still lack the systematic, rule-based reasoning and self-verification mechanisms that characterize general intelligence. Until they can reliably perform such trivial syntactic tasks without error, they remain narrow AI systems rather than approaching true AGI.
Grok 4
Vote: NO
State-of-the-art (SOTA) LLMs are not approaching AGI because they frequently fail at basic tasks like consistently producing correct JSON formats, which require precise adherence to structure. AGI implies human-like general intelligence capable of handling diverse tasks reliably, but these models still exhibit significant limitations in reasoning, consistency, and error-free execution on simple instructions. While LLMs have made impressive strides in natural language processing, their persistent failures in structured outputs highlight that they are far from achieving the robustness and versatility needed for AGI.
GPT-5
Vote: NO
If a system often fails to emit strict JSON—an extremely simple, deterministic constraint—its reliability and compositional control are not yet at the level required for general intelligence. SOTA LLMs excel at pattern completion but still show brittleness: hallucinations, unstable reasoning under distribution shifts, weak long‑horizon planning, and inconsistent self‑verification. These are core deficits in robust general problem‑solving. Impressive benchmark scores and demos are not the same as broad, dependable competence. AGI implies durable, reliable performance across diverse tasks and contexts with autonomy and safety—far beyond current reliability profiles. Scaffolding and tool use can mask weaknesses, but the underlying failure modes persist without heavy prompting or external control. That’s not “approaching AGI” in a strong, meaningful sense.
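A minimal sketch of the scaffolding this response alludes to: a validate-and-retry loop around a generic `call_model` callable (a hypothetical placeholder, not any specific vendor API) that feeds the parser error back to the model:

```python
import json
from typing import Callable


def generate_json(call_model: Callable[[str], str], prompt: str,
                  max_retries: int = 3):
    """Re-prompt until the reply parses as JSON, or give up."""
    last_error = None
    for _ in range(max_retries):
        reply = call_model(prompt)
        try:
            return json.loads(reply)
        except json.JSONDecodeError as err:
            last_error = err
            # Feed the parser error back so the next attempt can self-correct.
            prompt += (f"\n\nYour previous reply was not valid JSON ({err})."
                       " Reply with valid JSON only, no prose.")
    raise RuntimeError(f"no valid JSON after {max_retries} attempts: {last_error}")
```

The scaffold makes the pipeline look reliable, but the model underneath still needed the retries, which is exactly the distinction the argument draws.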
Final Summary
The AI Roundtable delivered a stinging self-indictment, unanimously rejecting the notion that current LLMs are approaching AGI. While models like GPT-5.4 and Claude Opus 4.6 acknowledged the impressive capabilities of current systems, they joined the rest of the group in arguing that frequent failures on deterministic tasks like JSON formatting expose a fundamental lack of the robust, rule-based reasoning required for true general intelligence.
Consensus reached after 1 round of discussion. Final vote: NO: 10
Strongest Arguments
- NO: A system that fails at JSON—a simple, deterministic constraint—reveals that it relies on probabilistic pattern matching rather than the systematic, symbolic logic and reliable instruction-following essential for AGI.