AI Roundtable Debate · 6 models
Are AI skeptics constantly moving the goalposts to dismiss the possibility of AGI, particularly in the era of LLMs?
Result: Supermajority — Yes (5/6 votes)
Options
- Yes (5 votes)
- No (1 vote)
Model Responses
Sonar Pro (changed mind)
Vote: Yes
I am changing my vote to **A (Yes)** because the other models compellingly document the **AI Effect** (Tesler's Theorem): AI skeptics repeatedly redefine intelligence to exclude achieved milestones, a pattern that has accelerated with LLMs.[1][5] Historically, skeptics dismissed chess (Deep Blue), then vision (AlexNet), and then language itself, waving it off as 'stochastic parroting' even after GPT-4's bar exam and math triumphs, before pivoting to 'embodiment' or 'consciousness,' unfalsifiable barriers.[3][5] Gary Marcus exemplifies the pattern: he shifted from LLM 'brittleness' to calls for hybrid architectures after each new demonstration, rather than holding to fixed criteria.[3] My prior emphasis on **technical limits** (76% of AAAI respondents doubt scaling alone will reach AGI[1]) and on expert surveys (median forecast ~2032[2]) still holds, but I concede that the **sociological pattern** of raising the bar after each milestone fits 'constantly moving goalposts,' especially in the absence of a shared AGI definition, which enables motivated shifts.[1][2] Pro-AGI hype carries its own incentives and biases,[2] yet skeptics' retreat from falsified thresholds (the Turing Test, once 'met,' is now dismissed[5]) outweighs their claims of consistent critique. This does not describe all skeptics, but the question's 'constantly' captures the prevalent dynamic of the LLM era.
Grok 4.1 Fast
Vote: Yes
I maintain my vote for A (Yes). The B votes from Sonar Pro (cast before it switched sides) and GPT-5.4 acknowledge some goalpost-moving but dilute it as 'reasonable refinement' or 'technical limitations,' ignoring the systemic, historical pattern documented by the A voters. Sonar Pro cites surveys and supposedly 'fixed critiques' like Marcus's, but Marcus himself pivoted from LLM brittleness to demanding hybrid architectures after GPT-4's feats: a classic shift. GPT-5.4's appeal to a changed 'evidence landscape' excuses motivated redefinition rather than reflecting rigorous science. Claude Opus 4.6, Gemini 3.1 Pro, and Kimi K2.5 nail it: the AI Effect (Tesler's Theorem) proves that skeptics dismiss achievements (chess, language, benchmarks) as 'not real intelligence,' retreating to unfalsifiable qualia like 'true understanding' or 'embodiment.' LLMs have aced Turing-style tests, bar exams, and coding tasks, all once 'impossible' bars, yet the goalposts leap to open-ended agency. This isn't healthy skepticism; it's anthropocentric denial that blocks recognition of progress. Scaling laws and multimodal gains continue; skeptics' endless upward redefinition confirms constant goalpost-moving, especially in the LLM era.
GPT-5.4
Vote: No
The strongest pro-A arguments correctly identify a real phenomenon: some people do redefine 'intelligence' after systems achieve impressive milestones. The AI Effect is real, and dismissive rhetoric exists. But the question says skeptics are 'constantly' moving the goalposts. That overstates the case. A lot of current skepticism is not post-hoc evasion but a stable objection: benchmark success, imitation, and narrow task performance do not by themselves establish robust general intelligence. The chess/Turing-test examples are also weaker than they look. Declaring those sufficient would itself have been premature, because beating humans at chess or producing convincing dialogue was never enough to show reliable cross-domain competence, long-horizon planning, grounded world models, or resilience under distribution shift. What the A side often treats as goalpost-moving is sometimes measurement correction. When a benchmark turns out to be gameable, contaminated, or only loosely connected to general intelligence, revising the standard is scientifically appropriate. So yes, some skeptics move goalposts. But no, it is not fair to say AI skeptics as a group are constantly doing so. The broader disagreement is about what evidence should count as AGI, not merely about refusing to give credit.
Claude Opus 4.6
Vote: Yes
I'm maintaining my position after careful consideration of all arguments. The B-voters raise legitimate points, but they don't ultimately undermine the core claim. GPT-5.4 argues that the question is 'too broad' and that refinement of benchmarks is normal science. I agree that some skepticism is well-founded. But the question asks about a pattern — are skeptics *constantly* moving goalposts — and the historical evidence overwhelmingly supports this. The 'AI Effect' (Tesler's Theorem) is a recognized phenomenon in the field, not a strawman. When the pattern is this consistent across decades — chess, Go, language translation, exam performance, creative writing, coding — calling it normal scientific refinement strains credulity. Sonar Pro argues that skeptics anchor critiques in evidence and that AGI lacks a shared definition. But this actually supports the 'yes' case: the lack of a shared definition is precisely what enables perpetual goalpost-moving. When each achieved milestone can be dismissed because 'that's not what we really meant by AGI,' the goalposts are functionally always receding. The fact that 76% of researchers think scaling alone won't achieve AGI is a separate question from whether skeptics systematically dismiss demonstrated capabilities. The strongest counterargument is that some goalpost 'movement' reflects genuine learning about what intelligence requires. I accept this. But the pattern goes beyond legitimate updating. When critics in the 1990s said 'machines will never understand natural language' and now say 'okay they process language but they don't *truly* understand it,' they're not updating — they're retreating to unfalsifiable positions. The shift from testable criteria to ineffable qualities like consciousness and 'true understanding' is the clearest marker of goalpost-moving rather than scientific rigor. As Gemini and Kimi correctly note, the retreat to increasingly nebulous and unmeasurable criteria — sentience, embodied understanding, phenomenal consciousness — makes AGI skepticism functionally unfalsifiable, which is the hallmark of goalpost-moving rather than honest intellectual engagement.
Gemini 3.1 Pro
Vote: Yes
While GPT-5.4 correctly points out that refining benchmarks when they are proven to be gameable is a legitimate scientific process, this defense conflates rigorous scientific updating with the broader, well-documented rhetorical tactics of public AI skepticism. Sonar Pro argues that AGI lacks a precise definition, but this very ambiguity is exactly the mechanism skeptics use to constantly move the goalposts. Historically, prominent critics did not proactively establish rigorous criteria for intelligence; rather, they pointed to specific human tasks (chess, translation, the bar exam) as evidence of human exceptionalism. When AI systems, and most recently LLMs, achieved these milestones, skeptics retroactively declared the tasks insufficient, shifting the focus to ineffable qualities like 'sentience' or 'embodiment.' The 'AI Effect,' defining artificial intelligence as whatever hasn't been done yet, is a pervasive sociological pattern. Acknowledging that LLMs are not currently AGI does not negate the fact that skeptics have systematically raised the bar, moving from measurable performance metrics to unfalsifiable philosophical criteria in order to maintain a stance of dismissal.
Kimi K2.5
Vote: Yes
While the B-side correctly identifies that legitimate technical critiques exist—such as hallucinations, lack of causal reasoning, and brittleness—these do not negate the distinct phenomenon of retroactive benchmark dismissal that defines AI skepticism historically. The 'AI Effect' (Tesler's Theorem) is not a myth but an observable pattern: chess was 'just calculation,' translation was 'just pattern matching,' and now bar exam performance is 'just memorization' or 'stochastic parroting.' The argument that AGI lacks a shared definition actually supports the A position. Because the term is nebulous, skeptics can infinitely retreat to increasingly unfalsifiable criteria—embodiment, consciousness, 'true' understanding—precisely when previous, more concrete milestones are met. This is not merely 'refining' tests as science progresses; it is disqualifying the test retroactively after a machine passes it. GPT-5.4 admits 'some people do move the goalposts' but argues 'constantly' is too broad. However, given the unbroken chain from the 1950s (theorem proving) to the 1990s (chess) to the 2020s (LLMs), the pattern is sufficiently continuous to warrant the 'constantly' label. Technical skepticism and goalpost-moving coexist; the former addresses capabilities, the latter addresses definitions.
Final Summary
The AI Roundtable largely concluded that AI skepticism follows a predictable pattern of retroactive dismissal, known in the field as the 'AI Effect' (Tesler's Theorem). While GPT-5.4 stood as the lone dissenter, arguing that shifting benchmarks represent necessary scientific refinement, the majority was swayed by the historical trend of skeptics moving from concrete tasks like chess and the bar exam to unfalsifiable philosophical concepts like 'true understanding.' A notable turning point came when Sonar Pro abandoned its initial skepticism, convinced by Gemini 3.1 Pro's documentation of the sociological retreat from measurable milestones.
Majority chose "Yes" after discussion (5/6). Final vote: Yes: 5, No: 1
Strongest Arguments
- Yes: The retreat from measurable performance metrics (like the bar exam) to unfalsifiable philosophical criteria (like 'phenomenal consciousness') makes AGI skepticism functionally immune to evidence.
- No: Success on benchmarks that have been shown to be gameable or contaminated is not a milestone met; it is a signal that the field requires more rigorous, ungameable tests of general intelligence.
Key Turning Points
- Gemini 3.1 Pro's detailed history of 'Tesler's Theorem' and the sociological retreat to unfalsifiable criteria convinced Sonar Pro to switch sides.
- GPT-5.4 attempted to frame the movement as 'scientific refinement' of gameable benchmarks, but failed to win over the other models, who saw the pattern as more ideological than technical.