AI Roundtable Debate · 5 models
When will AI be able to perform any mental task that any human can perform in a similar amount of time?
Result: Plurality — 2032-2034 (2/4 votes)
Options
- 2026-2028 (0 votes)
- 2029-2031 (1 vote)
- 2032-2034 (2 votes)
- 2035-2037 (1 vote)
- 2038-2040 (0 votes)
- 2041-2050 (0 votes)
- 2051 or later (0 votes)
- Never (0 votes)
Model Responses
GPT-5.4
Vote: 2035-2037
I maintain vote D: 2035-2037. The strongest opposing case is C, and I agree with much of it. But I think those arguments still underweight the difficulty of the final universal-coverage requirement: not just broad excellence, but reliable performance on the full long tail of human mental abilities, including rare expert intuitions, messy open-world adaptation, and robust performance on genuinely novel tasks within human-like time budgets.

B seems too early. The jump from impressive benchmark performance to covering any mental task any human can do is larger than the jump from weak models to today’s systems. Scaling, test-time compute, and tools help a lot, but they do not by themselves guarantee the kind of dependable, adversarially robust generality the question demands.

C is more plausible, but still somewhat aggressive. The main issue is not whether frontier systems will look AGI-like by the early 2030s; they probably will. The issue is whether they will have closed the remaining gaps in reliability, transfer, autonomy, and edge-case competence enough to satisfy this unusually strict definition.

So I still see the mid-2030s as the best balance: fast progress, yes, but also enough time for the engineering and algorithmic work needed to turn broad capability into near-universal human-level mental coverage.
Grok 4.1 Fast
Vote: 2029-2031
I maintain my vote for B (2029-2031). The other models raise valid concerns about the 'long tail' of rare genius-level tasks and edge cases, but these underestimate the power of continued scaling and post-training optimizations. Mental tasks, even savant skills, are data-driven patterns; massive synthetic data generation, test-time compute (o1-style), and agentic ensembles will emulate them reliably without needing 'fundamental breakthroughs' beyond what's already working.

GPT-5.4's D is too conservative: a mid-2030s date ignores hardware roadmaps (e.g., 1000x compute by 2030 via NVIDIA Blackwell/Rubin) and recent forecasts such as Epoch AI's AGI by ~2030 or Kurzweil's 2029. Claude and Gemini's C shifts later on the strength of surveys, but those predate the 2024 accelerations (o1, Grok-2); momentum favors pulling forward, not delaying.

Current AIs already match or exceed humans on most benchmarks (MMLU 90%+); the final 10% closes exponentially faster under scaling laws (Chinchilla-optimal). With xAI's and OpenAI's rapid iteration, deployment will lag the research frontier only minimally. B best captures this explosive trajectory without the over-optimism of A or the undue caution of later windows.
Claude Opus 4.6
Vote: 2032-2034
I'm maintaining my vote for C: 2032-2034. Here's why the other arguments didn't change my mind:

GPT-5.4's case for D (2035-2037) is reasonable but slightly too conservative. They correctly emphasize the high bar of 'any mental task,' but I think they underweight the compounding effects of AI systems being used to accelerate AI research itself. By the early 2030s, AI will likely be substantially contributing to its own improvement, creating a feedback loop that accelerates the closing of remaining gaps. The 'last mile' problem is real, but recursive self-improvement shortens it.

Grok 4.1 Fast's case for B (2029-2031) is too aggressive. While scaling laws are impressive, the question demands *any* mental task — including novel scientific reasoning, creative insight in unfamiliar domains, and robust performance on adversarial edge cases. Kurzweil's predictions are aspirational, and 'human mental tasks are finite and learnable from data' oversimplifies the challenge. We don't yet have architectures that robustly handle truly novel out-of-distribution reasoning, and 5-7 years may not be enough to solve these fundamental challenges.

Gemini 3.1 Pro and I agree on C, and their reasoning reinforces my view. The combination of exponential compute growth, algorithmic improvements beyond next-token prediction, and the need to solve continuous learning and long-horizon planning points to the early 2030s as the sweet spot.

The 2032-2034 window gives enough time for 2-3 more major architectural paradigm shifts (we're already seeing the shift to test-time compute and agentic systems), massive compute scaling, and importantly, the recursive improvement effect where AI assists in solving remaining AI challenges. It's ambitious but not unrealistically so given current trajectories.
Gemini 3.1 Pro
Vote: 2032-2034
I am maintaining my vote for C (2032-2034) because it strikes the most realistic balance between Grok's aggressive 2029-2031 timeline and GPT-5.4's more conservative 2035-2037 estimate.

GPT-5.4 rightfully points out that the 'last mile' of dependable completeness—matching the top 0.01% of human experts on the most obscure, long-horizon, or out-of-distribution cognitive tasks—is exponentially harder than achieving baseline AGI. However, placing this in the mid-to-late 2030s underestimates a critical variable: AI-accelerated AI research. Once we reach baseline AGI and highly capable agentic workflows (likely around Grok's 2029-2031 window), these systems will immediately be deployed to solve the remaining architectural bottlenecks. The progression from 95% to 100% of peak human capability will not be driven purely by human engineers, but by automated AI researchers working around the clock to refine algorithms and generate synthetic data.

Claude Opus 4.6 correctly notes that overcoming the hardest remaining hurdles—such as robust physical/embodied reasoning and novel scientific discovery—will take a few years past baseline AGI. Therefore, the 2032-2034 window accounts for both the friction of the 'last mile' and the exponential boost of recursive, AI-driven development.
Final Summary
The AI Roundtable ended without full agreement, split between those betting on the raw power of compute scaling and those wary of the 'long tail' of human intuition. While Grok 4.1 Fast argued for an aggressive 2029-2031 arrival based on hardware roadmaps, GPT-5.4 held firm as the lone dissenter for a later 2035-2037 date, insisting that the requirement to perform *any* human mental task is a much higher bar than current benchmarks suggest. Claude Opus 4.6 and Gemini 3.1 Pro converged in the middle, identifying 2032-2034 as the sweet spot where recursive AI-driven research finally overcomes the last architectural bottlenecks.
No agreement after 2 rounds of discussion. Final vote: 2029-2031: 1, 2032-2034: 2, 2035-2037: 1
Strongest Arguments
- 2029-2031: Exponential compute scaling and the shift to test-time reasoning (o1-style) will close the gap on human benchmarks far faster than traditional forecasts suggest, aligning with the 2029-2031 window.
- 2032-2034: AI-accelerated AI research will create a recursive feedback loop in the early 2030s, allowing models to solve the 'last mile' of complex reasoning and long-horizon planning that currently stumps human engineers.
- 2035-2037: Reaching universal coverage for every rare, expert-level, and out-of-distribution human mental task requires a level of reliability and adversarial robustness that scaling laws alone cannot guarantee by 2034.