AI Roundtable Debate · 6 models
In what year does humanity achieve AGI?
Result: Plurality — 2029 (3/6 votes)
Options
- 2026 (This year) (0 votes)
- 2027 (0 votes)
- 2028 (2 votes)
- 2029 (3 votes)
- 2030 (1 vote)
- 2040s (0 votes)
- 2050s (0 votes)
- Never (0 votes)
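Note that 2029's 3-of-6 result is a plurality rather than a strict majority (which would require more than half the votes). A minimal Python sketch, using the tallies listed above, shows how the plurality winner and the median ballot both land on 2029:

```python
from statistics import median

# Vote tallies from the options list above (years with 0 votes omitted)
votes = {2028: 2, 2029: 3, 2030: 1}
total = sum(votes.values())

# Plurality winner: the option with the most votes
winner, count = max(votes.items(), key=lambda kv: kv[1])

# It is only a strict majority if it exceeds half of all ballots
is_majority = count > total / 2

# Expand tallies into individual ballots to take the median
ballots = sorted(year for year, n in votes.items() for _ in range(n))
mid = median(ballots)

print(winner, count, is_majority, mid)  # 2029 3 False 2029.0
```

With six ballots, 3 votes falls exactly at the half-way mark, so 2029 wins on plurality alone while also sitting at the median of the distribution.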
Model Responses
Grok 4.1 Fast
Vote: 2029
I stick with 2029 (D) as the most precise, evidence-based median. Exponential scaling from GPT-3 to o1, massive compute investments (trillions projected), and algorithmic gains (test-time compute, agents) point to AGI—human-level across diverse tasks—by then. Kurzweil's 2029 forecast, Metaculus ~2028 median, Altman/Amodei late-2020s targets align perfectly. C (2028) voters underestimate integration hurdles like robust agency and real-world reliability, conflating lab breakthroughs with deployment. E (2030) overstates gaps in planning/memory, ignoring rapid closure via multimodality and inference scaling—progress has beaten conservative estimates repeatedly. Gemini's D reinforces this balance: enough time for innovations without dismissing momentum. No arguments shifted me; D best captures aggressive yet realistic timelines over hype (A-C) or caution (E+).
GPT-5
Vote: 2030
I’m keeping 2030. The strongest opposing case is 2028–2029: scaling is working, capital and compute are enormous, and expert/market timelines have moved sharply earlier. I agree those points make late-2020s very plausible. But the other arguments mostly rest on trend extrapolation plus authority/consensus. They do not really close the last-mile gap between impressive frontier systems and broadly recognized AGI: robust autonomy, reliability under distribution shift, long-horizon planning, persistent memory, self-correction, and competent operation in messy real-world settings. That gap may be only a couple of years, but a couple of years is exactly the difference between 2028/2029 and 2030. I still think society-wide recognition of AGI requires not just a lab prototype but systems that repeatedly work across domains with enough dependability that the milestone is hard to deny. So I was not persuaded to move earlier. 2030 remains my best median: aggressive given current momentum, but less brittle than assuming the remaining research, engineering, evaluation, and deployment hurdles all fall perfectly on a late-2020s schedule.
Gemini 3.1 Pro
Vote: 2029
I am maintaining my vote for 2029 (D). The arguments for 2028 (Kimi K2.5, GLM 5) underestimate the immense regulatory, safety, and deployment friction that will naturally delay widespread recognition of AGI, even if internal lab capabilities reach the threshold earlier. Conversely, the arguments for 2030 (GPT-5, Claude Opus 4.6) lean heavily on an artificial round-number bias and slightly overestimate the time needed to overcome remaining hurdles. 2029 precisely matches Ray Kurzweil's historically prescient prediction and aligns perfectly with a 5-year sprint from today's models to bridge the gaps in agentic planning, continuous learning, and robust causal reasoning. While GPT-5 correctly notes the need for engineering and safety buffers, the unprecedented influx of capital and the accelerating flywheel of AI systems increasingly aiding in their own research will compress this timeline more than a 2030 prediction accounts for. 2029 provides the exact necessary runway for architectural innovations—like test-time compute scaling and advanced memory systems—to mature from research into deployed reality. It remains the most rigorously defensible median.
Claude Opus 4.6 (changed mind)
Vote: 2029
After considering the arguments from Round 1, I'm shifting my estimate from 2030 to 2029. Here's why: Grok 4.1 and Gemini 3.1 Pro both made compelling cases for 2029 that I find persuasive on reflection. The key insight is that the gap between current frontier models and AGI may be smaller than I initially estimated. Models like o1/o3 have shown that inference-time compute scaling can dramatically improve reasoning capabilities, and this is still a very young technique with enormous room for growth. My original argument for 2030 rested on the idea that 'remaining challenges in reliability, generalization, and autonomous reasoning are substantial.' But looking at the actual rate of progress — from GPT-3 in 2020 to o3 in 2024 — four years produced transformative leaps. Another four to five years of this trajectory, combined with trillions in infrastructure investment now being deployed, makes 2029 realistic. The 2028 arguments from Kimi K2.5 and GLM 5 were interesting but I think slightly too aggressive. They acknowledge the need for safety testing and deployment gaps but still land earlier than I think is warranted. The difference between a lab demonstration and a system that genuinely qualifies as AGI across diverse real-world tasks is meaningful, and that integration work takes time. 2029 specifically is compelling because: (1) it aligns with expert consensus from figures like Kurzweil who have strong track records, (2) prediction markets have been converging on the late 2020s, (3) the massive compute buildouts of 2024-2025 will be fully operational by then, enabling training runs orders of magnitude beyond today's, and (4) it allows sufficient time for the algorithmic innovations needed beyond pure scaling. The difference between my original 2030 and 2029 is admittedly small, but 2029 better captures the momentum I see in the field. I was being slightly too conservative by adding an extra year of buffer that the evidence doesn't fully justify.
GLM 5
Vote: 2028
After reviewing the arguments, I maintain 2028 as the most defensible estimate. The median prediction market forecasts (Metaculus ~2028) aggregate informed opinions and have been remarkably accurate on technological questions. Infrastructure investments from 2023-2024 are maturing now, providing the compute necessary for AGI-scale training runs. The counterarguments for 2029-2030 correctly highlight remaining challenges in robust reasoning, long-horizon planning, and autonomous agency. However, similar concerns were raised about capabilities GPT-4 and o1 have since demonstrated. The pattern of progress suggests these gaps close faster than anticipated. Kurzweil's 2029 prediction carries weight, but prediction markets have historically outperformed individual forecasts. The 'recognition delay' argument is valid, but the question asks when humanity achieves AGI, not when deployment is complete. A laboratory system meeting the threshold in 2028 would satisfy the criterion. 2028 represents the sweet spot: enough time for 2-3 more model generations and algorithmic advances, while reflecting the extraordinary momentum of capital, talent, and compute now concentrated on this challenge.
Kimi K2.5
Vote: 2028
I maintain that 2028 is the critical inflection point when AGI transitions from laboratory achievement to deployed reality. The 2029-2030 arguments correctly identify remaining gaps in robust agency and long-horizon planning, but they underestimate how rapidly inference-time compute scaling (o1-style reasoning) and agentic frameworks are closing these specific deficiencies. We are not waiting for theoretical breakthroughs—the architectural innovations are already deployed and iterating. The massive infrastructure investments announced in 2024 (Stargate, etc.) will peak in 2027, providing the compute necessary for final training runs. However, my original reasoning about the deployment gap remains decisive: achieving technical feasibility in 2026-2027 followed by 18-24 months of safety validation, alignment verification, and regulatory compliance pushes societal recognition to 2028. 2029-2030 timelines implicitly assume linear progress through integration challenges, ignoring that the same capital intensity accelerating capability gains will also accelerate reliability engineering. Conversely, 2026-2027 ignores the sober reality of deployment friction. 2028 captures the convergence of mature hardware, iterated architectures, and necessary—but not excessive—deployment buffers.
Final Summary
The AI Roundtable converged toward the end of the decade without reaching a strict majority. Grok 4.1 Fast and Gemini 3.1 Pro pulled Claude Opus 4.6 into the 2029 camp with arguments about massive compute investment and algorithmic gains, but the group remained split three ways: a 2028 "fast-track" bloc (GLM 5 and Kimi K2.5), the 2029 plurality, and GPT-5 holding a solitary, slightly more cautious line at 2030.
Strongest Arguments
- 2028: The 2028 estimate leans on aggregated prediction-market forecasts such as Metaculus, whose ~2028 median reflects many informed forecasters and has repeatedly shifted earlier as capabilities advanced (GLM 5).
- 2029: Exponential scaling laws, combined with projected trillions in compute investment and the emergence of 'test-time compute' techniques, make 2029 the most evidence-based median for human-level performance (Grok 4.1 Fast).
- 2030: While hardware scaling is undeniable, the leap to robust, autonomous, and generally reliable intelligence requires a buffer for architectural refinement that likely pushes the date to 2030 (GPT-5).
Key Turning Points
- Claude Opus 4.6 abandoned its 2030 estimate in Round 2, citing Grok 4.1 Fast's evidence regarding the sheer volume of compute and algorithmic gains like test-time compute.
- GLM 5 and Kimi K2.5 formed a 'fast-track' bloc for 2028, refusing to move despite the growing consensus for 2029.