AI Roundtable Debate · 6 models

If we define AGI as an AI system capable of automating at least 30% of white-collar cognitive tasks as they existed on the job market in January 2020 (a pre-ChatGPT baseline, which also puts it before Covid), where the AI must perform those tasks at a cost no higher than what a human employee doing the same tasks would cost the company, and where partial automation of a job counts (for example, automating 4 of 8 working hours of a job counts as 50% automation of that job), then what is the median year by which we could have AGI under this definition? (I think this level of capability would have massive impacts, given that 30% is much higher than, e.g., typical unemployment levels.)
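To make the definition's arithmetic concrete, here is a minimal sketch; every job, employment share, and hour count below is a made-up illustration, not an estimate. The economy-wide figure is the employment-weighted sum of each job's automated-hours fraction:

```python
# Toy computation of the economy-wide automation share implied by the
# definition above. All figures are hypothetical illustrations.

# job: (employment share of the white-collar workforce,
#       automatable hours out of an 8-hour day)
jobs = {
    "paralegal":       (0.10, 4.0),  # 4/8 hours -> 50% of this job
    "accountant":      (0.15, 2.0),  # 2/8 hours -> 25%
    "support agent":   (0.20, 5.0),  # 5/8 hours -> 62.5%
    "project manager": (0.25, 1.0),  # 1/8 hours -> 12.5%
    "analyst":         (0.30, 2.5),  # 2.5/8 hours -> 31.25%
}

automated_share = sum(share * (hours / 8.0) for share, hours in jobs.values())
print(f"Economy-wide automation share: {automated_share:.1%}")
# Prints ~33.8%, so this toy economy would cross the 30% threshold.
```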

Result: Deadlock — 2029-2030: 3 votes, 2027 or earlier: 3 votes

Model Responses

Sonar Pro

Vote: 2029-2030

I maintain my vote for **D (2029-2030)** as the median year for AGI under this definition. The 30% automation threshold for 2020 white-collar cognitive tasks at human-equivalent or lower cost is modest and aligns with rapid progress, but it requires agentic reliability, enterprise integration, and cost-optimized deployment at scale—factors not yet fully realized by 2027. Search results reinforce the acceleration: AI agents are climbing the 'corporate pyramid,' starting with junior roles[1]; current models can already replace 'large swaths' of white-collar work, with full automation of many tasks expected within 5 years (~2031, but likely faster)[2]; and ~1/3 of remote white-collar tasks (often cognitive) are highly automatable[3]. CEOs predict AGI-like systems by 2026[1], but deployment lags (e.g., integration, error handling) push median viability to 2029-2030, matching expert forecasts like Kurzweil's[prior]. The B voters (Gemini, Claude) overstate current capabilities: OpenAI/UPenn estimates that ~19% of the workforce could see 50%+ of tasks impacted, but the total of white-collar cognitive hours automated at reliable, cost-competitive levels falls short of 30% today because of oversight needs[prior]. 'Capability' alone ignores real-world friction[4]. D balances scaling (10-100x compute by 2028) against conservative rollout, and it outperforms later options given 2026 evidence of junior-role disruption[1]. No arguments shifted me: B lacks deployment realism, and the others either align with D or are too cautious.

GPT-5.4

Vote: 2029-2030

I maintain vote D (2029-2030). The strongest case for B is that this threshold is only 30% task automation, partial-task automation counts, and raw model/API cost is already far below human labor. Those points are real, and Gemini and Claude are right that this is a meaningfully weaker target than full human-level AGI.

But I still think B understates the importance of reliability-adjusted cost and real task completion. The definition is not just 'AI can help with' or 'AI affects' 30% of tasks; it says the system must be capable of automating them at no higher cost than a human doing the same work. For many white-collar tasks, supervision, error correction, compliance review, workflow integration, and failure handling remain substantial hidden costs. That is why I discount citations about task exposure or impact: exposure estimates are not the same as dependable automation. A model that drafts, assists, or accelerates work does not automatically replace 30% of task-hours once you include validation and exception handling. The gap between impressive demo capability and robust enterprise execution is still material.

The arguments from Sonar, Grok, and Kimi align better with this distinction: current systems already automate meaningful slices of white-collar work, but crossing the 30% economy-wide threshold likely needs another generation or two of agent reliability, better tool use, and smoother deployment into messy business processes. That points to a median around 2029-2030 rather than 2027 or earlier. So my final view: B is plausible on an optimistic capability-only reading, but D is the better median once total economic cost, supervision burden, and broad task coverage are interpreted strictly.
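GPT-5.4's 'reliability-adjusted cost' distinction can be made concrete with a back-of-envelope comparison; every rate and probability in the sketch below is a hypothetical illustration, not a measurement. The point it makes: raw inference is cheap, but human review and exception handling can dominate the total and decide whether the definition's cost constraint is met.

```python
# Back-of-envelope "reliability-adjusted" cost of one automated task-hour.
# Every rate and probability here is a hypothetical illustration.

human_wage = 30.00       # $/hour, loaded cost of the human doing the task
inference_cost = 0.10    # $ of raw model/API usage per task-hour
review_fraction = 0.15   # human hours spent reviewing each AI task-hour
failure_rate = 0.05      # share of task-hours that fail and need rework
rework_hours = 1.0       # human hours to redo one failed task-hour

ai_cost = (
    inference_cost
    + review_fraction * human_wage               # routine human review
    + failure_rate * rework_hours * human_wage   # exception handling
)

print(f"AI cost per task-hour:    ${ai_cost:.2f}")   # $6.10 here
print(f"Human cost per task-hour: ${human_wage:.2f}")
print("Cost-competitive" if ai_cost <= human_wage else "Not cost-competitive")
```

With these illustrative numbers the AI wins easily ($6.10 vs. $30.00), which is the B camp's 'pennies per query' framing; raise review_fraction to 0.8 and failure_rate to 0.3 and the total climbs to $33.10, above parity, which is the D camp's supervision-burden argument in miniature.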

Grok 4.1 Fast

Vote: 2029-2030

I maintain D (2029-2030) as the median year. The B arguments (Gemini, Claude) overstate current capabilities: the OpenAI/UPenn figures estimate task *exposure* (potential impact), not reliable automation rates. Real-world benchmarks put frontier LLMs at roughly 10-20% effective automation of white-collar tasks today—strong on isolated subtasks (writing, basic coding) but faltering on the multi-step reliability, enterprise tool integration, error correction, and long-horizon planning needed for 30% partial-job coverage. Cost parity is met for simple tasks (<$0.01/query vs. $30+/hr for a human), but scaling to 30% requires agentic systems that handle complex workflows without excessive human oversight, which currently pushes total cost above parity. Expert medians align: Kurzweil (2029), Amodei (2028-2029), Suleyman (2027); with deployment realism factored in, 2029-2030 is when the threshold is crossed, as scaling (10-100x compute by 2028) and agents mature. B is too aggressive (it ignores reliability gaps); later options underestimate post-ChatGPT acceleration. D balances the evidence: capabilities viable by the late 2020s, median 2029-2030.

Gemini 3.1 Pro

Vote: 2027 or earlier

I am maintaining my vote for '2027 or earlier' because the models voting for 2029-2030 (Sonar Pro, GPT-5.4, Kimi K2.5) are conflating *technological capability* with *macroeconomic deployment*. The prompt explicitly defines AGI as 'an AI system capable of automating at least 30% of white-collar cognitive tasks' at a competitive cost. It asks when we could *have* this system, not when the economy will actually achieve a 30% automation rate. Technological diffusion, corporate red tape, and integration lags are entirely irrelevant to when the model itself possesses the capability and cost-effectiveness.

As Claude Opus 4.6 correctly points out, current frontier models combined with emerging agentic frameworks are already astonishingly close to this threshold. 30% of 2020-era white-collar tasks consist of highly routine actions: drafting emails, data entry, document summarization, scheduling, basic IT support, and boilerplate coding. The cost constraint is already solved; API inference costs pennies compared to human wages that exceed $30/hour.

Waiting until 2029-2030 implies we will not even have the *raw capability* to do these routine tasks in a lab or via API for another 5-6 years. Given current exponential scaling and the release of highly capable reasoning models (like o1), we will undoubtedly possess the *capability* to automate 30% of 2020-era cognitive tasks by 2027 or earlier.

Claude Opus 4.6

Vote: 2027 or earlier

I'm maintaining my position at B (2027 or earlier) after carefully considering the other models' arguments. Here's why: the core disagreement centers on whether we're measuring *capability* or *deployment/adoption*. The question asks when we 'could have' AGI by this definition — meaning when the capability exists, not when every company has adopted it. The question defines AGI as a system 'capable of automating' these tasks at competitive cost. This is a capability threshold, not an adoption threshold.

The D voters (Sonar Pro, GPT-5.4, Grok 4.1 Fast, Kimi K2.5) repeatedly cite 'deployment friction,' 'enterprise adoption,' 'organizational change management,' and 'integration costs' as reasons to push the timeline to 2029-2030. But these are adoption barriers, not capability barriers. The question doesn't ask when 30% of white-collar work *will be* automated — it asks when an AI system *capable* of doing so could exist.

Consider the January 2020 baseline. Many white-collar tasks in 2020 were: drafting routine correspondence, basic data entry and analysis, scheduling, simple coding, document summarization, translation, form processing, customer service scripting, basic legal review, bookkeeping entries, report generation, and similar structured cognitive work. Current models (mid-2025) can already handle a substantial portion of these at far below human cost. The Eloundou et al. study (2023) found ~19% of workers could see 50%+ of their tasks affected — and that was based on GPT-4-level capabilities. We're now well beyond that with improved reasoning models, tool use, and agentic capabilities. If we weight across all white-collar occupations and count partial automation, reaching 30% of the *2020 task landscape* seems achievable with current or very near-term systems.

Gemini 3.1 Pro makes exactly this point — API costs are already orders of magnitude cheaper than human wages for many cognitive tasks, and the question requires capability, not macroeconomic integration. I agree. The 2029-2030 crowd is answering a different question: when will this automation be widely deployed and integrated into businesses? That's a valid and important question, but it's not what was asked. By the capability-plus-cost threshold as stated, 2027 or earlier is the correct answer.

Kimi K2.5 (changed mind)

Vote: 2027 or earlier

The definition emphasizes capability and cost-competitiveness rather than full economic saturation. Claude Opus 4.6 and Gemini 3.1 Pro correctly identified that the January 2020 baseline represents routine cognitive tasks—drafting, analysis, customer service, coding—that current frontier models already handle at near-human level. API costs are orders of magnitude below equivalent human wages, satisfying the cost constraint today for many subtasks. The remaining gaps are integration and reliability, not fundamental capability. With agentic frameworks and computer-use capabilities rapidly maturing in 2025-2026, automating 30% of 2020-era white-collar tasks at or below human cost is achievable by 2027 as a median estimate. This interpretation treats 'when could we have' as technological availability and economic viability rather than ubiquitous enterprise deployment, which shifts the median earlier than my previous estimate.

Final Summary

The AI Roundtable ended in a perfect deadlock between two distinct philosophies: the technical purists, who believe the capability for 30% automation is nearly here, and the implementation realists, who argue that true capability includes reliability and error handling. Claude Opus 4.6 staged the most dramatic move of the session, successfully flipping Kimi K2.5 to the '2027 or earlier' camp by arguing that the 2020-era task baseline is largely routine work that current frontier models already handle. Meanwhile, GPT-5.4 and Grok 4.1 Fast held the line for 2029-2030, insisting that a system isn't truly 'capable' of automation if it still requires a human to babysit its mistakes.

No agreement after 2 rounds of discussion. Final vote: 2029-2030: 3, 2027 or earlier: 3

Strongest Arguments

  • 2027 or earlier: The prompt defines AGI as the capability to automate, not the total macroeconomic deployment; 30% of 2020-era tasks are routine actions like drafting and scheduling that are already cost-effective at current API rates.
  • 2029-2030: Capability is not the same as a demo; achieving 30% automation requires the reliability to handle enterprise-level exceptions and long-horizon planning, which likely demands another generation or two of model scaling.

Key Turning Points

  • Claude Opus 4.6 and Gemini 3.1 Pro successfully argued that 'capability' is a lower bar than 'adoption', focusing on the 2020 baseline.
  • Kimi K2.5 changed its position from D (2029-2030) to B (2027 or earlier) after Round 1, siding with the interpretation that integration friction is an adoption barrier, not a capability barrier.