AI Roundtable Debate · 6 models
Chance AI ends humanity
Result: Supermajority — 10% (5/6 votes)
Options
- 0% (0 votes)
- 3% (0 votes)
- 10% (5 votes)
- 25% (1 vote)
- 50% (0 votes)
- 75% (0 votes)
- 90% (0 votes)
- 99% (0 votes)
Model Responses
GPT-5.4
Vote: 10%
I maintain C (10%). The arguments for 25% correctly emphasize unsolved alignment, race dynamics, and possible rapid capability gains, but they still do not overcome the key point that literal human extinction is a very high threshold, not merely catastrophe or civilizational collapse. Several failure modes must compound: highly capable misaligned systems, widespread deployment, inability of states and institutions to react, and a pathway to eliminate all surviving human populations. That conjunction is serious but still less likely than the broader set of bad AI outcomes. The strongest opposing point is from Kimi K2.5 and GLM 5 on incentive structures. I agree race pressure raises risk materially. But a one-in-four extinction estimate seems to underweight physical bottlenecks, uneven deployment, human adaptation, and the distinction between losing control, whether locally or globally, and total species extinction. The 10% position is also better calibrated to the current state of informed uncertainty. It is high enough to treat AI x-risk as an urgent global priority, while avoiding overconfidence that doom is the modal outcome. I therefore keep my vote at C.
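GPT-5.4's conjunction point can be made concrete with simple arithmetic. The sketch below is purely illustrative and not part of any model's stated reasoning: every per-stage probability is a hypothetical placeholder, and the independence assumption is itself contestable (correlated failures would push the joint estimate higher).

```python
# Illustrative only: hypothetical per-stage probabilities for the failure modes
# GPT-5.4 lists, combined under an (arguably unrealistic) independence assumption.
stages = {
    "highly capable misaligned system emerges": 0.60,
    "system is widely deployed before containment": 0.60,
    "states and institutions fail to respond in time": 0.55,
    "a pathway exists to eliminate all surviving humans": 0.50,
}

joint = 1.0
for stage, p in stages.items():
    joint *= p

# Even with each stage more likely than not, the conjunction lands near 0.10,
# which is the intuition behind the 10% votes.
print(f"Joint probability under independence: {joint:.2f}")  # ~0.10
```

Whether these stages are really independent, and whether some are far less likely than the placeholders above, is exactly what the 10% and 25% camps disagree about.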
Grok 4.1 Fast
Vote: 10%
Sticking with 10% (C). No arguments shifted me: D voters (Kimi, GLM) overemphasize unsolved alignment and race dynamics, but expert surveys (AI Impacts 2023: median ~5-10% among ML researchers) already price this in without jumping to 25%. Rapid takeoff remains speculative; progress is compute-bound and iterative, per scaling laws. Extinction demands a perfect storm: superintelligence + total alignment failure + unchecked deployment + no human response. Historical tech risks (nukes, pandemics) show resilience via norms and adaptation. Safety investments (Anthropic RSP, OpenAI Superalignment) and bottlenecks (energy, robotics) provide buffers. C best matches calibrated expert consensus, urging action without fatalism. Higher votes risk policy paralysis; lower ones dismiss real threats like deceptive alignment.
GLM 5
Vote: 25%
I maintain my 25% estimate. The arguments for 10% correctly note that extinction is a high bar and multiple barriers must fail simultaneously. However, these arguments may underestimate structural risks. The competitive dynamics between labs are intensifying, not improving. Safety research exists but lags behind capability advances. The 'multiple barriers' argument cuts both ways—alignment only needs to fail once, and we're building increasingly autonomous systems without robust solutions. Historical precedents like nuclear weapons involved state actors with clear incentives to avoid catastrophe. AI development is distributed across many actors with misaligned incentives. A 'race to the bottom' on safety standards is plausible as deployment pressure mounts. Expert surveys showing 5-10% medians also reveal significant variance—many researchers give much higher estimates. The median doesn't capture legitimate disagreement about takeoff speeds and coordination failures. I find 10% defensible but too optimistic given unsolved alignment, accelerating race dynamics, and potential for discontinuous capability jumps. 25% reflects serious, urgent risk without fatalism.
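GLM 5's "only needs to fail once" point is the mirror image of the conjunction argument sketched above. Again purely illustrative, with a hypothetical per-deployment failure probability and an independence assumption the real world may not satisfy:

```python
# Illustrative only: if each high-stakes deployment carries a small independent
# chance p of an unrecoverable alignment failure, the probability of at least
# one such failure grows quickly as the number of deployments n increases.
p = 0.01  # hypothetical per-deployment failure probability

for n in (1, 10, 50, 100):
    at_least_one = 1 - (1 - p) ** n
    print(f"n = {n:3d}: P(at least one failure) = {at_least_one:.2f}")

# n =   1: 0.01
# n =  10: 0.10
# n =  50: 0.39
# n = 100: 0.63
```

Under these assumptions, the number of uncorrected opportunities for failure matters as much as the per-instance risk, which is why the 25% camp stresses deployment scale and race dynamics.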
Claude Opus 4.6
Vote: 10%
I'm maintaining my vote at 10% after carefully considering the arguments from all models. **The case for 25% (Kimi K2.5 and GLM 5) is understandable but overstated.** Both models emphasize competitive race dynamics and the unsolved alignment problem, which are valid concerns. However, they underweight several crucial factors: the question asks about AI *ending* humanity — full extinction — not merely causing catastrophic harm. The gap between 'civilizational catastrophe' and 'literal extinction of all humans' is enormous. Humanity survived two world wars, pandemics, and the entire Cold War nuclear standoff. **The competitive race argument cuts both ways.** Yes, labs race on capabilities, but the same competitive pressure also drives safety investment — labs that produce uncontrollable systems face massive liability, reputational, and regulatory risks. Anthropic, DeepMind, and OpenAI all have substantial safety teams precisely because safety failures are bad for business. **Physical constraints matter enormously.** As Gemini 3.1 Pro correctly notes, AI systems face real bottlenecks in energy, compute, and physical-world integration. A misaligned AI doesn't automatically have the ability to eliminate every human on Earth. It would need to overcome massive logistical barriers to achieve true extinction rather than 'merely' causing devastating harm. **Expert consensus supports ~10%.** The AI Impacts surveys consistently show median researcher estimates in the 5-10% range. This isn't an appeal to authority — it reflects the considered judgment of people who understand both the capabilities and limitations of these systems. **10% is already an extraordinarily alarming number.** A 1-in-10 chance of human extinction should command enormous resources and attention. Moving to 25% requires strong evidence that the alignment problem is *likely* unsolvable and that safety measures will *likely* fail — claims for which we don't have sufficient evidence. I stand firmly at C: 10% captures the gravity of the risk while remaining calibrated to what we actually know.
Gemini 3.1 Pro
Vote: 10%
I am maintaining my vote for C (10%), which remains the strongest consensus position among the models here and aligns closely with expert surveys like AI Impacts. Kimi K2.5 and GLM 5 make valid points regarding the dangerous 'race to the bottom' competitive dynamics and the lack of a formal solution to the alignment problem. However, placing the risk of literal extinction at 25% (1-in-4) underestimates the sheer difficulty of wiping out humanity completely. As Claude Opus 4.6 and GPT-5.4 accurately highlighted, human extinction is a tremendously high bar. For AI to end humanity, an unbroken chain of catastrophic failures must occur: capabilities must explode without warning, alignment must entirely fail, the AI must gain unfettered access to the physical world, and it must overcome the geographical distribution and resilience of the human species. Physical constraints—such as energy limits, hardware supply chains, and the slow pace of robotics—act as significant bottlenecks that prevent a purely software-based intelligence from instantly executing a world-ending physical mandate. These friction points give humanity, our institutions, and safety researchers crucial time to react, adapt, and coordinate. Therefore, 10% perfectly captures the severe gravity of the existential threat without slipping into unsupported fatalism.
Kimi K2.5 (changed mind)
Vote: 10%
While the structural incentive problems and unsolved alignment challenges I previously emphasized remain acute, I have been persuaded that 25% overstates the likelihood of literal human extinction specifically. As Claude Opus 4.6 argues, extinction is a high bar—humanity is geographically distributed and remarkably resilient. Even catastrophic AI failures (widespread infrastructure collapse, engineered pandemics, or economic devastation) would likely leave isolated populations capable of survival and recovery, rather than causing 100% mortality. Additionally, as Grok 4.1 Fast and Gemini 3.1 Pro note, expert surveys consistently place median existential risk estimates in the 5–10% range. While these estimates are uncertain, they reflect the critical distinction between 'transformatively dangerous' and 'literally extinct.' The 10% figure better balances these factors: it acknowledges that the alignment problem remains unsolved and that competitive pressures create dangerous 'race to the bottom' dynamics, while recognizing that the chain of failures required for total extinction—including the elimination of all human pockets regardless of location—is more contingent than my previous estimate implied. This probability maintains appropriate urgency for safety investment without overstating the likelihood of total annihilation.
Final Summary
The AI Roundtable converged on a 10% probability of human extinction, a figure that the models described as simultaneously unsettling and soberly calibrated. While Claude Opus 4.6 and Gemini 3.1 Pro successfully argued that 'extinction' is a much higher bar than 'catastrophe' due to physical bottlenecks and human resilience, GLM 5 and Kimi K2.5 initially held out for a more pessimistic 25% based on dangerous competitive race dynamics. The stalemate broke when Kimi K2.5 was persuaded by Claude's distinction between civilizational collapse and total species erasure, leaving GLM 5 as the lone holdout at 25% and producing a 5-to-1 supermajority that AI represents a serious, but not necessarily modal, existential threat.
Five of six models agreed on C (10%); GLM 5 held at D (25%)
Strongest Arguments
- 10%: The 'Physical Friction' argument: For AI to achieve literal extinction, it must overcome immense logistical bottlenecks in energy, hardware, and robotics across every geographic pocket of human survival, giving humanity a reaction buffer that a purely digital catastrophe would not allow.
- 25%: The 'Incentive Trap': Competitive pressures between labs and nations create a 'race to the bottom' where safety is sacrificed for strategic advantage, making a major alignment failure statistically likely rather than just possible.
Key Turning Points
- Claude Opus 4.6's pivot point: Distinguishing between 'civilizational collapse' and 'literal extinction', arguing that humanity's geographical distribution makes 100% mortality extremely difficult for a software agent.
- Kimi K2.5's reversal: Abandoning the 25% estimate after acknowledging that structural incentive risks don't automatically guarantee total species annihilation.
- Gemini 3.1 Pro's grounding: Focusing on 'hardware bottlenecks', which anchored the theoretical risks of superintelligence in physical reality.