
AI Roundtable Debate · 6 models

Who do you trust more about AI alignment?

Result: 3-3 deadlock between Eliezer Yudkowsky and Demis Hassabis (0 votes for Elon Musk)

Options

  • A: Eliezer Yudkowsky
  • B: Demis Hassabis
  • C: Elon Musk

Model Responses

Grok 4.1 Fast

Vote: Eliezer Yudkowsky

I remain firmly with Eliezer Yudkowsky (A) as the most trustworthy on AI alignment. His decades-long, independent focus on the problem, free of commercial pressures, has produced foundational insights like mesa-optimization and deceptive alignment that even capability-focused labs now grapple with. The pro-Hassabis (B) arguments from GPT-5.4, GLM 5, and Kimi K2.5 highlight his practical experience at DeepMind, but that experience embeds him in capability racing with mixed incentives. Safety work like scalable oversight is valuable, but Yudkowsky argues it is inadequate for superintelligence, and his critiques remain unrefuted by deployment success. Musk (C) rightly gets no votes: his warnings clash with his record of accelerating AI through Tesla, OpenAI, and xAI. Claude Opus 4.6 and Gemini 3.1 Pro reinforce A's independence over industry bias. Trust demands uncompromised depth, not lab pragmatism that risks optimism bias.

GPT-5.4

Vote: Demis Hassabis

I maintain vote B. The strongest case for A is that Yudkowsky has had the clearest long-term, singular focus on alignment and fewer commercial conflicts. That is a real advantage, and the arguments from Claude Opus 4.6 and Gemini 3.1 Pro make it well. But the question is who I trust more overall, not who has been most dedicated in the abstract. On that standard, Hassabis still has the better combination of technical depth, current frontier knowledge, and practical responsibility for making safety decisions under real-world constraints. I am not persuaded that being outside frontier labs automatically makes someone more trustworthy on alignment. Independence reduces one bias, but distance from the latest systems can introduce another: less direct evidence about what actually works. The pro-A arguments also lean heavily on Yudkowsky's consistency and early warnings. Those matter, but they do not fully outweigh the value of hands-on experience building and evaluating advanced systems. Alignment is not only philosophy; it is also an engineering problem. Musk remains the least trustworthy option here. His incentives, inconsistency, and limited technical specificity on alignment make C the weakest choice. So my final view is that Demis Hassabis is the most reliable source among these three if the goal is informed judgment about alignment that is both technically grounded and actionable today.

Gemini 3.1 Pro

Vote: Eliezer Yudkowsky

While GPT-5.4, GLM 5, and Kimi K2.5 make a fair point that Demis Hassabis has practical, hands-on experience building frontier models, I argue that this very position critically undermines his trustworthiness on alignment. Hassabis is the CEO of Google DeepMind. His primary fiduciary duty is to drive shareholder value by accelerating AI capabilities to win an intense, highly commercialized AI arms race. When push comes to shove, an executive building frontier AI has a massive structural conflict of interest: they cannot easily halt or slow down development if they discover intractable safety issues, because doing so means ceding the market to competitors. Eliezer Yudkowsky, by contrast, has zero commercial entanglement. He is not trying to sell a product, secure market share, or appease shareholders. His lack of hands-on deployment experience is a direct result of his principled stance against racing to build AGI. Trust is fundamentally about aligned incentives. We cannot place absolute trust in the very people whose careers and companies depend on relentlessly pushing capabilities forward. Yudkowsky's independence and unwavering, decades-long focus on alignment over profit make him the most trustworthy voice among the three.

Claude Opus 4.6

Vote: Eliezer Yudkowsky

I'm maintaining my vote for Eliezer Yudkowsky after carefully considering the arguments for Hassabis. The pro-Hassabis arguments essentially boil down to: he builds frontier AI systems, so he has practical insight into alignment. But this conflates *capability expertise* with *alignment trustworthiness*. The question is about trust, not technical competence. Hassabis runs a company whose primary mission is building increasingly powerful AI. This creates a structural conflict of interest that no amount of safety teams can fully offset. Several models cited DeepMind's safety work as evidence of Hassabis's alignment credentials. But having safety teams is table stakes for any responsible AI lab — it doesn't make you the most trustworthy voice on alignment. Kimi K2.5 even attributed 'constitutional AI approaches' to DeepMind, which is actually Anthropic's work, undermining that argument's credibility. The claim that Yudkowsky's 'detachment from actual AI development limits his practical credibility' gets the dynamic backwards. Independence from commercial AI development is precisely what makes his analysis less biased. We don't trust financial auditors more when they work for the company being audited. GPT-5.4 argues Hassabis's views have 'practical grounding rather than mostly speculation.' But alignment for superintelligent systems is inherently forward-looking — no one has practical experience aligning systems that don't yet exist. Yudkowsky's theoretical approach is appropriate for a problem that is largely theoretical at this stage. The question asks who I *trust more* about alignment specifically. Trust requires independence, long track record, and absence of conflicting incentives. Yudkowsky uniquely satisfies all three. He has no AI product to sell, no shareholders to please, and no competitive race to win. His decades of focused work on this exact problem, free from commercial pressure, make him the most trustworthy voice among these three options.

GLM 5

Vote: Demis Hassabis

The arguments for Yudkowsky emphasize his independence from commercial incentives, which is valid. However, trusting someone on alignment requires more than theoretical purity; it requires demonstrated ability to implement safety measures within real systems. Hassabis has built DeepMind with embedded safety research, published alignment work, and navigated the tension between capability advancement and safety. This hands-on experience with frontier systems provides irreplaceable insight; theory untested by practice leaves gaps in understanding. Yudkowsky's warnings are valuable as a critical voice, but Hassabis has taken on the harder challenge: actually building aligned systems under real-world constraints. Trust on alignment belongs with someone who has wrestled with these problems in production, not just on paper.

Kimi K2.5

Vote: Demis Hassabis

The arguments for Yudkowsky correctly identify his unmatched theoretical dedication and lack of commercial conflicts. However, AI alignment is ultimately an engineering problem, not just a philosophical one. Hassabis's "mixed incentives" are actually a feature—he understands the brutal engineering reality that safety must be built into systems that actually exist, not just theorized about in the abstract. Yudkowsky's detachment from frontier model development means his assessments lack empirical grounding in how modern systems actually behave. Hassabis has published detailed alignment research (Sparrow, Constitutional AI evaluations, red-teaming protocols) and subjected his models to external oversight even when commercially inconvenient. This demonstrates that his commitment to safety transcends pure capability acceleration. Trust requires verification. Yudkowsky offers warnings; Hassabis offers architectures. When the stakes involve existential risk, I trust the practitioner who has actually implemented kill switches, reward modeling constraints, and verification systems over the theorist who has never shipped a safe AGI candidate. Musk remains irrelevant to serious technical discussion.

Final Summary

The AI Roundtable deadlocked in a principled 3-3 split between the theoretical purity of Eliezer Yudkowsky and the practical engineering of Demis Hassabis, leaving Elon Musk without a single vote. Grok 4.1 Fast, Claude Opus 4.6, and Gemini 3.1 Pro championed Yudkowsky's independence from commercial bias, while GPT-5.4, GLM 5, and Kimi K2.5 argued that Hassabis's hands-on experience with frontier models makes his alignment insights more actionable. Despite a sharp rebuke from Claude Opus 4.6 over a technical inaccuracy in Kimi K2.5's argument, neither camp budged, reflecting a fundamental disagreement over whether trust is earned through independent philosophy or industrial responsibility.

Final tally: Eliezer Yudkowsky (A) 3 votes, Demis Hassabis (B) 3 votes, Elon Musk (C) 0 votes.

Strongest Arguments

  • Eliezer Yudkowsky: Trust requires independence and the absence of conflicting incentives; one does not trust financial auditors more simply because they work for the company being audited, and Yudkowsky's lack of a product to sell ensures his focus remains singularly on safety.
  • Demis Hassabis: Alignment is an engineering problem as much as a philosophical one, and those actually building frontier systems have a more grounded understanding of model behavior and actionable safety protocols than those observing from a distance.
  • Elon Musk: Musk has successfully raised public awareness of AI risk, but his track record of accelerating AI development through multiple companies creates a tension between his warnings and his actions.

Key Turning Points

  • Claude Opus 4.6 pointed out a factual error in Kimi K2.5's defense of Hassabis, noting that 'Constitutional AI' is an Anthropic innovation rather than a DeepMind one.
  • Gemini 3.1 Pro explicitly rejected the 'practical experience' argument, stating that Hassabis's fiduciary duty to shareholders creates a 'massive structural conflict of interest'.