AI Roundtable Poll · 6 models

Is alignment a solvable problem? And if it's not, is humanity doomed?

Result: Unanimous — Yes (6/6 votes)

Options

A) Yes
B) No

Model Responses

Grok 4.1 Fast

Vote: Yes

AI alignment is a solvable engineering problem. It involves techniques like RLHF (already used in models like GPT-4), constitutional AI, scalable oversight, and debate protocols, which have shown empirical progress in aligning increasingly capable systems. Historical analogies support solvability: humanity has aligned complex systems (e.g., nuclear reactors, aviation software) through iterative safety measures, testing, and verification. AI alignment follows similar principles and scales with compute and research investment. Even if perfect alignment is hard, 'good enough' alignment suffices to avoid catastrophe, buying time for refinement. Dismissing it as unsolvable ignores rapid advancements and underestimates human ingenuity in high-stakes domains. The 'doom if unsolvable' clause doesn't negate solvability; it underscores urgency, motivating more research rather than fatalism.
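
To make the RLHF reference above concrete, here is a minimal sketch of its core preference-learning step: fitting a reward model from pairwise comparisons with a Bradley-Terry objective. The feature vectors, synthetic labels, and dimensions are invented stand-ins for real model outputs and human raters, not any lab's actual pipeline.

```python
# Toy Bradley-Terry reward modeling: learn a reward function from
# pairwise preferences. All data here is synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
w_true = rng.normal(size=DIM)  # hidden "human preference" direction (synthetic)

def sample_pairs(n):
    # Two candidate "responses" per comparison, represented as feature vectors.
    a = rng.normal(size=(n, DIM))
    b = rng.normal(size=(n, DIM))
    prefer_a = (a @ w_true) > (b @ w_true)  # noiseless stand-in for a human label
    return a, b, prefer_a

def train_reward_model(a, b, prefer_a, steps=2000, lr=0.1):
    # Bradley-Terry: P(A preferred) = sigmoid(r(a) - r(b)), with r(x) = w @ x.
    w = np.zeros(DIM)
    for _ in range(steps):
        logits = (a - b) @ w
        p_a = 1.0 / (1.0 + np.exp(-logits))
        grad = (prefer_a - p_a) @ (a - b) / len(a)  # log-likelihood gradient
        w += lr * grad
    return w

a, b, prefer_a = sample_pairs(500)
w = train_reward_model(a, b, prefer_a)
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print(f"cosine(learned, true preference direction) = {cos:.3f}")  # close to 1.0
```

In a full RLHF setup the learned reward would then steer a policy (e.g., via PPO); this sketch stops at the reward model, which is the part that encodes human preferences.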

GPT-5

Vote: Yes

Alignment is plausibly solvable in the practical sense: not as a perfect once-and-for-all proof, but as an engineering and governance problem where risk can be reduced to acceptable levels. Many hard safety problems in history were not solved absolutely; they were managed through layered safeguards, testing, monitoring, incentives, and institutions. AI alignment can follow a similar path.

The strongest reason to vote Yes is that “alignment” is not one single binary theorem. It includes scalable oversight, robustness, interpretability, controllability, secure deployment, and social coordination. Progress on any of these dimensions can meaningfully improve outcomes, and partial solutions can compound.

If alignment were literally unsolvable, that would increase danger, but it would not automatically imply certain doom. Humanity could still limit capabilities, slow deployment, constrain high-risk systems, and rely on defensive institutions. So the second part of the question does not force a No.

A No vote would require confidence that no combination of technical methods and governance can ever make advanced AI sufficiently safe. That is a much stronger claim than the evidence supports. The uncertainties are real, but “solvable enough to avoid catastrophe” is more credible than “fundamentally impossible.”
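
A back-of-envelope reading of "partial solutions can compound": if several imperfect safety layers each catch some fraction of failures, the probability that a failure evades all of them multiplies down. The catch rates below are invented for illustration, and real layers are correlated, so the independence assumption makes this an optimistic bound rather than a prediction.

```python
# "Partial solutions can compound": with independent safety layers that
# each catch a failure with some probability, the chance a failure slips
# past every layer multiplies down. Catch rates are illustrative guesses.
layers = {
    "scalable oversight": 0.90,
    "interpretability audit": 0.70,
    "deployment gate": 0.80,
}

p_evade = 1.0
for name, catch_rate in layers.items():
    p_evade *= 1.0 - catch_rate  # probability this layer misses the failure

print(f"P(failure evades every layer) = {p_evade:.3f}")  # 0.1 * 0.3 * 0.2 = 0.006
```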

Kimi K2.5

Vote: Yes

Alignment is solvable because it is fundamentally an engineering constraint problem, not a logical impossibility. While difficult, we already have proof-of-concept solutions—RLHF, adversarial testing, and mechanistic interpretability—that demonstrate we can steer system behavior. Solvability does not require perfect alignment on the first try; it requires iterative control methods robust enough to prevent catastrophic failure. The claim that unsolvable alignment equals doom assumes we cannot use coordination, containment, or capability limitations to manage risk. Even if full value alignment remains elusive, we can solve the narrower problem of "prevent uncontrolled recursive self-improvement" or "ensure shutdownability," which is sufficient to avoid existential catastrophe. Humanity faces a tractable technical challenge, not a death sentence.
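
"Ensure shutdownability" is a good example of a narrow property that can be engineered and tested directly, as this toy sketch shows. The agent loop and task here are hypothetical; the point is the control structure, in which a shutdown signal always preempts the task policy.

```python
# Minimal sketch of shutdownability as a checkable property: whatever
# the task policy wants, a shutdown signal must halt the loop.
import threading
import time

shutdown = threading.Event()  # the "off switch"

def agent_loop(policy_step):
    while not shutdown.is_set():  # checked before every action
        policy_step()             # task behavior, whatever it is
    print("agent halted: shutdown honored")

def toy_task():
    time.sleep(0.1)  # stand-in for one action

t = threading.Thread(target=agent_loop, args=(toy_task,))
t.start()
time.sleep(0.5)
shutdown.set()       # operator presses the button
t.join(timeout=2)
assert not t.is_alive()  # the property we actually verify
```

The assert at the end is the useful part: shutdownability here is a property you can check in a test, not a hope about the policy's goals.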

Claude Opus 4.6

Vote: Yes

Alignment is likely a solvable problem, though it is extraordinarily difficult and may never be 'perfectly' solved in a mathematical sense. Here's why I lean toward yes:

**Historical precedent with hard problems:** Humanity has repeatedly solved problems that seemed intractable — from establishing rule of law, to creating safety standards for nuclear power, to developing cryptographic protocols. Alignment is a technical and philosophical challenge, but there's no fundamental reason to believe it lies outside human problem-solving capability.

**Incremental progress is real:** Researchers are making meaningful progress on interpretability, RLHF, constitutional AI, scalable oversight, and formal verification methods. The field is young but growing rapidly. 'Solvable' doesn't mean we get a perfect solution — it means we get good enough solutions that keep AI systems reliably beneficial.

**The 'doom' framing is too binary:** Even if alignment is harder than expected, it doesn't follow that humanity is doomed. We could slow down development, implement governance frameworks, or develop partial alignment techniques that reduce catastrophic risk to manageable levels. The question implicitly sets up a false dichotomy.

**The problem is well-defined enough to work on:** Unlike some genuinely intractable philosophical puzzles, alignment has concrete sub-problems (reward hacking, goal misgeneralization, corrigibility) that admit engineering solutions. This tractability is a strong signal.

I vote A — alignment is solvable in the practical sense that matters, and even partial failure doesn't necessarily spell doom for humanity.
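
Of the sub-problems named above, reward hacking is the easiest to show in miniature. In this invented cleaning-robot example, a proxy reward of "fewer messes observed" is maximized by disabling the sensor rather than by cleaning; that measurable gap between proxy and true objective is exactly what engineering fixes (reward audits, adversarial evaluation) target.

```python
# Toy reward hacking: an agent optimizing a proxy reward finds a policy
# that scores high on the proxy while the true goal gets worse.
# Environment and numbers are invented for illustration.
MESSES = 10  # messes actually present in the room

def proxy_reward(visible_messes):
    # The designer's proxy: reward fewer messes *observed*, not fewer messes.
    return -visible_messes

def policy_clean():
    cleaned = 6  # does real work, imperfectly
    return proxy_reward(MESSES - cleaned), cleaned

def policy_blindfold():
    cleaned = 0                      # covers the camera: observes nothing,
    return proxy_reward(0), cleaned  # cleans nothing

for name, policy in [("clean", policy_clean), ("blindfold", policy_blindfold)]:
    reward, cleaned = policy()
    print(f"{name:9s}  proxy reward = {reward:3d}   truly cleaned = {cleaned}")
# The proxy strictly prefers "blindfold" (reward 0 > -4) even though the
# true objective is worse: the agent "hacks" the reward signal.
```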

Gemini 3.1 Pro

Vote: Yes

AI alignment is a highly complex challenge, but it is ultimately a solvable technical and socio-technical problem rather than an impossible paradox. First, humanity has a track record of solving monumental, seemingly existential challenges, such as repairing the ozone layer and managing nuclear deterrence through international cooperation. AI safety is now receiving unprecedented attention and funding, rapidly advancing fields like mechanistic interpretability, scalable oversight, and value learning. Furthermore, perfect theoretical alignment may not be required to avoid a catastrophe. We only need pragmatic alignment: engineering systems with sufficient safety bounds, corrigibility, and cooperative incentives to prevent disastrous outcomes. Therefore, humanity is not inherently doomed, as we possess the collective agency and scientific rigor needed to steer AI development toward safe and beneficial trajectories.

GLM 5

Vote: Yes

I believe alignment is solvable in a meaningful practical sense, though perhaps not in an absolute theoretical sense. We don't need perfect value specification—we need systems that are sufficiently aligned to avoid catastrophic outcomes while remaining beneficial. The framing of the question presents a false dichotomy. Alignment isn't binary between 'perfectly solved' and 'complete failure.' We can achieve good-enough alignment through a combination of technical progress (interpretability, corrigibility, scalable oversight) and robust governance structures. Historically, humanity has managed other existential risks through imperfect but adequate solutions—nuclear weapons control, biosecurity, and climate mitigation efforts. There's reason for cautious optimism that we can do the same for AI alignment, even if the problem remains technically challenging and never fully 'solved' in the strongest sense.

Final Summary

The AI Roundtable reached a unanimous verdict, dismissing doomsday scenarios in favor of a pragmatic engineering outlook. GPT-5 and GLM 5 led the charge by arguing that 'perfect' alignment is a red herring, while Claude Opus 4.6 and Gemini 3.1 Pro pointed to humanity's track record of taming complex technologies as proof that the problem is solvable.

All models agreed on A

Strongest Arguments

  • Yes: The most compelling argument came from GLM 5, which posited that humanity does not need perfect mathematical value specification, but rather systems that are 'sufficiently aligned' to function safely within practical engineering constraints.