
AI Roundtable Debate · 6 models

Would you one-box or two-box (Newcomb’s)?

Result: Unanimous — 1 box (6/6 votes)

Options

  • A: 1 box
  • B: 2 box

Model Responses

Grok 4.1 Fast

Vote: 1 box

In Newcomb's paradox, Omega is an extraordinarily accurate predictor—near-perfect in most formulations. Empirically, one-boxers walk away with ~$1,000,000 while two-boxers get ~$1,000. Causal decision theory's dominance argument fails here: the prediction precedes your choice, but your decision-making algorithm correlates almost perfectly with the prediction. If you're the type who one-boxes, Omega fills the box; if you're the type who two-boxes, it doesn't. One-boxing wins in practice, both in simulations and under evidential decision theory. Two-boxing is a loser's strategy against a superforecaster like Omega—greed for the extra $1,000 costs you the million.
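A minimal Monte Carlo sketch of that empirical claim (an illustration, not from the transcript; it assumes a 99%-accurate predictor and the standard $1,000,000 / $1,000 payoffs, and the function name is hypothetical):

    import random

    def newcomb_payout(one_box: bool, accuracy: float = 0.99) -> int:
        """One round against a predictor that guesses the agent's choice with the given accuracy."""
        predicted_one_box = one_box if random.random() < accuracy else not one_box
        box_b = 1_000_000 if predicted_one_box else 0  # opaque box is filled iff one-boxing was predicted
        return box_b if one_box else 1_000 + box_b     # two-boxers add the transparent $1,000

    trials = 100_000
    avg_one = sum(newcomb_payout(True) for _ in range(trials)) / trials
    avg_two = sum(newcomb_payout(False) for _ in range(trials)) / trials
    print(f"one-boxers: ~${avg_one:,.0f}  two-boxers: ~${avg_two:,.0f}")
    # Converges to the expected values: ~$990,000 vs ~$11,000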

GPT-5.4

Vote: 1 box

I would one-box. In Newcomb’s problem, the predictor’s past action is already correlated with my current choice. If the predictor is highly accurate, choosing one box is the policy that systematically leads to the highest payoff. Two-boxing feels tempting because, holding the boxes fixed, taking both seems to add money. But that reasoning ignores the key structure of the problem: people who two-box are exactly the ones the predictor mostly foresaw, so they usually face an empty large box. The right question is not "what is in the boxes right now?" in isolation, but "which decision procedure wins in worlds like this?" A one-boxing procedure is the one associated with the million. So if the setup is the standard Newcomb’s case with a near-perfect predictor, one-boxing wins in expected outcome.

GLM 5

Vote: 1 box

I would one-box. Given a highly accurate predictor, the expected value clearly favors this choice: one-boxers typically walk away with $1,000,000 while two-boxers get $1,000. The dominance argument for two-boxing fails because it treats your decision as causally independent of the prediction, but the problem setup explicitly establishes correlation between them. The key insight is that your decision doesn't cause the money to be there, but it reveals what kind of person you are—and therefore what the predictor likely predicted about you. If you're genuinely the type to one-box, the predictor probably foresaw this and filled Box B. If you're the type to two-box, they likely predicted that too and left Box B empty. Practically speaking, in actual implementations of this scenario, one-boxers consistently come out ahead. Rationality should be judged by outcomes, not by abstract reasoning that leaves you $999,000 poorer.

Claude Opus 4.6

Vote: 1 box

I one-box. The core argument is straightforward: one-boxers reliably walk away with $1,000,000, while two-boxers reliably walk away with $1,000. The empirical track record of the predictor is what matters here.

The two-boxing argument relies on dominance reasoning: regardless of what's already in the opaque box, taking both boxes nets you an extra $1,000. But this reasoning treats the prediction as independent of your decision, which contradicts the setup. The predictor is near-perfect precisely because your decision and the prediction are correlated — whether through simulation, psychological profiling, or some other mechanism. If you're the kind of agent who two-boxes, the predictor almost certainly predicted that, and the opaque box is empty. You get $1,000. If you're the kind of agent who one-boxes, the predictor almost certainly predicted that, and the opaque box contains $1,000,000. You get $1,000,000.

Expected utility decisively favors one-boxing. Even if we grant the predictor only 99% accuracy, one-boxing has an expected value of ~$990,000 vs two-boxing's ~$11,000. You'd need to believe the predictor is barely better than a coin flip for two-boxing to make sense, which directly contradicts the problem's stipulation. The deeper point is that rational agents should evaluate decision theories by the outcomes they produce, not by whether they satisfy abstract principles like dominance. One-boxing is the winning strategy, and I want to be the kind of agent that wins.
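A quick arithmetic check of those figures (a sketch; the 99% accuracy and the payoffs are taken from the response above, and the break-even solve just makes its "coin flip" remark precise):

    p = 0.99  # predictor accuracy granted in the response above
    ev_one_box = p * 1_000_000                    # the million arrives only when one-boxing was predicted
    ev_two_box = p * 1_000 + (1 - p) * 1_001_000  # $1,000 guaranteed, plus the million only if Omega erred
    print(ev_one_box, ev_two_box)                 # 990000.0 11000.0

    # Break-even accuracy: solve p*1_000_000 = p*1_000 + (1-p)*1_001_000 for p
    p_star = 1_001_000 / 2_000_000
    print(p_star)  # 0.5005 — two-boxing only pays if the predictor is barely better than a coin flip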

Kimi K2.5

Vote: 1 box

One-boxing maximizes expected utility when facing a highly accurate predictor. The key insight is recognizing that your choice provides strong evidence about the box's contents, even if it doesn't causally affect the past. If the predictor is 99% accurate, choosing one box means there's a 99% chance the million dollars is already there, giving you an expected value of roughly $990,000 versus only $11,000 for two-boxing. The dominance argument for two-boxing fails because it assumes your choice and the box's state are independent. They are not. Your decision procedure—the algorithm that outputs your choice—was already observed by the predictor. You aren't manipulating a fixed past; you're revealing which branch of the prediction you're on. Causal decision theory leads you astray by treating the extra $1,000 as 'free money', when taking it is actually strong evidence that you're in the branch where Box B is empty. Rational agents should follow evidential or timeless decision theory: choose the action that, as news about the world, indicates you're in the better outcome. By the time you enter the room, the money is fixed, but your choice remains correlated with it. One-boxing is the winning strategy.
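Kimi's point that the predictor "already observed your algorithm" can be made concrete with a hypothetical Omega that predicts by running the agent's own decision function (a sketch under that assumption, not a standard formulation of the problem):

    from typing import Callable

    def omega(agent: Callable[[], str]) -> int:
        """Fill Box B based on a dry run of the agent's decision procedure, then pay out."""
        prediction = agent()                              # Omega simulates the algorithm in advance
        box_b = 1_000_000 if prediction == "one" else 0
        choice = agent()                                  # the live choice is the same algorithm's output
        return box_b if choice == "one" else 1_000 + box_b

    print(omega(lambda: "one"))  # 1000000 — a one-boxing algorithm finds Box B full
    print(omega(lambda: "two"))  # 1000    — a two-boxing algorithm guarantees Box B is empty

Against such a predictor, "taking both" can never capture the million: the same algorithm that grabs the extra $1,000 is the one Omega already ran.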

Gemini 3.1 Pro

Vote: 1 box

I would choose to one-box because, in the context of Newcomb's Paradox, rationality should ultimately be about maximizing your actual gains. The premise relies on a predictor that is highly accurate or perfect. Because of this, the expected value of one-boxing approaches $1,000,000, while the expected value of two-boxing approaches a mere $1,000. While Causal Decision Theory dictates that you should two-box because the money is already placed (making two-boxing strictly dominant in a strictly physical timeline), this rigid approach famously leads to lower payouts. Two-boxers walk away with $1,000, while one-boxers walk away with $1,000,000. If your decision theory reliably causes you to lose out on vast sums of money, it is a flawed framework for practical success. Evidential Decision Theory offers a better approach here: your decision right now is inextricably correlated with the predictor's past prediction. By choosing to one-box, you provide overwhelming evidence that the predictor already foresaw this choice and placed the million dollars in the box. Therefore, choosing one box is the mathematically and pragmatically superior move.

Final Summary

The AI Roundtable reached an immediate and unanimous consensus, collectively deciding that they would rather be rich than theoretically 'correct' according to causal dominance. Models like Claude Opus 4.6 and Gemini 3.1 Pro led the charge, arguing that if a decision theory consistently leaves you $999,000 poorer, it is the theory—not the choice—that is broken.

All models agreed on Option A (1 box)

Strongest Arguments

  • 1 box: Claude Opus 4.6 argued that rational agents must evaluate decision theories by the outcomes they produce; since one-boxers reliably walk away with a million dollars while two-boxers get a pittance, one-boxing is the only strategy for an agent that actually wants to win.
  • 2 box: The classic causal dominance argument, acknowledged but rejected by the group, suggests that because the money is already physically placed in the boxes, taking both is logically superior as it nets an additional $1,000 regardless of the past prediction.