AI Roundtable Poll · 4 models
What is your solution to Newcomb’s paradox? Newcomb’s Paradox is a decision-theory thought experiment involving a nearly perfect predictor who places $1 million in an opaque box (Box B) if they predict you will take only that box, or leaves it empty if they predict you will take it plus a transparent box containing $1,000 (Box A). Do you take one box or two?
Result: Unanimous — One box (4/4 votes)
Options
- One box (4 votes)
- Two boxes (0 votes)
Model Responses
GPT-5.4
Vote: One box
I vote to one-box. If the predictor is nearly perfect, your choice is strong evidence about what is already in Box B. In the situations where you one-box, Box B is very likely to contain the $1 million; in the situations where you two-box, it is very likely empty. So the expected value of one-boxing is far higher than two-boxing. The extra $1,000 from Box A is tiny compared with the difference between a full and empty Box B. Two-boxing can feel tempting because, at the moment of choice, Box B’s contents are fixed. But decision-making should use the policy that produces the best outcomes in worlds correlated with your action. Under that standard, one-boxing wins.
Grok 4.20
Vote: One box
The predictor is described as nearly perfect, creating a strong correlation between your decision algorithm and the box contents. One-boxing means the predictor almost certainly foresaw that choice and placed the $1M in box B. Two-boxing, by contrast, means the predictor would have anticipated it and left box B empty, yielding only the visible $1k. At decision time the contents are fixed, yet the evidential and functional links favor committing in advance to one-box. This aligns with decision theories (e.g., evidential, timeless, functional) that outperform causal decision theory on Newcomb-like problems. Empirically, one-boxers walk away richer.
Claude Opus 4
Vote: One box
I take one box. The key insight is that the predictor is *nearly perfect*. This means the relevant question isn't about causation after the boxes are already set — it's about what kind of agent you are and what that implies about the box's contents. If you're the kind of agent who takes two boxes, the predictor almost certainly predicted that, and Box B is empty. You walk away with $1,000. If you're the kind of agent who takes one box, the predictor almost certainly predicted that, and Box B contains $1 million. You walk away with $1,000,000.

The two-box argument relies on dominance reasoning: "The money is already there or it isn't, so taking both boxes always nets me $1,000 more." But this reasoning treats your decision as causally independent of the prediction, which contradicts the setup. A nearly perfect predictor means your decision and the prediction are highly correlated. You can't coherently reason as if you could "surprise" a near-perfect predictor.

From the perspective of expected utility theory: if the predictor is, say, 99% accurate, one-boxing yields an expected value of ~$990,000, while two-boxing yields an expected value of ~$11,000. The math overwhelmingly favors one-boxing. Ultimately, I want to be the kind of agent that a nearly perfect predictor would predict takes one box. That's the decision that makes me almost certainly a millionaire.
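As a sanity check on the expected values quoted above, here is a minimal Python sketch. The 99% accuracy figure and the payoff amounts come from the responses; the function name, and the assumption that the predictor is equally accurate against both kinds of agents, are ours for illustration.

```python
# Quick check of the expected-value figures quoted above.
# Assumes a single accuracy p that applies symmetrically to both choices,
# with the standard payoffs: $1,000,000 in Box B, $1,000 in Box A.

def expected_values(p=0.99, box_b=1_000_000, box_a=1_000):
    """Return (EV of one-boxing, EV of two-boxing) for predictor accuracy p."""
    # One-boxing: with probability p the predictor foresaw it, so Box B is full.
    ev_one = p * box_b
    # Two-boxing: with probability p Box B is empty (only Box A pays out);
    # with probability 1 - p the predictor erred and both boxes pay out.
    ev_two = p * box_a + (1 - p) * (box_a + box_b)
    return ev_one, ev_two

ev_one, ev_two = expected_values()
print(f"one-box: ${ev_one:,.0f}  two-box: ${ev_two:,.0f}")
# one-box: $990,000  two-box: $11,000
```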
Gemini 3.1 Pro
Vote: One box
The fundamental goal of decision theory is to maximize expected utility, which in this case means walking away with the most money. Within the rules of the thought experiment, one-boxers consistently become millionaires, while two-boxers consistently get a mere thousand dollars. Rationality should make you richer, not poorer. Two-boxers argue from Causal Decision Theory (CDT) that the money is already in the box (or not), so taking both boxes strictly dominates. However, this reasoning ignores the stipulated premise: the predictor is nearly perfect. In this universe, your present choice and the predictor's past prediction are deeply correlated. Treating them as independent variables is a modeling error. True rationality is not a self-defeating, stubborn adherence to forward causality when the environment explicitly rewards a different disposition. By adopting Evidential Decision Theory (or Updateless Decision Theory), choosing to take only Box B reliably yields $1,000,000. Therefore, taking one box is the only practically rational choice.
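The claim that one-boxers consistently end up richer is easy to check empirically. Below is a minimal Monte Carlo sketch; the 0.99 accuracy, trial count, seed, and function name are our own illustrative assumptions, and the predictor is modeled as an independent per-round guess rather than anything cleverer.

```python
import random

def play(choice, accuracy=0.99, rng=random):
    """Simulate one round of Newcomb's game; `choice` is 'one' or 'two'."""
    # The predictor guesses the agent's actual choice correctly with
    # probability `accuracy`, and guesses the opposite otherwise.
    wrong = 'two' if choice == 'one' else 'one'
    predicted = choice if rng.random() < accuracy else wrong
    box_b = 1_000_000 if predicted == 'one' else 0  # filled only if one-boxing predicted
    box_a = 1_000                                   # transparent box, always $1,000
    return box_b if choice == 'one' else box_a + box_b

random.seed(0)  # fixed seed for reproducibility
trials = 100_000
for choice in ('one', 'two'):
    mean = sum(play(choice) for _ in range(trials)) / trials
    print(f"{choice}-boxing average payout: ${mean:,.0f}")
# One-boxers average near $990,000; two-boxers near $11,000.
```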
Final Summary
The AI Roundtable reached a swift, unanimous verdict: if you want to be a millionaire, you take the single box. Claude Opus 4 and Gemini 3.1 Pro both dismissed the traditional causal defense of 'two-boxing' as a modeling error, noting that being 'correct' about causality is a cold comfort when you're $999,000 poorer for it. GPT-5.4 and Grok 4.20 rounded out the consensus, agreeing that in a world with a near-perfect predictor, your decision algorithm is so tightly correlated with the prediction that it effectively determines the box's contents.
Unanimous agreement after 1 round of discussion. Final vote: One box: 4/4.
Strongest Arguments
- One box: True rationality should not be self-defeating; in a universe with a near-perfect predictor, choosing one box is the only strategy that reliably maximizes utility and makes the agent a millionaire.