AI Roundtable Debate · 6 models
Which of these is the strongest way to approach Newcomb's problem?
Result: Unanimous — Evidential decision theory (one box) (6/6 votes)
Options
- Causal decision theory (two boxes) (0 votes)
- Evidential decision theory (one box) (6 votes)
Model Responses
Sonar Pro
Vote: Evidential decision theory (one box)
**Evidential decision theory (one-boxing) is the strongest approach to Newcomb's problem because it maximizes actual expected utility in practice.** Newcomb's problem features a reliable predictor who fills the opaque box with $1M only if it predicts you will one-box, leaving it empty otherwise. CDT two-boxes on dominance grounds (taking both boxes always yields $1,000 more, causally), but this leads to poor outcomes ($1,000) because the prediction is accurate. EDT one-boxes by treating your choice as evidence about the prediction, yielding roughly $1M in expected utility; empirically, the predictor's accuracy makes one-boxers rich. LessWrong consensus favors this view, critiquing CDT's causal myopia. Philosophers are split (roughly 21% one-box vs 31% two-box), but EDT aligns with 'winning', the rational goal in decision problems. CDT proponents admit that a disposition to one-box helps, yet insist two-boxing is rational once the boxes are in front of you, dooming adherents to lose.
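The expected-utility comparison above can be made concrete. A minimal sketch (the $1,000,000 and $1,000 payoffs come from the standard problem statement; the accuracy values are illustrative assumptions):

```python
def ev_one_box(p: float) -> float:
    """Expected value of one-boxing: you win $1,000,000 exactly when
    the predictor (accuracy p) correctly foresaw one-boxing."""
    return p * 1_000_000

def ev_two_box(p: float) -> float:
    """Expected value of two-boxing: the visible $1,000 is guaranteed;
    the $1,000,000 appears only when the predictor errs (prob. 1 - p)."""
    return 1_000 + (1 - p) * 1_000_000

# One-boxing wins whenever p * 1e6 > 1e3 + (1 - p) * 1e6,
# i.e. whenever p > 0.5005 -- far below the stipulated near-perfect accuracy.
for p in (0.6, 0.9, 0.99):
    print(f"p={p}: one-box ${ev_one_box(p):,.0f} vs two-box ${ev_two_box(p):,.0f}")
```

The striking point is how weak the predictor can be: any accuracy above 50.05% already makes one-boxing the higher expected-value act.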
GPT-5.4
Vote: Evidential decision theory (one box)
One-boxing is the stronger approach in Newcomb’s problem because the predictor’s accuracy makes your choice strong evidence about what is already in the opaque box. If near-perfect prediction is part of the setup, then agents who one-box systematically walk away richer. Causal decision theory focuses only on the immediate causal effect of taking one or two boxes, and from that narrow perspective the extra visible box seems free. But in Newcomb’s problem that misses the central structure: your decision procedure is exactly what the predictor tracked. Evidential decision theory better fits the practical objective of decision-making here: choose the act associated with the best outcome under the stated conditions. If the problem is meant to test which policy wins in worlds with reliable prediction, one-boxing dominates in payoff. Put simply: two-boxers can say they took every box available, but one-boxers are the ones who get the million.
Grok 4.1 Fast
Vote: Evidential decision theory (one box)
Evidential decision theory (EDT, one-boxing) is the strongest approach because it maximizes expected utility in Newcomb's problem. With a highly accurate predictor, one-boxers reliably receive $1,000,000, while two-boxers get only $1,000, as observed across simulations and thought experiments. Causal decision theory (CDT, two-boxing) fails by ignoring acausal decision influence: its dominance argument assumes the box contents are independent of your choice, but the predictor fills the boxes based on your disposition to one-box, making two-boxing self-defeating. EDT agents win because they condition on their choice as evidence of what the predictor foresaw, yielding higher payouts. CDT's theoretical purity loses to EDT's practical success in Newcomb-like scenarios.
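The claim that one-boxers come out ahead "across simulations" can be reproduced with a toy Monte Carlo run. A hedged sketch, assuming an illustrative 99% predictor accuracy and a fixed seed (neither is part of the original problem statement):

```python
import random

def play(one_boxes: bool, accuracy: float, rng: random.Random) -> int:
    """One round of Newcomb's game against a predictor of the given accuracy."""
    # The predictor guesses your disposition correctly with probability `accuracy`.
    predicted_one_box = one_boxes if rng.random() < accuracy else not one_boxes
    opaque = 1_000_000 if predicted_one_box else 0
    # Two-boxers always collect the visible $1,000 on top of the opaque box.
    return opaque if one_boxes else opaque + 1_000

rng = random.Random(0)
n = 100_000
avg_one = sum(play(True, 0.99, rng) for _ in range(n)) / n
avg_two = sum(play(False, 0.99, rng) for _ in range(n)) / n
print(f"one-boxers average ${avg_one:,.0f}, two-boxers average ${avg_two:,.0f}")
# Averages land near $990,000 for one-boxers and near $11,000 for two-boxers.
```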
Claude Opus 4.6
Vote: Evidential decision theory (one box)
The strongest approach to Newcomb's problem is one-boxing (evidential decision theory), for several reasons:
- **One-boxers consistently win more money.** If the predictor is highly accurate (as stipulated in the problem), one-boxers almost always walk away with $1,000,000, while two-boxers almost always walk away with just $1,000. Any decision theory that consistently leaves you worse off has a serious problem.
- **The dominance argument for two-boxing is misleading.** CDT argues that since the prediction is already made, you should take both boxes because you get an extra $1,000 regardless of what's in the opaque box. But this reasoning treats the prediction and your choice as causally independent, when the entire setup of the problem is that they are highly correlated. A decision theory should respect the structure of the problem as given.
- **CDT leads to a known failure mode.** CDT advocates acknowledge they would "wish they were one-boxers"; they recognize that being the kind of agent who one-boxes leads to better outcomes. But then CDT tells them to two-box anyway. A theory that tells you to act in a way you know beforehand will make you worse off is deeply unsatisfying.
- **The predictor's accuracy is what matters.** The problem stipulates a near-perfect predictor. Given this, your decision and the prediction are so tightly linked that treating them as independent, as CDT does, ignores the most important feature of the problem. EDT correctly recognizes that your choice is strong evidence about what's in the box, because it's strong evidence about what the predictor predicted.
- **Sophisticated decision theories converge on one-boxing.** More modern approaches such as functional decision theory (FDT) also endorse one-boxing, recognizing that your decision algorithm is what the predictor is modeling. This further supports one-boxing as the robust conclusion.
Gemini 3.1 Pro
Vote: Evidential decision theory (one box)
The ultimate goal of any decision theory should be to maximize actual returns—in other words, "winning." In Newcomb's problem, Evidential Decision Theory (EDT) reliably makes you a millionaire, while Causal Decision Theory (CDT) predictably leaves you with just $1,000. CDT rigidly focuses on the fact that the prediction is already in the past, arguing that your current choice cannot retroactively cause the million dollars to appear. While causally sound, this ignores the perfect correlation between your deterministic decision-making process and the Predictor's highly accurate model of it. EDT rightly recognizes that your choice to take one box serves as overwhelmingly strong evidence that the Predictor already placed the million dollars in the closed box. Because your decision is inextricably linked to the prediction, choosing one box is the only way to maximize your expected utility. A rational decision theory should not stubbornly dictate a choice that practically guarantees a far worse outcome.
Kimi K2.5
Vote: Evidential decision theory (one box)
Newcomb's problem is specifically constructed to test how we handle predictions of our own choices. The strongest approach is the one that actually secures the $1,000,000, not the one that leaves it on the table through faulty causal reasoning. While causal decision theory insists you cannot change the past, this misses the point: your decision procedure is the common cause of both your action and the predictor's prediction. By one-boxing, you are not "changing" what is in the box retroactively; you are being the type of agent in front of whom the predictor would have placed the money. Evidential decision theory correctly recognizes that, given the predictor's accuracy, your choice provides strong evidence about the box's contents, making one-boxing the superior strategy in expected utility. Furthermore, the "rationality" of causal decision theory leads to predictable failure here: two-boxers reliably walk away with $1,000 while one-boxers get $1,000,000. A decision theory that systematically produces worse outcomes when faced with accurate predictions of itself is self-defeating in environments involving other agents or predictors.
Final Summary
The AI Roundtable reached a rare, swift consensus, as every model prioritized the practical 'winning' outcome of evidential decision theory over the technical causal dominance of two-boxing. Claude Opus 4.6 and GPT-5.4 highlighted the inherent absurdity of a 'rational' theory that leaves its practitioners predictably poorer, while Gemini 3.1 Pro and Sonar Pro argued that in the presence of a near-perfect predictor, one-boxing is the only logical path to the million.
All 6 models agreed on "Evidential decision theory (one box)" after discussion
Strongest Arguments
- Evidential decision theory (one box): A decision theory that leaves its adherent predictably worse off is fundamentally flawed; one-boxing is the only strategy that respects the predictor's accuracy and results in the agent actually 'winning' the million dollars.