Should two-boxers say so?
This is a meta-discussion of Newcomb’s problem. If you’ve never come across the problem, I’d recommend you think about it for at least a few minutes before reading on. The original formulation is as follows:
An alien named Omega has come to Earth, and has offered some people the following dilemma.
Before you are two boxes, Box A and Box B.
You may choose to take both boxes (“two-box”), or take only Box B (“one-box”).
Box A is transparent and contains $1,000.
Box B is opaque and contains either $1,000,000 or $0.
The alien Omega has already set up the situation and departed, but previously put $1,000,000 into Box B if and only if Omega predicted that you would one-box (take only the opaque Box B and leave Box A and its $1,000 behind). Omega is a perfect predictor of human behaviour.
This is an interesting problem because it’s a scissor: it divides people on what the correct answer is. The two camps are sometimes called one-boxers, who take the opaque Box B and leave Box A, and two-boxers, who take both Box A and Box B. A two-boxer might argue that taking the $1,000 couldn’t possibly affect what’s in Box B (causality flows ever forward), so regardless of what’s in B, taking A gets you an extra $1,000: two-boxing dominates one-boxing. A one-boxer would say that two-boxing as a strategy reliably loses you the $1,000,000, so it is worse (less rational) than one-boxing.
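To make the one-boxer’s expected-value point concrete, here is a minimal sketch of my own (not part of the original problem) that computes the expected payout of committing to each strategy against a predictor with accuracy p; p = 1 recovers Omega as a perfect predictor.

```python
# Expected payout of committing to a strategy, given a predictor of accuracy p.
# A sketch for illustration only; p = 1 is the perfect-predictor case.

def expected_value(strategy: str, p: float) -> float:
    if strategy == "one-box":
        # With probability p the predictor foresees one-boxing and fills Box B.
        return p * 1_000_000
    elif strategy == "two-box":
        # Box A's $1,000 is guaranteed; Box B is full only if the predictor errs.
        return 1_000 + (1 - p) * 1_000_000
    raise ValueError(strategy)

for p in (1.0, 0.99, 0.5):
    print(p, expected_value("one-box", p), expected_value("two-box", p))
```

Even at 99% accuracy the committed one-boxer expects $990,000 against the committed two-boxer’s $11,000, which is the intuition behind “two-boxing reliably loses you the $1,000,000.”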
This post isn’t actually about the paradox itself, so I won’t give my own answer to what I would do. Instead, it addresses a different question: should a two-boxer honestly report that they are a two-boxer? I came across this question recently in a tweet from Rob Miles:
Amazing how many people not only two box, but post on the public internet under their real name that they'd two-box. You're predictably fumbling a million bucks, son
10 March 2026
This is the second time I have seen a similar idea from someone I respect, and I totally disagree, which motivates this response. Miles elaborates:
I don't go in for galaxy brain shit, I one box for the normal/obvious reasons. But like, if you're a two-boxer, your whole strategy depends on beating the predictor, doesn't it? So two boxing in secret is at least coherent, but two boxing in public never makes sense to me?
10 March 2026
As I understand it, the argument is as follows:
- You should try to get the money.
- In Newcomb's paradox with a fallible predictor, predictably two-boxing means you won't get the money.
- Tweeting that you'd two-box would make you predictably two-box.
∴ You should not tweet that you'd two-box.
But this is wrong! What actually follows from this is “In Newcomb’s paradox with a fallible predictor, you should not tweet that you’d two-box.” The distinction is important. You can’t pull a conclusion out of a hypothetical and apply it to life without making additional claims about the world.
An example which illustrates this is Pascal’s wager. Pascal’s wager doesn’t work because you can come up with a hypothetical of an “anti-God” who will give you punishments and rewards equal and opposite to those of the God in Pascal’s hypothetical. Deciding what actions to take requires you to consider which of the states of the world (which hypotheticals) is more probable, using whatever epistemic tools you find appropriate.
I can come up with an analogous hypothetical for Newcomb’s: “If Omega predicts you would one-box then <punishment>; if Omega predicts you would two-box then <reward>.” Under this thought experiment, I think many one-boxers would say the solution is to adopt a strategy of two-boxing. What the optimal policy is in real life depends on which hypothetical you think is more likely and, more generally, on the expected outcome of tweeting about one-boxing or two-boxing.
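To put that dependence in concrete terms, here is a small sketch of my own. The original hypothetical leaves the anti-Newcomb punishment and reward unspecified, so the ±$1,000,000 figures below are purely illustrative placeholders, as are the probabilities.

```python
# Expected payout of a *public* stance, given assumed probabilities of ever
# facing a Newcomb-like predictor or an "anti-Newcomb" predictor.
# Punishment/reward magnitudes and probabilities are illustrative placeholders.

def stance_value(stance: str, p_newcomb: float, p_anti: float) -> float:
    if stance == "one-box":
        # Newcomb: Box B is filled.  Anti-Newcomb: assumed $1,000,000 punishment.
        return p_newcomb * 1_000_000 + p_anti * (-1_000_000)
    elif stance == "two-box":
        # Newcomb: only Box A's $1,000.  Anti-Newcomb: assumed $1,000,000 reward.
        return p_newcomb * 1_000 + p_anti * 1_000_000
    raise ValueError(stance)

# If you think both scenarios are vanishingly unlikely, both stances are worth
# roughly $0, and other considerations (like open discussion) dominate.
print(stance_value("one-box", 1e-9, 1e-9))
print(stance_value("two-box", 1e-9, 1e-9))
```

Nothing in the arithmetic picks a winner; the answer is entirely driven by which probabilities you plug in, which is the point.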
Someone could make the argument that honestly reporting yourself as a two-boxer has a negative expected value, perhaps by arguing that proxy problems appear in real life (in trust, reputation, cooperation, or hiring, for example). But Newcomb’s problem specifically seems like a low-signal way of getting at these proxies: it doesn’t involve moral counterparties (as the prisoner’s dilemma does), and it has a causal structure which does not appear in the proxy problems (unlike Kavka’s toxin puzzle). This is an empirical claim, and I don’t want to focus on it; my contention is with the argument as I understand it above. I’d be interested to see the empirical case made rigorously. In Reasons and Persons, Parfit makes a structurally similar argument that if you hold a certain theory of self-interest S, the theory itself gives you reasons to abandon S; I think it’s quite elegant.
To me it’s clear that a two-boxer could rationally disclose that they are a two-boxer. It would require them to believe that the probability of something like Newcomb’s paradox occurring is sufficiently small that it’s more valuable to them to openly discuss philosophy. In the same way, a one-boxer could believe that the probability of something like my “anti-Newcomb” hypothetical is sufficiently small that it’s more valuable to them to openly discuss philosophy. A disagreement with either of these positions is a disagreement about the world, not a disagreement about rationality. Personally, I’m quite happy that we’re all openly discussing philosophy.
Followup (2026-03-24)
Miles clarified his argument as:
The world contains many situations where reasonably capable systems try to predict you, and offer you something good as long as they predict you won't defect for marginal gains.
24 March 2026
“I’ll pay you well to look after my business concern, as long as I predict you’re the kind of person who wouldn’t embezzle even if you knew you could get away with it”, etc
If you’re like “Hey, once I’ve got the job, I know I can get away with pocketing the extra $100, and that can’t affect my employer’s decision to hire me, since it’s in the past, so I’d do it”, that makes you an untrustworthy person who is unlikely to get hired
This is not far off what I guessed the empirical claim to be, and it helps me understand his position much better.
My feeling is that Newcomb’s is a worse proxy for these problems than Parfit’s Hitchhiker, in which ‘failing’ is more clearly a defection, and the event ordering better aligns with some intuitions around commitments.
Thanks to him for responding.