The Snake Eyes Paradox
In which I solve all extant versions of the Snake Eyes Paradox (and create a new one)
About a year ago, I was alerted to a paradox in probability called the Snake Eyes Paradox by a market on Manifold.
The problem goes something like this:
A Game Master has found an unlimited number of people willing to play the following game:
In Round 1, the GM selects one person and rolls two six-sided dice. If they both come up 1, the unlucky participant is thrown into a pit of vipers. Otherwise, they receive a prize that they think was worth risking their life over.
If the first participant is thrown into the viper pit in Round 1, the game ends there. Otherwise, it continues to Round 2, where 2 participants are chosen. Once again, 2d6 are rolled, and if they come up snake eyes, both participants are thrown into the pit of vipers and the game ends.
The game continues in this manner: On Round n, 2^(n-1) participants are chosen, and the dice are rolled. It’s always a completely new set of participants, no one plays on multiple rounds. If snake eyes is rolled, this round’s participants are all thrown into the viper pit, and the game ends. Otherwise, they receive a prize and the game goes on to the next round.
The question is, if you learn that you were selected as a participant, what is your probability of being thrown into the viper pit? In other words, what is P(“You are thrown into the viper pit”|“You are selected for one of the rounds of the game”)?
When I first heard this problem, I was not sure of the answer, because there seem to be good arguments for both 1/36 and ~1/2. The problem felt intuitively similar to the Sleeping Beauty problem, so I (and many others) thought the answer might depend on what assumptions you make about anthropic reasoning. However, the answer is actually unambiguous and has nothing to do with anthropics. In fact, the answer can be directly calculated using only classical probability theory, unlike the Sleeping Beauty problem.
The argument for 1/36
This one is pretty simple. The probability of rolling snake eyes on any given roll of the dice is 1/36. You will be thrown into the viper pit if and only if the dice roll snake eyes on the round for which you were chosen, and no factor related to your being chosen affects the roll of the dice, nor does the roll of the dice on that round affect your chance of being chosen.1 Therefore, you have a 1/36 chance of dying.
You can also look at it this way: There is a 1/36 chance that snake eyes will be rolled on Round 1. Therefore, the probability that you’ll be thrown into the pit, given that you’re selected for Round 1, is 1/36. Similarly, conditional on Round n occurring, there’s a 1/36 probability that snake eyes will be rolled on Round n. So, conditional on you being chosen for Round n, there’s a 1/36 chance that you’re thrown into the pit. Since there’s a 1/36 chance no matter what round you’re chosen for, the probability that you’re thrown into the pit given that you’re chosen at all is 1/36.
This looks like a solid mathematical proof. After all, what would it look like to say that the probability is not 1/36? On the one hand, maybe it would mean that the probability of being thrown into the pit given that you’re chosen for a particular round is 1/36, but the probability that you’re thrown in given that you’re chosen at all is something else. But this is clearly absurd. Being thrown in at all is just the disjoint union of being thrown in at any particular round, taken over all rounds. If the probability conditional on any given round is 1/36, the probability conditional on the disjoint union of all rounds has to be 1/36 as well. To see why, imagine that you are selected for the game, but you don’t know what round you’ll be playing on. You know that on any given round, the probability of losing is 1/36. Therefore, you know that if you were given an extra piece of information, in particular, what round you are playing on, you would update your probability to 1/36 no matter what that extra information is. But if that’s the case, then you should just update your probability to 1/36 now, since you don’t need that extra piece of information to tell you that’s what it should be. Holding any other credence would violate the law of total expectation.2
Okay, so we’ve established that, if the probability of dying in each round is 1/36, then the overall probability of dying, when you don’t know which round you’re playing on, must also be 1/36. So maybe the probability on each round isn’t actually 1/36? But this is equally absurd. Death on each round is determined by the dice. The dice are fair, and the setup of the problem doesn’t affect their probability of rolling snake eyes each time the GM rolls them. Insisting that the probability of death on each round is not 1/36 would mean insisting that the setup of the problem somehow magically influences the dice to make them more or less likely to roll snake eyes.
Okay, well if these arguments are so decisive, why isn’t the answer obviously 1/36? Well…
The argument for ~1/2
Since the number of people selected doubles with each round, the total number of people who play the snake eyes game is 2^N-1, where N is the total number of rounds. But the total number who die is 2^(N-1), so the probability of dying, given that you are selected for the game and that there are N rounds, is

2^(N-1) / (2^N - 1).
This is always larger than 1/2, so the overall probability of dying should be larger than 1/2. To find the exact probability, we multiply the probability of dying, conditional on there being N rounds, by the probability of there being N rounds, and sum over all N:

Σ_N (35/36)^(N-1) * (1/36) * 2^(N-1)/(2^N - 1).
Python tells me that this converges to about 0.5218872774242399, so you have a slightly greater than 52% chance of dying.
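For anyone who wants to check that number, here is a minimal Python sketch of the same calculation (the function name and the cutoff of 1,000 rounds are mine; the tail beyond that is negligible since the terms shrink geometrically):

```python
def naive_death_probability(max_rounds=1000):
    """Sum P(exactly N rounds) times the fraction of chosen players who die in an N-round game."""
    total = 0.0
    for n in range(1, max_rounds + 1):
        p_n_rounds = (35 / 36) ** (n - 1) * (1 / 36)  # snake eyes rolled for the first time on round n
        death_fraction = 2 ** (n - 1) / (2 ** n - 1)  # 2^(n-1) of the 2^n - 1 chosen players die
        total += p_n_rounds * death_fraction
    return total

print(naive_death_probability())  # about 0.5218872774242399
```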
Actually, there’s one small thing I’ve glossed over here: The probability of there being N rounds is not independent of your probability of being chosen. The more rounds there are, the likelier you are to be chosen. So instead of using (35/36)^(N-1), we should use the probability of there being N rounds conditional on you being chosen for one of them. The exact conditional probabilities depend on how likely you are to be chosen for any given round,3 but whatever they are, they don’t change this conclusion by much. As the probability that you’re chosen on the first round approaches 1, the expected proportion of people who die, conditional on you being chosen, approaches the 52% value we just calculated. And as the probability that you’re the mth participant chosen (not the same as being chosen on Round m) approaches zero for all m, the expected proportion of people who die, conditional on you being chosen, approaches 1/2 (since then more weight is given to longer games with more opportunities for you to be chosen, and in those games a proportion closer to 1/2 of people die).4
Again, the logic here seems pretty impeccable. How can you, knowing that you’re part of a group of people at least half of whom will die, say that your probability of dying is much less than that? But it comes to a different answer than the first calculation. At least one of them must be wrong.
Another approach
Okay, let’s try to figure this out more formally. The problem statement is a little ambiguous. How does the GM pick people from an unlimited group to play the game? He must pick according to some probability distribution. Normally we would assume this to be a uniform distribution,5 but there is no uniform distribution over a countably infinite number of people.6 So for now, we’ll just assume that there’s some distribution, but we won’t assume anything about it. The original argument for 1/36 didn’t depend on the distribution anyway, so maybe we’ll get lucky and it won’t matter.
Here’s the first idea: Assume we have countably infinitely many people, each labeled with a positive integer, such that the labeling is a bijection between the set of people and the set of positive integers. Now assume that Person 1 will play on Round 1, Persons 2 and 3 will play on Round 2 if it happens, Persons 4 through 7 will play on Round 3 if it happens, and so on. Then we can explicitly calculate what the probability of Person k being chosen for the game is, and what the probability that Person k dies is. The former is

P(k is chosen) = P(the game reaches Round m) = (35/36)^(m-1), where m = ⌊log₂(k)⌋+1 is the round that Person k is assigned to,
and the latter is

P(k dies) = P(the game reaches Round m and snake eyes is rolled on it) = (35/36)^(m-1) * (1/36).
So the conditional probability is

P(k dies | k is chosen) = [(35/36)^(m-1) * (1/36)] / (35/36)^(m-1) = 1/36,
no matter what k is. This is looking pretty good for Team 1/36.
But hold up, haven’t we changed the problem by assuming the GM is selecting the participants in a fixed order, rather than randomly? Maybe we have, but surely finding out that you were the kth person won’t make a difference to your P(death). After all, if that would cause you to update P(death) to 1/36 no matter what, then you should just update now.
But let’s assume that, instead of choosing participants in a fixed order, we have some probability distribution p(k) describing how likely the kth person is to be chosen to participate on Round 1. In Round 2, we select two more participants using the same probability distribution with the participant chosen in Round 1 removed.7 The probability Pₙ(k) that Person k is chosen as the nth participant, given that there are at least n participants, can be calculated from p.
However, the exact value doesn’t matter - it will cancel out in the end. What’s the probability that Person k dies, given that they are chosen? Well, the probability that Person k is chosen at all is the sum over all rounds m of the probability that Person k is chosen in Round m. And the probability that Person k is chosen in Round m is the sum over all “draft numbers”8 in Round m of the probability that Person k is chosen with that draft number. In other words:

P(k is chosen) = Σ_m P(k is chosen for Round m) = Σ_m Σ_n (35/36)^(m-1) * Pₙ(k),

where the inner sum runs over the draft numbers n belonging to Round m, i.e., 2^(m-1) ≤ n ≤ 2^m - 1.
But the probability that Person k dies is just the sum over all rounds of the probability that k is chosen for Round m, times the probability that the players in Round m die, conditional on Round m happening at all. This latter probability is still just 1/36, so

P(k dies) = Σ_m P(k is chosen for Round m and the players in Round m die)
= Σ_m P(k is chosen for Round m) * (1/36)
= (1/36) * Σ_m Σ_n (35/36)^(m-1) * Pₙ(k) = (1/36) * P(k is chosen).
The second line there is true because Person k being chosen for Round m doesn’t affect the probability that the players die in Round m (i.e., the dice are fair - they don’t care who was chosen for any round). Line 3 is true by applying the formula from the previous equation for P(k chosen for Round m).
So it looks like no matter what our distribution is, the probability that Person k dies is 1/36 times the probability that Person k is chosen to play. And since Person k can die only if they are chosen to play,9 this means that

P(k dies | k is chosen) = P(k dies) / P(k is chosen) = 1/36.
Since we assume in the problem that you are one of the potential participants, i.e., you are Person k for some k,

P(you die | you are chosen) = 1/36.
I trust this method more than the arguments I gave in the two previous sections because it doesn't involve any ambiguity over how the participants are chosen, or what it means for you to be chosen for the game: You are Person k for some k, and no matter what the probability distribution for choosing participants is, or what the actual value of k is, your probability of dying, given that you are chosen, is 1/36. This conclusion is also what you get if you actually try to simulate the problem: You have to assign some probability distribution to each potential participant being chosen on each step, and no matter what that distribution is, you would find after enough runs that every participant is thrown into the viper pit 1/36 of the times they are chosen.
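If you'd like to check this yourself, here is a rough Python sketch of such a simulation. For tractability it uses the fixed-order assignment from above (Person k plays on Round ⌊log₂(k)⌋+1), so each person's "chosen" and "dies" events are just "the game reaches their round" and "the game ends on their round"; the function and variable names are mine. It also records the within-game proportion of chosen players who die, which will matter later on.

```python
import random

def simulate(num_games=100_000, p_snake_eyes=1/36, seed=0):
    rng = random.Random(seed)
    reached = {}          # round m -> how many games reached round m (its players were chosen)
    ended = {}            # round m -> how many games ended (snake eyes) on round m
    death_fractions = []  # within-game fraction of chosen players who died
    for _ in range(num_games):
        m = 1
        while rng.random() >= p_snake_eyes:  # survived round m, play round m+1
            reached[m] = reached.get(m, 0) + 1
            m += 1
        reached[m] = reached.get(m, 0) + 1   # round m was played...
        ended[m] = ended.get(m, 0) + 1       # ...and snake eyes was rolled on it
        death_fractions.append(2 ** (m - 1) / (2 ** m - 1))
    for m in range(1, 6):
        print(f"Round {m}: its players died in {ended.get(m, 0) / reached[m]:.4f} "
              f"of the {reached[m]} games in which they were chosen")   # about 1/36 = 0.0278
    print("Average within-game fraction of chosen players who died:",
          sum(death_fractions) / num_games)                             # about 0.52

simulate()
```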
But then what is the problem with the argument for ~1/2?
The envelope paradox
My first instinct was that maybe the error is similar to the one that leads to the following paradox:
Suppose that Alice gives you two envelopes, labeled A and B, each with some money inside. She tells you that one envelope has twice as much money as the other inside it. Which envelope should you choose?
At first, this seems like a non-problem. You don’t know anything about which envelope contains more money, so you might as well choose envelope A. But then, suppose that envelope A contains $X. By the principle of indifference, you should assume that B is just as likely to be the envelope containing more money as A is. So, there’s a 50% chance that B is the better envelope, in which case it contains $2X, and a 50% chance that it’s the worse envelope and contains $X/2. The expected value of B then, is $1.25X, larger than the amount of money in envelope A. So, you should switch to B. But this can’t possibly be correct, since the same argument would also prove that you should switch from B to A.
The problem with the argument is that it assumes that X is some fixed amount, independent of whether envelope B has $2X or $X/2 in it. In reality, if B has $2X in it, then X is more likely to be small (since this means that A is the less valuable envelope), and if B has $X/2 in it, X is more likely to be large. More specifically, if B has $Y in it, then E(X|Y=2X) < E(X) < E(X|Y=X/2). Our actual calculation for the expected value of Y should be E(Y) = 0.5*2E(X|Y=2X) + 0.5*0.5E(X|Y=X/2), which will be equal to E(X), not 1.25E(X).
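To make that concrete, here is a tiny numeric check, assuming (purely for illustration) that the two envelopes contain $10 and $20 and that which one gets labeled A is decided by a fair coin flip:

```python
# The two equally likely labelings: (amount in A, amount in B).
cases = [(10, 20), (20, 10)]

E_X = sum(a for a, b in cases) / 2   # E(X) = 15
E_Y = sum(b for a, b in cases) / 2   # E(Y) = 15, not 1.25 * E(X)

# The conditional expectations are what the naive argument ignores:
E_X_given_B_larger = 10    # if B = 2X, A must be the smaller envelope
E_X_given_B_smaller = 20   # if B = X/2, A must be the larger envelope
naive = 1.25 * E_X                                                        # 18.75
correct = 0.5 * 2 * E_X_given_B_larger + 0.5 * 0.5 * E_X_given_B_smaller  # 15
print(E_X, E_Y, naive, correct)
```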
At first glance, it looks like the argument for ~1/2 may be committing a similar error. The intuitive argument that the probability should be greater than 1/2 could be phrased as something like, “If X people play, then more than X/2 people die, so the expected number of people who die is more than X/2.” But I don’t think this fully explains the error. The expected proportion of people who die, conditional on you playing, really is always larger than 1/2. Meanwhile, in the envelope paradox, the expected value of Y conditional on X is larger than X when X is small, but smaller than X when X is large. The error committed by the argument for ~1/2 is more subtle than this.
The real error
After the game is completed, if you pick a random person from the set of all people selected to play, there is in fact a 52% chance that that person was thrown into the pit of vipers. That is, after all, what we proved in the argument for 52%. This fact is indisputable, since 52% of people who play the game will die, in expectation.
Similarly, if you condition this on you being chosen, there really is a p∈[50%,52.188…%] chance that a random person picked from the set of all players died.
At the same time, there is only a 1/36 chance that you will die, conditional on you playing. In fact, there is a 1/36 chance that any participant will die, conditional on that participant playing. This fact is also indisputable, as we have shown that this is true for any potential participant.
The error is assuming that because the first two claims are true, the last must be false. This would be accurate for a game with only finitely many potential participants, but when you bring infinity into probability theory, stuff gets weird. For example, there’s the martingale betting strategy, which heavily resembles the Snake Eyes game. In this strategy, every time you lose a bet, you double the stake on the next bet. If you’re able to bet an arbitrarily high number of times (meaning you either have infinite money or can go infinitely far into debt), then you will gain whatever the stake of the initial bet was, with probability 1, even though every bet you made had zero expected value. Similarly, in the Snake Eyes game, the GM “raises the stakes” each round by doubling the number of players, ensuring that at least half of the players will die, even though, on each round, only 1/36 of the players die in expectation. It only works because he can select an arbitrary number of players: If there was a finite stock of players, then the GM would either have to let everyone live once he had run out (this would mean that, conditional on you being chosen, the game is much more likely to end this way, so the expected proportion of people who die really does go down), or kill everyone on the last round regardless of what the dice say (this would mean that you really do have a higher than 1/36 chance of dying).
Mathematically, this is a result of the fact that expected value does not always distribute over infinite sums. For example, let’s say you’re using the martingale betting strategy to bet on coinflips, with a starting wager of $2. We can define Pₙ to be your profit from the nth coin flip. Then Pₙ = $2ⁿ if you win the nth flip, -$2ⁿ if you lose, and $0 if you finish betting before you play the nth flip (i.e., if you win on one of the earlier flips). The total profit P is the sum of the Pₙ’s. It’s obvious that the expected value of each of the Pₙ’s is 0, and so, E(P) is also zero if you can only bet finitely many times before running out of money. But with the infinite martingale strategy, it turns out that E(P)=2.10 In other words,

E(P₁ + P₂ + P₃ + …) = 2, even though E(P₁) + E(P₂) + E(P₃) + … = 0.
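Here is a quick sketch of that strategy (fair coin, first stake $2, doubling after every loss), with names of my own choosing. Each individual flip has an average profit of about zero, but every completed run nets exactly $2:

```python
import random
from collections import defaultdict

def martingale_run(rng):
    """Bet on fair coin flips, doubling the stake after each loss; return per-flip profits."""
    profits = []
    n = 1
    while True:
        stake = 2 ** n                 # $2 on the first flip, $4 on the second, ...
        if rng.random() < 0.5:         # win: stop
            profits.append(stake)
            return profits
        profits.append(-stake)         # lose: double and try again
        n += 1

rng = random.Random(0)
flip_totals = defaultdict(float)
total = 0.0
runs = 100_000
for _ in range(runs):
    profits = martingale_run(rng)
    total += sum(profits)              # always exactly +2 for a run that terminates
    for i, p in enumerate(profits, start=1):
        flip_totals[i] += p

print("average total profit per run:", total / runs)                  # 2.0
for i in range(1, 5):
    print(f"average profit from flip {i}:", flip_totals[i] / runs)    # each close to 0
```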
A similar case occurs in the Snake Eyes game. Let’s define the “prediction error” χ for a conditional probability P(A|B) as

χ = 1 - P(A|B) if A and B both occur, χ = -P(A|B) if B occurs but A does not, and χ = 0 if B does not occur.
χ represents how much the actual event that occurred differs from our probabilistic prediction, and in which direction (negative if the event didn’t occur and positive if it did). χ should be pretty familiar to Manifold users: χ÷P(A|B) is equal to the profit you get if you bet 1 mana on YES on a prediction market asking, “Conditional on B, will A occur?”, which is currently at a probability P(A|B).11
No matter what A and B are, the expected value of χ has to be zero if we’ve assigned our probabilities correctly:

E(χ) = P(A∩B) * (1 - P(A|B)) - P(B∩¬A) * P(A|B) + P(¬B) * 0 = P(B) * [P(A|B) * (1 - P(A|B)) - (1 - P(A|B)) * P(A|B)] = 0.
In the Snake Eyes game, we can let χₖ represent the prediction error for P(k dies|k chosen) = 1/36. In other words,

χₖ = 35/36 if Person k is chosen and dies, χₖ = -1/36 if Person k is chosen and survives, and χₖ = 0 if Person k is never chosen.
Then the expectation value of each χₖ is 0, but the expectation value of the sum of the χₖ’s is not.12 Even the expectation value of the sum of χₖ divided by the number of players is nonzero. This represents the fact that for any given Player k, 1/36 is the proper probability to give of them dying, conditional on being chosen (since the expectation value of χₖ would not be zero if we defined it using -p and 1-p for some p other than 1/36), but that it’s not the proper probability to give for, “A randomly chosen player (after the game concludes) is one of the ones who died.”
You might want to raise a frequentist objection: If we expect the frequency of chosen people who die to be >1/2, then how can we say that the probability of death, conditional on being chosen, is 1/36? But remember the simulation I showed earlier: After many runs, the proportion of times each individual participant died indeed converged to 1/3.13 At the same time, the total proportion of participants who died converged to 1/2.14 But how can this be? How can every participant die only 1/3 of the time when the total proportion of participants who die is 1/2? To understand how, it will help to look at these graphs:
In this run of simulations, every participant shown with k>300 only ended up being chosen for the game once, and even for k values much smaller than that, there were not enough trials for the proportions to converge. For k>300, the proportion of times k died out of all the times k was chosen had to be either 0 or 1, and it ended up being 1 for most of those k values. If we average all of the estimated conditional probabilities from the graph on the right, weighted by the number of times each participant was chosen, we get a number >1/2 (about 51.8%). And even if we ran way more trials, this would still be the case, as long as we extended the graph to include any larger k values that were eventually chosen (it should converge to 50%).

Does this violate the law of large numbers if P(death|chosen) is really 1/3? No. The law of large numbers says that, for each individual value of k, the number of times Player k dies divided by the number of times Player k is chosen will converge to P(k dies|k chosen) = 1/3, with probability 1. This means that the random function F(k),15 defined as the number of times k died divided by the number of times k was chosen, converges pointwise towards the constant function f(k)=1/3, but it need not converge uniformly.

This is the key insight, explaining why it is that, even if we play the game many times, the total proportion of players who die is still >1/2, even while the probability of any individual person dying conditional on playing is <1/2. It’s not because the proportion of times any individual dies magically floats above the true probability for an indefinite time - that would violate the law of large numbers. Instead, it’s that there is always a large group of players that have only been chosen a few times and who have died most of the times they were chosen. In particular, the proportion of times they have died is much larger than their probability of death because they have been chosen too few times for the proportion of deaths to converge to P(death|chosen), and the mechanics of the game force the errors for different k values to be correlated, such that there are always more k values whose observed death proportion is too high than too low. Since the probabilities of specific players being chosen, and specific players dying, are not independent,16 it’s fine for there to be this large group of players whose death proportions have correlated errors making them larger than average.
If I kept running more and more simulations, we would see the points currently on the right graph converge to 1/3. But we would also end up with new data points for larger k values as those values are selected for the first time, and those would mostly start out at 1. This would continue forever - smaller values of k have the death proportion converge to 1/3, while the big chunk of k values for which the death proportion is ~1 gets pushed to the right indefinitely. Since there are infinitely many available k values, there is no end to this process, and we will never see the total proportion of people who die converge on the proportion of times each individual person dies. If instead we cut off the simulation at a finite but very large k value (i.e., we have only finitely many participants and end the simulation with no one dying if we get through all of them), then we actually would see the total proportion of people who die converge to 1/3 (since more weight would be put on the simulations that had the largest number of players, i.e., the ones where everyone survived).
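A minimal sketch of this kind of repeated-game simulation (using the fixed-order assignment again and, as in the graphs above, a 1/3 chance of snake eyes per round so the games stay short; plotting omitted, names mine):

```python
import random

def repeated_games(num_games, p_end=1/3, seed=1):
    rng = random.Random(seed)
    reached, ended = {}, {}
    chosen_total, dead_total = 0, 0
    for _ in range(num_games):
        m = 1
        while rng.random() >= p_end:
            reached[m] = reached.get(m, 0) + 1
            m += 1
        reached[m] = reached.get(m, 0) + 1
        ended[m] = ended.get(m, 0) + 1
        chosen_total += 2 ** m - 1        # everyone chosen in an m-round game
        dead_total += 2 ** (m - 1)        # everyone in the final round dies
    last = max(reached)                   # the largest round any game reached
    print(f"{num_games} games:",
          f"round-1 death frequency {ended.get(1, 0) / reached[1]:.3f},",      # converges to 1/3
          f"round-{last} death frequency {ended[last] / reached[last]:.3f},",  # stuck at 1
          f"pooled death fraction {dead_total / chosen_total:.3f}")            # always above 1/2

for n in (100, 10_000, 1_000_000):
    repeated_games(n)
```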
In summary:
In a single game, it’s possible to expect each player to only have a 1/36 chance of dying if they are chosen, while also expecting the total proportion of players who die to be ~52%, because there are infinitely many potential players, and the game does not meet the conditions for linearity of expected values over infinite sums (in particular, the sum of E(|χₖ|) is not convergent).
If we run the game on repeat (resurrecting each dead player between games), the proportion of times each player dies will converge to 1/36 (or 1/3 in my simulations) for every player, even though more than half of the people who play each game will die. This is possible because the convergence is faster for players who are chosen more often, and after finitely many games, there will always be players who have never been chosen yet or who have only been chosen a few times and therefore haven’t converged on the right probability.
From a frequentist standpoint, this last point gives a complete explanation of why the true probability really is 1/36, and why it diverges from the conditional expectation of the proportion of people who will die. And of course, a Bayesian and a frequentist should agree in cases where it’s possible to calculate the probability using frequentist methods alone.
Decision theory
(I’m not as certain about this part as the others. There are some weird issues involved here because of the infinite expected number of deaths and the zero-probability possibility that no one dies.)
You might still have some reservations about this solution. Even if 1/36 is the correct probability, should we make decisions as if 1/36 is correct, or 52% is correct, or 1/2 is correct, or some other value? And if we should make decisions as if a different value is correct, does it really make sense to say that 1/36 is the correct probability?
Let’s get more concrete. Here’s an example of the type of situation you might be worried about: Suppose that, in addition to all the potential players, we have a separate group, the Innocent Victims, which is also infinite. After each player is chosen, an innocent victim is also chosen. The player has to press a button, either A or B. If they press A, then the innocent victim will be thrown into the pit with them if the GM rolls snake eyes, and otherwise the innocent victim will be spared. If they press B, the innocent victim is thrown into the pit if the GM doesn’t roll snake eyes, and is spared on a snake eyes roll. If the player refuses to choose, the innocent victim is thrown into the pit no matter what.
What should you do if you’re chosen as one of the players in this scenario? On the one hand, the GM only has a 1/36 chance of rolling snake eyes, so it seems that you should press Button A, since that maximizes the innocent victim’s probability of survival. But if everyone presses button A, then in the end 2^(N-1) innocent victims will die (where N is the total number of rounds). On the other hand, if everyone presses button B, 2^(N-1) - 1 innocent victims will die (since that is the number of players who survive the game). So one extra person dies if everyone presses A, compared to what would happen if everyone pressed B. Therefore, shouldn’t you act as if A is more likely to lead to the innocent victim’s death than B, i.e., act as if your chances of death are greater than 1/2? And if you should act as if the probability is greater than 1/2, isn’t there some sense in which it’s rational to truly consider the probability to be >1/2?
Let’s analyze this in more detail. If each of the players is a causal decision theorist, they will indeed choose to press button A, leading to one extra person dying. They will do this even if they know that all of the other players are causal decision theorists as well. When they hear the argument, “But if you all press button A, an extra person will die! All the other players are following the same decision procedure as you, so it would be better if you all pressed B!”, they will respond, “Sure, it would be better if we all pressed B. If I could press a button that would force every player, including myself, to press B, I would. But I can’t do that. I have no causal influence on what other players do. I can only affect what happens to this one person, and I’m going to choose whatever minimizes their chance of dying.” So all of the causal decision theorists press button A, and they kill one extra person. They agree that this is a great tragedy, but it’s not their fault that their perfectly rational behavior led to a predictably worse outcome. This must just be one of those scenarios designed to reward irrationality.
So, on causal decision theory, it seems like we actually will get a worse outcome if everyone uses the correct probability of 1/36 instead of the incorrect p>1/2. But this is no strike against 1/36 being the right probability. After all, it’s well-known that causal decision theory doesn’t always produce the best outcome. For example, in Newcomb’s Problem, causal decision theorists walk away $999,000 poorer than agents who follow other decision theories. This is the kind of thing that happens when you define rationality as conforming to a particular decision rule, rather than doing what has the predictably best results.
On the other hand, suppose that the players are all evidential decision theorists, and furthermore, they all know this, and know that they all know it, etc. Then they will be swayed by the argument and decide to press button B.17 The EDT players say, “I can’t causally affect what other players do, but if I press Button B, it is very likely that they will do the same, and if I press Button A, it is likely they will do the same. Since it is better for everyone to press Button B than Button A, that’s what I will do.” The evidential decision theorists don’t care that the probability of that one individual person surviving is higher if they press button A because they realize that pressing Button B also affects everyone else’s chances of survival, and in particular, it guarantees that one fewer person will die. Agents following some version of logical decision theory will do the same thing. They decide that the result is best if their decision-making algorithm outputs “Button B” rather than “Button A”, and do that.
But maybe you still object. The EDT and LDT agents didn’t even seem to care about the probability - maybe they really are just acting as if it’s >1/2. Maybe it’s really ambiguous, and we should say 1/36 is the correct probability for CDT, but some number >1/2 is the correct probability for EDT and LDT. Or maybe we should just say that p>1/2 is correct in all cases, since this would make the CDT agents get better results. But we can’t just change the rules of probability to get the results we want. If we do, then our agents will make the wrong decisions in other circumstances. For example:
Suppose that the exact same scenario occurs, except that this time, you and only you have the chance to press the button. Your decision is not correlated with what other players will do because there are no other players who get the same choice. What should you do? Now you should press Button A - it will probably save the one and only innocent victim who is at stake. If you play the game many times18 you’ll save the most people by doing that.
Another example: Imagine a variant where, instead of the losing players being thrown into a pit of vipers and dying, a roll of snake eyes just means that the game ends, with no players dying and no new players being selected. In this variant, each player is given the opportunity to press Button A or Button B before the dice are rolled. Button A will give them $1000 if snake eyes is rolled, Button B will give them $1000 if Snake Eyes is not rolled. Assume that all of the participants are perfectly selfish, so they only care about their own earnings, and since they’ll survive the game no matter what, they value $1000 equally regardless of what’s rolled on the dice. If you’re one of the players, what button should you choose? If you incorrectly think the probability of snake eyes is >1/2 because you adjusted your credence after hearing the first dilemma, you’ll choose Button A and most likely lose out on $1000. If you think the probability is 1/36, you’ll choose Button B and probably get $1000. In the long run, if the game is played many times, any player who uses this strategy (Choose Button B) will maximize their earnings. And furthermore, if the dollar amounts are changed, we know exactly when to switch to Button A: If the reward it gives on snake eyes is >35 times better than the reward Button B gives on not-snake-eyes.
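The arithmetic behind that last claim, as a quick sketch (function name mine):

```python
p_snake_eyes = 1 / 36

def better_button(reward_a, reward_b):
    """Compare expected payouts for a single player whose choice is uncorrelated with anyone else's."""
    ev_a = p_snake_eyes * reward_a         # Button A pays out only on snake eyes
    ev_b = (1 - p_snake_eyes) * reward_b   # Button B pays out only on non-snake-eyes
    return "A" if ev_a > ev_b else "B"

print(better_button(1000, 1000))    # B: about 27.8 vs 972.2 in expectation
print(better_button(34_000, 1000))  # still B: about 944 vs 972
print(better_button(36_000, 1000))  # A: 1000 vs about 972 - A's reward is now more than 35x B's
```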
There’s a Manifold poll on a game similar to that last one. All decision theories recommend playing the game.
Variants Y and N
In the arguments about the original problem, some people pointed out ambiguities. One was that the market originally stated that, instead of considering what happens with an infinite pool of people, you could also consider the game with only a finite pool, where the game ends with no one dying if the pool runs out, and take the limit as the size of the pool goes to infinity. This one is called “Variant Y” because it is how some YES bettors (i.e., bettors who think the answer is 1/36) interpreted the problem, and because basically everyone agrees that the probability in this case is 1/36.
And indeed that’s the correct probability. There’s not even any coherent argument for something else. The probability P(death|chosen) is equal to 1/36 no matter what N (the size of the pool) is, so the limit as N goes to infinity is also 1/36. And unlike the infinitary case, there are no weird issues that come up because we’re just dealing with a bounded-length game and a bounded number of people for each N, so the expected proportion of people who die, conditional on you being chosen, also goes to 1/36.
The more interesting variant is “Variant N”, which is how some NO bettors interpreted the problem. In this variant, instead of considering the probability conditional on you being chosen and nothing else, you consider the probability conditional on you being chosen AND the game lasting only a finite amount of time.
In the original problem, the probability that the game lasts only a finite amount of time is 1. And any probability conditioned on an event with probability 1 is still the same probability. So the answer to this is still 1/36.
An objection: The problem statement says that the players are chosen uniformly at random. I’ve been ignoring that statement because it doesn’t make sense in classical probability theory, but it’s relevant here. If the players are chosen uniformly at random, then you are in some sense infinitely more likely to be chosen in a game that lasts forever than one that doesn’t. This is enough to outweigh the fact that the game lasting forever normally has 0 probability, so that P(Game lasts forever|You are chosen) is not 0. And so the probability actually will change if you condition on the game not lasting forever.
One response to that objection is that the problem statement also says:
What if the question turns out to be self-contradictory or the answer isn't a real number? Then we argue about how to fix it in the comments and do our best to agree on the best possible real number answer.
Asking about a game that is impossible to play is self-contradictory. Arguably, the best way to fix this is to remove the assumption that the GM selects players uniformly at random. If we’d like, we can replace it with the assumption that the GM chooses according to a distribution that’s as close to uniform as possible.19 We can take the limit as the probability mass gets more spread out to try to get the spirit of a uniform distribution. But this limit will be 1/36 because, as we already calculated, the conditional probability is 1/36 no matter what distribution we use.
There is, however, an alternative response…
Modified probability theory
Is there some other way to “fix” the problem that preserves the uniformness of the distribution in a more faithful way? Not in classical probability theory, but what if we modify the notion of probability itself? The problem here is that probability is real-valued and countably additive, so a uniform distribution over countably many players will either result in a total probability of any player being chosen of 0 (if the probability of each player being chosen is 0) or of infinity (if the probability of each player being chosen is positive), but normalization requires that it be 1.20 We could get around this either by inventing a new notion of probability that isn’t countably additive - then we can consistently assign a probability of zero to each person being chosen - or one that allows probability to take on infinitesimal values. But we need some notion of pseudoprobability that is formal enough to actually get an answer to the problem.
It turns out that this has actually been done in a paper that Daniel Reeves eventually linked in the description of the original market. The paper discusses the Shooting Room Problem, which is exactly the same as the Snake Eyes Paradox, except that the proportion of people who die is exactly 90%, rather than >1/2.21 And it turns out that the objection I stated in the Variant N section is valid, according to this paper. It states:
In this case [when using the nonstandard infinitesimal probability distribution], the frequency argument for why Tracy should assign a 0.9 probability for George’s demise upon learning that the game has ended stands up. In this setting, Tracy learning that the game has ended does make an impact on her subjective probability function, even though her prior probability for this eventuality was one.
In other words, conditional on snake eyes eventually being rolled, you really do update your probability towards about half of people dying. This is because learning that you were chosen for the game updates your probability that the game never ends to be nonzero.22 If we let C be the event “you are chosen” and I be “the game lasts infinitely many rounds,” then

P(I|C) = P(C|I) * P(I) / P(C).
For a classical probability distribution, P(C) is positive, but P(I) is zero, so the whole expression on the right is zero. But for the uniform pseudodistribution, P(C) is not a positive real number. If we think of it as being zero, then the Bayesian update leaves P(I|C) indeterminate, so let’s think of it as some infinitesimal value. We don’t actually have to say that the probability is infinitesimal, rather than zero - we can just treat the infinitesimals as a mathematical tool and set them to 0 at the end of the reasoning23 - so you don’t need to be too worried about this move if you dislike infinitesimals.
Likewise, we should probably consider P(I) to be some infinitesimal as well, not necessarily the same infinitesimal as P(C), but it is possible that I occurs, so we can give it a probability like (35/36)^ω and not treat this as exactly zero. Under this interpretation, it makes sense that the infinitesimals could somehow cancel out and lead to a non-infinitesimal, nonzero value for P(I|C), although this doesn’t give us enough information to calculate it. But it at least explains why Bayes’s Rule doesn’t force it to be zero under a nonclassical probability theory.
Now to actually calculate the values. Intuitively, we can proceed as follows:
P(death|I,C) = 0, since no one dies in an infinite game. P(death|C) = 1/36. This is the case because the argument for 1/36 is still valid even in the non-classical case - all it depends on is that the dice are fair, so they in fact have a probability of 1/36 of landing Snake Eyes, and that the roll of the dice on Round N is independent of you being chosen for Round N.

Now, let N be the event that the game ends on the Nth round. Since the probability distribution for being chosen each time is uniform, we must say that, conditional on N and C, we have a 1/M probability of being the first player, a 1/M probability of being the second player, …, a 1/M probability of being the Mth player (where M=2^N-1). This is the big difference from classical probability theory. There are a few ways to think of this. One is to note that, since the distribution is uniform, you can’t assume that there is anything special about you compared to the other players. Everything is exactly symmetric, so you should be exactly as likely as any other player to be in the ith position.

If that still sounds sketchy, another way to think of it is to treat your probability of being chosen as the first player as an infinitesimal ε (since this is just the uniform distribution that the GM is using to select one of infinitely many players). If the GM needs to select a second player, the probability that you are already chosen is ε, so the probability that you are chosen second is

P(you are chosen second) = P(you were not chosen first) * P(you are chosen second | you were not chosen first) = (1 - ε) * P(you are chosen second | you were not chosen first).
But the latter probability is given by the uniform distribution rescaled to account for the one player that was already chosen, i.e., divided by 1 minus the probability of that player being chosen. In other words,

P(you are chosen second) = (1 - ε) * [ε / (1 - ε)] = ε.
Then when the GM selects a third player, the probability that you were already chosen is 2ε, but conditional on not being chosen, you need to rescale by 1-2ε, and so on. As you can see, the probability of being chosen for the ith position, conditional on an ith player being needed, is always ε. And that probability is also independent of how many future players are needed after that, so, assuming that there are exactly M players, that you now know that you are in one of those M positions, and that you have no other information about your position, you must update your probability to 1/M of being in each position.
Based on this fact, we find that P(death|N,C) should be equal to the proportion of people who die in a game of N rounds, since the event death∩N∩C is equivalent to the disjoint union of all events in which you are one of the players who dies on Round N, which each have conditional probability 1/M. In other words, the main crux of the argument for ~1/2, which was wrong in the classical case, is actually correct here. This means that P(death|¬I,C) is at least 1/2, since ¬I∩C is the disjoint union of all of the N∩C, which each have a conditional death probability >1/2. Thus, it can no longer be the case that P(I|C)=0, since then it would be impossible that P(death|C)=1/36, as required.
But what is the exact value of P(death|¬I,C)? We can’t actually assign classical probabilities to any of the P(N|C), but we do know what they should be relative to each other. This is because, originally, P(N) = (35/36)^(N-1)*1/36, so the relative unconditional probabilities are

P(1) : P(2) : P(3) : … = 1 : (35/36) : (35/36)^2 : …
Conditioning on C would require multiplying each P(N) by 2^N-1 to represent the fact that you are 2^N-1 times more likely to be chosen in an N-round game than in a game with just 1 player, and then dividing by the sum of the P(N)*(2^N-1) values. So

P(N|C) ∝ P(N) * (2^N - 1) = (35/36)^(N-1) * (1/36) * (2^N - 1).
However, the sum of these P(N)*(2^N-1) values does not converge, so there is no way to normalize this as a classical distribution. But we can still convince ourselves that this basically makes sense using Bayes’s rule and the infinitesimal treatment:

P(N|C) = P(C|N) * P(N) / P(C) = (2^N - 1) * ε * P(N) / P(C).
When combining these facts, we find that P(N|C) increases as N increases, since 2^N-1 grows faster than P(N) shrinks. In fact, it does so exponentially, as is obvious from the previous equation. This is why it’s impossible to normalize the distribution and assign classical probabilities to P(N|C). If we want to find P(death|¬I,C), we have to use the formula

P(death|¬I,C) = Σ_N P(N|¬I,C) * P(death|N,C).
It might seem that it’s impossible to calculate this sum, since it requires using these non-normalizable probabilities. But since we are summing over every possibility in the probability space within ¬I∩C, the sum is a weighted average of P(death|N,C), with the non-normalizable probabilities as weights. Since the weights are larger for larger N, we should put more weight on values of P(death|N,C) where N is larger. In fact, we should put zero weight on the P(death|N,C) for all N less than any finite value L, since we know that we have a factor of infinity more weight on all the larger N values than we do on the smaller ones. But if our weighted average has all the weight contained in the tail of P(death|N,C), then we are forced to set it equal to the limit of P(death|N,C) as N approaches infinity, which is just 1/2. Any larger value A>1/2 would mean putting a positive, non-infinitesimal weight on the finite set of N values whose P(death|N,C) is greater than or equal to A, and any value smaller than 1/2 is automatically out since none of the values that we are taking a weighted average of are less than 1/2. So P(death|¬I,C) = 1/2.24
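We can see this tail-domination numerically: truncate the sum at a maximum round, normalize the weights P(N)*(2^N-1) over that finite range, and watch the weighted average of P(death|N,C) fall toward 1/2 as the cutoff grows. (This is just an illustration of the limit argument, not a substitute for it; the code and names are a sketch of my own.)

```python
def truncated_average(n_max):
    """Weighted average of P(death | N rounds, chosen) with weights P(N) * (2^N - 1), truncated at n_max."""
    weights, values = [], []
    for n in range(1, n_max + 1):
        weights.append((35 / 36) ** (n - 1) * (1 / 36) * (2 ** n - 1))
        values.append(2 ** (n - 1) / (2 ** n - 1))
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

for n_max in (5, 10, 20, 40, 80):
    print(n_max, truncated_average(n_max))   # approaches 0.5 from above
```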
This gives us the answer to our Variant N in the non-classical probability theory version: The answer is 1/2. But the explanation is not that P(death|chosen) deviates from 1/36 in this version, but instead that P(I|C) deviates from 0. In fact, we can now calculate the exact value:

1/36 = P(death|C) = P(death|I,C) * P(I|C) + P(death|¬I,C) * (1 - P(I|C)) = 0 + (1/2) * (1 - P(I|C)), so P(I|C) = 17/18.
It turns out it is possible to formalize what I’ve done here, according to the paper I linked previously. They find the exact same solution that I did (except with the numbers changed to those for the Shooting Room Paradox): The regular Snake Eyes Problem still has the answer 1/36, but the conditional probability of an infinite game is 17/18, so Variant N has a solution of 1/2.
If you want more intuition for this, an analogy given in the paper is as follows: Consider the finite variant where the game will end after some finite number of Rounds N no matter what. If Snake Eyes isn’t rolled on the Nth round, or any earlier round, everyone survives. For this game, we need a pool of participants large enough that there are enough people for an N-round game, but the pool can still be finite, so we can assume a uniform distribution over all participants being chosen. If N is large enough, the probability that the game ends with no one dying is extremely low. But, assuming the uniform distribution, it’s also extremely unlikely that you will be chosen for the game at all. If you are chosen, then you have to scale up the probabilities of each possible game length by a factor of the number of players in a game of that length (this is a result of the uniform distribution). This scales up a game of length N the most, so much that it is actually very probable (and so is the game ending with no one dying, since that’s just P(N rounds)*35/36). See, although it was unlikely that the game would end with no one dying, it was even more unlikely that you would be chosen in the first place in a game that ends with people dying. And so, the fact that you are chosen makes a “safe” game more likely. In classical probability theory, this doesn’t work for the infinite case because P(safe game)=0, but whatever distribution you choose, P(chosen)>0, unless it’s impossible to choose you in the first place. However, with non-standard probability theory, we can transfer this argument over to the infinite case: Sure, the probability of an infinite game is extremely small, smaller than any positive real number in fact. But so is your probability of being chosen in a finite game. Your being chosen makes long games more likely, in proportion to the number of players, so it really does make P(infinite game) infinitely more likely, i.e., it makes it nonzero and non-infinitesimal. And it turns out that the infinite game really is the limit of these finite games. P(death|chosen) is always 1/36, so the limit works there. The probability of death given that you’re chosen and the game doesn’t go to N rounds approaches 1/2, as does the probability of death given an “unsafe” game (i.e., a game where snake eyes is ever rolled). The probability that the game ends with no one dying given that you’re chosen approaches 17/18.
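Here is a sketch of that finite analogy, computed exactly with Python's fractions module. The setup is my own rendering of the analogy: the game is capped at a maximum number of rounds, the pool contains exactly 2^(max rounds) - 1 people, and the GM chooses uniformly from the pool. P(death|chosen) comes out to exactly 1/36 for every cap, while P(safe game|chosen) approaches 17/18 and P(death|chosen, unsafe game) approaches 1/2:

```python
from fractions import Fraction

def finite_uniform_pool(max_rounds):
    pool = 2 ** max_rounds - 1                   # exactly enough people for a full-length game
    p_chosen = Fraction(0)
    p_chosen_and_dead = Fraction(0)
    for n in range(1, max_rounds + 1):
        p_deadly_n = Fraction(35, 36) ** (n - 1) * Fraction(1, 36)  # snake eyes first on round n
        p_chosen_given_n = Fraction(2 ** n - 1, pool)               # uniform pool: you're among the 2^n - 1
        p_chosen += p_deadly_n * p_chosen_given_n
        p_chosen_and_dead += p_deadly_n * p_chosen_given_n * Fraction(2 ** (n - 1), 2 ** n - 1)
    p_chosen_and_safe = Fraction(35, 36) ** max_rounds   # in a safe game the whole pool is chosen
    p_chosen += p_chosen_and_safe
    return (float(p_chosen_and_dead / p_chosen),                        # P(death | chosen) = 1/36 exactly
            float(p_chosen_and_safe / p_chosen),                        # P(safe | chosen) -> 17/18
            float(p_chosen_and_dead / (p_chosen - p_chosen_and_safe)))  # P(death | chosen, unsafe) -> 1/2

for cap in (5, 10, 20, 40):
    print(cap, finite_uniform_pool(cap))
```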
Another thing to note about this modified probability theory is that it actually does change the decision theoretic implications. Now in the “Innocent Victim” scenario I discussed above, pressing Button A is optimal according to EDT. After all, there’s a 17/18 chance that the game is infinite, in which case no one dies if all the players press Button A, and infinitely many people die if everyone presses Button B. This far outweighs the 1/18 chance that the game is finite so that everyone pressing Button A will kill one extra person. I think pressing Button A might still be suboptimal according to some versions of UDT, though it depends on how exactly it handles zero probability events with infinite utility.25 Of course, if you switch to Variant N, all decision theories (even causal decision theory) recommend pressing Button B in the non-classical case, although they don’t change their recommendations in the classical case.
Just one small problem…
There is one thing I still don’t understand about this. What’s the probability that the game goes on forever, given that you’re chosen on the Nth round? It should be zero, right? Once you know that you were chosen on Round N, there’s no reason to believe that the game is any more likely to go to rounds greater than N than it would be if you didn’t know you were chosen and just knew that the game had at least N rounds. Basically, if A_N = “The game had at least N rounds, and you weren’t chosen for any round later than N,” then A_N screens off any evidence that C gave in favor of the game going to more than N rounds, or more than L rounds for any L>N. This is because, given A_N, C is equally likely for any possible number of rounds ≥N.
But if you’re chosen for the game at all, then you have to be chosen on some round N. Letting C_N be the event that you are chosen for the Nth round, then

C = C_1 ∪ C_2 ∪ C_3 ∪ …, where the C_N are disjoint, since you can play in at most one round.
But then we should have that

P(I|C) = Σ_N P(C_N|C) * P(I|C_N).
But if P(I|C_N)=0 for every N, then the sum on the right equals zero! And yet we just concluded that P(I|C)=17/18, not zero. So we either have to conclude that P(I|C_N)≠0, which makes no sense; or reject all the reasoning we just did to conclude that P(I|C)=17/18, which will mean we have to say something nonsensical and paradoxical about Snake Eyes Variant N (and we have to reject the more formal reasoning of the Bartha and Hitchcock paper); or we have to reject the conditional probability formula I just used. Maybe the latter makes the most sense. After all, we’re already rejecting classical probability theory - why should we assume that formulas that are valid there are valid here? But rejecting that particular formula makes me really uncomfortable, because it’s hard to see how a probability theory that doesn’t follow it can be considered a coherent probability theory at all. What we’re saying, if we reject this, is that the probability of I given C is 17/18, even though the probability of I given any of the possible events that make up C is 0. So, upon learning that you were chosen for the game, you would bet at 17 to 1 odds that the game was infinite, even though you know that no matter what round you were chosen for, there is an exactly 0% chance that the game was infinite given that you were chosen for that round. This sounds like completely insane behavior.
We can make this even worse: You know that if you learn which round you were chosen on, you’ll be willing to bet $1,000,000 that the game was finite, just to gain 1 cent. After all, were you to learn the round, you would have a 100% credence that the game is finite. But you currently don’t know which round you were chosen for, so you have a 17/18 credence that the game was infinite, and therefore, that this bet would cause you to lose $1,000,000. So you’ll be willing to pay $1,000 just to prevent yourself from learning which round you’re playing on, since learning this information would cause you to make a bet that, from your current perspective, would lose you almost a million dollars in expectation.
If anyone has any ideas for resolving this conundrum, please discuss them in the comments to the market below. It’s about the same question, though framed instead in terms of, “What’s the probability of dying?” 1/36 corresponds to the argument I gave that P(I|C_N)=0, and ≥1/2 to the argument that P(I|C_N) can’t be zero.
Substack won't let me embed this market for some reason but here it is: "What is the probability of dying in Snake Eyes Variant NNN?"
Ball and urn variant
Some people thought that the Snake Eyes Paradox involved anthropics in some way. As I mentioned before, this is incorrect: The correct answer can be calculated without invoking any anthropic principles, so it doesn’t matter whether you believe in SSA, SIA, or something else. However, Daniel Reeves created a variant of it in an attempt to factor out any possible anthropics.
In this variant, we have an urn filled with M balls (M can be finite, or Aleph null). We take a ball out and roll the dice. If we roll snake eyes, we deflate it and end the game. Otherwise, we set it aside (outside the urn) and pick two more balls. We repeat the process, picking out double the number of balls that we did on the previous round every time we don’t roll snake eyes, and deflating all the balls that we just picked (but not the ones we picked on previous rounds) and ending the game if we roll snake eyes. If M is finite, then we also end the game without deflating any balls if we run out. If M is infinite, this is only possible if the game lasts infinitely many rounds.
After the game is done,26 we pick a random ball out of all of those that we took out of the urn (uniformly). What is the probability that the ball we pick is deflated? In the finite case, we take the limit of this as M goes to infinity.
Both of these versions are non-paradoxical and have the same answer: It’s that ~52% number I calculated earlier. In the infinite case, the probability of the game going to N rounds is exactly (35/36)^(N-1)*1/36, and we’re not conditioning on anything. The probability of choosing a deflated ball after the game is completed, conditional on there being N rounds, is just the proportion of balls that are deflated in an N round game. If we let D be the event of choosing a deflated ball, then

P(D) = Σ_N (35/36)^(N-1) * (1/36) * 2^(N-1)/(2^N - 1),
which is exactly the sum I showed earlier that converges to about 0.52. Note that the possibility of an infinite game doesn’t affect the probability because it has probability zero. If we want to be complete, we should include it as a term in the sum and write

P(D) = P(infinite game) * P(D|infinite game) + Σ_N (35/36)^(N-1) * (1/36) * 2^(N-1)/(2^N - 1),
but this is equal to the first sum since both the probability of an infinite game and the probability of choosing a deflated ball, given an infinite game, are zero.27
For the finite game, everything is almost the same except that we can only sum over N values up to ⌊log₂(M)+1⌋, since otherwise we run out of balls.

P(D) = P(safe game) * P(D|safe game) + Σ_N P(N rounds, unsafe) * P(D|N, unsafe), with the sum running from N=1 to ⌊log₂(M)+1⌋.
Obviously, the first term is zero. Aside from the case where N=⌊log₂(M)+1⌋, a game of N rounds is guaranteed to be unsafe, so we can remove that condition from all other terms in the sum. P(D|N,unsafe) is also the same except in the case where the game goes to the maximum number of rounds and the balls are deflated, but we don’t have enough balls to fill the entire round. In that case, the proportion of deflated balls will just be the number of remaining balls that are chosen in the final round, divided by M. The exact value is

(M - 2^(N-1) + 1)/M, where N = ⌊log₂(M)+1⌋.
The last term goes to 0 as M goes to infinity, and the sum becomes the same infinite sum that we already calculated to be ~52%.
So in other words, in both the limit of the finite variant, and in the infinite variant, the probability of choosing a deflated ball is about 52%. Note that we didn’t have to use any non-standard probability theory here,28 nor could we change the answer if we did. Also note that Variant N has the same answer, since the probability of an infinite game is always zero.29
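A quick simulation sketch of the finite ball-and-urn game (the code only tracks counts, so M can be astronomically large; the names are mine). One thing worth noting: the convergence in M is slow, since the game only has a real chance of deflating no balls once log₂(M) is large compared to 36, so you need a huge urn before the ~52% shows up:

```python
import random

def deflated_ball_probability(M, num_trials=100_000, p_snake_eyes=1/36, seed=2):
    """Average, over simulated games, of the fraction of removed balls that end up deflated."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trials):
        removed, batch, deflated = 0, 1, 0
        while removed < M:
            take = min(batch, M - removed)     # the last batch may be smaller than a full doubling
            removed += take
            if rng.random() < p_snake_eyes:
                deflated = take                # this batch is deflated and the game ends
                break
            batch *= 2
        total += deflated / removed            # chance a uniformly picked removed ball is deflated
    return total / num_trials

for exponent in (20, 50, 300):
    print(exponent, deflated_ball_probability(2 ** exponent - 1))
# roughly 0.24, 0.40, and 0.52: the limit is ~52%, but only once log2(M) is large compared to 36
```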
One question remains: Why is this variant different from the original problem? I actually already explained the answer in the earlier sections. In fact, I implicitly referred to this variant earlier when I talked about the probability of picking a person who died if you were to randomly select a participant in the snake eyes game after the fact. The reason this one has a different answer is that it’s asking a different question about the same scenario. It was an attempt to eliminate anthropics, which it technically did, but only vacuously, since there were no anthropics in the original question to begin with. Instead, it just ended up shifting to one of the subtly different questions.
Anthropic variant
This is the final variation, where the answer actually does depend on what you believe about anthropics. Here is the exact statement from the description of Daniel Reeves’s market:
Initially the universe is empty. God creates 1 person and rolls fair dice. If they come up snake eyes, God kills the person and the game ends. If they don't come up snake eyes, the person lives and God creates 2 new people and rolls the dice again. Once again, those people die on snake eyes and the game ends. This repeats as long as non-snake-eyes are rolled, with the group size doubling each time. As soon as snake eyes is rolled the latest group dies, and the game ends.
The question: Suppose God has created all the requisite people, rolled all the dice, and is about to kill the final group that eventually got snake eyes. You're one of those people, with no idea which group you were in or how many other people were created. What's your subjective probability that you'll be killed?
Like the regular Snake Eyes Paradox, this one has both a “normal” variant and Variant N.30 However, reading the description, it seems like the market is actually about Variant N, since the question specifies that God is about to kill the final group of people that eventually got snake eyes. I will discuss both the regular variant and Variant N, along with the answers that different anthropic probability theories give to each.
Self Sampling Assumption
Under the Self-Sampling Assumption, this is easy to answer. The problem becomes equivalent to the ball and urn variant. This is because, in SSA, we don’t update any probabilities based on our existence alone, except to eliminate possible worlds in which no one in our reference class exists. We imagine that the actual world is selected from whatever unconditional probability distribution we would put on different worlds, and that we then become a randomly selected person from the actual world (within our reference class). Assuming that all the people God creates in the snake eyes game are in our reference class, this is isomorphic to the ball and urn problem.
More explicitly: According to SSA, your existence does not change the probability that God created any given number of people, since he was guaranteed to create a nonzero number of people, so in any world, you would have been randomly selected to be one of those people.31 So just as in the ball and urn variant, the probability of the game going to any particular number of rounds is the same as the unconditional/non-anthropic probability. The probability that you die, conditional on the game going to N rounds, is equal to the proportion of people in an N-round game who die, since SSA says you should reason as though you are a randomly and uniformly chosen observer from the set of all existing observers in your reference class. So, we perform exactly the same sum as we did for the infinite ball and urn case to calculate the answer.
As with the ball and urn variant, we get the same probability for the regular version and Variant N, since SSA says that you should still put probability zero on the game going to infinitely many rounds.
So the SSA P(death) in anthropic snake eyes is ~52%. Why does an SSA reasoner believe their P(death) is higher in anthropic snake eyes than in regular snake eyes? It’s essentially just the doomsday argument: if snake eyes isn’t going to be rolled on this round, then you would expect to have been created for a later round instead. If snake eyes is rolled sooner, then you are less unusually early, since fewer observers are created after you. So just as SSA leads a person in the real world to update their probability towards doomsday happening sooner, it leads someone in the anthropic snake eyes world to believe that snake eyes is coming sooner than the unconditional probability of a dice roll would indicate.
Self-Indication Assumption
Under the Self-Indication Assumption, the answer is more difficult. SIA says we should multiply the prior probabilities of all possibilities by the number of observers and then normalize. But this is not possible to do in standard probability theory, since multiplying the probability of every possible game (i.e., every number of rounds) by the number of people in each game yields a function that increases as the number of rounds N increases. You can’t normalize it by dividing by the sum of all these relative probabilities, since the sum diverges.
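To see the divergence concretely, here is a minimal sketch (using the usual prior that the game ends on round N with probability (35/36)^(N-1)·(1/36), and that an N-round game contains 2^N - 1 people):

```python
# Each SIA weight is prior(game ends on round n) * (people in that world).
# Successive weights grow by a factor of roughly 2 * (35/36) = 35/18 > 1,
# so the normalizing sum diverges.
P = 1 / 36

partial_sum = 0.0
for n in range(1, 31):
    prior = (1 - P) ** (n - 1) * P
    people = 2 ** n - 1
    partial_sum += prior * people
    if n in (1, 5, 10, 20, 30):
        print(n, prior * people, partial_sum)
```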
If you already read the modified probability theory section, you’ll know that this sounds like what happened when we tried to calculate the probability of the game going to N rounds in that version. The similarity is more than just superficial: The Bayes factors introduced by your selection in the non-standard version of regular Snake Eyes are exactly the same as the SIA update factors introduced by your existence in Anthropic Snake Eyes. This suggests that we should do the same thing here that we did there and introduce a non-standard or non-finitely-additive probability distribution over the number of rounds. Indeed, we will have to do something like that because trying to apply SIA while retaining classical probability theory yields a contradiction.
So the exact same reasoning used in that section also applies to SIA: According to SIA, there is a 17/18 probability that God created infinitely many people, and that you will therefore survive for sure. Conditional on God creating only finitely many people (Variant N), you have a 1/2 chance of dying. Thus, the overall probability of dying is (1/18) × (1/2) = 1/36. This is a nice result, as it means the anthropic version gives the same result as the non-anthropic version. This is a common feature of these types of problems - consider, for example, how SIA exactly cancels out the doomsday argument.
Should it worry you that SIA requires non-standard probabilities in this problem? Here is an argument that this is perfectly natural for SIA. The entire idea of SIA is that you are more likely to exist if there are more people, specifically M times more likely if there are M times as many people.32 But this doesn’t make sense in classical probability theory: No matter what nonzero prior probability you place on your existence, there’s no way you could multiply it by M for every possible value of M and still get a probability as a result. Eventually, you would get a value greater than 1. So SIA is already treating your existence as something with infinitesimal probability, or as something with zero probability that can nevertheless be conditioned on. This is only possible in the non-standard probability theory. Thus, SIA has been assuming a modified probability theory all along.
It’s also worth mentioning that if you play the finite game, where God stops creating people after N rounds if he doesn’t roll snake eyes (and doesn’t kill anyone), then SIA gives a P(death) of 1/36 without needing to resort to non-standard probabilities. In the limit as N→∞, P(death) is still 1/36, and P(death|S), where S means “God rolls snake eyes on some round,” approaches 1/2. So the infinite game is still the limit of the finite game. Again, this is just like regular snake eyes with uniform probabilities, since SIA is isomorphic to it.
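Here is a minimal sketch of that limit, assuming (as SIA prescribes) that each possible world is weighted by its prior probability times its population:

```python
# Finite game: God stops after N rounds if snake eyes never comes up
# (and kills no one in that case). SIA weight of a world = prior * population.
P = 1 / 36

def sia_finite(N: int):
    weighted_deaths = 0.0
    weighted_people_given_S = 0.0
    for k in range(1, N + 1):
        prior = (1 - P) ** (k - 1) * P                    # snake eyes first on round k
        weighted_deaths += prior * 2 ** (k - 1)           # that round's group dies
        weighted_people_given_S += prior * (2 ** k - 1)   # everyone created in that world
    # The world where snake eyes never comes up: 2^N - 1 people, no deaths.
    weighted_people = weighted_people_given_S + (1 - P) ** N * (2 ** N - 1)
    return weighted_deaths / weighted_people, weighted_deaths / weighted_people_given_S

for N in (5, 20, 100):
    print(N, sia_finite(N))  # first value stays at 1/36 ≈ 0.0278; second approaches 1/2
```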
Full Non-indexical Conditioning
(Much less certain about what I say in this section than the others. I could be using FNC wrong.)
Full Non-indexical Conditioning (FNC) is another approach to anthropic reasoning, proposed by Radford Neal. It is probably less familiar to readers, so I’ll explain how it works before going on.
FNC is similar to SIA in that I can condition on facts about myself to update my probabilities. For example, since I know that there is a male human with brown hair sitting in a blue room and wearing a blue shirt while writing a Substack post about the Snake Eyes Paradox, I should take this information into account and favor theories that make it more likely. Generally, any specific statement about myself is more likely to be true of some person if the total number of observers in the universe is larger, so like SIA, FNC tells me to increase my estimate of the total population of the cosmos.
But unlike SIA, FNC doesn’t allow me to condition on the fact that these things are true of me or that they are happening right now. That’s where the “non-indexical” part comes in. So, for example, if God were to flip a coin and create an exact duplicate of Earth iff it landed tails, I shouldn’t believe afterwards that it probably landed tails because all the information I know afterwards, including information about myself, would be equally likely to be true of some observer in either case. This is different from the answers that both SSA and SIA give. SIA says that I should assign a 2/3 probability to tails afterward, since there are twice as many observers like me in the tails world. SSA’s answer depends on how many observers I believe there are on worlds that weren’t duplicated - if I think only Earth has intelligent life, then I don’t change my estimate of the probability, but if many other observers within my reference class exist outside of Earth, SSA recommends a higher probability on tails, since I would be more likely to find myself as an Earthling, rather than some other being, in the tails world.
The example above is obviously a variant of the Sleeping Beauty Problem, which can also be used to illustrate the difference between different anthropic principles. SSA says the probability of heads is 1/2, SIA says it is 1/3, and the Strong Self-Sampling Assumption behaves similarly in regular Sleeping Beauty to the way SSA behaves in the duplicate-world problem above. According to FNC, when Sleeping Beauty first wakes up, knowing nothing more than what she did when she went to sleep (aside from the indexical knowledge that she is awake right now), she must still assign a probability of 1/2 to the coin landing heads. But, as she learns more about her current situation, she updates her probability towards 1/3, unless the new information is something she would expect to be the same on both days she’s awakened.33 For example, if Beauty awakes in a blue room, she won’t update her probability on that basis if she expects to wake up in the same room on both days, since the non-indexical fact “Beauty wakes up in a blue room” is equally likely to occur at some point regardless of how the coin landed. But, if she looks out the window and sees that it’s a sunny day, she’ll update slightly towards the coin landing tails: This is more likely to occur in the tails world, since she’ll see a sunny day outside in the tails world as long as it’s sunny on at least one of the days she awakes, whereas, in the heads world, it has to be sunny on the single day that she’s awoken. As Beauty observes enough small details that could have been different on the two days, her credence that the coin landed heads will approach 1/3. In a sense, then, the answers FNC gives are often in between those of SSA and SIA. In questions like the Sleeping Beauty Problem, FNC acts like SSA when you have very little information about yourself but becomes more like SIA as you learn more information that could distinguish you from other potential observers.
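A toy calculation makes this mechanism explicit (my own illustration, not from Neal’s paper): suppose some detail Beauty might observe occurs with probability q on any given day, independently. Conditioning on the non-indexical fact that the detail is observed at some awakening gives P(heads) = 1/(3 − q), which slides from 1/2 toward 1/3 as the detail becomes more specific.

```python
# Toy FNC update for Sleeping Beauty: condition on "Beauty observes
# detail D at some awakening", where D occurs with probability q per day.
def fnc_p_heads(q: float) -> float:
    p_d_given_heads = q                  # one awakening under heads
    p_d_given_tails = 1 - (1 - q) ** 2   # two awakenings under tails
    return p_d_given_heads / (p_d_given_heads + p_d_given_tails)

for q in (1.0, 0.5, 0.1, 0.001):
    print(q, fnc_p_heads(q))  # 0.5, 0.4, ~0.34, approaching 1/3
```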
So what does FNC say about anthropic snake eyes? If you know nothing about yourself, except that you exist, then FNC recommends that you don’t update anything about your probabilities. So like the SSA observer, you believe that ~52% of all the people God creates will die, in expectation. But if you do learn information about yourself, you start to update your probability towards a larger number of people existing. For example, suppose you know that every person God creates has exactly a 50% chance of being male, and you find yourself as a male. Neither SIA nor SSA would tell you to update based on this information, but FNC says that you should now consider possibilities that make the existence of a male more likely to be more probable. Specifically, you would multiply the probabilities that the game goes to N rounds by 1-(1/2)^[2^N-1] and renormalize. You should continue to perform this update for every bit of information you learn that distinguishes you from the other people God created (but not on every single bit of information that you learn about yourself - for example, if all the people God created are humans, then updating on the fact that you are a human doesn’t change the probability).
If you could receive infinitely many bits of information that distinguish you from God’s other creations - for example, if there were infinitely many possible minds, God told you that he selected the mind of each of his creations uniformly at random (via a pseudoprobability distribution), and you knew exactly which possible mind you are - then your probabilities should look like those of the SIA observer who thinks that there is a 17/18 chance that God never rolled snake eyes at all and only a 1/36 chance that a randomly selected observer will die. But in practice, you will never receive this much information, so you will always believe that there is a 0% chance that the game went on forever and will merely update your probability in favor of longer and longer finite games. In the limit as you receive more information, you will come to believe that 1/2 of people will die, in expectation, rather than 1/36. In other words, there is a discontinuity at infinity for what proportion of people you expect to die as a function of how many bits of distinguishing information you have about yourself.
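Here is a minimal sketch of that update (truncated at a finite number of rounds; with m = 2^(−Ω), the prior on the game ending at round N is reweighted by 1 − (1−m)^(2^N − 1) and renormalized):

```python
import math

P = 1 / 36  # probability of snake eyes each round

def fnc_expected_death_fraction(omega: float, max_rounds: int = 600) -> float:
    m = 2.0 ** -omega            # chance a given person matches everything you know about yourself
    num = den = 0.0
    for n in range(1, max_rounds + 1):
        prior = (1 - P) ** (n - 1) * P
        k = 2.0 ** n - 1                              # people created if the game ends on round n
        p_match = -math.expm1(k * math.log1p(-m))     # 1 - (1-m)^k, computed stably
        weight = prior * p_match                      # FNC-updated (unnormalized) weight
        num += weight * (2 ** (n - 1) / (2 ** n - 1))
        den += weight
    return num / den

for omega in (1, 10, 30, 60):
    print(omega, fnc_expected_death_fraction(omega))  # decreases toward 1/2 as omega grows
```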
You’ll notice that so far I’ve only discussed what the expected proportion of people who die is, rather than what the probability that you die is. Unlike SSA and SIA, FNC rejects the idea that you can treat yourself as a randomly selected observer in any sense, whether from all actual observers in your reference class or from all possible observers. So it doesn’t necessarily give a P(death) equal to the expected proportion of people who will die. So what does it say?
Well, I’m not completely sure it actually gives an answer. The proposition we’re trying to find the probability of (“You’re going to die”) is indexical, and FNC is pretty intent on not liking propositions like that. We can’t condition on them, so can we talk about the probability of them at all? Well, maybe. If you’re certain that there are no observers that are subjectively indistinguishable from you, then you can still talk about “you” within FNC - you just treat “you” as a definite description containing every aspect of your subjective experience. But if there are multiple versions of “you”, some who will die and some who will not, we can’t talk about “you” this way.
So, how likely is it that there are duplicates of you, according to FNC? Let’s say you have Ω bits of information that distinguish you from other people. Then the probability of any given mind being a duplicate of you is 1/2^Ω.34 Let’s call this value m. The probability that any version of “you” exists, given that the game goes to N rounds, is 1-(1-m)^(2^N-1), and the probability that at least two versions of you exist is

1 - (1-m)^(2^N-1) - (2^N-1)m(1-m)^(2^N-2)
So the overall probability that a duplicate of you exists is

[Σ_{N≥1} (1-p)^(N-1) p (1 - (1-m)^(2^N-1) - (2^N-1)m(1-m)^(2^N-2))] / [Σ_{N≥1} (1-p)^(N-1) p (1 - (1-m)^(2^N-1))]
where p=1/36 is the unconditional probability of death each round. I’m not sure what the exact limiting behavior of this is as m becomes small,35 but I’m pretty sure it will stay high. This is because, as Ω→∞, the expected proportion of people who die goes to 1/2,36 and the expected proportion of your duplicates who die also goes to 1/2.37 But if there were no duplicates, then the probability of death conditional on being chosen (and thus, the expected proportion of you-duplicates who would die) would be 1/36, so there have to be duplicates often enough to completely change this calculation. I also approximated the sum directly38 and found that the calculated estimate was always very large, though I can’t be sure that I wouldn’t get a smaller estimate by evaluating more terms.
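For what it’s worth, here is a minimal sketch of that direct approximation (the truncation at a finite number of rounds is an arbitrary choice, so treat the exact outputs as rough):

```python
import math

p = 1 / 36  # unconditional probability of snake eyes each round

def p_duplicate(m: float, max_rounds: int = 600) -> float:
    num = den = 0.0
    for n in range(1, max_rounds + 1):
        prior = (1 - p) ** (n - 1) * p
        k = 2.0 ** n - 1                                     # people created
        log_none = k * math.log1p(-m)                        # log P(no copy of you exists)
        p_none = math.exp(log_none)
        p_one = k * m * math.exp(log_none - math.log1p(-m))  # exactly one copy
        den += prior * (1 - p_none)                          # at least one copy
        num += prior * (1 - p_none - p_one)                  # at least two copies
    return num / den

for m in (1e-3, 1e-6, 1e-12):
    print(m, p_duplicate(m))  # stays close to 1 even as m shrinks
```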
So we can’t get rid of the problem of duplicates, even by taking a limit as Ω→∞. The result from FNC might just be null, then, unless there actually is a way to reason directly about indexical statements in FNC. One idea: if there is a correct answer under FNC, it is 1/36, based on the intuition that FNC should be isomorphic to the regular snake eyes paradox with classical probabilities, in the same way that SIA is isomorphic to the regular version with non-standard uniform probabilities. Just as SIA gives an infinitesimal or non-countably-additive zero probability to any given person being you - causing you to update in favor of many people existing, the same way being chosen for non-standard snake eyes causes you to update towards many people being chosen - FNC gives any given person a small but finite probability of being you, similar to the standard probability theory version. This would be correct if it weren’t for duplicates: if God created people without replacement, then it would be exactly isomorphic to the original game, and “you” would die 1/36 of the time you were created.39 With duplicates, it’s a bit different: About 1/2 of your duplicates are expected to die,40 so if the probability is defined at all, it seems like it should be about 1/2. But that still feels like we’re talking about an observer selection effect, which FNC supposedly doesn’t like! I think the best solution is probably to acknowledge that we have to use an observer selection effect here, and if FNC doesn’t like that, it’s just too bad for FNC. So my tentative answer is 1/2.41
Conclusion
The results for the probability of each of these variants are:
In the original Snake Eyes Paradox, P(death|chosen) = 1/36. This is true regardless of whether you use classical or non-standard probability theory, or whether you use a truly infinite population or a limit, and it has nothing to do with anthropics.
Similarly, in Variant Y, the probability is unambiguously 1/36.
In Variant N with classical probability theory, we have to reject the assumption of uniformity, and the probability is still 1/36.
In Variant N with non-standard probability theory, the probability is 1/2. This is because, in this version, the probability of a finite game, conditional on being chosen, is only 1/18.
However, I’m not sure what happens when you condition on a particular round (Variant NNN).
In the ball and urn variant, the answer is unambiguously about 52%. It doesn’t matter if you use the truly infinite version or a limit, whether you use classical or non-standard probability theory, or whether you use Variant N.
In Anthropic Snake Eyes:
The SSA probability is about 52%. This is true regardless of what probability theory you use, whether you use Variant N, or whether you use the truly infinite version or a limit.
The SIA probability with classical probability theory is undefined, including Variant N. However, if you take the limit of finite versions, you get 1/36 (and 1/2 for Variant N).
The SIA probability with non-standard probability theory is 1/36. Furthermore, non-standard probability theory is natural to use for SIA.
The SIA probability for Variant N in non-standard probability theory is 1/2.
Inconclusive verdict for FNC. I’m not sure if it even gives an answer, and if it does, it may depend on the exact way God creates people. However, I think that if there is an answer, it’s probably ~1/2. In any case, it shouldn’t change if we consider Variant N, unless you have infinite information about yourself.
So how should the existing markets resolve? All the ones that have already resolved were resolved correctly. Variant Y should resolve to 1/36 (about 3%). The resolution to Variant N depends on whether the council is comfortable using non-standard probability theory. If they are, it should resolve to 50%, and otherwise, 1/36. Though given the description, using non-standard probability theory seems like the most faithful interpretation. The resolution to the Anthropic Snake Eyes market depends on whether it’s interpreted as Variant N or not. If it is, then it should probably resolve NO because none of the anthropic reasoning rules gave 1/36 for Variant N (unless someone can come up with a better rule, or an argument that the correct rule actually would give 1/36). This seems like the most reasonable interpretation of the description to me, but if it ends up being agreed upon that it should be the “regular” variant, then I think it should resolve YES (since I favor SIA). And of course, I don’t know what my market should resolve to.
Further Reading
Related to the probability issues themselves:
Wikipedia article on the Law of Large Numbers
Details on Linearity of the Expected Value
Martin Randall’s LessWrong Post (discusses the anthropic variant, though it incorrectly equates it to the non-anthropic problem)
JSTOR article on a similar problem (linked in text)
Wikipedia and Meteuphoric articles related to SSA and SIA
FNC paper (linked in text)
A website all about anthropic reasoning
Decision theory stuff:
Causal decision theory and Evidential decision theory on Wikipedia (also linked in body)
Logical decision theories (also linked in body)
Decision Theory and Causal Decision Theory on the Stanford Encyclopedia of Philosophy
An explanation of decision theories, Comparison of decision theories (with a focus on logical-counterfactual decision theories), Evidential Decision Theory, and Updateless Decision Theory on LessWrong
Anthropic Decision Theory: Paper and LessWrong sequence
Nor is there some third factor affecting both.
If you’re not familiar with why, the LTE says that E(X) = E(E(X|Y)) for any random variables X and Y with E(X) defined. In this case, we can define X to be the variable equal to 1 if you die and 0 if you don’t, so E(X) = P(death). Then, defining Y as the number of rounds N, we get that P(death) = E(P(death|N)). This specific application of the LTE is sometimes called Conservation of Expected Evidence, though it might make more sense to call it Conservation of Expected Probability.
In classical probability theory, you have to have some distribution for this, which can’t be uniform. Later, we’ll look at what happens if you modify probability theory as well.
I’m glossing over some mathematical details of how exactly to take a limit like this. It doesn’t really matter because it doesn’t affect the argument.
In fact, that’s what the market I linked says it should be.
Don’t worry. If using classical probability theory and jettisoning the assumption of uniformity is a deal-breaker for you, you can just skip ahead to the modified probability theory discussion. It turns out the answer is the same for both.
and rescaled to be normalized
The “draft number” is the order in which the participants are chosen, e.g., if you were the third participant to play, your draft number is 3. Round m has 2^(m-1) different draft numbers.
Not true in real life of course, but in this problem, when we say “dies”, we really just mean “loses the game and gets thrown into a pit of vipers”.
Classical probability theory ignores the possibility of an infinite loss, since it has zero probability, though it could be argued that this is philosophically unsatisfying.
Neglecting trading fees and the shift that your bet causes to the market probability, of course.
It’s actually infinite, which is a consequence of the fact that the expected value of the number of players is infinite.
Recall that for the simulations, I replaced the snake eyes roll with a 1/3 probability.
The total proportion is the total number of people who died in all simulations, divided by the total number of people chosen in all simulations (with people chosen in multiple simulations counted each time they are chosen, not just once). This is not the same as the average proportion, which is the average of the proportion of people who die in each simulation, taken over all simulations. The difference between the two is that the total proportion puts more weight on simulations with more people in them. In those simulations, a slightly smaller proportion of people die.
Technically, it’s a partial function, since it’s only defined for k values that have already been chosen at least once.
Because some players are more likely to be chosen than others (required because there is no uniform distribution). This means that players who are very unlikely to be chosen on the first round are much more likely to be chosen on later rounds, once the more likely candidates have already been chosen. So players with approximately the same probability often end up being chosen on the same round.
There is a caveat here, which is that technically, the expected utility of both buttons is negative infinity. But the expected difference in utility between Button B and Button A is one life’s worth, and a reasonable EDT agent should be able to recognize this.
This requires modifying the game so that you don’t get killed on snake eyes, or resurrecting you between games, or something like that.
i.e., it distributes its probability mass across as many numbers as possible, such that every number has nonzero probability and no number has probability greater than some small ε. We can take the limit as ε→0 if the probability doesn’t depend on anything other than ε (which it doesn’t), or if those dependencies get smaller and smaller as ε gets smaller.
The Archimedean property of the reals: screwing with our probabilistic intuitions ever since dartboards were invented.
The numbers of people chosen on each round are 1, 9, 90, 900, etc.
And since the probability of an infinite game was nonzero, you do get new information by learning that the game was finite, allowing you to update your credences.
Similar to standard calculus, where we cancel out differential expressions like dx/dx even though dx goes to 0 in the end.
You might wonder why I say it has to be 1/2, and not something like “1/2 + infinitesimal”. You can probably come up with a definition of pseudoprobability that does the latter, but I think it’s more natural not to use infinitesimals unless we have to. If we think of our pseudoprobabilities as really being finitely-but-not-countably additive, then all of the “infinitesimal” probabilities are really equal to 0, and the infinitesimals are just a mathematical tool that we need when trying to condition on them. In the end, we take all those infinitesimals back to zero, just like we do in standard calculus. This seems like the most natural way to generalize standard probability to me, since it will give the same answer as standard probability when it is possible to calculate a standard probability, e.g., according to this definition, the unconditional probability of the snake eyes game ending is 1, as it is in standard probability theory, rather than “1 - infinitesimal”.
and I’m not sure I’m correct about this anyway
In the infinite version, we can treat it as a supertask so that it’s possible to complete it in finite time even if it lasts forever.
It doesn’t matter how you choose the ball in this case, since none of them are deflated, so we don’t have to worry about any non-standard probability issues.
Technically, we have to use it to choose which balls to take out of the urn in the infinite case if we want to keep the assumption of uniformity. But since it doesn’t matter which balls are removed, this doesn’t affect the result.
because we’re not conditioning on any particular ball being chosen
Note that it is possible to do a “regular” version, rather than Variant N - you might be worried that in an infinite game, there will never come a time when all aleph null people have been created, but God can surely perform supertasks, so this is not an issue.
If this sounds sketchy to you, I agree. It’s one reason why I don’t like SSA.
In fact, SIA is just a regular Bayesian update of your pre-anthropic prior on this interpretation.
Yes, this means that she violates conservation of expected evidence. This is one reason why I don’t find FNC plausible, either. At best, it’s a step towards a more complete theory.
This is how I’ve seen FNC used before, but I have some doubts about this. The problem is that “the number of distinguishing bits of information you know about yourself” is itself a distinguishing piece of information about you. So I’m not sure Ω is really well-defined. I have an intuition that resolving this issue might fix the problem of duplicates: God can just create people who know more and more distinguishing bits of information so that he never really creates a duplicate, and if you are allowed to take into account that other minds which are indistinguishable based on all the characteristics you know still know more than you, you don’t have to treat them as your duplicate. This would then be isomorphic to the original version and give a probability of 1/36, so long as God doesn’t create any actual duplicates.
i.e., as Ω becomes large, equivalent to you having lots of information about yourself
since the game is expected to last longer, conditional on you being chosen.
I did some simulations to confirm that this is true. Even for small values of Ω, it gets to 1/2 pretty quickly.
using larger values of p that should converge faster
God has to create duplicates if the game goes too long because a “duplicate” just means a person for which all information you know about yourself is true of them, and you only have a finite amount of information about yourself. As I said in a previous footnote, I think the problem could be fixed if we didn’t have to consider these people duplicates. If we had infinitely many distinct possible people, then God could just use a probability distribution over them as he creates them, and we would recover the classical answer of 1/36.
Technically, it’s slightly greater than 1/2 but approaches 1/2 as the number of bits you know about yourself approaches infinity.
Or rather, slightly greater than 1/2, but approaching it as you learn more information.