Emin's Page

The Two Envelope Problem


Michael Rosenblum told me about a puzzle he called "The Two Envelope Problem" in the spring of 2001. The puzzle is as follows. I choose a number X in some arbitrary manner. Then I put X dollars in one envelope and 2X dollars in the other envelope. I randomly choose one of these envelopes and hand it to you. You can look at the money in the envelope and keep it or ask me for the other one. Your goal is to maximize the expected value of your wealth.

Some key points to keep in mind about the puzzle:

  • X can be any positive real number. So you can find 1.243454529... dollars in an envelope.
  • X is chosen in a non-random manner. There is no probability distribution associated with X.
  • You receive the envelope with X dollars with probability 1/2 and the envelope with 2X dollars with probability 1/2.
  • Clearly if you always keep the envelope you are given you will get the envelope with X dollars half the time and the envelope with 2X dollars the other half of the time. Therefore your expected wealth will be (3/2) X.

My solution to the puzzle is presented below. Don't read further unless you want to know the answer.
Before explaining my solution, let me first discuss a tempting, but incorrect argument. Call the amount you see in the envelope you are given Y. You know that Y = A X where A is equally likely to be 1 or 2. Thus you might think that if you decide to always trade your envelope for the other your expected wealth is
E[Wealth|Y] = E[Wealth|Y,A = 1] Pr[A=1] + 
              E[Wealth|Y,A = 1] Pr[A=2].
            = 2Y (1/2) + (Y/2) (1/2) = (5/4) Y
This would be incorrect. In fact if you always trade your envelope for the other your expected wealth is
E[Wealth|Y] = E[Wealth|Y,A = 1] Pr[A=1,Y] + 
              E[Wealth|Y,A = 1] Pr[A=2,Y].
            = E[Wealth|Y=X,A=1] Pr[A=1,Y=X] + 
              E[Wealth|Y != X, A=1] Pr[A=1, Y != X] +
              E[Wealth|Y=2X,A=2] Pr[A=2,Y=2X] + 
              E[Wealth|Y != 2X, A=2] Pr[A=2, Y != 2X]
            = 2X * 1/2 + 0 + X * 1/2 + 0 = (3/2) * X.
where
!=
is read as "not equal to". So as you would intuitively expect always keeping the envelope you are given or always trading yields an expected wealth of (3/2) X. How then can we do better? The solution is to use a randomized algorithm. Assume that you get the first envelope, open it and find Y dollars inside. Then with probability f(Y) you trade for the other envelope. In this case your expected wealth is
E[Wealth] = E[Wealth|A=1] Pr[A=1] + E[Wealth|A=2] Pr[A=2]
          = {2*x*f(x) + x*[1-f(x)]}*(1/2) + 
            {x*f(2x) + 2*x*[1-f(2x)]}*(1/2)
          = x*f(x) + x/2 - x*f(x)/2 + 
            x*f(2x)/2 + x - x*f(2x)
          = (3/2)*x + (x/2)*[f(x) - f(2x)].
Note that the always switch case corresponds to f(Y) = 1 and never switch case corresponds to f(Y) = 0 both of which yield E[Wealth] = (3/2)x as expected. However as long as we make the value [f(x) - f(2x)] > 0 then we do better than always switching or never switching for every value of x.

One possible choice for f(.) is f(y) = exp(-y), but others are possible.

While the randomized algorithm increases the expected wealth for every x, it is easy to show that for any deterministic rule, there is at least one value of x for which that deterministic rule is no better than "always switch".

To see this, consider some deterministic rule and let T be the smallest value where the rule says to switch envelopes if the first envelope contains T. If T is infinity, (i.e., if the rule says always switch), then for any value of x, the payoff of that rule is just (3/2)*x which is the same as you would get for "always switch". On the otherhand, if T is finite, then choose x=T/2. Since the deterministic rule always gives away the envelope when it contains 2x, the maximum expected wealth of the deterministic rule is (3/2)*x when x=T/2. In contrast, the randomized algorithm would do better.

Note the point in the above paragraph is not that you can find an x where the randomized rule is better than a given deterministic rule. Instead, the point is that you can always find an x where the deterministic rule gives you no advantage over "always switch". In contrast, the randomized rule always gives you an advantage.