
Optimal Adaptive Strategies for Games of the Parrondo Type

Sven Rahmann, Computational Molecular Biology, MPI for Molecular Genetics, Ihnestraße 63-73, D-14195 Berlin, Germany. Sven.Rahmann@molgen.mpg.de. Tel: ++49 (30) 8413-1173, Fax: ++49 (30) 8413-1152. March 29, 2002

Abstract. We study Parrondo's paradox, which states that one can turn several losing strategies into a winning strategy by combining them in the right way. While there is in fact nothing paradoxical about this phenomenon, it gives rise to a number of algorithmic problems on Markov chains. We consider the problem of evaluating a strategy and the problem of finding an optimal one for three different types of strategies. Fixed strategies have to be specified completely before starting a sequence of games. Randomized strategies decide randomly in each step which game to play according to a specified probability distribution. Finally, adaptive strategies make decisions on-line as the game proceeds. Different problems are obtained according to the considered time horizon: We are interested in both a finite sequence of games and in the asymptotic setting of an infinite sequence of games. We show that there exist efficient algorithms for all six evaluation problems. However, only an optimal adaptive strategy for a finite game sequence can be found efficiently at this point; the remaining five optimization problems remain currently open.

1 Introduction

Since the publication of an article in Nature [3], the so-called Parrondo Paradox has generated some attention, even in non-scientific journals such as the New York Times [1]. The paradox has been summarized as "Losing strategies can win", but this simplified statement seems to be a source of confusion. Consider the following situation: An investor has the choice between two funds A and B. Both funds decrease in value in the long-term average, but there are some intermediate periods where each fund increases its value. Assume now that fund A is likely to make a profit in a high-interest market, while for fund B this is more likely in a low-interest market. It should not come as a surprise that one can make money by alternating between fund A and fund B, according to market conditions. Maybe a little more surprising is the fact that even by randomly choosing A or B with certain probabilities in each time period, profits can possibly be made, depending of course on the dynamics of the funds involved. This is the essence of the so-called paradox: The right combination of two losing strategies can be a winning strategy. The key is to know the rules of the games exactly so one

can decide which game to play in which situation. This is also the reason why one cannot gain money by applying Parrondo's paradox to the stock market. Clearly, by only buying or only selling one cannot make money. The strategy "buy low, sell high" has been known for centuries, but obviously is hard to follow in the real world. Nevertheless, when we do know the precise rules of the games involved, we are faced with the interesting problem of finding an optimal strategy when a sequence of N games is to be played. After introducing a formalism for Parrondo games (Section 2), we state a number of variations of this problem (Section 3). We discuss efficient solutions to different types of evaluation problems, i.e., we compute the expected winnings when using a given strategy (Section 4). Then we present a simple algorithm that finds an optimal adaptive strategy for a finite sequence of games (Section 5). We conclude with a discussion of the results obtained in this article (Section 6) and present a MATLAB package that contains implementations of the algorithms presented in this article.

2 A Formalism for Parrondo Games

In the following, we model a Parrondo game by a set of r Markov chains on $\ell$ states with additional payoff vectors. Intuitively, we start in a state $p \in \{1, \ldots, \ell\}$. In each step, we can choose one of the games, i.e., one of the Markov chains. Say we choose chain number k out of $1, \ldots, r$. We move to a new state j according to the chosen transition matrix. On arrival, we receive the payoff (which may be negative) specified for state j and chain k. The following definitions make this more precise.

Definition 1 (Parrondo Game). A Parrondo game consists of
- a state set (which we may assume to be $\{1, \ldots, \ell\}$) of cardinality $\ell$,
- a start state $p \in \{1, \ldots, \ell\}$, or more generally, a start distribution $\pi = (\pi_1, \ldots, \pi_\ell)$,
- a fixed number r of $\ell \times \ell$ stochastic matrices $P^{(k)}$ ($k = 1, \ldots, r$), and
- r payoff vectors $x^{(k)} \in \mathbb{R}^\ell$.

Definition 2 (Fixed Strategy). A fixed strategy of length n is a vector $g = (g_1, \ldots, g_n) \in \{1, 2, \ldots, r\}^n$.

We can avoid being restricted to a fixed strategy by using randomization: In step t, we choose game k with probability $q_{k,t}$.

Definition 3 (Randomized Strategy). A randomized strategy of length n is an $r \times n$ matrix $Q = (q_{k,t})$ such that each column is a probability vector.

Both definitions of a strategy imply that the sequence of games to be played is chosen before the games start. In many settings it is more realistic to assume that the next game can be chosen based on the current state, i.e., if we start step t in state i, we decide to play game $G_{i,t}$.

Definition 4 (Adaptive Strategy). An adaptive strategy of length n is an $\ell \times n$ matrix $G = (G_{i,t})$ with $G_{i,t} \in \{1, \ldots, r\}$ for all $i = 1, \ldots, \ell$ and $t = 1, \ldots, n$.

In the following, when a sequence of N games is to be played with a strategy of length $n \le N$, we assume that the strategy is lengthened to length N by repeating it as many times as necessary.

Definition 5 (State sequence). For a Parrondo game, a fixed strategy g [randomized strategy Q; adaptive strategy G] of length n, and a game duration of $N \ge n$ steps, let $S = (S_1, S_2, \ldots, S_N) \in \{1, \ldots, \ell\}^N$ be the random state sequence that is obtained in the following way: One starts in state p, or in state i with probability $\pi_i$. We also write $S_0$ for this possibly random state, which is not part of the state sequence. Then one plays game $g_t$ [game k with probability $q_{k,t}$; game $G_{S_{t-1},t}$] in step t ($t = 1, \ldots, N$), and $S_t$ is defined as the state attained in this game.

We now define the payoff for a fixed strategy. The definition for a randomized or adaptive strategy is exactly the same, replacing g with Q or G, respectively.

Definition 6 (Payoff). For a strategy g of length n and a game duration of N,
- the payoff is the random variable $X(g, N) := \sum_{t=1}^{N} x^{(g_t)}_{S_t}$, where $S = (S_1, \ldots, S_N)$ is the state sequence,
- the expected payoff is $E[X(g, N)]$, the expectation of $X(g, N)$,
- the expected payoff per game (or per step) is $E[X(g, N)]/N$, and
- the asymptotic expected payoff per game, simply called the asymptotic payoff, is $L(g) := \lim_{m \to \infty} E[X(g^m, mn)]/(mn)$, where $g^m \in \{1, 2, \ldots, r\}^{mn}$ denotes the m-fold repetition of strategy g.

While the expected payoff and expected payoff per game depend on the start state or start distribution, the asymptotic payoff is independent of the start conditions when all Markov chains are irreducible and aperiodic. Therefore, whenever we make asymptotic considerations, we make this assumption without further stating it.

To illustrate the definitions, let us consider two examples.

Example 7 (A deterministic example). Ekhad and Zeilberger [2] have shown that no random mechanism is required to demonstrate the paradox. Let your current capital be C Euros, starting with C = 0. Consider two games A and B. In game A, if C is even, you win 1 Euro, otherwise you lose 3 Euros. In game B, if C is odd, you win 1 Euro, otherwise you lose 3 Euros. Playing game A repeatedly results in a capital sequence of (0), 1, -2, -1, -4, ... Euros, with an average loss of 1 Euro per turn, finally resulting in certain bankruptcy. Playing game B leads to the same result; the sequence starts with (0), -3, -2, -5, -4, ... However, alternating between A and B, starting with A, results in a capital sequence of (0), 1, 2, 3, 4, ..., with winnings of 1 Euro per turn.

Formally, we have two games, thus r = 2. We need four states, which we call PlusOdd (1), PlusEven (2), MinusOdd (3), and MinusEven (4). We start in one of the Even states, say p = 4 (the choice is irrelevant). Games A and B are now specified by the transition matrices

$$ P^{(1)} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{pmatrix}, \qquad P^{(2)} = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, $$

respectively. The corresponding payoff vectors are $x^{(1)} = x^{(2)} = (1, 1, -3, -3)$. Note that game A (chain 1) lives on states 1 and 4 and game B (chain 2) lives on states 2 and 3. For the Plus states, the payoff is +1; for the Minus states, the payoff is -3. Being in an Even state means that the total payoff received so far is even; thus in game A, we move to a Plus state, and this must be PlusOdd, since the total payoff is now odd. Starting from state 4, in game A (i.e., repeating the length-1 strategy g = (1)), the state sequence is (1, 4, 1, 4, ...) with payoffs (1, -3, 1, -3, ...) for an asymptotic average of L(g) = -1 per game. For game B (i.e., repeating the strategy g = (2)), the state sequence is (3, 2, 3, 2, ...), also resulting in L(g) = -1. As discussed above, repeating g = (1, 2) results in L(g) = +1 (again assuming that we start from an Even state p = 2 or p = 4; otherwise, g = (2, 1) is the correct strategy. Note that the two chains in this example are reducible and periodic).

Example 8 (The Original Parrondo Game). In [3], the Parrondo paradox was introduced with the following example: We start with a capital of zero Euros (but have infinite credit). Winning a game means receiving one Euro, and losing a game means paying one Euro. There are two games. In game 1, we win with probability 1/2 (and lose with probability 1/2). Game 2 is more complicated. When the current capital is a multiple of 3, we win with probability 1/10 (and lose with probability 9/10). Otherwise, we win with probability 3/4. (The description in [3] included a parameter to slightly modify these probabilities, but it is not needed here and is omitted for simplicity of exposition.) It is easy to see that game 1 is fair (i.e., the average payoff per game step is zero). Analysis of game 2 shows that in the long run we spend 5/13 of the time in a state where the capital is divisible by three. Thus the asymptotic payoff is $5/13 \cdot (1/10 - 9/10) + 8/13 \cdot (3/4 - 1/4) = 0$, and the game is also fair. In [2], it is shown that by repeating the strategy g = (1, 2, 2, 1, 2), the asymptotic payoff is $L(g) \approx 0.076$. This is further discussed in Section 4.4.

To model this game, we need the six states Plus1 (1), Plus2 (2), Plus3 (3), Minus1 (4), Minus2 (5), and Minus3 (6). Being in a Plus state indicates that we have won the last game, and the number 1, 2, or 3 indicates our capital modulo 3 (using 3 instead of 0). We start in one of the two states labelled 3, i.e., p = 3 or p = 6 (it does not matter which one). The transition matrices for games 1 and 2 are, respectively,

$$ P^{(1)} = \begin{pmatrix} 0 & 1/2 & 0 & 0 & 0 & 1/2 \\ 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ 1/2 & 0 & 0 & 0 & 1/2 & 0 \\ 0 & 1/2 & 0 & 0 & 0 & 1/2 \\ 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ 1/2 & 0 & 0 & 0 & 1/2 & 0 \end{pmatrix}, \qquad P^{(2)} = \begin{pmatrix} 0 & 3/4 & 0 & 0 & 0 & 1/4 \\ 0 & 0 & 3/4 & 1/4 & 0 & 0 \\ 1/10 & 0 & 0 & 0 & 9/10 & 0 \\ 0 & 3/4 & 0 & 0 & 0 & 1/4 \\ 0 & 0 & 3/4 & 1/4 & 0 & 0 \\ 1/10 & 0 & 0 & 0 & 9/10 & 0 \end{pmatrix}, $$

and the payoff vectors are $x^{(1)} = x^{(2)} = (1, 1, 1, -1, -1, -1)$.
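To make the formalism and Example 8 concrete, the following minimal sketch shows one possible way to encode the game as data structures. The paper's own implementations are MATLAB functions (see the Conclusion); this sketch uses Python/NumPy instead, and the variable names (P, x, start) are illustrative choices, not taken from the paper.

```python
import numpy as np

# Example 8: 6 states (Plus1, Plus2, Plus3, Minus1, Minus2, Minus3), 2 games.
# Game 1: win 1 Euro with probability 1/2 from every state.
P1 = np.array([
    [0, .5, 0, 0, 0, .5],   # Plus1:  win -> Plus2, lose -> Minus3
    [0, 0, .5, .5, 0, 0],   # Plus2:  win -> Plus3, lose -> Minus1
    [.5, 0, 0, 0, .5, 0],   # Plus3:  win -> Plus1, lose -> Minus2
    [0, .5, 0, 0, 0, .5],   # Minus1: as Plus1
    [0, 0, .5, .5, 0, 0],   # Minus2: as Plus2
    [.5, 0, 0, 0, .5, 0],   # Minus3: as Plus3
])
# Game 2: win with prob. 1/10 if the capital is divisible by 3, else with prob. 3/4.
P2 = np.array([
    [0, .75, 0, 0, 0, .25],
    [0, 0, .75, .25, 0, 0],
    [.1, 0, 0, 0, .9, 0],
    [0, .75, 0, 0, 0, .25],
    [0, 0, .75, .25, 0, 0],
    [.1, 0, 0, 0, .9, 0],
])
P = [P1, P2]                                     # transition matrices P^(k)
x = [np.array([1., 1., 1., -1., -1., -1.])] * 2  # payoff vectors x^(k)
start = np.zeros(6); start[2] = 1.0              # start state p = 3 (0-based index 2)
```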

3 Problems
For a given Parrondo game, we consider several problems falling into two main classes.

1. Compute a given functional of the payoff distribution for a given strategy.
2. Find a strategy that optimizes a given functional of the payoff distribution over all strategies.

We consider only the expectation of the payoff distribution. It may be equally interesting (but is apparently more difficult) to compute the probability that the payoff exceeds a certain threshold. For each of the two problem classes, we may either ask for

1. asymptotic results, or
2. results for game sequences of finite length N.

The number N of games to be played can be larger than the length n of the strategy. In this case the strategy is repeated as many times as necessary, i.e., infinitely often for asymptotic results. Furthermore, we obtain different problems according to when and how we have to decide on a strategy. We consider

1. fixed strategy problems,
2. randomized strategy problems, and
3. adaptive strategy problems.

Section 4 presents efficient methods to compute the expected payoff for a given strategy, while Section 5 focuses on the optimization problems.

4 Computing the Expected Payoff for a Strategy

Previous analyses [4, 5] have almost exclusively focused on computing the asymptotic payoff for a fixed strategy (a single one out of the six computational problems defined in Section 3). In fact, there is an efficient algorithm for each of the six problems. In Section 4.1 we present and analyze efficient algorithms for a fixed strategy, Section 4.2 covers the algorithms for randomized strategies, and Section 4.3 treats adaptive strategies. Numerical illustrations referring to Example 8 are given in Section 4.4. We put special emphasis on the running time of the algorithms. Consider the naive way of computing the expected payoff by conditioning on the state sequence S:
$$ E[X(g, N)] = E\bigl[ E[X(g, N) \mid S] \bigr] = \sum_{s \in \{1, \ldots, \ell\}^N} \mathrm{Prob}_g(s) \sum_{t=1}^{N} x^{(g_t)}_{s_t}, $$

where $\mathrm{Prob}_g(s)$ denotes the probability of state sequence s under strategy g. This requires summing over all $\ell^N$ possible state sequences and is therefore impractical for large N. All algorithms presented here are linear in N (or in n for the asymptotic results). Table 1 gives an overview of the running times.

4.1 Expected Payoff for a Fixed Strategy

Let $g = (g_1, \ldots, g_n) \in \{1, \ldots, r\}^n$ be a fixed strategy of length n, and let $N \ge n$ be the length of the game sequence. Wherever necessary, we assume that g has been lengthened to length N by repetition of g. For $t = 1, \ldots, N$, let $\pi^{(t)} := \pi P^{(g_1)} \cdots P^{(g_t)}$. This is the distribution of states after step t when using strategy g and starting with distribution $\pi$ (most of the time, $\pi$ is a Dirac measure, i.e., $\pi_p = 1$ for a start state p and $\pi_i = 0$ for $i \ne p$). Using $\langle \cdot, \cdot \rangle$ to denote the inner product of two vectors, the expected payoff in step t is

$$ \sum_{i=1}^{\ell} \pi^{(t)}_i x^{(g_t)}_i = \langle \pi^{(t)}, x^{(g_t)} \rangle. $$

Thus the expected payoff per game for strategy g is

$$ \frac{1}{N} E[X(g, N)] = \frac{1}{N} \sum_{t=1}^{N} \langle \pi^{(t)}, x^{(g_t)} \rangle. $$

This can be computed in $O(N \ell^2)$ time: Since $\pi^{(t)} = \pi^{(t-1)} P^{(g_t)}$ (where $\pi^{(0)} = \pi$), computing all $\pi^{(t)}$ for $t = 1, \ldots, N$ requires N vector-matrix multiplications for a total of $O(N \ell^2)$ time. Computing and summing the N inner products takes an additional $O(N \ell)$ time.

Now assume that g of length n is repeated ad infinitum. Then the n-step transition matrix for one iteration of g is $P^{(g)} := P^{(g_1)} P^{(g_2)} \cdots P^{(g_n)}$. Since $P^{(k)}$ is irreducible and aperiodic for each $k = 1, \ldots, r$, so is $P^{(g)}$. Therefore, $P^{(g)}$ has a unique stationary distribution $\bar\pi = (\bar\pi_1, \ldots, \bar\pi_\ell)$ satisfying $\bar\pi P^{(g)} = \bar\pi$, or $\bar\pi (P^{(g)} - \mathrm{Id}) = 0$. Independently of the starting distribution $\pi$, after a few iterations and before beginning a new iteration of strategy g, the chain is in state i with probability approximately $\bar\pi_i$. Once $\bar\pi$ is known, we use the same method as above, replacing $\pi$ by $\bar\pi$ and $\pi^{(t)}$ by $\bar\pi^{(t)}$. Thus $\bar\pi^{(t)} := \bar\pi P^{(g_1)} \cdots P^{(g_t)}$ is the distribution of states after step t when using strategy g and starting with the stationary distribution $\bar\pi$. By stationarity, $\bar\pi^{(n)} = \bar\pi$. It follows that the asymptotic payoff L(g) is the expected payoff per game step in one stationary iteration of strategy g;

$$ L(g) = \frac{1}{n} \sum_{t=1}^{n} \langle \bar\pi^{(t)}, x^{(g_t)} \rangle. $$

The running time of this method is dominated by the time to find $\bar\pi$. Computing $P^{(g)}$ requires at most O(n) matrix multiplications, each needing $O(\ell^3)$ time using the naive matrix-multiplication method. Computing $\bar\pi$ means solving $\bar\pi (P^{(g)} - \mathrm{Id}) = 0$, which requires $O(\ell^3)$ time (e.g., using Gaussian elimination). Since the rest of the algorithm now runs in $O(n \ell^2)$ time as shown above, the total running time is $O(n \ell^3)$. If g contains long repeats, it may be possible to achieve a better running time by re-using re-occurring sub-products; in this case, the running time reduces to $O(n' \ell^3 + n \ell^2)$, where the number $n'$ of distinct matrix products that have to be computed can be as low as O(log n).
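The two procedures of this subsection can be summarized in a short sketch. This is a hedged Python/NumPy illustration (not the paper's MATLAB code); it assumes the encoding from the Example 8 sketch above (a list P of transition matrices, a list x of payoff vectors, a start distribution), with games indexed 0, ..., r-1 instead of 1, ..., r, and the function names are made up.

```python
import numpy as np

def expected_payoff_fixed(P, x, g, start, N):
    """Expected payoff per game of the fixed strategy g over N steps from `start`."""
    pi, total = start.copy(), 0.0
    for t in range(N):
        k = g[t % len(g)]          # repeat the strategy if N > len(g)
        pi = pi @ P[k]             # pi^(t) = pi^(t-1) P^(g_t)
        total += pi @ x[k]         # <pi^(t), x^(g_t)>
    return total / N

def asymptotic_payoff_fixed(P, x, g):
    """Asymptotic payoff L(g) of the infinitely repeated strategy g."""
    ell = len(x[0])
    Pg = np.eye(ell)
    for k in g:                    # n-step matrix P^(g) = P^(g_1) ... P^(g_n)
        Pg = Pg @ P[k]
    # stationary distribution: solve pi_bar (P^(g) - Id) = 0 together with sum(pi_bar) = 1
    A = np.vstack([(Pg - np.eye(ell)).T, np.ones(ell)])
    b = np.zeros(ell + 1); b[-1] = 1.0
    pi_bar, *_ = np.linalg.lstsq(A, b, rcond=None)
    total, pi = 0.0, pi_bar
    for k in g:                    # one stationary iteration of g
        pi = pi @ P[k]
        total += pi @ x[k]
    return total / len(g)
```

With the Example 8 data, expected_payoff_fixed(P, x, [0, 1, 1, 0], start, 4) should return about 0.0875 Euros per game (the 8.75 cents quoted in Section 4.4), and asymptotic_payoff_fixed(P, x, [0, 1, 1, 0, 1]) a value of about 0.076.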

4.2 Expected Payoff for a Randomized Strategy

Let $Q \in \mathbb{R}^{r \times n}$ be a randomized strategy of length n. In step t, game $k \in \{1, \ldots, r\}$ is chosen with probability $q_{k,t}$.

Strategy type                  | N games                  | Asymptotic
Fixed $g$ ($1 \times n$)       | $O(N \ell^2)$            | $O(n \ell^3)$
Randomized $Q$ ($r \times n$)  | $O((nr + N) \ell^2)$     | $O(nr \ell^2 + n \ell^3)$
Adaptive $G$ ($\ell \times n$) | $O(r \ell^2 + N \ell^2)$ | $O(r \ell^2 + n \ell^3)$

Table 1: Running times of the algorithms to compute the expected payoff for each strategy type and each time horizon.

Since this choice is independent of the current state, the randomized strategy reduces to playing a single game in step t with transition matrix $\bar{P}^{(t)}$ and payoff vector $\bar{x}^{(t)}$ defined as follows:

$$ \bar{P}^{(t)} = \sum_{k=1}^{r} q_{k,t} P^{(k)}, \qquad \bar{x}^{(t)} = \sum_{k=1}^{r} q_{k,t} x^{(k)}. $$

Computation of all $\bar{P}^{(t)}$ and $\bar{x}^{(t)}$ takes $O(nr\ell^2)$ time. Now we apply the algorithms from the previous section to $\bar{P}$ and $\bar{x}$. It follows that computing the expected payoff for a game sequence of length N requires a total of $O((nr + N)\ell^2)$ time, and computing the asymptotic payoff $O(nr\ell^2 + n\ell^3)$ time. Often we are interested in repeated randomized strategies of length n = 1. In this case, we also write $\bar{P}$ for $\bar{P}^{(1)}$ and $\bar{x}$ for $\bar{x}^{(1)}$. If $\bar\pi$ is the stationary distribution of $\bar{P}$, it follows that the asymptotic payoff is $L(Q) := \langle \bar\pi, \bar{x} \rangle$. Starting from an arbitrary distribution $\pi$, the expected payoff per game after N steps is
$$ \frac{1}{N} E[X(Q, N)] = \frac{1}{N} \sum_{t=1}^{N} \langle \pi^{(t)}, \bar{x} \rangle, $$

where $\pi^{(t)} := \pi \bar{P}^{\,t}$ is the distribution of states in step t.
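For the special case n = 1 this reduction is particularly simple: average the transition matrices and payoff vectors with the weights $q_k$ and reuse the stationary-distribution computation of Section 4.1. The following is a hedged Python/NumPy sketch with illustrative names, not the paper's MATLAB code.

```python
import numpy as np

def asymptotic_payoff_randomized(P, x, q):
    """L(Q) for a length-1 randomized strategy q = (q_1, ..., q_r)."""
    P_bar = sum(qk * Pk for qk, Pk in zip(q, P))    # averaged transition matrix
    x_bar = sum(qk * xk for qk, xk in zip(q, x))    # averaged payoff vector
    ell = P_bar.shape[0]
    # stationary distribution of P_bar: solve pi_bar (P_bar - Id) = 0 with sum(pi_bar) = 1
    A = np.vstack([(P_bar - np.eye(ell)).T, np.ones(ell)])
    b = np.zeros(ell + 1); b[-1] = 1.0
    pi_bar, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi_bar @ x_bar)                    # <pi_bar, x_bar>
```

Scanning $q_1$ over a grid, e.g. max(asymptotic_payoff_randomized(P, x, [q1, 1 - q1]) for q1 in np.linspace(0, 1, 1001)) with the Example 8 data, is one way to locate the optimum reported in Section 4.4.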

4.3 Expected Payoff for an Adaptive Strategy

Using an adaptive strategy means that we can choose in each step which game to play, based on the current state. Let $G = (G_{i,t})$ be an adaptive strategy of length n, and let N be the number of games to be played. As usual, we may consider G to be of length N by repeating G as many times as necessary. In step t, using game $G_{i,t}$ when the current state is i means using a transition matrix $P^{(t)}$ whose i-th row is the i-th row of $P^{(G_{i,t})}$, or

$$ P^{(t)}_{i,j} = P^{(G_{i,t})}_{i,j}. $$

We can compute the distribution of states $\pi^{(t)}$ after step t by $\pi^{(0)} := \pi$ and $\pi^{(t)} = \pi^{(t-1)} P^{(t)}$ for $t = 1, \ldots, N$. To compute the expected payoff in step t, recall that payoff $x^{(G_{i,t})}_j$ is received if we move from i to j in step t. This happens with probability $\pi^{(t-1)}_i P^{(t)}_{i,j} = \pi^{(t-1)}_i P^{(G_{i,t})}_{i,j}$. Therefore the expected payoff in step t is

$$ \sum_{i=1}^{\ell} \sum_{j=1}^{\ell} \pi^{(t-1)}_i \, P^{(G_{i,t})}_{i,j} \, x^{(G_{i,t})}_j = \sum_{i=1}^{\ell} \pi^{(t-1)}_i \, y^{(G_{i,t})}_i, $$

where $y^{(k)}_i$ ($i = 1, \ldots, \ell$; $k = 1, \ldots, r$) denotes the expected payoff after one step when starting in state i and choosing game k, i.e., $y^{(k)}_i := \langle x^{(k)}, e_i P^{(k)} \rangle$. The expected payoff per step is thus

$$ \frac{E[X(G, N)]}{N} = \frac{1}{N} \sum_{t=1}^{N} \sum_{i=1}^{\ell} \pi^{(t-1)}_i \, y^{(G_{i,t})}_i. $$

To pre-compute all $y^{(k)}$ requires $O(r\ell^2)$ time. In each step, we compute the expected payoff in $O(\ell)$ time and the next $\pi^{(t)}$ in $O(\ell^2)$ time. Thus the total running time is $O(r\ell^2 + N\ell^2)$.

For the asymptotic payoff, we assume that $P^{(G)} := P^{(1)} \cdots P^{(n)}$ is irreducible and aperiodic. Then we find its unique stationary distribution $\bar\pi$ in $O(n\ell^3)$ time, proceed as above with $\bar\pi$ instead of $\pi$, and find that

$$ L(G) = \frac{1}{n} \sum_{t=1}^{n} \sum_{i=1}^{\ell} \bar\pi^{(t-1)}_i \, y^{(G_{i,t})}_i. $$

The total running time is $O(r\ell^2 + n\ell^3)$.
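The finite-horizon evaluation just described can be sketched as follows (Python/NumPy, illustrative names, not the paper's MATLAB code). Here G is an $\ell \times n$ integer array of 0-based game indices, and P, x, start are as in the earlier sketches.

```python
import numpy as np

def expected_payoff_adaptive(P, x, G, start, N):
    """Expected payoff per game for the adaptive strategy G over N steps from `start`."""
    ell, n = G.shape
    # one-step expected payoffs y_i^(k) = sum_j P^(k)_{ij} x^(k)_j, shape (r, ell)
    y = np.array([Pk @ xk for Pk, xk in zip(P, x)])
    pi, total = start.copy(), 0.0
    for t in range(N):
        col = G[:, t % n]                                  # game played in each state at step t
        total += sum(pi[i] * y[col[i], i] for i in range(ell))
        Pt = np.array([P[col[i]][i] for i in range(ell)])  # row i taken from P^(G_{i,t})
        pi = pi @ Pt                                       # pi^(t) = pi^(t-1) P^(t)
    return total / N
```

For instance, the greedy strategy of Section 4.4 corresponds to G = np.array([[1], [1], [0], [1], [1], [0]]) in this 0-based encoding.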

4.4 Numerical Examples

The following calculations all refer to Example 8. They demonstrate that for short strategies, there can be a big difference between playing the strategy once or repeating it ad infinitum. Consider the strategy g := (1, 2, 2, 1), starting in state 3. If this strategy is played once, the expected payoff per game is 8.75 cents, while the asymptotic payoff is only 2.454 cents. When the strategy g' := (1, 2, 2, 1, 2) is played once (again starting in state 3), this results in an expected payoff per game of 8.06 cents (less than for g), but the asymptotic value becomes 7.568 cents (much more than for g). Thus for short sequences of games, the edge effects can make quite a difference.

Now we consider the randomized strategy $Q = (q_1, 1 - q_1)^T$ of length n = 1 ($q_1 \in [0, 1]$). By plotting the asymptotic payoff L(Q) against $q_1$, we find that choosing $q_1 = 0.4146$ results in the highest asymptotic payoff of 2.62 cents. However, when we can only play a sequence of 10 games, the expected payoff becomes -0.60 cents per game. In this case, the best possible randomized strategy is $q_1 = 0.77$ with an expected payoff of 0.137 cents per game. Thus the asymptotically optimal strategy can be a bad choice for short game sequences.

As an example for an adaptive strategy, we consider the greedy strategy, which chooses the game maximizing the expected payoff in the next step. Thus the greedy strategy has length n = 1. The following table lists the expected payoff for the next step for each game, depending on the current state.

State   | 1 (Plus1) | 2 (Plus2) | 3 (Plus3) | 4 (Minus1) | 5 (Minus2) | 6 (Minus3)
Game 1  |     0     |     0     |     0     |     0      |     0      |     0
Game 2  |    0.5    |    0.5    |   -0.8    |    0.5     |    0.5     |   -0.8
Therefore, the greedy strategy is to choose Game 1 in states 3 and 6 and Game 2 in states 1, 2, 4, and 5: $G = (2, 2, 1, 2, 2, 1)^T$. Analysis shows that being greedy is not a bad choice: The asymptotic payoff is 32.43 cents per game. Starting from state 3, this limit is approached from below; e.g., for a sequence of 20 games, the expected payoff per game is 31.41 cents. In this example, the greedy strategy is optimal. However, this is not true in general. In Section 5.1, we present an algorithm that computes an optimal adaptive strategy of length N for a game sequence of length N.
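The greedy table above can be recomputed directly from the win probabilities of Example 8 (payoff +1 Euro for a win, -1 Euro for a loss). The following self-contained Python/NumPy sketch, with illustrative names, recovers the greedy strategy $G = (2, 2, 1, 2, 2, 1)^T$:

```python
import numpy as np

# win probabilities of each game in each of the six states of Example 8
p_win = np.array([
    [1/2, 1/2, 1/2, 1/2, 1/2, 1/2],       # game 1: fair coin in every state
    [3/4, 3/4, 1/10, 3/4, 3/4, 1/10],     # game 2: 1/10 when the capital is divisible by 3
])
y = p_win * 1 + (1 - p_win) * (-1)        # one-step expected payoffs y_i^(k)
greedy = 1 + np.argmax(y, axis=0)         # 1-based game indices: [2, 2, 1, 2, 2, 1]
```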

5 Finding an Optimal Strategy

Since we are able to evaluate the payoff for a strategy, there is a simple method to find the best one: Evaluate the payoff for every strategy under consideration, and remember the best one. This approach is taken in Ekhad's and Zeilberger's MAPLE package PARRONDO [2] to find the strategy of length n that maximizes the asymptotic payoff (also called the best periodic strategy of length n). This approach requires examining all $r^n$ strategies, which becomes infeasible even for moderate values of r and n. Thus more efficient algorithms are highly desirable. In Section 5.1, we present an algorithm that finds the best adaptive strategy of length n for a game sequence of length N = n, which seems to us the most practically relevant case. The algorithm runs in $O(r\ell^2 N)$ time. Currently, no efficient algorithms are known for the remaining five optimization problems posed in Section 3.
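For completeness, the exhaustive approach is trivial to sketch; the snippet below (Python, illustrative names) enumerates all $r^n$ fixed strategies and keeps the best one under a supplied evaluation routine, for example asymptotic_payoff_fixed from the Section 4.1 sketch. It is only usable for small r and n.

```python
from itertools import product

def best_fixed_strategy_exhaustive(P, x, n, evaluate):
    """Best length-n fixed strategy (0-based game indices) under the given evaluation function."""
    r = len(P)
    return max(product(range(r), repeat=n), key=lambda g: evaluate(P, x, g))

# e.g. best_fixed_strategy_exhaustive(P, x, 5, asymptotic_payoff_fixed)
```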

5.1 Finding the Best Adaptive Strategy

To state the algorithm, we use the following quantities: As in Section 4.3, let $y^{(k)}_i$ ($i = 1, \ldots, \ell$; $k = 1, \ldots, r$) denote the expected payoff after one step when starting in state i and choosing game k, i.e.,

$$ y^{(k)}_i := \sum_{j=1}^{\ell} P^{(k)}_{i,j} x^{(k)}_j = \langle e_i P^{(k)}, x^{(k)} \rangle. $$

Let $Y^{(m)}_i$ denote the maximal expected payoff among all adaptive strategies of length m when starting in state i. The algorithm is based on the fact that the values $Y^{(m)}_i$ can be obtained efficiently from the values $Y^{(m-1)}_j$ by the following lemma.

Lemma 9. We have $Y^{(0)}_i = 0$ for all $i = 1, \ldots, \ell$, and for $m \ge 1$,

$$ Y^{(m)}_i = \max_{k=1,\ldots,r} \Bigl( y^{(k)}_i + \sum_{j=1}^{\ell} P^{(k)}_{i,j} Y^{(m-1)}_j \Bigr). \qquad (1) $$

Proof. The statement for m = 0 is certainly true. When m = 1, the statement becomes $Y^{(1)}_i = \max_{k=1,\ldots,r} y^{(k)}_i$, which is the expected payoff achieved by the greedy strategy; this is obviously optimal when only one step is to be played. Assuming that the statement is true for m - 1, we show that it is also true for m. Assume that there is an adaptive strategy G with expected payoff $Y > Y^{(m)}_i$ when starting in i, and let $k := G_{i,1}$ be the game that is played in the first step according to this strategy. The expected payoff in this step is $y^{(k)}_i$, and one moves to state j with probability $P^{(k)}_{i,j}$. By induction, we know that from state j, strategy G achieves at most expected payoff $Y^{(m-1)}_j$. It follows that

$$ y^{(k)}_i + \sum_{j=1}^{\ell} P^{(k)}_{i,j} Y^{(m-1)}_j \ge Y $$

for this choice of k, contradicting the assumption that Y is larger than the maximum over all k of these values.

As a consequence, an optimal adaptive strategy G of length m can be built from an optimal adaptive strategy of length m - 1 by prepending a new first column. Let $K_{i,m}$ denote the value of k that achieves the maximum in Equation (1). Then, to achieve maximal expected payoff, one must play game $K_{i,m}$ in state i when m steps remain. In other words, the optimal adaptive strategy G of length N is given by $G_{i,t} := K_{i,N-t+1}$. To compute one column of K requires $O(r\ell^2)$ time. Therefore, to compute the optimal adaptive strategy of length N requires $O(r\ell^2 N)$ time. In fact, all optimal strategies of lengths 1, ..., N are computed in the process, which means that only $O(r\ell^2)$ time per strategy is required. The expected payoff for the optimal strategy G of length N is immediately available for any start distribution $\pi$ as

$$ E[X(G, N)] = \sum_{i=1}^{\ell} \pi_i Y^{(N)}_i. $$
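Lemma 9 translates directly into a backward-induction procedure. The sketch below (Python/NumPy, illustrative names, 0-based game indices, not the paper's MATLAB implementation) computes the table K and the optimal adaptive strategy G for a horizon of N games, together with the value vector $Y^{(N)}$:

```python
import numpy as np

def optimal_adaptive_strategy(P, x, N):
    """Optimal adaptive strategy for N games: returns G (ell x N, 0-based) and Y^(N)."""
    r, ell = len(P), len(x[0])
    y = np.array([Pk @ xk for Pk, xk in zip(P, x)])       # y[k, i] = y_i^(k)
    Y = np.zeros(ell)                                     # Y^(0) = 0
    K = np.zeros((ell, N), dtype=int)                     # K[i, m-1]: best game with m steps left
    for m in range(1, N + 1):
        vals = np.array([y[k] + P[k] @ Y for k in range(r)])  # candidates of Eq. (1), shape (r, ell)
        K[:, m - 1] = np.argmax(vals, axis=0)
        Y = vals.max(axis=0)                              # Y^(m)
    G = K[:, ::-1]                                        # G[i, t] = K[i, N-1-t]: play K_{i,m} when m steps remain
    return G, Y

# The expected payoff of the optimal strategy for a start distribution `start` is start @ Y.
```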

6 Conclusion

While the fact that several losing strategies can be combined to form a winning strategy has become known as Parrondo's paradox, we have seen that there is in fact nothing paradoxical about this. Maybe the most surprising fact is that even by using an arbitrary randomized strategy of length 1, a positive asymptotic payoff is expected in Example 8. A number of interesting algorithmic problems on Markov chains are motivated by this paradox. We have given efficient solutions to all evaluation problems and to the optimization problem for adaptive strategies of finite length.

The algorithms described in Sections 4 and 5 have been implemented as MATLAB functions. We have also implemented an exhaustive search method that finds the best fixed strategy of length n for a game sequence of length N, and a function that finds the asymptotically best fixed strategy of length at most n. The MATLAB package is available for download from the author's homepage http://www.molgen.mpg.de/~rahmann. We would like to encourage the readers of this article to experiment with games of the Parrondo type using the available MATLAB routines.

It would be desirable to find efficient (approximation) algorithms for the remaining optimization problems. Furthermore, it may not always be a good idea to maximize the expected payoff. For example, the strategy with maximal expected payoff could give a very large payoff with small probability and small payoffs with large probability. Therefore, it is also interesting to look for a strategy that maximizes the probability of a positive payoff.

Acknowledgments. This work was inspired by a talk of Prof. Erhard Behrends (FU Berlin) during the German Open Conference on Probability and Statistics 2002 in Magdeburg. I wish to thank Martin Vingron for allowing me to take a break in my PhD work to study the Parrondo paradox.

References
[1] Sandra Blakeslee. Paradox in game theory: Losing strategy that wins. In: The New York Times, January 25th, 2000.


[2] Shalosh B. Ekhad and Doron Zeilberger. Remarks on the Parrondo paradox. In: The Personal Journal of Ekhad and Zeilberger, October 2000. URL: http://www.math.temple.edu/~zeilberg/mamarim/mamarimhtml/parrondo.html.

[3] G. P. Harmer and D. Abbott. Game theory: Losing strategies can win by Parrondo's paradox. Nature, 402:864, 1999.

[4] G. P. Harmer and D. Abbott. Parrondo's paradox. Statistical Science, 14(2):206-213, 1999.

[5] G. P. Harmer, D. Abbott, and P. Taylor. The paradox of Parrondo's games. Proceedings of the Royal Society A, 456(1994):247-259, 2000.

