# International Journal of Foundations of Computer Science Vol. 23, No. 3 (2012) 585–608 © World Scientific Publishing Company DOI: 10.1142/S0129054112400291

USING STRATEGY IMPROVEMENT TO STAY ALIVE∗

LUBOŠ BRIM† and JAKUB CHALOUPKA‡
Faculty of Informatics, Masaryk University, Botanická 68a, 60200 Brno, Czech Republic
† brim@fi.muni.cz ‡ xchalou1@fi.muni.cz

Received 25 October 2010
Accepted 22 June 2011
Communicated by Margherita Napoli

We design a novel algorithm for solving Mean-Payoff Games (MPGs). Besides solving an MPG in the usual sense, our algorithm computes more information about the game, information that is important with respect to applications. The weights of the edges of an MPG can be thought of as a gained/consumed energy, depending on the sign. For each vertex, our algorithm computes the minimum amount of initial energy that is sufficient for player Max to ensure that in a play starting from the vertex, the energy level never goes below zero. Our algorithm is not the first algorithm that computes the minimum sufficient initial energies, but according to our experimental study it is the fastest algorithm that computes them. The reason is that it utilizes the strategy improvement technique, which is very efficient in practice.

Keywords: Mean-payoff games; strategy improvement; experimental evaluation.

1. Introduction

A Mean-Payoff Game (MPG) [13, 16, 20] is a two-player infinite game played on a finite weighted directed graph, the vertices of which are divided between the two players. A play starts by placing a token on some vertex and the players, named Max and Min, move the token along the edges of the graph ad infinitum. If the token is on Max's vertex, he chooses an outgoing edge and the token goes to the destination vertex of that edge. If the token is on Min's vertex, it is her turn to choose an outgoing edge. Roughly speaking, Max wants to maximize the average weight of the traversed edges whereas Min wants to minimize it. It was proved in [13] that each vertex v has a value, denoted by ν(v), which each player can secure by a positional strategy, i.e., a strategy that always chooses the same outgoing edge in the same vertex. To solve an MPG is to find the values of all vertices, and, optionally, also strategies that secure the values.
∗ This work has been partially supported by the Grant Agency of the Czech Republic, grants No. 201/09/1389 and 102/09/H042.


In this paper we deal with MPGs with other than the standard average-weight goal. Player Max now wants the sum of the weights of the traversed edges, plus some initial value (initial "energy"), to be non-negative at each moment of the play. He also wants to know the minimal sufficient amount of initial energy that enables him to stay non-negative. For different starting vertices, the minimal sufficient initial energy may be different, and for starting vertices with ν < 0, it is impossible to stay non-negative with an arbitrarily large amount of initial energy. The problem of computation of the minimal sufficient initial energies has been studied under different names by Chakrabarti et al. [6], Lifshits and Pavlov [18], and Bouyer et al. [2]. In [6] it was called the problem of pure energy interfaces, in [18] it was called the problem of potential computation, and in [2] it was called the lower-bound problem. The paper [2] also contains the definition of a similar problem, the lower-weak-upper-bound problem. An instance of this problem contains, besides an MPG, also a bound b. The goal is the same, Max wants to know how much initial energy he needs to stay non-negative forever, but now the energy level is bounded from above by b and during the play, all increases above this bound are immediately truncated.

Various resource scheduling problems for which the standard solution of an MPG is not useful can be formulated as the lower-bound or the lower-weak-upper-bound problems, which extends the applicability of MPGs. For example, an MPG can be used to model a robot in a hostile environment. The weights of edges represent changes in the remaining battery capacity of the robot: positive edges represent recharging, negative edges represent energy-consuming actions. The bound b is the maximum capacity of the battery. Player Max chooses the actions of the robot and player Min chooses the actions of the hostile environment. By solving the lower-weak-upper-bound problem, we find out if there is some strategy of the robot that allows it to survive in the hostile environment, i.e., its remaining battery capacity never goes below zero, and if there is such a strategy, we also get the minimum initial remaining battery capacity that allows it to survive.

The first algorithm solving the lower-bound problem was proposed by Chakrabarti et al. [6] and it is based on value iteration. The algorithm can also be easily modified to solve the lower-weak-upper-bound problem. The value iteration algorithm was later improved by Chaloupka and Brim in [8], and independently by Doyen, Gentilini, and Raskin [12]. An extended version of [8, 12] was recently published [5]. Henceforward we will use the term "value iteration" (VI) to denote only the improved version from [8, 12]. The algorithms of Bouyer et al. [2] that solve the two problems are essentially the same as the original algorithm from [6]. However, [2] focuses mainly on other problems than the lower-bound and the lower-weak-upper-bound problems for MPGs. A different approach to solving the lower-bound problem was proposed by Lifshits and Pavlov [18], but their algorithm has exponential space complexity, and so it is not appropriate for practical use. VI seems to be the best known approach to solving the two problems.

In this paper, we design a novel algorithm based on the strategy improvement technique, suitable for practical solving of the lower-bound and the lower-weak-upper-bound problems for large MPGs. We call our algorithm "Keep Alive Strategy Improvement" (KASI). It solves both the lower-bound and the lower-weak-upper-bound problem. The use of the strategy improvement technique for solving MPGs goes back to the algorithm of Hoffman and Karp from 1966 [17]. Their algorithm can be used to solve only a restricted class of MPGs, but strategy improvement algorithms for solving MPGs in general exist as well [1, 11, 19]. However, all of them solve neither the lower-bound nor the lower-weak-upper-bound problem (cf. Section 4, first part, last paragraph). Our algorithm is the first solution of this kind. Moreover, as each algorithm that solves the lower-bound problem also divides the vertices of an MPG into those with ν ≥ 0 and those with ν < 0, KASI can be thought of as an algorithm that also solves MPGs in the usual sense. As a by-product of the design of KASI, we describe a way to construct an optimal strategy for Min with respect to the lower-weak-upper-bound problem, which can be used to compute the exact ν values of all vertices. Moreover, we improved the complexity of BV and proved that Min may not have a positional strategy that is also optimal with respect to the lower-weak-upper-bound problem.

Another contribution of this paper is a further improvement of VI. The shortcoming of VI is that it takes enormous time on MPGs with at least one vertex with ν < 0, and it is also not helpful for solving the lower-weak-upper-bound problem for small bound b. A natural way to alleviate this problem is to find the vertices with ν < 0 by some fast algorithm and run VI on the rest. Based on our previous experience with algorithms for solving MPGs [7], we selected two algorithms for computation of the set of vertices with ν < 0, namely the algorithm of Björklund and Vorobyov [1] (BV) and the algorithm of Schewe [19] (SW). This gives us two algorithms: VI + BV and VI + SW. However, the preprocessing is not helpful on MPGs with all vertices with ν ≥ 0. Therefore, we also study the algorithm VI without the preprocessing.

Our new algorithm based on the strategy improvement technique that we propose in this paper has the complexity O(|V| · (|V| · log |V| + |E|) · W), where W is the maximal absolute edge-weight. It is slightly worse than the complexity of VI, the same as the complexity of VI + BV, and better than the complexity of VI + SW. To evaluate and compare the algorithms VI, VI + BV, VI + SW, and KASI, we implemented them and carried out an experimental study. According to the study, KASI is the best algorithm. This paper can be viewed as an extended version of [3].

2. Preliminaries

A Mean-Payoff Game (MPG) [13, 16, 20] is given by a triple Γ = (G, VMax, VMin), where G = (V, E, w) is a finite weighted directed graph such that V is a disjoint union of the sets VMax and VMin, w : E → Z is the weight function, and each v ∈ V has out-degree at least one.

The game is played by two opposing players, named Max and Min. A play starts by placing a token on some given vertex and the players then move the token along the edges of G ad infinitum. If the token is on vertex v ∈ VMax, Max moves it. If the token is on vertex v ∈ VMin, Min moves it. This way an infinite path p = (v0, v1, v2, ...) is formed. Max's aim is to maximize his gain: lim inf_{n→∞} (1/n) · Σ_{i=0}^{n−1} w(vi, vi+1), and Min's aim is to minimize her loss: lim sup_{n→∞} (1/n) · Σ_{i=0}^{n−1} w(vi, vi+1). For each vertex v ∈ V, we define its value, denoted by ν(v), as the maximal gain that Max can ensure if the play starts at vertex v. It was proved that it is equal to the minimal loss that Min can ensure. Moreover, both players can ensure ν(v) by using positional strategies, as defined below [13].

A (general) strategy of Max is a function σ : V∗ · VMax → V such that for each finite path p = (v0, ..., vk) with vk ∈ VMax, it holds that (vk, σ(p)) ∈ E. Recall that each vertex has out-degree at least one, and so the definition of a strategy is correct. The set of all strategies of Max in Γ is denoted by ΣΓ. We say that an infinite path p = (v0, v1, v2, ...) agrees with the strategy σ ∈ ΣΓ if for each vi ∈ VMax, σ(v0, ..., vi) = vi+1. A strategy π of Min is defined analogously. The set of all strategies of Min in Γ is denoted by ΠΓ. Given an initial vertex v ∈ V, the outcome of two strategies σ ∈ ΣΓ and π ∈ ΠΓ is the (unique) infinite path outcomeΓ(v, σ, π) = (v = v0, v1, v2, ...) that agrees with both σ and π.

The strategy σ ∈ ΣΓ is called a positional strategy if σ(p) = σ(p′) for all finite paths p = (v0, ..., vk) and p′ = (v′0, ..., v′k′) such that vk = v′k′ ∈ VMax. For the sake of simplicity, we think of a positional strategy of Max as a function σ : VMax → V such that (v, σ(v)) ∈ E, for each v ∈ VMax. The set of all positional strategies of Max in Γ is denoted by ΣΓM. A positional strategy π of Min is defined analogously, and the set of all positional strategies of Min in Γ is denoted by ΠΓM. We define Gσ, the restriction of G to σ, as the graph (V, Eσ, wσ), where Eσ = {(u, v) ∈ E | u ∈ VMin ∨ σ(u) = v}, and wσ = w ↾ Eσ. That is, we get Gσ from G by deleting all the edges emanating from Max's vertices that do not follow σ. For σ ∈ ΣΓM, we also define Γσ = (Gσ, VMax, VMin). Gπ and Γπ for a strategy π of Min are defined analogously.

The lower-bound problem for an MPG Γ = (G = (V, E, w), VMax, VMin) is the problem of finding lbΓ(v) ∈ N0 ∪ {∞} for each v ∈ V, such that:

lbΓ(v) = min{ x ∈ N0 | (∃σ ∈ ΣΓ)(∀π ∈ ΠΓ)( outcomeΓ(v, σ, π) = (v = v0, v1, v2, ...) ∧ (∀n ∈ N)(x + Σ_{i=0}^{n−1} w(vi, vi+1) ≥ 0) ) }

where the minimum of an empty set is ∞. That is, lbΓ(v) is the minimal sufficient amount of initial energy that enables Max to keep the energy level non-negative forever, if the play starts from v. If lbΓ(v) ∈ N0, then Max wins from v. If lbΓ(v) = ∞, then we say that Max loses from v, because an arbitrarily large amount of initial energy is not sufficient, which means that ν(v) < 0.
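To make the definitions above concrete, here is a small illustrative sketch (ours, not the paper's; all names are hypothetical) of an MPG encoding and of the restriction Gσ induced by a positional strategy σ of Max:

```python
from dataclasses import dataclass

@dataclass
class MPG:
    """A hypothetical encoding of Gamma = (G, V_Max, V_Min).
    Edges are adjacency lists: vertex -> list of (successor, weight)."""
    v_max: set   # vertices controlled by Max
    v_min: set   # vertices controlled by Min
    edges: dict  # integer edge weights, each vertex has out-degree >= 1

    def restrict(self, sigma):
        """Restriction G_sigma: at each Max vertex keep only the edge
        chosen by the positional strategy sigma: V_Max -> V; Min's
        edges are kept unchanged."""
        new_edges = {}
        for v, out in self.edges.items():
            if v in self.v_max:
                new_edges[v] = [(u, w) for (u, w) in out if sigma[v] == u]
            else:
                new_edges[v] = list(out)
        return MPG(self.v_max, self.v_min, new_edges)
```

For instance, if Max's vertex v0 has two outgoing edges, the restriction by σ(v0) = v1 keeps only the edge to v1.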

The strategy σ ∈ ΣΓ is an optimal strategy of Max with respect to the lower-bound problem, if it ensures that for each v ∈ V such that lbΓ(v) < ∞, lbΓ(v) is a sufficient amount of initial energy. The strategy π ∈ ΠΓ is an optimal strategy of Min with respect to the lower-bound problem, if it ensures that for each v ∈ V such that lbΓ(v) = ∞, Max loses, and for each v ∈ V such that lbΓ(v) < ∞, Max needs at least lbΓ(v) units of initial energy.

It was proved in [2] that both for the lower-bound problem and the lower-weak-upper-bound problem (defined below) Max can restrict himself only to positional strategies. That is, he always has a positional strategy that is also optimal. Therefore, we could use the set ΣΓM instead of the set ΣΓ in the definitions of both the lower-bound problem and the lower-weak-upper-bound problem. However, Min cannot restrict herself to positional strategies. This will be shown later.

The lower-weak-upper-bound problem for an MPG Γ = (G = (V, E, w), VMax, VMin) and a bound b ∈ N0 is the problem of finding lwubΓb(v) ∈ N0 ∪ {∞} for each v ∈ V, such that:

lwubΓb(v) = min{ x ∈ N0 | (∃σ ∈ ΣΓ)(∀π ∈ ΠΓ)( outcomeΓ(v, σ, π) = (v = v0, v1, v2, ...) ∧ (∀n ∈ N)(x + Σ_{i=0}^{n−1} w(vi, vi+1) ≥ 0) ∧ (∀n1, n2 ∈ N0)(n1 < n2 ⇒ Σ_{i=n1}^{n2−1} w(vi, vi+1) ≥ −b) ) }

where the minimum of an empty set is ∞. That is, lwubΓb(v) is the minimal sufficient amount of initial energy that enables Max to keep the energy level non-negative forever, under the additional condition that the energy level is truncated to b whenever it exceeds the bound. The additional condition is equivalent to the condition that the play does not contain a segment of weight less than −b. If lwubΓb(v) ∈ N0, then Max wins from v. If lwubΓb(v) = ∞, then we say that Max loses from v, because an arbitrarily large amount of initial energy is not sufficient.

In the rest of the paper, we will focus only on the lower-weak-upper-bound problem, because it includes the lower-bound problem as a special case. The reason is that for each v ∈ V such that lbΓ(v) < ∞, it holds that lbΓ(v) ≤ (|V| − 1) · W, where W is the maximal absolute edge-weight in G [2]. Therefore, if we choose b = (|V| − 1) · W, then for each v ∈ V, lbΓ(v) = lwubΓb(v). Optimal strategies for Max and Min with respect to the lower-weak-upper-bound problem are defined in the same way as for the lower-bound problem.

Let G = (V, E, w) be a weighted directed graph, let p = (v0, ..., vk) be a path in G, and let c = (u0, ..., ur−1, ur = u0) be a cycle in G. Then w(p), the weight of p, and l(p), the number of edges in p, are defined in the following way: w(p) = Σ_{i=0}^{k−1} w(vi, vi+1) and l(p) = k. Similarly, w(c), the weight of c, and l(c), the number of edges in c, are defined as w(c) = Σ_{i=0}^{r−1} w(ui, ui+1) and l(c) = r. The set of all (finite) paths in G is denoted by pathG, the set of all cycles in G is denoted by cycleG, and finally the set of all infinite paths in G is denoted by pathG∞.
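On a tiny game, the lwubΓb values can be computed directly from the definition by a least-fixpoint iteration in the spirit of the value-iteration approach of [6] and [2]. The following sketch is ours (hypothetical names, suitable only for small games): at a Max vertex the required energy is the minimum over outgoing edges, at a Min vertex the maximum, and any requirement exceeding the bound b is treated as ∞.

```python
import math

def lwub(v_max, v_min, edges, b):
    """Least-fixpoint sketch of lwub^Gamma_b: f(v) is the minimal initial
    energy required at v. Via an edge (v, u) with weight w, the requirement
    is max(0, f(u) - w); values above the bound b are infeasible (infinity).
    Max vertices minimize the requirement, Min vertices maximize it.
    Kleene iteration from f = 0 converges because values only grow and are
    bounded by b before jumping to infinity."""
    INF = math.inf
    f = {v: 0 for v in edges}
    changed = True
    while changed:
        changed = False
        for v, out in edges.items():
            def need(u, w):
                r = max(0, f[u] - w)
                return INF if r > b else r
            cand = [need(u, w) for (u, w) in out]
            new = min(cand) if v in v_max else max(cand)
            if new != f[v]:
                f[v] = new
                changed = True
    return f
```

For example, on a two-vertex cycle with weights −3 and +3 (cycle weight 0), Max needs 3 units of initial energy before traversing the −3 edge; with a bound b < 3 the requirement becomes infinite.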

A suffix of a path p = (v0, ..., vk) ∈ pathG is a path (vi, ..., vk), where i ∈ {0, ..., k}. The set of all suffixes of p is denoted by suffix(p). A prefix of p is a path (v0, ..., vi), where i ∈ {0, ..., k}. The set of all prefixes of p is denoted by prefix(p). A segment of p is a path (vi, ..., vj), where i, j ∈ {0, ..., k} and i ≤ j. The set of all segments of p is denoted by segment(p). The definitions of suffixes, prefixes, and segments naturally extend to infinite paths. In particular, a suffix of an infinite path p = (v0, v1, v2, ...) ∈ pathG∞ is a path (vi, vi+1, vi+2, ...), where i ∈ N0. A segment of c is a path (ui, ..., uj), where i, j ∈ {0, ..., r − 1} and the index j + 1 is taken modulo r, i.e., if uj ∈ c, then uj+1 is the vertex following uj in c. Please note that we do not require i ≤ j, because there is a path in c between any two vertices in c. The set of all segments of c is denoted by segment(c).

Let B ⊆ V. If we say that "p is a path from v to B", we mean a path with the last vertex and only the last vertex in B, formally: p = (v = v0, ..., vk), where v0, ..., vk−1 ∈ V \ B ∧ vk ∈ B. We denote the set of all paths from v to B by pathG(v, B). For A ⊆ V, a path from A to B is a path from v to B such that v ∈ A. We denote the set of all paths from A to B by pathG(A, B). The term "longest" in connection with paths always refers to the weights of the paths, not the numbers of edges.

Operations on vectors of the same dimension are element-wise. For example, if d0 and d1 are two vectors of dimension |V|, then d0 < d1 means that for each v ∈ V, d0(v) ≤ d1(v), and for some v ∈ V, d0(v) < d1(v).

Let G = (V, E, w) be a graph and let D ⊆ V. Then G(D) is the subgraph of G induced by the set D, i.e., G(D) = (D, E ∩ D × D, w ↾ D × D). Let Γ = (G = (V, E, w), VMax, VMin) be an MPG and let D ⊆ V. We also define the restriction of Γ induced by D: Γ(D) = (G′(D), VMax ∩ D, VMin ∩ D), where G′(D) is the graph G(D) with negative self-loops added to all vertices with zero out-degree. Since some vertices might have zero out-degree in G(D), we make the vertices with zero out-degree in G(D) losing for Max in Γ(D) with respect to the lower-weak-upper-bound problem.

For the whole paper let Γ = (G = (V, E, w), VMax, VMin) be an MPG and let W = max_{e∈E} |w(e)| be the maximal absolute edge-weight in G.

3. The Algorithm

A high-level description of our Keep Alive Strategy Improvement algorithm (KASI) for the lower-weak-upper-bound problem is as follows. Let (Γ, b) be an instance of the lower-weak-upper-bound problem. KASI maintains a vector d ∈ (Z ∪ {−∞})V such that −d ≥ 0 is always a lower estimate of lwubΓb, i.e., −d ≤ lwubΓb. The vector d is gradually decreased, and so −d is increased, until −d = lwubΓb. The reason why KASI maintains the vector d rather than −d is that d contains weights of certain paths, and we find it more natural to keep them as they are than to keep their opposite values. The algorithm also maintains a set D of vertices such that about the vertices in V \ D it already knows that they have infinite lwubΓb value.
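The path-weight and suffix notions just defined are straightforward to compute. The following small sketch (ours, with hypothetical names) evaluates w(p) and checks the "every suffix has weight ≥ −b" condition that appears in the lower-weak-upper-bound problem:

```python
def path_weight(w, path):
    """Weight w(p) of a finite path p = (v0, ..., vk): the sum of the
    weights of its edges; w maps (source, target) pairs to integers."""
    return sum(w[(path[i], path[i + 1])] for i in range(len(path) - 1))

def every_suffix_at_least(w, path, threshold):
    """Check the side condition of the lower-weak-upper-bound problem:
    every suffix of the path has weight >= threshold (call with
    threshold = -b). A suffix starts at each position of the path."""
    return all(path_weight(w, path[i:]) >= threshold
               for i in range(len(path)))
```

For a path with edge weights −10 and 2, the suffix weights are −8, 2, and 0, so the condition holds for b = 15 but fails for b = 5.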

Initially, d = 0 and D = V. KASI starts with an arbitrary strategy π ∈ ΠΓM and then iteratively improves it until no improvement is possible. In each iteration, the current strategy is first evaluated and then improved. The strategy evaluation examines the graph Gπ(D) and updates the vector d so that for each v ∈ D, it holds that −d(v) = lwubΓπ(D)b(v), i.e., it solves the lower-weak-upper-bound problem in the restricted game Γπ(D), where Min has no choices. The vertices with the d value equal to −∞ are removed from the set D, because if a vertex from D has outgoing edges only to V \ D, then it is losing for Max in Γ. This explains why the restricted game was defined the way it was.

To improve the current strategy, the algorithm checks whether for some edge (v, u) ∈ E such that v ∈ VMin and d(v) > −∞ it holds that d(v) > d(u) + w(v, u). This is called a strategy improvement condition. Such an edge indicates that −d(v) is not a sufficient initial energy at v, because traversing the edge (v, u) and continuing from u costs at least −w(v, u) − d(u) units of energy, which is greater than −d(v) (recall that −d is a lower estimate of lwubΓb). For each vertex v ∈ VMin such that there is an edge (v, u) ∈ E with d(v) > d(u) + w(v, u), π(v) is switched to u. If v has more than one such edge emanating from it, any of them is acceptable. If there are edges satisfying the condition, another iteration of KASI is started; the vector d is again decreased by the strategy evaluation, and we get a better estimate of lwubΓb. If no such edge exists, the algorithm terminates, because it then holds for each vertex v ∈ V that −d(v) = lwubΓb(v).

Detailed description of the algorithm follows. In Figure 1, there is a pseudo-code of the strategy evaluation part of our algorithm. The input to the procedure consists of four parts. The first and the second part form the lower-weak-upper-bound problem instance that the main algorithm KASI is solving, i.e., the MPG Γ and the bound b ∈ N0. The third part is the strategy π ∈ ΠΓM that we want to evaluate, and the fourth part of the input is a vector d−1 ∈ (Z ∪ {−∞})V, computed for the previous strategy, or, in case of the first iteration of KASI, set by initialization to a vector of zeros. The vector d−1 is such that −d−1 is a lower estimate of lwubΓb. Let A = {v ∈ V | d−1(v) = 0} and D = {v ∈ V | d−1(v) > −∞}. Since the strategy π is either the first strategy or an improvement of the previous strategy, the following conditions are prerequisites for the procedure EvaluateStrategy():

(i) Each cycle in Gπ(D \ A) is negative.
(ii) For each v ∈ D \ A, it holds that d−1(v) < 0, and for each edge (v, u) emanating from v in Gπ, i.e., (v, u) ∈ Eπ, it holds that d−1(v) ≥ d−1(u) + w(v, u).

From these technical conditions it follows that −d−1 is also a lower estimate of lwubΓπ(D)b, and the purpose of the strategy evaluation procedure is to decrease the vector d−1 so that the resulting vector d satisfies −d = lwubΓπ(D)b.

To see why it follows from (i.) and (ii.) that −d−1 ≤ lwubΓπ(D)b, consider a path p = (v0, ..., vk) from D \ A to A in Gπ(D). From (ii.) it follows that for each j ∈ {0, ..., k − 1}, it holds that d−1(vj) ≥ d−1(vj+1) + w(vj, vj+1). If we sum the inequalities, we get d−1(v0) ≥ d−1(vk) + w(p). Since vk ∈ A, d−1(vk) = 0 and the inequality becomes d−1(v0) ≥ w(p). Therefore, each infinite path in Gπ(D) starting from v0 ∈ D and containing a vertex from A has a prefix of weight less than or equal to d−1(v0). Furthermore, if an infinite path does not contain a vertex from A, then, because by (i.) all cycles in Gπ(D \ A) are negative, the weights of its prefixes cannot even be bounded from below. All in all, −d−1 ≤ lwubΓπ(D)b.

The output of the strategy evaluation procedure is a vector d ∈ (Z ∪ {−∞})V such that for each v ∈ D, it holds that −d(v) = lwubΓπ(D)b(v). The strategy evaluation works only with the restricted graph Gπ(D), and it is based on the fact that if we have the set Bz = {v ∈ D | lwubΓπ(D)b(v) = 0}, the set of vertices where Max does not need any initial energy to win, then we can compute lwubΓπ(D)b of the remaining vertices by computing the weights of the longest paths to the set Bz. More precisely, for each vertex v ∈ D \ Bz, lwubΓπ(D)b(v) is equal to the absolute value of the weight of a longest path from v to Bz in Gπ(D) such that the weight of each suffix of the path is greater or equal to −b. If each path from v to Bz has a suffix of weight less than −b, or no path from v to Bz exists, then lwubΓπ(D)b(v) = ∞.

To get some idea about why this holds, consider a play winning for Max. The energy level never drops below zero in the play, and so there must be a moment from which onwards the energy level never drops below the energy level of that moment. Therefore, in order to win, Max has to get to some vertex in Bz without exhausting all of his energy, and so Bz is not empty. (Please note that Max does not need any initial energy to win a play starting from a vertex of Bz, since Min has no choices in Γπ(D).) All paths from D \ Bz to Bz must be negative (otherwise Bz would be larger), and so the minimal energy to get to Bz is the absolute value of the weight of a longest path to Bz such that the weight of each suffix of the path is greater or equal to −b. All in all, the minimal sufficient energy to win is the minimal energy that Max needs to get to some vertex in Bz. If no path to Bz exists or all such paths have suffixes of weight less than −b, then Max cannot win.

Initially, the procedure over-approximates the set Bz by the set B0 of vertices v with d−1(v) = 0 that have an edge (v, u) with w(v, u) − d−1(v) + d−1(u) ≥ 0 emanating from them (line 2), and then it iteratively removes vertices from the set until it arrives at the correct set Bz. Recall that D = {v ∈ V | d−1(v) > −∞}. The conditions (i.) and (ii.) trivially hold in the first iteration of the main algorithm, i.e., for d−1 = 0. In each subsequent iteration, d−1 is taken from the output of the previous iteration, and the fact that it satisfies the conditions will be shown later. The vector −di is always a lower estimate of lwubΓπ(D)b.

During the execution of the procedure, the vector di only decreases, and so −di only increases; in particular, it always holds that −di ≤ lwubΓπ(D)b, until finally −di = lwubΓπ(D)b. Only vertices v with di(v) = 0 are candidates for the final set Bz, and vertices from V \ D have di equal to −∞.

proc EvaluateStrategy(Γ, b, π, d−1)
1  i := 0
2  B0 := {v ∈ V | d−1(v) = 0 ∧ max_{(v,u)∈Eπ}(w(v, u) − d−1(v) + d−1(u)) ≥ 0}
3  while i = 0 ∨ Bi−1 ≠ Bi do
4    di := Dijkstra(Gπ, Bi, b, di−1)
5    i := i + 1
6    Bi := Bi−1 \ {v | max_{(v,u)∈Eπ}(w(v, u) − di−1(v) + di−1(u)) < 0}
7  od
8  return di−1
9  end

Fig. 1. Evaluation of a strategy.

In each iteration, the procedure uses a variant of the Dijkstra's algorithm to compute the weights of the longest paths from all vertices to Bi on line 4. The weights of the longest paths are assigned to di; for each v ∈ Bi, di(v) = 0. The Dijkstra's algorithm is modified so that it assigns −∞ to each v ∈ D such that each path from v to Bi has a suffix of weight less than −b; the vertices from which Bi is not reachable, or is reachable only via such paths, also have di equal to −∞. Since Bi is an over-approximation of Bz, the absolute values of the weights of the longest paths are a lower estimate of lwubΓπ(D)b. On line 5, the variable i is increased (thus the current longest path weights are now in di−1), and on line 6, the vertices v with di−1(v) = 0 such that each edge (v, u) emanating from them satisfies w(v, u) − di−1(v) + di−1(u) < 0 are removed from the set of candidates. The reason is that since di−1(v) = 0, the inequality can be developed to −w(v, u) − di−1(u) > 0, and so if the edge (v, u) is chosen in the first step, then more than zero units of initial energy are needed at v.

Since edge-weights are arbitrary integers, and the Dijkstra's algorithm requires all edge-weights be non-positive (please note that we are computing longest paths), we apply a potential transformation on them to make them non-positive. As vertex potentials we use di−1, which contains the longest path weights computed in the previous iteration, or, in case i = 0, the vector d−1, which is given as input. The transformed weight of an edge (x, y) is w(x, y) − di−1(x) + di−1(y), which is always non-positive for the relevant edges. In the first iteration of the main algorithm this follows from the condition (ii.), and in the subsequent iterations it follows from the properties of longest path weights and the fact that only vertices with all outgoing edges negative with the potential transformation are removed from the candidate set. A detailed description of Dijkstra() can be found in the Appendix. Another iteration is started only if Bi ≠ Bi−1.
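For illustration, the longest-path computation with the −b suffix restriction can also be sketched with a Bellman-Ford style relaxation instead of the potential-transformed Dijkstra that the paper uses (Dijkstra is asymptotically faster; this sketch is ours, with hypothetical names, and it assumes, as in the paper, that all cycles avoiding the target set are negative, so longest paths are simple):

```python
import math

def longest_paths_to(edges, targets, b):
    """Longest-path weights d(v) to the target set B (d = 0 on B), in a
    graph whose cycles outside B are all negative. Candidate values whose
    suffix would weigh less than -b are discarded, matching the side
    condition of the lower-weak-upper-bound problem. Vertices that cannot
    reach B under that restriction keep d = -inf."""
    d = {v: (0 if v in targets else -math.inf) for v in edges}
    n = len(d)
    for _ in range(n - 1):          # simple longest paths settle in n-1 rounds
        for v, out in edges.items():
            if v in targets:
                continue            # a path to B ends on its first visit to B
            best = -math.inf
            for (u, w) in out:
                cand = w + d[u]
                if cand >= -b:      # the suffix starting at v must weigh >= -b
                    best = max(best, cand)
            d[v] = max(d[v], best)
    return d
```

Maximizing the suffix weight is compatible with the restriction: a larger suffix weight can only help satisfy the ≥ −b requirement further upstream, so the greedy max-weight choice never discards a feasible path needlessly.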

If no vertex is removed on line 6, then Bi = Bi−1, and the algorithm terminates and returns di−1 as output. Since Bi ⊆ V loses at least one element in each iteration, the number of iterations of the while loop on lines 3–7 is at most |V|. Each iteration takes O(|V| · log |V| + |E|) because of the complexity of the procedure Dijkstra(). Therefore, the complexity of EvaluateStrategy() is O(|V| · (|V| · log |V| + |E|)).

The following theorem states the correctness of EvaluateStrategy(). An intuition why it holds was given above. A formal proof can be found in the full version of this paper [4].

Theorem 1. Let (Γ, b) be an instance of the lower-weak-upper-bound problem, let π ∈ ΠΓM be a positional strategy of Min, and finally let d−1 ∈ (Z ∪ {−∞})V be such that for A = {v ∈ V | d−1(v) = 0} and D = {v ∈ V | d−1(v) > −∞}, the conditions (i.) and (ii.) hold. Then for d := EvaluateStrategy(Γ, b, π, d−1) it holds that for each v ∈ D, d(v) = −lwubΓπ(D)b(v).

In Figure 2, there is a pseudo-code of our strategy improvement algorithm for solving the lower-weak-upper-bound problem using EvaluateStrategy(). The input to the algorithm is a lower-weak-upper-bound problem instance (Γ, b). The output of the algorithm is the vector lwubΓb. The pseudo-code corresponds to the high-level description of the algorithm given at the beginning of this section.

proc LowerWeakUpperBound(Γ, b)
1   i := 0
2   π0 := Arbitrary strategy from ΠΓM
3   d−1 := 0
4   improvement := true
5   while improvement do
6     di := EvaluateStrategy(Γ, b, πi, di−1)
7     improvement := false
8     i := i + 1
9     πi := πi−1
10    foreach v ∈ VMin do
11      if di−1(v) > −∞ then
12        foreach (v, u) ∈ E do
13          if di−1(v) > di−1(u) + w(v, u) then
14            πi(v) := u; improvement := true
15          fi
16        od
17      fi
18    od
19  od
20  return −di−1
21  end

Fig. 2. Solving the lower-weak-upper-bound problem.

The algorithm starts by taking an arbitrary strategy from ΠΓM on line 2 and initializing the lower estimate of lwubΓb to a vector of zeros on line 3. The algorithm then proceeds in iterations.
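The strategy improvement step of lines 10–18 can be sketched in isolation as follows (an illustrative fragment of ours, with hypothetical names, not the paper's implementation):

```python
import math

def improve_strategy(v_min, edges, pi, d):
    """One strategy-improvement pass over Min's vertices: wherever some
    edge (v, u) satisfies the improvement condition d(v) > d(u) + w(v, u),
    switch pi(v) to u. Returns the new strategy and whether any edge
    satisfied the condition."""
    new_pi = dict(pi)
    improved = False
    for v in v_min:
        if d[v] == -math.inf:
            continue                 # vertex already known losing for Max
        for (u, w) in edges[v]:
            if d[v] > d[u] + w:
                new_pi[v] = u        # any edge satisfying the condition is acceptable
                improved = True
    return new_pi, improved
```

When `improved` comes back false, the main loop of the algorithm would terminate and −d would be the final answer.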

Then it alternates strategy evaluation (line 6) and strategy improvement (lines 10–18) until no improvement is possible, at which point the main while loop on lines 5–19 terminates and the final d vector is returned on line 20. The following lemmas and theorem establish the correctness of the algorithm, i.e., that upon termination it holds that for each v ∈ V, −di−1(v) = lwubΓb(v). The whole algorithm KASI is illustrated on Example 2.

Fig. 3. Example of a Run of KASI (b = 15). (Panels (a), (0.0), (1.0), (1.1), and (2.0): the MPG and successive states of the computation; vertices v1–v4 with their d values and the sets B.)

Example 2. In Figure 3, there is an example of a run of our algorithm KASI on a simple MPG. Let us denote the MPG by Γ, let b = 15, and consider the lower-weak-upper-bound problem given by (Γ, b). The MPG is in Figure 3(a). Circles are Max's vertices and the square is a Min's vertex. Figure 3 illustrates the progress of the algorithm. Each figure denoted by (r.s) shows a state of the computation right after an update of the vector d by Dijkstra(), where r is the value of the iteration counter i of LowerWeakUpperBound() and s is the value of the iteration counter i of EvaluateStrategy(). In each figure, the d value of each vertex is shown by that vertex. Edges that do not belong to the current strategy π of Min are dotted. For simplicity, we will use the symbols π, d, B, and D without indices, although in the pseudo-codes these symbols have indices. Also, if we speak about a weight of an edge, we mean the weight with the potential transformation by d. Detailed description of the progress of the algorithm follows.

Min has only two positional strategies, namely π1 and π2, where π1(v3) = v1 and π2(v3) = v4. Let π = π2 be the first selected strategy. Initially, d = 0 and D = {v1, v2, v3, v4}. There are three vertices in Gπ(D) with non-negative edges emanating from them, namely v1, v2, and v3, and so EvaluateStrategy() takes {v1, v2, v3} as the first set B.

After the vector d is updated so that it contains the longest path weights to B (Figure 3 (0.1)), the update to d causes that v2 does not have a non-negative edge (with the potential transformation), so it is removed from the set B and the longest path weights are recomputed (Figure 3 (1.0)). Now the vertex v3 does not have a non-negative edge emanating from it, thus it is also removed from the set B and the vector d is recomputed again (Figure 3 (1.1)). All the remaining vertices in B still have non-negative edges, and so the strategy evaluation terminates and the strategy improvement phase is started. The strategy improvement condition is satisfied for the edge (v3, v1), and so π is improved so that π = π1. This completes the first iteration of KASI, and another one is started to evaluate and possibly improve the new strategy π.

The evaluation of the strategy π results in the d vector as depicted in Figure 3 (2.0). Please note that the only path from v4 to B has a suffix of weight less than −b, and so d(v4) = −∞ and v4 is removed from the set D. The vertex v3 has d(v3) = −∞, because v3 cannot reach the set B, which also results in the removal of v3 from D. This finishes the strategy evaluation and the strategy improvement follows. The strategy improvement condition is satisfied for the edge (v3, v4), and so the strategy π2 is selected as the current strategy π again. However, this is not the same situation as at the beginning, because the set D is now smaller. In the last iteration, no further improvement of π is possible, and so the algorithm terminates. The d values of the vertices from D are the weights of the longest paths to B, it holds that for each v ∈ V, −di−1(v) = lwubΓb(v), and so lwubΓb = −d.

The following lemma states that the d vector in LowerWeakUpperBound() is decreasing, and the subsequent lemma uses this fact to prove that LowerWeakUpperBound() terminates.

Lemma 3. Every time line 6 of LowerWeakUpperBound() is reached, Γ, b, πi, and di−1 satisfy the assumptions of Theorem 1. Every time line 7 of LowerWeakUpperBound() is reached and i > 0, it holds that di < di−1.

A formal proof of Lemma 3 can be found in the full version of this paper [4]. The proof uses the following facts. We have already used the first one: if p is a path from v to u such that for each edge (x, y) in the path it holds that d(x) ≥ d(y) + w(x, y), then d(v) ≥ d(u) + w(p), and if for some edge the inequality is strict, i.e., d(x) > d(y) + w(x, y), then d(v) > d(u) + w(p). The second fact is similar: if c is a cycle such that for each edge (x, y) in the cycle it holds that d(x) ≥ d(y) + w(x, y), then 0 ≥ w(c), and if for some edge the inequality is strict, then the cycle is strictly negative.

Using these facts we can now give an intuitive proof of Lemma 3. Conditions (i.) and (ii.) are trivially satisfied in the first iteration of LowerWeakUpperBound(). During the execution of EvaluateStrategy(), only the vertices with all the outgoing edges negative with the potential transformation are removed from the set B, and so each edge (x, y) emanating from a vertex from D \ B satisfies d(x) ≥ d(y) + w(x, y). Therefore, conditions (i.) and (ii.) remain satisfied.
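These two facts can be checked mechanically on a toy instance; the graph, weights, and d-vector below are hypothetical, chosen only so that the edge inequalities hold:

```python
# Toy illustration of the two facts used in the proof of Lemma 3 (all data
# below is hypothetical). If d(x) >= d(y) + w(x, y) holds on every edge, then
# telescoping gives d(v) >= d(u) + w(p) for any path p from v to u, and
# 0 >= w(c) for any cycle c.

def weight(edges, w):
    """Total weight of a sequence of edges."""
    return sum(w[e] for e in edges)

# hypothetical graph: edge -> weight
w = {("v1", "v2"): 3, ("v2", "v3"): -5, ("v3", "v1"): 1, ("v3", "v4"): -2}
# a d-vector satisfying d(x) >= d(y) + w(x, y) on every edge
d = {"v1": 0, "v2": -3, "v3": 2, "v4": 4}
assert all(d[x] >= d[y] + wt for (x, y), wt in w.items())

# Fact 1: along the path p = v1 v2 v3 v4, d(v1) >= d(v4) + w(p)
p = [("v1", "v2"), ("v2", "v3"), ("v3", "v4")]
assert d["v1"] >= d["v4"] + weight(p, w)

# Fact 2: along the cycle c = v1 v2 v3 v1, the d-terms cancel, so 0 >= w(c)
c = [("v1", "v2"), ("v2", "v3"), ("v3", "v1")]
assert 0 >= weight(c, w)
```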

she sends the play to v1 . this is not true. Min can play optimally using the sequence of positional strategies computed by our algorithm. . Example 2 witnesses this fact. and it is the best she can do. If the play starts or gets to a vertex with inﬁnite ﬁnal d . The optimal strategy of Max is constructed from the ﬁnal longest path forest computed by the procedure Dijkstra() and the non-negative (with potential transformation) edges emanating from the ﬁnal set B . since it is the weight of some path in G with no repeated vertices (Except for the case when di (v ) = −∞. In order to complete the intuition. This follows from the fact that the new edges introduced by the strategy improvement are negative with the potential transformation. to guarantee that Max loses from v3 . and Min’s strategy that we will deﬁne ensures that Max cannot do with smaller amount of initial energy. b). Unfortunately. . The procedure LowerWeakUpperBound() always terminates. In particular. as stated by Theorem 1. As a result. di (v ) is bounded from below by −(|V | − 1) · W . the strategy ensures that Max will eventually go negative or traverse a path segment of weight less than −b with arbitrarily large amount of initial energy. Max loses. it remains to show why the conditions still hold after the strategy improvement and why the strategy improvement results in the decrease of the d vector.Using Strategy Improvement to Stay Alive 597 the previous paragraph. Min uses the sequence in the following way: if the play starts from a vertex with ﬁnite ﬁnal d value and never leaves the set of vertices with ﬁnite ﬁnal d value. Min ﬁrst sends the play from v3 to v4 and when it returns back to v3 . Lemma 4. then Min uses the last strategy in the sequence. . none of the two positional strategies of Min guarantees that Max loses from v3 . Since di is a vector of integers. In Example 2. From the existence of such strategies it follows that for each v ∈ V . Proof: By Lemma 3. 
be the sequence of positional strategies computed by the algorithm. The optimal strategy of Min is more complicated. The key idea of the proof is to deﬁne strategies for both players with the following properties. This guarantees the termination. In general. states the correctness of our algorithm. Max’s strategy that we will deﬁne ensures that for each vertex v ∈ V . However. we can conclude that all newly formed cycles in Gπ (D \ B ) are negative and the weights of the longest paths to B cannot increase. Let ds := LowerWeakUpperBound(Γ. Indeed. Unlike Max. ds(v ) = lwubΓ b (v ). ds(v ) is a suﬃcient amount of initial energy no matter what his opponent does. a path of weight −20 is traversed and since b = 15. For each v ∈ V . but this is obviously not a problem). no inﬁnite chain of improvements is possible. There is a theorem in [2] which claims that Min can restrict herself to positional strategies. and both strategies are optimal with respect to the lower-weakupper-bound problem. let π0 . Theorem 5. di decreases in each iteration. Min sometimes needs memory. π1 . Our main theorem. for vertices with ds(v ) = ∞.

Proof: In the whole proof, all references to lines in pseudocode are references to lines in the pseudocode of LowerWeakUpperBound() in Figure 2. By Lemma 4, the procedure LowerWeakUpperBound(Γ, b) terminates and gives us the vector ds. We will show that there is a positional strategy of Max σ ∈ ΣΓM and there is a strategy of Min π ∈ ΠΓ such that the following holds for each v ∈ V:

ds(v) ≥ min{x ∈ N0 | (∀π′ ∈ ΠΓ)( outcomeΓ(v, σ, π′) = (v = v0, v1, v2, ...) ∧
    (∀n ∈ N)(x + Σ_{j=0}^{n−1} w(vj, vj+1) ≥ 0) ∧
    (∀n1, n2 ∈ N0)(n1 < n2 ⇒ Σ_{j=n1}^{n2−1} w(vj, vj+1) ≥ −b) )}        (1)

ds(v) ≤ min{x ∈ N0 | (∃σ′ ∈ ΣΓ)( outcomeΓ(v, σ′, π) = (v = v0, v1, v2, ...) ∧
    (∀n ∈ N)(x + Σ_{j=0}^{n−1} w(vj, vj+1) ≥ 0) ∧
    (∀n1, n2 ∈ N0)(n1 < n2 ⇒ Σ_{j=n1}^{n2−1} w(vj, vj+1) ≥ −b) )}        (2)

The inequality (1) says that if Max uses the strategy σ, then for each v ∈ V, ds(v) is a sufficient amount of initial energy for plays starting from v. The inequality (2) says that if Min uses the strategy π, then Max needs at least ds(v) units of initial energy. By putting (1) and (2) together, we get that ds(v) = lwubΓb(v).

Let us consider the situation just after the termination of the main while-loop on lines 5–19. In this situation it holds that ds = −di−1. Let Dj = {v ∈ V | dj(v) > −∞}, for each j ∈ {−1, 0, ..., i − 1}. By Lemma 3, D−1 ⊇ D0 ⊇ · · · ⊇ Di−1.

Let us first find the strategy σ. Let further B = {v ∈ V | di−1(v) = 0} and let σ ∈ ΣΓM be the following strategy of Max. For v ∈ B ∩ VMax, let σ(v) = u such that (v, u) ∈ E and w(v, u) − di−1(v) + di−1(u) ≥ 0. Such a vertex u exists by Theorem 1. For v ∈ (VMax \ B) ∩ Di−1, let σ(v) = u such that (v, u) ∈ E and di−1(v) = w(v, u) + di−1(u). Such a vertex u exists by Theorem 1. Finally, for v ∈ VMax ∩ (V \ Di−1), choose σ(v) to be an arbitrary successor of v.

Since the main while-loop on lines 5–19 has terminated, there is no vertex in VMin ∩ Di−1 that satisfies the strategy improvement condition, hence for each v ∈ VMin ∩ Di−1, it holds that for each (v, u) ∈ E, w(v, u) − di−1(v) + di−1(u) ≥ 0.
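The construction of σ amounts to scanning the successors of each Max vertex with a finite d value; the following is a hypothetical sketch (the toy game, the names, and the helper extract_max_strategy are ours, not from the paper). For both finite cases above, a successor with w(v, u) − d(v) + d(u) ≥ 0 suffices: in the second case the defining equality d(v) = w(v, u) + d(u) makes the transformed weight exactly 0.

```python
# Hypothetical sketch: reading off Max's positional strategy sigma from the
# final d-vector, by picking for each Max vertex a successor whose edge is
# non-negative under the potential transformation w(v,u) - d(v) + d(u).

NEG_INF = float("-inf")

def extract_max_strategy(succ, w, d, v_max):
    """succ: vertex -> list of successors; w: (v, u) -> weight;
    d: final longest-path vector; v_max: set of Max's vertices."""
    sigma = {}
    for v in v_max:
        if d[v] == NEG_INF:
            sigma[v] = succ[v][0]  # arbitrary successor; Max loses here anyway
        else:
            # a suitable u exists by Theorem 1 (assumption of this sketch)
            sigma[v] = next(u for u in succ[v]
                            if w[(v, u)] - d[v] + d[u] >= 0)
    return sigma

# tiny hypothetical game with a single Max vertex "a"
succ = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
w = {("a", "b"): -2, ("a", "c"): 1, ("b", "a"): 2, ("c", "a"): -1}
d = {"a": 0, "b": -2, "c": 0}
sigma = extract_max_strategy(succ, w, d, v_max={"a"})
# edge (a, b): -2 - 0 + (-2) = -4 < 0; edge (a, c): 1 - 0 + 0 = 1 >= 0
assert sigma["a"] == "c"
```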

This implies that in the graph Gσ, there is no edge from Di−1 (vertices with finite di−1 value) to V \ Di−1 (vertices with infinite di−1 value). To prove (1), first note that for each v ∈ V \ Di−1 we have ds(v) = ∞, and so we do not have to consider the case when the minimum is equal to ∞. So let v ∈ Di−1 and π′ ∈ ΠΓ, and let us denote the outcome outcomeΓ(v, σ, π′) = (v = v0, v1, v2, ...) by p. By Theorem 1, it holds that (∀j ∈ N0)(di−1(vj) ≤ di−1(vj+1) + w(vj, vj+1)). It follows that for each prefix pk = (v = v0, ..., vk) of p, it holds that di−1(v) ≤ di−1(vk) + w(pk), and since di−1(vk) ≤ 0, we have di−1(v) ≤ w(pk), i.e., ds(v) + w(pk) ≥ 0. It remains to show that p does not contain a segment of weight less than −b. Let n1, n2 ∈ N0 be such that n1 < n2. By the same reasoning as in the previous paragraph, we get that di−1(vn1) ≤ di−1(vn2) + Σ_{j=n1}^{n2−1} w(vj, vj+1), and so di−1(vn1) − di−1(vn2) ≤ Σ_{j=n1}^{n2−1} w(vj, vj+1). Since, by Theorem 1, it holds that −b ≤ di−1(vn1), and di−1(vn2) ≤ 0, we get di−1(vn1) − di−1(vn2) ≥ −b. Therefore, Σ_{j=n1}^{n2−1} w(vj, vj+1) ≥ −b, and the inequality (1) is proved.

Let us now find the strategy π. For each path p = (v0, v1, ..., vk) ∈ pathG such that vk ∈ VMin, let π(p) = πj(vk), where j = min(i − 1, min{x ∈ {0, ..., i − 1} | (∃y ∈ {0, ..., k})(dx(vy) = −∞)}). That is, if there is no vertex in p with infinite di−1 value, then π makes the same choice as the strategy πi−1. Otherwise, the strategy π makes the same choice as the first positional strategy from the sequence (π0, ..., πi−1) that is responsible for making one of the vertices in p losing for Max.

To prove (2), let v ∈ V and σ′ ∈ ΣΓ, and let us denote the outcome outcomeΓ(v, σ′, π) = (v = v0, v1, v2, ...) by p. If the play starts from a vertex v ∈ Di−2 and stays in Di−2 (and so π follows the strategy πi−1), then, by Theorem 1 applied to Γπ(Di−2), Max needs at least ds(v) units of initial energy. Therefore, if we show that each play that leaves Di−2 is losing for Max, then the inequality (2), and hence the theorem, will be proved.

So suppose that p contains some vertex from V \ Di−2, and let us prove that p is losing for Max. Let k = min{x ∈ {0, ..., i − 1} | (∃y ∈ N0)(dx(vy) = −∞)}. From the definition of k, it follows that k < i − 1, and there is a vertex v∞ from Dk−1 \ Dk in the path p. From the vertex v∞ onwards, Min uses the strategy πk, and, by Theorem 1, it also follows that the play never leaves Dk−1. It holds that dk(v∞) = −∞ in Γπ(Dk−1), and since, by Theorem 1, lwubb(v∞) = ∞ in Γπ(Dk−1), the suffix of p starting from v∞ is losing for Max, and so p is losing for Max too. This completes the proof of the theorem.
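The telescoping step used twice in the proof of (1) above can be written out explicitly; in the notation of the proof, with pk = (v0, ..., vk) a prefix of the play:

```latex
% Chaining the edge inequalities d_{i-1}(v_j) \le d_{i-1}(v_{j+1}) + w(v_j, v_{j+1}):
d_{i-1}(v_0) \le d_{i-1}(v_1) + w(v_0, v_1)
             \le \cdots
             \le d_{i-1}(v_k) + \sum_{j=0}^{k-1} w(v_j, v_{j+1})
             = d_{i-1}(v_k) + w(p_k).
% Since d_{i-1}(v_k) \le 0 and ds(v_0) = -d_{i-1}(v_0), it follows that
ds(v_0) + w(p_k) \ge -d_{i-1}(v_0) + d_{i-1}(v_0) - d_{i-1}(v_k)
                 = -d_{i-1}(v_k) \ge 0.
```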

The algorithm KASI has a pseudopolynomial time complexity: O(|V|² · (|V| · log |V| + |E|) · W). It takes O(|V|² · W) iterations until the while-loop on lines 5–19 terminates. The reason is that for each v ∈ V, if d(v) > −∞, then d(v) ≥ −(|V| − 1) · W, because d(v) is the weight of some path with no repeated vertices, and so the d vector can be improved at most O(|V|² · W) times. Each iteration, if considered separately, takes O(|V| · (|V| · log |V| + |E|)) time, so one would say that the overall complexity should be O(|V|³ · (|V| · log |V| + |E|) · W). However, the number of elements of the set Bi in EvaluateStrategy() never increases, even between two distinct calls of the evaluation procedure, and hence the amortized complexity of one iteration is only O(|V| · log |V| + |E|).

The algorithm can even be improved so that its complexity is O(|V| · (|V| · log |V| + |E|) · W). This is accomplished by an efficient computation of the vertices which will update their d value in the next iteration, so that computational time is not wasted on vertices whose d value is not going to change. A detailed description of the technique can be found in the full version of this paper [4]. Interestingly enough, the same technique can be used to improve the complexity of the algorithm of Björklund and Vorobyov, so that the complexities of the two algorithms are the same.

4. Experimental Evaluation

Our experimental study compares four algorithms for solving the lower-bound and the lower-weak-upper-bound problems. The first is value iteration [8, 12] (VI). The second and the third are combinations of VI with another algorithm, and the fourth algorithm is our algorithm KASI. We will now briefly describe the algorithms based on VI.

Let (Γ, b) be an instance of the lower-weak-upper-bound problem. It is easy to see that for each v ∈ V and k ∈ N0, dk(v), defined below, is the minimum amount of Max's initial energy that enables him to keep the sum of the traversed edges, plus the initial energy, greater or equal to zero in a k-step play. VI starts with d0(v) = 0, for each v ∈ V, and then computes d1, d2, d3, ..., according to the following rules:

di+1(v) = x, where x = min_{(v,u)∈E} max(0, di(u) − w(v, u)),  if v ∈ VMax ∧ x ≤ b,
di+1(v) = x, where x = max_{(v,u)∈E} max(0, di(u) − w(v, u)),  if v ∈ VMin ∧ x ≤ b,
di+1(v) = ∞  otherwise.

The computation continues until two consecutive d vectors are equal. The last d vector is then the desired vector lwubΓb. If b = (|V| − 1) · W, the algorithm solves the lower-bound problem. The complexity of the straightforward implementation of the algorithm is O(|V|² · |E| · W), which was improved in [8, 12] to O(|V| · |E| · W), which is slightly better than the complexity of KASI.

The shortcoming of VI is that it takes enormous time before the vertices with infinite lbΓ and lwubΓb value are identified. This is why we first compute the vertices with ν < 0 by some fast MPG solving algorithm and then apply VI on the rest of the graph. For the lower-bound problem, the vertices with ν < 0 are exactly the vertices with infinite lbΓ value.
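The update rules can be sketched directly in code. The following minimal Python version iterates the rules until a fixpoint; the names and the toy game are ours, and none of the optimizations of [8, 12] are included:

```python
# Minimal sketch of the value iteration described above: min over successors
# at Max's vertices, max at Min's, clipped at 0 from below, and jumping to
# infinity once the candidate value exceeds the bound b.

INF = float("inf")

def value_iteration(succ, w, v_max, b):
    """succ: vertex -> list of successors; w: (v, u) -> weight;
    v_max: Max's vertices; b: the lower-weak-upper-bound."""
    d = {v: 0 for v in succ}  # d_0(v) = 0 for each v
    while True:
        nd = {}
        for v in succ:
            vals = [max(0, d[u] - w[(v, u)]) for u in succ[v]]
            x = min(vals) if v in v_max else max(vals)
            nd[v] = x if x <= b else INF
        if nd == d:
            return d  # fixpoint: d(v) = lwub_b(v)
        d = nd

# two-vertex toy game: Max's vertex "m", Min's vertex "n"; the only cycle has
# weight 0, but the play from "m" first dips by 3
succ = {"m": ["n"], "n": ["m"]}
w = {("m", "n"): -3, ("n", "m"): 3}
assert value_iteration(succ, w, v_max={"m"}, b=10) == {"m": 3, "n": 0}
```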

For the lower-weak-upper-bound problem, the vertices with ν < 0 might be a strict subset of the vertices with infinite lwubΓb value, but still the preprocessing sometimes saves a lot of time in practice.

According to our experimental study, partly published in [7], the fastest algorithms in practice for dividing the vertices of an MPG into those with ν ≥ 0 and those with ν < 0 are the algorithm of Björklund and Vorobyov [1] (BV) and the algorithm of Schewe [19] (SW). The fact that they are the fastest does not directly follow from [7], because that paper focuses on parallel algorithms and the computation of the exact ν values. This is why we compared KASI only with the algorithm VI and the two combined algorithms: VI + BV and VI + SW. The complexities of BV and SW exceed the complexity of VI, and so the complexities of the combined algorithms are the complexities of BV and SW.

The original algorithm BV is a sub-exponential randomized algorithm. To prove that the algorithm is sub-exponential, some restrictions had to be imposed, and the main restriction was that in each strategy improvement step the strategy could be improved only for one vertex. If the restrictions are not obeyed, BV runs faster. Therefore, we decided not to obey the restrictions and use only the "deterministic part" of the algorithm. We even improved the complexity of the deterministic algorithm from O(|V|² · |E| · W) to O(|V| · (|V| · log |V| + |E|) · W), using the same technique as for the improvement of the complexity of KASI, which is described in the full version of this paper [4]. Since the results of the improved BV were significantly better on all input instances included in our experimental study, all results of BV in this paper are the results of the improved BV.

SW is also a strategy improvement algorithm. The complexity of SW is O(|V|² · (|V| · log |V| + |E|) · W). It might seem that this is in contradiction with the title of Schewe's paper [19]. However, the term "optimal" in the title of the paper does not refer to the complexity, but to the strategy improvement technique: the strategy improvement steps in SW are optimal in a certain sense.

We note that any algorithm that divides the vertices of an MPG into those with ν ≥ 0 and those with ν < 0 can be used to solve the lower-bound and the lower-weak-upper-bound problems with the help of binary search, if we use the reduction technique from [2], but it requires the introduction of auxiliary edges and vertices into the input MPG and the repeated application of the algorithm. More precisely, BV/SW has to be executed Θ(|V| · log(|V| · W)) times to solve the lower-bound problem, and Θ(|V| · log b) times to solve the lower-weak-upper-bound problem. According to our experiments, BV and SW do not run faster than KASI, and so solving the two problems by repeated application of BV and SW would lead to higher runtimes than the ones of KASI. It is obvious that on MPGs with all vertices with ν ≥ 0 the preprocessing does not help at all. It is also not helpful for the lower-weak-upper-bound problem for a small bound b.
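The binary-search idea can be sketched as follows. Here wins_with_credit stands for a hypothetical decision oracle (in the setting above it would be obtained from a ν ≥ 0 / ν < 0 solver on a modified game via the reduction of [2]; the reduction itself, with its auxiliary vertices and edges, is not shown). Since the set of sufficient credits is upward closed, a standard binary search finds the minimum with O(log(maximum credit)) oracle calls per vertex:

```python
# Hedged sketch: computing the minimal sufficient initial credit for a vertex
# v, given a monotone oracle wins_with_credit(v, c) that decides whether
# initial credit c suffices for Max from v. All names are illustrative.

def minimal_credit(wins_with_credit, v, max_credit):
    if not wins_with_credit(v, max_credit):
        return float("inf")  # no finite credit suffices (lb(v) = infinity)
    lo, hi = 0, max_credit
    while lo < hi:
        mid = (lo + hi) // 2
        if wins_with_credit(v, mid):
            hi = mid  # mid suffices; the answer is at most mid
        else:
            lo = mid + 1  # mid does not suffice; the answer is larger
    return lo

# toy monotone oracle: credit suffices iff it is at least 7
oracle = lambda v, c: c >= 7
assert minimal_credit(oracle, "v0", max_credit=100) == 7
assert minimal_credit(lambda v, c: False, "v0", 100) == float("inf")
```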

The MPGs prefixed by "sqnc", "lnc", and "pnc" were generated by the TOR generator; they all contain 2^18 vertices. The MPGs prefixed by "rand" were generated by the SPRAND generator. Both for the rand5 and the rand10 family, we experimented with three sizes of graphs: with 2^18 vertices (no suffix), with 2^19 vertices (suffix "b"), and with 2^20 vertices (suffix "h").

The MPGs prefixed by "collect", "supply", and "taxi" are the models of simple reactive systems created by ourselves. To get MPGs of manageable size, the models are, of course, very simplified, but they are still much closer to real world problems than the synthetic MPGs. For each model, we tried two different values of parameters. The third model is called "taxi" and models a taxi which transports people at their request. Its operation costs money and the taxi also earns money. The goal is to never run out of money, and we also want to know the minimal sufficient initial amount of money.

Results. The experiments were carried out on a machine equipped with two dual-core Intel Xeon 2.00GHz processors and 16GB of RAM, running GNU/Linux kernel version 2.6.26. All algorithms were implemented in C++ and compiled with GCC 4.3.2 with the "-O2" option.

Tables 1 and 2 give the results of our experiments. The first table contains the results for the lower-bound problem; the second table contains the results for the lower-weak-upper-bound problem, which contains a bound b as a part of the input. The first column of each table contains the names of the input MPGs; numbers of vertices and edges, in thousands, are in brackets. Each column headed by a name of an algorithm contains the execution times of that algorithm in seconds, excluding the time for reading input. The term "n/a" means more than 10 hours.

For the lower-weak-upper-bound problem, we tried various values of b. If the bound is too high, the algorithms essentially solve the lower-bound problem, and so the runtimes are practically the same as for the lower-bound problem. If the bound is too low, all vertices in our inputs have infinite lwubΓb value, and the inputs become very easy to solve. Therefore, for this paper, we selected as b the average lbΓ value of the vertices with finite lbΓ value, divided by 2, which seems to be a reasonable amount so that the results provide insight. We note that a smaller b makes the computation of VI and KASI faster.

Tables 1 and 2 show that the algorithm KASI was the fastest on all inputs for the lower-bound problem. For the lower-weak-upper-bound problem, it was never slower than the fastest algorithm by more than a factor of 2. This was true for all values of b that we tried, and so the results clearly suggest that KASI is the best algorithm.

Table 1. Runtimes of the experiments for the lower-bound problem (in seconds). The input MPGs, with numbers of vertices and edges in thousands in brackets, are: sqnc01 (262k 524k), sqnc02 (262k 524k), sqnc03 (262k 525k), sqnc04 (262k 532k), sqnc05 (262k 786k), lnc01 (262k 524k), lnc02 (262k 524k), lnc03 (262k 525k), lnc04 (262k 528k), lnc05 (262k 786k), pnc01 (262k 2097k), pnc02 (262k 2097k), pnc03 (262k 2098k), pnc04 (262k 2101k), pnc05 (262k 2359k), rand5 (262k 1310k), rand5b (524k 2621k), rand5h (1048k 5242k), rand10 (262k 2621k), rand10b (524k 5242k), rand10h (1048k 10485k), collect1 (636k 3309k), collect2 (636k 3309k), supply1 (363k 1014k), supply2 (727k 2030k), taxi1 (509k 979k), taxi2 (509k 979k). The columns give the runtimes of VI, VI + BV, VI + SW, and KASI; the individual runtime entries are not reproduced here.

VI is practically unusable for solving the lower-bound problem for MPGs with some vertices with ν < 0, and, except for lnc01–02, all input MPGs had vertices with ν < 0. The preprocessing by BV and SW reduces the execution time by orders of magnitude for these MPGs. On the other hand, for the lower-weak-upper-bound problem for the bound we selected, VI is often very fast and the preprocessing slows the computation down in most cases, and so the combined algorithms are often slower than VI alone. VI was even faster than KASI on a lot of inputs. However, the difference was never significant, and it was mostly caused by the initialization phase of the algorithms, which takes more time for the more complex algorithm KASI.

Table 2. Runtimes of the experiments for the lower-weak-upper-bound problem (in seconds). The input MPGs and the columns are the same as in Table 1; the individual runtime entries are not reproduced here.

However, for some inputs, especially from the "collect" family, VI is very slow. The i-th iteration of VI computes the minimal sufficient initial energy to keep the energy level non-negative for i time units. VI makes a lot of iterations for the inputs from the collect family, because the robot can survive for a quite long time by idling, which consumes a very small amount of energy per time unit. However, it cannot survive by idling forever, and so new iterations have to be started until the idling consumes at least as much energy as the minimal sufficient initial energy to keep the energy level non-negative forever.

We believe that this is a typical situation for this kind of application. Other inputs for which VI took a lot of time are: sqnc01, lnc01–02, and supply1–2.

5. Conclusion

We proposed a novel algorithm for solving the lower-bound and the lower-weak-upper-bound problems for MPGs. Our algorithm, called Keep Alive Strategy Improvement (KASI), is based on the strategy improvement technique, which is very efficient in practice. To demonstrate that the algorithm is able to solve the two problems for large MPGs, we carried out an experimental study. In the study we compared KASI with the value iteration algorithm (VI) from [8, 12], which we also improved by combining it with the algorithm of Björklund and Vorobyov [1] (BV) and the algorithm of Schewe [19] (SW). KASI is the clear winner of the experimental study. Finally, we comment on the scalability of the algorithms: as the experiments on the SPRAND generated inputs suggest, the runtimes of the algorithms increase no faster than the term |V| · |E|, and so they are able to scale up to very large MPGs. Two additional results of this paper are the improvement of the complexity of BV, and the characterization of Min's optimal strategies w.r.t. the lower-weak-upper-bound problem.

References

[1] H. Björklund and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155(2):210–229, 2007.
[2] P. Bouyer, U. Fahrenberg, K. Larsen, N. Markey, and J. Srba. Infinite runs in weighted timed automata with energy constraints. In Proc. Formal Modeling and Analysis of Timed Systems, LNCS, pages 33–47. Springer, 2008.
[3] L. Brim and J. Chaloupka. Using strategy improvement to stay alive. In Proc. Games, Automata, Logics, and Formal Verification, EPTCS, pages 40–54, 2010.
[4] L. Brim and J. Chaloupka. Using strategy improvement to stay alive. Technical Report FIMU-RS-2010-03, Faculty of Informatics, Masaryk University, Brno, Czech Republic, 2010.
[5] L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118, 2011.
[6] A. Chakrabarti, L. de Alfaro, T. Henzinger, and M. Stoelinga. Resource interfaces. In Proc. Embedded Software, volume 2855 of LNCS, pages 117–133. Springer, 2003.
[7] J. Chaloupka. Parallel algorithms for mean-payoff games: An experimental evaluation. In Proc. European Symposium on Algorithms, volume 5757 of LNCS, pages 599–610. Springer, 2009.
[8] J. Chaloupka and L. Brim. Faster algorithm for mean-payoff games. In Proceedings of the 5th Doctoral Workshop on Mathematical and Engineering Methods in Computer Science (MEMICS 2009), pages 45–53. NOVPRESS, Brno, Czech Republic, 2009.
[9] B. Cherkassky and A. Goldberg. Negative-cycle detection algorithms. Mathematical Programming, 85:277–311, 1999.
[10] B. Cherkassky, A. Goldberg, and T. Radzik. Shortest paths algorithms: Theory and experimental evaluation. Mathematical Programming, 73:129–174, 1996.

[11] J. Cochet-Terrasson and S. Gaubert. A policy iteration algorithm for zero-sum stochastic games with mean payoff. Comptes Rendus Mathematique, 343:377–382, 2006.
[12] L. Doyen, R. Gentilini, and J.-F. Raskin. Faster pseudopolynomial algorithms for mean-payoff games. Technical Report 2009.120, Université Libre de Bruxelles (ULB), Bruxelles, Belgium, 2009.
[13] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8(2):109–113, 1979.
[14] L. Georgiadis, A. Goldberg, R. Tarjan, and R. Werneck. An experimental study of minimum mean cycle algorithms. In Proc. Workshop on Algorithm Engineering and Experiments, pages 1–13. SIAM, 2009.
[15] Andrew Goldberg's network optimization library. http://www.avglab.com/andrew/soft.html, April 2011.
[16] V. Gurvich, A. Karzanov, and L. Khachivan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Comput. Math. and Math. Phys., 28(5):85–91, 1988.
[17] A. Hoffman and R. Karp. On nonterminating stochastic games. Management Science, 12(5):359–370, 1966.
[18] Y. Lifshits and D. Pavlov. Fast exponential deterministic algorithm for mean payoff games. Zapiski Nauchnyh Seminarov POMI, 340:61–75, 2006.
[19] S. Schewe. An optimal strategy improvement algorithm for solving parity and payoff games. In Proc. Computer Science Logic, volume 5213 of LNCS, pages 369–384. Springer, 2008.
[20] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158(1–2):343–359, 1996.

Appendix. Modified Dijkstra's Algorithm

In Figure 4, there is the pseudocode of the modified Dijkstra's algorithm used in the strategy evaluation procedure EvaluateStrategy(). The input to Dijkstra() consists of four parts. The first part is a directed graph Gπ; the graph is denoted by Gπ to emphasize in which graph the main algorithm KASI computes longest paths. The second part of the input is a bound b, the third part is a set X, to which the algorithm computes the longest paths, and finally, the fourth part is a vector of integers dp used for the potential transformation of edge-weights. The vector dp contains longest path weights computed in a different setting. Dijkstra() computes the difference between dp and the longest path weights in the current setting, and then adds the difference to dp, thus obtaining the current longest path weights, which are returned as output.

Dijkstra() works only with the vertices in the set S computed on line 2. The set S contains the vertices from V with finite dp value; the input to Dijkstra() always guarantees that X ⊆ S. On line 3, the tentative distance key(v) of each vertex v ∈ S is initialized to −∞, and the initialization phase on lines 4–8 sets key(v) of each vertex v ∈ X to the (final) value 0. Each vertex from X is also put to the maximum priority queue q, and its presence in the queue is recorded in the vector queued. The priority of a vertex v in the maximum priority queue q is key(v).

4.608 L. and so do (v ) is set to −∞ too.enqueue(v ) queued(v ) := true od [q is a maximum-priority queue of vertices. X. the longest path weight do (v ) for each v ∈ S is computed as dp (v ) + key (v ). for each vertex v ∈ V \ S (dp (v ) = −∞). The vector do is then returned as output on line 24.dequeue() foreach (v.enqueue(v ). . where the priority of v is key (v )] while ¬q. do (v ) is simply set to −∞. w). u) ∈ Eπ do if v ∈ S ∧ v ∈ / X then tmp := key (u) + w(v. queued(v ) := true ﬁ ﬁ ﬁ od od foreach v ∈ S do do (v ) := dp (v ) + key (v ) od foreach v ∈ V \ S do do (v ) = −∞ od return do end Fig. Brim & J. the updates of key (v ) that do not lead to path weight greater or equal to −b are ignored. For each v ∈ S . The main loop on lines 10–23 diﬀers from the standard Dijkstra’s algorithm only in the following. Eπ . These are the vertices for which there is no path to X in Gπ (S ) or each such path has a suﬃx of weight less than −b. Modiﬁed Dijkstra’s algorithm.empty () do u := q. On line 23. Chaloupka 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 proc Dijkstra(Gπ = (V. dp ) S = {v ∈ V | dp (v ) > −∞} foreach v ∈ S do key (v ) := −∞ od foreach v ∈ X do key (v ) := 0 q. b. On line 22. u) − dp (v ) + dp (u) if dp (v ) + tmp ≥ −b ∧ tmp > key (v ) then key (v ) := tmp if ¬queued(v ) then q. Please note that key (v ) might be equal to −∞.
