International Journal of Foundations of Computer Science Vol. 23, No. 3 (2012) 585–608 c World Scientific Publishing Company DOI: 10.

1142/S0129054112400291

USING STRATEGY IMPROVEMENT TO STAY ALIVE∗

ˇ BRIM† and JAKUB CHALOUPKA‡ LUBOS Faculty of Informatics, Masaryk University Botanick´ a 68a, 60200 Brno, Czech Republic † brim@fi.muni.cz ‡ xchalou1@fi.muni.cz

Received 25 October 2010 Accepted 22 June 2011 Communicated by Margherita Napoli We design a novel algorithm for solving Mean-Payoff Games (MPGs). Besides solving an MPG in the usual sense, our algorithm computes more information about the game, information that is important with respect to applications. The weights of the edges of an MPG can be thought of as a gained/consumed energy – depending on the sign. For each vertex, our algorithm computes the minimum amount of initial energy that is sufficient for player Max to ensure that in a play starting from the vertex, the energy level never goes below zero. Our algorithm is not the first algorithm that computes the minimum sufficient initial energies, but according to our experimental study it is the fastest algorithm that computes them. The reason is that it utilizes the strategy improvement technique which is very efficient in practice. Keywords : Mean-payoff games; strategy improvement; experimental evaluation.

1. Introduction A Mean-Payoff Game (MPG) [13, 16, 20] is a two-player infinite game played on a finite weighted directed graph, the vertices of which are divided between the two players. A play starts by placing a token on some vertex and the players, named Max and Min, move the token along the edges of the graph ad infinitum. If the token is on Max’s vertex, he chooses an outgoing edge and the token goes to the destination vertex of that edge. If the token is on Min’s vertex, it is her turn to choose an outgoing edge. Roughly speaking, Max wants to maximize the average weight of the traversed edges whereas Min wants to minimize it. It was proved in [13] that each vertex v has a value, denoted by ν (v ), which each player can secure by a positional strategy, i.e., strategy that always chooses the same outgoing edge in the same vertex. To solve an MPG is to find the values of all vertices, and, optionally, also strategies that secure the values.
∗ This

work has been partially supported by the Grant Agency of the Czech Republic grants No. 201/09/1389, 102/09/H042. 585

586

L. Brim & J. Chaloupka

In this paper we deal with MPGs with other than the standard average-weight goal. Player Max now wants the sum of the weights of the traversed edges, plus some initial value (initial “energy”), to be non-negative at each moment of the play. He also wants to know the minimal sufficient amount of initial energy that enables him to stay non-negative. For different starting vertices, the minimal sufficient initial energy may be different and for starting vertices with ν < 0, it is impossible to stay non-negative with arbitrarily large amount of initial energy. The problem of computation of the minimal sufficient initial energies has been studied under different names by Chakrabarti et al. [6], Lifshits and Pavlov [18], and Bouyer et al. [2]. In [6] it was called the problem of pure energy interfaces, in [18] it was called the problem of potential computation, and in [2] it was called the lowerbound problem. The paper [2] also contains the definition of a similar problem – the lower-weak-upper-bound problem. An instance of this problem contains, besides an MPG, also a bound b. The goal is the same, Max wants to know how much initial energy he needs to stay non-negative forever, but now the energy level is bounded from above by b and during the play, all increases above this bound are immediately truncated. Various resource scheduling problems for which the standard solution of an MPG is not useful can be formulated as the lower-bound or the lower-weak-upper-bound problems, which extends the applicability of MPGs. For example, an MPG can be used to model a robot in a hostile environment. The weights of edges represent changes in the remaining battery capacity of the robot – positive edges represent recharging, negative edges represent energy consuming actions. The bound b is the maximum capacity of the battery. Player Max chooses the actions of the robot and player Min chooses the actions of the hostile environment. By solving the lowerweak-upper-bound problem, we find out if there is some strategy of the robot that allows him to survive in the hostile environment, i.e., its remaining battery capacity never goes below zero, and if there is such a strategy, we also get the minimum initial remaining battery capacity that allows him to survive. The first algorithm solving the lower-bound problem was proposed by Chakrabarti et al. [6] and it is based on value iteration. The algorithm can also be easily modified to solve the lower-weak-upper-bound problem. The value iteration algorithm was later improved by Chaloupka and Brim in [8], and independently by Doyen, Gentilini, and Raskin [12]. An extended version of [8, 12] was recently published [5]. Henceforward we will use the term “value iteration” (VI) to denote only the improved version from [8, 12]. The algorithms of Bouyer et al. [2] that solve the two problems are essentially the same as the original algorithm from [6]. However, [2] focuses mainly on other problems than the lower-bound and the lower-weakupper-bound problems for MPGs. A different approach to solving the lower-bound problem was proposed by Lifshits and Pavlov [18], but their algorithm has exponential space complexity, and so it is not appropriate for practical use. VI seems to be the best known approach to solving the two problems.

Therefore. and it is also not helpful for solving the lower-weak-upper-bound problem for small bound b. This gives us two algorithms: VI + BV and VI + SW. suitable for practical solving of the lower-bound and the lower-weak-upper-bound problems for large MPGs. but strategy improvement algorithms for solving MPGs in general exist as well [1. As a by-product of the design of KASI. the same as the complexity of VI + BV. Moreover. VMin ). w) is a finite weighted directed graph such that V is a disjoint union of the sets VMax and VMin . first part. VI + SW. last paragraph). However. E. The use of the strategy improvement technique for solving MPGs goes back to the algorithm of Hoffman and Karp from 1966 [17]. 19. The shortcoming of VI is that it takes enormous time on MPGs with at least one vertex with ν < 0. w : E → Z is the weight function. Section 4. we also study the algorithm VI without the preprocessing. It is slightly worse than the complexity of VI. We call our algorithm “Keep Alive Strategy Improvement” (KASI). we design a novel algorithm based on the strategy improvement technique. 11].Using Strategy Improvement to Stay Alive 587 In this paper. However. Natural way to alleviate this problem is to find the vertices with ν < 0 by some fast algorithm and run VI on the rest. VI + BV. Our algorithm is the first solution of this kind. where W is the maximal absolute edge-weight. and each . 20] is given by a triple Γ = (G. we selected two algorithms for computation of the set of vertices with ν < 0. Their algorithm can be used to solve only a restricted class of MPGs. the preprocessing is not helpful on MPGs with all vertices with ν ≥ 0. which can be viewed as an extended version of [3]. all of them solve neither the lower-bound nor the lower-weak-upper-bound problem (cf. KASI is the best algorithm. VMax . It solves both the lower-bound and the lower-weak-upper-bound problem. and the algorithm of Schewe [19] (SW). as each algorithm that solves the lower-bound problem also divides the vertices of an MPG into those with ν ≥ 0 and those with ν < 0. Based on our previous experience with algorithms for solving MPGs [7]. According to the study. the algorithm of Bj¨ orklund and Vorobyov [1] (BV). and better than the complexity of VI + SW. To evaluate and compare the algorithms VI. where G = (V. Our new algorithm based on the strategy improvement technique that we propose in this paper has the complexity O(|V | · (|V | · log |V | + |E |) · W ). 2. Preliminaries A Mean-Payoff Game (MPG) [13. and KASI. Moreover. we describe a way to construct an optimal strategy for Min with respect to the lower-weak-upper-bound problem. Another contribution of this paper is a further improvement of VI. we implemented them and carried out an experimental study. which can be used to compute the exact ν values of all vertices. Namely. KASI can be thought of as an algorithm that also solves MPGs in the usual sense. we improved the complexity of BV and proved that Min may not have positional strategy that is also optimal with respect to the lower-weak-upper-bound problem. 16.

vk ) with vk ∈ VMax .) that agrees with both σ and π . named Max and Min. . .) is formed. because arbitrarily large amount of initial energy is not sufficient. Max moves it. VMax . we think of a positional strategy of Max as a function σ : VMax → V such that (v. v ) ∈ E | u ∈ VMin ∨ σ (u) = v }. then we say that Max loses from v . v2 . The game is played by two opposing players. σ. The strategy σ ∈ ΣΓ is called a positional strategy if σ (p) = σ (p′ ) for all finite ′ ′ ′ paths p = (v0 . as defined below [13]. and wσ = w ↾ Eσ . Min moves it. If lbΓ (v ) ∈ N0 . . . A play starts by placing a token on some given vertex and the players then move the token along the edges of G ad infinitum. . If the token is on vertex v ∈ VMax . π ) = (v = v0 . . and for π ∈ ΠΓ . wσ ). v1 . . where Eσ = {(u. VMin ). . For σ ∈ ΣΓ M . and so the definition of a strategy is correct. which means that ν (v ) < 0. . it holds that (vk . Γ = ( G . It was proved that it is equal to the minimal loss that Min can ensure. we get Gσ from G by deleting all the edges emanating from Max’s vertices that do not follow σ . Gπ for a strategy π of Min is defined analogously. VMax . We say that an infinite path p = (v0 . and Min’s aim is to minimize her loss: n−1 1 w ( v . . Chaloupka v ∈ V has out-degree at least one. That is. The set of all strategies of Min in Γ is denoted by ΠΓ . we also define Γσ = (Gσ . for each v ∈ VMax .588 L. E. . Brim & J. If the token is on vertex v ∈ VMin . . . . The set of all strategies of Max in Γ is denoted by ΣΓ . v1 . such that: lbΓ (v ) = min{x ∈ N0 | (∃σ ∈ ΣΓ )(∀π ∈ ΠΓ ) ( outcomeΓ (v. . . v2 . π ) = (v = v0 . . . the restriction of G to σ . For the sake of simplicity. . A (general) strategy of Max is a function σ : V ∗ · VMax → V such that for each finite path p = (v0 . w). vk ′ ) such that vk = vk′ ∈ VMax . vk ) and p′ = (v0 . We define Gσ . vi ) = vi+1 . Given an initial vertex v ∈ V . v1 . vi+1 ) ≥ 0) ) } where the minimum of an empty set is ∞. π π Max Min M The lower-bound problem for an MPG Γ = (G = (V. If lbΓ (v ) = ∞. A strategy π of Min is defined analogously. The set of all positional strategies of Max in Γ is denoted by ΣΓ M .) agrees with the strategy σ ∈ ΣΓ if for each vi ∈ VMax . σ (v0 . as the graph (V. the outcome of two strategies σ ∈ ΣΓ and π ∈ ΠΓ is the (unique) infinite path outcomeΓ (v. Max’s aim is to maximize n−1 1 his gain: lim inf n→∞ n i=0 w(vi . both players can ensure ν (v ) by using positional strategies. The set of all positional strategies of Min in Γ is denoted by ΠΓ M . if the play starts from v . vi+1 ).) ∧ n−1 (∀n ∈ N)(x + i=0 w(vi . denoted lim supn→∞ n i i=0 by ν (v ). lbΓ (v ) is the minimal sufficient amount of initial energy that enables Max to keep the energy level non-negative forever. . . V . . v1 . Moreover. V ). σ. v2 . . . This way an infinite path p = (v0 . vi+1 ). That is. σ (v )) ∈ E . . A positional strategy π of Min is defined analogously. . we define its value. Eσ . . Recall that each vertex has out-degree at least one. v2 . σ (p)) ∈ E . . VMin ) is the problem of finding lbΓ (v ) ∈ N0 ∪ {∞} for each v ∈ V . For each vertex v ∈ V . as the maximal gain that Max can ensure if the play starts at vertex v . then Max wins from v . .

then for each v ∈ V . we will focus only on the lower-weak-upper-bound problem. It was proved in [2] that both for the lower-bound problem and the lower-weakupper-bound problem Max can restrict himself only to positional strategies. That is. Therefore. If lwubΓ b (v ) = ∞. VMin ) and a bound b ∈ N0 is the problem of finding lwubΓ b (v ) ∈ N0 ∪ {∞} for each v ∈ V . . vk ) be a path in G. w) be a weighted directed graph. The set of all (finite) paths in G is denoted by pathG . E. and finally the set of all infinite paths in G is denoted by pathG ∞. r −1 w(c) = i=0 w(ui . then Max wins from v . VMax . w). However. l(c) = r. . and for each v ∈ V such that lbΓ (v ) = ∞. if the play starts from v . If lwubΓ b (v ) ∈ N0 . This will be shown later. we could use the Γ set ΣΓ M instead of the set Σ in the definitions of both the lower-bound problem and the lower-weak-upper-bound problem. he always has a positional strategy that is also optimal. it holds that lbΓ (v ) ≤ (|V | − 1) · W . . i. . . the number k −1 of edges in c. σ. let p = (v0 . are defined in the following way: w(p) = i=0 w(vi . Therefore. . l(p) = k . The additional condition is equivalent to the condition that the play does not contain a segment of weight less than −b. if it ensures that for each v ∈ V such that lbΓ (v ) = ∞. and l(c). . lbΓ (v ) is a sufficient amount of initial energy. the weight of c. In the rest of the paper. under the additional condition that the energy level is truncated to b whenever it exceeds the bound. . if we choose b = (|V | − 1) · W . n2 ∈ N0 )(n1 < n2 ⇒ i= n1 w(vi . then we say that Max loses from v . l(p). where W is the maximal absolute edge-weight in G [2]. π ) = (v = v0 . lbΓ (v ) = lwubΓ b (v ). because it includes the lower-bound problem as a special case. vi+1 ) ≥ −b) ) } where minimum of an empty set is ∞. Then w(p). and let c = (u0 . vi+1 ) ≥ 0) ∧ n2 −1 (∀n1 . because arbitrarily large amount of initial energy is not sufficient. the weight of p. Optimal strategies for Max and Min with respect to the lower-weak-upper-bound problem are defined in the same way as for the lower-bound problem. the number of edges in p. vi+1 ). The strategy π ∈ ΠΓ is an optimal strategy of Min with respect to the lowerbound problem. The reason is that for each v ∈ V such that lbΓ (v ) < ∞. w(c).e. . . The lower-weak-upper-bound problem for an MPG Γ = (G = (V.) ∧ n−1 (∀n ∈ N)(x + i=0 w(vi . v1 .. Let G = (V. ur = u0 ) be a cycle in G. ur−1 . Max loses. if it ensures that for each v ∈ V such that lbΓ (v ) = ∞. E. Max needs at least lbΓ (v ) units of initial energy. . ui+1 ). v2 . the set of all cycles in G is denoted by cycleG . Min cannot restrict herself to positional strategies.Using Strategy Improvement to Stay Alive 589 The strategy σ ∈ ΣΓ is an optimal strategy of Max with respect to the lowerbound problem. . such that: Γ Γ lwubΓ b (v ) = min{x ∈ N0 | (∃σ ∈ Σ )(∀π ∈ Π ) Γ ( outcome (v. lwubΓ b (v ) is the minimal sufficient amount of initial energy that enables Max to keep the energy level non-negative forever.

. . Finally.e. ui+1 . . d0 (v ) ≤ d1 (v ). . and for some v ∈ V . Furthermore. If uj ∈ c. VMin . . . . . where G′ (D) is the graph G(D) with negative self-loops added to all vertices with zero outdegree. u0 ) ∈ cycleG . d0 (v ) < d1 (v ). and so −d is increased. Formally. −d ≤ lwubb . The reason why KASI maintains the vector d rather than −d is that d contains weights of certain paths and we find it more natural to keep them as they are. Brim & J.e. KASI maintains a vector d ∈ (Z ∪ {−∞})V Γ such that −d ≥ 0 is always a lower estimate of lwubΓ b . Let c = (u0 . We denote the set of all paths from A to B by pathG (A. VMin ∩ D. than to keep their opposite values. . w) be a graph and let B. We denote the set of all paths from v to B by pathG (v. W = maxe∈E |w(e)|. . a segment of p is a path (vi . VMin ) be an MPG and let W be the maximal absolute edge-weight in G. The definitions of segments and prefixes naturally extend to infinite paths.590 L. For example. . That is. . then d0 < d1 means that for each v ∈ V . . A segment of c is a path (ui . . not the numbers of edges. A ⊆ V . .) ∈ pathG ∞ is a path (vi . k } and i ≤ j . vj ). B ). v1 . Since some vertices might have zero out-degree in G(D). j ∈ {0. VMax . The set of all segments of c is denoted by segment(c). . . we make the vertices with zero out-degree in G(D) losing for Max in Γ(D) with respect to the the lower-weak-upper-bound problem. . where i ∈ N0 . vk ). Then G(D) is the subgraph of G induced by the set D.). B ). . formally: p = (v = v0 . E. . . until −d = lwubb . The vector Γ d is gradually decreased. where i. uj ). w). The set of all suffixes of p is denoted by suffix(p). . . vi ). because there is a path in c between any two vertices in c. . . . . . . . where i. A suffix of a path p = (v0 . Please note that we do not require i ≤ j . . E. . . k }. . k }. where i ∈ {0. E. b) be an instance of the lower-weak-upper-bound problem. w). . 3.. j ∈ {0. . . . . The set of all prefixes of p is denoted by prefix(p). then uj +1 is the vertex following uj in c. G(D) = (D. . vk ) ∈ pathG is a path (vi . vi+1 . Chaloupka A suffix of a path p = (v0 . We also define the restriction of Γ induced by D. . . .. i. Let G = (V. vk ). VMax ∩ D). For the whole paper let Γ = (G = (V. . ur−1 . . The Algorithm A high-level description of our Keep Alive Strategy Improvement algorithm (KASI) for the lower-weak-upper-bound problem is as follows. where i ∈ {0. j +1 is taken modulo r. Let Γ = (G = (V. vi+2 . r − 1}. . where v0 . . . VMax ) be an MPG and let D ⊆ V . Operations on vectors of the same dimension are element-wise. . . if d0 and d1 are two vectors of dimension |V |. If we say that “p is a path from v to B ” we mean a path with the last vertex and only the last vertex in B . A prefix of p is a path (v0 . The set of all segments of p is denoted by segment(p). w ↾ D × D). E ∩ D × D. we define Γ(D) = (G′ (D). . The term “longest” in connection with paths always refers to the weights of the paths. Let (Γ. i. . . In particular. a path from A to B is a path from v to B such that v ∈ A. . . v2 . vk−1 ∈ V \ B ∧ vk ∈ B . The algorithm also maintains a set D of vertices such that about the vertices in V \ D it already knows that they have infinite lwubΓ b value.

That is. Then. computed for the previous strategy. The input to the procedure consists of four parts. it holds that d−1 (v ) ≥ d−1 (u) + w(v. The strategy evaluation examines the graph Gπ (D) and updates the vector d so that for each v ∈ D. u) and continuing from u costs at least −w(v. it solves the lower-weak-upper-bound problem in the restricted game Γπ (D). for each edge emanating from v in Gπ . u) ∈ E such that v ∈ VMin and d(v ) > −∞ it holds that d(v ) > d(u) + w(v. The first and the second part form the lower-weak-upper-bound problem instance that the main algorithm KASI is solving. then it is losing for Max in Γ.e. where Min has no choices. u) ∈ Eπ . The vector d−1 is such that −d−1 is a lower estimate of lwubΓ b . u). any of them is acceptable. The vertices with the d value equal to −∞ are removed from the set D. the current strategy is first evaluated and then improved. π (v ) is switched to u. i. The third part is the strategy π ∈ ΠΓ M that we want to evaluate and the fourth part of the input is a vector d−1 ∈ (Z ∪ {−∞})V . the MPG Γ and the bound b ∈ N0 . in case of the first iteration of KASI. If v has more than one such edge emanating from it. In each iteration. it holds that d−1 (v ) < 0 and for each edge (v. u) − d(u) units of energy. KASI starts with an arbitrary strategy π ∈ ΠΓ M and then iteratively improves it until no improvement is possible. the following conditions are prerequisites for the procedure EvaluateStrategy(): (i) Each cycle in Gπ (D \ A) is negative (ii) For each v ∈ D \ A. the strategy π is improved in the following way. the algorithm terminates. u). Let A = {v ∈ V | d−1 (v ) = 0} and D = {v ∈ V | d−1 (v ) > −∞}. set by initialization to a vector of zeros. In Figure 1. u). because traversing the edge w(v. Since the strategy π is either the first strategy or an improvement of the previous strategy. This explains why the restricted game was defined the way it was. another iteration of KASI is started. d = 0 and D = V . This is called a strategy improvement condition. because it holds that each vertex v ∈ V is such that −d(v ) = lwubΓ b (v ). If there are edges satisfying the condition. If no such edge exists. it holds −d(v ) = lwubb π Γ (D ) (v ) . u) ∈ E such that d(v ) > d(u) + w(v. For each vertex v ∈ VMin such that there is an edge (v. Such an edge indicates that −d(v ) is not a sufficient initial energy at v . which is greater than −d(v ) (Recall that −d is a lower estimate of lwubΓ b ). Detailed description of the algorithm follows. To improve the current strategy the algorithm checks whether for some (v. there is a pseudo-code of the strategy evaluation part of our algorithm. because if a vertex from D has outgoing edges only to V \ D. From these technical conditions it follows that −d−1 is also a lower estimate of Γ (D ) lwubb π and the purpose of the strategy evaluation procedure is to decrease the Γ (D ) vector d−1 so that the resulting vector d satisfies −d = lwubb π . the vector d is always decreased by the strategy evaluation and we get a better estimate of lwubΓ b . To see why from . or..Using Strategy Improvement to Stay Alive 591 Initially.

it holds that d−1 (vj ) ≥ d−1 (vj +1 ) + w(vj . The output of the strategy evaluation procedure is a vector d ∈ (Z ∪ {−∞})V Γ (D ) such that for each v ∈ D. u) such that w(v. The conditions (i. . The strategy evaluation works only with the restricted graph Gπ (D) and it is Γ (D ) based on the fact that if we have the set Bz = {v ∈ D | lwubb π (v ) = 0}. . In each subsequent iteration. Chaloupka Γ (D ) (i. More precisely. If no path to Bz exists or all such paths have suffixes of weight less than −b. and so Bz is not empty. From (ii. in order to win. −d−1 is a lower estimate of lwubb π . . u) − d−1 (v ) + d−1 (u) ≥ 0 emanating from them (line 2). All paths from D \ Bz to Bz must be negative (otherwise Bz would be larger). and so the minimal energy to get to Bz is the absolute value of the weight of a longest path to Bz such that the weight of each suffix of the path is greater or equal to −b. consider a path p = (v0 . and then iteratively removes vertices from the set until it arrives at the correct set Bz . then we Γ (D ) can compute lwubb π of the remaining vertices by computing the weights of the Γ (D ) longest paths to the set Bz . Brim & J. Max has to get to some vertex in Bz without exhausting all of his energy.) it follows that −d−1 ≤ lwubb π . The vector −di is always a lower estimate . Γ (D ) then lwubb π (v ) = ∞. All in all. it holds that −d(v ) = lwubb π (v ). d−1 is taken from the output of the previous iteration and the fact that it satisfies the conditions will be shown later.e. the procedure over-approximates the set Bz by the set B0 of vertices v with d−1 (v ) = 0 that have an edge (v. . Max does not need any initial energy to win a play starting from the appropriate vertex (Please note that Min has no choices in Γπ (D)). we get d−1 (v0 ) ≥ d−1 (vk ) + w(p). lwubb π (v ) is equal to the absolute value of the weight of a longest path from v to Bz in Gπ (D) such that the weight of each suffix of the path is greater or equal to −b. i.) and (ii. The energy level never drops below zero in the play. vk ) from D \ A to A in Gπ (D). To get some idea about why this holds consider a play winning for Max.) it follows that for each j ∈ {0. . Therefore. weights of its prefixes cannot even be bounded from below. because by (i. k − 1}. If we sum the inequalities. Furthermore. So the minimal sufficient energy to win is the minimal energy that Max needs to get to some vertex in Bz . and so there must be a moment from which onwards the energy level never drops below the energy level of that moment. Recall that D = {v ∈ V | d−1 (v ) > −∞}.. If each path from v to Bz has a suffix of weight less than −b or no path from v to Bz exists.592 L. . d−1 (vk ) = 0 and the inequality becomes d−1 (v0 ) ≥ w(p). vj +1 ). if the infinite path does not contain a vertex from A.) and (ii. Since vk ∈ A.).) trivially hold in the first iteration of the main algorithm. Initially. Therefore. . each infinite path in Gπ (D) starting from v ∈ D and containing a vertex from A has a prefix of weight less or equal to d−1 (v0 ). Max cannot win. for d−1 = 0. all cycles in Γ (D ) Gπ (D \ A) are negative. for each vertex v ∈ D \ Bz . . For the vertices in D \ Bz . the set of vertices where Max does not need any initial energy to win.

and so −di increases. i. u) − di−1 (v ) + di−1 (u) ≥ 0. In particular. in case i = 0. the vertices v with di (v ) = 0 such that for each edge (v. u) − di (v ) + di (u) < 0 are removed from the set of candidates. Evaluation of a strategy.Using Strategy Improvement to Stay Alive 593 1 2 3 4 5 6 7 8 9 proc EvaluateStrategy(Γ. b. u). it always holds that −di ≤ lwubb π . di decreases. Dijkstra’s algorithm requires all edgeweights be non-positive (Please note that we are computing longest paths). for each v ∈ Bi . On line 5. until Γ (D ) −di = lwubb π . we remove from Bi−1 each vertex v that does not have an outgoing edge (v. Since Bi is an over-approximation of Bz . The reason is that since di (v ) = 0. which is always non-positive for the relevant edges. the procedure uses a variant of the Dijkstra’s algorithm to compute the weights of the longest paths from all vertices to Bi on line 4. di−1 ) i := i + 1 Bi := Bi−1 \ {v | max(v. di (v ) = 0. The weights of the longest paths are assigned to di . then more than zero units of initial energy are needed at v . y ) is w(x. of lwubb π . b. the absolute values of the weights of the longest paths Γ (D ) are a lower estimate of lwubb π . Since edge-weights are arbitrary integers. During the execution of the procedure. only vertices v with di (v ) = 0 are candidates for the final set Bz . Therefore. Bi .u)∈Eπ (w(v. The Dijkstra’s algorithm is also modified so that it assigns −∞ to each v ∈ D such that each path from v to Bi has a suffix of weight less than −b. 1.. d−1 ) i := 0. u) − di−1 (v ) + di−1 (u)) < 0} od return di−1 end Fig. Therefore. In each iteration. π. and in the subsequent iterations it follows from the properties of longest path weights and the fact that only vertices with all outgoing edges negative with the potential transformation are removed from the candidate set. and so if the edge (v.). vertices from V \ D have di equal to −∞. As vertex potentials we use di−1 . which contains the longest path weights computed in the previous iteration. u) − d−1 (v ) + d−1 (u)) ≥ 0} while i = 0 ∨ Bi−1 = Bi do di := Dijkstra(Gπ . B0 := {v ∈ V | d−1 (v ) = 0 ∧ max(v. Another iteration Γ (D ) Γ (D ) . u) is chosen in the first step. the vertices from which Bi is not reachable or is reachable only via paths with suffixes of weight less than −b have di equal to −∞. and on line 6. u) − di (u) > 0. In the first iteration of the main algorithm it follows from the condition (ii. Also. A detailed description of Dijkstra() can be found in the Appendix. we apply potential transformation on them to make them non-positive. The transformed weight of an edge (x. the inequality can be developed to −w(v. or.u)∈Eπ (w(v. is given as input.e. However. the variable i is increased (thus the current longest path weights are now in di−1 ). it holds that w(v. y ) − di−1 (x) + di−1 (y ). u) such that w(v.

The following theorem states the correctness of EvaluateStrategy(). is started only if Bi = Bi−1 . di−1 ) improvement := f alse i := i + 1 πi := πi−1 foreach v ∈ VMin do if di−1 (v ) > −∞ then foreach (v. A formal proof can be found in the full version of this paper [4]. b.594 L. then Bi = Bi−1 and the algorithm terminates and returns di−1 as output. Let (Γ. since Bi ⊆ V loses at least one element in each iteration. b) i := 0. πi . b). The pseudo-code corresponds to the high-level description of the algorithm given at the beginning of this section. and finally let d−1 ∈ (Z ∪ {−∞}) be such that for A = {v ∈ V | d−1 (v ) = 0} and D = {v ∈ V | d−1 (v ) > −∞}. and initializing the lower estimate of lwubb to vector of zeros . improvement := true fi od fi od od return − di−1 end Fig. d−1 ) Γ (D ) it holds that for each v ∈ D. Brim & J. Chaloupka 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 proc LowerWeakUpperBound(Γ.) hold. Then for d := EvaluateStrategy(Γ. The output of the algorithm is the vector lwubΓ b .) and (ii. An intuition why it holds was given above. the conditions (i. It starts by taking an arbitrary strategy Γ from ΠΓ M on line 2. If no vertex is removed on line 6. The complexity of EvaluateStrategy() is O(|V | · (|V | · log |V | + |E |)). d(v ) = −lwubb π (v ). b) be an instance of the lower-weak-upper-bound problem. Let V further π ∈ ΠΓ M be a positional strategy of Min. In Figure 2. b. π. u) ∈ E do if di−1 (v ) > di−1 (u) + w(v. 2. The algorithm proceeds in iterations. u) then πi (v ) := u. Theorem 1. Each iteration takes O(|V | · log |V | + |E |) because of the complexity of the procedure Dijkstra() and the number of iterations of the while loop on lines 3–7 is at most |V |. The input to the algorithm is a lower-weak-upper-bound problem instance (Γ. there is a pseudo-code of our strategy improvement algorithm for solving the lower-weak-upper-bound problem using EvaluateStrategy(). Solving the lower-weak-upper-bound problem. π0 := Arbitrary strategy from ΠΓ M d−1 := 0 improvement := true while improvement do di := EvaluateStrategy(Γ.

we will use the symbols π . v2 . v1 . v3 . Initially. namely. there is an example of a run of our algorithm KASI on a simple MPG. d. and so EvaluateStrategy() takes {v1 . The whole algorithm KASI is illustrated on Example 2. B . Let π = π 2 be the first selected strategy. There are three vertices in Gπ (D) with non-negative edges emanating from them. on line 3.s) shows a state of the computation right after update of the vector d by Dijkstra(). Example 2. Each figure denoted by (r. In Figure 3.Using Strategy Improvement to Stay Alive −10 v4 v2 0 −12 0 0 0 −10 v1 0 0 v3 20 v4 −10 0 −12 v2 0 −10 −10 v1 0 0 v3 20 −10 595 −10 v2 0 v3 −10 v1 20 v4 −∞ −12 B B (a) −10 v2 −10 −12 0 −10 −10 v1 0 0 v3 20 v4 −∞ v2 −12 −12 0 (0.0) B B (1. Also. and s is the value of the iteration counter i of EvaluateStrategy(). Circles are Max’s vertices and the square is a Min’s vertex. Edges that do not belong to the current strategy π of Min are dotted. b). Min has only two positional strategies. namely. where π 1 (v3 ) = v1 and π 2 (v 3) = v4 . Detailed description of the progress of the algorithm follows. Figure 3 illustrates the progress of the algorithm. v4 }. we mean the weight with the potential transformation by d.1) (2.0) Fig. d = 0. π 1 and π 2 . For simplicity. Let us denote the MPG by Γ. In each figure. 3. and the set D of vertices with finite d value is not even explicitly used. and D without indices. At that point.0) −10 v3 −∞ −10 v1 0 0 20 v4 −∞ (1. and D = {v1 . the d value of each vertex is shown by that vertex. π = π 2 . The following lemmas and theorem establish the correctness of the algorithm. v3 . v2 . Then it alternates strategy evaluation (line 6) and strategy improvement (lines 10–18) until no improvement is possible at which point the main while-loop on lines 5–19 terminates and the final d vector is returned on line 20. if we speak about a weight of an edge. v3 } as the first set . although in pseudo-codes these symbols have indices. v2 . Example of a Run of KASI (b = 15). r is the value of the iteration counter i of LowerWeakUpperBound(). The MPG is in Figure 3(a). it holds that for each v ∈ V . −di−1 (v ) = lwubΓ b (v ). let b = 15 and consider a lower-weakupper-bound problem given by (Γ.

y ) satisfying d(x) > d(y ) + w(x. i. 12. y ).1)). We have already used the first one: If p is a path from v to u such that for each edge (x.e. thus it is also removed from the set B and the vector d is recomputed again (Figure 3 (1. Now the vertex v3 does not have a non-negative edge emanating from it. y ) in the path it holds that d(x) ≥ d(y ) + w(x. then 0 ≥ w(c). This completes the first iteration of KASI and another one is started to evaluate and possibly improve the new strategy π . y ) emanating from a vertex from D \ B satisfies d(x) ≥ d(y ) + w(x. Lemma 3. The following lemma states that the d vector in LowerWeakUpperBound() is decreasing. and so lwubΓ b = −d = (0. only the vertices with each outgoing edge (x. The proof uses the following facts. y ). y ). then d(v ) > d(u) + w(p). conditions (i. The second fact is similar: If c is a cycle such that for each edge (x. are trivially satisfied in the first iteration of LowerWeakUpperBound(). and if for some edge the inequality is strict.) and (ii. After the vector d is updated so that it contains longest path weights to B (Figure 3 (0. Only vertices with all the outgoing edges negative with the potential transformation are removed from the set B . then d(v ) ≥ d(u) + w(p). A formal proof of Lemma 3 can be found in the full version of this paper [4]. No further improvement of π is possible. so it is removed from the set B and the longest path weights are recomputed (Figure 3 (1. Using the facts from . The strategy improvement condition is satisfied for the edge (v3 .0)). because v3 cannot reach the set B . Please note that the only path from v4 to B has a suffix of weight less than −b. Every time line 6 of LowerWeakUpperBound() is reached.0). The strategy improvement condition is satisfied for the edge (v3 . Brim & J. The update to d causes that v2 does not have a non-negative edge. and so each edge (x.) and (ii. then the cycle is strictly negative. ∞). Every time line 7 of LowerWeakUpperBound() is reached and i > 0. as already mentioned. The vertex v3 has d(v3 ) = −∞. because the set D is now smaller. and if for some edge the inequality is strict. all vertices in B still have non-negative edges.) remain satisfied. conditions (i. However. The d values of vertices from D are the weights of the longest paths to B . and so d(v4 ) = −∞ and v4 is removed from the set D. This finishes the strategy evaluation and the strategy improvement follows. and di−1 satisfy the assumptions of Theorem 1. The assumptions of Theorem 1. v4 ). Using these facts we can now give an intuitive proof of Lemma 3. for the following reasons. b. and so the strategy evaluation terminates and the strategy improvement phase is started. which also results in the removal of v3 from D.596 L. and the subsequent lemma uses this fact to prove that LowerWeakUpperBound() terminates. Chaloupka B .. πi . it holds that di < di−1 . y ) in the cycle it holds that d(x) ≥ d(y ) + w(x. v1 ) and so π is improved so that π = π 1 . During the execution of EvaluateStrategy(). and so the strategy π 2 is selected as the current strategy π again.0)). The evaluation of the strategy π results in the d vector as depicted in Figure 3 (2. this is not the same situation as at the beginning. ∞.). Γ. y ).

she sends the play to v1 . this is not true. Min can play optimally using the sequence of positional strategies computed by our algorithm. . Example 2 witnesses this fact. and it is the best she can do. If the play starts or gets to a vertex with infinite final d . The optimal strategy of Max is constructed from the final longest path forest computed by the procedure Dijkstra() and the non-negative (with potential transformation) edges emanating from the final set B . since it is the weight of some path in G with no repeated vertices (Except for the case when di (v ) = −∞. In order to complete the intuition. This follows from the fact that the new edges introduced by the strategy improvement are negative with the potential transformation. to guarantee that Max loses from v3 . and Min’s strategy that we will define ensures that Max cannot do with smaller amount of initial energy. b). Unfortunately. . The procedure LowerWeakUpperBound() always terminates. In particular. as stated by Theorem 1. As a result. di (v ) is bounded from below by −(|V | − 1) · W . the strategy ensures that Max will eventually go negative or traverse a path segment of weight less than −b with arbitrarily large amount of initial energy. Max loses. it remains to show why the conditions still hold after the strategy improvement and why the strategy improvement results in the decrease of the d vector.Using Strategy Improvement to Stay Alive 597 the previous paragraph. Min uses the sequence in the following way: if the play starts from a vertex with finite final d value and never leaves the set of vertices with finite final d value. Min first sends the play from v3 to v4 and when it returns back to v3 . Lemma 4. then Min uses the last strategy in the sequence. . none of the two positional strategies of Min guarantees that Max loses from v3 . Since di is a vector of integers. In Example 2. From the existence of such strategies it follows that for each v ∈ V . Proof: By Lemma 3. be the sequence of positional strategies computed by the algorithm. The optimal strategy of Min is more complicated. The key idea of the proof is to define strategies for both players with the following properties. This guarantees the termination. In general. states the correctness of our algorithm. Max’s strategy that we will define ensures that for each vertex v ∈ V . However. we can conclude that all newly formed cycles in Gπ (D \ B ) are negative and the weights of the longest paths to B cannot increase. Let ds := LowerWeakUpperBound(Γ. Indeed. Unlike Max. ds(v ) = lwubΓ b (v ). ds(v ) is a sufficient amount of initial energy no matter what his opponent does. a path of weight −20 is traversed and since b = 15. For each v ∈ V . but this is obviously not a problem). no infinite chain of improvements is possible. There is a theorem in [2] which claims that Min can restrict herself to positional strategies. and both strategies are optimal with respect to the lower-weakupper-bound problem. let π0 . Theorem 5. di decreases in each iteration. Min sometimes needs memory. π1 . Our main theorem. for vertices with ds(v ) = ∞.

By Lemma 4. v1 . Min never switches to a strategy with higher index. Please note that. let σ (v ) = u such that (v. all references to lines in pseudocode are references to lines in the pseudocode of LowerWeakUpperBound() in Figure 2. Such a vertex u exists by Theorem 1. b). Let Dj = {v ∈ V | dj (v ) > −∞}. ds(v ) = lwubΓ b (v ). the procedure LowerWeakUpperBound(Γ. vj +1 ) ≥ 0)∧ n2 −1 (∀n1 .) ∧ n−1 (∀n ∈ N)(x + j =0 w(vj . v2 . then ds(v ) is a sufficient amount of initial energy for plays starting from v . Such a vertex u exists by Theorem 1. At that moment Min switches to the appropriate strategy. Brim & J. π ′ ) = (v = v0 . hence for each v ∈ VMin ∩ Di−1 .) ∧ n−1 (∀n ∈ N)(x + j =0 w(vj . u) ∈ E and di−1 (v ) = w(v. it holds that for each (v. i − 1}. σ. In particular. Let us first find the strategy σ . n2 ∈ N0 )(n1 < n2 ⇒ j =n1 w(vj . vj +1 ) ≥ −b) ) } . In this situation it holds that ds = −di−1 . choose σ (v ) to be an arbitrary successor of v . then Max needs at least ds(v ) units of initial energy. Theorem 5. vj +1 ) ≥ 0) ∧ n2 −1 (∀n1 . We will show that there is a positional strategy of Max σ ∈ ΣΓ M and there is a strategy of Min π ∈ ΠΓ such that the following holds for each v ∈ V : ds(v ) ≥ min{x ∈ N0 | (∀π ′ ∈ ΠΓ ) ( outcomeΓ (v. b) terminates and gives us the vector ds. Let ds := LowerWeakUpperBound(Γ. . u) ∈ E and w(v. n2 ∈ N0 )(n1 < n2 ⇒ j = n1 w(vj . Let further B = {v ∈ V | di−1 (v ) = 0} and let σ ∈ ΣΓ M be the following strategy of Max. . by Lemma 3. u) − di−1 (v ) + di−1 (u) ≥ 0. Since the main while-loop on lines 5–19 has terminated. . For v ∈ (VMax \ B ) ∩ Di−1 . This . v2 . we get that ds(v ) = lwubΓ b (v ). w(v.598 L. she uses the strategy that caused that the d value of that vertex became −∞. By putting (1) and (2) together. Proof: In the whole proof. π ) = (v = v0 . . vj +1 ) ≥ −b) ) } (1) ds(v ) ≤ min{x ∈ N0 | (∃σ ′ ∈ ΣΓ ) ( outcomeΓ (v. for each j ∈ {−1. there is no vertex in VMin ∩ Di−1 that satisfies the strategy improvement condition. . D−1 ⊇ D0 ⊇ · · · ⊇ Di−1 . . let σ (v ) = u such that (v. For v ∈ VMax ∩(V \Di−1 ). but only until the play gets to a vertex that was made infinite by some strategy with lower index. . Chaloupka value. σ ′ . . . u) − di−1 (v ) + di−1 (u) ≥ 0. (2) The inequality (1) says that if Max uses the strategy σ . then for each v ∈ V . v1 . And finally. for v ∈ B ∩ VMax . 0. The inequality (2) says that if Min uses the strategy π . . Let us consider the situation just after termination of the main while-loop. u) ∈ E . u)+ di−1 (u).

vj +1 ). Therefore. . vk ) ∈ pathG such that vk ∈ VMin . π (p) = πj (vk ). . . If there is no vertex in p with infinite di−1 value.) contains a vertex from V \ Di−2 . v1 . v2 . we get that di−1 (vn1 ) ≤ di−1 (vn2 ) + j = n1 w(vj . For each path p = (v0 . it holds that −b ≤ di−1 (vn1 ). The reason is that for each v ∈ V . ds = lwubb i−1 . Therefore. . vk ) of p. . . the strategy π makes the same choice as the first positional strategy from the sequence (π0 . di−1 (vn2 ) ≤ 0. (∀j ∈ N0 )(di−1 (vj ) ≤ di−1 (vj +1 ) + w(vj . By Theorem 1. Γπ (Di−2 ) . . . . . and so n2 −1 di−1 (vn1 ) − di−1 (vn2 ) ≤ j =n1 w(vj . . then the inequality (2). it holds that di−1 (vk ) ≤ 0. It holds Γπ (Dk−1 ) that dk (v∞ ) = −∞. the play starts from v ∈ Di−2 . by Theorem 1. . Let us denote the outcome by p. That is.) satisfies the following. by Theorem 1. From the vertex v∞ onwards. and the inequality (1) is proved. . π ′ ) = (v = v0 . We have that if Min uses π and the play leaves Di−2 then Max loses. vj +1 ). and so we do not have to consider the case when the minimum is equal to ∞. i − 1} | (∃y ∈ {0. v2 . σ. . and so. Let us denote the outcome by p and let us prove that it is losing for Max. and so p is losing for Max too. min{x ∈ {0. the suffix of p starting from v∞ is losing for Max. . lwubb k (v∞ ) = ∞. . . there is no edge from Di−1 (vertices with finite di−1 value) to V \ Di−1 (vertices with infinite di−1 value). k })(dx (vy ) = −∞)}). then d(v ) ≥ −(|V | − 1) · W . v1 . . Therefore. By the same reasoning as in the pren2 −1 vious paragraph. Let k = min{x ∈ {0. The algorithm has a pseudopolynomial time complexity: O(|V |2 · (|V | · log |V | + |E |)·W ). vj +1 )) . then π makes the same choice as the strategy πi−1 . It follows that k < i − 1. Let n1 . . From the definition of k . . we have di−1 (v ) ≤ w(pk ). it remains to show that p does not contain a segment of weight less than −b. if d(v ) > −∞. . This completes the proof of the theorem. . and there is vertex v∞ from Dk−1 \ Dk in the path p. π ) = (v = v0 . πi−1 ) that is responsible for making one of the vertices in p losing for Max. . outcomeΓ (v. . Please note that. .Using Strategy Improvement to Stay Alive 599 implies that in the graph Gσ . . There is some vertex from V \ Di−2 in p. if we show that each play that leaves Di−2 is losing for Max. will be proved. It follows that for each prefix pk = (v = v0 . it also follows that the play never leaves Dk−1 . . and stays in Di−2 (and so π follows the strategy πi−1 ). Let v ∈ V and σ ′ ∈ ΣΓ such that the outcome outcomeΓ (v. Therefore. . and since by Theorem 1. . To prove (1). Min uses the strategy πk . That is. σ ′ . di−1 (vn1 ) − di−1 (vn2 ) ≥ −b. n2 ∈ N0 be such that n1 < n2 . It takes O(|V |2 ·W ) iterations until the while-loop on lines 5–19 terminates. for each v ∈ Di−1 and π ′ ∈ ΠΓ . . then Max needs at least ds(v ) units of initial energy. it holds that di−1 (v ) ≤ di−1 (vk ) + w(pk ). if Min uses the strategy π . Let us now find the strategy π . where j = min(i − 1. and hence the theorem. i − 1} | (∃y ∈ N0 )(dx (vy ) = −∞)}.

the fourth algorithm is our algorithm KASI. The algorithm can even be improved so that its complexity is O(|V | · (|V | · log |V | + |E |) · W ). The shortcoming of VI is that it takes enormous time before the vertices with infinite lbΓ and lwubΓ b value are identified. plus dk (v ). The first is value iteration [8. We will now briefly describe the algorithms based on VI.u)∈E max(0. . if considered separately. and then computes d1 . Brim & J. greater or equal to zero in a k -step play. For the lower-weak-upper-bound problem. . Chaloupka because d(v ) is the weight of some path with no repeated vertices. which is slightly better than the complexity of KASI. Let (Γ. Finally. the same technique can be used to improve the complexity of the algorithm of Bj¨ orklund and Vorobyov so that the complexities of the two algorithms are the same. b) be an instance of the lower-weak-upper-bound problem. The computation continues until two consecutive d vectors are equal. u)) if v ∈ VMin ∧ x ≤ b   ∞ otherwise . d2 . Detailed description of the technique can be found in the full version of this paper [4]. The last d vector is then the desired vector lwubΓ b . di (u) − w(v. The second and the third are combinations of VI with other algorithm. Experimental Evaluation Our experimental study compares four algorithms for solving the lower-bound and the lower-weak-upper-bound problems. VI starts with d0 (v ) = 0. takes O(|V | · (|V | · log |V | + |E |)). according to the following rules:  if v ∈ VMax ∧ x ≤ b   x = min(v. Each iteration. dk (v ) is the minimum amount of Max’s initial energy that enables him to keep the sum of traversed edges. For the lower-bound problem. so one would say that the overall complexity should be O(|V |3 · (|V | · log |V | + |E |) · W ). the . the algorithm solves the lower-bound problem. hence the amortized complexity of one iteration is only O(|V | · log |V | + |E |).600 L. If b = (|V | − 1) · W . However. . which was improved in [8. 12] to O(|V | · |E | · W ). even between two distinct calls of the evaluation procedure. This is accomplished by an efficient computation of the vertices which will update their d value in the next iteration so that computational time is not wasted on vertices whose d value is not going to change. Interestingly enough. for each v ∈ V . the number of elements of the set Bi in EvaluateStrategy() never increases. This is why we first compute the vertices with ν < 0 by some fast MPG solving algorithm and then apply VI on the rest of the graph. and so the d vector can be improved at most O(|V |2 · W ) times. the vertices with ν < 0 are exactly the vertices with infinite lbΓ value.u)∈E max(0. di (u) − w(v. The complexity of the straightforward implementation of the algorithm is O(|V |2 · |E | · W ). 12] (VI). It is easy to see that for each v ∈ V and k ∈ N0 . u)) di+1 (v ) = x = max(v. 4.

We even improved the complexity of the deterministic algorithm from O(|V |2 · |E | · W ) to O(|V | · (|V | · log |V | + |E |) · W ) using the same technique as for the improvement of the complexity of KASI which is described in the full version of this paper [4]. all results of BV in this paper are the results of the improved BV. solving the two problems by repeated application of BV and SW would lead to higher runtimes than the ones of KASI. According to our experiments. but still the preprocessing sometimes saves a lot of time in practice. we decided not to obey the restrictions and use only the “deterministic part” of the algorithm. because that paper focuses on parallel algorithms and the computation of the exact ν values. If the restrictions are not obeyed. BV and SW do not run faster than KASI. This is why we compared KASI only with the algorithm VI and the two combined algorithms: VI + BV and VI + SW. It is also not helpful for the lower-weak-upper-bound problem for small bound b. some restrictions had to be imposed. and the strategy improvement steps in SW are optimal in a certain sense. It might seem that this is in contradiction with the title of Schewe’s paper [19]. Since the results of the improved BV were significantly better on all input instances included in our experimental study.Using Strategy Improvement to Stay Alive 601 vertices with ν < 0 might be a strict subset of the vertices with infinite lwubΓ b value. The complexity of SW is O(|V |2 · (|V | · log |V | + |E |) · W ). then BV/SW has to be executed Θ(|V | · log(|V | · W )) times to solve the lower-bound problem. and the main restriction was that in each strategy improvement step the strategy could be improved only for one vertex. partly published in [7]. According to our experiments. the fastest algorithms in practice for dividing the vertices of an MPG into those with ν ≥ 0 and ν < 0 are the algorithm of Bj¨ orklund and Vorobyov [1] (BV) and the algorithm of Schewe [19] (SW). More precisely. and so the complexities of the combined algorithms are the complexities of BV and SW. BV runs faster. but to the strategy improvement technique. . To prove that the algorithm is sub-exponential. We note that any algorithm that divides the vertices of an MPG into those with ν ≥ 0 and those ν < 0 can be used to solve the lower-bound and the lower-weakupper-bound problem with the help of binary search. Therefore. The fact that they are the fastest does not directly follow from [7]. We used only the modified BV algorithm in our experimental study. but it requires the introduction of auxiliary edges and vertices into the input MPG and the repeated applications of the algorithm. If we use the reduction technique from [2]. The term “optimal” in the title of the paper does not refer to the complexity. The original algorithm BV is a sub-exponential randomized algorithm. SW is also a strategy improvement algorithm. The complexities of BV and SW exceed the complexity of VI. BV is also a strategy improvement algorithm. Therefore. and Θ(|V | · log b) times to solve the lower-weakupper-bound. It is obvious that on MPGs with all vertices with ν ≥ 0 the preprocessing does not help at all.

Input MPGs We experimented with completely random MPGs as well as more structured synthetic MPGs and MPGs modeling simple reactive systems. Brim & J. or the lower-weak-upper-bound problem for the corresponding MPG. The generator TOR was used for generation of the families “sqnc”. we adjusted each MPG generated by TOR so that the ν value of each vertex is close to 0. namely SPRAND [10] and TOR [9].1.602 L. The sqnc and lnc families are 2-dimensional grids with wrap-around. . As for the MPGs modeling simple reactive systems. Like for the SPRAND generated inputs. we created three parameterized models. we also find out the minimal sufficient amount of initial energy. Each of these MPGs contains |E | = x · |V | edges and consist of a random Hamiltonian cycle and |E | − |V | additional random edges. depending on whether there is some upper bound on the robot’s energy. If it was. The first is called “collect” and models a robot on a ground with obstacles which has to collect items occurring at different locations according to certain rules. We also want to know the minimal sufficient initial amount of the material. The synthetic MPGs were generated by two generators. To make these inputs harder for the algorithms. The generator SPRAND was used to generate the “randx” MPG family. However. while the pnc family contains layered networks embedded on a torus. We divided them uniformly at random. We note that the energy is not a part of the states of the model. We also created subfamilies of each of the three families by adding cycles to the graphs. “lnc”. the problem would be much simpler. The MPGs modeling simple reactive systems we created ourselves. For each “good” initial configuration. and “pnc”. and so we had to divide vertices between Max and Min ourselves. The second model is called “supply” and models a truck which delivers material to various locations the selection of which is beyond its control. We could simply compute the set of states from which Min has a strategy to get the play to a state where the robot has zero energy and it is not in the docking station. making the energy part of the states would cause an enormous increase in the number of states and make the model unmanageable. By solving the lower-bound. 10000]. with weights chosen uniformly at random from [1. we subtracted a constant from each edge-weight so that the ν value of each vertex is close to 0. and we also get some strategy which ensures it. For more information on these inputs we refer the reader to [14] or [7]. in each of them. Moving and even idling consumes energy. and so the robot has to return to its docking station from time to time to recharge. we find out from which initial configurations the robot has a strategy which ensures that it will never consume all of its energy outside the docking station. The outputs of these generators are only directed weighted graphs. The goal is to never run out of the material so that the truck is always able to satisfy each delivery request. downloadable from [15]. Chaloupka 4.

Numbers of vertices and edges. They all contain 218 vertices. The MPGs prefixed by “sqnc”. running GNU/Linux kernel version 2. with 218 vertices – no suffix.2. and “taxi” are the models of simple reactive systems created by ourselves. and so for b ≪ (|V | − 1) · W . In addition. in thousands. and with 220 vertices – suffix “h”. Its operation costs money and the taxi also earns money. there are several other interesting points. very simplified. all vertices in our inputs have infinite lwubΓ b value. This was true for all values of b that we tried.00GHz processors and 16GB of RAM. we tried two different values of parameters. the combined algorithms are often slower than VI alone. Therefore.2 with the “-O2” option. which seems to be a reasonable amount so that the results provide insight. For each model. the BV and SW parts of VI + BV and VI + SW always perform the same work. The first table contains the results for the lower-bound problem. and for some inputs it was significantly faster. To get MPGs of manageable size. the algorithms essentially solve the lower-bound problem. However.6. and “pnc” were generated by the TOR generator. Each MPG used in the experiments has four columns in each table. which contains a bound b as a part of the input. Finally. with 219 vertices – suffix “b”. “lnc”. we selected as b the average lbΓ value of the vertices with finite lbΓ value divided by 2. We tried various values of b. We note that smaller b makes the computation of VI and KASI faster. Tables 1 and 2 show that the algorithm KASI was the fastest on all inputs for the lower-bound problem. 4.Using Strategy Improvement to Stay Alive 603 The third model is called “taxi” and models a taxi which transports people at their request. All algorithms were implemented in C++ and compiled with GCC 4.26. excluding the time for reading input. and they become very easy to solve. are in brackets. Tables 1 and 2 give the results of our experiments. If the bound is too low. Both for the rand5 and rand10 family. The goal is to never run out of money. and for this paper. “supply”. The first column of each table contains names of the input MPGs. and so the runtimes are practically the same as for the lower-bound problem. the second table contains the results for the lower-weak-upper-bound problem. . Results The experiments were carried out on a machine equipped with two dual-core Intel r Xeon r 2. the results clearly suggest that KASI is the best algorithm. The term “n/a” means more than 10 hours. of course. Each column headed by a name of an algorithm contains execution times of that algorithm in seconds. we experimented with three sizes of graphs. The MPGs prefixed by “rand” were generated by the SPRAND generator. If the bound is too high. and we also want to know the minimal sufficient initial amount of money. the MPGs prefixed by “collect”.3. but they are still much closer to real world problems than the synthetic MPGs. For the lower-weak-upper-bound problem it was never slower than the fastest algorithm by more than a factor of 2. the models are. namely.

30 105.63 n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a 996.29 1.16 36. and taxi1–2.99 3.36 27.79 57.72 449. for the lower-weak-upper-bound problem for the bound we selected.24 111.49 23.66 25.80 3.03 65.49 14.47 13.83 11.29 86.46 1027.31 10. However.08 15.60 14.39 25. the difference was never significant.55 39.28 11.77 3.54 11.27 25.35 5.71 4. collect1–2.32 93.56 6956. Runtimes of the experiments for the lower-bound problem (in seconds).69 20.45 16.79 3.30 3.63 4. which takes more .96 43.00 VI + BV 31. and it was mostly caused by the initialization phase of the algorithms.54 29.68 1032.14 3.27 5.48 4.34 1.22 13.55 10.12 109.91 4.68 5.85 7.01 1.37 17. all input MPGs had vertices with ν < 0.34 10.05 10.604 L.01 36.40 20. The preprocessing by BV and SW reduces the execution time by orders of magnitude for these MPGs.18 9.23 28046.64 6.89 63.08 12.24 29.48 26.42 33.89 11. VI was even faster than KASI on a lot of inputs.70 1.89 24.08 338. On the other hand.03 VI + SW 55.30 16.55 367.30 21. VI is often very fast and the preprocessing slows the computation down in most cases.69 140.29 VI is practically unusable for solving the lower-bound problem for MPGs with some vertices with ν < 0.54 11.09 21.09 19.06 59.12 352.48 10.58 8.86 3.16 7. lower-bound MPG sqnc01 sqnc02 sqnc03 sqnc04 sqnc05 lnc01 lnc02 lnc03 lnc04 lnc05 pnc01 pnc02 pnc03 pnc04 pnc05 rand5 rand5b rand5h rand10 rand10b rand10h collect1 collect2 supply1 supply2 taxi1 taxi2 (262k 524k) (262k 524k) (262k 525k) (262k 532k) (262k 786k) (262k 524k) (262k 524k) (262k 525k) (262k 528k) (262k 786k) (262k 2097k) (262k 2097k) (262k 2098k) (262k 2101k) (262k 2359k) (262k 1310k) (524k 2621k) (1048k 5242k) (262k 2621k) (524k 5242k) (1048k 10485k) (636k 3309k) (636k 3309k) (363k 1014k) (727k 2030k) (509k 979k) (509k 979k) VI n/a n/a n/a n/a n/a 60.89 4.45 67.80 3. Brim & J.51 KASI 17. Except for lnc01–02. Chaloupka Table 1.64 1.43 110.

17 KASI 9.99 5.53 26.41 1.88 0.15 19.24 59.48 37.05 2. Runtimes of the experiments for the lower-weak-upper-bound problem (in seconds).53 7.38 531.70 3. which consumes a very small amount of energy per time unit. it cannot survive by idling forever.19 53.52 102. especially from the “collect” family.38 1.00 43.51 2.06 3.45 2.17 11.70 13. and so until the idling consumes at least as much energy as the minimal sufficient initial energy to keep the energy level non-negative forever.52 87.65 3.41 15.43 1.39 5.98 1.54 33.39 105.27 36.17 2.49 1.77 1.04 2.04 3.98 2.55 10. new iterations have to be started.66 25.31 1.40 181.54 17.77 189.97 140.24 1.78 208.87 33.70 0.38 time for the more complex algorithm KASI.57 1.95 71. We .53 1.70 VI + BV 19.85 3. VI makes a lot of iterations for the inputs from the collect family.75 1. However.64 11.57 544.16 39.34 11.65 10.07 2.38 17.97 418.73 3.89 4.17 27.69 20.84 0.49 1. for some inputs.55 10.07 3.28 2. VI is very slow.45 110.97 1.49 VI + SW 43. Moreover.89 1.03 4.72 5.65 1.20 17. The i-th iteration of VI computes the minimal sufficient initial energy to keep the energy level non-negative for i time units. lower-weak-upper-bound MPG sqnc01 sqnc02 sqnc03 sqnc04 sqnc05 lnc01 lnc02 lnc03 lnc04 lnc05 pnc01 pnc02 pnc03 pnc04 pnc05 rand5 rand5b rand5h rand10 rand10b rand10h collect1 collect2 supply1 supply2 taxi1 taxi2 (262k 524k) (262k 524k) (262k 525k) (262k 532k) (262k 786k) (262k 524k) (262k 524k) (262k 525k) (262k 528k) (262k 786k) (262k 2097k) (262k 2097k) (262k 2098k) (262k 2101k) (262k 2359k) (262k 1310k) (524k 2621k) (1048k 5242k) (262k 2621k) (524k 5242k) (1048k 10485k) (636k 3309k) (636k 3309k) (363k 1014k) (727k 2030k) (509k 979k) (509k 979k) VI 13.98 17.31 25.03 16.35 7.08 15.49 14.72 30.85 24.20 8.34 27.23 29.41 10.88 2. because the robot can survive for a quite long time by idling.Using Strategy Improvement to Stay Alive 605 Table 2.68 7.87 4.53 1.64 24.48 1.17 8.65 1.82 563.52 4.55 23.06 1.52 3.68 0.17 1.07 11.37 5.

Formal Modeling and Analysis of Timed Systems. Chaloupka and L. R. Formal Methods in System Design. and J. 5. the lower-weakupper-bound problem. Brim. which we also improved by combining it with the algorithm of Bj¨ orklund and Vorobyov [1] (BV) and the algorithm of Schewe (SW). Srba. Discrete Applied Math. 85:277–311. and Formal Verification. Brno. Brim and J. Other inputs for which VI took a lot of time are: sqnc01. In Proceedings of the 5th Doctoral Wokrshop on Mathematical and Engineering Methods in Computer Science (MEMICS 2009). Raskin. Negative-cycle detection algorithms. pages 40–54. Doyen. Automata. Parallel algorithms for mean-payoff games: An experimental evaluation. [6] A. [5] L. 2010. Brim. [3] L. Springer. Shortest paths algorithms: Theory and experimental evaluation. is based on the strategy improvement technique which is very efficient in practice. 2007.606 L. Conclusion We proposed a novel algorithm for solving the lower-bound and the lower-weakupper-bound problems for MPGs. Brim & J. Finally. [9] B. Cherkassky. Chaloupka. [4] L. Our algorithm. Logics. Chaloupka. Technical Report FIMU-RS-2010-03. G. L. Using strategy improvement to stay alive. 2009. Brim and J. Chaloupka believe that this is a typical situation for this kind of application. Two additional results of this paper are the improvement of the complexity of BV. In Proc. Mathematical Programming. and M. . NOVPRESS. 12].-F. Mathematical Programming. J. Games. 2003. Fahrenberg. Faster algorithms for mean-payoff games. Springer.r. Infinite runs in weighted timed automata with energy constraints. we comment on the scalability of the algorithms. Bj¨ orklund and S.t. V. N. V. pages 117–133. 1996. [2] P. [10] B. [7] J. In Proc. supply1–2. T. To demonstrate that the algorithm is able to solve the two problems for large MPGs. pages 33–47. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. 155(2): 210–229. Chakrabarti. Faster algorithm for mean-payoff games. and the characterization of Min’s optimal strategies w. 1999. Resource interfaces. V. Gentilini. 2010. and so they are able to scale up to very large MPGs. K. European Symposium on Algorithms. Stoelinga. pages 45–53. EPTCS. Goldberg. and T. pages 599–610. [8] J. 2009. Henzinger. U. Chaloupka. Larsen. 38(2):97–118. References [1] H. In Proc. the runtimes of the algorithms increase no faster than the term |V | · |E |. V. Vorobyov. de Alfaro. KASI is the clear winner of the experimental study. 2011. lnc01–02. 2008. called Keep Alive Strategy Improvement (KASI). Masaryk University. we carried out an experimental study. Using strategy improvement to stay alive. Radzik. L. Chaloupka. In the study we compared KASI with the value iteration algorithm (VI) from [8. Cherkassky and A. Czech Republic. Faculty of Informatics. Markey.. In Proc. A. Goldberg. volume 2855 of LNCS. Bouyer. volume 5757 of LNCS. As the experiments on the SPRAND generated inputs suggest. 73:129–174. Embedded Software. Springer. and J.

8(2):109–113. 2009. The graph is denoted by Gπ to emphasize which graph the main algorithm KASI computes longest paths in. Mycielski. Computer Science Logic. Positional strategies for mean payoff games. The set S contains vertices from V with finite dp value.-F. Werneck. R. 12(5):359–370. Gurvich. Khachivan. and R. Raskin. [17] A. [18] Y. A. pages 369–384. Goldberg. and L.Using Strategy Improvement to Stay Alive 607 [11] J. The second part of the input is a bound b. [20] U. 2006. 158(1–2):343–359. Faster pseudopolynomial algorithms for mean-payoff games. The vector dp contains longest path weights computed in different setting.com/andrew/ soft. [19] S. USSR Comput. Pavlov. 2009. Georgiadis. In Proc. F. Doyen. Theoretical Computer Science. An experimental study of minimum mean cycle algorithms. and finally the fourth part is a vector of integers dp used for potential transformation of edge-weights. Each vertex from X is also put to the maximum priority queue q and its presence in the queue is recorded in the vector queued. An optimal strategy improvement algorithm for solving parity and payoff games. Math. which are returned as output. The complexity of mean payoff games on graphs. The input to Dijkstra() always guarantees that X ⊆ S . the tentative distance key (v ) of each vertex v ∈ S is initialized to −∞. http://www. and J. Tarjan. Karp. [12] L. International Journal of Game Theory.avglab. 2008. Lifshits and D.. 1979. Gaubert. A. Dijkstra() computes the difference between dp and the longest path weights in the current setting and then adds the difference to dp .html. thus obtaining the current longest path weights. Workshop on Algorithm Engineering and Experiments. to which the algorithm computes longest paths. [15] Andrew Goldberg’s network optimization library. Paterson. Bruxelles. E. S. 28(5):85–91. April 2011. Ehrenfeucht and J. volume 5213 of LNCS. [14] L. 1996. A. SIAM. Zapiski Nauchnyh Seminarov POMI. there is the pseudocode of the modified Dijkstra’s algorithm used in the strategy evaluation procedure EvaluateStrategy(). In Proc. 1988. 1966. and so the initialization phase on lines 4–8 sets key (v ) of each vertex v ∈ X to the (final) value 0. Appendix Modified Dijkstra’s algorithm In Figure 4. the third part is a set X . Fast exponential deterministic algorithm for mean payoff games. pages 1–13. Cochet-Terrasson and S. On line 3. Zwick and M. [13] A.120. Karzanov. 2006. Cyclic games and an algorithm to find minimax cycle means in directed graphs. The input to Dijkstra() consists of four parts. Belgium. Comptes Rendus Mathematique. 343:377–382. Universit´ e Libre de Bruxelles (ULB). R. The first part is a directed graph Gπ . Schewe. Springer. Dijkstra() works only with the vertices in the set S computed on line 2. [16] V. J. Hoffman and R. and Math. V. Gentilini. A policy iteration algorithm for zero-sum stochastic games with mean payoff. . On nonterminating stochastic games. G. The priority of a vertex v in the maximum priority queue q is key (v ). The algorithm computes the weights of the longest paths to the set X . Phys. M. V. 340:61–75. Technical Report 2009. Management Science.

4.608 L. and so do (v ) is set to −∞ too.enqueue(v ) queued(v ) := true od [q is a maximum-priority queue of vertices. X. the longest path weight do (v ) for each v ∈ S is computed as dp (v ) + key (v ). for each vertex v ∈ V \ S (dp (v ) = −∞). The vector do is then returned as output on line 24.dequeue() foreach (v.enqueue(v ). . where the priority of v is key (v )] while ¬q. do (v ) is simply set to −∞. w). u) ∈ Eπ do if v ∈ S ∧ v ∈ / X then tmp := key (u) + w(v. queued(v ) := true fi fi fi od od foreach v ∈ S do do (v ) := dp (v ) + key (v ) od foreach v ∈ V \ S do do (v ) = −∞ od return do end Fig. Brim & J. the updates of key (v ) that do not lead to path weight greater or equal to −b are ignored. For each v ∈ S . The main loop on lines 10–23 differs from the standard Dijkstra’s algorithm only in the following. Eπ . These are the vertices for which there is no path to X in Gπ (S ) or each such path has a suffix of weight less than −b. Modified Dijkstra’s algorithm.empty () do u := q. On line 23. Chaloupka 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 proc Dijkstra(Gπ = (V. dp ) S = {v ∈ V | dp (v ) > −∞} foreach v ∈ S do key (v ) := −∞ od foreach v ∈ X do key (v ) := 0 q. b. On line 22. u) − dp (v ) + dp (u) if dp (v ) + tmp ≥ −b ∧ tmp > key (v ) then key (v ) := tmp if ¬queued(v ) then q. Please note that key (v ) might be equal to −∞.

Copyright of International Journal of Foundations of Computer Science is the property of World Scientific Publishing Company and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. . users may print. download. However. or email articles for individual use.