
A Note on the Approximation of Mean-Payoff Games

Raffaella Gentilini

University of Perugia, Italy

Abstract. We consider the problem of designing approximation schemes for the values of mean-payoff games. It was recently shown that (1) mean-payoff games with rational weights scaled on [−1, 1] admit additive fully-polynomial approximation schemes, and (2) mean-payoff games with positive weights admit relative fully-polynomial approximation schemes. We show that the problem of designing additive/relative approximation schemes for general mean-payoff games (i.e., with no constraint on their edge-weights) is P-time equivalent to determining their exact solution.

1 Introduction

Two-player mean-payoff games are played on weighted graphs¹ with two types of vertices: in player-0 vertices, player 0 chooses the successor vertex from the set of outgoing edges; in player-1 vertices, player 1 chooses the successor vertex from the set of outgoing edges. The game results in an infinite path through the graph. The long-run average of the edge-weights along this path, called the value of the play, is won by player 0 and lost by player 1.

The decision problem for mean-payoff games asks, given a vertex v and a threshold ν ∈ Q, if player 0 has a strategy to win a value at least ν when the game starts in v. The value problem consists in computing the maximal (rational) value that player 0 can achieve from each vertex v of the game. The associated (optimal) strategy synthesis problem is to construct a strategy for player 0 that secures the maximal value.

Mean-payoff games were first studied by Ehrenfeucht and Mycielski in [1], where it is shown that memoryless (or positional) strategies suffice to achieve the optimal value. This result entails that the decision problem for these games lies in NP ∩ coNP [2, 18]; it was later shown to belong to UP ∩ coUP² [10]. Despite many efforts [19, 18, 13, 5, 6, 20, 9, 12], no polynomial-time algorithm for the mean-payoff game problems is known so far.

Besides such a theoretically engaging complexity status, mean-payoff games have plenty of applications, especially in the synthesis, analysis and verification of reactive (non-terminating) systems. Many natural models of such systems include quantitative information, and the corresponding questions require the solution of quantitative games, like mean-payoff games. Concrete examples of applications include various kinds of scheduling, finite-window online string matching, or more generally, the analysis of online problems and algorithms, as well as selection with limited storage [18]. Mean-payoff games can even be used for solving the max-plus algebra problem Ax = Bx, which in turn has further applications [6].

¹ In which every edge has a positive/negative (rational) weight.

² The complexity class UP is the class of problems recognizable by unambiguous polynomial-time nondeterministic Turing machines [14]. Obviously P ⊆ UP ∩ coUP ⊆ NP ∩ coNP.


Algorithm   Decision problem                                 Value problem                                                     Note
[12]        O(|E|·|V|·W)                                     O(|E|·|V|²·W·(log|V| + log W))                                    Deterministic
[18]        Θ(|E|·|V|²·W)                                    Θ(|E|·|V|³·W)                                                     Deterministic
[20]        O(|E|·|V|·2^|V|)                                 O(|E|·|V|·2^|V|·log W)                                            Deterministic
[9]         min(O(|E|·|V|²·W), 2^(O(√(|V|·log|V|)))·log W)   min(O(|E|·|V|³·W·(log|V| + log W)), 2^(O(√(|V|·log|V|)))·log W)   Randomized

Table 1. Complexity of the main algorithms to solve mean-payoff games.

Besides their applicability to the modeling of quantitative problems, mean-payoff games have tight connections with important problems in game theory and logic. For instance, parity games [8] and the model-checking problem for the modal mu-calculus [11] are poly-time reducible to mean-payoff games [7], and it is a long-standing open question whether these problems are in P.

Table 1 summarizes the complexity of the main algorithms for solving mean-payoff games in the literature. In particular, all deterministic algorithms for mean-payoff games are either pseudopolynomial (i.e., polynomial in the number of vertices |V|, the number of edges |E|, and the maximal absolute weight W, rather than in the binary representation of W) or exponential [19, 18, 13, 12, 20, 17]. The works in [9, 3] define a randomized algorithm which is both subexponential and pseudopolynomial. Recently, the authors of [15, 4] showed that the pseudopolynomial procedures in [18, 13, 12] can be used to design (fully) polynomial value approximation schemes for certain classes of mean-payoff games: namely, mean-payoff games with positive (integer) weights or rational weights with absolute value less than or equal to 1 (a sketch of this kind of pseudopolynomial value iteration is given below). In this paper, we consider the problem of extending such positive approximation results to general mean-payoff games, i.e., mean-payoff games with weights arbitrarily shifted/scaled on the line of rational numbers.
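To make the pseudopolynomial approach concrete, here is a minimal sketch in the spirit of the value iteration of Zwick and Paterson [18], on which the approximation schemes of [15, 4] build. The dictionary encoding of the game and the function name are illustrative assumptions, not notation from the cited papers; ν_k(v)/k approaches val^Γ(v) as k grows.

```python
# A minimal sketch of k-step value iteration for mean-payoff games,
# in the spirit of [18]. Assumed encoding (illustrative):
#   game = {vertex: (player, {successor: weight})}.
# nu_k(v) is the optimal total payoff of a k-step play from v
# (player 0 maximizes, player 1 minimizes); nu_k(v)/k approximates val(v).

def value_iteration(game, steps):
    nu = {v: 0 for v in game}                        # nu_0(v) = 0
    for _ in range(steps):
        nu = {
            v: (max if player == 0 else min)(w + nu[u] for u, w in succs.items())
            for v, (player, succs) in game.items()
        }
    return {v: nu[v] / steps for v in nu}

# Player 0 at 'a' may loop (weight 1) or move to 'b', where player 1 is
# forced to loop on weight -1: val('a') = 1 and val('b') = -1.
game = {
    'a': (0, {'a': 1, 'b': 0}),
    'b': (1, {'b': -1}),
}
print(value_iteration(game, 1000))                   # {'a': 1.0, 'b': -1.0}
```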

2 Preliminaries and Definitions

Game graphs. A game graph is a tuple Γ = (V, E, w, ⟨V0, V1⟩) where GΓ = (V, E, w) is a weighted graph and ⟨V0, V1⟩ is a partition of V into the set V0 of player-0 vertices and the set V1 of player-1 vertices. An infinite game on Γ is played for infinitely many rounds by two players moving a pebble along the edges of the weighted graph GΓ. In the first round, the pebble is on some vertex v ∈ V. In each round, if the pebble is on a vertex v ∈ Vi (i = 0, 1), then player i chooses an edge (v, v′) ∈ E and the next round starts with the pebble on v′. A play in the game graph Γ is an infinite sequence p = v0 v1 . . . vn . . . such that (vi, vi+1) ∈ E for all i ≥ 0. A strategy for player i (i = 0, 1) is a function σ : V*·Vi → V such that for all finite paths v0 v1 . . . vn with vn ∈ Vi, we have (vn, σ(v0 v1 . . . vn)) ∈ E. A strategy profile is a pair of strategies ⟨σ0, σ1⟩, where σ0 (resp. σ1) is a strategy for player 0 (resp. player 1). We denote by Σi (i = 0, 1) the set of strategies for player i.


A strategy σ for player i is memoryless if σ(p) = σ(p′) for all sequences p = v0 v1 . . . vn and p′ = v′0 v′1 . . . v′m such that vn = v′m. We denote by Σi^M the set of memoryless strategies of player i. A play v0 v1 . . . vn . . . is consistent with a strategy σ for player i if vj+1 = σ(v0 v1 . . . vj) for all positions j ≥ 0 such that vj ∈ Vi. Given an initial vertex v ∈ V, the outcome of the strategy profile ⟨σ0, σ1⟩ in v is the (unique) play outcomeΓ(v, σ0, σ1) that starts in v and is consistent with both σ0 and σ1. Given a memoryless strategy πi for player i in the game Γ, we denote by GΓ(πi) = (V, Eπi, w) the weighted graph obtained by removing from GΓ all edges (v, v′) such that v ∈ Vi and v′ ≠ πi(v).
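To fix intuitions, the following is a minimal sketch of the restriction GΓ(πi), under an assumed dictionary encoding of game graphs; the representation and names are illustrative, not from the paper.

```python
# A minimal sketch of G_Gamma(pi_i). Assumed encoding (illustrative):
#   game = {vertex: (player, {successor: weight})},
#   a memoryless strategy for player i = {vertex: chosen successor}.
# At every vertex of player i, all outgoing edges except the chosen one
# are removed; other vertices keep their edges.

def restrict(game, i, pi):
    return {
        v: (player, {pi[v]: succs[pi[v]]} if player == i else dict(succs))
        for v, (player, succs) in game.items()
    }

game = {'a': (0, {'a': 1, 'b': 0}), 'b': (1, {'b': -1})}
pi0 = {'a': 'a'}                      # player 0 always loops on 'a'
print(restrict(game, 0, pi0))         # {'a': (0, {'a': 1}), 'b': (1, {'b': -1})}
```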

Mean-payoff games. A mean-payoff game (MPG) [1] is an infinite game played on a game graph Γ where player 0 wins a payoff value defined as the long-run average of the weights of the play, while player 1 loses that value. Formally, the payoff value of a play v0 v1 . . . vn . . . in Γ is

\[ MP(v_0 v_1 \dots v_n \dots) \;=\; \liminf_{n \to \infty} \, \frac{1}{n} \sum_{i=0}^{n-1} w(v_i, v_{i+1}). \]

The value secured by a strategy σ0 ∈ Σ0 in a vertex v is

\[ val^{\sigma_0}(v) \;=\; \inf_{\sigma_1 \in \Sigma_1} MP(\mathit{outcome}_\Gamma(v, \sigma_0, \sigma_1)), \]

and the (optimal) value of a vertex v in a mean-payoff game Γ is

\[ val^{\Gamma}(v) \;=\; \sup_{\sigma_0 \in \Sigma_0} \, \inf_{\sigma_1 \in \Sigma_1} MP(\mathit{outcome}_\Gamma(v, \sigma_0, \sigma_1)). \]

We say that σ0 is optimal if val^{σ0}(v) = val^Γ(v) for all v ∈ V. Secured values and optimality are defined analogously for strategies of player 1. Ehrenfeucht and Mycielski [1] show that mean-payoff games are memoryless determined, i.e., memoryless strategies are sufficient for optimality, and the optimal (maximum) value that player 0 can secure is equal to the optimal (minimum) value that player 1 can achieve.

Theorem 1 ([1]). For all MPGs Γ = (V, E, w, ⟨V0, V1⟩) and for all vertices v ∈ V, we have

\[ val^{\Gamma}(v) \;=\; \sup_{\sigma_0 \in \Sigma_0} \, \inf_{\sigma_1 \in \Sigma_1} MP(\mathit{outcome}_\Gamma(v, \sigma_0, \sigma_1)) \;=\; \inf_{\sigma_1 \in \Sigma_1} \, \sup_{\sigma_0 \in \Sigma_0} MP(\mathit{outcome}_\Gamma(v, \sigma_0, \sigma_1)), \]

and there exist two memoryless strategies π0 ∈ Σ0^M and π1 ∈ Σ1^M such that

\[ val^{\Gamma}(v) \;=\; val^{\pi_0}(v) \;=\; val^{\pi_1}(v). \]

Moreover, uniform optimal strategies exist for both players, i.e., there exists a strategy profile ⟨σ0, σ1⟩ that can be used to secure the optimal value independently of the initial vertex [1]. Such a strategy profile is called an optimal strategy profile.
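A useful consequence of memoryless determinacy is that the outcome of a memoryless strategy profile from any vertex is a lasso: a finite path leading into a cycle that is repeated forever, so its mean payoff is exactly the mean weight of that cycle. The sketch below computes this value under an assumed one-successor-per-vertex encoding (both the encoding and the names are illustrative). Since the cycle has length at most |V|, this also explains the denominator bound in Lemma 1 below.

```python
from fractions import Fraction

# A minimal sketch: under a memoryless strategy profile, every vertex has a
# single outgoing edge (succ), so the play from `start` is a lasso and its
# mean payoff is the mean weight of the cycle it eventually repeats.

def lasso_value(succ, w, start):
    seen, path, v = {}, [], start
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = succ[v]
    cycle = path[seen[v]:] + [v]                      # the repeated cycle, closed
    weights = [w[cycle[i], cycle[i + 1]] for i in range(len(cycle) - 1)]
    return Fraction(sum(weights), len(weights))

succ = {'a': 'b', 'b': 'c', 'c': 'b'}
w = {('a', 'b'): 5, ('b', 'c'): 1, ('c', 'b'): -3}
print(lasso_value(succ, w, 'a'))                      # -1, the mean of cycle b,c
```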

The following lemma characterizes the shape of the values in an MPG Γ = (V, E, w, ⟨V0, V1⟩) with integer weights in {−W, . . . , W}. Note that solving MPGs with rational weights is P-time reducible to solving MPGs with integer weights [20, 18].


Lemma 1 ([1, 20]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG with integer weights and let W = max_{(v,v′)∈E} |w(v, v′)|. For each vertex v ∈ V, the optimal value val^Γ(v) is a rational number n/d such that 1 ≤ d ≤ |V| and |n| ≤ d·W.

We consider the following three classical problems [18, 9] for an MPG Γ = (V, E, w, ⟨V0, V1⟩):

1. Decision problem. Given a threshold ν ∈ Q and a vertex v ∈ V, decide if val^Γ(v) ≥ ν.

2. Value problem. Compute for each vertex v ∈ V the value val^Γ(v).

3. (Optimal) strategy problem. Given an MPG Γ, compute an (optimal) strategy profile for Γ.

Approximate Solutions for MPGs

Dealing with approximate MPG solutions, we can consider either absolute or relative error measures, leading to the notions of additive and relative MPG approximate values.

Definition 1 (MPG additive ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V, and consider ε ≥ 0. The value ṽal ∈ Q is called an additive ε-value on v if and only if

\[ |\widetilde{val} - val^{\Gamma}(v)| \;\le\; \varepsilon. \]

Definition 2 (MPG relative ε-value). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG, let v ∈ V, and consider ε ≥ 0. The value ṽal ∈ Q is called a relative ε-value on v if and only if

\[ \frac{|\widetilde{val} - val^{\Gamma}(v)|}{|val^{\Gamma}(v)|} \;\le\; \varepsilon. \]

Note that additive MPG ε-values are shift-invariant. More precisely, if ṽal is an additive ε-value on the vertex v in Γ = (V, E, w, ⟨V0, V1⟩), then ṽal + k is an additive ε-value on v in the MPG Γ′ = (V, E, w + k, ⟨V0, V1⟩), where all the weights are shifted by k. On the contrary, additive MPG ε-values are not scale-invariant. In fact, if ṽal is an additive ε-value for v in the MPG Γ = (V, E, w, ⟨V0, V1⟩), then k·ṽal is an additive (ε·k)-value for v in the MPG Γ′ = (V, E, w·k, ⟨V0, V1⟩), where all the weights are multiplied by k. In other words, the additive error on Γ is amplified by a factor k in the scaled version of the game, Γ′. Conversely, relative MPG ε-values are scale-invariant but not shift-invariant.
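The scaling computation behind this observation is immediate: when every weight is multiplied by k > 0, every value scales accordingly, i.e., val^{Γ′}(v) = k·val^Γ(v), hence

\[ |k\,\widetilde{val} \,-\, val^{\Gamma'}(v)| \;=\; k\,|\widetilde{val} \,-\, val^{\Gamma}(v)| \;\le\; k\,\varepsilon, \]

so the additive guarantee degrades from ε to k·ε in the scaled game.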

The notions of (fully) polynomial approximation schemes w.r.t. relative and additive errors are formally defined below.

Definition 3 (MPG fully polynomial-time approximation scheme (FPTAS)). An additive (resp. relative) fully polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size³ of Γ and 1/ε.

Definition 4 (MPG polynomial-time approximation scheme (PTAS)). An additive (resp. relative) polynomial approximation scheme for the MPG Γ = (V, E, w, ⟨V0, V1⟩) is an algorithm A such that for all ε > 0, A computes an additive (resp. relative) ε-value in time polynomial w.r.t. the size of Γ.

³ Given Γ = (V, E, w, ⟨V0, V1⟩), size(Γ) = |E| + |V| + log(W), where W is the maximum absolute value of a weight in Γ.


3 Mean-Payoff Games and Additive Approximation Schemes

Recently, [15] provided an additive fully polynomial approximation scheme for the MPG value problem on graphs with rational weights in the interval [−1, +1]. A natural question is whether we can efficiently approximate the values of MPGs with no restrictions on the weights. The next theorem shows that a generalization of the positive approximation result in [15] to MPGs with arbitrary (rational) weights would indeed provide a polynomial-time exact solution to the MPG value problem.

Theorem 2. The MPG value problem does not admit an additive FPTAS, unless it is in P.

Proof. We first consider the MPG problem on graphs with integer weights. Assume that the MPG value problem on graphs with integer weights admits an additive FPTAS. Given an MPG Γ = (V, E, w, ⟨V0, V1⟩) and a vertex v ∈ V, let |V| = n and ε = 1/(2n(n−1)). Then, our FPTAS computes an additive ε-value ṽal on v in time polynomial w.r.t. n. By Lemma 1, val^Γ(v) is a rational number with denominator d such that 1 ≤ d ≤ n. Two distinct rationals with denominators between 1 and n have distance at least 1/(n(n−1)). Hence, there is a unique rational with denominator d, 1 ≤ d ≤ n, within the interval I = {q ∈ Q | ṽal − ε ≤ q ≤ ṽal + ε}, where ε = 1/(2n(n−1)). This unique rational is val^Γ(v), and it can be found in time logarithmic w.r.t. n [16]. Thus, we have an algorithm A to solve the value problem on Γ in time polynomial w.r.t. n. The MPG problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG problem on graphs with integer weights by simply rescaling the weights in the original graph [20, 12]. ⊓⊔
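The rounding step at the end of this proof is straightforward to realize; below is a minimal sketch using Python's Fraction.limit_denominator (which returns the closest fraction with denominator at most n) in place of the logarithmic search of [16]; the numbers are illustrative.

```python
from fractions import Fraction

# A minimal sketch of the rounding step in the proof of Theorem 2: given an
# additive eps-value with eps = 1/(2n(n-1)), the exact value is the unique
# rational with denominator at most n within distance eps of the answer.

def exact_from_additive(approx, n):
    return Fraction(approx).limit_denominator(n)

# Illustrative numbers: val = 7/3 in a game with n = 4 vertices,
# eps = 1/(2*4*3) = 1/24; the FPTAS returns a value within 1/24 of 7/3.
approx = Fraction(7, 3) + Fraction(1, 30)
print(exact_from_additive(approx, 4))                 # 7/3
```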

In view of the proof of the above theorem, one could still hope to obtain some positive approximation results for general (i.e., arbitrarily scaled) MPGs by considering weaker notions of approximation than FPTAS. Unfortunately, the next theorem shows that the following is already sufficient to place the MPG value problem in P: determining, in time polynomial w.r.t. the size of a given MPG Γ, an additive k-approximate value of a vertex v ∈ V, where k is an arbitrary constant.

Theorem 3. For any constant k: if the problem of computing an additive k-approximate MPG value can be solved in polynomial time (w.r.t. the size of the input MPG), then the MPG value problem belongs to P.

Proof. We first consider MPGs with integer weights. Let v be a vertex in the MPG Γ = (V, E, w : E → [−W, W], ⟨V0, V1⟩) and denote |V| = n. If 2k + 1 > (n−2)!, then n is bounded by a constant depending only on k, and val^Γ(v) can be determined in time O(k^k) = O(1) by simply enumerating all the (memoryless) strategies available to the players.

Otherwise, assume 2k + 1 ≤ (n−2)!. Consider the game Γ′ = (V, E, w′, ⟨V0, V1⟩), where w′(e) = w(e)·n! for all e ∈ E. By hypothesis, there is an algorithm A that computes a k-approximate value ṽal for v on Γ′ in time T polynomial w.r.t. the size of Γ′. Since log(W·n!) = O(n·log(n) + log(W)), T is also polynomial w.r.t. the size of Γ. By construction, val^{Γ′}(v) is an integer. There are at most 2k + 1 integers in the interval [ṽal − k, ṽal + k]; thus we have at most 2k + 1 candidates ⌊ṽal − k⌋/n!, . . . , ⌊ṽal + k⌋/n! for val^Γ(v). Moreover, those candidates lie in an interval of length L ≤ (2k + 1)/n! ≤ (n−2)!/n! = 1/(n·(n−1)), while the minimum distance between two distinct rationals with denominator at most n is 1/(n·(n−1)).

The exact value val^Γ(v) is thus the unique rational number with denominator at most n that lies in the interval [⌊ṽal − k⌋/n!, ⌊ṽal + k⌋/n!], and it can be found in time logarithmic w.r.t. n [16].

The MPG problem on graphs with rational weights can be reduced in polynomial time (w.r.t. the size of Γ) to the MPG problem on graphs with integer weights by simply rescaling the weights in the original graph [20, 12]. ⊓⊔
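The reduction in this proof can be summarized as follows; approx_value stands for the hypothetical polynomial-time additive K-approximation oracle assumed by the theorem, the game encoding is the same illustrative one as in the earlier sketches, and (as in the proof) 2K + 1 ≤ (n−2)! is assumed.

```python
import math
from fractions import Fraction

# A sketch of the reduction in the proof of Theorem 3. approx_value(game, v)
# is a hypothetical oracle returning an additive K-approximate value; games
# are encoded (illustratively) as {vertex: (player, {successor: weight})}.

K = 2                                                 # constant additive error

def scale_weights(game, factor):
    return {v: (p, {u: w * factor for u, w in succs.items()})
            for v, (p, succs) in game.items()}

def exact_value(game, v, n, approx_value):
    f = math.factorial(n)
    tilde = approx_value(scale_weights(game, f), v)   # |tilde - f*val(v)| <= K
    lo, hi = math.ceil(tilde - K), math.floor(tilde + K)
    candidates = [Fraction(z, f) for z in range(lo, hi + 1)]
    # by the interval-length argument in the proof, exactly one candidate
    # reduces to a denominator of at most n, and it equals val(v)
    return next(c for c in candidates if c.denominator <= n)
```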

A direct consequence of Theorem 3 is that the MPG value problem does not admit an additive PTAS, unless it is in P. More precisely, Theorem 2 and Theorem 3 entail the P-time equivalence between the exact MPG value problem and the three classes of approximations listed in Corollary 1.

Corollary 1. The following problems are P-time equivalent:

1. Solving the MPG value problem.

2. Determining an additive FPTAS for the MPG value problem.

3. Determining an additive PTAS for the MPG value problem.

4. Computing an additive k-approximate MPG value in polynomial time, for any constant k.

4 Mean-Payoff Games and Relative Approximation Schemes

In the recent work [4], the authors consider the design of approximation schemes for the MPG value problem based on the relative, rather than absolute, error. In particular, they provide a relative FPTAS for the MPG value problem on graphs with nonnegative weights. Note that negative weights are necessary to encode parity games and µ-calculus model checking into MPGs [10]. The following theorem considers the problem of designing (fully) polynomial approximation schemes for the MPG value problem on graphs with arbitrary (positive and negative) rational weights. It shows that solving such a problem would indeed provide an exact solution to the MPG value problem, computable in time polynomial w.r.t. the size of the MPG.

Theorem 4. The MPG value problem does not admit a relative PTAS, unless it is in P.

Proof. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG and let v ∈ V. Assume that MPGs admit a relative PTAS and consider ε = 1/2. Our assumption entails that we have an algorithm A that computes a relative (1/2)-value ṽal for v in time polynomial w.r.t. the size of Γ. We show that ṽal ≥ 0 if and only if val^Γ(v) ≥ 0. In other words, we show that the MPG decision problem is P-time reducible to the computation of a relative (1/2)-value. By definition of relative ε-value, for ε = 1/2, we have:

\[ \frac{|\widetilde{val} - val^{\Gamma}(v)|}{|val^{\Gamma}(v)|} \;\le\; \frac{1}{2}. \tag{1} \]

We have four cases to consider:


1. In the first case, assume that ṽal − val^Γ(v) ≥ 0 and ṽal ≥ 0. By contradiction, suppose val^Γ(v) < 0. Then, Inequality (1) implies

   ṽal − val^Γ(v) ≤ (1/2)·|val^Γ(v)|  ⟹  ṽal ≤ val^Γ(v) + (1/2)·|val^Γ(v)| < 0,

   which contradicts our hypothesis that ṽal ≥ 0.

2. In the second case, assume that ṽal − val^Γ(v) ≥ 0 and ṽal < 0. Then 0 > ṽal ≥ val^Γ(v), so val^Γ(v) < 0.

3. In the third case, assume that ṽal − val^Γ(v) < 0 and ṽal < 0. By contradiction, suppose val^Γ(v) > 0. Then, Inequality (1) implies

   val^Γ(v) − ṽal ≤ (1/2)·|val^Γ(v)|  ⟹  ṽal ≥ val^Γ(v) − (1/2)·|val^Γ(v)| ≥ 0,

   which contradicts our hypothesis that ṽal < 0.

4. In the last case, assume that ṽal − val^Γ(v) < 0 and ṽal ≥ 0. Then val^Γ(v) > ṽal ≥ 0.

In all cases, ṽal ≥ 0 if and only if val^Γ(v) ≥ 0. Provided a P-time algorithm for deciding whether val^Γ(v) ≥ 0, a dichotomic search can be used to determine val^Γ(v) in time polynomial w.r.t. the size of Γ [12, 20]; a sketch of this search is given after Corollary 2. ⊓⊔

As a direct consequence of Theorem 4, we obtain the following P-time equivalence involving the computation of exact and approximate MPG solutions.

Corollary 2. The following problems are P-time equivalent:

1. Solving the MPG value problem.

2. Determining a relative FPTAS for the MPG value problem.

3. Determining a relative PTAS for the MPG value problem.
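To close the loop, the dichotomic search invoked at the end of the proof of Theorem 4 can be sketched as follows. Here ge_zero(game, v, nu) stands for the hypothetical polynomial-time test val^Γ(v) ≥ ν obtained in the proof (by shift-invariance, it amounts to a sign test in the game with every weight shifted by −ν); by Lemma 1, the value is a rational with denominator at most n = |V| inside [−W, W], so O(log(n·W)) oracle queries suffice.

```python
from fractions import Fraction

# A minimal sketch of the dichotomic search of [12, 20]: ge_zero is the
# assumed P-time decision oracle for val(v) >= nu. Once the search interval
# is shorter than the minimum gap 1/(n(n-1)) >= 1/n^2 between distinct
# rationals with denominator at most n, one candidate remains.

def dichotomic_value(game, v, n, W, ge_zero):
    lo, hi = Fraction(-W), Fraction(W)                # val(v) lies in [lo, hi]
    while hi - lo > Fraction(1, 2 * n * n):
        mid = (lo + hi) / 2
        if ge_zero(game, v, mid):
            lo = mid                                  # val(v) >= mid
        else:
            hi = mid                                  # val(v) < mid
    return ((lo + hi) / 2).limit_denominator(n)       # unique remaining rational
```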

References

1. A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.

2. A. V. Karzanov and V. N. Lebedev. Cyclical games with prohibitions. Mathematical Programming, 60:277–293, 1993.

3. D. Andersson and S. Vorobyov. Fast algorithms for monotonic discounted linear programs with two variables per inequality. Technical Report NI06019-LAA, Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, 2006.

4. E. Boros, K. Elbassioni, M. Fouz, V. Gurvich, K. Makino, and B. Manthey. Stochastic mean payoff games: Smoothed analysis and approximation schemes. In Proc. of ICALP: International Colloquium on Automata, Languages and Programming, 2011.

5. A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.

6. V. Dhingra and S. Gaubert. How to solve large scale deterministic games with mean payoff by policy iteration. In Proc. of Performance Evaluation Methodologies and Tools, article no. 12. ACM, 2006.

7. E. A. Emerson, C. Jutla, and A. P. Sistla. On model checking for fragments of the µ-calculus. In Proc. of CAV: Computer Aided Verification, LNCS 697, pages 385–396. Springer, 1993.

8. Y. Gurevich and L. Harrington. Trees, automata, and games. In Proc. of STOC: Symposium on Theory of Computing, pages 60–65. ACM, 1982.

9. H. Bjorklund and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155:210–229, 2007.

10. M. Jurdzinski. Deciding the winner in parity games is in UP ∩ coUP. Information Processing Letters, 68(3):119–124, 1998.

11. D. Kozen. Results on the propositional mu-calculus. Theoretical Computer Science, 27:333–354, 1983.

12. L. Brim, J. Chaloupka, L. Doyen, R. Gentilini, and J.-F. Raskin. Faster algorithms for mean-payoff games. Formal Methods in System Design, 38(2):97–118, 2011.

13. N. Pisaruk. Mean cost cyclical games. Mathematics of Operations Research, 24(4):817–828, 1999.

14. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, Massachusetts, 1994.

15. A. Roth, M. Balcan, A. Kalai, and Y. Mansour. On the equilibria of alternating move games. In Proc. of SODA: Symposium on Discrete Algorithms, pages 805–816. ACM, 2010.

16. S. Kwek and K. Mehlhorn. Optimal search for rationals. Information Processing Letters, 86:23–26, 2003.

17. S. Schewe. From parity and payoff games to linear programming. In Proc. of MFCS: Mathematical Foundations of Computer Science, LNCS 5734, pages 675–686. Springer, 2009.

18. U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.

19. V. A. Gurvich, A. V. Karzanov, and L. G. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28(5):85–91, 1988.

20. Y. Lifshits and D. Pavlov. Potential theory for mean payoff games. Journal of Mathematical Sciences, 145(3):4967–4974, 2007.
