Improved Pseudo-polynomial Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games


HAL Id: hal-01577457

https://hal-upec-upem.archives-ouvertes.fr/hal-01577457

Submitted on 25 Aug 2017


Improved Pseudo-polynomial Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games

To cite this version:

Carlo Comin, Romeo Rizzi. Improved Pseudo-polynomial Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games. Algorithmica, Springer Verlag, 2017, 77 (4), pp. 995-1021. doi:10.1007/s00453-016-0123-1. hal-01577457

An Improved Pseudo-Polynomial Upper Bound for the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games

Carlo Comin

Romeo Rizzi

Abstract

In this work we offer an O(|V|^2 |E| W) pseudo-polynomial time deterministic algorithm for solving the Value Problem and Optimal Strategy Synthesis in Mean Payoff Games. This improves by a factor log(|V| W) on the best previously known pseudo-polynomial time upper bound, due to Brim et al. The improvement hinges on a suitable characterization of values, and a description of optimal positional strategies, in terms of reweighted Energy Games and Small Energy-Progress Measures.

Keywords: Mean Payoff Games, Value Problem, Optimal Strategy Synthesis, Energy Games, Small Energy-Progress Measures, Pseudo-Polynomial Time.

1 Introduction

A Mean Payoff Game (MPG) is a two-player infinite game Γ := (V, E, w, ⟨V0, V1⟩), which is played on a finite weighted directed graph, denoted G^Γ := (V, E, w), the vertices of which are partitioned into two classes, V0 and V1, according to the player to which they belong. It is assumed that G^Γ has no sink vertex and that the weights of the arcs are integers, i.e., w : E → {−W, . . . , 0, . . . , W} for some W ∈ N.

At the beginning of the game a pebble is placed on some vertex v_s ∈ V, and then the two players, named Player 0 and Player 1, move the pebble ad infinitum along the arcs. Assuming the pebble is currently on Player 0's vertex v, he chooses an arc (v, v') ∈ E going out of v and moves the pebble to the destination vertex v'. Similarly, assuming the pebble is currently on Player 1's vertex, it is her turn to choose an outgoing arc. The infinite sequence v_s, v, v', . . . of all the encountered vertices is a play. In order to play well, Player 0 wants to maximize the limit inferior of the long-run average weight of the traversed arcs, i.e., to maximize lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} w(v_i, v_{i+1}), whereas Player 1 wants to minimize lim sup_{n→∞} (1/n) Σ_{i=0}^{n−1} w(v_i, v_{i+1}). Ehrenfeucht and Mycielski [5] proved that each vertex v admits a value, denoted val_Γ(v), which each player can secure by means of a memoryless (or positional) strategy, i.e., a strategy that depends only on the current vertex position and not on the previous choices.
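Since a finite prefix does not change the limit inferior, the payoff of an eventually periodic play is simply the average weight of its repeated cycle. A minimal illustrative sketch (not from the paper):

```python
def mean_payoff(prefix_weights, cycle_weights):
    """Long-run average payoff of a play whose arc weights consist of
    prefix_weights followed by cycle_weights repeated forever."""
    # The finite prefix vanishes in the lim inf; only the cycle matters.
    return sum(cycle_weights) / len(cycle_weights)

# A play that eventually loops on a cycle with weights -1 and +1 has payoff 0.
assert mean_payoff([-2], [-1, 1]) == 0.0
```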

Department of Mathematics, University of Trento, Trento, Italy; LIGM, Université Paris-Est, Marne-la-Vallée, Paris, France. (carlo.comin@unitn.it)

Department of Computer Science, University of Verona, Verona, Italy. (romeo.rizzi@univr.it)

arXiv:1503.04426v7 [cs.DS] 24 Apr 2016

Solving an MPG consists in computing the values of all vertices (Value Problem) and, for each player, a positional strategy that secures such values to that player (Optimal Strategy Synthesis). The corresponding decision problem lies in NP ∩ coNP [11] and it was later shown by Jurdziński [8] to be recognizable with unambiguous polynomial time non-deterministic Turing Machines, thus falling within the UP ∩ coUP complexity class.

The problem of devising efficient algorithms for solving MPGs has been studied extensively in the literature. The first milestone was that of Gurvich, Karzanov and Khachiyan [7], who offered an exponential time algorithm for solving a slightly wider class of MPGs called Cyclic Games. Afterwards, Zwick and Paterson [11] devised the first deterministic procedure for computing values in MPGs, and optimal strategies securing them, within pseudo-polynomial time and polynomial space. In particular, Zwick and Paterson established an O(|V|^3 |E| W) upper bound for the time complexity of the Value Problem, as well as an upper bound of O(|V|^4 |E| W log(|E|/|V|)) for that of Optimal Strategy Synthesis [11].

Recently, several research efforts have been spent in studying quantitative extensions of infinite games for modeling quantitative aspects of reactive systems [2-4]. In this context quantities may represent, for example, the power usage of an embedded component, or the buffer size of a networking element. These studies have brought to light interesting connections with MPGs. Remarkably, they have recently led to the design of faster procedures for solving them. In particular, Brim et al. [3] devised faster deterministic algorithms for solving the Value Problem and Optimal Strategy Synthesis in MPGs within O(|V|^2 |E| W log(|V| W)) pseudo-polynomial time and polynomial space.

To the best of our knowledge, this is the tightest pseudo-polynomial upper bound on the time complexity of MPGs which is currently known.

Indeed, a wide spectrum of different approaches have been investigated in the literature. For instance, Andersson and Vorobyov [1] provided a fast sub-exponential time randomized algorithm for solving MPGs, whose time complexity can be bounded as O(|V|^2 |E| exp(2 √(|V| ln(|E|/√|V|)) + O(√|V| + ln |E|))). Furthermore, Lifshits and Pavlov [9] devised an O(2^{|V|} |V| |E| log W) singly-exponential time deterministic procedure by considering the potential theory of MPGs.

These results are summarized in Table 1.

Table 1: Complexity of the main algorithms for solving MPGs.

Algorithm | Value Problem | Optimal Strategy Synthesis | Note
This work | O(|V|^2 |E| W) | O(|V|^2 |E| W) | Determ.
[3] | O(|V|^2 |E| W log(|V| W)) | O(|V|^2 |E| W log(|V| W)) | Determ.
[11] | Θ(|V|^3 |E| W) | Θ(|V|^4 |E| W log(|E|/|V|)) | Determ.
[9] | O(2^{|V|} |V| |E| log W) | n/a | Determ.
[1] | O(|V|^2 |E| exp(2 √(|V| ln(|E|/√|V|)) + O(√|V| + ln |E|))) | same complexity | Random.

Contribution. The main contribution of this work is to provide an O(|V|^2 |E| W) pseudo-polynomial time and O(|V|) space deterministic algorithm for solving the Value Problem and Optimal Strategy Synthesis in MPGs. As already mentioned in the introduction, the best previously known procedure has a deterministic time complexity of O(|V|^2 |E| W log(|V| W)), which is due to Brim et al. [3]. In this way we improve the best previously known pseudo-polynomial time upper bound by a factor log(|V| W). This result is summarized in the following theorem.

Theorem 1. There exists a deterministic algorithm for solving the Value Problem and Optimal Strategy Synthesis of MPGs within O(|V|^2 |E| W) time and O(|V|) space, on any input MPG Γ = (V, E, w, ⟨V0, V1⟩). Here, W = max_{e∈E} |w_e|.

In order to prove Theorem 1, this work points out a novel and suitable characterization of values, and a description of optimal positional strategies, in terms of certain reweighting operations that we will introduce later on in Section 2.

In particular, we will show that the optimal value val_Γ(v) of any vertex v is the unique rational number ν for which v "transits" from the winning region of Player 0 to that of Player 1, with respect to reweightings of the form w − ν. This intuition will be clarified later on in Section 3, where Theorem 3 is formally proved.

Concerning strategies, we will show that an optimal positional strategy for each vertex v ∈ V0 is given by any arc (v, v') ∈ E which is compatible with certain Small Energy-Progress Measures (SEPMs) of the above mentioned reweighted arenas. This fact is formally proved in Theorem 4 of Section 3.

These novel observations are smooth, simple, and their proofs rely on elementary arguments. We believe that they contribute to clarifying the interesting relationship between values, optimal strategies and reweighting operations (with respect to some previous literature, see e.g. [3, 9]).

Indeed, they will allow us to prove Theorem 1.

Organization. This manuscript is organized as follows. In Section 2, we introduce some notation and provide the required background on infinite two-player games and related algorithmic results. In Section 3, a suitable relation between values, optimal strategies, and certain reweighting operations is investigated. In Section 4, an O(|V|^2 |E| W) pseudo-polynomial time and O(|V|) space algorithm for solving the Value Problem and Optimal Strategy Synthesis in MPGs is designed and analyzed by relying on the results presented in Section 3. In this manner, Section 4 actually provides a proof of Theorem 1, which is our main result in this work.

2 Notation and Preliminaries

We denote by N, Z, Q the sets of natural, integer, and rational numbers (respectively). It will be sufficient to consider integral intervals, e.g., [a, b] := {z ∈ Z | a ≤ z ≤ b} and [a, b) := {z ∈ Z | a ≤ z < b} for any a, b ∈ Z.

Weighted Graphs. Our graphs are directed and weighted on the arcs. Thus, if G = (V, E, w) is a graph, then every arc e ∈ E is a triple e = (u, v, w_e), where w_e = w(u, v) ∈ Z is the weight of e. The maximum absolute weight is W := max_{e∈E} |w_e|. Given a vertex u ∈ V, the set of its successors is post(u) = {v ∈ V | (u, v) ∈ E}, whereas the set of its predecessors is pre(u) = {v ∈ V | (v, u) ∈ E}.

A path is a sequence of vertices v0 v1 . . . vn . . . such that (v_i, v_{i+1}) ∈ E for every i. We denote by V* the set of all (possibly empty) finite paths. A simple path is a finite path v0 v1 . . . vn having no repetitions, i.e., for any i, j ∈ [0, n] it holds that v_i ≠ v_j whenever i ≠ j. The length of a simple path ρ = v0 v1 . . . vn equals n and is denoted by |ρ|. A cycle is a path v0 v1 . . . v_{n−1} v_n such that v0 . . . v_{n−1} is simple and v_n = v0. The length of a cycle C = v0 v1 . . . vn equals n and is denoted by |C|. The average weight of a cycle v0 . . . vn is (1/n) Σ_{i=0}^{n−1} w(v_i, v_{i+1}). A cycle C = v0 v1 . . . vn is reachable from v in G if there exists a simple path p = v u1 . . . um in G such that p ∩ C ≠ ∅.

Arenas. An arena is a tuple Γ = (V, E, w, ⟨V0, V1⟩) where G^Γ := (V, E, w) is a finite weighted directed graph and (V0, V1) is a partition of V into the set V0 of vertices owned by Player 0 and the set V1 of vertices owned by Player 1. It is assumed that G^Γ has no sink, i.e., post(v) ≠ ∅ for every v ∈ V; still, we remark that G^Γ is not required to be a bipartite graph on colour classes V0 and V1. Fig. 1 depicts an example.

[Figure omitted: a four-vertex arena on A, B, C, D with arc weights in {−9, . . . , +2}.]

Figure 1: An arena Γ.

A game on Γ is played for infinitely many rounds by two players moving a pebble along the arcs of G^Γ. At the beginning of the game the pebble is found on some vertex v_s ∈ V, which is called the starting position of the game. At each turn, assuming the pebble is currently on a vertex v ∈ V_i (for i = 0, 1), Player i chooses an arc (v, v') ∈ E and then the next turn starts with the pebble on v'. A play is any infinite path v0 v1 . . . vn . . . in Γ. For any i ∈ {0, 1}, a strategy of Player i is any function σ_i : V* × V_i → V such that for every finite path p' v in G^Γ, where p' ∈ V* and v ∈ V_i, it holds that (v, σ_i(p', v)) ∈ E. A strategy σ_i of Player i is positional (or memoryless) if σ_i(p, v_n) = σ_i(p', v'_m) for every pair of finite paths p v_n = v0 . . . v_{n−1} v_n and p' v'_m = v'0 . . . v'_{m−1} v'_m in G^Γ such that v_n = v'_m ∈ V_i. The set of all positional strategies of Player i is denoted by Σ^M_i. A play v0 v1 . . . vn . . . is consistent with a strategy σ ∈ Σ_i if v_{j+1} = σ(v0 v1 . . . v_j) whenever v_j ∈ V_i.

Given a starting position v_s ∈ V, the outcome of strategies σ0 ∈ Σ0 and σ1 ∈ Σ1, denoted outcome_Γ(v_s, σ0, σ1), is the unique play that starts at v_s and is consistent with both σ0 and σ1.

Given a memoryless strategy σ_i ∈ Σ^M_i of Player i in Γ, G^Γ_{σ_i} = (V, E_{σ_i}, w) is the graph obtained from G^Γ by removing all the arcs (v, v') ∈ E such that v ∈ V_i and v' ≠ σ_i(v); we say that G^Γ_{σ_i} is obtained from G^Γ by projection w.r.t. σ_i.

Concluding this subsection, the notion of reweighting is recalled. For any weight functions w, w' : E → Z, the reweighting of Γ = (V, E, w, ⟨V0, V1⟩) w.r.t. w' is the arena Γ^{w'} = (V, E, w', ⟨V0, V1⟩). Also, for w : E → Z and any ν ∈ Z, we denote by w + ν the weight function w' defined as w'_e := w_e + ν for every e ∈ E. Indeed, we shall consider reweighted games of the form Γ^{w−q}, for some q ∈ Q. Notice that the corresponding weight function w' : E → Q : e ↦ w_e − q is rational, while we required the weights of the arcs to be always integers. To overcome this issue, it is sufficient to re-define Γ^{w−q} by scaling all the weights by a factor equal to the denominator of q ∈ Q, namely, to re-define Γ^{w−q} := Γ^{D·w−N}, where N, D ∈ N are such that q = N/D and gcd(N, D) = 1. This re-scaling does not change the winning regions of the corresponding games, and it has the significant advantage of allowing for a discussion (and an algorithmics) which is strictly based on integer weights.
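The rescaling step can be made concrete in a few lines of Python (a sketch under the definitions above; the function name is ours, not the paper's):

```python
from fractions import Fraction

def reweight_minus_q(w, q):
    """Integer weights of the reweighted arena Γ^{w-q} for a rational
    q = N/D in lowest terms: every arc weight w_e becomes D*w_e - N."""
    q = Fraction(q)                # Fraction guarantees gcd(N, D) = 1
    N, D = q.numerator, q.denominator
    return {e: D * x - N for e, x in w.items()}

w = {('A', 'B'): -1, ('B', 'A'): 2}
# q = 1/2: each weight w_e becomes 2*w_e - 1.
assert reweight_minus_q(w, Fraction(1, 2)) == {('A', 'B'): -3, ('B', 'A'): 3}
```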

Mean Payoff Games. A Mean Payoff Game (MPG) [3, 5, 11] is a game played on some arena Γ for infinitely many rounds by two opponents; Player 0 gains a payoff defined as the long-run average weight of the play, whereas Player 1 loses that value. Formally, Player 0's payoff of a play v0 v1 . . . vn . . . in Γ is defined as follows:

    MP0(v0 v1 . . . vn . . .) := lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} w(v_i, v_{i+1}).

The value secured by a strategy σ0 ∈ Σ0 in a vertex v is defined as:

    val^{σ0}(v) := inf_{σ1∈Σ1} MP0(outcome_Γ(v, σ0, σ1)).

Notice that payoffs and secured values can be defined symmetrically for Player 1 (i.e., by interchanging the symbol 0 with 1 and inf with sup).

Ehrenfeucht and Mycielski [5] proved that each vertex v ∈ V admits a unique value, denoted val_Γ(v), which each player can secure by means of a memoryless (or positional) strategy. Moreover, uniform positional optimal strategies do exist for both players, in the sense that for each player there exists at least one positional strategy which can be used to secure all the optimal values, independently of the starting position v_s. Thus, for every MPG Γ, there exists a strategy σ0 ∈ Σ^M_0 such that val^{σ0}(v) ≥ val_Γ(v) for every v ∈ V, and there exists a strategy σ1 ∈ Σ^M_1 such that val^{σ1}(v) ≤ val_Γ(v) for every v ∈ V. Indeed, the (optimal) value of a vertex v ∈ V in the MPG Γ is given by:

    val_Γ(v) = sup_{σ0∈Σ0} val^{σ0}(v) = inf_{σ1∈Σ1} val^{σ1}(v).

Thus, a strategy σ0 ∈ Σ0 is optimal if val^{σ0}(v) = val_Γ(v) for all v ∈ V. A strategy σ0 ∈ Σ0 is said to be winning for Player 0 if val^{σ0}(v) ≥ 0, and σ1 ∈ Σ1 is winning for Player 1 if val^{σ1}(v) < 0. Correspondingly, a vertex v ∈ V is a winning starting position for Player 0 if val_Γ(v) ≥ 0, otherwise it is winning for Player 1. The set of all winning starting positions of Player i is denoted by W_i for i ∈ {0, 1}.

[Figure omitted: four snapshots of the arena of Fig. 1 along a play.]

Figure 2: An MPG on Γ, played from left to right, whose payoff equals (−1 + 1)/2 = 0.

A finite variant of MPGs is well-known in the literature [3, 5, 11]. Here, the game stops as soon as a cyclic sequence of vertices is traversed (i.e., as soon as one of the two players moves the pebble into a previously visited vertex). It turns out that this finite variant is equivalent to the infinite one [5]. Specifically, the values of an MPG are in relationship with the average weights of its cycles, as stated in the next lemma.

Lemma 1 (Brim et al. [3]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG. For all ν ∈ Q, for all positional strategies σ0 ∈ Σ^M_0 of Player 0, and for all vertices v ∈ V, the value val^{σ0}(v) is greater than ν if and only if all cycles C reachable from v in the projection graph G^Γ_{σ0} have an average weight w(C)/|C| greater than ν.

The proof of Lemma 1 follows from the memoryless determinacy of MPGs. We remark that a proposition symmetric to Lemma 1 holds for Player 1 as well: for all ν ∈ Q, for all positional strategies σ1 ∈ Σ^M_1 of Player 1, and for all vertices v ∈ V, the value val^{σ1}(v) is less than ν if and only if all cycles reachable from v in the projection graph G^Γ_{σ1} have an average weight less than ν.

Also, it is well-known [3, 5] that each value val_Γ(v) is contained within the following set of rational numbers:

    S_Γ = { N/D | D ∈ [1, |V|], N ∈ [−DW, DW] }.

Notice that |S_Γ| ≤ |V|^2 W.
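The candidate set S_Γ is small enough to enumerate directly; a sketch of its definition (illustrative only, not the paper's code):

```python
from fractions import Fraction

def value_candidates(num_vertices, W):
    """The set S_Γ of all rationals N/D with D in [1, |V|] and
    N in [-D*W, D*W]; every optimal value lies in this set."""
    return {Fraction(N, D)
            for D in range(1, num_vertices + 1)
            for N in range(-D * W, D * W + 1)}

# For |V| = 2 and W = 1 the candidates are -1, -1/2, 0, 1/2, 1.
assert value_candidates(2, 1) == {Fraction(n, 2) for n in range(-2, 3)}
```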

The present work tackles the algorithmics of the following two classical problems:

• Value Problem. Compute for each vertex v ∈ V the (rational) optimal value val_Γ(v).

• Optimal Strategy Synthesis. Compute an optimal positional strategy σ0 ∈ Σ^M_0.

Currently, the asymptotically fastest pseudo-polynomial time algorithm for solving both problems is a deterministic procedure whose time complexity is O(|V|^2 |E| W log(|V| W)) [3]. This result has been achieved by devising a binary-search procedure that ultimately reduces the Value Problem and Optimal Strategy Synthesis to the resolution of yet another family of games known as the Energy Games. Even though we do not rely on binary search in the present work, and will thus introduce some truly novel ideas that diverge from the previous solutions, we will still reduce to solving multiple instances of Energy Games. For this reason, the Energy Games are recalled in the next paragraph.

Energy Games and Small Energy-Progress Measures. An Energy Game (EG) is a game that is played on an arena Γ for infinitely many rounds by two opponents, where the goal of Player 0 is to construct an infinite play v0 v1 . . . vn . . . such that for some initial credit c ∈ N the following holds:

    c + Σ_{i=0}^{j} w(v_i, v_{i+1}) ≥ 0, for all j ≥ 0. (1)
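Condition (1) says that the running "energy level", started at the initial credit, must never drop below zero. A minimal check over a finite play prefix (a sketch, not the paper's code):

```python
def winning_with_credit(c, arc_weights):
    """Check condition (1): starting from initial credit c, every partial
    sum of the traversed arc weights stays non-negative."""
    level = c
    for x in arc_weights:
        level += x
        if level < 0:
            return False          # the energy level dropped below zero
    return True

assert winning_with_credit(2, [-2, 1, 1])       # levels: 0, 1, 2
assert not winning_with_credit(1, [-2, 1, 1])   # first step reaches -1
```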

Given a credit c ∈ N, a play v0 v1 . . . vn . . . is winning for Player 0 if it satisfies (1), otherwise it is winning for Player 1. A vertex v ∈ V is a winning starting position for Player 0 if there exist an initial credit c ∈ N and a strategy σ0 ∈ Σ0 such that, for every strategy σ1 ∈ Σ1, the play outcome_Γ(v, σ0, σ1) is winning for Player 0. As in the case of MPGs, the EGs are memoryless determined [3], i.e., for every v ∈ V, either v is winning for Player 0 or v is winning for Player 1, and (uniform) memoryless strategies are sufficient to win the game. In fact, as shown in the next lemma, the decision problems of MPGs and EGs are intimately related.

Lemma 2 (Brim et al. [3]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an arena. For all thresholds ν ∈ Q and all vertices v ∈ V, Player 0 has a strategy in the MPG Γ that secures value at least ν from v if and only if, for some initial credit c ∈ N, Player 0 has a winning strategy from v in the reweighted EG Γ^{w−ν}.

In this work we are especially interested in the Minimum Credit Problem (MCP) for EGs: for each winning starting position v, compute the minimum initial credit c = c(v) such that there exists a winning strategy σ0 ∈ Σ^M_0 for Player 0 starting from v. A fast pseudo-polynomial time deterministic procedure for solving MCPs comes from [3].

Theorem 2 (Brim et al. [3]). There exists a deterministic algorithm for solving the MCP within O(|V| |E| W) pseudo-polynomial time, on any input EG (V, E, w, ⟨V0, V1⟩).

The algorithm mentioned in Theorem 2 is the Value-Iteration algorithm analyzed by Brim et al. in [3]. Its rationale relies on the notion of Small Energy-Progress Measures (SEPMs). These are bounded, non-negative, integer-valued functions that impose local conditions which ensure global properties on the arena; in particular, they witness that Player 0 has a way to enforce conservativity (i.e., non-negativity of cycles) in the resulting game's graph. Recovering standard notation, see e.g. [3], let us denote C_Γ = {n ∈ N | n ≤ |V| W} ∪ {⊤} and let ⪯ be the total order on C_Γ defined as x ⪯ y if and only if either y = ⊤ or x, y ∈ N and x ≤ y.

In order to cast the minus operation to range over C_Γ, let us consider the operator ⊖ : C_Γ × Z → C_Γ defined as follows:

    a ⊖ b := max(0, a − b), if a ≠ ⊤ and a − b ≤ |V| W;
    a ⊖ b := ⊤, otherwise.

Given an EG Γ on vertex set V = V0 ∪ V1, a function f : V → C_Γ is a Small Energy-Progress Measure (SEPM) for Γ if and only if the following two conditions are met:

1. if v ∈ V0, then f(v) ⪰ f(v') ⊖ w(v, v') for some (v, v') ∈ E;

2. if v ∈ V1, then f(v) ⪰ f(v') ⊖ w(v, v') for all (v, v') ∈ E.
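The two conditions can be checked mechanically; the following is a direct sketch of the definition (with ⊤ modeled as infinity), not the paper's code:

```python
TOP = float('inf')  # the element ⊤ of C_Γ

def ominus(a, b, cap):
    """a ⊖ b over C_Γ = {0, ..., cap} ∪ {⊤}, where cap = |V|·W."""
    if a != TOP and a - b <= cap:
        return max(0, a - b)
    return TOP

def is_sepm(f, V0, edges, w, cap):
    """SEPM test: SOME compatible arc out of each Player-0 vertex,
    ALL arcs compatible out of each Player-1 vertex."""
    compatible = lambda u, v: f[u] >= ominus(f[v], w[(u, v)], cap)
    for u in f:
        succs = [v for (s, v) in edges if s == u]
        ok = any if u in V0 else all
        if not ok(compatible(u, v) for v in succs):
            return False
    return True

# Two-vertex arena: a (Player 0) -> b with weight -1, b -> a with weight +1.
edges = [('a', 'b'), ('b', 'a')]
w = {('a', 'b'): -1, ('b', 'a'): 1}
assert is_sepm({'a': 1, 'b': 0}, {'a'}, edges, w, cap=2)
assert not is_sepm({'a': 0, 'b': 0}, {'a'}, edges, w, cap=2)
```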

The values of a SEPM, i.e., the elements of the image f(V), are called the energy levels of f. It is worthwhile to denote by V_f = {v ∈ V | f(v) ≠ ⊤} the set of vertices having finite energy. Given a SEPM f and a vertex v ∈ V0, an arc (v, v') ∈ E is said to be compatible with f whenever f(v) ⪰ f(v') ⊖ w(v, v'); moreover, a positional strategy σ0^f ∈ Σ^M_0 is said to be compatible with f whenever, for all v ∈ V0, if σ0^f(v) = v' then (v, v') ∈ E is compatible with f. Notice that, as mentioned in [3], if f and g are SEPMs, then so is the minimum function defined as h(v) = min{f(v), g(v)} for every v ∈ V. This fact allows one to consider the least SEPM, namely, the unique SEPM f* : V → C_Γ such that, for any other SEPM g : V → C_Γ, it holds that f*(v) ⪯ g(v) for every v ∈ V. Also concerning SEPMs, we shall rely on the following lemmata. The first one relates SEPMs to the winning region W0 of Player 0 in EGs.

Lemma 3 (Brim et al. [3]). Let Γ = (V, E, w, ⟨V0, V1⟩) be an EG.

1. If f is any SEPM of the EG Γ and v ∈ V_f, then v is a winning starting position for Player 0 in the EG Γ. Stated otherwise, V_f ⊆ W0.

2. If f* is the least SEPM of the EG Γ, and v is a winning starting position for Player 0 in the EG Γ, then v ∈ V_{f*}. Thus, V_{f*} = W0.

Also notice that the following bound holds on the energy levels of any SEPM (actually by definition of C_Γ).

Lemma 4. Let Γ = (V, E, w, ⟨V0, V1⟩) be an EG. Let f be any SEPM of Γ. Then, for every v ∈ V, either f(v) = ⊤ or 0 ≤ f(v) ≤ |V| W.

Value-Iteration Algorithm. The algorithm devised by Brim et al. for solving the MCP in EGs is known as Value-Iteration [3]. Given an EG Γ as input, the Value-Iteration aims to compute the least SEPM f* of Γ. This simple procedure basically relies on a lifting operator δ. Given v ∈ V, the lifting operator δ(·, v) : [V → C_Γ] → [V → C_Γ] is defined by δ(f, v) = g, where:

    g(u) = f(u), if u ≠ v;
    g(u) = min{ f(v') ⊖ w(v, v') | v' ∈ post(v) }, if u = v ∈ V0;
    g(u) = max{ f(v') ⊖ w(v, v') | v' ∈ post(v) }, if u = v ∈ V1.
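A sketch of the lifting operator under these definitions (with ⊤ modeled as infinity; illustrative only, not the paper's code):

```python
TOP = float('inf')  # ⊤

def ominus(a, b, cap):
    return max(0, a - b) if a != TOP and a - b <= cap else TOP

def lift(f, v, V0, post, w, cap):
    """δ(f, v): recompute f at v only, taking the min over successors when
    v belongs to Player 0 and the max when it belongs to Player 1."""
    g = dict(f)  # every other vertex keeps its energy level
    levels = [ominus(f[u], w[(v, u)], cap) for u in post[v]]
    g[v] = min(levels) if v in V0 else max(levels)
    return g

# a (Player 0) has a single successor b, via an arc of weight -1.
f0 = {'a': 0, 'b': 0}
g = lift(f0, 'a', {'a'}, {'a': ['b']}, {('a', 'b'): -1}, cap=2)
assert g == {'a': 1, 'b': 0}
```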

We also need the following definition. Given a function f : V → C_Γ, we say that f is inconsistent in v whenever one of the following two holds:

1. v ∈ V0 and, for all v' ∈ post(v), it holds that f(v) ≺ f(v') ⊖ w(v, v');

2. v ∈ V1 and there exists v' ∈ post(v) such that f(v) ≺ f(v') ⊖ w(v, v').

To start with, the Value-Iteration algorithm initializes f to the constant zero function, i.e., f(v) = 0 for every v ∈ V. Furthermore, the procedure maintains a list L of vertices in order to witness the inconsistencies of f. Initially, v ∈ V0 ∩ L if and only if all arcs going out of v are negative, while v ∈ V1 ∩ L if and only if v is the source of at least one negative arc. Notice that checking the above conditions takes O(|E|) time.

As long as the list L is nonempty, the algorithm picks a vertex v from L and performs the following:

1. Apply the lifting operator δ(f, v) to f in order to resolve the inconsistency of f in v;

2. Insert into L all vertices u ∈ pre(v) \ L witnessing a new inconsistency due to the increase of f(v).

(The same vertex cannot occur twice in L, i.e., there are no duplicate vertices in L.)

The algorithm terminates when L is empty. This concludes the description of the Value-Iteration algorithm.

As shown in [3], the update of L following an application of the lifting operator δ(f, v) requires O(|pre(v)|) time, and a single application of the lifting operator δ(·, v) takes at most O(|post(v)|) time. This implies that the algorithm can be implemented so that it always halts within O(|V| |E| W) time (the reader is referred to [3] for all the details of the proof of correctness and complexity).

Remark. The Value-Iteration procedure lends itself to the following basic generalization, which turns out to be of pivotal importance for our technical needs. Let f* be the least SEPM of the EG Γ. Recall that, as a first step, the Value-Iteration algorithm initializes f to be the constant zero function. Here, we remark that this is not really necessary. Indeed, it is sufficient to initialize f to be any function f0 which bounds f* from below, that is, any f0 : V → C_Γ such that f0(v) ⪯ f*(v) for every v ∈ V. Soon after, L can be initialized in a natural way: just insert v into L if and only if f0 is inconsistent at v. This initialization still requires O(|E|) time and does not affect the correctness of the procedure.

So, we shall assume to have at our disposal a procedure named Value-Iteration(), which takes as input an EG Γ = (V, E, w, ⟨V0, V1⟩) and an initial function f0 that bounds from below the least SEPM f* of the EG Γ (i.e., such that f0(v) ⪯ f*(v) for every v ∈ V). Then, Value-Iteration() outputs the least SEPM f* of the EG Γ within O(|V| |E| W) time, working with O(|V|) space.
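The whole procedure, including the optional lower-bound initialization, can be sketched compactly as follows (a sketch under the definitions of this section, not the authors' implementation; ⊤ is modeled as infinity):

```python
from collections import deque

TOP = float('inf')  # ⊤

def value_iteration(V, V0, edges, w, f0=None):
    """Least SEPM of the EG (V, E, w, <V0, V1>) by repeatedly lifting
    inconsistent vertices, optionally started from a lower bound f0
    of the least SEPM."""
    cap = len(V) * max(abs(x) for x in w.values())   # |V|·W
    post = {v: [t for (s, t) in edges if s == v] for v in V}
    pre = {v: [s for (s, t) in edges if t == v] for v in V}

    def ominus(a, b):
        return max(0, a - b) if a != TOP and a - b <= cap else TOP

    def target(f, v):
        # Least energy level at v consistent with its successors.
        levels = [ominus(f[u], w[(v, u)]) for u in post[v]]
        return min(levels) if v in V0 else max(levels)

    f = dict(f0) if f0 is not None else {v: 0 for v in V}
    L = deque(v for v in V if f[v] < target(f, v))   # inconsistent vertices
    in_L = set(L)
    while L:
        v = L.popleft()
        in_L.discard(v)
        f[v] = target(f, v)          # resolve the inconsistency at v
        for u in pre[v]:             # only predecessors can become inconsistent
            if u not in in_L and f[u] < target(f, u):
                L.append(u)
                in_L.add(u)
    return f

V, V0 = ['a', 'b'], {'a'}
edges = [('a', 'b'), ('b', 'a')]
assert value_iteration(V, V0, edges, {('a', 'b'): -1, ('b', 'a'): 1}) == {'a': 1, 'b': 0}
# On an all-negative cycle every vertex gets energy ⊤ (Player 1 wins everywhere).
assert value_iteration(V, V0, edges, {('a', 'b'): -1, ('b', 'a'): -1}) == {'a': TOP, 'b': TOP}
```

Using a deque with a membership set mirrors the duplicate-free list L of the original description.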

3 Values and Optimal Positional Strategies from Reweightings

This section aims to show that values and optimal positional strategies of MPGs admit a suitable description in terms of reweighted arenas. This fact will be the crux for solving the Value Problem and Optimal Strategy Synthesis in O(|V|^2 |E| W) time.

3.1 On optimal values

A simple representation of values in terms of Farey sequences is now observed; then, a characterization of values in terms of reweighted arenas is provided.

Optimal values and Farey sequences. Recall that each value val_Γ(v) is contained within the following set of rational numbers:

    S_Γ = { N/D | D ∈ [1, |V|], N ∈ [−DW, DW] }.

Let us introduce some notation in order to handle S_Γ in a way that is suitable for our purposes. Firstly, we write every ν ∈ S_Γ as ν = i + F, where i = i_ν = ⌊ν⌋ is the integral part and F = F_ν = {ν} = ν − i is the fractional part. Notice that i ∈ [−W, W] and that F is a non-negative rational number having denominator at most |V|.

As a consequence, it is worthwhile to consider the Farey sequence F_n of order n = |V|. This is the increasing sequence of all irreducible fractions from the (rational) interval [0, 1] with denominators less than or equal to n. In the rest of this paper, F_n denotes the following sorted set:

    F_n = { N/D | 0 ≤ N ≤ D ≤ n, gcd(N, D) = 1 }.

Farey sequences have numerous interesting properties; in particular, many algorithms for generating the entire sequence F_n in O(n^2) time are known in the literature [6], and these rely on Stern-Brocot trees and mediant properties. Notice that the above mentioned quadratic running time is optimal, as it is well-known that the sequence F_n has s(n) = 3n^2/π^2 + O(n ln n) = Θ(n^2) terms.

Throughout the article, we shall assume that F_0, . . . , F_{s−1} is the increasing ordering of F_n, so that F_n = {F_j}_{j=0}^{s−1} and F_j < F_{j+1} for every j. Also notice that F_0 = 0 and F_{s−1} = 1.

For example, F_5 = {0, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1}.
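One classic O(n^2)-time generator uses the standard next-term recurrence of the Farey sequence (a sketch, not the paper's code):

```python
from fractions import Fraction

def farey(n):
    """F_n: irreducible fractions in [0, 1] with denominator <= n, ascending."""
    a, b, c, d = 0, 1, 1, n           # the first two terms are 0/1 and 1/n
    seq = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d              # mediant step along the Stern-Brocot walk
        a, b, c, d = c, d, k * c - a, k * d - b
        seq.append(Fraction(a, b))
    return seq

assert farey(5) == [Fraction(x, y) for x, y in
                    [(0, 1), (1, 5), (1, 4), (1, 3), (2, 5), (1, 2),
                     (3, 5), (2, 3), (3, 4), (4, 5), (1, 1)]]
```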

At this point, S_Γ can be represented as follows:

    S_Γ = [−W, W) + F_{|V|} = { i + F_j | i ∈ [−W, W), j ∈ [0, s−1] }.

The above representation of S_Γ will be convenient in a while.

Optimal values and reweightings. Two introductory lemmata are shown below; then, a characterization of optimal values in terms of reweightings is provided.

Lemma 5. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG and let q ∈ Q be a rational number having denominator D ∈ N. Then, val_Γ(v) = (1/D) val_{Γ^{w+q}}(v) − q holds for every v ∈ V.

Proof. Let us consider the play outcome_{Γ^{w+q}}(v, σ0, σ1) = v0 v1 . . . vn . . . By the definition of val_Γ(v), and by that of the reweighting Γ^{w+q} (= Γ^{D·w+N}), the following holds:

    val_{Γ^{w+q}}(v) = sup_{σ0∈Σ0} inf_{σ1∈Σ1} MP0(outcome_{Γ^{w+q}}(v, σ0, σ1))
                     = sup_{σ0∈Σ0} inf_{σ1∈Σ1} lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} (D·w(v_i, v_{i+1}) + N)   (if q = N/D)
                     = D · sup_{σ0∈Σ0} inf_{σ1∈Σ1} MP0(outcome_Γ(v, σ0, σ1)) + N
                     = D · val_Γ(v) + N.

Then, val_Γ(v) = (1/D) val_{Γ^{w+q}}(v) − N/D = (1/D) val_{Γ^{w+q}}(v) − q holds for every v ∈ V. □

Lemma 6. Given an MPG Γ = (V, E, w, ⟨V0, V1⟩), let us consider the reweightings Γ^{i,j} := Γ^{w−i−F_j}, for any i ∈ [−W, W] and j ∈ [0, s−1], where s = |F_{|V|}| and F_j is the j-th term of the Farey sequence F_{|V|}. Then, the following propositions hold:

1. For any i ∈ [−W, W] and j ∈ [0, s−1], we have: v ∈ W0(Γ^{i,j}) if and only if val_Γ(v) ≥ i + F_j;

2. For any i ∈ [−W, W] and j ∈ [1, s−1], we have: v ∈ W1(Γ^{i,j}) if and only if val_Γ(v) ≤ i + F_{j−1}.

Proof.

1. Let us fix arbitrarily some i ∈ [−W, W] and j ∈ [0, s−1]. Assume that F_j = N_j/D_j for some N_j, D_j ∈ N. Since Γ^{i,j} = (V, E, D_j(w − i) − N_j, ⟨V0, V1⟩), then by Lemma 5 (applied to q = −i − F_j) we have:

    val_Γ(v) = (1/D_j) val_{Γ^{i,j}}(v) + i + F_j.

Recall that v ∈ W0(Γ^{i,j}) if and only if val_{Γ^{i,j}}(v) ≥ 0. Hence, we have v ∈ W0(Γ^{i,j}) if and only if the following inequality holds:

    val_Γ(v) = (1/D_j) val_{Γ^{i,j}}(v) + i + F_j ≥ i + F_j.

This proves Item 1.

2. The argument is symmetric to that of Item 1, but with some further observations. Let us fix arbitrarily some i ∈ [−W, W] and j ∈ [1, s−1]. Assume that F_j = N_j/D_j for some N_j, D_j ∈ N. Since Γ^{i,j} = (V, E, D_j(w − i) − N_j, ⟨V0, V1⟩), then by Lemma 5 we have val_Γ(v) = (1/D_j) val_{Γ^{i,j}}(v) + i + F_j. Recall that v ∈ W1(Γ^{i,j}) if and only if val_{Γ^{i,j}}(v) < 0. Hence, we have v ∈ W1(Γ^{i,j}) if and only if the following inequality holds:

    val_Γ(v) = (1/D_j) val_{Γ^{i,j}}(v) + i + F_j < i + F_j.

Now, recall from Section 2 that val_Γ(v) ∈ S_Γ, where S_Γ = { i + F_j | i ∈ [−W, W), j ∈ [0, s−1] }. By hypothesis, j ≥ 1 and 0 ≤ F_{j−1} < F_j; thus, at this point, v ∈ W1(Γ^{i,j}) if and only if val_Γ(v) ≤ i + F_{j−1}. This proves Item 2. □

We are now in the position to provide a simple characterization of values in terms of reweightings.

Theorem 3. Given an MPG Γ = (V, E, w, ⟨V0, V1⟩), let us consider the reweightings Γ^{i,j} := Γ^{w−i−F_j}, for any i ∈ [−W, W] and j ∈ [1, s−1], where s = |F_{|V|}| and F_j is the j-th term of the Farey sequence F_{|V|}. Then, the following holds:

    val_Γ(v) = i + F_{j−1} if and only if v ∈ W0(Γ^{i,j−1}) ∩ W1(Γ^{i,j}).

Proof. Let us fix arbitrarily some i ∈ [−W, W] and j ∈ [1, s−1]. By Item 1 of Lemma 6, we have v ∈ W0(Γ^{i,j−1}) if and only if val_Γ(v) ≥ i + F_{j−1}. Symmetrically, by Item 2 of Lemma 6, we have v ∈ W1(Γ^{i,j}) if and only if val_Γ(v) ≤ i + F_{j−1}. Whence, by composition, v ∈ W0(Γ^{i,j−1}) ∩ W1(Γ^{i,j}) if and only if val_Γ(v) = i + F_{j−1}. □

3.2 On optimal positional strategies

The present subsection aims to provide a suitable description of optimal positional strategies in terms of reweighted arenas. An introductory lemma is shown next.

Lemma 7. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG. The following hold:

1. If v ∈ V0, let v' ∈ post(v). Then val_Γ(v') ≤ val_Γ(v) holds.

2. If v ∈ V1, let v' ∈ post(v). Then val_Γ(v') ≥ val_Γ(v) holds.

3. Given any v ∈ V0, consider the reweighted EG Γ_v := Γ^{w−val_Γ(v)}. Let f_v : V → C_{Γ_v} be any SEPM of the EG Γ_v such that v ∈ V_{f_v} (i.e., f_v(v) ≠ ⊤), and let v'_{f_v} ∈ V be any vertex out of v such that (v, v'_{f_v}) ∈ E is compatible with f_v in Γ_v. Then, val_Γ(v'_{f_v}) = val_Γ(v).

Proof. 1. It is sufficient to construct a strategy σ0^v ∈ Σ^M_0 securing to Player 0 a payoff at least val_Γ(v') from v in the MPG Γ. Let σ0^{v'} ∈ Σ^M_0 be a strategy securing payoff at least val_Γ(v') from v' in Γ. Then, let σ0^v be defined as follows:

    σ0^v(u) = σ0^{v'}(u), if u ∈ V0 \ {v};
    σ0^v(u) = σ0^{v'}(v), if u = v and v is reachable from v' in G^Γ_{σ0^{v'}};
    σ0^v(u) = v', if u = v and v is not reachable from v' in G^Γ_{σ0^{v'}}.

We argue that σ0^v secures payoff at least val_Γ(v') from v in Γ. First notice that, by Lemma 1 (applied to v'), all cycles C that are reachable from v' in G^Γ_{σ0^{v'}} satisfy:

    w(C)/|C| ≥ val_Γ(v').

The fact is that any cycle reachable from v in G^Γ_{σ0^v} is also reachable from v' in G^Γ_{σ0^{v'}} (by definition of σ0^v); therefore, the same inequality holds for all cycles reachable from v. At this point, the thesis follows again by Lemma 1 (applied to v, in the inverse direction). This proves Item 1.

2. The proof of Item 2 is symmetric to that of Item 1.

3. Firstly, notice that val_Γ(v'_{f_v}) ≤ val_Γ(v) holds by Item 1. To conclude the proof it is sufficient to show val_Γ(v'_{f_v}) ≥ val_Γ(v). Recall that (v, v'_{f_v}) ∈ E is compatible with f_v in Γ_v by hypothesis, that is:

    f_v(v) ⪰ f_v(v'_{f_v}) ⊖ (w(v, v'_{f_v}) − val_Γ(v)).

This, together with the fact that v ∈ V_{f_v} (i.e., f_v(v) ≠ ⊤) also holds by hypothesis, implies that v'_{f_v} ∈ V_{f_v} (i.e., f_v(v'_{f_v}) ≠ ⊤). Thus, by Item 1 of Lemma 3, v'_{f_v} is a winning starting position of Player 0 in the EG Γ_v. Whence, by Lemma 2, it holds that val_Γ(v'_{f_v}) ≥ val_Γ(v). This proves Item 3. □

We are now in position to provide a sufficient condition for a positional strategy to be optimal, expressed in terms of reweighted EGs and their SEPMs.

Theorem 4. Let Γ = (V, E, w, ⟨V0, V1⟩) be an MPG. For each v ∈ V, consider the reweighted EG Γ_v := Γ^{w−val_Γ(v)}. Let f_v : V → C_{Γ_v} be any SEPM of Γ_v such that v ∈ V_{f_v} (i.e., f_v(v) ≠ ⊤). Moreover, assume f_{v1} = f_{v2} whenever val_Γ(v1) = val_Γ(v2).

(14)

When v∈V0, let v0f

v∈V be any vertex such that(v,v0f

v)∈E is compatible with fvin the EGΓv, and consider the positional strategyσ0∈ΣM0 defined as follows:

σ0(v) =v0fv for every v∈V0. Then,σ0is an optimal positional strategy for Player0in the MPGΓ.

Proof. Let us consider the projection graph G^Γ_{σ_0} = (V, E_{σ_0}, w). Let v ∈ V be any vertex. In order to prove that σ_0 is optimal, it is sufficient (by Lemma 1) to show that every cycle C that is reachable from v in G^Γ_{σ_0} satisfies w(C)/|C| ≥ val^Γ(v).

• Preliminaries. Let v ∈ V and let C be any cycle of length |C| ≥ 1 that is reachable from v in G^Γ_{σ_0}. Then, there exists a path ρ of length |ρ| ≥ 1 in G^Γ_{σ_0} such that: if |ρ| = 1, then ρ = ρ_0 ρ_1 = v v; otherwise, if |ρ| > 1, then:

ρ = ρ_0 … ρ_{|ρ|} = v v_1 v_2 … v_k u_1 u_2 … u_{|C|} u_1,

where v v_1 … v_k is a simple path, for some k ≥ 0, and u_1 … u_{|C|} u_1 = C.

Figure 3: A cycle C that is reachable from v through v_1 ⋯ v_k in G^Γ_{σ_0}.

• Fact 1. It holds val^Γ(ρ_i) ≤ val^Γ(ρ_{i+1}) for every i ∈ [0, |ρ|).

Proof of Fact 1. If ρ_i ∈ V_0 then val^Γ(ρ_i) = val^Γ(ρ_{i+1}) by Item 3 of Lemma 7; otherwise, if ρ_i ∈ V_1, then val^Γ(ρ_i) ≤ val^Γ(ρ_{i+1}) by Item 2 of Lemma 7. This proves Fact 1. In particular, notice that val^Γ(v) ≤ val^Γ(u_1) when |ρ| > 1. □

• Fact 2. Assume C = u_1 … u_{|C|} u_1; then val^Γ(u_i) = val^Γ(u_1) for every i ∈ [1, |C|].

Proof of Fact 2. By Fact 1, val^Γ(u_{i−1}) ≤ val^Γ(u_i) for every i ∈ [2, |C|], as well as val^Γ(u_{|C|}) ≤ val^Γ(u_1). Then, the following chain of inequalities holds:

val^Γ(u_1) ≤ val^Γ(u_2) ≤ … ≤ val^Γ(u_{|C|}) ≤ val^Γ(u_1).

Since the first and the last value of the chain are actually the same, i.e., val^Γ(u_1), all these inequalities are indeed equalities. This proves Fact 2. □


• Fact 3. The following holds for every i ∈ [0, |ρ|):

f_{ρ_i}(ρ_i), f_{ρ_i}(ρ_{i+1}) ≠ ⊤ and f_{ρ_i}(ρ_i) ≥ f_{ρ_i}(ρ_{i+1}) − w(ρ_i, ρ_{i+1}) + val^Γ(ρ_i).

Proof of Fact 3. Firstly, we argue that the arc (ρ_i, ρ_{i+1}) ∈ E is compatible with f_{ρ_i} in Γ^{ρ_i}. Indeed, if ρ_i ∈ V_0, then (ρ_i, ρ_{i+1}) is compatible with f_{ρ_i} in Γ^{ρ_i} because ρ_{i+1} = σ_0(ρ_i) by hypothesis; otherwise, if ρ_i ∈ V_1, then (ρ_i, x) is compatible with f_{ρ_i} in Γ^{ρ_i} for every x ∈ post(ρ_i), in particular for x = ρ_{i+1}, by definition of SEPM.

At this point, since (ρ_i, ρ_{i+1}) is compatible with f_{ρ_i} in Γ^{ρ_i}, then:

f_{ρ_i}(ρ_i) ⪰ f_{ρ_i}(ρ_{i+1}) ⊖ (w(ρ_i, ρ_{i+1}) − val^Γ(ρ_i)).

Now, recall that ρ_i ∈ V_{f_{ρ_i}} (i.e., f_{ρ_i}(ρ_i) ≠ ⊤) holds for every ρ_i by hypothesis. Since f_{ρ_i}(ρ_i) ≠ ⊤ and the above inequality holds, we have f_{ρ_i}(ρ_{i+1}) ≠ ⊤. Thus, we can safely write:

f_{ρ_i}(ρ_i) ≥ f_{ρ_i}(ρ_{i+1}) − w(ρ_i, ρ_{i+1}) + val^Γ(ρ_i).

This proves Fact 3. □

• Fact 4. Assume that the cycle C = u_1 … u_{|C|} u_1 is such that:

val^Γ(u_i) = val^Γ(u_1) ≥ val^Γ(v), for every i ∈ [1, |C|].

Then, provided that u_{|C|+1} = u_1, the following holds for every i ∈ [1, |C|]:

f_{u_1}(u_1), f_{u_{i+1}}(u_{i+1}) ≠ ⊤ and f_{u_1}(u_1) ≥ f_{u_{i+1}}(u_{i+1}) − Σ_{j=1}^{i} w(u_j, u_{j+1}) + i · val^Γ(v).

Proof of Fact 4. Firstly, notice that f_{u_1}(u_1), f_{u_{i+1}}(u_{i+1}) ≠ ⊤ holds by hypothesis. The proof proceeds by induction on i ∈ [1, |C|].

– Base Case. Assume that |C| = 1, so that C = u_1 u_1. Then f_{u_1}(u_1) ≥ f_{u_1}(u_1) − w(u_1, u_1) + val^Γ(u_1) follows by Fact 3. Since val^Γ(u_1) ≥ val^Γ(v) by hypothesis, the thesis follows.

– Inductive Step. Assume by induction hypothesis that the following holds:

f_{u_1}(u_1) ≥ f_{u_i}(u_i) − Σ_{j=1}^{i−1} w(u_j, u_{j+1}) + (i−1) · val^Γ(v).

By Fact 3, we have:

f_{u_i}(u_i) ≥ f_{u_i}(u_{i+1}) − w(u_i, u_{i+1}) + val^Γ(u_i).

Since val^Γ(u_{i+1}) = val^Γ(u_i) holds by hypothesis, we have f_{u_{i+1}} = f_{u_i} (by the assumption of Theorem 4 that vertices with equal values share the same SEPM). Recall that val^Γ(u_i) ≥ val^Γ(v) also holds by hypothesis. Combining these inequalities with the induction hypothesis, we obtain:

f_{u_1}(u_1) ≥ f_{u_{i+1}}(u_{i+1}) − Σ_{j=1}^{i} w(u_j, u_{j+1}) + i · val^Γ(v).

This proves Fact 4. □

• We are now in a position to show that every cycle C that is reachable from v in G^Γ_{σ_0} satisfies w(C)/|C| ≥ val^Γ(v). By Fact 1 and Fact 2, we have val^Γ(v) ≤ val^Γ(u_1) = val^Γ(u_i) for every i ∈ [1, |C|]. At this point, we apply Fact 4: consider its specialization for i = |C|, and recall that u_{|C|+1} = u_1. Then, we have the following:

f_{u_1}(u_1) ≥ f_{u_1}(u_1) − Σ_{j=1}^{|C|} w(u_j, u_{j+1}) + |C| · val^Γ(v).

As a consequence, the following lower bound holds on the average weight of C:

w(C)/|C| = (1/|C|) Σ_{j=1}^{|C|} w(u_j, u_{j+1}) ≥ val^Γ(v),

which concludes the proof. □

Remark 1. Notice that Theorem 4 holds, in particular, when f_v is the least SEPM of the reweighted EG Γ^v. This follows because v ∈ V_{f_v} always holds for the least SEPM f_v of the EG Γ^v, as shown next: by Lemma 2 and by definition of Γ^v, v is a winning starting position for Player 0 in the EG Γ^v (for some initial credit); now, since f_v is the least SEPM of the EG Γ^v, v ∈ V_{f_v} follows by Item 2 of Lemma 3.
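For concreteness, the least SEPM invoked in Remark 1 can be computed by the standard lifting (value-iteration) scheme in the style of Brim et al. The sketch below is illustrative only, not the paper's Algorithm 1: the dict-based encoding, the use of `None` for the top element ⊤, and all names are our own choices.

```python
# Sketch: least Small Energy-Progress Measure of an Energy Game,
# computed by iterated lifting (Knaster-Tarski least fixpoint).
TOP = None  # we represent the top element ⊤ by None

def least_sepm(V0, V1, E, w):
    """V0, V1: disjoint vertex sets; E: dict vertex -> successor list
    (every vertex is assumed to have at least one outgoing arc);
    w: dict (u, v) -> integer weight. Returns the least SEPM as a dict."""
    V = set(V0) | set(V1)
    # Values range over {0, ..., M} plus TOP, where M is the sum of the
    # absolute values of the negative weights.
    M = sum(-w[e] for e in w if w[e] < 0)

    def ominus(a, b):  # a ⊖ b, saturating at TOP
        if a is TOP:
            return TOP
        c = max(0, a - b)
        return c if c <= M else TOP

    def leq(a, b):  # order on {0,...,M} ∪ {TOP}; TOP is the largest element
        return b is TOP or (a is not TOP and a <= b)

    rank = lambda a: M + 1 if a is TOP else a

    f = {v: 0 for v in V}
    def lift(v):
        # Smallest value satisfying v's constraint w.r.t. the current f:
        # min over successors for Player 0, max for Player 1.
        vals = [ominus(f[u], w[(v, u)]) for u in E[v]]
        return min(vals, key=rank) if v in V0 else max(vals, key=rank)

    changed = True
    while changed:  # iterate lifts up to the least fixpoint
        changed = False
        for v in V:
            new = lift(v)
            if not leq(new, f[v]):  # f(v) must strictly increase
                f[v] = new
                changed = True
    return f
```

Vertices mapped to `TOP` are exactly those outside V_f, i.e., losing for Player 0 in the EG; this naive round-robin lifting already suffices for illustration, although more careful bookkeeping is needed to obtain the stated complexity bounds.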

4 An O(|V|^2 |E| W) time Algorithm for solving the Value Problem and Optimal Strategy Synthesis in MPGs

This section offers a deterministic algorithm for solving the Value Problem and Optimal Strategy Synthesis of MPGs within O(|V|^2 |E| W) time and O(|V|) space, on any input MPG Γ = (V, E, w, ⟨V_0, V_1⟩).

Let us now recall some notation in order to describe the algorithm in a suitable way. Given an MPG Γ = (V, E, w, ⟨V_0, V_1⟩), consider again the following reweightings:

Γ^{i,j} := Γ^{w − i − F_j}, for any i ∈ [−W, W] and j ∈ [0, s−1],

where s = |F_{|V|}| and F_j is the j-th term of the Farey sequence F_{|V|}. Assuming F_j = N_j/D_j for some N_j, D_j ∈ N, we focus on the following weights:

w_{i,j} := w − i − F_j = w − i − N_j/D_j;   w'_{i,j} := D_j · w_{i,j} = D_j (w − i) − N_j.

Recall that Γ^{i,j} is defined as Γ^{i,j} := Γ^{w'_{i,j}}, which is an arena having integer weights. Also notice that, since F_0 < … < F_{s−1} is monotonically increasing, the corresponding weight functions w_{i,j} can be ordered in a natural (decreasing) way, i.e., w_{−W,1} > w_{−W,2} > … > w_{W−1,s−1} > … > w_{W,s−1}. In the rest of this section, we denote by f_{w'_{i,j}} : V → C_{Γ^{i,j}} the least SEPM of the reweighted EG Γ^{i,j}. Moreover, the function f_{i,j} : V → Q, defined as f_{i,j}(v) := (1/D_j) · f_{w'_{i,j}}(v) for every v ∈ V, is called the rational scaling of f_{w'_{i,j}}.
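The arithmetic behind w'_{i,j} is elementary and can be made concrete. Below is a small illustrative sketch (function names are ours): `farey` enumerates the terms F_j = N_j/D_j of the Farey sequence of a given order via the classic next-term recurrence, and `reweight` applies w'_{i,j} = D_j(w − i) − N_j arc-wise to a weight map.

```python
from fractions import Fraction

def farey(n):
    """Ascending terms of the Farey sequence F_n, via the standard
    next-term recurrence (each step costs O(1))."""
    a, b, c, d = 0, 1, 1, n
    terms = [Fraction(a, b)]
    while c <= n:
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        terms.append(Fraction(a, b))
    return terms

def reweight(w, i, Fj):
    """Arc-wise integer reweighting w'_{i,j} = D_j * (w - i) - N_j."""
    Nj, Dj = Fj.numerator, Fj.denominator
    return {e: Dj * (w[e] - i) - Nj for e in w}
```

Note that the reweighted arcs are integers by construction, which is what makes the Energy-Game machinery applicable to Γ^{i,j}.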

4.1 Description of the Algorithm

In this section we shall describe a procedure whose pseudo-code is given below in Algorithm 1. It takes as input an arena Γ = (V, E, w, ⟨V_0, V_1⟩), and it aims to return a tuple (W_0, W_1, ν, σ_0) such that: W_0 and W_1 are the winning regions of Player 0 and Player 1 in the MPG Γ (respectively); ν : V → S_Γ is a map sending each starting position v ∈ V to its optimal value, i.e., ν(v) = val^Γ(v); and, finally, σ_0 : V_0 → V is an optimal positional strategy for Player 0 in the MPG Γ.

The intuition underlying Algorithm 1 is to consider the following decreasing sequence of weights:

w_{−W,1} > w_{−W,2} > … > w_{−W,s−1} > w_{−W+1,1} > w_{−W+1,2} > … > w_{W−1,s−1} > … > w_{W,s−1},

where the key idea is to rely on Theorem 3 at each one of these steps, testing whether a transition of winning regions has occurred. Stated otherwise, the idea is to check, for each vertex v ∈ V, whether v is winning for Player 1 with respect to the current weight w_{i,j}, meanwhile recalling whether v was winning for Player 0 with respect to the immediately preceding element w_{prev(i,j)} in the weight sequence above.

Figure 4: An illustration of Algorithm 1: the weights w_{−W,1}, w_{−W,2}, w_{−W,3}, …, w_{prev(i,j)}, w_{i,j}, … are scanned in order, testing at each step whether v ∈ W_0(Γ^{prev(i,j)}) ∩ W_1(Γ^{i,j}).

If such a transition occurs, say for some v̂ ∈ W_0(Γ^{prev(i,j)}) ∩ W_1(Γ^{i,j}), then one can easily compute val^Γ(v̂) by relying on Theorem 3; also, at that point, it is easy to compute an optimal positional strategy, provided that v̂ ∈ V_0, by relying on Theorem 4 and Remark 1. Each one of these phases, in which one looks at transitions of winning regions, is named a Scan Phase. A graphical intuition of Algorithm 1 is given in Fig. 4.
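The scan-phase scheme just described can be sketched as a driver loop. This is purely illustrative and not the paper's Algorithm 1: `winning_region_player1` is assumed as a black box standing in for the Energy-Game test behind Theorem 3, and all names are our own.

```python
from fractions import Fraction

def scan_phases(V, W, farey_terms, winning_region_player1):
    """Illustrative scan-phase driver. winning_region_player1(i, Fj) is
    assumed to return the set of vertices winning for Player 1 w.r.t.
    the reweighting w - i - Fj. Returns nu: vertex -> optimal value,
    assigned at transitions of winning regions."""
    nu = {}
    prev_W0 = set(V)         # w.r.t. threshold -W every value is >= -W
    prev_t = Fraction(-W)
    for i in range(-W, W + 1):
        for Fj in farey_terms[1:]:   # thresholds i + F_1 < ... < i + F_{s-1}
            t = i + Fj               # current threshold (weight w - t)
            W1 = winning_region_player1(i, Fj)
            for v in prev_W0 & W1:   # transition of winning regions at v:
                nu[v] = prev_t       # its value is the preceding threshold
            prev_W0 = set(V) - W1
            prev_t = t
    return nu
```

The point of the transition test is that v ∈ W_0(Γ^{prev(i,j)}) forces val^Γ(v) to be at least the previous threshold, while v ∈ W_1(Γ^{i,j}) forces it below the current one; since optimal values lie on the grid of thresholds, the two facts pin the value down exactly.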
