HAL Id: hal-02917552
https://hal.archives-ouvertes.fr/hal-02917552
Submitted on 17 Nov 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Arena-Independent Finite Memory
Patricia Bouyer, Stéphane Le Roux, Youssouf Oualhadj, Mickael Randour, Pierre Vandenhove
To cite this version:
Patricia Bouyer, Stéphane Le Roux, Youssouf Oualhadj, Mickael Randour, Pierre Vanden- hove. Games Where You Can Play Optimally with Arena-Independent Finite Memory. 31st International Conference on Concurrency Theory (CONCUR’20), Sep 2020, Vienna, Austria.
�10.4230/LIPIcs.CONCUR.2020.18�. �hal-02917552�
Arena-Independent Finite Memory
2
Patricia Bouyer
3
LSV – CNRS & ENS Paris-Saclay, Université Paris-Saclay, France
4
Stéphane Le Roux
5
LSV – CNRS & ENS Paris-Saclay, Université Paris-Saclay, France
6
Youssouf Oualhadj
7
LACL – UPEC, France
8
Mickael Randour
9
F.R.S.-FNRS & UMONS – Université de Mons, Belgium
10
Pierre Vandenhove
11
F.R.S.-FNRS & UMONS – Université de Mons, Belgium
12
LSV – CNRS & ENS Paris-Saclay, Université Paris-Saclay, France
13
Abstract
14
For decades, two-player (antagonistic) games on graphs have been a framework of choice for many
15
important problems in theoretical computer science. A notorious one is controller synthesis, which
16
can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the
17
system in a game against its antagonistic environment. Depending on the specification,optimal
18
strategies might be simple or quite complex, for example having to use (possibly infinite) memory.
19
Hence, research strives to understand which settings allow for simple strategies.
20
In 2005, Gimbert and Zielonka [27] provided a complete characterization of preference relations
21
(a formal framework to model specifications and game objectives) that admitmemorylessoptimal
22
strategies for both players. In the last fifteen years however, practical applications have driven the
23
community toward games with complex or multiple objectives, where memory — finite or infinite —
24
is almost always required. Despite much effort, the exact frontiers of the class of preference relations
25
that admitfinite-memoryoptimal strategies still elude us.
26
In this work, we establish a complete characterization of preference relations that admit optimal
27
strategies usingarena-independent finite memory, generalizing the work of Gimbert and Zielonka to
28
the finite-memory case. We also prove an equivalent to their celebrated corollary of great practical
29
interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player
30
games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our
31
results with regard to the literature: our work completely covers the case of arena-independent
32
memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves
33
the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).
34
2012 ACM Subject Classification Theory of computation→Formal languages and automata theory
35
Keywords and phrases two-player games on graphs, finite-memory determinacy, optimal strategies
36
Digital Object Identifier 10.4230/LIPIcs.CONCUR.2020.18
37
Related Version Full paper on arXiv: https://arxiv.org/abs/2001.03894
38
Funding Research supported by F.R.S.-FNRS under Grant n◦F.4520.18 (ManySynth), F.R.S.-FNRS
39
mobility funding for scientific missions (Y. Oualhadj in UMONS, 2018), and ENS Paris-Saclay
40
visiting professorship (M. Randour, 2019). Mickael Randour is an F.R.S.-FNRS Research Associate
41
and Pierre Vandenhove is an F.R.S.-FNRS Research Fellow.
42
Acknowledgements We extend our warmest thanks to Mathieu Sassolas, for inspiring discussions
43
that were essential in starting this work.
44
© Patricia Bouyer, Stéphane Le Roux, Youssouf Oualhadj, Mickael Randour, and Pierre Vandenhove;
licensed under Creative Commons License CC-BY
1 Introduction
45
Controller synthesis through the game-theoretic metaphor. Two-player games on (finite)
46
graphs are studied extensively, in particular for their application to controller synthesis for
47
reactive systems (see, e.g., [29,37,7,2]). The seminal model is antagonistic(i.e.,zero-sumif
48
one chooses a quantitative view): player 1 (P1) is seen as the system to control, player 2 (P2)
49
as its antagonistic environment, and the game models their interaction. Each vertex of the
50
game graph (calledarena) models astate of the system and belongs to one of the players.
51
Players take turns moving a pebble from state to state according to the edges, each player
52
choosing the destination whenever the pebble is on one of his states. These choices are made
53
according to thestrategy of the player, which, in general, might use memory (bounded or
54
not) of the past moves to prescribe the next action.
55
The resulting infinite sequence of states, called play, represents the execution of the
56
system. The objective ofP1 is to enforce a givenspecification, often encoded as awinning
57
condition(i.e., a set of winning plays) or as apayoff functionto maximize (i.e., a quantitative
58
performance to optimize). This paradigm focuses on the worst-case performance of the
59
system, henceP2’s goal is to preventP1from achieving his objective.
60
The goal ofsynthesis is thus to decide ifP1 has a winning strategy, i.e., one ensuring
61
a given winning condition or guaranteeing a given payoff threshold, against all possible
62
strategies ofP2, and to build such a strategy efficiently if it exists.
63
Winning strategies are formal blueprints for controllers to implement in applications.
64
Therefore, their complexity is of tremendous importance: the simpler the strategy, the easier
65
and cheaper it will be to build the controller and maintain it. This explains why a lot of
66
research effort is constantly put in identifying the exact complexity (in terms of memory and/or
67
randomness) of strategies needed to playoptimally (i.e., to the best of the player’s ability)
68
for each specific class of games and objectives (e.g., [27, 17,14,42,22,10,1,4, 41,5,11]).
69
Alongside the practical interest lies the theoretical puzzle: understanding the underlying
70
mechanisms and implicit properties of games that lead to “simple” strategies being sufficient.
71
Given the numerous connections between games and various branches of mathematics and
72
computer science, this fundamental question has interest in its own right.
73
Preference relations. There are two prominent ways to formalize a game objective in the
74
literature. The first one, dubbedquantitativeand inspired by games in economics, is to use
75
payoff functionsmapping plays to numerical values, and to seeP1 as a maximizer player.
76
This is for example the case of mean-payoff games [19]. The second one, calledqualitative, is
77
to define aset of winning plays — calledwinning condition — induced by some property, as
78
in, e.g., parity games [20,43]. The two formalisms are strongly linked: the classical decision
79
problem for quantitative games is to fix a payoff threshold and ask ifP1 has a strategy to
80
guarantee it, transforming the problem into a qualitative one (where the winning plays are
81
all those with a payoff at least equal to the threshold). To define payoff functions or winning
82
conditions, one often uses weights, priorities, colors, etc, on states or edges of the arena.
83
In this work, we walk in the footsteps of Gimbert and Zielonka [27]: we associate acolor
84
to each edge of our arenas, and we adopt the abstract formalism ofpreference relations over
85
infinite sequences of colors (induced by plays). This general formalism permits to encode
86
virtually all classical game objectives, both qualitative and quantitative, and lets us reason
87
in a well-founded framework under minimal assumptions.
88
Memoryless optimal strategies. Remarkably, several canonical classes of games that have
89
been around for decades and proved their usefulness over and over — e.g., mean-payoff [19],
90
parity [20,43], or energy games [12] — share a desirable property: they all admitmemoryless
91
optimal strategies for both players. That is, for every strategy σi ofPi, there is a strategy
92
σMLi which isat least as good (i.e., wins wheneverσiwins or ensures at least the same payoff)
93
and that uses no memory at all. Such a memoryless strategy always picks the same edge
94
when in the same state, regardless of what happened before in the game.
95
Memoryless strategies are the simplest kind of strategies one can use. Therefore, it
96
is quite interesting that they suffice for objectives as rich as the ones we just discussed.
97
Following this observation, a lot of effort has been put in understanding which games
98
admit memoryless optimal strategies, and in identifying the exact frontiers ofmemoryless
99
determinacy. Let us mention, non-exhaustively, works by Gimbert and Zielonka [26, 27]
100
(culminating in a complete characterization), Aminof and Rubin [1] (through the prism of
101
first-cycle games), and Kopczyński [33] (half-positional determinacy). All these advances
102
were built by identifying the common underlying mechanisms in ad hoc proofs for specific
103
classes of games, and generalizing them to wide classes (e.g., the first-cycle games of [1] are
104
inspired by the seminal paper of Ehrenfeucht and Mycielski on mean-payoff games [19]).
105
Gimbert and Zielonka’s approach. Arguably, the most important result in this direction is
106
thecomplete characterizationof preference relations admitting memoryless optimal strategies,
107
established in [27], fifteen years ago. By complete characterization, we mean sufficientand
108
necessary conditions on the preference relations.
109
It can be stated as follows: a preference relation admits memoryless optimal strategies
110
for both players on all arenas if and only if the relation (used byP1) and its inverse (used
111
by P2) aremonotone andselective. These concepts will be defined formally in Section 3,
112
but let us give an intuition here. Roughly, a preference relation is monotone if it is stable
113
underprefix addition: given two sequences of colors such that one isstrictly preferred to the
114
other, it is impossible to reverse this order of preference by adding the same prefix to both
115
sequences. Selectivity is similarly defined with regard tocycle mixing: if a preference relation
116
is selective, then, starting from two sequences of colors, it is impossible to create a third one
117
by mixing the first two in such a way that the third one is strictly preferred to the first two.
118
These elegant notions coincide with the natural intuition that memoryless strategies suffice if
119
there is no interest in behaving differently in a state depending on what happened before.
120
In addition to this complete characterization, Gimbert and Zielonka proved another great
121
result, of high interest in practice [27, Corollary 7]: as a by-product of their approach, they
122
obtain that if memoryless strategies suffice in all one-player games ofP1 and all one-player
123
games of P2, they also suffice in all two-player games. Such a lifting corollary provides a
124
neat and easy way to prove that a preference relation admits memoryless optimal strategies
125
without proving monotony and selectivity at all: proving it in the two one-player subcases,
126
which is generally much easier as it boils down to graph reasoning, and then lifting the result
127
to the general two-player case through the corollary.
128
The rise of memory. The need to model complex specifications has shifted research toward
129
games where multiple (quantitative and qualitative) objectives co-exist and interact, requiring
130
the analysis of interplay andtrade-offs between several objectives. Hence, a lot of effort
131
is put in studying games where objectives are actually conjunctions of objectives, or even
132
richer Boolean combinations. See for example [16] for combinations of parity, [13,17,32] for
133
combinations of energy and parity, [42] for combinations of mean-payoff, [5,4] for combinations
134
of energy and average-energy, [11] for combinations of energy and mean-payoff, [14] for
135
combinations of total-payoff, or [14,10,8] for combinations of window objectives.
136
s2 s1
1
1
−1
−1
Figure 1P1 (circle) needs infinite memory to win.
When considering such rich objectives,memoryless strategies usually do not suffice, and
137
one has to use an amount of memory that can quickly become an obstacle to implementation
138
(e.g., exponential memory) or that can prevent it completely (infinite memory). Establishing
139
precise memory bounds for such general combinations of objectives is tricky and sometimes
140
counterintuitive. For example, while energy games and mean-payoff games are inter-reducible
141
in the single-objective setting, exponential-memory strategies are both sufficient and necessary
142
for conjunctions of energy objectives [17,32] whileinfinite-memory strategies are required
143
for conjunctions of mean-payoff ones [42].
144
A natural question arises: which preference relations do admit finite-memory optimal
145
strategies? Surprisingly, whether an equivalent to Gimbert and Zielonka’s characterization
146
could be obtained for finite memory or not has remained an open question up to now. It is
147
worth noticing that such an equivalent could be of tremendous help in practice, especially if
148
alifting corollary also holds: see for example [5,4,11], where proving that finite-memory
149
strategies suffice in one-player games was fairly easy, in contrast to the high complexity of
150
the two-player case — a lifting corollary could grant the two-player case for free!
151
Having said that, one has to hope that the following corollary can be established: “if
152
finite-memory strategies suffice in all one-player games ofP1and all one-player games of P2,
153
they also suffice in all two-player games.” Unfortunately, this hope is but a delusion.
154
Lifting corollary: a counterexample. Consider games where the colors are integers, and
155
the objective ofP1 is to create a play such that (a) the running sum of weights grows up to
156
infinity (e.g., consider its lim inf to define it properly),or (b) this running sum of weights
157
takes value zero infinitely often. As this defines a qualitative objective, the corresponding
158
preference relation induces only two equivalence classes: winning and losing plays. The
159
inverse relation, used byP2, is trivial to obtain. It is fairly easy to prove thatP1 always
160
has finite-memory optimal strategies in his one-player games (i.e., games whereP2 has no
161
choice), and so doesP2 in his one-player games.
162
Now, consider the very simple two-player game depicted in Figure1. PlayerP1(circle)
163
has aninfinite-memory strategy to win: keeping track of the running sum of weights (which
164
is unbounded, hence the need for infinite memory) and looping ins1up to the point where
165
this sum hits zero, then going tos2. This strategy ensures victory because eitherP2always
166
goes back tos1, in which case (b) is satisfied; orP2 eventually loops forever ons2, in which
167
case (a) is satisfied. It remains to argue thatP1has nofinite-memory winning strategy in
168
this game. This can be done using a standard argument: whatever the amount of memory
169
used byP1,P2 may loop ins2 long enough as to exceed the bound up to whichP1can track
170
the sum accurately; thus doomingP1 to fail to reset the sum to zero ins1 infinitely often.
171
This modest example proves thatGimbert and Zielonka’s approach cannot work in full
172
generality in the finite-memory case, and for good reasons. Informally, in this case, the
173
corollary breaks down because of (the absence of some sort of) monotony. In the case of
174
memorylessstrategies, as in [27],P1is already doomed in one-player games in the absence of
175
monotony: two prefixes to distinguish — in order to play optimally — can be hardcoded as
176
different paths leading to the same state in a game arena, as if they were chosen by P2 in a
177
two-player game. In the case offinite-memorystrategies, however, the situation is different.
178
In one-player games, the number of such paths that can be hardcoded in an arena is always
179
bounded, hence finite memory might suffice to react, i.e., to keep track of which prefix is the
180
current one and how to behave accordingly. However, in two-player games,P2 might create
181
an infinitenumber of prefixes to distinguish (using a cycle), thus requiringP1 to use infinite
182
memory to be able to do so. This is exactly what happens in the example above: in any
183
one-player game, the largest sum thatP1 has to track is bounded, whereasP2 can make this
184
sum as large as he wants in two-player games.
185
Our approach. We generalize Gimbert and Zielonka’s results — characterizationand lifting
186
corollary — to the case of arena-independent finite memory. That is, we encompass all
187
situations where the memory needed by the two players issolely dependent on the preference
188
relation(e.g., colors, dimensions of weight vectors), and noton the game arena (i.e., number
189
of edges/states). Let us take some examples.
190
All memoryless-determined relations — studied in [27] — use arena-independent memory:
191
the memory required, none, is the same for all arenas.
192
Combinations of parity objectives use arena-independent memory [16]: the memory only
193
depends on the number of objectives and the number of priorities — both parameters of
194
the preference relation, not on the size of the arena.
195
Lower- and upper-bounded energy objectives also use arena-independent memory (see,
196
e.g., [3,5, 4]): the memory only depends on the bounds and the weights — parameters
197
of the preference relation, not on the size of the arena.
198
On the contrary, combinations of lower-bounded energy objectives (with no upper bound)
199
requirearena-dependent memory [17,32]: it depends on the size of the arena in addition
200
to the weights used in it. Such a setting falls outside the scope of our results.
201
This informal concept of arena-independent memory is transparent in our work: in all
202
our results, we usememory skeletons — essentially Mealy machines without a next-action
203
function (Section2) — that suffice forall arenas, and that are at the basis of the strategies
204
we build. A quick look at our main concepts (Section3) and results (Section4) suffices to
205
grasp the formalism behind this intuition.
206
This restriction to arena-independent memory is natural given the counterexample to a
207
general approach presented above. It is also important to note that it is not as restrictive as
208
it may seem, as hinted by the examples above: we arenot restricted to constant memory
209
but to memory only depending on theparameters of the preference relation (or equivalently,
210
objective), and not of the arena. This framework thus already encompasses many objectives
211
from the literature — e.g., [19, 20, 43, 12, 5, 21, 16, 10, 14, 3, 5, 4], as well as possible
212
extensions. More details in AppendixA.
213
The arena-independent case, which we solve here, is an exact equivalent to Gimbert and
214
Zielonka’s results in the finite-memory case: the memoryless case is de facto arena-independent.
215
Therefore, this paper strictly generalizes [27] by allowing to study any arena-independent
216
memory skeleton instead of the unique trivial one corresponding to memoryless strategies.
217
Outline of our contributions. Informally, our characterization can be stated as follows:
218
given a preference relation and a memory skeleton M, both players have optimal finite-
219
memory strategies based on skeletonMin all games if and only if the relation and its inverse
220
areM-monotone andM-selective.
221
These last two concepts are keys to our approach. Intuitively, they correspond to Gimbert
222
and Zielonka’s monotony and selectivity,modulo a memory skeleton. Recall that monotony
223
and selectivity are related to stability of the preference relation with regard to prefix addition
224
and cycle mixing, respectively. Our more general concepts ofM-monotony andM-selectivity
225
serve the same purpose, but they only compare sequences of colors that are deemed equivalent
226
by the memory skeleton. For the sake of illustration, take selectivity: it implies that one has
227
no interest in mixing different cycles of the game arena. For its generalization, the memory
228
skeleton is taken into account: M-selectivity implies that one has no interest in mixing cycles
229
of the game arenathat are read as cycles on the same memory state in the skeleton M.
230
Let us give a quick breakdown of our paper. Due to space constraints, we only provide
231
an intuitive overview of our results and technical approach in this conference version: formal
232
details and proofs, along with additional results, can be found in the full article [6].
233
In Section2, we introduce some basic notions, including the memory skeletons, and we
234
establish several technical results. We also discussoptimal strategies andNash equilibria.
235
In Section3, we introduceM-monotony and M-selectivity, cornerstones of our work. We
236
also present two essential tools: prefix-coversandcyclic-coversof arenas. Section4states
237
formally our characterization (Theorem 9), as well as the corresponding lifting corollary
238
(Corollary10), from one-player to two-player games. We show an example of application
239
in Section5. Finally, we give an overview of the technical highlights of our approach in
240
Section6— its details are broken down in several intermediate results in our full paper [6].
241
In a nutshell, the proof of the characterization (Theorem 9) is split in two. We first
242
establish that (the sufficiency of) finite memory based on M implies M-monotony and
243
M-selectivity of the preference relation. The crux is to build game arenas based on automata
244
recognizing the languages involved in the two concepts, and to use the existence of finite-
245
memory optimal strategies in these arenas to prove thatM-monotony andM-selectivity hold.
246
To prove the converse implication, we proceed in two steps, first establishing the existence of
247
memorylessoptimal strategies in “covered” arenas, and then building on it to obtain the
248
existence offinite-memoryoptimal strategies in general arenas. The main technical tools we
249
use are Nash equilibria and the aforementioned notions of prefix-covers and cyclic-covers.
250
Alongside the technical details, we analyze our characterization in Appendix A: we
251
highlight some limitations and interesting features, compare our techniques with Gimbert and
252
Zielonka’s, discuss our place in the research landscape, and sketch directions for future work.
253
Let us just stress already that our result — relating a memory skeletonMand preference
254
relations for which this skeleton suffices — cannot be obtained by simply considering product
255
arenas and invoking Gimbert and Zielonka’s result on memoryless determinacy [27]. While,
256
of course, memoryless strategies on product arenas correspond to memoryfull strategies on
257
original arenas (as we will formally establish in Lemma1), invoking [27] requires to be able
258
to quantify onall arenas, not onlyproduct arenas. Filling this gap is exactly the goal of this
259
paper, and it is made possible through the new concepts we sketched above.
260
2 Preliminaries
261
We only give here the notions and results necessary to understand this overview. Necessary
262
notions and results — some of them interesting in their own right — to understand the
263
technical details of the approach are found in the full paper [6].
264
Automata and languages of colors. LetCbe an arbitrary set ofcolors. We assume knowl-
265
edge ofnon-deterministic finite-state automata (NFA), which recognizeregular languages.
266
For any finite subsetB⊆C, we denote byReg(B) the set of all regular languages overB.
267
LetR(C) =S
B⊆C,|B|<∞Reg(B),that is, all the regular languages built overC.
268
Let K⊆C∗ be a language offinitewords. We writePrefs(K) for the set of allprefixes of
269
words inK. We define the set ofinfinite words [K] ={w=c1. . . ∈Cω| ∀n≥1, c1. . . cn∈
270
Prefs(K)},which contains all infinite words for which every finite prefix is a prefix of a word
271
inK. Intuitively, ifK is regular, [K] is the language of infinite words that correspond to
272
infinite paths that can always branch and reach a final state, on an automaton forK. Given
273
a finite wordw∈C∗ and a languageK⊆C∗, we writewK for their concatenation, i.e., the
274
languagewK={w0=ww00|w00∈K} ⊆C∗.
275
Arenas. We consider two players: player 1 (P1) and player 2 (P2). An arena is a tuple
276
A= (S1, S2, E) such thatS =S1]S2(disjoint union) is a finite set ofstatespartitioned into
277
states ofP1 (S1) andP2 (S2), andE⊆S×C×S is a finite set ofedges. Letcol:E→Cbe
278
the projection of edges tocolorsandcolc its natural extension tosequences of edges. For an
279
edgee∈E, we usein(e) andout(e) to denote its starting state and arrival state respectively,
280
i.e.,e= (in(e),col(e),out(e)). We assume all arenas to be non-blocking, i.e., for all s∈S,
281
there existse∈Esuch thatin(e) =s. Fori∈ {1,2}, we call an arenaA= (S1, S2, E) aPi’s
282
one-player arena if for alls∈S3−i,|{e∈E|in(e) =s}|= 1 — that is,P3−i has no choice.
283
Let Hists(A, s) denote the histories in A from s ∈ S, i.e., finite sequences of edges
284
ρ= e1. . . en ∈E+ such that in(e1) = s and for alli, 1 ≤i < n, out(ei) = in(ei+1). Let
285
Plays(A, s) denote theplays inAfroms∈S, i.e., infinite sequences π=e1e2. . . ∈Eω such
286
thatin(e1) =sand for alli≥1,out(ei) =in(ei+1). We write Hists(A, S0) andPlays(A, S0)
287
for unions overS0 ⊆S, and writeHists(A) andPlays(A) for the unions over all states ofA.
288
Let ρ=e1. . . en ∈Hists(A) (resp.π=e1e2. . .∈Plays(A)): we extend the operatorin
289
to histories (resp. plays) by identifyingin(ρ) (resp.in(π)) to in(e1). We proceed similarly
290
foroutand histories: out(ρ) =out(en). For the sake of convenience, we consider that any
291
set Hists(A, s) contains the empty history λs such that in(λs) = out(λs) = s. We write
292
Histsi(A, s) andHistsi(A) for the subsets of historiesρsuch thatout(ρ)∈Si, i∈ {1,2}, i.e.,
293
histories whose last state belongs toPi. For any setH ⊆Hists(A), we writecol(H) for itsc
294
projection to colors, i.e.,col(Hc ) ={ccol(ρ)|ρ∈H}. We do the same for sets of plays.
295
Memory skeletons. Amemory skeleton is a tupleM= (M, minit, αupd) whereM is afinite
296
set ofstates,minit∈M is a fixedinitial state andαupd:M×C→M is anupdate function.
297
We writeαdupdfor the natural extension ofαupdto sequences inC∗. Memory skeletons are
298
deterministic and might have aninfinitenumber of transitions, in contrast to NFA. We define
299
thetrivial memory skeleton asMtriv= (M ={minit}, minit, αupd:{minit} ×C→ {minit}): it
300
permits to formalize memoryless strategies [27] in our framework.
301
Let M= (M, minit, αupd) be a skeleton. Form, m0∈M, we define the languageLm,m0 =
302
{w∈C∗|αdupd(m, w) =m0}that contains all words that can be read frommto m0 inM.
303
LetM1= (M1, m1init, α1upd) andM2= (M2, m2init, α2upd). Theirproduct M1⊗ M2is the
304
memory skeletonM= (M, minit, αupd) whereM =M1×M2, minit= (m1init, m2init), and, for
305
allm1 ∈M1,m2∈M2,c∈C, αupd((m1, m2), c) = (α1upd(m1, c), α2upd(m2, c)). That is, the
306
memories are updated in parallel when a color is read.
307
Product arenas. LetA= (S1, S2, E) be an arena andM= (M, minit, αupd) be a skeleton.
308
Their product AnM is the arena (S10, S20, E0) where S10 = S1×M, S20 = S2×M, and
309
E0 ⊆S0×C×S0, withS0 =S10 ]S20, is such that ((s1, m1), c,(s2, m2))∈E0 if and only if
310
(s1, c, s2)∈Eandαupd(m1, c) =m2. That is, the memory is updated according to the colors
311
of the edges inE. Though Mmight contain an infinite number of transitions, AnM is
312
always finite, asE is finite. Since we assumeAis non-blocking, it is also the case ofAnM.
313
Strategies. A strategy σi for Pi, i ∈ {1,2}, on arena A = (S1, S2, E), is a function
314
σi:Histsi(A)→E such that for allρ∈Histsi(A),in(σi(ρ)) =out(ρ). Let Σi(A) be the set
315
of all strategies ofPion A. Afinite-memory strategy σi can be encoded as aMealy machine,
316
i.e., a skeletonM= (M, minit, αupd) with transitions over afinite subset of colorsB ⊆C,
317
enriched with anext-action function αnxt:M ×Si →E such that for all m∈M, s∈ Si,
318
in(αnxt(m, s)) =s. Given a Mealy machine Γσi = (M, αnxt), strategyσi is defined as follows:
319
∀s∈Si, σi(λs) =αnxt(minit, s),
320
∀ρ·e∈Histsi(A),e∈E, σi(ρ·e) =αnxt
αdupd
minit,colc(ρ·e)
,out(e) .
321
Let ΣFMi (A) be the finite-memory strategies ofPionA. We say that a strategyσi∈ΣFMi (A)
322
isbased on memory skeleton Mif it can be encoded as a Mealy machine Γσi= (M, αnxt), as
323
above. We implicitly assume that strategies of ΣFMi (A) are built by restricting the transitions
324
of their skeletonMto the actual subset of colors appearing inA. A strategyσiismemoryless
325
if it is a functionσi: Si→E, or equivalently, if it is based onMtriv.
326
We writePlays(A, s, σi) for the playsconsistentwith a strategyσiofPifrom a states, i.e.,
327
playsπ=e1e2. . .∈Plays(A, s) such that for allρ=e1. . . en,out(ρ)∈Si =⇒ σi(ρ) =en+1.
328
We writePlays(A, s, σ1, σ2) for the singleton set containing the unique play consistent with a
329
couple of strategies for the two players. We use similar notations for histories.
330
Preference relations. Let v be a total preorder on Cω, called preference relation. We
331
considerantagonistic games, where the objective of P1 is to create the best possible play
332
with regard tovwhereas the objective of P2 is the opposite. That is,P2 uses the inverse
333
relationv−1. This corresponds tozero-sum games when using a quantitative framework.
334
Givenw, w0 ∈Cω, we writew@w0 if we have¬(w0 vw) since the preorder is total. We
335
extendvto subsets: forW, W0 ⊆Cω,W vW0 ⇐⇒ ∀w∈W,∃w0 ∈W0, wvw0.We also
336
writeW @W0 ⇐⇒ ∃w0∈W0, ∀w∈W, w@w0.Note thatW @W0 ⇐⇒ ¬(W0vW).
337
To compare a wordw∈Cω with a languageK⊆Cω, we simply identify it to{w}.
338
Games. A (deterministic turn-based two-player)game is a tupleG = (A,v) whereA is
339
an arena andvis a preference relation. All classical objectives from the literature (both
340
qualitative and quantitative) can be expressed in the general framework of preference relations
341
(see Example 3 in [6]). Fori∈ {1,2}, aPi’s one-player gameis a gameG= (A,v) such that
342
Ais a Pi’s one-player arena.
343
Optimal strategies. Let G = (A,v) be a game on arena A = (S1, S2, E). Given a Pi-
344
strategyσi∈Σi(A) and a states∈S, we define
345
UColv(A, s, σi) ={w∈Cω| ∃σ3−i∈Σ3−i(A),col(Plays(A, s, σc 1, σ2))vw},
346
DColv(A, s, σi) ={w∈Cω| ∃σ3−i∈Σ3−i(A), wvcol(Plays(A, s, σc 1, σ2))}.
347348
Note thatDColv(A, s, σi) =UColv−1(A, s, σi). Intuitively,UColv andDColv represent the
349
upwardanddownward closuresof sequences of colors (consistent with a strategy) with respect
350
to the preference relation.
351
Taking the standpoint of P1, we say thatσ1∈Σ1(A) isat least as good as σ01∈Σ1(A)
352
froms∈S ifUColv(A, s, σ1)⊆UColv(A, s, σ01). Intuitively,σ1 is at least as good asσ10 if
353
the “worst-case” plays consistent withσ1 are at least as good as the ones consistent withσ10.
354
TheUColoperator is useful to define this notion properly even in the case where there is no
355
“worst-case” play (i.e., if the infimum used in the classical quantitative setting is not reached).
356
Similar notions have been used before, e.g., in [38]. Symmetrically, for P2, we say that
357
σ2∈Σ2(A) is at least as good asσ20 ∈Σ2(A) froms∈SifDColv(A, s, σ2)⊆DColv(A, s, σ02).
358
Now, we say thatσi∈Σi(A) ofPi isoptimal froms∈S, akas-optimal, if it is at least
359
as good as every otherσ0i∈Σi(A) froms. We extend this notation to subsets of states in
360
the natural way, and we say that a strategyσi isuniformly-optimal if it isS-optimal.
361
Our goal is to characterize the preference relations that admit uniformly-optimal finite-
362
memory (UFM) strategies based on a given skeletonMin all arenas. We also discuss the
363
simpler case ofuniformly-optimal memoryless (UML) strategies, which corresponds to the
364
subcase studied by Gimbert and Zielonka [27], using the trivial skeletonMtriv.
365
In that respect, the following link is important to observe.
366
ILemma 1. Let G= (A,v)be a game on arena A= (S1, S2, E). Let M= (M, minit, αupd)
367
be a memory skeleton and letσi∈ΣFMi (A)be a finite-memory strategy encoded by the Mealy
368
machineΓσi = (M, αnxt). Then,σi is a UFM strategy in G if and only ifαnxt corresponds to
369
an(S× {minit})-optimal memoryless strategy inG0= (AnM,v).
370
Nash equilibria. We use Nash equilibria [36] as tools to establish the existence of optimal
371
strategies in some of our proofs. LetG= (A,v) be a game on arenaA= (S1, S2, E). Formally,
372
aNash equilibrium(NE) from a states∈Sis a couple of strategies (σ1, σ2)∈Σ1(A)×Σ2(A)
373
such that, for allσ10 ∈Σ1(A),σ20 ∈Σ2(A),
374
col(Plays(A, s, σc 10, σ2))vcol(Plays(A, s, σc 1, σ2))vcol(Plays(A, s, σc 1, σ02)). (1)
375
Similarly to optimal strategies, we call an NEuniformif it is an NE from all states s∈S.
376
The connection between optimal strategies andNash equilibria in our specific context of
377
antagonisticgames is interesting to discuss, especially with respect to Gimbert and Zielonka’s
378
original work [27]. We defer this discussion to [6] due to space constraints, and only provide
379
here a brief account of the results one has to know to understand this overview. First, NE
380
are de facto pairs of optimal strategies. Second, it is possible to mix different NE.
381
ILemma 2. LetG= (A,v)be a game on arenaA= (S1, S2, E), and lets∈S be a state.
382
Let (σ1a, σ2a)and(σb1, σb2)∈Σ1(A)×Σ2(A)be two Nash equilibria froms. Then,(σa1, σb2) is
383
also a Nash equilibrium froms.
384
IRemark 3. Lemma2crucially relies on the assumption (transparent in our definition of
385
Nash equilibrium) that we considerantagonisticgames, that is,P2uses the inverse preference
386
relationv−1. C
387
3 Concepts
388
Generalizing monotony and selectivity. As seen in Section 1, Gimbert and Zielonka’s
389
characterization [27] relies onmonotony andselectivity of the preference relation. The main
390
difference between their technical approach and ours is the following. In the memoryless
391
setting, all the reasoning can be abstracted awayfrom the underlying arena and done on
392
sequences of colors. In the finite-memory one, however, one has to pay attention to how
393
sequences of colors are composed and compared, to maintain consistency with regard to
394
the memory and the game arena. This need to intertwine abstract reasoning on arbitrary
395
sequences of colors with concrete tracking of memory updates is the key obstacle to overcome.
396
Much of our effort was thus spent on trying to define concepts that would preserve the
397
elegance of monotony and selectivity while allowing us to lift the theory to the finite-memory
398
case. As often the case, the good concepts turned out to be the most natural ones, capturing
399
the intuitive idea that one needs monotony and selectivitymodulo a memory skeleton.
400
IDefinition 4(M-monotony). LetM= (M, minit, αupd)be a memory skeleton. A preference
401
relationvisM-monotoneif for allm∈M, for allK1, K2∈ R(C),
402
(∃w∈Lminit,m,[wK1]@[wK2]) =⇒ (∀w0∈Lminit,m,[w0K1]v[w0K2]). (2)
403
Recall that a skeletonMhas a fixed initial stateminit. Intuitively,M-monotonyextends
404
Gimbert and Zielonka’s monotony by comparing prefixesbelonging to the same language
405
Lminit,m, that is, prefixes that are deemed equivalent by skeletonM. This property roughly
406
captures thatvisstable with regard to prefix addition, for memory-equivalent prefixes.
407
The original monotony notion is equivalent to ourM-monotony withMbeing the trivial
408
skeletonMtriv: that is, the memoryless case is naturally a subcase of our framework.
409
IDefinition 5(M-selectivity). LetM= (M, minit, αupd)be a memory skeleton. A preference
410
relationvisM-selectiveif for allw∈C∗,m=αdupd(minit, w), for allK1, K2∈ R(C)such
411
that K1, K2⊆Lm,m, for all K3∈ R(C),
412
[w(K1∪K2)∗K3]v[wK1∗]∪[wK2∗]∪[wK3]. (3)
413
Similarly, M-selectivity extends Gimbert and Zielonka’s selectivity by asking one to
414
compare sequences of colorsbelonging to the same language Lm,m, that is, sequences read as
415
cycles on the memory skeleton. Note also that the memory statem should be consistent
416
with the prefixwread from the initial memory stateminit. This property roughly captures
417
thatvisstable with regard to cycle mixing, for memory-equivalent cycles.
418
Again, the original selectivity notion is exactly equivalent toMtriv-selectivity.
419
In a nutshell,M-monotony deals with prefixes up to the first cycle (on memory) and
420
M-selectivity deals with the cycles thereafter; we will see that memory skeletons can be built
421
in a compositional way based on these two orthogonal yet complementary tasks.
422
Our notions respect the natural intuition that access to additional memory should always
423
be helpful: if a skeletonMis sufficient to classify sequences of colors in a way that guarantees
424
M-monotony andM-selectivity, then it should also be the case for “more powerful” skeletons.
425
ILemma 6. Let M and M0 be two memory skeletons. If v is M-monotone (resp. M-
426
selective) then, it is also (M ⊗ M0)-monotone (resp.(M ⊗ M0)-selective).
427
Prefix-covers and cyclic-covers. While the concepts of M-monotony and M-selectivity
428
are the primordial ones for stating the characterization, we still need two additional notions
429
to prove it. Let us sketch the issue. To prove that monotone and selective preference relations
430
yield UML strategies, Gimbert and Zielonka deploy aninductive argument on the number
431
of choices in an arena. Intuitively, we want to use a similar approach for UFM strategies,
432
but because of the unavoidable coupling between the memory skeleton and the arena (e.g.,
433
Lemma1), the induction argument breaks, as adding one choice in the arena results in adding
434
many in theproduct arena (as many as there are memory states), where the reasoning needs
435
to take place. New insight and techniques are thus needed to patch this scheme.
436
To solve this issue, we decouple the two aspects. We first establish that, on arenas that
437
inherently share the same good properties as product arenas (i.e., they already “classify”
438
prefixes and cycles as the memory would), we can deploy the induction argument and obtain
439
UML strategies. Then, we obtain UFM strategies on general arenas as a corollary. The crux
440
is identifying such “good” arenas: this is done through the following notions.
441
I Definition 7 (Prefix-covers and cyclic-covers). Let M = (M, minit, αupd) be a memory
442
skeleton andA= (S1, S2, E)be an arena. LetScov⊆S.
443
We say that Mis a prefix-cover ofScov in Aif for alls∈S, there existsms∈M such
444
that, for all ρ∈Hists(A) such thatin(ρ)∈Scov,out(ρ) =s and such that for allρ0 proper
445
prefix ofρ,out(ρ0)6=s, we have αdupd(minit,col(ρ)) =c ms.
446
We say thatMis a cyclic-cover ofScovinAif for allρ∈Hists(A)such thatin(ρ)∈Scov,
447
ifs=out(ρ)and m=αdupd(minit,col(ρ)), for allc ρ0∈Hists(A)such that in(ρ0) =out(ρ0) =s,
448
αdupd(m,col(ρc 0)) =m.
449
Intuitively,M is a prefix-cover for a set of statesScov if the histories starting inScov and
450
visiting a given states∈S for the first time are read up to the same memory state in the
451
memory skeleton. Similarly,Mis a cyclic-cover ofAif the cycles ofAare read as cycles in
452
the memory skeleton, once the memory has been initialized properly.
453
As hinted above, the canonical example of a prefix- and cyclic-covered arena is a product
454
arena (but many more may be in this case; it is beneficial to be general with these concepts).
455
I Lemma 8. Let M = (M, minit, αupd) be a memory skeleton and A = (S1, S2, E) be an
456
arena. ThenM is a prefix- and cyclic-cover forScov=S× {minit} inAnM.
457
4 Characterization
458
Equivalence. We now have the necessary ingredients to state our equivalence result.
459
ITheorem 9(Equivalence). Letvbe a preference relation and letMbe a memory skeleton.
460
Then, both players have UFM strategies based on memory skeletonMin all gamesG= (A,v)
461
if and only if vandv−1 areM-monotone andM-selective.
462
We state this theorem broadly and with a focus on UFM strategies. The actual results
463
we have for each direction of the equivalence — see [6, Section 4 and Section 5] — are a
464
bit stronger, of wider applicability and/or more interesting, but this statement carries the
465
take-home message of our work. It is also meant to mirror the seminal result of Gimbert
466
and Zielonka [27, Theorem 2]: their result can be retrieved from Theorem9by taking the
467
trivial memory skeletonMtriv. As such, our work brings astrict generalization of Gimbert
468
and Zielonka’s results [27] to the finite-memory case.
469
Lifting corollary. As discussed in Section 1, the work of Gimbert and Zielonka contains
470
not one, buttwo great results. Alongside the aforementioned equivalence result, Gimbert
471
and Zielonka provide a corollary of high practical interest [27, Corollary 7]: they essentially
472
obtain as a by-product of their approach that if memoryless strategies suffice in all one-player
473
games ofP1 and all one-player games ofP2, they also suffice in all two-player games.
474
This provides an elegant way to prove that a preference relation (equivalently, an objective)
475
admits memoryless optimal strategieswithout proving monotony and selectivity at all: proving
476
it in the two one-player subcases, which is generally much easier as it boils down to graph
477
reasoning, and then lifting the result to the general two-player case through the corollary.
478
See examples of one-player vs. two-player complexity in [5,4,11].
479
Again, we are able to lift this corollary to the arena-independent finite-memory case.
480