Extension to general fields - The DART-Europe E-theses Portal

−∞ ⩾ xk for all k ∈ [n]). Hence, we have S = ∪K⊂[n]S_∩XK = tconv(∪K⊂[n]S_∩XK).

Therefore, the claim follows from Lemma 5.51. To prove the implication Theorem 5.5 (d) to Theorem 5.5 (e), let S _⊂ _Tⁿ be a projection of a tropical Metzler spectrahedron S ⊂ Tⁿ×Tⁿ^′. By Proposition 4.14, there is a spectrahedron S ⊂ Kⁿ⁺ⁿ_⩾₀ ^′ such that val(S) = S.

Let π:Kⁿ⁺ⁿ^′ → Kⁿ, π:Tⁿ⁺ⁿ^′ → Tⁿ denote the projections on the first n coordinates. Then val(π(S)) = π(val(S)) = π(S) = S. The implication Theorem 5.5 (e) to Theorem 5.5 (a) follows trivially from the fact that projected spectrahedra are semialgebraic (Proposition 2.18) and convex.

5.6 Extension to general fields

In this section we partially extend Theorem 5.5 to general real closed valued fields and prove Theorem 5.2. We want to do so using a model-theoretic argument. However, this requires to control the size of the spectrahedra constructed in the proof of Theorem 5.5. As a consequence, the proof is rather technical. We start by presenting a more eﬀective version of Proposition 5.34.

This variant allows us to give a uniform bounds on the complexity of describing the pieces involved in the piecewise description of the operator F. To formalize this uniformity, we fix rational matrices B⁽¹⁾, . . . , B^(q), B^(s) ∈ Q^m^s^×ⁿ and, for every real vector c = (c⁽¹⁾, . . . , c^(q)), c^(s)∈R^m^s, consider the set S_c_⊂_Rⁿdefined as S_c:=^∪^q_s=1{x∈Rⁿ:B^(s)x⩽c^(s)}.

Lemma 5.52. There exists a number N ⩾ 1 that depends on B⁽¹⁾, . . . , B^(q), but not on c, such that if S_c is a nonempty real tropical cone, then the operator F: Rⁿ → Rⁿ, F_k(x) :=

max{yk: y ∈ S_c, y ⩽ x} defined in the proof of Proposition 5.34 has a piecewise description (W^(s), A^(s), b^(s))_s_∈_[p] such that p ⩽ N, the matrices A^(s) are rational and stochastic, and the common denominator of all the numbersA^(s)_kl is not greater than N.

Proof. Fix csuch thatS_c is a nonempty real tropical cone. We will show how to construct the piecewise description of the operatorF that has the desired properties. Givenx∈Rⁿwe denote by Q(x) ⊂[q] the set of alls ∈[q]such that the polyhedron {y ∈Rⁿ:B^(s)y ⩽ c^(s), y ⩽x} is nonempty. SinceS_cis a nonempty real tropical cone, the the set Q(x) is never empty (take an arbitrary z∈S_cand consider λ+z forλ∈R small enough). Therefore, by the strong duality of linear programming ([Sch87, Corollary 7.1g]) we have

F_k(x) = max

s∈Q(x)max{y_k:B^(s)y ⩽c^(s), y ⩽x} (5.8)

= max

s∈Q(x)min{z^⊺c^(s)+w^⊺x: (B^(s))^⊺z+w=ϵ_k, z⩾0, w⩾0}, (5.9) where ϵ_k denotes the kth vector of the standard basis. The polyhedron W_k^(s) := {(z, w) ∈ R^m^s ×Rⁿ: (B^(s))^⊺z+w = ϵ_k, z ⩾ 0, w ⩾ 0} is pointed, and hence the minimum in (5.9) is attained in some vertex of this polyhedron and these vertices are rational ([Sch87, Chapter 8]).

LetV_k^(s) denote the set of vertices of W_k^(s). Therefore, we have Fk(x) = max

s∈Q(x) min

(z,w)∈V_k^(s)

{z^⊺c^(s)+w^⊺x}. (5.10)

Moreover, by Farkas’ lemma (Theorem 4.36), the polyhedron {y ∈ Rⁿ:B^(s)y ⩽ c^(s), y ⩽ x} is nonempty if and only if for all (z, w) such that (B^(s))^⊺z+w = 0, z ⩾ 0, w ⩾ 0, we have

104 Chapter 5. Tropical analogue of the Helton–Nie conjecture z^⊺c^(s)+w^⊺x⩾0. As previously, ifU^(s) consists of precisely one rational representative of every extreme ray of the polyhedral cone {(z, w) ∈R^m^s ×Rⁿ: (B^(s))^⊺z+w= 0, z ⩾0, w ⩾0}, then {y∈Rⁿ:B^(s)y⩽c^(s), y⩽x}is nonempty if and only ifz^⊺c^(s)+w^⊺x⩾0for all(z, w)∈U^(s). Let U denote the disjoint union ofU⁽¹⁾, . . . , U^(q). For every possible choice♢∈ {<, >,=}^Uwe define W˜^♢:={x∈Rⁿ:∀s∈[q],∀(z, w)∈U^(s), z^⊺c^(s)+w^⊺x♢(s,z,w)0}. Note that Q(x) is constant on eachW˜^♢. By fixing the terms achieving the maximum and minimum in (5.10) we subdivide the setsW˜^♢ into smaller sets,W˜^♢=^∪^N_j=1^♢ W˜_j^♢ such that every W˜_j^♢ is an intersection of half-spaces (both open and closed) andF is aﬃne onW˜_j^♢. SinceF is continuous (by Lemma 5.31), we can then restrict ourselves to these sets W˜_j^♢ that are full dimensional, and this gives the piecewise description ofF. To see thatF has the expected property, note that the definition ofU^(s)does not depend on c. Therefore, the number of sets W˜^♢ does not depend on c. Furthermore, the definition ofV_k^(s)also does not depend onc. Hence, the numbersN_♢are bounded by a quantity that does not depend on c. Moreover, by (5.10), the matrices A^(s) are of the form A^(s)_k =w^⊺. Hence, the common denominator of the numbers A^(s)_kl is bounded by a quantity that does not depend on c. The fact thatA^(s) must be stochastic follows from Lemma 5.32.

Theorem 5.2 follows from Lemma 5.52 by a careful examination of the proofs presented in Section 5.4. We split the proof into a series of lemmas. The first lemma is a uniform version of Proposition 5.44.

Lemma 5.53. There exists a number N ⩾1 that depends onB⁽¹⁾, . . . , B^(q), but not onc, such that ifS_c is a real tropical cone, then it is a projection of a real tropical Metzler spectrahedron S(Q⁽⁰⁾|Q⁽¹⁾, . . . , Q⁽ⁿ^′⁾), Q^(k)∈T^m_±^×^m, satisfying n^′⩽N andm⩽N.

Proof. Suppose that S_c is a real tropical cone. We will examine all the steps involved in the proof of Proposition 5.44. By Lemma 5.52 and the proof of Proposition 5.34, there existsN₁⩾1 independent ofcsuch that we haveS_c={x∈Rⁿ:x⩽F(x)}, where the operatorF:Rⁿ→Rⁿ is semilinear, monotone, homogeneous, and has a piecewise description (W^(s), A^(s), b^(s))_s_∈_[p]

satisfying p ⩽ N1. Moreover, the matrices A^(s) are stochastic, rational, and the common denominator of all the numbers A^(s)_kl is not greater than N₁. By Theorem 5.27, the operator F can be written as ∀k, F_k(x) = min_i∈[M_k_]max_s_∈_S_ki(A^(s)_k x+b^(s)_k ), where 1⩽M_k ⩽2^N¹ for all k⩾1and every set S_ki is a subset of[p].

Given the operator F, the proof of Lemma 5.38 constructs a directed graph G⃗ satisfying Assumption C. By the construction presented in the proof of Lemma 5.38, the vertices, edges, and probability distributions of G⃗ depend on (n,(M_k)_k,(S_ki)_ki,(A^(s))s), but not on (b^(s))_s_∈_[p]. The vector(b^(s))_s_∈_[p]is used only to define the weightsreassociated with the edges of the graph.

Subsequently, the proof of Proposition 5.44 proceeds by applying Lemmas 5.41 to 5.43 to the graph G⃗ and transforming it into a graphG⃗^′. Let n^′′ denote the number of Min vertices of G⃗^′ and m^′ denote the number of Max vertices of G⃗^′. By the constructions given in Lemmas 5.41 to 5.43, the numbers n^′′, m^′ do not depend on the weights re associated with the edges of G⃗. Hence, they do not depend on(b^(s))_s_∈_[p]. GivenG⃗^′, we apply Proposition 5.40 to construct some symmetric tropical matricesQ˜⁽¹⁾, . . . ,Q˜⁽ⁿ^′′⁾ such thatQ˜^(k) ∈T^m_±^′^×^m^′. As a final step, the proof of Proposition 5.44 transforms these matrices into matrices Q⁽⁰⁾, Q(1), . . . , Q⁽ⁿ^′⁾ ∈ T^m_±^×^m by adding some additional variables and2×2blocks to the matricesQ˜⁽¹⁾, . . . ,Q˜⁽ⁿ^′′⁾. However, the number of these additional variables and blocks depends only onn and n^′′. As a consequence, the numbersn^′, mdepend on (n,(M_k)_k,(S_ki)_ki,(A^(s))_s), but not on (b^(s))_s_∈_[p].

5.6. Extension to general fields 105 Since the number of possible tuples((Mk)k,(Ski)ki,(A^(s))s)is finite and bounded by a quan-tity that depends only on n, N₁ (not on c), the numbers n^′, mare also bounded by a quantity that does not depend onc.

The claim of the previous lemma can be extended to tropically convex sets using the proof of Lemma 5.49.

Lemma 5.54. There exists a number N ⩾1 that depends onB⁽¹⁾, . . . , B^(q), but not onc, such that if S_c is tropically convex, then it is a projection of a real tropical Metzler spectrahedron S(Q⁽⁰⁾|Q(1), . . . , Q⁽ⁿ^′⁾), Q^(k)∈T^m_±^×^m, satisfying n^′⩽N and m⩽N.

Proof. Let e = (1,1, . . . ,1). For every s ∈ [q] we denote B˜^(s) = [−B^(s)e|B^(s)] ∈ Q^m^s^×⁽ⁿ⁺¹⁾. Then, the real homogenization ofS_c is given by

S_c^rh=

∪q s=1

{(x0, x)∈Rⁿ⁺¹: ˜B^(s)(x0, x)^⊺⩽c^(s)}. (5.11) Indeed, if we take any x0 ∈ R and x ∈ S_c such that B^(s)x ⩽ c^(s), then B˜^(s)(x0, x0 +x)^⊺ =

−x₀B^(s)e+B^(s)(x₀+x) =B^(s)x⩽c^(s). Conversely, if(x₀, x)is such thatB˜^(s)(x₀, x)^⊺⩽c^(s), and we denotey=−x0+x, thenB^(s)y=−x0B^(s)e+B^(s)x= ˜B^(s)(x0, x)^⊺⩽c^(s) andy∈S_c. Given S_c, the proof of Lemma 5.49 considers its real homogenizationS_c^rh, describes it as a projection of a real tropical Metzler spectrahedron S( ˜Q⁽⁰⁾|Q˜⁽¹⁾, . . . ,Q˜⁽ⁿ^′⁾), and adds a2×2 block to the matrices Q˜⁽⁰⁾,Q˜⁽¹⁾, . . . ,Q˜⁽ⁿ^′⁾. Therefore, the claim follows from Lemma 5.53 and (5.11).

The next lemma is a more precise version of Lemma 5.51.

Lemma 5.55. Suppose that S₁ _⊂_Tⁿ is a projection of a tropical Metzler spectrahedron S1 = S( ˜Q⁽⁰⁾|Q˜⁽¹⁾, . . . ,Q˜⁽ⁿ¹⁾), Q˜^(k) ∈ T^m_±¹^×^m¹ and thatS₂ _⊂_Tⁿ is a projection of a tropical Metzler spectrahedron S2 = S( ˆQ⁽⁰⁾|Qˆ⁽¹⁾, . . . ,Qˆ⁽ⁿ²⁾), Qˆ^(k) ∈ T^m_±²^×^m². Then, there exists a number N that depends only on (n, n1, m1, n2, m2) such that tconv(S₁ _∪S₂) is a projection of a tropical Metzler spectrahedron S(Q⁽⁰⁾|Q⁽¹⁾, . . . , Q⁽ⁿ^′⁾), Q^(k)∈T^m_±^′^×^m^′ satisfying n^′, m^′⩽N.

Proof. The proof follows from an examination of the construction given in Lemma 5.51. More precisely, the construction is as follows. The first step is to consider the homogenizations S₁^h,S₂^h and describe them as projections of tropical Metzler spectrahedra S˜1,S˜2. The matrices that describeS˜1,S˜2 are constructed using Lemma 5.50, by taking the formal homogenizations S₁^{f h},S₂^{f h} and adding variables and 2×2 blocks to the matrices that describe these formal homogenizations. The number of these variables and blocks depends only on(n, n1, n2). Then, the construction presented in Lemma 5.51 considers the setS˜1×S˜2, which is a tropical Metzler spectrahedron described by block diagonal matrices obtained by stacking the matrices that describe S˜1,S˜2. Afterwards, the construction adds some 2 ×2 blocks to the matrices that describeS˜1×S˜2and the number of these blocks depends only on(n, n1, n2). This gives a tropical Metzler spectrahedron S˜3 such that the homogenization of tconv(S₁ _∪S₂) is its projection.

Finally, the other part of Lemma 5.50 is used: we add a 2×2 block to the matrices that describeS˜3 and find the matrices Q⁽⁰⁾, Q⁽¹⁾, . . . , Q⁽ⁿ^′⁾. Hence, the numbersn^′, m^′ depend only on (n, n₁, m₁, n₂, m₂).

We are now ready to give a stratified version of Lemma 5.54 and prove Theorem 5.2. Suppose that(f_K,i)_{K⊂[n],i∈[q}_K_]is a collection of homogeneous polynomials with integer coeﬃcients,f_K,i∈ Z[X_K], where X_K = (x_k₁, . . . , x_k_ℓ) for K={k₁, . . . , k_ℓ}, and we use the convention thatf_∅_,i is

106 Chapter 5. Tropical analogue of the Helton–Nie conjecture the zero polynomial. Given a real vector c= (c^(K,i))K,i, c^(K,i) ∈R we denote by S_c _⊂_Tⁿ the set

∪

K∈[n]

∪

i∈[qK]

{x∈Tⁿ:∀k /∈K, x_k=−∞ ∧ ∀k∈K, x_k̸=−∞ ∧ fK,i(xK)⩽c^(K,i)}. (5.12)

Lemma 5.56. There exists a number N ⩾1 that depends on (fK,i)_K_⊂_[n],i_∈_[q_K_], but not on c, such that if S_c is tropically convex, then it is a projection of a tropical Metzler spectrahedron S(Q⁽⁰⁾|Q(1), . . . , Q⁽ⁿ^′⁾), Q^(k)∈T^m_±^×^m, satisfying n^′⩽N and m⩽N.

Proof. The claim follows from the previous lemmas by analyzing the proof of Theorem 5.5.

DenoteS_c=Sand take anyK ⊂[n]. IfK is nonempty, then the stratum ofSassociated with K, denoted S_K, is tropically convex and given by

S_K = ^∪

i∈[qK]

{x∈T^K:fK,i(x)⩽c^(K,i)}.

Therefore, by Lemma 5.54, there existsNK(independent onc) and such thatS_K is a projection of a real tropical Metzler spectrahedron SK ⊂ Rⁿ^K, n_K ⩽ N_K, described by matrices of size m_K ⩽N_K. Moreover, as noted in the proof of Theorem 5.5, the set

S˜_K = ^∪

i∈[q_K]

{x∈Tⁿ:∀k /∈K, x_k=−∞ ∧ ∀k∈K, x_k̸=−∞ ∧ f_K,i(x_K)⩽c^(K,i)}

is a projection of a tropical Metzler spectrahedron S˜K. This spectrahedron is constructed by adding variables and 1×1 blocks to the matrices that describe SK, and the number of these blocks and variables does not depend on c. Similarly, the setS_∅ is either empty or reduced to

−∞ and thus it is a tropical Metzler spectrahedron. We conclude by applying Lemma 5.55 to tconv(^∪S˜_K).

Proof of Theorem 5.2. Suppose that S ⊂Kⁿ is convex and semialgebraic. Fix a Lor-formula ψ and a vector a ∈ Kⁿ such that S = {x ∈ Kⁿ:K _|= ϕ(x, a)}. Then, by Theorem 3.1, the set val(S) has closed semilinear strata. In particular, by Lemma 3.12, there exists a collection (f_K,i)_K_⊂_[n],i_∈_[q_K_] and a vector c = (c^(K,i))_K,i, c^(K,i) ∈ Γ such that val(S) is of the form given in (5.12),val(S) =S_c(we note that this form is also correct forK=∅becauseΓ is nontrivial).

For every b ∈ K^m we denote S_b := {x ∈ Kⁿ:K _|= ϕ(x, b)}. If we fix N ⩾ 1, then the statement “for all b, c such that S_b is convex and val(S_b) =S_c there exist symmetric matrices Q⁽⁰⁾, . . . , Q⁽ⁿ^′⁾, Q^(k) ∈ K^m^′^×^m^′ such that n^′, m^′ ⩽ N, n^′ ⩾ n, and S_b has the same image by valuation as the projection of the spectrahedron associated withQ⁽⁰⁾, . . . , Q⁽ⁿ^′⁾” is a sentence in Lrcvf. This sentence is true ifK =Kand N is large enough. Indeed, in this case we can apply Lemma 5.56 to S_c to describe it as a projection of a tropical Metzler spectrahedron and lift this spectrahedron into K by Proposition 4.14. This spectrahedron fulfills the claim (as noted in the proof of Theorem 5.5). Therefore, if we fix N such that the sentence above in true over K, then the completeness result of Theorem 2.120 shows that this sentence is true inK.

Part II

Relation to stochastic mean payoﬀ

games

CHAPTER 6 Introduction to stochastic mean payoﬀ games

In this chapter we introduce the notion of stochastic mean payoﬀ games and we show that these games can be analyzed using monotone homogeneous operators (called Shapley operators).

Our presentation follows [AGG12]. The reference [AGG12] considered only deterministic mean payoff games, but numerous proofs of [AGG12] extend immediately to the case of stochastic mean payoff games. However, we add our two contributions. First, in Theorem 6.16 we show that the supports of tropical cones defined as sublevels sets of Shapley operators correspond to winning dominions of the underlying game (which is a nontrivial extension of the analogous result of [AGG12]). Second, in Theorem 6.30 we combine the analysis of this chapter with the previous one, showing the structural equivalence between tropicalizations of closed convex semialgebraic cones and stochastic mean payoff games. The chapter is organized as follows. In Section 6.1 we define the notion of a stochastic mean payoff game and the associated Shapley operator. We also show how Kohlberg’s theorem [Koh80] can be used to prove that these games have optimal policies. In Section 6.2 we study the dominions and show a simple version of the Collatz–Wielandt property of [Nus86, AGG12]. In Section 6.3 we specify the analysis to the case of bipartite games and give the relationship with the results of the previous chapter. A simplified version of the results presented here is included in the paper [AGS18].

110 Chapter 6. Introduction to stochastic mean payoﬀ games

6.1 Optimal policies and Shapley operators

A stochastic mean payoﬀ game involves two players, Max and Min, who control disjoint sets of states. The states owned by Player Max and Player Min are respectively denoted asV_Max:= [m]

and V_Min := [n]. The space of states of the games is a disjoint union of V_Max and V_Min, V :=VMax⊎VMin. We will use the symbolsi, j to refer to states of Player Max, andk, lto states of Player Min. Every statek∈[n]of Player Min is equipped with a finite and nonempty setA^(k) (called the set ofactions). Similarly, every statei∈[m]of Player Max is equipped with a finite and nonempty setB⁽ⁱ⁾. Every actiona∈^⊎_kA^(k) is equipped with a probability distributionpâ_k overV,pâ_k= (pâ_kv)v∈V ∈[0,1]^V,^∑_v_∈_V pâ_kv = 1and a real numberra∈R. Likewise, every action b ∈ ^⊎iB⁽ⁱ⁾ is equipped with a probability distribution p^b_i = (p^b_iv)_v_∈_V ∈ [0,1]^V, ^∑_v_∈_V p^b_iv = 1 over V and a real number rb ∈ R. Both players alternatively move a pawn over these states by the rules described as follows. When the pawn is on a statek∈[n], Player Min chooses an action a ∈ A^(k). Then, Player Min moves the pawn to state v ∈ V with probability pâ_kv and pays râ to Player Max. Similarly, once the pawn is on a state i ∈ [m], Player Max picks an actionb∈B⁽ⁱ⁾. Then, Player Max moves the pawn to the state v∈V with probabilityp^b_iv and Player Min pays him r^b.

A policy for Player Min is a function σ: [n]→ ^⊎_kA^(k) mapping every state k ∈ [n] to an action σ(k) in A^(k). Analogously, apolicy for Player Max is a function τ: [m]→ ^⊎iB⁽ⁱ⁾ such that τ(i) ∈ B⁽ⁱ⁾ for all i ∈[m]. Suppose that the game starts from a state v^∗. When players play according to a couple(σ, τ)of policies, the movement of the pawn is described by a Markov chain on the spaceV. The average payoff of Player Max in the long-term is then defined as the average payoff of the controller in this Markov chain, see Definition 2.136. In other words, the payoff of Player Max is given by fori∈[m]. This payoff is well defined by Theorem 2.137. The goal of Player Max is to find a policy that maximizes his average payoff, while Player Min aims at minimizing this quantity.

The main theorem of stochastic mean payoﬀ games ensures that this game has a well-defined optimal policies. This was shown by Liggett and Lippman [LL69].

Theorem 6.1 ([LL69]). There exists a pair of policies(σ, τ) and a unique vector χ∈R^V such that for all initial states v∈V, the following two conditions are satisfied:

• for each policyσ of Player Min, χv ⩽gv(σ, τ);

• for each policyτ of Player Max, χv ⩾gv(σ, τ).

The vectorχ called the value of the game, while the policies(σ, τ) are called optimal. We note that we have χv=gv(σ, τ) for all v∈V.

The first condition in Theorem 6.1 states that, by playing according to the policy τ, Player Max is certain to get an average payoﬀ greater than or equal to the value χv associated with the initial state. Symmetrically, Player Min is ensured to limit her average loss to the quantity χ_v by following the policyσ.

Example 6.2. Consider the game is depicted in Fig. 6.1. The states of Player Min are depicted by circles. The states of Player Max are depicted by squares. Moreover, all probabilities in

6.1. Optimal policies and Shapley operators 111

Figure 6.1: Stochastic mean payoﬀ game considered in Example 6.2.

this example belong to{0,1/2,1}. More precisely, edges represent the actions of players. Edges without full dots represent deterministic actions and full dots indicate actions where a coin toss is made. The corresponding payments received by Player Max are indicated on the edges.

Observe that both players in this game have only two policies: at state 2 Player Max can choose the move that goes to 1 or the one that goes to 3 , whereas at state 3 Player Min can choose the move ^{1, 3^} or the move ^{2, 3^}. Both players in this example game have only two policies: at state 2 Player Max can choose the action that goes to 1 or the action that goes to 3 , whereas at state 3 Player Min can choose the action that goes to ^{1,3^} or the action that goes to ^{2, 3^}. Suppose that Player Max chooses the action that goes to 1 and that Player Min chooses the action that goes to ^{1, 3^}. The Markov chain obtained in this way has the transition matrix of form

where Y describes the probabilities of transition from circle states to square states, and Z describes the probabilities of transition from square states to circle states, i.e.,

Y =

This chain has only one recurrent class and all states belong to this class. Moreover, it is easy to verify that

π= 1

10(2,1,2,2,2,1)∈R^V^Min×R^V^Max

is the stationary distribution of this chain. Therefore, by Theorem 2.137, the payoﬀ of Player Max is equal to

for every initial state. Similarly, if Player Max chooses the action that goes to 1 and Player Min chooses the action that goes to^{2, 3^}, then the transition matrix of the induced Markov chain has the formP^′ =^[_W⁰ ^Y₀^′^], where

112 Chapter 6. Introduction to stochastic mean payoﬀ games and Z is the same as previously. In this case the chain also has only one recurrent class and every state belongs to this class. Furthermore,

π^′= 1

14(4,1,2,2,4,1)∈R^V^Min×R^V^Max

is the stationary distribution of this chain. Hence the payoﬀ of Player Max is equal to g^({2, 3^},{1 }⁾= 1

14(2−4 +9 4) = 1

for every initial state. In both cases, the payoﬀ of Player Max is positive. Therefore, if Player Max chooses the action that goes to 1 , then he is guaranteed to obtain a nonnegative payoﬀ.

One can check, by doing the same calculations for the remaining policies, that^({2, 3^},{ 1}⁾ is the unique couple of optimal policies in this game and the value of the game is equal to χ= ₅₆¹(1,1,1,1,1,1).

In this section we prove Theorem 6.1 using the associated Shapley operator. This proof was given in [AGG12] in the case of deterministic mean payoﬀ games (i.e., when all probability distributions are reduced to vectors in{0,1}^V). However, the proof in the stochastic case is the same as the one given in [AGG12], and we reproduce it here for the sake of completeness.

Definition 6.3. Given a stochastic mean payoﬀ game, we define itsShapley operatorT:T^V → T^V as

∀v∈V,(T(x))_v :=

{min_a_∈_A(v)(ra+^∑_w_∈_V p^a_vwxw) ifv∈V_Min

max_b_∈_B(v)(r_b+^∑_w_∈_V p^b_vwx_w) ifv∈V_Max, (6.2) where we use the convention that−∞ ·0 = 0 and−∞ ·x=−∞for all x >0.

The basic properties of Shapley operators are summarized in the next lemma.

Lemma 6.4. The Shapley operatorT is monotone and homogeneous. Furthermore, it preserves R^V (i.e., if x∈R^V, then T(x)∈R^V) and its restrictionT_|RV:R^V →R^V is piecewise aﬃne.

Proof. The facts thatT is monotone, homogeneous, and preservesR^V follow immediately from the definition. Its restriction to R^V is piecewise aﬃne because it is defined as a coordinatewise minimum or maximum of aﬃne functions.

Our proof of Theorem 6.1 is based on the notion of invariant half-line and the following result of Kohlberg [Koh80].

Definition 6.5. Given a function f: Rⁿ → Rⁿ, we say that a pair (u, η) ∈ Rⁿ ×Rⁿ is an invariant half-lineoff if there existsγ0⩾0such that the equalityf(u+γη) =u+(γ+1)ηholds for allγ ⩾γ₀. Given a functionf:Tⁿ→Tⁿ that preservesRⁿ, we say that(u, η)∈Rⁿ×Rⁿ is an invariant half-line off if it is an invariant half-line of f_|Rⁿ:Rⁿ→Rⁿ.

Theorem 6.6 ([Koh80]). Suppose that function f:Rⁿ→Rⁿ is piecewise aﬃne and nonexpan-sive in any norm. Then, it admits and invariant half-line. Furthermore, if (u₁, η₁) and(u₂, η₂) are invariant half-lines, then η1 =η2.

Later on, we will need a more precise formulation of Kohlberg’s theorem. Therefore, we present its proof in Appendix B.3.

6.1. Optimal policies and Shapley operators 113 Corollary 6.7. Suppose that f: Rⁿ → Rⁿ is a monotone, homogeneous, and piecewise aﬃne function. Then, it admits and invariant half-line(u, η, γ₀). Furthermore, ifx∈Rⁿis any point, then we have the equality

Nlim→∞

Nf^N(x) =η ,

where f^N =f◦f ◦ · · · ◦f (N times). In particular, this limit exists.

Proof. The function f is nonexpansive in the supremum norm by Lemma 5.31. As a conse-quence, it admits an invariant half-line by Theorem 6.6. In particular, for every x ∈ Rⁿ we have

∥f^N(x)−N η−u−γ0η∥∞=∥f^N(x)−f^N(u+γ0η)∥∞⩽∥x−u−γ0η∥∞

and limN→∞ 1

Nf^N(x) =η.

In order to prove Theorem 6.1, we introduce the following notation. Ifσ is a policy of Player Min, then we denote byT^σ the Shapley operator of a game in which Player Min usesσ (i.e., a game in which every statek∈[n]controlled by Player Min is equipped with exactly one action and this action isσ(k)). Analogously, ifτ is a policy of Player Max, then we denote byT^τ the Shapley operator of a game in which Player Max uses τ. Furthermore, we denote by T^σ,τ the Shapley operator of a game in which Player Min usesσ and Player Max usesτ. In an explicit way, these operators are given by The properties of these operators are given in the next lemmas.

Lemma 6.8. If we fix x∈R^V, then the set {T^σ(x) :σ is a policy of Player Min} ⊂R^V has a well defined minimum in the partial order given by y⩽z ⇐⇒ ∀v∈V, yv ⩽zv. Moreover, this minimum is equal to T(x). Analogously, the set {T^τ(x) : τ is a policy of Player Max} ⊂ R^V has a well defined maximum in this partial order and this maximum is equal to T(x).

Proof. By definition we have (T^σ(x))v ⩾(T(x))v for all v ∈ V. Furthermore, for every state k ∈ [n] we can take an action a^∗ ∈ A^(k) that satisfies ra^∗ +^∑_w_∈_V p^a_vw^∗xw = min_a∈A(k)(ra+

∑

w∈V p^a_vwx_w)and defineσ^∗(k) :=a^∗. The policyσ^∗ satisfies(T^σ^∗(x))_v = (T(x))_v for allv∈V. The other case is analogous.

Lemma 6.9. If Player Min uses a policy σ and Player Max uses a policyτ, then the expected total payoﬀ of Player Max after the Nth move of the pawn is equal to (T^σ,τ)^N(0).

Proof. We prove the claim by induction. Let g^N ∈ R^V denote the expected payoﬀ of Player Max after the Nth move of the pawn. We want to show that g^N = (T^σ,τ)^N(0) for all N ⩾1.

Dans le document The DART-Europe E-theses Portal (Page 105-116)