
Existence of the limit value of two person zero-sum discounted repeated games via comparison theorems


HAL Id: hal-00661844

https://hal.archives-ouvertes.fr/hal-00661844

Submitted on 20 Jan 2012



Sylvain Sorin, Guillaume Vigeral

To cite this version:

Sylvain Sorin, Guillaume Vigeral. Existence of the limit value of two person zero-sum discounted repeated games via comparison theorems. Journal of Optimization Theory and Applications, Springer Verlag, 2013, 157 (2), pp.564-576. 10.1007/s10957-012-0193-4. hal-00661844


Abstract. We give new proofs of existence of the limit of the discounted values for two person zero-sum games in the following frameworks: incomplete information, absorbing, recursive. The idea of these new proofs is to use some comparison criteria.

AMS Classification: 91A15, 91A20, 49J40, 47J20

Keywords: stochastic games, repeated games, incomplete information, asymptotic value, comparison principle, variational inequalities

1. Introduction

The purpose of this article is to present a unified approach to the existence of the limit value for two person zero-sum discounted games. The main tools used in the proofs are

- the fact that the discounted value $v_\lambda$ satisfies the Shapley equation [1],

- properties of accumulation points of the discounted values, and of the corresponding optimal strategies,

- comparison of two accumulation points leading to uniqueness and characterization.

We apply this program for three well known classes of games, each time covering the case where action spaces are compact.

The first class corresponds to games with incomplete information, where the results were initially obtained in Aumann and Maschler [2] and Mertens and Zamir [3] (including also the asymptotic study of the finitely repeated games). In that case we follow a quite similar approach to Laraki [4]. For absorbing games the results are due to Kohlberg [5] for finitely many actions, later extended in Rosenberg and Sorin [6] for the compact case. An explicit formula for the limit was recently obtained in Laraki [7] and we obtain a related one. The case of recursive games was first handled in Everett [8], with a different notion of limit value involving asymptotic payoff on plays. It was later shown in Sorin [9] that these results also imply the existence of the limit value for two person zero-sum discounted games.

Date: June 2011.

The research of the first author was supported by grant ANR-08-BLAN-0294-01 (France).

The research of the second author was supported by grant ANR-10-BLAN 0112 (France).


2. Model, notations and basic Lemmas

Let $G$ be a two person zero-sum stochastic game defined by a finite state space $\Omega$, compact metric action spaces $I$ and $J$ for players 1 and 2 (with mixed extensions $X = \Delta(I)$, the set of Borel probabilities on $I$, and $Y = \Delta(J)$ respectively), a separately continuous real bounded payoff $g$ on $I \times J \times \Omega$ and a separately continuous transition $\rho$ from $I \times J \times \Omega$ to $\Delta(\Omega)$.

The game is played in discrete time. At stage $t$, given the state $\omega_t$, the players choose moves $i_t \in I$, $j_t \in J$, the stage payoff is $g_t = g(i_t, j_t, \omega_t)$ and the new state $\omega_{t+1}$ is selected according to $\rho(i_t, j_t, \omega_t)$ and is announced to the players. Given $\lambda \in ]0,1]$ the total evaluation in the $\lambda$-discounted game is $\sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1} g_t$. The Shapley operator $\Phi(\lambda, f)$ [1] is then defined, for $\lambda \in [0,1]$ and $f$ in some closed subset $F$ of the set of bounded functions from $\Omega$ to $\mathbb{R}$, by the formula

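\[
\begin{aligned}
\Phi(\lambda, f)(\omega) &= \min_{Y} \max_{X} \left\{ \lambda g(x, y, \omega) + (1-\lambda) E_{\rho(x,y,\omega)} f(\cdot) \right\} \\
&= \max_{X} \min_{Y} \left\{ \lambda g(x, y, \omega) + (1-\lambda) E_{\rho(x,y,\omega)} f(\cdot) \right\}
\end{aligned}
\]

where $g$ and $\rho$ are bilinearly extended to $X \times Y$. For $\lambda > 0$, the only fixed point of $\Phi(\lambda, \cdot)$ is the value $v_\lambda$ of the discounted game.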
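As an illustration of the fixed-point property, the following minimal sketch iterates $\Phi(\lambda, \cdot)$ on a small game. It is not taken from the paper: it assumes two pure actions per player in every state (so that each auxiliary game is a 2x2 matrix game with a closed-form value), and the test game, the helper names `value_2x2`, `shapley` and `discounted_value`, and the encoding of the Big Match as a three-state game are all choices made here for illustration.

```python
def value_2x2(a):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    maxmin = max(min(row) for row in a)
    minmax = min(max(a[0][j], a[1][j]) for j in range(2))
    if minmax - maxmin < 1e-12:
        return maxmin  # pure saddle point
    # No pure saddle point: classical mixed-strategy formula.
    den = a[0][0] + a[1][1] - a[0][1] - a[1][0]
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / den


def shapley(lam, f, g, rho):
    """One application of Phi(lam, .) to f, state by state."""
    new_f = []
    for w in range(len(f)):
        aux = [[lam * g[w][i][j]
                + (1 - lam) * sum(pr * fv for pr, fv in zip(rho[w][i][j], f))
                for j in range(2)] for i in range(2)]
        new_f.append(value_2x2(aux))
    return new_f


def discounted_value(lam, g, rho, iters=3000):
    """Approximate v_lam by iterating the (1 - lam)-contraction Phi(lam, .)."""
    f = [0.0] * len(g)
    for _ in range(iters):
        f = shapley(lam, f, g, rho)
    return f


# Big Match encoded as a 3-state game (an assumption of this sketch):
# state 0 is the live state, states 1 and 2 absorb with payoffs 1 and 0.
stay, to1, to2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
g = [[[1, 0], [0, 1]], [[1, 1], [1, 1]], [[0, 0], [0, 0]]]
rho = [[[to1, to2], [stay, stay]],
       [[to1, to1], [to1, to1]],
       [[to2, to2], [to2, to2]]]

for lam in (0.5, 0.1, 0.01):
    print(lam, discounted_value(lam, g, rho)[0])
```

For this particular game the printed values should all be close to 1/2, which is the known discounted value of the Big Match for every $\lambda$. With compact action sets one would replace `value_2x2` by a general minmax solver.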

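The sets of optimal actions of each player in the above formula are denoted by $X_\lambda(f)(\omega)$ and $Y_\lambda(f)(\omega)$.

Let $\mathbf{X} = X^{\Omega}$ and similarly $\mathbf{Y} = Y^{\Omega}$. For simplicity, for any $(\mathbf{x}, \mathbf{y}) \in \mathbf{X} \times \mathbf{Y}$ we denote $\rho(\mathbf{x}, \mathbf{y}, \omega) = \rho(\mathbf{x}(\omega), \mathbf{y}(\omega), \omega)$. Also define $\mathbf{X}_\lambda(f) = \prod_{\omega} X_\lambda(f)(\omega)$ and $\mathbf{Y}_\lambda(f) = \prod_{\omega} Y_\lambda(f)(\omega)$.

$S$ denotes the set of fixed points of the projective operator $\Phi(0, \cdot)$, and $S_0$ is the set of accumulation points of the family $v_\lambda$ as $\lambda$ goes to 0.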

The following lemmas are easy to establish in this finite state framework:

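Lemma 1. $S_0 \subset S$.

Lemma 2. Assume that $v_{\lambda_n}$ converges to $v \in S_0$ and that a sequence of optimal actions $\mathbf{x}_{\lambda_n} \in \mathbf{X}_{\lambda_n}(v_{\lambda_n})$ converges to $\mathbf{x}$. Then $\mathbf{x} \in \mathbf{X}_0(v)$.

Lemma 3. Let $v$ and $v'$ be in $S$ and $\Omega_1 = \mathrm{Argmax}(v - v')$. For any $\mathbf{x} \in \mathbf{X}_0(v)$, $\mathbf{y} \in \mathbf{Y}_0(v')$, and $\omega \in \Omega_1$, the probability $\rho(\mathbf{x}, \mathbf{y}, \omega)$ is supported by $\Omega_1$.

Proof. Since $v \in S$ and $\mathbf{x} \in \mathbf{X}_0(v)$:

\[ v(\omega) = \Phi(0, v)(\omega) \le E_{\rho(\mathbf{x},\mathbf{y},\omega)} v(\cdot). \]

Using a dual inequality as well:

\[ v(\omega) - v'(\omega) \le E_{\rho(\mathbf{x},\mathbf{y},\omega)} (v - v')(\cdot) \]

and the result follows.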
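The last step of this proof can be spelled out as follows. This is only a sketch of the argument; the prime on the second function of Lemma 3 and the symbol $\mathbf{y}$ for an optimal action of player 2 in $\Phi(0, v')$ are notations assumed here.

```latex
% Dual inequality: v' is a fixed point of \Phi(0, .) and y is optimal for player 2.
v'(\omega) = \Phi(0, v')(\omega) \ge E_{\rho(\mathbf{x},\mathbf{y},\omega)} v'(\cdot)

% Subtracting it from v(\omega) \le E_{\rho(x,y,\omega)} v(.):
(v - v')(\omega) \le E_{\rho(\mathbf{x},\mathbf{y},\omega)} (v - v')(\cdot) \le \max_{\Omega} (v - v')

% For \omega \in \Omega_1 the left-hand side already equals the maximum, so both
% inequalities are equalities, and \rho(x, y, \omega) puts no mass outside
% \Omega_1 = Argmax(v - v').
```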

3. Games with incomplete information

We consider here two person zero-sum games with incomplete information (independent case and standard signalling). $\pi$ is a product probability $p \otimes q$ on a finite product space $K \times L$, with $p \in P = \Delta(K)$, $q \in Q = \Delta(L)$. $g$ is a payoff function from $I \times J \times K \times L$ to $\mathbb{R}$ where $I$ and $J$ are finite action sets.

Given the parameter $(k, \ell)$ selected according to $\pi$, each player knows his own component ($k$ for player 1, $\ell$ for player 2) and holds a prior on the other player's component. From stage 1 on, the parameter is fixed and the repeated game with payoff $g(\cdot, \cdot, k, \ell)$ is played. The moves of the players at stage $t$ are $\{i_t, j_t\}$, the payoff is $g_t = g(i_t, j_t, k, \ell)$ and the information of the players after stage $t$ is $\{i_t, j_t\}$. $X = \Delta(I)^K$ and $Y = \Delta(J)^L$ are the type-dependent mixed action sets of the players; $g$ is extended on $X \times Y \times P \times Q$ by $g(x, y, p, q) = \sum_{k,\ell} p^k q^\ell g(x^k, y^\ell, k, \ell)$.

Given $(x, y, p, q)$, let $x(i) = \sum_k x^k_i p^k$ be the probability of action $i$ and $p(i)$ be the conditional probability on $K$ given the action $i$, explicitly $p^k(i) = \frac{p^k x^k_i}{x(i)}$ (and similarly for $y$ and $q$).
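The computation of $x(i)$ and of the posterior $p(i)$ can be sketched as follows for player 1. The function names `posteriors` and `expected`, the two-type prior and the two strategies are hypothetical and only serve as an example; exact rational arithmetic is used so that the martingale property (the posteriors average back to the prior) can be checked exactly.

```python
from fractions import Fraction as Fr


def posteriors(p, x):
    """Return (marg, post): marg[i] = x(i), post[i][k] = p^k(i).

    p[k] is the prior probability of type k and x[k][i] the probability
    that type k plays action i. post[i] is None when x(i) = 0.
    """
    types, actions = range(len(p)), range(len(x[0]))
    marg = [sum(p[k] * x[k][i] for k in types) for i in actions]
    post = [[p[k] * x[k][i] / marg[i] for k in types] if marg[i] > 0 else None
            for i in actions]
    return marg, post


def expected(f, marg, post):
    """Average of f over the posteriors, weighted by the action probabilities."""
    return sum(m * f(q) for m, q in zip(marg, post) if m > 0)


p = [Fr(1, 2), Fr(1, 2)]


def concave(q):
    """A concave test function of the belief (illustration only)."""
    return q[0] * (1 - q[0])


# A strategy that depends on the type moves the posterior away from p.
revealing = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(3, 4)]]
marg, post = posteriors(p, revealing)
print(marg, post)
print(expected(concave, marg, post), "<=", concave(p))

# A strategy that does not depend on the type leaves the posterior at p.
non_revealing = [[Fr(1, 3), Fr(2, 3)], [Fr(1, 3), Fr(2, 3)]]
print(posteriors(p, non_revealing)[1])
```

In the first case the expectation of the concave function over posteriors is 3/16, below its value 1/4 at the prior; in the second case every posterior equals the prior.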

While this framework is not a particular case of Section 2, since the set $P \times Q$ that will play the role of the state space is not finite, it is still possible to introduce a Shapley operator for this game. This operator is defined on the set $F$ of continuous concave-convex functions on $P \times Q$ by:

\[
\begin{aligned}
\Phi(\lambda, f)(p, q) &= \min_{y \in Y} \max_{x \in X} \Big\{ \lambda g(x, y, p, q) + (1-\lambda) \sum_{i,j} x(i) y(j) f(p(i), q(j)) \Big\} \qquad (1) \\
&= \max_{x \in X} \min_{y \in Y} \Big\{ \lambda g(x, y, p, q) + (1-\lambda) \sum_{i,j} x(i) y(j) f(p(i), q(j)) \Big\}
\end{aligned}
\]

and the value $v_\lambda(p, q)$ of the $\lambda$-discounted game is the unique fixed point of $\Phi(\lambda, \cdot)$ on $F$. These relations are due to Aumann and Maschler (1966) [2] and Mertens and Zamir (1971) [3].

$X_\lambda(f)(p, q)$ denotes the set of optimal strategies of player 1 in $\Phi(\lambda, f)(p, q)$.

In this framework any $f \in F$ is a fixed point of the projective operator $\Phi(0, \cdot)$, that is $F = S$.

Notice that if $C$ is a bound for the payoff function $g$, then any $v_\lambda$ is bounded by $C$ as well, and is moreover $C$-Lipschitz. The family $\{v_\lambda\}$ is thus relatively compact for the topology of uniform convergence, hence $S_0$, the set of accumulation points of the family $\{v_\lambda\}$, is nonempty.

To ease the notation we will denote the sum $\sum_{i,j} x(i) y(j) f(p(i), q(j))$ by $E_{\rho(x,p) \times \rho(y,q)} f(\tilde p, \tilde q)$. Notice that $\tilde p$ only depends on $x$ and $p$, and that $\tilde q$ only depends on $y$ and $q$.

For any $f \in F$, Jensen's inequality ensures that $E_{\rho(x,p)} f(\tilde p, q) \le f(p, q)$. The strategies of player 1 for which the equality holds for all $f \in F$ are called non revealing and denoted $NR(p) = \{x \in X;\ \forall f \in F,\ E_{\rho(x,p)} f(\tilde p) = f(p)\} = \{x \in X;\ \tilde p = p,\ \rho(x, p)\text{-a.s.}\}$. The set $NR(q)$ of non revealing strategies of player 2 is defined similarly.

Finally, the non revealing value $u$ is

\[ u(p, q) = \min_{y \in NR(q)} \max_{x \in NR(p)} g(x, y, p, q) = \max_{x \in NR(p)} \min_{y \in NR(q)} g(x, y, p, q). \]

The existence of $\lim v_\lambda$ was first proved in [2] for games with incomplete information on one side. It was then generalized in [3] for games with incomplete information on both sides, with a characterization of the limit $v$ as the only solution of the system

\[
\begin{cases}
v = \mathrm{Cav}_p \min(u, v) \\
v = \mathrm{Vex}_q \max(u, v)
\end{cases}
\]

where $\mathrm{Cav}_p(f)$ (resp. $\mathrm{Vex}_q(f)$) denotes the smallest function concave in the first variable which is larger than $f$ (resp. the largest function convex in the second variable which is less than $f$).
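To make the operator $\mathrm{Cav}$ concrete, here is a minimal sketch for the one-sided case, where $Q$ is a singleton, $\mathrm{Vex}_q$ is the identity and the system reduces to $v = \mathrm{Cav}\, u$. It assumes two types, so that $p$ is a number in $[0,1]$, and computes the upper concave envelope of sampled values on a grid with an upper-hull scan. The function name `cav` and the sample values are hypothetical and not taken from the paper.

```python
from fractions import Fraction as Fr


def cav(xs, ys):
    """Upper concave envelope of the points (xs[i], ys[i]), evaluated on xs.

    xs must be strictly increasing and contain at least two points.
    """
    hull = []  # indices of the vertices of the upper hull
    for i in range(len(xs)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((xs[b] - xs[a]) * (ys[i] - ys[a])
                     - (ys[b] - ys[a]) * (xs[i] - xs[a]))
            if cross >= 0:  # b lies on or below the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    out, h = [], 0
    for i in range(len(xs)):
        # Move to the hull segment whose x-range contains xs[i].
        while h + 2 < len(hull) and hull[h + 1] < i:
            h += 1
        a, b = hull[h], hull[h + 1]
        t = (xs[i] - xs[a]) / (xs[b] - xs[a])
        out.append(ys[a] + t * (ys[b] - ys[a]))
    return out


xs = [Fr(k, 4) for k in range(5)]
u = [0, 1, 0, 2, 0]            # hypothetical values of u on the grid
print(cav(xs, u))              # the dip at p = 1/2 is filled in

u_concave = [x * (1 - x) for x in xs]
print(cav(xs, u_concave) == u_concave)
```

On the first sample the envelope agrees with `u` at the hull vertices and takes the value 3/2 at $p = 1/2$; a function that is already concave is left unchanged.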

A shorter proof of this result (including characterization) was established in [4]. The tools used in the following proof are quite similar to those used in [4], but the structure differs.

Lemmas 1 and 2 still hold in this framework; we now prove a more precise version of Lemma 3 using the geometry of $P \times Q$. Let $C(P \times Q)$ be the set of real continuous functions on $P \times Q$.

Lemma 4. Let $v \in S$ and let $f \in C(P \times Q)$ be concave with respect to the first variable. If $(p, q)$ is an extreme point of $\mathrm{Argmax}(v - f)$ then $X_0(v)(p, q) \subset NR(p)$.

Proof. Let $x \in X_0(v)(p, q)$ and $y \in NR(q)$, then

\[ v(p, q) \le E_{\rho(x,p) \times \rho(y,q)} v(\tilde p, \tilde q) = E_{\rho(x,p)} v(\tilde p, q) \]

while by Jensen's inequality

\[ f(p, q) \ge E_{\rho(x,p)} f(\tilde p, q) \]

so

\[ E_{\rho(x,p)} (v - f)(\tilde p, q) \ge (v - f)(p, q) \]

which implies that $x \in NR(p)$ since $(p, q)$ is an extreme point of $\mathrm{Argmax}(v - f)$.

Remark that $v \in F$ implies that $NR(p) \subset X_0(v)(p, q)$ since $v$ is a saddle function, hence in fact $NR(p) = X_0(v)(p, q)$ in the previous lemma.

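Lemma 5. Let $x_\lambda \in X_\lambda(v_\lambda)(p, q)$ and $y \in NR(q)$, then

\[ v_\lambda(p, q) \le g(x_\lambda, y, p, q). \]

Proof. By definition of $v_\lambda$ and $x_\lambda$,

\[
\begin{aligned}
v_\lambda(p, q) &\le \lambda g(x_\lambda, y, p, q) + (1-\lambda) E_{\rho(x_\lambda,p) \times \rho(y,q)} v_\lambda(\tilde p, \tilde q) \\
&\le \lambda g(x_\lambda, y, p, q) + (1-\lambda) v_\lambda(p, q)
\end{aligned}
\]

using Jensen's inequality and the fact that $y \in NR(q)$. Hence

\[ v_\lambda(p, q) \le g(x_\lambda, y, p, q). \]

Recall that $S_0 \subset S$ is the set of accumulation points of $\{v_\lambda\}$ for the uniform norm.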

Proposition 6. Let $v \in S_0$.

i) Let $f \in C(P \times Q)$ be concave with respect to the first variable. Then at any extreme point $(p, q)$ of $\mathrm{Argmax}(v - f)$

\[ v(p, q) \le u(p, q). \]

ii) Let $f \in C(P \times Q)$ be convex with respect to the second variable. Then at any extreme point $(p, q)$ of $\mathrm{Argmin}(v - f)$

\[ v(p, q) \ge u(p, q). \]

Proof. We prove i). Apply Lemma 5 to any sequence $\{v_{\lambda_n}\}$ converging to $v$. By Lemma 2, there exists $x \in X_0(v)(p, q)$ such that

\[ v(p, q) \le \inf_{y \in NR(q)} g(x, y, p, q). \]

Lemma 4 implies that $x \in NR(p)$ (since $v \in S$), and the result follows by definition of $u$.

ii) is established in a dual way.

Proposition 6 implies the following corollaries of existence and characterization of $\lim v_\lambda$:

Corollary 7. $v_\lambda$ converges uniformly as $\lambda$ tends to 0.

Proof. Let $v$ and $v'$ be in $S_0$ and let $(p, q)$ be any extreme point of $\mathrm{Argmax}(v - v')$. Since $v'$ is concave in its first variable, Proposition 6 i) with $f = v'$ implies that $v(p, q) \le u(p, q)$. Since $v$ is convex in its second variable and $(p, q)$ is an extreme point of $\mathrm{Argmin}(v' - v)$, apply now Proposition 6 ii) to $v'$ with $f = v$ to get $v'(p, q) \ge u(p, q)$. This yields $v(p, q) \le v'(p, q)$, hence $v \le v'$, thus uniqueness.

Corollary 8. Any accumulation point $v$ of $v_\lambda$ satisfies the Mertens-Zamir system:

\[
\begin{cases}
v = \mathrm{Cav}_p \min(u, v) \\
v = \mathrm{Vex}_q \max(u, v)
\end{cases}
\]

Proof. Let $v$ be an accumulation point of the family $\{v_\lambda\}$. It is enough to prove that $v \le \mathrm{Cav}_p \min(u, v)$ since, $v$ being concave in $p$, the other inequality is trivial and a dual argument gives the dual equality.

Denote $f = \mathrm{Cav}_p \min(u, v)$, and let $(p, q)$ be any extreme point of $\mathrm{Argmax}(v - f)$. Since $f$ is concave in $p$, Proposition 6 implies that $v(p, q) \le u(p, q)$. Hence,

\[ v(p, q) \le \min(u, v)(p, q) \le f(p, q) \]

and thus $v \le f$.

Remarks

The proof above also shows that $v$ is the smallest among the functions satisfying $w = \mathrm{Cav}_p \min(u, w)$.

A similar approach applies word for word to the dependent case, as defined in Mertens and Zamir [3].

The case where the action sets $I$ and $J$ are compact metric can also be handled in the same way, using the martingales $(\tilde p, \tilde q)$ of regular conditional probabilities.

4. Absorbing games

We consider here a special class of stochastic games as defined in Section 2. We are given two separately continuous (payoff) functions $g$, $g^*$ from $I \times J$ to $[-1, 1]$ and a separately continuous (probability of absorption) function $p^*$ from $I \times J$ to $[0, 1]$.

The repeated game with absorbing states is played in discrete time as follows. At stage $t = 1, 2, \dots$ (if absorption has not yet occurred) player 1 chooses $i_t \in I$ and, simultaneously, player 2 chooses $j_t \in J$:

(i) the payoff at stage $t$ is $g(i_t, j_t)$;

(ii) with probability $p^*(i_t, j_t) := 1 - p(i_t, j_t)$ absorption is reached and the payoff in all future stages $s > t$ is $g^*(i_t, j_t)$;

(iii) with probability $p(i_t, j_t)$ the situation is repeated at stage $t + 1$.

Recall that the asymptotic analysis for these games is due to Kohlberg [5] in the case where $I$ and $J$ are finite.

As usual denote $X = \Delta(I)$ and $Y = \Delta(J)$; $g$, $p$ and $p^*$ are bilinearly extended to $X \times Y$. Let $p^*(x, y) g^*(x, y) = \int_{I \times J} p^*(i, j) g^*(i, j)\, x(di)\, y(dj)$. $g^*(x, y)$ is thus the expected absorbing payoff conditionally on absorption.

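The Shapley operator of the game is then defined on $\mathbb{R}$ by

\[
\begin{aligned}
\Phi(\lambda, f) &= \min_{y \in Y} \max_{x \in X} \left\{ \lambda g(x, y) + (1-\lambda) \big( p(x, y) f + p^*(x, y) g^*(x, y) \big) \right\} \\
&= \max_{x \in X} \min_{y \in Y} \left\{ \lambda g(x, y) + (1-\lambda) \big( p(x, y) f + p^*(x, y) g^*(x, y) \big) \right\}.
\end{aligned}
\]

The following lemma is analogous to Lemma 4:

Lemma 9.

i) Let $f \in \mathbb{R}$ be such that $f \ge \Phi(0, f)$ and $y \in Y_0(f)$. Then for any $x \in X$, $p^*(x, y) > 0 \implies f \ge g^*(x, y)$.

ii) Let $f \in \mathbb{R}$ be such that $f \le \Phi(0, f)$ and $x \in X_0(f)$. Then for any $y \in Y$, $p^*(x, y) > 0 \implies f \le g^*(x, y)$.

Proof. We prove i). Given $x \in X$ and $y \in Y_0(f)$,

\[ f \ge \Phi(0, f) \ge p(x, y) f + p^*(x, y) g^*(x, y) \]

and $p(x, y) = 1 - p^*(x, y)$, hence the result.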

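Given $\lambda \in ]0, 1[$, $x \in X$ and $y \in Y$, let $r_\lambda(x, y)$ be the payoff induced in the discounted game by the corresponding stationary strategies: $r_\lambda(x, y) = E_{x,y} \sum_{t=1}^{\infty} \lambda (1-\lambda)^{t-1} g_t$.

Lemma 10.

\[
r_\lambda(x, y) \le
\begin{cases}
g(x, y) & \text{if } p^*(x, y) = 0 \\
\max(g(x, y), g^*(x, y)) & \text{if } p^*(x, y) > 0.
\end{cases}
\]

Proof.

\[ r_\lambda(x, y) = \lambda g(x, y) + (1-\lambda) \left[ p(x, y) r_\lambda(x, y) + p^*(x, y) g^*(x, y) \right] \]

hence

\[ r_\lambda(x, y) = \frac{\lambda g(x, y) + (1-\lambda) p^*(x, y) g^*(x, y)}{\lambda + (1-\lambda) p^*(x, y)}, \]

which equals $g(x, y)$ when $p^*(x, y) = 0$ and is a convex combination of $g(x, y)$ and $g^*(x, y)$ otherwise.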
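The closed form obtained in this proof can be checked numerically against the defining series. The sketch below is an illustration only: the function names `r_closed` and `r_series` and the numerical values are hypothetical, `g_star` and `p_star` stand for the absorbing payoff and the absorption probability of the pair of stationary strategies, and the series is truncated at a finite horizon.

```python
def r_closed(lam, g, g_star, p_star):
    """Closed form of the stationary discounted payoff."""
    return (lam * g + (1 - lam) * p_star * g_star) / (lam + (1 - lam) * p_star)


def r_series(lam, g, g_star, p_star, horizon=5000):
    """Truncated series sum_t lam (1 - lam)^(t-1) E[g_t]."""
    p = 1 - p_star
    total, alive = 0.0, 1.0  # alive = probability of no absorption before stage t
    for t in range(1, horizon + 1):
        weight = lam * (1 - lam) ** (t - 1)
        total += weight * (alive * g + (1 - alive) * g_star)
        alive *= p
    return total


lam, g, g_star, p_star = 0.1, 0.2, -0.5, 0.3
print(r_closed(lam, g, g_star, p_star), r_series(lam, g, g_star, p_star))
print(r_closed(lam, g, g_star, p_star) <= max(g, g_star))
print(r_closed(lam, g, g_star, 0.0))  # no absorption: the payoff reduces to g
```

The two printed payoffs agree up to truncation and rounding error, and the value stays below $\max(g, g^*)$ as stated in Lemma 10.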

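Lemma 5 also has an analogue in this setup:

Lemma 11. Let $\lambda \in ]0, 1[$, $x_\lambda \in X_\lambda(v_\lambda)$ and $y \in Y$, then

\[
v_\lambda \le
\begin{cases}
g(x_\lambda, y) & \text{if } p^*(x_\lambda, y) = 0 \\
\max(g(x_\lambda, y), g^*(x_\lambda, y)) & \text{if } p^*(x_\lambda, y) > 0.
\end{cases}
\]

Proof. Since $x_\lambda$ is optimal in the discounted game, for any $y \in Y$,

\[ v_\lambda \le r_\lambda(x_\lambda, y) \]

and the assertion follows from Lemma 10.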

As in the previous part, combining the two lemmas yields:

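Proposition 12. Assume that $v_{\lambda_n} \to v$ and $x_{\lambda_n} \to x$ with $x_{\lambda_n} \in X_{\lambda_n}(v_{\lambda_n})$. Let $v'$ be such that $v' \ge \Phi(0, v')$ and $y \in Y_0(v')$, then

\[ v \le \max(g(x, y), v'). \]

Proof. For any $n$ and any $y \in Y$, Lemma 11 implies that either $v_{\lambda_n} \le g(x_{\lambda_n}, y)$, or $p^*(x_{\lambda_n}, y) > 0$ and $v_{\lambda_n} \le \max(g(x_{\lambda_n}, y), g^*(x_{\lambda_n}, y))$. In that second case, since $y \in Y_0(v')$, the first assertion in Lemma 9 ensures that $g^*(x_{\lambda_n}, y) \le v'$, so in both cases we have $v_{\lambda_n} \le \max(g(x_{\lambda_n}, y), v')$. Passing to the limit yields the result.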

Corollary 13. $v_\lambda$ converges as $\lambda$ goes to 0.

Proof. Suppose on the contrary that there are two sequences $v_{\lambda_n} \to v$ and $v_{\lambda'_n} \to v'$ with $v > v'$. Up to an extraction, one can assume that $x_{\lambda_n} \in X_{\lambda_n}(v_{\lambda_n})$ converges to $x$ and similarly that $y_{\lambda'_n} \in Y_{\lambda'_n}(v_{\lambda'_n})$ converges to $y$. By Lemmas 1 and 2, $v' = \Phi(0, v')$ and $y \in Y_0(v')$, so applying Proposition 12 we get $v \le \max(g(x, y), v')$, hence $v \le g(x, y)$. A dual reasoning yields $v' \ge g(x, y)$, a contradiction.

We now identify the limit $v$ of the absorbing game.

Definition 14. Define the function $W : X \times Y \to \mathbb{R}$ by

\[ W(x, y) = \mathrm{med}\left( g(x, y),\ \sup_{x';\, p^*(x', y) > 0} g^*(x', y),\ \inf_{y';\, p^*(x, y') > 0} g^*(x, y') \right) \]

where $\mathrm{med}(\cdot, \cdot, \cdot)$ denotes the median of three numbers, with the usual convention that a supremum (resp. an infimum) over an empty set equals $-\infty$ (resp. $+\infty$).

Corollary 15. The limit $v$ is the value of the zero-sum game, denoted by $\Upsilon$, with action spaces $X$ and $Y$ and payoff $W$.

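Proof. It is enough to show that $v \le w := \sup_x \inf_y W(x, y)$, as a dual argument yields the conclusion.

Assume by contradiction that $w < v$.

Let $\varepsilon > 0$ with $w + 2\varepsilon < v$. Consider $x \in X_0(v)$ an accumulation point of $x_\lambda \in X_\lambda(v_\lambda)$ and let $y$ be an $\varepsilon$-best response to $x$ in the game $\Upsilon$. Lemma 9 ii) implies that

\[ \inf_{y';\, p^*(x, y') > 0} g^*(x, y') \ge v > w + \varepsilon \ge W(x, y), \]

so that

\[ W(x, y) = \max\left( g(x, y),\ \sup_{x';\, p^*(x', y) > 0} g^*(x', y) \right). \]

Thus

\[ \sup_{x';\, p^*(x', y) > 0} g^*(x', y) \le w + \varepsilon < v - \varepsilon \]

and similarly $g(x, y) < v - \varepsilon$. The corresponding inequalities hold with $x_\lambda$, for $\lambda$ small enough:

\[ p^*(x_\lambda, y) \left[ g^*(x_\lambda, y) - (v - \varepsilon) \right] \le 0, \qquad g(x_\lambda, y) \le v - \varepsilon \]

leading by Lemma 10 to $v_\lambda \le v - \varepsilon$, a contradiction.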

Remarks

The proof of Corollary 15 establishes in itself the existence of the limit $v$ (by applying the same reasoning to any accumulation point of $v_\lambda$).

Also notice that this proves that the game Υ has a value, which is not obvious a priori.
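As an example of the formula for $W$, the following sketch evaluates it on the Big Match, a standard absorbing game that is not discussed in this paper and is used here only as an assumption for illustration. With $x$ the probability of the absorbing row and $y$ the probability of the left column, one has $g(x, y) = xy + (1-x)(1-y)$, $p^*(x, y) = x$ and $g^*(x, y) = y$ for $x > 0$; the supremum and infimum terms are entered in closed form for this game rather than computed generically. The names `med`, `W` and `grid` are hypothetical.

```python
from fractions import Fraction as Fr
import math


def med(a, b, c):
    """Median of three numbers (math.inf and -math.inf are allowed)."""
    return sorted([a, b, c])[1]


def W(x, y):
    """The median payoff for the Big Match; x = P(Top), y = P(Left)."""
    g = x * y + (1 - x) * (1 - y)   # expected stage payoff g(x, y)
    # Closed forms specific to this game: p*(x, y) = x and g*(x', y) = y.
    sup_term = y                    # sup of g*(x', y) over x' with x' > 0
    inf_term = Fr(0) if x > 0 else math.inf  # inf of g*(x, y') = y' over y'
    return med(g, sup_term, inf_term)


grid = [Fr(k, 20) for k in range(21)]
maxmin = max(min(W(x, y) for y in grid) for x in grid)
minmax = min(max(W(x, y) for x in grid) for y in grid)
print(maxmin, minmax)
```

On this grid both quantities equal 1/2, in line with the known limit value of the Big Match. This is a grid check under the stated assumptions, not a proof that the continuous game $\Upsilon$ has a value.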

5. Recursive games

Recursive games are another special class of stochastic games as defined in Section 2. We are given a finite set $\Omega = \Omega_0 \cup \Omega^*$, two compact metric sets $I$ and $J$, a payoff function $g^*$ from $\Omega^*$ to $\mathbb{R}$ and a separately continuous function $\rho$ from $I \times J \times \Omega_0$ to $\Delta(\Omega)$. $\Omega^*$ is the set of absorbing states while $\Omega_0$ is the set of recursive states.

The repeated recursive game is played in discrete time as follows. At stage $t = 1, 2, \dots$, if absorption has not yet occurred and the current state is $\omega_t \in \Omega_0$, player 1 chooses $i_t \in I$ and, simultaneously, player 2 chooses $j_t \in J$:

(i) the payoff at stage $t$ is 0;

(ii) the state $\omega_{t+1}$ is chosen with probability distribution $\rho(\omega_{t+1} | i_t, j_t, \omega_t)$;

(iii) if $\omega_{t+1} \in \Omega^*$ absorption is reached and the payoff in all future stages $s > t$ is $g^*(\omega_{t+1})$;

(iv) if $\omega_{t+1} \in \Omega_0$ absorption is not reached and the game goes on.

The study of those recursive games was first done by Everett [8], who proved that the game has a value when considering the asymptotic payoff on plays.

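As before denote $X = \Delta(I)$ and $Y = \Delta(J)$, $\mathbf{X} = X^{\Omega_0}$ and similarly $\mathbf{Y} = Y^{\Omega_0}$; $\rho$ is bilinearly extended to $X \times Y$. Recall that in this framework the Shapley operator is defined from $\mathbb{R}^{|\Omega_0|}$ to itself by

\[
\begin{aligned}
\Phi(\lambda, f)(\omega) &= \min_{y \in Y} \max_{x \in X} \Big\{ (1-\lambda) \sum_{\omega' \in \Omega} \rho(\omega' | x, y, \omega) f(\omega') \Big\} \\
&= \max_{x \in X} \min_{y \in Y} \Big\{ (1-\lambda) \sum_{\omega' \in \Omega} \rho(\omega' | x, y, \omega) f(\omega') \Big\}
\end{aligned}
\]

where by convention $f(\omega') = g^*(\omega')$ whenever $\omega' \in \Omega^*$.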
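For intuition on how this operator behaves as $\lambda$ goes to 0, here is a minimal sketch on a hypothetical recursive game with a single recursive state and two actions per player: the pair (Top, Left) keeps the game in the recursive state, while the three other pairs lead to absorbing states with payoffs 1, 1 and 0. The game and the function names are assumptions made for this example and do not come from the paper. The fixed-point equation then reads $f = (1-\lambda)/(2 - f)$, whose relevant root is $1 - \sqrt{\lambda}$.

```python
import math


def value_2x2(a):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    maxmin = max(min(row) for row in a)
    minmax = min(max(a[0][j], a[1][j]) for j in range(2))
    if minmax - maxmin < 1e-12:
        return maxmin  # pure saddle point
    den = a[0][0] + a[1][1] - a[0][1] - a[1][0]
    return (a[0][0] * a[1][1] - a[0][1] * a[1][0]) / den


def shapley_recursive(lam, f):
    """Phi(lam, f) for the one-state example: stage payoffs are 0, so only
    the continuation term (1 - lam) * (f or absorbing payoff) remains."""
    c = 1 - lam
    return value_2x2([[c * f, c * 1.0], [c * 1.0, c * 0.0]])


def discounted_value(lam, iters=20000):
    """Approximate v_lam by iterating Phi(lam, .) from f = 0."""
    f = 0.0
    for _ in range(iters):
        f = shapley_recursive(lam, f)
    return f


for lam in (0.25, 0.01, 0.0001):
    print(lam, discounted_value(lam), 1 - math.sqrt(lam))
```

The two printed columns should agree to numerical precision; in this example $v_\lambda$ tends to 1 at rate $\sqrt{\lambda}$, so the limit exists although $v_\lambda$ is not a smooth function of $\lambda$ at 0.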

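Proposition 16. Let $v \in S_0$, and $v'$ be such that $\max_\omega \big( v(\omega) - v'(\omega) \big) > 0$. Assume that $v'(\omega) \ge \Phi(0, v')(\omega)$ for all $\omega \in \Omega_1 := \mathrm{Argmax}(v - v')$. Then $v(\cdot) \le 0$ on $\Omega_1$.

Proof. Denote by $\Omega_2$ the Argmax of $v$ on the set $\Omega_1$; it is enough to prove that $v(\cdot) \le 0$ on $\Omega_2$, so we assume the contrary. Up to extraction, $v_{\lambda_n} \to v$, $\mathbf{x}_{\lambda_n} \in \mathbf{X}_{\lambda_n}(v_{\lambda_n}) \to \mathbf{x}$ and there exists $\omega_0 \in \Omega_2$ which realizes the maximum of $v_{\lambda_n}$ on $\Omega_2$ for every $n$. In particular $v(\omega_0) > 0$. Since $\mathbf{x}_{\lambda_n}$ is optimal, we get for any $\mathbf{y} \in \mathbf{Y}$:

\[ v_{\lambda_n}(\omega_0) \le (1-\lambda_n) \left[ \sum_{\omega \in \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v_{\lambda_n}(\omega) + \sum_{\omega \in \Omega \setminus \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v_{\lambda_n}(\omega) \right] \]

so by definition of $\omega_0$,

\[ \big( 1 - (1-\lambda_n) \rho(\Omega_2 | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) \big) v_{\lambda_n}(\omega_0) \le (1-\lambda_n) \sum_{\omega \in \Omega \setminus \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v_{\lambda_n}(\omega). \]

For simplicity denote $\rho_n = \rho(\Omega_2 | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0)$. If $\rho_n = 1$ for infinitely many $n$, we immediately get $v(\omega_0) \le 0$ and the requested contradiction, hence we assume that it is not the case. Hence up to an extraction, $\mu_n$ defined by

\[ \mu_n(\omega) = \frac{\rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0)}{1 - \rho_n} \]

is a probability measure on $\Omega \setminus \Omega_2$. Then for $n$ large enough, we get an analogue of the inequalities in Lemmas 5 and 11: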

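\[
\begin{aligned}
v_{\lambda_n}(\omega_0) &\le \frac{1-\lambda_n}{1 - (1-\lambda_n)\rho_n} \sum_{\omega \in \Omega \setminus \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v_{\lambda_n}(\omega) \\
&= \frac{(1-\lambda_n)(1-\rho_n)}{\lambda_n + (1-\lambda_n)(1-\rho_n)} \sum_{\omega \in \Omega \setminus \Omega_2} \frac{\rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0)}{1-\rho_n} v_{\lambda_n}(\omega) \\
&\le \max\Big( 0, \sum_{\omega \in \Omega \setminus \Omega_2} \mu_n(\omega) v_{\lambda_n}(\omega) \Big). \qquad (2)
\end{aligned}
\]

The equality uses $1 - (1-\lambda_n)\rho_n = \lambda_n + (1-\lambda_n)(1-\rho_n)$, and the last inequality holds because the coefficient in front of the sum lies in $[0, 1[$.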

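On the other hand, choose now $\mathbf{y} \in \mathbf{Y}_0(v')$. Since $\omega_0 \in \Omega_2$,

\[ v'(\omega_0) \ge \Phi(0, v')(\omega_0) \ge \sum_{\omega \in \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v'(\omega) + \sum_{\omega \in \Omega \setminus \Omega_2} \rho(\omega | \mathbf{x}_{\lambda_n}, \mathbf{y}, \omega_0) v'(\omega) \]

so using the fact that $v'$ is constant on $\Omega_2$, we get an analogue to Lemmas 4 and 9:

\[ v'(\omega_0) \ge \sum_{\omega \in \Omega \setminus \Omega_2} \mu_n(\omega) v'(\omega). \qquad (3) \]

Letting $n$ go to infinity in inequalities (2) and (3), and using $v(\omega_0) > 0$, we obtain by compactness the existence of $\mu \in \Delta(\Omega \setminus \Omega_2)$ such that

\[ v(\omega_0) \le \sum_{\omega \in \Omega \setminus \Omega_2} \mu(\omega) v(\omega) \qquad (4) \]

\[ v'(\omega_0) \ge \sum_{\omega \in \Omega \setminus \Omega_2} \mu(\omega) v'(\omega). \qquad (5) \]

Subtracting (5) from (4) yields

\[ (v - v')(\omega_0) \le \sum_{\omega \in \Omega \setminus \Omega_2} \mu(\omega) (v - v')(\omega) \]

and since $\omega_0 \in \Omega_1 = \mathrm{Argmax}(v - v')$, this implies that the support of $\mu$ is included in $\Omega_1$ and that the inequality in (4) is an equality. This in turn forces the support of $\mu$ to be included in $\Omega_2 = \mathrm{Argmax}_{\Omega_1} v$, a contradiction to the construction of $\mu$.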

Corollary 17. $v_\lambda$ converges as $\lambda$ goes to 0.

Proof. Assume that there are two accumulation points $v$ and $v'$ with $\max\{v - v'\} > 0$, and denote $\Omega_1 = \mathrm{Argmax}(v - v')$. Then Proposition 16 implies that $v(\cdot) \le 0$ on $\Omega_1$. A dual argument yields that $v'(\cdot) \ge 0$ on $\Omega_1$, a contradiction.

We now recover a characterization of the limit due to Everett [8]:

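Corollary 18. $S_0 \subset \overline{L^+} \cap \overline{L^-}$, where $\overline{A}$ is the closure of $A$ and

\[
L^+ := \left\{ f \in \mathbb{R}^{\Omega} \ \middle|\
\begin{array}{l}
\Phi(0, f)(\omega) \le f(\omega) \quad \forall \omega \in \Omega_0 \\
\Phi(0, f)(\omega) = f(\omega) \implies f(\omega) \ge 0 \\
f(\omega) \ge g^*(\omega) \quad \forall \omega \in \Omega^*
\end{array}
\right\} \qquad (6)
\]

and symmetrically

\[
L^- := \left\{ f \in \mathbb{R}^{\Omega} \ \middle|\
\begin{array}{l}
\Phi(0, f)(\omega) \ge f(\omega) \quad \forall \omega \in \Omega_0 \\
\Phi(0, f)(\omega) = f(\omega) \implies f(\omega) \le 0 \\
f(\omega) \le g^*(\omega) \quad \forall \omega \in \Omega^*
\end{array}
\right\}. \qquad (7)
\]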

We will need the following lemma:

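Lemma 19. For any $\varepsilon \ge 0$, there exist $\Omega' \subset \Omega_0$ and $v' \in F$ such that the couple $(\Omega', v')$ satisfies

a) $v'(\omega) = g^*(\omega)$ for all $\omega \in \Omega^*$.

b) $v'(\omega) = v(\omega) - \varepsilon$ on $\Omega'$.

c) $v(\omega) \ge v'(\omega) > v(\omega) - \varepsilon$ on $\Omega_0 \setminus \Omega'$.

d) For any $\omega \in \Omega_0 \setminus \Omega'$, $\Phi(0, v')(\omega) > v'(\omega)$.

e) For any $\omega \in \Omega'$, $\Phi(0, v')(\omega) = v'(\omega)$.

Proof. This was proved in [10] but we recall the proof for the sake of completeness.

Let $E$ be the set of couples $(\Omega'', v'') \in \mathcal{P}(\Omega_0) \times F$ that satisfy properties a) to d). This set is nonempty since $(\Omega_0, v - \varepsilon 1_{\omega \in \Omega_0}) \in E$. Since $\Omega_0$ is finite, we can choose a couple $(\Omega', v')$ in $E$ such that there is no $(\Omega'', v'')$ in $E$ with $\Omega'' \subsetneq \Omega'$. Let $\tilde\Omega$ be the set on which $\Phi(0, v')(\omega) = v'(\omega)$; we now prove that $\tilde\Omega = \Omega'$, hence that $(\Omega', v')$ also satisfies property e).

By contradiction assume that $\tilde\Omega \subsetneq \Omega'$ and consider, for small $\alpha > 0$, $v'_\alpha := v' + \alpha 1_{\omega \in \Omega' \setminus \tilde\Omega}$. The couple $(\tilde\Omega, v'_\alpha)$ clearly satisfies properties a) to c) for $\alpha < \varepsilon$. It also satisfies property d) for $\alpha$ small enough by continuity of $\Phi(0, \cdot)$. So for small $\alpha$ the couple $(\tilde\Omega, v'_\alpha)$ is in $E$, contradicting the minimality of $\Omega'$.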

We can now prove Corollary 18:

Proof of Corollary 18. Let $v \in S_0$, let $\varepsilon > 0$ and define $(v', \Omega')$ as in Lemma 19. By properties a) to c), $\|v - v'\|_\infty \le \varepsilon$. If $\Omega' = \emptyset$, then property d) implies that $v' \in L^-$. If $\Omega'$ is nonempty, then by properties b), c) and e), $\Omega' = \mathrm{Argmax}(v - v')$ and $\Phi(0, v')(\cdot) = v'(\cdot)$ on $\Omega'$. Hence, Proposition 16 yields that $v(\cdot) \le 0$ on $\Omega'$. So $v'(\cdot) \le 0$ on $\Omega'$ and $v' \in L^-$ as well.

This implies that $v \in \overline{L^-}$, and by duality $v \in \overline{L^+}$.

Remark 20. This corollary implies in itself that $v_\lambda$ converges, as there is at most one element in the intersection, see [6] and Proposition 9 in [9].

6. Conclusion

In this paper we proved the convergence of the discounted values of two person zero-sum discounted games in several frameworks, using a unified approach relying on a comparison principle between the accumulation points of the discounted values. We also obtained some characterization of the limit. We believe that this approach can be extended in several directions:

• For other classes of stochastic games, where the existence of a limit value is still to be established: stochastic games with a finite number of states and compact sets of actions, absorbing games with incomplete information, and more generally stochastic games with incomplete information.

• For other evaluations of the payoff beyond the discounted case, for example to prove the convergence of the value of the $n$-stage game when $n$ tends to infinity.

References

[1] Shapley L. S.: Stochastic Games. Proceedings of the National Academy of Sciences of the U.S.A., 39, 1095-1100 (1953)

[2] Aumann R.J. and M. Maschler: Repeated Games with Incomplete Information, M.I.T. Press (1995)

[3] Mertens J.-F. and S. Zamir: The Value of Two-Person Zero-Sum Repeated Games with Lack of Information on Both Sides. International Journal of Game Theory, 1, 39-64 (1971)

[4] Laraki R.: Variational Inequalities, System of Functional Equations, and Incomplete Information Repeated Games. SIAM J. Control and Optimization, 40, 516-524 (2001)

[5] Kohlberg E.: Repeated Games with Absorbing States. Annals of Statistics, 2, 724-738 (1974)

[6] Rosenberg D. and S. Sorin: An Operator Approach to Zero-Sum Repeated Games. Israel Journal of Mathematics, 121, 221-246 (2001)

[7] Laraki R.: Explicit Formulas for Repeated Games with Absorbing States. International Journal of Game Theory, 39, 53-69 (2010)

[8] Everett H.: Recursive Games. Contributions to the Theory of Games, III (H. W. Kuhn and A.W. Tucker, eds.), Annals of Mathematical Studies 39, Princeton University Press, 47-78 (1957)

[9] Sorin S.: The Operator Approach to Zero-Sum Stochastic Games. Chapter 27 in A. Neyman and S. Sorin (eds.), Stochastic Games and Applications, Kluwer Academic Publishers (2003)

[10] Vigeral G.: Propriétés Asymptotiques des Jeux Répétés à Somme Nulle. Thèse de doctorat, Université Pierre et Marie Curie (2009)

Sylvain Sorin, Combinatoire et Optimisation, IMJ, CNRS UMR 7586, Faculté de Mathématiques, Université P. et M. Curie - Paris 6, Tour 15-16, 1 étage, 4 Place Jussieu, 75005 Paris and Laboratoire d’Econométrie, Ecole Polytechnique, France

E-mail address: sorin@math.jussieu.fr http://www.math.jussieu.fr/ sorin/

Guillaume Vigeral (corresponding author), Université Paris-Dauphine, CEREMADE, Place du Maréchal De Lattre de Tassigny. 75775 Paris cedex 16, France

E-mail address: vigeral@ceremade.dauphine.fr

http://www.ceremade.dauphine.fr/ vigeral/indexenglish.html
