A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games

(1)

HAL Id: inria-00585695

https://hal.inria.fr/inria-00585695

Submitted on 14 Apr 2011

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Sylvain Sorin, Pierre Cardaliaguet, Rida Laraki

To cite this version:

Sylvain Sorin, Pierre Cardaliaguet, Rida Laraki. A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games. SADCO Kick off, Mar 2011, Paris, France. �inria- 00585695�

(2)

Introduction: Shapley Extensions of the Shapley operator : general repeated games Extensions of the Shapley operator : general evaluation

Asymptotic analysis: the main results

Asymptotic analysis - the discounted case: games with incomplete information Asymptotic analysis - the continuous approach: games with incomplete information Asymptotic analysis - the continuous approach: extensions

A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games

Sylvain Sorin

UPMC-Paris 6 and Ecole Polytechnique

Joint work with Pierre Cardaliaguet and Rida Laraki

(3)

The Shapley Operator

TheShapley operatorT(λ,·) associates to a functionf inR^Ω the function:

T(λ,f)(ω) = max

x∈∆(I) min

y∈∆(J)[λg(x,y, ω) + (1−λ)X

˜ ω

Q(x,y, ω)(˜ω)f(˜ω)]

(13)

The Shapley Operator

TheShapley operatorT(λ,·) associates to a functionf inR^Ω the function:

T(λ,f)(ω) = max

x∈∆(I) min

y∈∆(J)[λg(x,y, ω) + (1−λ)X

˜ ω

Q(x,y, ω)(˜ω)f(˜ω)]

Lemma

The Shapley operator T(λ,·) is well defined fromR^Ω to itself. Its unique fixed point is v_λ.

(14)

Thenµinduces a partition Πof [0,1]with t0 =0,t_k =Pk m=1µm. The repeated game is naturally represented as a game played between times 0 and 1, where the actions are constant on each subinterval(t_k−1,t_k): its lengthµ_k is the weight of stagek in the original game. Letv_Π be its value. The recursive equation is

v_Π =val{t1g1+ (1−t1)Ev_Πt1}

whereΠt1 is the normalization on[0,1]of the trace of the partition Πon the interval[t1,1].

(32)

Define nowV_Π(tk) as the value of the game starting at timet_k with evaluationP

mµ_m+kg_m. One obtains the alternative recursive formula

V_Π(t_k) =val{(t_k+1−t_k)g_k+1+EV_Π(t_k+1)} (4)

(33)

Define nowV_Π(tk) as the value of the game starting at timet_k with evaluationP

mµ_m+kg_m. One obtains the alternative recursive formula

V_Π(t_k) =val{(t_k+1−t_k)g_k+1+EV_Π(t_k+1)} (4)

By taking the linear extension we define this way for every finite partitionΠ, a functionV_Π(t) on [0,1].

Lemma

Assumeµ(n) decreasing. Then V_Π is C -Lipschitz in t, where C is a bound on the payoffs.

(34)

The variational Inequalities

Theorem

For any accumulation point W of the family{W_n}, all

(p,q)∈∆(K)×∆(L) and all C¹ test function φ: [0,1]→R: (P1)If, for some t ∈[0,1),X(t,p,q,W) is non-revealing and W(·,p,q)−φ(·) has a global maximum at t, then

u(p,q) +φ^′(t)≥0.

(P2)If, for some t ∈[0,1),Y(t,p,q,W)is non-revealing and W(·,p,q)−φ(·) has a global minimum at t then

(61)

proof

Let t,p andq such thatX(t,p,q,W) is non-revealing and W(·,p,q)−φ(·)admits a global maximum at t.

(62)

proof

Adding s−(· −t)² to φ(s) if necessary, we can assume that this global maximum is strict.

(63)

proof

Let W_ϕ(n) converge toW and defineθ(n)∈ {0, . . . , ϕ(n)−1}

such that _ϕ(n)^θ(n) is a global maximum ofW_ϕ(n)(·,p,q)−φ(·).

Then _ϕ(n)^θ(n) →t.

(64)

proof

Let W_ϕ(n) converge toW and defineθ(n)∈ {0, . . . , ϕ(n)−1}

such that _ϕ(n)^θ(n) is a global maximum ofW_ϕ(n)(·,p,q)−φ(·).

Then _ϕ(n)^θ(n) →t.

One has W

θ(n) ,p,q

= maxmin[ 1

g(x,y,p,q)

(65)

proof

Letx_n be optimal for the maximizer andy ∈Y be any non-revealing strategy of playerJ.

(66)

proof

W_ϕ(n) θ(n)

ϕ(n),p,q

≤ 1

ϕ(n)g(xn,y,p,q)

+X

i

x_n(i)W_ϕ(n)

θ(n) +1

ϕ(n) ,p_n(i),q

(67)

proof

W_ϕ(n) θ(n)

ϕ(n),p,q

≤ 1

ϕ(n)g(xn,y,p,q)

+X

i

x_n(i)W_ϕ(n)

θ(n) +1

ϕ(n) ,p_n(i),q

By concavity ofW_ϕ(n) with respect to p Xx (i)W

θ(n) +1

,p (i),q

≤W

θ(n) +1 ,p,q

.

(68)

proof

Hence:

0≤g(xn,y,p,q) +ϕ(n)

W_ϕ(n)

θ(n) +1 ϕ(n) ,p,q

−W_ϕ(n) θ(n)

ϕ(n),p,q

Since _ϕ(n)^θ(n) is a global maximum ofW_ϕ(n)(·,p,q)−φ(·):

φ

θ(n) +1 ϕ(n)

−φ θ(n)

ϕ(n)

≥W_ϕ(n)

θ(n) +1 ϕ(n) ,p,q

−W_ϕ(n) θ(n)

ϕ(n),p,

(69)

proof

Hence:

0≤g(xn,y,p,q) +ϕ(n)

W_ϕ(n)

θ(n) +1 ϕ(n) ,p,q

−W_ϕ(n) θ(n)

ϕ(n),p,q

Since _ϕ(n)^θ(n) is a global maximum ofW_ϕ(n)(·,p,q)−φ(·):

φ

θ(n) +1 ϕ(n)

−φ θ(n)

ϕ(n)

≥W_ϕ(n)

θ(n) +1 ϕ(n) ,p,q

−W_ϕ(n) θ(n)

ϕ(n),p, Assume{x_n}converges to some x (hence non-revealing).

(70)

Passing to the limit:

g(x,y,p,q) +φ^′(t)≥0.

Since this inequality holds true for everyy, taking the maximum with respect tox yields:

u(p,q) +φ^′(t)≥0.

(71)

The comparison principle

Theorem

Let W1 and W2 be two fixed points of Φin G and suppose that:

(72)

The comparison principle

Theorem

W1 satisfies (P1)

(73)

The comparison principle

Theorem

W1 satisfies (P1) W1 satisfies (P2)

(74)

The comparison principle

Theorem

(P3)W1(1,p,q)≤W2(1,p,q) for any(p,q)∈∆(K)×∆(L).

(75)

The comparison principle

Theorem

(P3)W1(1,p,q)≤W2(1,p,q) for any(p,q)∈∆(K)×∆(L).

Then W1 ≤W2 on [0,1]×∆(K)×∆(L).

(76)

The comparison principle

We argue by contradiction, assuming that

t∈[0,1max],p∈P,q∈Q[W1(t,p,q)−W2(t,p,q)] =δ >0. Then, forε >0 sufficiently small,

δ(ε) := max

t∈[0,1],s∈[0,1],p∈P,q∈Q[W¹(t,p,q)−W²(s,p,q)−(t−s)²

2ε +εs] >0. (10)

(77)

The comparison principle

We claim that there is(t_ε,s_ε,p_ε,q_ε), point of maximum above, such thatX(tε,p_ε,q_ε,W1) is non-revealing for player I and Y(sε,p_ε,q_ε,W2) is non-revealing for player J.

(78)

The comparison principle

We claim that there is(t_ε,s_ε,p_ε,q_ε), point of maximum above, such thatX(tε,p_ε,q_ε,W1) is non-revealing for player I and Y(sε,p_ε,q_ε,W2) is non-revealing for player J.

Finally we note thatt_ε<1 and s_ε<1 forε sufficiently small, becauseδ(ε)>0 andW1(1,p,q)≤W2(1,p,q) for any (p,q)by P3.

(79)

The comparison principle

Since the mapt 7→W1(t,p_ε,q_ε)−^(t⁻₂^s_ε^ε⁾² has a global maximum att_ε and sinceX(t_ε,p_ε,q_ε,W1)is non-revealing for player I, conditionP1implies that

u(p_ε,q_ε) +t_ε−s_ε

ε ≥0. (11)

(80)

The comparison principle

ε ≥0. (11)

In the same way, since the maps 7→W2(s,p_ε,q_ε) + ^(t^ε^−s)2ε ² −εs has a global minimum ats_ε and sinceY(sε,p_ε,q_ε,W2) is non-revealing for player J, we have by conditionP2that

(81)

The comparison principle

ε ≥0. (11)

In the same way, since the maps 7→W2(s,p_ε,q_ε) + ^(t^ε^−s)2ε ² −εs has a global minimum ats_ε and sinceY(sε,p_ε,q_ε,W2) is non-revealing for player J, we have by conditionP2that

u(pε,q_ε) + t_ε−s_ε

+ε≤0.

(82)

A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games

HAL Id: inria-00585695

https://hal.inria.fr/inria-00585695

Sylvain Sorin, Pierre Cardaliaguet, Rida Laraki

To cite this version:

A Continuous Time Approach for the Asymptotic Value in Two-Person Zero-Sum Repeated Games

Contents

The Shapley Operator

The Shapley Operator

Contents

Contents

Contents

Contents

Contents

The variational Inequalities

proof

proof

proof

proof

proof

proof

proof

proof

proof

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

The comparison principle

Contents