
Sequence of opponents and reduced strategies

Jean-Pierre Beaud$^1$ and Sylvain Sorin$^{1,2}$

$^1$ Laboratoire d'Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France (e-mail: beaud@poly.polytechnique.fr)

$^2$ MODALX and THEMA, UFR SEGMI, Université Paris X, 200 Avenue de la République, 92001 Nanterre, France (e-mail: sorin@poly.polytechnique.fr)

Received: March 1999/Revised: May 2000

Abstract. We consider the framework of repeated two-person zero-sum games with lack of information on one side. We compare the equilibrium payoffs of the informed player in two cases: where he is facing either a) a single long-lived uninformed player, or b) a sequence of short-lived uninformed players.

We show: 1) that situation b) is always (weakly) better than a), 2) that it can be strictly better in some cases, 3) that the two cases are equivalent if the long-lived uninformed player has an optimal strategy independent of his own moves.

Key words: incomplete information, sequence of opponents

1. Presentation

We consider a two-person zero-sum repeated game with incomplete information on one side ([1]). There is a finite state space $K$. For each $k \in K$, an $I \times J$ real payoff matrix $A^k$ is given, as well as two $I \times J$ signalling matrices $H^{1,k}$ and $H^{2,k}$ with values in finite signal sets $S$ and $T$.

The game $G$ is played as follows: $k$ is selected according to a publicly known distribution $p$. Player 1 (the maximizer) is informed of the outcome but Player 2 is not. Inductively at stage $m$, Player 1 chooses an action $i_m$ in $I$ and Player 2 chooses an action $j_m$ in $J$. The stage payoff is $x_m = A^k_{i_m j_m}$ but is not announced. Rather the signal $s_m = H^{1,k}_{i_m j_m}$ (resp. $t_m = H^{2,k}_{i_m j_m}$) is told to Player 1 (resp. Player 2). We assume that for each player his signal contains his move. A behavioral strategy for Player 1 is described by a sequence $\sigma = \{\sigma_m\}$ of mappings from $K \times S^{m-1}$ to the set of mixed actions $\Delta(I)$. Similarly, but taking into account his lack of information, a behavioral strategy of Player 2 is a sequence $\tau = \{\tau_m\}$ of mappings from past signals $T^{m-1}$ to $\Delta(J)$.

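Denote by $\Sigma$ and $\mathcal{T}$ the corresponding sets. Any triple $(p, \sigma, \tau)$ in $\Delta(K) \times \Sigma \times \mathcal{T}$ induces a probability distribution on the set of plays $K \times (S \times T)^\infty$ and $E_{p,\sigma,\tau}$ denotes the induced expectation.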

The game $\bar G$ is played the same way as the game $G$ but with a different Player 2 at every stage. However each of these players knows the history of signals sent to his predecessors. We call 2.m the Player 2 active at stage $m$, hence his strategy is also described by a mapping $\theta_m$ from $T^{m-1}$ to $\Delta(J)$.

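For each class of games we consider the finitely repeated and discounted versions:

$G_n(p)$ is the $n$-stage two-person zero-sum game with payoff for Player 1 given by

$$\gamma^1_n(p)(\sigma, \tau) = E_{p,\sigma,\tau}\left(\frac{1}{n}\sum_{m=1}^{n} x_m\right).$$

$G_n(p)$ has a value $v_n(p)$.

$G_\lambda(p)$ is the $\lambda$-discounted two-person zero-sum game with payoff for Player 1 given by

$$\gamma^1_\lambda(p)(\sigma, \tau) = E_{p,\sigma,\tau}\left(\lambda\sum_{m=1}^{\infty} (1-\lambda)^{m-1} x_m\right).$$

$G_\lambda(p)$ has a value $v_\lambda(p)$.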
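As a small illustration of the two evaluations, the sketch below computes the $n$-stage average and the $\lambda$-discounted sum for one realised stream of stage payoffs $x_1, x_2, \dots$. It leaves out the expectation over plays, and the function names `average_payoff` and `discounted_payoff` are hypothetical, chosen only for this example.

```python
from typing import Sequence


def average_payoff(stage_payoffs: Sequence[float]) -> float:
    """Cesaro mean (1/n) * sum_{m=1..n} x_m of a realised n-stage stream."""
    n = len(stage_payoffs)
    if n == 0:
        raise ValueError("need at least one stage payoff")
    return sum(stage_payoffs) / n


def discounted_payoff(stage_payoffs: Sequence[float], lam: float) -> float:
    """lam * sum_m (1 - lam)^(m-1) * x_m, truncated to the stages supplied."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    # enumerate starts at 0, which is the exponent m - 1 for stage m = 1.
    return lam * sum((1.0 - lam) ** e * x for e, x in enumerate(stage_payoffs))
```

Both evaluations are normalised so that a constant stream equal to $c$ is worth $c$ (exactly for the average, in the limit of an infinite stream for the discounted sum).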

$\bar G_n(p)$, the $n$-stage game with a sequence of Players 2, is an $(n+1)$-player game with payoff $\gamma^1_n(p)(\sigma, \theta) = \gamma^1_n(p)(\sigma, \theta_1, \dots, \theta_n)$ for Player 1 and $\gamma^{2,m}(p)(\sigma, \theta) = E_{p,\sigma,\theta}(-x_m)$ for Player 2.m, $m = 1, \dots, n$. (Note that this last payoff depends only upon $(\sigma_l, \theta_l)$ for $l \le m$.) $E_n(p)$ denotes the set of equilibrium payoffs of Player 1 in $\bar G_n(p)$.

$\bar G_\lambda(p)$ is a game with an infinite family of Players 2. The payoff is $\gamma^1_\lambda(p)(\sigma, \theta)$ for Player 1 and again $\gamma^{2,m}(p)(\sigma, \theta) = E_{p,\sigma,\theta}(-x_m)$ for Player 2.m, $m \ge 1$. The existence of equilibria in $\bar G_\lambda(p)$ follows from standard approximation by truncated games with finitely many Players 2. Let $E_\lambda(p)$ denote the corresponding set of equilibrium payoffs for Player 1.

The purpose of this note is to compare $v_n(p)$ and $E_n(p)$ (respectively $v_\lambda(p)$ and $E_\lambda(p)$), namely to examine whether it is more profitable for the informed player to face a single long-lived opponent or a sequence of short-lived opponents.

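2. A first result

Player 1 is always better off facing a sequence of short-lived opponents since we have:

Proposition 1. $\forall p,\ \forall n \ge 1,\ \forall y \in E_n(p)$: $y \ge v_n(p)$.

Similarly $\forall p,\ \forall \lambda \in (0,1),\ \forall z \in E_\lambda(p)$: $z \ge v_\lambda(p)$.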

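Proof: The proof is the same in both cases.

Player 1 faces the same set of strategies $\mathcal{T}$ in $G$ and in the sequence game $\bar G$, since any profile $\theta = (\theta_m)$ of the successive Players 2 defines a strategy in $\mathcal{T}$. Thus the use by Player 1 in $\bar G_n(p)$ or $\bar G_\lambda(p)$ of an optimal strategy in $G_n(p)$ or $G_\lambda(p)$ implies the result. ∎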
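Spelled out for the $n$-stage case, the argument is the chain below. Here $(\sigma, \theta)$ is an equilibrium of the sequence game with payoff $y$ for Player 1, and $\sigma^*$ is an optimal strategy of Player 1 in $G_n(p)$. The symbol $\sigma^*$ is notation introduced only for this sketch. The first inequality is the equilibrium condition for Player 1 (deviating to $\sigma^*$ is not profitable), the second holds because $\theta$ is a particular element of $\mathcal{T}$, and the final equality is the optimality of $\sigma^*$.

```latex
y \;=\; \gamma^1_n(p)(\sigma,\theta)
  \;\ge\; \gamma^1_n(p)(\sigma^*,\theta)
  \;\ge\; \min_{\tau \in \mathcal{T}} \gamma^1_n(p)(\sigma^*,\tau)
  \;=\; v_n(p).
```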


3. Reduced strategies

Definition. $\tau$ is a reduced strategy of Player 2 if it depends upon his signals only through the moves of Player 1.

In other words $\tau_m$ is measurable with respect to the algebra generated by $(i_1, \dots, i_{m-1})$. Explicitly: say that $(i_1, \dots, i_{m-1})$ is compatible with $(t_1, \dots, t_{m-1})$ if there exists $k$ with $t_l = H^{2,k}_{i_l, j_l}$ for $l = 1, \dots, m-1$ (recall that $t_l$ determines $j_l$). Now if some $(i_1, \dots, i_{m-1})$ is compatible both with $(t_1, \dots, t_{m-1})$ and $(t'_1, \dots, t'_{m-1})$ then $\tau_m(t_1, \dots, t_{m-1}) = \tau_m(t'_1, \dots, t'_{m-1})$.
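The compatibility relation can be checked mechanically. The sketch below is a minimal illustration, assuming signalling matrices are stored as nested lists `H2[k][i][j]`. The function name `compatible` and this data layout are hypothetical. The test data reuse the signalling matrix of the example in Section 4 for both states.

```python
from typing import Hashable, List, Sequence

SignalMatrix = List[List[Hashable]]  # H2k[i][j]: signal to Player 2 in state k


def compatible(
    moves: Sequence[int],
    signals: Sequence[Hashable],
    H2: Sequence[SignalMatrix],
) -> bool:
    """True if some state k explains every signal as t_l = H2[k][i_l][j_l]."""
    if len(moves) != len(signals):
        return False
    # "t in H2k[i]" asks whether some move j of Player 2 yields signal t.
    return any(
        all(t in H2k[i] for i, t in zip(moves, signals))
        for H2k in H2
    )


# Rows T (index 0) and B (index 1), columns L, M, R; same matrix in both states.
H2_example = [[["a", "b", "c"], ["a", "b", "f"]]] * 2
```

With this matrix the signal history `("a", "b")` is compatible with every sequence of two moves of Player 1, so a reduced strategy must prescribe the same mixed action after all such histories.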

Theorem 2. If Player 2 has a reduced optimal strategy in $G_n(p)$ (resp. $G_\lambda(p)$), then $E_n(p)$ is reduced to $\{v_n(p)\}$ (resp. $E_\lambda(p)$ is reduced to $\{v_\lambda(p)\}$).

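Proof: Let $(\sigma, \theta)$ be an equilibrium in the sequence game $\bar G_n(p)$. Let $\tau^*$ be a reduced optimal strategy of Player 2 in $G_n(p)$. The equilibrium condition for Player 2.m gives:

$$\gamma^{2,m}(\sigma, \theta) \ge \gamma^{2,m}(\sigma, \theta_{-m}, \tau^*_m)$$

The idea is to define inductively a fictitious strategy of Player 1, $\tilde\sigma$, as a function of $\sigma$ and $\theta$ such that, for all $m = 1, \dots, n$:

$$\gamma^{2,m}(\sigma, \theta_{-m}, \tau^*_m) = \gamma^{2,m}(\tilde\sigma, \tau^*).$$

$\tilde\sigma$ is a sequence of maps $\tilde\sigma_m$ from $K \times I^{m-1}$ to $\Delta(I)$ (hence a strategy of Player 1 independent of the moves of Player 2) defined inductively as follows:

$$\tilde\sigma^k_1 = \sigma^k_1$$

$$\tilde\sigma^k_2(i_1) = \sum_{s_1 \in S,\ j_1 \in J} \theta_1(j_1)\, \mathbf{1}_{\{H^{1,k}(i_1, j_1) = s_1\}}\, \sigma^k_2(s_1)$$

In words, Player 1 is playing at stage 2 the expectation of his initial strategy with respect to the distribution on signals induced by $\theta_1$. Similarly, at stage $m$, $\tilde\sigma_m$ is defined by taking the conditional expectation on the signals of Player 1, given his moves:

$$\tilde\sigma^k_m(i_1, \dots, i_{m-1}) = E\big(\sigma^k_m(s_1, \dots, s_{m-1}) \mid i_1, \dots, i_{m-1}\big) = \sum_{s_1, \dots, s_{m-1}} P^k(s_1, \dots, s_{m-1} \mid i_1, \dots, i_{m-1})\, \sigma^k_m(s_1, \dots, s_{m-1})$$

where $P^k(s_1, \dots, s_{m-1} \mid i_1, \dots, i_{m-1})$ is the probability (under $\sigma^k$ and $\theta$) of $s_1, \dots, s_{m-1}$ given $i_1, \dots, i_{m-1}$ (recall that $s$ determines $i$).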
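The stage-2 formula is simple enough to compute directly. The sketch below evaluates $\tilde\sigma^k_2(i_1)$ for one state $k$ by averaging $\sigma^k_2(s_1)$ over the signals that $\theta_1$ induces. The function name `reduced_stage2` and the data layout (a signal matrix `H1k[i][j]` and a dictionary from signals to mixed actions) are assumptions made for this illustration only.

```python
from typing import Dict, Hashable, List

Mixed = List[float]  # a mixed action over I, indexed by move


def reduced_stage2(
    H1k: List[List[Hashable]],       # H1k[i][j]: Player 1's signal in state k
    theta1: List[float],             # theta_1(j): mixed action of Player 2.1
    sigma2k: Dict[Hashable, Mixed],  # sigma_2^k(s_1): stage-2 action after s_1
    i1: int,                         # Player 1's own first-stage move
) -> Mixed:
    """sum over j1 of theta_1(j1) * sigma_2^k(H1k[i1][j1])."""
    n_moves = len(next(iter(sigma2k.values())))
    out = [0.0] * n_moves
    for j1, weight in enumerate(theta1):
        s1 = H1k[i1][j1]  # the unique signal with H^{1,k}(i1, j1) = s1
        for i2, prob in enumerate(sigma2k[s1]):
            out[i2] += weight * prob
    return out


# Standard signalling with two moves per player: the signal is the pair (i, j).
H1_std = [[(i, j) for j in range(2)] for i in range(2)]
sigma2 = {(0, 0): [1.0, 0.0], (0, 1): [0.0, 1.0],
          (1, 0): [0.5, 0.5], (1, 1): [0.5, 0.5]}
tilde_after_0 = reduced_stage2(H1_std, [0.25, 0.75], sigma2, i1=0)
```

In this toy case $\sigma_2$ reacts to Player 2's first move after $i_1 = 0$, while the reduced strategy depends on $i_1$ only and plays the $\theta_1$-average of those reactions.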

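We now have:

$$\gamma^{2,m}(p)(\sigma, \theta_{-m}, \tau^*_m) = \sum_{k \in K} p^k \sum_{s_1, t_1, \dots, s_{m-1}, t_{m-1}} P^k(s_1, t_1, \dots, s_{m-1}, t_{m-1})\, \sigma^k_m(s_1, \dots, s_{m-1})\, (-A^k)\, \tau^*_m(t_1, \dots, t_{m-1})$$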


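Taking conditional expectation with respect to $(i_1, \dots, i_{m-1})$ we have, since $\tau^*$ is reduced, hence measurable w.r.t. $(i_1, \dots, i_{m-1})$ (and $E^k$ stands for the expectation w.r.t. $\sigma^k$ and $\theta$):

$$\begin{aligned}
\gamma^{2,m}(p)(\sigma, \theta_{-m}, \tau^*_m) &= \sum_{k \in K} p^k E^k\big(\sigma^k_m (-A^k) \tau^*_m\big) \\
&= \sum_{k \in K} p^k E^k\big(E^k(\sigma^k_m (-A^k) \tau^*_m \mid i_1, \dots, i_{m-1})\big) \\
&= \sum_{k \in K} p^k E^k\big(E^k(\sigma^k_m \mid i_1, \dots, i_{m-1}) (-A^k) \tau^*_m\big) \\
&= \sum_{k \in K} p^k E^k\big(\tilde\sigma^k_m (-A^k) \tau^*_m\big)
\end{aligned}$$

Since the distribution of $(i_1, \dots, i_{m-1})$ is the same under $(\sigma, \theta_{-m}, \tau^*_m)$ and $(\tilde\sigma, \tau^*)$, we obtain:

$$\gamma^{2,m}(p)(\sigma, \theta_{-m}, \tau^*_m) = \gamma^{2,m}(p)(\tilde\sigma, \tau^*)$$

It follows that:

$$\begin{aligned}
n \gamma^1_n(p)(\sigma, \theta) &= -\sum_{m=1}^{n} \gamma^{2,m}(p)(\sigma, \theta) \le -\sum_{m=1}^{n} \gamma^{2,m}(p)(\sigma, \theta_{-m}, \tau^*_m) \\
&= -\sum_{m=1}^{n} \gamma^{2,m}(p)(\tilde\sigma, \tau^*) = n \gamma^1_n(p)(\tilde\sigma, \tau^*) \le n v_n(p)
\end{aligned}$$

Hence

$$v_n(p) \ge \gamma^1_n(p)(\sigma, \theta)$$

which, together with the reverse inequality of Proposition 1, was to be shown.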

The proof is analogous in the discounted case. ∎

Definition. A non strategic signalling structure for Player 2 satisfies:

$$H^{2,k}(i, j) = H^{2,k}(i', j) \iff H^{2,k}(i, j') = H^{2,k}(i', j') \qquad \forall j, j'$$

In words, the information that Player 2 obtains on Player 1's move is independent of his own move. A typical example is standard signalling where $H^{2,k}(i, j) = (i, j)$.
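The condition can be tested on a single signalling matrix as follows. This is a minimal sketch with a hypothetical function name `is_non_strategic`. It checks, for every pair of rows, that either all columns separate the two rows or none does. The second test matrix is the one used in the example of Section 4.

```python
from itertools import product
from typing import Hashable, List


def is_non_strategic(H2k: List[List[Hashable]]) -> bool:
    """Check H(i,j) == H(i',j)  <=>  H(i,j') == H(i',j') for all i, i', j, j'."""
    rows = range(len(H2k))
    cols = range(len(H2k[0]))
    for i, i_alt in product(rows, rows):
        # For each column j, record whether it fails to separate the two rows.
        merged = {H2k[i][j] == H2k[i_alt][j] for j in cols}
        if len(merged) > 1:  # some column separates the rows, another does not
            return False
    return True


standard = [[(i, j) for j in range(3)] for i in range(2)]  # H(i, j) = (i, j)
section4 = [["a", "b", "c"], ["a", "b", "f"]]              # rows T, B
```

Standard signalling passes the test. The Section 4 matrix does not, because Player 2 learns Player 1's move only when he plays the third column.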

Theorem 3. Assume non strategic signalling for Player 2. Then $E_n(p)$ is reduced to $\{v_n(p)\}$ (resp. $E_\lambda(p)$ is reduced to $\{v_\lambda(p)\}$).

Proof: The proof follows from Theorem 2 and the fact that under non strategic signalling Player 2 has a reduced optimal strategy.


This property can be obtained using the recursive formula for the (primal) game with incomplete information on one side ([1], [4]): since Player 1 knows the posterior distribution of Player 2, he does not have to know the moves.

The recursive formula for the dual game now implies the result (see [2], [5]).

An alternative direct approach is as follows (see [3], [6]). Assume that Player 1 plays $\sigma$, independent of the moves of Player 2. Then any $\tau$ will induce at each stage the same total distribution on moves as its expectation $\tau'$ with respect to Player 2's previous moves. In particular the payoffs given $(\sigma, \tau)$ and $(\sigma, \tau')$ will coincide. On the other hand, if Player 2 uses a reduced strategy $\tau$, the payoff of Player 1 is the same for a strategy $\sigma$ or for $\sigma'$, its expectation (given $\tau$) with respect to Player 2's moves. Denoting by $\Sigma'$ and $\mathcal{T}'$ the corresponding sets of reduced strategies one thus has:

$$\min_{\mathcal{T}} \max_{\Sigma'} = \min_{\mathcal{T}'} \max_{\Sigma'}, \qquad \min_{\mathcal{T}'} \max_{\Sigma} = \min_{\mathcal{T}'} \max_{\Sigma'}$$

hence:

$$\min_{\mathcal{T}'} \max_{\Sigma} \le \min_{\mathcal{T}} \max_{\Sigma} \ \big(= \max_{\Sigma} \min_{\mathcal{T}}\big)$$

So that Player 2 has an optimal reduced strategy. ∎

Remark: The same result holds if Player 2 receives a random signal independent of his moves. However in this case Player 1 may be unable to compute the posterior distribution of Player 2 and there is no recursive formula for the value on $\Delta(K)$.

4. An example

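The following example, due to Ponssard and Sorin, was initially constructed to show the impact of the signalling structure on the speed of convergence of the values $v_n(p)$ (see [4] ex. 9, p. 303). The data are:

$$I = \{T, B\}; \quad J = \{L, M, R\}; \quad K = \{k_1, k_2\}$$

$$A^{k_1} = \begin{pmatrix} 0 & 2 & 2 \\ 0 & 2 & -3 \end{pmatrix} \qquad A^{k_2} = \begin{pmatrix} 2 & 0 & -3 \\ 2 & 0 & 2 \end{pmatrix}$$

$H^{1,k}(i, j) = (i, j)$ (standard signalling). The signalling matrix for Player 2 is deterministic, independent of the state and given by:

$$H^2 = \begin{pmatrix} a & b & c \\ a & b & f \end{pmatrix}$$

Then one has (with $p = p^1$):

$$v_\infty(p) = \begin{cases} 4/7 & \text{on } [2/7,\ 5/7], \\ 2\min\{p, 1-p\} & \text{otherwise,} \end{cases}$$

$$v_1(p) = 2\min\{p, 1-p\}.$$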
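The stated formula for $v_\infty$ can be checked numerically against the concavification of a non-revealing value $u(p)$. The sketch below rests on two assumptions that are not spelled out in the text. First, the entries of column $R$ are taken to be $(2, -3)$ in state $k_1$ and $(-3, 2)$ in state $k_2$, a sign pattern inferred from the one-shot matrix and the value formulas. Second, $u(p)$ is computed by forcing Player 1 to use the same mixed action in both states, which is motivated by the fact that move $R$ lets Player 2 observe Player 1's move. The helper names `u`, `concavify` and `cav_u` are hypothetical.

```python
from fractions import Fraction as F

A1 = [[0, 2, 2], [0, 2, -3]]   # state k1: rows T, B; columns L, M, R (assumed signs)
A2 = [[2, 0, -3], [2, 0, 2]]   # state k2


def u(p):
    """Value of the matrix game p*A1 + (1-p)*A2 (same mixed action in both states)."""
    top = [p * a + (1 - p) * b for a, b in zip(A1[0], A2[0])]
    bot = [p * a + (1 - p) * b for a, b in zip(A1[1], A2[1])]
    # A max-min over two rows is attained at x = 0, x = 1, or where two
    # column lines x*top[j] + (1-x)*bot[j] cross; collect those weights x on T.
    xs = {F(0), F(1)}
    for j in range(len(top)):
        for l in range(j + 1, len(top)):
            slope = (top[j] - bot[j]) - (top[l] - bot[l])
            if slope != 0:
                x = (bot[l] - bot[j]) / slope
                if 0 <= x <= 1:
                    xs.add(x)
    return max(min(x * t + (1 - x) * b for t, b in zip(top, bot)) for x in xs)


def concavify(points):
    """Vertices of the upper concave envelope of points sorted by abscissa."""
    hull = []
    for q in points:
        # Drop the last vertex while it lies on or below the chord to q.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (q[0] - x1) <= (q[1] - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(q)
    return hull


def cav_u(p, hull):
    """Evaluate the envelope at p by linear interpolation between vertices."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= p <= x2:
            return y1 + (y2 - y1) * (p - x1) / (x2 - x1)
    raise ValueError("p outside [0, 1]")


grid = [F(k, 70) for k in range(71)]          # contains 2/7, 1/2 and 5/7 exactly
hull = concavify([(p, u(p)) for p in grid])
```

On this grid the envelope equals $4/7$ between $2/7$ and $5/7$ and $2\min\{p, 1-p\}$ elsewhere, in agreement with the formula above, while $u(1/2) = -1/2$.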


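Proposition. For $p = 1/2$ and $n$ large enough, there exists an equilibrium $(\sigma, \theta)$ of the sequence game $\bar G_n(p)$ satisfying:

$$\gamma^1_n(p)(\sigma, \theta) > v_n(p)$$

A similar property holds for $\bar G_\lambda(p)$.

Proof: Simply define $\sigma^1$ as always $T$, $\sigma^2$ as always $B$ and $\theta_m = (1/2, 1/2, 0)$. The one shot game $G_1(1/2)$ is (row $XY$ meaning that Player 1 plays $X$ in state $k_1$ and $Y$ in state $k_2$):

$$\begin{array}{c|ccc}
 & L & M & R \\ \hline
TT & 1 & 1 & -1/2 \\
TB & 1 & 1 & 2 \\
BT & 1 & 1 & -3 \\
BB & 1 & 1 & -1/2
\end{array}$$

$\sigma$ is clearly a best reply to $\theta$.

At stage $m$, given $\sigma$ and $\theta$, the posterior probability on $K$ is still $(1/2, 1/2)$.

Player 2.m is thus facing the game $G_1(1/2)$ where $R$ is a dominated move. Hence $\theta_m$ is a best reply and $\gamma^1_n(p)(\sigma, \theta) = 1$.

Since $v_n(1/2)$ converges to $v_\infty(1/2) = 4/7$ the result follows.

The proof for $\bar G_\lambda$ is similar. ∎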
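The equilibrium claims of the proof can be verified by direct computation. The sketch below rebuilds the one shot game at $p = 1/2$ and checks both best-reply conditions. It assumes that column $R$ has entries $(2, -3)$ in state $k_1$ and $(-3, 2)$ in state $k_2$, a sign pattern inferred from the one shot matrix and the value formulas rather than stated explicitly. The names `one_shot_row`, `G1` and `payoff` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Payoff rows for columns L, M, R (signs of column R are an assumption).
A = {"k1": {"T": [0, 2, 2], "B": [0, 2, -3]},
     "k2": {"T": [2, 0, -3], "B": [2, 0, 2]}}
prior = {"k1": F(1, 2), "k2": F(1, 2)}
theta = [F(1, 2), F(1, 2), F(0)]  # each Player 2.m mixes equally on L and M


def one_shot_row(a1, a2):
    """Expected payoffs when Player 1 plays a1 in state k1 and a2 in state k2."""
    return [prior["k1"] * x + prior["k2"] * y
            for x, y in zip(A["k1"][a1], A["k2"][a2])]


G1 = {a1 + a2: one_shot_row(a1, a2) for a1, a2 in product("TB", repeat=2)}


def payoff(row, mixed):
    """Expected payoff of a row of G1 against a mixed action of Player 2."""
    return sum(q * x for q, x in zip(mixed, G1[row]))


row_payoffs = {row: payoff(row, theta) for row in G1}
# sigma = TB is a best reply if no row does better against theta.
sigma_is_best_reply = row_payoffs["TB"] == max(row_payoffs.values())
# theta is a best reply to TB if it achieves the smallest entry of that row.
theta_is_best_reply = payoff("TB", theta) == min(G1["TB"])
```

Under these assumptions both conditions hold and the stage payoff of the profile is 1, which exceeds $4/7$.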

Comments. A single uninformed Player 2 has a long-term incentive to buy the information by playing the move Right. Obviously such a strategy is dominated for a short-lived player.

Acknowledgments. We thank the referees for precise remarks.

References

[1] Aumann RJ, Maschler M, with the collaboration of Stearns RE (1995) Repeated games with incomplete information. MIT Press

[2] De Meyer B (1996) Repeated games and partial differential equations. Mathematics of Operations Research 21:209-236

[3] De Meyer B (1996) Repeated games, duality and the Central Limit theorem. Mathematics of Operations Research 21:237-251

[4] Mertens JF, Sorin S, Zamir S (1994) Repeated Games. CORE DP 9420-9421-9422

[5] Rosenberg D (1998) Duality and markovian strategies. International Journal of Game Theory 27:577-597

[6] Rosenberg D (1998) Sur les jeux répétés à somme nulle. Thèse, Université Paris X-Nanterre
