99920009
Sequence of opponents and reduced strategies
Jean-Pierre Beaud1and Sylvain Sorin1,2
Received: March 1999/Revised: May 2000
Abstract.We consider the framework of repeated two-person zero-sum games with lack of information on one side. We compare the equilibrium payo¨s of the informed player in two cases: where he is facing either a) a single long- lived uninformed player, or b) a sequence of short-lived uninformed players.
We show: 1) that situation b) is always (weakly) better than a), 2) that it can be strictly better in some cases, 3) that the two cases are equivalent if the long uninformed player has an optimal strategy independent of his own moves.
Key words:incomplete information, sequence of opponents
1. Presentation
We consider a two person zero-sum repeated game with incomplete informa- tion on one side ([1]). There is a ®nite state spaceK. For eachkAK, aIJ real payo¨ matrix Ak is given, as well as two IJ signalling matricesH1;k andH2;k with values in ®nite signal setsSandT.
The game G is played as follows: k is selected according to a publicly known distribution p. Player 1 (the maximizer) is informed on the outcome but not Player 2. Inductively at stage m, Player 1 chooses an action im in I and Player 2 chooses an action jminJ. The stage payo¨ isxmAikmjm but is not announced. Rather the signal smHi1;kmjm (resp. tmHi2;mkjm) is told to Player 1 (resp. Player 2). We assume that for each player his signal contains
1 Laboratoire d'EconomeÂtrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France he-mail: beaud@poly.polytechnique.fri
2 MODALX and THEMA, UFR SEGMI, Universite Paris X, 200 Avenue de la ReÂpublique, 92001 Nanterre, Francehe-mail: sorin@poly.polytechnique.fri
his move. A behavioral strategy for Player 1 is described by a sequence s fsmgof mappings fromKSm 1to the set of mixed actionsD I. Similarly, but taking into account his lack of information, a behavioral strategy of Player 2 is a sequencet ftmg of mappings from past signalsTm 1 toD J.
Denote bySandTthe corresponding sets. Any triple p;s;tinD K S T induces a probability distribution on the set of plays K STy and Ep;s;tdenotes the induced expectation.
The game G is played the same way as the game G but with a di¨erent Player 2 at every stage. However each of these players knows the history of signals sent to his predecessors. We call 2:mthe Player 2 active at stage m, hence his strategy's set is also described by a mappingymfromTm 1toD J.
For each class of games we consider the ®nitely repeated and discounted version:
Gn p is then stage two person zero sum game with payo¨ for Player 1 given bygn1 p s;t Ep;s;t 1
n Xn
m1xm
.Gn phas a valuevn p.
Gl pis thel-discounted two person zero sum game with payo¨ for Player 1 given byg1l p s;t Ep;s;t lPy
m1 1 lm 1xm.Gl phas a valuevl p.
Gn p is a n1 player game with payo¨ gn1 p s;y gn1 p
s;y1;. . .;yn for Player 1 and g2;m p s;y Ep;s;y xm for Player 2:m,
m1;. . .;n. (Note that this last payo¨ depends only upon sl;yl for
lUm).En pdenotes the set of equilibrium payo¨s of Player 1 inGn p.
Gl p is a game with an in®nite family of Player 2. The payo¨ is g1l p s;yfor Player 1 and againg2;m p s;y Ep;s;y xmfor Player 2:m, mV1. The existence of equilibria inGl p follows from standard approxi- mation by truncated games with ®nitely many Player 2. LetEl pdenote the corresponding set of equilibrium payo¨s for Player 1.
The purpose of this note is to comparevn pandEn p(respectivelyvl p
andEl p), namely to examine whether it is pro®table or not for the informed player to face a single long lived opponent or a sequence of short lived oppo- nents.
2. A ®rst result
Player 1 is always better o¨ facing a sequence of short lived opponents since we have:
Proposition 1.Ep;EnV1;EyAEn p: yVvn p
SimilarlyEp;ElA 0;1;EzAEl p:
zVvl p
Proof:The proof is the same in both cases.
Player 1 faces the same set of strategies Tin Gand G. Thus the use by Player 1 inGn porGl pof an optimal strategy inGn porGl pimplies
the result. 9
3. Reduced strategies
De®nition. tis a reduced strategy of player 2 if it depends upon his signals only through the moves of Player 1.
In other wordstm is measurable with respect to the algebra generated by i1;. . .;im 1. Explicitely: say that i1;. . .;im 1 is compatible with t1;. . .; tm 1 if there exists k with tlHi2;kl;jl for l1;. . .;m 1, (recall that tl
determines jl). Now if some i1;. . .;im 1 is compatible both with t1;. . .; tm 1and t10;. . .;tm0 1thentm t1;. . .;tm 1 tm t10;. . .;tm0 1.
Theorem 2.If Player 2 has a reduced optimal strategy in Gn p(resp. Gl p), thenEn pis reduced tofvn pg(resp.El pis reduced tofvl pg.
Proof: Let s;y be an equilibrium in Gn p. Let t? be a reduced optimal strategy of Player 2 inGn p. The equilibrium condition for Player 2:mgives:
g2;m s;yVg2;m s;y m;t?m
The idea is to de®ne inductively a ®ctitious strategy of Player 1,s, as a func-~ tion ofsandysuch that, for allm1;. . .;n:
g2;m s;y m;t?m g2;m ~s;t?:
~
sis a sequence of mapss~mfromKIm 1toD I(hence a strategy of Player 1 independent of the moves of Player 2) de®ned inductively as follows:
~ s1k s1k
~
s2k i1 X
s1AS;j1AJ
y1 j11fH1;k i1;j1s1gs2k s1
In words, Player 1 is playing at stage 2 the expectation of his initial strategy with respect to the distribution on signals induced byy1. Similarly, at stagem,
~
sm is de®ned by taking the conditional expectation on the signals of Player 1, given his moves:
~
smk i1;. . .;im 1 E smk s1;. . .;sm 1 ji1;. . .;im 1
X
s1;...;sm1
Pk s1;. . .;sm 1ji1;. . .;im 1smk s1;. . .;sm 1
where Pk s1;. . .;sm 1ji1;. . .;im 1 is the probability (under sk and y) of s1;. . .;sm 1 giveni1;. . .;im 1(recall thatsdeterminesi).
We now have:
g2;m p s;y m;t?m X
kAK
pk X
s1;t1;...;sm 1;tm1
Pk s1;t1;. . .;sm 1;tm 1
smk s1;. . .;sm 1 Akt?m t1;. . .;tm 1
Taking conditional expectation with respect to i1;. . .;im 1 we have, since t? is reduced, hence measurable w.r.t. i1;. . .;im 1 (and Ek stands for the expectation w.r.t.skandy):
g2;m p s;y m;t?m X
kAK
pkEk smk Akt?m
X
kAK
pkEk Ek smk Akt?mji1;. . .;im 1
X
kAK
pkEk Ek smk ji1;. . .;im 1 Akt?m
X
kAK
pkEk s~mk Akt?m
Since the distribution of i1;. . .;im 1 is the same under s;y m;t?m and ~s;t?, we obtain:
g2;m p s;y m;t?m g2;m p ~s;t? It follows that:
ng1n p s;y Xn
m1
g2;m p s;yU Xn
m1
g2;m p s;y m;t?m
Xn
m1
g2;m p ~s;t? ng1n p ~s;t?Unvn p
Hence
vn pVgn1 p s;y
which was to be shown.
The proof is analogous in the discounted case. 9 De®nition.Anon strategic signallingstructure for Player 2 satis®es:
H2;k i;j H2;k i0;j ,H2;k i;j0 H2;k i0;j0 Ej;j0
In words, the information that Player 2 obtains on Player 1's move is inde- pendent of his own move. A typical example is standard signalling where H2;k i;j i;j.
Theorem 3.Assume non strategic signalling for Player 2. ThenEn pis reduced tofvn pg(resp.El pis reduced tofvl pg.
Proof:The proof follows from Theorem 2 and the fact that under non strate- gic signalling Player 2 has a reduced optimal strategy.
This property can be obtained using the recursive formula for the (primal) game with incomplete information on one side ([1], [4]): since Player 1 knows the posterior distribution of Player 2, he does not have to know the moves.
The recursive formula for the dual game now implies the result (see [2], [5]).
An alternative direct approach is as follows (see [3], [6]). Assume that Player 1 playss, independent of the moves of Player 2. Then anytwill induce at each stage the same total distirbution on moves than its expectationt0with respect to Player 2's previous moves. In particular the payo¨s given s;tand s;t0will coõÈncide. On the other hand, if Player 2 uses a reduced strategyt, the payo¨ of Player 1 is the same for a strategy s or for s0, its expectation (givent) with respect to Player 2's moves. Denoting byS0andT0the corre- sponding sets of reduced strategies one thus has:
minTmaxS0 minT0maxS0 minT0maxSminT0maxS0 hence:
minT0maxSUminTmaxS maxSminT
So that Player 2 has an optimal reduced strategy. 9 Remark:The same result holds if Player 2 receives a random signal indepen- dent of his moves. However in this case Player 1 may be unable to compute the posterior distribution of Player 2 and there is no recursive formula for the value onD K.
4. An example
The following example, due to Ponssard and Sorin, was initially constructed to show the impact of the signalling structure on the speed of convergence of the valuesvn p, (see [4] ex. 9, p. 303). The data are:
I fT;Bg; J fL;M;Rg; K fk1;k2g
Ak1 0 2 2
0 2 3
Ak2 2 0 3
2 0 2
H1;k i;j i;j (standard signalling). The signalling matrix for Player 2 is deterministic, independent of the state and given by:
H2 a b c a b f
Then one has (with pp1):
vy p 47 on 27;57, 2 minfp;1 pg otherwise.
v1 p 2 minfp;1 pg:
Proposition.For p12and n large enough, there exists an equilibrium s;yof Gn psatisfying:
g1n p s;y>vn p
A similar property holds forGl p.
Proof: Simply de®nes1 as always T,s2 as always B andym 12;12;0. The one shot gameG1 12is
0 BB B@
L M R
TT 1 1 12
TB 1 1 2
BT 1 1 3
BB 1 1 12 1 CC CA
sis clearly a best reply toy.
At stage m, given s and y, the posterior probability on K is still 12;12.
Player 2:mis thus facing the gameG1 12whereRis a dominated move. Hence ymis a best reply andg1n p s;y 12.
Sincevn 12converges tovy 21 47the result follows.
The proof forGlis similar. 9
Comments.A single uninformed Player 2 has a long term incentive to buy the information in playing the move Right. Obviously such a strategy is domi- nated for a short lived Player.
Acknowledgments.We thank the referees for precise remarks.
References
[1] Aumann RJ, Maschler M, with the collaboration of Stearns RE (1995) Repeated games with incomplete information. MIT Press
[2] De Meyer B (1996) Repeated games and partial di¨erential equations. Mathematics of Oper- ations Research 21:209±236
[3] De Meyer B (1996) Repeated games, duality and the Central Limit theorem. Mathematics of Operations Research 21:237±251
[4] Mertens JF, Sorin S, Zamir S (1994) Repeated Games. CORE DP 9420-9421-9422
[5] Rosenberg D (1998) Duality and markovian strategies. International Journal of Game Theory 27:577±597
[6] Rosenberg D (1998) Sur les jeux reÂpeÂteÂs aÁ somme nulle. TheÁse, Universite Paris X-Nanterre