Sequence of opponents and reduced strategies Jean-Pierre Beaud

(1)

999²⁰⁰⁰9

Sequence of opponents and reduced strategies

Jean-Pierre Beaud1and Sylvain Sorin1,2

Received: March 1999/Revised: May 2000

Abstract.We consider the framework of repeated two-person zero-sum games with lack of information on one side. We compare the equilibrium payo¨s of the informed player in two cases: where he is facing either a) a single long- lived uninformed player, or b) a sequence of short-lived uninformed players.

We show: 1) that situation b) is always (weakly) better than a), 2) that it can be strictly better in some cases, 3) that the two cases are equivalent if the long uninformed player has an optimal strategy independent of his own moves.

Key words:incomplete information, sequence of opponents

1. Presentation

We consider a two person zero-sum repeated game with incomplete information on one side ([1]). There is a ®nite state spaceK. For eachkAK, aIJ real payo¨ matrix A^k is given, as well as two IJ signalling matricesH^1;k andH^2;k with values in ®nite signal setsSandT.

The game G is played as follows: k is selected according to a publicly known distribution p. Player 1 (the maximizer) is informed on the outcome but not Player 2. Inductively at stage m, Player 1 chooses an action im in I and Player 2 chooses an action j_minJ. The stage payo¨ isxmA_i^k_m_j_m but is not announced. Rather the signal s_mH_i^1;k_m_j_m (resp. t_mH_i^2;_m^k_j_m) is told to Player 1 (resp. Player 2). We assume that for each player his signal contains

1 Laboratoire d'EconomeÂtrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France he-mail: beaud@poly.polytechnique.fri

2 MODALX and THEMA, UFR SEGMI, UniversiteÂ Paris X, 200 Avenue de la ReÂpublique, 92001 Nanterre, Francehe-mail: sorin@poly.polytechnique.fri

(2)

his move. A behavioral strategy for Player 1 is described by a sequence s fs_mgof mappings fromKS^m ¹to the set of mixed actionsDI. Similarly, but taking into account his lack of information, a behavioral strategy of Player 2 is a sequencet ft_mg of mappings from past signalsT^m ¹ toDJ.

Denote bySandTthe corresponding sets. Any triplep;s;tinDK S T induces a probability distribution on the set of plays K ST^y and Ep;s;tdenotes the induced expectation.

The game G is played the same way as the game G but with a di¨erent Player 2 at every stage. However each of these players knows the history of signals sent to his predecessors. We call 2:mthe Player 2 active at stage m, hence his strategy's set is also described by a mappingy_mfromT^m ¹toDJ.

For each class of games we consider the ®nitely repeated and discounted version:

Gnp is then stage two person zero sum game with payo¨ for Player 1 given byg_n¹ps;t E_p;s;t 1

n X_n

m1x_m

.G_nphas a valuev_np.

G_lpis thel-discounted two person zero sum game with payo¨ for Player 1 given byg¹_lps;t E_p;_s;_tlP_y

m11 l^m ¹x_m.G_lphas a valuev_lp.

Gnp is a n1 player game with payo¨ g_n¹ps;y g_n¹p

s;y₁;. . .;y_n for Player 1 and g^2;^mps;y E_p;_s;_y x_m for Player 2:m,

m1;. . .;n. (Note that this last payo¨ depends only upon sl;yl for

lUm).E_npdenotes the set of equilibrium payo¨s of Player 1 inG_np.

Glp is a game with an in®nite family of Player 2. The payo¨ is g¹_lps;yfor Player 1 and againg^2;^mps;y E_p;s;y x_mfor Player 2:m, mV1. The existence of equilibria inG_lp follows from standard approxi- mation by truncated games with ®nitely many Player 2. LetE_lpdenote the corresponding set of equilibrium payo¨s for Player 1.

The purpose of this note is to comparevnpandEnp(respectivelyvlp

andE_lp), namely to examine whether it is pro®table or not for the informed player to face a single long lived opponent or a sequence of short lived opponents.

2. A ®rst result

Player 1 is always better o¨ facing a sequence of short lived opponents since we have:

Proposition 1.Ep;EnV1;EyAE_np: yVvnp

SimilarlyEp;ElA0;1;EzAElp:

zVvlp

Proof:The proof is the same in both cases.

Player 1 faces the same set of strategies Tin Gand G. Thus the use by Player 1 inG_nporG_lpof an optimal strategy inG_nporG_lpimplies

the result. 9

(3)

3. Reduced strategies

De®nition. tis a reduced strategy of player 2 if it depends upon his signals only through the moves of Player 1.

In other wordst_m is measurable with respect to the algebra generated by i1;. . .;im 1. Explicitely: say that i1;. . .;im 1 is compatible with t1;. . .; tm 1 if there exists k with tlH_i^2;k_l_;_j_l for l1;. . .;m 1, (recall that tl

determines j_l). Now if some i1;. . .;im 1 is compatible both with t1;. . .; t_m ₁andt₁⁰;. . .;t_m⁰ ₁thent_mt₁;. . .;t_m ₁ t_mt₁⁰;. . .;t_m⁰ ₁.

Theorem 2.If Player 2 has a reduced optimal strategy in G_np(resp. G_lp), thenEnpis reduced tofvnpg(resp.Elpis reduced tofvlpg.

Proof: Let s;y be an equilibrium in Gnp. Let t^? be a reduced optimal strategy of Player 2 inGnp. The equilibrium condition for Player 2:mgives:

g^2;ms;yVg^2;ms;y _m;t^?_m

The idea is to de®ne inductively a ®ctitious strategy of Player 1,s, as a func-~ tion ofsandysuch that, for allm1;. . .;n:

g^2;ms;y _m;t^?_m g^2;m~s;t^?:

~

sis a sequence of mapss~_mfromKI^m ¹toDI(hence a strategy of Player 1 independent of the moves of Player 2) de®ned inductively as follows:

~ s₁^k s₁^k

~

s₂^ki1 X

s1AS;j1AJ

y1j₁1_fH^1;k_i₁_;_j₁_s₁_gs₂^ks1

In words, Player 1 is playing at stage 2 the expectation of his initial strategy with respect to the distribution on signals induced byy1. Similarly, at stagem,

~

s_m is de®ned by taking the conditional expectation on the signals of Player 1, given his moves:

~

s_m^ki₁;. . .;i_m ₁ Es_m^ks₁;. . .;s_m ₁ ji₁;. . .;i_m ₁

X

s1;...;sm1

P^ks1;. . .;sm 1ji1;. . .;im 1s_m^ks1;. . .;sm 1

where P^ks₁;. . .;s_m ₁ji₁;. . .;i_m ₁ is the probability (under s^k and y) of s1;. . .;sm 1 giveni1;. . .;im 1(recall thatsdeterminesi).

We now have:

g^2;mps;y m;t^?_m X

kAK

p^k X

s1;t1;...;sm 1;tm1

P^ks1;t1;. . .;sm 1;tm 1

s_m^ks₁;. . .;s_m ₁ A^kt^?_mt₁;. . .;t_m ₁

(4)

Taking conditional expectation with respect to i1;. . .;im 1 we have, since t^? is reduced, hence measurable w.r.t. i₁;. . .;i_m ₁ (and E^k stands for the expectation w.r.t.s^kandy):

g^2;mps;y m;t^?_m X

kAK

p^kE^ks_m^k A^kt^?_m

X

kAK

p^kE^kE^ks_m^k A^kt^?_mji₁;. . .;i_m ₁

X

kAK

p^kE^kE^ks_m^k ji₁;. . .;i_m ₁ A^kt^?_m

X

kAK

p^kE^ks~_m^k A^kt^?_m

Since the distribution of i₁;. . .;i_m ₁ is the same under s;y _m;t^?_m and ~s;t^?, we obtain:

g^2;mps;y m;t^?_m g^2;mp~s;t^? It follows that:

ng¹_nps;y Xⁿ

m1

g^2;mps;yU Xⁿ

m1

g^2;^mps;y _m;t^?_m

Xⁿ

m1

g^2;mp~s;t^? ng¹_np~s;t^?Unv_np

Hence

vnpVg_n¹ps;y

which was to be shown.

The proof is analogous in the discounted case. 9 De®nition.Anon strategic signallingstructure for Player 2 satis®es:

H^2;^ki;j H^2;ki⁰;j ,H^2;ki;j⁰ H^2;ki⁰;j⁰ Ej;j⁰

In words, the information that Player 2 obtains on Player 1's move is independent of his own move. A typical example is standard signalling where H^2;ki;j i;j.

Theorem 3.Assume non strategic signalling for Player 2. ThenE_npis reduced tofvnpg(resp.Elpis reduced tofvlpg.

Proof:The proof follows from Theorem 2 and the fact that under non strategic signalling Player 2 has a reduced optimal strategy.

(5)

This property can be obtained using the recursive formula for the (primal) game with incomplete information on one side ([1], [4]): since Player 1 knows the posterior distribution of Player 2, he does not have to know the moves.

The recursive formula for the dual game now implies the result (see [2], [5]).

An alternative direct approach is as follows (see [3], [6]). Assume that Player 1 playss, independent of the moves of Player 2. Then anytwill induce at each stage the same total distirbution on moves than its expectationt⁰with respect to Player 2's previous moves. In particular the payo¨s givens;tand s;t⁰will coõÈncide. On the other hand, if Player 2 uses a reduced strategyt, the payo¨ of Player 1 is the same for a strategy s or for s⁰, its expectation (givent) with respect to Player 2's moves. Denoting byS⁰andT⁰the corresponding sets of reduced strategies one thus has:

minTmax_S⁰ min_T⁰max_S⁰ min_T⁰maxSmin_T⁰max_S⁰ hence:

min_T⁰max_SUmin_Tmax_Smax_Smin_T

So that Player 2 has an optimal reduced strategy. 9 Remark:The same result holds if Player 2 receives a random signal independent of his moves. However in this case Player 1 may be unable to compute the posterior distribution of Player 2 and there is no recursive formula for the value onDK.

4. An example

The following example, due to Ponssard and Sorin, was initially constructed to show the impact of the signalling structure on the speed of convergence of the valuesv_np, (see [4] ex. 9, p. 303). The data are:

I fT;Bg; J fL;M;Rg; K fk₁;k₂g

A^k¹ 0 2 2

0 2 3

A^k² 2 0 3

2 0 2

H^1;ki;j i;j (standard signalling). The signalling matrix for Player 2 is deterministic, independent of the state and given by:

H² a b c a b f

Then one has (with pp¹):

v_yp ⁴⁷ on ²₇;⁵₇, 2 minfp;1 pg otherwise.

v₁p 2 minfp;1 pg:

(6)

Proposition.For p¹₂and n large enough, there exists an equilibriums;yof G_npsatisfying:

g¹_nps;y>v_np

A similar property holds forG_lp.

Proof: Simply de®nes¹ as always T,s² as always B andym ¹₂;¹₂;0. The one shot gameG₁¹₂is

0 BB B@

L M R

TT 1 1 ¹₂

TB 1 1 2

BT 1 1 3

BB 1 1 ¹₂ 1 CC CA

sis clearly a best reply toy.

At stage m, given s and y, the posterior probability on K is still ¹₂;¹₂.

Player 2:mis thus facing the gameG1¹₂whereRis a dominated move. Hence ymis a best reply andg¹_nps;y ¹₂.

Sincevn¹₂converges tovy₂¹ ⁴₇the result follows.

The proof forGlis similar. 9

Comments.A single uninformed Player 2 has a long term incentive to buy the information in playing the move Right. Obviously such a strategy is dominated for a short lived Player.

Acknowledgments.We thank the referees for precise remarks.

References

[1] Aumann RJ, Maschler M, with the collaboration of Stearns RE (1995) Repeated games with incomplete information. MIT Press

[2] De Meyer B (1996) Repeated games and partial di¨erential equations. Mathematics of Oper- ations Research 21:209±236

[3] De Meyer B (1996) Repeated games, duality and the Central Limit theorem. Mathematics of Operations Research 21:237±251

[4] Mertens JF, Sorin S, Zamir S (1994) Repeated Games. CORE DP 9420-9421-9422

[5] Rosenberg D (1998) Duality and markovian strategies. International Journal of Game Theory 27:577±597

[6] Rosenberg D (1998) Sur les jeux reÂpeÂteÂs aÁ somme nulle. TheÁse, UniversiteÂ Paris X-Nanterre