On Repeated Games Without a Recursive Structure: Existence of lim $v_n$
By S. Sorin¹
Abstract: We consider repeated zero-sum games with symmetric incomplete information where at each stage the common signal is either non-revealing or completely revealing. We prove that the sequence of values of n-stage games converges by approximating the repeated game by a sequence of games in continuous time.
1 Introduction
We consider a class of zero-sum two-person repeated games with incomplete information which was introduced and first studied by Mertens and Zamir (1976). Their model and results were later extended by Waternaux (1983a, b) and can be presented as follows:
A state $k$ is chosen at random in a finite set $K$ according to some initial probability $p$ and none of the players is informed. At each stage $m$, the players choose some moves $i_m$, $j_m$ (in finite sets $I$, $J$), receive a common signal $h_m = H^k_{i_m j_m}$ which, if the realized state $\mathbf{k}$ equals $k$, is either 0 or $k$, and player 1's payoff is $g_m = A^k_{i_m j_m}$, where $H^k$ (resp. $A^k$) is for each $k$ an $I \times J$ signalling (resp. payoff) matrix. Note that the players are told neither their opponent's move nor their own payoff; hence after each stage the set of "types of players" increases strictly, so that the usual approach in repeated games with incomplete information using "exhaustive state variables" cannot be applied. (For a discussion of this point, see the references above.) Denote by $G_n(p)$ the $n$-stage repeated game with average payoff $\bar g_n = \frac{1}{n} \sum_{m=1}^{n} g_m$ and let $v_n(p)$ be its value.
1 This paper was written at the Mathematical Sciences Research Institute, Berkeley and supported in part by NSF Grant 8120790. The support of these institutions is gratefully acknowledged.
Sylvain Sorin, Département de Mathématique, Université Louis Pasteur, 7 Rue René Descartes, 67084 Strasbourg, France.
0020-7276/89/1/45-55 $2.50 © 1989 Physica-Verlag, Heidelberg
In Mertens and Zamir (1976), as in Waternaux (1983a, b), the existence of a maxmin for the infinitely repeated game $G_\infty$ was proved by constructing an auxiliary one-shot game $\Gamma$; the idea was to introduce a class of strategies that mimic certain strategies of the repeated game $G_\infty$, to define the payoff in $\Gamma$ as the corresponding asymptotic payoff in $G_\infty$, to show that $\Gamma$ has a value and finally to prove that this class of strategies was sufficient (in the sense that both players could not do better, for the maxmin in $G_\infty$, than using a strategy in this class). Similar results hold for the minmax.
Here, in order to prove that $\lim v_n$ exists, we will use a very similar approach. The two main differences are that:
i) We are dealing with finite games hence the approximation of the strategies and of the payoff cannot be asymptotic.
ii) We need to define a sequence of auxiliary games, for which a value exists, and then prove that the two sequences of values converge to the same limit.
Nevertheless this paper makes frequent use of these previous ideas and constructions, and in particular relies heavily on Waternaux (1983b) (Part 5.II.A).
Before introducing the auxiliary game, let us make some conventions and define some notations. We can and shall assume that the value of each matrix $A^k$ is 0, that $A^k$ is multiplied by $p^k$ (so that the payoff in $G_n(p)$ is the sum of the conditional payoffs given $\mathbf{k} = k$) and finally that $\|A^k\| \le 1$ for all $k$.
It follows that by requiring both players to play optimally after a revealing signal, the payoff will be absorbing with value 0 after each $h_m \neq 0$. Now a pure strategy for player 1 (resp. player 2) can be described by a point in $I^n$ (resp. $J^n$), i.e. a sequence of actions conditional on $h_m = 0$ at each stage. We shall denote by $X$ the set of behavioral strategies of player 1. $x$ in $X$ is thus a sequence $(x_m)_{m=1,\dots,n}$ where $x_m$ is a mapping from $I^{m-1}$ into $P(I)$, $P(I)$ denoting the set of probabilities over $I$. $Y$ is defined in the same way and we shall write $\bar\gamma_n(x, y)$ for $E_{x,y,p}(\bar g_n)$ ($p$ being fixed, no confusion will result).
2 The Auxiliary Game
For each $L \in \mathbb{N}$, we shall construct a game $\Gamma_L$. The heuristic interpretation for $\Gamma_L$ is $G_n$ played in $L$ large blocks, on each of which both players are using a stationary strategy except for some singular moves.
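The strategy sets in $\Gamma_L$ will be $\bar\Sigma^L$ and $\bar T^L$ where
$$\bar\Sigma = \bigcup_{I' \subset I} P(I') \times \mathbb{N}^{I \setminus I'}, \qquad \bar T = \bigcup_{J' \subset J} P(J') \times \mathbb{N}^{J \setminus J'}.$$
Any $\sigma$ in $\bar\Sigma$ will be written as $\sigma = \{I(\sigma); s; \alpha_i, i \in I \setminus I(\sigma)\}$ with $s$ in $P(I(\sigma))$ and similarly for $\tau$ in $\bar T$, $\tau = \{J(\tau); t; \beta_j, j \in J \setminus J(\tau)\}$.
Before defining the payoff $\Phi$, let $B^k = \{(i, j);\ H^k_{ij} = 0\}$ denote the set of non-revealing entries in game $k$, and
$$B^k_i = \{j;\ (i, j) \in B^k\}; \qquad B^k_j = \{i;\ (i, j) \in B^k\}.$$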
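The probability of getting the signal $h = 0$, under $\sigma$ and $\tau$, given $k$, is defined by:
$$p^k(\sigma, \tau) = 1[I(\sigma) \times J(\tau) \subset B^k] \prod_{i \in I \setminus I(\sigma)} \big(t(B^k_i)\big)^{\alpha_i} \prod_{j \in J \setminus J(\tau)} \big(s(B^k_j)\big)^{\beta_j}$$
where obviously $t(B^k_i)$ stands for $\sum_{j \in B^k_i} t_j$ and similarly for $s(B^k_j)$.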
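The payoff under $\sigma$ and $\tau$, given $k$, is
$$\varphi^k(\sigma, \tau) = s A^k t.$$
Now, if $\sigma = (\sigma(l))_{1 \le l \le L}$ and $\tau = (\tau(l))_{1 \le l \le L}$ are in $\bar\Sigma^L$ and $\bar T^L$, we define $\Phi$ by:
$$\Phi(\sigma, \tau) = \frac{1}{L} \sum_{k \in K} \sum_{l=1}^{L} \Big( \prod_{m=1}^{l-1} p^k(\sigma(m), \tau(m)) \Big) \varphi^k(\sigma(l), \tau(l)).$$
Hence $\Phi$ is the average payoff per unit of time, where the payoff at time $l$ is computed according to $\sigma(l)$, $\tau(l)$ if the signal is non-revealing until time $l - 1$, or 0 otherwise.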
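As an illustration only, the following sketch evaluates the non-revealing probability $p^k$, the block payoff $\varphi^k$ and the average payoff $\Phi$ for small matrices. The encoding is an assumption made for this example and is not taken from the paper: a strategy is a triple (support, weights on the support, exceptional counts off the support), `H[i][j] == 0` marks a non-revealing entry, and the function names are hypothetical.

```python
from math import prod


def p_nonrevealing(H, sigma, tau):
    """p^k(sigma, tau) for one state k (H is that state's signalling matrix)."""
    I_sup, s, alpha = sigma  # I(sigma), weights s, counts alpha_i off the support
    J_sup, t, beta = tau     # J(tau), weights t, counts beta_j off the support
    # Indicator 1[I(sigma) x J(tau) subset of B^k].
    if any(H[i][j] != 0 for i in I_sup for j in J_sup):
        return 0.0

    def t_mass(i):  # t(B_i^k): weight of t on columns not revealing against row i
        return sum(t[j] for j in J_sup if H[i][j] == 0)

    def s_mass(j):  # s(B_j^k): weight of s on rows not revealing against column j
        return sum(s[i] for i in I_sup if H[i][j] == 0)

    return (prod(t_mass(i) ** a for i, a in alpha.items())
            * prod(s_mass(j) ** b for j, b in beta.items()))


def stage_payoff(A, sigma, tau):
    """phi^k(sigma, tau) = s A^k t."""
    I_sup, s, _ = sigma
    J_sup, t, _ = tau
    return sum(s[i] * A[i][j] * t[j] for i in I_sup for j in J_sup)


def Phi(Hs, As, sigmas, taus):
    """Average over the L blocks of the payoff weighted by survival probability."""
    total = 0.0
    for H, A in zip(Hs, As):          # sum over the states k
        alive = 1.0                   # prod_{m < l} p^k(sigma(m), tau(m))
        for sig, ta in zip(sigmas, taus):
            total += alive * stage_payoff(A, sig, ta)
            alive *= p_nonrevealing(H, sig, ta)
    return total / len(sigmas)
```

For instance, with a single state, `H = [[0, 0], [1, 0]]`, `A = [[1, -1], [-1, 1]]`, `sigma = ({0}, {0: 1.0}, {1: 2})` and `tau = ({0, 1}, {0: 0.25, 1: 0.75}, {})`, the two exceptional plays of row 1 each survive with probability 0.75, so `p_nonrevealing` is 0.75 squared.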
Before proving that this game has a value we still need to introduce two subsets of strategies:
$$\Sigma = \{\sigma \in \bar\Sigma;\ s_i > 0\ \ \forall i \in I(\sigma)\}, \qquad T = \{\tau \in \bar T;\ t_j > 0\ \ \forall j \in J(\tau)\}.$$
Proposition 1: $\Gamma_L$ has a value and both players have $\varepsilon$-optimal strategies with finite support in $\Sigma^L$ and $T^L$.
Proof: The previous construction and result, as well as the proof, are similar to Part 3 in Waternaux (1983b) and rely on a minmax theorem of Mertens (1986). We sketch the argument.
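(i) Define on $\bar\Sigma$ a topology for which it is compact as follows:
Let $\tilde\Sigma = \bigcup_{I' \subset I} P(I') \times \bar{\mathbb{N}}^{I \setminus I'}$ (where $\bar{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$) and let us introduce $\chi$ from $\tilde\Sigma$ to $\bar\Sigma$ such that $\chi(\tilde\sigma) = \sigma$ with
$$I(\sigma) = I(\tilde\sigma) \cup \{i : \tilde\alpha_i = +\infty\},$$
$$s_i = \tilde s_i, \quad i \in I(\tilde\sigma); \qquad s_i = 0, \quad i \in I(\sigma) \setminus I(\tilde\sigma),$$
$$\alpha_i = \tilde\alpha_i, \quad i \in I \setminus I(\sigma).$$
We then obtain $p^k(\tilde\sigma, \tau) = p^k(\chi(\tilde\sigma), \tau)$ and $\varphi^k(\tilde\sigma, \tau) = \varphi^k(\chi(\tilde\sigma), \tau)$ for all $\tau$ in $T$.
Hence if $\tilde\sigma \in \tilde\Sigma^L$, we have, with obvious notations,
$$\Phi(\tilde\sigma, \tau) = \Phi(\chi(\tilde\sigma), \tau) \quad \text{for all } \tau \text{ in } T^L.$$
Endow $\bar\Sigma$ with the strongest topology for which $\chi$ is continuous; $\tilde\Sigma$ being compact, so will be $\bar\Sigma$.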
(ii) Verify that $\Phi$ is uniformly bounded and measurable for any product measure on the product of compacts $\bar\Sigma^L \times \bar T^L$. Note that $\Phi(\cdot, \tau)$ is continuous on $\bar\Sigma^L$, for $\tau$ in $T^L$. Moreover, for all $\tau$ in $\bar T^L$, there exists a sequence $\tau_m$ in $T^L$ such that, for all $\sigma$ in $\bar\Sigma^L$, $\Phi(\sigma, \tau_m)$ converges to $\Phi(\sigma, \tau)$. Clearly dual results hold.
(iii) Apply now a minmax theorem for $\Phi$ on $\bar\Sigma^L$ (with regular Borel probabilities) $\times$ $T^L$ (with probabilities with finite support), then symmetrically on $\Sigma^L \times \bar T^L$, and use (ii), Fatou and Fubini to get the result (cf. Mertens 1986, Part C). ∎
Let us denote by $v_L$ the value of $\Gamma_L$.
3 Existence of lim $v_n$
The main result of this paper can now be stated as:
Theorem 2: $\lim_{n \to \infty} v_n$ and $\lim_{L \to \infty} v_L$ exist and are equal.
Proof: Obviously it will be enough to prove that
$$\liminf_{n \to \infty} v_n \ge \limsup_{L \to \infty} v_L. \tag{1}$$
(The games under consideration being symmetrical, a dual inequality will follow, implying the claim.)
Denote $\limsup_{L \to \infty} v_L$ by $\bar v$. Now starting from some $\varepsilon > 0$ and some $L_0$ we shall choose $L \ge L_0$ such that $v_L \ge \bar v - \varepsilon$ and take $\mu_L$ an $\varepsilon$-optimal strategy of player 1 in $\Gamma_L$, with finite support in $\Sigma^L$. Each $\sigma$ in the support of $\mu_L$ will induce a strategy $x(\sigma)$ in $G_n$ for $n$ large enough. Each pure strategy $y$ of player 2 in $G_n$ will be represented by a strategy $\tau(y)$ in $T^L$. We shall prove that there exists $N(\varepsilon, L)$ such that $n \ge N(\varepsilon, L)$ implies, for all $\sigma \in \operatorname{supp} \mu_L$,
$$\bar\gamma_n(x(\sigma), y) \ge \Phi(\sigma, \tau(y)) - 8\varepsilon \quad \forall y \in Y. \tag{2}$$
Integrating this will give:
$$E_{\mu_L}\big(\bar\gamma_n(x(\sigma), y)\big) \ge E_{\mu_L}\big(\Phi(\sigma, \tau(y))\big) - 8\varepsilon \tag{3}$$
so that
$$v_n \ge v_L - 9\varepsilon \ge \bar v - 10\varepsilon,$$
hence (1).
To get (2) we will first prove that the above mappings from $\Sigma$ to $X$ and from $Y$ to $T$ approximately preserve the probability of getting the blank signal (denoted by $p(\sigma, \tau)$ and $q(x, y)$, for $\Gamma$ and $G$ respectively), see (4), and then a similar property, see (14), for the payoffs (denoted by $\Phi(\sigma, \tau)$ and $F(x, y)$).
A A Preliminary Lemma
Given $\sigma$ in $\Sigma$ and $n$ in $\mathbb{N}$, large enough, we shall construct some strategy $x$ in $G_n$. Let first $\delta_\sigma = \min\{s_i : i \in I(\sigma)\}$ and $\alpha_\sigma = \big(\sum_{i \in I \setminus I(\sigma)} \alpha_i\big) + 1$. $x$ will consist of playing $s$ i.i.d. at each stage except on $\alpha_\sigma$ stages. These exceptional stages are obtained by using $\alpha_\sigma$ independent random variables $\theta_i(m)$, $i \notin I(\sigma)$, $1 \le m \le \alpha_i$, uniformly distributed on $[1, n]$ (adding new trials $\theta'_i(m)$ if $\theta_i(m)$ coincides with some $\theta_{i'}(m')$, $i' < i$ or $i' = i$ and $m' < m$). The description of $x$ is now completed by playing $i$ at stage $\theta_i(m)$, $1 \le m \le \alpha_i$, for all $i \notin I(\sigma)$. Denote this strategy by $x[\sigma; n]$.
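A minimal sketch of how one realization of the move sequence prescribed by $x[\sigma; n]$ could be drawn, under assumptions made for the example: the function name and the dictionary encoding are hypothetical, and redrawing a stage on collision is replaced by sampling distinct stages without replacement, which yields the same distribution of exceptional stages.

```python
import random


def sample_x(support_weights, alpha, n, rng=None):
    """Draw the n moves of player 1 under x[sigma; n] (conditionally on h = 0).

    support_weights: dict move -> probability s_i, for moves in I(sigma)
    alpha: dict move -> number alpha_i of exceptional plays, moves outside I(sigma)
    """
    rng = rng or random.Random()
    n_exceptional = sum(alpha.values())
    if n_exceptional > n:
        raise ValueError("n is too small to place the exceptional stages")
    # Distinct uniform stages in 1..n (assumption: stands in for 'redraw on collision').
    stages = rng.sample(range(1, n + 1), n_exceptional)
    exceptional = {}
    pos = 0
    for move, count in alpha.items():
        for stage in stages[pos:pos + count]:
            exceptional[stage] = move
        pos += count
    moves = list(support_weights)
    weights = [support_weights[m] for m in moves]
    # Every other stage is an independent draw from s.
    return [exceptional[stage] if stage in exceptional
            else rng.choices(moves, weights)[0]
            for stage in range(1, n + 1)]
```

Each move outside the support appears exactly `alpha[move]` times in the returned sequence, and all remaining stages carry moves of the support.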
Given $y$, a pure strategy of player 2 in $G_n$, we shall construct some $\tau$ in $T$. Let $r_j = \#\{m;\ y(m) = j,\ 1 \le m \le n\}$ and, given some parameter $r$, let:
$$J(y) = \{j \in J;\ r_j \ge r\}.$$
Define now $\tau$ by:
$$J(\tau) = J(y),$$
$$t_j = \frac{r_j}{\sum_{j' \in J(y)} r_{j'}}, \quad j \in J(y),$$
$$\beta_j = r_j, \quad j \notin J(y).$$
We shall denote this strategy by $\tau[y; r; n]$.
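The construction of $\tau[y; r; n]$ amounts to counting moves and thresholding at $r$. The sketch below is an illustration only; the function name and the return encoding (support, weights, exceptional counts) are assumptions of this example.

```python
from collections import Counter


def tau_from_y(y, J, r):
    """Build tau[y; r; n] from a pure strategy y (list of moves of player 2)."""
    counts = Counter(y)                            # r_j = #{m : y(m) = j}
    frequent = [j for j in J if counts[j] >= r]    # J(y): moves played at least r times
    if not frequent:
        raise ValueError("no move is played at least r times")
    total = sum(counts[j] for j in frequent)
    t = {j: counts[j] / total for j in frequent}   # normalized frequencies on J(y)
    beta = {j: counts[j] for j in J if j not in frequent}  # rare moves keep raw counts
    return set(frequent), t, beta
```

With `y = [0, 0, 1, 0, 2, 1, 0, 0]`, `J = [0, 1, 2]` and `r = 2`, moves 0 and 1 form the support with weights 5/7 and 2/7, and move 2 is recorded as one exceptional play.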
Let $\Omega_m$ denote the event $\{h_u = 0;\ 1 \le u < m\}$. The following result will play a crucial role in our proof.
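Lemma 3 (Waternaux 1983b, 5.II, Part A (5.7)): Let $\sigma \in \Sigma$ and let $\varepsilon_1 > 0$ be small enough. Put
$$r = \frac{(\alpha_\sigma + 1)\,|\log \varepsilon_1|}{|\log(1 - \delta_\sigma)|}.$$
Then for all $n$ above an explicit bound depending only on $\varepsilon_1$, $\#J$, $\alpha_\sigma$ and $\delta_\sigma$,
$$\big| p^k(\sigma, \tau[y; r; n]) - \mathrm{Prob}_{x[\sigma; n], y}(\Omega_n \mid \mathbf{k} = k) \big| \le 7\varepsilon_1 \tag{4}$$
for all $y$ in $Y$, all $k$ in $K$.
This inequality simply says that if $n$ is large enough, $p^k(\sigma, \tau)$ is a good approximation of the probability, if $\mathbf{k} = k$, for the state not to be revealed before stage $n$, in $G_n$, given $x$ and $y$.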
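B Construction of x and τ
Given $\varepsilon > 0$, let $\varepsilon' = \varepsilon / K$ and choose $L_0$ such that
$$(1 - \varepsilon')^{[L_0^{1/2}]} \le \varepsilon' \quad \text{and} \quad \frac{1}{[L_0^{1/2}]} < \varepsilon' \tag{5}$$
where $[\dots]$ denotes the integer part. Take $L \ge L_0$ such that $v_L \ge \bar v - \varepsilon$ and let $\mu_L$ be an $\varepsilon$-optimal strategy of player 1 in $\Gamma_L$. Define
$$\delta = \min_{\sigma \in \operatorname{supp} \mu_L,\ 1 \le l \le L} \delta_{\sigma(l)} \quad \text{and} \quad \alpha = \max_{\sigma \in \operatorname{supp} \mu_L,\ 1 \le l \le L} \alpha_{\sigma(l)}.$$
Assume now
$$\varepsilon_1 < \frac{\varepsilon'}{7L}, \qquad r = \frac{(\alpha + 1)\,|\log \varepsilon_1|}{|\log(1 - \delta)|}, \tag{6}$$
and let $N_0$ be at least the bound on $n$ required in Lemma 3, with $\alpha_\sigma$ and $\delta_\sigma$ replaced by $\alpha$ and $\delta$.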
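Condition (5) is easy to check numerically. The sketch below assumes (5) is read as $(1 - \varepsilon')^{[L_0^{1/2}]} \le \varepsilon'$ together with $1/[L_0^{1/2}] < \varepsilon'$; both function names are hypothetical. Since both inequalities depend on $L_0$ only through $[L_0^{1/2}]$ and become easier as it grows, the smallest admissible $L_0$ is a perfect square.

```python
from math import isqrt


def satisfies_5(L, eps_prime):
    """Check condition (5), as read above, for a given L."""
    root = isqrt(L)  # integer part of the square root of L
    return root > 0 and (1 - eps_prime) ** root <= eps_prime and 1 / root < eps_prime


def smallest_L0(eps_prime):
    """Smallest L0 satisfying (5); every L >= L0 then satisfies it as well."""
    if not 0 < eps_prime < 1:
        raise ValueError("eps_prime must lie strictly between 0 and 1")
    root = 1
    while (1 - eps_prime) ** root > eps_prime or 1 / root >= eps_prime:
        root += 1
    return root * root
```

For $\varepsilon' = 0.5$ this gives $L_0 = 9$: with $[L_0^{1/2}] = 2$ the second inequality fails because $1/2$ is not strictly below $0.5$.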
For $n \ge N_0 L$ write $n = NL + n_1$, with $n_1 \le L$. We shall now construct $x$ in $G_{NL}$. In fact $x$ will be defined on each block $l$ of length $N$ by $x(l)$, the $l$-th block being from stage $(l - 1)N + 1$ to $lN$. Formally $x(l) = x[\sigma(l); N]$, $l = 1, \dots, L$ (see A).
Similarly, given $y \in J^n$, a pure strategy of player 2 in $G_n$, we will look at the restriction $y^l$ of $y$ to each block $l$ and define $\tau$ separately. Hence $\tau = (\tau(l))_{l = 1, \dots, L}$ where $\tau(l) = \tau[y^l; r; N]$, meaning (see A) that $r_j(l) = \#\{m;\ y(m) = j,\ (l - 1)N + 1 \le m \le lN\}$, and so on.
Now we want to compare the payoff induced by $\sigma$, $\tau$ in $\Gamma_L$ to the payoff corresponding to $x$, $y$ in $G_n$.
C Comparison of the Payoffs
First it is sufficient to approximate on each block, so let
$F_l^k(x, y)$ = average payoff in game $G_n$ on the $l$-th block given $x$, $y$ and $\mathbf{k} = k$.
Then we want to compare it to:
$$\Phi_l^k(\sigma, \tau) = \prod_{m=1}^{l-1} p^k(\sigma(m), \tau(m))\ \varphi^k(\sigma(l), \tau(l)).$$
The second remark is that it is enough to work conditionally on $\mathbf{k} = k$, since the mappings $\sigma \mapsto x$ and $y \mapsto \tau$ are independent of $k$. Thus in order to simplify the notations we shall drop the index $k$ until (14).
On one hand one has
$$\Phi_l(\sigma, \tau) = \prod_{m=1}^{l-1} p(\sigma(m), \tau(m))\ s(l) A t(l)$$
and on the other hand
$$F_l(x, y) = \prod_{m=1}^{l-1} q_m(x, y)\ f_l(x, y)$$
where $q_m(x, y) = \mathrm{Prob}_{x,y}\{h_u = 0,\ (m - 1)N \le u < mN \mid \Omega_{(m-1)N}\}$ and $f_l(x, y)$ = average expected payoff between stages $(l - 1)N + 1$ and $lN$ conditionally on $\Omega_{(l-1)N}$.
From Lemma 3 we have, using (6),
$$|p(\sigma(m), \tau(m)) - q_m(x, y)| \le 7\varepsilon_1, \quad \forall m \tag{7}$$
hence
$$\Big| \prod_{m=1}^{l} p(\sigma(m), \tau(m)) - \prod_{m=1}^{l} q_m(x, y) \Big| \le 7L\varepsilon_1 \le \varepsilon' \tag{8}$$
for all $l \le L$ and all $y$.
So it remains to approximate the other factor in $\Phi_l$ and $F_l$, namely the payoffs $\varphi$ and $f_l$.
In order to do this we shall "eliminate" the blocks on which $q_m(x, y)$ is small. In fact after many such blocks the payoff will be absorbing with a high probability (hence $F_l$ and $\Phi_l$ both near zero).
Now on the blocks where $q_m$ is large the expected average, given $\Omega_{(m-1)N}$, is near the Cesàro mean, hence $f$ near $\varphi$. More precisely let $C_l = \{u;\ 1 \le u \le l;\ q_u(x, y) < 1 - \varepsilon'\}$ and $c_l = \# C_l$. Define $\underline{l} = \min\big\{\{l;\ c_l \ge [L^{1/2}]\} \cup \{L + 1\}\big\}$. It follows from (5) that on $\underline{l} \le L$, $\prod_{m=1}^{\underline{l}} q_m(x, y) \le \varepsilon'$, hence $\prod_{m=1}^{\underline{l}} p(\sigma(m), \tau(m)) \le 2\varepsilon'$ by (8). From this we obtain
$$|F_l - \Phi_l| \le 3\varepsilon' \quad \text{for all } l > \underline{l}. \tag{9}$$
Now we shall ignore the difference $|F_l - \Phi_l|$ on the blocks in $C_{\underline{l}}$ since the number of such blocks is at most $\varepsilon' L$ by (5).
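So let us consider some block $l$ in $\{1, \dots, \underline{l}\} \setminus C_{\underline{l}}$. Since $q_l(x, y) \ge 1 - \varepsilon'$, this implies that the average expected payoff in $G_n$ on this block, given $\Omega_{(l-1)N}$, can be written as:
$$f_l(x, y) = \frac{1}{N} \sum_{m=(l-1)N+1}^{lN} \lambda_m \gamma_m(x, y) \tag{10}$$
where $\lambda_m = \mathrm{Prob}_{x,y}[\Omega_{m-1} \mid \Omega_{(l-1)N}]$, hence $\lambda_m \ge 1 - \varepsilon'$ $\forall m$, and $\gamma_m(x, y) = E_x(i_m A y(m))$.
Denote by $M_l$ the set of $N$ stages in block $l$, so that
$$\Big| f_l(x, y) - \frac{1}{N} \sum_{m \in M_l} \gamma_m(x, y) \Big| \le \varepsilon'. \tag{11}$$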
Now with probability greater than $(1 - \varepsilon')$ the initial random times $\theta_i(m)$ (see A) have disjoint values, hence $\theta' = \theta$. Since they are uniform on $M_l$ and independent they define a (random) set of $\alpha$ stages such that on its complement (in $M_l$):
(i) player 1 plays $s(l)$ i.i.d.
(ii) the expected average empirical distribution of moves of player 2 is still
$$\frac{1}{N} \sum_{m \in M_l} y(m) \overset{\text{def}}{=} t'(l).$$
We thus obtain
$$\Big| \frac{1}{N} \sum_{m \in M_l} \gamma_m - s(l) A t'(l) \Big| \le \frac{\alpha}{N} + \varepsilon'. \tag{12}$$
Now by the choice of $r$ and $N_0$, see (6),
$$|t(l) - t'(l)| < \varepsilon'$$
so that
$$\Big| \frac{1}{N} \sum_{m \in M_l} \gamma_m - s(l) A t(l) \Big| \le 3\varepsilon'. \tag{13}$$
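Collecting (8), (9), (11) and (13) we thus obtain, for $l \in \{1, \dots, \underline{l}\} \setminus C_{\underline{l}}$ or $l > \underline{l}$, that
$$|\Phi_l(\sigma, \tau) - F_l(x, y)| \le 5\varepsilon'. \tag{14}$$
Taking into account all the blocks and summing over $k$ we get
$$\Big| \sum_{k \in K} \sum_{l=1}^{L} \big( \Phi_l^k(\sigma, \tau) - F_l^k(x, y) \big) \Big| \le (5\varepsilon' L + [L^{1/2}])\,K$$
and finally
$$|\bar\gamma_n(x, y) - \Phi(\sigma, \tau)| \le K \Big( 5\varepsilon' + \frac{[L^{1/2}]}{L} \Big) + \frac{2L}{n},$$
hence (2).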
4 Concluding Remarks
First it is easy to see that the previous proof can be adapted to obtain the convergence to the same limit (as $\delta$ goes to 0) of $v_\delta$, the value of the game $G_\delta$ with discount factor $\delta$, with payoff function $g_\delta = \sum_{m=1}^{\infty} \delta(1 - \delta)^{m-1} g_m$.
In fact, given $L$ and $\delta$, define inductively $n_l$ by requiring $\sum_{m=n_l+1}^{n_{l+1}} \delta(1 - \delta)^{m-1}$ to be near $\frac{1}{L}$.
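One possible way to compute such block boundaries, given here as a sketch rather than as the paper's procedure: since the stages $1, \dots, n$ carry total weight $1 - (1 - \delta)^n$, one can take $n_l$ to be the first stage at which this cumulative weight reaches $l/L$, the last block being the infinite tail. The function names are hypothetical, and the boundaries are distinct only when $\delta$ is small relative to $1/L$, which is the regime of interest.

```python
from math import ceil, log


def block_endpoints(delta, L):
    """Return [n_0, ..., n_{L-1}] with n_0 = 0; block l covers stages n_{l-1}+1..n_l."""
    if not 0 < delta < 1 or L < 1:
        raise ValueError("need 0 < delta < 1 and L >= 1")
    # n_l is the first n with 1 - (1 - delta)**n >= l / L.
    return [0] + [ceil(log(1 - l / L) / log(1 - delta)) for l in range(1, L)]


def block_weights(delta, endpoints):
    """Discounted weight of each block; the last one is the tail after n_{L-1}."""
    survival = [(1 - delta) ** n for n in endpoints] + [0.0]
    return [a - b for a, b in zip(survival, survival[1:])]
```

Because a single stage weighs at most $\delta$, every block weight returned by `block_weights` differs from $1/L$ by at most $\delta$.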
Then $\sigma$ induces $x$ where
$$x(l) = x[\sigma(l); n_l].$$
Given $y$, denote by $y_l$ its restriction to the $l$-th block $M_l$ (of length $n_l$); $\tau(y)$ is now defined by
$$J(\tau(l)) = J(y_l),$$
$$t_j = \frac{\sum_{m \in M_l} (1 - \delta)^{m-1}\, 1[j_m = j]}{\sum_{m \in M_l} (1 - \delta)^{m-1}\, 1[j_m \in J(y_l)]}, \quad j \in J(\tau(l)),$$
$$\beta_j = \#\{m;\ j_m = j,\ m \in M_l\}, \quad j \notin J(\tau(l)).$$
Then the approximation of the payoffs is very similar to Section 3, Part C.
Now the main idea of this paper was to approximate the game $G_n$ for $n$ large by a game $\Gamma$ on $[0, 1]$ where the strategy between stages $[t_1 n]$ and $[t_2 n]$ in $G_n$ looks like the strategy between times $t_1$ and $t_2$ in $\Gamma$, $0 \le t_1 \le t_2 \le 1$.
A similar approach was used in Sorin (1984), where the strategies in the auxiliary game were positive Borel measures of mass less than 1 for player 1 and continuous functions from $[0, 1]$ to itself for player 2.
Nevertheless in the present case we cannot directly work with the continuous game $\Gamma$. The reason is that defining the strategies directly on $[0, 1]$ would not allow us to take care simultaneously of the absorbing probability $p$, which requires some discrete approximation (to take into account the exceptional moves), and of the payoff $\varphi$, which corresponds to a uniform approximation (basically the empirical average).
Among the possible extensions of the result, the case where the signal is random but common, with values in $\{0, k\}$ if $\mathbf{k} = k$, seems to follow along the same lines by defining $B^k = \{(i, j);\ H^k_{ij}(k) = 0\}$ as before and $s(B^k_j) = 1 - \sum_i s_i H^k_{ij}(k)$.
The case where the signals may not be completely revealing deserves a special study: different classes of exceptional moves have to be introduced and the order in which they are played has to be taken into account.
References
Mertens JF (1986) The minmax theorem for U.S.C.-L.S.C. payoff functions. International Journal of Game Theory 15:237-250
Mertens JF, Zamir S (1976) On a repeated game without a recursive structure. International Journal of Game Theory 5:173-182
Sorin S (1984) "Big match" with lack of information on one side (part I). International Journal of Game Theory 13:201-255
Waternaux C (1983a) Solution for a class of games without recursive structure. International Journal of Game Theory 12:129-160
Waternaux C (1983b) Minmax and maxmin of repeated games without a recursive structure. CORE DP 8318, to appear in International Journal of Game Theory
Received August 1986
Revised version September 1987