DOI 10.1007/s13235-010-0006-z
Zero-Sum Repeated Games: Recent Advances and New Links with Differential Games
Sylvain Sorin
Published online: 16 October 2010
© Springer-Verlag 2010
Abstract The purpose of this survey is to describe some recent advances in zero-sum re- peated games and in particular new connections to differential games. Topics include: ap- proachability, asymptotic analysis: recursive formula and operator approach, dual game and incomplete information, uniform approach.
Keywords Zero-sum two-person repeated games·Differential games·Approachability· Recursive formula·Asymptotic value·Uniform value
1 Introduction: Asymptotic and Uniform Approaches in RG, Quantitative and Qualitative DG
This survey is a sequel to Sorin [81], and covers some recent advances in the theory of two- person zero-sum repeated games (RG) and new connections with differential games (DG).
Alternative developments in the theory of differential games within a much more general framework are treated in the companion paper by Buckdahn, Cardaliaguet and Quincam- poix, [8] in this volume.
This section describes informally both models and some of the main issues. RG are played in discrete time. There is a state space M and action sets I for Player 1 and J for Player 2. At each stage n, the game is in some state mn ∈M, each player chooses an action (in∈I, jn∈J) that, together with the current state, determines a stage payoff
This paper is dedicated to the memory of D. Blackwell. Comments from P. Cardaliaguet, R. Laraki, M. Oliu-Barton, J. Renault and G. Vigeral were very helpful. This research of was supported by grant ANR-08-BLAN- 0294-01 (France).
S. Sorin (
!
)Equipe Combinatoire et Optimisation, CNRS FRE 3232, Faculté de Mathématiques, Université P. et M. Curie – Paris 6, Tour 15-16, 1° étage, 4 Place Jussieu, 75005 Paris, France e-mail:[email protected]
url:http://www.ecp6.jussieu.fr/pageperso/sorin.html S. Sorin
Laboratoire d’Econométrie, Ecole Polytechnique, 91128 Palaiseau, France
gn=g(mn, in, jn)and the joint law of the new statemn+1and of the signals to the players (see Section3for a formal description). The basic structure of the game is stationary (i.e. in- dependent of the stagen) and we are interested in the way the players can guide the process {gn}for largen.
DG are played in continuous time. There is a state spaceZand control setsUfor Player 1 andV for Player 2. At each timet, the game is in some statezt and each player chooses a control (ut∈U, vt∈V). This induces a current payoffγt=γ (zt, t, ut, vt)and defines the dynamicsz˙t=f (zt, t, ut, vt)followed by the state. Notice that in the autonomous case, if the players use piecewise constant controls on intervals of sizeδ, the induced process is like a RG.
Clearly the above description extends to theN-player case as well. Let us specify the evaluation of the process in the zero-sum case.
1.1 RG Evaluations
For the RG framework there are basically three approaches.
1.1.1
We first introduce thecompact case. For every probability distributionµon the integers n≥1 (µn≥0,!
nµn=1), one defines a gameG[µ]with evaluation#µ, g$ =!
ngnµn. Under standard assumptions on the basic data (for example if all sets involved in the defi- nition of the game are finite), the natural product topology on plays specifies a game with compact strategy sets and continuous payoff function, hence the valuev[µ]will exist.
Theasymptotic approachstudies the family of such games as the expected length (the mean ofµ) goes to+∞. (Alternatively one could consider these games has being played between time 0 and 1, the duration of stagenbeingµn.) Natural assumptions are that for eachµ,µnis decreasing andµ1→0 along the family.
The analysis concentrates on the corresponding family of values{v[µ]}and of (approxi- mate) optimal strategies. Two typical examples correspond to:
• Thefiniten-stagegameGnwith outcome given by the average of the firstnstage payoffs:
gn=1 n
"n m=1
gm.
• Theλ-discounted gameGλwith outcome equal to the discounted sum of the payoffs:
gλ=
"∞ m=1
λ(1−λ)m−1gm.
The values of these games are denoted byvnandvλrespectively. The asymptotic analysis studies their behavior, asngoes to∞orλgoes to 0. The main issues are (i) to check whether the limit exists, (ii) is independent of the sequence of evaluations and (iii) to identify it.
Extensions consider games with random duration process where the weightµnis a ran- dom variable which law depends upon the previous path on a random duration tree, Neyman and Sorin [57], see Section3.3.2. Note that the knowledge of the duration (i.e. of the evalu- ation process) by the players is crucial in the choice of the strategies.
The main tool for this analysis is the recursive formula, see Section3.
1.1.2
An alternative analysis, called theuniform approach, considers the whole family of “long games” without specifying the exact duration. Hence one looks for strategies exhibiting asymptotic uniform properties in the following sense: they are almost optimal in any suffi- ciently long game. Explicitly:
Definition 1.1 vis the (uniform)maxminif the two following conditions are satisfied:
• Player 1 can guarantee v: for any ε >0, there exists a strategyσ of Player 1 and an integerNsuch that for anyn≥Nand any strategyτ of Player 2:
E(σ,τ )(gn)≥v−ε.
• Player 2 candefendv: for anyε >0 and any strategyσof Player 1, there exist an integer Nand a strategyτof Player 2 such that for alln≥N:
E(σ,τ )(gn)≤v+ε.
A dual definition holds for theminmaxv. Wheneverv=v, the game has auniform value, denoted byv∞. Note that the existence ofv∞implies:v∞=limn→∞vn=limλ→0vλ, and similarly for any random duration processes, Neyman and Sorin [57].
This analysis relies on an explicit construction of the strategies and is very sensitive to the information of the players: see e.g. stochastic games with known states and signals on the actions, Section7.4. and Zamir[91].
1.1.3
A third approach specifies directly an outcome associated to the sequence {gn}, like lim inf1n!n
m=1gm or a measurable function defined on plays (see Maitra and Sudderth [42,43]). This describes an infinitely repeated game in normal form and the issue is the existence of a value. We will not cover this direction in this survey but let us mention that there are fascinating measurability issues and important connections with both the recursive formula and the uniform approach.
1.2 DG Evaluations
We now turn to some definitions of the payoff in DG, but we will be far to cover all cases.
1.2.1
First note that in DG, the fact that time is continuous and the hypothesis that each player knows the previous control of his opponent induce an issue in defining the strategies of the players, in such a way that the induced process(zt, ut, vt)is well specified. One manner to proceed is as follows.
LetU (resp. V) denote the sets of measurable control maps fromR+ toU (resp.V).
α∈A)is a non anticipative strategy (NA) (resp.α∈Ais non anticipative strategy with delay (NAD)) ifαmapsv∈Vtoα(v)=u∈Usuch that ifvs=v)son[0, t]thenα(v)=α(v))on [0, t], for allt∈R+(resp. and there existsδ >0 such that ifvs=v)son[0, t]thenα(v)= α(v))on[0, t+δ], for allt∈R+). Note that a couple(α∈A),v∈V)or(u∈U, β∈B)) induces a pair of control maps, hence a well defined dynamics onZ. This was the initial
procedure to introduce upper and lower games and values, see e.g. Souganidis [84], Bardi and Capuzzo Dolcetta [3], Chapter VIII. It is more natural to work in a symmetric way with a normal form game defined onA×B. A couple(α, β)defines a unique couple(u,v)∈U×V withα(v)=uandβ(u)=v, thus the solutionzis well defined. The mapt∈ [0,+∞[+→
(zt,ut,vt) specifies the trajectory(z,u,v)(α, β), see e.g. Cardaliaguet [11], Cardaliaguet and Quincampoix [14].
1.2.2
A differential game is with fixed duration (or a “quantitative” DG) if the total evaluation is of the form
Γ (α, β)(z0)=
# T 0
γtdt+γ (zT) (1)
whereγ is some terminal payoff function, or Γ)(α, β)(z0)=# ∞
0
γtµ(dt) whereµis a probability on[0,+∞)likeT11[0,T]orλexp(−λt).
The game is now well defined in normal form and the issues are the existence of a value, its characterization and properties of optimal strategies.
The basic approach is to prove that the maxmin(resp. minmax)satisfies some dynamic programming inequality which leads to viscosity supersolution (resp. subsolution)of an Hamilton–Jacobi–Bellman equation and to use a comparison argument, seeAppendix. Even if the framework seems quite different, this analysis is deeply related to the asymptotic ap- proach for RG.
1.2.3
Qualitative DG are concerned with asymptotic properties of the trajectories likezt staying in some setCfor allt∈R+or from some timeT ≥0 on. By working on level sets forvthis approach is very similar to the uniform approach for RG. Basic references are Krasovskii and Subbotin [32], Cardaliaguet [9], Cardaliaguet, Quincampoix and Saint-Pierre [15,16].
1.2.4
Finally pursuit-evasion games or games of timing are more in the form of class1.1.3.
1.3 Contents
We will first describe in Section2results related to approachability theory, since both as- ymptotic and uniform analysis are available and the link with DG is especially explicit and useful. The next Section3introduces the recursive structure and describes the extension of the recursive formula in terms of state space and payoff evaluation. Section4builds on Section3to develop the asymptotic approach. We recall the basic results, then present the operator approach and related tools based on the (generalized) Shapley operator. Section5 deals with games with incomplete information and their dual, a tool that is fundamental both for RG and DG with incomplete information. Section6, devoted to the uniform approach, recalls the fundamental properties and describes some recent achievements. In Section7we comment on several directions of research.Appendixis a brief presentation of basic results of quantitative DG.
2 Approachability
This section describes the exciting and productive interaction between RG and DG in a specific area: approachability, introduced and studied by Blackwell [6].
2.1 Definitions
Given anI×J matrixA with coefficients inRk, a two-person infinitely repeated game formGis defined as follows. At each stagen=1,2, . . ., each player chooses an element in his set of actions:in∈I for Player 1 (resp.jn∈J for Player 2), the corresponding vector outcome isgn=Ainjn∈Rkand the couple of actions(in, jn)is announced to both players.
gn=1n!n
m=1gmis the average vector outcome up to stagen.
The aim of Player 1 is thatgnapproaches a target setC⊂Rk. Approachability theory is thus a generalization of max-min level in a (one shot) game with real payoff whereCis of the form[v,+∞).
Hn=(I×J )nis the set of possible histories at stagen+1 andH∞=(I×J )∞is the set of plays.Σ(resp.T) is the set of strategies of Player 1 (resp. Player 2): mappings from H=$
n≥0Hnto the sets of mixed actionsU=∆(I )(probabilities onI) (resp.V=∆(J )).
At stagen+1, given the historyhn∈Hn, Player 1 chooses an actionin+1∈I according to the probability distribution σ (hn)∈U (and similarly for Player 2). A couple(σ, τ )of strategies induces a probability onH∞andEσ,τdenotes the corresponding expectation.
The asymptotic notion is
Definition 2.1 A nonempty closed setCinRkisweakly approachableby Player 1 inGif, for everyε >0, there existsN∈Nsuch that for anyn≥Nthere is a strategyσ=σ (n, ε) of Player 1 such that, for any strategyτ of Player 2
Eσ,τ
%dC( gn)&
≤ε, wheredCstands for the distance toC.
If vn is the value of then-stage game with payoff−E(dC(g¯n)), weak approachability meansvn→0.
The uniform notion is
Definition 2.2 A nonempty closed setCinRkisapproachableby Player 1 inGif, for every ε >0, there exists a strategyσ=σ (ε)of Player 1 andN∈Nsuch that, for any strategyτ of Player 2 and anyn≥N
Eσ,τ
%dC(gn)&
≤ε.
In this case, asymptotically the average outcome remains close in expectation to the tar- getC, uniformly with respect to the opponent’s behavior. The dual concept is excludability.
2.2 Preliminaries
The “expected deterministic” repeated game formG,is an alternative two-person infinitely repeated game associated, as the previous one, to the matrixA. At each stagen=1,2, . . ., Player 1 (resp. Player 2) choosesun∈U =∆(I ) (resp.vn∈V =∆(J )), the outcome is g,n=unAvnand (un, vn)is announced. Accordingly, a strategyσ,for Player 1 inG,is a
map fromH,=$
n≥0Hn,toUwhereHn,=(U×V )n. A strategyτ,for Player 2 is defined similarly. A couple of strategies induces a play{(un, vn)}and a sequence of outcomes{g,n}, andg,n=1n!n
m=1gm, denotes the average outcome up to stagen.
G,is the game played in “mixed actions” or in expectation. Weak,approachability,v,n and,approachability are defined similarly.
2.3 Weak Approachability and Quantitative Differential Games
The next result is due to Vieille [87]. Recall that the aim is to obtain a good average outcome at stagen.
First consider the gameG,. Use as state variable the accumulated payoff and consider the differential gameΛof fixed duration played on[0,1]starting fromz0=0∈Rk with dynamics:
˙
zt=utAvt=f (zt, ut, vt)
and terminal payoff −dC(z(1)). Thus the state variable is zt ='t
0γsds, γs being the payoff at time s. Note that Isaacs’s condition holds: maxuminv#f (z, u, v), ξ$ = minvmaxu#f (z, u, v), ξ$ =valU×V#f (z, u, v), ξ$for allξ∈Rk.G,nappears then as a dis- crete time approximation ofΛand vn,=Vn(0,0) whereVnsatisfies, fork=0, . . . , n−1 andz∈Rk:
Vn
(k n, z
)
=valU×VVn
(k+1 n , z+1
nuAv )
(2) with terminal conditionV (1, z)= −dC(z).
LetΦ(t, z)be the value of the game played on[t,1]starting fromz(i.e. with total out- comez+'1
t γsds). Then basic results from DG implies, seeAppendix(DΦis the gradient inz):
Theorem 2.1 (1)Φ(z, t)is the unique viscosity solution on[0,1] ×Rkof
∂tΦ(z, t)+valU×V
*DΦ(z, t), uAv+
=0 withΦ(z,1)= −dC(z).
(2)
nlim→∞v,n=Φ(0,0).
The last step is to relate the values inG,nand inGn. Theorem 2.2
limvn,=limvn.
The idea of the proof is to play by blocks in GLn and to mimic an optimal behavior in G,n. Inductively at the mth block ofLstages inGLn Player 1 will play i.i.d. a mixed action optimal at stageminG,n(given the past history) andy,mis defined as the empirical distribution of actions of Player 2 during this block. Then the (average) outcome inGLnwill be close to the oneG,nfor largeL, hence the result.
The last property implies the following:
Corollary 2.1 Every set is weakly approachable or weakly excludable.
2.4 Approachability andB-Sets
The main notion was introduced by Blackwell [6]:
Definition 2.3 A closed setCinRkis aB-set for Player 1 ( for a givenA), if for anya /∈C, there existsb∈πC(a)and a mixed actionu= ˆu(a)inU=∆(I )such that the hyperplane throughborthogonal to the segment[ab]separatesafromuAV:
#uAv−b, a−b$ ≤0, ∀v∈V whereπC(a)denotes the set of closest points toainC.
The basic result of Blackwell, [6] is
Theorem 2.3 LetCbe aB-set for Player1.Then it is approachable inGand,approachable inG,by that player.An approachability strategy is given byσ (hn)= ˆu(g¯n)(resp.σ,(h,n)=
ˆ u(g¯,n)).
An important consequence of Theorem2.3is the next result.
Theorem 2.4 A convex setCis either approachable or excludable.
A further result due to Spinat [86], see also Hou [29], characterizes minimal approachable sets:
Theorem 2.5 A setCis approachable iff it contains aB-set.
2.5 Approachability and Qualitative Differential Games We follow here as Soulamani, Quincampoix and Sorin [1].
To study ,approachability, consider an alternative differential gameΓ where both the dynamics and the payoff function differ from the previous differential gameΛ. The aim is to control asymptotically the average payoff, hence the discrete dynamics on the state variable is of the form
¯
gn+1− ¯gn= 1
n+1(gn+1− ¯gn).
The continuous counterpart isγ¯t=1t't
0usAvsds. A change of variablezt= ¯γet leads to
˙
zt=utAvt−zt. (3)
which is the dynamics of an autonomous differential gameΓ withf (z, u, v)=uAv−z, that still satisfies Isaacs’s condition. In addition the aim of Player 1 is to stay in a certain setC.
We recall the next definitions, following Cardaliaguet [9].
Definition 2.4 A nonempty closed set C inRk is adiscriminating domainfor Player 1, givenf if:
∀a∈C,∀p∈NPC(a), sup
v∈V uinf∈U
*f (a, u, v), p+
≤0, (4)
whereNPC(a)= {p∈RK;dC(a+p)= .p.}is the set of proximal normals toCata.
The interpretation is that, at any boundary pointx∈C, Player 1 can react to any control of Player 2 in order to keep the trajectory in the half space facing a proximal normalp.
The following theorem, due to Cardaliaguet [9], states that Player 1 can ensure remaining in a discriminating domain as soon as he knows, at each timet, Player 2’s control up to timet.
Theorem 2.6 Assume that f satisfies Isaacs’s condition, that f (x, U, v) is convex for allx, v,and thatC is a closed subset of Rk.Then C is a discriminating domain if and only if for everyzbelonging toC,there exists a nonanticipative strategyα∈A),such that for anyv∈V,the trajectoryz[α(v),v, z](t)remains inCfor everyt≥0.
We shall say that such a strategyαpreservesthe setC. The link with approachability is through the following result:
Theorem 2.7 Letf (z, u, v)=uAv−z.A closed setC⊂Rkis a discriminating domain for Player1,if and only ifCis aB-set for Player1.
It is easy to deduce that starting from any point, not necessarily inCone has:
Theorem 2.8 If a closed setC⊂Rkis aB-set for Player1,there existsα∈A),such that for everyv∈V
∀t≥1 dC
%z,α(v),v- (t)&
≤Me−t. (5)
The main result is now:
Theorem 2.9 A closed setCis,approachable for Player1inG,if and only if it contains aB-set for Player1 (givenA).
The direct part follows from Blackwell’s proof. The proof of converse implication is as follows: first one defines a mapΨ from strategies of Player 1 inG,to nonanticipative strategies inΓ. In particular givenε >0 and a strategyσε thatε-approachesCinG,, its image isαε=Ψ (σε). The next step is to show that the trajectories in the differential game Γ compatible withαε approach asymptoticallyC up toε. Finally one proves that theω- limit set of any trajectory compatible with someαis a discriminating domain. Explicitly, let D(α)=.
θ≥0cl{x[x0, α(w),w](t);t≥θ, w∈V}, whereclis the closure operator.
Lemma 2.1 D(α)is a nonempty compact discriminating domain for Player1givenf. In particular, approachability and,approachability coincide.
2.6 On Strategies in the Differential and Repeated Games
This part describes explicitly the construction of an approachability strategy in the repeated gameGstarting from a preserving strategy inΓ. The idea of the construction is the follow- ing:
(a) Given a NAα)∈A), construct an approximation in term of range by a NADα∈A. (b) When applied toα)preservingC(hence approachingC), this leads toα∈Aapproach-
ingC.
(c) This NADαgenerates an,approachability strategy in the repeated gameG,. (d) Finally,approachability strategies inG,induce approachability strategies inG.
For step (a), recall the definition of the range associated to a nonanticipative strategy α)∈A):
R(α), t)=cl/
y∈Rk∃v∈V, y=x,
x0, α)(v),v- (t)0
.
The next result is due to Cardaliaguet [10] and is inspired by the “extremal aiming” method of Krasovskii and Subbotin [32]. It is very much in the spirit of proximal normals and approachability.
Proposition 2.1 For anyε >0,T >0and any NAα)∈A),there exists some NADα∈A such that,for allt∈ [0, T]and allv∈V:
dR(α),t)
%x,
x0, α(v),v- (t)&
≤ε.
The efurther result relies explicitly on the specific form (3) of the dynamicsf inΓ and extends the approximation from a compact interval toR+.
Proposition 2.2 Fixz∈Rk.For anyε >0and any NAα)∈A)in the gameΓ,there exists some NADα∈Asuch that,for allt≥0and allv∈V:
dR(α),t)
%zt,α(v),v, z-&
≤ε.
In particular this leads to step (b)
Proposition 2.3 LetCbe aB-set.For anyε >0there is some NADαin the gameΓ and someT such that for anyvinV
dC
%γ,α(v),v- (t)&
≤ε, ∀t≥T .
Step (c) is now to use the delay to define a strategy that depends only on the past moves.
Henceαinduces anε-approachability strategyσ,forCinG,. The last step (d) is
Proposition 2.4 Givenσ,a strategy that,approachCup toε >0in the gameG,,there existsσa strategy that approachCup to2εin the gameG.
and the idea is to use a martingale inequality to compare the trajectory and the trajectory “in law”.
2.7 Remarks
(1) In both cases, the main ideas to represent a RG as a DG is first to take as state variable either the total payoff or the average payoff but in both cases the corresponding dynam- ics is (asymptotically) smooth; the second aspect is to work with expectation so that the trajectory is deterministic.
(2) For recent extension of approachability conditions for games with signals on the out- come, see Perchet [59] or on more general spaces, see Lehrer [39], Perchet and Quin- campoix [60].
3 Recursive Structure of Compact Repeated Games and Shapley Operator
The simplest incarnation of the recursive formula for repeated games is the following equa- tion, due to Shapley [72], involving the discounted value of finite stochastic game
vλ(ω)=valX×Y
1
λg(ω, x, y)+(1−λ)"
ω)
Q(ω, x, y)[ω)]vλ(ω)) 2
, ∀ω∈Ω (6) whereQstands for the transition probability on the state spaceΩ,X=∆(I ) (the set of probabilities on X),Y =∆(J )and forh:I×J →R,h(x, y)=Ex,yh. It expresses the value of the game today as a function of the current payoff and the value from tomorrow on.
Two ingredients are concerned. First, the play of the game: the initial stateωis known by both players and also the new state will be, hence one can perform the analysis for each state separately: the “state” of the stochastic game is the natural “state variable” for the recursive formula. Second, the evaluation measure today and its decomposition between the weight today and the conditional distribution on stages from tomorrow on.
We will describe several extensions: first to general repeated game forms that correspond to point one above, then to general evaluation measures, point two.
3.1 Recursive Formula
We recall briefly that a recursive structure leading to an expression similar to (6) holds in general for two-person zero-sum repeated games described as follows:
M is a parameter space andga function fromI×J×MtoR. For eachm∈Mthis defines a two-person zero-sum game with action spacesIandJ for Player 1 and 2 respec- tively and payoff functiong(m,·). (Again to simplify the presentation we will consider the case where all sets are finite, avoiding in particular measurability issues.)
The initial parameterm1is chosen at random and the players receive some initial infor- mation about it, say a1 (resp.b1) for Player 1 (resp. Player 2). This choice is performed according to some initial probabilityπonM×A×B, whereAandBare the signal sets of both players.
At each stagen, Player 1 (resp. 2) chooses an actionin∈I(resp.jn∈J). This determines a stage payoffgn=g(mn, in, jn), wheremnis the current value of the parameter. Then a new value of the parameter is selected and the players get some information. This is generated by a mapQfromM×I×J to probabilities onM×A×B. Hence at stage na triple (mn+1, an+1, bn+1)is chosen according to the distributionQ(mn, in, jn). The new parameter ismn+1, and the signalan+1(resp.bn+1)is transmitted to Player 1 (resp. Player 2). Note that each signal may reveal some information about the previous choice of actions(in, jn)and both the previous (mn) and the new (mn+1) values of the parameter.
Stochastic gamescorrespond to public signals including the parameter, see Sorin [77], Neyman and Sorin [56], for a general presentation.Incomplete information gamescorre- spond to absorbing transition on the parameter and no further information (after the initial one) on the parameter (that remains fixed), see Aumann and Maschler [2], Sorin [77].
A play of the game induces a sequencem1, a1, b1, i1, j1, m2, a2, b2, i2, j2, . . .while the information of Player 1 before his play at stage n is a 1-private history of the form (a1, i1, a2, i2, . . . , an) and similarly for Player 2. The corresponding sequence of payoffs isg1, g2, . . .. Note that it is not known to the players except if included in the signals.
A strategyσ for Player 1 is a map from 1-private histories to∆(I ), the space of prob- abilities on the setI of actions: it defines the probability distribution of the stage action
as a function of the past known to Player 1; τ is defined similarly for Player 2. Such a couple(σ, τ )induces, together with the components of the game,π andQ, a probability distribution on plays, hence on the sequence of payoffs.
The recursive structure relies on the construction of the universal belief space, Mertens and Zamir [52], that represents the infinite hierarchy of beliefs of the players: Ω = M×Θ1×Θ2, whereΘi, homeomorphic to∆(M×Θ−i), is the type set of Playeri. A consis- tent probabilityρonΩis such that the conditional probability induced byρatθicoincides withθiitself, as elements of∆(M×Θ−i). The set of consistent probabilities isP⊂∆(Ω).
The signaling structure in the game, just before the actions at stagen, describes an infor- mation scheme (basically a probability onM× ˆA× ˆBwhereAˆ is a general signal space to Player 1 and the same for Player 2) that induces a consistent probability Pn∈P, see Mertens, Sorin and Zamir [49], Sections III.1, III.2, IV.3. This is referred to as the “entrance law”. Taking into account the existence of a value for the repeated game, we suppose that the strategies used are announced to the players. The entrance lawPnand the (behavioral) strategies at stagen(sayαnandβn), which are maps from type set to mixed action set, deter- mine the current payoff and the new entrance lawPn+1=H (Pn, αn, βn). This updating rule is the basis of the recursive structure andPis the “state space” for the recursive structure.
The stationary aspect of the repeated game is expressed by the fact thatH does not depend on the stagen.
The (generalized) Shapley operator is defined on the set of real bounded functions onP by:
Ψ(f )(P)=sup
α
infβ
/g(P, α, β)+f%
H (P, α, β)&0
. (7)
It is natural to introduce also the projective version Ψ(f )(P)=sup
α
infβ
/f%
H (P, α, β)&0
(8) that will be crucial in the asymptotic analysis.
Then the usual relations hold, see Mertens, Sorin and Zamir, (1994) Section IV.3, for the finitely repeated game:
(n+1)vn+1(P)=valα×β
/g(P, α, β)+nvn
%H (P, α, β)&0
(9) and the discounted game:
vλ(P)=valα×β
/λg(P, α, β)+(1−λ)vλ
%H (P, α, β)&0
(10) wherevalα×β=supαinfβ=infβsupαis the value operator for the “one stage game atP”.
This representation corresponds to a “deterministic” stochastic game on the state spaceP⊂
∆(Ω). Hence to each compact repeated gameG one can associate an auxiliary gameΓ having the same values that satisfy the recursive equation. However the play, hence the strategies in both games differ.
In the framework of a stochastic game with state space Ω, this representation would correspond to the level of probabilities on the state spaceP⊂∆(Ω). One recovers the initial Shapley formula (6) withPbeing the Dirac mass atω, then(α, β)reduces to(x, y)(i.e. only theωcomponent of(∆(I )×∆(J ))Ωis relevant),H (P, α, β)corresponds toQ(ω, x, y)and finallyvλ(H (P, α, β))=EQ(ω,x,y)vλ(.). The projective version is
Ψ(f )(ω)=valX×Y1"
ω)
Q(ω, x, y)[ω)]f (ω)) 2
. (11)
Let us describe the framework of repeated games with incomplete information (indepen- dent case with perfect monitoring).Mis a product spaceK×L,πis a product probability p⊗q withp∈∆(K),q∈∆(L)and in additiona1=kandb1=8. Given the parameter m=(k, 8), each player knows his own component and holds a prior on the other player’s component. From stage 1 on, the parameter is fixed and the information of the players after stagenisan+1=bn+1= {in, jn}.
The auxiliary stochastic gameΓ)corresponding to the recursive structure can be taken as follows: the “state space”M)is∆(K)×∆(L)and is interpreted as the space of beliefs on the true parameter.X=∆(I )KandY=∆(J )Lare the type-dependent mixed action sets of the players;gis extended onX×Y×M)byg(p, q, x, y)=!
k,8pkq8g(k, 8, xk, y8). Given (p, q, x, y), letx(i)=!
kxikpkbe the probability of actioni andp(i)be the conditional probability onK given the actioni, explicitlypk(i)= px(i)kxik (and similarly fory andq).
Since the actions are announced in the original game, and the strategy are known in the auxiliary game, these posterior probabilities are known by the players and we can work withΩ=M)and take asPthe set∆(M)). Finally the transitionQ(fromM)to∆(M))) is defined by the following procedure:Q(p, q, x, y)(p), q))=!
i,j;(p(i),q(j ))=(p),q))x(i)y(j ).
The resulting form of the Shapley operator is Ψ(f )(p, q)=sup
x∈Xinf
y∈Y
1"
k,8
pkq8g%
k, 8, xk, y8&
+"
i,j
x(i)y(j )f%
p(i), q(j )&2 (12) where with the previous notations
"
i,j
x(i)y(j )f%
p(i), q(j )&
=EQ(x,y,p,q)
,f (p), q))-
=f%
H (p, q, x, y)&
and the projective version writes Ψ(f )(p, q)=sup
x∈Xinf
y∈Y
1"
i,j
x(i)y(j )f%
p(i), q(j )&2
. (13)
The corresponding equations forvnandvλ are due to Aumann and Maschler, see [2], and Mertens and Zamir [50]. Recall that the auxiliary gameΓ)is “equivalent” to the original one in terms of values but uses different strategy spaces. In fact in the original game the strategy of the opponent is unknown, hence the computation of the posterior distribution is not feasible.
3.2 General Partition and Games in Continuous Time
Similarly the recursive structure extends to arbitrarily evaluation of the stream of stage pay- offs. First notice that the generalized Shapley operator can be used to describe any positive combination of weights between the past and the future like
S[c, c)](f )(P)=sup
α
infβ
/cg(P, α, β)+c)f%
H (P, α, β)&0
(14) withc >0,c)>0, since one has
S[c, c)](f )=cΨ (c)
cf )
andΨ =S[1,1]. Note also that the projective operatorΨ=S[0,1]appears as the recession operator associated toΨ. Following this line, one has the classical fixed point formula for the discounted value
vλ=S[λ,1−λ](vλ) (15)
and the recursive formula for thenstage value vn=S
31 n,1−1
n 4
(vn−1) (16)
with obviouslyv0=0.
Consider now an arbitrarily evaluation probability µon N,. We can approximate the corresponding value uniformly by considering measures with finite support. Thenµinduces a finite partitionΠof[0,1]witht0=0, tk=!k
m=1µm, tN =1. Thus the repeated game is naturally represented as a game played between times 0 and 1, where the actions are constant on each subinterval(tk−1, tk): its lengthµkis the weight of stagekin the original game. Let vΠ be its value. The recursive equation is
vΠ=val/
t1g+(1−t1)EvΠt1
0=S[t1,1−t1](vΠt1)
whereΠt1is the normalization on[0,1]of the trace of the partitionΠon the interval[t1,1]. Define nowVΠ(tk)as the value of the game starting at timetk, hence withN−kstages and total weight!N
m=k+1µm. One obtains the alternative recursive formula VΠ(tk)=val/
(tk+1−tk)g+EVΠ(tk+1)0
=S[tk+1−tk,1]%
VΠ(tk+1)&
. (17)
The stationarity property of the game form induces time homogeneity
VΠ(tk)=(1−tk)VΠtk(0) (18) where, as above, Πtk stands for normalization of Π restricted to the interval[tk,1]. By taking the linear extension we define this way for every finite partitionΠ, a functionVΠ(t) on[0,1].
3.3 Further Extensions 3.3.1 Non Expansive Maps
Given a non expansive mapTon a Banach space, one defines inductively Wn=T(Wn−1)=Tn(0)
and forλ∈ ]0,1[:
Wλ=T%
(1−λ)Wλ
&
. Then Wnn andλWλplay the role ofvnandvλ, see (16), (15).
3.3.2 Random Duration(Neyman and Sorin [57])
An uncertain duration processΘ= #(A,B, µ), (sn)n≥0, θ$is a triple whereθ is an integer- valued random variable defined on a probability space (A,B, µ) with finite expectation E(θ ), and each signalsnis a measurable function defined on the probability space(A,B, µ) with finite rangeS. An equivalent representation is through a random tree with finite ex- pected length where the nodes at distancencorrespond to the information sets at stagen.
Givenζn, known to the players, its successor at stagen+1 is chosen at random according to the subtree defined byΘatζn. One can define the random iterateTΘof a non expansive map, Neyman [54]. Then a recursive formula analogous to (17) holds.
3.3.3 Non Stationary Set Up
Note that some of the results above like (17) have a natural extension to a non stationary set up and are very similar to the value of time discretization of a quantitative differential game.
4 Asymptotic Approach
We consider now the asymptotic behavior ofvnasngoes to∞, orvλasλgoes to 0.
4.1 Basic Results
Concerning games with incomplete information on one side the first results proving the existence of limn→∞vn and limλ→0vλ are due to Aumann and Maschler (1966), see [2], including in addition an identification of the limit as Cav∆(K)u. Here u(p)= val∆(I )×∆(J )!
kpkg(k, x, y)is the value of the one shot non revealing game, where the informed player does not use his information and CavC is the concavification operator:
givenφ, a real bounded function defined on a convex setC,CavC(φ)is the smallest func- tion greater thanφand concave, onC.
Extensions of these results to games with lack of information on both sides were achieved by Mertens and Zamir [50]. In addition they identified the limit as the only solution of the system of implicit functional equations with unknownφ:
φ(p, q)=Cavp∈∆(K)min{φ, u}(p, q),
(19) φ(p, q)=Vexq∈∆(L)max{φ, u}(p, q).
Here again u stands for the value of the non revealing game: u(p, q) = valX×Y!
k,8pkq8g(k, 8, x, y)and we will writeMZfor the operator corresponding to (19)
φ=MZ(u). (20)
Mertens and Zamir provided two proofs for this result: the first part is the same and shows by using the recursive structure and constructing sophisticated reply strategies that h=lim infvnsatisfies
h(p, q)≥Cavp∈∆(K)Vexq∈∆(L)max{h, u}(p, q). (21)
Then one proof shows that Player 2 can achieve asymptotically any function satisfying (21).
The second proof constructs inductively dual sequences of functions {cn}with c0≡ −∞
and
cn+1(p, q)=Cavp∈∆(K)Vexq∈∆(L)max{cn, u}(p, q) and similarly{dn}, that converges respectively tocanddsatisfying
c(p, q)=Cavp∈∆(K)Vexq∈∆(L)max{c, u}(p, q),
(22) d=Vexq∈∆(L)Cavp∈∆(K)min{d, u}(p, q).
Now acomparison principleis used to deducec≥d. By contradiction, otherwise consider an extreme point(p0, q0)of the (convex hull of the) set whered−cis maximal. Then one shows that theVexandCavoperators in the above formula (19) at(p0, q0)are trivial which impliesc(p0, q0)≥u(p0, q0)≥d(p0, q0).
Recall that in this framework the uniform value may not exists, see Section6.1.
As for stochastic games, the existence of limλ→0vλin the finite case (Ω, I, J finite) is due to Bewley and Kohlberg [4] using algebraic arguments: the equation (6) can be written as a finite set of polynomial equalities and inequalities involving{xλk, yλk, vλ(k), λ}thus it defines a semi-algebraic set in some Euclidean spaceRN, hence by projectionvλ has an expansion in power series. The existence of limn→∞vnis obtained by comparison, Bewley and Kohlberg [5], see Theorem (4.1).
4.2 Operator Approach
4.2.1 Non-Expansive Monotone Maps
As in Section3.3.1we define similar iterates for an operatorTmappingFto itself, where Fis a subset of the setF0of real bounded functions on some setΩ. Assume:
(1) Fis a convex cone, containing the constants and closed for the uniform norm.
(2) Tis monotonic and translates the constants. (In particularTis non expansive.) Define
Vn=Tn[0], Vλ=T[(1−λ)Vλ] hence by normalizingVn=nvn,vλ=λVλand introducing
Φ(ε, f )=<T 31−ε
ε f 4
one obtains as before
vn=Φ (1
n, vn−1
)
, vλ=Φ(λ, vλ)
and we consider the asymptotic behavior of these families of functions which thus relies on the properties ofΦ(ε,·), asεgoes to 0. Recall that in the case of a Shapley operator one has S[ε,1−ε](f )=Φ(ε, f ). Obviously any accumulation pointwof the familyvnorvλwill satisfy
w=Φ(0, w) (23)
hence is a fixed point of the projective operator (8).
A general result in this framework is due to Neyman [54]:
Theorem 4.1 Ifvλis of bounded variation in the sense that for any sequenceλidecreasing to0
"
i
.vλi+1−vλi.<∞ (24) thenlimn→∞vn=limλ→0vλ.
Let us define sets of functions that will correspond to upper and lower bounds on the families of values following Rosenberg and Sorin [70], Sorin [80].
4.2.2 Uniform Domination
LetL+be the set of functionsf ∈F that satisfy: there exists R0≥0 such thatR≥R0
implies
T(Rf )≤(R+1)f. (25)
L+is defined in a dual way.
Theorem 4.2 Iff∈L+, lim supn→∞vnandlim supλ→0vλare less thanf. Note that the above condition is equivalent to
Φ(ε, f )≤f forε >0 small enough.
Corollary 4.1 In particular if the intersection of the closure ofL+andL−is not empty, then bothlimn→∞vnandlimλ→0vλexist and coincide.
4.2.3 Pointwise Domination
More generally, when the setΩis not finite, one can introduce the larger classS+of func- tions satisfying
θ+(f )(ω)=lim sup
R→∞
/T(Rf )(ω)−(R+1)f (ω)0
≤0, ∀ω∈Ω.
S−is defined similarly withθ−(f )≥0.
Theorem 4.3 AssumeΩcompact.LetS0+(resp.S0−)be the set of continuous functions in S+(resp.S−).Then
f+≥f−
for anyf+∈S0+andf−∈S0−.Hence the intersection of the closures ofS0+andS0−contains at most one point.
Thas the recession property if limε→0Φ(ε, f )(ω)=limε→0εT(fε)(ω)=RT(f )(ω)ex- ists. The next result is due to Vigeral [89].
Theorem 4.4 Assume thatThas the recession property and is convex.Then the family{vn} (resp.{vλ})has at most one accumulation point.
The proof uses the inequalityRT(x+y)≤T(x)+RT(y)and relies on properties of the family of operatorsTmdefined by
Tm(f )= 1
mTm(mf ).
4.2.4 Application to Games
The uniform domination property allows to prove existence and equality of limvnand limvλ in the following classes:
Theorem 4.5(Rosenberg and Sorin [70]) Absorbing games with compact action spaces.
These are stochastic games where the state changes at most once. Notice that the alge- braic approach cannot be used.
Theorem 4.6(Sorin [79], Vigeral [90]) Recursive games with finite state space and compact action spaces.
These are stochastic games where the payoff is zero on non absorbing states, Everett [27].
The pointwise domination property is used to prove existence and equality of limvnand limvλthrough the derived game, see Section4.2.3, in the following cases:
Theorem 4.7 (Rosenberg and Sorin [70]) Games with incomplete information on both sides.
Recall that the first proof is due to Mertens and Zamir [50]. Any accumulation pointw of the familyvλ (resp.vn) asλ→0 belongs to the closure ofS+, hence by symmetry the existence of a limit follows.
Theorem 4.8(Rosenberg [65]) Finite absorbing games with incomplete information on one side.
This is the first proof in this area where both stochastic and information aspects are present.
Theorem 4.9(Vigeral [89]) Existence oflimvn(resp. limvλ)in all games where one player controls the transition and the family{vn}(resp.{vλ})is relatively compact.
This follows from the property for convexTand applies in particular for (finite) dynamic programming, games with incomplete information on one side and mixture of those.
4.2.5 Derived Game(Rosenberg and Sorin [70])
Still dealing with the Shapley operator, one can use the existence of a pointwise limit:
ϕ(f )(ω)= lim
ε→0+
Φ(ε, f )(ω)−Φ(0, f )(ω)
ε .
ϕ(f )(ω)is the value of the “derived game” with payoffg(ω, x, y)−E(ω,x,y)f, played on the product of the subsets of optimal strategies in the gameΦ(0, f ).
In the setup of games with incomplete information on both sides (as well as in absorbing games, following an idea of Kohlberg [30]), any accumulation point of the sequence of values is close to a functionf withϕ(f )≤0. This implies the existence, by Theorem4.3, and the characterization of the asymptotic value as follows:
LetEf be the projection of the extreme points of the epigraph off. Thenv=limvn= limvλis a saddle continuous function satisfying both inequalities:
p∈Ev(·, q) ⇒ v(p, q)≤u(p, q), q∈Ev(p,·) ⇒ v(p, q)≥u(p, q) (26) whereuis the value of the non revealing game.
Then one shows that this recovers the characterization of Mertens and Zamir, (19), see also Laraki [33].
4.2.6 Random Duration Process(Neyman and Sorin [57])
The recursive formula for random duration processes implies thatvΘ=TE(θ )Θ(0) has a limit, asE(θ )goes to∞either in finite stochastic games with signals and absorbing or recursive games with compact action sets.
The uniform domination Theorem 4.2 is true for vΘ. Similarly limE(θ )→∞vΘ exists and equalsMZ(u) for games with lack of information on both sides. Finally, assumeΘ monotonic in the sense that the conditional expected duration decreases with time. Then Theorem4.1still holds.
4.3 Comparison Principle
The operator approach using the generalized Shapley operator allows for an alternative proof of existence and characterization of the asymptotic value limvλfor games with incomplete information on both sides, due to Laraki [33]. The recursive equation is
Φ(λ, vλ)(p, q)=sup
x∈Xinf
y∈Y
1
λg(p, q, x, y)+(1−λ)"
i,j
x(i)y(j )vλ
%p(i), q(j )&2
=vλ(p, q). (27)
Remark that the family of functions{vλ(p, q)}is uniformly Lipschitz, hence relatively com- pact. To prove convergence it is enough to show that there is only one accumulation point.
Note first that any accumulation pointwsatisfies
Φ(0, w)=w (28)
i.e. is a fixed point of the projective operator (13). Assume now thatw1andw2,w1≥w2are two different accumulation points and let(p0, q0)an extreme point of the (convex hull of) the set where the differencew1−w2is maximal. Using (28) and the fact that all functions involved are saddle, this implies that the set X(0, w1)(p0, q0) of profile of mixed actions x∈X optimal inΦ(0, w1)is included in the setNRX(p0)of non revealing actions atp0
(meaning that for any moveihaving positive probabilityp(i)=p0). Consider a familyvλn
converging tow1and letxnbe optimal forΦ(λn, vλn)(p0, q0). Jensen’s inequality leads to vλn(p0, q0)≤λng(p0, q0, xn, j )+(1−λn)vλn(p0, q0), ∀j∈J
thusvλn(p0, q0)≤g(p0, q0, xn, j )hence lettingx¯being an accumulation point of the family {xn}, thus inNRX(p0)(by upper semi continuity) one obtains asλngoes to 0:
w1(p0, q0)≤g(p0, q0,x, j ),¯ ∀j∈J
which impliesw1(p0, q0)≤u(p0, q0). The dual property implies convergence. Moreover one recovers the characterization through the variational inequalities (26) hence one identi- fies the limit asMZ(u).
Consider the framework of splitting games, Sorin [77]. Recall that givenHfrom∆(K)×
∆(L)toRthe corresponding Shapley operator is defined on real functionsf on∆(K)×
∆(L)by
Ψ(f )(p, q)= sup
µ∈MpK
inf
ν∈MLq
#
∆(K)×∆(L)
,H (p, q)+f (p, q)-
µ(dp)ν(dq)
whereMpKstands for the set of probabilities on∆(K)with expectationp(and similarly for MqL). A procedure analogous to the previous one has been developed by Laraki [33,34,36].
It allows, by introducing a family of discounted games and identifying the limit of the val- ues, to extend the range and the properties of theMZoperator, in particular to product of polytopes and Lipschitz functions.
4.4 The Limit Game
In addition to the convergence of the values, one could look for a normal form gameG on [0,1]with strategy setsUandVand valuewsuch that:
(1) the play at timetinGwould be similar to the play at stage[tn]inGn(or at the fraction tof the total weight of the game for general evaluation)
(2) ε-optimal strategies inGwould induce 2ε-optimal strategies inGn, for largen.
Obviously then, limvnexists and isw.
One example was explicitly described (strategies and payoff) for the Big Match with incomplete information on one side in Sorin [74].Vis the set of measurable mapsf from [0,1]to∆(J ). Hence Player 2 playsf (t )at timet and the associated strategy inGn is a piecewise constant approximation.Uis the set of profiles of stopping times{ρk}, k∈K, i.e.
increasing maps from [0,1]to[0,1]andρk(t)is the probability to stop the game before timet if the private information isk. The corresponding strategy of Player 1 inGnhas to satisfy the property that the probability of stopping the game before stagemis ρ(mn). In the initial Big Match of Blackwell and Ferguson [7] (with complete information) one has f (t )≡12andρ(t )=t.
The auxiliary differential games introduced by Vieille [87] to study in weak approacha- bility, Section2.3is also an example of a limit game.
The procedure is very similar to the approximation scheme of Souganidis [83] with F (ρ, f )=val{ρg+Ef} =ρΦ(fρ), however the limit game is not given a priori and the operator is not smooth.
A recent example is in Laraki [37] and deals with absorbing games. Let f (i, j ) be the non absorbing payoff, g(i, j )the absorbing payoff,p(i, j )the probability of non ab- sorption and p,=1−p. For h:I →R defineh onRI by linear interpolationh(ζ )=
!
i∈Iζ (i)h(i). Given a stationary strategyx∈X=∆(I )and a stationary pure strategyj∈
J the payoff in the discounted game isr(λ, x, j )=λf (x, j )+(1−λ)[p(x, j )r(λ, x, j )+
!
ix(i)p,(i, j )g(i, j )]. Define the absorbing part of the payoffa(i, j )=p,(i, j )g(i, j )then r(λ, x, j )=λf (x, j )+(1−λ)a(x, j )
λp(x, j )+p,(x, j ) and
vλ= max
x∈∆(I )min
j∈Jr(λ, x, j ).
Letw=limvλn an accumulation point of the values andxnan optimal stationary strategy of Player 1 inGλn. Thus
vλn≤λnf (xn, j )+(1−λn)a(xn, j )
λnp(xn, j )+p,(xn, j ) , ∀j∈J. (29) Assume thatxnconverges tox.
Ifp,(x, j ) >0, going to the limit in (29) impliesw≤pa(x,j ),(x,j ). Otherwise, lettingαn=xλnn one has
vλn≤f (xn, j )+(1−λn)a(αn, j ) p(xn, j )+p,(αn, j ) hence for everyε, there existsα∈A=(R+)I such that
w≤f (x, j )+a(α, j ) 1+p,(α, j ) +ε.
Thus
w≤ sup
x∈X,α∈A
minj∈J
3a(x, j )
p,(x, j )1p,(x,j )>0+f (x, j )+a(α, j )
1+p,(α, j ) 1p,(x,j )=04
=W. (30) On the other hand let(x, α),ε-optimal in (30). Consider nowx[λ]proportional tox+λα, as a strategy of Player 1 inGλ, forλsmall enough and letj be a best reply, that one can take constant on a subsequence ofλn. Thenvλn≥r(λn, x[λn], j )with
r%
λn, x[λn], j&
=λnf (x[λn], j )+(1−λn)a(x[λn], j ) λnp(x[λn], j )+p,(x[λn], j ) which is
r%
λn, x[λn], j&
=λnf (x, j )+(λn)2f (α, j )+(1−λn)[a(x, j )+λna(α, j )] λnp(x, j )+(λn)2p(α, j )+p,(x, j )+λnp,(α, j ) . Hence ifp,(x, j ) >0, the limit ofr(λn, x[λn], j )is pa(x,j ),(x,j ) and is otherwise f (x,j )1+p,+(α,j )a(α,j ),if p,(x, j )=0.
It follows thatw≥W.
Note that the proof shows that limvλexists and is the value of the auxiliary gameGwith U=X×A,V=Y×Band payoff function
L(x, α, y, β)= a(x, y)
p,(x, y)1p,(x,y)>0+f (x, y)+a(α, y)+a(x, β)
1+p,(α, y)+p,(x, β) 1p,(x,y)=0. Given a strategy(x, α)inG, its image inGλisx+λα(normalized).