FULL LENGTH PAPER
Exponential weight algorithm in continuous time
Sylvain Sorin
Received: 7 May 2005 / Accepted: 25 July 2006
© Springer-Verlag 2007
Abstract The exponential weight algorithm has been introduced in the framework of discrete time on-line problems. Given an observed process $\{X_m\}_{m=1,2,\dots}$, the input at stage $m+1$ is an exponential function of the sum $S_m = \sum_{\ell=1}^m X_\ell$. We define the analog algorithm for a continuous time process $X_t$ and prove similar properties in terms of external or internal consistency. We then deduce results for discrete time from their counterpart in continuous time. Finally we compare this approach to another continuous time approximation of a discrete time exponential algorithm based on the average sum $S_m/m$.
Keywords Exponential weight algorithm · Continuous time · Consistency
Mathematics Subject Classification (2000) 68W · 91A · 90C
1 Presentation
We consider the exponential weight algorithm and its consistency properties in the basic prediction problem for individual sequences. The purpose of the paper is to define a continuous time version of this algorithm and to prove that it satisfies analogous consistency properties. In fact these proofs are rather short, and one of the main advantages of this approach is that discrete time results can be deduced independently from their continuous time counterparts. This interface between discrete time and continuous time processes has been very effective for a long period, see e.g. [6]. However the approach is quite different from the usual (asymptotic) approximation through differential equations (or inclusions), see e.g. [3,4] for related examples. This is due to the choice of the state variable of the process and is discussed in Sect. 6.
S. Sorin
Equipe Combinatoire et Optimisation, UFR 929, Université P. et M. Curie-Paris 6, 175 Rue du Chevaleret, 75013 Paris, France
S. Sorin (✉)
Laboratoire d'Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France
e-mail: [email protected]
The model is as follows: $\{X_n\}$ denotes a sequence of vectors in $[0,1]^K$. At each stage $n$, a predictor, having observed the past realizations $X_1, \dots, X_{n-1}$, chooses a component $k_n$ in $K$. The corresponding outcome is $x_n = X_n^{k_n}$. Denote the past history $(X_1, k_1, \dots, X_{n-1}, k_{n-1})$ by $h_{n-1}$, and the induced $\sigma$-algebra by $\mathcal{H}_{n-1}$. An algorithm or a strategy $\sigma$ in the prediction problem is specified by the choice of $p_n(h_{n-1}) \in \Delta(K)$ (the simplex of $\mathbb{R}^K$), which is the law of $k_n$ given the past history $h_{n-1}$. Note that the law of $X_n$ may also depend on $h_{n-1}$.
Given a sequence $\{u_m\}$, let $\bar{u}_n = \frac{1}{n}\sum_{m=1}^n u_m$ denote the average of the first $n$ terms.
The average external regret evaluation at stage $n$ is the vector $r_n = \{r_n^k\}_{k \in K}$ defined by:
$$r_n^k = \bar{X}_n^k - \bar{x}_n.$$
It compares the actual (average) payoff to each payoff corresponding to a constant component choice, see [8,9,11].
Definition A strategy $\sigma$ satisfies external consistency if, for every process $\{X_m\}$: $\max_{k \in K} [r_n^k]^+ \to 0$ a.s., as $n \to \infty$.
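As an illustration of the regret evaluation above, the following minimal sketch computes the vector $r_n$ from a finite record of observations and choices. The function name `average_external_regret` and the sample data are assumptions made for the example; they do not come from the paper.

```python
def average_external_regret(X, choices):
    """Return the vector r_n with r_n^k = mean_m X_m^k - mean_m X_m^{k_m}.

    X is a list of n payoff vectors in [0,1]^K; choices[m] is the component
    chosen at stage m (0-based indices are an assumption of this sketch).
    """
    n = len(X)
    K = len(X[0])
    avg_outcome = sum(X[m][choices[m]] for m in range(n)) / n
    return [sum(X[m][k] for m in range(n)) / n - avg_outcome for k in range(K)]


# Hypothetical data: two components, three stages.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
choices = [1, 1, 0]
regret = average_external_regret(X, choices)
positive_part = max(max(r, 0.0) for r in regret)
```

External consistency asks that `positive_part` vanish almost surely as the number of stages grows, whatever the sequence of vectors.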
Remarks (1) In the framework of a repeated finite $I$-person game defined by action sets $J^i$, $i \in I$, let $M : J = \prod_i J^i \to \mathbb{R}$ be the payoff function of player 1, the predictor. Here, $K = J^1$ and $X_n$ is the vector $M(\cdot, j_n^{-1}) \in \mathbb{R}^K$ corresponding to the payoff induced by the profile of actions $j_n^i$ by each player $i$, $i \neq 1$, at stage $n$.
Usually $M$ is known, and $j_n^{-1}$ is announced to player 1, who then knows the vector $X_n$. In the current situation one does not assume $M$ known by player 1 (not even $J^i$, $i \neq 1$), but only his own action set $K$, and the fact that all payoffs belong to $[0,1]$: he is just told the vector $X_n$.
(2) To obtain robust results no assumption is made on the sequence $\{X_n\}$: the predictor is not "Bayesian".
The content of the paper is the following. Section 2 recalls the main results concerning the discrete time exponential weight algorithm. Section 3 introduces the continuous counterpart and its properties. Section 4 deduces, from the continuous time results, the discrete time analogs: this gives an alternative simple proof of properties of the exponential weight algorithm. In Sect. 5 two extensions are described: restriction on the information and internal consistency. Finally Sect. 6 discusses the relation between discrete and continuous time, for algorithms based on the average sum $S_n/n$.
2 The exponential weight algorithm: discrete time
Notation Given a vector $x \neq 0$ in $\mathbb{R}^K_+$, $\mathcal{N}[x]$ denotes its normalization in the simplex $\Delta(K)$:
$$\mathcal{N}[x]^k = \frac{x^k}{\sum_{j=1}^K x^j}.$$
Definition The discrete exponential weight (EW) algorithm (see e.g. [1,10,17]) with positive parameter $A$, $EW(A)$, is defined by $p_{n+1}(h_n) = p_{n+1} = \mathcal{N}[\{p_n^k e^{A X_n^k}\}_k]$ or equivalently
$$p_{n+1}^k = \frac{\exp\left(A \sum_{m=1}^n X_m^k\right)}{\sum_j \exp\left(A \sum_{m=1}^n X_m^j\right)}.$$
Recall that $p_{n+1}$ describes the law of the random choice $k_{n+1}$.
An alternative definition is $EW^*(\alpha)$, where $\alpha$ is a positive parameter, and $p_{n+1} = \mathcal{N}[\{(1+\alpha)^{S_n^k}\}_k]$ with $S_n = \sum_{m=1}^n X_m$.
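The definition of $EW(A)$ can be sketched as follows. This is only an illustration under stated assumptions: the names `ew_weights` and `run_ew` and the sample sequence are hypothetical, and the subtraction of the maximum is a standard numerical safeguard that does not change the normalized weights.

```python
import math


def ew_weights(cumulative, A):
    """Normalized exponential weights: p^k proportional to exp(A * S^k)."""
    top = max(cumulative)  # subtracting the max leaves the normalization unchanged
    w = [math.exp(A * (s - top)) for s in cumulative]
    total = sum(w)
    return [x / total for x in w]


def run_ew(X, A):
    """Return the distributions p_1, ..., p_n used by EW(A) on the sequence X."""
    K = len(X[0])
    S = [0.0] * K
    ps = []
    for x in X:
        ps.append(ew_weights(S, A))          # p_m depends only on X_1, ..., X_{m-1}
        S = [s + xk for s, xk in zip(S, x)]  # update the cumulative sums
    return ps


# Hypothetical sequence with two components.
X = [[1.0, 0.0], [1.0, 0.5], [0.0, 1.0]]
ps = run_ew(X, A=1.0)
```

The first distribution is uniform, since no observation is available at stage 1.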
For the sake of completeness and to compare with the continuous time argument, we reproduce the basic property, following [1].
Proposition 2.1 $\sigma(n) = EW^*(1/\sqrt{n})$ satisfies conditional expected external consistency in the following sense: there exists a constant $M$ such that, for any component $k$ and any process $\{X_m\}$:
$$\bar{X}_n^k - \frac{1}{n}\sum_{m=1}^n E_{\sigma(n)}(x_m \mid \mathcal{H}_{m-1}) \le M/\sqrt{n}. \qquad (1)$$
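Proof Let $W_n = \sum_k (1+\alpha)^{S_n^k}$, hence (recall that $0 \le X_m^k \le 1$)
$$\frac{W_{n+1}}{W_n} = \sum_k \frac{(1+\alpha)^{S_n^k}(1+\alpha)^{X_{n+1}^k}}{W_n} = \sum_k p_{n+1}^k (1+\alpha)^{X_{n+1}^k} \le \sum_k p_{n+1}^k (1+\alpha X_{n+1}^k) = 1 + \alpha \langle p_{n+1}, X_{n+1} \rangle.$$
It follows that
$$\log\left(\frac{W_n}{W_0}\right) \le \alpha \sum_{m=1}^n \langle p_m, X_m \rangle$$
and
$$\sum_{m=1}^n \langle p_m, X_m \rangle \ge \frac{1}{\alpha}\left(S_n^k \log(1+\alpha) - \log K\right)$$
since $W_n \ge (1+\alpha)^{S_n^k}$, for all $k$ in $K$.
Thus for $\alpha$ small enough one has
$$\frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle \ge \bar{X}_n^k (1 - \alpha/2) - \frac{\log K}{\alpha n}.$$
The choice of $\alpha = 1/\sqrt{n}$ leads to
$$\max_k \bar{X}_n^k - \frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle \le M/\sqrt{n}$$
for some constant $M$.
Note that $\langle p_m, X_m \rangle = E_{\sigma(n)}(x_m \mid \mathcal{H}_{m-1})$ so that the above inequality gives the required result. □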
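The inequalities in this proof can be checked numerically on a sample path. The sketch below is an illustration only: the function name, the random sequence and the seed are assumptions, and the explicit constant $\tfrac{1}{2} + \log K$ is one admissible value of $M$ obtained from $\log(1+\alpha) \ge \alpha - \alpha^2/2$ and $\bar{X}_n^k \le 1$.

```python
import math
import random


def ew_star_expected_payoff(X, alpha):
    """Sum over stages of <p_m, X_m>, with p_m^k proportional to (1+alpha)^{S_{m-1}^k}."""
    K = len(X[0])
    S = [0.0] * K
    total = 0.0
    for x in X:
        w = [(1.0 + alpha) ** s for s in S]
        W = sum(w)
        total += sum(wk / W * xk for wk, xk in zip(w, x))
        S = [s + xk for s, xk in zip(S, x)]
    return total, S


random.seed(0)  # hypothetical data, fixed for reproducibility
n, K = 400, 3
X = [[random.random() for _ in range(K)] for _ in range(n)]
alpha = 1.0 / math.sqrt(n)
payoff, S = ew_star_expected_payoff(X, alpha)

# Lower bounds (1/alpha) * (S_n^k log(1+alpha) - log K), one per component k.
lower_bounds = [(s * math.log(1.0 + alpha) - math.log(K)) / alpha for s in S]
average_regret = max(S) / n - payoff / n
```

On any sequence in $[0,1]^K$ the quantity `payoff` dominates every entry of `lower_bounds`, so `average_regret` is at most $(\tfrac{1}{2} + \log K)/\sqrt{n}$.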
Obviously this implies:
Corollary 2.1 $\sigma(n)$ satisfies expected consistency:
$$E_{\sigma(n)}\left([r_n^k]^+\right) \to 0 \quad \text{as } n \to \infty.$$
One way to obtain almost sure convergence to 0 is to use the next basic martingale property (see e.g. [13]).
Proposition 2.2 Let $U_m$ be a sequence of uniformly bounded (even in $L^2$) random variables on a probability space $(\Omega, \mathcal{F}, P)$, adapted to a filtration $\mathcal{F}_m$. Assume $E(U_m \mid \mathcal{F}_{m-1}) = 0$, then
$$\frac{1}{n}\sum_{m=1}^n U_m \to 0 \quad \text{as } n \to \infty.$$
Let now $\sigma$ be defined as follows: given a sequence $n_m$ going to $\infty$, use $\sigma(n_1)$ for $K_1 \ge n_2/\sqrt{n_1}$ blocks of size $n_1$ (where the entry on block $m$ for running $\sigma(n_1)$ is $X_{m n_1 + \ell}$, $\ell = 0, \dots, n_1 - 1$), then inductively $\sigma(n_m)$ for $K_m \ge n_{m+1}/\sqrt{n_m}$ blocks of size $n_m$. Propositions 2.1 and 2.2 thus imply the following result:
Theorem 2.1 $\sigma$ satisfies external consistency.
Remarks (1) The optimal choice of the parameter, $A$ or $\alpha$, in Proposition 2.1, is a function of the length $n$ of the process (see the discussion in Sect. 6).
(2) To implement the algorithm the actual past play of the predictor (namely the sequence $\{k_m\}$) is not used. (This property also holds true for other algorithms like Fictitious Play or Smooth Fictitious Play [11].)
(3) Invariance by translation: one can use the vector of total payoffs $(S_n^k)_k$ or the vector of total regrets $\left(\sum_{m=1}^n r_m^k = S_n^k - \sum_{m=1}^n x_m\right)_k$ to define $p_n$, since $p_{n+1} = \mathcal{N}\left[\left(p_n^k e^{A(X_n^k + Y_n)}\right)_k\right]$ for any $Y_n \in \mathbb{R}$, where $\mathcal{N}[\cdot]$ denotes normalization in the simplex. This property is specific to the exponential weight and is not shared by regret procedures based on Blackwell's approachability [5,14].
(4) It is enough to satisfy a property like: for all $n$, there exists $\sigma(n)$ such that
$$\sum_{m=1}^n \left(X_m^k - \langle p_m, X_m \rangle\right) \le o(n)$$
(uniformly in $\{X_m\}$) to obtain Theorem 2.1 by defining $\sigma$ through concatenation as above and then using Proposition 2.2.
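The invariance by translation of remark (3) is easy to verify numerically. In the sketch below the function name `normalize_exp` and the numerical values are hypothetical; the common shift plays the role of the realized total payoff $\sum_m x_m$.

```python
import math


def normalize_exp(v, A=1.0):
    """Return the point of the simplex proportional to exp(A * v^k)."""
    top = max(v)
    w = [math.exp(A * (x - top)) for x in v]
    total = sum(w)
    return [x / total for x in w]


S = [3.0, 1.5, 2.0]   # hypothetical total payoffs S_n^k
shift = 2.25          # hypothetical realized total payoff, common to all components
from_payoffs = normalize_exp(S)
from_regrets = normalize_exp([s - shift for s in S])
```

Both calls return the same distribution, which is why the algorithm can be driven either by total payoffs or by total regrets.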
3 The multiplicative weight algorithm: continuous time
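Given a measurable process $X_t$, $t \ge 0$, with values in $[0,1]^K$, let $S_t = \int_0^t X_s \, ds = t \bar{X}_t$.
Definition A measurable process $p_t \in \Delta(K)$ is a continuous time exponential weight algorithm (CTEW) if it satisfies
$$p_t = \mathcal{N}\left[\left(\exp(S_t^k)\right)_k\right],$$
where $\mathcal{N}[\cdot]$ denotes normalization in the simplex. Let $W_t = \sum_k \exp(S_t^k)$ so that $p_t^k W_t = \exp(S_t^k)$.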
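Theorem 3.1 Conditional expected external consistency holds for CTEW in the sense that, for any $T > 0$ and any $k$:
$$\frac{1}{T}\left[\int_0^T X_s^k \, ds - \int_0^T \langle p_s, X_s \rangle \, ds\right] \le \frac{\log K}{T}.$$
Proof One has
$$\dot{W}_t = \sum_k \exp(S_t^k) X_t^k = \sum_k W_t \, p_t^k X_t^k = \langle p_t, X_t \rangle W_t.$$
Hence
$$W_t = W_0 \exp\left(\int_0^t \langle p_s, X_s \rangle \, ds\right).$$
Thus, $W_t \ge \exp(S_t^k)$ for every $k$ implies:
$$\int_0^t \langle p_s, X_s \rangle \, ds \ge \int_0^t X_s^k \, ds - \log W_0.$$
Since $W_0 = K$, the result follows. □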
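For a piecewise constant process the bound of Theorem 3.1 can be evaluated in closed form, since $\frac{d}{dt}\log W_t = \langle p_t, X_t \rangle$ gives $\int_0^T \langle p_s, X_s \rangle \, ds = \log(W_T/W_0)$. The following sketch relies on that identity; the function names and the sample step process are assumptions made for the illustration.

```python
import math


def log_sum_exp(v):
    """Numerically stable log(sum_k exp(v^k))."""
    top = max(v)
    return top + math.log(sum(math.exp(x - top) for x in v))


def ctew_regret_on_step_process(values, durations):
    """Return max_k int_0^T X_s^k ds - int_0^T <p_s, X_s> ds under CTEW,
    for a process equal to values[i] on consecutive intervals of length durations[i]."""
    K = len(values[0])
    S = [0.0] * K
    for x, d in zip(values, durations):
        S = [s + d * xk for s, xk in zip(S, x)]
    expected_payoff = log_sum_exp(S) - math.log(K)  # log(W_T / W_0) with W_0 = K
    return max(S) - expected_payoff


# Hypothetical step process with three components on [0, 6].
values = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.9], [0.5, 0.5, 1.0]]
durations = [2.0, 1.0, 3.0]
gap = ctew_regret_on_step_process(values, durations)
```

The value `gap` never exceeds $\log K$, independently of the horizon, which is the content of the theorem before division by $T$.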
Remark The interpretation is that the conditional law of the choice $k_t$ at time $t$, given the past, is $p_t$. Let $x_t = X_t^{k_t}$ be the outcome at time $t$; then the average regret vector at time $t$ is
$$r_t^k = \bar{X}_t^k - \bar{x}_t$$
and the expectation of $\bar{x}_t$ is given by
$$E(\bar{x}_t) = E\left(\frac{1}{t}\int_0^t \langle p_s, X_s \rangle \, ds\right).$$
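The previous proof holds as well while replacing the integral $S_t$ of the process by the integral of the (conditional expected) regret $R_t$ defined by
$$R_t^k = \int_0^t \left(X_s^k - \langle p_s, X_s \rangle\right) ds.$$
Note that $p_t$ satisfies also
$$p_t = \mathcal{N}\left[\left(\exp(R_t^k)\right)_k\right],$$
where $\mathcal{N}[\cdot]$ denotes normalization in the simplex. Let $V_t = \sum_k \exp(R_t^k)$. Then:
$$\dot{V}_t = \sum_k \exp(R_t^k)\left(X_t^k - \langle p_t, X_t \rangle\right) = \sum_k V_t \, p_t^k \left(X_t^k - \langle p_t, X_t \rangle\right) = 0.$$
Hence $V_t$ is constant and $V_t \ge \exp R_t^k$, for every $k$, implies
$$\int_0^t \langle p_s, X_s \rangle \, ds \ge \int_0^t X_s^k \, ds - \log V_0.$$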
The same computation as above extends to the following framework:
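Proposition 3.1 Let $P$ be a $C^1$ function from $\mathbb{R}^K$ to $\mathbb{R}$ with $\nabla P \ge 0$ and $\nabla P \neq 0$, such that $x^k \to +\infty$, for some component $k$, implies $P(x) \to +\infty$. If $p_t$ satisfies
$$p_t^k = \frac{\nabla_k P\left(\left(\int_0^t \left(X_s^j - \langle p_s, X_s \rangle\right) ds\right)_j\right)}{\sum_i \nabla_i P\left(\left(\int_0^t \left(X_s^j - \langle p_s, X_s \rangle\right) ds\right)_j\right)}$$
then conditional expected external consistency holds.
Proof Let $R_t$ be defined, as above, by
$$R_t^k = \int_0^t \left(X_s^k - \langle p_s, X_s \rangle\right) ds.$$
One obtains, with $M_t = \sum_i \nabla_i P(R_t)$,
$$\frac{d}{dt} P(R_t) = \sum_k \nabla_k P(R_t)\left(X_t^k - \langle p_t, X_t \rangle\right) = M_t \sum_k p_t^k \left(X_t^k - \langle p_t, X_t \rangle\right) = 0.$$
Hence $P(R_t) = P(0)$ so that each $R_t^k$ is bounded from above and conditional expected external consistency follows. □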
The previous case corresponds to $P(x) = \sum_k \exp x^k$; for similar "potential" approaches see [7,15].
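The conservation of the potential along the regret trajectory can be observed on a simple discretization. The sketch below uses an explicit Euler scheme for the case $P(x) = \sum_k \exp x^k$; the scheme, the step size and the test process `X_of_t` are assumptions of the example, and the discretization only preserves the potential up to a small error.

```python
import math


def simulate_potential(X_of_t, K, T, dt):
    """Euler scheme for dR/dt = X_t - <p_t, X_t>, with p_t proportional to exp(R_t).

    Returns the final regret vector R_T and the potential sum_k exp(R_T^k).
    """
    R = [0.0] * K
    steps = int(round(T / dt))
    for i in range(steps):
        x = X_of_t(i * dt)
        w = [math.exp(r) for r in R]
        V = sum(w)
        p = [wk / V for wk in w]
        avg = sum(pk * xk for pk, xk in zip(p, x))
        R = [r + dt * (xk - avg) for r, xk in zip(R, x)]
    return R, sum(math.exp(r) for r in R)


def X_of_t(t):
    # Hypothetical bounded process with values in [0,1]^3.
    return [0.5 + 0.5 * math.sin(t), 0.5 + 0.5 * math.cos(2.0 * t), 0.3]


R, V = simulate_potential(X_of_t, K=3, T=5.0, dt=1e-3)
```

Up to the discretization error, `V` stays at its initial value $K = 3$, so each coordinate of `R` remains below $\log K$.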
4 Convergence
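Given a discrete process $\{X_m\}$ and a corresponding EW algorithm $\{p_m\}$ the aim is to get a bound on
$$\frac{1}{n}\sum_{m=1}^n \left(X_m^k - \langle p_m, X_m \rangle\right)$$
from an evaluation of
$$\frac{1}{T}\int_0^T \left(Y_s^k - \langle q_s, Y_s \rangle\right) ds$$
where $Y_t$ is a continuous time process constructed from $X_m$ and $q_t$ is a CTEW algorithm associated to $Y_t$.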
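Proposition 4.1 Given a discrete time process $\{X_m\} \in [0,1]^K$, $m = 1, \dots, n$, and $T > 0$, there exists a measurable continuous time process $\{Y_t\} \in [0,1]^K$, $t \in [0,T]$, such that
$$\frac{1}{n}\sum_{m=1}^n X_m^k = \frac{1}{T}\int_0^T Y_t^k \, dt$$
and
$$\left(\frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle\right) e^{-T/n} \le \frac{1}{T}\int_0^T \langle q_t, Y_t \rangle \, dt \le \left(\frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle\right) e^{T/n}$$
where $\{p_m\}$ is an $EW(T/n)$ associated to $\{X_m\}$ and $q_t$ is a CTEW associated to $\{Y_t\}$.
Proof Let $T > 0$. Divide the interval $[0,T]$ into $n$ subintervals of equal length $\delta = T/n$ and define, from the discrete time sequence $\{X_m\}$, $m = 1, \dots, n$, the continuous time process $\{Y_t\}$ on $[0,T]$ by the step function $Y_t = X_m$ for $t \in [(m-1)T/n, mT/n)$.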
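Obviously
$$\frac{1}{n}\sum_{\ell=1}^m X_\ell^k = \frac{1}{T}\int_0^{mT/n} Y_t^k \, dt.$$
Let $\{\hat{p}_t\}$ be the continuous time process defined from a discrete one $\{p_m\}$ as above: $\hat{p}_t = p_m$ for $t \in [(m-1)T/n, mT/n)$. Clearly also
$$\frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle = \frac{1}{T}\int_0^T \langle \hat{p}_t, Y_t \rangle \, dt.$$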
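It remains to handle the difference between $\{\hat{p}_t\}$ and $\{q_t\}$, which is a CTEW associated to $\{Y_t\}$. For this choose $\{p_m\}$ as the $EW(T/n)$ associated to $\{X_m\}$. Then, for $t \in [(m-1)T/n, mT/n)$ one has on the one hand
$$\hat{p}_t^k = p_m^k = \frac{\hat{W}_t^k}{\hat{W}_t}$$
with $\hat{W}_t^k = \exp\left((T/n)\sum_{u=1}^{m-1} X_u^k\right) = \exp\left(\int_0^{(m-1)T/n} Y_s^k \, ds\right)$ and $\hat{W}_s = \sum_k \hat{W}_s^k$, and on the other
$$q_t^k = \frac{W_t^k}{W_t}$$
with $W_t^k = \exp\left(\int_0^t Y_s^k \, ds\right)$ and $W_t = \sum_k W_t^k$. Thus, since $0 \le Y_s^k \le 1$, one deduces
$$\hat{W}_s^k \le W_s^k \le \hat{W}_s^k e^{\delta}$$
hence also
$$\hat{W}_s \le W_s \le \hat{W}_s e^{\delta}$$
so that
$$\hat{p}_s^k e^{-\delta} \le q_s^k \le \hat{p}_s^k e^{\delta}$$
and
$$\left(\frac{1}{T}\int_0^T \langle \hat{p}_s, Y_s \rangle \, ds\right) e^{-\delta} \le \frac{1}{T}\int_0^T \langle q_s, Y_s \rangle \, ds \le \left(\frac{1}{T}\int_0^T \langle \hat{p}_s, Y_s \rangle \, ds\right) e^{\delta}$$
as well. □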
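The two-sided comparison of Proposition 4.1 can be checked on a sample sequence. In this sketch the continuous time side is computed exactly for the step process, using $\int_0^T \langle q_t, Y_t \rangle \, dt = \log(W_T/W_0)$ as in the proof of Theorem 3.1. The function name, the random data and the seed are assumptions of the example.

```python
import math
import random


def discrete_and_continuous_payoffs(X, T):
    """Compare (1/n) sum_m <p_m, X_m> under EW(T/n) with (1/T) int_0^T <q_t, Y_t> dt
    under CTEW, where Y is the step process equal to X_m on [(m-1)T/n, mT/n)."""
    n, K = len(X), len(X[0])
    delta = T / n
    S = [0.0] * K                  # S = delta * (X_1 + ... + X_{m-1}) before stage m
    discrete = 0.0
    for x in X:
        w = [math.exp(s) for s in S]
        W = sum(w)
        discrete += sum(wk / W * xk for wk, xk in zip(w, x))
        S = [s + delta * xk for s, xk in zip(S, x)]
    # For CTEW on the step process, the integral equals log(W_T / W_0) with W_0 = K.
    continuous = math.log(sum(math.exp(s) for s in S) / K)
    return discrete / n, continuous / T, delta


random.seed(2)  # hypothetical data, fixed for reproducibility
n, K = 100, 3
X = [[random.random() for _ in range(K)] for _ in range(n)]
disc, cont, delta = discrete_and_continuous_payoffs(X, T=math.sqrt(n))
```

The value `cont` lies between `disc * exp(-delta)` and `disc * exp(delta)`, as stated in the proposition.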
We thus obtain an alternative proof of Proposition 2.1 that we recall:
Lemma 4.1 There exists an EW algorithm satisfying
$$\frac{1}{n}\sum_{m=1}^n \left(X_m^k - \langle p_m, X_m \rangle\right) \le M n^{-1/2}.$$
Proof Given $n$, choose $T = \sqrt{n}$ so that:
- the bound in the continuous version is of the order $1/T = 1/\sqrt{n}$:
$$\frac{1}{T}\int_0^T \left(Y_t^k - \langle q_t, Y_t \rangle\right) dt \le \frac{\log K}{\sqrt{n}}$$
- and the error term with the discrete approximation is of the order of $e^{\delta} - 1 \sim \delta = T/n = 1/\sqrt{n}$:
$$\frac{1}{n}\sum_{m=1}^n \langle p_m, X_m \rangle \ge \frac{1}{T}\int_0^T \langle q_t, Y_t \rangle \, dt - L/\sqrt{n}$$
so that the result follows from Theorem 3.1 and Proposition 4.1. □
Remark The choice of $T = \sqrt{n}$ amounts to taking $EW(1/\sqrt{n})$, hence as in Sect. 3 the procedure is not uniform.
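As a numerical illustration of Lemma 4.1, the sketch below runs $EW(1/\sqrt{n})$ on a sample sequence and compares the average conditional expected regret with $(\log K + 1)/\sqrt{n}$. This explicit constant is an assumption of the example: it is one admissible value of $M$, obtained by combining Theorem 3.1 with Proposition 4.1 and the inequality $e^{-\delta} \ge 1 - \delta$. The function name and the data are hypothetical.

```python
import math
import random


def ew_average_regret(X, A):
    """Return max_k (1/n) sum_m (X_m^k - <p_m, X_m>) for EW(A)."""
    n, K = len(X), len(X[0])
    S = [0.0] * K
    expected = 0.0
    for x in X:
        top = max(S)
        w = [math.exp(A * (s - top)) for s in S]
        W = sum(w)
        expected += sum(wk / W * xk for wk, xk in zip(w, x))
        S = [s + xk for s, xk in zip(S, x)]
    return (max(S) - expected) / n


random.seed(1)  # hypothetical data, fixed for reproducibility
n, K = 900, 4
X = [[random.random() for _ in range(K)] for _ in range(n)]
regret = ew_average_regret(X, A=1.0 / math.sqrt(n))
bound = (math.log(K) + 1.0) / math.sqrt(n)
```

The parameter passed to `ew_average_regret` depends on `n`, which reflects the non-uniformity noted in the remark.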
5 Extensions
The same analysis applies to similar setups. We consider briefly two of them: the case of partial information, where only the outcome $x_n$ is known, and the internal consistency criterion.
5.1 Partial information
Consider the framework of Sect. 1 but where the vector $X_n$ is not revealed ex-post: only the outcome $x_n$ of the actually chosen component is announced. The aim is to define an algorithm having similar properties as in Sect. 2 but depending only on the available information.
We follow [1,2].
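In discrete time, define inductively the vector $\hat{X}_n$ by
$$\hat{X}_n^k = \begin{cases} X_n^k / p_n^k & \text{if } k = k_n \\ 0 & \text{otherwise.} \end{cases}$$
Let $\gamma \in (0,1)$ be a parameter to be tuned later on and define first $\hat{S}_n = \sum_{m=1}^n \gamma \hat{X}_m / K$, then
$$\hat{p}_{n+1}^k = \frac{\exp(\hat{S}_n^k)}{\sum_j \exp(\hat{S}_n^j)}.$$
Finally the strategy at stage $n+1$ is
$$p_{n+1}^k = (1-\gamma)\hat{p}_{n+1}^k + \frac{\gamma}{K}.$$
Note that
$$E\left(\hat{X}_n^k \mid X_1, \dots, X_n\right) = X_n^k$$
and
$$x_n = \langle p_n, \hat{X}_n \rangle$$
hence it is enough, using Proposition 2.2, to bound
$$\frac{1}{n}\sum_{m=1}^n \left(\hat{X}_m^k - \langle p_m, \hat{X}_m \rangle\right).$$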
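The two identities used above, namely that $\hat{X}_n$ is an unbiased estimate of $X_n$ and that $x_n = \langle p_n, \hat{X}_n \rangle$, can be verified by enumerating the possible choices at one stage. The function names and the numerical values in this sketch are hypothetical.

```python
def estimated_vector(X, p, chosen):
    """Return hat X with hat X^k = X^k / p^k if k == chosen, and 0 otherwise."""
    return [X[k] / p[k] if k == chosen else 0.0 for k in range(len(X))]


def mixed_strategy(p_hat, gamma):
    """Mix the exponential weights with the uniform law: (1 - gamma) * p_hat + gamma / K."""
    K = len(p_hat)
    return [(1.0 - gamma) * q + gamma / K for q in p_hat]


X = [0.2, 0.9, 0.5]                          # hypothetical payoff vector at one stage
p = mixed_strategy([0.7, 0.2, 0.1], gamma=0.3)

# Expectation of hat X over the random choice drawn from p.
expectation = [
    sum(p[c] * estimated_vector(X, p, c)[k] for c in range(3)) for k in range(3)
]
# <p, hat X> for each possible choice c: it equals the outcome X[c].
outcomes = [
    sum(pk * e for pk, e in zip(p, estimated_vector(X, p, c))) for c in range(3)
]
```

The mixing with the uniform law keeps every $p^k$ at least $\gamma/K$, so the division in `estimated_vector` is well defined and the estimates are bounded by $K/\gamma$.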
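The analysis in continuous time is as follows.
Given $\{X_s\}$, let $\{\hat{X}_s\}$ and $\{p_s\}$ satisfy
$$\int_0^t x_s \, ds = \int_0^t p_s^{k_s} \hat{X}_s^{k_s} \, ds = \int_0^t \langle p_s, \hat{X}_s \rangle \, ds,$$
$$p_s^k = (1-\gamma)\hat{p}_s^k + \gamma/K$$
and $\hat{p}_s$ be adapted to $\hat{X}_s$ as in the usual CTEW. In particular one obtains
$$\int_0^t x_s \, ds \ge (1-\gamma)\int_0^t \langle \hat{p}_s, \hat{X}_s \rangle \, ds.$$
But, as in Sect. 3, one has for all $k$
$$\int_0^t \langle \hat{p}_s, \hat{X}_s \rangle \, ds \ge \int_0^t \hat{X}_s^k \, ds - C$$
hence finally one deduces:
Proposition 5.1
$$\int_0^t x_s \, ds \ge (1-\gamma)\int_0^t \hat{X}_s^k \, ds - C.$$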
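The corresponding discrete inequality is now
$$\frac{1}{n}\sum_{m=1}^n x_m \ge \frac{1}{n}\left((1-\gamma)e^{-\delta K/\gamma}\sum_{m=1}^n \hat{X}_m^k\right) - C/T$$
with $\delta = T/n$. The choice of $T = 1/\gamma = n^{1/3}$ leads to:
Proposition 5.2 There exists $M$ such that
$$\frac{1}{n}\sum_{m=1}^n \left(\hat{X}_m^k - \langle p_m, \hat{X}_m \rangle\right) \le M n^{-1/3}.$$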
5.2 Internal consistency
Given a history $h_n$, the average internal regret evaluation at stage $n$ is defined by the $K \times K$ matrix $r_n$, with entries
$$r_n^{k\ell} = \frac{1}{n}\sum_{m,\, k_m = k} \left(X_m^\ell - X_m^k\right)$$
which corresponds to a comparison of the average payoff obtained on the dates where $k$ was chosen, to the payoff for some other fixed component, $\ell$, on these dates.
Definition A strategy $\sigma$ satisfies internal consistency if, for every process $\{X_m\}$: $\max_{k,\ell} [r_n^{k\ell}]^+ \to 0$ a.s., as $n \to \infty$.
Using an analog of Proposition 2.2 it is enough to show, for example, that the quantities
$$Q_n(k, \ell) = \sum_{m=1}^n \nu_m^k \left(X_m^\ell - X_m^k\right)$$
are of the order $o(n)$, where $\nu_m^k$ stands for the conditional probability of playing $k$ at stage $m$ given the past history.
We first prove a lemma on invariant measures.
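Lemma 5.1 Given a matrix $A \in \mathbb{R}^{K \times K}$, let $\psi(A) \in \Delta(K)$ be the unique solution of
$$\psi(A)^k \sum_\ell \exp A(k, \ell) = \sum_\ell \psi(A)^\ell \exp A(\ell, k).$$
Then $\psi$ is Lipschitz continuous.
Proof Let $\|\cdot\|$ denote the maximal norm and let $\|A - B\| = \rho$. Then
$$\exp B(k, \ell)\, e^{-\rho} \le \exp A(k, \ell) \le \exp B(k, \ell)\, e^{\rho} \quad \forall k, \ell.$$
Similarly with $m(A) = \sum_{k,\ell} \exp A(k, \ell)$, one has $m(B) e^{-\rho} \le m(A) \le m(B) e^{\rho}$, hence
$$\frac{\exp B(k, \ell)}{m(B)} e^{-2\rho} \le \frac{\exp A(k, \ell)}{m(A)} \le \frac{\exp B(k, \ell)}{m(B)} e^{2\rho} \quad \forall k, \ell.$$
Since $\psi(A)$ is the unique invariant measure of the transition matrix with coefficients $M(k, \ell) = \frac{\exp A(k, \ell)}{m(A)}$ for $k \neq \ell$, one deduces, by Theorem 7.2 (Corollary) in [18],
$$\psi(B) e^{-4K\rho} \le \psi(A) \le e^{4K\rho} \psi(B).$$
Hence for $\rho$ small enough $\|\psi(A) - \psi(B)\| \le 5K\rho$. The constant being independent of $A$, $B$ the result obtains: there exists $L$ with
$$\|\psi(A) - \psi(B)\| \le L \|A - B\|.$$
□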
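The invariant measure $\psi(A)$ can be approximated numerically from the transition matrix used in the proof. The sketch below uses power iteration, which is a choice made for this illustration and not a method taken from the paper; the function name `psi`, the number of iterations and the sample matrix are assumptions, and the entries of $A$ are taken moderate so that the exponentials do not overflow and the chain mixes quickly.

```python
import math


def psi(A, iterations=2000):
    """Approximate the invariant measure psi(A) by power iteration.

    Off-diagonal transition probabilities are exp(A[k][l]) / m(A), where m(A)
    is the sum of all exp(A[k][l]); the diagonal carries the remaining mass.
    """
    K = len(A)
    m = sum(math.exp(A[k][l]) for k in range(K) for l in range(K))
    M = [[math.exp(A[k][l]) / m if k != l else 0.0 for l in range(K)] for k in range(K)]
    for k in range(K):
        M[k][k] = 1.0 - sum(M[k])
    mu = [1.0 / K] * K
    for _ in range(iterations):
        mu = [sum(mu[k] * M[k][l] for k in range(K)) for l in range(K)]
    return mu


# Hypothetical 3 x 3 matrix.
A = [[0.0, 0.4, -0.3], [0.1, 0.0, 0.8], [-0.5, 0.2, 0.0]]
mu = psi(A)

# Residuals of the balance equation defining psi(A), one per component k.
residuals = [
    mu[k] * sum(math.exp(A[k][l]) for l in range(3))
    - sum(mu[l] * math.exp(A[l][k]) for l in range(3))
    for k in range(3)
]
```

The entries of `residuals` vanish up to rounding, which confirms that the fixed point of the transition matrix solves the balance equation of the lemma.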
The continuous time approach is as follows. Given $X_t$, let
$$S_t(k, \ell) = \int_0^t \mu_s^k \left(X_s^\ell - X_s^k\right) ds$$
where $\mu_s = \psi(S_s)$ is the invariant measure associated to $\exp S_s$ and $\mu_s^k$ is the conditional probability of playing $k$ at time $s$.
Proposition 5.3 Under the above procedure, there exists a constant $C$ such that
$$S_t(k, \ell) \le C \quad \forall k, \ell \in K, \ \forall t \ge 0.$$
Proof Define
$$A_t = \sum_{k,\ell} \exp S_t(k, \ell)$$
so that
$$\dot{A}_t = \sum_{k,\ell} \exp S_t(k, \ell)\, \mu_t^k \left(X_t^\ell - X_t^k\right) = 0$$
since the coefficient of $X_t^k$ is precisely
$$\sum_\ell \exp S_t(k, \ell)\, \mu_t^k - \sum_\ell \exp S_t(\ell, k)\, \mu_t^\ell = 0$$
because $\mu_t = \psi(S_t)$. Hence $A_t = A_0 = K^2$ and each $S_t(k, \ell)$ is uniformly bounded from above. □
This property corresponds to conditional expected internal consistency.
The discrete procedure $EW(A)$ is defined inductively through $\nu_{m+1} = \psi(A Q_m)$.
Proposition 5.4 For $4LT = \log n$, the discrete procedure $EW(T/n)$ satisfies:
$$Q_n/n \le M/\log n,$$
hence internal consistency follows.
Proof Given a discrete process $\{X_m\}$, $m = 1, \dots, n$, let $\{Y_t\}$ be the associated step process on $[0,T]$, as in Sect. 4. Hence one has, inductively
$$S_t(k, \ell) = \int_0^t \psi^k(S_s)\left(Y_s^\ell - Y_s^k\right) ds.$$
Define also $L_m = (T/n) Q_m$ which corresponds to $EW(T/n)$ for $\{X_m\}$, hence inductively
$$Q_m(k, \ell) = Q_{m-1}(k, \ell) + \psi^k\left((T/n) Q_{m-1}\right)\left(X_m^\ell - X_m^k\right)$$
and let $L_t$ be the continuous linear interpolation: for $t \in [mT/n, (m+1)T/n)$
$$L_t(k, \ell) = (T/n)\, Q_m(k, \ell) + (t - mT/n)\, \psi^k\left((T/n) Q_m\right)\left(X_{m+1}^\ell - X_{m+1}^k\right).$$
Thus $L_t$ satisfies
$$L_t(k, \ell) - L_{mT/n}(k, \ell) = \int_{mT/n}^t \psi^k(L_{mT/n})\left(Y_s^\ell - Y_s^k\right) ds.$$
Hence one has
$$S_t(k, \ell) - S_{mT/n}(k, \ell) - \left[L_t(k, \ell) - L_{mT/n}(k, \ell)\right] = \int_{mT/n}^t \left[\psi^k(S_s) - \psi^k(L_{mT/n})\right]\left(Y_s^\ell - Y_s^k\right) ds.$$
Let $\rho = 4LT/n \ge 2\max\{\|\psi(L_t) - \psi(L_{mT/n})\| ;\ mT/n \le t \le (m+1)T/n\}$. Then, using Lemma 5.1,
$$\|S_t - L_t\| \le \int_0^t 2L \|S_s - L_s\| \, ds + \rho T$$
from which one deduces, by Gronwall's lemma, that for $t \in [0,T]$:
$$\|S_t - L_t\| \le \rho T \exp(2LT).$$
Recall that $S_t$ is bounded above by $C$, hence
$$L_T(k, \ell)/T \le C/T + (4LT/n)\exp(2LT).$$
It then follows, choosing $4LT = \log n$, that $Q_n/n \le M/\log n$. □
6 Comments
Let us compare several discrete procedures leading to consistency and their continuous counterparts.
First consider procedures related to exponential evaluation.
The current one (EW) builds on a state parameter $z_m = S_m$ for which the updating rule is time independent: if $z_m$ is the state at stage $m$ and $\xi_{m+1}$ the current observation at stage $m+1$, the new state is $z_{m+1} = h(z_m, \xi_{m+1})$ (for example $S_{m+1} = S_m + \xi_{m+1}$).
This applies to a family of procedures, see e.g. [8,9]; however the precision of the procedure depends on a parameter that is a function of the length of the process: it is not uniform. In our case the optimal value of the parameter to handle the $n$-stage problem is $1/\sqrt{n}$. Hence to obtain consistency the parameter has to be time dependent.
Concerning the continuous time embedding, it has to be performed on a compact interval. Note that in fact in our analysis, the approximation is through a sequence of longer and longer intervals, of size $T = \sqrt{n}$, with finer and finer discretization of mesh $T/n = 1/\sqrt{n}$.
Another exponential evaluation, but where the state variable is the average sum $\bar{S}_m = S_m/m$, is used in smooth fictitious play, see [11,12]. Explicitly $p_m$ is given by the normalization in the simplex of the vector $\left(\exp((1/\varepsilon)\bar{S}_m^k)\right)_k$ and the corresponding procedure satisfies (approximate) consistency. Note that the updating rule requires the knowledge of the current stage; namely $z_{m+1} = \frac{1}{m+1}(m z_m + \xi_{m+1}) = h_m(z_m, \xi_{m+1})$. However this equation is a special case of discrete dynamics of the form
$$z_{m+1} - z_m = a_{m+1} F(z_m, \xi_{m+1})$$
with $\sum a_n = +\infty$, $\sum a_n^2 < +\infty$ and $F$ with bounded range. Hence, see [3,4], the asymptotics of this dynamics can be studied using the asymptotics of the continuous time process
$$\dot{z}(t) = (\in)\, G(z(t))$$
where $G$ satisfies $E(F(z_m, \xi_{m+1}) \mid \mathcal{H}_m) = (\in)\, G(z_m)$. Thus there is no need to adapt the coefficient to the length of the process.
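The averaging rule is the instance of this scheme with step size $a_{m+1} = 1/(m+1)$ and $F(z, \xi) = \xi - z$. A minimal sketch, with a hypothetical function name and hypothetical scalar observations:

```python
def running_average(observations):
    """Apply z_{m+1} = z_m + (xi_{m+1} - z_m) / (m + 1), starting from z_0 = 0."""
    z = 0.0
    for m, xi in enumerate(observations):
        z = z + (xi - z) / (m + 1)  # step size a_{m+1} = 1 / (m + 1)
    return z


xs = [0.2, 0.9, 0.4, 0.5]  # hypothetical observations
z = running_average(xs)
```

The recursion reproduces the arithmetic mean of the observations, and the step sizes $1/(m+1)$ satisfy both summability conditions stated above.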
The same study through continuous time processes applies to regret dynamics [15,16] based on approachability [5] which satisfy consistency as well, but do depend on the past behavior of the predictor, see [4].
It is interesting to notice that Blackwell's original procedure, based on the $L^2$ norm, when applied to the current framework (where one approaches an orthant) satisfies positive homogeneity of degree zero. Hence $p(h_m)$ can be defined as a function of the average regret $r_m$ or of the total regret $m r_m$.
For extension to potential based algorithms, where the same machinery will work, see [7,15].
Acknowledgments Part of this work was done while visiting IMPA (Rio, Brazil) and CMM (Santiago, Chile). The support of these institutions is gratefully acknowledged.
References
1. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. Proceedings of the 36th annual IEEE symposium on foundations of computer science, pp. 322–331 (1995)
2. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The non-stochastic multiarmed bandit problem. SIAM J. Comput. 32, 48–77 (2002)
3. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J. Opt. Control 44, 328–348 (2005)
4. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. Part II: applications. Math. Oper. Res. 31, 673–695 (2006)
5. Blackwell, D.: An analog of the minmax theorem for vector payoffs. Pac. J. Math. 6, 1–8 (1956)
6. Brezis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math. 29, 329–345 (1978)
7. Cesa-Bianchi, N., Lugosi, G.: Potential-based algorithms in on-line prediction and game theory. Mach. Learn. 51, 239–261 (2003)
8. Foster, D., Vohra, R.: Asymptotic calibration. Biometrika 85, 379–390 (1998)
9. Foster, D., Vohra, R.: Regret in the on-line decision problem. Games Econ. Behav. 29, 7–35 (1999)
10. Freund, Y., Schapire, R.E.: Adaptive game playing using multiplicative weights. Games Econ. Behav. 29, 79–103 (1999)
11. Fudenberg, D., Levine, D.K.: Consistency and cautious fictitious play. J. Econ. Dyn. Control 19, 1065–1089 (1995)
12. Fudenberg, D., Levine, D.K.: Conditional universal consistency. Games Econ. Behav. 29, 104–130 (1999)
13. Hall, P., Heyde, C.: Martingale limit theory and its applications. Academic, London (1980)
14. Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilibria. Econometrica 68, 1127–1150 (2000)
15. Hart, S., Mas-Colell, A.: A general class of adaptive strategies. J. Econ. Theory 98, 26–54 (2001)
16. Hart, S., Mas-Colell, A.: Regret-based continuous time dynamics. Games Econ. Behav. 45, 375–394 (2003)
17. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inform. Comput. 108, 212–261 (1994)
18. Seneta, E.: Non-negative matrices and Markov chains. Springer, Heidelberg (1981)