Exponential weight algorithm in continuous time

(1)

FULL LENGTH PAPER

Exponential weight algorithm in continuous time

Sylvain Sorin

Received: 7 May 2005 / Accepted: 25 July 2006

Abstract The exponential weight algorithm has been introduced in the framework of discrete time on-line problems. Given an observed process{Xm}^m=1,2,...the input at stage m+1 is an exponential function of the sum Sm = !m

!=1X!. We define the analog algorithm for a continuous time process Xt and prove similar properties in terms of external or internal consistency. We then deduce results for discrete time from their counterpart in continuous time. Finally we compare this approach to another continuous time approximation of a discrete time exponential algorithm based on the average sumS_m/m.

Keywords Exponential weight algorithm·Continuous time·Consistency Mathematics Subject Classification (2000) 68W·91A·90C

1 Presentation

We consider the exponential weight algorithm and its consistency properties in the basic prediction problem for individual sequences. The purpose of the paper is to define a continuous time version of this algorithm and to prove that it satisfies analogous consistency properties. In fact this proofs are rather short and one of the main advantages of this approach is to deduce independently discrete time results from their continuous

S. Sorin

Equipe Combinatoire et Optimisation, UFR 929, Université P. et M. Curie-Paris 6, 175 Rue du Chevaleret, 75013 Paris, France

S. Sorin (

B

⁾

Laboratoire d’Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France e-mail: [email protected]

(2)

time counterpart. This interface between discrete time and continuous time processes has been very effective for a long period, see e.g. [6]. However the approach is quite different from the usual (asymptotic) approximation through differential equations (or inclusions), see e.g. [3,4] for related examples. This is due to the choice of the state variable of the process and is discussed in the last Sect.6.

The model is as follows:{Xn}denotes a sequence of vectors in[0,1]^K. At each stagen, a predictor having observed the past realizations X₁, . . . ,X_n₋₁, chooses a componentk_ninK. The corresponding outcome isx_n=X_n^kⁿ. Denote the past history (X₁,k₁, . . . ,X_n₋₁,k_n₋₁)byh_n₋₁, and the inducedσ-algebra byHⁿ−1. An algorithm or a strategyσin the prediction problem is specified by the choice ofpn(hn−1)∈∆(K) (the simplex ofIR^K), which is the law ofkngiven the past historyhn−1. Note that the law ofXnmay also depend onhn−1.

Given a sequence{um}, letu¯n= ¹n

!n

m=1umdenote the average of thenfirst terms.

The average external regret evaluation at stagenis the vectorrn = {r_n^k}^k∈K defined by:

r_n^k = ¯X_n^k− ¯xn.

It compares the actual (average) payoff to each payoff corresponding to a constant component choice, see [8,9,11].

Definition A strategyσ satisfies external consistency if, for every process{Xm}: maxk∈Kr_n^k⁺−→0 a.s., as n−→ ∞.

Remarks (1) In the framework of a repeated finiteI-person game defined by action setsJⁱ,i ∈ I, letM : J ="

i Jⁱ →IRbe the payoff function of player 1, the predictor. Here,K = J¹andXnis the vectorM(.,j_n⁻¹)∈IR^Kcorresponding to the payoff induced by the profile of actionsj_nⁱby each playeri,i%=1 at stagen.

UsuallyMis known, and j_n⁻¹announced to player 1 who then knows the vector Xn. In the current situation one does not assumeMknown by player 1 (not even Jⁱ,i%=1), but only his own action setK, and the fact that all payoffs belong to [0,1]: he is just told the vectorXn.

(2) To obtain robust results no assumption is made on the sequence{Xn}: the predictor is not “Bayesian”.

The content of the paper is the following. Section2recalls the main results concerning the discrete time exponential weight algorithm. Section3introduces the continuous counterpart and its properties. Section4deduces, from the continuous time results, the discrete time analogs: this gives an alternative simple proof of properties of the expo- nentional weight algorithm. In Sect.5two extensions are described: restriction on the information and internal consistency. Finally Sect.6 discusses the relation between discrete and continuous time, for algorithms based on the average sumSn/n.

(3)

2 The exponential weight algorithm: discrete time

Notation Given a vectorx%=0 inIR^K₊,![x]denotes its normalization in the simplex

∆(K):

![x]^k = x^k

!K j=1x^j.

Definition The discrete exponential weight(E W)algorithm (see e.g. [1,10,17]) with positive parameter A,E W(A), is defined by p_n₊₁(h_n)= p_n₊₁ =![{p^k_ne^AX^kⁿ}k]or equivalently

p^k_n₊₁= ex p(A!n m=1X^k_m)

!

jex p(A!n m=1X_m^j)

.

Recall that pn+1describes the law of the random choicekn+1.

An alternative definition is E W^∗(α),where αis a positive parameter, and pn+1 =

![{(1+α)^S^kⁿ}k]withS_n =!n m=1X_m.

For sake of completeness and to compare with the continuous time argument, we reproduce the basic property, following [1].

Proposition 2.1 σ (n)=E W^∗(1/√n)satisfies conditional expected external consis- tency in the following sense:there exists a constant M such that,for any component k and any process{X_m}:

¯ X^k_n−1

n

#n m=1

Eσ (n)(xm|H^m−1)≤ M/√

n. (1)

Proof LetWn =!

k(1+α)^S^kⁿ, hence (recall that 0≤ X^k_m ≤1) W_n₊₁

W_n =#

k

(1+α)^S^kⁿ(1+α)^X^kⁿ+1

W_n

=#

kp^k_n₊₁(1+α)^X^kⁿ+1

≤#

kp^k_n₊₁(1+αX_n^k₊₁)

=1+α)pn+1,Xn+1*. It follows that

log

$Wn

W₀

%

≤α

#n m=1

)p_m,X_m*

(4)

and

#n m=1

)pm,Xm* ≥ 1

α(S_n^klog(1+α)−logK) sinceWn ≥(1+α)^Sⁿ^k, for allkinK.

Thus forαsmall enough one has 1

n

#n m=1

)pm,Xm* ≥ ¯X_n^k(1−α/2)−logK/αn.

The choice ofα=1/√nleads to maxk X¯^k_n−1

n

#n m=1

)p_m,X_m* ≤M/√n

for some constantM.

Note that)pm,Xm* = Eσ (n)(xm|Hm−1)so that the above inequality gives the required

result. ,-

Obviously this implies:

Corollary 2.1 σ (n)satisfies expected consistency:

Eσ (n)&

r^k_n⁺'

→0 as n→ ∞.

One way to obtain almost sure convergence to 0 is to use the next basic martingale property (see e.g. [13]).

Proposition 2.2 Let U_m be a sequence of uniformly bounded (even in L²) random variables on a probability space (Ω,F,P), adapted to a filtration F^m. Assume E(U_m|Fm−1)=0, then

1 n

#n m=1

U_m →0 as n→ ∞.

Let nowσ be defined as follows: given a sequencen_m going to∞, useσ (n₁)for K₁ ≥ n₂/√n₁blocks of sizen₁(where the entry on blockmfor runningσ (n₁)is Xmn₁+!, !=0, ...,n₁−1) then inductivelyσ (nm)for Km ≥nm+1/√n_m blocks of sizenm. Propositions2.1and2.2thus imply the following result:

Theorem 2.1 σsatisfies external consistency.

Remarks (1) The optimal choice of the parameter, Aorα, in Proposition2.1, is a function of the lengthnof the process (see the discussion in Sect.6).

(2) To implement the algorithm the actual past play of the predictor (namely the sequence{km}) is not used. (This property also holds true for other algorithms like Fictitious Play or Smooth Fictitious Play [11]).

(5)

(3) Invariance by translation: one can use the vector of total payoffs( S_n^k)

k or the vector of total regrets ( !n

m=1r_m^k = S_n^k −!n m=1x_m)

k to define p_n, since p_n₊₁ = !*(

p_n^ke^A(^Xⁿ^k⁺^Yⁿ⁾)

k

+for any Y_n ∈ IR. This property is specific to the exponential weight and is not shared by regret procedures based on Blackwell’s approachability [5,14]).

(4) It is enough to satisfy a property like: for alln, there existsσ (n)such that

#n m=1

&

X_m^k − )p_m,X_m*'

≤o(n)

(uniformly in{Xm}) to obtain Theorem2.1by definingσthrough concatenation as above and then using Proposition2.2.

3 The multiplicative weight algorithm: continuous time

Given a measurable processX_t,t ≥0, with values in[0,1]^K, letS_t =,t

0X_sds =tX¯_t. Definition A measurable processpt ∈∆(K)is a continuous time exponential weigth algorithm(CT E W)if it satisfies

p_t =!*(

exp&

S_t^k')

k

+. LetW_t =!

kexp&

S_t^k'

so that p_t^kW_t =exp&

S_t^k' .

Theorem 3.1 Conditional expected external consistency holds for CTEW in the sense that, for any T >0and any k:

1 T



 /T

0

X^k_sds− /T

0

)ps,Xs*ds



≤ logK T . Proof One has

˙ W_t =#

k

exp2 S_t^k3

X^k_t =#

k

W_t p^k_t X^k_t = )p_t,X_t*W_t. Hence

W_t =W₀exp



 /t

0

)p_s,X_s*ds



.

Thus,W_t ≥exp&

S_t^k'

for everyk, implies:

/t

0

)ps,Xs*ds≥ /t

0

X_s^kds−logW₀.

,-

(6)

Remark The interpretation is that the conditional law of the choicekt at timet, given the past, is pt. Letxt = X^k_t^t be the outcome at timet, then the average regret vector at timetis

r_t^k = ¯X_t^k− ¯x_t and the expectation ofx¯t is given by

E(x¯t)=E



1 t

/t

0

)ps,Xs*ds



.

The previous proof holds as well while replacing the integralSt of the process by the integral of the (conditional expected) regretRt defined by

R_t^k= /t

0

2X^k_s − )ps,Xs*3 ds.

Note that p_t satisfies also

pt =! 45exp2

R_t^k36

k

7 . LetV_t =!

kexp&

R^k_t' . Then:

˙ Vt =#

k

exp2 R_t^k3 2

X^k_t − )pt,Xt*3

=#

k

Vt p_t^k 2

X_t^k− )pt,Xt*3

=0.

HenceVt is constant andVt ≥expR_t^k, for everyk, implies /t

0

)ps,Xs*ds≥ /t

0

X_s^kds−logV₀.

The same computation as above extends to the following framework:

Proposition 3.1 Let P be aC¹function from IR^K to IR with∇P ≥0and%=0, such that x^k → +∞, for some component k, implies P(x)→ +∞. If pt satisfies

p^k_t =

∇^kP$5,t 0

2X_s^j− )ps,Xs*3 ds6

j

%

!

i∇ⁱP$5,t 0

2X_s^j− )p_s,X_s*3 ds6

j

%

then conditional expected external consistency holds.

(7)

Proof LetRt be defined, as above, by

R_t^k= /t

0

2X^k_s − )ps,Xs*3 ds.

One obtains, withMt =!

i∇ⁱP(Rt) d

dtP(R_t)=#

k

∇^kP(R_t)2

X^k_t − )p_t,X_t*3

=Mt

#

k

p^k_t 2

X^k_t − )pt,Xt*3

=0

Hence P(R(t)) = P(0)so that each R_t^k is bounded from above and conditional

expected external consistency follows. ,-

The previous case corresponds to P(x)= !

kexpx^k; for similar “potential" approaches see [7,15].

4 Convergence

Given a discrete process{Xm}and a correspondingE W algorithm{pm}the aim is to get a bound on

1 n

#n m=1

2X_m^k − )pm,Xm*3

from an evaluation of

1 T

/T

0

2Y_s^k− )q_s,Y_s*3 ds

whereY_tis a continuous process constructed fromX_mandq_t is aCT E Walgorithm associated toY_t.

Proposition 4.1 Given a discrete time process{Xm} ∈[ 0,1]^K,m =1, . . . ,n,and T >0, there exists a measurable continuous time process{Yt} ∈[0,1]^K,t ∈ [0,T], such that

1 n

#n m=1

X^k_m = 1 T

/T

0

Y_t^kdt

(8)

and 1 n

#n m=1

)p_m,X_m*e⁻^T^/n≤ 1 T

/T

0

)q_t,Y_t*dt≤ 1 n

#

m

)p_m,X_m*e^T^/n

where{p_m}is an E W(T/n)associated to{X_m}and q_tis a CTEW associated to{Y_t}. Proof LetT >0. Divide the interval[0,T]intonsubintervals with equal lenghtδ= T/nand define, from the discrete time sequence{Xm},m=1, . . . ,n, the continuous time process{Yt}on[0,T]by the step functionYt =Xmfort ∈ [(m−1)T/n,mT/n).

Obviously

1 n

#m

!=1

X_!^k= 1 T

mT//n

0

Y_t^kdt.

Let{ ˆpt}be the continuous time process defined from a discrete one{pm}as above : ˆ

pt = pm fort ∈ [(m−1)T/n,mT/n). Clearly also 1

n

#n m=1

)pm,Xm* = 1 T

/T

0

) ˆpt,Yt*dt.

It remains to handle the difference between{ ˆp_t}and{q_t}which is aCT E Wassociated to{Y_t}. For this choose{p_m}as the E W(T/n)associated to {X_m}. Then, for t ∈ [(m−1)T/n,mT/n)one has on the one hand

ˆ

p^k_t = p_m^k = Wˆ_t^k Wˆt

withWˆ_t^k =exp*

(T/n)!m−1 u=1 X_u^k+

=exp& ,(m−1)T/n 0 Y_s^kds'

andWˆ_s =!

kWˆ_s^k and on the other

q_t^k = W_t^k W_t withW_t^k =exp& ,t

0Y_s^kds'

andW_t =!

kW_t^k. Thus, since 0≤Y_s^k ≤1, one deduces

ˆ

W_s^k ≤W_s^k ≤ ˆW_s^ke^δ hence also

Wˆs ≤Ws ≤ ˆWse^δ

(9)

so that

ˆ

p_s^ke⁻^δ≤q_s^k ≤ ˆp_s^ke^δ and



1 T

/T

0

) ˆps,Ys*ds



e⁻^δ≤ 1 T

/T

0

)ps,Ys*ds≤



1 T

/T

0

) ˆqs,Ys*ds



e^δ

as well. ,-

We thus obtain an alternative proof of Proposition2.1that we recall:

Lemma 4.1 There exists a EW algorithm satisfing 1

n

#n m=1

2X^k_m− )pm,Xm*3

≤ Mn⁻^1/2.

Proof Givenn, chooseT =√nso that:

- the bound in the continuous version is of the order 1/T =1/√n 1

T /T

0

2Y_t^k− )qt,Yt*3

dt≤ logK

√n

- and the error term with the discrete approximation of the order ofe^δ−1∼δ= T/n =1/√n

1 n

#n m=1

)p_m,X_m* ≥ 1 T



 /T

0

)q_t,Y_t*dt



−L/√n

so that the result follows from Theorem3.1and Proposition4.1.

,- Remark The choice ofT =√n amounts to takingE W(1/√n), hence as in Sect.3 the procedure is not uniform.

5 Extensions

The same analysis applies to similar setups. We consider briefly two of them: the case of partial information where only the outcomexnis known and the internal consistency criteria.

(10)

5.1 Partial information

Consider the framework of Sect.1but where the vector Xnis not revealed ex-post, only the actual chosen componentxnis announced. The aim is to define an algorithm having similar properties as in Sect.2but depending only on the available information.

We follow [1,2].

In discrete time, define inductively the vectorXˆ_nby

Xˆ_n^k= 8_Xk

p^k_nn ifk=k_n 0 otherwise.

Letγ ∈(0,1)be a parameter to be tuned later on and define firstSˆn=!n

m=1γXˆm/K, then

ˆ

p_n^k₊₁= exp2 ˆ S_n^k3

!jexp2 ˆ Sn^j

3.

Finally the strategy at stagen+1 is

p^k_n=(1−γ )pˆ^k_n+ γ K. Note that

E2 ˆ

X_n^k|X₁, . . . ,Xn

3=X^k_n

and

xn = )pn,Xˆn* hence it is enough, using Proposition2.2to bound

1 n

#n m=1

2Xˆ^k_m− )pm,Xˆm*3 .

The analysis in continuous time is as follows.

Given{X_s}, let{ ˆX_s}and{p_s}satisfy /t

0

x_sds= /t

0

p^k_s^sXˆ_s^k^sds = /t

0

)p_s,Xˆ_s*ds

p^k_s =(1−γ )pˆ^k_s +γ /K

(11)

andpˆsbe adapted toXˆs as in the usualCT E W. In particular one obtains /t

0

x_sds ≥(1−γ ) /t

0

) ˆp_s,Xˆ_s*ds

But, as in Sect.3one has for allk /t

0

) ˆps,Xˆs*ds≥ /t

0

ˆ

X_s^kds−C

hence finally one deduces:

Proposition 5.1

/t

0

x_sds≥(1−γ ) /t

0

ˆ

X^k_sds−C.

The corresponding discrete inequality is now 1

n

#n m=1

xm ≥ 1 n

9

(1−γ )e⁻^δK/γ

#n m=1

Xˆ^k_m :

−C/T

withδ =T/n. The choice ofT =1/γ =n^1/3leads to:

Proposition 5.2 There exists M such that 1

n

#n m=1

2Xˆ^k_m− )pm,Xˆm*3

≤ Mn⁻^1/3.

5.2 Internal consistency

Given a historyhn, the average internal regret evaluation at stagenis defined by the K×K matrixrn, with entries

r_n^k!= 1 n

#

m,k_m=k

2X^!_m−X^k_m3

which corresponds to a comparison of the average payoff obtained on the dates where kwas chosen, to the payoff for some other fixed component,!, on these dates.

(12)

Definition A strategyσ satisfies internal consistency if, for every process{Xm}: maxk,! r_n^k!⁺−→0 a.s., asn−→ ∞.

Using an analog of Proposition2.2it is enough to show, for example, that the quantities

Qn(k, !)=

#n m=1

ν_m^k

2X^!_m−X^k_m3

are of the ordero(n), whereν_m^k stands for the conditional probability of playingkat stagemgiven the past history.

We first prove a lemma on invariant measures.

Lemma 5.1 Given a matrix A∈IR^K^×^K, letψ (A)∈∆(K)be the unique solution of

ψ (A)^k#

!

expA(k, !)=#

!

ψ (A)^!expA(!,k).

Thenψis Lipschitz continuous.

Proof Let||.||denote the maximal norm and let||A−B|| =ρ. Then expB(k, !)e⁻^ρ ≤expA(k, !)≤expB(k, !)e^ρ ∀k, !.

Similarly withm(A)=!

k,!expA(k, !), one hasm(B)e⁻^ρ ≤m(A)≤m(B)e^ρ for allk, !hence

expB(k, !)

m(B) e⁻^2ρ ≤ expA(k, !)

m(A) ≤ expB(k, !)

m(B) e^2ρ ∀k, !.

Sinceψ (A)is the unique invariant measure of the transition matrix with coefficients M(k, !)= ^expm(A)^A(k,!) fork%=!, one deduces, by Theorem 7.2 (Corollary) in [18]

ψ (B)e⁻^4Kρ ≤ψ (A)≤e^4Kρψ (B).

Hence forρsmall enough||ψ (A)−ψ (B)|| ≤5Kρ. The constant being independent of A,Bthe result obtains: there existsLwith

||ψ (A)−ψ (B)|| ≤L||A−B||.

,-

(13)

The continuous time approach is as follows. GivenXt, let

St(k, !)= /t

0

µ^k_s

2X_s^!−X_s^k3 ds

whereµs =ψ (Ss)is the invariant measure associated to expSs andµ^k_s is the conditional probability of playingkat times.

Proposition 5.3 Under the above procedure, there exists a constant C such that St(k, !)≤C ∀k, !∈ K, ∀t ≥0.

Proof Define

At =#

k,!

expSt(k, !)

so that

˙ A_t =#

k,!

expS_t(k, !)µ^k_t 2

X_t^!−X_t^k3

=0

since the coefficient ofX^k_t is precisely

#

!

expSt(k, !)µ^k_t −#

!

expSt(!,k)µ^!_t =0

becauseµt =ψ (St). Hence At = A₀= K²and eachSt(k, !)is uniformy bounded

from above. ,-

This property corresponds to conditional expected internal consistency.

The discrete procedure E W(A) is defined inductively through ν_m₊₁=ψ (AQ_m).

Proposition 5.4 For4LT =logn, the discrete procedure E W(T/n)satisfies:

Qn/n≤M/logn hence internal consistency follows.

Proof Given a discrete process{Xm},m = 1, . . . ,n,let{Yt}be the associated step process on[0,T], as in Sect.4. Hence one has, inductively

St(k, !)= /t

0

ψ^k(Ss)

2Y_s^!−Y_s^k3 ds.

(14)

Define alsoLm =(T/n)Qmwhich corresponds toE W(T/n)for{Xm}, hence inductively

Q_m(k, !)=Q_m₋₁(k, !)+ψ^k((T/n)Q_m₋₁)2

X^!_m−X^k_m3 and letLt be the continuous linear interpolation: fort ∈ [mT/n, (m+1)T/n)

L_t(k, !)=(T/n)4

Q_m(k, !)+(t−mT/n)ψ^k((T/n)Q_m)2

X_m^!₊₁−X_m^k₊₁37 . ThusL_t satisfies

L_t(k, !)−L_mT_/n(k, !)= /t

mT/n

ψ^k(L_mT/n)2

Y_s^!−Y_s^k3 ds.

Hence one has

St(k, !)−SmT/n(k, !)−*

Lt(k, !)−LmT/n(k, !)+

= /t

mT/n

4

ψ^k(Ss)−ψ^k(LmT/n)

7 2Y_s^!−Y_s^k3 ds.

Letρ=4LT/n≥2 max{||ψ (L_t)−ψ (L_mT_/n)||;mT/n≤t ≤(m+1)T/n}. Then, using Lemma5.1.

||S_t−L_t|| ≤ /t

0

2L||S_s−L_s||ds+ρT

from which one deduces, by Gronwall’s lemma, that fort ∈ [0,T]:

||St−Lt|| ≤ρTexp(2LT).

Recall thatS_t is bounded above byChence

LT(k, !)/T ≤C/T +(4LT/n)exp(2LT).

It then follows, choosing 4LT =lognthat Q_n/n≤M/logn.

,-

(15)

6 Comments

Let us compare several discrete procedures leading to consistency and their continuous counterparts.

First consider procedures related to exponential evaluation.

The current one(E W)builds on a state parameterzm=Smfor which the updating rule is time independent: ifz_m is the state at stagemandξm+1the current observation at stagem+1, the new state isz_m₊₁=h(z_m, ξ_m₊₁), (for exampleS_m₊₁=S_m+ξ_m₊₁).

This applies to a family of procedures, see e.g. [8,9], however the precision of the procedure depends on a parameter that is a function of the lenght of the process: it is not uniform. In our case the optimal value of the parameter to handle then-stage problem is 1/√n. Hence to obtain consistency the parameter has to be time dependent.

Concerning the continuous time embedding, it has to be performed on a compact interval. Note that in fact in our analysis, the approximation is through a sequence of longer and longer intervals, of sizeT =√n, with finer and finer discretization of meshT/n=1/√n.

Another exponential evalution, but where the state variable is the average sumS¯m = Sm/m is used in smooth fictitious play, see [11,12]. Explicitly pm is given by

!*(

exp((1/,)S¯_m^k))

k

+and the corresponding procedure satisfies (approximate) consistency. Note that the udpdating rule requires the knowledge of the current stage;

namely zm+1 = _m¹₊₁(mzm +ξm+1) = hm(zm, ξm+1). However this equation is a special case of discrete dynamics of the form

z_m₊₁−z_m =a_m₊₁F(z_n, ξ_m₊₁) with!

an = +∞,!

a_n² <+∞andF with bounded range. Hence, see [3,4] the asymptotics of this dynamics can be studied using the asymptotics of the continuous time process

˙

z(t)=(∈)G(z(t))

whereGsatisfies E(F(z_m, ξ_m₊₁)|Hm)=(∈)G(z_m). Thus there is no need to adapt the coefficient to the lenght of the process.

The same study through continuous time processes applies to regret dynamics [15,16] based on approachability [5] which satisfy consistency as well, but do depend on the past behavior of the predictor, see [4].

It is interesting to notice that Blackwell’s original procedure, based onL²norm, when applied to the current framework (where one approaches an orthant) satisfies positive homogeneity of degree zero. Hencep(hm)can be defined as a function of the average regretrmor of the total regretmrm.

For extension to potential based algorithms, where the same machinery will work, see [7,15].

Acknowledgments Part of this work was done while visting IMPA (Rio, Brazil) and CMM (Santiago, Chile). The support of these institutions is gratefully acknowledged.

(16)

References

1. Auer, P., Cesa-Bianchi, N., Freund, Y., Shapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. Proceedings of the 36th annual IEEE symposium on foundations of computer science, pp. 322–331 (1995)

2. Auer, P., Cesa-Bianchi, N., Freund, Y., Shapire, R.E.: The non-stochastic multiarmed bandit problem. SIAM J. Comput.32, 48–77 (2002)

3. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J.

Opt. Control44, 328–348 (2005)

4. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. Part II:

applications. Math. Oper. Res.31, 673–695 (2006)

5. Blackwell, D.: An analog of the minmax theorem for vector payoffs. Pac. J. Math.6, 1–8 (1956) 6. Brezis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math.29, 329–345 (1978)

7. Cesa-Bianchi, N., Lugosi, G.: Potential-based algorithms in on-line prediction and game theory. Mach.

Learn.51, 239–261 (2003)

8. Foster, D., Vohra, R.: Asymptotic calibration. Biometrika85, 379–390 (1998)

9. Foster, D., Vohra, R.: Regret in the on-line decision problem. Games Econ. Behav.29, 7–35 (1999) 10. Freund, Y., Shapire, R.E.: Adaptive game playing using multiplicative weights. Games Econ. Be-

hav.29, 79–103 (1999)

11. Fudenberg, D., Levine, D.K.: Consistency and cautious fictitious play. J. Econ. Dyn. Control19, 1065–

1089 (1995)

12. Fudenberg, D., Levine, D.K.: Conditional universal consistency. Games Econ. Behav. 29, 104–

130 (1999)

13. Hall, P., Heyde, C.: Martingale limit theory and its applications. Academic, London (1980) 14. Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilibria. Economet-

rica68, 1127–1150 (2000)

15. Hart, S., Mas-Colell, A.: A general class of adaptive strategies. J. Econ. Theory98, 26–54 (2001) 16. Hart, S., Mas-Colell, A.: Regret-based continuous time dynamics. Games Econ. Behav.45, 375–

394 (2003)

17. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inform. Comput.108, 212–

261 (1994)

18. Seneta, E.: Non-negative matrices and Markov chains. Springer, Heidelberg (1981)