• Aucun résultat trouvé

Exponential weight algorithm in continuous time

N/A
N/A
Protected

Academic year: 2022

Partager "Exponential weight algorithm in continuous time"

Copied!
16
0
0

Texte intégral

(1)

FULL LENGTH PAPER

Exponential weight algorithm in continuous time

Sylvain Sorin

Received: 7 May 2005 / Accepted: 25 July 2006

© Springer-Verlag 2007

Abstract The exponential weight algorithm has been introduced in the framework of discrete time on-line problems. Given an observed process{Xm}m=1,2,...the input at stage m+1 is an exponential function of the sum Sm = !m

!=1X!. We define the analog algorithm for a continuous time process Xt and prove similar properties in terms of external or internal consistency. We then deduce results for discrete time from their counterpart in continuous time. Finally we compare this approach to another continuous time approximation of a discrete time exponential algorithm based on the average sumSm/m.

Keywords Exponential weight algorithm·Continuous time·Consistency Mathematics Subject Classification (2000) 68W·91A·90C

1 Presentation

We consider the exponential weight algorithm and its consistency properties in the ba- sic prediction problem for individual sequences. The purpose of the paper is to define a continuous time version of this algorithm and to prove that it satisfies analogous con- sistency properties. In fact this proofs are rather short and one of the main advantages of this approach is to deduce independently discrete time results from their continuous

S. Sorin

Equipe Combinatoire et Optimisation, UFR 929, Université P. et M. Curie-Paris 6, 175 Rue du Chevaleret, 75013 Paris, France

S. Sorin (

B

)

Laboratoire d’Econométrie, Ecole Polytechnique, 1 rue Descartes, 75005 Paris, France e-mail: [email protected]

(2)

time counterpart. This interface between discrete time and continuous time processes has been very effective for a long period, see e.g. [6]. However the approach is quite different from the usual (asymptotic) approximation through differential equations (or inclusions), see e.g. [3,4] for related examples. This is due to the choice of the state variable of the process and is discussed in the last Sect.6.

The model is as follows:{Xn}denotes a sequence of vectors in[0,1]K. At each stagen, a predictor having observed the past realizations X1, . . . ,Xn1, chooses a componentkninK. The corresponding outcome isxn=Xnkn. Denote the past history (X1,k1, . . . ,Xn1,kn1)byhn1, and the inducedσ-algebra byHn1. An algorithm or a strategyσin the prediction problem is specified by the choice ofpn(hn1)∈∆(K) (the simplex ofIRK), which is the law ofkngiven the past historyhn1. Note that the law ofXnmay also depend onhn1.

Given a sequence{um}, letu¯n= 1n

!n

m=1umdenote the average of thenfirst terms.

The average external regret evaluation at stagenis the vectorrn = {rnk}kK defined by:

rnk = ¯Xnk− ¯xn.

It compares the actual (average) payoff to each payoff corresponding to a constant component choice, see [8,9,11].

Definition A strategyσ satisfies external consistency if, for every process{Xm}: maxkKrnk+−→0 a.s., as n−→ ∞.

Remarks (1) In the framework of a repeated finiteI-person game defined by action setsJi,iI, letM : J ="

i JiIRbe the payoff function of player 1, the predictor. Here,K = J1andXnis the vectorM(.,jn1)∈IRKcorresponding to the payoff induced by the profile of actionsjniby each playeri,i%=1 at stagen.

UsuallyMis known, and jn1announced to player 1 who then knows the vector Xn. In the current situation one does not assumeMknown by player 1 (not even Ji,i%=1), but only his own action setK, and the fact that all payoffs belong to [0,1]: he is just told the vectorXn.

(2) To obtain robust results no assumption is made on the sequence{Xn}: the pre- dictor is not “Bayesian”.

The content of the paper is the following. Section2recalls the main results concern- ing the discrete time exponential weight algorithm. Section3introduces the continuous counterpart and its properties. Section4deduces, from the continuous time results, the discrete time analogs: this gives an alternative simple proof of properties of the expo- nentional weight algorithm. In Sect.5two extensions are described: restriction on the information and internal consistency. Finally Sect.6 discusses the relation between discrete and continuous time, for algorithms based on the average sumSn/n.

(3)

2 The exponential weight algorithm: discrete time

Notation Given a vectorx%=0 inIRK+,![x]denotes its normalization in the simplex

∆(K):

![x]k = xk

!K j=1xj.

Definition The discrete exponential weight(E W)algorithm (see e.g. [1,10,17]) with positive parameter A,E W(A), is defined by pn+1(hn)= pn+1 =![{pkneAXkn}k]or equivalently

pkn+1= ex p(A!n m=1Xkm)

!

jex p(A!n m=1Xmj)

.

Recall that pn+1describes the law of the random choicekn+1.

An alternative definition is E W(α),where αis a positive parameter, and pn+1 =

![{(1+α)Skn}k]withSn =!n m=1Xm.

For sake of completeness and to compare with the continuous time argument, we reproduce the basic property, following [1].

Proposition 2.1 σ (n)=E W(1/√n)satisfies conditional expected external consis- tency in the following sense:there exists a constant M such that,for any component k and any process{Xm}:

¯ Xkn−1

n

#n m=1

Eσ (n)(xm|Hm1)≤ M/

n. (1)

Proof LetWn =!

k(1+α)Skn, hence (recall that 0≤ Xkm ≤1) Wn+1

Wn =#

k

(1+α)Skn(1+α)Xkn+1

Wn

=#

kpkn+1(1+α)Xkn+1

≤#

kpkn+1(1+αXnk+1)

=1+α)pn+1,Xn+1*. It follows that

log

$Wn

W0

%

≤α

#n m=1

)pm,Xm*

(4)

and

#n m=1

)pm,Xm* ≥ 1

α(Snklog(1+α)−logK) sinceWn ≥(1+α)Snk, for allkinK.

Thus forαsmall enough one has 1

n

#n m=1

)pm,Xm* ≥ ¯Xnk(1−α/2)−logK/αn.

The choice ofα=1/√nleads to maxk X¯kn−1

n

#n m=1

)pm,Xm* ≤M/n

for some constantM.

Note that)pm,Xm* = Eσ (n)(xm|Hm1)so that the above inequality gives the required

result. ,-

Obviously this implies:

Corollary 2.1 σ (n)satisfies expected consistency:

Eσ (n)&

rkn+'

→0 as n→ ∞.

One way to obtain almost sure convergence to 0 is to use the next basic martingale property (see e.g. [13]).

Proposition 2.2 Let Um be a sequence of uniformly bounded (even in L2) random variables on a probability space (Ω,F,P), adapted to a filtration Fm. Assume E(Um|Fm1)=0, then

1 n

#n m=1

Um →0 as n→ ∞.

Let nowσ be defined as follows: given a sequencenm going to∞, useσ (n1)for K1n2/√n1blocks of sizen1(where the entry on blockmfor runningσ (n1)is Xmn1+!, !=0, ...,n1−1) then inductivelyσ (nm)for Kmnm+1/√nm blocks of sizenm. Propositions2.1and2.2thus imply the following result:

Theorem 2.1 σsatisfies external consistency.

Remarks (1) The optimal choice of the parameter, Aorα, in Proposition2.1, is a function of the lengthnof the process (see the discussion in Sect.6).

(2) To implement the algorithm the actual past play of the predictor (namely the sequence{km}) is not used. (This property also holds true for other algorithms like Fictitious Play or Smooth Fictitious Play [11]).

(5)

(3) Invariance by translation: one can use the vector of total payoffs( Snk)

k or the vector of total regrets ( !n

m=1rmk = Snk −!n m=1xm)

k to define pn, since pn+1 = !*(

pnkeA(Xnk+Yn))

k

+for any YnIR. This property is specific to the exponential weight and is not shared by regret procedures based on Blackwell’s approachability [5,14]).

(4) It is enough to satisfy a property like: for alln, there existsσ (n)such that

#n m=1

&

Xmk − )pm,Xm*'

o(n)

(uniformly in{Xm}) to obtain Theorem2.1by definingσthrough concatenation as above and then using Proposition2.2.

3 The multiplicative weight algorithm: continuous time

Given a measurable processXt,t ≥0, with values in[0,1]K, letSt =,t

0Xsds =tX¯t. Definition A measurable processpt ∈∆(K)is a continuous time exponential weigth algorithm(CT E W)if it satisfies

pt =!*(

exp&

Stk')

k

+. LetWt =!

kexp&

Stk'

so that ptkWt =exp&

Stk' .

Theorem 3.1 Conditional expected external consistency holds for CTEW in the sense that, for any T >0and any k:

1 T

 /T

0

Xksds− /T

0

)ps,Xs*ds

≤ logK T . Proof One has

˙ Wt =#

k

exp2 Stk3

Xkt =#

k

Wt pkt Xkt = )pt,Xt*Wt. Hence

Wt =W0exp

 /t

0

)ps,Xs*ds

.

Thus,Wt ≥exp&

Stk'

for everyk, implies:

/t

0

)ps,Xs*ds≥ /t

0

Xskds−logW0.

,-

(6)

Remark The interpretation is that the conditional law of the choicekt at timet, given the past, is pt. Letxt = Xktt be the outcome at timet, then the average regret vector at timetis

rtk = ¯Xtk− ¯xt and the expectation ofx¯t is given by

E(x¯t)=E

1 t

/t

0

)ps,Xs*ds

.

The previous proof holds as well while replacing the integralSt of the process by the integral of the (conditional expected) regretRt defined by

Rtk= /t

0

2Xks − )ps,Xs*3 ds.

Note that pt satisfies also

pt =! 45exp2

Rtk36

k

7 . LetVt =!

kexp&

Rkt' . Then:

˙ Vt =#

k

exp2 Rtk3 2

Xkt − )pt,Xt*3

=#

k

Vt ptk 2

Xtk− )pt,Xt*3

=0.

HenceVt is constant andVt ≥expRtk, for everyk, implies /t

0

)ps,Xs*ds≥ /t

0

Xskds−logV0.

The same computation as above extends to the following framework:

Proposition 3.1 Let P be aC1function from IRK to IR withP ≥0and%=0, such that xk → +∞, for some component k, implies P(x)→ +∞. If pt satisfies

pkt =

kP$5,t 0

2Xsj− )ps,Xs*3 ds6

j

%

!

iiP$5,t 0

2Xsj− )ps,Xs*3 ds6

j

%

then conditional expected external consistency holds.

(7)

Proof LetRt be defined, as above, by

Rtk= /t

0

2Xks − )ps,Xs*3 ds.

One obtains, withMt =!

iiP(Rt) d

dtP(Rt)=#

k

kP(Rt)2

Xkt − )pt,Xt*3

=Mt

#

k

pkt 2

Xkt − )pt,Xt*3

=0

Hence P(R(t)) = P(0)so that each Rtk is bounded from above and conditional

expected external consistency follows. ,-

The previous case corresponds to P(x)= !

kexpxk; for similar “potential" ap- proaches see [7,15].

4 Convergence

Given a discrete process{Xm}and a correspondingE W algorithm{pm}the aim is to get a bound on

1 n

#n m=1

2Xmk − )pm,Xm*3

from an evaluation of

1 T

/T

0

2Ysk− )qs,Ys*3 ds

whereYtis a continuous process constructed fromXmandqt is aCT E Walgorithm associated toYt.

Proposition 4.1 Given a discrete time process{Xm} ∈[ 0,1]K,m =1, . . . ,n,and T >0, there exists a measurable continuous time process{Yt} ∈[0,1]K,t ∈ [0,T], such that

1 n

#n m=1

Xkm = 1 T

/T

0

Ytkdt

(8)

and 1 n

#n m=1

)pm,Xm*eT/n≤ 1 T

/T

0

)qt,Yt*dt≤ 1 n

#

m

)pm,Xm*eT/n

where{pm}is an E W(T/n)associated to{Xm}and qtis a CTEW associated to{Yt}. Proof LetT >0. Divide the interval[0,T]intonsubintervals with equal lenghtδ= T/nand define, from the discrete time sequence{Xm},m=1, . . . ,n, the continuous time process{Yt}on[0,T]by the step functionYt =Xmfort ∈ [(m−1)T/n,mT/n).

Obviously

1 n

#m

!=1

X!k= 1 T

mT//n

0

Ytkdt.

Let{ ˆpt}be the continuous time process defined from a discrete one{pm}as above : ˆ

pt = pm fort ∈ [(m−1)T/n,mT/n). Clearly also 1

n

#n m=1

)pm,Xm* = 1 T

/T

0

) ˆpt,Yt*dt.

It remains to handle the difference between{ ˆpt}and{qt}which is aCT E Wassociated to{Yt}. For this choose{pm}as the E W(T/n)associated to {Xm}. Then, for t ∈ [(m−1)T/n,mT/n)one has on the one hand

ˆ

pkt = pmk = Wˆtk Wˆt

withWˆtk =exp*

(T/n)!m1 u=1 Xuk+

=exp& ,(m1)T/n 0 Yskds'

andWˆs =!

kWˆsk and on the other

qtk = Wtk Wt withWtk =exp& ,t

0Yskds'

andWt =!

kWtk. Thus, since 0≤Ysk ≤1, one deduces

ˆ

WskWsk ≤ ˆWskeδ hence also

WˆsWs ≤ ˆWseδ

(9)

so that

ˆ

pskeδqsk ≤ ˆpskeδ and

1 T

/T

0

) ˆps,Ys*ds

eδ≤ 1 T

/T

0

)ps,Ys*ds

1 T

/T

0

) ˆqs,Ys*ds

eδ

as well. ,-

We thus obtain an alternative proof of Proposition2.1that we recall:

Lemma 4.1 There exists a EW algorithm satisfing 1

n

#n m=1

2Xkm− )pm,Xm*3

Mn1/2.

Proof Givenn, chooseT =√nso that:

- the bound in the continuous version is of the order 1/T =1/√n 1

T /T

0

2Ytk− )qt,Yt*3

dt≤ logK

n

- and the error term with the discrete approximation of the order ofeδ−1∼δ= T/n =1/√n

1 n

#n m=1

)pm,Xm* ≥ 1 T

 /T

0

)qt,Yt*dt

−L/√n

so that the result follows from Theorem3.1and Proposition4.1.

,- Remark The choice ofT =√n amounts to takingE W(1/√n), hence as in Sect.3 the procedure is not uniform.

5 Extensions

The same analysis applies to similar setups. We consider briefly two of them: the case of partial information where only the outcomexnis known and the internal consistency criteria.

(10)

5.1 Partial information

Consider the framework of Sect.1but where the vector Xnis not revealed ex-post, only the actual chosen componentxnis announced. The aim is to define an algorithm having similar properties as in Sect.2but depending only on the available information.

We follow [1,2].

In discrete time, define inductively the vectorXˆnby

Xˆnk= 8Xk

pknn ifk=kn 0 otherwise.

Letγ ∈(0,1)be a parameter to be tuned later on and define firstSˆn=!n

m=1γXˆm/K, then

ˆ

pnk+1= exp2 ˆ Snk3

!jexp2 ˆ Snj

3.

Finally the strategy at stagen+1 is

pkn=(1−γ )pˆkn+ γ K. Note that

E2 ˆ

Xnk|X1, . . . ,Xn

3=Xkn

and

xn = )pn,Xˆn* hence it is enough, using Proposition2.2to bound

1 n

#n m=1

2Xˆkm− )pm,Xˆm*3 .

The analysis in continuous time is as follows.

Given{Xs}, let{ ˆXs}and{ps}satisfy /t

0

xsds= /t

0

pkssXˆsksds = /t

0

)ps,Xˆs*ds

pks =(1−γ )pˆks +γ /K

(11)

andpˆsbe adapted toXˆs as in the usualCT E W. In particular one obtains /t

0

xsds ≥(1−γ ) /t

0

) ˆps,Xˆs*ds

But, as in Sect.3one has for allk /t

0

) ˆps,Xˆs*ds≥ /t

0

ˆ

XskdsC

hence finally one deduces:

Proposition 5.1

/t

0

xsds≥(1−γ ) /t

0

ˆ

XksdsC.

The corresponding discrete inequality is now 1

n

#n m=1

xm ≥ 1 n

9

(1−γ )eδK/γ

#n m=1

Xˆkm :

C/T

withδ =T/n. The choice ofT =1/γ =n1/3leads to:

Proposition 5.2 There exists M such that 1

n

#n m=1

2Xˆkm− )pm,Xˆm*3

Mn1/3.

5.2 Internal consistency

Given a historyhn, the average internal regret evaluation at stagenis defined by the K×K matrixrn, with entries

rnk!= 1 n

#

m,km=k

2X!mXkm3

which corresponds to a comparison of the average payoff obtained on the dates where kwas chosen, to the payoff for some other fixed component,!, on these dates.

(12)

Definition A strategyσ satisfies internal consistency if, for every process{Xm}: maxk,! rnk!+−→0 a.s., asn−→ ∞.

Using an analog of Proposition2.2it is enough to show, for example, that the quantities

Qn(k, !)=

#n m=1

νmk

2X!mXkm3

are of the ordero(n), whereνmk stands for the conditional probability of playingkat stagemgiven the past history.

We first prove a lemma on invariant measures.

Lemma 5.1 Given a matrix AIRK×K, letψ (A)∈∆(K)be the unique solution of

ψ (A)k#

!

expA(k, !)=#

!

ψ (A)!expA(!,k).

Thenψis Lipschitz continuous.

Proof Let||.||denote the maximal norm and let||AB|| =ρ. Then expB(k, !)eρ ≤expA(k, !)≤expB(k, !)eρk, !.

Similarly withm(A)=!

k,!expA(k, !), one hasm(B)eρm(A)m(B)eρ for allk, !hence

expB(k, !)

m(B) e ≤ expA(k, !)

m(A) ≤ expB(k, !)

m(B) ek, !.

Sinceψ (A)is the unique invariant measure of the transition matrix with coefficients M(k, !)= expm(A)A(k,!) fork%=!, one deduces, by Theorem 7.2 (Corollary) in [18]

ψ (B)e4Kρ ≤ψ (A)e4Kρψ (B).

Hence forρsmall enough||ψ (A)−ψ (B)|| ≤5Kρ. The constant being independent of A,Bthe result obtains: there existsLwith

||ψ (A)−ψ (B)|| ≤L||AB||.

,-

(13)

The continuous time approach is as follows. GivenXt, let

St(k, !)= /t

0

µks

2Xs!Xsk3 ds

whereµs =ψ (Ss)is the invariant measure associated to expSs andµks is the condi- tional probability of playingkat times.

Proposition 5.3 Under the above procedure, there exists a constant C such that St(k, !)≤Ck, !K,t ≥0.

Proof Define

At =#

k,!

expSt(k, !)

so that

˙ At =#

k,!

expSt(k, !)µkt 2

Xt!Xtk3

=0

since the coefficient ofXkt is precisely

#

!

expSt(k, !)µkt −#

!

expSt(!,k)µ!t =0

becauseµt =ψ (St). Hence At = A0= K2and eachSt(k, !)is uniformy bounded

from above. ,-

This property corresponds to conditional expected internal consistency.

The discrete procedure E W(A) is defined inductively through νm+1=ψ (AQm).

Proposition 5.4 For4LT =logn, the discrete procedure E W(T/n)satisfies:

Qn/n≤M/logn hence internal consistency follows.

Proof Given a discrete process{Xm},m = 1, . . . ,n,let{Yt}be the associated step process on[0,T], as in Sect.4. Hence one has, inductively

St(k, !)= /t

0

ψk(Ss)

2Ys!Ysk3 ds.

(14)

Define alsoLm =(T/n)Qmwhich corresponds toE W(T/n)for{Xm}, hence induc- tively

Qm(k, !)=Qm1(k, !)+ψk((T/n)Qm1)2

X!mXkm3 and letLt be the continuous linear interpolation: fort ∈ [mT/n, (m+1)T/n)

Lt(k, !)=(T/n)4

Qm(k, !)+(t−mT/n)ψk((T/n)Qm)2

Xm!+1Xmk+137 . ThusLt satisfies

Lt(k, !)−LmT/n(k, !)= /t

mT/n

ψk(LmT/n)2

Ys!Ysk3 ds.

Hence one has

St(k, !)−SmT/n(k, !)−*

Lt(k, !)−LmT/n(k, !)+

= /t

mT/n

4

ψk(Ss)−ψk(LmT/n)

7 2Ys!Ysk3 ds.

Letρ=4LT/n≥2 max{||ψ (Lt)−ψ (LmT/n)||;mT/n≤t ≤(m+1)T/n}. Then, using Lemma5.1.

||StLt|| ≤ /t

0

2L||SsLs||dsT

from which one deduces, by Gronwall’s lemma, that fort ∈ [0,T]:

||StLt|| ≤ρTexp(2LT).

Recall thatSt is bounded above byChence

LT(k, !)/T ≤C/T +(4LT/n)exp(2LT).

It then follows, choosing 4LT =lognthat Qn/n≤M/logn.

,-

(15)

6 Comments

Let us compare several discrete procedures leading to consistency and their continuous counterparts.

First consider procedures related to exponential evaluation.

The current one(E W)builds on a state parameterzm=Smfor which the updating rule is time independent: ifzm is the state at stagemandξm+1the current observation at stagem+1, the new state iszm+1=h(zm, ξm+1), (for exampleSm+1=Smm+1).

This applies to a family of procedures, see e.g. [8,9], however the precision of the procedure depends on a parameter that is a function of the lenght of the process: it is not uniform. In our case the optimal value of the parameter to handle then-stage problem is 1/√n. Hence to obtain consistency the parameter has to be time dependent.

Concerning the continuous time embedding, it has to be performed on a compact interval. Note that in fact in our analysis, the approximation is through a sequence of longer and longer intervals, of sizeT =√n, with finer and finer discretization of meshT/n=1/√n.

Another exponential evalution, but where the state variable is the average sumS¯m = Sm/m is used in smooth fictitious play, see [11,12]. Explicitly pm is given by

!*(

exp((1/,)S¯mk))

k

+and the corresponding procedure satisfies (approximate) con- sistency. Note that the udpdating rule requires the knowledge of the current stage;

namely zm+1 = m1+1(mzmm+1) = hm(zm, ξm+1). However this equation is a special case of discrete dynamics of the form

zm+1zm =am+1F(zn, ξm+1) with!

an = +∞,!

an2 <+∞andF with bounded range. Hence, see [3,4] the asymptotics of this dynamics can be studied using the asymptotics of the continuous time process

˙

z(t)=(∈)G(z(t))

whereGsatisfies E(F(zm, ξm+1)|Hm)=(∈)G(zm). Thus there is no need to adapt the coefficient to the lenght of the process.

The same study through continuous time processes applies to regret dynamics [15,16] based on approachability [5] which satisfy consistency as well, but do depend on the past behavior of the predictor, see [4].

It is interesting to notice that Blackwell’s original procedure, based onL2norm, when applied to the current framework (where one approaches an orthant) satisfies positive homogeneity of degree zero. Hencep(hm)can be defined as a function of the average regretrmor of the total regretmrm.

For extension to potential based algorithms, where the same machinery will work, see [7,15].

Acknowledgments Part of this work was done while visting IMPA (Rio, Brazil) and CMM (Santiago, Chile). The support of these institutions is gratefully acknowledged.

(16)

References

1. Auer, P., Cesa-Bianchi, N., Freund, Y., Shapire, R.E.: Gambling in a rigged casino: the adversarial multi-armed bandit problem. Proceedings of the 36th annual IEEE symposium on foundations of computer science, pp. 322–331 (1995)

2. Auer, P., Cesa-Bianchi, N., Freund, Y., Shapire, R.E.: The non-stochastic multiarmed bandit prob- lem. SIAM J. Comput.32, 48–77 (2002)

3. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J.

Opt. Control44, 328–348 (2005)

4. Benaim, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. Part II:

applications. Math. Oper. Res.31, 673–695 (2006)

5. Blackwell, D.: An analog of the minmax theorem for vector payoffs. Pac. J. Math.6, 1–8 (1956) 6. Brezis, H., Lions, P.-L.: Produits infinis de résolvantes. Israel J. Math.29, 329–345 (1978)

7. Cesa-Bianchi, N., Lugosi, G.: Potential-based algorithms in on-line prediction and game theory. Mach.

Learn.51, 239–261 (2003)

8. Foster, D., Vohra, R.: Asymptotic calibration. Biometrika85, 379–390 (1998)

9. Foster, D., Vohra, R.: Regret in the on-line decision problem. Games Econ. Behav.29, 7–35 (1999) 10. Freund, Y., Shapire, R.E.: Adaptive game playing using multiplicative weights. Games Econ. Be-

hav.29, 79–103 (1999)

11. Fudenberg, D., Levine, D.K.: Consistency and cautious fictitious play. J. Econ. Dyn. Control19, 1065–

1089 (1995)

12. Fudenberg, D., Levine, D.K.: Conditional universal consistency. Games Econ. Behav. 29, 104–

130 (1999)

13. Hall, P., Heyde, C.: Martingale limit theory and its applications. Academic, London (1980) 14. Hart, S., Mas-Colell, A.: A simple adaptive procedure leading to correlated equilibria. Economet-

rica68, 1127–1150 (2000)

15. Hart, S., Mas-Colell, A.: A general class of adaptive strategies. J. Econ. Theory98, 26–54 (2001) 16. Hart, S., Mas-Colell, A.: Regret-based continuous time dynamics. Games Econ. Behav.45, 375–

394 (2003)

17. Littlestone, N., Warmuth, M.K.: The weighted majority algorithm. Inform. Comput.108, 212–

261 (1994)

18. Seneta, E.: Non-negative matrices and Markov chains. Springer, Heidelberg (1981)

Références

Documents relatifs

The bottom digram of the bottom right picture shows the reimbursement density associated to the loan density shown in the middle diagram, according to the Repayment Pattern shown in

In the case of continuous times games without dynamic, and in analogy to the repeated games, we proved in [5], that the value function of the game is solution of an optimization

Keywords: optimal control, reinforcement learning, policy search, sensitivity analysis, para- metric optimization, gradient estimate, likelihood ratio method, pathwise derivation..

Our objective consists in designing an exponential continuous-discrete time observer that provides continuous- time estimates of the whole state of system (1) from the

For the synchronous strategy, the transition graph can be visualized whereas for the asynchronous strategy, it is not easy to read or use; BoolNet constructs two

start with the most useful design methods for integrated filters, make a survey of filter elements on silicon,.. consider applications and conclude with

When J is the Shapley operator of a stochastic game Γ, x n defined by (3.2) is the non-normalized value of the following n-stage game: states, actions, payoff and transition are as in

As a matter of fact, until recently only little was known about the rate of L 1 convergence of the discrete exit time of an Euler scheme of a diffusion to- wards the exit time of