
HAL Id: hal-00012187

https://hal.archives-ouvertes.fr/hal-00012187

Submitted on 18 Oct 2005


A penalized bandit algorithm

Damien Lamberton, Gilles Pagès

To cite this version:

Damien Lamberton, Gilles Pagès. A penalized bandit algorithm. Electronic Journal of Probability, Institute of Mathematical Statistics (IMS), 2008, 13, 341-373; http://dx.doi.org/10.1214/EJP.v13-489. ⟨10.1214/EJP.v13-489⟩. ⟨hal-00012187⟩


A penalized bandit algorithm

Damien Lamberton, Gilles Pagès

Laboratoire d'analyse et de mathématiques appliquées, UMR 8050, Univ. Marne-la-Vallée, Cité Descartes, 5, Bld Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2, France. damien.lamberton@univ-mlv.fr

Laboratoire de probabilités et modèles aléatoires, UMR 7599, Univ. Paris 6, case 188, 4, pl. Jussieu, F-75252 Paris Cedex 5. gpa@ccr.jussieu.fr

This work has benefited from the stay of both authors at the Isaac Newton Institute (Cambridge University) on the program Developments in Quantitative Finance.

Abstract

We study a two-armed bandit algorithm with penalty. We show the convergence of the algorithm and establish the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a discontinuous Markov process.

Key words: Two-armed bandit algorithm, penalization, stochastic approximation, convergence rate, learning automata, asset allocation.

2001 AMS classification: 62L20; secondary 93C40, 91E40, 68T05, 91B32.

Introduction

In a recent joint work with P. Tarrès (see [12]), we studied the convergence of the so-called two-armed bandit algorithm. The purpose of the present paper is to investigate a modified version of this algorithm, in which a penalization is introduced. In the terminology of learning theory (see [14, 15]), the algorithm studied in [12] was a Linear Reward-Inaction (LRI) scheme, whereas the one we want to introduce is a Linear Reward-Penalty (LRP) procedure.

In our previous paper, the algorithm was introduced in a financial context as a procedure for the optimal allocation of a fund between two traders who manage it. Imagine that the owner of a fund can share his wealth between two traders, say A and B, and that, every day, he can evaluate the results of one of the traders and, subsequently, modify the percentage of the fund managed by both traders. Denote by $X_n$ the percentage managed by trader A at time $n$ ($X_n \in [0,1]$). We assume that the owner selects the trader to be evaluated at random, in such a way that the probability that A is evaluated at time $n$ is $X_n$, in order to select preferably the trader in charge of the greater part of the fund. In the LRI scheme, if the evaluated trader performs well, its share is increased by a fraction $\gamma_n \in (0,1)$ of the share of the other trader, and nothing happens if the evaluated trader performs badly. Therefore, the dynamics of the sequence $(X_n)_{n\ge 0}$ can be modelled as follows:

$$X_{n+1} = X_n + \gamma_{n+1}\left(\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}>X_n\}\cap B_{n+1}}X_n\right), \qquad X_0 = x\in[0,1],$$

where $(U_n)_{n\ge1}$ is an iid sequence of uniform random variables on the interval $[0,1]$, and $A_n$ (resp. $B_n$) is the event "trader A (resp. trader B) performs well at time $n$". We assume $P(A_n)=p_A$, $P(B_n)=p_B$, for $n\ge1$, with $p_A,p_B\in(0,1)$, and independence between these events and the sequence $(U_n)_{n\ge1}$. The point is that the owner of the fund does not know the parameters $p_A$, $p_B$.
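To make these dynamics concrete, here is a minimal simulation sketch of the LRI recursion (ours, illustrative only; the step choice $\gamma_n = 1/(n+1)$ and the success probabilities are arbitrary, not values prescribed by the paper):

```python
import numpy as np

def simulate_lri(p_a, p_b, x0, n_steps, gamma=lambda n: 1.0 / (n + 1), seed=0):
    """Simulate the Linear Reward-Inaction (LRI) two-armed bandit recursion."""
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(1, n_steps + 1):
        u = rng.random()                   # U_{n+1}: decides which trader is evaluated
        if u <= x:                         # trader A is evaluated (probability X_n)
            if rng.random() < p_a:         # event A_{n+1}: A performs well
                x += gamma(n) * (1.0 - x)  # reward: X moves toward 1
        else:                              # trader B is evaluated
            if rng.random() < p_b:         # event B_{n+1}: B performs well
                x -= gamma(n) * x          # reward for B: X moves toward 0
    return x

# With p_A > p_B, X_n typically drifts toward 1 (though, as discussed below,
# infallibility requires stringent conditions on gamma_n).
print(simulate_lri(p_a=0.6, p_b=0.4, x0=0.5, n_steps=20000))
```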

This recursive learning procedure has been designed in order to assign asymptotically the whole fund to the best trader. This means that, if say $p_A > p_B$, $X_n$ converges to 1 with probability 1 provided $X_0 \in (0,1)$ (if $p_A < p_B$, the limit is 0, with symmetric results).

However, this "infallibility" property needs some very stringent assumptions on the reward parameter $\gamma_n$ (see [12]). Furthermore, the rate of convergence of the procedure, either toward its "target" 1 or its "trap" 0, is not ruled by a CLT with rate $\sqrt{\gamma_n}$ like standard stochastic approximation algorithms (see [10]). It is shown in [11] that this rate is quite non-standard, strongly depends on the (unknown) values $p_A$ and $p_B$, and becomes very poor as these probabilities get close to each other.

In order to improve the efficiency of the algorithm, one may imagine introducing a penalty when an evaluated trader has unsatisfactory performances. More precisely, if the evaluated trader at time $n$ performs badly, its share is decreased by a penalty factor $\rho_n\gamma_n$. This leads to the following LRP – or "penalized two-armed bandit" – procedure:

$$X_{n+1} = X_n + \gamma_{n+1}\left(\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}>X_n\}\cap B_{n+1}}X_n\right) - \gamma_{n+1}\rho_{n+1}\left(X_n\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A^c_{n+1}} - (1-X_n)\mathbf{1}_{\{U_{n+1}>X_n\}\cap B^c_{n+1}}\right), \quad n\in\mathbb{N},$$

where the notation $A^c$ is used for the complement of an event $A$. The precise assumptions on the reward rate $\gamma_n$ and the penalty rate $\gamma_n\rho_n$ will be given in the following sections.
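Continuing the sketch above, the LRP recursion only adds the two penalty branches. Again, this is an illustrative rendering of ours, with arbitrarily chosen rate sequences (any choice with $\gamma_n < 1$ and $\gamma_n\rho_n < 1$ would do for a demo):

```python
import numpy as np

def simulate_lrp(p_a, p_b, x0, n_steps, gamma, rho, seed=0):
    """Simulate the Linear Reward-Penalty (LRP) recursion:
    reward step gamma_n, penalty step gamma_n * rho_n."""
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(1, n_steps + 1):
        g, r = gamma(n), rho(n)
        u = rng.random()
        if u <= x:                          # trader A evaluated
            if rng.random() < p_a:
                x += g * (1.0 - x)          # A rewarded
            else:
                x -= g * r * x              # A penalized
        else:                               # trader B evaluated
            if rng.random() < p_b:
                x -= g * x                  # B rewarded (X moves toward 0)
            else:
                x += g * r * (1.0 - x)      # B penalized (X moves toward 1)
    return x

# Example run with gamma_n, rho_n proportional to n^{-1/2} (cf. Section 3).
x = simulate_lrp(0.6, 0.4, 0.5, 50000,
                 gamma=lambda n: 0.5 / np.sqrt(n), rho=lambda n: 1.0 / np.sqrt(n))
print(x)
```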

The paper is organized as follows. In Section 1, we discuss the convergence of the sequence $(X_n)_{n\ge0}$. First we show that, if $\rho_n$ is a positive constant $\rho$, the sequence converges with probability one to a limit $x_\rho\in(0,1)$ satisfying $x_\rho > \frac12$ if and only if $p_A > p_B$, so that, although the algorithm manages to distinguish which trader is better, it does not assign the whole fund to the best trader. To get rid of this limitation, we consider a sequence $(\rho_n)_{n\ge1}$ which goes to zero, so that the penalty rate becomes negligible with respect to the reward rate ($\gamma_n\rho_n = o(\gamma_n)$). This framework seems new in the learning theory literature.

Then, we are able to show that the algorithm is infallible, i.e., if $p_A > p_B$, then $\lim_{n\to\infty}X_n = 1$ almost surely, under very light conditions on the reward rate $\gamma_n$ (and $\rho_n$). From a stochastic approximation viewpoint, this modification of the original procedure has the same mean function and time scale (hence the same target and trap, see (5)), but it always keeps the algorithm away from the trap without adding noise at these equilibria. In fact, it was necessary not to add noise at these points in order to remain inside the domain $[0,1]$.


The other two sections are devoted to the rate of convergence. In Section 2, we show that under some conditions (including $\lim_{n\to\infty}\gamma_n/\rho_n = 0$) the sequence $Y_n = (1-X_n)/\rho_n$ converges in probability to $(1-p_A)/\pi$, where $\pi = p_A - p_B > 0$. With additional assumptions, we prove that this convergence occurs with probability 1. In Section 3, we show that if the ratio $\gamma_n/\rho_n$ goes to a positive limit as $n$ goes to infinity, then $(Y_n)_{n\ge1}$ converges in a weak sense to a probability distribution $\nu$. This distribution is identified as the unique stationary distribution of a discontinuous Markov process. This result is obtained by using weak functional methods applied to a re-scaling of the algorithm. This approach can be seen as an extension of the SDE method used to prove the CLT in a more standard framework of stochastic approximation (see [10]). Furthermore, we show that $\nu$ is absolutely continuous with a continuous, possibly non-smooth, piecewise $C^\infty$ density. An interesting consequence of these results for practical applications is that, by choosing $\rho_n$ and $\gamma_n$ proportional to $n^{-1/2}$, one can achieve convergence at the rate $1/\sqrt n$, without any a priori knowledge about the values of $p_A$ and $p_B$. This is in contrast with the case of the LRI procedure, where the rate of convergence depends heavily on these parameters (see [11]) and becomes quite poor when they get close to each other.

Notation. Let $(a_n)_{n\ge0}$ and $(b_n)_{n\ge0}$ be two sequences of positive real numbers. The symbol $a_n \sim b_n$ means $a_n = b_n + o(b_n)$.

1 Convergence of the LRP algorithm

1.1 Some classical background on stochastic approximation

We will rely on the ODE lemma recalled below for a stochastic procedure $(Z_n)$ taking its values in a given compact interval $I$.

Theorem 1 (a) Kushner & Clark's ODE Lemma (see [9]): Let $g : I \to \mathbb{R}$ be such that $\mathrm{Id} + g$ leaves $I$ stable($^1$). Then, consider the recursively defined stochastic approximation procedure defined on $I$ by
$$Z_{n+1} = Z_n + \gamma_{n+1}\left(g(Z_n) + \Delta R_{n+1}\right), \quad n\ge0, \quad Z_0\in I,$$
where $(\gamma_n)_{n\ge1}$ is a sequence of $[0,1]$-valued real numbers satisfying $\gamma_n\to0$ and $\sum_{n\ge1}\gamma_n=+\infty$. Set $N(t) := \min\{n : \gamma_1+\cdots+\gamma_{n+1}>t\}$. Assume that, for every $T>0$,
$$\max_{N(t)\le n\le N(t+T)}\left|\sum_{k=N(t)+1}^{n}\gamma_k\Delta R_k\right| \longrightarrow 0 \quad P\text{-a.s. as } t\to+\infty. \tag{1}$$
Let $z^*$ be an attracting zero of $g$ in $I$ and $G(z^*)$ its attracting interval. Then, on the event $\{Z_n \text{ visits infinitely often a compact subset of } G(z^*)\}$, $Z_n\xrightarrow{a.s.}z^*$.

($^1$) Then, for every $\gamma\in[0,1]$, $\mathrm{Id}+\gamma g = \gamma(\mathrm{Id}+g) + (1-\gamma)\mathrm{Id}$ still takes values in the convex set $I$.

(b) The Hoeffding condition (see [1]): If $(\Delta R_n)_{n\ge0}$ is a sequence of $L^\infty$-bounded martingale increments, if $(\gamma_n)$ is nonincreasing and
$$\sum_{n\ge1} e^{-\vartheta/\gamma_n} < +\infty \quad\text{for every } \vartheta>0,$$
then Assumption (1) is satisfied.

Remark. The monotonicity assumption on the sequence $(\gamma_n)$ can be relaxed to $\gamma_n\to0$ and $\sup_{n,k\ge1}\gamma_{n+k}/\gamma_n < +\infty$.

1.2 Basic properties of the LRP algorithm

We first recall the definition of the algorithm. We are interested in the asymptotic behavior of the sequence $(X_n)_{n\in\mathbb{N}}$, where $X_0 = x$, with $x\in(0,1)$, and
$$X_{n+1} = X_n + \gamma_{n+1}\left(\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}>X_n\}\cap B_{n+1}}X_n\right) - \gamma_{n+1}\rho_{n+1}\left(X_n\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A^c_{n+1}} - (1-X_n)\mathbf{1}_{\{U_{n+1}>X_n\}\cap B^c_{n+1}}\right), \quad n\in\mathbb{N}.$$
Throughout the paper, we assume that $(\gamma_n)_{n\ge1}$ is a non-increasing sequence of positive numbers satisfying $\gamma_n<1$,
$$\sum_{n=1}^{\infty}\gamma_n = +\infty \quad\text{and}\quad \forall\vartheta>0,\ \sum_n e^{-\vartheta/\gamma_n} < \infty,$$
and that $(\rho_n)_{n\ge1}$ is a sequence of positive numbers satisfying $\gamma_n\rho_n<1$; $(U_n)_{n\ge1}$ is a sequence of independent random variables which are uniformly distributed on the interval $[0,1]$; the events $A_n$, $B_n$ satisfy
$$P(A_n) = p_A, \quad P(B_n) = p_B, \quad n\in\mathbb{N},$$
where $0<p_B\le p_A<1$, and the sequences $(U_n)_{n\ge1}$ and $(\mathbf{1}_{A_n},\mathbf{1}_{B_n})_{n\ge1}$ are independent.

The natural filtration of the sequence $(U_n,\mathbf{1}_{A_n},\mathbf{1}_{B_n})_{n\ge1}$ is denoted by $(\mathcal{F}_n)_{n\ge0}$ and we set $\pi = p_A - p_B$.

With this notation, we have, for $n\ge0$,
$$X_{n+1} = X_n + \gamma_{n+1}\left(\pi h(X_n) + \rho_{n+1}\kappa(X_n)\right) + \gamma_{n+1}\Delta M_{n+1}, \tag{2}$$
where the functions $h$ and $\kappa$ are defined by
$$h(x) = x(1-x), \qquad \kappa(x) = -(1-p_A)x^2 + (1-p_B)(1-x)^2, \qquad 0\le x\le1,$$
$\Delta M_{n+1} = M_{n+1} - M_n$, and the sequence $(M_n)_{n\ge0}$ is the martingale defined by $M_0 = 0$ and
$$\Delta M_{n+1} = \mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}>X_n\}\cap B_{n+1}}X_n - \pi h(X_n) - \rho_{n+1}\left(X_n\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A^c_{n+1}} - (1-X_n)\mathbf{1}_{\{U_{n+1}>X_n\}\cap B^c_{n+1}} + \kappa(X_n)\right). \tag{3}$$
Observe that the increments $\Delta M_{n+1}$ are bounded.
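As a quick Monte Carlo sanity check of decomposition (2) (ours, not from the paper; the parameter values below are arbitrary), one can verify that the conditional mean of the one-step increment equals $\gamma_{n+1}(\pi h(X_n) + \rho_{n+1}\kappa(X_n))$:

```python
import numpy as np

p_a, p_b, x, gamma, rho = 0.7, 0.4, 0.3, 0.1, 0.2
pi = p_a - p_b
h = x * (1 - x)
kappa = -(1 - p_a) * x**2 + (1 - p_b) * (1 - x)**2

rng = np.random.default_rng(1)
n_mc = 10**6
u = rng.random(n_mc)
a = rng.random(n_mc) < p_a            # events A_{n+1}
b = rng.random(n_mc) < p_b            # events B_{n+1}
inc = gamma * (((u <= x) & a) * (1 - x) - ((u > x) & b) * x) \
    - gamma * rho * (((u <= x) & ~a) * x - ((u > x) & ~b) * (1 - x))

print(inc.mean())                     # empirical one-step drift
print(gamma * (pi * h + rho * kappa)) # predicted drift gamma*(pi*h + rho*kappa)
```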

1.3 Constant penalty rate

In this subsection, we assume
$$\forall n\ge1, \quad \rho_n = \rho, \quad\text{with } 0<\rho\le1.$$
We then have
$$X_{n+1} = X_n + \gamma_{n+1}\left(h_\rho(X_n) + \Delta M_{n+1}\right),$$
where
$$h_\rho(x) = \pi h(x) + \rho\kappa(x), \qquad 0\le x\le1.$$
Note that $h_\rho(0) = \rho(1-p_B) > 0$ and $h_\rho(1) = -\rho(1-p_A) < 0$, and that there exists a unique $x_\rho\in(0,1)$ such that $h_\rho(x_\rho) = 0$. By a straightforward computation, we have
$$x_\rho = \frac{\pi - 2\rho(1-p_B) + \sqrt{\pi^2 + 4\rho^2(1-p_B)(1-p_A)}}{2\pi(1-\rho)} \quad\text{if } \pi\ne0 \text{ and } \rho\ne1,$$
$$x_\rho = \frac{1-p_B}{(1-p_A)+(1-p_B)} \quad\text{if } \pi=0 \text{ or } \rho=1.$$
In particular, $x_\rho = 1/2$ if $\pi = 0$, regardless of the value of $\rho$. We also have $h_\rho(1/2) = \pi(1+\rho)/4 \ge 0$, so that
$$x_\rho > 1/2 \quad\text{if } \pi>0. \tag{4}$$
Now, let $x$ be a solution of the ODE $dx/dt = h_\rho(x)$. If $x(0)\in[0,x_\rho]$, $x$ is non-decreasing and $\lim_{t\to\infty}x(t) = x_\rho$. If $x(0)\in[x_\rho,1]$, $x$ is non-increasing and $\lim_{t\to\infty}x(t) = x_\rho$. It follows that the interval $[0,1]$ is a domain of attraction for $x_\rho$. Consequently, using Kushner and Clark's ODE Lemma (see Theorem 1), one reaches the following conclusion.

Proposition 1 Assume that $\rho_n = \rho\in(0,1]$. Then
$$X_n\xrightarrow{a.s.} x_\rho \quad\text{as } n\to\infty.$$
The natural interpretation, given the above inequalities on $x_\rho$, is that this algorithm never fails to point out the best trader, thanks to Inequality (4), but it never assigns the whole fund to this trader, as the original LRI procedure did.

1.4 Convergence when the penalty rate goes to zero

Proposition 2 Assume $\lim_{n\to\infty}\rho_n = 0$. The sequence $(X_n)_{n\in\mathbb{N}}$ is almost surely convergent and its limit $X_\infty$ satisfies $X_\infty\in\{0,1\}$ with probability 1.

Proof: We first write the algorithm in its canonical form
$$X_{n+1} = X_n + \gamma_{n+1}\left(\pi h(X_n) + \Delta R_{n+1}\right) \quad\text{with}\quad \Delta R_n = \Delta M_n + \rho_n\kappa(X_{n-1}). \tag{5}$$
It is straightforward to check that the ODE $\dot x = h(x)$ has two equilibrium points, 0 and 1, where 1 is attractive with $(0,1]$ as attracting interval and 0 is unstable.

Since the martingale increments $\Delta M_n$ are bounded, it follows from the assumptions on the sequence $(\gamma_n)_{n\ge1}$ and the Hoeffding condition (see Theorem 1(b)) that
$$\max_{N(t)\le n\le N(t+T)}\left|\sum_{k=N(t)+1}^{n}\gamma_k\Delta M_k\right| \longrightarrow 0 \quad P\text{-a.s. as } t\to+\infty,$$
for every $T>0$. On the other hand, the function $\kappa$ being bounded on $[0,1]$ and $\rho_n$ converging to 0, we have, for every $T>0$,
$$\max_{N(t)\le n\le N(t+T)}\left|\sum_{k=N(t)+1}^{n}\gamma_k\rho_k\kappa(X_{k-1})\right| \le \|\kappa\|_\infty\left(T+\gamma_{N(t+T)}\right)\max_{k\ge N(t)+1}\rho_k \longrightarrow 0 \quad\text{as } t\to+\infty.$$
Finally, the sequence $(\Delta R_n)_{n\ge1}$ satisfies Assumption (1). Consequently, either $X_n$ visits infinitely often an interval $[\varepsilon,1]$ for some $\varepsilon>0$ and $X_n$ converges toward 1, or $X_n$ converges toward 0. $\square$

Remark 1 If $\pi = 0$, i.e. $p_A = p_B$, the algorithm reduces to
$$X_{n+1} = X_n + \gamma_{n+1}\rho_{n+1}(1-p_A)(1-2X_n) + \gamma_{n+1}\Delta M_{n+1}.$$
The number $1/2$ is the unique equilibrium of the ODE $\dot x = (1-p_A)(1-2x)$, and the interval $[0,1]$ is a domain of attraction. Assuming $\sum_{n=1}^{\infty}\rho_n\gamma_n = +\infty$, and that the sequence $(\rho_n\gamma_n)_{n\ge1}$ is non-increasing and satisfies
$$\forall\vartheta>0, \quad \sum_{n=1}^{\infty}\exp\left(-\frac{\vartheta}{\rho_n\gamma_n}\right) < +\infty,$$
it can be proved, using the Kushner–Clark ODE Lemma (Theorem 1), that $\lim_{n\to\infty}X_n = 1/2$ almost surely. As concerns the asymptotics of the algorithm when $\pi = 0$ and $\gamma_n = g\rho_n$ (for which the above condition is not satisfied), we refer to the final remark of the paper.

From now on, we will assume that $p_A > p_B$. The next proposition shows that the penalized algorithm is infallible under very light assumptions on $\gamma_n$ and $\rho_n$.

Proposition 3 (Infallibility) Assume $\lim_{n\to\infty}\rho_n = 0$. If the sequence $(\gamma_n/\rho_n)_{n\ge1}$ is bounded and $\sum_n\gamma_n\rho_n = \infty$, and if $\pi>0$, we have $\lim_{n\to\infty}X_n = 1$ almost surely.

Proof: We have from (2), since $h\ge0$ on the interval $[0,1]$,
$$X_n \ge X_0 + \sum_{j=1}^{n}\gamma_j\rho_j\kappa(X_{j-1}) + \sum_{j=1}^{n}\gamma_j\Delta M_j, \quad n\ge1.$$
Since the jumps $\Delta M_j$ are bounded, we have
$$\left\|\sum_{j=1}^{n}\gamma_j\Delta M_j\right\|_{L^2}^2 \le C\sum_{j=1}^{n}\gamma_j^2 \le C\left(\sup_{j\in\mathbb{N}}\frac{\gamma_j}{\rho_j}\right)\sum_{j=1}^{n}\gamma_j\rho_j,$$
for some positive constant $C$. Therefore, since $\sum_n\gamma_n\rho_n = \infty$,
$$L^2\text{-}\lim_{n\to\infty}\frac{\sum_{j=1}^{n}\gamma_j\Delta M_j}{\sum_{j=1}^{n}\gamma_j\rho_j} = 0, \quad\text{so that}\quad \limsup_n\frac{\sum_{j=1}^{n}\gamma_j\Delta M_j}{\sum_{j=1}^{n}\gamma_j\rho_j} \ge 0 \quad\text{a.s.}$$
Now, on the set $\{X_\infty = 0\}$, we have
$$\lim_{n\to\infty}\frac{\sum_{j=1}^{n}\gamma_j\rho_j\kappa(X_{j-1})}{\sum_{j=1}^{n}\gamma_j\rho_j} = \kappa(0) > 0.$$
Hence, it follows that, still on the set $\{X_\infty = 0\}$,
$$\limsup_{n\to\infty}\frac{X_n}{\sum_{j=1}^{n}\gamma_j\rho_j} > 0.$$
Therefore, we must have $P(X_\infty = 0) = 0$. $\square$

The following proposition gives a control on the conditional variance process of the martingale $(M_n)_{n\in\mathbb{N}}$, which will be crucial to elucidate the rate of convergence of the algorithm.

Proposition 4 We have, for $n\ge0$,
$$E\left(\Delta M_{n+1}^2 \mid \mathcal{F}_n\right) \le p_A(1-X_n) + \rho_{n+1}^2(1-p_B).$$

Proof: We have
$$\Delta M_{n+1} = V_{n+1} - E(V_{n+1}\mid\mathcal{F}_n) + W_{n+1} - E(W_{n+1}\mid\mathcal{F}_n),$$
with
$$V_{n+1} = \mathbf{1}_{\{U_{n+1}\le X_n\}\cap A_{n+1}}(1-X_n) - \mathbf{1}_{\{U_{n+1}>X_n\}\cap B_{n+1}}X_n$$
and
$$W_{n+1} = -\rho_{n+1}\left(X_n\mathbf{1}_{\{U_{n+1}\le X_n\}\cap A^c_{n+1}} - (1-X_n)\mathbf{1}_{\{U_{n+1}>X_n\}\cap B^c_{n+1}}\right).$$
Note that $V_{n+1}W_{n+1} = 0$, so that
$$E\left(\Delta M_{n+1}^2\mid\mathcal{F}_n\right) = E(V_{n+1}^2\mid\mathcal{F}_n) + E(W_{n+1}^2\mid\mathcal{F}_n) - \left(E(V_{n+1}+W_{n+1}\mid\mathcal{F}_n)\right)^2 \le E(V_{n+1}^2\mid\mathcal{F}_n) + E(W_{n+1}^2\mid\mathcal{F}_n).$$
Now, using $p_B\le p_A$ and $X_n\le1$,
$$E\left(V_{n+1}^2\mid\mathcal{F}_n\right) = p_AX_n(1-X_n)^2 + p_B(1-X_n)X_n^2 \le p_A(1-X_n)$$
and
$$E\left(W_{n+1}^2\mid\mathcal{F}_n\right) = \rho_{n+1}^2\left(X_n^3(1-p_A) + (1-X_n)^3(1-p_B)\right) \le \rho_{n+1}^2(1-p_B).$$
This proves the Proposition. $\square$

2 The rate of convergence: pointwise convergence

2.1 Convergence in probability

Theorem 2 Assume
$$\lim_{n\to\infty}\rho_n = 0, \quad \lim_{n\to\infty}\frac{\gamma_n}{\rho_n} = 0, \quad \sum_n\rho_n\gamma_n = \infty, \quad \rho_n-\rho_{n-1} = o(\rho_n\gamma_n). \tag{6}$$
Then the sequence $\left((1-X_n)/\rho_n\right)_{n\ge1}$ converges to $(1-p_A)/\pi$ in probability.

Note that the assumptions of Theorem 2 are satisfied if $\gamma_n = C/n^a$ and $\rho_n = C'/n^r$, with $C, C' > 0$, $0<r<a$ and $a+r<1$. In fact, we will see that for this choice of parameters, convergence holds with probability one (see Theorem 3).
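A rough numerical check of Theorem 2 (ours, not from the paper), reusing the simulate_lrp sketch above with rates $\gamma_n = C/n^a$, $\rho_n = C'/n^r$ as in the remark; all concrete constants are arbitrary:

```python
import numpy as np

p_a, p_b = 0.6, 0.4
C, Cp, a_exp, r_exp = 0.5, 1.0, 0.6, 0.3      # 0 < r < a and a + r < 1
gamma = lambda n: C / n**a_exp
rho = lambda n: Cp / n**r_exp

n_steps = 200_000
x_n = simulate_lrp(p_a, p_b, 0.5, n_steps, gamma, rho, seed=3)  # defined above
y_n = (1.0 - x_n) / rho(n_steps)
print(y_n, (1 - p_a) / (p_a - p_b))  # Y_n should be close to (1 - p_A)/pi = 2.0
```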

Before proving Theorem 2, we introduce the notation
$$Y_n = \frac{1-X_n}{\rho_n}.$$
We have, from (2),
$$1-X_{n+1} = 1-X_n - \gamma_{n+1}\pi X_n(1-X_n) - \rho_{n+1}\gamma_{n+1}\kappa(X_n) - \gamma_{n+1}\Delta M_{n+1},$$
$$\frac{1-X_{n+1}}{\rho_{n+1}} = \frac{1-X_n}{\rho_{n+1}} - \frac{\gamma_{n+1}}{\rho_{n+1}}\pi X_n(1-X_n) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1}.$$
Hence
$$Y_{n+1} = Y_n + (1-X_n)\left(\frac{1}{\rho_{n+1}}-\frac{1}{\rho_n}\right) - \frac{\gamma_{n+1}}{\rho_{n+1}}\pi X_n(1-X_n) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1},$$
$$Y_{n+1} = Y_n\left(1 + \gamma_{n+1}\varepsilon_n - \pi_n\gamma_{n+1}X_n\right) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1},$$
where
$$\varepsilon_n = \frac{\rho_n}{\gamma_{n+1}}\left(\frac{1}{\rho_{n+1}}-\frac{1}{\rho_n}\right) \quad\text{and}\quad \pi_n = \frac{\rho_n}{\rho_{n+1}}\pi.$$
It follows from the assumption $\rho_n-\rho_{n-1} = o(\rho_n\gamma_n)$ that $\lim_{n\to\infty}\varepsilon_n = 0$ and $\lim_{n\to\infty}\pi_n = \pi$.

Lemma 1 Consider two positive numbers $\pi_-$ and $\pi_+$ with $0<\pi_-<\pi<\pi_+<1$. Given $l\in\mathbb{N}$, let
$$\nu_l = \inf\{n\ge l \mid \pi_nX_n-\varepsilon_n > \pi_+ \ \text{or}\ \pi_nX_n-\varepsilon_n < \pi_-\}.$$
We have:

• $\lim_{l\to\infty}P(\nu_l = \infty) = 1$;

• for $n\ge l$, if $\theta_n^+ = \prod_{k=l+1}^{n}(1-\pi_+\gamma_k)$ and $\theta_n^- = \prod_{k=l+1}^{n}(1-\pi_-\gamma_k)$,
$$\frac{Y_{n\wedge\nu_l}}{\theta_{n\wedge\nu_l}^-} \le Y_l - \sum_{k=l+1}^{n\wedge\nu_l}\frac{\gamma_k}{\theta_k^-}\kappa(X_{k-1}) - \sum_{k=l+1}^{n\wedge\nu_l}\frac{\gamma_k}{\rho_k\theta_k^-}\Delta M_k \tag{7}$$
and
$$\frac{Y_{n\wedge\nu_l}}{\theta_{n\wedge\nu_l}^+} \ge Y_l - \sum_{k=l+1}^{n\wedge\nu_l}\frac{\gamma_k}{\theta_k^+}\kappa(X_{k-1}) - \sum_{k=l+1}^{n\wedge\nu_l}\frac{\gamma_k}{\rho_k\theta_k^+}\Delta M_k. \tag{8}$$
Moreover, with the notation $\|\kappa\|_\infty = \sup_{0<x<1}|\kappa(x)|$,
$$\sup_{n\ge l}E\left(Y_n\mathbf{1}_{\{\nu_l=\infty\}}\right) \le E(Y_l) + \frac{\|\kappa\|_\infty}{\pi_-}.$$

Remark 2 Note that, as the proof will show, Lemma 1 remains valid if the condition $\lim_{n\to\infty}\gamma_n/\rho_n = 0$ in (6) is replaced by the boundedness of the sequence $(\gamma_n/\rho_n)_{n\ge1}$. In particular, the last statement, which implies the tightness of the sequence $(Y_n)_{n\ge1}$, will be used in Section 3.

Proof: Since $\lim_{n\to\infty}(\pi_nX_n-\varepsilon_n) = \pi$ a.s., we clearly have $\lim_{l\to\infty}P(\nu_l=\infty) = 1$.

On the other hand, for $l\le n<\nu_l$, we have
$$Y_{n+1} \le Y_n(1-\gamma_{n+1}\pi_-) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1}$$
and
$$Y_{n+1} \ge Y_n(1-\gamma_{n+1}\pi_+) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1},$$
so that, with the notation $\theta_n^+ = \prod_{k=l+1}^{n}(1-\pi_+\gamma_k)$ and $\theta_n^- = \prod_{k=l+1}^{n}(1-\pi_-\gamma_k)$,
$$\frac{Y_{n+1}}{\theta_{n+1}^-} \le \frac{Y_n}{\theta_n^-} - \frac{\gamma_{n+1}}{\theta_{n+1}^-}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}\theta_{n+1}^-}\Delta M_{n+1}$$
and
$$\frac{Y_{n+1}}{\theta_{n+1}^+} \ge \frac{Y_n}{\theta_n^+} - \frac{\gamma_{n+1}}{\theta_{n+1}^+}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}\theta_{n+1}^+}\Delta M_{n+1}.$$
By summing up these inequalities, we get (7) and (8).

By taking expectations in (7), we get
$$E\left(\frac{Y_{n\wedge\nu_l}}{\theta_{n\wedge\nu_l}^-}\right) \le E(Y_l) + \|\kappa\|_\infty\,E\left(\sum_{k=l+1}^{n\wedge\nu_l}\frac{\gamma_k}{\theta_k^-}\right) = E(Y_l) + \frac{\|\kappa\|_\infty}{\pi_-}\,E\left(\sum_{k=l+1}^{n\wedge\nu_l}\left(\frac{1}{\theta_k^-}-\frac{1}{\theta_{k-1}^-}\right)\right) \le E(Y_l) + \frac{\|\kappa\|_\infty}{\pi_-}\cdot\frac{1}{\theta_n^-}.$$
We then have
$$E\left(Y_n\mathbf{1}_{\{\nu_l=\infty\}}\right) = \theta_n^-\,E\left(\frac{Y_{n\wedge\nu_l}}{\theta_{n\wedge\nu_l}^-}\mathbf{1}_{\{\nu_l=\infty\}}\right) \le \theta_n^-\,E\left(\frac{Y_{n\wedge\nu_l}}{\theta_{n\wedge\nu_l}^-}\right) \le \theta_n^-\left(E(Y_l) + \frac{\|\kappa\|_\infty}{\pi_-}\cdot\frac{1}{\theta_n^-}\right) \le E(Y_l) + \frac{\|\kappa\|_\infty}{\pi_-}. \qquad\square$$

Lemma 2 Let $(\theta_n)_{n\in\mathbb{N}}$ be the sequence of positive numbers defined by $\theta_n = \prod_{k=1}^{n}(1-p\gamma_k)$, for some $p\in(0,1)$. The sequence
$$\left(\theta_n\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k\rho_k}\Delta M_k\right)_{n\in\mathbb{N}}$$
converges to 0 in probability.

Proof: It suffices to show convergence to 0 in probability for the associated conditional variances $T_n$, defined by
$$T_n = \theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k^2}E\left(\Delta M_k^2\mid\mathcal{F}_{k-1}\right).$$
We know from Proposition 4 that
$$E\left(\Delta M_k^2\mid\mathcal{F}_{k-1}\right) \le p_A(1-X_{k-1}) + \rho_k^2(1-p_B) = p_A\rho_{k-1}Y_{k-1} + \rho_k^2(1-p_B).$$
Therefore, $T_n \le p_AT_n^{(1)} + (1-p_B)T_n^{(2)}$, where
$$T_n^{(1)} = \theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k^2}\rho_{k-1}Y_{k-1} \quad\text{and}\quad T_n^{(2)} = \theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2}.$$
We first prove that $\lim_{n\to\infty}T_n^{(2)} = 0$. Note that, since $p\gamma_k\le1$,
$$\frac{1}{\theta_k^2}-\frac{1}{\theta_{k-1}^2} = \frac{2p\gamma_k-p^2\gamma_k^2}{\theta_k^2} \ge \frac{p\gamma_k}{\theta_k^2}. \tag{9}$$
Therefore,
$$T_n^{(2)} \le \frac{\theta_n^2}{p}\sum_{k=1}^{n}\gamma_k\left(\frac{1}{\theta_k^2}-\frac{1}{\theta_{k-1}^2}\right),$$
and $\lim_{n\to\infty}T_n^{(2)} = 0$ follows from Cesàro's lemma.

We now deal with $T_n^{(1)}$. First note that the assumption $\rho_n-\rho_{n-1} = o(\rho_n\gamma_n)$ implies $\lim_{n\to\infty}\rho_n/\rho_{n-1} = 1$, so that, the sequence $(\gamma_n)_{n\ge1}$ being non-increasing with limit 0, we only need to prove that $\lim_{n\to\infty}\bar T_n^{(1)} = 0$ in probability, where
$$\bar T_n^{(1)} = \theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k}Y_k.$$
Now, with the notation of Lemma 1, we have, for $n\ge l>1$ and $\varepsilon>0$,
$$P\left(\bar T_n^{(1)}\ge\varepsilon\right) \le P(\nu_l<\infty) + P\left(\theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k}Y_k\mathbf{1}_{\{\nu_l=\infty\}}\ge\varepsilon\right) \le P(\nu_l<\infty) + \frac{1}{\varepsilon}\,\theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k}E\left(Y_k\mathbf{1}_{\{\nu_l=\infty\}}\right).$$
Using Lemma 1, $\lim_{n\to\infty}\gamma_n/\rho_n = 0$ and (9), we have
$$\lim_{n\to\infty}\theta_n^2\sum_{k=1}^{n}\frac{\gamma_k^2}{\theta_k^2\rho_k}E\left(Y_k\mathbf{1}_{\{\nu_l=\infty\}}\right) = 0.$$
We also know that $\lim_{l\to\infty}P(\nu_l<\infty) = 0$. Hence,
$$\lim_{n\to\infty}P\left(\bar T_n^{(1)}\ge\varepsilon\right) = 0. \qquad\square$$

Proof of Theorem 2: First note that if $\theta_n = \prod_{k=1}^{n}(1-p\gamma_k)$, with $0<p<1$, we have
$$\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\kappa(X_{k-1}) = \frac{1}{p}\sum_{k=1}^{n}\left(\frac{1}{\theta_k}-\frac{1}{\theta_{k-1}}\right)\kappa(X_{k-1}).$$
Hence, using $\lim_{n\to\infty}X_n = 1$ and $\kappa(1) = -(1-p_A)$,
$$\lim_{n\to\infty}\theta_n\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\kappa(X_{k-1}) = -\frac{1-p_A}{p}.$$
Going back to (7) and (8), and using Lemma 2 with $p = \pi_+$ and $p = \pi_-$, and the fact that $\lim_{l\to\infty}P(\nu_l=\infty) = 1$, we have, for all $\varepsilon>0$,
$$\lim_{n\to\infty}P\left(Y_n\ge\frac{1-p_A}{\pi_-}+\varepsilon\right) = \lim_{n\to\infty}P\left(Y_n\le\frac{1-p_A}{\pi_+}-\varepsilon\right) = 0,$$
and since $\pi_+$ and $\pi_-$ can be made arbitrarily close to $\pi$, the Theorem is proved. $\square$

2.2 Almost sure convergence

Theorem 3 In addition to (6), we assume that, for all $\beta\in[0,1]$,
$$\gamma_n\rho_n^\beta - \gamma_{n-1}\rho_{n-1}^\beta = o\left(\gamma_n^2\rho_n^\beta\right), \tag{10}$$
and that, for some $\eta>0$, we have
$$\forall C>0, \quad \sum_n\exp\left(-C\,\frac{\rho_n^{1+\eta}}{\gamma_n}\right) < \infty. \tag{11}$$
Then, with probability 1,
$$\lim_{n\to\infty}\frac{1-X_n}{\rho_n} = \frac{1-p_A}{\pi}.$$
Note that the assumptions of Theorem 3 are satisfied if $\gamma_n = Cn^{-a}$ and $\rho_n = C'n^{-r}$, with $C, C'>0$, $0<r<a$ and $a+r<1$.

The proof of Theorem 3 is based on the following lemma, which will be proved later.


Lemma 3 Under the assumptions of Theorem 3, let $\alpha\in[0,1]$ and let $(\theta_n)_{n\ge1}$ be the sequence of positive numbers defined by $\theta_n = \prod_{k=1}^{n}(1-p\gamma_k)$, for some $p\in(0,1)$. On the set $\{\sup_n(\rho_n^\alpha Y_n)<\infty\}$, we have
$$\lim_{n\to\infty}\rho_n^{\frac{\alpha-\eta}{2}-1}\,\theta_n\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\Delta M_k = 0 \quad\text{a.s.},$$
where $\eta$ satisfies (11).

Proof of Theorem 3: We start from the following form of (2):
$$1-X_{n+1} = (1-X_n)(1-\gamma_{n+1}\pi X_n) - \rho_{n+1}\gamma_{n+1}\kappa(X_n) - \gamma_{n+1}\Delta M_{n+1}.$$
We know that $\lim_{n\to\infty}X_n = 1$ a.s. Therefore, given $\pi_+$ and $\pi_-$, with $0<\pi_-<\pi<\pi_+<1$, there exists $l\in\mathbb{N}$ such that, for $n\ge l$,
$$1-X_{n+1} \le (1-X_n)(1-\gamma_{n+1}\pi_-) - \rho_{n+1}\gamma_{n+1}\kappa(X_n) - \gamma_{n+1}\Delta M_{n+1}$$
and
$$1-X_{n+1} \ge (1-X_n)(1-\gamma_{n+1}\pi_+) - \rho_{n+1}\gamma_{n+1}\kappa(X_n) - \gamma_{n+1}\Delta M_{n+1},$$
so that, with the notation $\theta_n^+ = \prod_{k=l+1}^{n}(1-\pi_+\gamma_k)$ and $\theta_n^- = \prod_{k=l+1}^{n}(1-\pi_-\gamma_k)$,
$$\frac{1-X_{n+1}}{\theta_{n+1}^-} \le \frac{1-X_n}{\theta_n^-} - \frac{\rho_{n+1}\gamma_{n+1}}{\theta_{n+1}^-}\kappa(X_n) - \frac{\gamma_{n+1}}{\theta_{n+1}^-}\Delta M_{n+1}$$
and
$$\frac{1-X_{n+1}}{\theta_{n+1}^+} \ge \frac{1-X_n}{\theta_n^+} - \frac{\rho_{n+1}\gamma_{n+1}}{\theta_{n+1}^+}\kappa(X_n) - \frac{\gamma_{n+1}}{\theta_{n+1}^+}\Delta M_{n+1}.$$
By summing up these inequalities, we get, for $n\ge l+1$,
$$\frac{1-X_n}{\theta_n^-} \le (1-X_l) - \sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-}\kappa(X_{k-1}) - \sum_{k=l+1}^{n}\frac{\gamma_k}{\theta_k^-}\Delta M_k$$
and
$$\frac{1-X_n}{\theta_n^+} \ge (1-X_l) - \sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^+}\kappa(X_{k-1}) - \sum_{k=l+1}^{n}\frac{\gamma_k}{\theta_k^+}\Delta M_k.$$
Hence
$$Y_n \le \frac{\theta_n^-}{\rho_n}(1-X_l) - \frac{\theta_n^-}{\rho_n}\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-}\kappa(X_{k-1}) - \frac{\theta_n^-}{\rho_n}\sum_{k=l+1}^{n}\frac{\gamma_k}{\theta_k^-}\Delta M_k \tag{12}$$
and
$$Y_n \ge \frac{\theta_n^+}{\rho_n}(1-X_l) - \frac{\theta_n^+}{\rho_n}\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^+}\kappa(X_{k-1}) - \frac{\theta_n^+}{\rho_n}\sum_{k=l+1}^{n}\frac{\gamma_k}{\theta_k^+}\Delta M_k. \tag{13}$$
We have, with probability 1, $\lim_{n\to\infty}\kappa(X_n) = \kappa(1) = -(1-p_A)$, and, since $\sum_{n=1}^{\infty}\rho_n\gamma_n = +\infty$,
$$\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-}\kappa(X_{k-1}) \sim -(1-p_A)\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-}. \tag{14}$$
On the other hand,
$$\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-} = \frac{1}{\pi_-}\sum_{k=l+1}^{n}\rho_k\left(\frac{1}{\theta_k^-}-\frac{1}{\theta_{k-1}^-}\right) = \frac{1}{\pi_-}\left(\sum_{k=l+1}^{n}(\rho_{k-1}-\rho_k)\frac{1}{\theta_{k-1}^-} + \frac{\rho_n}{\theta_n^-} - \frac{\rho_l}{\theta_l^-}\right) \sim \frac{1}{\pi_-}\cdot\frac{\rho_n}{\theta_n^-}, \tag{15}$$
where we have used the condition $\rho_k-\rho_{k-1} = o(\rho_k\gamma_k)$. We deduce from (14) and (15) that
$$\lim_{n\to\infty}\frac{\theta_n^-}{\rho_n}\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^-}\kappa(X_{k-1}) = -\frac{1-p_A}{\pi_-},$$
and, also, that $\lim_{n\to\infty}\theta_n^-/\rho_n = 0$. By a similar argument, we get $\lim_{n\to\infty}\theta_n^+/\rho_n = 0$ and
$$\lim_{n\to\infty}\frac{\theta_n^+}{\rho_n}\sum_{k=l+1}^{n}\frac{\rho_k\gamma_k}{\theta_k^+}\kappa(X_{k-1}) = -\frac{1-p_A}{\pi_+}.$$
It follows from Lemma 3 that, given $\alpha\in[0,1]$, we have, on the set $E_\alpha := \{\sup_n(\rho_n^\alpha Y_n)<\infty\}$,
$$\lim_{n\to\infty}\rho_n^{\frac{\alpha-\eta}{2}-1}\,\theta_n^\pm\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k^\pm}\Delta M_k = 0.$$
Together with (12) and (13), this implies:

• $\lim_{n\to\infty}Y_n = (1-p_A)/\pi$ a.s., if $\frac{\alpha-\eta}{2}\le0$;

• $\lim_{n\to\infty}Y_n\rho_n^{\frac{\alpha-\eta}{2}} = 0$ a.s., if $\frac{\alpha-\eta}{2}>0$.

We obviously have $P(E_\alpha) = 1$ for $\alpha = 1$. We deduce from the previous argument that if $P(E_\alpha) = 1$ and $\frac{\alpha-\eta}{2}>0$, then $P(E_{\alpha'}) = 1$, with $\alpha' = \frac{\alpha-\eta}{2}$. Set $\alpha_0 = 1$ and $\alpha_{k+1} = \frac{\alpha_k-\eta}{2}$. If $\alpha_0\le\eta$, we have $\lim_{n\to\infty}Y_n = (1-p_A)/\pi$ a.s. on $E_{\alpha_0}$. If $\alpha_0>\eta$, let $j$ be the largest integer such that $\alpha_j>\eta$ (note that $j$ exists because $\lim_{k\to\infty}\alpha_k < 0$). We have $P(E_{\alpha_{j+1}}) = 1$, and, on $E_{\alpha_{j+1}}$, $\lim_{n\to\infty}Y_n = (1-p_A)/\pi$ a.s., because $\frac{\alpha_{j+1}-\eta}{2}\le0$. $\square$

We now turn to the proof of Lemma 3, which is based on the following classical martingale inequality (see [13], Remark 1, p. 14, for a proof in the case of i.i.d. random variables; the extension to bounded martingale increments is straightforward).

Lemma 4 (Bernstein's inequality for bounded martingale increments) Let $(Z_i)_{1\le i\le n}$ be a finite sequence of square integrable random variables, adapted to the filtration $(\mathcal{F}_i)_{1\le i\le n}$, such that

1. $E(Z_i\mid\mathcal{F}_{i-1}) = 0$, $i = 1,\dots,n$,
2. $E(Z_i^2\mid\mathcal{F}_{i-1}) \le \sigma_i^2$, $i = 1,\dots,n$,
3. $|Z_i| \le \Delta_n$, $i = 1,\dots,n$,

where $\sigma_1^2,\dots,\sigma_n^2,\Delta_n$ are deterministic positive constants. Then, the following inequality holds:
$$\forall\lambda>0, \quad P\left(\left|\sum_{i=1}^{n}Z_i\right|\ge\lambda\right) \le 2\exp\left(-\frac{\lambda^2}{2b_n^2+\frac{2\lambda\Delta_n}{3}}\right),$$
with $b_n^2 = \sum_{i=1}^{n}\sigma_i^2$.

We will also need the following technical result.

Lemma 5 Let $(\theta_n)_{n\ge1}$ be the sequence of positive numbers defined by $\theta_n = \prod_{k=1}^{n}(1-p\gamma_k)$, for some $p\in(0,1)$, and let $(\xi_n)_{n\ge1}$ be a sequence of non-negative numbers satisfying
$$\gamma_n\xi_n - \gamma_{n-1}\xi_{n-1} = o\left(\gamma_n^2\xi_n\right).$$
We have
$$\sum_{k=1}^{n}\frac{\gamma_k^2\xi_k}{\theta_k^2} \sim \frac{\gamma_n\xi_n}{2p\,\theta_n^2}.$$

Proof: First observe that the condition $\gamma_n\xi_n-\gamma_{n-1}\xi_{n-1} = o(\gamma_n^2\xi_n)$ implies $\gamma_n\xi_n\sim\gamma_{n-1}\xi_{n-1}$ and that, given $\varepsilon>0$, we have, for $n$ large enough,
$$\gamma_n\xi_n - \gamma_{n-1}\xi_{n-1} \ge -\varepsilon\gamma_n^2\xi_n \ge -\varepsilon\gamma_{n-1}\gamma_n\xi_n,$$
where we have used the fact that the sequence $(\gamma_n)$ is non-increasing. Since $\gamma_n\xi_n\sim\gamma_{n-1}\xi_{n-1}$, we have, for $n$ large enough, say $n\ge n_0$,
$$\gamma_n\xi_n \ge \gamma_{n-1}\xi_{n-1}(1-2\varepsilon\gamma_{n-1}).$$
Therefore, for $n>n_0$,
$$\gamma_n\xi_n \ge \gamma_{n_0}\xi_{n_0}\prod_{k=n_0+1}^{n}(1-2\varepsilon\gamma_{k-1}).$$
From this, we easily deduce that $\lim_{n\to\infty}\frac{\gamma_n\xi_n}{\theta_n^2} = \infty$ and that $\sum_n\frac{\gamma_n^2\xi_n}{\theta_n^2} = \infty$. Now, from
$$\frac{1}{\theta_k^2}-\frac{1}{\theta_{k-1}^2} = \frac{2\gamma_kp-\gamma_k^2p^2}{\theta_k^2},$$
we deduce (recall that $\lim_{n\to\infty}\gamma_n = 0$)
$$\frac{\gamma_k^2\xi_k}{\theta_k^2} \sim \frac{\gamma_k\xi_k}{2p}\left(\frac{1}{\theta_k^2}-\frac{1}{\theta_{k-1}^2}\right),$$
and, since $\sum_n\frac{\gamma_n^2\xi_n}{\theta_n^2} = \infty$,
$$\sum_{k=1}^{n}\frac{\gamma_k^2\xi_k}{\theta_k^2} \sim \frac{1}{2p}\sum_{k=1}^{n}\gamma_k\xi_k\left(\frac{1}{\theta_k^2}-\frac{1}{\theta_{k-1}^2}\right) = \frac{1}{2p}\left(\frac{\gamma_n\xi_n}{\theta_n^2} + \sum_{k=1}^{n}(\gamma_{k-1}\xi_{k-1}-\gamma_k\xi_k)\frac{1}{\theta_{k-1}^2}\right) = \frac{\gamma_n\xi_n}{2p\,\theta_n^2} + o\left(\sum_{k=1}^{n}\frac{\gamma_k^2\xi_k}{\theta_k^2}\right),$$
where, for the first equality, we have assumed $\xi_0 = 0$, and, for the last one, we have used again $\gamma_n\xi_n-\gamma_{n-1}\xi_{n-1} = o(\gamma_n^2\xi_n)$. $\square$

Proof of Lemma 3: Given $\mu>0$, let
$$\nu_\mu = \inf\{n\ge0 \mid \rho_n^\alpha Y_n > \mu\}.$$
Note that $\{\sup_n\rho_n^\alpha Y_n < \infty\} = \bigcup_{\mu>0}\{\nu_\mu = \infty\}$. On the set $\{\nu_\mu = \infty\}$, we have
$$\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\Delta M_k = \sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\mathbf{1}_{\{k\le\nu_\mu\}}\Delta M_k.$$
We now apply Lemma 4 with $Z_i = \frac{\gamma_i}{\theta_i}\mathbf{1}_{\{i\le\nu_\mu\}}\Delta M_i$. We have, using Proposition 4,
$$E(Z_i^2\mid\mathcal{F}_{i-1}) = \frac{\gamma_i^2}{\theta_i^2}\mathbf{1}_{\{i\le\nu_\mu\}}E(\Delta M_i^2\mid\mathcal{F}_{i-1}) \le \frac{\gamma_i^2}{\theta_i^2}\mathbf{1}_{\{i\le\nu_\mu\}}\left(p_A\rho_{i-1}Y_{i-1} + \rho_i^2(1-p_B)\right) \le \frac{\gamma_i^2}{\theta_i^2}\left(p_A\rho_{i-1}^{1-\alpha}\mu + \rho_i^2(1-p_B)\right),$$
where we have used the fact that, on $\{i\le\nu_\mu\}$, $\rho_{i-1}^\alpha Y_{i-1}\le\mu$. Since $\lim_{n\to\infty}\rho_n = 0$ and $\lim_{n\to\infty}\rho_n/\rho_{n-1} = 1$ (which follows from $\rho_n-\rho_{n-1} = o(\gamma_n\rho_n)$), we have $E(Z_i^2\mid\mathcal{F}_{i-1}) \le \sigma_i^2$, with
$$\sigma_i^2 = C_\mu\,\frac{\gamma_i^2\rho_i^{1-\alpha}}{\theta_i^2},$$
for some $C_\mu>0$, depending only on $\mu$. Using Lemma 5 (with $\xi_n = \rho_n^{1-\alpha}$, whose assumption holds by (10)), we have
$$\sum_{i=1}^{n}\sigma_i^2 \sim C_\mu\,\frac{\gamma_n\rho_n^{1-\alpha}}{2p\,\theta_n^2}.$$
On the other hand, we have, because the jumps $\Delta M_i$ are bounded,
$$|Z_i| \le C\,\frac{\gamma_i}{\theta_i},$$
for some $C>0$. Note that $\frac{\gamma_k/\theta_k}{\gamma_{k-1}/\theta_{k-1}} = \frac{\gamma_k}{\gamma_{k-1}(1-p\gamma_k)}$, and, since $\gamma_k-\gamma_{k-1} = o(\gamma_k^2)$ (take $\beta = 0$ in (10)), we have, for $k$ large enough, $\gamma_k-\gamma_{k-1} \ge -p\gamma_k\gamma_{k-1}$, so that $\gamma_k/\gamma_{k-1} \ge 1-p\gamma_k$, and the sequence $(\gamma_n/\theta_n)$ is non-decreasing for $n$ large enough. Therefore, we have
$$\sup_{1\le i\le n}|Z_i| \le \Delta_n,$$
with $\Delta_n = C\gamma_n/\theta_n$ for some $C>0$. Now, applying Lemma 4 with $\lambda = \lambda_0\,\rho_n^{1-\frac{\alpha-\eta}{2}}/\theta_n$, we get
$$P\left(\theta_n\left|\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\mathbf{1}_{\{k\le\nu_\mu\}}\Delta M_k\right| \ge \lambda_0\rho_n^{1-\frac{\alpha-\eta}{2}}\right) \le 2\exp\left(-\frac{\lambda_0^2\rho_n^{2-\alpha+\eta}}{2\theta_n^2b_n^2 + \frac{2}{3}\lambda_0\theta_n\rho_n^{1-\frac{\alpha-\eta}{2}}\Delta_n}\right) \le 2\exp\left(-\frac{C_1\rho_n^{2-\alpha+\eta}}{C_2\gamma_n\rho_n^{1-\alpha} + C_3\gamma_n\rho_n^{1-\frac{\alpha-\eta}{2}}}\right) \le 2\exp\left(-C_4\,\frac{\rho_n^{1+\eta}}{\gamma_n}\right),$$
where the positive constants $C_1, C_2, C_3$ and $C_4$ depend on $\lambda_0$ and $\mu$, but not on $n$. Using (11) and the Borel–Cantelli lemma, we conclude that, on $\{\nu_\mu = \infty\}$, we have, for $n$ large enough,
$$\theta_n\left|\sum_{k=1}^{n}\frac{\gamma_k}{\theta_k}\Delta M_k\right| < \lambda_0\,\rho_n^{1-\frac{\alpha-\eta}{2}} \quad\text{a.s.},$$
and, since $\lambda_0$ is arbitrary, this completes the proof of the Lemma. $\square$

3 Weak convergence of the normalized algorithm

Throughout this section, we assume (in addition to the initial conditions on the sequence $(\gamma_n)_{n\in\mathbb{N}}$)
$$\gamma_n-\gamma_{n-1} = o\left(\gamma_n^2\right) \quad\text{and}\quad \frac{\gamma_n}{\rho_n} = g + o(\gamma_n), \tag{16}$$
where $g$ is a positive constant. Note that a possible choice is $\gamma_n = ag/\sqrt n$ and $\rho_n = a/\sqrt n$, with $a>0$.

Under these conditions, we have $\rho_n-\rho_{n-1} = o(\gamma_n^2)$, and we can write, as at the beginning of Section 2,
$$Y_{n+1} = Y_n\left(1 + \gamma_{n+1}\varepsilon_n - \pi_n\gamma_{n+1}X_n\right) - \gamma_{n+1}\kappa(X_n) - \frac{\gamma_{n+1}}{\rho_{n+1}}\Delta M_{n+1}, \tag{17}$$
where $\lim_{n\to\infty}\varepsilon_n = 0$ and $\lim_{n\to\infty}\pi_n = \pi$. As observed in Remark 2, we know that, under the assumptions (16), the sequence $(Y_n)_{n\ge1}$ is tight. We will prove that it is convergent in distribution.


Theorem 4 Under conditions (16), the sequence $\left(Y_n = (1-X_n)/\rho_n\right)_{n\in\mathbb{N}}$ converges weakly to the unique stationary distribution of the Markov process on $[0,+\infty)$ with generator $L$ defined by
$$Lf(y) = p_By\,\frac{f(y+g)-f(y)}{g} + (1-p_A-p_Ay)f'(y), \quad y\ge0, \tag{18}$$
for $f$ continuously differentiable and compactly supported in $[0,+\infty)$.
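For intuition: a process with generator (18) follows the deterministic drift $\dot y = 1-p_A-p_Ay$ and jumps by $+g$ at the state-dependent rate $p_By/g$. Formally taking $f(y) = y$ in (18) gives a stationary mean $m$ solving $p_Bm + 1-p_A-p_Am = 0$, i.e. $m = (1-p_A)/\pi$, consistent with the limit in Section 2. Below is a minimal Euler-type simulation sketch of ours (arbitrary parameters, crude time discretization):

```python
import numpy as np

def simulate_limit_process(p_a, p_b, g, y0=1.0, t_max=500.0, dt=1e-3, seed=0):
    """Euler scheme for the jump process with generator (18):
    drift (1 - p_a - p_a*y), jumps of size +g at rate p_b*y/g."""
    rng = np.random.default_rng(seed)
    y, t, samples = y0, 0.0, []
    while t < t_max:
        if rng.random() < (p_b * y / g) * dt:  # jump occurs with prob ~ rate * dt
            y += g
        y += (1.0 - p_a - p_a * y) * dt        # deterministic drift step
        y = max(y, 0.0)
        t += dt
        samples.append(y)
    return np.array(samples)

ys = simulate_limit_process(p_a=0.6, p_b=0.4, g=0.5)
print(ys[len(ys) // 2:].mean())  # crude stationary-mean estimate, near (1-p_A)/pi = 2.0
```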

The method for proving Theorem 4 is based on the classical functional approach to central limit theorems for stochastic algorithms (see Bouton [2], Kushner [10], Duflo [6]). The long time behavior of the sequence $(Y_n)$ will be elucidated through the study of a sequence of continuous-time processes $Y^{(n)} = (Y_t^{(n)})_{t\ge0}$, which will be proved to converge weakly to the Markov process with generator $L$. We will show that this Markov process has a unique stationary distribution $\nu$, and that $\nu$ is the weak limit of the sequence $(Y_n)_{n\in\mathbb{N}}$.

The sequence $Y^{(n)}$ is defined as follows. Given $n\in\mathbb{N}$ and $t\ge0$, set
$$Y_t^{(n)} = Y_{N(n,t)}, \tag{19}$$
where
$$N(n,t) = \min\left\{m\ge n \,\Big|\, \sum_{k=n}^{m}\gamma_{k+1} > t\right\},$$
so that $N(n,t) = n$ for $t\in[0,\gamma_{n+1})$, and, for $m\ge n+1$, $N(n,t) = m$ if and only if $\sum_{k=n+1}^{m}\gamma_k \le t < \sum_{k=n+1}^{m+1}\gamma_k$.

Theorem 5 Under the assumptions of Theorem 4, the sequence of continuous-time processes $(Y^{(n)})_{n\in\mathbb{N}}$ converges weakly (in the sense of Skorokhod) to a Markov process with generator $L$.

The proof of Theorem 5 is done in two steps: in Section 3.1, we prove tightness; in Section 3.2, we characterize the limit by a martingale problem.

3.1 Tightness

It follows from (17) that the process $Y^{(n)}$ admits the following decomposition:
$$Y_t^{(n)} = Y_n + B_t^{(n)} + M_t^{(n)}, \tag{20}$$
with
$$B_t^{(n)} = -\sum_{k=n+1}^{N(n,t)}\gamma_k\left[\kappa(X_{k-1}) + (\pi_{k-1}X_{k-1}-\varepsilon_{k-1})Y_{k-1}\right]$$
and
$$M_t^{(n)} = -\sum_{k=n+1}^{N(n,t)}\frac{\gamma_k}{\rho_k}\Delta M_k.$$
