HAL Id: hal-00012187
https://hal.archives-ouvertes.fr/hal-00012187
Submitted on 18 Oct 2005
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A penalized bandit algorithm
Damien Lamberton, Gilles Pagès
To cite this version:
Damien Lamberton, Gilles Pagès. A penalized bandit algorithm. Electronic Journal of Probability, Institute of Mathematical Statistics (IMS), 2008, 13, 341-373; http://dx.doi.org/10.1214/EJP.v13-489. ⟨10.1214/EJP.v13-489⟩. ⟨hal-00012187⟩
A penalized bandit algorithm∗

Damien Lamberton†  Gilles Pagès‡

Abstract
We study a two-armed bandit algorithm with penalty. We show the convergence of the algorithm and establish the rate of convergence. For some choices of the parameters, we obtain a central limit theorem in which the limit distribution is characterized as the unique stationary distribution of a discontinuous Markov process.
Key words: Two-armed bandit algorithm, penalization, stochastic approximation, convergence rate, learning automata, asset allocation.

2001 AMS classification: 62L20; secondary 93C40, 91E40, 68T05, 91B32.
Introduction
In a recent joint work with P. Tarrès (see [12]), we studied the convergence of the so-called two-armed bandit algorithm. The purpose of the present paper is to investigate a modified version of this algorithm, in which a penalization is introduced. In the terminology of learning theory (see [14, 15]), the algorithm studied in [12] was a Linear Reward-Inaction (LRI) scheme, whereas the one we want to introduce is a Linear Reward-Penalty (LRP) procedure.
In our previous paper, the algorithm was introduced in a financial context, as a procedure for the optimal allocation of a fund between two traders who manage it. Imagine that the owner of a fund can share his wealth between two traders, say A and B, and that, every day, he can evaluate the results of one of the traders and, subsequently, modify the percentage of the fund managed by both traders. Denote by X_n the percentage managed by trader A at time n (X_n ∈ [0, 1]). We assume that the owner selects the trader to be evaluated at random, in such a way that the probability that A is evaluated at time n is X_n, in order to select preferably the trader in charge of the greater part of the fund. In the LRI scheme, if the evaluated trader performs well, its share is increased by a fraction
∗ This work has benefitted from the stay of both authors at the Isaac Newton Institute (Cambridge University) on the program Developments in Quantitative Finance.
† Laboratoire d'analyse et de mathématiques appliquées, UMR 8050, Univ. Marne-la-Vallée, Cité Descartes, 5, Bld Descartes, Champs-sur-Marne, F-77454 Marne-la-Vallée Cedex 2, France. damien.lamberton@univ-mlv.fr
‡ Laboratoire de probabilités et modèles aléatoires, UMR 7599, Univ. Paris 6, case 188, 4, pl. Jussieu, F-75252 Paris Cedex 5. gpa@ccr.jussieu.fr
γ_n ∈ (0, 1) of the share of the other trader, and nothing happens if the evaluated trader performs badly. Therefore, the dynamics of the sequence (X_n)_{n≥0} can be modelled as follows:

X_{n+1} = X_n + γ_{n+1} ( 1_{{U_{n+1}≤X_n}∩A_{n+1}} (1 − X_n) − 1_{{U_{n+1}>X_n}∩B_{n+1}} X_n ),  X_0 = x ∈ [0, 1],

where (U_n)_{n≥1} is an i.i.d. sequence of uniform random variables on the interval [0, 1], and A_n (resp. B_n) is the event "trader A (resp. trader B) performs well at time n". We assume

P(A_n) = p_A,  P(B_n) = p_B,  for n ≥ 1,

with p_A, p_B ∈ (0, 1), and independence between these events and the sequence (U_n)_{n≥1}. The point is that the owner of the fund does not know the parameters p_A, p_B.
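As an illustration (not part of the original paper), the LRI recursion above can be simulated in a few lines; the parameter values p_A = 0.6, p_B = 0.4, the starting point x_0 = 0.5 and the step choice γ_n = 1/(n + 10) are arbitrary assumptions made for the example.

```python
import random

def simulate_lri(p_a, p_b, x0, n_steps, gamma, seed=0):
    """Simulate X_{n+1} = X_n + gamma_{n+1} * (1_{A evaluated and wins}(1 - X_n)
    - 1_{B evaluated and wins} X_n): reward only, no penalty (LRI scheme)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        g = gamma(n)
        if rng.random() <= x:          # trader A is evaluated (prob. X_n)
            if rng.random() < p_a:     # A performs well: increase A's share
                x += g * (1 - x)
        else:                          # trader B is evaluated
            if rng.random() < p_b:     # B performs well: decrease A's share
                x -= g * x
    return x

x_final = simulate_lri(p_a=0.6, p_b=0.4, x0=0.5, n_steps=100_000,
                       gamma=lambda n: 1.0 / (n + 10))
print(x_final)  # typically close to 1 here, since p_A > p_B
```

Both update rules are convex combinations with points of [0, 1], so the simulated share always stays in [0, 1].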
This recursive learning procedure has been designed to assign the whole fund asymptotically to the best trader. This means that, if say p_A > p_B, X_n converges to 1 with probability 1 provided X_0 ∈ (0, 1) (if p_A < p_B, the limit is 0, with symmetric results). However, this "infallibility" property requires some very stringent assumptions on the reward parameter γ_n (see [12]). Furthermore, the rate of convergence of the procedure toward its "target" 1 or its "trap" 0 is not ruled by a CLT with rate √γ_n, as for standard stochastic approximation algorithms (see [10]). It is shown in [11] that this rate is quite non-standard, depends strongly on the (unknown) values of p_A and p_B, and becomes very poor as these probabilities get close to each other.
In order to improve the efficiency of the algorithm, one may imagine introducing a penalty when the evaluated trader has unsatisfactory performances. More precisely, if the trader evaluated at time n performs badly, its share is decreased by a penalty factor ρ_n γ_n. This leads to the following LRP – or "penalized two-armed bandit" – procedure:

X_{n+1} = X_n + γ_{n+1} ( 1_{{U_{n+1}≤X_n}∩A_{n+1}} (1 − X_n) − 1_{{U_{n+1}>X_n}∩B_{n+1}} X_n )
        − γ_{n+1} ρ_{n+1} ( X_n 1_{{U_{n+1}≤X_n}∩A^c_{n+1}} − (1 − X_n) 1_{{U_{n+1}>X_n}∩B^c_{n+1}} ),  n ∈ ℕ,

where the notation A^c is used for the complement of an event A. The precise assumptions on the reward rate γ_n and the penalty rate γ_n ρ_n will be given in the following sections.
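For comparison with the LRI scheme, the penalized recursion adds two penalty branches. The sketch below is illustrative, not taken from the paper: the choices p_A = 0.6, p_B = 0.4, γ_n = (1 + n)^{−0.6} and ρ_n = (1 + n)^{−0.3} are assumptions matching the polynomial steps discussed later (exponents 0 < r < a with a + r < 1).

```python
import random

def simulate_lrp(p_a, p_b, x0, n_steps, gamma, rho, seed=0):
    """Simulate the penalized (LRP) two-armed bandit recursion."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_steps + 1):
        g, r = gamma(n), rho(n)
        if rng.random() <= x:              # trader A is evaluated
            if rng.random() < p_a:
                x += g * (1 - x)           # reward: A's share grows
            else:
                x -= g * r * x             # penalty: A's share shrinks
        else:                              # trader B is evaluated
            if rng.random() < p_b:
                x -= g * x                 # reward for B: A's share shrinks
            else:
                x += g * r * (1 - x)       # penalty for B: A's share grows
    return x

x_final = simulate_lrp(0.6, 0.4, x0=0.5, n_steps=100_000,
                       gamma=lambda n: (1 + n) ** -0.6,
                       rho=lambda n: (1 + n) ** -0.3)
print(x_final)  # drifts toward 1, the share of the better trader
```

Since γ_n < 1 and γ_n ρ_n < 1 here, every update is again a convex move inside [0, 1].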
The paper is organized as follows. In Section 1, we discuss the convergence of the sequence (X_n)_{n≥0}. First we show that, if ρ_n is a positive constant ρ, the sequence converges with probability one to a limit x*_ρ ∈ (0, 1) satisfying x*_ρ > 1/2 if and only if p_A > p_B, so that, although the algorithm manages to distinguish which trader is better, it does not assign the whole fund to the best trader. To get rid of this limitation, we consider a sequence (ρ_n)_{n≥1} which goes to zero, so that the penalty rate becomes negligible with respect to the reward rate (γ_n ρ_n = o(γ_n)). This framework seems new in the learning theory literature.
Then, we are able to show that the algorithm is infallible, i.e., if p_A > p_B, then lim_{n→∞} X_n = 1 almost surely, under very light conditions on the reward rate γ_n (and ρ_n). From a stochastic approximation viewpoint, this modification of the original procedure has the same mean function and time scale (hence the same target and trap, see (5)), but it always keeps the algorithm away from the trap without adding noise at these equilibria. In fact, it was necessary not to add noise at these points in order to remain inside the domain [0, 1].
The other two sections are devoted to the rate of convergence. In Section 2, we show that, under some conditions (including lim_{n→∞} γ_n/ρ_n = 0), the sequence Y_n = (1 − X_n)/ρ_n converges in probability to (1 − p_A)/π, where π = p_A − p_B > 0. With additional assumptions, we prove that this convergence occurs with probability 1. In Section 3, we show that if the ratio γ_n/ρ_n goes to a positive limit as n goes to infinity, then (Y_n)_{n≥1} converges in a weak sense to a probability distribution ν. This distribution is identified as the unique stationary distribution of a discontinuous Markov process. This result is obtained by applying weak functional methods to a rescaling of the algorithm. This approach can be seen as an extension of the SDE method used to prove the CLT in a more standard stochastic approximation framework (see [10]). Furthermore, we show that ν is absolutely continuous, with a continuous, possibly non-smooth, piecewise C^∞ density. An interesting consequence of these results for practical applications is that, by choosing ρ_n and γ_n proportional to n^{−1/2}, one can achieve convergence at the rate 1/√n, without any a priori knowledge of the values of p_A and p_B. This is in contrast with the LRI procedure, whose rate of convergence depends heavily on these parameters (see [11]) and becomes quite poor when they are close to each other.
Notation. Let (a_n)_{n≥0} and (b_n)_{n≥0} be two sequences of positive real numbers. The symbol a_n ∼ b_n means a_n = b_n + o(b_n).
1 Convergence of the LRP algorithm
1.1 Some classical background on stochastic approximation
We will rely on the ODE lemma recalled below, for a stochastic procedure (Z_n) taking its values in a given compact interval I.
Theorem 1 (a) Kushner & Clark's ODE Lemma (see [9]): Let g : I → ℝ be such that Id + g leaves I stable (¹), and consider the recursively defined stochastic approximation procedure on I

Z_{n+1} = Z_n + γ_{n+1} ( g(Z_n) + ΔR_{n+1} ),  n ≥ 0,  Z_0 ∈ I,

where (γ_n)_{n≥1} is a sequence of [0, 1]-valued real numbers satisfying γ_n → 0 and ∑_{n≥1} γ_n = +∞. Set N(t) := min{ n : γ_1 + ··· + γ_{n+1} > t }. Assume that, for every T > 0,

max_{N(t)≤n≤N(t+T)} | ∑_{k=N(t)+1}^{n} γ_k ΔR_k | → 0  P-a.s. as t → +∞.  (1)

Let z* be an attracting zero of g in I and G(z*) its attracting interval. Then, on the event { Z_n visits infinitely often a compact subset of G(z*) }, Z_n → z* a.s.

(¹) Then, for every γ ∈ [0, 1], Id + γg = γ(Id + g) + (1 − γ)Id still takes values in the convex set I.
(b) The Hoeffding condition (see [1]): If (ΔR_n)_{n≥0} is a sequence of L^∞-bounded martingale increments, if (γ_n) is non-increasing and

∑_{n≥1} e^{−ϑ/γ_n} < +∞  for every ϑ > 0,

then Assumption (1) is satisfied.

Remark. The monotonicity assumption on the sequence (γ_n) can be relaxed into γ_n → 0 and sup_{n,k≥1} γ_{n+k}/γ_n < +∞.
1.2 Basic properties of the LRP algorithm
We first recall the definition of the algorithm. We are interested in the asymptotic behavior of the sequence (X_n)_{n∈ℕ}, where X_0 = x, with x ∈ (0, 1), and

X_{n+1} = X_n + γ_{n+1} ( 1_{{U_{n+1}≤X_n}∩A_{n+1}} (1 − X_n) − 1_{{U_{n+1}>X_n}∩B_{n+1}} X_n )
        − γ_{n+1} ρ_{n+1} ( X_n 1_{{U_{n+1}≤X_n}∩A^c_{n+1}} − (1 − X_n) 1_{{U_{n+1}>X_n}∩B^c_{n+1}} ),  n ∈ ℕ.

Throughout the paper, we assume that (γ_n)_{n≥1} is a non-increasing sequence of positive numbers satisfying γ_n < 1,

∑_{n=1}^{∞} γ_n = +∞  and  ∀ ϑ > 0, ∑_n e^{−ϑ/γ_n} < ∞,

and that (ρ_n)_{n≥1} is a sequence of positive numbers satisfying γ_n ρ_n < 1; (U_n)_{n≥1} is a sequence of independent random variables which are uniformly distributed on the interval [0, 1], and the events A_n, B_n satisfy

P(A_n) = p_A,  P(B_n) = p_B,  n ∈ ℕ,

where 0 < p_B ≤ p_A < 1, and the sequences (U_n)_{n≥1} and (1_{A_n}, 1_{B_n})_{n≥1} are independent. The natural filtration of the sequence (U_n, 1_{A_n}, 1_{B_n})_{n≥1} is denoted by (F_n)_{n≥0}, and we set π = p_A − p_B.
With this notation, we have, for n ≥ 0,

X_{n+1} = X_n + γ_{n+1} ( π h(X_n) + ρ_{n+1} κ(X_n) ) + γ_{n+1} ΔM_{n+1},  (2)

where the functions h and κ are defined by

h(x) = x(1 − x),  κ(x) = −(1 − p_A) x² + (1 − p_B)(1 − x)²,  0 ≤ x ≤ 1,

ΔM_{n+1} = M_{n+1} − M_n, and the sequence (M_n)_{n≥0} is the martingale defined by M_0 = 0 and

ΔM_{n+1} = 1_{{U_{n+1}≤X_n}∩A_{n+1}} (1 − X_n) − 1_{{U_{n+1}>X_n}∩B_{n+1}} X_n − π h(X_n)
         − ρ_{n+1} ( X_n 1_{{U_{n+1}≤X_n}∩A^c_{n+1}} − (1 − X_n) 1_{{U_{n+1}>X_n}∩B^c_{n+1}} + κ(X_n) ).  (3)

Observe that the increments ΔM_{n+1} are bounded.
1.3 Constant penalty rate

In this subsection, we assume

∀ n ≥ 1, ρ_n = ρ, with 0 < ρ ≤ 1.

We then have

X_{n+1} = X_n + γ_{n+1} ( h_ρ(X_n) + ΔM_{n+1} ),

where

h_ρ(x) = π h(x) + ρ κ(x),  0 ≤ x ≤ 1.

Note that h_ρ(0) = ρ(1 − p_B) > 0 and h_ρ(1) = −ρ(1 − p_A) < 0, and that there exists a unique x*_ρ ∈ (0, 1) such that h_ρ(x*_ρ) = 0. By a straightforward computation, we have

x*_ρ = ( π − 2ρ(1 − p_B) + √( π² + 4ρ²(1 − p_B)(1 − p_A) ) ) / ( 2π(1 − ρ) )  if π ≠ 0 and ρ ≠ 1,

x*_ρ = (1 − p_B) / ( (1 − p_A) + (1 − p_B) )  if π = 0 or ρ = 1.

In particular, x*_ρ = 1/2 if π = 0, regardless of the value of ρ. We also have h_ρ(1/2) = π(1 + ρ)/4 ≥ 0, so that

x*_ρ > 1/2 if π > 0.  (4)
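As a numerical sanity check (not in the paper), the closed-form root can be compared with a bisection on h_ρ, which is positive to the left of x*_ρ and negative to the right; the values p_A = 0.6, p_B = 0.4, ρ = 0.3 are arbitrary.

```python
def h_rho(x, p_a, p_b, rho):
    """h_rho(x) = pi*h(x) + rho*kappa(x), with h(x) = x(1 - x)."""
    pi = p_a - p_b
    kappa = -(1 - p_a) * x**2 + (1 - p_b) * (1 - x)**2
    return pi * x * (1 - x) + rho * kappa

def x_star(p_a, p_b, rho):
    """Closed-form zero of h_rho in (0, 1), case pi != 0 and rho != 1."""
    pi = p_a - p_b
    disc = pi**2 + 4 * rho**2 * (1 - p_b) * (1 - p_a)
    return (pi - 2 * rho * (1 - p_b) + disc**0.5) / (2 * pi * (1 - rho))

def bisect(f, lo, hi, tol=1e-12):
    """Locate the sign change of f on [lo, hi], assuming f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

p_a, p_b, rho = 0.6, 0.4, 0.3
root = bisect(lambda x: h_rho(x, p_a, p_b, rho), 0.0, 1.0)
print(root)  # agrees with x_star(0.6, 0.4, 0.3) and exceeds 1/2, since pi > 0
```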
Now, let x be a solution of the ODE dx/dt = h_ρ(x). If x(0) ∈ [0, x*_ρ], x is non-decreasing and lim_{t→∞} x(t) = x*_ρ. If x(0) ∈ [x*_ρ, 1], x is non-increasing and lim_{t→∞} x(t) = x*_ρ. It follows that the interval [0, 1] is a domain of attraction for x*_ρ. Consequently, using Kushner and Clark's ODE Lemma (see Theorem 1), one reaches the following conclusion.
Proposition 1 Assume that ρ_n = ρ ∈ (0, 1]. Then

X_n → x*_ρ a.s. as n → ∞.

The natural interpretation, given the above inequalities on x*_ρ, is that this algorithm never fails to point at the best trader, thanks to Inequality (4), but it never assigns the whole fund to this trader, as the original LRI procedure did.
1.4 Convergence when the penalty rate goes to zero

Proposition 2 Assume lim_{n→∞} ρ_n = 0. The sequence (X_n)_{n∈ℕ} is almost surely convergent and its limit X_∞ satisfies X_∞ ∈ {0, 1} with probability 1.

Proof: We first write the algorithm in its canonical form

X_{n+1} = X_n + γ_{n+1} ( π h(X_n) + ΔR_{n+1} ),  with ΔR_n = ΔM_n + ρ_n κ(X_{n−1}).  (5)

It is straightforward to check that the ODE ẋ = h(x) has two equilibrium points, 0 and 1; 1 is attractive, with (0, 1] as an attracting interval, and 0 is unstable.
Since the martingale increments ΔM_n are bounded, it follows from the assumptions on the sequence (γ_n)_{n≥1} and the Hoeffding condition (see Theorem 1(b)) that

max_{N(t)≤n≤N(t+T)} | ∑_{k=N(t)+1}^{n} γ_k ΔM_k | → 0  P-a.s. as t → +∞,

for every T > 0. On the other hand, the function κ being bounded on [0, 1] and ρ_n converging to 0, we have, for every T > 0,

max_{N(t)≤n≤N(t+T)} | ∑_{k=N(t)+1}^{n} γ_k ρ_k κ(X_{k−1}) | ≤ ‖κ‖_{[0,1]} ( T + γ_{N(t+T)} ) max_{k≥N(t)+1} ρ_k → 0 as t → +∞.

Finally, the sequence (ΔR_n)_{n≥1} satisfies Assumption (1). Consequently, either X_n visits infinitely often an interval [ε, 1] for some ε > 0, in which case X_n converges to 1, or X_n converges to 0. ♦
Remark 1 If π = 0, i.e. p_A = p_B, the algorithm reduces to

X_{n+1} = X_n + γ_{n+1} ρ_{n+1} (1 − p_A)(1 − 2X_n) + γ_{n+1} ΔM_{n+1}.

The number 1/2 is the unique equilibrium of the ODE ẋ = (1 − p_A)(1 − 2x), and the interval [0, 1] is a domain of attraction. Assuming ∑_{n=1}^{∞} ρ_n γ_n = +∞, and that the sequence (γ_n/ρ_n)_{n≥1} is non-increasing and satisfies

∀ ϑ > 0, ∑_{n=1}^{∞} exp( −ϑ/(ρ_n γ_n) ) < +∞,

it can be proved, using the Kushner–Clark ODE Lemma (Theorem 1), that lim_{n→∞} X_n = 1/2 almost surely. As concerns the asymptotics of the algorithm when π = 0 and γ_n = g ρ_n (for which the above condition is not satisfied), we refer to the final remark of the paper.
From now on, we will assume that p_A > p_B. The next proposition shows that the penalized algorithm is infallible under very light assumptions on γ_n and ρ_n.

Proposition 3 (Infallibility) Assume lim_{n→∞} ρ_n = 0. If the sequence (γ_n/ρ_n)_{n≥1} is bounded and ∑_n γ_n ρ_n = ∞, and if π > 0, we have lim_{n→∞} X_n = 1 almost surely.
Proof: Since h ≥ 0 on the interval [0, 1], we have from (2)

X_n ≥ X_0 + ∑_{j=1}^{n} γ_j ρ_j κ(X_{j−1}) + ∑_{j=1}^{n} γ_j ΔM_j,  n ≥ 1.

Since the jumps ΔM_j are bounded, we have

‖ ∑_{j=1}^{n} γ_j ΔM_j ‖²_{L²} ≤ C ∑_{j=1}^{n} γ_j² ≤ C sup_{j∈ℕ} (γ_j/ρ_j) ∑_{j=1}^{n} γ_j ρ_j,

for some positive constant C. Therefore, since ∑_n γ_n ρ_n = ∞,

L²-lim_{n→∞} ( ∑_{j=1}^{n} γ_j ΔM_j ) / ( ∑_{j=1}^{n} γ_j ρ_j ) = 0,

so that

lim sup_n ( ∑_{j=1}^{n} γ_j ΔM_j ) / ( ∑_{j=1}^{n} γ_j ρ_j ) ≥ 0 a.s.

Now, on the set {X_∞ = 0}, we have

lim_{n→∞} ( ∑_{j=1}^{n} γ_j ρ_j κ(X_{j−1}) ) / ( ∑_{j=1}^{n} γ_j ρ_j ) = κ(0) > 0.

Hence, it follows that, still on the set {X_∞ = 0},

lim sup_{n→∞} X_n / ∑_{j=1}^{n} γ_j ρ_j > 0.

Therefore, we must have P(X_∞ = 0) = 0. ♦
The following proposition gives a control on the conditional variance process of the martingale (M_n)_{n∈ℕ}, which will be crucial to elucidate the rate of convergence of the algorithm.

Proposition 4 We have, for n ≥ 0,

E( ΔM²_{n+1} | F_n ) ≤ p_A (1 − X_n) + ρ²_{n+1} (1 − p_B).
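The centering of ΔM in (3) and the bound of Proposition 4 can be verified exactly for a fixed value X_n = x by enumerating the four outcomes (which trader is evaluated, success or failure). This small check is an illustration, not part of the paper; the numeric values of p_A, p_B and ρ are arbitrary.

```python
def dm_moments(x, p_a, p_b, rho):
    """Exact conditional mean and second moment of Delta M_{n+1} given X_n = x."""
    h = x * (1 - x)
    kappa = -(1 - p_a) * x**2 + (1 - p_b) * (1 - x)**2
    drift = (p_a - p_b) * h + rho * kappa         # E[V_{n+1} + W_{n+1} | F_n]
    outcomes = [                                  # (probability, value of V + W)
        (x * p_a,             1 - x),             # A evaluated, performs well
        (x * (1 - p_a),       -rho * x),          # A evaluated, performs badly
        ((1 - x) * p_b,       -x),                # B evaluated, performs well
        ((1 - x) * (1 - p_b), rho * (1 - x)),     # B evaluated, performs badly
    ]
    mean = sum(p * (v - drift) for p, v in outcomes)
    second = sum(p * (v - drift) ** 2 for p, v in outcomes)
    return mean, second

for x in (0.2, 0.5, 0.9):
    m, s = dm_moments(x, 0.6, 0.4, 0.05)
    assert abs(m) < 1e-12                                # increments are centered
    assert s <= 0.6 * (1 - x) + 0.05**2 * 0.6 + 1e-12    # Proposition 4 bound
print("Proposition 4 bound verified")
```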
Proof: We have

ΔM_{n+1} = V_{n+1} − E(V_{n+1} | F_n) + W_{n+1} − E(W_{n+1} | F_n),

with

V_{n+1} = 1_{{U_{n+1}≤X_n}∩A_{n+1}} (1 − X_n) − 1_{{U_{n+1}>X_n}∩B_{n+1}} X_n

and

W_{n+1} = −ρ_{n+1} ( X_n 1_{{U_{n+1}≤X_n}∩A^c_{n+1}} − (1 − X_n) 1_{{U_{n+1}>X_n}∩B^c_{n+1}} ).

Note that V_{n+1} W_{n+1} = 0, so that

E( ΔM²_{n+1} | F_n ) = E( V²_{n+1} | F_n ) + E( W²_{n+1} | F_n ) − ( E( V_{n+1} + W_{n+1} | F_n ) )² ≤ E( V²_{n+1} | F_n ) + E( W²_{n+1} | F_n ).

Now, using p_B ≤ p_A and X_n ≤ 1,

E( V²_{n+1} | F_n ) = p_A X_n (1 − X_n)² + p_B (1 − X_n) X_n² ≤ p_A (1 − X_n)

and

E( W²_{n+1} | F_n ) = ρ²_{n+1} ( X_n³ (1 − p_A) + (1 − X_n)³ (1 − p_B) ) ≤ ρ²_{n+1} (1 − p_B).

This proves the Proposition. ♦
2 The rate of convergence: pointwise convergence

2.1 Convergence in probability

Theorem 2 Assume

lim_{n→∞} ρ_n = 0,  lim_{n→∞} γ_n/ρ_n = 0,  ∑_n ρ_n γ_n = ∞,  ρ_n − ρ_{n−1} = o(ρ_n γ_n).  (6)

Then, the sequence ((1 − X_n)/ρ_n)_{n≥1} converges to (1 − p_A)/π in probability.

Note that the assumptions of Theorem 2 are satisfied if γ_n = C/n^a and ρ_n = C′/n^r, with C, C′ > 0, 0 < r < a and a + r < 1. In fact, we will see that, for this choice of parameters, convergence holds with probability one (see Theorem 3).
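The normalization of Theorem 2 can be observed on a simulated path (an illustration, not from the paper; parameters p_A = 0.6, p_B = 0.4, γ_n = (1 + n)^{−0.6}, ρ_n = (1 + n)^{−0.3} are assumptions for which the limit (1 − p_A)/π equals 2).

```python
import random

def lrp_normalized(p_a, p_b, n_steps, seed=1):
    """Run the LRP recursion and return Y_n = (1 - X_n)/rho_n at the final step."""
    rng = random.Random(seed)
    gamma = lambda n: (1 + n) ** -0.6
    rho = lambda n: (1 + n) ** -0.3
    x = 0.5
    for n in range(1, n_steps + 1):
        g, r = gamma(n), rho(n)
        if rng.random() <= x:   # trader A evaluated: reward or penalize A
            x += g * (1 - x) if rng.random() < p_a else -g * r * x
        else:                   # trader B evaluated: reward or penalize B
            x += -g * x if rng.random() < p_b else g * r * (1 - x)
    return (1 - x) / rho(n_steps)

y_final = lrp_normalized(0.6, 0.4, 200_000)
print(y_final)  # fluctuates around (1 - p_A)/pi = 2
```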
Before proving Theorem 2, we introduce the notation Y_n = (1 − X_n)/ρ_n. We have, from (2),

1 − X_{n+1} = 1 − X_n − γ_{n+1} π X_n (1 − X_n) − ρ_{n+1} γ_{n+1} κ(X_n) − γ_{n+1} ΔM_{n+1},

so that

(1 − X_{n+1})/ρ_{n+1} = (1 − X_n)/ρ_{n+1} − (γ_{n+1}/ρ_{n+1}) π X_n (1 − X_n) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1}.

Hence

Y_{n+1} = Y_n + (1 − X_n) ( 1/ρ_{n+1} − 1/ρ_n ) − (γ_{n+1}/ρ_{n+1}) π X_n (1 − X_n) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1},

i.e.

Y_{n+1} = Y_n ( 1 + γ_{n+1} ε_n − π_n γ_{n+1} X_n ) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1},

where

ε_n = (ρ_n/γ_{n+1}) ( 1/ρ_{n+1} − 1/ρ_n )  and  π_n = (ρ_n/ρ_{n+1}) π.

It follows from the assumption ρ_n − ρ_{n−1} = o(ρ_n γ_n) that lim_{n→∞} ε_n = 0 and lim_{n→∞} π_n = π.
Lemma 1 Consider two positive numbers π_− and π_+ with 0 < π_− < π < π_+ < 1. Given l ∈ ℕ, let

ν_l = inf{ n ≥ l | π_n X_n − ε_n > π_+ or π_n X_n − ε_n < π_− }.

We have:

• lim_{l→∞} P(ν_l = ∞) = 1;

• for n ≥ l, if θ_n^+ = ∏_{k=l+1}^{n} (1 − π_+ γ_k) and θ_n^− = ∏_{k=l+1}^{n} (1 − π_− γ_k),

Y_{n∧ν_l}/θ_{n∧ν_l}^− ≤ Y_l − ∑_{k=l+1}^{n∧ν_l} (γ_k/θ_k^−) κ(X_{k−1}) − ∑_{k=l+1}^{n∧ν_l} ( γ_k/(ρ_k θ_k^−) ) ΔM_k  (7)

and

Y_{n∧ν_l}/θ_{n∧ν_l}^+ ≥ Y_l − ∑_{k=l+1}^{n∧ν_l} (γ_k/θ_k^+) κ(X_{k−1}) − ∑_{k=l+1}^{n∧ν_l} ( γ_k/(ρ_k θ_k^+) ) ΔM_k.  (8)

Moreover, with the notation ‖κ‖_∞ = sup_{0<x<1} |κ(x)|,

sup_{n≥l} E( Y_n 1_{{ν_l=∞}} ) ≤ E Y_l + ‖κ‖_∞/π_−.
Remark 2 Note that, as the proof will show, Lemma 1 remains valid if the condition lim_{n→∞} γ_n/ρ_n = 0 in (6) is replaced by the boundedness of the sequence (γ_n/ρ_n)_{n≥1}. In particular, the last statement, which implies the tightness of the sequence (Y_n)_{n≥1}, will be used in Section 3.
Proof: Since lim_{n→∞} (π_n X_n − ε_n) = π a.s., we clearly have lim_{l→∞} P(ν_l = ∞) = 1. On the other hand, for l ≤ n < ν_l, we have

Y_{n+1} ≤ Y_n (1 − γ_{n+1} π_−) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1}

and

Y_{n+1} ≥ Y_n (1 − γ_{n+1} π_+) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1},

so that, with the notation θ_n^+ = ∏_{k=l+1}^{n} (1 − π_+ γ_k) and θ_n^− = ∏_{k=l+1}^{n} (1 − π_− γ_k),

Y_{n+1}/θ_{n+1}^− ≤ Y_n/θ_n^− − (γ_{n+1}/θ_{n+1}^−) κ(X_n) − ( γ_{n+1}/(ρ_{n+1} θ_{n+1}^−) ) ΔM_{n+1}

and

Y_{n+1}/θ_{n+1}^+ ≥ Y_n/θ_n^+ − (γ_{n+1}/θ_{n+1}^+) κ(X_n) − ( γ_{n+1}/(ρ_{n+1} θ_{n+1}^+) ) ΔM_{n+1}.

By summing up these inequalities, we get (7) and (8).

By taking expectations in (7), we get

E( Y_{n∧ν_l}/θ_{n∧ν_l}^− ) ≤ E Y_l + ‖κ‖_∞ E( ∑_{k=l+1}^{n∧ν_l} γ_k/θ_k^− )
= E Y_l + (‖κ‖_∞/π_−) E( ∑_{k=l+1}^{n∧ν_l} ( 1/θ_k^− − 1/θ_{k−1}^− ) )
≤ E Y_l + (‖κ‖_∞/π_−)(1/θ_n^−).

We then have

E( Y_n 1_{{ν_l=∞}} ) = θ_n^− E( ( Y_{n∧ν_l}/θ_{n∧ν_l}^− ) 1_{{ν_l=∞}} ) ≤ θ_n^− E( Y_{n∧ν_l}/θ_{n∧ν_l}^− ) ≤ θ_n^− ( E Y_l + (‖κ‖_∞/π_−)(1/θ_n^−) ) ≤ E Y_l + ‖κ‖_∞/π_−. ♦
Lemma 2 Let (θ_n)_{n∈ℕ} be a sequence of positive numbers such that θ_n = ∏_{k=1}^{n} (1 − pγ_k) for some p ∈ (0, 1). The sequence

( θ_n ∑_{k=1}^{n} ( γ_k/(θ_k ρ_k) ) ΔM_k )_{n∈ℕ}

converges to 0 in probability.
Proof: It suffices to show convergence to 0 in probability of the associated conditional variances T_n, defined by

T_n = θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k²) ) E( ΔM_k² | F_{k−1} ).

We know from Proposition 4 that

E( ΔM_k² | F_{k−1} ) ≤ p_A (1 − X_{k−1}) + ρ_k² (1 − p_B) = p_A ρ_{k−1} Y_{k−1} + ρ_k² (1 − p_B).

Therefore, T_n ≤ p_A T_n^{(1)} + (1 − p_B) T_n^{(2)}, where

T_n^{(1)} = θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k²) ) ρ_{k−1} Y_{k−1}  and  T_n^{(2)} = θ_n² ∑_{k=1}^{n} γ_k²/θ_k².

We first prove that lim_{n→∞} T_n^{(2)} = 0. Note that, since pγ_k ≤ 1,

1/θ_k² − 1/θ_{k−1}² = ( 2pγ_k − p²γ_k² )/θ_k² ≥ pγ_k/θ_k².  (9)

Therefore,

T_n^{(2)} ≤ (θ_n²/p) ∑_{k=1}^{n} γ_k ( 1/θ_k² − 1/θ_{k−1}² ),

and lim_{n→∞} T_n^{(2)} = 0 follows from Cesàro's lemma.

We now deal with T_n^{(1)}. First note that the assumption ρ_n − ρ_{n−1} = o(ρ_n γ_n) implies lim_{n→∞} ρ_n/ρ_{n−1} = 1, so that, the sequence (γ_n)_{n≥1} being non-increasing with limit 0, we only need to prove that lim_{n→∞} T̄_n^{(1)} = 0 in probability, where

T̄_n^{(1)} = θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k) ) Y_k.

Now, with the notation of Lemma 1, we have, for n ≥ l > 1 and ε > 0,

P( T̄_n^{(1)} ≥ ε ) ≤ P(ν_l < ∞) + P( θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k) ) Y_k 1_{{ν_l=∞}} ≥ ε )
≤ P(ν_l < ∞) + (1/ε) θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k) ) E( Y_k 1_{{ν_l=∞}} ).

Using Lemma 1, lim_{n→∞} γ_n/ρ_n = 0 and (9), we have

lim_{n→∞} θ_n² ∑_{k=1}^{n} ( γ_k²/(θ_k² ρ_k) ) E( Y_k 1_{{ν_l=∞}} ) = 0.

We also know that lim_{l→∞} P(ν_l < ∞) = 0. Hence, lim_{n→∞} P( T̄_n^{(1)} ≥ ε ) = 0. ♦
Proof of Theorem 2: First note that if θ_n = ∏_{k=1}^{n} (1 − pγ_k), with 0 < p < 1, we have

∑_{k=1}^{n} (γ_k/θ_k) κ(X_{k−1}) = (1/p) ∑_{k=1}^{n} ( 1/θ_k − 1/θ_{k−1} ) κ(X_{k−1}).

Hence, using lim_{n→∞} X_n = 1 and κ(1) = −(1 − p_A),

lim_{n→∞} θ_n ∑_{k=1}^{n} (γ_k/θ_k) κ(X_{k−1}) = −(1 − p_A)/p.

Going back to (7) and (8), and using Lemma 2 with p = π_+ and p = π_−, together with the fact that lim_{l→∞} P(ν_l = ∞) = 1, we have, for all ε > 0,

lim_{n→∞} P( Y_n ≥ (1 − p_A)/π_− + ε ) = lim_{n→∞} P( Y_n ≤ (1 − p_A)/π_+ − ε ) = 0,

and since π_+ and π_− can be made arbitrarily close to π, the Theorem is proved. ♦
2.2 Almost sure convergence

Theorem 3 In addition to (6), we assume that, for all β ∈ [0, 1],

γ_n ρ_n^β − γ_{n−1} ρ_{n−1}^β = o( γ_n² ρ_n^β ),  (10)

and that, for some η > 0, we have

∀ C > 0, ∑_n exp( −C ρ_n^{1+η}/γ_n ) < ∞.  (11)

Then, with probability 1,

lim_{n→∞} (1 − X_n)/ρ_n = (1 − p_A)/π.

Note that the assumptions of Theorem 3 are satisfied if γ_n = C n^{−a} and ρ_n = C′ n^{−r}, with C, C′ > 0, 0 < r < a and a + r < 1.
The proof of Theorem 3 is based on the following lemma, which will be proved later.

Lemma 3 Under the assumptions of Theorem 3, let α ∈ [0, 1] and let (θ_n)_{n≥1} be a sequence of positive numbers such that θ_n = ∏_{k=1}^{n} (1 − pγ_k), for some p ∈ (0, 1). On the set { sup_n (ρ_n^α Y_n) < ∞ }, we have

lim_{n→∞} ρ_n^{(α−η)/2 − 1} θ_n ∑_{k=1}^{n} (γ_k/θ_k) ΔM_k = 0 a.s.,

where η satisfies (11).
Proof of Theorem 3: We start from the following form of (2):

1 − X_{n+1} = (1 − X_n)(1 − γ_{n+1} π X_n) − ρ_{n+1} γ_{n+1} κ(X_n) − γ_{n+1} ΔM_{n+1}.

We know that lim_{n→∞} X_n = 1 a.s. Therefore, given π_+ and π_−, with 0 < π_− < π < π_+ < 1, there exists l ∈ ℕ such that, for n ≥ l,

1 − X_{n+1} ≤ (1 − X_n)(1 − γ_{n+1} π_−) − ρ_{n+1} γ_{n+1} κ(X_n) − γ_{n+1} ΔM_{n+1}

and

1 − X_{n+1} ≥ (1 − X_n)(1 − γ_{n+1} π_+) − ρ_{n+1} γ_{n+1} κ(X_n) − γ_{n+1} ΔM_{n+1},

so that, with the notation θ_n^+ = ∏_{k=l+1}^{n} (1 − π_+ γ_k) and θ_n^− = ∏_{k=l+1}^{n} (1 − π_− γ_k),

(1 − X_{n+1})/θ_{n+1}^− ≤ (1 − X_n)/θ_n^− − ρ_{n+1} γ_{n+1} κ(X_n)/θ_{n+1}^− − γ_{n+1} ΔM_{n+1}/θ_{n+1}^−

and

(1 − X_{n+1})/θ_{n+1}^+ ≥ (1 − X_n)/θ_n^+ − ρ_{n+1} γ_{n+1} κ(X_n)/θ_{n+1}^+ − γ_{n+1} ΔM_{n+1}/θ_{n+1}^+.

By summing up these inequalities, we get, for n ≥ l + 1,

(1 − X_n)/θ_n^− ≤ (1 − X_l) − ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^− − ∑_{k=l+1}^{n} γ_k ΔM_k/θ_k^−

and

(1 − X_n)/θ_n^+ ≥ (1 − X_l) − ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^+ − ∑_{k=l+1}^{n} γ_k ΔM_k/θ_k^+.

Hence

Y_n ≤ (θ_n^−/ρ_n)(1 − X_l) − (θ_n^−/ρ_n) ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^− − (θ_n^−/ρ_n) ∑_{k=l+1}^{n} γ_k ΔM_k/θ_k^−  (12)

and

Y_n ≥ (θ_n^+/ρ_n)(1 − X_l) − (θ_n^+/ρ_n) ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^+ − (θ_n^+/ρ_n) ∑_{k=l+1}^{n} γ_k ΔM_k/θ_k^+.  (13)

We have, with probability 1, lim_{n→∞} κ(X_n) = κ(1) = −(1 − p_A), and, since ∑_{n=1}^{∞} ρ_n γ_n = +∞,

∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^− ∼ −(1 − p_A) ∑_{k=l+1}^{n} ρ_k γ_k/θ_k^−.  (14)

On the other hand,

∑_{k=l+1}^{n} ρ_k γ_k/θ_k^− = (1/π_−) ∑_{k=l+1}^{n} ρ_k ( 1/θ_k^− − 1/θ_{k−1}^− )
= (1/π_−) ( ∑_{k=l+1}^{n} (ρ_{k−1} − ρ_k)/θ_{k−1}^− + ρ_n/θ_n^− − ρ_l/θ_l^− ) ∼ (1/π_−) ρ_n/θ_n^−,  (15)

where we have used the condition ρ_k − ρ_{k−1} = o(ρ_k γ_k). We deduce from (14) and (15) that

lim_{n→∞} (θ_n^−/ρ_n) ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^− = −(1 − p_A)/π_−

and, also, that lim_{n→∞} θ_n^−/ρ_n = 0. By a similar argument, we get lim_{n→∞} θ_n^+/ρ_n = 0 and

lim_{n→∞} (θ_n^+/ρ_n) ∑_{k=l+1}^{n} ρ_k γ_k κ(X_{k−1})/θ_k^+ = −(1 − p_A)/π_+.
It follows from Lemma 3 that, given α ∈ [0, 1], we have, on the set E_α := { sup_n (ρ_n^α Y_n) < ∞ },

lim_{n→∞} ρ_n^{(α−η)/2 − 1} θ_n^± ∑_{k=1}^{n} γ_k ΔM_k/θ_k^± = 0.

Together with (12) and (13), this implies:

• lim_{n→∞} Y_n = (1 − p_A)/π a.s., if (α − η)/2 ≤ 0;

• lim_{n→∞} Y_n ρ_n^{(α−η)/2} = 0 a.s., if (α − η)/2 > 0.

We obviously have P(E_α) = 1 for α = 1. We deduce from the previous argument that if P(E_α) = 1 and (α − η)/2 > 0, then P(E_{α′}) = 1, with α′ = (α − η)/2. Set α_0 = 1 and α_{k+1} = (α_k − η)/2. If α_0 ≤ η, we have lim_{n→∞} Y_n = (1 − p_A)/π a.s. on E_{α_0}. If α_0 > η, let j be the largest integer such that α_j > η (note that j exists because lim_{k→∞} α_k < 0). We have P(E_{α_{j+1}}) = 1, and, on E_{α_{j+1}}, lim_{n→∞} Y_n = (1 − p_A)/π a.s., because α_{j+1} ≤ η, i.e. (α_{j+1} − η)/2 ≤ 0. ♦
We now turn to the proof of Lemma 3, which is based on the following classical martingale inequality (see [13], Remark 1, p. 14, for a proof in the case of i.i.d. random variables; the extension to bounded martingale increments is straightforward).

Lemma 4 (Bernstein's inequality for bounded martingale increments) Let (Z_i)_{1≤i≤n} be a finite sequence of square integrable random variables, adapted to the filtration (F_i)_{1≤i≤n}, such that

1. E( Z_i | F_{i−1} ) = 0, i = 1, ..., n,
2. E( Z_i² | F_{i−1} ) ≤ σ_i², i = 1, ..., n,
3. |Z_i| ≤ Δ_n, i = 1, ..., n,

where σ_1², ..., σ_n², Δ_n are deterministic positive constants. Then, the following inequality holds:

∀ λ > 0, P( | ∑_{i=1}^{n} Z_i | ≥ λ ) ≤ 2 exp( − λ² / ( 2b_n² + λΔ_n/3 ) ),

with b_n² = ∑_{i=1}^{n} σ_i².
We will also need the following technical result.

Lemma 5 Let (θ_n)_{n≥1} be a sequence of positive numbers such that θ_n = ∏_{k=1}^{n} (1 − pγ_k), for some p ∈ (0, 1), and let (ξ_n)_{n≥1} be a sequence of non-negative numbers satisfying

γ_n ξ_n − γ_{n−1} ξ_{n−1} = o( γ_n² ξ_n ).

We have

∑_{k=1}^{n} γ_k² ξ_k/θ_k² ∼ γ_n ξ_n/( 2p θ_n² ).
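Lemma 5 is easy to test numerically (an illustrative experiment, with the arbitrary choices γ_k = k^{−0.6}, ξ_k = k^{−0.2} and p = 0.5): the ratio of the two sides approaches 1 as n grows.

```python
def lemma5_ratio(n, p=0.5, a=0.6, b=0.2):
    """Ratio of sum_{k<=n} gamma_k^2 xi_k / theta_k^2
    to gamma_n xi_n / (2 p theta_n^2), for gamma_k = k^-a, xi_k = k^-b."""
    theta, total = 1.0, 0.0
    for k in range(1, n + 1):
        gamma, xi = k ** -a, k ** -b
        theta *= 1.0 - p * gamma            # theta_k = prod (1 - p gamma_j)
        total += gamma**2 * xi / theta**2   # left-hand side of the equivalence
    rhs = (n ** -a) * (n ** -b) / (2 * p * theta**2)
    return total / rhs

print(lemma5_ratio(500), lemma5_ratio(5000))  # the ratio drifts toward 1
```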
Proof: First observe that the condition γ_n ξ_n − γ_{n−1} ξ_{n−1} = o(γ_n² ξ_n) implies γ_n ξ_n ∼ γ_{n−1} ξ_{n−1}, and that, given ε > 0, we have, for n large enough,

γ_n ξ_n − γ_{n−1} ξ_{n−1} ≥ −ε γ_n² ξ_n ≥ −ε γ_{n−1} γ_n ξ_n,

where we have used the fact that the sequence (γ_n) is non-increasing. Since γ_n ξ_n ∼ γ_{n−1} ξ_{n−1}, we have, for n large enough, say n ≥ n_0,

γ_n ξ_n ≥ γ_{n−1} ξ_{n−1} ( 1 − 2ε γ_{n−1} ).

Therefore, for n > n_0,

γ_n ξ_n ≥ γ_{n_0} ξ_{n_0} ∏_{k=n_0+1}^{n} ( 1 − 2ε γ_{k−1} ).

From this, we easily deduce that lim_{n→∞} γ_n ξ_n/θ_n = ∞ and that ∑_n γ_n² ξ_n/θ_n² = ∞. Now, from

1/θ_k² − 1/θ_{k−1}² = ( 2γ_k p − γ_k² p² )/θ_k²,

we deduce (recall that lim_{n→∞} γ_n = 0)

γ_k² ξ_k/θ_k² ∼ ( γ_k ξ_k/(2p) ) ( 1/θ_k² − 1/θ_{k−1}² ),

and, since ∑_n γ_n² ξ_n/θ_n² = ∞,

∑_{k=1}^{n} γ_k² ξ_k/θ_k² ∼ (1/(2p)) ∑_{k=1}^{n} γ_k ξ_k ( 1/θ_k² − 1/θ_{k−1}² )
= (1/(2p)) ( γ_n ξ_n/θ_n² + ∑_{k=1}^{n} ( γ_{k−1} ξ_{k−1} − γ_k ξ_k )/θ_{k−1}² )
= γ_n ξ_n/( 2p θ_n² ) + o( ∑_{k=1}^{n} γ_k² ξ_k/θ_k² ),

where, for the first equality, we have assumed ξ_0 = 0, and, for the last one, we have used again γ_n ξ_n − γ_{n−1} ξ_{n−1} = o(γ_n² ξ_n). ♦
Proof of Lemma 3: Given µ > 0, let

ν_µ = inf{ n ≥ 0 | ρ_n^α Y_n > µ }.

Note that { sup_n ρ_n^α Y_n < ∞ } = ⋃_{µ>0} { ν_µ = ∞ }. On the set { ν_µ = ∞ }, we have

∑_{k=1}^{n} (γ_k/θ_k) ΔM_k = ∑_{k=1}^{n} (γ_k/θ_k) 1_{{k≤ν_µ}} ΔM_k.

We now apply Lemma 4 with Z_i = (γ_i/θ_i) 1_{{i≤ν_µ}} ΔM_i. We have, using Proposition 4,

E( Z_i² | F_{i−1} ) = (γ_i²/θ_i²) 1_{{i≤ν_µ}} E( ΔM_i² | F_{i−1} )
≤ (γ_i²/θ_i²) 1_{{i≤ν_µ}} ( p_A ρ_{i−1} Y_{i−1} + ρ_i² (1 − p_B) )
≤ (γ_i²/θ_i²) ( p_A ρ_{i−1}^{1−α} µ + ρ_i² (1 − p_B) ),

where we have used the fact that, on { i ≤ ν_µ }, ρ_{i−1}^α Y_{i−1} ≤ µ. Since lim_{n→∞} ρ_n = 0 and lim_{n→∞} ρ_n/ρ_{n−1} = 1 (which follows from ρ_n − ρ_{n−1} = o(γ_n ρ_n)), we have

E( Z_i² | F_{i−1} ) ≤ σ_i²,  with σ_i² = C_µ γ_i² ρ_i^{1−α}/θ_i²,

for some C_µ > 0, depending only on µ. Using Lemma 5, we have

∑_{i=1}^{n} σ_i² ∼ C_µ γ_n ρ_n^{1−α}/( 2p θ_n² ).

On the other hand, because the jumps ΔM_i are bounded, we have

|Z_i| ≤ C γ_i/θ_i,

for some C > 0. Note that (γ_k/θ_k)/(γ_{k−1}/θ_{k−1}) = γ_k/( γ_{k−1}(1 − pγ_k) ), and, since γ_k − γ_{k−1} = o(γ_k²) (take β = 0 in (10)), we have, for k large enough, γ_k − γ_{k−1} ≥ −p γ_k γ_{k−1}, so that γ_k/γ_{k−1} ≥ 1 − pγ_k, and the sequence (γ_n/θ_n) is non-decreasing for n large enough. Therefore, we have

sup_{1≤i≤n} |Z_i| ≤ Δ_n,

with Δ_n = C γ_n/θ_n for some C > 0. Now, applying Lemma 4 with λ = λ_0 ρ_n^{1−(α−η)/2}/θ_n, we get

P( | θ_n ∑_{k=1}^{n} (γ_k/θ_k) 1_{{k≤ν_µ}} ΔM_k | ≥ λ_0 ρ_n^{1−(α−η)/2} )
≤ 2 exp( − λ_0² ρ_n^{2−α+η} / ( 2θ_n² b_n² + (λ_0/3) θ_n Δ_n ρ_n^{1−(α−η)/2} ) )
≤ 2 exp( − C_1 ρ_n^{2−α+η} / ( C_2 γ_n ρ_n^{1−α} + C_3 γ_n ρ_n^{1−(α−η)/2} ) )
≤ 2 exp( − C_4 ρ_n^{1+η}/γ_n ),

where the positive constants C_1, C_2, C_3 and C_4 depend on λ_0 and µ, but not on n. Using (11) and the Borel–Cantelli lemma, we conclude that, on { ν_µ = ∞ }, we have, for n large enough,

| θ_n ∑_{k=1}^{n} (γ_k/θ_k) ΔM_k | < λ_0 ρ_n^{1−(α−η)/2} a.s.,

and, since λ_0 is arbitrary, this completes the proof of the Lemma. ♦
3 Weak convergence of the normalized algorithm

Throughout this section, we assume (in addition to the initial conditions on the sequence (γ_n)_{n∈ℕ})

γ_n² − γ²_{n−1} = o(γ_n³)  and  γ_n/ρ_n = g + o(γ_n),  (16)

where g is a positive constant. Note that a possible choice is γ_n = ag/√n and ρ_n = a/√n, with a > 0.

Under these conditions, we have ρ_n − ρ_{n−1} = o(γ_n²), and we can write, as at the beginning of Section 2,

Y_{n+1} = Y_n ( 1 + γ_{n+1} ε_n − π_n γ_{n+1} X_n ) − γ_{n+1} κ(X_n) − (γ_{n+1}/ρ_{n+1}) ΔM_{n+1},  (17)

where lim_{n→∞} ε_n = 0 and lim_{n→∞} π_n = π. As observed in Remark 2, we know that, under the assumptions (16), the sequence (Y_n)_{n≥1} is tight. We will prove that it is convergent in distribution.
Theorem 4 Under conditions (16), the sequence (Y_n = (1 − X_n)/ρ_n)_{n∈ℕ} converges weakly to the unique stationary distribution of the Markov process on [0, +∞) with generator L defined by

Lf(y) = p_B y ( f(y + g) − f(y) )/g + ( 1 − p_A − p_A y ) f′(y),  y ≥ 0,  (18)

for f continuously differentiable and compactly supported in [0, +∞).
The method for proving Theorem 4 is based on the classical functional approach to central limit theorems for stochastic algorithms (see Bouton [2], Kushner [10], Duflo [6]). The long-time behavior of the sequence (Y_n) will be elucidated through the study of a sequence of continuous-time processes Y^{(n)} = (Y_t^{(n)})_{t≥0}, which will be proved to converge weakly to the Markov process with generator L. We will show that this process has a unique stationary distribution ν, and that ν is the weak limit of the sequence (Y_n)_{n∈ℕ}.
The sequence Y^{(n)} is defined as follows. Given n ∈ ℕ and t ≥ 0, set

Y_t^{(n)} = Y_{N(n,t)},  (19)

where

N(n, t) = min{ m ≥ n | ∑_{k=n}^{m} γ_{k+1} > t },

so that N(n, t) = n for t ∈ [0, γ_{n+1}), and, for m ≥ n + 1, N(n, t) = m if and only if ∑_{k=n+1}^{m} γ_k ≤ t < ∑_{k=n+1}^{m+1} γ_k.

Theorem 5 Under the assumptions of Theorem 4, the sequence of continuous-time processes (Y^{(n)})_{n∈ℕ} converges weakly (in the sense of Skorokhod) to a Markov process with generator L.

The proof of Theorem 5 is done in two steps: in Section 3.1, we prove tightness; in Section 3.2, we characterize the limit by a martingale problem.
3.1 Tightness

It follows from (17) that the process Y^{(n)} admits the following decomposition:

Y_t^{(n)} = Y_n + B_t^{(n)} + M_t^{(n)},  (20)

with

B_t^{(n)} = − ∑_{k=n+1}^{N(n,t)} γ_k [ κ(X_{k−1}) + ( π_{k−1} X_{k−1} − ε_{k−1} ) Y_{k−1} ]

and

M_t^{(n)} = − ∑_{k=n+1}^{N(n,t)}