
On the convergence of moments in the almost sure central limit theorem for stochastic approximation algorithms


DOI:10.1051/ps/2011155 www.esaim-ps.org

ON THE CONVERGENCE OF MOMENTS IN THE ALMOST SURE CENTRAL LIMIT THEOREM FOR STOCHASTIC APPROXIMATION ALGORITHMS

Peggy Cénac

Abstract. We study the almost sure asymptotic behaviour of stochastic approximation algorithms used to search for a zero of a real function. The quadratic strong law of large numbers is extended to powers greater than one. In other words, the convergence of moments in the almost sure central limit theorem (ASCLT) is established. As a by-product of this convergence, one gets another proof of the ASCLT for stochastic approximation algorithms. The convergence result is applied to several examples, such as the estimation of quantiles and the recursive estimation of the mean.

Mathematics Subject Classification. 60F05, 62L20, 60G42.

Received October 1st, 2010.

1. Introduction

Let $(X_n)$ be a sequence of independent and identically distributed (i.i.d.) random variables with $\mathbb{E}[X_n] = 0$ and $\mathbb{E}[X_n^2] = \sigma^2$. For the empirical measures

\[ G_n \overset{\mathrm{def}}{=} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k}\, \delta_{S_k/\sqrt{k}} \quad \text{with} \quad S_n \overset{\mathrm{def}}{=} \sum_{k=1}^{n} X_k, \]

the almost sure central limit theorem (ASCLT) states that, with probability one, $G_n \Rightarrow G$, where $G$ stands for the $\mathcal{N}(0, \sigma^2)$ distribution and $\Rightarrow$ denotes convergence in distribution. It was simultaneously established by Brosamler [5] and Schatte [29], and in the present form by Lacey and Philipp [16]. Moreover, under assumptions on the moments of $(X_n)$, the convergence of moments in the ASCLT has been established recently by Bercu [2] and Bercu and Fort [3]. That is to say, the following convergences hold:

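\[ \lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} \left( \frac{S_k}{\sqrt{k}} \right)^{2p} = \frac{\sigma^{2p}(2p)!}{2^p\, p!} \quad \text{a.s.}, \tag{1.1} \]

\[ \lim_{n\to\infty} \frac{1}{\log n} \sum_{k=1}^{n} \frac{1}{k} \left( \frac{S_k}{\sqrt{k}} \right)^{2p-1} = 0 \quad \text{a.s.} \tag{1.2} \]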
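The logarithmic averages in (1.1) and (1.2) can be explored numerically. The following is a minimal sketch, not taken from the paper: the function name `asclt_log_moment`, the Gaussian sample and the sample size are assumptions chosen for illustration.

```python
import math
import random


def asclt_log_moment(xs, power):
    """Return (1/log n) * sum_{k=1}^n (1/k) * (S_k / sqrt(k))**power.

    Hypothetical helper: S_k is the partial sum of the first k entries of xs.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations so that log n > 0")
    partial_sum = 0.0
    total = 0.0
    for k, x in enumerate(xs, start=1):
        partial_sum += x
        total += (partial_sum / math.sqrt(k)) ** power / k
    return total / math.log(n)


# Illustrative run with standard Gaussian increments (sigma^2 = 1).
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
second = asclt_log_moment(sample, 2)  # (1.1) with p = 1 has limit sigma^2 = 1
third = asclt_log_moment(sample, 3)   # (1.2) with p = 2 has limit 0
```

Because the normalization is only logarithmic, such averages approach their limits slowly; a single run mainly illustrates how the statistic is built rather than confirming the limit.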

In fact, this result is a simplified version for i.i.d. random variables; it extends to the martingale setting. The case of multidimensional martingales is considered in Bercu et al. [4]. Despite the availability of a wide literature concerning the ASCLT for independent variables, very few references can be found on the ASCLT for martingales apart from the important contributions of Chaâbane [6,7], Chaâbane and Maâouia [8], Chaâbane et al. [9] and Lifshits [22,23]. As a by-product of the convergence of moments, Bercu and Fort [3] propose a proof of the ASCLT for unidimensional martingale transforms via an original approach which uses the Carleman moment theorem. In this paper, we prove that stochastic approximation algorithms used for the search of zeros, which are known to satisfy an ASCLT (see Pelletier [27]), also fulfill the convergence of moments in the ASCLT.

Keywords and phrases. Stochastic approximation algorithms, almost sure central limit theorem, martingale transforms, moments.

1 Institut de Mathématiques de Bourgogne, IMB UMR 5584 CNRS, 9 rue Alain Savary, BP 47870, 21078 Dijon Cedex, France.

Article published by EDP Sciences. © EDP Sciences, SMAI 2013

More precisely, we consider a stochastic algorithm of the form

\[ Z_{n+1} = Z_n + \gamma_n \left[ h(Z_n) + R_{n+1} \right] + \sigma_n \varepsilon_{n+1}, \tag{1.3} \]

where the function $h$ is defined on $\mathbb{R}$ and $\mathbb{R}$-valued. The two sequences $(R_n)$ and $(\varepsilon_n)$ are two real disturbances, defined on a probability space and adapted to a filtration $\mathbb{F} \overset{\mathrm{def}}{=} (\mathcal{F}_n)_{n \ge 0}$. The stepsizes $(\gamma_n)$ and $(\sigma_n)$ are two deterministic positive sequences going to zero. This model includes the algorithms of Robbins–Monro and Kiefer–Wolfowitz as well as algorithms with Markovian disturbances. Stochastic approximation algorithms such as (1.3), used for the search of zeros of $h$, have been widely studied under various assumptions (see Duflo [10] for an overview of these algorithms). In the context of Brownian diffusion processes, Lamberton and Pagès [17,18] establish the convergence of weighted empirical measures of the form of $G_n$ with a view to approximating the diffusion's invariant distribution, even in the case when the diffusion may have several invariant distributions. As a corollary of their main convergence result, they get the almost sure central limit theorem in the context of Brownian diffusion processes.
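The recursion (1.3) translates directly into a loop. The sketch below is an illustration under stated assumptions: the function `run_algorithm`, its argument names, and the indexing of the step sizes from $n = 1$ are hypothetical choices, not the paper's notation.

```python
import random


def run_algorithm(h, z0, gamma, sigma, n_steps, noise, remainder=lambda n, z: 0.0):
    """Iterate Z_{n+1} = Z_n + gamma_n [h(Z_n) + R_{n+1}] + sigma_n eps_{n+1}.

    gamma and sigma map the step index n >= 1 to a positive step size,
    noise() draws eps_{n+1}, and remainder(n, z) plays the role of R_{n+1}.
    """
    z = z0
    path = [z]
    for n in range(1, n_steps + 1):
        z = z + gamma(n) * (h(z) + remainder(n, z)) + sigma(n) * noise()
        path.append(z)
    return path


# Robbins-Monro special case: sigma_n = gamma_n and R_n = 0.
rng = random.Random(1)
step = lambda n: 1.0 / n ** 0.75
path = run_algorithm(
    h=lambda z: -(z - 1.0),          # zero at z* = 1 with slope H = -1 < 0
    z0=0.0,
    gamma=step,
    sigma=step,
    n_steps=5_000,
    noise=lambda: rng.gauss(0.0, 1.0),
)
```

Keeping `gamma` and `sigma` as separate callables mirrors the two step-size sequences of (1.3), so that Kiefer–Wolfowitz-type gains can be plugged in without changing the loop.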

Let $z^*$ be a zero of $h$. Many criteria ensure the almost sure convergence of $(Z_n)$ to $z^*$. In the wide literature concerning this convergence, let us refer to Benveniste et al. [1], Duflo [10], Dupuis and Kushner [11], Hall and Heyde [13], Kushner and Clark [15], Ljung et al. [24]. Moreover, in the case when $(Z_n)$ converges almost surely to $z^*$, the weak convergence rate is given by

\[ \sqrt{\frac{\gamma_n}{\sigma_n^2}}\, (Z_n - z^*) \Longrightarrow \mathcal{N}(0, \Sigma^2), \tag{1.4} \]

where $\Sigma^2$ is a positive real number related to the second moment of $(\varepsilon_n)$ and to the differential of the function $h$ at the point $z^*$ (see among many others Benveniste et al. [1], Duflo [10], Kushner and Clark [15], Ljung et al. [24], Zhu [30]).

Let us define $v_n \overset{\mathrm{def}}{=} \gamma_n \sigma_n^{-2}$. The main goal of this paper is to establish an analogue of (1.1) and (1.2) in the context of stochastic approximation algorithms. More precisely, under appropriate assumptions on the moments of the noise $(\varepsilon_n)$, we prove the following result: for $p \ge 1$,

\[ \lim_{n\to\infty} \frac{1}{\sum_{k=1}^{n} \gamma_k} \sum_{k=1}^{n} \gamma_k \left[ \sqrt{v_k}\,(Z_k - z^*) \right]^p = g(p) \quad \text{a.s.}, \tag{1.5} \]

where $g(p)$ is the moment of order $p$ of the Gaussian distribution $\mathcal{N}(0, \Sigma^2)$ and $z^*$ denotes the zero of $h$. The ASCLT for stochastic approximation algorithms can be viewed as a corollary of this result, thanks to Carleman's theorem. Thus, in the scalar case, we get another proof of the ASCLT established by Pelletier [27]. Moreover, in the particular case $p = 2$, the convergence (1.5) is the quadratic strong law of large numbers established by Le Breton [19], Le Breton and Novikov [20,21], Pelletier [26]. The main contribution of this paper is the extension (1.5) of the strong law to any power $p$.
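The left-hand side of (1.5) is a weighted empirical moment of the rescaled errors. As a minimal sketch (the function name `weighted_moment` and the list-based interface are assumptions made for illustration), it can be computed from a trajectory as follows.

```python
import math


def weighted_moment(gammas, vs, zs, z_star, p):
    """Return (1/s_n) * sum_k gamma_k * (sqrt(v_k) * (Z_k - z*))**p.

    gammas, vs and zs hold gamma_k, v_k = gamma_k / sigma_k**2 and Z_k
    for k = 1..n; s_n is the sum of the gamma_k.
    """
    s_n = sum(gammas)
    total = sum(
        g * (math.sqrt(v) * (z - z_star)) ** p
        for g, v, z in zip(gammas, vs, zs)
    )
    return total / s_n


# Tiny hand-checkable example with two steps.
value = weighted_moment([1.0, 0.5], [1.0, 4.0], [2.0, 1.5], z_star=1.0, p=2)
```

In this toy input both rescaled errors equal 1, so the weighted average of their squares is 1 whatever the weights.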

The paper is organized as follows. Section 2 is devoted to the assumptions and the presentation of the main results. In Section 3, we propose some statistical applications to estimation errors for several particular cases of the Robbins–Monro procedure. The proofs are established in Section 4.


2. Assumptions and main results

Let us begin with the definition of the class of positive sequences that will be used in our assumptions. This definition is introduced in Mokkadem and Pelletier [25].

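Definition 2.1. Let $\alpha \in \mathbb{R}$ and let $(v_n)$ be a nonrandom positive sequence. Then $(v_n)$ is said to be in the set $\mathcal{GS}(\alpha)$ if

\[ \lim_{n\to\infty} n \left[ 1 - \frac{v_{n-1}}{v_n} \right] = \alpha. \]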
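The defining limit of $\mathcal{GS}(\alpha)$ is easy to probe numerically. The sketch below is only an illustration: `gs_ratio` and `power_log` are hypothetical helper names, and the sequence $n^\alpha (\log n)^\beta$ is used as a test case.

```python
import math


def gs_ratio(v, n):
    """Return n * (1 - v(n-1) / v(n)), the quantity whose limit defines GS(alpha)."""
    return n * (1.0 - v(n - 1) / v(n))


def power_log(alpha, beta):
    """Return the sequence n -> n**alpha * (log n)**beta (requires n >= 2)."""
    return lambda n: n ** alpha * math.log(n) ** beta


pure_power = gs_ratio(power_log(0.7, 0.0), 10**6)  # close to alpha = 0.7
with_log = gs_ratio(power_log(0.7, 1.0), 10**6)    # still noticeably above 0.7
```

The logarithmic factor contributes a term of order $\beta / \log n$, which vanishes in the limit but only slowly, so the second value is still visibly larger than $\alpha$ at $n = 10^6$.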

For example, typical sequences in $\mathcal{GS}(\alpha)$ are $n^\alpha (\log n)^\beta$ or $n^\alpha (\log\log n)^\beta$, for $\alpha, \beta \in \mathbb{R}$. The assumptions we will refer to in the sequel are the following.

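(H1) $Z_n$ converges almost surely to $z^*$.

(H2) The function $h$ is defined on $\mathbb{R}$ and $z^*$ is a zero of $h$ such that, on a neighborhood of $z^*$,

\[ h(z) = H(z - z^*) + O\left( |z - z^*|^2 \right), \tag{2.1} \]

where $H < 0$.

(H3) The noise $(\varepsilon_n)$ is a martingale difference sequence such that

\[ \lim_{n\to\infty} \mathbb{E}\left[ \varepsilon_{n+1}^2 \mid \mathcal{F}_n \right] = \sigma^2 \quad \text{a.s.} \]

(H4) The disturbance $(R_n)$ is split into two terms such that

\[ R_n = r_n + O\left( |Z_{n-1} - z^*|^2 \right) \quad \text{a.s.}, \qquad |r_n| = o\left( v_n^{-1/2} (\log s_n)^{-q} \right) \quad \text{a.s.} \quad \forall q \ge 0, \tag{2.2} \]

where $s_n \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k$.

(H5) The nonrandom sequences $(\gamma_n)$ and $(\sigma_n)$ are such that

\[ (\gamma_n) \in \mathcal{GS}(-\alpha) \text{ with } \alpha \in \left] \max\left\{ \tfrac{1}{2}, \tfrac{2}{a} \right\}, 1 \right], \qquad (\sigma_n) \in \mathcal{GS}(-\beta) \text{ with } \beta \in \left] \tfrac{\alpha}{2}, \alpha \right], \qquad \lim_{n\to\infty} n\gamma_n > -\frac{2\beta - \alpha}{2H}. \]

Let $\xi \overset{\mathrm{def}}{=} \lim_{n\to\infty} (n\gamma_n)^{-1}$. We can now define the asymptotic variance $\Sigma^2$ which appears in (1.4):

\[ \Sigma^2 \overset{\mathrm{def}}{=} \frac{-\sigma^2}{2H + \xi(2\beta - \alpha)}. \]

Note that (H5) implies that $\xi \in [0, -2H/(2\beta - \alpha)[$ and thus that $\Sigma^2$ is well defined and strictly positive.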
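As a small worked illustration of the definition of $\Sigma^2$, the sketch below evaluates the formula and rejects parameter values for which the denominator $2H + \xi(2\beta - \alpha)$ is not negative. The function name `asymptotic_variance` and its argument names are assumptions introduced here.

```python
def asymptotic_variance(sigma2, H, xi, alpha, beta):
    """Return Sigma^2 = -sigma^2 / (2H + xi (2 beta - alpha)).

    Raises ValueError when H >= 0, xi < 0, or the denominator is not
    strictly negative, since Sigma^2 would then fail to be positive.
    """
    if H >= 0:
        raise ValueError("H must be negative")
    denominator = 2.0 * H + xi * (2.0 * beta - alpha)
    if xi < 0 or denominator >= 0:
        raise ValueError("xi must lie in [0, -2H / (2 beta - alpha)[")
    return -sigma2 / denominator


# Step sizes with alpha < 1 give xi = 0, hence Sigma^2 = -sigma^2 / (2H).
slow_gain = asymptotic_variance(sigma2=1.0, H=-1.0, xi=0.0, alpha=0.75, beta=0.75)
# gamma_n = sigma_n = 1/n gives alpha = beta = 1 and xi = 1.
fast_gain = asymptotic_variance(sigma2=1.0, H=-1.0, xi=1.0, alpha=1.0, beta=1.0)
```

With $H = -1$ the two calls return $\sigma^2/2$ and $\sigma^2$ respectively, which matches the expression $\sigma^2/(2 - \alpha\xi)$ used later for the recursive estimation of the mean.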

Comments on the assumptions.

The residual term (Rn) enables the study of algorithms with small Markovian disturbances. The way these algorithms can be rewritten as equation (1.3) is detailed in Duflo [10].


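Let us now comment on the last assumption (H5) on the gains. For the usual gains

\[ \gamma_n = \frac{\gamma_0}{n^\alpha} \quad \text{and} \quad \sigma_n = \frac{\sigma_0}{\sqrt{n^{\alpha+\beta}}}, \quad \text{with } \gamma_0 > 0,\ \sigma_0 > 0, \text{ and } 0 < \beta \le \alpha, \]

assumption (H5) is fulfilled if $\alpha \in \left] \max\{1/2, 2/a\}, 1 \right[$ or ($\alpha = 1$ and $\beta < -2H\gamma_0$). In the particular case of the Robbins–Monro algorithm, that is to say $R_n = 0$ and $\sigma_n = \gamma_n$, (H5) contains for example the gains $\gamma_n = \sigma_n = \gamma_0/n^\alpha$ with $\gamma_0 > 0$ and $\alpha \in \left] \max\{1/2, 2/a\}, 1 \right[$ or ($\alpha = 1$ and $\gamma_0 > -1/(2H)$). The Kiefer–Wolfowitz algorithm corresponds to the case $h = V$ where $V$ is observable only together with a noise. For example, classical gains satisfying (H5) could be

\[ \gamma_n = \frac{\gamma_0}{n^\alpha}, \quad \sigma_n = n^\tau \gamma_n, \quad \frac{\alpha}{6} \le \tau < \frac{\alpha}{2}, \quad \text{and } \gamma_0 > 0,\ \sigma_0 > 0, \]

with $\alpha \in \left] \max\{1/2, 2/a\}, 1 \right[$ or ($\alpha = 1$ and $\gamma_0 > \frac{2\tau - 1}{2H}$). In these examples, the optimal rate is obtained for $\gamma_n = \gamma_0/n$ but leads to a condition on the initial $\gamma_0$. To circumvent this condition, Koval and Schwabe [14] introduced the stepsize, for any $(p, d) \in \mathbb{N}^2$,

\[ \gamma_n = \gamma_0 \frac{(\log_p n)^d}{n}, \quad \text{where } \log_p n = \log\left( \log_{p-1} n \right), \]

which also fulfills (H5). This stepsize leads to convergence rates very close to the ones obtained with $\gamma_0/n$ but without any constraint on the initial $\gamma_0$.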
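The iterated-logarithm stepsize can be sketched in a few lines. The helper names `iterated_log` and `koval_schwabe_step` are hypothetical, and the base case $\log_0 n = n$ is an assumption made here to start the recursion $\log_p n = \log(\log_{p-1} n)$; the text does not state it.

```python
import math


def iterated_log(p, n):
    """Apply the natural logarithm p times to n (assumed convention: log_0 n = n)."""
    value = float(n)
    for _ in range(p):
        value = math.log(value)  # raises ValueError once the argument is <= 0
    return value


def koval_schwabe_step(n, gamma0, p, d):
    """Return gamma_n = gamma0 * (log_p n)**d / n."""
    return gamma0 * iterated_log(p, n) ** d / n


# With p = 0 and d = 0 the stepsize reduces to the classical gain gamma0 / n.
classical = koval_schwabe_step(10, gamma0=2.0, p=0, d=0)
slowed = koval_schwabe_step(10**6, gamma0=2.0, p=2, d=1)
```

The index $n$ must be large enough for the $p$-fold logarithm to be defined and positive, so in practice the recursion would be started at a suitable $n_0$.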

The main result of this paper is the following theorem, which establishes the convergence of moments in the ASCLT for stochastic algorithms.

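Theorem 2.2. Let $p \ge 1$ be an integer. Assume that for some real $m > 2p$, $(\varepsilon_n)$ satisfies the moment condition

\[ \sup_{n \ge 0} \mathbb{E}\left[ |\varepsilon_{n+1}|^m \mid \mathcal{F}_n \right] < \infty \quad \text{a.s.} \tag{2.3} \]

Under assumptions (H1) to (H5), one has

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k \left[ \sqrt{v_k}\,(Z_k - z^*) \right]^{2p} = \frac{\Sigma^{2p}(2p)!}{2^p\, p!} \quad \text{a.s.}, \tag{2.4} \]

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k \left[ \sqrt{v_k}\,(Z_k - z^*) \right]^{2p-1} = 0 \quad \text{a.s.} \tag{2.5} \]

In the particular case $p = 1$, the convergence (2.4) gives the quadratic strong law established by Pelletier [26]. Theorem 2.2 extends this result to all powers $p \ge 2$. The limiting constants in (2.4) and (2.5) correspond to the moments of the Gaussian distribution $\mathcal{N}(0, \Sigma^2)$.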
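The limiting constants are the moments of a centered Gaussian distribution. A minimal sketch of their computation follows; the function name `gaussian_moment` is an assumption, while the formula $\Sigma^{2p}(2p)!/(2^p p!)$ is the one stated in (2.4).

```python
import math


def gaussian_moment(order, variance):
    """Return the moment of the given order of N(0, variance).

    Odd moments vanish; the moment of order 2p is
    variance**p * (2p)! / (2**p * p!).
    """
    if order % 2 == 1:
        return 0.0
    p = order // 2
    return variance ** p * math.factorial(2 * p) / (2 ** p * math.factorial(p))


# Recursion g(2p) = variance * (2p - 1) * g(2p - 2), checked here for p = 3.
lhs = gaussian_moment(6, 2.0)
rhs = 2.0 * 5 * gaussian_moment(4, 2.0)
```

The recursion checked at the end is the identity that links the moment of order $2p$ to the moment of order $2(p-1)$, which is the form in which the induction on $p$ uses these constants.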

The classical moment problem is related to the question whether or not a given sequence of moments uniquely determines the associated probability distribution. The well-known Carleman theorem (see e.g. Feller [12]) gives a condition on the moments which ensures the uniqueness of the distribution.

Theorem 2.3 (Carleman). A probability distribution is uniquely determined by its moments $(m_n)$ if

\[ \sum_{n=1}^{\infty} m_{2n}^{-1/(2n)} = \infty. \]
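To see Carleman's condition at work for Gaussian moments, the sketch below accumulates partial sums of $m_{2n}^{-1/(2n)}$ with $m_{2n} = \Sigma^{2n}(2n)!/(2^n n!)$. The function name `carleman_partial_sum` is hypothetical, and logarithms of factorials are computed with `math.lgamma` to avoid overflow. A finite computation can only suggest divergence, not prove it.

```python
import math


def carleman_partial_sum(n_terms, variance=1.0):
    """Return sum_{n=1}^{n_terms} m_{2n}**(-1/(2n)) for the N(0, variance) moments."""
    total = 0.0
    for n in range(1, n_terms + 1):
        # log m_{2n} = n log(variance) + log (2n)! - n log 2 - log n!
        log_m2n = (
            n * math.log(variance)
            + math.lgamma(2 * n + 1)
            - n * math.log(2.0)
            - math.lgamma(n + 1)
        )
        total += math.exp(-log_m2n / (2 * n))
    return total


s_100 = carleman_partial_sum(100)
s_400 = carleman_partial_sum(400)
```

By Stirling's formula the terms behave like a constant times $n^{-1/2}$, so the partial sums grow roughly like $\sqrt{n}$ and keep increasing without bound.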

Since the Gaussian limit distribution satisfies Carleman's moment condition, we deduce the following corollary from Theorem 2.2.


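Corollary 2.4 (ASCLT). Assume that $(\varepsilon_n)$ is a martingale difference sequence such that, for every integer $p \ge 1$,

\[ \sup_{n \ge 0} \mathbb{E}\left[ |\varepsilon_{n+1}|^p \mid \mathcal{F}_n \right] < \infty \quad \text{a.s.} \]

Under assumptions (H1) to (H5), one has

\[ \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k\, \delta_{\sqrt{v_k}\,(Z_k - z^*)} \Longrightarrow \mathcal{N}(0, \Sigma^2) \quad \text{a.s.} \]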

As a straightforward application, the following strong law holds.

Corollary 2.5. Assume that $(\varepsilon_n)$ is a martingale difference sequence such that, for every integer $p \ge 1$,

\[ \sup_{n \ge 0} \mathbb{E}\left[ |\varepsilon_{n+1}|^p \mid \mathcal{F}_n \right] < \infty \quad \text{a.s.} \]

Moreover, assume that $f$ is an almost everywhere continuous function with polynomial growth at infinity. Under assumptions (H1) to (H5), one has

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k\, f\left( \sqrt{v_k}\,(Z_k - z^*) \right) = \int f \, dG_\Sigma \quad \text{a.s.}, \]

where $G_\Sigma$ denotes the $\mathcal{N}(0, \Sigma^2)$ distribution.

This corollary gives approximations of Gaussian integrals of almost everywhere continuous functions with polynomial growth at infinity.

The proof of Theorem 2.2 is given in Section 4.

Remark 2.6. Assumptions (H1) to (H5) are equivalent to those of Pelletier [27] establishing the ASCLT. Here we additionally assume the conditional moment condition (2.3) in order to get the stronger result of convergence of moments in the ASCLT.

3. Examples of applications

Let us give three examples of estimation to which Theorem 2.2 applies. These examples are different forms of the Robbins–Monro procedure, described as follows. The Robbins–Monro algorithm (see Robbins and Monro [28]) is used for solving the equation $f(x) = \alpha$, where $f$ is an $\mathbb{R}^d$-valued function and $\alpha \in \mathbb{R}^d$. More precisely, this algorithm is used when $f$ can be rewritten in the form $f(x) = \mathbb{E}[F(x, X)]$. Then the solution of the equation $f(x) = \alpha$ can be recursively approximated by

\[ U_{n+1} = U_n - \gamma_n \left( F(U_n, X_{n+1}) - \alpha \right), \]

where $\gamma_n > 0$ is a deterministic sequence going to zero and $(X_n)$ is a sequence of independent and identically distributed random variables. As mentioned in the introduction, the Robbins–Monro algorithm is a particular case of model (1.3).

3.1. Translation parameter

Let $(Y_n)$ be a sample from a distribution $F$ on $\mathbb{R}$, with density $f$ with respect to the Lebesgue measure. The sequence of random variables $(Y_n)$ is not observable, but each $Y_n$ is supposed to be centered. We can only observe a translation model $(X_n)$ where $X_n = Y_n + \theta$. The real translation parameter $\theta$ is unknown. Since $\mathbb{E}[Y_1] = 0$, the parameter $\theta$ is the mean of $X_n$, $\theta = \mathbb{E}[X_n]$.


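Without knowing the function $f$, we may also assume that $f$ is even, strictly positive and continuously differentiable. Let us define the recursive estimator of $\theta$ by

\[ \theta_{n+1} = \theta_n - \gamma_n \left( \mathbb{1}_{\{X_{n+1} \le \theta_n\}} - \frac{1}{2} \right). \tag{3.1} \]

This algorithm is a particular case of the model (1.3) with $\sigma_n = \gamma_n$, $R_{n+1} = 0$ and

\[ h(z) = \frac{1}{2} - \mathbb{E}\left[ \mathbb{1}_{\{X_{n+1} \le z\}} \mid \mathcal{F}_n \right], \]

where $\mathcal{F}_n$ is the $\sigma$-field of events prior to $n$ and $\mathbb{F} \overset{\mathrm{def}}{=} (\mathcal{F}_n)$ is the natural filtration. Indeed, (3.1) can be rewritten as

\[ \theta_{n+1} = \theta_n + \gamma_n h(\theta_n) + \gamma_n \varepsilon_{n+1}, \quad \text{with } \varepsilon_{n+1} = \mathbb{E}\left[ \mathbb{1}_{\{X_{n+1} \le \theta_n\}} \mid \mathcal{F}_n \right] - \mathbb{1}_{\{X_{n+1} \le \theta_n\}}. \]

Since the noise is bounded, assumption (H4) and condition (2.3) for every integer $p$ are satisfied, as well as (H3) with $\sigma^2 = 1/4$. It remains to check (H2). Since $X_{n+1}$ is independent of $\mathcal{F}_n$, one has

\[ h(z) = \frac{1}{2} - \mathbb{P}(X_{n+1} \le z) = \frac{1}{2} - \mathbb{P}(Y_{n+1} \le z - \theta). \]

The function $f$ is continuous and strictly positive, hence $h$ is differentiable and $h'(z) = -f(z - \theta)$. Moreover, $\theta$ is a zero of $h$. The Taylor–Young formula leads to

\[ h(z) - h(\theta) = -f(0)(z - \theta) + O\left( (z - \theta)^2 \right), \]

with $f(0) > 0$. Thus (H2) holds with $H = -f(0)$. Consequently, for any integer $p$, assuming (H1) and choosing the steps $(\gamma_n)$ such that (H5) holds, Theorem 2.2 gives the asymptotic results on the estimation errors. The sequence of estimates $(\theta_n)$ satisfies

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-1}} \left( \theta_k - \theta \right)^{2p} = \frac{\Sigma^{2p}(2p)!}{2^p\, p!} \quad \text{a.s.}, \]

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-3/2}} \left( \theta_k - \theta \right)^{2p-1} = 0 \quad \text{a.s.}, \]

with $\Sigma^2 = \left[ 4\left( 2f(0) - \alpha\xi \right) \right]^{-1}$, and the ASCLT for $(\theta_n)$ holds:

\[ \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k\, \delta_{\frac{1}{\sqrt{\gamma_k}}(\theta_k - \theta)} \Longrightarrow \mathcal{N}(0, \Sigma^2) \quad \text{a.s.} \]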

3.2. Recursive estimation of quantiles

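More generally, with an estimation procedure analogous to the preceding one, we can give asymptotic convergence results on the error of quantile estimation. We consider a sample $(Y_n)$ with a strictly increasing distribution function $F$. Let us denote by $q$ the quantile of order $\delta$, that is to say $\delta \overset{\mathrm{def}}{=} F(q)$. Without knowing the function $F$, we may also assume that the density $f = F'$ is continuously differentiable.

Let us define the recursive estimator

\[ q_{n+1} = q_n - \gamma_n \left( \mathbb{1}_{\{Y_{n+1} \le q_n\}} - \delta \right). \tag{3.2} \]

Again, the sequence $(q_n)$ is a particular case of (1.3) with $\sigma_n = \gamma_n$, $R_{n+1} = 0$,

\[ h(z) = \delta - \mathbb{E}\left[ \mathbb{1}_{\{Y_{n+1} \le z\}} \mid \mathcal{F}_n \right] = \delta - F(z), \]

and $\varepsilon_{n+1} = \mathbb{E}\left[ \mathbb{1}_{\{Y_{n+1} \le q_n\}} \mid \mathcal{F}_n \right] - \mathbb{1}_{\{Y_{n+1} \le q_n\}}$. Since the noise is bounded, assumption (H4) and condition (2.3) for every integer $p$ are satisfied, as well as (H3) with $\sigma^2 = \delta(1 - \delta)$. Moreover, the function $h$ satisfies (H2) for the zero $q$ of $h$, with $H = -f(q)$. Consequently, for any integer $p$, assuming (H1) and choosing the steps such that (H5) holds, Theorem 2.2 leads to

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-1}} (q_k - q)^{2p} = \frac{\Sigma^{2p}(2p)!}{2^p\, p!} \quad \text{a.s.}, \]

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-3/2}} (q_k - q)^{2p-1} = 0 \quad \text{a.s.}, \]

with $\Sigma^2 = \delta(1 - \delta)/[2f(q) - \alpha\xi]$. Moreover we have

\[ \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k\, \delta_{\frac{1}{\sqrt{\gamma_k}}(q_k - q)} \Longrightarrow \mathcal{N}(0, \Sigma^2) \quad \text{a.s.} \]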
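The recursive quantile estimator (3.2) is short enough to sketch in full. In the example below, the function name `recursive_quantile`, the gain $\gamma_n = \gamma_0/n^{\alpha}$ with $\gamma_0 = 1$ and $\alpha = 0.75$, the starting point and the uniform sample are all assumptions chosen for illustration.

```python
import random


def recursive_quantile(samples, delta, q0=0.0, gamma0=1.0, alpha=0.75):
    """Iterate q_{n+1} = q_n - gamma_n (1{Y_{n+1} <= q_n} - delta).

    The gain gamma_n = gamma0 / n**alpha is an illustrative choice.
    """
    q = q0
    for n, y in enumerate(samples, start=1):
        gamma_n = gamma0 / n ** alpha
        indicator = 1.0 if y <= q else 0.0
        q -= gamma_n * (indicator - delta)
    return q


# Median of the uniform distribution on [0, 1): true quantile q = 0.5.
rng = random.Random(0)
uniform_sample = [rng.random() for _ in range(20_000)]
median_estimate = recursive_quantile(uniform_sample, delta=0.5)
```

Each update only needs the current estimate and one new observation, so the sample never has to be stored or sorted; this is the practical appeal of the recursive procedure over the empirical quantile.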

3.3. Recursive estimation of the mean

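Let $(Y_n)$ be a sequence of independent and identically distributed real random variables with mean $\mu$ and variance $\sigma^2$. Let $p \ge 1$ be an integer. Assume that for some real $m > 2p$, $(Y_n)$ satisfies the moment condition

\[ \sup_{n \ge 0} \mathbb{E}\left[ |Y_{n+1}|^m \mid \mathcal{F}_n \right] < \infty \quad \text{a.s.} \]

Let us consider the recursive estimator of the mean $\mu$ defined by

\[ \mu_{n+1} = \mu_n + \gamma_n (Y_{n+1} - \mu_n). \]

This model is again a particular case of the model (1.3) with $h(z) = \mu - z$ and $\varepsilon_{n+1} = Y_{n+1} - \mu$. Assumptions (H2), (H3), (H4) and (2.3) are satisfied. Consequently, assuming (H1) and choosing the steps such that (H5) holds, the cumulated estimation error satisfies

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-1}} (\mu_k - \mu)^{2p} = \frac{\Sigma^{2p}(2p)!}{2^p\, p!} \quad \text{a.s.}, \]

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \frac{1}{\gamma_k^{p-3/2}} (\mu_k - \mu)^{2p-1} = 0 \quad \text{a.s.}, \]

with $\Sigma^2 = \sigma^2/(2 - \alpha\xi)$, as well as the ASCLT

\[ \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k\, \delta_{\frac{1}{\sqrt{\gamma_k}}(\mu_k - \mu)} \Longrightarrow \mathcal{N}(0, \Sigma^2) \quad \text{a.s.} \]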
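The recursion for $\mu_n$ can be sketched as follows. The function name `recursive_mean`, the zero-based step index and the default gain $1/(n+1)$ are assumptions made for this illustration; with that gain the recursion reproduces the ordinary sample mean.

```python
def recursive_mean(samples, mu0=0.0, gamma=lambda n: 1.0 / (n + 1)):
    """Iterate mu_{n+1} = mu_n + gamma_n (Y_{n+1} - mu_n), starting from mu0.

    Steps are indexed from n = 0 here, so the default gain 1 / (n + 1)
    makes mu_n equal to the average of the first n observations.
    """
    mu = mu0
    for n, y in enumerate(samples):
        mu += gamma(n) * (y - mu)
    return mu


estimate = recursive_mean([1.0, 2.0, 3.0, 4.0])  # arithmetic mean of the data
```

Up to the index shift, this default corresponds to $\gamma_0 = 1$, $\alpha = 1$ and $\xi = 1$, so the formula $\Sigma^2 = \sigma^2/(2 - \alpha\xi)$ gives $\Sigma^2 = \sigma^2$, in agreement with the classical central limit theorem for the sample mean.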

4. Proof of Theorem 2.2

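Remark 4.1. Assumption (H5) implies that $v_n \in \mathcal{GS}(-\alpha + 2\beta)$, since $v_n = \gamma_n \sigma_n^{-2}$, $\gamma_n \in \mathcal{GS}(-\alpha)$ and $\sigma_n \in \mathcal{GS}(-\beta)$. Consequently one has

\[ 1 - \frac{v_{n-1}}{v_n} = \frac{-\alpha + 2\beta}{n} + o\left( \frac{1}{n} \right) = \xi\gamma_n(-\alpha + 2\beta) + o(\gamma_n). \]

Finally, we obtain

\[ \frac{v_{n-1}}{v_n} = 1 - \gamma_n \xi(2\beta - \alpha) + o(\gamma_n). \tag{4.1} \]

This remark is very helpful for the proof of Theorem 2.2, and (4.1) is widely used.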


First, the convergence is established for even moments, by induction on the power $p$. The recursive equation (1.3) yields

\[ Z_{n+1} - z^* = Y_{n+1} + \sigma_n \varepsilon_{n+1} \quad \text{where} \quad Y_{n+1} \overset{\mathrm{def}}{=} Z_n - z^* + \gamma_n \left[ h(Z_n) + R_{n+1} \right]. \tag{4.2} \]

In view of assumptions (H2) and (H4), the sequence $Y_{n+1}$ satisfies

\[ Y_{n+1} = (Z_n - z^*) \left[ 1 + \gamma_n H \right] + \gamma_n r_{n+1} + O\left( \gamma_n |Z_n - z^*|^2 \right). \]

In order to prove Theorem 2.2, we shall apply several times the following lemma, established in Mokkadem and Pelletier [25].

Lemma 4.2. Assuming that there exists $m > 2$ such that (2.3) holds, under assumptions (H1) to (H5), the stochastic algorithm $(Z_n)$ satisfies

\[ |Z_n - z^*| = O\left( \sqrt{v_n^{-1} \log s_n} \right) \quad \text{a.s.} \tag{4.3} \]

In particular, Lemma 4.2 leads to

\[ Y_{n+1} = (Z_n - z^*) \left[ 1 + \gamma_n H \right] + o\left( \gamma_n v_n^{-1/2} (\log s_n)^{-q} \right) \quad \text{a.s.} \tag{4.4} \]

for all $q \ge 0$.

4.1. Even moments

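Raising (4.2) to the power $2p$, we obtain

\[ (Z_{n+1} - z^*)^{2p} = \sum_{\ell=0}^{2p} C_{2p}^{\ell} \left( \sigma_n \varepsilon_{n+1} \right)^{\ell} Y_{n+1}^{2p-\ell}. \]

Setting $V_{n+1} \overset{\mathrm{def}}{=} \sqrt{v_n}\,(Z_{n+1} - z^*)$, we get

\[ V_{n+1}^{2p} - V_n^{2p} = v_n^p Y_{n+1}^{2p} - V_n^{2p} + \gamma_n^p \varepsilon_{n+1}^{2p} + \sum_{\ell=1}^{2p-1} C_{2p}^{\ell}\, \gamma_n^{\ell/2} \left( \sqrt{v_n}\, Y_{n+1} \right)^{2p-\ell} \varepsilon_{n+1}^{\ell}. \tag{4.5} \]

From the obvious equality

\[ V_{n+1}^{2p} = \sum_{k=1}^{n} \left( V_{k+1}^{2p} - V_k^{2p} \right) + V_1^{2p}, \]

and from the recursive equation (4.5), one gets for any $n \ge 1$,

\[ V_{n+1}^{2p} = V_1^{2p} + A_{n+1}^{(p)} + W_{n+1}^{(p)} + \sum_{\ell=1}^{2p-1} C_{2p}^{\ell}\, B_{n+1}^{(p)}(\ell), \tag{4.6} \]

where

\[ W_{n+1}^{(p)} \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^p \varepsilon_{k+1}^{2p}, \qquad A_{n+1}^{(p)} \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \left( v_k^p Y_{k+1}^{2p} - V_k^{2p} \right), \]

\[ B_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell} \varepsilon_{k+1}^{\ell}. \]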
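The one-step expansion behind (4.5) is a binomial identity combined with the relation $v_n \sigma_n^2 = \gamma_n$, and it can be checked numerically. In the sketch below, the function name `check_even_power_expansion` and the numerical values are assumptions for illustration; `y` and `eps` stand for $Y_{n+1}$ and $\varepsilon_{n+1}$.

```python
import math


def check_even_power_expansion(p, v, gamma, y, eps):
    """Return both sides of V_{n+1}^{2p} = v^p y^{2p} + gamma^p eps^{2p} + cross terms.

    Here V_{n+1} = sqrt(v) * (y + sigma * eps) with sigma = sqrt(gamma / v),
    which encodes the definition v = gamma / sigma**2.
    """
    sigma = math.sqrt(gamma / v)
    lhs = (math.sqrt(v) * (y + sigma * eps)) ** (2 * p)
    rhs = v ** p * y ** (2 * p) + gamma ** p * eps ** (2 * p)
    for ell in range(1, 2 * p):
        rhs += (
            math.comb(2 * p, ell)
            * gamma ** (ell / 2)
            * (math.sqrt(v) * y) ** (2 * p - ell)
            * eps ** ell
        )
    return lhs, rhs


lhs, rhs = check_even_power_expansion(p=2, v=4.0, gamma=0.25, y=0.3, eps=0.8)
```

Subtracting $V_n^{2p}$ from both sides gives exactly the form of (4.5); the check uses `math.comb`, available from Python 3.8.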


4.1.1. Case p= 1

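Clearly equation (4.6) can be rewritten as

\[ A_{n+1}^{(1)} = -W_{n+1}^{(1)} + V_{n+1}^2 - V_1^2 - 2B_{n+1}^{(1)}(1). \tag{4.7} \]

Since the noise $(\varepsilon_n)$ satisfies (H3) and assumption (2.3) for a real number $m > 2$, Chow's lemma (see e.g. Duflo [10], p. 22) yields

\[ \lim_{n\to\infty} \frac{1}{s_n} W_{n+1}^{(1)} = \sigma^2 \quad \text{a.s.} \tag{4.8} \]

Let us assume that

\[ \frac{1}{s_n} \left( V_{n+1}^2 - V_1^2 - 2B_{n+1}^{(1)}(1) \right) = o(1) + o\left( s_n^{-1} A_{n+1}^{(1)} \right) \quad \text{a.s.} \tag{4.9} \]

We first show how the combination of (4.8) and (4.9) gives the convergence (2.4) for $p = 1$. From equality (4.4) one has

\begin{align*}
Y_{n+1}^2 &= (Z_n - z^*)^2 \left( 1 + 2\gamma_n H + o(\gamma_n) \right) + o\left( \gamma_n^2 v_n^{-1} (\log s_n)^{-2q} \right) + o\left( |Z_n - z^*|\, \gamma_n v_n^{-1/2} (\log s_n)^{-q} \right) \\
&= (Z_n - z^*)^2 \left( 1 + 2\gamma_n H + o(\gamma_n) \right) + o\left( v_n^{-1} \gamma_n (\log s_n)^{-q+1/2} \right),
\end{align*}

the second equality resulting from Lemma 4.2. Consequently, applying this equality for $q = 1/2$ and denoting $U_n \overset{\mathrm{def}}{=} \sqrt{v_n}\,(Z_n - z^*)$, the general term of $A_{n+1}^{(1)}$ is of the form

\[ U_n^2 \left[ 1 - \frac{v_{n-1}}{v_n} + 2\gamma_n H + o(\gamma_n) \right] + o(\gamma_n) \quad \text{a.s.} \]

Hence we deduce from (4.1), by applying Kronecker's lemma, that

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k U_k^2 = \frac{1}{2H + \xi(2\beta - \alpha)} \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(1)} \quad \text{a.s.} \tag{4.10} \]

Then the combination of (4.7)–(4.10) leads to (2.4).

It remains to prove (4.9). Since $s_n$ increases to infinity, almost surely $V_1^2 = o(s_n)$. Moreover, from Lemma 4.2, we have a.s. $V_{n+1}^2 = o(s_n)$. To conclude we have to study the asymptotic behavior of $B_{n+1}^{(1)}(1)$. We easily deduce from Kronecker's lemma and from the decomposition (4.4) that we only have to show that

\[ \sum_{k=1}^{n} \sqrt{\gamma_k}\, U_k \varepsilon_{k+1} = o\left( A_{n+1}^{(1)} \right) \quad \text{a.s.} \qquad \text{and} \qquad \sum_{k=1}^{n} \gamma_k^{3/2} \varepsilon_{k+1} = o(s_n) \quad \text{a.s.} \]

The standard strong law of large numbers for regressive series (see e.g. Duflo [10], p. 26) yields

\[ \sum_{k=1}^{n} \sqrt{\gamma_k}\, U_k \varepsilon_{k+1} = o\left( \sum_{k=1}^{n} \gamma_k U_k^2 \right) + O(1) \quad \text{a.s.}, \]

\[ \sum_{k=1}^{n} \gamma_k^{3/2} \varepsilon_{k+1} = O\left( \sum_{k=1}^{n} \gamma_k^3 \right) = o(s_n) \quad \text{a.s.} \]

Since from (4.10) one has

\[ o\left( \sum_{k=1}^{n} \gamma_k v_k (Z_k - z^*)^2 \right) = o\left( A_{n+1}^{(1)} \right) \quad \text{a.s.}, \]

we obtain

\[ B_{n+1}^{(1)}(1) = o\left( A_{n+1}^{(1)} \right) + o(s_n) \quad \text{a.s.}, \tag{4.11} \]

which concludes the proof of (4.9).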

4.1.2. Case p≥2

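Now, let $p \ge 2$ and assume that the convergence (2.4) holds for any power $q$ with $1 \le q \le p - 1$. First, we prove that

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k U_k^{2p} = -\frac{\Sigma^2}{p\sigma^2} \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(p)} \quad \text{a.s.} \tag{4.12} \]

From the decomposition (4.4) together with the result (4.3) of Lemma 4.2, it is easy to see that, for all $q \ge -p + 1/2$,

\[ Y_{n+1}^{2p} = (Z_n - z^*)^{2p} \left( 1 + 2pH\gamma_n + o(\gamma_n) \right) + o\left( \gamma_n v_n^{-p} (\log s_n)^{-q} \right). \]

In particular for $q = 0$, one has

\[ v_n^p Y_{n+1}^{2p} - V_n^{2p} = v_n^p (Z_n - z^*)^{2p} \left[ 1 - \frac{v_{n-1}^p}{v_n^p} + 2pH\gamma_n + o(\gamma_n) \right] + o(\gamma_n) \quad \text{a.s.} \]

Since $v_n \in \mathcal{GS}(2\beta - \alpha)$, the sequence $v_n^p$ belongs to $\mathcal{GS}\left( p(2\beta - \alpha) \right)$ and hence

\[ 1 - \frac{v_{n-1}^p}{v_n^p} = p(2\beta - \alpha)\xi\gamma_n + o(\gamma_n). \]

Finally, we get

\[ v_n^p Y_{n+1}^{2p} - V_n^{2p} = p\gamma_n U_n^{2p} \left[ \xi(2\beta - \alpha) + 2H \right] (1 + o(1)) + o(\gamma_n), \]

and (4.12) is a straightforward application of Kronecker's lemma. Thus, we have to show that

\[ \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(p)} = -p\sigma^2 (2p - 1)\, g[2(p-1)] \quad \text{a.s.} \tag{4.13} \]

The proof of (4.13) is constructed in the following way. Let us note that Lemma 4.2 implies that $V_{n+1}^{2p} = o(s_n)$. First we establish the convergence of $s_n^{-1} W_{n+1}^{(p)}$ to zero. Then we show that

\[ \forall \ell \in \{1, \ldots, 2p-1\} \setminus \{2\}, \quad B_{n+1}^{(p)}(\ell) = o\left( A_{n+1}^{(p)} \right) + o(s_n) \quad \text{a.s.}, \]

\[ \lim_{n\to\infty} \frac{1}{s_n} B_{n+1}^{(p)}(2) = g[2(p-1)]\, \sigma^2 \quad \text{a.s.} \]

The decomposition (4.6) will then lead to the convergence (4.13).

Since the noise $(\varepsilon_n)$ satisfies (2.3), Chow's lemma gives

\[ W_{n+1}^{(p)} = O\left( \sum_{k=1}^{n} \gamma_k^p \right) \quad \text{a.s.} \]

In addition, $\gamma_n$ converges to zero, hence $\gamma_n^p = o(\gamma_n)$. Consequently, one gets from Kronecker's lemma

\[ W_{n+1}^{(p)} = o(s_n) \quad \text{a.s.} \tag{4.14} \]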


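To establish the asymptotic behavior of $B_{n+1}^{(p)}$, let us split it into two terms, $B_{n+1}^{(p)}(\ell) = C_{n+1}^{(p)}(\ell) + D_{n+1}^{(p)}(\ell)$ with

\[ C_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell} e_{k+1}(\ell), \qquad D_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell} \mathbb{E}\left[ \varepsilon_{k+1}^{\ell} \mid \mathcal{F}_k \right], \]

and $e_{n+1}(\ell) \overset{\mathrm{def}}{=} \varepsilon_{n+1}^{\ell} - \mathbb{E}\left[ \varepsilon_{n+1}^{\ell} \mid \mathcal{F}_n \right]$. First, we claim that for any $3 \le \ell \le 2p - 1$,

\[ \left| D_{n+1}^{(p)}(\ell) \right| = o(s_n) \quad \text{a.s.} \tag{4.15} \]

One can easily see from the Hölder inequality and from the moment assumption (2.3) that each moment $\mathbb{E}\left[ \varepsilon_{n+1}^{\ell} \mid \mathcal{F}_n \right]$ is almost surely bounded, and consequently

\[ \left| D_{n+1}^{(p)}(\ell) \right| = O\left( \sum_{k=1}^{n} \gamma_k^{\ell/2} \left| \sqrt{v_k}\, Y_{k+1} \right|^{2p-\ell} \right) \quad \text{a.s.} \]

If $\ell$ is even, for all $q \ge 0$, (4.4) leads to

\begin{align*}
\gamma_n^{\ell/2} \left( \sqrt{v_n}\, Y_{n+1} \right)^{2p-\ell} &= \gamma_n^{\ell/2} \left[ U_n (1 + o(1)) + o\left( \gamma_n (\log s_n)^{-q} \right) \right]^{2p-\ell} \\
&= \gamma_n^{\ell/2} \left[ U_n^{2p-\ell} (1 + o(1)) + o\left( \gamma_n |U_n|^{2p-\ell-1} (\log s_n)^{-q} \right) \right] \\
&= \gamma_n^{\ell/2} U_n^{2p-\ell} \left[ 1 + o(1) \right] + o\left( \gamma_n^{\ell/2+1} (\log s_n)^{-q+p-\ell/2-1/2} \right).
\end{align*}

Then the induction assumption together with Kronecker's lemma yields

\[ \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell} = o(s_n) \quad \text{a.s.} \]

Consequently the convergence (4.15) holds for $\ell$ even such that $3 \le \ell \le 2p - 1$. Otherwise, in the case when $\ell$ is odd and $\ell \ge 3$ (since $D_{n+1}^{(p)}(1) = 0$ is a trivial case), the Cauchy–Schwarz inequality gives

\[ \left| D_{n+1}^{(p)}(\ell) \right|^2 = O\left( \sum_{k=1}^{n} \gamma_k \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-2} \sum_{k=1}^{n} \gamma_k^{\ell-1} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2(p-\ell+1)} \right) \quad \text{a.s.} \]

Again the induction assumption and Kronecker's lemma lead to (4.15). For the last term $D_{n+1}^{(p)}(2)$, the convergence

\[ \lim_{n\to\infty} \frac{1}{s_n} D_{n+1}^{(p)}(2) = g[2(p-1)]\, \sigma^2 \tag{4.16} \]

is a consequence of the induction assumption and Kronecker's lemma too.

It remains to study the asymptotic behavior of $C_{n+1}^{(p)}(\ell)$ for any $1 \le \ell \le 2p - 1$. We claim that

\[ \left| C_{n+1}^{(p)}(\ell) \right| = o(s_n) + o\left( A_{n+1}^{(p)} \right) \quad \text{a.s.} \tag{4.17} \]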


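We easily deduce from (4.4) and from Kronecker's lemma that we only have to show that

\[ \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell} e_{k+1}(\ell) = o(s_n) + o\left( A_{n+1}^{(p)} \right) \quad \text{a.s.}, \qquad \sum_{k=1}^{n} \gamma_k^{1+\ell/2} e_{k+1}(\ell) = o(s_n) \quad \text{a.s.} \]

According to the standard strong law of large numbers for regressive series one has, for any $\delta > 0$,

\[ \left| \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell} e_{k+1}(\ell) \right|^2 = o\left( \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell)} \left( \log \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell)} \right)^{1+\delta} \right) \quad \text{a.s.}, \]

\[ \sum_{k=1}^{n} \gamma_k^{1+\ell/2} e_{k+1}(\ell) = o\left( \sum_{k=1}^{n} \gamma_k^{2+\ell} \right) \quad \text{a.s.} \]

Since $\gamma_n$ decreases to zero, we obtain in particular

\[ \left| \sum_{k=1}^{n} \gamma_k^{1+\ell/2} e_{k+1}(\ell) \right| = o(s_n) \quad \text{a.s.} \]

First, assume that $1 \le \ell \le p - 1$. Since

\[ \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell)} = O\left( \sup_{1 \le k \le n} U_k^{2p} \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(p-\ell)} \right) \quad \text{a.s.}, \]

the combination of Lemma 4.2 and the induction assumption leads to

\[ \left| \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell} e_{k+1}(\ell) \right| = o(s_n) \quad \text{a.s.} \]

The particular case $\ell = p$ is also a consequence of both the strong law of large numbers and Kronecker's lemma, and one has

\[ \sum_{k=1}^{n} \gamma_k^p U_k^{2p} = o\left( A_{n+1}^{(p)} \right) \quad \text{a.s.} \]

Consequently Kronecker's lemma leads to (4.17).

Now assume that $p < \ell \le 2p - 1$. According to Chow's lemma, one has

\[ \left| \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell} e_{k+1}(\ell) \right| = o(\nu_n(\ell)) \quad \text{a.s.} \quad \text{with} \quad \nu_n(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \left( \gamma_k^{\ell/2} |U_k|^{2p-\ell} \right)^{2p/\ell}. \]

Applying the Hölder inequality with the exponents $\ell/p$ and $\ell/(\ell - p)$, the induction assumption implies $\nu_n(\ell) = o(s_n)$, which concludes the proof of (4.17).

Finally the combination of the convergences (4.14)–(4.17) leads to

\[ \lim_{n\to\infty} \frac{1}{s_n} \left( A_{n+1}^{(p)} + W_{n+1}^{(p)} + \sum_{\ell=1}^{2p-1} C_{2p}^{\ell}\, B_{n+1}^{(p)}(\ell) \right) = \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(p)} + C_{2p}^{2}\, \sigma^2 g[2(p-1)] \quad \text{a.s.} \]

So, from the decomposition (4.6) we get

\[ \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(p)} = -C_{2p}^{2}\, \sigma^2 g[2(p-1)] \quad \text{a.s.}, \]

which concludes the proof of (2.4).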


4.2. Odd moments

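Now let us deal with the odd moments, that is, with the proof of (2.5). Raising equation (4.2) to the power $2p - 1$ and proceeding exactly as in the case of even moments, we obtain

\[ V_{n+1}^{2p-1} = V_1^{2p-1} + A_{n+1}^{(p)} + W_{n+1}^{(p)} + \sum_{\ell=1}^{2p-2} C_{2p-1}^{\ell}\, B_{n+1}^{(p)}(\ell), \tag{4.18} \]

where

\[ W_{n+1}^{(p)} \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{p-1/2} \varepsilon_{k+1}^{2p-1}, \qquad A_{n+1}^{(p)} \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \left( v_k^{p-1/2} Y_{k+1}^{2p-1} - V_k^{2p-1} \right), \]

\[ B_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell-1} \varepsilon_{k+1}^{\ell}. \]

4.2.1. Case p= 1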

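For $p = 1$, the decomposition (4.18) can be rewritten as

\[ A_{n+1}^{(1)} = V_{n+1} - V_1 - W_{n+1}^{(1)}. \tag{4.19} \]

An application of the standard strong law of large numbers for regressive series leads to

\[ W_{n+1}^{(1)} = o(s_n) \quad \text{a.s.} \]

Clearly $V_1 = o(s_n)$ since $s_n$ increases to infinity, and again from Lemma 4.2 one has $V_{n+1} = o(s_n)$ a.s. Hence we deduce from (4.19) that $A_{n+1}^{(1)} = o(s_n)$ a.s.

Moreover, due to the decomposition (4.4), the general term of $A_{n+1}^{(1)}$ satisfies, for all $q \ge 0$,

\[ \sqrt{v_n}\, Y_{n+1} - V_n = U_n \left[ 1 + \gamma_n H - \sqrt{\frac{v_{n-1}}{v_n}} \right] + o\left( \gamma_n (\log s_n)^{-q} \right) \quad \text{a.s.} \tag{4.20} \]

Since $v_n \in \mathcal{GS}(2\beta - \alpha)$, $\sqrt{v_n} \in \mathcal{GS}(\beta - \alpha/2)$ and consequently

\[ 1 - \sqrt{\frac{v_{n-1}}{v_n}} = \xi\gamma_n(\beta - \alpha/2) + o(\gamma_n). \]

In particular for $q = 0$, the equation (4.20) yields

\[ \sqrt{v_n}\, Y_{n+1} - V_n = \gamma_n U_n \left[ H + \xi(\beta - \alpha/2) \right] (1 + o(1)) + o(\gamma_n) \quad \text{a.s.}, \]

which clearly leads to (2.5) from Kronecker's lemma, since

\[ \lim_{n\to\infty} \frac{1}{s_n} \sum_{k=1}^{n} \gamma_k U_k = \frac{1}{H + \xi(\beta - \alpha/2)} \lim_{n\to\infty} \frac{1}{s_n} A_{n+1}^{(1)} = 0 \quad \text{a.s.} \]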

4.2.2. Case p≥2

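Now let $p \ge 2$ and assume that convergence (2.5) holds for any power $q$ with $1 \le q \le p - 1$. The structure of the proof is the same as in the case of even moments. It consists in establishing the convergence $A_{n+1}^{(p)} = o(s_n)$ from the decomposition (4.18).

First, Chow's lemma implies

\[ W_{n+1}^{(p)} = O\left( \sum_{k=1}^{n} \gamma_k^{p-1/2} \right) \quad \text{a.s.} \]

Since $p \ge 2$ and $\gamma_n$ goes to zero, one gets

\[ W_{n+1}^{(p)} = o(s_n) \quad \text{a.s.} \tag{4.21} \]

Moreover, we clearly have $V_1^{2p-1} = o(s_n)$ a.s. and $V_{n+1}^{2p-1} = o(s_n)$ a.s. from Lemma 4.2. Let us split $B_{n+1}^{(p)}(\ell)$ into two terms, $B_{n+1}^{(p)}(\ell) = C_{n+1}^{(p)}(\ell) + D_{n+1}^{(p)}(\ell)$ with

\[ C_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell-1} e_{k+1}(\ell), \]

\[ D_{n+1}^{(p)}(\ell) \overset{\mathrm{def}}{=} \sum_{k=1}^{n} \gamma_k^{\ell/2} \left( \sqrt{v_k}\, Y_{k+1} \right)^{2p-\ell-1} \mathbb{E}\left[ \varepsilon_{k+1}^{\ell} \mid \mathcal{F}_k \right]. \]

For the first term $C_{n+1}^{(p)}$, we deduce from (4.4) and from Kronecker's lemma that it suffices to show that

\[ \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell-1} e_{k+1}(\ell) = o(s_n) \quad \text{a.s.}, \tag{4.22} \]

\[ \sum_{k=1}^{n} \gamma_k^{1+\ell/2} e_{k+1}(\ell) = o(s_n) \quad \text{a.s.} \]

Once again, one gets from the strong law of large numbers that for all $\delta > 0$,

\[ \left| \sum_{k=1}^{n} \gamma_k^{\ell/2} U_k^{2p-\ell-1} e_{k+1}(\ell) \right|^2 = o\left( \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell-1)} \left( \log \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell-1)} \right)^{1+\delta} \right) \quad \text{a.s.}, \]

\[ \sum_{k=1}^{n} \gamma_k^{1+\ell/2} e_{k+1}(\ell) = o\left( \sum_{k=1}^{n} \gamma_k^{2+\ell} \right) = o(s_n) \quad \text{a.s.} \]

First, assume that $\ell \ge p - 1$. The convergence (2.4) implies

\[ \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell-1)} = O(s_n) \quad \text{a.s.} \]

Hence, (4.22) is proven since $\gamma_n$ goes to zero. Now let $1 \le \ell < p - 1$. Since

\[ \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2(2p-\ell-1)} = O\left( \sup_{1 \le k \le n} U_k^{2(p-\ell-1)} \sum_{k=1}^{n} \gamma_k^{\ell} U_k^{2p} \right) \quad \text{a.s.}, \]

the combination of Lemma 4.2 and convergence (2.4) leads to (4.22). Finally, we have established that

\[ C_{n+1}^{(p)}(\ell) = o(s_n) \quad \text{a.s.} \tag{4.23} \]

Let us now study the asymptotic behavior of $D_{n+1}^{(p)}(\ell)$ in the following three cases.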
