On the convergence of moments in the almost sure central limit theorem for stochastic approximation algorithms

(1)

DOI:10.1051/ps/2011155 www.esaim-ps.org

ON THE CONVERGENCE OF MOMENTS IN THE ALMOST SURE CENTRAL LIMIT THEOREM FOR STOCHASTIC APPROXIMATION ALGORITHMS

Peggy C´ enac

Abstract. We study the almost sure asymptotic behaviour of stochastic approximation algorithms for the search of zero of a real function. The quadratic strong law of large numbers is extended to the powers greater than one. In other words, the convergence of moments in the almost sure central limit theorem (ASCLT) is established. As a by-product of this convergence, one gets another proof of ASCLT for stochastic approximation algorithms. The convergence result is applied to several examples as estimation of quantiles and recursive estimation of the mean.

Mathematics Subject Classification. 60F05, 62L20, 60G42.

Received October 1st, 2010.

1. Introduction

Let (X_n) be a sequence of independent and identically distributed (i.i.d.) random variables withE[Xn] = 0 andE[X_n²] =σ². For the empirical measures

G_n ^def= 1 logn

n k=1

1 kδSk√

k with S_n ^def= n k=1

X_k,

the almost sure central limit theorem (ASCLT) states that, with probability one,G_n=⇒GwhereGstands for theN(0, σ²) distribution and =⇒denotes the convergence in distribution. It was simultaneously established by Brosamler [5], Schatte [29] and in the present form by Lacey and Phillip [16]. Moreover, under assumptions on the moments of (X_n), the convergence of moments in the ASCLT has been established recently by Bercu [2], Bercu and Fort [3]. That is to say, the following convergences hold:

n→∞lim 1 n

n k=1

1 k

S_k

√k _2p

= σ^2p(2p)!

2^pp! a.s., (1.1)

n→∞lim 1 n

n k=1

1 k

S_k

√k _2p−1

= 0 a.s. (1.2)

In fact, this result is a simpliﬁed version for i.i.d. random variables. It is extended in a context of martingales.

The case of multidimensional martingales is considered in Bercu et al. [4]. Despite the availability of a wide

Keywords and phrases.Stochastic approximation algorithms, almost sure central limit theorem, martingale transforms, moments.

1 Institut de Math´ematiques de Bourgogne, IMB UMR 5584 CNRS, 9 rue Alain Savary, BP 47870, 21078 Dijon Cedex, France.

[email protected]

Article published by EDP Sciences c EDP Sciences, SMAI 2013

(2)

literature concerning the ASCLT for independent variables, very few references can be found on the ASCLT for martingales apart from the important contribution of Chaâbane [6,7], Chaâbane and Maâouia [8], Chaâbane et al. [9] and Lifshits [22,23]. As a by-product of the convergence of moments, Bercu and Fort [3] propose a proof of the ASCLT for unidimensional martingale transforms via an original approach which uses the Carleman moment theorem. In this paper, we prove that stochastic approximation algorithms used for the search of zeros and which are known to satisfy an ASCLT (see Pelletier [27]) also fulfill the convergence of moments in the ASCLT.

More precisely, we consider a stochastic algorithm of the form

Z_n+1=Z_n+γ_n[h(Z_n) +R_n+1] +σ_nε_n+1, (1.3) where the function h is defined on R and R-valued. The two sequences (Rn) and (ε_n) are two real disturbances, defined on a probability space and adapted to a filtrationF^def= (F_n)_n≥0. The stepsizes (γ_n) and (σ_n) are two deterministic positive sequences going to zero. This model includes the algorithms of Robbins–Monro and Kiefer–Wolfowitz as well as algorithms with Markovian disturbances. Stochastic approximation algorithms such as (1.3) used for the search of zeros ofhhave been widely studied under various assumptions (see Duflo [10] for an overview of these algorithms). In the context of Brownian diffusion process, Lamberton and Pagès [17,18]

establish the convergence of weighted empirical measures in the form ofG_n with a view to getting an approximation of the diffusion’s invariant distribution, even in the case when the diffusion may have several invariant distributions. As a corollary of their main convergence result, they get the almost sure central limit theorem in the context of Brownian diffusion processes.

Letz^∗be a zero ofh. Many criteria ensure the almost sure convergence of (Z_n) toz^∗. In the wide literature concerning this convergence, let us refer to Benvenisteet al.[1], Duﬂo [10], Dupuis and Kushner [11], Hall and Heyde [13], Kushner and Clark [15], Ljunget al.[24]. Moreover, in the case when (Z_n) converges almost surely toz^∗, the weak convergence rate is given by

γ_n

σ_n² (Z_n−z^∗) =⇒ N 0, Σ²

, (1.4)

whereΣ²is a positive real number related to the second moment of (ε_n) and to the diﬀerential of the functionh at the pointz^∗(see among many others Benvenisteet al.[1], Duﬂo [10], Kushner and Clark [15], Ljunget al.[24], Zhu [30]).

Let us deﬁnev_n ^def=γ_nσ⁻²_n . The main goal of this paper is to establish an analog result of (1.1) and (1.2) in the context of stochastic approximation algorithms. More precisely, under appropriate assumptions on moments of the noise (ε_n), we prove the following result: forp≥1,

n→∞lim _n1

k=1γ_k n k=1

γ_k[√

v_k(Z_k−z^∗)]^p=g(p) a.s., (1.5)

where g(p) is the moment of order p of the Gaussian distribution N(0, Σ²). The ASCLT for stochastic approximation algorithms can be viewed as a corollary of this result due to Carleman’s theorem. By the way, in the scalar case, we get another proof for the ASCLT established by Pelletier [27]. Moreover, in the particular casep= 2, the convergence (1.5) is the quadratic strong law of large numbers established by Le Breton [19], Le Breton and Novikov [20,21], Pelletier [26]. The main contribution of this paper is the extension (1.5) of the strong law to any powerp.

The paper is organized as follows. Section2is devoted to the assumptions and the presentation of the main results. In Section3, we propose some statistical applications to estimation errors for several particular cases of Robbins–Monro procedure. The proofs are established in Section4.

(3)

2. Assumptions and main results

Let us begin with the deﬁnition of the class of positive sequences that will be used in our assumptions. This deﬁnition is introduced in Mokkadem and Pelletier [25].

Definition 2.1. Let α ∈ R and (v_n) be a nonrandom positive sequence. Then (v_n) is said to be in the setGS(α) if

n→∞lim n

1−v_n−1 v_n

=α.

For example, typical sequences inGS(α) aren^α(logn)^β or n^α(log logn)^β, forα, β∈R. The assumptions we will refer to in the sequel are the following.

(H1) Z_n converges almost surely toz^∗.

(H2) The function his deﬁned onR andz^∗is a zero of hsuch that, on a neighborhood ofz^∗, h(z) =H(z−z^∗) +O |z−z^∗|²

, (2.1)

where H <0.

(H3) The noise (ε_n) is a martingale diﬀerence sequence such that

n→∞lim E

ε²_n+1|Fn

=σ² a.s.

(H4) The disturbance (R_n) is split into two terms such that R_n =r_n+O |Z_n−1−z^∗|²

a.s.,

|r_n|=o v_n^−1/2(logs_n)^−q

a.s. ∀q≥0, (2.2)

where s_n^def= n k=1

γ_k.

(H5) The nonrandom sequences (γ_n) and (σ_n) are such that (γ_n)∈ GS(−α) with α∈

max

1 2,2

a

,1

, (σ_n)∈ GS(−β) with β ∈α

2, α

,

n→∞lim nγ_n>−2β−α 2H · Letξ^def= lim_n→∞

nγ_n₋₁

. We can now deﬁne the asymptotic varianceΣ² which appears in (1.4):

Σ²^def= −σ² 2H+ξ(2β−α)·

Note that (H5) implies thatξ∈[0,−2H/(2β−α)[ and thus thatΣ²is well deﬁned and strictly positive.

Comments on the assumptions.

The residual term (R_n) enables the study of algorithms with small Markovian disturbances. The way these algorithms can be rewritten as equation (1.3) is detailed in Duﬂo [10].

(4)

Let us now comment the last assumption (H5) on the gains. For the usual gains γ_n= γ₀

n^α and σ_n= σ₀

√n^α+β, with γ₀>0, σ₀>0, and 0< β≤α,

if α ∈] max{1/2,2/a},1[ or (α = 1 and β < −2Hγ₀), assumption (H5) is fulﬁlled. In the particular case of Robbins–Monro algorithm, that is to say R_n = 0, σ_n =γ_n, (H5) contains for example the gainsγ_n =σ_n = γ₀/n^α with γ₀ >0 and α∈] max{1/2,2/a},1[ or (α= 1 andγ₀ > −1/2H). The Kiefer–Wolfowitz algorithm corresponds to the caseh=V where V is observable only together with a noise. For example classical gains satisfying (H5) could be

γ_n = γ₀

n^α, σ_n =n^τγ_n, α

6 ≤τ < α

2, and γ₀>0, σ₀>0,

α∈] max{1/2,2/a},1[ or (α= 1 andγ₀> ^2τ−1_2H ). In these examples, the optimal rate is obtained forγ_n=γ₀/n but leads to a condition on the initialγ₀. To circumvent this condition, Koval and Schwabe [14] introduced the stepsize, for any (p, d)∈N²,

γ_n= γ₀ log_pn_d

n , where log_pn= log

log_p−1n ,

which also fulﬁlls (H5). This stepsize leads to convergence rates very close to the ones obtained withγ₀/nbut without any constraints on the initialγ₀.

The main result of this paper is the following theorem, which establishes the convergence of moments in the ASCLT for stochastic algorithms.

Theorem 2.2. Set an integer p≥1. Assume that for some realm >2p,(ε_n)satisﬁes the moment condition supn≥0E[|εn+1|^m|Fn]<∞ a.s. (2.3) Under assumptions (H1) to(H5), one has

n→∞lim 1 s_n

n k=1

γ_k[√

v_k(Z_k−z^∗)]^2p = Σ^2p(2p)!

2^pp! a.s. (2.4)

n→∞lim 1 s_n

n k=1

γ_k[√

v_k(Z_k−z^∗)]^2p−1 = 0 a.s. (2.5)

In the particular casep= 1, the convergence (2.4) gives the quadratic strong law established by Pelletier [26].

Theorem2.2 extends this result to all powers = 2. The limiting constants in (2.4) and (2.5) correspond to the moments of the Gaussian distributionN(0, Σ²).

The classical moment problem is related to the question whether or not a given sequence uniquely determines the associated probability distribution. The well-known Carleman theorem (seee.g.Feller [12]) gives a condition on the moments in order to ensure the unicity of the distribution.

Theorem 2.3 (Carleman). A probability distribution is uniquely determined by its moments(m_n)if ∞

n=1

m^−1/2n_2n =∞.

Since the Gaussian limit distribution satisﬁes Carleman’s moment condition, we deduce the following corollary from Theorem2.2.

(5)

Corollary 2.4 (ASCLT).Assume that(ε_n)is a martingale diﬀerence sequence such that, for all integer p≥1, supn≥0E[|εn+1|^p|Fn]<∞ a.s.

Under assumptions (H1) to(H5), one has 1 s_n

n k=1

γ_kδ^√_v_k_(Z_k_−z∗)=⇒ N 0, Σ²

a.s.

As a straightforward application, the following strong law holds.

Corollary 2.5. Assume that(ε_n)is a martingale diﬀerence sequence such that, for all integer p≥1, supn≥0E[|εn+1|^p|Fn]<∞ a.s.

Moreover, assume that f stands for any almost everywhere continuous function, with polynomial growth at inﬁnity. Under assumptions(H1)to(H5), one has

n→∞lim 1 s_n

n k=1

γ_kf(√

v_k(Z_k−z^∗)) =

fdG_Σ a.s., whereG_Σ denotes the N(0, Σ)distribution.

This corollary gives approximations of gaussian integral of almost everywhere continuous function with polynomial growth at inﬁnity.

The Proof of Theorem2.2is given in Section4.

Remark 2.6. Assumptions (H1) to (H5) are equivalent to those of Pelletier [27] establishing the ASCLT. Here we assume the conditional moment assumption besides to get the stronger result of convergence of moments in the ASCLT.

3. Examples of applications

Let us give here three examples of estimation for which the Theorem 2.2 apply. These examples are dif- ferent forms of Robbins–Monro procedure, described as follows. Robbins–Monro algorithm (see Robbins and Monro [28]) is used for solving the equation f(x) = α, wheref is an R^d-valued function and α∈ R^d. More precisely, this algorithm is used when f can be rewritten in the formf(x) =E[F(x, X)]. Then the solution of the equationf(x) =αcan be recursively approximated by

U_n+1=U_n−γ_n(F(U_n, X_n+1)−α),

where γ_n≥0 is a deterministic sequence going to zero and (X_n) is a sequence of independent and identically distributed random variables. Obvisouly, as mentioned in the introduction, Robbins–Monro algorithm is a particular case of model (1.3).

3.1. Translation parameter

Let (Y_n) be a sample from a distribution F on R, with density f with respect to the Lebesgue measure.

The sequence of random variables (Y_n) is not observable, but eachY_n is supposed to be centered. We can only observe a translation model (X_n) where X_n = Y_n +θ. The real translation parameter θ is unknown. Since E[Y₁] = 0, the parameterθ is the mean ofX_n,θ=E[Xn].

(6)

Without knowing the function f, we may also assume that f is even, strictly positive and continuously differentiable. Let us define the recursive estimator ofθdefined by

θ_n+1=θ_n−γ_n

11_{X

n+1≤θn}−1 2

· (3.1)

This algorithm is a particular case of the model (1.3) withσ_n=γ_n,R_n+1= 0 and h(z) =1

2 −E

11_{X_n+1_≤z}|Fn

,

whereFnis theσ-ﬁeld of events prior tonandF^def= (Fn) is the natural ﬁltration. Indeed, (3.1) can be rewritten as θ_n+1=θ_n+γ_nh θ_n

+γ_nε_n+1, with ε_n+1 = E

11_{X

n+1≤θn}|F_n

−¹1_{X

n+1≤θn}. Obviously, assumptions (H4) and (2.3) for all integer p are satisﬁed, as well as (H3) with σ²= 1/4. It remains to trust (H2). SinceX_n+1 is independent ofFn, one has

h(z) = 1

2 −P(X_n+1≤z) = 1

2 −P(Y_n+1≤z−θ).

The functionf is continuous and strictly positive hencehis diﬀerentiable andh(z) =−f(z−θ). Moreover θ is a zero ofh. Taylor-Young formula leads to

h(z)−h(θ) =−f(0)(z−θ) +O

(z−θ)² ,

with f(0)>0. Thus (H2) holds. Consequently, for any integer p, assuming (H1) and choosing the steps (γ_n) such that (H5) holds, Theorem 2.2 gives the asymptotic results on the estimation errors. The sequence of estimates (θ_n) satisﬁes

n→∞lim 1 s_n

n k=1

1 γ_k^p−1

θ_k−θ _2p

= Σ^2p(2p)!

2^pp! a.s.

n→∞lim 1 s_n

n k=1

1 γ_k^p−³²

θ_k−θ _2p−1

= 0 a.s., withΣ²=

4

2f(0)−αξ₋₁

and the ASCLT for (θ_n) holds:

1 s_n

n k=1

γ_kδ√1γk(θk−θ) =⇒ N 0, Σ²

a.s.

3.2. Recursive estimation of quantiles

More generally, with a procedure of estimation analogous to the preceding one, we can give asymptotic convergence results on the error of quantile’s estimation. We consider a sample (Y_n) with a strictly increasing distribution function F. Let us deﬁne qthe quantile of orderδ, that is to sayδ^def=F(q). Without knowing the functionF, we may also assume that the densityf =F is continuously diﬀerentiable.

Let us deﬁne the recursive estimator

q_n+1=q_n−γ_n

11_{Y_n+1_≤_q_n_}−δ

. (3.2)

Again, the sequence (q_n) is a particular case of (1.3) withσ_n =γ_n,R_n+1= 0, h(z) =δ−E

11_{Y_n+1_≤z}|Fn

=δ−F(z),

(7)

andε_n+1=E

11_{Y_n+1_≤_q_n_}|Fn

−¹1_{Y_n+1_≤_q_n_}. Obviously, assumptions (H4) and (2.3) for all integerpare satisﬁed, as well as (H3) withσ²=δ(1−δ). Moreover, the functionhclearly satisﬁes (H2) for the zeroqof h. Conse- quently, for any integerp, assuming (H1) and choosing the steps such that (H5) holds, Theorem2.2leads to

n→∞lim 1 s_n

n k=1

1

γ_k^p−1(q_k−q)^2p = Σ^2p(2p)!

2^pp! a.s.

n→∞lim 1 s_n

n k=1

1

γ_k^p−³² (q_k−q)^2p−1 = 0 a.s., withΣ²=δ(1−δ)/[2f(q)−αξ]. Moreover we have

1 s_n

n k=1

γ_kδ√1

γk(qk−q)=⇒ N 0, Σ²

a.s.

3.3. Recursive estimation of the mean

Let (Y_n) be a sequence of independent and identically distributed real random variables with mean μ and varianceσ². Set an integerp≥1. Assume that for some realm >2p, (Y_n) satisﬁes the moment condition

supn≥0E[|Yn+1|^m|Fn]<∞ a.s.

Let us consider the recursive estimator of the meanμdeﬁned by

μ_n+1=μ_n+γ_n(Y_n+1−μ_n).

This model is again a particular case of the model (1.3) with h(z) = μ−z and ε_n+1 = Y_n+1−μ. Assump- tions (H2), (H3), (H4) and (2.3) are satisﬁed. Consequently, assuming (H1) and choosing the steps such that (H5) holds, the cumulated estimation error satisﬁes

n→∞lim 1 s_n

n k=1

1

γ_k^p−1(μ_k−μ)^2p = Σ^2p(2p)!

2^pp! a.s.

n→∞lim 1 s_n

n k=1

1

γ_k^p−³² (μ_k−μ)^2p−1 = 0 a.s., withΣ²=σ²/(2−αξ), as well as the ASCLT

1 s_n

n k=1

γ_kδ√1γk(μk−μ)=⇒ N 0, Σ²

a.s.

4. Proof of Theorem 2.2

Remark 4.1. Assumption (H5) implies thatv_n ∈ GS(−α+ 2β), since v_n = γ_nσ⁻²_n , γ_n ∈ GS(−α) and σ_n ∈ GS(−β). Consequently one has

1−v_n−1

v_n =−α+ 2β n +o

1 n

=ξγ_n(−α+ 2β) +o(γ_n). Finally it comes

v_n−1

v_n = 1−γ_nξ(2β−α) +o(γ_n). (4.1)

This remark is very helpful for the proof of Theorem2.2and (4.1) is widely used.

(8)

First, the convergence is established for even moments, by induction on the power p. The recursive equation (1.3) yields

Z_n+1−z^∗=Y_n+1+σ_nε_n+1 where Y_n+1^def=Z_n−z^∗+γ_n[h(Z_n) +R_n+1]. (4.2) In view of assumptions (H2) and (H4), the sequence Y_n+1satisﬁes

Y_n+1= (Z_n−z^∗) [1 +Hγ_n] +γ_nr_n+1+O γ_n|Zn−z^∗|² .

In order to prove Theorem2.2, we shall apply several times the following lemma, established in Mokkadem and Pelletier [25].

Lemma 4.2. Assuming that there exists m > 2 such that (2.3) holds, under assumptions (H1) to (H5), the stepsized stochastic algorithm (Z_n)satisﬁes

|Z_n−z^∗|=O

v_n⁻¹logs_n

a.s. (4.3)

In particular, Lemma4.2consequently leads to

Y_n+1= (Z_n−z^∗) [1 +Hγ_n] +o γ_nv^−1/2_n (logs_n)^−q

a.s. (4.4)

for allq≥0.

4.1. Even moments

Elevating (4.2) to the power 2pit comes

(Z_n+1−z^∗)^2p= 2p

=0

C_2p σ_nε_n+1Y_n+1^2p−.

Setting V_n+1^def=√ v_n

Z_n+1−z^∗

, we get

V_n+1^2p −V_n^2p=v^p_nY_n+1^2p −V_n^2p+γ_n^pε^2p_n+1+

2p−1

=1

C_2p γ_n^/2(√

v_nY_n+1)^2p−ε_n+1. (4.5) From the obvious equality

V_n+1^2p = n k=1

V_k+1^2p −V_k^2p

+V₁^p, and from the recursive equation (4.5), one gets for anyn≥1,

V_n+1^2p =V₁^2p+A^(p)_n+1+W_n+1^(p) +

2p−1

=1

C_2p B_n+1^(p) (), (4.6)

where

W_n+1^(p) ^def= n k=1

γ_k^pε^2p_k+1, A^(p)_n+1^def= n k=1

v_k^pY_k+1^2p −V_k^2p

,

B_n+1^(p) ()^def= n k=1

γ_k^/2(√

v_kY_k+1)^2p−ε_k+1.

(9)

4.1.1. Case p= 1

Clearly equation (4.6) can be rewritten as

A⁽¹⁾_n+1=−W_n+1⁽¹⁾ +V_n+1² −V₁²−2B_n+1⁽¹⁾ (1). (4.7) Since the noise (ε_n) satisﬁes (H3) and assumption (2.3) for a real numberm >2, let us note that Chow’s lemma (seee.g.Duﬂo [10], p. 22) yields

n→∞lim 1

s_nW_n+1⁽¹⁾ =σ² a.s. (4.8)

Let us assume that

1 s_n

V_n+1² −V₁²−2B⁽¹⁾_n+1(1)

=o(1) +o s⁻¹_n A⁽¹⁾_n+1

a.s. (4.9)

We ﬁrst show how the combination of (4.8) and (4.9) gives the convergence (2.4) forp= 1. From equality (4.4) one has

Y_n+1² = (Z_n−z^∗)²(1 + 2γ_nH+o(γ_n)) +o γ_n²v_n⁻¹(logs_n)^−2q +o |Zn−z^∗|γ_nv_n^−1/2(logs_n)^−q

= (Z_n−z^∗)²(1 + 2γ_nH+o(γ_n)) +o v_n⁻¹γ_n(logs_n)^−q+1/2

,

the second equality resulting from Lemma 4.2. Consequently, applying this equality forq = 1/2 and denoting U_n ^def=√

v_n(Z_n−z^∗), the general term ofA⁽¹⁾_n+1 is of the form U_n²

1−v_n−1

v_n + 2γ_nH+o(γ_n)

+o(γ_n) a.s.

Hence we deduce from (4.1) by applying Kronecker lemma that

n→∞lim 1 s_n

n k=1

γ_kU_k²= 1

2H+ξ(2β−α) lim

n→∞

1

s_nA⁽¹⁾_n+1 a.s. (4.10)

Then the combination of (4.7)–(4.10) leads to (2.4).

It remains to prove (4.9). Obviously sinces_n increases to inﬁnity, almost surelyV₁^2p=o(s_n). Moreover, from Lemma 4.2, we have a.s.V_n+1² =o(s_n). To conclude we have to study the asymptotic behavior of B⁽¹⁾_n+1(1). We easily deduce from Kronecker lemma and from the decomposition (4.4) that we only have to show that

n k=1

√γ_kU_kε_k+1 =o A⁽¹⁾_n+1

a.s. and n k=1

γ_k^3/2ε_k+1=o(s_n) a.s.

The standard strong law of large numbers for regressive series (seee.g.Duﬂo [10], p. 26) yields

n k=1

√γ_kU_kε_k+1 =o

_n

k=1

γ_kU_k²

+O(1) a.s.

n k=1

γ_k^3/2ε_k+1 =O

_n

k=1

γ_k³

=o(s_n) a.s.

Since from (4.10), one has

o _n

k=1

γ_kv_k(Z_k−z^∗)²

=o A⁽¹⁾_n+1 a.s.,

(10)

it comes

B⁽¹⁾_n+1(1) =o A⁽¹⁾_n+1

+o(s_n) a.s., (4.11)

which concludes the proof of (4.9).

4.1.2. Case p≥2

Now, let p≥2 and assume that the convergence (2.4) holds for any powerq with 1≤q ≤p−1. First, we prove that

n→∞lim 1 s_n

n k=1

γ_kU_k^2p=−Σ² pσ² lim

n→∞

1

s_nA^(p)_n+1 a.s. (4.12)

From the decomposition (4.4) together with the result (4.3) of Lemma 4.2, it is easy to see that, for all q ≥

−p+ 1/2

Y_n+1^2p = (Z_n−z^∗)^2p(1 + 2pHγ_n+o(γ_n)) +o γ_nv_n^−p(logs_n)^−q

. In particular forq= 0, one has

v_n^pY_n+1^2p −V_n^2p=v^p_n(Z_n−z^∗)^2p

1−v^p_n−1 v^pn

+ 2pHγ_n+o(γ_n)

+o(γ_n) a.s.

Sincev_n ∈ GS(2β−α), the sequencev^p_n belongs toGS

p(2β−α)

and hence 1−v^p_n−1

v_n^p =p(2β−α)ξγ_n+o(γ_n). Finally, we get

v^p_nY_n+1^2p −V_n^2p=pγ_nU_n^2p[ξ(2β−α) + 2H] (1 +o(1)) +o(γ_n), and (4.12) is a straightforward application of Kronecker lemma. Thus, we have to show that

n→∞lim 1

s_nA^(p)_n+1=−pσ²(2p−1)g[2(p−1)] a.s. (4.13) The proof of (4.13) is constructed in the following way. Let us note that Lemma4.2implies thatV_n+1^2p =o(s_n).

First we establish the convergence ofs⁻¹_n W_n+1^(p) to zero. Then we show that

∀∈ {1, . . . ,2p−1}\{2} B_n+1^(p) () =o A^(p)_n+1

+o(s_n) a.s.

n→∞lim B^(p)_n+1(2) =g[2(p−1)]σ² a.s.

The decomposition (4.6) will lead to convergence (4.13).

Since the noise (ε_n) satisﬁes (2.3), by applying Chow’s lemma it comes W_n+1^(p) =O

_n

k=1

γ_k^p

a.s.

In addition,γ_nconverges almost surely to zero, henceγ_n^p=o(γ_n) a.s. Consequently, one gets from Kronecker’s Lemma

W_n+1^(p) =o(s_n) a.s. (4.14)

(11)

For establishing the asymptotic behavior ofB^(p)_n+1, let us split it into two terms,B_n+1^(p) () =C_n+1^(p) () +D^(p)_n+1() with

C_n+1^(p) ()^def= n k=1

γ_k^/2(√

v_kY_k+1)^2p−e_k+1(), D^(p)_n+1()^def=

n k=1

γ_k^/2(√

v_kY_k+1)^2p−E

ε_k+1|Fk

,

ande_n+1()^def=ε_n+1−E

ε_n+1|Fn

. First, we claim that for any 3≤≤2p−1,

D^(p)_n+1()=o(s_n) a.s. (4.15)

One can easily see from the H¨older inequality and from the moment assumption (2.3) that each moment E

ε_n+1|Fnis almost surely bounded and consequently D^(p)_n+1()=O

_n

k=1

γ_k^/2(√

v_kY_k+1)^2p−

a.s.

Ifis even, for allq≥0, (4.4) leads to, γ_n^/2(√

v_nY_n+1)^2p− =γ_n^/2

U_n(1 +o(1)) +o γ_n(logs_n)^−q _2p−

=γ_n^/2

U_n^2p−(1 +o(1)) +o γ_nU_n^2p−−1(logs_n)^−q

=γ_n^/2U_n^2p−[1 +o(1)] +o γ_n^/2+1(logs_n)−q+p−/2−1/2 . Then the induction assumption together with Kronecker lemma yield

n k=1

γ^/2_k (√

v_kY_k+1)^2p−=o(s_n) a.s.

Consequently the convergence (4.15) holds foreven such that 3≤≤2p−1. Otherwise in the case whenis odd and≥3 (sinceDn+1(1) = 0 is a trivial case), applying the Cauchy–Schwarz inequality it comes

D^(p)_n+1()²=O _n

k=1

γ_k(√

v_kY_k+1)^2p−2 n k=1

γ_k⁻¹(√

v_kY_k+1)^2(p−+1)

a.s.

Again the induction assumption and Kronecker’s lemma lead to (4.15). For the last termD⁽²⁾_n+1(), the convergence

n→∞lim 1

s_nD^(p)_n+1(2) =g[2(p−1)]σ² (4.16) is a consequence of the induction assumption and Kronecker lemma too.

It remains to study the asymptotic behavior ofC_n+1^(p) () for any 1≤≤2p−1. We claim that C_n+1^(p) ()=o(s_n) +o A^(p)_n+1

a.s. (4.17)

(12)

We easily deduce from (4.4) and from Kronecker lemma that we only have to show that n

k=1

γ_k^/2U_k^2p−e_k+1() =o(s_n) +o A^(p)_n+1 a.s.

n k=1

γ_k^1+/2e_k+1() =o(s_n) a.s.

According to the standard strong law of large numbers for regressive series one has, for anyδ >0,

n k=1

γ_k^/2U_k^2p−e_k+1()

2

=o

⎛

⎝ⁿ

k=1

γ_kU_k^2(2p−)

log n k=1

γ_kU_k^2(2p−) _1+δ⎞

⎠ a.s.

n k=1

γ_k^1+/2e_k+1() =o

_n

k=1

γ_k²⁺

a.s.

Sinceγ_n decreases to zero, it comes in particular

n k=1

γ_k^1+/2e_k+1()

=o(s_n) a.s.

First, assume that 1≤≤p−1. Since n k=1

γ_kU_k^2(2p−)=O

1≤k≤nsup U_k^2p n k=1

γ_kU_k^2(p−)

a.s., the conjonction of Lemma4.2and the induction assumption leads to

n k=1

γ_k^/2U_k^2p−e_k+1()

=o(s_n) a.s.

The particular case=pis also a consequence of both the strong law of large numbers and Kronecker lemma and one has

n k=1

γ_k^pU_k^2p=o A^(p)_n+1 a.s.

Consequently Kronecker lemma leads to (4.17).

Now assume thatp < ≤2p−1. According to Chow’s lemma, one has

n k=1

γ_k^/2U_k^2p−e_k+1()

=o(ν_n()) a.s. with ν_n()^def= n k=1

γ_k^/2U_k^2p−^2p/.

Applying H¨older inequality with the exponents/pand/(−p), the induction assumption impliesν_n() =o(s_n), which concludes the proof of (4.17).

Finally the conjonction of the convergences (4.14)–(4.17), leads to

n→∞lim 1 s_n

A^(p)_n+1+W_n+1^(p) +

2p−1

=1

C_2p B^(p)_n+1()

= lim

n→∞

1

s_nA^(p)_n+1+C_2p² σ²g[2(p−1)] a.s.

So, from the decomposition (4.6) we get

n→∞lim 1

s_nA^(p)_n+1=−C_2p² σ²g[2(p−1)] a.s., which concludes the proof of (2.4).

(13)

4.2. Odd moments

Now let us deal with the odd moments with the proof of (2.5). Elevating equation (4.2) to power 2p−1 and proceeding exactly as in the case of even moments, it comes

V_n+1^2p−1=V₁^2p−1+A^(p)_n+1+W_n+1^(p) +

2p−2

=1

C_2p−1 B^(p)_n+1(), (4.18)

where

W_n+1^(p) ^def= n k=1

γ_k^p−1/2ε^2p−1_k+1 , A^(p)_n+1^def= n k=1

v_k^p−1/2Y_k+1^2p−1−V_k^2p−1

,

B^(p)_n+1()^def= n k=1

γ_k^/2(√

v_kY_k+1)^2p−−1ε_k+1. 4.2.1. Case p= 1

Forp= 1, the decomposition (4.18) can be rewritten as

A⁽¹⁾_n+1=V_n+1−V₁−W_n+1⁽¹⁾ . (4.19) An application of the standard strong law of large numbers for regressive series leads to

W_n+1⁽¹⁾ =o(s_n) a.s.

ClearlyV₁ =o s_n

sinces_n increases to inﬁnity and again from Lemma4.2, one hasV_n+1 =o(s_n) a.s. Hence we deduce from (4.19) thatA_n+1=o(s_n) a.s.

Moreover, due to the decomposition (4.4) the general term ofA_n+1 satisﬁes for allq≥0,

√v_nY_n+1−V_n =U_n

1 +Hγ_n− v_n−1

v_n

+o γ_n(logs_n)^−q

a.s. (4.20)

Sincev_n ∈ GS(2β−α),√

v_n∈ GS(β−α/2) and consequently 1−

v_n−1

v_n =ξγ_n(β−α/2) +o(γ_n). In particular forq= 0, the equation (4.20) yields

√v_nY_n+1−V_n =γ_nU_n[H+ξ(β−α/2)] (1 +o(1)) +o(γ_n) a.s., which clearly leads to (2.5) from Kronecker lemma, since

n→∞lim 1 s_n

n k=1

γ_kU_k = 1

H+ξ(β−α/2) lim

n→∞

1

s_nA⁽¹⁾_n+1= 0 a.s.

4.2.2. Case p≥2

Now letp≥2 and assume that convergence (2.5) holds for any powerqwith 1≤q≤p−1. The structure of the proof is the same as in the case of even moments. It consists in establishing the convergenceA^(p)_n+1=o(s_n) from the decomposition (4.18).

(14)

First, Chow’s Lemma implies

W_n+1^(p) =O _n

k=1

γ_k^p−1/2

a.s.

Sincep≥2 andγ_n goes to zero, one gets

W_n+1^(p) =o(s_n) a.s. (4.21)

Moreover, clearly we haveV₁^p=o(s_n) a.s. andV_n+1^p =o(s_n) a.s. from Lemma4.2. Let us splitB_n+1^(p) () into two terms,B^(p)_n+1() =C_n+1^(p) () +D^(p)_n+1() with

C_n+1^(p) ()^def= n k=1

γ^/2_k (√

v_kY_k+1)^2p−−1e_k+1(),

D^(p)_n+1()^def= n k=1

γ^/2_k (√

v_kY_k+1)^2p−−1E

ε_k+1|Fk

.

For the ﬁrst termC_n+1^(p) , we deduce from (4.4) and from Kronecker lemma that it suﬃces to show that n

k=1

γ_k^/2U_k^2p−−1e_k+1() =o(s_n) a.s. (4.22) n

k=1

γ_k^1+/2e_k+1() =o(s_n) a.s.

Once again, one gets from the strong law of large numbers that for allδ >0,

n k=1

γ_k^/2U_k^2p−−1e_k+1()

2

=o

⎛

⎝ⁿ

k=1

γ_kU_k^2(2p−−1)

log n k=1

γ_kU_k^2(2p−−1) _1+δ⎞

⎠ a.s.

n k=1

γ_k^1+/2e_k+1() =o

_n

k=1

γ²⁺_k

=o(s_n) a.s.

First, assume that≥p−1. The convergence (2.4) implies n

k=1

γ_kU_k^2(2p−−1)=O(s_n) a.s.

Hence, (4.22) is proven sinceγ_n goes to zero. Now let 1≤ < p−1. Since n

k=1

γ_kU_k^2(2p−−1)=O

1≤k≤nsup U_k^2(p−−1) n k=1

γ_kU_k^2p

a.s.,

the conjonction of Lemma4.2and convergence (2.4) leads to (4.22). Finally, we have established that

C_n+1^(p) () =o(s_n) a.s. (4.23)

Let us now study the asymptotic behavior ofD_n+1() in the following three cases.