DOI:10.1051/ps/2011155 www.esaim-ps.org
ON THE CONVERGENCE OF MOMENTS IN THE ALMOST SURE CENTRAL LIMIT THEOREM FOR STOCHASTIC APPROXIMATION ALGORITHMS
Peggy C´ enac
Abstract. We study the almost sure asymptotic behaviour of stochastic approximation algorithms for the search of zero of a real function. The quadratic strong law of large numbers is extended to the powers greater than one. In other words, the convergence of moments in the almost sure central limit theorem (ASCLT) is established. As a by-product of this convergence, one gets another proof of ASCLT for stochastic approximation algorithms. The convergence result is applied to several examples as estimation of quantiles and recursive estimation of the mean.
Mathematics Subject Classification. 60F05, 62L20, 60G42.
Received October 1st, 2010.
1. Introduction
Let (Xn) be a sequence of independent and identically distributed (i.i.d.) random variables withE[Xn] = 0 andE[Xn2] =σ2. For the empirical measures
Gn def= 1 logn
n k=1
1 kδSk√
k with Sn def= n k=1
Xk,
the almost sure central limit theorem (ASCLT) states that, with probability one,Gn=⇒GwhereGstands for theN(0, σ2) distribution and =⇒denotes the convergence in distribution. It was simultaneously established by Brosamler [5], Schatte [29] and in the present form by Lacey and Phillip [16]. Moreover, under assumptions on the moments of (Xn), the convergence of moments in the ASCLT has been established recently by Bercu [2], Bercu and Fort [3]. That is to say, the following convergences hold:
n→∞lim 1 n
n k=1
1 k
Sk
√k 2p
= σ2p(2p)!
2pp! a.s., (1.1)
n→∞lim 1 n
n k=1
1 k
Sk
√k 2p−1
= 0 a.s. (1.2)
In fact, this result is a simplified version for i.i.d. random variables. It is extended in a context of martingales.
The case of multidimensional martingales is considered in Bercu et al. [4]. Despite the availability of a wide
Keywords and phrases.Stochastic approximation algorithms, almost sure central limit theorem, martingale transforms, moments.
1 Institut de Math´ematiques de Bourgogne, IMB UMR 5584 CNRS, 9 rue Alain Savary, BP 47870, 21078 Dijon Cedex, France.
Article published by EDP Sciences c EDP Sciences, SMAI 2013
literature concerning the ASCLT for independent variables, very few references can be found on the ASCLT for martingales apart from the important contribution of Chaˆabane [6,7], Chaˆabane and Maˆaouia [8], Chaˆabane et al. [9] and Lifshits [22,23]. As a by-product of the convergence of moments, Bercu and Fort [3] propose a proof of the ASCLT for unidimensional martingale transforms via an original approach which uses the Carleman moment theorem. In this paper, we prove that stochastic approximation algorithms used for the search of zeros and which are known to satisfy an ASCLT (see Pelletier [27]) also fulfill the convergence of moments in the ASCLT.
More precisely, we consider a stochastic algorithm of the form
Zn+1=Zn+γn[h(Zn) +Rn+1] +σnεn+1, (1.3) where the function h is defined on R and R-valued. The two sequences (Rn) and (εn) are two real distur- bances, defined on a probability space and adapted to a filtrationFdef= (Fn)n≥0. The stepsizes (γn) and (σn) are two deterministic positive sequences going to zero. This model includes the algorithms of Robbins–Monro and Kiefer–Wolfowitz as well as algorithms with Markovian disturbances. Stochastic approximation algorithms such as (1.3) used for the search of zeros ofhhave been widely studied under various assumptions (see Duflo [10] for an overview of these algorithms). In the context of Brownian diffusion process, Lamberton and Pag`es [17,18]
establish the convergence of weighted empirical measures in the form ofGn with a view to getting an approxi- mation of the diffusion’s invariant distribution, even in the case when the diffusion may have several invariant distributions. As a corollary of their main convergence result, they get the almost sure central limit theorem in the context of Brownian diffusion processes.
Letz∗be a zero ofh. Many criteria ensure the almost sure convergence of (Zn) toz∗. In the wide literature concerning this convergence, let us refer to Benvenisteet al.[1], Duflo [10], Dupuis and Kushner [11], Hall and Heyde [13], Kushner and Clark [15], Ljunget al.[24]. Moreover, in the case when (Zn) converges almost surely toz∗, the weak convergence rate is given by
γn
σn2 (Zn−z∗) =⇒ N 0, Σ2
, (1.4)
whereΣ2is a positive real number related to the second moment of (εn) and to the differential of the functionh at the pointz∗(see among many others Benvenisteet al.[1], Duflo [10], Kushner and Clark [15], Ljunget al.[24], Zhu [30]).
Let us definevn def=γnσ−2n . The main goal of this paper is to establish an analog result of (1.1) and (1.2) in the context of stochastic approximation algorithms. More precisely, under appropriate assumptions on moments of the noise (εn), we prove the following result: forp≥1,
n→∞lim n1
k=1γk n k=1
γk[√
vk(Zk−z∗)]p=g(p) a.s., (1.5)
where g(p) is the moment of order p of the Gaussian distribution N(0, Σ2). The ASCLT for stochastic ap- proximation algorithms can be viewed as a corollary of this result due to Carleman’s theorem. By the way, in the scalar case, we get another proof for the ASCLT established by Pelletier [27]. Moreover, in the particular casep= 2, the convergence (1.5) is the quadratic strong law of large numbers established by Le Breton [19], Le Breton and Novikov [20,21], Pelletier [26]. The main contribution of this paper is the extension (1.5) of the strong law to any powerp.
The paper is organized as follows. Section2is devoted to the assumptions and the presentation of the main results. In Section3, we propose some statistical applications to estimation errors for several particular cases of Robbins–Monro procedure. The proofs are established in Section4.
2. Assumptions and main results
Let us begin with the definition of the class of positive sequences that will be used in our assumptions. This definition is introduced in Mokkadem and Pelletier [25].
Definition 2.1. Let α ∈ R and (vn) be a nonrandom positive sequence. Then (vn) is said to be in the setGS(α) if
n→∞lim n
1−vn−1 vn
=α.
For example, typical sequences inGS(α) arenα(logn)β or nα(log logn)β, forα, β∈R. The assumptions we will refer to in the sequel are the following.
(H1) Zn converges almost surely toz∗.
(H2) The function his defined onR andz∗is a zero of hsuch that, on a neighborhood ofz∗, h(z) =H(z−z∗) +O |z−z∗|2
, (2.1)
where H <0.
(H3) The noise (εn) is a martingale difference sequence such that
n→∞lim E
ε2n+1|Fn
=σ2 a.s.
(H4) The disturbance (Rn) is split into two terms such that Rn =rn+O |Zn−1−z∗|2
a.s.,
|rn|=o vn−1/2(logsn)−q
a.s. ∀q≥0, (2.2)
where sndef= n k=1
γk.
(H5) The nonrandom sequences (γn) and (σn) are such that (γn)∈ GS(−α) with α∈
max
1 2,2
a
,1
, (σn)∈ GS(−β) with β ∈α
2, α
,
n→∞lim nγn>−2β−α 2H · Letξdef= limn→∞
nγn−1
. We can now define the asymptotic varianceΣ2 which appears in (1.4):
Σ2def= −σ2 2H+ξ(2β−α)·
Note that (H5) implies thatξ∈[0,−2H/(2β−α)[ and thus thatΣ2is well defined and strictly positive.
Comments on the assumptions.
The residual term (Rn) enables the study of algorithms with small Markovian disturbances. The way these algorithms can be rewritten as equation (1.3) is detailed in Duflo [10].
Let us now comment the last assumption (H5) on the gains. For the usual gains γn= γ0
nα and σn= σ0
√nα+β, with γ0>0, σ0>0, and 0< β≤α,
if α ∈] max{1/2,2/a},1[ or (α = 1 and β < −2Hγ0), assumption (H5) is fulfilled. In the particular case of Robbins–Monro algorithm, that is to say Rn = 0, σn =γn, (H5) contains for example the gainsγn =σn = γ0/nα with γ0 >0 and α∈] max{1/2,2/a},1[ or (α= 1 andγ0 > −1/2H). The Kiefer–Wolfowitz algorithm corresponds to the caseh=V where V is observable only together with a noise. For example classical gains satisfying (H5) could be
γn = γ0
nα, σn =nτγn, α
6 ≤τ < α
2, and γ0>0, σ0>0,
α∈] max{1/2,2/a},1[ or (α= 1 andγ0> 2τ−12H ). In these examples, the optimal rate is obtained forγn=γ0/n but leads to a condition on the initialγ0. To circumvent this condition, Koval and Schwabe [14] introduced the stepsize, for any (p, d)∈N2,
γn= γ0 logpnd
n , where logpn= log
logp−1n ,
which also fulfills (H5). This stepsize leads to convergence rates very close to the ones obtained withγ0/nbut without any constraints on the initialγ0.
The main result of this paper is the following theorem, which establishes the convergence of moments in the ASCLT for stochastic algorithms.
Theorem 2.2. Set an integer p≥1. Assume that for some realm >2p,(εn)satisfies the moment condition supn≥0E[|εn+1|m|Fn]<∞ a.s. (2.3) Under assumptions (H1) to(H5), one has
n→∞lim 1 sn
n k=1
γk[√
vk(Zk−z∗)]2p = Σ2p(2p)!
2pp! a.s. (2.4)
n→∞lim 1 sn
n k=1
γk[√
vk(Zk−z∗)]2p−1 = 0 a.s. (2.5)
In the particular casep= 1, the convergence (2.4) gives the quadratic strong law established by Pelletier [26].
Theorem2.2 extends this result to all powers = 2. The limiting constants in (2.4) and (2.5) correspond to the moments of the Gaussian distributionN(0, Σ2).
The classical moment problem is related to the question whether or not a given sequence uniquely determines the associated probability distribution. The well-known Carleman theorem (seee.g.Feller [12]) gives a condition on the moments in order to ensure the unicity of the distribution.
Theorem 2.3 (Carleman). A probability distribution is uniquely determined by its moments(mn)if ∞
n=1
m−1/2n2n =∞.
Since the Gaussian limit distribution satisfies Carleman’s moment condition, we deduce the following corollary from Theorem2.2.
Corollary 2.4 (ASCLT).Assume that(εn)is a martingale difference sequence such that, for all integer p≥1, supn≥0E[|εn+1|p|Fn]<∞ a.s.
Under assumptions (H1) to(H5), one has 1 sn
n k=1
γkδ√vk(Zk−z∗)=⇒ N 0, Σ2
a.s.
As a straightforward application, the following strong law holds.
Corollary 2.5. Assume that(εn)is a martingale difference sequence such that, for all integer p≥1, supn≥0E[|εn+1|p|Fn]<∞ a.s.
Moreover, assume that f stands for any almost everywhere continuous function, with polynomial growth at infinity. Under assumptions(H1)to(H5), one has
n→∞lim 1 sn
n k=1
γkf(√
vk(Zk−z∗)) =
fdGΣ a.s., whereGΣ denotes the N(0, Σ)distribution.
This corollary gives approximations of gaussian integral of almost everywhere continuous function with polyno- mial growth at infinity.
The Proof of Theorem2.2is given in Section4.
Remark 2.6. Assumptions (H1) to (H5) are equivalent to those of Pelletier [27] establishing the ASCLT. Here we assume the conditional moment assumption besides to get the stronger result of convergence of moments in the ASCLT.
3. Examples of applications
Let us give here three examples of estimation for which the Theorem 2.2 apply. These examples are dif- ferent forms of Robbins–Monro procedure, described as follows. Robbins–Monro algorithm (see Robbins and Monro [28]) is used for solving the equation f(x) = α, wheref is an Rd-valued function and α∈ Rd. More precisely, this algorithm is used when f can be rewritten in the formf(x) =E[F(x, X)]. Then the solution of the equationf(x) =αcan be recursively approximated by
Un+1=Un−γn(F(Un, Xn+1)−α),
where γn≥0 is a deterministic sequence going to zero and (Xn) is a sequence of independent and identically distributed random variables. Obvisouly, as mentioned in the introduction, Robbins–Monro algorithm is a particular case of model (1.3).
3.1. Translation parameter
Let (Yn) be a sample from a distribution F on R, with density f with respect to the Lebesgue measure.
The sequence of random variables (Yn) is not observable, but eachYn is supposed to be centered. We can only observe a translation model (Xn) where Xn = Yn +θ. The real translation parameter θ is unknown. Since E[Y1] = 0, the parameterθ is the mean ofXn,θ=E[Xn].
Without knowing the function f, we may also assume that f is even, strictly positive and continuously differentiable. Let us define the recursive estimator ofθdefined by
θn+1=θn−γn
11{X
n+1≤θn}−1 2
· (3.1)
This algorithm is a particular case of the model (1.3) withσn=γn,Rn+1= 0 and h(z) =1
2 −E
11{Xn+1≤z}|Fn
,
whereFnis theσ-field of events prior tonandFdef= (Fn) is the natural filtration. Indeed, (3.1) can be rewritten as θn+1=θn+γnh θn
+γnεn+1, with εn+1 = E
11{X
n+1≤θn}|Fn
−11{X
n+1≤θn}. Obviously, assumptions (H4) and (2.3) for all integer p are satisfied, as well as (H3) with σ2= 1/4. It remains to trust (H2). SinceXn+1 is independent ofFn, one has
h(z) = 1
2 −P(Xn+1≤z) = 1
2 −P(Yn+1≤z−θ).
The functionf is continuous and strictly positive hencehis differentiable andh(z) =−f(z−θ). Moreover θ is a zero ofh. Taylor-Young formula leads to
h(z)−h(θ) =−f(0)(z−θ) +O
(z−θ)2 ,
with f(0)>0. Thus (H2) holds. Consequently, for any integer p, assuming (H1) and choosing the steps (γn) such that (H5) holds, Theorem 2.2 gives the asymptotic results on the estimation errors. The sequence of estimates (θn) satisfies
n→∞lim 1 sn
n k=1
1 γkp−1
θk−θ 2p
= Σ2p(2p)!
2pp! a.s.
n→∞lim 1 sn
n k=1
1 γkp−32
θk−θ 2p−1
= 0 a.s., withΣ2=
4
2f(0)−αξ−1
and the ASCLT for (θn) holds:
1 sn
n k=1
γkδ√1γk(θk−θ) =⇒ N 0, Σ2
a.s.
3.2. Recursive estimation of quantiles
More generally, with a procedure of estimation analogous to the preceding one, we can give asymptotic convergence results on the error of quantile’s estimation. We consider a sample (Yn) with a strictly increasing distribution function F. Let us define qthe quantile of orderδ, that is to sayδdef=F(q). Without knowing the functionF, we may also assume that the densityf =F is continuously differentiable.
Let us define the recursive estimator
qn+1=qn−γn
11{Yn+1≤qn}−δ
. (3.2)
Again, the sequence (qn) is a particular case of (1.3) withσn =γn,Rn+1= 0, h(z) =δ−E
11{Yn+1≤z}|Fn
=δ−F(z),
andεn+1=E
11{Yn+1≤qn}|Fn
−11{Yn+1≤qn}. Obviously, assumptions (H4) and (2.3) for all integerpare satisfied, as well as (H3) withσ2=δ(1−δ). Moreover, the functionhclearly satisfies (H2) for the zeroqof h. Conse- quently, for any integerp, assuming (H1) and choosing the steps such that (H5) holds, Theorem2.2leads to
n→∞lim 1 sn
n k=1
1
γkp−1(qk−q)2p = Σ2p(2p)!
2pp! a.s.
n→∞lim 1 sn
n k=1
1
γkp−32 (qk−q)2p−1 = 0 a.s., withΣ2=δ(1−δ)/[2f(q)−αξ]. Moreover we have
1 sn
n k=1
γkδ√1
γk(qk−q)=⇒ N 0, Σ2
a.s.
3.3. Recursive estimation of the mean
Let (Yn) be a sequence of independent and identically distributed real random variables with mean μ and varianceσ2. Set an integerp≥1. Assume that for some realm >2p, (Yn) satisfies the moment condition
supn≥0E[|Yn+1|m|Fn]<∞ a.s.
Let us consider the recursive estimator of the meanμdefined by
μn+1=μn+γn(Yn+1−μn).
This model is again a particular case of the model (1.3) with h(z) = μ−z and εn+1 = Yn+1−μ. Assump- tions (H2), (H3), (H4) and (2.3) are satisfied. Consequently, assuming (H1) and choosing the steps such that (H5) holds, the cumulated estimation error satisfies
n→∞lim 1 sn
n k=1
1
γkp−1(μk−μ)2p = Σ2p(2p)!
2pp! a.s.
n→∞lim 1 sn
n k=1
1
γkp−32 (μk−μ)2p−1 = 0 a.s., withΣ2=σ2/(2−αξ), as well as the ASCLT
1 sn
n k=1
γkδ√1γk(μk−μ)=⇒ N 0, Σ2
a.s.
4. Proof of Theorem 2.2
Remark 4.1. Assumption (H5) implies thatvn ∈ GS(−α+ 2β), since vn = γnσ−2n , γn ∈ GS(−α) and σn ∈ GS(−β). Consequently one has
1−vn−1
vn =−α+ 2β n +o
1 n
=ξγn(−α+ 2β) +o(γn). Finally it comes
vn−1
vn = 1−γnξ(2β−α) +o(γn). (4.1)
This remark is very helpful for the proof of Theorem2.2and (4.1) is widely used.
First, the convergence is established for even moments, by induction on the power p. The recursive equation (1.3) yields
Zn+1−z∗=Yn+1+σnεn+1 where Yn+1def=Zn−z∗+γn[h(Zn) +Rn+1]. (4.2) In view of assumptions (H2) and (H4), the sequence Yn+1satisfies
Yn+1= (Zn−z∗) [1 +Hγn] +γnrn+1+O γn|Zn−z∗|2 .
In order to prove Theorem2.2, we shall apply several times the following lemma, established in Mokkadem and Pelletier [25].
Lemma 4.2. Assuming that there exists m > 2 such that (2.3) holds, under assumptions (H1) to (H5), the stepsized stochastic algorithm (Zn)satisfies
|Zn−z∗|=O
vn−1logsn
a.s. (4.3)
In particular, Lemma4.2consequently leads to
Yn+1= (Zn−z∗) [1 +Hγn] +o γnv−1/2n (logsn)−q
a.s. (4.4)
for allq≥0.
4.1. Even moments
Elevating (4.2) to the power 2pit comes
(Zn+1−z∗)2p= 2p
=0
C2p σnεn+1Yn+12p−.
Setting Vn+1def=√ vn
Zn+1−z∗
, we get
Vn+12p −Vn2p=vpnYn+12p −Vn2p+γnpε2pn+1+
2p−1
=1
C2p γn/2(√
vnYn+1)2p−εn+1. (4.5) From the obvious equality
Vn+12p = n k=1
Vk+12p −Vk2p
+V1p, and from the recursive equation (4.5), one gets for anyn≥1,
Vn+12p =V12p+A(p)n+1+Wn+1(p) +
2p−1
=1
C2p Bn+1(p) (), (4.6)
where
Wn+1(p) def= n k=1
γkpε2pk+1, A(p)n+1def= n k=1
vkpYk+12p −Vk2p
,
Bn+1(p) ()def= n k=1
γk/2(√
vkYk+1)2p−εk+1.
4.1.1. Case p= 1
Clearly equation (4.6) can be rewritten as
A(1)n+1=−Wn+1(1) +Vn+12 −V12−2Bn+1(1) (1). (4.7) Since the noise (εn) satisfies (H3) and assumption (2.3) for a real numberm >2, let us note that Chow’s lemma (seee.g.Duflo [10], p. 22) yields
n→∞lim 1
snWn+1(1) =σ2 a.s. (4.8)
Let us assume that
1 sn
Vn+12 −V12−2B(1)n+1(1)
=o(1) +o s−1n A(1)n+1
a.s. (4.9)
We first show how the combination of (4.8) and (4.9) gives the convergence (2.4) forp= 1. From equality (4.4) one has
Yn+12 = (Zn−z∗)2(1 + 2γnH+o(γn)) +o γn2vn−1(logsn)−2q +o |Zn−z∗|γnvn−1/2(logsn)−q
= (Zn−z∗)2(1 + 2γnH+o(γn)) +o vn−1γn(logsn)−q+1/2
,
the second equality resulting from Lemma 4.2. Consequently, applying this equality forq = 1/2 and denoting Un def=√
vn(Zn−z∗), the general term ofA(1)n+1 is of the form Un2
1−vn−1
vn + 2γnH+o(γn)
+o(γn) a.s.
Hence we deduce from (4.1) by applying Kronecker lemma that
n→∞lim 1 sn
n k=1
γkUk2= 1
2H+ξ(2β−α) lim
n→∞
1
snA(1)n+1 a.s. (4.10)
Then the combination of (4.7)–(4.10) leads to (2.4).
It remains to prove (4.9). Obviously sincesn increases to infinity, almost surelyV12p=o(sn). Moreover, from Lemma 4.2, we have a.s.Vn+12 =o(sn). To conclude we have to study the asymptotic behavior of B(1)n+1(1). We easily deduce from Kronecker lemma and from the decomposition (4.4) that we only have to show that
n k=1
√γkUkεk+1 =o A(1)n+1
a.s. and n k=1
γk3/2εk+1=o(sn) a.s.
The standard strong law of large numbers for regressive series (seee.g.Duflo [10], p. 26) yields
n k=1
√γkUkεk+1 =o
n
k=1
γkUk2
+O(1) a.s.
n k=1
γk3/2εk+1 =O
n
k=1
γk3
=o(sn) a.s.
Since from (4.10), one has
o n
k=1
γkvk(Zk−z∗)2
=o A(1)n+1 a.s.,
it comes
B(1)n+1(1) =o A(1)n+1
+o(sn) a.s., (4.11)
which concludes the proof of (4.9).
4.1.2. Case p≥2
Now, let p≥2 and assume that the convergence (2.4) holds for any powerq with 1≤q ≤p−1. First, we prove that
n→∞lim 1 sn
n k=1
γkUk2p=−Σ2 pσ2 lim
n→∞
1
snA(p)n+1 a.s. (4.12)
From the decomposition (4.4) together with the result (4.3) of Lemma 4.2, it is easy to see that, for all q ≥
−p+ 1/2
Yn+12p = (Zn−z∗)2p(1 + 2pHγn+o(γn)) +o γnvn−p(logsn)−q
. In particular forq= 0, one has
vnpYn+12p −Vn2p=vpn(Zn−z∗)2p
1−vpn−1 vpn
+ 2pHγn+o(γn)
+o(γn) a.s.
Sincevn ∈ GS(2β−α), the sequencevpn belongs toGS
p(2β−α)
and hence 1−vpn−1
vnp =p(2β−α)ξγn+o(γn). Finally, we get
vpnYn+12p −Vn2p=pγnUn2p[ξ(2β−α) + 2H] (1 +o(1)) +o(γn), and (4.12) is a straightforward application of Kronecker lemma. Thus, we have to show that
n→∞lim 1
snA(p)n+1=−pσ2(2p−1)g[2(p−1)] a.s. (4.13) The proof of (4.13) is constructed in the following way. Let us note that Lemma4.2implies thatVn+12p =o(sn).
First we establish the convergence ofs−1n Wn+1(p) to zero. Then we show that
∀∈ {1, . . . ,2p−1}\{2} Bn+1(p) () =o A(p)n+1
+o(sn) a.s.
n→∞lim B(p)n+1(2) =g[2(p−1)]σ2 a.s.
The decomposition (4.6) will lead to convergence (4.13).
Since the noise (εn) satisfies (2.3), by applying Chow’s lemma it comes Wn+1(p) =O
n
k=1
γkp
a.s.
In addition,γnconverges almost surely to zero, henceγnp=o(γn) a.s. Consequently, one gets from Kronecker’s Lemma
Wn+1(p) =o(sn) a.s. (4.14)
For establishing the asymptotic behavior ofB(p)n+1, let us split it into two terms,Bn+1(p) () =Cn+1(p) () +D(p)n+1() with
Cn+1(p) ()def= n k=1
γk/2(√
vkYk+1)2p−ek+1(), D(p)n+1()def=
n k=1
γk/2(√
vkYk+1)2p−E
εk+1|Fk
,
anden+1()def=εn+1−E
εn+1|Fn
. First, we claim that for any 3≤≤2p−1,
D(p)n+1()=o(sn) a.s. (4.15)
One can easily see from the H¨older inequality and from the moment assumption (2.3) that each moment E
εn+1|Fnis almost surely bounded and consequently D(p)n+1()=O
n
k=1
γk/2(√
vkYk+1)2p−
a.s.
Ifis even, for allq≥0, (4.4) leads to, γn/2(√
vnYn+1)2p− =γn/2
Un(1 +o(1)) +o γn(logsn)−q 2p−
=γn/2
Un2p−(1 +o(1)) +o γnUn2p−−1(logsn)−q
=γn/2Un2p−[1 +o(1)] +o γn/2+1(logsn)−q+p−/2−1/2 . Then the induction assumption together with Kronecker lemma yield
n k=1
γ/2k (√
vkYk+1)2p−=o(sn) a.s.
Consequently the convergence (4.15) holds foreven such that 3≤≤2p−1. Otherwise in the case whenis odd and≥3 (sinceDn+1(1) = 0 is a trivial case), applying the Cauchy–Schwarz inequality it comes
D(p)n+1()2=O n
k=1
γk(√
vkYk+1)2p−2 n k=1
γk−1(√
vkYk+1)2(p−+1)
a.s.
Again the induction assumption and Kronecker’s lemma lead to (4.15). For the last termD(2)n+1(), the conver- gence
n→∞lim 1
snD(p)n+1(2) =g[2(p−1)]σ2 (4.16) is a consequence of the induction assumption and Kronecker lemma too.
It remains to study the asymptotic behavior ofCn+1(p) () for any 1≤≤2p−1. We claim that Cn+1(p) ()=o(sn) +o A(p)n+1
a.s. (4.17)
We easily deduce from (4.4) and from Kronecker lemma that we only have to show that n
k=1
γk/2Uk2p−ek+1() =o(sn) +o A(p)n+1 a.s.
n k=1
γk1+/2ek+1() =o(sn) a.s.
According to the standard strong law of large numbers for regressive series one has, for anyδ >0,
n k=1
γk/2Uk2p−ek+1()
2
=o
⎛
⎝n
k=1
γkUk2(2p−)
log n k=1
γkUk2(2p−) 1+δ⎞
⎠ a.s.
n k=1
γk1+/2ek+1() =o
n
k=1
γk2+
a.s.
Sinceγn decreases to zero, it comes in particular
n k=1
γk1+/2ek+1()
=o(sn) a.s.
First, assume that 1≤≤p−1. Since n k=1
γkUk2(2p−)=O
1≤k≤nsup Uk2p n k=1
γkUk2(p−)
a.s., the conjonction of Lemma4.2and the induction assumption leads to
n k=1
γk/2Uk2p−ek+1()
=o(sn) a.s.
The particular case=pis also a consequence of both the strong law of large numbers and Kronecker lemma and one has
n k=1
γkpUk2p=o A(p)n+1 a.s.
Consequently Kronecker lemma leads to (4.17).
Now assume thatp < ≤2p−1. According to Chow’s lemma, one has
n k=1
γk/2Uk2p−ek+1()
=o(νn()) a.s. with νn()def= n k=1
γk/2Uk2p−2p/.
Applying H¨older inequality with the exponents/pand/(−p), the induction assumption impliesνn() =o(sn), which concludes the proof of (4.17).
Finally the conjonction of the convergences (4.14)–(4.17), leads to
n→∞lim 1 sn
A(p)n+1+Wn+1(p) +
2p−1
=1
C2p B(p)n+1()
= lim
n→∞
1
snA(p)n+1+C2p2 σ2g[2(p−1)] a.s.
So, from the decomposition (4.6) we get
n→∞lim 1
snA(p)n+1=−C2p2 σ2g[2(p−1)] a.s., which concludes the proof of (2.4).
4.2. Odd moments
Now let us deal with the odd moments with the proof of (2.5). Elevating equation (4.2) to power 2p−1 and proceeding exactly as in the case of even moments, it comes
Vn+12p−1=V12p−1+A(p)n+1+Wn+1(p) +
2p−2
=1
C2p−1 B(p)n+1(), (4.18)
where
Wn+1(p) def= n k=1
γkp−1/2ε2p−1k+1 , A(p)n+1def= n k=1
vkp−1/2Yk+12p−1−Vk2p−1
,
B(p)n+1()def= n k=1
γk/2(√
vkYk+1)2p−−1εk+1. 4.2.1. Case p= 1
Forp= 1, the decomposition (4.18) can be rewritten as
A(1)n+1=Vn+1−V1−Wn+1(1) . (4.19) An application of the standard strong law of large numbers for regressive series leads to
Wn+1(1) =o(sn) a.s.
ClearlyV1 =o sn
sincesn increases to infinity and again from Lemma4.2, one hasVn+1 =o(sn) a.s. Hence we deduce from (4.19) thatAn+1=o(sn) a.s.
Moreover, due to the decomposition (4.4) the general term ofAn+1 satisfies for allq≥0,
√vnYn+1−Vn =Un
1 +Hγn− vn−1
vn
+o γn(logsn)−q
a.s. (4.20)
Sincevn ∈ GS(2β−α),√
vn∈ GS(β−α/2) and consequently 1−
vn−1
vn =ξγn(β−α/2) +o(γn). In particular forq= 0, the equation (4.20) yields
√vnYn+1−Vn =γnUn[H+ξ(β−α/2)] (1 +o(1)) +o(γn) a.s., which clearly leads to (2.5) from Kronecker lemma, since
n→∞lim 1 sn
n k=1
γkUk = 1
H+ξ(β−α/2) lim
n→∞
1
snA(1)n+1= 0 a.s.
4.2.2. Case p≥2
Now letp≥2 and assume that convergence (2.5) holds for any powerqwith 1≤q≤p−1. The structure of the proof is the same as in the case of even moments. It consists in establishing the convergenceA(p)n+1=o(sn) from the decomposition (4.18).
First, Chow’s Lemma implies
Wn+1(p) =O n
k=1
γkp−1/2
a.s.
Sincep≥2 andγn goes to zero, one gets
Wn+1(p) =o(sn) a.s. (4.21)
Moreover, clearly we haveV1p=o(sn) a.s. andVn+1p =o(sn) a.s. from Lemma4.2. Let us splitBn+1(p) () into two terms,B(p)n+1() =Cn+1(p) () +D(p)n+1() with
Cn+1(p) ()def= n k=1
γ/2k (√
vkYk+1)2p−−1ek+1(),
D(p)n+1()def= n k=1
γ/2k (√
vkYk+1)2p−−1E
εk+1|Fk
.
For the first termCn+1(p) , we deduce from (4.4) and from Kronecker lemma that it suffices to show that n
k=1
γk/2Uk2p−−1ek+1() =o(sn) a.s. (4.22) n
k=1
γk1+/2ek+1() =o(sn) a.s.
Once again, one gets from the strong law of large numbers that for allδ >0,
n k=1
γk/2Uk2p−−1ek+1()
2
=o
⎛
⎝n
k=1
γkUk2(2p−−1)
log n k=1
γkUk2(2p−−1) 1+δ⎞
⎠ a.s.
n k=1
γk1+/2ek+1() =o
n
k=1
γ2+k
=o(sn) a.s.
First, assume that≥p−1. The convergence (2.4) implies n
k=1
γkUk2(2p−−1)=O(sn) a.s.
Hence, (4.22) is proven sinceγn goes to zero. Now let 1≤ < p−1. Since n
k=1
γkUk2(2p−−1)=O
1≤k≤nsup Uk2(p−−1) n k=1
γkUk2p
a.s.,
the conjonction of Lemma4.2and convergence (2.4) leads to (4.22). Finally, we have established that
Cn+1(p) () =o(sn) a.s. (4.23)
Let us now study the asymptotic behavior ofDn+1() in the following three cases.