Invariance principles for adaptive self-normalized partial sums processes∗

(1)

Invariance principles for adaptive self-normalized partial sums processes ^∗

Alfredas Ra£kauskas

Vilnius university and Institute of Mathematics and Informatics

Department of Mathematics, Vilnius University Naugarduko 24, Lt-2006 Vilnius

Lithuania

Charles Suquet

Université des Sciences et Technologies de Lille

Statistique et Probabilités F.R.E. CNRS 2222 Bât. M2, U.F.R. de Mathématiques F-59655 Villeneuve d'Ascq Cedex France

Revised version of

Invariance principle for self-normalized sums of symmetric random variables

Abstract

Let ζn^se be the adaptive polygonal process of self-normalized partial sumsSk=P

1≤i≤kXiof i.i.d. random variables dened by linear interpolation between the points(V_k²/Vn², Sk/Vn),k≤n, whereV_k²=P

i≤kX_i². We investigate the weak Hölder convergence ofζ_n^se to the Brownian mo- tionW. We prove particularly that whenX1 is symmetric,ζn^se converges toW in each Hölder space supportingW if and only if X1belongs to the domain of attraction of the normal distribution. This contrasts strongly with Lamperti's FCLT where a moment ofX1of orderp >2is requested for some Hölder weak convergence of the classical partial sums process.

We also present some partial extension to the non symmetric case.

Keywords: functional central limit theorem, domain of attraction, Hölder space, randomization.

AMS classications: 60F05, 60B05, 60G17, 60E10.

∗Preprint. Research supported by a cooperation agreement CNRS/LITHUANIA (4714).

(2)

1 Introduction and results

Various partial sums processes can be built from the sumsSn =X1+· · ·+Xnof independent identically distributed mean zero random variables. In this paper we focus attention on what we call the adaptive self-normalized partial sums process, denoted ζ_n^se. We investigate its weak convergence to the Brownian motion, trying to obtain it under the mildest integrability assumptions onX₁ and in the strongest topological framework. We basically show that in both respects, ζ_n^se behaves better than the classical Donsker-Prohorov partial sum processesξ_n^sr. Self-normalized means here that the classical normalization by

√nis replaced by

Vn= (X₁²+· · ·+X_n²)^1/2.

Adaptive means that the vertices of the corresponding random polygonal line have their abscissas at the random points V_k²/V_n² (0 ≤k ≤ n) instead of the deterministic equispaced pointsk/n. By this construction the slope of each line adapts itself to the value of the corresponding random variable.

As a lot of dierent partial sums processes will appear throughout the paper, we need to explain our typographical conventions and x notations.

Byζn(respectivelyξn) we denote the random polygonal partial sums process dened on[0,1]by linear interpolation between the vertices (V_k²/V_n², Sk), k = 0,1, . . . , n(respectively (k/n, Sk),k= 0,1, . . . , n), where

Sk =X1+· · ·+Xk, V_k²=X₁²+· · ·+X_k². For the special casek= 0, we putS0= 0,V0= 0.

The upper scripts^sr or^se mean respectively normalization by square root of nor self-normalization. Hence,

ξ_n^sr= ξn

√n, ξ_n^se= ξn

Vn

, ζ_n^sr = ζn

√n, ζ_n^se= ζn

Vn

.

By convention the random functions ξ^se_n and ζ_n^se are dened to be the null function on the event{Vn= 0}. Finally, the step partial sums processesΞn,Zn, Ξ^se_n etc. are the piecewise constant random càdlàg functions whose jump points are vertices for the polygonal process denoted by the corresponding lowercase Greek letter.

Classical Donsker-Prohorov invariance principle states, that if EX₁² = 1, then

ξ^sr_n −→^D W, (1)

inC[0,1], where(W(t), t∈[0,1])is a standard Wiener process and−→^D denotes convergence in distribution. Since (1) yields the central limit theorem, the niteness of the second moment ofX1therefore is necessary.

Lamperti [14] considered the convergence (1) with respect to a stronger topology. He proved that ifE|X1|^p <∞, wherep >2, then (1) takes place in the Hölder spaceHα[0,1], where 0 < α < 1/2−1/p. This result was derived again by Kerkyacharian and Roynette [13] by another method using Ciesielski

(3)

[6] analysis of Hölder spaces by triangular functions. Further generalizations were given by Erickson [7], Hamadouche [12], Ra£kauskas and Suquet [19].

Considering a symmetric random variable X1 such that P(X1 ≥ u) = 1/(2u^p), u ≥ 1, Lamperti [14] noticed that the corresponding sequence (ξ^sr_n) is not tight in Hα[0,1] for α = 1/2−1/p. It is then hopeless in general to look for an invariance principle in H_α[0,1] without some moment assumption beyond the square integrability of X₁. Recently, Ra£kauskas and Suquet [19]

proved more precisely that if(ξ^sr_n) satises the invariance principle inH_α[0,1]

for some0< α <1/2, then necessarily sup

t>0

t^pP(|X₁|> t)<∞, (2) for anyp <1/(1/2−α).

Let us see now, how self-normalization and adaptiveness help to improve this situation. Recall that "X₁ belongs to the domain of attraction of the normal distribution" ( denoted by X₁ ∈ DAN) means that there exists a sequence b_n↑ ∞such that

b⁻¹_n Sn −→D N(0,1). (3)

According to O'Brien's [16] result: X₁∈DAN if and only if V_n⁻¹ max

1≤k≤n|Xk|−→^P 0, (4)

where −→^P denotes convergence in probability. In the classical framework of C[0,1], we obtain the following improvements of the Donsker Prohorov theorem.

Theorem 1. The convergence

ξ^se_n −→^D W (5)

holds in the spaceC[0,1]if and only ifX1∈DAN. Theorem 2. The convergence

ζ_n^se−→^D W (6)

holds in the spaceC[0,1]if and only ifX₁∈DAN.

Let us remark that the necessity ofX1 ∈DAN in both Theorems 1 and 2 follows from Giné, Götze and Mason [9]. Let us notice also that (5) or (6) both exclude the degenerated caseP(X1= 0) = 1, so that almost surelyVn>0for large enoughn. We have similar results [20] for the step processesΞ^se_n andZ^se_n within the Skorohod spaceD(0,1).

For a modulus of continuity ρ : [0,1] → R, denote by Hρ[0,1] the set of continuous functionsx: [0,1]→Rsuch thatωρ(x,1)<∞,where

ωρ(x, δ) := sup

t,s∈[0,1], 0<|t−s|<δ

|x(t)−x(s)|

ρ(|s−t|) .

(4)

The setHρ[0,1]is a Banach space when endowed with the norm kxk_ρ:=|x(0)|+ωρ(x,1).

Dene

H_ρ^o[0,1] ={x∈Hρ[0,1] : lim

δ→0ωρ(x, δ) = 0}.

Then H_ρ^o[0,1] is a closed separable subspace of H_ρ[0,1]. In what follows we assume that the functionρ satises technical conditions (12)to (16)(see Sec- tion 2). These assumptions are fullled particularly whenρ=ρ_α,β,0< α <1, β∈R, dened by:

ρ_α,β(h) :=h^αln^β(c/h), 0< h≤1,

for a suitable constant c. We write H_α,β and H_α,β^o for H_ρ[0,1] and H_ρ^o[0,1], respectively, whenρ=ρ_α,β and we abbreviateH_α,0 inH_α.

With respect to this Hölder scale H_α,β, we obtain an optimal result when X1 is symmetric.

Theorem 3. Assume thatρsatises conditions(12)to(16)and

j→∞lim

2^jρ²(2^−j)

j =∞. (7)

IfX₁ is symmetric and X₁∈DAN then

ζ_n^se−→^D W, (8)

inH_ρ^o[0,1].

Corollary 4. IfX1 is symmetric and X1 ∈DAN then (8) holds in the space H_1/2,β^o , for anyβ >1/2.

It is well known that the Wiener process has a version in the spaceH_1/2,1/2 but none in H_1/2,1/2^o . Hence Corollary 4 gives the best result possible in the scale of the separable Hölder spacesHα,β. In [19] it is proved that if the classical partial sums processξ^sr_n converges inH_1/2,β^o for someβ >1/2, then kX1k_ψ

γ <

∞, where kX1k_ψ

γ is the Orlicz norm related to the Young function ψγ(r) = exp(r^γ)−1withγ= 1/β. This shows the striking improvement of weak Hölder convergence due to self-normalization and adaptation.

It seems worth noticing here, that without adaptive construction of the polygonal process, the existence of moments of order bigger than 2 is necessary for Hölder weak convergence. Indeed, ifξ^se_n −→^D W in Hα, then one can prove thatEX₁²<∞. Thereforeξ_n^sr −→^D W in Hα and the moment restriction (2) is necessary.

Naturally it is very desirable to remove the symmetry assumption in Corol- lary 4. Although the problem remains open, we can propose the following partial results in this direction.

(5)

Theorem 5. Letβ >1/2 and suppose that we have P°

max

1≤k≤n

X_k² V_n² ≥δ_n±

−−−−→

n→∞ 0 (9)

and

P°

1≤k≤nmax

¬

¬ V_k² V_n² −k

n

¬

¬≥δn

±−−−−→

n→∞ 0, (10)

with

δn=c2^−(logⁿ⁾^γ

logn , for some 1

2β < γ <1 and somec >0. (11) Then

ζ_n^se−→^D W, in H_1/2,β^o .

Observe thatn^−¯=o(δn)for any¯ >0. This mild convergence rateδn may be obtained as soon asE|X1|^2+εis nite.

Corollary 6. If for some ε >0, E|X₁|^2+ε <∞, then for any β > 1/2, ζ_n^se converges weakly toW in the spaceH_1/2,β^o .

This result contrasts strongly with the extension of Lamperti's invariance principle in the same functional framework [19].

The present contribution is a new illustration of the now well established fact, that in general, self-normalization improves the asymptotic properties of sums of independent random variables.

A rich litterature is devoted to limit theorems for self-normalized sums. Lo- gan, Mallows, Rice and Shepp [15] investigate the various possible limit distributions of self-normalized sums. Giné, Götze and Mason [9] prove thatS_n/V_n converges to the Gaussian standard distribution if and only if X₁ is in the domain of attraction of the normal distribution (the symmetric case was pre- viously treated in Grin and Mason [11]). Egorov [8] investigates the non identically distributed case. Bentkus and Götze [2] obtain the rate of convergence of Sn/Vn when X1 ∈ DAN. Grin and Kuelbs [10] prove the LIL for self-normalized sums when X1 ∈ DAN. Moderate deviations (of Linnik's type) are studied in Shao [22] and Christiakov and Götze [4]. Large deviations (of Cramér-Cherno type) are investigated in Shao [21] without moment conditions. Chuprunov [5] gives invariance principles for various partial sums processes under self-normalization inC[0,1]orD[0,1]. Our Theorems 1 and 2 improve on Chuprunov's results in the i.i.d. case.

2 Preliminaries

2.1 Analytical background

In this section we collect some facts about the Hölder spacesHρ[0,1]including the tightness criterion for distributions in these spaces. All these facts may be found e.g. in Ra£kauskas and Suquet [18].

(6)

In what follows, we assume that the modulus of smoothness ρsatises the following technical conditions wherec1, c2 andc3 are positive constants:

ρ(0) = 0, ρ(δ)>0, 0< δ≤1; (12) ρis non decreasing on[0,1]; (13)

ρ(2δ)≤c₁ρ(δ), 0≤δ≤1/2; (14)

Z δ

0

ρ(u)

u du≤c2ρ(δ), 0< δ≤1; (15) δ

Z 1

δ

ρ(u)

u² du≤c3ρ(δ), 0< δ≤1. (16) For instance, elementary computations show that the functions

ρ(δ) :=δ^αln^β c δ

¡, 0< α <1, β∈R,

satisfy conditions (12) to (16), for a suitable choice of the constant c, namely c≥exp(β/α)ifβ >0andc >exp(−β/(1−α))ifβ <0.

WriteDj for the set of dyadic numbers of level j in [0,1], i.e. D0={0,1}

and forj≥1,

Dj={(2k+ 1)2^−j; 0≤k <2^j−1}.

For any continuous functionx: [0,1]→R, dene λ0,t(x) :=x(t), t∈D0

and forj≥1,

λj,t(x) :=x(t)−1 2

x(t+ 2^−j) +x(t−2^−j)¡

, t∈Dj.

The λj,t(x) are the coecients of the expansion of x in a series of triangular functions. Thej-th partial sum E_jxof this series is exactly the polygonal line interpolating x between the dyadic points k2^−j(0 ≤ k ≤ 2^j). Under (12) to (16), the normkxk_ρ is equivalent to the sequence norm

kxk^seq_ρ := sup

j≥0

1 ρ(2^−j)max

t∈Dj

|λj,t(x)|.

In particular both norms are nite if and only ifxbelongs toHρ. It is easy to check that

kx−Ejxk^seq_ρ = sup

i>j

1 ρ(2⁻ⁱ)max

t∈Di

|λi,t(x)|.

Proposition 7. The sequence (Yn) of random elements in H_ρ^o is tight if and only if the following two conditions are satised:

i) For each t∈[0,1], the sequence(Y_n(t))_n≥1 is tight on R.

(7)

ii) For each ε >0,

j→∞lim sup

n≥1

P(kYn−EjYnk^seq_ρ > ε) = 0.

Remark 8. Condition ii) in Proposition 7 may be replaced by

j→∞lim lim sup

n→∞

P(kY_n−E_jY_nk^seq_ρ > ε) = 0. (17)

2.2 Adaptive time and DAN

We establish here the technical results on the adaptive time whenX1 ∈DAN which will be used throughout the paper. These results rely on the common assumption that X1 is in the domain of normal attraction. This provides the following properties on the distribution ofX1. SinceX1∈DAN, there exists a sequencebn ↑ ∞ such thatb⁻¹_n Sn converges weakly to N(0,1). Then Raikov's theorem yields

b⁻²_n V_n²−→^P 1. (18)

We have moreover for eachτ >0, puting bn=n^−1/2`n, nP(|X1|> τ `n

√n) → 0, (19)

`⁻²_n E(X₁²;|X₁| ≤τ `_n√

n) → 1, (20)

nE(X1;|X1| ≤τ `n

√n) → 0, (21)

see for instance Araujo and Giné [1], Chap. 2, Cor. 4.8 (a), Th. 6.17 (i) and Cor. 6.18 (b). Here and in all the paper (X;E) means the product of the random variableX by the indicator function of the eventE.

Lemma 9. IfX₁∈DAN, then sup

0≤t≤1

¬

¬ V_[nt]²

V_n² −t¬

¬

−→P 0. (22)

Proof. Consider the truncated random variables

Xn,i:=b⁻¹_n (Xi;X_i²≤b²_n), i= 1, . . . , n.

DeneVn,0:= 0 andV_n,k² =X_n,1² +· · ·+X_n,k² fork= 1, . . . , n. Set

νn= sup

0≤t≤1

¬

¬ V_[nt]²

V_n² −t

¬

¬ and eνn = sup

0≤t≤1

¬

¬ V_n,[nt]²

V_n,n² −t

¬

¬. Then we have forλ >0,

P(ν_n> λ)≤P(νe_n> λ) +nP(X₁²> b²_n).

(8)

Due to (19) the proof of (22) reduces to the proof of eνn

−→P 0. (23)

SinceV_n,k² ≤V_n,n² fork= 0, . . . , n, the elementary estimate

¬

¬ V_n,k² V_n,n² − k

n

¬

¬≤ V_n,k²

V_n,n² |1−V_n,n² |+

¬

¬V_n,k² −k n

¬

leads to

eνn ≤ max

0≤k≤n

¬

¬V_n,k² −k n

¬

¬+|1−V_n,n² |+ 1

n. (24)

Noting thatV_n,n² =b⁻²_n V_n²R_n with Rn:= 1 V_n²

n

X

i=1

(X_i²;X_i²≤b²_n), we clearly haveRn≤1 a.s. and

P(R_n<1) =P( max

1≤i≤n|Xi|> b_n)≤nP(|X1|> b_n), which goes to zero by (19). This together with (18) gives

V_n,n² −→^P 1. (25)

Hence the proof of (23) reduces to

0≤k≤nmax

¬

¬V_n,k² −k n

¬

−→P 0. (26)

For this convergence we have

0≤k≤nmax |V_n,k² −k/n| ≤ max

0≤k≤n|V_n,k² −EV_n,k² |+ max

0≤k≤n|EV_n,k² −k/n|.

Noting that

EV_n,k² − k n = k

n

°nb⁻²_n E(X₁²;X₁²≤b²_n)−1± gives

0≤k≤nmax

¬

¬EV_n,k² −k n

¬

¬≤¬

¬nb⁻²_n E(X₁²;X₁²≤b²_n)−1¬

¬,

which goes to zero by (20). Hence it remains to prove

0≤k≤nmax |V_n,k² −EV_n,k² |−→^P 0. (27) PuttingTn,k :=V_n,k² −EV_n,k² , we have by Ottaviani inequality

P

1≤k≤nmax |Tn,k|>2λ¡

≤ P(|Tn,n|> λ)

1−max_1≤k≤nP(|Tn,n−Tn,k|> λ). (28)

(9)

Due to (25), we are left with the control ofI := max_1≤k≤nP(|Tn,k| > λ). By Chebyshev's inequality

I≤λ⁻² max

1≤k≤nET_n,k² ≤λ⁻²nEX_n,1⁴

and we have to consider I1 = nEX_n,1⁴ = nb⁻⁴_n E(X₁⁴;|X1| ≤ bn). For any 0< τ <1,

E(X₁⁴;|X1| ≤bn) ≤ E(X₁⁴;|X1| ≤τ bn) +E(X₁⁴;τ bn ≤ |X1| ≤bn)

≤ τ²b²_nE(X₁²;|X1| ≤τ bn) +b⁴_nP(|X1| ≥τ bn).

So

I1≤τ²nb⁻²_n E(X₁²;|X1| ≤τ bn) +nP(|X1| ≥τ bn).

Choosingτ =λ/2 in (19) and (20), we can achieveI≤1/2 fornlarge enough and the proof is complete.

Remark 10. IfX1∈DAN, we also have sup

0≤t≤1

¬

¬ V_[nt]+1²

V_n² −t¬

¬

−→P 0. (29)

Indeed, recalling (4), it suces to write V_[nt]+1² −V_[nt]²

V_n² =X_[nt]+1² V_n² ≤

² 1

V_n+1² max

1≤k≤n+1X_k²

³V_n+1² V_n² ,

and observe thatV_n+1² /V_n²converges to 1in probability since by Lemma 9,

¬

¬ V_n²

V_n+1² − n n+ 1

¬

¬≤ sup

0≤t≤1

¬

V_[(n+1)t]² V_n+1² −t

¬

−→P 0.

Remark 11. For eacht∈[0,1], b²_[nt]

b²_n →t. (30)

This is a simple by-product of Lemma 9, writing b²_[nt]

b²_n = V_n² b²_n ×b²_[nt]

V_[nt]² ×V_[nt]² V_n²

and noting that for xedt >0andn≥n0 large enough[nt]<[(n+ 1)t]so the sequence(b²_[nt]/V_[nt]² )_n≥n₀ is a subsequence of(b²_n/V_n²)_n≥n₀ which converges in probability to1 by (18).

Dene the random variables

τn(t) = max{k= 0, . . . , n; V_k²≤tV_n²}, t∈[0,1], (31)

(10)

so that we haveτn(1) =nand for0≤t <1, V_τ²

n(t)

V_n² ≤t < V_τ²

n(t)+1

V_n² . (32)

Lemma 12. IfX1∈DAN then sup

t∈[0,1]

|n⁻¹τn(t)−t|−→^P 0. (33) Proof. The result will follow from Remark 10, if we check the inclusion of events

n sup

t∈[0,1]

|n⁻¹τ_n(t)−t|> εo

⊂n sup

u∈[0,1]

¬

¬ V_[nu]+1²

V_n² −u¬

¬

¬≥εo

. (34)

The occurrence of the left hand side in (34) is equivalent to the existence of one s∈[0,1]such that|n⁻¹τ_n(s)−s|> ε, i.e. such that

τ_n(s)> n(s+ε) (35)

or

τn(s)< n(s−ε). (36)

Observe that under (35), s+ε < 1, while under (36), s−ε > 0. From the denition ofτn, (35) gives an integerk > n(s+ε)such thatV_k²/V_n²≤s, whence

V_[n(s+ε)]+1²

V_n² ≤s. (37)

On the other hand, under (36), we haveV_k²/V_n²> sfor everyk≥n(s−ε)and in particular

V_{[n(s−ε)]+1}²

V_n² > s. (38)

Recasting (37) and (38) under the form V_[n(s+ε)]+1²

V_n² −(s+ε), ≤ −ε V_{[n(s−ε)]+1}²

V_n² −(s−ε) > ε,

shows that both (35) and (36) imply the occurrence of the event in the right hand side of (34).

(11)

3 Proofs

Proof of Theorem 1. First we prove the convergence of nite dimensional distributions (f.d.d.) of the processξ_n^se to the corresponding f.d.d. of the Wiener processW.

To this aim, consider the process Ξ_n = (S_[nt], t∈[0,1]). By (4) applied to the obvious bound

sup

0≤t≤1

V_n⁻¹|ξ_n(t)−Ξ_n(t)| ≤V_n⁻¹ max

1≤k≤n|X_k|, the convergence of f.d.d. ofξ_n^sefollows from those of the process Ξ^se_n.

Let 0 ≤ t1 < t2 < · · · < td ≤1. From (3), independence of the Xi's and Remark 11, we get

b⁻¹_n

S[nt₁], S[nt₂]−S[nt₁], . . . , S[nt_d]−S[nt_d−1]

¡ _D

−→

W(t1), W(t2)−W(t1), . . . , W(td)−W(t_d−1)¡ . Now (18) and the continuity of the map

(x1, x2, . . . , xd)7→(x1, x2+x1, . . . , xd+· · ·+x1)

yields the convergence of f.d.d. of Ξ^se_n. The convergence of nite dimensional distributions of the processξ_n^se is thus established.

To prove the tightness we shall use Theorem 8.3 from Billingsley [3]. Since ξ_n^se(0) = 0, the proof reduces in showing that for allε,η >0there existn₀≥1 andδ,0< δ <1, such that

1 δPn

sup

1≤i≤nδ

V_n⁻¹|Sk+i−Sk| ≥εo

≤η, n≥n0, (39) for all1≤k≤n.

Let us introduce the truncated variables

Y_i:=`⁻¹_n (X_i; X_i²≤τ²b²_n), i= 1, . . . , n,

with`n=n^−1/2bn as above andτ to be chosen later. Denote bySek andVek the corresponding partial sums with their self-normalizing random variables:

Sek =Y1+· · ·+Yk, Vek = (Y₁²+· · ·Y_n²)^1/2, k= 1, . . . , n.

Then we have Pn

sup

1≤i≤nδ

V_n⁻¹|Sk+i−Sk| ≥εo

≤A+B+C, (40) where

A := Pn sup

1≤i≤nδ

|Sek+i−Sek| ≥ε√ n/2o

, B := P{Ve_n<√

n/2}, C := nP{|X1| ≥τ `n

√n}.

(12)

Due to (21) we can choosen1such that√

n|EY1| ≤1/4forn≥n1. Then with n≥n1andδ≤εwe have

A ≤ Pn

1≤i≤nδmax

¬

k+i

X

j=k+1

(Yj−EYj)

¬

¬+nδ|EY1| ≥√ nε/2o

≤ Pn

1≤i≤nδmax

¬

k+i

X

j=k+1

(Yj−EYj)

¬

¬≥√ nε/4o

.

By Chebyshev's inequality and Rosenthal inequality with p > 2, we have for each1≤k≤n

Pn n^−1/2

¬

k+nδ

X

j=k+1

(Yj−EYj)

¬

¬≥ ε 8

o ≤ 8^p ε^pn^p/2E

¬

k+nδ

X

j=k+1

(Yj−EYj)

¬

p

≤ 8^p ε^pn^p/2

¢(nδ)^p/2(EY₁²)^p/2+nδE|Y1|^p£ . By (20) we can choosen2 such that

3/4≤EY₁²≤3/2 for n≥n₂. (41) Then we have E|Y₁|^p ≤ 2n^(p−2)/2τ^p−2 and then assuming that τ ≤ δ^1/2 we obtain

Pn n^−1/2¬

¬

k+nδ

X

j=k+1

(Yj−EYj)¬

¬

¬≥ ε 8

o ≤ 8^p ε^pn^p/2

¢2^p/2(nδ)^p/2+δn^p/2τ^p−2£

≤ 2·16^pδ^p/2 ε^p . Now by Ottaviani inequality we nd

A≤δη

3 , (42)

providedδ^p/2≤ε^p/(4·16^p)andδ^(p−2)/2≤ηε^p/(6·16^p).

Next we considerB. Sincen⁻¹EVe_n²=EY₁²we have by (41)n⁻¹EVe_n²≥3/4, forn≥n₂. Furthermore

B ≤P{n⁻¹|Ve_n²−EVe_n²| ≥1/2} ≤4n⁻¹EY₁⁴≤4τ²EY₁²≤δη/3, (43) providedn≥n₂ andτ²≤δη/18.

Finally choose n3 such that C ≤ ηδ/3 when n ≥ n3 and join to that the estimates (42, 43) to conclude (39). The proof is completed.

(13)

Proof of Theorem 2. Due to Theorem 1, it suces to check that

V_n⁻¹(ξn−ζn)

_∞ goes to zero in probability, where kfk_∞ := sup_0≤t≤1|f(t)|. To this end let us introduce the random change of timeθn dened as follows. When Vn >0, θn

is the map from[0,1]onto [0,1]which interpolates linearly between the points (k/n, V_k²/V_n²),k= 0,1, . . . , n. When Vn = 0, we simply takeθn =I, the iden- tity on [0,1]. With the usual convention S_k/V_n := 0 for V_n = 0, we always have

ζ_n^se θ_n(t)¡

=ξ^se_n(t), 0≤t≤1. (44) Clearly for eacht∈[0,1],

¬

¬ V_[nt]²

V_n² −θn(t)

¬

¬≤ max

1≤k≤n

X_k² V_n². It follows by (4) that

sup

0≤t≤1

¬

¬ V_[nt]²

V_n² −θn(t)¬

¬

−→P 0

and this together with Lemma 9 gives

kθ_n−Ik_∞−→^P 0. (45) Letω(f;δ) := sup{|f(t)−f(s)|; |t−s| ≤δ}denote the modulus of continuity off ∈C[0,1]. Then recalling (44) we have

kξ_n^se−ζ_n^sek_∞= sup

0≤t≤1

¬¬ξ_n^se θn(t)¡

−ζ_n^se θn(t)¡¬

¬≤ω

ξ_n^se;kθn−Ik_∞¡ . It follows that for anyλ >0and0< δ≤1,

P

kξ_n^se−ζ_n^sek_∞≥λ¡

≤P

kθn−Ik_∞> δ¡ +P

ω(ξ_n^se;δ)≥λ¡

. (46) Now since the Brownian motion has a version inC[0,1], we can nd for each positiveε, some δ∈(0,1]such that P

ω(W;δ)≥λ¡

< ε. As the functionalω is continuous onC[0,1], it follows from Theorem 1 that

lim sup

n→∞

P

ω(ξ^se_n;δ)≥λ¡

≤P

ω(W;δ)≥λ¡ . Hence forn≥n1we haveP

ω(ξ^se_n;δ)≥λ¡

<2ε. Having in mind (45) and (46) we see that the proof is complete.

Proof of Theorem 3. The convergence of nite dimensional distributions is already established in the proof of Theorem 2

It remains to prove tightness of ζ_n^se in the space H_ρ[0,1]. To this aim, we have to check the second condition of Proposition 7 only.

Let ε₁, . . . , ε_n, . . . be an independent Rademacher sequence which is independent on (X_i). By symmetry of X₁, both sequences (X_i) and (ε_iX_i) have

(14)

the same distribution. Noting also thatε²_i = 1 a.s., we have that ζ_n^se has the same distribution as the random processζe_n^se which is dened linearly between the points

°V_k² V_n², U_k

Vn

± , whereU0= 0andUk =Pk

i=1εiXi,fork≥1.Hence it suces to prove that

J→∞lim sup

n

X

j>J

2^j max

0≤k<2^jP ¬

¬ eζ_n^se

(k+ 1)2^−j)−eζ_n^se(k2^−j)¬

¬> ερ(2^−j)¡

= 0. (47) To this aim we shall estimate

δ(t, h, r) :=P

|eζ_n^se(t+h)−ζe_n^se(t)|> r¡ , uniformly inn. First consider the case, where

0≤V_k−1²

V_n² ≤t < t+h≤ V_k² V_n², so

0≤h≤V_k²

V_n² −V_k−1² V_n² = X_k²

V_n². We have then by linear interpolation

|eζ_n^se(t+h)−ζe_n^se(t)| = |εkXk| V_n

V_n² X_k²h

= ° Vn

|Xk|

√ h±√

h

≤ √

h. (48)

Next consider the following conguration:

0≤V_k−1²

V_n² ≤t < V_k² V_n² ≤ V_l²

V_n² ≤t+h < V_l+1² V_n² . Then we have

|eζ_n^se(t+h)−ζe_n^se(t)| ≤δ1+δ2+δ3, where

δ1 := |eζ_n^se(t+h)−ζe_n^se(V_l²/V_n²)| ≤ q

pV_l²−V_k²

√ h, δ₃ := |eζ_n^se(V_k²/V_n²)−ζ_n^se(t)| ≤q

V_k²/V_n²−t≤√ h.

(15)

Hence, for any conguration we obtain

|eζ_n^se(t+h)−ζe_n^se(t)| ≤ |U_l−U_k| pV_l²−V_k²

√ h+ 2√

h, (49)

if we agree that|Ul−Uk|(V_l²−V_k²)^−1/2:= 0 whenk=l. Therefore δ(t, h, r)≤P

|Ul−U_k|/

q

V_l²−V_k²> r/(2√ h)¡

, (50)

providedr >4√

h. Observe that in this formula the indexeslandkare random variables depending on t, h and the sequence (Xi), but independent of the sequence(εi). Thus conditioning onX1, . . . , Xn and applying the well known Hoeding's inequality we obtain

δ(t, h, r)≤cexp{−r²/(8h)}. (51) Now (47) clearly follows if for every¯ >0,

∞

X

j=1

2^jexp{−¯2^jρ²(2^−j)}<∞, (52) which is easily seen to be equivalent to our hypothesis (7). The proof is completed.

Proof of Theorem 5. From (9) and the characterization (4) of DAN, X₁ is clearly in the domain of normal attraction. So the convergence of nite dimensional distributions is already given by Theorem 2.

To establish the tightness we have to prove that

J→∞lim lim sup

n→∞

P

kζ_n^se−EJζ_n^sek^seq_ρ >4ε¡

= 0. (53)

To this end, it suces to prove that with some sequenceJn↑ ∞to be precised later,

lim sup

n→∞

P° sup

j>Jn

max

0≤k<2^j

1 ρ(2^−j)

¬¬λ⁰_j,k(ζ_n^se)¬

¬> ε±

= 0 (54)

and

J→∞lim lim sup

n→∞

P° sup

J≤j≤Jn

max

0≤k<2^j

1 ρ(2^−j)

¬¬λ⁰_j,k(ζ_n^se)¬

¬>3ε±

= 0, (55) where

λ⁰_j,k(ζ_n^se) :=ζ_n^se

(k+ 1)2^−j¡

−ζ_n^se(k2^−j), 0≤k <2^j.

To start with (54), following the same steps which led to (49) we obtain with k,l such that

V_k−1²

V_n² < t≤ V_k²

V_n², V_l−1²

V_n² < t+h≤ V_l² V_n²,

(16)

the upper bound

|ζ_n^se(t+h)−ζ_n^se(t)| ≤

²

2 + |S(l,k]| V_(l,k]

³√ h, where we use the notations

S_(i,j]:= X

i<k≤j

X_k, V_(i,j] :=

² X

i<k≤j

X_k²

³^1/2 ,

with the usual convention of null value for a sum indexed by the empty set.

WritingT_k,l:= 2 +|S(l,k]|/V(l,k], this gives

|ζ_n^se(t+h)−ζ_n^se(t)| ≤√

h max

1≤k≤l≤nTk,l. (56)

By [9, Th. 2.5], the Tk,l are uniformly subgaussian. It is worth recalling here and for further use, that if the random variablesYi(1≤i≤N) are subgaussian, then so ismax1≤i≤N|Yi|, which more precisely satises

1≤i≤Nmax |Yi|

_φ

2

≤a(logN)^1/2 max

1≤i≤NkYik_φ

2, (57)

where ais an absolute constant and k k_φ

2 denotes the Orlicz norm associated to the Young functionφ2(t) := exp(t²)−1. Applying (57) to the n² random variables Tk,l, we obtain (with constants c, C whose value may vary at each occurence)

P° sup

j>J_n

max

0≤k<2^j

1

ρ(2^−j)|λ⁰_j,k(ζ_n^se)|> ε±

≤ X

j>Jn

P°

1≤k≤l≤nmax Tk,l> cεj^β±

≤ X

j>Jn

Cexp°−cj^2β logn

±

. (58)

Now choose Jn = (logn)^γ with 1 > γ > (2β)⁻¹. Then 2β−1/γ is strictly positive and using

j^2β=j^1/γj^2β−1/γ > J_n^1/γj^2β−1/γ =j^2β−1/γlogn, we see that the right hand side in (58) is bounded byP

j>J_nCexp

−cj^2β−1/γ¡ , whence (54) follows.

To prove (55), we start with P°

J≤j≤Jmax_n max

0≤k<2^j

1

ρ(2^−j)|λ⁰_j,k(ζ_n^se)|>3ε±

≤P₁+P₂+P₃, (59) withP1,P2andP3 dened below. First introduce the event

An=n sup

t∈[0,1]

¬

¬ V_τ²

n(t)

V_n² −V_[nt]² V_n²

¬

¬≤δn

o∩n sup

t∈[0,1]

¬

¬ V_[nt]²

V_n² −t

¬

¬≤δn

o .

(17)

whereδn is choosen as in (11), keeping the freedom of choice of the constant c. Now we dene

P1 := P(A^c_n);

P2 := P° An∩n

J≤j≤Jmaxn

max

0≤k<2^j

1 ρ(2^−j)

|S[(k+1)2^−jn]−S[k2^−jn]| Vn

> εo±

; P₃ := P°

A_n∩n

J≤j≤Jmax_n max

0≤k<2^j

1

ρ(2^−j) max

|l−[k2^−jn]|≤nδn

h|Sl−S_[k2−jn]|

V_n + 2

2^j/2 i

>2εo±

. The following easy estimates

sup

t∈[0,1]

¬

¬ V_[nt]²

V_n² −t¬

¬

¬≤ max

1≤k≤n

¬

¬ V_k² V_n²−k

n

¬

¬+ 1 n,

sup

t∈[0,1]

¬

¬ V_τ²

n(t)

V_n² −V_[nt]² V_n²

¬

¬≤ max

1≤k≤n

X_k²

V_n² + max

1≤k≤n

¬

¬ V_k² V_n² −k

n

¬

¬+1 n, lead by (9) and (10) to

P(A^c_n)→0. (60)

SoP₁ will be killed by taking thelim supinn.

To controlP₂, rst write with self-explanatory notations

|S_[(k+1)2−jn]−S_[k2−jn]| Vn

= |S_[(k+1)2−jn]−S_[k2−jn]| V [k2^−jn],[(k+1)2^−jn]£

×

V [k2^−jn],[(k+1)2^−jn]£ Vn

.

Observing that on the eventA_n, we have V [k2^−jn],[(k+1)2^−jn]£

Vn

≤p

2^−j+δ_n and assuming that

δn≤2^−Jⁿ, (61)

we get

P₂≤ X

J≤j≤Jn

P° max

0≤k<2^j

1 ρ(2^−j)

|S_[(k+1)2−jn]−S_[k2−jn]| V [k2^−jn],[(k+1)2^−jn]£

>√ 2ε2^j/2±

.

Since we are dealing now with the maximum of2^juniformly subgaussian random variables (theirϕ2norms are bounded by a constant which depends only on the distribution ofX1), this leads to

P2≤ X

J≤j≤Jn

Cexp

−cj^2β−1¡

≤

∞

X

j=J

Cexp

−cj^2β−1¡

. (62)

(18)

To controlP3, we rst get rid of the residual term by noting that 2

ρ(2^−j)2^j/2 = c

j^β < ε for j ≥J ≥J(ε), uniformly inn. So for J≥J(ε),

P3≤P° An∩n

J≤j≤Jmaxn

0≤k<2max^j 1

ρ(2^−j) max

|l−[k2^−jn]|≤nδ_n

|S_l−S_[k2−jn]| Vn

> εo±

. On the eventA_n we have for anylsuch that|l−[k2^−jn]| ≤nδ_n,

|V_[k2² −jn]−V_l²| V_n² ≤2δ_n. It follows that

P3≤P°

J≤j≤Jmaxn

max

0≤k<2^j max

|l−[k2^−jn]|≤nδn

|Sl−S_[k2−jn]|

|V_[k2² −jn]−V_l²|^1/2 >ερ(2^−j)

√2δn

± . Using the invariance of distributions under translations onk, we get

P₃ ≤ X

J≤j≤Jn

2^jP° max

0<l≤[2nδn]

|S_l| Vl

>ερ(2^−j)

√2δn

±

≤ X

J≤j≤Jn

2^jCexp°

−c2^−jj^2β δnlogn

±

≤ C X

J≤j≤Jn

2^jexp°

− c2^−Jⁿ δnlognj^2β±

.

Now we see that the following convergence rate (stronger than (61)) δn= 1

2^Jⁿlogn = 2^−(logⁿ⁾^γ

logn , with 1

2β < γ <1, is sucient to obtain (55). The proof is complete.

Proof of Corollary 6. As isX1is square integrable,X1is inDAN. The convergence rates (9) and (10) required by Theorem 5 are provided by the two following lemmas, recalling that with our choice (11) ofδ_n, we haven^−ε=o(δ_n)for any ε >0.

Lemma 13. IfE|X1|^2+δ <∞ for someδ >0, then almost surely n^−c max

1≤k≤n

¬

¬ V_k² V_n² −k

n

¬

¬→0, (63)

wherec=δ/(2 + 2δ).

(19)

Proof. By Marcinkiewicz SLLN, if the i.i.d. sequence(Yk)satisesE|Y1|^p <∞ for some 1 ≤ p < 2, then n^−1/p P

k≤nYk−nEY1

¡ goes to 0 almost surely.

Applying this toY1=X₁²andp= 1 +δ/2 gives V_n²

n = 1 +n^1/p−1ε_n, n≥1,

where the random sequence (εn) goes to zero almost surely. Since we assume P(X1= 0)<1, we haveP(∀n≥1, Vn = 0) = 0. On each event{V_n²>0}, we may write witha= 1−1/p,

V_k² V_n² −k

n =k n

°V_k² k

n V_n² −1±

= k

n×k^−aε_k−n^−aε_n 1 +n^−aεn

.

For eachn ≥n0 =n0(ω)large enough, n^−aεn >−1/2. Now for an exponent 0< b <1 to be precised later, we have

¬

¬ V_k² V_n² −k

n

¬

¬≤4n^b−1sup

i≥1

|εi|, for n≥n0, 1≤k≤n^b and

¬

¬ V_k² V_n² −k

n

¬

¬≤4n^−absup

i≥n^b

|εi|, for n≥n0, n^b< k≤n.

The optimal choice ofb given by1−b=ableads to the announced conclusion withc=a/(a+ 1) =δ/(2 + 2δ).

Lemma 14. IfE|X1|^2+δ <∞ for someδ >0, then almost surely n^d max

1≤k≤n

X_k²

V_n² →0, (64)

for anyd < δ/(2 +δ).

Proof. We use the same trick as in O'Brien [16] p. 542. For any positiveεwe have (noting the key role of i.o. in the following inequalities)

P°

1≤k≤nmax X_k²

V_n² > εn^−d, i.o.±

≤ P° V_n²< n

2, i.o.± +P°

1≤k≤nmax X_k²> n

2εn^−d, i.o.±

= 0 +P°

X_n² >n

2εn^−d, i.o.± . Now observe that

∞

X

n=1

P°

X_n²> n 2εn^−d±

≤°2 ε

±1+δ/2

E|X1|^2+δ

∞

X

n=1

1 n(1−d)(1+δ/2). For anydsuch that (1−d)(1 +δ/2)>1, Borel-Cantelli's Lemma leads to

P°

1≤k≤nmax X_k²

V_n² > εn^−d, i.o±

= 0.

Asεis arbitrary, the result is proved.

(20)

Acknowledgement: The rst author would like to thank Vidmantas Bentkus for a number of stimulating discussions on the invariance principle for self- normalized sums.

References

[1] Araujo, A. and Giné, E. (1980) The Central Limit Theorem for Real and Banach Valued Random Variables. Wiley, New York.

[2] Bentkus, V. and Götze, F. (1996) The Berry-Esseen bound for student's statistic Annal. Probab. 24, 491503

[3] Billingsley, P. (1968) Convergence of probability measures. Wiley, New York.

[4] Christiakov, G. P. and Götze, F. (1999) Moderate deviations for self- normalized sums. Preprint 99-048, SFB 343, Univ. Bielefeld.

[5] Chuprunov, A. N. (1997) On convergence of random polygonal lines under Student-type normalizations. Theory of Probab. and Appl. 41, 756761.

[6] Ciesielski, Z. (1960). On the isomorphisms of the spacesHα andm. Bull.

Acad. Pol. Sci. Ser. Sci. Math. Phys. 8, 217222.

[7] Erickson, R. V. (1981). Lipschitz smoothness and convergence with appli- cations to the central limit theorem for summation processes. Ann. Probab.

9, 831851.

[8] Egorov, V. A. (1997) On the asymptotic behavior of self-normalized sums of random variables. Theory of Probab. and Appl. 41, 542548.

[9] Giné, E., Götze, F. and Mason, D. M. (1997) When is the Student t-statistic asymptotically standard normal? Ann.Probab. 25, 15141531.

[10] Grin, P. S. and Kuelbs, J. (1989) Self-normalized laws of the iterated logarithm. Ann. Probab. 17, 15711601.

[11] Grin, P. S. and Mason, D. M. (1991) On the asymptotic normality of self-normalized sums. Proc. Cambridge Phil. Soc. 109, 597610.

[12] Hamadouche, D. (1998). Invariance principles in Hölder spaces. Portugaliae Mathematica, 57 (2000), 127151.

[13] Kerkyacharian, G. and Roynette, B. (1991). Une démonstration simple des théorèmes de Kolmogorov, Donsker et Ito-Nisio. C. R. Acad. Sci. Paris Série I 312, 877882.

[14] Lamperti, J. (1962). On convergence of stochastic processes. Trans. Amer.

Math. Soc. 104, 430435.

[15] Logan, B. F., Mallows, C. L., Rice, S. O. and Shepp, L. A. (1973) Limit distributions of self-normalized sums. Ann. Probab. 1, 788809.

[16] O'Brien, G. L. (1980) A limit theorem for sample maxima and heavy branches in Galton-Watson trees. J. Appl. Probab. 17, 539545.

(21)

[17] Ra£kauskas, A. and Suquet, Ch. (1999). Central limit theorem in Hölder spaces. Probab. Math. Stat. 19, 155174.

[18] Ra£kauskas, A. and Suquet, Ch., Random elds and central limit theorem in some generalized Hölder spaces, Prob. Theory and Math. Statist., Pro- ceedings of the 7th Vilnius Conference (1998), B. Grigelionis et al. (Eds), 599616, TEV Vilnius and VSP Utrecht (1999).

[19] Ra£kauskas, A. and Suquet, Ch., On the Hölderian functional central limit theorem for i.i.d. random elements in Banach space, Pub. IRMA Lille 50-III, to appear in Proceedings of the Fourth Hungarian Colloquium on Limit Theorems of Probability and Statistics (1999).

[20] Ra£kauskas, A. and Suquet, Ch., Convergence of self-normalized partial sums processes in C[0,1] and D[0,1], Pub. IRMA Lille (2000) 53-VI, preprint.

[21] Shao, Q.-M. (1997) Self-normalized large deviations, Ann. Probab. 25, 285 328.

[22] Shao, Q.-M. (1999) A Cramér type large deviation result for Student's t-statistic. J. Theoret. Probab. 12, 387-398.

Invariance principles for adaptive self-normalized partial sums processes∗