On the Almost Sure Central Limit Theorem for Vector Martingales: Convergence of Moments and Statistical Applications

(1)

HAL Id: inria-00348138

https://hal.inria.fr/inria-00348138

Submitted on 17 Dec 2008

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Martingales: Convergence of Moments and Statistical Applications

Bernard Bercu, Peggy Cénac, Guy Fayolle

To cite this version:

Bernard Bercu, Peggy Cénac, Guy Fayolle. On the Almost Sure Central Limit Theorem for Vector Martingales: Convergence of Moments and Statistical Applications. [Research Report] RR-6780, INRIA. 2008, pp.26. �inria-00348138�

(2)

a p p o r t

d e r e c h e r c h e

SN0249-6399ISRNINRIA/RR--6780--FR+ENG

Thème NUM

On the Almost Sure Central Limit Theorem for Vector Martingales : Convergence of Moments and

Statistical Applications

Bernard Bercu — Peggy Cénac — Guy Fayolle

N° 6780

December 2008

(3)

(4)

Centre de recherche INRIA Paris – Rocquencourt

Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay Cedex

Statistical Applications

Bernard Bercu^∗ , Peggy Cénac ^† , Guy Fayolle ^‡ Thème NUM — Systèmes numériques

Equipe-Projet Imara´

Rapport de recherche n^°6780 — December 2008 — 23 pages

Abstract: We investigate the almost sure asymptotic properties of vector martingale transforms. Assuming some appropriate regularity conditions both on the increasing process and on the moments of the martingale, we prove that normalized moments of any even order converge in the almost sure cental limit theorem for martingales. A conjecture about almost sure upper bounds under wider hypotheses is formulated. The theoretical results are supported by examples borrowed from statistical applications, including linear autoregressive models and branching processes with immigration, for which new asymptotic properties are established on estimation and prediction errors.

Key-words: Almost sure central limit theorem, vector martingale, moment, stochastic regression.

∗ Université Bordeaux 1, Institut de Mathématiques de Bordeaux UMR 5251, and INRIA Bordeaux Sud-Ouest, 351 cours de la Libération, 33405 Talence Cedex, France.

Bernard.Bercu@math.u-bordeaux1.fr

† Universit´e de Bourgogne, Institut de Math´ematiques de Bourgogne, UMR 5584, 9 rue Alain Savary, BP 47870, 21078 Dijon Cedex, France. Peggy.Cenac@u-bourgogne.fr

‡ INRIA CR Paris-Rocquencourt, Domaine de Voluceau, BP 105, 78153 Le Chesnay Cedex, France. Guy.Fayolle@inria.fr

(5)

moments et applications statistiques

Résumé : On étudie dans ce rapport des propriétés de convergence presque sûre de transformées de martingales vectorielles. Sous certaines conditions d’existence de moments et de régularité du processus croissant, on montre en particulier la convergence des moments normalisés de tout ordre pair dans le théorème central limite presque sûr pour les martingales vectorielles. On formule une conjecture de borne presque sûre, sous des hypothèses moins res- trictives et couvrant des familles plus vastes de processus. Enfin, on applique ces résultats des exemples issus d’applications statistiques, notamment les modèles autorégressifs linéaires et certains processus de branchement avec immigration, ce qui permet d’établir de nouvelles propriétés asymptotiques sur les erreurs d’estimation et de prédiction.

Mots-clés : Théorème central limite presque sûr, martingale vectorielle, moment, regression stochastique.

(6)

1 Introduction

Let (X_n) be a sequence of real independent identically distributed random variables withE[X_n] = 0 andE[X_n²] =σ² and denote

S_n= Xn k=1

X_k.

It follows from the ordinary central limit theorem (CLT) that S_n

√n

−→ NL (0, σ²),

which implies, for any bounded continuous real functionh

nlim→∞Eh hS_n

√n i=

Z

Rh(x)dG(x)

whereG stands for the Gaussian measureN(0, σ²). Moreover, by the cele- brated almost sure central limit theorem (ASCLT), the empirical measure

G_n= 1 logn

Xn k=1

1 kδSk√

k

satisfies

G_n=⇒G a.s.

In other words, for any bounded continuous real functionh

nlim→∞

1 logn

Xn k=1

1 khS_k

√k =

Z

Rh(x)dG(x) a.s.

The ASCLT was simultaneously proved by Brosamler [3] and Schatte [15]

and, in its present form, by Lacey and Phillip [10]. In contrast with the wide literature on the ASCLT for independent random variables, very few references are available on the ASCLT for martingales apart the recent work of Bercu and Fort [1, 2] and the important contribution of Chaabane and Maaouia [5,4] and Lifshits [13,14]. The ASCLT for martingales is as follows.

Let (ε_n) be a martingale difference sequence adapted to a filtrationF= (Fn) with E[ε²_n+1|Fn] = σ² a.s. Let (Φ_n) be a sequence of random variables adapted to Fand denote by (Mn) the real martingale transform

M_n= Xn k=1

Φ_k₋₁ε_k.

(7)

We also need to introduce the explosion coefficientf_nassociated with (Φ_n) f_n= Φ²_n

s_n where s_n=

Xn k=0

Φ²_k.

As soon as (fn) goes to zero a.s. and under reasonable assumption on the conditional momments of (ε_n), the ASCLT for martingales asserts that the empirical measure

G_n= 1 logs_n

Xn k=1

f_kδ√Mk_sk

−1

=⇒G a.s. (1.1)

In other words, for any bounded continuous real functionh

nlim→∞

1 logs_n

Xn k=1

f_kh M_k

√s_k₋₁ =

Z

Rh(x)dG(x) a.s. (1.2) It is quite natural to overcome the case of unbounded functions h. To be more precisely, one might wonder if convergence (1.2) remains true for unbounded functionsh. It has been recently shown in [1, 2] that, whenever (ε_n) has a finite conditional moment of order >2p , then the convergence (1.2) still holds for any continuous real functionh such that|h(x)| ≤x^2p. Theorem 1.1(Convergence of moments in the scalar case). Assume that (εn) is a martingale difference sequence such that E[ε²_n+1|Fn] = σ² a.s. and satisfying for some integer p≥1 and for some real number a >2p,

sup

n≥0E[|εn+1|^a|Fn]<∞ a.s.

If the explosion coefficient (f_n) tends to zero a.s., then

nlim→∞

1 logs_n

Xn k=1

f_kM_k² s_k₋₁

p

= σ^2p(2p)!

2^pp! a.s. (1.3)

The limit (1.3) is exactly the moment of order 2pof the Gaussian distribution N(0, σ²). The purpose of the present paper is to extend the results of [1] to vector martingale transforms, which is strongly needed in various applications arising in statistics and signal processing.

Let (M_n) be a square integrable vector martingale with values inR^d, adapted to a filtration F. Its increasing process consists of the sequence (hMin) of symmetric, positive semi-definite square matrices of orderdgiven by

hMin= Xn k=1

E[(M_k−M_k₋₁)(M_k−M_k₋₁)^t|Fk−1].

(8)

A first version of the ASCLT for discrete vector martingales was proposed in [5, 6], under fairly restrictive assumptions on the increasing process (hMin).

Hereafter, our goal is to establish the convergence of moments of even order in the ASCLT for (M_n) under suitable assumptions on the behaviour of (hMin). We shall work in the general setting of vector martingales transforms (M_n), which can be written as

M_n=M₀+ Xn k=1

Φ_k₋₁ε_k

where M₀ can be taken arbitrary and (Φ_n) denotes a sequence of random vectors of dimensiond, adapted to F. We also introduce

S_n= Xn k=0

Φ_kΦ^t_k+S (1.4)

where S is a fixed deterministic matrix, symmetric and positive definite.

One can obviously see that if E[ε²_n+1|Fn] = σ² a.s., then the increasing process of (M_n) takes the form hMin = σ²S_n₋₁. The explosion coefficient associated with (Φ_n) is now given by

f_n= Φ^t_nS_n⁻¹Φ_n= d_n−d_n₋₁ dn

(1.5) whered_n= det(S_n).

After this short survey, the paper will be organized as follows. The main theoretical result for vector martingale transforms is given in Section 2, at the end of which a quite plausible interesting conjecture is formulated, involving minimal assumptions. The final Section 3 proposes some statistical applications to estimation and prediction errors in linear autoregressive models and in branching processes with immigration.

2 On the convergence of moments

As mentioned above, our main result is given in theorem 2.1 and extends theorem 1.1 to the vector case. By the way, in mathematics, the difficulty of the problem is almost always a strictly increasing function of the dimension of some underlying space: it is also the case here !

Theorem 2.1. Let (ε_n) be a martingale difference sequence satisfying the homogeneity condition E[ε²_n+1|Fn] =σ² a.s. and such that, for some integer p≥1 and some real number a >2p,

(H_p) sup

n≥0E

|ε_n+1|^a Fn

<∞ a.s.

(9)

In addition, assume that the explosion coefficient f_n tends to zero a.s. and that there exists a positive random sequence (α_n) increasing to infinity and an invertible symmetric matrix L, such that

nlim→∞

1

α_nS_n=L a.s. (2.1)

Then, the following limits hold almost surely

nlim→∞

1 logd_n

Xn k=1

f_k M_k^tS_k⁻₋¹₁M_kp

=ℓ(p) =dσ^2p

p−1

Y

j=1

d+ 2j

, (2.2)

nlim→∞

1 logd_n

Xn k=1

M_k^tS_k⁻₋¹₁M_kp

− M_k^tS_k⁻¹M_kp

=λ(p) = p

dℓ(p). (2.3) Remark 1. The limitℓ(p) corresponds exactly to the moment of order 2pof the norm of a gaussian vectorN(0, σ²I_d), so that theorem 2.1 shows indeed the convergence of moments of order 2pin the ASCLT for vector martingales.

Here the deterministic normalization given in [5] has been replaced by the natural random normalization given by the increasing process.

Remark 2.The convergence hypothesis (2.1) clearly implies f_n→ 0 a.s., if and only ifα_n∼α_n₋₁ a.s. As a matter of fact, we deduce from (2.1) that

nlim→∞

d_n

α^d_n = detL >0 a.s.

Proof. For the sake of shortness, we shall define the following variables.

V_n = M_n^tS_n⁻₋¹₁M_n, (2.4) ϕ_n = α⁻_n¹Φ^t_nL⁻¹Φ_n,

v_n = α⁻_n₋¹₁M_n^tL⁻¹M_n.

First of all, by using the symmetry ofL, the convergence (2.1) ensures that f_n = ϕ_n+o ϕ_n

a.s. (2.5)

V_n = v_n+o v_n

a.s. (2.6)

For we have

f_n=ϕ_n+α⁻_n¹Φ^t_nL⁻^1/2 α_nL^1/2S_n⁻¹L^1/2−I

L⁻^1/2Φ_n.

The matrix R_n = α_nL^1/2S_n⁻¹L^1/2 −I is symmetricand denoting by ρ_n its spectral radius, we can write

α⁻_n¹Φ^t_nL⁻^1/2R_nL⁻^1/2Φ_n

≤ρ_n ϕ_n,

(10)

and ρ_n converges to 0 almost surely, which leads to (2.5). The equation (2.6) is proved in the same way from the decomposition

Vn=vn+M_n^t S_n⁻₋¹₁−α⁻_n₋¹₁L⁻¹ Mn. Hence, by (2.6), Vn^p = vn^p +o vn^p

a.s. In order to find the limit (2.2), il suffices, by Toeplitz lemma, to study the convergence

nlim→∞

1 logd_n

Xn k=1

f_kV_k^p= lim

n→∞

1 logd_n

Xn k=1

ϕ_kv_k^p. (2.7) Theorem 2.1 will be proved by induction with respect to p ≥ 1, as in [1]

in the scalar case. The first step consists in writing a recursive relation for M_n^tL⁻¹M_n.

Let

β_n= tr L⁻^1/2S_nL⁻^1/2

, γ_n= β_n−β_n₋₁ β_n , δ_n= M_n^tL⁻¹Φ_n

β_n , m_n=β_n⁻₋¹₁M_n^tL⁻¹M_n.

(2.8)

According to the definition of (M_n), the following decomposition holds M_n+1^t L⁻¹M_n+1=M_n^tL⁻¹M_n+ 2ε_n+1Φ^t_nL⁻¹M_n+ε²_n+1Φ^t_nL⁻¹Φ_n, so that

m_n+1 = (1−γ_n)m_n+ 2δ_nε_n+1+γ_nε²_n+1. (2.9) The theorem relies essentially on the following lemma.

Lemma 2.2. Under the assumptions of theorem 2.1, we have

nlim→∞

1 logd_n

Xn k=1

γ_km^p_k = ℓ(p)

d^p+1 a.s. (2.10)

In addition, if g_n=M_n^tS_n⁻₋¹₁Φ_n, we also have

nlim→∞

1 logd_n

Xn k=1

(1−f_k)g²_km^p_k⁻¹= λ(p)

pd^p⁻¹ a.s. (2.11) Proof. Raising equality (2.9) to the powerp implies

m^p_n+1 = Xp k=0

Xk ℓ=0

2^k⁻^ℓC_p^kC_k^ℓ γ_n^ℓδ_n^k⁻^ℓ (1−γ_n)m_np−k

ε^k+ℓ_n+1. (2.12)

After some straightforward simplifications, we obtain the relation

m^p_n+1+An(p) =m^p₁+Bn+1(p) +Wn+1(p), (2.13)

(11)

where

An(p) = Xn k=1

β_k⁻^p β_k^p−β^p_k₋₁

m^p_k, Bn+1(p) =

2p−1

X

ℓ=1

Xn k=1

b_k(ℓ)ε^ℓ_k+1,

Wn+1(p) = Xn k=1

γ_k^pε^2p_k+1. For 1≤ℓ≤p−1, we have

b_k(ℓ) =

⌊Xℓ/2⌋ j=0

2^ℓ⁻^2jC_p^ℓ⁻^jC_ℓ^j₋_jγ_k^jδ^ℓ_k⁻^2j (1−γ_k)m_kp−ℓ+j

, while, forp≤ℓ≤2p−1,

b_k(ℓ) =

⌊Xℓ/2⌋ j=ℓ−(p−1)

2^ℓ⁻^2jC_p^ℓ⁻^jC_ℓ^j₋_jγ_k^jδ^ℓ_k⁻^2j (1−γ_k)m_kp−ℓ+j

+C_p^ℓ⁻^p2^2p⁻^ℓ−δ^2p_k ⁻^ℓγ_k^ℓ⁻^p.

In order to take out useful information aboutAn(p), it is necessary to study the asymptotic behaviourWn+1(p),Bn+1(p) and m^pn.

The casep= 1. Remarking that logβ_n∼Pn

k=1γ_k, Chow’s lemma [7, page 22] implies

nlim→∞

1

logβ_nWn+1(1) =σ² a.s.

Applying the strong law of large numbers for martingales and the inequality δ²_n≤γ_nm_n, we getBn+1(1) =o An(1)

a.s. In addition, from relation (2.30) in [17], it follows thatm_n+1=o(logβ_n) a.s. Consequently, by (2.13),

nlim→∞

1

logβ_nAn(1) =σ² a.s.

But the basic convergence assumption (2.1) implies immediately

nlim→∞

β_n

γ_n =d a.s., so that logd_n ∼dlogβ_n a.s. Hence

nlim→∞

1

logd_nAn(1) = σ²

d a.s.,

which establishes (2.10).

(12)

As for the proof of (2.11), one can proceed in the same way, starting from the decomposition

V_n+1 = M_n^tS_n⁻¹M_n+ 2ε_n+1Φ^t_nS_n⁻¹M_n+ε²_n+1f_n

= h_n+ 2ε_n+1g_n+ε²_n+1f_n. which, for all n≥1, leads to

V_n+1+A_n=V₁+B_n+1+W_n+1, (2.14) where

A_n^def= Xn k=1

a_k(1), B_n+1^def= 2 Xn k=1

ε_k+1g_k, W_n+1 ^def= Xn k=1

ε²_k+1f_k.

By Riccati’s formula,

g²_n= (1−f_n)a_n(1), (2.15) and, hence, coupling (2.15) with the strong law of large numbers for martingales,

B_n+1=o A_n a.s.

On the other hand,m_n=o logβ_n

, which, with (2.6), gives the almost sure estimateV_n=o logd_n

. Now, since Pn

k=1f_k∼logdn, Chow’s lemma implies

nlim→∞ logd_n₋1

W_n+1 =σ² a.s.,

and to conclude the proof of Lemma 2.2 forp= 1, it suffices to divide (2.14) by logdn, letting n→ ∞.

The casep≥2. First, using again Chow’s lemma, we can write Wn+1(p) =o logdn

a.s.

Also, we shall show that Bn+1(p) = p

d^p+1ℓ(p) logd_n+o logd_n

+o An(p)

a.s. (2.16) Setting, for 1≤ℓ≤2p−1,

ε^ℓ_n+1 ^def= en+1(ℓ) +E[ε^ℓ_n+1|Fn] =en+1(ℓ) +σn(ℓ), (2.17) we write Pn

k=1b_k(ℓ)ε^ℓ_k+1=Cn+1(ℓ) +Dn(ℓ), with Cn+1(ℓ)^def=

Xn k=1

b_k(ℓ)e_k+1(ℓ), and Dn(ℓ)^def= Xn k=1

b_k(ℓ)σ_k(ℓ).

(13)

First, for anyℓsuch that 1≤ℓ≤p−1, using again the strong law of large numbers and equation (2.30) of [17], we have

Cn+1(ℓ) =o logd_n a.s.

Suppose 3≤ℓ≤p−1. From Hlder’s inequality and the assumptions on the moments of ε_n, it follows that, for all 1≤ j ≤ 2p−1, |σ_n(j)| is bounded, which implies

Dn(ℓ)

=OXⁿ

k=1

γ_k^ℓ/2m^p_k⁻^ℓ/2 .

For even ℓ, the induction assumption leads toDn(ℓ) =o(logd_n) a.s. When ℓis odd, Cauchy Schwarz inequality and the induction assumption yield

|Dn(ℓ)|=O Xⁿ

k=1

γ_km^p_k⁻¹1/2 Xn k=1

γ_k^ℓm^p_k⁻^ℓ1/2

=o logd_n a.s.

Suppose nowp≤ℓ≤2p−1. It is easy to obtain

|Dn(ℓ)|=OXⁿ

k=1

γ_k^ℓ/2m^p_k⁻^ℓ/2 a.s.

Now, from the induction assumption, we get, for any integerℓ6= 2, Dn(ℓ) =o logd_n

a.s.

It remains to study Cn+1(ℓ). By Chow’s lemma, we have the almost sure equality

Cn+1(ℓ) =o ν_n(ℓ)

, with ν_n(ℓ) = Xn k=1

|b_k(ℓ)|^2p/ℓ =OXⁿ

k=1

γ_k^pm^(2p_k ⁻^ℓ)p/ℓ . For ℓ > p, we apply Hlder’s inequality with exponents ℓ/p and ℓ/(ℓ−p).

Then, ν_n(ℓ) = o(logd_n) a.s. In the particular case p = ℓ, we get by the strong law of large numbers

|Cn+1(ℓ)|²=O τ_n(p) logτ_n(p)

with τ_n(p) = Xn k=1

b_k(p)²=OXⁿ

k=1

γ_k^pm^p_k , which leads to|Cn+1(ℓ)|=o(logd_n) a.s., since, from equation (2.30) of [17], m^p_k =o (logd_n)^δ

, for 0< δ <1. As for the last term Dn(2), one needs to study, forp≥3, the quantity

Dn(2)

logd_n = σ² logd_n

Xn k=1

X1 j=0

2²⁻^2jC_p²⁻^jC₂^j₋_jγ_k^jδ²_k⁻^2jm^p_k⁻^2+j,

= 2p(p−1)σ² logd_n

Xn k=1

δ_k²m^p_k⁻²+ pσ² logd_n

Xn k=1

γ_km^p_k⁻¹.

(14)

It is easy to establish

g²_n= Φ^t_nS_n⁻₋¹₁M_n2

=d² δ²_n+o γ_nm_n

. (2.18)

Then, the induction assumption and Toeplitz lemma imply

nlim→∞

1 logdn

Xn k=1

δ_k²m^p_k⁻²= lim

n→∞

1 d²logdn

Xn k=1

a_k(1)m^p_k⁻² = λ(p−1) (p−1)d^p a.s.

Thus, we obtain

nlim→∞

Dn(2) logdn

=ℓ(p−1)2p(p−1)σ² d^p+1 +pσ²

d^p

= p

d^p+1ℓ(p) a.s., [this formula being also valid forp= 2] which proves (2.16).

On the other hand, still applying equation (2.30) of [17], we derive the estimatem^pn=o(logdn). Thus,

nlim→∞

1

logdnAn(p) = p

d^p+1ℓ(p) a.s.

Since

βn^p−β_n^p₋₁ β_n^p =γ_n

p−1

X

q=0

β_n₋₁ β_n

p−1−q

∼pγ_n a.s., (2.19) we get finally

nlim→∞

1 logd_n

Xn k=1

γ_km^p_k= lim

n→∞

1

plogd_nAn(p) = ℓ(p) d^p+1, and the proof of (2.10) is terminated.

As for the second part of Lemma 2.2, i.e. equation (2.11), we could proceed along similar lines, via the equality

V_n+1^p = Xp k=0

Xk ℓ=0

2^k⁻^ℓC_p^kC_k^ℓ f_n^ℓg_n^k⁻^ℓh^p_n⁻^kε^k+ℓ_n+1, withg_n= Φ^t_nS⁻_n₋¹₁M_n and h_n=M_n^tS_n⁻¹M_n.

The proof of theorem 2.1 is completed as relations (2.10) and (2.11) are clearly direct consequences of (2.2) and (2.3), respectively. Indeed, since β_n∼dα_n, we have almost surely

V_n ∼dm_n, and f_n∼dγ_n,

(15)

hence convergence (2.10) immediately yields (2.2). Moreover in the particular case p = 1, the second convergence (2.11) is exactly (2.3). Now, for p≥2, the elementary expansion

x^p−y^p = (x−y)x^p⁻¹

p−1

X

q=0

y x

p−1−q

(2.20) leads to

a_n(p) = M_n^tS_n⁻₋¹₁M_np

− M_n^tS_n⁻¹M_np

=an(1)V_n^p⁻¹

p−1

X

q=0

Vn−an(1) V_n

p−1−q

.

(2.21)

Riccati’s formula yields

a_n(1) = (1−f_n)M_n^tS_n⁻₋¹₁Φ_nΦ^t_nS_n⁻₋¹₁M_n

≤ (1−fn) tr(S_n⁻₋^1/2₁ ΦnΦ^t_nS_n⁻₋^1/2₁ )Vn

≤ f_nV_n, hencea_n(1) =o(V_n) and, by (2.21),

a_n(p)∼pa_n(1)V_n^p⁻¹ a.s., so that finally

nlim→∞

1 logdn

Xn k=1

a_k(p) = lim

n→∞

pd^p⁻¹ logdn

Xn k=1

a_k(1)m^p_k⁻¹=λ(p) a.s. (2.22)

In most of statistical applications encountered so far, the convergence assumption (2.1) is satisfied. However, this technical hypothesis somehow cir- cumvents the vector problem, which in its full generality is not yet solved.

Indeed, (2.1) entails that all eigenvalues of the matrixS_ngrow to infinity at the same speedα_n. Thus, our method of proof as some features in common with the scalar case. Hopefully, we should be able to establish the following result, stated for the moment as a conjecture, without assuming (2.1).

Conjecture 2.3.Let (ε_n) be a martingale difference sequence satisfying the homogeneity condition E[ε²_n+1|Fn] = σ² a.s. and assumption (H_p) intro- duced in theorem 2.1, for some integer p≥1. Then, we have

Xn k=1

f_k M_k^tS_k⁻₋¹₁M_kp

=O logd_n

a.s.

Xn k=1

M_k^tS_k⁻₋¹₁M_kp

− M_k^tS_k⁻¹M_kp

=O logd_n

a.s.