www.imstat.org/aihp 2014, Vol. 50, No. 2, 655–677
DOI:10.1214/12-AIHP506
© Association des Publications de l’Institut Henri Poincaré, 2014
Comparison between two types of large sample covariance matrices
Guangming Pan
Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371.
E-mail: gmpan@ntu.edu.sg
Received 16 March 2011; revised 18 June 2012; accepted 21 June 2012
Abstract. Let $\{X_{ij}\}$, $i,j=1,2,\ldots$, be a double array of independent and identically distributed (i.i.d.) real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$. Consider sample covariance matrices (with/without empirical centering) $\mathbf{S}=\frac{1}{n}\sum_{j=1}^n(\mathbf{s}_j-\bar{\mathbf{s}})(\mathbf{s}_j-\bar{\mathbf{s}})^T$ and $S=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j\mathbf{s}_j^T$, where $\bar{\mathbf{s}}=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j$ and $\mathbf{s}_j=\mathbf{T}_n^{1/2}(X_{1j},\ldots,X_{pj})^T$, with $(\mathbf{T}_n^{1/2})^2=\mathbf{T}_n$ a non-random symmetric non-negative definite matrix. It is proved that the central limit theorems of eigenvalue statistics of $\mathbf{S}$ and $S$ are different as $n\to\infty$ with $p/n$ approaching a positive constant. Moreover, it is also proved that such a difference is not observed in the average behavior of eigenvectors.
Résumé. Let $\{X_{ij}\}$, $i,j=1,2,\ldots$, be a double array of independent and identically distributed (i.i.d.) real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$. Consider the following empirical covariance matrices (with/without empirical centering): $\mathbf{S}=\frac{1}{n}\sum_{j=1}^n(\mathbf{s}_j-\bar{\mathbf{s}})(\mathbf{s}_j-\bar{\mathbf{s}})^T$ and $S=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j\mathbf{s}_j^T$, with $\bar{\mathbf{s}}=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j$ and $\mathbf{s}_j=\mathbf{T}_n^{1/2}(X_{1j},\ldots,X_{pj})^T$, where $(\mathbf{T}_n^{1/2})^2=\mathbf{T}_n$ is a deterministic non-negative definite matrix. We prove that, in the asymptotic regime $n\to\infty$ with $p/n$ converging to a positive constant, the central limit theorem for the statistics of $\mathbf{S}$ differs from that for $S$. Moreover, we show that this difference in behavior is not observed in the average behavior of the eigenvectors.
MSC: 15A52; 60F15; 62E20; 60F17
Keywords: Central limit theorems; Eigenvectors and eigenvalues; Sample covariance matrix; Stieltjes transform; Strong convergence
1. Introduction
Let $\{X_{ij}\}$, $i,j=1,2,\ldots$, be a double array of independent and identically distributed (i.i.d.) real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$. Write $\mathbf{x}_j=(X_{1j},\ldots,X_{pj})^T$, $\mathbf{s}_j=\mathbf{T}_n^{1/2}\mathbf{x}_j$ and $\mathbf{X}_n=(\mathbf{s}_1,\ldots,\mathbf{s}_n)$, where $(\mathbf{T}_n^{1/2})^2=\mathbf{T}_n$ is a non-random symmetric non-negative definite matrix. It is well known that the sample covariance matrix is defined by
$$\mathbf{S}=\frac{1}{n}\sum_{j=1}^n(\mathbf{s}_j-\bar{\mathbf{s}})(\mathbf{s}_j-\bar{\mathbf{s}})^T,$$
where $\bar{\mathbf{s}}=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j=\mathbf{T}_n^{1/2}\bar{\mathbf{x}}$ with $\bar{\mathbf{x}}=\frac{1}{n}\sum_{j=1}^n\mathbf{x}_j$. It lies at the root of many methods in multivariate analysis (see [1]). However, the sample covariance matrix commonly used in random matrix theory is
$$S=\frac{1}{n}\sum_{j=1}^n\mathbf{s}_j\mathbf{s}_j^T.$$
Since the matrix $S$ has been well studied in the literature (see [5]), a natural question is whether the asymptotic results for eigenvalues and/or eigenvectors of $S$ apply to the matrix $\mathbf{S}$ as well. This paper attempts to address it.
First, observe that
$$\mathbf{S}=S-\bar{\mathbf{s}}\bar{\mathbf{s}}^T \tag{1.1}$$
and thus, by the rank inequality (see Theorem A.44 in [5]), there is no difference when only the limiting empirical spectral distribution (ESD) of the eigenvalues is concerned. Here the ESD of a matrix $\mathbf{B}$ is defined by
$$F^{\mathbf{B}}(x)=\frac{1}{p}\sum_{i=1}^p I(\mu_i\le x), \tag{1.2}$$
where $\mu_1,\ldots,\mu_p$ are the eigenvalues of $\mathbf{B}$. When $F^{\mathbf{T}_n}$ converges weakly to a non-random distribution $H$ and $p/n\to c>0$, Marcenko and Pastur [14], Yin [23] and Silverstein [19] proved that, with probability one, $F^{(1/n)\mathbf{X}_n^T\mathbf{X}_n}(x)$ converges in distribution to a non-random distribution function $\underline{F}_{c,H}(x)$ whose Stieltjes transform $\underline{m}(z)=m_{\underline{F}_{c,H}}(z)$ is, for each $z\in\mathbb{C}^+=\{z\in\mathbb{C}:\Im z>0\}$, the unique solution to the equation
$$\underline{m}=-\left(z-c\int\frac{t\,dH(t)}{1+t\underline{m}}\right)^{-1}. \tag{1.3}$$
Here the Stieltjes transform $m_F(z)$ of any probability distribution function $F(x)$ is defined by
$$m_F(z)=\int\frac{1}{x-z}\,dF(x),\qquad z\in\mathbb{C}^+. \tag{1.4}$$
Noting the relationship between the spectra of $S$ and $n^{-1}\mathbf{X}_n^T\mathbf{X}_n$, we have
$$\underline{m}(z)=-\frac{1-c}{z}+c\,m(z), \tag{1.5}$$
where $m(z)$ is the Stieltjes transform of $F_{c,H}(x)$, the limit of $F^S$.
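As a hedged numerical illustration (not part of the paper): when $T_n = I$, $H$ is a point mass at $1$ and equation (1.3) reduces to $\underline{m} = -1/(z - c/(1+\underline{m}))$. The sketch below, under the assumptions of Gaussian entries and $T_n = I$, solves this fixed-point equation and compares it with the empirical Stieltjes transform of $(1/n)X_n^TX_n$; it also checks the exact finite-$n$ analogue of relation (1.5).

```python
import numpy as np

# Illustration only (assumes T_n = I, Gaussian entries): (1.3) with H a point
# mass at 1 reads m = -1/(z - c/(1 + m)); solve it by fixed-point iteration.
rng = np.random.default_rng(0)
p, n = 400, 800
c = p / n
z = 1.0 + 2.0j

m = -1.0 / z
for _ in range(1000):
    m = -1.0 / (z - c / (1.0 + m))

# Empirical Stieltjes transform of the n x n companion matrix (1/n) X^T X.
X = rng.standard_normal((p, n))
eig_companion = np.linalg.eigvalsh(X.T @ X / n)
m_emp = np.mean(1.0 / (eig_companion - z))
assert abs(m - m_emp) < 0.05

# Finite-n analogue of (1.5): the companion spectrum adds n - p zeros to the
# spectrum of S = (1/n) X X^T, so m_n(z) = -(1 - c_n)/z + c_n m_S(z) exactly.
eig_S = np.linalg.eigvalsh(X @ X.T / n)
m_S = np.mean(1.0 / (eig_S - z))
assert abs(m_emp - (-(1 - c) / z + c * m_S)) < 1e-8
```

The point $z$, the dimensions and the tolerances are arbitrary choices for the demonstration.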
Particularly, when the population matrix $\mathbf{T}_n$ is the identity matrix, the limiting ESD of $S$ is Marcenko and Pastur's law $F_c(x)$ (see [14] and [12]), which has density
$$p_c(x)=\begin{cases}(2\pi cx)^{-1}\sqrt{(b-x)(x-a)},& a\le x\le b,\\ 0,&\text{otherwise},\end{cases}$$
and has point mass $1-c^{-1}$ at the origin if $c>1$, where $a=(1-\sqrt{c})^2$ and $b=(1+\sqrt{c})^2$.
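The rank-one relation (1.1) can be checked numerically. The following sketch (an illustration under the assumptions $T_n=I$, $\mu=0$ and Gaussian entries; not from the paper) confirms that the ESDs of the centered and uncentered matrices differ by at most $1/p$ in Kolmogorov distance, in line with the rank inequality, and that both spectra concentrate on the Marcenko–Pastur support $[a,b]$.

```python
import numpy as np

# Illustration only (assumes T_n = I, mu = 0, Gaussian entries).
rng = np.random.default_rng(1)
p, n = 500, 1000
c = p / n
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

X = rng.standard_normal((p, n))
S = X @ X.T / n                              # without empirical centering
Xc = X - X.mean(axis=1, keepdims=True)
Sc = Xc @ Xc.T / n                           # with empirical centering
ev, evc = np.linalg.eigvalsh(S), np.linalg.eigvalsh(Sc)

# Rank inequality: S - Sc = s_bar s_bar^T has rank one, so the two empirical
# spectral distributions differ by at most 1/p uniformly.
grid = np.linspace(0.0, 4.0, 401)
F_S = (ev[:, None] <= grid).mean(axis=0)
F_Sc = (evc[:, None] <= grid).mean(axis=0)
assert np.max(np.abs(F_S - F_Sc)) <= 1.0 / p + 1e-12

# Both extreme eigenvalues sit near the Marcenko-Pastur edges a and b.
assert abs(ev.max() - b) < 0.1 and abs(evc.max() - b) < 0.1
assert abs(ev.min() - a) < 0.1 and abs(evc.min() - a) < 0.1
```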
Secondly, when $\mathbf{T}_n=\mathbf{I}$, it was reported in [11] that the maximum eigenvalues of $\mathbf{S}$ and $S$ have the same limit, while [22] announced that the minimum eigenvalues of $\mathbf{S}$ and $S$ also share an identical limit. That is,
$$\lambda_{\max}(\mathbf{S})\xrightarrow{a.s.}(1+\sqrt{c})^2 \tag{1.6}$$
and, when $c<1$,
$$\lambda_{\min}(\mathbf{S})\xrightarrow{a.s.}(1-\sqrt{c})^2, \tag{1.7}$$
where $\lambda_{\max}(\mathbf{S})$ and $\lambda_{\min}(\mathbf{S})$ denote, respectively, the maximum and minimum eigenvalues of $\mathbf{S}$. Moreover, it was proved in [10] that the asymptotic distribution of the maximum eigenvalue of $S$, after centering and scaling, is the Tracy–Widom law of order 1 when $S$ has a Wishart distribution $W_p(n,\mathbf{I})$. In this case, note that $n\mathbf{S}$ has the same distribution as $\sum_{j=1}^{n-1}\mathbf{x}_j\mathbf{x}_j^T$ and hence it is not difficult to prove that the maximum eigenvalues of $\mathbf{S}$ and $S$ share the same central limit theorem.
Thirdly, remarkable central limit theorems for the eigenvalues of $S$ were established in [3]. Moreover, the constraint $EX_{11}^4=3$ imposed in [3] was removed in [16]. Since the ESDs of $\mathbf{S}$ and $S$ have the same limit, it seems that they might satisfy the same central limit theorem for eigenvalues. Surprisingly, a detailed investigation shows that they have different central limit theorems. The main reason is that $\bar{\mathbf{s}}$, involved in $\mathbf{S}$, contributes to the central limit theorem as well. Formally, let
$$G_n(x)=p\big[F^{\mathbf{S}}(x)-F_{c_n,H_n}(x)\big],$$
where $c_n=\frac{p}{n}$ and $F_{c_n,H_n}(x)$ denotes the function obtained from $F_{c,H}(x)$ by substituting $c_n$ for $c$ and $H_n$ for $H$.
Theorem 1. Suppose that:

(1) For each $n$, $X_{ij}=X_{ij}^{(n)}$, $i,j=1,2,\ldots$, are i.i.d. real random variables with $EX_{11}=\mu$, $E|X_{11}-\mu|^2=1$ and $E|X_{11}|^4<\infty$.
(2) $\lim_{n\to\infty}\frac{p}{n}=c\in(0,\infty)$.
(3) $g_1,\ldots,g_k$ are functions on $\mathbb{R}$, analytic on an open region $D$ of the complex plane which contains the real interval
$$\Big[\liminf_n\lambda_{\min}(\mathbf{T}_n)I_{(0,1)}(c)(1-\sqrt{c})^2,\ \limsup_n\lambda_{\max}(\mathbf{T}_n)(1+\sqrt{c})^2\Big].$$
(4) $\mathbf{T}_n$ is a $p\times p$ non-random symmetric non-negative definite matrix with spectral norm bounded above by a positive constant, such that $H_n=F^{\mathbf{T}_n}$ converges weakly to a non-random distribution $H$.
(5) Let $\mathbf{e}_i$ be the $p\times 1$ vector whose $i$th element is $1$ and whose other elements are $0$. The matrix $\mathbf{T}_n$ also satisfies
$$\frac{1}{n}\sum_{i=1}^n\mathbf{e}_i^T\mathbf{T}_n^{1/2}\big(\underline{m}(z_1)\mathbf{T}_n+\mathbf{I}\big)^{-1}\mathbf{T}_n^{1/2}\mathbf{e}_i\,\mathbf{e}_i^T\mathbf{T}_n^{1/2}\big(\underline{m}(z_2)\mathbf{T}_n+\mathbf{I}\big)^{-1}\mathbf{T}_n^{1/2}\mathbf{e}_i\to h_1(z_1,z_2)$$
and
$$\frac{1}{n}\sum_{i=1}^n\mathbf{e}_i^T\mathbf{T}_n^{1/2}\big(\underline{m}(z)\mathbf{T}_n+\mathbf{I}\big)^{-1}\mathbf{T}_n^{1/2}\mathbf{e}_i\,\mathbf{e}_i^T\mathbf{T}_n^{1/2}\big(\underline{m}(z)\mathbf{T}_n+\mathbf{I}\big)^{-2}\mathbf{T}_n^{1/2}\mathbf{e}_i\to h_2(z).$$

Then $\big(\int g_1(x)\,dG_n(x),\ldots,\int g_k(x)\,dG_n(x)\big)$ converges weakly to a Gaussian vector $(X_{g_1},\ldots,X_{g_k})$ with mean
$$EX_g=-\frac{c}{2\pi i}\oint g(z)\frac{\int\frac{\underline{m}^3(z)t^2\,dH(t)}{(1+t\underline{m}(z))^3}}{\Big(1-c\int\frac{\underline{m}^2(z)t^2\,dH(t)}{(1+t\underline{m}(z))^2}\Big)^2}\,dz-\frac{c\big(E(X_{11}-\mu)^4-3\big)}{2\pi i}\oint g(z)\frac{\underline{m}^3(z)h_2(z)}{1-c\int\frac{\underline{m}^2(z)t^2\,dH(t)}{(1+t\underline{m}(z))^2}}\,dz$$
$$\quad+\frac{c}{2\pi i}\oint g(z)\frac{\int\frac{\underline{m}(z)t\,dH(t)}{(1+t\underline{m}(z))^2}}{z\Big(1-c\int\frac{\underline{m}^2(z)t^2\,dH(t)}{(1+t\underline{m}(z))^2}\Big)}\,dz \tag{1.8}$$
and covariance function
$$\operatorname{Cov}(X_{g_1},X_{g_2})=-\frac{1}{2\pi^2}\oint\oint\frac{g_1(z_1)g_2(z_2)}{(\underline{m}(z_1)-\underline{m}(z_2))^2}\frac{d\underline{m}(z_1)}{dz_1}\frac{d\underline{m}(z_2)}{dz_2}\,dz_1\,dz_2$$
$$\quad-\frac{c\big(E(X_{11}-\mu)^4-3\big)}{4\pi^2}\oint\oint g_1(z_1)g_2(z_2)\frac{\partial^2}{\partial z_1\,\partial z_2}\big[\underline{m}(z_1)\underline{m}(z_2)h_1(z_1,z_2)\big]\,dz_1\,dz_2. \tag{1.9}$$
The contours in (1.8) and (1.9) are both contained in the analytic region of the functions $g_1,\ldots,g_k$ and both enclose the support of $F_{c_n,H_n}(x)$ for large $n$. Moreover, the two contours in (1.9) are disjoint.
Remark 1. From Theorem 1.1 of [3], Theorem 1.4 of [16] and Theorem 1 we see that functionals of eigenvalues of $\mathbf{S}$ and $S$ have the same asymptotic variances but different asymptotic means. Indeed, the additional term in the asymptotic mean is the last term of (1.8), which can be further written as
$$\frac{c}{2\pi i}\oint g(z)\frac{\int\frac{\underline{m}(z)t\,dH(t)}{(1+t\underline{m}(z))^2}}{z\Big(1-c\int\frac{\underline{m}^2(z)t^2\,dH(t)}{(1+t\underline{m}(z))^2}\Big)}\,dz=-\frac{1}{\pi}\int_a^b g(x)\arg\big(x\underline{m}(x)\big)\,dx, \tag{1.10}$$
where $\underline{m}(x)=\lim_{z\to x}\underline{m}(z)$ for $0\neq x\in\mathbb{R}$. Here one should note that Theorem 1.1 of [3] did not handle the case when the common mean of the underlying random variables is non-zero.
Remark 2. When developing central limit theorems it is necessary to study the limits of the products of $(E(X_{11}-\mu)^4-3)$ and the diagonal elements of $(S-z\mathbf{I})^{-1}$. For general $\mathbf{T}_n$ the limits of such diagonal elements are not necessarily the same; this is why Assumption (5) has to be imposed. Roughly speaking, Assumption (5) is a kind of rotation invariance. For example, it is satisfied if $\mathbf{T}_n$ is the inverse of another sample covariance matrix whose population matrix is the identity. However, when $E(X_{11}-\mu)^4=3$, Assumption (5) can be removed, because the diagonal elements in question disappear thanks to the special fourth moment (see [3]). Also, when $\mathbf{T}_n$ is a diagonal matrix, Assumption (5) in Theorem 1 is automatically satisfied, as pointed out in [16], and in this case
$$h_1(z_1,z_2)=\int\frac{t^2\,dH(t)}{(\underline{m}(z_1)t+1)(\underline{m}(z_2)t+1)},\qquad h_2(z)=\int\frac{t^2\,dH(t)}{(\underline{m}(z)t+1)^3}.$$
Particularly, when $\mathbf{T}_n=\mathbf{I}$ we have
$$EX_g=\frac{1}{2\pi}\int_a^b g(x)\arg\Big(1-\frac{c\underline{m}^2(x)}{(1+\underline{m}(x))^2}\Big)\,dx-\frac{1}{\pi}\int_a^b g(x)\arg\big(x\underline{m}(x)\big)\,dx$$
$$\quad-\frac{c\big(E(X_{11}-\mu)^4-3\big)}{\pi}\int_a^b g(x)\frac{\underline{m}^3(x)}{(\underline{m}(x)+1)\big[(\underline{m}(x)+1)^2-c\underline{m}^2(x)\big]}\,dx, \tag{1.11}$$
where
$$\underline{m}(x)=\frac{-(x+1-c)+\sqrt{(x-a)(b-x)}\,i}{2x}$$
(see [3]). For some simplified formulas for the covariance function, refer to (1.24) in [3] and (1.24) in [16].
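As a hedged sanity check (not in the paper), the closed form of $\underline{m}(x)$ above can be verified against the $T_n=I$ case of equation (1.3), which on the support reads $x=-1/\underline{m}+c/(1+\underline{m})$:

```python
import numpy as np

# Illustration only: verify that the stated boundary value m(x) solves the
# T_n = I version of (1.3), i.e. x = -1/m + c/(1 + m), for x inside (a, b).
c = 0.5
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
for x in np.linspace(a + 0.05, b - 0.05, 7):
    m = (-(x + 1 - c) + np.sqrt((x - a) * (b - x)) * 1j) / (2 * x)
    assert abs(x - (-1.0 / m + c / (1.0 + m))) < 1e-10
```

The ratio $c=0.5$ is an arbitrary choice; the identity holds for any $c>0$.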
Finally, let us take a look at the eigenvectors of $\mathbf{S}$ and $S$. The matrix of orthonormal eigenvectors of $S$ is Haar distributed on the orthogonal matrices when $X_{11}$ is normally distributed and $\mathbf{T}_n=\mathbf{I}$. It is conjectured that, for large $n$ and general $X_{11}$, the matrix of orthonormal eigenvectors of $S$ is close to being Haar distributed. Silverstein (1981) created an approach to address this question. Further work along this direction can be found in Silverstein [18,20], Bai, Miao and Pan [4] and Ledoit and Péché [13]. Here we would also like to point out that there are similar interests in the eigenvectors of large Wigner matrices; one may see [6,7,9] and [21]. To consider the matrix $\mathbf{S}$, let $\mathbf{U}_n\boldsymbol{\Lambda}_n\mathbf{U}_n^T$ denote the spectral decomposition of $\mathbf{S}$, where $\boldsymbol{\Lambda}_n=\operatorname{diag}(\lambda_1,\lambda_2,\ldots,\lambda_p)$ and $\mathbf{U}_n=(u_{ij})$ is an orthogonal matrix consisting of the orthonormal eigenvectors of $\mathbf{S}$. Assume that $\mathbf{x}_n\in\mathbb{R}^p$, $\|\mathbf{x}_n\|=1$, is an arbitrary non-random unit vector and set $\mathbf{y}=(y_1,y_2,\ldots,y_p)^T=\mathbf{U}_n^T\mathbf{x}_n$. Following Silverstein's approach, we define an empirical distribution function based on the eigenvectors and eigenvalues as
$$F_1^{\mathbf{S}}(x)=\sum_{i=1}^p|y_i|^2I(\lambda_i\le x), \tag{1.12}$$
where the $\lambda_i$'s are the eigenvalues of $\mathbf{S}$. Based on this empirical spectral distribution function, we may further investigate central limit theorems of functions of eigenvalues and eigenvectors.
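To illustrate the eigenvector measure of (1.12) numerically (a hedged sketch, assuming Gaussian entries, $T_n=I$ and, for simplicity, the uncentered matrix $S$; not from the paper), the weighted distribution built from $\mathbf{x}_n=\mathbf{e}_1$ stays close to the ordinary ESD:

```python
import numpy as np

# Illustration only: F_1(t) weights the eigenvalues by |y_i|^2 with
# y = U_n^T x_n; for Gaussian entries it should track the ordinary ESD.
rng = np.random.default_rng(2)
p, n = 400, 800
X = rng.standard_normal((p, n))
S = X @ X.T / n
lam, U = np.linalg.eigh(S)
x_n = np.zeros(p); x_n[0] = 1.0           # x_n = e_1, a unit vector
w = (U.T @ x_n)**2                        # the weights |y_i|^2, summing to 1

grid = np.linspace(0.0, 4.0, 201)
F1 = np.array([w[lam <= t].sum() for t in grid])   # weighted measure F_1(t)
F = np.array([(lam <= t).mean() for t in grid])    # ordinary ESD
assert abs(w.sum() - 1.0) < 1e-8
assert np.max(np.abs(F1 - F)) < 0.3
```

The tolerance 0.3 is generous; the fluctuations of $F_1-F$ are of order $n^{-1/2}$, which is the scaling used in Theorem 3.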
As will be seen below, the asymptotic properties of the eigenvector matrix of $\mathbf{S}$ are the same as those of $S$ in terms of this property (note: $EX_{11}=0$ is required for $S$). However, the advantage of $\mathbf{S}$ over $S$ is that the common mean of the underlying random variables need not be zero for these types of properties to hold. Unfortunately, this is not the case for $S$. For example, consider $\mathbf{e}_1^TS\mathbf{e}_1$ when the $X_{ij}$ are i.i.d. with $EX_{11}=\mu$ and $\operatorname{Var}(X_{11})=1$, and $\mathbf{T}_n=\mathbf{I}$. Then it is a simple matter to show that
$$\mathbf{e}_1^TS\mathbf{e}_1\xrightarrow{a.s.}1+\mu^2,$$
which depends on $\mu$, in contrast with Corollary 1 in Bai, Miao and Pan [4] when $\mu\neq 0$. Indeed, when dealing with central limit theorems of functions of eigenvalues and eigenvectors, [18] also proved that $EX_{11}=0$ is a necessary condition for this type of property to hold for the matrix $S$ with $\mathbf{T}_n=\mathbf{I}$.
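A quick hedged simulation (assuming $T_n=I$ and Gaussian noise; not from the paper) illustrates the limit above: for $T_n=I$, the $(1,1)$ entry of $S$ is just the average of the squared first-row entries of $\mathbf{X}_n$.

```python
import numpy as np

# Illustration only: with T_n = I, e_1^T S e_1 = (1/n) sum_j X_{1j}^2, and
# E X_11^2 = 1 + mu^2 when E X_11 = mu and Var(X_11) = 1.
rng = np.random.default_rng(3)
n, mu = 200000, 2.0
row1 = mu + rng.standard_normal(n)   # first row of X_n; other rows don't matter
e1_S_e1 = np.mean(row1**2)           # equals e_1^T S e_1
assert abs(e1_S_e1 - (1 + mu**2)) < 0.1
```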
Formally, let
$$G_{n1}(x)=\sqrt{n}\big[F_1^{\mathbf{S}}(x)-F_{c_n,H_n}(x)\big].$$
Theorem 2. In addition to Conditions (1), (2) and (4) in Theorem 1, suppose that $\mathbf{x}_n\in\mathbb{R}_1^p=\{\mathbf{x}\in\mathbb{R}^p:\|\mathbf{x}\|=1\}$ and $\mathbf{x}_n^T(\mathbf{T}_n-z\mathbf{I})^{-1}\mathbf{x}_n\to m_H(z)$, where $m_H(z)$ denotes the Stieltjes transform of $H(x)$. Then it holds that
$$F_1^{\mathbf{S}}(x)\to F_{c,H}(x)\quad\text{a.s.}$$
By Theorem 2, we obtain

Corollary 1. Let $(\mathbf{S}^m)_{ii}$, $m=1,2,\ldots$, denote the $i$th diagonal element of the matrix $\mathbf{S}^m$. Under the conditions of Theorem 2 with $\mathbf{x}_n=\mathbf{e}_i$, for any fixed $m$,
$$\lim_{n\to\infty}\Big|\big(\mathbf{S}^m\big)_{ii}-\int x^m\,dF_{c,H}(x)\Big|=0\quad\text{a.s.}$$
Theorem 3. In addition to the assumptions in Theorems 1 and 2, suppose that as $n\to\infty$
$$\sup_z\Big|\sqrt{n}\Big(\mathbf{x}_n^T\big(\underline{m}(z)\mathbf{T}_n+\mathbf{I}\big)^{-1}\mathbf{x}_n-\int\frac{dH_n(t)}{\underline{m}(z)t+1}\Big)\Big|\to 0. \tag{1.13}$$
Then the following conclusions hold.

(a) The random vector $\big(\int g_1(x)\,dG_{n1}(x),\ldots,\int g_k(x)\,dG_{n1}(x)\big)$ forms a tight sequence.

(b) If $E(X_{11}-\mu)^4=3$, the above random vector converges weakly to a Gaussian vector $(X_{g_1},\ldots,X_{g_k})$ with zero means and covariance function
$$\operatorname{Cov}(X_{g_1},X_{g_2})=-\frac{1}{2\pi^2}\oint\oint\frac{g_1(z_1)g_2(z_2)\big(z_2\underline{m}(z_2)-z_1\underline{m}(z_1)\big)^2}{c^2z_1z_2(z_2-z_1)\big(\underline{m}(z_2)-\underline{m}(z_1)\big)}\,dz_1\,dz_2, \tag{1.14}$$
where the contours in (1.14) are the same as those in (1.9).

(c) Instead of $E(X_{11}-\mu)^4=3$, assume that
$$\max_i\big|\mathbf{e}_i^T\mathbf{T}_n^{1/2}\big(z\underline{m}(z)\mathbf{T}_n+z\mathbf{I}\big)^{-1}\mathbf{x}_n\big|\to 0. \tag{1.15}$$
Then the assertion in (b) still holds.
Remark 3. Assumptions (1.13) and (1.15) are imposed for the same reason as given in Remark 2. Besides, when $\mathbf{T}_n$ reduces to the identity matrix, (1.15) becomes
$$\max_i|x_{ni}|\to 0. \tag{1.16}$$
As can be seen from (1.14), the asymptotic covariance does not depend on the fourth moment of $X_{11}$, and hence one has to impose a condition like (1.16) to ensure that the term involving the product of the fourth moment of $X_{11}$ and the diagonal elements of $(S-z\mathbf{I})^{-1}$ converges to zero in probability. One may see [16] for more details.
Remark 4. As pointed out in [4], when $\mathbf{T}_n=\mathbf{I}$ condition (1.13) holds and formula (1.14) becomes
$$\operatorname{Cov}(X_{g_1},X_{g_2})=2c\Big(\int g_1(x)g_2(x)\,dF_c(x)-\int g_1(x_1)\,dF_c(x_1)\int g_2(x_2)\,dF_c(x_2)\Big). \tag{1.17}$$
Theorem 1 essentially boils down to the study of the Stieltjes transform of $F^{\mathbf{S}}(x)$, while Theorems 2 and 3 boil down to the study of the Stieltjes transform of $F_1^{\mathbf{S}}(x)$. In view of this we conjecture that the same phenomenon holds for other objects involving $F^{\mathbf{S}}(x)$ or $F_1^{\mathbf{S}}(x)$. For example, we believe that the central limit theorems of the normalized empirical spectral distribution functions of $\mathbf{S}$ and $S$ are different ([17] considers central limit theorems of smoothed empirical spectral distribution functions).

We conclude this section by stating the structure of this work. Section 2 gives the proof of Theorem 1. Sections 3 and 4 present the proofs of Theorems 2 and 3, respectively. Throughout the paper $M$ and $C$ denote constants which may change from line to line, and all matrices and vectors are denoted by boldface letters.
2. Proof of Theorem 1 and (1.11)
The proof of Theorem 1 is essentially based on the Stieltjes transform, following [3]. First, by the analyticity of the functions it is enough to consider the Stieltjes transforms of the empirical distribution functions of the sample covariance matrices. Moreover, note that the Stieltjes transform of the empirical distribution function of $\mathbf{S}$ can be decomposed as the sum of the Stieltjes transform of the empirical distribution function of $S$ and a random variable involving the sample mean and $S$. Thus our main aim is to prove that this extra random variable converges in probability, in the space of continuous functions, to a non-random limit. Once this is done, Theorem 1 follows from Slutsky's theorem, the continuous mapping theorem and the results in [3] and [16].
Before proceeding we observe the following useful fact. By the structure of $\mathbf{S}$, without loss of generality we can assume $EX_{11}=0$, $EX_{11}^2=1$ in the course of establishing Theorems 1–3 (but the fourth moment will be $E(X_{11}-\mu)^4$).
2.1. Truncation of underlying random variables and random processes
We begin the proof by replacing the underlying random variables $X_{ij}$ with truncated and centralized variables. To this end, write
$$\mathbf{S}=\frac{1}{n}(\mathbf{X}_n-\mathbf{B})\big(\mathbf{X}_n^T-\mathbf{B}^T\big),$$
where $\mathbf{B}=(\bar{\mathbf{s}},\bar{\mathbf{s}},\ldots,\bar{\mathbf{s}})_{p\times n}$. Since the argument for (1.8) (and the two lines below (1.8)) in [3] can be carried over directly to the present case, we can choose a positive sequence $\varepsilon_n$ such that
$$\varepsilon_n\to 0,\qquad \varepsilon_n^{-4}E|X_{11}|^4I\big(|X_{11}|\ge\varepsilon_n\sqrt{n}\big)\to 0,\qquad \varepsilon_nn^{1/4}\to\infty. \tag{2.18}$$
Let $\hat{\mathbf{S}}=(1/n)(\hat{\mathbf{X}}_n-\hat{\mathbf{B}})(\hat{\mathbf{X}}_n^T-\hat{\mathbf{B}}^T)$, where $\hat{\mathbf{X}}_n$ and $\hat{\mathbf{B}}$ are obtained from $\mathbf{X}_n$ and $\mathbf{B}$, respectively, with the entries $X_{ij}$ replaced by $X_{ij}I(|X_{ij}|<\varepsilon_n\sqrt{n})$. We then obtain
$$P(\hat{\mathbf{S}}\neq\mathbf{S})\le P\Big(\bigcup_{i\le p,\,j\le n}\big\{|X_{ij}|\ge\varepsilon_n\sqrt{n}\big\}\Big)\le pnP\big(|X_{11}|\ge\varepsilon_n\sqrt{n}\big)\le M\varepsilon_n^{-4}E|X_{11}|^4I\big(|X_{11}|\ge\varepsilon_n\sqrt{n}\big)\to 0, \tag{2.19}$$
where, and in what follows, $M$ denotes a constant which may change from line to line.
Define $\tilde{\mathbf{S}}=(1/n)(\tilde{\mathbf{X}}_n-\tilde{\mathbf{B}})(\tilde{\mathbf{X}}_n^T-\tilde{\mathbf{B}}^T)$, where $\tilde{\mathbf{X}}_n$ and $\tilde{\mathbf{B}}$ are obtained from $\mathbf{X}_n$ and $\mathbf{B}$, respectively, with the entries $X_{ij}$ replaced by $(\hat{X}_{ij}-E\hat{X}_{ij})/\sigma_n$, where $\hat{X}_{ij}=X_{ij}I(|X_{ij}|<\varepsilon_n\sqrt{n})$ and $\sigma_n^2=E|\hat{X}_{ij}-E\hat{X}_{ij}|^2$. Denote by $\tilde{G}_n(x)$ and $\hat{G}_n(x)$ the analogues of $G_n(x)$ with the matrix $\mathbf{S}$ replaced by $\tilde{\mathbf{S}}$ and $\hat{\mathbf{S}}$, respectively. As in [3] (one may also refer to the proof of Corollary A.42 of [5]), we have, for any $g(x)\in\{g_1(x),\ldots,g_k(x)\}$,
$$\Big|\int g(x)\,d\tilde{G}_n(x)-\int g(x)\,d\hat{G}_n(x)\Big|\le M\sum_{k=1}^n\big|\lambda_k^{\tilde{\mathbf{S}}}-\lambda_k^{\hat{\mathbf{S}}}\big|$$
$$\le\frac{M}{\sqrt{n}}\Big[\operatorname{tr}\big((\hat{\mathbf{X}}_n-\hat{\mathbf{B}})-(\tilde{\mathbf{X}}_n-\tilde{\mathbf{B}})\big)\big((\hat{\mathbf{X}}_n-\tilde{\mathbf{X}}_n)-(\hat{\mathbf{B}}-\tilde{\mathbf{B}})\big)^T\Big]^{1/2}\big[\operatorname{tr}(\tilde{\mathbf{S}}+\hat{\mathbf{S}})\big]^{1/2}$$
$$=M\sigma_n\Big|1-\frac{1}{\sigma_n}\Big|\big[\operatorname{tr}\tilde{\mathbf{S}}\big]^{1/2}\big[\operatorname{tr}(\tilde{\mathbf{S}}+\hat{\mathbf{S}})\big]^{1/2}\le M\big|\sigma_n^2-1\big|\big[n\lambda_{\max}(\tilde{\mathbf{S}})\big]^{1/2}\big[n\lambda_{\max}(\tilde{\mathbf{S}})+n\lambda_{\max}(\hat{\mathbf{S}})\big]^{1/2},$$
where the third step uses the fact that, by the structure of $\hat{\mathbf{X}}_n-\hat{\mathbf{B}}$,
$$\frac{1}{n}\big((\hat{\mathbf{X}}_n-\hat{\mathbf{B}})-(\tilde{\mathbf{X}}_n-\tilde{\mathbf{B}})\big)\big((\hat{\mathbf{X}}_n-\tilde{\mathbf{X}}_n)-(\hat{\mathbf{B}}-\tilde{\mathbf{B}})\big)^T=\sigma_n^2\Big(1-\frac{1}{\sigma_n}\Big)^2\tilde{\mathbf{S}}.$$
From [23] and $\|\mathbf{T}_n\|\le M$ we obtain that $\limsup_n\lambda_{\max}(\tilde{\mathbf{S}})$ is bounded by $M(1+\sqrt{c})^2$ with probability one. Similarly, $\limsup_n\lambda_{\max}(\hat{\mathbf{S}})$ is bounded by $M(1+\sqrt{c})^2$ with probability one, by the structure of $\hat{\mathbf{S}}$ and estimate (2.20) below. Moreover, it is not difficult to prove that
$$\big|\sigma_n^2-1\big|=o\big(\varepsilon_n^2n^{-1}\big). \tag{2.20}$$
Summarizing the above, we obtain
$$\int g(x)\,d\hat{G}_n(x)-\int g(x)\,d\tilde{G}_n(x)\xrightarrow{i.p.}0.$$
This, together with (2.19), implies that
$$\int g(x)\,dG_n(x)-\int g(x)\,d\tilde{G}_n(x)\xrightarrow{i.p.}0.$$
Therefore, in what follows, we may assume that
$$|X_{ij}|\le\sqrt{n}\varepsilon_n,\qquad EX_{ij}=0,\qquad E|X_{ij}|^2=1, \tag{2.21}$$
and for simplicity we shall suppress all subscripts and superscripts on the variables.
As in [3], by Cauchy's formula, (1.6) and (1.7), with probability one for all large $n$,
$$\int g(x)\,dG_n(x)=-\frac{1}{2\pi i}\oint g(z)\big[\operatorname{tr}(\mathbf{S}-z\mathbf{I})^{-1}-n\underline{m}_{c_n}(z)\big]\,dz, \tag{2.22}$$
where $\underline{m}_{c_n}(z)$ is obtained from $\underline{m}(z)$ with $c$ replaced by $c_n$, and the complex integral is over $\mathcal{C}$. The contour $\mathcal{C}$ is specified below. Let $v_0>0$ be arbitrary and set $\mathcal{C}_u=\{u+iv_0: u\in[u_l,u_r]\}$, where $u_r>\limsup_n\lambda_{\max}(\mathbf{T}_n)(1+\sqrt{c})^2$ and $0<u_l<\liminf_n\lambda_{\min}(\mathbf{T}_n)I_{(0,1)}(c)(1-\sqrt{c})^2$, or $u_l$ is any negative number if $\liminf_n\lambda_{\min}(\mathbf{T}_n)I_{(0,1)}(c)(1-\sqrt{c})^2=0$. Then define
$$\mathcal{C}^+=\big\{u_l+iv: v\in[0,v_0]\big\}\cup\mathcal{C}_u\cup\big\{u_r+iv: v\in[0,v_0]\big\}$$
and let $\mathcal{C}^-$ be the symmetric part of $\mathcal{C}^+$ about the real axis. Then set $\mathcal{C}=\mathcal{C}^+\cup\mathcal{C}^-$. Moreover, since
$$\operatorname{tr}(\mathbf{S}-z\mathbf{I})^{-1}=\operatorname{tr}\mathbf{A}^{-1}(z)+\frac{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}},$$
we have
$$\operatorname{tr}(\mathbf{S}-z\mathbf{I})^{-1}-n\underline{m}_{c_n}(z)=\big[\operatorname{tr}\mathbf{A}^{-1}(z)-n\underline{m}_{c_n}(z)\big]+\frac{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}, \tag{2.23}$$
where $\mathbf{A}^{-1}(z)=(S-z\mathbf{I})^{-1}$. The first term on the right-hand side above was investigated in [3].
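The rank-one resolvent decomposition behind (2.23) is a deterministic matrix identity and can be checked directly; the sketch below (an illustration only, with small random matrices) verifies it in floating point.

```python
import numpy as np

# Illustration of the identity used in (2.23): with A(z) = S - zI and
# S_c = S - s_bar s_bar^T,
#   tr(S_c - zI)^{-1} = tr A^{-1} + s_bar^T A^{-2} s_bar / (1 - s_bar^T A^{-1} s_bar).
rng = np.random.default_rng(5)
p, n = 30, 60
X = rng.standard_normal((p, n))
s_bar = X.mean(axis=1)
S = X @ X.T / n
Sc = S - np.outer(s_bar, s_bar)
z = 1.0 + 0.5j

A1 = np.linalg.inv(S - z * np.eye(p))       # A^{-1}(z)
lhs = np.trace(np.linalg.inv(Sc - z * np.eye(p)))
rhs = np.trace(A1) + (s_bar @ A1 @ A1 @ s_bar) / (1 - s_bar @ A1 @ s_bar)
assert abs(lhs - rhs) < 1e-8
```

Since $\Im z>0$, both resolvents are well defined and the denominator is bounded away from zero.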
We next consider the second term on the right-hand side of (2.23). To this end, we introduce a truncated version of $\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}$. Define $\mathcal{C}_r=\{u_r+iv: v\in[n^{-1}\rho_n,v_0]\}$ and
$$\mathcal{C}_l=\begin{cases}\{u_l+iv: v\in[n^{-1}\rho_n,v_0]\}&\text{if }u_l>0,\\ \{u_l+iv: v\in[0,v_0]\}&\text{if }u_l<0,\end{cases}$$
where
$$\rho_n\downarrow 0,\qquad \rho_n\ge n^{-\alpha}\text{ for some }\alpha\in(0,1). \tag{2.24}$$
Let $\mathcal{C}_n^+=\mathcal{C}_l\cup\mathcal{C}_u\cup\mathcal{C}_r$ and let $\mathcal{C}_n^-$ denote the symmetric part of $\mathcal{C}_n^+$ with respect to the real axis. We then define the truncated process $\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}$ of the process $\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}$, for $z=u+iv$, by
$$\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}=\begin{cases}\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}&\text{for }z\in\mathcal{C}_n=\mathcal{C}_n^+\cup\mathcal{C}_n^-,\\[4pt]\dfrac{nv+\rho_n}{2\rho_n}\,\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z_{r1})\bar{\mathbf{s}}+\dfrac{\rho_n-nv}{2\rho_n}\,\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z_{r2})\bar{\mathbf{s}}&\text{for }u=u_r,\ v\in[-n^{-1}\rho_n,n^{-1}\rho_n],\\[4pt]\dfrac{nv+\rho_n}{2\rho_n}\,\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z_{l1})\bar{\mathbf{s}}+\dfrac{\rho_n-nv}{2\rho_n}\,\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z_{l2})\bar{\mathbf{s}}&\text{for }u=u_l>0,\ v\in[-n^{-1}\rho_n,n^{-1}\rho_n],\end{cases}$$
where $z_{r1}=u_r-in^{-1}\rho_n$, $z_{r2}=u_r+in^{-1}\rho_n$, $z_{l1}=u_l-in^{-1}\rho_n$ and $z_{l2}=u_l+in^{-1}\rho_n$. Similarly one may define the truncated version $\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}$ of $\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}$.
From Theorem 2 in [15] we have
$$\bar{\mathbf{s}}^T\bar{\mathbf{s}}\le\|\mathbf{T}_n\|\,\bar{\mathbf{x}}^T\bar{\mathbf{x}}\le M\bar{\mathbf{x}}^T\bar{\mathbf{x}}\xrightarrow{i.p.}Mc. \tag{2.25}$$
Note that
$$\bar{\mathbf{s}}^T(\mathbf{S}-z\mathbf{I})^{-1}\bar{\mathbf{s}}=\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}+\frac{\big(\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}\big)^2}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}},$$
where we use the identity
$$\mathbf{r}^T\big(\mathbf{D}+a\mathbf{r}\mathbf{r}^T\big)^{-1}=\frac{\mathbf{r}^T\mathbf{D}^{-1}}{1+a\mathbf{r}^T\mathbf{D}^{-1}\mathbf{r}}, \tag{2.26}$$
where $\mathbf{D}$ and $\mathbf{D}+a\mathbf{r}\mathbf{r}^T$ are both invertible, $\mathbf{r}\in\mathbb{R}^p$ and $a\in\mathbb{R}$. This implies that
$$\frac{1}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}=1+\bar{\mathbf{s}}^T(\mathbf{S}-z\mathbf{I})^{-1}\bar{\mathbf{s}}. \tag{2.27}$$
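Identity (2.26) is the Sherman–Morrison formula; the following hedged sketch (illustration only, with a symmetric positive definite $D$ and $a=1$, whereas the proof applies it with $a=-1$ and $r=\bar{s}$) checks it numerically.

```python
import numpy as np

# Illustration only: verify r^T (D + a r r^T)^{-1} = r^T D^{-1} / (1 + a r^T D^{-1} r).
rng = np.random.default_rng(6)
p = 25
G = rng.standard_normal((p, p))
D = G @ G.T / p + np.eye(p)      # symmetric positive definite, hence invertible
r = rng.standard_normal(p)
a = 1.0                          # the proof uses a = -1 with r = s_bar

Dinv = np.linalg.inv(D)
lhs = r @ np.linalg.inv(D + a * np.outer(r, r))
rhs = (r @ Dinv) / (1 + a * (r @ Dinv @ r))
assert np.max(np.abs(lhs - rhs)) < 1e-8
```

With $a=1$ and $D$ positive definite the denominator exceeds $1$, so no near-singular division can occur in the demonstration.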
We then conclude that
$$\Big|\frac{1}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}\Big|\le 1+\Big(\frac{M}{v_0}+\frac{M}{|\lambda_{\max}(\mathbf{S})-u_r|}+\frac{M}{|\lambda_{\min}(\mathbf{S})-u_l|}\Big)\bar{\mathbf{s}}^T\bar{\mathbf{s}}. \tag{2.28}$$
This, together with (2.25), (1.6) and (1.7), ensures that
$$\Big|\oint g(z)\Big(\frac{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}-\frac{\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}}{1-\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}}\Big)\,dz\Big|$$
$$\le\Big|\oint g(z)\frac{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}-\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}}{1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}\,dz\Big|+\Big|\oint g(z)\frac{\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}}\,\big(\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}-\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}\big)}{\big(1-\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}\big)\big(1-\widehat{\bar{\mathbf{s}}^T\mathbf{A}^{-1}(z)\bar{\mathbf{s}}}\big)}\,dz\Big|$$
$$\le\big[(\bar{\mathbf{s}}^T\bar{\mathbf{s}})^2+(\bar{\mathbf{s}}^T\bar{\mathbf{s}})^4\big]\frac{M\rho_n}{n}\Big(\frac{1}{|\lambda_{\max}(\mathbf{S})-u_r|}+\frac{1}{|\lambda_{\min}(\mathbf{S})-u_l|}\Big)\Big(\frac{1}{|\lambda_{\max}(\mathbf{S})-u_r|}+\frac{1}{|\lambda_{\min}(\mathbf{S})-u_l|}\Big), \tag{2.29}$$
which converges to zero in probability.
The aim of the subsequent Sections 2.2–2.4 is to prove that $\bar{\mathbf{s}}^T\mathbf{A}^{-j}(z)\bar{\mathbf{s}}$, $j=1,2$, converges in probability to non-random variables in the space of continuous functions for $z\in\mathcal{C}$. It is well known that convergence in probability is equivalent to convergence in distribution when the limiting variable is non-random (see, for example, page 25 of [8]). Thus, instead, we shall prove that $\bar{\mathbf{s}}^T\mathbf{A}^{-j}(z)\bar{\mathbf{s}}$, $j=1,2$, converges in distribution to non-random variables in the space of continuous functions for $z\in\mathcal{C}$.
2.2. Convergence of finite-dimensional distributions of $\bar{\mathbf{s}}^T\mathbf{A}^{-j}(z)\bar{\mathbf{s}}-E(\bar{\mathbf{s}}^T\mathbf{A}^{-j}(z)\bar{\mathbf{s}})$, $j=1,2$

Consider $z\in\mathcal{C}_u$ and $\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}-E(\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}})$ first. Define the $\sigma$-field $\mathcal{F}_j=\sigma(\mathbf{s}_1,\ldots,\mathbf{s}_j)$, let $E_j(\cdot)=E(\cdot|\mathcal{F}_j)$ and let $E_0(\cdot)$ denote the unconditional expectation. Introduce $\bar{\mathbf{s}}_j=\bar{\mathbf{s}}-n^{-1}\mathbf{s}_j$,
$$\mathbf{A}_j^{-1}(z)=\big(S-n^{-1}\mathbf{s}_j\mathbf{s}_j^T-z\mathbf{I}\big)^{-1},\qquad\gamma_j(z)=\frac{1}{n}\mathbf{s}_j^T\mathbf{A}_j^{-1}(z)\mathbf{s}_j-E\frac{1}{n}\operatorname{tr}\mathbf{A}_j^{-1}(z)\mathbf{T}_n,$$
$$\beta_j(z)=\frac{1}{1+(1/n)\mathbf{s}_j^T\mathbf{A}_j^{-1}(z)\mathbf{s}_j},\qquad b_1(z)=\frac{1}{1+(1/n)E\operatorname{tr}\mathbf{A}_1^{-1}(z)\mathbf{T}_n}.$$
Write
$$\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}-E\big(\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}\big)=\sum_{j=1}^n\big[E_j\big(\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}\big)-E_{j-1}\big(\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}\big)\big]$$
$$=\sum_{j=1}^n(E_j-E_{j-1})\big[\bar{\mathbf{s}}^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}}-\bar{\mathbf{s}}_j^T\mathbf{A}_j^{-2}(z)\bar{\mathbf{s}}_j\big]=\sum_{j=1}^n(E_j-E_{j-1})[d_{n1}+d_{n2}+d_{n3}],$$
where
$$d_{n1}=(\bar{\mathbf{s}}-\bar{\mathbf{s}}_j)^T\mathbf{A}^{-2}(z)\bar{\mathbf{s}},\qquad d_{n2}=\bar{\mathbf{s}}_j^T\big[\mathbf{A}^{-2}(z)-\mathbf{A}_j^{-2}(z)\big]\bar{\mathbf{s}},\qquad d_{n3}=\bar{\mathbf{s}}_j^T\mathbf{A}_j^{-2}(z)(\bar{\mathbf{s}}-\bar{\mathbf{s}}_j).$$
Consider $d_{n3}$ first. It is proved in (3.12) of [15] that
$$E\big|\mathbf{x}_1^T\mathbf{B}\bar{\mathbf{x}}_1\big|^k=O\big(n^{(k/2)-2}\varepsilon_n^{k-4}\big)\quad\text{for }k\ge 4, \tag{2.30}$$
where $\mathbf{x}_1$ and $\bar{\mathbf{x}}_1=\bar{\mathbf{x}}-n^{-1}\mathbf{x}_1$ are independent of $\mathbf{B}$, a matrix with bounded spectral norm. This implies that
$$E\big|\mathbf{s}_1^T\mathbf{A}_1^{-2}(z)\bar{\mathbf{s}}_1\big|^k=E\big|\mathbf{x}_1^T\mathbf{T}_n^{1/2}\mathbf{A}_1^{-2}(z)\mathbf{T}_n^{1/2}\bar{\mathbf{x}}_1\big|^k=O\big(n^{(k/2)-2}\varepsilon_n^{k-4}\big)\quad\text{for }k\ge 4, \tag{2.31}$$
which yields
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n3}\xrightarrow{i.p.}0.$$
Furthermore, write
$$d_{n1}=d_{n1}(1)+d_{n1}(2)+d_{n1}(3)+d_{n1}(4),$$
where
$$d_{n1}(1)=\frac{1}{n}\mathbf{s}_j^T\mathbf{A}_j^{-2}(z)\bar{\mathbf{s}}_j,\qquad d_{n1}(2)=\frac{1}{n^2}\mathbf{s}_j^T\mathbf{A}_j^{-2}(z)\mathbf{s}_j,$$
$$d_{n1}(3)=\frac{1}{n}\mathbf{s}_j^T\big[\mathbf{A}^{-2}(z)-\mathbf{A}_j^{-2}(z)\big]\bar{\mathbf{s}}_j,\qquad d_{n1}(4)=\frac{1}{n^2}\mathbf{s}_j^T\big[\mathbf{A}^{-2}(z)-\mathbf{A}_j^{-2}(z)\big]\mathbf{s}_j.$$
From (2.31) we have
$$\sum_{j=1}^n(E_j-E_{j-1})d_{n1}(1)\xrightarrow{i.p.}0.$$
By Lemma 2.7 in [3] and (2.21),
$$n^{-k}E\big|\mathbf{s}_1^T\mathbf{B}(z)\mathbf{s}_1-\operatorname{tr}\mathbf{B}\mathbf{T}_n\big|^k=O\big(\varepsilon_n^{2k-4}n^{-1}\big),\qquad k\ge 2. \tag{2.32}$$
This gives
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})d_{n1}(2)\Big|^2=\frac{1}{n^4}\sum_{j=1}^nE\big|(E_j-E_{j-1})\big(\mathbf{s}_j^T\mathbf{A}_j^{-2}(z)\mathbf{s}_j-\operatorname{tr}\mathbf{T}_n\mathbf{A}_j^{-2}(z)\big)\big|^2=O\big(n^{-2}\big).$$
Since $\|\mathbf{A}^{-2}(z)\|\le 1/v^2$, we obtain
$$E\Big|\sum_{j=1}^n(E_j-E_{j-1})d_{n1}(4)\Big|^2\le\frac{M}{n^4v^4}\sum_{j=1}^nE\|\mathbf{s}_j\|^4\le\frac{M}{n}.$$
A−1(z)−A−j1(z)= −1
nA−j1(z)sjsTjA−j1(z)βj(z). (2.33)
Applying it we may write dn1(3)= −1
n2sTjA−j1(z)sjsTjA−j2(z)s¯jβj(z)+ 1
n3sTjA−j1(z)sjsTjA−j2(z)sjsTjA−j1(z)¯sjβj2(z)
− 1
n2sTjA−j2(z)sjsTjA−j1(z)¯sjβj(z).