
UNIVERSALITY OF COVARIANCE MATRICES

By Natesh S. Pillai and Jun Yin

In this paper we prove the universality of covariance matrices of the form H_{N×N} = (1/N) X†X, where X is an M × N rectangular matrix with independent real-valued entries x_ij satisfying E x_ij = 0 and E x_ij² = 1/M, N, M → ∞. Furthermore, it is assumed that these entries have sub-exponential tails. We study the asymptotics in the regime N/M = d_N ∈ (0, ∞), lim_{N→∞} d_N ≠ 1. Our main result states that the Stieltjes transform of the empirical eigenvalue distribution of H is given by the Marchenko–Pastur law uniformly up to the edges of the spectrum, with an error of order (Nη)^{-1}, where η is the imaginary part of the spectral parameter in the Stieltjes transform. From this strong local Marchenko–Pastur law, we derive the following results.

1. The rigidity of eigenvalues: if γ_j = γ_{j,N} denotes the classical location of the j-th eigenvalue under the Marchenko–Pastur law, with the eigenvalues ordered in increasing order, then the j-th eigenvalue λ_j of H is close to γ_j in the sense that there exist positive constants C, c such that

P( ∃ j : |λ_j − γ_j| > (log N)^{C log log N} [min(min(N, M) − j, j)]^{-1/3} N^{-2/3} ) ≤ C exp( −(log N)^{c log log N} )

for N large enough.

2. The delocalization of the eigenvectors of the matrix X†X, uniformly both at the edge and in the bulk.

3. Bulk universality: the n-point correlation functions of the eigenvalues of the sample covariance matrix X†X coincide with those of the Wishart ensemble as N goes to infinity.

4. Universality of the eigenvalues of the sample covariance matrix X†X at both edges of the spectrum.

Furthermore, the first two results are applicable even in the case in which the entries of the column vectors of X are not independent but satisfy a certain large deviation principle. All our results hold for both real- and complex-valued entries.

1. Introduction. Covariance matrices are fundamental objects in modern multivariate statistics, where the advance of technology has led to high-dimensional data. They have manifold applications in various applied fields; see [2, 13, 14, 15] for an extensive account of statistical applications, [12, 16] for applications in economics and [17] in population genetics, to name a few. Except in special cases (under specific assumptions on the distributions of the entries of the covariance matrix, such as Gaussian), the exact asymptotic distribution of the eigenvalues is not known. In this context, akin to the central limit theorem, the phenomenon of universality helps us to obtain the asymptotic distribution of the eigenvalues without restrictive assumptions on the distribution of the entries. Borrowing a physical analogy, the key observation is that the eigenvalue gap distribution for a large complicated system is universal in the sense that it depends only on the symmetry class of the physical system but not on other detailed structures.

AMS 2000 subject classifications: 15B52, 82B44.

Keywords and phrases: Covariance matrix, Marchenko–Pastur law, universality, Dyson Brownian motion.

The covariance matrix formed by i.i.d. standard Gaussian entries is the well-studied Wishart matrix, for which one has closed-form expressions for many objects of interest, including the joint distribution of the eigenvalues. Furthermore, the empirical spectrum of the Wishart matrix converges to the Marchenko–Pastur law. In this paper we prove the universality of covariance matrices (both in the bulk and at the edges) under the assumption that the matrix entries are independent, have mean 0 and variance 1/M, and have sub-exponential tail decay. This implies that, asymptotically, the distribution of the local eigenvalue statistics of covariance matrices of the above kind is identical to that of the Wishart matrix.
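As a quick numerical illustration of this convergence (our sketch, not part of the paper's argument; the sizes M, N and the Bernoulli entry distribution are arbitrary choices), one can compare the empirical spectrum of X†X against the Marchenko–Pastur density defined in (1.6) below:

```python
import numpy as np

# Sizes are arbitrary choices for the experiment; d = N/M stays away from 1.
M, N = 4000, 2000
d = N / M
rng = np.random.default_rng(0)

# Bernoulli entries +-M^{-1/2}: mean 0, variance 1/M, sub-exponential tails.
X = rng.choice([-1.0, 1.0], size=(M, N)) / np.sqrt(M)
eigs = np.linalg.eigvalsh(X.T @ X)          # spectrum of the sample covariance matrix

# Marchenko-Pastur density (1.6), supported on [lambda_-, lambda_+].
lam_m, lam_p = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
hist, edges = np.histogram(eigs, bins=30, range=(lam_m, lam_p), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
rho = np.sqrt((lam_p - centers) * (centers - lam_m)) / (2 * np.pi * d * centers)

print("max deviation of empirical density from MP law:", np.abs(hist - rho).max())
```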

Over the past two decades, great progress has been made in proving the universality properties of Wigner ensembles with i.i.d. matrix elements (standard Wigner ensembles); see [9] and the references therein. However, results regarding universality for covariance matrices have been obtained only recently [1, 8, 18, 19, 20, 21]. Moreover, these results are obtained under strong assumptions; for example, in the "four moment theorem" of [23, 24], universality results are proved under the assumption that the first four moments of the matrix elements are equal to those of the standard Gaussian. In [8] the authors prove bulk universality of covariance matrices under the assumption that the distribution of the matrix elements has a smooth density. These results, although quite interesting, exclude many important cases, including the Bernoulli ensembles. We, on the other hand, do not require smoothness of the distribution of the matrix entries and only need the first two moments to be identical to those of the standard Gaussian. Furthermore, some of our results are applicable even in situations where the entries in the same column are not independent but satisfy a certain large deviation bound, as explained below. Of course, we do require an exponential tail decay condition for the matrix entries. However, in future work, all of our results will be proved with the tail condition replaced by a uniform bound on the p-th moment of the matrix elements (say p = 5 or 7), by the methods in [5].

The approach we take in this paper to prove universality is the one developed in a recent series of papers [4, 5, 6, 7, 8, 9, 10, 11]. The first step is to derive a strong local Marchenko–Pastur law, a precise estimate of the local eigenvalue density, which is our key technical tool for proving universality. En route to this, we also obtain precise bounds on the matrix elements of the corresponding Green function. For proving bulk universality of eigenvalues, the next step is to embed the covariance matrix into a stochastic flow of matrices, so that the eigenvalues evolve according to a distinguished coupled system of stochastic differential equations, called the Dyson Brownian motion [3]. The central idea in the papers mentioned above is to estimate the time to local equilibrium for the Dyson Brownian motion by introducing a new stochastic flow, the local relaxation flow, which locally behaves like a Dyson Brownian motion but decays to global equilibrium faster. This approach [6, 8] entirely eliminates the use of explicit formulas and provides a unified proof of universality. For proving edge universality of eigenvalues, we apply a "moment comparison" method based on the Green function, which is similar to the "four moment theorem" of [20, 21]. This idea has recently been used in [11] for proving edge universality of Wigner matrices.

More precisely, let X = (x_ij) be an M × N matrix with independent centered real-valued entries of variance 1/M:

x_ij = M^{-1/2} q_ij,  E q_ij = 0,  E q_ij² = 1. (1.1)

Furthermore, the entries q_ij have sub-exponential decay, i.e., there exists a constant ϑ > 0 such that for u > 1,

P(|q_ij| > u) ≤ ϑ^{-1} exp(−u^ϑ). (1.2)

Note that all our constants may depend on ϑ, but we will not track this dependence.

Define the Green function of X†X by

G_ij(z) = ( (X†X − z)^{-1} )_ij,  z = E + iη, E ∈ ℝ, η > 0. (1.3)

The Stieltjes transform of the empirical eigenvalue distribution of X†X is given by

m(z) := (1/N) ∑_j G_jj(z) = (1/N) Tr (X†X − z)^{-1}. (1.4)

We will be working in the regime

d = d_N = N/M,  lim_{N→∞} d ≠ 1.

Define

λ± := (1 ± √d)². (1.5)

The Marchenko–Pastur law (henceforth abbreviated MP law) is given by

ϱ_W(x) = (1/(2πd)) √( [(λ₊ − x)(x − λ₋)]₊ / x² ). (1.6)

We define m_W(z), z ∈ ℂ, as the Stieltjes transform of ϱ_W, i.e.,

m_W(z) = ∫_ℝ ϱ_W(x)/(x − z) dx. (1.7)

The function m_W depends on d and has the closed-form solution

m_W(z) = ( 1 − d − z + i√((z − λ₋)(λ₊ − z)) ) / (2dz), (1.8)

where √· denotes the square root on the complex plane whose branch cut is the negative real line. One can check that m_W(z) is the unique solution of

m_W(z) + 1/( z − (1 − d) + z d m_W(z) ) = 0

with ℑ m_W(z) > 0 when ℑ z > 0. Define the normalized empirical counting function by

n(E) := (1/N) #{λ_j ≥ E}. (1.9)
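As a sanity check (ours, not in the original), the closed form (1.8) can be evaluated numerically and verified to satisfy the self-consistent equation above, with positive imaginary part in the upper half-plane; the values of d and z below are arbitrary test points:

```python
import numpy as np

def m_W(z, d):
    # Closed form (1.8); numpy's principal square root has its branch cut on the
    # negative real axis, matching the convention stated above.
    lam_m, lam_p = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
    return (1 - d - z + 1j * np.sqrt((z - lam_m) * (lam_p - z))) / (2 * d * z)

d = 0.5
for z in [0.3 + 0.1j, 2.0 + 0.01j, 1.0 + 1.0j]:
    m = m_W(z, d)
    residual = m + 1.0 / (z - (1 - d) + z * d * m)   # should be ~0
    assert abs(residual) < 1e-12 and m.imag > 0
print("closed form (1.8) satisfies the self-consistent equation")
```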

Let

n_W(E) := ∫_E^∞ ϱ_W(x) dx, (1.10)

so that 1 − n_W(·) is the distribution function of the MP law.

By the singular value decomposition of X, there exist orthonormal bases {u_1, u_2, …, u_M} of ℝ^M and {v_1, …, v_N} of ℝ^N such that

X = ∑_{α=1}^{M} √(λ_α) u_α v_α† = ∑_{α=1}^{N} √(λ_α) u_α v_α†, (1.11)

where λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_{max{M,N}} ≥ 0, with λ_α = 0 for min{N, M} + 1 ≤ α ≤ max{N, M}, and v_α := 0 if α > N, u_α := 0 if α > M. We also define the classical locations γ_j of the eigenvalues with respect to ϱ_W by

∫_{γ_j}^{λ₊} ϱ_W(x) dx = ∫_{γ_j}^{+∞} ϱ_W(x) dx = j/N. (1.12)
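For concreteness (our illustration, not from the paper), the classical locations defined by (1.12) can be computed by numerically inverting the cumulative MP distribution; the quadrature grid below is an arbitrary choice:

```python
import numpy as np

def classical_locations(N, d, grid=100001):
    # Solve (1.12): the integral of rho_W from gamma_j to lambda_+ equals j/N.
    lam_m, lam_p = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
    x = np.linspace(lam_m, lam_p, grid)
    rho = np.sqrt(np.clip((lam_p - x) * (x - lam_m), 0, None)) / (2 * np.pi * d * x)
    dx = x[1] - x[0]
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (rho[1:] + rho[:-1]) * dx)])
    tail = cdf[-1] - cdf               # mass of rho_W to the right of each x
    j = np.arange(1, N + 1)
    # tail is decreasing in x, so invert it on the reversed grid.
    return np.interp(j / N, tail[::-1], x[::-1])

gamma = classical_locations(N=10, d=0.5)
print(gamma)    # gamma_1 sits near lambda_+, gamma_N near lambda_-
```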

Define the parameter

φ := (log N)^{log log N}. (1.13)

For ζ > 0, define the set

S(ζ) := { z = E + iη ∈ ℂ : λ₋ 𝟙(d > 1/5) ≤ E ≤ 5λ₊, φ^ζ N^{-1} ≤ η ≤ 10(1 + d) }. (1.14)

Note that m_W ∼ O(1) in S(0).

Definition 1.1 (High-probability events). Let ζ > 0. We say that an event Ω holds with ζ-high probability if there exists a constant C > 0 such that

P(Ω^c) ≤ N^C exp(−φ^ζ) (1.15)

for N large enough.

Our goal is to estimate the following quantities:

Λ_d := max_k |G_kk − m_W|,  Λ_o := max_{k≠ℓ} |G_kℓ|,  Λ := |m − m_W|, (1.16)

where the subscripts refer to "diagonal" and "off-diagonal" matrix elements. All these quantities depend on the spectral parameter z and on N, but for simplicity we suppress this in the notation.

The following is the main result of this paper.

Theorem 1.2 (Strong local Marchenko–Pastur law). Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any ζ > 0 there exists a constant C_ζ such that the following events hold with ζ-high probability.

(i) The Stieltjes transform of the empirical eigenvalue distribution of H satisfies

⋂_{z ∈ S(C_ζ)} { Λ(z) ≤ φ^{C_ζ} (Nη)^{-1} }. (1.17)

(ii) The individual matrix elements of the Green function satisfy

⋂_{z ∈ S(C_ζ)} { Λ_o(z) + Λ_d(z) ≤ φ^{C_ζ} ( √( ℑ m_W(z)/(Nη) ) + 1/(Nη) ) }. (1.18)

(iii) The smallest nonzero and the largest eigenvalues of X†X satisfy

λ₋ − N^{-2/3} φ^{C_ζ} ≤ min_{j ≤ min{M,N}} λ_j ≤ max_j λ_j ≤ λ₊ + N^{-2/3} φ^{C_ζ}. (1.19)

(iv) Delocalization of the eigenvectors of X†X:

max_{α : λ_α ≠ 0} ‖v_α‖_∞ ≤ φ^{C_ζ} N^{-1/2}. (1.20)
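Although Theorem 1.2 is an asymptotic high-probability statement, the bound (1.17) is easy to probe in simulation (our sketch; the sizes, the Gaussian entries and the spectral parameters are arbitrary choices):

```python
import numpy as np

M, N = 2000, 1000
d = N / M
rng = np.random.default_rng(1)
X = rng.standard_normal((M, N)) / np.sqrt(M)
eigs = np.linalg.eigvalsh(X.T @ X)

def m_W(z):
    # Closed form (1.8).
    lam_m, lam_p = (1 - np.sqrt(d))**2, (1 + np.sqrt(d))**2
    return (1 - d - z + 1j * np.sqrt((z - lam_m) * (lam_p - z))) / (2 * d * z)

E = 1.0   # an energy inside the bulk
for eta in [1.0, 0.1, 0.01]:
    z = E + 1j * eta
    m = np.mean(1.0 / (eigs - z))        # Stieltjes transform (1.4)
    print(f"eta={eta:5.2f}  |m - m_W| = {abs(m - m_W(z)):.2e}  1/(N*eta) = {1/(N*eta):.2e}")
```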

The main theorem above is then used to derive the following results.

Theorem 1.3 (Rigidity of the eigenvalues of the covariance matrix). Recall γ_j in (1.12). Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any 1 ≤ j ≤ N, let

ẽ_j := min{ min{N, M} − j, j }.

For any ζ > 0 there exists a constant C_ζ such that

|λ_j − γ_j| ≤ φ^{C_ζ} N^{-2/3} ẽ_j^{-1/3} (1.21)

and

|n(E) − n_W(E)| ≤ φ^{C_ζ} N^{-1} (1.22)

hold with ζ-high probability for any 1 ≤ j ≤ N.

The above two results are stated under the assumption that the matrix entries are independent. The independence assumption (on the elements in each column vector of X) required in Theorems 1.2 and 1.3 may be replaced with the following large deviation criteria.

Let us first recall the following large deviation lemma for independent random variables (see [9], Appendix B, for a proof).

Lemma 1.4 (Large deviation lemma). Suppose the a_i are independent, mean 0 complex random variables with E|a_i|² = σ² and sub-exponential decay as in (1.2). Then there exists a constant ρ ≡ ρ(ϑ) ≥ 1 such that, for any ζ > 0 and any A_i ∈ ℂ and B_ij ∈ ℂ, the bounds

| ∑_{i=1}^{M} a_i A_i | ≤ (log M)^{ρζ} σ ‖A‖, (1.23)

| ∑_{i=1}^{M} ā_i B_ii a_i − ∑_{i=1}^{M} σ² B_ii | ≤ (log M)^{ρζ} σ² ( ∑_{i=1}^{M} |B_ii|² )^{1/2}, (1.24)

| ∑_{i≠j} ā_i B_ij a_j | ≤ (log M)^{ρζ} σ² ( ∑_{i≠j} |B_ij|² )^{1/2} (1.25)

hold with ζ-high probability.
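The linear bound (1.23) can be illustrated by simulation (ours; M, the number of trials and the Laplace entry distribution are arbitrary choices): over many samples of (a_i), the ratio |∑ a_i A_i| / (σ‖A‖) stays far below any power of log M:

```python
import numpy as np

M, trials = 10000, 1000
rng = np.random.default_rng(2)
A = rng.standard_normal(M)              # fixed deterministic coefficients
norm_A = np.linalg.norm(A)

# a_i: independent, mean 0, variance sigma^2 = 1, sub-exponential (Laplace) tails.
a = rng.laplace(scale=1 / np.sqrt(2), size=(trials, M))
ratios = np.abs(a @ A) / norm_A
print("max over trials of |sum a_i A_i| / (sigma*||A||):", ratios.max())
print("(log M)^2 for comparison:", np.log(M)**2)
```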

Next we extend Theorems 1.2 and 1.3 by relaxing the independence assumption.

Theorem 1.5. Let X = (x_ij) be a random matrix with entries satisfying (1.1), and assume that the column vectors of X are mutually independent. Furthermore, suppose that for any fixed j ≤ N, the random variables a_i = x_ij, 1 ≤ i ≤ M, satisfy the large deviation bounds (1.23), (1.24) and (1.25) for any A_i ∈ ℂ and B_ij ∈ ℂ and any ζ > 0. Then the conclusions of Theorems 1.2 and 1.3 hold for the random matrix X.

Thus the above theorem extends the universality results to a large class of matrix ensembles. For instance, let h_ij be i.i.d. random variables from a symmetric distribution, and set

x_ij = h_ij / √( ∑_{k=1}^{M} h_kj² ),  1 ≤ i ≤ M, 1 ≤ j ≤ N. (1.26)

The entries of the column vector (x_1j, x_2j, …, x_Mj) are then not independent, but exchangeable. Clearly E(x_ij) = 0 and E(x_ij²) = 1/M. The random variables x_ij given by (1.26) are called self-normalized sums and arise in various statistical applications.
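A minimal sketch (ours) of the construction (1.26): each column is an i.i.d. symmetric sample divided by its Euclidean norm, which makes the entries of a column exchangeable but not independent:

```python
import numpy as np

def self_normalized_matrix(M, N, rng):
    # h_ij i.i.d. from a symmetric distribution (here: Rademacher times exponential).
    h = rng.choice([-1.0, 1.0], size=(M, N)) * rng.exponential(size=(M, N))
    # Normalize each column by its Euclidean norm, as in (1.26).
    return h / np.linalg.norm(h, axis=0, keepdims=True)

rng = np.random.default_rng(3)
Xsn = self_normalized_matrix(M=1000, N=500, rng=rng)
print(np.allclose((Xsn**2).sum(axis=0), 1.0))   # each column has unit norm
print(Xsn.mean(), (Xsn**2).mean(), 1 / 1000)    # E x_ij ~ 0, E x_ij^2 = 1/M
```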

Proof of Theorem 1.5. In the proofs of Theorems 1.2 and 1.3 we only use the large deviation properties of a_i = x_ij, rather than independence and sub-exponential decay. Therefore, the proofs of Theorems 1.2 and 1.3 already yield Theorem 1.5. ∎

Theorem 1.6 (Universality of eigenvalues in the bulk). Let X^v = [x^v_ij] and X^w = [x^w_ij] be matrices with independent entries satisfying (1.1) and (1.2). Let E ∈ [λ₋ + c, λ₊ − c] for some c > 0. Then for any ε > 0, N^{-1+ε} < b < c/2, any integer n ≥ 1 and any compactly supported continuous test function O : ℝⁿ → ℝ, we have

lim_{N→∞} ∫_{E−b}^{E+b} (dE′/2b) ∫_{ℝⁿ} O(α_1, …, α_n) ( p^{(n)}_{v,N} − p^{(n)}_{w,N} )( E′ + α_1/(N ϱ_W(E′)), …, E′ + α_n/(N ϱ_W(E′)) ) ∏_i dα_i/ϱ_W(E′) = 0, (1.27)

where p^{(n)}_{v,N} and p^{(n)}_{w,N} are the n-point correlation functions of the eigenvalues of (X^v)†X^v and (X^w)†X^w, respectively.

Theorem 1.7 (Universality of extreme eigenvalues). Let X^v = [x^v_ij] and X^w = [x^w_ij] be matrices with independent entries satisfying (1.1) and (1.2). Then there exist ε > 0 and δ > 0 such that for any real number s (which may depend on N),

P^v( N^{2/3}(λ_N − λ₊) ≤ s − N^{-ε} ) − N^{-δ} ≤ P^w( N^{2/3}(λ_N − λ₊) ≤ s ) ≤ P^v( N^{2/3}(λ_N − λ₊) ≤ s + N^{-ε} ) + N^{-δ} (1.28)

for N ≥ N_0 sufficiently large, where N_0 is independent of s. An analogous result holds for the smallest eigenvalue λ_1.

Theorem 1.7 can be extended to finite correlation functions of extreme eigenvalues. For example, we have the following extension to (1.28):

P^v( N^{2/3}(λ_N − λ₊) ≤ s_1 − N^{-ε}, …, N^{2/3}(λ_{N−k} − λ₊) ≤ s_{k+1} − N^{-ε} ) − N^{-δ}
  ≤ P^w( N^{2/3}(λ_N − λ₊) ≤ s_1, …, N^{2/3}(λ_{N−k} − λ₊) ≤ s_{k+1} ) (1.29)
  ≤ P^v( N^{2/3}(λ_N − λ₊) ≤ s_1 + N^{-ε}, …, N^{2/3}(λ_{N−k} − λ₊) ≤ s_{k+1} + N^{-ε} ) + N^{-δ}

for any fixed k and N sufficiently large. The proof of (1.29) is similar to that of (1.28), and we will not provide details beyond stating the general form of the Green function comparison theorem (Theorem 6.4) needed in this case. We remark that edge universality is usually formulated in terms of joint distributions of edge eigenvalues in the form (1.29) with fixed parameters s_1, s_2, etc. Our result holds uniformly in these parameters, i.e., they may depend on N. However, the interesting regime is |s_j| ≤ φ^{O(1)}; otherwise the rigidity estimate (1.21) gives stronger control than (1.29).

The rest of the paper is organized as follows. In Sections 2-4 we establish the strong version of the Marchenko–Pastur law, rigidity and delocalization of the eigenvalues. In Sections 6 and 7 we prove the bulk and edge universality results, respectively.

2. A priori bound for the strong local Marchenko–Pastur law. We first prove a weaker form of Theorem 1.2; in Section 4 we will use this a priori bound to obtain the stronger form claimed in Theorem 1.2.

Theorem 2.1. Let X = [x_ij] with the entries x_ij satisfying (1.1) and (1.2). For any ζ > 0 there exists a constant C_ζ such that the event

⋂_{z ∈ S(C_ζ)} { Λ_d(z) + Λ_o(z) ≤ φ^{C_ζ} (Nη)^{-1/4} } (2.1)

holds with ζ-high probability.

Before proceeding, let us introduce some notation. Define

H := X†X,  G(z) := (H − z)^{-1} = (X†X − z)^{-1},  m(z) := (1/N) Tr G(z), (2.2)
𝒢(z) := (XX† − z)^{-1}.

We know that the nonzero eigenvalues of X†X and XX† are identical, and XX† has M − N more (or N − M fewer) zero eigenvalues. We then have the identity

Tr G(z) − Tr 𝒢(z) = (M − N)/z. (2.3)
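The identity (2.3) is elementary to confirm numerically (our check; the sizes and the value of z are arbitrary):

```python
import numpy as np

M, N = 30, 20
rng = np.random.default_rng(4)
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 0.7 + 0.3j

G_small = np.linalg.inv(X.T @ X - z * np.eye(N))   # G(z) = (X^T X - z)^{-1}
G_big = np.linalg.inv(X @ X.T - z * np.eye(M))     # script-G(z) = (X X^T - z)^{-1}
print(np.trace(G_small) - np.trace(G_big), (M - N) / z)   # equal up to rounding
```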

We shall often need to consider minors of X, which are the content of the following definition.

Definition 2.2 (Minors). Let 𝕋 ⊂ {1, …, N}. We define X^(𝕋) as the M × (N − |𝕋|) minor of X obtained by removing all columns of X indexed by i ∈ 𝕋. Note that we keep the original index names of X when defining X^(𝕋):

(X^(𝕋))_ij := 𝟙(j ∉ 𝕋) x_ij.

The quantities G^(𝕋)(z), 𝒢^(𝕋)(z), λ_α^(𝕋), u_α^(𝕋), v_α^(𝕋), etc. are defined in the obvious way using X^(𝕋). Furthermore, we abbreviate (i) = ({i}) as well as (i𝕋) = ({i} ∪ 𝕋). We also set

m^(𝕋)(z) := (1/N) ∑_{i ∉ 𝕋} G^(𝕋)_ii(z). (2.4)

We denote by x_i the i-th column of X, which is an M × 1 vector.

2.1. Preliminary lemmas. We start with the following elementary lemma, whose proof is standard.

Lemma 2.3. Let M be a square matrix with block decomposition

M = [ A  B ; B†  D ],

where A and D are square and D is invertible. Then we have the identity

M^{-1} = [ G^{-1}  −G^{-1}BD^{-1} ; −D^{-1}B†G^{-1}  D^{-1} + D^{-1}B†G^{-1}BD^{-1} ],  G := A − BD^{-1}B†.
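The block-inversion formula of Lemma 2.3 is easy to verify numerically (our check, on an arbitrary symmetric test matrix with real blocks, so that B† reduces to the transpose):

```python
import numpy as np

rng = np.random.default_rng(5)
n_a, n_d = 3, 4
A = rng.standard_normal((n_a, n_a)); A = A + A.T
D = rng.standard_normal((n_d, n_d)); D = D + D.T + 10 * np.eye(n_d)  # keep D invertible
B = rng.standard_normal((n_a, n_d))

Mfull = np.block([[A, B], [B.T, D]])
Dinv = np.linalg.inv(D)
Ginv = np.linalg.inv(A - B @ Dinv @ B.T)            # inverse of the Schur complement G
Minv = np.block([[Ginv, -Ginv @ B @ Dinv],
                 [-Dinv @ B.T @ Ginv, Dinv + Dinv @ B.T @ Ginv @ B @ Dinv]])
print(np.allclose(Minv, np.linalg.inv(Mfull)))      # True
```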

Lemma 2.4. For any z not in the spectrum of X†X,

X (X†X − z)^{-1} X† = I + z (XX† − z)^{-1}.

Proof. From the singular value decomposition (1.11) we have

X (X†X − z)^{-1} X† = ∑_α ( λ_α/(λ_α − z) ) u_α u_α† = ∑_α ( 1 + z/(λ_α − z) ) u_α u_α† = I + z (XX† − z)^{-1},

and the lemma is proved. ∎
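A quick numerical confirmation of Lemma 2.4 (our check; the sizes and z are arbitrary):

```python
import numpy as np

M, N = 12, 8
rng = np.random.default_rng(6)
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = -0.4 + 0.9j

lhs = X @ np.linalg.inv(X.T @ X - z * np.eye(N)) @ X.T
rhs = np.eye(M) + z * np.linalg.inv(X @ X.T - z * np.eye(M))
print(np.allclose(lhs, rhs))   # True
```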

The next lemma collects the main identities of the resolvent matrix elements G^(𝕋)_ij(z) and 𝒢^(𝕋)_ij(z).

Lemma 2.5 (Resolvent identities).

G_ii(z) = 1/( −z − z⟨x_i, 𝒢^(i)(z) x_i⟩ ),  i.e.,  ⟨x_i, 𝒢^(i)(z) x_i⟩ = −1/(z G_ii(z)) − 1, (2.5)

G_ij(z) = z G_ii(z) G^(i)_jj(z) ⟨x_i, 𝒢^(ij)(z) x_j⟩,  i ≠ j, (2.6)

G_ij(z) = G^(k)_ij(z) + G_ik(z) G_kj(z)/G_kk(z),  i, j ≠ k. (2.7)

Proof. First we show (2.5) with i = 1. Let a = x_1 and B = X^(1). We have X = (a, B), so that

X†X − z = [ a†a − z  a†B ; B†a  B†B − z ].

By Lemmas 2.3 and 2.4,

G_11(z) = 1/( a†a − z − a†B(B†B − z)^{-1}B†a ) = 1/( a†a − z − a†(1 + z(BB† − z)^{-1})a ) = 1/( −z − z a†(BB† − z)^{-1}a ). (2.8)

On the other hand, we have

⟨x_1, 𝒢^(1)(z) x_1⟩ = a†(BB† − z)^{-1}a,

which together with (2.8) implies (2.5). Next we prove (2.6). From Lemma 3.2 of [11] and H = X†X, we have the identity

G_ij = −G_ii G^(i)_jj ( h_ij − ∑_{k,l ≠ i,j} h_ik G^(ij)_kl h_lj ), (2.9)

i.e.,

G_ij(z) = −G_ii(z) G^(i)_jj(z) (h_ij − Z_ij),  Z_ij = x_i† X^(ij) G^(ij) (X^(ij))† x_j = x_i† (I + z 𝒢^(ij)) x_j, (2.10)

where the last equality follows from an application of Lemma 2.4. Now (2.6) follows from (2.10). Finally, (2.7) is proved in Lemma 3.2 of [11]. ∎
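The identities (2.5) and (2.7) can be checked directly on a small random matrix (our check; the sizes and index choices are arbitrary, and the entries are real so that † reduces to transposition):

```python
import numpy as np

M, N = 12, 8
rng = np.random.default_rng(7)
X = rng.standard_normal((M, N)) / np.sqrt(M)
z = 0.5 + 0.2j

G = np.linalg.inv(X.T @ X - z * np.eye(N))
i, k = 0, 3
# Minor X^(i): remove column i; script-G^(i) = (X^(i) X^(i)^T - z)^{-1} is M x M.
Xi = np.delete(X, i, axis=1)
script_Gi = np.linalg.inv(Xi @ Xi.T - z * np.eye(M))

# (2.5): G_ii = 1 / (-z - z <x_i, script-G^(i) x_i>)
xi = X[:, i]
print(np.isclose(G[i, i], 1.0 / (-z - z * (xi @ script_Gi @ xi))))

# (2.7): G_jl = G^(k)_jl + G_jk G_kl / G_kk for j, l != k
Xk = np.delete(X, k, axis=1)
Gk = np.linalg.inv(Xk.T @ Xk - z * np.eye(N - 1))
j, l = 1, 2   # both indices precede k, so their positions in the minor agree
print(np.isclose(G[j, l], Gk[j, l] + G[j, k] * G[k, l] / G[k, k]))
```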

Set

κ := min( |λ₊ − E|, |E − λ₋| ). (2.11)

Lemma 2.6 (Properties of m_W). For z ∈ S(0) (see (1.14)) we have, directly from the definition of m_W, the following bounds:

|m_W(z)| ∼ 1,  |1 − m_W²(z)| ∼ √(κ + η), (2.12)

ℑ m_W(z) ∼ η/√(κ + η)  if κ ≥ η and E ∉ [λ₋, λ₊],
ℑ m_W(z) ∼ √(κ + η)   if κ ≤ η or E ∈ [λ₋, λ₊], (2.13)

where A ∼ B means C^{-1}B ≤ A ≤ CB for some constant C > 0. Furthermore,

ℑ m_W/(Nη) ≥ O(1/N)  and  ∂_η ( ℑ m_W/η ) ≤ 0. (2.14)

For z ∈ S(0), define the event

B(z) := { Λ_o(z) + Λ_d(z) ≥ (log N)^{-1} }. (2.15)

Lemma 2.7 (Rough bounds on Λ_o^(𝕋) and Λ_d^(𝕋)). Fix 𝕋 ⊂ {1, 2, …, N}. For z ∈ S(0), there exists a constant C = C_𝕋 such that the following estimates hold in B^c:

max_{k ∉ 𝕋} |G^(𝕋)_kk − G_kk| ≤ C Λ_o², (2.16)

C^{-1} ≤ |G^(𝕋)_kk| ≤ C, (2.17)

Λ_o^(𝕋) ≤ C Λ_o. (2.18)

Proof. For 𝕋 = ∅, (2.16) and (2.18) follow from the definitions, while (2.17) follows from the definition of B(z) and m_W ∼ 1 in (2.12). For nonempty 𝕋, one can prove the lemma by induction on |𝕋|. For example, for |𝕋| = 1, using (2.7) we can show that

|G_kk(z) − G^(𝕋)_kk(z)| ≤ C Λ_o², (2.19)

which implies the bound (2.16). A similar argument yields (2.17) and (2.18). ∎

On the other hand, in the case η = O(1), the analogue of (2.17) holds without the restriction to B^c.

Lemma 2.8 (Rough bound on G_ii for large η). For any z ∈ S(0) with η = O(1), we have the bound

|G_ii(z)| ≤ C for some C > 0.

Proof. By the spectral decomposition,

|G_ii| = | ∑_α v_α(i) v̄_α(i)/(λ_α − z) | ≤ (1/η) ∑_α |v_α(i)|² ≤ 1/η ≤ C,

where we used |λ_α − z| ≥ ℑz = η. ∎

Define the quantities

Ψ := √( (ℑ m_W + Λ)/(Nη) ) (2.20)

and

Z_i := z⟨x_i, 𝒢^(i) x_i⟩ − (z/M) Tr 𝒢^(i). (2.21)

Remark 2.9. Note that if m_W ≤ O(1) and Λ ≤ O(1), then

Ψ ≤ O( (Nη)^{-1/2} ). (2.22)

We now identify the "bad sets" (improbable events) and show that they indeed have small probability. Define, for fixed z, the events

Ω_o(z, K) := { Λ_o(z) ≥ K Ψ(z) }, (2.23)
Ω_d(z, K) := { max_i |G_ii(z) − m(z)| ≥ K Ψ(z) }.

Lemma 2.10. Let Ω(z, K)^c be the good set, where

Ω(z, K) := Ω_d(z, K) ∪ Ω_o(z, K), (2.24)

and set

Γ(z, K) := Ω(z, K)^c ∪ B(z).

For any ζ > 0 there exists a constant C_ζ such that

⋂_{z ∈ S(C_ζ)} Γ(z, φ^{C_ζ}) (2.25)

holds with ζ-high probability.

Proof. We only need to prove that there exists a uniform constant C_ζ such that for any fixed z ∈ S(C_ζ) the event

Γ(z, φ^{C_ζ}) (2.26)

holds with ζ-high probability. It is clear that (2.25) follows from (2.26) and the fact that

|∂_z G_ij| ≤ N^C for η ≥ N^{-1}. (2.27)

Note that Γ(z, K) = (Ω_o^c ∪ B) ∩ (Ω_d^c ∪ B). First we prove that Ω_o^c ∪ B holds with ζ-high probability. Using Lemma 1.4, equation (2.6) and the fact that |𝒢|² = 𝒢𝒢*, we infer that there exists a constant C_ζ such that, with ζ-high probability, in B^c,

Λ_o ≤ C|z| max_{i≠j} |⟨x_i, 𝒢^(ij) x_j⟩| ≤ (φ^{C_ζ}|z|/N) ( ∑_{k,l} |𝒢^(ij)_kl|² )^{1/2} = φ^{C_ζ}|z| ( (1/N²) Tr |𝒢^(ij)|² )^{1/2} = φ^{C_ζ}|z| √( ℑ Tr 𝒢^(ij)/(N²η) ), (2.28)

where in the last step we used the identity η^{-1} ℑ Tr 𝒢^(ij) = Tr |𝒢^(ij)|². Using the identity

Tr G^(𝕋)(z) − Tr 𝒢^(𝕋)(z) = (M − N + |𝕋|)/z, (2.29)

equation (2.16) and ℑ(z^{-1}) = −η|z|^{-2}, we conclude that with ζ-high probability, in B^c,

Λ_o ≤ φ^{C_ζ} √( (ℑ m_W + Λ + Λ_o²)/(Nη) + 1/N ).

For the above choice of C_ζ and z ∈ S(3C_ζ), we then have, with ζ-high probability, in B^c,

Λ_o ≤ φ^{C_ζ} √( (ℑ m_W + Λ)/(Nη) + 1/N ) + o(Λ_o). (2.30)

Together with (2.14), this shows that Ω_o^c ∪ B holds with ζ-high probability.

A similar argument using Lemma 1.4 shows that

|Z_i| = |z| | ⟨x_i, 𝒢^(i) x_i⟩ − (1/M) Tr 𝒢^(i) | ≤ φ^{C_ζ} Ψ in B^c (2.31)

holds with ζ-high probability. Notice that max_i |G_ii − m| ≤ max_{i≠j} |G_ii − G_jj|. From (2.5) we obtain that

|G_ii − G_jj| = | 1/( −z − z⟨x_i, 𝒢^(i)(z)x_i⟩ ) − 1/( −z − z⟨x_j, 𝒢^(j)(z)x_j⟩ ) | ≤ |G_ii G_jj| ( |Z_i − Z_j| + (|z|/M)|Tr 𝒢^(i) − Tr 𝒢^(j)| ) ≤ C( φ^{C_ζ}Ψ + Λ_o² + N^{-1} ) in B^c

holds with ζ-high probability, where the last inequality follows from (2.31), (2.3), (2.16) and (2.17). The lemma now follows from (2.30) and (2.22). ∎

On the other hand, in the case η = O(1), a similar result holds without the restriction to B^c.

Lemma 2.11. Let Ω_o(z) and Ω_d(z) be as in (2.23). For any ζ > 0, there exists a constant C_ζ such that the event

⋂_{z ∈ S(0), η ≥ 1} ( Ω_d(z, φ^{C_ζ}) ∪ Ω_o(z, φ^{C_ζ}) )^c ∩ { max_i |Z_i| ≤ φ^{C_ζ}Ψ } (2.32)

holds with ζ-high probability.

Proof. By (2.27), we only need to prove (2.32) for fixed z. First note that in this case ℑ m_W ∼ 1 and Λ = O(1), and therefore

Ψ ∼ N^{-1/2}. (2.33)

From (2.28) and Lemma 2.8 we have

Λ_o ≤ φ^{C_ζ} √( ℑ Tr 𝒢^(ij)/N² ) ≤ φ^{C_ζ} N^{-1/2} ≤ φ^{C_ζ} Ψ.

The Z_i part can be proved as in (2.31), using Lemma 2.8. The Ω_d part can also be proved as in the proof above, where for Tr 𝒢^(i) − Tr 𝒢^(j) we use

Tr 𝒢^(i) − Tr 𝒢^(j) = Tr G^(i) − Tr G^(j) = O(η^{-1}),

which follows from the eigenvalue interlacing theorem, i.e.,

|m − m^(i)| ≤ (Nη)^{-1}. (2.34)
∎

2.2. Self-consistent equations. In the last subsection we obtained bounds on Λ_o and max_i |G_ii − m| in terms of m_W, η and Λ, valid in B^c. In this subsection we give the desired bound for Λ and show that B^c holds with ζ-high probability.

First we bound Λ in the case η = O(1).

Lemma 2.12. For any ζ > 0, there exists a constant C_ζ such that

⋂_{z ∈ S(0), η = 10(1+d)} { Λ(z) ≤ φ^{C_ζ} N^{-1/4} } (2.35)

holds with ζ-high probability.

Proof. Recall (2.33). By definition and (2.5),

m(z) = (1/N) ∑_i G_ii(z) = (1/N) ∑_i 1/( −z − (z/M) Tr 𝒢^(i) − Z_i ).

Using (2.29) and (2.34), we obtain

| (z/M) Tr 𝒢^(i) − z d m(z) + 1 − d | ≤ C N^{-1}. (2.36)

Together with |Z_i| ≤ φ^{C_ζ}Ψ (see (2.32)), we have

m(z) = (1/N) ∑_i 1/( 1 − z − d − z d m(z) + Y_i ),  max_i |Y_i| ≤ φ^{C_ζ}Ψ.

Since |m| ≤ η^{-1}, we have |1 − z − d − z d m(z)| ≥ O(1), and then

m(z) = 1/( 1 − z − d − z d m(z) ) + O(φ^{C_ζ}Ψ),

which implies (2.35). ∎

Now, combining (2.35) with (2.32), we have proved that for any ζ > 0 there exists a constant C_ζ such that, in the case η = 10(1 + d), (2.1) holds with ζ-high probability. This immediately implies that

⋂_{z ∈ S(0), η = 10(1+d)} B^c(z) (2.37)

holds with ζ-high probability.

Now we prove (2.1) for general η. For a function u(z), define its "deviance" to be

D(u)(z) := ( u^{-1}(z) + z d u(z) ) − ( m_W^{-1}(z) + z d m_W(z) ). (2.38)

Clearly D(m_W) = 0. Recall Z_i from (2.21) and define

[Z] := (1/N) ∑_{i=1}^{N} Z_i. (2.39)

Recall the set B(z) from (2.15) and Γ(z, K) from Lemma 2.10.

Lemma 2.13. Let 1 ≤ K ≤ (log N)^{-1}(Nη)^{1/2}. On the set Γ(z, K) (see (2.24)),

|D(m)| ≤ O([Z]) + O(K²Ψ²) + ∞·𝟙_{B(z)}.

Proof. Using (2.5), (2.16), (2.29) and the definition of m_W, we have on the set Γ(z, K), in B^c ∩ Ω^c,

G_ii(z)^{-1} = m_W(z)^{-1} + z d [m_W(z) − m(z)] + O(K²Ψ²) + O(Z_i) + O(N^{-1}).

Then

G_ii^{-1} − m^{-1} = D(m) + O(K²Ψ²) + O(Z_i) + O(N^{-1}) in B^c ∩ Ω^c, (2.40)

and averaging over i yields

(1/N) ∑_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) = D(m) + O(K²Ψ²) + O([Z]) + O(N^{-1}) in B^c ∩ Ω^c.

It follows from the assumption K ≤ (log N)^{-1}(Nη)^{1/2} and (2.22) that KΨ = o(1), and hence G_ii − m = o(1). Expanding the left-hand side and using the fact that ∑_i (G_ii − m) = 0,

∑_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) = ∑_{i=1}^{N} (m − G_ii)/(G_ii m) = (1/m³) ∑_{i=1}^{N} (G_ii − m)² + ∑_{i=1}^{N} O( (G_ii − m)³/m⁴ ) in B^c ∩ Ω^c.

Together with (2.17) and (2.23), it follows that

(1/N) ∑_{i=1}^{N} ( G_ii^{-1} − m^{-1} ) = O( K²Ψ² (1 + KΨ) ) in B^c ∩ Ω^c. (2.41)

Now the lemma follows from (2.40), (2.41) and the assumption KΨ = o(1). ∎

For a given function δ(·), the two solutions m_1, m_2 of the equation D(m) = δ(z) are given by

m_{1,2} = ( δ(z) + 1 − d − z ± i√( (z − λ_{−,δ})(λ_{+,δ} − z) ) ) / (2dz), (2.42)

λ_{±,δ} = 1 + d ± 2√(d − δ(z)) − δ(z),  |λ_{±,δ} − λ_±| = O(δ).

Lemma 2.14. Let K, L > 0 with φ^L ≥ K²(log N)^4, where L and K may depend on N. Suppose that on some subset A of

⋂_{z ∈ S(L)} Γ(z, K) ∩ ⋂_{z ∈ S(L), η = 10(1+d)} B^c(z) (2.43)

we have the bound

|D(m)(z)| ≤ δ(z) + ∞·𝟙_{B(z)} for all z ∈ S(L),

where δ : ℂ → ℝ₊ is a continuous function, decreasing in ℑz, with |δ(z)| ≤ (log N)^{-8}. Then for some uniform constant C > 0,

|m(z) − m_W(z)| = Λ ≤ C(log N) δ(z)/√(κ + η + δ) for all z ∈ S(L) (2.44)

holds on A, and moreover

A ⊂ ⋂_{z ∈ S(L)} B^c(z). (2.45)

Note: the difficulty in the proof is that the bound |D(m)| ≤ δ(z) holds only on B^c, while we need to prove (2.45).

Proof. Let us first fix E and define the set

I_E := { η : Λ_o(E + iη̂) + Λ_d(E + iη̂) ≤ (log N)^{-1} for all η̂ ≥ η with E + iη̂ ∈ S(L) }.

We first prove (2.44) for all z = E + iη with η ∈ I_E. Define

η_1 := sup { η ∈ I_E : δ(E + iη) ≥ (log N)^{-1}(κ + η) }.

Since δ is a continuous decreasing function of η by assumption, δ(E + iη) ≤ (log N)^{-1}(κ + η) for η ≥ η_1. Let m_1 and m_2 be the two solutions of the equation D(m) = δ(z), as given in (2.42). (Note that since we are in B^c by assumption, we do have |D(m)| ≤ δ(z).) Then it can easily be verified that

|m_1 − m_2| ≥ C√(κ + η) for η ≥ η_1, (2.46)
|m_1 − m_2| ≤ C(log N)√(δ(z)) for η ≤ η_1.

The difficulty here is that we do not know which of the two solutions m_1, m_2 is equal to m. However, for η = O(1) we claim that m = m_1. Indeed, by assumption |m − m_W| = Λ ≤ Λ_d ≪ 1, and a direct calculation using (2.42) gives

|m_1 − m_W| = C δ(z)/√(κ + η) ≪ (log N)^{-1}. (2.47)

Since |m_1 − m_2| ≥ C√(κ + η) for η = O(1) (see (2.46)), it immediately follows that m = m_1 for η = O(1). Furthermore, since the functions m_1, m_2 and m are continuous and m_1 ≠ m_2, it follows that m = m_1 for all η ≥ η_1. Thus, for η ≥ η_1,

|m(z) − m_W(z)| = |m_1(z) − m_W(z)| ≤ C δ(z)/√(κ + η) ≤ C δ(z)/√(κ + η + δ),

where in the last step we used δ ≤ κ + η.

For η ≤ η_1, we take advantage of the fact that the difference |m_1 − m_2| is of the same order as in (2.47). Indeed, for η ≤ η_1, if m = m_2 (say), then

|m − m_W| ≤ |m_2 − m_1| + |m_1 − m_W| ≤ (log N)√(δ(z)) ≤ C(log N) δ(z)/√(κ + η + δ),

verifying (2.44) for η ∈ I_E.

Now we prove that I_E equals the full range [φ^L N^{-1}, 10(1 + d)], i.e., (2.45). We argue by contradiction. If not, let η_0 := inf I_E. By continuity,

Λ_o(z_0) + Λ_d(z_0) = (log N)^{-1},  z_0 = E + iη_0, (2.48)

and thus Λ(z_0) ≤ Λ_d(z_0) ≤ (log N)^{-1}. On the other hand, since by the above result (2.44) holds for η ∈ I_E, we have

Λ(z_0) ≤ (log N)^{-3}. (2.49)

By definition,

{ Λ_o(z_0) + Λ_d(z_0) = (log N)^{-1} } ∩ Γ(z_0) ⊂ ( Ω_o(z_0) ∪ Ω_d(z_0) )^c,

and therefore

Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| ≤ C K Ψ(z_0).

With the assumption φ^L ≥ K²(log N)^4, we have

Ψ(z_0) ≤ √( ℑ m_W/(Nη_0) + Λ(z_0)/(Nη_0) ) ≪ K^{-1}(log N)^{-2},

which immediately implies that Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| ≪ (log N)^{-1}. Using this estimate and (2.49), we deduce that

Λ_o(z_0) + Λ_d(z_0) ≤ Λ_o(z_0) + max_k |G_kk(z_0) − m(z_0)| + Λ ≪ (log N)^{-1},

which contradicts (2.48) and concludes the proof of the lemma. ∎

Proof of Theorem 2.1. We can now complete the proof of Theorem 2.1. It follows from (2.31), Lemma 2.10 and Lemma 2.13 that for any ζ > 0 there exist constants C_ζ, D_ζ and C̃_ζ such that

⋂_{z ∈ S(C_ζ)} { |D(m)(z)| ≤ φ^{C̃_ζ} Ψ + ∞·𝟙_{B(z)} }

holds on

⋂_{z ∈ S(C_ζ)} Γ(z, φ^{D_ζ}), (2.50)

which holds with ζ-high probability. Choosing a larger C_ζ and applying Lemma 2.14, with A chosen to be the set (2.43) and using (2.14), we obtain that for some C_ζ,

Λ(z) ≤ φ^{C_ζ} Ψ^{1/2} for all z ∈ S(C_ζ) (2.51)

holds on (2.43). Using (2.50) and (2.35), we obtain that for any ζ > 0 there exists C_ζ such that (2.51) holds with ζ-high probability. Furthermore, (2.45) implies that

⋂_{z ∈ S(C_ζ)} B^c(z)

holds with ζ-high probability. Together with (2.25) and (2.51), this yields (2.1) and completes the proof of Theorem 2.1. ∎

3. Strong bound on [Z]. For proving Theorems 1.2 and 1.3, the key input is the following lemma, which gives a much stronger bound on [Z]. It is the main result of this section.

Lemma 3.1. Let K, L > 0 with φ^L ≥ K²(log N)^4. Suppose that for some event Ξ ⊂ ⋂_{z ∈ S(L)} ( Γ(z, K) ∩ B^c(z) ) we have

Λ(z) ≤ Λ̃(z) for all z ∈ S(L)

with some deterministic number Λ̃(z), and that P(Ξ^c) ≤ e^{−p(log N)²}, where p may depend on N and

1 ≪ p ≪ (K log N)^{-1} φ^{L/2}. (3.1)

Then there exists a subset Ξ̃ of Ξ with P(Ξ \ Ξ̃) ≤ e^{−p} such that, on Ξ̃, for any z ∈ S(L),

|z[Z]| ≤ C p⁵ K² Ψ̃²,  Ψ̃ := √( (ℑ m_W + Λ̃)/(Nη) ). (3.2)

Note: in the application of this lemma, p ≪ N and K = O(φ^{O(1)}). First we introduce the abstract decoupling lemma, which is similar to Theorem 5.6 of [4]; see also [11] for a similar lemma for generalized Wigner matrices.

Theorem 3.2 (Abstract decoupling lemma). Let ℐ be a finite set, which may depend on N, and let

ℐ_i ⊂ ℐ,  1 ≤ i ≤ N.

Let Z_1, …, Z_N be random variables which depend on the independent random variables {x_α, α ∈ ℐ}. Let E_i denote the expectation with respect to {x_α, α ∈ ℐ_i}. Define the commuting projection operators

P_i := E_i,  Q_i := 1 − E_i,  P_i² = P_i,  Q_i² = Q_i,  [Q_i, P_j] = [P_i, P_j] = [Q_i, Q_j] = 0,

and for A ⊂ {1, 2, …, N},

Q_A := ∏_{i ∈ A} Q_i,  P_A := ∏_{i ∈ A} P_i.

We use the notation

[QZ] := (1/N) ∑_{i=1}^{N} Q_i Z_i.

Let Ξ be an event and p an even integer. Suppose the following assumptions hold with some constants C_0, c_0 > 0.

(i) (Bound on Q_A Z_i on Ξ). There exist deterministic positive numbers 𝒳 < 1 and 𝒴 such that for any set A ⊂ {1, 2, …, N} with i ∈ A and |A| ≤ p, the variable Q_A Z_i can be written on Ξ as the sum of two random variables,

𝟙(Ξ)(Q_A Z_i) = Z_{i,A} + 𝟙(Ξ) Q_A 𝟙(Ξ^c) Z̃_{i,A}, (3.3)

with

|Z_{i,A}| ≤ 𝒴 C_0^{|A|} 𝒳^{|A|},  |Z̃_{i,A}| ≤ 𝒴 C_0^{|A|} N^{C_0}. (3.4)

(ii) (Rough bound on Z_i).

max_i |Z_i| ≤ 𝒴 N^{C_0}. (3.5)

(iii) (Ξ has high probability).

P(Ξ^c) ≤ e^{−c_0 (log N)^{3/2} p}. (3.6)

Then, under assumptions (i)-(iii), we have

E[ 𝟙(Ξ) [QZ]^p ] ≤ (Cp)^{4p} ( 𝒳² + N^{-1} )^p 𝒴^p (3.7)

for some C > 0 and any sufficiently large N.

Before giving the proof, we record a trivial but useful identity:

∏_{i=1}^{n} (x_i + y_i) = ∑_{s=1}^{n+1} ( ∏_{i=1}^{s−1} x_i ) y_s ( ∏_{i=s+1}^{n} (x_i + y_i) ), (3.8)

with the conventions ∏_{i ∈ ∅} = 1 and y_{n+1} := 1 (so that the s = n + 1 term is ∏_{i=1}^{n} x_i). It implies that

| ∏_{i=1}^{n} (x_i + y_i) − ∏_{i=1}^{n} x_i | ≤ n max_i |y_i| ( max_i |x_i + y_i| + max_i |x_i| )^{n−1}.

For any 1 ≤ k ≤ n, it follows from ∏_{i=1}^{n} (x_i + y_i) = (x_k + y_k) ∏_{i≠k} (x_i + y_i) and (3.8) that

∏_{i=1}^{n} (x_i + y_i) = ∑_{s=1, s≠k}^{n+1} (x_k + y_k) ( ∏_{i=1, i≠k}^{s−1} x_i ) y_s ( ∏_{i=s+1, i≠k}^{n} (x_i + y_i) ). (3.9)
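The identity (3.8) can be spot-checked numerically (our check, with the convention y_{n+1} = 1 made explicit):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 6
x, y = rng.standard_normal(n), rng.standard_normal(n)

lhs = np.prod(x + y)

# (3.8): sum over s = 1..n+1 of (prod_{i<s} x_i) * y_s * (prod_{i>s} (x_i+y_i)),
# with prod over the empty set equal to 1 and y_{n+1} = 1.
def term(s):          # s is 1-based
    ys = y[s - 1] if s <= n else 1.0
    return np.prod(x[:s - 1]) * ys * np.prod((x + y)[s:])

rhs = sum(term(s) for s in range(1, n + 2))
print(np.isclose(lhs, rhs))   # True
```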

Proof of Theorem 3.2. First, by definition, we have

E[ 𝟙(Ξ) [QZ]^p ] = (1/N^p) ∑_{j_1, …, j_p} E[ 𝟙(Ξ) ∏_{α=1}^{p} Q_{j_α} Z_{j_α} ]. (3.10)

For fixed j_1, …, j_p, let T_α := Q_{j_α} Z_{j_α}. Using (3.9) with the choices k = 1, x_i := P_{j_1} T_i and y_i := Q_{j_1} T_i (note that here x_i + y_i = T_i), we have

∏_{α=1}^{p} T_α = ∑_{s=2}^{p+1} T_1 ( ∏_{α<s, α≠1} P_{j_1} T_α ) (Q_{j_1} T_s) ( ∏_{α>s, α≠1} T_α ),

with the same convention as in (3.8) for the term s = p + 1. We define A_{α,s} := 𝟙{α < s, α ≠ 1}·{j_1} and B_{α,s} := 𝟙{α = s}·{j_1}, i.e., B_{α,s} = {j_1} if α = s and A_{α,s} = ∅ unless α < s, α ≠ 1. It is clear that A_{1,s} = B_{1,s} = ∅. Then

∏_{α=1}^{p} T_α = ∑_{s=2}^{p+1} ∏_α P_{A_{α,s}} Q_{B_{α,s}} T_α.

In view of the generalization to come, we replace s by s_1 and write this as

∏_{α=1}^{p} T_α = ∑_{s_1=1}^{p+1} 𝟙(s_1 ≠ 1) ∏_α P_{A_{α,s_1}} Q_{B_{α,s_1}} T_α,

where

A_{α,s_1} = { j_1 : α < s_1, α ≠ 1 },  B_{α,s_1} = { j_1 : s_1 = α }. (3.11)

Iterating over 1 ≤ j_1, j_2, …, j_p ≤ N, we have

∏_{α=1}^{p} T_α = ∑_{s_1, s_2, …, s_p = 1}^{p+1} ( ∏_i 𝟙(s_i ≠ i) ) ∏_α P_{A_{α,s}} Q_{B_{α,s}} T_α,

where s denotes (s_1, s_2, …, s_p) and A_{α,s}, B_{α,s} are defined as

A_{α,s} = { j_i : α < s_i, α ≠ i },  B_{α,s} = { j_i : s_i = α }.

Then

| E[ 𝟙(Ξ) ∏_{α=1}^{p} Q_{j_α} Z_{j_α} ] | ≤ (2p)^p max_s ( ∏_i 𝟙(s_i ≠ i) ) | E[ 𝟙(Ξ) ∏_α P_{A_{α,s}} Q_{B_{α,s}} T_α ] |.

Now, to prove (3.7), it only remains to show that for any {j_1, …, j_p} and s = (s_1, s_2, …, s_p) with s_i ≠ i, we have

| E[ 𝟙(Ξ) ∏_α P_{A_{α,s}} Q_{B_{α,s}} T_α ] | ≤ (Cp)^{2p} 𝒴^p 𝒳^{2t},  t := |{j_1, …, j_p}|. (3.12)

For simplicity we denote A_{α,s} and B_{α,s} by A_α and B_α, and the characteristic function 𝟙(Ξ) by Ξ; i.e., we need to prove

| E[ Ξ ∏_α P_{A_α} Q_{B_α} T_α ] | ≤ (Cp)^{2p} 𝒴^p 𝒳^{2t},  t := |{j_1, …, j_p}|. (3.13)

Since T_1 = Q_{j_1} T_1 and the P's and Q's commute, we have

E[ Ξ ∏_α P_{A_α} Q_{B_α} T_α ] = E[ (Q_{j_1} P_{A_1} Q_{B_1} T_1) Ξ ∏_{α=2}^{p} (P_{A_α} Q_{B_α}) T_α ]. (3.14)

Recall ℐ and ℐ_i from the assumptions. For α = 2, …, p, if j_1 ∈ ∩_{α≠1} A_α, then the factors (P_{A_α} Q_{B_α}) T_α are independent of ℐ_{j_1}, i.e., of every x_t, t ∈ ℐ_{j_1}. For any function f independent of ℐ_{j_1}, since Q = Q_{j_1} is a projection operator, we have

| E[(Qg)(Ξf)] | = | E[(Qg)(QΞ)f] | ≤ ‖(Qg)f‖₂ ‖QΞ‖₂ = ‖(Qg)f‖₂ √( E|Q𝟙(Ξ^c)|² ) ≤ ‖(Qg)f‖₂ √(P(Ξ^c)),

where we have used the Schwarz inequality. In our application,

Q = Q_{j_1},  f = ∏_{α=2}^{p} (P_{A_α} Q_{B_α}) T_α,  g = P_{A_1} Q_{B_1} T_1.

Since the P_i and Q_i are projections, we have

‖(Qg)f‖₂ ≤ (C𝒴)^p N^{Cp},

and we obtain with (3.6) that (3.14) is bounded above by 𝒴^p N^{Cp} exp(−c(log N)^{3/2} p). This part can be neglected in proving (3.13).

Hence we may assume that j_1 ∉ ∩_{α≠1} A_α, i.e., 1 < s_1 ≤ p (see (3.11)), i.e., j_1 ∈ ∪_{α≠1} B_α. Similarly, for j_i we have j_i ∈ ∪_{α≠i} B_α, i = 2, …, p. Recall that j_α ∉ B_α. With these two conditions, the sets B_α satisfy the inequality

p + t ≥ ∑_α |B_α ∪ {j_α}| ≥ 2t,  t := |{j_1, …, j_p}|. (3.15)
