HAL Id: hal-00811808
https://hal-upec-upem.archives-ouvertes.fr/hal-00811808
Preprint submitted on 11 Apr 2013
Estimating the covariance of random matrices
Pierre Youssef
To cite this version:
Pierre Youssef. Estimating the covariance of random matrices. 2013. hal-00811808
ESTIMATING THE COVARIANCE OF RANDOM MATRICES

PIERRE YOUSSEF
ABSTRACT. We extend to the matrix setting a recent result of Srivastava-Vershynin [21] about estimating the covariance matrix of a random vector. The result can be interpreted as a quantified version of the law of large numbers for positive semi-definite matrices which verify some regularity assumption. Besides giving examples, we discuss the notion of log-concave matrices and give estimates on the smallest and largest eigenvalues of a sum of such matrices.
1. INTRODUCTION

In recent years, interest in matrix-valued random variables has gained momentum. Many of the results dealing with real random variables and random vectors were extended to cover random matrices. Concentration inequalities like Bernstein, Hoeffding and others were obtained in the non-commutative setting ([5], [22], [14]). The methods used were mostly a combination of methods from the real/vector case and some matrix inequalities like the Golden-Thompson inequality (see [8]).
Estimating the covariance matrix of a random vector has attracted a lot of interest recently. Given a random vector $X$ in $\mathbb{R}^n$, the question is to estimate $\Sigma = \mathbb{E}\, XX^t$. A natural way to do this is to take $X_1, \ldots, X_N$ independent copies of $X$ and try to approximate $\Sigma$ with the sample covariance matrix $\Sigma_N = \frac{1}{N}\sum_i X_i X_i^t$. The challenging problem is to find the minimal number of samples needed to estimate $\Sigma$. It is known, using a result of Rudelson (see [19]), that for general distributions supported on the sphere of radius $\sqrt{n}$ it suffices to take $cn\log(n)$ samples. But for many distributions, a number of samples proportional to $n$ is sufficient. Using standard arguments, one can verify this for Gaussian vectors. It was conjectured by Kannan-Lovász-Simonovits [11] that the same result holds for log-concave distributions. This problem was solved by Adamczak et al. ([3], [4]). Recently, Srivastava-Vershynin proved in [21] a covariance estimate with a number of samples proportional to $n$, for a larger class of distributions covering the log-concave case. The method used was different from previous work in this field, and the main idea was to randomize the sparsification theorem of Batson-Spielman-Srivastava [7].
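As a quick numerical illustration of this sampling question (an added sketch, not part of the original argument; the dimensions, the Gaussian model and the factor 5 below are arbitrary choices), one can measure the operator-norm error of the sample covariance matrix when $N$ is proportional to $n$:

```python
import numpy as np

# Illustration: operator-norm error of the sample covariance matrix
# Sigma_N = (1/N) sum_i X_i X_i^t for a standard Gaussian vector X in R^n,
# whose true covariance matrix is the identity.
rng = np.random.default_rng(0)

def sample_cov_error(n, N):
    X = rng.standard_normal((N, n))                 # N independent copies of X
    Sigma_N = X.T @ X / N                           # sample covariance matrix
    return np.linalg.norm(Sigma_N - np.eye(n), 2)   # operator-norm error

for n in (50, 100, 200):
    # With N proportional to n, the error stays of constant order,
    # in line with the N ~ n regime discussed above.
    print(n, sample_cov_error(n, 5 * n))
```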
Our aim in this paper is to adapt the work of Srivastava-Vershynin to the matrix setting, replacing the vector $X$ in the problem of the covariance matrix by an $n \times m$ random matrix $A$, and to try to estimate $\mathbb{E}\, AA^t$ by the same techniques. This will be possible since, in the deterministic setting, the sparsification theorem of Batson-Spielman-Srivastava [7] has been extended to a matrix setting by De Carli Silva-Harvey-Sato [9], who precisely proved the following:
Theorem 1.1. Let $B_1, \ldots, B_m$ be positive semi-definite matrices of size $n \times n$ and arbitrary rank. Set $B := \sum_i B_i$. For any $\varepsilon \in (0, 1)$, there is a deterministic algorithm to construct a vector $y \in \mathbb{R}^m$ with $O(n/\varepsilon^2)$ nonzero entries such that $y \geq 0$ and
$$B \preceq \sum_i y_i B_i \preceq (1 + \varepsilon) B.$$
For an $n \times n$ matrix $A$, denote by $\|A\|$ the operator norm of $A$ seen as an operator on $\ell_2^n$. The main idea is to randomize the previous result using the techniques of Srivastava-Vershynin [21]. Our problem can be formulated as follows:

Take $B$ a positive semi-definite random matrix of size $n \times n$. How many independent copies of $B$ are needed to approximate $\mathbb{E}\, B$? That is, taking $B_1, \ldots, B_N$ independent copies of $B$, what is the minimal number of samples needed to make $\left\| \frac{1}{N}\sum_i B_i - \mathbb{E}\, B \right\|$ very small?
One can view this as an analogue of the covariance estimation of a random vector by taking for $B$ the matrix $AA^t$, where $A$ is an $n \times m$ random matrix. With some regularity, we will be able to take $N$ proportional to $n$. However, in the general case this is no longer true. In fact, take $B$ uniformly distributed on $\{n\, e_i e_i^t\}_{i \leq n}$, where $(e_j)$ denotes the canonical basis of $\mathbb{R}^n$. It is easy to verify that $\mathbb{E}\, B = I_n$, while $\frac{1}{N}\sum_i B_i$ is a diagonal matrix whose diagonal coefficients are distributed as $\frac{n}{N}(p_1, \ldots, p_n)$, where $p_i$ denotes the number of times $e_i e_i^t$ is chosen. This problem is well studied and it is known (see [12]) that we must take $N \geq cn\log(n)$. This example is essentially due to Aubrun [6]. More generally, if $B$ is a positive semi-definite random matrix such that $\mathbb{E}\, B = I_n$ and $\mathrm{Tr}(B) \leq n$ almost surely, then by Rudelson's inequality in the non-commutative setting (see [15]) it is sufficient to take $cn\log(n)$ samples.
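The coupon-collector behaviour behind this example is easy to observe numerically. The following sketch (an added illustration; the dimension and sample sizes are arbitrary) draws $N$ samples of $B$ uniform on $\{n\, e_i e_i^t\}$ and reports the smallest and largest diagonal entries of $\frac{1}{N}\sum_i B_i$, which coincide with its extreme eigenvalues since the matrix is diagonal:

```python
import numpy as np

# B is uniform on {n * e_i e_i^t : i <= n}; the empirical average (1/N) sum_k B_k
# is diagonal with entries (n/N) * p_i, where p_i counts how often index i is drawn.
rng = np.random.default_rng(1)

def diagonal_range(n, N):
    counts = np.bincount(rng.integers(n, size=N), minlength=n)   # p_1, ..., p_n
    diag = n * counts / N
    return diag.min(), diag.max()

n = 200
for N in (5 * n, 2 * int(n * np.log(n)), 20 * int(n * np.log(n))):
    # When N is only proportional to n, some index is typically never drawn,
    # so the smallest eigenvalue stays at 0 and the largest overshoots 1.
    print(N, diagonal_range(n, N))
```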
The method will work properly for a class of matrices satisfying a matrix strong regularity assumption, which we denote by (MSR) and which can be viewed as an analogue of the property (SR) defined in [21].
Definition 1.2 (Property (MSR)). Let $B$ be an $n \times n$ positive semi-definite random matrix such that $\mathbb{E}\, B = I_n$. We will say that $B$ satisfies (MSR) if for some $\eta > 0$ we have:
$$\mathbb{P}\left(\|PBP\| > t\right) \leq \frac{c}{t^{1+\eta}} \qquad \forall t > c\,\mathrm{rank}(P) \text{ and for every orthogonal projection } P \text{ of } \mathbb{R}^n.$$
The main result of this paper is the following:
Theorem 1.3. Let $B$ be an $n \times n$ positive semi-definite random matrix verifying $\mathbb{E}\, B = I_n$ and (MSR) for some $\eta > 0$. Then for every $\varepsilon < 1$, taking
$$N = C_1(\eta)\, \frac{n}{\varepsilon^{2 + 2/\eta}},$$
we have
$$\mathbb{E}\left\| \frac{1}{N}\sum_{i=1}^N B_i - I_n \right\| \leq \varepsilon,$$
where $B_1, \ldots, B_N$ are independent copies of $B$.
If $X$ is an isotropic random vector of $\mathbb{R}^n$, put $B = XX^t$; then $\|PBP\| = \|PX\|_2^2$. Therefore if $X$ verifies the property (SR) appearing in [21], then $B$ verifies property (MSR). So applying Theorem 1.3 to $B = XX^t$, we recover the covariance estimation as stated in [21].
In order to apply our result, besides giving some examples, we investigate the notion of log-concave matrices in relation with the definition of log-concave vectors. Moreover, noting some strong concentration inequalities satisfied by these matrices, we are able, using the ideas developed in the proof of the main theorem, to obtain some results holding with high probability rather than only in expectation, as is the case in the main result. This will be discussed in the last section of the paper.
The paper is organized as follows: in section 2, we show how to prove Theorem 1.3 using two other results (Theorem 2.1, Theorem 2.2), which we prove respectively in sections 3 and 4 using two further results (Theorem 3.1, Theorem 4.1), whose proofs are given respectively in sections 5 and 6. In section 7, we give some applications, discuss the notion of log-concave matrices and prove some related results.
2. PROOF OF THEOREM 1.3
We first introduce a regularity assumption on the moments, which we denote by (MWR):
$$\exists\, p > 1 \text{ such that } \mathbb{E}\,\langle Bx, x\rangle^p \leq C_p \qquad \forall x \in S^{n-1}.$$
Note that by a simple integration of tails, (MSR) (with $P$ a rank one projection) implies (MWR) for any $p < 1 + \eta$.
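To spell out this integration of tails (a short verification added here, using only the (MSR) bound with the rank-one projection $P = xx^t$, for which $\|PBP\| = \langle Bx, x\rangle$): for any $p < 1 + \eta$ and any $x \in S^{n-1}$,
$$\mathbb{E}\,\langle Bx, x\rangle^{p} = \int_0^\infty p\,t^{p-1}\,\mathbb{P}\left(\langle Bx, x\rangle > t\right) dt \leq \int_0^c p\,t^{p-1}\,dt + \int_c^\infty p\,t^{p-1}\,\frac{c}{t^{1+\eta}}\,dt = c^p + \frac{p\,c^{p-\eta}}{1+\eta-p} =: C_p,$$
which is finite precisely because $p < 1 + \eta$, and does not depend on $x$.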
The proof of Theorem 1.3 is based on two theorems dealing with the smallest and largest eigenvalues of $\frac{1}{N}\sum_{i=1}^N B_i$.
Theorem 2.1. Let $B_i$ be $n \times n$ independent positive semi-definite random matrices verifying $\mathbb{E}\, B_i = I_n$ and (MWR). Let $\varepsilon < 1$; then for
$$N \geq 16\,(16C_p)^{\frac{1}{p-1}}\, \frac{n}{\varepsilon^{\frac{2p-1}{p-1}}}$$
we get
$$\mathbb{E}\, \lambda_{\min}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \geq 1 - \varepsilon.$$
(Here $C_1(\eta) = (64c)^{1+2/\eta}(1 + 1/\eta)^{2/\eta} \vee 64\,(4c)^{1/\eta}(32 + 32/\eta)^{1+3/\eta} \vee 256\,(2c)^{3/2+2/\eta}(16 + 16/\eta)^{4/\eta}$ denotes the constant appearing in Theorem 1.3.)
Theorem 2.2. Let $B_i$ be $n \times n$ independent positive semi-definite random matrices verifying $\mathbb{E}\, B_i = I_n$ and (MSR). Then for any $N$ we have
$$\mathbb{E}\, \lambda_{\max}\left(\sum_{i=1}^N B_i\right) \leq C(\eta)\,(n + N).$$
Moreover, for $\varepsilon < 1$ and $N \geq C_2(\eta)\, \frac{n}{\varepsilon^{2+2/\eta}}$ we have
$$\mathbb{E}\, \lambda_{\max}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \leq 1 + \varepsilon.$$
We will give the proofs of these two theorems in sections 3 and 4 respectively. We also need a simple lemma:
Lemma 2.3. Let $1 < r \leq 2$ and let $Z_1, \ldots, Z_N$ be independent positive random variables with $\mathbb{E}\, Z_i = 1$ and satisfying $(\mathbb{E}\, Z_i^r)^{1/r} \leq M$. Then
$$\mathbb{E}\left| \frac{1}{N}\sum_{i=1}^N Z_i - 1 \right| \leq \frac{2M}{N^{\frac{r-1}{r}}}.$$
Proof. Let $(\varepsilon_i)_{i\leq N}$ be independent $\pm 1$ Bernoulli variables. By symmetrization and Jensen's inequality we can write
$$\mathbb{E}\left|\frac{1}{N}\sum_{i=1}^N Z_i - 1\right| \leq \frac{2}{N}\, \mathbb{E}\left|\sum_{i=1}^N \varepsilon_i Z_i\right| \leq \frac{2}{N}\, \mathbb{E}\left(\sum_{i=1}^N Z_i^2\right)^{\frac12} \leq \frac{2}{N}\, \mathbb{E}\left(\sum_{i=1}^N Z_i^r\right)^{\frac1r} \leq \frac{2}{N}\left(\sum_{i=1}^N \mathbb{E}\, Z_i^r\right)^{\frac1r} \leq \frac{2M}{N^{\frac{r-1}{r}}}.$$
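As a quick sanity check of Lemma 2.3 (an added illustration; the Pareto-type variables and the parameters below are arbitrary choices, made so that only moments of order below $1.7$ are finite), one can compare the left-hand side with the bound $2M/N^{(r-1)/r}$:

```python
import numpy as np

# Monte Carlo check of Lemma 2.3 with Z_i = W_i / E[W_i], where W_i is a Pareto
# variable with tail index a = 1.7; then E Z_i = 1 and E Z_i^r < infinity for r = 1.5.
rng = np.random.default_rng(2)
a, r, N, trials = 1.7, 1.5, 1000, 2000

W = rng.pareto(a, size=(trials, N)) + 1.0        # classical Pareto(a) on [1, infinity)
Z = W * (a - 1) / a                              # normalized so that E Z_i = 1
lhs = np.abs(Z.mean(axis=1) - 1).mean()          # estimate of E | (1/N) sum_i Z_i - 1 |

M = np.mean(Z ** r) ** (1 / r)                   # Monte Carlo estimate of (E Z_i^r)^{1/r}
print(lhs, "<=", 2 * M / N ** ((r - 1) / r))     # observed error vs. the bound of Lemma 2.3
```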
Proof of Theorem 1.3. Take $N \geq c(\eta)\,\frac{n}{\varepsilon^{2+2/\eta}}$ satisfying the conditions of Theorem 2.1 (with $p = 1 + \frac{\eta}{2}$) and Theorem 2.2. Note that by the triangle inequality
$$\left\|\frac{1}{N}\sum_{i=1}^N B_i - I_n\right\| \leq \left\|\frac{1}{N}\sum_{i=1}^N B_i - \frac{1}{n}\,\mathrm{Tr}\!\left(\frac{1}{N}\sum_{i=1}^N B_i\right) I_n\right\| + \left\|\frac{1}{n}\,\mathrm{Tr}\!\left(\frac{1}{N}\sum_{i=1}^N B_i\right) I_n - I_n\right\| := \alpha + \beta.$$
(Here $C_2(\eta) = 16\,c^{1/\eta}(32 + 32/\eta)^{1+3/\eta} \vee 16\sqrt{2}\,(4c)^{3/2+2/\eta}(8 + 8/\eta)^{4/\eta}$ denotes the constant appearing in Theorem 2.2.)
Observe that $\alpha$ is the largest eigenvalue, in absolute value, of the matrix $\frac{1}{N}\sum_{i=1}^N B_i - \frac{1}{n}\mathrm{Tr}\!\left(\frac{1}{N}\sum_{i=1}^N B_i\right) I_n$, so that
$$\alpha = \max\left[ \lambda_{\max}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) - \frac{1}{n}\mathrm{Tr}\!\left(\frac{1}{N}\sum_{i=1}^N B_i\right),\ \ \frac{1}{n}\mathrm{Tr}\!\left(\frac{1}{N}\sum_{i=1}^N B_i\right) - \lambda_{\min}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \right].$$
Since the two terms in the max are non-negative, one can bound the max by the sum of the two terms. More precisely, we get
$$\alpha \leq \lambda_{\max}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) - \lambda_{\min}\left(\frac{1}{N}\sum_{i=1}^N B_i\right),$$
and by Theorem 2.1 and Theorem 2.2 we deduce that $\mathbb{E}\,\alpha \leq 2\varepsilon$.
Note that
$$\beta = \left|\frac{1}{N}\sum_{i=1}^N \frac{\mathrm{Tr}(B_i)}{n} - 1\right| = \left|\frac{1}{N}\sum_{i=1}^N Z_i - 1\right|,$$
where $Z_i = \frac{\mathrm{Tr}(B_i)}{n}$. Since $B_i$ satisfies (MWR), taking $r = \min(2, 1 + \frac{\eta}{2})$ we have
$$\forall i \leq N, \qquad \left(\mathbb{E}\, Z_i^r\right)^{\frac1r} \leq \frac{1}{n}\sum_{j=1}^n \left(\mathbb{E}\,\langle B_i e_j, e_j\rangle^r\right)^{\frac1r} \leq c(\eta).$$
Therefore the $Z_i$ satisfy the conditions of Lemma 2.3, and we deduce that $\mathbb{E}\,\beta \leq \varepsilon$ by the choice of $N$.
As a conclusion,
$$\mathbb{E}\left\| \frac{1}{N}\sum_{i=1}^N B_i - I_n \right\| \leq \mathbb{E}\,\alpha + \mathbb{E}\,\beta \leq 3\varepsilon.$$
3. PROOF OF THEOREM 2.1
Given $A$ an $n \times n$ positive semi-definite matrix such that all eigenvalues of $A$ are greater than a lower barrier $l_A = l$, i.e. $A \succ l\cdot I_n$, define the corresponding potential function to be
$$\phi_l(A) = \mathrm{Tr}\,(A - l\cdot I_n)^{-1}.$$
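As a purely numerical illustration of the lower potential (an added sketch; the random choice of $A$ is arbitrary), $\phi_l(A)$ is finite for $l$ strictly below the spectrum of $A$ and blows up as the barrier $l$ approaches $\lambda_{\min}(A)$:

```python
import numpy as np

rng = np.random.default_rng(3)

def lower_potential(A, l):
    """phi_l(A) = Tr (A - l*I)^{-1}, defined when the barrier l is below the spectrum of A."""
    n = A.shape[0]
    return np.trace(np.linalg.inv(A - l * np.eye(n)))

n = 30
G = rng.standard_normal((n, n))
A = G @ G.T / n                       # a positive semi-definite matrix
lam_min = np.linalg.eigvalsh(A).min()

for gap in (1.0, 0.1, 0.01):
    # The potential increases as the lower barrier approaches lambda_min(A).
    print(gap, lower_potential(A, lam_min - gap))
```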
The proof of Theorem 2.1 is based on the following result which will be proved in section 5:
Theorem 3.1. Let $A \succ l\cdot I_n$ with $\phi_l(A) \leq \phi$, and let $B$ be a positive semi-definite random matrix satisfying $\mathbb{E}\, B = I_n$ and Property (MWR) with some $p > 1$. Let $\varepsilon < 1$. If
$$\phi \leq \frac{1}{4\,(8C_p)^{\frac{1}{p-1}}}\, \varepsilon^{\frac{p}{p-1}},$$
then there exists a random variable $l'$ such that
$$A + B \succ l'\cdot I_n, \qquad \phi_{l'}(A + B) \leq \phi_l(A) \qquad \text{and} \qquad \mathbb{E}\, l' \geq l + 1 - \varepsilon.$$
Proof of Theorem 2.1. We start with $A_0 = 0$ and $l_0 = -\frac{n}{\phi}$, so that $\phi_{l_0}(A_0) = -\frac{n}{l_0} = \phi$. Applying Theorem 3.1, one can find $l_1$ such that
$$A_1 = A_0 + B_1 \succ l_1\cdot I_n, \qquad \phi_{l_1}(A_1) \leq \phi_{l_0}(A_0) \qquad \text{and} \qquad \mathbb{E}\, l_1 \geq l_0 + 1 - \varepsilon.$$
Now apply Theorem 3.1 once again to find $l_2$ such that
$$A_2 = A_1 + B_2 \succ l_2\cdot I_n, \qquad \phi_{l_2}(A_2) \leq \phi_{l_1}(A_1) \qquad \text{and} \qquad \mathbb{E}\, l_2 \geq l_1 + 1 - \varepsilon \geq l_0 + 2(1 - \varepsilon).$$
After $N$ steps, we get $\mathbb{E}\, \lambda_{\min}(A_N) \geq \mathbb{E}\, l_N \geq l_0 + N(1 - \varepsilon)$. Therefore,
$$\mathbb{E}\, \lambda_{\min}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \geq 1 - \varepsilon - \frac{n}{N\phi}.$$
Taking $N = \frac{n}{\varepsilon\phi}$, we get
$$\mathbb{E}\, \lambda_{\min}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \geq 1 - 2\varepsilon.$$
4. PROOF OF THEOREM 2.2
Given $A$ an $n \times n$ positive semi-definite matrix such that all eigenvalues of $A$ are less than an upper barrier $u_A = u$, i.e. $A \prec u\cdot I_n$, define the corresponding potential function to be
$$\psi_u(A) = \mathrm{Tr}\,(u\cdot I_n - A)^{-1}.$$
The proof of Theorem 2.2 is based on the following result which will be proved in section 6:
Theorem 4.1. Let $A \prec u\cdot I_n$ with $\psi_u(A) \leq \psi$, and let $B$ be a positive semi-definite random matrix satisfying $\mathbb{E}\, B = I_n$ and Property (MSR). Let $\varepsilon < 1$. If
$$\psi \leq C_3(\eta)\,\varepsilon^{1 + 2/\eta},$$
then there exists a random variable $u'$ such that
$$A + B \prec u'\cdot I_n, \qquad \psi_{u'}(A + B) \leq \psi_u(A) \qquad \text{and} \qquad \mathbb{E}\, u' \leq u + 1 + \varepsilon.$$
(Here $C_3(\eta) = \left[8\,(2c)^{1/\eta}(16 + 16/\eta)^{1+3/\eta}\right]^{-1} \wedge \left[16\,(2c)^{3/2+2/\eta}(8 + 8/\eta)^{4/\eta}\right]^{-1}$.)
Proof of Theorem 2.2. We start with $A_0 = 0$ and $u_0 = \frac{n}{\psi}$, so that $\psi_{u_0}(A_0) = \psi$. Applying Theorem 4.1, one can find $u_1$ such that
$$A_1 = A_0 + B_1 \prec u_1\cdot I_n, \qquad \psi_{u_1}(A_1) \leq \psi_{u_0}(A_0) \qquad \text{and} \qquad \mathbb{E}\, u_1 \leq u_0 + 1 + \varepsilon.$$
Now apply Theorem 4.1 once again to find $u_2$ such that
$$A_2 = A_1 + B_2 \prec u_2\cdot I_n, \qquad \psi_{u_2}(A_2) \leq \psi_{u_1}(A_1) \qquad \text{and} \qquad \mathbb{E}\, u_2 \leq u_1 + 1 + \varepsilon.$$
After $N$ steps we get
$$\mathbb{E}\, \lambda_{\max}\left(\sum_{i=1}^N B_i\right) \leq \mathbb{E}\, u_N \leq u_0 + N(1 + \varepsilon).$$
Taking $N \geq \frac{n}{\varepsilon\psi} = c_0(\eta)^{-1}\, \frac{n}{\varepsilon^{2+2/\eta}}$, we deduce that
$$\mathbb{E}\, \lambda_{\max}\left(\frac{1}{N}\sum_{i=1}^N B_i\right) \leq 1 + 2\varepsilon.$$
5. PROOF OF THEOREM 3.1
5.1. Notations. We are looking for a random variable $l'$ of the form $l + \delta$, where $\delta$ is a positive random variable playing the role of the shift. If in addition $A \succ (l + \delta)\cdot I_n$, we will write
$$L_\delta = A - (l + \delta)\cdot I_n \succ 0, \qquad \text{so that} \qquad \mathrm{Tr}\left(B^{\frac12}\left(A - (l+\delta)\cdot I_n\right)^{-1}B^{\frac12}\right) = \left\langle L_\delta^{-1}, B\right\rangle.$$
$\lambda_1, \ldots, \lambda_n$ will denote the eigenvalues of $A$ and $v_1, \ldots, v_n$ the corresponding eigenvectors. $(v_i)_{i\leq n}$ are also the eigenvectors of $L_\delta^{-1}$, corresponding to the eigenvalues $\frac{1}{\lambda_i - (l+\delta)}$.
5.2. Finding the shift. To find sufficient conditions for such a $\delta$ to exist, we need a matrix extension of Lemma 3.4 in [7] which, up to a minor change, is essentially contained in Lemma 20 in [9]; we formulate it here as Lemma 5.2. This method uses the Sherman-Morrison-Woodbury formula:
Lemma 5.1. Let $E$ be an $n \times n$ invertible matrix, $C$ a $k \times k$ invertible matrix, $U$ an $n \times k$ matrix and $V$ a $k \times n$ matrix. Then we have:
$$(E + UCV)^{-1} = E^{-1} - E^{-1}U\left(C^{-1} + VE^{-1}U\right)^{-1}VE^{-1}.$$
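Lemma 5.1 can be checked numerically on random matrices (an added sketch; the dimensions are arbitrary):

```python
import numpy as np

# Numerical check of the Sherman-Morrison-Woodbury identity.
rng = np.random.default_rng(4)
n, k = 6, 2
E = rng.standard_normal((n, n)) + n * np.eye(n)     # well-conditioned invertible matrix
C = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

lhs = np.linalg.inv(E + U @ C @ V)
Einv = np.linalg.inv(E)
rhs = Einv - Einv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Einv @ U) @ V @ Einv
print(np.allclose(lhs, rhs))                        # True up to floating-point error
```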
Lemma 5.2. Let $A$ be as above, satisfying $A \succ l\cdot I_n$. Suppose that one can find $\delta > 0$ verifying $\delta \leq \frac{1}{\|L_0^{-1}\|}$ and
$$\frac{\left\langle L_\delta^{-2}, B\right\rangle}{\phi_{l+\delta}(A) - \phi_l(A)} - \left\| B^{\frac12} L_\delta^{-1} B^{\frac12} \right\| \geq 1.$$
Then
$$\lambda_{\min}(A + B) \geq l + \delta \qquad \text{and} \qquad \phi_{l+\delta}(A + B) \leq \phi_l(A).$$
Proof. First note that $\frac{1}{\|L_0^{-1}\|} = \lambda_{\min}(A) - l$, so the first condition on $\delta$ implies that $\lambda_{\min}(A) \geq l + \delta$. Now using the Sherman-Morrison-Woodbury formula with $E = L_\delta$, $U = V = B^{\frac12}$, $C = I_n$ we get:
$$\phi_{l+\delta}(A + B) = \mathrm{Tr}\,(L_\delta + B)^{-1} = \phi_{l+\delta}(A) - \mathrm{Tr}\left( L_\delta^{-1}B^{\frac12}\left(I_n + B^{\frac12}L_\delta^{-1}B^{\frac12}\right)^{-1}B^{\frac12}L_\delta^{-1} \right) \leq \phi_{l+\delta}(A) - \frac{\left\langle L_\delta^{-2}, B\right\rangle}{1 + \left\| B^{\frac12}L_\delta^{-1}B^{\frac12} \right\|}.$$
Rearranging the hypothesis, we get $\phi_{l+\delta}(A + B) \leq \phi_l(A)$.
Since $\|L_0^{-1}\| \leq \mathrm{Tr}\, L_0^{-1} = \phi_l(A)$ and $\left\| B^{\frac12}L_\delta^{-1}B^{\frac12} \right\| \leq \left\langle L_\delta^{-1}, B\right\rangle$, in order to satisfy the conditions of Lemma 5.2 we may search for $\delta$ satisfying:
$$(1) \qquad \delta \leq \frac{1}{\phi_l(A)} \qquad \text{and} \qquad \frac{\left\langle L_\delta^{-2}, B\right\rangle}{\phi_{l+\delta}(A) - \phi_l(A)} - \left\langle L_\delta^{-1}, B\right\rangle \geq 1.$$
For $t \leq \frac{1}{\phi}$, let us denote:
$$q_1(t, B) = \left\langle L_t^{-1}, B\right\rangle = \mathrm{Tr}\left( B\,(A - (l+t)\cdot I_n)^{-1} \right)$$
and
$$q_2(t, B) = \frac{\left\langle L_t^{-2}, B\right\rangle}{\mathrm{Tr}(L_t^{-2})} = \frac{\mathrm{Tr}\left( B\,(A - (l+t)\cdot I_n)^{-2} \right)}{\mathrm{Tr}\left( (A - (l+t)\cdot I_n)^{-2} \right)}.$$
We have already seen in Lemma 5.2 that if $t \leq \frac{1}{\phi} \leq \frac{1}{\|L_0^{-1}\|}$, then $A \succ (l + t)\cdot I_n$, so the definitions above make sense.
Since we have
$$\phi_{l+\delta}(A) - \phi_l(A) = \mathrm{Tr}\,(A - (l+\delta)\cdot I_n)^{-1} - \mathrm{Tr}\,(A - l\cdot I_n)^{-1} = \delta\, \mathrm{Tr}\left( (A - (l+\delta)\cdot I_n)^{-1}(A - l\cdot I_n)^{-1} \right) \leq \delta\, \mathrm{Tr}\,(A - (l+\delta)\cdot I_n)^{-2},$$
in order to have (1) it will be sufficient to choose $\delta$ satisfying $\delta \leq \frac{1}{\phi}$ and
$$(2) \qquad \frac{1}{\delta}\, q_2(\delta, B) - q_1(\delta, B) \geq 1.$$
$q_1$ and $q_2$ can be expressed as follows:
$$q_1(t, B) = \sum_{i=1}^n \frac{\langle Bv_i, v_i\rangle}{\lambda_i - l - t} \qquad \text{and} \qquad q_2(t, B) = \frac{\sum_i \frac{\langle Bv_i, v_i\rangle}{(\lambda_i - l - t)^2}}{\sum_i (\lambda_i - l - t)^{-2}}.$$
Since $\phi_l(A) = \sum_{i=1}^n (\lambda_i - l)^{-1} \leq \phi$, so that $(\lambda_i - l)\,\phi \geq 1$ for all $i$, we have:
$$(1 - t\phi)(\lambda_i - l) = \lambda_i - l - t(\lambda_i - l)\phi \leq \lambda_i - l - t \leq \lambda_i - l,$$
therefore
$$q_1(t, B) \leq (1 - t\phi)^{-1}\, q_1(0, B) \qquad \text{and} \qquad (1 - t\phi)^{2}\, q_2(0, B) \leq q_2(t, B) \leq (1 - t\phi)^{-2}\, q_2(0, B).$$
Lemma 5.3. Let $s \in (0, 1)$ and take
$$\delta = (1 - s)^3\, q_2(0, B)\, \mathbf{1}_{\{q_1(0,B) \leq s\}}\, \mathbf{1}_{\{q_2(0,B) \leq s/\phi\}}.$$
Then $A + B \succ (l + \delta)\cdot I_n$ and $\phi_{l+\delta}(A + B) \leq \phi_l(A)$.
Proof. As stated before, it is sufficient to check that $\delta \leq \frac{1}{\phi}$ and $\frac{1}{\delta}\, q_2(\delta, B) - q_1(\delta, B) \geq 1$. If $q_1(0, B) > s$ or $q_2(0, B) > \frac{s}{\phi}$, then $\delta = 0$ and there is nothing to prove, since $\phi_l(A + B) \leq \phi_l(A)$. In the other case, i.e. $q_1(0, B) \leq s$ and $q_2(0, B) \leq \frac{s}{\phi}$, we have $\delta = (1 - s)^3 q_2(0, B)$. So $\delta \leq (1 - s)^3 \frac{s}{\phi} \leq \frac{1}{\phi}$ and
$$\frac{1}{\delta}\, q_2(\delta, B) - q_1(\delta, B) = \frac{q_2(\delta, B)}{(1 - s)^3\, q_2(0, B)} - q_1(\delta, B) \geq \frac{(1 - \delta\phi)^2\, q_2(0, B)}{(1 - s)^3\, q_2(0, B)} - (1 - \delta\phi)^{-1}\, q_1(0, B) \geq \frac{(1 - s)^2}{(1 - s)^3} - \frac{s}{1 - s} = 1.$$
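The shift of Lemma 5.3 is explicit, so its conclusion can be verified numerically; in the sketch below (an added illustration) the matrix $A$, the barrier $l$, the parameter $s$ and the choice $B = XX^t$ with $X$ standard Gaussian are all arbitrary, and $\phi$ is simply taken equal to $\phi_l(A)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, s = 20, 0.5

G = rng.standard_normal((n, n))
A = G @ G.T / n + np.eye(n)                 # a positive semi-definite matrix
l = np.linalg.eigvalsh(A).min() - 5 * n     # lower barrier far below the spectrum
L0inv = np.linalg.inv(A - l * np.eye(n))
phi = np.trace(L0inv)                       # here we take phi = phi_l(A)

X = rng.standard_normal(n)
B = np.outer(X, X)                          # one sample of B = X X^t, so E B = I_n

q1 = np.trace(B @ L0inv)                                    # q_1(0, B)
q2 = np.trace(B @ L0inv @ L0inv) / np.trace(L0inv @ L0inv)  # q_2(0, B)
delta = (1 - s) ** 3 * q2 if (q1 <= s and q2 <= s / phi) else 0.0

# Conclusion of Lemma 5.3: the lower barrier can be shifted from l to l + delta.
new_min = np.linalg.eigvalsh(A + B).min()
new_pot = np.trace(np.linalg.inv(A + B - (l + delta) * np.eye(n)))
print(delta, new_min >= l + delta, new_pot <= phi + 1e-9)
```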
5.3. Estimating the random shift. Now that we have found $\delta$, we will estimate $\mathbb{E}\,\delta$ using the property (MWR). We start by stating some basic facts about $q_1$ and $q_2$.

Proposition 5.4. Let, as above, $A \succ l\cdot I_n$ with $\phi_l(A) \leq \phi$, and let $B$ satisfy (MWR). Then we have the following:
(1) $\mathbb{E}\, q_1(0, B) = \phi_l(A) \leq \phi$ and $\mathbb{E}\, q_1(0, B)^p \leq C_p\,\phi^p$.
(2) $\mathbb{E}\, q_2(0, B) = 1$ and $\mathbb{E}\, q_2(0, B)^p \leq C_p$.
(3) $\mathbb{P}\left(q_1(0, B) > u\right) \leq C_p\left(\frac{\phi}{u}\right)^p$ and $\mathbb{P}\left(q_2(0, B) > u\right) \leq \frac{C_p}{u^p}$.
Proof. Since $\mathbb{E}\, B = I_n$, we have $\mathbb{E}\, q_1(0, B) = \phi_l(A)$ and $\mathbb{E}\, q_2(0, B) = 1$. Now using the triangle inequality and Property (MWR) we get:
$$\left(\mathbb{E}\, q_1(0, B)^p\right)^{\frac1p} = \left[\mathbb{E}\left(\sum_{i=1}^n \frac{\langle Bv_i, v_i\rangle}{\lambda_i - l}\right)^{\!p}\right]^{\frac1p} \leq \sum_{i=1}^n \frac{\left(\mathbb{E}\,\langle Bv_i, v_i\rangle^p\right)^{\frac1p}}{\lambda_i - l} \leq \sum_{i=1}^n \frac{C_p^{\frac1p}}{\lambda_i - l} \leq C_p^{\frac1p}\,\phi.$$
With the same argument we prove that $\mathbb{E}\, q_2(0, B)^p \leq C_p$. The third part of the proposition follows by Markov's inequality.
Lemma 5.5. If $\delta$ is as in Lemma 5.3, then
$$\mathbb{E}\,\delta \geq (1 - s)^3\left(1 - 2C_p\left(\frac{\phi}{s}\right)^{p-1}\right).$$

Proof. Using the above proposition and Hölder's inequality with $\frac1p + \frac1q = 1$ we get:
$$\mathbb{E}\,\delta = \mathbb{E}\,(1 - s)^3\, q_2(0, B)\,\mathbf{1}_{\{q_1(0,B)\leq s\}}\,\mathbf{1}_{\{q_2(0,B)\leq s/\phi\}} = (1 - s)^3\left[\mathbb{E}\, q_2(0, B) - \mathbb{E}\, q_2(0, B)\,\mathbf{1}_{\{q_1(0,B)> s \ \text{or}\ q_2(0,B)> s/\phi\}}\right]$$
$$\geq (1 - s)^3\left(1 - \left(\mathbb{E}\, q_2(0, B)^p\right)^{\frac1p}\, \mathbb{P}\left(q_1(0, B) > s \ \text{or} \ q_2(0, B) > \frac{s}{\phi}\right)^{\frac1q}\right) \geq (1 - s)^3\left(1 - C_p^{\frac1p}\left(C_p\left(\frac{\phi}{s}\right)^p + C_p\left(\frac{\phi}{s}\right)^p\right)^{\frac1q}\right) \geq (1 - s)^3\left(1 - 2C_p\left(\frac{\phi}{s}\right)^{p-1}\right).$$
It now remains to make a good choice of $s$ and $\phi$ in order to finish the proof of Theorem 3.1. Take $l' = l + \delta$, the choice of $\delta$ being as before with $s = \frac{\varepsilon}{4}$. As we have seen, we get $A + B \succ l'\cdot I_n$ and $\phi_{l'}(A + B) \leq \phi_l(A)$. Moreover,
$$\mathbb{E}\, l' = l + \mathbb{E}\,\delta \geq l + (1 - s)^3\left(1 - 2C_p\left(\frac{\phi}{s}\right)^{p-1}\right) \geq l + 1 - \varepsilon,$$
by the choice of $\phi$. This ends the proof of Theorem 3.1.
6. PROOF OF THEOREM 4.1
6.1. Notations. We are looking for a random variable $u'$ of the form $u + \Delta$, where $\Delta$ is a positive random variable playing the role of the shift. We will write $U_t = (u+t)\cdot I_n - A$, so that
$$\mathrm{Tr}\left(B^{\frac12}\left((u+t)\cdot I_n - A\right)^{-1}B^{\frac12}\right) = \left\langle U_t^{-1}, B\right\rangle.$$
As before, $\lambda_1, \ldots, \lambda_n$ will denote the eigenvalues of $A$ and $v_1, \ldots, v_n$ the corresponding eigenvectors. $(v_i)_{i\leq n}$ are also the eigenvectors of $U_t^{-1}$, corresponding to the eigenvalues $\frac{1}{u + t - \lambda_i}$.
6.2. Finding the shift. To find sufficient conditions for such a $\Delta$ to exist, we need a matrix extension of Lemma 3.3 in [7] which, up to a minor change, is essentially contained in Lemma 19 in [9]. For the sake of completeness, we include the proof.
Lemma 6.1. Let $A$ be as above, satisfying $A \prec u\cdot I_n$. Suppose that one can find $\Delta > 0$ verifying
$$(3) \qquad \frac{\left\langle U_\Delta^{-2}, B\right\rangle}{\psi_u(A) - \psi_{u+\Delta}(A)} + \left\| B^{\frac12} U_\Delta^{-1} B^{\frac12} \right\| \leq 1.$$
Then
$$A + B \prec (u + \Delta)\cdot I_n \qquad \text{and} \qquad \psi_{u+\Delta}(A + B) \leq \psi_u(A).$$
Proof. Since $\left\langle U_\Delta^{-2}, B\right\rangle$ and $\psi_u(A) - \psi_{u+\Delta}(A)$ are positive, by (3) we have $\left\| B^{\frac12}U_\Delta^{-1}B^{\frac12} \right\| < 1$ and
$$\frac{\left\langle U_\Delta^{-2}, B\right\rangle}{1 - \left\| B^{\frac12}U_\Delta^{-1}B^{\frac12} \right\|} \leq \psi_u(A) - \psi_{u+\Delta}(A).$$
First note that $\left\| B^{\frac12}U_\Delta^{-1}B^{\frac12} \right\| = \left\| U_\Delta^{-\frac12} B\, U_\Delta^{-\frac12} \right\| < 1$, so $U_\Delta^{-\frac12} B\, U_\Delta^{-\frac12} \prec I_n$. Therefore we get $B \prec U_\Delta$, which means that $A + B \prec (u + \Delta)\cdot I_n$.
Now using the Sherman-Morrison-Woodbury formula (see Lemma 5.1) with $E = U_\Delta$, $U = V = B^{\frac12}$, $C = -I_n$ we get:
$$\psi_{u+\Delta}(A + B) = \mathrm{Tr}\,(U_\Delta - B)^{-1} = \psi_{u+\Delta}(A) + \mathrm{Tr}\left( U_\Delta^{-1}B^{\frac12}\left(I_n - B^{\frac12}U_\Delta^{-1}B^{\frac12}\right)^{-1}B^{\frac12}U_\Delta^{-1} \right) \leq \psi_{u+\Delta}(A) + \frac{\left\langle U_\Delta^{-2}, B\right\rangle}{1 - \left\| U_\Delta^{-\frac12} B\, U_\Delta^{-\frac12} \right\|} \leq \psi_u(A).$$
We may now find $\Delta$ satisfying (3). Let us denote:
$$Q_1(t, B) = \left\| B^{\frac12} U_t^{-1} B^{\frac12} \right\| = \left\| B^{\frac12} \left((u+t)\cdot I_n - A\right)^{-1} B^{\frac12} \right\|$$
and
$$Q_2(t, B) = \frac{\left\langle U_t^{-2}, B\right\rangle}{\psi_u(A) - \psi_{u+t}(A)} = \frac{\mathrm{Tr}\left( B \left((u+t)\cdot I_n - A\right)^{-2} \right)}{\psi_u(A) - \psi_{u+t}(A)}.$$
Since $Q_1$ and $Q_2$ are both decreasing in $t$, we work with each separately. Precisely, fix $\theta \in (0, 1)$ and define $\Delta_1$, $\Delta_2$ as follows:
$\Delta_1$ is the smallest positive number such that $Q_1(\Delta_1, B) \leq \theta$, and $\Delta_2$ is the smallest positive number such that $Q_2(\Delta_2, B) \leq 1 - \theta$.
Now take $\Delta = \Delta_1 + \Delta_2$; then $Q_1(\Delta, B) + Q_2(\Delta, B) \leq \theta + 1 - \theta = 1$. So this choice of $\Delta$ satisfies (3), and it remains now to estimate $\Delta_1$ and $\Delta_2$ separately.
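Since $Q_1$ and $Q_2$ are explicit and decreasing in $t$, the two shifts can also be computed numerically by bisection; the sketch below (an added illustration, with arbitrary choices of $A$, $u$, $\theta$ and $B = XX^t$) finds $\Delta_1$ and $\Delta_2$ and checks condition (3) as well as the conclusion of Lemma 6.1 at $\Delta = \Delta_1 + \Delta_2$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, theta = 20, 0.3
I = np.eye(n)

G = rng.standard_normal((n, n))
A = G @ G.T / n
u = np.linalg.eigvalsh(A).max() + 1.0        # upper barrier u > lambda_max(A)
X = rng.standard_normal(n)                   # one sample of B = X X^t
psi_u = np.trace(np.linalg.inv(u * I - A))   # psi_u(A)

def Uinv(t, power=1):
    return np.linalg.matrix_power(np.linalg.inv((u + t) * I - A), power)

def Q1(t):
    # For B = X X^t, ||B^{1/2} U_t^{-1} B^{1/2}|| = <U_t^{-1} X, X>.
    return X @ Uinv(t) @ X

def Q2(t):
    return (X @ Uinv(t, 2) @ X) / (psi_u - np.trace(Uinv(t)))

def smallest_shift(Q, level):
    """Smallest t > 0 with Q(t) <= level, for Q decreasing in t (bisection)."""
    lo, hi = 0.0, 1.0
    while Q(hi) > level:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if Q(mid) <= level else (mid, hi)
    return hi

D1 = smallest_shift(Q1, theta)
D2 = smallest_shift(Q2, 1 - theta)
D = D1 + D2
ok3 = Q1(D) + Q2(D) <= 1 + 1e-8              # condition (3) of Lemma 6.1
new_max = np.linalg.eigvalsh(A + np.outer(X, X)).max()
new_pot = np.trace(np.linalg.inv((u + D) * I - A - np.outer(X, X)))
print(D1, D2, ok3, new_max < u + D, new_pot <= psi_u + 1e-8)
```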
6.3. Estimating $\Delta_1$. We may write
$$Q_1(\Delta_1, B) = \left\| \sum_{i=1}^n \frac{B^{\frac12} v_i v_i^t B^{\frac12}}{u + \Delta_1 - \lambda_i} \right\|.$$
Put $\xi_i = B^{\frac12} v_i v_i^t B^{\frac12}$, $\mu_i = \psi(u - \lambda_i)$ and $\mu = \psi\Delta_1$. Denote by $P_S$ the orthogonal projection onto $\mathrm{span}(v_i)_{i\in S}$; clearly $\mathrm{rank}(P_S) = |S|$. Then we have:
$$\mathbb{E}\,\|\xi_i\| = 1,$$
$$\mathbb{P}\left(\left\| \sum_{i\in S} \xi_i \right\| > t\right) = \mathbb{P}\left(\|P_S B P_S\| > t\right) \leq \frac{c}{t^{1+\eta}} \qquad \forall t > c|S|,$$
$$\sum_{i=1}^n \frac{1}{\mu_i} = \frac{\psi_u(A)}{\psi} \leq 1,$$
and $\mu$ is the smallest positive number such that
$$\sum_{i=1}^n \frac{\xi_i}{\mu_i + \mu} \preceq \frac{\theta}{\psi}\,\mathrm{Id}.$$
We will need an analogue of Lemma 3.5 appearing in [21], which we extend to the matrix setting:

Lemma 6.2. Suppose $\{\xi_i\}_{i\leq n}$ are symmetric positive semi-definite random matrices with $\mathbb{E}\,\|\xi_i\| = 1$ and
$$\mathbb{P}\left(\left\| \sum_{i\in S} \xi_i \right\| > t\right) \leq \frac{c}{t^{1+\eta}} \qquad \text{provided } t > c|S| = c\sum_{i\in S} \mathbb{E}\,\|\xi_i\|,$$
for all subsets $S \subset [n]$ and some constants $c, \eta > 0$. Consider positive numbers $\mu_i$ such that $\sum_{i=1}^n \frac{1}{\mu_i} \leq 1$. Let $\mu$ be the minimal positive number such that
$$\sum_{i=1}^n \frac{\xi_i}{\mu_i + \mu} \preceq K\cdot \mathrm{Id},$$
for some $K > C = 4c$. Then $\mathbb{E}\,\mu \leq \frac{c(\eta)}{K^{1+\eta}}$.
We do not reproduce the proof here as it is a direct adaptation of the argument in [21] to the matrix setting. Applying Lemma 6.2 we get $\mathbb{E}\,\mu \leq c(\eta)\left(\frac{\psi}{\theta}\right)^{1+\eta}$, so that
$$(4) \qquad \mathbb{E}\,\Delta_1 \leq c(\eta)\, \frac{\psi^\eta}{\theta^{1+\eta}}.$$

6.4. Estimating $\Delta_2$. Suppose $\theta \leq \frac12$. Since
$$\psi_u(A) - \psi_{u+t}(A) = t\,\mathrm{Tr}\left( (u\cdot I_n - A)^{-1}\left((u+t)\cdot I_n - A\right)^{-1} \right),$$
we can write
$$(5) \qquad Q_2(t, B) = \frac{\sum_i \frac{\langle Bv_i, v_i\rangle}{(u+t-\lambda_i)^2}}{t\sum_i (u+t-\lambda_i)^{-1}(u-\lambda_i)^{-1}} \leq \frac{1}{t}\, \frac{\sum_i \frac{\langle Bv_i, v_i\rangle}{(u+t-\lambda_i)(u-\lambda_i)}}{\sum_i (u+t-\lambda_i)^{-1}(u-\lambda_i)^{-1}} := \frac{1}{t}\, P_2(t, B).$$
First note that $P_2(t, B)$ can be written as $\sum_i \alpha_i(t)\,\langle Bv_i, v_i\rangle$ with $\sum_i \alpha_i = 1$. Having this in mind, one can easily check that $\mathbb{E}\, P_2(t, B) = 1$ and
$$(6) \qquad \mathbb{E}\, P_2(t, B)^{1 + \frac{3\eta}{4}} \leq c(\eta),$$
where for the last inequality we used the fact that $B$ satisfies (MWR) with $p = 1 + \frac{3\eta}{4}$. In order to estimate $\Delta_2$, we will divide it into two parts as follows:
$$\Delta_2 = \Delta_2\, \mathbf{1}_{\{P_2(0,B) \leq 4\psi/\theta\}} + \Delta_2\, \mathbf{1}_{\{P_2(0,B) > 4\psi/\theta\}}$$