Technical appendix to ``Adaptive estimation of covariance matrices via Cholesky decomposition''


HAL Id: hal-00524623

https://hal.archives-ouvertes.fr/hal-00524623

Preprint submitted on 8 Oct 2010


Technical appendix to “Adaptive estimation of covariance matrices via Cholesky decomposition”

Nicolas Verzelen

To cite this version:

Nicolas Verzelen. Technical appendix to “Adaptive estimation of covariance matrices via Cholesky decomposition”. 2010. ⟨hal-00524623⟩


ISSN: 1935-7524

Technical appendix to “Adaptive estimation of covariance matrices via Cholesky decomposition”

Nicolas Verzelen

INRA, UMR 729 MISTEA, F-34060 Montpellier, France
SUPAGRO, UMR 729 MISTEA, F-34060 Montpellier, France
e-mail: [email protected]

Abstract: This is a technical appendix to “Adaptive estimation of covariance matrices via Cholesky decomposition” (arXiv:1010.1445).

AMS 2000 subject classifications: Primary 62H12; secondary 62F35, 62J05.

Keywords and phrases: Covariance matrix, banding, Cholesky decomposition, directed graphical models, penalized criterion, minimax rate of estimation.

For the sake of clarity, references to [7] are emphasized in bold. For instance, Eq. (12) in bold refers to Eq. (12) in [7], while Eq. (12) in regular type stands for Eq. (12) in this paper.

Contents

1 Additional simulations
2 Proof of the risk upper bounds
    2.1 Proof of Lemma 10.4
    2.2 Proof of Lemma 10.5
    2.3 Proof of Lemma 10.6
    2.4 Proof of Lemma 10.7
    2.5 Proof of Lemma 10.8
    2.6 Proof of Lemma 10.9
    2.7 Proof of Corollary 6.1
    2.8 Proof of Proposition 4.5
3 Proofs of the minimax bounds
    3.1 Main lemma
    3.2 Adaptive banding
        3.2.1 Proof of Proposition 5.3
        3.2.2 Proof of Proposition 3.5
    3.3 Complete graph selection (proof of Proposition 6.2)
        3.3.1 First case: minimax rate over U1(k, p)
        3.3.2 Second case: minimax rate over U2(k, p)
        3.3.3 Technical lemmas
4 Proof of the Frobenius bounds
    4.1 Proof of Corollary 5.4
    4.2 Proof of Corollary 6.3
5 Proof of Lemma 10.2
6 Proof of Proposition 7.2
References

Research mostly carried out at Univ Paris-Sud (Laboratoire de Mathématiques, CNRS-UMR 8628)

1. Additional simulations

We provide here additional simulation results for the complete graph selection problem. We use the same notation and compute the same estimators as in Section 8.2.

In the third simulation scheme, we consider the case where the "good" ordering is completely unknown. We first sample a precision matrix Ωc1 according to the first simulation scheme. Then, we sample a permutation of {1, . . . , p} uniformly at random and reorder the variables according to this permutation to get the precision matrix Ωc3. Its Cholesky factor is generally far less sparse than the Cholesky factor of Ωc1, while Ωc1 is as sparse as Ωc3. The purpose of this scheme is to illustrate the limits of procedures based on the Cholesky factor when no suitable ordering is known. As in the second scheme, we only perform the simulations for p = 200, Esp = 1, 3, 5, and n = 100.

Method      Ledoit      GLasso      Lasso       ChoSelectf

Kullback discrepancy K(Ω; Ω̂)
Esp=1       20.1±0.2    9.1±0.2     8.2±0.1     7.6±0.1
Esp=3       42.8±1.8    22.0±0.2    24.0±0.4    25.3±0.5
Esp=5       52.6±1.0    35.8±0.3    42.7±0.4    49.1±0.5

Operator distance ‖Ω̂ − Ω‖
Esp=1       6.3±0.1     5.6±0.1     4.7±0.1     4.5±0.2
Esp=3       10.0±0.1    10.1±0.1    8.8±0.2     8.0±0.2
Esp=5       14.3±0.2    15.3±0.2    13.0±0.2    12.6±0.2

Operator distance ‖Ω̂⁻¹ − Σ‖
Esp=1       2.7±0.1     1.7±0.1     2.1±0.1     1.7±0.1
Esp=3       8.4±0.5     6.4±0.3     11.5±0.8    10.8±0.8
Esp=5       16.3±0.8    14.7±0.8    26.5±1.6    25.0±1.4

Table 1: Comparison between the procedures for the third covariance model Ωc3 with p = 200.

We observe different results for the Kullback risk depending on the sparsity. When Esp=1, ChoSelectf still performs better than the other methods. However, the GLasso provides better results than the other methods for Esp=3 and Esp=5. This is not really surprising since the GLasso has been introduced to handle the "unordered" situation. The case Esp=5 suggests that the Lasso procedure is more robust to a "bad" ordering than ChoSelectf. ChoSelectf still performs better than the other procedures in terms of the operator distance between precision matrices. Nevertheless, the differences in performance are less pronounced than in the previous schemes. Finally, the GLasso and Ledoit and Wolf's method exhibit a smaller operator distance between covariance matrices than the Lasso and ChoSelectf.
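The reordering step of the third simulation scheme can be sketched as follows. This is only an illustration on a toy tridiagonal matrix, not the sampling procedure of Section 8.2: the names `reorder`, `count_nonzeros`, `omega_c1` and `omega_c3` are hypothetical, and the matrix size and seed are arbitrary. It shows that a random relabelling of the variables leaves the number of non-zero entries of the precision matrix unchanged, which is the sense in which Ωc1 is as sparse as Ωc3.

```python
import random

def reorder(matrix, perm):
    """Return the matrix whose (a, b) entry is matrix[perm[a]][perm[b]]."""
    p = len(matrix)
    return [[matrix[perm[a]][perm[b]] for b in range(p)] for a in range(p)]

def count_nonzeros(matrix):
    """Number of non-zero entries of a square matrix given as a list of rows."""
    return sum(1 for row in matrix for value in row if value != 0)

# Toy precision matrix (assumption): tridiagonal, so its Cholesky factor is banded.
p = 6
omega_c1 = [[2.0 if a == b else (-1.0 if abs(a - b) == 1 else 0.0)
             for b in range(p)] for a in range(p)]

# Sample a permutation of {0, ..., p-1} uniformly at random and relabel the variables.
rng = random.Random(0)
perm = list(range(p))
rng.shuffle(perm)
omega_c3 = reorder(omega_c1, perm)
```

The permuted matrix is still symmetric with the same diagonal entries, but its non-zero pattern is in general no longer banded, which is what penalizes procedures relying on a fixed ordering.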

2. Proof of the risk upper bounds

Lemma 2.1. Let $V$ be a $\chi^2$ random variable with $N > 2$ degrees of freedom and let $k$ be some positive integer such that $N > 2k$. Then

\[
\mathbb{E}\left[\frac{1}{V^k}\right] = \frac{1}{(N-2)\cdots(N-2k)} \quad\text{and}\quad \mathbb{E}\left[V^k\right] = N(N+2)\cdots(N+2(k-1)) .
\]

We refer to Lemma 5 in [1] for the proof of a slightly more general version of this lemma.
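The two product formulas of Lemma 2.1 can be checked numerically against the standard Gamma-function expression of the moments of a χ² variable, E[V^a] = 2^a Γ(N/2 + a)/Γ(N/2) for a > −N/2. The sketch below is only a sanity check; the function names are hypothetical.

```python
import math

def chi2_moment(N, a):
    """E[V^a] for V ~ chi-square with N degrees of freedom (requires a > -N/2)."""
    return math.exp(a * math.log(2.0) + math.lgamma(N / 2 + a) - math.lgamma(N / 2))

def lemma_inverse_moment(N, k):
    """Right-hand side of Lemma 2.1 for E[1 / V^k]: 1 / ((N-2)...(N-2k))."""
    prod = 1.0
    for j in range(1, k + 1):
        prod *= N - 2 * j
    return 1.0 / prod

def lemma_moment(N, k):
    """Right-hand side of Lemma 2.1 for E[V^k]: N (N+2) ... (N+2(k-1))."""
    prod = 1.0
    for j in range(k):
        prod *= N + 2 * j
    return prod
```

For instance, with N = 11 and k = 3, both expressions of the inverse moment evaluate to 1/(9 · 7 · 5) = 1/315.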


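2.1. Proof of Lemma 10.4

Using expression (31) of $K(t,s;\widehat{t}_m,\widehat{s}_m)$, we derive

\[
2(1-\kappa_0)K(t,s;\widetilde{t},\widetilde{s}) = 2K(t,s;\widehat{t}_m,\widehat{s}_m) + (1-\kappa_0)\log\frac{\widetilde{s}}{\widehat{s}_m} + (1-\kappa_0)\frac{s+l(\widetilde{t},t)}{\widetilde{s}} - \frac{s+l(\widehat{t}_m,t)}{\widehat{s}_m} + \kappa_0 + \kappa_0\log\frac{s}{\widehat{s}_m} .
\]

By definition of $\widehat{m}$, $\log(\widetilde{s}/\widehat{s}_m) \le pen(m) - pen(\widehat{m})$. Hence,

\begin{align*}
2(1-\kappa_0)K(t,s;\widetilde{t},\widetilde{s}) \le{}& 2K(t,s;\widehat{t}_m,\widehat{s}_m) + (1-\kappa_0)\left[pen(m)-pen(\widehat{m})\right] \\
&+ \kappa_0\frac{s}{\widehat{s}_m} + \kappa_0\left[-\frac{s}{\widehat{s}_m} + 1 + \log\frac{s}{\widehat{s}_m}\right] - \frac{s+l(\widehat{t}_m,t)-\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2}{\widehat{s}_m} \\
&+ \frac{l(\widetilde{t},t)(1-\kappa_0)+s(1-\kappa_0)-\|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{\widetilde{s}} ,
\end{align*}

since $\widehat{s}_m = \|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2$ and $\widetilde{s} = \|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2$. As the function $x-\log x-1$ is non-negative, the term $[-s/\widehat{s}_m + 1 + \log(s/\widehat{s}_m)]$ is non-positive. Since $X_{<i}t_m$ is the best predictor of $X_i$ given $X_m$, it follows that $l(\widehat{t}_m,t) = l(\widehat{t}_m,t_m) + l(t_m,t)$. Hence,

\[
\kappa_0\frac{s}{\widehat{s}_m} - \frac{s+l(\widehat{t}_m,t)-\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2}{\widehat{s}_m} \le -(1-\kappa_0)\frac{s}{\widehat{s}_m} + \frac{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2 - l(t_m,t)}{\widehat{s}_m} .
\]

In the proof of Lemma 7.5 in [8], we state that $l(\widehat{t}_m,t_m) \le \varphi_{\max}\left[n(Z_m^*Z_m)^{-1}\right]\|\Pi_m(\epsilon+\epsilon_m)\|_n^2$. This yields

\[
(1-\kappa_0)\frac{l(\widetilde{t},t)+s}{\widetilde{s}} \le \frac{(1-\kappa_0)\left[s+l(t_{\widehat{m}},t)\right] + \kappa_2\varphi_{\max}\left[n(Z_{\widehat{m}}^*Z_{\widehat{m}})^{-1}\right]\|\Pi_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{\widetilde{s}} + (1-\kappa_0)(1-\kappa_2)\frac{l(\widetilde{t},t_{\widehat{m}})}{\widetilde{s}} .
\]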

Let us gather all these bounds 2(1−κ0)K

t, s;et,se

≤ 2K

t, s;btm,bsm

+ (1−κ0) [pen(m)−pen(m)]b + (1−κ0)l(tmb, t) + (1−κ2)l(et, tmb) +κ2ϕmax

n(ZmbZmb)1

mb(ǫ+ǫmb)k2n

e s

− kΠmb(ǫ+ǫmb)k2n

e

s +s(1−κ0) 1

e s − 1

b sm

+kΠm(ǫ+ǫm)k2n−l(tm, t) b

sm

≤ 2K

t, s;btm,bsm

+ (1−κ0) [pen(m)−pen(m)]b + (1−κ0)l(tmb, t) + (1−κ2)l(et, tmb) +κ2ϕmax

n(ZmbZmb)1

mb(ǫ+ǫmb)k2n

e s + kǫk2n−s(1−κ0) 1

b sm −1

e s

+kΠmbǫk2n

e

s + 2hΠmbǫ,Πmbǫmbin e s

− kΠmbǫmbk2n

e

s + 2hΠmǫ,Πmǫmin b sm

+kΠmǫmk2n−l(tm, t) b

sm

.


We then use Condition (39) onpen(m) and we apply the inequalityb 2hΠmbǫ,Πmbǫmbin

e

s ≤κ1

l(tmb, t) e

s +κ11s e s

mbǫ,Πmbǫmbi2n

sl(tmb, t) . Hence, we conclude that

2(1−κ0)K

t, s;et,es

≤ 2K

t, s;btm,bsm

+ (1−κ0)pen(m) + l(et, t)

e

s [R1(m)b ∨(1−κ2)(1−κ0)] +R2(m) +s e

sR3(m) +b R4(m,m)b . 2.2. Proof of Lemma 10.5

This proof follows the same sketch as the proof of Lemma 7.10 in [8]. The main difference lies in the fact that κ0 is zero in [8]. Let x be a positive number that we shall fix later. For any k > 0, let us define

δk :=

2k+ exp(−k/16).

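We shall first control the deviations of the random variables involved in $R_1(\widehat{m})$. Applying deviation inequalities for $\chi^2$ random variables and for the largest eigenvalues of standard Wishart matrices (see e.g. Lemmas 7.2, 7.3, and 7.4 in [8]) to all models $m\in\mathcal{M}$ ensures that there exists an event $B_2$ such that $\mathbb{P}(B_2^c)\le 4n\exp(-nx)$ and under $B_2$ it holds that

\begin{align*}
\frac{\|\Pi^{\perp}_{\widehat{m}}\epsilon_{\widehat{m}}\|_n^2}{l(t_{\widehat{m}},t)} &\ge \frac{n-|\widehat{m}|}{n}\left[\left(1-\delta_{n-|\widehat{m}|}-\sqrt{\frac{2|\widehat{m}|H(|\widehat{m}|)}{n-|\widehat{m}|}}-\sqrt{\frac{2xn}{n-|\widehat{m}|}}\right)\vee 0\right]^2 , \\
\frac{\|\Pi_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{s+l(t_{\widehat{m}},t)} &\le \frac{2|\widehat{m}|}{n}\left[1+\sqrt{H(|\widehat{m}|)}+H(|\widehat{m}|)\right] + 3x , \tag{1} \\
\frac{\|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{s+l(t_{\widehat{m}},t)} &\ge \frac{n-|\widehat{m}|}{n}\left[\left(1-\delta_{n-|\widehat{m}|}-\sqrt{\frac{2|\widehat{m}|H(|\widehat{m}|)}{n-|\widehat{m}|}}-\sqrt{\frac{2xn}{n-|\widehat{m}|}}\right)\vee 0\right]^2 , \\
\varphi_{\max}\left[n(Z_{\widehat{m}}^*Z_{\widehat{m}})^{-1}\right] &\le \left[\left(1-\left(1+\sqrt{2H(|\widehat{m}|)}\right)\sqrt{\frac{|\widehat{m}|}{n}}-\sqrt{2x}\right)\vee 0\right]^{-2} .
\end{align*}

By Assumption $(\mathbb{H}_{K,\eta})$, the expression $(1+\sqrt{2H(|\widehat{m}|)})\sqrt{|\widehat{m}|/n}$ is bounded by $\sqrt{\eta}$. Moreover, $(\mathbb{H}_{K,\eta})$ also ensures that $|\widehat{m}|\le n/2$. Hence $\delta_{n-|\widehat{m}|}\le\delta_{n/2}\le\nu(K)$ for $n$ larger than some quantity $n_0(K)$. Since $\nu(K)\le 1-\sqrt{\eta}$, we derive that

\begin{align*}
\frac{\|\Pi^{\perp}_{\widehat{m}}\epsilon_{\widehat{m}}\|_n^2}{l(t_{\widehat{m}},t)} &\ge \left(1-\frac{|\widehat{m}|}{n}\right)\left[1-\nu(K)-\sqrt{\eta}\right]^2 - 2\sqrt{2x} , \tag{2} \\
\frac{\|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{s+l(t_{\widehat{m}},t)} &\ge \left(1-\frac{|\widehat{m}|}{n}\right)\left[1-\nu(K)-\sqrt{\eta}\right]^2 - 2\sqrt{2x} , \tag{3} \\
\varphi_{\max}\left[n(Z_{\widehat{m}}^*Z_{\widehat{m}})^{-1}\right] &\le \left[\left(1-\sqrt{\eta}-\sqrt{2x}\right)\vee 0\right]^{-2} .
\end{align*}

Constraining $x$ to be smaller than $(1-\sqrt{\eta})^2/8$ ensures that

\[
\kappa_2\,\varphi_{\max}\left[n(Z_{\widehat{m}}^*Z_{\widehat{m}})^{-1}\right]\mathbf{1}_{B_1} \le \frac{(K-1)\left(1-\sqrt{\eta}-\nu(K)\right)^2}{4} . \tag{4}
\]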


Gathering the definition ofR1(.), the inequalities (1), (2), (3), and (4), we get R1(m)b ≤ κ1+ 1−[1−√η−ν(K)]2+|mb|

n (1−√η−ν(K))2U1+√

xU2+xU3 , whereU1,U2, andU3are respectively defined by







U1 := −(1−κ0)Kh 1 +p

2H(|mb|)i2

+ 1 + (1−κ0)(K−1)/2h 1 +p

H(|mb|)i2

≤0 U2 := 2√

2 [1 +Kη]

U3 := 34(K−1)

1−√η−ν(K)2 .

SinceU1 is non-positive, we obtain an upper bound ofR1(m) that does not depend anymore onb the model m. By assumption (Hb iK,η), we know that η <(1−ν(K)−(3/(K+ 2))1/6)2. Hence, coming back to the definition ofκ1allows to prove thatκ1is strictly smaller than [1−√η−ν(K)]2. Setting

x:=

"

1−√η−ν(K)2

−κ1

4U2

#2

1−√η−ν(K)2

−κ1

4U3 ∧ 1−√η2

8 ,

we get

R1(m)b ≤1−1 2

h(1−√η−ν(K))2−κ1

i<1 , under the eventB2.

This is enough to prove Lemma 10.5 if we take $B_1 = B_2$. In fact, we shall define an event $B_1$ slightly more restrictive in order to simplify the proof of Lemma 10.6. Let $B_3$ be the event defined by

\[
\|\epsilon\|_n^2/s \le \kappa_1^{-1} . \tag{5}
\]

Since $\kappa_1^{-1}$ is strictly larger than one and since $\kappa_1$ only depends on $K$ and $\eta$, it follows that $\mathbb{P}(B_3^c)\le\exp(-nL_{K,\eta})$ with $L_{K,\eta}>0$. Finally, we take $B_1 := B_2\cap B_3$.

2.3. Proof of Lemma 10.6

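The sketch of this proof is similar to the proof of Lemma 7.11 in [8]. First, under the event $B_1$, it holds that

\[
\frac{\|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2}{s+l(t_{\widehat{m}},t)} \ge \frac{\left[1-\nu(K)-\sqrt{\eta}\right]^2}{4} > 0 , \qquad \kappa_2\,\varphi_{\max}\left[n(Z_{\widehat{m}}^*Z_{\widehat{m}})^{-1}\right] \le \frac{(K-1)\left(1-\sqrt{\eta}-\nu(K)\right)^2}{4} .
\]

This is a consequence of (3), of the choice of $x$ in the previous proof, and of Assumption $(\mathbb{H}_{K,\eta})$. Since $\widetilde{s} = \|\Pi^{\perp}_{\widehat{m}}(\epsilon+\epsilon_{\widehat{m}})\|_n^2$, it follows that $s/\widetilde{s}$ is upper bounded under the event $B_1$. Hence, we only have to upper bound the expectation of $R_3(\widehat{m})$ on $B_1$:

\[
\mathbb{E}\left[\frac{s}{\widetilde{s}}R_3(\widehat{m})\mathbf{1}_{B_1}\right] \le L_{K,\eta}\,\mathbb{E}\left[R_3(\widehat{m})\mathbf{1}_{B_1}\right] . \tag{6}
\]

Let us consider the random variables $E_{\widehat{m}}$ defined by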

Emb :=κ11mbǫ,Πmbǫmbi2n

sl(tmb, t) +kΠmbǫk2n

s .


By (5), the random variableEmb is upper bounded underB1 by Emb ≤κ−21mbǫ/kΠmbǫknmbǫmbi2n

sl(tmb, t) +kΠmbǫk2n

s

This upper bound follows the distribution of a linear combinations ofχ2random variables. More details about this observation are given in the proof of Lemma 7.7 in [8]. We shall simultaneously control the deviations of the random variablesEmb,kΠmb(ǫ+ǫmb)k2n/[l(tmb, t) +s], andkΠmb(ǫ+ ǫmb)k2n/[s+l(tmb, t)] by applying Lemma 1 in [5] and Lemmas 7.2 and 7.3 in [8]. For anyx >0, we define an eventF(x) such that conditionally onF(x)∩B1,













Emb|mb|n −21 +n2q

|mb|+κ14

[|mb|(ξ+H(|mb|)) +x]

+ 2κ12[ξ(|mb|+H(|mb|)) +x]/n ,

kΠcm(ǫ+ǫ

c m)k2n

s+l(tmc,t)n1

h|mb|+ 2p

|mb|[|mb|(1/16 +H(|mb|)) +x] + 2 [|mb|(1/16 +H(|mb|)) +x]i ,

kΠcmǫcmk2n

s+l(tmc,t)n−|nmb|

h1−δn−|mb|−q

|bm|(1+2H(|bm|)) n−|mb| −q

2x n−|mb|

∨0i2 ,

whereδk is defined in the previous proof. Then, the probability ofF(x) satisfies P[F(x)c] ≤ ex

"

X

m∈M

exp [−|mb|H(|mb|)]

eξ|mb|+e|c16m| +e|cm|2 #

≤ e−x 1

1−eξ + 1

1−e1/16 + 1 1−e1/2

.

Let us expand the three deviation bounds thanks to the inequality 2ab≤τ a21b2: Emb ≤ |mb|

n

h1 + 2p

ξ+ 2κ12ξ+τ1ξ+τ2

i+x n

12211

+ κ12 n

1 +τ11κ12

+|mb|H(|mb|) n

121

+ 2|mb|p H(|mb|) n

≤ |mb| n

1 +p

2H(|mb|)2h

κ12+ 2p

ξ+ 2κ12ξ+τ1ξ+τ2

i

+ x

n

1221112

n

1 +τ11κ12 .

Similarly, we get

mb(ǫ+ǫmb)k2n

l(tmb, t) +s ≤ 2|mb| n

h1 +p

2H(|mb|)i2

+ 5x n .

We recall that |m| ≤n/2 by Assumption (HiK,η). If nis larger than some quantityn0(K), then δn−|m|≤δn/2≤ν(K). Applying again Assumption (HiK,η), we get

−K |mb| n− |mb|

1 +p

2H(|mb|)2mb(ǫ+ǫmb)k2n

l(tmb, t) +s

≤ −K|mb| n

1 +p

2H(|mb|)2"

1−√η−ν(K)−

s 2x n− |mb|

!

∨0

#2

≤ −K|mb| n

1 +p

2H(|mb|)2h

(1−√η−ν(K))2−τ3

i+ 2Kητ3−1x n .


Let us combine these three bounds with the definitions ofR3(m),b κ1, andκ2, and the bound (4).

Hence, under the eventB1∩F(x), it holds that R3(m)b ≤|mb|

n

h1 +p

2H(|mb|)i2

U1+x

nU2+LK,η

n U3 , where



U1 := −K−110 1−√η−ν(K)2

+Kτ3+ 2√

ξ+ 2κ12ξ+τ1ξ+τ2 , U2 := τ2−11+LK,η(1 +τ3−1),

U3 := 1 +τ11 .

Since K > 1, there exists a suitable choice of the constants ξ, τ1, and τ2, only depending on K and η, that constrains U1 to be non-positive. Hence, under the event B1 ∩ F(x),

Bmb ≤ LK,η

n +L(K, η)x n .

Since $\mathbb{P}[F(x)^c] \le e^{-x}L_{K,\eta}$, we conclude by integrating the last expression with respect to $x$.

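2.4. Proof of Lemma 10.7

Let us assume that $n\ge 17$. We recall that

\[
R_2(m) = 1 - \frac{l(t_m,t)+\|\Pi^{\perp}_m\epsilon\|_n^2}{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2} .
\]

We shall first upper bound the expectation of $R_2(m)$. By Lemma 2.1,

\[
\mathbb{E}\left[\frac{l(t_m,t)}{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2}\right] = \frac{n}{n-|m|-2}\,\frac{l(t_m,t)}{s+l(t_m,t)} \ge \frac{n-|m|}{n-|m|-2}\,\frac{l(t_m,t)}{s+l(t_m,t)} . \tag{7}
\]

Let us turn to the second term.

\[
\mathbb{E}\left[\frac{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2}{\|\Pi^{\perp}_m\epsilon\|_n^2}\right] = 1 + \frac{l(t_m,t)}{s}\,\frac{n-|m|}{n-|m|-2} \le \frac{s+l(t_m,t)}{s}\,\frac{n-|m|}{n-|m|-2} .
\]

Applying a convexity argument, we derive that

\[
\mathbb{E}\left[\frac{\|\Pi^{\perp}_m\epsilon\|_n^2}{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2}\right] \ge \frac{s}{s+l(t_m,t)}\,\frac{n-|m|-2}{n-|m|} . \tag{8}
\]

Gathering the inequalities (7) and (8) allows us to conclude that $\mathbb{E}[R_2(m)]\le\frac{2}{n-|m|}$. We get the upper bound

\[
\mathbb{E}\left[R_2(m)\mathbf{1}_{B_1}\right] = \mathbb{E}\left[R_2(m)\right] - \mathbb{E}\left[R_2(m)\mathbf{1}_{B_1^c}\right] \le \frac{2}{n-|m|} + \sqrt{\mathbb{P}(B_1^c)}\sqrt{\mathbb{E}\left[R_2^2(m)\right]} . \tag{9}
\]

Thus, we need to upper bound the second moment of $R_2(m)$.

\begin{align*}
\mathbb{E}\left[R_2^2(m)\right] &\le 3\left\{1 + \mathbb{E}\left[\frac{l^2(t_m,t)}{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^4}\right] + \sqrt{\mathbb{E}\left[\|\Pi^{\perp}_m\epsilon\|_n^8\right]\mathbb{E}\left[\frac{1}{\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^8}\right]}\right\} \\
&\le 3\left[1 + \left(\frac{l(t_m,t)\,n}{(s+l(t_m,t))(n-|m|-4)}\right)^2 + \left(\frac{s\,(n-|m|+6)}{(s+l(t_m,t))(n-|m|-8)}\right)^2\right] ,
\end{align*}

by Lemma 2.1. By Assumption $(\mathbb{H}_{K,\eta})$, we have $|m|\le n/2$. If $n\ge 32$, we can upper bound $\mathbb{E}[R_2^2(m)]$ by a universal constant. Combining this result with (9) enables us to conclude that $\mathbb{E}[R_2(m)\mathbf{1}_{B_1}]\le L_{K,\eta}/n$.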
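The step from (7) and (8) to the bound 2/(n − |m|) is pure arithmetic and can be checked numerically. In the sketch below, `a` stands for the ratio l(t_m, t)/(s + l(t_m, t)) in [0, 1] and `N` for n − |m|; both names, and the function name, are introduced here only for illustration.

```python
def expected_R2_upper(a, N):
    """Upper bound on E[R2(m)] obtained by plugging (7) and (8) into the definition of R2.

    a = l(t_m, t) / (s + l(t_m, t)) lies in [0, 1] and N = n - |m| must exceed 2.
    """
    return 1.0 - a * N / (N - 2) - (1.0 - a) * (N - 2) / N

# Worst gap between the bound 2 / N and the expression above, over a grid of (a, N).
worst_gap = min(
    2.0 / N - expected_R2_upper(j / 100.0, N)
    for N in range(3, 200)
    for j in range(101)
)
```

The expression is decreasing in `a` and equals 2/N at a = 0, so the gap is non-negative on the whole grid.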

2.5. Proof of Lemma 10.8

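We bound the quantity $R_4(m,\widehat{m})$ using the same arguments as in the proof of Theorem 3 in [1]. We first split this quantity into a sum of two terms:

\begin{align*}
R_4(m,\widehat{m}) &= \left[\|\epsilon\|_n^2 - s(1-\kappa_0)\right]_+\left[\frac{1}{\widehat{s}_m}-\frac{1}{\widetilde{s}}\right] + \left[s(1-\kappa_0)-\|\epsilon\|_n^2\right]_+\left[-\frac{1}{\widehat{s}_m}+\frac{1}{\widetilde{s}}\right] \\
&\le R_{4,1}(m,\widehat{m}) + R_{4,2}(\widehat{m}) ,
\end{align*}

where $R_{4,1}(m,\widehat{m})$ and $R_{4,2}(\widehat{m})$ are respectively defined by

\[
R_{4,1}(m,\widehat{m}) := \left[\|\epsilon\|_n^2 - s(1-\kappa_0)\right]_+\left[\frac{1}{\widehat{s}_m}-\frac{1}{\widetilde{s}}\right] , \qquad R_{4,2}(\widehat{m}) := \left[s(1-\kappa_0)-\|\epsilon\|_n^2\right]_+\frac{1}{\widetilde{s}} .
\]

By definition, we know that $\log(\widetilde{s}/\widehat{s}_m)$ is smaller than $pen(m)-pen(\widehat{m})$. Since $1-1/x\le\log x$ for any $x>0$, we get

\[
R_{4,1}(m,\widehat{m}) \le \left[\|\epsilon\|_n^2 - s(1-\kappa_0)\right]_+\frac{1}{\widehat{s}_m}\log\frac{\widetilde{s}}{\widehat{s}_m} \le \left[\|\epsilon\|_n^2 - s(1-\kappa_0)\right]_+\frac{1}{\widehat{s}_m}\,pen(m) .
\]

Applying the Cauchy-Schwarz inequality yields

\begin{align*}
\mathbb{E}\left[R_{4,1}(m,\widehat{m})\mathbf{1}_{B_1}\right] &\le \mathbb{E}\left[\frac{\left[\|\epsilon\|_n^2 - s(1-\kappa_0)\right]_+}{\widehat{s}_m}\right]pen(m) \\
&\le \sqrt{\mathbb{E}\left[\left(\|\epsilon\|_n^2 - s(1-\kappa_0)\right)^2\right]\mathbb{E}\left[\frac{1}{\widehat{s}_m^2}\right]}\;pen(m) \\
&\le \frac{s}{s_m}\sqrt{\left(\kappa_0^2+\frac{2}{n}\right)\frac{n^2}{(n-|m|-2)(n-|m|-4)}}\;pen(m) \\
&\le L\,pen(m) ,
\end{align*}

since $|m|\le n/2$ by Assumption $(\mathbb{H}_{K,\eta})$ and since $n\ge 17$. Let us turn to $R_{4,2}(\widehat{m})$. We apply Hölder's inequality with $v := \lfloor n/8\rfloor$ and $u = v/(v-1)$.

\begin{align*}
\mathbb{E}\left[R_{4,2}(\widehat{m})\mathbf{1}_{B_1}\right] &\le \mathbb{E}\left[\left[s(1-\kappa_0)-\|\epsilon\|_n^2\right]_+\frac{1}{\widetilde{s}}\right] \\
&\le \mathbb{E}\left[\mathbf{1}_{s(1-\kappa_0)\ge\|\epsilon\|_n^2}\,\frac{s}{\widetilde{s}}\right] \\
&\le \mathbb{P}\left[\|\epsilon\|_n^2\le s(1-\kappa_0)\right]^{1/u}\left[\mathbb{E}\left(\frac{s}{\widetilde{s}}\right)^v\right]^{1/v} \\
&\le \mathbb{P}\left[\|\epsilon\|_n^2\le s(1-\kappa_0)\right]^{1/u}\left[\sum_{m\in\mathcal{M}}\mathbb{E}\left(\frac{s}{\widehat{s}_m}\right)^v\right]^{1/v} .
\end{align*}

Since $v$ is smaller than $n/8$ and since $|m|$ is smaller than $n/2$, it follows that $n-|m|-2v$ is larger than $n/4$. Hence, we can apply Lemma 2.1 to any model $m\in\mathcal{M}$.

\begin{align*}
\mathbb{E}\left[R_{4,2}(\widehat{m})\mathbf{1}_{B_1}\right] &\le \exp\left(-\frac{n\kappa_0^2}{4u}\right)\left[\sum_{m\in\mathcal{M}}\frac{n^v}{(n-|m|-2)\cdots(n-|m|-2v)}\right]^{1/v} \\
&\le \frac{n}{n/2-2v}\exp\left(-\frac{n\kappa_0^2}{4u}\right)|\mathcal{M}|^{1/v} \\
&\le n\exp\left(-\frac{n\kappa_0^2}{4u}\right)|\mathcal{M}|^{1/v} .
\end{align*}

Let us bound the cardinality of the collection $\mathcal{M}$. We recall that the dimension of any model $m\in\mathcal{M}$ is assumed to be smaller than $n/2$ by $(\mathbb{H}_{K,\eta})$. Besides, for any $d\in\{1,\ldots,n/2\}$, there are fewer than $\exp(dH(d))$ models of dimension $d$. Hence,

\[
\log(|\mathcal{M}|) \le \log(n) + \sup_{d=1,\ldots,n/2}dH(d) .
\]

By Assumption $(\mathbb{H}_{K,\eta})$, $dH(d)$ is smaller than $n/2$. Thus, $\log(|\mathcal{M}|)\le\log(n)+n/2$ and it follows that $|\mathcal{M}|^{1/v}$ is smaller than a universal constant provided that $n$ is larger than 8. All in all, we get

\[
\mathbb{E}\left[R_{4,2}(\widehat{m})\mathbf{1}_{B_1}\right] \le Ln\exp\left(-\frac{n\kappa_0^2}{4u}\right) .
\]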
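The claim that |M|^{1/v} stays below a universal constant can be illustrated numerically from the bound log|M| ≤ log(n) + n/2 with v = ⌊n/8⌋. The sketch below evaluates exp((log n + n/2)/v) for n ≥ 17, the range assumed in this proof, which guarantees v ≥ 2 so that the Hölder exponent u = v/(v − 1) is finite. The function name and the upper end of the range are arbitrary choices made for the illustration.

```python
import math

def card_bound(n):
    """Upper bound on |M|^(1/v) implied by log|M| <= log(n) + n/2, with v = floor(n/8)."""
    v = n // 8
    return math.exp((math.log(n) + n / 2) / v)

# Evaluate the bound on a range of sample sizes and locate the worst case.
values = {n: card_bound(n) for n in range(17, 5001)}
worst_n = max(values, key=values.get)
```

On this range the bound is largest for small n and approaches e^4 as n grows, which is consistent with the existence of a universal constant.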

2.6. Proof of Lemma 10.9

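For any $x>0$, the following inequality holds

\[
x - 1 - \log(x) \le \frac{9}{64}\left(x-\frac{1}{x}\right)^2 .
\]

This statement is easy to establish by studying the derivative of the associated function. Hence, we upper bound the Kullback divergence

\begin{align*}
2K(t,s;\widehat{t}_m,\widehat{s}_m) &= \frac{s}{\widehat{s}_m} - 1 - \log\frac{s}{\widehat{s}_m} + \frac{l(\widehat{t}_m,t)}{\widehat{s}_m} \\
&\le \frac{9}{64}\left(\frac{s^2}{\widehat{s}_m^2}+\frac{\widehat{s}_m^2}{s^2}\right) + \frac{l(t_m,t)}{\widehat{s}_m} + \frac{l(\widehat{t}_m,t_m)}{\widehat{s}_m} .
\end{align*}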
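The elementary inequality x − 1 − log(x) ≤ (9/64)(x − 1/x)² used above can be checked on a logarithmic grid. This is a numerical illustration rather than a proof; the grid bounds and function names are arbitrary.

```python
import math

def lhs(x):
    """Left-hand side x - 1 - log(x)."""
    return x - 1.0 - math.log(x)

def rhs(x):
    """Right-hand side (9/64) (x - 1/x)^2."""
    return 9.0 / 64.0 * (x - 1.0 / x) ** 2

# Logarithmic grid from 1e-3 to 1e3; the point x = 1 (equality case) is included.
grid = [10 ** (e / 200) for e in range(-600, 601)]
worst_gap = min(rhs(x) - lhs(x) for x in grid)
```

The gap vanishes only at x = 1, where both sides are zero, and the inequality is nearly tight around x = 2.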


Thanks to Cauchy-Schwarz inequality, we obtain E

K t, s;et,es 1Bc

1

≤ E

"

9 64

s2 e s2 +se2

s2

+l(tm, t) e

s +l(et, tmb) e s

! 1Bc

1

#

≤ Lq P[Bc1]

vu utX

m∈M

E

"

1m=mb s4 b s4m+bs4m

s4 +l2(tm, t) b

s2m +l2(btm, tm) b s2m

!#

.

As in the proof of Lemma 10.8, we apply Hölder's inequality with v = ⌊n/16⌋ and u = v/(v−1).

Again, we check that for any model m ∈ M, n − |m| − 8v ≥ 1.

E

"

1m=mb s4 b s4m+bs4m

s4 +l2(tm, t) b

s2m +l2(btm, tm

b s2m

!#

≤P[m=m]b 1u

Es4v b s4vm

1v

+E bs4vm

s4v v1

+E

l2v(tm, t) b s2vm

1v

+E l2v(btm, tm) b s2vm

!v1

 . We bound the first two terms applying Lemma 2.1 or computing the v-th moment of χ2 random variable.

E s4v

b s4vm

v1

≤ n4

(n− |m| −8v)4 , E

bs4vm s4v

v1

=

(n− |m|)(n− |m|+ 2). . .(n− |m|+ 2(4v−1))(sm)4v (ns)4v

1v

≤ (n− |m|+ 8v)4(s+l(0, t))4

n4s4 .

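As $\|\Pi^{\perp}_m(\epsilon+\epsilon_m)\|_n^2$ is independent of the couple $(\|\Pi_m(\epsilon+\epsilon_m)\|_n^2, X_m)$, the random variables $\widehat{s}_m$ and $l(\widehat{t}_m,t_m)$ are independent. We bound the $l_{2v}$-risk of $l(\widehat{t}_m,t_m)$ thanks to Proposition 7.8 in [8].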

E l2v(btm, tm) b s2vm

!1v

=

E

l2v(btm, tm) E

1 b s2vm

1v

≤ Lv2|m|2n4

(n− |m| −4v)2 ≤Lv2|m|2n2 n2

(n− |m| −4v)2 . Combining these upper bounds and noting thatn− |m| −8v≥1 and|m| ≤n/2 yields

E

K t, s;et,es 1Bc1

"

2n2

(n− |m| −8v)2 + Lv|m|n2

n− |m| −4v +(n− |m|+ 8v)2 n2

1 +l(0, t) s

2#

× L q

P[Bc1]|M|2v1

≤ LK,ηn5/2

1 + l(0, t) s

exp [−nLK,η] ,

since $|\mathcal{M}|^{1/(2v)}$ is smaller than a universal constant, as explained in the proof of Lemma 10.8. Finally, $l(0,t)/s$ is smaller than $K(t,s;0,1)$.


2.7. Proof of Corollary 6.1

First, we claim that the penalties (21) are lower bounded by penalties defined in (7). Suppose that $K>e-1$. Since $\log(1+Kx)\ge\log(1+K)x$ for any $x$ between 0 and 1, it follows that

\[
pen_i(m_i) \ge \log(1+K)\,\frac{|m_i|}{n-|m_i|}\left(1+\sqrt{2\left[1+\log\left((i-1)/|m_i|\right)\right]}\right)^2 , \tag{10}
\]

if $\frac{|m_i|}{n-|m_i|}\left\{1+\sqrt{2[1+\log((i-1)/|m_i|)]}\right\}^2\le 1$. If $K\le e-1$, there exists a positive constant $\zeta(K)$ such that $\log(1+Kx)\ge\sqrt{K}x$ for all $x\le\zeta(K)$. Hence, we get

\[
pen_i(m_i) \ge \sqrt{K}\,\frac{|m_i|}{n-|m_i|}\left(1+\sqrt{2\left[1+\log\left((i-1)/|m_i|\right)\right]}\right)^2 , \tag{11}
\]

if $K\le e-1$ and if $\frac{|m_i|}{n-|m_i|}\left[1+\sqrt{2[1+\log((i-1)/|m_i|)]}\right]^2\le\zeta(K)$.

Gathering (10) and (11), we get that for any $K>1$, there exist some $K'>1$ and some $\zeta(K)>0$ such that

\[
pen_i(m_i) \ge K'\,\frac{|m_i|}{n-|m_i|}\left(1+\sqrt{2\left[1+\log\left((i-1)/|m_i|\right)\right]}\right)^2 ,
\]

if $\frac{|m_i|}{n-|m_i|}\left\{1+\sqrt{2[1+\log((i-1)/|m_i|)]}\right\}^2\le\zeta(K)$.

For any $2\le i\le p$ and any $1\le k\le(i-1)\wedge d$, $H_i(k)$ is smaller than $1+\log((i-1)/k)$. Hence, $pen_i(m_i)$ is lower bounded by a penalty of the form (7) with some $K'>1$. Assuming that

\[
\frac{|m_i|}{n-|m_i|}\left(1+\sqrt{2\left[1+\log\left((i-1)/|m_i|\right)\right]}\right)^2 \le \zeta(K)\wedge\eta(K') , \tag{12}
\]

we derive that $(\mathbb{H}_{K',\eta})$ is fulfilled and that the risk bound (23) holds.

We conclude by observing that Condition (12) is satisfied if $d[1+\log(p/d)\vee 0]\le n\eta(K)$, for some suitable function $\eta(K)$.
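The two ingredients of this argument can be checked numerically: the concavity bound log(1 + Kx) ≥ log(1 + K)x on [0, 1], and the fact that the resulting constant exceeds one. In the sketch below, `effective_constant` is a hypothetical name for the constant K′ read off from (10) and (11), namely log(1 + K) when K > e − 1 and √K otherwise.

```python
import math

def effective_constant(K):
    """Constant K' appearing in the gathered lower bound, for K > 1.

    Taken from (10) when K > e - 1 and from (11) otherwise.
    """
    if K > math.e - 1:
        return math.log(1.0 + K)
    return math.sqrt(K)

def concavity_gap(K, steps=1000):
    """Minimum over a grid of [0, 1] of log(1 + K x) - log(1 + K) x."""
    return min(
        math.log(1.0 + K * j / steps) - math.log(1.0 + K) * j / steps
        for j in range(steps + 1)
    )
```

For example, `effective_constant(2.0)` returns log(3), which is slightly larger than one, so the lower bound is indeed of the form (7) with a constant above one.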

2.8. Proof of Proposition 4.5

We apply the same arguments as in the proof of Theorem 4.4, except that we replace $H(|m|)$ by $l_m$. Then, Lemmas 10.4 and 10.7 are still true.

In the proof of Lemmas 10.8 and 10.9, the only difference with the previous case concerns the upper bound of $\log(|\mathcal{M}|)$. By definition of $l_m$,

\[
|\mathcal{M}|-1 \le \sup_{m\in\mathcal{M}\setminus\{\emptyset\}}\exp(|m|\,l_m) .
\]

Hence, $\log(|\mathcal{M}|)\le 1+\sup_{m\in\mathcal{M}\setminus\{\emptyset\}}|m|\,l_m$, which is smaller than $1+n/2$ by Assumption $(\mathbb{H}^{bay}_{K,\eta})$.

Lemmas 10.5 and 10.6 also hold when $H(|m|)$ is replaced by $l_m$, as explained in the proof of Proposition 3.5 in [8].


3. Proofs of the minimax bounds

We denote by dH(., .) the Hamming distance between two vectors. The Hamming distance between two matrices of size p is defined as the Hamming distance between their vectorized versions of size p². It is also denoted dH(., .).

3.1. Main lemma

We first state two useful lemmas for proving the minimax lower bounds. The first one is known as Varshamov-Gilbert's lemma, whereas the second one is a modified version of Birgé's lemma for covariance estimation.

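Lemma 3.1 (Varshamov-Gilbert's lemma). Let $\{0,1\}^D$ be equipped with the Hamming distance $d_H$. There exists some subset $\Theta$ of $\{0,1\}^D$ with the following properties:

\[
d_H(\theta,\theta') > D/4 \ \text{ for every } (\theta,\theta')\in\Theta^2 \text{ with } \theta\ne\theta' \quad\text{and}\quad \log|\Theta| \ge D/8 .
\]

We denote by $\|t\|_{l_2}$ the Euclidean norm of a vector $t$.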
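For a small dimension, a subset with the two properties of Lemma 3.1 can be exhibited explicitly. The sketch below uses a greedy construction, which is an assumption made for this illustration and not the argument behind the lemma: it scans all binary words of length D, encoded as integers, and keeps a word whenever it is far enough from all words kept so far. The names `hamming`, `greedy_packing` and `theta` are hypothetical.

```python
import math

def hamming(a, b):
    """Hamming distance between two binary words encoded as integers."""
    return bin(a ^ b).count("1")

def greedy_packing(D, min_dist):
    """Greedily collect words of {0,1}^D with pairwise Hamming distance >= min_dist."""
    code = []
    for word in range(2 ** D):
        if all(hamming(word, kept) >= min_dist for kept in code):
            code.append(word)
    return code

D = 12
# Pairwise distance strictly larger than D/4 = 3 means at least D // 4 + 1 = 4.
theta = greedy_packing(D, D // 4 + 1)
```

The greedy set is maximal, so the Hamming balls of radius 3 centred at its elements cover {0,1}^12; this forces it to contain at least 4096/299 > 13 words, far more than the exp(D/8) required by the lemma.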

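Lemma 3.2. Let $A$ be a subset of $\{1,\ldots,p\}$. For any positive matrices $\Omega$ and $\Omega'$, we define the function $d(\Omega,\Omega')$ by

\[
d(\Omega,\Omega') := \sum_{i\in A}\log\left[1+\frac{\|t_i-t'_i\|_{l_2}^2}{4}\right] + \sum_{i\in A^c}\left[\frac{s_i}{s'_i}+\log\frac{s'_i}{s_i}-1\right] . \tag{13}
\]

Let $\Upsilon$ be a subset of square matrices of size $p$ which satisfies the following assumptions:

1. For all $\Omega\in\Upsilon$, $\varphi_{\max}(\Omega)\le 2$ and $\varphi_{\min}(\Omega)\ge 1/2$.

2. There exists $(s_1,s_2)\in[1;2]^2$ such that $\forall\Omega\in\Upsilon$, $\forall 1\le i\le p$, $s_i\in\{s_1,s_2\}$.

Setting $\delta = \min_{\Omega,\Omega'\in\Upsilon,\,\Omega\ne\Omega'}d(\Omega,\Omega')$, provided that $\max_{\Omega,\Omega'\in\Upsilon}K(\mathbb{P}^{\otimes n}_{\Omega};\mathbb{P}^{\otimes n}_{\Omega'})\le\kappa_1\log|\Upsilon|$, the following lower bound holds

\[
\inf_{\widehat{\Omega}}\sup_{\Omega\in\Upsilon}\mathbb{E}\left[K\left(\Omega;\widehat{\Omega}\right)\right] \ge \kappa_2\delta .
\]

The numerical constants $\kappa_1$ and $\kappa_2$ are made explicit in the proof.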

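Proof of Lemma 3.2. This lemma is mainly based on an application of Birgé's version of Fano's lemma [3]. We provide a statement of the result that is taken from [6] Sect. 2.4.

Lemma 3.3. Let $(P_i)_{0\le i\le N}$ be some family of probability distributions and $(A_i)_{0\le i\le N}$ be some family of disjoint events. Let $a = \min_{0\le i\le N}P_i(A_i)$. Then

\[
a \le \kappa\vee\left(\frac{\max_{1\le i\le N}K(P_i;P_0)}{\log(1+N)}\right) ,
\]

where $\kappa = 2e/(2e+1)$.

Let $\widehat{\Omega}$ be an estimator of $\Omega$. Let $\widetilde{\Omega}$ be an estimator that takes its values in $\Upsilon$ and satisfies

\[
d(\widetilde{\Omega},\widehat{\Omega}) = \min_{\Omega'\in\Upsilon}d(\Omega',\widehat{\Omega}) .
\]

We denote by $(\widetilde{T},\widetilde{S})$ and $(\widehat{T},\widehat{S})$ the Cholesky decompositions of $\widetilde{\Omega}$ and $\widehat{\Omega}$. Let $i\in\{1,\ldots,p\}$. By the triangle inequality,

\[
\frac{\|t_i-\widetilde{t}_i\|_{l_2}^2}{4} \le 2\left[\frac{\|t_i-\widehat{t}_i\|_{l_2}^2}{4}+\frac{\|\widehat{t}_i-\widetilde{t}_i\|_{l_2}^2}{4}\right] .
\]

For any positive numbers $a$ and $b$, $\log(1+a+b)\le\log(1+a)+\log(1+b)$. Moreover, for any positive number $a$, we have $\log(1+2a)\le 2\log(1+a)$ because the log function is concave. Hence, we get

\[
\log\left[1+\frac{\|t_i-\widetilde{t}_i\|_{l_2}^2}{4}\right] \le 2\log\left[1+\frac{\|t_i-\widehat{t}_i\|_{l_2}^2}{4}\right] + 2\log\left[1+\frac{\|\widehat{t}_i-\widetilde{t}_i\|_{l_2}^2}{4}\right] . \tag{14}
\]

Let us define the function $f$ by $f(x) := x-\log(x)-1$ for any $x>0$. We state that there exists some numerical constant $L$ such that

\[
f\left(\frac{s_i}{\widetilde{s}_i}\right) \le L\left[f\left(\frac{s_i}{\widehat{s}_i}\right)+f\left(\frac{\widetilde{s}_i}{\widehat{s}_i}\right)\right] . \tag{15}
\]

If $s_i = \widetilde{s}_i$, this inequality holds for any $L>0$ since $f(1)=0$ and $f$ is non-negative. If $s_i\ne\widetilde{s}_i$, there are two possibilities: either $s_i = s_1$ and $\widetilde{s}_i = s_2$, or $s_i = s_2$ and $\widetilde{s}_i = s_1$. By differentiating $f(s_1/x)+f(s_2/x)$, one observes that this sum is minimized for $x = (s_1+s_2)/2$ and that this minimum equals $f[2/(1+s_1/s_2)]+f[2/(1+s_2/s_1)]$. Hence, we obtain that

\[
f\left(\frac{s_i}{\widetilde{s}_i}\right) \le \frac{f\left(\frac{s_1}{s_2}\right)\vee f\left(\frac{s_2}{s_1}\right)}{f\left[2/\left(1+\frac{s_1}{s_2}\right)\right]+f\left[2/\left(1+\frac{s_2}{s_1}\right)\right]}\left[f\left(\frac{s_i}{\widehat{s}_i}\right)+f\left(\frac{\widetilde{s}_i}{\widehat{s}_i}\right)\right] .
\]

Since $s_1$ and $s_2$ lie between one and two, it follows that

\[
f\left(\frac{s_i}{\widetilde{s}_i}\right) \le \sup_{1/2\le x\le 2}\frac{f(x)}{f[2/(1+x)]+f\left[2/\left(1+\frac{1}{x}\right)\right]}\left[f\left(\frac{s_i}{\widehat{s}_i}\right)+f\left(\frac{\widetilde{s}_i}{\widehat{s}_i}\right)\right] . \tag{16}
\]

The ratio $f(x)/(f[2/(1+x)]+f[2/(1+1/x)])$ is positive and continuous on $[1/2;1[$ and $]1;2]$. By studying the Taylor expansion of $f(x)$ at $x=1$, we observe that $f(x) = (x-1)^2/2+o[(x-1)^2]$, $f(2/(1+x)) = (x-1)^2/8+o[(x-1)^2]$, and $f(2/(1+1/x)) = (x-1)^2/8+o[(x-1)^2]$. Hence, the ratio $f(x)/(f[2/(1+x)]+f[2/(1+1/x)])$ extends continuously at one. The supremum in (16) is therefore finite and the upper bound (15) holds.

Combining the upper bounds (14) and (15) with the definition of $\widetilde{\Omega}$ yields

\begin{align*}
d(\Omega,\widetilde{\Omega}) &\le 2\sum_{i\in A}\left\{\log\left[1+\frac{\|t_i-\widehat{t}_i\|_{l_2}^2}{4}\right]+\log\left[1+\frac{\|\widehat{t}_i-\widetilde{t}_i\|_{l_2}^2}{4}\right]\right\} + L\sum_{i\in A^c}\left[f\left(\frac{s_i}{\widehat{s}_i}\right)+f\left(\frac{\widetilde{s}_i}{\widehat{s}_i}\right)\right] \\
&\le L\left[d(\Omega,\widehat{\Omega})+d(\widetilde{\Omega},\widehat{\Omega})\right] \le L\,d(\Omega,\widehat{\Omega}) .
\end{align*}

Hence, one can lower bound the risk of $\widehat{\Omega}$ as follows

\[
\sup_{\Omega\in\Upsilon}\mathbb{E}\left[d\left(\Omega,\widehat{\Omega}\right)\right] \ge L^{-1}\delta\sup_{\Omega\in\Upsilon}\mathbb{P}\left[\Omega\ne\widetilde{\Omega}\right] = L^{-1}\delta\left(1-\min_{\Omega\in\Upsilon}\mathbb{P}\left[\Omega=\widetilde{\Omega}\right]\right) .
\]

Applying Lemma 3.3, we conclude that

\[
\inf_{\widehat{\Omega}}\sup_{\Omega\in\Upsilon}\mathbb{E}\left[d\left(\Omega,\widehat{\Omega}\right)\right] \ge L^{-1}(1-\kappa)\delta , \tag{17}
\]

if $\max_{\Omega,\Omega'\in\Upsilon}K(\mathbb{P}^{\otimes n}_{\Omega},\mathbb{P}^{\otimes n}_{\Omega'})\le\kappa\log|\Upsilon|$.

Let us now express this minimax lower bound in terms of the Kullback divergence. Thanks to the chain rule and Lemma 10.1, the Kullback divergence between two positive matrices $\Omega$ and $\Omega'$ decomposes as

\[
K(\Omega;\Omega') = \sum_{i=1}^p\frac{1}{2}\left[\log\frac{s'_i}{s_i}+\frac{s_i}{s'_i}-1+\frac{l_i(t'_i,t_i)}{s'_i}\right] .
\]

Straightforward computations show that the function $\log(s'_i/s_i)+s_i/s'_i-1+l_i(t'_i,t_i)/s'_i$ is minimized with respect to $s'_i$ when $s'_i = s_i+l_i(t'_i,t_i)$. This leads to the lower bound

\[
\log\frac{s'_i}{s_i}+\frac{s_i}{s'_i}-1+\frac{l_i(t'_i,t_i)}{s'_i} \ge \log\left(1+\frac{l_i(t'_i,t_i)}{s_i}\right) .
\]

By Definition (29) of $l_i(.,.)$, the quantity $l_i(t'_i,t_i)$ is lower bounded by $[\varphi_{\max}(\Omega)]^{-1}\|t_i-t'_i\|_{l_2}^2$. By assumption, $[\varphi_{\max}(\Omega)]^{-1}$ is larger than $1/2$ for any $\Omega\in\Upsilon$. Moreover, $s_i$ is smaller than 2. We conclude that for any $\Omega\in\Upsilon$ and any positive matrix $\Omega'$, the following lower bound holds

\[
2K(\Omega;\Omega') \ge \sum_{i\in A}\log\left(1+\frac{\|t_i-t'_i\|_{l_2}^2}{4}\right) + \sum_{i\in A^c}\left[\frac{s_i}{s'_i}-\log\frac{s_i}{s'_i}-1\right] = d(\Omega,\Omega') .
\]

We conclude by gathering this last bound with (17) and setting $\kappa_1 := \kappa$ and $\kappa_2 := L^{-1}(1-\kappa)/2$.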
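The finiteness of the supremum in (16), and the limit suggested by the Taylor expansions, can be illustrated numerically. The sketch below evaluates the ratio f(x)/(f[2/(1 + x)] + f[2/(1 + 1/x)]) on a grid of [1/2, 2] that excludes x = 1; the grid size and the function names are arbitrary choices for this illustration. With f(x) ≈ (x − 1)²/2 and each denominator term ≈ (x − 1)²/8, the ratio should approach 2 near x = 1.

```python
import math

def f(x):
    """f(x) = x - log(x) - 1, computed with log1p for accuracy near x = 1."""
    return (x - 1.0) - math.log1p(x - 1.0)

def ratio(x):
    """Ratio whose supremum over [1/2, 2] appears in (16); undefined at x = 1."""
    return f(x) / (f(2.0 / (1.0 + x)) + f(2.0 / (1.0 + 1.0 / x)))

# Regular grid of [1/2, 2]; the index j = 1000 corresponds to x = 1 and is skipped.
grid = [0.5 + 1.5 * j / 3000 for j in range(3001) if j != 1000]
largest_ratio = max(ratio(x) for x in grid)
```

On this grid the ratio stays below 3, so any numerical constant L of that order is enough for (15) in this illustration.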

3.2. Adaptive banding

In order to compute the minimax rates of estimation over ellipsoids, we first need to consider a lower bound over the sets $\mathcal{T}_{ord}[k_1,\ldots,k_p,r]$ and $\mathcal{U}_{ord}[k_1,\ldots,k_p,r]$.

Definition 3.4. Let $(k_1,\ldots,k_p)\in\mathbb{N}^p$ be such that $k_i\le i-1$ and let $r$ be a positive number. We respectively define the sets $\mathcal{T}_{ord}[k_1,\ldots,k_p,r]$ and $\mathcal{U}_{ord}[k_1,\ldots,k_p,r]$ as

\begin{align*}
\mathcal{T}_{ord}[k_1,\ldots,k_p,r] &:= \left\{T\in Trig(p) \text{ s.t. } \forall\, 2\le i\le p,\ T[i,j] = 0 \text{ if } 1\le j\le i-k_i-1 \text{ and } T[i,j]\in\{0,r\} \text{ if } i-k_i\le j\le i-1\right\} , \tag{18} \\
\mathcal{U}_{ord}[k_1,\ldots,k_p,r] &:= \left\{T^*S^{-1}T,\ T\in\mathcal{T}_{ord}[k_1,\ldots,k_p,r] \text{ and } S\in Diag(p)\right\} . \tag{19}
\end{align*}

The set $\mathcal{T}_{ord}[k_1,\ldots,k_p,r]$ contains lower triangular matrices with unit diagonal such that, for each row $i$ between 2 and $p$, the support of the vector $(T[i,j])_{1\le j\le i-1}$ is included in $\{i-k_i,i-k_i+1,\ldots,i-1\}$. We are able to lower bound the minimax rates of estimation over $\mathcal{U}_{ord}[(k_1,\ldots,k_p),r]$.

Proposition 3.5. Assume that $k := 1\vee\max_{1\le i\le p}k_i$ is smaller than $\sqrt{n}$. Then,

\[
\inf_{\widehat{\Omega}}\sup_{\Omega\in\mathcal{U}_{ord}[(k_1,\ldots,k_p),r]}\mathbb{E}\left[K\left(\Omega;\widehat{\Omega}\right)\right] \ge L\left[\sum_{i=2}^p k_i+p\right]\left(r^2\wedge\frac{1}{n}\right) .
\]

These minimax rates of estimation are not really surprising, since they correspond to the minimax rates of estimation of $p$ different parametric regression problems whose minimax rates are known to be of the order $k_i(r^2\wedge 1/n)$. We refer for instance to [6] Prop. 4.8. Moreover, the term $p/n$ is due to the diagonal matrices $S$ in $\Omega = T^*S^{-1}T$. We believe that the assumption "$k$ is smaller than $\sqrt{n}$" is not necessary but we do not know how to remove it.
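The support constraint of Definition 3.4 can be made concrete with a small membership test. The sketch below represents T as a list of rows with 0-based indices, so the paper's row i corresponds to index i − 1; the function names `in_T_ord` and `full_band_matrix` and the example values are hypothetical.

```python
def in_T_ord(T, ks, r):
    """Check whether the square matrix T belongs to T_ord[k_1, ..., k_p, r].

    ks[i] is the bandwidth allowed on (0-based) row i and must satisfy ks[i] <= i.
    """
    p = len(T)
    for i in range(p):
        for j in range(p):
            if j > i and T[i][j] != 0:
                return False            # T must be lower triangular
            if j == i and T[i][j] != 1:
                return False            # unit diagonal
            if j < i - ks[i] and T[i][j] != 0:
                return False            # outside the band: forced to zero
            if i - ks[i] <= j < i and T[i][j] not in (0, r):
                return False            # inside the band: either 0 or r
    return True

def full_band_matrix(ks, r):
    """Element of T_ord whose in-band entries are all equal to r."""
    p = len(ks)
    return [[1 if j == i else (r if i - ks[i] <= j < i else 0)
             for j in range(p)] for i in range(p)]

ks = [0, 1, 2, 2]          # bandwidths k_1, ..., k_4 with k_i <= i - 1
T = full_band_matrix(ks, 0.5)
```

Each in-band entry is free to be 0 or r, so the set contains 2 to the power k_1 + ... + k_p matrices, which is the combinatorial richness exploited in the lower bound.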

3.2.1. Proof of Proposition 5.3

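The lower bound (17) is a consequence of Proposition 3.5. Let $k$ be a positive integer smaller than $\lfloor\sqrt{n}\rfloor\wedge(p-1)$. Given some $0<r<\sqrt{a_k^2R^2/k}$, we consider the set $\mathcal{U}_{ord}[0,1,\ldots,k-1,k,\ldots,k,r]$. Let $(T,S)$ refer to the Cholesky decomposition of a matrix $\Omega$ belonging to this set. By definition of $\mathcal{U}_{ord}$,

\[
\sum_{j=1}^{i-1}\frac{T[i,i-j]^2}{a_j^2} = \sum_{j=1}^{k}\frac{T[i,i-j]^2}{a_j^2} \le \frac{kr^2}{a_k^2} \le R^2 .
\]

Hence, the set $\mathcal{U}_{ord}[0,1,\ldots,k-1,k,\ldots,k,r]$ is included in $\mathcal{E}(a,R,p)$. By Proposition 3.5, we obtain the minimax lower bound

\[
\inf_{\widehat{\Omega}}\sup_{\Omega\in\mathcal{E}(a,R,p)}\mathbb{E}\left[K\left(\Omega;\widehat{\Omega}\right)\right] \ge Lp(k+1)\left(\frac{a_k^2R^2}{k}\wedge\frac{1}{n}\right) \ge Lp\left(a_k^2R^2\wedge\frac{k+1}{n}\right) .
\]

Similarly, if $k=0$, the set $\mathcal{U}_{ord}[0,\ldots,0,+\infty]$ is included in $\mathcal{E}(a,R,p)$ and the minimax rate of estimation over $\mathcal{E}(a,R,p)$ is lower bounded by $Lp(a_0^2R^2\wedge 1/n)$ with the convention $a_0 = +\infty$. Taking the supremum over all non-negative integers $k$ smaller than $\lfloor\sqrt{n}\rfloor\wedge(p-1)$ yields the first result.

Let us now turn to the second part of the proposition. We fix some $\gamma>2$. The matrices $\Omega$ considered in the proof of Proposition 3.5 for bounding the minimax rates of estimation over sets of the type $\mathcal{U}_{ord}[0,1,\ldots,k-1,k,\ldots,k,r]$ have their eigenvalues between $1/2$ and $2$. Hence, the previous lower bound still holds and we get

\[
\inf_{\widehat{\Omega}}\sup_{\Omega\in\mathcal{E}(a,R,p)\cap\mathcal{B}_{op}(\gamma)}\mathbb{E}\left[K\left(\Omega;\widehat{\Omega}\right)\right] \ge Lp\sup_{k=0,\ldots,\lfloor\sqrt{n}\rfloor\wedge(p-1)}\left(a_k^2R^2\wedge\frac{k+1}{n}\right) . \tag{20}
\]

Let $k$ be a non-negative integer smaller than or equal to $d\wedge(p-1)$, where $d$ is the maximal dimension of the models defining the estimator $\widetilde{\Omega}^d_{ord}$. We consider the model $m\in\mathcal{M}^d_{ord}$ defined by

\[
m := \left(\emptyset,\{1\},\ldots,\{i-1,\ldots,i-k\},\ldots,\{p-1,\ldots,p-k\}\right) .
\]

This model corresponds to estimating a $k$-th banded Cholesky factor. By Corollary 5.1, the risk of $\widetilde{\Omega}^d_{ord}$ is upper bounded by

\[
\mathbb{E}\left[K\left(\Omega;\widetilde{\Omega}^d_{ord}\right)\right] \le L_{K,\eta}\left[\frac{p(k+1)}{n}+K(\Omega;\Omega_m)\right] + \tau_n(\Omega,K,\eta) . \tag{21}
\]

Let us upper bound the bias term $K(\Omega;\Omega_m)$. By Equation (31), it decomposes as

\begin{align*}
2K(\Omega;\Omega_m) &= \sum_{i=1}^p\left[\frac{s_i}{s_{i,m_i}}-\log\frac{s_i}{s_{i,m_i}}-1+\frac{l_i(t_i,t_{i,m_i})}{s_{i,m_i}}\right] \\
&= \sum_{i=1}^p\log\left(1+\frac{l_i(t_i,t_{i,m_i})}{s_i}\right) \le \sum_{i=1}^p\frac{l_i(t_i,t_{i,m_i})}{s_i} ,
\end{align*}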
