# Appendix: Proofs

From the document The DART-Europe E-theses Portal (pages 101-110)

In order to prove Theorem 4.2.1, we use the following result from Butucea et al. (2018).

Consider the set of binary vectors

$$A = \{\eta \in \{0,1\}^p : |\eta|_0 \le s\}$$

and assume that we are given a family $\{P_\eta, \eta \in A\}$ where each $P_\eta$ is a probability distribution on a measurable space $(\mathcal{X}, \mathcal{U})$. We observe $X$ drawn from $P_\eta$ with some unknown $\eta = (\eta_1, \dots, \eta_p) \in A$ and we consider the Hamming risk of a selector $\hat\eta = \hat\eta(X)$:

$$\sup_{\eta \in A} E_\eta |\hat\eta - \eta|,$$

where $E_\eta$ is the expectation w.r.t. $P_\eta$. We call a selector any estimator with values in $\{0,1\}^p$. Let $\pi$ be a probability measure on $\{0,1\}^p$ (a prior on $\eta$). We denote by $E_\pi$ the expectation with respect to $\pi$. Then the following result is proved in Butucea et al. (2018).

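Theorem 4.8.1 (Butucea et al., 2018). Let $\pi$ be a product of $p$ Bernoulli measures with parameter $s'/p$ where $s' \in (0, s]$. Then,

$$\inf_{\hat\eta} \sup_{\eta \in A} E_\eta |\hat\eta - \eta| \;\ge\; \inf_{\hat T \in [0,1]^p} E_\pi E_\eta \sum_{i=1}^p |\hat T_i - \eta_i| \;-\; 4 s' \exp\Big(-\frac{(s - s')^2}{2s}\Big), \tag{4.29}$$

where $\inf_{\hat\eta}$ is the infimum over all selectors and $\inf_{\hat T \in [0,1]^p}$ is the infimum over all estimators $\hat T = (\hat T_1, \dots, \hat T_p)$ with values in $[0,1]^p$.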
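To get a feel for the quantities in (4.29), the following minimal Python sketch computes the Hamming loss between a selector and a binary vector and evaluates the remainder term, assuming it reads $4s'\exp(-(s-s')^2/(2s))$. The function names `hamming_loss` and `remainder_term` are illustrative and do not come from the source.

```python
import math

def hamming_loss(eta_hat, eta):
    """Number of coordinates where the selector and the true pattern differ."""
    if len(eta_hat) != len(eta):
        raise ValueError("vectors must have the same length")
    return sum(1 for a, b in zip(eta_hat, eta) if a != b)

def remainder_term(s, s_prime):
    """Assumed remainder in (4.29): 4 s' exp(-(s - s')^2 / (2 s)), for 0 < s' <= s."""
    if not 0 < s_prime <= s:
        raise ValueError("need 0 < s' <= s")
    return 4.0 * s_prime * math.exp(-((s - s_prime) ** 2) / (2.0 * s))

# With s' = s/2 the remainder equals 2 s exp(-s/8), which vanishes as s grows.
for s in (8, 64, 512):
    print(s, remainder_term(s, s / 2))
```

The choice $s' = s/2$ in the loop mirrors the one made later in the proof of Theorem 4.3.1.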

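Proof of Theorem 4.2.1. Let $\Theta(p, s, a)$ be the subset of $\Omega^p_{s,a}$ defined as

$$\Theta(p, s, a) = \{\beta \in \Omega^p_{s,a} : \beta_i = a, \ \forall i \in S_\beta\}.$$

Since any $\beta \in \Theta(p, s, a)$ can be written as $\beta = a\eta_\beta$, there is a one-to-one correspondence between $A$ and $\Theta(p, s, a)$. Hence,

$$\inf_{\hat\eta} \sup_{\eta \in A} E_\eta |\hat\eta - \eta| = \inf_{\hat\eta} \sup_{\beta \in \Theta(p,s,a)} E_\beta |\hat\eta - \eta_\beta|.$$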
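The one-to-one correspondence used above can be checked by brute force on a small instance. The sketch below is only an illustration: the helper names and the values `p = 4`, `s = 2`, `a = 1.5` are arbitrary choices, and it assumes $a \neq 0$ so that the support of $a\eta$ is exactly the support of $\eta$.

```python
from itertools import product

def sparse_binary_vectors(p, s):
    """All eta in {0,1}^p with at most s nonzero entries (the set A)."""
    return [eta for eta in product((0, 1), repeat=p) if sum(eta) <= s]

def beta_from_eta(eta, a):
    """Map eta to beta = a * eta: every nonzero coordinate equals a."""
    return tuple(a * e for e in eta)

def eta_from_beta(beta):
    """Recover the support indicator of beta."""
    return tuple(1 if b != 0 else 0 for b in beta)

p, s, a = 4, 2, 1.5  # hypothetical small instance
A = sparse_binary_vectors(p, s)
Theta = [beta_from_eta(eta, a) for eta in A]

# The map is injective and is inverted by taking the support indicator.
assert len(set(Theta)) == len(A)
assert all(eta_from_beta(beta_from_eta(eta, a)) == eta for eta in A)
print(len(A))
```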

Using this remark and Theorem 4.8.1 we obtain that, for all $s' \in (0, s]$,

$$\inf_{\hat\eta} \sup \ [\ldots]$$

where $\pi$ is a product of $p$ Bernoulli measures with parameter $s'/p$. Thus, to finish the proof it remains to show that

$$\inf \ [\ldots]$$

Here, $\varphi_\sigma$ is the density of the Gaussian distribution in $\mathbb{R}^n$ with i.i.d. zero-mean components of variance $\sigma^2$. By the Bayesian version of the Neyman-Pearson lemma, the infimum in (4.30) is attained for $\tilde T_i = T_i^*$ given by the formula

[...]

where $\xi$ is a standard Gaussian random vector in $\mathbb{R}^n$ independent of $X_i$. Notice now that $\varepsilon := X_i^\top \xi / \|X_i\|$ is a standard Gaussian random variable and it is independent of $\|X_i\|$ since $X_i \sim N(0, I_n)$. Combining the above arguments we find that

[...]

We conclude the proof by using the fact that the function $u \mapsto \psi_+(n, p, u, a, \sigma)/u$ is decreasing for $u > 0$ (cf. Butucea et al. (2018)), so that $\psi_+(n, p, s', a, \sigma) \ge \frac{s'}{s}\, \psi_+(n, p, s, a, \sigma)$.

Proof of Theorem 4.3.1. In view of Theorem 4.2.1 with $s' = s/2$, it is sufficient to bound $\psi_+ = \psi_+(n, p, s, a, \sigma)$ from below. We have

$$\psi_+ \ge (p - s)\, P(\sigma\varepsilon > t(\zeta)).$$

We will use the following bound for the tails of the standard Gaussian distribution: for some $c' > 0$,

$$\forall y \ge 2/3, \qquad P(\varepsilon > y) \ge c'\, \frac{\exp(-y^2/2)}{y}.$$
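The Gaussian tail bound can be examined numerically. The following sketch evaluates the ratio $P(\varepsilon > y)\, y\, e^{y^2/2}$ on a grid of $y \ge 2/3$; its minimum over the grid is a candidate value for the constant $c'$. The grid and its upper end are arbitrary illustration choices, and the helper names are not from the source.

```python
import math

def gaussian_upper_tail(y):
    """P(eps > y) for a standard Gaussian eps, via the complementary error function."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def tail_ratio(y):
    """Ratio P(eps > y) / (exp(-y^2/2) / y); the bound asks it to stay above c'."""
    return gaussian_upper_tail(y) * y * math.exp(y * y / 2.0)

# Grid on [2/3, 8); range and step are arbitrary illustration choices.
grid = [2.0 / 3.0 + 0.01 * k for k in range(734)]
min_ratio = min(tail_ratio(y) for y in grid)
max_ratio = max(tail_ratio(y) for y in grid)
print(min_ratio, max_ratio)
```

On this grid the ratio stays bounded away from zero, which is consistent with the existence of a positive constant $c'$; the check is numerical evidence only, not a proof.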

We also recall that the density $f_n$ of a chi-squared distribution with $n$ degrees of freedom has the form

$$f_n(u) = b_n\, u^{n/2 - 1} e^{-u/2}, \qquad u > 0, \tag{4.31}$$

where $b_n$ is the normalizing constant. Combining the above remarks we get

$$\psi_+ \ge (p - s) \ [\ldots]$$

Using the change of variable $v = u\big(1 + \frac{a^2}{4\sigma^2}\big)$ and the assumptions of the theorem we get

[...]

where the second inequality uses the condition $a \ [\ldots] \ \sqrt{2}$ to guarantee that $[\ldots] \ 2/3$.

Proposition 3.1 from Inglot (2010) implies that, for some absolute constant $c > 0$,

$$\int_n^\infty f_{n-1}(u)\, du > c$$

(indeed, $n$ is very close to the median of a chi-squared random variable with $n - 1$ degrees of freedom). Combining the above inequalities we obtain
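The claim that the mass of $f_{n-1}$ above $n$ stays bounded away from zero can be illustrated by simulation. The sketch below is a Monte Carlo check under arbitrary choices of sample size, seed and values of $n$; it relies on the fact that a chi-squared variable with $k$ degrees of freedom is a Gamma variable with shape $k/2$ and scale $2$. The function name is illustrative.

```python
import random

def upper_mass_at_n(n, num_samples=20000, seed=0):
    """Monte Carlo estimate of P(chi2_{n-1} >= n), the integral of f_{n-1} over [n, inf)."""
    rng = random.Random(seed)
    k = n - 1  # degrees of freedom
    # chi-squared with k degrees of freedom = Gamma(shape=k/2, scale=2)
    hits = sum(1 for _ in range(num_samples) if rng.gammavariate(k / 2.0, 2.0) >= n)
    return hits / num_samples

estimates = {n: upper_mass_at_n(n) for n in (2, 5, 20, 100, 1000)}
for n, est in estimates.items():
    print(n, est)
```

The estimates increase towards $1/2$ as $n$ grows, in line with the remark that $n$ is close to the median of the chi-squared distribution with $n - 1$ degrees of freedom.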

$$\psi_+ \ge C \ [\ldots]$$

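Proof of Theorem 4.3.2. In view of Theorem 4.2.2, it is sufficient to bound from above the expression

$$\psi(n, p, s, a, \sigma) = (p - s)\, P(\sigma\varepsilon > t(\zeta)) + s\, P\big(\sigma\varepsilon \ge (a\|\zeta\| - t(\zeta))_+\big).$$

Introducing the event $D = \{a\|\zeta\| \ge t(\zeta)\}$ we get

$$P\big(\sigma\varepsilon \ge (a\|\zeta\| - t(\zeta))_+\big) \le P\big(\{\sigma\varepsilon \ge a\|\zeta\| - t(\zeta)\} \cap D\big) + \tfrac{1}{2} P(D^c).$$

Using the assumption on $n_2$ we obtain

$$P(D^c) = P \ [\ldots]$$

Here, $\|\zeta\|^2$ is a chi-squared random variable with $n_2$ degrees of freedom. Lemma 4.4.2 implies

$$\tfrac{1}{2} P(D^c) \le [\ldots]$$

Thus, to finish the proof it remains to show that

$$(p - s)\, P(\sigma\varepsilon > t(\zeta)) + s\, P\big(\{\sigma\varepsilon \ge a\|\zeta\| - t(\zeta)\} \cap D\big) \le 2p \ [\ldots]$$

where $f_{n_2}(\cdot)$ is the density of the chi-squared distribution with $n_2$ degrees of freedom and $b_{n_2}$ is the corresponding normalizing constant, cf. (4.31). Using again the bound $P(\varepsilon > y) \le e^{-y^2/2}$, $\forall y > 0$, and the inequality

[...]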

Proof of Lemma 4.4.1. Recall that the density of a Student random variable $Z$ with $k$ degrees of freedom is given by

$$f_Z(t) = c_k \Big(1 + \frac{t^2}{k}\Big)^{-(k+1)/2}, \qquad t \in \mathbb{R}.$$

[...] It is easy to check that the derivative of $g$ has the form

$$g'(t) = [\ldots]$$

The lemma follows since, in view of (4.33), there exist two positive constants $c$ and $C$ such that $c \le c_k \le C$ for all $k \ge 1$.

Proof of Lemma 4.5.1. It is not hard to check that the random variable $|u^\top V| / \|u\|$ is $\sigma$-sub-Gaussian for any fixed $u \in \mathbb{R}^n$. Also, any $\sigma$-sub-Gaussian random variable $\zeta$ satisfies $P(|\zeta| \ge t) \le 2e^{-t^2/(2\sigma^2)}$ for all $t > 0$. Therefore, we have the following bound for the conditional probability:

[...]

To bound the last probability, we apply the following inequality (Wegkamp, 2003, Proposition 2.6).

[...]

Using this lemma with $Z_i = U_i^2$, $\mu_i \equiv 1$, $x = 3/4$, and $v^2 = [\ldots]$ we find

$$P\big(\|U\| \le \sqrt{n}/2\big) \le \exp\Big(-\frac{9n}{32\, [\ldots]}\Big),$$

which together with (4.34) proves the lemma.
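The choice $x = 3/4$ can be traced as follows. This worked step is a sketch: it assumes that $\mu_i$ stands for $E Z_i$ and that the cited inequality controls lower deviations of $\sum_i Z_i$ through a bound of the form $\exp(-n x^2 / (2 v^2))$, since the statement of that inequality is not reproduced here.

```latex
\[
\Big\{ \|U\| \le \tfrac{\sqrt{n}}{2} \Big\}
= \Big\{ \sum_{i=1}^n U_i^2 \le \tfrac{n}{4} \Big\}
= \Big\{ \sum_{i=1}^n (\mu_i - Z_i) \ge \tfrac{3n}{4} \Big\},
\qquad Z_i = U_i^2, \quad \mu_i \equiv 1,
\]
\[
\frac{n x^2}{2 v^2} \Big|_{x = 3/4} = \frac{9n}{32\, v^2}.
\]
```

Under these assumptions the deviation level is $nx$ with $x = 3/4$, which accounts for the factor $9n/32$ in the exponent.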

Proof of Proposition 4.5.1. Under the assumptions of the proposition, the columns of the matrix $X$ have the covariance matrix $I_p$. Without loss of generality, we may assume that this covariance matrix is $\tfrac{1}{2} I_p$ and replace $\sigma$ by $\sigma/\sqrt{2}$. We next define the event

$$A = \{\text{the design matrix } X \text{ satisfies the } WRE(s, 20) \text{ condition}\},$$

where the $WRE$ condition is defined in Bellec et al. (2018). It is easy to check that the assumptions of Theorem 8.3 in Bellec et al. (2018) are fulfilled, with $\Sigma = \tfrac{1}{2} I_p$, $[\ldots] = \tfrac{1}{2}$ and $n_1 \ge C_0\, s \log(2p/s)$ for some $C_0 > 0$ large enough. Using Theorem 8.3 in Bellec et al. (2018) we get

$$P(A^c) \le 3 e^{-C' s \log(2p/s)},$$

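for some $C' > 0$. Now, in order to prove the proposition, we use the bound

$$P\big(\|\hat\beta - \beta\|^2 \ge \delta^2\sigma^2\big) \le P\big(\{\|\hat\beta - \beta\|^2 \ge \delta^2\sigma^2\} \cap A\big) + P(A^c).$$

Under the assumption $n_1 \ge C_0\, s \log(ep/s)/\delta^2$, we have

$$P\big(\{\|\hat\beta - \beta\|^2 \ge \delta^2\sigma^2\} \cap A\big) \le P\Big(\Big\{\|\hat\beta - \beta\|^2 \ge \frac{C_0\, \sigma^2 s \log(ep/s)}{n_1}\Big\} \cap A\Big).$$

By choosing $C_0$ large enough, and using Proposition 4 from Comminges et al. (2018) we get that, for some $C'' > 0$,

$$P\big(\{\|\hat\beta - \beta\|^2 \ge \delta^2\sigma^2\} \cap A\big) \le C''\Big(e^{-s \log(2p/s)/C''} + e^{-n_1/C''}\Big).$$

Recalling that $n_1 \ge C_0\, s \log(2p/s)$ and combining the above inequalities we obtain the result of the proposition with $C_1 = 2C'' + 3$ and $C_2 = C' \wedge 1/C'' \wedge C_0/C''$.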
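The values of $C_1$ and $C_2$ can be read off by chaining the bounds of this proof. The following worked step is a sketch that assumes the two bounds have the form $P(A^c) \le 3e^{-C' s\log(2p/s)}$ and $C''\big(e^{-s\log(2p/s)/C''} + e^{-n_1/C''}\big)$; the shorthand $L = s\log(2p/s)$ is introduced here only for brevity.

```latex
\begin{align*}
C''\big(e^{-L/C''} + e^{-n_1/C''}\big) + 3e^{-C' L}
  &\le C'' e^{-L/C''} + C'' e^{-C_0 L/C''} + 3 e^{-C' L}
     && (n_1 \ge C_0 L) \\
  &\le (2C'' + 3)\, e^{-(C' \wedge 1/C'' \wedge C_0/C'')\, L}
   = C_1 e^{-C_2 L}.
\end{align*}
```

Each of the three exponentials is bounded by the one with the smallest rate, which is why $C_2$ is a minimum of three terms and $C_1$ is the sum of the three prefactors.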

Proof of Proposition 4.6.1. We apply Theorem 6 in Lecué and Lerasle (2017). Thus, it is enough to check that items 1-5 of Assumption 6 in Lecué and Lerasle (2017) are satisfied. Item 1 is immediate since $|I| = n_1 - |O| \ge n_1/2$, and $|O| \le c_0\, s \log(ep/s)$.

To check item 2, we first note that the random variable $x_1^\top t$ is $\|t\|\sigma_X$-sub-Gaussian for any $t \in \mathbb{R}^p$. It follows from the standard properties of sub-Gaussian random variables (Vershynin, 2012, Lemma 5.5) that, for some $C > 0$,

$$\big(E|x_1^\top t|^d\big)^{1/d} \le C \|t\| \sqrt{d}, \qquad \forall t \in \mathbb{R}^p, \ \forall d \ge 1.$$

On the other hand, since the elements of $x_1$ are centered random variables with variance 1,

$$\big(E|x_1^\top t|^2\big)^{1/2} = \|t\|, \qquad \forall t \in \mathbb{R}^p. \tag{4.35}$$

Combining the last two displays proves item 2. Item 3 holds since we assume that $E(|\xi_i|^{q_0}) \le \sigma^{q_0}$, $i \in I$, with $q_0 = 2 + q$. To prove item 4, we use (4.35) and the fact that, for some $C > 0$,

$$E|x_1^\top t| \ge C \|t\|, \qquad \forall t \in \mathbb{R}^p,$$

due to the Marcinkiewicz-Zygmund inequality (Petrov, 1995, page 82). Finally we have that, for some $c > 0$,

[...]

Thus, all conditions of Theorem 6 in Lecué and Lerasle (2017) are satisfied. Application of this theorem yields the result.
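The second-moment identity (4.35) and the first-moment lower bound used for item 4 can be checked exactly in a simple special case. The sketch below assumes, purely for illustration, that $x_1$ has independent entries uniform on $\{-1, +1\}$ (centered, variance 1), and enumerates all sign patterns for an arbitrary direction `t`; neither the distribution nor the vector comes from the source.

```python
import math
from itertools import product

def rademacher_moments(t):
    """Exact E|x^T t| and E|x^T t|^2 when x has i.i.d. uniform +-1 entries."""
    signs = list(product((-1.0, 1.0), repeat=len(t)))
    sums = [abs(sum(s_i * t_i for s_i, t_i in zip(s, t))) for s in signs]
    first = sum(sums) / len(signs)
    second = sum(v * v for v in sums) / len(signs)
    return first, second

t = [0.5, -1.0, 2.0, 0.25, 1.5]  # arbitrary test direction
norm_t = math.sqrt(sum(v * v for v in t))
m1, m2 = rademacher_moments(t)

print(math.sqrt(m2), norm_t)  # second-moment identity, cf. (4.35)
print(m1 / norm_t)            # ratio of the first moment to the norm
```

In this special case the ratio printed last lies between $1/\sqrt{2}$ (the Khintchine constant for sign variables) and $1$, which is one concrete instance of a bound of the form $E|x_1^\top t| \ge C\|t\|$.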

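Proof of Lemma 4.6.1. We first prove that for all $i \in I$ and $1 \le j \le p$,

$$E\big[(\varepsilon_j^{(i)})^2\, \mathbf{1}\{A\}\big] \le \frac{C K \sigma^2}{n}, \tag{4.36}$$

where $C > 0$ depends only on the sub-Gaussian constant $\sigma_X$. Indeed, the components of $\varepsilon^{(i)}$ have the form

[...]

$\bar C$ depends only on the sub-Gaussian constant $\sigma_X$. Using these remarks we obtain from the last display that

$$E\big[(\varepsilon_j^{(i)})^2\, \mathbf{1}\{A\}\big] \le \frac{2(\bar C + 2)\sigma^2}{q}.$$

As $q = \lfloor n/K \rfloor$ this yields (4.36).

Next, the definition of the median immediately implies that

$$\{|\mathrm{Med}(\varepsilon_j)| \ge t\} \subseteq [\ldots]$$

Since the number of outliers $|O|$ does not exceed $\lfloor K/4 \rfloor$ there are at least $K' := K - \lfloor K/4 \rfloor$ blocks that contain only observations from $I$. Without loss of generality, assume that these blocks are indexed by $1, \dots, K'$. Hence

$$P\big(|\mathrm{Med}(\varepsilon_j)| \ge t\big) \le P\Big(\sum_{i=1}^{K'} \mathbf{1}_{\{|\varepsilon_j^{(i)}| \ge t\} \cap A} \ge \frac{K}{4}\Big) + P(A^c). \tag{4.37}$$

Note that using (4.36) we have, for all $i = 1, \dots, K'$,

$$P\big(\{|\varepsilon_j^{(i)}| \ge t\} \cap A\big) \le \frac{E\big[(\varepsilon_j^{(i)})^2\, \mathbf{1}\{A\}\big]}{t^2} \le \frac{C K \sigma^2}{t^2 n} \le \frac{1}{5}.$$

The last inequality is granted by a choice of large enough constant $c_4$ in the definition of $t$. Thus, introducing the notation $\zeta_i = \mathbf{1}_{\{|\varepsilon_j^{(i)}| \ge t\} \cap A}$ we obtain

$$P\Big(\sum_{i=1}^{K'} \mathbf{1}_{\{|\varepsilon_j^{(i)}| \ge t\} \cap A} \ge \frac{K}{4}\Big) \le P\Big(\sum_{i=1}^{K'} (\zeta_i - E\zeta_i) \ge \frac{K}{4} - \frac{K'}{5}\Big) \le P\Big(\sum_{i=1}^{K'} (\zeta_i - E\zeta_i) \ge \frac{K}{20}\Big) \le e^{-c_5 K}, \tag{4.38}$$

where the last inequality is an application of Hoeffding's inequality. Combining (4.37) and (4.38) proves the lemma.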
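The last two steps of (4.38) can be made explicit. The sketch below assumes that $\zeta_1, \dots, \zeta_{K'}$ are independent with values in $\{0, 1\}$, so that Hoeffding's inequality applies in its standard form; the resulting value $c_5 = 1/200$ is one admissible choice under that assumption, not necessarily the constant intended in the source.

```latex
\begin{align*}
\frac{K}{4} - \frac{K'}{5}
  &\ge \frac{K}{4} - \frac{K}{5} = \frac{K}{20}
  && (K' \le K), \\
P\Big( \sum_{i=1}^{K'} (\zeta_i - E\zeta_i) \ge \frac{K}{20} \Big)
  &\le \exp\Big( -\frac{2 (K/20)^2}{K'} \Big)
   \le \exp\Big( -\frac{K}{200} \Big)
  && (K' \le K).
\end{align*}
```

The first line also explains why the bound $E\zeta_i \le 1/5$ is needed: it turns the event $\sum_i \zeta_i \ge K/4$ into a deviation of the centered sum of order $K$.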

### Interplay of minimax estimation and minimax support recovery under sparsity

In this chapter, we study a new notion of scaled minimaxity for sparse estimation in the high-dimensional linear regression model. We present more optimistic lower bounds than those given by the classical minimax theory and hence improve on existing results. We recover sharp results for global minimaxity as a consequence of our study. Fixing the scale of the signal-to-noise ratio, we prove that the estimation error can be much smaller than the global minimax error. We construct a new optimal estimator for scaled minimax sparse estimation. An optimal adaptive procedure is also described.

Based on Ndaoud (2019): Ndaoud, M. (2019). Interplay of minimax estimation and minimax support recovery under sparsity. ALT 2019.
