
given in Theorem 3.5.2. The same modification can be done in Theorem 3.5.1. Namely, under the assumptions of Theorem 3.5.1, with the signal level $a_d$ inflated by a numerical constant $c_0 > 1$, the selector $\hat\eta$ defined by (3.8) with threshold $t = \hat\sigma\sqrt{2\log d}$ achieves exact recovery when $\sigma$ is unknown.
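To make the plug-in rule concrete, the following minimal Python sketch implements a selector of this form. The scale estimator $\hat\sigma$ used below (a median-based estimate) and all parameter values are assumptions of this illustration only; the text does not specify a particular $\hat\sigma$ at this point.

```python
import numpy as np

def select_exact(X, sigma_hat=None):
    """Thresholding selector eta_hat_j = I(|X_j| >= t) with the plug-in threshold
    t = sigma_hat * sqrt(2 log d).  The median-based scale estimate below is only
    an assumption of this sketch, not the estimator prescribed in the text."""
    d = X.size
    if sigma_hat is None:
        sigma_hat = np.median(np.abs(X)) / 0.6745   # crude robust scale estimate (assumed)
    t = sigma_hat * np.sqrt(2.0 * np.log(d))
    return (np.abs(X) >= t).astype(int)

# Demonstration on X_j = theta_j + sigma * xi_j with a strong signal.
rng = np.random.default_rng(0)
d, s, sigma = 10_000, 20, 1.5
a = sigma * (np.sqrt(2 * np.log(d - s)) + np.sqrt(2 * np.log(s)))
theta = np.zeros(d)
theta[:s] = a
X = theta + sigma * rng.standard_normal(d)
eta_hat = select_exact(X)
print("Hamming distance:", int(np.sum(eta_hat != (theta != 0))))
```

On such a simulated instance the selector typically recovers the support exactly, in line with the exact recovery statement above.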

Remark 3.5.3. In this section, the problem of adaptive variable selection was considered only for the classes $\Theta_d(s_d, a_d)$. The corresponding results for the classes $\Theta^+_d(s_d, a_d)$ and $\overline{\Theta}_d(s_d, a_d)$ are completely analogous. We do not state them here for the sake of brevity.

3.6 Appendix: Main proofs

Proof of Theorem 3.2.3. We have, for any $t > 0$,
$$|\hat\eta - \eta| \;=\; \sum_{j:\,\eta_j = 0} \hat\eta_j \;+\; \sum_{j:\,\eta_j = 1} (1 - \hat\eta_j) \;=\; \sum_{j:\,\eta_j = 0} I\bigl(|\sigma\xi_j| \ge t\bigr) \;+\; \sum_{j:\,\eta_j = 1} I\bigl(|\sigma\xi_j + \theta_j| < t\bigr).$$
Now, for any $\theta \in \Theta_d(s, a)$ and any $t > 0$,
$$E\,I\bigl(|\sigma\xi_j + \theta_j| < t\bigr) \;\le\; P\bigl(|\theta_j| - |\sigma\xi_j| < t\bigr) \;\le\; P\bigl(|\xi| > (a - t)/\sigma\bigr) \;=\; P\bigl(|\xi| > (a - t)_+/\sigma\bigr),$$
where $\xi$ denotes a standard Gaussian random variable. Thus, for any $\theta \in \Theta_d(s, a)$,
$$\frac1s\,E|\hat\eta - \eta| \;\le\; \frac{d - |S|}{s}\,P\bigl(|\xi| \ge t/\sigma\bigr) \;+\; \frac{|S|}{s}\,P\bigl(|\xi| > (a - t)_+/\sigma\bigr) \;\le\; 2\,\Psi(d, s, a). \qquad (3.48)$$
Indeed, for $t$ defined in (3.5), $t \le (a - t)_+$ given that $s \le d/2$. Here and in the sequel, $|S|$ denotes the cardinality of $S = S(\theta)$.
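The decomposition and the bound (3.48) are easy to check by simulation. The sketch below (in Python) compares a Monte Carlo estimate of $\frac1s E|\hat\eta - \eta|$ with the right-hand side of the first inequality in (3.48); the particular threshold $t$ used here is an arbitrary choice for the illustration, not necessarily the $t$ defined in (3.5).

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo illustration of the bound (3.48):
#   (1/s) E|eta_hat - eta| <= ((d-|S|)/s) P(|xi| >= t/sigma) + (|S|/s) P(|xi| > (a-t)_+/sigma).
rng = np.random.default_rng(1)
d, s, sigma = 2_000, 10, 1.0
t = sigma * np.sqrt(2 * np.log(d / s - 1))   # an arbitrary threshold for the sketch
a = t + 2.0 * sigma                          # signal level above the threshold
theta = np.zeros(d)
theta[:s] = a
eta = (theta != 0).astype(int)

reps, risk = 2_000, 0.0
for _ in range(reps):
    X = theta + sigma * rng.standard_normal(d)
    eta_hat = (np.abs(X) >= t).astype(int)
    risk += np.sum(eta_hat != eta) / s
risk /= reps

bound = ((d - s) / s) * 2 * norm.sf(t / sigma) + 2 * norm.sf(max(a - t, 0.0) / sigma)
print(f"Monte Carlo risk {risk:.3f} <= bound {bound:.3f}")
```

The two probabilities in the bound correspond exactly to the false positive and false negative terms in the decomposition above.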

Proof of Theorem 3.2.1. Arguing as in the proof of Theorem 3.2.3, we obtain
$$|\hat\eta^+ - \eta| \;=\; \sum_{j:\,\eta_j = 0} I(\sigma\xi_j \ge t) \;+\; \sum_{j:\,\eta_j = 1} I(\sigma\xi_j + \theta_j < t),$$
and $E\,I(\sigma\xi_j + \theta_j < t) \le P\bigl(\xi < (t - a)/\sigma\bigr)$. Thus, for any $\theta \in \Theta^+_d(s, a)$,
$$\frac1s\,E|\hat\eta^+ - \eta| \;\le\; \frac{d - |S|}{s}\,P(\xi \ge t/\sigma) \;+\; \frac{|S|}{s}\,P\bigl(\xi < (t - a)/\sigma\bigr) \;\le\; \Psi^+(d, s, a),$$
by the monotonicity of $\Phi$ and the condition $s \le d/2$.

Proof of Theorem 3.2.2. We prove here the first inequality of Theorem 3.2.2. Since $\tilde\eta_j$ depends only on $X_j$,
$$E|\tilde\eta - \eta| \;=\; \sum_{j=1}^{d} E_{j,\theta_j}\bigl|\tilde\eta_j - \eta_j\bigr|, \qquad (3.49)$$
where $E_{j,\theta_j}$ is the expectation with respect to the distribution of $X_j$.

Let $\Theta_0$ be the set of all $\theta$ in $\Theta^+_d(s, a)$ such that $s$ components $\theta_j$ of $\theta$ are equal to $a$ and the remaining components are equal to $0$; restricting the supremum to $\Theta_0$ and passing to the corresponding Bayes risk yields (3.50). In display (3.50), $E_u$ is understood as the expectation with respect to the distribution of $X = u + \sigma\xi$, where $\xi \sim N(0, 1)$, and $\inf_{T \in [0,1]}$ denotes the infimum over all $[0,1]$-valued statistics $T$.

By the Bayesian version of the Neyman–Pearson lemma, the infimum here is attained for $T = T^*$ given by the corresponding likelihood ratio test. Combining this with (3.49) and (3.50), we get the lower bound (3.51), where $P_{j,\theta_j}$ denotes the distribution of $X_j$, and $\xi$ is a standard Gaussian random variable.

We now bound from above the probabilities $P_{j,\theta_j}(\bar\eta_j = 0)$. Introduce the notation $g(x) = \cosh(ax/\sigma^2)$. The function $x \mapsto P\bigl(-b - x < \sigma\xi < b - x\bigr)$ is monotonically decreasing on $[0, \infty)$. Therefore, the maximum of $P\bigl(-b - \theta_j < \sigma\xi < b - \theta_j\bigr)$ over $\theta_j \ge a$ is attained at $\theta_j = a$. Thus, for any $\theta_j \ge a$ we obtain (3.52) and (3.53), where the last equality follows from the fact that $\xi$ has the same distribution as $-\xi$ and $\cosh$ is an even function. Combining (3.51)–(3.53) proves the theorem.

Proof of Theorem 3.2.5. We follow the lines of the proof of Theorem 3.2.2 with suitable modifications. As shown in the proof of Theorem 3.2.2, a lower bound of the form (3.51) holds for any $\tilde\eta \in \mathcal T$. From the last three displays, we obtain a lower bound on $\sup_\theta \frac1s E|\tilde\eta - \eta|$ in which $E_0$ denotes the expectation with respect to the distribution of $X$ with density $\varphi_\sigma(\cdot)$, and $\bar E$ is the expectation with respect to the distribution of $X$ with the mixture density $\bar\varphi$.

By the Bayesian version of the Neyman–Pearson lemma, the infimum here is attained for $T = \tilde T$ given by the corresponding likelihood ratio, where $P_u$ denotes the probability distribution of $X$ with density $\varphi_\sigma(\cdot - u)$. Note that, for all $x \in \mathbb{R}$, this likelihood ratio can be expressed through $\cosh$. Using this formula with $x = \sigma\xi + a$ and $x = \sigma\xi - a$, and the facts that $\cosh(\cdot)$ is an even function and $\xi$ coincides with $-\xi$ in distribution, we obtain the required expression for $P_a$, which completes the proof.

Proof of Theorem 3.2.6. The upper bounds (3.14), (3.15) and (3.16) follow immediately from (3.2) and Theorems 3.2.1, 3.2.4 and 3.2.3, respectively. We now prove the lower bound (3.17). To this end, first note that for any $\theta \in \Theta^+_d(s, a)$ and any $\tilde\eta \in \mathcal T$ we have the bound (3.58), where $P_{j,u}$ denotes the distribution of $X_j$ when $\theta_j = u$. We now bound the right-hand side of (3.58) by following the argument from the last three lines of (3.50) to the end of the proof of Theorem 3.2.2. Applying this argument yields the lower bound (3.17) for any $\tilde\eta \in \mathcal T$.

We now prove the lower bound (3.18). Let the sets $\Theta^+$ and $\Theta^-$ and the constants $p_j(\theta)$ be the same as in the proof of Theorem 3.2.5. Then, arguing as before, we obtain a lower bound on the risk. We continue along the same lines as in the proof of (3.58) to get, for any separable selector $\tilde\eta$, a bound in which $\bar E_j$ denotes the expected value with respect to $\bar P_j = \frac12(P_{j,a} + P_{j,-a})$. Analogously to the proof of Theorem 3.2.5, the expression in the last display can be further bounded from below by $\bar p\, s\, \Psi(d, s, a)$. Thus, the lower bound (3.18) follows.

Proof of Theorem 3.4.2. (i) It follows from the second inequality in (3.48) that the supremum of the normalized risk of $\hat\eta$ over $\Theta_d(s, a)$ is at most $2\Psi(d, s, a)$. Combining this with (3.60), we obtain an upper bound of the form appearing in (3.33). Now, to prove (3.33) it remains to note that, under assumption (3.32), $a \ge a_0$ and the function
$$a \;\longmapsto\; \frac{a^2 - 2\sigma^2\log\bigl((d - s)/s\bigr)}{a}$$
is monotonically increasing in $a > 0$. On the other hand, the ratio
$$\frac{a_0^2 - 2\sigma^2\log\bigl((d - s)/s\bigr)}{2 a_0 \sigma} \qquad (3.61)$$
coincides with the quantity defined in (3.34).

(ii) We now prove (3.36). By Theorem 3.2.2, the normalized minimax risk $\inf_{\tilde\eta}\sup_{\theta \in \Theta_d(s, a)} \frac1s E|\tilde\eta - \eta|$ admits the corresponding lower bound. Observe that the function $a \mapsto \bigl(2\sigma^2\log((d - s)/s) - a^2\bigr)/a$ is monotonically decreasing in $a > 0$ and that assumption (3.35) states that $a \le a_0$. In view of (3.61), the minimum of this function over $a \le a_0$ is attained at $a = a_0$ and its value is determined by (3.61). The bound (3.36) now follows by the monotonicity of $\Phi(\cdot)$.

Proof of Theorem 3.4.3. Assume without loss of generality that $d$ is large enough to have $(d - s_d)/s_d > 1$. We apply Theorem 3.4.2 with $W = A\sqrt{\log(d/s_d - 1)}$. Equivalently, $d/s_d - 1 \ge 1 + \nu$ and, therefore, using the monotonicity argument, we find that the square of the quantity defined in (3.34) is bounded from below by
$$\frac{A^2\sqrt{2\log(1+\nu)}}{\sqrt{2\log(1+\nu)} + A} \;\longrightarrow\; \infty \quad\text{as } A \to \infty.$$
This and (3.33) imply part (i) of the theorem.

Part (ii) follows from (3.36) by noticing that the square of the quantity defined in (3.34) is bounded by $\sup_{x>0} \frac{A^2 x}{4(x + A)} = A^2/4$ for any fixed $A > 0$. Now, for $s$ large enough, let us put $s_0 = (1 - \varepsilon)s$ for some $\varepsilon \in (0, 1)$, fixed. Thus, the lower bound on the risk becomes
$$(1 - \varepsilon)\,\Phi(-A/2) \;-\; 4\exp\Bigl(-\tfrac{s(1-\varepsilon)^2}{2}\Bigr) \;>\; 0,$$
for $s$ large enough.

Proof of Theorem 3.4.4. Throughout the proof, we assume without loss of generality that $d$ is large enough to have $s_d \ge 2$ and $(d - s_d)/s_d > 1$. Set $W(s) \triangleq 4(\log s + \dots)$ as in the statement of the theorem. By the monotonicity of the quantity defined in (3.34) with respect to $W$, this implies the lower bound (3.64) on that quantity evaluated at $d$ and $W_d$. Now, by Theorem 3.4.2 and using (3.31), we may write an upper bound on $\sup_{\theta \in \Theta_d(s_d, a_d)} \frac1{s_d} E|\hat\eta - \eta|$. This and (3.64) imply that, for all $d$ large enough, the required bound of part (i) holds.

We now prove part (ii) of the theorem. It suffices to consider $W_d > 0$ for all $d$ large enough, since for nonpositive $W_d$ almost full recovery is impossible and the result follows from part (ii) of Theorem 3.4.3. If (3.42) holds, there exists $A < 1$ such that $W_d \le A\,W(s_d)$ for all $d$ large enough. By the monotonicity of the quantity defined in (3.34) with respect to $W$, and in view of equation (3.62), this implies the bound (3.66), where we have used the fact that $A < 1$ and equations (3.62), (3.63). Next, by Theorem 3.4.2 and using (3.31), we have, for $s_0 = s_d/2$, the corresponding lower bound. Combining this inequality with (3.66), we find that, for all $d$ large enough, the desired lower bound on $\inf_{\tilde\eta}\sup_{\theta} \frac1{s_d} E|\tilde\eta - \eta|$ holds. This proves part (ii) of the theorem.

Proof of Theorem 3.5.1. By (3.48), for any $\theta \in \Theta_d(s_d, a_d)$ and any $t > 0$ we have
$$E|\hat\eta - \eta| \;\le\; d\,P\bigl(|\xi| \ge t/\sigma\bigr) \;+\; s_d\,P\bigl(|\xi| > (a_d - t)_+/\sigma\bigr),$$
where $\xi$ is a standard normal random variable. It follows that, for any $a_d \ge \sigma\bigl(\sqrt{2\log(d - s_d)} + \sqrt{2\log(s_d)}\,\bigr)$, any $\theta \in \Theta_d(s_d, a_d)$, and any $t > 0$,
$$E|\hat\eta - \eta| \;\le\; d\,P\bigl(|\xi| \ge t/\sigma\bigr) \;+\; s_d\,P\Bigl(|\xi| > \bigl(\sqrt{2\log(d - s_d)} + \sqrt{2\log(s_d)} - t/\sigma\bigr)_+\Bigr).$$
Without loss of generality assume that $d \ge 6$ and $2 \le s_d \le d/2$. Then, using the inequality $\sqrt{x} - \sqrt{y} \le (x - y)/(2\sqrt{y})$, valid for all $x > y > 0$, we find that, for $t = \sigma\sqrt{2\log d}$,
$$\frac{(a_d - t)_+}{\sigma} \;\ge\; \sqrt{2}\Bigl(\sqrt{\log(d - s_d)} - \sqrt{\log d} + \sqrt{\log(s_d)}\Bigr) \;\ge\; \sqrt{2\log(s_d)} - \frac{\log\bigl(\tfrac{d}{d - s_d}\bigr)}{\sqrt{\log(d - s_d)}} \;\ge\; \sqrt{2\log(s_d)} - \frac{\log 2}{\sqrt{\log(d/2)}} \;>\; 0.$$
From this we also easily deduce that, for $2 \le s_d \le d/2$, we have $\bigl((a_d - t)_+/\sigma\bigr)^2/2 \;\ge\; \log(s_d) - \sqrt{2}\log 2$. Combining these remarks with (3.31) and (3.43), we find
$$\sup_{\theta \in \Theta_d(s_d, a_d)} E|\hat\eta - \eta| \;\le\; \frac{1}{\sqrt{2\log d}} \;+\; \frac{s_d\exp\bigl(-\log(s_d) + \sqrt{2}\log 2\bigr)}{\sqrt{2\log(s_d)}},$$
which immediately implies the theorem by taking the limit as $d \to \infty$.
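As a numerical sanity check, the following Python sketch (with $\sigma = 1$) evaluates each term of the chain of inequalities above for a few values of $(d, s_d)$; the signal level $a_d = \sqrt{2\log(d - s_d)} + \sqrt{2\log s_d}$ used below is an assumption of this illustration.

```python
import numpy as np

# Numerical check of the inequality chain in the proof above (sigma = 1).
for d, s_d in [(10**3, 5), (10**5, 50), (10**7, 10**3)]:
    t = np.sqrt(2 * np.log(d))
    a_d = np.sqrt(2 * np.log(d - s_d)) + np.sqrt(2 * np.log(s_d))   # assumed signal level
    lhs = max(a_d - t, 0.0)
    mid = np.sqrt(2) * (np.sqrt(np.log(d - s_d)) - np.sqrt(np.log(d)) + np.sqrt(np.log(s_d)))
    low = np.sqrt(2 * np.log(s_d)) - np.log(2) / np.sqrt(np.log(d / 2))
    assert lhs >= mid - 1e-9 and mid >= low and low > 0
    print(f"d={d:>8}, s_d={s_d:>4}: (a_d - t)_+ = {lhs:.3f} >= {mid:.3f} >= {low:.3f} > 0")
```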

Proof of Theorem 3.5.2. Throughout the proof, we will write for brevity $s_d = s$, $a_d = a$, $A_d = A$, and set $\sigma = 1$. Since $\Theta_d(s, a) \subseteq \Theta_d(s, a_0(s, A))$ for all $a \ge a_0(s, A)$, it suffices to prove that
$$\lim_{d\to\infty}\ \sup_{\theta \in \Theta_d(s,\, a_0(s, A))}\ \frac1s\, E\bigl|\hat\eta_{\mathrm{ad}} - \eta\bigr| \;=\; 0. \qquad (3.67)$$
Here, $s \le s_d$, and recall that throughout this section we assume that $s_d \le d/4$; since we deal with asymptotics as $d/s_d \to \infty$, the latter assumption is without loss of generality in the current proof.

If $s < g_M$, let $m_0 \in \{2, \dots, M\}$ be the index such that $g_{m_0}$ is the minimal element of the grid that is greater than the true underlying $s$. Thus, $g_{m_0}/2 = g_{m_0-1} \le s < g_{m_0}$. If $s \in [g_M, s_d]$, we set $m_0 = M$. In both cases,
$$s \;\ge\; g_{m_0}/2. \qquad (3.68)$$
We decompose the risk as follows:
$$\frac1s\, E\bigl|\hat\eta_{\mathrm{ad}} - \eta\bigr| \;=\; I_1 + I_2,$$
where
$$I_1 = \frac1s\, E\Bigl[\bigl|\hat\eta(g_{\hat m}) - \eta\bigr|\, I(\hat m \le m_0)\Bigr], \qquad I_2 = \frac1s\, E\Bigl[\bigl|\hat\eta(g_{\hat m}) - \eta\bigr|\, I(\hat m \ge m_0 + 1)\Bigr].$$


Next, note that the first inequality in (3.48) is true for any $t > 0$. Applying it with $t = w(g_{m_0})$, we obtain (3.70), where $\xi$ is a standard Gaussian random variable. Using the bound on the Gaussian tail probability and the fact that $g_{m_0} > s \ge g_{m_0}/2$, we get (3.71). To bound the second probability on the right-hand side of (3.70), we use the following lemma.

Lemma 3.6.1. Under the assumptions of Theorem 3.5.2, for any $m \ge m_0$ we have
$$P\Bigl(|\xi| > \bigl(a_0(s, A) - w(g_m)\bigr)_+\Bigr) \;\le\; \bigl(\log(d/s_d - 1)\bigr)^{-1/2}. \qquad (3.72)$$
Combining (3.70), (3.71) and (3.72) with $m = m_0$, we find
$$\frac1s\, E\bigl|\hat\eta(g_{m_0}) - \eta\bigr| \;\le\; \frac{4\pi^{-1/2} + 1}{\sqrt{\log(d/s_d - 1)}}, \qquad (3.73)$$
which together with (3.69) leads to the bound
$$I_1 \;\le\; 4\tau + \frac{4\pi^{-1/2} + 1}{\sqrt{\log(d/s_d - 1)}}. \qquad (3.74)$$

We now turn to the evaluation of $I_2$. It is enough to consider the case $m_0 \le M - 1$, since $I_2 = 0$ when $m_0 = M$. We have
$$I_2 \;=\; \frac1s \sum_{m = m_0 + 1}^{M} E\Bigl[\bigl|\hat\eta(g_{\hat m}) - \eta\bigr|\, I(\hat m = m)\Bigr] \;\le\; \frac1s \sum_{m = m_0 + 1}^{M} \Bigl(E\bigl|\hat\eta(g_m) - \eta\bigr|^2\Bigr)^{1/2} \bigl(P(\hat m = m)\bigr)^{1/2}. \qquad (3.75)$$

By definition, the event $\{\hat m = m\}$ implies that $\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr) > \tau g_m \triangleq v_m$, where we set for brevity $w_m = w(g_m)$. Thus,
$$P(\hat m = m) \;\le\; P\Biggl(\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr) > v_m\Biggr). \qquad (3.76)$$
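To make the quantities in (3.76) concrete, here is a small numerical sketch (in Python, with $\sigma = 1$) that computes the band counts $\sum_j I\bigl(w_m \le |X_j| < w_{m-1}\bigr)$ and the levels $v_m = \tau g_m$ on the dyadic grid $g_m = 2^m$. The grid range, the signal level and the choice of $\tau$ below are assumptions of this sketch and are not the exact definitions given in Section 3.5.

```python
import numpy as np

def w(g, d):
    """Threshold w(g) = sqrt(2 log(d/g - 1)) (sigma = 1), the form used in the proof of Lemma 3.6.1."""
    return np.sqrt(2.0 * np.log(d / g - 1.0))

def band_counts(X, s_d, tau):
    """For each grid point g_m = 2^m, return the count N_m = #{j : w(g_m) <= |X_j| < w(g_{m-1})}
    together with the level v_m = tau * g_m appearing in (3.76).  The grid bounds and indexing
    below are assumptions of this sketch only."""
    d = X.size
    M = int(np.floor(np.log2(s_d)))
    out = []
    for m in range(2, M + 1):
        g_m, g_prev = 2 ** m, 2 ** (m - 1)
        N_m = int(np.sum((np.abs(X) >= w(g_m, d)) & (np.abs(X) < w(g_prev, d))))
        out.append((m, N_m, tau * g_m))
    return out

rng = np.random.default_rng(2)
d, s, s_d = 2 ** 14, 40, 256
theta = np.zeros(d); theta[:s] = 4.0                   # arbitrary signal level for the sketch
X = theta + rng.standard_normal(d)
tau = np.log(d / s_d - 1) ** (-1 / 7)                  # tuning of the order used in the proof (assumed)
for m, N_m, v_m in band_counts(X, s_d, tau):
    print(f"m={m}: N_m={N_m:4d}, v_m={v_m:7.1f}")
```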

By Bernstein's inequality, for any $t > 0$ we have
$$P\Biggl(\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr) - E\Biggl[\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr)\Biggr] > t\Biggr) \;\le\; \exp\Biggl(-\,\frac{t^2/2}{\sum_{j=1}^{d} E\bigl[I(w_m \le |X_j| < w_{m-1})\bigr] + 2t/3}\Biggr), \qquad (3.77)$$
where we have used that, for random variables with values in $\{0, 1\}$, the variance is smaller than the expectation.
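The Bernstein bound (3.77) can be checked numerically for a sum of independent $\{0,1\}$-valued variables; a minimal sketch with arbitrary illustrative parameters:

```python
import numpy as np

# Monte Carlo check of the Bernstein bound (3.77) for a sum of i.i.d. Bernoulli variables,
# using that the variance of a Bernoulli variable is at most its mean.
rng = np.random.default_rng(3)
d, p, t = 5_000, 0.002, 15.0
mean = d * p
reps = 200_000
S = rng.binomial(d, p, size=reps)                      # each draw is a sum of d Bernoulli(p)
empirical = np.mean(S - mean > t)
bernstein = np.exp(-(t ** 2 / 2) / (mean + 2 * t / 3))
print(f"P(S - E S > t): empirical {empirical:.2e} <= Bernstein bound {bernstein:.2e}")
```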

Now, similarly to (3.48), for any $\theta \in \Theta_d(s, a_0(s, A))$,
$$E\Biggl[\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr)\Biggr] \;\le\; d\,P\bigl(w_m \le |\xi| < w_{m-1}\bigr) + \sum_{j:\,\theta_j \ne 0} P\bigl(|\theta_j + \xi| < w_{m-1}\bigr) \;\le\; d\,P\bigl(|\xi| \ge w_m\bigr) + s\,P\bigl(|\xi| > \bigl(a_0(s, A) - w_{m-1}\bigr)_+\bigr),$$

where $\xi$ is a standard Gaussian random variable. Since $m \ge m_0 + 1$, from Lemma 3.6.1 we get
$$P\bigl(|\xi| > \bigl(a_0(s, A) - w_{m-1}\bigr)_+\bigr) \;\le\; \bigl(\log(d/s_d - 1)\bigr)^{-1/2}. \qquad (3.78)$$
Next, using the bound on the Gaussian tail probability and the inequalities $g_m \le s_d \le d/4$, we find
$$d\,P\bigl(|\xi| \ge w_m\bigr) \;\le\; \frac{d}{d/g_m - 1}\cdot\frac{\pi^{-1/2}}{\sqrt{\log(d/g_m - 1)}} \;\le\; \frac{(4/3)\,\pi^{-1/2}\, g_m}{\sqrt{\log(d/s_d - 1)}}. \qquad (3.79)$$

We now deduce from (3.78) and (3.79), and the inequality $s \le g_m$ for $m \ge m_0 + 1$, that
$$E\Biggl[\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr)\Biggr] \;\le\; \frac{\bigl((4/3)\pi^{-1/2} + 1\bigr) g_m}{\sqrt{\log(d/s_d - 1)}} \;\le\; 2\tau g_m. \qquad (3.80)$$
Taking $t = 3\tau g_m = 3 v_m$ in (3.77) and using (3.80), we find
$$P\Biggl(\sum_{j=1}^{d} I\bigl(w_m \le |X_j| < w_{m-1}\bigr) > v_m\Biggr) \;\le\; \exp(-C_1 v_m) \;=\; \exp\bigl(-C_1 2^m \tau\bigr),$$
for some absolute constant $C_1 > 0$. This implies

$$P(\hat m = m) \;\le\; \exp\bigl(-C_1 2^m \tau\bigr). \qquad (3.81)$$
On the other hand, notice that the bounds (3.70) and (3.71) are valid not only for $g_{m_0}$ but also for any $g_m$ with $m \ge m_0 + 1$. Using this observation and Lemma 3.6.1, we get that, for any $\theta \in \Theta_d(s, a_0(s, A))$ and any $m \ge m_0 + 1$,
$$E\bigl|\hat\eta(g_m) - \eta\bigr| \;\le\; s\left[\frac{d/s}{d/g_m - 1}\cdot\frac{\pi^{-1/2}}{\sqrt{\log(d/g_m - 1)}} + \bigl(\log(d/s_d - 1)\bigr)^{-1/2}\right] \;\le\; \frac{\bigl((4/3)\pi^{-1/2} + 1\bigr) g_m}{\sqrt{\log(d/s_d - 1)}} \;\triangleq\; \tau'\, g_m \;=\; \tau'\, 2^m, \qquad (3.82)$$

where the last inequality follows from the same argument as in (3.79). We denote by $\operatorname{Var}\bigl(|\hat\eta(g_m) - \eta|\bigr)$ the variance of $|\hat\eta(g_m) - \eta|$. Observing that $|\hat\eta(g_m) - \eta|$ is a sum of independent Bernoulli random variables, we get
$$E\bigl|\hat\eta(g_m) - \eta\bigr|^2 \;=\; \operatorname{Var}\bigl(|\hat\eta(g_m) - \eta|\bigr) + \bigl(E\bigl|\hat\eta(g_m) - \eta\bigr|\bigr)^2 \;\le\; E\bigl|\hat\eta(g_m) - \eta\bigr| + \bigl(E\bigl|\hat\eta(g_m) - \eta\bigr|\bigr)^2.$$
Using (3.82) and the fact that $\tau'$ is bounded, we get that
$$E\bigl|\hat\eta(g_m) - \eta\bigr|^2 \;\le\; C_2\, \tau'\, 2^{2m}, \qquad (3.83)$$
for some absolute constant $C_2 > 0$.
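The second-moment step behind (3.83), namely that the variance of a sum of independent Bernoulli variables is at most its mean, can also be verified numerically; a minimal sketch with arbitrary success probabilities:

```python
import numpy as np

# Check of the second-moment bound: if S is a sum of independent Bernoulli variables,
# then Var(S) <= E S, hence E S^2 <= E S + (E S)^2.
rng = np.random.default_rng(5)
p = rng.uniform(0.0, 1.0, size=200)                   # arbitrary success probabilities
S = (rng.random((20_000, p.size)) < p).sum(axis=1)    # 20,000 independent copies of S
lhs = np.mean(S.astype(float) ** 2)
rhs = p.sum() + p.sum() ** 2
print(f"E S^2 ~ {lhs:.1f} <= E S + (E S)^2 = {rhs:.1f}")
```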

Now, we plug (3.81) and (3.83) into (3.75) to obtain
$$I_2 \;\le\; \frac{(C_2 \tau')^{1/2}}{s} \sum_{m = m_0 + 1}^{M} 2^m \exp\bigl(-C_1 2^{m-1}\tau\bigr) \;\le\; C_3\, (\tau')^{1/2}\, \tau^{-1} \exp\bigl(-C_1 2^{m_0 - 1}\tau\bigr) \;\le\; C_3\, (\tau')^{1/2}\, \tau^{-1},$$
for some absolute constant $C_3 > 0$. Notice that $(\tau')^{1/2} = O\bigl((\log(d/s_d - 1))^{-1/4}\bigr)$ as $d/s_d \to \infty$, while $\tau^{-1} = O\bigl((\log(d/s_d - 1))^{1/7}\bigr)$. Thus, $I_2 = o(1)$ as $d \to \infty$. Since from (3.74) we also get that $I_1 = o(1)$ as $d \to \infty$, the proof is complete.

Proof of Lemma 3.6.1. Let first $s < g_M$. Then, by the definition of $m_0$, we have $s < g_{m_0}$. Therefore, $s < g_m$ for $m \ge m_0$, and we have $w(g_m) < w(s)$. It follows that
$$a_0(s, A) - w(g_m) \;\ge\; a_0(s, A) - w(s) \;\ge\; \frac{\sqrt{A}}{2\sqrt{2}}\,\min\left(\sqrt{\frac{A}{2}},\ \log^{1/4}(d/s - 1)\right),$$
where we have used the elementary inequalities
$$\sqrt{x + y} - \sqrt{x} \;\ge\; \frac{y}{2\sqrt{x + y}} \;\ge\; (2\sqrt{2})^{-1}\min\bigl(y/\sqrt{x},\ \sqrt{y}\bigr)$$
with $x = 2\log(d/s - 1)$ and $y = A\sqrt{\log(d/s - 1)}$. By assumption, $A \ge 16\sqrt{\log\log(d/s_d - 1)}$, so that we get
$$a_0(s, A) - w(g_m) \;\ge\; a_0(s, A) - w(s) \;\ge\; 4\left(\log\log\Bigl(\frac{d}{s_d} - 1\Bigr)\right)^{1/2}. \qquad (3.84)$$
This and the standard bound on the Gaussian tail probability imply
$$P\bigl(|\xi| > \bigl(a_0(s, A) - w(g_m)\bigr)_+\bigr) \;\le\; \exp\Bigl(-\bigl(a_0(s, A) - w(g_m)\bigr)^2/2\Bigr) \;\le\; \bigl(\log(d/s_d - 1)\bigr)^{-1/2}. \qquad (3.85)$$
Let now $s \in [g_M, s_d]$. Then $m_0 = M$ and we need to prove the result only for $m = M$. By the definition of $M$, we have $s_d \le 2 g_M$. This and (3.84) imply

$$a_0(s, A) - w(g_M) \;\ge\; a_0(s, A) - w(s) - \bigl(w(s_d/2) - w(s_d)\bigr) \;\ge\; 4\left(\log\log\Bigl(\frac{d}{s_d} - 1\Bigr)\right)^{1/2} - \bigl(w(s_d/2) - w(s_d)\bigr).$$
Now, using the elementary inequality $\sqrt{\log(x + y)} - \sqrt{\log(x)} \le y/\bigl(2x\sqrt{\log(x)}\bigr)$ with $x = d/s_d - 1$ and $y = d/s_d$, and the fact that $s_d \le d/4$, we find
$$w(s_d/2) - w(s_d) \;\le\; \frac{1}{\sqrt{2\log(d/s_d - 1)}}\cdot\frac{d}{d - s_d} \;\le\; \frac{2\sqrt{2}}{3\sqrt{\log(d/s_d - 1)}}\,.$$
Since the right-hand side tends to $0$ as $d/s_d \to \infty$, the two previous displays give the bound (3.84), with a slightly smaller constant, for $m = M$ as well, and the conclusion (3.72) follows exactly as in (3.85). This completes the proof.
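Both elementary inequalities used in this proof are straightforward to verify numerically; the following sketch samples random $(x, y)$ and checks them (the sampling ranges are arbitrary):

```python
import numpy as np

# Numerical check of the two elementary inequalities used in this proof:
#   sqrt(x+y) - sqrt(x) >= y/(2 sqrt(x+y)) >= (2 sqrt(2))^{-1} min(y/sqrt(x), sqrt(y)),  x, y > 0,
#   sqrt(log(x+y)) - sqrt(log(x)) <= y/(2 x sqrt(log(x))),                               x > 1, y > 0.
rng = np.random.default_rng(6)
x = rng.uniform(0.1, 100.0, size=100_000)
y = rng.uniform(0.1, 100.0, size=100_000)
lhs = np.sqrt(x + y) - np.sqrt(x)
mid = y / (2.0 * np.sqrt(x + y))
low = np.minimum(y / np.sqrt(x), np.sqrt(y)) / (2.0 * np.sqrt(2.0))
assert np.all(lhs >= mid - 1e-12) and np.all(mid >= low - 1e-12)

x2 = rng.uniform(1.5, 100.0, size=100_000)
assert np.all(np.sqrt(np.log(x2 + y)) - np.sqrt(np.log(x2)) <= y / (2.0 * x2 * np.sqrt(np.log(x2))) + 1e-12)
print("all sampled (x, y) satisfy both elementary inequalities")
```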
