
In the regime where n log n = o(p), we proved that for any ε > 0, the condition

Δ² ≥ (1 − ε) σ² (2p log n / n)^{1/2}

is necessary to achieve exact recovery. This is a consequence of considering a Gaussian prior on θ, which makes recovering its direction the hardest. We give here a heuristic argument for why this should hold independently of the choice of prior, as long as θ is uniformly well spread (i.e., not sparse). Suppose that we put a Rademacher prior on θ such that

θ = (Δ/√p) ζ, where ζ is a random vector with i.i.d. Rademacher entries. Following the same argument as in Proposition 6.2.1, it is clear that a necessary condition to get non-trivial correlation with ζ is given by

Δ² ≥ c σ² p/n,

for some c > 0. Observing that, in the hard estimation regime, we have

(p log n / n)^{1/2} = o(p/n),

it follows that, while exact recovery of η is possible, non-trivial correlation with ζ is impossible. Consequently, there is no hope of achieving exact recovery through non-trivial correlation with θ in the hard estimation regime.
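To make the order comparison explicit, here is a one-line check written as a LaTeX snippet; the only assumption is the regime n log n = o(p) stated above:

\[
\sqrt{\frac{p\log n}{n}}
\;=\; \frac{p}{n}\,\sqrt{\frac{n\log n}{p}}
\;=\; o\!\left(\frac{p}{n}\right)
\qquad \text{whenever } n\log n = o(p).
\]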

Conjecture 6.6.1. Let Δ > 0. Assume that Y follows model (6.1). Let η be a random vector with i.i.d. Rademacher entries, and let θ = (Δ/√p) ζ, where ζ is a random vector with i.i.d. Rademacher entries, independent of η. Assume that n log n = o(p). Prove or disprove that, for any ε > 0,

Δ² ≥ (1 − ε) σ² √(2p log n / n)

is necessary to achieve exact recovery.

In particular, a positive answer to this question would be very useful for deriving optimal conditions for exact recovery in bipartite graph models, among other problems.
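For comparison, the conjectured rate coincides asymptotically with the threshold obtained under the Gaussian prior: assuming, as in the proof of Theorem 6.7.2 below, that r_n² ≥ A is equivalent to Δ² ≥ σ² (A/2)(1 + √(1 + 4p/(nA))), the choice A = 2 log n gives

\[
\Delta^2 \;\ge\; \sigma^2\Big(\log n+\sqrt{\log^2 n+\tfrac{2p\log n}{n}}\Big)
\;=\;(1+o(1))\,\sigma^2\sqrt{\frac{2p\log n}{n}}
\qquad\text{when } n\log n=o(p),
\]

which is the rate appearing in Conjecture 6.6.1.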

6.7 Asymptotic analysis: almost full recovery

In this section, we conduct the asymptotic analysis of the problem of almost full recovery in the two component Gaussian mixture model. We first recall the terminology used in Butucea et al. (2018) that we adopt for the problem of almost full recovery.

Definition 6.7.1. Let (Ω_{Δ_n})_{n≥2} be a sequence of classes corresponding to (Δ_n)_{n≥2}:

• We say that almost full recovery is possible for (Ω_{Δ_n})_{n≥2} if there exists an estimator η̂ such that

lim_{n→∞} sup_{(θ,η) ∈ Ω_{Δ_n}} (1/n) E_{(θ,η)} r(η̂, η) = 0.   (6.17)

In this case, we say that η̂ achieves almost full recovery.

• We say that almost full recovery is impossible for (Ω_{Δ_n})_{n≥2} if

lim inf_{n→∞} inf_{η̃} sup_{(θ,η) ∈ Ω_{Δ_n}} (1/n) E_{(θ,η)} r(η̃, η) > 0,   (6.18)

where inf_{η̃} denotes the infimum over all estimators η̃ with values in {−1, 1}^n.
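Exact recovery, which appears in Theorem 6.7.1(ii) below, is a strictly stronger requirement. Assuming it is defined, as is standard, through the non-normalized risk, the two notions compare as

\[
\sup_{(\theta,\eta)\in\Omega_{\Delta_n}} \mathbf{E}_{(\theta,\eta)}\, r(\hat\eta,\eta) \longrightarrow 0
\quad\Longrightarrow\quad
\frac{1}{n}\sup_{(\theta,\eta)\in\Omega_{\Delta_n}} \mathbf{E}_{(\theta,\eta)}\, r(\hat\eta,\eta) \longrightarrow 0 ,
\]

that is, exact recovery allows only o(1) expected mislabeled entries, while almost full recovery allows o(n).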

The following general characterization theorem is a straightforward corollary of the results of previous sections.

Theorem 6.7.1. (i) Almost full recovery is possible for (Ω_{Δ_n})_{n≥2} if and only if

ψ_c(r_n) → 0 as n → ∞.   (6.19)

In this case, the estimator η̂_k defined in (6.12)–(6.13), with k = ⌊3 log n⌋, achieves almost full recovery.

(ii) Exact recovery is impossible for (Ω_{Δ_n})_{n≥2} if, for some ε > 0,

lim inf_{n→∞} n ψ_c(r_n(1 + ε)) > 0,   (6.20)

and possible if, for some ε > 0,

n ψ_c(r_n(1 − ε)) → 0 as n → ∞.   (6.21)

In this case, the estimator η̂_k defined in (6.12)–(6.13), with k = ⌊3 log n⌋, achieves exact recovery.

Although this theorem gives a complete solution to the problem of almost full and exact recovery, conditions (6.19), (6.20) and (6.21) are not quite explicit. The next theorem is a consequence of Theorem 6.7.1; it describes a “phase transition” for Δ_n in the problem of almost full recovery.

Theorem 6.7.2. (i) If σ²(1 + √(p/n)) = o(Δ_n²), then the estimator η̂_k defined in (6.12)–(6.13), with k = ⌊3 log n⌋, achieves almost full recovery.

(ii) Moreover, if Δ_n² = O(σ²(1 + √(p/n))), then almost full recovery is impossible.

Theorem 6.7.2 shows that almost full recovery occurs if and only if

σ²(1 + √(p/n)) = o(Δ_n²).   (6.22)
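Condition (6.22) can be matched with (6.19) as follows. Assuming that r_n is obtained from x = Δ_n²/σ² through the map x ↦ x/√(x + p/n) (the map inverted in the proof of Theorem 6.7.2 below), and that ψ_c(r_n) → 0 exactly when r_n → ∞, one has

\[
r_n=\frac{\Delta_n^2/\sigma^2}{\sqrt{\Delta_n^2/\sigma^2+p/n}}\;\longrightarrow\;\infty
\quad\Longleftrightarrow\quad
\frac{\Delta_n^2}{\sigma^2}\to\infty \ \text{ and } \ \frac{\Delta_n^2}{\sigma^2\sqrt{p/n}}\to\infty
\quad\Longleftrightarrow\quad
\sigma^2\big(1+\sqrt{p/n}\big)=o(\Delta_n^2),
\]

so (6.19) and (6.22) describe the same regime.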


6.8 Appendix: Main proofs

In all the proofs of lower bounds, we follow the same argument as in Theorem 2 of Gao et al. (2018) in order to replace the minimax risk of r(η̃, η) by a minimax Hamming risk. Let z be a vector of labels in {−1, 1}^n and let T be a subset of {1, . . . , n} of size ⌊n/2⌋ + 1. A lower bound on the minimax risk is obtained on the subset Z of label vectors η such that η_i = z_i for all i ∈ T. Observe that in that case

r(η₁, η₂) = |η₁ − η₂|,

for any η₁, η₂ ∈ Z. The argument in Gao et al. (2018) leads to

inf_{η̃} sup_{(θ,η) ∈ Ω_Δ} (1/n) E_{(θ,η)} r(η̃, η) ≥ (c/|T^c|) Σ_{i ∈ T^c} inf_{η̃_i} E E_{(θ,η)} |η̃_i − η_i|,

for some c > 0 and for any prior π that is invariant under a sign change; this is typically the case for a Rademacher prior on the labels. As a consequence, a lower bound on the minimax risk follows from a lower bound on the minimax Hamming risk on the right-hand side.

Proof of Proposition 6.2.1

Let θ̄ be a vector in R^p such that ‖θ̄‖ = Δ. Placing an independent Rademacher prior π on η, and fixing θ, it follows that

inf_{η̃_j} E E_{(θ,η)} |η̃_j − η_j| ≥ inf_{η̄_j} E E_{(θ,η)} |η̄_j(Y_j) − η_j|,   (6.23)

where the infimum on the right-hand side is over all measurable η̄_j with values in [−1, 1]. The last inequality holds because of the independence between the priors. We define, for ε ∈ {−1, 1}, f̃_ε(·) the density of the observation Y_j conditionally on the value η_j = ε. Now, using the Neyman–Pearson lemma and the explicit form of f̃_ε, we get that the selector η* given by

η*_j = sign(θ̄ᵀ Y_j), ∀ j = 1, . . . , n

is the optimal selector achieving the minimum on the right-hand side of (6.23). Plugging this value into (6.23), we get further that

inf_{η̄_j} E E_{(θ,η)} |η̄_j(Y_j) − η_j| = 2 ψ_c(Δ/σ).
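To make the last step concrete, here is the computation, assuming that model (6.1) reads Y_j = η_j θ̄ + σ ξ_j with ξ_j ∼ N(0, I_p) and that ψ_c is the corresponding Gaussian-tail quantity (both are notational assumptions not restated in this excerpt):

\[
\mathbf{P}\big(\mathrm{sign}(\bar\theta^\top Y_j)\neq\eta_j\big)
=\mathbf{P}\big(\eta_j\,\bar\theta^\top Y_j<0\big)
=\mathbf{P}\big(\Delta^2+\sigma\,\eta_j\,\bar\theta^\top\xi_j<0\big)
=\Phi\!\Big(-\frac{\Delta}{\sigma}\Big),
\]

since η_j θ̄ᵀξ_j ∼ N(0, Δ²); the Bayes risk in (6.23) is twice this error probability because |η*_j − η_j| = 2·1(η*_j ≠ η_j) when both take values in {−1, 1}.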

Proof of Theorem 6.2.1

Throughout the proof, we write for brevity A = Ω_Δ. Set η^A = η 1((θ, η) ∈ A) and denote by π̄_A the probability measure π conditioned on the event {(θ, η) ∈ A}, that is, for any C ⊆ R^p × {−1, 1}^n,

π̄_A(C) = π({(θ, η) ∈ C} ∩ {(θ, η) ∈ A}) / π((θ, η) ∈ A).

The measure π̄_A is supported on A and we have

inf_{η̃_j} E_{π̄_A} E_{(θ,η)} |η̃_j − η_j| = inf_{η̃_j} E_{π̄_A} E_{(θ,η)} |η̃_j − η_j^A| ≥ inf_{T̂_j} E_{π̄_A} E_{(θ,η)} |T̂_j − η_j^A|,

where inf_{T̂_j} is the infimum over all estimators T̂_j = T̂_j(Y) with values in R. According to Theorem 1.1 and Corollary 1.2 on page 228 in Lehmann and Casella (2006), there exists a Bayes estimator B_j^A = B_j^A(Y) such that

inf_{T̂_j} E_{π̄_A} E_{(θ,η)} |T̂_j − η_j^A| = E_{π̄_A} E_{(θ,η)} |B_j^A − η_j^A|,

and this estimator is a conditional median of η_j^A given Y. Therefore,

inf_{η̃} sup_{(θ,η) ∈ Ω_Δ} (1/n) E_{(θ,η)} r(η̃, η) ≥ c ( (1/⌊n/2⌋) Σ_{j=1}^{⌊n/2⌋} E_{π̄_A} E_{(θ,η)} |B_j^A − η_j^A| ).   (6.24)

Note that B_j^A ∈ [−1, 1] since η_j^A takes its values in [−1, 1]. Using this, we obtain

inf_{T̂_j ∈ [−1,1]} E E_{(θ,η)} |T̂_j − η_j| ≤ E E_{(θ,η)} |B_j^A − η_j|
= E E_{(θ,η)} ( |B_j^A − η_j| 1((θ,η) ∈ A) ) + E E_{(θ,η)} ( |B_j^A − η_j| 1((θ,η) ∈ A^c) )
≤ E_{π̄_A} E_{(θ,η)} |B_j^A − η_j^A| + E E_{(θ,η)} ( |B_j^A − η_j| 1((θ,η) ∈ A^c) )
≤ E_{π̄_A} E_{(θ,η)} |B_j^A − η_j^A| + 2 P((θ,η) ∉ A).   (6.25)

The result follows by combining (6.24) and (6.25).
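The step in (6.25) that replaces the term on the event A by the conditional Bayes risk can be unpacked as follows (our rewriting, using only the definition of π̄_A):

\[
\mathbf{E}\,\mathbf{E}_{(\theta,\eta)}\big(|B_j^A-\eta_j|\,\mathbf{1}((\theta,\eta)\in A)\big)
= \pi\big((\theta,\eta)\in A\big)\,\mathbf{E}_{\bar\pi_A}\mathbf{E}_{(\theta,\eta)}\big|B_j^A-\eta_j^A\big|
\;\le\; \mathbf{E}_{\bar\pi_A}\mathbf{E}_{(\theta,\eta)}\big|B_j^A-\eta_j^A\big|,
\]

using that η_j = η_j^A on the event A and that π((θ, η) ∈ A) ≤ 1.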

Proof of Proposition 6.2.2

We start by using the fact that

E E_{(θ,η)} |η̂_i − η_i| = E_{p_{−i}} E_{p_i} E_{(θ,η)} ( |η̂_i − η_i| | (η_j)_{j≠i} ),

where p_i is the marginal of π on (θ, η_i), while p_{−i} is the marginal of π on (η_j)_{j≠i}. Using the independence between the different priors, one may observe that π = p_i × p_{−i}. We define, for ε ∈ {−1, 1}, f̃_ε^i the density of the observation Y given (η_j)_{j≠i} and given η_i = ε. Using the Neyman–Pearson lemma, we get that

η_i** = 1 if f̃_1^i(Y) ≥ f̃_{−1}^i(Y), and η_i** = −1 otherwise,

minimizes E_{p_i} E_{(θ,η)} ( |η̂_i − η_i| | (η_j)_{j≠i} ) over all functions of (η_j)_{j≠i} and of Y with values in [−1, 1]. Using the independence of the rows of Y, we have

f̃_ε^i(Y) = ∏_{j=1}^{p} exp(−(1/2) L_jᵀ Σ⁻¹ L_j) / ((2π)^{n/2} |Σ|^{1/2}),

where L_j is the j-th row of Y and Σ = I_n + α² ηηᵀ, with η denoting the binary vector whose i-th component equals ε and whose other components are the known labels (η_j)_{j≠i}. It is easy to check that |Σ| = 1 + α² n, hence it does not depend on ε. A simple calculation leads to

Σ⁻¹ = I_n − (α²/(1 + α² n)) ηηᵀ.
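This inverse can be verified directly; the only fact used is ηᵀη = n:

\[
\big(I_n+\alpha^2\eta\eta^\top\big)\Big(I_n-\frac{\alpha^2}{1+\alpha^2 n}\,\eta\eta^\top\Big)
= I_n+\Big(\alpha^2-\frac{\alpha^2}{1+\alpha^2 n}-\frac{\alpha^4 n}{1+\alpha^2 n}\Big)\eta\eta^\top
= I_n ,
\]

since (ηηᵀ)(ηηᵀ) = (ηᵀη) ηηᵀ = n ηηᵀ.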

It is now immediate that

η_i** = sign( Y_iᵀ Σ_{j≠i} η_j Y_j ).
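The computation behind the last display can be sketched as follows (our reconstruction; Y_i stands for the i-th observation, i.e. the i-th column of Y in the row convention used above, and η_1, η_{−1} are the two candidate label vectors that differ only in their i-th entry):

\[
\log\frac{\tilde f^{\,i}_{1}(Y)}{\tilde f^{\,i}_{-1}(Y)}
=\frac{\alpha^2}{2(1+\alpha^2 n)}\sum_{j=1}^{p}\Big[(\eta_{1}^\top L_j)^2-(\eta_{-1}^\top L_j)^2\Big]
=\frac{2\alpha^2}{1+\alpha^2 n}\;Y_i^\top\!\sum_{j\neq i}\eta_j Y_j ,
\]

and the sign of this log-likelihood ratio is exactly the sign of Y_iᵀ Σ_{j≠i} η_j Y_j.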

Proof of Proposition 6.2.3

Combining Theorem 6.2.1 and Proposition 6.2.2, we obtain a lower bound with two terms. Recall that here θ has i.i.d. centered Gaussian entries with variance α²; this yields the second term on the R.H.S. of the inequality of Proposition 6.2.3. For the first term, one may notice that the vectors η_i Y_i, for i = 1, . . . , n, are i.i.d., and we then use the definition of G in (6.9) to conclude.

Proof of Theorem 6.2.2

We prove the result by considering separately the following three cases.

1. Case  logp2n(n). In this case we use Proposition 6.2.1.

Hence, we use Proposition 6.2.3. Set α² such that

Δ² =

where ξ₁, ξ₂ are two independent random vectors with i.i.d. standard Gaussian entries and θ is drawn from the independent Gaussian prior. Moreover, using independence, we have

where ε is a standard Gaussian random variable. Fix θ and define the random event

Using the definition of α², we bound P(B^c), and we conclude by combining (6.27) and (6.28).

Proof of Theorem 6.3.1

We begin by writing that

Based on Lemma 6.9.2, we can bound the operator norm ‖Z₂‖_op.

Using the Davis–Kahan sin θ theorem (cf. Theorem 4.5.5 in Vershynin (2018)), we obtain

Hence, using Lemma 6.9.5, we get

For the inequality in probability, we first observe, using (6.31), that

(1/n) |ηᵀ η̂^0| ≥ 1 − 8 ‖Z₂‖²_op / ‖θ‖⁴.

Next, since r_n ≥ C for some C large enough, we bound the relevant probability P_{(θ,η)}(·) using Lemma 6.9.3 and Lemma 6.9.4. Using the Gaussian tail function, we conclude easily that

e^{−c p log n} r_n² = o(ψ_c(r_n)).
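The Davis–Kahan step above relies on Theorem 4.5.5 in Vershynin (2018), which, for reference, can be stated as follows: if S and T are symmetric matrices and the eigenvalue λ_i(S) is separated from the rest of the spectrum of S by δ := min_{j≠i} |λ_i(S) − λ_j(S)| > 0, then the corresponding unit eigenvectors satisfy

\[
\sin\angle\big(v_i(S),v_i(T)\big)\;\le\;\frac{2\,\|S-T\|_{\mathrm{op}}}{\delta},
\qquad\text{hence}\qquad
\min_{\nu\in\{-1,1\}}\big\|v_i(S)-\nu\,v_i(T)\big\|_2\;\le\;\frac{2^{3/2}\,\|S-T\|_{\mathrm{op}}}{\delta}.
\]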


Proof of Theorem 6.4.1

We use the same notation as in the previous proof; c₀ denotes a positive constant that we may choose large enough.

We first prove, by induction, that the stated bound holds on the event B ∩ C. A simple calculation leads to

(1/n) (H(YᵀY) η̂_k)_i = (Z₂)_iᵀ (η̂_k − η) + (1/n) (H(YᵀY) η)_i − ‖θ‖² η_i (n − ηᵀ η̂_k)/n.

Hence, if η_i = 1 and if A_i holds, then using the induction hypothesis we get

Using the induction hypothesis and the events B and C, we get

Hence, using (6.32), we obtain

The term P(B^c) is upper bounded exactly as in the previous proof, and we have

P(B^c) = o(ψ_c(r_n)).

For the other term, observe that P(A_i^c) can be expressed through the function G defined in (6.9).


Proof of Theorem 6.4.2

Combining Theorem 6.3.1 and Theorem 6.4.1, it is enough to prove that

where ξ₁, ξ₂ are two independent Gaussian random vectors with i.i.d. standard entries and θ is drawn from the independent Gaussian prior. Moreover, using independence, we have

where ε is a standard Gaussian random variable. Set the random event

A = { ‖ξ₂‖²/(n − 1) ≤ p/(n − 1) + ζ_n ‖θ‖² } ∩ { |θᵀ ξ₂| ≤ √(n − 1) δ_n ‖θ‖² },

where ζ_n and δ_n are positive sequences. It is easy to check that

P(A^c) ≤ e^{−c ‖θ‖⁴ ζ_n² n² / p} + e^{−c δ_n² n ‖θ‖²} + e^{−c ζ_n n ‖θ‖²},

Moreover, since ζ_n and δ_n are vanishing sequences as n → ∞, we conclude using the fact that the function x ↦ x/√(x + p/n) is non-decreasing on R₊ and the fact that, for C < x < y, we have x² ψ_c(y) ≤ c₁ ψ_c(y − c₂ log x) for some c₁, c₂ > 0.
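The last fact is a Gaussian-tail computation. Assuming c e^{−t²/2}/(1 + t) ≤ ψ_c(t) ≤ e^{−t²/2} for t ≥ 0 (an assumption on the form of ψ_c, which is not restated in this excerpt), one gets for C < x < y

\[
\log\frac{x^2\,\psi_c(y)}{\psi_c(y-c_2\log x)}
\;\le\; 2\log x+\log\frac{1+y}{c}-c_2\,y\log x+\frac{c_2^2}{2}\log^2 x
\;\le\;\log c_1 ,
\]

since the negative term −c₂ y log x dominates the others once C is large enough (recall y > x > C).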

Proof of Proposition 6.4.1

Set n large enough. According to Theorem 6.2.2, we have

inf_{η̃} sup_{(θ,η) ∈ Ω_Δ} (1/n) E_{(θ,η)} r(η̃, η) ≥ (1/2) ψ_c(2 r_n).   (6.33)

For the upper bound: if r_n is larger than 2C, then using Theorem 6.4.2 we get

sup_{(θ,η) ∈ Ω_Δ} (1/n) E_{(θ,η)} r(η̂_k, η) ≤ C₀ ψ_c(r_n/4),   (6.34)

for some C₀ > 0. Observe that, for r_n ≤ 2C, we have ψ_c(2 r_n) ≥ c₁ ψ_c(r_n) for some c₁ > 0. Hence, for r_n ≤ 2C, we get

inf_{η̃} sup_{(θ,η) ∈ Ω_Δ} (1/n) E_{(θ,η)} r(η̃, η) ≥ (c₁/2) ψ_c(r_n).   (6.35)

We conclude by combining (6.34), (6.33) and (6.35).

Proof of Theorem 6.7.1

• Necessary conditions:

According to Theorem 6.2.2, we have

inf_{η̃} sup_{(θ,η) ∈ Ω_{Δ_n}} (1/n) E_{(θ,η)} r(η̃, η) ≥ (1 − ε_n) ψ_c(r_n(1 + ε_n)),

for some ε_n = o(1). If, for some ε > 0,

lim inf_{n→∞} n ψ_c(r_n(1 + ε)) > 0,

then, using the monotonicity of ψ_c(·), we conclude that exact recovery is impossible.

For almost full recovery, assume that ψ_c(r_n) does not converge to 0 and that almost full recovery is possible. Then, using the continuity and monotonicity of ψ_c(·), we get that r_n(1 + ε_n) → ∞. Hence r_n → ∞ and ψ_c(r_n) → 0, which is absurd. This proves that the condition ψ_c(r_n) → 0 is necessary to achieve almost full recovery.

• Sufficient conditions:

According to Theorem 6.4.2, under the condition r_n > C for some C > 0, the estimator η̂_k defined in that theorem satisfies

sup_{(θ,η) ∈ Ω_{Δ_n}} (1/n) E_{(θ,η)} r(η̂_k, η) ≤ C₀ ψ_c( r_n (1 − ε_n − C₀ log r_n / r_n) ),

for some sequence ε_n such that ε_n = o(1). If ψ_c(r_n) → 0, then r_n → ∞. Hence, for any ε > 0, r_n(1 − ε) → ∞. It follows that r_n(1 − ε_n − C₀ log r_n / r_n) → ∞. We conclude that almost full recovery is possible under the condition ψ_c(r_n) → 0, and that η̂_k achieves almost full recovery in that case.

For exact recovery, observe that, if

n ψ_c(r_n(1 − ε)) → 0,

for some ε > 0, then r_n → ∞. It follows that, for n large enough,

r_n (1 − ε_n − C₀ log r_n / r_n) ≥ r_n (1 − ε).

Taking the limit, we conclude that η̂_k achieves exact recovery in that case, and that exact recovery is possible.

Proof of Theorems 6.7.2 and 6.5.1

By inverting the function x ↦ x/√(x + p/n), we observe that, for any A > 0,

r_n² ≥ A  ⟺  Δ_n² ≥ σ² (A/2) (1 + √(1 + 4p/(nA))).

Using Theorem 6.7.1 and the Gaussian tail function, we immediately get the results for both almost full recovery and exact recovery.
