
7.3 Estimation of the norm


In this section, we consider the problem of estimating the $\ell_2$-norm of a sparse vector when the variance of the noise and the form of its distribution are both unknown. We show that the rates $\phi_{\exp}(s,d)$ and $\phi_{\mathrm{pol}}(s,d)$ are optimal in a minimax sense on the classes $\mathcal{G}_{a,\tau}$ and $\mathcal{P}_{a,\tau}$, respectively. We first provide a lower bound on the risk of any estimator of the $\ell_2$-norm when the noise level $\sigma$ is unknown and the unknown noise distribution $P$ belongs either to $\mathcal{G}_{a,\tau}$ or to $\mathcal{P}_{a,\tau}$.

We denote by $\mathcal{L}$ the set of all monotone non-decreasing functions $\ell : [0,\infty) \to [0,\infty)$ such that $\ell(0) = 0$ and $\ell \not\equiv 0$.

Theorem 7.3.1. Let $s, d$ be integers satisfying $1 \le s \le d$. Let $\ell(\cdot)$ be any loss function in the class $\mathcal{L}$. Then, for any $a > 0$, $\tau > 0$,

\[
\inf_{\hat T} \sup_{P \in \mathcal{G}_{a,\tau}} \sup_{\sigma > 0} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma}\, \ell\Big( c\, \big(\sigma\, \phi_{\exp}(s,d)\big)^{-1} \big| \hat T - \|\theta\|_2 \big| \Big) \ \ge\ c_0, \qquad (7.8)
\]

and, for any $a \ge 2$, $\tau > 0$,

\[
\inf_{\hat T} \sup_{P \in \mathcal{P}_{a,\tau}} \sup_{\sigma > 0} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma}\, \ell\Big( \bar c\, \big(\sigma\, \phi_{\mathrm{pol}}(s,d)\big)^{-1} \big| \hat T - \|\theta\|_2 \big| \Big) \ \ge\ \bar c_0. \qquad (7.9)
\]

Here, $\inf_{\hat T}$ denotes the infimum over all estimators, and $c, \bar c > 0$, $c_0, \bar c_0 > 0$ are constants that can depend only on $\ell(\cdot)$, $\tau$ and $a$.
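Since the lower bounds are stated for a general loss $\ell \in \mathcal{L}$, it is worth spelling out how they yield quadratic-risk bounds; the following one-line reduction is a standard argument added here for completeness, not a statement from the thesis. Taking $\ell(t) = \min(t^2, 1) \in \mathcal{L}$ in (7.8) and using $\min(u,1) \le u$,

\[
c_0 \ \le\ \mathbf{E}_{\theta,P,\sigma} \min\!\Big( c^2 \big(\sigma\, \phi_{\exp}(s,d)\big)^{-2} \big(\hat T - \|\theta\|_2\big)^2,\ 1 \Big) \ \le\ \frac{c^2}{\sigma^2 \phi_{\exp}^2(s,d)}\, \mathbf{E}_{\theta,P,\sigma} \big(\hat T - \|\theta\|_2\big)^2
\]

for a least favorable triple $(\theta, P, \sigma)$, so every estimator has quadratic risk at least $(c_0/c^2)\, \sigma^2 \phi_{\exp}^2(s,d)$; the same reduction applies to (7.9).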

The lower bound (7.9) implies that the rate of estimation of the $\ell_2$-norm of a sparse vector deteriorates dramatically if the bounded moment assumption is imposed on the noise instead of, for example, the sub-Gaussian assumption.

Note also that (7.8) and (7.9) immediately imply lower bounds with the same rates $\phi_{\exp}$ and $\phi_{\mathrm{pol}}$ for the estimation of the $s$-sparse vector $\theta$ under the $\ell_2$-norm.

Given the upper bounds of Theorem 7.2.1, the lower bounds (7.8) and (7.9) are tight for the quadratic loss, and they are achieved by the following plug-in estimator, which depends on neither $s$ nor $\sigma$:

\[
\hat N = \|\hat\theta\|_2, \qquad (7.10)
\]

where $\hat\theta$ is defined in (7.5).
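For concreteness, here is a minimal numerical sketch of this plug-in strategy. Since the definition (7.5) of $\hat\theta$ is not reproduced in this section, the sketch assumes a generic rank-dependent hard-thresholding form; the function names, the loop structure, and the tuning constant are illustrative assumptions, not the thesis construction itself.

```python
import numpy as np

def plugin_norm_estimate(y, thresholds):
    """Plug-in estimate of ||theta||_2 as in (7.10): the l2-norm of a
    thresholded vector. `thresholds` plays the role of the data-driven
    weights in (7.5); its exact form is specified in the thesis.

    Assumption: rank-dependent hard thresholding, for illustration only.
    """
    y = np.asarray(y, dtype=float)
    order = np.argsort(-np.abs(y))            # indices by decreasing |Y_j|
    theta_hat = np.zeros_like(y)
    for rank, j in enumerate(order):
        if abs(y[j]) > thresholds[rank]:      # threshold depends on the rank
            theta_hat[j] = y[j]
    return np.linalg.norm(theta_hat)

# Illustrative usage with sub-Gaussian-type thresholds of the kind used
# later in Proposition 7.3.3 (the constant 2.0 is an assumption).
d, s, sigma, a = 1000, 10, 1.0, 2.0
rng = np.random.default_rng(0)
theta = np.zeros(d); theta[:s] = 5.0
y = theta + sigma * rng.standard_normal(d)
j = np.arange(1, d + 1)
omega = 2.0 * sigma * np.log(np.e * d / j) ** (1.0 / a)
print(plugin_norm_estimate(y, omega), np.linalg.norm(theta))
```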

In conclusion, when both $P$ and $\sigma$ are unknown, the rates $\phi_{\exp}$ and $\phi_{\mathrm{pol}}$ defined in (7.7) are minimax optimal both for the estimation of $\theta$ and of the norm $\|\theta\|_2$.

We now compare these results with the findings in Collier et al. (2017) regarding the (nonadaptive) estimation of $\|\theta\|_2$ when the $\xi_i$ have the standard Gaussian distribution ($P = N(0,1)$) and $\sigma$ is known. It is shown in Collier et al. (2017) that in this case the optimal rate of estimation of $\|\theta\|_2$ has the form

\[
\lambda_{N(0,1)}(s,d) \ =\ \min\Big\{ \sqrt{s \log\big(1 + \sqrt{d}/s\big)},\ d^{1/4} \Big\}.
\]

Namely, the following proposition holds.

Proposition 7.3.1 (Gaussian noise, known $\sigma$, Collier et al. (2017)). For any $\sigma > 0$ and any integers $s, d$ satisfying $1 \le s \le d$, we have

\[
c\, \sigma^2 \lambda_{N(0,1)}^2(s,d) \ \le\ \inf_{\hat T} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,N(0,1),\sigma} \big( \hat T - \|\theta\|_2 \big)^2 \ \le\ C\, \sigma^2 \lambda_{N(0,1)}^2(s,d),
\]

where $c > 0$ and $C > 0$ are absolute constants and $\inf_{\hat T}$ denotes the infimum over all estimators.
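To locate the elbow in this rate (a short verification added here, not part of the original text): at $s = \sqrt d$ the two terms in $\lambda_{N(0,1)}(s,d)$ balance, since

\[
\sqrt{s \log\big(1 + \sqrt d / s\big)}\, \Big|_{s = \sqrt d} \ =\ \sqrt{\sqrt d\, \log 2} \ \asymp\ d^{1/4}.
\]

For $s \ll \sqrt d$ the first term is the smaller one, while for $s \gg \sqrt d$ the rate freezes at $d^{1/4}$.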

We have seen that, in contrast to this result, in the case of unknown $P$ and $\sigma$ the optimal rates (7.7) do not exhibit an elbow at $s = \sqrt d$ between the "sparse" and "dense" regimes. Another conclusion is that, in the "dense" zone $s > \sqrt d$, adaptation to $P$ and $\sigma$ is only possible with a significant deterioration of the rate. On the other hand, for the sub-Gaussian class $\mathcal{G}_{2,\tau}$, in the "sparse" zone $s \le \sqrt d$ the nonadaptive rate $\sqrt{s \log(1 + \sqrt d / s)}$ differs only slightly from the adaptive sub-Gaussian rate $\sqrt{s \log(ed/s)}$; in fact, this difference in the rate appears only in a vicinity of $s = \sqrt d$. A natural question is whether such a deterioration of the rate is caused by the ignorance of $\sigma$ or by the ignorance of the distribution of $\xi_i$ within the sub-Gaussian class $\mathcal{G}_{2,\tau}$. The answer is that both are responsible. It turns out that if only one of the two ingredients ($\sigma$ or the noise distribution) is unknown, then a rate faster than the adaptive sub-Gaussian rate $\phi_{\exp}(s,d) = \sqrt{s \log(ed/s)}$ can be achieved. This is detailed in the next two propositions.

Consider first the case of Gaussian noise and unknown $\sigma$. Set

\[
\bar\lambda_{N(0,1)}(s,d) \ =\ \max\Big\{ \lambda_{N(0,1)}(s,d),\ \sqrt{s}\, \mathbf{1}\{s > \sqrt d\} \Big\}.
\]

Let $\hat\sigma > 0$ be defined by (7.15), cf. Section 7.4 below, and let $\hat\sigma^2_{\mathrm{med},1}$, $\hat\sigma^2_{\mathrm{med},2}$ be the median estimators (7.12) corresponding to the samples $(Y_i)_{i \in I_1}$ and $(Y_i)_{i \in I_2}$, respectively. Consider the estimator $\hat N$ built from these quantities; note that $\hat N$ depends on the preliminary estimator $\tilde\sigma^2$, since $\hat\sigma > 0$ defined in (7.15) depends on it.
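Since (7.12) is not reproduced in this section, the following is a minimal sketch of a median-type variance estimator of the kind this construction relies on, under the assumption that it has the classical form "median of the squared observations, normalized by the $\chi^2_1$ median"; the function name and constants are illustrative, not the thesis construction.

```python
import numpy as np

# Median of the chi-squared distribution with 1 degree of freedom,
# i.e. the median of xi^2 for xi ~ N(0,1): (Phi^{-1}(3/4))^2 ~ 0.4549.
CHI2_1_MEDIAN = 0.454936

def median_variance_estimate(y):
    """Median-type estimator of sigma^2 in the spirit of (7.12).

    If theta is s-sparse with s << d, the sample median of the Y_i^2 is
    determined by the pure-noise coordinates, so dividing by the median
    of chi^2_1 gives a consistent, outlier-robust estimate of sigma^2.
    """
    y = np.asarray(y, dtype=float)
    return np.median(y ** 2) / CHI2_1_MEDIAN

# Usage on two halves of the sample, mirroring the split (Y_i), i in I_1, I_2.
rng = np.random.default_rng(1)
d, s, sigma = 2000, 20, 1.5
theta = np.zeros(d); theta[:s] = 10.0
y = theta + sigma * rng.standard_normal(d)
y1, y2 = y[::2], y[1::2]                      # a simple deterministic split
print(median_variance_estimate(y1), median_variance_estimate(y2), sigma ** 2)
```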

Proposition 7.3.2 (Gaussian noise, unknown $\sigma$). The following two properties hold.

(i) Let $s$ and $d$ be integers satisfying $1 \le s < \lfloor \gamma d \rfloor / 4$, where $\gamma \in (0,1/2]$ is the tuning parameter in the definition of $\tilde\sigma^2$. There exist absolute constants $C > 0$ and $\gamma \in (0,1/2]$ such that

\[
\sup_{\sigma > 0} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,N(0,1),\sigma} \big( \hat N - \|\theta\|_2 \big)^2 \ \le\ C\, \sigma^2\, \bar\lambda_{N(0,1)}^2(s,d).
\]

(ii) Let $s$ and $d$ be integers satisfying $1 \le s \le d$ and let $\ell(\cdot)$ be any loss function in the class $\mathcal{L}$. Then

\[
\inf_{\hat T} \sup_{\sigma > 0} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,N(0,1),\sigma}\, \ell\Big( c\, \big(\sigma\, \bar\lambda_{N(0,1)}(s,d)\big)^{-1} \big| \hat T - \|\theta\|_2 \big| \Big) \ \ge\ c_0,
\]

where $\inf_{\hat T}$ denotes the infimum over all estimators, and $c > 0$, $c_0 > 0$ are constants that can depend only on $\ell(\cdot)$.

Proposition 7.3.2 establishes the minimax optimality of the rate $\bar\lambda_{N(0,1)}(s,d)$. It also shows that if $\sigma$ is unknown, the knowledge of the Gaussian character of the noise leads to an improvement of the rate compared to the adaptive sub-Gaussian rate $\sqrt{s \log(ed/s)}$. However, the improvement is only in a logarithmic factor.

Consider now the case of unknown noise distribution in $\mathcal{G}_{a,\tau}$ and known $\sigma$. We show in the next proposition that in this case the minimax rate is of the form

\[
\lambda_{\exp}(s,d) \ =\ \min\big\{ \sqrt{s}\, \log^{1/a}(ed/s),\ d^{1/4} \big\}
\]

and it is achieved by the estimator

\[
\hat N_{\exp} \ =\
\begin{cases}
\ \|\hat\theta\|_2 & \text{if } s \le \dfrac{\sqrt d}{\log^{2/a}(ed)}, \\[6pt]
\ \Big| \sum_{j=1}^{d} Y_j^2 - d \sigma^2 \Big|^{1/2} & \text{if } s > \dfrac{\sqrt d}{\log^{2/a}(ed)},
\end{cases}
\]

where $\hat\theta$ is defined in (7.5). Note that $\lambda_{\exp}(s,d)$ can be written equivalently (up to absolute constants) as $\min\big\{ \sqrt{s}\, \log^{1/a}(ed),\ d^{1/4} \big\}$.
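A quick verification of this equivalence (added for completeness): the term $\sqrt s\, \log^{1/a}(ed/s)$ is the minimum only when it is at most $d^{1/4}$, which forces $s \le \sqrt d$, and then

\[
\tfrac12 \log(ed) \ \le\ \log(e \sqrt d) \ \le\ \log(ed/s) \ \le\ \log(ed),
\]

so $\log^{1/a}(ed/s)$ and $\log^{1/a}(ed)$ differ by at most the factor $2^{1/a}$. The switching point of $\hat N_{\exp}$ is, up to constants, where the two terms of $\lambda_{\exp}(s,d)$ cross, i.e., where $s \log^{2/a}(ed) \asymp \sqrt d$.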

Proposition 7.3.3 (Unknown noise in $\mathcal{G}_{a,\tau}$, known $\sigma$). Let $a, \tau > 0$. The following two properties hold.

(i) Let $s$ and $d$ be integers satisfying $1 \le s < \lfloor \gamma d \rfloor / 4$, where $\gamma \in (0,1/2]$ is the tuning parameter in the definition of $\tilde\sigma^2$. There exist constants $c, C > 0$, and $\gamma \in (0,1/2]$ depending only on $(a,\tau)$ such that if $\hat\theta$ is the estimator defined in (7.5) with $\omega_j = c\, \sigma \log^{1/a}(ed/j)$, $j = 1, \ldots, d$, then

\[
\sup_{P \in \mathcal{G}_{a,\tau}} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma} \Big( \hat N_{\exp} - \|\theta\|_2 \Big)^2 \ \le\ C\, \sigma^2\, \lambda_{\exp}^2(s,d).
\]

(ii) Let $s$ and $d$ be integers satisfying $1 \le s \le d$ and let $\ell(\cdot)$ be any loss function in the class $\mathcal{L}$. Then, there exist constants $c > 0$, $c_0 > 0$ depending only on $\ell(\cdot)$, $a$ and $\tau$ such that

\[
\inf_{\hat T} \sup_{P \in \mathcal{G}_{a,\tau}} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma}\, \ell\Big( c\, \big(\sigma\, \lambda_{\exp}(s,d)\big)^{-1} \big| \hat T - \|\theta\|_2 \big| \Big) \ \ge\ c_0,
\]

where $\inf_{\hat T}$ denotes the infimum over all estimators.

Proposition 7.3.3 establishes the minimax optimality of the rate $\lambda_{\exp}(s,d)$. It also shows that if the noise distribution is unknown and belongs to $\mathcal{G}_{a,\tau}$, the knowledge of $\sigma$ leads to an improvement of the rate compared to the case when $\sigma$ is unknown. In contrast to the case of Proposition 7.3.2 (Gaussian noise), the improvement here is substantial; it results not only in a logarithmic but in a polynomial factor in the dense zone $s > \sqrt d / \log^{2/a}(ed)$.
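The two-branch structure of $\hat N_{\exp}$ translates directly into code. The sketch below assumes known $\sigma$ and reuses the illustrative rank-dependent thresholding from the earlier sketch of (7.5); the tuning constant and function names are assumptions, not the thesis specification.

```python
import numpy as np

def norm_estimate_exp(y, sigma, s, a, c=2.0):
    """Two-branch estimator of ||theta||_2 with known sigma and unknown
    noise distribution in G_{a,tau} (sketch of the display above).
    `c` is an unspecified tuning constant (an assumption here).
    """
    y = np.asarray(y, dtype=float)
    d = len(y)
    if s <= np.sqrt(d) / np.log(np.e * d) ** (2.0 / a):
        # Sparse zone: plug-in norm of the thresholded vector, as in (7.10).
        j = np.arange(1, d + 1)
        omega = c * sigma * np.log(np.e * d / j) ** (1.0 / a)
        order = np.argsort(-np.abs(y))
        theta_hat = np.zeros_like(y)
        for rank, idx in enumerate(order):
            if abs(y[idx]) > omega[rank]:
                theta_hat[idx] = y[idx]
        return np.linalg.norm(theta_hat)
    # Dense zone: global second-moment correction.
    return abs(np.sum(y ** 2) - d * sigma ** 2) ** 0.5
```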

We end this section by considering the case of unknown polynomial noise and known $\sigma$. The next proposition shows that in this case the minimax rate, for a given $a > 4$, is of the form

\[
\lambda_{\mathrm{pol}}(s,d) \ =\ \min\big\{ \sqrt{s}\, (d/s)^{1/a},\ d^{1/4} \big\}
\]

and it is achieved by the estimator

\[
\hat N_{\mathrm{pol}} \ =\
\begin{cases}
\ \|\hat\theta\|_2 & \text{if } s \le d^{\frac12 - \frac{1}{a-2}}, \\[6pt]
\ \Big| \sum_{j=1}^{d} Y_j^2 - d \sigma^2 \Big|^{1/2} & \text{if } s > d^{\frac12 - \frac{1}{a-2}},
\end{cases}
\]

where $\hat\theta$ is defined in (7.5).
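The switching exponent can be checked by equating the two terms of $\lambda_{\mathrm{pol}}(s,d)$ (a verification added here):

\[
\sqrt s\, (d/s)^{1/a} = d^{1/4}
\ \Longleftrightarrow\
s^{\frac{a-2}{2a}} = d^{\frac14 - \frac1a} = d^{\frac{a-4}{4a}}
\ \Longleftrightarrow\
s = d^{\frac{a-4}{2(a-2)}} = d^{\frac12 - \frac{1}{a-2}},
\]

so the plug-in branch is used exactly when $\sqrt s\, (d/s)^{1/a}$ is the smaller term.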

Proposition 7.3.4 (Unknown noise in $\mathcal{P}_{a,\tau}$, known $\sigma$). Let $\tau > 0$, $a > 4$. The following two properties hold.

(i) Let $s$ and $d$ be integers satisfying $1 \le s < \lfloor \gamma d \rfloor / 4$, where $\gamma \in (0,1/2]$ is the tuning parameter in the definition of $\tilde\sigma^2$. There exist constants $c, C > 0$, and $\gamma \in (0,1/2]$ depending only on $(a,\tau)$ such that if $\hat\theta$ is the estimator defined in (7.5) with $\omega_j = c\, \sigma\, (d/j)^{1/a}$, $j = 1, \ldots, d$, then

\[
\sup_{P \in \mathcal{P}_{a,\tau}} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma} \Big( \hat N_{\mathrm{pol}} - \|\theta\|_2 \Big)^2 \ \le\ C\, \sigma^2\, \lambda_{\mathrm{pol}}^2(s,d).
\]

(ii) Let $s$ and $d$ be integers satisfying $1 \le s \le d$ and let $\ell(\cdot)$ be any loss function in the class $\mathcal{L}$. Then, there exist constants $c > 0$, $c_0 > 0$ depending only on $\ell(\cdot)$, $a$ and $\tau$ such that

\[
\inf_{\hat T} \sup_{P \in \mathcal{P}_{a,\tau}} \sup_{\|\theta\|_0 \le s} \mathbf{E}_{\theta,P,\sigma}\, \ell\Big( c\, \big(\sigma\, \lambda_{\mathrm{pol}}(s,d)\big)^{-1} \big| \hat T - \|\theta\|_2 \big| \Big) \ \ge\ c_0,
\]

where $\inf_{\hat T}$ denotes the infimum over all estimators.

Note that here, similarly to Proposition 7.3.3, the improvement over the case of unknown $\sigma$ is in a polynomial factor in the dense zone $s > d^{\frac12 - \frac{1}{a-2}}$.
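As a concrete illustration of this polynomial gain (an example added here, with arbitrarily chosen values, and taking the unknown-$\sigma$ rate from (7.7) to be $\phi_{\mathrm{pol}}(s,d) = \sqrt s\, (d/s)^{1/a}$, consistent with the absence of an elbow noted earlier): for $a = 8$ and $s = \sqrt d$, the known-$\sigma$ rate is $\lambda_{\mathrm{pol}}(s,d) = d^{1/4}$, whereas $\phi_{\mathrm{pol}}(s,d) = d^{1/4} \cdot d^{1/16}$, i.e., a loss by the polynomial factor $d^{1/16}$.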
