The situation is completely different in the sparse mean model, where the optimal rates are much slower and depend on the polynomial index of the noise.

We summarize all our findings in Table 2.2.

|  | Gaussian noise | Noise in class $\mathcal{G}_{a,\tau}$ | Noise in class $\mathcal{P}_{a,\tau}$ |
|---|---|---|---|
| Rate for $\|\theta\|^2$ | $\sqrt{s\log(1+\sqrt{d}/s)} \wedge d^{1/4}$ | $\sqrt{s\log^{1/a}(ed/s)} \wedge d^{1/4}$ | $\sqrt{s}\,(d/s)^{1/a} \wedge d^{1/4}$ |
| $\sigma$ known | known: Donoho et al. (1992); Collier et al. (2017) | known | known |
| $\sigma$ unknown | known: Verzelen (2012) | unknown | unknown |

Table 2.2: Rates of convergence of the minimax risks.

### 2.4 Simulation of Gaussian processes

Stochastic processes are very popular for modeling dynamics; in particular, they are used to model price fluctuations in finance. It is well known that Gaussian processes can be discretized and then simulated when their covariance is known. Unlike the case of Brownian motion, simulation is costly when the increments of the process are no longer independent. The papers by Ayache and Taqqu (2003) and Dzhaparidze and Van Zanten (2004) propose optimal methods, in a precise sense, to simulate fractional Brownian motion using different series expansions. These series are complicated and use special functions that give no intuition about their construction, which makes it difficult to generalize them to other processes.

In Chapter 8, we present a new approach to derive series expansions for some Gaussian processes based on harmonic analysis of their covariance function. In particular, a new simple rate-optimal series expansion is derived for fractional Brownian motion $(B_t^H)_{t \in [0,T]}$. Its random coefficients are built from a sequence $(Z_k)_{k \in \mathbb{Z}}$ of independent standard Gaussian random variables, and its deterministic coefficients are given, for all $k \geq 1$, by

$$
c_k := \begin{cases}
\dfrac{2}{T}\displaystyle\int_0^T t^{2H}\cos\Big(\dfrac{k\pi t}{T}\Big)\,dt, & H < 1/2,\\[2mm]
\dfrac{4H(2H-1)T}{(k\pi)^2}\displaystyle\int_0^T t^{2H-2}\cos\Big(\dfrac{k\pi t}{T}\Big)\,dt, & H > 1/2.
\end{cases}
$$

The convergence of this series holds in mean square and uniformly almost surely, with a rate-optimal decay of the remainder term of the series. We also develop a general framework of convergent series expansions for certain classes of Gaussian processes.
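To give a concrete feel for such trigonometric series representations, the sketch below simulates standard Brownian motion on $[0,1]$ from its classical Karhunen–Loève expansion $B_t = \sqrt{2}\sum_{k\ge 1} Z_k \sin(\lambda_k t)/\lambda_k$ with $\lambda_k = (k-\tfrac12)\pi$. This is standard textbook material, not the expansion from Chapter 8; the function names are ours.

```python
import numpy as np

def kl_brownian_paths(n_paths, t_grid, n_terms=500, rng=None):
    """Simulate Brownian motion on [0, 1] by truncating its Karhunen-Loeve
    expansion: B_t = sqrt(2) * sum_k Z_k * sin(lam_k * t) / lam_k,
    with lam_k = (k - 1/2) * pi and Z_k i.i.d. standard Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(1, n_terms + 1)
    lam = (k - 0.5) * np.pi                          # eigen-frequencies
    Z = rng.standard_normal((n_paths, n_terms))      # random coefficients
    basis = np.sqrt(2) * np.sin(np.outer(t_grid, lam)) / lam  # (t, k)
    return Z @ basis.T                               # (n_paths, len(t_grid))

def kl_covariance(t, s, n_terms=500):
    """Covariance implied by the truncated series; converges to min(t, s)."""
    k = np.arange(1, n_terms + 1)
    lam = (k - 0.5) * np.pi
    return np.sum(2 * np.sin(lam * t) * np.sin(lam * s) / lam**2)
```

The quality of the truncation can be checked deterministically: the covariance of the truncated series must approach $\min(t, s)$, the covariance of Brownian motion, as the number of terms grows.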

The main chapters of this thesis are based, respectively, on the following works:

Butucea, C., Ndaoud, M., Stepanova, N. A., and Tsybakov, A. B. (2018). Variable selection with Hamming loss. The Annals of Statistics, 46(5):1837-1875.

Ndaoud, M. and Tsybakov, A. B. (2018). Optimal variable selection and adaptive noisy compressed sensing. arXiv preprint arXiv:1809.03145.

Ndaoud, M. (2019). Interplay of minimax estimation and minimax support recovery under sparsity. ALT 2019.

Ndaoud, M. (2018b). Sharp optimal recovery in the two component Gaussian mixture model. arXiv preprint arXiv:1812.08078.

Comminges, L., Collier, O., Ndaoud, M., and Tsybakov, A. B. (2018). Adaptive robust estimation in sparse vector model. arXiv preprint arXiv:1802.04230v3.

Ndaoud, M. (2018a). Harmonic analysis meets stationarity: A general framework for series expansions of special Gaussian processes. arXiv preprint arXiv:1810.11850.

### Part I

### Variable Selection in High-Dimensional Linear Regression

### Chapter 3

### Variable selection with Hamming loss

We derive nonasymptotic bounds for the minimax risk of variable selection under expected Hamming loss in the Gaussian mean model in $\mathbb{R}^d$ for classes of at most $s$-sparse vectors separated from 0 by a constant $a > 0$. In some cases, we get exact expressions for the nonasymptotic minimax risk as a function of $d$, $s$, $a$ and find explicitly the minimax selectors. These results are extended to dependent or non-Gaussian observations and to the problem of crowdsourcing. Analogous conclusions are obtained for the probability of wrong recovery of the sparsity pattern. As corollaries, we derive necessary and sufficient conditions for such asymptotic properties as almost full recovery and exact recovery.

Moreover, we propose data-driven selectors that provide almost full and exact recovery adaptively to the parameters of the classes.

Based on Butucea et al. (2018): Butucea, C., Ndaoud, M., Stepanova, N. A., and Tsybakov, A. B. (2018). Variable selection with Hamming loss. The Annals of Statistics, 46(5):1837-1875.

### 3.1 Introduction

In recent years, the problem of variable selection in high-dimensional regression models has been extensively studied from the theoretical and computational viewpoints. In making effective high-dimensional inference, sparsity plays a key role. With regard to variable selection in sparse high-dimensional regression, the Lasso, the Dantzig selector, other penalized techniques, as well as marginal regression were analyzed in detail; see, for example, Meinshausen and Bühlmann (2006); Zhao and Yu (2006); Wainwright (2009b); Lounici (2008); Wasserman and Roeder (2009); Zhang (2010); Meinshausen and Bühlmann (2010); Genovese et al. (2012); Ji and Jin (2012) and the references cited therein. Several other recent papers deal with sparse variable selection in nonparametric regression; see, for example, Lafferty and Wasserman (2008); Bertin and Lecué (2008); Comminges and Dalalyan (2012); Ingster and Stepanova (2014); Butucea and Stepanova (2017).

In this chapter, we study the problem of variable selection in the Gaussian sequence model

$$X_j = \theta_j + \sigma \xi_j, \qquad j = 1, \dots, d, \tag{3.1}$$

where $\xi_1, \dots, \xi_d$ are i.i.d. standard Gaussian random variables, $\sigma > 0$ is the noise level, and $\theta = (\theta_1, \dots, \theta_d)$ is an unknown vector of parameters to be estimated. We assume that $\theta$ is $(s,a)$-sparse, which is understood in the sense that $\theta$ belongs to one of the following sets:

$$\Theta_d(s,a) = \Big\{\theta \in \mathbb{R}^d : \text{there exists a set } S \subseteq \{1,\dots,d\} \text{ with at most } s \text{ elements such that } |\theta_j| \geq a \text{ for all } j \in S, \text{ and } \theta_j = 0 \text{ for all } j \notin S\Big\}$$

or

$$\Theta_d^+(s,a) = \Big\{\theta \in \mathbb{R}^d : \text{there exists a set } S \subseteq \{1,\dots,d\} \text{ with at most } s \text{ elements such that } \theta_j \geq a \text{ for all } j \in S, \text{ and } \theta_j = 0 \text{ for all } j \notin S\Big\}.$$

Here, $a > 0$ and $s \in \{1, \dots, d\}$ are given constants.
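In code, membership in these classes is a simple check. The helper below is an illustrative utility (not from the paper): it tests whether a vector lies in $\Theta_d(s,a)$, or in $\Theta_d^+(s,a)$ when `positive=True`.

```python
import numpy as np

def in_theta(theta, s, a, positive=False):
    """Check (s, a)-sparsity: at most s nonzero entries, each of absolute
    value at least a (or of value at least a, for the class Theta_d^+)."""
    theta = np.asarray(theta, dtype=float)
    support = np.flatnonzero(theta)       # indices of nonzero components
    if support.size > s:
        return False
    vals = theta[support]
    if positive:
        return bool(np.all(vals >= a))    # Theta_d^+(s, a)
    return bool(np.all(np.abs(vals) >= a))  # Theta_d(s, a)
```

For example, `[0, 2.0, -1.5, 0]` belongs to $\Theta_4(2, 1)$ but not to $\Theta_4^+(2, 1)$, since one of its active components is negative.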

We study the problem of selecting the relevant components of $\theta$, that is, of estimating the vector

$$\eta = \eta(\theta) = \big(I(\theta_j \neq 0)\big)_{j=1,\dots,d},$$

where $I(\cdot)$ is the indicator function. As estimators of $\eta$, we consider any measurable functions $\hat\eta = \hat\eta(X_1, \dots, X_d)$ of the observations taking values in $\{0,1\}^d$. Such estimators will be called selectors. We characterize the loss of a selector $\hat\eta$ as an estimator of $\eta$ by the Hamming distance between $\hat\eta$ and $\eta$, that is, by the number of positions at which $\hat\eta$ and $\eta$ differ:

$$|\hat\eta - \eta| := \sum_{j=1}^{d} |\hat\eta_j - \eta_j| = \sum_{j=1}^{d} I(\hat\eta_j \neq \eta_j).$$

Here, $\hat\eta_j$ and $\eta_j = \eta_j(\theta)$ are the $j$th components of $\hat\eta$ and $\eta = \eta(\theta)$, respectively. The expected Hamming loss of a selector $\hat\eta$ is defined as $E_\theta|\hat\eta - \eta|$, where $E_\theta$ denotes the expectation with respect to the distribution $P_\theta$ of $(X_1, \dots, X_d)$ satisfying (3.1). Another well-known risk measure is the probability of wrong recovery $P_\theta(\hat S \neq S(\theta))$, where $\hat S = \{j : \hat\eta_j = 1\}$ and $S(\theta) = \{j : \eta_j(\theta) = 1\}$. It can be viewed as the Hamming distance with an indicator loss and is related to the expected Hamming loss as follows:

$$P_\theta\big(\hat S \neq S(\theta)\big) = P_\theta\big(|\hat\eta - \eta| \geq 1\big) \leq E_\theta|\hat\eta - \eta|. \tag{3.2}$$

In view of the last inequality, bounding the expected Hamming loss provides a stronger result than bounding the probability of wrong recovery.
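To make the two risks concrete, the snippet below draws one data set from model (3.1), applies a simple thresholding selector $\hat\eta_j = I(|X_j| > t)$ with the classical universal threshold (an ad hoc choice for illustration, not the minimax one derived later), and verifies the pathwise inequality $I(\hat S \neq S(\theta)) \leq |\hat\eta - \eta|$ underlying (3.2).

```python
import numpy as np

rng = np.random.default_rng(1)
d, s, a, sigma = 1000, 10, 3.0, 1.0

# (s, a)-sparse signal: s components equal to a, the rest zero
theta = np.zeros(d)
theta[:s] = a
eta = (theta != 0).astype(int)           # true sparsity pattern

X = theta + sigma * rng.standard_normal(d)   # observations from model (3.1)
t = sigma * np.sqrt(2 * np.log(d))           # classical universal threshold
eta_hat = (np.abs(X) > t).astype(int)        # thresholding selector

hamming = int(np.sum(eta_hat != eta))        # Hamming loss |eta_hat - eta|
wrong_recovery = int(np.any(eta_hat != eta)) # indicator of S_hat != S(theta)
print(hamming, wrong_recovery)
```

The indicator of wrong recovery is at most the Hamming loss on every realization, which is exactly why a bound on the expected Hamming loss implies a bound on the probability of wrong recovery.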

Most of the literature on variable selection in high dimensions focuses on the recovery of the sparsity pattern, that is, on constructing selectors such that the probability $P_\theta(\hat S \neq S(\theta))$ is close to 0 in some asymptotic sense (see, e.g., Meinshausen and Bühlmann (2006); Zhao and Yu (2006); Wainwright (2009b); Lounici (2008); Wasserman and Roeder (2009); Zhang (2010); Meinshausen and Bühlmann (2010)). These papers consider high-dimensional linear regression settings with deterministic or random covariates. In particular, for the sequence model (3.1), one gets that if $a > C\sigma\sqrt{\log d}$ for some $C > 0$ large enough, then there exist selectors such that $P_\theta(\hat S \neq S(\theta))$ tends to 0, while this is not the case if $a < c\sigma\sqrt{\log d}$ for some $c > 0$ small enough. More insight into variable selection was provided in Genovese et al. (2012); Ji and Jin (2012) by considering a Hamming risk close to the one we have defined above. Assuming that $s \sim d^{1-\beta}$ for some $\beta \in (0,1)$, the papers Genovese et al. (2012); Ji and Jin (2012) establish an asymptotic in $d$ "phase diagram" that partitions the parameter space into three regions called the exact recovery, almost full recovery, and no recovery regions. This is done in a Bayesian setup for the linear regression model with i.i.d. Gaussian covariates and random $\theta$. Note also that in Genovese et al. (2012); Ji and Jin (2012) the knowledge of $\beta$ is required to construct the selectors, so that in this sense the methods are not adaptive. The selectors are of the form $\hat\eta_j = I(|X_j| \geq t)$ with threshold $t = \tau(\beta)\sqrt{\log d}$ for some function $\tau(\cdot) > 0$. More recently, these asymptotic results were extended to a combined minimax–Bayes Hamming risk on a certain class of vectors $\theta$ in Jin et al. (2014).

The present chapter makes further steps in the analysis of variable selection with a Hamming loss initiated in Genovese et al. (2012); Ji and Jin (2012). Unlike Genovese et al. (2012); Ji and Jin (2012), we study the sequence model (3.1) rather than Gaussian regression and analyze the behavior of the minimax risk rather than that of the Bayes risk with a specific prior. Furthermore, we consider not only $s \sim d^{1-\beta}$ but general $s$ and derive nonasymptotic results that are valid for any sample size. Remarkably, we get an exact expression for the nonasymptotic minimax risk of separable (coordinate-wise) selectors and find explicitly the separable minimax selectors. Finally, we construct data-driven selectors that are simultaneously adaptive to the parameters $a$ and $s$.

Specifically, we consider the minimax risk

inf⌘˜ sup

✓2⇥

1

sE✓|⌘˜ ⌘| (3.3)

for⇥=⇥d(s, a)and ⇥=⇥^{+}_{d}(s, a), where inf⌘˜ denotes the infimum over all selectors ⌘˜.
In Section 3.2, for both classes ⇥ = ⇥d(s, a) and ⇥ = ⇥^{+}_{d}(s, a) we find the upper and
lower bounds of the minimax risks and derive minimax selectors for any fixedd, s, a >0
such that s < d. For ⇥ = ⇥d(s, a), we also propose another selector attaining the
lower bound risk up to the factor 2. Interestingly, the thresholds that correspond to
the minimax optimal selectors do not have the classical formA p

logd for some A >0; the optimal threshold is a function ofaand s. Analogous minimax results are obtained for the risk measured by the probability of wrong recovery P✓( ˆS 6= S(✓)). Section 3.3 considers extensions of the nonasymptotic minimax theorems of Section 3.2 to settings with non-Gaussian or dependent observations. In Section 3.4, as asymptotic corollaries of these results, we establish sharp conditions under which exact and almost full recovery are achievable. Section 3.5 is devoted to the construction of adaptive selectors that achieve almost full and exact recovery without the knowledge of the parametersa and s. Most of the proofs are given in Appendix 3.6.
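As noted above, the minimax thresholds depend on $a$ and $s$ rather than only on $d$. The sketch below contrasts the classical universal threshold $A\sqrt{\log d}$ with a threshold of likelihood-ratio type, $a/2 + (\sigma^2/a)\log(d/s - 1)$, which should be read only as an illustration of the general form; the exact expression is the one derived in Section 3.2, and the function names here are ours.

```python
import math

def classical_threshold(d, sigma=1.0, A=math.sqrt(2)):
    """Universal threshold A * sigma * sqrt(log d): depends on d only."""
    return A * sigma * math.sqrt(math.log(d))

def adaptive_threshold(d, s, a, sigma=1.0):
    """Threshold of likelihood-ratio type depending on a and s
    (a sketch of the form discussed in Section 3.2; assumes s < d / 2)."""
    return a / 2 + (sigma**2 / a) * math.log(d / s - 1)

# The second threshold moves with a and s, the first one does not.
d = 10_000
for a in (1.0, 3.0, 6.0):
    print(a, classical_threshold(d), adaptive_threshold(d, s=10, a=a))
```

One visible consequence: for a fixed dimension, the adaptive threshold decreases as the class becomes less sparse (larger $s$), while the universal threshold stays put.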

Finally, note that quite recently several papers have studied the expected Hamming loss in other problems of variable selection. Asymptotic behavior of the minimax risk analogous to (3.3) for classes $\Theta$ different from the sparsity classes that we consider here was analyzed in Butucea and Stepanova (2017), and without the normalizing factor $1/s$ in Ingster and Stepanova (2014). Oracle inequalities for Hamming risks in the problem of multiple classification under sparsity constraints are established in Neuvial and Roquain (2012). The paper Zhang and Zhou (2016) introduces an asymptotically minimax approach based on the Hamming loss in the problem of community detection in networks.