HAL Id: hal-02511859

https://hal.archives-ouvertes.fr/hal-02511859v2

Preprint submitted on 2 Oct 2020


Deconvolution for some singular density errors with respect to the L1 loss

Clément Marteau, Mathieu Sart


Abstract. We present a versatile and model-based procedure for estimating a density in a deconvolution setting where the error density is assumed to be singular enough. We assess the quality of our estimator by establishing non-asymptotic risk bounds for the L1 loss. We specify them under various constraints on the target. We investigate cases where the density is multimodal, (piecewise) concave/convex, belongs to a (possibly inhomogeneous) Besov space or a particular parametric model. Moreover, our estimation procedure is robust with respect to model misspecification.

1. Introduction

Let Y_1, . . . , Y_n be n independent and identically distributed real-valued random variables defined on an abstract probability space (Ω, E, P). We assume that each random variable Y_i can be written as

(1)  Y_i = X_i + ε_i,  i ∈ {1, . . . , n},

where for all i ∈ {1, . . . , n}, X_i denotes a random variable admitting a density f0 with respect to the Lebesgue measure. The variable ε_i denotes a measurement error, independent of X_i, with known density q. Our aim is to estimate f0 from the indirect observations Y_1, . . . , Y_n. This turns out to be a deconvolution problem, the density of the Y_i's being the convolution product f0 ∗ q.

Deconvolution in a statistical context has been at the core of several investigations in the literature. We refer to, e.g., [Mei09] for a comprehensive introduction to this topic. Over the last decades, several approaches have been proposed. For instance, kernel procedures have been designed, taking advantage of the convolution structure in the Fourier domain. We mention a seminal contribution by [Fan91] who establishes rates of convergence under smoothness assumptions on the target density f0. Several extensions and improvements have then been obtained, including in particular discussions regarding an adaptive choice of the bandwidth. We refer, e.g., to [DG04] for bootstrap bandwidth selection, [BMP08] or [CL11] when the error density q is unknown, [DM13] for rates of convergence using the Wasserstein metric, or [Reb16] for adaptation under Lp-losses. Alternatively, several methods based on the minimization of a (penalized) criterion over a given family have been carried out. Typically, given a collection of candidates for f0, one would like to find the one leading to the smallest quadratic risk. This optimal element (in a sense that should be made precise) is then approximated as a minimizer of a functional, playing the role of a risk estimator. This functional is possibly penalized to include constraints on the target, often in an adaptive way. We refer

Date: October, 2020.

2010 Mathematics Subject Classification. 62G05.

Key words and phrases. density deconvolution, combinatorial method, shape constraints, Besov balls.


for instance to [CRT06] for projection methods and to [PNR13] where the performance of a dictionary based method is investigated in a spherical context. Other authors have extended wavelet approaches to the deconvolution model. The idea is to take advantage of the density representation property in a wavelet basis. The coefficients of this decomposition must then be estimated and sometimes thresholded in an appropriate way, see [PV99, FK02, CR10, LN11].

The aforementioned references assess the quality of their estimators by means of Lp-losses, with p larger than 1 (except [Fan91]). In the present paper, we focus on the case p = 1. The loss then not only measures the distance between the estimator and the density f0, but also the distance between the two underlying probability measures.

We restrict our study to a moderately ill-posed deconvolution problem with a small degree of ill-posedness. More precisely, we assume that the Fourier transform q⋆ of q does not vanish and that |q⋆(t)|^{−1} is of the order of |t|^β for large values of t. The parameter β should be smaller than 1/2, which indicates in some sense that q is singular enough, or in other terms, that the framework is not so far away from the direct setting (which would correspond to β = 0). As noticed by [Mei09, DGJ11], this condition makes the consistent estimation of the distribution function possible without further assumptions on f0. This last problem is indeed closely related to the estimation of a density using the L1 loss. We refer to [DL12] for more information regarding this issue.

In direct estimation, there exist very general procedures that lead to both optimal and robust estimators. We are thinking in particular of methods based on robust tests or like-minded approaches such as those described in [Bir83, Bir06, DL12, BBS17]. These procedures rely on arguments that are very different from those usually used in deconvolution. Developing them in the context of deconvolution therefore seems challenging. In this paper, we present a first step in this direction by proposing a new estimation procedure in line with [DL12]. It does not lead to risk bounds as general as in the direct case, but at least, it already yields new results in the deconvolution setting. For each suitable collection F of densities (named model in the sequel), we define an estimator f̂ ∈ F and get a non-asymptotic upper bound for the L1 risk d1(f0, f̂) of the estimator, both in expectation and with high probability. We then deduce new upper bounds on the minimax risks for some collections corresponding to parametric, smoothness or shape constraints on f0 as described below.

We obtain rates of convergence when f0 is multimodal, or when f0 is (piecewise) concave/convex. In both cases, only the number of modes or pieces must be known by the statistician. Their locations may be completely unknown. These rates are moreover better for some particular densities of F. They become, for instance, nearly parametric when f0 is piecewise constant and when F models a multimodal density. This phenomenon is similar to the one which occurs for the Grenander estimator in direct estimation, see [Bir89]. Let us mention that very few attempts have been made to deal with shape constraints in density deconvolution; the only one we know about is [CDH11].

We also consider the case where the target f0 belongs to a bounded set F = B^α_{p,∞}(R) of a Besov space. Here, p may be very small and in particular smaller than 1, which allows us to take into account spatially inhomogeneous smoothness. Moreover, f0 may be neither continuous nor bounded. We are not aware of many references in the literature mixing Besov spaces and the L1 loss in density deconvolution. For other losses or other smoothness assumptions, see [FK02, CRT06, CL13, LW19] and the references therein.

We also give an example of a parametric model to illustrate the flexibility of our approach. We show that our estimator achieves a (near) parametric rate when f0 is known, up to a location and scale parameter. It is worth mentioning that the method of maximum likelihood does not work here, and that ours is robust to model errors. Similar results are shown for the model F composed of all piecewise polynomial densities on at most d intervals (these intervals are not set by the model). Our rates may then be related to those obtained in [FHM14] under known bounds on ‖f0‖∞ in inverse regression. Their estimator, however, does not share the same robustness properties as ours (which are crucial to deal with spatial inhomogeneity as in the previous paragraph). Although our estimator is piecewise polynomial in that case, it is not constructed to locate the possible discontinuities of the density. This other problem is studied for instance in [Neu97, GTZ06, GJTZ08].

This paper is organized as follows. In Section 2, we present an elementary approach based on a histogram-type estimator. We state our main results on shape, smoothness and parametric constraints in Section 3. The description of our general procedure, which is technical, is deferred to Section 4. Section 5 gathers the proofs. To reduce the size of the paper, we study the problem of minimax estimation of a decreasing density in Appendix A. We establish in Appendix B two elementary approximation lemmas that are required in our proofs.

Throughout the paper, we will use the following notation. The terms X, Y, ε denote generic random variables having the same law as X_i, Y_i and ε_i respectively. We will sometimes write ∫_A f instead of ∫_A f(x) dx to shorten the formulas, and omit the set A when A = R. We denote the Fourier transform f⋆ of an integrable, or square integrable, function f ∈ L1(R) ∪ L2(R) by

f⋆(t) = ∫ e^{itx} f(x) dx,  ∀t ∈ R.

The L1 distance between two integrable functions f1, f2 ∈ L1(R) is denoted by

d1(f1, f2) = ∫ |f1 − f2|.

For all F ⊂ L1(R) and f ∈ L1(R), we set d1(f, F) = inf_{g∈F} d1(f, g). The complex conjugate of a number x is x̄. The notation |I| may refer either to the Lebesgue measure or the cardinality of I. The supremum norm of a function f is ‖f‖∞ = sup_{x∈R} |f(x)| and its derivative (provided it exists) is denoted by f′. The notations c, c′, C, C′, . . . are used for quantities that may change from line to line. They are usually constants (that is, numbers), but may sometimes depend on some parameters. In that case, the dependency will be specified in the text.

2. Preliminaries


2.1. Assumptions on the error density. All along the paper, we will assume that the variables ε_i involved in the model (1) admit a density q satisfying the following requirement. This assumption will always be assumed to be met in the next sections.

Assumption 1. The Fourier transform q⋆ of q does not vanish. Moreover, there exist three constants κ1, κ2 and β ∈ (0, 1/2) such that for all t ∈ R,

(2)  |q⋆(t)|^{−2} ≤ κ1 + κ2 |t|^{2β}.

This kind of assumption is quite classical in the statistical literature (see, e.g., [Mei09]). The fact that q⋆ is not allowed to vanish entails in particular that f0⋆ can be recovered from (f0 ∗ q)⋆ at any frequency, and hence ensures that the problem is identifiable. Such a constraint is for instance satisfied when q is even and its derivative q′ on (0, +∞) exists and is strictly increasing (see [Tuc06]). The difficulty (expressed for instance in terms of convergence rates) of the deconvolution problem is often measured through the behavior of q⋆(t) for large values of t. The condition (2) indicates that q⋆ has a polynomial behavior: the deconvolution problem is said in this case to be mildly ill-posed. We restrict our attention to the specific case where the polynomial exponent β is smaller than 1/2. The following proposition provides examples of densities satisfying such a requirement. In particular, such examples correspond to densities having a singularity at a given point, which is the case of primary mathematical interest. The proof of Proposition 1 is postponed to Section 5.2.

Proposition 1. Let ϕ : R → R be a map such that ϕ(0) is positive and whose derivative exists, is bounded and integrable on R. Let q be a density of the form q(x) = |x|^{β−1} ϕ(x) with β ∈ (0, 1/2). Then, if q⋆ does not vanish, q satisfies Assumption 1.
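As a concrete illustration (our own example, not taken from the paper), Assumption 1 can also be checked directly for the Gamma density q(x) = x^{β−1} e^{−x}/Γ(β) for x > 0, which is singular at 0:

```latex
q^\star(t) = \frac{1}{\Gamma(\beta)} \int_0^{\infty} x^{\beta-1} e^{-(1-it)x}\,dx
           = (1-it)^{-\beta} \neq 0,
\qquad
|q^\star(t)|^{-2} = (1+t^2)^{\beta} \le 1 + |t|^{2\beta},
```

the last inequality using (1 + u)^β ≤ 1 + u^β for β ∈ (0, 1) and u = t² ≥ 0, so that (2) holds with κ1 = κ2 = 1.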

2.2. Estimation of a probability. Our procedures are heavily based on estimations of probabilities of the form P(X ∈ I) = ∫_I f0 for some I ⊂ R. The estimation of these terms seems to be tricky when I is only assumed to be a Borel set, but is easier when I is an interval, or more generally a union of intervals. This has been studied for instance in [Mei09, DGJ11], and we will adopt here the estimation strategy of [Mei09]. Note that the frontier β = 1/2 for the index involved in (2) plays an important role for this problem.

In the following, for any d ∈ N⋆, we denote by I_d the collection of unions of at most d intervals with finite length. For any I ∈ I_d, the function φ_I : t → \overline{1_I^⋆(t)} × [q⋆(t)]^{−1} belongs to L2(R) according to Assumption 1. Therefore its Fourier transform φ_I^⋆ is well defined, and we introduce the random variable Z(I) as

Z(I) = (1/(2π)) φ_I^⋆(Y) = (1/(2π)) ∫ \overline{1_I^⋆(t)} [q⋆(t)]^{−1} e^{itY} dt.

When ‖f0 ∗ q‖∞ is finite, Z(I) has moments up to order 2, see Proposition 2 below. We may compute its expectation:

(3)  E[Z(I)] = (1/(2π)) ∫ \overline{1_I^⋆(t)} [q⋆(t)]^{−1} (f0 ∗ q)⋆(t) dt = (1/(2π)) ∫ \overline{1_I^⋆(t)} f0⋆(t) dt = ∫_I f0,

where the last equality comes from the Plancherel isometry when f0 ∈ L2(R) and from an additional density argument otherwise, see Section 5.3. We deduce that

(4)  Z_n(I) = (1/(2πn)) Σ_{j=1}^{n} ∫ \overline{1_I^⋆(t)} [q⋆(t)]^{−1} e^{itY_j} dt

is an unbiased estimator of P(X ∈ I). A control on its variance is given by the proposition below. Its proof is postponed to Section 5.4.

Proposition 2. For all unions of at most d intervals I ∈ I_d, we have Z(I) ∈ R and

(5)  E[(Z(I))²] ≤ ‖f0 ∗ q‖∞ [κ1 d |I| + c_β κ2 d^{1+2β} |I|^{1−2β}],

where c_β only depends on β.
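To make the construction concrete, here is a minimal numerical sketch of the estimator (4) (our own illustration, not the paper's code). The simulated setup X ~ N(2,1) and ε ~ Gamma(β,1), for which q⋆(t) = (1 − it)^{−β} as in the example above, as well as the truncation level T, are arbitrary choices; truncating the integral at |t| ≤ T introduces a small additional bias.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, n = 0.3, 5_000

# Simulated indirect observations: Y = X + eps, with known error density q.
X = rng.normal(2.0, 1.0, n)
Y = X + rng.gamma(beta, 1.0, n)

def q_star(t):
    # Fourier transform of the Gamma(beta, 1) density (our choice of q).
    return (1.0 - 1j * t) ** (-beta)

def indicator_ft_conj(t, a, b):
    # Conjugate of the Fourier transform of the indicator of [a, b].
    return (np.exp(-1j * t * b) - np.exp(-1j * t * a)) / (-1j * t)

def Z_n(a, b, Y, T=100.0, m=4000):
    # Riemann approximation of (4) on [-T, T]; m even so the grid avoids t = 0.
    t = np.linspace(-T, T, m)
    kernel = indicator_ft_conj(t, a, b) / q_star(t)
    # Empirical characteristic function of Y on the grid.
    ecf = np.array([np.exp(1j * ti * Y).mean() for ti in t])
    dt = t[1] - t[0]
    return float(np.real(np.sum(kernel * ecf)) * dt / (2.0 * np.pi))

print(Z_n(1.0, 3.0, Y))   # should be close to P(X in [1,3]) ~ 0.68
```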

2.3. Estimation of f0 using histograms. In this paper, we are interested in the estimation of the density f0 of the X_i rather than in a probability in itself. For pedagogical reasons, we present below a histogram estimator and give an L1 risk bound. The following results should be seen as a foretaste of our main contributions, which will be presented in Sections 3 and 4.

Let m be a given finite collection, of size |m|, of disjoint intervals of finite and positive lengths. The integral of f0 over each interval I included in m can be estimated thanks to the expression (4). Gathering all these estimations, we can define a histogram estimator f̂_m of f0 as follows:

(6)  f̂_m = Σ_{I∈m} (Z_n(I)/|I|) 1_I.
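As an illustration, the estimator (6) can be assembled directly from the probability estimates of the previous snippet (a sketch under the same simulated setup; the partition below is an arbitrary choice, and the raw heights are not forced to be non-negative or to integrate to one, unlike proper densities).

```python
# A minimal sketch of (6), reusing Z_n and the sample Y from the previous
# snippet; the partition m (12 equal bins on [-1, 5]) is our arbitrary choice.
edges = np.linspace(-1.0, 5.0, 13)
heights = [Z_n(a, b, Y) / (b - a) for a, b in zip(edges[:-1], edges[1:])]

def f_hat_m(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.searchsorted(edges, x, side="right") - 1
    out = np.zeros_like(x)
    ok = (k >= 0) & (k < len(heights))
    out[ok] = np.asarray(heights)[k[ok]]
    return out

print(np.round(heights, 3))   # estimated bin heights (may be slightly negative)
```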

By using Proposition 2, we get a control (in expectation) of its L1-risk, as displayed in the following proposition. The proof is deferred to Section 5.5.

Proposition 3. For any finite collection m of disjoint intervals (of positive and finite lengths), let

(7)  F_m = { Σ_{I∈m} α_I 1_I,  (α_I)_{I∈m} ∈ [0, +∞)^{|m|},  Σ_{I∈m} α_I |I| = 1 }

be the collection of piecewise constant densities on m. Then,

(8)  E[d1(f0, f̂_m)] ≤ 2 d1(f0, F_m) + √( ‖f0 ∗ q‖∞ (κ1 ℓ |m| + c_β κ2 ℓ^{1−2β} |m|^{1+2β}) / n ),

where ℓ = |∪_{I∈m} I|.

This upper bound is non-asymptotic and shares similarities with results obtained in the direct case (which would roughly correspond to β = 0). In particular, our bound is composed of a bias term d1(f0, F_m) and an estimation component. The supremum norm of f0 ∗ q that appears in this inequality can always be related to a suitable Ls norm of f0. Indeed, by Young's inequality,

‖f0 ∗ q‖∞ ≤ inf_{s≥1} { ‖f0‖_s ‖q‖_{s/(s−1)} }.

Since q is not bounded, we cannot get an upper bound on ‖f0 ∗ q‖∞ independent of f0. Nevertheless, we always have the rough upper bound ‖f0 ∗ q‖∞ ≤ ‖f0‖∞, as q is a density.


We now give some direct illustrations of this result. First, when f0 is piecewise constant on the collection m, the bias term in (8) vanishes and the estimator f̂_m achieves the expected rate 1/√n. Second, when f0 is compactly supported on [0, 1] and is smooth, the bias term can be controlled. For instance, if f0 is Hölder with regularity α ∈ (0, 1], we may choose m as a regular partition of [0, 1] of adequate size to get the rate of convergence n^{−α/(2β+2α+1)}. This rate cannot be improved in general, see [Fan91]. Third, we can deal with the estimation of f0 under monotonicity constraints. Choosing m in an appropriate way then allows us to obtain the optimal rate of convergence n^{−1/(2β+3)}. For the sake of clarity, this result and its associated minimax lower bound are displayed in Appendix A.

According to the results presented above, the histogram estimator f̂_m introduced in (6) leads to optimal rates of convergence in several situations. This estimator is particularly simple and can easily be computed in practice. Nevertheless, this approach has two major defects. The first is that the choice of m should be made according to the data. The second is that a piecewise constant estimator (no matter the choice of m) cannot be rate optimal under other assumptions of interest (which is the case, for instance, when α is larger than 1 in the Hölder case).

3. Non-asymptotic risk bounds

We propose below a new estimation procedure inspired by both the median-of-means principle and the combinatorial method. It allows us to consider a wide variety of different problems and to obtain more general risk bounds than those displayed in Section 2. This procedure should essentially be considered as a tool leading to new theoretical results, rather than a method for practical and applied statistical analysis. Since the method is quite technical, its presentation is deferred to Section 4.

3.1. Assumption and main result. In the following, F denotes a collection of densities given by the statistician. This collection can be seen as a translation of the a priori knowledge available on f0. We stress that f0 need not belong to F, but should at least be well approximated, in a sense which will be made precise later on, by some elements of this set. Concerning the set F, we consider the following assumption.

Assumption 2. There exist for every density f ∈ F two integers ℓ_f, d_f such that:

• For all f ∈ F, f is compactly supported on [−ℓ_f, ℓ_f].
• For all f, g ∈ F, the set [f > g] = {x ∈ R, f(x) > g(x)} is a union of at most min{d_f, d_g} intervals.

Moreover, there exists an at most countable collection F̄ ⊂ F such that: for all f ∈ F and ε > 0, there exists f′ ∈ F̄ such that d1(f, f′) ≤ ε, d_{f′} ≤ d_f and ℓ_{f′} ≤ ℓ_f.


Theorem 4. Let ξ ≥ 1, and let F be a model fulfilling Assumption 2. There exists an estimator f̂ ∈ F satisfying

(9)  d1(f0, f̂) ≤ 3 inf_{f∈F} { d1(f0, f) + c √( ‖f0 ∗ q‖∞ (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{2β+1}) (ξ + log(ℓ_f d_f)) / n ) }

on an event having probability 1 − e^{−ξ}. Moreover, when ξ = log n,

E[d1(f0, f̂)] ≤ 3 inf_{f∈F} { d1(f0, f) + c′ √( ‖f0 ∗ q‖∞ (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{1+2β}) log(ℓ_f n) / n ) }.

In the above inequalities, c, c′ are universal constants and c_β only depends on β.

It can be noticed that the above result, whose proof can be found in Section 5.10, applies in particular to the collection F = F_m introduced in (7), with ℓ_f = |∪_{I∈m} I| and d_f = |m|. In such a case, our estimator f̂ leads to the same expectation bound (up to a log term) as the one obtained in Proposition 3 for the histogram procedure. The main interest of this result is that it allows us to cope with several alternative models, described in the forthcoming sections.

3.2. Piecewise monotone densities. In this section, we explain how to estimate a multimodal density f0 in our deconvolution setting.

We consider r ≥ 1 and introduce the family M_r that gathers all the collections m of size r of the form

(10)  m = {[x1, x2], (x2, x3], (x3, x4], . . . , (x_r, x_{r+1}]},

where x1 < x2 < · · · < x_{r+1} are r + 1 real numbers (with the convention that m = {[x1, x2]} when r = 1). We then define the collection F_r of piecewise monotone functions by

(11)  F_r = { Σ_{I∈m} f_I 1_I, where m ∈ M_r and f_I is a non-negative monotone function on I }.

Note that a compactly supported density that is monotone on its support belongs to F_1. One that is multimodal with r modes belongs to F_{2r}.

We may estimate such densities by applying Theorem 4 to a suitable subset of F_r. More precisely, we introduce, for k ≥ 1, the collection

P_k = { Σ_{I∈m} α_I 1_I, where m ∈ M_k, α_I ∈ [0, +∞), Σ_{I∈m} α_I |I| = 1 }

of piecewise constant densities based on collections of M_k. According to Proposition 3 of [BB16], Assumption 2 is fulfilled for the set

(12)  F = ∪_{k=1}^{∞} (P_k ∩ F_r),

with d_f = 3(k + r + 5)/2 for all f ∈ P_k ∩ F_r. This enables Theorem 4 to be applied.

In the following, to measure the variations of a function f, we introduce for any interval I

(13)  V_I(f) = sup_{x,y∈I} |f(x) − f(y)|.

We then set for all f ∈ F_r,

L_1(f) = inf_{m∈M_r, f=Σ_{I∈m} f_I 1_I} { Σ_{I∈m} log^{1/2}(1 + |I| V_I(f_I)) }².

An application of Theorem 4 leads to the following result. Its proof can be found in Section 5.6.

Theorem 5. Consider r ≥ 1. There exists an estimator f̂ ∈ F_r satisfying, for all integers ℓ and all f0 such that f0 1_{[−ℓ,ℓ]} ∈ F_r,

E[d1(f0, f̂)] ≤ C { L_1(f0 1_{[−ℓ,ℓ]})^{(2β+1)/(2β+3)} (c1 ‖f0 ∗ q‖∞ log n / n)^{1/(2β+3)} + c2^{1/2} (L_1(f0 1_{[−ℓ,ℓ]}) × c1^{−1/2})^{1/(2β+3)} (‖f0 ∗ q‖∞ log n / n)^{(β+1)/(2β+3)} + (c3 ‖f0 ∗ q‖∞ log n / n)^{1/2} + ∫_{[−ℓ,ℓ]^c} f0 },

where C is a universal constant, where c1, c2, c3 are defined by

c1 = (1 + (log ℓ)/(log n)) κ2 c_β ℓ^{1−2β},
c2 = (1 + (log ℓ)/(log n)) κ1 ℓ,
c3 = c1 r^{1+2β} + c2 r,

and where c_β only depends on β.

The term L_1(f0 1_{[−ℓ,ℓ]}) provides a description of the variation of the density f0 on [−ℓ, ℓ]. In particular, when the f_I have slow variations on small intervals I, this quantity appears to be small. The extreme case is obtained when f0 is piecewise constant over r intervals included in [−ℓ, ℓ]. In such a case, L_1(f0 1_{[−ℓ,ℓ]}) = 0 and

E[d1(f0, f̂)] ≤ C √( c3 ‖f0 ∗ q‖∞ log n / n ).

Up to the logarithmic term, this bound exactly corresponds to the one that can be obtained with the histogram estimator introduced in the previous section. However, the partition over which the target density is piecewise constant is here completely unknown.


Our result also applies to the situation where the target density f0 is not compactly supported, but has a tail that decreases fast enough. In that case, optimizing our risk bound with respect to ℓ yields:

Corollary 1. Suppose that f0 is a bounded multimodal density with r modes. When there is p > 0 such that ∫_{[−ℓ,ℓ]^c} f0 ≤ e^{−pℓ} for all ℓ, the estimator f̂ defined on F_r satisfies, when n is large enough,

E[d1(f0, f̂)] ≤ C (log log n)^{(2β+1)/(2β+3)} (log n)^{2(1−β)/(2β+3)} n^{−1/(2β+3)},

where C only depends on κ2, β, p, r and ‖f0 ∗ q‖∞.

Hence, provided the density f0 admits an exponential moment, we recover the same rates as in the compact case, up to some logarithmic terms. We point out that the estimator is adaptive with respect to the value of p.

3.3. Piecewise concave-convex densities. We now consider stronger assumptions on the shape of f0 in order to get faster rates of convergence. More precisely, we consider the estimation of a density f0 that is piecewise concave-convex on its support.

We define, for r ≥ 1,

(14)  G_r = { Σ_{I∈m} f_I 1_I, where m ∈ M_r and f_I is either a convex or a concave non-negative function on I },

and, for k ≥ 1,

Q_k = { Σ_{I∈m} f_I 1_I, where m ∈ M_k and where f_I is an affine function such that f = Σ_{I∈m} f_I 1_I is a density }.

It follows from Proposition 5 of [BB16] that the set F defined as

(15)  F = ∪_{k=1}^{∞} (Q_k ∩ G_r)

satisfies Assumption 2 with d_f = 3(k + r + 5) for all f ∈ Q_k ∩ G_r. We may thus apply Theorem 4 to this model.

A convex or concave function on a given interval I is not necessarily differentiable. It is, however, differentiable on both the left and right-hand sides. Such a derivative (on the right or on the left, according to the reader's choice) will be denoted by f′. Then, we set for f ∈ G_r the quantity L_2(f), defined analogously to L_1(f) but with the variations V_˚I(f′_I) of these one-sided derivatives in place of V_I(f_I),

where ˚I denotes the interior of I and V_˚I(·) was introduced in (13). We apply Theorem 4 to the above collection F and deduce the following theorem, whose proof is deferred to Section 5.6.

Theorem 6. Consider r ≥ 1. There exists an estimator f̂ ∈ G_r satisfying, for all integers ℓ and all f0 such that f0 1_{[−ℓ,ℓ]} ∈ G_r,

E[d1(f0, f̂)] ≤ C { L_2(f0 1_{[−ℓ,ℓ]})^{(2β+1)/(2β+5)} (c1 ‖f0 ∗ q‖∞ log n / n)^{2/(2β+5)} + c2^{1/2} (L_2(f0 1_{[−ℓ,ℓ]}) × c1^{−1/2})^{1/(2β+5)} (‖f0 ∗ q‖∞ log n / n)^{(β+2)/(2β+5)} + (c3 ‖f0 ∗ q‖∞ log n / n)^{1/2} + ∫_{[−ℓ,ℓ]^c} f0 },

where C is a universal constant, where c1, c2, c3 are defined by

c1 = (1 + (log ℓ)/(log n)) κ2 c_β ℓ^{1−2β},
c2 = (1 + (log ℓ)/(log n)) κ1 ℓ,
c3 = c1 r^{1+2β} + c2 r,

and where c_β only depends on β.

This result is hence very close to the one obtained for multimodal functions. The main differences lie in the values of the exponents and in the way the variations of f0 are measured on [−ℓ, ℓ].

Provided the target density f0 is compactly supported and convex (or concave) on its support, we can deduce from the previous bound that, for n large enough,

E[d1(f0, f̂)] ≤ C ((log n)/n)^{2/(2β+5)},

for an appropriate value C (not depending on n). This rate is better than the one obtained in Section 3.2 under a monotonicity constraint.

When f0 is piecewise linear on r intervals, namely when f0 ∈ Q_r, then L_2(f0 1_{[−ℓ,ℓ]}) = 0 for ℓ large enough. The estimation rate then automatically becomes parametric (up to a logarithmic term). Moreover, our estimator is adaptive with respect to the intervals on which f0 is linear.

Similarly to the discussion displayed in the previous section, this result can be extended to the situation where the target function is not compactly supported, but has a fast decreasing tail. Such a situation is described in the following corollary.

Corollary 2. Suppose that f0 is either convex or concave on each interval of a finite partition of R of size r, with bounded (left-hand) derivative. When there is p > 0 such that ∫_{[−ℓ,ℓ]^c} f0 ≤ e^{−pℓ} for all ℓ, the estimator f̂ defined on G_r satisfies, when n is large enough,

E[d1(f0, f̂)] ≤ C n^{−2/(2β+5)}, up to logarithmic factors in n.

3.4. Piecewise polynomial densities and application to Besov semi-balls. In the previous results, the rate of convergence appears to be parametric (up to a logarithmic term) when the target density f0 is piecewise constant (or linear) on a given number of intervals, that is, when f0 ∈ P_r ⊂ F_r (or f0 ∈ Q_r ⊂ G_r). It is in fact possible to extend these results to the situation where f0 is piecewise polynomial.

We introduce the set F = F_{k,r} of piecewise polynomial densities defined by

F_{k,r} = { Σ_{I∈m} f_I 1_I, where m ∈ M_k and f_I is a polynomial function of degree at most r such that f = Σ_{I∈m} f_I 1_I is a density }.

According to [Sar18], Assumption 2 is fulfilled for F = F_{k,r} with d_f = (r+2)(k+2). An application of Theorem 4 hence shows:

Theorem 7. For all k, r, there exists an estimator f̂ ∈ F_{k,r} such that for all f ∈ F_{k,r} and ℓ ≥ 1,

E[d1(f0, f̂)] ≤ 3 d1(f0, f 1_{[−ℓ,ℓ]}) + C √( ‖f0 ∗ q‖∞ (κ1 ℓ k(r+1) + c_β κ2 ℓ^{1−2β} (k(r+1))^{1+2β}) log(ℓn) / n ),

where C is universal and where c_β only depends on β.

We therefore get a parametric rate of convergence when the target is piecewise polynomial (up to a logarithmic term). Besides, the knowledge of the intervals on which f0 is polynomial is not necessary to define the estimator. This theorem can be applied to estimate smooth functions (in a sense which is made precise below).

Let α > 0, p > 0 and let f be a compactly supported map on [0, 1]. We set r as the smallest integer larger than α, and define

Δ_h^r f(t) = Σ_{k=0}^{r} (r choose k) (−1)^{r−k} f(t + kh),

ω_p(f, x) = sup_{0<h≤x} [ ∫_{[0,1−rh]} |Δ_h^r f(t)|^p dt ]^{1/p},

and

|f|_{p,∞} = sup_{x∈(0,1)} ω_p(f, x) / x^α.

We then define the Besov semi-ball B^α_{p,∞}(R) as the set of compactly supported densities on [0, 1] such that |f|_{p,∞} ≤ R.
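For intuition, the modulus of smoothness ω_p(f, x) can be approximated numerically as follows (a rough sketch with our own discretization choices; the test density and the parameters α, p, r are arbitrary illustrations).

```python
import numpy as np
from math import comb

# Numerical sketch (for intuition only) of the modulus of smoothness
# behind the Besov semi-ball definition.
def omega_p(f, x, p, r, n_h=50, n_t=2000):
    best = 0.0
    for h in np.linspace(x / n_h, x, n_h):
        t = np.linspace(0.0, 1.0 - r * h, n_t)
        # r-th order finite difference Delta_h^r f(t)
        diff = sum((-1) ** (r - k) * comb(r, k) * f(t + k * h) for k in range(r + 1))
        # L^p quasi-norm on [0, 1 - rh], approximated by a Riemann mean
        norm = (np.mean(np.abs(diff) ** p) * (1.0 - r * h)) ** (1.0 / p)
        best = max(best, norm)
    return best

f = lambda t: 1.5 * np.sqrt(t)        # Beta(3/2, 1) density: steep near 0
alpha, p, r = 1.0, 0.5, 2             # r = smallest integer larger than alpha
print(max(omega_p(f, x, p, r) / x ** alpha for x in np.linspace(0.05, 0.45, 9)))
```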

Estimation is easier over Hölder sets than over Besov sets, smaller values of p leading to additional technical issues.

We now use Theorem 7 and approximation results of [DL93] to get an upper bound on the maximal risk over the collection

B^α_{p,∞}(R, M) = { f ∈ B^α_{p,∞}(R), ‖f ∗ q‖∞ ≤ M }.

More precisely:

Corollary 3. Let p, R, M > 0 and α > (1/p − 1)_+. Then, if r ≥ α + 1 and if k is suitably chosen, the estimator f̂ ∈ F_{k,r} defined in Theorem 7 satisfies: for all f0 ∈ B^α_{p,∞}(R, M),

E[d1(f0, f̂)] ≤ C { R^{(2β+1)/(2β+2α+1)} (M log n / n)^{α/(2β+2α+1)} + (M log n / n)^{1/2} },

where C only depends on κ1, κ2, β, p, r, α. The choice of k depends on α, β, p, R, M and n only.

In the density deconvolution literature, many results can be found regarding the estimation of f0 under regularity constraints. This no longer seems to be true when the L1 loss is considered, and the above rate is, as far as we know, new under our assumptions. Note that we do not work in a standard framework, since the error density q is neither square integrable nor bounded. Moreover, p is allowed to be arbitrarily small, and in particular smaller than 1. Our result is therefore much better than the one that could have been achieved if the model were composed of piecewise polynomial functions on a regular grid of [0, 1] only.

The above maximal risk does not relate to the whole semi-ball B^α_{p,∞}(R) but only to the smaller set B^α_{p,∞}(R, M). However, these two sets coincide in some cases.

It follows from standard results about the inclusion of Besov spaces into Ls spaces that ‖f‖_s ≤ c(‖f‖_p + R) for all f ∈ B^α_{p,∞}(R) and α > 1/p − 1/s. Using this result with s = ∞ and noticing that ‖f‖_p ≤ 1 when p < 1 as f is a density, we deduce that ‖f‖∞ ≤ c(‖f‖_p + 1). By using Young's inequality, we then deduce that the set B^α_{p,∞}(R, M) coincides with the semi-ball B^α_{p,∞}(R) when M = c(R + 1) and α > 1/p, p ∈ (0, 1).

Moreover, when the conditions on q described in Proposition 1 are met, ‖f ∗ q‖∞ ≤ c′ ‖f‖_s for all s > 1/β, with c′ depending only on q, s. By choosing s appropriately, and by applying the reasoning of the previous paragraph, B^α_{p,∞}(R) = B^α_{p,∞}(R, c″(1 + R)) when α > 1/p − β and p ∈ (0, 1).

3.5. A parametric example. In this section, we illustrate our procedure in an example corresponding to parametric assumptions on f0. More precisely, given a density f, we define the model

(16)  F = { f_{m,s}, m ∈ R, s ∈ (0, +∞) },  where f_{m,s}(·) = (1/s) f((· − m)/s).

Saying that f0 belongs to F amounts to saying that f0 is known, up to a location parameter m and a scale parameter s.
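In code, the model (16) is just a two-parameter family (a sketch; the triangular base density f below is our own choice, and it satisfies the hypotheses of Proposition 8 below: positive, log-concave and unimodal with mode 0 on (−1, 1)).

```python
import numpy as np

def f_base(x):
    # Triangular density on (-1, 1): positive there, log-concave, mode at 0.
    return np.maximum(1.0 - np.abs(np.asarray(x, dtype=float)), 0.0)

def f_ms(x, m, s):
    # Element f_{m,s} of the model (16): location m, scale s > 0.
    return f_base((np.asarray(x, dtype=float) - m) / s) / s

print(f_ms(np.array([0.0, 2.0, 3.9]), m=2.0, s=2.0))   # support is (0, 4)
```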

Suppose moreover that q admits a singularity at 0. In such a case, when f0 = f_{m,s} for some m ∈ R and s > 0, the density of the Y_i's can be written as

(f_{m,s} ∗ q)(y) = ∫ f(t) q(y − m − st) dt,  ∀y ∈ R.

Then, we can remark that for any y ∈ R,

sup_{m∈R, s>0} (f_{m,s} ∗ q)(y) ≥ sup_{s>0} ∫ f(t) q(−st) dt ≥ ∫ f(t) ( lim inf_{s→0} q(−st) ) dt,

using Fatou's lemma. Hence, the log-likelihood can take infinite values, and the maximum likelihood estimator is not defined.

An alternative approach would be to use estimators of f0 ∗ q leading to results for the L1 loss, like, for instance, the T-estimation method [Bir06] or the combinatorial method [DL12]. However, such procedures would lead to risk bounds for E[d1(f0 ∗ q, f_{m̂,ŝ} ∗ q)] instead of E[d1(f0, f_{m̂,ŝ})]. Young's inequality asserts that E[d1(f0 ∗ q, f_{m̂,ŝ} ∗ q)] ≤ E[d1(f0, f_{m̂,ŝ})], but the converse is unfortunately not true in general.

To apply our procedure, we have to check that Assumption 2 holds. This is the aim of the next proposition (proved in Section 5.7).

Proposition 8. Suppose that f is a positive density on (−1, 1) vanishing outside (−1, 1), which is log-concave on (−1, 1) and unimodal with mode at 0. Then, Assumption 2 is fulfilled with d_{f_{m,s}} = 5 and ℓ_{m,s} ≤ |m| + s + 1.

A direct application of Theorem 4 leads to the following result.

Theorem 9. Consider the collection F defined by (16) and suppose that f satisfies the conditions of Proposition 8. Then, there exists an estimator f̂ of the form f̂ = f_{m̂,ŝ} such that

(17)  E[d1(f0, f̂)] ≤ 3 inf_{m∈R, s>0} { d1(f0, f_{m,s}) + √( c_{m,s} ‖f0 ∗ q‖∞ log n / n ) },

where in the above inequality

c_{m,s} = c (|m| + s + 1) (1 + log(|m| + s + 1)/log n)

for a term c depending only on β, κ1 and κ2.

When f0 does belong to the model F, that is, when there exist m0, s0 such that f0 = f_{m0,s0}, we get

E[d1(f0, f̂)] ≤ 3 √( c_{m0,s0} ‖f0 ∗ q‖∞ log n / n ).

Up to a logarithmic term, we recover the expected square-root rate of parametric density estimation. Let us mention that we do not assume that f0 lies in the model in order to get (17). Since d1(f0, f̂) ≥ d1(f0, F) for all f̂ ∈ F, an estimator can perform well only when there exists an element of F close to f0; conversely, (17) shows that as soon as this is the case, the risk of our estimator remains under control. In other words, our estimator f̂ is robust with respect to model misspecification measured through the L1 loss.

4. Estimation procedure

This section is devoted to the construction of the estimator f̂ ∈ F leading to the results displayed in Theorem 4.

4.1. Probability estimators. In Section 2.2, we proposed simple estimators Z_n(I) of P(X ∈ I). Unfortunately, the only deviation bounds that can be established for these quantities come from Chebyshev's inequality (as Z(I) may not admit moments of order larger than 2). Such bounds are too rough to be used in the construction of our estimator f̂ of f0. This is why we propose in the following new probability estimators for which uniform bounds in deviation can be proved. Their construction involves several steps.

Step 1. We consider an interval I of finite length and begin by defining a median-of-means estimator of the probability P(X ∈ I). We refer, e.g., to [DLLO16] for more details regarding this approach.

Let δ > 0 be a parameter whose value will be specified later on. When either δ ≥ n − 1 or |I| = 0, we set Ẑ_δ(I) = 0. Otherwise, we split the data (Y_1, . . . , Y_n) into r ∈ (δ, δ + 1] parts b_1, . . . , b_r. Each part should be approximately of the same size, and more precisely such that |b_k| ∈ (n/r, n/r + 1]. For each k ∈ {1, . . . , r}, we define the empirical mean based on the observations of the k-th block only:

Z_{b_k}(I) = (1/(2π|b_k|)) Σ_{Y_j∈b_k} ∫ \overline{1_I^⋆(t)} [q⋆(t)]^{−1} e^{itY_j} dt.

We then define Ẑ_δ(I) as any empirical median of {Z_{b_1}(I), . . . , Z_{b_r}(I)}, that is, as any real number such that

|{k ∈ {1, . . . , r}, Z_{b_k}(I) ≤ Ẑ_δ(I)}| = |{k ∈ {1, . . . , r}, Z_{b_k}(I) ≥ Ẑ_δ(I)}|.
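A minimal sketch of this step, reusing Z_n and the sample Y from the snippet of Section 2.2 (our simplified version: equal-size splitting, and numpy's tie-breaking for the empirical median):

```python
# Median-of-means version of the probability estimator (a sketch).
def Z_mom(a, b, Y, delta):
    n = len(Y)
    if delta >= n - 1:
        return 0.0                      # convention: hat Z_delta(I) = 0
    r = int(np.floor(delta)) + 1        # number of blocks, r in (delta, delta+1]
    blocks = np.array_split(Y, r)
    return float(np.median([Z_n(a, b, blk) for blk in blocks]))

print(Z_mom(1.0, 3.0, Y, delta=8.0))    # more robust estimate of P(X in [1,3])
```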

Step 2. We extend the previous definition of Ẑ_δ(I) to unions I of intervals. In the sequel, we denote I_∞ = ∪_{d=1}^{∞} I_d, and the closure of a set I by Ī.

Every union of intervals I ∈ I_∞ can be written as I = ∪_{j=1}^{d_I} I_j, where d_I ∈ N⋆ and the I_j are intervals such that Ī_j ∩ Ī_{j′} = ∅ for all j ≠ j′. We stress that d_I and the intervals Ī_j are defined in a unique way. Due to the additivity of measures, it is then natural to set

Ẑ_δ(I) = Σ_{j=1}^{d_I} Ẑ_δ(Ī_j).

Step 3. One may already show that each estimator Ẑ_δ(I) is close to P(X ∈ I) with high probability. However, we need this result to hold uniformly over all I ∈ I_∞. Hence, we add a technical discretization step.


For any d, j, ℓ ∈ N⋆, we introduce the grid

G_{d,j,ℓ} = { −ℓ + kℓ 2^{−(j−1)}/d, k ∈ {0, . . . , d 2^j} } ⊂ [−ℓ, ℓ],

and define the set I_{d,j,ℓ} ⊂ I_d of unions of at most d intervals whose endpoints lie in G_{d,j,ℓ}. For all unions of intervals I ∈ I_∞, let ℓ_I be the smallest integer such that I ⊂ [−ℓ_I, ℓ_I], and let π_j(I) be the largest set of I_{d_I,j,ℓ_I} included in I. If no such set exists, π_j(I) = ∅.

Consider ξ ≥ 1 and δ_j(I) = 2 log(2jℓ_I d_I) + 2 log(1 + d_I 2^{j+1}) for all (I, j) ∈ I_∞ × N⋆. We define for all I ∈ I_∞ our final estimator Ẑ_{n,ξ}(I) of P(X ∈ I) by

Ẑ_{n,ξ}(I) = Ẑ_{ξ+δ_1(I)}(π_1(I)) + Σ_{j=1}^{∞} Ẑ_{ξ+δ_j(I)}(π_{j+1}(I) \ π_j(I)).

This sum is composed of a finite number of terms, as ξ + δ_j(I) exceeds n − 1 for j large enough. We prove in Section 5.8 the following result.

Proposition 10. Consider ξ ≥ 1. Then, there is an event of probability lower bounded by 1 − e^{−ξ} on which: for all I ∈ I_∞,

|Ẑ_{n,ξ}(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ (κ1 ℓ_I d_I + c_β κ2 ℓ_I^{1−2β} d_I^{2β+1}) (ξ + log(ℓ_I d_I)) / n ).

In the above inequality, C denotes a universal constant and c_β only depends on β.

4.2. Estimation procedure. We consider a collection F satisfying Assumption 2 and propose a theoretical procedure, in line with the combinatorial method of [DL12], to define an estimator f̂ ∈ F satisfying (9). Examples of such collections are given in Sections 3.2, 3.3, 3.4 and 3.5.

Our procedure relies on Scheffé's lemma, which entails that

d1(f0, f) = 2 sup_{I∈B(R)} ( ∫_I f − ∫_I f0 ),

where the supremum is taken over all Borel sets of R. Actually, this supremum is reached for the set I⋆ = {x ∈ R, f(x) ≥ f0(x)}.

According to Assumption 2, I⋆ should be a union of intervals when f, f0 ∈ F. We may then approximate the probability ∫_I f0 = P(X ∈ I) by our estimator Ẑ_{n,ξ}(I). This yields an estimator γ_ξ(f) of d1(f0, f) to be minimized over f ∈ F.

More precisely, define for d ≥ 1 the set Ĩ_d ⊂ I_d of unions of at most d intervals of finite length with endpoints in Q. For any ξ ≥ 1, we introduce the criterion γ_ξ(·) defined for f ∈ F by

γ_ξ(f) = sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} | ∫_I f − Ẑ_{n,ξ}(I) |.

Then, we consider estimators f̂ ∈ F satisfying

(18)  γ_ξ(f̂) ≤ inf_{f∈F} γ_ξ(f) + 1/n.
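To fix ideas, here is a toy version of the criterion and of the selection rule (18) for a finite model of densities evaluated on a grid (our own simplification, not the paper's procedure: the supremum over Ĩ_{d_f} is replaced by a maximum over the Scheffé sets [f > g], g ∈ F, and Ẑ_{n,ξ} by the median-of-means estimator Z_mom sketched above).

```python
# Toy combinatorial selection on a grid (a sketch).
xs = np.linspace(-1.0, 7.0, 801)
dx = xs[1] - xs[0]

def runs(mask):
    # Maximal runs of True in the grid mask -> list of interval endpoints.
    m = np.concatenate(([False], mask, [False]))
    d = np.flatnonzero(m[1:] != m[:-1])
    return [(xs[i], xs[j - 1]) for i, j in zip(d[::2], d[1::2])]

def gamma_hat(f, model, Y):
    # max over Scheffe sets [f > g] of | int_I f - Zhat(I) |.
    best = 0.0
    for g in model:
        mask = f(xs) > g(xs)
        Zhat = sum(Z_mom(a, b, Y, delta=8.0) for a, b in runs(mask))
        best = max(best, abs(np.sum(f(xs)[mask]) * dx - Zhat))
    return best

# Candidate densities (normals with different means, an arbitrary choice).
model = [lambda x, mu=mu: np.exp(-(x - mu) ** 2 / 2) / np.sqrt(2 * np.pi)
         for mu in [0.0, 1.0, 2.0, 3.0]]
f_hat = min(model, key=lambda f: gamma_hat(f, model, Y))
```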


Assumption 2 asserts that there is a dense and at most countable subset F̄ of F. The lemma below ensures that we may always define f̂ as an element of this set F̄, so as to avoid possible measurability issues.

Lemma 1. There exists an estimator f̂ ∈ F̄ satisfying (18).

Note that there is no uniqueness in the way our estimator is defined. However, Theorem 4 applies to any of the estimators f̂ satisfying (18).

Let us mention that there are algorithmic issues concerning the computation of this estimator that are far beyond the scope of this paper. This is the counterpart of getting a versatile procedure. This weakness also appears, for instance, in the original combinatorial method of [DL12], in the procedures based on robust tests, see [Bir06], and in the ρ-estimation procedure [BBS17]. As is said in the introduction of [Bir12], we cannot have our cake and eat it.

4.3. About ξ. Our estimator explicitly depends on the desired confidence level for the risk bound (9). This is due to the use of a median-of-means estimator in the construction of Ẑ_{n,ξ}. It is however possible to substitute a sub-Gaussian estimator for it to bypass this problem, see [DLLO16]. This requires, however, knowing an upper bound M on ‖f0 ∗ q‖∞.

In order to avoid redundancy with the previous section, we do not rewrite here the entire construction of the probability estimators. Let us simply mention that they will no longer depend on ξ, but on M. We make a slight abuse of notation by writing them Ẑ_{n,M}(·). We have:

Proposition 11. Let M ≥ ‖f0 ∗ q‖∞ be a known number. Then there exists an estimator Ẑ_{n,M}(I) of P(X ∈ I) such that, for all ξ ≥ 1, with probability larger than 1 − e^{−ξ}: for all I ∈ I_∞,

|Ẑ_{n,M}(I) − P(X ∈ I)| ≤ C √( M (κ1 ℓ_I d_I + c_β κ2 ℓ_I^{1−2β} d_I^{2β+1}) (ξ + log(ℓ_I d_I)) / n ).

In the above inequality, C denotes a universal constant and c_β only depends on β.

A sketch of its proof is given in Section 5.11. We may therefore use the same approach as above to estimate f0. By replacing Ẑ_{n,ξ} by Ẑ_{n,M}, we get a new estimator associated with the following result (for a description of the procedure, see Section 5.12).

Theorem 12. Let F be a model fulfilling Assumption 2 and let M ≥ ‖f0 ∗ q‖∞ be a known number. There exists an estimator f̂ ∈ F such that for all ξ ≥ 1, with probability larger than 1 − e^{−ξ},

d1(f0, f̂) ≤ 3 inf_{f∈F} { d1(f0, f) + c √( M (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{2β+1}) (ξ + log(ℓ_f d_f)) / n ) },

where c is a universal constant and c_β only depends on β.

If we are interested in bounds in expectation, we can notice that we obtain a parametric rate of convergence as soon as the bias term vanishes for a function f corresponding to a term ℓ_f d_f not depending on n. Hence, some of our results can be improved when the bound M is available. For instance, the rate of convergence associated with the estimation of a piecewise constant (or, more generally, polynomial) density on a given number of (unknown) intervals becomes parametric (see, e.g., Theorem 7). Similarly, the logarithmic term in Theorem 9 can be omitted in this situation. However, when the infimum is reached for some ℓ_f d_f depending in a polynomial way on n, there is no gain to expect from the knowledge of M.

In general, when the quantity M is unknown, one can split the sample into two parts, one of them allowing an estimation of this term. This issue is quite standard in statistics. It appears for instance in density estimation for some procedures based on penalized L2 criteria. It is actually possible to obtain sharp bounds when the density f0 ∗ q is smooth enough. We refer to [BM97] for more details.

5. Proofs

5.1. An elementary inequality. For ease of exposition, we state here a simple inequality that will be used throughout the proofs.

Lemma 2. For any non-negative sequence (a_j)_{j∈N}, k ∈ N⋆ and γ ∈ (0, 1), we have

Σ_{j=1}^{k} a_j^{1−γ} ≤ ( Σ_{j=1}^{k} a_j )^{1−γ} k^{γ}.

This bound is obtained by a direct application of the Hölder inequality with exponents 1/(1−γ) and 1/γ: Σ_{j=1}^{k} a_j^{1−γ} · 1 ≤ (Σ_{j=1}^{k} a_j)^{1−γ} (Σ_{j=1}^{k} 1)^{γ}.

5.2. Sketch of the proof of Proposition 1. Define for t ∈ R,

I(t) = ∫_{−1}^{1} |x|^{β−1} e^{itx} dx.

Then elementary computations give

|t|^β I(t) = 2 ∫_{0}^{|t|} |y|^{β−1} cos(y) dy.

Moreover, when |t| → +∞, |t|^β I(t) → 2 cos(πβ/2) Γ(β), where Γ(·) denotes the Gamma function. In particular, this limit is finite and non-zero.

Setting ψ(x) = |x|^{β−1}(ϕ(x) − ϕ(0)) for all x ∈ R, we can write

∫_{−1}^{1} q(x) e^{itx} dx = ϕ(0) I(t) + ∫_{−1}^{1} ψ(x) e^{itx} dx.

On [−1, 1], ψ is bounded and its derivative is integrable. Therefore, an integration by parts shows

∫_{−1}^{1} ψ(x) e^{itx} dx = O(1/t).

Likewise, on (−∞, −1] ∪ [1, +∞), q is bounded and its derivative is integrable, and thus

∫_{(−∞,−1]∪[1,+∞)} q(x) e^{itx} dx = O(1/t).

Hence q⋆(t) = ϕ(0) I(t) + O(1/t) as t → +∞, from which we deduce that (|t|^β |q⋆(t)|)^{−1} admits a finite limit when |t| → +∞. As q⋆ does not vanish and is continuous, |q⋆(t)|^{−1} is bounded above on [−1, 1] and (|t|^β |q⋆(t)|)^{−1} is bounded above on (−∞, −1] ∪ [1, +∞). This ends the proof. □

5.3. Proof of (3) when f0 ∉ L2(R). Without loss of generality, we may suppose that I is an interval, say I = [a, b]. Let K be a density belonging to L2(R), h > 0 and K_h(·) = (1/h) K(·/h). Then, if F0 denotes the cumulative distribution function of X,

∫ 1_I(t)(f0 ∗ K_h)(t) dt − ∫ 1_I(t) f0(t) dt = ∫ K(x) ( F0(b − xh) − F0(b) + F0(a) − F0(a − xh) ) dx.

By using the dominated convergence theorem, we deduce

lim_{h→0} [ ∫ 1_I(t)(f0 ∗ K_h)(t) dt − ∫ 1_I(t) f0(t) dt ] = 0.

Yet f0 ∗ K_h ∈ L2(R), as is the case for K_h, and by the Plancherel isometry,

lim_{h→0} (1/(2π)) ∫ \overline{1_I^⋆(t)} f0⋆(t) K_h⋆(t) dt = ∫ 1_I(t) f0(t) dt.

Note that \overline{1_I^⋆(t)} f0⋆(t) = ( \overline{1_I^⋆(t)} [q⋆(t)]^{−1} ) × ( q⋆(t) f0⋆(t) ). This map is a product of two square integrable functions (q⋆ f0⋆ belongs to L2(R) as q ∗ f0 does, since this density is bounded) and is therefore integrable. We may thus use lim_{h→0} K_h⋆(t) = K⋆(0) = 1 and the dominated convergence theorem to get

(1/(2π)) ∫ \overline{1_I^⋆(t)} f0⋆(t) dt = ∫ 1_I(t) f0(t) dt,

as wished. □

5.4. Proof of Proposition 2. For any I ∈ I_d, we can always write I = ∪_{j=1}^{k} I_j, where k ≤ d and where the I_j are disjoint intervals. We successively use the Fourier–Plancherel theorem and the Cauchy–Schwarz inequality to get

E[(Z(I))²] ≤ (‖f0 ∗ q‖∞/(2π)) ∫ |1_I^⋆(t)/q⋆(t)|² dt = (‖f0 ∗ q‖∞/(2π)) ∫ | Σ_{j=1}^{k} 1_{I_j}^⋆(t)/q⋆(t) |² dt ≤ (‖f0 ∗ q‖∞/(2π)) × k × Σ_{j=1}^{k} ∫ |1_{I_j}^⋆(t)/q⋆(t)|² dt.

For any j ∈ {1, . . . , k} and t ∈ R, we have

|1_{I_j}^⋆(t)| = 2 |sin(t|I_j|/2)| / |t|,

since I_j is an interval of finite length |I_j|. Hence, thanks to Assumption 1,

∫ |1_{I_j}^⋆(t)/q⋆(t)|² dt = 4 ∫ sin²(t|I_j|/2) / (t² |q⋆(t)|²) dt ≤ 4 ∫ (sin²(t|I_j|/2)/t²) (κ1 + κ2 |t|^{2β}) dt ≤ 4 κ1 (|I_j|/2) ∫ (sin²(t)/t²) dt + 4 κ2 (|I_j|/2)^{1−2β} ∫ sin²(t) |t|^{2β−2} dt.

By computing these integrals,

∫ |1_{I_j}^⋆(t)/q⋆(t)|² dt ≤ 2 κ1 π |I_j| + (4 κ2 sin(πβ) Γ(2β)/(1 − 2β)) |I_j|^{1−2β},

where Γ(·) denotes the Gamma function. Finally, we get

E[(Z(I))²] ≤ (‖f0 ∗ q‖∞/(2π)) × k × ( 2 κ1 π Σ_{j=1}^{k} |I_j| + (4 κ2 sin(πβ) Γ(2β)/(1 − 2β)) Σ_{j=1}^{k} |I_j|^{1−2β} ) ≤ ‖f0 ∗ q‖∞ × d × ( κ1 |I| + (2 κ2 sin(πβ) Γ(2β)/(π(1 − 2β))) d^{2β} |I|^{1−2β} ),

where we have used k ≤ d, Σ_{j=1}^{k} |I_j| = |I| and Lemma 2 with γ = 2β to conclude. □

5.5. Proof of Proposition 3. Let f̄_m = Σ_{I∈m} ((∫_I f0)/|I|) 1_I. For all I ∈ m and α ∈ R,

∫_I |f0 − (1/|I|) ∫_I f0| ≤ ∫_I |f0 − α| + |α|I| − ∫_I f0| = ∫_I |f0 − α| + |∫_I (α − f0)| ≤ 2 ∫_I |f0 − α|.

Therefore,

d1(f0, f̄_m) = Σ_{I∈m} ∫_I |f0 − (1/|I|) ∫_I f0| + ∫_{R\∪_{I∈m}I} f0 ≤ 2 d1(f0, F_m).

Moreover,

E[d1(f̂_m, f̄_m)] = Σ_{I∈m} E[|Z_n(I) − P(X ∈ I)|] ≤ Σ_{I∈m} √( E[|Z_n(I) − P(X ∈ I)|²] ) ≤ (1/n^{1/2}) Σ_{I∈m} √( E[(Z(I))²] ).

By using Proposition 2 and Lemma 2,

E[d1(f̂_m, f̄_m)] ≤ (‖f0 ∗ q‖∞^{1/2}/n^{1/2}) [ κ1^{1/2} Σ_{I∈m} |I|^{1/2} + c_β^{1/2} κ2^{1/2} Σ_{I∈m} |I|^{1/2−β} ] ≤ (‖f0 ∗ q‖∞^{1/2}/n^{1/2}) [ κ1^{1/2} ℓ^{1/2} |m|^{1/2} + c_β^{1/2} κ2^{1/2} ℓ^{1/2−β} |m|^{1/2+β} ].

The proof then follows from the triangle inequality E[d1(f0, f̂_m)] ≤ d1(f0, f̄_m) + E[d1(f̂_m, f̄_m)]. □

5.6. Proofs of Theorems 5 and 6. We need the elementary claim displayed below.

Claim 1. For all positive numbers A, a, b, n, α,

inf_{k≥1} { A/k^α + √( (a k^{1+2β} + b k)/n ) } ≤ C { A^{(2β+1)/(2α+2β+1)} (a/n)^{α/(2α+2β+1)} + b^{1/2} (A/√a)^{1/(2α+2β+1)} n^{−(α+β)/(2α+2β+1)} + √( (a + b)/n ) },

where C only depends on α and β.

Proof of Theorem 5. We apply Theorem 4, use d1(f0, f) ≤ d1(f0 1_{[−ℓ,ℓ]}, f) + ∫_{[−ℓ,ℓ]^c} f0 and Lemma 4 in Appendix B to get, for all ℓ ≥ 1,

E[d1(f0, f̂)] ≤ C′ inf_{k≥1} { L_1(f0 1_{[−ℓ,ℓ]})/k + √( ‖f0 ∗ q‖∞ (κ1 ℓ (k+r+1) + κ2 c_β ℓ^{1−2β} (k+r+1)^{1+2β}) log(ℓn) / n ) + ∫_{[−ℓ,ℓ]^c} f0 }.

We then use Claim 1. □

Proof of Theorem 6. Similarly, we apply Theorem 4, use d1(f0, f) ≤ d1(f0 1_{[−ℓ,ℓ]}, f) + ∫_{[−ℓ,ℓ]^c} f0, and deduce from Lemma 5 in Appendix B that for all ℓ ≥ 1,

E[d1(f0, f̂)] ≤ C′ inf_{k≥1} { L_2(f0 1_{[−ℓ,ℓ]})/k² + √( ‖f0 ∗ q‖∞ (κ1 ℓ (k+r+1) + κ2 c_β ℓ^{1−2β} (k+r+1)^{1+2β}) log(ℓn) / n ) + ∫_{[−ℓ,ℓ]^c} f0 }.

We finally use Claim 1. □

5.7. Proof of Proposition 8. We set

F̄ = { f_{m̄,s̄}, m̄ ∈ Q, s̄ ∈ Q ∩ (0, +∞) }.

For any (m, s) ∈ R × (0, +∞), there exists a sequence (m̄_r, s̄_r)_r ∈ ((Q ∩ [−|m|, |m|]) × (Q ∩ (0, s]))^N converging to (m, s). Since f is continuous almost everywhere, (f_{m̄_r,s̄_r})_r converges almost everywhere to f_{m,s}. Scheffé's lemma then ensures that the convergence holds in the L1 space. This proves the second part of Assumption 2.

The proof of the first part is elementary but a bit tedious. We need to show that [f_{m,s} > f_{m′,s′}] is a union of at most 5 intervals. We may assume without loss of generality that m′ = 0 and s′ = 1, in which case f_{0,1} = f. We moreover suppose in the sequel that m ≥ 0; the proof is similar when m is non-positive.

Let I = (−1, 1) ∩ (m − s, m + s). Note that

[f_{m,s} > f] ∩ I^c = (m − s, m + s) ∩ (−1, 1)^c

is a union of at most 2 intervals. Since f_{m,s} − f is non-decreasing on I ∩ [0, m], the set [f_{m,s} > f] ∩ I ∩ [0, m] is an interval.

We now show that [f_{m,s} > f] ∩ I ∩ (m, +∞) is an interval. Let x ∈ I ∩ (m, +∞). We distinguish two cases, according to the position of s with respect to 1.

• Suppose first that s ≥ 1. Let f2 be the concave function f2 = log f. In the following, f2′ denotes either the left or the right derivative of f2. Since f2′ is non-increasing and non-positive on [0, 1), and (x − m)/s < x, we deduce that

f2′(x) ≤ f2′((x − m)/s) ≤ f2′((x − m)/s)/s.

Hence, ϕ(x) = f2((x − m)/s) − f2(x) is non-decreasing on I ∩ (m, +∞). Therefore, the set

[f_{m,s} > f] ∩ I ∩ (m, +∞) = [ϕ > log s] ∩ I ∩ (m, +∞)

is an interval.

• Suppose now that s < 1. When x ≤ (x − m)/s,

f2′((x − m)/s)/s ≤ f2′((x − m)/s) ≤ f2′(x).

The map ϕ is therefore non-increasing on I ∩ [m/(1 − s), +∞) and

[f_{m,s} > f] ∩ I ∩ [m/(1 − s), +∞)

is an interval. Moreover, this interval is either empty or contains m/(1 − s). When x ≥ (x − m)/s, f(x) ≤ f((x − m)/s) < f((x − m)/s)/s. Then,

[f_{m,s} > f] ∩ I ∩ (m, m/(1 − s)) = (m, m/(1 − s))

is an interval. We derive that

[f_{m,s} > f] ∩ I ∩ (m, +∞) = ([f_{m,s} > f] ∩ I ∩ [m/(1 − s), +∞)) ∪ (m, m/(1 − s))

is an interval.

We now show that [f_{m,s} > f] ∩ I ∩ (−∞, 0) is an interval. For this purpose, remark that f is non-decreasing and f2′ is both non-negative and non-increasing on (−∞, 0) ∩ I.

• When s ≥ 1, a reasoning similar to the previous one shows that, when x ∈ I ∩ (−∞, 0),

f_{m,s}(x) ≤ f(x) when (x − m)/s ≤ x,  and  ϕ′(x) ≤ 0 when (x − m)/s ≥ x.

Therefore,

[f_{m,s} > f] ∩ I ∩ (−∞, 0) = [ϕ > log s] ∩ I ∩ (−∞, −m/(s − 1))

is an interval.

• When s < 1, (x − m)/s ≤ x when x ∈ I ∩ (−∞, 0), and thus ϕ′(x) ≥ 0. This implies that

[f_{m,s} > f] ∩ I ∩ (−∞, 0) = [ϕ > log s] ∩ I ∩ (−∞, −m/(s − 1)]

is an interval.


5.8. Proof of Proposition 10.

Claim 2. For all I ∈ I_d,

(19)  P(X ∈ I) ≤ √( ‖f0 ∗ q‖∞ [κ1 d |I| + c_β κ2 d^{1+2β} |I|^{1−2β}] ).

Proof. We use Proposition 2 and Jensen's inequality: E[Z(I)] ≤ √(E[(Z(I))²]). □

Claim 3. For all δ > 0 and all intervals I ∈ I_1, the following assertion holds true on an event of probability lower bounded by 1 − e^{−δ}:

(20)  |Ẑ_δ(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ [κ1 |I| + c_β κ2 |I|^{1−2β}] (1 + δ) / n ).

In this inequality, C is a universal constant and c_β only depends on β.

Proof. The proof is straightforward when |I| = 0. Suppose now that |I| > 0. When δ < n − 1, this inequality comes from standard results about median-of-means estimators, see Section 4.1 of [DLLO16]. When δ is larger, Ẑ_δ(I) = 0, and (20) ensues from (19). □

Claim 4. For all ξ > 0, the following assertion holds true on an event of probability lower bounded by 1 − e^{−ξ}: for all d, j, ℓ and all intervals I ∈ I_1 with endpoints lying in G_{d,j,ℓ},

(21)  |Ẑ_{ξ+δ_{d,j,ℓ}}(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ [κ1 |I| + c_β κ2 |I|^{1−2β}] (ξ + 1 + δ_{d,j,ℓ}) / n ),

where δ_{d,j,ℓ} = 2 log(jdℓ) + 2 log(1 + d2^j). In this inequality, C is a universal constant and c_β only depends on β.

Proof. For all δ > 0 and intervals I, let E(δ, I) be the event on which (20) is true. Let I′_{d,j,ℓ} = I_{d,j,ℓ} ∩ I_1 be the collection of intervals with endpoints in G_{d,j,ℓ}. Then, (21) holds true on

E = ∩_{(d,j,ℓ)∈(N⋆)³} ∩_{I∈I′_{d,j,ℓ}} E(ξ + δ_{d,j,ℓ}, I).

Moreover, Bonferroni's inequality asserts that

P(E^c) ≤ Σ_{(d,j,ℓ)∈(N⋆)³} |I′_{d,j,ℓ}| P(E^c(ξ + δ_{d,j,ℓ}, I)) ≤ e^{−ξ} Σ_{(d,j,ℓ)∈(N⋆)³} (1 + d2^j)² e^{−2 log(jdℓ) − 2 log(1+d2^j)} ≤ e^{−ξ}. □

Claim 5. For all ξ > 0, the following assertion holds true on an event of probability lower bounded by 1 − e^{−ξ}: for all d, j, ℓ and I ∈ I_{d,j,ℓ},

|Ẑ_{ξ+δ_{d,j,ℓ}}(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ [κ1 d |I| + c_β κ2 d^{1+2β} |I|^{1−2β}] (ξ + 1 + δ_{d,j,ℓ}) / n ).


Proof. Let I ∈ I_{d,j,ℓ} be written as I = ∪_{k=1}^{d} I_k, with Ī_{k1} ∩ Ī_{k2} = ∅ for all k1 ≠ k2. Then,

|Ẑ_{ξ+δ_{d,j,ℓ}}(I) − P(X ∈ I)| ≤ Σ_{k=1}^{d} |Ẑ_{ξ+δ_{d,j,ℓ}}(Ī_k) − P(X ∈ Ī_k)|.

Note that each Ī_k belongs to I′_{d,j,ℓ} = I_{d,j,ℓ} ∩ I_1. Therefore, on the event defined in the preceding claim,

|Ẑ_{ξ+δ_{d,j,ℓ}}(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ (ξ + 1 + δ_{d,j,ℓ}) / n ) Σ_{k=1}^{d} √( κ1 |I_k| + c_β κ2 |I_k|^{1−2β} ).

We finally use the Cauchy–Schwarz inequality and Lemma 2. □

Claim 6. For all ξ > 0, the following assertion holds true on an event of probability lower bounded by 1 − e^{−ξ}: for all j and all I ∈ I_∞ with endpoints lying in G_{2d_I,j,ℓ_I},

|Ẑ_{ξ+δ_j(I)}(I) − P(X ∈ I)| ≤ C √( ‖f0 ∗ q‖∞ [κ1 d_I |I| + c_β κ2 d_I^{1+2β} |I|^{1−2β}] (ξ + 1 + δ_j(I)) / n ).

In this inequality, C is a universal constant and c_β only depends on β.

Proof. We apply the preceding claim, use that δ_j(I) = δ_{2d_I,j,ℓ_I} and increase C, c_β. □

Proof of Proposition 10. For all j and I ∈ I_∞, |π_{j+1}(I) \ π_j(I)| ≤ ℓ_I 2^{−j+2} and π_{j+1}(I) \ π_j(I) belongs to I_{2d_I,j,ℓ_I}. The preceding claim then ensures:

|Ẑ_{ξ+δ_j(I)}(π_{j+1}(I) \ π_j(I)) − P(X ∈ π_{j+1}(I) \ π_j(I))| ≤ C √( ‖f0 ∗ q‖∞ (κ1 d_I ℓ_I 2^{−j} + c_β κ2 d_I^{1+2β} ℓ_I^{1−2β} 2^{−(1−2β)j}) (ξ + 1 + 2 log(2jℓ_I d_I) + 2 log(1 + d_I 2^{j+1})) / n ).

Therefore,

Σ_{j=1}^{∞} |Ẑ_{ξ+δ_j(I)}(π_{j+1}(I) \ π_j(I)) − P(X ∈ π_{j+1}(I) \ π_j(I))| ≤ C′ √( ‖f0 ∗ q‖∞ (κ1 ℓ_I d_I + c′_β κ2 ℓ_I^{1−2β} d_I^{2β+1}) (ξ + log(ℓ_I d_I)) / n ),

where C′ is a universal constant and where c′_β depends on β only. Since

P(X ∈ I) = P(X ∈ π_1(I)) + Σ_{j=1}^{∞} P(X ∈ π_{j+1}(I) \ π_j(I)),

we obtain

|Ẑ_{n,ξ}(I) − P(X ∈ I)| ≤ C′ √( ‖f0 ∗ q‖∞ (κ1 ℓ_I d_I + c′_β κ2 ℓ_I^{1−2β} d_I^{2β+1}) (ξ + log(ℓ_I d_I)) / n ) + |Ẑ_{ξ+δ_1(I)}(π_1(I)) − P(X ∈ π_1(I))|.


We use Claim 6 again to bound the last term. □

5.9. Proof of Lemma 1. Let f be an arbitrary function of F. Thanks to Assumption 2, there exists f′ ∈ F̄ such that d_{f′} ≤ d_f, ℓ_{f′} ≤ ℓ_f and d1(f, f′) ≤ 1/n. Then,

γ_ξ(f′) ≤ sup_{I∈Ĩ_{d_{f′}}, I⊂[−ℓ_{f′},ℓ_{f′}]} { |∫_I f′ − ∫_I f| + |∫_I f − Ẑ_{n,ξ}(I)| }.

By Scheffé's identity,

γ_ξ(f′) ≤ d1(f, f′)/2 + sup_{I∈Ĩ_{d_{f′}}, I⊂[−ℓ_{f′},ℓ_{f′}]} |∫_I f − Ẑ_{n,ξ}(I)| ≤ 1/(2n) + γ_ξ(f).

We thus deduce that

inf_{f∈F̄} γ_ξ(f) ≤ 1/(2n) + inf_{f∈F} γ_ξ(f).

Therefore, any f̂ ∈ F̄ such that

γ_ξ(f̂) ≤ inf_{f∈F̄} γ_ξ(f) + 1/(2n)

satisfies condition (18). □

5.10. Proof of Theorem 4.

Lemma 3. Let f̂ ∈ F be an estimator satisfying (18). Then,

d1(f0, f̂) ≤ inf_{f∈F} { 3 d1(f0, f) + 4 sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |Ẑ_{n,ξ}(I) − ∫_I f0| + 2/n }.

Proof. For all f ∈ F, we define I = [f > f̂] when ℓ_f ≤ ℓ_{f̂}, and I = [f̂ > f] otherwise. Note that I ∈ I_{d_f} ∩ I_{d_{f̂}} and is included in [−ℓ_f, ℓ_f] ∩ [−ℓ_{f̂}, ℓ_{f̂}], as the functions of F are non-negative. Since the functions f ∈ F are densities,

d1(f, f̂) = 2 |∫_I f − ∫_I f̂|.

There exists Ī ∈ Ĩ_{d_f} ∩ Ĩ_{d_{f̂}} included in I such that

|∫_I f − ∫_Ī f| + |∫_I f̂ − ∫_Ī f̂| ≤ 1/(2n).

Therefore,

(22)  d1(f, f̂) ≤ 2γ_ξ(f) + 2γ_ξ(f̂) + 1/n ≤ 4γ_ξ(f) + 2/n,

by using (18). Now,

γ_ξ(f) ≤ sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |∫_I f0 − ∫_I f| + sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |∫_I f0 − Ẑ_{n,ξ}(I)|.

By using Scheffé's identity,

γ_ξ(f) ≤ d1(f0, f)/2 + sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |∫_I f0 − Ẑ_{n,ξ}(I)|.

This inequality, together with (22), entails that

d1(f, f̂) ≤ 2 d1(f0, f) + 4 sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |∫_I f0 − Ẑ_{n,ξ}(I)| + 2/n.

We conclude using the triangle inequality d1(f0, f̂) ≤ d1(f0, f) + d1(f, f̂). □

Proof of Theorem 4. Combining Proposition 10 and Lemma 3 implies that, for all f ∈ F, with probability larger than 1 − e^{−ξ},

(23)  d1(f0, f̂) ≤ 3 d1(f0, f) + C′ √( ‖f0 ∗ q‖∞ (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{2β+1}) (ξ + log(ℓ_f d_f)) / n ) + 2/n,

where C′ is a universal constant. It can easily be seen that the residual term 2/n can be removed from the previous bound, up to a slight change in C′ and c_β.

Concerning the second inequality, we apply the first part with ξ = log n. There exists therefore an event E_n of probability 1 − 1/n and an estimator f̂ such that d1(f0, f̂) 1_{E_n} is not larger than

inf_{f∈F} { 3 d1(f0, f) + c √( ‖f0 ∗ q‖∞ (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{1+2β}) (log ℓ_f + log d_f + log n) / n ) }.

Up to an increase of c, we now assert that the term log d_f inside the square root can be removed. Let f ∈ F be such that d_f ≥ n. Then, we have

3 d1(f0, f) + c √( ‖f0 ∗ q‖∞ (κ1 ℓ_f d_f + c_β κ2 ℓ_f^{1−2β} d_f^{1+2β}) (log ℓ_f + log d_f + log n) / n ) ≥ 3 d1(f0, f) + √( ‖f0 ∗ q‖∞ (κ1 ℓ_f + c_β κ2 ℓ_f^{1−2β}) ) ≥ 3 d1(f0, f) + 2 ∫_{[−ℓ_f,ℓ_f]} f0 ≥ 3 d1(f0, f) + 2(1 − d1(f0, f)).

The last term being always larger than 2, the risk bound becomes straightforward. To conclude the proof, just remark that

E[d1(f0, f̂)] ≤ E[d1(f0, f̂) 1_{E_n}] + E[d1(f0, f̂) 1_{E_n^c}] ≤ E[d1(f0, f̂) 1_{E_n}] + 2/n.

It then remains to show that the residual term 2/n is negligible, as before (up to an increase of the terms c, c_β). □

5.11. Sketch of the proof of Proposition 11. We explain here how Ẑ_{n,M}(I) is defined. Thanks to (5), an upper bound on var(Z(I)) is known:

var(Z(I)) ≤ (M/4) [κ1 d |I| + c_β κ2 d^{1+2β} |I|^{1−2β}].

It then follows from (the proof of) Theorem 4.2 of [DLLO16] that there exists a multiple-δ sub-Gaussian estimator Z̃_M(I) such that for all δ ∈ (0, n/2 − 1 − log 4) and I ∈ I_1,

P( |Z̃_M(I) − P(X ∈ I)| ≤ c √( M (κ1 |I| + c_β κ2 |I|^{1−2β}) (1 + δ) / n ) ) ≥ 1 − e^{−δ},

where c is a universal constant. We then set

Ẑ_M(I) = min{ Z̃_M(I), (1/2) √( M (κ1 |I| + c_β κ2 |I|^{1−2β}) ) }.

By using Claim 2, we get, up to an increase of c, that for all δ > 0 and I ∈ I_1,

P( |Ẑ_M(I) − P(X ∈ I)| ≤ c √( M (κ1 |I| + c_β κ2 |I|^{1−2β}) (1 + δ) / n ) ) ≥ 1 − e^{−δ}.

This defines Ẑ_M(I) for all intervals I. When I ∈ I_∞ is written as I = ∪_{j=1}^{d_I} I_j, where d_I ∈ N⋆ and where the I_j are intervals such that Ī_j ∩ Ī_{j′} = ∅ for all j ≠ j′, we set

Ẑ_M(I) = Σ_{j=1}^{d_I} Ẑ_M(Ī_j).

Finally, Ẑ_{n,M}(I) is defined by

Ẑ_{n,M}(I) = Ẑ_M(π_1(I)) + Σ_{j=1}^{n} Ẑ_M(π_{j+1}(I) \ π_j(I)).

Up to minor modifications, we show the uniform exponential deviation inequality for Ẑ_{n,M}(I) by the same method as used in the proof of Proposition 10. The only things that change are the replacement of ‖f0 ∗ q‖∞ by M, the fact that the estimator no longer depends on the confidence level, and the fact that the above sum is truncated at n instead of being infinite. □

5.12. Sketch of the proof of Theorem 12. The procedure is the following. We define for f ∈ F,

γ(f) = sup_{I∈Ĩ_{d_f}, I⊂[−ℓ_f,ℓ_f]} |∫_I f − Ẑ_{n,M}(I)|.

We then consider an estimator f̂ ∈ F such that

γ(f̂) ≤ inf_{f∈F} γ(f) + 1/n.

We may prove a variant of Lemma 1 here to avoid measurability issues. Moreover, the theorem follows from the same arguments as the ones used in the proof of Theorem 4; we merely use Proposition 11 in place of Proposition 10. □

References

[BB16] Yannick Baraud and Lucien Birgé. ρ-estimators for shape restricted density estimation. Stochastic Processes and their Applications, 126(12):3888–3912, 2016.

[BBS17] Yannick Baraud, Lucien Birgé, and Mathieu Sart. A new method for estimation and model selection: ρ-estimation. Inventiones mathematicae, 207(2):425–517, 2017.

[Bir83] Lucien Birgé. Approximation dans les espaces métriques et théorie de l'estimation. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 65(2):181–237, 1983.

[Bir87a] Lucien Birgé. Estimating a density under order restrictions: nonasymptotic minimax risk. Ann. Statist., 15(3):995–1012, 1987.

[Bir87b] Lucien Birgé. On the risk of histograms for estimating decreasing densities. The Annals of Statistics, 15(3):1013–1022, 1987.

[Bir89] Lucien Birgé. The Grenander estimator: a nonasymptotic approach. The Annals of Statistics, pages 1532–1549, 1989.

[Bir06] Lucien Birgé. Model selection via testing: an alternative to (penalized) maximum likelihood estimators. Annales de l'Institut Henri Poincaré (B) Probability and Statistics, 42(3):273–325, 2006.

[Bir12] Lucien Birgé. Robust tests for model selection. In From Probability to Statistics and Back: High-Dimensional Models and Processes. A Festschrift in Honor of Jon Wellner, volume 9, pages 47–64. IMS Collections, 2012.

[BM97] Lucien Birgé and Pascal Massart. From model selection to adaptive estimation. In Festschrift for Lucien Le Cam, pages 55–87. Springer, New York, 1997.

[BMP08] Cristina Butucea, Catherine Matias, and Christophe Pouet. Adaptivity in convolution models with partially known noise distribution. Electron. J. Stat., 2:897–915, 2008.

[CDH11] Raymond J. Carroll, Aurore Delaigle, and Peter Hall. Testing and estimating shape-constrained nonparametric density and regression in the presence of measurement error. J. Amer. Statist. Assoc., 106(493):191–202, 2011.

[CL11] F. Comte and C. Lacour. Data-driven density estimation in the presence of additive noise with unknown distribution. J. R. Stat. Soc. Ser. B Stat. Methodol., 73(4):601–627, 2011.

[CL13] Fabienne Comte and Claire Lacour. Anisotropic adaptive kernel deconvolution. Annales de l'IHP Probabilités et statistiques, 49:569–609, 2013.

[CR10] Laurent Cavalier and Marc Raimondo. Multiscale density estimation with errors in variables. J. Korean Statist. Soc., 39(4):417–429, 2010.

[CRT06] Fabienne Comte, Yves Rozenholc, and Marie-Luce Taupin. Penalized contrast estimator for adaptive density deconvolution. Canad. J. Statist., 34(3):431–452, 2006.

[DG04] A. Delaigle and I. Gijbels. Bootstrap bandwidth selection in kernel density estimation from a contaminated sample. Ann. Inst. Statist. Math., 56(1):19–47, 2004.

[DGJ11] I. Dattner, A. Goldenshluger, and A. Juditsky. On deconvolution of distribution func-tions. Ann. Statist., 39(5):2477–2501, 2011.

[DL93] Ronald A DeVore and George G Lorentz. Constructive approximation, volume 303. Springer Science & Business Media, 1993.

[DL12] Luc Devroye and G´abor Lugosi. Combinatorial methods in density estimation. Springer Science & Business Media, 2012.

[DLLO16] Luc Devroye, Matthieu Lerasle, Gabor Lugosi, and Roberto I Oliveira. Sub-gaussian mean estimators. The Annals of Statistics, 44(6):2695–2725, 2016.

[DM13] J´erˆome Dedecker and Bertrand Michel. Minimax rates of convergence for Wasserstein deconvolution with supersmooth errors in any dimension. J. Multivariate Anal., 122:278– 291, 2013.

[Fan91] Jianqing Fan. On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist., 19(3):1257–1272, 1991.

[FHM14] Sophie Frick, Thorsten Hohage, and Axel Munk. Asymptotic laws for change point estimation in inverse regression. Statist. Sinica, 24(2):555–575, 2014.

[FK02] Jianqing Fan and Ja-Yong Koo. Wavelet deconvolution. IEEE transactions on infor-mation theory, 48(3):734–747, 2002.

[GJTZ08] A. Goldenshluger, A. Juditsky, A. B. Tsybakov, and A. Zeevi. Change-point estimation from indirect observations. I. Minimax complexity. Ann. Inst. Henri Poincar´e Probab. Stat., 44(5):787–818, 2008.

[GTZ06] A. Goldenshluger, A. Tsybakov, and A. Zeevi. Optimal change-point estimation from indirect observations. Ann. Statist., 34(1):350–372, 2006.

[LN11] Karim Lounici and Richard Nickl. Global uniform risk bounds for wavelet deconvolution estimators. The Annals of Statistics, 39(1):201–231, 2011.

[LW19] OV Lepski and Thomas Willer. Oracle inequalities and adaptive estimation in the convolution structure density model. The Annals of Statistics, 47(1):233–287, 2019. [Mei09] Alexander Meister. Deconvolution problems in nonparametric statistics, volume 193 of

Lecture Notes in Statistics. Springer-Verlag, Berlin, 2009.


[PNR13] Thanh Mai Pham Ngoc and Vincent Rivoirard. The dictionary approach for spherical deconvolution. J. Multivariate Anal., 115:138–156, 2013.

[PV99] Marianna Pensky and Brani Vidakovic. Adaptive wavelet estimator for nonparametric density deconvolution. Ann. Statist., 27(6):2033–2053, 1999.

[Reb16] G. Rebelles. Structural adaptive deconvolution under Lp-losses. Math. Methods Statist., 25(1):26–53, 2016.

[Sar18] Mathieu Sart. Estimating a density, a hazard rate, and a transition intensity via the ρ-estimation method. Preprint HAL, 2018.

[Tsy08] Alexandre B. Tsybakov. Introduction to nonparametric estimation. Springer Science & Business Media, 2008.


A. Minimax rates of convergence for decreasing densities

Proposition 13. Suppose Assumption 1 is met and that there exist $\kappa_1', \kappa_2', \kappa_3'$ such that for all $t\in\mathbb{R}$,
$$|q^\star(t)|^{-2} \ge \kappa_1' + \kappa_2'|t|^{2\beta} \qquad\text{and}\qquad \left|(q^\star)'(t)\right|^2 \le \kappa_3'|t|^{-2\beta-2}.$$
Let $\mathcal D$ be the set of densities $f$ that are non-increasing on $[0,1]$ and such that $\|f\|_\infty\le 1$. Then, for $n$ large enough,
$$c\,(1/n)^{1/(2\beta+3)} \le \inf_{\hat f}\sup_{f_0\in\mathcal D}\mathbb{E}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f)\right] \le C\,(1/n)^{1/(2\beta+3)},$$
where the infimum is taken over all possible estimators $\hat f$ of $f_0$. Moreover, $C$ depends only on $\kappa_1, \kappa_2, \beta$, and $c$ depends only on $q$.

Proof of Proposition 13. We remark that the proof of (8) also entails
$$\mathbb{E}\left[d_1\left(f_0\mathbb{1}_{[0,1]},\hat f_m\right)\right] \le 2\,d_1\left(f_0\mathbb{1}_{[0,1]},\mathcal F_m\right) + \sqrt{\frac{\|f_0\ast q\|_\infty\left(\kappa_1|m| + c_\beta\kappa_2|m|^{1+2\beta}\right)}{n}}, \tag{24}$$

whenever $m$ is a partition of $[0,1]$. Let $N$ be an integer whose value will be made precise later on. We consider here the partition $m$ of size $N$ introduced in [Bir87a]. It satisfies in particular
$$d_1\left(f_0\mathbb{1}_{[0,1]},\mathcal F_m\right) \le C/N,$$
provided $f_0\mathbb{1}_{[0,1]}$ is non-increasing on $[0,1]$ and bounded by 1. Here $C$ is a numerical constant. We then choose $N$ to balance the two terms in the right-hand side of (24).
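For the reader's convenience, here is a sketch of this balancing step, with all constants (including $\|f_0\ast q\|_\infty$, $\kappa_1$, $\kappa_2$ and $c_\beta$) absorbed into the symbols $\asymp$ and $\lesssim$. Since $|m|=N$ and $1+2\beta\ge 1$, the stochastic term of (24) is of order $\sqrt{N^{1+2\beta}/n}$, so that
$$\frac{1}{N} \asymp \sqrt{\frac{N^{1+2\beta}}{n}} \iff N^{2\beta+3}\asymp n \iff N\asymp n^{1/(2\beta+3)},$$
which, plugged into (24), gives $\mathbb{E}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f_m)\right] \lesssim n^{-1/(2\beta+3)}$, as announced.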

We now turn our attention to the lower bound. The arguments we propose are inspired by the minimax lower bound of [Bir87b] for decreasing densities in the noise-free case. Let $p\ge 1$ be an integer whose value will be made precise at the end of the proof. We then define
$$\varepsilon = \frac{\log(3/2)}{p} \le 1, \qquad \lambda = \frac{1+\varepsilon}{u(1+\varepsilon/2)}, \qquad u = \left[(1+\varepsilon)^p - 1\right]^{-1}.$$

For any $j\in\{1,\dots,p\}$, we introduce $x_j = u\left[(1+\varepsilon)^j - 1\right]$, $I_j = [x_{j-1}, x_j)$ and $\ell_j = x_j - x_{j-1}$. Remark that the intervals $I_j$ are disjoint and included in $[0,1]$ (indeed, $x_0 = 0$ and $x_p = u[(1+\varepsilon)^p - 1] = 1$).

Now, for any $j\in\{1,\dots,p\}$, define the functions $f_j$ and $g_j$ as
$$f_j = \lambda(1+\varepsilon)^{-j}(1+\varepsilon/2)\,\mathbb{1}_{[x_{j-1},x_j)}, \qquad g_j = \lambda(1+\varepsilon)^{-j+1}\,\mathbb{1}_{[x_{j-1},(x_{j-1}+x_j)/2]} + \lambda(1+\varepsilon)^{-j}\,\mathbb{1}_{((x_{j-1}+x_j)/2,\,x_j)}.$$

We denote by $\varphi$ the Cauchy density, introduce, for all $\nu\in\{0,1\}^p$,
$$\phi_\nu = (1-\log(3/2))\,\varphi + \sum_{j=1}^p\left(\nu_j f_j + (1-\nu_j)\, g_j\right),$$
and gather all these functions into a set $\mathcal F = \{\phi_\nu,\ \nu\in\{0,1\}^p\}$. We begin by establishing the following claim.

Claim 7. We have $\mathcal F\subset\mathcal D$. Moreover, for any $j\in\{1,\dots,p\}$, $d_1(f_j,g_j) \ge \varepsilon^2/3$.

Proof. For any $j\in\{1,\dots,p\}$, we observe that the functions $f_j$ and $g_j$ share the following properties:

• $f_j$ and $g_j$ are piecewise constant functions, compactly supported on $I_j$;

• for all $x\in I_j$ and $y\in I_{j-1}$, $\max\{f_j(x), g_j(x)\} \le \min\{f_{j-1}(y), g_{j-1}(y)\}$;

• $\int f_j = \int g_j = \varepsilon$;

• $\|f_1\|_\infty \le 1$ and $\|g_1\|_\infty \le 1$.

We deduce that, for any $\nu\in\{0,1\}^p$, $\phi_\nu$ is a density that is non-increasing on $[0,1]$ and bounded by 1. Moreover, since $\ell_j = u\varepsilon(1+\varepsilon)^{j-1}$,

$$d_1(f_j,g_j) = \ell_j\,\lambda(1+\varepsilon)^{-j}\,\frac{\varepsilon}{2} = \varepsilon^2\times\frac{1}{2+\varepsilon} \ge \frac{1}{3}\varepsilon^2. \qquad\Box$$
In the sequel, to underline the dependence on the density $f_0$ in the results, we add a subscript to

the expectation. More precisely, $\mathbb{E}_f$ corresponds to the expectation with respect to the variables $X_i$, which are assumed to have density $f$. We have, for every estimator $\hat f$,
$$\sup_{f_0\in\mathcal D}\mathbb{E}_{f_0}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f)\right] \ge \sup_{f_0\in\mathcal F}\mathbb{E}_{f_0}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f)\right] \ge \frac{1}{2}\sup_{\nu\in\{0,1\}^p}\sum_{j=1}^p\left[\mathbb{E}_{\phi_{\nu_{j,1}}}\!\left[d_1\!\left(\hat f\,\mathbb{1}_{I_j},\ \frac{\varphi+f_j}{2}\,\mathbb{1}_{I_j}\right)\right] + \mathbb{E}_{\phi_{\nu_{j,0}}}\!\left[d_1\!\left(\hat f\,\mathbb{1}_{I_j},\ \frac{\varphi+g_j}{2}\,\mathbb{1}_{I_j}\right)\right]\right],$$
where for any $\nu\in\{0,1\}^p$ and $k\in\{0,1\}$,

$$\nu_{j,k} = (\nu_1,\dots,\nu_{j-1},\, k,\, \nu_{j+1},\dots,\nu_p).$$

Then, using standard arguments,
$$\sup_{f_0\in\mathcal D}\mathbb{E}_{f_0}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f)\right] \ge \frac{1}{4}\sup_{\nu\in\{0,1\}^p}\sum_{j=1}^p d_1(f_j,g_j)\int_{\mathbb{R}^n}\min\left(\prod_{k=1}^n \phi_{\nu_{j,1}}\!\ast q\,(x_k),\ \prod_{k=1}^n \phi_{\nu_{j,0}}\!\ast q\,(x_k)\right)dx.$$
Using Claim 7, we get
$$\sup_{f_0\in\mathcal D}\mathbb{E}_{f_0}\left[d_1(f_0\mathbb{1}_{[0,1]},\hat f)\right] \ge \frac{\varepsilon^2}{12}\sup_{\nu\in\{0,1\}^p}\sum_{j=1}^p \int_{\mathbb{R}^n}\min\left(\prod_{k=1}^n \phi_{\nu_{j,1}}\!\ast q\,(x_k),\ \prod_{k=1}^n \phi_{\nu_{j,0}}\!\ast q\,(x_k)\right)dx.$$
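The algebra behind Claim 7 is easy to check numerically. The following sketch is purely illustrative (the function name is hypothetical): it rebuilds the grid $x_j$ and verifies that the intervals fill $[0,1]$, that each $f_j$ has mass $\varepsilon$, and that $d_1(f_j,g_j) = \varepsilon^2/(2+\varepsilon) \ge \varepsilon^2/3$.

```python
import numpy as np

def claim7_check(p=10):
    """Numerically verify the properties of the lower-bound family (Claim 7)."""
    eps = np.log(1.5) / p                            # eps = log(3/2)/p
    u = 1.0 / ((1.0 + eps) ** p - 1.0)
    lam = (1.0 + eps) / (u * (1.0 + eps / 2.0))      # lambda
    j = np.arange(1, p + 1)
    x = u * ((1.0 + eps) ** np.arange(p + 1) - 1.0)  # x_0 = 0, ..., x_p
    ell = np.diff(x)                                 # interval lengths l_j
    mass_f = lam * (1.0 + eps) ** (-j) * (1.0 + eps / 2.0) * ell  # mass of f_j
    d1 = ell * lam * (1.0 + eps) ** (-j) * eps / 2.0              # d_1(f_j, g_j)
    assert np.isclose(x[-1], 1.0)                    # the I_j fill [0, 1]
    assert np.allclose(mass_f, eps)                  # each f_j has mass eps
    assert np.allclose(d1, eps ** 2 / (2.0 + eps))   # d_1(f_j, g_j) = eps^2/(2+eps)
    assert np.all(d1 >= eps ** 2 / 3.0)              # Claim 7 lower bound
    return eps, float(d1[0])

print(claim7_check())
```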
