HAL Id: hal-00466830
https://hal.archives-ouvertes.fr/hal-00466830v2
Preprint submitted on 29 Jun 2010
On adaptive wavelet estimation of the regression function and its derivatives in an errors-in-variables model

Christophe Chesneau

University of Caen (LMNO), Caen, France. Email: chesneau@math.unicaen.fr
Abstract
We consider a regression model with errors-in-variables: we observe $(Y, X)$, where $Y = f(Z) + \xi$ and $X = Z + W$. Our goal is to estimate the unknown regression function $f$ and its derivatives under mild assumptions on $\xi$ (only finite moments of order 2 are required). To reach this goal, we develop a new adaptive wavelet estimator based on a hard thresholding rule. Taking the minimax approach under the mean integrated squared error over Besov balls, we prove that it attains a sharp rate of convergence.
Keywords: Errors-in-variables model, Derivative function estimation, Minimax approach, Wavelets, Hard thresholding.
2000 Mathematics Subject Classification: 62G07, 62G20.
1 Motivations
We observe $n$ independent pairs of random variables $(Y_1, X_1), \ldots, (Y_n, X_n)$ in the following errors-in-variables model: for any $v \in \{1, \ldots, n\}$,

$$\begin{cases} Y_v = f(Z_v) + \xi_v, \\ X_v = Z_v + W_v, \end{cases} \qquad (1.1)$$

where $f$ is an unknown function, $X_1, \ldots, X_n$ are $n$ i.i.d. random variables having the uniform distribution on $[0,1]$, $W_1, \ldots, W_n$ are $n$ i.i.d. unobserved random variables, and $\xi_1, \ldots, \xi_n$ are $n$ i.i.d. unobserved zero-mean random variables. We assume that all these random variables are independent, that the density function of $W_1$, denoted $g$, is known, and that $\xi_1$ admits finite moments of order 2. We aim to estimate $f$ and its derivatives from $(Y_1, X_1), \ldots, (Y_n, X_n)$.
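As a concrete illustration, data from model (1.1) can be simulated as follows. This is a toy sketch and every choice in it is an illustrative assumption, not a specification from the paper: $f$ is a sine, $W$ is Laplace noise, $\xi$ is Gaussian, and $X_v$ is reduced modulo 1 so that the design stays in the periodic setting on $[0,1]$ (with $Z_v$ uniform, this also makes $X_v$ uniform).

```python
# Toy simulation of model (1.1); all distributional choices are illustrative.
import math
import random

random.seed(1)

def f(z):
    return math.sin(2 * math.pi * z)  # hypothetical regression function

def draw(n, b=0.05, sigma_xi=0.1):
    data = []
    for _ in range(n):
        z = random.random()                                           # Z_v ~ U[0,1]
        w = random.expovariate(1.0 / b) * random.choice([-1.0, 1.0])  # W_v ~ Laplace(0, b)
        xi = random.gauss(0.0, sigma_xi)                              # zero-mean noise
        data.append((f(z) + xi, (z + w) % 1.0))                       # (Y_v, X_v)
    return data

sample = draw(1000)
print(len(sample))  # 1000 pairs (Y_v, X_v)
```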
The estimation of f from the model (1.1) has received a lot of attention. See e.g.
[6, 11–13, 15, 18, 19]. In this paper, we focus on a more general problem: the adaptive estimation of the $d$-th derivative of $f$, denoted $f^{(d)}$. This is of interest to detect possible bumps, or concavity and convexity properties, of $f$. The estimation of $f^{(d)}$ has been investigated in several papers for various models (but never for (1.1)), starting with [1]. For references using wavelet methods, see e.g. [3–5, 23].
Another feature of the study concerns $\xi_1, \ldots, \xi_n$: to estimate $f^{(d)}$ (including $f = f^{(0)}$), we only assume that $\xi_1$ has finite moments of order 2. Thus we do not need to know the distribution of $\xi_1$. Moreover, this relaxes the assumption on $\xi_1$ in [6], where finite moments of order $> 6$ are required.
Under this general framework, considering the ordinary smooth case on $g$ (see (2.2)), we estimate $f^{(d)}$ by a new wavelet estimator based on a hard thresholding rule. Its originality is to combine a singular value decomposition (SVD) approach similar to the one of [17] with some technical tools introduced in wavelet estimation theory by [7].
We evaluate its performance by taking the minimax approach under the mean integrated squared error (MISE) over a wide class of functions: the Besov balls $B^s_{p,r}(M)$ (to be defined in Section 3). We prove that our estimator attains the rate of convergence

$$v_n = (\ln n / n)^{2s/(2s+2\delta+2d+1)},$$

where $\delta$ is a factor related to the ordinary smooth case. This rate of convergence is sharp in the sense that it is the one attained by the best non-realistic linear wavelet estimator, up to a logarithmic term.
The paper is organized as follows. Assumptions on the model and some notations are introduced in Section 2. Section 3 briefly describes the periodized wavelet basis on $[0,1]$ and the Besov balls. The estimators are presented in Section 4. The results are stated in Section 5. The proofs are gathered in Section 6.
2 Assumptions and notations
We assume in the sequel that $f^{(d)}$ and $g$ belong to $L^2_{per}([0,1])$, the space of periodic functions of period one that are square-integrable on $[0,1]$:

$$L^2_{per}([0,1]) = \Big\{ h;\ h \text{ is } 1\text{-periodic and } \|h\|_2 = \Big( \int_0^1 h^2(x)\,dx \Big)^{1/2} < \infty \Big\}.$$

We assume that there exists a known constant $C_* > 0$ such that

$$\|f\|_\infty = \sup_{x \in [0,1]} |f(x)| \le C_* < \infty. \qquad (2.1)$$

Any function $h \in L^2_{per}([0,1])$ can be represented by its Fourier series
$$h(t) = \sum_{\ell \in \mathbb{Z}} F(h)(\ell) e^{2i\pi\ell t}, \qquad t \in [0,1],$$

where the equality is intended in the mean-square convergence sense, and $F(h)(\ell)$ denotes the Fourier coefficient given by

$$F(h)(\ell) = \int_0^1 h(x) e^{-2i\pi\ell x}\,dx, \qquad \ell \in \mathbb{Z},$$

whenever this integral exists. The notation $\overline{\,\cdot\,}$ will be used for the complex conjugate.
We consider the ordinary smooth case on $g$: there exist three constants $c_g > 0$, $C_g > 0$ and $\delta > 1$ such that, for any $\ell \in \mathbb{Z}$, the Fourier coefficient of $g$, i.e. $F(g)(\ell)$, satisfies

$$\frac{c_g}{(1+\ell^2)^{\delta/2}} \le |F(g)(\ell)| \le \frac{C_g}{(1+\ell^2)^{\delta/2}}. \qquad (2.2)$$
This assumption controls the decay of the Fourier coefficients of $g$, and thus the smoothness of $g$. It is a standard hypothesis in nonparametric estimation for deconvolution problems. See e.g. [14, 17, 21].
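As an illustration of (2.2), suppose the noise $W$ is Laplace with scale $b$; the Fourier coefficient of its wrapped density on $[0,1]$ is the characteristic function of $W$ at $2\pi\ell$, namely $F(g)(\ell) = 1/(1 + b^2(2\pi\ell)^2)$. Choosing $b = 1/(2\pi)$ (an illustrative value) gives $F(g)(\ell) = 1/(1+\ell^2)$, so (2.2) holds with $c_g = C_g = 1$ and $\delta = 2$. The snippet checks this decay numerically:

```python
# Check that wrapped Laplace noise is "ordinary smooth" in the sense of (2.2).
import math

def F_g(l, b=1.0 / (2.0 * math.pi)):
    # Fourier coefficient of the wrapped Laplace(0, b) density:
    # the characteristic function of W evaluated at 2*pi*l.
    return 1.0 / (1.0 + (2.0 * math.pi * b * l) ** 2)

delta = 2
# The ratio |F(g)(l)| * (1 + l^2)^{delta/2} must stay between fixed constants.
ratios = [F_g(l) * (1 + l * l) ** (delta / 2) for l in range(-50, 51)]
print(min(ratios), max(ratios))  # both equal 1: (2.2) holds with c_g = C_g = 1
```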
3 Wavelets and Besov balls
3.1 Periodized Meyer Wavelets
We consider an orthonormal wavelet basis generated by dilations and translations of a "father" Meyer-type wavelet $\phi$ and a "mother" Meyer-type wavelet $\psi$. The main features of such wavelets are:

1. they are bandlimited, i.e. the Fourier transforms of $\phi$ and $\psi$ have compact supports included in $[-4\pi/3, 4\pi/3]$ and $[-8\pi/3, -2\pi/3] \cup [2\pi/3, 8\pi/3]$ respectively;
2. there exists a constant $c > 0$ such that the magnitude of the Fourier transform of $\psi$ is bounded below by $c$ on $[-2\pi, -\pi] \cup [\pi, 2\pi]$;
3. the functions $(\phi, \psi)$ are $C^\infty$ since their Fourier transforms have compact support, and $\psi$ has an infinite number of vanishing moments since its Fourier transform vanishes in a neighborhood of the origin, i.e. for any $u \in \mathbb{N}$, $\int_{-\infty}^{\infty} x^u \psi(x)\,dx = 0$.
For the purposes of this paper, we use the periodized Meyer wavelet basis on the unit interval. For any $x \in [0,1]$, any integer $j$ and any $k \in \{0, \ldots, 2^j - 1\}$, let

$$\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k)$$

be the elements of the wavelet basis, and

$$\phi^{per}_{j,k}(x) = \sum_{l \in \mathbb{Z}} \phi_{j,k}(x - l), \qquad \psi^{per}_{j,k}(x) = \sum_{l \in \mathbb{Z}} \psi_{j,k}(x - l)$$

their periodized versions. There exists an integer $j_*$ such that the collection

$$\mathcal{B} = \big\{ \phi^{per}_{j_*,k},\ k \in \{0, \ldots, 2^{j_*} - 1\};\ \psi^{per}_{j,k},\ j \in \mathbb{N} - \{0, \ldots, j_* - 1\},\ k \in \{0, \ldots, 2^j - 1\} \big\}$$

forms an orthonormal basis of $L^2_{per}([0,1])$. In what follows, the superscript "per" will be dropped to lighten the notation.
Let $j_c$ be an integer such that $j_c \ge j_*$. A function $h \in L^2_{per}([0,1])$ can be expanded into a wavelet series as

$$h(x) = \sum_{k=0}^{2^{j_c}-1} \alpha_{j_c,k} \phi_{j_c,k}(x) + \sum_{j=j_c}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k} \psi_{j,k}(x), \qquad x \in [0,1],$$

where

$$\alpha_{j,k} = \int_0^1 h(x) \phi_{j,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 h(x) \psi_{j,k}(x)\,dx. \qquad (3.1)$$

See [20, Vol. 1, Chapter III.11] for a detailed account of periodized orthonormal wavelet bases.
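The expansion (3.1) can be made concrete with a short computation. Meyer wavelets have no simple closed form, so this sketch substitutes the Haar basis (also an orthonormal wavelet basis of $L^2([0,1])$); the structure of the expansion is identical. A function represented by its values on $2^J$ dyadic intervals is reconstructed exactly from $\alpha_{0,0}$ and the $\beta_{j,k}$, $j = 0, \ldots, J-1$:

```python
# Wavelet expansion (3.1) with the Haar basis (a stand-in for the Meyer basis).
J = 3
N = 2 ** J
h = [1.0, 4.0, -2.0, 0.5, 3.0, 3.0, -1.0, 2.0]  # h(x) on [k/8, (k+1)/8)

def inner(u, v):
    # <u, v> = integral of u*v over [0,1] for piecewise-constant functions
    return sum(a * b for a, b in zip(u, v)) / N

def phi_jk(j, k):
    # phi_{j,k} = 2^{j/2} on [k 2^{-j}, (k+1) 2^{-j}), 0 elsewhere
    step = N >> j
    return [2 ** (j / 2) if step * k <= i < step * (k + 1) else 0.0
            for i in range(N)]

def psi_jk(j, k):
    # psi_{j,k} = +2^{j/2} on the first half of its interval, -2^{j/2} on the second
    step = N >> j
    out = [0.0] * N
    for i in range(step * k, step * (k + 1)):
        out[i] = 2 ** (j / 2) * (1.0 if i < step * k + step // 2 else -1.0)
    return out

# alpha_{0,0} and beta_{j,k} as in (3.1), then the wavelet series itself
recon = [inner(h, phi_jk(0, 0)) * p for p in phi_jk(0, 0)]
for j in range(J):
    for k in range(2 ** j):
        b = inner(h, psi_jk(j, k))
        recon = [r + b * p for r, p in zip(recon, psi_jk(j, k))]
print(recon)  # equals h: the expansion reconstructs the function exactly
```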
3.2 Besov balls
Let $M > 0$, $s > 0$, $p \ge 1$ and $r \ge 1$. Set $\beta_{j_*-1,k} = \alpha_{j_*,k}$. A function $h$ belongs to the Besov ball $B^s_{p,r}(M)$ if and only if there exists a constant $M_* > 0$ such that the wavelet coefficients (3.1) satisfy

$$\Bigg( \sum_{j=j_*-1}^{\infty} \Bigg( 2^{j(s+1/2-1/p)} \Big( \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \Big)^{1/p} \Bigg)^r \Bigg)^{1/r} \le M_*.$$

For particular choices of the parameters $s$, $p$ and $r$, these sets contain the Hölder and Sobolev balls. See [20].
4 Estimators
Wavelet coefficient estimators. The first step in estimating $f^{(d)}$ consists in expanding $f^{(d)}$ on $\mathcal{B}$ and estimating its unknown wavelet coefficients.
For any integer $j \ge j_*$ and any $k \in \{0, \ldots, 2^j - 1\}$, we estimate $\alpha_{j,k} = \int_0^1 f^{(d)}(x) \phi_{j,k}(x)\,dx$ by

$$\widehat{\alpha}_{j,k} = \frac{1}{n} \sum_{v=1}^{n} \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} Y_v e^{-2i\pi\ell X_v}, \qquad (4.1)$$

where $C_j = \mathrm{supp}(F(\phi_{j,0})) = \mathrm{supp}(F(\phi_{j,k}))$, and $\beta_{j,k} = \int_0^1 f^{(d)}(x) \psi_{j,k}(x)\,dx$ by

$$\widehat{\beta}_{j,k} = \frac{1}{n} \sum_{v=1}^{n} G_v 1_{\{|G_v| \le \eta_j\}}, \qquad (4.2)$$

where

$$G_v = \sum_{\ell \in D_j} \frac{(2i\pi\ell)^d F(\psi_{j,k})(\ell)}{F(g)(\ell)} Y_v e^{-2i\pi\ell X_v},$$

$D_j = \mathrm{supp}(F(\psi_{j,0})) = \mathrm{supp}(F(\psi_{j,k}))$, for any random event $A$, $1_A$ is the indicator function of $A$, and the threshold $\eta_j$ is defined by

$$\eta_j = \theta 2^{(\delta+d)j} \sqrt{\frac{n}{\ln n}}, \qquad (4.3)$$

with $\theta = \sqrt{C_{**}(C_*^2 + E(\xi_1^2))}$, where $C_*$ is as in (2.1) and $C_{**} = 2^{\delta-1} \big( 2 (2\pi)^{2d} / c_g^2 \big) (8\pi/3)^{2(\delta+d)}$. The estimators $\widehat{\alpha}_{j,k}$ and $\widehat{\beta}_{j,k}$ are constructed via an SVD approach similar to the one of [17].
The idea of the thresholding in (4.2) is to operate a selection on the observations: when, for $v \in \{1, \ldots, n\}$, the observation $(Y_v, X_v)$ is too strongly contaminated by $\xi_v$, it is neglected. Such a technique was introduced in wavelet estimation by [7]. Statistical properties of $\widehat{\alpha}_{j,k}$ and $\widehat{\beta}_{j,k}$ are given in Propositions 6.1, 6.2 and 6.3.
We consider two wavelet estimators for $f^{(d)}$: a linear estimator and a hard thresholding estimator.
Linear estimator. Assuming that $f^{(d)} \in B^s_{p,r}(M)$ with $p \ge 2$, we define the linear estimator $\widehat{f}^L_d$ by

$$\widehat{f}^L_d(x) = \sum_{k=0}^{2^{j_0}-1} \widehat{\alpha}_{j_0,k} \phi_{j_0,k}(x), \qquad (4.4)$$

where $\widehat{\alpha}_{j,k}$ is defined by (4.1) and $j_0$ is the integer satisfying

$$2^{-1} n^{1/(2s+2\delta+2d+1)} < 2^{j_0} \le n^{1/(2s+2\delta+2d+1)}.$$

It is not adaptive since it depends on $s$, the smoothness parameter of $f^{(d)}$.
Hard thresholding estimator. We define the hard thresholding estimator $\widehat{f}^H_d$ by
$$\widehat{f}^H_d(x) = \sum_{k=0}^{2^{j_*}-1} \widehat{\alpha}_{j_*,k} \phi_{j_*,k}(x) + \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_j\}} \psi_{j,k}(x), \qquad (4.5)$$

where $\widehat{\alpha}_{j,k}$ and $\widehat{\beta}_{j,k}$ are defined by (4.1) and (4.2), $j_1$ is the integer satisfying

$$2^{-1} n^{1/(2\delta+2d+1)} < 2^{j_1} \le n^{1/(2\delta+2d+1)},$$

$\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$, and $\lambda_j$ is the threshold

$$\lambda_j = \theta 2^{(\delta+d)j} \sqrt{\frac{\ln n}{n}}. \qquad (4.6)$$
The definitions of $\eta_j$ and $\lambda_j$ are chosen to minimize the MISE of $\widehat{f}^H_d$ and to make it adaptive. Further statistical results on the hard thresholding estimator for the standard regression model can be found in [8–10].
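The selection rule in (4.5) can be sketched in a few lines: an estimated detail coefficient is kept only when its magnitude exceeds $\kappa\lambda_j$, with $\lambda_j$ as in (4.6). In the snippet, $\theta$, $\delta$, $d$ and the coefficients are illustrative stand-ins, not values computed from data.

```python
# Minimal sketch of the hard thresholding rule in (4.5)-(4.6).
import math

KAPPA = 8 / 3 + 2 + 2 * math.sqrt(16 / 9 + 4)  # smallest admissible kappa

def lambda_j(j, n, theta=0.1, delta=1.5, d=0):
    # lambda_j = theta * 2^{(delta + d) j} * sqrt(ln n / n); parameters illustrative
    return theta * 2 ** ((delta + d) * j) * math.sqrt(math.log(n) / n)

def hard_threshold(beta_hat, j, n, kappa=KAPPA):
    thr = kappa * lambda_j(j, n)
    return [b if abs(b) >= thr else 0.0 for b in beta_hat]

beta_hat = [0.9, -0.02, 0.35, 0.001]  # toy estimated coefficients at level j = 2
kept = hard_threshold(beta_hat, j=2, n=10_000)
print(kept)  # [0.9, 0.0, 0.35, 0.0]: small coefficients are killed
```

The level dependence $2^{(\delta+d)j}$ makes the threshold grow with the resolution level, reflecting the larger variance of the deconvolved coefficients at fine scales.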
5 Results
Theorem 5.1. Consider (1.1) under the assumptions of Section 2. Suppose that $f^{(d)} \in B^s_{p,r}(M)$ with $s > 0$, $p \ge 2$ and $r \ge 1$. Let $\widehat{f}^L_d$ be (4.4). Then there exists a constant $C > 0$ such that

$$E\Big( \int_0^1 \big( \widehat{f}^L_d(x) - f^{(d)}(x) \big)^2\,dx \Big) \le C n^{-2s/(2s+2\delta+2d+1)}.$$
The proof of Theorem 5.1 uses a moment inequality on (4.1) and a suitable decomposition of the MISE.
Since the distribution of $\xi_1$ is unknown, we cannot use the likelihood function related to the model, and the optimal lower bound seems difficult to determine (see e.g. [16, 25]). For this reason, our benchmark will be the rate of convergence attained by the "optimal non-realistic" $\widehat{f}^L_d$, i.e. $v_n = n^{-2s/(2s+2\delta+2d+1)}$. Note that, under the standard Gaussian assumption on $\xi_1$, one can prove that $v_n$ is optimal in the minimax sense.
Theorem 5.2. Consider (1.1) under the assumptions of Section 2. Let $\widehat{f}^H_d$ be (4.5). Suppose that $f^{(d)} \in B^s_{p,r}(M)$ with $r \ge 1$ and either $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > (2\delta + 2d + 1)/p\}$. Then there exists a constant $C > 0$ such that

$$E\Big( \int_0^1 \big( \widehat{f}^H_d(x) - f^{(d)}(x) \big)^2\,dx \Big) \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$
The proof of Theorem 5.2 is based on several probability results (moment inequalities, a concentration inequality, ...) and a suitable decomposition of the MISE.
Theorem 5.2 shows that $\widehat{f}^H_d$ attains $v_n$ up to the logarithmic term $(\ln n)^{2s/(2s+2\delta+2d+1)}$.
Conclusion and perspectives. We have constructed a new adaptive estimator $\widehat{f}^H_d$ for $f^{(d)}$ under mild assumptions on $\xi_1$. It is based on wavelets and thresholding, and it has "near-optimal" minimax properties for a wide class of functions $f^{(d)}$. Possible perspectives of this work are
- to potentially improve the estimation of $f^{(d)}$ by considering other kinds of thresholding rules, such as the block thresholding rule introduced by [2];
- to investigate the random design case where the distribution of $X_1$ is unknown.
6 Proofs
In this section, C represents a positive constant which may differ from one term to another.
6.1 Auxiliary results
Proposition 6.1. For any integer $j \ge j_*$ and any $k \in \{0, \ldots, 2^j - 1\}$, let $\alpha_{j,k}$ be the wavelet coefficient (3.1) of $f^{(d)}$ and $\widehat{\alpha}_{j,k}$ be (4.1). Then there exists a constant $C > 0$ such that

$$E\big( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \big) \le C 2^{2(\delta+d)j} \frac{1}{n}.$$

Proof of Proposition 6.1. For any $v \in \{1, \ldots, n\}$, let us set

$$H_v = \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} Y_v e^{-2i\pi\ell X_v}.$$
Since $X_1$, $W_1$ and $\xi_1$ are independent, using the (periodic) convolution product between $f$ and $g$, i.e. $(f \star g)(x) = \int_0^1 f(x-y) g(y)\,dy$, we have

$$E\big( Y_1 e^{-2i\pi\ell X_1} \big) = E\big( f(X_1 - W_1) e^{-2i\pi\ell X_1} \big) + E(\xi_1) E\big( e^{-2i\pi\ell X_1} \big) = E\big( f(X_1 - W_1) e^{-2i\pi\ell X_1} \big)$$
$$= \int_0^1 \int_0^1 f(x-y) g(y) e^{-2i\pi\ell x}\,dx\,dy = \int_0^1 (f \star g)(x) e^{-2i\pi\ell x}\,dx = F(f \star g)(\ell) = F(f)(\ell) F(g)(\ell).$$

Moreover, since $f$ is 1-periodic, for any $u \in \{0, \ldots, d\}$, $f^{(u)}$ is 1-periodic and $f^{(u)}(0) = f^{(u)}(1)$. By $d$ integrations by parts, for any $\ell \in \mathbb{Z}$, we have

$$(2i\pi\ell)^d F(f)(\ell) = F\big( f^{(d)} \big)(\ell).$$
The Parseval–Plancherel theorem gives

$$E(H_1) = \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} E\big( Y_1 e^{-2i\pi\ell X_1} \big) = \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} F(f)(\ell) F(g)(\ell)$$
$$= \sum_{\ell \in C_j} F(\phi_{j,k})(\ell) (2i\pi\ell)^d F(f)(\ell) = \sum_{\ell \in C_j} F(\phi_{j,k})(\ell) F\big( f^{(d)} \big)(\ell) = \int_0^1 \phi_{j,k}(x) f^{(d)}(x)\,dx = \alpha_{j,k}. \qquad (6.1)$$
Hence $E(\widehat{\alpha}_{j,k}) = \alpha_{j,k}$. So

$$E\big( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \big) = V(\widehat{\alpha}_{j,k}) = V\Big( \frac{1}{n} \sum_{v=1}^{n} H_v \Big) = \frac{1}{n^2} \sum_{v=1}^{n} V(H_v) = \frac{1}{n} V(H_1) \le \frac{1}{n} E\big( H_1^2 \big). \qquad (6.2)$$
Since $X_1$, $W_1$ and $\xi_1$ are independent with $E(\xi_1) = 0$ and $|f(X_1 - W_1)| \le \|f\|_\infty \le C_* < \infty$, setting

$$S_1 = \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} e^{-2i\pi\ell X_1}$$

(so that $H_1 = Y_1 S_1$), we have

$$E\big( H_1^2 \big) = E\big( f^2(X_1 - W_1) |S_1|^2 \big) + 2 E(\xi_1) E\big( f(X_1 - W_1) |S_1|^2 \big) + E\big( \xi_1^2 \big) E\big( |S_1|^2 \big)$$
$$= E\big( f^2(X_1 - W_1) |S_1|^2 \big) + E\big( \xi_1^2 \big) E\big( |S_1|^2 \big) \le \big( C_*^2 + E\big( \xi_1^2 \big) \big) E\big( |S_1|^2 \big). \qquad (6.3)$$

The assumption (2.2) implies
$$\sup_{\ell \in C_j} \frac{(2\pi\ell)^{2d}}{|F(g)(\ell)|^2} \le \frac{(2\pi)^{2d}}{c_g^2} \sup_{\ell \in C_j} \ell^{2d} (1+\ell^2)^{\delta} \le 2^{\delta-1} \frac{(2\pi)^{2d}}{c_g^2} \sup_{\ell \in C_j} \ell^{2d} (1+\ell^{2\delta})$$
$$\le 2^{\delta-1}\, 2\, \frac{(2\pi)^{2d}}{c_g^2} \Big( \frac{8\pi}{3} \Big)^{2(\delta+d)} 2^{2(\delta+d)j} = C_{**} 2^{2(\delta+d)j}. \qquad (6.4)$$

Using the fact that $(e^{-2i\pi\ell x})_{\ell \in \mathbb{Z}}$ is an orthonormal basis of $L^2_{per}([0,1])$, (6.4) and the Parseval–Plancherel theorem, we obtain
$$E\Bigg( \Big| \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} e^{-2i\pi\ell X_1} \Big|^2 \Bigg) = \int_0^1 \Big| \sum_{\ell \in C_j} \frac{(2i\pi\ell)^d F(\phi_{j,k})(\ell)}{F(g)(\ell)} e^{-2i\pi\ell x} \Big|^2\,dx = \sum_{\ell \in C_j} \frac{(2\pi\ell)^{2d} |F(\phi_{j,k})(\ell)|^2}{|F(g)(\ell)|^2}$$
$$\le C_{**} 2^{2(\delta+d)j} \sum_{\ell \in C_j} |F(\phi_{j,k})(\ell)|^2 = C_{**} 2^{2(\delta+d)j} \int_0^1 |\phi_{j,k}(x)|^2\,dx = C_{**} 2^{2(\delta+d)j}. \qquad (6.5)$$
Putting (6.3) and (6.5) together, we obtain

$$E\big( H_1^2 \big) \le \theta^2 2^{2(\delta+d)j}. \qquad (6.6)$$

It follows from (6.2) and (6.6) that

$$E\big( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \big) \le C 2^{2(\delta+d)j} \frac{1}{n}.$$
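The Parseval step used in (6.5) admits a direct numerical check: for a trigonometric polynomial $\sum_\ell c_\ell e^{-2i\pi\ell x}$, the integral of its squared modulus over $[0,1]$ equals $\sum_\ell |c_\ell|^2$. The coefficients below are arbitrary illustrative values.

```python
# Numerical check of the Parseval identity behind (6.5).
import cmath

coeffs = {-2: 0.5 + 0.1j, -1: -1.0j, 0: 2.0, 1: 0.3, 3: -0.7 + 0.2j}

def poly(x):
    return sum(c * cmath.exp(-2j * cmath.pi * l * x) for l, c in coeffs.items())

# A Riemann sum on N points is exact for trigonometric polynomials of degree < N/2.
N = 4096
integral = sum(abs(poly(k / N)) ** 2 for k in range(N)) / N
parseval = sum(abs(c) ** 2 for c in coeffs.values())
print(integral, parseval)  # the two values coincide
```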
Proposition 6.2. For any integer $j \ge j_*$ and any $k \in \{0, \ldots, 2^j - 1\}$, let $\beta_{j,k}$ be the wavelet coefficient (3.1) of $f^{(d)}$ and $\widehat{\beta}_{j,k}$ be (4.2). Then there exists a constant $C > 0$ such that

$$E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^4 \Big) \le C 2^{4(\delta+d)j} \frac{(\ln n)^2}{n^2}.$$
Proof of Proposition 6.2. Proceeding as in (6.1) (with $\psi$ instead of $\phi$), we have

$$\beta_{j,k} = \int_0^1 f^{(d)}(x) \psi_{j,k}(x)\,dx = E(G_1) = E\big( G_1 1_{\{|G_1| \le \eta_j\}} \big) + E\big( G_1 1_{\{|G_1| > \eta_j\}} \big). \qquad (6.7)$$

We have

$$E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^4 \Big) = E\Bigg( \Big( \frac{1}{n} \sum_{v=1}^{n} \big[ G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big) \big] - E\big( G_1 1_{\{|G_1| > \eta_j\}} \big) \Big)^4 \Bigg) \le 8(A + B), \qquad (6.8)$$

where

$$A = E\Bigg( \Big( \frac{1}{n} \sum_{v=1}^{n} \big[ G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big) \big] \Big)^4 \Bigg), \qquad B = \Big( E\big( |G_1| 1_{\{|G_1| > \eta_j\}} \big) \Big)^4.$$

Let us bound $A$ and $B$ in turn. To bound $A$, we need the Rosenthal inequality presented in the lemma below (see [24]).
Lemma 6.1 (Rosenthal's inequality). Let $p \ge 2$, $n \in \mathbb{N}^*$ and $(U_v)_{v \in \{1,\ldots,n\}}$ be $n$ zero-mean i.i.d. random variables such that $E(|U_1|^p) < \infty$. Then there exists a constant $C > 0$ such that

$$E\Bigg( \Big| \sum_{v=1}^{n} U_v \Big|^p \Bigg) \le C \max\Big( n E(|U_1|^p),\ \big( n E(U_1^2) \big)^{p/2} \Big).$$
Applying the Rosenthal inequality with $p = 4$ and, for any $v \in \{1, \ldots, n\}$,

$$U_v = G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big),$$

we obtain

$$A = \frac{1}{n^4} E\Bigg( \Big( \sum_{v=1}^{n} U_v \Big)^4 \Bigg) \le C \frac{1}{n^4} \max\Big( n E\big( U_1^4 \big),\ \big( n E\big( U_1^2 \big) \big)^2 \Big).$$

Using (6.6) (with $\psi$ instead of $\phi$), we have, for any $a \in \{2, 4\}$,

$$E(U_1^a) \le 2^a E\big( G_1^a 1_{\{|G_1| \le \eta_j\}} \big) \le 2^a \eta_j^{a-2} E\big( G_1^2 \big) \le 2^a \theta^2 \eta_j^{a-2} 2^{2(\delta+d)j}.$$

Hence

$$A \le C \frac{1}{n^4} \max\Big( \eta_j^2 \theta^2 2^{2(\delta+d)j} n,\ \big( \theta^2 2^{2(\delta+d)j} n \big)^2 \Big) = C \frac{1}{n^4} \max\Big( 2^{4(\delta+d)j} \frac{n^2}{\ln n},\ 2^{4(\delta+d)j} n^2 \Big) = C 2^{4(\delta+d)j} \frac{1}{n^2}. \qquad (6.9)$$

Let us now bound $B$. Using again (6.6) (with $\psi$ instead of $\phi$), we obtain
$$E\big( |G_1| 1_{\{|G_1| > \eta_j\}} \big) \le \frac{E\big( G_1^2 \big)}{\eta_j} \le \frac{1}{\theta 2^{(\delta+d)j}} \sqrt{\frac{\ln n}{n}}\ 2^{2(\delta+d)j} \theta^2 = \theta 2^{(\delta+d)j} \sqrt{\frac{\ln n}{n}}. \qquad (6.10)$$

Hence

$$B \le C 2^{4(\delta+d)j} \frac{(\ln n)^2}{n^2}. \qquad (6.11)$$
Combining (6.8), (6.9) and (6.11), we have

$$E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^4 \Big) \le C \Big( 2^{4(\delta+d)j} \frac{1}{n^2} + 2^{4(\delta+d)j} \frac{(\ln n)^2}{n^2} \Big) \le C 2^{4(\delta+d)j} \frac{(\ln n)^2}{n^2}.$$
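For intuition on Lemma 6.1 in the case $p = 4$ actually used above, one can expand $(\sum_v U_v)^4$ for zero-mean i.i.d. variables: all cross terms containing a factor $E(U_1) = 0$ vanish, leaving $E\big((\sum_v U_v)^4\big) = n E(U_1^4) + 3n(n-1) E(U_1^2)^2$, so Rosenthal's bound holds here with the explicit constant $C = 3$. The snippet checks this with the moments of Uniform$[-1,1]$, an illustrative choice:

```python
# Hands-on check of Rosenthal's inequality for p = 4 and U ~ Uniform[-1, 1].
EU2, EU4 = 1.0 / 3.0, 1.0 / 5.0  # E(U^2) and E(U^4) for Uniform[-1, 1]
for n in range(2, 1001):
    lhs = n * EU4 + 3 * n * (n - 1) * EU2 ** 2  # exact fourth moment of the sum
    rhs = 3 * max(n * EU4, (n * EU2) ** 2)      # Rosenthal bound with C = 3
    assert lhs <= rhs
print("Rosenthal bound with C = 3 verified for n = 2, ..., 1000")
```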
Proposition 6.3. For any integer $j \ge j_*$ and any $k \in \{0, \ldots, 2^j - 1\}$, let $\beta_{j,k}$ be the wavelet coefficient (3.1) of $f^{(d)}$ and $\widehat{\beta}_{j,k}$ be (4.2). Then, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$,

$$P\Big( |\widehat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_j/2 \Big) \le 2 n^{-2}.$$

Proof of Proposition 6.3. Using (6.7), we have
$$|\widehat{\beta}_{j,k} - \beta_{j,k}| = \Big| \frac{1}{n} \sum_{v=1}^{n} \big[ G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big) \big] - E\big( G_1 1_{\{|G_1| > \eta_j\}} \big) \Big|$$
$$\le \Big| \frac{1}{n} \sum_{v=1}^{n} \big[ G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big) \big] \Big| + E\big( |G_1| 1_{\{|G_1| > \eta_j\}} \big).$$

Using (6.10), we obtain

$$E\big( |G_1| 1_{\{|G_1| > \eta_j\}} \big) \le \theta 2^{(\delta+d)j} \sqrt{\frac{\ln n}{n}} = \lambda_j.$$

Hence
$$S = P\Big( |\widehat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_j/2 \Big) \le P\Bigg( \Big| \frac{1}{n} \sum_{v=1}^{n} \big[ G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big) \big] \Big| \ge (\kappa/2 - 1)\lambda_j \Bigg).$$
Now we need the Bernstein inequality presented in the lemma below (see [22]).
Lemma 6.2 (Bernstein's inequality). Let $n \in \mathbb{N}^*$ and $(U_v)_{v \in \{1,\ldots,n\}}$ be $n$ zero-mean i.i.d. random variables such that there exists a constant $M > 0$ satisfying $|U_v| \le M < \infty$ for any $v \in \{1, \ldots, n\}$. Then, for any $\lambda > 0$,

$$P\Bigg( \Big| \sum_{v=1}^{n} U_v \Big| \ge \lambda \Bigg) \le 2 \exp\Bigg( -\frac{\lambda^2}{2\big( n E(U_1^2) + \lambda M/3 \big)} \Bigg).$$
Let us set, for any $v \in \{1, \ldots, n\}$,

$$U_v = G_v 1_{\{|G_v| \le \eta_j\}} - E\big( G_v 1_{\{|G_v| \le \eta_j\}} \big).$$

Then $E(U_1) = 0$,

$$|U_v| \le \big| G_v 1_{\{|G_v| \le \eta_j\}} \big| + E\big( |G_v| 1_{\{|G_v| \le \eta_j\}} \big) \le 2\eta_j,$$

and, using again (6.6) (with $\psi$ instead of $\phi$),

$$E\big( U_1^2 \big) = V\big( G_1 1_{\{|G_1| \le \eta_j\}} \big) \le E\big( G_1^2 \big) \le \theta^2 2^{2(\delta+d)j}.$$

It follows from the Bernstein inequality that

$$S \le 2 \exp\Bigg( -\frac{n^2 (\kappa/2 - 1)^2 \lambda_j^2}{2\big( \theta^2 n 2^{2(\delta+d)j} + \frac{2 n (\kappa/2 - 1) \lambda_j \eta_j}{3} \big)} \Bigg).$$
Since

$$\lambda_j \eta_j = \theta 2^{(\delta+d)j} \sqrt{\frac{\ln n}{n}}\ \theta 2^{(\delta+d)j} \sqrt{\frac{n}{\ln n}} = \theta^2 2^{2(\delta+d)j}, \qquad \lambda_j^2 = \theta^2 2^{2(\delta+d)j} \frac{\ln n}{n},$$

we have, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$,

$$S \le 2 \exp\Bigg( -\frac{(\kappa/2 - 1)^2 \ln n}{2\big( 1 + \frac{2(\kappa/2 - 1)}{3} \big)} \Bigg) \le 2 n^{-2}.$$
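The Bernstein bound of Lemma 6.2 can also be illustrated by simulation: for bounded zero-mean i.i.d. variables, the empirical tail of $|\sum_v U_v|$ stays below the exponential bound. The Uniform$[-1,1]$ variables ($M = 1$, $E(U_1^2) = 1/3$) and the values of $n$ and $\lambda$ are illustrative choices.

```python
# Monte Carlo illustration of Bernstein's inequality (Lemma 6.2).
import math
import random

random.seed(0)
n, lam, trials = 200, 25.0, 5_000
M, var = 1.0, 1.0 / 3.0  # bound and variance of Uniform[-1, 1]

hits = sum(
    1 for _ in range(trials)
    if abs(sum(random.uniform(-1.0, 1.0) for _ in range(n))) >= lam
)
empirical = hits / trials
bound = 2.0 * math.exp(-lam ** 2 / (2.0 * (n * var + lam * M / 3.0)))
print(empirical, "<=", bound)  # the empirical tail stays below the bound
```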
6.2 Proofs of the main results
Proof of Theorem 5.1. We expand the function $f^{(d)}$ as

$$f^{(d)}(x) = \sum_{k=0}^{2^{j_0}-1} \alpha_{j_0,k} \phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k} \psi_{j,k}(x),$$

where

$$\alpha_{j_0,k} = \int_0^1 f^{(d)}(x) \phi_{j_0,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 f^{(d)}(x) \psi_{j,k}(x)\,dx.$$
We have

$$\widehat{f}^L_d(x) - f^{(d)}(x) = \sum_{k=0}^{2^{j_0}-1} (\widehat{\alpha}_{j_0,k} - \alpha_{j_0,k}) \phi_{j_0,k}(x) - \sum_{j=j_0}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k} \psi_{j,k}(x).$$

Hence

$$E\Big( \int_0^1 \big( \widehat{f}^L_d(x) - f^{(d)}(x) \big)^2\,dx \Big) = A + B,$$

where
$$A = \sum_{k=0}^{2^{j_0}-1} E\big( (\widehat{\alpha}_{j_0,k} - \alpha_{j_0,k})^2 \big), \qquad B = \sum_{j=j_0}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}^2.$$

Using Proposition 6.1, we obtain

$$A \le C 2^{j_0(1+2\delta+2d)} \frac{1}{n} \le C n^{-2s/(2s+2\delta+2d+1)}.$$

Since $p \ge 2$, we have $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$. Hence

$$B \le C 2^{-2j_0 s} \le C n^{-2s/(2s+2\delta+2d+1)}.$$

So

$$E\Big( \int_0^1 \big( \widehat{f}^L_d(x) - f^{(d)}(x) \big)^2\,dx \Big) \le C n^{-2s/(2s+2\delta+2d+1)}.$$

The proof of Theorem 5.1 is complete.
Proof of Theorem 5.2. We expand the function $f^{(d)}$ as

$$f^{(d)}(x) = \sum_{k=0}^{2^{j_*}-1} \alpha_{j_*,k} \phi_{j_*,k}(x) + \sum_{j=j_*}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k} \psi_{j,k}(x),$$

where

$$\alpha_{j_*,k} = \int_0^1 f^{(d)}(x) \phi_{j_*,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 f^{(d)}(x) \psi_{j,k}(x)\,dx.$$
We have

$$\widehat{f}^H_d(x) - f^{(d)}(x) = \sum_{k=0}^{2^{j_*}-1} (\widehat{\alpha}_{j_*,k} - \alpha_{j_*,k}) \phi_{j_*,k}(x) + \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} \big( \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_j\}} - \beta_{j,k} \big) \psi_{j,k}(x) - \sum_{j=j_1+1}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k} \psi_{j,k}(x).$$

Hence

$$E\Big( \int_0^1 \big( \widehat{f}^H_d(x) - f^{(d)}(x) \big)^2\,dx \Big) = R + S + T, \qquad (6.12)$$
where

$$R = \sum_{k=0}^{2^{j_*}-1} E\big( (\widehat{\alpha}_{j_*,k} - \alpha_{j_*,k})^2 \big), \qquad S = \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \big( \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_j\}} - \beta_{j,k} \big)^2 \Big)$$

and

$$T = \sum_{j=j_1+1}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}^2.$$

Let us bound $R$, $T$ and $S$ in turn.
Using Proposition 6.1, we have

$$R \le C 2^{j_*(1+2\delta+2d)} \frac{1}{n} \le C \frac{1}{n} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.13)$$
For $r \ge 1$ and $p \ge 2$, we have $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$. So

$$T \le C \sum_{j=j_1+1}^{\infty} 2^{-2js} \le C 2^{-2j_1 s} \le C n^{-2s/(2\delta+2d+1)} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

For $r \ge 1$ and $p \in [1,2)$, we have $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$. Since $s > (2\delta + 2d + 1)/p$, we have $(s + 1/2 - 1/p)/(2\delta + 2d + 1) > s/(2s + 2\delta + 2d + 1)$. So

$$T \le C \sum_{j=j_1+1}^{\infty} 2^{-2j(s+1/2-1/p)} \le C 2^{-2j_1(s+1/2-1/p)} \le C n^{-2(s+1/2-1/p)/(2\delta+2d+1)} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

Hence, for $r \ge 1$ and either $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > (2\delta + 2d + 1)/p\}$, we have

$$T \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.14)$$
The term $S$ can be decomposed as

$$S = e_1 + e_2 + e_3 + e_4, \qquad (6.15)$$

where

$$e_1 = \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^2 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_j\}} 1_{\{|\beta_{j,k}| < \kappa\lambda_j/2\}} \Big),$$

$$e_2 = \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^2 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_j\}} 1_{\{|\beta_{j,k}| \ge \kappa\lambda_j/2\}} \Big),$$

$$e_3 = \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \beta_{j,k}^2 1_{\{|\widehat{\beta}_{j,k}| < \kappa\lambda_j\}} 1_{\{|\beta_{j,k}| \ge 2\kappa\lambda_j\}} \Big)$$

and

$$e_4 = \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \beta_{j,k}^2 1_{\{|\widehat{\beta}_{j,k}| < \kappa\lambda_j\}} 1_{\{|\beta_{j,k}| < 2\kappa\lambda_j\}} \Big).$$

Let us analyze each term $e_1$, $e_2$, $e_3$ and $e_4$ in turn.
Upper bounds for $e_1$ and $e_3$. We have

$$\big\{ |\widehat{\beta}_{j,k}| < \kappa\lambda_j,\ |\beta_{j,k}| \ge 2\kappa\lambda_j \big\} \subseteq \big\{ |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_j/2 \big\},$$
$$\big\{ |\widehat{\beta}_{j,k}| \ge \kappa\lambda_j,\ |\beta_{j,k}| < \kappa\lambda_j/2 \big\} \subseteq \big\{ |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_j/2 \big\}$$

and

$$\big\{ |\widehat{\beta}_{j,k}| < \kappa\lambda_j,\ |\beta_{j,k}| \ge 2\kappa\lambda_j \big\} \subseteq \big\{ |\beta_{j,k}| \le 2 |\widehat{\beta}_{j,k} - \beta_{j,k}| \big\}.$$

So

$$\max(e_1, e_3) \le C \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^2 1_{\{|\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_j/2\}} \Big).$$

It follows from the Cauchy–Schwarz inequality and Propositions 6.2 and 6.3 that

$$E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^2 1_{\{|\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_j/2\}} \Big) \le \Big( E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^4 \Big) \Big)^{1/2} \Big( P\big( |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_j/2 \big) \Big)^{1/2} \le C 2^{2(\delta+d)j} \frac{\ln n}{n^2}.$$

Hence

$$\max(e_1, e_3) \le C \frac{\ln n}{n^2} \sum_{j=j_*}^{j_1} 2^{j(1+2\delta+2d)} \le C \frac{\ln n}{n^2} 2^{j_1(1+2\delta+2d)} \le C \frac{\ln n}{n} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.16)$$
Upper bound for the term $e_2$. Using Proposition 6.2 and the Cauchy–Schwarz inequality, we obtain

$$E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^2 \Big) \le \Big( E\Big( \big( \widehat{\beta}_{j,k} - \beta_{j,k} \big)^4 \Big) \Big)^{1/2} \le C 2^{2(\delta+d)j} \frac{\ln n}{n}.$$

Hence

$$e_2 \le C \frac{\ln n}{n} \sum_{j=j_*}^{j_1} 2^{2(\delta+d)j} \sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}| > \kappa\lambda_j/2\}}.$$

Let $j_2$ be the integer defined by

$$2^{-1} \Big( \frac{n}{\ln n} \Big)^{1/(2s+2\delta+2d+1)} < 2^{j_2} \le \Big( \frac{n}{\ln n} \Big)^{1/(2s+2\delta+2d+1)}. \qquad (6.17)$$

We have $e_2 \le e_{2,1} + e_{2,2}$, where

$$e_{2,1} = C \frac{\ln n}{n} \sum_{j=j_*}^{j_2} 2^{2(\delta+d)j} \sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}| > \kappa\lambda_j/2\}}, \qquad e_{2,2} = C \frac{\ln n}{n} \sum_{j=j_2+1}^{j_1} 2^{2(\delta+d)j} \sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}| > \kappa\lambda_j/2\}}.$$

We have

$$e_{2,1} \le C \frac{\ln n}{n} \sum_{j=j_*}^{j_2} 2^{j(1+2\delta+2d)} \le C \frac{\ln n}{n} 2^{j_2(1+2\delta+2d)} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$
For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$,

$$e_{2,2} \le C \frac{\ln n}{n} \sum_{j=j_2+1}^{j_1} 2^{2(\delta+d)j} \frac{1}{\lambda_j^2} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 = C \sum_{j=j_2+1}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 \le C 2^{-2j_2 s} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

For $r \ge 1$, $p \in [1,2)$ and $s > (2\delta + 2d + 1)/p$, since $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and

$$(2s + 2\delta + 2d + 1)(2 - p)/2 + (s + 1/2 - 1/p + \delta + d - 2(\delta+d)/p) p = 2s,$$

we have

$$e_{2,2} \le C \frac{\ln n}{n} \sum_{j=j_2+1}^{j_1} 2^{2(\delta+d)j} \frac{1}{\lambda_j^p} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \le C \Big( \frac{\ln n}{n} \Big)^{(2-p)/2} \sum_{j=j_2+1}^{\infty} 2^{j(\delta+d)(2-p)} 2^{-j(s+1/2-1/p)p}$$
$$\le C \Big( \frac{\ln n}{n} \Big)^{(2-p)/2} 2^{-j_2(s+1/2-1/p+\delta+d-2(\delta+d)/p)p} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

So, for $r \ge 1$ and either $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > (2\delta + 2d + 1)/p\}$,

$$e_2 \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.18)$$
Upper bound for the term $e_4$. We have

$$e_4 \le \sum_{j=j_*}^{j_1} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}| < 2\kappa\lambda_j\}}.$$

Let $j_2$ be the integer defined in (6.17). We have $e_4 \le e_{4,1} + e_{4,2}$, where

$$e_{4,1} = \sum_{j=j_*}^{j_2} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}| < 2\kappa\lambda_j\}}, \qquad e_{4,2} = \sum_{j=j_2+1}^{j_1} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}| < 2\kappa\lambda_j\}}.$$

We have

$$e_{4,1} \le C \sum_{j=j_*}^{j_2} 2^j \lambda_j^2 = C \frac{\ln n}{n} \sum_{j=j_*}^{j_2} 2^{j(1+2\delta+2d)} \le C \frac{\ln n}{n} 2^{j_2(1+2\delta+2d)} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$
For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$, we have

$$e_{4,2} \le \sum_{j=j_2+1}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}^2 \le C 2^{-2j_2 s} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

For $r \ge 1$, $p \in [1,2)$ and $s > (2\delta + 2d + 1)/p$, since $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and

$$(2 - p)(2s + 2\delta + 2d + 1)/2 + (s + 1/2 - 1/p + \delta + d - 2(\delta+d)/p) p = 2s,$$

we have

$$e_{4,2} \le C \sum_{j=j_2+1}^{j_1} \lambda_j^{2-p} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p = C \Big( \frac{\ln n}{n} \Big)^{(2-p)/2} \sum_{j=j_2+1}^{j_1} 2^{j(\delta+d)(2-p)} \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p$$
$$\le C \Big( \frac{\ln n}{n} \Big)^{(2-p)/2} \sum_{j=j_2+1}^{\infty} 2^{j(\delta+d)(2-p)} 2^{-j(s+1/2-1/p)p} \le C \Big( \frac{\ln n}{n} \Big)^{(2-p)/2} 2^{-j_2(s+1/2-1/p+\delta+d-2(\delta+d)/p)p} \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

So, for $r \ge 1$ and either $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > (2\delta + 2d + 1)/p\}$,

$$e_4 \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.19)$$

It follows from (6.15), (6.16), (6.18) and (6.19) that

$$S \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}. \qquad (6.20)$$
Combining (6.12), (6.13), (6.14) and (6.20), we have, for $r \ge 1$ and either $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > (2\delta + 2d + 1)/p\}$,

$$E\Big( \int_0^1 \big( \widehat{f}^H_d(x) - f^{(d)}(x) \big)^2\,dx \Big) \le C \Big( \frac{\ln n}{n} \Big)^{2s/(2s+2\delta+2d+1)}.$$

The proof of Theorem 5.2 is complete.
This work is supported by ANR grant NatImages, ANR-08-EMER-009.
References
[1] P.K. Bhattacharya, Estimation of a probability density function and its derivatives, Sankhya Ser. A 29 (1967), 373–382.
[2] T. Cai, Adaptive wavelet estimation: a block thresholding and oracle inequality approach, The Annals of Statistics 27 (1999), 898–924.
[3] Y.P. Chaubey and H. Doosti, Wavelet based estimation of the derivatives of a density for m-dependent random variables, Journal of the Iranian Statistical Society 4 (2) (2005), 97–105.
[4] Y.P. Chaubey, H. Doosti and B.L.S. Prakasa Rao, Wavelet based estimation of the derivatives of a density with associated variables, International Journal of Pure and Applied Mathematics 27 (1) (2006), 97–106.
[5] C. Chesneau, Wavelet estimation of the derivatives of an unknown function from a convolution model, Preprint (2009).
[6] F. Comte and M.-L. Taupin, Nonparametric estimation of the regression function in an errors-in-variables model, Statistica Sinica 17 (3) (2007), 1065–1090.
[7] B. Delyon and A. Juditsky, On minimax wavelet estimators, Applied and Computational Harmonic Analysis 3 (1996), 215–228.
[8] D.L. Donoho and I.M. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika 81 (1994), 425–455.
[9] D.L. Donoho and I.M. Johnstone, Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association 90 (432) (1995), 1200–1224.
[10] D.L. Donoho and I.M. Johnstone, Minimax estimation via wavelet shrinkage, The Annals of Statistics 26 (1998), 879–921.
[11] J. Fan and M. Masry, Multivariate regression estimation with errors-in-variables: asymptotic normality for mixing processes, J. Multivariate Anal. 43 (1992), 237–272.
[12] J. Fan and Y.K. Truong, Nonparametric regression with errors-in-variables, The Annals of Statistics 21 (4) (1993), 1900–1925.
[13] J. Fan, Y.K. Truong and Y. Wang, Nonparametric function estimation involving errors-in-variables, Nonparametric Functional Estimation and Related Topics (1991), 613–627.
[14] J. Fan and J.Y. Koo, Wavelet deconvolution, IEEE Transactions on Information Theory 48 (2002), 734–747.
[15] D.A. Ioannides and P.D. Alevizos, Nonparametric regression with errors in variables and applications, Statist. Probab. Lett. 32 (1997), 35–43.
[16] W. Härdle, G. Kerkyacharian, D. Picard and A. Tsybakov, Wavelets, Approximation and Statistical Applications, Lecture Notes in Statistics 129, Springer, New York, 1998.
[17] I.M. Johnstone, G. Kerkyacharian, D. Picard and M. Raimondo, Wavelet deconvolution in a periodic setting, Journal of the Royal Statistical Society Series B 66 (3) (2004), 547–573.
[18] J.Y. Koo and K.W. Lee, B-spline estimation of regression functions with errors in variables, Statist. Probab. Lett. 40 (1998), 57–66.
[19] E. Masry, Multivariate regression estimation with errors-in-variables for stationary processes, Nonparametric Statistics 3 (1993), 13–36.
[20] Y. Meyer, Ondelettes et Opérateurs, Hermann, Paris, 1990.
[21] M. Pensky and B. Vidakovic, Adaptive wavelet estimator for nonparametric density deconvolution, The Annals of Statistics 27 (1999), 2033–2053.
[22] V.V. Petrov, Limit Theorems of Probability Theory: Sequences of Independent Random Variables, Clarendon Press, Oxford, 1995.
[23] B.L.S. Prakasa Rao, Nonparametric estimation of the derivatives of a density by the method of wavelets, Bull. Inform. Cyb. 28 (1996), 91–100.
[24] H.P. Rosenthal, On the subspaces of $L^p$ ($p \ge 2$) spanned by sequences of independent random variables, Israel Journal of Mathematics 8 (1970), 273–303.
[25] A.B. Tsybakov, Introduction à l'estimation non-paramétrique, Springer, 2004.