HAL Id: hal-00449970
https://hal.archives-ouvertes.fr/hal-00449970v2
Preprint submitted on 19 Jul 2010
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Christophe Chesneau
To cite this version:
Christophe Chesneau. A note on wavelet regression with uniform random design and heteroscedastic
errors. 2010. hal-00449970v2
A note on wavelet regression with uniform random design and heteroscedastic errors
Christophe Chesneau
University of Caen (LMNO), Caen, France. Email: chesneau@math.unicaen.fr
Abstract
A nonparametric regression model with uniform random design and heteroscedastic errors (with a deterministic structure) is considered. The distributions of the errors are unknown; we only know that they admit finite moments of order 2. Within this general framework, we want to estimate the unknown regression function. To reach this goal, an adaptive nonlinear wavelet estimator is developed. Taking the minimax approach under the MISE over Besov balls, we prove that it attains a sharp rate of convergence, close to the one attained by the optimal (but non-realistic) linear wavelet estimator.
Keywords: Regression, heteroscedastic errors, minimax estimation, Besov balls, wavelets, hard thresholding.
2000 Mathematics Subject Classification: 62G07, 62G20.
1 Motivations
We observe $n$ independent pairs of random variables $(X_1, Y_1), \ldots, (X_n, Y_n)$ where, for any $i \in \{1, \ldots, n\}$,
$$Y_i = f(X_i) + \xi_i, \qquad (1.1)$$
$f : [0,1] \to \mathbb{R}$ is an unknown function, $X_1, \ldots, X_n$ are $n$ i.i.d. random variables having the uniform distribution on $[0,1]$, and $\xi_1, \ldots, \xi_n$ are $n$ independent random variables, independent of $X_1, \ldots, X_n$. We assume that there exist two known sequences of real numbers, $(\mu_i)_{i \in \{1,\ldots,n\}}$ and $(\sigma_i^2)_{i \in \{1,\ldots,n\}}$, such that, for any $i \in \{1, \ldots, n\}$,
$$\mathbb{E}(\xi_i) = \mu_i, \qquad \mathbb{V}(\xi_i) = \mathbb{E}\big((\xi_i - \mu_i)^2\big) = \sigma_i^2.$$
The distributions of $\xi_1, \ldots, \xi_n$, and their possible finite moments of order $\delta > 2$, are unknown. We aim to estimate $f$ from $(X_1, Y_1), \ldots, (X_n, Y_n)$.
To reach this goal, we focus our attention on wavelet methods. They are attractive for nonparametric function estimation because of their spatial adaptivity, computational efficiency and asymptotic optimality properties. They can achieve near-optimal convergence rates over a wide range of function classes (Besov balls, ...) and enjoy excellent mean integrated squared error (MISE) properties when used to estimate spatially inhomogeneous functions.

This work is supported by ANR grant NatImages, ANR-08-EMER-009.
If we consider (1.1) when $\xi_1, \ldots, \xi_n$ are i.i.d. and $\xi_1$ is Gaussian or has exponentially decreasing tails, various wavelet methods have been developed. See, e.g., [13-15], [16], [17], [7,8], [5,6], [10], [19] and [9]. When $\xi_1$ has other kinds of distributions (Cauchy, ...), see, e.g., [1,2] and [3]. When the distribution of $\xi_1$ is unknown but admits a known finite moment of order $\delta$ with $\delta > 2$, see, e.g., [22], [12], [26] and [3]. When $\delta > 0$ (including $\delta \in (0,2]$) and some conditions on the distribution of $\xi_1$ are known, see [4]. For the non-independent case and $\delta > 2$, we refer to [20]. However, to the best of our knowledge, there is no adaptive wavelet estimator of $f$ in (1.1) when the errors are heteroscedastic and only finite moments of order 2 are known.
Adopting the methodology of [12], we develop two wavelet estimators: a linear non-adaptive one and a nonlinear adaptive one based on the hard thresholding rule. We evaluate their performances by taking the minimax approach under the MISE over Besov balls $B_{p,r}^s(M)$ (to be defined in Section 3). We prove that the considered hard thresholding wavelet estimator attains the rate of convergence
$$r_n = \left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}, \qquad \text{where } w_n = \sum_{i=1}^n \frac{1}{\sigma_i^2}.$$
This rate is sharp in the sense that it is the one attained by the (non-realistic) linear wavelet estimator, up to a logarithmic term. Moreover, if we restrict our study to the Gaussian case, we prove that it is near optimal in the minimax sense.
The paper is organized as follows. Assumptions on the model and some notations are introduced in Section 2. Section 3 briefly describes the wavelet basis on $[0,1]$ and the Besov balls. The estimators are presented in Section 4. The results are stated in Section 5.
Section 6 is devoted to the proofs.
2 Assumptions and notations
Assumption on $f$. We suppose that there exists a known constant $C_* > 0$ such that
$$\|f\|_\infty = \sup_{x \in [0,1]} |f(x)| \le C_*. \qquad (2.1)$$
Assumptions on $\xi_1, \ldots, \xi_n$. We recall that $\xi_1, \ldots, \xi_n$ are not (necessarily) identically distributed: there exist two known sequences of real numbers, $(\mu_i)_{i \in \{1,\ldots,n\}}$ and $(\sigma_i^2)_{i \in \{1,\ldots,n\}}$, such that, for any $i \in \{1, \ldots, n\}$,
$$\mathbb{E}(\xi_i) = \mu_i, \qquad \mathbb{V}(\xi_i) = \sigma_i^2.$$
These sequences can depend on $n$. The distributions of $\xi_1, \ldots, \xi_n$ are unknown. We suppose that there exists a (known) constant $c_* > 0$ such that
$$\inf_{i \in \{1,\ldots,n\}} \sigma_i^2 \ge c_*. \qquad (2.2)$$
We set
$$w_n = \sum_{i=1}^n \frac{1}{\sigma_i^2}$$
and we suppose that $w_n \ge e$ (i.e. $\ln w_n \ge 1$).
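The quantity $w_n$ plays the role of an effective sample size: homoscedastic errors with $\sigma_i^2 = 1$ give $w_n = n$, while noisier observations contribute less. A minimal numerical illustration (the helper name is ours, not from the paper):

```python
def effective_sample_size(variances):
    """w_n = sum_i 1 / sigma_i^2 for a sequence of error variances sigma_i^2."""
    return sum(1.0 / v for v in variances)

# Homoscedastic case: sigma_i^2 = 1 for every i gives w_n = n.
print(effective_sample_size([1.0] * 100))        # 100.0
# Heteroscedastic case: observations with variance 4 count four times less.
print(effective_sample_size([1.0, 4.0, 4.0]))    # 1.5
```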
3 Wavelet bases and Besov balls
Wavelet basis. Let $\phi$ be a father wavelet of a multiresolution analysis on $\mathbb{R}$ and $\psi$ be the associated mother wavelet. Assume that $\mathrm{Supp}(\phi) = \mathrm{Supp}(\psi) = [1-N, N]$, $\int_{1-N}^{N} \phi(x)\,dx = 1$ and, for any $\ell \in \{0, \ldots, N-1\}$, $\int_{1-N}^{N} x^{\ell}\psi(x)\,dx = 0$. For instance, the Daubechies wavelets $db_N$ satisfy these assumptions.
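In the Haar case ($N = 1$, so $\mathrm{Supp}(\phi) = \mathrm{Supp}(\psi) = [0,1]$) these two assumptions are easy to check numerically; the sketch below, with our own helper names, verifies $\int \phi = 1$ and the single vanishing moment ($\ell = 0$) by a midpoint rule:

```python
import numpy as np

# Haar wavelets (Daubechies db1): N = 1, hence Supp(phi) = Supp(psi) = [1-N, N] = [0, 1].
def phi(x):
    return ((0 <= x) & (x < 1)).astype(float)

def psi(x):
    return np.where(x < 0.5, 1.0, -1.0) * phi(x)

# Midpoint rule on [0, 1]: integral(phi) should be 1 and integral(psi) should be 0.
x = (np.arange(1000) + 0.5) / 1000
print(phi(x).mean())  # 1.0
print(psi(x).mean())  # 0.0
```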
Set
$$\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k).$$
Then there exists an integer $\tau$ satisfying $2^{\tau} \ge 2N$ such that the collection
$$\mathcal{B} = \big\{\phi_{\tau,k}(\cdot),\ k \in \{0, \ldots, 2^{\tau}-1\};\ \psi_{j,k}(\cdot),\ j \in \mathbb{N} - \{0, \ldots, \tau - 1\},\ k \in \{0, \ldots, 2^{j}-1\}\big\}$$
(with an appropriate treatment at the boundaries) is an orthonormal basis of $L^2([0,1]) = \{h : [0,1] \to \mathbb{R};\ \int_0^1 h^2(x)\,dx < \infty\}$. See [11].
For any integer $\ell \ge \tau$, any $f \in L^2([0,1])$ can be expanded on $\mathcal{B}$ as
$$f(x) = \sum_{k=0}^{2^{\ell}-1} \alpha_{\ell,k}\phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}\psi_{j,k}(x),$$
where $\alpha_{j,k}$ and $\beta_{j,k}$ are the wavelet coefficients of $f$ defined by
$$\alpha_{j,k} = \int_0^1 f(x)\phi_{j,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx. \qquad (3.1)$$

Besov balls. Let $M > 0$, $s > 0$, $p \ge 1$ and $r \ge 1$. Set $\beta_{\tau-1,k} = \alpha_{\tau,k}$. A function $f$ belongs to $B_{p,r}^s(M)$ if, and only if, there exists a constant $M_* > 0$ (depending on $M$) such that the associated wavelet coefficients (3.1) satisfy
$$\left(\sum_{j=\tau-1}^{\infty} \left(2^{j(s+1/2-1/p)}\left(\sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^p\right)^{1/p}\right)^{r}\right)^{1/r} \le M_*.$$
In this expression, $s$ is a smoothness parameter and $p$ and $r$ are norm parameters.
For particular choices of $s$, $p$ and $r$, the Besov balls contain the Hölder and Sobolev balls. See [21].
4 Estimators
Wavelet coefficient estimators. The first step to estimate $f$ consists in expanding $f$ on $\mathcal{B}$ and estimating the unknown wavelet coefficients. For any integer $j \ge \tau$ and any $k \in \{0, \ldots, 2^{j}-1\}$,

- we estimate $\alpha_{j,k}$ by
$$\hat{\alpha}_{j,k} = \frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}(Y_i - \mu_i)\phi_{j,k}(X_i), \qquad (4.1)$$

- we estimate $\beta_{j,k}$ by
$$\hat{\beta}_{j,k} = \frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}(Y_i - \mu_i)\psi_{j,k}(X_i)\mathbf{1}_{\left\{\frac{1}{\sigma_i^2}|(Y_i - \mu_i)\psi_{j,k}(X_i)| \le \gamma_n\right\}}, \qquad (4.2)$$
where, for any random event $A$, $\mathbf{1}_A$ is the indicator function of $A$,
$$\gamma_n = \theta\sqrt{\frac{w_n}{\ln w_n}} \qquad \text{and} \qquad \theta = \sqrt{C_*^2/c_* + 1}.$$
Notice that $\hat{\beta}_{j,k}$ is a thresholded version of the empirical estimator of $\beta_{j,k}$ (see, e.g., [13-15]). The idea of this thresholding is to operate a selection on the observations: when, for $i \in \{1, \ldots, n\}$, $f$ is made "too noisy" by $\xi_i$, the observation $Y_i$ is neglected. Such a $\hat{\beta}_{j,k}$ was introduced by [12] when $\xi_1, \ldots, \xi_n$ are i.i.d. (non-heteroscedastic case) and $\xi_1$ admits finite moments of order 3.
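To fix ideas, here is a small numerical sketch of (4.1) and (4.2) with Haar wavelets and simulated heteroscedastic Gaussian noise; the regression function, sample size and variance pattern are our own illustrative choices, not prescribed by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi_jk(x, j, k):
    # phi_{j,k}(x) = 2^{j/2} phi(2^j x - k) with the Haar father wavelet.
    t = 2.0**j * x - k
    return 2.0**(j / 2) * ((0 <= t) & (t < 1)).astype(float)

def psi_jk(x, j, k):
    t = 2.0**j * x - k
    return 2.0**(j / 2) * np.where(t < 0.5, 1.0, -1.0) * ((0 <= t) & (t < 1))

# Model (1.1): uniform design, independent Gaussian errors whose known
# variances sigma_i^2 cycle through {1, 2, 3, 4}.
n = 5000
f = lambda x: x                            # so that ||f||_inf <= C_* = 1
mu = np.zeros(n)                           # known error means
sigma2 = 1.0 + (np.arange(n) % 4)          # known error variances
X = rng.uniform(0.0, 1.0, n)
Y = f(X) + mu + rng.normal(0.0, np.sqrt(sigma2))

w_n = np.sum(1.0 / sigma2)
theta = np.sqrt(1.0 / sigma2.min() + 1.0)  # theta = sqrt(C_*^2 / c_* + 1)
gamma_n = theta * np.sqrt(w_n / np.log(w_n))

def alpha_hat(j, k):
    # Estimator (4.1): variance-weighted empirical coefficient.
    return np.sum((Y - mu) * phi_jk(X, j, k) / sigma2) / w_n

def beta_hat(j, k):
    # Estimator (4.2): summands exceeding gamma_n in absolute value are dropped.
    Z = (Y - mu) * psi_jk(X, j, k) / sigma2
    return np.sum(Z * (np.abs(Z) <= gamma_n)) / w_n

# For f(x) = x the true coefficients are alpha_{0,0} = 1/2 and beta_{0,0} = -1/4.
print(alpha_hat(0, 0), beta_hat(0, 0))
```

With $w_n \approx 2600$ here, the standard deviation of $\hat{\alpha}_{j,k}$ is of order $\theta/\sqrt{w_n} \approx 0.03$, consistent with Proposition 6.1.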
Linear estimator. Assuming that $f \in B_{p,r}^s(M)$ with $p \ge 2$, we define the linear estimator $\hat{f}^L$ by
$$\hat{f}^L(x) = \sum_{k=0}^{2^{j_0}-1} \hat{\alpha}_{j_0,k}\phi_{j_0,k}(x), \qquad (4.3)$$
where $\hat{\alpha}_{j,k}$ is defined by (4.1) and $j_0$ is the integer satisfying
$$2^{-1}w_n^{1/(2s+1)} < 2^{j_0} \le w_n^{1/(2s+1)}.$$
The definition of $j_0$ is chosen to minimize the MISE of $\hat{f}^L$. It is not adaptive since it depends on $s$, the smoothness parameter of $f$.
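Concretely, $j_0$ can be read off from $w_n$ and $s$ as the integer part of $\log_2 w_n^{1/(2s+1)}$; a small helper of ours (not from the paper) illustrates this:

```python
import math

def level_j0(w_n, s):
    """The integer j0 with 2^{-1} w_n^{1/(2s+1)} < 2^{j0} <= w_n^{1/(2s+1)}."""
    return math.floor(math.log2(w_n) / (2 * s + 1))

# For w_n = 1000 and smoothness s = 1: w_n^{1/3} = 10, so 2^{j0} = 8, i.e. j0 = 3.
print(level_j0(1000, 1))  # 3
```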
Hard thresholding estimator. We define the hard thresholding estimator $\hat{f}^H$ by
$$\hat{f}^H(x) = \sum_{k=0}^{2^{\tau}-1} \hat{\alpha}_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \hat{\beta}_{j,k}\mathbf{1}_{\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n\}}\psi_{j,k}(x), \qquad (4.4)$$
where $\hat{\alpha}_{j,k}$ is defined by (4.1), $\hat{\beta}_{j,k}$ by (4.2), $j_1$ is the integer satisfying
$$2^{-1}w_n < 2^{j_1} \le w_n,$$
$\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$, $\lambda_n$ is the threshold
$$\lambda_n = \theta\sqrt{\frac{\ln w_n}{w_n}} \qquad (4.5)$$
and $\theta = \sqrt{C_*^2/c_* + 1}$. The definitions of $\gamma_n$ and $\lambda_n$ are chosen to minimize the MISE of $\hat{f}^H$ and to make it adaptive. Notice that $\hat{f}^H$ is a modified version of the estimator of [12].
5 Results
5.1 Upper bounds
Theorem 5.1. Consider (1.1) under the assumptions of Section 2. Suppose that $f \in B_{p,r}^s(M)$ with $s > 0$, $p \ge 2$ and $r \ge 1$. Let $\hat{f}^L$ be (4.3). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^L(x) - f(x)\big)^2\,dx\right) \le Cw_n^{-2s/(2s+1)}.$$

The proof of Theorem 5.1 uses a moment inequality on (4.1) and a suitable decomposition of the MISE. Due to our weak assumptions on $\xi_1, \ldots, \xi_n$, the optimal lower bound of the model seems difficult to determine (see, e.g., [18] and [25]). However, since $\hat{f}^L$ is constructed to be the linear estimator which optimizes the MISE and $w_n^{-2s/(2s+1)}$ is the optimal rate of convergence in the Gaussian case (see Theorem 5.3 below), our benchmark will be
$$r_n = w_n^{-2s/(2s+1)}.$$
Theorem 5.2. Consider (1.1) under the assumptions of Section 2. Let $\hat{f}^H$ be (4.4). Suppose that $f \in B_{p,r}^s(M)$ with $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$. Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^H(x) - f(x)\big)^2\,dx\right) \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$

The proof of Theorem 5.2 is based on several probability results (moment inequalities, a concentration inequality, ...) and a suitable decomposition of the MISE. Note that our proof differs from the one of [12, Theorem 4]. This difference enables us to weaken the moment assumption (only finite moments of order 2 are required, against order 3 in [12]) and to take into account the heteroscedasticity of the errors.

Theorem 5.2 proves that $\hat{f}^H$ attains $r_n$ up to a logarithmic term. Notice that when $\xi_1, \ldots, \xi_n$ are i.i.d. and only admit finite moments of order 2, $\hat{f}^H$ achieves the same rate of convergence as the standard hard thresholding estimator defined when $\xi_1$ is Gaussian (see [12] and [13-15]). And this rate is optimal up to a logarithmic term.
5.2 Lower bound
Theorem 5.3 below investigates the lower bound of (1.1) in the Gaussian case.

Theorem 5.3. Consider (1.1) when $\xi_1, \ldots, \xi_n$ are Gaussian. Then there exists a constant $c > 0$ such that, for any $s > 0$, $p \ge 1$ and $r \ge 1$,
$$\inf_{\tilde{f}}\sup_{f \in B_{p,r}^s(M)} \mathbb{E}\left(\int_0^1 \big(\tilde{f}(x) - f(x)\big)^2\,dx\right) \ge cw_n^{-2s/(2s+1)},$$
where the infimum is taken over all possible estimators $\tilde{f}$ of $f$.

Theorems 5.1 and 5.3 prove that, in the Gaussian case, $r_n = w_n^{-2s/(2s+1)}$ is the optimal rate of convergence.
5.3 Perspectives
A possible perspective of this work is to investigate the estimation of $f$ in (1.1) when $X_1$ has a more complex distribution than the uniform one. In this case, the warped wavelet basis introduced into nonparametric regression estimation by [19] seems to be a powerful tool. To improve the estimation of $f$ (and remove the extra logarithmic term in $r_n$), other thresholding rules could perhaps be developed. The thresholding rule named BlockJS, introduced into wavelet methodology by [5,6], seems to be a good alternative. All these aspects need further investigations that we leave for future work.
6 Proofs
In this section, we consider (1.1) under the assumptions of Section 2. Moreover, $C$ represents a positive constant which may differ from one term to another.
6.1 Auxiliary results
Proposition 6.1. For any integer $j \ge \tau$ and any $k \in \{0, \ldots, 2^{j}-1\}$, let $\alpha_{j,k}$ be the wavelet coefficient (3.1) of $f$ in (1.1) and $\hat{\alpha}_{j,k}$ be (4.1). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\big((\hat{\alpha}_{j,k} - \alpha_{j,k})^2\big) \le Cw_n^{-1}.$$

Proof of Proposition 6.1. For any $i \in \{1, \ldots, n\}$, let us set
$$W_i = (Y_i - \mu_i)\phi_{j,k}(X_i).$$
Since $X_i$ and $\xi_i$ are independent, we have
$$\mathbb{E}(W_i) = \mathbb{E}(f(X_i)\phi_{j,k}(X_i)) + \mathbb{E}(\xi_i - \mu_i)\mathbb{E}(\phi_{j,k}(X_i)) = \mathbb{E}(f(X_1)\phi_{j,k}(X_1)) = \int_0^1 f(x)\phi_{j,k}(x)\,dx = \alpha_{j,k}.$$
Hence
$$\mathbb{E}(\hat{\alpha}_{j,k}) = \mathbb{E}\left(\frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}W_i\right) = \frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\mathbb{E}(W_i) = \frac{1}{w_n}w_n\alpha_{j,k} = \alpha_{j,k}. \qquad (6.1)$$
This implies
$$\mathbb{E}\big((\hat{\alpha}_{j,k} - \alpha_{j,k})^2\big) = \mathbb{V}(\hat{\alpha}_{j,k}) = \mathbb{V}\left(\frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}W_i\right) = \frac{1}{w_n^2}\sum_{i=1}^{n} \frac{1}{\sigma_i^4}\mathbb{V}(W_i) \le \frac{1}{w_n^2}\sum_{i=1}^{n} \frac{1}{\sigma_i^4}\mathbb{E}(W_i^2). \qquad (6.2)$$
For any $i \in \{1, \ldots, n\}$, since $X_i$ and $\xi_i$ are independent, $\mathbb{E}\big(\phi_{j,k}^2(X_i)\big) = \int_0^1 \phi_{j,k}^2(x)\,dx = 1$ and, by (2.1), $\|f\|_\infty \le C_*$, we have
$$\mathbb{E}(W_i^2) = \mathbb{E}\big((f(X_i) + (\xi_i - \mu_i))^2\phi_{j,k}^2(X_i)\big) = \mathbb{E}\big(f^2(X_i)\phi_{j,k}^2(X_i)\big) + 2\mathbb{E}(\xi_i - \mu_i)\mathbb{E}(f(X_i)\phi_{j,k}^2(X_i)) + \mathbb{E}\big((\xi_i - \mu_i)^2\big)\mathbb{E}\big(\phi_{j,k}^2(X_i)\big) = \mathbb{E}\big(f^2(X_i)\phi_{j,k}^2(X_i)\big) + \sigma_i^2 \le C_*^2 + \sigma_i^2.$$
So, by (2.2),
$$\frac{1}{\sigma_i^4}\mathbb{E}(W_i^2) \le \frac{1}{\sigma_i^4}(C_*^2 + \sigma_i^2) \le \frac{\theta^2}{\sigma_i^2}. \qquad (6.3)$$
It follows from (6.2) and (6.3) that
$$\mathbb{E}\big((\hat{\alpha}_{j,k} - \alpha_{j,k})^2\big) \le \frac{\theta^2}{w_n^2}\sum_{i=1}^{n} \frac{1}{\sigma_i^2} = Cw_n^{-1}.$$
Proposition 6.2. For any integer $j \ge \tau$ and any $k \in \{0, \ldots, 2^{j}-1\}$, let $\beta_{j,k}$ be the wavelet coefficient (3.1) of $f$ in (1.1) and $\hat{\beta}_{j,k}$ be (4.2). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^4\Big) \le C\frac{(\ln w_n)^2}{w_n^2}.$$

Proof of Proposition 6.2. For any $i \in \{1, \ldots, n\}$, let us set
$$Z_i = \frac{1}{\sigma_i^2}(Y_i - \mu_i)\psi_{j,k}(X_i).$$
Proceeding as in (6.1), we have
$$\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx = \mathbb{E}(f(X_1)\psi_{j,k}(X_1)) = \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}(Z_i) = \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big) + \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big). \qquad (6.4)$$
This, with the elementary inequality $(x+y)^4 \le 8(x^4 + y^4)$, $(x,y) \in \mathbb{R}^2$, implies that
$$\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^4\Big) = \mathbb{E}\left(\left(\frac{1}{w_n}\sum_{i=1}^{n} \big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big)\big) - \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big)\right)^4\right) \le 8(A + B), \qquad (6.5)$$
where
$$A = \mathbb{E}\left(\left(\frac{1}{w_n}\sum_{i=1}^{n} \big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}})\big)\right)^4\right)$$
and
$$B = \left(\frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(|Z_i|\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big)\right)^4.$$
Let us bound $A$ and $B$, in turn.

Upper bound for $A$. We need the Rosenthal inequality presented in the lemma below.

Lemma 6.1 (Rosenthal's inequality [24]). Let $p \ge 2$, $n \in \mathbb{N}^*$ and $(U_i)_{i \in \{1,\ldots,n\}}$ be $n$ zero-mean independent random variables satisfying $\sup_{i \in \{1,\ldots,n\}} \mathbb{E}(|U_i|^p) < \infty$. Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left(\left|\sum_{i=1}^{n} U_i\right|^p\right) \le C\max\left(\sum_{i=1}^{n} \mathbb{E}(|U_i|^p), \left(\sum_{i=1}^{n} \mathbb{E}(U_i^2)\right)^{p/2}\right).$$

Applying the Rosenthal inequality with $p = 4$ and, for any $i \in \{1, \ldots, n\}$,
$$U_i = Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big),$$
we obtain
$$A = \frac{1}{w_n^4}\mathbb{E}\left(\left(\sum_{i=1}^{n} U_i\right)^4\right) \le C\frac{1}{w_n^4}\max\left(\sum_{i=1}^{n} \mathbb{E}(U_i^4), \left(\sum_{i=1}^{n} \mathbb{E}(U_i^2)\right)^2\right).$$
Using (6.3) (with $\psi$ instead of $\phi$), we have, for any $a \in \{2, 4\}$ and any $i \in \{1, \ldots, n\}$,
$$\mathbb{E}(U_i^a) \le 2^a\mathbb{E}\big(Z_i^a\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big) \le 2^a\gamma_n^{a-2}\mathbb{E}(Z_i^2) \le 2^a\gamma_n^{a-2}\frac{\theta^2}{\sigma_i^2}.$$
Hence, using $w_n \ge e$,
$$A \le C\frac{1}{w_n^4}\max\left(\gamma_n^2\sum_{i=1}^{n} \frac{1}{\sigma_i^2}, \left(\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\right)^2\right) = C\frac{1}{w_n^4}\max\left(\frac{w_n^2}{\ln w_n}, w_n^2\right) = C\frac{1}{w_n^2}. \qquad (6.6)$$
Upper bound for $B$. Using again (6.3) (with $\psi$ instead of $\phi$), for any $i \in \{1, \ldots, n\}$, we obtain
$$\mathbb{E}\big(|Z_i|\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big) \le \frac{\mathbb{E}(Z_i^2)}{\gamma_n} \le \frac{\theta}{\sigma_i^2}\sqrt{\frac{\ln w_n}{w_n}}.$$
Therefore
$$B \le C\frac{(\ln w_n)^2}{w_n^2}\left(\frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\right)^4 = C\frac{(\ln w_n)^2}{w_n^2}. \qquad (6.7)$$
Combining (6.5), (6.6) and (6.7), we have
$$\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^4\Big) \le C\left(\frac{1}{w_n^2} + \frac{(\ln w_n)^2}{w_n^2}\right) \le C\frac{(\ln w_n)^2}{w_n^2}.$$
Proposition 6.3. For any integer $j \ge \tau$ and any $k \in \{0, \ldots, 2^{j}-1\}$, let $\beta_{j,k}$ be the wavelet coefficient (3.1) of $f$ in (1.1) and $\hat{\beta}_{j,k}$ be (4.2). Then, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$,
$$\mathbb{P}\big(|\hat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2\big) \le 2w_n^{-2}.$$

Proof of Proposition 6.3. For any $i \in \{1, \ldots, n\}$, let us set
$$Z_i = \frac{1}{\sigma_i^2}(Y_i - \mu_i)\psi_{j,k}(X_i).$$
Using (6.4), we have
$$|\hat{\beta}_{j,k} - \beta_{j,k}| = \left|\frac{1}{w_n}\sum_{i=1}^{n} \big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big)\big) - \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big)\right| \le \left|\frac{1}{w_n}\sum_{i=1}^{n} \big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big)\big)\right| + \frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(|Z_i|\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big).$$
Using (6.3) (with $\psi$ instead of $\phi$), we obtain
$$\frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}\big(|Z_i|\mathbf{1}_{\{|Z_i| > \gamma_n\}}\big) \le \frac{1}{\gamma_n}\left(\frac{1}{w_n}\sum_{i=1}^{n} \mathbb{E}(Z_i^2)\right) \le \frac{\theta^2}{\gamma_n}\left(\frac{1}{w_n}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\right) = \frac{\theta^2}{\gamma_n} = \theta\sqrt{\frac{\ln w_n}{w_n}} = \lambda_n.$$
Hence
$$S = \mathbb{P}\big(|\hat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2\big) \le \mathbb{P}\left(\left|\frac{1}{w_n}\sum_{i=1}^{n} \big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big)\big)\right| \ge (\kappa/2 - 1)\lambda_n\right).$$
Now we need the Bernstein inequality presented in the lemma below.

Lemma 6.2 (Bernstein's inequality [23]). Let $n \in \mathbb{N}^*$ and $(U_i)_{i \in \{1,\ldots,n\}}$ be $n$ zero-mean independent random variables such that there exists a constant $M > 0$ satisfying $\sup_{i \in \{1,\ldots,n\}} |U_i| \le M < \infty$. Then, for any $\lambda > 0$, we have
$$\mathbb{P}\left(\left|\sum_{i=1}^{n} U_i\right| \ge \lambda\right) \le 2\exp\left(-\frac{\lambda^2}{2\left(\sum_{i=1}^{n} \mathbb{E}(U_i^2) + \frac{\lambda M}{3}\right)}\right).$$

Let us set, for any $i \in \{1, \ldots, n\}$,
$$U_i = Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}} - \mathbb{E}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big).$$
For any $i \in \{1, \ldots, n\}$, we have $\mathbb{E}(U_i) = 0$,
$$|U_i| \le \big|Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big| + \mathbb{E}\big(|Z_i|\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big) \le 2\gamma_n$$
and, using again (6.3) (with $\psi$ instead of $\phi$),
$$\mathbb{E}(U_i^2) = \mathbb{V}\big(Z_i\mathbf{1}_{\{|Z_i| \le \gamma_n\}}\big) \le \mathbb{E}(Z_i^2) \le \frac{\theta^2}{\sigma_i^2}.$$
So
$$\sum_{i=1}^{n} \mathbb{E}(U_i^2) \le \theta^2\sum_{i=1}^{n} \frac{1}{\sigma_i^2} = \theta^2 w_n.$$
It follows from the Bernstein inequality that
$$S \le 2\exp\left(-\frac{w_n^2(\kappa/2 - 1)^2\lambda_n^2}{2\left(\theta^2 w_n + \frac{2w_n(\kappa/2 - 1)\lambda_n\gamma_n}{3}\right)}\right).$$
Since
$$\lambda_n\gamma_n = \theta\sqrt{\frac{\ln w_n}{w_n}}\;\theta\sqrt{\frac{w_n}{\ln w_n}} = \theta^2, \qquad \lambda_n^2 = \theta^2\frac{\ln w_n}{w_n},$$
we have, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$,
$$S \le 2\exp\left(-\frac{(\kappa/2 - 1)^2\ln w_n}{2\left(1 + \frac{2(\kappa/2 - 1)}{3}\right)}\right) = 2w_n^{-\frac{(\kappa/2-1)^2}{2\left(1 + \frac{2(\kappa/2-1)}{3}\right)}} \le 2w_n^{-2}.$$
6.2 Proofs of the main results
Proof of Theorem 5.1. We expand the function $f$ as
$$f(x) = \sum_{k=0}^{2^{j_0}-1} \alpha_{j_0,k}\phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}\psi_{j,k}(x).$$
We have
$$\hat{f}^L(x) - f(x) = \sum_{k=0}^{2^{j_0}-1} (\hat{\alpha}_{j_0,k} - \alpha_{j_0,k})\phi_{j_0,k}(x) - \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}\psi_{j,k}(x).$$
Hence
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^L(x) - f(x)\big)^2\,dx\right) = A + B,$$
where
$$A = \sum_{k=0}^{2^{j_0}-1} \mathbb{E}\big((\hat{\alpha}_{j_0,k} - \alpha_{j_0,k})^2\big), \qquad B = \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2.$$
Using Proposition 6.1, we obtain
$$A \le C2^{j_0}w_n^{-1} \le Cw_n^{-2s/(2s+1)}.$$
Since $p \ge 2$, we have $B_{p,r}^s(M) \subseteq B_{2,\infty}^s(M)$. Hence
$$B \le C2^{-2j_0 s} \le Cw_n^{-2s/(2s+1)}.$$
So
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^L(x) - f(x)\big)^2\,dx\right) \le Cw_n^{-2s/(2s+1)}.$$
The proof of Theorem 5.1 is complete.
Proof of Theorem 5.2. Following the methodology of [18], we expand the function $f$ as
$$f(x) = \sum_{k=0}^{2^{\tau}-1} \alpha_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}\psi_{j,k}(x).$$
We have
$$\hat{f}^H(x) - f(x) = \sum_{k=0}^{2^{\tau}-1} (\hat{\alpha}_{\tau,k} - \alpha_{\tau,k})\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \big(\hat{\beta}_{j,k}\mathbf{1}_{\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n\}} - \beta_{j,k}\big)\psi_{j,k}(x) - \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}\psi_{j,k}(x).$$
Hence
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^H(x) - f(x)\big)^2\,dx\right) = R + S + T, \qquad (6.8)$$
where
$$R = \sum_{k=0}^{2^{\tau}-1} \mathbb{E}\big((\hat{\alpha}_{\tau,k} - \alpha_{\tau,k})^2\big), \qquad S = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\big(\hat{\beta}_{j,k}\mathbf{1}_{\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n\}} - \beta_{j,k}\big)^2\Big)$$
and
$$T = \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2.$$
Let us bound $R$, $T$ and $S$, in turn.

Using Proposition 6.1, $w_n \ge e$, $\ln w_n < w_n$ and $2s/(2s+1) < 1$, we have
$$R \le C2^{\tau}w_n^{-1} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.9)$$
For $r \ge 1$ and $p \ge 2$, we have $B_{p,r}^s(M) \subseteq B_{2,\infty}^s(M)$. So, using $w_n \ge e$, $\ln w_n < w_n$ and $2s/(2s+1) < 2s$,
$$T \le C\sum_{j=j_1+1}^{\infty} 2^{-2js} \le C2^{-2j_1 s} \le Cw_n^{-2s} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
For $r \ge 1$ and $p \in [1,2)$, we have $B_{p,r}^s(M) \subseteq B_{2,\infty}^{s+1/2-1/p}(M)$. Since $s > 1/p$, we have $s + 1/2 - 1/p > s/(2s+1)$. So, using again $w_n \ge e$ and $\ln w_n < w_n$,
$$T \le C\sum_{j=j_1+1}^{\infty} 2^{-2j(s+1/2-1/p)} \le C2^{-2j_1(s+1/2-1/p)} \le Cw_n^{-2(s+1/2-1/p)} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
Hence, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$, we have
$$T \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.10)$$
The term $S$ can be decomposed as
$$S = S_1 + S_2 + S_3 + S_4, \qquad (6.11)$$
where
$$S_1 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^2\mathbf{1}_{\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n\}}\mathbf{1}_{\{|\beta_{j,k}| < \kappa\lambda_n/2\}}\Big),$$
$$S_2 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^2\mathbf{1}_{\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n\}}\mathbf{1}_{\{|\beta_{j,k}| \ge \kappa\lambda_n/2\}}\Big),$$
$$S_3 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\beta_{j,k}^2\mathbf{1}_{\{|\hat{\beta}_{j,k}| < \kappa\lambda_n\}}\mathbf{1}_{\{|\beta_{j,k}| \ge 2\kappa\lambda_n\}}\Big)$$
and
$$S_4 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\beta_{j,k}^2\mathbf{1}_{\{|\hat{\beta}_{j,k}| < \kappa\lambda_n\}}\mathbf{1}_{\{|\beta_{j,k}| < 2\kappa\lambda_n\}}\Big).$$
Let us analyze each term $S_1$, $S_2$, $S_3$ and $S_4$, in turn.

Upper bounds for $S_1$ and $S_3$. We have
$$\big\{|\hat{\beta}_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n\big\} \subseteq \big\{|\hat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\big\},$$
$$\big\{|\hat{\beta}_{j,k}| \ge \kappa\lambda_n,\ |\beta_{j,k}| < \kappa\lambda_n/2\big\} \subseteq \big\{|\hat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\big\}$$
and
$$\big\{|\hat{\beta}_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n\big\} \subseteq \big\{|\beta_{j,k}| \le 2|\hat{\beta}_{j,k} - \beta_{j,k}|\big\}.$$
So
$$\max(S_1, S_3) \le C\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^2\mathbf{1}_{\{|\hat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\}}\Big).$$
It follows from the Cauchy-Schwarz inequality and Propositions 6.2 and 6.3 that
$$\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^2\mathbf{1}_{\{|\hat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\}}\Big) \le \Big(\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^4\Big)\Big)^{1/2}\Big(\mathbb{P}\big(|\hat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\big)\Big)^{1/2} \le C\frac{\ln w_n}{w_n^2}.$$
Hence, since $\ln w_n < w_n$,
$$\max(S_1, S_3) \le C\frac{\ln w_n}{w_n^2}\sum_{j=\tau}^{j_1} 2^{j} \le C\frac{\ln w_n}{w_n^2}2^{j_1} \le C\frac{\ln w_n}{w_n} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.12)$$
Upper bound for $S_2$. Using Proposition 6.2 and the Cauchy-Schwarz inequality, we obtain
$$\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^2\Big) \le \Big(\mathbb{E}\Big(\big(\hat{\beta}_{j,k} - \beta_{j,k}\big)^4\Big)\Big)^{1/2} \le C\frac{\ln w_n}{w_n}.$$
Hence
$$S_2 \le C\frac{\ln w_n}{w_n}\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbf{1}_{\{|\beta_{j,k}| > \kappa\lambda_n/2\}}.$$
Let $j_2$ be the integer defined by
$$2^{-1}\left(\frac{w_n}{\ln w_n}\right)^{1/(2s+1)} < 2^{j_2} \le \left(\frac{w_n}{\ln w_n}\right)^{1/(2s+1)}. \qquad (6.13)$$
We have
$$S_2 \le S_{2,1} + S_{2,2},$$
where
$$S_{2,1} = C\frac{\ln w_n}{w_n}\sum_{j=\tau}^{j_2}\sum_{k=0}^{2^{j}-1} \mathbf{1}_{\{|\beta_{j,k}| > \kappa\lambda_n/2\}} \qquad \text{and} \qquad S_{2,2} = C\frac{\ln w_n}{w_n}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1} \mathbf{1}_{\{|\beta_{j,k}| > \kappa\lambda_n/2\}}.$$
We have
$$S_{2,1} \le C\frac{\ln w_n}{w_n}\sum_{j=\tau}^{j_2} 2^{j} \le C\frac{\ln w_n}{w_n}2^{j_2} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
For $r \ge 1$ and $p \ge 2$, since $B_{p,r}^s(M) \subseteq B_{2,\infty}^s(M)$,
$$S_{2,2} \le C\frac{\ln w_n}{w_n\lambda_n^2}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2 \le C\sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2 \le C2^{-2j_2 s} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
For $r \ge 1$, $p \in [1,2)$ and $s > 1/p$, since $B_{p,r}^s(M) \subseteq B_{2,\infty}^{s+1/2-1/p}(M)$ and $(2s+1)(2-p)/2 + (s+1/2-1/p)p = 2s$, we have
$$S_{2,2} \le C\frac{\ln w_n}{w_n\lambda_n^p}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^p \le C\left(\frac{\ln w_n}{w_n}\right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty} 2^{-j(s+1/2-1/p)p} \le C\left(\frac{\ln w_n}{w_n}\right)^{(2-p)/2}2^{-j_2(s+1/2-1/p)p} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
So, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$S_2 \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.14)$$
Upper bound for $S_4$. We have
$$S_4 \le \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2\mathbf{1}_{\{|\beta_{j,k}| < 2\kappa\lambda_n\}}.$$
Let $j_2$ be the integer (6.13). We have
$$S_4 \le S_{4,1} + S_{4,2},$$
where
$$S_{4,1} = \sum_{j=\tau}^{j_2}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2\mathbf{1}_{\{|\beta_{j,k}| < 2\kappa\lambda_n\}} \qquad \text{and} \qquad S_{4,2} = \sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2\mathbf{1}_{\{|\beta_{j,k}| < 2\kappa\lambda_n\}}.$$
We have
$$S_{4,1} \le C\lambda_n^2\sum_{j=\tau}^{j_2} 2^{j} \le C\frac{\ln w_n}{w_n}2^{j_2} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
For $r \ge 1$ and $p \ge 2$, since $B_{p,r}^s(M) \subseteq B_{2,\infty}^s(M)$, we have
$$S_{4,2} \le \sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^{j}-1} \beta_{j,k}^2 \le C\sum_{j=j_2+1}^{\infty} 2^{-2js} \le C2^{-2j_2 s} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
For $r \ge 1$, $p \in [1,2)$ and $s > 1/p$, since $B_{p,r}^s(M) \subseteq B_{2,\infty}^{s+1/2-1/p}(M)$ and $(2-p)(2s+1)/2 + (s+1/2-1/p)p = 2s$, we have
$$S_{4,2} \le C\lambda_n^{2-p}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1} |\beta_{j,k}|^p \le C\left(\frac{\ln w_n}{w_n}\right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty} 2^{-j(s+1/2-1/p)p} \le C\left(\frac{\ln w_n}{w_n}\right)^{(2-p)/2}2^{-j_2(s+1/2-1/p)p} \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
So, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$S_4 \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.15)$$
It follows from (6.11), (6.12), (6.14) and (6.15) that
$$S \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}. \qquad (6.16)$$
Combining (6.8), (6.9), (6.10) and (6.16), we have, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$\mathbb{E}\left(\int_0^1 \big(\hat{f}^H(x) - f(x)\big)^2\,dx\right) \le C\left(\frac{\ln w_n}{w_n}\right)^{2s/(2s+1)}.$$
The proof of Theorem 5.2 is complete.
Proof of Theorem 5.3. We need the following result.

Theorem 6.1 (Tsybakov [25]). Let $m \in \mathbb{N} - \{0, 1\}$. We consider a general statistical model. We denote by $\mathbb{P}_f$ the distribution of the observations indexed by the unknown function $f$. Suppose that there exists a set of functions $\mathcal{E} \subseteq L^2([0,1])$ containing $h_0, \ldots, h_m$ such that:

Assumption (C1): there exists a real number $\upsilon > 0$ satisfying, for any $(i, l) \in \{0, \ldots, m\}^2$ with $i \ne l$,
$$\left(\int_0^1 (h_i(x) - h_l(x))^2\,dx\right)^{1/2} \ge 2\upsilon.$$

Assumption (C2): there exists a constant $p_0 \in ]0, 8^{-1}[$ such that
$$\inf_{l \in \{0,\ldots,m\}} m^{-1}\sum_{\substack{i \in \{0,\ldots,m\} \\ i \ne l}} \mathbb{E}_{h_i}\big(\log(\Lambda(\mathbb{P}_{h_i}, \mathbb{P}_{h_l}))\big) \le p_0\log m,$$
where $\Lambda(\mathbb{P}_{h_i}, \mathbb{P}_{h_l}) = d\mathbb{P}_{h_i}/d\mathbb{P}_{h_l}$ is the likelihood ratio between the distributions indexed by $h_i$ and $h_l$.

Then there exists a constant $c > 0$ such that
$$\inf_{\tilde{f}}\sup_{f \in \mathcal{E}} \mathbb{E}\left(\int_0^1 \big(\tilde{f}(x) - f(x)\big)^2\,dx\right) \ge c\upsilon^2,$$
where the infimum is taken over all possible estimators of $f$.

Our aim is to apply Theorem 6.1 with the model (1.1) in the Gaussian case and the set of functions $\mathcal{E} = B_{p,r}^s(M)$. Let $j_0$ be an integer such that
$$c_0 w_n^{1/(2s+1)} \le 2^{j_0} \le C_0 w_n^{1/(2s+1)},$$
where $c_0$ and $C_0$ are two constants which will be chosen a posteriori. For any sequence $\varepsilon = (\varepsilon_k)_{k \in \{0,\ldots,2^{j_0}-1\}} \in \{0,1\}^{2^{j_0}}$, we define $h_\varepsilon : [0,1] \to \mathbb{R}$ by
$$h_\varepsilon(x) = M_*2^{-j_0(s+1/2)}\sum_{k=0}^{2^{j_0}-1} \varepsilon_k\psi_{j_0,k}(x),$$
where $M_* > 0$ denotes a constant which will be chosen a posteriori. The wavelet coefficients of $h_\varepsilon$ are
$$\beta_{j,k} = \int_0^1 h_\varepsilon(x)\psi_{j,k}(x)\,dx = \begin{cases} M_*2^{-j_0(s+1/2)}\varepsilon_k & \text{if } j = j_0, \\ 0 & \text{otherwise.} \end{cases}$$
Therefore $2^{j_0(s+1/2)}\big(2^{-j_0}\sum_{k=0}^{2^{j_0}-1} |\beta_{j_0,k}|^p\big)^{1/p} \le M_*$. So, with a suitable choice of $M_*$, we have $h_\varepsilon \in B_{p,r}^s(M)$.

Let us now investigate Assumptions (C1) and (C2) of Theorem 6.1.

Assumption (C1). The Varshamov-Gilbert theorem (see [25, Lemma 2.7]) asserts that there exist a set $E_{j_0} = \{\varepsilon^{(0)}, \ldots, \varepsilon^{(T_{j_0})}\}$ and two constants, $c \in ]0,1[$ and $\alpha \in ]0,1[$, such that, for any $u \in \{0, \ldots, T_{j_0}\}$, $\varepsilon^{(u)} = (\varepsilon_k^{(u)})_{k \in \{0,\ldots,2^{j_0}-1\}} \in \{0,1\}^{2^{j_0}}$ and, for any $(u, v) \in \{0, \ldots, T_{j_0}\}^2$ with $u < v$, the following hold:
$$\sum_{k=0}^{2^{j_0}-1} |\varepsilon_k^{(u)} - \varepsilon_k^{(v)}| \ge c2^{j_0}, \qquad T_{j_0} \ge \exp(\alpha 2^{j_0}).$$
Considering such an $E_{j_0}$, for any $(u, v) \in \{0, \ldots, T_{j_0}\}^2$ with $u \ne v$, we have
$$\left(\int_0^1 (h_{\varepsilon^{(u)}}(x) - h_{\varepsilon^{(v)}}(x))^2\,dx\right)^{1/2} \ge c2^{j_0/2}2^{-j_0(s+1/2)}\left(2^{-j_0}\sum_{k=0}^{2^{j_0}-1} |\varepsilon_k^{(u)} - \varepsilon_k^{(v)}|\right)^{1/2} \ge c2^{-j_0 s} = 2\upsilon.$$
The functions $(h_{\varepsilon^{(u)}})_{u \in \{0,\ldots,T_{j_0}\}}$ satisfy Assumption (C1) with $m = T_{j_0}$ and $\upsilon = C2^{-j_0 s}$.

Assumption (C2). Consider the following lemma.

Lemma 6.3. For any functions $h_v$ and $h_l$ in $L^2([0,1])$, we have
$$\mathbb{E}_{h_v}\big(\log(\Lambda(\mathbb{P}_{h_v}, \mathbb{P}_{h_l}))\big) = 2^{-1}w_n\int_0^1 (h_v(x) - h_l(x))^2\,dx.$$

Consider the functions $(h_{\varepsilon^{(u)}})_{u \in \{0,\ldots,T_{j_0}\}}$ introduced in Assumption (C1). For any $(u, v) \in \{0, \ldots, T_{j_0}\}^2$ with $u \ne v$ and any $x \in [0,1]$,
$$|h_{\varepsilon^{(u)}}(x) - h_{\varepsilon^{(v)}}(x)| \le C2^{-j_0(s+1/2)}\sum_{k=0}^{2^{j_0}-1} |\psi_{j_0,k}(x)| \le C2^{-j_0(s+1/2)}2^{j_0/2} = C2^{-j_0 s}.$$
Applying Lemma 6.3 and taking $c_0$ large enough, we obtain
$$\mathbb{E}_{h_{\varepsilon^{(u)}}}\big(\log(\Lambda(\mathbb{P}_{h_{\varepsilon^{(u)}}}, \mathbb{P}_{h_{\varepsilon^{(v)}}}))\big) = 2^{-1}w_n\int_0^1 (h_{\varepsilon^{(u)}}(x) - h_{\varepsilon^{(v)}}(x))^2\,dx \le Cw_n2^{-2j_0 s} \le Cc_0^{-(2s+1)}2^{j_0} \le p_0\log T_{j_0},$$
with $p_0 \in ]0, 8^{-1}[$. Assumption (C2) is satisfied.

It follows from Theorem 6.1 applied with $(h_{\varepsilon^{(u)}})_{u \in \{0,\ldots,T_{j_0}\}}$, $m = T_{j_0}$ and $\upsilon = C2^{-j_0 s}$ that there exists a constant $c > 0$ satisfying
$$\inf_{\tilde{f}}\sup_{f \in B_{p,r}^s(M)} \mathbb{E}\left(\int_0^1 \big(\tilde{f}(x) - f(x)\big)^2\,dx\right) \ge c2^{-2j_0 s} \ge cw_n^{-2s/(2s+1)}.$$
The proof of Theorem 5.3 is complete.
Proof of Lemma 6.3. We have
$$\Lambda(\mathbb{P}_{h_v}, \mathbb{P}_{h_l}) = \frac{d\mathbb{P}_{h_v}}{d\mathbb{P}_{h_l}} = \exp\left(-2^{-1}\left(\sum_{i=1}^{n} \frac{1}{\sigma_i^2}(Y_i - \mu_i - h_v(X_i))^2 - \sum_{i=1}^{n} \frac{1}{\sigma_i^2}(Y_i - \mu_i - h_l(X_i))^2\right)\right) = \exp\left(-2^{-1}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}(h_l(X_i) - h_v(X_i))(2Y_i - 2\mu_i - h_v(X_i) - h_l(X_i))\right).$$
So, under $\mathbb{P}_{h_v}$ (where $Y_i - \mu_i - h_v(X_i) = \xi_i - \mu_i$), we have
$$\Lambda(\mathbb{P}_{h_v}, \mathbb{P}_{h_l}) = \exp\left(2^{-1}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}(h_v(X_i) - h_l(X_i))^2 + \sum_{i=1}^{n} \frac{1}{\sigma_i^2}(h_v(X_i) - h_l(X_i))(\xi_i - \mu_i)\right).$$
Since $X_i$ and $\xi_i$ are independent, $\mathbb{E}(\xi_i - \mu_i) = 0$ and $X_1, \ldots, X_n$ are uniform on $[0,1]$, taking expectations yields
$$\mathbb{E}_{h_v}\big(\log(\Lambda(\mathbb{P}_{h_v}, \mathbb{P}_{h_l}))\big) = 2^{-1}\sum_{i=1}^{n} \frac{1}{\sigma_i^2}\int_0^1 (h_v(x) - h_l(x))^2\,dx = 2^{-1}w_n\int_0^1 (h_v(x) - h_l(x))^2\,dx.$$
The proof of Lemma 6.3 is complete.