

HAL Id: hal-00497343

https://hal.archives-ouvertes.fr/hal-00497343

Preprint submitted on 4 Jul 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Adaptive wavelet estimation of a function in an indirect regression model

Christophe Chesneau

To cite this version:

Christophe Chesneau. Adaptive wavelet estimation of a function in an indirect regression model. 2010.

⟨hal-00497343⟩


Adaptive wavelet estimation of a function in an indirect regression model

Christophe Chesneau


Abstract We consider a nonparametric regression model where $m$ noise-perturbed functions $f_1,\dots,f_m$ are randomly observed. For a fixed $\nu \in \{1,\dots,m\}$, we want to estimate $f_\nu$ from the observations. To reach this goal, we develop an adaptive wavelet estimator based on a hard thresholding rule. Adopting the minimax approach under the mean integrated squared error over Besov balls, we prove that it attains a sharp rate of convergence.

Keywords indirect nonparametric regression · minimax estimation · Besov balls · wavelets · hard thresholding.

2000 Mathematics Subject Classification: 62G07, 62G20.

1 Motivations

An indirect nonparametric regression model is considered: we observe $n$ independent pairs of random variables $(X_1,Y_1),\dots,(X_n,Y_n)$ where, for any $i \in \{1,\dots,n\}$,
$$Y_i = f_{V_i}(X_i) + \xi_i, \qquad (1)$$
$V_1,\dots,V_n$ are $n$ unobserved independent discrete random variables such that, for any $i\in\{1,\dots,n\}$, the set of possible values of $V_i$ is
$$V_i(\Omega) = \{1,\dots,m\}, \qquad m \in \mathbb{N},$$
for any $d \in \{1,\dots,m\}$, $f_d : [0,1] \to \mathbb{R}$ is an unknown function, $X_1,\dots,X_n$ are $n$ i.i.d. random variables having the uniform distribution on $[0,1]$, and $\xi_1,\dots,\xi_n$ are $n$ i.i.d. unobserved random variables such that
$$\mathbb{E}(\xi_1) = 0, \qquad \mathbb{E}(\xi_1^2) < \infty.$$

Laboratoire de Mathématiques Nicolas Oresme, Université de Caen Basse-Normandie, Campus II, Science 3, 14032 Caen, France. E-mail: chesneau@math.unicaen.fr

(The distribution of $\xi_1$ can be unknown.) We suppose that $V_1,\dots,V_n$, $X_1,\dots,X_n$, $\xi_1,\dots,\xi_n$ are independent. For a fixed $\nu\in\{1,\dots,m\}$, we want to estimate $f_\nu$ from $(X_1,Y_1),\dots,(X_n,Y_n)$. An application of this estimation problem is the following: $m$ noise-perturbed signals $f_1,\dots,f_m$ are randomly observed and only $f_\nu$ is of interest.
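For concreteness, data from model (1) can be simulated as follows. The signals, the mixing weights $w_d(i)$ and the noise level below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 2

# Illustrative assumptions (not from the paper): two signals on [0, 1]
# and mixing weights w_d(i) = P(V_i = d), here constant in i.
f = [lambda x: np.sin(2 * np.pi * x), lambda x: np.cos(2 * np.pi * x)]
w = np.tile([0.7, 0.3], (n, 1))

X = rng.uniform(0.0, 1.0, n)                             # uniform design on [0, 1]
V = np.array([rng.choice(m, p=w[i]) for i in range(n)])  # unobserved labels V_i
xi = rng.normal(0.0, 0.1, n)                             # centered noise, finite variance
Y = np.array([f[v](x) for v, x in zip(V, X)]) + xi       # observations from model (1)
```

Only $(X_i, Y_i)$ would be available to the statistician; the labels `V` are generated but never used by the estimators.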

To estimate $f_\nu$, various methods can be investigated (kernel, spline, ...); see e.g. Prakasa Rao (1983, 1999) and Tsybakov (2004). In this study, we focus our attention on wavelet methods. They are attractive for nonparametric function estimation because of their spatial adaptivity, computational efficiency and asymptotic optimality properties. They can achieve near optimal convergence rates over a wide range of function classes (Besov balls, ...) and enjoy excellent mean integrated squared error (MISE) properties when used to estimate spatially inhomogeneous functions. Details on the basics of wavelet methods in function estimation can be found in Antoniadis (1997) and Härdle et al. (1998).

When (1) is considered with $V_1 = \dots = V_n = 1$, it becomes the classical nonparametric regression model. In this case, to estimate $f_1 = f$, numerous wavelet methods have been developed. See e.g. Donoho and Johnstone (1994, 1995, 1998), Donoho et al. (1995), Delyon and Juditsky (1996), Antoniadis et al. (1999), Cai and Brown (1999), Zhang and Zheng (1999), Cai (1999, 2002), Pensky and Vidakovic (2001), Chicken (2003), Kerkyacharian and Picard (2004), Chesneau (2007) and Pham Ngoc (2009). However, to the best of our knowledge, there is no adaptive wavelet estimator for $f_\nu$ in the general case.

In this paper, we develop an adaptive wavelet estimator for $f_\nu$ using the hard thresholding rule. It has the originality to combine an "observations thresholding technique" introduced by Delyon and Juditsky (1996) with some technical tools taking into account the distributions of $V_1,\dots,V_n$. We evaluate its performance via the minimax approach under the MISE over Besov balls $B^s_{p,r}(M)$ (to be defined in Section 3). Under mild assumptions on the distributions of $V_1,\dots,V_n$, we prove that our estimator attains the rate of convergence
$$v_n = \left( \frac{z_n \ln(n/z_n)}{n} \right)^{2s/(2s+1)},$$
where $z_n$ depends on the distributions of $V_1,\dots,V_n$ (see (5)). This rate is "near optimal" in the sense that it is the one attained by the best nonadaptive linear wavelet estimator (the one which minimizes the MISE), up to a logarithmic term.

The paper is organized as follows. Assumptions on the model and some notations are introduced in Section 2. Section 3 briefly describes the wavelet basis on $[0,1]$ and the Besov balls. The estimators are presented in Section 4. The results are set in Section 5. Section 6 is devoted to the proofs.


2 Assumptions

Additional assumptions on the model (1) are presented below.

Assumption on $(f_d)_{d\in\{1,\dots,m\}}$. We suppose that there exists a known constant $C_* > 0$ such that
$$\sup_{d\in\{1,\dots,m\}} \ \sup_{x\in[0,1]} |f_d(x)| \le C_*. \qquad (2)$$

Assumptions on $V_1,\dots,V_n$. Recall that $V_1,\dots,V_n$ are unobserved and, for any $i\in\{1,\dots,n\}$, we know
$$w_d(i) = \mathbb{P}(V_i = d), \qquad d \in \{1,\dots,m\}.$$

We suppose that the matrix
$$\Gamma_n = \left( \frac{1}{n} \sum_{i=1}^n w_k(i) w_\ell(i) \right)_{(k,\ell) \in \{1,\dots,m\}^2}$$
is nonsingular, i.e. $\det(\Gamma_n) > 0$. For the considered $\nu$ (the one which refers to the estimation of $f_\nu$) and any $i\in\{1,\dots,n\}$, we set
$$a_\nu(i) = \frac{1}{\det(\Gamma_n)} \sum_{k=1}^m (-1)^{k+\nu} \gamma^n_{\nu,k} w_k(i), \qquad (3)$$
where $\gamma^n_{\nu,k}$ denotes the determinant of the minor $(\nu,k)$ of the matrix $\Gamma_n$. Then, for any $d\in\{1,\dots,m\}$,
$$\frac{1}{n} \sum_{i=1}^n a_\nu(i) w_d(i) = \begin{cases} 1 & \text{if } d = \nu, \\ 0 & \text{otherwise,} \end{cases} \qquad (4)$$
and $(a_\nu(1),\dots,a_\nu(n))$ minimizes $\frac{1}{n}\sum_{i=1}^n b_i^2$ over the vectors $(b_1,\dots,b_n)\in\mathbb{R}^n$ satisfying (4). Technical details can be found in Maiboroda (1996).

We set
$$z_n = \frac{1}{n} \sum_{i=1}^n a_\nu^2(i) \qquad (5)$$
and we suppose that $z_n < n/e$.

In nonparametric statistics, the sequence $(a_\nu(i))_{i\in\{1,\dots,n\}}$ has already been used in some mixture density estimation problems. See e.g. Maiboroda (1996), Pokhyl'ko (2005) and Prakasa Rao (2010).
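The quantities (3)-(5) are directly computable from the known weights $w_d(i)$: since $\Gamma_n$ is symmetric, the cofactor sum in (3) is row $\nu$ of $\Gamma_n^{-1}$ applied to $(w_1(i),\dots,w_m(i))$. A numerical sketch (the weights are illustrative, and indices are 0-based):

```python
import numpy as np

n, m, nu = 500, 3, 0  # nu indexes the component of interest (0-based here)

# Illustrative weights w_d(i) = P(V_i = d), normalized per observation.
rng = np.random.default_rng(1)
w = rng.uniform(0.1, 1.0, (n, m))
w /= w.sum(axis=1, keepdims=True)

# Gamma_n = (1/n) w^T w, assumed nonsingular.
Gamma = w.T @ w / n

# a_nu(i) in (3): row nu of Gamma_n^{-1} applied to the weight vector of i.
a = w @ np.linalg.inv(Gamma)[nu]

# Property (4): (1/n) sum_i a_nu(i) w_d(i) = 1 if d == nu, 0 otherwise.
print(np.round(a @ w / n, 8))  # ≈ the nu-th standard basis vector

# z_n in (5).
z_n = np.mean(a**2)
```

The printed vector being (numerically) a standard basis vector is exactly the identity $(\Gamma_n^{-1}\Gamma_n)_{\nu,d} = \delta_{\nu,d}$.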


3 Wavelets and Besov balls

Wavelet basis. Let $N \in \mathbb{N}^*$, $\phi$ be a father wavelet of a multiresolution analysis on $\mathbb{R}$ and $\psi$ be the associated mother wavelet. Assume that

– $\mathrm{supp}(\phi) = \mathrm{supp}(\psi) = [1-N, N]$,
– $\int_{1-N}^{N} \phi(x)\,dx = 1$,
– for any $v \in \{0,\dots,N-1\}$, $\int_{1-N}^{N} x^v \psi(x)\,dx = 0$.

For instance, the Daubechies wavelets satisfy these assumptions. Set
$$\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k).$$
Then there exists an integer $\tau$ satisfying $2^\tau \ge 2N$ such that the collection
$$\mathcal{B} = \{\phi_{\tau,k}(\cdot),\ k\in\{0,\dots,2^\tau - 1\};\ \psi_{j,k}(\cdot),\ j\in\mathbb{N}-\{0,\dots,\tau-1\},\ k\in\{0,\dots,2^j-1\}\}$$
(with an appropriate treatment at the boundaries) is an orthonormal basis of $\mathbb{L}^2([0,1])$, the set of square-integrable functions on $[0,1]$. We refer to Cohen et al. (1993).

For any integer $\ell \ge \tau$, any $h \in \mathbb{L}^2([0,1])$ can be expanded on $\mathcal{B}$ as
$$h(x) = \sum_{k=0}^{2^\ell - 1} \alpha_{\ell,k}\phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty} \sum_{k=0}^{2^j-1} \beta_{j,k}\psi_{j,k}(x),$$
where $\alpha_{j,k}$ and $\beta_{j,k}$ are the wavelet coefficients of $h$ defined by
$$\alpha_{j,k} = \int_0^1 h(x)\phi_{j,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 h(x)\psi_{j,k}(x)\,dx. \qquad (6)$$

Besov balls. Let $M > 0$, $s > 0$, $p \ge 1$ and $r \ge 1$. A function $h$ belongs to $B^s_{p,r}(M)$ if and only if there exists a constant $M^* > 0$ (depending on $M$) such that the associated wavelet coefficients (6) satisfy
$$\left( \sum_{j=\tau-1}^{\infty} \left( 2^{j(s+1/2-1/p)} \left( \sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \right)^{1/p} \right)^r \right)^{1/r} \le M^*.$$
(We set $\beta_{\tau-1,k} = \alpha_{\tau,k}$.) In this expression, $s$ is a smoothness parameter and $p$ and $r$ are norm parameters. For particular choices of $s$, $p$ and $r$, the balls $B^s_{p,r}(M)$ contain the Hölder and Sobolev balls. See Meyer (1990).
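For intuition, the coefficients (6) can be evaluated numerically. A sketch with the Haar wavelet, for which $N = 1$; the test function $h(x) = x$ and the midpoint quadrature grid are illustrative choices:

```python
import numpy as np

def phi_jk(j, k, x):
    # Haar father wavelet: phi_{j,k}(x) = 2^{j/2} * 1_{[k 2^{-j}, (k+1) 2^{-j})}(x).
    return 2**(j / 2) * ((k * 2.0**-j <= x) & (x < (k + 1) * 2.0**-j))

def psi_jk(j, k, x):
    # Haar mother wavelet: +1 on the left half of the support, -1 on the right.
    h = 2.0**-(j + 1)
    left = (k * 2.0**-j <= x) & (x < k * 2.0**-j + h)
    right = (k * 2.0**-j + h <= x) & (x < (k + 1) * 2.0**-j)
    return 2**(j / 2) * (left.astype(float) - right.astype(float))

# Coefficients (6) by midpoint quadrature for h(x) = x.
x = (np.arange(100000) + 0.5) / 100000
h = x
alpha_00 = np.mean(h * phi_jk(0, 0, x))  # ≈ ∫_0^1 x dx = 1/2
beta_00 = np.mean(h * psi_jk(0, 0, x))   # ≈ ∫_0^{1/2} x dx - ∫_{1/2}^1 x dx = -1/4
```

The quadrature grid is aligned so that the Haar discontinuities fall on cell boundaries, which keeps the midpoint rule accurate here.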

4 Estimators

Wavelet coefficient estimators. The first step to estimate $f_\nu$ consists in expanding $f_\nu$ on $\mathcal{B}$ and estimating its unknown wavelet coefficients.

For any integer $j \ge \tau$ and any $k \in \{0,\dots,2^j-1\}$,


– we estimate $\alpha_{j,k} = \int_0^1 f_\nu(x)\phi_{j,k}(x)\,dx$ by
$$\widehat{\alpha}_{j,k} = \frac{1}{n} \sum_{i=1}^n a_\nu(i) Y_i \phi_{j,k}(X_i), \qquad (7)$$
– we estimate $\beta_{j,k} = \int_0^1 f_\nu(x)\psi_{j,k}(x)\,dx$ by
$$\widehat{\beta}_{j,k} = \frac{1}{n} \sum_{i=1}^n Z_i 1_{\{|Z_i| \le \gamma_n\}}, \qquad (8)$$
where, for any $i\in\{1,\dots,n\}$,
$$Z_i = a_\nu(i) Y_i \psi_{j,k}(X_i),$$
$a_\nu(i)$ is defined by (3), for any random event $A$, $1_A$ is the indicator function of $A$, the threshold $\gamma_n$ is defined by
$$\gamma_n = \theta\sqrt{\frac{n z_n}{\ln(n/z_n)}}, \qquad (9)$$
$z_n$ is defined by (5), $\theta = \sqrt{C_*^2 + \mathbb{E}(\xi_1^2)}$ and $C_*$ is the one in (2).

Remark 1. Mention that $\widehat{\alpha}_{j,k}$ is an unbiased estimator of $\alpha_{j,k}$, whereas $\widehat{\beta}_{j,k}$ is not an unbiased estimator of $\beta_{j,k}$. However, $(1/n)\sum_{i=1}^n Z_i$ is an unbiased estimator of $\beta_{j,k}$. The proofs are given in (14) and (19).

Remark 2. The "observations thresholding technique" used in (8) has been firstly introduced by Delyon and Juditsky (1996) for (1) in the classical case (i.e. $V_1 = \dots = V_n = 1$). In our general setting, this allows us to provide a good estimator of $\beta_{j,k}$ under mild assumptions on

– $(a_\nu(i))_{i\in\{1,\dots,n\}}$ and a fortiori the distributions of $V_1,\dots,V_n$ (only $z_n < n/e$ is required),
– $\xi_1,\dots,\xi_n$ (only finite moments of order 2 are required).
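Remark 1's unbiasedness claim for $\widehat{\alpha}_{j,k}$ can be checked by simulation. The sketch below uses the Haar father wavelet at level $j = 0$ (so $\phi_{0,0} \equiv 1$ on $[0,1]$) with illustrative signals and weights; it is a sanity check under these assumptions, not part of the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, nu = 400, 2, 0

# Illustrative setup (0-based component indices): two bounded signals and
# weights varying in i, so that Gamma_n is nonsingular.
f0 = lambda x: np.sin(2 * np.pi * x) + 1.0  # integral over [0, 1] equals 1
f1 = lambda x: 2.0 * x + 3.0                # integral over [0, 1] equals 4
p = np.linspace(0.2, 0.8, n)                # w_0(i); w_1(i) = 1 - p_i
w = np.column_stack([p, 1.0 - p])

Gamma = w.T @ w / n
a = w @ np.linalg.inv(Gamma)[nu]            # a_nu(i) as in (3)

def alpha_hat():
    # One draw of hat{alpha}_{0,0} from (7): with the Haar father wavelet,
    # phi_{0,0} = 1 on [0, 1], so the estimator is (1/n) sum_i a_nu(i) Y_i.
    X = rng.uniform(0.0, 1.0, n)
    V = (rng.uniform(size=n) < w[:, 1]).astype(int)  # unobserved labels
    Y = np.where(V == 0, f0(X), f1(X)) + rng.normal(0.0, 0.1, n)
    return np.mean(a * Y)

# Monte Carlo mean: should be close to alpha_{0,0} = int_0^1 f_0(x) dx = 1,
# even though many Y_i come from the other component (whose integral is 4).
est = np.mean([alpha_hat() for _ in range(2000)])
```

The weights $a_\nu(i)$ cancel the contribution of the unwanted component on average, which is exactly property (4) at work.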

Linear estimator. Assuming that $f_\nu \in B^s_{p,r}(M)$ with $p \ge 2$, we define the linear estimator $\widehat{f}_L$ by
$$\widehat{f}_L(x) = \sum_{k=0}^{2^{j_0}-1} \widehat{\alpha}_{j_0,k}\phi_{j_0,k}(x), \qquad (10)$$
where $\widehat{\alpha}_{j,k}$ is defined by (7) and $j_0$ is the integer satisfying
$$\frac{1}{2}\left(\frac{n}{z_n}\right)^{1/(2s+1)} < 2^{j_0} \le \left(\frac{n}{z_n}\right)^{1/(2s+1)}.$$
The definition of $j_0$ is chosen to minimize the MISE of $\widehat{f}_L$. Note that it is not adaptive since it depends on $s$, the smoothness parameter of $f_\nu$.


Hard thresholding estimator. We define the hard thresholding estimator $\widehat{f}_H$ by
$$\widehat{f}_H(x) = \sum_{k=0}^{2^\tau-1} \widehat{\alpha}_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1} \sum_{k=0}^{2^j-1} \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}| \ge \kappa\lambda_n\}} \psi_{j,k}(x), \qquad (11)$$
where $\widehat{\alpha}_{j,k}$ is defined by (7), $\widehat{\beta}_{j,k}$ by (8), $j_1$ is the integer satisfying
$$\frac{n}{2z_n} < 2^{j_1} \le \frac{n}{z_n},$$
$\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$ and $\lambda_n$ is the threshold
$$\lambda_n = \theta\sqrt{\frac{z_n \ln(n/z_n)}{n}}. \qquad (12)$$
Further details on the hard thresholding wavelet estimator for the standard nonparametric regression model can be found in Donoho and Johnstone (1994, 1995, 1998) and Delyon and Juditsky (1996).

Note that the choice of $\gamma_n$ in (9) depends on $\lambda_n$ in (12): we have $\lambda_n\gamma_n = \theta^2 z_n$. The definitions of $\gamma_n$ and $\lambda_n$ are based on theoretical considerations.
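The keep-or-kill rule in (11) acts componentwise on the estimated detail coefficients; a minimal sketch with illustrative coefficient values (the threshold argument plays the role of $\kappa\lambda_n$):

```python
import numpy as np

def hard_threshold(beta_hat, threshold):
    # Keep a coefficient untouched when |beta_hat| >= threshold,
    # set it to zero otherwise, as in the double sum of (11).
    beta_hat = np.asarray(beta_hat, dtype=float)
    return np.where(np.abs(beta_hat) >= threshold, beta_hat, 0.0)

coeffs = np.array([0.8, -0.05, 0.3, -0.9, 0.01])
kept = hard_threshold(coeffs, 0.25)  # small coefficients are killed
```

Unlike soft thresholding, surviving coefficients are not shrunk; they enter (11) unchanged.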

5 Results

Theorem 1 Consider (1) under the assumptions of Section 2. Suppose that $f_\nu \in B^s_{p,r}(M)$ with $s > 0$, $p \ge 2$ and $r \ge 1$. Let $\widehat{f}_L$ be (10). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_L(x) - f_\nu(x) \right)^2 dx \right) \le C\left( \frac{z_n}{n} \right)^{2s/(2s+1)}.$$

The proof of Theorem 1 uses moment inequalities on (7) and (8), and a suitable decomposition of the MISE.

Due to our weak assumptions on $V_1,\dots,V_n$, $\xi_1,\dots,\xi_n$, the optimal lower bound of (1) seems difficult to determine (see Tsybakov (2004)). However, since $\widehat{f}_L$ is constructed to be the linear estimator which optimizes the MISE, our benchmark will be the rate of convergence $v_n = (z_n/n)^{2s/(2s+1)}$.

Remark that, in the case $V_1 = \dots = V_n = 1$ and $\xi_1 \sim \mathcal{N}(0,1)$, we have $z_n = 1$ and $v_n$ ($= n^{-2s/(2s+1)}$) is the optimal rate of convergence (see Tsybakov (2004)).

Theorem 2 Consider (1) under the assumptions of Section 2. Let $\widehat{f}_H$ be (11). Suppose that $f_\nu \in B^s_{p,r}(M)$ with $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$. Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_H(x) - f_\nu(x) \right)^2 dx \right) \le C\left( \frac{z_n \ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$


The proof of Theorem 2 is based on several probability results (moment inequalities, a concentration inequality, ...) and a suitable decomposition of the MISE.

Theorem 2 proves that $\widehat{f}_H$ attains $v_n = (z_n/n)^{2s/(2s+1)}$ up to the logarithmic term $(\ln(n/z_n))^{2s/(2s+1)}$.

Naturally, when $V_1 = \dots = V_n = 1$ and $\xi_1 \sim \mathcal{N}(0,1)$, $\widehat{f}_H$ attains the same rate of convergence as the standard hard thresholding estimator adapted to the classical nonparametric regression model (see Donoho and Johnstone (1994, 1995, 1998)), and this rate is optimal in the minimax sense up to a logarithmic term.

Conclusions and perspectives. We construct an adaptive wavelet estimator to estimate the function $f_\nu$ from the sophisticated regression model (1). Under mild assumptions, we prove that it attains a sharp rate of convergence for a wide class of functions.

Possible perspectives are to

– investigate the estimation of $f_\nu$ in (1) when $X_1$ has a more complex distribution than the uniform one. In this case, the warped wavelet basis introduced in nonparametric regression estimation by Kerkyacharian and Picard (2004) seems to be an adapted powerful tool;
– consider the case where the distributions of $V_1,\dots,V_n$ are unknown;
– potentially improve the estimation of $f_\nu$ (and remove the extra logarithmic term). The thresholding rule named BlockJS developed in wavelet estimation by Cai (1999, 2002) seems to be a good alternative.

All these aspects need further investigations that we leave for a future work.

6 Proofs

In this section, we consider (1) under the assumptions of Section 2. Moreover, $C$ represents a positive constant which may differ from one term to another.

6.1 Auxiliary results

Proposition 1 For any integer $j \ge \tau$ and any $k \in \{0,\dots,2^j-1\}$, let $\alpha_{j,k}$ be the wavelet coefficient (6) of $f_\nu$ and $\widehat{\alpha}_{j,k}$ be (7). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \right) \le C\frac{z_n}{n}.$$

Proof of Proposition 1. First of all, we prove that $\widehat{\alpha}_{j,k}$ is an unbiased estimator of $\alpha_{j,k}$. For any $i\in\{1,\dots,n\}$, set
$$W_i = a_\nu(i) Y_i \phi_{j,k}(X_i).$$


Since $X_i$, $V_i$ and $\xi_i$ are independent, and $\mathbb{E}(\xi_i) = 0$, we have
$$\begin{aligned}
\mathbb{E}(W_i) &= \mathbb{E}(a_\nu(i) Y_i \phi_{j,k}(X_i)) = \mathbb{E}\big(a_\nu(i)(f_{V_i}(X_i) + \xi_i)\phi_{j,k}(X_i)\big) \\
&= a_\nu(i)\mathbb{E}(f_{V_i}(X_i)\phi_{j,k}(X_i)) + a_\nu(i)\mathbb{E}(\xi_i)\mathbb{E}(\phi_{j,k}(X_i)) \\
&= a_\nu(i)\mathbb{E}(f_{V_i}(X_i)\phi_{j,k}(X_i)) \\
&= a_\nu(i) \sum_{d=1}^m w_d(i) \int_0^1 f_d(x)\phi_{j,k}(x)\,dx. \qquad (13)
\end{aligned}$$
It follows from (13) and (4) that
$$\begin{aligned}
\mathbb{E}(\widehat{\alpha}_{j,k}) &= \frac{1}{n}\sum_{i=1}^n \mathbb{E}(W_i) = \frac{1}{n}\sum_{i=1}^n \left( a_\nu(i) \sum_{d=1}^m w_d(i) \int_0^1 f_d(x)\phi_{j,k}(x)\,dx \right) \\
&= \sum_{d=1}^m \int_0^1 f_d(x)\phi_{j,k}(x)\,dx \left( \frac{1}{n}\sum_{i=1}^n a_\nu(i) w_d(i) \right) \\
&= \int_0^1 f_\nu(x)\phi_{j,k}(x)\,dx = \alpha_{j,k}. \qquad (14)
\end{aligned}$$

So $\widehat{\alpha}_{j,k}$ is an unbiased estimator of $\alpha_{j,k}$. Therefore
$$\mathbb{E}\left( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \right) = \mathbb{V}(\widehat{\alpha}_{j,k}) = \mathbb{V}\left( \frac{1}{n}\sum_{i=1}^n W_i \right) = \frac{1}{n^2}\sum_{i=1}^n \mathbb{V}(W_i) \le \frac{1}{n^2}\sum_{i=1}^n \mathbb{E}(W_i^2). \qquad (15)$$
For any $i\in\{1,\dots,n\}$, we have
$$\mathbb{E}(W_i^2) = \mathbb{E}(a_\nu^2(i) Y_i^2 \phi_{j,k}^2(X_i)) = a_\nu^2(i)\mathbb{E}\left( (f_{V_i}(X_i) + \xi_i)^2 \phi_{j,k}^2(X_i) \right). \qquad (16)$$
Since $X_i$, $V_i$ and $\xi_i$ are independent, $\mathbb{E}(\phi_{j,k}^2(X_i)) = \int_0^1 \phi_{j,k}^2(x)\,dx = 1$ and, by (2), $\sup_{d\in\{1,\dots,m\}}\sup_{x\in[0,1]}|f_d(x)| \le C_*$, we have
$$\begin{aligned}
\mathbb{E}\left( (f_{V_i}(X_i) + \xi_i)^2 \phi_{j,k}^2(X_i) \right) &= \mathbb{E}\left( f_{V_i}^2(X_i)\phi_{j,k}^2(X_i) \right) + 2\mathbb{E}(\xi_i)\mathbb{E}(f_{V_i}(X_i)\phi_{j,k}^2(X_i)) + \mathbb{E}(\xi_i^2)\mathbb{E}(\phi_{j,k}^2(X_i)) \\
&= \mathbb{E}\left( f_{V_i}^2(X_i)\phi_{j,k}^2(X_i) \right) + \mathbb{E}(\xi_1^2) \le C_*^2\mathbb{E}(\phi_{j,k}^2(X_i)) + \mathbb{E}(\xi_1^2) = C_*^2 + \mathbb{E}(\xi_1^2) = \theta^2. \qquad (17)
\end{aligned}$$
Putting (16) and (17) together, we obtain
$$\mathbb{E}(W_i^2) \le \theta^2 a_\nu^2(i). \qquad (18)$$
It follows from (15) and (18) that
$$\mathbb{E}\left( (\widehat{\alpha}_{j,k} - \alpha_{j,k})^2 \right) \le \frac{1}{n}\theta^2\left( \frac{1}{n}\sum_{i=1}^n a_\nu^2(i) \right) = C\frac{z_n}{n}.$$


Proposition 2 For any integer $j \ge \tau$ and any $k\in\{0,\dots,2^j-1\}$, let $\beta_{j,k}$ be the wavelet coefficient (6) of $f_\nu$ and $\widehat{\beta}_{j,k}$ be (8). Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^4 \right) \le C\frac{(z_n \ln(n/z_n))^2}{n^2}.$$

Proof of Proposition 2. Taking $\psi$ instead of $\phi$ in (14), we obtain
$$\beta_{j,k} = \int_0^1 f_\nu(x)\psi_{j,k}(x)\,dx = \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i 1_{\{|Z_i| \le \gamma_n\}}) + \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i 1_{\{|Z_i| > \gamma_n\}}). \qquad (19)$$
Therefore, by the elementary inequality $(x+y)^4 \le 8(x^4 + y^4)$, $(x,y)\in\mathbb{R}^2$, we have
$$\mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^4 \right) = \mathbb{E}\left( \left( \frac{1}{n}\sum_{i=1}^n \left( Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}(Z_i 1_{\{|Z_i|\le\gamma_n\}}) \right) - \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i 1_{\{|Z_i|>\gamma_n\}}) \right)^4 \right) \le 8(A+B), \qquad (20)$$
where
$$A = \mathbb{E}\left( \left( \frac{1}{n}\sum_{i=1}^n \left( Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}(Z_i 1_{\{|Z_i|\le\gamma_n\}}) \right) \right)^4 \right)$$
and
$$B = \left( \frac{1}{n}\sum_{i=1}^n \mathbb{E}(|Z_i| 1_{\{|Z_i|>\gamma_n\}}) \right)^4.$$
Let us bound $A$ and $B$, in turn.

Upper bound for $A$. Let us present the Rosenthal inequality (see Rosenthal (1970)).

Lemma 1 (Rosenthal's inequality) Let $n\in\mathbb{N}^*$, $p \ge 2$ and $(U_i)_{i\in\{1,\dots,n\}}$ be $n$ zero mean independent random variables such that $\sup_{i\in\{1,\dots,n\}}\mathbb{E}(|U_i|^p) < \infty$. Then there exists a constant $C > 0$ such that
$$\mathbb{E}\left( \left| \sum_{i=1}^n U_i \right|^p \right) \le C \max\left( \sum_{i=1}^n \mathbb{E}(|U_i|^p), \left( \sum_{i=1}^n \mathbb{E}(U_i^2) \right)^{p/2} \right).$$


Applying the Rosenthal inequality with $p = 4$ and, for any $i\in\{1,\dots,n\}$,
$$U_i = Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}\left( Z_i 1_{\{|Z_i|\le\gamma_n\}} \right),$$
we obtain
$$A = \frac{1}{n^4}\mathbb{E}\left( \left( \sum_{i=1}^n U_i \right)^4 \right) \le C\frac{1}{n^4}\max\left( \sum_{i=1}^n \mathbb{E}(U_i^4), \left( \sum_{i=1}^n \mathbb{E}(U_i^2) \right)^2 \right).$$
Using (18) (with $\psi$ instead of $\phi$), we have, for any $a\in\{2,4\}$ and any $i\in\{1,\dots,n\}$,
$$\mathbb{E}(U_i^a) \le 2^a\mathbb{E}\left( Z_i^a 1_{\{|Z_i|\le\gamma_n\}} \right) \le 2^a\gamma_n^{a-2}\mathbb{E}(Z_i^2) \le 2^a\gamma_n^{a-2}\theta^2 a_\nu^2(i).$$
Hence, using $z_n < n/e$,
$$A \le C\frac{1}{n^4}\max\left( \gamma_n^2 \sum_{i=1}^n a_\nu^2(i), \left( \sum_{i=1}^n a_\nu^2(i) \right)^2 \right) = C\frac{1}{n^4}\max\left( \frac{n^2}{\ln(n/z_n)}z_n^2,\ n^2 z_n^2 \right) = C\frac{z_n^2}{n^2}. \qquad (21)$$

Upper bound for $B$. Using again (18) (with $\psi$ instead of $\phi$), for any $i\in\{1,\dots,n\}$, we obtain
$$\mathbb{E}\left( |Z_i| 1_{\{|Z_i|>\gamma_n\}} \right) \le \frac{\mathbb{E}(Z_i^2)}{\gamma_n} \le \frac{1}{\theta}\sqrt{\frac{\ln(n/z_n)}{n z_n}}\,\theta^2 a_\nu^2(i) = \theta\sqrt{\frac{\ln(n/z_n)}{n z_n}}\,a_\nu^2(i).$$
Therefore
$$B = \left( \frac{1}{n}\sum_{i=1}^n \mathbb{E}(|Z_i| 1_{\{|Z_i|>\gamma_n\}}) \right)^4 \le \theta^4\frac{(\ln(n/z_n))^2}{n^2 z_n^2}\left( \frac{1}{n}\sum_{i=1}^n a_\nu^2(i) \right)^4 = \theta^4\frac{(\ln(n/z_n))^2}{n^2 z_n^2}z_n^4 = \theta^4\frac{(z_n\ln(n/z_n))^2}{n^2}. \qquad (22)$$
Combining (20), (21) and (22) and using $z_n < n/e$, we have
$$\mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^4 \right) \le C\left( \frac{z_n^2}{n^2} + \frac{(z_n\ln(n/z_n))^2}{n^2} \right) \le C\frac{(z_n\ln(n/z_n))^2}{n^2}.$$

Proposition 3 For any integer $j \ge \tau$ and any $k\in\{0,\dots,2^j-1\}$, let $\beta_{j,k}$ be the wavelet coefficient (6) of $f_\nu$ and $\widehat{\beta}_{j,k}$ be (8). Then, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9 + 4}$,
$$\mathbb{P}\left( |\widehat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2 \right) \le 2\left( \frac{z_n}{n} \right)^2.$$

Proof of Proposition 3. By (19) we have
$$\begin{aligned}
|\widehat{\beta}_{j,k} - \beta_{j,k}| &= \left| \frac{1}{n}\sum_{i=1}^n \left( Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}(Z_i 1_{\{|Z_i|\le\gamma_n\}}) \right) - \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i 1_{\{|Z_i|>\gamma_n\}}) \right| \\
&\le \left| \frac{1}{n}\sum_{i=1}^n \left( Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}(Z_i 1_{\{|Z_i|\le\gamma_n\}}) \right) \right| + \frac{1}{n}\sum_{i=1}^n \mathbb{E}(|Z_i| 1_{\{|Z_i|>\gamma_n\}}).
\end{aligned}$$
Using (18) (with $\psi$ instead of $\phi$), we obtain
$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}(|Z_i| 1_{\{|Z_i|>\gamma_n\}}) \le \frac{1}{\gamma_n}\left( \frac{1}{n}\sum_{i=1}^n \mathbb{E}(Z_i^2) \right) \le \frac{1}{\gamma_n}\theta^2\left( \frac{1}{n}\sum_{i=1}^n a_\nu^2(i) \right) = \frac{1}{\gamma_n}\theta^2 z_n = \frac{1}{\theta}\sqrt{\frac{\ln(n/z_n)}{n z_n}}\,\theta^2 z_n = \theta\sqrt{\frac{z_n\ln(n/z_n)}{n}} = \lambda_n.$$

Hence
$$S = \mathbb{P}\left( |\widehat{\beta}_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2 \right) \le \mathbb{P}\left( \left| \frac{1}{n}\sum_{i=1}^n \left( Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}(Z_i 1_{\{|Z_i|\le\gamma_n\}}) \right) \right| \ge (\kappa/2 - 1)\lambda_n \right).$$

Now we need the Bernstein inequality presented in the lemma below (see Petrov (1995)).

Lemma 2 (Bernstein's inequality) Let $n\in\mathbb{N}^*$ and $(U_i)_{i\in\{1,\dots,n\}}$ be $n$ zero mean independent random variables such that there exists a constant $M > 0$ satisfying $\sup_{i\in\{1,\dots,n\}}|U_i| \le M < \infty$. Then, for any $\lambda > 0$,
$$\mathbb{P}\left( \left| \sum_{i=1}^n U_i \right| \ge \lambda \right) \le 2\exp\left( -\frac{\lambda^2}{2\left( \sum_{i=1}^n \mathbb{E}(U_i^2) + \lambda M/3 \right)} \right).$$

Let us set, for any $i\in\{1,\dots,n\}$,
$$U_i = Z_i 1_{\{|Z_i|\le\gamma_n\}} - \mathbb{E}\left( Z_i 1_{\{|Z_i|\le\gamma_n\}} \right).$$
For any $i\in\{1,\dots,n\}$, we have $\mathbb{E}(U_i) = 0$,
$$|U_i| \le |Z_i| 1_{\{|Z_i|\le\gamma_n\}} + \mathbb{E}\left( |Z_i| 1_{\{|Z_i|\le\gamma_n\}} \right) \le 2\gamma_n$$
and, using again (18) (with $\psi$ instead of $\phi$),
$$\mathbb{E}(U_i^2) = \mathbb{V}\left( Z_i 1_{\{|Z_i|\le\gamma_n\}} \right) \le \mathbb{E}\left( Z_i^2 1_{\{|Z_i|\le\gamma_n\}} \right) \le \mathbb{E}(Z_i^2) \le \theta^2 a_\nu^2(i).$$

So
$$\sum_{i=1}^n \mathbb{E}(U_i^2) \le \theta^2 \sum_{i=1}^n a_\nu^2(i) = \theta^2 n z_n.$$
It follows from the Bernstein inequality that
$$S \le 2\exp\left( -\frac{n^2(\kappa/2-1)^2\lambda_n^2}{2\left( \theta^2 n z_n + 2n(\kappa/2-1)\lambda_n\gamma_n/3 \right)} \right).$$
Since
$$\lambda_n\gamma_n = \theta\sqrt{\frac{z_n\ln(n/z_n)}{n}}\,\theta\sqrt{\frac{n z_n}{\ln(n/z_n)}} = \theta^2 z_n, \qquad \lambda_n^2 = \theta^2\frac{z_n\ln(n/z_n)}{n},$$
we have, for any $\kappa \ge 8/3 + 2 + 2\sqrt{16/9+4}$,
$$S \le 2\exp\left( -\frac{(\kappa/2-1)^2\ln(n/z_n)}{2(1 + 2(\kappa/2-1)/3)} \right) = 2\left( \frac{n}{z_n} \right)^{-\frac{(\kappa/2-1)^2}{2(1+2(\kappa/2-1)/3)}} \le 2\left( \frac{z_n}{n} \right)^2.$$
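The role of the numerical constant in the condition on $\kappa$ can be seen directly: at the boundary value $\kappa = 8/3 + 2 + 2\sqrt{16/9+4}$, the exponent $(\kappa/2-1)^2/\big(2(1 + 2(\kappa/2-1)/3)\big)$ above equals exactly 2, which gives the stated bound $S \le 2(z_n/n)^2$. A quick numerical check (an illustrative sketch, not part of the original proof):

```python
import math

# Smallest kappa allowed by Proposition 3.
kappa = 8 / 3 + 2 + 2 * math.sqrt(16 / 9 + 4)

# Exponent appearing in the final bound of the proof:
# (kappa/2 - 1)^2 / (2 * (1 + 2*(kappa/2 - 1)/3)).
c = kappa / 2 - 1
exponent = c**2 / (2 * (1 + 2 * c / 3))

# At the boundary value of kappa the exponent equals 2.
print(round(exponent, 6))  # prints 2.0
```

Any larger $\kappa$ only increases the exponent, so the bound $2(z_n/n)^2$ holds for the whole admissible range.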

6.2 Proofs of the main results

Proof of Theorem 1. We expand the function $f_\nu$ on $\mathcal{B}$ as
$$f_\nu(x) = \sum_{k=0}^{2^{j_0}-1} \alpha_{j_0,k}\phi_{j_0,k}(x) + \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}\psi_{j,k}(x).$$
We have
$$\widehat{f}_L(x) - f_\nu(x) = \sum_{k=0}^{2^{j_0}-1} (\widehat{\alpha}_{j_0,k} - \alpha_{j_0,k})\phi_{j_0,k}(x) - \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}\psi_{j,k}(x).$$
Hence
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_L(x) - f_\nu(x) \right)^2 dx \right) = A + B,$$
where
$$A = \sum_{k=0}^{2^{j_0}-1} \mathbb{E}\left( (\widehat{\alpha}_{j_0,k} - \alpha_{j_0,k})^2 \right), \qquad B = \sum_{j=j_0}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}^2.$$
Proposition 1 gives
$$A \le C 2^{j_0}\frac{z_n}{n} \le C\left( \frac{z_n}{n} \right)^{2s/(2s+1)}.$$
Since $p \ge 2$, we have $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$. Hence
$$B \le C 2^{-2j_0 s} \le C\left( \frac{z_n}{n} \right)^{2s/(2s+1)}.$$
So
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_L(x) - f_\nu(x) \right)^2 dx \right) \le C\left( \frac{z_n}{n} \right)^{2s/(2s+1)}.$$
The proof of Theorem 1 is complete.

Proof of Theorem 2. We expand the function $f_\nu$ on $\mathcal{B}$ as
$$f_\nu(x) = \sum_{k=0}^{2^\tau-1} \alpha_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}\psi_{j,k}(x).$$
We have
$$\widehat{f}_H(x) - f_\nu(x) = \sum_{k=0}^{2^\tau-1} (\widehat{\alpha}_{\tau,k} - \alpha_{\tau,k})\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \left( \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}|\ge\kappa\lambda_n\}} - \beta_{j,k} \right)\psi_{j,k}(x) - \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}\psi_{j,k}(x).$$
Hence
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_H(x) - f_\nu(x) \right)^2 dx \right) = R + S + T, \qquad (23)$$
where
$$R = \sum_{k=0}^{2^\tau-1} \mathbb{E}\left( (\widehat{\alpha}_{\tau,k} - \alpha_{\tau,k})^2 \right), \qquad S = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \left( \widehat{\beta}_{j,k} 1_{\{|\widehat{\beta}_{j,k}|\ge\kappa\lambda_n\}} - \beta_{j,k} \right)^2 \right)$$
and
$$T = \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}^2.$$
Let us bound $R$, $T$ and $S$, in turn.

By Proposition 1 and the inequalities $z_n < n/e$, $z_n\ln(n/z_n) < n$ and $2s/(2s+1) < 1$, we have
$$R \le C\frac{z_n}{n} \le C\frac{z_n\ln(n/z_n)}{n} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (24)$$


For $r \ge 1$ and $p \ge 2$, we have $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$. Using $z_n < n/e$, $z_n\ln(n/z_n) < n$ and $2s/(2s+1) < 2s$, we obtain
$$T \le C\sum_{j=j_1+1}^{\infty} 2^{-2js} \le C 2^{-2j_1 s} \le C\left( \frac{n}{z_n} \right)^{-2s} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
For $r \ge 1$ and $p \in [1,2)$, we have $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$. Since $s > 1/p$, we have $s + 1/2 - 1/p > s/(2s+1)$. So, by $z_n < n/e$ and $z_n\ln(n/z_n) < n$, we have
$$T \le C\sum_{j=j_1+1}^{\infty} 2^{-2j(s+1/2-1/p)} \le C 2^{-2j_1(s+1/2-1/p)} \le C\left( \frac{n}{z_n} \right)^{-2(s+1/2-1/p)} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2(s+1/2-1/p)} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
Hence, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$, we have
$$T \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (25)$$

The term $S$ can be decomposed as
$$S = S_1 + S_2 + S_3 + S_4, \qquad (26)$$
where
$$S_1 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^2 1_{\{|\widehat{\beta}_{j,k}|\ge\kappa\lambda_n\}} 1_{\{|\beta_{j,k}|<\kappa\lambda_n/2\}} \right),$$
$$S_2 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^2 1_{\{|\widehat{\beta}_{j,k}|\ge\kappa\lambda_n\}} 1_{\{|\beta_{j,k}|\ge\kappa\lambda_n/2\}} \right),$$
$$S_3 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \beta_{j,k}^2 1_{\{|\widehat{\beta}_{j,k}|<\kappa\lambda_n\}} 1_{\{|\beta_{j,k}|\ge 2\kappa\lambda_n\}} \right)$$
and
$$S_4 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \beta_{j,k}^2 1_{\{|\widehat{\beta}_{j,k}|<\kappa\lambda_n\}} 1_{\{|\beta_{j,k}|<2\kappa\lambda_n\}} \right).$$
Upper bounds for $S_1$ and $S_3$. We have
$$\left\{ |\widehat{\beta}_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n \right\} \subseteq \left\{ |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2 \right\},$$

$$\left\{ |\widehat{\beta}_{j,k}| \ge \kappa\lambda_n,\ |\beta_{j,k}| < \kappa\lambda_n/2 \right\} \subseteq \left\{ |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2 \right\}$$
and
$$\left\{ |\widehat{\beta}_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n \right\} \subseteq \left\{ |\beta_{j,k}| \le 2|\widehat{\beta}_{j,k} - \beta_{j,k}| \right\}.$$
So
$$\max(S_1, S_3) \le C\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^2 1_{\{|\widehat{\beta}_{j,k}-\beta_{j,k}|>\kappa\lambda_n/2\}} \right).$$
It follows from the Cauchy-Schwarz inequality and Propositions 2 and 3 that
$$\mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^2 1_{\{|\widehat{\beta}_{j,k}-\beta_{j,k}|>\kappa\lambda_n/2\}} \right) \le \left( \mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^4 \right) \right)^{1/2} \left( \mathbb{P}\left( |\widehat{\beta}_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2 \right) \right)^{1/2} \le C\frac{z_n^2\ln(n/z_n)}{n^2}.$$
Hence, using $z_n < n/e$, $z_n\ln(n/z_n) < n$ and $2s/(2s+1) < 1$, we have
$$\max(S_1, S_3) \le C\frac{z_n^2\ln(n/z_n)}{n^2}\sum_{j=\tau}^{j_1} 2^j \le C\frac{z_n^2\ln(n/z_n)}{n^2}2^{j_1} \le C\frac{z_n\ln(n/z_n)}{n} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (27)$$

Upper bound for $S_2$. Using Proposition 2 and the Cauchy-Schwarz inequality, we obtain
$$\mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^2 \right) \le \left( \mathbb{E}\left( \left( \widehat{\beta}_{j,k} - \beta_{j,k} \right)^4 \right) \right)^{1/2} \le C\frac{z_n\ln(n/z_n)}{n}.$$
Hence
$$S_2 \le C\frac{z_n\ln(n/z_n)}{n}\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}.$$
Let $j_2$ be the integer defined by
$$\frac{1}{2}\left( \frac{n}{z_n\ln(n/z_n)} \right)^{1/(2s+1)} < 2^{j_2} \le \left( \frac{n}{z_n\ln(n/z_n)} \right)^{1/(2s+1)}. \qquad (28)$$
We have
$$S_2 \le S_{2,1} + S_{2,2},$$

where
$$S_{2,1} = C\frac{z_n\ln(n/z_n)}{n}\sum_{j=\tau}^{j_2}\sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}$$
and
$$S_{2,2} = C\frac{z_n\ln(n/z_n)}{n}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^j-1} 1_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}.$$
We have
$$S_{2,1} \le C\frac{z_n\ln(n/z_n)}{n}\sum_{j=\tau}^{j_2} 2^j \le C\frac{z_n\ln(n/z_n)}{n}2^{j_2} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$

For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$,
$$S_{2,2} \le C\frac{z_n\ln(n/z_n)}{n\lambda_n^2}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 \le C\sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 \le C 2^{-2j_2 s} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
For $r \ge 1$, $p \in [1,2)$ and $s > 1/p$, since $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and $(2s+1)(2-p)/2 + (s+1/2-1/p)p = 2s$, we have
$$S_{2,2} \le C\frac{z_n\ln(n/z_n)}{n\lambda_n^p}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty} 2^{-j(s+1/2-1/p)p} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{(2-p)/2} 2^{-j_2(s+1/2-1/p)p} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
So, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$S_2 \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (29)$$

Upper bound for $S_4$. We have
$$S_4 \le \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}.$$
Let $j_2$ be the integer (28). We have
$$S_4 \le S_{4,1} + S_{4,2},$$

where
$$S_{4,1} = \sum_{j=\tau}^{j_2}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}, \qquad S_{4,2} = \sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 1_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}.$$
We have
$$S_{4,1} \le C\lambda_n^2\sum_{j=\tau}^{j_2} 2^j \le C\frac{z_n\ln(n/z_n)}{n}2^{j_2} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$, we have
$$S_{4,2} \le \sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^j-1} \beta_{j,k}^2 \le C\sum_{j=j_2+1}^{\infty} 2^{-2js} \le C 2^{-2j_2 s} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
For $r \ge 1$, $p \in [1,2)$ and $s > 1/p$, since $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and $(2-p)(2s+1)/2 + (s+1/2-1/p)p = 2s$, we have
$$S_{4,2} \le C\lambda_n^{2-p}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^j-1} |\beta_{j,k}|^p \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty} 2^{-j(s+1/2-1/p)p} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{(2-p)/2} 2^{-j_2(s+1/2-1/p)p} \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
So, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$S_4 \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (30)$$

It follows from (26), (27), (29) and (30) that
$$S \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}. \qquad (31)$$
Combining (23), (24), (25) and (31), we have, for $r \ge 1$, $\{p \ge 2$ and $s > 0\}$ or $\{p \in [1,2)$ and $s > 1/p\}$,
$$\mathbb{E}\left( \int_0^1 \left( \widehat{f}_H(x) - f_\nu(x) \right)^2 dx \right) \le C\left( \frac{z_n\ln(n/z_n)}{n} \right)^{2s/(2s+1)}.$$
The proof of Theorem 2 is complete.

Acknowledgment. This work is supported by the ANR grant NatImages, ANR-08-EMER-009.


References

Antoniadis, A. (1997). Wavelets in statistics: a review (with discussion), Journal of the Italian Statistical Society, Series B, 6, 97-144.

Antoniadis, A., Gregoire, G. and Vial, P. (1997). Random design wavelet curve smoothing, Statist. Probab. Lett., 35, 225-232.

Cai, T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequality approach, The Annals of Statistics, 27, 898-924.

Cai, T. (2002). On block thresholding in wavelet regression: adaptivity, block size and threshold level, Statistica Sinica, 12, 1241-1273.

Cai, T. and Brown, L. D. (1999). Wavelet estimation for samples with random uniform design, Statistics and Probability Letters, 42, 313-321.

Chesneau, C. (2007). Wavelet block thresholding for samples with random design: a minimax approach under the Lp risk, Electronic Journal of Statistics, 1, 331-346.

Chicken, E. (2003). Block thresholding and wavelet estimation for nonequispaced samples, Journal of Statistical Planning and Inference, 116, 113-129.

Cohen, A., Daubechies, I., Jawerth, B. and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms, Applied and Computational Harmonic Analysis, 1, 54-81.

Delyon, B. and Juditsky, A. (1996). On minimax wavelet estimators, Applied and Computational Harmonic Analysis, 3, 215-228.

Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81, 425-455.

Donoho, D. L. and Johnstone, I. M. (1995). Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association, 90, 432, 1200-1224.

Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage, The Annals of Statistics, 26, 879-921.

Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: asymptopia (with discussion), Journal of the Royal Statistical Society, Series B, 57, 301-369.

Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation and Statistical Applications, Lecture Notes in Statistics 129, Springer Verlag, New York.

Kerkyacharian, G. and Picard, D. (2004). Regression in random design and warped wavelets, Bernoulli, 10(6), 1053-1105.

Maiboroda, R. E. (1996). Estimators of components of a mixture with varying concentrations, Ukrain. Mat. Zh., 48, 4, 562-566.

Meyer, Y. (1990). Ondelettes et Opérateurs, Hermann, Paris.

Pensky, M. and Vidakovic, B. (2001). On non-equally spaced wavelet regression, Ann. Inst. Statist. Math., 53, 681-690.

Petrov, V. V. (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables, Clarendon Press, Oxford.

Pham Ngoc, T. M. (2009). Regression in random design and Bayesian warped wavelets estimators, Electronic Journal of Statistics, 3, 1084-1112.

Pokhyl'ko, D. (2005). Wavelet estimators of a density constructed from observations of a mixture, Theor. Prob. and Math. Statist., 70, 135-145.

Prakasa Rao, B. L. S. (1983). Nonparametric Functional Estimation, Academic Press, Orlando.

Prakasa Rao, B. L. S. (1999). Nonparametric functional estimation: an overview, Asymptotics, Nonparametrics and Time Series, Ed. Subir Ghosh, 461-509, Marcel Dekker Inc., New York.

Prakasa Rao, B. L. S. (2010). Wavelet linear estimation for derivatives of a density from observations of mixtures with varying mixing proportions, Indian Journal of Pure and Applied Mathematics, 41, 1, 275-291.

Rosenthal, H. P. (1970). On the subspaces of Lp (p > 2) spanned by sequences of independent random variables, Israel Journal of Mathematics, 8, 273-303.

Tsybakov, A. B. (2004). Introduction à l'estimation non-paramétrique, Springer.

Zhang, S. and Zheng, Z. (1999). Nonlinear wavelet estimation of regression function with random design, Science in China Series A: Mathematics, 42, 8, 825-833.
