
Adaptive wavelet regression in random design and general errors with weakly dependent data



HAL Id: hal-00562228

https://hal.archives-ouvertes.fr/hal-00562228

Preprint submitted on 2 Feb 2011


Christophe Chesneau

To cite this version:

Christophe Chesneau. Adaptive wavelet regression in random design and general errors with weakly dependent data. 2011. hal-00562228.


Christophe Chesneau

University of Caen (LMNO), Caen, France. Email: chesneau@math.unicaen.fr

Abstract

We investigate function estimation in a nonparametric regression model having the following particularities: the design is random (with a known distribution), the errors admit finite moments of order 2 and the data are weakly dependent; the exponentially strongly mixing case is considered. In this general framework, we construct a new adaptive estimator. It is based on wavelets and the combination of two hard thresholding rules. We determine an upper bound of the associated mean integrated squared error and prove that it is sharp for a wide class of regression functions.

Keywords: Nonparametric regression, weakly dependent data, minimax estimation, Besov balls, wavelets, hard thresholding.

2000 Mathematics Subject Classification: 62G07, 62G20.

1 Introduction

Let $(X_i, Y_i)_{i\in\mathbb{Z}}$ be a bivariate stationary random process where, for any $i\in\mathbb{Z}$,

$$Y_i = f(X_i) + \xi_i, \qquad (1.1)$$

$f : [0,1]\to\mathbb{R}$ is an unknown function, $(X_i)_{i\in\mathbb{Z}}$ is a sequence of identically distributed random variables having the common known density $g : [0,1]\to[0,\infty)$, and $(\xi_i)_{i\in\mathbb{Z}}$ is a sequence of identically distributed variables, independent of $(X_i)_{i\in\mathbb{Z}}$, satisfying $\mathbb{E}(\xi_1) = 0$ and $\mathbb{E}(\xi_1^2) < \infty$. We suppose that $(X_i, Y_i)_{i\in\mathbb{Z}}$ is strongly mixing (to be defined in Section 2). Given $n$ observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ drawn from $(X_i, Y_i)_{i\in\mathbb{Z}}$, we aim to estimate $f$ globally on $[0,1]$.

To measure the performance of an estimator $\hat f$ of $f$, we use the Mean Integrated Squared Error (MISE) defined by

$$R(\hat f, f) = \mathbb{E}\left(\int_0^1 \big(\hat f(x) - f(x)\big)^2\, dx\right).$$

Our goal is to construct $\hat f$ such that the associated MISE is as small as possible. Many methods can be considered (kernel, spline, wavelets, ...); see e.g. [31]. In this study, we focus our attention on wavelet methods. They are attractive for nonparametric function estimation because of their spatial adaptivity, computational efficiency and asymptotic optimality properties. Further details can be found in [1] and [22].

In the literature, when $(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d., various wavelet methods have been developed. See, e.g., [14-16], [17], [21], [2, 3], [4], [30], [5, 6], [7, 8], [10], [29], [13], [33], [23] and [9]. When $\xi_1, \ldots, \xi_n$ have some kind of dependence (long memory, $\rho$-mixing, ...), see e.g. [18], [19], [25], [26] and [24]. To the best of our knowledge, the wavelet estimation of $f$ when $(X_1, Y_1), \ldots, (X_n, Y_n)$ are weakly dependent has only been investigated by [26]. More precisely, [26] constructs a linear nonadaptive wavelet estimator of $f$ which attains a sharp rate of convergence under the uniform risk over Besov balls. However, the adaptive wavelet estimation of $f$, which is more realistic in practice, has never been addressed before and motivates this study. In addition to this new challenge, we relax some classical assumptions on the errors: the common distribution of $\xi_1, \ldots, \xi_n$ can be unknown; only $\mathbb{E}(\xi_1) = 0$ and $\mathbb{E}(\xi_1^2) < \infty$ are required.

We construct a new adaptive wavelet estimator based on the following steps: we estimate the unknown wavelet coefficients of $f$ by new thresholded versions of the empirical ones, we operate a term-by-term selection of these estimators via a hard thresholding rule, and we reconstruct the selected estimators on the initial wavelet basis with appropriately chosen levels. Naturally, the definitions of both thresholds take into account the dependence of the data and are chosen to minimize the associated MISE. Assuming that $f$ belongs to a Besov ball $B^s_{p,r}(M)$ (to be defined in Section 3), we prove that our estimator $\hat f$ satisfies

$$R(\hat f, f) \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)},$$

where $C > 0$ is a constant (independent of $n$), $n_\theta = n^{\theta/(\theta+1)}$ and $\theta$ refers to the exponentially strongly mixing case.

The paper is organized as follows. Section 2 clarifies the assumptions on the model and introduces some notations. Section 3 describes the wavelet basis on $[0,1]$ and the Besov balls $B^s_{p,r}(M)$. Our wavelet hard thresholding estimator is presented in Section 4. The main result is stated in Section 5. Section 6 is devoted to the proofs.

2 Assumptions and notations

Dependence assumption on $(X_i, Y_i)_{i\in\mathbb{Z}}$. Set, for any $i\in\mathbb{Z}$, $Z_i = (X_i, Y_i)$. We suppose that $Z = (Z_i)_{i\in\mathbb{Z}}$ is stationary and exponentially strongly mixing. Let us now clarify this kind of dependence.

For any $m\in\mathbb{Z}$, we define the $m$-th strongly mixing coefficient of $Z$ by

$$a_m = \sup_{(A,B)\in\mathcal{F}^Z_{-\infty,0}\times\mathcal{F}^Z_{m,\infty}} |\mathbb{P}(A\cap B) - \mathbb{P}(A)\mathbb{P}(B)|,$$

where, for any $i\in\mathbb{Z}$, $\mathcal{F}^Z_{-\infty,i}$ is the $\sigma$-algebra generated by $\ldots, Z_{i-1}, Z_i$ and $\mathcal{F}^Z_{i,\infty}$ is the $\sigma$-algebra generated by $Z_i, Z_{i+1}, \ldots$. The bivariate random process $Z$ is said to be strongly mixing if $\lim_{m\to\infty} a_m = 0$. The exponentially strongly mixing condition is characterized by the following inequality: there exist three known constants $\gamma > 0$, $c > 0$ and $\theta > 0$ such that, for any $m\in\mathbb{Z}$,

$$a_m \le \gamma\exp\big(-c|m|^{\theta}\big). \qquad (2.1)$$

This assumption is satisfied by a large class of processes (ARMA processes, ARCH processes, ...). See e.g. [32], [20] and [28]. Remark that, when $\theta\to\infty$, $Z = (Z_i)_{i\in\mathbb{Z}}$ becomes a bivariate sequence of i.i.d. random variables.
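For intuition, here is a minimal simulation sketch of one process satisfying these assumptions (the construction and all parameter values are illustrative assumptions, not taken from the paper): a stationary Gaussian AR(1) chain is exponentially strongly mixing, and mapping it through the standard normal CDF gives a design with the known uniform density $g \equiv 1$ on $[0,1]$, so model (1.1) can be sampled directly.

```python
import numpy as np
from scipy.stats import norm

def simulate_model(n, f, rho=0.5, sigma_xi=1.0, seed=0):
    """Sample (X_1, Y_1), ..., (X_n, Y_n) from model (1.1) with a weakly
    dependent design: Z_t = rho Z_{t-1} + e_t is a stationary Gaussian AR(1)
    (exponentially strongly mixing), and X_t = Phi(Z_t) has the known
    uniform density g = 1 on [0, 1]."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[0] = rng.normal()                         # stationary N(0, 1) start
    innov_sd = np.sqrt(1.0 - rho**2)            # keeps Var(Z_t) = 1 for all t
    for t in range(1, n):
        z[t] = rho * z[t - 1] + innov_sd * rng.normal()
    x = norm.cdf(z)                             # design points in [0, 1]
    xi = rng.normal(0.0, sigma_xi, size=n)      # centered, finite second moment
    return x, f(x) + xi

# Example with a bounded regression function, as required by (2.2):
x, y = simulate_model(2000, lambda t: np.sin(2 * np.pi * t))
```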

Boundedness assumption on $f$. We suppose that there exists a known constant $C_* > 0$ such that

$$\sup_{x\in[0,1]} |f(x)| \le C_*. \qquad (2.2)$$

Boundedness assumption on $g$. We suppose that there exists a known constant $c_* > 0$ such that

$$\inf_{x\in[0,1]} g(x) \ge c_*. \qquad (2.3)$$

3 Wavelet bases and Besov balls

3.1 Wavelet basis

Let $N\in\mathbb{N}^*$ (the set of positive integers), and let $\phi$ and $\psi$ be the initial wavelet functions of the Daubechies wavelets dbN. Set

$$\phi_{j,k}(x) = 2^{j/2}\phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k).$$

Then, with an appropriate treatment at the boundaries, there exists an integer $\tau$ satisfying $2^{\tau} \ge 2N$ such that, for any integer $\ell \ge \tau$,

$$\mathcal{B} = \{\phi_{\ell,k}(\cdot),\ k\in\{0,\ldots,2^{\ell}-1\};\ \psi_{j,k}(\cdot),\ j\in\mathbb{N}\setminus\{0,\ldots,\ell-1\},\ k\in\{0,\ldots,2^{j}-1\}\}$$

is an orthonormal basis of $\mathbb{L}^2([0,1]) = \{h : [0,1]\to\mathbb{R};\ \int_0^1 h^2(x)\,dx < \infty\}$. See [11].

For any integer $\ell \ge \tau$, any $h\in\mathbb{L}^2([0,1])$ can be expanded on $\mathcal{B}$ as

$$h(x) = \sum_{k=0}^{2^{\ell}-1}\alpha_{\ell,k}\phi_{\ell,k}(x) + \sum_{j=\ell}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}\psi_{j,k}(x),$$

where $\alpha_{j,k}$ and $\beta_{j,k}$ are the wavelet coefficients of $h$ defined by

$$\alpha_{j,k} = \int_0^1 h(x)\phi_{j,k}(x)\,dx, \qquad \beta_{j,k} = \int_0^1 h(x)\psi_{j,k}(x)\,dx. \qquad (3.1)$$
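The dilation/translation rule above is straightforward to code. A hedged sketch for the simplest member of the family, the Haar wavelet db1 (chosen because $\phi$ and $\psi$ then have closed forms on $[0,1]$ and no boundary correction is needed; general dbN requires the boundary construction of [11]):

```python
import numpy as np

def phi(x):
    """Haar scaling function: indicator of [0, 1)."""
    return np.where((x >= 0.0) & (x < 1.0), 1.0, 0.0)

def psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return phi(2.0 * x) - phi(2.0 * x - 1.0)

def phi_jk(x, j, k):
    """phi_{j,k}(x) = 2^{j/2} phi(2^j x - k)."""
    return 2.0**(j / 2.0) * phi(2.0**j * x - k)

def psi_jk(x, j, k):
    """psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0**(j / 2.0) * psi(2.0**j * x - k)
```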


3.2 Besov balls

Let $M > 0$, $s > 0$, $p \ge 1$ and $r \ge 1$. A function $h$ belongs to $B^s_{p,r}(M)$ if, and only if, there exists a constant $M^* > 0$ (depending on $M$) such that the associated wavelet coefficients (3.1) satisfy

$$2^{\tau(1/2 - 1/p)}\left(\sum_{k=0}^{2^{\tau}-1}|\alpha_{\tau,k}|^{p}\right)^{1/p} + \left(\sum_{j=\tau}^{\infty}\left(2^{j(s + 1/2 - 1/p)}\left(\sum_{k=0}^{2^{j}-1}|\beta_{j,k}|^{p}\right)^{1/p}\right)^{r}\right)^{1/r} \le M^*.$$

In this expression, $s$ is a smoothness parameter and $p$ and $r$ are norm parameters. For particular choices of $s$, $p$ and $r$, the Besov balls contain the standard Hölder and Sobolev balls. See [27].
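The membership criterion is a pure computation on the coefficient sequence, so it can be transcribed directly. The sketch below (a hypothetical helper, finite $r$ only) evaluates the left-hand side of the display above from arrays of coefficients:

```python
import numpy as np

def besov_seq_norm(alpha_tau, betas, s, p, r, tau):
    """Left-hand side of the Besov criterion above (finite r only).
    alpha_tau : array of the 2^tau coarse coefficients alpha_{tau,k};
    betas     : list of arrays, betas[m] holding the 2^(tau+m) detail
                coefficients beta_{j,k} at level j = tau + m."""
    coarse = 2.0**(tau * (0.5 - 1.0 / p)) * np.sum(np.abs(alpha_tau)**p)**(1.0 / p)
    detail = sum(
        (2.0**((tau + m) * (s + 0.5 - 1.0 / p))
         * np.sum(np.abs(b)**p)**(1.0 / p))**r
        for m, b in enumerate(betas)
    )
    return coarse + detail**(1.0 / r)
```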

4 Estimators

4.1 Wavelet coefficient estimators and some of their statistical properties

The first step to estimate $f$ in (1.1) consists in expanding $f$ on the wavelet basis $\mathcal{B}$. Then we aim to estimate the unknown wavelet coefficients $\alpha_{j,k} = \int_0^1 f(x)\phi_{j,k}(x)\,dx$ and $\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx$. The considered estimators are described below.

Set

$$\gamma_n = \mu\sqrt{\frac{n_\theta}{\ln n_\theta}},$$

where $n_\theta = n^{\theta/(\theta+1)}$, $\theta$ is the constant in (2.1) and $\mu = \sqrt{(C_*^2 + \mathbb{E}(\xi_1^2))/c_*}$.

For any integer $j \ge \tau$ and any $k\in\{0,\ldots,2^{j}-1\}$,

• we estimate $\alpha_{j,k}$ by

$$\hat\alpha_{j,k} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{g(X_i)}\phi_{j,k}(X_i)\,\mathbb{1}_{\left\{\left|\frac{Y_i}{g(X_i)}\phi_{j,k}(X_i)\right| \le \gamma_n\right\}}, \qquad (4.1)$$

where, for any random event $A$, $\mathbb{1}_A$ is the indicator function of $A$;

• we estimate $\beta_{j,k}$ by

$$\hat\beta_{j,k} = \frac{1}{n}\sum_{i=1}^{n}\frac{Y_i}{g(X_i)}\psi_{j,k}(X_i)\,\mathbb{1}_{\left\{\left|\frac{Y_i}{g(X_i)}\psi_{j,k}(X_i)\right| \le \gamma_n\right\}}. \qquad (4.2)$$
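A direct transcription of $\gamma_n$, (4.1) and (4.2) in code, reusing the Haar helpers sketched in Section 3.1 (the function names and the plugged-in value of $\mu$ are illustrative assumptions):

```python
import numpy as np

def gamma_n(n, theta, mu):
    """gamma_n = mu sqrt(n_theta / ln n_theta) with n_theta = n^(theta/(theta+1))."""
    n_theta = n**(theta / (theta + 1.0))
    return mu * np.sqrt(n_theta / np.log(n_theta))

def coeff_hat(x, y, g, wave_jk, j, k, gam):
    """Empirical thresholded coefficient (4.1)/(4.2): the average of
    (Y_i / g(X_i)) wave_{j,k}(X_i), keeping only the terms at most gamma_n."""
    w = y / g(x) * wave_jk(x, j, k)
    return np.mean(w * (np.abs(w) <= gam))

# Usage with the earlier simulated data (g = 1 on [0, 1], theta = 1;
# mu = 2.0 is a hand-picked stand-in for sqrt((C*^2 + E xi_1^2) / c*)):
gam = gamma_n(len(x), theta=1.0, mu=2.0)
beta_hat_00 = coeff_hat(x, y, lambda t: np.ones_like(t), psi_jk, 0, 0, gam)
```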

Remark that $\hat\alpha_{j,k}$ and $\hat\beta_{j,k}$ are thresholded versions of the standard empirical wavelet estimators for (1.1) (see, e.g., [14-16]). Such an observation thresholding was introduced by [13] for (1.1) when $(X_1, Y_1), \ldots, (X_n, Y_n)$ are i.i.d. In our study, it allows us to impose nonrestrictive assumptions on $\xi_1, \ldots, \xi_n$ and to treat the weak dependence of $(X_1, Y_1), \ldots, (X_n, Y_n)$.

Propositions 4.1 and 4.2 below establish moment properties of (4.1) and (4.2).

Proposition 4.1. Consider (1.1) under the assumptions of Section 2. For any integer $j \ge \tau$ and any $k\in\{0,\ldots,2^{j}-1\}$, let $\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx$ and $\hat\beta_{j,k}$ be (4.2). Then there exists a constant $C > 0$ such that

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\big) \le C\,\frac{\ln n_\theta}{n_\theta}.$$

This inequality holds with $\alpha_{j,k} = \int_0^1 f(x)\phi_{j,k}(x)\,dx$ instead of $\beta_{j,k}$ and $\hat\alpha_{j,k}$ defined by (4.1) instead of $\hat\beta_{j,k}$.

Proposition 4.2. Consider (1.1) under the assumptions of Section 2. For any integer $j \ge \tau$ and any $k\in\{0,\ldots,2^{j}-1\}$, let $\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx$ and $\hat\beta_{j,k}$ be (4.2). Then there exists a constant $C > 0$ such that

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^4\big) \le C.$$

Proposition 4.3 below provides a concentration inequality for (4.2).

Proposition 4.3. Consider (1.1) under the assumptions of Section 2. For any integer $j \ge \tau$ and any $k\in\{0,\ldots,2^{j}-1\}$, let $\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx$, $\hat\beta_{j,k}$ be (4.2) and

$$\lambda_n = \mu\sqrt{\frac{\ln n_\theta}{n_\theta}}.$$

Then, for any $\kappa \ge 2 + 16/(3u) + 4\sqrt{(1/u)(16/(9u^2) + 2)}$ with $u = (1/2)(c/8)^{1/(\theta+1)}$, we have

$$\mathbb{P}\big(|\hat\beta_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2\big) \le 2(1 + 4e^{-2}\gamma)\,\frac{1}{n_\theta^{4}}.$$

4.2 Hard thresholding estimator

We define the hard thresholding estimator $\hat f$ by

$$\hat f(x) = \sum_{k=0}^{2^{\tau}-1}\hat\alpha_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\hat\beta_{j,k}\,\mathbb{1}_{\{|\hat\beta_{j,k}| \ge \kappa\lambda_n\}}\,\psi_{j,k}(x), \qquad (4.3)$$

where $\hat\alpha_{\tau,k}$ is defined by (4.1) with $j = \tau$, $\hat\beta_{j,k}$ by (4.2), $j_1$ is the integer satisfying

$$\frac{1}{2}\,n_\theta < 2^{j_1} \le n_\theta,$$

$\kappa \ge 2 + 16/(3u) + 4\sqrt{(1/u)(16/(9u^2) + 2)}$ with $u = (1/2)(c/8)^{1/(\theta+1)}$, and

$$\lambda_n = \mu\sqrt{\frac{\ln n_\theta}{n_\theta}}.$$

The feature of $\hat f$ is to estimate only the "large" unknown wavelet coefficients of $f$, because they are the ones which contain its main characteristics.
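Assembling the previous sketches gives a compact, illustrative implementation of (4.3); here kappa is set to its smallest admissible value for the given u:

```python
import numpy as np

def f_hat(x_eval, x, y, g, theta, mu, c, tau=0):
    """Hard thresholding estimator (4.3) on an evaluation grid x_eval,
    using the Haar helpers phi_jk, psi_jk and coeff_hat sketched above."""
    n = len(x)
    n_theta = n**(theta / (theta + 1.0))
    j1 = int(np.floor(np.log2(n_theta)))            # (1/2) n_theta < 2^{j1} <= n_theta
    lam = mu * np.sqrt(np.log(n_theta) / n_theta)   # lambda_n
    gam = mu * np.sqrt(n_theta / np.log(n_theta))   # gamma_n
    u = 0.5 * (c / 8.0)**(1.0 / (theta + 1.0))
    kappa = 2.0 + 16.0 / (3.0 * u) + 4.0 * np.sqrt((1.0 / u) * (16.0 / (9.0 * u**2) + 2.0))

    est = np.zeros_like(x_eval)
    for k in range(2**tau):                         # coarse part, never thresholded
        est += coeff_hat(x, y, g, phi_jk, tau, k, gam) * phi_jk(x_eval, tau, k)
    for j in range(tau, j1 + 1):                    # detail part, hard thresholded
        for k in range(2**j):
            b = coeff_hat(x, y, g, psi_jk, j, k, gam)
            if abs(b) >= kappa * lam:
                est += b * psi_jk(x_eval, j, k)
    return est

# fh = f_hat(np.linspace(0, 1, 256), x, y, lambda t: np.ones_like(t),
#            theta=1.0, mu=2.0, c=1.0)
```

With the theoretical $\kappa$ the threshold is quite conservative for moderate $n$, so most detail coefficients are killed; simulation studies typically use smaller empirical constants.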


5 Results

Theorem 5.1. Consider (1.1) under the assumptions of Section 2. Let $\hat f$ be (4.3). Suppose that $f\in B^s_{p,r}(M)$ with $r \ge 1$, {$p \ge 2$ and $s > 0$} or {$p\in[1,2)$ and $s > 1/p$}. Then there exists a constant $C > 0$ such that

$$R(\hat f, f) \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)},$$

where $n_\theta = n^{\theta/(\theta+1)}$.

The proof of Theorem 5.1 combines a suitable decomposition of the MISE with the results of Propositions 4.1, 4.2 and 4.3.

If we restrict our study to independent $(X_1, Y_1), \ldots, (X_n, Y_n)$, i.e. $\theta\to\infty$, our rate of convergence becomes $(\ln n/n)^{2s/(2s+1)}$, which is the standard "near optimal" one in the minimax sense. See e.g. [22] and [13].

Note that the rate of convergence $(\ln n_\theta/n_\theta)^{2s/(2s+1)}$ is also achieved by the abstract minimum complexity regression estimator in [28, Theorem 2.1], but for a slightly different regression problem with more restrictions on $(X_1, Y_1), \ldots, (X_n, Y_n)$, $\xi_1, \ldots, \xi_n$ and $f$ (see [28, Section II.B]).
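As a quick numerical reading of the theorem (hypothetical values, not from the paper), the snippet below evaluates the rate of the bound and shows how slower mixing (smaller $\theta$) inflates it through the effective sample size $n_\theta = n^{\theta/(\theta+1)}$:

```python
import numpy as np

def mise_bound_rate(n, theta, s):
    """Rate (ln(n_theta)/n_theta)^(2s/(2s+1)) of Theorem 5.1, up to the constant C."""
    n_theta = n**(theta / (theta + 1.0))
    return (np.log(n_theta) / n_theta)**(2.0 * s / (2.0 * s + 1.0))

# For n = 10^6 and s = 2: theta = 1 works with n_theta = 10^3,
# while a very large theta essentially recovers the i.i.d. rate (ln n / n)^(4/5).
print(mise_bound_rate(1e6, 1.0, 2.0), mise_bound_rate(1e6, 1e9, 2.0))
```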

6 Proofs

In this section, C represents a positive constant which may differ from one term to another.

Proof of Proposition 4.1. For any $i\in\{1,\ldots,n\}$, set

$$W_{i,j,k} = \frac{Y_i}{g(X_i)}\,\psi_{j,k}(X_i).$$

Since $X_1$ and $\xi_1$ are independent and $\mathbb{E}(\xi_1) = 0$, we have

$$\mathbb{E}(W_{1,j,k}) = \mathbb{E}\left(\frac{Y_1}{g(X_1)}\psi_{j,k}(X_1)\right) = \mathbb{E}\left(\frac{f(X_1)}{g(X_1)}\psi_{j,k}(X_1)\right) + \mathbb{E}(\xi_1)\,\mathbb{E}\left(\frac{1}{g(X_1)}\psi_{j,k}(X_1)\right) = \int_0^1 \frac{f(x)}{g(x)}\psi_{j,k}(x)\,g(x)\,dx = \int_0^1 f(x)\psi_{j,k}(x)\,dx = \beta_{j,k}.$$

Hence, since $W_{1,j,k},\ldots,W_{n,j,k}$ are identically distributed,

$$\beta_{j,k} = \mathbb{E}(W_{1,j,k}) = \mathbb{E}\left(\frac{1}{n}\sum_{i=1}^{n} W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\right) + \mathbb{E}\big(W_{1,j,k}\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big). \qquad (6.1)$$

This, with the elementary inequality $(x+y)^2 \le 2(x^2+y^2)$, $(x,y)\in\mathbb{R}^2$, implies that

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\big) \le 2(A + B), \qquad (6.2)$$

where

$$A = \mathbb{E}\left(\left(\frac{1}{n}\sum_{i=1}^{n}\Big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}} - \mathbb{E}\big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\big)\Big)\right)^{2}\right) \quad\text{and}\quad B = \Big(\mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big)\Big)^{2}.$$

Let us bound $B$ and $A$.

Upper bound for $B$. Since $\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}} \le |W_{1,j,k}|/\gamma_n$, we have

$$\mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big) \le \frac{\mathbb{E}(W_{1,j,k}^2)}{\gamma_n}.$$

Let us now bound $\mathbb{E}(W_{1,j,k}^2)$. Since $X_1$ and $\xi_1$ are independent and $\mathbb{E}(\xi_1) = 0$, we have

$$\mathbb{E}(W_{1,j,k}^2) = \mathbb{E}\left(\frac{Y_1^2}{g^2(X_1)}\psi_{j,k}^2(X_1)\right) = \mathbb{E}\left(\frac{f^2(X_1)}{g^2(X_1)}\psi_{j,k}^2(X_1)\right) + \mathbb{E}(\xi_1^2)\,\mathbb{E}\left(\frac{1}{g^2(X_1)}\psi_{j,k}^2(X_1)\right).$$

Using (2.2), (2.3) and $\int_0^1\psi_{j,k}^2(x)\,dx = 1$, we obtain

$$\mathbb{E}\left(\frac{f^2(X_1)}{g^2(X_1)}\psi_{j,k}^2(X_1)\right) \le C_*^2\int_0^1\frac{1}{g(x)}\psi_{j,k}^2(x)\,dx \le \frac{C_*^2}{c_*}\int_0^1\psi_{j,k}^2(x)\,dx = \frac{C_*^2}{c_*}.$$

And, in a similar way,

$$\mathbb{E}\left(\frac{1}{g^2(X_1)}\psi_{j,k}^2(X_1)\right) = \int_0^1\frac{1}{g(x)}\psi_{j,k}^2(x)\,dx \le \frac{1}{c_*}.$$

So

$$\mathbb{E}(W_{1,j,k}^2) \le \frac{1}{c_*}\big(C_*^2 + \mathbb{E}(\xi_1^2)\big) = \mu^2 = C. \qquad (6.3)$$

Therefore

$$B = \Big(\mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big)\Big)^{2} \le \left(\frac{\mathbb{E}(W_{1,j,k}^2)}{\gamma_n}\right)^{2} \le C\,\frac{1}{\gamma_n^2} \le C\,\frac{\ln n_\theta}{n_\theta}. \qquad (6.4)$$

Upper bound for $A$. We have

$$A = \mathbb{V}\left(\frac{1}{n}\sum_{i=1}^{n} W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\right) = \frac{1}{n^2}\sum_{v=1}^{n}\sum_{\ell=1}^{n}\mathbb{C}\big(W_{v,j,k}\,\mathbb{1}_{\{|W_{v,j,k}|\le\gamma_n\}},\, W_{\ell,j,k}\,\mathbb{1}_{\{|W_{\ell,j,k}|\le\gamma_n\}}\big)$$

$$\le \frac{1}{n}\,\mathbb{V}\big(W_{1,j,k}\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}}\big) + \frac{2}{n^2}\left|\sum_{v=2}^{n}\sum_{\ell=1}^{v-1}\mathbb{C}\big(W_{v,j,k}\,\mathbb{1}_{\{|W_{v,j,k}|\le\gamma_n\}},\, W_{\ell,j,k}\,\mathbb{1}_{\{|W_{\ell,j,k}|\le\gamma_n\}}\big)\right|. \qquad (6.5)$$

Using (6.3), we obtain

$$\mathbb{V}\big(W_{1,j,k}\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}}\big) \le \mathbb{E}\big(W_{1,j,k}^2\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}}\big) \le \mathbb{E}(W_{1,j,k}^2) \le \mu^2 = C. \qquad (6.6)$$

It follows from the stationarity of $(X_i, Y_i)_{i\in\mathbb{Z}}$ that

$$\left|\sum_{v=2}^{n}\sum_{\ell=1}^{v-1}\mathbb{C}(\cdot,\cdot)\right| = \left|\sum_{m=1}^{n}(n-m)\,\mathbb{C}\big(W_{0,j,k}\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}},\, W_{m,j,k}\,\mathbb{1}_{\{|W_{m,j,k}|\le\gamma_n\}}\big)\right| \le n\sum_{m=1}^{n}\left|\mathbb{C}\big(W_{0,j,k}\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}},\, W_{m,j,k}\,\mathbb{1}_{\{|W_{m,j,k}|\le\gamma_n\}}\big)\right|.$$

By the Davydov inequality for strongly mixing processes (see [12]), the inequality $|W_{0,j,k}|\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}} \le \min(\gamma_n, |W_{0,j,k}|)$ and (6.3), for any $q\in(0,1)$, we have

$$\left|\mathbb{C}\big(W_{0,j,k}\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}},\, W_{m,j,k}\,\mathbb{1}_{\{|W_{m,j,k}|\le\gamma_n\}}\big)\right| \le 10\, a_m^{q}\Big(\mathbb{E}\big(|W_{0,j,k}|^{2/(1-q)}\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}}\big)\Big)^{1-q} = 10\, a_m^{q}\Big(\mathbb{E}\big(|W_{0,j,k}|^{2q/(1-q)}\,\mathbb{1}_{\{|W_{0,j,k}|\le\gamma_n\}}\, W_{0,j,k}^2\big)\Big)^{1-q}$$

$$\le C\, a_m^{q}\,\big(\gamma_n^{2q/(1-q)}\big)^{1-q}\,\big(\mathbb{E}(W_{0,j,k}^2)\big)^{1-q} \le C\, a_m^{q}\left(\frac{n_\theta}{\ln n_\theta}\right)^{q}.$$

Due to (2.1), we have $\sum_{m=1}^{n} a_m^{q} \le \gamma\sum_{m=1}^{\infty}\exp\big(-cq\,m^{\theta}\big) = C$. So

$$\left|\sum_{v=2}^{n}\sum_{\ell=1}^{v-1}\mathbb{C}(\cdot,\cdot)\right| \le C\, n\left(\frac{n_\theta}{\ln n_\theta}\right)^{q}. \qquad (6.7)$$

Taking $q = 1/\theta$, we have $n_\theta^{q}/n = 1/n_\theta$. It follows from (6.5), (6.6) and (6.7) that

$$A \le C\left(\frac{1}{n} + \frac{1}{n^2}\, n\left(\frac{n_\theta}{\ln n_\theta}\right)^{q}\right) \le C\,\frac{n_\theta^{q}}{n} = C\,\frac{1}{n_\theta} \le C\,\frac{\ln n_\theta}{n_\theta}. \qquad (6.8)$$

It follows from (6.2), (6.4) and (6.8) that

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\big) \le C\,\frac{\ln n_\theta}{n_\theta}. \qquad (6.9)$$

Replacing $\psi$ by $\phi$ in the previous proof, one can show that (6.9) holds with $\alpha_{j,k} = \int_0^1 f(x)\phi_{j,k}(x)\,dx$ instead of $\beta_{j,k}$ and $\hat\alpha_{j,k}$ defined by (4.1) instead of $\hat\beta_{j,k}$. The proof of Proposition 4.1 is complete. □

Proof of Proposition 4.2. We have $|\hat\beta_{j,k} - \beta_{j,k}| \le |\hat\beta_{j,k}| + |\beta_{j,k}|$, with

$$|\hat\beta_{j,k}| \le \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_i}{g(X_i)}\psi_{j,k}(X_i)\right|\mathbb{1}_{\left\{\left|\frac{Y_i}{g(X_i)}\psi_{j,k}(X_i)\right|\le\gamma_n\right\}} \le \gamma_n = \mu\sqrt{\frac{n_\theta}{\ln n_\theta}}.$$

It follows from (2.2), the Cauchy-Schwarz inequality and $\int_0^1\psi_{j,k}^2(x)\,dx = 1$ that

$$|\beta_{j,k}| \le \int_0^1 |f(x)|\,|\psi_{j,k}(x)|\,dx \le C_*\int_0^1|\psi_{j,k}(x)|\,dx \le C_*\left(\int_0^1\psi_{j,k}^2(x)\,dx\right)^{1/2} = C_* \le C\sqrt{\frac{n_\theta}{\ln n_\theta}}.$$

Therefore

$$|\hat\beta_{j,k} - \beta_{j,k}| \le C\sqrt{\frac{n_\theta}{\ln n_\theta}}. \qquad (6.10)$$

By (6.10) and Proposition 4.1, we have

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^4\big) \le C\,\frac{n_\theta}{\ln n_\theta}\,\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\big) \le C\,\frac{n_\theta}{\ln n_\theta}\,\frac{\ln n_\theta}{n_\theta} = C. \qquad \square$$

Proof of Proposition 4.3. For any $i\in\{1,\ldots,n\}$, set $W_{i,j,k} = \frac{Y_i}{g(X_i)}\psi_{j,k}(X_i)$. By (6.1), we have

$$|\hat\beta_{j,k} - \beta_{j,k}| = \left|\frac{1}{n}\sum_{i=1}^{n}\Big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}} - \mathbb{E}\big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\big)\Big) - \mathbb{E}\big(W_{1,j,k}\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big)\right|$$

$$\le \left|\frac{1}{n}\sum_{i=1}^{n}\Big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}} - \mathbb{E}\big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\big)\Big)\right| + \mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big).$$

Using (6.3), we obtain

$$\mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|>\gamma_n\}}\big) \le \frac{\mathbb{E}(W_{1,j,k}^2)}{\gamma_n} \le \mu^2\,\frac{1}{\mu}\sqrt{\frac{\ln n_\theta}{n_\theta}} = \lambda_n.$$

Hence

$$\mathbb{P}\big(|\hat\beta_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2\big) \le \mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}\Big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}} - \mathbb{E}\big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\big)\Big)\right| \ge (\kappa/2 - 1)\lambda_n\right). \qquad (6.11)$$

Let us now present a Bernstein inequality for exponentially strongly mixing processes. This is a slightly modified version of [28, Theorem 4.2].

Lemma 6.1 ([28]). Let $\gamma > 0$, $c > 0$, $\theta > 1$ and $(S_i)_{i\in\mathbb{Z}}$ be a stationary process such that, for any $m\in\mathbb{Z}$, the associated $m$-th strongly mixing coefficient satisfies

$$a_m \le \gamma\exp\big(-c|m|^{\theta}\big).$$

Let $n\in\mathbb{N}^*$, $h : \mathbb{R}\to\mathbb{R}$ be a measurable function and, for any $i\in\mathbb{Z}$, $U_i = h(S_i)$. We assume that $\mathbb{E}(U_1) = 0$ and that there exists a constant $M > 0$ satisfying $|U_1| \le M < \infty$. Then, for any $\lambda > 0$, we have

$$\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n} U_i\right| \ge \lambda\right) \le 2(1 + 4e^{-2}\gamma)\exp\left(-\frac{u\,\lambda^2\, n^{\theta/(\theta+1)}}{2\big(\mathbb{E}(U_1^2) + \lambda M/3\big)}\right),$$

where $u = (1/2)(c/8)^{1/(\theta+1)}$.

Set, for any $i\in\{1,\ldots,n\}$,

$$U_{i,j,k} = W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}} - \mathbb{E}\big(W_{i,j,k}\,\mathbb{1}_{\{|W_{i,j,k}|\le\gamma_n\}}\big).$$

Then $U_{1,j,k},\ldots,U_{n,j,k}$ are identically distributed, depend on the stationary strongly mixing process $(X_i, Y_i)_{i\in\mathbb{Z}}$ which satisfies (2.1), $\mathbb{E}(U_{1,j,k}) = 0$,

$$|U_{1,j,k}| \le |W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}} + \mathbb{E}\big(|W_{1,j,k}|\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}}\big) \le 2\gamma_n$$

and, by (6.3),

$$\mathbb{E}(U_{1,j,k}^2) = \mathbb{V}\big(W_{1,j,k}\,\mathbb{1}_{\{|W_{1,j,k}|\le\gamma_n\}}\big) \le \mathbb{E}(W_{1,j,k}^2) \le \mu^2.$$

It follows from Lemma 6.1 that

$$\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n} U_{i,j,k}\right| \ge (\kappa/2 - 1)\lambda_n\right) \le 2(1 + 4e^{-2}\gamma)\exp\left(-\frac{u(\kappa/2 - 1)^2\lambda_n^2\, n_\theta}{2\big(\mu^2 + 2(\kappa/2 - 1)\lambda_n\gamma_n/3\big)}\right). \qquad (6.12)$$

We have

$$\lambda_n\gamma_n = \mu\sqrt{\frac{\ln n_\theta}{n_\theta}}\;\mu\sqrt{\frac{n_\theta}{\ln n_\theta}} = \mu^2, \qquad \lambda_n^2 = \mu^2\,\frac{\ln n_\theta}{n_\theta}.$$

Combining (6.11) and (6.12), for any $\kappa \ge 2 + 16/(3u) + 4\sqrt{(1/u)(16/(9u^2) + 2)}$, we have

$$\mathbb{P}\big(|\hat\beta_{j,k} - \beta_{j,k}| \ge \kappa\lambda_n/2\big) \le 2(1 + 4e^{-2}\gamma)\exp\left(-\frac{u(\kappa/2 - 1)^2\ln n_\theta}{2\big(1 + 2(\kappa/2 - 1)/3\big)}\right) = 2(1 + 4e^{-2}\gamma)\, n_\theta^{-u(\kappa/2-1)^2/(2(1 + 2(\kappa/2-1)/3))} \le 2(1 + 4e^{-2}\gamma)\,\frac{1}{n_\theta^{4}}.$$

This ends the proof of Proposition 4.3. □

Proof of Theorem 5.1. We expand the function $f$ on $\mathcal{B}$ as

$$f(x) = \sum_{k=0}^{2^{\tau}-1}\alpha_{\tau,k}\phi_{\tau,k}(x) + \sum_{j=\tau}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}\psi_{j,k}(x), \qquad x\in[0,1],$$

where $\alpha_{\tau,k} = \int_0^1 f(x)\phi_{\tau,k}(x)\,dx$ and $\beta_{j,k} = \int_0^1 f(x)\psi_{j,k}(x)\,dx$. We have, for any $x\in[0,1]$,

$$\hat f(x) - f(x) = \sum_{k=0}^{2^{\tau}-1}(\hat\alpha_{\tau,k} - \alpha_{\tau,k})\phi_{\tau,k}(x) + \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\big(\hat\beta_{j,k}\,\mathbb{1}_{\{|\hat\beta_{j,k}|\ge\kappa\lambda_n\}} - \beta_{j,k}\big)\psi_{j,k}(x) - \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}\psi_{j,k}(x).$$

Since $\mathcal{B}$ is an orthonormal basis of $\mathbb{L}^2([0,1])$, we can write

$$R(\hat f, f) = \mathbb{E}\left(\int_0^1\big(\hat f(x) - f(x)\big)^2\, dx\right) = R + S + T, \qquad (6.13)$$

where

$$R = \sum_{k=0}^{2^{\tau}-1}\mathbb{E}\big((\hat\alpha_{\tau,k} - \alpha_{\tau,k})^2\big), \qquad S = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big((\hat\beta_{j,k}\,\mathbb{1}_{\{|\hat\beta_{j,k}|\ge\kappa\lambda_n\}} - \beta_{j,k})^2\big), \qquad T = \sum_{j=j_1+1}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2.$$

Let us bound R, T and S.

Upper bound for $R$. Using Proposition 4.1 and $2s/(2s+1) < 1$, we obtain

$$R \le C\,2^{\tau}\,\frac{\ln n_\theta}{n_\theta} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.14)$$

Upper bound for $T$. For $r \ge 1$ and $p \ge 2$, we have $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$. Since $2s/(2s+1) < 2s$, we have

$$T \le C\sum_{j=j_1+1}^{\infty}2^{-2js} \le C\,2^{-2j_1 s} \le C\, n_\theta^{-2s} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

For $r \ge 1$ and $p\in[1,2)$, we have $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$. Since $s > 1/p$, we have $s + 1/2 - 1/p > s/(2s+1)$. So

$$T \le C\sum_{j=j_1+1}^{\infty}2^{-2j(s+1/2-1/p)} \le C\,2^{-2j_1(s+1/2-1/p)} \le C\, n_\theta^{-2(s+1/2-1/p)} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2(s+1/2-1/p)} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

Hence, for $r \ge 1$, {$p \ge 2$ and $s > 0$} or {$p\in[1,2)$ and $s > 1/p$}, we have

$$T \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.15)$$

Upper bound for $S$. We have

$$S = S_1 + S_2 + S_3 + S_4, \qquad (6.16)$$

where

$$S_1 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\,\mathbb{1}_{\{|\hat\beta_{j,k}|\ge\kappa\lambda_n\}}\,\mathbb{1}_{\{|\beta_{j,k}|<\kappa\lambda_n/2\}}\big), \qquad S_2 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\,\mathbb{1}_{\{|\hat\beta_{j,k}|\ge\kappa\lambda_n\}}\,\mathbb{1}_{\{|\beta_{j,k}|\ge\kappa\lambda_n/2\}}\big),$$

$$S_3 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big(\beta_{j,k}^2\,\mathbb{1}_{\{|\hat\beta_{j,k}|<\kappa\lambda_n\}}\,\mathbb{1}_{\{|\beta_{j,k}|\ge2\kappa\lambda_n\}}\big), \qquad S_4 = \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big(\beta_{j,k}^2\,\mathbb{1}_{\{|\hat\beta_{j,k}|<\kappa\lambda_n\}}\,\mathbb{1}_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}\big).$$

Let us investigate the bounds of $S_1$, $S_2$, $S_3$ and $S_4$.

Upper bounds for $S_1$ and $S_3$. We have

$$\{|\hat\beta_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n\} \subseteq \{|\hat\beta_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\}, \qquad \{|\hat\beta_{j,k}| \ge \kappa\lambda_n,\ |\beta_{j,k}| < \kappa\lambda_n/2\} \subseteq \{|\hat\beta_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\}$$

and

$$\{|\hat\beta_{j,k}| < \kappa\lambda_n,\ |\beta_{j,k}| \ge 2\kappa\lambda_n\} \subseteq \{|\beta_{j,k}| \le 2|\hat\beta_{j,k} - \beta_{j,k}|\}.$$

So

$$\max(S_1, S_3) \le C\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\,\mathbb{1}_{\{|\hat\beta_{j,k} - \beta_{j,k}|>\kappa\lambda_n/2\}}\big).$$

It follows from the Cauchy-Schwarz inequality and Propositions 4.2 and 4.3 that

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\,\mathbb{1}_{\{|\hat\beta_{j,k} - \beta_{j,k}|>\kappa\lambda_n/2\}}\big) \le \Big(\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^4\big)\Big)^{1/2}\Big(\mathbb{P}\big(|\hat\beta_{j,k} - \beta_{j,k}| > \kappa\lambda_n/2\big)\Big)^{1/2} \le C\left(\frac{1}{n_\theta^{4}}\right)^{1/2} = C\,\frac{1}{n_\theta^{2}}.$$

Since $2s/(2s+1) < 1$, we have

$$\max(S_1, S_3) \le C\,\frac{1}{n_\theta^{2}}\sum_{j=\tau}^{j_1}2^{j} \le C\,\frac{1}{n_\theta^{2}}\,2^{j_1} \le C\,\frac{1}{n_\theta} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.17)$$

Upper bound for $S_2$. Using again Proposition 4.1, we obtain

$$\mathbb{E}\big((\hat\beta_{j,k} - \beta_{j,k})^2\big) \le C\,\frac{\ln n_\theta}{n_\theta}.$$

Hence

$$S_2 \le C\,\frac{\ln n_\theta}{n_\theta}\sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{1}_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}.$$

Let $j_2$ be the integer defined by

$$\frac{1}{2}\left(\frac{n_\theta}{\ln n_\theta}\right)^{1/(2s+1)} < 2^{j_2} \le \left(\frac{n_\theta}{\ln n_\theta}\right)^{1/(2s+1)}. \qquad (6.18)$$

We have $S_2 \le S_{2,1} + S_{2,2}$, where

$$S_{2,1} = C\,\frac{\ln n_\theta}{n_\theta}\sum_{j=\tau}^{j_2}\sum_{k=0}^{2^{j}-1}\mathbb{1}_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}, \qquad S_{2,2} = C\,\frac{\ln n_\theta}{n_\theta}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1}\mathbb{1}_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}}.$$

We have

$$S_{2,1} \le C\,\frac{\ln n_\theta}{n_\theta}\sum_{j=\tau}^{j_2}2^{j} \le C\,\frac{\ln n_\theta}{n_\theta}\,2^{j_2} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$ and $\mathbb{1}_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}} \le C\beta_{j,k}^2/\lambda_n^2$, we have

$$S_{2,2} \le C\,\frac{\ln n_\theta}{n_\theta\lambda_n^2}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2 \le C\sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2 \le C\,2^{-2j_2 s} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

For $r \ge 1$, $p\in[1,2)$ and $s > 1/p$, using $\mathbb{1}_{\{|\beta_{j,k}|>\kappa\lambda_n/2\}} \le C|\beta_{j,k}|^{p}/\lambda_n^{p}$, $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and $(2s+1)(2-p)/2 + (s+1/2-1/p)p = 2s$, we have

$$S_{2,2} \le C\,\frac{\ln n_\theta}{n_\theta\lambda_n^{p}}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1}|\beta_{j,k}|^{p} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty}2^{-j(s+1/2-1/p)p} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{(2-p)/2}2^{-j_2(s+1/2-1/p)p} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

So, for $r \ge 1$, {$p \ge 2$ and $s > 0$} or {$p\in[1,2)$ and $s > 1/p$}, we have

$$S_2 \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.19)$$

Upper bound for $S_4$. We have

$$S_4 \le \sum_{j=\tau}^{j_1}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2\,\mathbb{1}_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}.$$

Let $j_2$ be the integer defined in (6.18). Then $S_4 \le S_{4,1} + S_{4,2}$, where

$$S_{4,1} = \sum_{j=\tau}^{j_2}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2\,\mathbb{1}_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}, \qquad S_{4,2} = \sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2\,\mathbb{1}_{\{|\beta_{j,k}|<2\kappa\lambda_n\}}.$$

We have

$$S_{4,1} \le C\sum_{j=\tau}^{j_2}2^{j}\lambda_n^2 = C\,\frac{\ln n_\theta}{n_\theta}\sum_{j=\tau}^{j_2}2^{j} \le C\,\frac{\ln n_\theta}{n_\theta}\,2^{j_2} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

For $r \ge 1$ and $p \ge 2$, since $B^s_{p,r}(M) \subseteq B^s_{2,\infty}(M)$, we have

$$S_{4,2} \le \sum_{j=j_2+1}^{\infty}\sum_{k=0}^{2^{j}-1}\beta_{j,k}^2 \le C\,2^{-2j_2 s} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

For $r \ge 1$, $p\in[1,2)$ and $s > 1/p$, using $\beta_{j,k}^2\,\mathbb{1}_{\{|\beta_{j,k}|<2\kappa\lambda_n\}} \le C\lambda_n^{2-p}|\beta_{j,k}|^{p}$, $B^s_{p,r}(M) \subseteq B^{s+1/2-1/p}_{2,\infty}(M)$ and $(2s+1)(2-p)/2 + (s+1/2-1/p)p = 2s$, we have

$$S_{4,2} \le C\lambda_n^{2-p}\sum_{j=j_2+1}^{j_1}\sum_{k=0}^{2^{j}-1}|\beta_{j,k}|^{p} = C\left(\frac{\ln n_\theta}{n_\theta}\right)^{(2-p)/2}\sum_{j=j_2+1}^{\infty}2^{-j(s+1/2-1/p)p} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{(2-p)/2}2^{-j_2(s+1/2-1/p)p} \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

So, for $r \ge 1$, {$p \ge 2$ and $s > 0$} or {$p\in[1,2)$ and $s > 1/p$}, we have

$$S_4 \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.20)$$

It follows from (6.16), (6.17), (6.19) and (6.20) that

$$S \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}. \qquad (6.21)$$

Combining (6.13), (6.14), (6.15) and (6.21), we have, for $r \ge 1$, {$p \ge 2$ and $s > 0$} or {$p\in[1,2)$ and $s > 1/p$},

$$R(\hat f, f) \le C\left(\frac{\ln n_\theta}{n_\theta}\right)^{2s/(2s+1)}.$$

The proof of Theorem 5.1 is complete. □


References

[1] A. Antoniadis, (1997). Wavelets in statistics: a review (with discussion), Journal of the Italian Statistical Society Series B, 6, 97-144.

[2] A. Antoniadis, J. Bigot and T. Sapatinas, (2001). Wavelet estimators in nonparametric regression: a comparative simulation study, Journal of Statistical Software, 6, i06.

[3] A. Antoniadis, D. Leporini and J.-C. Pesquet, (2002). Wavelet thresholding for some classes of non-Gaussian noise, Statistica Neerlandica, 56, 4, 434-453.

[4] R. Averkamp and C. Houdré, (2005). Wavelet thresholding for non-necessarily Gaussian noise: functionality, The Annals of Statistics, 33, 2164-2193.

[5] T. Cai, (1999). Adaptive wavelet estimation: a block thresholding and oracle inequality approach, The Annals of Statistics, 27, 898-924.

[6] T. Cai, (2002). On block thresholding in wavelet regression: adaptivity, block size and threshold level, Statistica Sinica, 12, 1241-1273.

[7] T. Cai and L. D. Brown, (1998). Wavelet shrinkage for nonequispaced samples, The Annals of Statistics, 26, 1783-1799.

[8] T. Cai and L. D. Brown, (1999). Wavelet estimation for samples with random uniform design, Statistics and Probability Letters, 42, 313-321.

[9] C. Chesneau, (2007). Wavelet block thresholding for samples with random design: a minimax approach under the Lp risk, Electronic Journal of Statistics, 1, 331-346.

[10] E. Chicken, (2003). Block thresholding and wavelet estimation for nonequispaced samples, Journal of Statistical Planning and Inference, 116, 113-129.

[11] A. Cohen, I. Daubechies, B. Jawerth and P. Vial, (1993). Wavelets on the interval and fast wavelet transforms, Applied and Computational Harmonic Analysis, 24, 1, 54-81.

[12] Y. Davydov, (1970). The invariance principle for stationary processes, Theory of Probability and its Applications, 15, 3, 498-509.

[13] B. Delyon and A. Juditsky, (1996). On minimax wavelet estimators, Applied and Computational Harmonic Analysis, 3, 215-228.

[14] D. L. Donoho and I. M. Johnstone, (1994). Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81, 425-455.

[15] D. L. Donoho and I. M. Johnstone, (1995). Adapting to unknown smoothness via wavelet shrinkage, Journal of the American Statistical Association, 90, 432, 1200-1224.

[16] D. L. Donoho and I. M. Johnstone, (1998). Minimax estimation via wavelet shrinkage, The Annals of Statistics, 26, 879-921.

[17] D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard, (1995). Wavelet shrinkage: asymptopia? (with discussion), Journal of the Royal Statistical Society Series B, 57, 301-369.

[18] H. Doosti, M. Afshari and H. A. Niroumand, (2008). Wavelets for nonparametric stochastic regression with mixing stochastic process, Communications in Statistics - Theory and Methods, 37, 3, 373-385.

[19] H. Doosti and H. A. Niroumand, (2009). Multivariate stochastic regression estimation by wavelets for stationary time series, Pakistan Journal of Statistics, 27, 1, 37-46.

[20] P. Doukhan, (1994). Mixing: Properties and Examples, Lecture Notes in Statistics 85, Springer Verlag, New York.

[21] P. Hall and B. A. Turlach, (1997). Interpolation methods for nonlinear wavelet regression with irregularly spaced design, The Annals of Statistics, 25, 1912-1925.

[22] W. Härdle, G. Kerkyacharian, D. Picard and A. Tsybakov, (1998). Wavelets, Approximation and Statistical Applications, Lecture Notes in Statistics 129, Springer Verlag, New York.

[23] G. Kerkyacharian and D. Picard, (2004). Regression in random design and warped wavelets, Bernoulli, 10, 6, 1053-1105.

[24] R. Kulik and M. Raimondo, (2009). Lp wavelet regression with correlated errors and inverse problems, Statistica Sinica, 19, 1479-1489.

[25] L. Li, J. Liu and Y. Xiao, (2009). On wavelet regression with long memory infinite moving average errors, Journal of Applied Probability and Statistics, 4, 183-211.

[26] E. Masry, (2000). Wavelet-based estimation of multivariate regression functions in Besov spaces, Journal of Nonparametric Statistics, 12, 2, 283-308.

[27] Y. Meyer, (1990). Ondelettes et Opérateurs, Hermann, Paris.

[28] D. Modha and E. Masry, (1996). Minimum complexity regression estimation with weakly dependent observations, IEEE Transactions on Information Theory, 42, 2133-2145.

[29] M. Neumann and V. Spokoiny, (1995). On the efficiency of wavelet estimators under arbitrary error distributions, Mathematical Methods of Statistics, 4, 137-166.

[30] M. Pensky and T. Sapatinas, (2007). Frequentist optimality of Bayes factor estimators in wavelet regression models, Statistica Sinica, 17, 599-633.

[31] A. Tsybakov, (2004). Introduction à l'estimation non-paramétrique, Springer Verlag, Berlin.

[32] C. S. Withers, (1981). Conditions for linear processes to be strong-mixing, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57, 477-480.

[33] S. Zhang and Z. Zheng, (1999). Nonlinear wavelet estimation of regression function with random design, Science in China Series A: Mathematics, 42, 8, 825-833.
