Adaptive nonparametric estimation in heteroscedastic regression models. Part 1: Sharp non-asymptotic Oracle inequalities.


HAL Id: hal-00269196

https://hal.archives-ouvertes.fr/hal-00269196

Submitted on 10 Apr 2008


Leonid Galtchouk, Serguey Pergamenshchikov

To cite this version:

Leonid Galtchouk, Serguey Pergamenshchikov. Adaptive nonparametric estimation in heteroscedastic regression models. Part 1: Sharp non-asymptotic Oracle inequalities. Journal of Nonparametric Statistics, American Statistical Association, 2009, 21 (1), pp. 1-16. ⟨10.1080/10485250802504096⟩. ⟨hal-00269196⟩


hal-00269196, version 2 - 10 Apr 2008

Adaptive nonparametric estimation in heteroscedastic regression models.

Part 1: Sharp non-asymptotic oracle inequalities. ^∗

L. Galtchouk ^† S. Pergamenshchikov ^‡

Abstract

An adaptive nonparametric estimation procedure is constructed for heteroscedastic regression when the noise variance depends on the unknown regression. A non-asymptotic upper bound for a quadratic risk (oracle inequality) is obtained.

Keywords: Adaptive estimation; Heteroscedastic regression; Nonasymptotic estimation; Nonparametric estimation; Oracle inequality.

AMS 2000 Subject Classifications: Primary: 62G08; Secondary: 62G05, 62G20

∗ The second author is partially supported by the RFFI-Grant 04-01-00855.

† Department of Mathematics, Strasbourg University, 7, rue René Descartes, 67084 Strasbourg, France, e-mail: [email protected]

‡ Laboratoire de Mathématiques Raphaël Salem, Avenue de l'Université, BP. 12, Université de Rouen, F76801 Saint Etienne du Rouvray Cedex, France, e-mail: [email protected]


1 Introduction

Suppose we are given observations $(y_j)_{1\le j\le n}$ which obey the heteroscedastic regression equation

$$ y_j = S(x_j) + \sigma_j(S)\,\xi_j , \tag{1.1} $$

where the design points are $x_j = j/n$, $S(\cdot)$ is an unknown function to be estimated, $(\xi_j)_{1\le j\le n}$ is a sequence of i.i.d. random variables, and $(\sigma_j(S))_{1\le j\le n}$ are unknown volatility coefficients depending on the unknown regression function $S$.

Models of type (1.1) with $\sigma_j(S) = \sigma_j(x_j)$ were introduced in Akritas, Van Keilegom (2001) as a generalisation of the nonparametric ANCOVA model of Young and Bowman (1995). It should be noted that heteroscedastic regressions with this type of volatility coefficients have been encountered in econometric studies, namely, in consumer budget studies utilizing observations on individuals with diverse incomes and in analyses of the investment behavior of firms of different sizes (see Goldfeld, Quandt, 1972). For example, for consumer budget problems one uses there (see p. 83) a parametric version of model (1.1) with the volatility coefficient defined as

$$ \sigma_j^2(S) = c_0 + c_1 x_j + c_2 S^2(x_j) , \tag{1.2} $$

where $c_0$, $c_1$ and $c_2$ are some positive unknown constants.
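As an illustration of model (1.1) with the volatility (1.2), the following Python sketch draws one sample path. The Gaussian noise, the constants `c0`, `c1`, `c2`, the test function and the name `simulate` are assumptions made only for this example; the paper requires merely i.i.d. noise with zero mean, unit variance and a finite fourth moment.

```python
import math
import random


def simulate(n, S, c0=0.1, c1=0.1, c2=0.1, seed=0):
    """Draw y_1..y_n from model (1.1) with the volatility (1.2).

    The noise is standard Gaussian here, which is an assumption of this
    sketch: it has mean 0, variance 1 and a finite fourth moment.
    """
    rng = random.Random(seed)
    xs = [j / n for j in range(1, n + 1)]        # design points x_j = j/n
    ys = []
    for x in xs:
        sigma2 = c0 + c1 * x + c2 * S(x) ** 2    # sigma_j^2(S) as in (1.2)
        ys.append(S(x) + math.sqrt(sigma2) * rng.gauss(0.0, 1.0))
    return xs, ys


# Hypothetical regression function, chosen only for illustration.
xs, ys = simulate(101, lambda x: math.sin(2.0 * math.pi * x))
```

The sample size 101 is odd on purpose, since Section 2 works with an odd number of observations.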

Moreover, this regression model appears in the drift estimation problem for stochastic differential equations when one passes from a continuous time to a discrete time model by making use of sequential kernel estimators having asymptotically minimal variances (see Galtchouk, Pergamenshchikov, 2004; 2006; 2007a; 2007b).

The volatility coefficient estimation in heteroscedastic regression was considered in a few papers (see, for example, Cai, Wang, 2008 and the references therein). In that paper, by making use of the squared first-order differences of the observations, the initial problem was reduced to regression function estimation in a model of type (1.1).

In this paper we develop the approach proposed in Galtchouk, Pergamenshchikov (2005). The first goal of the research is to construct an adaptive procedure based on observations $(y_j)_{1\le j\le n}$ for estimating the function $S$ and to obtain a sharp non-asymptotic upper bound (oracle inequality) for a quadratic risk in the case when the smoothness of $S$ is unknown. The second goal is to prove that the constructed procedure is efficient also in the asymptotic setup.


Problems of constructing a nonparametric estimator and proving a non-asymptotic upper bound for a risk in the homoscedastic model, that is when $\sigma_j(S) \equiv \sigma$, have been studied in a few papers. A non-asymptotic upper bound for a quadratic risk over thresholding estimators is given in Kalifa, Mallat (2003). In the papers by Barron, Birgé, Massart (1999) and Massart (2004) an adaptive model selection procedure has been constructed. It is based on least squares estimators, and a non-asymptotic upper bound has been obtained for a quadratic risk which is best in the principal term for the given class of estimators when the noise vector $(\xi_1, \ldots, \xi_n)$ is gaussian. Upper bounds of this type are called oracle inequalities. In Fourdrinier, Pergamenshchikov (2007) the oracle inequality has been obtained for a model selection procedure based on any estimators in the case when the noise vector $(\xi_1, \ldots, \xi_n)$ has a spherically symmetric distribution. Moreover, some sharp oracle inequalities have also been obtained for homoscedastic regression with gaussian noises, see, for example, Kneip (1994). Here the adjective "sharp" means that the coefficient of the principal term may be chosen as close to unity as desired.

In this paper an adaptive procedure is constructed for heteroscedastic regression, and the sharp non-asymptotic oracle inequality is proved for it. It should be noted that the methods used in former papers to obtain the sharp oracle inequality in regression models are limited to the homoscedastic case since they are based on the fact that an orthogonal transformation of a gaussian noise vector $(\xi_1, \ldots, \xi_n)$ gives a gaussian vector. In the heteroscedastic regression models under consideration these methods are not valid since the noise vector is not gaussian. To obtain sharp non-asymptotic oracle inequalities in the heteroscedastic case the authors develop new mathematical tools based on "penalty" methods and Pinsker-type weights.

Moreover, in Galtchouk, Pergamenshchikov (2007c) we show that the given adaptive estimator is efficient in the asymptotic sense, that is, the sharp asymptotic lower bound is proved for a quadratic risk and it is attained over this estimator.

The paper is organized as follows. In Section 2 we construct an adaptive estimation procedure based on weighted least squares estimators and we obtain a non-asymptotic upper bound for the quadratic risk. In Section 3 we propose an estimator for the summarized noise variance and give the oracle inequality in the case of the Sobolev space, $S \in W_r^k$. The proofs are given in Section 4. The Appendix contains some technical results.


2 Oracle inequality

In this paper we study the non-asymptotic estimation problem of the function $S$ in the model (1.1) by observations $(y_j)_{1\le j\le n}$ with odd sample number $n$.

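We assume that in (1.1) the sequence $(\xi_j)_{1\le j\le n}$ is i.i.d. with

$$ \mathbf{E}\,\xi_1 = 0 , \quad \mathbf{E}\,\xi_1^2 = 1 \quad\text{and}\quad \mathbf{E}\,\xi_1^4 = \xi^* < \infty . \tag{2.1} $$

In the sequel we denote $\overline{\xi} = \sqrt{\xi^* - 1}$.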

Moreover, we assume that $(\sigma_l(S))_{1\le l\le n}$ is a sequence of positive random variables independent of $(\xi_i)_{1\le i\le n}$ and bounded away from $+\infty$, i.e. there exists some nonrandom unknown constant $\sigma_* \ge 1$ such that

$$ \max_{1\le l\le n} \sigma_l^2(S) \le \sigma_* . \tag{2.2} $$

For any estimate $\widehat{S}_n$ of $S$ based on observations $(y_j)_{1\le j\le n}$, the estimation accuracy is measured by the mean integrated squared error (MISE)

$$ \mathbf{E}_S \|\widehat{S}_n - S\|_n^2 , \tag{2.3} $$

where

$$ \|\widehat{S}_n - S\|_n^2 = (\widehat{S}_n - S, \widehat{S}_n - S)_n = \frac{1}{n}\sum_{l=1}^n \big(\widehat{S}_n(x_l) - S(x_l)\big)^2 . $$

We make use of the trigonometric basis $(\phi_j)_{j\ge 1}$ in $L_2[0,1]$ with

$$ \phi_1 = 1 , \quad \phi_j(x) = \sqrt{2}\,Tr_j(2\pi [j/2] x) , \quad j \ge 2 , \tag{2.4} $$

where the function $Tr_j(x) = \cos(x)$ for even $j$ and $Tr_j(x) = \sin(x)$ for odd $j$; $[x]$ denotes the integer part of $x$. Note that if $n$ is odd, then this basis is orthonormal for the empirical inner product generated by the sieve $(x_j)_{1\le j\le n}$, that is for any $1 \le i, j \le n$,

$$ (\phi_i, \phi_j)_n = \frac{1}{n}\sum_{l=1}^n \phi_i(x_l)\,\phi_j(x_l) = \mathbf{Kr}_{ij} , \tag{2.5} $$

where $\mathbf{Kr}_{ij}$ is Kronecker's symbol.
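The empirical orthonormality (2.5) can be checked numerically. The sketch below builds the basis (2.4) and reports the largest deviation of the empirical Gram matrix from the identity; the function names are illustrative and not taken from the paper.

```python
import math


def phi(j, x):
    """Trigonometric basis (2.4): phi_1 = 1, cosine for even j, sine for odd j."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def empirical_inner(i, j, n):
    """(phi_i, phi_j)_n = (1/n) * sum_l phi_i(x_l) * phi_j(x_l) with x_l = l/n."""
    return sum(phi(i, l / n) * phi(j, l / n) for l in range(1, n + 1)) / n


def max_gram_error(n):
    """Largest entrywise deviation of the n x n empirical Gram matrix from I."""
    return max(
        abs(empirical_inner(i, j, n) - (1.0 if i == j else 0.0))
        for i in range(1, n + 1)
        for j in range(1, n + 1)
    )
```

For odd `n` the error is at the level of rounding noise. For even `n` the last basis function takes only the values $\pm\sqrt{2}$ on the design points, so its empirical norm is 2 and the property fails, which is why the sample size is assumed to be odd.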

By making use of this basis we define the discrete Fourier transformation in (1.1) and obtain the Fourier coefficients

$$ \widehat{\theta}_{j,n} = (Y, \phi_j)_n \quad\text{and}\quad \theta_{j,n} = (S, \phi_j)_n . \tag{2.6} $$

Here $Y = (y_1, \ldots, y_n)'$ and $S = (S(x_1), \ldots, S(x_n))'$. The prime denotes the transposition.

From (1.1) it follows directly that these Fourier coefficients satisfy the following equation

$$ \widehat{\theta}_{j,n} = \theta_{j,n} + \frac{1}{\sqrt{n}}\,\xi_{j,n} \tag{2.7} $$

with

$$ \xi_{j,n} = \frac{1}{\sqrt{n}} \sum_{l=1}^n \sigma_l(S)\,\xi_l\,\phi_j(x_l) . $$

We estimate the function $S$ by the weighted least squares estimator

$$ \widehat{S}_\lambda(x) = \sum_{j=1}^n \lambda(j)\,\widehat{\theta}_{j,n}\,\phi_j(x) , \tag{2.8} $$

where $x \in [0,1]$ and the weight vector $\lambda = (\lambda(1), \ldots, \lambda(n))'$ belongs to some finite set $\Lambda$ from $[0,1]^n$. We denote by $\nu$ the cardinal number of the set $\Lambda$.
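A minimal Python sketch of the empirical Fourier coefficients (2.6) and of the weighted least squares estimator (2.8) is given below. The helper names are hypothetical, and the direct $O(n^2)$ summation is chosen for readability rather than speed.

```python
import math


def phi(j, x):
    """Trigonometric basis (2.4)."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def fourier_coefficients(ys):
    """Empirical coefficients (Y, phi_j)_n for j = 1..n, with x_l = l/n."""
    n = len(ys)
    return [
        sum(ys[l - 1] * phi(j, l / n) for l in range(1, n + 1)) / n
        for j in range(1, n + 1)
    ]


def weighted_estimator(theta_hat, lam, x):
    """Weighted least squares estimator (2.8) evaluated at the point x."""
    return sum(
        w * t * phi(j, x)
        for j, (w, t) in enumerate(zip(lam, theta_hat), start=1)
    )
```

With all weights equal to one and an odd sample size, the estimator interpolates the observations at the design points, because the basis is orthonormal for the empirical inner product. Keeping only the first weight returns the sample mean.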

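Moreover, we set

$$ \varrho_n = \max_{\lambda\in\Lambda} \sum_{j=1}^n \lambda(j) \quad\text{and}\quad \varrho_{i,n} = \max_{\lambda\in\Lambda}\,\sup_{0\le x\le 1} \Big| \sum_{j=1}^n \lambda^i(j)\,\overline{\phi}_j(x) \Big| , \tag{2.9} $$

where $\overline{\phi}_j = \phi_j^2 - 1$ and $i = 1, 2$.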

Now we need to write a cost function to choose a weight $\lambda \in \Lambda$. Clearly, the best way would be to minimize the cost function equal to the empirical squared error

$$ \mathrm{Err}_n(\lambda) = \|\widehat{S}_\lambda - S\|_n^2 , $$

which in our case is equal to

$$ \mathrm{Err}_n(\lambda) = \sum_{j=1}^n \lambda^2(j)\,\widehat{\theta}_{j,n}^2 - 2\sum_{j=1}^n \lambda(j)\,\widehat{\theta}_{j,n}\,\theta_{j,n} + \sum_{j=1}^n \theta_{j,n}^2 . \tag{2.10} $$

Since the coefficients $\theta_{j,n}$ are unknown, we need to replace the term $\widehat{\theta}_{j,n}\,\theta_{j,n}$ by some estimator which we choose as

$$ \widetilde{\theta}_{j,n} = \widehat{\theta}_{j,n}^2 - \frac{1}{n}\,\widehat{\varsigma}_n , $$

where $\widehat{\varsigma}_n$ is some estimator of the summarized noise variance

$$ \varsigma_n = n^{-1}\sum_{l=1}^n \sigma_l^2(S) . \tag{2.11} $$

An estimator of this type is given in (3.5).

Moreover, for this substitution in the empirical squared error one needs to pay a penalty. Finally, we define the cost function in the following way:

$$ J_n(\lambda) = \sum_{j=1}^n \lambda^2(j)\,\widehat{\theta}_{j,n}^2 - 2\sum_{j=1}^n \lambda(j)\,\widetilde{\theta}_{j,n} + \rho\,\widehat{P}_n(\lambda) , \tag{2.12} $$

where $\rho$ is some positive coefficient which will be chosen later. The penalty term is defined as

$$ \widehat{P}_n(\lambda) = \frac{|\lambda|^2\,\widehat{\varsigma}_n}{n} \quad\text{with}\quad |\lambda|^2 = \sum_{j=1}^n \lambda^2(j) . \tag{2.13} $$

Note that in the case when the sequence $(\sigma_l(S))_{1\le l\le n}$ is known, i.e. $\widehat{\varsigma}_n = \varsigma_n$, we obtain

$$ P_n(\lambda) = \frac{|\lambda|^2\,\varsigma_n}{n} . \tag{2.14} $$

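We set

$$ \widehat{\lambda} = \operatorname{argmin}_{\lambda\in\Lambda} J_n(\lambda) \tag{2.15} $$

and define an estimator of $S$ as

$$ \widehat{S}_* = \widehat{S}_{\widehat{\lambda}} . \tag{2.16} $$

We recall that the set $\Lambda$ is finite, so $\widehat{\lambda}$ exists. In the case when $\widehat{\lambda}$ is not unique we take one of the minimizers.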
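The selection rule (2.12)-(2.15) can be sketched in a few lines of Python. The inputs are assumed to be already available: `theta_hat` holds the empirical Fourier coefficients, `var_hat` is some estimate of the summarized noise variance, and `weight_family` is the finite set of weight vectors; all three names are hypothetical.

```python
def cost(theta_hat, lam, var_hat, rho):
    """Cost function J_n(lambda) from (2.12) with the penalty (2.13)."""
    n = len(theta_hat)
    fit = sum(w * w * t * t for w, t in zip(lam, theta_hat))
    # theta_tilde_j = theta_hat_j^2 - var_hat / n replaces theta_hat_j * theta_j.
    cross = sum(w * (t * t - var_hat / n) for w, t in zip(lam, theta_hat))
    penalty = sum(w * w for w in lam) * var_hat / n
    return fit - 2.0 * cross + rho * penalty


def select_weights(theta_hat, weight_family, var_hat, rho):
    """Minimizer of J_n over the finite family, as in (2.15).

    min() returns the first minimizer, which is one admissible way of
    breaking ties when the minimizer is not unique.
    """
    return min(weight_family, key=lambda lam: cost(theta_hat, lam, var_hat, rho))
```

For example, with `theta_hat = [1.0, 0.0]`, `var_hat = 0.2` and `rho = 0.25`, the costs of the weights `[0, 0]`, `[1, 0]` and `[1, 1]` are 0, -0.775 and -0.55, so the rule keeps the first coefficient and drops the second, pure-noise one.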

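To formulate the oracle inequality we introduce, for $0 < \rho < 1/3$, the following function

$$ \Psi_n(\rho) = \frac{\rho(1-\rho)\,\Upsilon_n^*(\rho) + 2\nu + 2\rho^2(1-\rho)\,\varrho_{2,n}}{\rho(1-3\rho)}\,\sigma_* \tag{2.17} $$

with

$$ \Upsilon_n^*(\rho) = \frac{16\nu}{\rho} + 4\varrho_{1,n}\Big(1 + \frac{\nu\,\overline{\xi}}{\sqrt{n}}\Big) + 4\nu\varrho_n\,\frac{\overline{\xi}}{\sqrt{n}} . $$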

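Theorem 2.1. Let $\Lambda$ be any finite set in $[0,1]^n$. For any $n \ge 3$ and $0 < \rho < 1/3$, the estimator $\widehat{S}_*$ satisfies the oracle inequality

$$ \mathbf{E}_S \|\widehat{S}_* - S\|_n^2 \le (1 + \kappa(\rho)) \min_{\lambda\in\Lambda} \mathbf{E}_S \|\widehat{S}_\lambda - S\|_n^2 + \frac{1}{n}\,B_n(\rho) , \tag{2.18} $$

where $B_n(\rho) = \Psi_n(\rho) + \kappa_*(\rho)\,\varrho_n\,\mathbf{E}_S |\widehat{\varsigma}_n - \varsigma_n|$ with

$$ \kappa(\rho) = \frac{6\rho - 2\rho^2}{1 - 3\rho} \quad\text{and}\quad \kappa_*(\rho) = \frac{4(1 - \rho^2)}{1 - 3\rho} . $$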

If in model (1.1) the volatility coefficients $(\sigma_l(S))_{1\le l\le n}$ are known, then $\widehat{\varsigma}_n = \varsigma_n$ and inequality (2.18) has the following form

$$ \mathbf{E}_S \|\widehat{S}_* - S\|_n^2 \le (1 + \kappa(\rho)) \min_{\lambda\in\Lambda} \mathbf{E}_S \|\widehat{S}_\lambda - S\|_n^2 + \frac{1}{n}\,\Psi_n(\rho) . \tag{2.19} $$

Remark 2.1. Note that the principal term in the right-hand side of (2.18)-(2.19) is best in the class of estimators $(\widehat{S}_\lambda, \lambda \in \Lambda)$. Inequalities of this type are called sharp non-asymptotic oracle inequalities. The inequality is sharp in the sense that the coefficient of the principal term may be chosen as close to 1 as desired. Similar inequalities for homoscedastic models (1.1) with $\sigma_l(S) = \sigma$ were given, for example, in [13]. The methods used there cannot be extended to the heteroscedastic case since, after the Fourier transformation, the random variables $(\xi_{i,n})$ in model (2.7) are dependent, contrary to the homoscedastic case, where these random variables are independent (see, for example, Rohde, 2004).

Remark 2.2. If one would like to obtain the asymptotically minimal MISE of the estimator $\widehat{S}_*$, then the secondary term $B_n(\rho)$ in (2.18) should be slowly varying. Indeed, since usually the optimal rate is of order $n^{2k/(2k+1)}$ for some $k \ge 1$, after multiplying the inequality (2.18) by this rate the principal term gives the optimal constant, and the secondary one should be such that for any $\delta > 0$

$$ \frac{B_n(\rho)}{n^\delta} \to 0 \quad\text{as}\quad n \to \infty . $$

Due to the definitions of $\Psi_n(\rho)$ and $B_n(\rho)$, one needs, for any $\delta > 0$,

$$ \rho\,n^\delta \to +\infty , \quad \frac{\varrho_n\,\mathbf{E}_S |\widehat{\varsigma}_n - \varsigma_n|}{n^\delta} \to 0 \quad\text{as}\quad n \to \infty . $$

One can take, for example, the parameter $\rho$ tending to zero as $n \to \infty$ like

$$ \rho = O\Big(\frac{1}{\ln^\gamma n}\Big) \tag{2.20} $$

for some $\gamma > 0$. The choice of $\varrho_n$ and of the estimator $\widehat{\varsigma}_n$ is proposed below.

Consider now the order of the terms $\varrho_n$, $\varrho_{1,n}$, $\varrho_{2,n}$ and the function $\Psi_n(\rho)$ in the case when the finite set $\Lambda$ is formed by a special version of Pinsker's weights (see, for example, [15]). To this end, we define the sieve

$$ \mathcal{A}_\varepsilon = \{1, \ldots, k^*\} \times \{t_1, \ldots, t_m\} , $$

where $t_i = i\varepsilon$ and $m = [1/\varepsilon^2]$. We suppose that the parameters $k^* \ge 1$ and $0 < \varepsilon \le 1$ are functions of $n$ such that

$$ \lim_{n\to\infty} k^* = +\infty , \quad \lim_{n\to\infty} \frac{k^*}{\ln n} = 0 , \quad \lim_{n\to\infty} \varepsilon = 0 \quad\text{and}\quad \lim_{n\to\infty} n^\delta \varepsilon = +\infty \tag{2.21} $$

for any $\delta > 0$. For example, one can take $\varepsilon = 1/\ln n$ and $k^* = \sqrt{\ln n}$ for $n \ge 3$.

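For any $\alpha = (\beta, t) \in \mathcal{A}_\varepsilon$ we define the weight vector $\lambda_\alpha = (\lambda_\alpha(1), \ldots, \lambda_\alpha(n))'$ as

$$ \lambda_\alpha(j) = \mathbf{1}_{\{1\le j\le j_0\}} + \big(1 - (j/\omega_\alpha)^\beta\big)\,\mathbf{1}_{\{j_0 < j \le \omega_\alpha\}} , \tag{2.22} $$

where $j_0 = j_0(\alpha) = [\omega_\alpha/\ln n]$,

$$ \omega_\alpha = (A_\beta\,t\,n)^{1/(2\beta+1)} \quad\text{and}\quad A_\beta = \frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta} . $$

Hence,

$$ \Lambda = \{\lambda_\alpha , \ \alpha \in \mathcal{A}_\varepsilon\} \tag{2.23} $$

and $\nu = k^* m$. Note that in this case, in view of (2.21), for any $\delta > 0$

$$ \lim_{n\to\infty} \frac{\nu}{n^\delta} = 0 . $$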
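The weight family (2.22)-(2.23) can be generated as in the following Python sketch. The function names are hypothetical, `k_star` is passed as an integer (an assumption, since the text only prescribes its growth rate), and `n` is assumed to be at least 3 so that $\ln n > 1$.

```python
import math


def pinsker_weight(beta, t, n):
    """Weight vector lambda_alpha of (2.22) for alpha = (beta, t)."""
    a_beta = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (a_beta * t * n) ** (1.0 / (2 * beta + 1))
    j0 = int(omega / math.log(n))          # integer part [omega / ln n]
    lam = []
    for j in range(1, n + 1):
        if j <= j0:
            lam.append(1.0)                               # flat part
        elif j <= omega:
            lam.append(1.0 - (j / omega) ** beta)         # decaying part
        else:
            lam.append(0.0)                               # cut-off
    return lam


def weight_family(n, k_star, eps):
    """All weights indexed by the sieve {1..k_star} x {eps, 2*eps, ..., m*eps}."""
    m = int(1.0 / eps ** 2)
    return [
        pinsker_weight(beta, i * eps, n)
        for beta in range(1, k_star + 1)
        for i in range(1, m + 1)
    ]
```

The family has `k_star * m` elements, each vector is non-increasing with values in $[0,1]$, and the sum of its components does not exceed $\omega_\alpha$, which is the fact used next to bound $\varrho_n$.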

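Moreover, by (2.22)

$$ \sum_{j=1}^n \lambda_\alpha(j) = \mathbf{1}_{\{j_0\ge 1\}}\,j_0 + \mathbf{1}_{\{\omega_\alpha\ge 1\}} \sum_{j=j_0+1}^{[\omega_\alpha]} \big(1 - (j/\omega_\alpha)^\beta\big) \le \omega_\alpha . $$

Therefore, taking into account that $A_\beta \le A_1 < 1$ for $\beta \ge 1$ we find that

$$ \varrho_n \le (n/\varepsilon)^{1/3} , $$

i.e. for any $\delta > 0$

$$ \lim_{n\to\infty} \frac{\varrho_n}{n^{1/3+\delta}} = 0 . $$

Moreover, note that for any $x \in [0,1]$, we get

$$ \sum_{j=1}^n \lambda_\alpha(j)\,\overline{\phi}_j(x) = \mathbf{1}_{\{j_0\ge 1\}} \sum_{j=1}^{j_0} \overline{\phi}_j(x) + \mathbf{1}_{\{\omega_\alpha\ge 1\}} \sum_{j=j_0+1}^{[\omega_\alpha]} \big(1 - (j/\omega_\alpha)^\beta\big)\,\overline{\phi}_j(x) . $$

Thus Lemma A.2 implies that

$$ \varrho_{1,n} \le 1 + 2^{\beta+1} \le 1 + 2^{k^*+1} . $$

Due to the condition for $k^*$ in (2.21) this function is slowly varying, i.e. for any $\delta > 0$,

$$ \lim_{n\to\infty} \frac{\varrho_{1,n}}{n^\delta} = 0 . $$

In the same way we obtain that

$$ \varrho_{2,n} \le 1 + 2^{k^*+2} + 2^{2k^*+1} $$

and, therefore, for any $\delta > 0$

$$ \lim_{n\to\infty} \frac{\varrho_{2,n}}{n^\delta} = 0 . $$

Thus, if we choose the parameter $\rho = \rho_n$ as in (2.20) we obtain that in this case, for any $\delta > 0$,

$$ \lim_{n\to\infty} \frac{\Psi_n(\rho)}{n^\delta} = 0 . \tag{2.24} $$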

3 Oracle inequality for $S \in W_r^k$

Assume that $S : \mathbb{R} \to \mathbb{R}$ is a $k$ times differentiable 1-periodic function such that

$$ \sum_{j=0}^k \|S^{(j)}\|^2 \le r , \tag{3.1} $$

where

$$ \|f\|^2 = \int_0^1 f^2(t)\,dt . \tag{3.2} $$

We denote by $W_r^k$ the set of all such functions. Moreover, we suppose that $r > 0$ and $k \ge 1$ are unknown parameters.

Note that the space $W_r^k$ can be represented as an ellipsoid in the Hilbert space, i.e.

$$ W_r^k = \Big\{ S \in L_2[0,1] \,:\, S = \sum_{j=1}^\infty \theta_j\,\phi_j \ \text{ such that } \ \sum_{j=1}^\infty a_j\,\theta_j^2 \le r \Big\} , \tag{3.3} $$

where the basis functions $(\phi_j)_{j\ge 1}$ are defined in (2.4) and $(\theta_j)_{j\ge 1}$ are the Fourier coefficients, i.e.

$$ \theta_j = (S, \phi_j) = \int_0^1 S(t)\,\phi_j(t)\,dt . \tag{3.4} $$

The coefficients $(a_j)_{j\ge 1}$ are defined as

$$ a_j = \sum_{l=0}^k \|\phi_j^{(l)}\|^2 = \sum_{l=0}^k (2\pi [j/2])^{2l} . $$

To estimate $\varsigma_n$, we make use of the following estimator:

$$ \widehat{\varsigma}_n = \sum_{j=m_n+1}^n \widehat{\theta}_{j,n}^2 , \tag{3.5} $$

where the parameter $1 \le m_n \le n$ will be chosen later.
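In code, the estimator (3.5) is simply the energy of the empirical Fourier coefficients above the cut-off $m_n$. A minimal Python sketch, with a hypothetical function name and the coefficients passed as a list indexed from $j = 1$:

```python
def variance_estimate(theta_hat, m_n):
    """Estimator (3.5): sum of theta_hat_j^2 over j = m_n + 1, ..., n.

    theta_hat[0] holds the coefficient with index j = 1, so the slice
    theta_hat[m_n:] covers exactly the indices j > m_n.
    """
    return sum(t * t for t in theta_hat[m_n:])


# Toy coefficients, chosen only to show the indexing.
print(variance_estimate([3.0, 1.0, 2.0], 1))
```

The idea behind the estimator is that for a smooth $S$ the high-frequency coefficients $\theta_{j,n}$ are small, so by (2.7) the retained terms are dominated by the noise contributions $\xi_{j,n}^2/n$.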

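In Section 4 we show the following result.

Lemma 3.1. For any $n \ge 2$ and $r > 0$,

$$ \sup_{S\in W_r^1} \mathbf{E}_S |\widehat{\varsigma}_n - \varsigma_n| \le \frac{\overline{\sigma} + \varsigma_n^*(r)}{\sqrt{n}} , \tag{3.6} $$

where $\overline{\sigma} = 2(\overline{\xi} + \sqrt{2})\,\sigma_*$ and

$$ \varsigma_n^*(r) = \frac{4r\sqrt{n}}{m_n^2} + \frac{4\sqrt{r\sigma_*}}{m_n} + \frac{(2 + m_n)\,\sigma_*}{\sqrt{n}} . $$

If we choose the parameter $m_n$ in (3.5) such that

$$ \lim_{n\to\infty} \frac{m_n}{\sqrt{n}} = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{m_n^2}{\sqrt{n}} = \infty , \tag{3.7} $$

we obtain that

$$ \lim_{n\to\infty} \varsigma_n^*(r) = 0 . $$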
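Condition (3.7) holds, for instance, for $m_n$ of order $n^{a}$ with $1/4 < a < 1/2$. The Python sketch below evaluates $\varsigma_n^*(r)$ for the illustrative choice $m_n = \mathrm{round}(n^{0.4})$ with $r = \sigma_* = 1$; these particular values are assumptions of the example, not recommendations from the paper.

```python
import math


def varsigma_star(n, m_n, r, sigma_star):
    """The quantity varsigma_n^*(r) from Lemma 3.1."""
    return (
        4.0 * r * math.sqrt(n) / m_n ** 2
        + 4.0 * math.sqrt(r * sigma_star) / m_n
        + (2.0 + m_n) * sigma_star / math.sqrt(n)
    )


for n in (10 ** 3, 10 ** 6, 10 ** 9):
    m_n = round(n ** 0.4)     # satisfies (3.7): m_n / sqrt(n) -> 0, m_n^2 / sqrt(n) -> infinity
    print(n, m_n, varsigma_star(n, m_n, 1.0, 1.0))
```

The printed values decrease with $n$, but slowly: with this exponent the first and the last terms of $\varsigma_n^*(r)$ are of order $n^{-0.3}$ and $n^{-0.1}$ respectively.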

Theorem 2.1 and inequality (3.6) imply immediately the following result.

Theorem 3.2. Let $\Lambda$ be any finite set in $[0,1]^n$. Assume that in the model (1.1) the function $S$ belongs to $W_r^1$. Then, for any $n \ge 3$ and $0 < \rho < 1/3$, the procedure $\widehat{S}_*$ from (2.16) with $\widehat{\varsigma}_n$ defined by (3.5) and (3.7) satisfies the following oracle inequality

$$ \mathbf{E}_S \|\widehat{S}_* - S\|_n^2 \le (1 + \kappa(\rho)) \min_{\lambda\in\Lambda} \mathbf{E}_S \|\widehat{S}_\lambda - S\|_n^2 + \frac{1}{n}\,D_n(\rho, r) , \tag{3.8} $$

where

$$ D_n(\rho, r) = \Psi_n(\rho) + \kappa_*(\rho)\,\big(\overline{\sigma} + \varsigma_n^*(r)\big)\,\frac{\varrho_n}{\sqrt{n}} . $$

If the set $\Lambda$ is from (2.23), then for any $\delta > 0$ and any $0 < \rho < 1/3$

$$ \lim_{n\to\infty} \frac{D_n(\rho, r)}{n^\delta} = 0 . $$

4 Proofs

4.1 Proof of Theorem 2.1

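First of all, note that we can represent the empirical squared error $\mathrm{Err}_n(\lambda)$ in the following way:

$$ \mathrm{Err}_n(\lambda) = J_n(\lambda) + 2\sum_{j=1}^n \lambda(j)\,\theta'_{j,n} + \|S\|_n^2 - \rho\,\widehat{P}_n(\lambda) \tag{4.1} $$

with $\theta'_{j,n} = \widetilde{\theta}_{j,n} - \theta_{j,n}\,\widehat{\theta}_{j,n}$. By setting

$$ \varsigma_{j,n} = \mathbf{E}_S\,\xi_{j,n}^2 = \frac{1}{n}\sum_{l=1}^n \sigma_l^2(S)\,\phi_j^2(x_l) , \tag{4.2} $$

we find that

$$ \theta'_{j,n} = \frac{1}{\sqrt{n}}\,\theta_{j,n}\,\xi_{j,n} + \frac{1}{n}\,\widetilde{\xi}_{j,n} + \frac{1}{n}\,\widetilde{\delta}_{j,n} , $$

where

$$ \widetilde{\xi}_{j,n} = \xi_{j,n}^2 - \varsigma_{j,n} \quad\text{and}\quad \widetilde{\delta}_{j,n} = \varsigma_{j,n} - \widehat{\varsigma}_n . \tag{4.3} $$

Note now that we can represent $\widetilde{\xi}_{j,n}$ as

$$ \widetilde{\xi}_{j,n} = \frac{1}{n}\sum_{l=1}^n \sigma_l^2(S)\,\phi_j^2(x_l)\,\widetilde{\xi}_l + 2\sum_{l=2}^n \tau_{j,l}\,\xi_l = \widetilde{\xi}'_{j,n} + 2\,\widetilde{\xi}''_{j,n} , \tag{4.4} $$

where $\widetilde{\xi}_l = \xi_l^2 - 1$ and

$$ \tau_{j,l} = \frac{1}{n}\,\sigma_l(S)\,\phi_j(x_l) \sum_{d=1}^{l-1} \sigma_d(S)\,\phi_j(x_d)\,\xi_d . $$

Now we set

$$ N_1(\lambda) = \sum_{j=1}^n \lambda(j)\,\widetilde{\xi}'_{j,n} \quad\text{and}\quad N_2(\lambda) = \frac{1}{\sqrt{n\varsigma_n}} \sum_{j=1}^n \overline{\lambda}(j)\,\widetilde{\xi}''_{j,n}\,\mathbf{1}_{\{\varsigma_n > 0\}} , \tag{4.5} $$

where $\overline{\lambda}(j) = \lambda(j)/|\lambda|$. In the Appendix we show that

$$ \sup_{\lambda\in\Lambda} \mathbf{E}_S |N_1(\lambda)| \le \overline{\xi}\,\sigma_*\,(\varrho_n + \varrho_{1,n})\,\frac{1}{\sqrt{n}} \tag{4.6} $$

and

$$ \sup_{\lambda\in\mathbb{R}^n} \mathbf{E}_S (N_2(\lambda))^2 \le 2\sigma_*/n . \tag{4.7} $$

Now, for any $\lambda \in \Lambda$, we rewrite (4.1) as

$$ \mathrm{Err}_n(\lambda) = J_n(\lambda) + \frac{2}{n}\,N_1(\lambda) + 4\sqrt{P_n(\lambda)}\,N_2(\lambda) + 2M(\lambda) + \frac{2}{n}\,\widetilde{\Delta}(\lambda) + \|S\|_n^2 - \rho\,\widehat{P}_n(\lambda) , $$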

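where $P_n(\lambda)$ is defined in (2.14),

$$ \widetilde{\Delta}(\lambda) = \sum_{j=1}^n \lambda(j)\,\widetilde{\delta}_{j,n} \quad\text{and}\quad M(\lambda) = n^{-1/2} \sum_{j=1}^n \lambda(j)\,\theta_{j,n}\,\xi_{j,n} . \tag{4.8} $$

We start with $\widetilde{\Delta}(\lambda)$. Setting

$$ \widetilde{\varsigma}_{j,n} = \varsigma_{j,n} - \varsigma_n = \frac{1}{n}\sum_{l=1}^n \sigma_l^2(S)\,\overline{\phi}_j(x_l) , \tag{4.9} $$

we obtain that

$$ |\widetilde{\Delta}(\lambda)| \le \Big| \sum_{j=1}^n \lambda(j)\,\widetilde{\varsigma}_{j,n} \Big| + \varrho_n\,|\widehat{\varsigma}_n - \varsigma_n| \le \sigma_*\,\varrho_{1,n} + \varrho_n\,|\widehat{\varsigma}_n - \varsigma_n| . \tag{4.10} $$

Now from (4.1) we obtain that, for some fixed $\lambda_0 \in \Lambda$,

$$ \begin{aligned} \mathrm{Err}_n(\widehat{\lambda}) - \mathrm{Err}_n(\lambda_0) ={}& J_n(\widehat{\lambda}) - J_n(\lambda_0) + 2M(\widehat{\vartheta}) + \frac{2}{n}\,N_1(\widehat{\vartheta}) \\ &+ 4\sqrt{P_n(\widehat{\lambda})}\,N_2(\widehat{\lambda}) - 4\sqrt{P_n(\lambda_0)}\,N_2(\lambda_0) \\ &- \rho\,\widehat{P}_n(\widehat{\lambda}) + \rho\,\widehat{P}_n(\lambda_0) + \frac{2}{n}\big(\widetilde{\Delta}(\widehat{\lambda}) - \widetilde{\Delta}(\lambda_0)\big) , \end{aligned} $$

where $\widehat{\vartheta} = \widehat{\lambda} - \lambda_0$.

By the definition of $\widehat{\lambda}$ in (2.15) and by (4.10) we get

$$ \begin{aligned} \mathrm{Err}_n(\widehat{\lambda}) - \mathrm{Err}_n(\lambda_0) \le{}& 2M(\widehat{\vartheta}) + \frac{4\sigma_*\,\varrho_{1,n} + 4\varrho_n\,|\widehat{\varsigma}_n - \varsigma_n|}{n} + \frac{2}{n}\,N_1(\widehat{\vartheta}) \\ &+ 4\sqrt{P_n(\widehat{\lambda})}\,N_2(\widehat{\lambda}) - \rho\,\widehat{P}_n(\widehat{\lambda}) + \rho\,\widehat{P}_n(\lambda_0) - 4\sqrt{P_n(\lambda_0)}\,N_2(\lambda_0) . \end{aligned} $$

Moreover, making use of the inequality

$$ 2|ab| \le \varepsilon a^2 + \varepsilon^{-1} b^2 \tag{4.11} $$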

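with $\varepsilon = \rho/4$ and taking into account the definition of the penalty term in (2.13) we deduce, for any $\lambda \in \Lambda$,

$$ 4\sqrt{P_n(\lambda)}\,|N_2(\lambda)| \le \rho\,P_n(\lambda) + \frac{4N_2^2(\lambda)}{\rho} \le \rho\,\widehat{P}_n(\lambda) + \rho\,\frac{|\lambda|^2\,|\widehat{\varsigma}_n - \varsigma_n|}{n} + \frac{4N_2^2(\lambda)}{\rho} . $$

Thus from here it follows that

$$ \mathrm{Err}_n(\widehat{\lambda}) \le \mathrm{Err}_n(\lambda_0) + 2M(\widehat{\vartheta}) + \Upsilon_n + 2\rho\,\widehat{P}_n(\lambda_0) , \tag{4.12} $$

where

$$ \Upsilon_n = \frac{4}{n}\,N_1^* + \frac{8}{\rho}\,(N_2^*)^2 + \frac{4\sigma_*\,\varrho_{1,n}}{n} + \frac{4 + 2\rho}{n}\,\varrho_n\,|\widehat{\varsigma}_n - \varsigma_n| $$

with $N_1^* = \sup_{\lambda\in\Lambda} |N_1(\lambda)|$ and $N_2^* = \sup_{\lambda\in\Lambda} |N_2(\lambda)|$. Moreover, note that the bounds (4.6), (4.7) and (4.10) imply that

$$ \mathbf{E}_S\,\Upsilon_n \le \frac{\Upsilon_n^*(\rho)\,\sigma_*}{n} + \frac{4 + 2\rho}{n}\,\varrho_n\,\mathbf{E}_S |\widehat{\varsigma}_n - \varsigma_n| , \tag{4.13} $$

where the function $\Upsilon_n^*(\rho)$ is defined in (2.17).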

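Now we study the second term in (4.8). First, note that for any nonrandom vector $\vartheta = (\vartheta(1), \ldots, \vartheta(n))' \in \mathbb{R}^n$ Lemma A.4 implies

$$ \mathbf{E}_S\,M^2(\vartheta) \le \frac{\sigma_*}{n} \sum_{j=1}^n \vartheta^2(j)\,\theta_{j,n}^2 = \frac{\sigma_*\,\|S_\vartheta\|_n^2}{n} , \tag{4.14} $$

where

$$ S_\vartheta = \sum_{j=1}^n \vartheta(j)\,\theta_{j,n}\,\phi_j . $$

We set now

$$ Z^* = \sup_{\vartheta\in\Lambda_1} \frac{n\,M^2(\vartheta)}{\|S_\vartheta\|_n^2} \quad\text{with}\quad \Lambda_1 = \Lambda - \lambda_0 . $$

We estimate this term with the help of inequality (4.14), i.e.

$$ \mathbf{E}_S\,Z^* \le \sum_{\vartheta\in\Lambda_1} \frac{n\,\mathbf{E}_S\,M^2(\vartheta)}{\|S_\vartheta\|_n^2} \le \nu\,\sigma_* . \tag{4.15} $$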

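Moreover, making use of inequality (4.11) with $\varepsilon = \rho\,\|S_\vartheta\|_n$, we get

$$ 2|M(\vartheta)| \le \rho\,\|S_\vartheta\|_n^2 + \frac{Z^*}{n\rho} . \tag{4.16} $$

Now we estimate $\|S_\vartheta\|_n^2$. We have

$$ \|S_\vartheta\|_n^2 - \|\widehat{S}_\vartheta\|_n^2 = \sum_{j=1}^n \vartheta^2(j)\,\big(\theta_{j,n}^2 - \widehat{\theta}_{j,n}^2\big) \le -2M_1(\vartheta) \tag{4.17} $$

with

$$ M_1(\vartheta) = \frac{1}{\sqrt{n}} \sum_{j=1}^n \vartheta^2(j)\,\theta_{j,n}\,\xi_{j,n} . $$

Now, taking into account that $|\vartheta(j)| \le 1$ for any $\vartheta \in \Lambda_1$, we obtain

$$ \mathbf{E}_S\,M_1^2(\vartheta) \le \frac{\sigma_*\,\|S_\vartheta\|_n^2}{n} . $$

Putting

$$ Z_1^* = \sup_{\vartheta\in\Lambda_1} \frac{n\,M_1^2(\vartheta)}{\|S_\vartheta\|_n^2} , $$

we get

$$ \mathbf{E}_S\,Z_1^* \le \nu\,\sigma_* . \tag{4.18} $$

Therefore, applying inequality (4.16) for $M_1(\vartheta)$ in (4.17) we deduce the upper bound for $\|S_\vartheta\|_n^2$, i.e.

$$ \|S_\vartheta\|_n^2 \le \frac{1}{1-\rho}\,\|\widehat{S}_\vartheta\|_n^2 + \frac{Z_1^*}{n\rho(1-\rho)} . \tag{4.19} $$

Taking into account this inequality in (4.16) we obtain that

$$ 2M(\vartheta) \le \frac{\rho}{1-\rho}\,\|\widehat{S}_\vartheta\|_n^2 + \frac{Z^* + Z_1^*}{n\rho(1-\rho)} \le \frac{2\rho\,\big(\mathrm{Err}_n(\lambda) + \mathrm{Err}_n(\lambda_0)\big)}{1-\rho} + \frac{Z^* + Z_1^*}{n\rho(1-\rho)} . $$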

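Therefore (4.12) implies that

$$ \mathrm{Err}_n(\widehat{\lambda}) \le \frac{1+\rho}{1-3\rho}\,\mathrm{Err}_n(\lambda_0) + \frac{1-\rho}{1-3\rho}\,\Upsilon_n + \frac{Z^* + Z_1^*}{n\rho(1-3\rho)} + \frac{2\rho(1-\rho)}{1-3\rho}\,\widehat{P}_n(\lambda_0) . $$

Now by inequalities (4.15)–(4.18) we get that

$$ \mathbf{E}_S\,\mathrm{Err}_n(\widehat{\lambda}) \le \frac{1+\rho}{1-3\rho}\,\mathbf{E}_S\,\mathrm{Err}_n(\lambda_0) + \frac{1-\rho}{1-3\rho}\,\mathbf{E}_S\,\Upsilon_n + \frac{2\nu\sigma_*}{n\rho(1-3\rho)} + \frac{2\rho(1-\rho)}{1-3\rho}\,\mathbf{E}_S\,\widehat{P}_n(\lambda_0) . $$

By making use of inequality (4.13) and Lemma A.1 we come to Theorem 2.1.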

4.2 Proof of Lemma 3.1

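First notice that from (2.7) we obtain that

$$ \begin{aligned} \widehat{\varsigma}_n - \varsigma_n &= \sum_{j=m_n+1}^n \theta_{j,n}^2 + \frac{2}{\sqrt{n}} \sum_{j=m_n+1}^n \theta_{j,n}\,\xi_{j,n} + n^{-1} \sum_{j=m_n+1}^n \widetilde{\xi}_{j,n} + n^{-1} \sum_{j=m_n+1}^n \widetilde{\varsigma}_{j,n} - \frac{m_n}{n}\,\varsigma_n \\ &:= \Delta_1 + \frac{2}{\sqrt{n}}\,\Delta_2 + \frac{1}{n}\,\Delta_3 + \frac{1}{n}\,\Delta_4 - \frac{m_n}{n}\,\varsigma_n , \end{aligned} $$

where $\widetilde{\xi}_{j,n}$ and $\widetilde{\varsigma}_{j,n}$ are defined in (4.3) and (4.9) respectively.

We estimate the first term by Lemma A.3 for $S \in W_r^1$. We have

$$ \Delta_1 \le \frac{4r}{m_n^2} . $$

The next term we estimate with the help of Lemma A.4. We get that

$$ \mathbf{E}_S (\Delta_2)^2 \le \sigma_*\,\Delta_1 \le \sigma_*\,\frac{4r}{m_n^2} . $$

By (4.4) and (4.5) we can represent $\Delta_3$ as

$$ \Delta_3 = N_1(\lambda_I) + 2\,|\lambda_I|\,\sqrt{n\varsigma_n}\,N_2(\lambda_I) $$

with the vector $\lambda_I = (\lambda_I(1), \ldots, \lambda_I(n))'$ having indicator components, i.e. $\lambda_I(j) = \mathbf{1}_{\{j > m_n\}}$. By estimating $\phi_j^2$ by 2 in (A.1) we obtain

$$ \mathbf{E}_S |N_1(\lambda_I)| \le 2\sigma_*\,\overline{\xi}\,\sqrt{n} . $$

Thus the upper bound (4.7) implies

$$ \mathbf{E}_S |\Delta_3| \le 2\sigma_*\,(\overline{\xi} + \sqrt{2})\,\sqrt{n} = \overline{\sigma}\,\sqrt{n} . $$

Moreover, due to Lemma A.2 with $m = 0$, one has

$$ |\Delta_4| = \Big| n^{-1} \sum_{d=1}^n \sigma_d^2(S) \sum_{j=m_n+1}^n \overline{\phi}_j(x_d) \Big| \le \frac{\sigma_*}{n} \sum_{d=1}^n \Big| \sum_{j=1}^n \overline{\phi}_j(x_d) \Big| + \frac{\sigma_*}{n} \sum_{d=1}^n \Big| \sum_{j=1}^{m_n} \overline{\phi}_j(x_d) \Big| \le 2\sigma_* . $$

Hence Lemma 3.1.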

5 Appendix

A.1 Proof of (4.6)

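First note that we can represent the term $N_1(\lambda)$ as

$$ N_1(\lambda) = \sum_{l=1}^n v_{l,n}\,\widetilde{\xi}_l \quad\text{with}\quad v_{l,n} = \frac{\sigma_l^2(S)}{n} \sum_{j=1}^n \lambda(j)\,\phi_j^2(x_l) . $$

Recalling that $\mathbf{E}\,\widetilde{\xi}_1^2 = \xi^* - 1 = \overline{\xi}^2$ we calculate

$$ \mathbf{E}_S\,N_1^2(\lambda) = \frac{\overline{\xi}^2}{n^2} \sum_{l=1}^n \mathbf{E}_S\,\sigma_l^4(S) \Big( \sum_{j=1}^n \lambda(j)\,\phi_j^2(x_l) \Big)^2 . $$

Therefore for any vector $\lambda \in \mathbb{R}^n$

$$ \mathbf{E}_S |N_1(\lambda)| \le \frac{\sigma_*\,\overline{\xi}}{\sqrt{n}} \max_{0\le x\le 1} \Big| \sum_{j=1}^n \lambda(j)\,\phi_j^2(x) \Big| . \tag{A.1} $$

Thus taking into account here definitions (2.9) we come to inequality (4.6).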

A.2 Proof of (4.7)

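By putting $\alpha_l = \sum_{j=1}^n \overline{\lambda}(j)\,\tau_{j,l}$ and taking into account that the random variables $(\xi_k)_{1\le k\le n}$ are independent of $(\sigma_k(S))_{1\le k\le n}$ we obtain that

$$ \mathbf{E}_S \big( N_2^2(\lambda) \,\big|\, \sigma_k(S) , \ 1 \le k \le n \big) = \mathbf{1}_{\{\varsigma_n > 0\}} \Big( \sum_{l=1}^n \sigma_l^2(S) \Big)^{-1} \sum_{l=1}^n \widehat{\alpha}_l , \tag{A.2} $$

where

$$ \widehat{\alpha}_l = \mathbf{E}\big(\alpha_l^2 \,\big|\, \sigma_k(S) , \ 1 \le k \le n\big) = \frac{\sigma_l^2(S)}{n^2} \sum_{r=1}^{l-1} \sigma_r^2(S) \Big( \sum_{j=1}^n \overline{\lambda}(j)\,\phi_j(x_l)\,\phi_j(x_r) \Big)^2 . $$

Therefore the orthonormality property (2.5) implies that for any $\lambda \in \mathbb{R}^n$

$$ \widehat{\alpha}_l \le \sigma_*\,\frac{\sigma_l^2(S)}{n^2} \sum_{d=1}^n \Big( \sum_{j=1}^n \overline{\lambda}(j)\,\phi_j(x_l)\,\phi_j(x_d) \Big)^2 = \sigma_*\,\frac{\sigma_l^2(S)}{n} \sum_{j=1}^n \overline{\lambda}^2(j)\,\phi_j^2(x_l) \le \frac{2\sigma_*}{n}\,\sigma_l^2(S) . $$

Now by making use of this inequality in (A.2) we get (4.7).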


A.3 Technical lemma

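Lemma A.1. For any $n \ge 1$ and $\lambda \in \Lambda$,

$$ \mathbf{E}_S\,\widehat{P}_n(\lambda) \le \mathbf{E}_S\,\mathrm{Err}_n(\lambda) + \frac{\varrho_n}{n}\,\mathbf{E}_S |\widehat{\varsigma}_n - \varsigma_n| + \frac{\sigma_*\,\varrho_{2,n}}{n} . $$

Proof. Indeed, by the definition of $\mathrm{Err}_n(\lambda)$ we have

$$ \mathrm{Err}_n(\lambda) = \sum_{j=1}^n \Big( (\lambda(j) - 1)\,\theta_{j,n} + \lambda(j)\,\frac{1}{\sqrt{n}}\,\xi_{j,n} \Big)^2 . $$

Therefore,

$$ \mathbf{E}_S\,\mathrm{Err}_n(\lambda) \ge \mathbf{E}_S\,\frac{1}{n} \sum_{j=1}^n \lambda^2(j)\,\xi_{j,n}^2 = \mathbf{E}_S\,\frac{1}{n} \sum_{j=1}^n \lambda^2(j)\,\varsigma_{j,n} , $$

where the sequence $(\varsigma_{j,n})$ is defined in (4.2). Moreover, note that the last term can be estimated as

$$ \Big| \sum_{j=1}^n \lambda^2(j)\,\varsigma_{j,n} - |\lambda|^2\,\varsigma_n \Big| = \Big| \frac{1}{n} \sum_{l=1}^n \sigma_l^2(S) \sum_{j=1}^n \lambda^2(j)\,\overline{\phi}_j(x_l) \Big| \le \sigma_*\,\varrho_{2,n} . $$

We recall that the definition of the set $\Lambda$ and the definition of $\varrho_n$ in (2.9) imply that $|\lambda|^2 \le \varrho_n$ for $\lambda \in \Lambda$. Therefore for any $\lambda \in \Lambda$

$$ \sum_{j=1}^n \lambda^2(j)\,\varsigma_{j,n} \ge |\lambda|^2\,\widehat{\varsigma}_n - \sigma_*\,\varrho_{2,n} - |\lambda|^2\,|\widehat{\varsigma}_n - \varsigma_n| \ge |\lambda|^2\,\widehat{\varsigma}_n - \sigma_*\,\varrho_{2,n} - \varrho_n\,|\widehat{\varsigma}_n - \varsigma_n| . $$

Hence the desired inequality.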

A.4 Properties of trigonometric basis

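Lemma A.2. For any $m \ge 0$,

$$ \sup_{N\ge 2}\,\sup_{x\in[0,1]} N^{-m} \Big| \sum_{l=2}^N l^m\,\overline{\phi}_l(x) \Big| \le 2^m . \tag{A.3} $$

Proof. Due to the properties of the trigonometric functions, we get

$$ \sum_{l=2}^N l^m\,\overline{\phi}_l(x) = \sum_{1\le l\le N/2} (2l)^m \cos(4\pi l x) - \sum_{1\le l\le (N-1)/2} (2l+1)^m \cos(4\pi l x) . $$

This yields

$$ \begin{aligned} \Big| \sum_{l=2}^N l^m\,\overline{\phi}_l(x) \Big| &\le \Big| \sum_{1\le l\le (N-1)/2} \big((2l+1)^m - (2l)^m\big) \cos(4\pi l x) \Big| + N^m \\ &\le \sum_{1\le l\le (N-1)/2} \big((2l+1)^m - (2l)^m\big) + N^m = \sum_{1\le l\le (N-1)/2}\,\sum_{j=0}^{m-1} \binom{m}{j} (2l)^j + N^m . \end{aligned} $$

This implies (A.3).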
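The bound (A.3) is easy to probe numerically on a grid. The Python sketch below uses the identity $\overline{\phi}_l(x) = \phi_l^2(x) - 1 = \pm\cos(4\pi [l/2] x)$, with the plus sign for even $l$ and the minus sign for odd $l$; the grid size and the function names are arbitrary choices made for this illustration, and a grid search only gives a lower estimate of the true supremum.

```python
import math


def phi_bar(l, x):
    """phi_l(x)^2 - 1 for the basis (2.4), valid for l >= 2."""
    c = math.cos(4.0 * math.pi * (l // 2) * x)
    return c if l % 2 == 0 else -c


def lemma_a2_ratio(N, m, grid=200):
    """Grid approximation of sup_x N^(-m) * |sum_{l=2}^N l^m * phi_bar_l(x)|."""
    best = 0.0
    for i in range(grid + 1):
        x = i / grid
        s = sum(l ** m * phi_bar(l, x) for l in range(2, N + 1))
        best = max(best, abs(s) / N ** m)
    return best


for m in range(4):
    print(m, max(lemma_a2_ratio(N, m) for N in range(2, 21)), 2 ** m)
```

For each `m` the printed maximum stays below $2^m$, in agreement with the lemma.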

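Lemma A.3. For any function $S \in W_r^k$,

$$ \sup_{n\ge 1}\,\sup_{1\le m\le n-1} m^{2k} \Big( \sum_{j=m+1}^n \theta_{j,n}^2 \Big) \le \frac{4r}{\pi^{2(k-1)}} . \tag{A.4} $$

Proof. First, note that any function $S$ from $W_r^k$ can be represented by its Fourier series, i.e. $S = \sum_{j=1}^\infty \theta_j\,\phi_j$ with the coefficients defined by (3.4). By denoting the residual term for $S$ as

$$ \Delta_m(x) = S(x) - \sum_{j=1}^m \theta_j\,\phi_j(x) = \sum_{j=m+1}^\infty \theta_j\,\phi_j(x) , $$

we obtain that

$$ \sum_{j=m+1}^n \theta_{j,n}^2 = \inf_{\alpha_1,\ldots,\alpha_m} \Big\| S - \sum_{j=1}^m \alpha_j\,\phi_j \Big\|_n^2 \le \|\Delta_m\|_n^2 . $$

Moreover, it is easy to deduce that

$$ \begin{aligned} \|\Delta_m\|_n^2 &= n^{-1} \sum_{k=1}^n \Delta_m^2(x_k) = \sum_{k=1}^n \int_{x_{k-1}}^{x_k} \Delta_m^2(x_k)\,dx \\ &\le 2 \int_0^1 \Delta_m^2(x)\,dx + 2 \sum_{k=1}^n \int_{x_{k-1}}^{x_k} \big(\Delta_m(x_k) - \Delta_m(x)\big)^2\,dx . \end{aligned} $$

The last term in this inequality we estimate as

$$ \big(\Delta_m(x_k) - \Delta_m(x)\big)^2 = \Big( \int_x^{x_k} \dot{\Delta}_m(z)\,dz \Big)^2 \le n^{-1} \int_{x_{k-1}}^{x_k} \big(\dot{\Delta}_m(z)\big)^2\,dz . $$

Therefore,

$$ \|\Delta_m\|_n^2 \le 2\,\|\Delta_m\|^2 + \frac{2}{n^2}\,\|\dot{\Delta}_m\|^2 = 2 \sum_{j=m+1}^\infty \theta_j^2 + \frac{2}{n^2} \sum_{j=m+1}^\infty \theta_j^2\,\|\dot{\phi}_j\|^2 . $$

Now note that by the representation of the set $W_r^k$ in the form (3.3) we can estimate the first term in the last inequality as

$$ \sum_{j=m+1}^\infty \theta_j^2 = \sum_{j=m+1}^\infty \frac{a_j\,\theta_j^2}{a_j} \le \frac{r}{a_{m+1}} \le \frac{r}{(\pi m)^{2k}} . $$

Similarly, we find that

$$ \sum_{j=m+1}^\infty \theta_j^2\,\|\dot{\phi}_j\|^2 \le \sup_{j\ge m+1} \frac{\|\dot{\phi}_j\|^2}{a_j}\,r \le \sup_{j\ge m+1} \frac{\|\dot{\phi}_j\|^2}{\|\phi_j^{(k)}\|^2}\,r \le \frac{r}{(\pi m)^{2(k-1)}} . $$

Therefore, for $m \le n$ we get that

$$ \frac{1}{n^2} \sum_{j=m+1}^\infty \theta_j^2\,\|\dot{\phi}_j\|^2 \le \frac{r}{\pi^{2(k-1)}\,m^{2k}} . $$

This implies (A.4).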

Lemma A.4. Let $\xi_{j,n}$ be defined in (2.7) for the model (1.1). Then, for any real numbers $v_1, \ldots, v_n$,

$$ \mathbf{E} \Big( \sum_{j=1}^n v_j\,\xi_{j,n} \Big)^2 \le \sigma_* \sum_{j=1}^n v_j^2 . \tag{A.5} $$

Proof. Due to the definition of $\xi_{j,n}$, one has

$$ \sum_{j=1}^n v_j\,\xi_{j,n} = \sum_{l=1}^n \sigma_l\,\widetilde{v}_l\,\xi_l \quad\text{with}\quad \widetilde{v}_l = \frac{1}{\sqrt{n}} \sum_{j=1}^n v_j\,\phi_j(x_l) . $$

Moreover

$$ \mathbf{E} \Big( \sum_{j=1}^n v_j\,\xi_{j,n} \Big)^2 = \sum_{l=1}^n \sigma_l^2(S)\,\widetilde{v}_l^2 \le \sigma_* \sum_{l=1}^n \widetilde{v}_l^2 = \sigma_* \sum_{i,j=1}^n v_i\,v_j\,(\phi_i, \phi_j)_n . $$

The orthonormality of the basis $(\phi_j)$ implies inequality (A.5). Hence Lemma A.4.

References

[1] Akritas, M.G. and Van Keilegom, I. (2001) ANCOVA methods for heteroscedastic nonparametric regression models. J. Amer. Statist. Assoc. 96 220-232.

[2] Barron, A., Birgé, L. and Massart, P. (1999) Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413.

[3] Cai, T. and Wang, L. (2008) Adaptive variance function estimation in heteroscedastic nonparametric regression. Annals of Statistics, 1-38, accepted.

[4] Fourdrinier, D. and Pergamenshchikov, S. (2007) Improved selection model method for the regression with dependent noise. Annals of the Institute of Statistical Mathematics, 59 (3), p. 435-464.

[5] Galtchouk, L. and Pergamenshchikov, S. (2004) Nonparametric sequential estimation of the drift in diffusion processes. Mathematical Methods of Statistics, 13, 1, 25-49.

[6] Galtchouk, L. and Pergamenshchikov, S. (2005) Efficient adaptive nonparametric estimation in heteroscedastic regression models. Preprint of the Strasbourg Louis Pasteur University, IRMA, 2005/020, available at http://www.univ-rouen.fr/LMRS/Persopage/Pergamenchtchikov

[7] Galtchouk, L. and Pergamenshchikov, S. (2006) Asymptotic efficient sequential kernel estimates of the drift coefficient in ergodic diffusion processes. Statistical Inference for Stochastic Process. 9, 1-16.

[8] Galtchouk, L. and Pergamenshchikov, S. (2007a) Adaptive sequential estimation for ergodic diffusion processes in quadratic metric. Part 1. Sharp non-asymptotic oracle inequalities. Mathematical Methods of Statistics, submitted.

[9] Galtchouk, L. and Pergamenshchikov, S. (2007b) Adaptive sequential estimation for ergodic diffusion processes in quadratic metric. Part 2: Asymptotic efficiency. Mathematical Methods of Statistics, submitted.

[10] Galtchouk, L. and Pergamenshchikov, S. (2007c) Adaptive nonparametric estimation in heteroscedastic regression models. Asymptotic efficiency. Probab. Theory Related Fields, submitted.

[11] Goldfeld, S.M. and Quandt, R.E. (1972) Nonlinear Methods in Econometrics. North-Holland, London.

[12] Kalifa, J. and Mallat, S. (2003) Thresholding estimators for linear inverse problems and deconvolutions. Ann. Statist. 31 58-109.

[13] Kneip, A. (1994) Ordered linear smoothers. Ann. Statist. 22 835-866.

[14] Massart, P. (2004) A non-asymptotic theory for model selection. 4ECM Stockholm 2004. 309-323.

[15] Nussbaum, M. (1985) Spline smoothing in regression models and asymptotic efficiency in $L_2$. Ann. Statist. 13 984-997.

[16] Rohde, A. (2004) On the asymptotic equivalence and rate of convergence of nonparametric regression and Gaussian white noise. Statistics & Decisions 22 235-243.

[17] Young, S. G. and Bowman, A. W. (1995) Non-parametric Analysis of Covariance. Biometrics 51 920-931.
