Adaptive nonparametric estimation in heteroscedastic regression models. Part 2: Asymptotic efficiency.


HAL Id: hal-00269303

https://hal.archives-ouvertes.fr/hal-00269303

Submitted on 10 Apr 2008



Leonid Galtchouk, Serguey Pergamenshchikov

To cite this version:

Leonid Galtchouk, Serguey Pergamenshchikov. Adaptive nonparametric estimation in heteroscedastic regression models. Part 2: Asymptotic efficiency. Journal of the Korean Statistical Society, Elsevier, 2009, 35 p. ⟨10.1016/j.jkss.2008.12.001⟩. ⟨hal-00269303⟩

hal-00269303, version 2 - 10 Apr 2008

Adaptive nonparametric estimation in heteroscedastic regression models.

Part 2: Asymptotic efficiency.

By Leonid Galtchouk and Sergey Pergamenshchikov∗

Louis Pasteur University of Strasbourg and University of Rouen

Abstract

In this paper we study asymptotic properties of the adaptive procedure proposed in Galtchouk, Pergamenshchikov, 2007, for nonparametric estimation of an unknown regression function. We prove that this procedure is asymptotically efficient for some quadratic risk, i.e. we show that the asymptotic quadratic risk for this procedure coincides with the Pinsker constant, which gives a sharp lower bound for the quadratic risk over all possible estimates.

∗ The second author is partially supported by the RFFI-Grant 04-01-00855.

1 AMS 2000 Subject Classification: primary 62G08; secondary 62G05, 62G20.

2 Key words: adaptive estimation, asymptotic bounds, efficient estimation, heteroscedastic regression, nonparametric estimation, non-asymptotic estimation, oracle inequality, Pinsker’s constant.

1 Introduction

The paper deals with the estimation problem in the heteroscedastic nonparametric regression model

$$y_j = S(x_j) + \sigma_j(S)\,\xi_j , \tag{1.1}$$

where the design points are $x_j = j/n$, $S(\cdot)$ is an unknown function to be estimated, $(\xi_j)_{1\le j\le n}$ is a sequence of centered i.i.d. random variables with unit variance and $\mathbf{E}\,\xi_1^4 = \xi^* < \infty$, and $(\sigma_j(S))_{1\le j\le n}$ are unknown scale functionals depending on the unknown regression function $S$ and on the design points.

Typically, the notion of asymptotic optimality is associated with the optimal convergence rate of the minimax risk (see, for example, Ibragimov, Hasminskii, 1981; Stone, 1982). An important question in optimality results is to study the exact asymptotic behaviour of the minimax risk. Such results have been obtained only in a limited number of investigations. As to the nonparametric estimation problem for heteroscedastic regression models, we should mention the papers Efromovich, 2007, Efromovich, Pinsker, 1996, and Galtchouk, Pergamenshchikov, 2005, concerning the exact asymptotic behaviour of the $L_2$-risk, and the paper by Brua, 2007, devoted to efficient pointwise estimation for heteroscedastic regressions.

Recall that an example of heteroscedastic regression models arises in econometrics (see, for example, Goldfeld, Quandt, 1972, p. 83), where for consumer budget problems one uses a parametric version of model (1.1) with the scale coefficients defined as

$$\sigma_j^2(S) = c_0 + c_1 x_j + c_2 S^2(x_j) , \tag{1.2}$$

where $c_0$, $c_1$ and $c_2$ are some positive unknown constants.
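To make model (1.1) with the scales (1.2) concrete, the following minimal sketch draws one sample from it. The function name `simulate`, the test function $\sin(2\pi t)$, the numerical values of $c_0, c_1, c_2$ and the choice of uniform noise are illustrative assumptions, not taken from the paper; the only requirements used are that the noise is centered with unit variance.

```python
import numpy as np

def simulate(S, n, c0=1.0, c1=0.5, c2=0.5, rng=None):
    """Draw (x, y) from y_j = S(x_j) + sigma_j(S) * xi_j with sigma_j^2 as in (1.2)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.arange(1, n + 1) / n              # design points x_j = j/n
    s = S(x)
    sigma2 = c0 + c1 * x + c2 * s**2         # scale coefficients (1.2)
    # Centered uniform noise with unit variance (an assumption for illustration:
    # the model does not require Gaussian errors).
    xi = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
    y = s + np.sqrt(sigma2) * xi
    return x, y, sigma2

x, y, sigma2 = simulate(lambda t: np.sin(2 * np.pi * t), n=201)
```

Note that the noise level at each design point depends on the value of the unknown function itself, which is what distinguishes this setting from regression with a known or $S$-independent variance.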

The purpose of the article is to study asymptotic properties of the adaptive estimation procedure proposed in Galtchouk, Pergamenshchikov, 2007, for which a non-asymptotic oracle inequality was proved for quadratic risks.

We will prove that this oracle inequality is asymptotically sharp, i.e. the asymptotic quadratic risk is minimal. This means that the adaptive estimation procedure is efficient under some conditions on the scales $(\sigma_j(S))_{1\le j\le n}$ which are satisfied in the case (1.2). Note that in Efromovich, 2007, and Efromovich, Pinsker, 1996, an efficient adaptive procedure is constructed for heteroscedastic regression when the scale coefficient is independent of $S$, i.e. $\sigma_j(S) = \sigma_j$. In Galtchouk, Pergamenshchikov, 2005, the asymptotic efficiency for the model (1.1) was proved under strong conditions on the scales which are not satisfied in the case (1.2). Moreover, in the cited papers the efficiency was proved for Gaussian random variables $(\xi_j)_{1\le j\le n}$, which is very restrictive for applications of the proposed methods to practical problems.

In this paper we modify the risk by introducing an additional supremum with respect to a class of unknown noise distributions, as in Galtchouk, Pergamenshchikov, 2006. This modification allows us to eliminate the dependence of the risk on the noise distribution. Moreover, for this risk an efficient procedure is robust with respect to changes of the noise distribution.

It is well known that to prove the asymptotic efficiency one has to show that the asymptotic quadratic risk coincides with the lower bound, which is equal to the Pinsker constant. In the paper two problems are resolved: in the first one an upper bound for the risk is obtained by making use of the non-asymptotic oracle inequality from Galtchouk, Pergamenshchikov, 2007; in the second one we prove that this upper bound coincides with the Pinsker constant. Recall that the adaptive procedure proposed in Galtchouk, Pergamenshchikov, 2007, is based on weighted least squares estimates, where the weights are suitable modifications of the Pinsker weights for the homogeneous case (when $\sigma_1(S) = \dots = \sigma_n(S) = 1$) relative to a certain smoothness of the function $S$, and this procedure chooses the estimator that is best for the quadratic risk among these estimates. To obtain the Pinsker constant for the model (1.1) one has to prove a sharp asymptotic lower bound for the quadratic risk in the case when the noise variance depends on the unknown regression function. This lower bound is obtained by making use of an inequality of the van Trees type (see Gill, Levit, 1995). First we prove the inequality for a parametric regression with the noise variance depending on the unknown regression (see Section 6), and then we apply the inequality to the nonparametric regression by a standard reduction to a parametric case.

The paper is organized as follows. In Section 2 we construct an adaptive estimation procedure. In Section 3 we formulate the principal conditions. The main result is given in Section 4. The upper bound for the quadratic risk is given in Section 5. In Section 6 we find the lower bound for a parametric model. In Section 7 we study the parametric family. In Section 8 we obtain the lower bound for model (1.1). An appendix contains some technical results.

2 Adaptive procedure

In this section we describe the adaptive procedure proposed in [6]. We make use of the standard trigonometric basis $(\phi_j)_{j\ge 1}$ in $L_2[0,1]$, i.e.

$$\phi_1(x) = 1 , \qquad \phi_j(x) = \sqrt{2}\, Tr_j(2\pi [j/2] x) , \quad j \ge 2 , \tag{2.1}$$

where $Tr_j(x) = \cos(x)$ for even $j$ and $Tr_j(x) = \sin(x)$ for odd $j$; $[x]$ denotes the integer part of $x$. Recall that if $n$ is odd, then the functions $(\phi_j)_{1\le j\le n}$ are orthonormal with respect to the empirical inner product generated by the sieve $(x_j)_{1\le j\le n}$ in (1.1), i.e. for any $1 \le i, j \le n$,

$$(\phi_i , \phi_j)_n = \frac{1}{n} \sum_{l=1}^{n} \phi_i(x_l)\,\phi_j(x_l) = \mathrm{Kr}_{ij} ,$$

where $\mathrm{Kr}_{ij}$ is Kronecker’s symbol. Thanks to this basis we pass to the discrete Fourier transformation of model (1.1), i.e.

$$\widehat{\vartheta}_{j,n} = \vartheta_{j,n} + \frac{1}{\sqrt{n}}\,\xi_{j,n} , \tag{2.2}$$

where $\widehat{\vartheta}_{j,n} = (Y, \phi_j)_n$, $\vartheta_{j,n} = (S, \phi_j)_n$ and

$$\xi_{j,n} = \frac{1}{\sqrt{n}} \sum_{l=1}^{n} \sigma_l(S)\,\xi_l\,\phi_j(x_l) .$$

Here $Y = (y_1, \dots, y_n)'$ and $S = (S(x_1), \dots, S(x_n))'$. The prime denotes the transposition.
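The orthonormality claim and the passage to empirical Fourier coefficients can be checked numerically. The sketch below builds the basis (2.1) on the grid $x_l = l/n$ for an odd $n$, forms the Gram matrix of the empirical inner product, and computes $\vartheta_{j,n} = (S, \phi_j)_n$ for a simple test function. The helper name `trig_basis` and the test function are assumptions made for illustration.

```python
import numpy as np

def trig_basis(n):
    """Return x_l = l/n and the n x n matrix phi[j-1, l-1] = phi_j(x_l) for the basis (2.1)."""
    x = np.arange(1, n + 1) / n
    phi = np.empty((n, n))
    phi[0] = 1.0                                   # phi_1 = 1
    for j in range(2, n + 1):
        arg = 2 * np.pi * (j // 2) * x             # [j/2] is the integer part
        phi[j - 1] = np.sqrt(2) * (np.cos(arg) if j % 2 == 0 else np.sin(arg))
    return x, phi

n = 11                                             # odd, as required for orthonormality
x, phi = trig_basis(n)
gram = phi @ phi.T / n                             # empirical inner products (phi_i, phi_j)_n

S = np.sin(2 * np.pi * x) + 0.5                    # illustrative test function
theta = phi @ S / n                                # theta_{j,n} = (S, phi_j)_n
```

Because the $n$ basis vectors are orthonormal in $\mathbb{R}^n$ for the empirical inner product, `phi.T @ theta` reproduces the values of $S$ on the grid exactly, which is why no information is lost when passing to the sequence model (2.2).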

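We estimate the function $S$ by the weighted least squares estimator

$$\widehat{S}_\lambda = \sum_{j=1}^{n} \lambda(j)\,\widehat{\vartheta}_{j,n}\,\phi_j , \tag{2.3}$$

where the weight vector $\lambda = (\lambda(1), \dots, \lambda(n))'$ belongs to some finite set $\Lambda$ from $[0,1]^n$ with $n \ge 3$. Here we make use of the weight family $\Lambda$ introduced in [6], i.e.

$$\Lambda = \{ \lambda_\alpha , \ \alpha \in \mathcal{A}_\varepsilon \} , \qquad \mathcal{A}_\varepsilon = \{1, \dots, k_*\} \times \{t_1, \dots, t_m\} , \tag{2.4}$$

where $k_* = [1/\sqrt{\varepsilon}]$, $t_i = i\varepsilon$, $m = [1/\varepsilon^2]$ and $\varepsilon = 1/\ln n$.

For any $\alpha = (\beta, t) \in \mathcal{A}_\varepsilon$ we define the weight vector $\lambda_\alpha = (\lambda_\alpha(1), \dots, \lambda_\alpha(n))'$ as

$$\lambda_\alpha(j) = \mathbf{1}_{\{1 \le j \le j_0\}} + \left(1 - (j/\omega(\alpha))^{\beta}\right) \mathbf{1}_{\{j_0 < j \le \omega(\alpha)\}} , \tag{2.5}$$

where $j_0 = j_0(\alpha) = [\omega(\alpha)/\ln n]$, $\omega(\alpha) = (A_\beta\, t)^{1/(2\beta+1)}\, n^{1/(2\beta+1)}$ and

$$A_\beta = (\beta + 1)(2\beta + 1)/(\pi^{2\beta} \beta) .$$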
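The weight family (2.4)-(2.5) is fully explicit, so it can be tabulated directly. The following sketch is one possible transcription; the function names `weight_vector` and `weight_family` and the sample size are illustrative assumptions.

```python
import math
import numpy as np

def weight_vector(beta, t, n):
    """Weights lambda_alpha(j), j = 1..n, from (2.5) for alpha = (beta, t)."""
    A = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (A * t * n) ** (1.0 / (2 * beta + 1))
    j0 = math.floor(omega / math.log(n))
    j = np.arange(1, n + 1)
    lam = np.where(j <= j0, 1.0, 0.0)              # first indicator in (2.5)
    mid = (j > j0) & (j <= omega)                  # second indicator in (2.5)
    lam[mid] = 1.0 - (j[mid] / omega) ** beta
    return lam

def weight_family(n):
    """The grid A_eps of (2.4) together with the associated weight vectors."""
    eps = 1.0 / math.log(n)
    k_star = math.floor(1.0 / math.sqrt(eps))
    m = math.floor(1.0 / eps**2)
    return {(beta, i * eps): weight_vector(beta, i * eps, n)
            for beta in range(1, k_star + 1) for i in range(1, m + 1)}

family = weight_family(101)
```

The size of the family is $k_* m$, which grows only like a power of $\ln n$; each weight vector equals 1 on the low frequencies, decays polynomially up to the cut-off $\omega(\alpha)$ and vanishes beyond it.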

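To find the optimal weights we choose a cost function equal to the penalized mean integrated squared error in which the unknown parameters are replaced by estimators. The cost function is as follows

$$J_n(\lambda) = \sum_{j=1}^{n} \lambda^2(j)\,\widehat{\vartheta}^2_{j,n} - 2 \sum_{j=1}^{n} \lambda(j)\,\widetilde{\vartheta}_{j,n} + \rho\,\widehat{P}_n(\lambda) , \tag{2.6}$$

where

$$\widetilde{\vartheta}_{j,n} = \widehat{\vartheta}^2_{j,n} - \frac{1}{n}\,\widehat{\varsigma}_n \quad \text{with} \quad \widehat{\varsigma}_n = \sum_{j=l_n+1}^{n} \widehat{\vartheta}^2_{j,n} \tag{2.7}$$

and $l_n = [n^{1/3} + 1]$. The penalty term is defined as

$$\widehat{P}_n(\lambda) = \frac{|\lambda|^2\,\widehat{\varsigma}_n}{n} , \qquad |\lambda|^2 = \sum_{j=1}^{n} \lambda^2(j) \quad \text{and} \quad \rho = \frac{1}{3 + \ln^{\gamma} n}$$

for some $\gamma > 0$. Finally, we set

$$\widehat{\lambda} = \mathrm{argmin}_{\lambda \in \Lambda}\, J_n(\lambda) \quad \text{and} \quad \widehat{S}_* = \widehat{S}_{\widehat{\lambda}} . \tag{2.8}$$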
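Putting (2.3)-(2.8) together, the selection rule can be sketched in a few lines. This is an illustrative transcription under stated assumptions, not the authors' implementation: the helper names, the choice $\gamma = 1$, the simulated data and the reading $\rho = 1/(3 + \ln^\gamma n)$ of the penalty coefficient are all assumptions.

```python
import math
import numpy as np

def basis(n):
    """Trigonometric basis (2.1) evaluated on x_l = l/n; rows are phi_1..phi_n."""
    x = np.arange(1, n + 1) / n
    phi = np.ones((n, n))
    for j in range(2, n + 1):
        arg = 2 * np.pi * (j // 2) * x
        phi[j - 1] = np.sqrt(2) * (np.cos(arg) if j % 2 == 0 else np.sin(arg))
    return phi

def weights(beta, t, n):
    """Weight vector (2.5) for alpha = (beta, t)."""
    A = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (A * t * n) ** (1.0 / (2 * beta + 1))
    j = np.arange(1, n + 1)
    lam = np.where(j <= omega, 1.0 - (j / omega) ** beta, 0.0)
    lam[j <= math.floor(omega / math.log(n))] = 1.0
    return lam

def adaptive_estimate(y, gamma=1.0):
    """Values of S_hat_* at x_1..x_n; n = len(y) is assumed odd."""
    n = len(y)
    phi = basis(n)
    theta_hat = phi @ y / n                        # (Y, phi_j)_n
    l_n = math.floor(n ** (1 / 3) + 1)
    varsigma_hat = np.sum(theta_hat[l_n:] ** 2)    # sum over j = l_n + 1, ..., n
    theta_tilde = theta_hat**2 - varsigma_hat / n  # (2.7)
    rho = 1.0 / (3.0 + math.log(n) ** gamma)
    eps = 1.0 / math.log(n)
    best = None
    for beta in range(1, math.floor(1 / math.sqrt(eps)) + 1):
        for i in range(1, math.floor(1 / eps**2) + 1):
            lam = weights(beta, i * eps, n)
            cost = (np.sum(lam**2 * theta_hat**2) - 2 * np.sum(lam * theta_tilde)
                    + rho * np.sum(lam**2) * varsigma_hat / n)   # J_n(lambda), (2.6)
            if best is None or cost < best[0]:
                best = (cost, lam)
    return phi.T @ (best[1] * theta_hat)           # (2.3) with the selected weights

rng = np.random.default_rng(1)
x = np.arange(1, 202) / 201
y = np.sin(2 * np.pi * x) + np.sqrt(1 + 0.5 * x) * rng.standard_normal(201)
S_hat = adaptive_estimate(y)
```

The procedure uses neither the smoothness $k$, nor the radius $r$, nor the scale function: everything entering $J_n$ is computed from the observations, which is what makes the estimator adaptive.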


The goal of this paper is to study asymptotic (n → ∞ ) properties of this estimation procedure.

3 Conditions

First we impose some conditions on the unknown function $S$ in model (1.1).

Let $C^k_{per,1}(\mathbb{R})$ be the set of 1-periodic $k$ times differentiable $\mathbb{R} \to \mathbb{R}$ functions. We assume that $S$ belongs to the following set

$$W_r^k = \Big\{ f \in C^k_{per,1}(\mathbb{R}) : \sum_{j=0}^{k} \| f^{(j)} \|^2 \le r \Big\} , \tag{3.1}$$

where $\| \cdot \|$ denotes the norm in $L_2[0,1]$, i.e.

$$\| f \|^2 = \int_0^1 f^2(t)\,dt . \tag{3.2}$$

Moreover, we suppose that $r > 0$ and $k \ge 1$ are unknown parameters.

Note that we can represent the set $W_r^k$ as an ellipse in $L_2[0,1]$, i.e.

$$W_r^k = \Big\{ f \in L_2[0,1] : \sum_{j=1}^{\infty} a_j\,\vartheta_j^2 \le r \Big\} , \tag{3.3}$$

where

$$\vartheta_j = (f, \phi_j) = \int_0^1 f(t)\,\phi_j(t)\,dt \tag{3.4}$$

and

$$a_j = \sum_{l=0}^{k} \| \phi_j^{(l)} \|^2 = \sum_{i=0}^{k} (2\pi [j/2])^{2i} . \tag{3.5}$$

Here $(\phi_j)_{j\ge 1}$ is the trigonometric basis defined in (2.1).
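The equivalence of (3.1) and (3.3) rests on the identity $\sum_{l=0}^{k} \|f^{(l)}\|^2 = \sum_j a_j \vartheta_j^2$. The sketch below checks it for a trigonometric polynomial with hand-picked coefficients; the function names, the coefficients and the grid size are assumptions for illustration. The rectangle rule on a uniform grid is exact here because all integrands are trigonometric polynomials of low degree.

```python
import numpy as np

def a_coeff(j, k):
    """a_j = sum_{i=0}^k (2*pi*[j/2])^(2i) from (3.5)."""
    m = 2 * np.pi * (j // 2)
    return sum(m ** (2 * i) for i in range(k + 1))

def sobolev_sum(coefs, k, grid=4096):
    """sum_{l=0}^k ||f^(l)||^2 for f = sum_j coefs[j-1] * phi_j, by the rectangle rule."""
    t = np.arange(grid) / grid
    total = 0.0
    for l in range(k + 1):
        d = np.zeros(grid)                         # values of the l-th derivative of f
        for j, c in enumerate(coefs, start=1):
            if j == 1:
                d += c if l == 0 else 0.0          # phi_1 = 1 has zero derivatives
                continue
            m = 2 * np.pi * (j // 2)
            shift = l * np.pi / 2                  # each derivative shifts the phase by pi/2
            trig = np.cos(m * t + shift) if j % 2 == 0 else np.sin(m * t + shift)
            d += c * np.sqrt(2) * m**l * trig
        total += np.mean(d**2)
    return total

coefs = [0.3, -0.2, 0.1, 0.05, 0.02]
k = 2
lhs = sobolev_sum(coefs, k)
rhs = sum(a_coeff(j, k) * c**2 for j, c in enumerate(coefs, start=1))
```

Since $a_j$ grows like $(\pi j)^{2k}$, membership in $W_r^k$ forces the Fourier coefficients to decay, which is what the weights (2.5) exploit.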

Now we describe the conditions on the scale coefficients $(\sigma_j(S))_{j\ge 1}$.

H$_1$) $\sigma_j(S) = g(x_j, S)$ for some unknown function $g : [0,1] \times L_1[0,1] \to \mathbb{R}_+$, which is square integrable with respect to $x$ and such that

$$\lim_{n\to\infty} \sup_{S \in W_r^k} \Big| n^{-1} \sum_{j=1}^{n} g^2(x_j, S) - \varsigma(S) \Big| = 0 , \tag{3.6}$$

where $\varsigma(S) := \int_0^1 g^2(x, S)\,dx$. Moreover,

$$g_* = \inf_{0\le x\le 1}\ \inf_{S \in W_r^k} g^2(x, S) > 0 \tag{3.7}$$

and

$$\sup_{S \in W_r^k} \varsigma(S) < \infty . \tag{3.8}$$

H$_2$) For any $x \in [0,1]$ the operator $g^2(x, \cdot) : C[0,1] \to \mathbb{R}$ is differentiable in the Fréchet sense at any fixed function $f_0$ from $C[0,1]$, i.e. for any $f$ from some vicinity of $f_0$ in $C[0,1]$

$$g^2(x, f) = g^2(x, f_0) + L_{x,f_0}(f - f_0) + \Upsilon(x, f_0, f) ,$$

where the Fréchet derivative $L_{x,f_0} : C[0,1] \to \mathbb{R}$ is a bounded linear operator and the residual term $\Upsilon(x, f_0, f)$ for each $x \in [0,1]$ satisfies the following property

$$\lim_{|f - f_0|_* \to 0} \frac{|\Upsilon(x, f_0, f)|}{|f - f_0|_*} = 0 ,$$

where $|f|_* = \sup_{0\le t\le 1} |f(t)|$.

H$_3$) There exists some positive constant $C^*$ such that for any function $S$ from $C[0,1]$ the operator $L_{x,S}$ defined in condition H$_2$) satisfies the following inequality for any function $f$ from $C[0,1]$

$$|L_{x,S}(f)| \le C^* \big( |S(x) f(x)| + |f|_1 + \|S\|\,\|f\| \big) , \tag{3.9}$$

where $|f|_1 = \int_0^1 |f(t)|\,dt$.

H$_4$) The function $g_0^2(\cdot) = g^2(\cdot, S_0)$ corresponding to $S_0 \equiv 0$ is continuous on the interval $[0,1]$. Moreover,

$$\lim_{\delta \to 0}\ \sup_{0\le x\le 1}\ \sup_{|S|_* \le \delta} |g^2(x, S) - g^2(x, S_0)| = 0 .$$

Now we give some examples of functions satisfying conditions H$_1$)–H$_4$).

We fix some $c_0 > 0$. Let $G : [0,1] \times \mathbb{R} \to [c_0, +\infty)$ be a function such that

$$\lim_{\delta \to 0}\ \max_{|u - v| \le \delta}\ \sup_{y \in \mathbb{R}} |G(u, y) - G(v, y)| = 0 \tag{3.10}$$

and

$$G'_* = \sup_{0\le x\le 1}\ \sup_{y \in \mathbb{R}} |G_y(x, y)| / |y| < \infty , \tag{3.11}$$

where $G_y$ denotes the partial derivative of $G$ with respect to $y$. Moreover, let $V : \mathbb{R} \to \mathbb{R}_+$ be a continuously differentiable function such that

$$v'_* = \sup_{y \in \mathbb{R}} |\dot{V}(y)| / (1 + |y|) < \infty .$$

We set

$$g^2(x, S) = G(x, S(x)) + \int_0^1 V(S(t))\,dt . \tag{3.12}$$

In this case

$$\varsigma(S) = \int_0^1 G(t, S(t))\,dt + \int_0^1 V(S(t))\,dt$$

and for any $S \in W_r^k$

$$\Big| n^{-1} \sum_{j=1}^{n} g^2(x_j, S) - \varsigma(S) \Big| \le \sum_{j=1}^{n} \int_{x_{j-1}}^{x_j} \big| G(x_j, S(x_j)) - G(t, S(t)) \big|\,dt \le \Delta_n + \frac{G'_*}{n} \int_0^1 |S(t)|\,|\dot{S}(t)|\,dt \le \Delta_n + \frac{G'_*}{n}\, r ,$$

where

$$\Delta_n = \max_{|u - v| \le 1/n}\ \sup_{y \in \mathbb{R}} |G(u, y) - G(v, y)| .$$

Therefore by condition (3.10) we obtain H$_1$).

Moreover, the Fréchet derivative in this case is given by

$$L_{x,S}(f) = G_y(x, S(x))\,f(x) + \int_0^1 \dot{V}(S(t))\,f(t)\,dt .$$

It is easy to see that this operator satisfies the inequality (3.9) with $C^* = G'_* + v'_*$.

For example, we can take in (3.12)

$$G(x, y) = c_0 + c_1 x + c_2 y^2 \quad \text{and} \quad V(x) = c_3 x^2 \tag{3.13}$$

with some coefficients $c_0 > 0$, $c_i \ge 0$, $i = 1, 2, 3$. Therefore, we obtain the function (1.2) if we put $c_3 = 0$ in (3.12)-(3.13), i.e. $V \equiv 0$.
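Condition (3.6) can be observed numerically for the example (3.12)-(3.13). The sketch below evaluates the gap between the Riemann sum over the design points and $\varsigma(S)$ for a fixed smooth $S$; the coefficient values, the test function and the helper names are assumptions chosen for illustration, and the integrals are approximated by the midpoint rule.

```python
import numpy as np

c0, c1, c2, c3 = 1.0, 0.5, 0.3, 0.2               # illustrative coefficients in (3.13)

def S(t):
    return np.sin(2 * np.pi * t)                   # illustrative regression function

def g2(x, fine=20000):
    """g^2(x, S) = G(x, S(x)) + int_0^1 V(S(t)) dt with G, V from (3.13)."""
    t = (np.arange(fine) + 0.5) / fine             # midpoint rule for the integral of V
    return c0 + c1 * x + c2 * S(x) ** 2 + np.mean(c3 * S(t) ** 2)

def riemann_gap(n, fine=20000):
    """| n^{-1} sum_j g^2(x_j, S) - varsigma(S) |, the quantity controlled in (3.6)."""
    xj = np.arange(1, n + 1) / n
    t = (np.arange(fine) + 0.5) / fine
    varsigma = np.mean(g2(t, fine))                # midpoint approximation of varsigma(S)
    return abs(np.mean(g2(xj, fine)) - varsigma)

gaps = [riemann_gap(n) for n in (11, 101, 1001)]
```

For this particular choice the gap decays at the rate $1/n$, in line with the bound $\Delta_n + G'_* r / n$ derived above, where $\Delta_n = c_1/n$.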

4 Main results

Denote by $\mathcal{P}_*$ the family of unknown noise densities. Recall that the noise random variables $(\xi_j)_{1\le j\le n}$ are centered with unit variance and $\mathbf{E}\,\xi_1^4 \le \xi^*$, where $\xi^* \ge 3$. For any estimate $\widehat{S}$ we define the following quadratic risk

$$\mathcal{R}_n(\widehat{S}, S) = \sup_{p \in \mathcal{P}_*} \mathbf{E}_{S,p}\, \| \widehat{S} - S \|_n^2 , \tag{4.1}$$

where $\mathbf{E}_{S,p}$ is the expectation with respect to the distribution $\mathbf{P}_{S,p}$ of the observations $(y_1, \dots, y_n)$ with the fixed function $S$ and the fixed density $p$ of the random variables $(\xi_j)_{1\le j\le n}$ in model (1.1), and $\| S \|_n^2 = (S, S)_n$.

In Galtchouk, Pergamenshchikov, 2007, we have shown the following non-asymptotic oracle inequality for procedure (2.8).

Theorem 4.1. Assume that in model (1.1) the function $S$ belongs to $W_r^1$. Then, for any odd $n \ge 3$, any $0 < \rho < 1/3$ and $r > 0$, the estimate $\widehat{S}_*$ satisfies the following oracle inequality

$$\mathcal{R}_n(\widehat{S}_*, S) \le (1 + \kappa(\rho)) \min_{\lambda \in \Lambda} \mathcal{R}_n(\widehat{S}_\lambda, S) + n^{-1}\,\mathcal{B}_n(\rho) , \tag{4.2}$$

where $\kappa(\rho) = (6\rho - 2\rho^2)/(1 - 3\rho)$ and the function $\mathcal{B}_n(\rho)$ is such that, for any $\delta > 0$,

$$\lim_{n\to\infty} \mathcal{B}_n(\rho)/n^{\delta} = 0 . \tag{4.3}$$

Now we formulate the main asymptotic results. To this end, for any function $S \in W_r^k$ we set

$$\gamma_k(S) = \Gamma_k^*\, r^{1/(2k+1)}\, (\varsigma(S))^{2k/(2k+1)} , \tag{4.4}$$

where

$$\Gamma_k^* = (2k + 1)^{1/(2k+1)}\, (k/(\pi (k + 1)))^{2k/(2k+1)} .$$

It is well known (see, for example, Nussbaum, 1985) that for any function $S \in W_r^k$ the optimal convergence rate is $n^{2k/(2k+1)}$.
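The constant (4.4) is straightforward to evaluate. The sketch below is a direct transcription of the two formulas; the function names and the sample arguments are illustrative assumptions.

```python
import math

def gamma_star(k):
    """Gamma*_k = (2k+1)^{1/(2k+1)} * (k / (pi (k+1)))^{2k/(2k+1)}."""
    q = 2 * k + 1
    return q ** (1 / q) * (k / (math.pi * (k + 1))) ** (2 * k / q)

def pinsker_constant(k, r, varsigma):
    """gamma_k(S) from (4.4), with varsigma standing for varsigma(S)."""
    q = 2 * k + 1
    return gamma_star(k) * r ** (1 / q) * varsigma ** (2 * k / q)

value = pinsker_constant(k=2, r=1.0, varsigma=1.5)
```

The dependence on the noise enters only through the integrated variance $\varsigma(S)$, raised to the power $2k/(2k+1)$, so doubling $\varsigma(S)$ multiplies the constant by $2^{2k/(2k+1)}$.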

Theorem 4.2. Assume that in model (1.1) the sequence $(\sigma_j(S))$ fulfils the condition H$_1$). Then the estimator $\widehat{S}_*$ from (2.8) satisfies the inequality

$$\limsup_{n\to\infty}\ n^{2k/(2k+1)} \sup_{S \in W_r^k} \mathcal{R}_n(\widehat{S}_*, S)/\gamma_k(S) \le 1 . \tag{4.5}$$

The following result gives the sharp lower bound for risk (4.1) and shows that $\gamma_k(S)$ is the Pinsker constant.

Theorem 4.3. Assume that in model (1.1) the sequence $(\sigma_j(S))$ satisfies the conditions H$_2$)–H$_4$). Then, for any estimate $\widehat{S}_n$, the risk $\mathcal{R}_n(\widehat{S}_n, S)$ admits the following asymptotic lower bound

$$\liminf_{n\to\infty}\ n^{2k/(2k+1)} \inf_{\widehat{S}_n} \sup_{S \in W_r^k} \mathcal{R}_n(\widehat{S}_n, S)/\gamma_k(S) \ge 1 . \tag{4.6}$$

Remark 4.1. Note that in Galtchouk, Pergamenshchikov, 2005, an asymptotically efficient estimate was constructed and results similar to Theorems 4.2 and 4.3 were claimed for the model (1.1). In fact, the upper bound there holds only under some additional condition on the smoothness of the function $S$, i.e. on the parameter $k$. In the cited paper this additional condition is not formulated because of the erroneous inequality (A.6). To avoid the use of this inequality we modify the estimation procedure by introducing the penalty term $\rho\,\widehat{P}_n(\lambda)$ in the cost function (2.6). In this way we remove all additional conditions on the smoothness parameter $k$.

5 Upper bound

In this section we prove Theorem 4.2. To this end we will make use of oracle inequality (4.2). We have to find an estimator from the family (2.3)-(2.4) for which we can show the upper bound (4.5). We start with the construction of such an estimator. First we put

$$\widetilde{l}_n = \inf\{ i \ge 1 : i\varepsilon \ge r(S) \} \wedge m \quad \text{and} \quad r(S) = r/\varsigma(S) . \tag{5.1}$$

Then we choose an index from the set $\mathcal{A}_\varepsilon$ as

$$\widetilde{\alpha} = (k, \widetilde{t}_n) ,$$

where $k$ is the parameter of the set $W_r^k$ and $\widetilde{t}_n = \widetilde{l}_n\,\varepsilon$. Finally, we set

$$\widetilde{S} = \widehat{S}_{\widetilde{\lambda}} \quad \text{and} \quad \widetilde{\lambda} = \lambda_{\widetilde{\alpha}} . \tag{5.2}$$

Now we show the upper bound (4.5) for this estimator.
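The index (5.1) simply rounds $r(S) = r/\varsigma(S)$ up to the grid $\{\varepsilon, 2\varepsilon, \dots, m\varepsilon\}$. A minimal sketch of this rounding follows; the function name `oracle_index` and the sample values are assumptions, and the quantities $r$ and $\varsigma(S)$ are of course unknown in practice, so this is an oracle computation only.

```python
import math

def oracle_index(r, varsigma, n):
    """Return (l_tilde, t_tilde) from (5.1): the smallest grid point i*eps >= r/varsigma, capped at m."""
    eps = 1.0 / math.log(n)
    m = math.floor(1.0 / eps**2)
    r_bar = r / varsigma                           # r(S) in the notation of (5.1)
    l_tilde = min(max(1, math.ceil(r_bar / eps)), m)
    return l_tilde, l_tilde * eps

l_tilde, t_tilde = oracle_index(r=2.0, varsigma=1.0, n=1000)
```

Since the grid step $\varepsilon = 1/\ln n$ tends to zero while the upper end $m\varepsilon$ tends to infinity, $\widetilde{t}_n$ approximates $r(S)$ with vanishing relative error as $n$ grows.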

Theorem 5.1. Assume that condition H$_1$) holds. Then

$$\limsup_{n\to\infty}\ n^{2k/(2k+1)} \sup_{S \in W_r^k} \mathcal{R}_n(\widetilde{S}, S)/\gamma_k(S) \le 1 . \tag{5.3}$$

Remark 5.1. Note that the estimator $\widetilde{S}$ belongs to the estimate family (2.3)-(2.4), but we cannot use this estimator directly because the parameters $k$, $r$ and $r(S)$ are unknown. We can use this upper bound only through the oracle inequality (4.2) proved for procedure (2.8).

Proof. To prove the theorem we will adapt to the heteroscedastic case the corresponding proof from Nussbaum, 1985.

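First, from (2.3) we obtain that, for any $p \in \mathcal{P}_*$,

$$\mathbf{E}_{S,p}\, \| \widetilde{S} - S \|_n^2 = \sum_{j=1}^{n} (1 - \widetilde{\lambda}_j)^2\, \vartheta^2_{j,n} + \frac{1}{n} \sum_{j=1}^{n} \widetilde{\lambda}_j^2\, \varsigma_{j,n} , \tag{5.4}$$

where

$$\varsigma_{j,n} = \frac{1}{n} \sum_{l=1}^{n} \sigma_l^2(S)\, \phi_j^2(x_l) .$$

Setting now $\widetilde{\omega} = \omega(\widetilde{\alpha})$, $\widetilde{j}_0 = [\widetilde{\omega}/\ln n]$, $\widetilde{j}_1 = [\widetilde{\omega} \ln n]$ and

$$\varsigma_n = \frac{1}{n} \sum_{l=1}^{n} \sigma_l^2(S) ,$$

we rewrite (5.4) as follows

$$\mathbf{E}_{S,p}\, \| \widetilde{S} - S \|_n^2 = \sum_{j=\widetilde{j}_0+1}^{\widetilde{j}_1-1} (1 - \widetilde{\lambda}_j)^2\, \vartheta^2_{j,n} + \varsigma_n\, n^{-1} \sum_{j=1}^{n} \widetilde{\lambda}_j^2 + \Delta_1(n) + \Delta_2(n) \tag{5.5}$$

with

$$\Delta_1(n) = \sum_{j=\widetilde{j}_1}^{n} \vartheta^2_{j,n} \quad \text{and} \quad \Delta_2(n) = n^{-1} \sum_{j=1}^{n} \widetilde{\lambda}_j^2 \left( \varsigma_{j,n} - \varsigma_n \right) .$$

Note that we have decomposed the first term on the right-hand side of (5.4) into the sum

$$\sum_{j=\widetilde{j}_0+1}^{\widetilde{j}_1-1} (1 - \widetilde{\lambda}_j)^2\, \vartheta^2_{j,n} + \Delta_1(n) .$$

This decomposition allows us to show that $\Delta_1(n)$ is negligible and further to approximate the first term by a similar term in which the coefficients $\vartheta_{j,n}$ are replaced by the Fourier coefficients $\vartheta_j$ of the function $S$.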

Taking into account the definition of $\omega(\alpha)$ in (2.5) we can bound $\widetilde{\omega}$ as

$$\widetilde{\omega} \ge (A_k)^{1/(2k+1)}\, n^{1/(2k+1)}\, (\ln n)^{-1/(2k+1)} .$$

Therefore, by Lemma A.1 we obtain

$$\lim_{n\to\infty} \sup_{S \in W_r^k} n^{2k/(2k+1)}\, \Delta_1(n) = 0 .$$

Let us consider now the next term $\Delta_2(n)$. We have

$$|\Delta_2(n)| = \Big| \frac{1}{n^2} \sum_{d=1}^{n} \sigma_d^2(S) \sum_{j=1}^{n} \widetilde{\lambda}_j^2\, \overline{\phi}_j(x_d) \Big| \le \frac{\sigma_*}{n} \sup_{0\le x\le 1} \Big| \sum_{j=1}^{n} \widetilde{\lambda}_j^2\, \overline{\phi}_j(x) \Big| ,$$

where $\overline{\phi}_j(x) = \phi_j^2(x) - 1$. Now by Lemma A.2 and definition (2.5) we obtain directly the same property for $\Delta_2(n)$, i.e.

$$\lim_{n\to\infty} \sup_{S \in W_r^k} n^{2k/(2k+1)}\, |\Delta_2(n)| = 0 .$$

Setting

$$\widehat{\gamma}_{k,n}(S) = n^{2k/(2k+1)} \sum_{j=\widetilde{j}_0}^{\widetilde{j}_1-1} (1 - \widetilde{\lambda}_j)^2\, \vartheta_j^2 + \varsigma_n\, n^{-1/(2k+1)} \sum_{j=1}^{n} \widetilde{\lambda}_j^2$$

and applying the well-known inequality

$$(a + b)^2 \le (1 + \delta)\,a^2 + (1 + 1/\delta)\,b^2$$

to the first term on the right-hand side of (5.5) we obtain that, for any $\delta > 0$ and for any $p \in \mathcal{P}_*$,

$$\mathbf{E}_{S,p}\, \| \widetilde{S} - S \|_n^2 \le (1 + \delta)\, \widehat{\gamma}_{k,n}(S)\, n^{-2k/(2k+1)} + \Delta_1(n) + \Delta_2(n) + (1 + 1/\delta)\, \Delta_3(n) , \tag{5.6}$$

where

$$\Delta_3(n) = \sum_{j=\widetilde{j}_0+1}^{\widetilde{j}_1-1} (\vartheta_{j,n} - \vartheta_j)^2 .$$

Taking into account that $k \ge 1$ and that

$$\widetilde{j}_1 \le (A_k)^{1/(2k+1)}\, n^{1/(2k+1)}\, (\ln n)^{(2k+2)/(2k+1)} ,$$

we can show through Lemma A.3 that

$$\lim_{n\to\infty} \sup_{S \in W_r^k} n^{2k/(2k+1)}\, \Delta_3(n) = 0 .$$

Therefore inequality (5.6) yields

$$\limsup_{n\to\infty}\ n^{2k/(2k+1)} \sup_{S \in W_r^k} \mathcal{R}_n(\widetilde{S}, S)/\gamma_k(S) \le \limsup_{n\to\infty} \sup_{S \in W_r^k} \widehat{\gamma}_{k,n}(S)/\gamma_k(S)$$

and to prove (5.3) it suffices to show that

$$\limsup_{n\to\infty} \sup_{S \in W_r^k} \widehat{\gamma}_{k,n}(S)/\gamma_k(S) \le 1 . \tag{5.7}$$

First it should be noted that definition (5.1) and inequalities (3.7)-(3.8) imply directly

$$\lim_{n\to\infty} \sup_{S \in W_r^k} \big| \widetilde{t}_n / r(S) - 1 \big| = 0 .$$

Moreover, by the definition of $(\widetilde{\lambda}_j)_{1\le j\le n}$, for sufficiently large $n$ for which $\widetilde{t}_n \ge r(S)$ we can calculate the following supremum

$$\sup_{j\ge 1}\ n^{2k/(2k+1)}\, (1 - \widetilde{\lambda}_j)^2/(\pi j)^{2k} = \pi^{-2k}\, (A_k\, \widetilde{t}_n)^{-2k/(2k+1)} \le \pi^{-2k}\, (A_k\, r(S))^{-2k/(2k+1)} .$$

Therefore, taking into account the definition of the coefficients $(a_j)_{j\ge 1}$ in (3.5) we obtain that

$$\limsup_{n\to\infty}\ n^{2k/(2k+1)} \sup_{S \in W_r^k}\ \sup_{j \ge \widetilde{j}_0}\ \pi^{2k}\, (A_k\, r(S))^{2k/(2k+1)}\, (1 - \widetilde{\lambda}_j)^2 / a_j \le 1 .$$

Moreover, by definition (2.5) we get that

$$\lim_{n\to\infty} \sup_{S \in W_r^k} \Big| n^{-1/(2k+1)} \sum_{j=1}^{n} \widetilde{\lambda}_j^2 - (A_k\, r(S))^{1/(2k+1)} \int_0^1 (1 - z^k)^2\,dz \Big| = 0 .$$

Taking into account the definition of $W_r^k$ in (3.3) and condition (3.6) we obtain inequality (5.7). Hence Theorem 5.1.

Now Theorem 4.1 and Theorem 5.1 imply Theorem 4.2.
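The last step of the proof combines a bias bound and a variance limit. Under our reading of that step, the bias part is at most $r\,\pi^{-2k}(A_k r(S))^{-2k/(2k+1)}$ (using $\sum_j a_j\vartheta_j^2 \le r$) and the variance part tends to $\varsigma(S)\,(A_k r(S))^{1/(2k+1)}\int_0^1 (1-z^k)^2 dz$. The sketch below checks numerically that these two quantities add up to $\gamma_k(S)$ from (4.4); the function names and the test values are assumptions for illustration.

```python
import math

def limit_of_gamma_hat(k, r, varsigma):
    """Sum of the limiting bias and variance bounds, with r(S) = r / varsigma."""
    q = 2 * k + 1
    A = (k + 1) * q / (math.pi ** (2 * k) * k)     # A_k from (2.5)
    r_bar = r / varsigma
    bias = r * math.pi ** (-2 * k) * (A * r_bar) ** (-2 * k / q)
    integral = 2 * k**2 / ((k + 1) * q)            # closed form of int_0^1 (1 - z^k)^2 dz
    variance = varsigma * (A * r_bar) ** (1 / q) * integral
    return bias + variance

def pinsker_constant(k, r, varsigma):
    """gamma_k(S) from (4.4)."""
    q = 2 * k + 1
    gamma_star = q ** (1 / q) * (k / (math.pi * (k + 1))) ** (2 * k / q)
    return gamma_star * r ** (1 / q) * varsigma ** (2 * k / q)

pairs = [(limit_of_gamma_hat(k, r, v), pinsker_constant(k, r, v))
         for k in (1, 2, 3, 5) for r in (0.5, 2.0) for v in (0.7, 3.0)]
```

The agreement of the two columns for every tested $(k, r, \varsigma)$ reflects the fact that $A_k$ is exactly the value at which the bias and variance contributions balance.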

6 Lower bound for parametric heteroscedastic regression models

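Let $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n), \mathbf{P}_\vartheta, \vartheta \in \Theta \subseteq \mathbb{R}^l)$ be a statistical model relative to the observations $(y_j)_{1\le j\le n}$ governed by the regression equation

$$y_j = S_\vartheta(x_j) + \sigma_j(\vartheta)\, \xi_j , \tag{6.1}$$

where $\xi_1, \dots, \xi_n$ are i.i.d. $\mathcal{N}(0,1)$ random variables, $\vartheta = (\vartheta_1, \dots, \vartheta_l)'$ is an unknown parameter vector, $S_\vartheta(x)$ is an unknown (or known) function and $\sigma_j(\vartheta) = g(x_j, S_\vartheta)$, with the function $g(x, S)$ defined in condition H$_1$). Assume that a prior distribution $\mu_\vartheta$ of the parameter $\vartheta$ in $\mathbb{R}^l$ is defined by the density $\Phi(\vartheta)$ of the following form

$$\Phi(\vartheta) = \Phi(\vartheta_1, \dots, \vartheta_l) = \prod_{i=1}^{l} \varphi_i(\vartheta_i) ,$$

where $\varphi_i$ is a continuously differentiable bounded density on $\mathbb{R}$ with

$$I_i = \int_{\mathbb{R}} \frac{(\dot{\varphi}_i(z))^2}{\varphi_i(z)}\,dz < \infty .$$

Let $\lambda(\cdot)$ be a continuously differentiable $\mathbb{R}^l \to \mathbb{R}$ function such that, for any $1 \le i \le l$,

$$\lim_{|\vartheta_i| \to \infty} \lambda(\vartheta)\, \varphi_i(\vartheta_i) = 0 \quad \text{and} \quad \int_{\mathbb{R}^l} |\lambda'_i(\vartheta)|\, \Phi(\vartheta)\,d\vartheta < \infty , \tag{6.2}$$

where

$$\lambda'_i(\vartheta) = (\partial/\partial\vartheta_i)\, \lambda(\vartheta) .$$

Let $\widehat{\lambda}_n$ be an estimator of $\lambda(\vartheta)$ based on the observations $(y_j)_{1\le j\le n}$. For any $\mathcal{B}(\mathbb{R}^n \times \mathbb{R}^l)$-measurable integrable function $G(x, \vartheta)$, $x \in \mathbb{R}^n$, $\vartheta \in \mathbb{R}^l$, we set

$$\widetilde{\mathbf{E}}\, G(Y, \vartheta) = \int_{\mathbb{R}^l} \mathbf{E}_\vartheta\, G(Y, \vartheta)\, \Phi(\vartheta)\,d\vartheta ,$$

where $\mathbf{E}_\vartheta$ is the expectation with respect to the distribution $\mathbf{P}_\vartheta$ of the vector $Y = (y_1, \dots, y_n)$. Note that in this case

$$\mathbf{E}_\vartheta\, G(Y, \vartheta) = \int_{\mathbb{R}^n} G(v, \vartheta)\, f(v, \vartheta)\,dv ,$$

where

$$f(v, \vartheta) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j(\vartheta)} \exp\Big\{ - \frac{(v_j - S_\vartheta(x_j))^2}{2\sigma_j^2(\vartheta)} \Big\} . \tag{6.3}$$

We prove the following result.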

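Theorem 6.1. Assume that conditions H$_1$)–H$_2$) hold. Moreover, assume that the function $S_\vartheta(\cdot)$ is differentiable in $C[0,1]$ with respect to $\vartheta_i$, $1 \le i \le l$, uniformly over $0 \le x \le 1$, i.e. for any $1 \le i \le l$ there exists a function $S'_{\vartheta,i} \in C[0,1]$ such that

$$\lim_{h \to 0}\ \max_{0\le x\le 1} \big| S_{\vartheta + h e_i}(x) - S_\vartheta(x) - S'_{\vartheta,i}(x)\,h \big| / |h| = 0 , \tag{6.4}$$

where $e_i = (0, \dots, 1, \dots, 0)'$; all coordinates are 0, except the $i$th, which equals 1. Then for any square integrable estimator $\widehat{\lambda}_n$ of $\lambda(\vartheta)$ and any $1 \le i \le l$,

$$\widetilde{\mathbf{E}}\,(\widehat{\lambda}_n - \lambda)^2 \ge \Lambda_i^2/(F_i + B_i + I_i) , \tag{6.5}$$

where $\Lambda_i = \int_{\mathbb{R}^l} \lambda'_i(\vartheta)\, \Phi(\vartheta)\,d\vartheta$, $F_i = \sum_{j=1}^{n} \int_{\mathbb{R}^l} (S'_{\vartheta,i}(x_j)/\sigma_j(\vartheta))^2\, \Phi(\vartheta)\,d\vartheta$ and

$$B_i = \frac{1}{2} \sum_{j=1}^{n} \int_{\mathbb{R}^l} \frac{\widetilde{L}_i^2(x_j, \vartheta)}{\sigma_j^4(\vartheta)}\, \Phi(\vartheta)\,d\vartheta ,$$

with $\widetilde{L}_i(x, \vartheta) = L_{x,S_\vartheta}(S'_{\vartheta,i})$, where the operator $L_{x,S}$ is defined in the condition H$_2$).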

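Proof. We put

$$\varrho_i(v, \vartheta) = \frac{1}{f(v, \vartheta)\,\Phi(\vartheta)}\, \frac{\partial}{\partial\vartheta_i} \big( f(v, \vartheta)\,\Phi(\vartheta) \big) .$$

Note that due to condition (3.7) the density (6.3) is bounded, i.e.

$$f(v, \vartheta) \le (2\pi g_*)^{-n/2} .$$

So through (6.2) we obtain that

$$\lim_{|\vartheta_i| \to \infty} \lambda(\vartheta)\, f(v, \vartheta)\, \varphi_i(\vartheta_i) = 0 .$$

Therefore, integrating by parts yields

$$\widetilde{\mathbf{E}}\,(\widehat{\lambda}_n - \lambda)\,\varrho_i = \int_{\mathbb{R}^{n+l}} (\widehat{\lambda}_n(v) - \lambda(\vartheta))\, \frac{\partial}{\partial\vartheta_i} \big( f(v, \vartheta)\,\Phi(\vartheta) \big)\,d\vartheta\,dv = \int_{\mathbb{R}^l} \Big( \frac{\partial}{\partial\vartheta_i} \lambda(\vartheta) \Big)\, \Phi(\vartheta) \Big( \int_{\mathbb{R}^n} f(v, \vartheta)\,dv \Big)\,d\vartheta = \Lambda_i .$$

Now the Bouniakovskii-Cauchy-Schwarz inequality gives the following lower bound

$$\widetilde{\mathbf{E}}\,(\widehat{\lambda}_n - \lambda)^2 \ge \Lambda_i^2 / \widetilde{\mathbf{E}}\,\varrho_i^2 .$$

To estimate the denominator in the last ratio, note that

$$\varrho_i(v, \vartheta) = \frac{1}{f(v, \vartheta)}\, \frac{\partial}{\partial\vartheta_i} f(v, \vartheta) + \frac{\dot{\varphi}_i(\vartheta_i)}{\varphi_i(\vartheta_i)} = \widetilde{f}_i(v, \vartheta) + \frac{\dot{\varphi}_i(\vartheta_i)}{\varphi_i(\vartheta_i)} ,$$

where

$$\widetilde{f}_i(v, \vartheta) = (\partial/\partial\vartheta_i) \ln f(v, \vartheta) .$$

From (6.1) it follows that

$$\widetilde{f}_i(Y, \vartheta) = \sum_{j=1}^{n} (\xi_j^2 - 1)\, \frac{1}{2\sigma_j^2(\vartheta)}\, \frac{\partial}{\partial\vartheta_i} \sigma_j^2(\vartheta) + \sum_{j=1}^{n} \xi_j\, \frac{S'_{\vartheta,i}(x_j)}{\sigma_j(\vartheta)} .$$

Moreover, conditions H$_2$) and (6.4) imply

$$(\partial/\partial\vartheta_i)\, \sigma_j^2(\vartheta) = (\partial/\partial\vartheta_i)\, g^2(x_j, S_\vartheta) = \widetilde{L}_i(x_j, \vartheta) ,$$

from which it follows

$$\widetilde{\mathbf{E}}\, \big( \widetilde{f}_i(Y, \vartheta) \big)^2 = F_i + B_i .$$

Since $\mathbf{E}_\vartheta\, \widetilde{f}_i(Y, \vartheta) = 0$, the cross term between $\widetilde{f}_i$ and $\dot{\varphi}_i/\varphi_i$ vanishes, so that $\widetilde{\mathbf{E}}\,\varrho_i^2 = F_i + B_i + I_i$. This implies inequality (6.5). Hence Theorem 6.1.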
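The identity $\widetilde{\mathbf{E}}(\widetilde{f}_i)^2 = F_i + B_i$ comes from $\mathbf{E}\,\xi^2 = 1$, $\mathbf{E}(\xi^2 - 1)^2 = 2$ and $\mathbf{E}\,\xi(\xi^2 - 1) = 0$ for standard Gaussian $\xi$. The sketch below verifies the single-observation version by Gauss-Hermite quadrature, which is exact for the polynomial integrand; the function name and the numerical values of the derivatives are assumptions for illustration.

```python
import numpy as np

def score_second_moment(dS, sigma2, dsigma2, deg=20):
    """E[(score)^2] for one observation y = S + sigma*xi with xi ~ N(0,1).

    dS and dsigma2 stand for the derivatives of S_theta(x_j) and sigma_j^2(theta)
    with respect to theta_i at a fixed parameter value.
    """
    nodes, w = np.polynomial.hermite_e.hermegauss(deg)   # weight function exp(-x^2/2)
    w = w / np.sqrt(2 * np.pi)                           # normalise to the N(0,1) density
    score = (nodes**2 - 1) * dsigma2 / (2 * sigma2) + nodes * dS / np.sqrt(sigma2)
    return np.sum(w * score**2)

dS, sigma2, dsigma2 = 0.7, 1.5, 0.4
lhs = score_second_moment(dS, sigma2, dsigma2)
rhs = dS**2 / sigma2 + dsigma2**2 / (2 * sigma2**2)      # one summand of F_i + B_i
```

The second summand of `rhs` is the extra Fisher information contributed by the dependence of the variance on the parameter; it is absent when the scales do not depend on $S$.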

7 Parametric kernel function family

In this section we define and study a special parametric family of kernel functions which will be used to prove the sharp lower bound (4.6).

Let us begin with kernel functions. We fix $\eta > 0$ and we set

$$I_\eta(x) = \eta^{-1} \int_{\mathbb{R}} \mathbf{1}_{(|u| \le 1 - \eta)}\, V\Big( \frac{u - x}{\eta} \Big)\,du , \tag{7.1}$$

where $\mathbf{1}_A$ is the indicator of a set $A$, the kernel $V \in C^\infty(\mathbb{R})$ is such that $V(u) = 0$ for $|u| \ge 1$ and

$$\int_{-1}^{1} V(u)\,du = 1 .$$

It is easy to see that the function $I_\eta(x)$ possesses the properties:

$$0 \le I_\eta \le 1 , \quad I_\eta(x) = 1 \ \text{for} \ |x| \le 1 - 2\eta \quad \text{and} \quad I_\eta(x) = 0 \ \text{for} \ |x| \ge 1 .$$
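The three properties of $I_\eta$ can be seen on a discretised version of (7.1). In the sketch below the kernel $V$ is taken to be the standard bump $\exp\{-1/(1-w^2)\}$, normalised numerically; this particular $V$ (which is nonnegative, as the bound $0 \le I_\eta \le 1$ requires), the grid size and the function name are assumptions for illustration.

```python
import numpy as np

W = np.linspace(-1.0, 1.0, 4001)[1:-1]            # interior grid on (-1, 1)
BUMP = np.exp(-1.0 / (1.0 - W**2))                # C-infinity kernel vanishing at +-1
BUMP /= BUMP.sum()                                # discrete normalisation: weights sum to 1

def I_eta(x, eta):
    """Discretised I_eta(x) = int 1{|x + eta*w| <= 1 - eta} V(w) dw.

    This is (7.1) after the substitution u = x + eta * w.
    """
    inside = np.abs(x + eta * W) <= 1.0 - eta
    return float(BUMP[inside].sum())

values = [I_eta(x, 0.1) for x in (0.0, 0.8, 0.9, 1.0)]
```

For $|x| \le 1 - 2\eta$ the whole support of the rescaled kernel lies inside $[-(1-\eta), 1-\eta]$, so the integral equals 1; for $|x| \ge 1$ the support misses that interval entirely, so the integral is 0; in between, $I_\eta$ interpolates smoothly.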

Moreover, for any $c > 0$ and $m \ge 1$

$$\lim_{\eta \to 0}\ \sup_{f : |f|_* \le c} \Big| \int_{\mathbb{R}} f(x)\, I_\eta^m(x)\,dx - \int_{-1}^{1} f(x)\,dx \Big| = 0 , \tag{7.2}$$

where $|f|_* = \sup_{-1\le x\le 1} |f(x)|$.

We divide the interval $[0,1]$ into $M$ equal parts of length $2h$ and on each of them we construct a kernel-type function of the kind used in Ibragimov, Hasminskii, 1981, to obtain the lower bound for estimation at a fixed point. Each function constructed in this way vanishes at the endpoints of its interval together with all its derivatives. This means that the partial Fourier sums with respect to the trigonometric basis in $L_2[-1,1]$ give a natural parametric approximation to the function on each interval.

Let $(e_j)_{j\ge 1}$ be the trigonometric basis in $L_2[-1,1]$, i.e.

$$e_1 = 1/\sqrt{2} , \quad e_j(x) = Tr_j(\pi [j/2] x) , \quad j \ge 2 , \tag{7.3}$$

where $Tr_j(x) = \cos(x)$ for even $j$ and $Tr_j(x) = \sin(x)$ for odd $j$.

Now, for any array $z = \{ (z_{m,j})_{1\le m\le M_n,\, 1\le j\le N_n} \}$ we define the following function

$$S_{z,n}(x) = \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} z_{m,j}\, D_{m,j}(x) , \tag{7.4}$$

where $D_{m,j}(x) = e_j(v_m(x))\, I_\eta(v_m(x))$,

$$v_m(x) = (x - \widetilde{x}_m)/h_n , \quad \widetilde{x}_m = 2 m h_n \quad \text{and} \quad M_n = [1/(2h_n)] - 1 .$$

We assume that the sequences $(N_n)_{n\ge 1}$ and $(h_n)_{n\ge 1}$ satisfy the following conditions.

A$_1$) The sequence $N_n \to \infty$ as $n \to \infty$ and for any $p > 0$

$$\lim_{n\to\infty} N_n^p / n = 0 .$$

Moreover, there exist $0 < \delta_1 < 1$ and $\delta_2 > 0$ such that

$$h_n = O(n^{-\delta_1}) \quad \text{and} \quad h_n^{-1} = O(n^{\delta_2}) \quad \text{as} \ n \to \infty .$$

To define a prior distribution on the family of arrays, we choose the following random array $\vartheta = \{ (\vartheta_{m,j})_{1\le m\le M_n,\, 1\le j\le N_n} \}$ with

$$\vartheta_{m,j} = t_{m,j}\, \zeta_{m,j} , \tag{7.5}$$

where $(\zeta_{m,j})$ are i.i.d. $\mathcal{N}(0,1)$ random variables and $(t_{m,j})_{1\le m\le M_n,\, 1\le j\le N_n}$ are some nonrandom positive coefficients. We make use of Gaussian variables since they possess the minimal Fisher information and therefore maximize the lower bound (6.5). We set

$$t_n^* = \max_{1\le m\le M_n} \sum_{j=1}^{N_n} t_{m,j} . \tag{7.6}$$

We assume that the coefficients $(t_{m,j})_{1\le m\le M_n,\, 1\le j\le N_n}$ satisfy the following conditions.

A$_2$) There exists a sequence of positive numbers $(d_n)_{n\ge 1}$ such that

$$\lim_{n\to\infty} \frac{d_n}{h_n^{2k-1}} \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2\, j^{2(k-1)} = 0 , \quad \lim_{n\to\infty} \sqrt{d_n}\; t_n^* = 0 ; \tag{7.7}$$

moreover, for any $p > 0$,

$$\lim_{n\to\infty} n^p \exp\{ - d_n/2 \} = 0 .$$

A$_3$) For some $0 < \varepsilon < 1$

$$\limsup_{n\to\infty} \frac{1}{h_n^{2k-1}} \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2\, j^{2k} \le (1 - \varepsilon)\, r\, \Big( \frac{2}{\pi} \Big)^{2k} .$$

A$_4$) There exists $\epsilon_0 > 0$ such that

$$\lim_{n\to\infty} \frac{1}{h_n^{4k-2+\epsilon_0}} \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^4\, j^{4k} = 0 .$$

Proposition 7.1. Assume that conditions A$_1$)–A$_2$) hold. Then, for any $p > 0$ and for any $\delta > 0$,

$$\lim_{n\to\infty} n^p \max_{0\le l\le k-1} \mathbf{P}\big( \| S^{(l)}_{\vartheta,n} \| > \delta \big) = 0 .$$

Proof. First note that for $0 \le x \le 1$ we can represent the $l$th derivative as

$$S^{(l)}_{\vartheta,n}(x) = \frac{1}{h_n^l} \sum_{m=1}^{M_n} \sum_{i=0}^{l} \binom{l}{i}\, I_\eta^{(l-i)}(v_m(x))\, Q_{i,m}(v_m(x)) , \tag{7.8}$$

where

$$Q_{i,m}(v) = \sum_{j=1}^{N_n} \vartheta_{m,j}\, e_j^{(i)}(v) .$$

Therefore

$$\| S^{(l)}_{\vartheta,n} \|^2 = \frac{1}{h_n^{2l-1}} \sum_{m=1}^{M_n} \int_{-1}^{1} \Big( \sum_{i=0}^{l} \binom{l}{i}\, I_\eta^{(l-i)}(v)\, Q_{i,m}(v) \Big)^2 dv$$

and by the Bouniakovskii-Cauchy-Schwarz inequality we obtain that

$$\| S^{(l)}_{\vartheta,n} \|^2 \le \frac{C^*(l, \eta)}{h_n^{2l-1}} \sum_{i=0}^{l} \overline{Q}_i \tag{7.9}$$

with $C^*(l, \eta) = \max_{-1\le v\le 1} \sum_{i=0}^{l} \big( \binom{l}{i}\, I_\eta^{(l-i)}(v) \big)^2$ and

$$\overline{Q}_i = \sum_{m=1}^{M_n} \int_{-1}^{1} Q_{i,m}^2(v)\,dv .$$

Now we show that for any $0 \le i \le k - 1$ and $\delta > 0$

$$\lim_{n\to\infty} n^p\, \mathbf{P}\big( \overline{Q}_i > \delta h_n^{2k-1} \big) = 0 . \tag{7.10}$$

To that end we introduce the following set

$$\Xi_n = \Big\{ \max_{1\le m\le M_n}\ \max_{1\le j\le N_n} \zeta_{m,j}^2 \le d_n \Big\} , \tag{7.11}$$

where the sequence $(d_n)_{n\ge 1}$ is given in condition A$_2$). Therefore, taking into account that

$$\int_{-1}^{1} Q_{i,m}^2(v)\,dv = \sum_{j=1}^{N_n} \vartheta_{m,j}^2 \int_{-1}^{1} (e_j^{(i)}(v))^2\,dv \le \Big( \frac{\pi}{2} \Big)^{2i} \sum_{j=1}^{N_n} t_{m,j}^2\, j^{2i}\, \zeta_{m,j}^2 ,$$

the quantity $\overline{Q}_i$ can be estimated on the set $\Xi_n$ as

$$\overline{Q}_i \le \Big( \frac{\pi}{2} \Big)^{2i} d_n \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2\, j^{2i}$$

and by (7.7) we get, for any $\delta > 0$ and sufficiently large $n$,

$$\mathbf{P}\big( \overline{Q}_i > \delta h_n^{2k-1} \big) \le \mathbf{P}(\Xi_n^c) .$$

Moreover, for sufficiently large $n$

$$\mathbf{P}(\Xi_n^c) \le M_n N_n\, e^{-d_n/2} .$$

Therefore, conditions A$_1$) and (7.7) imply

$$\limsup_{n\to\infty} n^p\, \mathbf{P}(\Xi_n^c) = 0 \tag{7.12}$$

for any $p > 0$. Hence Proposition 7.1.
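The bound on $\mathbf{P}(\Xi_n^c)$ is a union bound over the $M_n N_n$ Gaussian variables combined with the tail estimate $\mathbf{P}(\zeta^2 > d) \le e^{-d/2}$. The sketch below compares the exact probability with this bound for a few values of $d$; the function names and the sample values of $M$, $N$, $d$ are assumptions for illustration.

```python
import math

def tail_prob(d):
    """P(zeta^2 > d) for zeta ~ N(0,1), via the complementary error function."""
    return math.erfc(math.sqrt(d / 2.0))

def union_bound(M, N, d):
    """Upper bound M*N*exp(-d/2) on the probability of the complement of (7.11)."""
    return M * N * math.exp(-d / 2.0)

def exact_complement(M, N, d):
    """1 - (1 - P(zeta^2 > d))^(M*N) for M*N independent standard Gaussian variables."""
    return 1.0 - (1.0 - tail_prob(d)) ** (M * N)

checks = [(exact_complement(10, 5, d), union_bound(10, 5, d))
          for d in (0.5, 1.0, 4.0, 9.0, 25.0)]
```

Because $M_n N_n$ grows at most polynomially in $n$ under A$_1$), while $e^{-d_n/2}$ decays faster than any power of $n$ under A$_2$), the product still vanishes after multiplication by $n^p$.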

Proposition 7.2. Let conditions A

₁

)–A

₄

). Then, for any p > 0,

n→∞

lim n

^p

P(S

_ϑ,n

∈ / W

_r^k

) = 0 .

Proof. First of all we prove that for ε from condition A

₃

)

n→∞

lim n

^p

P

k S

_ϑ,n^(k)

k > p

(1 − ε/4)r

= 0 . (7.13)

Indeed, putting in (7.8) l = k we can represent the kth derivative of S

_ϑ,n

as follows

S

_ϑ,n^(k)

(x) = ˆ S

_k

(x) + S

_k

(x) (7.14) with

S ˆ

_k

(x) = 1 h

^k

M_n

X

m=1 k−1

X

i=0

k i

I

_η^(k−i)

(v

_m

(x)) Q

_i,m

(v

_m

(x))

(26)

and

S

_k

(x) = 1 h

^k

M_n

X

m=1

I

_η

(v

_m

(x)) Q

_k,m

(v

_m

(x)) .

First, note that, we can estimate the norm of ˆ S

_k

(x) by the same way as in inequality (7.9), i.e.

k S ˆ

_k

k

²

≤ C

^∗

(k, η) h

^2k−1_n

k−1

X

i=0

Q

_i,m

.

By making use of (7.10) we obtain that, for any p > 0 and for any δ > 0,

n→∞

lim n

^p

P

k S ˆ

_k

k > δ

= 0 . (7.15)

Let us consider now the last term in (7.14). Taking into account that 0 ≤ I

_η

(v) ≤ 1 we get

k S

_k

k

²

= 1 h

^2k−1_n

M_n

X

m=1

Z

1

−1

I

_η²

(v)Q

²_k,m

(v)dv

≤ π 2

2k

1 h

^2k−1_n

M_n

X

m=1 N_n

X

j=1

t

²_m,j

j

^2k

ζ

_m,j²

.

Therefore from condition A

₃

) we get for sufficiently large n k S

_k

k

²

≤ (1 − ε/2)r + π

2

2k M_n

X

m=1

ζ

_m

:= (1 − ε/2)r + π 2

2k

Y

_n

with

ζ

_m

= 1 h

^2k−1_n

N_n

X

j=1

t

²_m,j

j

^2k

ζ ˜

_m,j

and ζ ˜

_m,j

= ζ

_m,j²

− 1 . We show that for any p > 0 and for any δ > 0

n→∞

lim n

^p

P ( | Y

_n

| > δ) = 0 . (7.16) Indeed, by the Chebyshev inequality for any ι > 0

P ( | Y

_n

| > δ) ≤ E (Y

_n

)

^2ι

/δ

^2ι

. (7.17)

(27)

Note now that according to the Burkholder-Davis-Gundy inequality for any ι > 1 there exists a constant B

^∗

(ι) > 0 such that

E (Y

_n

)

^2ι

≤ B

^∗

(ι) E





M_n

X

m=1

ζ

²_m





ι

.

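Moreover, by putting

$$\tilde\zeta_* = \max_{1\le m\le M_n}\, \max_{1\le j\le N_n} \tilde\zeta_{m,j}^2$$

we obtain that

$$\zeta_m^2 \le \frac{N_n}{h_n^{4k-2}} \sum_{j=1}^{N_n} t_{m,j}^4\, j^{4k}\, \tilde\zeta_* .$$

Therefore, by condition A$_4$), for sufficiently large $n$,

$$\mathbf{E}\,(Y_n)^{2\iota} \le B^*(\iota)\, N_n^{\iota}\, h_n^{\iota\epsilon_0}\, \mathbf{E}\, \tilde\zeta_*^{\,\iota} \le B^*(\iota)\, \mathbf{E}\,(\zeta^2 - 1)^{2\iota}\, M_n\, N_n^{\iota+1}\, h_n^{\iota\epsilon_0} ,$$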

where $\zeta \sim \mathcal{N}(0,1)$. Taking into account condition A$_1$), we obtain, for sufficiently large $n$,

$$\mathbf{E}\,(Y_n)^{2\iota} \le n^{-\delta_1(\iota\epsilon_0 - 2)} .$$

Thus, choosing in (7.17)

$$\iota > p/(\epsilon_0\delta_1) + 2/\epsilon_0$$

we obtain the limiting equality (7.16), which together with (7.14)-(7.15) implies (7.13). Now it is easy to deduce that Proposition 7.1 yields Proposition 7.2.
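The choice of $\iota$ in this proof can be checked directly. The following short computation is only a sketch: it combines the Chebyshev bound (7.17) with the last moment bound and assumes, as the conditions suggest, that $\delta_1 > 0$ and $\epsilon_0 > 0$.

```latex
\begin{gather*}
n^{p}\,\mathbf{P}(|Y_n| > \delta)
  \le \delta^{-2\iota}\, n^{p}\, \mathbf{E}\,(Y_n)^{2\iota}
  \le \delta^{-2\iota}\, n^{\,p - \delta_1(\iota\epsilon_0 - 2)} ,
\\
p - \delta_1(\iota\epsilon_0 - 2) < 0
\iff
\iota > \frac{p}{\epsilon_0\delta_1} + \frac{2}{\epsilon_0} .
\end{gather*}
```

For such $\iota$ the exponent of $n$ is negative, so the right-hand side tends to zero for every fixed $\delta > 0$.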

Proposition 7.3. Let conditions A$_1$)–A$_4$) hold. Then, for any $p > 0$,

$$\lim_{n\to\infty} n^p\, \mathbf{E}\, \|S_{\vartheta,n}\|^2 \Big( \mathbf{1}_{\{S_{\vartheta,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} \Big) = 0 .$$


Proof. First of all, we recall that, due to condition A$_2$),

$$\lim_{n\to\infty} \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2 \le \lim_{n\to\infty} \frac{d_n}{h_n^{2k-1}} \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2\, j^{2(k-1)} = 0 .$$

Therefore, taking into account that

$$\|S_{\vartheta,n}\|^2 \le h_n \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} t_{m,j}^2\, \zeta_{m,j}^2 \tag{7.18}$$

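we obtain, for sufficiently large $n$,

$$\mathbf{E}\, \|S_{\vartheta,n}\|^2 \Big( \mathbf{1}_{\{S_{\vartheta,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} \Big) \le \max_{m,j}\, \mathbf{E}\, \zeta_{m,j}^2 \Big( \mathbf{1}_{\{S_{\vartheta,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} \Big) .$$

Moreover, for any $1 \le m \le M_n$ and $1 \le j \le N_n$, we estimate the last term as

$$\mathbf{E}\, \zeta_{m,j}^2 \Big( \mathbf{1}_{\{S_{\vartheta,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} \Big) \le n\, \mathbf{P}(S_{\vartheta,n} \notin W_r^k) + n\, \mathbf{P}(\Xi_n^c) + 2\, \mathbf{E}\, \zeta^2\, \mathbf{1}_{\{\zeta^2 \ge n\}} ,$$

where $\zeta \sim \mathcal{N}(0,1)$. Applying now Proposition 7.2 and limit (7.12), we obtain Proposition 7.3.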
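The last estimate is a standard truncation argument. As a sketch (the event $A$ below is a generic placeholder, not notation from the paper), one splits according to whether $\zeta^2 < n$ and then bounds the Gaussian tail term by the Cauchy-Schwarz inequality, using $\mathbf{E}\,\zeta^4 = 3$ and the tail bound $\mathbf{P}(|\zeta| \ge a) \le 2e^{-a^2/2}$:

```latex
\begin{gather*}
\mathbf{E}\,\zeta^2\,\mathbf{1}_{A}
  = \mathbf{E}\,\zeta^2\,\mathbf{1}_{A}\mathbf{1}_{\{\zeta^2 < n\}}
  + \mathbf{E}\,\zeta^2\,\mathbf{1}_{A}\mathbf{1}_{\{\zeta^2 \ge n\}}
  \le n\,\mathbf{P}(A) + \mathbf{E}\,\zeta^2\,\mathbf{1}_{\{\zeta^2 \ge n\}} ,
\\
\mathbf{E}\,\zeta^2\,\mathbf{1}_{\{\zeta^2 \ge n\}}
  \le \sqrt{\mathbf{E}\,\zeta^4\;\mathbf{P}(\zeta^2 \ge n)}
  \le \sqrt{3 \cdot 2e^{-n/2}}
  = \sqrt{6}\, e^{-n/4} .
\end{gather*}
```

Taking $A = \{S_{\vartheta,n} \notin W_r^k\}$ and $A = \Xi_n^c$ in turn and adding the two inequalities gives the displayed bound; the tail term decays faster than any power of $n$.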

Proposition 7.4. Let conditions A$_1$)–A$_4$) hold. Then, for any function $g$ satisfying conditions (3.7) and H$_4$),

$$\lim_{n\to\infty}\, \sup_{0\le x\le 1} \mathbf{E}\, \big| g^{-2}(x, S_{\vartheta,n}) - g_0^{-2}(x) \big| = 0 .$$

Proof. First, note that on the set $\Xi_n$ the random function $S_{\vartheta,n}$ is uniformly bounded, i.e.

$$|S_{\vartheta,n}|_* = \sup_{0\le x\le 1} |S_{\vartheta,n}(x)| \le \sqrt{d_n}\; t_n^* , \tag{7.19}$$


where the coefficient $t_n^*$ is defined in (7.6). Therefore, by condition H$_1$), we obtain

$$\mathbf{E}\, \big| g^{-2}(x, S_{\vartheta,n}) - g_0^{-2}(x) \big| \le \max_{|S|_* \le \sqrt{d_n}\, t_n^*} \big| g^{-2}(x, S) - g_0^{-2}(x) \big| + (2/g_*)\, \mathbf{P}(\Xi_n^c) .$$

Conditions A$_2$) and H$_4$), together with the limit relation (7.12), imply Proposition 7.4.

8 Lower bound

In this section we prove Theorem 4.3. To that end we establish the following auxiliary result.

Lemma 8.1. For any $0 < \delta < 1$ and any estimate $\hat S_n$ of $S \in W_r^k$,

$$\|\hat S_n - S\|_n^2 \ge (1-\delta)\, \|T_n(\hat S) - S\|^2 - (\delta^{-1} - 1)\, r/n^2 ,$$

where $T_n(\hat S)(x) = \sum_{k=1}^{n} \hat S_n(x_k)\, \mathbf{1}_{(x_{k-1},\, x_k]}(x)$.

The proof of this lemma is given in Appendix A.2.

This lemma implies that, to prove (4.6), it suffices to show the same asymptotic inequality for the integral risk, i.e.

$$\liminf_{n\to\infty}\, \inf_{\hat S_n}\, n^{2k/(2k+1)}\, R_0(\hat S_n) \ge 1 , \tag{8.1}$$

where

$$R_0(\hat S_n) = \sup_{S \in W_r^k} \mathbf{E}_{S,q}\, \|\hat S_n - S\|^2 / \gamma_k(S) ,$$

$q$ is the Gaussian $(0,1)$ density of the noise $(\xi_j)$ and $\|S\|^2 = \int_0^1 S^2(x)\, dx$.


To show (8.1) we will make use of the sequence of random functions $(S_{\vartheta,n})_{n\ge 1}$ defined in (7.4)-(7.5), with the coefficients $(t_{m,j})$ satisfying conditions A$_1$)–A$_4$), which will be chosen later.

For any estimator $\hat S_n$, we denote by $\hat S_n^0$ its projection onto $W_r^k$, i.e. $\hat S_n^0 = \mathrm{Pr}_{W_r^k}(\hat S_n)$. Since $W_r^k$ is a convex set, we get that $\|\hat S_n - S\|^2 \ge \|\hat S_n^0 - S\|^2$ for any $S \in W_r^k$. Therefore, we can write that

$$R_0(\hat S_n) \ge \int_{\{z\,:\,S_{z,n} \in W_r^k\} \cap \Xi_n} \frac{\mathbf{E}_{S_{z,n},q}\, \|\hat S_n^0 - S_{z,n}\|^2}{\gamma_k(S_{z,n})}\; \mu_\vartheta(dz) .$$

Here $\mu_\vartheta$ denotes the distribution of $\vartheta$ in $\mathbb{R}^l$ with $l = M_n N_n$. We recall also that the set $\Xi_n$ is defined in (7.11). Moreover, taking into account inequality (7.19), we estimate the risk $R_0(\hat S_n)$ from below as

$$R_0(\hat S_n) \ge \frac{1}{\gamma_n^*} \int_{\{z\,:\,S_{z,n} \in W_r^k\} \cap \Xi_n} \mathbf{E}_{S_{z,n},q}\, \|\hat S_n^0 - S_{z,n}\|^2\; \mu_\vartheta(dz)$$

with

$$\gamma_n^* = \sup_{|S|_* \le \sqrt{d_n}\, t_n^*} \gamma_k(S) . \tag{8.2}$$
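The projection step used above rests on a general fact: projecting onto a closed convex set never increases the distance to a point of that set. The sketch below checks this numerically in a finite-dimensional stand-in, where a Euclidean ball in $\mathbb{R}^5$ replaces $W_r^k$; the ball, the dimension and the function names are illustrative assumptions, not objects from the paper.

```python
import math
import random


def project_onto_ball(x, radius):
    """Euclidean projection of the vector x onto the centred ball of given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


rng = random.Random(1)
radius = 1.0
violations = 0
for _ in range(1000):
    # An arbitrary "estimate", possibly far outside the ball.
    estimate = [rng.uniform(-3, 3) for _ in range(5)]
    # A "true" point forced to lie inside the ball (stand-in for S in W_r^k).
    target = project_onto_ball([rng.uniform(-1, 1) for _ in range(5)], radius)
    projected = project_onto_ball(estimate, radius)
    # Count cases where projecting would have increased the distance.
    if dist2(projected, target) > dist2(estimate, target) + 1e-12:
        violations += 1

print(violations)
```

In the paper the convex set is a functional class rather than a Euclidean ball, so this is only an analogy for the inequality $\|\hat S_n - S\|^2 \ge \|\hat S_n^0 - S\|^2$.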

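Let us now introduce the corresponding Bayes risk

$$\tilde R_0(\hat S_n^0) = \int_{\mathbb{R}^l} \mathbf{E}_{S_{z,n},q}\, \|\hat S_n^0 - S_{z,n}\|^2\; \mu_\vartheta(dz) . \tag{8.3}$$

Now, through this risk, we rewrite the lower bound for $R_0(\hat S_n)$ as

$$R_0(\hat S_n^0) \ge \tilde R_0(\hat S_n^0)/\gamma_n^* - 2\,\varpi_n/\gamma_n^* \tag{8.4}$$

with

$$\varpi_n = \mathbf{E}\, \Big( \mathbf{1}_{\{S_{\vartheta,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} \Big) \big( r + \|S_{\vartheta,n}\|^2 \big) .$$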
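The passage from (8.2)-(8.3) to (8.4) can be sketched as follows. The shorthand $A_n$ is introduced here only for illustration, and the computation assumes that every $f \in W_r^k$ satisfies $\|f\|^2 \le r$, which depends on the definition of $W_r^k$ given earlier in the paper.

```latex
\begin{gather*}
% A_n := \{z : S_{z,n} \in W_r^k\} \cap \Xi_n  (illustrative shorthand)
\int_{A_n} \mathbf{E}_{S_{z,n},q}\,\|\hat S_n^0 - S_{z,n}\|^2\,\mu_\vartheta(dz)
  = \tilde R_0(\hat S_n^0)
  - \int_{A_n^c} \mathbf{E}_{S_{z,n},q}\,\|\hat S_n^0 - S_{z,n}\|^2\,\mu_\vartheta(dz) ,
\\
\|\hat S_n^0 - S_{z,n}\|^2
  \le 2\,\|\hat S_n^0\|^2 + 2\,\|S_{z,n}\|^2
  \le 2\,\bigl(r + \|S_{z,n}\|^2\bigr) ,
\qquad
\mathbf{1}_{A_n^c} \le \mathbf{1}_{\{S_{z,n} \notin W_r^k\}} + \mathbf{1}_{\Xi_n^c} ,
\\
\int_{A_n^c} \mathbf{E}_{S_{z,n},q}\,\|\hat S_n^0 - S_{z,n}\|^2\,\mu_\vartheta(dz)
  \le 2\,\varpi_n .
\end{gather*}
```

Dividing by $\gamma_n^*$ then gives the right-hand side of (8.4), and Proposition 7.3 controls $\varpi_n$.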


First of all, we reduce the nonparametric problem to a parametric one. For this we replace the functions $\hat S_n^0$ and $S$ by their Fourier series with respect to the basis

$$\tilde e_{m,i}(x) = (1/\sqrt{h})\; e_i(v_m(x))\; \mathbf{1}_{(|v_m(x)| \le 1)} .$$

By making use of this basis we can estimate the norm $\|\hat S_n^0 - S_{z,n}\|^2$ from below as

$$\|\hat S_n^0 - S_{z,n}\|^2 \ge \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} \big( \hat\lambda_{m,j} - \lambda_{m,j}(z) \big)^2 ,$$

where

$$\hat\lambda_{m,j} = \int_0^1 \hat S_n^0(x)\, \tilde e_{m,j}(x)\, dx \quad\text{and}\quad \lambda_{m,j}(z) = \int_0^1 S_{z,n}(x)\, \tilde e_{m,j}(x)\, dx .$$

Moreover, from definition (7.4) one gets

$$\lambda_{m,j}(z) = \sqrt{h}\, \sum_{i=1}^{N_n} z_{m,i} \int_{-1}^{1} e_i(u)\, e_j(u)\, I_\eta(u)\, du .$$

It is easy to see that the functions $\lambda_{m,j}(\cdot)$ satisfy condition (6.2) for Gaussian prior densities. In this case (see the definition in (6.5)) we have

$$\Lambda_{m,j} = (\partial/\partial z_{m,j})\, \lambda_{m,j}(z) = \sqrt{h}\; e_j(I_\eta) ,$$

where

$$e_j(f) = \int_{-1}^{1} e_j^2(v)\, f(v)\, dv . \tag{8.5}$$

Now, to obtain a lower bound for the Bayes risk $\tilde R_0(\hat S_n^0)$, we make use of Theorem 6.1, which implies that

$$\tilde R_0(\hat S_n^0) \ge \sum_{m=1}^{M_n} \sum_{j=1}^{N_n} \frac{h\, e_j^2(I_\eta)}{F_{m,j} + B_{m,j} + t_{m,j}^{-2}} , \tag{8.6}$$


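where

$$F_{m,j} = \sum_{i=1}^{n} D_{m,j}^2(x_i)\, \mathbf{E}\, g^{-2}(x_i, S_{\vartheta,n}) \quad\text{and}\quad B_{m,j} = \frac{1}{2} \sum_{i=1}^{n} \mathbf{E} \left( \frac{\tilde L_{m,j}(x_i, S_{\vartheta,n})}{g^2(x_i, S_{\vartheta,n})} \right)^{2}$$

with $\tilde L_{m,j}(x, S) = L_{x,S}\, D_{m,j}$. In the appendix we show that

$$\lim_{n\to\infty}\, \sup_{1\le m\le M_n}\, \sup_{1\le j\le N_n} \Big| F_{m,j}/(nh) - e_j(I_\eta^2)\, g_0^{-2}(\tilde x_m) \Big| = 0 \tag{8.7}$$

and

$$\lim_{n\to\infty}\, \sup_{1\le m\le M_n}\, \sup_{1\le j\le N_n} B_{m,j}/(nh) = 0 . \tag{8.8}$$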
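Each summand in (8.6) has the form of a squared sensitivity divided by an information term plus the prior precision $t_{m,j}^{-2}$. The simplest scalar analogue is the toy model $y_i = \theta + \sigma\xi_i$ with $\theta \sim \mathcal{N}(0, t^2)$, where the Bayes risk equals $1/(n/\sigma^2 + t^{-2})$ exactly. The sketch below compares this closed form with a Monte Carlo estimate; the model, the function names and the numerical values are illustrative assumptions and do not reproduce Theorem 6.1.

```python
import random


def bayes_risk_bound(fisher_info, prior_var):
    """Closed-form value 1 / (F + t^{-2}) for a scalar Gaussian parameter."""
    return 1.0 / (fisher_info + 1.0 / prior_var)


def simulated_bayes_risk(n_obs, noise_var, prior_var, n_rep=20000, seed=0):
    """Monte Carlo mean squared error of the posterior mean in the toy model
    y_i = theta + sigma * xi_i, theta ~ N(0, t^2), xi_i ~ N(0, 1)."""
    rng = random.Random(seed)
    fisher_info = n_obs / noise_var
    # Posterior mean shrinks the sample mean by F / (F + t^{-2}).
    shrink = fisher_info / (fisher_info + 1.0 / prior_var)
    total = 0.0
    for _ in range(n_rep):
        theta = rng.gauss(0.0, prior_var ** 0.5)
        # The sample mean is sufficient: y_bar ~ N(theta, sigma^2 / n).
        y_bar = rng.gauss(theta, (noise_var / n_obs) ** 0.5)
        total += (shrink * y_bar - theta) ** 2
    return total / n_rep


bound = bayes_risk_bound(fisher_info=10 / 4.0, prior_var=0.5)
risk = simulated_bayes_risk(n_obs=10, noise_var=4.0, prior_var=0.5)
print(bound, risk)
```

In this toy model there is no term corresponding to $B_{m,j}$, because the noise variance does not depend on $\theta$; in the heteroscedastic setting of the paper that extra information term appears and is shown to be negligible by (8.8).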

This means that, for any $\nu > 0$ and for sufficiently large $n$,

$$\sup_{1\le m\le M_n}\, \sup_{1\le j\le N_n} \frac{F_{m,j} + B_{m,j} + t_{m,j}^{-2}}{n h\, e_j(I_\eta^2)\, g_0^{-2}(\tilde x_m) + t_{m,j}^{-2}} \le 1 + \nu .$$

Therefore, if we denote in (8.6)

$$\kappa_{m,j}^2 = n h\, g_0^{-2}(\tilde x_m)\, t_{m,j}^2 \quad\text{and}\quad \tau_j(\eta, y) = e_j^2(I_\eta)\, y / \big( e_j^2(I_\eta^2)\, y + 1 \big)$$

we obtain that, for sufficiently large $n$,

$$n^{2k/(2k+1)}\, \tilde R_0(\hat S_n^0) \ge \frac{1}{1+\nu}\; n^{-1/(2k+1)} \sum_{m=1}^{M_n} g_0^2(\tilde x_m) \sum_{j=1}^{N_n} \tau_j(\eta, \kappa_{m,j}^2) .$$

In the appendix we show that

$$\lim_{\eta\to 0}\, \sup_{N\ge 1}\, \sup_{(y_1,\dots,y_N) \in \mathbb{R}_+^N} \left| \sum_{j=1}^{N} \tau_j(\eta, y_j) \Big/ \sum_{j=1}^{N} \tau(y_j) - 1 \right| = 0 , \tag{8.9}$$

where

$$\tau(y) = y/(y+1) .$$

Therefore we can write that, for sufficiently large $n$,

$$n^{2k/(2k+1)}\, \tilde R_0(\hat S_n^0) \ge \frac{1-\nu}{1+\nu}\; n^{-1/(2k+1)} \sum_{m=1}^{M_n} g_0^2(\tilde x_m)\, J_{N_n}(\kappa_{m,1}^2, \dots, \kappa_{m,N_n}^2) , \tag{8.10}$$