Adaptive efficient robust estimation for nonparametric autoregressive models

(1)

HAL Id: hal-02366202

https://hal.archives-ouvertes.fr/hal-02366202

Preprint submitted on 15 Nov 2019

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Adaptive eﬀicient robust estimation for nonparametric autoregressive models

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov

To cite this version:

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov. Adaptive eﬀicient robust estimation for

nonparametric autoregressive models. 2019. �hal-02366202�

(2)

Adaptive efficient robust estimation for nonparametric autoregressive models ^∗

Ouerdia Arkoun

^†

, Jean-Yves Brua

^‡

and Serguei Pergamenshchikov

^§

Abstract

In this paper for the first time the adaptive efficient estimation problem for nonparametric autoregressive models has been studied.

First of all, through the Van Trees inequality the sharp bound for the robust quadratic risks, i.e. the Pinsker constant (see, for example, in [19]), in explicit form has been obtained. Then, through the sharp oracle inequalities method developed in [4] for non parametric autoregressions an adaptive efficient model selection procedure is proposed, i.e. such for which the upper bound of its robust quadratic risk coincides with the obtained Pinsker constant.

MSC: primary 62G08, secondary 62G05

Keywords: Non-parametric estimation; Non parametric autoregresion; Non- asymptotic estimation; Robust risk; Model selection; Sharp oracle inequalities.

∗This work was supported by the RSF grant 17-11-01049 (National Research Tomsk State University).

†Sup’Biotech, Laboratoire BIRL, 66 Rue Guy Moquet, 94800 Villejuif, France and Laboratoire de Mathématiques Raphael Salem, Normandie Université, UMR 6085 CNRS- Université de Rouen, France, e-mail: [email protected]

‡Laboratoire de Mathématiques Raphael Salem, Normandie Université, UMR 6085 CNRS- Université de Rouen, France, e-mail : [email protected]

§Laboratoire de Math´ematiques Raphael Salem, UMR 6085 CNRS- Universit´e de Rouen Normandie, France and International Laboratory of Statistics of Stochastic Pro- cesses and Quantitative Finance, National Research Tomsk State University, e-mail:

[email protected]

(3)

1 Introduction

1.1 Model

In this paper we consider the nonparametric autoregression model defined as y

_k

= S(x

_k

)y

_k−1

+ ξ

_k

and x

_k

= a + k(b − a)

n , (1.1)

where S(·) ∈ L

₂

[a, b] is unknown function, a < b are fixed known constants, 1 ≤ k ≤ n, the initial value y

₀

is a constant and the noise (ξ

_k

)

_k≥1

is i.i.d.

sequence of unobservable random variables with Eξ

₁

= 0 and Eξ

₁²

= 1. In the sequel we denote by p the distribution density of the random variable ξ

₁

. The problem is to estimate the function S(·) on the basis of the observations (y

_k

)

_1≤k≤n

under the condition that the noise distribution p is unknown and belongs to some noise distributions class P . There is a number of papers which consider these models such as [6], [7] and [5]. In all these papers, the authors propose some asymptotic (as n → ∞) methods for different identification studies without considering optimal estimation issues. Firstly, minimax estimation problems for the model (1.1) has been treated in [2] and [18] in the nonadaptive case, i.e. for the known regularity of the function S. Then, in [1] and [3] it is proposed to use the sequential analysis method for the adaptive pointwise estimation problem in the case when the H¨ older regularity is unknown.

1.2 Main contributions

In this paper we consider the adaptive estimation problem for the quadratic risk defined as

R

_p

( S b

_n

, S) = E

_p,S

k S b

_n

− Sk

²

, kSk

²

= Z

b

a

S

²

(x)dx , (1.2) where S b

_n

is an estimator of S based on observations (y

_k

)

_1≤k≤n

and E

_p,S

is the expectation with respect to the distribution law P

_p,S

of the process (y

_k

)

_1≤k≤n

given the distribution density p and the coefficient S. Moreover, taking into account that the distribution p is unknown, we use the robust nonparametric estimation approach proposed in [8]. To this end we set the robust risk as

R

^∗

( S b

_n

, S) = sup

p∈P

R

_p

( S b

_n

, S) , (1.3)

where P is a family of the distributions defined in Section 2.

(4)

To estimate the function S in model (1.1) we make use of the model selection procedures proposed in [4] based on the family of the optimal poinwise truncated sequential estimators from [3] for which using the model selection method developed in [9] a sharp oracle inequality is shown. In this paper, using this inequality we show that the model selection procedure is efficient in adaptive setting for the robust quadratic risks (1.3). To this end, first of all we have to study the sharp lower bound for the these risks, i.e. we have to study the best potential accuracy estimation for the model (1.1) which is called the Pinsker constant. For this we use the approach proposed in [10] - [11] which is based the Van-Trees inequality. It turns out that for the model (1.1) the Pinsker constant has the same form as for the filtration signal problem in the ”signal - white noise” model studied in [19] but with new coefficient which equals to the optimal variance given by the Hajek - Le Cam inequality for the parametric model (1.1). This is the new result in the efficient non parametric estimation theory for the statistical models with dependent observations. Then, using the oracle inequality from [4] and the weight least square estimation method we show that for the model selection procedure with the Pinsker weight coefficients the upper bound asymptotically coincides with the obtained Pinsker constant without using the regularity properties of the unknown functions, i.e. it is efficient in adaptive setting with respect to the robust risks (1.3).

1.3 Plan of the paper

The paper is organized as follows. In Section 2 we give all conditions and construct the sequential point-wise estimation procedures. to pass from the auto-regression model to the corresponding regression model. In Section 3 we construct the model selection procedure based on the sequential estimators from Section 2. In Section 4 we announce the main results. In Section 5 we show the Van - Trees inequality for the model (1.1). In section 6 we obtain the lower bound for the robust risks. In Section 7 we obtain the upper bounds for the robust risks. In Appendix A we give the all auxiliary and technic tools.

2 Sequential procedures.

As in [3] we assume that in the model (1.1) the i.i.d. random variables (ξ

_k

)

_k≥1

have a density p (with respect to the Lebesgue measure) from the functional

(5)

class P defined as P :=

p ≥ 0 : Z

+∞

−∞

p(x) dx = 1 ,

Z

+∞

−∞

x p(x) dx = 0 ,

Z

+∞

−∞

x

²

p(x) dx = 1 and sup

k≥1

R

+∞

−∞

|x|

^2k

p(x) dx ς

^k

(2k − 1)!! ≤ 1

)

, (2.1) where ς ≥ 1 is some fixed parameter, which may be a function of the number observation n, i.e. ς = ς (n), such that for any b > 0

lim

n→∞

ς (n)

n

^b

= 0 . (2.2)

Note that the (0, 1)-Gaussian density belongs to P. In the sequel we denote this density by p

₀

. It is clear that for any q > 0

m

^∗_q

= sup

p∈P

E

_p

|ξ

₁

|

^q

< ∞ , (2.3) where E

_p

is the expectation with respect to the density p from P . To obtain the stable (uniformly with respect to the function S ) model (1.1), we assume that for some fixed 0 < < 1 and L > 0 the unknown function S belongs to the ε - stability set introduced in [3] as

Θ

_,L

= n

S ∈ C

₁

([a, b], R ) : |S|

_∗

≤ 1 − and | S| ˙

_∗

≤ L o

, (2.4)

where C

₁

[a, b] is the Banach space of continuously differentiable [a, b] → R functions and |S|

_∗

= sup

_a≤x≤b

|S(x)|.

We will use as a basic procedures the point wise procedure from [3] at the points (z

_l

)

_1≤l≤d

defined as

z

_l

= a + l

d (b − a) , (2.5)

where d is an integer value function of n, i.e. d = d

_n

, such that lim

n→∞

d

_n

√ n = 1 . (2.6)

So we propose to use the first ι

_l

observations for the auxiliary estimation of S(z

_l

). We set

S b

_l

= 1 A

_ι

l

ι_l

X

j=1

Q

_l,j

y

_j−1

y

_j

, A

_ι

l

=

ι_l

X

j=1

Q

_l,j

y

_j−1²

, (2.7)

(6)

where Q

_l,j

= Q(u

_l,j

) and the kernel Q(·) is the indicator function of the interval [−1; 1], i.e. Q(u) = 1

_[−1,1]

(u). The points (u

_l,j

) are defined as

u

_l,j

= x

_j

− z

_l

h . (2.8)

Note that to estimate S(z

_l

) on the basis of the kernel estimate with the kernel Q we use only the observations (y

_j

)

_k

1,l≤j≤k_2,l

from the h - neighbor of the point z

_l

, i.e.

k

_1,l

= [n e z

_l

− ne h] + 1 and k

_2,l

= [n z e

_l

+ ne h] ∧ n , (2.9) where e z

_l

= (z

_l

− a)/(b − a) and e h = h/(b − a). Note that, only for the last point z

_d

= b the k

_2,d

= n. We chose ι

_l

in (2.7) as

ι

_l

= k

_1,l

+ q and q = q

_n

= [(ne h)

^µ⁰

] (2.10) for some 0 < µ

₀

< 1. In the sequel for any 0 ≤ k < m ≤ n we set

A

_k,m

=

m

X

j=k+1

Q

_l,j

y

²_j−1

and A

_m

= A

_0,m

. (2.11) Next, similarly to [1], we use a some kernel sequential procedure based on the observations (y

_j

)

_ι

l≤j≤n

. To transform the kernel estimator in the linear function of observations and we replace the number of observations n by the following stopping time

τ

_l

= inf{ι

_l

+ 1 ≤ k ≤ k

_2,l

: A

_ι

l,k

≥ H

_l

} , (2.12) where inf {∅} = k

_2,l

and the positive threshold H

_l

will be chosen as a positive random variable measurable with respect to the σ - field {y

₁

, . . . , y

_ι

l

}.

Now we define the sequential estimator as

S

_l^∗

= 1 H

_l





τ_l−1

X

j=ι_l+1

Q

_l,j

y

_j−1

y

_j

+ κ

l

Q(u

_l,τ

l

) y

_τ

l−1

y

_τ

l



 1

_Γ

l

, (2.13) where Γ

_l

= {A

_ι

l,k_2,l−1

≥ H

_l

} and the correcting coefficient 0 < κ

l

≤ 1 on this set is defined as

A

_ι

l,τ_l−1

+ κ

_l²

Q(u

_l,τ

l

)y

_τ²

l−1

= H

_l

. (2.14)

(7)

Note that, to obtain the efficient kernel estimate of S(z

_l

) we need to use the all k

_2,l

− ι

_l

− 1 observations. Similarly to [15], one can show that τ

_l

≈ γ

_l

H

_l

as H → ∞, where

γ

_l

= 1 − S

²

(z

_l

) . (2.15) Therefore, one needs to chose H as (k

_2,l

− ι

_l

− 1)/γ

_l

. Taking into account that the coefficients γ

_l

are unknown we define the threshold H

_l

as

H

_l

= 1 − e

e γ

_l

(k

_2,l

− ι

_l

− 1) and e = 1

2 + ln n , (2.16) where e γ

_l

= 1 − S e

_ι²

l

and S e

_ι

l

is the projection of the estimator S b

_ι

l

in the interval ] − 1 + e , 1 − e [, i.e.

S e

_ι

l

= min(max( S b

_ι

l

, −1 + e ), 1 − e ) . (2.17) To obtain the uncorrelated stochastic terms in the kernel estimators for S(z

_l

) we chose the bandwidth h as

h = b − a

2d . (2.18)

As to the estimator S b

_ι

l

, we can show the following property.

Proposition 2.1. The convergence rate in probability of the estimator (2.17) is more rapid than any power function, i.e. for any b > 0

lim

n→∞

n

^b

max

1≤l≤d

sup

S∈Θ_,L

sup

p∈P

P

_p,S

| S e

_ι

l

− S(z

_l

)| >

₀

= 0 , (2.19) where

₀

=

₀

(n) → 0 as n → ∞ such that lim

_n→∞

n

^δ^ˇ

₀

= ∞ for any δ > ˇ 0.

Now we set

Y

_l

= S

_H,h^∗

(z

_l

)1

_Γ

and Γ = ∩

^d_l=1

Γ

_l

. (2.20) Using the convergence (2.19) we study the probability properties of the set Γ in the following proposition.

Proposition 2.2. For any b > 0 the probability of the set Γ satisfies the following asymptotic equality

lim

n→∞

n

^b

sup

S∈Θ_,L

P

_p,S

(Γ

^c

) = 0 . (2.21)

(8)

In view of this proposition we can negligible the set Γ

^c

. So, using the estimators (2.20) on the set Γ we obtain the discrete time regression model

Y

_l

= S(z

_l

) + ζ

_l

and ζ

_l

= ξ

_l^∗

+ $

_l

, (2.22) in which

ξ

_l^∗

=

P

τ_l−1

j=ι_l+1

Q

_l,j

y

_j−1

ξ

_j

+ κ

l

Q(u

_l,τ

l

) y

_τ

l−1

ξ

_τ

l

H

_l

(2.23)

and $

_l

= $

_1,l

+ $

_2,l

, where

$

_1,l

=

P

τ_l−1

j=ι_l+1

Q

_l,j

y

_j−1²

∆

_l,j

+ κ

_l²

Q(u

_l,τ

l

) y

_τ²

l−1

∆

_l,τ

l

H

_l

, ∆

_l,j

= S(x

_j

) − S(z

_l

) and

$

_2,l

=

( κ

l

− κ

_l²

) Q(u

_l,τ

l

) y

_τ²

l−1

S(x

_τ

l

)

H

_l

.

Note that in the model sec:In.1-11-1R the random variables (ξ

_j^∗

)

_1≤j≤d

are defined only on the set Γ. By the technical reasons we need the definitions for these variables on the set Γ

^c

was well. To this end for any j ≥ 1 we set

Q ˇ

_l,j

= Q

_l,j

y

_j−1

1

_{j<k

2,l}

+ p

H

_l

Q

_l,j

1

_{j=k

2,l}

(2.24) and ˇ A

_ι

l,m

= P

m

j=ι_l+1

Q ˇ

²_l,j

. Note, that for any j ≥ 1 and l 6= m

Q ˇ

_l,j

Q ˇ

_m,j

= 0 . (2.25)

and ˇ A

_ι

l,k_2,l

≥ H

_l

. So we can modify now stopping time (2.12) as ˇ

τ

_l

= inf{k ≥ ι

_l

+ 1 : ˇ A

_ι

l,k

≥ H

_l

} . (2.26) Obviously, ˇ τ

_l

≤ k

_2,l

and ˇ τ

_l

= τ

_l

on the set Γ for any 1 ≤ l ≤ d. Now similarly to (2.14) we define the correction coefficient as

A ˇ

_ι

l,ˇτ_l−1

+ ˇ κ

_l²

Q ˇ

²_l,ˇ_τ

l

= H

_l

. (2.27)

It is clear that 0 < κ ˇ

l

≤ 1 and ˇ κ

l

= κ

l

on the set Γ for 1 ≤ l ≤ d. Using this coefficient we set

η

_l

=

P

ˇτ_l−1

j=ι_l+1

Q ˇ

_l,j

ξ

_j

+ ˇ κ

l

Q ˇ

_l,_τ_ˇ

l

ξ

_τ_ˇ

l

H

_l

. (2.28)

(9)

Note that on the set Γ for any 1 ≤ l ≤ d the random variables η

_l

= ξ

_l^∗

. Moreover (see Lemma A.2 in [4]), for any 1 ≤ l ≤ d and p ∈ P

E

_p,S

(η

_l

|G

_l

) = 0 , E

_p,S

η

_l²

|G

_l

= σ

_l²

and E

_p,S

η

_l⁴

|G

_l

≤ mσ ˇ

_l⁴

, (2.29) where σ

_l

= H

_l^−1/2

, G

_l

= σ{η

₁

, . . . , η

_l−1

, σ

_l

} and ˇ m = 4(144/ √

3)

⁴

m

^∗₄

. It is clear that

σ

_0,∗

≤ min

1≤l≤d

σ

_l²

≤ max

1≤l≤d

σ

_l²

≤ σ

_1,∗

, (2.30) where

σ

_0,∗

= 1 −

²

2(1 − e )nh and σ

_1,∗

= 1

(1 − e )(2nh − q − 3) .

Now, taking into account that |$

_1,l

| ≤ Lh, for any S ∈ Θ

_,L

we obtain that sup

S∈Θ_,L

E

_p,S

1

_Γ

$

²_l

≤

L

²

h

²

+ υ ˇ

_n

(nh)

²

, (2.31)

where ˇ υ

_n

= sup

_p∈P

sup

_S∈Θ

,L

E

_p,S

max

_1≤j≤n

y

_j⁴

. The behavior of this coefficient is studied in the following Proposition.

Proposition 2.3. For any b > 0 the sequence (ˇ υ

_n

)

_n≥1

satisfies the following limiting equality

lim

n→∞

n

^−b

υ ˇ

_n

= 0 . (2.32)

Remark 2.1. It should be noted that the property (2.32) means that the asymptotic behavior of the upper bound (2.31) approximately almost as h

⁻²

when n → ∞. We will use this in the oracle inequalities below.

Remark 2.2. Note, that to estimate the function S in (1.1) we use the approach developed in [11] for the diffusion processes. To this end we use the efficient sequential kernel procedures developed in [1, 2, 3]. It should be emphasized that to obtain an efficient estimator, i.e. an estimator with the minimal asymptotic risk, one needs to take only indicator kernel as in (2.13).

Remark 2.3. Ii should be noted also that the sequential estimator (2.13)

has the same form as in [3], but except the last term, in which the correction

coefficient is replaced by the square root of the coefficient used in [14]. We

modify this procedure to calculate the variance of the stochastic term (2.23).

(10)

3 Model selection

In this section we consider the nonparametric estimation problem in the non asymptotic setting for the regression model sec:In.1-11-1R for some set Γ ⊆ Ω. The design points (z

_l

)

_1≤l≤d

are defined in (2.5). The function S(·) is unknown and has to be estimated from observations Y

₁

, . . . , Y

_d

. Moreover, we assume that the unobserved random variables (η

_l

)

_1≤l≤d

satisfy the properties (2.29) with some nonrandom constant ˇ m > 1 and the known random positive coefficients (σ

_l

)

_1≤l≤d

satisfy the inequlity (2.30) for some nonrandom positive constants σ

_0,∗

and σ

_1,∗

Concerning the random sequence $ = ($

_l

)

_1≤l≤d

we suppose that

E

_p,S

1

_Γ

k$k

²_d

< ∞ . (3.1) The performance of any estimator S b will be measured by the empirical squared error

k S b − Sk

²_d

= ( S b − S, S b − S)

_d

= b − a d

d

X

l=1

( S(z b

_l

) − S(z

_l

))

²

. (3.2) Now we fix a basis (φ

_j

)

_1≤j≤d

which is orthonormal for the empirical inner product:

(φ

_i

, φ

_j

)

_d

= b − a d

d

X

l=1

φ

_i

(z

_l

)φ

_j

(z

_l

) = 1

_{i=j}

. (3.3) For example, we can take the trigonometric basis (φ

_j

)

_j≥₁

in L

₂

[a, b], i.e.

φ

₁

= 1

√ b − a , φ

_j

(x) = r 2

b − a Tr

_j

(2π[j/2]l

₀

(x)) , j ≥ 2 , (3.4) where the function Tr

_j

(x) = cos(x) for even j and Tr

_j

(x) = sin(x) for odd j , [x] denotes integer part of x. and l

₀

(x) = (x − a)/(b − a). Note that, in this case to obtain the property (3.3) the numbers of points d must be odd.

To obtain the property (2.6) we can choose, for example, d = 2[ √

n/2] + 1, where [a] is the integer part of a ∈ R .

Note that, using the orthonormality property (3.3) we can represent for any 1 ≤ l ≤ d the function S as

S(z

_l

) =

d

X

j=1

θ

_j,d

φ

_j

(z

_l

) and θ

_j,d

= S, φ

_j

d

. (3.5)

(11)

So, to estimate the function S we have to estimate the Fourrier coefficients (θ

_j,d

)

_1≤j≤d

. To this end we reply the the function S by the observations, i.e.

θ b

_j,d

= b − a d

d

X

l=1

Y

_l

φ

_j

(z

_l

) . (3.6) From sec:In.1-11-1R we obtain immediately the following regression shceme

b θ

_j,d

= θ

_j,d

+ ζ

_j,d

with ζ

_j,d

=

r b − a

d η

_j,d

+ $

_j,d

, (3.7) where

η

_j,d

=

r b − a d

d

X

l=1

η

_l

φ

_j

(z

_l

) and $

_j,d

= b − a d

d

X

l=1

$

_l

φ

_j

(z

_l

) .

Note that the upper bound (2.30) and the Bounyakovskii-Cauchy-Schwarz inequality imply that

|$

_j,d

| ≤ k$k

_d

kφ

_j

k

_d

= k$k

_d

. Now we set

B

_n

= nk$k

²_d

. (3.8)

Note here that Proposition 3.3 from [4] implies directly that for any b > 0 lim

n→∞

1 n

^b

sup

p∈P

sup

S∈Θ_ε,L

E

_p,S

B

_n

1

_Γ

= 0 . (3.9) We estimate the function S on the sieve (2.5) by the weighted least squares estimator

S b

_λ

(z

_l

) =

d

X

j=1

λ(j) θ b

_j,d

φ

_j

(z

_l

) 1

_Γ

, 1 ≤ l ≤ d , (3.10) where the weight vector λ = (λ(1), . . . , λ(d))

⁰

belongs to some finite set Λ ⊂ [0, 1]

^d

, the prime denotes the transposition. We set for any a ≤ t ≤ b

S b

_λ

(t) =

d

X

l=1

S b

_λ

(z

_l

)1

_{z

l−1<t≤z_l}

. (3.11)

Denote by ν be the cardinal number of the set Λ and Λ

_∗

= max

λ∈Λ d

X

j=1

λ(j) .

(12)

A

₁

) For any δ > ˇ 0 lim

n→∞

φ

^∗_n

+ ν

_n

n

^δ^ˇ

= 0 and lim

n→∞

Λ

_∗

(n)

n

^1/6+ˇ^δ

= 0 , (3.12) where φ

^∗_n

= max

_1≤j≤n

max

_x

0≤x≤x₁

|φ

_j

(x)|.

In order to obtain a good estimator, we have to write a rule to choose a weight vector λ ∈ Λ in (3.10). We define the empirical squared risk as

Err

_d

(λ) = k S b

_λ

− Sk

²_d

. Using (3.5) and (3.10) we can rewire this risk as

Err

_d

(λ) =

d

X

j=1

λ

²

(j)b θ

²_j,d

− 2

d

X

j=1

λ(j)b θ

_j,d

θ

_j,d

+

d

X

j=1

θ

²_j,d

. (3.13)

Since the coefficient θ

_j,d

is unknown, we need to replace the term θ b

_j,d

θ

_j,d

by some its estimator which we choose as

θ e

_j,d

= θ b

_j,d²

− b − a

d s

_j,d

with s

_j,d

= b − a d

d

X

l=1

σ

²_l

φ

²_j

(z

_l

) . (3.14) Note that from (2.30) - (3.3) it follows that

s

_j,d

≤ σ

_1,∗

. (3.15)

Finally, we define the cost function of the form J

_d

(λ) =

d

X

j=1

λ

²

(j)b θ

_j,d²

− 2

d

X

j=1

λ(j) θ e

_j,d

+ δP

_d

(λ) , (3.16) where the penalty term is defined as

P

_d

(λ) = b − a d

d

X

j=1

λ

²

(j )s

_j,d

(3.17)

and 0 < δ < 1 is some positive constant which will be chosen later. We set

λ b = argmin

_λ∈Λ

J

_d

(λ) and S b

_∗

= S b

_b_λ

. (3.18)

To study the efficiency property we specify the weight coefficients (λ(j))

_1≤j≤n

as it is proposed, for example, in [10]. First, for some 0 < ε < 1 introduce

(13)

the two dimensional grid to adapt to the unknown parameters (regularity and size) of the Sobolev bull, i.e. we set

A = {1, . . . , k

^∗

} × {ε, . . . , mε} , (3.19) where m = [1/ε

²

]. We assume that both parameters k

^∗

≥ 1 and ε are functions of n, i.e. k

^∗

= k

^∗

(n) and ε = ε(n), such that



 

 

lim

_n→∞

k

^∗

(n) = +∞ , lim

_n→∞

k

^∗

(n) ln n = 0 , lim

_n→∞

ε(n) = 0 and lim

_n→∞

n

^ˇ^δ

ε(n) = +∞

(3.20)

for any ˇ δ > 0. One can take, for example, for n ≥ 2 ε(n) = 1

ln n and k

^∗

(n) = k

₀^∗

+ √

ln n , (3.21)

where k

₀^∗

≥ 0 is some fixed constant. For each α = (β, l) ∈ A, we introduce the weight sequence

λ

_α

= (λ

_α

(j))

_1≤j≤p

with the elements

λ

_α

(j) = 1

_{1≤j<j

∗}

+ 1 − (j/ω

_α

)

^β

1

_{j

∗≤j≤ω_α}

, (3.22) where j

_∗

= 1 + [ln n], ω

_α

= ($

_β

l n)

^1/(2β+1)

,

$

_β

= (β + 1)(2β + 1)

π

^2β

β = 2β

π

^2β

ι

_β

and ι

_β

= 2β

²

(β + 1)(2β + 1) . Now we define the set Λ as

Λ = {λ

_α

, α ∈ A} . (3.23)

Note, that these weight coefficients are used in [16, 17] for continuous time regression models to show the asymptotic efficiency. It will be noted that in this case the cardinal of the set Λ is ν = k

^∗

m. It is clear that the properties (3.20) imply the condition (3.12).

In [4] we shown the following result.

Theorem 3.1. Assume that the conditions (2.2) and (3.12) hold. Then for any n ≥ 3, any S ∈ Θ

_,L

and any 0 < δ ≤ 1/12, the procedure (3.18) with the coefficients (3.23) satisfies the following oracle inequality

R

^∗

( S b

∗

, S) ≤ (1 + 4δ)(1 + δ)

²

1 − 6δ min

λ∈Λ

R

^∗

( S b

_λ

, S) + D

^∗_n

δn , (3.24)

(14)

where the term D

^∗_n

is such that for any b > 0 lim

n→∞

D

^∗_n

n

^b

= 0 .

Remark 3.1. It sjhould be noted that the weight least square estimators (3.10) with the weight coefficients (3.23) is efficient for the Sobolev ball (see, for example, [? 19]). So, Theorem 3.1 means that this model selection procedure is best among all the effective procedures in the sharp oracle inequality sense (3.24) and below we will use this property to show the efficiency property in adaptive setting, i.e. in the case, when the regularity property of the function S (1.1) is unknown.

4 Main results

For any fixed r > 0 and k ≥ 1 we define the Sobolev ellipse as W

_k,r

= {f ∈ Θ

_,L

:

k

X

j=0

a

_j

θ

_j²

≤ r} , (4.1)

where a

_j

= P

k

l=0

(2π[j/2])

^2l

, (θ

_j

)

_j≥1

are the trigonometric Fourier coefficients, i.e.

θ

_j

= Z

b

a

f (x)φ

_j

(x)dx

and (φ

_j

)

_j≥1

is the trigonometric basis defined in (3.4). It is clear we can represent this functional class as

W

_k,r

= {f ∈ Θ

_,L

:

k

X

j=0

kf

^(j)

k

²

≤ r} , (4.2) In order to formulate the asymptotic results we define the following nor- malizing coefficients. First, for any r > 0 we set

l

_∗

(r) = ((1 + 2k)r)

^1/(2k+1)

k π(k + 1)

2k/(2k+1)

(4.3) and

ς (S) = Z

b

a

(1 − S

²

(u))du . (4.4)

(15)

It is well known that for any S ∈ W

_k,r

the optimal rate of convergence is n

^−2k/(2k+1)

. First we study the lower bound for the asymptotic risks in the class of all estimators E

_n

, i.e. any measurable function with respect to the observations σ{y

₁

, . . . , y

_n

}.

Theorem 4.1. For the model (1.1) with the noise distribution from the class P defined in (2.1)

lim inf

n→∞

inf

Sb_n∈E_n

n

^2k/(2k+1)

sup

S∈W_k,r

υ(S)R

^∗

( S b

_n

, S) ≥ l

_∗

(r) , (4.5)

where υ(S) = (ς (S))

^−2k/(2k+1)

.

Now we stady the asymptotic upper bound for the quadratic risk of the estimator S b

_∗

. To this end we assume the following condition for the penalty coefficient δ in the objective function (3.16).

A

₂

) Assume that the parameter δ is a function of n, i.e. δ = δ

_n

such that for any b > 0

lim

n→∞

δ

_n

n

^b

= 0 . (4.6)

Theorem 4.2. Assume that the conditions A

₁

) – A

₂

) hold. The model selection procedure S b

_∗

defined in (3.18) with the penalty coefficient given in A

₂

) admits the following asymptotic upper bound

lim sup

n→∞

n

^2k/(2k+1)

sup

S∈W_k,r

υ(S) R( S b

_∗

, S) ≤ l

_∗

(r) . (4.7) Corollary 4.3. Assume that the conditions A

₁

) – A

₂

) hold. The model selection procedure S b

_∗

defined in (3.18) with the penalty coefficient given in A

₂

) is efficient, i.e.

lim

n→∞

inf

Sb_n∈E_n

sup

_S∈W

k,r

υ(S) R

^∗

( S b

_n

, S) sup

_S∈W

k,r

υ(S)R( S b

_∗

, S) = 1 . (4.8) Moreover,

lim

n→∞

n

^2k/(2k+1)

sup

S∈W_k,r

υ(S) R( S b

_∗

, S) = l

_∗

(r) . (4.9)

Remark 4.1. Note that the limit equalties (4.8) and (4.9) imply that the

function l

_∗

(r)/υ(S) is the minimal value of the normalised asymptotic quadratic

(16)

robust risk, i.e. Pinsker constant in this case. We remind that the coefficient l

_∗

(r) is the well known Pinsker constant for the ”signal+standard white noise” model obtained in [19]. Therefore, the Pinsker constant for the model (1.1) is represented by the Pinsker constant for the ”signal+white noise”

model in which the noise intensity is given by the function (4.4).

5 The van Trees inequality

In this section we consider the following continuous time parametric model (1.1) with the (0, 1) gaussian i.i.d. random variable (ξ

_j

)

_1≤j≤n

and the parametric linear function S, i.e.

S

_θ

(x) =

d

X

l=1

θ

_l

ψ

_l

(x) , θ = (θ

₁

, . . . , Ξ

_d

)

⁰

∈ R

ⁿ

. (5.1) The functions (ψ

_i

)

_1≤i≤d

are orthogonal with respect to the scalar product (3.3).

Let now P

ⁿ_θ

be the distribution in R

ⁿ

of the observations y = (y

₁

, . . . , y

_n

) in the model (1.1) with the function (5.1) and ν

_ξⁿ

be the distribution in R

ⁿ

of the gaussian vector (ξ

₁

, . . . , ξ

_n

). In this case the Radon - Nykodim density is given as

f

_n

(y, θ) = dP

⁽ⁿ⁾_θ

dν

_ξⁿ

= exp







n

X

j=1

S

_θ

(x

_j

)y

_j−1

y

_j

− 1 2

n

X

j=1

S

_θ²

(x

_j

)y

_j−1²







. (5.2) Let u be a prior distribution density on R

^d

for the parameter θ of the following form:

u(θ) =

d

Y

j=1

u

_j

(θ

_j

) ,

where u

_j

is some continuously differentiable probability density in R with the support ] − L

_j

, L

_j

[, i.e. u

_j

(z) > 0 for any −L

_j

< z < L

_j

and u

_j

(z) = 0 for all |z| ≥ L

_j

, such that the Fisher information is finite, i.e.

I

_j

= Z

L_j

−L_j

˙ u

²_l

(z)

u

_j

(z) dz < ∞ . (5.3)

Now, we set

Ξ =] − L

₁

, L

₁

[ × . . . × ] − L

_d

, L

_d

[ ⊆ R

^d

. (5.4)

(17)

Let g(θ) be a continuously differentiable Ξ → R function such that, for each 1 ≤ j ≤ d,

lim

|θ_j|→L_j

g(θ) u

_j

(θ

_j

) = 0 and Z

R^d

|g

_j⁰

(θ)| u(θ) dθ < ∞ , (5.5) where g

_j⁰

(θ) = ∂g(θ)/∂θ

_j

.

For any B(X )×B( R

^d

)-measurable integrable function H = H(y, θ) we denote E e H =

Z

Ξ

Z

Rⁿ

H(y, θ) dP

ⁿ_θ

u(θ)dθ

= Z

Ξ

Z

Rⁿ

H(y, θ) f

_n

(y, θ) u(θ)dν

_ξ⁽ⁿ⁾

dθ .

Now we obtain an lower bound for the corresponding bayesian risks in the case when the model (1.1) is gaussian with the function (5.1).

Lemma 5.1. For any F

_n^y

-measurable square integrable function b g

_n

and for any 1 ≤ j ≤ d, the following inequality holds

E( e b g

_n

− g(θ))

²

≥ g

²_j

E e Ψ

_n,j

+ I

_j

, (5.6) where Ψ

_n,j

= P

n

j=1

ψ

_j²

(x

_l

) y

²_l−1

and g

_j

= R

Ξ

g

_j⁰

(θ) u(θ) dθ.

Proof. First, for any θ ∈ Ξ we set U e

_j

= U e

_j

(y, θ) = 1

f (y, θ)u(θ)

∂ (f(y, θ)u(θ))

∂θ

_j

.

Taking into account the condition (5.5) and integrating by parts we get E e

( b g

_n

− g (θ)) U e

_j

= Z

Rⁿ×Ξ

( b g

_n

(y) − g(θ)) ∂

∂θ

_j

(f(y, θ)u(θ)) dθ dν

_ξ⁽ⁿ⁾

= Z

Rⁿ×Ξˇ_j

Z

L_j

−L_j

g

⁰_j

(θ) f(y, θ)u(θ)dθ

_j

! 

 Y

i6=j

dθ

_i



 dν

_ξ⁽ⁿ⁾

= g

_j

, where

Ξ ˇ

_j

= Y

i6=j

] − L

_i

, L

_i

[ .

(18)

Now by the Bouniakovskii-Cauchy-Schwarz inequality we obtain the following lower bound for the quadratic risk

E( e b g

_n

− g(θ))

²

≥ g

²_j

E e U e

_j²

.

To study the denominator in the left hand of this inequality note that in view of the representation (5.2)

1 f

_n

(y, θ)

∂ f

_n

(y, θ)

∂θ

_j

=

n

X

l=1

ψ

_j

(x

_l

) y

_l−1

(y

_l

− S

_θ

(x

_l

)y

_l−1

) .

Therefore, for each θ ∈ Ξ,

E

⁽ⁿ⁾_θ

1 f

_n

(y, θ)

∂ f

_n

(y, θ)

∂θ

_j

= 0 and

E

⁽ⁿ⁾_θ

1 f

_n

(y, θ)

∂ f

_n

(y, θ)

∂θ

_j

2

= E

⁽ⁿ⁾_θ

n

X

l=1

ψ

²_j

(x

_l

) y

_l−1²

= E

⁽ⁿ⁾_θ

Ψ

_n,l

. Using the equality

U e

_j

= 1 f

_n

(y, θ)

∂ f

_n

(y, θ)

∂θ

_j

+ 1 u(θ)

∂ u(θ)

∂θ

_j

,

we get

E e U e

_j²

= E e Ψ

_n,j

+ I

_j

,

where the Fisher information I

_j

is defined in (5.3). Hence Lemma 5.1.

Remark 5.1. It should be noted that in the definition of the prior distribution the bound L

_j

may be equal to infinity either for some 1 ≤ j ≤ d or for all 1 ≤ j ≤ d.

6 Low bound

First, note that

R

^∗

( S b

_n

, S) ≥ R

_p

0

( S b

_n

, S) , (6.1)

(19)

where p

₀

is the (0, 1) gaussian density. Now for any fixed 0 < ε < 1 we set d = d

_n

=

k + 1

k (ςn)

^1/(2k+1)

l

_∗

(r

_ε

)

(6.2) where ς = 1/(b − a), r

_ε

= (1 − ε)r and

l

_∗

(r

_ε

) = ((1 + 2k)r

_ε

)

^1/(2k+1)

k π(k + 1)

2k/(2k+1)

= (1 − ε)

^1/(2k+1)

l

_∗

(r) . For any vector θ = (θ

_j

)

_1≤j≤d

∈ R

^d

, we set

S

_θ

(x) =

d_n

X

j=1

θ

_j

φ

_j

(x) , (6.3)

where (φ

_j

)

_1≤j≤d

n

is the trigonometric basis defined in (3.4). As ir is shown in [13] there exist continuously differentiable density ˇ p

_L

with the support on [−L, L]] such that R

L

−L

xˇ p

_L

(x)dx = 0, R

L

−L

x

²

p ˇ

_L

(x)dx = 1 and I ˇ

_L

=

Z

L

−L

(ˇ p

⁰

(x))

²

ˇ

p(x) p(x)dx ˇ = 1 + ˇ

_L

,

where ˇ

_L

→ 0 as L → ∞. To define the bayesian risk we choose a prior distribution on R

^d

as

θ = (θ

_j

)

_1≤j≤d

n

and κ

_j

= s

_j

η ˇ

_j

, (6.4) where ˇ η

_j

are i.i.d. random variables with the density ˇ p

_L

,

s

_j

= s s

^∗_j

ςn and s

^∗_j

= d

_n

j

k

− 1 .

Furthermore, for any function f , we denote by h(f) its projection in L

₂

[0, 1]

onto W

_k,r

, i.e.

h(f) = Pr

_W

k,r

(f) .

Since W

_k,r

is a convex set, we obtain, that for any function S ∈ W

_k,r

k S b − Sk

²

≥ kb h − Sk

²

with h b = h( S) b .

From the definition of the prio distribution (6.4) we obtain that a.s.

max

a≤x≤b

|S

_θ

(x)| + | S ˙

_θ

(x)|

≤

r 2 b − a

b − a + 1

b − a

_n

:=

^∗_n

, (6.5)

(20)

where

_n

= L

√ n

d_n

X

j=1

d

_n

j

k/2

j → 0 as n → ∞

for any k ≥ 2. Therefore, for sufficiently large n the function (6.3) belongs to the class (2.4) and the last property yileds

sup

S∈W_k,r

υ(S) R

_p

0

( S, S) b ≥ Z

{z∈R^d:S_z∈W_k,r}

υ(S

_z

)E

_p

0,S_z

kb h − S

_z

k

²

µ

_κ

(dz)

≥ υ

_∗

Z

{z∈R^d:S_z∈W_k,r}

E

_p

0,S_z

kb h − S

_z

k

²

µ

_κ

(dz) , where

υ

_∗

= inf

|S|_∗≤^∗

n

υ(S) → ς

^2k/(2k+1)

as n → ∞ . Using the distribution µ

_κ

we introduce the following Bayes risk

R e

₀

( S) = b Z

R^d

R

_p

0

( S, S b

_z

) µ

_κ

(dz) . Taking into account now that kb hk

²

≤ r we obtain

sup

S∈W_k,r

υ(S) R

_p

0

( S, S) b ≥ υ

_∗

R e

₀

(b h) − 2 υ

_∗

R

_0,n

(6.6) with

R

_0,n

= Z

{z∈R^d:S_z∈W/ _k,r}

(r + kS

_z

k

²

) µ

_κ

(dz) .

In Lemma A.2 we studied the last term in this inequality. Now it is easy to see that

kb h − S

z

k

²

≥

d_n

X

j=1

( b z

_j

− z

_j

)

²

, where z b

_j

= R

1

0

h(t) b φ

_j

(t)dt. So, in view of Lemma 5.1, we obtain R e

₀

(b h) ≥ 1

ςn

d_n

X

j=1

1 1 + ˇ I

_L

(s

^∗_j

)

⁻¹

≥ 1 ςn max(1, I ˇ

_L

)

d_n

X

j=1

1 − j

^k

d

^k_n

.

Therefore, using now the definition (6.2), Lemma A.2 and the inequality (6.1) we obtain that

lim inf

n→∞

inf

S∈Πb _n

n

^2k+1^2k

sup

S∈W_k,r

υ(S) R

^∗

( S b

_n

, S) ≥ (1 − ε)

^2k+1¹

1 max(1, I ˇ

_L

) l

_∗

.

Taking here limit as ε → 0 and L → ∞ we come to the Theorem 4.1.

(21)

7 Upper bound

7.1 Known regularity

We start with the estimation problem for the functions S from W

_k,r

with known parameters k, r and ς(S) defined in (4.4). In this case we use the estimator from family (3.23)

S e = S b

αe

, (7.1)

where α e = (k, e t

_n

), e l

_n

= [r(S)/ε] ε and r(S) = r/ς(S). We remind, that ε = 1/ ln n. Note that for sufficiently large n, the parameter α e belongs to the set (3.19). In this section we obtain the upper bound for the empiric risk (3.2).

Theorem 7.1. The estimator S e constructed on the trigonometric basis satisfies the following asymptotic upper bound

lim sup

n→∞

n

^2k/(2k+1)

sup

S∈W_k,r

υ(S) E

_p,S

k S e − Sk

²_d

1

_Γ

≤ l

_∗

(r) . (7.2) Proof. We denote e λ = λ

αe

and ω e = ω

αe

. Now we recall that the Fourier coefficients on the set Γ

b θ

_j,d

= θ

_j,d

+ ζ

_j,d

with ζ

_j,d

=

r b − a

d η

_j,d

+ $

_j,d

, Therefore, on the set Γ we can represent the empiric squared error as

k S e − Sk

²_d

=

d

X

j=1

(1 − e λ(j ))

²

θ

_j,d²

− 2M

_n

− 2

d

X

j=1

(1 − e λ(j )) e λ(j )θ

_j,d

$

_j,d

+

d

X

j=1

λ e

²

(j) ζ

_j,d²

, where

M

_n

=

r b − a d

d

X

j=1

(1 − e λ(j)) e λ(j)θ

_j,d

η

_j,d

. Now for any 0 < ε < ˇ 1

2

d

X

j=1

(1 − e λ(j))e λ(j)θ

_j,d

$

_j,d

≤ ε ˇ

d

X

j=1

(1 − e λ(j ))

²

θ

²_j,d

+ ˇ ε

⁻¹

d

X

j=1

$

²_j,d

.

Adaptive efficient robust estimation for nonparametric autoregressive models

HAL Id: hal-02366202

https://hal.archives-ouvertes.fr/hal-02366202

Preprint submitted on 15 Nov 2019

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Adaptive eﬀicient robust estimation for nonparametric autoregressive models

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov

To cite this version:

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov. Adaptive eﬀicient robust estimation for

nonparametric autoregressive models. 2019. �hal-02366202�