Efficient robust nonparametric estimation in a semimartingale regression model

(1)

HAL Id: hal-00526915

https://hal.archives-ouvertes.fr/hal-00526915

Preprint submitted on 16 Oct 2010

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de

Eﬀicient robust nonparametric estimation in a semimartingale regression model

Victor Konev, Serguei Pergamenchtchikov

To cite this version:

Victor Konev, Serguei Pergamenchtchikov. Eﬀicient robust nonparametric estimation in a semimartin-

gale regression model. 2010. �hal-00526915�

(2)

Efficient robust nonparametric estimation in a semimartingale regression model ^∗

Konev Victor

^†

and Pergamenshchikov Serguei

^‡

October 16, 2010

Abstract

The paper considers the problem of robust estimating a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with unknown distribution. An example of such a noise is non-gaussian Ornstein-Uhlenbeck process with the L´evy process subordinator, which is used to model the financial Black-Scholes type markets with jumps.

An adaptive model selection procedure, based on the weighted least square estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks have been derived and the robust efficiency of the model selection procedure has been shown.

Keywords: Non-asymptotic estimation; Robust risk; Model selection;

Sharp oracle inequality; Asymptotic efficiency.

AMS 2000 Subject Classifications: 62G08, 62G05

∗The paper is supported by the RFBR-Grant 09-01-00172-a.

†Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin str. 36, 634050 Tomsk, Russia, e-mail: [email protected]

‡Laboratoire de Mathématiques Raphael Salem, Avenue de l’Université, BP. 12, Uni- versité de Rouen, F76801, Saint Etienne du Rouvray, Cedex France and Department of Mathematics and Mechanics,Tomsk State University, Lenin str. 36, 634041 Tomsk, Russia, e-mail: [email protected]

(3)

1 Introduction

Consider a regression model in continuous time

dy

_t

= S(t)dt + dξ

_t

, 0 ≤ t ≤ n , (1.1) where S is an unknown 1-periodic

R

→

R

function, S ∈ L

2

[0, 1];

(ξ

_t

)

_t≥0

is an unobservable semimartingale noise with the values in the Skorokhod space D [0, n] such that, for any function f from L

2

[0, n], the stochastic integral

I

_n

(f ) =

Z n

0

f

_s

dξ

_s

(1.2)

is well defined and has the following properties E

_Q

I

_n

(f ) = 0 and E

_Q

I

_n²

(f ) ≤ σ

_Q

Z n 0

f

_s²

ds . (1.3) Here E

_Q

denotes the expectation with respect to the distribution Q in D [0, n] of the process (ξ

_t

)

_0≤t≤n

, which is assumed to belong to some probability family Q

n

specified below; σ

_Q

> 0 is some positive constant depending on the distribution Q.

The problem is to estimate the unknown function S in the model (1.1) on the basis of observations (y

_t

)

_0≤t≤n

.

The class of the disturbances ξ satisfying conditions (1.3) is rather wide and comprises, in particular, the L´evy processes which are used in different important problems (see [4], for details). The models (1.1) with the L´evy’s type noise naturally arise (see [18]) in the nonparametric functional statistics problems (see, for example, [8]). Moreover, as is shown in Section 2, Non-Gaussian Ornstein-Uhlenbeck-based models also enter this class. The latter models are successfully used to model the Black-Scholes type financial markets with jumps (see [2], [6] for details and other references).

We define the error of an estimate S

b

(any real-valued function measurable with respect to σ { y

_t

, 0 ≤ t ≤ n } ) for S by its integral quadratic risk

R

Q

( S, S) :=

b

E

_Q,S

k S

b

− S k

²

, (1.4) where E

_Q,S

stands for the expectation with respect to the distribution P

_Q,S

of the process (1.1) with a fixed distribution Q of the noise (ξ

_t

)

_0≤t≤n

and a given function S; k · k is the norm in L

₂

[0, 1], i.e.

k f k

²

:=

Z 1 0

f

²

(t)dt . (1.5)

(4)

Since in our case the noise distribution Q is unknown, it seems natural to measure the quality of an estimate S

b

by the robust risk defined as

R

^∗_n

( S, S) = sup

b

Q∈Q_n

R

Q

( S, S

b

) (1.6) which assumes taking supremum of the error (1.4) over the whole family of admissible distributions Q

n

.

It is natural to treat the stated problem with respect to the quadratic risk from the standpoint of the model selection approach. It will be noted that the origin of this method goes back to early seventies with the pioneering papers by Akaike [1] and Mallows [21] who proposed to introduce penalizing in a log-likelihood type criterion. The further progress has been made by Barron, Birgé and Massart [3], [22], who developed a non-asymptotic model selection method which enables one to derive non-asymptotic oracle inequalities for nonparametric regression models with the i.i.d. gaussian disturbances. An oracle inequality yields the upper bound for the estimate risk via the minimal risk corresponding to a chosen family of estimates. Galtchouk and Pergamenshchikov [9] applied the Barron-Birgé-Massart technic to the problem of estimating a nonparametric drift function in ergodic diffusion processes. Fourdrinier and Pergamenshchikov [7] extended the Barron-Birgé-Massart method to the models with the spherically symmetric dependent observations. They proposed a model selection procedure based on the improved least squares estimates. Lately, the authors [17] applied this method to the nonparametric problem of estimating a periodic function in a model with a gaussian colored noise in continuous time. In all cited papers, the non-asymptotic oracle inequalities have been derived, which enable one to establish the optimal convergence rate for the minimax risks. In addition to the optimal convergence rate, the other important problem is that of the efficiency of adaptive estimation procedures. In order to examine the efficiency property of a procedure one has to obtain the

sharp oracle inequalities,

i.e. such in which the factor at the principal term in the right-hand of the inequality is close to unity.

The first result on sharp inequalities is most likely due to Kneip

[15] who studied a gaussian regression model. It will be observed

that the derivation of oracle inequalities usually rests upon the fact

that the initial model, by applying the Fourier transformation, is re-

duced to the gaussian model with independent observations. How-

ever, such a transform is possible only for gaussian models with in-

dependent homogeneous observations or for the inhomogeneous ones

(5)

with the known correlation characteristics. This restriction signifi- cantly narrows the area of application of the proposed model selection procedures and rules out a broad class of models including, in particular, widely used in econometrics heteroscedastic regression models (see, for example, [14]). For constructing adaptive procedures in the case of inhomogeneous observations one needs to modify the approach to the estimation problem. Galtchouk and Pergamenshchikov [11]-[12] have developed a new estimation method intended for the heteroscedastic regression models in discrete time. The heart of this method is to combine the Barron-Birg´e-Massart non-asymptotic pe- nalization method [3] and the Pinsker weighted least square method minimizing the asymptotic risk (see, for example, [23], [24]). This yields a significant improvement in the performance of the procedure (see numerical example in [11]).

The goal of this paper is to develop the robust efficient model selection method for the model (1.1) with dependent disturbances having unknown distribution. We follow the approach, proposed by Galtchouk and Pergamenshchikov in [11], in the construction of the procedure. Unfortunately, their method of obtaining the oracle inequalities is essentially based on the independence of observations and can not be applied here. The paper proposes the new analytical tools which allow one to obtain the sharp non-asymptotic oracle inequalities for robust risks under general conditions on the distribution of the noise in the model (1.1). This method enables us to treat both the cases of dependent and independent observations from the same standpoint, does not assume the knowledge of the noise distribution and leads to the efficient estimation procedure with respect to the risk (1.6). The validity of the conditions imposed on the noise in the equation (1.1) is verified for a non-gaussian Ornstein-Uhlenbeck process (see Section 2).

The rest of the paper is organized as follows. In Section 3 we con-

struct the model selection procedure on the basis of weighted least

squares estimates and state the main results in the form of oracle in-

equalities for the quadratic risk (1.4) and the robust risk (1.6). Here we

specify also the set of admissible weight sequences in the model selec-

tion procedure. In Section 4 we proof some properties of the stochastic

integrals with respect to the non-gaussian Ornstein-Uhlenbeck process

(2.1). Section 5 gives the proofs of the main results. In Sections 6, 7 it

is shown that the proposed model selection procedure for estimating

S in (1.1) is asymptotically efficient with respect to the robust risk

(6)

(1.6). In Appendix some auxiliary propositions are given.

2 Non-Gaussian Ornstein-Uhlenbeck process

In this section we consider an important example of the disturbances (ξ

_t

)

_t≥0

in (1.1) given by a non-gaussian Ornstein-Uhlenbeck process with the L´evy subordinator. Such processes are used in the financial Black-Scholes type markets with jumps (see, for example [6] and the references therein). Let the noise process in (1.1) obey the equation

dξ

_t

= aξ

_t

dt + du

_t

, ξ

₀

= 0 , (2.1) where a ≤ 0, u

_t

= ̺

₁

w

_t

+ ̺

₂

z

_t

, ̺

₁

and ̺

₂

are unknown constants, (w

_t

)

_t≥0

is a standard Brownian motion, (z

_t

)

_t≥0

is a compound Poisson process defined as

z

_t

=

N_t

X

j=1

Y

_j

.

Here (N

_t

)

_t≥0

is a standard homogeneous Poisson process with unknown intensity λ > 0 and (Y

_j

)

_j≥1

is an i.i.d. sequence of random variables with

EY

_j

= 0 , EY

_j²

= 1 and EY

_j⁴

< ∞ . (2.2) Let (T )

_k≥1

denote the arrival times of the process (N

_t

)

_t≥0

, that is,

T

_k

= inf { t ≥ 0 : N

_t

= k } . (2.3) We assume that the parameters λ, a, ̺

₁

and ̺

₂

satisfy the conditions

− a

_max

≤ a ≤ 0 , 0 ≤ λ ≤ λ

_max

, ̺

^∗_min

≤ ̺

^∗

≤ ̺

^∗_max

, (2.4)

where ̺

^∗

= ̺

²₁

+ λ̺

²₂

. Let Q

n

denote the family of all distributions of

process (2.1) on D [0, n] with the parameters a, λ, ̺

₁

and ̺

₂

satisfying

the conditions (2.4) with fixed bounds λ

_max

> 0, a

_max

> 0, ̺

^∗_min

> 0

and ̺

^∗_max

> 0.

(7)

3 Model selection

This Section gives the construction of a model selection procedure for estimating a function S in (1.1) on the basis of weighted least square estimates and states the main results.

For estimating the unknown function S in the model (1.1), we ap- ply its Fourier expansion in the trigonometric basis (φ

_j

)

_j≥1

in L

2

[0, 1]

defined as

φ

₁

= 1 , φ

_j

(x) = √

2 T r

_j

(2π[j/2]x) , j ≥ 2 , (3.1) where the function T r

_j

(x) = cos(x) for even j and T r

_j

(x) = sin(x) for odd j; [x] denotes the integer part of x. The corresponding Fourier coefficients

θ

_j

= (S, φ

j

) =

Z 1

0

S(t) φ

_j

(t) dt (3.2) can be estimated as

θ

b_j,n

= 1 n

Z n 0

φ

_j

(t) dy

_t

. (3.3)

In view of (1.1), we obtain

b

θ

_j,n

= θ

_j

+ 1

√ n ξ

_j,n

, ξ

_j,n

= 1

√ n I

_n

(φ

_j

) (3.4) where I

_n

(φ

_j

) is given in (1.2).

For any sequence x = (x

_j

)

_j≥1

, we set

| x |

²

=

X∞ j=1

x

²_j

and #(x) =

X∞ j=1

1

_{|x

j|>0}

. (3.5) Now we impose some additional conditions on the distribution of the noise (ξ

_t

)

_t≥0

in (1.1).

C

₁

) There exists a positive constant ς

_Q

> 0 such that for any n ≥ 1

L

_1,n

(Q) = sup

x∈H,#(x)≤n

X∞ j=1

x

_j

E

_Q

ξ

_j,n²

− ς

_Q

< ∞ ,

(8)

where H = [ − 1, 1]

^∞

.

C

₂

) Assume that for all n ≥ 1

L

_2,n

(Q) = sup

|x|≤1,#(x)≤n

E

_Q



 X∞ j=1

x

_j

(ξ

_j,n²

− E

_Q

ξ

_j,n²

)





2

< ∞ .

As is shown in the proof of Theorem 3.2 in Section 5 , both Con- ditions C

₁

) and C

₂

) hold for the process (2.1). Further we define a class of weighted least squares estimates for S(t) as

S

b_γ

=

X∞ j=1

γ(j)b θ

_j,n

φ

_j

, (3.6)

where γ = (γ(j))

_j≥1

is a sequence of weight coefficients such that 0 ≤ γ(j) ≤ 1 and 0 < #(γ) ≤ n . (3.7) Let Γ denote a finite set of weight sequences γ = (γ(j))

_j≥1

with these properties, ν = card(Γ) be its cardinal number and

µ = max

γ∈Γ

#(γ) . (3.8)

The model selection procedure for the unknown function S in (1.1) will be constructed on the basis of a family of estimates ( S

b_γ

)

_γ∈Γ

. The choice of a specific set of weight sequences Γ is discussed at the end of this section. In order to find a proper weight sequence γ in the set Γ one needs to specify a cost function. When choosing an appropriate cost function one can use the following argument. The empirical squared error

Err

_n

(γ) = k S

b_γ

− S k

²

can be written as

Err

_n

(γ) =

X∞ j=1

γ

²

(j)b θ

_j,n²

− 2

X∞ j=1

γ(j)b θ

_j,n

θ

_j

+

X∞ j=1

θ

_j²

. (3.9)

Since the Fourier coefficients (θ

_j

)

_j≥1

are unknown, the weight coef-

ficients (γ

_j

)

_j≥1

can not be determined by minimizing this quantity.

(9)

To circumvent this difficulty one needs to replace the terms θ

b_j,n

θ

_j

by some their estimators

e

θ

_j,n

. We set

θ

e_j,n

= θ

b²_j,n

− σ

b_n

n (3.10)

where

b

σ

_n

is some estimator for the quantity ς

_Q

in the condition C

₁

).

For this change in the empirical squared error, one has to pay some penalty. Thus, one comes to the cost function of the form

J

_n

(γ) =

X∞ j=1

γ

²

(j) θ

b_j,n²

− 2

X∞ j=1

γ(j) θ

e_j,n

+ ρ P

b_n

(γ) (3.11) where ρ is some positive constant, P

b

(γ) is the penalty term defined as

P

b_n

(γ) = σ

b_n

| γ |

²

n . (3.12)

In the case, when the value of σ in C

₁

) is known, one can take

b

σ

_n

= ς

_Q

and

P

_n

(γ) = ς

_Q

| γ |

²

n . (3.13)

Substituting the weight coefficients, minimizing the cost function

b

γ = argmin

_γ∈Γ

J

_n

(γ ) , (3.14) in (3.6) leads to the model selection procedure

S

b_∗

= S

b_b_γ

. (3.15)

It will be noted that γ

b

exists, since Γ is a finite set. If the minimizing sequence in (3.14) γ

b

is not unique, one can take any minimizer.

Theorem 3.1. Let Q

n

be a family of the distributions Q on D [0, n]

such that the conditions C

₁

) and C

₂

) hold for each Q from Q

n

. Then for any n ≥ 1 and 0 < ρ < 1/3, the estimator (3.15), for each Q ∈ Q

n

, satisfies the oracle inequality

R

Q

( S

b_∗

, S) ≤ 1 + 3ρ − 2ρ

²

1 − 3ρ min

γ∈Γ

R

Q

( S

b_γ

, S) + 1

n B

Q

(n, ρ) (3.16) where the risk R

Q

( · , S) is defined in (1.4),

B

Q

(n, ρ) = Ψ

_Q

(n, ρ) + 6µ E

_Q,S

|b σ

_n

− ς

_Q

|

1 − 3ρ

(10)

and

Ψ

_Q

(n, ρ) = 2ς

_Q

σ

_Q

ν + 4ς

_Q

L

_1,n

(Q) + 2νL

_2,n

(Q)

ς

_Q

ρ(1 − 3ρ) . (3.17)

This theorem is proved in [18].

Now we will obtain the oracle inequality for the model (1.1), (2.1).

To write down the oracle inequality in this case, one needs the following parameters

λ

₁

= λ̺

²₁

+ (λ̺

₂

)

²

and λ

₂

= ̺

²₁

̺

^∗

+ λ̺

²₂

. (3.18) Moreover, we set

M

^∗

= 4̺

²₁

+ ̺

²₂

D

^∗₁

+ 80λ

₂

+ 12D

^∗₂

+ 21̺

₃

(3.19) where D

^∗₁

= 4λ̺

²₁

+ 7λ

²

̺

²₂

, D

^∗₂

= 4̺

²₁

̺

^∗

+ ̺

²₂

D

^∗₁

+ 23λ

₂

and ̺

₃

= λ̺

⁴₂

EY

₁⁴

.

Theorem 3.2. Let Q

n

be a family of the distributions of the process (2.1) with the parameters meeting the conditions (2.4). Then, for any n ≥ 1 and 0 < ρ < 1/3, the estimator (3.15) satisfies, for each Q ∈ Q

n

, the oracle inequality (3.16) with

Ψ

_Q

(n, ρ) = 6̺

^∗_max

ν + 4̺

^∗_max

L

^∗₁

+ 56νM

^∗

̺

^∗_min

ρ(1 − 3ρ) , (3.20) where

L

^∗₁

= 2(1 + a

_max

(a

_max

+ 1))̺

^∗_max

. Proof of this theorem is given in Section 5.

Remark 3.1. Note that the term (3.20) does not depend on the parameters specifying the distribution family Q

n

. This means that the oracle inequality (3.16) is uniform over stability region for the process (2.1) including the stability bound, i.e. the case a = 0.

To obtain the oracle inequality for the robust risks one has to impose additional conditions on the distribution family Q

n

in (1.6).

To this end, we introduce the family of distributions Q on D [0, n] with the growth restriction on L

_1,n

(Q) + L

_2,n

(Q), i.e.

P

_n^∗

=

Q ∈ P

n

: L

_1,n

(Q) + L

_2,n

(Q) ≤ l

_n

, (3.21)

(11)

where P

n

denotes the family of all probability measures on D [0, n], l

_n

is a slowly increasing positive function, i.e. l

_n

→ + ∞ as n → + ∞ and for any δ > 0

n→∞

lim l

_n

n

^δ

= 0 . In the sequel we use the following condition

H

₀

) Assume the distribution family Q

n

is a subset of the class (3.21), i.e. Q

n

⊆ P

_n^∗

, such that

0 < ς

_∗

:= inf

_Q∈Q

n

ς

_Q

≤ sup

_Q∈Q

n

ς

_Q

:= ς

^∗

< ∞ ; σ

^∗

:= inf

_Q∈Q

n

σ

_Q

< ∞ .

(3.22)

Theorem 3.3. Assume that the family Q

n

in the robust risk (1.6) satisfies the condition H

₀

). Then for any n ≥ 1 and 0 < ρ < 1/3, the estimator (3.15) satisfies the oracle inequality

R

^∗

( S

b_∗

, S) ≤ 1 + 3ρ − 2ρ

²

1 − 3ρ min

γ∈Γ

R

^∗

( S

b_γ

, S ) + 1

n B

^∗

(n, ρ) (3.23) where

B

^∗

(n, ρ) = Ψ

^∗

(n, ρ) + 6µ 1 − 3ρ sup

Q∈Qn

E

_Q,S

|b σ

_n

− ς

_Q

| and

Ψ

^∗

(n, ρ) = 2ς

^∗

σ

^∗

ν + 4ς

^∗

l

_n

+ 2νl

_n

ς

_∗

ρ(1 − 3ρ) .

3.1 Estimation of ς

_Q

Now we consider the case of unknown quantity σ in the condition C

₁

).

One can estimate σ as

b

σ

_n

=

Xn j=l

θ

b²_j,n

with l = [ √

n] + 1 . (3.24)

(12)

Proposition 3.4. Assume that the family distribution Q

n

satisfies the condition H

₀

) and the unknown function S( · ) in the model (1.1) is continuously differentiable for 0 ≤ t < 1 such that

| S ˙ |

1

=

Z 1

0

| S(t) ˙ | dt < + ∞ . (3.25) Then, for any n ≥ 1,

sup

Q∈Q_n

E

_Q,S

|b σ

_n

− ς

_Q

| ≤ κ

^∗_n

(S)

√ n (3.26)

where

κ

^∗_n

(S) = 4 | S ˙ |

²₁

+ ς

^∗

+

p

l

_n

+ 4 | S ˙ |

1

√ σ

^∗

n

^1/4

+ l

_n

n

^1/2

. Proof. Substituting (3.4) in (3.24) yields

b

σ

_n

=

Xn

j=l

θ

_j²

+ 2

√ n

Xn

j=l

θ

_j

ξ

_j,n

+ 1 n

Xn j=l

ξ

_j,n²

. (3.27) Further, denoting

x

^′_j

= 1

_{l≤j≤n}

and x

^′′_j

= 1

√ n 1

_{l≤j≤n}

, we represent the last term in (3.27) as

1 n

Xn j=l

ξ

_j,n²

= 1

n B

_1,n

(x

^′

) + 1

√ n B

_2,n

(x

^′′

) + n − l + 1 n ς

_Q

, where

B

_1,n

(x) =

X∞ j=1

x

_j

E

_Q,S

ξ

_j,n²

− ς

_Q

and B

_2,n

(x) =

X∞ j=1

x

_j

(ξ

_j,n²

− E

_Q,S

ξ

_j,n²

) .

Combining these equations leads to the inequality E

_Q,S

|b σ

_n

− ς

_Q

| ≤

X

j≥l

θ

_j²

+ 2

√ n E

_Q,S

|

Xn

j=l

θ

_j

ξ

_j,n

|

+ 1

n | B

_1,n

(x

^′

) | + 1

√ n E

_Q,S

| B

_2,n

(x

^′′

) | + l − 1

n ς

^∗

.

(13)

By Lemma A.3 and the conditions C

₁

), C

₂

), one gets E

_Q,S

|b σ

_n

− ς

_Q

| ≤

X

j≥l

θ

²_j

+ 2

√ n E

_Q,S

|

Xn

j=l

θ

_j

ξ

_j,n

|

+ L

_1,n

(Q)

n + L

_2,n

(Q)

√ n + ς

^∗

√ n .

In view of the inequality (1.3), the last term can be estimated as E

_Q,S

|

Xn j=l

θ

_j

ξ

_j,n

| ≤

vu utσ_Q

Xn j=l

θ

_j²

≤ √ σ

^∗

| S ˙ |

1

√ 2 l .

By applying the inequalities (3.22) we obtain the upper bound (3.26).

Hence Proposition 3.4.

Theorem 3.1 and Proposition 3.4 imply the following result.

Theorem 3.5. Assume that the family distribution Q

n

satisfies the condition H

₀

) and the unknown function S is continuously differentiable satisfying the condition (3.25). Then, for any n ≥ 1 and 0 < ρ < 1/3, the model selection procedure (3.15) with the estimator (3.24) satisfies the oracle inequality

R

^∗

( S

b_∗

, S) ≤ 1 + 3ρ − 2ρ

²

1 − 3ρ min

γ∈Γ

R

^∗

( S

b_γ

, S) + 1

n B

₁^∗

(n, ρ) , (3.28) where

B

₁^∗

(n, ρ) = Ψ

^∗

(n, ρ) + 6µκ

^∗_n

(S) (1 − 3ρ) √

n .

3.2 Specification of weights in the model selection procedure (3.15)

We will specify the weight coefficients (γ(j))

_j≥1

in the way proposed in [11] for a heteroscedastic discrete time regression model. Consider a numerical grid of the form

A

n

= { 1, . . . , k

^∗

} × { t

₁

, . . . , t

_m

} , (3.29)

where t

_i

= iε and m = [1/ε

²

]. We assume that both parameters

k

^∗

≥ 1 and 0 < ε ≤ 1 are functions of n, i.e. k

^∗

= k

^∗

(n) and ε = ε(n),

(14)

such that







lim

_n→∞

k

^∗

(n) = + ∞ , lim

_n→∞

k

^∗

(n) ln n = 0 , lim

_n→∞

ε(n) = 0 and lim

_n→∞

n

^δ

ε(n) = + ∞

(3.30)

for any δ > 0. One can take, for example, ε(n) = 1

ln(n + 1) and k

^∗

(n) =

p

ln(n + 1) .

For each α = (β, t) ∈ A

n

, we introduce the weight sequence γ

_α

= (γ

_α

(j))

_j≥1

given as

γ

_α

(j) = 1

_{1≤j≤j

0}

+

1 − (j/ω

_α

)

^β

1

_{j

0<j≤ωα}

(3.31) where j

₀

= j

₀

(α) = [ω

_α

/ ln n],

ω

_α

= (τ

_β

t n)

^1/(2β+1)

and τ

_β

= (β + 1)(2β + 1) π

^2β

β . We set

Γ = { γ

_α

, α ∈ A

n

} . (3.32) It will be noted that in this case ν = k

^∗

m.

Remark 3.2. It will be observed that the specific form of weights (3.31) was proposed by Pinsker [24] for the filtration problem with known smoothness of regression function observed with an additive gaussian white noise in the continuous time. Nussbaum [23] used these weights for the gaussian regression estimation problem in discrete time.

The minimal mean square risk, called the Pinsker constant, is provided by the weight least squares estimate with the weights where the index α depends on the smoothness order of the function S. In this case the smoothness order is unknown and, instead of one estimate, one has to use a whole family of estimates containing in particular the optimal one.

The problem is to study the properties of the whole class of esti-

mates. Below we derive an oracle inequality for this class which yields

the best mean square risk up to a multiplicative and additive constants

provided that the the smoothness of the unknown function S is not

available. Moreover, it will be shown that the multiplicative constant

tends to unity and the additive one vanishes as n → ∞ with the rate

higher than any minimax rate.

(15)

In view of the assumptions (3.30), for any δ > 0, one has

n→∞

lim ν n

^δ

= 0 . Moreover, by (3.31) for any α ∈ K

n

X∞ j=1

1

_{γ

α(j)>0}

≤ ω

_α

.

Therefore, taking into account that A

_β

≤ A

₁

< 1 for β ≥ 1, we get µ = µ

_n

≤ (n/ε)

^1/3

.

Therefore, for any δ > 0,

n→∞

lim µ

_n

n

^1/3+δ

= 0 .

To study the asymptotic behaviour of the term B

^∗₁

(n, ρ) we assume that the parameter ρ in the cost function (3.11) depends on n, i.e.

ρ = ρ

_n

such that ρ

_n

→ 0 as n → ∞ and for any δ > 0

n→∞

lim n

^δ

ρ

_n

= 0 . (3.33) Applying this limiting relation to the analysis of the asymptotic behavior of the additive term D

n

(ρ) in (3.28) one comes to the following result.

Theorem 3.6. Assume that the family distribution Q

n

satisfies the condition H

₀

) and the unknown function S is continuously differentiable satisfying the condition (3.25). Then, for any n ≥ 1, the model selection procedure (3.15), (3.33), (3.24), (3.32) satisfies the oracle inequality (3.28) with the additive term B

₁^∗

(n, ρ) obeying, for any δ > 0, the following limiting relation

n→∞

lim

B

^∗₁

(n, ρ

_n

)

n

^δ

= 0 .

(16)

4 Stochastic integrals with respect to the process (2.1)

In this Section we establish some properties of a stochastic integral I

_t

(f ) =

Z t 0

f

_s

dξ

_s

0 ≤ t ≤ n , (4.1) with respect to to the process (2.1). We will need some notations. Let us denote

ε

_f

(t) = a

Z t

0

e

^a(t−v)

f (v) (1 + e

^2av

) dv , (4.2) where f is [0, + ∞ ) →

R

function integrated on any finite interval. We introduce also the following transformation

τ

_f,g

(t) = 1 2

Z t 0

2f (s)g(s) + ε

^∗_f,g

(s)

ds (4.3)

of square integrable [0, + ∞ ) →

R

functions f and g. Here ε

^∗_f,g

(t) = f (t)ε

_g

(t) + ε

_f

(t)g(t) .

It will be noted that aτ

_f,1

(t) = 1

2 ε

_f

(t) and aτ

_1,1

(t) = 1

2 e

^2at

− 1

. (4.4)

Proposition 4.1. If f and g are from L

2

[0, n] then

E I

_t

(f )I

_t

(g) = ̺

^∗

τ

_f,g

(t) (4.5) where ̺

^∗

is given in (2.4).

Proof. Noting that the process I

_t

(f ) satisfies the stochastic equation dI

_t

(f ) = af (t)ξ

_t

dt + f (t)du

_t

, I

₀

(f ) = 0 ,

and applying the Ito formula one obtains (4.5). Hence Proposition 4.1.

Further, for integrated [0, + ∞ ) →

R

functions f and g, we define the [0, + ∞ ) × [0, + ∞ ) →

R

function

D

_f,g

(x, z) =

Z x

0

L

^∗_f,g

(y, z) dy + f (z)g(z) , (4.6)

(17)

where L

^∗_f,g

(y, z) = g(y + z)L

_f

(y, z) + f (y + z)L

_g

(y, z);

L

_f

(x, z) = ae

^ax

f (z) + a

Z x

0

e

^av

f (v + z) dv

.

Proposition 4.2. Let G

k

= σ { T

₁

, . . . , T

_k

} , where k ≥ 1, be σ-algebra generated by the stopping times (2.3), f and g be bounded left-continuous [0, ∞ ) × Ω →

R

functions measurable with respect to B [0, + ∞ )

N

G

k

(the product σ algebra created by B [0, + ∞ ) and G

k

). Then E

I

_T

k−

(f ) |G

k

= 0

and E

I

_T

k−

(f ) I

_T

k−

(g) |G

k

= ̺

²₁

τ

_f,g

(T

_k

) + ̺

²₂

k−1X

l=1

D

_f,g

(T

_k

− T

_l

, T

_l

) .

Proof. By the Ito formula one has I

_t

(f ) I

_t

(g) =

Z t 0

(̺

²₁

f (s)g(s) + a(f (s)I

_s

(g) + g(s)I

_s

(f ))ξ

_s

)ds + ̺

²₂ X

l≥1

f (T

_l

) g(T

_l

) Y

_l²

1

_{T

l≤t}

+

Z t

0

(f (s)I

_s−

(g) + g(s)I

_s−

(f )))du

_s

. (4.7) Taking the conditional expectation E ( ·|G

k

), on the set { T

_k

> t } , yields

E (I

_t

(f ) I

_t

(g) |G

k

) =

Z t

0

̺

²₁

f (s)g(s)ds + ̺

²₂ X

l≥1

f(T

_l

) g(T

_l

) 1

_{T

l≤t}

+ a

Z t

0

(f (s)E(I

_s

(g)ξ

_s

|G

k

) + g(s)E(I

_s

(f )ξ

_s

|G

k

)) ds . Now to calculate the function Z

_t

= E(I

_t

(f )ξ

_t

|G

k

) we put g = 1. Taking into account that

E(ξ

_t²

|G

k

) = ̺

²₁

2a (e

^2at

− 1) + ̺

²₂X

l≥1

e

^2a(t−T^l⁾

1

_{T

l≤t}

,

(18)

one obtains, for T

_j−1

≤ t < T

_j

,

Z ˙

_t

= aZ

_t

+ f(t)ψ

_t

, where

ψ

_t

= ̺

²₁

2 1 + e

^2at

+ ̺

²₂

a

X

l≥1

e

^2a(t−T^l⁾

1

_{T

l≤t}

. Therefore, for T

_j−1

≤ t < T

_j

,

Z

_t

= e

^a(t−T^j⁻¹⁾

Z

_T

j−1

+

Z t

T_j

−1

e

^a(t−s)

f (s)ψ

_s

ds .

¿From here and (4.6) with g = 1 one has Z

_T

j

= e

^a(T^j^−T^j⁻¹⁾

Z

_T

j−1

+ η

_j

, Z

_T₀

= Z

₀

= 0 , (4.8) where

η

_j

=

Z Tj

T_j−1

e

^a(T^j^−s)

f (s)ψ

_s

ds + ̺

²₂

f (T

_j

) . Solving the equation (4.7) one obtains

Z

_T

j

=

Z T_j

0

e

^a(T^j^−s)

f (s)ψ

_s

ds + ̺

²₂ Xj l=1

e

^a(T^j^−T^l⁾

f (T

_l

) . Therefore,

Z

_t

=

Z t

0

e

^a(t−s)

f (s)ψ

_s

ds + ̺

²₂X

l≥1

e

^a(t−T^l⁾

f (T

_l

)1

_{T

l≤t}

, i.e

aE(I

_t

(f )ξ

_t

|G

k

) = ̺

²₁

2 ε

_f

(t) + ̺

²₂ X

j≥1

L

_f

(t − T

_j

, T

_j

) 1

_{T

j≤t}

.

¿From here one comes to the desired equality. Hence Proposition 4.2.

Proposition 4.3. Let F, f and g be non random bounded left-continuous [0, ∞ ) →

R

functions. Then

E

X

k≥1

F (T

_k

) I

_T

k−

(f ) I

_T

k−

(g) 1

_{T

k≤t}

=

Z t

0

F (v) H

_f,g

(v) dv ,

(19)

where

H

_f,g

(t) = λ̺

²₁

τ

_f,g

(t) + (λ̺

₂

)

² Z t

0

D

_f,g

(t − z, z)dz . Proof. We have

ι(t) = E

X

k≥1

F(T

_k

) I

_T

k−

(f ) I

_T

k−

(g) 1

_{T

k≤t}

. By Proposition 4.2 one gets

ι(t) = ̺

²₁

E

X

k≥1

F (T

_k

) τ

_f,g

(T

_k

) 1

_{T

k≤t}

+ ̺

²₂

E

X

k≥1

F (T

_k

)

Xk−1

l=1

D

_f,g

(T

_k

− T

_l

, T

_l

)1

_{T

k≤t}

:= ι

₁

(t) + ι

₂

(t) , where

ι

₁

(t) = λ

Z t

0

X

l≥1

F (z)τ

_f,g

(z) (λz)

^l−1

(l − 1)! e

^−λz

dz =

Z t

0

F (z)τ

_f,g

(z)dz . To calculate ι

₂

(t) we note that

ι

₂

(t) = E

X

l≥1

1

_{T

l≤t}

X

k≥l+1

F (T

_k

)D

_f,g

(T

_k

− T

_l

, T

_l

)1

_{T

k≤t}

. Taking into account that T

_k

− T

_l

is independent of T

_l

for any k > l we obtain

ι

₂

(t) = λE

X

l≥1

1

_{T

l≤t}

Z t−Tl

0

X

k≥l+1

F (z + T

_l

)D

_f,g

(z, T

_l

) (λz)

^k−l−1

(k − l − 1)! e

^−λz

dz

= λE

X

l≥1

1

_{T

l≤t}

Z t−T_l 0

F (z + T

_l

)D

_f,g

(z, T

_l

) dz

= λ

² Z t

0

Z t−x 0

F (z + x) D

_f,g

(z, x)dz

dx .

(20)

Note that

aH

_f,1

(t) = λ

₁

2 ε

_f

(t) and aH

_1,1

(t) = λ

₁

2 e

^2at

− 1

, (4.9) where λ

₁

given in (3.18). Now we set

I

e_t

(f ) = I

_t²

(f ) − E I

_t²

(f ) . (4.10) Further we need the following correlation measures for two integrated [0, + ∞ ) →

R

functions f and g

̟

_f,g

= max

0≤v≤n

max

0≤t≤n−v

|

Z t

0

f (u + v)g(u)du | (4.11) and

̟

^∗_f,g

= max ̟

_f,g

, ̟

_g,f

. (4.12)

For any bounded [0, ∞ ) →

R

function f we introduce the following uniform norm

k f k

∗

= sup

0≤t≤n

| f (t) | .

To check the condition C

₂

) one needs the following non-asymptotic upper bound

Theorem 4.4. For any bounded left-continuous [0, ∞ ) →

R

functions f , g

| E I

e_n

(f ) I

e_n

(g) | ≤ nM

^∗

̟

_f,g^∗

+ k f k

∗

k g k

∗

k f k

∗

k g k

∗

, (4.13) where M

^∗

is defined in (3.19).

Proof. By the Ito formula one comes to the following stochastic equation

d I

e_t

(f ) = 2av

_t

(f )f(t)dt + dM

_t

(f ) , I

e₀

(f ) = 0 , with v

_t

(f ) = I

_t

(f )ξ

_t

− E I

_t

(f )ξ

_t

,

dM

_t

(f ) = 2I

_t−

(f )f (t)du

_t

+ ̺

²₂

f

²

(t)dm

_t

, M

₀

(f ) = 0 . In view of Propositions 4.1–4.3, one finds

E [e I (f ), I(g)]

e _t

= E [M(f ), M (g)]

_t

=

Z t

0

V

_f,g

(s) ds , (4.14)

(21)

where V

_f,g

(t) = 4f (t)g(t)G

_f,g

(t)+̺

₃

f

²

(t) g

²

(t) and G

_f,g

= ̺

²₁

̺

^∗

τ

_f,g

(t)+

̺

²₂

H

_f,g

(t). Note that Lemma A.1 implies

0≤t≤n

max | G

_f,g

(t) | ≤ (4̺

²₁

̺

^∗

+ ̺

²₂

D

^∗₁

)̟

^∗_f,g

. (4.15) One can easily check that

V

_1,1

(t) = 2λ

₂

e

^2at

− 1

a + ̺

₃

. (4.16)

The constants λ

₂

, ̺

₃

and D

^∗₁

are given in (3.18)-(3.19). Moreover, by the Ito formula we get

dv

_t

(f ) = av

_t

(f )dt + af (t)ζ

_t

dt + dK

_t

(f ) , v

₀

(f ) = 0 , where ζ

_t

= ξ

_t²

− E ξ

_t²

,

K

_t

(f ) =

Z t

0

I

_s−^∗

(f ) du

_s

+

Z t

0

̺

²₂

f (s)dm

_s

(4.17) with

I

_t^∗

(f ) = I

_t

(f ) + f(t)ξ

_t

and m

_t

=

X

0≤s≤t

∆z

_s²

− λt . Proposition 4.1 implies

E I

_t^∗

(f )I

_t^∗

(g) = ̺

^∗

τ

_f,g^∗

(t) , where

τ

_f,g^∗

(t) = τ

_f,g

(t) + f (t)τ

_1,g

(t) + g(t)τ

_f,1

(t) + f(t)g(t) τ

_1,1

(t) .

¿From (4.3)–(4.4) it follows that

τ

_f,g^∗

(t) = τ

_f,g

(t) + ε

^∗_f,g

(t) + f (t)g(t)(e

^2at

− 1)

2a . (4.18)

By applying Proposition 4.3 one finds E

X

k≥1

I

_T^∗

k−

(f ) I

_T^∗

k−

(g) 1

_{T

k≤t}

=

Z t

0

H

_f,g^∗

(v) dv , where

H

_f,g^∗

(t) = H

_f,g

(t) + f (t)H

_1,g

(t) + g(t)H

_f,1

(t) + f (t)g(t) H

_1,1

(t) .

(22)

¿From here and (4.9) we get

H

_f,g^∗

(t) = H

_f,g

(t) + λ

₁

ε

^∗_f,g

(t) + f (t)g(t)(e

^2at

− 1)

2a . (4.19)

Taking this into account, we calculate that, for any square integrated functions f and g,

E [K (f ) , K(g)]

_t

=

Z t

0

G

^∗_f,g

(s) + ̺

₃

f (s) g(s)

ds , (4.20) where

G

^∗_f,g

(t) = ̺

²₁

̺

^∗

τ

_f,g^∗

(t) + ̺

²₂

H

_f,g^∗

(t) . Further by applying Propositions 4.1–4.3 we obtain

E [K(f ), M (g)]

_t

=

Z t

0

U

_f,g

(s)ds , (4.21) where

U

_f,g

(s) = 2g(s)G

_f,g

(s) + 2g(s)f(s)G

_1,g

(s) + ̺

₃

f (s)g

²

(s) . By the Ito formula one finds for t ≥ 0

E I

e_t

(f) I

e_t

(g) = E [e I (f ) , I

e

(g)]

_t

+ 2

Z t

0

f (s) T

f,g

(s) + g(s) T

g,f

(s) ds , (4.22) where T

f,g

(t) = aE v

_t

(f ) I

e_t

(g). Since for g = 1 the processes I

e_t

(g) and v

_t

(g) coincide with ζ

_t

, i.e. I

e_t

(1) = v

_t

(1) = ζ

_t

, (4.17) implies

Eζ

_t²

=

Z t

0

e

^4a(t−s)

V

_1,1

(s)ds = e

^4at

2λ

₂

+ a̺

₃

4a

²

+ e

^2at

λ

₂

a

²

+ 2λ

₂

− a̺

₃

4a

²

.

(4.23) We define the function

A

_f

(t) =

Z t

0

e

^3a(t−s)

f (s)a

²

E ζ

_s²

+ κ

_f

(s)

ds , (4.24) where

κ

_f

(t) = λ

₂

ε

_f

(t) + f (t) e

^2at

− 1

+ a̺

₃

f(t) . Denote

V

_f,g

(s) = A

^∗_f,g

(s) + G

^∗_f,g

(s) + ̺

₃

f (s)g(s) , (4.25)

Efficient robust nonparametric estimation in a semimartingale regression model

HAL Id: hal-00526915

https://hal.archives-ouvertes.fr/hal-00526915

Preprint submitted on 16 Oct 2010

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from

L’archive ouverte pluridisciplinaire

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de

Eﬀicient robust nonparametric estimation in a semimartingale regression model

Victor Konev, Serguei Pergamenchtchikov

To cite this version:

Victor Konev, Serguei Pergamenchtchikov. Eﬀicient robust nonparametric estimation in a semimartin-

gale regression model. 2010. �hal-00526915�