Efficient robust nonparametric estimation in a semimartingale regression model


www.imstat.org/aihp 2012, Vol. 48, No. 4, 1217–1244

DOI:10.1214/12-AIHP488

Efficient robust nonparametric estimation in a semimartingale regression model ¹

Victor Konev^a and Serguei Pergamenshchikov^b

^a Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin str. 36, 634050 Tomsk, Russia. E-mail: [email protected]

^b Laboratoire de Mathématiques Raphael Salem, Avenue de l’Université, BP. 12, Université de Rouen, F76801, Saint Etienne du Rouvray, Cedex France and Department of Mathematics and Mechanics, Tomsk State University, Lenin str. 36, 634041 Tomsk, Russia. E-mail: [email protected]

Received 17 November 2010; revised 2 February 2012; accepted 12 March 2012

Abstract. The paper considers the problem of robust estimation of a periodic function in a continuous time regression model with dependent disturbances given by a general square integrable semimartingale with an unknown distribution. An example of such a noise is a non-Gaussian Ornstein–Uhlenbeck process with jumps (see (J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 167–241), (Ann. Appl. Probab. 18 (2008) 879–908)). An adaptive model selection procedure, based on the weighted least squares estimates, is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks are derived and the robust efficiency of the model selection procedure is shown. It is established that, in the case of the non-Gaussian Ornstein–Uhlenbeck noise, the sharp lower bound for the robust quadratic risk is determined by the limit value of the noise intensity at high frequencies. An example with a martingale noise shows that the risk convergence rate becomes worse if the noise intensity is unbounded.

Résumé. In this paper we consider the problem of robust estimation of a periodic function in a continuous time regression model with a dependent noise described by a square integrable semimartingale with unknown distribution. An example of such a noise is a non-Gaussian Ornstein–Uhlenbeck process with jumps (see (J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 167–241), (Ann. Appl. Probab. 18 (2008) 879–908)). We propose an adaptive model selection procedure based on the weighted least squares estimators. Under general conditions on the first two moments of the noise distribution, sharp non-asymptotic oracle inequalities for robust quadratic risks are obtained and the robust efficiency is established. We also establish that, in the case of the non-Gaussian Ornstein–Uhlenbeck process, the lower bound for the robust quadratic risk is given by the limit of the noise intensity as the frequency tends to infinity. We give an example of a regression model with a martingale noise in which the convergence rate of the quadratic risk becomes slower if the noise intensity tends to infinity.

MSC: Primary 62G08; secondary 62G05

Keywords: Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequality; Asymptotic efficiency

1. Introduction

Consider a regression model in continuous time

$$dy_t=S(t)\,dt+d\xi_t,\quad 0\le t\le n, \tag{1.1}$$

¹ Supported by the RFBR-Grant 09-01-00172-a.


where $S$ is an unknown 1-periodic $\mathbb{R}\to\mathbb{R}$ function, $S\in L_2[0,1]$; $(\xi_t)_{t\ge0}$ is an unobservable semimartingale noise with values in the Skorokhod space $D[0,n]$ such that, for any function $f$ from $L_2[0,n]$, the stochastic integral

$$I_n(f)=\int_0^n f(s)\,d\xi_s \tag{1.2}$$

is well defined and has the following properties

$$E_Q I_n(f)=0\quad\text{and}\quad E_Q I_n^2(f)\le\sigma_Q\int_0^n f^2(s)\,ds. \tag{1.3}$$

Here $E_Q$ denotes the expectation with respect to the distribution $Q$ in $D[0,n]$ of the process $(\xi_t)_{0\le t\le n}$, which is assumed to belong to some probability family $\mathcal{Q}_n$ specified below; $\sigma_Q>0$ is some positive constant depending on the distribution $Q$.

The problem is to estimate the unknown function $S$ in the model (1.1) on the basis of observations $(y_t)_{0\le t\le n}$. The class of disturbances $\xi$ satisfying conditions (1.3) is rather wide and comprises, in particular, the Lévy processes which are used in different applied problems (see [4,16] for details). The models (1.1) with Lévy-type noise naturally arise (see [18]) in nonparametric functional statistics problems (see, for example, [8]). Moreover, as is shown in Section 2, non-Gaussian Ornstein–Uhlenbeck-based models, introduced in [2], enter this class.

We define the error of an estimate $\hat S$ (any real-valued function measurable with respect to $\sigma\{y_t,\,0\le t\le n\}$) for $S$ by its integral quadratic risk

$$R_Q(\hat S,S):=E_{Q,S}\|\hat S-S\|^2, \tag{1.4}$$

where $E_{Q,S}$ stands for the expectation with respect to the distribution $P_{Q,S}$ of the process (1.1) with a fixed distribution $Q$ of the noise $(\xi_t)_{0\le t\le n}$ and a given function $S$; $\|\cdot\|$ is the norm in $L_2[0,1]$, i.e.

$$\|f\|^2:=\int_0^1 f^2(t)\,dt. \tag{1.5}$$

Since in our case the noise distribution $Q$ is unknown, it seems natural, similarly to [10], to measure the quality of an estimate $\hat S$ by the robust risk defined as

$$R^*_n(\hat S,S)=\sup_{Q\in\mathcal{Q}_n}R_Q(\hat S,S), \tag{1.6}$$

which takes the supremum of the error (1.4) over the whole family of admissible distributions $\mathcal{Q}_n$.

We will treat the stated problem from the standpoint of the model selection approach. The origin of this method goes back to the early seventies with the pioneering papers by Akaike [1] and Mallows [22], who proposed to introduce penalization in a log-likelihood type criterion. Further progress was made by Barron, Birgé and Massart [3,23], who developed a non-asymptotic model selection method which enables one to derive non-asymptotic oracle inequalities for nonparametric regression models with i.i.d. Gaussian disturbances. An oracle inequality yields an upper bound for the estimate risk via the minimal risk corresponding to a chosen family of estimates. Galtchouk and Pergamenshchikov [9] applied the Barron–Birgé–Massart technique to the problem of estimating nonparametric drift functions in ergodic diffusion processes. Fourdrinier and Pergamenshchikov [7] extended the Barron–Birgé–Massart method to models with spherically symmetric dependent observations. They proposed a model selection procedure based on the improved least squares estimates. Lately, the authors [17] applied this method to the nonparametric problem of estimating a periodic function in a continuous time model with a Gaussian colored noise. In all cited papers, non-asymptotic oracle inequalities have been derived, which enable one to establish the optimal convergence rate for the minimax risks. In addition to the optimal convergence rate, the other important problem is that of the efficiency of adaptive estimation procedures. In order to examine the efficiency property of a procedure one has to obtain sharp oracle inequalities, i.e. those in which the factor at the principal term in the right-hand side of the inequality is close to unity.

The first result on sharp inequalities is most likely due to Kneip [15], who studied a Gaussian regression model in discrete time. It will be observed that the derivation of oracle inequalities usually rests upon the fact that the initial model, by applying the Fourier transformation, can be reduced to a Gaussian model with independent observations. However, such a transform is possible only for Gaussian models with independent homogeneous observations or for the inhomogeneous ones with known correlation characteristics. This restriction significantly narrows the area of application of the proposed model selection procedures and rules out a broad class of models including, in particular, heteroscedastic regression models widely used in econometrics (see, for example, [5,14]). For constructing adaptive procedures in the case of inhomogeneous observations one needs to modify the approach to the estimation problem.

Galtchouk and Pergamenshchikov [11–13] have developed a new estimation method intended for the heteroscedastic regression models in discrete time. The heart of this method is to combine the Barron–Birgé–Massart non-asymptotic penalization method [3] and the Pinsker weighted least square method which minimizes the asymptotic risk (see, for example, [24,25]). This yields a significant improvement in the performance of the procedure (see numerical example in [11]).

The goal of this paper is to develop a robust efficient model selection method for the model (1.1) with dependent disturbances having unknown distribution. We follow the approach proposed by Galtchouk and Pergamenshchikov [11] in the construction of the procedure. Unfortunately, their method of obtaining the oracle inequalities is essentially based on the independence of observations and cannot be applied here. This paper proposes new analytical tools which allow one to obtain sharp non-asymptotic oracle inequalities for robust risks under general conditions on the distribution of the noise in the model (1.1). This method enables us to treat both the cases of dependent and independent observations from the same standpoint; it does not assume knowledge of the noise distribution and leads to an efficient estimation procedure with respect to the risk (1.6). The validity of the conditions imposed on the noise in Eq. (1.1) is verified for a non-Gaussian Ornstein–Uhlenbeck process and for a martingale with increasing variance (see Section 2).

The rest of the paper is organized as follows. In Section 3 we construct the model selection procedure on the basis of weighted least squares estimates and state the main results in the form of oracle inequalities for the quadratic risk (1.4) and the robust risk (1.6). Here we also specify the set of admissible weight sequences in the model selection procedure. In Section 4 we establish some properties of the stochastic integrals with respect to the non-Gaussian Ornstein–Uhlenbeck process (2.1). Section 7 gives the proofs of the main results. In Sections 5, 6 it is shown that the proposed model selection procedure for estimating $S$ in (1.1) is asymptotically efficient with respect to the robust risk (1.6). Section 7 gives the proofs of the oracle inequalities for the regression model (1.1) with the noises introduced in Section 2. In the Appendix some auxiliary propositions are given.

2. Semimartingale noises

In this section two examples of the disturbances $(\xi_t)_{t\ge0}$ in (1.1) are given.

2.1. Non-Gaussian Ornstein–Uhlenbeck process

First we consider the disturbances $(\xi_t)_{t\ge0}$ in (1.1) given by a non-Gaussian Ornstein–Uhlenbeck process with the Lévy subordinator. Such processes are used in the financial Black–Scholes type markets with jumps (see for example [6] and the references therein). Let the noise process in (1.1) obey the equation

$$d\xi_t=a\xi_t\,dt+du_t,\quad \xi_0=0, \tag{2.1}$$

where $a\le0$, $u_t=\varrho_1w_t+\varrho_2z_t$, $\varrho_1$ and $\varrho_2$ are unknown constants, $(w_t)_{t\ge0}$ is a standard Brownian motion, $(z_t)_{t\ge0}$ is a compound Poisson process defined as

$$z_t=\sum_{j=1}^{N_t}Y_j. \tag{2.2}$$

Here $(N_t)_{t\ge0}$ is a standard homogeneous Poisson process with unknown intensity $\lambda>0$ and $(Y_j)_{j\ge1}$ is an i.i.d. sequence of random variables with

$$EY_j=0,\quad EY_j^2=1\quad\text{and}\quad EY_j^4<\infty. \tag{2.3}$$
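The noise (2.1)–(2.3) and the observations (1.1) can be sketched numerically. The following is a minimal illustration only, assuming an Euler discretization (not used in the paper), standard normal jumps $Y_j$ (one admissible choice under (2.3)) and hypothetical function names `simulate_ou_jump_noise` and `observations`.

```python
import math
import random


def simulate_ou_jump_noise(n, steps_per_unit, a, rho1, rho2, lam, rng):
    """Euler approximation of d(xi) = a*xi dt + rho1 dW + rho2 dZ on [0, n], xi_0 = 0."""
    dt = 1.0 / steps_per_unit
    path = [0.0]
    for _ in range(n * steps_per_unit):
        # Brownian increment over one step.
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Compound Poisson increment: one jump Y_j per Poisson arrival in the step.
        dz = 0.0
        arrival = rng.expovariate(lam)
        while arrival < dt:
            dz += rng.gauss(0.0, 1.0)  # assumed jump law: E Y = 0, E Y^2 = 1
            arrival += rng.expovariate(lam)
        xi = path[-1]
        path.append(xi + a * xi * dt + rho1 * dw + rho2 * dz)
    return path


def observations(signal, noise_path, n):
    """Discretized version of (1.1): dy = S(t) dt + d(xi), with y_0 = 0."""
    steps = len(noise_path) - 1
    dt = n / steps
    y = [0.0]
    for k in range(steps):
        y.append(y[-1] + signal(k * dt) * dt + noise_path[k + 1] - noise_path[k])
    return y
```

For instance, `observations(lambda t: math.sin(2 * math.pi * t), simulate_ou_jump_noise(10, 200, -1.0, 1.0, 0.5, 2.0, random.Random(0)), 10)` would produce one discretized trajectory of $y$ on $[0,10]$ for a 1-periodic signal.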

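Let $(T_k)_{k\ge1}$ denote the arrival times of the process $(N_t)_{t\ge0}$, that is,

$$T_k=\inf\{t\ge0:\ N_t=k\}. \tag{2.4}$$

We assume that the parameters $\lambda$, $a$, $\varrho_1$ and $\varrho_2$ satisfy the conditions

$$-a_{\max}\le a\le0,\quad \lambda\ge\lambda_*,\quad \varrho_{\min}\le\varrho_1^2+\lambda\varrho_2^2\le\varrho_{\max}. \tag{2.5}$$

Let $\mathcal{Q}_n$ denote the family of all distributions of the process (2.1) on the space $D[0,n]$ with the parameters $a$, $\lambda$, $\varrho_1$ and $\varrho_2$ satisfying the conditions (2.5) with fixed bounds $\lambda_*>0$, $a_{\max}>0$, $\varrho_{\min}>0$ and $\varrho_{\max}>0$.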

It will be observed that the process (1.1)–(2.1) may be used for modelling stock prices in financial markets of the Black–Scholes type with jumps (see, e.g., [20], p. 141). In this case the price process $(X_t)_{0\le t\le n}$ is governed by the stochastic differential equation

$$\frac{dX_t}{X_t}=S(t)\,dt+d\xi_t, \tag{2.6}$$

where $(\xi_t)_{t\ge0}$ is an internal random factor specified by Eq. (2.1) and $S(t)$ is a periodic stock-appreciation rate which has to be estimated from the observations

$$y_t=y_0+\int_0^t X_u^{-1}\,dX_u.$$

The solution to Eq. (2.6) is given by the Doléans exponent, i.e.

$$X_t=X_0\exp\Bigl\{y_t-y_0+\int_0^t\Bigl(S(u)-\frac{\varrho_1^2}{2}\Bigr)du+\Lambda_t\Bigr\},$$

where $\Lambda_t=\sum_{T_j\le t}\bigl(\ln(1+\varrho_2Y_j)-\varrho_2Y_j\bigr)$.

To use the model (2.6) for describing stock price dynamics one needs to require that $1+\varrho_2Y_j>0$ a.s. for all $j\ge1$.

2.2. Martingale noise

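Next we consider a martingale noise obeying the equation

$$d\xi_t=\varrho_1(t)\,dw_t+\varrho_2(t)\,dz_t, \tag{2.7}$$

where $\varrho_1$ and $\varrho_2$ are continuously differentiable $\mathbb{R}_+\to\mathbb{R}$ nonrandom functions; the process $(z_t)_{t\ge0}$ is defined in (2.2)–(2.3). Assume that there exist constants $\lambda_*>0$, $\varrho_{\min}>0$ and a continuous $\mathbb{R}_+\to\mathbb{R}_+$ function $\varrho_{\max}(\cdot)$ such that for all $t\ge0$

$$\lambda\ge\lambda_*,\quad \varrho_{\min}\le\varrho_1^2(t)+\lambda\varrho_2^2(t)\le\varrho_{\max}(t) \tag{2.8}$$

and, for any $\delta>0$,

$$\lim_{t\to\infty}\frac{\varrho_{\max}(t)}{t^\delta}=0. \tag{2.9}$$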

Moreover, we assume that the derivatives of functions_i for some positive constants_∗and_∗satisfy the following conditions

sup

t≥0

(t+1) max

1≤i≤2

d dt_i²(t)

≤_∗, sup

t≥0

d² dt²²_i(t)

≤_∗. (2.10)

In this case we denote byQnthe family of all distributions of the process (2.7) onD[0, n]satisfying the conditions (2.8) and (2.10) for someλ_∗,min,max(·),_∗and_∗.


3. Model selection

This section gives the construction of a model selection procedure for estimating a function $S$ in (1.1) on the basis of weighted least squares estimates and states the main results.

For estimating the unknown function $S$ in the model (1.1), we apply its Fourier expansion in the trigonometric basis $(\phi_j)_{j\ge1}$ in $L_2[0,1]$ defined as

$$\phi_1=1,\quad \phi_j(x)=\sqrt2\,\mathrm{Tr}_j\bigl(2\pi[j/2]x\bigr),\quad j\ge2, \tag{3.1}$$

where the function $\mathrm{Tr}_j(x)=\cos(x)$ for even $j$ and $\mathrm{Tr}_j(x)=\sin(x)$ for odd $j$; $[x]$ denotes the integer part of $x$. The corresponding Fourier coefficients

$$\theta_j=(S,\phi_j)=\int_0^1 S(t)\phi_j(t)\,dt \tag{3.2}$$

can be estimated as

$$\hat\theta_{j,n}=\frac1n\int_0^n\phi_j(t)\,dy_t. \tag{3.3}$$

In view of (1.1), one obtains

$$\hat\theta_{j,n}=\theta_j+\frac{1}{\sqrt n}\xi_{j,n},\quad \xi_{j,n}=\frac{1}{\sqrt n}I_n(\phi_j), \tag{3.4}$$

where $I_n(\phi_j)$ is given in (1.2).
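The estimates (3.3) can be approximated from observations recorded on a grid. The sketch below is only an illustration under stated assumptions: the stochastic integral is replaced by a left-point Riemann–Stieltjes sum over the increments of $y$, and the names `phi` and `fourier_estimates` are hypothetical.

```python
import math


def phi(j, x):
    """Trigonometric basis (3.1): phi_1 = 1, cosine for even j, sine for odd j."""
    if j == 1:
        return 1.0
    arg = 2.0 * math.pi * (j // 2) * x
    return math.sqrt(2.0) * (math.cos(arg) if j % 2 == 0 else math.sin(arg))


def fourier_estimates(y, n, num_coeffs):
    """Approximate theta_hat_{j,n} = (1/n) * int_0^n phi_j(t) dy_t for j = 1..num_coeffs.

    y holds the observations on the uniform grid k * n / (len(y) - 1), k = 0..len(y) - 1.
    """
    steps = len(y) - 1
    dt = n / steps
    estimates = []
    for j in range(1, num_coeffs + 1):
        # Left-point sum: phi_j evaluated at the start of each increment of y.
        total = sum(phi(j, k * dt) * (y[k + 1] - y[k]) for k in range(steps))
        estimates.append(total / n)
    return estimates
```

In the noise-free case the output approaches the true coefficients (3.2) as the grid is refined; with noise, each entry carries the additional term $\xi_{j,n}/\sqrt n$ from (3.4).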

For any sequence $x=(x_j)_{j\ge1}$, we set

$$|x|^2=\sum_{j=1}^\infty x_j^2\quad\text{and}\quad \#(x)=\sum_{j=1}^\infty \mathbf{1}_{\{|x_j|>0\}}. \tag{3.5}$$

Now we impose some additional conditions on the family $\mathcal{Q}_n$ of distributions of the noise $(\xi_t)_{t\ge0}$ in (1.1).

(C1) There exists a variance proxy $\varsigma_Q>0$ such that for any $n\ge1$

$$L_{1,n}(Q)=\sup_{x\in H,\,\#(x)\le n}\Bigl|\sum_{j=1}^\infty x_j\bigl(E_Q\xi_{j,n}^2-\varsigma_Q\bigr)\Bigr|<\infty,$$

where $H=[-1,1]^\infty$.

(C2) Assume that for each $n\ge1$

$$L_{2,n}(Q)=\sup_{|x|\le1,\,\#(x)\le n}E_Q\Bigl(\sum_{j=1}^\infty x_j\tilde\xi_{j,n}\Bigr)^2<\infty,$$

where $\tilde\xi_{j,n}=\xi_{j,n}^2-E_Q\xi_{j,n}^2$.

As is shown in the proof of Theorem 3.5 in Section 7, both conditions (C1) and (C2) hold for the process (2.1).

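Further we introduce a class of weighted least squares estimates for $S(t)$ as

$$\hat S_\gamma=\sum_{j=1}^\infty\gamma(j)\hat\theta_{j,n}\phi_j, \tag{3.6}$$

where $\gamma=(\gamma(j))_{j\ge1}$ is a sequence of weight coefficients such that

$$0\le\gamma(j)\le1\quad\text{and}\quad 0<\#(\gamma)\le n. \tag{3.7}$$

Let $\Gamma$ denote a finite set of such weight sequences $\gamma=(\gamma(j))_{j\ge1}$, $\nu=\mathrm{card}(\Gamma)$ be its cardinal number and

$$\mu=\max_{\gamma\in\Gamma}\#(\gamma). \tag{3.8}$$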

The model selection procedure for the unknown function $S$ in (1.1) will be constructed on the basis of a family of estimates $(\hat S_\gamma)_{\gamma\in\Gamma}$. The choice of a specific set of weight sequences $\Gamma$ is discussed at the end of this section. To find a proper weight sequence $\gamma$ in the set $\Gamma$, one needs to specify a cost function. When choosing an appropriate cost function one can use the following argument. The empirical squared error

$$\mathrm{Err}_n(\gamma)=\|\hat S_\gamma-S\|^2$$

can be written as

$$\mathrm{Err}_n(\gamma)=\sum_{j=1}^\infty\gamma^2(j)\hat\theta_{j,n}^2-2\sum_{j=1}^\infty\gamma(j)\hat\theta_{j,n}\theta_j+\sum_{j=1}^\infty\theta_j^2. \tag{3.9}$$

Since the Fourier coefficients $(\theta_j)_{j\ge1}$ are unknown, the weight coefficients $(\gamma(j))_{j\ge1}$ cannot be found by minimizing this quantity. To circumvent this difficulty one needs to replace the terms $\hat\theta_{j,n}\theta_j$ by their estimators $\tilde\theta_{j,n}$. We set

$$\tilde\theta_{j,n}=\hat\theta_{j,n}^2-\frac{\hat\sigma_n}{n}, \tag{3.10}$$

where $\hat\sigma_n$ is some estimator for the variance proxy $\varsigma_Q$ in the condition (C1).

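For this change in the empirical squared error, one has to pay some penalty. Thus, one comes to the cost function of the form

$$J_n(\gamma)=\sum_{j=1}^\infty\gamma^2(j)\hat\theta_{j,n}^2-2\sum_{j=1}^\infty\gamma(j)\tilde\theta_{j,n}+\rho\hat P_n(\gamma), \tag{3.11}$$

where $\rho$ is some positive constant and $\hat P_n(\gamma)$ is the penalty term defined as

$$\hat P_n(\gamma)=\frac{\hat\sigma_n|\gamma|^2}{n}. \tag{3.12}$$

In the case when the value of $\varsigma_Q$ in (C1) is known, one can take $\hat\sigma_n=\varsigma_Q$ and

$$P_n(\gamma)=\frac{\varsigma_Q|\gamma|^2}{n}. \tag{3.13}$$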

Substituting the weight coefficients that minimize the cost function,

$$\hat\gamma=\operatorname*{argmin}_{\gamma\in\Gamma}J_n(\gamma), \tag{3.14}$$

in (3.6) leads to the model selection procedure

$$\hat S_*=\hat S_{\hat\gamma}. \tag{3.15}$$

It will be noted that $\hat\gamma$ exists because $\Gamma$ is a finite set. If the minimizing sequence $\hat\gamma$ in (3.14) is not unique, one can take any minimizer.
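The selection rule (3.10)–(3.14) reduces to evaluating a finite number of sums. The following is a minimal sketch, assuming that every weight sequence is truncated to the same finite length as the list of estimated coefficients; the names `cost` and `select_weights` are hypothetical.

```python
def cost(gamma, theta_hat, sigma_hat, n, rho):
    """Cost function J_n(gamma) of (3.11) for finite sequences of equal length."""
    # First sum of (3.11): gamma(j)^2 * theta_hat_j^2.
    quadratic = sum(g * g * th * th for g, th in zip(gamma, theta_hat))
    # Second sum uses theta_tilde_j = theta_hat_j^2 - sigma_hat / n, see (3.10).
    cross = sum(g * (th * th - sigma_hat / n) for g, th in zip(gamma, theta_hat))
    # Penalty (3.12): sigma_hat * |gamma|^2 / n.
    penalty = sigma_hat * sum(g * g for g in gamma) / n
    return quadratic - 2.0 * cross + rho * penalty


def select_weights(family, theta_hat, sigma_hat, n, rho):
    """Return a minimizer of J_n over the finite family, as in (3.14)."""
    return min(family, key=lambda gamma: cost(gamma, theta_hat, sigma_hat, n, rho))
```

When several sequences attain the minimum, `min` returns the first one in the family, which is consistent with the remark that any minimizer may be taken.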

First we consider the case when the proxy variance $\varsigma_Q$ in (C1) is known.

Proposition 3.1. If the conditions (C1) and (C2) hold for the distribution $Q$ of the process $\xi$ in (1.1), then the risk (1.4) of the estimate (3.15) for $S$ satisfies the oracle inequality

$$R_Q(\hat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}R_Q(\hat S_\gamma,S)+\frac1nB_Q(n,\rho), \tag{3.16}$$

where

$$B_Q(n,\rho)=\frac{2\varsigma_Q\sigma_Q\nu+2\varsigma_QL_{1,n}(Q)+\nu L_{2,n}(Q)}{\varsigma_Q\rho(1-3\rho)}+\frac{6\mu R_Q(\hat\sigma_n,\varsigma_Q)}{1-3\rho}.$$

This result can be proved along the lines of Theorem 1 in [18].

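Now we specify the class $\mathcal{Q}_n$ of admissible distributions $Q$ in the robust risk (1.6).

Let $\mathcal{Q}_n$ be a set of noise distributions $Q$ on the space $D[0,n]$ satisfying (1.3), (C1), (C2) and the following conditions.

(H0) The factor $\sigma_Q$ in (1.3) and the proxy variance $\varsigma_Q$ in (C1) are such that for each $n\ge1$

$$\varsigma_n^*:=\sup_{Q\in\mathcal{Q}_n}\varsigma_Q<\infty,\quad \sigma_n^*:=\sup_{Q\in\mathcal{Q}_n}\sigma_Q<\infty\quad\text{and}\quad \varsigma_*:=\inf_{Q\in\mathcal{Q}_n}\varsigma_Q>0 \tag{3.17}$$

and, moreover, for any $\delta>0$

$$\lim_{n\to\infty}\frac{\varsigma_n^*+\sigma_n^*}{n^\delta}=0. \tag{3.18}$$

(H1) The functionals $L_{1,n}(Q)$ and $L_{2,n}(Q)$ in (C1), (C2) are uniformly bounded on the set $\mathcal{Q}_n$, i.e. for each $Q\in\mathcal{Q}_n$

$$L_{1,n}(Q)\le L^*_{1,n},\quad L_{2,n}(Q)\le L^*_{2,n}$$

and the numerical sequences $(L^*_{i,n})_{n\ge1}$, $i=1,2$, are such that for any $\delta>0$

$$\lim_{n\to\infty}\frac{L^*_{1,n}+L^*_{2,n}}{n^\delta}=0.$$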

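Theorem 3.2. Suppose that the family of admissible noise distributions $\mathcal{Q}_n$ for the model (1.1) is defined by the conditions (C1), (C2) and (H0), (H1). Then the robust risk (1.6) of the estimate (3.15) for $S(t)$ satisfies, for any $n\ge1$ and $0<\rho<1/3$, the oracle inequality

$$R^*_n(\hat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}R^*_n(\hat S_\gamma,S)+\frac1nB^*(n,\rho), \tag{3.19}$$

where

$$B^*(n,\rho)=\frac{2\varsigma_n^*\sigma_n^*\nu+2\varsigma_n^*L^*_{1,n}+\nu L^*_{2,n}}{\varsigma_*\rho(1-3\rho)}+\frac{6\mu R^*_n(\hat\sigma_n)}{1-3\rho}$$

and $R^*_n(\hat\sigma_n)=\sup_{Q\in\mathcal{Q}_n}R_Q(\hat\sigma_n,\varsigma_Q)$.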

3.1. The case of unknown $\varsigma_Q$

If the variance proxy $\varsigma_Q$ in the condition (C1) is unknown, it can be estimated as

$$\hat\sigma_n=\sum_{j=[\sqrt n]+1}^{n}\hat\theta_{j,n}^2,\quad n\ge2. \tag{3.20}$$
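As a small illustration of (3.20), the estimator sums the squared coefficient estimates over the indices from $[\sqrt n]+1$ to $n$. The sketch below assumes the estimates are stored in a list with `theta_hat[j - 1]` holding $\hat\theta_{j,n}$; the function name `variance_proxy_estimate` is hypothetical.

```python
import math


def variance_proxy_estimate(theta_hat, n):
    """Estimator (3.20): sum of theta_hat_{j,n}^2 over j = [sqrt(n)] + 1, ..., n."""
    if n < 2 or len(theta_hat) < n:
        raise ValueError("need n >= 2 and at least n estimated coefficients")
    first = math.isqrt(n) + 1  # [sqrt(n)] + 1, with [.] the integer part
    # theta_hat[j - 1] stores the estimate with index j.
    return sum(theta_hat[j - 1] ** 2 for j in range(first, n + 1))
```

The low-frequency coefficients are skipped because they are dominated by the signal, while the high-frequency ones mostly reflect the noise term in (3.4).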

Proposition 3.3. Suppose that the conditions (C1) and (C2) hold for the model (1.1) and $S(\cdot)$ is a continuously differentiable function such that

$$|\dot S|_1=\int_0^1|\dot S(t)|\,dt<+\infty. \tag{3.21}$$

Then, for any $n\ge2$,

$$R_Q(\hat\sigma_n,\varsigma_Q)\le\frac{\kappa_n(Q,S)}{\sqrt n}, \tag{3.22}$$

where

$$\kappa_n(Q,S)=4|\dot S|_1^2\Bigl(1+\frac{\sqrt{\sigma_Q}}{n^{1/4}}\Bigr)+\varsigma_Q+\sqrt{L_{2,n}(Q)}+\frac{L_{1,n}(Q)}{n^{1/2}}.$$

This assertion is a direct consequence of Proposition 4 in [18].

Propositions 3.1 and 3.3 allow one to obtain the following non-asymptotic oracle inequality.

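Theorem 3.4. Let the distribution family $\mathcal{Q}_n$ be as in Theorem 3.2 with unknown $\varsigma_Q$ and let $S$ in (1.1) satisfy (3.21). Then, for any $n\ge1$ and $0<\rho<1/3$, the model selection procedure (3.15), (3.20) satisfies the oracle inequality

$$R^*_n(\hat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}R^*_n(\hat S_\gamma,S)+\frac1nB_1^*(n,\rho), \tag{3.23}$$

where

$$B_1^*(n,\rho)=\frac{2\varsigma_n^*\sigma_n^*\nu+2\varsigma_n^*L^*_{1,n}+\nu L^*_{2,n}}{\varsigma_*\rho(1-3\rho)}+\frac{6\mu\kappa_n^*(S)}{(1-3\rho)\sqrt n}$$

and

$$\kappa_n^*(S)=4|\dot S|_1^2\Bigl(1+\frac{\sqrt{\sigma_n^*}}{n^{1/4}}\Bigr)+\varsigma_n^*+\sqrt{L^*_{2,n}}+\frac{L^*_{1,n}}{n^{1/2}}.$$

Moreover, for any $\delta>0$,

$$\lim_{n\to\infty}\frac{B_1^*(n,\rho)}{n^\delta}=0.$$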

Now we will obtain the oracle inequalities for the model (1.1) with the noises introduced in Section 2. We will need the following parameter

$$M^*=116\varrho_{\max}^2+\frac{33\varrho_{\max}^2EY_1^4}{\lambda_*}. \tag{3.24}$$

Theorem 3.5. Let $\mathcal{Q}_n$ be the distribution family for the Ornstein–Uhlenbeck process (2.1) with the parameters meeting (2.5). Then, for any $n\ge1$ and $0<\rho<1/3$, the estimator (3.15) satisfies the oracle inequality (3.23) with the parameters $\sigma_n^*=3\varrho_{\max}$, $\varsigma_n^*=\varrho_{\max}$, $\varsigma_*=\varrho_{\min}$, $L^*_{1,n}=2(4a_{\max}^2+15a_{\max}+2)\varrho_{\max}$ and $L^*_{2,n}=82M^*$.

Proof of this theorem is given in Section 7.

Remark 3.1. It will be noted that the oracle inequality (3.23) for the model (1.1)–(2.1) holds uniformly in the stability region of the process (2.1) including its boundary, i.e. the case when $a=0$.

When considering the estimation problem for the model (1.1) with the martingale noise (2.7) we will use two sequences

l_1,n=2(1+λ_∗)_∗ln(n+1) and

l2,n=(1+λ_∗)

max(0)+2_∗ln(n+1)

(3.25)


with the constants defined in (2.8), (2.10).

Theorem 3.6. LetQn be the family of distributions of the process(2.7)with the parameters meeting(2.8)–(2.10).

Then,for anyn≥1and0< ρ <1/3,the estimator(3.15)satisfies the oracle inequality(3.23)withσ_n^∗= max∗,n, ς_∗=_min,

ς_n^∗=n⁻¹ _n

0

max(u)du, L^∗_1,n=_∗+_∗/2 and

L^∗_2,n=l_1,n²

n +2l²_2,n+4λ_∗EY₁⁴_max⁴_∗,n. Proof of this theorem is given in Section7.

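Remark 3.2. If in the model (2.7) $\lim_{t\to\infty}\varrho_{\max}(t)=\infty$, then $\varsigma_n^*\approx\varrho_{\max}(n)\to\infty$ as $n\to\infty$ and, by virtue of the condition (2.9), $\lim_{n\to\infty}\varsigma_n^*/n^\delta=0$ for each $\delta>0$.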

3.2. Specification of weights in the model selection procedure(3.15)

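We will specify the weight coefficients $(\gamma(j))_{j\ge1}$ in the way proposed in [11] for a heteroscedastic regression model in discrete time. Consider a numerical grid of the form

$$\mathcal{A}_n=\{1,\ldots,k^*\}\times\{t_1,\ldots,t_m\}, \tag{3.26}$$

where $t_i=i\varepsilon$ and $m=[1/\varepsilon^2]$. Both parameters $k^*\ge1$ and $0<\varepsilon\le1$ are assumed to be functions of $n$, i.e. $k^*=k^*(n)$ and $\varepsilon=\varepsilon(n)$, such that for any $\delta>0$

$$\lim_{n\to\infty}k^*(n)=+\infty,\quad \lim_{n\to\infty}\frac{k^*(n)}{\ln n}=0,\quad \lim_{n\to\infty}\varsigma_n^*\varepsilon(n)=0\quad\text{and}\quad \lim_{n\to\infty}n^\delta\varepsilon(n)=+\infty, \tag{3.27}$$

where $\varsigma_n^*$ is the least upper bound of the noise variance proxy defined in (3.17). One can take, for example,

$$\varepsilon(n)=\frac{1}{\ln(n+1)}\quad\text{and}\quad k^*(n)=\sqrt{\ln(n+1)}.$$

For each $\alpha=(\beta,t)\in\mathcal{A}_n$, we introduce the weight sequence $\gamma_\alpha=(\gamma_\alpha(j))_{j\ge1}$ as

$$\gamma_\alpha(j)=\mathbf{1}_{\{1\le j\le j_0\}}+\bigl(1-(j/\omega_\alpha)^\beta\bigr)\mathbf{1}_{\{j_0<j\le\omega_\alpha\}}, \tag{3.28}$$

where $j_0=j_0(\alpha)=[\omega_\alpha/\ln(n+1)]$,

$$\omega_\alpha=(\tau_\beta tn)^{1/(2\beta+1)}\quad\text{and}\quad \tau_\beta=\frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta}.$$

We set

$$\Gamma=\{\gamma_\alpha,\ \alpha\in\mathcal{A}_n\}. \tag{3.29}$$

It will be noted that in this case $\nu=k^*m$.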
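The family (3.26)–(3.29) can be generated directly from its definition. The sketch below is illustrative only: sequences are truncated to a finite `length` chosen by the caller (an assumption of this example), and the names `pinsker_weights` and `weight_family` are hypothetical.

```python
import math


def pinsker_weights(beta, t, n, length):
    """Weights (3.28) for alpha = (beta, t), truncated to the first `length` indices."""
    tau = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (tau * t * n) ** (1.0 / (2 * beta + 1))
    j0 = int(omega / math.log(n + 1))  # integer part, as in the definition of j_0
    weights = []
    for j in range(1, length + 1):
        if j <= j0:
            weights.append(1.0)
        elif j <= omega:
            weights.append(1.0 - (j / omega) ** beta)
        else:
            weights.append(0.0)
    return weights


def weight_family(n, k_star, eps, length):
    """Family (3.29) over the grid (3.26): beta in 1..k_star, t_i = i*eps, i in 1..m."""
    m = int(1.0 / eps ** 2)
    return [pinsker_weights(beta, i * eps, n, length)
            for beta in range(1, k_star + 1)
            for i in range(1, m + 1)]
```

The returned list has $k^*m$ elements, in agreement with the value of $\nu$ noted above; for very small $n$ some sequences may vanish identically and would have to be discarded to satisfy (3.7).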

Remark 3.3. It will be observed that the specific form of weights (3.28) was proposed by Pinsker [25] for the filtration problem with known smoothness of the regression function observed with an additive Gaussian white noise in continuous time. Nussbaum [24] used these weights for the Gaussian regression estimation problem in discrete time. The minimal mean square risk, called the Pinsker constant, is provided by the weighted least squares estimate with weights of this form, where the index $\alpha$ depends on the smoothness order of the function $S$. If the smoothness order is unknown one has to use, instead of one estimate, a whole family of estimates containing, in particular, the optimal one. In this case the problem is to study the properties of the whole class of estimates. Below we derive an oracle inequality for this class which yields the best mean square risk up to multiplicative and additive constants when the smoothness of the unknown function $S$ is not available. Moreover, it will be shown that the multiplicative constant tends to unity and the additive one vanishes as $n\to\infty$ with a rate higher than any minimax rate.

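In view of the assumptions (3.27), for any $\delta>0$, one has

$$\lim_{n\to\infty}\frac{\nu}{n^\delta}=0.$$

Moreover, by (3.28) for any $\alpha\in\mathcal{A}_n$

$$\sum_{j=1}^\infty\mathbf{1}_{\{\gamma_\alpha(j)>0\}}\le\omega_\alpha.$$

Therefore, taking into account that $\tau_\beta\le\tau_1<1$ for $\beta\ge1$, we get

$$\mu=\mu_n\le(n/\varepsilon)^{1/3}$$

and for any $\delta>0$

$$\lim_{n\to\infty}\frac{\mu_n}{n^{1/3+\delta}}=0.$$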

To study the asymptotic behaviour of the term $B_1^*(n,\rho)$ we assume that the parameter $\rho$ in the cost function (3.11) depends on $n$, i.e. $\rho=\rho_n$ such that $\rho_n\to0$ as $n\to\infty$ and for any $\delta>0$

$$\lim_{n\to\infty}n^\delta\rho_n=+\infty. \tag{3.30}$$

Applying this limiting relation in the analysis of the additive termB^∗₁(n, ρ)in (3.23) yields the following result.

Theorem 3.7. Assume that the distribution family $\mathcal{Q}_n$ satisfies the condition (H0) and the unknown function $S$ is continuously differentiable and satisfies the condition (3.21). Then, for any $n\ge1$, the model selection procedure (3.15), (3.30), (3.20), (3.29) satisfies the oracle inequality (3.23) with the additive term $B_1^*(n,\rho)$ obeying, for any $\delta>0$, the following limiting relation

$$\lim_{n\to\infty}\frac{B_1^*(n,\rho_n)}{n^\delta}=0.$$

4. Stochastic integrals with respect to the process (2.1)

In this section we establish some properties of the stochastic integral

$$I_t(f)=\int_0^tf(s)\,d\xi_s,\quad 0\le t\le n, \tag{4.1}$$

with respect to the process (2.1). We need some notation. Let us denote

$$\varepsilon_f(t)=a\int_0^te^{a(t-v)}f(v)\bigl(1+e^{2av}\bigr)\,dv, \tag{4.2}$$

where $f$ is a $[0,+\infty)\to\mathbb{R}$ function integrable on any finite interval. We also introduce the following transformation

$$\tau_{f,g}(t)=\frac12\int_0^t\bigl(2f(s)g(s)+\varepsilon^*_{f,g}(s)\bigr)\,ds \tag{4.3}$$

of square integrable $[0,+\infty)\to\mathbb{R}$ functions $f$ and $g$. Here

$$\varepsilon^*_{f,g}(t)=f(t)\varepsilon_g(t)+\varepsilon_f(t)g(t).$$

It will be noted that

$$a\tau_{f,1}(t)=\frac12\varepsilon_f(t)\quad\text{and}\quad a\tau_{1,1}(t)=\frac12\bigl(e^{2at}-1\bigr). \tag{4.4}$$
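The identities (4.4) can be checked numerically from the definitions (4.2) and (4.3). The sketch below is only a sanity check under stated assumptions: both integrals are approximated by the midpoint rule, and the names `eps_f` and `tau_fg` are hypothetical.

```python
import math


def eps_f(f, a, t, steps=200):
    """Midpoint-rule approximation of eps_f(t) defined in (4.2)."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += math.exp(a * (t - v)) * f(v) * (1.0 + math.exp(2.0 * a * v))
    return a * h * total


def tau_fg(f, g, a, t, steps=200):
    """Midpoint-rule approximation of tau_{f,g}(t) defined in (4.3)."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        # eps*_{f,g}(s) = f(s) * eps_g(s) + eps_f(s) * g(s)
        eps_star = f(s) * eps_f(g, a, s, steps) + eps_f(f, a, s, steps) * g(s)
        total += 2.0 * f(s) * g(s) + eps_star
    return 0.5 * h * total


def one(_):
    """Constant function equal to 1."""
    return 1.0
```

For example, with $a=-0.7$ and $t=1.5$ the quantity `a * tau_fg(one, one, a, t)` agrees with `0.5 * (math.exp(2 * a * t) - 1)` up to the quadrature error, and `a * tau_fg(math.cos, one, a, t)` agrees with `0.5 * eps_f(math.cos, a, t)` in the same sense.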

Moreover, we set

$$\tau^*_{f,g}(t)=\tau_{f,g}(t)+f(t)\tau_{1,g}(t)+g(t)\tau_{f,1}(t)+f(t)g(t)\tau_{1,1}(t). \tag{4.5}$$

We can rewrite this function as

$$\tau^*_{f,g}(t)=\tau_{f,g}(t)+\frac{\varepsilon^*_{f,g}(t)+f(t)g(t)\bigl(e^{2at}-1\bigr)}{2a}.$$

Proposition 4.1. If $f$ and $g$ are from $L_2[0,n]$ then

$$EI_t(f)I_t(g)=\varrho\,\tau_{f,g}(t), \tag{4.6}$$

where $\varrho=\varrho_1^2+\lambda\varrho_2^2$.

Proof. Noting that the process $I_t(f)$ satisfies the stochastic equation

$$dI_t(f)=af(t)\xi_t\,dt+f(t)\,du_t,\quad I_0(f)=0,$$

and applying the Ito formula (see, for example, [21]) one obtains

$$I_t(f)I_t(g)=\int_0^t\Bigl(\varrho_1^2f(s)g(s)+a\bigl(f(s)\zeta_s(g)+g(s)\zeta_s(f)\bigr)\Bigr)ds+\varrho_2^2\sum_{l\ge1}f(T_l)g(T_l)Y_l^2\mathbf{1}_{\{T_l\le t\}}+\int_0^t\Upsilon_{s-}(f,g)\,du_s, \tag{4.7}$$

where $\zeta_s(f)=I_s(f)\xi_s$ and $\Upsilon_s(f,g)=f(s)I_s(g)+g(s)I_s(f)$. This yields

$$EI_t(f)I_t(g)=a\int_0^t\bigl(f(s)Z_s(g)+g(s)Z_s(f)\bigr)ds+\varrho\int_0^tf(s)g(s)\,ds, \tag{4.8}$$

where $Z_s(f)=E\zeta_s(f)$. Putting here $g=1$ and taking into account that $E\xi_s^2=\varrho(e^{2as}-1)/2a$, we obtain $Z_s(f)=\varrho\,\varepsilon_f(s)/2a$. This implies immediately (4.6). Hence Proposition 4.1.

Further, for integrable $[0,+\infty)\to\mathbb{R}$ functions $f$ and $g$, we define the $[0,+\infty)\times[0,+\infty)\to\mathbb{R}$ function

$$D_{f,g}(x,z)=\int_0^xL^*_{f,g}(y,z)\,dy+f(z)g(z), \tag{4.9}$$

where $L^*_{f,g}(y,z)=g(y+z)L_f(y,z)+f(y+z)L_g(y,z)$;

$$L_f(x,z)=ae^{ax}\Bigl(f(z)+a\int_0^xe^{av}f(v+z)\,dv\Bigr).$$

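Proposition 4.2. Let $\mathcal{G}=\sigma\{T_k,\ k\ge1\}$ be the $\sigma$-algebra generated by the stopping times (2.4), and let $f$ and $g$ be bounded left-continuous $[0,\infty)\times\Omega\to\mathbb{R}$ functions measurable with respect to $\mathcal{B}[0,+\infty)\otimes\mathcal{G}$ (the product $\sigma$-algebra created by $\mathcal{B}[0,+\infty)$ and $\mathcal{G}$). Then, for any $k\ge1$,

$$E\bigl(I_{T_k-}(f)\,|\,\mathcal{G}\bigr)=0$$

and

$$E\bigl(I_{T_k-}(f)I_{T_k-}(g)\,|\,\mathcal{G}\bigr)=\varrho_1^2\tau_{f,g}(T_k)+\varrho_2^2\sum_{l=1}^{k-1}D_{f,g}(T_k-T_l,T_l).$$

Proof. Taking the conditional expectation $E(\cdot\,|\,\mathcal{G})$ in (4.7) yields

$$E\bigl(I_t(f)I_t(g)\,|\,\mathcal{G}\bigr)=\varrho_1^2\int_0^tf(s)g(s)\,ds+\varrho_2^2\sum_{l\ge1}f(T_l)g(T_l)\mathbf{1}_{\{T_l\le t\}}+a\int_0^t\bigl(f(s)Z_g(s)+g(s)Z_f(s)\bigr)ds,$$

where $Z_f(s)=E(I_s(f)\xi_s\,|\,\mathcal{G})$. By direct calculation we find

$$Z_f(t)=a\int_0^tf(s)E(\xi_t\xi_s\,|\,\mathcal{G})\,ds+\varrho_1^2\int_0^te^{a(t-s)}f(s)\,ds+\varrho_2^2\sum_{l\ge1}f(T_l)e^{a(t-T_l)}\mathbf{1}_{\{T_l\le t\}}.$$

Taking into account here that for any $0\le s\le t$

$$E(\xi_t\xi_s\,|\,\mathcal{G})=e^{a(t-s)}\Bigl(\frac{\varrho_1^2}{2a}\bigl(e^{2as}-1\bigr)+\varrho_2^2\sum_{l\ge1}e^{2a(s-T_l)}\mathbf{1}_{\{T_l\le s\}}\Bigr),$$

one obtains

$$aZ_f(t)=\frac{\varrho_1^2}{2}\varepsilon_f(t)+\varrho_2^2\sum_{j\ge1}L_f(t-T_j,T_j)\mathbf{1}_{\{T_j\le t\}}.$$

From here one comes to the desired equality.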

Proposition 4.3. Let $F$, $f$ and $g$ be nonrandom bounded left-continuous $[0,\infty)\to\mathbb{R}$ functions. Then

$$E\sum_{k\ge1}F(T_k)I_{T_k-}(f)I_{T_k-}(g)\mathbf{1}_{\{T_k\le t\}}=\varrho\lambda\int_0^tF(v)\tau_{f,g}(v)\,dv.$$

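Proof. We set $\iota(t)=E\sum_{k\ge1}F(T_k)I_{T_k-}(f)I_{T_k-}(g)\mathbf{1}_{\{T_k\le t\}}$. By applying Proposition 4.2 one gets

$$\iota(t)=\varrho_1^2E\sum_{k\ge1}F(T_k)\tau_{f,g}(T_k)\mathbf{1}_{\{T_k\le t\}}+\varrho_2^2E\sum_{k\ge1}F(T_k)\sum_{l=1}^{k-1}D_{f,g}(T_k-T_l,T_l)\mathbf{1}_{\{T_k\le t\}}:=\varrho_1^2\iota_1(t)+\varrho_2^2\iota_2(t),$$

where

$$\iota_1(t)=\lambda\int_0^t\sum_{l\ge1}F(z)\tau_{f,g}(z)\frac{(\lambda z)^{l-1}}{(l-1)!}e^{-\lambda z}\,dz=\lambda\int_0^tF(z)\tau_{f,g}(z)\,dz.$$

To calculate $\iota_2(t)$ we note that

$$\iota_2(t)=E\sum_{l\ge1}\mathbf{1}_{\{T_l\le t\}}\sum_{k\ge l+1}F(T_k)D_{f,g}(T_k-T_l,T_l)\mathbf{1}_{\{T_k\le t\}}.$$

Taking into account that $T_k-T_l$ is independent of $T_l$ for any $k>l$ we obtain

$$\iota_2(t)=\lambda E\sum_{l\ge1}\mathbf{1}_{\{T_l\le t\}}\int_0^{t-T_l}\sum_{k\ge l+1}F(z+T_l)D_{f,g}(z,T_l)\frac{(\lambda z)^{k-l-1}}{(k-l-1)!}e^{-\lambda z}\,dz$$

$$=\lambda E\sum_{l\ge1}\mathbf{1}_{\{T_l\le t\}}\int_0^{t-T_l}F(z+T_l)D_{f,g}(z,T_l)\,dz$$

$$=\lambda^2\int_0^t\int_0^{t-x}F(z+x)D_{f,g}(z,x)\,dz\,dx=\lambda^2\int_0^tF(z)\tau_{f,g}(z)\,dz.$$

Hence Proposition 4.3.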

Now we set

$$\tilde I_t(f)=I_t^2(f)-EI_t^2(f). \tag{4.10}$$

Further we need the following correlation measures for two integrated[0,+∞)→Rfunctionsf andg _f,g= max

0≤v≤n max

0≤t≤n−v

_t

0

f (u+v)g(u)du

(4.11)

and

_f,g^∗ =max(f,g, _g,f). (4.12)

For any bounded $[0,\infty)\to\mathbb{R}$ function $f$ we introduce the following uniform norm

$$\|f\|_{*,n}=\sup_{0\le t\le n}|f(t)|.$$

To check the condition (C2) we need the following non-asymptotic upper bound.

Theorem 4.4. For any left-continuous[0,∞)→Rfunctionsf,gwithf_∗,n≤1andg_∗,n≤1 EIn(f )In(g)≤nMQ

1+_f,g^∗ +_f,1^∗ +_1,g^∗

, (4.13)

whereMQ=116²+33λ⁴₂EY₁⁴.

Proof. Taking in (4.7)–(4.8) $g=f$ and $V_t(f)=\zeta_t(f)-Z_t(f)$, one comes to the following stochastic equation

$$d\tilde I_t(f)=2aV_t(f)f(t)\,dt+dM_t(f),\quad \tilde I_0(f)=0, \tag{4.14}$$

where $M_t(f)=2\int_0^tI_{s-}(f)f(s)\,du_s+\varrho_2^2\int_0^tf^2(s)\,dm_s$ and

$$m_t=\sum_{0\le s\le t}(\Delta z_s)^2-\lambda t.$$
