www.imstat.org/aihp 2012, Vol. 48, No. 4, 1217–1244
DOI:10.1214/12-AIHP488
© Association des Publications de l’Institut Henri Poincaré, 2012
Efficient robust nonparametric estimation in a semimartingale regression model¹
Victor Konev^a and Serguei Pergamenshchikov^b
^a Department of Applied Mathematics and Cybernetics, Tomsk State University, Lenin str. 36, 634050 Tomsk, Russia. E-mail: [email protected]
^b Laboratoire de Mathématiques Raphael Salem, Avenue de l’Université, BP. 12, Université de Rouen, F76801, Saint Etienne du Rouvray, Cedex France and Department of Mathematics and Mechanics, Tomsk State University, Lenin str. 36, 634041 Tomsk, Russia. E-mail: [email protected]
Received 17 November 2010; revised 2 February 2012; accepted 12 March 2012
Abstract. The paper considers the problem of robust estimation of a periodic function in a continuous-time regression model with dependent disturbances given by a general square integrable semimartingale with an unknown distribution. An example of such a noise is a non-Gaussian Ornstein–Uhlenbeck process with jumps (see (J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 167–241), (Ann. Appl. Probab. 18 (2008) 879–908)). An adaptive model selection procedure based on weighted least squares estimates is proposed. Under general moment conditions on the noise distribution, sharp non-asymptotic oracle inequalities for the robust risks are derived and the robust efficiency of the model selection procedure is shown. It is established that, in the case of the non-Gaussian Ornstein–Uhlenbeck noise, the sharp lower bound for the robust quadratic risk is determined by the limit value of the noise intensity at high frequencies. An example with a martingale noise shows that the risk convergence rate becomes worse if the noise intensity is unbounded.
Résumé. In this article we consider the problem of robust estimation of a periodic function in a continuous-time regression model with dependent noise described by a square integrable semimartingale with an unknown distribution. An example of such noise is a non-Gaussian Ornstein–Uhlenbeck process with jumps (see (J. R. Stat. Soc. Ser. B Stat. Methodol. 63 (2001) 167–241), (Ann. Appl. Probab. 18 (2008) 879–908)). We propose an adaptive model selection procedure based on weighted least squares estimators. Under general conditions on the first two moments of the noise distribution, sharp non-asymptotic oracle inequalities for robust quadratic risks are obtained and robust efficiency is established. We also establish that, in the case of the non-Gaussian Ornstein–Uhlenbeck process, the lower bound for the robust quadratic risk is given by the limit of the noise intensity as the frequency tends to infinity. We give an example of a regression model with a martingale noise in which the convergence rate of the quadratic risk becomes slower if the noise intensity tends to infinity.
MSC: Primary 62G08; secondary 62G05
Keywords: Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequality; Asymptotic efficiency
1. Introduction
Consider a regression model in continuous time
$$dy_t=S(t)\,dt+d\xi_t,\qquad 0\le t\le n, \tag{1.1}$$
¹ Supported by the RFBR-Grant 09-01-00172-a.
where $S$ is an unknown 1-periodic $\mathbb R\to\mathbb R$ function, $S\in\mathcal L_2[0,1]$; $(\xi_t)_{t\ge0}$ is an unobservable semimartingale noise with values in the Skorokhod space $\mathcal D[0,n]$ such that, for any function $f$ from $\mathcal L_2[0,n]$, the stochastic integral
$$I_n(f)=\int_0^n f(s)\,d\xi_s \tag{1.2}$$
is well defined and has the following properties:
$$\mathbf E_Q I_n(f)=0\quad\text{and}\quad \mathbf E_Q I_n^2(f)\le\sigma_Q\int_0^n f^2(s)\,ds. \tag{1.3}$$
Here $\mathbf E_Q$ denotes the expectation with respect to the distribution $Q$ in $\mathcal D[0,n]$ of the process $(\xi_t)_{0\le t\le n}$, which is assumed to belong to some probability family $\mathcal Q_n$ specified below; $\sigma_Q>0$ is a constant depending on the distribution $Q$.
The problem is to estimate the unknown function $S$ in the model (1.1) on the basis of observations $(y_t)_{0\le t\le n}$. The class of disturbances $\xi$ satisfying conditions (1.3) is rather wide and comprises, in particular, the Lévy processes, which are used in various applied problems (see [4,16] for details). Models (1.1) with Lévy-type noise arise naturally (see [18]) in nonparametric functional statistics problems (see, for example, [8]). Moreover, as shown in Section 2, models based on the non-Gaussian Ornstein–Uhlenbeck processes introduced in [2] belong to this class.
We define the error of an estimate $\widehat S$ (any real-valued function measurable with respect to $\sigma\{y_t,\ 0\le t\le n\}$) for $S$ by its integral quadratic risk
$$\mathcal R_Q(\widehat S,S):=\mathbf E_{Q,S}\|\widehat S-S\|^2, \tag{1.4}$$
where $\mathbf E_{Q,S}$ stands for the expectation with respect to the distribution $\mathbf P_{Q,S}$ of the process (1.1) with a fixed distribution $Q$ of the noise $(\xi_t)_{0\le t\le n}$ and a given function $S$; $\|\cdot\|$ is the norm in $\mathcal L_2[0,1]$, i.e.
$$\|f\|^2:=\int_0^1 f^2(t)\,dt. \tag{1.5}$$
Since in our case the noise distribution $Q$ is unknown, it seems natural, as in [10], to measure the quality of an estimate $\widehat S$ by the robust risk defined as
$$\mathcal R^*_n(\widehat S,S)=\sup_{Q\in\mathcal Q_n}\mathcal R_Q(\widehat S,S), \tag{1.6}$$
which is the supremum of the error (1.4) over the whole family of admissible distributions $\mathcal Q_n$.
We treat the stated problem from the standpoint of the model selection approach. The origin of this method goes back to the early seventies, with the pioneering papers by Akaike [1] and Mallows [22], who proposed introducing a penalty into a log-likelihood type criterion. Further progress was made by Barron, Birgé and Massart [3,23], who developed a non-asymptotic model selection method which enables one to derive non-asymptotic oracle inequalities for nonparametric regression models with i.i.d. Gaussian disturbances. An oracle inequality bounds the risk of an estimate from above via the minimal risk over a chosen family of estimates. Galtchouk and Pergamenshchikov [9] applied the Barron–Birgé–Massart technique to the problem of estimating nonparametric drift functions in ergodic diffusion processes. Fourdrinier and Pergamenshchikov [7] extended the Barron–Birgé–Massart method to models with spherically symmetric dependent observations; they proposed a model selection procedure based on improved least squares estimates. More recently, the authors [17] applied this method to the nonparametric problem of estimating a periodic function in a continuous time model with a Gaussian colored noise. In all the cited papers, non-asymptotic oracle inequalities have been derived which enable one to establish the optimal convergence rate for the minimax risks. Besides the optimal convergence rate, another important problem is the efficiency of adaptive estimation procedures. To examine the efficiency of a procedure one has to obtain sharp oracle inequalities, i.e. inequalities in which the factor at the principal term on the right-hand side is close to unity.
The first result on sharp inequalities is most likely due to Kneip [15], who studied a Gaussian regression model in discrete time. Note that the derivation of oracle inequalities usually rests upon the fact that the initial model can be reduced, by a Fourier transformation, to a Gaussian model with independent observations.
However, such a transform is possible only for Gaussian models with independent homogeneous observations or for inhomogeneous ones with known correlation characteristics. This restriction significantly narrows the area of application of the proposed model selection procedures and rules out a broad class of models including, in particular, heteroscedastic regression models widely used in econometrics (see, for example, [5,14]). To construct adaptive procedures for inhomogeneous observations, one needs to modify the approach to the estimation problem.
Galtchouk and Pergamenshchikov [11–13] have developed a new estimation method intended for heteroscedastic regression models in discrete time. The heart of this method is to combine the Barron–Birgé–Massart non-asymptotic penalization method [3] with the Pinsker weighted least squares method, which minimizes the asymptotic risk (see, for example, [24,25]). This yields a significant improvement in the performance of the procedure (see the numerical example in [11]).
The goal of this paper is to develop a robust efficient model selection method for the model (1.1) with dependent disturbances having an unknown distribution. We follow the approach proposed by Galtchouk and Pergamenshchikov [11] in the construction of the procedure. Unfortunately, their method of obtaining oracle inequalities essentially relies on the independence of observations and cannot be applied here. This paper proposes new analytical tools which allow one to obtain sharp non-asymptotic oracle inequalities for robust risks under general conditions on the distribution of the noise in the model (1.1). The method treats dependent and independent observations from the same standpoint, does not assume knowledge of the noise distribution, and leads to an efficient estimation procedure with respect to the risk (1.6). The conditions imposed on the noise in Eq. (1.1) are verified for a non-Gaussian Ornstein–Uhlenbeck process and for a martingale with increasing variance (see Section 2).
The rest of the paper is organized as follows. In Section 3 we construct the model selection procedure on the basis of weighted least squares estimates and state the main results in the form of oracle inequalities for the quadratic risk (1.4) and the robust risk (1.6). There we also specify the set of admissible weight sequences in the model selection procedure. In Section 4 we establish some properties of the stochastic integrals with respect to the non-Gaussian Ornstein–Uhlenbeck process (2.1). In Sections 5 and 6 it is shown that the proposed model selection procedure for estimating $S$ in (1.1) is asymptotically efficient with respect to the robust risk (1.6). Section 7 gives the proofs of the main results, including the oracle inequalities for the regression model (1.1) with the noises introduced in Section 2. Some auxiliary propositions are given in the Appendix.
2. Semimartingale noises
In this section two examples of the disturbances $(\xi_t)_{t\ge0}$ in (1.1) are given.
2.1. Non-Gaussian Ornstein–Uhlenbeck process
First we consider disturbances $(\xi_t)_{t\ge0}$ in (1.1) given by a non-Gaussian Ornstein–Uhlenbeck process with a Lévy subordinator. Such processes are used in financial Black–Scholes type markets with jumps (see, for example, [6] and the references therein). Let the noise process in (1.1) obey the equation
$$d\xi_t=a\xi_t\,dt+du_t,\qquad \xi_0=0, \tag{2.1}$$
where $a\le0$, $u_t=\varrho_1w_t+\varrho_2z_t$, $\varrho_1$ and $\varrho_2$ are unknown constants, $(w_t)_{t\ge0}$ is a standard Brownian motion and $(z_t)_{t\ge0}$ is a compound Poisson process defined as
$$z_t=\sum_{j=1}^{N_t}Y_j. \tag{2.2}$$
Here $(N_t)_{t\ge0}$ is a standard homogeneous Poisson process with unknown intensity $\lambda>0$ and $(Y_j)_{j\ge1}$ is an i.i.d. sequence of random variables with
$$\mathbf EY_j=0,\qquad \mathbf EY_j^2=1\qquad\text{and}\qquad \mathbf EY_j^4<\infty. \tag{2.3}$$
Let $(T_k)_{k\ge1}$ denote the arrival times of the process $(N_t)_{t\ge0}$, that is,
$$T_k=\inf\{t\ge0:\ N_t=k\}. \tag{2.4}$$
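We assume that the parameters $\lambda$, $a$, $\varrho_1$ and $\varrho_2$ satisfy the conditions
$$-a_{\max}\le a\le0,\qquad \lambda\ge\lambda_*,\qquad \varrho_{\min}\le\varrho_1^2+\lambda\varrho_2^2\le\varrho_{\max}. \tag{2.5}$$
Let $\mathcal Q_n$ denote the family of all distributions of the process (2.1) on the space $\mathcal D[0,n]$ with parameters $a$, $\lambda$, $\varrho_1$ and $\varrho_2$ satisfying the conditions (2.5) with fixed bounds $\lambda_*>0$, $a_{\max}>0$, $\varrho_{\min}>0$ and $\varrho_{\max}>0$.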
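To make the noise model concrete, here is a minimal Python sketch (an illustration, not part of the paper) that simulates one path of (2.1)–(2.3) with a plain Euler scheme. The function name `simulate_ou_levy`, the step `dt`, the parameter values and the choice of standard normal jump sizes $Y_j$ (one admissible law satisfying (2.3)) are all assumptions made for the example.

```python
import numpy as np


def simulate_ou_levy(n, a, rho1, rho2, lam, dt=1e-2, rng=None):
    """Euler sketch of d xi_t = a xi_t dt + rho1 dw_t + rho2 dz_t, xi_0 = 0, on [0, n].

    z is a compound Poisson process with intensity lam.  Jump sizes are drawn
    as N(0, 1), an illustrative choice with E Y = 0, E Y^2 = 1, E Y^4 < inf.
    Returns the time grid and the simulated path.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps = int(round(n / dt))
    t = np.linspace(0.0, n, steps + 1)
    xi = np.zeros(steps + 1)
    for k in range(steps):
        dw = rng.normal(0.0, np.sqrt(dt))              # Brownian increment
        n_jumps = rng.poisson(lam * dt)                # Poisson arrivals in this step
        dz = rng.normal(0.0, 1.0, size=n_jumps).sum()  # sum of the Y_j in this step
        xi[k + 1] = xi[k] + a * xi[k] * dt + rho1 * dw + rho2 * dz
    return t, xi


# Illustrative parameters compatible with (2.5) for, e.g., a_max = 2 and lambda_* = 1.
t, xi = simulate_ou_levy(n=10, a=-1.0, rho1=1.0, rho2=0.5, lam=2.0,
                         rng=np.random.default_rng(0))
```

Setting `a=0.0` gives the boundary case of the stability region, where $\xi$ reduces to the Lévy process $u_t$ itself.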
Note that the process (1.1)–(2.1) may be used for modelling stock prices in financial markets of the Black–Scholes type with jumps (see, e.g., [20], p. 141). In this case the price process $(X_t)_{0\le t\le n}$ is governed by the stochastic differential equation
$$\frac{dX_t}{X_{t-}}=S(t)\,dt+d\xi_t, \tag{2.6}$$
where $(\xi_t)_{t\ge0}$ is an internal random factor specified by Eq. (2.1) and $S(t)$ is a periodic stock-appreciation rate which has to be estimated from the observations
$$y_t=y_0+\int_0^t X_{u-}^{-1}\,dX_u.$$
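The solution to Eq. (2.6) is given by the Doléans exponential, i.e.
$$X_t=X_0\exp\Big\{y_t-y_0-\frac{\varrho_1^2}{2}\,t+\Lambda_t\Big\}=X_0\exp\Big\{\xi_t+\int_0^t\Big(S(u)-\frac{\varrho_1^2}{2}\Big)du+\Lambda_t\Big\},$$
where $\Lambda_t=\sum_{T_j\le t}\big(\ln(1+\varrho_2Y_j)-\varrho_2Y_j\big)$.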
To use the model (2.6) for describing stock price dynamics, one needs to require that $1+\varrho_2Y_j>0$ a.s. for all $j\ge1$.
2.2. Martingale noise
Next we consider a martingale noise obeying the equation
$$d\xi_t=\varrho_1(t)\,dw_t+\varrho_2(t)\,dz_t, \tag{2.7}$$
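where $\varrho_1$ and $\varrho_2$ are continuously differentiable nonrandom $\mathbb R_+\to\mathbb R$ functions; the process $(z_t)_{t\ge0}$ is defined in (2.2)–(2.3). Assume that there exist constants $\lambda_*>0$, $\varrho_{\min}>0$ and a continuous $\mathbb R_+\to\mathbb R_+$ function $\varrho_{\max}(\cdot)$ such that for all $t\ge0$
$$\lambda\ge\lambda_*,\qquad \varrho_{\min}\le\varrho_1^2(t)+\lambda\varrho_2^2(t)\le\varrho_{\max}(t) \tag{2.8}$$
and, for any $\delta>0$,
$$\lim_{t\to\infty}\frac{\varrho_{\max}(t)}{t^{\delta}}=0. \tag{2.9}$$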
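Moreover, we assume that the derivatives of the functions $\varrho_i$ satisfy, for some positive constants $\varrho_*$ and $\varrho^*$, the following conditions:
$$\sup_{t\ge0}\,(t+1)\max_{1\le i\le2}\Big|\frac{d}{dt}\varrho_i^2(t)\Big|\le\varrho_*,\qquad \sup_{t\ge0}\Big|\frac{d^2}{dt^2}\varrho_i^2(t)\Big|\le\varrho^*,\quad i=1,2. \tag{2.10}$$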
In this case we denote by $\mathcal Q_n$ the family of all distributions of the process (2.7) on $\mathcal D[0,n]$ satisfying the conditions (2.8) and (2.10) for some $\lambda_*$, $\varrho_{\min}$, $\varrho_{\max}(\cdot)$, $\varrho_*$ and $\varrho^*$.
3. Model selection
This section presents the construction of a model selection procedure for estimating the function $S$ in (1.1) on the basis of weighted least squares estimates and states the main results.
To estimate the unknown function $S$ in the model (1.1), we use its Fourier expansion in the trigonometric basis $(\phi_j)_{j\ge1}$ in $\mathcal L_2[0,1]$ defined as
$$\phi_1=1,\qquad \phi_j(x)=\sqrt2\,\mathrm{Tr}_j\big(2\pi[j/2]x\big),\quad j\ge2, \tag{3.1}$$
where $\mathrm{Tr}_j(x)=\cos(x)$ for even $j$ and $\mathrm{Tr}_j(x)=\sin(x)$ for odd $j$; $[x]$ denotes the integer part of $x$. The corresponding Fourier coefficients
$$\theta_j=(S,\phi_j)=\int_0^1S(t)\phi_j(t)\,dt \tag{3.2}$$
can be estimated as
$$\widehat\theta_{j,n}=\frac1n\int_0^n\phi_j(t)\,dy_t. \tag{3.3}$$
In view of (1.1), one obtains
$$\widehat\theta_{j,n}=\theta_j+\frac{1}{\sqrt n}\,\xi_{j,n},\qquad \xi_{j,n}=\frac{1}{\sqrt n}\,I_n(\phi_j), \tag{3.4}$$
where $I_n(\phi_j)$ is given in (1.2).
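The basis (3.1) and the estimates (3.3) are easy to compute from a discretely sampled path. The Python sketch below is an illustration under stated assumptions: the observations are available on a fine uniform grid of $[0,n]$, the stochastic integral in (3.3) is replaced by a left-point (Itô-type) Riemann–Stieltjes sum, the example uses white noise $d\xi_t=dw_t$ (so $\sigma_Q=1$ in (1.3)), and the names `phi` and `theta_hat` are hypothetical.

```python
import numpy as np


def phi(j, x):
    """Trigonometric basis (3.1): phi_1 = 1, phi_j(x) = sqrt(2) Tr_j(2 pi [j/2] x)."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    trig = np.cos if j % 2 == 0 else np.sin    # cos for even j, sin for odd j
    return np.sqrt(2.0) * trig(2.0 * np.pi * (j // 2) * x)


def theta_hat(j, t, y, n):
    """Left-point approximation of (3.3): (1/n) * int_0^n phi_j(t) dy_t on the grid t."""
    return float(np.sum(phi(j, t[:-1]) * np.diff(y)) / n)


# Example: S = phi_1 + 0.5 phi_3, i.e. theta = (1, 0, 0.5, 0, ...), white noise, n = 50.
rng = np.random.default_rng(1)
n, dt = 50, 1e-3
t = np.linspace(0.0, n, int(round(n / dt)) + 1)
S = phi(1, t) + 0.5 * phi(3, t)
dy = S[:-1] * dt + rng.normal(0.0, np.sqrt(dt), len(t) - 1)
y = np.concatenate([[0.0], np.cumsum(dy)])
estimates = [theta_hat(j, t, y, n) for j in (1, 2, 3)]
```

By (3.4), each entry of `estimates` differs from the true coefficient by $\xi_{j,n}/\sqrt n$, whose standard deviation here is $1/\sqrt{50}\approx0.14$.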
For any sequence $x=(x_j)_{j\ge1}$ we set
$$|x|^2=\sum_{j=1}^\infty x_j^2\qquad\text{and}\qquad \#(x)=\sum_{j=1}^\infty\mathbf 1_{\{|x_j|>0\}}. \tag{3.5}$$
Now we impose some additional conditions on the family $\mathcal Q_n$ of distributions of the noise $(\xi_t)_{t\ge0}$ in (1.1).
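(C1) There exists a variance proxy $\varsigma_Q>0$ such that for any $n\ge1$
$$L_{1,n}(Q)=\sup_{x\in H,\ \#(x)\le n}\Big|\sum_{j=1}^\infty x_j\big(\mathbf E_Q\xi_{j,n}^2-\varsigma_Q\big)\Big|<\infty,$$
where $H=[-1,1]^\infty$.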
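(C2) Assume that for each $n\ge1$
$$L_{2,n}(Q)=\sup_{|x|\le1,\ \#(x)\le n}\mathbf E_Q\Big(\sum_{j=1}^\infty x_j\widetilde\xi_{j,n}\Big)^2<\infty,$$
where $\widetilde\xi_{j,n}=\xi_{j,n}^2-\mathbf E_Q\xi_{j,n}^2$.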
As shown in the proof of Theorem 3.5 in Section 7, both conditions (C1) and (C2) hold for the process (2.1).
Further, we introduce a class of weighted least squares estimates for $S(t)$ as
$$\widehat S_\gamma=\sum_{j=1}^\infty\gamma(j)\,\widehat\theta_{j,n}\,\phi_j, \tag{3.6}$$
where $\gamma=(\gamma(j))_{j\ge1}$ is a sequence of weight coefficients such that
$$0\le\gamma(j)\le1\qquad\text{and}\qquad 0<\#(\gamma)\le n. \tag{3.7}$$
Let $\Gamma$ denote a finite set of such weight sequences $\gamma=(\gamma(j))_{j\ge1}$, let $\nu=\operatorname{card}(\Gamma)$ be its cardinality and
$$\mu=\max_{\gamma\in\Gamma}\#(\gamma). \tag{3.8}$$
The model selection procedure for the unknown function $S$ in (1.1) will be constructed on the basis of the family of estimates $(\widehat S_\gamma)_{\gamma\in\Gamma}$. The choice of a specific set of weight sequences $\Gamma$ is discussed at the end of this section. To find a proper weight sequence $\gamma$ in the set $\Gamma$, one needs to specify a cost function. When choosing an appropriate cost function one can use the following argument. The empirical squared error
$$\mathrm{Err}_n(\gamma)=\|\widehat S_\gamma-S\|^2$$
can be written as
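$$\mathrm{Err}_n(\gamma)=\sum_{j=1}^\infty\gamma^2(j)\widehat\theta_{j,n}^2-2\sum_{j=1}^\infty\gamma(j)\widehat\theta_{j,n}\theta_j+\sum_{j=1}^\infty\theta_j^2. \tag{3.9}$$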
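Since the Fourier coefficients $(\theta_j)_{j\ge1}$ are unknown, the weight coefficients $(\gamma(j))_{j\ge1}$ cannot be found by minimizing this quantity. To circumvent this difficulty one replaces the terms $\widehat\theta_{j,n}\theta_j$ by their estimators $\widetilde\theta_{j,n}$. We set
$$\widetilde\theta_{j,n}=\widehat\theta_{j,n}^2-\frac{\widehat\sigma_n}{n}, \tag{3.10}$$
where $\widehat\sigma_n$ is an estimator for the variance proxy $\varsigma_Q$ in the condition (C1). This choice is motivated by (3.4) and (1.3): $\mathbf E_{Q,S}\widehat\theta_{j,n}\theta_j=\theta_j^2$, while $\mathbf E_{Q,S}\widehat\theta_{j,n}^2=\theta_j^2+\mathbf E_Q\xi_{j,n}^2/n$, and $\mathbf E_Q\xi_{j,n}^2$ is close to $\varsigma_Q$ by (C1).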
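For this change in the empirical squared error one has to pay a penalty. Thus one comes to the cost function of the form
$$J_n(\gamma)=\sum_{j=1}^\infty\gamma^2(j)\widehat\theta_{j,n}^2-2\sum_{j=1}^\infty\gamma(j)\widetilde\theta_{j,n}+\rho\,\widehat P_n(\gamma), \tag{3.11}$$
where $\rho$ is a positive constant and $\widehat P_n(\gamma)$ is the penalty term defined as
$$\widehat P_n(\gamma)=\frac{\widehat\sigma_n|\gamma|^2}{n}. \tag{3.12}$$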
When the value of $\varsigma_Q$ in (C1) is known, one can take $\widehat\sigma_n=\varsigma_Q$ and
$$P_n(\gamma)=\frac{\varsigma_Q|\gamma|^2}{n}. \tag{3.13}$$
Substituting the weight coefficients minimizing the cost function,
$$\widehat\gamma=\operatorname*{argmin}_{\gamma\in\Gamma}J_n(\gamma), \tag{3.14}$$
in (3.6) leads to the model selection procedure
$$\widehat S_*=\widehat S_{\widehat\gamma}. \tag{3.15}$$
Note that $\widehat\gamma$ exists because $\Gamma$ is a finite set. If the minimizer in (3.14) is not unique, one can take any of them.
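For a finite family $\Gamma$, (3.10)–(3.15) amount to evaluating $J_n$ for every weight vector and keeping the minimiser. A minimal Python sketch, assuming that all weight vectors and the estimates $\widehat\theta_{j,n}$ are stored as NumPy arrays truncated to a common finite length covering the supports of all $\gamma\in\Gamma$, and that ties are broken by taking the first minimiser; the function names are hypothetical.

```python
import numpy as np


def cost_J(gamma, theta_hat, sigma_hat, n, rho):
    """Cost function (3.11) with the penalty (3.12)."""
    theta_tilde = theta_hat**2 - sigma_hat / n        # estimates (3.10) of theta_j^2
    penalty = sigma_hat * np.sum(gamma**2) / n         # P_n(gamma), (3.12)
    return (np.sum(gamma**2 * theta_hat**2)
            - 2.0 * np.sum(gamma * theta_tilde)
            + rho * penalty)


def select_gamma(Gamma, theta_hat, sigma_hat, n, rho):
    """Minimiser (3.14) over the finite family Gamma (first one in case of ties)."""
    costs = [cost_J(g, theta_hat, sigma_hat, n, rho) for g in Gamma]
    return Gamma[int(np.argmin(costs))]


def estimate_S(gamma, theta_hat, basis_values):
    """Estimate (3.6) at a set of points; basis_values[j - 1] holds phi_j at those points."""
    return (gamma * theta_hat) @ basis_values
```

For 0–1 (projection) weights the cost reduces to $\sum_j\gamma(j)\big((2+\rho)\widehat\sigma_n/n-\widehat\theta_{j,n}^2\big)$, so such a rule keeps coefficient $j$ exactly when $\widehat\theta_{j,n}^2>(2+\rho)\widehat\sigma_n/n$.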
First we consider the case when the variance proxy $\varsigma_Q$ in (C1) is known.
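Proposition 3.1. If the conditions (C1) and (C2) hold for the distribution $Q$ of the process $\xi$ in (1.1), then the risk (1.4) of the estimate (3.15) for $S$ satisfies the oracle inequality
$$\mathcal R_Q(\widehat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}\mathcal R_Q(\widehat S_\gamma,S)+\frac1n\,\mathcal B_Q(n,\rho), \tag{3.16}$$
where
$$\mathcal B_Q(n,\rho)=\frac{2\varsigma_Q\sigma_Q\nu+2\varsigma_QL_{1,n}(Q)+\nu L_{2,n}(Q)}{\varsigma_Q\,\rho(1-3\rho)}+\frac{6\mu\,\mathcal R_Q(\widehat\sigma_n,\varsigma_Q)}{1-3\rho}.$$
This result can be proved along the lines of Theorem 1 in [18].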
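The leading constant in (3.16) equals 1 at $\rho=0$, and a Taylor expansion gives $(1+3\rho-2\rho^2)/(1-3\rho)=1+6\rho+16\rho^2+O(\rho^3)$, so the inequality is sharp only for small $\rho$, while the term $\mathcal B_Q(n,\rho)$ grows like $1/\rho$. A tiny Python helper (hypothetical name) for evaluating this constant:

```python
def oracle_factor(rho):
    """Leading constant (1 + 3 rho - 2 rho^2) / (1 - 3 rho) in (3.16), for 0 <= rho < 1/3."""
    if not 0.0 <= rho < 1.0 / 3.0:
        raise ValueError("rho must lie in [0, 1/3)")
    return (1.0 + 3.0 * rho - 2.0 * rho**2) / (1.0 - 3.0 * rho)


for rho in (0.1, 0.01, 0.001):
    print(rho, oracle_factor(rho))
```

This trade-off between the multiplicative and the additive term is the reason for taking $\rho=\rho_n\to0$ slowly, as done later in this section.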
Now we specify the class $\mathcal Q_n$ of admissible distributions $Q$ in the robust risk (1.6).
Let $\mathcal Q_n$ be a set of noise distributions $Q$ on the space $\mathcal D[0,n]$ satisfying (1.3), (C1), (C2) and the following conditions.
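(H0) The factor $\sigma_Q$ in (1.3) and the variance proxy $\varsigma_Q$ in (C1) are such that for each $n\ge1$
$$\varsigma^*_n:=\sup_{Q\in\mathcal Q_n}\varsigma_Q<\infty\qquad\text{and}\qquad\sigma^*_n:=\sup_{Q\in\mathcal Q_n}\sigma_Q<\infty, \tag{3.17}$$
$\varsigma_*:=\inf_{Q\in\mathcal Q_n}\varsigma_Q>0$ and, moreover, for any $\delta>0$
$$\lim_{n\to\infty}\frac{\varsigma^*_n+\sigma^*_n}{n^\delta}=0. \tag{3.18}$$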
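(H1) The functionals $L_{1,n}(Q)$ and $L_{2,n}(Q)$ in (C1), (C2) are uniformly bounded on the set $\mathcal Q_n$, i.e. for each $Q\in\mathcal Q_n$
$$L_{1,n}(Q)\le L^*_{1,n},\qquad L_{2,n}(Q)\le L^*_{2,n},$$
and the numerical sequences $(L^*_{i,n})_{n\ge1}$, $i=1,2$, are such that for any $\delta>0$
$$\lim_{n\to\infty}\frac{L^*_{1,n}+L^*_{2,n}}{n^\delta}=0.$$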
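Theorem 3.2. Suppose that the family of admissible noise distributions $\mathcal Q_n$ for the model (1.1) is defined by the conditions (C1), (C2) and (H0), (H1). Then the robust risk (1.6) of the estimate (3.15) for $S(t)$ satisfies, for any $n\ge1$ and $0<\rho<1/3$, the oracle inequality
$$\mathcal R^*_n(\widehat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}\mathcal R^*_n(\widehat S_\gamma,S)+\frac1n\,\mathcal B^*(n,\rho), \tag{3.19}$$
where
$$\mathcal B^*(n,\rho)=\frac{2\varsigma^*_n\sigma^*_n\nu+2\varsigma^*_nL^*_{1,n}+\nu L^*_{2,n}}{\varsigma_*\,\rho(1-3\rho)}+\frac{6\mu\,\mathcal R^*_n(\widehat\sigma_n)}{1-3\rho}$$
and $\mathcal R^*_n(\widehat\sigma_n)=\sup_{Q\in\mathcal Q_n}\mathcal R_Q(\widehat\sigma_n,\varsigma_Q)$.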
3.1. The case of unknown $\varsigma_Q$
If the variance proxy $\varsigma_Q$ in the condition (C1) is unknown, it can be estimated as
$$\widehat\sigma_n=\sum_{j=[\sqrt n]+1}^{n}\widehat\theta_{j,n}^2,\qquad n\ge2. \tag{3.20}$$
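A minimal Python sketch of (3.20), assuming the estimates are stored in an array with `theta_hat[j - 1]` $=\widehat\theta_{j,n}$ (the function name is hypothetical). As a sanity check under an assumed white-noise model $d\xi_t=\sqrt{\varsigma}\,dw_t$, orthonormality of $(\phi_j)$ over $n$ periods makes the $\xi_{j,n}$ in (3.4) i.i.d. $\mathcal N(0,\varsigma)$, so for $S=0$ the estimator is close to $\varsigma\,(n-[\sqrt n])/n$; for a smooth nonzero $S$ the tail coefficients $\theta_j$, $j>\sqrt n$, add a small bias, which is what (3.21)–(3.22) control.

```python
import math

import numpy as np


def sigma_hat(theta_hat, n):
    """Estimator (3.20): sum of theta_hat_{j,n}^2 over [sqrt(n)] < j <= n.

    theta_hat[j - 1] stores theta_hat_{j,n}, so the sum runs over array
    indices [sqrt(n)], ..., n - 1.
    """
    lo = math.isqrt(n)                      # integer part of sqrt(n)
    return float(np.sum(np.asarray(theta_hat[lo:n], dtype=float) ** 2))
```

Using `math.isqrt` avoids floating-point rounding in $[\sqrt n]$ when $n$ is a perfect square.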
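Proposition 3.3. Suppose that the conditions (C1) and (C2) hold for the model (1.1) and $S(\cdot)$ is a continuously differentiable function such that
$$|\dot S|_1=\int_0^1|\dot S(t)|\,dt<+\infty. \tag{3.21}$$
Then, for any $n\ge2$,
$$\mathcal R_Q(\widehat\sigma_n,\varsigma_Q)\le\frac{\kappa_n(Q,S)}{\sqrt n}, \tag{3.22}$$
where
$$\kappa_n(Q,S)=4|\dot S|_1^2\Big(1+\frac{\sqrt{\sigma_Q}}{n^{1/4}}\Big)+\varsigma_Q+\sqrt{L_{2,n}(Q)}+\frac{L_{1,n}(Q)}{n^{1/2}}.$$
This assertion is a direct consequence of Proposition 4 in [18].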
Propositions 3.1 and 3.3 allow one to obtain the following non-asymptotic oracle inequality.
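Theorem 3.4. Let the distribution family $\mathcal Q_n$ be as in Theorem 3.2 with unknown $\varsigma_Q$, and let $S$ in (1.1) satisfy (3.21). Then, for any $n\ge1$ and $0<\rho<1/3$, the model selection procedure (3.15), (3.20) satisfies the oracle inequality
$$\mathcal R^*_n(\widehat S_*,S)\le\frac{1+3\rho-2\rho^2}{1-3\rho}\min_{\gamma\in\Gamma}\mathcal R^*_n(\widehat S_\gamma,S)+\frac1n\,\mathcal B^*_1(n,\rho), \tag{3.23}$$
where
$$\mathcal B^*_1(n,\rho)=\frac{2\varsigma^*_n\sigma^*_n\nu+2\varsigma^*_nL^*_{1,n}+\nu L^*_{2,n}}{\varsigma_*\,\rho(1-3\rho)}+\frac{6\mu\,\kappa^*_n(S)}{(1-3\rho)\sqrt n}$$
and
$$\kappa^*_n(S)=4|\dot S|_1^2\Big(1+\frac{\sqrt{\sigma^*_n}}{n^{1/4}}\Big)+\varsigma^*_n+\sqrt{L^*_{2,n}}+\frac{L^*_{1,n}}{n^{1/2}}.$$
Moreover, for any $\delta>0$,
$$\lim_{n\to\infty}\frac{\mathcal B^*_1(n,\rho)}{n^\delta}=0.$$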
Now we obtain the oracle inequalities for the model (1.1) with the noises introduced in Section 2. We will need the following parameter:
$$M^*=116\,\varrho_{\max}^2+\frac{33\,\varrho_{\max}^2\,\mathbf EY_1^4}{\lambda_*}. \tag{3.24}$$
Theorem 3.5. Let $\mathcal Q_n$ be the distribution family for the Ornstein–Uhlenbeck process (2.1) with parameters meeting (2.5). Then, for any $n\ge1$ and $0<\rho<1/3$, the estimator (3.15) satisfies the oracle inequality (3.23) with the parameters $\sigma^*_n=3\varrho_{\max}$, $\varsigma^*_n=\varrho_{\max}$, $\varsigma_*=\varrho_{\min}$, $L^*_{1,n}=2(4a_{\max}^2+15a_{\max}+2)\varrho_{\max}$ and $L^*_{2,n}=82M^*$.
The proof of this theorem is given in Section 7.
Remark 3.1. Note that the oracle inequality (3.23) for the model (1.1)–(2.1) holds uniformly over the stability region of the process (2.1), including its boundary, i.e. the case $a=0$.
When considering the estimation problem for the model (1.1) with the martingale noise (2.7), we will use two sequences
$$l_{1,n}=2(1+\lambda_*)\varrho_*\ln(n+1)\qquad\text{and}\qquad l_{2,n}=(1+\lambda_*)\big(\varrho_{\max}(0)+2\varrho_*\ln(n+1)\big) \tag{3.25}$$
with the constants defined in (2.8), (2.10).
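Theorem 3.6. Let $\mathcal Q_n$ be the family of distributions of the process (2.7) with parameters meeting (2.8)–(2.10). Then, for any $n\ge1$ and $0<\rho<1/3$, the estimator (3.15) satisfies the oracle inequality (3.23) with $\sigma^*_n=\varrho^*_{\max,n}$, $\varsigma_*=\varrho_{\min}$,
$$\varsigma^*_n=\frac1n\int_0^n\varrho_{\max}(u)\,du,\qquad L^*_{1,n}=\varrho_*+\varrho^*/2$$
and
$$L^*_{2,n}=\frac{l_{1,n}^2}{n}+2l_{2,n}^2+4\lambda_*\,\mathbf EY_1^4\,\big(\varrho^*_{\max,n}\big)^4.$$
The proof of this theorem is given in Section 7.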
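Remark 3.2. If in the model (2.7) $\lim_{t\to\infty}\varrho_{\max}(t)=\infty$, then $\varsigma^*_n\approx\varrho_{\max}(n)\to\infty$ as $n\to\infty$ and, by virtue of the condition (2.9), $\lim_{n\to\infty}\varsigma^*_n/n^\delta=0$ for each $\delta>0$.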
3.2. Specification of weights in the model selection procedure (3.15)
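We specify the weight coefficients $(\gamma(j))_{j\ge1}$ in the way proposed in [11] for a heteroscedastic regression model in discrete time. Consider a numerical grid of the form
$$\mathcal A_n=\{1,\dots,k^*\}\times\{t_1,\dots,t_m\}, \tag{3.26}$$
where $t_i=i\varepsilon$ and $m=[1/\varepsilon^2]$. Both parameters $k^*\ge1$ and $0<\varepsilon\le1$ are assumed to be functions of $n$, i.e. $k^*=k^*(n)$ and $\varepsilon=\varepsilon(n)$, such that for any $\delta>0$
$$\lim_{n\to\infty}k^*(n)=+\infty,\qquad \lim_{n\to\infty}\frac{k^*(n)}{\ln n}=0,$$
$$\lim_{n\to\infty}\varsigma^*_n\,\varepsilon(n)=0\qquad\text{and}\qquad \lim_{n\to\infty}n^\delta\varepsilon(n)=+\infty, \tag{3.27}$$
where $\varsigma^*_n$ is the least upper bound of the noise variance proxy defined in (3.17). One can take, for example,
$$\varepsilon(n)=\frac{1}{\ln(n+1)}\qquad\text{and}\qquad k^*(n)=\sqrt{\ln(n+1)}.$$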
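For each $\alpha=(\beta,t)\in\mathcal A_n$ we introduce the weight sequence $\gamma_\alpha=(\gamma_\alpha(j))_{j\ge1}$ as
$$\gamma_\alpha(j)=\mathbf 1_{\{1\le j\le j_0\}}+\big(1-(j/\omega_\alpha)^\beta\big)\mathbf 1_{\{j_0<j\le\omega_\alpha\}}, \tag{3.28}$$
where $j_0=j_0(\alpha)=[\omega_\alpha/\ln(n+1)]$,
$$\omega_\alpha=(\tau_\beta\,t\,n)^{1/(2\beta+1)}\qquad\text{and}\qquad \tau_\beta=\frac{(\beta+1)(2\beta+1)}{\pi^{2\beta}\beta}.$$
We set
$$\Gamma=\{\gamma_\alpha,\ \alpha\in\mathcal A_n\}. \tag{3.29}$$
Note that in this case $\nu=k^*m$.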
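The weights (3.28) and the family (3.29) can be generated directly. A Python sketch under the following assumptions: weights are returned only for $j=1,\dots,$ `length` (all later weights vanish once `length` exceeds $\omega_\alpha$), the integer part $[\cdot]$ is implemented with `math.floor`, $k^*$ is passed as an integer, and the function names are hypothetical.

```python
import math

import numpy as np


def pinsker_weights(beta, t, n, length):
    """Weights gamma_alpha(j) of (3.28) for alpha = (beta, t), for j = 1..length."""
    tau = (beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
    omega = (tau * t * n) ** (1.0 / (2 * beta + 1))
    j0 = math.floor(omega / math.log(n + 1))
    j = np.arange(1, length + 1, dtype=float)
    w = np.where(j <= j0, 1.0, 0.0)                 # flat part: weight 1 up to j0
    taper = (j > j0) & (j <= omega)                 # Pinsker taper on (j0, omega]
    w[taper] = 1.0 - (j[taper] / omega) ** beta
    return w                                        # zero for j > omega


def weight_family(n, k_star, eps, length):
    """Family Gamma of (3.29) over the grid (3.26): beta = 1..k_star, t_i = i*eps, i <= [1/eps^2]."""
    m = math.floor(1.0 / eps**2)
    return [pinsker_weights(beta, i * eps, n, length)
            for beta in range(1, k_star + 1)
            for i in range(1, m + 1)]
```

Each returned vector is non-increasing in $j$ and satisfies (3.7), and the family has $\nu=k^*m$ members, as noted above.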
Remark 3.3. The specific form of weights (3.28) was proposed by Pinsker [25] for the filtration problem with known smoothness of the regression function observed with an additive Gaussian white noise in continuous time. Nussbaum [24] used these weights for the Gaussian regression estimation problem in discrete time.
The minimal mean square risk, called the Pinsker constant, is provided by the weighted least squares estimate with the weights $\gamma_\alpha$, where the index $\alpha$ depends on the smoothness order of the function $S$. If the smoothness order is unknown, one has to use, instead of one estimate, a whole family of estimates containing, in particular, the optimal one.
In this case the problem is to study the properties of the whole class of estimates. Below we derive an oracle inequality for this class which yields the best mean square risk up to multiplicative and additive constants when the smoothness of the unknown function $S$ is not available. Moreover, it will be shown that the multiplicative constant tends to unity and the additive one vanishes as $n\to\infty$ at a rate faster than any minimax rate.
In view of the assumptions (3.27), for any $\delta>0$ one has
$$\lim_{n\to\infty}\frac{\nu}{n^\delta}=0.$$
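Moreover, by (3.28), for any $\alpha\in\mathcal A_n$
$$\sum_{j=1}^\infty\mathbf 1_{\{\gamma_\alpha(j)>0\}}\le\omega_\alpha.$$
Therefore, taking into account that $\tau_\beta\le\tau_1=6/\pi^2<1$ for $\beta\ge1$ and that $t\le m\varepsilon\le1/\varepsilon$ for every $\alpha=(\beta,t)\in\mathcal A_n$, we get
$$\mu=\mu_n\le(n/\varepsilon)^{1/3}$$
and, for any $\delta>0$,
$$\lim_{n\to\infty}\frac{\mu_n}{n^{1/3+\delta}}=0.$$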
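To study the asymptotic behaviour of the term $\mathcal B^*_1(n,\rho)$ we assume that the parameter $\rho$ in the cost function (3.11) depends on $n$, i.e. $\rho=\rho_n$, such that $\rho_n\to0$ as $n\to\infty$ and, for any $\delta>0$,
$$\lim_{n\to\infty}n^\delta\rho_n=+\infty. \tag{3.30}$$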
Applying this limiting relation in the analysis of the additive term $\mathcal B^*_1(n,\rho)$ in (3.23) yields the following result.
Theorem 3.7. Assume that the distribution family $\mathcal Q_n$ satisfies the condition (H0) and that the unknown function $S$ is continuously differentiable and satisfies the condition (3.21). Then, for any $n\ge1$, the model selection procedure (3.15), (3.30), (3.20), (3.29) satisfies the oracle inequality (3.23) with the additive term $\mathcal B^*_1(n,\rho)$ obeying, for any $\delta>0$, the limiting relation
$$\lim_{n\to\infty}\frac{\mathcal B^*_1(n,\rho_n)}{n^\delta}=0.$$
4. Stochastic integrals with respect to the process (2.1)
In this section we establish some properties of the stochastic integral
$$I_t(f)=\int_0^tf(s)\,d\xi_s,\qquad 0\le t\le n, \tag{4.1}$$
with respect to the process (2.1). We need some notation. Let us denote
$$\varepsilon_f(t)=a\int_0^te^{a(t-v)}f(v)\big(1+e^{2av}\big)\,dv, \tag{4.2}$$
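where $f$ is a $[0,+\infty)\to\mathbb R$ function integrable on any finite interval. We also introduce the following transformation
$$\tau_{f,g}(t)=\frac12\int_0^t\big(2f(s)g(s)+\varepsilon^*_{f,g}(s)\big)\,ds \tag{4.3}$$
of square integrable $[0,+\infty)\to\mathbb R$ functions $f$ and $g$. Here $\varepsilon^*_{f,g}(t)=f(t)\varepsilon_g(t)+\varepsilon_f(t)g(t)$.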
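Note that
$$a\,\tau_{f,1}(t)=\frac12\,\varepsilon_f(t)\qquad\text{and}\qquad a\,\tau_{1,1}(t)=\frac12\big(e^{2at}-1\big). \tag{4.4}$$
Indeed, (4.2) gives $\varepsilon_1(t)=e^{2at}-1$, so both sides of the first identity vanish at $t=0$ and have the same derivative $\frac a2\big(f(t)(1+e^{2at})+\varepsilon_f(t)\big)$; the second identity is the first one with $f=1$.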
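Moreover, we set
$$\tau^*_{f,g}(t)=\tau_{f,g}(t)+f(t)\tau_{1,g}(t)+g(t)\tau_{f,1}(t)+f(t)g(t)\tau_{1,1}(t). \tag{4.5}$$
By (4.4), for $a\ne0$ this function can be rewritten as
$$\tau^*_{f,g}(t)=\tau_{f,g}(t)+\frac{\varepsilon^*_{f,g}(t)+f(t)g(t)\big(e^{2at}-1\big)}{2a}.$$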
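The identities (4.4) can also be checked numerically. The sketch below uses plain NumPy with cumulative trapezoidal quadrature on a uniform grid; the grid, the value $a=-0.7$ and the test function $f(t)=\cos(2\pi t)$ are arbitrary illustrative choices, and the helper names are hypothetical. It evaluates (4.2) and (4.3) and reports the largest deviation from each identity in (4.4).

```python
import numpy as np


def cumtrapz(y, x):
    """Cumulative trapezoidal integral of y over the grid x, equal to 0 at x[0]."""
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return out


def eps_f(f, t, a):
    """epsilon_f of (4.2) on the grid t, written as a e^{at} int_0^t e^{-av} f(v)(1 + e^{2av}) dv."""
    return a * np.exp(a * t) * cumtrapz(np.exp(-a * t) * f * (1.0 + np.exp(2.0 * a * t)), t)


def tau(f, g, t, a):
    """tau_{f,g} of (4.3), with eps*_{f,g} = f eps_g + eps_f g."""
    eps_star = f * eps_f(g, t, a) + eps_f(f, t, a) * g
    return 0.5 * cumtrapz(2.0 * f * g + eps_star, t)


a = -0.7                                   # any a < 0 (illustrative)
t = np.linspace(0.0, 3.0, 30001)
one = np.ones_like(t)
f = np.cos(2.0 * np.pi * t)                # an arbitrary bounded test function

# (4.4): a tau_{1,1}(t) = (e^{2at} - 1)/2 and a tau_{f,1}(t) = eps_f(t)/2
err_11 = np.max(np.abs(a * tau(one, one, t, a) - 0.5 * (np.exp(2.0 * a * t) - 1.0)))
err_f1 = np.max(np.abs(a * tau(f, one, t, a) - 0.5 * eps_f(f, t, a)))
```

Both errors are of the order of the quadrature error, i.e. they shrink quadratically as the grid is refined.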
Proposition 4.1. If $f$ and $g$ are from $\mathcal L_2[0,n]$, then
$$\mathbf E\,I_t(f)I_t(g)=\varrho\,\tau_{f,g}(t), \tag{4.6}$$
where $\varrho=\varrho_1^2+\lambda\varrho_2^2$.
Proof. Noting that the process $I_t(f)$ satisfies the stochastic equation
$$dI_t(f)=af(t)\xi_t\,dt+f(t)\,du_t,\qquad I_0(f)=0,$$
and applying the Itô formula (see, for example, [21]), one obtains
$$I_t(f)I_t(g)=\int_0^t\Big(\varrho_1^2f(s)g(s)+a\big(f(s)\zeta_s(g)+g(s)\zeta_s(f)\big)\Big)ds+\varrho_2^2\sum_{l\ge1}f(T_l)g(T_l)Y_l^2\mathbf 1_{\{T_l\le t\}}+\int_0^t\Upsilon_{s-}(f,g)\,du_s, \tag{4.7}$$
where $\zeta_s(f)=I_s(f)\xi_s$ and $\Upsilon_s(f,g)=f(s)I_s(g)+g(s)I_s(f)$. This yields
$$\mathbf E\,I_t(f)I_t(g)=a\int_0^t\big(f(s)Z_s(g)+g(s)Z_s(f)\big)ds+\varrho\int_0^tf(s)g(s)\,ds, \tag{4.8}$$
where $Z_s(f)=\mathbf E\zeta_s(f)$. Putting here $g=1$ and taking into account that $\mathbf E\xi_s^2=\varrho(e^{2as}-1)/(2a)$, we obtain $Z_s(f)=\varrho\,\varepsilon_f(s)/(2a)$. This immediately implies (4.6), which proves Proposition 4.1.
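With $f=g=1$, Proposition 4.1 and (4.4) give $\mathbf E\xi_t^2=\varrho\,\tau_{1,1}(t)=\varrho\,(e^{2at}-1)/(2a)$. A Monte Carlo sanity check in Python, assuming a vectorised Euler scheme for (2.1), standard normal jump sizes (one law satisfying (2.3)) and illustrative parameter values; the function name is hypothetical, and the Euler bias is of order `dt`.

```python
import numpy as np


def xi_terminal(n_paths, T, a, rho1, rho2, lam, dt, rng):
    """Euler scheme for (2.1) on n_paths independent paths; returns xi_T for each path.

    With N(0, 1) jump sizes, the sum of k jumps in one step is N(0, k), which
    keeps the update vectorised across paths.
    """
    xi = np.zeros(n_paths)
    for _ in range(int(round(T / dt))):
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        k = rng.poisson(lam * dt, n_paths)              # jumps in this step
        dz = np.sqrt(k) * rng.normal(0.0, 1.0, n_paths)
        xi = xi + a * xi * dt + rho1 * dw + rho2 * dz
    return xi


a, rho1, rho2, lam, T = -1.0, 1.0, 0.5, 2.0, 1.0
rho = rho1**2 + lam * rho2**2                           # varrho in (4.6)
xi_T = xi_terminal(20000, T, a, rho1, rho2, lam, 1e-2, np.random.default_rng(3))
mc_second_moment = np.mean(xi_T**2)
theory = rho * (np.exp(2.0 * a * T) - 1.0) / (2.0 * a)  # varrho * tau_{1,1}(T)
```

Note that only the combination $\varrho=\varrho_1^2+\lambda\varrho_2^2$ enters the second moment, which is why (2.5) constrains just this quantity.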
Further, for integrable $[0,+\infty)\to\mathbb R$ functions $f$ and $g$, we define the $[0,+\infty)\times[0,+\infty)\to\mathbb R$ function
$$D_{f,g}(x,z)=\int_0^xL^*_{f,g}(y,z)\,dy+f(z)g(z), \tag{4.9}$$
where $L^*_{f,g}(y,z)=g(y+z)L_f(y,z)+f(y+z)L_g(y,z)$ and
$$L_f(x,z)=a\,e^{ax}\Big(f(z)+a\int_0^xe^{av}f(v+z)\,dv\Big).$$
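Proposition 4.2. Let $\mathcal G=\sigma\{T_k,\ k\ge1\}$ be the $\sigma$-algebra generated by the stopping times (2.4), and let $f$ and $g$ be bounded left-continuous $[0,\infty)\times\Omega\to\mathbb R$ functions measurable with respect to $\mathcal B[0,+\infty)\otimes\mathcal G$ (the product $\sigma$-algebra created by $\mathcal B[0,+\infty)$ and $\mathcal G$). Then, for any $k\ge1$,
$$\mathbf E\big(I_{T_k-}(f)\,|\,\mathcal G\big)=0$$
and
$$\mathbf E\big(I_{T_k-}(f)I_{T_k-}(g)\,|\,\mathcal G\big)=\varrho_1^2\,\tau_{f,g}(T_k)+\varrho_2^2\sum_{l=1}^{k-1}D_{f,g}(T_k-T_l,T_l).$$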
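Proof. Taking the conditional expectation $\mathbf E(\cdot\,|\,\mathcal G)$ in (4.7) yields
$$\mathbf E\big(I_t(f)I_t(g)\,|\,\mathcal G\big)=\varrho_1^2\int_0^tf(s)g(s)\,ds+\varrho_2^2\sum_{l\ge1}f(T_l)g(T_l)\mathbf 1_{\{T_l\le t\}}+a\int_0^t\big(f(s)Z_g(s)+g(s)Z_f(s)\big)ds,$$
where $Z_f(s)=\mathbf E(I_s(f)\xi_s\,|\,\mathcal G)$. By direct calculation we find
$$Z_f(t)=a\int_0^tf(s)\,\mathbf E(\xi_t\xi_s\,|\,\mathcal G)\,ds+\varrho_1^2\int_0^te^{a(t-s)}f(s)\,ds+\varrho_2^2\sum_{l\ge1}f(T_l)e^{a(t-T_l)}\mathbf 1_{\{T_l\le t\}}.$$
Taking into account here that for any $0\le s\le t$
$$\mathbf E(\xi_t\xi_s\,|\,\mathcal G)=e^{a(t-s)}\Big(\frac{\varrho_1^2}{2a}\big(e^{2as}-1\big)+\varrho_2^2\sum_{l\ge1}e^{2a(s-T_l)}\mathbf 1_{\{T_l\le s\}}\Big),$$
one obtains
$$aZ_f(t)=\frac{\varrho_1^2}{2}\,\varepsilon_f(t)+\varrho_2^2\sum_{j\ge1}L_f(t-T_j,T_j)\mathbf 1_{\{T_j\le t\}}.$$
From here one comes to the desired equality.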
Proposition 4.3. Let $F$, $f$ and $g$ be nonrandom bounded left-continuous $[0,\infty)\to\mathbb R$ functions. Then
$$\mathbf E\sum_{k\ge1}F(T_k)I_{T_k-}(f)I_{T_k-}(g)\mathbf 1_{\{T_k\le t\}}=\lambda\varrho\int_0^tF(v)\tau_{f,g}(v)\,dv.$$
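Proof. We set $\iota(t)=\mathbf E\sum_{k\ge1}F(T_k)I_{T_k-}(f)I_{T_k-}(g)\mathbf 1_{\{T_k\le t\}}$. Applying Proposition 4.2 one gets
$$\iota(t)=\varrho_1^2\,\mathbf E\sum_{k\ge1}F(T_k)\tau_{f,g}(T_k)\mathbf 1_{\{T_k\le t\}}+\varrho_2^2\,\mathbf E\sum_{k\ge1}F(T_k)\sum_{l=1}^{k-1}D_{f,g}(T_k-T_l,T_l)\mathbf 1_{\{T_k\le t\}}:=\varrho_1^2\,\iota_1(t)+\varrho_2^2\,\iota_2(t),$$
where
$$\iota_1(t)=\lambda\int_0^t\sum_{l\ge1}F(z)\tau_{f,g}(z)\frac{(\lambda z)^{l-1}}{(l-1)!}e^{-\lambda z}\,dz=\lambda\int_0^tF(z)\tau_{f,g}(z)\,dz.$$
To calculate $\iota_2(t)$ we note that
$$\iota_2(t)=\mathbf E\sum_{l\ge1}\mathbf 1_{\{T_l\le t\}}\sum_{k\ge l+1}F(T_k)D_{f,g}(T_k-T_l,T_l)\mathbf 1_{\{T_k\le t\}}.$$
Taking into account that $T_k-T_l$ is independent of $T_l$ for any $k>l$, we obtain
$$\begin{aligned}\iota_2(t)&=\lambda\,\mathbf E\sum_{l\ge1}\mathbf 1_{\{T_l\le t\}}\int_0^{t-T_l}\sum_{k\ge l+1}F(z+T_l)D_{f,g}(z,T_l)\frac{(\lambda z)^{k-l-1}}{(k-l-1)!}e^{-\lambda z}\,dz\\&=\lambda\,\mathbf E\sum_{l\ge1}\mathbf 1_{\{T_l\le t\}}\int_0^{t-T_l}F(z+T_l)D_{f,g}(z,T_l)\,dz\\&=\lambda^2\int_0^t\int_0^{t-x}F(z+x)D_{f,g}(z,x)\,dz\,dx=\lambda^2\int_0^tF(z)\tau_{f,g}(z)\,dz.\end{aligned}$$
This proves Proposition 4.3.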
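The computation of $\iota_1(t)$ above rests on the fact that the densities of the arrival times $T_l$ sum to $\lambda$, i.e. on Campbell's formula $\mathbf E\sum_kF(T_k)\mathbf 1_{\{T_k\le t\}}=\lambda\int_0^tF(z)\,dz$. A quick Monte Carlo check in Python, using the standard fact that, given $N_t=k$, the arrival times on $[0,t]$ are distributed as $k$ ordered i.i.d. uniforms (the order is irrelevant for the sum); the function name, the test functions and the parameter values are illustrative.

```python
import numpy as np


def campbell_mc(F, lam, t, n_paths, rng):
    """Monte Carlo estimate of E sum_k F(T_k) 1{T_k <= t} for a Poisson process of rate lam.

    F must accept a NumPy array of times and return an array of the same shape.
    """
    total = 0.0
    for _ in range(n_paths):
        k = rng.poisson(lam * t)                    # N_t
        arrivals = rng.uniform(0.0, t, size=k)      # unordered arrival times given N_t = k
        total += float(np.sum(F(arrivals)))
    return total / n_paths


est = campbell_mc(lambda z: z, lam=2.0, t=3.0, n_paths=20000, rng=np.random.default_rng(4))
exact = 2.0 * 3.0**2 / 2.0                          # lam * int_0^3 z dz = 9
```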
Now we set
$$\widetilde I_t(f)=I_t^2(f)-\mathbf E I_t^2(f). \tag{4.10}$$
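Further, we need the following correlation measures for two integrable $[0,+\infty)\to\mathbb R$ functions $f$ and $g$:
$$\varpi_{f,g}=\max_{0\le v\le n}\ \max_{0\le t\le n-v}\Big|\int_0^tf(u+v)g(u)\,du\Big| \tag{4.11}$$
and
$$\varpi^*_{f,g}=\max\big(\varpi_{f,g},\varpi_{g,f}\big). \tag{4.12}$$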
For any bounded $[0,\infty)\to\mathbb R$ function $f$ we introduce the uniform norm
$$\|f\|_{*,n}=\sup_{0\le t\le n}|f(t)|.$$
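To check the condition (C2) we need the following non-asymptotic upper bound.
Theorem 4.4. For any left-continuous $[0,\infty)\to\mathbb R$ functions $f$, $g$ with $\|f\|_{*,n}\le1$ and $\|g\|_{*,n}\le1$,
$$\mathbf E\,\widetilde I_n(f)\widetilde I_n(g)\le nM_Q\big(1+\varpi^*_{f,g}+\varpi^*_{f,1}+\varpi^*_{1,g}\big), \tag{4.13}$$
where $M_Q=116\,\varrho^2+33\,\lambda\varrho_2^4\,\mathbf EY_1^4$.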
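Proof. Taking in (4.7)–(4.8) $g=f$ and $V_t(f)=\zeta_t(f)-Z_t(f)$, one comes to the following stochastic equation
$$d\widetilde I_t(f)=2aV_t(f)f(t)\,dt+dM_t(f),\qquad \widetilde I_0(f)=0, \tag{4.14}$$
where $M_t(f)=2\int_0^tI_{s-}(f)f(s)\,du_s+\varrho_2^2\int_0^tf^2(s)\,dm_s$ and
$$m_t=\sum_{0\le s\le t}(\Delta z_s)^2-\lambda t.$$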