Adaptive estimation of the dynamics of a discrete time stochastic volatility model

F. Comte(1), C. Lacour(1), Y. Rozenholc(1)

(1) Université Paris Descartes, MAP5, UMR CNRS 8145.

fabienne.comte@univ-paris5.fr, claire.lacour@univ-paris5.fr, yves.rozenholc@univ-paris5.fr.

Summary. This paper is concerned with the particular hidden model: $X_{i+1} = b(X_i) + \sigma(X_i)\xi_{i+1}$, $Z_i = X_i + \varepsilon_i$, where $(\xi_i)$ and $(\varepsilon_i)$ are independent sequences of i.i.d. noise. Moreover, the sequences $(X_i)$ and $(\varepsilon_i)$ are independent and the distribution of $\varepsilon$ is known. Our aim is to estimate the functions $b$ and $\sigma^2$ when only observations $Z_1, \dots, Z_n$ are available. We propose to estimate $bf$ and $(b^2+\sigma^2)f$, and study the integrated mean square error of projection estimators of these functions on automatically selected projection spaces. By a ratio strategy, estimators of $b$ and $\sigma^2$ are then deduced. The mean square risks of the resulting estimators are studied and their rates are discussed. Lastly, simulation experiments are provided: the constants in the penalty functions defining the estimators are calibrated and the quality of the estimators is checked on several examples.

Keywords. Adaptive Estimation; Autoregression; Deconvolution; Heteroscedastic; Hidden Markov Model; Nonparametric Projection Estimator.

1. Introduction

When price processes of assets are observed, generally in discrete time, the dynamics of the unobserved underlying volatility process proves very interesting. Therefore, the so-called discrete time stochastic volatility model has recently become very popular and has been widely studied; see Ghysels et al. (1996) or Shephard (1996). In this paper, we propose a statistical strategy for the following model:

$$Y_i = \exp(X_i/2)\,\eta_i, \qquad X_{i+1} = b(X_i) + \sigma(X_i)\xi_{i+1},$$

where $(\eta_i)$ and $(\xi_i)$ are two independent sequences of independent and identically distributed (i.i.d.) random variables (noise processes). The only available observations are $Y_1, \dots, Y_n$; the process of interest is the unobserved volatility $V_i = \exp(X_i/2)$. We describe an estimation method leading to nonparametric estimates of the functions $b$ and $\sigma^2$ driving the dynamics of the volatility process $(V_i)$.

To achieve this aim, we use a deconvolution strategy, which is made possible by rewriting the model as follows:

$$Z_i = X_i + \varepsilon_i, \qquad X_{i+1} = b(X_i) + \sigma(X_i)\xi_{i+1}, \qquad (1)$$

where $\varepsilon_i = \ln(\eta_i^2) - \mathbb{E}(\ln(\eta_i^2))$ and $Z_i = \ln(Y_i^2) - \mathbb{E}(\ln(\eta_i^2))$.

In such a setting, regarding the identifiability of the model, it must be assumed that the distribution $f_\varepsilon$ of $\varepsilon$ (or equivalently, the distribution of $\eta$) is fully known. For instance, the process $\eta$ is often modelled as a standard Gaussian i.i.d. sequence, and then $\varepsilon_i$ has the distribution of $\ln(\mathcal{N}(0,1)^2) + \ln(2) + C$ where $C$ is the Euler constant. Van Es et al. (2005) specifically study this case from the point of view of density estimation; however, more general distributions can also be considered (see Comte et al. (2006b)).
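
For concreteness, the log-transform in (1) is easy to reproduce numerically. The following minimal Python sketch (ours, not part of the paper; the pair $(b_1, \sigma_1)$ is borrowed from (17) in Section 5.2 and $\eta$ is taken standard Gaussian, so that $\mathbb{E}(\ln\eta^2) = -C - \ln 2$) simulates the model and forms the observations $Z_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 800

# Drift and volatility borrowed from (b1, sigma1) in (17), Section 5.2.
b = lambda x: 0.25 * np.sin(2 * np.pi * x + np.pi / 3)
sigma = lambda x: 0.4 * (0.31 + 0.7 * np.exp(-5 * x**2))

# Hidden chain X and observed prices Y = exp(X/2) * eta.
X = np.zeros(n + 1)
xi = rng.standard_normal(n + 1)
for i in range(n):
    X[i + 1] = b(X[i]) + sigma(X[i]) * xi[i + 1]
eta = rng.standard_normal(n + 1)
Y = np.exp(X / 2) * eta

# For Gaussian eta, E[ln(eta^2)] = -C - ln 2 (C the Euler constant), so
# Z_i = ln(Y_i^2) + C + ln 2 satisfies Z_i = X_i + eps_i with centered eps_i.
C = 0.5772156649015329
Z = np.log(Y**2) + C + np.log(2.0)
```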

Model (1) may be considered as a non-linear autoregressive model observed with an additive noise with known (but general) distribution. In this case, the process is sometimes called an autoregression with errors-in-variables. Such models have already been studied, but in a parametric or semi-parametric context only (see Chanda (1995), Comte and Taupin (2001)).

Lastly, Model (1) belongs to the general class of hidden Markov models (HMM). These models constitute a very popular class of discrete time processes with applications in various areas (see Cappé, Moulines and Rydén (2005)). Our model is simpler in the sense that the noise is additive, whereas in standard HMMs it is assumed that the joint density of $(X_i, Z_i)$ has a parametric form.

To our knowledge, the question of estimating $b$ and $\sigma^2$ in Model (1) on the basis of observations $Z_1, \dots, Z_n$ has not been studied yet. Only the following regression model has received attention: $Z_i = X_i + \varepsilon_i$, $Y_i = b(X_i) + \xi_i$, in which $(Y_i)$ and $(Z_i)$ for $i = 1, \dots, n+1$ are observed. In that setting two processes are observed, all sequences $(X_i)$, $(\xi_i)$, $(\varepsilon_i)$ can be supposed i.i.d. and independent from each other, and $(Y_i)$ is homoscedastic ($\sigma(x) \equiv 1$). In this context, Fan and Truong (1990) and Comte and Taupin (2007) study the problem of the estimation of $b$. See also Fan et al. (1991), Fan and Masry (1992), Ioannides and Alevizos (1997), Koo and Lee (1998). Most authors propose estimators of $b$ based on the ratio of two estimators, and this quotient strategy is also adopted in our more general setting. More precisely, we assume that the process $(X_i)$ is stationary, with stationary density denoted by $f$, and we estimate $b$ (resp. $b^2+\sigma^2$) as the ratio of an estimator of $bf$ (resp. $(b^2+\sigma^2)f$) divided by an estimator of $f$.

Several papers develop estimation methods for $f$: see Fan (1991), Pensky and Vidakovic (1999), Comte et al. (2006b); the optimality of the rates is studied in Fan (1991), Butucea (2004) and Butucea and Tsybakov (2007). The adaptive estimator of Comte et al. (2006b) is used in this study. We adopt the same type of projection strategy on automatically selected projection spaces for the numerators. In this respect, this allows us to consider general classes of noise density $f_\varepsilon$ and also various regularity classes for the functions to estimate ($bf$, $(b^2+\sigma^2)f$, $f$).

The proofs of our results involve the study of several centered empirical processes and are interesting more for their general schemes than for their technical details. It is nevertheless worth mentioning that we obtain in the end flexible tools that work in a satisfactory way. The programmes developed for density deconvolution in Comte et al. (2007) can indeed be generalized to the present framework.

The plan of the paper is the following: we first give the notations and the model assumptions and describe the projection spaces in Section 2. Next, Section 3 explains the estimation strategy for $b$ and gives bounds on the integrated mean square risk of the estimators. Section 4 develops the same study for the estimation of $\sigma^2$. Simulation experiments are conducted in Section 5 in order to illustrate the method and compare its performance with previous results. Lastly, proofs are gathered in Sections 6 to 8, and an appendix (Section 9) describes auxiliary tools.

2. General setting and assumptions

2.1. The principle

Let us assume that the sequence $(X_i)$ is stationary and let us denote by $f$ the stationary density of the $X_i$'s.

The principle of the estimation method relies in all cases on a "Nadaraya-Watson strategy", in the sense that $b$ or $b^2+\sigma^2$ is estimated as the ratio of an estimator of $\ell = bf$ (respectively $\vartheta = (b^2+\sigma^2)f$) and an estimator of $f$.

In all cases, we use the adaptive estimator of $f$ described in Comte et al. (2006b, 2007), which study independent and $\beta$-mixing contexts.

2.2. Notations and Assumptions

Subsequently we denote by $u^*$ the Fourier transform of the function $u$, defined as $u^*(t) = \int e^{itx}u(x)\,dx$, and by $\|u\|$, $\|u\|_\infty$, $\|u\|_{\infty,K}$, $\langle u, v\rangle$, $u*v$ the quantities

$$\|u\|^2 = \int u^2(x)\,dx, \qquad \|u\|_\infty = \sup_{x\in\mathbb{R}}|u(x)|, \qquad \|u\|_{\infty,K} = \sup_{x\in K}|u(x)|,$$

$$\langle u, v\rangle = \int u(x)\overline{v(x)}\,dx \ \text{ with } z\bar z = |z|^2, \qquad u*v(x) = \int u(t)\bar v(x-t)\,dt.$$

Moreover, we recall that for any integrable and square-integrable functions $u, u_1, u_2$,

$$(u^*)^*(x) = 2\pi u(-x) \quad\text{and}\quad \langle u_1, u_2\rangle = (2\pi)^{-1}\langle u_1^*, u_2^*\rangle. \qquad (2)$$

We consider the autoregressive model (1). The assumptions are the following:

A1 (i) The $\varepsilon_i$'s are i.i.d. centered ($\mathbb{E}(\varepsilon_1) = 0$) random variables with finite variance, $\mathbb{E}(\varepsilon_1^2) = s_\varepsilon^2$. The density $f_\varepsilon$ of $\varepsilon_1$ belongs to $L^2(\mathbb{R})$, and for all $x\in\mathbb{R}$, $f_\varepsilon^*(x)\neq 0$.

(ii) The $\xi_i$'s are i.i.d. centered with unit variance ($\mathbb{E}(\xi_1^2) = 1$) and $\mathbb{E}(\xi_1^3) = 0$.

A2 The $X_i$'s are stationary and absolutely regular.

A3 The sequences $(X_i)_{i\in\mathbb{N}}$ and $(\varepsilon_i)_{i\in\mathbb{N}}$ are independent. The sequences $(\xi_i)_{i\in\mathbb{N}}$ and $(\varepsilon_i)_{i\in\mathbb{N}}$ are independent.

The $Z_i$'s are observed but the $X_i$'s are not; the stationary density $f$ of the $X_i$'s is unknown, and the density $f_\varepsilon$ of the $\varepsilon_i$'s is known.

Standard assumptions on $b$, $\sigma$ and the $\xi_i$'s ensure that the sequence $(X_i)_{i\in\mathbb{Z}}$ is stationary with stationary density denoted by $f$ and absolutely regular, with $\beta$-mixing coefficients denoted by $\beta(k)$; see Doukhan (1994) or Comte and Rozenholc (2002) for precise sets of conditions. We shall consider that the mixing is at least arithmetical with rate $\theta$, i.e. that there exists $\theta > 0$ such that

$$\forall k\in\mathbb{N}, \quad \beta(k)\le (1+k)^{-(1+\theta)}, \qquad (3)$$

or, more often, geometrical, i.e. $\exists\theta > 0, \forall k\in\mathbb{N}, \beta(k)\le e^{-\theta k}$. The definition of the $\beta$-mixing coefficients and related properties are recalled in Section 9.

Moreover, as we develop an $L^2$-strategy, we require the target functions to be square-integrable.

A4 The function to estimate ($\ell = bf$, $\vartheta = (b^2+\sigma^2)f$, or $f$) is square-integrable.

According to Assumption A3, the (unknown) density $h$ of the $Z_i$'s equals $f * f_\varepsilon$. It follows that $h^* = f^* f_\varepsilon^*$ and $f^* = h^*/f_\varepsilon^*$, a relation which explains the estimation strategy. It is well known that the rate of convergence for estimating $f$ is strongly related to the rate of decrease of $f_\varepsilon^*$. More precisely, the smoother $f_\varepsilon$, the slower the rate of convergence for estimating $f$, and we shall see that the same happens for the estimation of $bf$ or $(b^2+\sigma^2)f$. Nevertheless, this rate of convergence can be improved by assuming some additional regularity conditions on $f$, $bf$ or $(b^2+\sigma^2)f$. These regularity conditions are described by considering functions in the space

$$\mathcal{S}_{s,a,r}(A) = \left\{u : \int_{-\infty}^{+\infty}|u^*(x)|^2(x^2+1)^s\exp\{2a|x|^r\}\,dx\le A\right\}, \qquad (4)$$

for nonnegative constants $s, a, r$ and $A > 0$. When $r = 0$, this corresponds to Sobolev spaces of order $s$; when $r > 0$, $a > 0$, this corresponds to analytic functions, which are often called "supersmooth" functions.

In the following, we also assume that $f_\varepsilon$ is such that

A5 For all $t$ in $\mathbb{R}$, $A_0(t^2+1)^{-\gamma/2}\exp\{-\mu|t|^\delta\}\le |f_\varepsilon^*(t)|\le A_0'(t^2+1)^{-\gamma/2}\exp\{-\mu|t|^\delta\}$, with $\gamma > 1/2$ if $\delta = 0$.

Under Assumption A5, the errors are usually called "ordinary smooth" when $\delta = 0$, and "super smooth" when $\delta > 0$, $\mu > 0$. The standard examples are the following: Gaussian and Cauchy distributions are super smooth of orders $(\gamma = 0, \mu = 1/2, \delta = 2)$ and $(\gamma = 0, \mu = 1, \delta = 1)$ respectively, and the Laplace (symmetric exponential) distribution is ordinary smooth ($\delta = 0$) of order $\gamma = 2$. When $\varepsilon = \ln(\eta^2) - \mathbb{E}(\ln(\eta^2))$ with $\eta\sim\mathcal{N}(0,1)$ as in Van Es et al. (2005), then $\varepsilon$ is super smooth with $\gamma = 0$, $\mu = \pi/2$ and $\delta = 1$.
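
For instance, writing the Laplace density with variance $s_\varepsilon^2$ (the parametrization used later in (16)), a direct computation gives

$$f_\varepsilon(x) = \frac{1}{\sqrt 2\,s_\varepsilon}\,e^{-\sqrt 2|x|/s_\varepsilon} \quad\Longrightarrow\quad f_\varepsilon^*(t) = \frac{1}{1 + s_\varepsilon^2 t^2/2},$$

so that A5 holds with $\delta = 0$, $\mu = 0$ and $\gamma = 2$, as stated.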

2.3. The projection spaces

As projection estimators are used in all cases, we hereby provide a description of the projection spaces. Let us define

$$\varphi(x) = \frac{\sin(\pi x)}{\pi x} \qquad\text{and}\qquad \varphi_{m,j}(x) = \sqrt{m}\,\varphi(mx - j),$$

where $m$ can be replaced by $2^m$. It is well known (see Meyer (1990), p. 22) that $\{\varphi_{m,j}\}_{j\in\mathbb{Z}}$ is an orthonormal basis of the space of square integrable functions having a Fourier transform with compact support included in $[-\pi m, \pi m]$. Such a space is denoted by $S_m$:

$$S_m = \text{Span}\{\varphi_{m,j},\ j\in\mathbb{Z}\} = \{f\in L^2(\mathbb{R}),\ \text{supp}(f^*)\subset[-\pi m, \pi m]\}.$$

Moreover, $(S_m)_{m\in\mathcal{M}_n}$, with $\mathcal{M}_n = \{1,\dots,m_n\}$, denotes the collection of linear spaces.

In practice, we should consider the truncated spaces $S_m^{(n)} = \text{Span}\{\varphi_{m,j},\ j\in\mathbb{Z},\ |j|\le K_n\}$, where $K_n$ is an integer depending on $n$, and the associated estimators, under the additional assumption $\int x^2\psi^2(x)\,dx < A_\psi < \infty$, where $\psi = bf$, $(b^2+\sigma^2)f$ or $f$ is the function to estimate. This is done in Comte et al. (2006b) and does not change the main part of the study. For the sake of simplicity, in the theoretical part of the present study we write the sums over $\mathbb{Z}$.
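
The sinc basis is also easy to handle numerically. As a quick illustration (ours; in Python, where `np.sinc(x)` implements $\sin(\pi x)/(\pi x)$), one can check the orthonormality of the $\varphi_{m,j}$ by quadrature — only approximately, because of the slow decay of the sinc function:

```python
import numpy as np

def phi_mj(x, m, j):
    # phi_{m,j}(x) = sqrt(m) * phi(m x - j), with phi(x) = sin(pi x)/(pi x)
    return np.sqrt(m) * np.sinc(m * x - j)

x = np.linspace(-200, 200, 2**20)
m = 4
g00 = phi_mj(x, m, 0)
g03 = phi_mj(x, m, 3)
print(np.trapz(g00 * g00, x))  # ~ 1  (norm)
print(np.trapz(g00 * g03, x))  # ~ 0  (orthogonality)
```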

3. Estimation of b

3.1. The steps of the estimation

3.1.1. First step: the estimators of $\ell = bf$

The orthogonal projection $\ell_m$ of $\ell = bf$ on $S_m$ is given by

$$\ell_m = \sum_{j\in\mathbb{Z}} a_{m,j}(\ell)\varphi_{m,j} \quad\text{with}\quad a_{m,j}(\ell) = \int_{\mathbb{R}}\varphi_{m,j}(x)\ell(x)\,dx = \langle\varphi_{m,j}, \ell\rangle. \qquad (5)$$

For $t$ belonging to a space $S_m$ of the collection $(S_m)_{m\in\mathcal{M}_n}$, let

$$\gamma_n(t) = \frac{1}{n}\sum_{i=1}^n\left(\|t\|^2 - 2Z_{i+1}u_t(Z_i)\right), \qquad u_t(x) = \frac{1}{2\pi}\left(\frac{t^*}{f_\varepsilon^*}\right)^*(-x).$$

The following sequence of equalities, relying on the Fourier equalities (2), explains the choice of the contrast $\gamma_n$:

$$\mathbb{E}(Z_2u_t(Z_1)) = \mathbb{E}(b(X_1)u_t(Z_1)) = \langle u_t * f_\varepsilon(-\cdot), bf\rangle = \frac{1}{2\pi}\left\langle\frac{t^*}{f_\varepsilon^*(-\cdot)}f_\varepsilon^*(-\cdot), (bf)^*\right\rangle$$
$$= \frac{1}{2\pi}\langle t^*, (bf)^*\rangle = \langle t, bf\rangle = \mathbb{E}(b(X_1)t(X_1)) = \int t(x)b(x)f(x)\,dx = \langle t, \ell\rangle.$$

Therefore, we find that $\mathbb{E}(\gamma_n(t)) = \|t\|^2 - 2\langle\ell, t\rangle = \|t-\ell\|^2 - \|\ell\|^2$ is minimal when $t = \ell$. Thus, we define

$$\hat\ell_m = \arg\min_{t\in S_m}\gamma_n(t). \qquad (6)$$

The estimator can also be written as

$$\hat\ell_m = \sum_{j\in\mathbb{Z}}\hat a_{m,j}(\ell)\varphi_{m,j}, \quad\text{with}\quad \hat a_{m,j}(\ell) = \frac{1}{n}\sum_{i=1}^n Z_{i+1}u_{\varphi_{m,j}}(Z_i). \qquad (7)$$

Now, to select an adequate value of $m$, we define $\hat\ell_{\hat m}$ by setting

$$\hat m = \arg\min_{m\in\mathcal{M}_n}\left\{\gamma_n(\hat\ell_m) + \text{pen}(m)\right\},$$

where the penalty function is given by $\text{pen}(m) = \kappa\,\mathbb{E}(Z_2^2)\Psi(m)$, where

$$\Psi(m) = \begin{cases}\dfrac{\Delta(m)}{n} & \text{if } 0\le\delta<1/3,\\[2mm] \dfrac{m^{[(3\delta-1)/2]\wedge\delta}\,\Delta(m)}{n} & \text{if } \delta\ge 1/3,\end{cases} \qquad\text{and}\qquad \Delta(m) = \frac{1}{2\pi}\int_{-\pi m}^{\pi m}\frac{dx}{|f_\varepsilon^*(x)|^2}, \qquad (8)$$

where $x\wedge y := \inf(x,y)$. In practice $\mathbb{E}(Z_2^2)$ is unknown and is replaced by its empirical version. The resulting penalty function, $\widehat{\text{pen}}$, then becomes random. We note that $\gamma_n(\hat\ell_m) = -\sum_{j\in\mathbb{Z}}[\hat a_{m,j}(\ell)]^2$, which explains (10) below.
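
To make the whole first step concrete, here is a minimal numerical sketch (ours, not the authors' programme), written for the regime $0\le\delta < 1/3$ where $\Psi(m) = \Delta(m)/n$. It uses the closed form $u_{\varphi_{m,j}}(x) = (2\pi\sqrt m)^{-1}\int_{-\pi m}^{\pi m}e^{iu(j/m - x)}/f_\varepsilon^*(u)\,du$, which follows from the definition of $u_t$ and $\varphi_{m,j}^*(u) = m^{-1/2}e^{iuj/m}\mathbb{1}_{[-\pi m,\pi m]}(u)$. The truncation $|j|\le K_n$, the quadrature size, and the constant $\kappa$ are illustrative choices here, not the calibrated values of Section 5.

```python
import numpy as np

def u_phi_mj(x, m, j, fe_star, n_quad=2048):
    # u_{phi_{m,j}}(x) = (2 pi sqrt(m))^{-1} int_{-pi m}^{pi m} e^{iu(j/m-x)} / fe_star(u) du;
    # for a symmetric noise density this integral is real-valued.
    u = np.linspace(-np.pi * m, np.pi * m, n_quad)
    vals = np.exp(1j * np.outer(j / m - np.atleast_1d(x), u)) / fe_star(u)
    return np.trapz(vals, u, axis=-1).real / (2 * np.pi * np.sqrt(m))

def Delta(m, fe_star, n_quad=2048):
    # Delta(m) = (1/2pi) int_{-pi m}^{pi m} du / |fe_star(u)|^2, see (8)
    u = np.linspace(-np.pi * m, np.pi * m, n_quad)
    return np.trapz(1.0 / np.abs(fe_star(u))**2, u) / (2 * np.pi)

def adaptive_ell_hat(Z, fe_star, models=range(1, 11), K_n=40, kappa=3.0):
    # Since gamma_n(ell_hat_m) = -sum_j a_hat_{m,j}^2, the selection criterion is
    # -sum_j a_hat_{m,j}^2 + kappa * mean(Z^2) * Delta(m)/n   (empirical penalty).
    n = len(Z) - 1
    EZ2 = np.mean(Z[1:]**2)
    best = None
    for m in models:
        a_hat = np.array([np.mean(Z[1:] * u_phi_mj(Z[:-1], m, j, fe_star))
                          for j in range(-K_n, K_n + 1)])
        crit = -np.sum(a_hat**2) + kappa * EZ2 * Delta(m, fe_star) / n
        if best is None or crit < best[0]:
            best = (crit, m, a_hat)
    _, m_hat, a_hat = best

    def ell_hat(x):
        # ell_hat(x) = sum_j a_hat_j * phi_{m_hat, j}(x)
        xs = np.atleast_1d(np.asarray(x, dtype=float))
        js = np.arange(-K_n, K_n + 1)
        return np.sqrt(m_hat) * (np.sinc(m_hat * xs[:, None] - js) @ a_hat)

    return m_hat, ell_hat
```

A Laplace noise, for instance, corresponds to `fe_star = lambda u: 1/(1 + s_eps**2 * u**2 / 2)`, as in (16) below.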

3.1.2. Second step: the estimators of f

The second stage of the estimation procedure is to estimate $f$. Comte et al. (2006b, 2007) explain how to estimate $f$ in an adaptive way and in a mixing context. The estimator of $f$ on $S_m$ is defined by

$$\hat f_m = \sum_{j\in\mathbb{Z}}\hat a_{m,j}(f)\varphi_{m,j} \quad\text{with}\quad \hat a_{m,j}(f) = \frac{1}{n}\sum_{i=1}^n u_{\varphi_{m,j}}(Z_i). \qquad (9)$$

Then we define $\hat f_{\ddot m}$ with

$$\ddot m = \arg\min_{m\in\mathcal{M}_n}\left(-\sum_{j\in\mathbb{Z}}[\hat a_{m,j}(f)]^2 + \ddot{\text{pen}}(m)\right), \qquad (10)$$

where the penalty function is given by $\ddot{\text{pen}}(m) = \ddot\kappa\,\Psi(m)$ with $\Psi(m)$ given by (8). For the properties of $\hat f_{\ddot m}$ we refer to Comte et al. (2006b). Up to the multiplicative constants, the control of the mean square risk of this estimator is the same as the one obtained for $\ell$ here.

3.1.3. Last step: the estimator of b

We estimate $b$ on a compact set $B$ only, and the following additional assumption is required:

A6 (i) $\forall x\in B$, $f_0\le f(x)\le f_1$ for two positive constants $f_0$ and $f_1$. (ii) $b$ is bounded on $B$.

Then we can define

$$\tilde b = \hat b_{\hat m,\ddot m} = \frac{\hat\ell_{\hat m}}{\hat f_{\ddot m}} \ \text{ if } \|\hat\ell_{\hat m}/\hat f_{\ddot m}\|\le a_n, \qquad \tilde b = \hat b_{\hat m,\ddot m} = 0 \ \text{ otherwise}, \qquad (11)$$

where $a_n$ is a sequence to be specified later.
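
On a grid of points, the ratio step (11) is a one-liner plus the $L^2$ cut-off; a small sketch (ours), with $a_n = n^\omega$, $\omega > 1/2$, as specified in Section 3.3 ($\omega = 0.6$ being an illustrative choice):

```python
import numpy as np

def ratio_estimate(num_vals, den_vals, x_grid, n, omega=0.6):
    # b_tilde = ell_hat/f_hat if ||ell_hat/f_hat|| <= a_n = n**omega, else 0, as in (11)
    ratio = np.asarray(num_vals) / np.asarray(den_vals)
    if np.sqrt(np.trapz(ratio**2, x_grid)) <= n**omega:
        return ratio
    return np.zeros_like(ratio)
```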

3.2. Risk bound for $\hat\ell_m$ and $\hat\ell_{\hat m}$

We define the following empirical centered process:

$$\nu_n(t) = \frac{1}{n}\sum_{k=1}^n\left(Z_{k+1}u_t(Z_k) - \langle t, \ell\rangle\right),$$

and with (5) and (7), we note that the following equalities hold:

$$\|\ell-\hat\ell_m\|^2 = \|\ell-\ell_m\|^2 + \|\ell_m-\hat\ell_m\|^2 = \|\ell-\ell_m\|^2 + \sum_{j\in\mathbb{Z}}(a_{m,j}(\ell) - \hat a_{m,j}(\ell))^2 = \|\ell-\ell_m\|^2 + \sum_{j\in\mathbb{Z}}\nu_n^2(\varphi_{m,j}).$$

Therefore

$$\mathbb{E}\|\ell-\hat\ell_m\|^2 \le \|\ell-\ell_m\|^2 + \sum_{j\in\mathbb{Z}}\text{Var}[\nu_n(\varphi_{m,j})].$$

The following decomposition will prove useful: $\nu_n(t) = \nu_n^{(1)}(t) + \nu_n^{(2)}(t) + \nu_n^{(3)}(t)$ with

$$\nu_n^{(1)}(t) = \frac{1}{n}\sum_{k=1}^n\varepsilon_{k+1}u_t(Z_k), \qquad \nu_n^{(2)}(t) = \frac{1}{n}\sum_{k=1}^n\xi_{k+1}\sigma(X_k)u_t(Z_k), \qquad (12)$$

$$\nu_n^{(3)}(t) = \frac{1}{n}\sum_{k=1}^n\left(b(X_k)u_t(Z_k) - \langle t, \ell\rangle\right). \qquad (13)$$

Here the terms $\nu_n^{(1)}$ and $\nu_n^{(2)}$ can be kept together and benefit from the uncorrelatedness of the variables involved in the sums. The term $\nu_n^{(3)}$ involves dependent variables. Then we find

$$\text{Var}[\nu_n(\varphi_{m,j})]\le 2\text{Var}\left[\nu_n^{(1)}(\varphi_{m,j}) + \nu_n^{(2)}(\varphi_{m,j})\right] + 2\text{Var}\left[\nu_n^{(3)}(\varphi_{m,j})\right].$$

The first variance involves uncorrelated and centered terms and leads to

$$\text{Var}\left[\frac{1}{n}\sum_{i=1}^n(\varepsilon_{i+1} + \sigma(X_i)\xi_{i+1})u_{\varphi_{m,j}}(Z_i)\right] = \frac{1}{n^2}\sum_{i=1}^n\mathbb{E}\left[(s_\varepsilon^2 + \sigma^2(X_i))|u_{\varphi_{m,j}}(Z_i)|^2\right],$$

so that

$$\sum_{j\in\mathbb{Z}}\text{Var}\left[\nu_n^{(1)}(\varphi_{m,j}) + \nu_n^{(2)}(\varphi_{m,j})\right] = \frac{1}{n}\sum_{j\in\mathbb{Z}}\mathbb{E}\left[(s_\varepsilon^2 + \sigma^2(X_1))|u_{\varphi_{m,j}}(Z_1)|^2\right] = \left(s_\varepsilon^2 + \mathbb{E}(\sigma^2(X_1))\right)\frac{\Delta(m)}{n}.$$

We use here the following useful property of our basis (resulting from a Parseval formula):

$$\forall x\in\mathbb{R}, \qquad \sum_j|u_{\varphi_{m,j}}(x)|^2 = \Delta(m),$$

where $\Delta(m)$ is defined by (8) and the $u_{\varphi_{m,j}}(x)$ are just rewritten as Fourier coefficients.
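
In brief, the identity can be recovered as follows: since $\varphi_{m,j}^*(u) = m^{-1/2}e^{iuj/m}\mathbb{1}_{[-\pi m,\pi m]}(u)$,

$$u_{\varphi_{m,j}}(x) = \frac{1}{2\pi\sqrt m}\int_{-\pi m}^{\pi m}\frac{e^{iu(j/m-x)}}{f_\varepsilon^*(u)}\,du = \frac{1}{\sqrt{2\pi}}\langle v_x, e_{-j}\rangle, \qquad v_x(u) = \frac{e^{-ixu}}{f_\varepsilon^*(u)}, \quad e_j(u) = \frac{e^{iuj/m}}{\sqrt{2\pi m}},$$

and since $(e_j)_{j\in\mathbb{Z}}$ is an orthonormal basis of $L^2([-\pi m, \pi m])$, Parseval's formula gives $\sum_j|u_{\varphi_{m,j}}(x)|^2 = (2\pi)^{-1}\|v_x\|^2 = \Delta(m)$.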

For the second term, we use the standard tools specific to the $\beta$-mixing context (namely Viennet's (1997) covariance inequality), and we can easily prove the following lemma:

Lemma 3.1. Under Assumptions A1-A3,

$$\sum_{j\in\mathbb{Z}}\text{Var}\left(\nu_n^{(3)}(\varphi_{m,j})\right)\le \mathbb{E}(b^2(X_1))\frac{\Delta(m)}{n} + \frac{4Km}{n},$$

where $K = \sqrt{2\sum_{k\ge 0}(k+1)\beta(k)\mathbb{E}(b^4(X_1))}$ if $\mathbb{E}(b^4(X_1)) < \infty$ and $\sum_k k\beta(k) < +\infty$.

Therefore, the rate of the estimator $\hat\ell_m$ is as follows:

Proposition 3.1. Consider the estimator $\hat\ell_m$ of $\ell$ defined by (6), where $\ell = bf$ with $b$ and $f$ as in Model (1). Then under Assumptions A1-A4, if $\mathbb{E}(b^4(X_1)) < +\infty$ and $\theta > 1$ for arithmetical mixing (see (3)), we have

$$\mathbb{E}(\|\ell-\hat\ell_m\|^2)\le \|\ell-\ell_m\|^2 + 2\mathbb{E}(Z_2^2)\frac{\Delta(m)}{n} + \frac{8Km}{n}.$$

In addition, assume that $\ell$ belongs to a space $\mathcal{S}_{s,a,r}(A)$ defined by (4) and that Assumption A5 is fulfilled. Then the estimator $\hat\ell_{\check m}$, with $\check m$ as in Table 1, has the rates given in Table 1 in terms of its mean square integrated risk $\mathbb{E}(\|\hat\ell_{\check m}-\ell\|^2)$.

The orders given in Table 1 classically take into account that:

(a) When $\ell$ belongs to a space $\mathcal{S}_{s,a,r}(A)$ defined by (4), the order of the squared bias is

$$\|\ell-\ell_m\|^2 = (2\pi)^{-1}\int_{|x|\ge\pi m}|\ell^*(x)|^2\,dx\le Cm^{-2s}\exp(-2a(\pi m)^r).$$

(b) When $f_\varepsilon$ satisfies A5, the order of the variance term is bounded by

$$C\Delta(m)/n\le C'm^{2\gamma+1-\delta}\exp(2\mu(\pi m)^\delta)/n.$$

When $r > 0$, $\delta > 0$, the value of $\check m$ is not explicitly given. It is obtained as the solution of the equation

$$\check m^{2s+2\gamma+1-r}\exp\{2\mu(\pi\check m)^\delta + 2a(\pi\check m)^r\} = O(n).$$

For explicit formulae for the rates, see Lacour (2006).

These rates enhance the interest of building an estimator for which the choice of the relevant model $m$ is automatically performed. This is done with $\hat\ell_{\hat m}$, and we can prove the following result:

Theorem 3.1. Assume that Assumptions A1-A4 hold, that $\mathbb{E}(b^8(X_1))$, $\mathbb{E}(\sigma^8(X_1))$ and $\mathbb{E}(\xi_1^8)$ are finite, and that $\mathbb{E}(\varepsilon_1^6) < +\infty$. Assume moreover that the process $X$ is geometrically $\beta$-mixing (or arithmetically $\beta$-mixing with $\theta > 14$) and that the collection $\mathcal{M}_n$ is such that for all $m\in\mathcal{M}_n$, $\text{pen}(m)\le 1$. Then

$$\mathbb{E}(\|\hat\ell_{\hat m}-\ell\|^2)\le C\inf_{m\in\mathcal{M}_n}\left(\|\ell-\ell_m\|^2 + \text{pen}(m)\right) + \frac{C'}{n}.$$

                        δ = 0 (fε ordinary smooth)          δ > 0 (fε supersmooth)

r = 0                   π m̌ = O(n^{1/(2s+2γ+1)})            π m̌ = [ln(n)/(2µ+1)]^{1/δ}
(ℓ Sobolev(s))          rate = O(n^{-2s/(2s+2γ+1)})         rate = O((ln n)^{-2s/δ})

r > 0                   π m̌ = [ln(n)/(2a)]^{1/r}            m̌ solution of
(ℓ analytic)            rate = O(ln(n)^{(2γ+1)/r}/n)        m̌^{2s+2γ+1-r} exp{2µ(π m̌)^δ + 2a(π m̌)^r} = O(n)

Table 1. Choice of $\check m$ and corresponding rates under A1-A5 when $\ell$ belongs to $\mathcal{S}_{s,a,r}(A)$ defined by (4).

The proof of Theorem 3.1 is based on the study of several empirical processes deduced from the preliminary decomposition of $\nu_n$ given by (12)-(13); it is sketched in Section 6.

Theorem 3.1 shows that the estimator automatically selects the optimal $m$ when $\delta\le 1/3$, since in that case the penalty has exactly the same order as the variance (namely $\Delta(m)/n$). When $\delta > 1/3$, a compromise is still performed, but the penalty is slightly greater than the variance. In an asymptotic setting, this implies a loss in the rate of convergence of the estimator, but this loss can be shown to be negligible with respect to the rates. For discussions on this point, see Comte et al. (2006b).

3.3. Risk bounds for $\tilde b$

It is shown in Comte et al. (2006a) that $\hat f_{\ddot m}$ satisfies the same type of inequality as $\hat\ell_{\hat m}$, as follows.

Theorem 3.2. Assume that Assumptions A1-A4 hold. Assume that the process $X$ is geometrically $\beta$-mixing (or arithmetically $\beta$-mixing with $\theta > 3$ in (3)) and that the collection $\mathcal{M}_n$ is such that for all $m\in\mathcal{M}_n$, $\ddot{\text{pen}}(m)\le 1$. Then

$$\mathbb{E}(\|\hat f_{\ddot m}-f\|^2)\le C\inf_{m\in\mathcal{M}_n}\left(\|f-f_m\|^2 + \ddot{\text{pen}}(m)\right) + \frac{C'}{n},$$

where $f_m$ denotes the orthogonal projection of $f$ on $S_m$.

Then it is standard (see e.g. Lacour (2005) or Comte and Taupin (2007)) to obtain, under the assumptions of Theorems 3.1 and 3.2 and under A5, if $f$ belongs to a space $\mathcal{S}_{s,a,r}(A)$ with $s > 1/2$ if $r = 0$, and $\ell$ to a space $\mathcal{S}_{s',a',r'}(A')$, if $\ln(\ln(n))\le m_n\le (n/\ln(n))^{1/(2\gamma+1)}$ for $\hat f_{\ddot m}$, under the additional assumption A6, and for $n$ large enough, that

$$\mathbb{E}(\|\tilde b-b\|_B^2)\le C_1\mathbb{E}(\|\hat f_{\ddot m}-f\|^2) + C_2\mathbb{E}(\|\hat\ell_{\hat m}-\ell\|^2) + \frac{C_3}{n},$$

where $a_n = n^\omega$ with $\omega > 1/2$ and $C_1, C_2, C_3$ are constants.

4. Estimation of $\sigma^2$

4.1. Steps of the estimation

We now aim at estimating $\sigma^2$, and we follow again the strategy described in Section 2.1.

First step. We set $\vartheta = (b^2+\sigma^2)f$ and we first estimate $\vartheta$. To this end, we consider the following contrast:

$$\breve\gamma_n(t) = \|t\|^2 - \frac{2}{n}\sum_{k=1}^n\left(Z_{k+1}^2 - s_\varepsilon^2\right)u_t(Z_k). \qquad (14)$$

As $f_\varepsilon$ is assumed to be known, then so is the variance $s_\varepsilon^2$. Suggestions on how to estimate it can be drawn from Butucea and Matias (2004). Then we define

$$\hat\vartheta_m = \arg\min_{t\in S_m}\breve\gamma_n(t) \qquad\text{and}\qquad \breve m = \arg\min_{m\in\mathcal{M}_n}\left(\breve\gamma_n(\hat\vartheta_m) + \breve{\text{pen}}(m)\right), \qquad (15)$$

where $\breve{\text{pen}}(m)$ is a penalty function given by $\breve{\text{pen}}(m) = \breve\kappa\,\mathbb{E}((Z_2^2-s_\varepsilon^2)^2)\Psi(m)$, with $\Psi(m)$ given by (8). Again, the expectation $\mathbb{E}[(Z_2^2-s_\varepsilon^2)^2]$ is replaced by its empirical version in practice.

Second step. As previously, we use as an estimator of $f$ the estimator $\hat f_{\ddot m}$ defined by (9)-(10). Its risk is controlled by Theorem 3.2.

Third step. Similarly to (11), we define

$$\widetilde{b^2+\sigma^2} = \frac{\hat\vartheta_{\breve m}}{\hat f_{\ddot m}} \ \text{ if } \|\hat\vartheta_{\breve m}/\hat f_{\ddot m}\|\le\breve a_n, \ \text{ and } 0 \text{ otherwise},$$

where $\breve a_n$ is a sequence to be specified in the same way as $a_n$ for the estimation of $b$. Clearly, $\widetilde{b^2+\sigma^2}$ is an estimator of $b^2+\sigma^2$. For the study of steps 2 and 3, see Section 3.3.

Fourth step. The estimator of $\sigma^2$ is then built by setting

$$\tilde\sigma^2 = \widetilde{b^2+\sigma^2} - (\tilde b)^2.$$

Clearly, as $\|\tilde\sigma^2-\sigma^2\|^2\le 2\|\widetilde{b^2+\sigma^2} - (b^2+\sigma^2)\|^2 + 2\|b+\tilde b\|_\infty^2\|b-\tilde b\|^2$, the risk of the final estimator is the sum of the risks of the estimators of $b^2+\sigma^2$ and $b$, provided that $b$ is bounded and $\tilde b$ is bounded with high probability. The latter step is studied from an empirical point of view only.
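
On a grid, the fourth step is a pointwise combination of the two previous ratio estimates; a trivial sketch (ours; the two arguments stand for the grid values of $\widetilde{b^2+\sigma^2}$ and $\tilde b$):

```python
import numpy as np

def sigma2_estimate(b2s2_tilde, b_tilde):
    # sigma2_tilde = (b^2 + sigma^2)~ - (b_tilde)^2, evaluated pointwise on the grid
    return np.asarray(b2s2_tilde) - np.asarray(b_tilde)**2
```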

4.2. Risk bounds for $\hat\vartheta_m$ and $\hat\vartheta_{\breve m}$

It is not difficult to check that $\mathbb{E}(\breve\gamma_n(t)) = \|t\|^2 - 2\langle\vartheta, t\rangle$, which justifies the choice of $\breve\gamma_n$ given in (14). We can also easily obtain the decomposition $\breve\gamma_n(t) - \breve\gamma_n(s) = \|t-\vartheta\|^2 - \|s-\vartheta\|^2 - 2\breve\nu_n(t-s)$ where

$$\breve\nu_n(t) = \frac{1}{n}\sum_{k=1}^n\left((Z_{k+1}^2 - s_\varepsilon^2)u_t(Z_k) - \langle t, \vartheta\rangle\right).$$

As for $b$ previously, we can write

$$\|\hat\vartheta_m - \vartheta\|^2 = \|\vartheta_m - \vartheta\|^2 + \sum_{j\in\mathbb{Z}}\breve\nu_n^2(\varphi_{m,j}).$$

With the same tools as for the study of $\ell$, using a relevant decomposition of the empirical process $\breve\nu_n$, we prove (see Section 7) that:

Proposition 4.1. Consider the estimator $\hat\vartheta_m$ of $\vartheta$ defined by (15), where $\vartheta = (b^2+\sigma^2)f$ with $b$, $\sigma$ and $f$ as in Model (1). Then under Assumptions A1-A4, and if $\xi_2$, $\varepsilon_1$, $b^2(X_1)$ and $\sigma^2(X_1)$ admit moments of order 4, then

$$\mathbb{E}(\|\vartheta-\hat\vartheta_m\|^2)\le \|\vartheta-\vartheta_m\|^2 + 4\mathbb{E}[(Z_2^2-s_\varepsilon^2)^2]\frac{\Delta(m)}{n} + \frac{\breve Km}{n},$$

where $\breve K = 16\sqrt{2\sum_{k\ge 0}(k+1)\beta(k)\mathbb{E}((b^2(X_1)+\sigma^2(X_1))^4)}$ if $\sum_k k\beta(k) < +\infty$.

                       Model b01                      Model b02
                Laplace error   Normal error   Laplace error   Normal error
Method          n = 400   800   400     800    400     800     400     800
Oracle Kernel   2.92      2.30  6.72    5.86   3.07    2.22    5.15    3.92
Projection      4.95      2.76  6.42    4.72   3.63    2.53    5.06    2.59

Fig. 2. Average Squared Errors (ASE) for estimating $b_{01}$ (ASE $\times 10^{-6}$) and $b_{02}$ (ASE $\times 10^{-2}$).

It appears from the details of the above study that the empirical processes involved in the decomposition of $\breve\nu_n$ are of the same type as the processes studied for the estimation of $\ell$. Therefore, we give the risk bound for $\hat\vartheta_{\breve m}$ but we omit the proof.

Theorem 4.1. Assume that Assumptions A1-A4 hold, that $\mathbb{E}(b^p(X_1))$, $\mathbb{E}(\sigma^p(X_1))$ and $\mathbb{E}(\xi_1^p)$ are finite for a $p\ge 16$, and that $\mathbb{E}(\varepsilon_1^{12}) < +\infty$. Assume that the process $X$ is geometrically $\beta$-mixing and that the collection $\mathcal{M}_n$ is such that for all $m\in\mathcal{M}_n$, $\breve{\text{pen}}(m)\le 1$. Then

$$\mathbb{E}(\|\hat\vartheta_{\breve m}-\vartheta\|^2)\le C\inf_{m\in\mathcal{M}_n}\left(\|\vartheta-\vartheta_m\|^2 + \breve{\text{pen}}(m)\right) + \frac{C'}{n}.$$

5. Simulation results

For the simulations, we adapt the deconvolution algorithm described in Comte et al. (2006a, 2007) to the new context here. The penalties are chosen as:

$$\frac{1}{n}\left(\frac{1}{n}\sum_{i=1}^n W_i^2\right)\times\begin{cases}\pi m + \dfrac{\ln^3(\pi m)}{1+s_\varepsilon^2} + (\pi m)^3 s_\varepsilon^2 + \dfrac{3}{5}(\pi m)^5 s_\varepsilon^4 & \text{if } \varepsilon_1 \text{ Laplace},\\[3mm] \pi m + \dfrac{\ln^3(\pi m)}{1+s_\varepsilon^2} + (\pi m)^3 s_\varepsilon^2\displaystyle\int_0^1 e^{(\pi mu)^2 s_\varepsilon^2}\,du & \text{if } \varepsilon_1 \text{ Gaussian},\end{cases}$$

where $W_i\equiv 1$ for the estimation of $f$, $W_i = Z_i$ for the estimation of $b$, and $W_i = Z_i^2 - s_\varepsilon^2$ for the estimation of $b^2+\sigma^2$. In both cases, a (negligible) logarithmic term (namely $\ln^3(\pi m)/(1+s_\varepsilon^2)$) is added as a standard correction for small dimensions. The constant $\kappa$ is chosen equal to $\pi$ in the Gaussian case, and the term $(\pi m)^3 s_\varepsilon^2$ corresponds to the loss in the rate when $\delta = 2$. In the Laplace case, the constant $\kappa$ is modified term by term when computing $\Delta(m)$: indeed, in that case we find $\Delta(m) = (1/\pi)(\pi m + (\pi m)^3 s_\varepsilon^2/3 + (\pi m)^5 s_\varepsilon^4/20)$. We select $m$ among 26 values ranging from $10\ln(n)/(\pi n)$ to 10. We refer to Comte et al. (2007) for stability properties of the estimation algorithm.
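
As a sanity check on the Laplace case (a sketch of ours), the closed form of $\Delta(m)$ quoted above can be compared with direct quadrature of (8):

```python
import numpy as np

def delta_laplace(m, s_eps):
    # Delta(m) = (1/pi) * (pi*m + (pi*m)^3 * s_eps^2/3 + (pi*m)^5 * s_eps^4/20),
    # obtained from |f_eps*(u)|^{-2} = (1 + s_eps^2 u^2/2)^2.
    pm = np.pi * m
    return (pm + pm**3 * s_eps**2 / 3 + pm**5 * s_eps**4 / 20) / np.pi

def delta_quadrature(m, s_eps, n_quad=100001):
    u = np.linspace(-np.pi * m, np.pi * m, n_quad)
    fe_star = 1.0 / (1.0 + s_eps**2 * u**2 / 2.0)
    return np.trapz(1.0 / fe_star**2, u) / (2 * np.pi)

print(delta_laplace(4, 0.1), delta_quadrature(4, 0.1))  # the two values agree
```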

5.1. Two examples in a regression context

First we compare our method with the kernel strategy described by Fan and Truong (1993). The model here is $Y_i = b(X_i) + \xi_i$, $Z_i = X_i + \varepsilon_i$, with observations $((Y_1, Z_1),\dots,(Y_n, Z_n))$ and unobserved i.i.d. $X_i$'s. Two regression functions are considered:

$$b_{01}(x) = x_+^3(1-x)_+^3 \qquad\text{and}\qquad b_{02}(x) = 1 + 4x.$$

The variance $s_\varepsilon^2$ of $\varepsilon$ is adjusted in all cases so that $\text{Var}(X)/(s_\varepsilon^2 + \text{Var}(X)) = 0.70$. The $X_i$'s are $\mathcal{N}(0.5, 0.25^2)$ and the $\xi_i$'s are $\mathcal{N}(0, 0.0015^2)$ with $b_{01}$ and $\mathcal{N}(0, 0.25^2)$ with $b_{02}$. The convolution noise $\varepsilon$ is such that either

$$f_\varepsilon^*(x) = \exp\left(-\frac{1}{2}s_\varepsilon^2 x^2\right) \qquad\text{or}\qquad f_\varepsilon^*(x) = \frac{1}{1 + \frac{1}{2}s_\varepsilon^2 x^2}, \qquad (16)$$

corresponding to a Gaussian super-smooth case or a Laplace ordinary smooth case.

For each simulation, we compute the average squared error (ASE) of our adaptive estimator at 101 grid points from 0.1 to 0.9, and average these ASEs over 100 replications. We compare it with the "oracle" computed by Fan and Truong (1993): this is not an estimator but an oracle, because they average ASEs obtained with the best bandwidth in terms of the (unknown in practice) ASE for each replication; that is, they choose a posteriori the bandwidth that minimizes the ASE. We report Fan and Truong's kernel results and ours in Fig. 2. We mention that if we had also computed oracles, we would systematically have had better results than theirs. With our true adaptive estimator, our results remain better than their oracles in the Gaussian case. They are slightly worse in the Laplace case, but they keep the same order as Fan and Truong (1993)'s oracles.

Note that we do not study the casen= 200 because it is too small for our method to work in a satisfactory way.
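
For reference, the ASE criterion just described amounts to the following (a sketch of ours, assuming the estimator is available as a callable `b_hat`):

```python
import numpy as np

def ase(b_hat, b_true, lo=0.1, hi=0.9, n_points=101):
    # Average squared error over a regular grid, as used in this section
    x = np.linspace(lo, hi, n_points)
    return np.mean((b_hat(x) - b_true(x))**2)
```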

5.2. Three examples in a heteroscedastic autoregressive context

Comte and Rozenholc (2002) provide a simulation study of the direct model $X_{i+1} = b(X_i) + \sigma(X_i)\xi_{i+1}$ when the $X_i$'s are observed. The strategy consists in a penalized mean-square contrast minimization, which cannot be applied to the present context. But we can keep some couples of functions $(b, \sigma)$ and see how the deconvolution method behaves with respect to the estimation of these functions. More precisely, we borrow the following couples from Comte and Rozenholc (2002):

$$\begin{aligned} b_1(x) &= 0.25\sin(2\pi x + \pi/3), & \sigma_1(x) &= s_1(0.31 + 0.7\exp(-5x^2)), & s_1 &= 0.4,\\ b_2(x) &= -0.25(x + 2\exp(-16x^2)), & \sigma_2(x) &= s_2(0.2 + 0.4\exp(-2x^2)), & s_2 &= 0.5,\\ b_3(x) &= 1/(1+e^{-x}), & \sigma_3(x) &= s_3(\phi(x+1.2) + 1.5\phi(x-1.2)), & s_3 &= 0.32, \end{aligned} \qquad (17)$$

where $\phi$ is the Gaussian $\mathcal{N}(0,1)$ density. Moreover, we choose $\xi\sim\mathcal{N}(0,1)$, and $\varepsilon$ is either Laplace or Gaussian as given by (16), with $s_\varepsilon = 0.1$.
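
A minimal data-generating sketch for these designs (ours; shown for $(b_2, \sigma_2)$ with Laplace noise, whose scale $s_\varepsilon/\sqrt 2$ matches $f_\varepsilon^*(u) = 1/(1 + s_\varepsilon^2u^2/2)$ in (16)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s_eps = 800, 0.1

b2 = lambda x: -0.25 * (x + 2 * np.exp(-16 * x**2))
sigma2 = lambda x: 0.5 * (0.2 + 0.4 * np.exp(-2 * x**2))

# Heteroscedastic autoregression X, observed through additive noise eps.
X = np.zeros(n + 1)
xi = rng.standard_normal(n + 1)
for i in range(n):
    X[i + 1] = b2(X[i]) + sigma2(X[i]) * xi[i + 1]

# Laplace noise with variance s_eps^2 has scale s_eps/sqrt(2).
eps = rng.laplace(scale=s_eps / np.sqrt(2), size=n + 1)
Z = X + eps
```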

Figures 4-6 illustrate our results in the three cases of couples $(b_i, \sigma_i^2)$ given in (17). For the estimation of $f$, the density of the $X_i$'s, the true function is unknown, so only the estimated function is plotted. We can see that $b$ and $b^2+\sigma^2$ are well estimated by the ratio strategy. The extraction of $\sigma^2$ sometimes suffers from scale problems (if $\sigma^2$ is much smaller than $b^2$, or if both are very small). The relative order of the variance of $\varepsilon$ and of the quantity $s_i$ in the definition of $\sigma_i$, $i = 1, 2, 3$, seems to play an important role in the quality of the estimation.

Figures 4-6 (top right and bottom left) also plot the data sets generated, not only the $Z_i$'s used for the estimation but also the $X_i$'s, to show the influence of the noise $\varepsilon$: a line joins $X_i$ to $Z_i$ for each $i$, with a + for the true observation $Z_i$.

We also performed a Monte Carlo study, which is reported in Fig. 3. We show that there is little difference between Laplace and Gaussian $\varepsilon_i$'s, in spite of the difference between the theoretical rates, and that increasing the sample size leads to noticeable improvements of the results.

6. Proofs

6.1. Proof of Lemma 3.1

$$\text{Var}\left(\nu_n^{(3)}(\varphi_{m,j})\right) = \frac{1}{n^2}\sum_{k=1}^n\text{Var}\left(b(X_k)u_{\varphi_{m,j}}(Z_k)\right) + \frac{1}{n^2}\sum_{1\le k\neq l\le n}\text{cov}\left(b(X_k)u_{\varphi_{m,j}}(Z_k),\, b(X_l)u_{\varphi_{m,j}}(Z_l)\right).$$

Then

$$\mathbb{E}(b(X_k)u_{\varphi_{m,j}}(Z_k)) = \frac{1}{2\pi}\int\mathbb{E}(b(X_1)e^{ixX_1})\varphi_{m,j}^*(-x)\,dx = \langle bf, \varphi_{m,j}\rangle = \mathbb{E}(b(X_1)\varphi_{m,j}(X_1))$$

                     Laplace ε                          Gaussian ε
n =          400        800        1600        400        800        1600

b1           0.7989     0.3341     0.1677      0.8278     0.4993     0.2247
             (0.5976)   (0.1966)   (0.0773)    (0.7159)   (0.3558)   (0.1352)
b1²+σ1²      0.1515     0.1011     0.0529      0.1810     0.1117     0.0637
             (0.1033)   (0.0643)   (0.0278)    (0.1153)   (0.0580)   (0.0300)
σ1²          0.0946     0.0823     0.0534      0.1112     0.0880     0.0620
             (0.0426)   (0.0330)   (0.0242)    (0.0471)   (0.0345)   (0.0215)
b2           4.8038     3.8121     1.7513      4.8138     4.2150     1.6728
             (2.2205)   (2.7124)   (1.7207)    (2.2205)   (2.7124)   (1.7207)
b2²+σ2²      5.4090     2.7263     1.7643      4.0527     2.6609     1.5938
             (6.8739)   (2.3491)   (1.8050)    (4.0048)   (2.3839)   (1.1365)
σ2²          4.8544     2.6762     1.8548      3.5284     2.8233     1.6558
             (5.2914)   (1.7226)   (1.5342)    (4.0048)   (2.3839)   (1.1365)
b3           0.0787     0.0389     0.0291      0.0973     0.0704     0.0313
             (0.1029)   (0.0657)   (0.0345)    (0.1402)   (0.1439)   (0.0492)
b3²+σ3²      0.1532     0.0702     0.0604      0.1789     0.1064     0.0608
             (0.1613)   (0.0798)   (0.0588)    (0.1818)   (0.1999)   (0.0717)
σ3²          0.0276     0.0145     0.0132      0.0516     0.0206     0.0135
             (0.0528)   (0.0260)   (0.0169)    (0.1117)   (0.0337)   (0.0217)

Fig. 3. ASE ×100 (with standard deviation ×100 in parentheses) for the estimation of $b_i$ and $\sigma_i^2$, $i = 1, 2, 3$, given in (17), for 100 replications of the estimation procedure.

[Figure 4: four panels titled "X-density", "drift estimate", "b²+σ² estimate" and "volatility estimate"; plot residue omitted.]

Fig. 4. Example 1, $n = 800$; dotted line: true function, full line: estimated function. Top left: estimated density of the $X_i$'s; top right: true and estimated $b_1$ with data $(X_i, Z_i)$; bottom left: true and estimated $b_1^2+\sigma_1^2$ with data $(X_i^2, Z_i^2 - s_\varepsilon^2)$; bottom right: true and estimated $\sigma_1^2$.

[Figure 5: four panels titled "X-density", "drift estimate", "b²+σ² estimate" and "volatility estimate"; plot residue omitted.]

Fig. 5. Example 2, $n = 800$; dotted line: true function, full line: estimated function. Top left: estimated density of the $X_i$'s; top right: true and estimated $b_2$ with data $(X_i, Z_i)$; bottom left: true and estimated $b_2^2+\sigma_2^2$ with data $(X_i^2, Z_i^2 - s_\varepsilon^2)$; bottom right: true and estimated $\sigma_2^2$.

[Figure 6: four panels titled "X-density", "drift estimate", "b²+σ² estimate" and "volatility estimate"; plot residue omitted.]

Fig. 6. Example 3, $n = 800$; dotted line: true function, full line: estimated function. Top left: estimated density of the $X_i$'s; top right: true and estimated $b_3$ with data $(X_i, Z_i)$; bottom left: true and estimated $b_3^2+\sigma_3^2$ with data $(X_i^2, Z_i^2 - s_\varepsilon^2)$; bottom right: true and estimated $\sigma_3^2$.

and note that, for $k\neq l$,

$$\mathbb{E}\left(b(X_k)u_{\varphi_{m,j}}(Z_k)\,b(X_l)\bar u_{\varphi_{m,j}}(Z_l)\right) = \frac{1}{4\pi^2}\iint\mathbb{E}\left(b(X_k)b(X_l)e^{ixX_k - iyX_l}\right)\varphi_{m,j}^*(-x)\varphi_{m,j}^*(y)\,dx\,dy = \mathbb{E}\left[b(X_k)b(X_l)\varphi_{m,j}(X_k)\varphi_{m,j}(X_l)\right].$$

Therefore,

$$\text{Var}\left(\frac{1}{n}\sum_{k=1}^n b(X_k)u_{\varphi_{m,j}}(Z_k)\right)\le \frac{1}{n}\text{Var}\left(b(X_1)u_{\varphi_{m,j}}(Z_1)\right) + \text{Var}\left(\frac{1}{n}\sum_{k=1}^n b(X_k)\varphi_{m,j}(X_k)\right).$$

The last term requires a covariance inequality for mixing variables (Delyon (1990), Viennet (1997), Theorem 9.1 in the appendix) and uses the fact that the $X_i$'s are $\beta$-mixing with coefficients $\beta(k)$:

$$\sum_{j\in\mathbb{Z}}\text{Var}\left[\frac{1}{n}\sum_{i=1}^n b(X_i)\varphi_{m,j}(X_i)\right]\le \sum_{j\in\mathbb{Z}}\frac{4}{n}\int\beta(x)b^2(x)|\varphi_{m,j}(x)|^2 f(x)\,dx\le \frac{4m}{n}\int\beta(x)b^2(x)f(x)\,dx,$$

where $\beta$ is a nonnegative function such that $\mathbb{E}(\beta^p(X))\le p\sum_{k\ge 0}(k+1)^{p-1}\beta(k)$, and by using that $\|\sum_j|\varphi_{m,j}|^2(\cdot)\|_\infty = m$. Therefore, if $\mathbb{E}(b^4(X_1)) < \infty$ and $\theta > 1$, then

$$\sum_{j\in\mathbb{Z}}\text{Var}\left[\frac{1}{n}\sum_{i=1}^n b(X_i)\varphi_{m,j}(X_i)\right]\le \frac{4m\sqrt{2\sum_{k\ge 0}(k+1)\beta(k)\mathbb{E}(b^4(X_1))}}{n}.$$

Moreover $\sum_{j\in\mathbb{Z}}\text{Var}\left(b(X_1)u_{\varphi_{m,j}}(Z_1)\right)\le \mathbb{E}\left(b^2(X_1)\sum_{j\in\mathbb{Z}}|u_{\varphi_{m,j}}(Z_1)|^2\right)$, so that

$$\sum_{j\in\mathbb{Z}}\frac{1}{n}\text{Var}\left(b(X_1)u_{\varphi_{m,j}}(Z_1)\right)\le \mathbb{E}(b^2(X_1))\frac{\Delta(m)}{n},$$

which gives the result. □

6.2. Proof of Theorem 3.1

The proof can be sketched as follows. Let us define, for $m, m'\in\mathcal{M}_n$, $B_m(0,1) = \{t\in S_m, \|t\| = 1\}$ and $B_{m,m'}(0,1) = \{t\in S_m + S_{m'}, \|t\| = 1\}$. By the definition of $\hat m$, $\forall m\in\mathcal{M}_n$, $\gamma_n(\hat\ell_{\hat m}) + \text{pen}(\hat m)\le\gamma_n(\ell_m) + \text{pen}(m)$. For all functions $s$ and $t$, $\gamma_n(t) - \gamma_n(s) = \|t-\ell\|^2 - \|s-\ell\|^2 - 2\nu_n(t-s)$, and

$$2\nu_n(\hat\ell_{\hat m} - \ell_m)\le \frac{1}{4}\|\hat\ell_{\hat m} - \ell_m\|^2 + 4\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(t).$$

Thus, as $\|\hat\ell_{\hat m} - \ell_m\|^2\le 2\|\hat\ell_{\hat m} - \ell\|^2 + 2\|\ell_m - \ell\|^2$, we obtain

$$\frac{1}{2}\|\hat\ell_{\hat m} - \ell\|^2\le \frac{3}{2}\|\ell_m - \ell\|^2 + \text{pen}(m) + 4\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(t) - \text{pen}(\hat m). \qquad (18)$$

Then we need to find a function $p(m, m')$ such that

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}\nu_n^2(t) - p(m,\hat m)\right)_+\le \frac{C}{n}, \qquad (19)$$

which in turn will fix the penalty function through the requirement: $\forall m, m'\in\mathcal{M}_n$,

$$4p(m, m')\le \text{pen}(m) + \text{pen}(m'). \qquad (20)$$

Gathering (18), (19) and (20) will lead to, $\forall m\in\mathcal{M}_n$,

$$\frac{1}{2}\mathbb{E}\left(\|\hat\ell_{\hat m} - \ell\|^2\right)\le \frac{3}{2}\|\ell_m - \ell\|^2 + 2\,\text{pen}(m) + \frac{4C}{n},$$

which is the result.

Now, if $\nu_n$ is split into several terms deduced from the first decomposition given by (12)-(13), say $\nu_n(t) = \sum_{i=1}^3\sum_{j=1}^{p_i}\nu_n^{(i,j)}(t)$ where $p_i\le 3$, then, up to some multiplicative constants, inequality (19) will follow from the inequalities

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(i,j)}(t)]^2 - p_{i,j}(m,\hat m)\right)_+\le \frac{C_{i,j}}{n},$$

with $C = 9\sum_{i,j}C_{i,j}$ and $p(m, m') = 9\sum_{i,j}p_{i,j}(m, m')$. The study of the $\nu_n^{(i,j)}(t)$ is explained below.

First we split $\nu_n^{(1)}$ into two parts, so that both expressions involve variables that are independent conditionally on $(X)$: $\nu_n^{(1)} = \nu_n^{(1,\text{odd})} + \nu_n^{(1,\text{even})}$, where

$$\nu_n^{(1,\text{even})}(t) = \frac{1}{n}\sum_{1\le 2k\le n}\varepsilon_{2k+1}u_t(Z_{2k}), \qquad \nu_n^{(1,\text{odd})}(t) = \frac{1}{n}\sum_{1\le 2k+1\le n}\varepsilon_{2k+2}u_t(Z_{2k+1}).$$

Now, we shall study $\nu_n^{(1,\text{even})}$ only, since both terms lead to the same type of result. As Talagrand's inequality requires the random variables involved to be bounded, we have an additional step that allows us to obtain the result under a moment condition on the $\varepsilon_i$'s: $\nu_n^{(1,\text{even})} = \nu_n^{(1,1)} + \nu_n^{(1,2)} + \nu_n^{(1,3)}$ with

$$\nu_n^{(1,1)}(t) = \frac{1}{n}\sum_{1\le 2k\le n}\left[\varepsilon_{2k+1}\mathbb{1}_{|\varepsilon_{2k+1}|\le n^{1/4}}u_t(Z_{2k}) - \mathbb{E}_X\left(\varepsilon_{2k+1}\mathbb{1}_{|\varepsilon_{2k+1}|\le n^{1/4}}u_t(Z_{2k})\right)\right],$$

$$\nu_n^{(1,2)}(t) = \frac{1}{n}\sum_{1\le 2k\le n}\mathbb{E}\left(\varepsilon_{2k+1}\mathbb{1}_{|\varepsilon_{2k+1}|\le n^{1/4}}\right)\left[t(X_{2k}) - \mathbb{E}(t(X_{2k}))\right]$$

and

$$\nu_n^{(1,3)}(t) = \frac{1}{n}\sum_{1\le 2k\le n}\left[\varepsilon_{2k+1}\mathbb{1}_{|\varepsilon_{2k+1}| > n^{1/4}}u_t(Z_{2k}) - \mathbb{E}\left(\varepsilon_{2k+1}\mathbb{1}_{|\varepsilon_{2k+1}| > n^{1/4}}u_t(Z_{2k})\right)\right],$$

where $\mathbb{E}_X$ denotes the conditional expectation given $(X_k)_{1\le k\le n+1}$. It is worth noticing that $\nu_n^{(1,2)}$ vanishes if the $\varepsilon$'s are symmetric, and $\nu_n^{(1,3)}(t)$ is negligible under adequate moment conditions on the $\varepsilon$'s.

The following lemmas are proved below.

Lemma 6.1.

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(1,1)}(t)]^2 - p_{1,1}(m,\hat m)\right)_+\le \frac{C}{n},$$

where $p_{1,1}(m, m') = K\mathbb{E}(\varepsilon_1^2)\Psi(m\vee m')$, $K$ is a numerical constant and $\Psi(m)$ is defined by (8).

Lemma 6.2. If the process $(X_k)$ is geometrically $\beta$-mixing (or arithmetically with $\theta > 3$), then

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(1,2)}(t)]^2 - K\mathbb{E}(|\varepsilon_1|)\sum_k\beta(k)\frac{m+\hat m}{n}\right)_+\le \frac{C}{n}.$$

Lemma 6.3. If $\mathbb{E}(\varepsilon_1^6) < +\infty$ and $m_n$ is the largest value of $m$ such that $\Delta(m_n)/n\le 1$, then

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(1,3)}(t)]^2\right)\le \mathbb{E}\left(\sup_{t\in B_{m_n}(0,1)}[\nu_n^{(1,3)}(t)]^2\right)\le \frac{2\mathbb{E}(\varepsilon_1^6)}{n}.$$

For the study of $\nu_n^{(2)}(t)$, a result is given whose proof is detailed in Section 8.

Lemma 6.4. Let $\tau_n(t) = \nu_n^{(2)}(t) = (1/n)\sum_{k=1}^n\xi_{k+1}\sigma(X_k)u_t(Z_k)$. Under the assumptions of Theorem 3.1,

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\tau_n(t)]^2 - p_\tau(m,\hat m)\right)_+\le \frac{C}{n},$$

where $p_\tau(m, m') = \kappa\,\mathbb{E}(\sigma^2(X_1))\Psi(m\vee m')$.

For $\nu_n^{(3)}(t)$ we write $\nu_n^{(3)}(t) = \nu_n^{(3,1)}(t) + \nu_n^{(3,2)}(t)$ with

$$\nu_n^{(3,1)}(t) = \frac{1}{n}\sum_{k=1}^n\left[b(X_k)u_t(Z_k) - b(X_k)t(X_k)\right], \qquad \nu_n^{(3,2)}(t) = \frac{1}{n}\sum_{k=1}^n\left[b(X_k)t(X_k) - \langle t, \ell\rangle\right],$$

where $b(X_k)t(X_k) = \mathbb{E}_{(X)}[b(X_k)u_t(Z_k)]$. For $\nu_n^{(3,1)}(t)$ we can apply Talagrand's inequality conditionally on $(X)$; for $\nu_n^{(3,2)}(t)$, we can use approximation techniques. More precisely, using the same techniques as previously, we get:

Lemma 6.5. If $\mathbb{E}(b^8(X_1)) < +\infty$ and $(X_i)_{i\in\mathbb{N}}$ is arithmetically $\beta$-mixing with $\theta > 14$, then

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(3,1)}(t)]^2 - p_{3,1}(m,\hat m)\right)_+\le \frac{C}{n},$$

where $p_{3,1}(m, m') = K\mathbb{E}(b^2(X_1))\Psi(m\vee m')$, and

$$\mathbb{E}\left(\sup_{t\in B_{m,\hat m}(0,1)}[\nu_n^{(3,2)}(t)]^2 - K'\left(\mathbb{E}(b^4(X_1))\sum_k(k+1)\beta(k)\right)^{1/2}\frac{m+\hat m}{n}\right)_+\le \frac{C}{n},$$

where $K$ and $K'$ are numerical constants.

The proof of the result concerning $\nu_n^{(3,1)}$ follows the same lines as the proof of Lemma 6.4, which is detailed in Section 8, and is therefore omitted here. For $\nu_n^{(3,2)}$, the bound can be obtained directly by applying Talagrand's inequality (see Theorem 9.2) to this process if $b$ is bounded. As this is not assumed, we write $b = b_1 + b_2$ with $b_1(x) = b(x)\mathbb{1}_{|b(x)|\le n^{1/4}}$ and $b_2(x) = b(x)\mathbb{1}_{|b(x)| > n^{1/4}}$. This allows us to split the process into two parts and consequently to obtain the result under $\mathbb{E}(|b(X_1)|^8) < +\infty$ and $m_n\le\sqrt n$, where $m_n$ is the largest element of $\mathcal{M}_n$ (a condition which is fulfilled in our problem).
