2001 Éditions scientifiques et médicales Elsevier SAS. All rights reserved S0246-0203(00)01070-0/FLA
ON ESTIMATING THE DIFFUSION COEFFICIENT:
PARAMETRIC VERSUS NONPARAMETRIC
Marc HOFFMANN
Laboratoire de Probabilités et Modèles Aléatoires, CNRS-UMR 7599, Université Paris 6 et 7, 16, rue Clisson, 75013 Paris, France
Received 2 September 1999, revised 29 August 2000
ABSTRACT. – We consider the following problem: estimate the Lipschitz continuous diffusion coefficient $\sigma^2$ from the path of a 1-dimensional diffusion process sampled at times $i/n$, $i = 0, \ldots, n$, when we believe that $\sigma^2$ actually belongs to a smaller regular parametric set $\Sigma_0$. By introducing random normalizing factors in the risk function, we obtain confidence sets which can be essentially better than the minimax rate $n^{-1/3}$ of estimation for Lipschitz functions in diffusion models. With a prescribed confidence level $\alpha_n$, we show that the best attainable (random) rate is $(\sqrt{\log \alpha_n^{-1}}/n)^{2/5}$. We construct an optimal estimator and an optimal random normalizing factor in the sense of Lepski (1999).
This has consequences for classical estimation: our procedure is adaptive w.r.t. $\Sigma_0$ and enables us to test the hypothesis that $\sigma^2$ is parametric against a family of local alternatives with prescribed first- and second-type error probabilities.
AMS classification: 62G05; 62M05
1. Introduction
In this paper, we study the statistical estimation of the diffusion coefficient when one observes a 1-dimensional diffusion process at times $i/n$, $i = 0, \ldots, n$, and asymptotics are taken as $n \to \infty$. The sample size increases not because of a longer observation period but, rather, because of more frequent observations. This setting has been addressed by several authors, from both a parametric and a nonparametric point of view.
A brief summary of the state of the art yields the following conclusions:
(1) In regular parametric models, the LAMN property holds with rate $1/\sqrt{n}$ (see Dohnal [1], or more recently Gobet [3]), but the MLE is not tractable in general.
Computationally fast methods based on contrasts are known and possess good optimality properties as far as rates of convergence are concerned (Genon-Catalot and Jacod [2]).
E-mail address: hoffmann@math.jussieu.fr (M. Hoffmann).
(2) For nonparametric models, if the diffusion coefficient has smoothness of order $s$ (in a Sobolev or Hölder sense, for instance, though this can easily be embedded in a Besov space framework), estimators based on kernels (see Jacod [8]) or linear wavelet techniques (see [6]) achieve the rate $n^{-s/(1+2s)}$, under some restrictions which are specific to diffusion processes. Moreover, the rate $n^{-s/(1+2s)}$ has been proved to be optimal in the minimax sense when the diffusion coefficient possesses at least bounded derivatives up to order 2 (see [6]). However, from a practical point of view, the methods proposed have drawbacks and are not always easily implementable on numerical data.
(3) For nonparametric models with a low order of smoothness (precisely: with diffusion coefficient no more regular than Lipschitz continuous), the Nadaraya–Watson estimator, introduced in this context by Florens in [4] – historically the first nonparametric estimator of the diffusion coefficient – has good convergence properties and is easy to implement in practice. (In this paper, we also complete Florens' results by showing that the Nadaraya–Watson estimator achieves the rate $n^{-1/3}$ if the diffusion coefficient is Lipschitz continuous and that this rate is optimal in the minimax sense.)
A caricatural synthesis could be the following: theoretically optimal and computationally fast methods are known when the underlying model is either parametric and regular (take the contrast estimators of Genon-Catalot and Jacod; the optimal rate $1/\sqrt{n}$ is achievable) or nonparametric with a Lipschitz continuous diffusion coefficient (take the Nadaraya–Watson estimator of Florens; the optimal rate $n^{-1/3}$ is achievable).
In this paper, we address the following problem: how can we combine both technologies, and what precise mathematical consequences can we derive? We believe that such a question has some importance in practice: given two different methods, a practitioner – motivated by a specific experiment in, say, finance, biology or physics – would legitimately ask which one to choose. Of course, prior knowledge, an intuition, a suspicion or a guess about the underlying (parametric) structure of the model usually exists, and this should be taken into account, even at a mathematical level. Also, the answer we want to give must be numerically feasible and must quantify precisely the consequences of the choice (parametric versus nonparametric), especially when the initial suspicion turns out to be wrong. Our angle is thus the following: we believe that the diffusion coefficient has a given regular parametric structure, but we wish to take into account the possibility that this prior intuition is wrong, in which case the diffusion coefficient could be any Lipschitz continuous function within a certain nonparametric class.
To formulate and solve this problem mathematically, we use the notion of minimax risk with random normalizing factors, which is based on adaptive estimation and nonparametric testing (Theorems 1 and 2 below). The method is easily tractable on numerical data. The ideas developed here heavily rely on the work of Lepski [11].
However, Lepski considers in [11] a slightly different problem, in the white noise model. Therefore, both the techniques and the answers given here differ somewhat from those of his paper, and we borrow his formalism and mathematical devices rather than complete his theory.
Note also that our approach is different from robustness, where misspecified models are allowed. In general, such models are defined around tubular neighbourhoods of the original parametric model, at a distance vanishing as $n \to \infty$, an assumption we do not have to make here.
A by-product of our approach is that we complete Florens’ paper [4] (and also [6]) by showing that her estimator is optimal in the minimax sense under squared-error loss for Lipschitz continuous diffusion coefficients (Proposition 2), but we know from [9] that this is no longer true for a diffusion coefficient with a higher degree of smoothness.
1.1. Statistical setting
We observe $X^n = (X_{i/n},\ i = 0, \ldots, n)$, where $(X_t)_{t \in [0,1]}$ is a 1-dimensional diffusion process of the form
$$X_t = x_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(X_s)\,dW_s, \quad t \in [0,1], \qquad (1.1)$$
with $x_0 \in \mathbb{R}$, $W$ a standard Brownian motion, $b$ smooth, and $\sigma$ Lipschitz continuous and nonvanishing. Our aim is to estimate the function $(\sigma^2(x),\ x \in I)$ for an arbitrary compact interval $I$. In this setting, the drift $b$ cannot be identified from the data and is a nuisance parameter.
Formally, we take $X$ as the canonical process on the space $\Omega = C([0,1], \mathbb{R})$ of continuous functions equipped with the norm of uniform convergence, endowed with its Borel $\sigma$-field $\mathcal{F}$. We assume that the drift $b$ has linear growth, so (1.1) has a unique solution. We further denote by $P_{\sigma^2}$ the probability measure on $(\Omega, \mathcal{F})$ under which $X$ solves (1.1).
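The observation scheme can be mimicked numerically. The sketch below simulates a path of (1.1) with an Euler scheme on a fine grid and retains only the observed values $X_{i/n}$; the drift and diffusion coefficient chosen here are illustrative assumptions, not the paper's.

```python
import numpy as np

# Assumed coefficients for illustration only: drift b(s, x) = -x and a
# Lipschitz, nonvanishing diffusion coefficient sigma(x) = 1 + 0.5 sin(x).
def simulate_path(n, x0=0.0, seed=0, n_fine=50):
    """Euler scheme for (1.1) on [0, 1], returned on the grid i/n, i = 0..n."""
    rng = np.random.default_rng(seed)
    m = n * n_fine                       # simulation grid finer than observation grid
    dt = 1.0 / m
    x = np.empty(m + 1)
    x[0] = x0
    for i in range(m):
        drift = -x[i]
        sig = 1.0 + 0.5 * np.sin(x[i])
        x[i + 1] = x[i] + drift * dt + sig * np.sqrt(dt) * rng.standard_normal()
    return x[::n_fine]                   # keep only the observed values X_{i/n}

X = simulate_path(n=1000)
print(X.shape)  # (1001,)
```

Simulating on a finer grid than the observation grid keeps the Euler discretization error well below the statistical error of interest.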
There are several ways of assessing the quality of an estimation procedure. First, the estimation of $\sigma^2(x)$ at a point $x \in I$ is meaningful only if the process $X$ hits the point $x$ before time 1, that is, if $L_1^x(X) > 0$, where
$$L_1^x(X) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \int_0^1 \mathbf{1}_{\{|X_s - x| \le \varepsilon\}}\,ds$$
denotes the local time of $X$ at level $x$ and time 1. So if
$$D(x, \nu) = \{\omega \in \Omega:\ L_1^x(X)(\omega) \ge \nu\},$$
we shall restrict our attention to the set $D(x, \nu)$ for a given $\nu > 0$, fixed throughout the paper. However, the set $D(x, \nu)$ is not observable, so for practical purposes it is better to replace – as in Jacod [8] – the set $D(x, \nu)$ by a set $D_n(x, \nu)$ measurable w.r.t. the $\sigma$-field $\mathcal{G}_n$ generated by the $X_{i/n}$, $i = 0, \ldots, n$, at stage $n$. To do so, we introduce the following empirical local time:
$$L_n^x(X^n) = \frac{1}{2n^{2/3}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in [x - n^{-1/3},\ x + n^{-1/3}]\}}, \qquad (1.2)$$
which converges to $L_1^x(X)$ as $n \to \infty$ (see, e.g., [9]). The choice of the bandwidth $n^{-1/3}$ will prove to be technically useful. Define
$$D_n(x, \nu) = \{\omega \in \Omega:\ L_n^x(X^n)(\omega) \ge \nu\}$$
accordingly. We will further restrict our attention to $D_n(x, \nu)$. For $c > 1$, let $\Sigma_c = \{f:\ \mathbb{R} \to \mathbb{R};\ c^{-1} \le f(x) \le c\}$ and define the Lipschitz class
$$\Sigma = \Sigma_c(L) = \{f:\ \mathbb{R} \to \mathbb{R};\ |f(x) - f(y)| \le L|x - y|\} \cap \Sigma_c.$$
The space $\Sigma$ describes the minimal smoothness properties we require of the unknown parameter $\sigma^2$.
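The empirical local time (1.2) and the event $D_n(x, \nu)$ are directly computable from the sample; a minimal transcription, assuming the path is given as the array of observed values $X_{i/n}$:

```python
import numpy as np

def empirical_local_time(X, x):
    """Empirical local time (1.2): count the observed left endpoints
    X_{(i-1)/n} in the window [x - n^{-1/3}, x + n^{-1/3}], normalised
    by 2 n^{2/3}."""
    n = len(X) - 1
    h = n ** (-1.0 / 3.0)
    hits = np.sum(np.abs(X[:-1] - x) <= h)   # X_{(i-1)/n}, i = 1, ..., n
    return hits / (2.0 * n ** (2.0 / 3.0))

def in_D_n(X, x, nu):
    """The observable event D_n(x, nu): did X spend enough time near x?"""
    return empirical_local_time(X, x) >= nu
```

As a sanity check, for a path that never leaves the window (e.g. the constant path) all $n$ observations count and the estimate equals $n/(2n^{2/3}) = n^{1/3}/2$.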
An estimator $T_n = (T_n(X^n, x),\ x \in I)$ of $(\sigma^2(x),\ x \in I)$ is a function which is $\mathcal{G}_n \otimes \mathcal{B}(I)$-measurable; we evaluate its performance uniformly over $\Sigma$ by means of its minimax risk
$$R_n(T_n, \Sigma, \varphi_n) = \sup_{\sigma^2 \in \Sigma} E_{\sigma^2}\left[\varphi_n^{-2} \int_I \left(T_n(X^n, x) - \sigma^2(x)\right)^2 dx\ \Big|\ D_n^I(\nu)\right],$$
where $D_n^I(\nu) = \bigcap_{x \in I} D_n(x, \nu)$ and $\varphi_n > 0$ is a normalizing factor. Of course, the finiteness of $R_n$ is only meaningful if $\varphi_n \to 0$ as $n \to \infty$. Here, $E_{\sigma^2}$ means integration with respect to the probability $P_{\sigma^2}$. Thus we measure the quality of estimation in integrated quadratic loss, conditional on the event $D_n^I(\nu)$.
1.2. Statement of the problem and objectives
An estimator $T_n^*$ is said to attain an optimal rate of convergence $\varphi_n(\Sigma)$ if
$$\limsup_{n \to \infty} R_n\left(T_n^*, \Sigma, \varphi_n(\Sigma)\right) < +\infty \qquad (1.3)$$
and no estimator can attain a better rate:
$$\liminf_{n \to \infty} \inf_{T_n} R_n\left(T_n, \Sigma, \varphi_n(\Sigma)\right) > 0, \qquad (1.4)$$
where the infimum is taken over all estimators. In Section 2, we show that $\varphi_n(\Sigma) = n^{-1/3}$ is an optimal rate of convergence and prove that the Nadaraya–Watson estimator, introduced in this context by Florens in [4], attains the optimal rate (see also [8]). We understand $\varphi_n(\Sigma)$ as an accuracy of estimation: for any confidence level $\alpha > 0$, we guarantee from (1.3) the existence of an (explicitly computable) $\gamma_\alpha > 0$ such that
$$\sup_{\sigma^2 \in \Sigma} P_{\sigma^2}^{n,\nu}\left(\|T_n^* - \sigma^2\|_I \ge \gamma_\alpha \varphi_n(\Sigma)\right) \le \alpha, \qquad (1.5)$$
where $P_{\sigma^2}^{n,\nu}(\cdot) = P_{\sigma^2}\{\cdot \mid D_n^I(\nu)\}$ and $\|f\|_I = (\int_I f^2(x)\,dx)^{1/2}$. Furthermore, in the optimality sense described by (1.4), this accuracy is the best one achievable uniformly over $\Sigma$.
However, suppose we suspect $\sigma^2$ to actually lie in a smaller parametric set, namely $\sigma^2 \in \Sigma_0 = \Sigma_0(I)$ given by
$$\Sigma_0(I) = \left\{f \in \Sigma:\ f(x) = \sigma_0^2(x, \theta),\ \theta \in \Theta,\ x \in I\right\},$$
where $\Theta \subset \mathbb{R}^s$, $s \ge 1$, is given and the function $\sigma_0^2(\cdot, \theta)$ is known up to $\theta$.
Under some regularity assumptions on $\Theta$ and $\sigma_0^2$ (see Section 2 below), an optimal rate of convergence over $\Sigma_0$ is $\varphi_n(\Sigma_0) = n^{-1/2}$, attained by the least-squares estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$, where
$$\hat\theta_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \left((\Delta_i^n X)^2 - \sigma_0^2(X_{(i-1)/n}, \theta)\right)^2 \qquad (1.6)$$
and where we denote by $\Delta_i^n X = \sqrt{n}\,(X_{i/n} - X_{(i-1)/n})$ the normalized increments of the observed process. Based on the hypothesis $\sigma^2 \in \Sigma_0 \subset \Sigma$, we can hope to improve the accuracy of estimation. A traditional route to improvement is the so-called adaptive approach.
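The contrast (1.6) can be sketched on a hypothetical one-parameter family $\sigma_0^2(x, \theta) = \theta(1 + 0.1\cos x)$, with the compact set $\Theta$ approximated by a finite grid; both the family and the grid are assumptions made for illustration only.

```python
import numpy as np

def sigma0_sq(x, theta):
    # Hypothetical parametric family sigma_0^2(x, theta), theta in [0.5, 2].
    return theta * (1.0 + 0.1 * np.cos(x))

def least_squares_theta(X, thetas):
    """Minimise the least-squares contrast (1.6) over a finite grid
    approximating the compact parameter set Theta."""
    n = len(X) - 1
    incr2 = n * np.diff(X) ** 2          # squared normalised increments (Delta_i^n X)^2
    def contrast(th):
        return np.sum((incr2 - sigma0_sq(X[:-1], th)) ** 2)
    return min(thetas, key=contrast)
```

On a simulated driftless path with true parameter $\theta = 1.3$, the grid minimiser lands close to 1.3, consistent with the parametric rate.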
1.2.1. The adaptive approach
Intuitively, a practitioner would presumably: (1) test the hypothesis $\sigma^2 \in \Sigma_0$; (2) if the test accepts, choose the parametric estimator $T_n^{(0)}$; (3) keep the nonparametric estimator $T_n^*$ otherwise. From a mathematical point of view, such a procedure – call it temporarily $T_n^{(a)} = T_n^{(a)}(x, X^n)$ – is admissible if it adapts to the sets $(\Sigma_0, \Sigma)$ in the following sense: define the adaptive rate
$$\psi_n(\sigma^2) = \begin{cases} n^{-1/2} & \text{if } \sigma^2 \in \Sigma_0, \\ n^{-1/3} & \text{if } \sigma^2 \in \Sigma \setminus \Sigma_0. \end{cases} \qquad (1.7)$$
Then $T_n^{(a)}$ should verify
$$\limsup_{n \to \infty} R_n\left(T_n^{(a)}, \Sigma, \psi_n(\cdot)\right) < +\infty. \qquad (1.8)$$
However, even if we have satisfied the adaptive criterion (1.8), we are unable to state any accuracy for the method: since $\psi_n = \psi_n(\sigma^2)$ depends on the unknown parameter, we cannot provide any confidence set of the type (1.5).
In this paper, we propose an alternative approach by introducing a procedure based on random normalizing factors (r.n.f. for short), following Lepski [11] and [7]. This will enable us to improve the accuracy of estimation in the sense of (1.5). We will even show that a procedure based on r.n.f. can simultaneously give an improvement of accuracy and be adaptive in the sense of (1.8).
1.2.2. Random normalizing factors
We introduce the class of observable normalizing factors (r.n.f.)
$$\mathcal{P}_n = \left\{\rho_n \in (0, \varphi_n(\Sigma)]:\ \rho_n \text{ is } \mathcal{G}_n\text{-measurable}\right\},$$
where $\mathcal{G}_n$ is the $\sigma$-field generated by the observations $X_{i/n}$, $i = 0, \ldots, n$. Clearly, any estimator $T_n$ satisfying
$$\limsup_{n \to \infty} R_n(T_n, \Sigma, \rho_n) < \infty \quad \text{for some } \rho_n \in \mathcal{P}_n \qquad (1.9)$$
attains the optimal rate of convergence over $\Sigma$. But in contrast to an adaptive estimator, we now guarantee the existence of an explicitly computable $\gamma_\alpha$ from (1.9) such that for any $\alpha > 0$:
$$\sup_{\sigma^2 \in \Sigma} P_{\sigma^2}^{n,\nu}\left(\|T_n - \sigma^2\|_I \ge \gamma_\alpha \rho_n\right) \le \alpha,$$
where we have set – recall (1.5) – $P_{\sigma^2}^{n,\nu}(\cdot) = P_{\sigma^2}\{\cdot \mid D_n^I(\nu)\}$. This provides us with a new (possibly random) accuracy of estimation. The possibility that $\sigma^2$ belongs to $\Sigma_0$ may give a value of $\rho_n$ essentially better than $\varphi_n(\Sigma) = n^{-1/3}$ with some probability, while still ensuring a confidence set uniformly over $\Sigma$.
Next, we need a consistent way to compare two r.n.f. in order to define an optimality criterion. Since an r.n.f. is random, we introduce the following (deterministic) characteristic:
DEFINITION 1. – For a given confidence level $0 < \alpha_n < 1$, the characteristic of $\rho_n \in \mathcal{P}_n$ is
$$\chi_n(\rho_n) = \inf\left\{t \in (0, \varphi_n(\Sigma)]:\ \inf_{\sigma^2 \in \Sigma_0} P_{\sigma^2}^{n,\nu}(\rho_n \le t) \ge 1 - \alpha_n\right\}.$$
Note that $\chi_n(\rho_n)$ depends on $\alpha_n$ and on $\Sigma_0$.
Remark. – A heuristic way to understand this definition is the following: fix $t > 0$ "small", i.e., at least smaller than $\varphi_n(\Sigma)$. What we require is that a "good" $\rho_n$ will provide an improvement of accuracy if the guess ($\sigma^2 \in \Sigma_0$) turns out to be true.
This means that under $P_{\sigma^2}^{n,\nu}$, for $\sigma^2 \in \Sigma_0$, the event "$\rho_n \le t$" has a controlled probability.
Mathematically, we translate this idea by saying that for a given confidence level $\alpha_n$, we guarantee that
$$\inf_{\sigma^2 \in \Sigma_0} P_{\sigma^2}^{n,\nu}(\rho_n \le t) \ge 1 - \alpha_n. \qquad (1.10)$$
Then, the smaller the $t$ for which (1.10) holds, the better $\rho_n$; hence $\chi_n(\rho_n)$ is defined as the infimum of the $t$ for which (1.10) holds.
We now have a canonical way to compare r.n.f. We naturally derive the following optimality criterion:
DEFINITION 2. – $\rho_n^* \in \mathcal{P}_n$ is optimal (or $\alpha$-optimal) w.r.t. $(\Sigma, \Sigma_0)$ if
(i) There exists an estimator $T_n^{**}$ such that
$$\limsup_{n \to \infty} R_n(T_n^{**}, \Sigma, \rho_n^*) < +\infty.$$
(ii) For any $\rho_n \in \mathcal{P}_n$ such that
$$\frac{\chi_n(\rho_n)}{\chi_n(\rho_n^*)} \to 0 \quad \text{as } n \to \infty,$$
we have
$$\liminf_{n \to \infty} \inf_{T_n} R_n(T_n, \Sigma, \rho_n) = +\infty,$$
where the infimum is taken over all estimators.
Remark 2. – Following Lepski, we call $T_n^{**}$ an $\alpha$-adaptive estimator. Note that by definition, we always have $\rho_n^* \le \varphi_n(\Sigma)$. Thus an $\alpha$-adaptive estimator is optimal with respect to $\Sigma$. Also, note that an $\alpha$-optimal random normalizing factor may in general depend on the quantity $\alpha_n$.
The aim of this paper is to construct an optimal random normalizing factor w.r.t. $(\Sigma, \Sigma_0)$ following Definition 2, and an $\alpha$-optimal estimator accordingly.
1.3. Organization of the paper
In Section 2, we recall and adapt some facts about statistical estimation of the diffusion coefficient from discrete observations. The nonparametric kernel estimator of Section 2.2 was introduced in [4] and later generalized to our setting in [8]. However, we give a self-contained proof of the upper bound. The lower bound is new, and follows the same strategy as Proposition 1 in [6], with new technicalities.
Section 3 is devoted to the construction of an optimal r.n.f. for the diffusion coefficient under a parametric hypothesis. We discuss the link to adaptive estimation and show how a slight modification of the optimal ($\alpha$-adaptive) estimator enables us to obtain simultaneously an optimal accuracy of estimation and an adaptive estimator. Links to testing are also mentioned. The proofs are deferred to Section 4.
2. Preliminary results

2.1. Parametric estimation
We need the following regularity assumptions on $\Theta$ and $\sigma_0^2$.
Assumption A. – We have
(1) The set $\Theta$ is compact in $\mathbb{R}^s = \mathbb{R}$.
(2) For some $M = M_I > 0$, $\sup_{x \in I} |\sigma_0^2(x, \theta_1) - \sigma_0^2(x, \theta_2)| \le M|\theta_1 - \theta_2|$ for $\theta_1, \theta_2 \in \Theta$.
(3) For some $\eta = \eta_I > 0$, $\inf_{(x,\theta) \in I \times \Theta} \left|\frac{d}{d\theta}\sigma_0^2(x, \theta)\right| \ge \eta$.
(4) The functions $\frac{d^2}{dx^2}\sigma_0^2$ and $\frac{d}{dx}\frac{d}{d\theta}\sigma_0^2$ are well defined and continuous on $I \times \Theta$.
(5) The equality $\sigma_0^2(x, \theta_1) = \sigma_0^2(x, \theta_2)$ for all $x \in I$ implies $\theta_1 = \theta_2$.
Remark 1. – The above assumptions are standard in parametric estimation but do not claim to be minimal. For instance, A5 can be relaxed but known extensions are usually difficult to check (see, e.g., [2]).
By Itô's formula, the model given by (1.1) can be recast in a regression setting:
$$(\Delta_i^n X)^2 = n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds + \varepsilon_i^n + \text{a higher order term}, \qquad (2.11)$$
where $\varepsilon_i^n = 2n\int_{(i-1)/n}^{i/n} (X_s - X_{(i-1)/n})\,\sigma(X_s)\,dW_s$ is a martingale increment. Thus we observe, on the non-uniform random grid $(X_{(i-1)/n},\ i = 1, \ldots, n)$, the value $n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds \sim \sigma^2(X_{(i-1)/n})$, contaminated by the noise $\varepsilon_i^n$, plus a negligible drift effect. From this formulation, we readily obtain the least-squares estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$, where $\hat\theta_n$ is defined by (1.6).
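The regression formulation (2.11) can be checked numerically: on a driftless toy model (the choice $\sigma^2(x) = 1 + 0.5\sin^2 x$ is an assumption for illustration), the squared normalised increments divided by $\sigma^2(X_{(i-1)/n})$ average to 1, i.e. the noise $\varepsilon_i^n$ centres at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = np.empty(n + 1)
X[0] = 0.0
for i in range(n):
    s2 = 1.0 + 0.5 * np.sin(X[i]) ** 2           # sigma^2 at the left endpoint
    X[i + 1] = X[i] + np.sqrt(s2 / n) * rng.standard_normal()

incr2 = n * np.diff(X) ** 2                       # (Delta_i^n X)^2
ratio = incr2 / (1.0 + 0.5 * np.sin(X[:-1]) ** 2)
print(ratio.mean())                               # close to 1
```

The individual ratios fluctuate like squared Gaussians; only their local averages are informative, which is what the estimators below exploit.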
PROPOSITION 1. – Grant Assumption A. Then $\varphi_n(\Sigma_0) = n^{-1/2}$ is an optimal rate of convergence and is attained by $T_n^{(0)}$.
The proof of the upper bound is readily obtained from Theorem 1 in [2]. We simply added ad hoc assumptions in order to obtain uniformity in $\theta \in \Theta$ for the integrated risk. The proof of the lower bound follows from the LAMN property of the parametric model (see, e.g., [1]).
Remark 2. – Note that $\hat\theta_n$ is not the best available parametric estimator of $\theta$ – it is not equivalent to the MLE – but since we focus on rates of convergence only, this intuitively simple choice is sufficient.
2.2. Nonparametric estimation
We assume (with no loss of generality as far as practical considerations are concerned) that $I$ is on a dyadic scale, namely
$$I = \left[k_0 2^{-j_0},\ k_1 2^{-j_0}\right], \quad \text{for some integers } k_0, k_1, j_0.$$
For integers $(k, j)$, let $I_{jk} = [k2^{-j}, (k+1)2^{-j})$ and define, for $j \ge j_0$,
$$V_j^I = \left\{f \in L^2:\ f \text{ is constant on each } I_{jk},\ I_{jk} \subset I\right\},$$
the finite element space of functions which are piecewise constant on $I$ over a grid of mesh $2^{-j}$. An orthogonal basis for $V_j^I$ is given by the family
$$\phi_{jk} = 2^{j/2}\phi(2^j \cdot - k), \quad k \text{ such that } I_{jk} \subset I,$$
where $\phi = \mathbf{1}_{[0,1)}$. We estimate $\sigma^2$ by an element of $V_{j_n}^I$, for some projection level $j_n$ chosen in accordance with the asymptotics in the following way. Let
$$\hat c_{j_n k} = \frac{1}{n\,\hat\mu_{j_n k}} \sum_{i=1}^n (\Delta_i^n X)^2\, \phi_{j_n k}(X_{(i-1)/n})$$
(with $0/0 = 0$), where $\hat\mu_{j_n k} = \frac{2^{j_n}}{n} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{j_n k}\}}$. Informally, we use the regression analogy
$$(\Delta_i^n X)^2 \sim \sigma^2(X_{(i-1)/n}) + \varepsilon_i^n$$
from (2.11), and we weight the local average by an approximation $\hat\mu_{j_n k}$ of the time spent by the process $X$ in $I_{j_n k}$. Finally, the nonparametric estimator $T_n^*(x)$ of $\sigma^2(x)$ is defined by
$$T_n^*(x) = \sum_{k:\ I_{j_n k} \subset I} \hat c_{j_n k}\, \phi_{j_n k}(x)$$
and is specified by the projection level $j_n$. The performances of $T_n^*$ are summarized in the following result. For $x \in \mathbb{R}$, we denote by $\lfloor x \rfloor$ the integer part of $x$.
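A sketch of $T_n^*$ on a numerical path: since $\phi_{j_n k} = 2^{j_n/2}\mathbf{1}_{I_{j_n k}}$, the normalisation factors cancel and the value of the estimator on a cell reduces to the plain average of the squared normalised increments originating from that cell. The function names and the interval below are illustrative assumptions.

```python
import numpy as np

def histogram_estimator(X, jn, I=(-1.0, 1.0)):
    """Piecewise-constant estimator of Section 2.2: on each dyadic cell of
    mesh 2^{-jn}, average (Delta_i^n X)^2 over the indices i whose left
    endpoint X_{(i-1)/n} lies in the cell (never-visited cells stay NaN)."""
    n = len(X) - 1
    incr2 = n * np.diff(X) ** 2
    w = 2.0 ** (-jn)
    edges = np.arange(I[0], I[1] + w / 2, w)
    prev = X[:-1]
    vals = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        inside = (prev >= edges[k]) & (prev < edges[k + 1])
        if inside.any():
            vals[k] = incr2[inside].mean()
    return edges, vals

def T_star(x, edges, vals):
    """Evaluate the piecewise-constant estimator at a point x inside I."""
    k = np.searchsorted(edges, x, side="right") - 1
    return vals[k]
```

On a driftless path with constant true coefficient $\sigma^2 = 1.5$ and $j_n \approx \frac{1}{3}\log_2 n$, the estimate near the starting point is close to 1.5.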
PROPOSITION 2. – An optimal rate of convergence over $\Sigma$ is $\varphi_n(\Sigma) = n^{-1/3}$. Moreover, $T_n^*$, calibrated with $j_n = \lfloor \frac{1}{3}\log_2 n \rfloor \vee j_0$, is optimal for the criterion given by (1.3) and (1.4).
3. Main result
This section is devoted to the construction of an optimal r.n.f. $\rho_n^*$ in the sense of Definition 2. Accordingly, we construct an $\alpha$-adaptive estimator w.r.t. $(\Sigma_0, \Sigma)$. Our algorithm can be described as follows:
(1) Estimate the distance $\hat d_n$ (for the $\|\cdot\|_I$ seminorm) between $\sigma^2$ and $\Sigma_0$.
(2) Take $\rho_n^* = n^{-1/3}$ if $\hat d_n$ is above some threshold level (possibly depending on the confidence level $\alpha_n$).
(3) Take $\rho_n^* = \varphi_{n,\alpha_n} > 0$ otherwise, for some normalizing factor $\varphi_{n,\alpha_n}$ (tuned with the asymptotics).
We will show that we can take $\varphi_{n,\alpha_n}$ converging to 0 faster than $n^{-1/3}$ by a polynomial power. The value $\varphi_{n,\alpha_n}$ corresponds to accepting that $\sigma^2 \in \Sigma_0$ and measures the improvement of the accuracy of estimation. The assumptions on the parametric family $\Sigma_0$ are less stringent than in Section 2.
Assumption B. – We have
(1) The set $\Theta$ is compact in $\mathbb{R}^s$, $s \ge 1$.
(2) There exist $\mu > 0$ and $M = M_{I,\mu} > 0$ such that $\sup_{x \in I} |\sigma_0^2(x, \theta_1) - \sigma_0^2(x, \theta_2)| \le M\|\theta_1 - \theta_2\|^\mu$ for $\theta_1, \theta_2 \in \Theta$.
(3) For all $\theta \in \Theta$, $\sigma_0^2(\cdot, \theta) \in \Sigma$.
(4) There exist $\theta_0 \in \Theta$ and $\tilde L < L$ such that $\sigma_0^2(\cdot, \theta_0) \in \Sigma_c(\tilde L)$.

3.1. Construction of $\rho_n^*$ and main result
For $J_n \ge j_0$ and $\theta \in \Theta$, define
$$d_n(\theta) = \sum_{k:\ I_{J_n k} \subset I} \left(\hat c_{J_n k} - c_{J_n k}\left(\sigma_0^2(\cdot, \theta)\right)\right)^2,$$
where
$$c_{J_n k}\left(\sigma_0^2(\cdot, \theta)\right) = 2^{J_n/2} \int_{I_{J_n k}} \sigma_0^2(x, \theta)\,dx.$$
By Assumption B2, there exists $C = C(\mu, M_{I,\mu})$ such that
$$|d_n(\theta_1) - d_n(\theta_2)| \le C\|\theta_1 - \theta_2\|^\mu \quad \text{for } \theta_1, \theta_2 \in \Theta.$$
This, together with Assumption B1, ensures the existence of $\theta_n^* \in \Theta$, measurable w.r.t. $\mathcal{G}_n$ and solving
$$d_n(\theta_n^*) = \inf_{\theta \in \Theta} d_n(\theta).$$
For technical reasons, we need to compensate the variance of $d_n$ as follows. Let
$$C_{J_n k} = \frac{2}{3n^2 \hat\mu_{J_n k}^2} \sum_{i=1}^n (\Delta_i^n X)^4\, \phi_{J_n k}^2(X_{(i-1)/n})$$
and
$$\hat d_n(\theta_n^*) = d_n(\theta_n^*) - \sum_{k:\ I_{J_n k} \subset I} C_{J_n k}.$$
The estimate $\hat d_n(\theta_n^*)$ will determine the following decision rule. For $\alpha_n > 0$, let $J_n = \lfloor \frac{2}{5}\log_2(n/\sqrt{\log \alpha_n^{-1}}) \rfloor \vee j_0$ and
$$\varphi_{n,\alpha_n} = \left(\frac{\sqrt{\log \alpha_n^{-1}}}{n}\right)^{2/5}.$$
Note that $2^{-J_n}$ and $\varphi_{n,\alpha_n}$ coincide up to a constant. Define now the threshold
$$\lambda = A(10, c, \nu)^{-1/4},$$
where $A(t, c, \nu)$ is specified in (4.17); see the proof of Lemma 3. (The choice of $\lambda$ will become transparent in the proof of Theorem 1 in Section 4 below, when Lemma 3 is used.) The decision rule then takes the form
$$\rho_n^* = \varphi_{n,\alpha_n}\, \mathbf{1}_{\{\hat d_n(\theta_n^*) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + n^{-1/3}\, \mathbf{1}_{\{\hat d_n(\theta_n^*) > \lambda^2 \varphi_{n,\alpha_n}^2\}}$$
and
$$\bar T_n(x) = \sum_{k:\ I_{J_n k} \subset I} c_{J_n k}\left(\sigma_0^2(\cdot, \theta_n^*)\right) \phi_{J_n k}(x).$$
Finally, our estimator of $\sigma^2(x)$ is
$$T_n^{**}(x) = \bar T_n(x)\, \mathbf{1}_{\{\rho_n^* = \varphi_{n,\alpha_n}\}} + T_n^*(x)\, \mathbf{1}_{\{\rho_n^* = n^{-1/3}\}},$$
where $T_n^*$ is the nonparametric estimator of Section 2.2, specified with $j_n = \lfloor \frac{1}{3}\log_2 n \rfloor \vee j_0$.
THEOREM 1. – Grant Assumption B. Assume that $n^{-a} \le \alpha_n < 2^{-4}$ for some arbitrary $a > 0$. Then $\rho_n^*$ is an optimal random normalizing factor w.r.t. $(\Sigma_0, \Sigma)$ and $T_n^{**}$ is $\alpha$-adaptive.
Remark 1. – In particular, we see that if our parametric assumption is correct, then with prescribed confidence $1 - \alpha_n$ we are able to improve (asymptotically) the accuracy of estimation, i.e., the size of the confidence band we construct, by a factor $\varphi_{n,\alpha_n} = (\sqrt{\log \alpha_n^{-1}}/n)^{2/5}$.
Remark 2. – The improvement of the random confidence band is of a polynomial order, but is limited by the size of $\alpha_n$. However, the restriction $\alpha_n \ge n^{-a}$ ensures that it is at least of order $(\sqrt{\log n}/n)^{2/5}$.
Remark 3. – For practical purposes, it seems more clever to replace $\nu$ in the definition of $\lambda$ by $\inf_{k:\ I_{J_n k} \subset I} \hat\mu_{J_n k}$. For technical reasons, however, we are unable to prove Theorem 1 in this setting. Note also that the practical implementation of $\rho_n^*$ is easy: the computation of the $\hat c_{J_n k}$ is reasonably fast, and the cardinality (in $k$) of the $\hat c_{J_n k}$ is of order $\log n$. Likewise for the $C_{J_n k}$. Eventually, the minimization problem $\arg\min_\theta d_n(\theta)$ has the same complexity as the computation of a standard parametric estimator. Once $\rho_n^*$ is computed, one readily computes either $\bar T_n$ (which is no more difficult to obtain than $T_n^*$) or $T_n^*$ itself, the standard Nadaraya–Watson estimator.
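The decision rule of Section 3.1 can be sketched end to end. Everything below is an illustrative assumption: the parametric family is the set of constants $\sigma_0^2(x, \theta) = \theta$, the interval and the grid for $\Theta$ are arbitrary, a `min_count` guard stands in for the local-time condition, and the threshold constant `lam` replaces the paper's $\lambda = A(10, c, \nu)^{-1/4}$, which depends on (4.17) and is not reproduced here.

```python
import numpy as np

def decision_rule(X, alpha_n, thetas, lam=5.0, I=(-0.5, 0.5), min_count=100):
    """Return a schematic random normalizing factor rho_n^*: phi_{n,alpha_n}
    when the compensated distance hat{d}_n(theta_n^*) stays under the
    threshold lam^2 * phi^2, and the nonparametric rate n^{-1/3} otherwise."""
    n = len(X) - 1
    la = np.log(1.0 / alpha_n)
    phi = (np.sqrt(la) / n) ** 0.4                       # phi_{n, alpha_n}
    Jn = max(int(round(0.4 * np.log2(n / np.sqrt(la)))), 1)
    w = 2.0 ** (-Jn)
    edges = np.arange(I[0], I[1] + w / 2, w)
    prev, incr2 = X[:-1], n * np.diff(X) ** 2

    avg, cnt, fourth = [], [], []                        # per-cell statistics
    for k in range(len(edges) - 1):
        m = (prev >= edges[k]) & (prev < edges[k + 1])
        if m.sum() >= min_count:                         # crude stand-in for the
            avg.append(incr2[m].mean())                  # local-time condition
            cnt.append(m.sum())
            fourth.append(np.sum(incr2[m] ** 2))
    avg, cnt, fourth = map(np.array, (avg, cnt, fourth))

    # d_n(theta): squared distance between empirical and model coefficients;
    # for constant sigma_0^2 = theta the Haar factors reduce to the cell width w
    d_n = min(np.sum(w * (avg - th) ** 2) for th in thetas)
    # variance compensation sum_k C_{Jn,k}
    comp = np.sum(2.0 * w * fourth / (3.0 * cnt ** 2))
    d_hat = d_n - comp
    return phi if d_hat <= (lam * phi) ** 2 else n ** (-1.0 / 3.0)
```

On a driftless path with constant $\sigma^2 = 1$ (inside the family) the rule returns the improved factor $\varphi_{n,\alpha_n}$; with $\sigma^2(x) = 1 + 0.8\cos 4x$ (far from any constant) it falls back to $n^{-1/3}$.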
3.2. Discussion
3.2.1. Links to adaptive estimation
We show in this paragraph how a simple modification of $T_n^{**}$ provides us with an adaptive estimator – in the usual sense of (1.8) – without losing optimality in terms of r.n.f. Assumption A is in force here. We consider the estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$ introduced in Section 1.2, where $\hat\theta_n$ solves (1.6). Note that $T_n^{(0)}$ is well defined thanks to Assumption A. Let
$$T_n^{(a)}(x) = T_n^{(0)}(x)\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + T_n^*(x)\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) > \lambda^2 \varphi_{n,\alpha_n}^2\}}.$$
Define the random normalizing factor $\rho_n^{(a)}$ accordingly:
$$\rho_n^{(a)} = \varphi_{n,\alpha_n}\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + n^{-1/3}\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) > \lambda^2 \varphi_{n,\alpha_n}^2\}}.$$
THEOREM 2. – Grant Assumption A. We have
(i) The estimator $T_n^{(a)}$ is $\alpha$-adaptive w.r.t. $(\Sigma_0, \Sigma)$.
(ii) If moreover $\alpha_n = O(n^{-1})$, the estimator $T_n^{(a)}$ is adaptive in the usual sense w.r.t. $(\Sigma_0, \Sigma)$:
$$\limsup_{n \to \infty} R_n\left(T_n^{(a)}, \Sigma, \psi_n(\cdot)\right) < \infty,$$
where $\psi_n(\sigma^2)$ denotes the adaptive rate defined by (1.7).
The proof of (i) is delayed until Section 4. The proof of (ii) is a direct consequence of (i) and Proposition 2 in [11].
Remark 1. – The decision rule $\mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}}$ answers our original question: given a parametric procedure $T_n^{(0)}$ versus a nonparametric one $T_n^*$, which one shall we use in practice? The answer 1 to our test yields the choice $T_n^{(0)}$, whereas the answer 0 yields $T_n^*$. The precise mathematical consequences of this choice are described by the r.n.f. $\rho_n^{(a)}$. Moreover, this choice is optimal in the sense of Definition 2.
Remark 2. – Again – see Remark 3 in Section 3.1 – we can see that the practical implementation of $T_n^{(a)}$ is fast and has complexity no worse than that of $T_n^*$.
3.2.2. Links to testing
We explore in this section another virtue of $\rho_n^*$, namely the possibility of building a test of the hypothesis $\sigma^2 \in \Sigma_0$ against a family of local alternatives. More precisely, given $h > 0$ and $0 < \alpha < 1$, define
$$W_n(h, I, \alpha) = \left\{f \in \Sigma_c(L):\ \inf_{\theta \in \Theta} \|f - \sigma_0^2(\cdot, \theta)\|_I \ge h \cdot \varphi_{n,\alpha}\right\}.$$
THEOREM 3. – Grant Assumption A. For any $0 < \alpha, \beta < 1$, we have, for large enough $n$:
(i) $\sup_{\sigma^2 \in \Sigma_0(I)} P_{\sigma^2}^{n,\nu}(\rho_n^* = n^{-1/3}) \le \alpha$.
(ii) There exists $h(\beta) > 0$ such that
$$\sup_{\sigma^2 \in W_n(h(\beta), I, \alpha)} P_{\sigma^2}^{n,\nu}\left(\rho_n^* = \varphi_{n,\alpha}\right) \le \beta.$$
In words, the hypothesis $\sigma^2 \in \Sigma_0(I)$ can be tested against the family of local alternatives $\sigma^2 \in W_n(h(\beta), I, \alpha)$ with prescribed first- and second-type error probabilities.
The proof of (i) readily follows from (4.27) in the proof of Theorem 1 below. The proof of (ii) follows from Theorem 1 together with Proposition 3 in [11].
4. Proofs
The proof of Proposition 2 can be read independently from that of Theorems 1 and 2. The reader is however invited to first scan the notation and preliminaries section.
4.1. Notation and preliminaries
For $\theta \in \Theta$, we abbreviate $P_{\sigma_0^2(\cdot,\theta)}$ by $P_\theta$ when no confusion is possible. For $\sigma^2 \in \Sigma$, let $\tilde P_{\sigma^2}$ denote the law of the process $Y$ such that $dY_t = \sigma(Y_t)\,dW_t$, $Y_0 = x_0$. Define $\tilde P_{\sigma^2}^{n,\nu}$ accordingly. We denote by $C$ a generic constant, possibly varying from line to line, which may depend on $I$. Any other dependence will be explicitly mentioned.
4.1.1. Preliminary decompositions
(a) For $\sigma^2 \in \Sigma$, define
$$a_k^n(\sigma^2) = \frac{2^{3J_n/2}}{n\,\hat\mu_{J_n k}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \int_{I_{J_n k}} \tau_i^n(x, X)\,dx,$$
$$b_k^n(\sigma^2) = \frac{2^{J_n/2}}{n\,\hat\mu_{J_n k}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}}\, \varepsilon_i^n,$$
where
$$\tau_i^n(x, X) = \sigma^2(x) - n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds. \qquad (4.12)$$
Define the random variable
$$\Delta_n(\sigma^2) = \sum_{k:\ I_{J_n k} \subset I} \left(c_{J_n k}\left(\sigma_0^2(\cdot, \theta_n^*) - \sigma^2\right) - a_k^n(\sigma^2)\right)^2, \qquad (4.13)$$
where – recall Section 2 – we denote $c_{jk}(\sigma^2) = 2^{j/2}\int_{I_{jk}} \sigma^2(x)\,dx$. Using that $(\Delta_i^n X)^2 = n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds + \varepsilon_i^n$ under $\tilde P_{\sigma^2}$, we have
$$\hat c_{J_n k} - c_{J_n k}(\sigma^2) = a_k^n(\sigma^2) + b_k^n(\sigma^2).$$
We thus obtain the following decomposition:
$$\hat d_n(\theta_n^*) = \Delta_n(\sigma^2) + M_n(\sigma^2) + N_n(\sigma^2),$$
having
$$M_n(\sigma^2) = 2\sum_{k:\ I_{J_n k} \subset I} \left(c_{J_n k}\left(\sigma^2 - \sigma_0^2(\cdot, \theta_n^*)\right) + a_k^n(\sigma^2)\right) b_k^n(\sigma^2),$$
$$N_n(\sigma^2) = \sum_{k:\ I_{J_n k} \subset I} \left(b_k^n(\sigma^2)^2 - C_{J_n k}\right).$$
(b) Define $\psi(x) = \mathbf{1}_{[0,1/2)}(x) - \mathbf{1}_{[1/2,1)}(x)$. If $d_{jk}(\sigma^2) = \int \sigma^2(x)\psi_{jk}(x)\,dx$ is the wavelet coefficient of $\sigma^2$ in the Haar basis, then from the multiscale decomposition of $\sigma^2$ we have, by Parseval's identity,
$$\|\bar T_n - \sigma^2\|_I^2 = \sum_{k:\ I_{J_n k} \subset I} c_{J_n k}^2\left(\sigma^2 - \sigma_0^2(\cdot, \theta_n^*)\right) + \mathcal{J}_n(\sigma^2) \le 2\Delta_n(\sigma^2) + 2\sum_{k:\ I_{J_n k} \subset I} a_k^n(\sigma^2)^2 + \mathcal{J}_n(\sigma^2),$$
where the remainder term $\mathcal{J}_n(\sigma^2) = \sum_{j \ge J_n} \sum_{k:\ I_{jk} \subset I} d_{jk}^2(\sigma^2)$ satisfies
$$\mathcal{J}_n(\sigma^2) \le C(c, L)\,2^{-2J_n} = C(c, L)\,\varphi_{n,\alpha_n}^2,$$
since $\sigma^2 \in \Sigma$. Note that
$$\left|\tau_i^n(x, X)\right| \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \le L\left(2^{-J_n} + n\int_{(i-1)/n}^{i/n} |X_s - X_{(i-1)/n}|\,ds\right), \qquad (4.14)$$
therefore $|\tau_i^n(x, X)|\,\mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \le L(2^{-J_n} + n^{-1/2}r_i^n(\sigma^2))$, and since we have $\sup_{\sigma^2 \in \Sigma} \tilde E_{\sigma^2}^{n,\nu}(|r_i^n(\sigma^2)|^p) \le C_p$ for all $p \ge 1$ by the Burkholder–Davis–Gundy inequality