2001 Éditions scientifiques et médicales Elsevier SAS. All rights reserved S0246-0203(00)01070-0/FLA
ON ESTIMATING THE DIFFUSION COEFFICIENT:
PARAMETRIC VERSUS NONPARAMETRIC
Marc HOFFMANN
Laboratoire de Probabilités et Modèles Aléatoires, CNRS-UMR 7599, Université Paris 6 et 7, 16, rue Clisson, 75013 Paris, France
Received 2 September 1999, revised 29 August 2000
ABSTRACT. – We consider the following problem: estimate the Lipschitz continuous diffusion coefficient $\sigma^2$ from the path of a 1-dimensional diffusion process sampled at times $i/n$, $i = 0, \ldots, n$, when we believe that $\sigma^2$ actually belongs to a smaller regular parametric set $\Sigma_0$. By introducing random normalizing factors in the risk function, we obtain confidence sets which can be essentially better than the minimax rate $n^{-1/3}$ of estimation for Lipschitz functions in diffusion models. With a prescribed confidence level $\alpha_n$, we show that the best attainable (random) rate is $(\sqrt{\log \alpha_n^{-1}}/n)^{2/5}$. We construct an optimal estimator and an optimal random normalizing factor in the sense of Lepski (1999).
This has consequences for classical estimation: our procedure is adaptive w.r.t. $\Sigma_0$ and enables us to test the hypothesis that $\sigma^2$ is parametric against a family of local alternatives with prescribed first- and second-type error probabilities.
AMS classification: 62G05; 62M05
1. Introduction
In this paper, we study the statistical estimation of the diffusion coefficient when one observes a 1-dimensional diffusion process at times $i/n$, $i = 0, \ldots, n$, and asymptotics are taken as $n \to \infty$. The sample size increases not because of a longer observation period but, rather, because of more frequent observations. This setting has been addressed by several authors, from both a parametric and a nonparametric point of view.
A brief summary of the state of the art yields the following conclusions:
(1) In regular parametric models, the LAMN property holds with rate $1/\sqrt{n}$ (see Dohnal [1], or more recently Gobet [3]), but the MLE is not tractable in general.
Computationally fast methods based on contrasts are known and possess good optimality properties as far as rates of convergence are concerned (Genon-Catalot and Jacod [2]).
E-mail address: hoffmann@math.jussieu.fr (M. Hoffmann).
(2) For nonparametric models, if the diffusion coefficient has smoothness of order $s$ (in a Sobolev or Hölder sense, for instance, though this can easily be embedded in a Besov space framework), estimators based on kernels (see Jacod [8]) or linear wavelet techniques (see [6]) achieve the rate $n^{-s/(1+2s)}$, under some restrictions which are specific to diffusion processes. Moreover, the rate $n^{-s/(1+2s)}$ has been proved to be optimal in the minimax sense when the diffusion coefficient possesses at least bounded derivatives up to order 2 (see [6]). However, from a practical point of view, the methods proposed have drawbacks and are not always easily implementable on numerical data.
(3) For nonparametric models with a low order of smoothness (precisely: with diffusion coefficient no more regular than Lipschitz continuous), the Nadaraya–Watson estimator, introduced in this context by Florens in [4] – historically the first nonparametric estimator of the diffusion coefficient – has good convergence properties and is easy to implement in practice. (In this paper, we also complete Florens' results by showing that the Nadaraya–Watson estimator achieves the rate $n^{-1/3}$ if the diffusion coefficient is Lipschitz continuous and that this rate is optimal in the minimax sense.)
A caricatural synthesis could be the following: theoretically optimal and computationally fast methods are known when the underlying model is either parametric and regular (take the contrast estimators of Genon-Catalot and Jacod; the optimal rate $1/\sqrt{n}$ is achievable) or nonparametric with a Lipschitz continuous diffusion coefficient (take the Nadaraya–Watson estimator of Florens; the optimal rate $n^{-1/3}$ is achievable).
In this paper, we address the following problem: how can we combine both technologies, and what precise mathematical consequences can we derive? We believe that such a question has some importance in practice: given two different methods, a practitioner – motivated by a specific experiment in, say, finance, biology or physics – would legitimately ask which one to choose. Of course, prior knowledge, an intuition, a suspicion or a guess about the underlying (parametric) structure of the model usually exists, and this should be taken into account, even at a mathematical level. Also, the answer we want to give must be numerically feasible and must quantify precisely the consequences of the choice (parametric versus nonparametric), especially when the initial suspicion turns out to be wrong. Our angle is thus the following: we believe that the diffusion coefficient has a given regular parametric structure, but we wish to take into account the possibility that this prior intuition is wrong, in which case the diffusion coefficient could be any Lipschitz continuous function within a certain nonparametric class.
To formulate and solve this problem mathematically, we use the notion of minimax risk with random normalizing factors, which is based on adaptive estimation and nonparametric testing (Theorems 1 and 2 below). The method is easily tractable on numerical data. The ideas developed here heavily rely on the work of Lepski [11].
However, Lepski considers in [11] a slightly different problem, in the white noise model. Therefore, both the techniques and the answers given here differ somewhat from those of his paper, and we borrow his formalism and mathematical devices rather than complete his theory.
Note also that our approach is different from robustness, where misspecified models are allowed. In general, such models are defined around tubular neighbourhoods of the original parametric model, at a distance vanishing as $n \to \infty$, an assumption we do not have to make here.
A by-product of our approach is that we complete Florens’ paper [4] (and also [6]) by showing that her estimator is optimal in the minimax sense under squared-error loss for Lipschitz continuous diffusion coefficients (Proposition 2), but we know from [9] that this is no longer true for a diffusion coefficient with a higher degree of smoothness.
1.1. Statistical setting
We observe $X^n = (X_{i/n},\ i = 0, \ldots, n)$, where $(X_t)_{t \in [0,1]}$ is a 1-dimensional diffusion process of the form
$$X_t = x_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(X_s)\,dW_s, \quad t \in [0,1], \qquad (1.1)$$
with $x_0 \in \mathbb{R}$, $W$ a standard Brownian motion, $b$ smooth, and $\sigma$ Lipschitz continuous and nonvanishing. Our aim is to estimate the function $(\sigma^2(x),\ x \in I)$ for an arbitrary compact interval $I$. In this setting, the drift $b$ cannot be identified from the data and is a nuisance parameter.
Formally, we take $X$ as the canonical process on the space $\Omega = C([0,1], \mathbb{R})$ of continuous functions equipped with the norm of uniform convergence, endowed with its Borel $\sigma$-field $\mathcal{F}$. We assume that the drift $b$ has linear growth, so (1.1) has a unique solution. We further denote by $P_{\sigma^2}$ the probability measure on $(\Omega, \mathcal{F})$ under which $X$ solves (1.1).
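The observation scheme can be mimicked numerically. The sketch below simulates a path of (1.1) with an Euler scheme on a fine grid and retains only the observed values $X_{i/n}$; the drift and diffusion coefficient chosen here are illustrative assumptions, not the paper's.

```python
import numpy as np

# Assumed coefficients for illustration only: drift b(s, x) = -x and a
# Lipschitz, nonvanishing diffusion coefficient sigma(x) = 1 + 0.5 sin(x).
def simulate_path(n, x0=0.0, seed=0, n_fine=50):
    """Euler scheme for (1.1) on [0, 1], returned on the grid i/n, i = 0..n."""
    rng = np.random.default_rng(seed)
    m = n * n_fine                       # simulation grid finer than observation grid
    dt = 1.0 / m
    x = np.empty(m + 1)
    x[0] = x0
    for i in range(m):
        drift = -x[i]
        sig = 1.0 + 0.5 * np.sin(x[i])
        x[i + 1] = x[i] + drift * dt + sig * np.sqrt(dt) * rng.standard_normal()
    return x[::n_fine]                   # keep only the observed values X_{i/n}

X = simulate_path(n=1000)
print(X.shape)  # (1001,)
```

Simulating on a finer grid than the observation grid keeps the Euler discretization error well below the statistical error of interest.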
There are several ways of assessing the quality of an estimation procedure. First, the estimation of $\sigma^2(x)$ at a point $x \in I$ is meaningful only if the process $X$ hits the point $x$ before time 1, that is, if $L_1^x(X) > 0$, where
$$L_1^x(X) = \lim_{\varepsilon \to 0} \frac{1}{2\varepsilon} \int_0^1 \mathbf{1}_{\{|X_s - x| \le \varepsilon\}}\,ds$$
denotes the local time of $X$ at level $x$ and time 1. So if
$$D(x, \nu) = \{\omega \in \Omega:\ L_1^x(X)(\omega) \ge \nu\},$$
we shall restrict our attention to the set $D(x, \nu)$ for a given $\nu > 0$, fixed throughout the paper. However, the set $D(x, \nu)$ is not observable, so for practical purposes it is better to replace – as in Jacod [8] – the set $D(x, \nu)$ by a set $D_n(x, \nu)$ measurable w.r.t. the $\sigma$-field $\mathcal{G}_n$ generated by the $X_{i/n}$, $i = 0, \ldots, n$, at stage $n$. To do so, we introduce the following empirical local time:
$$L_n^x(X^n) = \frac{1}{2n^{2/3}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in [x - n^{-1/3},\ x + n^{-1/3}]\}}, \qquad (1.2)$$
which converges to $L_1^x(X)$ as $n \to \infty$ (see, e.g., [9]). The choice of the bandwidth $n^{-1/3}$ will prove to be technically useful. Define
$$D_n(x, \nu) = \{\omega \in \Omega:\ L_n^x(X^n)(\omega) \ge \nu\}$$
accordingly. We will further restrict our attention to $D_n(x, \nu)$. For $c > 1$, let $\Sigma_c = \{f:\ \mathbb{R} \to \mathbb{R};\ c^{-1} \le f(x) \le c\}$ and define the Lipschitz class
$$\Sigma = \Sigma_c(L) = \{f:\ \mathbb{R} \to \mathbb{R};\ |f(x) - f(y)| \le L|x - y|\} \cap \Sigma_c.$$
The space $\Sigma$ describes the minimal smoothness properties we require of the unknown parameter $\sigma^2$.
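The empirical local time (1.2) and the event $D_n(x, \nu)$ are directly computable from the sample; a minimal transcription, assuming the path is given as the array of observed values $X_{i/n}$:

```python
import numpy as np

def empirical_local_time(X, x):
    """Empirical local time (1.2): count the observed left endpoints
    X_{(i-1)/n} in the window [x - n^{-1/3}, x + n^{-1/3}], normalised
    by 2 n^{2/3}."""
    n = len(X) - 1
    h = n ** (-1.0 / 3.0)
    hits = np.sum(np.abs(X[:-1] - x) <= h)   # X_{(i-1)/n}, i = 1, ..., n
    return hits / (2.0 * n ** (2.0 / 3.0))

def in_D_n(X, x, nu):
    """The observable event D_n(x, nu): did X spend enough time near x?"""
    return empirical_local_time(X, x) >= nu
```

As a sanity check, for a path that never leaves the window (e.g. the constant path) all $n$ observations count and the estimate equals $n/(2n^{2/3}) = n^{1/3}/2$.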
An estimator $T_n = (T_n(X^n, x),\ x \in I)$ of $(\sigma^2(x),\ x \in I)$ is a function which is $\mathcal{G}_n \otimes \mathcal{B}(I)$-measurable; we evaluate its performance uniformly over $\Sigma$ by means of its minimax risk
$$R_n(T_n, \Sigma, \varphi_n) = \sup_{\sigma^2 \in \Sigma} E_{\sigma^2}\left[\varphi_n^{-2} \int_I \left(T_n(X^n, x) - \sigma^2(x)\right)^2 dx\ \Big|\ D_n^I(\nu)\right],$$
where $D_n^I(\nu) = \bigcap_{x \in I} D_n(x, \nu)$ and $\varphi_n > 0$ is a normalizing factor. Of course, the finiteness of $R_n$ is only meaningful if $\varphi_n \to 0$ as $n \to \infty$. Here, $E_{\sigma^2}$ means integration with respect to the probability $P_{\sigma^2}$. Thus we measure the quality of estimation in integrated quadratic loss, conditional on the event $D_n^I(\nu)$.
1.2. Statement of the problem and objectives
An estimator $T_n^*$ is said to attain an optimal rate of convergence $\varphi_n(\Sigma)$ if
$$\limsup_{n \to \infty} R_n\left(T_n^*, \Sigma, \varphi_n(\Sigma)\right) < +\infty \qquad (1.3)$$
and no estimator can attain a better rate:
$$\liminf_{n \to \infty} \inf_{T_n} R_n\left(T_n, \Sigma, \varphi_n(\Sigma)\right) > 0, \qquad (1.4)$$
where the infimum is taken over all estimators. In Section 2, we show that $\varphi_n(\Sigma) = n^{-1/3}$ is an optimal rate of convergence and prove that the Nadaraya–Watson estimator, introduced in this context by Florens in [4], attains the optimal rate (see also [8]). We understand $\varphi_n(\Sigma)$ as an accuracy of estimation: for any confidence level $\alpha > 0$, we guarantee from (1.3) the existence of an (explicitly computable) $\gamma_\alpha > 0$ such that
$$\sup_{\sigma^2 \in \Sigma} P_{\sigma^2}^{n,\nu}\left(\|T_n^* - \sigma^2\|_I \ge \gamma_\alpha \varphi_n(\Sigma)\right) \le \alpha, \qquad (1.5)$$
where $P_{\sigma^2}^{n,\nu}(\cdot) = P_{\sigma^2}\{\cdot \mid D_n^I(\nu)\}$ and $\|f\|_I = (\int_I f^2(x)\,dx)^{1/2}$. Furthermore, in the optimality sense described by (1.4), this accuracy is the best one achievable uniformly over $\Sigma$.
However, suppose we suspect $\sigma^2$ to actually lie in a smaller parametric set, namely $\sigma^2 \in \Sigma_0 = \Sigma_0(I)$ given by
$$\Sigma_0(I) = \left\{f \in \Sigma:\ f(x) = \sigma_0^2(x, \theta),\ \theta \in \Theta,\ x \in I\right\},$$
where $\Theta \subset \mathbb{R}^s$, $s \ge 1$, is given and the function $\sigma_0^2(\cdot, \theta)$ is known up to $\theta$.
Under some regularity assumptions on $\Theta$ and $\sigma_0^2$ (see Section 2 below), an optimal rate of convergence over $\Sigma_0$ is $\varphi_n(\Sigma_0) = n^{-1/2}$, attained by the least-squares estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$, where
$$\hat\theta_n = \arg\min_{\theta \in \Theta} \sum_{i=1}^n \left((\Delta_i^n X)^2 - \sigma_0^2(X_{(i-1)/n}, \theta)\right)^2 \qquad (1.6)$$
and where we denote by $\Delta_i^n X = \sqrt{n}\,(X_{i/n} - X_{(i-1)/n})$ the normalized increments of the observed process. Based on the hypothesis $\sigma^2 \in \Sigma_0 \subset \Sigma$, we can hope to improve the accuracy of estimation. A traditional route to improvement is the so-called adaptive approach.
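The contrast (1.6) can be sketched on a hypothetical one-parameter family $\sigma_0^2(x, \theta) = \theta(1 + 0.1\cos x)$, with the compact set $\Theta$ approximated by a finite grid; both the family and the grid are assumptions made for illustration only.

```python
import numpy as np

def sigma0_sq(x, theta):
    # Hypothetical parametric family sigma_0^2(x, theta), theta in [0.5, 2].
    return theta * (1.0 + 0.1 * np.cos(x))

def least_squares_theta(X, thetas):
    """Minimise the least-squares contrast (1.6) over a finite grid
    approximating the compact parameter set Theta."""
    n = len(X) - 1
    incr2 = n * np.diff(X) ** 2          # squared normalised increments (Delta_i^n X)^2
    def contrast(th):
        return np.sum((incr2 - sigma0_sq(X[:-1], th)) ** 2)
    return min(thetas, key=contrast)
```

On a simulated driftless path with true parameter $\theta = 1.3$, the grid minimiser lands close to 1.3, consistent with the parametric rate.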
1.2.1. The adaptive approach
Intuitively, a practitioner would presumably: (1) test the hypothesis $\sigma^2 \in \Sigma_0$; (2) if the test accepts, choose the parametric estimator $T_n^{(0)}$; (3) keep the nonparametric estimator $T_n^*$ otherwise. From a mathematical point of view, such a procedure – call it temporarily $T_n^{(a)} = T_n^{(a)}(x, X^n)$ – is admissible if it adapts to the sets $(\Sigma_0, \Sigma)$ in the following sense: define the adaptive rate
$$\psi_n(\sigma^2) = \begin{cases} n^{-1/2} & \text{if } \sigma^2 \in \Sigma_0, \\ n^{-1/3} & \text{if } \sigma^2 \in \Sigma \setminus \Sigma_0. \end{cases} \qquad (1.7)$$
Then $T_n^{(a)}$ should verify
$$\limsup_{n \to \infty} R_n\left(T_n^{(a)}, \Sigma, \psi_n(\cdot)\right) < +\infty. \qquad (1.8)$$
However, even if we have satisfied the adaptive criterion (1.8), we are unable to state any accuracy for the method: since $\psi_n = \psi_n(\sigma^2)$ depends on the unknown parameter, we cannot provide any confidence set of the type (1.5).
In this paper, we propose an alternative approach by introducing a procedure based on random normalizing factors (r.n.f. for short), following Lepski [11] and [7]. This will enable us to improve the accuracy of estimation in the sense of (1.5). We will even show that a procedure based on r.n.f. can simultaneously give an improvement of accuracy and be adaptive in the sense of (1.8).
1.2.2. Random normalizing factors
We introduce the class of observable normalizing factors (r.n.f.)
$$\mathcal{P}_n = \left\{\rho_n \in (0, \varphi_n(\Sigma)]:\ \rho_n \text{ is } \mathcal{G}_n\text{-measurable}\right\},$$
where $\mathcal{G}_n$ is the $\sigma$-field generated by the observations $X_{i/n}$, $i = 0, \ldots, n$. Clearly, any estimator $T_n$ satisfying
$$\limsup_{n \to \infty} R_n(T_n, \Sigma, \rho_n) < \infty \quad \text{for some } \rho_n \in \mathcal{P}_n \qquad (1.9)$$
attains the optimal rate of convergence over $\Sigma$. But in contrast to an adaptive estimator, we now guarantee the existence of an explicitly computable $\gamma_\alpha$ from (1.9) such that for any $\alpha > 0$:
$$\sup_{\sigma^2 \in \Sigma} P_{\sigma^2}^{n,\nu}\left(\|T_n - \sigma^2\|_I \ge \gamma_\alpha \rho_n\right) \le \alpha,$$
where we have set – recall (1.5) – $P_{\sigma^2}^{n,\nu}(\cdot) = P_{\sigma^2}\{\cdot \mid D_n^I(\nu)\}$. This provides us with a new (possibly random) accuracy of estimation. The possibility that $\sigma^2$ belongs to $\Sigma_0$ may give a value of $\rho_n$ essentially better than $\varphi_n(\Sigma) = n^{-1/3}$ with some probability, while still ensuring a confidence set uniformly over $\Sigma$.
Next, we need a consistent way to compare two r.n.f. in order to define an optimality criterion. Since an r.n.f. is random, we introduce the following (deterministic) characteristic:
DEFINITION 1. – For a given confidence level $0 < \alpha_n < 1$, the characteristic of $\rho_n \in \mathcal{P}_n$ is
$$\chi_n(\rho_n) = \inf\left\{t \in (0, \varphi_n(\Sigma)]:\ \inf_{\sigma^2 \in \Sigma_0} P_{\sigma^2}^{n,\nu}(\rho_n \le t) \ge 1 - \alpha_n\right\}.$$
Note that $\chi_n(\rho_n)$ depends on $\alpha_n$ and on $\Sigma_0$.
Remark. – A heuristic way to understand this definition is the following: fix $t > 0$ "small", i.e., at least smaller than $\varphi_n(\Sigma)$. What we require is that a "good" $\rho_n$ will provide an improvement of accuracy if the guess ($\sigma^2 \in \Sigma_0$) turns out to be true.
This means that under $P_{\sigma^2}^{n,\nu}$, for $\sigma^2 \in \Sigma_0$, the event "$\rho_n \le t$" has a controlled probability.
Mathematically, we translate this idea by saying that for a given confidence level $\alpha_n$, we guarantee that
$$\inf_{\sigma^2 \in \Sigma_0} P_{\sigma^2}^{n,\nu}(\rho_n \le t) \ge 1 - \alpha_n. \qquad (1.10)$$
Then, the smaller the $t$ for which (1.10) holds, the better $\rho_n$; hence $\chi_n(\rho_n)$ is defined as the infimum of the $t$ for which (1.10) holds.
We now have a canonical way to compare r.n.f. We naturally derive the following optimality criterion:
DEFINITION 2. – $\rho_n^* \in \mathcal{P}_n$ is optimal (or $\alpha$-optimal) w.r.t. $(\Sigma, \Sigma_0)$ if
(i) There exists an estimator $T_n^{**}$ such that
$$\limsup_{n \to \infty} R_n(T_n^{**}, \Sigma, \rho_n^*) < +\infty.$$
(ii) For any $\rho_n \in \mathcal{P}_n$ such that
$$\frac{\chi_n(\rho_n)}{\chi_n(\rho_n^*)} \to 0 \quad \text{as } n \to \infty,$$
we have
$$\liminf_{n \to \infty} \inf_{T_n} R_n(T_n, \Sigma, \rho_n) = +\infty,$$
where the infimum is taken over all estimators.
Remark 2. – Following Lepski, we call $T_n^{**}$ an $\alpha$-adaptive estimator. Note that by definition, we always have $\rho_n^* \le \varphi_n(\Sigma)$. Thus an $\alpha$-adaptive estimator is optimal with respect to $\Sigma$. Also, note that an $\alpha$-optimal random normalizing factor may in general depend on the quantity $\alpha_n$.
The aim of this paper is to construct an optimal random normalizing factor w.r.t. $(\Sigma, \Sigma_0)$ following Definition 2, and an $\alpha$-optimal estimator accordingly.
1.3. Organization of the paper
In Section 2, we recall and adapt some facts about statistical estimation of the diffusion coefficient from discrete observations. The nonparametric kernel estimator of Section 2.2 was introduced in [4] and later generalized to our setting in [8]. However, we give a self-contained proof of the upper bound. The lower bound is new, and follows the same strategy as Proposition 1 in [6], with new technicalities.
Section 3 is devoted to the construction of an optimal r.n.f. for the diffusion coefficient under a parametric hypothesis. We discuss the link to adaptive estimation and show how a slight modification of the optimal ($\alpha$-adaptive) estimator enables us to obtain simultaneously an optimal accuracy of estimation and an adaptive estimator. Links to testing are also mentioned. The proofs are deferred to Section 4.
2. Preliminary results

2.1. Parametric estimation
We need the following regularity assumptions on $\Theta$ and $\sigma_0^2$.
Assumption A. – We have
(1) The set $\Theta$ is compact in $\mathbb{R}^s = \mathbb{R}$.
(2) For some $M = M_I > 0$, $\sup_{x \in I} |\sigma_0^2(x, \theta_1) - \sigma_0^2(x, \theta_2)| \le M|\theta_1 - \theta_2|$ for $\theta_1, \theta_2 \in \Theta$.
(3) For some $\eta = \eta_I > 0$, $\inf_{(x,\theta) \in I \times \Theta} \left|\frac{d}{d\theta}\sigma_0^2(x, \theta)\right| \ge \eta$.
(4) The functions $\frac{d^2}{dx^2}\sigma_0^2$ and $\frac{d}{dx}\frac{d}{d\theta}\sigma_0^2$ are well defined and continuous on $I \times \Theta$.
(5) The equality $\sigma_0^2(x, \theta_1) = \sigma_0^2(x, \theta_2)$ for all $x \in I$ implies $\theta_1 = \theta_2$.
Remark 1. – The above assumptions are standard in parametric estimation but do not claim to be minimal. For instance, A5 can be relaxed but known extensions are usually difficult to check (see, e.g., [2]).
By Itô's formula, the model given by (1.1) can be recast in a regression setting:
$$(\Delta_i^n X)^2 = n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds + \varepsilon_i^n + \text{a higher order term}, \qquad (2.11)$$
where $\varepsilon_i^n = 2n\int_{(i-1)/n}^{i/n} (X_s - X_{(i-1)/n})\,\sigma(X_s)\,dW_s$ is a martingale increment. Thus we observe, on the non-uniform random grid $(X_{(i-1)/n},\ i = 1, \ldots, n)$, the value $n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds \sim \sigma^2(X_{(i-1)/n})$, contaminated by the noise $\varepsilon_i^n$, plus a negligible drift effect. From this formulation, we readily obtain the least-squares estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$, where $\hat\theta_n$ is defined by (1.6).
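The regression formulation (2.11) can be checked numerically: on a driftless toy model (the choice $\sigma^2(x) = 1 + 0.5\sin^2 x$ is an assumption for illustration), the squared normalised increments divided by $\sigma^2(X_{(i-1)/n})$ average to 1, i.e. the noise $\varepsilon_i^n$ centres at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = np.empty(n + 1)
X[0] = 0.0
for i in range(n):
    s2 = 1.0 + 0.5 * np.sin(X[i]) ** 2           # sigma^2 at the left endpoint
    X[i + 1] = X[i] + np.sqrt(s2 / n) * rng.standard_normal()

incr2 = n * np.diff(X) ** 2                       # (Delta_i^n X)^2
ratio = incr2 / (1.0 + 0.5 * np.sin(X[:-1]) ** 2)
print(ratio.mean())                               # close to 1
```

The individual ratios fluctuate like squared Gaussians; only their local averages are informative, which is what the estimators below exploit.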
PROPOSITION 1. – Grant Assumption A. Then $\varphi_n(\Sigma_0) = n^{-1/2}$ is an optimal rate of convergence and is attained by $T_n^{(0)}$.
The proof of the upper bound is readily obtained from Theorem 1 in [2]. We simply added ad hoc assumptions in order to obtain uniformity in $\theta \in \Theta$ for the integrated risk. The proof of the lower bound follows from the LAMN property of the parametric model (see, e.g., [1]).
Remark 2. – Note that $\hat\theta_n$ is not the best available parametric estimator of $\theta$ – it is not equivalent to the MLE – but since we focus on rates of convergence only, this intuitively simple choice is sufficient.
2.2. Nonparametric estimation
We assume (with no loss of generality as far as practical considerations are concerned) that $I$ is on a dyadic scale, namely
$$I = \left[k_0 2^{-j_0},\ k_1 2^{-j_0}\right], \quad \text{for some integers } k_0, k_1, j_0.$$
For integers $(k, j)$, let $I_{jk} = [k2^{-j}, (k+1)2^{-j})$ and define, for $j \ge j_0$,
$$V_j^I = \left\{f \in L^2:\ f \text{ is constant on each } I_{jk},\ I_{jk} \subset I\right\},$$
the finite element space of functions which are piecewise constant on $I$ over a grid of mesh $2^{-j}$. An orthogonal basis for $V_j^I$ is given by the family
$$\phi_{jk} = 2^{j/2}\phi(2^j \cdot - k), \quad k \text{ such that } I_{jk} \subset I,$$
where $\phi = \mathbf{1}_{[0,1)}$. We estimate $\sigma^2$ by an element of $V_{j_n}^I$, for some projection level $j_n$ chosen in accordance with the asymptotics in the following way. Let
$$\hat c_{j_n k} = \frac{1}{n\,\hat\mu_{j_n k}} \sum_{i=1}^n (\Delta_i^n X)^2\, \phi_{j_n k}(X_{(i-1)/n})$$
(with $0/0 = 0$), where $\hat\mu_{j_n k} = \frac{2^{j_n}}{n} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{j_n k}\}}$. Informally, we use the regression analogy
$$(\Delta_i^n X)^2 \sim \sigma^2(X_{(i-1)/n}) + \varepsilon_i^n$$
from (2.11), and we weight the local average by an approximation $\hat\mu_{j_n k}$ of the time spent by the process $X$ in $I_{j_n k}$. Finally, the nonparametric estimator $T_n^*(x)$ of $\sigma^2(x)$ is defined by
$$T_n^*(x) = \sum_{k:\ I_{j_n k} \subset I} \hat c_{j_n k}\, \phi_{j_n k}(x)$$
and is specified by the projection level $j_n$. The performances of $T_n^*$ are summarized in the following result. For $x \in \mathbb{R}$, we denote by $\lfloor x \rfloor$ the integer part of $x$.
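A sketch of $T_n^*$ on a numerical path: since $\phi_{j_n k} = 2^{j_n/2}\mathbf{1}_{I_{j_n k}}$, the normalisation factors cancel and the value of the estimator on a cell reduces to the plain average of the squared normalised increments originating from that cell. The function names and the interval below are illustrative assumptions.

```python
import numpy as np

def histogram_estimator(X, jn, I=(-1.0, 1.0)):
    """Piecewise-constant estimator of Section 2.2: on each dyadic cell of
    mesh 2^{-jn}, average (Delta_i^n X)^2 over the indices i whose left
    endpoint X_{(i-1)/n} lies in the cell (never-visited cells stay NaN)."""
    n = len(X) - 1
    incr2 = n * np.diff(X) ** 2
    w = 2.0 ** (-jn)
    edges = np.arange(I[0], I[1] + w / 2, w)
    prev = X[:-1]
    vals = np.full(len(edges) - 1, np.nan)
    for k in range(len(edges) - 1):
        inside = (prev >= edges[k]) & (prev < edges[k + 1])
        if inside.any():
            vals[k] = incr2[inside].mean()
    return edges, vals

def T_star(x, edges, vals):
    """Evaluate the piecewise-constant estimator at a point x inside I."""
    k = np.searchsorted(edges, x, side="right") - 1
    return vals[k]
```

On a driftless path with constant true coefficient $\sigma^2 = 1.5$ and $j_n \approx \frac{1}{3}\log_2 n$, the estimate near the starting point is close to 1.5.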
PROPOSITION 2. – An optimal rate of convergence over $\Sigma$ is $\varphi_n(\Sigma) = n^{-1/3}$. Moreover, $T_n^*$, calibrated with $j_n = \lfloor \frac{1}{3}\log_2 n \rfloor \vee j_0$, is optimal for the criterion given by (1.3) and (1.4).
3. Main result
This section is devoted to the construction of an optimal r.n.f. $\rho_n^*$ in the sense of Definition 2. Accordingly, we construct an $\alpha$-adaptive estimator w.r.t. $(\Sigma_0, \Sigma)$. Our algorithm can be described as follows:
(1) Estimate the distance $\hat d_n$ (for the $\|\cdot\|_I$ seminorm) between $\sigma^2$ and $\Sigma_0$.
(2) Take $\rho_n^* = n^{-1/3}$ if $\hat d_n$ is above some threshold level (possibly depending on the confidence level $\alpha_n$).
(3) Take $\rho_n^* = \varphi_{n,\alpha_n} > 0$ otherwise, for some normalizing factor $\varphi_{n,\alpha_n}$ (tuned with the asymptotics).
We will show that we can take $\varphi_{n,\alpha_n}$ converging to 0 faster than $n^{-1/3}$ by a polynomial power. The value $\varphi_{n,\alpha_n}$ corresponds to accepting that $\sigma^2 \in \Sigma_0$ and measures the improvement of the accuracy of estimation. The assumptions on the parametric family $\Sigma_0$ are less stringent than in Section 2.
Assumption B. – We have
(1) The set $\Theta$ is compact in $\mathbb{R}^s$, $s \ge 1$.
(2) There exist $\mu > 0$ and $M = M_{I,\mu} > 0$ such that $\sup_{x \in I} |\sigma_0^2(x, \theta_1) - \sigma_0^2(x, \theta_2)| \le M\|\theta_1 - \theta_2\|^\mu$ for $\theta_1, \theta_2 \in \Theta$.
(3) For all $\theta \in \Theta$, $\sigma_0^2(\cdot, \theta) \in \Sigma$.
(4) There exist $\theta_0 \in \Theta$ and $\tilde L < L$ such that $\sigma_0^2(\cdot, \theta_0) \in \Sigma_c(\tilde L)$.

3.1. Construction of $\rho_n^*$ and main result
For $J_n \ge j_0$ and $\theta \in \Theta$, define
$$d_n(\theta) = \sum_{k:\ I_{J_n k} \subset I} \left(\hat c_{J_n k} - c_{J_n k}\left(\sigma_0^2(\cdot, \theta)\right)\right)^2,$$
where
$$c_{J_n k}\left(\sigma_0^2(\cdot, \theta)\right) = 2^{J_n/2} \int_{I_{J_n k}} \sigma_0^2(x, \theta)\,dx.$$
By Assumption B2, there exists $C = C(\mu, M_{I,\mu})$ such that
$$|d_n(\theta_1) - d_n(\theta_2)| \le C\|\theta_1 - \theta_2\|^\mu \quad \text{for } \theta_1, \theta_2 \in \Theta.$$
This, together with Assumption B1, ensures the existence of $\theta_n^* \in \Theta$, measurable w.r.t. $\mathcal{G}_n$ and solving
$$d_n(\theta_n^*) = \inf_{\theta \in \Theta} d_n(\theta).$$
For technical reasons, we need to compensate the variance of $d_n$ as follows. Let
$$C_{J_n k} = \frac{2}{3n^2 \hat\mu_{J_n k}^2} \sum_{i=1}^n (\Delta_i^n X)^4\, \phi_{J_n k}^2(X_{(i-1)/n})$$
and
$$\hat d_n(\theta_n^*) = d_n(\theta_n^*) - \sum_{k:\ I_{J_n k} \subset I} C_{J_n k}.$$
The estimate $\hat d_n(\theta_n^*)$ will determine the following decision rule. For $\alpha_n > 0$, let $J_n = \lfloor \frac{2}{5}\log_2(n/\sqrt{\log \alpha_n^{-1}}) \rfloor \vee j_0$ and
$$\varphi_{n,\alpha_n} = \left(\frac{\sqrt{\log \alpha_n^{-1}}}{n}\right)^{2/5}.$$
Note that $2^{-J_n}$ and $\varphi_{n,\alpha_n}$ coincide up to a constant. Define now the threshold
$$\lambda = A(10, c, \nu)^{-1/4},$$
where $A(t, c, \nu)$ is specified in (4.17); see the proof of Lemma 3. (The choice of $\lambda$ will become transparent in the proof of Theorem 1 in Section 4 below, when Lemma 3 is used.) The decision rule then takes the form
$$\rho_n^* = \varphi_{n,\alpha_n}\, \mathbf{1}_{\{\hat d_n(\theta_n^*) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + n^{-1/3}\, \mathbf{1}_{\{\hat d_n(\theta_n^*) > \lambda^2 \varphi_{n,\alpha_n}^2\}}$$
and
$$\bar T_n(x) = \sum_{k:\ I_{J_n k} \subset I} c_{J_n k}\left(\sigma_0^2(\cdot, \theta_n^*)\right) \phi_{J_n k}(x).$$
Finally, our estimator of $\sigma^2(x)$ is
$$T_n^{**}(x) = \bar T_n(x)\, \mathbf{1}_{\{\rho_n^* = \varphi_{n,\alpha_n}\}} + T_n^*(x)\, \mathbf{1}_{\{\rho_n^* = n^{-1/3}\}},$$
where $T_n^*$ is the nonparametric estimator of Section 2.2, specified with $j_n = \lfloor \frac{1}{3}\log_2 n \rfloor \vee j_0$.
THEOREM 1. – Grant Assumption B. Assume that $n^{-a} \le \alpha_n < 2^{-4}$ for some arbitrary $a > 0$. Then $\rho_n^*$ is an optimal random normalizing factor w.r.t. $(\Sigma_0, \Sigma)$ and $T_n^{**}$ is $\alpha$-adaptive.
Remark 1. – In particular, we see that if our parametric assumption is correct, then with prescribed confidence $1 - \alpha_n$ we are able to improve (asymptotically) the accuracy of estimation, i.e., the size of the confidence band we construct, by a factor $\varphi_{n,\alpha_n} = (\sqrt{\log \alpha_n^{-1}}/n)^{2/5}$.
Remark 2. – The improvement of the random confidence band is of a polynomial order, but is limited by the size of $\alpha_n$. However, the restriction $\alpha_n \ge n^{-a}$ ensures that it is at least of order $(\sqrt{\log n}/n)^{2/5}$.
Remark 3. – For practical purposes, it seems more clever to replace $\nu$ in the definition of $\lambda$ by $\inf_{k:\ I_{J_n k} \subset I} \hat\mu_{J_n k}$. For technical reasons, however, we are unable to prove Theorem 1 in this setting. Note also that the practical implementation of $\rho_n^*$ is easy: the computation of the $\hat c_{J_n k}$ is reasonably fast, and the cardinality (in $k$) of the $\hat c_{J_n k}$ is of order $\log n$. Likewise for the $C_{J_n k}$. Eventually, the minimization problem $\arg\min_\theta d_n(\theta)$ has the same complexity as the computation of a standard parametric estimator. Once $\rho_n^*$ is computed, one readily computes either $\bar T_n$ (which is no more difficult to obtain than $T_n^*$) or $T_n^*$ itself, the standard Nadaraya–Watson estimator.
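The decision rule of Section 3.1 can be sketched end to end. Everything below is an illustrative assumption: the parametric family is the set of constants $\sigma_0^2(x, \theta) = \theta$, the interval and the grid for $\Theta$ are arbitrary, a `min_count` guard stands in for the local-time condition, and the threshold constant `lam` replaces the paper's $\lambda = A(10, c, \nu)^{-1/4}$, which depends on (4.17) and is not reproduced here.

```python
import numpy as np

def decision_rule(X, alpha_n, thetas, lam=5.0, I=(-0.5, 0.5), min_count=100):
    """Return a schematic random normalizing factor rho_n^*: phi_{n,alpha_n}
    when the compensated distance hat{d}_n(theta_n^*) stays under the
    threshold lam^2 * phi^2, and the nonparametric rate n^{-1/3} otherwise."""
    n = len(X) - 1
    la = np.log(1.0 / alpha_n)
    phi = (np.sqrt(la) / n) ** 0.4                       # phi_{n, alpha_n}
    Jn = max(int(round(0.4 * np.log2(n / np.sqrt(la)))), 1)
    w = 2.0 ** (-Jn)
    edges = np.arange(I[0], I[1] + w / 2, w)
    prev, incr2 = X[:-1], n * np.diff(X) ** 2

    avg, cnt, fourth = [], [], []                        # per-cell statistics
    for k in range(len(edges) - 1):
        m = (prev >= edges[k]) & (prev < edges[k + 1])
        if m.sum() >= min_count:                         # crude stand-in for the
            avg.append(incr2[m].mean())                  # local-time condition
            cnt.append(m.sum())
            fourth.append(np.sum(incr2[m] ** 2))
    avg, cnt, fourth = map(np.array, (avg, cnt, fourth))

    # d_n(theta): squared distance between empirical and model coefficients;
    # for constant sigma_0^2 = theta the Haar factors reduce to the cell width w
    d_n = min(np.sum(w * (avg - th) ** 2) for th in thetas)
    # variance compensation sum_k C_{Jn,k}
    comp = np.sum(2.0 * w * fourth / (3.0 * cnt ** 2))
    d_hat = d_n - comp
    return phi if d_hat <= (lam * phi) ** 2 else n ** (-1.0 / 3.0)
```

On a driftless path with constant $\sigma^2 = 1$ (inside the family) the rule returns the improved factor $\varphi_{n,\alpha_n}$; with $\sigma^2(x) = 1 + 0.8\cos 4x$ (far from any constant) it falls back to $n^{-1/3}$.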
3.2. Discussion
3.2.1. Links to adaptive estimation
We show in this paragraph how a simple modification of $T_n^{**}$ provides us with an adaptive estimator – in the usual sense of (1.8) – without losing optimality in terms of r.n.f. Assumption A is in force here. We consider the estimator $T_n^{(0)} = \sigma_0^2(\cdot, \hat\theta_n)$ introduced in Section 1.2, where $\hat\theta_n$ solves (1.6). Note that $T_n^{(0)}$ is well defined thanks to Assumption A. Let
$$T_n^{(a)}(x) = T_n^{(0)}(x)\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + T_n^*(x)\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) > \lambda^2 \varphi_{n,\alpha_n}^2\}}.$$
Define the random normalizing factor $\rho_n^{(a)}$ accordingly:
$$\rho_n^{(a)} = \varphi_{n,\alpha_n}\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}} + n^{-1/3}\, \mathbf{1}_{\{\hat d_n(\hat\theta_n) > \lambda^2 \varphi_{n,\alpha_n}^2\}}.$$
THEOREM 2. – Grant Assumption A. We have
(i) The estimator $T_n^{(a)}$ is $\alpha$-adaptive w.r.t. $(\Sigma_0, \Sigma)$.
(ii) If moreover $\alpha_n = O(n^{-1})$, the estimator $T_n^{(a)}$ is adaptive in the usual sense w.r.t. $(\Sigma_0, \Sigma)$:
$$\limsup_{n \to \infty} R_n\left(T_n^{(a)}, \Sigma, \psi_n(\cdot)\right) < \infty,$$
where $\psi_n(\sigma^2)$ denotes the adaptive rate defined by (1.7).
The proof of (i) is delayed until Section 4. The proof of (ii) is a direct consequence of (i) and Proposition 2 in [11].
Remark 1. – The decision rule $\mathbf{1}_{\{\hat d_n(\hat\theta_n) \le \lambda^2 \varphi_{n,\alpha_n}^2\}}$ answers our original question: given a parametric procedure $T_n^{(0)}$ versus a nonparametric one $T_n^*$, which one shall we use in practice? The answer 1 to our test yields the choice $T_n^{(0)}$, whereas the answer 0 yields $T_n^*$. The precise mathematical consequences of this choice are described by the r.n.f. $\rho_n^{(a)}$. Moreover, this choice is optimal in the sense of Definition 2.
Remark 2. – Again – see Remark 3 in Section 3.1 – we can see that the practical implementation of $T_n^{(a)}$ is fast and has complexity no worse than that of $T_n^*$.
3.2.2. Links to testing
We explore in this section another virtue of $\rho_n^*$, namely the possibility of building a test of the hypothesis $\sigma^2 \in \Sigma_0$ against a family of local alternatives. More precisely, given $h > 0$ and $0 < \alpha < 1$, define
$$W_n(h, I, \alpha) = \left\{f \in \Sigma_c(L):\ \inf_{\theta \in \Theta} \|f - \sigma_0^2(\cdot, \theta)\|_I \ge h \cdot \varphi_{n,\alpha}\right\}.$$
THEOREM 3. – Grant Assumption A. For any $0 < \alpha, \beta < 1$, we have, for large enough $n$:
(i) $\sup_{\sigma^2 \in \Sigma_0(I)} P_{\sigma^2}^{n,\nu}(\rho_n^* = n^{-1/3}) \le \alpha$.
(ii) There exists $h(\beta) > 0$ such that
$$\sup_{\sigma^2 \in W_n(h(\beta), I, \alpha)} P_{\sigma^2}^{n,\nu}\left(\rho_n^* = \varphi_{n,\alpha}\right) \le \beta.$$
In words, the hypothesis $\sigma^2 \in \Sigma_0(I)$ can be tested against the family of local alternatives $\sigma^2 \in W_n(h(\beta), I, \alpha)$ with prescribed first- and second-type error probabilities.
The proof of (i) readily follows from (4.27) in the proof of Theorem 1 below. The proof of (ii) follows from Theorem 1 together with Proposition 3 in [11].
4. Proofs
The proof of Proposition 2 can be read independently from that of Theorems 1 and 2. The reader is however invited to first scan the notation and preliminaries section.
4.1. Notation and preliminaries
For $\theta \in \Theta$, we abbreviate $P_{\sigma_0^2(\cdot,\theta)}$ by $P_\theta$ when no confusion is possible. For $\sigma^2 \in \Sigma$, let $\tilde P_{\sigma^2}$ denote the law of the process $Y$ such that $dY_t = \sigma(Y_t)\,dW_t$, $Y_0 = x_0$. Define $\tilde P_{\sigma^2}^{n,\nu}$ accordingly. We denote by $C$ a generic constant, possibly varying from line to line, which may depend on $I$. Any other dependence will be explicitly mentioned.
4.1.1. Preliminary decompositions
(a) For $\sigma^2 \in \Sigma$, define
$$a_k^n(\sigma^2) = \frac{2^{3J_n/2}}{n\,\hat\mu_{J_n k}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \int_{I_{J_n k}} \tau_i^n(x, X)\,dx,$$
$$b_k^n(\sigma^2) = \frac{2^{J_n/2}}{n\,\hat\mu_{J_n k}} \sum_{i=1}^n \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}}\, \varepsilon_i^n,$$
where
$$\tau_i^n(x, X) = \sigma^2(x) - n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds. \qquad (4.12)$$
Define the random variable
$$\Delta_n(\sigma^2) = \sum_{k:\ I_{J_n k} \subset I} \left(c_{J_n k}\left(\sigma_0^2(\cdot, \theta_n^*) - \sigma^2\right) - a_k^n(\sigma^2)\right)^2, \qquad (4.13)$$
where – recall Section 2 – we denote $c_{jk}(\sigma^2) = 2^{j/2}\int_{I_{jk}} \sigma^2(x)\,dx$. Using that $(\Delta_i^n X)^2 = n\int_{(i-1)/n}^{i/n} \sigma^2(X_s)\,ds + \varepsilon_i^n$ under $\tilde P_{\sigma^2}$, we have
$$\hat c_{J_n k} - c_{J_n k}(\sigma^2) = a_k^n(\sigma^2) + b_k^n(\sigma^2).$$
We thus obtain the following decomposition:
$$\hat d_n(\theta_n^*) = \Delta_n(\sigma^2) + M_n(\sigma^2) + N_n(\sigma^2),$$
having
$$M_n(\sigma^2) = 2\sum_{k:\ I_{J_n k} \subset I} \left(c_{J_n k}\left(\sigma^2 - \sigma_0^2(\cdot, \theta_n^*)\right) + a_k^n(\sigma^2)\right) b_k^n(\sigma^2),$$
$$N_n(\sigma^2) = \sum_{k:\ I_{J_n k} \subset I} \left(b_k^n(\sigma^2)^2 - C_{J_n k}\right).$$
(b) Define $\psi(x) = \mathbf{1}_{[0,1/2)}(x) - \mathbf{1}_{[1/2,1)}(x)$. If $d_{jk}(\sigma^2) = \int \sigma^2(x)\psi_{jk}(x)\,dx$ is the wavelet coefficient of $\sigma^2$ in the Haar basis, then from the multiscale decomposition of $\sigma^2$ we have, by Parseval's identity,
$$\|\bar T_n - \sigma^2\|_I^2 = \sum_{k:\ I_{J_n k} \subset I} c_{J_n k}^2\left(\sigma^2 - \sigma_0^2(\cdot, \theta_n^*)\right) + \mathcal{J}_n(\sigma^2) \le 2\Delta_n(\sigma^2) + 2\sum_{k:\ I_{J_n k} \subset I} a_k^n(\sigma^2)^2 + \mathcal{J}_n(\sigma^2),$$
where the remainder term $\mathcal{J}_n(\sigma^2) = \sum_{j \ge J_n} \sum_{k:\ I_{jk} \subset I} d_{jk}^2(\sigma^2)$ satisfies
$$\mathcal{J}_n(\sigma^2) \le C(c, L)\,2^{-2J_n} = C(c, L)\,\varphi_{n,\alpha_n}^2,$$
since $\sigma^2 \in \Sigma$. Note that
$$\left|\tau_i^n(x, X)\right| \mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \le L\left(2^{-J_n} + n\int_{(i-1)/n}^{i/n} |X_s - X_{(i-1)/n}|\,ds\right), \qquad (4.14)$$
therefore $|\tau_i^n(x, X)|\,\mathbf{1}_{\{X_{(i-1)/n} \in I_{J_n k}\}} \le L(2^{-J_n} + n^{-1/2}r_i^n(\sigma^2))$, and since we have $\sup_{\sigma^2 \in \Sigma} \tilde E_{\sigma^2}^{n,\nu}(|r_i^n(\sigma^2)|^p) \le C_p$ for all $p \ge 1$ by the Burkholder–Davis–Gundy inequality