www.imstat.org/aihp 2011, Vol. 47, No. 4, 1197–1218
DOI:10.1214/11-AIHP432
© Association des Publications de l’Institut Henri Poincaré, 2011
Irregular sampling and central limit theorems for power variations: The continuous case 1
Takaki Hayashi
a, Jean Jacod
band Nakahiro Yoshida
caGraduate School of Business Administration, Keio University, 4-1-1 Hiyoshi, Yokohama 223-8526, Japan. E-mail:[email protected] bInstitut de mathématiques de Jussieu, Université P. et M. Curie (Paris-6), 4 place Jussieu, 75252 Paris, France (CNRS UMR 7586).
E-mail:[email protected]
cGraduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Tokyo 153, Japan. E-mail:[email protected] Received 6 March 2009; revised 2 May 2011; accepted 9 May 2011
Abstract. In the context of high frequency data, one often has to deal with observations occurring at irregularly spaced times, at transaction times for example in finance. Here we examine how the estimation of the squared or other powers of the volatility is affected by irregularly spaced data. The emphasis is on the kind of assumptions on the sampling scheme which allow to provide consistent estimators, together with an associated central limit theorem, and especially when the sampling scheme depends on the observed process itself.
Résumé. Dans le contexte de données à haute fréquences, il est fréquent de recueillir les informations le long d’une grille irrégi- lière, par exemple aux instants de transaction pour les données financières. Dans cet article, nous étudions comment l’estimation de l’intégrale du carré, ou d’autres puissances, de la volatilité est affectée par l’irrégularité des données. L’accent est mis sur le type d’hypothèses qu’il est nécessaire de faire sur la répartition des observations, en particulier lorsque celles-ci dépendent du processus observé lui-même, de façon à obtenir un théorème limite central pour nos estimateurs.
MSC:Primary 60G44; 62M09; secondary 60G42; 62G20
Keywords:Quadratic variation; Discrete observations; Power variations; High frequency data; Stable convergence
1. Introduction
The approximation of the quadratic variation by the “realized” quadratic variation plays a very important role in practice, especially for financial data, since such data are necessarily reported at discrete times. Other quantities of interest, based on discrete observations as well, have been considered in many recent papers: these quantities are sums of functions of the successive increments of the process, usually some powers or absolute powers of those increments.
They are used to estimate some characteristics of the jumps of the observed process, or the volatility of the continuous part, or for various testing problems about jumps for example.
However, if the behavior of the realized quadratic variation and other similar functionals is well known when the observations come in regularly, this is no longer the case when the observation times are irregularly spaced and, even worse, when they are random. Relatively few papers are so far available in that case: see [1,14] and [15] for deterministic observation times, and [4] for some special random times like hitting times, and [16] for a situation which
1Supported in part by JST Basic Research Programs PRESTO, Grants-in-Aid for Scientific Research of Japan Society for the Promotion of Science, No. 19340021 and No. 19530186, and by the Global COE Program “The research and training center for new developments in mathematics” of the Graduate School of Mathematical Sciences, University of Tokyo.
is relatively close to the present paper, and [6–9] and [2] when the process is multidimensional and the observation times exhibit some sort of relatively restricted randomness, or are random but independent of the observed process.
In the five last papers, the situation is quite complex because the various components are observed at different times.
All those papers are concerned with a continuous underlying process. One may also quote related works dealing with estimation of various parameters with random sampling, like [5] for diffusions and [3] for Markov processes.
Here, our aim is relatively modest: we consider only a continuous underlying process, which is 1-dimensional, quite a restrictive setting indeed which in particular lets aside the problem of non-synchronous observations for multivariate processes. The observation times are possibly random, and we try to find conditions on those times, allowing for limit theorems when the number of observations increases (the “high frequency” setting), which are as general as possible, and in particular accommodate some dependency between the sampling times and the process itself: this is the main novelty of this paper. We also consider the problem of the estimation of integrated powers of the volatility other than 2, for example the estimation of the quarticity, a quantity which turns out to be more and more frequently used in practice. In a sense, we are in the line of [1] and [16], and we try to weaken the assumptions on the observation scheme and the underlying observed processXas much as we can, together with providing an estimation method for other powers than the integrated volatility.
Of course, the assumptions about the observation times, as discussed below, are arbitrary to some extent. What we have in mind is to account for high frequency financial data, and especially those recording transaction times and prices, which are notoriously irregularly spaced. Note that in this paper we only consider a semi-realistic situation, where the process is supposed to be observed exactly at each observation time: we do not consider the problem of microstructure noise. For real applications to finance, this problem should be taken into account.
Let us be more specific. The process of interest is a 1-dimensional continuous semimartingaleX on a filtered probability space(Ω,F, (Ft)t≥0,P), which is of Itô type, that is of the form
Xt=X0+ t
0
bsds+ t
0
σsdWs, (1.1)
whereW is a Wiener process andbandσ are progressively measurable processes. In this paper our aim is to give estimators for the following “integrated” quantities:
V (p)t=mp t
0
|σs|pds (1.2)
(heremp is thepth absolute moment of the standard normal lawN(0,1), and it appears here for later convenience), whenp≥2, together with “feasible” estimators for the variance or conditional variance of these estimators, so as to be able to construct confidence intervals for example. The restrictionp≥2 is not essential, similar results hold when 1< p <2, and also when 0< p≤1 under some additional assumptions, butp≥2 covers the most interesting situations and is simpler technically speaking.
At stage nthe process X is observed along a strictly increasing sequence of – possibly random – finite times T (n, i), i≥0, starting atT (n,0)=0, and we use the notation
(n, i)=T (n, i)−T (n, i−1), I (n, i)=
T (n, i−1), T (n, i) , Ntn=inf
i: T (n, i) > t
−1, πtn=supi=1,...,Nn
t+1(n, i)
(1.3) (πtn is the “mesh” up to timet, by convention inf(∅)= ∞and sup(∅)=0, andNtn is the number of observation times within(0, t]). Also, for any processY we write
niY=YT (n,i)−YT (n,i−1). (1.4)
Among (many) other technical requirements, we will always assume the following minimal ones:
n≥1 ⇒ T (n, i)→ ∞ P-a.s., asi→ ∞, t≥0 ⇒ πtn−→P 0 asn→ ∞.
(1.5)
With the convention0
i=1=0, we take as our estimator forV (p)tof (1.2), at stagen, the variable Vn(p)t=
Ntn
i=1
(n, i)1−p/2 niX p. (1.6)
The upper limit of the sum is such thatVn(p)tinvolves the observations occurring up to timetonly. As is well known, under regular sampling (that is,T (n, i)=infor some time lagngoing to 0), thenVn(p)u.c.p.−→V (p)(this denotes
“convergence in probability, locally uniform in time”), as soon asσ is càdlàg andbsatisfies a suitable integrability condition and even under no condition at all whenp=2. Furthermore under some more regularity conditions, we have an associated CLT (Central Limit Theorem), and standardized versions are also available: see e.g. [12] and [11].
Our aim is to prove the same results for irregular sampling, and in particular to prove standardized versions of the CLT for the estimators. We already know, of course, thatVn(2)converges toV (2)as soon as theT (n, i)’s are stopping times subject to (1.5), but for the other values ofpand for the CLT, we need appropriate additional assumptions on the sampling scheme. Note that for a regular scheme the factor(n, i)1−p/2=1n−p/2goes out of the sum, and this is of course how the “standard” result is stated. The idea to place this factor inside the sum is due to [1].
In Section2we state the precise assumptions on the underlying processX and the sampling schemes. Section3 contains the basic results, and a discussion of their applicability. The proofs are gathered in Sections4,5and6.
2. Setting and assumptions
2.1. Assumptions onX
The assumptions onXwill vary according to the powerpin which we are interested. We always suppose thatXhas the form (1.1), and we use two different assumptions, with(B)stronger than(A)below:
Assumption(A). We have(1.1),and the processbis locally bounded,and the processσis càdlàg(=right continuous with left limits).
Assumption(B). We have(1.1),and the processσ is also a(possibly discontinuous)Itô semimartingale,which can be written as
σt=σ0+ t
0
bsds+ t
0
σsdWs+Mt+
s≤t
σs1{|σs|>1}, where
•Mis a local martingale with|Mt| ≤1,orthogonal toW,and its predictable quadratic covariation process isM, Mt=t
0asds.
•The predictable compensator of
s≤t1{|σs|>1}is t
0asds.
⎫⎪
⎪⎪
⎪⎪
⎬
⎪⎪
⎪⎪
⎪⎭
(2.1)
Moreover,the processesb,a and a are locally bounded,and the processesσ andb are left continuous with right limits.
In(B),Mmay have jumps, and also a non-vanishing continuous martingale part, which must then be a stochastic integral with respect to another Brownian motion independent ofW.
The above assumptions are exactly those under which the theorems given below hold, for regular sampling schemes, except for the case ofVn(2)which requires a bit less.
2.2. The sampling scheme
The sampling scheme, that is the collection((T (n, i))i≥0: n≥1), is subject to a number of conditions. There is first a structural assumption:
Assumption(C). There is a sub-filtration(Ft0)t≥0of(Ft)t≥0with the following properties:
• W andbandσ are adapted to(Ft0);
• any(Ft0)martingale is also an(Ft)-martingale;
• each variableT (n, i) is an (Ft)-stopping time which,conditionally on FT (n,i−1), is independent of theσ-field F0=
t >0Ft0.
This assumption is satisfied when theT (n, i)’s are non-random (deterministic schemes), and when theT (n, i)’s are independent of the processes(W, X, b, σ )(independent schemes), but it includes many other cases as well. However it excludes somea priori interesting situations: whenFt0=Ft, then(C)amounts to saying that those stopping times are “strongly predictable” in the sense thatT (n, i)isFT (n,i−1)-measurable, and this is quite restrictive; for example it excludes the case where theT (n, i)’s are the successive hitting times of a spatial grid byX, a case considered in [4]
under some restrictive assumptions onX.
Apart from(C)and (1.5), the sampling scheme should also be not too wildly scattered, and its meshesπtnshould converge to 0 at some deterministic rate. This rate is expressed through a sequencern→ ∞of positive – non random – numbers. The assumptions below all involve this sequencern, in an implicit way.
Before stating the assumptions, and for anyq≥0, we introduce the processes
A(q)nt =rnq−1
Ntn
i=1
(n, i)q. (2.2)
The normalizationrnq−1is motivated by the regular schemesT (n, i)=in, for whichA(q)nt =n[t /n](with the choicern=1/n) converges towardst for anyq≥0. Note also thatA(0)nt =Ntn/rn, and sincet−πtn≤A(1)nt ≤t we deduce that
A(1)nt −→P t (2.3)
as soon as (1.5) holds. Then, withq≥0, we set:
Assumption(D(q)). We have(C)and(1.5),and there is a(necessarily nonnegative)(Ft0)-optional processa(q),such that for alltwe have
A(q)nt −→P t
0
a(q)sds. (2.4)
Note that(D(q))for some sequencern implies(D(q))for any other sequencern such thatrn/rn→α∈ [0,∞), and the new limit in (2.4) and whenq >1 is thenαq−1a(q), and in particular vanishes whenrn/rn→0: the forth- coming theorems which explicitly involvernare true but “empty” when the limit in (2.4) vanishes identically. Regular sampling schemes with lagnsatisfy(D(q))for allq≥0, withrn=1/nanda(q)t=1.
Some important connections between these assumptions are stated in the next lemma, to be proved in Section4.
Lemma 2.1. Let0≤q < p < q .Then for all0≤s < twe have A(p)nt −A(p)ns ≤
A(q)nt −A(q)ns(q−p)/(q−q) A
q n t −A
qn s
(p−q)/(q−q)
. (2.5)
Moreover,if (D(q))holds for someq=1and ifpis strictly between1andq,from any subsequence one may extract a further subsequence which satisfies(D(p)),and we have versions ofa(q)anda(p)satisfyinga(p)t≤a(q)(pt −1)/(q−1).
Note that (2.4) for alltimplies thatrnq−1Ntn
i=1HT (n,i)(n, i)qu.c.p.−→t
0Hsa(q)sdsas soon asHis càdlàg. In some cases we need this convergence to hold at a rate faster than 1/√rn, and we express this in the following assumption.
It is indeed a very strong assumption, usually not satisfied by random schemes, and in particular not satisfied by the independent schemes for which the(n, i)’s are i.i.d. whenivaries, but not deterministic.
Assumption(D(q)). We have(D(q)),and further for allt≥0and all càdlàg(Ft0)-adapted processesHwe have
√rn
rnq−1
Ntn
i=1
HT (n,i)(n, i)q− t
0
Hsa(q)sds
u.c.p.
−→0. (2.6)
For a deterministic scheme,(D(q))may or may not be satisfied, but there is no simple criterion to ensure that it holds. For a random scheme, it may be useful to describe conditions on the laws or on the conditional laws of the lags (n, i), which ensure(D(q)). For eachnand eachq≥0 we choose an(Ft)-optional(0,∞]-valued processG(q)n such that
G(q)nT (n,i−1)=rnqE
(n, i)q|FT (n,i−1)
. (2.7)
This specifiesG(q)nt only at the timest=T (n, i), so there are many such processesG(q)n. A simple choice consists in takingG(q)nt to be equal to the right side of (2.7) whenT (n, i−1)≤t < T (n, i)(a piecewise constant process). But other choices are possible, and perhaps more appropriate in view of the forthcoming assumption. We can obviously takeG(0)nt =1, and by Hölder’s inequality, we can and will choose processesG(q)nwhich satisfy
0≤p≤q ⇒ G(p)n≤
G(q)np/q
. (2.8)
Then we set, withq≥1:
Assumption(E(q)). We have(C)and(1.5),and for eachp∈ [0, q]there is a càdlàg processG(p),adapted to(Ft0), and furtherG(1)andG(1)−do not vanish,such that for an appropriate choice ofG(p)nwe have
G(p)nu.c.p.−→G(p). (2.9)
Lemma 2.2. Assume(E(q))for someq >1.Then(D(p))holds for allp∈ [0, q),with a(p)t=G(p)t
G(1)t
, (2.10)
and in particular
1
rnNtn−→P t
0
1 G(1)s
ds. (2.11)
For some results below we will also need a rate of convergence in (2.9), which is expressed in the following:
Assumption(E(q)). We have(E(q)),and for eachp∈ [0, q]we have√rn(G(p)n−G(p))u.c.p.−→0.
Finally, we introduce a kind of sampling schemes which are somehow restrictive but should accommodate many practical applications, and are calledmixed renewal schemes. These schemes are constructed as follows: we consider a filtration(Ft0)as in(C), and a double sequence(ε(n, i): i, n≥1)of i.i.d. positive variables on(Ω,F,P), independent ofF0, with moments
mq=E
ε(n, i)q
. (2.12)
We may havemq= ∞forq >1, but we assume thatm1<∞. We consider a sequencevn of positive(Ft)-adapted processes, and we defineT (n, i)by induction onias follows:
T (n,0)=0, T (n, i+1)=T (n, i)+ 1 rn
vnT (n,i)ε(n, i+1). (2.13)
Then we take for(Ft)the smallest filtration containing(Ft0)and such that eachT (n, i)is a stopping time. In this situation, a natural choice for the processesG(q)nof (2.7) isG(q)nt =mq(vnt)q. Again, the next lemma is shown in Section4.
Lemma 2.3. Let(T (n, i))be a mixed renewal scheme.
(a) We have(C),and alsoT (n, i)→ ∞a.s.asi→ ∞if1/vnis a locally bounded process.
(b) Suppose further thatvnu.c.p.−→vwherev is an(Ft0)-adapted càdlàg processvsuch that bothv andv−do not vanish,and thatmq<∞for someq≥1 (this is always true forq=1).Then(E(q))holds withG(p)t=mp(vt)pfor allp∈ [0, q],and(D(p))holds for allp∈(0, q]witha(p)t=mmp
1
(vt)p−1. (c) Under the assumptions of (b),and if√rn(vn−v)u.c.p.−→0,we have(E(q)).
Remark 2.4. In many practical situations,see e.g. [10]for concrete examples,one is led to consider simultaneously the scheme(T (n, i)),and another scheme(T (n, i))which is a “sub-scheme” of the first one;typically one considers only the even observations,that is we takeT (n, i)=T (n,2i).Apart from(1.5),none of the previous assumptions is preserved by such a transformation,and in particular(C)is not satisfied in general by the new scheme,unless the original scheme is deterministic.
In fact another assumption,which replaces (C),has been introduced in [8],namely that T (n, i)is a stopping time for the filtration (F(t−un)+)t≥0,where un=1/rnξ and ξ ∈(0,1). Such a property is obviously shared(upon a modification ofξ)by the sub-scheme T (n, i)above.In this setting,the particular casep=2 and q=0of the forthcoming results has been proved(under stronger assumptions onX,though).We cannot fit this other assumption within our framework in general,but when(D(q))holds for someq >1 then the same “localization procedure” as in Lemma4.1below implies that under this assumption we can modify the scheme so that(C)holds,and without modifying the asymptotics below,providedξ <1−1/q.
3. The results
3.1. Limiting results
As we will see, the behavior ofVn(p)is not enough for our purposes, and we need to establish the convergence in probability for more general processes. Forp >0 andq≥0 we set
Vn(p, q)t=
Ntn
i=1
(n, i)q+1−p/2 niX p, (3.1)
so in particularVn(p)=Vn(p,0). In the regular sampling case(n, i)=nthese processes all convey the same information, sinceVn(p, q)=qnVn(p), but this is no longer the case in the irregular sampling case.
Our first result is a law of large numbers, which goes as follows:
Theorem 3.1. Letp≥1andq≥0.Assume(A)and(D(q+1)).Then rnqVn(p, q)tu.c.p.−→V (p, q)t:=mp
t
0
|σs|pa(q+1)sds. (3.2)
In particular,under(A)and(C)and(1.5),we have Vn(p)t
u.c.p.
−→V (p)t=mp
t
0 |σs|pds. (3.3)
For applications, we need an associated central limit theorem. For its statement, we need to recall the notion of F0-stable convergence in law for a sequence of random variables (or processes)Yndefined on(Ω,F,P), see [13] for
more details. We say thatYn convergeF0-stably in lawtoY, whereYis a variable defined on an extension(Ω, F,P) of(Ω,F,P), if we have
E
Zh(Yn)
→E Zh(Y )
: ZboundedF0-measurable,hcontinuous bounded. (3.4) There are in fact two versions for the CLT. The first is associated with (3.3), and is thus the most useful in practice, and it also holds for (3.2) whenq >0 under a strong additional assumption:
Theorem 3.2. Letp≥2,and assume(A)whenp=2and(B)whenp >2.Letq≥0and assume one of the following two sets of hypotheses:
(i) q=0and(D(2));
(ii) q >0and(D(q+1))and(D(2q+2))and(D(q+1)).
Then the processes√rn(rnqVn(p, q)−V (p, q))convergeF0-stably in law to V (p, q)t=
m2p−m2p t
0
|σs|p
a(2q+2)sdWs, (3.5)
where W is a standard Brownian motion,defined on an extension of (Ω,F, (Ft)t≥0,P) and independent of F.
Moreover,conditionally onF,the processV (p, q)is a continuous centered Gaussian martingale with(conditional) variance at timet:
m2p−m2p
t
0 |σs|2pa(2q+2)sds. (3.6)
As mentioned before,(D(q+1))is a very strong assumption, which for example is never satisfied by mixed renewal schemes unless the variablesε(n, i)are constant. So whenq >0 the previous CLT hardly applies.
In practice we usually want to estimate V (p)t, so the above seems enough because we can use the set (i) of assumptions for this. However, as we will see in Section 3.2below, we need also an estimate for the conditional variance (3.6) whenq=0, that is form2p−m
2p
m2p V (2p,1): for this, we may use(1−mm2p2p)rnVn(2p,1)tby virtue of (3.2).
This is why we have introduced the processes Vn(p, q)for q >0. But now, for asserting the quality of the latter estimator, we need a CLT for the processesVn(p, q)under reasonable assumptions, weaker than (ii) above. This is achieved in the following result:
Theorem 3.3. Letp≥2andq≥0.Assume(B)and that the sampling scheme satisfies(E(2q+2))and(D(2q+2)), and that the(Ft0)-adapted processesG(p)forp∈ [1, q+1]are Itô semimartingales with the same properties as the processσ in Assumption(B).Then the processes√rn(rnqVn(p, q)−V (p, q))convergeF0-stably in law to
V (p, q)t= t
0
|σs|p
a(2q+2)sw(p, q)sdWs, (3.7)
whereW is as in Theorem3.2and
w(p, q)t=m2p−m2p2G(q+2)tG(q+1)tG(1)t−(G(q+1)t)2G(2)t
G(2q+2)t(G(1)t)2 . (3.8)
One may check that 2G(q+2)G(q+1)G(1)−(G(q+1))2G(2)≤G(2q+2)(G(1))2always, becauseq→G(q)t has the same structure as the moments of a random variable. Thusw(p, q)t≥m2p−m2p, and this inequality is strict in general, unlessq=0 of course. Whenq=0 this theorem is a special case of Theorem3.2, with stronger assumptions on the sampling scheme.
Remark 3.4. By Lemma2.2,(E(2q+2))implies(D(p))for allp <2q+2,but not necessarily forp=2q+2,and this is why we add the assumption(D(2q+2)).
Remark 3.5. When q >0 we apparently have two different limits for the same sequence of processes,in the two previous theorems.However if both (E(2q +2+ε)) and (D(q +1)) hold for someq >0, one may check that G(p)nt −(G(1)nt)pu.c.p.−→0 for allp≤2q+2,and this impliesG(p)t =(G(1)t)p,which in turn yieldsw(p, q)t = m2p−m2p.When the scheme is a mixed renewal scheme,this also implies that the variablesε(n, i)are in fact all equal to a constant,and the scheme is strongly predictable.
Remark 3.6. We have consideredp >1and p≥2respectively in the theorems above:this considerably simplifies the proofs,but the same results are also available when p∈(0,1]for Theorem3.1 and p∈(1,2)for Theorems 3.2and3.3.The last two theorems also hold forp∈(0,1]when the processesσ andσ−do not vanish.Note also that whenp=2,Theorem3.10holds under(A)instead of(B),but here again the proof is complicated.
Remark 3.7. Both CLTs above have a multidimensional extension,in the sense that we can consider a finite family (pj, qj)1≤j≤dof indices withpj≥2andqj≥0.We then have theF0-stable convergence in law of thed-dimensional processes with components√rn(Vn(pj, qj)−V (pj, qj)).In the setting of Theorem3.2for example,the limit is, conditionally onF,a continuous centered d-dimensional Gaussian martingale on an extension of the space,with covariance at timet between thejth andkth components given by
(mpj+pk−mpjmpk) t
0 |σs|pj+pka(qj+qk+2)sds, to be compared with(3.6).
Remark 3.8. There is another easy extension to more general test functions.Namely,instead of(3.1)we can consider the process
Vn(f, q)t=
Ntn
i=1
(n, i)q+1f niX/
(n, i)
, (3.9)
wheref is a function onRwith at most polynomial growth and some smoothness properties:it should beC1for the law of large numbers,andC2for the CLT.Letρa denote the normal lawN(0, a2)andρa(f )the integral off with respect toρa.The law of large numbers reads as Theorem3.1,with the limitt
0ρσs(f )a(q+1)sds,and the CLTs are as Theorems3.2or3.3:for example,for the former theorem,for the limiting conditional variance one should replace(3.6)by
t 0
ρσs f2
−ρσs(f )2
a(2q+2)sds.
Remark 3.9. We can even extend all the results to ad-dimensional underlying processXof the form(1.1),with nowW ad-dimensional Brownian motion.Taking test functionsf onRd,we can consider the processesVn(f, q)of(3.9), and derive the same results as in the previous remark(now,ifais ad×d matrix,ρadenotes thed-dimensional law N(0, aa)).The proofs are exactly the same,with slightly more cumbersome notation.
However,this trivial extension requires the observation timesT (n, i)to be the same for all components ofX:this is a fundamental restriction,since when the observations are irregularly spaced they also are typically non-synchronous:
this is why we have not treated explicitly this case here.
3.2. Estimation of the integrated volatility and other powers
In practice, one wants to estimates the variableV (p)t at some timet >0, mostly whenp=2, but the casep=4 is also of interest. An estimator isVn(p)t, and Theorem3.2gives a central limit theorem for this estimator: it says that the normalized estimation error√rn(Vn(p)t−V (p)t)is centered normal, conditionally onF0. Even for regular sampling schemes, for whicha(2)s=1, this result is not directly usable in practice for deriving confidence intervals
for example, because t
0|σs|2pds is a priori unknown. But here things are worse because the processa(2)is also unknown, and in fact we also ignorern. However, we can “standardize” by considering the variable
Ttn=
m2p
(m2p−m2p)Vn(2p,1)t
Vn(p)t−V (p)t
. (3.10)
Then the following holds:
Theorem 3.10. Letp≥2.Assume(D(2)).Assume also(A)whenp=2and(B)otherwise.Then for anyt >0such thatt
0|σs|2pa(2)sds >0a.s.,the sequenceTtnconverges in law(and evenF0-stably in law)toN(0,1).
All ingredients in the definition ofTtn are known to the statistician, except of course the quantity V (p)t to be estimated. Therefore we can derive (asymptotic) confidence intervals forV (p)t in a straightforward way.
If one wants to estimate V (p, q)t one can also use Theorems3.2or 3.3, depending on the assumptions on the scheme. However, so far we have no estimators for the conditional variance t
0|σs|2pa(2q+2)sw(p, q)sds in the second theorem, hence we have no feasible statistics for deriving confidence intervals for V (p, q)t when q >0, except under the strong assumption(D(q+1)).
This theorem should be compared with Theorem 4.1 of [1], which states exactly the same result, when the processX has no drift and the volatility σt is possibly random but independent ofW: in this paper the authors consider only deterministic sampling schemes but, if their assumptions are not formally comparable to ours, they are in some sense significantly weaker: namely, they suppose that mini:i≤Ntn(n, i)2/3/πtn→ ∞as n→ ∞. In [16], Theorem 3.1, is also the same result as above, forp=2 and again when there is no drift: in that paper the sampling scheme is somewhat similar to a mixed renewal scheme, and the precise assumptions are not directly comparable to ours, since basically the authors assume the existence of a CLT for the conditionally centered variables(n, i). Note also that this theorem is proved whenp=2 in [15] for deterministic schemes which satisfy an assumption slightly stronger than(D(2))plus(D(1)).
An interesting – and crucial – feature of our result, as well as in the results of [1], is that the properties of the observation scheme are not showing explicitly in the result itself, and in particular the knowledge of the process a(2)and even of the ratesrn is not necessary to apply it. This is a good thing because those are generally unknown, whereas it is also dangerous because one might be tempted to use the property that Ttn is (approximately)N(0,1) without checking that the assumptions on the sampling scheme are satisfied, here as well as in [1]. When they are satisfied, and although the ratesrndo not explicitly show up, these rates still govern the “true” speed of convergence.
Once more,rnis unknown, butNtn is of course known. Then as soon as we have also(D(0))(for example when (E(2))holds), thenNtn/rn−→P t
0a(0)sds, and so the actual rate of convergence for the estimators is also 1/
Ntn, as it should be.
3.3. Some remarks about optimality
In this part we consider only the question of estimating the integrated volatilityV (2)t. The problem of asymptotic efficiency for estimators ofV (2)1is unsolved yet, as far as we know, and even the definition of asymptotic efficiency is not quite clear, when the volatility is genuinely random, and even for regular sampling.
However, one can examine the asymptotic behavior of our estimators in the very special situation whereXt=σ Wt
andσ is a constant, soV (2)t =σ2t. In this case we have a parametric model, and a very simple one indeed, and the MLE for estimatingσ2tis asymptotically efficient. For regular sampling the MLE is [t /t /n
n]Vn(2)t, so obviously the realized quadratic variationVn(2)t is also asymptotically efficient.
When the sampling is non-random but not regular, the MLE for estimatingσ2is Vn= t
Ntn
Ntn
i=1
1
(n, i) niX 2.
This is again asymptotically efficient, and for eachnthe variance of
Ntn(Vn−σ2t )is 2σ4t2. Now, if further(D(2)) holds, the asymptotic variance of√
rn(Vn(2,0)t−σ2t )is 2σ4t
0a(2)sds, and if(D(0))also holds we haveNtn/rn→
t
0a(0)sds. Therefore, under both(D(0))and(D(2)), the two (normalized) asymptotic variancesΣt andΣtofVn(2)t
andVnsatisfy:
Σt=αtΣt, whereαt= 1 t2
t
0
a(2)sds t
0
a(0)sds
.
If we use (2.5) with q =0 and p=1 and q =2 and go the limit, we see that αt ≥1, as it should be be- cause Vn is asymptotically efficient. It may happen that αt =1, of course, but this is equivalent to saying that A(0)ntA(2)nt −(A(1)nt)2→0, and the equality A(0)ntA(2)nt =(A(1)nt)2 implies that the (n, i)’s fori≤Ntn are all equal (by Cauchy–Schwarz inequality): thereforethe realized volatility Vn(2)t is asymptotically efficient only when the sampling scheme is “asymptotically” a regular sampling.
At this stage, one might wonder why we do not useVninstead ofVn(2)t in general. This is because, if we assume for example(D(0)), the sequenceVnconverges in probability to the variable
t t
0σs2a(0)sds t
0a(0)sds , which is different fromt
0σs2dsunless eitherσsora(0)sdo not depend on time.
4. More about sampling schemes
The aim of this section is to prove Lemmas2.1,2.2and2.3, and it also contains a few complementary results, useful for the proofs of the main theorems. Below, and in all the paper,Kdenotes a constant varying from line to line and may depend on the characteristics of the process or of the sampling scheme (we writeKpif it depends on an additional parameterp).
Proof of Lemma2.1. Since(D(1))always holds under(C)and (1.5), the final claim follows from the estimate (2.5) and from classical results on the tightness and convergence of increasing processes.
As for (2.5), this follows from Hölder’s inequality as such:
A(p)nt −A(p)ns =rnp−1
Ntn
i=Nsn+1
(n, i)p
≤
rnq−1
Ntn
i=Nsn+1
(n, i)q
(q−p)/(q−q) rnq−1
Ntn
i=Nsn+1
(n, i)q
(p−q)/(q−q)
.
Before proving Lemma2.2, we introduce strengthened versions of Assumptions(D(q))and(E(q)), which we will also use later on:
Assumption(SD(q)). We have(D(q)),andt
0a(q)sdsis bounded for eacht,and the convergence in(2.4)takes place inL1.
Assumption(SE(q)). We have(E(q)),and for some constantL≥1 we haveG(q)n≤Land 1/G(1)n≤L(hence G(q)≤Land1/G(1)≤Las well).
Lemma 4.1. (a)Let1≤q1<· · ·< qr and assume(D(qj))for j =1, . . . , r.For allt >0,k≥1, one can find a scheme(Tk,t(n, i)) satisfying(SD(qj))for j=1, . . . , r (with the processesak,t(qj)),and subsetsΩk,t andΩk,n,t ofΩ,such that
P(Ωk,t)≥1−1/k, lim inf
n P(Ωk,n,t)≥1−1/k, ω∈Ωk,n,t ⇒ [0, t] ∩
T (n, i)(ω): i≥1
= [0, t] ∩
Tk,t(n, i)(ω): i≥1
(4.1)
and,forj=1, . . . , r,
s≤t, ω∈Ωk,t ⇒ a(qj)(ω)s=ak,t(qj)(ω)s. (4.2)
(b)Assume(E(q))for someq >1.For allt >0,k≥1,one can find a scheme(Tk,t(n, i))satisfying(SE(q))(with the processesGk,t(p)),and subsetsΩk,tandΩk,n,tofΩ,such that we have(4.1)and,forp∈ [0, q],
s≤t, ω∈Ωk,t ⇒ G(p)(ω)s=Gk,t(p)(ω)s. (4.3)
Proof. (a) We may of course assume thatrn≥1 for all n, and also that qr >1 because (1.5) is enough to im- ply (SD(1)). Fix t and k. We can find a >1 such that the set Ωk,t =
1≤j≤r{t+1
0 a(qj)sds≤a −1}satisfies P(Ωk,t)≥1−1/k, and thus by(D(qj))the setsΩk,n,t=
1≤j≤r{A(qj)nt+1≤a}satisfy lim infnP(Ωk,n,t)≥1−1/k.
Then we define the sampling scheme(Tk,t(n, i))in the following way: we setαn=(arn)1/qr/rn, which goes to 0;
whenαn≥1 we simply takek,t(n, i)=(n, i)(so the observation times are unchanged), and whenαn<1, that is for allnlarge enough, we put
k,t(n, i)=
(n, i), if(n, j )≤αnfor allj=1, . . . , i, 1/rn, otherwise.
We associate the processesAk,t(q)n by (2.2). It is obvious that the scheme(Tk,t(n, i))satisfies (C)and (1.5). On the setΩk,n,t we have(n, i)≤αnfor alli withT (n, i)≤t+1, hencek,t(n, i)=(n, i)for thosei’s, and we conclude the last part of (4.1). MoreoverAk,t(qj)ns =A(qj)ns as long as (n, i)≤αn for all i with T (n, i)≤s, and afterwardsAk,t(qj)ns increases inslike[srn]/rn, so obviously the new scheme satisfies(D(qj))with a function ak,t(qj)which coincides witha(qj)on[0, t]on the setΩk,t, giving (4.2). Finally, by constructionAk,t(qj)nt ≤a+t as soon asαn<1, so in fact this new scheme satisfies(SD(qj))forj=1, . . . , r.
(b) Again, we fixt >0 andk≥1. SinceG(q)is càdlàg(Ft0)-adapted, there isak>1 such thatRk=(t+1)∧ inf(s: G(q)s+1/G(1)s≥ak)is an(Ft0)-stopping time andΩk,t= {Rk> t}satisfiesP(Ωk,t)≥1−1/k. Next,
Rnk=Rk∧inf
s: G(q)ns −G(q)s + G(1)ns −G(1)s ≥ 1 2ak
is an (Ft)-stopping time and Ωk,n,t = {Rnk > t} satisfies the condition in (4.1) by(E(q)). Then the new scheme (Tk,t(n, i))is defined by
k,t(n, i)=
(n, i), ifT (n, i−1) < Rkn, 1/rn, otherwise.
(Note that it depends ont, becauseRknimplicitly depends ont.) It is obvious that the scheme(Tk,t(n, i))satisfies (1.5), and(C)because allRnk are(Ft)-stopping time, and the last part of (4.1) is also obvious.
Now,rnpE(k,t(n, i)p|FTk,t(n,i−1))is equal tornpE((n, i)p|FT (n,i−1))on the set{T (n, i−1) < Rkn}, and to 1 otherwise. Therefore we can take, as the(Ft)-optional processGk,t(p)n satisfying (2.7) relative to the new scheme, the process which equalsG(p)non[0, Rkn)and 1 on[Rkn,∞). Then clearlyGk,t(q)n≤ak+1 and 1/Gk,t(1)n≤2ak. MoreoverP(Rkn=Rk)→1, so the new scheme satisfies (2.9) for allp∈ [1, q], with the limitGk,t(p)equal toG(p) on the interval[0, Rk)and to 1 on[Rk,∞). Therefore the new scheme satisfies(SE(q)), and (b) is proved.
Proof of Lemma2.2. A standard localization argument, based upon the previous lemma, yields that it is enough to prove the result when(SE(q))holds instead of(E(q)). So, in view of (2.8), we assumeG(p)n≤Lfor allp∈ [0, q] and 1/G(1)n≤Lfor some constantL≥1. We can also assumern≥1.
(a) In a first step, we show that E
Ntn+1
≤Ktrn, (4.4)
where Kt is a constant depending on t. We put Ct = (t +1)∨ (2L2)1/(q−1). We also define a new sam- pling scheme by (n, i)=(n, i)∧Ct, with the associated variables Ntn and T (n, i) and G(p)nT (n,i−1) =