Irregular sampling and central limit theorems for power variations : the continuous case


www.imstat.org/aihp 2011, Vol. 47, No. 4, 1197–1218

DOI:10.1214/11-AIHP432

Irregular sampling and central limit theorems for power variations: The continuous case ¹

Takaki Hayashi^a, Jean Jacod^b and Nakahiro Yoshida^c

^a Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi, Yokohama 223-8526, Japan. E-mail: [email protected]

^b Institut de mathématiques de Jussieu, Université P. et M. Curie (Paris-6), 4 place Jussieu, 75252 Paris, France (CNRS UMR 7586). E-mail: [email protected]

^c Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Tokyo 153, Japan. E-mail: [email protected]

Received 6 March 2009; revised 2 May 2011; accepted 9 May 2011

Abstract. In the context of high frequency data, one often has to deal with observations occurring at irregularly spaced times, for example at transaction times in finance. Here we examine how the estimation of the squared or other powers of the volatility is affected by irregularly spaced data. The emphasis is on the kind of assumptions on the sampling scheme which make it possible to provide consistent estimators, together with an associated central limit theorem, especially when the sampling scheme depends on the observed process itself.

Résumé. In the context of high frequency data, observations are frequently collected along an irregular grid, for example at transaction times for financial data. In this article, we study how the estimation of the integral of the square, or of other powers, of the volatility is affected by the irregularity of the data. The emphasis is on the type of assumptions that must be made on the distribution of the observations, in particular when these depend on the observed process itself, in order to obtain a central limit theorem for our estimators.

MSC: Primary 60G44; 62M09; secondary 60G42; 62G20

Keywords: Quadratic variation; Discrete observations; Power variations; High frequency data; Stable convergence

1. Introduction

The approximation of the quadratic variation by the “realized” quadratic variation plays a very important role in practice, especially for financial data, since such data are necessarily reported at discrete times. Other quantities of interest, based on discrete observations as well, have been considered in many recent papers: these quantities are sums of functions of the successive increments of the process, usually some powers or absolute powers of those increments.

They are used to estimate some characteristics of the jumps of the observed process, or the volatility of the continuous part, or for various testing problems about jumps for example.

However, while the behavior of the realized quadratic variation and other similar functionals is well known when the observations come in regularly, this is no longer the case when the observation times are irregularly spaced and, even worse, when they are random. Relatively few papers are so far available in that case: see [1,14] and [15] for deterministic observation times, [4] for some special random times like hitting times, [16] for a situation which is relatively close to the present paper, and [6–9] and [2] when the process is multidimensional and the observation times exhibit some sort of relatively restricted randomness, or are random but independent of the observed process. In the last five papers, the situation is quite complex because the various components are observed at different times.

¹ Supported in part by JST Basic Research Programs PRESTO, Grants-in-Aid for Scientific Research of Japan Society for the Promotion of Science, No. 19340021 and No. 19530186, and by the Global COE Program “The research and training center for new developments in mathematics” of the Graduate School of Mathematical Sciences, University of Tokyo.

All those papers are concerned with a continuous underlying process. One may also quote related works dealing with estimation of various parameters with random sampling, like [5] for diffusions and [3] for Markov processes.

Here, our aim is relatively modest: we consider only a continuous underlying process, which is 1-dimensional, quite a restrictive setting indeed which in particular leaves aside the problem of non-synchronous observations for multivariate processes. The observation times are possibly random, and we try to find conditions on those times, allowing for limit theorems when the number of observations increases (the “high frequency” setting), which are as general as possible, and in particular accommodate some dependency between the sampling times and the process itself: this is the main novelty of this paper. We also consider the problem of the estimation of integrated powers of the volatility other than 2, for example the estimation of the quarticity, a quantity which turns out to be more and more frequently used in practice. In a sense, we are in the line of [1] and [16], and we try to weaken the assumptions on the observation scheme and the underlying observed process $X$ as much as we can, together with providing an estimation method for other powers than the integrated volatility.

Of course, the assumptions about the observation times, as discussed below, are arbitrary to some extent. What we have in mind is to account for high frequency financial data, and especially those recording transaction times and prices, which are notoriously irregularly spaced. Note that in this paper we only consider a semi-realistic situation, where the process is supposed to be observed exactly at each observation time: we do not consider the problem of microstructure noise. For real applications to finance, this problem should be taken into account.

Let us be more specific. The process of interest is a 1-dimensional continuous semimartingale $X$ on a filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge 0},\mathbb P)$, which is of Itô type, that is, of the form

$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s, \tag{1.1}$$

where $W$ is a Wiener process and $b$ and $\sigma$ are progressively measurable processes. In this paper our aim is to give estimators for the following “integrated” quantities:

$$V(p)_t = m_p \int_0^t |\sigma_s|^p\,ds \tag{1.2}$$

(here $m_p$ is the $p$th absolute moment of the standard normal law $N(0,1)$, and it appears here for later convenience), when $p\ge 2$, together with “feasible” estimators for the variance or conditional variance of these estimators, so as to be able to construct confidence intervals for example. The restriction $p\ge 2$ is not essential: similar results hold when $1 < p < 2$, and also when $0 < p\le 1$ under some additional assumptions, but $p\ge 2$ covers the most interesting situations and is technically simpler.
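The constant $m_p$ in (1.2) has the standard closed form $m_p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$. The following minimal sketch evaluates it; the function name `abs_normal_moment` is an illustrative choice, not notation from the paper.

```python
import math


def abs_normal_moment(p: float) -> float:
    """Return m_p = E|U|^p for a standard normal variable U.

    Uses the classical formula m_p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi).
    """
    if p < 0:
        raise ValueError("p must be nonnegative")
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)
```

In particular $m_2 = 1$ and $m_4 = 3$, so that $V(2)_t$ is the integrated volatility and $V(4)_t$ is three times the integrated quarticity.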

At stage $n$ the process $X$ is observed along a strictly increasing sequence of – possibly random – finite times $T(n,i)$, $i\ge 0$, starting at $T(n,0)=0$, and we use the notation

$$\Delta(n,i)=T(n,i)-T(n,i-1),\quad I(n,i)=\big(T(n,i-1),T(n,i)\big],\quad N^n_t=\inf\big(i: T(n,i) > t\big)-1,\quad \pi^n_t=\sup_{i=1,\dots,N^n_t+1}\Delta(n,i) \tag{1.3}$$

($\pi^n_t$ is the “mesh” up to time $t$, by convention $\inf(\emptyset)=\infty$ and $\sup(\emptyset)=0$, and $N^n_t$ is the number of observation times within $(0,t]$). Also, for any process $Y$ we write

$$\Delta^n_i Y = Y_{T(n,i)} - Y_{T(n,i-1)}. \tag{1.4}$$
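The quantities in (1.3) are easy to compute from a finite array of observation times. The sketch below is only an illustration: the function name `sampling_summary` is hypothetical, and it assumes the array extends beyond $t$ so that the lag $\Delta(n, N^n_t+1)$ entering the mesh is available.

```python
import numpy as np


def sampling_summary(times, t):
    """Return (lags, N_t, mesh) for observation times T(n,0)=0, T(n,1), ...

    lags[i-1] is Delta(n,i); N_t counts the observation times in (0, t];
    mesh is the largest Delta(n,i) over i = 1, ..., N_t + 1.
    """
    times = np.asarray(times, dtype=float)
    if times[0] != 0.0 or np.any(np.diff(times) <= 0):
        raise ValueError("times must start at 0 and be strictly increasing")
    lags = np.diff(times)
    # number of indices i >= 1 with T(n,i) <= t, i.e. inf(i: T(n,i) > t) - 1
    n_t = int(np.searchsorted(times, t, side="right")) - 1
    if n_t + 1 > len(lags):
        raise ValueError("times must extend beyond t to compute the mesh")
    mesh = float(lags[: n_t + 1].max())
    return lags, n_t, mesh
```

For instance, with times `[0, 0.1, 0.3, 0.6, 1.0]` and `t = 0.5`, two observation times fall in $(0, t]$ and the mesh is the third lag, which straddles $t$.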

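Among (many) other technical requirements, we will always assume the following minimal ones:

$$n\ge 1 \;\Rightarrow\; T(n,i)\to\infty \ \ \mathbb P\text{-a.s., as } i\to\infty, \qquad t\ge 0 \;\Rightarrow\; \pi^n_t \xrightarrow{\ \mathbb P\ } 0 \ \text{ as } n\to\infty. \tag{1.5}$$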

With the convention $\sum_{i=1}^{0}=0$, we take as our estimator for $V(p)_t$ of (1.2), at stage $n$, the variable

$$V^n(p)_t=\sum_{i=1}^{N^n_t}\Delta(n,i)^{1-p/2}\,|\Delta^n_i X|^p. \tag{1.6}$$

The upper limit of the sum is such that $V^n(p)_t$ involves the observations occurring up to time $t$ only. As is well known, under regular sampling (that is, $T(n,i)=i\Delta_n$ for some time lag $\Delta_n$ going to 0), $V^n(p)\xrightarrow{\text{u.c.p.}}V(p)$ (this denotes “convergence in probability, locally uniform in time”), as soon as $\sigma$ is càdlàg and $b$ satisfies a suitable integrability condition, and even under no condition at all when $p=2$. Furthermore, under some more regularity conditions, we have an associated CLT (Central Limit Theorem), and standardized versions are also available: see e.g. [12] and [11].

Our aim is to prove the same results for irregular sampling, and in particular to prove standardized versions of the CLT for the estimators. We already know, of course, that $V^n(2)$ converges to $V(2)$ as soon as the $T(n,i)$’s are stopping times subject to (1.5), but for the other values of $p$ and for the CLT, we need appropriate additional assumptions on the sampling scheme. Note that for a regular scheme the factor $\Delta(n,i)^{1-p/2}=\Delta_n^{1-p/2}$ goes out of the sum, and this is of course how the “standard” result is stated. The idea to place this factor inside the sum is due to [1].
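As an illustration of (1.6), the estimator can be sketched as follows. The function name `realized_power_variation` and the array-based interface are assumptions made for this example; the inputs are the observation times and the values of $X$ at those times.

```python
import numpy as np


def realized_power_variation(times, values, p, t):
    """Compute V^n(p)_t = sum_{i <= N_t} Delta(n,i)^(1 - p/2) * |Delta_i X|^p.

    times[0] must be T(n,0) = 0 and values[i] is X observed at times[i].
    Only the increments whose right endpoint is <= t enter the sum.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    n_t = int(np.searchsorted(times, t, side="right")) - 1
    lags = np.diff(times)[:n_t]
    increments = np.diff(values)[:n_t]
    # the lag-dependent weight stays inside the sum, as in (1.6)
    return float(np.sum(lags ** (1.0 - p / 2.0) * np.abs(increments) ** p))
```

For $p=2$ the weight $\Delta(n,i)^{1-p/2}$ equals 1 and the function returns the realized quadratic variation; for $p\ne 2$ the weight differs from one increment to the next unless the sampling is regular.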

In Section 2 we state the precise assumptions on the underlying process $X$ and the sampling schemes. Section 3 contains the basic results, and a discussion of their applicability. The proofs are gathered in Sections 4, 5 and 6.

2. Setting and assumptions

2.1. Assumptions on X

The assumptions on $X$ will vary according to the power $p$ in which we are interested. We always suppose that $X$ has the form (1.1), and we use two different assumptions, with (B) stronger than (A) below:

Assumption (A). We have (1.1), and the process $b$ is locally bounded, and the process $\sigma$ is càdlàg (= right continuous with left limits).

Assumption (B). We have (1.1), and the process $\sigma$ is also a (possibly discontinuous) Itô semimartingale, which can be written as

$$\sigma_t=\sigma_0+\int_0^t \tilde b_s\,ds+\int_0^t\tilde\sigma_s\,dW_s+M_t+\sum_{s\le t}\Delta\sigma_s\,1_{\{|\Delta\sigma_s| > 1\}}, \tag{2.1}$$

where

• $M$ is a local martingale with $|\Delta M_t|\le 1$, orthogonal to $W$, and its predictable quadratic covariation process is $\langle M,M\rangle_t=\int_0^t a_s\,ds$.

• The predictable compensator of $\sum_{s\le t}1_{\{|\Delta\sigma_s| > 1\}}$ is $\int_0^t \tilde a_s\,ds$.

Moreover, the processes $\tilde b$, $a$ and $\tilde a$ are locally bounded, and the processes $\tilde\sigma$ and $b$ are left continuous with right limits.

In (B), $M$ may have jumps, and also a non-vanishing continuous martingale part, which must then be a stochastic integral with respect to another Brownian motion independent of $W$.

The above assumptions are exactly those under which the theorems given below hold, for regular sampling schemes, except for the case of $V^n(2)$ which requires a bit less.

2.2. The sampling scheme

The sampling scheme, that is the collection $((T(n,i))_{i\ge 0}: n\ge 1)$, is subject to a number of conditions. There is first a structural assumption:

Assumption (C). There is a sub-filtration $(\mathcal F^0_t)_{t\ge 0}$ of $(\mathcal F_t)_{t\ge 0}$ with the following properties:

• $W$ and $b$ and $\sigma$ are adapted to $(\mathcal F^0_t)$;

• any $(\mathcal F^0_t)$-martingale is also an $(\mathcal F_t)$-martingale;

• each variable $T(n,i)$ is an $(\mathcal F_t)$-stopping time which, conditionally on $\mathcal F_{T(n,i-1)}$, is independent of the $\sigma$-field $\mathcal F^0=\bigvee_{t > 0}\mathcal F^0_t$.

This assumption is satisfied when the $T(n,i)$’s are non-random (deterministic schemes), and when the $T(n,i)$’s are independent of the processes $(W,X,b,\sigma)$ (independent schemes), but it includes many other cases as well. However it excludes some a priori interesting situations: when $\mathcal F^0_t=\mathcal F_t$, then (C) amounts to saying that those stopping times are “strongly predictable” in the sense that $T(n,i)$ is $\mathcal F_{T(n,i-1)}$-measurable, and this is quite restrictive; for example it excludes the case where the $T(n,i)$’s are the successive hitting times of a spatial grid by $X$, a case considered in [4] under some restrictive assumptions on $X$.

Apart from (C) and (1.5), the sampling scheme should also be not too wildly scattered, and its meshes $\pi^n_t$ should converge to 0 at some deterministic rate. This rate is expressed through a sequence $r_n\to\infty$ of positive – non random – numbers. The assumptions below all involve this sequence $r_n$, in an implicit way.

Before stating the assumptions, and for any $q\ge 0$, we introduce the processes

$$A(q)^n_t=r_n^{q-1}\sum_{i=1}^{N^n_t}\Delta(n,i)^q. \tag{2.2}$$

The normalization $r_n^{q-1}$ is motivated by the regular schemes $T(n,i)=i\Delta_n$, for which $A(q)^n_t=\Delta_n[t/\Delta_n]$ (with the choice $r_n=1/\Delta_n$) converges towards $t$ for any $q\ge 0$. Note also that $A(0)^n_t=N^n_t/r_n$, and since $t-\pi^n_t\le A(1)^n_t\le t$ we deduce that

$$A(1)^n_t\xrightarrow{\ \mathbb P\ }t \tag{2.3}$$

as soon as (1.5) holds. Then, with $q\ge 0$, we set:

Assumption (D(q)). We have (C) and (1.5), and there is a (necessarily nonnegative) $(\mathcal F^0_t)$-optional process $a(q)$, such that for all $t$ we have

$$A(q)^n_t\xrightarrow{\ \mathbb P\ }\int_0^t a(q)_s\,ds. \tag{2.4}$$

Note that (D(q)) for some sequence $r_n$ implies (D(q)) for any other sequence $r'_n$ such that $r'_n/r_n\to\alpha\in[0,\infty)$, and when $q > 1$ the new limit in (2.4) is then $\alpha^{q-1}a(q)$, and in particular vanishes when $r'_n/r_n\to 0$: the forthcoming theorems which explicitly involve $r_n$ are true but “empty” when the limit in (2.4) vanishes identically. Regular sampling schemes with lag $\Delta_n$ satisfy (D(q)) for all $q\ge 0$, with $r_n=1/\Delta_n$ and $a(q)_t=1$.
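The normalization in (2.2) can be checked numerically on a regular scheme. The sketch below uses a hypothetical helper `normalized_lag_sum` and assumes a finite array of observation times.

```python
import numpy as np


def normalized_lag_sum(times, q, r_n, t):
    """Compute A(q)^n_t = r_n^(q-1) * sum_{i <= N_t} Delta(n,i)^q, as in (2.2)."""
    times = np.asarray(times, dtype=float)
    n_t = int(np.searchsorted(times, t, side="right")) - 1
    lags = np.diff(times)[:n_t]
    return float(r_n ** (q - 1.0) * np.sum(lags ** q))


# Regular scheme with lag 1/100 and r_n = 100: A(q)^n_1 is close to 1 for every q.
regular_times = np.arange(0, 102) / 100.0
regular_values = [normalized_lag_sum(regular_times, q, 100.0, 1.0) for q in (0, 1, 2, 3)]
```

For an irregular scheme the values of $A(q)^n_t$ differ across $q$, which is exactly the information carried by the limits $a(q)$ in (2.4).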

Some important connections between these assumptions are stated in the next lemma, to be proved in Section 4.

Lemma 2.1. Let $0\le q < p < q'$. Then for all $0\le s < t$ we have

$$A(p)^n_t-A(p)^n_s\le\big(A(q)^n_t-A(q)^n_s\big)^{(q'-p)/(q'-q)}\big(A(q')^n_t-A(q')^n_s\big)^{(p-q)/(q'-q)}. \tag{2.5}$$

Moreover, if (D(q)) holds for some $q\ne 1$ and if $p$ is strictly between 1 and $q$, from any subsequence one may extract a further subsequence which satisfies (D(p)), and we have versions of $a(q)$ and $a(p)$ satisfying $a(p)_t\le a(q)_t^{(p-1)/(q-1)}$.
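Inequality (2.5) can be sanity-checked numerically on any finite set of lags. The following sketch takes $s=0$ and uses a hypothetical helper `interpolation_gap` that returns the right side of (2.5) minus the left side; the randomly drawn lags are purely illustrative.

```python
import numpy as np


def interpolation_gap(lags, r_n, q, p, q_prime):
    """Right side minus left side of (2.5) with s = 0, for 0 <= q < p < q_prime."""
    lags = np.asarray(lags, dtype=float)

    def A(k):
        # A(k)^n_t of (2.2), restricted to the given lags
        return r_n ** (k - 1.0) * np.sum(lags ** k)

    theta = (q_prime - p) / (q_prime - q)
    return float(A(q) ** theta * A(q_prime) ** (1.0 - theta) - A(p))


rng = np.random.default_rng(0)
random_lags = rng.uniform(0.001, 0.02, size=500)
gap_irregular = interpolation_gap(random_lags, 100.0, 0.0, 1.0, 2.0)
gap_regular = interpolation_gap(np.full(100, 0.01), 100.0, 0.0, 1.0, 2.0)
```

The gap is nonnegative for any lags, and it is zero (up to rounding) when all lags are equal, which is the equality case of Hölder's inequality.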

Note that (2.4) for all $t$ implies that $r_n^{q-1}\sum_{i=1}^{N^n_t}H_{T(n,i)}\Delta(n,i)^q\xrightarrow{\text{u.c.p.}}\int_0^t H_s a(q)_s\,ds$ as soon as $H$ is càdlàg. In some cases we need this convergence to hold at a rate faster than $1/\sqrt{r_n}$, and we express this in the following assumption.

It is indeed a very strong assumption, usually not satisfied by random schemes, and in particular not satisfied by the independent schemes for which the $\Delta(n,i)$’s are i.i.d. when $i$ varies, but not deterministic.

Assumption (D′(q)). We have (D(q)), and further for all $t\ge 0$ and all càdlàg $(\mathcal F^0_t)$-adapted processes $H$ we have

$$\sqrt{r_n}\,\Big(r_n^{q-1}\sum_{i=1}^{N^n_t}H_{T(n,i)}\Delta(n,i)^q-\int_0^t H_s a(q)_s\,ds\Big)\xrightarrow{\text{u.c.p.}}0. \tag{2.6}$$

For a deterministic scheme, (D(q)) may or may not be satisfied, but there is no simple criterion to ensure that it holds. For a random scheme, it may be useful to describe conditions on the laws or on the conditional laws of the lags $\Delta(n,i)$, which ensure (D(q)). For each $n$ and each $q\ge 0$ we choose an $(\mathcal F_t)$-optional $(0,\infty]$-valued process $G(q)^n$ such that

$$G(q)^n_{T(n,i-1)}=r_n^q\,\mathbb E\big(\Delta(n,i)^q\mid\mathcal F_{T(n,i-1)}\big). \tag{2.7}$$

This specifies $G(q)^n_t$ only at the times $t=T(n,i)$, so there are many such processes $G(q)^n$. A simple choice consists in taking $G(q)^n_t$ to be equal to the right side of (2.7) when $T(n,i-1)\le t < T(n,i)$ (a piecewise constant process). But other choices are possible, and perhaps more appropriate in view of the forthcoming assumption. We can obviously take $G(0)^n_t=1$, and by Hölder’s inequality, we can and will choose processes $G(q)^n$ which satisfy

$$0\le p\le q\;\Rightarrow\;G(p)^n\le\big(G(q)^n\big)^{p/q}. \tag{2.8}$$

Then we set, with $q\ge 1$:

Assumption (E(q)). We have (C) and (1.5), and for each $p\in[0,q]$ there is a càdlàg process $G(p)$, adapted to $(\mathcal F^0_t)$, and further $G(1)$ and $G(1)_-$ do not vanish, such that for an appropriate choice of $G(p)^n$ we have

$$G(p)^n\xrightarrow{\text{u.c.p.}}G(p). \tag{2.9}$$

Lemma 2.2. Assume (E(q)) for some $q > 1$. Then (D(p)) holds for all $p\in[0,q)$, with

$$a(p)_t=\frac{G(p)_t}{G(1)_t}, \tag{2.10}$$

and in particular

$$\frac{1}{r_n}N^n_t\xrightarrow{\ \mathbb P\ }\int_0^t\frac{1}{G(1)_s}\,ds. \tag{2.11}$$

For some results below we will also need a rate of convergence in (2.9), which is expressed in the following:

Assumption (E′(q)). We have (E(q)), and for each $p\in[0,q]$ we have $\sqrt{r_n}\,(G(p)^n-G(p))\xrightarrow{\text{u.c.p.}}0$.

Finally, we introduce a kind of sampling schemes which are somewhat restrictive but should accommodate many practical applications, and are called mixed renewal schemes. These schemes are constructed as follows: we consider a filtration $(\mathcal F^0_t)$ as in (C), and a double sequence $(\varepsilon(n,i): i,n\ge 1)$ of i.i.d. positive variables on $(\Omega,\mathcal F,\mathbb P)$, independent of $\mathcal F^0$, with moments

$$m'_q=\mathbb E\big(\varepsilon(n,i)^q\big). \tag{2.12}$$

We may have $m'_q=\infty$ for $q > 1$, but we assume that $m'_1 < \infty$. We consider a sequence $v^n$ of positive $(\mathcal F_t)$-adapted processes, and we define $T(n,i)$ by induction on $i$ as follows:

$$T(n,0)=0,\qquad T(n,i+1)=T(n,i)+\frac{1}{r_n}\,v^n_{T(n,i)}\,\varepsilon(n,i+1). \tag{2.13}$$


Then we take for $(\mathcal F_t)$ the smallest filtration containing $(\mathcal F^0_t)$ and such that each $T(n,i)$ is a stopping time. In this situation, a natural choice for the processes $G(q)^n$ of (2.7) is $G(q)^n_t=m'_q(v^n_t)^q$. Again, the next lemma is shown in Section 4.
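The recursion (2.13) is straightforward to simulate. The sketch below is a simplified illustration under stated assumptions: the process $v^n$ is replaced by a deterministic positive function of time `v` (in the paper it is an adapted process), the innovations `eps` are supplied by the caller, and the name `mixed_renewal_times` is hypothetical.

```python
import numpy as np


def mixed_renewal_times(r_n, v, eps, horizon):
    """Generate T(n,0)=0 and T(n,i+1) = T(n,i) + v(T(n,i)) * eps[i] / r_n.

    Generation stops once a time exceeds `horizon` or `eps` is exhausted.
    `v` is a positive function of time and `eps` holds positive innovations.
    """
    times = [0.0]
    for e in eps:
        if times[-1] > horizon:
            break
        times.append(times[-1] + v(times[-1]) * e / r_n)
    return np.array(times)


# Constant innovations and v = 1 give back the regular scheme with lag 1/r_n.
regular = mixed_renewal_times(10.0, lambda s: 1.0, np.ones(20), 1.0)

# Exponential innovations with a time-varying intensity give an irregular scheme.
rng = np.random.default_rng(1)
irregular = mixed_renewal_times(50.0, lambda s: 1.0 + s, rng.exponential(1.0, 1000), 1.0)
```

With exponential innovations and constant `v`, the observation times form a Poisson process of intensity `r_n / v`, which is a standard example of an independent scheme.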

Lemma 2.3. Let $(T(n,i))$ be a mixed renewal scheme.

(a) We have (C), and also $T(n,i)\to\infty$ a.s. as $i\to\infty$ if $1/v^n$ is a locally bounded process.

(b) Suppose further that $v^n\xrightarrow{\text{u.c.p.}}v$, where $v$ is an $(\mathcal F^0_t)$-adapted càdlàg process such that both $v$ and $v_-$ do not vanish, and that $m'_q < \infty$ for some $q\ge 1$ (this is always true for $q=1$). Then (E(q)) holds with $G(p)_t=m'_p(v_t)^p$ for all $p\in[0,q]$, and (D(p)) holds for all $p\in(0,q]$ with $a(p)_t=\frac{m'_p}{m'_1}(v_t)^{p-1}$.

(c) Under the assumptions of (b), and if $\sqrt{r_n}\,(v^n-v)\xrightarrow{\text{u.c.p.}}0$, we have (E′(q)).

Remark 2.4. In many practical situations, see e.g. [10] for concrete examples, one is led to consider simultaneously the scheme $(T(n,i))$ and another scheme $(T'(n,i))$ which is a “sub-scheme” of the first one; typically one considers only the even observations, that is we take $T'(n,i)=T(n,2i)$. Apart from (1.5), none of the previous assumptions is preserved by such a transformation, and in particular (C) is not satisfied in general by the new scheme, unless the original scheme is deterministic.

In fact another assumption, which replaces (C), has been introduced in [8], namely that $T(n,i)$ is a stopping time for the filtration $(\mathcal F_{(t-u_n)^+})_{t\ge 0}$, where $u_n=1/r_n^{\xi}$ and $\xi\in(0,1)$. Such a property is obviously shared (upon a modification of $\xi$) by the sub-scheme $T'(n,i)$ above. In this setting, the particular case $p=2$ and $q=0$ of the forthcoming results has been proved (under stronger assumptions on $X$, though). We cannot fit this other assumption within our framework in general, but when (D(q)) holds for some $q > 1$ then the same “localization procedure” as in Lemma 4.1 below implies that under this assumption we can modify the scheme so that (C) holds, and without modifying the asymptotics below, provided $\xi < 1-1/q$.

3. The results

3.1. Limiting results

As we will see, the behavior of $V^n(p)$ is not enough for our purposes, and we need to establish the convergence in probability for more general processes. For $p > 0$ and $q\ge 0$ we set

$$V^n(p,q)_t=\sum_{i=1}^{N^n_t}\Delta(n,i)^{q+1-p/2}\,|\Delta^n_i X|^p, \tag{3.1}$$

so in particular $V^n(p)=V^n(p,0)$. In the regular sampling case $\Delta(n,i)=\Delta_n$ these processes all convey the same information, since $V^n(p,q)=\Delta_n^q V^n(p)$, but this is no longer the case in the irregular sampling case.
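For completeness, the identity quoted for regular sampling follows by substituting $\Delta(n,i)=\Delta_n$ in (3.1); the short computation below only restates that step.

```latex
\begin{align*}
V^n(p,q)_t
  &= \sum_{i=1}^{N^n_t} \Delta_n^{\,q+1-p/2}\,|\Delta^n_i X|^p \\
  &= \Delta_n^{\,q} \sum_{i=1}^{N^n_t} \Delta_n^{\,1-p/2}\,|\Delta^n_i X|^p
   = \Delta_n^{\,q}\, V^n(p)_t .
\end{align*}
```

With the choice $r_n=1/\Delta_n$ this reads $r_n^q V^n(p,q)_t=V^n(p)_t$, which explains the normalization $r_n^q$ used in the limit theorems below.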

Our first result is a law of large numbers, which goes as follows:

Theorem 3.1. Let $p\ge 1$ and $q\ge 0$. Assume (A) and (D(q+1)). Then

$$r_n^q\,V^n(p,q)_t\xrightarrow{\text{u.c.p.}}V(p,q)_t:=m_p\int_0^t|\sigma_s|^p\,a(q+1)_s\,ds. \tag{3.2}$$

In particular, under (A) and (C) and (1.5), we have

$$V^n(p)_t\xrightarrow{\text{u.c.p.}}V(p)_t=m_p\int_0^t|\sigma_s|^p\,ds. \tag{3.3}$$

For applications, we need an associated central limit theorem. For its statement, we need to recall the notion of $\mathcal F^0$-stable convergence in law for a sequence of random variables (or processes) $Y_n$ defined on $(\Omega,\mathcal F,\mathbb P)$, see [13] for more details. We say that $Y_n$ converges $\mathcal F^0$-stably in law to $Y$, where $Y$ is a variable defined on an extension $(\tilde\Omega,\tilde{\mathcal F},\tilde{\mathbb P})$ of $(\Omega,\mathcal F,\mathbb P)$, if we have

$$\mathbb E\big(Z\,h(Y_n)\big)\to\tilde{\mathbb E}\big(Z\,h(Y)\big)\quad\text{for all bounded } \mathcal F^0\text{-measurable } Z \text{ and all bounded continuous } h. \tag{3.4}$$

There are in fact two versions for the CLT. The first is associated with (3.3), and is thus the most useful in practice, and it also holds for (3.2) when $q > 0$ under a strong additional assumption:

Theorem 3.2. Let $p\ge 2$, and assume (A) when $p=2$ and (B) when $p > 2$. Let $q\ge 0$ and assume one of the following two sets of hypotheses:

(i) $q=0$ and (D(2));

(ii) $q > 0$ and (D(q+1)) and (D(2q+2)) and (D′(q+1)).

Then the processes $\sqrt{r_n}\,(r_n^q V^n(p,q)-V(p,q))$ converge $\mathcal F^0$-stably in law to

$$\overline V(p,q)_t=\sqrt{m_{2p}-m_p^2}\int_0^t|\sigma_s|^p\sqrt{a(2q+2)_s}\;d\overline W_s, \tag{3.5}$$

where $\overline W$ is a standard Brownian motion, defined on an extension of $(\Omega,\mathcal F,(\mathcal F_t)_{t\ge 0},\mathbb P)$ and independent of $\mathcal F$.

Moreover, conditionally on $\mathcal F$, the process $\overline V(p,q)$ is a continuous centered Gaussian martingale with (conditional) variance at time $t$:

$$\big(m_{2p}-m_p^2\big)\int_0^t|\sigma_s|^{2p}\,a(2q+2)_s\,ds. \tag{3.6}$$

As mentioned before, (D′(q+1)) is a very strong assumption, which for example is never satisfied by mixed renewal schemes unless the variables $\varepsilon(n,i)$ are constant. So when $q > 0$ the previous CLT hardly applies.

In practice we usually want to estimate $V(p)_t$, so the above seems enough because we can use the set (i) of assumptions for this. However, as we will see in Section 3.2 below, we need also an estimate for the conditional variance (3.6) when $q=0$, that is for $\frac{m_{2p}-m_p^2}{m_{2p}}V(2p,1)_t$: for this, we may use $\big(1-\frac{m_p^2}{m_{2p}}\big)r_n V^n(2p,1)_t$ by virtue of (3.2).

This is why we have introduced the processes $V^n(p,q)$ for $q > 0$. But now, for assessing the quality of the latter estimator, we need a CLT for the processes $V^n(p,q)$ under reasonable assumptions, weaker than (ii) above. This is achieved in the following result:

Theorem 3.3. Let $p\ge 2$ and $q\ge 0$. Assume (B) and that the sampling scheme satisfies (E(2q+2)) and (D(2q+2)), and that the $(\mathcal F^0_t)$-adapted processes $G(p)$ for $p\in[1,q+1]$ are Itô semimartingales with the same properties as the process $\sigma$ in Assumption (B). Then the processes $\sqrt{r_n}\,(r_n^q V^n(p,q)-V(p,q))$ converge $\mathcal F^0$-stably in law to

$$\overline V(p,q)_t=\int_0^t|\sigma_s|^p\sqrt{a(2q+2)_s\,w(p,q)_s}\;d\overline W_s, \tag{3.7}$$

where $\overline W$ is as in Theorem 3.2 and

$$w(p,q)_t=m_{2p}-m_p^2\,\frac{2G(q+2)_tG(q+1)_tG(1)_t-(G(q+1)_t)^2G(2)_t}{G(2q+2)_t\,(G(1)_t)^2}. \tag{3.8}$$

One may check that $2G(q+2)G(q+1)G(1)-(G(q+1))^2G(2)\le G(2q+2)(G(1))^2$ always, because $q\mapsto G(q)_t$ has the same structure as the moments of a random variable. Thus $w(p,q)_t\ge m_{2p}-m_p^2$, and this inequality is strict in general, unless $q=0$ of course. When $q=0$ this theorem is a special case of Theorem 3.2, with stronger assumptions on the sampling scheme.
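To make (3.8) concrete, the sketch below evaluates $w(p,q)_t$ at a fixed time from a user-supplied map $k\mapsto G(k)_t$. The function names are illustrative. The two examples assume a mixed renewal scheme with $v=1$, for which $G(k)_t$ is the $k$th moment of $\varepsilon(n,i)$: constant innovations (all moments equal to 1) and exponential innovations with mean 1 (moments $\Gamma(k+1)$).

```python
import math


def abs_normal_moment(p):
    """m_p = E|U|^p for a standard normal U."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def w_factor(p, q, G):
    """Evaluate w(p,q)_t of (3.8); G(k) must return the value G(k)_t."""
    m_p, m_2p = abs_normal_moment(p), abs_normal_moment(2 * p)
    numerator = 2.0 * G(q + 2) * G(q + 1) * G(1) - G(q + 1) ** 2 * G(2)
    denominator = G(2 * q + 2) * G(1) ** 2
    return m_2p - m_p ** 2 * numerator / denominator


w_constant = w_factor(2, 1, lambda k: 1.0)                      # regular-type scheme
w_exponential = w_factor(2, 1, lambda k: math.gamma(k + 1.0))   # Poisson-type scheme
w_exponential_q0 = w_factor(2, 0, lambda k: math.gamma(k + 1.0))
```

In the constant case the factor reduces to $m_4-m_2^2=2$, as in Theorem 3.2; with exponential innovations and $q=1$ it is strictly larger, while for $q=0$ it again equals 2, in line with the discussion above.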

Remark 3.4. By Lemma 2.2, (E(2q+2)) implies (D(p)) for all $p < 2q+2$, but not necessarily for $p=2q+2$, and this is why we add the assumption (D(2q+2)).

Remark 3.5. When $q > 0$ we apparently have two different limits for the same sequence of processes, in the two previous theorems. However if both (E(2q+2+ε)) and (D′(q+1)) hold for some $q > 0$, one may check that $G(p)^n_t-(G(1)^n_t)^p\xrightarrow{\text{u.c.p.}}0$ for all $p\le 2q+2$, and this implies $G(p)_t=(G(1)_t)^p$, which in turn yields $w(p,q)_t=m_{2p}-m_p^2$. When the scheme is a mixed renewal scheme, this also implies that the variables $\varepsilon(n,i)$ are in fact all equal to a constant, and the scheme is strongly predictable.

Remark 3.6. We have considered $p > 1$ and $p\ge 2$ respectively in the theorems above: this considerably simplifies the proofs, but the same results are also available when $p\in(0,1]$ for Theorem 3.1 and $p\in(1,2)$ for Theorems 3.2 and 3.3. The last two theorems also hold for $p\in(0,1]$ when the processes $\sigma$ and $\sigma_-$ do not vanish. Note also that when $p=2$, Theorem 3.10 holds under (A) instead of (B), but here again the proof is complicated.

Remark 3.7. Both CLTs above have a multidimensional extension, in the sense that we can consider a finite family $(p_j,q_j)_{1\le j\le d}$ of indices with $p_j\ge 2$ and $q_j\ge 0$. We then have the $\mathcal F^0$-stable convergence in law of the $d$-dimensional processes with components $\sqrt{r_n}\,(V^n(p_j,q_j)-V(p_j,q_j))$. In the setting of Theorem 3.2 for example, the limit is, conditionally on $\mathcal F$, a continuous centered $d$-dimensional Gaussian martingale on an extension of the space, with covariance at time $t$ between the $j$th and $k$th components given by

$$\big(m_{p_j+p_k}-m_{p_j}m_{p_k}\big)\int_0^t|\sigma_s|^{p_j+p_k}\,a(q_j+q_k+2)_s\,ds,$$

to be compared with (3.6).

Remark 3.8. There is another easy extension to more general test functions. Namely, instead of (3.1) we can consider the process

$$V^n(f,q)_t=\sum_{i=1}^{N^n_t}\Delta(n,i)^{q+1}\,f\Big(\Delta^n_i X\big/\sqrt{\Delta(n,i)}\Big), \tag{3.9}$$

where $f$ is a function on $\mathbb R$ with at most polynomial growth and some smoothness properties: it should be $C^1$ for the law of large numbers, and $C^2$ for the CLT. Let $\rho_a$ denote the normal law $N(0,a^2)$ and $\rho_a(f)$ the integral of $f$ with respect to $\rho_a$. The law of large numbers reads as Theorem 3.1, with the limit $\int_0^t\rho_{\sigma_s}(f)\,a(q+1)_s\,ds$, and the CLTs are as Theorems 3.2 or 3.3: for example, for the former theorem, for the limiting conditional variance one should replace (3.6) by

$$\int_0^t\Big(\rho_{\sigma_s}\big(f^2\big)-\rho_{\sigma_s}(f)^2\Big)\,a(2q+2)_s\,ds.$$

Remark 3.9. We can even extend all the results to a $d$-dimensional underlying process $X$ of the form (1.1), with now $W$ a $d$-dimensional Brownian motion. Taking test functions $f$ on $\mathbb R^d$, we can consider the processes $V^n(f,q)$ of (3.9), and derive the same results as in the previous remark (now, if $a$ is a $d\times d$ matrix, $\rho_a$ denotes the $d$-dimensional law $N(0,aa^{\top})$). The proofs are exactly the same, with slightly more cumbersome notation.

However, this trivial extension requires the observation times $T(n,i)$ to be the same for all components of $X$: this is a fundamental restriction, since when the observations are irregularly spaced they also are typically non-synchronous: this is why we have not treated explicitly this case here.

3.2. Estimation of the integrated volatility and other powers

In practice, one wants to estimate the variable $V(p)_t$ at some time $t > 0$, mostly when $p=2$, but the case $p=4$ is also of interest. An estimator is $V^n(p)_t$, and Theorem 3.2 gives a central limit theorem for this estimator: it says that the normalized estimation error $\sqrt{r_n}\,(V^n(p)_t-V(p)_t)$ is asymptotically centered normal, conditionally on $\mathcal F^0$. Even for regular sampling schemes, for which $a(2)_s=1$, this result is not directly usable in practice for deriving confidence intervals for example, because $\int_0^t|\sigma_s|^{2p}\,ds$ is a priori unknown. But here things are worse because the process $a(2)$ is also unknown, and in fact we do not know $r_n$ either. However, we can “standardize” by considering the variable

$$T^n_t=\sqrt{\frac{m_{2p}}{(m_{2p}-m_p^2)\,V^n(2p,1)_t}}\;\big(V^n(p)_t-V(p)_t\big). \tag{3.10}$$

Then the following holds:

Theorem 3.10. Let $p\ge 2$. Assume (D(2)). Assume also (A) when $p=2$ and (B) otherwise. Then for any $t > 0$ such that $\int_0^t|\sigma_s|^{2p}a(2)_s\,ds > 0$ a.s., the sequence $T^n_t$ converges in law (and even $\mathcal F^0$-stably in law) to $N(0,1)$.

All ingredients in the definition of $T^n_t$ are known to the statistician, except of course the quantity $V(p)_t$ to be estimated. Therefore we can derive (asymptotic) confidence intervals for $V(p)_t$ in a straightforward way.
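Inverting the standardized statistic (3.10) gives the interval $V^n(p)_t\pm z\,\sqrt{(1-m_p^2/m_{2p})\,V^n(2p,1)_t}$, where $z$ is a standard normal quantile. The sketch below implements this under the assumptions of Theorem 3.10; the function names and the default two-sided 95% quantile are illustrative choices.

```python
import math
import numpy as np


def abs_normal_moment(p):
    """m_p = E|U|^p for a standard normal U."""
    return 2.0 ** (p / 2.0) * math.gamma((p + 1.0) / 2.0) / math.sqrt(math.pi)


def power_variation_ci(times, values, p, t, z=1.959963984540054):
    """Asymptotic confidence interval for V(p)_t based on (3.10).

    Returns (lower, upper). Neither r_n nor the process a(2) is needed.
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    n_t = int(np.searchsorted(times, t, side="right")) - 1
    lags = np.diff(times)[:n_t]
    abs_inc = np.abs(np.diff(values)[:n_t])
    v_p = np.sum(lags ** (1.0 - p / 2.0) * abs_inc ** p)       # V^n(p)_t
    v_2p_1 = np.sum(lags ** (2.0 - p) * abs_inc ** (2.0 * p))  # V^n(2p,1)_t
    m_p, m_2p = abs_normal_moment(p), abs_normal_moment(2 * p)
    half_width = z * math.sqrt((m_2p - m_p ** 2) / m_2p * v_2p_1)
    return float(v_p - half_width), float(v_p + half_width)
```

For $p=2$ the half-width is $z\sqrt{\tfrac23\sum_i|\Delta^n_i X|^4}$, the familiar quarticity-based interval, now valid for irregular schemes satisfying (D(2)).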

If one wants to estimate $V(p,q)_t$ one can also use Theorems 3.2 or 3.3, depending on the assumptions on the scheme. However, so far we have no estimators for the conditional variance $\int_0^t|\sigma_s|^{2p}\,a(2q+2)_s\,w(p,q)_s\,ds$ in the second theorem, hence we have no feasible statistics for deriving confidence intervals for $V(p,q)_t$ when $q > 0$, except under the strong assumption (D′(q+1)).

Theorem 3.10 should be compared with Theorem 4.1 of [1], which states exactly the same result, when the process $X$ has no drift and the volatility $\sigma_t$ is possibly random but independent of $W$: in this paper the authors consider only deterministic sampling schemes but, if their assumptions are not formally comparable to ours, they are in some sense significantly weaker: namely, they suppose that $\min_{i:\,i\le N^n_t}\Delta(n,i)^{2/3}/\pi^n_t\to\infty$ as $n\to\infty$. Theorem 3.1 of [16] is also the same result as above, for $p=2$ and again when there is no drift: in that paper the sampling scheme is somewhat similar to a mixed renewal scheme, and the precise assumptions are not directly comparable to ours, since basically the authors assume the existence of a CLT for the conditionally centered variables $\Delta(n,i)$. Note also that this theorem is proved when $p=2$ in [15] for deterministic schemes which satisfy an assumption slightly stronger than (D(2)) plus (D(1)).

An interesting – and crucial – feature of our result, as well as of the results of [1], is that the properties of the observation scheme do not show explicitly in the result itself, and in particular the knowledge of the process $a(2)$ and even of the rates $r_n$ is not necessary to apply it. This is a good thing because those are generally unknown, whereas it is also dangerous because one might be tempted to use the property that $T^n_t$ is (approximately) $N(0,1)$ without checking that the assumptions on the sampling scheme are satisfied, here as well as in [1]. When they are satisfied, and although the rates $r_n$ do not explicitly show up, these rates still govern the “true” speed of convergence.

Once more, $r_n$ is unknown, but $N^n_t$ is of course known. Then as soon as we have also (D(0)) (for example when (E(2)) holds), then $N^n_t/r_n\xrightarrow{\ \mathbb P\ }\int_0^t a(0)_s\,ds$, and so the actual rate of convergence for the estimators is also $1/\sqrt{N^n_t}$, as it should be.

3.3. Some remarks about optimality

In this part we consider only the question of estimating the integrated volatility $V(2)_t$. The problem of asymptotic efficiency for estimators of $V(2)_1$ is unsolved yet, as far as we know, and even the definition of asymptotic efficiency is not quite clear, when the volatility is genuinely random, and even for regular sampling.

However, one can examine the asymptotic behavior of our estimators in the very special situation where $X_t=\sigma W_t$ and $\sigma$ is a constant, so $V(2)_t=\sigma^2t$. In this case we have a parametric model, and a very simple one indeed, and the MLE for estimating $\sigma^2t$ is asymptotically efficient. For regular sampling the MLE is $\frac{t/\Delta_n}{[t/\Delta_n]}V^n(2)_t$, so obviously the realized quadratic variation $V^n(2)_t$ is also asymptotically efficient.

When the sampling is non-random but not regular, the MLE for estimating $\sigma^2t$ is

$$V_n=\frac{t}{N^n_t}\sum_{i=1}^{N^n_t}\frac{1}{\Delta(n,i)}\,|\Delta^n_i X|^2.$$

This is again asymptotically efficient, and for each $n$ the variance of $\sqrt{N^n_t}\,(V_n-\sigma^2t)$ is $2\sigma^4t^2$. Now, if further (D(2)) holds, the asymptotic variance of $\sqrt{r_n}\,(V^n(2,0)_t-\sigma^2t)$ is $2\sigma^4\int_0^t a(2)_s\,ds$, and if (D(0)) also holds we have $N^n_t/r_n\to\int_0^t a(0)_s\,ds$. Therefore, under both (D(0)) and (D(2)), the two (normalized) asymptotic variances $\Sigma_t$ and $\Sigma'_t$ of $V^n(2)_t$ and $V_n$ satisfy:

$$\Sigma_t=\alpha_t\,\Sigma'_t,\qquad\text{where}\quad\alpha_t=\frac{1}{t^2}\Big(\int_0^t a(2)_s\,ds\Big)\Big(\int_0^t a(0)_s\,ds\Big).$$

If we use (2.5) with $q=0$, $p=1$ and $q'=2$ and pass to the limit, we see that $\alpha_t\ge 1$, as it should be because $V_n$ is asymptotically efficient. It may happen that $\alpha_t=1$, of course, but this is equivalent to saying that $A(0)^n_tA(2)^n_t-(A(1)^n_t)^2\to 0$, and the equality $A(0)^n_tA(2)^n_t=(A(1)^n_t)^2$ implies that the $\Delta(n,i)$’s for $i\le N^n_t$ are all equal (by the Cauchy–Schwarz inequality): therefore the realized volatility $V^n(2)_t$ is asymptotically efficient only when the sampling scheme is “asymptotically” a regular sampling.
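A finite-sample analogue of $\alpha_t$ is the ratio $A(0)^n_tA(2)^n_t/(A(1)^n_t)^2=N^n_t\sum_i\Delta(n,i)^2/(\sum_i\Delta(n,i))^2$, in which $r_n$ cancels. The sketch below computes it from the observed lags; treating this ratio as a proxy for $\alpha_t$ is an assumption of the example (it relies on $A(1)^n_t$ being close to $t$), and the function name is hypothetical.

```python
import numpy as np


def efficiency_ratio(lags):
    """Return N * sum(lags^2) / (sum(lags))^2, which is >= 1 by Cauchy-Schwarz."""
    lags = np.asarray(lags, dtype=float)
    return float(len(lags) * np.sum(lags ** 2) / np.sum(lags) ** 2)


ratio_regular = efficiency_ratio([0.25, 0.25, 0.25, 0.25])  # equal lags
ratio_irregular = efficiency_ratio([0.1, 0.3, 0.6])         # unequal lags
```

The ratio equals 1 exactly when all lags coincide, and grows with the dispersion of the lags, which quantifies the efficiency loss of the realized volatility relative to the MLE in the constant-volatility model.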

At this stage, one might wonder why we do not use $V_n$ instead of $V^n(2)_t$ in general. This is because, if we assume for example (D(0)), the sequence $V_n$ converges in probability to the variable

$$t\,\frac{\int_0^t\sigma_s^2\,a(0)_s\,ds}{\int_0^t a(0)_s\,ds},$$

which is different from $\int_0^t\sigma_s^2\,ds$ unless either $\sigma_s$ or $a(0)_s$ does not depend on time.

4. More about sampling schemes

The aim of this section is to prove Lemmas 2.1, 2.2 and 2.3, and it also contains a few complementary results, useful for the proofs of the main theorems. Below, and in all the paper, $K$ denotes a constant varying from line to line and may depend on the characteristics of the process or of the sampling scheme (we write $K_p$ if it depends on an additional parameter $p$).

Proof of Lemma 2.1. Since (D(1)) always holds under (C) and (1.5), the final claim follows from the estimate (2.5) and from classical results on the tightness and convergence of increasing processes.

As for (2.5), this follows from Hölder’s inequality as such:

$$A(p)^n_t-A(p)^n_s=r_n^{p-1}\sum_{i=N^n_s+1}^{N^n_t}\Delta(n,i)^p\le\Big(r_n^{q-1}\sum_{i=N^n_s+1}^{N^n_t}\Delta(n,i)^q\Big)^{(q'-p)/(q'-q)}\Big(r_n^{q'-1}\sum_{i=N^n_s+1}^{N^n_t}\Delta(n,i)^{q'}\Big)^{(p-q)/(q'-q)}.$$
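The Hölder step can be spelled out as follows. The auxiliary exponent $\theta$ is introduced here for convenience only and is not notation from the paper.

```latex
\begin{align*}
&\text{Set } \theta=\frac{q'-p}{q'-q}\in(0,1), \text{ so that } 1-\theta=\frac{p-q}{q'-q}
  \text{ and } p=\theta q+(1-\theta)q'. \\
&\text{H\"older's inequality with conjugate exponents } 1/\theta \text{ and } 1/(1-\theta) \text{ gives} \\
&\qquad \sum_i \Delta(n,i)^p
  = \sum_i \big(\Delta(n,i)^{q}\big)^{\theta}\big(\Delta(n,i)^{q'}\big)^{1-\theta}
  \le \Big(\sum_i \Delta(n,i)^{q}\Big)^{\theta}\Big(\sum_i \Delta(n,i)^{q'}\Big)^{1-\theta}. \\
&\text{Since } p-1=\theta(q-1)+(1-\theta)(q'-1), \text{ we also have }
  r_n^{p-1}=\big(r_n^{q-1}\big)^{\theta}\big(r_n^{q'-1}\big)^{1-\theta},
\end{align*}
```

and multiplying the two relations yields the displayed bound, with all sums taken over $N^n_s < i\le N^n_t$.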

Before proving Lemma2.2, we introduce strengthened versions of Assumptions(D(q))and(E(q)), which we will also use later on:

Assumption (SD(q)). We have (D(q)), and $\int_0^t a(q)_s\,ds$ is bounded for each $t$, and the convergence in (2.4) takes place in $L^1$.

Assumption (SE(q)). We have (E(q)), and for some constant $L\ge 1$ we have $G(q)^n\le L$ and $1/G(1)^n\le L$ (hence $G(q)\le L$ and $1/G(1)\le L$ as well).

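Lemma 4.1. (a) Let $1\le q_1 < \cdots < q_r$ and assume (D($q_j$)) for $j=1,\dots,r$. For all $t > 0$, $k\ge 1$, one can find a scheme $(T_{k,t}(n,i))$ satisfying (SD($q_j$)) for $j=1,\dots,r$ (with the processes $a_{k,t}(q_j)$), and subsets $\Omega_{k,t}$ and $\Omega_{k,n,t}$ of $\Omega$, such that

$$\mathbb P(\Omega_{k,t})\ge 1-1/k,\qquad\liminf_n\mathbb P(\Omega_{k,n,t})\ge 1-1/k,\qquad\omega\in\Omega_{k,n,t}\;\Rightarrow\;[0,t]\cap\{T(n,i)(\omega): i\ge 1\}=[0,t]\cap\{T_{k,t}(n,i)(\omega): i\ge 1\} \tag{4.1}$$

and, for $j=1,\dots,r$,

$$s\le t,\ \omega\in\Omega_{k,t}\;\Rightarrow\;a(q_j)(\omega)_s=a_{k,t}(q_j)(\omega)_s. \tag{4.2}$$

(b) Assume (E(q)) for some $q > 1$. For all $t > 0$, $k\ge 1$, one can find a scheme $(T_{k,t}(n,i))$ satisfying (SE(q)) (with the processes $G_{k,t}(p)$), and subsets $\Omega_{k,t}$ and $\Omega_{k,n,t}$ of $\Omega$, such that we have (4.1) and, for $p\in[0,q]$,

$$s\le t,\ \omega\in\Omega_{k,t}\;\Rightarrow\;G(p)(\omega)_s=G_{k,t}(p)(\omega)_s. \tag{4.3}$$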

Proof. (a) We may of course assume that $r_n\ge 1$ for all $n$, and also that $q_r > 1$ because (1.5) is enough to imply (SD(1)). Fix $t$ and $k$. We can find $a > 1$ such that the set $\Omega_{k,t}=\bigcap_{1\le j\le r}\{\int_0^{t+1}a(q_j)_s\,ds\le a-1\}$ satisfies $\mathbb P(\Omega_{k,t})\ge 1-1/k$, and thus by (D($q_j$)) the sets $\Omega_{k,n,t}=\bigcap_{1\le j\le r}\{A(q_j)^n_{t+1}\le a\}$ satisfy $\liminf_n\mathbb P(\Omega_{k,n,t})\ge 1-1/k$.

Then we define the sampling scheme $(T_{k,t}(n,i))$ in the following way: we set $\alpha_n=(a\,r_n)^{1/q_r}/r_n$, which goes to 0; when $\alpha_n\ge 1$ we simply take $\Delta_{k,t}(n,i)=\Delta(n,i)$ (so the observation times are unchanged), and when $\alpha_n < 1$, that is for all $n$ large enough, we put

$$\Delta_{k,t}(n,i)=\begin{cases}\Delta(n,i), & \text{if } \Delta(n,j)\le\alpha_n \text{ for all } j=1,\dots,i,\\ 1/r_n, & \text{otherwise.}\end{cases}$$

We associate the processes $A_{k,t}(q)^n$ by (2.2). It is obvious that the scheme $(T_{k,t}(n,i))$ satisfies (C) and (1.5). On the set $\Omega_{k,n,t}$ we have $\Delta(n,i)\le\alpha_n$ for all $i$ with $T(n,i)\le t+1$, hence $\Delta_{k,t}(n,i)=\Delta(n,i)$ for those $i$’s, and we conclude the last part of (4.1). Moreover $A_{k,t}(q_j)^n_s=A(q_j)^n_s$ as long as $\Delta(n,i)\le\alpha_n$ for all $i$ with $T(n,i)\le s$, and afterwards $A_{k,t}(q_j)^n_s$ increases in $s$ like $[s\,r_n]/r_n$, so obviously the new scheme satisfies (D($q_j$)) with a function $a_{k,t}(q_j)$ which coincides with $a(q_j)$ on $[0,t]$ on the set $\Omega_{k,t}$, giving (4.2). Finally, by construction $A_{k,t}(q_j)^n_t\le a+t$ as soon as $\alpha_n < 1$, so in fact this new scheme satisfies (SD($q_j$)) for $j=1,\dots,r$.
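The modification of the lags used in part (a) can be sketched on a finite array. The function name `localized_lags` and the array interface are illustrative; the constants `a` and `q_r` play the roles of $a$ and $q_r$ in the proof.

```python
import numpy as np


def localized_lags(lags, r_n, a, q_r):
    """Return the modified lags Delta_{k,t}(n,i) from the proof of Lemma 4.1(a).

    With alpha_n = (a * r_n)^(1/q_r) / r_n: the lags are unchanged if
    alpha_n >= 1; otherwise lag i is kept while all lags 1..i are <= alpha_n
    and is replaced by 1/r_n from the first exceedance onward.
    """
    lags = np.asarray(lags, dtype=float)
    alpha_n = (a * r_n) ** (1.0 / q_r) / r_n
    if alpha_n >= 1.0:
        return lags.copy()
    # keep[i] is True while every lag up to and including index i is <= alpha_n
    keep = np.logical_and.accumulate(lags <= alpha_n)
    return np.where(keep, lags, 1.0 / r_n)


modified = localized_lags([0.1, 0.1, 0.5, 0.1], r_n=10.0, a=2.0, q_r=2.0)
```

In this example $\alpha_n\approx 0.447$, so the third lag triggers the switch and every lag from that index on is replaced by $1/r_n=0.1$, which keeps the sums $A_{k,t}(q_j)^n$ bounded.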

(b) Again, we fix $t > 0$ and $k\ge 1$. Since $G(q)$ is càdlàg $(\mathcal F^0_t)$-adapted, there is $a_k > 1$ such that $R_k=(t+1)\wedge\inf(s: G(q)_s+1/G(1)_s\ge a_k)$ is an $(\mathcal F^0_t)$-stopping time and $\Omega_{k,t}=\{R_k > t\}$ satisfies $\mathbb P(\Omega_{k,t})\ge 1-1/k$. Next,

$$R^n_k=R_k\wedge\inf\Big(s: \big|G(q)^n_s-G(q)_s\big|+\big|G(1)^n_s-G(1)_s\big|\ge\frac{1}{2a_k}\Big)$$

is an $(\mathcal F_t)$-stopping time and $\Omega_{k,n,t}=\{R^n_k > t\}$ satisfies the condition in (4.1) by (E(q)). Then the new scheme $(T_{k,t}(n,i))$ is defined by

$$\Delta_{k,t}(n,i)=\begin{cases}\Delta(n,i), & \text{if } T(n,i-1) < R^n_k,\\ 1/r_n, & \text{otherwise.}\end{cases}$$

(Note that it depends on $t$, because $R^n_k$ implicitly depends on $t$.) It is obvious that the scheme $(T_{k,t}(n,i))$ satisfies (1.5), and (C) because all $R^n_k$ are $(\mathcal F_t)$-stopping times, and the last part of (4.1) is also obvious.

Now, $r_n^p\,\mathbb E(\Delta_{k,t}(n,i)^p\mid\mathcal F_{T_{k,t}(n,i-1)})$ is equal to $r_n^p\,\mathbb E(\Delta(n,i)^p\mid\mathcal F_{T(n,i-1)})$ on the set $\{T(n,i-1) < R^n_k\}$, and to 1 otherwise. Therefore we can take, as the $(\mathcal F_t)$-optional process $G_{k,t}(p)^n$ satisfying (2.7) relative to the new scheme, the process which equals $G(p)^n$ on $[0,R^n_k)$ and 1 on $[R^n_k,\infty)$. Then clearly $G_{k,t}(q)^n\le a_k+1$ and $1/G_{k,t}(1)^n\le 2a_k$. Moreover $\mathbb P(R^n_k=R_k)\to 1$, so the new scheme satisfies (2.9) for all $p\in[1,q]$, with the limit $G_{k,t}(p)$ equal to $G(p)$ on the interval $[0,R_k)$ and to 1 on $[R_k,\infty)$. Therefore the new scheme satisfies (SE(q)), and (b) is proved.

Proof of Lemma 2.2. A standard localization argument, based upon the previous lemma, yields that it is enough to prove the result when (SE(q)) holds instead of (E(q)). So, in view of (2.8), we assume $G(p)^n\le L$ for all $p\in[0,q]$ and $1/G(1)^n\le L$ for some constant $L\ge 1$. We can also assume $r_n\ge 1$.

(a) In a first step, we show that

$$\mathbb E\big(N^n_t+1\big)\le K_t\,r_n, \tag{4.4}$$

where $K_t$ is a constant depending on $t$. We put $C_t=(t+1)\vee(2L^2)^{1/(q-1)}$. We also define a new sampling scheme by $\Delta'(n,i)=\Delta(n,i)\wedge C_t$, with the associated variables $N'^n_t$ and $T'(n,i)$ and $G'(p)^n_{T'(n,i-1)}=$
