HAL Id: hal-02676701
https://hal.inrae.fr/hal-02676701
Submitted on 31 May 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Conditional Likelihood Estimators for Hidden Markov Models and Stochastic Volatility Models
Valentine Genon-Catalot, Thierry Jeantheau, Catherine Laredo
To cite this version:
Valentine Genon-Catalot, Thierry Jeantheau, Catherine Laredo. Conditional Likelihood Estimators
for Hidden Markov Models and Stochastic Volatility Models. Scandinavian Journal of Statistics,
Wiley, 2003, 30, pp. 297-316. hal-02676701
Conditional Likelihood Estimators for Hidden Markov Models
and Stochastic Volatility Models
VALENTINE GENON-CATALOT Université de Marne-la-Vallée
THIERRY JEANTHEAU Université de Marne-la-Vallée
CATHERINE LAREDO INRA-Jouy-en-Josas
ABSTRACT. This paper develops a new contrast process for parametric inference of general hidden Markov models, when the hidden chain has a non-compact state space. This contrast is based on the conditional likelihood approach, often used for ARCH-type models. We prove the strong consistency of the conditional likelihood estimators under appropriate conditions.
The method is applied to the Kalman filter (for which this contrast and the exact likelihood lead to asymptotically equivalent estimators) and to discretely observed stochastic volatility models.
Key words: conditional likelihood, diffusion processes, discrete time observations, hidden Markov models, parametric inference, stochastic volatility
1. Introduction
Parametric inference for hidden Markov models (HMMs) has been widely investigated, especially in the last decade. These models are discrete-time stochastic processes including classical time series models and many other non-linear non-Gaussian models. The observed process $(Z_n)$ is modelled via an unobserved Markov chain $(U_n)$ such that, conditionally on $(U_n)$, the $Z_n$'s are independent and the distribution of $Z_n$ depends only on $U_n$. When studying the statistical properties of HMMs, a difficulty arises since the exact likelihood cannot be explicitly computed. As a consequence, a lot of papers have been concerned with approximations by means of numerical and simulation techniques (see, e.g. Gouriéroux et al., 1993; Durbin & Koopman, 1998; Pitt & Shephard, 1999; Kim et al., 1998; Del Moral & Miclo, 2000; Del Moral et al., 2001).
The theoretical study of the exact maximum likelihood estimator (m.l.e.) is a difficult problem and has only been investigated in the following cases. Leroux (1992) has proved the consistency when the unobserved Markov chain has a finite state space. Asymptotic normality is proved in Bickel et al. (1998). The extension of these properties to a compact state space for the hidden chain can be found in Jensen & Petersen (1999) and Douc & Matias (2001).
In previous papers (see Genon-Catalot et al., 1998, 1999, 2000a), we have investigated some statistical properties of discretely observed stochastic volatility (SV) models. In particular, when the sampling interval is fixed, the SV models are HMMs for which the hidden chain has a non-compact state space. Using ergodicity and mixing properties, we have built empirical moment estimators of unknown parameters. Other types of empirical estimators are given in Gallant et al. (1997) using the efficient method of moments, or by Sørensen (2000) considering the class of prediction-based estimating functions.
In this paper, we propose a new contrast for general HMMs, which is theoretical in nature but seems close to the exact likelihood. The contrast is based on the conditional likelihood method, which is the estimating method used in the field of ARCH-type models (see Jeantheau, 1998). The method is applied to examples. Particular attention is paid throughout this paper to the well-known Kalman filter, which illuminates our approach. It is a special case of HMM with non-compact state space for the hidden chain. All computations are explicit and detailed herein. The minimum contrast estimator based on the conditional likelihood is asymptotically equivalent to the exact m.l.e. In the case of the SV models, when the unobserved volatility is a positive ergodic diffusion, the method is also applicable. It requires that the state space of the hidden diffusion is open, bounded and bounded away from zero. Contrary to moment methods, only the finiteness of the first moment of the stationary distribution is needed.
In Section 2, we recall definitions, ergodic properties and expressions for the likelihood (Propositions 1 and 2) for HMMs. We define and study in Section 3 the conditional likelihood (Theorem 1 and Definition 2) and build the associated contrasts. Then, we state a general theorem of consistency to study the related minimum contrast estimators (Theorem 2). We apply this method to examples, especially to the Kalman filter and to the GARCH(1,1) model. Then, we specialize these results to SV models in Section 4. We consider the conditional likelihood (Proposition 4) and study its properties (Proposition 5). Finally, we consider the case of a mean-reverting hidden diffusion. The proof of Theorem 2 is given in the appendix.
2. Hidden Markov models
2.1. Definition and ergodic properties
The formal definition of an HMM can be taken from Leroux (1992) or Bickel & Ritov (1996).
Definition 1. A stochastic process $(Z_n, n \ge 1)$, with state space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$, is a hidden Markov model if:
(i) (Hidden chain) We are given (but do not observe) a time-homogeneous Markov chain $U_1, U_2, \ldots, U_n, \ldots$ with state space $(\mathcal{U}, \mathcal{B}(\mathcal{U}))$.
(ii) (Conditional independence) For all $n$, given $(U_1, U_2, \ldots, U_n)$, the $Z_i$, $i = 1, \ldots, n$, are conditionally independent, and the conditional distribution of $Z_i$ only depends on $U_i$.
(iii) (Stationarity) The conditional distribution of $Z_i$ given $U_i = u$ does not depend on $i$.
Above, $\mathcal{Z}$ and $\mathcal{U}$ are general Polish spaces equipped with their Borel sigma-fields. Therefore, this definition extends the one given in Leroux (1992) or Bickel & Ritov (1996), since we do not require a finite state space for the hidden chain.
We give now some examples of HMMs. They are all included in the following general framework. We set
$$Z_n = G(U_n, \varepsilon_n), \qquad (1)$$
where $(\varepsilon_n)_{n \in \mathbb{N}}$ is a sequence of i.i.d. random variables, $(U_n)_{n \in \mathbb{N}}$ is a Markov chain with state space $\mathcal{U}$, and these two sequences are independent.
Example 1 (Kalman filter). The above class includes the well-known discrete Kalman filter
$$Z_n = U_n + \varepsilon_n, \qquad (2)$$
where $(U_n)$ is a stationary real AR(1) Gaussian process and the $(\varepsilon_n)$ are i.i.d. $\mathcal{N}(0, c^2)$.
Example 2 (Noisy observations of a discretely observed Markov process). In (1), take $U_n = V_{n\Delta}$, where $(V_t)_{t \in \mathbb{R}^+}$ is any Markov process independent of the sequence $(\varepsilon_n)_{n \in \mathbb{N}}$.
Example 3 (Stochastic volatility models). These models are given in continuous time by a two-dimensional process $(Y_t, V_t)$ with
$$dY_t = \sigma_t\, dB_t,$$
and $V_t = \sigma_t^2$ (the so-called volatility) is an unobserved Markov process independent of the Brownian motion $(B_t)$. Then, a discrete observation with sampling interval $\Delta$ is taken. We set
$$Z_n = \frac{1}{\sqrt{\Delta}} \int_{(n-1)\Delta}^{n\Delta} \sigma_s\, dB_s = \frac{1}{\sqrt{\Delta}}\,(Y_{n\Delta} - Y_{(n-1)\Delta}). \qquad (3)$$
Conditionally on $(V_s, s \ge 0)$, the random variables $(Z_n)$ are independent and $Z_n$ has distribution $\mathcal{N}(0, \bar{V}_n)$ with
$$\bar{V}_n = \frac{1}{\Delta} \int_{(n-1)\Delta}^{n\Delta} V_s\, ds. \qquad (4)$$
Noting that $U_n = (\bar{V}_n, V_{n\Delta})$ is Markov, we set, in accordance with (1), $Z_n = G(U_n, \varepsilon_n) = \bar{V}_n^{1/2}\varepsilon_n$.
Such models were first proposed by Hull & White (1987), with $(V_t)$ a diffusion process. When $(V_t)$ is an ergodic diffusion, it is proved in Genon-Catalot et al. (2000a) that $(Z_n)$ is an HMM. In Barndorff-Nielsen & Shephard (2001), the model is analogous, with $(V_t)$ an Ornstein-Uhlenbeck Lévy process.
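As a numerical illustration of (3)-(4), the following sketch simulates the observations $Z_n$ for a hypothetical mean-reverting square-root choice of the volatility diffusion; the drift and diffusion coefficients below are our own illustrative assumptions, not a model studied in the paper, and the integrals are approximated on an Euler grid.

```python
import math, random

random.seed(0)

def simulate_sv(n=200, delta=1.0, m=50):
    """Simulate Z_1..Z_n of (3) with m Euler sub-steps per sampling interval,
    for an assumed volatility diffusion dV_t = kappa (mu - V_t) dt + xi sqrt(V_t) dW_t
    (an illustrative choice, not the paper's model)."""
    kappa, mu, xi = 1.0, 0.5, 0.3     # hypothetical diffusion parameters
    h = delta / m                     # Euler step
    v = mu                            # start V at its stationary mean
    z = []
    for _ in range(n):
        y_incr = 0.0                  # approximates int sigma_s dB_s over the interval
        for _ in range(m):
            v_pos = max(v, 1e-12)     # guard the square root
            y_incr += math.sqrt(v_pos) * random.gauss(0.0, math.sqrt(h))
            v += kappa * (mu - v) * h + xi * math.sqrt(v_pos) * random.gauss(0.0, math.sqrt(h))
        z.append(y_incr / math.sqrt(delta))   # Z_n = Delta^{-1/2}(Y_{nDelta} - Y_{(n-1)Delta})
    return z

z = simulate_sv()
print(len(z), sum(x * x for x in z) / len(z))  # E(Z_n^2) = E(Vbar_n), near mu here
```

Note that, in accordance with (4), the conditional variance of each simulated $Z_n$ is the time average of $V$ over the corresponding sampling interval.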
An HMM has the following properties.
Proposition 1
(1) The process $((U_n, Z_n), n \ge 1)$ is a time-homogeneous Markov chain.
(2) If the hidden chain $(U_n, n \ge 1)$ is strictly stationary, so is $((U_n, Z_n), n \ge 1)$.
(3) If, moreover, the hidden chain $(U_n, n \ge 1)$ is ergodic, so is $((U_n, Z_n), n \ge 1)$.
(4) If, moreover, $(U_n, n \ge 1)$ is $\alpha$-mixing, then $((U_n, Z_n), n \ge 1)$ is also $\alpha$-mixing, and
$$\alpha_Z(n) \le \alpha_{(U,Z)}(n) \le \alpha_U(n).$$
The first two points are straightforward, the third is proved in Leroux (1992), and the mixing property is proved in Genon-Catalot et al. (2000a, b) and Sørensen (1999). This mixing property holds for the previous examples.
Example 1 (continued) (Kalman filter). We set
$$U_n = aU_{n-1} + \eta_n, \qquad (5)$$
where the $(\eta_n)$ are i.i.d. $\mathcal{N}(0, b^2)$. When $|a| < 1$ and $U_n$ has law $\mathcal{N}(0, s^2 = b^2/(1 - a^2))$, $(U_n)$ is $\alpha$-mixing.
Examples 2 and 3 (continued). In both cases, whenever $(V_t)$ is a diffusion, we have
$$\alpha_Z(n) \le \alpha_V((n-1)\Delta).$$
For $(V_t)$ a strictly stationary diffusion process, it is well known that $\alpha_V(t) \to 0$ as $t \to +\infty$. For details (such as the rate of convergence of the mixing coefficient), we refer to Genon-Catalot et al. (2000a) and the references therein.
The above properties have interesting statistical consequences. When the unobserved chain depends on unknown parameters, the ergodicity and mixing properties of $(Z_n)$ can be used to derive the consistency and asymptotic normality of empirical estimators of the form $\frac{1}{n}\sum_{i=0}^{n-d} \varphi(Z_{i+1}, \ldots, Z_{i+d})$. This leads to various procedures that have already been applied to these models [moment method, GMM (Hansen, 1982), EMM (Tauchen et al., 1996; Gallant et al., 1997) or prediction-based estimating equations (Sørensen, 2000)]. These methods yield estimators with rate $\sqrt{n}$. However, to be accurate, they require high moment conditions that may not be fulfilled, and often lead to biased estimators and/or to cumbersome computations. Another central question lies in the fact that they may be very far from the exact likelihood method.
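To make the empirical-estimator idea concrete, here is a minimal sketch for the Kalman model of Example 1: since $\mathrm{Cov}(Z_n, Z_{n+k}) = a^k s^2$ for $k \ge 1$, the ratio of two consecutive empirical autocovariances, each of the form $\frac{1}{n}\sum_i \varphi(Z_i, Z_{i+k})$, estimates $a$. The parameter values below are illustrative.

```python
import math, random

random.seed(1)

# Simulate the Kalman model (2): Z_n = U_n + eps_n, U_n = a U_{n-1} + eta_n.
a0, b0, c0 = 0.7, 1.0, 0.5           # illustrative true parameter values
n = 20000
s2 = b0 * b0 / (1.0 - a0 * a0)       # stationary variance of U
u = random.gauss(0.0, math.sqrt(s2))
z = []
for _ in range(n):
    u = a0 * u + random.gauss(0.0, b0)
    z.append(u + random.gauss(0.0, c0))

def autocov(z, lag):
    """Empirical (1/n) sum of phi(Z_i, Z_{i+lag}) with phi(x, y) = x*y (E Z = 0)."""
    return sum(z[i] * z[i + lag] for i in range(len(z) - lag)) / (len(z) - lag)

# Here Cov(Z_n, Z_{n+k}) = a^k s^2 for k >= 1, so the ratio of two consecutive
# empirical autocovariances is a consistent moment estimator of a.
a_hat = autocov(z, 2) / autocov(z, 1)
print(a_hat)   # close to a0 = 0.7 for large n
```

The $\sqrt{n}$ rate mentioned above applies to such estimators; the bias and moment issues discussed in the text appear in heavier-tailed models than this Gaussian one.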
2.2. The exact likelihood
We recall here general properties of the likelihood of an HMM. Such likelihoods have already been studied, but mainly under the assumption that the state space of the hidden chain is finite (see, e.g. Leroux, 1992; Bickel & Ritov, 1996). Since this is not the case in the examples above, we focus on the properties which hold without this assumption.
Consider a general HMM as in Definition 1 and let us give some more notations and assumptions.
• (H1) The conditional distribution of $Z_n$ given $U_n = u$ is given by a density $f(z/u)$ with respect to a dominating measure $\mu(dz)$ on $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$.
• (H2) The transition operator $P_\theta$ of the hidden chain $(U_i)$ depends on an unknown parameter $\theta \in \Theta \subset \mathbb{R}^p$, $p \ge 1$, and the transition probability is specified by a density $p(\theta; u, t)$ with respect to a dominating measure $\nu(dt)$ on $(\mathcal{U}, \mathcal{B}(\mathcal{U}))$.
• (H3) For all $\theta$, the transition $P_\theta$ admits a stationary distribution having density with respect to the same dominating measure, denoted by $g(\theta, u)$. The initial variable $U_1$ has this stationary distribution.
• (H4) For all $\theta$, the chain $(U_n)$ with marginal distribution $g(\theta, u)\nu(du)$ is ergodic.
Under these assumptions, the process $(U_n, Z_n)$ is strictly stationary and ergodic. Let us point out that we do not introduce unknown parameters in the density of $Z_n$ given $U_n = u$, since, in our examples, all unknown parameters will come from the hidden chain.
With these notations, the distribution of the initial variable $(U_1, Z_1)$ of the chain $(U_n, Z_n)$ has density $g(\theta, u)f(z/u)$ with respect to $\nu(du)\mu(dz)$; the transition density is given by
$$p(\theta; u, t)f(z/t). \qquad (6)$$
Now, if we denote by $p_n(\theta; z_1, \ldots, z_n)$ the density of $(Z_1, \ldots, Z_n)$ with respect to $d\mu(z_1) \cdots d\mu(z_n)$, we have
$$p_1(\theta; z_1) = \int_{\mathcal{U}} g(\theta, u)f(z_1/u)\, d\nu(u), \qquad (7)$$
$$p_n(\theta; z_1, \ldots, z_n) = \int_{\mathcal{U}^n} g(\theta, u_1)f(z_1/u_1) \prod_{i=2}^{n} p(\theta; u_{i-1}, u_i)f(z_i/u_i)\, d\nu(u_1) \cdots d\nu(u_n). \qquad (8)$$
In formula (8), the function under the integral sign is the density of $(U_1, \ldots, U_n, Z_1, \ldots, Z_n)$.
A more tractable expression for $p_n(\theta; z_1, \ldots, z_n)$ is obtained from the classical formula:
$$p_n(\theta; z_1, \ldots, z_n) = p_1(\theta; z_1) \prod_{i=2}^{n} t_i(\theta; z_i/z_{i-1}, \ldots, z_1), \qquad (9)$$
where $t_i(\theta; z_i/z_{i-1}, \ldots, z_1)$ is the conditional density of $Z_i$ given $Z_{i-1} = z_{i-1}, \ldots, Z_1 = z_1$. The interest of this representation appears below.
Proposition 2
(1) For all $i \ge 2$, we have [see (9)]
$$t_i(\theta; z_i/z_{i-1}, \ldots, z_1) = \int_{\mathcal{U}} g_i(\theta; u_i/z_{i-1}, \ldots, z_1)f(z_i/u_i)\, d\nu(u_i), \qquad (10)$$
where $g_i(\theta; u_i/z_{i-1}, \ldots, z_1)$ is the conditional density of $U_i$ given $Z_{i-1} = z_{i-1}, \ldots, Z_1 = z_1$.
(2) If $\hat{g}_i(\theta; u_i/z_i, \ldots, z_1)$ denotes the conditional density of $U_i$ given $Z_i = z_i, \ldots, Z_1 = z_1$, then it is given by
$$\hat{g}_i(\theta; u_i/z_i, \ldots, z_1) = \frac{g_i(\theta; u_i/z_{i-1}, \ldots, z_1)f(z_i/u_i)}{t_i(\theta; z_i/z_{i-1}, \ldots, z_1)}.$$
(3) For all $i \ge 1$,
$$g_{i+1}(\theta; u_{i+1}/z_i, \ldots, z_1) = \int_{\mathcal{U}} \hat{g}_i(\theta; u_i/z_i, \ldots, z_1)p(\theta; u_i, u_{i+1})\, d\nu(u_i). \qquad (11)$$
The result of Proposition 2 is standard in the field of filtering theory (see, e.g. Liptser & Shiryaev, 1978, and for details, Genon-Catalot et al., 2000b). It leads to a recursive expression of the exact likelihood function. Now, we are faced with two kinds of problems, a numerical one and a theoretical one.
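The recursions of Proposition 2 can be turned into a numerical scheme by discretizing the hidden state space on a finite grid, so that the integrals in (10) and (11) become sums. The sketch below does this for the AR(1)-plus-noise chain of Example 1, for which exact Gaussian computations also exist, so the grid filter is purely illustrative; all parameter values are assumptions of ours.

```python
import math, random

random.seed(2)

a, b, c = 0.7, 1.0, 0.5                        # illustrative AR(1)-plus-noise model
grid = [-6.0 + 12.0 * j / 120 for j in range(121)]   # grid over the U-space
dv = grid[1] - grid[0]

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# simulate observations Z_i = U_i + eps_i from the stationary chain
s2 = b * b / (1.0 - a * a)
u = random.gauss(0.0, math.sqrt(s2))
zs = []
for _ in range(50):
    u = a * u + random.gauss(0.0, b)
    zs.append(u + random.gauss(0.0, c))

g = [norm_pdf(v, 0.0, s2) for v in grid]       # stationary density of U_1
loglik = 0.0
for z in zs:
    f = [norm_pdf(z, v, c * c) for v in grid]  # f(z/u), here N(u, c^2)
    t = sum(gi * fi for gi, fi in zip(g, f)) * dv   # predictive density, eq. (10)
    loglik += math.log(t)
    g_hat = [gi * fi / t for gi, fi in zip(g, f)]   # update step, Proposition 2 (2)
    # prediction step (11): g_{i+1}(u') = int g_hat(u) p(u, u') du, with p(u, .) = N(a u, b^2)
    g = [sum(gh * norm_pdf(v, a * ug, b * b) for gh, ug in zip(g_hat, grid)) * dv
         for v in grid]

print(loglik)  # exact log-likelihood of the sample via the t_i of (9), up to grid error
```

Accumulating the logarithms of the $t_i$ reproduces the factorization (9) of the exact likelihood, which is precisely what the approximation algorithms cited below try to compute in less tractable models.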
The numerical computation of the exact m.l.e. of $\theta$ raises difficulties and there is a large amount of literature devoted to approximation algorithms (Markov chain Monte Carlo (MCMC) methods, the EM algorithm, particle filter methods, etc.). In particular, MCMC methods have been recently considered in econometrics and applied to partially observed diffusions (see, e.g. Eberlein et al., 2001).
From a theoretical point of view, for HMMs with a finite state space for the hidden chain, results on the asymptotic behaviour (consistency) of the m.l.e. are given in Leroux (1992), Francq & Roussignol (1997) and Bickel & Ritov (1996). Extensions to the case of a compact state space for the hidden chain have been recently investigated (Jensen & Petersen, 1999;
Douc & Matias, 2001). Apart from these cases, the problem is open. Let us stress that, in the SV models, the state space of the hidden chain is not compact.
Nevertheless, there is at least one case where the state space of the hidden chain is non-compact and where the asymptotic behaviour of the m.l.e. is well known: the Kalman filter model defined in (2).
In this paper, we consider and study new estimators that are minimum contrast estimators based on the conditional likelihood. We can justify our approach by the fact that these estimators are asymptotically equivalent to the exact m.l.e. in the Kalman filter.
3. Contrasts based on the conditional likelihood
Our aim is to develop a theoretical tool for obtaining consistent estimators. A specific feature of HMMs is that the conditional law of $Z_i$ given the past observations depends effectively on $i$ and all the observations $Z_1, \ldots, Z_{i-1}$. In order to recover some stationarity properties, we need to introduce the infinite past of each observation $Z_i$. This is the concern of the conditional likelihood. This estimation method derives from the field of discrete time ARCH models. Besides, as a tool, it appears explicitly in Leroux (1992, Section 4).
We first introduce and study the conditional likelihood, and then derive associated contrast functions leading to estimators.
3.1. Conditional likelihood
Since the process $(U_n, Z_n)$ is strictly stationary, we can consider its extension to a process $((U_n, Z_n), n \in \mathbb{Z})$ indexed by $\mathbb{Z}$, with the same finite-dimensional distributions. For all $i \in \mathbb{Z}$, let $\mathbf{Z}_i = (Z_i, Z_{i-1}, \ldots)$ be the vector of $\mathbb{R}^{\mathbb{N}}$ defining the past of the process until $i$. Below, we prove that the conditional distribution of the sample $(Z_1, \ldots, Z_n)$ given the infinite past $\mathbf{Z}_0$ admits a density with respect to $\mu(dz_1) \cdots \mu(dz_n)$, which allows to define the conditional likelihood of $(Z_1, \ldots, Z_n)$ given $\mathbf{Z}_0$.
From now on, let us suppose that $\mathcal{Z} = \mathbb{R}$ and $\mathcal{U} = \mathbb{R}^k$, for some $k \ge 1$. Denote by $\mathbb{P}_\theta$ the distribution of $((U_n, Z_n), n \in \mathbb{Z})$ on the canonical space; $((U_n, Z_n), n \in \mathbb{Z})$ will be the canonical process, and $E_\theta = E_{\mathbb{P}_\theta}$.
Theorem 1
Under (H1)-(H3), the following holds:
(1) The conditional distribution, under $\mathbb{P}_\theta$, of $U_1$ given the infinite past $\mathbf{Z}_0$ admits a density with respect to $\nu(du)$ equal to
$$\tilde{g}(\theta, u/\mathbf{Z}_0) = E_\theta(p(\theta; U_0, u)/\mathbf{Z}_0). \qquad (12)$$
The function $\tilde{g}(\theta, u/\mathbf{Z}_0)$ is well defined under $\mathbb{P}_\theta$ as a measurable function of $(u, \mathbf{Z}_0)$, since there exists a regular version of the conditional distribution of $U_0$ given $\mathbf{Z}_0$.
(2) The conditional distribution, under $\mathbb{P}_\theta$, of $Z_1$ given the infinite past $\mathbf{Z}_0$ admits a density with respect to $d\mu(z_1)$ equal to
$$\tilde{p}(\theta, z_1/\mathbf{Z}_0) = \int_{\mathcal{U}} \tilde{g}(\theta, u_1/\mathbf{Z}_0)f(z_1/u_1)\, \nu(du_1). \qquad (13)$$
Proof of Theorem 1. We omit $\theta$ in all notations for the proof. By the Markov property of $(U_n, Z_n)$, the conditional distribution of $U_1$ given $(U_0, Z_0), (U_{-1}, Z_{-1}), \ldots, (U_{-n}, Z_{-n})$ is identical to the conditional distribution of $U_1$ given $(U_0, Z_0)$, which is simply the conditional distribution of $U_1$ given $U_0$ [see (6)]. Consequently, for $\varphi : \mathcal{U} \to [0, 1]$ measurable,
$$E(\varphi(U_1)/U_0, Z_0, Z_{-1}, \ldots, Z_{-n}) = E(\varphi(U_1)/U_0) = P\varphi(U_0), \qquad (14)$$
with
$$P\varphi(U_0) = \int \varphi(u)p(U_0, u)\nu(du). \qquad (15)$$
Hence,
$$E(\varphi(U_1)/Z_0, Z_{-1}, \ldots, Z_{-n}) = E(P\varphi(U_0)/Z_0, Z_{-1}, \ldots, Z_{-n}). \qquad (16)$$
By the martingale convergence theorem, we get
$$E(\varphi(U_1)/\mathbf{Z}_0) = E(P\varphi(U_0)/\mathbf{Z}_0). \qquad (17)$$
Now, using the fact that there is a regular version of the conditional distribution of $U_0$ given $\mathbf{Z}_0$, say $dP_{U_0}(u_0/\mathbf{Z}_0)$, we obtain
$$E(\varphi(U_1)/\mathbf{Z}_0) = \int_{\mathcal{U}} P\varphi(u_0)\, dP_{U_0}(u_0/\mathbf{Z}_0). \qquad (18)$$
Applying the Fubini theorem yields
$$E(\varphi(U_1)/\mathbf{Z}_0) = \int_{\mathcal{U}} \varphi(u)E(p(U_0, u)/\mathbf{Z}_0)\nu(du). \qquad (19)$$
So, the proof of (1) is complete.
Using again the Markov property of $(U_n, Z_n)$ and (6), we get that the conditional distribution of $Z_1$ given $(U_1, Z_0, Z_{-1}, \ldots, Z_{-n})$ is identical to the conditional distribution of $Z_1$ given $U_1$, which is simply $f(z_1/U_1)\mu(dz_1)$. So taking $\varphi : \mathcal{Z} \to [0, 1]$ measurable as above, we get
$$E(\varphi(Z_1)/Z_0, Z_{-1}, \ldots, Z_{-n}) = E(E(\varphi(Z_1)/U_1)/Z_0, Z_{-1}, \ldots, Z_{-n}). \qquad (20)$$
So, using the martingale convergence theorem and part (1), we obtain
$$E(\varphi(Z_1)/\mathbf{Z}_0) = \int_{\mathbb{R}} \varphi(z_1)\mu(dz_1) \int_{\mathcal{U}} f(z_1/u_1)\tilde{g}(u_1/\mathbf{Z}_0)\nu(du_1). \qquad (21)$$
This achieves the proof. □
Note that, under $\mathbb{P}_\theta$, by the strict stationarity, for all $i \ge 1$, the conditional density of $Z_i$ given $\mathbf{Z}_{i-1}$ is given by
$$\tilde{p}(\theta, z_i/\mathbf{Z}_{i-1}) = \int_{\mathcal{U}} \tilde{g}(\theta, u_i/\mathbf{Z}_{i-1})f(z_i/u_i)\nu(du_i). \qquad (22)$$
Thus, the conditional distribution of $(Z_1, \ldots, Z_n)$ given $\mathbf{Z}_0 = \mathbf{z}_0$ has a density given by
$$\tilde{p}_n(\theta; z_1, \ldots, z_n/\mathbf{z}_0) = \prod_{i=1}^{n} \tilde{p}(\theta, z_i/\mathbf{z}_{i-1}). \qquad (23)$$
Hence, we may introduce Definition 2.
Definition 2. Let us assume that, $\mathbb{P}_{\theta_0}$ a.s., the function $\tilde{p}(\theta, Z_n/\mathbf{Z}_{n-1})$ is well defined for all $\theta$ and all $n$. Then, the conditional likelihood of $(Z_1, \ldots, Z_n)$ given $\mathbf{Z}_0$ is defined by
$$\tilde{p}_n(\theta, \mathbf{Z}_n) = \prod_{i=1}^{n} \tilde{p}(\theta, Z_i/\mathbf{Z}_{i-1}) = \tilde{p}_n(\theta; Z_1, \ldots, Z_n/\mathbf{Z}_0). \qquad (24)$$
Let us set, when defined,
$$l(\theta, \mathbf{z}_1) = \log \tilde{p}(\theta, z_1/\mathbf{z}_0), \qquad (25)$$
so that
$$\log \tilde{p}_n(\theta) = \sum_{i=1}^{n} l(\theta, \mathbf{Z}_i).$$
We have Proposition 3.
Proposition 3
Under (H1)-(H4), if $E_{\theta_0}|l(\theta, \mathbf{Z}_1)| < \infty$, then we have, almost surely under $\mathbb{P}_{\theta_0}$,
$$\frac{1}{n} \log \tilde{p}_n(\theta, \mathbf{Z}_n) \to E_{\theta_0} l(\theta, \mathbf{Z}_1).$$
Proof of Proposition 3. Using Proposition 1 (3), $(Z_n)$ is ergodic, and the ergodic theorem may be applied. □
Example 1 (continued) (Kalman filter). For the discrete Kalman filter [see (2)], the successive distributions appearing in Proposition 2 are explicitly known and Gaussian.
With the notations introduced previously, the unknown parameters are $a, b^2$, while $c^2$ is supposed to be known and $|a| < 1$. We assume that $(U_n)$ is in a stationary regime. The following properties are well known and are obtained following the steps of Proposition 2. Under $\mathbb{P}_\theta$, $(\theta = (a, b^2))$:
(i) the conditional distribution of $U_n$ given $(Z_{n-1}, \ldots, Z_1)$ is the law $\mathcal{N}(x_{n-1}, \sigma_{n-1}^2)$;
(ii) the conditional distribution of $U_n$ given $(Z_n, \ldots, Z_1)$ is the law $\mathcal{N}(\hat{x}_n, \hat{\sigma}_n^2)$;
(iii) the conditional distribution of $Z_n$ given $(Z_{n-1}, \ldots, Z_1)$ is the law $\mathcal{N}(\bar{x}_{n-1}, \bar{\sigma}_{n-1}^2)$, with
$$v_n = \hat{\sigma}_n^2/c^2, \qquad v_1 = s^2/(s^2 + c^2), \qquad (26)$$
$$\hat{x}_{n+1} = a\hat{x}_n + (Z_{n+1} - a\hat{x}_n)v_{n+1}, \qquad \hat{x}_0 = 0, \qquad (27)$$
$$v_{n+1} = f(v_n) \quad \text{with} \quad f(v) = 1 - \left(1 + \frac{b^2}{c^2} + a^2 v\right)^{-1}, \qquad (28)$$
$$x_{n-1} = a\hat{x}_{n-1} = \bar{x}_{n-1}, \qquad \sigma_{n-1}^2 = b^2 + a^2 c^2 v_{n-1}, \qquad (29)$$
$$\bar{\sigma}_{n-1}^2 = \sigma_{n-1}^2 + c^2, \qquad \bar{\sigma}_0^2 = s^2 + c^2. \qquad (30)$$
With the above notations, the distribution of $Z_1$ is the law $\mathcal{N}(\bar{x}_0, \bar{\sigma}_0^2)$ and the exact likelihood of $(Z_1, \ldots, Z_n)$ is (up to a constant) equal to
$$p_n(\theta) = p_n(\theta; Z_1, \ldots, Z_n) = (\bar{\sigma}_0 \cdots \bar{\sigma}_{n-1})^{-1} \exp\left(-\sum_{i=1}^{n} \frac{(Z_i - \bar{x}_{i-1})^2}{2\bar{\sigma}_{i-1}^2}\right), \qquad (31)$$
where $\theta = (a, b^2)$, $\bar{x}_i = \bar{x}_i(\theta)$ and $\bar{\sigma}_i = \bar{\sigma}_i(\theta)$.
The conditional distribution of $Z_1$ given $(Z_0, \ldots, Z_{-n+2})$ is obtained by substituting $(Z_{n-1}, \ldots, Z_1)$ with $(Z_0, \ldots, Z_{-n+2})$ in the law $\mathcal{N}(\bar{x}_{n-1}, \bar{\sigma}_{n-1}^2)$. This sequence of distributions converges weakly to the conditional distribution of $Z_1$ given $\mathbf{Z}_0$. Indeed, the deterministic recurrence equation for $(v_n)$ converges to a limit $v(\theta) \in (0, 1)$ with exponential rate [see (28)].
Using the above equations, we find after some computations that the conditional distribution of $Z_1$ given $\mathbf{Z}_0$ is the law $\mathcal{N}(\bar{x}(\theta, \mathbf{Z}_0), \bar{\sigma}^2(\theta))$ with
$$\bar{x}(\theta, \mathbf{Z}_0) = aE_\theta(U_0/\mathbf{Z}_0) = av(\theta) \sum_{i=0}^{\infty} a^i(1 - v(\theta))^i Z_{-i} \quad \text{and} \quad \bar{\sigma}^2(\theta) = a^2 c^2 v(\theta) + b^2 + c^2.$$
Therefore, the conditional likelihood is (up to a constant) explicitly given by
$$\tilde{p}_n(\theta, \mathbf{Z}_n) = \bar{\sigma}(\theta)^{-n} \exp\left(-\sum_{i=1}^{n} \frac{(Z_i - \bar{x}(\theta, \mathbf{Z}_{i-1}))^2}{2\bar{\sigma}(\theta)^2}\right). \qquad (32)$$
Since the series $\bar{x}(\theta, \mathbf{Z}_0)$ converges in $L^2(\mathbb{P}_{\theta_0})$, this function is well defined for all $\theta$ under $\mathbb{P}_{\theta_0}$.
Moreover, the assumptions of Proposition 3 are satisfied, and the limit is
$$E_{\theta_0} l(\theta, \mathbf{Z}_1) = -\frac{1}{2}\left\{\log \bar{\sigma}(\theta)^2 + \frac{1}{\bar{\sigma}(\theta)^2} E_{\theta_0}(Z_1 - \bar{x}(\theta, \mathbf{Z}_0))^2\right\}. \qquad (33)$$
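The quantities of this example are directly computable. The sketch below iterates the recursion (28) to its fixed point $v(\theta)$ and evaluates the conditional log-likelihood of (32) with the infinite past replaced by zeros (anticipating the choice $\mathbf{z}_0 = \mathbf{0}$ made later); it uses the one-step form $\bar{x}_n = a(1-v)\bar{x}_{n-1} + avZ_n$ of the series $\bar{x}(\theta, \mathbf{Z}_n)$, which follows from the geometric weights above. Parameter values are illustrative.

```python
import math

# Iterate the deterministic recursion (28) to its fixed point v(theta).
def v_limit(a, b2, c2, iters=200):
    v = b2 / (b2 + c2)                          # any starting point in (0, 1)
    for _ in range(iters):
        v = 1.0 - 1.0 / (1.0 + b2 / c2 + a * a * v)   # f(v) of equation (28)
    return v

def cond_loglik(theta, z, c2=0.25):
    """log of (32), up to a constant, with the unobserved infinite past replaced
    by zeros; uses xbar_n = a (1 - v) xbar_{n-1} + a v Z_n, the one-step form of
    the geometric series defining xbar(theta, Z_n)."""
    a, b2 = theta
    v = v_limit(a, b2, c2)
    sig2 = a * a * c2 * v + b2 + c2             # bar sigma^2(theta)
    ll, xbar = 0.0, 0.0                         # xbar(theta, 0) = 0
    for zi in z:
        ll += -0.5 * (math.log(sig2) + (zi - xbar) ** 2 / sig2)
        xbar = a * (1.0 - v) * xbar + a * v * zi
    return ll

v = v_limit(0.7, 1.0, 0.25)
print(v)   # fixed point of (28), in (0, 1)
```

Since $f$ in (28) is a contraction on $(0,1)$, the iteration converges at an exponential rate, matching the weak convergence statement above.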
3.2. Associated contrasts
The conditional likelihood cannot be used directly, since we do not observe $\mathbf{Z}_0$. However, the conditional likelihood suggests a family of appropriate contrasts to build consistent estimators.
As mentioned earlier, our approach is motivated by the estimation methods used for discrete time ARCH models (see Engle, 1982). In these models, the exact likelihood is intractable, whereas the conditional likelihood is explicit and simple. Moreover, a common feature of these series is that they usually do not even have second-order moments. This rules out any standard estimation method based on moments. Elie & Jeantheau (1995) and Jeantheau (1998) have proved an appropriate theorem to deal with this difficulty. We use this theorem later and recall it now. For the sake of clarity, its proof is given in the appendix.
3.2.1. A general result of consistency
For a given known sequence $\mathbf{z}_0$ of $\mathbb{R}^{\mathbb{N}}$, we define the random vector of $\mathbb{R}^{\mathbb{N}}$
$$\mathbf{Z}_n(\mathbf{z}_0) = (Z_n, \ldots, Z_1, \mathbf{z}_0).$$
Let $f$ be a real function defined on $\Theta \times \mathbb{R}^{\mathbb{N}}$ and set
$$F_n(\theta, \mathbf{z}_n) = n^{-1} \sum_{i=1}^{n} f(\theta, \mathbf{z}_i). \qquad (34)$$
Let us introduce the random variable
$$\theta_n = \arg\inf_\theta F_n(\theta, \mathbf{Z}_n) = \theta_n(\mathbf{Z}_n). \qquad (35)$$
Now, we introduce the estimator defined by the equation
$$\tilde{\theta}_n(\mathbf{z}_0) = \arg\inf_\theta F_n(\theta, \mathbf{Z}_n(\mathbf{z}_0)) = \theta_n(\mathbf{Z}_n(\mathbf{z}_0)). \qquad (36)$$
The estimator $\tilde{\theta}_n(\mathbf{z}_0)$ is a function of the observations $(Z_1, \ldots, Z_n)$, but also depends on $f$ and $\mathbf{z}_0$. Theorem 2 gives conditions on $f$ and $\mathbf{z}_0$ to obtain strong consistency for this type of estimator.
We have in mind the case of $f = -l$ [see (25)], so that $F_n(\theta, \mathbf{Z}_n) = -n^{-1}\log \tilde{p}_n(\theta)$. Nevertheless, other functions $f$ could be used.
As usual, let $\theta_0$ be the true value of the parameter and consider the following conditions:
• (C0) $\Theta$ is compact.
• (C1) The function $f$ is such that
(i) For all $n$, $f(\theta, \mathbf{Z}_n)$ is measurable on $\Theta \times \Omega$ and continuous in $\theta$, $\mathbb{P}_{\theta_0}$ a.s.
(ii) Let $B(\theta, \rho)$ be the open ball of centre $\theta$ and radius $\rho$, and set, for $i \in \mathbb{Z}$, $f(\theta, \rho, \mathbf{Z}_i) = \inf\{f(\theta', \mathbf{Z}_i);\ \theta' \in B(\theta, \rho) \cap \Theta\}$. Then, $\forall \theta \in \Theta$, $E_{\theta_0}(f(\theta, \rho, \mathbf{Z}_1)^-) > -\infty$ [with the notation $a^- = \inf(a, 0)$].
• (C2) The function $\theta \to F(\theta_0, \theta) = E_{\theta_0}(f(\theta, \mathbf{Z}_1))$ has a unique (finite) minimum at $\theta_0$.
• (C3) The function $f$ and $\mathbf{z}_0$ are such that
(i) For all $n$, $f(\theta, \mathbf{Z}_n(\mathbf{z}_0))$ is measurable on $\Theta \times \Omega$ and continuous in $\theta$, $\mathbb{P}_{\theta_0}$ a.s.
(ii) $f(\theta, \mathbf{Z}_n) - f(\theta, \mathbf{Z}_n(\mathbf{z}_0)) \to 0$ as $n \to \infty$, $\mathbb{P}_{\theta_0}$ a.s., uniformly in $\theta$.
Theorem 2
Assume (H1)-(H4). Then, under (C0)-(C2), the random variable $\theta_n$ defined in (35) converges $\mathbb{P}_{\theta_0}$ a.s. to $\theta_0$ when $n \to \infty$. Under (C0)-(C3), $\tilde{\theta}_n(\mathbf{z}_0)$ converges $\mathbb{P}_{\theta_0}$ a.s. to $\theta_0$.
Let us make some comments on Theorem 2. It holds not only for HMMs, but for any strictly stationary and ergodic process under conditions (C0)-(C3). It is an extension of a result of Pfanzagl (1969) proved for i.i.d. data and for the exact likelihood. Its main interest is to obtain strongly consistent estimators under weaker assumptions than the classical ones. In particular, (C1) and (C2) are weak moment and regularity conditions. Condition (C3) appears as the most difficult one. We give below examples where these conditions may be checked. Let us stress that, in the econometric literature, (C3) is generally not checked and only $\theta_n$ is considered and treated as the standard estimator. Note also that (C3) may be weakened into
$$\frac{1}{n} \sum_{i=1}^{n} \left(f(\theta, \rho, \mathbf{Z}_i) - f(\theta, \rho, \mathbf{Z}_i(\mathbf{z}_0))\right) \to 0 \quad \text{as } n \to \infty,$$
uniformly in $\theta$, in $\mathbb{P}_{\theta_0}$-probability. This leads to a convergence in $\mathbb{P}_{\theta_0}$-probability of $\tilde{\theta}_n(\mathbf{z}_0)$ to $\theta_0$.
Example 4 (ARCH-type models). Consider observations $Z_n$ given by
$$Z_n = U_n^{1/2}\varepsilon_n \quad \text{and} \quad U_n = u(\theta, \mathbf{Z}_{n-1}),$$
where $(\varepsilon_n)$ is a sequence of i.i.d. random variables with mean 0 and variance 1, and, with $\mathcal{I}_n = \sigma(Z_k, k \le n)$, $\varepsilon_n$ is $\mathcal{I}_n$-measurable and independent of $\mathcal{I}_{n-1}$. Although $(U_n)$ is a Markov chain, $(Z_n)$ is not an HMM in the sense of Definition 1. Still, under appropriate assumptions, $(Z_n)$ is strictly stationary and ergodic. If the $\varepsilon_n$'s are Gaussian, we choose
$$f(\theta, \mathbf{Z}_1) = \frac{1}{2}\left(\log u(\theta, \mathbf{Z}_0) + \frac{Z_1^2}{u(\theta, \mathbf{Z}_0)}\right),$$
which corresponds to the conditional log-likelihood. If the $\varepsilon_n$'s are not Gaussian, we may still consider the same function.
To clarify, consider the special case of a GARCH(1,1) model, given by
$$U_n = \omega + \alpha Z_{n-1}^2 + \beta U_{n-1} = \omega + (\alpha\varepsilon_{n-1}^2 + \beta)U_{n-1},$$
where $\omega$, $\alpha$ and $\beta$ are positive real numbers. It is well known that, if $E(\log(\alpha\varepsilon_1^2 + \beta)) < 0$, there exists a unique stationary and ergodic solution, and, almost surely,
$$U_n = \frac{\omega}{1 - \beta} + \alpha \sum_{i \ge 1} \beta^{i-1} Z_{n-i}^2.$$
Let us remark that this solution does not necessarily even have a first-order moment. However, it is enough to add the assumption $\omega \ge c > 0$ to prove that assumptions (C1)-(C2) hold. Fixing $\mathbf{Z}_0 = \mathbf{z}_0$ is equivalent to fixing $U_1 = u_1$, and (C3) can be proved (see Elie & Jeantheau, 1995).
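A minimal numerical sketch of the contrast (34)-(36) for this GARCH(1,1) example: data are simulated, the past is cut off by fixing $U_1 = \omega/(1-\beta)$ (the value the stationary expansion takes when all past observations are set to zero, i.e. the role of $\mathbf{z}_0$), and $F_n$ is minimised by a crude grid search over $\alpha$ with $\omega$ and $\beta$ held at their true values. All parameter values are illustrative.

```python
import math, random

random.seed(3)

# Simulate a GARCH(1,1): Z_n = U_n^{1/2} eps_n, U_n = omega + alpha Z_{n-1}^2 + beta U_{n-1}.
def simulate_garch(omega, alpha, beta, n=4000, burn=500):
    u, z = omega / (1.0 - alpha - beta), []
    for i in range(n + burn):
        zi = math.sqrt(u) * random.gauss(0.0, 1.0)
        if i >= burn:
            z.append(zi)
        u = omega + alpha * zi * zi + beta * u
    return z

def contrast(theta, z):
    """F_n(theta, Z_n(z_0)) of (34) for the Gaussian conditional log-likelihood;
    the past is cut off by starting at U_1 = omega / (1 - beta), the value the
    stationary expansion of U_n takes when the past observations are zero."""
    omega, alpha, beta = theta
    u, total = omega / (1.0 - beta), 0.0
    for zi in z:
        total += 0.5 * (math.log(u) + zi * zi / u)
        u = omega + alpha * zi * zi + beta * u
    return total / len(z)

z = simulate_garch(0.1, 0.15, 0.8)
# crude grid search realising (36) over alpha, with omega and beta held fixed
alphas = [0.05 + 0.01 * k for k in range(21)]
alpha_hat = min(alphas, key=lambda al: contrast((0.1, al, 0.8), z))
print(alpha_hat)
```

In practice the minimisation would of course be done over all three parameters with a numerical optimiser; the one-dimensional grid only keeps the sketch short.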
3.2.2. Identifiability assumption
Condition (C2) is an identifiability assumption. The most interesting case is when $f$ directly comes from the conditional likelihood, that is to say
$$f(\theta, \mathbf{z}_1) = -l(\theta, \mathbf{z}_1) = -\log \tilde{p}(\theta, z_1/\mathbf{z}_0).$$
In this case, the limit appearing in (C2) admits a representation in terms of a Kullback-Leibler information. Consider the random probability measure
$$\tilde{P}_\theta(dz) = \tilde{p}(\theta, z/\mathbf{Z}_0)\mu(dz) \qquad (37)$$
and the assumptions
• (H5) $\forall \theta_0, \theta$, $E_{\theta_0}|l(\theta, \mathbf{Z}_1)| < \infty$.
• (H6) $\tilde{P}_\theta = \tilde{P}_{\theta_0}$, $\mathbb{P}_{\theta_0}$ a.s. $\Longrightarrow \theta = \theta_0$.
Set
$$K(\theta_0, \theta) = E_{\theta_0} K(\tilde{P}_{\theta_0}, \tilde{P}_\theta), \qquad (38)$$
where $K(\tilde{P}_{\theta_0}, \tilde{P}_\theta)$ denotes the Kullback information of $\tilde{P}_{\theta_0}$ with respect to $\tilde{P}_\theta$.
Lemma 1
Assume (H1)-(H6). We have, almost surely under $\mathbb{P}_{\theta_0}$,
$$\frac{1}{n}\left(\log \tilde{p}_n(\theta_0, \mathbf{Z}_n) - \log \tilde{p}_n(\theta, \mathbf{Z}_n)\right) \to K(\theta_0, \theta),$$
and (C2) holds for $f = -l$.
Proof of Lemma 1. Under (H5), the convergence is obtained by the ergodic theorem. Conditioning on $\mathbf{Z}_0$, we get
$$E_{\theta_0} \log \tilde{p}(\theta, Z_1/\mathbf{Z}_0) = E_{\theta_0} \int_{\mathbb{R}} \log \tilde{p}(\theta, z/\mathbf{Z}_0)\, d\tilde{P}_{\theta_0}(z).$$
Therefore,
$$E_{\theta_0}\left[l(\theta, \mathbf{Z}_1) - l(\theta_0, \mathbf{Z}_1)\right] = -K(\theta_0, \theta).$$
The quantity $K(\theta_0, \theta)$ is non-negative [see (38)] and, by (H6), equal to 0 if and only if $\tilde{P}_\theta = \tilde{P}_{\theta_0}$, $\mathbb{P}_{\theta_0}$ a.s. □
Example 1 (continued) (Kalman filter). Recall that [see (33)] the conditional likelihood is based on the function
$$f(\theta, \mathbf{Z}_1) = -l(\theta, \mathbf{Z}_1) = \frac{1}{2}\left\{\log \bar{\sigma}(\theta)^2 + \frac{1}{\bar{\sigma}(\theta)^2}(Z_1 - \bar{x}(\theta, \mathbf{Z}_0))^2\right\}. \qquad (39)$$
Condition (C1) is immediate. To check (C2), we use Lemma 1. Assumption (H5) is also immediate. Let us check (H6). We have seen that
$$\tilde{P}_\theta = \mathcal{N}(\bar{x}(\theta, \mathbf{Z}_0), \bar{\sigma}^2(\theta)).$$
Thus, $\tilde{P}_\theta = \tilde{P}_{\theta_0}$, $\mathbb{P}_{\theta_0}$ a.s., is equivalent to
$$\bar{x}(\theta, \mathbf{Z}_0) = \bar{x}(\theta_0, \mathbf{Z}_0)\ \ \mathbb{P}_{\theta_0}\text{ a.s.} \quad \text{and} \quad \bar{\sigma}(\theta)^2 = \bar{\sigma}(\theta_0)^2. \qquad (40)$$
The first equality in (40) writes $\sum_{i \ge 0} k_i Z_{-i} = 0$, $\mathbb{P}_{\theta_0}$ a.s., with
$$k_i = av(\theta)a^i(1 - v(\theta))^i - a_0 v(\theta_0)a_0^i(1 - v(\theta_0))^i.$$
Suppose that the $k_i$'s are not all equal to 0. Denote by $i_0$ the smallest integer such that $k_{i_0} \ne 0$. Then, $Z_{-i_0}$ becomes a deterministic function of its infinite past. This is impossible. Hence, for all $i$, $k_i = 0$. Thus, we get $av(\theta) = a_0 v(\theta_0)$ and $a(1 - v(\theta)) = a_0(1 - v(\theta_0))$. Since $0 < v(\theta), v(\theta_0) < 1$, we get $a = a_0$ and $v(\theta) = v(\theta_0)$. The second equality in (40) then yields $b = b_0$.
3.2.3. Stability assumption
Condition (C3) can be viewed as a stability assumption, since it states an asymptotic forgetting of the past. But, here, the stability condition has only to be checked for the specific function $f$ and point $\mathbf{z}_0$ chosen to build the estimator. This holds for Example 4 (ARCH-type models). We can also check it for the Kalman filter.
Example 1 (continued) (Kalman filter). Let us prove that (C3) holds with $\mathbf{z}_0 = (0, 0, \ldots) = \mathbf{0}$ and $f$ given by (39). Note that, for arbitrary $\mathbf{z}_0$ in $\mathbb{R}^{\mathbb{N}}$, $\bar{x}(\theta, \mathbf{z}_0)$ may be undefined. For $\mathbf{z}_0 = \mathbf{0}$, we have
$$f(\theta, \mathbf{Z}_n) - f(\theta, \mathbf{Z}_n(\mathbf{0})) = -\frac{1}{2\bar{\sigma}(\theta)^2}\left(\bar{x}(\theta, \mathbf{Z}_{n-1}) - \bar{x}(\theta, \mathbf{Z}_{n-1}(\mathbf{0}))\right)\left(2Z_n - \bar{x}(\theta, \mathbf{Z}_{n-1}) - \bar{x}(\theta, \mathbf{Z}_{n-1}(\mathbf{0}))\right).$$
Now,
$$\bar{x}(\theta, \mathbf{Z}_{n-1}) - \bar{x}(\theta, \mathbf{Z}_{n-1}(\mathbf{0})) = a^{n-1}(1 - v(\theta))^{n-1}\bar{x}(\theta, \mathbf{Z}_0).$$
Using that $\Theta$ is compact, we easily deduce (C3) for this example.
Remark
To conclude, let us stress that it is possible to compare the exact m.l.e. and the minimum contrast estimator $\tilde{\theta}_n(\mathbf{0})$ in the Kalman filter example. Indeed, $(Z_n)$ is a stationary ARMA(1,1) Gaussian process. The exact likelihood requires the knowledge of $\mathbf{Z}^t R_{1,n}^{-1}\mathbf{Z}$, where $R_{1,n}$ is the covariance matrix of $(Z_1, \ldots, Z_n)$. To avoid the difficult computation of $R_{1,n}^{-1}$, two approximations are classical. The first one is the Whittle approximation, which consists in computing $\tilde{\mathbf{Z}}^t R_{-\infty,\infty}^{-1}\tilde{\mathbf{Z}}$, where $R_{-\infty,\infty}$ is the covariance matrix of the infinite vector $(Z_n, n \in \mathbb{Z})$ and $\tilde{\mathbf{Z}} = (\ldots, 0, 0, Z_1, \ldots, Z_n, 0, 0, \ldots)$. The second one is the case described here. It corresponds to computing $\tilde{\mathbf{Z}}^t R_{-\infty,n}^{-1}\tilde{\mathbf{Z}}$ with $\tilde{\mathbf{Z}} = \mathbf{Z}_n(\mathbf{0}) = (\ldots, 0, 0, Z_1, \ldots, Z_n)$, where $R_{-\infty,n}$ is the covariance matrix of $(Z_i, i \le n)$. It is well known that the three estimators are asymptotically equivalent. It is also classical to use the previous estimators even for non-Gaussian stationary processes (for details, see Beran, 1995).
4. Stochastic volatility models
In this section, we give more details on Example 3, in the case where the volatility $(V_t)$ is a strictly stationary diffusion process.
4.1. Model and assumptions
We consider, for $t \in \mathbb{R}$, $(Y_t, V_t)$ defined by, for $s \le t$,
$$Y_t - Y_s = \int_s^t \sigma_u\, dB_u, \qquad (41)$$
$$V_t = \sigma_t^2 \quad \text{and} \quad V_t - V_s = \int_s^t b(\theta, V_u)\, du + \int_s^t a(\theta, V_u)\, dW_u. \qquad (42)$$
For positive $\Delta$, we observe a discrete sampling $(Y_{i\Delta}, i = 1, \ldots, n)$ of (41) and the problem is to estimate the unknown $\theta \in \Theta \subset \mathbb{R}^d$ of (42) from this observation.
We assume that
• (A0) $(B_t, W_t)_{t \in \mathbb{R}}$ is a standard Brownian motion of $\mathbb{R}^2$ defined on a probability space $(\Omega, \mathcal{A}, P)$.
Equation (42) defines a one-dimensional diffusion process indexed by $t \in \mathbb{R}$. We now make the standard assumptions on the functions $b(\theta, v)$ and $a(\theta, v)$ ensuring that (42) admits a unique strictly stationary and ergodic solution with state space $(l, r)$ included in $(0, +\infty)$.
• (A1) For all $\theta \in \Theta$, $b(\theta, v)$ and $a(\theta, v)$ are continuous (in $v$) real functions on $\mathbb{R}$, and $C^1$ functions on $(l, r)$, such that
$$\exists k > 0,\ \forall v \in (l, r),\quad b^2(\theta, v) + a^2(\theta, v) \le k(1 + v^2) \quad \text{and} \quad \forall v \in (l, r),\ a(\theta, v) > 0.$$
For $v_0 \in (l, r)$, define the derivative of the scale function of the diffusion $(V_t)$,
$$s(\theta, v) = \exp\left(-2\int_{v_0}^{v} \frac{b(\theta, u)}{a^2(\theta, u)}\, du\right). \qquad (43)$$
• (A2) For all $\theta \in \Theta$,
$$\int_{l^+} s(\theta, v)\, dv = +\infty, \qquad \int^{r^-} s(\theta, v)\, dv = +\infty, \qquad \int_l^r \frac{dv}{a^2(\theta, v)s(\theta, v)} = M_\theta < +\infty.$$
Under (A0)-(A2), the marginal distribution of $(V_t)$ is $\pi_\theta(dv) = \pi(\theta, v)\, dv$, with
$$\pi(\theta, v) = \frac{1}{M_\theta} \frac{1}{a^2(\theta, v)s(\theta, v)}\, 1_{(v \in (l,r))}. \qquad (44)$$
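Formulas (43)-(44) can be evaluated numerically. The sketch below does so for an assumed Jacobi-type diffusion on $(l, r)$, with $b(\theta, v) = -k(v - m)$ and $a^2(\theta, v) = s^2(v - l)(r - v)$, a hypothetical choice of ours for which the boundary conditions of (A2) hold; the exponent of the scale density is obtained by cumulative trapezoidal integration and the stationary density by normalizing $1/(a^2 s)$.

```python
import math

# Assumed Jacobi-type diffusion on (l, r): b(v) = -k (v - m), a^2(v) = s^2 (v-l)(r-v).
l, r = 0.2, 2.0
k_, m_, s_ = 2.0, 1.0, 0.5

def b(v):  return -k_ * (v - m_)
def a2(v): return s_ * s_ * (v - l) * (r - v)

N = 2000
eps = (r - l) * 1e-4                               # stay strictly inside (l, r)
grid = [l + eps + (r - l - 2 * eps) * j / (N - 1) for j in range(N)]
dv = grid[1] - grid[0]

# exponent of (43): cumulative trapezoidal integral of b/a^2, referenced at v_0 = m_
integrand = [b(v) / a2(v) for v in grid]
cum = [0.0]
for j in range(1, N):
    cum.append(cum[-1] + 0.5 * (integrand[j] + integrand[j - 1]) * dv)
j0 = min(range(N), key=lambda j: abs(grid[j] - m_))

scale = [math.exp(-2.0 * (cum[j] - cum[j0])) for j in range(N)]   # s(theta, v) of (43)
unnorm = [1.0 / (a2(grid[j]) * scale[j]) for j in range(N)]       # 1/(a^2 s)
M = sum(unnorm) * dv                                              # M_theta of (A2)
pi = [w / M for w in unnorm]                                      # density (44)

mean = sum(p * v for p, v in zip(pi, grid)) * dv
print(mean)   # ≈ 1.0, the mean-reversion level m_
```

For this particular choice, the density (44) is a rescaled Beta law on $(l, r)$, whose mean coincides with $m$; the numerical integration recovers this.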
In order to study the conditional likelihood, we consider the additional assumptions
• (A3) For all $\theta \in \Theta$, $\int_l^r v\, \pi_\theta(dv) < \infty$.
• (A4) $0 < l < r < +\infty$.
Let us stress that (A3) is a weak moment condition. The condition $l > 0$ is crucial. Intuitively, it is natural to consider volatilities bounded away from 0 in order to estimate their parameters from the observation of $(Y_t)$.
Let $C = C(\mathbb{R}, \mathbb{R}^2)$ be the space of continuous $\mathbb{R}^2$-valued functions on $\mathbb{R}$, equipped with the Borel $\sigma$-field $\mathcal{C}$ associated with the uniform topology on each compact subset of $\mathbb{R}$. We shall assume that $(Y_t, V_t)$ is the canonical diffusion solution of (41) and (42) on $(C, \mathcal{C})$, and we keep the notation $P_\theta$ for its distribution. For given positive $\Delta$, we observe $(Y_{i\Delta} - Y_{(i-1)\Delta},\ i = 1, \ldots, n)$ and we define $(Z_i)$ as in (3). As recalled in Example 3, $(Z_i)$ is an HMM with hidden chain $U_n = (\bar V_n, V_{n\Delta})$.
Setting $t = (\bar v, v) \in (l, r)^2$, the conditional distribution of $Z_i$ given $U_i = t$ is the Gaussian law $N(0, \bar v)$, so that $\mu(dz) = dz$ is the Lebesgue measure on $\mathbb{R}$ and
$$f(z/t) = f(z/\bar v) = \frac{1}{(2\pi \bar v)^{1/2}}\, \exp\left(-\frac{z^2}{2\bar v}\right). \qquad (45)$$
For the transition density of the hidden chain $U_i = (\bar V_i, V_{i\Delta})$, it is natural to take, as dominating measure, the Lebesgue measure
$$m(dt) = 1_{(l, r)^2}(\bar v, v)\, d\bar v\, dv. \qquad (46)$$
Actually, it amounts to proving that the two-dimensional diffusion $(\int_0^t V_s\, ds,\ V_t)$ admits a transition density. Two points of view are possible. In some models, a direct proof may be feasible. Otherwise, this will be ensured under additional regularity assumptions on the functions $a(\theta, \cdot)$ and $b(\theta, \cdot)$ (see, e.g. Gloter, 2000). For the sake of clarity, we introduce the following assumption:
• (A5) The transition probability distribution of the chain $(U_i)$ is given by
$$p(\theta, u, t)\, m(dt), \qquad (47)$$
where $(u, t) \in (l, r)^2 \times (l, r)^2$ and $m(dt)$ is the Lebesgue measure (46).
This has several interesting consequences. First, note that this transition density has a special form. Setting $u = (\bar a, a) \in (l, r)^2$,
$$p(\theta, u, t) = p(\theta, a, t) \qquad (48)$$
only depends on $a$ and is equal to the conditional density of $U_1 = (\bar V_1, V_\Delta)$ given $V_0 = a$.
Therefore, the (unconditional) density of $U_1$ is (with $t = (\bar v, v)$)
$$g(\theta, t) = \int_l^r p(\theta, a, t)\, \pi(\theta, a)\, da, \qquad (49)$$
where $\pi(\theta, a)\, da$, defined in (44), is the stationary distribution of the hidden diffusion $(V_t)$. Of course, $g(\theta, t)$ is the stationary density of the chain $(U_i)$. The densities of $\bar V_1$ and $Z_1$ are, therefore [see (45)],
$$\bar\pi(\theta, \bar v) = \int_l^r g(\theta, (\bar v, v))\, dv, \qquad (50)$$
$$p_1(\theta, z_1) = \int_l^r f(z_1/\bar v)\, \bar\pi(\theta, \bar v)\, d\bar v. \qquad (51)$$
Second, the conditional distribution of $\bar V_i$ given $Z_{i-1} = z_{i-1}, \ldots, Z_1 = z_1$ has a density with respect to the Lebesgue measure on $(l, r)$, say
$$p_i(\theta, \bar v_i / z_{i-1}, \ldots, z_1). \qquad (52)$$
So, applying Proposition 2, we can integrate with respect to the second coordinate $V_{i\Delta}$ of $U_i = (\bar V_i, V_{i\Delta})$ to obtain that the conditional density of $Z_i$ given $Z_{i-1} = z_{i-1}, \ldots, Z_1 = z_1$ is equal to
$$t_i(\theta; z_i / z_{i-1}, \ldots, z_1) = \int_l^r p_i(\theta, \bar v_i / z_{i-1}, \ldots, z_1)\, f(z_i / \bar v_i)\, d\bar v_i \qquad (53)$$
for all $i \ge 2$ [see (9)]. Therefore, (53) is a variance mixture of Gaussian distributions, the mixing distribution being $p_i(\theta, \bar v_i / z_{i-1}, \ldots, z_1)\, d\bar v_i$.
Let us establish some links between the likelihood and a contrast previously used in the case of a small sampling interval (see Genon-Catalot et al., 1999). The contrast method is based on the property that the random variables $(Z_i)$ behave asymptotically (as $\Delta = \Delta_n$ goes to zero) like a sample of the distribution
$$q(\theta, z) = \int_l^r \pi(\theta, v)\, f(z/v)\, dv.$$
The same property of variance mixture of Gaussian distributions appears in (53), with a change of mixing distribution [see also (54)].
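The variance-mixture property can be seen concretely by simulation: the normalised increments $Z_i$ are conditionally $N(0, \bar V_i)$, so in particular $E Z_i^2 = E \bar V_i$. The sketch below uses a hypothetical square-root mean-reverting volatility and a crude Euler scheme with clipping (all tuning values are illustrative, not from the paper):

```python
import math, random

def simulate_sv(a=2.0, b=1.0, c=0.3, delta=0.1, n=10000, m=20, seed=1):
    # Euler sketch of (41)-(42) for dV = a(b - V)dt + c sqrt(V) dW, clipped
    # to (l, r) = (0.2, 3.0) as a crude stand-in for assumption (A4).
    # Returns Z_i = (Y_{i delta} - Y_{(i-1) delta}) / sqrt(delta) together
    # with the averaged volatilities Vbar_i; because the two Gaussian draws
    # below are independent (assumption (A0)), Z_i given Vbar_i = v is N(0, v).
    random.seed(seed)
    l, r = 0.2, 3.0
    h = delta / m
    v = b
    zs, vbars = [], []
    for _ in range(n):
        y, vbar = 0.0, 0.0
        for _ in range(m):
            y += math.sqrt(v * h) * random.gauss(0.0, 1.0)   # dY = sigma dB
            vbar += v * h                                    # accumulate int V ds
            v += a * (b - v) * h + c * math.sqrt(v * h) * random.gauss(0.0, 1.0)
            v = min(max(v, l), r)                            # keep V in (l, r)
        zs.append(y / math.sqrt(delta))
        vbars.append(vbar / delta)
    return zs, vbars
```

With these parameters the empirical mean of $Z_i^2$ should track the empirical mean of $\bar V_i$, which is close to $b = 1$.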
4.2. Conditional likelihood
Applying Theorem 1 and integrating with respect to the second coordinate $V_\Delta$ of $U_1 = (\bar V_1, V_\Delta)$, we obtain the following proposition:

Proposition 4
Assume (A0)–(A2) and (A5). Then, under $P_\theta$:
(1) The conditional distribution of $\bar V_1$ given the infinite past $\underline{Z}_0 = \underline{z}_0$ admits a density with respect to the Lebesgue measure on $(l, r)$, which we denote by $\tilde p_\theta(\bar v / \underline{z}_0)$.
(2) The conditional distribution of $Z_1$ given $\underline{Z}_0 = \underline{z}_0$ has the density
$$\tilde p(\theta, z_1 / \underline{z}_0) = \int_l^r \tilde p_\theta(\bar v / \underline{z}_0)\, f(z_1 / \bar v)\, d\bar v. \qquad (54)$$
Hence, the conditional likelihood of $(Z_1, \ldots, Z_n)$ given $\underline{Z}_0$ is given by
$$\tilde p_n(\theta, \underline{Z}_n) = \prod_{i=1}^{n} \tilde p(\theta, Z_i / \underline{Z}_{i-1}).$$
Therefore, the distribution given by (54) is a variance mixture of Gaussian distributions, the mixing distribution now being the conditional distribution $\tilde p_\theta(\bar v / \underline{Z}_0)\, d\bar v$ of $\bar V_1$ given $\underline{Z}_0$ [compare with (53)].
In accordance with Definition 2, let us assume that, $P_{\theta_0}$-a.s., the function $\tilde p_\theta(\bar v / \underline{Z}_0)$ is well defined for all $\theta$ and is a probability density on $(l, r)$. We keep the following notations:
$$f(\theta, \underline{Z}_n) = -\log \tilde p(\theta, Z_n / \underline{Z}_{n-1}) \quad \text{and} \quad \tilde P_\theta(dz) = \tilde p(\theta, z / \underline{Z}_0)\, dz.$$
Then, we have:

Proposition 5
Under (A0)–(A5):
(1) $\forall \theta_0, \theta$, $E_{\theta_0} |f(\theta, \underline{Z}_1)| < \infty$.
(2) We have, almost surely under $P_{\theta_0}$,
$$\frac{1}{n}\Big(\log \tilde p_n(\theta_0, \underline{Z}_n) - \log \tilde p_n(\theta, \underline{Z}_n)\Big) \to K(\theta_0, \theta) = E_{\theta_0}\, K(\tilde P_{\theta_0}, \tilde P_\theta);$$
see (38).
Proof of Proposition 5. Using (A4) and the fact that $\tilde p_\theta(\bar v / \underline{Z}_0)$ is a probability density over $(l, r)$, we get
$$\frac{1}{\sqrt{2\pi r}}\, \exp\left(-\frac{z_1^2}{2l}\right) \le \tilde p(\theta, z_1 / \underline{Z}_0) \le \frac{1}{\sqrt{2\pi l}}. \qquad (55)$$
So, for some constant $C$ (independent of $\theta$, involving only the boundaries $l, r$), we have
$$|f(\theta, \underline{Z}_1)| \le C\,(1 + Z_1^2). \qquad (56)$$
By (A3), $E_{\theta_0} Z_1^2 = E_{\theta_0} V_0 < \infty$. Therefore, we get the first part. The second follows from the ergodic theorem. □
So, we have checked (H5). We do not know how to check the identifiability assumption (H6). However, in statistical problems for which the identifiability assumption contains randomness, this assumption can rarely be verified. Hence, if we know that regularity condition (C1) holds, we get that $\theta_n$ converges a.s. to $\theta_0$. Condition (C3) remains to be checked (see, in this respect, our comments after Theorem 2).
To conclude, the above results on the SV models are of a theoretical nature, but they clarify the difficulties of statistical inference in this model and highlight the set of minimal assumptions.
4.3. Mean-reverting hidden diffusion
In the SV models, we cannot have explicit expressions for the conditional densities. Therefore, we must use other functions $f(\theta, \underline{Z}_n)$ to build estimators. To illustrate this, let us consider mean-reverting volatilities, that is, models of the form
$$dV_t = a\,(b - V_t)\, dt + a(V_t)\, dW_t, \qquad (57)$$
where $a > 0$, $b > 0$ and $a(V_t)$ may also depend on unknown parameters. Due to the mean-reverting drift, these models present some special features. In particular, many authors have remarked that the covariance structure of the process $(\bar V_i)$ is simple (see, e.g. Genon-Catalot et al., 2000a; Sørensen, 2000).
Assume that the above hidden diffusion $(V_t)$ satisfies (A1) and (A2), and that $EV_0^2$ is finite.
Then, $E\bar V_1 = EV_0 = b$,
$$E\bar V_1^2 = b^2 + \mathrm{Var}(V_0)\, \frac{2\,(a\Delta - 1 + e^{-a\Delta})}{a^2\Delta^2} \qquad (58)$$
and, for $k \ge 1$,
$$E\bar V_1 \bar V_{k+1} = b^2 + \mathrm{Var}(V_0)\, \frac{(1 - e^{-a\Delta})^2}{a^2\Delta^2}\, e^{-a(k-1)\Delta}. \qquad (59)$$
The previous formulae allow us to compute the covariance function of $(Z_i^2,\ i \ge 1)$.
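The closed forms (58)–(59) can be coded directly; they rest on the exponentially decaying covariance $\mathrm{Cov}(V_0, V_t) = \mathrm{Var}(V_0)\, e^{-at}$ of the mean-reverting diffusion. A minimal sketch (function name is ours):

```python
import math

def vbar_moments(a, b, var_v0, delta):
    # Moments (58)-(59) of the averaged volatility Vbar.
    # Returns E[Vbar_1^2] and a function k -> E[Vbar_1 Vbar_{k+1}], k >= 1.
    x = a * delta
    e = math.exp(-x)
    ev2 = b * b + var_v0 * 2.0 * (x - 1.0 + e) / x**2          # eq. (58)
    def cross(k):
        return b * b + var_v0 * (1.0 - e)**2 / x**2 * math.exp(-a * (k - 1) * delta)
    return ev2, cross                                           # eq. (59)
```

Note that $\mathrm{Var}(\bar V_1) \le \mathrm{Var}(V_0)$ (time-averaging shrinks the variance) and that the cross-moments decay geometrically in $k$ at rate $e^{-a\Delta}$.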
Proposition 6
Assume (A0)–(A2) and that $EV_0^2$ is finite. Then, the process defined for $i \ge 1$ by
$$X_i = Z_{i+1}^2 - b - e^{-a\Delta}\,(Z_i^2 - b) \qquad (60)$$
satisfies, for $j \ge 2$, $\mathrm{Cov}(X_i, X_{i+j}) = 0$. Hence, $((Z_i^2 - b),\ i \ge 1)$ is centred and ARMA(1,1).
Proof of Proposition 6. The process $(Z_i^2,\ i \ge 1)$ is strictly stationary and ergodic. Straightforward computations lead to $EZ_1^2 = b$,
$$\mathrm{Var}(Z_1^2) = 2\, E\bar V_1^2 + \mathrm{Var}(\bar V_1) \qquad (61)$$
and, for $j \ge 1$,
$$\mathrm{Cov}(Z_1^2, Z_{1+j}^2) = \mathrm{Cov}(\bar V_1, \bar V_{1+j}). \qquad (62)$$
Then, the computation of $\mathrm{Cov}(X_i, X_{i+j})$ easily follows from (58)–(60). □

Estimation by the Whittle approximation of the likelihood is, therefore, feasible, as suggested by Barndorff-Nielsen & Shephard (2001). To apply our method, as in the Kalman filter, we can use the linear projection of $Z_1^2 - b$ on its infinite past to build a Gaussian conditional likelihood. To be more specific, let us set $\theta = (a, b, c^2)$ with $c^2 = \mathrm{Var}\, V_0$ and define, under $P_\theta$,
$$c_\theta(0) = \mathrm{Var}_\theta\, X_i, \qquad c_\theta(1) = \mathrm{Cov}_\theta(X_i, X_{i+1}).$$
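Proposition 6 can be checked numerically: assembling the autocovariances of $(Z_i^2)$ from (58)–(62) and forming $X_i$ as in (60), the covariances $\mathrm{Cov}(X_i, X_{i+j})$ vanish for all $j \ge 2$. A sketch (parameter values are illustrative):

```python
import math

def x_covariances(a, b, var_v0, delta, jmax=6):
    # Autocovariances of X_i = (Z_{i+1}^2 - b) - e^{-a delta}(Z_i^2 - b),
    # eq. (60), assembled from:
    #   gamma_Z(0) = Var(Z_1^2) = 2 E[Vbar_1^2] + Var(Vbar_1)   [eq. (61)]
    #   gamma_Z(j) = Cov(Vbar_1, Vbar_{1+j}), j >= 1            [eq. (62)]
    x = a * delta
    phi = math.exp(-x)
    ev2 = b * b + var_v0 * 2.0 * (x - 1.0 + phi) / x**2        # eq. (58)
    def gz(j):
        if j == 0:
            return 3.0 * ev2 - b * b
        return var_v0 * (1.0 - phi)**2 / x**2 * phi**(j - 1)   # eq. (59)
    return [(1.0 + phi**2) * gz(j) - phi * (gz(abs(j - 1)) + gz(j + 1))
            for j in range(jmax)]
```

The returned list is $(c_\theta(0), c_\theta(1), 0, 0, \ldots)$: the MA(1) cut-off at lag 2, with $c_\theta(1) < 0$ as stated in the text.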
Straightforward computations show that $c_\theta(1) < 0$. The $L^2(P_\theta)$-projection of $Z_1^2 - b$ on the linear space spanned by $(Z_{-i}^2 - b,\ i \ge 0)$ has the following form:
$$Z_1^2 - b = \sum_{i \ge 0} a_i(\theta)\,(Z_{-i}^2 - b) + U(\theta, \underline{Z}_1),$$
where, for all $i \ge 0$,
$$E_\theta\big((Z_{-i}^2 - b)\, U(\theta, \underline{Z}_1)\big) = 0 \qquad (63)$$
and the one-step prediction error is
$$\sigma^2(\theta) = E_\theta\big(U^2(\theta, \underline{Z}_1)\big). \qquad (64)$$
The coefficients $(a_i(\theta),\ i \ge 0)$ can be computed using $c_\theta(0)$ and $c_\theta(1)$ as follows. By the canonical representation of $(X_i)$ as an MA(1) process, we have
$$X_i = W_{i+1} - \beta(\theta)\, W_i,$$
where $(W_i)$ is a centred white noise (in the wide sense), the so-called innovation process. It is such that $|\beta(\theta)| < 1$ and
$$\sigma^2(\theta) = \mathrm{Var}_\theta\, W_i.$$
Therefore, the spectral density $f_\theta(\lambda)$ satisfies
$$f_\theta(\lambda) = c_\theta(0) + 2\, c_\theta(1) \cos(\lambda) = \sigma^2(\theta)\,\big(1 + \beta^2(\theta) - 2\beta(\theta)\cos(\lambda)\big).$$
Since, for all $\lambda$, $f_\theta(\lambda) > 0$, we get that $c_\theta^2(0) - 4\, c_\theta^2(1) > 0$. Now, using that $c_\theta(1) < 0$, the following holds:
$$\beta(\theta) = \frac{c_\theta(0) - \big(c_\theta^2(0) - 4\, c_\theta^2(1)\big)^{1/2}}{-2\, c_\theta(1)}, \qquad \sigma^2(\theta) = \frac{-\,c_\theta(1)}{\beta(\theta)}.$$
Then, setting $\alpha(\theta) = \exp(-a\Delta)$ and $v(\theta) = 1 - \beta(\theta)/\alpha(\theta)$,
$$a_i(\theta) = \alpha(\theta)\, v(\theta)\, \big(\alpha(\theta)\,(1 - v(\theta))\big)^i. \qquad (65)$$
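The inversion of $(c_\theta(0), c_\theta(1))$ into $(\beta(\theta), \sigma^2(\theta))$, and the prediction coefficients (65), are elementary to implement; note that (65) simplifies to $a_i(\theta) = (\alpha(\theta) - \beta(\theta))\, \beta(\theta)^i$. A sketch (function names are ours):

```python
import math

def ma1_inversion(c0, c1):
    # Solve c0 = sigma^2 (1 + beta^2) and c1 = -beta sigma^2 with |beta| < 1,
    # using c1 < 0 and c0^2 - 4 c1^2 > 0 (positive spectral density)
    disc = math.sqrt(c0 * c0 - 4.0 * c1 * c1)
    beta = -(c0 - disc) / (2.0 * c1)
    sigma2 = -c1 / beta
    return beta, sigma2

def prediction_coeffs(a, delta, beta, imax):
    # a_i(theta) = alpha v (alpha (1 - v))**i with alpha = e^{-a delta},
    # v = 1 - beta/alpha, eq. (65); algebraically (alpha - beta) beta**i
    alpha = math.exp(-a * delta)
    v = 1.0 - beta / alpha
    return [alpha * v * (alpha * (1.0 - v))**i for i in range(imax)]
```

A round trip $(\beta, \sigma^2) \mapsto (c_0, c_1) \mapsto (\beta, \sigma^2)$ recovers the MA(1) parameters exactly.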
Now, we can define
$$f(\theta, \underline{Z}_1) = \log \sigma^2(\theta) + \frac{1}{\sigma^2(\theta)} \left(Z_1^2 - b - \sum_{i \ge 0} a_i(\theta)\,(Z_{-i}^2 - b)\right)^2.$$
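Putting the pieces together, $f(\theta, \underline{Z}_1)$ can be evaluated on data by truncating the infinite sum to the available past. A self-contained sketch (our own assembly of (58)–(65), not the authors' code):

```python
import math

def gaussian_contrast(theta, delta, z):
    # Empirical mean of f(theta, Z_i) = log sigma^2(theta)
    # + (Z_i^2 - b - sum_k a_k(theta)(Z_{i-1-k}^2 - b))^2 / sigma^2(theta),
    # with the infinite past truncated to the data.  theta = (a, b, c2),
    # c2 = Var(V_0).  All intermediate quantities follow (58)-(65).
    a, b, c2 = theta
    x = a * delta
    phi = math.exp(-x)
    ev2 = b * b + c2 * 2.0 * (x - 1.0 + phi) / x**2              # eq. (58)
    def gz(j):                                                    # (61)-(62)
        if j == 0:
            return 3.0 * ev2 - b * b
        return c2 * (1.0 - phi)**2 / x**2 * phi**(j - 1)
    c0 = (1.0 + phi**2) * gz(0) - 2.0 * phi * gz(1)               # c_theta(0)
    c1 = (1.0 + phi**2) * gz(1) - phi * (gz(0) + gz(2))           # c_theta(1)
    beta = -(c0 - math.sqrt(c0 * c0 - 4.0 * c1 * c1)) / (2.0 * c1)
    sigma2 = -c1 / beta                                           # eq. (64)
    s = [zi * zi - b for zi in z]                                 # Z_i^2 - b
    total = 0.0
    for i in range(1, len(z)):
        pred = sum((phi - beta) * beta**k * s[i - 1 - k] for k in range(i))
        resid = s[i] - pred
        total += math.log(sigma2) + resid * resid / sigma2
    return total / (len(z) - 1)
```

Minimising this function over $\theta$ (e.g. on a grid) gives the conditional likelihood estimator discussed below; the code is only a sketch of the computation, not a statement about its statistical performance.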
Easy computations using (63)–(65) yield that
$$E_{\theta_0}\big(f(\theta, \underline{Z}_1) - f(\theta_0, \underline{Z}_1)\big) = \frac{\sigma^2(\theta_0)}{\sigma^2(\theta)} - 1 - \log\frac{\sigma^2(\theta_0)}{\sigma^2(\theta)} + \frac{(b_0 - b)^2}{\sigma^2(\theta)} \left(\frac{1 - \alpha(\theta)}{1 - \beta(\theta)}\right)^2 + \frac{1}{\sigma^2(\theta)}\, E_{\theta_0}\left(\sum_{i \ge 0} \big(a_i(\theta_0) - a_i(\theta)\big)\,(Z_{-i}^2 - b_0)\right)^2.$$
Hence, all the conditions of Theorem 2 may be checked and $\theta$ can be identified by this method.
5. Concluding remarks
The conditional likelihood method is classical in the field of ARCH-type models. In this paper, we have shown that it can be used for HMMs, and in particular for SV models. The approach is theoretical but highlights the minimal assumptions needed for statistical inference. From this point of view, these assumptions do not require the existence of high-order moments for the hidden Markov process. This is consistent with financial data, which usually exhibit fat-tailed marginals.
In order to illustrate the conditional likelihood method on an explicit example, we revisit the Kalman filter in full detail. SV models with mean-reverting volatility provide another example where the method can be used.
This method may be applied to other classes of models for financial data: models including leverage effects (see, e.g. Tauchen et al., 1996) and complete models with SV (see, e.g. Hobson & Rogers, 1998; Jeantheau, 2002).
6. Acknowledgements
The authors were partially supported by the research training network 'Statistical Methods for Dynamical Stochastic Models' under the Improving Human Potential programme, financed by the Fifth Framework Programme of the European Commission. The authors would like to thank the referees and the associate editor for helpful comments.
References
Barndorff-Nielsen, O. E. & Shephard, N. (2001). Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics (with discussion). J. Roy. Statist. Soc. Ser. B 63, 167–241.
Beran, J. (1995). Maximum likelihood estimation of the differencing parameter for invertible short and long memory autoregressive integrated moving average models. J. Roy. Statist. Soc. Ser. B 57, 659–672.
Bickel, P. J. & Ritov, Y. (1996). Inference in hidden Markov models I: local asymptotic normality in the stationary case. Bernoulli 2, 199–228.
Bickel, P. J., Ritov, Y. & Ryden, T. (1998). Asymptotic normality of the maximum likelihood estimator for general hidden Markov models. Ann. Statist. 26, 1614–1635.
Del Moral, P., Jacod, J. & Protter, Ph. (2001). The Monte-Carlo method for filtering with discrete time observations. Probab. Theory Related Fields 120, 346–368.
Del Moral, P. & Miclo, L. (2000). Branching and interacting particle systems approximation of Feynman–Kac formulae with applications to non-linear filtering. In Séminaire de probabilités XXXIV (eds J. Azéma, M. Émery, M. Ledoux & M. Yor). Lecture Notes in Mathematics, vol. 1729, Springer-Verlag, Berlin, pp. 1–145.
Douc, R. & Matias, C. (2001). Asymptotics of the maximum likelihood estimator for general hidden Markov models. Bernoulli 7, 381–420.
Durbin, J. & Koopman, S. J. (1997). Monte-Carlo maximum likelihood estimation for non-Gaussian state space models. Biometrika 84, 669–684.
Elerian, O., Chib, S. & Shephard, N. (2001). Likelihood inference for discretely observed non-linear diffusions. Econometrica 69, 959–993.
Elie, L. & Jeantheau, T. (1995). Consistance dans les modèles hétéroscédastiques. C. R. Acad. Sci. Paris Sér. I 320, 1255–1258.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50, 987–1007.
Francq, C. & Roussignol, M. (1997). On white noises driven by hidden Markov chains. J. Time Ser. Anal. 18, 553–578.
Gallant, A. R., Hsieh, D. & Tauchen, G. (1997). Estimation of stochastic volatility models with diagnostics. J. Econometrics 81, 159–192.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (1998). Limit theorems for discretely observed stochastic volatility models. Bernoulli 4, 283–303.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (1999). Parameter estimation for discretely observed stochastic volatility models. Bernoulli 5, 855–872.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (2000a). Stochastic volatility models as hidden Markov models and statistical applications. Bernoulli 6, 1051–1079.
Genon-Catalot, V., Jeantheau, T. & Larédo, C. (2000b). Consistency of conditional likelihood estimators for stochastic volatility models. Prépublication Marne-la-Vallée, 03/2000.
Gloter, A. (2000). Estimation des paramètres d'une diffusion cachée. PhD Thesis, Université de Marne-la-Vallée.
Gourieroux, C., Monfort, A. & Renault, E. (1993). Indirect inference. J. Appl. Econometrics 8, S85–S118.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Hobson, D. G. & Rogers, L. C. G. (1998). Complete models with stochastic volatility. Math. Finance 8, 27–48.
Hull, J. & White, A. (1987). The pricing of options on assets with stochastic volatilities. J. Finance 42, 281–300.
Jeantheau, T. (1998). Strong consistency of estimators for multivariate ARCH models. Econometric Theory 14, 70–86.
Jeantheau, T. (2002). A link between complete models with stochastic volatility and ARCH models. Finance Stoch., to appear.
Jensen, J. L. & Petersen, N. V. (1999). Asymptotic normality of the maximum likelihood estimator in state space models. Ann. Statist. 27, 514–535.
Kim, S., Shephard, N. & Chib, S. (1998). Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econom. Stud. 65, 361–394.
Leroux, B. G. (1992). Maximum likelihood estimation for hidden Markov models. Stochastic Process. Appl. 40, 127–143.
Liptser, R. S. & Shiryaev, A. N. (1978). Statistics of random processes II. Springer-Verlag, Berlin.
Pfanzagl, J. (1969). On the measurability and consistency of minimum contrast estimates. Metrika 14, 249–272.
Pitt, M. K. & Shephard, N. (1999). Filtering via simulation: auxiliary particle filters. J. Amer. Statist. Assoc. 94, 590–599.
Sørensen, M. (2000). Prediction-based estimating functions. Econom. J. 3, 121–147.
Tauchen, G., Zhang, H. & Ming, L. (1996). Volume, volatility and leverage. J. Econometrics 74, 177–208.
Received March 2000, in final form July 2002
Catherine Larédo, I.N.R.A., Laboratoire de Biométrie, 78350 Jouy-en-Josas, France.
E-mail: Catherine.Laredo@jouy.inra.fr
Appendix
This appendix is devoted to the proof of Theorem 2, due to Elie & Jeantheau (1995). Recall that we have set $a^+ = \sup(a, 0)$ and $a^- = \inf(a, 0)$, so that $a = a^+ + a^-$. The following proof holds for any strictly stationary and ergodic process under (C0)–(C3).
Proof. First, let us introduce the random variable defined by the equation
$$\theta_n = \arg\inf_\theta F_n(\theta, \underline{Z}_n).$$
Note that $\theta_n$ is not an estimator, since it is a function of the infinite past $(\underline{Z}_n)$. The first part of the proof is devoted to showing that, under (H0)–(H2), $\theta_n$ converges to $\theta_0$ a.s.
By the continuity assumption on $f$, $f(\theta, \rho, \underline{Z}_i)$ is measurable. Moreover, under (C1), $E_{\theta_0}\big(f^-(\theta, \underline{Z}_1)\big) > -\infty$; therefore $F(\theta_0, \theta)$ is well defined, but may be equal to $+\infty$.
For all $\theta \in \Theta$ and $\theta' \in B(\theta, \rho) \cap \Theta$, the function $\theta' \mapsto f(\theta', \underline{Z}_1) - f(\theta, \rho, \underline{Z}_1)$ is non-negative. Using (C1) and the continuity of $f$ with respect to $\theta$, Fatou's lemma implies $\liminf_{\theta' \to \theta} F(\theta_0, \theta') \ge F(\theta_0, \theta)$. Therefore, $F$ is lower semicontinuous in $\theta$.
Applying the monotone convergence theorem to $f^+(\theta, \rho, \underline{Z}_1)$ and the Lebesgue dominated convergence theorem to $f^-(\theta, \rho, \underline{Z}_1)$, we get
$$\lim_{\rho \to 0} E_{\theta_0}\big(f(\theta, \rho, \underline{Z}_1)\big) = E_{\theta_0}\big(f(\theta, \underline{Z}_1)\big) = F(\theta_0, \theta). \qquad (66)$$
Let $\epsilon > 0$ and consider the compact set $K_\epsilon = \Theta \cap \big(B(\theta_0, \epsilon)\big)^c$. By (C2) and the lower semicontinuity of $F$, there exists a real $\eta > 0$ such that
$$\forall \theta \in K_\epsilon, \quad F(\theta_0, \theta) - F(\theta_0, \theta_0) > \eta. \qquad (67)$$
Consider $\theta \in K_\epsilon$. If $F(\theta_0, \theta) < +\infty$, using (66), there exists $\rho(\theta) > 0$ such that
$$0 \le F(\theta_0, \theta) - E_{\theta_0}\big(f(\theta, \rho(\theta), \underline{Z}_1)\big) < \eta/2.$$
Combining the above inequality with (67), we obtain
$$E_{\theta_0}\big(f(\theta, \rho(\theta), \underline{Z}_1)\big) - F(\theta_0, \theta_0) > \eta/2. \qquad (68)$$
If $F(\theta_0, \theta) = +\infty$, since $F(\theta_0, \theta_0)$ is finite, using (66), we can also associate a $\rho(\theta) > 0$ such that (68) holds. So we cover the compact set $K_\epsilon$ with a finite number $L$ of balls, say $\{B(\theta_k, \rho(\theta_k)),\ k = 1, \ldots, L\}$.