A NNALES DE L ’I. H. P., SECTION B
M ARTIN B AXTER
Discounted additive functionals of Markov processes
Annales de l’I. H. P., section B, tome 32, n
o5 (1996), p. 623-644
<http://www.numdam.org/item?id=AIHPB_1996__32_5_623_0>
© Gauthier-Villars, 1996, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
Discounted additive functionals of Markov processes
Martin BAXTER
Statistical Laboratory, University of
Cambridge, Cambridge
CB2 1 SB.Vol. 32, nO 5, 1996, p. 623-644 Probabilités et Statistiques
ABSTRACT. - When
taking
thelong
term time average of theoccupation
measure of a Markov process,
traditionally
the uniform time average has been used. In this paper, we derive results for moregeneral
averageshapes,
such as the
exponential
discount. We obtainlarge-deviation
results and evenstronger density asymptotics
for small rates of discount. Astrong
law anda central limit theorem are also
proved.
R6SUM6 - Les
fonctionnelles
additivesponderees
des processus de Markov. La moyenne along
terme de la mesured’occupation
d’unprocessus Markovien est traditionnellement calculee en utilisant une
moyenne uniforme. Nous etudions ici
quelques
formesplus générales
de moyennes, telles que la
ponderation exponentielle.
Nous obtenons unresultat de
grandes
deviations ainsi que lecomportement asymptotique
dela densite dans le cas d’un taux de
ponderation petit. Enfin,
nous prouvonsune loi forte et un theoreme central limite.
1. INTRODUCTION
Much
study
has been made of the time averages of random processes.Most of this effort has been directed towards the Cesaro average which
weights
timesuniformly
up to a finite horizon. In this paper, we shall deriveAnnales de l ’lnstitut Henri Poincaré - Probabilités et Statistiques - 0246-0203
Vol. 32/96/05/$ 4.00/@ Gauthier-Villars
some results about more
general
averages. The initial motivation was theexponential
discount(we
will call this the Abelaverage),
which appearsfrequently
in the contextsof,
amongothers, systems
control and models of financial markets. Thetechniques developed, however,
extendeasily
toother
shapes
of discount.In this paper we will
push
forward two strands ofinquiry
that have beendeveloped
inproceeding
work. Thegenerality
that we will obtain shouldstrictly
encompass theexisting
material and can be read on its own,though
reference will be made to earlier papers when a
proof
isessentially
thesame as one before.
For
general
results about thelarge-deviations
behaviour of discountedoccupation
times of processes, in Section 2 we will build on Section(a)
of Baxter and Williams[2]
and on Baxter[3]. Previously
we knew that thelarge-deviation property
held for the Abel discounted average of ageneral
finite-state Markov
chain,
and we willfully
extend this tocompletely general
discounts of chains andpartially
extend further to a wide class of Markov processes. We discover thatalthough
thelarge-deviation
ratefunction of the discounted average can be written in terms of that for the
Cesaro,
the rate is often different for a different discount. We also derive results about the smoothness and finiteness of the rate function which areused in the next Section to prove a central limit theorem.
Finally
we shall gobeyond
the limitedapproximation precision
of thelarge-deviation property
andgive
anasymptotic expansion
of thedensity
of the distribution
itself, following
from thedensity
studies of Section(b)
of Baxter and Williams
[2].
Thisagain
is nowperformed
forgeneral
discounted averages of finite-state Markov chains. We also notice a
pattern
in the differentialequations
we workedwith,
andhypothesize
about their full solutions and othergeneralizations.
The main
objects
studied are the Cesaro and Abel averages of a process X. Both are random measures of unit mass on the setS,
the state space of X. The former is defined asFor any
density
m on~0, oo ) ,
that is for rrL inL i ( f~+ ) ,
the Abel averageAa
isThe two main results of the paper are:
Annales de l ’lnstitut Henri Poincaré - Probabilités
THEOREM A. -
Suppose
that X is an irreducible Markov chain on afinite state-space S,
withQ-matrix Q.
Then bothCt
andAx obey
thelarge-deviation principle
with ratefunctions
I and Krespectively.
That isthat
for
F and Grespectively
closed and open subsetsof
M :=Ml (S).
Additionally,
K is related to Iby
theequation
where I * and K* are the
Legendre transforms (convex conjugates) of
Iand
K, satisfying
THEOREM B. -
Additionally, if
m isof
boundedvariation,
then thedensity of Aa
on M under the lawstarting
X ati, f i
can be written aswhere
HK
is the Hessianof
K taken with respect toM, z(x)
is thepositive eigenvector of Q
+ and the residue termr~
goes to 1 in the sense thatTa(x)
dx convergesweakly
to dx onInt(M).
2. LARGE DEVIATIONS
We will work with
general
discountshapes
andpositive
recurrentprocesses. Let X be a stochastic process with
state-space
E and invariantVol. 32, n° 5-1996.
_
distribution 7r. In the Cesaro case we would
expect
some sort ofergodic
theorem such as
for all measurable subsets F of E. Then
Ct
takes values inlV.h (E),
thespace of
probability
measures on E. Wemight
also have alarge-deviation result,
which we can think of for the moment as theslogan
for some rate function
I,
withI( 7r)
= 0. The spaceMi(E)
and thecontinuous bounded functions on
E, Cb(E)
are induality
via thebracket,
(v, v) = IE v(x)v(dx).
A relatedslogan
is that of theLaplace
transformwhere 8 and I are related
by Legendre
transformation(convex conjugation),
in that
Our program will be to
study,
for any discountdensity
m inLi ([~+),
the average
We will show that 7r as A goes to
0,
and that thelarge-deviation principle
holds with rate K whoseLegendre transform ~
isgiven by
theequation
This is
actually
the same as theq-equation
at(1.16)
in Baxter and Williams[2] (with
discount mt =e-t)
and at Theorem C in Baxter[3] (with
discountmt =
(1 - t)’~),
but(5)
is a more natural formulation.Standard
set-up. -
Let X be anergodic Feller-Dynkin
Markov processon a
locally compact
Polish spaceE,
withgenerator
L. We define theAnnales de l ’lnstitut Henri Poincaré - Probabilités
Cesaro average
Ct
and thegeneral
averageAa by (1)
and(4) respectively.
Then
Ct
andA03BB
converge to 03C0, the invariant distribution ofX,
withrespect
to the weak
topology
on11i11(Ej,
thatis,
in the sense of(1).
A sufficientcondition for the former limit is
that,
as in 8.11.2 ofBingham et
al.[5],
~r is alimiting
distribution of the transitionsemigroup (Pt).
The latter limit follows from the formerby
a similarLi-continuity argument
to that which will be used in theproof
of Theorem 1. Deuschel and Stroock[6]
show that under anassumption
of uniformergodicity
thelarge-deviation property
holds forCt
with rate function I defined onMi (E).
That is thatfor F and G
respectively
closed and open subsets ofMl (E).
We learnfrom 4.2.17 of Deuschel and Stroock
[6]
thatwhere 8 and I are convex functions
satisfying (2)
and(3).
Further there are,by
4.2.27 and 4.2.38 of Deuschel and Stroock[6], explicit expressions
for 8 and I as
As in Baxter and Williams
[2]
we shall beparticularly
interested in the case where X is a Markov chain on a finitestate-space
S withQ-
matrix
Q.
Then8(v)
=sup{Re(z) : z
Espect(Q
+V)},
where V denotesthe
diagonal
matrixdiag(v)
andspect(.)
denotesspectrum (here
the setof
eigenvalues).
Thisexpression
for 8 also holds in thegeneral
Markovprocess
setting,
if thegenerator
L isx-symmetric.
We
begin by proving
a result whose firstpart
is similar to one remarkedby
Kifer[9]
in the context of thelarge-deviations
of the averages ofdynamical systems,
but it is the secondpart
which will be more usefulVol. 32, n° 5-1996.
in our further work. In earlier papers we derived a differential
equation by
theself-similarity
of discountshapes
such ase-t,
but it isenough
tostudy
the shifts of the discountalong
thetime-axis,
whichprovides
a usefulone-dimensional
parameterisation.
THEOREM 1. -
Suppose
that X is an FD Markov process on a spaceE,
with generator
L,
and m is anydensity
on~0, oo ),
andfor
x in E and vin
Cb(E)
the limitb(v)
=lim x A log ~x
exp ds existsuniformly
in x on E.
If
wedefine
~pby
where
Bt
is theshift
operatorBt f (s)
=f (t
+s),
thenFurther, ~p(~, t, ~, v)
is in the domainof
Land~p(x, ~, a, v)
isdifferentiable
and _
Proof of
Theorem 1. - We prove the firstpart using continuity arguments.
If we define
then sup x
~H(~, x, a) - 8(av)1 [
goes to 0 as A does. We startby proving
the
general
average limit for m of the formwhere
{(ai, bi)}
aredisjoint
intervals inI~+
and ci > 0. Setand define
~Z ( x ) .
Then for Asufficiently
small[
Euniformly
in x. ThusAnnales de l’lnstitut Henri Poincaré - Probabilités
and so we have the
right
upper bound.Similarly
we have the lower bound.Let us now define :=
La(m) .
:=03BB log E exp I03BB (zn),
andJ(m)
:=~~0 6(mtv)
dt. Then[
is bounded
m2~1 uniformly
in w, andhence
has the same bound. Because
|03B4(v1) - 03B4(v2)|
thesame bound also dominates
.I~77L2~I.
ThisLi-continuity
and(careful) application
of monotone class theorems let usgeneralize firstly
toall bounded m of
compact support
and then to all m inFor the differential
equation,
we needonly apply
theFeynman-Kac
formula to the
space-time process 5$
:=(Xt, Tt), where Tt
= To+ t,
whichhas
generator
L +dt. Taking A
= 1 forsimplicity,
for v inCb(E),
wedefine vy on
Cb(E
xR+) by
:=rrztv(x),
andWithout loss of
generality
we can assume that v isnon-negative,
becauseif
(11)
and(12)
hold for some v, thenthey
hold for all vectors of the form v +al,
where 1 is the constant vector( 1,1, ... ,1 ) .
Thisshifting identity
follows from the fact that8(v
+a1)
=8(v)
+ a, aproperty
which17 inherits. Then the
semigroup
P~’ definedby
has
generator
Lv :- as seenin,
forexample,
III.39 of Williams[12].
Then if we sett))
:=t,1, v),
which is continuous int,
we have that
_ =
(16)
Thus
I)cp
=0, implying
that ~p is in the domain of Lv and is annihilatedby
it. Theequation Lv 03C6
= 0 isexactly (12).
DWe note that in the case of X a standard Brownian motion and the
exponential
discount mt =e-t
andv(x)
=I(x
>0),
then(12)
isequation (3.5)
of Baxter and Williams[ 1 ] .
COROLLARY 2. -
Suppose
that X is an irreducible Markov chain on afinite state-space S,
withQ-matrix Q,
and m is anydensity
on~0, oo ),
andA~,
isdefined by (4),
then thelarge-deviation property analogue of (6)
holdsfor A~,
with ratefunction K,
Vol. 32, n° 5-1996.
for F
and Grespectively
closed and open subsetsof
~. The ratefunction
K relates to the r~
of (11 ) through
thefollowing equations:
where M :=
Mi(9) == {(xi)ini=1 : 03A3ixi
=1, x > 0}.
Proof of Corollary
2. - We are in the context of Theorem 1 because Xwill
satisfy
condition(U)
of 4.2.7 of Deuschel and Stroock[6],
which issufficient for the limit 6 to exist as
required by
the theorem. Thelarge-
deviation
property
and(18)
come from theorem II.2 of Ellis[7].
In hislanguage, t
is our v,Y,t
is ourc~ (~)
is andc(-)
is our
r~(~).
As r~ is defined and differentiable on the whole of it meetsEllis’
"steep" hypothesis.
From(11), r~
inherits the(strict) convexity
anddifferentiability
of6,
whichgives (19).
DWe
complete
this Section with apair
of results about thelarge-deviation
rate function K. The former of these is in the
spirit
ofProposition
D ofBaxter and Williams
[2]
and identifies thepoints
where the various suprema inLegendre
transforms(18)
and(19)
are achieved. This leads to centrallimit results and the
major
result of the next Section.PROPOSITION 3. - Under the conditions
of Corollary 2,
K isfinite,
twicedifferentiable
andstrictly
convex onInt(M),
and the supremumof (18)
isattained
uniquely (up
tomultiples of 1)
at v = and the supremumof (19)
is attaineduniquely
Proof of Proposition
3. - It is immediate from its definition that-~ 1 as - oo with v > 0. But as also
~6(v)~ ~
the Dominated
Convergence
theoremgives
us that goes to 1as well. Take x E
Int(M)
and suppose there exists a sequence of vectors(vn)
such thatWithout loss of
generality
we canreplace (vn ) by (vn - (minivn(i))1),
because +
a1)
== + a, and thus assume that the(vn)
arepositive,
with at least one zero co-ordinate. The sequence must still
get infinitely
large,
butwhich is
large
andnegative
forlarge
n,contradicting
oursupposition
ofK(x)
= oo.As remarked in
Corollary 2, r~
inherits the smoothness and the strictconvexity
on1-~
of b. Itscontinuity
means that the supremum must be attained at some finitepoint v(x),
and theconvexity gives
theuniqueness.
The
differentiability
shows that themaximizing v
will be the solution of= x. We can
expand
v around an x asv(x
+E)
=v(x)
+H;lE
to see that
Thus K is twice
differentiable, v(x)
is amultiple
of1,
and K islocally (and
henceglobally) strictly
convex.(Technical
note: we areregarding
the Hessian of ri, as anautomorphism
of11.) By
the abovex = is a solution of
(19),
and the strictconvexity
of K showsit is
unique.
DIn the
simple example
studied in Section(d)
of Baxter and Williams[2],
the rate function was calculated
exactly
asK(x)
whichis infinite on the
boundary
ofM,
whilst the Cesaro rate function I is finiteeverywhere.
Note that we can see that I is finite in thegeneral Corollary
2situation
by considering equation (9).
We shall have further remarks about thisexample
in the nextSection,
but for the moment we derive a necessaryand sufficient condition for K to be
everywhere
finite or infinite.PROPOSITION 4. - Under the conditions
of Corollary 2,
the ratefunction
K is either
everywhere finite
oreverywhere infinite
on theboundary of
.Haccording
as to whether thesupport of
the discountfunction
m isof finite
or
infinite (Lebesgue) length.
Proof of Proposition
4. -Firstly
let us defineV+
to be the space of elements of( U~ + ) n
which have at least one zerocomponent.
We note thatAV+
=V+
for anypositive A,
which is a feature we shall use later. For x inInt(M),
we take vx to be theunique
choice inV+
of the vin
Proposition
3. In fact thepair (VK, represents
ahomeomorphism
between
Int(M)
andV+.
Then x soby taking
thegradient
of(11),
we can write x asThen because
Vol. 32, nO 5-1996.
As vx
is theoptimal v
in(18),
we can expressK(.x)
asThus,
for an upperbound,
and so K is bounded on all of M if the
support
of m,supp(m),
iscompact.
The rate function I is
only
0 in M at 7r, and ~bonly
takes the value 7r inV+
at 0. Thus from(22)
we have the lower boundNow as x tends towards
9M,
theboundary
ofM,
the vector vx tends toinfinity
inY+.
So if m has unboundedsupport,
thenK (x)
tends toinfinity
as x tends to The
intuition,
of course, is that X can withpositive probability
avoidhitting
a certain state for all times in a finitelength
setbut not for all times in an infinite
length
set. D3. MORE EXACT RESULTS FOR MARKOV CHAINS Our aim is to obtain a
sharper
version of(11)
for finite Markovchains,
and then to derive more terms of the
asymptotic expansion
of thedensity
of
Aa.
The initial case studied in Baxter and Williams
[2]
was of asymmetrizable (reversible)
Markov chain and a smooth discountdensity
m. It turns out thatm need
only
be of bounded variation(see below),
but for technical ease weshall
give
theproof
first in the case where m is alsoabsolutely
continuous.More
interestingly,
thesymmetrizability
is seen now to haveonly
beenneeded to make one of the
eigenvalues
ofQ
real and itscorresponding eigenvector orthogonal
to the others. This in facthappens automatically
because every
(non-diagonal)
element ofQ
isnon-negative (we
say thatQ
isessentially non-negative).
Thefollowing
theorem collects all the facts aboutnon-negative
matrices that we will need.THEOREM 5. - Let R be an
essentially non-negative
n x n matrix. Let b be itsprincipal eigenvalue (the
one with greatest realpart).
Then b isitself real,
and its
corresponding eigenvector
isnon-negative
and no other ispositive.
Annales de l’Institut Henri Poincaré - Probabilités
If,
inaddition,
R is irreducible(in
the stochasticsense),
then b issimple,
itseigenvector
isstrictly positive
and no other isnon-negative,
and there existsa real
diagonal
matrix F withpositive
elements such that S : = 03B4I -F-1
RF has asimple eigenvalue
zero, with anorthogonal eigenprojection P,
andthat, for
somepositive
a(Where ( , ) and ]] ]]
are the standard innerproduct
and its norm onRn,
and an
orthogonal projection
P satisfiesPT - p2 - P.)
Proof of
Theorem 5. - For the firstparts
see the Perron-Frobenius theoremin,
forexample,
theorem 1.5 of Seneta[11] or
theorems 1.7.5 and 1.7.10 of Kato[8].
For the existence of F see Theorem B of Baxter[3],
which itself isadapted
from theorem 1.7.13 of Kato[8].
DWe recall that the variation of a measurable function x :
[0, oo )
~ Ron
[a, b]
is defined aswhere the supremum is taken over all
partitions:
a =to tl
...tn
= b of[a, b~ .
We say that x is of finite variation(FV)
ift)
is finite for allt,
and that x is of bounded variation(BV)
ifYx (0, ~)
:-limt~~ Yx (o, t)
is finite. An
absolutely
continuous BV function is thepartial integral
of afunction in
L 1 ( 0, oo ) . (My
thanks to James Norris forcorrecting
aprevious
mis-statement
here. )
We can now
begin by strengthening ( 11 ):
THEOREM 6. - Let X be an irreducible continuous-time Markov chain
on a
finite
setS,
withQ-matrix Q.
Let m be anon-negative absolutely
continuous
density
on~0, oo ) of
bounded variation.Then, if
X starts in statei and v is
where is as in
(11 )
andw(v)
is thepositive eigenvector of Q
+ V ando( 1 )
tends to 0locally uniformly
in v as 03BB goes to 0.Proof of
Theorem 6. - The chain has an invariant distribution ~-, but wedo not need to assume that
Q
is03C0-symmetric.
We will aim toget
a uniform bound for all v in somecompact
subsetVK
of and for a fixed m suchVol. 32, n° 5-1996.
that
Vm (0, Kv .
Since Theorem 1gives
us theasymptotic exponential
size of p, it is sensible to discount it
by
the same,by defining
Then ~
satisfies the vector differentialequation
transformed from(12)
where
R(v)
is6(v)I - (Q
+V)
which has asimple eigenvalue
at0,
and all its othereigenvalues
havepositive
realpart.
From Theorem5,
there exists a realdiagonal
matrixF(v)
withpositive elements,
such thatS(v)
:==F-1(v)R(v)F(v)
has anorthogonal eigenprojection P(v)
ontothe space
spanned by
thestrictly positive eigenvector y(v) corresponding
to the
eigenvalue
zero. Further there exists apositive a(v)
such that(23)
holds,
that iswhere ( , ) and ] ] ]
are the standard innerproduct
and its norm onIRS.
Kato[8],
orotherwise,
tells us thatR, S, F, P, y
and a are smooth in v with bounded derivatives onVK .
Let ~o : -infvEVK a(v),
which ispositive.
Wenow fix v,
although
our bounds will still beuniform,
and writeRa
forR(av),
and so on. We can choose the normalisation of Funiquely
suchthat
Fo
= and =0,
andby choosing
= 1we ensure that = 0 and ~/o =
As in Baxter
[3],
wechange
basesappropriately by defining
The differential
equation (27)
now becomeswhere
~Ia
:_ Thenby taking
the innerproduct
of(30) with x
we can
produce
a differentialinequality
in the norm of ~,where
Ki :=
the supremum taken over the range cx E and v EV K.
Whence we deducethat x
isuniformly
bounded in t and Aby KX : - exp(Ki Kv ) .
Now wesplit x
upaccording
to thedecomposition
I =
P~
C(I - Pa),
and definex- (t)
:=(I - Pr,.z(t))xt.
We differentiate~_,
using (30),
toget
Taking
the innerproduct
of this with x-itself,
we derive theinequality
where
K2
:= sup with the supremum taken over the same range asKi .
Which we canintegrate
toget
the upper boundAnd so we see that
x_ (t, ~)
tends to 0 as a tends to 0 for all finitet,
though
note that the convergence is notnecessarily
uniform in t.Finally
weconsider the
component of x
in the ymdirection, A)
:_(x(t, A),
which is
governed by
the differentialequation
obtained from(30)
where we used the fact that
PJy
=Py’
= 0. Thenwhich, by
the DominatedConvergence theorem,
tends to 0uniformly
int as A goes to 0. So
and
F(v)y(v)
=w(v),
wherew(v)
is thepositive eigenvector
ofQ
+V,
with the normalisation that
w(0)
= 1 and isorthogonal
to thepositive eigenvector
ofQT
+ In the case whereQ
is03C0-symmetrizable
= then the normalisation condition becomes =
1,
where D
The next theorem removes the restriction that m need be
continuous,
buttakes us into the technicalities of FV functions. The casual reader can pass this
by
withoutdisadvantage.
Vol. 32, n° 5-1996.
An FV function can be written as the difference of two
increasing functions,
that iswhere
And so x has
only countably
many discontinuities(though they
may even bedense),
and thus can be taken to be an R-function(right-continuous
withleft
limits).
We shall take all our functions to be R-functions. Weadapt
the calculus from the left-continuous
integrands
of V.18 ofRogers
andWilliams
[10] (changing
somesigns)
togive
the formulae(Decomposition)
(Integration by parts)
(Ito’ s formula)
where x
and y are
FV andf
isC1, Axt
is xt - and x~ and xa denote the continuous andpurely
discontinuousparts
of xrespectively.
Thereis an
expression
for xa as Asx+
and x- areincreasing they
inducepositive
a-finiteLebesgue-Stieltjes
measures on(0,00),
viax+ (a, b~
=xt.
So we can associate x with the(signed)
measureof their difference. We write
d~t
=dxt - dxt:.
We will also use thenotation |dxt|
fordxt
+dxt
= The differentialexpressions
aboveare
symbolic, being merely
shorthand forintegral expressions.
We will also use an FV
exponential
result in that if x is BV andAnother useful result follows from
integration by parts,
in thatTHEOUM 7. - Theorem 6 remains true
if m
is a discontinuousnon-negative
density of
bounded variation.Proof of
Theorem 7. - We follow theproof
of Theorem 6exactly
downto
(29), except
that we takeR, F,
S and P to be functions of t rather thanv. We write G : =
F -1, w : - Fy
and w * : -G y,
and take the normalisation that = 1( ~ (w, w*)
=1)
and(w, dw*~
= 0( 4=~ (dw,
=0).
Note that all these functions are BV. Then
(30)
becomesSo
by (39)
and(38),
for some constantKi . Hence ~xt~
isuniformly
boundedin t
by
some constantKx.
Nowusing (40)
and(39)
weagain
work with thecomponents of x orthogonal to y,
for some constant
K2.
So a result of the same form as(34)
holds.Finally
we find that
and we have a’ bound similar to that of
(36),
because(w, dw*)
= 0.Explicitly,
wt is thepositive eigenvector
ofQ + mt
Tl with the normalisation that = 1 anddwt
isorthogonal
to thepositive eigenvector
ofQT -f- mt_ Y
DWe can calculate an exact
expression
for w.LEMMA 8. - Let
y(v)
be thepositive eigenvector of Q
+ Vof
constantnorm, with = 1.
If w°
:- andwt
is thepositive eigenvector of QT
+mt V satisfying
=1,
thenFurther wt =
Wt ( v, rrt)
is continuous in v.Proof of
Lemma 8. - If we set wt = thenAn
application
of(38) gives
theexpression
for w.Elementary perturbation results,
in forexample
Kato[7],
tell us that y is smooth in v and soThus the difference can be written as
Hence
where
Kl := supKv~~y(V)~
+vx supKv ~v~ u~
and some constant
KZ.
Hence wt is(Lipschitz)
continuous in v. D THEOREM 9. - Let X be an irreducible continuous-time Markov chain on afinite
setS,
withQ-matrix Q.
Let m be anon-negative density
onof
bounded variation. Then where is thedensity of Aa
on M under thelaw
starting
X ati,
the( f ~)
can be written aswhere K is as
defined by (18), HK
denotes its Hessian taken with respect toM, z(x)
is thepositive eigenvector of Q
+ and the residue termr03BB
goes to 1 as 03BB goes to0,
in the sense thatfor
all x inInt (M), and for
F and Grespectively
closed and open bounded subsetsof 11.
Notes. -
( 1 )
We take the Hessianregarding
K as a function on an open subset of that isK(xl, ... , xn-1,1 - ~i 1 xi).
See theexample
at the end of this paper.
(2) Unfortunately
we wouldreally
like to prove the result thatfor suitable
H,
as A goes to 0. This could beproved
if theintegrand
in ourcontrol of
r~
was + rather than +Annales de l ’lnstitut Henri Poincaré - Probabilités
Proof of
Theorem 9. - Theorem 6 can be taken assaying
that for v inI~s,
is
(asymptotically)
adensity
onM,
wherewi(v)
is thewo(v,m)(i)
of Lemma
8,
and our will be IfA~
underPi
hasthe law then we can derive a central limit result
by considering Za
:=(AX -
We see that for u in~s,
using
the localuniformity
in v of the convergence ofo(1). As ~
inheritsthe smoothness of
8,
we canexpand
it about v asand hence deduce that
In other words
We can think of as the distribution of
Aa
conditioned in some way to converge to but we do not make this formal.Proposition
3provides
the
interpretation
of as themaximizing x
in theLegendre
transform.Recall that a sequence of laws
(vn)
on a Polish space E converges toa law v with
respect
to the weaktopology
onMi(E)
if(v,
-(v, v)
for all v in
Cb(E). Billingsley [4], 2.1,
shows that this isequivalent
toeach of the
following
and
Vol. 32, nO 5-1996.
Setting x =
we recall fromProposition
3 thatVK(x)
is v, up to amultiple
of 1. Theasymptotics
of thedensity
ofZX
aregiven by
because
The normal distribution
N(0; H~(v))
itself hasdensity
as
Proposition
3 tells us thatHK
= on M.By
Lemma 1.45.1 ofWilliams
[12],
if H is bounded and = 0 thenHence
by
theequivalence
of the aboveexpressions
for weak convergence, the result isproved.
DThe
following Corollary
is intended in the way of aremark,
and was theoriginal
statement of Theorem9,
but is now seen to beweaker, although perhaps
a more natural formulation.COROLLARY 10. - Under the conditions
of
Theorem9,
for
F closed inInt(M)
and G open inInt(M).
In otherwords, r~’.~ (x)
dxconverges
weakly
to dx onInt(M).
Proof of Corollary
10. - Take G open inInt(M),
b small andpositive
with
Gs : _ ~ y
E G :B(y, b )
CG) ,
and B a ball around0,
thenby
Fatou’ slemma and Fubini’ s theorem
Letting 6
tend to0,
we have one of our bounds. For some F closed inInt(M),
we needIB r~(x
+dy
to beuniformly
bounded on F andfor A near 0. It
is,
and the bound isWorking
with this F and withFa
:={y
e M :d(,y, F) b},
we canshow in a similar way that
and hence we are home. D Some remarks.
(a)
Were ther;
to beequicontinuous (or
some suchcondition)
we wouldhave that
r; (x)
-~ 1 for all x and hence thatf ~ (~) / f~ (~)
-~ zi(x) (x)
and differs from
moV’K(x) only by
amultiple
of1,
as in Section(c)
of Baxter and Williams[2],
where the choice ofVK(x)
inker(6)
was calledg(x).
(b)
Note that theproof
of Theorem 9gives
us a central limit theorem forAa
asTaking
aTaylor expansion
of 6 about 0 andintegrating
we discover thatH~ (o) _ ~2 Hs (o),
which is finite because m is in bothLi
andExample. - (This
case was first studied in Section(d)
of Baxter and Williams[2].) Suppose
we have a Markov chain which issymmetric
and
space-homogeneous,
withQ-matrix
7rj - where 7r is a distribution on a finite set S. The Cesarolarge-deviation
rate function isI(x)
= 1 -(~
and theexponentially
discountedlarge-deviation
rate is
I~(x) _ ~
~i We found then thatb(v)
is theunique
root 8 in
( maxi ( vi - 1 ) , oo )
ofVol. 32, n° 5-1996.
and
that 1J
isgiven by
=6(v) - ~i
~rilog(6(v)
+ 1 -vi).
We findnow that
Here we chose and to be in the kernel of 8. The distribution of
Aa
can be calculatedexplicitly
to be a multidimensional~3-distribution
with
density
Note that the Hessian of K on M is not the same as that derived from the extension of K to
R~,
butby using
any of thefollowing
co-ordinate schemes:where
or
where
What is
happening
here is that our choice of basis forevaluating
the Hessiancorresponds
to our choice of basis forintegrating
which was made back atthe start of Section
(b)
of Baxter and Williams[2].
TheKo representation projects
onto M and adds astrictly
convex term which isperpendicular
toM. This
representation
is morenatural, though
cumbersome to calculatewith,
and can be shownequivalent
to any of the othersby verifying
thatthe
change
of basis matrix has determinant one. Thus the Hessian(in
theKn realisation)
and its determinant aregiven by
The normalisation of the
eigenvector
is that =1,
so it isgiven by
(The corresponding
vector for the Cesaro case is Wecan now calculate the residual functions
using Stirling’s
formulawhere
K/x
as x - oo for some constant K. It is thus discovered that the residual functionsr; (x)
can be calculated and arefound to be
independent
of both i and x, and are of size 1 +0(A).
Hypothesis
11. - We recall from Theorem C of Baxter and Williams[2]
that in the
set-up
of Theorem 9 with theexponential
discount(mt
=e-t),
the
density f ~
satisfies the vector differentialequation
where £ is the matrix differential
operator
£ =diag ( ~~
-Here we have
changed
the domain off ~‘
from a subset ofequivalent
to
M,
to aneighbourhood
of M in Rnby
extension. Theoperator
£ is invariant to the extension chosen. If we discountf ~‘ by
the knownlarge-deviation
rate functionK,
that isby defining ga by
then
This compares with
equation (27)
which said thatwhere ~
is the discount of cp as definedby (26).
The matrixR(v)
hasa
simple eigenvalue
0 and all othereigenvalues
havepositive
realpart.
We saw
that ~
tended to amultiple
of the0-eigenvector
ofR(mtv)
as Awent to
0,
and also thatg~
tended(in
somesense)
toz(x),
which was the0-eigenvector
of We can formulate ananalogue
of(52)
for thegeneral
discount case as follows.Vol. 32, n° 5-1996.
Let us write
Ax,t
for03BBM-1t~~003B8tm(03BBs)03B4xs ds,
whereMt
:=ds, and f03BB,ti
for thedensity
ofAx,t
if X starts in state z . Thenwill
satisfy
thelarge-deviation property
with rate functionKt,
whereand we write
f ~‘ ~ t
asThen
Again
Theorem 9 tells us thatg~‘~t
tends(in
somesense)
to the0-eigenvector,
z, of the matrix R. We
hypothesize
that the convergence is in factpointwise.
REFERENCES
[1] M. W. BAXTER and D. WILLIAMS, Symmetry characterizations of certain distributions, 1, Math. Proc. Cambridge Philos. Soc., Vol. 111, 1992, pp. 387-397.
[2] M. W. BAXTER and D. WILLIAMS, Symmetry characterizations of certain distributions, 2:
Discounted additive functionals and large deviations, Math. Proc. Cambridge Philos.
Soc., Vol. 112, 1992, pp. 599-611.
[3] M. W. BAXTER, Symmetry characterizations of certain distributions, 3: Discounted additive functionals and large deviations for a general finite-state Markov chain, Math. Proc.
Cambridge Philos. Soc., Vol. 113, 1993, pp. 381-386.
[4] P. BILLINGSLEY, Convergence of Probability Measures, Wiley, 1968.
[5] N. H. BINGHAM, C. M. GOLDIE and J. L. TEUGELS, Regular Variation, Encyclopedia of
Mathematics and its Applications, Vol. 27, Cambridge University Press, 1987.
[6] J.-D. DEUSCHEL and D. W. STROOCK, Large deviations, Academic Press, 1989.
[7] R. S. ELLIS, Large deviations for a general class of random vectors, Ann. Probab., Vol. 12, 1984, pp. 1-12.
[8] T. KATO, A Short Introduction to Perturbation Theory for Linear Operators, Springer- Verlag, 1982.
[9] Y. KIFER, Averaging in Dynamical Systems and Large Deviations, Inventiones Math., Vol. 110, 1992, pp. 337-370.
[10] L. C. G. ROGERS and D. WILLIAMS, Diffusions, Markov Processes and Martingales, Vol. 2:
Itô calculus, Wiley, 1987.
[11] E. SENETA, Non-negative Matrices, an Introduction to Theory and Applications, Allen and Unwin, 1973.
[12] D. WILLIAMS, Diffusions, Markov Processes and Martingales, Vol. 1: Foundations, Wiley,
1979.
(Manuscript accepted November 7, 1995. )