HAL Id: hal-00940907
https://hal-enac.archives-ouvertes.fr/hal-00940907
Submitted on 1 Apr 2014
HAL
is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire
HAL, estdestinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Pascal Lezaud
To cite this version:
Pascal Lezaud. Chernoff-type bound for finite Markov chains. Annals of Applied Probability, Institute
of Mathematical Statistics (IMS), 1998, 8 (3), pp 849-867. �hal-00940907�
CHERNOFF-TYPE BOUND FOR FINITE MARKOV CHAINS
By Pascal Lezaud
CENA and Universit´e Paul Sabatier
This paper develops bounds on the distribution function of the empiri- cal mean for irreducible finite-state Markov chains. One approach, explored by Gillman, reduces this problem to bounding the largest eigenvalue of a perturbation of the transition matrix for the Markov chain. By using esti- mates on eigenvalues given in Kato’s bookPerturbation Theory for Linear Operators, we simplify the proof of Gillman and extend it to nonreversible finite-state Markov chains and continuous time. We also set out another method, directly applicable to some general ergodic Markov kernels having a spectral gap.
1. Introduction. LetXnbe an irreducible Markov chain on a finite set Gwith transition matrixPand stationary distributionπ:Then, for any func- tion f and any initial distribution q, the weak law of large numbers states that for almost every trajectory of the Markov chain, the empirical mean n−1Pn
i=1fXi converges to πf = P
yπyfy: This result is the basis of the Markov chain simulation method to evaluate the mean of the function f: In this paper, we will quantify this rate of convergence by studying the probability
Pq
n−1
n
X
i=1
fXi −πf≥γ
;
wherePq denotes the probability measure of the chain with the initial distri- bution q; and the size of deviation, γ; is a small number, such as πf/10 or πf/100:
The corresponding rate of convergence for sums of independent random variables has been given by Kolmogorov (1929), Cram´er (1938), Chernoff (1952), Bahadur and Ranga Rao (1960) and Bennett (1962), but the sam- ples in Markov chains are generally correlated with each other, even if this correlation decreases exponentially with the number of steps. In that case, the theory of large deviations gives an asymptotic rate of convergence, since the empirical mean satisfies the large deviation principle with the good rate functionIz =supr∈Rrz−logβ0r;where β0ris the largest eigenvalue (i.e., Perron–Frobenius eigenvalue) of a perturbation of the transition matrix P[see Dembo and Zeitouni (1993)]. This asymptotic result is not satisfactory if one wants to achieve bounds that are useful for fixed n: Using perturba- tion theory, Gillman (1993) estimated the rate of convergence for reversible finite-state Markov chains by bounding the eigenvalueβ0r:He got a bound in terms of the spectral gap of the original transition matrix P; defined
Received December 1996; revised June 1997.
AMS1991subject classification. 60F10.
Key words and phrases. Markov chain, Chernoff bound, eigenvalues, perturbation theory.
849
byεP =1−β1P;whereβ1Pdenotes the second largest eigenvalue ofP:
Using a technique adapted from Rellich, Dinwoodie (1995) improved the bound obtained by Gillman, but only for γsmall. Improvements of such bounds are motivated by their wide use in simulation.
The aim of this work, based on Kato’s perturbation theory, is to obtain a bound which depends on the variance off and which is applicable for allγ:
More precisely, this bound provides a Gaussian behavior for the small values of γ and a Poissonian behavior for the large values. Moreover, the result we achieved can also be easily extended to nonreversible Markov chains and con- tinuous time. For instance, we get in the reversible case the following theorem, proved in Section 3.1, where · 2denotes the l2π-norm.
Theorem 1.1. LetP; πbe an irreducible and reversible Markov chain on a finite setG:LetfxG→Rbe such thatπf=0,f∞≤1and0<f22≤b2: Then, for any initial distributionq;any positive integernand all 0< γ≤1;
Pq
n−1
n
X
i=1
fXi ≥γ
≤eεP/5Nqexp
− nγ2εP
4b21+h5γ/b2
; (1)
whereNq= q/π2and
hx = 12 p
1+x− 1−x/2
: Therefore, ifγ≪b2andεP ≪1;the bound is
1+o1Nqexp
− nγ2εP
4b21+o1
:
Remark 1. For γ > 1; the probability of deviation is zero; thus, we can assume that γ ≤ 1: Moreover, applying Theorem 1.1 to −f gives the same bound on Pqn−1Pn
i=1fXi ≤ −γ: Thus, inequality (1) implies bounds on all moments of the positive random variable n−1Pn
i fXi; and quantifies convergence in eachlpnorm, 1≤p <∞:
Remark 2. hxis an increasing function of x ≥ 0 such thathx ≤ x/2 for allx:Thus, forγ≤2b2/5, we get
Pq
n−1
n
X
i=1
fXi ≥γ
≤eεP/5Nqexp
−nγ2εP
4b2
1− 5γ 2b2
; whereas forγ >2b2/5;
Pq
n−1
n
X
i=1
fXi ≥γ
≤eεP/5Nqexp
−nγεP
10
1−2b2 5γ
:
Remark 3. Obviously, we can takeb2=1 in inequality (1), which gives Pq
n−1
n
X
i=1
fXi ≥γ
≤eεP/5Nqexp−nγ2εP/12:
(2)
The constantNqis thel2π-norm of the density ofqrelated to the station- ary distribution π: This norm can be large but is always bounded by π∗−1/2, whereπ∗ =minπyxy∈G:However, it is worth noting thatNq=1 when the initial distribution is the stationary one. This fact will be used to im- prove the convergence, by splitting the simulation in two steps, as suggested in Aldous (1987). The first step, the initial transient period, may be defined as the period of time which must be discarded before the chain comes close toπ;
whereas the second step, the observation period, may be defined as the period of time where observations must be collected to reach the desired precision.
A short description of the paper is as follows. Section 2 of this paper sets out preliminaries on the perturbation theory of linear operators. The results taken from Kato (1966) are based on a function-theoretical study of the resolvent, in particular on the expression of the eigenprojections as contour integrals of the resolvent. However, the specific results needed here are easier than in Kato (1966), since we restrict our study to the case of operators having an eigenvalue whose algebraic multiplicity is 1.
Section 3.1 proves Theorem 1.1. The bound is extended to nonreversible Markov chains in Section 3.2, by considering the multiplicative symmetriza- tion of the operator P;namely, K = P∗P; where P∗ is the adjoint of P on l2π:Explicitly,
P∗x; y =πyPy; x/πx:
Dinwoodie (1995) also obtained a bound for nonreversible Markov chain, but his bound depended on the random covering time Tdefined by T=infn≥ 1x X0; X1; : : : ; Xn =G:Bounding this covering time can be difficult.
Section 3.3 shows how the previous bounds extend to continuous-time Markov chains, by employing a method suggested by Fill (1991). We thus get an exponential bound in terms of the smallest nonzero eigenvalue of
−3+3∗/2; where3is the generator of the semi-groupPt:
Perturbation theory of linear operators is a fruitful method to achieve ex- plicit bounds on finite Markov chains. Mann (1996) used it to obtain a Berry–
Esseen bound for Markov chains with finite or countably infinite state space, such that
βx=sup
Pg2
g2
<1;
(3)
where the sup is over all complex-valued functions on the state spaceGwith πg=0:Nagaev (1957, 1961) used also the perturbation method for studying some limit theorems for Markov chains under the Doeblin condition. Following the method used in a setting of the finite-state space, we recently have been able to achieve a Chernoff-type bound for general Markov operatorP under the assumptions that 1 is an isolated eigenvalue ofPand the dimension of the eigenspace for the eigenvalue 1 is finite. In particular, these assumptions are fulfilled for Markov chains satisfying the Doeblin condition or the relation (3).
We will publish this result in a forthcoming paper. Instead of presenting these results, we would rather set out another method, directly applicable to a gen-
eral irreducible Markov kernel assuming condition (3). Section 4 introduces this method, due to Bakry and Ledoux, which gives the following inequality forγ≤b2:
Pπ
n−1
n
X
i=1
fXi −πf≥γ
≤exp
−nγ21−β
21b2
:
Note that this last inequality involves β and not εP: For instance, if the state space is finite and the Markov chain is periodic, then β=1:Thus, this method is not applicable in this case, unlike the perturbation method.
Finally, Section 5 shows how the running time of simulation can be im- proved by considering an initial burning period whose effect is to decreaseNq: 2. Perturbation theory of linear operators. LetTbe a linear operator on some finite-dimensional vector spaceX: We denote the spectrum ofT by 6T:The resolvent ofT, defined byRζ = T−ζ−1, is an analytic operator- valued function on the domainC/ \6T;called the resolvent set. Furthermore, the only singularities ofRζare the eigenvaluesλh,h=1; : : : ; s;ofT:
The Laurent series of the resolvent at a simple eigenvalue λh takes the form [Kato (1966), page 38]
Rζ = −ζ−λh−1Ph+
∞
X
n=0
ζ−λhnSn+1h ;
wherePhis the eigenprojection operator for the eigenvalueλhandShis called the reduced resolvent ofTwith respect to the eigenvalueλh: More precisely, Sh is the inverse of the restriction of T−λh in the subspaceI−PhX:We deduce thatPh is the residue of−Rζinλh;and
Ph= − 1 2πi
Z
ŴhRζdζ;
whereŴhis a positively oriented small circle enclosingλh;but excluding other eigenvalues ofT:
Consider a family of operator-valued functions with the form Tχ =T+χT1+χ2T2+ · · · :
Then the resolvent Rζ; χ = Tχ −ζ−1 of Tχ is analytic in the two variablesζ; χin each domain in whichζ is not equal to any of the eigenvalues ofTχ[Kato (1966), page 66]. So it can be expanded into the following power series inχwith coefficients depending onζ:
Rζ; χ =Rζ +
∞
X
n=1
χnRnζ;
(4)
where eachRn is an operator-valued function. This series is uniformly con- vergent if
∞
X
n=1
χnTnRζ<1:
(5)
Letrζbe the value ofχsuch that the left member of (5) is equal to 1:Then (5) is satisfied forχ< rζ:Letλbe one of the eigenvalues ofT=T0with multiplicitym=1;andŴbe a positively oriented circle, in the resolvent set of T;enclosingλbut no other eigenvalues ofT:The series (4) is then uniformly convergent forζ∈Ŵif
χ< r0=min
ζ∈Ŵ rζ:
(6)
In the special case in which X is a Hilbert space and T is normal (i.e., T∗T=TT∗), we get
Rζ =1/distζ; 6T;
r0=min
ζ∈Ŵ
a
distζ; 6T+c −1
for every ζ in the resolvent set of T; where a = T1 and c is such that
Tn ≤acn−1: If we choose asŴthe circleζ−λ =d/2, where d= min
µ∈6T\λλ−µ;
we obtainr0= 2ad−1+c−1:
The existence of the resolventRζ; χ for ζ ∈ Ŵimplies that there are no eigenvalues ofTχonŴ: The operator
Pχ = − 1 2πi
Z
ŴRζ; χdζ (7)
is the eigenprojection for all the eigenvalues ofTχlying insideŴ: In partic- ular [Kato (1966), page 68], for allχsufficiently small, we have
dimPχX=dimPX=1:
Therefore, only the eigenvalue λχ of Tχ lies inside Ŵ; and Pχ is the eigenprojection for this eigenvalue.
As the only eigenvalues ofTχPχare 0 andλχ; we will consider λχ −λ=trTχ −λPχ:
Combining (7) and substitution for Rζ; χ from (4) give the Taylor series expansion [Kato (1966), page 79]
λχ −λ= − 1 2πitrZ
Ŵζ−λRζ; χdζ =
∞
X
n=1
χnλn; where
λn =
n
X
p=1
−1p p
X
ν1+···+νp=n k1+···+kp=p−1
νi≥1; kj≥0
trTν1Sk1· · ·TνpSkp;
(8)
withS0= −P, Sn=Sn:HereSis the reduced resolvent ofTwith respect to the eigenvalueλ:
3. Chernoff-type bound for Markov chains. In this section, we con- sider an irreducible finite Markov chain with transition matrix P and sta- tionary distribution π: This transition matrix defines an operator acting on functions by
Pfx = X
y∈G
Px; yfy:
We are interested in bounding the rate of convergence of Pqn−1tn≥γ; wheretn=
n
X
i=1
fXi −πf:
If the Markov chain is reversible, the operator P is self-adjoint on l2π
endowed with the inner product
f; g = X
x∈G
fxgxπx:
This operator P has largest eigenvalue 1 and because of irreducibility, the constant functions are the only eigenfunctions with eigenvalue 1:These eigen- values are denoted by
β0P =1> β1P ≥β2P ≥ · · · ≥βG−1P ≥ −1;
and the spectral gap by εP =1−β1P:
IfP; π is not reversible, we will consider the multiplicative symmetriza- tionK=P∗PofP:This operatorKis a reversible Markov kernel with same stationary distributionπ:We will assume thatKis irreducible. For instance, this will be the case as soon asPx; x>0 for everyx∈G:
Since we will always work with a self-adjoint operator onl2π, every norm considered will be the l2π-norm. Moreover, except for constant functions without interest, replacing fby f−πf/f−πf∞ shows that there is no loss of generality in assuming
πf=0; f∞≤1; f22≤b2≤1:
3.1. Reversible Markov chains. This section proves Theorem 1.1 stated in the Introduction, by using the method of Gillman (1993). We first prove the following lemma.
Lemma 3.1. Referring to the setting of Theorem 1.1, letr >0:Then, for any positive integern,
Pqtn/n≥γ ≤e−rnγqTPrn1;
wherePr = erfyPx; y:
Proof. By Markov’s inequality,
Pqtn≥nγ ≤e−rnγEqexprtn;
where Eq denotes the expectation given the initial distribution q: Observe that
Eqexprtn = X
x0; x1;:::;xn
exprtnqx0
n
Y
i=1
Pxi−1; xi;
where the summation is evaluated over all possible trajectoriesx0; x1; : : : ; xn: By introducing the operatorPr = erfyPx; y;we obtain the expression
Eqexprtn =qTPrn1:
Lemma 3.2. Letr >0 andNq = q/π2:Then qTPrn1≤erNqβn0r;
whereβ0Pris the largest eigenvalue ofPr.
Proof. Let us recall that Pis a self-adjoint operator in l2π;and intro- duce the diagonal matrixEr=diagerfy; so thatPr =PEr:ThenPr = pE−1r p
ErPp Erp
Eris similar to the self-adjoint matrix Sr =p ErPp
Er; so its eigenvalues are real values forr≥0:In this case, the Perron–Frobenius theorem says thatSr2→2=β0Pr;whereβ0Pris the largest eigen- value ofPr: Applying the Cauchy–Schwarz inequality inl2πgives
qTPrn1= q
π; Prn1
π
≤Nq12Er/22→2E−r/22→2Srn2→2
≤erNqβn0Pr:
Thus, Lemma 3.2 gives the inequality
Pqtn/n≥γ ≤erNqexp−nrγ−logβ0Pr:
(9)
LetD=diagfx; then we can write Pr =P+
∞
X
i=1
ri i!PDi:
Since P ≤ Pr ≤ erP; we get 1 ≤ β0Pr ≤ er [see Kato (1966), Theo- rem I-6.44]. It follows that β0Pr is the perturbation of the eigenvalue 1 of P: The irreducibility of P implies that 1 is a simple eigenvalue and the eigenprojection for the eigenvalue 1 is the operatorπ defined by
πfx x=X
x
fxπx:
Thus (8) can be used to obtain the coefficients βn in the Taylor series of β0Pr −1;so long asr < r0, wherer0is the convergence radius defined by
(6). We obtain βn=
n
X
p=1
−1p p
X
ν1+···+νp=n k1+···+kp=p−1
νi≥1; kj≥0
1
ν1!: : : νp!fν1; Sk1PDν2· · ·Skp−1Pfνp;
(10)
where we used the following relation whatever the matrixM:
trπPDν1MDνp =trπDν1MDνp = fν1; Mfνpπ: For instance,
β1= f;1π =0; β2= f;−Sfπ−12f; fπ:
An explicit calculation gives β2 = limn→∞Eπt2n/n; which is precisely the asymptotic variance oftn:
The number of terms in the formula (10) is Cn x=
n
X
p=1
n−1 p−1
2p−1
p−1 1
p: Furthermore, as1/n! ≤21−nand 2kk
<22kkπ−1/2, we obtain the following inequalities forn≥3:
Cn<1+π−1/2
n−1
X
p=1
n−1 p
1 p+14p≤
25 4n
5n−2:
Thus, forn≥7;we getCn ≤5n−2;but direct computations show this bound is also valid forn≥3:Finally, the Cauchy–Schwarz inequality gives
βn≤ b2/55/εPn−1; n≥2:
Hence, provided thatr < εP/5;we obtain β0Pr ≤1+ b2
εPr2
1− 5r εP
−1
: (11)
Combining inequalities (9) with (11) and log1+x ≤xgives, forr < εP/5;
Pqtn/n≥γ ≤eεP/5Nqexp−nQr;
whereQr =γr− b2/εP1−5r/εP−1r2is maximized when
r= γεP
b21+5γ/b2+p
1+5γ/b2:
We can check that r < εP/5; and simple computations yield inequality (1). ✷
3.2. Nonreversible Markov chains. This section extends the previous bound to nonreversible Markov chains. We now consider the multiplicative sym- metrization of P; namely, K = P∗P: Set Kr = P∗rPr and denote its largest eigenvalue by β0r: As in the previous section, Markov’s inequality gives
Pqtn/n≥γ ≤e−rnγqTPrn1
≤e−rnγNq12Prn2→2
=e−rnγNqPr∗nPrn1/22→2
=e−rnγNqq
β0; nr;
where β0; nr denotes the largest eigenvalue of Pr∗nPrn: Result anal- ogous to Lemma 3.2 requires the use of Marcus’ Theorem [see Marshall and Olkin (1979), Theorem 9.H.2.a]. This theorem states that β0; nr ≤ β0rn; whence the following inequality:
Pqtn/n≥γ ≤Nqβn/20 re−rnγ:
Here again, the operatorKr will be considered as a perturbation of K;
since we have
Kr =K+
∞
X
i=1
ri i
X
j=0
1
j!i−j!DjKDi−j
:
In the remainder of this section, we assume thatKis irreducible. From here, the arguments are exactly the same as the ones used in the proof of Theo- rem 1.1. The coefficientsβn in the Taylor series ofβ0r −1 are bounded by
βn≤2b22/εKn−1Cn ≤ 2/5b210/εKn−1: Therefore, provided thatr < εK/10;we obtain
β0r ≤1+ 4b2 εKr2
1− 10r εK
−1
y hence,
Pqtn/n≥γ ≤Nqexp
−n
rγ− 4b2
εKr21− 10r/εK−1
: Optimizing inr < εK/10;we may therefore state the following.
Theorem 3.3. Let P; π be an irreducible Markov chain on a finite set G; such that its multiplicative symmetrization K = P∗P is irreducible. Let fx G→ Rbe such that πf=0, f∞ ≤ 1 and 0< f22 ≤ b2: Then, for any initial distributionq;any positive integernand all 0< γ≤1;
Pqtn/n≥γ ≤Nqexp
− nγ2εK
8b21+h5γ/b2
;
whereεKis the spectral gap ofKand hx = 12 p
1+x− 1−x/2
:
Obviously, with some modifications, the remarks stated in the Introduction are also valid.
3.3. Continuous-time Markov chain. In this section, we consider an irre- ducible continuous-time Markov chain on a finite-state space. Ifπdenotes its unique stationary distribution, the weak law of large numbers states that, for any functionsf;
P 1
t Zt
0
fXsds−πf≥γ
→0 ast→ ∞:
We can express the previous integral as the following limit:
1 t
Zt
0 fXsds= lim
k→∞k−1
k
X
i=1
fXit/k:
Fixk and assume thatπf=0; f∞ ≤1 and 0<f22 ≤b2: The Markov kernelPt/ksatisfies
Pt/kx; y>0 ∀t >0 ∀x; y ∈G2:
It follows that the Markov kernelKt/k =P∗t/kPt/kis irreducible. Let β1t/kbe the second largest eigenvalue ofKt/kand denote its spectral gap byεt/k =1−β1t/k:Hence, for all 0≤γ≤1;Theorem 3.3 implies
Pq
k−1
k
X
i=1
fXit/k ≥γ
≤Nqexp
− kγ2εt/k
8b21+h5γ/b2
: (12)
If 3 denotes the generator of the semigroup Pt; the generator of P∗t is 3∗:So, we deduce that, for k→ ∞; Kt/k =I+ t/k3+3∗ +o1/k2and εt/k =2t/kλ1+o1/k2;whereλ1denotes the smallest positive eigenvalue of −3+3∗/2: Now applying Fatou’s lemma for k → ∞ to inequality (12), gives the following theorem.
Theorem 3.4. LetPt; πbe an irreducible continuous-time Markov chain on a finite setG;and3its infinitesimal generator. LetfxG→Rbe such that πf=0,f∞≤1and0<f22≤b2:Then, for any initial distributionq; all t >0and all 0< γ≤1;
Pq 1
t Zt
0
fXsds≥γ
≤Nqexp
− γ2λ1t 4b21+h5γ/b2
; whereλ1is the smallest positive eigenvalue of −3+3∗/2and
hx = 12 p
1+x− 1−x/2
:
Finally, boundingb2by 1 gives the boundNqexp−γ2λ1t/12;which is valid for allγ≤1:
Remark. Applying Lemma 3.1 with the transition matrixPt/kand us- ing the following formula valid for all matricesA; B:
k→∞limexpA/kexpB/kk=expA+B;
give
Pq 1
t Zt
0 fXsds≥γ
≤e−rtγNqexp3rt2→2;
where 3r = 3+rdiagfx: Thus, we get a linear perturbation of the infinitesimal generator 3:Moreover, exp3rt2→2≤exp−λ0rt; where λ0r is the smallest eigenvalue of −3+3∗/2−rD: By using the Trotter product formula [see Trotter (1959)], this method can be extended to general state space ergodic Markov processes.
4. Direct method. This section develops another method to obtain upper bound on the distribution function of the empirical measure. This method does not require the perturbation theory of linear operators and works easily in a more general setting.
From now on, we consider an ergodic Markov kernelPx; dydefined on a general state spaceG;with the stationary distributionπ:This kernel defines an operatorPacting onL2πby
Pgx =Z
GgyPx; dy:
Assume now that the condition (3) is fulfilled by P: Our goal is to obtain an upper bound for EπexprPn
i=1fXi; where 0 ≤ r and f is such that
f22≤b2, πf=0 andf∞≤1: Writeg=exprf; then Eπexp
r
n
X
i=1
fXi
=Z
Qngdπ;
where, for allh∈L2π,
Q1hx =Z
Px; dyhy;
Qn+1=Z
Px; dygyQnhy:
The principle of the proof, due to Bakry and Ledoux, consists of centering each successive term on which the operatorPacts. Introduce some notation:
a0=1; ai=Eπexp
r
i
X
j=1
fXj
; 1≤i≤n;
g1=g−Z
g dπ; gi=gPgi−1−Z
gPgi−1dπ; 2≤i≤n;
b1=Z
g dπ; bi=Z
gPgi−1dπ=Z
g1Pgi−1dπ; 2≤i≤ny
then
an=an−1b1+Z
Qng1dπ
=an−1b1+an−2b2+Z
Qn−1g2dπ :::
=
n
X
i=1
bian−i:
Now, each bi can be bounded by using (3), since the operator P acts on the centered functionsgi−1:Actually, letα=βer,a=b2er/2; then
b1=Z
exprfdπ=1+r2
2Eπf2+r3
3!Eπf3+ · · · ≤1+ar2; bi ≤βαi−2g122; 2≤i < n;
where we used the inequalitygi2 ≤αi−1g12, 1≤i≤ n:Furthermore, by Jensen’s inequality, we get R
exprfdπ≥1, so
g122=Z
exp2rfdπ− Z
exprfdπ 2
≤1+2r2e2rEπf2−1≤4ar2er:
Chooser <1−β;soα <1, since logβ≤ −1−β:Now, as induction hypothesis, suppose thatak ≤φkr;whereφr =1+Cr2 andC≥4ais independent of k:As
a1≤ g2≤ 1+4ar21/2≤ 1+2ar2;
the hypothesis is true fork=1 and the induction hypothesis gives ak ≤
1+ar2φk−1r +4ar2αk−1
k−2
X
i=0
φr/αi
≤
1+ar2φk−1r +4ar2αφk−1r
φr −α
≤φk−1r1+4ar21−α−1:
Choosing C= 4a1−α−1; it follows that the induction hypothesis is valid.
Hence, for every 0≤γ <4b2 andr=γ1−β/4b2;we obtain, by Markov’s inequality,
Pπtn/n≥γ ≤exp−nrγ−2b2r2er1−βer−1
=exp
−n1−βγ2 8b2
2−1−βexpγ1−β/4b2 1−βexpγ1−β/4b2
:
Assume now thatγ≤2b2:Asex≤1+4x/3;forx≤1/2;we get the following:
Pπtn/n≥γ ≤exp
−n1−βγ2 8b2
1− γ
2b2
: (13)
This method is directly applicable to general Markov chains and gives the same type of asymptotic bound as the one obtained by the perturbation method for finite nonreversible Markov chains. Observe, however, that for finite re- versible Markov chains the bound (13) involves the constantβ;instead of the second largest eigenvalue of the kernelP:
Inequality (13) can be extended to continuous time by assuming thatf is continuous and bounded and
supPtg2/g2; πg=0 ≤e−tλ ∀t≥0 andλ >0:
(14)
Arguing as in the proof of Theorem 3.4, we get that, for everyγ≤2b2; Pπ
1 t
Zt
0
fXsds≥γ
≤exp
−γ2tλ 8b2
1− γ
2b2
:
An exponential inequality applicable for all 0 ≤ γ ≤ 1 can be obtained by following the proof of the Prokhorov’s inequality for independent random variables given in Stout (1974). We obtain the following inequality, when qπ,Eπf=0, f∞≤aandf22≤b2:
Pπtn/n≥γ ≤exp
− γ
2aarcsinh
nγa1−β
4b2
:
Proof. For everyx≥0,ex≥ex:Therefore,Eπexprtn ≤expEπexprtn−
1:LetQx =Pπtn≤xy we get Eπexprtn ≤exp
Z
exprx −1dQx
: Moreover,Qassigns mass 1 to the interval−na; naand
Z x dQx =0; Z
x2dQx =Eπt2n ≤2nb21−β−1: Thus,
Eπexprtn ≤exp Z
exprx −1−rxdQx
≤exp
2Z
coshrx −1dQx
: As the functionhx =coshrx −1 is even and convex, we obtain
Z
hxdQx ≤Z
hxdQ1x;
where Q1 assigns mass only to 0 and to na; with as much mass as possible assigned tonawithout violating
Z x2dQ1x ≤2nb21−β−1y thus,Q1na ≤ 2b2n1−βa2−1:This yields
Eπexprtn ≤exp
4b2
na21−βcoshrna −1
: Hence, for any r≥0;
Pπtn/n≥γ ≤exp
−nγr+ 4b2
na21−βcoshrna −1
: Differentiation shows that
r= na−1arcsinhnaγ1−β4b2−1 x= na−1θ
minimizes the right-hand side of the above inequality. It suffices to show coshθ−1≤ θsinhθ/2:But the above inequality can easily be deduced from
∀x∈R; ex2−x +e−x2+x ≤4:
Another proof, due to Ledoux, can be developed for continuous-time Markov chain under the weaker assumption that the functionfis bounded, measur- able and hypothesis (14) is fulfilled. LetYt=Rt
0fXsds; then, for all k≥1, Ykt =k!Zt
0
ds1Zt
s1ds2· · ·Zt
sk−1fXs1fXs2 · · ·fXskdsk; EπYkt =k!Zt
0
du1Zt−u1
0
du2· · · Zt−u1+···+uk−1
0
dukZ
fPu2f· · · fPukf · · ·dπ:
Let a2;:::;k= R
fPu2f· · · fPukf · · ·dπ; then, using (14) and the centering method gives the inequality
a2;:::;k≤ f22
exp−u2+ · · · +ukλ +a2exp−u4+ · · · +ukλ
+ · · · +a2;:::;k−2exp−ukλ
: An iteration of this inequality shows that a2;:::;k is bounded by a sum of at most 2k−1 of such terms:b2s+1exp−m2+ · · · +mkλ; wheremi =0 orui andsis the number ofmi 6=ui with 0≤s≤ k/2 −1:This implies
EπexprYt ≤1+
∞
X
k=2
rk2k−1
k/2
X
s=1
b2ts s! λs−k
≤1+ 2−4r/λ−1
∞
X
s=1
4b2r2t/λs s! ;
so long as 2r/λ <1:For 2r/λ≤1/2;this yieldsEπexprYt ≤exp4b2tr2λ−1:
Optimizing in r≥ λ/4 together with Markov’s inequality, it follows that, for every 0≤γ≤2b2;
Pπ 1
t Zt
0 fXsds≥γ
≤exp
−γ2tλ 16b2
:
5. Some reflections on simulation. This last section shows how Theo- rem 1.1 can be combined with quantification of the closeness to stationarity, in order to compute the simulation time required to obtain a sufficiently ac- curate estimate. More precisely, we will consider the algorithm suggested by Aldous (1987).
The constantNq. As it has already been said before, the coefficientNq=
q/π2may be large but always bounded byπ∗−1/2;whereπ∗=minπyxy∈ G: Furthermore,Nq =1 when q = π: This fact suggests that it might be better to begin the computation oftnonly when the distribution of the Markov chain is close toπ:For instance, if the Markov chain is ergodic (i.e., irreducible and aperiodic), the distribution πn of Xn converges to π exponentially asn tends to infinity. In that case, the simulation can be split into two phases.
During the first one, the Markov chain approaches the stationary distribution sufficiently close to eliminate the “bias” from the initial positionX0;while the second one is the period of time necessary beforetn is a reasonable estimate of πf:
Letπn be the distribution ofXnfor the initial distribution q;and N2n= πn/π22=X
x
πx
X
y
qyPny; x/πx
2
:
Then, the time until the distribution is close to stationary can be formalized by a parameterτ;defined as
τ=min
nxNn≤p
1+1/e2 =minnx πn/π −12≤1/e:
From now on, we consider reversible and aperiodic Markov chains. Instead of working with Nn; we will use the chi-square distance πn/π −12: We have
πn/π −122≤ X
x; y
πxqyPny; x/πx −12
≤sup
y Pny;·/π· −12= Pn−π22→∞
≤ Pn122→∞Pn2−Eπ22→2;
where the first inequality follows from the Cauchy–Schwarz inequality and n1+n2=n:From the last inequality, we deduce
πn/π −12≤ min
n1+n2=nDn1Pn2−Eπ2→2
≤ 1/π∗−11/2µn;
(15)
whereDn1 = Pn12→∞ andµn = Pn−Eπ2→2:Finally, an upper bound forµnis accomplished by using the inequalityµn ≤β∗Pn;where
β∗P =maxβG−1P; β1P:
See Diaconis and Saloff-Coste (1993) and Fill (1991) for more details. Combin- ing this inequality with (15) gives
πn/π −12≤ 1/π∗−11/2β∗Pn: (16)
We are interested in determining the sufficient total number of steps for the simulation,ne=τ+n; such that
Pq
n−1
τ+n
X
i=τ+1
fXi
≥γ
≤α;
(17)
where α < 1 is a small constant such as 0:05; for instance. Then with the setup of Theorem 1.1, the inequality (2) stated in Remark 3 gives
Pq
n−1
τ+n
X
i=τ+1
fXi
≥γ
≤2 exp−nγ2εP/12:
(18)
Moreover, different techniques can be used to get a bound onτ;such as eigen- values and geometric techniques or coupling [see Aldous and Fill, Fill (1991), Diaconis and Saloff-Coste (1996a, b)]. For instance, Poincar´e inequality gives upper bound forµn[see Diaconis and Saloff-Coste (1996a)], whereas, when applicable, Nash and Log-Sobolev inequalities allow us to bound Dn [see Diaconis and Saloff-Coste (1996b)].
Examples. The following examples deal with the Metropolis algorithm, a widely used tool in simulation. Consider G = 0;1; : : : ; N and π a fixed distribution. We would like to construct an irreducible Markov chainMx; y
whose stationary distribution is π: The Metropolis algorithm begins with a base chainPx; yonGwhich is modified by an auxiliary randomization. We will assume that Pis irreducible and aperiodic and thatPx; y>0 implies Py; x>0:Let
Ax; y =
πyPy; x/πxPx; y; ifPx; y>0;
0; otherwise.
Formally, the Markov kernel Mis
Mx; y =
Px; y; ifAx; y ≥1; x6=y;
Px; yAx; y; ifAx; y<1;
Px; y + X
zxAx; z<1
Px; z1−Ax; z; ifx=y:
We can easily prove that M is reversible and aperiodic. For the present examples, we take the base chain to be the nearest-neighbor random walk
Px; x+1 =Px; x−1 =1/2; 1≤x≤N−1;
P0;1 =P0;0 =PN; N−1 =PN; N =1/2:
Binomial distribution. The stationary distribution is πx = 2−N Nx : The standard Metropolis construction gives [see Diaconis and Saloff-Coste (1996a)]
Mx; y =
1/2; if
(y=x+1; 0≤x≤ N−1/2;
y=x−1; N+1/2≤x≤N;
x/2N−x+1; ify=x−1; 1≤x≤ N+1/2;
N−x/2x+1; ify=x+1; N−1/2≤x≤N−1;
N−2x+1/2N−x+1; ify=x; 0≤x≤ N+1/2;
2x−N+1/2x+1; ify=x; N−1/2≤x≤N:
Here 1/N≤1−β1M ≤2/N,π∗=2−N;andβNM ≥ −1+2 minMx; x ≥
−1/2:Furthermore, it is shown in Diaconis and Saloff-Coste (1996a), by using the Log-Sobolev inequality, that
Mnx;·/π−1 ≤ 1+2e21/2e−c for n≥ N/2logN+2c +1:
In other words, we find that approximate equilibrium is reached for τ of the order ofNlogN, whereas inequality (16) asserts approximate randomness for τ of orderN2: We thus see, using inequality (2) (i.e., withτ =0), that order
N/γ2 steps are sufficient for condition (17) to hold, whereas inequality (18) improves the running time, since we get ne of the order ofNlogN+1/γ2:
Exponential fall-off. In this example, given in Diaconis and Saloff- Coste (1995), the stationary distribution is
πi =zaahi; 0< a <1; zthe normalizing constant:
Then, assuming
hi+1 −hi ≥c≥1; 0≤i≤N−1;
the Metropolis chain is given by
Mi; i−1 =1/2; Mi; i =2−11−ahi+1−hi;
Mi; i+1 =2−1ahi+1−hi; 1≤i≤N−1;
M0;0 =1−2−1ah1−h0; M0;1 =2−1ah1−h0; MN; N−1 =MN; N =1/2:
Here, the second eigenvalue of this chain satisfies β1≤1−2−11−ac/22:
See Diaconis and Saloff-Coste (1995) for the proof based on a Poincar´e in- equality. Moreover, βN ≥ −1+2 minMx; x ≥ −1+21/2−ac/2 = −ac: Thus,
β∗≤max1−2−11−ac/22; ac:
(19)
Moreover, inequality (19) says orderNsteps are sufficient to reach station- arity. It follows that, using inequality (18), condition (17) holds forne of the order ofN; whereas applying inequality (2) givesne of the order ofN/γ2:
Acknowledgments. Dominique Bakry, Laurent Saloff-Coste, Persi Dia- conis and Michel Ledoux were exceptionally helpful with this paper. I would like to thank also the referee and Associate Editor for suggesting a number of improvements.
REFERENCES
Aldous, D.(1987). On the Markov chain simulation method for uniform combinatorial distribu- tions and simulated annealing.Prob. Engrg. Inform. Sci.133–46.
Aldous, D.andFill, J.Reversible Markov chains and random walks on graphs. Unpublished manuscript.
Bahadur, R. R.andRanga Rao, R.(1960). On deviations of the sample mean.Ann. Math. Statist.
311015–1033.
Bennett, G.(1962). Probability inequalities for sums of independent random variables.J. Amer.
Statist. Assoc.5733–45.
Chernoff, H.(1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations.Ann. Math. Statist.23493–507.
Cram ´er, H.(1938). Sur un nouveau th´eor`eme de la th´eorie des probabilit´es.Actualit´es Sci. Indust.
736.
Dembo, A.andZeitouni, O.(1993). Large Deviations Techniques and Applications. Jones and Bartlett, Boston.
Diaconis, P.andSaloff-Coste, L.(1993). Comparison theorems for reversible Markov chains.
Ann. Appl. Probab.3696–730.
Diaconis, P.andSaloff-Coste, L.(1995). What do we know about the Metropolis algorithm?
Unpublished manuscript.
Diaconis, P.andSaloff-Coste, L.(1996a). Logarithmic Sobolev inequalities and finite Markov chains.Ann. Appl. Probab.6695–750.
Diaconis, P.andSaloff-Coste, L.(1996b). Nash inequalities for finite Markov chains.J. Appl.
Probab.9459–510.
Dinwoodie, I.(1995). A probability inequality for the occupation measure of a reversible Markov chain.Ann. Appl. Probab.537–43.
Fill, J. (1991). Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process.Ann. Appl. Probab.162–87.
Gillman, D.(1993). Hidden Markov chains: rates of convergence and the complexity of inference.
Ph.D. dissertation, MIT.
Kato, T.(1966). Perturbation Theory for Linear Operators. Springer, New York.
Kolmogorov, A.(1929). ¨Uber das Gesetz des iterieten Logaritmus.Math. Ann.99309–319.
Mann, B. (1996). Berry–Esseen central limit theorem for Markov chains. Ph.D. dissertation, Harvard Univ.
Marshall, A. W.andOlkin, I.(1979).Inequalities: Theory of Majorization and Its Applications.
Academic Press, New York.
Nagaev, S. V.(1957). Some limit theorems for stationary Markov chains.Theory Probab. Appl.
2378–406.
Nagaev, S. V.(1961). More exact statements of limits theorems for homogeneous Markov chains.
Theory Probab. Appl.662–81.
Stout, W. F.(1974). Almost Sure Convergence. Academic Press, New York.
Trotter, H. F.(1959). On the product of semi-groups of operators.Proc. Amer. Math. Soc.10 545–551.
Centre d’Etudes de la Navigation A ´erienne 31055 Toulouse Cedex
and
Lab. Statistique et Probabilit ´e Universit ´e Paul Sabatier 31055 Toulouse Cedex France
E-mail:[email protected]