Chernoff-type bound for finite Markov chains


HAL Id: hal-00940907

https://hal-enac.archives-ouvertes.fr/hal-00940907

Submitted on 1 Apr 2014


Pascal Lezaud

To cite this version:

Pascal Lezaud. Chernoff-type bound for finite Markov chains. Annals of Applied Probability, Institute of Mathematical Statistics (IMS), 1998, 8 (3), pp 849-867. ⟨hal-00940907⟩


CHERNOFF-TYPE BOUND FOR FINITE MARKOV CHAINS

By Pascal Lezaud

CENA and Université Paul Sabatier

This paper develops bounds on the distribution function of the empirical mean for irreducible finite-state Markov chains. One approach, explored by Gillman, reduces this problem to bounding the largest eigenvalue of a perturbation of the transition matrix for the Markov chain. By using estimates on eigenvalues given in Kato's book Perturbation Theory for Linear Operators, we simplify the proof of Gillman and extend it to nonreversible finite-state Markov chains and continuous time. We also set out another method, directly applicable to some general ergodic Markov kernels having a spectral gap.

1. Introduction. Let $(X_n)$ be an irreducible Markov chain on a finite set $G$ with transition matrix $P$ and stationary distribution $\pi$. Then, for any function $f$ and any initial distribution $q$, the weak law of large numbers states that for almost every trajectory of the Markov chain, the empirical mean $n^{-1}\sum_{i=1}^{n} f(X_i)$ converges to $\pi f = \sum_{y} \pi(y) f(y)$. This result is the basis of the Markov chain simulation method to evaluate the mean of the function $f$. In this paper, we will quantify this rate of convergence by studying the probability

\[
P_q\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) - \pi f \ge \gamma \Big],
\]

where $P_q$ denotes the probability measure of the chain with the initial distribution $q$, and the size of deviation, $\gamma$, is a small number, such as $\pi f/10$ or $\pi f/100$.

The corresponding rate of convergence for sums of independent random variables has been given by Kolmogorov (1929), Cramér (1938), Chernoff (1952), Bahadur and Ranga Rao (1960) and Bennett (1962), but the samples in Markov chains are generally correlated with each other, even if this correlation decreases exponentially with the number of steps. In that case, the theory of large deviations gives an asymptotic rate of convergence, since the empirical mean satisfies the large deviation principle with the good rate function $I(z) = \sup_{r \in \mathbb{R}} \{ rz - \log \beta_0(r) \}$, where $\beta_0(r)$ is the largest eigenvalue (i.e., Perron–Frobenius eigenvalue) of a perturbation of the transition matrix $P$ [see Dembo and Zeitouni (1993)]. This asymptotic result is not satisfactory if one wants to achieve bounds that are useful for fixed $n$. Using perturbation theory, Gillman (1993) estimated the rate of convergence for reversible finite-state Markov chains by bounding the eigenvalue $\beta_0(r)$. He got a bound in terms of the spectral gap of the original transition matrix $P$, defined by $\epsilon(P) = 1 - \beta_1(P)$, where $\beta_1(P)$ denotes the second largest eigenvalue of $P$. Using a technique adapted from Rellich, Dinwoodie (1995) improved the bound obtained by Gillman, but only for $\gamma$ small. Improvements of such bounds are motivated by their wide use in simulation.

Received December 1996; revised June 1997.

AMS 1991 subject classification. 60F10.

Key words and phrases. Markov chain, Chernoff bound, eigenvalues, perturbation theory.

The aim of this work, based on Kato's perturbation theory, is to obtain a bound which depends on the variance of $f$ and which is applicable for all $\gamma$. More precisely, this bound provides a Gaussian behavior for the small values of $\gamma$ and a Poissonian behavior for the large values. Moreover, the result we achieved can also be easily extended to nonreversible Markov chains and continuous time. For instance, we get in the reversible case the following theorem, proved in Section 3.1, where $\|\cdot\|_2$ denotes the $\ell^2(\pi)$-norm.

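Theorem 1.1. Let $(P, \pi)$ be an irreducible and reversible Markov chain on a finite set $G$. Let $f: G \to \mathbb{R}$ be such that $\pi f = 0$, $\|f\|_\infty \le 1$ and $0 < \|f\|_2^2 \le b^2$. Then, for any initial distribution $q$, any positive integer $n$ and all $0 < \gamma \le 1$,

\[
P_q\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) \ge \gamma \Big] \le e^{\epsilon(P)/5} N_q \exp\Big[ -\frac{n\gamma^2\epsilon(P)}{4b^2\big(1 + h(5\gamma/b^2)\big)} \Big], \tag{1}
\]

where $N_q = \|q/\pi\|_2$ and

\[
h(x) = \tfrac{1}{2}\Big( \sqrt{1+x} - (1 - x/2) \Big).
\]

Therefore, if $\gamma \ll b^2$ and $\epsilon(P) \ll 1$, the bound is

\[
(1 + o(1))\, N_q \exp\Big[ -\frac{n\gamma^2\epsilon(P)}{4b^2(1 + o(1))} \Big].
\]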

Remark 1. For $\gamma > 1$, the probability of deviation is zero; thus, we can assume that $\gamma \le 1$. Moreover, applying Theorem 1.1 to $-f$ gives the same bound on $P_q[\, n^{-1}\sum_{i=1}^{n} f(X_i) \le -\gamma ]$. Thus, inequality (1) implies bounds on all moments of the positive random variable $|n^{-1}\sum_{i=1}^{n} f(X_i)|$, and quantifies convergence in each $\ell^p$ norm, $1 \le p < \infty$.

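Remark 2. $h(x)$ is an increasing function of $x \ge 0$ such that $h(x) \le x/2$ for all $x$. Thus, for $\gamma \le 2b^2/5$, we get

\[
P_q\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) \ge \gamma \Big] \le e^{\epsilon(P)/5} N_q \exp\Big[ -\frac{n\gamma^2\epsilon(P)}{4b^2}\Big(1 - \frac{5\gamma}{2b^2}\Big) \Big],
\]

whereas for $\gamma > 2b^2/5$,

\[
P_q\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) \ge \gamma \Big] \le e^{\epsilon(P)/5} N_q \exp\Big[ -\frac{n\gamma\epsilon(P)}{10}\Big(1 - \frac{2b^2}{5\gamma}\Big) \Big].
\]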

Remark 3. Obviously, we can take $b^2 = 1$ in inequality (1), which gives

\[
P_q\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) \ge \gamma \Big] \le e^{\epsilon(P)/5} N_q \exp\big[ -n\gamma^2\epsilon(P)/12 \big]. \tag{2}
\]
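The bounds of Theorem 1.1 and Remark 3 are simple to evaluate numerically. The following is a minimal sketch, not part of the original paper: the function names `h`, `bound_1`, `bound_2` and `steps_needed` are hypothetical, and the numerical values in the usage note are illustrative only. It evaluates inequality (1), its simplification (2), and the smallest $n$ for which the right-hand side of (2) falls below a target level $\alpha$.

```python
import math


def h(x):
    """h(x) = (sqrt(1 + x) - (1 - x/2)) / 2, as in Theorem 1.1."""
    return 0.5 * (math.sqrt(1.0 + x) - (1.0 - x / 2.0))


def bound_1(n, gamma, eps, b2, n_q=1.0):
    """Right-hand side of inequality (1); eps is the spectral gap, b2 bounds ||f||_2^2."""
    rate = n * gamma ** 2 * eps / (4.0 * b2 * (1.0 + h(5.0 * gamma / b2)))
    return math.exp(eps / 5.0) * n_q * math.exp(-rate)


def bound_2(n, gamma, eps, n_q=1.0):
    """Right-hand side of inequality (2), i.e. (1) with b^2 = 1 and 4(1 + h(5)) <= 12."""
    return math.exp(eps / 5.0) * n_q * math.exp(-n * gamma ** 2 * eps / 12.0)


def steps_needed(alpha, gamma, eps, n_q=1.0):
    """Smallest n such that bound_2(n, ...) <= alpha (solve the inequality for n)."""
    n = 12.0 * (math.log(n_q / alpha) + eps / 5.0) / (gamma ** 2 * eps)
    return max(1, math.ceil(n))
```

For instance, `steps_needed(0.05, 0.1, 0.5)` returns the number of steps that (2) guarantees to be sufficient for a deviation $\gamma = 0.1$ at level $\alpha = 0.05$ when the chain starts from stationarity and has spectral gap $0.5$.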


The constant $N_q$ is the $\ell^2(\pi)$-norm of the density of $q$ with respect to the stationary distribution $\pi$. This norm can be large but is always bounded by $\pi_*^{-1/2}$, where $\pi_* = \min\{\pi(y) : y \in G\}$. However, it is worth noting that $N_q = 1$ when the initial distribution is the stationary one. This fact will be used to improve the convergence, by splitting the simulation in two steps, as suggested in Aldous (1987). The first step, the initial transient period, may be defined as the period of time which must be discarded before the chain comes close to $\pi$, whereas the second step, the observation period, may be defined as the period of time where observations must be collected to reach the desired precision.

A short description of the paper is as follows. Section 2 of this paper sets out preliminaries on the perturbation theory of linear operators. The results taken from Kato (1966) are based on a function-theoretical study of the resolvent, in particular on the expression of the eigenprojections as contour integrals of the resolvent. However, the specific results needed here are easier than in Kato (1966), since we restrict our study to the case of operators having an eigenvalue whose algebraic multiplicity is 1.

Section 3.1 proves Theorem 1.1. The bound is extended to nonreversible Markov chains in Section 3.2, by considering the multiplicative symmetrization of the operator $P$, namely, $K = P^*P$, where $P^*$ is the adjoint of $P$ on $\ell^2(\pi)$. Explicitly,

\[
P^*(x, y) = \pi(y) P(y, x)/\pi(x).
\]

Dinwoodie (1995) also obtained a bound for nonreversible Markov chains, but his bound depended on the random covering time $T$ defined by $T = \inf\{ n \ge 1 : \{X_0, X_1, \ldots, X_n\} = G \}$. Bounding this covering time can be difficult.

Section 3.3 shows how the previous bounds extend to continuous-time Markov chains, by employing a method suggested by Fill (1991). We thus get an exponential bound in terms of the smallest nonzero eigenvalue of $-(\Lambda + \Lambda^*)/2$, where $\Lambda$ is the generator of the semigroup $P_t$.

Perturbation theory of linear operators is a fruitful method to achieve explicit bounds on finite Markov chains. Mann (1996) used it to obtain a Berry–Esseen bound for Markov chains with finite or countably infinite state space, such that

\[
\beta := \sup \frac{\|Pg\|_2}{\|g\|_2} < 1, \tag{3}
\]

where the sup is over all complex-valued functions $g$ on the state space $G$ with $\pi g = 0$. Nagaev (1957, 1961) also used the perturbation method for studying some limit theorems for Markov chains under the Doeblin condition. Following the method used in the finite-state setting, we recently have been able to achieve a Chernoff-type bound for a general Markov operator $P$ under the assumptions that 1 is an isolated eigenvalue of $P$ and the dimension of the eigenspace for the eigenvalue 1 is finite. In particular, these assumptions are fulfilled for Markov chains satisfying the Doeblin condition or the relation (3).

We will publish this result in a forthcoming paper. Instead of presenting these results, we would rather set out another method, directly applicable to a general irreducible Markov kernel assuming condition (3). Section 4 introduces this method, due to Bakry and Ledoux, which gives the following inequality for $\gamma \le b^2$:

\[
P_\pi\Big[\, n^{-1}\sum_{i=1}^{n} f(X_i) - \pi f \ge \gamma \Big] \le \exp\Big[ -\frac{n\gamma^2(1-\beta)}{21 b^2} \Big].
\]

Note that this last inequality involves $\beta$ and not $\epsilon(P)$. For instance, if the state space is finite and the Markov chain is periodic, then $\beta = 1$. Thus, this method is not applicable in this case, unlike the perturbation method.

Finally, Section 5 shows how the running time of simulation can be improved by considering an initial burn-in period whose effect is to decrease $N_q$.

2. Perturbation theory of linear operators. Let $T$ be a linear operator on some finite-dimensional vector space $X$. We denote the spectrum of $T$ by $\Sigma(T)$. The resolvent of $T$, defined by $R(\zeta) = (T - \zeta)^{-1}$, is an analytic operator-valued function on the domain $\mathbb{C} \setminus \Sigma(T)$, called the resolvent set. Furthermore, the only singularities of $R(\zeta)$ are the eigenvalues $\lambda_h$, $h = 1, \ldots, s$, of $T$.

The Laurent series of the resolvent at a simple eigenvalue $\lambda_h$ takes the form [Kato (1966), page 38]

\[
R(\zeta) = -(\zeta - \lambda_h)^{-1} P_h + \sum_{n=0}^{\infty} (\zeta - \lambda_h)^n S_h^{n+1},
\]

where $P_h$ is the eigenprojection operator for the eigenvalue $\lambda_h$ and $S_h$ is called the reduced resolvent of $T$ with respect to the eigenvalue $\lambda_h$. More precisely, $S_h$ is the inverse of the restriction of $T - \lambda_h$ to the subspace $(I - P_h)X$. We deduce that $P_h$ is the residue of $-R(\zeta)$ at $\lambda_h$, and

\[
P_h = -\frac{1}{2\pi i} \int_{\Gamma_h} R(\zeta)\, d\zeta,
\]

where $\Gamma_h$ is a positively oriented small circle enclosing $\lambda_h$, but excluding other eigenvalues of $T$.

Consider a family of operator-valued functions of the form

\[
T(\chi) = T + \chi T^{(1)} + \chi^2 T^{(2)} + \cdots.
\]

Then the resolvent $R(\zeta, \chi) = (T(\chi) - \zeta)^{-1}$ of $T(\chi)$ is analytic in the two variables $\zeta, \chi$ in each domain in which $\zeta$ is not equal to any of the eigenvalues of $T(\chi)$ [Kato (1966), page 66]. So it can be expanded into the following power series in $\chi$ with coefficients depending on $\zeta$:

\[
R(\zeta, \chi) = R(\zeta) + \sum_{n=1}^{\infty} \chi^n R^{(n)}(\zeta), \tag{4}
\]

where each $R^{(n)}$ is an operator-valued function. This series is uniformly convergent if

\[
\sum_{n=1}^{\infty} |\chi|^n \|T^{(n)} R(\zeta)\| < 1. \tag{5}
\]

Let $r(\zeta)$ be the value of $|\chi|$ such that the left member of (5) is equal to 1. Then (5) is satisfied for $|\chi| < r(\zeta)$. Let $\lambda$ be one of the eigenvalues of $T = T(0)$ with multiplicity $m = 1$, and $\Gamma$ be a positively oriented circle, in the resolvent set of $T$, enclosing $\lambda$ but no other eigenvalues of $T$. The series (4) is then uniformly convergent for $\zeta \in \Gamma$ if

\[
|\chi| < r_0 = \min_{\zeta \in \Gamma} r(\zeta). \tag{6}
\]

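In the special case in which $X$ is a Hilbert space and $T$ is normal (i.e., $T^*T = TT^*$), we get

\[
\|R(\zeta)\| = 1/\mathrm{dist}(\zeta, \Sigma(T)), \qquad r_0 = \min_{\zeta \in \Gamma} \Big( \frac{a}{\mathrm{dist}(\zeta, \Sigma(T))} + c \Big)^{-1}
\]

for every $\zeta$ in the resolvent set of $T$, where $a = \|T^{(1)}\|$ and $c$ is such that $\|T^{(n)}\| \le a c^{n-1}$. If we choose as $\Gamma$ the circle $|\zeta - \lambda| = d/2$, where

\[
d = \min_{\mu \in \Sigma(T) \setminus \{\lambda\}} |\lambda - \mu|,
\]

we obtain $r_0 = (2ad^{-1} + c)^{-1}$.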

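The existence of the resolvent $R(\zeta, \chi)$ for $\zeta \in \Gamma$ implies that there are no eigenvalues of $T(\chi)$ on $\Gamma$. The operator

\[
P(\chi) = -\frac{1}{2\pi i} \int_{\Gamma} R(\zeta, \chi)\, d\zeta \tag{7}
\]

is the eigenprojection for all the eigenvalues of $T(\chi)$ lying inside $\Gamma$. In particular [Kato (1966), page 68], for all $\chi$ sufficiently small, we have

\[
\dim P(\chi)X = \dim PX = 1.
\]

Therefore, only the eigenvalue $\lambda(\chi)$ of $T(\chi)$ lies inside $\Gamma$, and $P(\chi)$ is the eigenprojection for this eigenvalue.

As the only eigenvalues of $T(\chi)P(\chi)$ are 0 and $\lambda(\chi)$, we will consider

\[
\lambda(\chi) - \lambda = \mathrm{tr}\big( (T(\chi) - \lambda) P(\chi) \big).
\]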

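Combining (7) with the expansion (4) of $R(\zeta, \chi)$ gives the Taylor series expansion [Kato (1966), page 79]

\[
\lambda(\chi) - \lambda = -\frac{1}{2\pi i}\, \mathrm{tr} \int_{\Gamma} (\zeta - \lambda) R(\zeta, \chi)\, d\zeta = \sum_{n=1}^{\infty} \chi^n \lambda^{(n)},
\]

where

\[
\lambda^{(n)} = \sum_{p=1}^{n} \frac{(-1)^p}{p} \sum_{\substack{\nu_1 + \cdots + \nu_p = n \\ k_1 + \cdots + k_p = p-1 \\ \nu_i \ge 1,\ k_j \ge 0}} \mathrm{tr}\, T^{(\nu_1)} S^{(k_1)} \cdots T^{(\nu_p)} S^{(k_p)}, \tag{8}
\]

with $S^{(0)} = -P$, $S^{(n)} = S^n$. Here $S$ is the reduced resolvent of $T$ with respect to the eigenvalue $\lambda$.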

3. Chernoff-type bound for Markov chains. In this section, we consider an irreducible finite Markov chain with transition matrix $P$ and stationary distribution $\pi$. This transition matrix defines an operator acting on functions by

\[
(Pf)(x) = \sum_{y \in G} P(x, y) f(y).
\]

We are interested in bounding the rate of convergence of $P_q[\, n^{-1} t_n \ge \gamma ]$, where

\[
t_n = \sum_{i=1}^{n} \big( f(X_i) - \pi f \big).
\]

If the Markov chain is reversible, the operator $P$ is self-adjoint on $\ell^2(\pi)$ endowed with the inner product

\[
\langle f, g \rangle_\pi = \sum_{x \in G} f(x) g(x) \pi(x).
\]

This operator $P$ has largest eigenvalue 1 and, because of irreducibility, the constant functions are the only eigenfunctions with eigenvalue 1. The eigenvalues of $P$ are denoted by

\[
\beta_0(P) = 1 > \beta_1(P) \ge \beta_2(P) \ge \cdots \ge \beta_{|G|-1}(P) \ge -1,
\]

and the spectral gap by $\epsilon(P) = 1 - \beta_1(P)$.

If $(P, \pi)$ is not reversible, we will consider the multiplicative symmetrization $K = P^*P$ of $P$. This operator $K$ is a reversible Markov kernel with the same stationary distribution $\pi$. We will assume that $K$ is irreducible. For instance, this will be the case as soon as $P(x, x) > 0$ for every $x \in G$.

Since we will always work with a self-adjoint operator on $\ell^2(\pi)$, every norm considered will be the $\ell^2(\pi)$-norm. Moreover, except for constant functions, which are without interest, replacing $f$ by $(f - \pi f)/\|f - \pi f\|_\infty$ shows that there is no loss of generality in assuming

\[
\pi f = 0, \qquad \|f\|_\infty \le 1, \qquad \|f\|_2^2 \le b^2 \le 1.
\]

3.1. Reversible Markov chains. This section proves Theorem 1.1 stated in the Introduction, by using the method of Gillman (1993). We first prove the following lemma.

Lemma 3.1. Referring to the setting of Theorem 1.1, let $r > 0$. Then, for any positive integer $n$,

\[
P_q[\, t_n/n \ge \gamma ] \le e^{-rn\gamma}\, q^T P(r)^n \mathbf{1},
\]

where $P(r) = \big( e^{rf(y)} P(x, y) \big)$.

Proof. By Markov's inequality,

\[
P_q[\, t_n \ge n\gamma ] \le e^{-rn\gamma} E_q \exp(r t_n),
\]

where $E_q$ denotes the expectation given the initial distribution $q$. Observe that

\[
E_q \exp(r t_n) = \sum_{x_0, x_1, \ldots, x_n} \exp(r t_n)\, q(x_0) \prod_{i=1}^{n} P(x_{i-1}, x_i),
\]

where the summation is evaluated over all possible trajectories $x_0, x_1, \ldots, x_n$. By introducing the operator $P(r) = \big( e^{rf(y)} P(x, y) \big)$, we obtain the expression

\[
E_q \exp(r t_n) = q^T P(r)^n \mathbf{1}.
\]
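The identity $E_q \exp(r t_n) = q^T P(r)^n \mathbf{1}$ can be checked by brute force on a small chain. The sketch below is illustrative only: the three-state chain `P`, the function `f`, the initial law `q` and all function names are assumptions introduced here, not taken from the paper. It compares the sum over all trajectories with the matrix expression.

```python
import itertools
import math

# Hypothetical example: a lazy nearest-neighbor walk on {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
f = [1.0, 0.0, -1.0]      # pi f = 0 for pi = (1/4, 1/2, 1/4)
q = [1.0, 0.0, 0.0]       # chain started at state 0


def tilted(P, f, r):
    """P(r)(x, y) = exp(r f(y)) P(x, y)."""
    size = len(P)
    return [[P[x][y] * math.exp(r * f[y]) for y in range(size)] for x in range(size)]


def mgf_matrix(P, f, q, r, n):
    """q^T P(r)^n 1, computed by n matrix-vector products."""
    Pr = tilted(P, f, r)
    size = len(P)
    v = [1.0] * size
    for _ in range(n):
        v = [sum(Pr[x][y] * v[y] for y in range(size)) for x in range(size)]
    return sum(q[x] * v[x] for x in range(size))


def mgf_bruteforce(P, f, q, r, n):
    """E_q exp(r t_n) as an explicit sum over all trajectories x_0, ..., x_n."""
    total = 0.0
    for path in itertools.product(range(len(P)), repeat=n + 1):
        prob = q[path[0]]
        for i in range(1, n + 1):
            prob *= P[path[i - 1]][path[i]]
        t_n = sum(f[x] for x in path[1:])   # t_n = f(x_1) + ... + f(x_n)
        total += prob * math.exp(r * t_n)
    return total
```

The brute-force sum has $|G|^{n+1}$ terms, so it is only practical for very small $n$; the matrix form costs $n$ matrix-vector products.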

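Lemma 3.2. Let $r > 0$ and $N_q = \|q/\pi\|_2$. Then

\[
q^T P(r)^n \mathbf{1} \le e^{r} N_q\, \beta_0^n(P(r)),
\]

where $\beta_0(P(r))$ is the largest eigenvalue of $P(r)$.

Proof. Let us recall that $P$ is a self-adjoint operator in $\ell^2(\pi)$, and introduce the diagonal matrix $E_r = \mathrm{diag}(e^{rf(y)})$, so that $P(r) = P E_r$. Then $P(r) = \sqrt{E_r}^{\,-1} \big( \sqrt{E_r}\, P \sqrt{E_r} \big) \sqrt{E_r}$ is similar to the self-adjoint matrix $S(r) = \sqrt{E_r}\, P \sqrt{E_r}$, so its eigenvalues are real for $r \ge 0$. In this case, the Perron–Frobenius theorem says that $\|S(r)\|_{2 \to 2} = \beta_0(P(r))$, where $\beta_0(P(r))$ is the largest eigenvalue of $P(r)$. Applying the Cauchy–Schwarz inequality in $\ell^2(\pi)$ gives

\[
q^T P(r)^n \mathbf{1} = \Big\langle \frac{q}{\pi}, P(r)^n \mathbf{1} \Big\rangle_\pi
\le N_q \|\mathbf{1}\|_2\, \|E_{r/2}\|_{2 \to 2}\, \|E_{-r/2}\|_{2 \to 2}\, \|S(r)^n\|_{2 \to 2}
\le e^{r} N_q\, \beta_0^n(P(r)).
\]

Thus, Lemma 3.2 gives the inequality

\[
P_q[\, t_n/n \ge \gamma ] \le e^{r} N_q \exp\big[ -n\big( r\gamma - \log \beta_0(P(r)) \big) \big]. \tag{9}
\]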
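Lemma 3.2 can also be checked numerically on a small reversible chain. This is a sketch under stated assumptions: the chain `P`, its stationary law `pi`, the function `f`, the initial law `q` and the helper names are all hypothetical, and the largest eigenvalue of $P(r)$ is approximated by plain power iteration (which converges here because $P(r)$ is a primitive nonnegative matrix) rather than by any method from the paper.

```python
import math

# Hypothetical reversible chain on {0, 1, 2} with stationary law pi.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]
f = [1.0, 0.0, -1.0]
q = [1.0, 0.0, 0.0]


def matvec(M, v):
    return [sum(M[x][y] * v[y] for y in range(len(v))) for x in range(len(M))]


def tilted(P, f, r):
    """P(r)(x, y) = exp(r f(y)) P(x, y)."""
    return [[P[x][y] * math.exp(r * f[y]) for y in range(len(P))] for x in range(len(P))]


def top_eigenvalue(M, iters=5000):
    """Perron root of a primitive nonnegative matrix by power iteration."""
    v = [1.0] * len(M)
    lam = 1.0
    for _ in range(iters):
        w = matvec(M, v)
        lam = max(w)                 # v is scaled so that max(v) == 1
        v = [x / lam for x in w]
    return lam


def lemma_3_2_sides(r, n):
    """Return (q^T P(r)^n 1, e^r N_q beta_0(P(r))^n)."""
    Pr = tilted(P, f, r)
    v = [1.0] * len(P)
    for _ in range(n):
        v = matvec(Pr, v)
    lhs = sum(q[x] * v[x] for x in range(len(P)))
    n_q = math.sqrt(sum(pi[x] * (q[x] / pi[x]) ** 2 for x in range(len(P))))
    rhs = math.exp(r) * n_q * top_eigenvalue(Pr) ** n
    return lhs, rhs
```

Calling `lemma_3_2_sides(0.5, 10)` returns both sides of the inequality for $r = 0.5$ and $n = 10$; the left value should not exceed the right one.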

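Let $D = \mathrm{diag}(f(x))$; then we can write

\[
P(r) = P + \sum_{i=1}^{\infty} \frac{r^i}{i!} P D^i.
\]

Since $P \le P(r) \le e^r P$, we get $1 \le \beta_0(P(r)) \le e^r$ [see Kato (1966), Theorem I-6.44]. It follows that $\beta_0(P(r))$ is the perturbation of the eigenvalue 1 of $P$. The irreducibility of $P$ implies that 1 is a simple eigenvalue and the eigenprojection for the eigenvalue 1 is the operator $\pi$ defined by

\[
(\pi f)(x) := \sum_{y \in G} f(y) \pi(y).
\]

Thus (8) can be used to obtain the coefficients $\beta^{(n)}$ in the Taylor series of $\beta_0(P(r)) - 1$, so long as $r < r_0$, where $r_0$ is the convergence radius defined by (6). We obtain

\[
\beta^{(n)} = \sum_{p=1}^{n} \frac{(-1)^p}{p} \sum_{\substack{\nu_1 + \cdots + \nu_p = n \\ k_1 + \cdots + k_p = p-1 \\ \nu_i \ge 1,\ k_j \ge 0}} \frac{1}{\nu_1! \cdots \nu_p!} \big\langle f^{\nu_1}, S^{(k_1)} P D^{\nu_2} \cdots S^{(k_{p-1})} P f^{\nu_p} \big\rangle_\pi, \tag{10}
\]

where we used the following relation, valid for any matrix $M$:

\[
\mathrm{tr}\big( \pi P D^{\nu_1} M D^{\nu_p} \big) = \mathrm{tr}\big( \pi D^{\nu_1} M D^{\nu_p} \big) = \langle f^{\nu_1}, M f^{\nu_p} \rangle_\pi.
\]

For instance,

\[
\beta^{(1)} = \langle f, 1 \rangle_\pi = 0, \qquad \beta^{(2)} = \langle f, -Sf \rangle_\pi - \tfrac{1}{2} \langle f, f \rangle_\pi.
\]

An explicit calculation gives $\beta^{(2)} = \lim_{n \to \infty} E_\pi t_n^2/n$, which is precisely the asymptotic variance of $t_n$.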

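The number of terms in the formula (10) is

\[
C_n := \sum_{p=1}^{n} \binom{n-1}{p-1} \binom{2p-2}{p-1} \frac{1}{p}.
\]

Furthermore, as $1/n! \le 2^{1-n}$ and $\binom{2k}{k} < 2^{2k} (k\pi)^{-1/2}$, we obtain the following inequalities for $n \ge 3$:

\[
C_n < 1 + \pi^{-1/2} \sum_{p=1}^{n-1} \binom{n-1}{p} \frac{1}{p+1} 4^p \le \frac{25}{4n}\, 5^{n-2}.
\]

Thus, for $n \ge 7$, we get $C_n \le 5^{n-2}$, but direct computations show this bound is also valid for $n \ge 3$. Finally, the Cauchy–Schwarz inequality gives

\[
|\beta^{(n)}| \le \frac{b^2}{5} \Big( \frac{5}{\epsilon(P)} \Big)^{n-1}, \qquad n \ge 2.
\]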
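The "direct computations" for the count $C_n$ are easy to reproduce. The sketch below rests on an assumption stated here explicitly: the second binomial factor is read as $\binom{2p-2}{p-1}$, the number of nonnegative solutions of $k_1 + \cdots + k_p = p-1$, so that $\binom{2p-2}{p-1}/p$ is a Catalan number and the count is an integer. The function name `num_terms` is hypothetical.

```python
from math import comb


def num_terms(n):
    """C_n = sum_{p=1}^{n} binom(n-1, p-1) * binom(2p-2, p-1) / p.

    binom(2p-2, p-1) / p is the (p-1)-th Catalan number, so the integer
    division below is exact.
    """
    return sum(comb(n - 1, p - 1) * (comb(2 * p - 2, p - 1) // p)
               for p in range(1, n + 1))


def bound_holds(n):
    """Check C_n <= 5^(n-2) in exact integer arithmetic."""
    return num_terms(n) <= 5 ** (n - 2)
```

Under this reading the bound $C_n \le 5^{n-2}$ is attained with equality at $n = 3$ and fails at $n = 2$, which agrees with the range $n \ge 3$ stated in the text.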

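Hence, provided that $r < \epsilon(P)/5$, we obtain

\[
\beta_0(P(r)) \le 1 + \frac{b^2}{\epsilon(P)}\, r^2 \Big( 1 - \frac{5r}{\epsilon(P)} \Big)^{-1}. \tag{11}
\]

Combining inequalities (9) with (11) and $\log(1 + x) \le x$ gives, for $r < \epsilon(P)/5$,

\[
P_q[\, t_n/n \ge \gamma ] \le e^{\epsilon(P)/5} N_q \exp[ -n Q(r) ],
\]

where $Q(r) = \gamma r - (b^2/\epsilon(P)) (1 - 5r/\epsilon(P))^{-1} r^2$ is maximized when

\[
r = \frac{\gamma\, \epsilon(P)}{b^2 \big( 1 + 5\gamma/b^2 + \sqrt{1 + 5\gamma/b^2} \big)}.
\]

We can check that $r < \epsilon(P)/5$, and simple computations yield inequality (1). ✷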
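The "simple computations" at the end of the proof can be verified numerically: at the stated maximizer, $Q(r)$ should equal $\gamma^2 \epsilon(P) / \big(4b^2(1 + h(5\gamma/b^2))\big)$, the exponent of inequality (1). The following sketch checks this for a few parameter values; the function names and the parameter grid are illustrative assumptions.

```python
import math


def h(x):
    """h(x) = (sqrt(1 + x) - (1 - x/2)) / 2, as in Theorem 1.1."""
    return 0.5 * (math.sqrt(1.0 + x) - (1.0 - x / 2.0))


def Q(r, gamma, eps, b2):
    """Q(r) = gamma r - (b^2/eps) r^2 / (1 - 5r/eps), defined for 0 <= r < eps/5."""
    return gamma * r - (b2 / eps) * r * r / (1.0 - 5.0 * r / eps)


def r_star(gamma, eps, b2):
    """Maximizer of Q stated at the end of the proof of Theorem 1.1."""
    x = 5.0 * gamma / b2
    return gamma * eps / (b2 * (1.0 + x + math.sqrt(1.0 + x)))


def exponent_in_1(gamma, eps, b2):
    """Per-step exponent appearing in inequality (1)."""
    return gamma ** 2 * eps / (4.0 * b2 * (1.0 + h(5.0 * gamma / b2)))
```

For example, `Q(r_star(0.2, 0.5, 1.0), 0.2, 0.5, 1.0)` and `exponent_in_1(0.2, 0.5, 1.0)` agree up to rounding error.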


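3.2. Nonreversible Markov chains. This section extends the previous bound to nonreversible Markov chains. We now consider the multiplicative symmetrization of $P$, namely, $K = P^*P$. Set $K(r) = P^*(r) P(r)$ and denote its largest eigenvalue by $\beta_0(r)$. As in the previous section, Markov's inequality gives

\[
\begin{aligned}
P_q[\, t_n/n \ge \gamma ] &\le e^{-rn\gamma}\, q^T P(r)^n \mathbf{1} \\
&\le e^{-rn\gamma} N_q \|\mathbf{1}\|_2\, \|P(r)^n\|_{2 \to 2} \\
&= e^{-rn\gamma} N_q\, \big\| (P(r)^*)^n P(r)^n \big\|_{2 \to 2}^{1/2} \\
&= e^{-rn\gamma} N_q \sqrt{\beta_{0,n}(r)},
\end{aligned}
\]

where $\beta_{0,n}(r)$ denotes the largest eigenvalue of $(P(r)^*)^n P(r)^n$. A result analogous to Lemma 3.2 requires the use of Marcus' theorem [see Marshall and Olkin (1979), Theorem 9.H.2.a]. This theorem states that $\beta_{0,n}(r) \le \beta_0(r)^n$, whence the following inequality:

\[
P_q[\, t_n/n \ge \gamma ] \le N_q\, \beta_0^{n/2}(r)\, e^{-rn\gamma}.
\]

Here again, the operator $K(r)$ will be considered as a perturbation of $K$, since we have

\[
K(r) = K + \sum_{i=1}^{\infty} r^i \Big( \sum_{j=0}^{i} \frac{1}{j!\,(i-j)!} D^j K D^{i-j} \Big).
\]

In the remainder of this section, we assume that $K$ is irreducible. From here, the arguments are exactly the same as the ones used in the proof of Theorem 1.1. The coefficients $\beta^{(n)}$ in the Taylor series of $\beta_0(r) - 1$ are bounded by

\[
|\beta^{(n)}| \le 2b^2 \Big( \frac{2}{\epsilon(K)} \Big)^{n-1} C_n \le \frac{2}{5}\, b^2 \Big( \frac{10}{\epsilon(K)} \Big)^{n-1}.
\]

Therefore, provided that $r < \epsilon(K)/10$, we obtain

\[
\beta_0(r) \le 1 + \frac{4b^2}{\epsilon(K)}\, r^2 \Big( 1 - \frac{10r}{\epsilon(K)} \Big)^{-1};
\]

hence, since the previous bound involves $\beta_0^{n/2}(r)$ and $\log(1 + x) \le x$,

\[
P_q[\, t_n/n \ge \gamma ] \le N_q \exp\Big[ -n \Big( r\gamma - \frac{2b^2}{\epsilon(K)}\, r^2 \big( 1 - 10r/\epsilon(K) \big)^{-1} \Big) \Big].
\]

Optimizing in $r < \epsilon(K)/10$, we may therefore state the following.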

Theorem 3.3. Let $(P, \pi)$ be an irreducible Markov chain on a finite set $G$, such that its multiplicative symmetrization $K = P^*P$ is irreducible. Let $f: G \to \mathbb{R}$ be such that $\pi f = 0$, $\|f\|_\infty \le 1$ and $0 < \|f\|_2^2 \le b^2$. Then, for any initial distribution $q$, any positive integer $n$ and all $0 < \gamma \le 1$,

\[
P_q[\, t_n/n \ge \gamma ] \le N_q \exp\Big[ -\frac{n\gamma^2 \epsilon(K)}{8b^2 \big( 1 + h(5\gamma/b^2) \big)} \Big],
\]

where $\epsilon(K)$ is the spectral gap of $K$ and

\[
h(x) = \tfrac{1}{2} \Big( \sqrt{1+x} - (1 - x/2) \Big).
\]

Obviously, with some modifications, the remarks stated in the Introduction are also valid.
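To make Theorem 3.3 concrete, the sketch below builds the multiplicative symmetrization $K = P^*P$ of a small nonreversible chain and estimates its spectral gap. Everything specific here is an assumption for illustration: the lazy cyclic walk `P` (doubly stochastic, so `pi` is uniform), the helper names, and the use of power iteration on mean-zero functions, which is justified only because $K = P^*P$ is positive semidefinite on $\ell^2(\pi)$.

```python
import math

# Hypothetical nonreversible chain: lazy clockwise walk on a 3-cycle.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
pi = [1.0 / 3.0] * 3


def adjoint(P, pi):
    """P*(x, y) = pi(y) P(y, x) / pi(x)."""
    n = len(P)
    return [[pi[y] * P[y][x] / pi[x] for y in range(n)] for x in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[x][z] * B[z][y] for z in range(n)) for y in range(n)] for x in range(n)]


def second_eigenvalue(K, pi, iters=2000):
    """Largest eigenvalue of K on pi-mean-zero functions (K assumed PSD and reversible)."""
    n = len(K)
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(pi[x] * v[x] for x in range(n))
        v = [vx - mean for vx in v]                      # project out constants
        norm = math.sqrt(sum(pi[x] * v[x] ** 2 for x in range(n)))
        if norm < 1e-300:
            return 0.0
        v = [vx / norm for vx in v]
        w = [sum(K[x][y] * v[y] for y in range(n)) for x in range(n)]
        lam = sum(pi[x] * v[x] * w[x] for x in range(n))  # Rayleigh quotient in l2(pi)
        v = w
    return lam


def bound_thm_3_3(n, gamma, eps_k, b2, n_q=1.0):
    """Right-hand side of Theorem 3.3."""
    hx = 0.5 * (math.sqrt(1.0 + 5.0 * gamma / b2) - (1.0 - 5.0 * gamma / (2.0 * b2)))
    return n_q * math.exp(-n * gamma ** 2 * eps_k / (8.0 * b2 * (1.0 + hx)))


K = matmul(adjoint(P, pi), P)
eps_K = 1.0 - second_eigenvalue(K, pi)
```

For this particular chain $K$ has all entries positive, so it is irreducible as the theorem requires, and `eps_K` can be passed directly to `bound_thm_3_3`.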

3.3. Continuous-time Markov chain. In this section, we consider an irreducible continuous-time Markov chain on a finite-state space. If $\pi$ denotes its unique stationary distribution, the weak law of large numbers states that, for any function $f$,

\[
P\Big[ \frac{1}{t} \int_0^t f(X_s)\, ds - \pi f \ge \gamma \Big] \to 0 \quad \text{as } t \to \infty.
\]

We can express the previous integral as the following limit:

\[
\frac{1}{t} \int_0^t f(X_s)\, ds = \lim_{k \to \infty} k^{-1} \sum_{i=1}^{k} f(X_{it/k}).
\]

Fix $k$ and assume that $\pi f = 0$, $\|f\|_\infty \le 1$ and $0 < \|f\|_2^2 \le b^2$. The Markov kernel $P(t/k)$ satisfies

\[
P(t/k)(x, y) > 0 \qquad \forall t > 0,\ \forall (x, y) \in G^2.
\]

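It follows that the Markov kernel $K(t/k) = P^*(t/k) P(t/k)$ is irreducible. Let $\beta_1(t/k)$ be the second largest eigenvalue of $K(t/k)$ and denote its spectral gap by $\epsilon(t/k) = 1 - \beta_1(t/k)$. Hence, for all $0 \le \gamma \le 1$, Theorem 3.3 implies

\[
P_q\Big[\, k^{-1} \sum_{i=1}^{k} f(X_{it/k}) \ge \gamma \Big] \le N_q \exp\Big[ -\frac{k\gamma^2 \epsilon(t/k)}{8b^2 \big( 1 + h(5\gamma/b^2) \big)} \Big]. \tag{12}
\]

If $\Lambda$ denotes the generator of the semigroup $P_t$, the generator of $P_t^*$ is $\Lambda^*$. So, we deduce that, for $k \to \infty$, $K(t/k) = I + (t/k)(\Lambda + \Lambda^*) + O(1/k^2)$ and $\epsilon(t/k) = (2t/k)\lambda_1 + O(1/k^2)$, where $\lambda_1$ denotes the smallest positive eigenvalue of $-(\Lambda + \Lambda^*)/2$. Now applying Fatou's lemma for $k \to \infty$ to inequality (12) gives the following theorem.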
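The limit $k\,\epsilon(t/k) \to 2t\lambda_1$, which turns the factor $8b^2$ of (12) into $4b^2$, can be seen explicitly on a two-state chain. The sketch below assumes a hypothetical chain with jump rates `a` (from state 0 to 1) and `b` (from 1 to 0); such a chain is reversible, so $\Lambda^* = \Lambda$, the nontrivial eigenvalue of $P(s)$ is $e^{-(a+b)s}$, and $\lambda_1 = a + b$. The function names are illustrative.

```python
import math


def discretized_gap(a, b, s):
    """Spectral gap of K(s) = P*(s) P(s) = P(s)^2 for the two-state chain.

    The second eigenvalue of K(s) is exp(-2 (a + b) s); expm1 keeps
    precision when s is small.
    """
    return -math.expm1(-2.0 * (a + b) * s)


def scaled_gap(a, b, t, k):
    """k * eps(t/k), which should approach 2 t lambda_1 as k grows."""
    return k * discretized_gap(a, b, t / k)


def lambda_1(a, b):
    """Smallest positive eigenvalue of -(Lambda + Lambda*)/2 for the two-state chain."""
    return a + b
```

With `a = 1`, `b = 2` and `t = 1`, `scaled_gap(1, 2, 1, k)` increases towards `2 * lambda_1(1, 2) = 6` as `k` grows, with an error of order $1/k$.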

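Theorem 3.4. Let $(P_t, \pi)$ be an irreducible continuous-time Markov chain on a finite set $G$, and $\Lambda$ its infinitesimal generator. Let $f: G \to \mathbb{R}$ be such that $\pi f = 0$, $\|f\|_\infty \le 1$ and $0 < \|f\|_2^2 \le b^2$. Then, for any initial distribution $q$, all $t > 0$ and all $0 < \gamma \le 1$,

\[
P_q\Big[ \frac{1}{t} \int_0^t f(X_s)\, ds \ge \gamma \Big] \le N_q \exp\Big[ -\frac{\gamma^2 \lambda_1 t}{4b^2 \big( 1 + h(5\gamma/b^2) \big)} \Big],
\]

where $\lambda_1$ is the smallest positive eigenvalue of $-(\Lambda + \Lambda^*)/2$ and

\[
h(x) = \tfrac{1}{2} \Big( \sqrt{1+x} - (1 - x/2) \Big).
\]

Finally, bounding $b^2$ by 1 gives the bound $N_q \exp(-\gamma^2 \lambda_1 t/12)$, which is valid for all $\gamma \le 1$.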

Remark. Applying Lemma 3.1 with the transition matrix $P(t/k)$ and using the following formula, valid for all matrices $A, B$,

\[
\lim_{k \to \infty} \big( \exp(A/k) \exp(B/k) \big)^k = \exp(A + B),
\]

gives

\[
P_q\Big[ \frac{1}{t} \int_0^t f(X_s)\, ds \ge \gamma \Big] \le e^{-rt\gamma} N_q\, \|\exp(\Lambda(r) t)\|_{2 \to 2},
\]

where $\Lambda(r) = \Lambda + r\, \mathrm{diag}(f(x))$. Thus, we get a linear perturbation of the infinitesimal generator $\Lambda$. Moreover, $\|\exp(\Lambda(r) t)\|_{2 \to 2} \le \exp(-\lambda_0(r) t)$, where $\lambda_0(r)$ is the smallest eigenvalue of $-(\Lambda + \Lambda^*)/2 - rD$. By using the Trotter product formula [see Trotter (1959)], this method can be extended to general state space ergodic Markov processes.

4. Direct method. This section develops another method to obtain an upper bound on the distribution function of the empirical measure. This method does not require the perturbation theory of linear operators and works easily in a more general setting.

From now on, we consider an ergodic Markov kernel $P(x, dy)$ defined on a general state space $G$, with the stationary distribution $\pi$. This kernel defines an operator $P$ acting on $L^2(\pi)$ by

\[
(Pg)(x) = \int_G g(y)\, P(x, dy).
\]

Assume now that the condition (3) is fulfilled by $P$. Our goal is to obtain an upper bound for $E_\pi \exp\big( r \sum_{i=1}^{n} f(X_i) \big)$, where $0 \le r$ and $f$ is such that $\|f\|_2^2 \le b^2$, $\pi f = 0$ and $\|f\|_\infty \le 1$. Write $g = \exp(rf)$; then

\[
E_\pi \exp\Big( r \sum_{i=1}^{n} f(X_i) \Big) = \int Q^n g\, d\pi,
\]

where, for all $h \in L^2(\pi)$,

\[
Q^1 h(x) = \int P(x, dy)\, h(y), \qquad Q^{n+1} h(x) = \int P(x, dy)\, g(y)\, Q^n h(y).
\]

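The principle of the proof, due to Bakry and Ledoux, consists of centering each successive term on which the operator $P$ acts. Introduce some notation:

\[
a_0 = 1, \qquad a_i = E_\pi \exp\Big( r \sum_{j=1}^{i} f(X_j) \Big), \quad 1 \le i \le n;
\]

\[
g_1 = g - \int g\, d\pi, \qquad g_i = g P g_{i-1} - \int g P g_{i-1}\, d\pi, \quad 2 \le i \le n;
\]

\[
b_1 = \int g\, d\pi, \qquad b_i = \int g P g_{i-1}\, d\pi = \int g_1 P g_{i-1}\, d\pi, \quad 2 \le i \le n;
\]

then

\[
\begin{aligned}
a_n &= a_{n-1} b_1 + \int Q^n g_1\, d\pi \\
&= a_{n-1} b_1 + a_{n-2} b_2 + \int Q^{n-1} g_2\, d\pi \\
&\;\;\vdots \\
&= \sum_{i=1}^{n} b_i a_{n-i}.
\end{aligned}
\]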

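Now, each $b_i$ can be bounded by using (3), since the operator $P$ acts on the centered functions $g_{i-1}$. Actually, let $\alpha = \beta e^r$, $a = b^2 e^r/2$; then

\[
b_1 = \int \exp(rf)\, d\pi = 1 + \frac{r^2}{2} E_\pi f^2 + \frac{r^3}{3!} E_\pi f^3 + \cdots \le 1 + a r^2, \qquad b_i \le \beta \alpha^{i-2} \|g_1\|_2^2, \quad 2 \le i < n,
\]

where we used the inequality $\|g_i\|_2 \le \alpha^{i-1} \|g_1\|_2$, $1 \le i \le n$. Furthermore, by Jensen's inequality, we get $\int \exp(rf)\, d\pi \ge 1$, so

\[
\|g_1\|_2^2 = \int \exp(2rf)\, d\pi - \Big( \int \exp(rf)\, d\pi \Big)^2 \le 1 + 2r^2 e^{2r} E_\pi f^2 - 1 \le 4 a r^2 e^r.
\]

Choose $r < 1 - \beta$, so $\alpha < 1$, since $\log \beta \le -(1 - \beta)$. Now, as induction hypothesis, suppose that $a_k \le \varphi^k(r)$, where $\varphi(r) = 1 + C r^2$ and $C \ge 4a$ is independent of $k$. As

\[
a_1 \le \|g\|_2 \le (1 + 4 a r^2)^{1/2} \le 1 + 2 a r^2,
\]

the hypothesis is true for $k = 1$ and the induction hypothesis gives

\[
\begin{aligned}
a_k &\le (1 + a r^2)\, \varphi^{k-1}(r) + 4 a r^2 \alpha^{k-1} \sum_{i=0}^{k-2} \big( \varphi(r)/\alpha \big)^i \\
&\le (1 + a r^2)\, \varphi^{k-1}(r) + 4 a r^2\, \frac{\alpha\, \varphi^{k-1}(r)}{\varphi(r) - \alpha} \\
&\le \varphi^{k-1}(r) \big( 1 + 4 a r^2 (1 - \alpha)^{-1} \big).
\end{aligned}
\]

Choosing $C = 4a(1 - \alpha)^{-1}$, it follows that the induction hypothesis is valid.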

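Hence, for every $0 \le \gamma < 4b^2$ and $r = \gamma(1 - \beta)/4b^2$, we obtain, by Markov's inequality,

\[
\begin{aligned}
P_\pi[\, t_n/n \ge \gamma ] &\le \exp\Big[ -n \big( r\gamma - 2b^2 r^2 e^r (1 - \beta e^r)^{-1} \big) \Big] \\
&= \exp\Big[ -\frac{n(1 - \beta)\gamma^2}{8b^2} \Big( 2 - \frac{(1 - \beta) \exp\big( \gamma(1 - \beta)/4b^2 \big)}{1 - \beta \exp\big( \gamma(1 - \beta)/4b^2 \big)} \Big) \Big].
\end{aligned}
\]

Assume now that $\gamma \le 2b^2$. As $e^x \le 1 + 4x/3$ for $x \le 1/2$, we get the following:

\[
P_\pi[\, t_n/n \ge \gamma ] \le \exp\Big[ -\frac{n(1 - \beta)\gamma^2}{8b^2} \Big( 1 - \frac{\gamma}{2b^2} \Big) \Big]. \tag{13}
\]

This method is directly applicable to general Markov chains and gives the same type of asymptotic bound as the one obtained by the perturbation method for finite nonreversible Markov chains. Observe, however, that for finite reversible Markov chains the bound (13) involves the constant $\beta$, instead of the second largest eigenvalue of the kernel $P$.

Inequality (13) can be extended to continuous time by assuming that $f$ is continuous and bounded and

\[
\sup\big\{ \|P_t g\|_2/\|g\|_2 : \pi g = 0 \big\} \le e^{-t\lambda} \qquad \forall t \ge 0, \text{ with } \lambda > 0. \tag{14}
\]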

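Arguing as in the proof of Theorem 3.4, we get that, for every $\gamma \le 2b^2$,

\[
P_\pi\Big[ \frac{1}{t} \int_0^t f(X_s)\, ds \ge \gamma \Big] \le \exp\Big[ -\frac{\gamma^2 t \lambda}{8b^2} \Big( 1 - \frac{\gamma}{2b^2} \Big) \Big].
\]

An exponential inequality applicable for all $0 \le \gamma \le 1$ can be obtained by following the proof of Prokhorov's inequality for independent random variables given in Stout (1974). We obtain the following inequality, when $q = \pi$, $E_\pi f = 0$, $\|f\|_\infty \le a$ and $\|f\|_2^2 \le b^2$:

\[
P_\pi[\, t_n/n \ge \gamma ] \le \exp\Big[ -\frac{\gamma}{2a}\, \mathrm{arcsinh}\Big( \frac{n\gamma a(1 - \beta)}{4b^2} \Big) \Big].
\]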

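Proof. For every $x \ge 0$, $e^x \ge ex$. Therefore, $E_\pi \exp(r t_n) \le \exp\big( E_\pi \exp(r t_n) - 1 \big)$. Let $Q(x) = P_\pi[\, t_n \le x ]$; we get

\[
E_\pi \exp(r t_n) \le \exp\Big[ \int \big( \exp(rx) - 1 \big)\, dQ(x) \Big].
\]

Moreover, $Q$ assigns mass 1 to the interval $[-na, na]$ and

\[
\int x\, dQ(x) = 0, \qquad \int x^2\, dQ(x) = E_\pi t_n^2 \le 2nb^2 (1 - \beta)^{-1}.
\]

Thus,

\[
E_\pi \exp(r t_n) \le \exp\Big[ \int \big( \exp(rx) - 1 - rx \big)\, dQ(x) \Big] \le \exp\Big[ 2 \int \big( \cosh(rx) - 1 \big)\, dQ(x) \Big].
\]

As the function $h(x) = \cosh(rx) - 1$ is even and convex, we obtain

\[
\int h(x)\, dQ(x) \le \int h(x)\, dQ_1(x),
\]

where $Q_1$ assigns mass only to 0 and to $na$, with as much mass as possible assigned to $na$ without violating

\[
\int x^2\, dQ_1(x) \le 2nb^2 (1 - \beta)^{-1};
\]

thus, $Q_1(na) \le 2b^2 \big( n(1 - \beta) a^2 \big)^{-1}$. This yields

\[
E_\pi \exp(r t_n) \le \exp\Big[ \frac{4b^2}{na^2(1 - \beta)} \big( \cosh(rna) - 1 \big) \Big].
\]

Hence, for any $r \ge 0$,

\[
P_\pi[\, t_n/n \ge \gamma ] \le \exp\Big[ -n\gamma r + \frac{4b^2}{na^2(1 - \beta)} \big( \cosh(rna) - 1 \big) \Big].
\]

Differentiation shows that

\[
r = (na)^{-1}\, \mathrm{arcsinh}\big( na\gamma(1 - \beta)(4b^2)^{-1} \big) := (na)^{-1} \theta
\]

minimizes the right-hand side of the above inequality. It suffices to show $\cosh\theta - 1 \le \theta \sinh\theta/2$. But the above inequality can easily be deduced from

\[
\forall x \in \mathbb{R}, \qquad e^x(2 - x) + e^{-x}(2 + x) \le 4.
\]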
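The last two steps of this proof lend themselves to a quick numerical sanity check. The sketch below is illustrative and uses hypothetical function names and parameter values: it evaluates the exponent $-n\gamma r + \frac{4b^2}{na^2(1-\beta)}(\cosh(rna) - 1)$ at the stated minimizer and compares it with the final exponent $-\frac{\gamma}{2a}\,\mathrm{arcsinh}\big(\frac{n\gamma a(1-\beta)}{4b^2}\big)$, and it checks the elementary inequality $\cosh\theta - 1 \le \theta\sinh\theta/2$ on a grid.

```python
import math


def exponent(r, n, gamma, a, b2, beta):
    """-n gamma r + 4 b^2 / (n a^2 (1 - beta)) * (cosh(r n a) - 1)."""
    c = 4.0 * b2 / (n * a * a * (1.0 - beta))
    return -n * gamma * r + c * (math.cosh(r * n * a) - 1.0)


def r_opt(n, gamma, a, b2, beta):
    """Minimizer r = theta / (n a) with theta = arcsinh(n a gamma (1 - beta) / (4 b^2))."""
    theta = math.asinh(n * a * gamma * (1.0 - beta) / (4.0 * b2))
    return theta / (n * a), theta


def final_exponent(n, gamma, a, b2, beta):
    """Exponent of the Prokhorov-type bound: -(gamma / 2a) * theta."""
    _, theta = r_opt(n, gamma, a, b2, beta)
    return -gamma * theta / (2.0 * a)


def elementary_gap(theta):
    """theta sinh(theta)/2 - (cosh(theta) - 1); nonnegative for every real theta."""
    return theta * math.sinh(theta) / 2.0 - (math.cosh(theta) - 1.0)
```

With, say, `n = 100`, `gamma = 0.1`, `a = 1`, `b2 = 0.5` and `beta = 0.5`, the optimized exponent is smaller (more negative) than `final_exponent(...)`, consistent with the inequality used in the last step.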

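Another proof, due to Ledoux, can be developed for continuous-time Markov chains under the weaker assumption that the function $f$ is bounded and measurable and that hypothesis (14) is fulfilled. Let $Y_t = \int_0^t f(X_s)\, ds$; then, for all $k \ge 1$,

\[
Y_t^k = k! \int_0^t ds_1 \int_{s_1}^t ds_2 \cdots \int_{s_{k-1}}^t f(X_{s_1}) f(X_{s_2}) \cdots f(X_{s_k})\, ds_k,
\]

\[
E_\pi Y_t^k = k! \int_0^t du_1 \int_0^{t - u_1} du_2 \cdots \int_0^{t - (u_1 + \cdots + u_{k-1})} du_k \int f P_{u_2}\big( f \cdots (f P_{u_k} f) \cdots \big)\, d\pi.
\]

Let $a_{2, \ldots, k} = \int f P_{u_2}\big( f \cdots (f P_{u_k} f) \cdots \big)\, d\pi$; then, using (14) and the centering method gives the inequality

\[
a_{2, \ldots, k} \le \|f\|_2^2 \Big[ \exp\big( -(u_2 + \cdots + u_k)\lambda \big) + a_2 \exp\big( -(u_4 + \cdots + u_k)\lambda \big) + \cdots + a_{2, \ldots, k-2} \exp(-u_k \lambda) \Big].
\]

An iteration of this inequality shows that $a_{2, \ldots, k}$ is bounded by a sum of at most $2^{k-1}$ terms of the form $(b^2)^{s+1} \exp\big( -(m_2 + \cdots + m_k)\lambda \big)$, where $m_i = 0$ or $u_i$ and $s$ is the number of $m_i \ne u_i$, with $0 \le s \le k/2 - 1$. This implies

\[
E_\pi \exp(r Y_t) \le 1 + \sum_{k=2}^{\infty} r^k 2^{k-1} \sum_{s=1}^{k/2} \frac{(b^2 t)^s}{s!}\, \lambda^{s-k} \le 1 + (2 - 4r/\lambda)^{-1} \sum_{s=1}^{\infty} \frac{(4b^2 r^2 t/\lambda)^s}{s!},
\]

so long as $2r/\lambda < 1$. For $2r/\lambda \le 1/2$, this yields $E_\pi \exp(r Y_t) \le \exp(4b^2 t r^2 \lambda^{-1})$.

Optimizing in $r \le \lambda/4$ together with Markov's inequality (the choice $r = \gamma\lambda/8b^2$ is admissible exactly when $\gamma \le 2b^2$), it follows that, for every $0 \le \gamma \le 2b^2$,

\[
P_\pi\Big[ \frac{1}{t} \int_0^t f(X_s)\, ds \ge \gamma \Big] \le \exp\Big[ -\frac{\gamma^2 t \lambda}{16 b^2} \Big].
\]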

5. Some reflections on simulation. This last section shows how Theorem 1.1 can be combined with quantification of the closeness to stationarity, in order to compute the simulation time required to obtain a sufficiently accurate estimate. More precisely, we will consider the algorithm suggested by Aldous (1987).

The constant $N_q$. As has already been said, the coefficient $N_q = \|q/\pi\|_2$ may be large but is always bounded by $\pi_*^{-1/2}$, where $\pi_* = \min\{\pi(y) : y \in G\}$. Furthermore, $N_q = 1$ when $q = \pi$. This fact suggests that it might be better to begin the computation of $t_n$ only when the distribution of the Markov chain is close to $\pi$. For instance, if the Markov chain is ergodic (i.e., irreducible and aperiodic), the distribution $\pi_n$ of $X_n$ converges to $\pi$ exponentially as $n$ tends to infinity. In that case, the simulation can be split into two phases. During the first one, the Markov chain approaches the stationary distribution sufficiently closely to eliminate the "bias" from the initial position $X_0$, while the second one is the period of time necessary before $t_n$ is a reasonable estimate of $\pi f$.

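Let $\pi_n$ be the distribution of $X_n$ for the initial distribution $q$, and

\[
N_n^2 = \|\pi_n/\pi\|_2^2 = \sum_{x} \pi(x) \Big( \sum_{y} q(y) P^n(y, x)/\pi(x) \Big)^2.
\]

Then, the time until the distribution is close to stationary can be formalized by a parameter $\tau$, defined as

\[
\tau = \min\big\{ n : N_n \le \sqrt{1 + 1/e^2} \big\} = \min\big\{ n : \|\pi_n/\pi - 1\|_2 \le 1/e \big\}.
\]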

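From now on, we consider reversible and aperiodic Markov chains. Instead of working with $N_n$, we will use the chi-square distance $\|\pi_n/\pi - 1\|_2$. We have

\[
\begin{aligned}
\|\pi_n/\pi - 1\|_2^2 &\le \sum_{x, y} \pi(x) q(y) \big( P^n(y, x)/\pi(x) - 1 \big)^2 \\
&\le \sup_{y} \|P^n(y, \cdot)/\pi(\cdot) - 1\|_2^2 = \|P^n - \pi\|_{2 \to \infty}^2 \\
&\le \|P^{n_1}\|_{2 \to \infty}^2\, \|P^{n_2} - E_\pi\|_{2 \to 2}^2,
\end{aligned}
\]

where the first inequality follows from the Cauchy–Schwarz inequality and $n_1 + n_2 = n$. From the last inequality, we deduce

\[
\|\pi_n/\pi - 1\|_2 \le \min_{n_1 + n_2 = n} D(n_1)\, \|P^{n_2} - E_\pi\|_{2 \to 2} \le (1/\pi_* - 1)^{1/2} \mu(n), \tag{15}
\]

where $D(n_1) = \|P^{n_1}\|_{2 \to \infty}$ and $\mu(n) = \|P^n - E_\pi\|_{2 \to 2}$. Finally, an upper bound for $\mu(n)$ is obtained by using the inequality $\mu(n) \le \beta_*(P)^n$, where

\[
\beta_*(P) = \max\big\{ |\beta_{|G|-1}(P)|,\ \beta_1(P) \big\}.
\]

See Diaconis and Saloff-Coste (1993) and Fill (1991) for more details. Combining this inequality with (15) gives

\[
\|\pi_n/\pi - 1\|_2 \le (1/\pi_* - 1)^{1/2} \beta_*(P)^n. \tag{16}
\]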

We are interested in determining the sufficient total number of steps for the simulation, $n_e = \tau + n$, such that

\[
P_q\Big[\, n^{-1} \sum_{i=\tau+1}^{\tau+n} f(X_i) \ge \gamma \Big] \le \alpha, \tag{17}
\]

where $\alpha < 1$ is a small constant such as 0.05, for instance. Then, with the setup of Theorem 1.1, the inequality (2) stated in Remark 3 gives

\[
P_q\Big[\, n^{-1} \sum_{i=\tau+1}^{\tau+n} f(X_i) \ge \gamma \Big] \le 2 \exp\big[ -n\gamma^2 \epsilon(P)/12 \big]. \tag{18}
\]

Moreover, different techniques can be used to get a bound on $\tau$, such as eigenvalues and geometric techniques or coupling [see Aldous and Fill, Fill (1991), Diaconis and Saloff-Coste (1996a, b)]. For instance, the Poincaré inequality gives an upper bound for $\mu(n)$ [see Diaconis and Saloff-Coste (1996a)], whereas, when applicable, Nash and Log-Sobolev inequalities allow us to bound $D(n)$ [see Diaconis and Saloff-Coste (1996b)].
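As an illustration of how (16) and (18) combine into a simulation budget, the sketch below computes a burn-in length from the eigenvalue bound (16) and an observation length from (18). It is a minimal sketch under assumptions: the function names are hypothetical, $\pi_*$ is assumed to satisfy $\pi_* < 1/2$ so that the logarithm is defined and positive, and the burn-in returned is only the sufficient value implied by (16), which may be far larger than the true $\tau$.

```python
import math


def burn_in(pi_min, beta_star):
    """Smallest n with (1/pi_min - 1)^(1/2) * beta_star^n <= 1/e, from (16).

    Assumes 0 < pi_min < 1/2 and 0 < beta_star < 1.
    """
    need = 1.0 + 0.5 * math.log(1.0 / pi_min - 1.0)
    return max(0, math.ceil(need / -math.log(beta_star)))


def observation_steps(alpha, gamma, eps):
    """Smallest n with 2 exp(-n gamma^2 eps / 12) <= alpha, from (18)."""
    return math.ceil(12.0 * math.log(2.0 / alpha) / (gamma ** 2 * eps))


def total_steps(pi_min, beta_star, alpha, gamma, eps):
    """Sufficient n_e = tau + n under the two bounds above."""
    return burn_in(pi_min, beta_star) + observation_steps(alpha, gamma, eps)
```

For instance, `total_steps(2 ** -10, 0.9, 0.05, 0.1, 0.1)` gives a sufficient number of steps for a chain with $\pi_* = 2^{-10}$, $\beta_* = 0.9$ and spectral gap $0.1$; sharper bounds on $\tau$, such as the Log-Sobolev bound quoted in the binomial example, would shorten the first phase.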

Examples. The following examples deal with the Metropolis algorithm, a widely used tool in simulation. Consider $G = \{0, 1, \ldots, N\}$ and $\pi$ a fixed distribution. We would like to construct an irreducible Markov chain $M(x, y)$ whose stationary distribution is $\pi$. The Metropolis algorithm begins with a base chain $P(x, y)$ on $G$ which is modified by an auxiliary randomization. We will assume that $P$ is irreducible and aperiodic and that $P(x, y) > 0$ implies $P(y, x) > 0$. Let

\[
A(x, y) = \begin{cases} \pi(y) P(y, x)/\big( \pi(x) P(x, y) \big), & \text{if } P(x, y) > 0, \\ 0, & \text{otherwise.} \end{cases}
\]

Formally, the Markov kernel $M$ is

\[
M(x, y) = \begin{cases}
P(x, y), & \text{if } A(x, y) \ge 1,\ x \ne y, \\
P(x, y) A(x, y), & \text{if } A(x, y) < 1, \\
P(x, y) + \sum_{z : A(x, z) < 1} P(x, z) \big( 1 - A(x, z) \big), & \text{if } x = y.
\end{cases}
\]
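The construction of $M$ from a base chain translates directly into code. The following sketch is illustrative: the function names are hypothetical, the diagonal entry is computed as one minus the off-diagonal row sum (which is algebraically the same as the third case of the definition), and the base chain and target used in the usage note are the nearest-neighbor walk with holding probability 1/2 at the end points and the binomial distribution, taken here as assumptions for the example.

```python
import math
from math import comb


def metropolis(P, pi):
    """Metropolis kernel M built from a base chain P and a target distribution pi."""
    n = len(P)
    M = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y and P[x][y] > 0.0:
                A = pi[y] * P[y][x] / (pi[x] * P[x][y])
                M[x][y] = P[x][y] * min(1.0, A)    # accept with probability min(1, A)
        # rejected moves stay at x: M(x,x) = P(x,x) + sum over A<1 of P(x,z)(1 - A(x,z))
        M[x][x] = 1.0 - sum(M[x][y] for y in range(n) if y != x)
    return M


def nearest_neighbor(N):
    """Nearest-neighbor walk on {0, ..., N} holding with probability 1/2 at the ends."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for x in range(1, N):
        P[x][x - 1] = P[x][x + 1] = 0.5
    P[0][0] = P[0][1] = 0.5
    P[N][N - 1] = P[N][N] = 0.5
    return P


def binomial_pi(N):
    """pi(x) = 2^(-N) * binom(N, x)."""
    return [comb(N, x) / 2.0 ** N for x in range(N + 1)]


def is_reversible(M, pi, tol=1e-12):
    """Check detailed balance pi(x) M(x,y) = pi(y) M(y,x)."""
    n = len(M)
    return all(abs(pi[x] * M[x][y] - pi[y] * M[y][x]) <= tol
               for x in range(n) for y in range(n))
```

Calling `metropolis(nearest_neighbor(5), binomial_pi(5))` produces a kernel whose rows sum to one and which satisfies detailed balance with respect to the binomial distribution.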


We can easily prove that $M$ is reversible and aperiodic. For the present examples, we take the base chain to be the nearest-neighbor random walk

\[
P(x, x+1) = P(x, x-1) = 1/2, \quad 1 \le x \le N-1,
\]

\[
P(0, 1) = P(0, 0) = P(N, N-1) = P(N, N) = 1/2.
\]

Binomial distribution. The stationary distribution is $\pi(x) = 2^{-N} \binom{N}{x}$. The standard Metropolis construction gives [see Diaconis and Saloff-Coste (1996a)]

\[
M(x, y) = \begin{cases}
1/2, & \text{if } y = x+1,\ 0 \le x \le (N-1)/2, \text{ or } y = x-1,\ (N+1)/2 \le x \le N, \\[2pt]
\dfrac{x}{2(N-x+1)}, & \text{if } y = x-1,\ 1 \le x \le (N+1)/2, \\[6pt]
\dfrac{N-x}{2(x+1)}, & \text{if } y = x+1,\ (N-1)/2 \le x \le N-1, \\[6pt]
\dfrac{N-2x+1}{2(N-x+1)}, & \text{if } y = x,\ 0 \le x \le (N+1)/2, \\[6pt]
\dfrac{2x-N+1}{2(x+1)}, & \text{if } y = x,\ (N-1)/2 \le x \le N.
\end{cases}
\]

Here $1/N \le 1 - \beta_1(M) \le 2/N$, $\pi_* = 2^{-N}$, and $\beta_N(M) \ge -1 + 2 \min M(x, x) \ge -1/2$. Furthermore, it is shown in Diaconis and Saloff-Coste (1996a), by using the Log-Sobolev inequality, that

\[
\|M^n(x, \cdot)/\pi - 1\|_2 \le (1 + 2e^2)^{1/2} e^{-c} \quad \text{for } n \ge (N/2)(\log N + 2c) + 1.
\]

In other words, we find that approximate equilibrium is reached for $\tau$ of the order of $N \log N$, whereas inequality (16) asserts approximate randomness for $\tau$ of order $N^2$. We thus see, using inequality (2) (i.e., with $\tau = 0$), that order $N/\gamma^2$ steps are sufficient for condition (17) to hold, whereas inequality (18) improves the running time, since we get $n_e$ of the order of $N(\log N + 1/\gamma^2)$.

Exponential fall-off. In this example, given in Diaconis and Saloff-Coste (1995), the stationary distribution is

\[
\pi(i) = z(a)\, a^{h(i)}, \qquad 0 < a < 1, \quad z \text{ the normalizing constant.}
\]

Then, assuming

\[
h(i+1) - h(i) \ge c \ge 1, \qquad 0 \le i \le N-1,
\]

the Metropolis chain is given by

\[
M(i, i-1) = 1/2, \qquad M(i, i) = 2^{-1}\big( 1 - a^{h(i+1) - h(i)} \big), \qquad M(i, i+1) = 2^{-1} a^{h(i+1) - h(i)}, \qquad 1 \le i \le N-1,
\]

\[
M(0, 0) = 1 - 2^{-1} a^{h(1) - h(0)}, \qquad M(0, 1) = 2^{-1} a^{h(1) - h(0)}, \qquad M(N, N-1) = M(N, N) = 1/2.
\]

Here, the second eigenvalue of this chain satisfies $\beta_1 \le 1 - 2^{-1}(1 - a^{c/2})^2$. See Diaconis and Saloff-Coste (1995) for the proof based on a Poincaré inequality. Moreover, $\beta_N \ge -1 + 2 \min M(x, x) \ge -1 + 2(1/2 - a^c/2) = -a^c$. Thus,

\[
\beta_* \le \max\big\{ 1 - 2^{-1}(1 - a^{c/2})^2,\ a^c \big\}. \tag{19}
\]

Moreover, inequality (19) says order $N$ steps are sufficient to reach stationarity. It follows that, using inequality (18), condition (17) holds for $n_e$ of the order of $N$, whereas applying inequality (2) gives $n_e$ of the order of $N/\gamma^2$.

Acknowledgments. Dominique Bakry, Laurent Saloff-Coste, Persi Diaconis and Michel Ledoux were exceptionally helpful with this paper. I would also like to thank the referee and Associate Editor for suggesting a number of improvements.

REFERENCES

Aldous, D. (1987). On the Markov chain simulation method for uniform combinatorial distributions and simulated annealing. Prob. Engrg. Inform. Sci. 1 33–46.

Aldous, D. and Fill, J. Reversible Markov chains and random walks on graphs. Unpublished manuscript.

Bahadur, R. R. and Ranga Rao, R. (1960). On deviations of the sample mean. Ann. Math. Statist. 31 1015–1033.

Bennett, G. (1962). Probability inequalities for sums of independent random variables. J. Amer. Statist. Assoc. 57 33–45.

Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23 493–507.

Cramér, H. (1938). Sur un nouveau théorème de la théorie des probabilités. Actualités Sci. Indust. 736.

Dembo, A. and Zeitouni, O. (1993). Large Deviations Techniques and Applications. Jones and Bartlett, Boston.

Diaconis, P. and Saloff-Coste, L. (1993). Comparison theorems for reversible Markov chains. Ann. Appl. Probab. 3 696–730.

Diaconis, P. and Saloff-Coste, L. (1995). What do we know about the Metropolis algorithm? Unpublished manuscript.

Diaconis, P. and Saloff-Coste, L. (1996a). Logarithmic Sobolev inequalities and finite Markov chains. Ann. Appl. Probab. 6 695–750.

Diaconis, P. and Saloff-Coste, L. (1996b). Nash inequalities for finite Markov chains. J. Appl. Probab. 9 459–510.

Dinwoodie, I. (1995). A probability inequality for the occupation measure of a reversible Markov chain. Ann. Appl. Probab. 5 37–43.

Fill, J. (1991). Eigenvalue bounds on convergence to stationarity for nonreversible Markov chains, with an application to the exclusion process. Ann. Appl. Probab. 1 62–87.

Gillman, D. (1993). Hidden Markov chains: rates of convergence and the complexity of inference. Ph.D. dissertation, MIT.

Kato, T. (1966). Perturbation Theory for Linear Operators. Springer, New York.

Kolmogorov, A. (1929). Über das Gesetz des iterierten Logarithmus. Math. Ann. 99 309–319.

Mann, B. (1996). Berry–Esseen central limit theorem for Markov chains. Ph.D. dissertation, Harvard Univ.

Marshall, A. W. and Olkin, I. (1979). Inequalities: Theory of Majorization and Its Applications. Academic Press, New York.

Nagaev, S. V. (1957). Some limit theorems for stationary Markov chains. Theory Probab. Appl. 2 378–406.

Nagaev, S. V. (1961). More exact statements of limit theorems for homogeneous Markov chains. Theory Probab. Appl. 6 62–81.

Stout, W. F. (1974). Almost Sure Convergence. Academic Press, New York.

Trotter, H. F. (1959). On the product of semi-groups of operators. Proc. Amer. Math. Soc. 10 545–551.

Centre d'Etudes de la Navigation Aérienne, 31055 Toulouse Cedex

and

Lab. Statistique et Probabilité, Université Paul Sabatier, 31055 Toulouse Cedex, France

E-mail:[email protected]