
Chapter 1

A Warming-up Example

The purpose of this chapter is to present, in the simplest possible context, some of the ideas that will appear recurrently in this book. We assume that the reader is familiar with the basic theory of Markov chains (e.g. Chapter 7 of Breiman [1968] or Chapter 5 of Durrett [1996]) and with the spectral theory of bounded symmetric operators (Section 107 in Riesz and Sz.-Nagy [1990], Section XI.6 in Yosida [1995]).

Consider a Markov chain $\{X_j : j \ge 0\}$ on a countable state space $E$, stationary and ergodic with respect to a probability measure $\pi$. The problem is to find necessary and sufficient conditions on a function $V : E \to \mathbb{R}$ to guarantee a central limit theorem for
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} V(X_j) \,. \tag{1.1}$$
We assume that $\langle V \rangle_\pi = 0$. The idea is to relate this question to the well known martingale central limit theorems.

Denote by $P$ the transition probability of the Markov chain and fix a function $V$ in $L^2(\pi)$, the space of functions $f : E \to \mathbb{R}$ square integrable with respect to $\pi$. Assume the existence of a solution of the Poisson equation
$$V = (I - P) f \tag{1.2}$$
for some function $f$ in $L^2(\pi)$, where $I$ stands for the identity. For $j \ge 1$, let
$$Z_j = f(X_j) - (P f)(X_{j-1}) \,.$$
It is easy to check that $M_0 = 0$, $M_N = \sum_{1 \le j \le N} Z_j$ is a martingale with respect to the filtration $\{\mathcal{F}_j : j \ge 0\}$, $\mathcal{F}_j = \sigma(X_0, \ldots, X_j)$, and that
$$\sum_{j=0}^{N-1} V(X_j) = M_N - f(X_N) + f(X_0) \,. \tag{1.3}$$


Since $f$ is square integrable, the last two terms divided by $N^{1/2}$ vanish as $N \uparrow \infty$, and the central limit theorem for $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ follows from the central limit theorem for the martingale $M_N$.

If the Markov chain has good mixing properties which guarantee the convergence of the series $\sum_{j \ge 0} P^j V$ in $L^2(\pi)$, the Poisson equation (1.2) has a solution given by $f = \sum_{j \ge 0} P^j V$. Unfortunately, as will be seen in the book, this is not the typical situation. Nevertheless, the central limit theorem can be established under weaker conditions by approximating $V$, in a proper norm, by functions in the range of $I - P$.
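As a minimal numerical sketch of the decomposition (1.3), consider a hypothetical 3-state chain (the transition matrix, the observable V and all variable names below are illustrative choices, not taken from the text): the Poisson equation becomes a small linear system which can be solved directly, and identity (1.3) can be verified along a simulated trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small 3-state transition matrix (hypothetical example).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

# Invariant measure pi: left eigenvector of P for the eigenvalue 1.
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

# A mean-zero observable V (centred with respect to pi).
V = np.array([1.0, -0.5, 2.0])
V = V - pi @ V

# Solve the Poisson equation V = (I - P) f.  Since I - P kills constants,
# use the pseudo-inverse; the solution is determined up to an additive
# constant, which does not affect identity (1.3).
f = np.linalg.pinv(np.eye(3) - P) @ V

# Simulate the chain and check (1.3): sum_{j<N} V(X_j) = M_N - f(X_N) + f(X_0).
N = 10_000
X = np.empty(N + 1, dtype=int)
X[0] = rng.choice(3, p=pi)
for j in range(N):
    X[j + 1] = rng.choice(3, p=P[X[j]])

Pf = P @ f
Z = f[X[1:]] - Pf[X[:-1]]          # martingale increments Z_j, j = 1..N
lhs = V[X[:-1]].sum()              # sum_{j=0}^{N-1} V(X_j)
rhs = Z.sum() - f[X[-1]] + f[X[0]]
print("identity (1.3) holds:", np.isclose(lhs, rhs))
```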

The optimal situation is achieved in the special case where the invariant state $\pi$ is reversible. It is shown in Theorem 1.9, the main result of this chapter, that in this case the finiteness of the limit variance
$$\sigma^2(V) = \lim_{N \to \infty} \frac{1}{N} \, \mathbb{E}_\pi\Big[ \Big( \sum_{j=0}^{N-1} V(X_j) \Big)^2 \Big] \tag{1.4}$$
is a necessary and sufficient condition for a central limit theorem for (1.1). Non-reversible chains require a deeper analysis, presented in the next chapter.

The material is organized as follows. In Section 1.1 we introduce some terminology and prove a few elementary facts on Markov chains on countable state spaces. In the second section, we prove a central limit theorem for the sequence $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ assuming that the solution of the Poisson equation (1.2) belongs to $L^2(\pi)$. In Section 1.3 we prove a central limit theorem for a stationary and ergodic sequence of random variables whose partial sums form a square integrable martingale. In the fourth section we obtain necessary and sufficient conditions for the limit variance (1.4) to be finite. This computation leads us to introduce some Hilbert spaces associated to the transition probability of the Markov chain, which are examined in detail in Section 1.6. In Section 1.5 we prove a central limit theorem for the sequence $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ by showing that this sum can be approximated by a martingale.

1.1 Ergodic Markov Chains

We present in this section some elementary results on Markov chains. Fix a countable state space $E$ and a transition probability function $P : E \times E \to \mathbb{R}$:
$$P(x, y) \ge 0 \,, \quad x, y \in E \,, \qquad \sum_{y \in E} P(x, y) = 1 \,, \quad x \in E \,.$$
A sequence of random variables $\{X_j : j \ge 0\}$ defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and taking values in $E$ is a time-homogeneous Markov chain on $E$ if
$$\mathbb{P}[ X_{j+1} = y \mid X_j, \ldots, X_0 ] = P(X_j, y) \tag{1.5}$$
for all $j \ge 0$ and $y$ in $E$. $P(x, y)$ is called the probability of jumping from $x$ to $y$ in one step. Notice that it does not depend on time, which explains the terminology of a time-homogeneous chain. The law of $X_0$ is called the initial state of the chain.

Assume furthermore that on $(\Omega, \mathcal{F})$ we are given a family of measures $\mathbb{P}_z$, $z \in E$, each satisfying (1.5) and such that $\mathbb{P}_x[X_0 = x] = 1$. We call it a Markov family that corresponds to the transition probabilities $P(\cdot, \cdot)$. For a given probability measure $\mu$ on $E$, let $\mathbb{P}_\mu = \sum_{x \in E} \mu(x) \, \mathbb{P}_x$. Observe that $\mu$ is the initial state of the chain under $\mathbb{P}_\mu$. We shall denote by $\mathbb{E}_\mu$ the expectation with respect to that measure and by $\mathbb{E}_x$ the expectation with respect to $\mathbb{P}_x$.

The transition probability $P$ can be considered as an operator on $C_b(E)$, the space of (continuous) bounded functions on $E$. In this case, for $f$ in $C_b(E)$, $P f : E \to \mathbb{R}$ is defined by
$$(P f)(x) = \sum_{y \in E} P(x, y) f(y) = \mathbb{E}[ f(X_1) \mid X_0 = x ] \,. \tag{1.6}$$
We use the same notation $P$ for the transition probability and for the operator on $C_b(E)$. In the countable case, we can think of $P f$ as the product of the square matrix $P$ with the column vector $f$.

Let $\mu P$ be the state of the process at time 1 if it starts from $\mu$, i.e., the distribution at time 1 of the process starting at time 0 from $\mu$:
$$(\mu P)(x) = \mathbb{P}_\mu[X_1 = x] = \sum_{y \in E} \mu(y) P(y, x) \,.$$
In the countable case, we can think of $\mu P$ as the product of the row vector $\mu$ with the square matrix $P$.

For $n \ge 1$, we denote by $P^n$ the $n$-fold composition of $P$ with itself, so that
$$P^n(x, y) = \sum_{x_1, \ldots, x_{n-1} \in E} P(x, x_1) P(x_1, x_2) \cdots P(x_{n-2}, x_{n-1}) P(x_{n-1}, y)$$
for all $x$, $y$ in $E$. In particular,
$$(P^n f)(x) = \mathbb{E}_x[ f(X_n) ] \,, \qquad (\mu P^n)(x) = \sum_{y \in E} \mu(y) \, \mathbb{P}_y[X_n = x]$$
for all bounded functions $f$ and all probability measures $\mu$. Hence, $\mu P^n$ stands for the state of the process at time $n$ if it starts from $\mu$. By convention we let $P^0 = I$, the identity operator.

A probability measure $\pi$ is said to be stationary or invariant for the chain if $\pi P = \pi$. This happens if and only if the sequence of random variables $\{X_j : j \ge 0\}$ is stationary under $\mathbb{P}_\pi$. We do not assume the chain to be indecomposable. There might exist, in particular, more than one invariant measure. Denote by $E_\pi$ the expectation with respect to $\pi$, not to be confused with $\mathbb{E}_\pi$, the expectation with respect to $\mathbb{P}_\pi$.


We say that an invariant measure $\pi$ is ergodic if any bounded function $f$ satisfying $(I - P) f = 0$ is $\pi$-almost everywhere constant. It can be shown, see e.g. the proof of Theorem 7.16 in Breiman [1968], that $\pi$ is ergodic if and only if the sequence of random variables $\{X_j : j \ge 0\}$ is ergodic when considered over $(\Omega, \mathcal{F}, \mathbb{P}_\pi)$.

We may extend the domain of the definition of the operator $P$ given in (1.6) to $L^2(\pi)$, the space of $\pi$-square integrable functions. It is indeed clear, by the Schwarz inequality, that $P f$ defined by (1.6) belongs to $L^2(\pi)$ if $f$ does, since
$$\sum_{x \in E} \pi(x) [(P f)(x)]^2 = \sum_{x \in E} \pi(x) \Big\{ \sum_{y \in E} P(x, y) f(y) \Big\}^2 \le \sum_{x \in E} \pi(x) \sum_{y \in E} P(x, y) f(y)^2 = \sum_{y \in E} (\pi P)(y) f(y)^2 = \sum_{y \in E} \pi(y) f(y)^2$$
because $\pi$ is invariant. We have thus proved that $P$ is a contraction in $L^2(\pi)$:
$$\langle P f, P f \rangle_\pi \le \langle f, f \rangle_\pi \,, \tag{1.7}$$
where $\langle \cdot, \cdot \rangle_\pi$ stands for the scalar product in $L^2(\pi)$. Let $\| \cdot \|$ be the norm associated to the scalar product $\langle \cdot, \cdot \rangle_\pi$.
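A quick numerical check of (1.7) on the same hypothetical 3-state chain (illustrative only; the random test functions play the role of $f$):

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

# Invariant measure: normalised left Perron eigenvector of P.
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

def norm_pi(g):
    """L^2(pi) norm of a function g on the finite state space."""
    return np.sqrt(pi @ g**2)

# (1.7): P is a contraction in L^2(pi).
for _ in range(5):
    f = rng.normal(size=3)
    assert norm_pi(P @ f) <= norm_pi(f) + 1e-12
print("contraction (1.7) verified on random test functions")
```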

1.2 Almost Sure Central Limit Theorem for Ergodic Markov Chains

Consider a time-homogeneous irreducible (or indecomposable, in the terminology of Breiman [1968]) Markov chain $\{X_j : j \ge 0\}$ on a countable state space $E$ with transition probability function $P : E \times E \to \mathbb{R}_+$. Assume that there exists a stationary probability measure, denoted by $\pi$. By [Breiman, 1968, Theorem 7.16], $\pi$ is unique and ergodic. In particular, for any bounded function $g : E \to \mathbb{R}$ and any $x$ in $E$,
$$\lim_{N \to \infty} \frac{1}{N} \sum_{j=0}^{N-1} (P^j g)(x) = E_\pi[g] \,.$$
Fix a function $V : E \to \mathbb{R}$ in $L^2(\pi)$ which has mean zero with respect to $\pi$. We prove in this section a central limit theorem for the sequence $N^{-1/2} \sum_{j=0}^{N-1} V(X_j)$ assuming that the solution of the Poisson equation (1.2) belongs to $L^2(\pi)$. Under this hypothesis we obtain a central limit theorem which holds $\pi$-a.s. with respect to the initial state.

Theorem 1.1. Fix a function $V : E \to \mathbb{R}$ in $L^2(\pi)$ which has mean zero with respect to $\pi$. Assume that there exists a solution $f$ in $L^2(\pi)$ of the Poisson equation (1.2). Then, for all $x$ in $E$, as $N \uparrow \infty$,
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} V(X_j)$$
converges in $\mathbb{P}_x$-distribution to a mean zero Gaussian random variable with variance $\sigma^2(V) = E_\pi[f^2] - E_\pi[(P f)^2]$.

Proof. Fix a mean zero function $V$ in $L^2(\pi)$ and an initial state $x$ in $E$. By assumption, there exists a solution $f$ in $L^2(\pi)$ of the Poisson equation (1.2). Consider the sequence $\{Z_j : j \ge 1\}$ of random variables defined by
$$Z_j = f(X_j) - (P f)(X_{j-1}) \,. \tag{1.8}$$
The sequence $\{Z_j : j \ge 1\}$ is adapted to the natural filtration of the Markov chain, $\mathcal{F}_j = \sigma(X_0, \ldots, X_j)$. Let $M_0 = 0$, $M_j = \sum_{1 \le k \le j} Z_k$, $j \ge 1$. A simple computation shows that $\{M_j : j \ge 0\}$ is a martingale adapted to the filtration $\{\mathcal{F}_j : j \ge 0\}$.

Assume first that the solution $f$ of the Poisson equation (1.2) is bounded. In this case, the random variables $\{Z_j : j \ge 1\}$ are bounded. Thus, for $|\theta|$ small enough, we may define
$$A_j(\theta) := \log \mathbb{E}_x\big[ \exp\{ i \theta Z_j \} \mid \mathcal{F}_{j-1} \big] \,. \tag{1.9}$$
Fix $\theta$ in $\mathbb{R}$. An elementary computation shows that for all $N$ large enough,
$$\mathbb{E}_x\Big[ \exp\Big\{ (i \theta / \sqrt{N}) M_N - \sum_{j=1}^{N} A_j(\theta / \sqrt{N}) \Big\} \Big] = 1 \,.$$
It follows from a second order Taylor expansion that
$$\sum_{j=1}^{N} A_j(\theta / \sqrt{N}) = - \frac{\theta^2}{2N} \sum_{j=1}^{N} \mathbb{E}_x\big[ Z_j^2 \mid \mathcal{F}_{j-1} \big] + \frac{1}{N} R_N$$
for some random variable $R_N$ bounded above by a constant. Since
$$\mathbb{E}_x\big[ Z_j^2 \mid \mathcal{F}_{j-1} \big] = (P f^2)(X_{j-1}) - (P f)^2(X_{j-1}) \,,$$
by the ergodic theorem, as $N \uparrow \infty$, $\sum_{1 \le j \le N} A_j(\theta / \sqrt{N})$ converges $\mathbb{P}_x$-a.s. to $-(\theta^2/2) \, E_\pi[ (P f^2) - (P f)^2 ] = -(\theta^2/2) \, E_\pi[ f^2 - (P f)^2 ]$. In particular,
$$\lim_{N \to \infty} \mathbb{E}_x\big[ \exp\{ (i \theta / \sqrt{N}) M_N \} \big] = e^{-\theta^2 \sigma^2 / 2} \,,$$
where $\sigma^2 = E_\pi[ f^2 - (P f)^2 ]$. The central limit theorem for $(1/\sqrt{N}) \sum_{0 \le j < N} V(X_j)$ follows from this result and identity (1.3).

Assume now that $f$ belongs to $L^2(\pi)$. Let $\{f_n : n \ge 1\}$ be a sequence of bounded functions which converge to $f$ in $L^2(\pi)$. For a fixed $n \ge 1$, let $Z_j^{(n)} = f_n(X_j) - (P f_n)(X_{j-1})$, $j \ge 1$, and let $M_0^{(n)} = 0$, $M_j^{(n)} = \sum_{1 \le k \le j} Z_k^{(n)}$ be the martingale associated to the sequence $\{Z_j^{(n)} : j \ge 1\}$. By the first part of the proof, for every $n \ge 1$,
$$\lim_{N \to \infty} \mathbb{E}_x\big[ \exp\{ (i \theta / \sqrt{N}) M_N^{(n)} \} \big] = e^{-\theta^2 \sigma_n^2 / 2} \,,$$
where $\sigma_n^2 = E_\pi[ f_n^2 - (P f_n)^2 ]$.

Since $E_\pi[ f_n^2 - (P f_n)^2 ]$ converges to $E_\pi[ f^2 - (P f)^2 ]$ as $n \uparrow \infty$, to conclude the proof of the theorem we need to show that
$$\lim_{n \to \infty} \limsup_{N \to \infty} \Big| \mathbb{E}_x\big[ \exp\{ (i \theta / \sqrt{N}) M_N^{(n)} \} \big] - \mathbb{E}_x\big[ \exp\{ (i \theta / \sqrt{N}) M_N \} \big] \Big| = 0 \,.$$
Since $|\exp\{ix\} - \exp\{iy\}| \le |x - y|$, the previous difference is absolutely bounded by
$$\frac{\theta}{\sqrt{N}} \, \mathbb{E}_x\big[ \big| M_N^{(n)} - M_N \big| \big] \le \Big\{ \frac{\theta^2}{N} \, \mathbb{E}_x\big[ \big( M_N^{(n)} - M_N \big)^2 \big] \Big\}^{1/2} \,,$$
where we used the Schwarz inequality in the last step. Recalling the representation of the martingales $M_N$, $M_N^{(n)}$ in terms of the sequences $\{Z_j : j \ge 1\}$, $\{Z_j^{(n)} : j \ge 1\}$, by orthogonality of these variables in $L^2(\mathbb{P}_x)$, the expression inside braces is equal to
$$\frac{\theta^2}{N} \sum_{j=1}^{N} \mathbb{E}_x\big[ \big( Z_j^{(n)} - Z_j \big)^2 \big] \,.$$
Let $F_n = f_n - f$, so that $Z_j^{(n)} - Z_j = F_n(X_j) - (P F_n)(X_{j-1})$. With this notation the previous sum becomes
$$\frac{\theta^2}{N} \sum_{j=1}^{N} \Big\{ (P^j F_n^2)(x) - \big[ P^{j-1} (P F_n)^2 \big](x) \Big\} \,.$$
By the ergodic theorem this average converges to $\theta^2 \, E_\pi[ F_n^2 - (P F_n)^2 ]$. This expression vanishes as $n \uparrow \infty$ because $F_n$ converges to $0$ in $L^2(\pi)$. This proves the central limit theorem for the martingale $M_N$ and the theorem, in view of identity (1.3). ⊓⊔

In general there is no solution of the Poisson equation in $L^2(\pi)$, but one still expects a central limit theorem for $N^{-1/2} \sum_{j=0}^{N-1} V(X_j)$ if its variance remains finite. We prove such a result in the next sections under the assumption of reversibility of the stationary measure $\pi$. The approach relies on a central limit theorem for martingales, presented below.

1.3 Central Limit Theorem for Martingales

Fix a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and an increasing filtration $\{\mathcal{F}_j : j \ge 0\}$. Denote by $\mathbb{E}$ the expectation with respect to the probability measure $\mathbb{P}$. Let $\{Z_j : j \ge 1\}$ be a stationary and ergodic sequence of random variables adapted to the filtration $\{\mathcal{F}_j\}$ and such that
$$\mathbb{E}\big[ Z_1^2 \big] < \infty \,, \qquad \mathbb{E}\big[ Z_{j+1} \mid \mathcal{F}_j \big] = 0 \,, \quad j \ge 0 \,. \tag{1.10}$$
The variables $\{Z_j : j \ge 1\}$ are usually called martingale differences because the process $\{M_j : j \ge 0\}$ defined as $M_0 := 0$, $M_j := \sum_{1 \le k \le j} Z_k$, $j \ge 1$, is a zero-mean, square integrable martingale with respect to the filtration $\{\mathcal{F}_j : j \ge 0\}$.

Theorem 1.2. Let $\{Z_j : j \ge 1\}$ be a sequence of stationary, ergodic random variables satisfying (1.10). Then $N^{-1/2} \sum_{1 \le j \le N} Z_j$ converges in distribution, as $N \uparrow \infty$, to a Gaussian law with zero mean and variance $\sigma^2 = \mathbb{E}[Z_1^2]$.

Proof. We first consider the case where the martingale differences $\{Z_j\}$ are bounded; in this case the proof is elementary and relies on the ergodicity assumption. Suppose therefore that $|Z_1| \le C_0$, $\mathbb{P}$-a.s., for some finite constant $C_0$.

We begin by building exponential martingales. Since the $\{Z_j\}$ are martingale differences, $\mathbb{E}\big[ \sum_{j+1 \le k \le j+K} Z_k \mid \mathcal{F}_j \big] = 0$ for all $j \ge 0$, $K \ge 1$. Therefore, since $|e^{ix} - 1 - ix| \le x^2/2$, $x \in \mathbb{R}$, subtracting $\mathbb{E}\big[ i \theta \sum_{j+1 \le k \le j+K} Z_k \mid \mathcal{F}_j \big]$ from the expression on the left hand side of the next formula we obtain that
$$\Big| \mathbb{E}\Big[ \exp\Big\{ i \theta \sum_{k=j+1}^{j+K} Z_k \Big\} \,\Big|\, \mathcal{F}_j \Big] - 1 \Big| \le \frac{\theta^2}{2} \, \mathbb{E}\Big[ \Big( \sum_{k=j+1}^{j+K} Z_k \Big)^2 \,\Big|\, \mathcal{F}_j \Big] \,.$$
Since the variables $\{Z_j\}$ are martingale differences, we may replace $(\sum_k Z_k)^2$ by $\sum_k Z_k^2$ in the above conditional expectation, so that this expression is bounded above by $(\theta C_0)^2 K / 2$. The left hand side of the previous displayed equation is thus bounded by $1/2$ if $|\theta| \le 1/(C_0 \sqrt{K})$. We can therefore define, for $\theta$ in this range and for $j \ge 1$, the compensator
$$A_j(\theta) := \log \mathbb{E}\Big[ \exp\Big\{ i \theta \sum_{k=(j-1)K+1}^{jK} Z_k \Big\} \,\Big|\, \mathcal{F}_{(j-1)K} \Big] \,,$$
with the usual definition of the logarithm, $\log(1 + z) := z - z^2/2 + z^3/3 - \cdots$, valid for $|z| < 1$.

Fix $1 \ll K \ll N$. This means that $K$ increases to infinity after $N$. Let $m_j := M_{jK}$ for $j \ge 0$. Clearly, for every admissible $\theta$, $\exp\{ i \theta m_j - \sum_{1 \le k \le j} A_k(\theta) \}$ is a mean one exponential martingale with respect to the filtration $\{\mathcal{F}_{jK} : j \ge 0\}$.

Assume without loss of generality that $N = \ell K$ for some integer $\ell$. An elementary third order Taylor expansion shows that
$$\sum_{j=1}^{\ell} A_j(\theta / \sqrt{N}) = - \frac{\theta^2}{2N} \sum_{j=1}^{\ell} \mathbb{E}\Big[ \sum_{k=(j-1)K+1}^{jK} Z_k^2 \,\Big|\, \mathcal{F}_{(j-1)K} \Big] + \frac{K^2}{N} R_{N,K}$$
for some random variables $R_{N,K}$ that can be bounded deterministically, independently of $N$ and $K$. Since $\exp\{ i \theta m_j - \sum_{1 \le k \le j} A_k(\theta) \}$, $j \ge 0$, is a mean one exponential martingale, for any $\theta$ in $\mathbb{R}$ and $N$ sufficiently large we obtain
$$1 = \mathbb{E}\Big[ \exp\Big\{ i (\theta / \sqrt{N}) \, m_\ell - \sum_{k=1}^{\ell} A_k(\theta / \sqrt{N}) \Big\} \Big] = \mathbb{E}\Big[ \exp\Big\{ \frac{i \theta}{\sqrt{N}} M_N + \frac{\theta^2}{2N} \sum_{j=0}^{\ell - 1} \mathbb{E}\Big[ \sum_{k=jK+1}^{(j+1)K} Z_k^2 \,\Big|\, \mathcal{F}_{jK} \Big] - \frac{K^2}{N} R_{N,K} \Big\} \Big] \,.$$

We prove below that
$$\lim_{K \to \infty} \sup_{\ell \ge 1} \mathbb{E}\Big[ \Big| \exp\Big\{ \frac{\theta^2}{2N} \sum_{j=0}^{\ell - 1} \mathbb{E}\Big[ \sum_{k=jK+1}^{(j+1)K} (Z_k^2 - \sigma^2) \,\Big|\, \mathcal{F}_{jK} \Big] \Big\} - 1 \Big| \Big] = 0 \,. \tag{1.11}$$
Therefore, since the $R_{N,K}$ are uniformly bounded random variables, for every $\theta$ in $\mathbb{R}$,
$$\lim_{N \to \infty} \mathbb{E}\Big[ \exp\Big\{ \frac{i \theta}{\sqrt{N}} M_N + \frac{\theta^2 \sigma^2}{2} \Big\} \Big] = 1 \,,$$
which proves the central limit theorem in the case of bounded martingale differences.

It remains to show that (1.11) is in force. The expression inside braces is bounded by a finite constant which depends on $C_0$ and $\theta$. Since $|e^x - 1| \le |x| \, e^{|x|}$, $x \in \mathbb{R}$, the expectation in (1.11) is less than or equal to
$$C_1 \theta^2 \, \frac{1}{\ell} \sum_{j=1}^{\ell} \mathbb{E}\Big[ \Big| \frac{1}{K} \sum_{k=(j-1)K+1}^{jK} (Z_k^2 - \sigma^2) \Big| \Big]$$
for some finite constant $C_1$. Since $\{Z_k\}$ is a stationary sequence, this expression does not depend on $\ell$ and is equal to
$$C_1 \theta^2 \, \mathbb{E}\Big[ \Big| \frac{1}{K} \sum_{k=1}^{K} (Z_k^2 - \sigma^2) \Big| \Big] \,. \tag{1.12}$$
By the ergodic theorem, (1.12) vanishes as $K \uparrow \infty$.

The general case can be deduced from the previous one by approximating the martingale differences $\{Z_j : j \ge 1\}$ by bounded martingale differences. The most natural way consists in fixing a cut-off level $\kappa \ge 1$ and defining
$$Z_j^{(\kappa)} := \phi_\kappa(Z_j) - \mathbb{E}\big[ \phi_\kappa(Z_j) \mid \mathcal{F}_{j-1} \big]$$
for $j \ge 1$. Here $\phi_\kappa : \mathbb{R} \to [-\kappa, \kappa]$ stands for a cut-off function, which can be taken as $\phi_\kappa(x) = x \, \mathbf{1}\{ |x| \le \kappa \}$ for instance.

Note that $\{Z_j^{(\kappa)} : j \ge 1\}$ forms a sequence of bounded martingale differences. Although $\{\phi_\kappa(Z_j)\}$ inherits stationarity and ergodicity from the original sequence, the random variables $\{Z_j^{(\kappa)} : j \ge 1\}$ may lose these properties due to the presence of the conditional expectation.

Let $\sigma_\kappa^2 := \mathbb{E}[ \phi_\kappa(Z_1)^2 ]$. By the dominated convergence theorem, $\sigma_\kappa^2$ converges to $\sigma^2$ as $\kappa \uparrow \infty$. Moreover, $Z_1^{(\kappa)}$ converges to $Z_1$ in $L^2(\mathbb{P})$ as $\kappa \uparrow \infty$, since
$$\mathbb{E}\Big[ \big( Z_1 - Z_1^{(\kappa)} \big)^2 \Big] \le \mathbb{E}\Big[ Z_1^2 \, \mathbf{1}\{ |Z_1| > \kappa \} \Big] \,.$$
To derive this inequality we expanded the square and used the identity $\mathbb{E}[Z_1 \mid \mathcal{F}_0] = 0$, which follows from the fact that the $\{Z_j\}$ are martingale differences.

Let $\{M_j^{(\kappa)} : j \ge 0\}$ be the martingale associated to the sequence $\{Z_j^{(\kappa)} : j \ge 1\}$: $M_0^{(\kappa)} := 0$, $M_N^{(\kappa)} := \sum_{1 \le j \le N} Z_j^{(\kappa)}$, $N \ge 1$. We show below that
$$\lim_{\kappa \to \infty} \limsup_{N \to \infty} \Big| \mathbb{E}\Big[ \exp\big\{ (i \theta / \sqrt{N}) M_N^{(\kappa)} \big\} \Big] - e^{-\theta^2 \sigma_\kappa^2 / 2} \Big| = 0 \,. \tag{1.13}$$
This claim does not follow from the first part of the proof because the sequence $\{Z_j^{(\kappa)} : j \ge 1\}$ may be neither stationary nor ergodic.

On the other hand, we have seen above that $\exp\{ -\theta^2 \sigma_\kappa^2 / 2 \}$ converges, as $\kappa \uparrow \infty$, to $\exp\{ -\theta^2 \sigma^2 / 2 \}$. Therefore, to conclude the proof of the theorem we need to show that the difference
$$\mathbb{E}\Big[ \exp\big\{ (i \theta / \sqrt{N}) M_N \big\} \Big] - \mathbb{E}\Big[ \exp\big\{ (i \theta / \sqrt{N}) M_N^{(\kappa)} \big\} \Big]$$
vanishes as $N \uparrow \infty$ and then $\kappa \uparrow \infty$.

Since $|e^{ix} - e^{iy}| \le |x - y|$ for $x, y \in \mathbb{R}$, by the Schwarz inequality the previous expression is absolutely bounded by
$$\theta \, \Big\{ \frac{1}{N} \, \mathbb{E}\Big[ \big( M_N^{(\kappa)} - M_N \big)^2 \Big] \Big\}^{1/2} \,.$$
Since the random variables $\{Z_j - Z_j^{(\kappa)} : j \ge 1\}$ are orthogonal, the expression inside braces is equal to the average of $\mathbb{E}[ (Z_j^{(\kappa)} - Z_j)^2 ]$. We estimated above a similar term, showing that it is less than or equal to $\mathbb{E}[ Z_1^2 \, \mathbf{1}\{ |Z_1| \ge \kappa \} ]$. Analogous arguments apply here. This expectation vanishes as $\kappa \uparrow \infty$, concluding the proof.

We now turn to the claim (1.13). Recall the proof of the central limit theorem for bounded martingale differences and observe that up to (1.11) we did not use the stationarity or the ergodicity of the sequence. The term we need to estimate is the exponential of
$$\frac{\theta^2}{2N} \sum_{j=0}^{\ell - 1} \mathbb{E}\Big[ \sum_{k=jK+1}^{(j+1)K} \big( Z_k^{(\kappa)} \big)^2 \,\Big|\, \mathcal{F}_{jK} \Big] = \frac{\theta^2}{2N} \sum_{j=0}^{\ell - 1} \mathbb{E}\Big[ \sum_{k=jK+1}^{(j+1)K} \Big\{ \phi_\kappa(Z_k)^2 - \mathbb{E}\big[ \phi_\kappa(Z_k) \mid \mathcal{F}_{k-1} \big]^2 \Big\} \,\Big|\, \mathcal{F}_{jK} \Big] \,,$$
where the identity follows from the definition of the variables $Z_k^{(\kappa)}$. Notice that the positive term $\mathbb{E}[ \phi_\kappa(Z_k) \mid \mathcal{F}_{k-1} ]^2$ has a negative sign in front. In particular, its exponential is bounded by one. On the other hand, since for each fixed $\kappa$, $\{\phi_\kappa(Z_k) : k \ge 1\}$ is a stationary ergodic sequence, by (1.11) we may replace $\phi_\kappa(Z_k)^2$ by $\sigma_\kappa^2$ in the exponential.

It remains to estimate
$$\mathbb{E}\Big[ \Big| \exp\Big\{ - \frac{\theta^2}{2N} \sum_{j=0}^{\ell - 1} \mathbb{E}\Big[ \sum_{k=jK+1}^{(j+1)K} \mathbb{E}\big[ \phi_\kappa(Z_k) \mid \mathcal{F}_{k-1} \big]^2 \,\Big|\, \mathcal{F}_{jK} \Big] \Big\} - 1 \Big| \Big] \,.$$
Since the expression inside the exponential is negative and since $1 - e^{-x} \le x$ for $x \ge 0$, the previous expression is bounded above by
$$\frac{\theta^2}{2N} \sum_{k=1}^{N} \mathbb{E}\Big[ \mathbb{E}\big[ \phi_\kappa(Z_k) \mid \mathcal{F}_{k-1} \big]^2 \Big] \,.$$
Since $\{Z_k : k \ge 1\}$ is a martingale difference sequence, we may subtract $Z_k$ inside the conditional expectation without affecting it. Applying the Schwarz inequality, we bound the previous expression by $(\theta^2 / 2) \, \mathbb{E}[ Z_1^2 \, \mathbf{1}\{ |Z_1| > \kappa \} ]$. This proves (1.13) and concludes the proof of the theorem. ⊓⊔

In the proof of Theorem 1.2 all estimates were carried through in $L^1(\mathbb{P})$. A straightforward adaptation of the arguments gives a conditional central limit theorem:

Theorem 1.3. Under the assumptions of Theorem 1.2, for every $\theta$ in $\mathbb{R}$,
$$\lim_{N \to \infty} \mathbb{E}\Big[ \Big| \mathbb{E}\big[ \exp\{ i \theta (M_N / \sqrt{N}) \} \mid \mathcal{F}_0 \big] - e^{-\theta^2 \sigma^2 / 2} \Big| \Big] = 0 \,,$$
where $\sigma^2 = \mathbb{E}[Z_1^2]$.

Remark 1.4. A deeper analysis allows one to prove convergence of the partial sums to a Brownian motion with diffusion coefficient $\sigma^2 = \mathbb{E}[Z_1^2]$: for every $T > 0$,
$$Z_N(t) = \frac{1}{\sqrt{N}} \sum_{j=1}^{[Nt]} Z_j + \frac{Nt - [Nt]}{\sqrt{N}} \, Z_{[Nt]+1}$$
converges to a Brownian motion in $C([0, T])$, the space of continuous functions on $[0, T]$. In this formula, $[a]$ stands for the integer part of $a \in \mathbb{R}$: $[a] = \sup\{ n \in \mathbb{Z} : n \le a \}$. We refer to Theorem 2.??.
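A minimal simulation of Theorem 1.2, with a stationary ergodic martingale difference sequence that is not i.i.d. (the chain, the function $g$ and the random signs are illustrative choices): $Z_j = \xi_j \, g(X_{j-1})$, where $\{X_j\}$ is the hypothetical 3-state chain used earlier, started from $\pi$, and $\{\xi_j\}$ are independent symmetric signs. The empirical variance of $N^{-1/2} M_N$ is compared with $\mathbb{E}[Z_1^2] = E_\pi[g^2]$.

```python
import numpy as np

rng = np.random.default_rng(3)

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
w, vl = np.linalg.eig(P.T)
pi = np.real(vl[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

g = np.array([0.5, 1.0, 2.0])                # a bounded function of the state

def normalized_martingale(N):
    """One sample of N^{-1/2} sum_{j=1}^N Z_j with Z_j = xi_j * g(X_{j-1})."""
    x, m = rng.choice(3, p=pi), 0.0          # stationary start
    for _ in range(N):
        xi = rng.choice([-1.0, 1.0])         # random sign, independent of the chain
        m += xi * g[x]
        x = rng.choice(3, p=P[x])
    return m / np.sqrt(N)

samples = np.array([normalized_martingale(1_000) for _ in range(1_000)])
print("empirical variance  :", samples.var())
print("E[Z_1^2] = E_pi[g^2]:", pi @ g**2)
```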

1.4 Time-Variance in Reversible Markov Chains

We examine in this section the asymptotic behaviour of the variance of
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} V(X_j)$$
for square integrable functions $V$ in the context of reversible Markov chains. Reversibility with respect to $\pi$ means that $P$ is a symmetric operator in $L^2(\pi)$:
$$\langle P f, g \rangle_\pi = \langle f, P g \rangle_\pi$$
for all $f$, $g$ in $L^2(\pi)$. It is easy to check that a probability measure $\pi$ is reversible if and only if it satisfies the detailed balance condition
$$\pi(x) P(x, y) = \pi(y) P(y, x) \quad \text{for all } x, y \in E \,,$$
which means that
$$\mathbb{P}_\pi[ X_n = x, X_{n+1} = y ] = \mathbb{P}_\pi[ X_n = y, X_{n+1} = x ] \,.$$
A reversible measure is necessarily invariant since
$$(\pi P)(x) = \sum_{y \in E} \pi(y) P(y, x) = \sum_{y \in E} \pi(x) P(x, y) = \pi(x) \,.$$
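A convenient way to produce reversible examples numerically is to start from symmetric conductances $c(x, y) = c(y, x)$ (the weights below are illustrative) and to set $P(x, y) = c(x, y)/c(x)$ and $\pi(x) \propto c(x)$, with $c(x) = \sum_y c(x, y)$; detailed balance then holds by construction, as the following minimal sketch checks.

```python
import numpy as np

# Symmetric conductances on a 4-state space (hypothetical example).
C = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
assert np.allclose(C, C.T)

c = C.sum(axis=1)              # c(x) = sum_y c(x, y)
P = C / c[:, None]             # P(x, y) = c(x, y) / c(x): a stochastic matrix
pi = c / c.sum()               # pi(x) proportional to c(x)

# Detailed balance: pi(x) P(x, y) = pi(y) P(y, x).
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

# Reversibility implies invariance: pi P = pi.
assert np.allclose(pi @ P, pi)
print("detailed balance and invariance verified")
```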

We prove in this section that the following limit exists:
$$\sigma^2(V) = \lim_{N \to \infty} \mathbb{E}_\pi\Big[ \frac{1}{N} \Big( \sum_{j=0}^{N-1} V(X_j) \Big)^2 \Big] \,,$$
where we admit $+\infty$ as a possible value, and we find necessary and sufficient conditions for $\sigma^2(V)$ to be finite. We also introduce Hilbert spaces associated to the transition operator $P$ which will play a central role in the next chapters.

Fix an invariant probability measure $\pi$ and a function $V$ in $L^2(\pi)$. An elementary computation gives that
$$\mathbb{E}_\pi\Big[ \frac{1}{N} \Big( \sum_{j=0}^{N-1} V(X_j) \Big)^2 \Big] = \frac{1}{N} \sum_{j,k=0}^{N-1} \mathbb{E}_\pi[ V(X_j) V(X_k) ] = \frac{1}{N} \sum_{j=0}^{N-1} \mathbb{E}_\pi[ V(X_j)^2 ] + \frac{2}{N} \sum_{j < k} \mathbb{E}_\pi[ V(X_j) V(X_k) ] \,.$$
Since $\pi$ is a stationary measure, $\mathbb{E}_\pi[ V(X_j)^2 ] = \langle V, V \rangle_\pi$ and, for $j < k$, $\mathbb{E}_\pi[ V(X_j) V(X_k) ] = \langle V, P^{k-j} V \rangle_\pi$, so that the second term is equal to
$$\frac{2}{N} \sum_{j < k} \langle V, P^{k-j} V \rangle_\pi = 2 \sum_{i=1}^{N-1} [1 - (i/N)] \, \langle V, P^i V \rangle_\pi \,.$$
In conclusion,
$$\mathbb{E}_\pi\Big[ \frac{1}{N} \Big( \sum_{j=0}^{N-1} V(X_j) \Big)^2 \Big] = \langle V, V \rangle_\pi + 2 \sum_{i=1}^{N-1} [1 - (i/N)] \, \langle V, P^i V \rangle_\pi \,.$$

To estimate the second expression we rely on the spectral decomposition of the operator $P$. Since $P$ is symmetric in $L^2(\pi)$, its spectrum is real and $P$ admits a spectral decomposition:
$$P = \int_{\mathbb{R}} \varphi \, dE_\varphi \,.$$
By (1.7), $P$ is a contraction, so its spectrum is contained in $[-1, 1]$ and
$$P = \int_{-1}^{1} \varphi \, dE_\varphi \,.$$
Notice that $1$ is an eigenvalue associated to the constants because $P \mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the constant function equal to $1$.

The spectral decomposition of $P$ permits us to represent the scalar product $\langle P^k V, V \rangle_\pi$ in terms of the spectral measure of $V$:
$$\langle V, P^k V \rangle_\pi = \Big\langle V, \int_{-1}^{1} \varphi^k \, dE_\varphi V \Big\rangle_\pi = \int_{-1}^{1} \varphi^k \, d\langle V, E_\varphi V \rangle_\pi \,.$$
Denote the spectral measure $d\langle V, E_\varphi V \rangle_\pi$ of $V$ by $\mu_V(d\varphi)$ and notice that $\mu_V$ is a finite measure on $[-1, 1]$ with total mass equal to $\langle V, V \rangle_\pi$. With this notation,
$$\mathbb{E}_\pi\Big[ \frac{1}{N} \Big( \sum_{j=0}^{N-1} V(X_j) \Big)^2 \Big] = \langle V, V \rangle_\pi + 2 \int_{-1}^{1} \sum_{i=1}^{N-1} [1 - (i/N)] \, \varphi^i \, \mu_V(d\varphi) \,.$$
The second term on the right hand side can be rewritten as
$$2 \int_{-1}^{1} \sum_{i \ge 1} [1 - (i/N)]_{+} \, \varphi^i \, \mu_V(d\varphi) \,,$$
where $a_{+}$ stands for the positive part of $a$. For $-1 \le \varphi \le 0$, an elementary computation shows that $\sum_{i \ge 1} [1 - (i/N)]_{+} \varphi^i$ is absolutely bounded by a finite constant and converges to $\varphi/(1 - \varphi)$ as $N \uparrow \infty$. On the other hand, for $0 \le \varphi \le 1$, $\sum_{i \ge 1} [1 - (i/N)]_{+} \varphi^i$ increases to $\varphi/(1 - \varphi)$. Therefore, by the monotone and the dominated convergence theorems, the previous integral converges to
$$2 \int_{-1}^{1} \frac{\varphi}{1 - \varphi} \, \mu_V(d\varphi) \,.$$
We have thus proved the following result.

Lemma 1.5. For any function $V$ in $L^2(\pi)$, $\sigma^2(V) < \infty$ if and only if
$$\int_{-1}^{1} \frac{1}{1 - \varphi} \, \mu_V(d\varphi) < \infty \,. \tag{1.14}$$
In this case,
$$\sigma^2(V) = \int_{-1}^{1} \frac{1 + \varphi}{1 - \varphi} \, \mu_V(d\varphi) \,. \tag{1.15}$$

Note that $\sigma^2(V) = \infty$ if $E_\pi[V] \ne 0$, since for such functions the spectral measure gives a positive weight to $1$: $\mu_V(\{1\}) > 0$.
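For a finite reversible chain the spectral measure $\mu_V$ is supported on the eigenvalues of $P$ and (1.15) becomes a finite sum. A minimal sketch, reusing the hypothetical conductance chain above: diagonalise the symmetrised matrix $D_\pi^{1/2} P D_\pi^{-1/2}$, evaluate $\sigma^2(V)$ from (1.15), and compare with a simulated time-variance (1.4).

```python
import numpy as np

rng = np.random.default_rng(4)

# Reversible chain built from symmetric conductances (hypothetical example).
C = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
c = C.sum(axis=1)
P = C / c[:, None]
pi = c / c.sum()

V = np.array([1.0, -1.0, 0.5, 0.0])
V = V - pi @ V                                   # mean zero with respect to pi

# S = D^{1/2} P D^{-1/2} is symmetric because P is reversible.
d = np.sqrt(pi)
S = (d[:, None] * P) / d[None, :]
lam, Q = np.linalg.eigh(S)                       # eigenvalues lie in [-1, 1]

# Spectral weights mu_V({lam_k}) = (q_k . D^{1/2} V)^2; the weight at 1 vanishes.
weights = (Q.T @ (d * V))**2

# (1.15): sigma^2(V) = sum over eigenvalues < 1 of (1+phi)/(1-phi) * weight.
mask = lam < 1 - 1e-12
sigma2 = np.sum((1 + lam[mask]) / (1 - lam[mask]) * weights[mask])
print("sigma^2(V) from (1.15):", sigma2)

# Compare with the time-variance (1.4) estimated by simulation.
N, reps, acc = 2_000, 500, []
for _ in range(reps):
    x, s = rng.choice(4, p=pi), 0.0
    for _ in range(N):
        s += V[x]
        x = rng.choice(4, p=P[x])
    acc.append(s / np.sqrt(N))
print("simulated time-variance:", np.var(acc))
```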

The previous computation leads to the following Hilbert space. Since $P$ is an operator bounded by $1$, $I - P$ is non-negative, so that
$$\langle f, g \rangle_1 = \langle f, (I - P) g \rangle_\pi$$
defines a semi-definite scalar product in $L^2(\pi)$. It is semi-definite because $\langle \mathbf{1}, \mathbf{1} \rangle_1 = 0$. Denote by $H_1$ the Hilbert space induced by $L^2(\pi)$ endowed with the scalar product $\langle \cdot, \cdot \rangle_1$. Let $\| \cdot \|_1$ be the norm associated to this scalar product. The Hilbert space $H_1$ is examined in detail in Section 1.6.

For $\lambda > 0$, consider the resolvent equation
$$\lambda f_\lambda + (I - P) f_\lambda = V \,. \tag{1.16}$$
This equation has a solution $f_\lambda$ in $L^2(\pi)$ because $(1 + \lambda) I - P$ is invertible for all $\lambda > 0$. It is given explicitly by $f_\lambda = (1 + \lambda)^{-1} \sum_{j \ge 0} (1 + \lambda)^{-j} P^j V$.

The proof of the central limit theorem for $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ relies on the following two estimates on the solution of the resolvent equation.

Lemma 1.6. Let $f_\lambda$ be the solution of the resolvent equation (1.16) for some zero-mean function $V$ with finite time-variance $\sigma^2(V)$. Then,
$$\lim_{\lambda \to 0} \lambda \, \langle f_\lambda, f_\lambda \rangle_\pi = 0 \,.$$

Proof. Since $f_\lambda$ is the solution of the resolvent equation (1.16),
$$\lambda \, \langle f_\lambda, f_\lambda \rangle_\pi = \lambda \, E_\pi\Big[ \big\{ [(1 + \lambda) I - P]^{-1} V \big\}^2 \Big] = \int_{-1}^{1} \frac{\lambda}{(1 + \lambda - \varphi)^2} \, \mu_V(d\varphi) \,.$$
Since $\lambda / (1 + \lambda - \varphi)^2 \le (1 - \varphi)^{-1}$ and since $\mu_V(d\varphi)$ integrates $(1 - \varphi)^{-1}$, by the dominated convergence theorem the previous integral vanishes as $\lambda \downarrow 0$. ⊓⊔

Lemma 1.7. Let $f_\lambda$ be the solution of the resolvent equation (1.16) for some zero-mean function $V$ with finite time-variance $\sigma^2(V)$. The sequence $\{f_\lambda\}$ is a Cauchy sequence in $H_1$: for every $\varepsilon > 0$ there exists $\lambda_0 > 0$ such that for any $\lambda_1, \lambda_2 < \lambda_0$,
$$\langle f_{\lambda_1} - f_{\lambda_2} \,,\, (I - P)( f_{\lambda_1} - f_{\lambda_2}) \rangle_\pi < \varepsilon \,.$$

Proof. Since $f_\lambda$ is the solution of the resolvent equation,
$$\langle f_{\lambda_1} - f_{\lambda_2} \,,\, (I - P)( f_{\lambda_1} - f_{\lambda_2}) \rangle_\pi = \int_{-1}^{1} (1 - \varphi) \Big\{ \frac{1}{1 + \lambda_2 - \varphi} - \frac{1}{1 + \lambda_1 - \varphi} \Big\}^2 \mu_V(d\varphi) = \int_{-1}^{1} \frac{(1 - \varphi)(\lambda_2 - \lambda_1)^2}{[1 + \lambda_1 - \varphi]^2 \, [1 + \lambda_2 - \varphi]^2} \, \mu_V(d\varphi) \,.$$
Since the integrand is bounded above by $(1 - \varphi)^{-1}$ and since the spectral measure of $V$ integrates $(1 - \varphi)^{-1}$, the integral converges to $0$ as $\lambda_1, \lambda_2 \downarrow 0$. ⊓⊔

A computation similar to the one presented in the previous proof shows that
$$\sigma^2(V) = \lim_{\lambda \to 0} \langle f_\lambda, (I - P^2) f_\lambda \rangle_\pi \,. \tag{1.17}$$
Indeed, since $f_\lambda$ solves the resolvent equation, by the spectral representation of the operator $P$,
$$\langle f_\lambda, (I - P^2) f_\lambda \rangle_\pi = \int_{-1}^{1} \frac{1 - \varphi^2}{(1 + \lambda - \varphi)^2} \, \mu_V(d\varphi) \,.$$
Since the integrand is bounded above by $2 (1 - \varphi)^{-1}$, by the dominated convergence theorem, as $\lambda \downarrow 0$, the previous expression converges to
$$\int_{-1}^{1} \frac{1 + \varphi}{1 - \varphi} \, \mu_V(d\varphi) \,,$$
which is equal to $\sigma^2(V)$ in view of (1.15).
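Numerically, (1.16) is a linear system, so Lemma 1.6 and (1.17) can be observed directly on the hypothetical reversible 4-state chain: as $\lambda \downarrow 0$, $\lambda \langle f_\lambda, f_\lambda \rangle_\pi$ decreases to $0$ while $\langle f_\lambda, (I - P^2) f_\lambda \rangle_\pi$ approaches $\sigma^2(V)$. A minimal sketch:

```python
import numpy as np

C = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
c = C.sum(axis=1)
P = C / c[:, None]
pi = c / c.sum()

V = np.array([1.0, -1.0, 0.5, 0.0])
V = V - pi @ V
I = np.eye(4)

def inner_pi(f, g):
    """Scalar product <f, g>_pi on the finite state space."""
    return float(pi @ (f * g))

for lam in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    f_lam = np.linalg.solve((1 + lam) * I - P, V)   # resolvent equation (1.16)
    print(f"lambda = {lam:8.4f}   "
          f"lambda*<f,f>_pi = {lam * inner_pi(f_lam, f_lam):.6f}   "
          f"<f,(I-P^2)f>_pi = {inner_pi(f_lam, (I - P @ P) @ f_lam):.6f}")
```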

1.5 Central Limit Theorem for Reversible Markov Chains

We prove in this section a central limit theorem for additive functionals of reversible Markov chains. Fix a zero-mean function $V$ in $L^2(\pi)$. We have seen in the beginning of this chapter that a central limit theorem for the additive functional $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ follows easily from a central limit theorem for martingales if $V$ belongs to the range of $I - P$, i.e., if there is a solution in $L^2(\pi)$ of the Poisson equation $(I - P) f = V$. This assumption is too strong and should be relaxed. A natural condition to impose on $V$ is to require that its time variance $\sigma^2(V)$ be finite. In this case we may try to repeat the approach presented in the beginning of the chapter, replacing the solution of the Poisson equation $(I - P) f = V$, which may not exist, by the solution $f_\lambda$ of the resolvent equation $\lambda f_\lambda + (I - P) f_\lambda = V$, which always exists.

Fix therefore a zero-mean function $V$ and assume that its variance $\sigma^2(V)$ is finite. Let $f_\lambda$ be the solution of the resolvent equation (1.16). For $N \ge 1$,

$$\sum_{j=0}^{N-1} V(X_j) = \lambda \sum_{j=0}^{N-1} f_\lambda(X_j) + \sum_{j=0}^{N-1} \big\{ f_\lambda(X_j) - (P f_\lambda)(X_j) \big\} = M_N^\lambda + f_\lambda(X_0) - f_\lambda(X_N) + \lambda \sum_{j=0}^{N-1} f_\lambda(X_j) \,, \tag{1.18}$$
where $\{M_N^\lambda : N \ge 0\}$ is the martingale with respect to the filtration $\{\mathcal{F}_j : j \ge 0\}$, $\mathcal{F}_j = \sigma(X_0, \ldots, X_j)$, defined by $M_0^\lambda := 0$,
$$M_N^\lambda := \sum_{j=1}^{N} Z_j^\lambda \,, \qquad Z_j^\lambda = f_\lambda(X_j) - (P f_\lambda)(X_{j-1}) \,, \quad j \ge 1 \,.$$

Lemma 1.8. For each $N \ge 1$, as $\lambda \downarrow 0$, $M_N^\lambda$ converges in $L^2(\mathbb{P}_\pi)$ to some variable $M_N$. The limit process $\{M_j : j \ge 0\}$ is a martingale with respect to the filtration $\{\mathcal{F}_j\}$ and
$$\sum_{j=0}^{N-1} V(X_j) = M_N + R_N$$
for some process $R_N$ in $L^2(\mathbb{P}_\pi)$.

Proof. Since $N$ is fixed, to prove the convergence of the martingale $M_N^\lambda$ we just need to show that each term $Z_j^\lambda = f_\lambda(X_j) - (P f_\lambda)(X_{j-1})$ converges in $L^2(\mathbb{P}_\pi)$. We will prove that this sequence is Cauchy. For $0 < \lambda_1 < \lambda_2$, since $P$ is a symmetric operator and $\pi$ is a stationary measure,
$$\mathbb{E}_\pi\Big[ \big( f_{\lambda_2}(X_{j+1}) - f_{\lambda_1}(X_{j+1}) - (P f_{\lambda_2})(X_j) + (P f_{\lambda_1})(X_j) \big)^2 \Big] = E_\pi\big[ ( f_{\lambda_2} - f_{\lambda_1} )^2 \big] - E_\pi\big[ ( P f_{\lambda_2} - P f_{\lambda_1} )^2 \big] = \big\langle f_{\lambda_2} - f_{\lambda_1} \,,\, (I - P^2)( f_{\lambda_2} - f_{\lambda_1} ) \big\rangle_\pi \,.$$
Since $I + P \le 2 I$ and $I - P \ge 0$, we have $I - P^2 = (I - P)(I + P) \le 2 (I - P)$. Therefore, the above quantity is bounded by
$$2 \, \big\langle f_{\lambda_2} - f_{\lambda_1} \,,\, (I - P)( f_{\lambda_2} - f_{\lambda_1} ) \big\rangle_\pi \,,$$
which vanishes as $\lambda_1, \lambda_2 \downarrow 0$ by virtue of Lemma 1.7. We have thus proved that the martingale $M_N^\lambda$ is a Cauchy sequence in $L^2(\mathbb{P}_\pi)$ as $\lambda \downarrow 0$. It converges, in particular, to some process $M_N$, which is a square-integrable martingale because the convergence takes place in $L^2(\mathbb{P}_\pi)$.

To conclude the proof of the lemma it remains to recall the decomposition of $\sum_{0 \le j < N} V(X_j)$ given in (1.18). Since the left hand side does not depend on $\lambda$ and since we just proved the convergence of the first term on the right hand side, the sum of the remaining terms must also converge to some limit, which we denote by $R_N$. ⊓⊔

Lemma 1.1. Recall the decomposition of $\sum_{0 \le j < N} V(X_j)$ obtained in the previous lemma. Then $N^{-1/2} R_N$ converges to $0$ in $L^2(\mathbb{P}_\pi)$ as $N \uparrow \infty$.

Proof. It follows from the decomposition given in (1.18) and the one presented in Lemma 1.8 that
$$R_N = M_N^\lambda - M_N + f_\lambda(X_0) - f_\lambda(X_N) + \lambda \sum_{j=0}^{N-1} f_\lambda(X_j)$$
for all $\lambda > 0$. Choose $\lambda = N^{-1}$. We claim that each term on the right hand side, divided by $\sqrt{N}$, vanishes in $L^2(\mathbb{P}_\pi)$. On the one hand, by the Schwarz inequality and by the invariance of $\pi$,
$$\frac{1}{N} \, \mathbb{E}_\pi\Big[ \Big( \lambda \sum_{j=0}^{N-1} f_\lambda(X_j) \Big)^2 \Big] \le \frac{1}{N} \, E_\pi\big[ f_\lambda^2 \big] = \lambda \, \langle f_\lambda, f_\lambda \rangle_\pi$$
because $\lambda = N^{-1}$. We proved in Lemma 1.6 that this expression vanishes as $\lambda \downarrow 0$. For similar reasons, $N^{-1/2} f_\lambda(X_N)$ and $N^{-1/2} f_\lambda(X_0)$ vanish in $L^2(\mathbb{P}_\pi)$ as $N \uparrow \infty$.

It remains to consider the martingale part. By orthogonality of the increments of the martingales and by stationarity,
$$\frac{1}{N} \, \mathbb{E}_\pi\big[ (M_N^\lambda - M_N)^2 \big] = \frac{1}{N} \sum_{j=1}^{N} \mathbb{E}_\pi\big[ (Z_j^\lambda - Z_j)^2 \big] = \mathbb{E}_\pi\big[ (Z_1^\lambda - Z_1)^2 \big] \,.$$
We proved in Lemma 1.8 that this expression vanishes as $N \uparrow \infty$, which concludes the proof of the lemma. ⊓⊔

We are now in a position to state the main result of this chapter.

Theorem 1.9. Consider a Markov chain $\{X_j : j \ge 0\}$ on a countable state space $E$, ergodic and reversible with respect to some invariant state $\pi$. Let $V : E \to \mathbb{R}$ be a zero-mean function in $L^2(\pi)$ with finite time-variance: $\sigma^2(V) < \infty$. Then, under $\mathbb{P}_\pi$,
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} V(X_j)$$
converges in distribution to a zero-mean Gaussian law with variance $\sigma^2(V)$.

Proof. By Lemma 1.8, each variable $Z_j^\lambda$ converges in $L^2(\mathbb{P}_\pi)$ as $\lambda \downarrow 0$. Denote by $Z_j$ its limit. We claim that the sequence $\{Z_j : j \ge 1\}$ satisfies the assumptions of Theorem 1.2 with respect to the filtration $\mathcal{F}_j = \sigma(X_0, \ldots, X_j)$, $j \ge 0$, and the probability measure $\mathbb{P}_\pi$.

The sequence $\{Z_j\}$ inherits the stationarity from $\{Z_j^\lambda\}$, as well as the measurability with respect to the filtration $\{\mathcal{F}_j\}$. Since $Z_j^\lambda$ converges to $Z_j$ in $L^2(\mathbb{P}_\pi)$ and since the $\{Z_j^\lambda\}$ are martingale differences with respect to the filtration $\{\mathcal{F}_j\}$,
$$\mathbb{E}_\pi[Z_1^2] < \infty \quad \text{and} \quad \mathbb{E}_\pi[Z_{j+1} \mid \mathcal{F}_j] = 0 \,, \quad j \ge 0 \,.$$

To show that the sequence $\{Z_j\}$ is ergodic, let $\nu$ be the probability measure on $E \times E$ defined by $\nu(x, y) := \pi(x) P(x, y)$. For $\lambda > 0$, let $\Psi_\lambda : E \times E \to \mathbb{R}$ be defined by $\Psi_\lambda(x, y) := f_\lambda(y) - (P f_\lambda)(x)$. Note that $Z_k^\lambda = \Psi_\lambda(X_{k-1}, X_k)$. Moreover,
$$\| \Psi_\lambda \|^2_{L^2(\nu)} = \langle f_\lambda, (I - P^2) f_\lambda \rangle_\pi \le 2 \, \| f_\lambda \|_1^2 \,.$$
In particular, by the proof of Lemma 1.8, $\Psi_\lambda$ is a Cauchy sequence in $L^2(\nu)$. Denote by $\Psi$ its limit. Since $Z_k^\lambda = \Psi_\lambda(X_{k-1}, X_k)$, we have $Z_k = \Psi(X_{k-1}, X_k)$ for $k \ge 1$. This shows that the sequence $\{Z_k\}$ is ergodic.

We have thus shown that all the assumptions of Theorem 1.2 are in force. Thus, $N^{-1/2} M_N$ converges in distribution to a zero-mean Gaussian variable with variance
$$\sigma^2 = \mathbb{E}_\pi[Z_1^2] = \lim_{\lambda \to 0} \mathbb{E}_\pi[(Z_1^\lambda)^2] = \lim_{\lambda \to 0} \langle f_\lambda, (I - P^2) f_\lambda \rangle_\pi \,.$$
By (1.17), this expression is equal to $\sigma^2(V)$.

By Lemma 1.1, $N^{-1/2} R_N$ converges to $0$ in $L^2(\mathbb{P}_\pi)$. Therefore, in view of the decomposition presented in Lemma 1.8, $N^{-1/2} \sum_{0 \le j < N} V(X_j)$ converges in distribution to a zero-mean Gaussian variable with variance $\sigma^2(V)$. ⊓⊔

Remark 1.10. By Theorem 1.3, under the assumptions of Theorem 1.9, for every bounded continuous function $f : \mathbb{R} \to \mathbb{R}$,
$$\lim_{N \to \infty} \sum_{x \in E} \pi(x) \, \Big| \mathbb{E}_x\Big[ f\Big( \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} V(X_j) \Big) \Big] - \int f(u) \, \Phi_{\sigma^2(V)}(u) \, du \Big| = 0 \,,$$
where $\Phi_{\sigma^2}$ is the density of the mean zero Gaussian distribution with variance $\sigma^2$.

In the countable case the previous result provides a central limit theorem which holds almost surely with respect to the initial state. This is a special feature of the discrete setting. In general we will have to content ourselves with an $L^1$ central limit theorem as stated above.

Remark 1.11. The Markov chain may have several ergodic measures $\pi$. In this case the CLT is valid for each ergodic measure, but in general the asymptotic variance will depend on the particular ergodic measure: $\sigma^2(V) = \sigma^2(V, \pi)$. This is the case for the exclusion processes, which will be examined in detail in the next chapters.

Remark 1.12. A slightly deeper analysis allows one to prove convergence of the partial sums to a Brownian motion with diffusion coefficient given by (1.15): for every $T > 0$,
$$\frac{1}{\sqrt{N}} \sum_{j=0}^{[Nt]} V(X_j) + \frac{Nt - [Nt]}{\sqrt{N}} \, V(X_{[Nt]+1})$$
converges to a Brownian motion in $C([0, T])$ (cf. Theorem 2.??).


Remark 1.13. Existence of a solution of the Poisson equation (1.2) in $L^2(\pi)$ corresponds to the condition $(I - P)^{-1} V \in L^2(\pi)$. In terms of the spectral measure of $V$, this is equivalent to requiring that
$$\int_{-1}^{1} \frac{1}{(1 - \varphi)^2} \, \mu_V(d\varphi) < \infty \,,$$
a stronger assumption than the hypothesis that $V$ has a finite time-variance, since in that case we only require (1.14).

1.6 The Space of Finite Time-Variance Functions

We examine in this section the subspace of $L^2(\pi)$ of functions with finite time-variance. Assume, without loss of generality, that the set $E$ is connected in the sense that for any $x$, $y$ in $E$ there exists a path from $x$ to $y$. Here and below, a path from $x$ to $y$ is a finite sequence $\{x_0, \ldots, x_n\}$ such that $x_0 = x$, $x_n = y$ and $P(x_i, x_{i+1}) > 0$ for $0 \le i < n$. It follows from the connectivity and the reversibility that $\pi(x) > 0$ for all $x$ in $E$.

1.6.1 The space $H_1$

A. An explicit formula for $\| \cdot \|_1$. Recall from Section 1.4 the definition of the semi-scalar product $\langle \cdot, \cdot \rangle_1$. Fix a function $f$ in $L^2(\pi)$. A simple computation shows that
$$\langle f, (I - P) f \rangle_\pi = \sum_{x, y \in E} \pi(x) P(x, y) f(x) [ f(x) - f(y) ] \,.$$
We have seen that $\pi(x) P(x, y) = \pi(y) P(y, x)$ because the process is reversible. We may therefore rewrite the previous expression as
$$\frac{1}{2} \sum_{x, y \in E} \pi(x) P(x, y) f(x) [ f(x) - f(y) ] \;+\; \frac{1}{2} \sum_{x, y \in E} \pi(y) P(y, x) f(x) [ f(x) - f(y) ] \,.$$
Renaming $x$ and $y$ in the second sum and adding the two sums, we conclude that
$$\langle f, (I - P) f \rangle_\pi = \frac{1}{2} \sum_{x, y \in E} \pi(x) P(x, y) [ f(x) - f(y) ]^2 \,. \tag{1.19}$$
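A minimal numerical check of identity (1.19), on the hypothetical reversible chain used above (the test function $f$ is an arbitrary illustrative choice):

```python
import numpy as np

C = np.array([[0.0, 2.0, 1.0, 0.0],
              [2.0, 0.0, 3.0, 1.0],
              [1.0, 3.0, 0.0, 2.0],
              [0.0, 1.0, 2.0, 0.0]])
c = C.sum(axis=1)
P = C / c[:, None]
pi = c / c.sum()

f = np.array([0.3, -1.2, 0.7, 2.0])

# Left hand side of (1.19): <f, (I - P) f>_pi.
lhs = pi @ (f * ((np.eye(4) - P) @ f))

# Right hand side: (1/2) sum_{x,y} pi(x) P(x, y) [f(x) - f(y)]^2.
diff = f[:, None] - f[None, :]
rhs = 0.5 * np.sum(pi[:, None] * P * diff**2)

print(lhs, rhs, np.isclose(lhs, rhs))
```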

B. The space $H_1$. Denote the right hand side of (1.19) by $\| f \|_1^2$, which is well defined for any function $f : E \to \mathbb{R}$. Let $\tilde{\mathcal{D}}$ be the space of functions $f : E \to \mathbb{R}$ such that $\| f \|_1 < \infty$:
$$\tilde{\mathcal{D}} = \big\{ f : E \to \mathbb{R} \,:\, \| f \|_1 < \infty \big\} \,.$$
It follows from (1.19) that $\| \cdot \|_1$ is a semi-norm. Its kernel corresponds to the constant functions: suppose that $\| f \|_1 = 0$ for some function $f : E \to \mathbb{R}$. By (1.19), $f(y) = f(x)$ if $P(x, y) > 0$. By the connectivity of $E$, $f$ is constant.

Define the equivalence relation $\sim$ in $\tilde{\mathcal{D}}$ by stating that $f \sim g$ if $\| f - g \|_1 = 0$, i.e., if $f - g$ is constant. Denote by $\mathcal{D}$ the space of equivalence classes of $\tilde{\mathcal{D}}$: $\mathcal{D} = \tilde{\mathcal{D}} / \!\sim$.

The space $\mathcal{D}$ is complete. Consider a Cauchy sequence $\{f_n : n \ge 1\}$ in $\mathcal{D}$. For any $x$, $y$ such that $P(x, y) > 0$, $f_n(y) - f_n(x)$ is a Cauchy sequence and thus converges. Since $E$ is connected, $f_n(y) - f_n(x)$ converges for all $x$, $y$ in $E$. Fix a site $x_*$ in $E$ and consider the function $f : E \to \mathbb{R}$ defined by $f(x_*) = 0$,
$$f(y) = \lim_{n \to \infty} \big[ f_n(y) - f_n(x_*) \big] \,.$$
It is not difficult to show that $f$ belongs to $\mathcal{D}$ and that $f_n$ converges to $f$ in $\mathcal{D}$. Since the norm $\| \cdot \|_1$ satisfies the parallelogram identity, $\mathcal{D}$ is a Hilbert space with the scalar product $\langle \cdot, \cdot \rangle_1$ defined by $\langle f, g \rangle_1 = \frac{1}{4} \big\{ \| f + g \|_1^2 - \| f - g \|_1^2 \big\}$. From (1.19) we have that
$$\langle g, f \rangle_1 = \frac{1}{2} \sum_{x, y \in E} \pi(x) P(x, y) [ f(x) - f(y) ] \, [ g(x) - g(y) ] \tag{1.20}$$
for any functions $f$, $g$ in $\mathcal{D}$.

Since $(a - b)^2 \le 2 a^2 + 2 b^2$, it follows from (1.19) that $\| f \|_1^2 \le 2 \| f \|^2$ for any function $f$ in $L^2(\pi)$. Thus $L^2(\pi)$ is contained in $\mathcal{D}$. Denote by $H_1$ the subspace of $\mathcal{D}$ generated by the functions in $L^2(\pi)$, so that $L^2(\pi) \subset H_1 \subset \mathcal{D}$.

C. Exact forms. Let $\nu$ be the measure on $E \times E$ defined by $\nu(x, y) = \frac{1}{2} \pi(x) P(x, y)$ and denote by $\| \cdot \|_\nu$ the norm of $L^2(\nu)$. A function $D$ in $L^2(\nu)$ is called an exact form if for all $x$ in $E$ and all finite paths $\{x_0, x_1, \ldots, x_n\}$ from $x$ to $x$,
$$\sum_{i=0}^{n-1} D(x_i, x_{i+1}) = 0 \,. \tag{1.21}$$
Let $\mathcal{F}$ be the subspace of exact forms in $L^2(\nu)$. Note that $\mathcal{F}$ is closed.

Fix an exact form $D$, a vertex $x_*$ in $E$ and a constant $c$ in $\mathbb{R}$. Since we assumed $E$ to be connected, we may define a function $f_{D, c, x_*} = f_D : E \to \mathbb{R}$ by
$$f_D(y) = c + \sum_{i=0}^{n-1} D(x_i, x_{i+1}) \,,$$
where $\{x_0, \ldots, x_n\}$ is a path from $x_*$ to $y$. The value of $f_D$ at $y$ does not depend on the specific path because $D$ is an exact form. Of course, $f_D$ may not belong to $L^2(\pi)$, as in the case where $c \ne 0$, $D = 0$.

Reciprocally, given a function $f$ in $L^2(\pi)$, define $D_f : E \times E \to \mathbb{R}$ by
