HAL Id: hal-02292419
https://hal.archives-ouvertes.fr/hal-02292419
Submitted on 19 Sep 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates
J.-R. Chazottes, J. Moles, E. Ugalde
To cite this version:
J.-R. Chazottes, J. Moles, E. Ugalde. Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates. Nonlinearity, IOP Publishing, 2019. hal-02292419
Gaussian Concentration bound for potentials satisfying Walters condition with subexponential continuity rates

J.-R. Chazottes*1, J. Moles†2, and E. Ugalde‡2

1 Centre de Physique Théorique, École Polytechnique, CNRS, 91128 Palaiseau, France. Email: chazottes@cpht.polytechnique.fr
2 Instituto de Física, Universidad Autónoma de San Luis Potosí, S.L.P., 78290 México. Emails: molesjdn@ifisica.uaslp.mx, ugalde@ifisica.uaslp.mx

Dated: September 10, 2019
Abstract
We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathbb N}$, $A$ being a finite alphabet. For a class of potentials which contains in particular potentials $\phi$ with variation decreasing like $O(n^{-\alpha})$ for some $\alpha>2$, we prove that their corresponding equilibrium state $\mu_\phi$ satisfies a Gaussian concentration bound. Namely, we prove that there exists a constant $C>0$ such that, for all $n$ and for all separately Lipschitz functions $K(x^0,\dots,x^{n-1})$, the exponential moment of $K(x,\dots,T^{n-1}x)-\int K(y,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)$ is bounded by $\exp\big(C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2\big)$. The crucial point is that $C$ is independent of $n$ and $K$. We then derive various consequences of this inequality. For instance, we obtain bounds on the fluctuations of the empirical frequency of blocks, the speed of convergence of the empirical measure, and the speed of Markov approximation of $\mu_\phi$. We also derive an almost-sure central limit theorem.
Keywords: concentration inequalities, empirical measure, Kantorovich distance, Wasserstein distance, d-bar distance, relative entropy, Markov approximation, almost-sure central limit theorem.
* JRC benefited from an ECOS Nord project for his stay at San Luis Potosí.
† JM benefited from an ECOS Nord project for his stay at École Polytechnique.
‡ EU acknowledges École Polytechnique for financial support (one-month stay).
Contents
1 Introduction
2 Setting and preliminary results
3 Main result and applications
  3.1 Gaussian concentration bound
  3.2 Related works
  3.3 Applications
    3.3.1 Birkhoff sums
    3.3.2 Empirical frequency of blocks
    3.3.3 Hitting times and entropy
    3.3.4 Speed of convergence of the empirical measure
    3.3.5 Relative entropy, $\bar d$-distance and speed of Markov approximation
    3.3.6 Shadowing of orbits
    3.3.7 Almost-sure central limit theorem
4 Proof of Theorem 3.1
  4.1 Some preparatory results
  4.2 Proof of Theorem 3.1
1 Introduction
We consider the full shift $T:\Omega\to\Omega$ where $\Omega=A^{\mathbb N}$, $A$ being a finite alphabet. Given an ergodic measure $\mu$ on $\Omega$ and a continuous observable $f:\Omega\to\mathbb R$, we know by Birkhoff's ergodic theorem that $n^{-1}S_nf(x)$ converges, for $\mu$-almost every $x$, to $\int f\,\mathrm d\mu$. (We use the standard notation $S_nf=f+f\circ T+\cdots+f\circ T^{n-1}$.) To refine this result, we need more assumptions on $\mu$ and $f$. For instance, if $\mu=\mu_\phi$ is the equilibrium state for a Lipschitz potential $\phi:\Omega\to\mathbb R$ and $f$ is also a Lipschitz function, then the following central limit theorem holds:
$$\lim_{n\to\infty}\mu_\phi\left(x:\frac{S_nf(x)-n\int f\,\mathrm d\mu_\phi}{\sqrt n}\le u\right)=\frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{u}e^{-\frac{\xi^2}{2\sigma^2}}\,\mathrm d\xi\qquad(1.1)$$
for all $u\in\mathbb R$, where $\sigma^2=\sigma_f^2$ is the variance of the process $\{f(T^nx)\}_{n\ge0}$, where $x$ is distributed according to $\mu_\phi$ (see footnote 1). This result says in essence that the fluctuations of $S_nf(x)-n\int f\,\mathrm d\mu_\phi$ are with high probability of order $\sqrt n$, when $n\to\infty$. Fluctuations of order $n$, referred to as 'large deviations', are unlikely to appear. Indeed, for instance one has
$$\mu_\phi\left(x:\frac1n S_nf(x)\ge\int f\,\mathrm d\mu_\phi+u\right)\asymp e^{-nI\left(u+\int f\,\mathrm d\mu_\phi\right)}\qquad(1.2)$$
where $u\ge0$ and $I(u)\ge0$ is the so-called 'rate function', which is (strictly) convex, such that $I\big(\int f\,\mathrm d\mu_\phi\big)=0$, and equal to $+\infty$ outside a certain finite interval $(\underline u_f,\overline u_f)$ (see footnote 2). Of course, both the central limit theorem and the large deviation asymptotics have been obtained for more general potentials, and for more general 'chaotic' dynamical systems. For a fairly recent review on probabilistic properties of nonuniformly hyperbolic dynamical systems modeled by Young towers, we refer to [4].

Footnote 1: $\sigma_f^2=\int f^2\,\mathrm d\mu_\phi-\big(\int f\,\mathrm d\mu_\phi\big)^2+2\sum_{\ell\ge1}\big(\int f\cdot f\circ T^\ell\,\mathrm d\mu_\phi-\big(\int f\,\mathrm d\mu_\phi\big)^2\big)$.
In this paper, we are interested in concentration inequalities which describe the fluctuations of observables of the form $K(x,Tx,\dots,T^{n-1}x)$ around their average. The only restriction on $K$ is that it has to be separately Lipschitz. By this we mean that, for all $i=0,\dots,n-1$, there exists a constant $\mathrm{Lip}_i(K)$ with
$$|K(x^0,\dots,x^i,\dots,x^{n-1})-K(x^0,\dots,x'^i,\dots,x^{n-1})|\le\mathrm{Lip}_i(K)\,d(x^i,x'^i)$$
for all points $x^0,\dots,x^i,\dots,x^{n-1},x'^i$ in $\Omega$, where $d$ is the usual distance on $\Omega$ (see (2.1)). So $K$ can be nonlinear and implicitly defined. Of course, such a class contains partial sums of Lipschitz functions, namely functions of the form $K(x^0,\dots,x^{n-1})=f(x^0)+\cdots+f(x^{n-1})$, for which $\mathrm{Lip}_i(K)=\mathrm{Lip}(f)$ for all $i$.
Besides considering very general observables, the other essential characteristic of concentration inequalities is that they are valid for all $n$, contrary to the above two results which are valid only in the limit $n\to\infty$. More precisely, we shall prove the following 'Gaussian concentration bound'. There exists a constant $C$ such that, for all $n$ and for all separately Lipschitz functions $K(x^0,\dots,x^{n-1})$, we have
$$\int\exp\big(K(x,Tx,\dots,T^{n-1}x)\big)\,\mathrm d\mu_\phi(x)\le\exp\left(\int K(x,Tx,\dots,T^{n-1}x)\,\mathrm d\mu_\phi(x)\right)\exp\left(C\sum_{j=0}^{n-1}\mathrm{Lip}_j(K)^2\right).\qquad(1.3)$$
The crucial point is that $C$ is independent of $n$ and $K$. By a standard argument (see below), the previous inequality implies that for all $u>0$
$$\mu_\phi\left(x:K(x,Tx,\dots,T^{n-1}x)-\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)\ge u\right)\le\exp\left(-\frac{u^2}{4C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2}\right).\qquad(1.4)$$
Footnote 2: For two positive sequences $(a_n)$, $(b_n)$, $a_n\asymp b_n$ means that $\lim_n(1/n)\log a_n=\lim_n(1/n)\log b_n$.
The Gaussian concentration bound (1.3) is known for Lipschitz potentials [7]. We shall prove that it remains true for a large subclass of potentials $\phi$ satisfying Walters condition. For instance, the bound holds for a potential whose variation is $O(n^{-\alpha})$ for some $\alpha>2$. The proof of our result relies on two main ingredients. First, we start with a classical decomposition of $K-\int K$ as a telescopic sum of martingale differences. Second, we have to do a second telescoping to use Ruelle's Perron-Frobenius operator. But we do not have a spectral gap anymore as in [7] (in the case of Lipschitz potentials). Instead, we use a result of V. Maume-Deschamps [18] based on Birkhoff cones.

We apply the Gaussian concentration bound and its consequences, like (1.4), to various observables. On the one hand, we obtain concentration bounds for previously studied observables; we get the same bounds, but they are no longer limited to equilibrium states with Lipschitz potentials. On the other hand, we consider observables not considered before. Even when $K(x,\dots,T^{n-1}x)=S_nf(x)$, we get a non-trivial bound. We then obtain a control on the fluctuations of the empirical frequency of blocks $a_0,\dots,a_{k-1}$ around $\mu([a_0,\dots,a_{k-1}])$, uniformly in $a_0,\dots,a_{k-1}\in A^k$. We then consider an estimator of the entropy of $\mu_\phi$ based on hitting times. The next application is about the speed of convergence of the empirical measure $(1/n)\sum_{i=0}^{n-1}\delta_{T^ix}$ towards $\mu_\phi$ in Wasserstein distance. Then we obtain an upper bound for the $\bar d$-distance between any shift-invariant probability measure and $\mu_\phi$. This distance is bounded by the square root of their relative entropy, times a constant. A consequence of this inequality is a bound for the speed of convergence of the Markov approximation of $\mu_\phi$ in $\bar d$-distance. Then we quantify the 'shadowing' of an orbit by another one which has to start in a subset of $\Omega$ with $\mu_\phi$-measure $1/3$, say. Finally, we prove an almost-sure version of the central limit theorem. This application shows in particular that concentration inequalities can also be used to obtain limit theorems.
2 Setting and preliminary results
Let $\Omega=A^{\mathbb N}$ where $A$ is a finite set. We denote by $x=x_0x_1\dots$ the elements of $\Omega$ (hence $x_i\in A$), and by $T$ the shift map: $(Tx)_k=x_{k+1}$, $k\in\mathbb N$. (We use upper indices instead of lower indices because we will need to consider bunches of points in $\Omega$, e.g., $x^0,x^1,\dots,x^p$, $x^i\in\Omega$.) We use the classical distance
$$d_\theta(x,y)=\theta^{\inf\{k\,:\,x_k\ne y_k\}}\qquad(2.1)$$
where $\theta\in(0,1)$ is some fixed number. Probability measures are defined on the Borel sigma-algebra of $\Omega$, which is generated by cylinder sets. Let $\phi:\Omega\to\mathbb R$ be a continuous potential, which means that
$$\mathrm{var}_n(\phi):=\sup\{|\phi(x)-\phi(y)|:x_i=y_i,\ 0\le i\le n-1\}\xrightarrow[n\to\infty]{}0.$$
The sequence $(\mathrm{var}_n(\phi))_{n\ge1}$ is the modulus of continuity of $\phi$, and it is called the 'variation' of $\phi$ in our context. By the way, we denote by $C(\Omega)$ the Banach space of real-valued continuous functions on $\Omega$ equipped with the supremum norm $\|\cdot\|_\infty$. We put further restrictions on $\phi$, namely that it must satisfy the Walters condition [22]. For $x,y$ in $\Omega$ let
$$W(\phi,x,y)=\sup_{n\in\mathbb N}\,\sup_{a\in A^n}|S_n\phi(ax)-S_n\phi(ay)|.$$
We assume that $W(\phi,x,y)$ exists and that there exists $W(\phi)>0$ such that
$$\sup_{x,y\in\Omega}W(\phi,x,y)\le W(\phi).\qquad(2.2)$$
Now for $p\in\mathbb N$ let
$$W_p(\phi):=\sup\{W(\phi,x,y):x_i=y_i,\ 0\le i\le p-1\}.$$

Definition 2.1. $\phi$ is said to satisfy Walters' condition if $(W_p(\phi))_{p\in\mathbb N}$ is a strictly positive sequence and decreases to $0$ as $p\to\infty$.
We now make several remarks on Walters' condition. First, observe that locally constant potentials do not satisfy this condition because $W_p(\phi)=0$ for all $p$ larger than some $p_0$. But one can in fact work with any strictly positive sequence $(\widetilde W_p(\phi))_{p\in\mathbb N}$ decreasing to zero such that $W_p(\phi)\le\widetilde W_p(\phi)$ for all $p$, e.g., $\max(W_p(\phi),\eta^p)$ for some fixed $\eta\in(0,1)$. Second, one easily checks that
$$\mathrm{var}_{p+1}(\phi)\le W_p(\phi)\le\sum_{k=p+1}^{\infty}\mathrm{var}_k(\phi),\quad p\in\mathbb N.\qquad(2.3)$$
Hence the set of potentials satisfying Walters' condition contains the set of potentials with summable variation. In particular, $(W_p(\phi))_p$ is bounded above by a geometric sequence if and only if $(\mathrm{var}_p(\phi))_p$ is also bounded above by a geometric sequence. This corresponds to the case of Lipschitz or Hölder potentials (with respect to $d_\theta$).
Now define Ruelle's Perron-Frobenius operator $P_\phi:C(\Omega)\to C(\Omega)$ as
$$P_\phi f(x)=\sum_{Ty=x}f(y)\,e^{\phi(y)}.$$
The next step is to define a function space preserved by $P_\phi$ and on which it has good spectral properties. We take the space of Lipschitz functions with respect to a new distance $d_\phi$ built out of $\phi$ as follows.

Definition 2.2 (The distance $d_\phi$). For $x,y\in\Omega$ let
$$d_\phi(x,y)=W_p(\phi)\ \text{ if }\ d_\theta(x,y)=\theta^p,\qquad\text{and}\qquad d_\phi(x,x)=0.$$

Now define
$$\mathcal L_\phi=\{f\in C(\Omega):\exists M>0\text{ such that }\mathrm{var}_n(f)\le M\,W_n(\phi),\ n=1,2,\dots\}$$
and
$$\mathrm{Lip}_\phi(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_\phi(x,y)}:x\ne y\right\}=\sup\left\{\frac{\mathrm{var}_n(f)}{W_n(\phi)}:n\in\mathbb N\right\}.$$
One can then define a norm on $\mathcal L_\phi$, making it a Banach space, by setting $\|f\|_{\mathcal L_\phi}=\|f\|_\infty+\mathrm{Lip}_\phi(f)$.
Remark 2.1. The usual Banach space of Lipschitz functions is defined as follows. Let
$$\mathcal L_\theta=\{f\in C(\Omega):\exists M>0\text{ such that }\mathrm{var}_n(f)\le M\theta^n,\ n=1,2,\dots\}$$
and
$$\mathrm{Lip}_\theta(f)=\sup\left\{\frac{|f(x)-f(y)|}{d_\theta(x,y)}:x\ne y\right\}=\sup\left\{\frac{\mathrm{var}_n(f)}{\theta^n}:n\in\mathbb N\right\}.$$
The canonical norm making $\mathcal L_\theta$ a Banach space is $\|f\|_{\mathcal L_\theta}=\|f\|_\infty+\mathrm{Lip}_\theta(f)$. In view of (2.3), if we have $W_n(\phi)=O(\theta^n)$, then $\mathcal L_\theta=\mathcal L_\phi$. If we now have, for instance, $W_n(\phi)=O(n^{-q})$ for some $q>0$, then we get a bigger space which contains in particular all functions $f$ such that $\mathrm{var}_n(f)=O(n^{-r})$ with $r\ge q$.
The following result is instrumental to this article. In brief, it tells us that a potential $\phi$ satisfying Walters' condition has a unique equilibrium state, which will be denoted by $\mu_\phi$, and gives a speed of convergence for the properly normalized iterates of the associated Ruelle's Perron-Frobenius operator. The first part of the theorem is due to Walters, while the second one is due to Maume-Deschamps and can be found in her PhD thesis [18, Chapter I.2]. Unfortunately, her result was not published, even though it is much sharper than the result in [16].
Theorem 2.1 ([22], [18]). Let $\phi:\Omega\to\mathbb R$ satisfy Walters' condition as above. Then the following holds.

A. There exists a unique triplet $(h_\phi,\lambda_\phi,\nu_\phi)$ such that $h_\phi\in\mathcal L_\phi$ and is strictly positive, $\|\log h_\phi\|_\infty<\infty$, $\lambda_\phi>0$, and $\nu_\phi$ is a fully supported probability measure such that $\int h_\phi\,\mathrm d\nu_\phi=1$. Moreover, $P_\phi h_\phi=\lambda_\phi h_\phi$ and $P_\phi^*\nu_\phi=\lambda_\phi\nu_\phi$, and $\phi$ has a unique equilibrium state $\mu_\phi=h_\phi\nu_\phi$, which is mixing (see footnote 3).

B. There exists a positive sequence $(\epsilon_n)_{n\in\mathbb N}$ converging to zero such that, for any $f\in\mathcal L_\phi$,
$$\left\|\frac{P_\phi^nf}{\lambda_\phi^n}-h_\phi\int f\,\mathrm d\nu_\phi\right\|_\infty\le C_{(2.1)}\,\epsilon_n\,\|f\|_{\mathcal L_\phi},\quad\forall n\in\mathbb N.\qquad(2.4)$$
Moreover, one has the following behaviors:
1. If $W_n(\phi)=O(\eta^n)$ for some $\eta\in(0,1)$, then there exists $\eta'\in(0,1)$ such that $\epsilon_n=O(\eta'^{\,n})$.
2. If $W_n(\phi)=O(n^{-\alpha})$ for some $\alpha>0$, then $\epsilon_n=O(n^{-\alpha})$.
3. If $W_n(\phi)=O(\theta^{(\log n)^\alpha})$ for some $\theta\in(0,1)$ and $\alpha>1$, then, for any $\varepsilon>0$, $\epsilon_n=O\big(\theta^{(\log n)^{\alpha-\varepsilon}}\big)$.
4. If $W_n(\phi)=O(e^{-cn^\alpha})$ for some $c>0$ and $\alpha\in(0,1)$, then there exists $c'>0$ such that $\epsilon_n=O\big(e^{-c'n^{\frac{\alpha}{\alpha+1}}}\big)$.

Footnote 3: This means that for any pair of cylinders $B,B'$, $\lim_{n\to\infty}\mu_\phi(B\cap T^{-n}B')=\mu_\phi(B)\mu_\phi(B')$. In particular $\mu_\phi$ is ergodic.

The fact that $\mu_\phi$ is an equilibrium state means that it maximizes the functional $\nu\mapsto h(\nu)+\int\phi\,\mathrm d\nu$ over the set of shift-invariant probability measures on $\Omega$, where $h(\nu)$ is the entropy of $\nu$; the maximum is equal to the topological pressure $P(\phi)$ of $\phi$ (see e.g. [15]), and we have $P(\phi)=\log\lambda_\phi$.
Let us give examples of potentials. First consider $A=\{-1,1\}$ and $p>1$, and define
$$\phi(x)=-\sum_{n\ge2}\frac{x_0x_{n-1}}{n^p}.$$
One can check that $W_n(\phi)=O(n^{-p+2})$. This is the analog of the so-called long-range Ising model on $\mathbb N$. Let us now take $A=\{0,1\}$ and let $[0^k1]=\{x\in\Omega:x_i=0,\ 0\le i\le k-1,\text{ and }x_k=1\}$. Let $(v_n)$ be a monotone decreasing sequence of real numbers converging to $0$ and define
$$\phi(x)=\begin{cases}v_k&\text{if }x\in[0^k1]\\ 0&\text{if }x=(0,0,\dots)\\ 0&\text{otherwise}.\end{cases}$$
One can check that $\mathrm{var}_n(\phi)=v_n$. This example is taken from [19].
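To make the order of these variations concrete, here is a small brute-force check (an illustrative script, not part of the paper): we truncate the Ising-type series at a finite length $N$ and verify that $\mathrm{var}_n(\phi)$ equals the tail sum $\sum_{m\ge n+1}2/m^p$ over the truncated range, which is $O(n^{-p+1})$; via the right-hand side of (2.3), summing these tails in turn gives $W_n(\phi)=O(n^{-p+2})$.

```python
import itertools

P, N = 3.0, 6  # exponent p and truncation length; illustrative values only

def phi(x):
    # truncated long-range Ising-type potential: -sum_{2 <= m < N} x_0 x_{m-1} / m^P
    return -sum(x[0] * x[m - 1] / m ** P for m in range(2, N))

def var_n(n):
    # sup |phi(x) - phi(y)| over spin configurations agreeing on coordinates 0..n-1
    configs = list(itertools.product((-1, 1), repeat=N))
    return max(abs(phi(x) - phi(y))
               for x in configs for y in configs if x[:n] == y[:n])

for n in range(1, N):
    # only the terms with m - 1 >= n can differ, each by at most 2/m^P
    tail = sum(2 / m ** P for m in range(max(n + 1, 2), N))
    print(n, round(var_n(n), 6), round(tail, 6))
```

The brute-force supremum and the closed-form tail agree exactly, because the extremal pair of configurations flips every free spin.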
Remark 2.2. Let us briefly explain how we can interpret an equilibrium state for a non-Lipschitz potential as an absolutely continuous invariant measure of a piecewise expanding map of the unit interval with a Markov partition. It is well known that a uniformly expanding map $S$ of the unit interval with a finite Markov partition which is piecewise $C^{1+\eta}$, for some $\eta>0$, can be coded by a subshift of finite type $(\Omega,T)$ over a finite alphabet. Then $-\log|S'|$ induces a potential $\phi$ on $\Omega$ which is Lipschitz (with respect to $d_\theta$). The pullback of $\mu_\phi$ is then the unique absolutely continuous invariant probability measure for $S$. In [10], the authors showed that, given $\phi$ which is not Lipschitz, one can construct a uniformly expanding map of the unit interval with a finite Markov partition which is piecewise $C^1$, but not piecewise $C^{1+\eta}$ for any $\eta>0$, and such that the pullback of $\mu_\phi$ is the Lebesgue measure.
3 Main result and applications
3.1 Gaussian concentration bound
We can now state our main theorem, whose proof is deferred to Section 4. We start with the definition of separately $d_\theta$-Lipschitz functions.

Definition 3.1. A function $K:\Omega^n\to\mathbb R$ is said to be separately $d_\theta$-Lipschitz if, for all $i$, there exists a constant $\mathrm{Lip}_{\theta,i}(K)$ with
$$|K(x^0,\dots,x^i,\dots,x^{n-1})-K(x^0,\dots,x'^i,\dots,x^{n-1})|\le\mathrm{Lip}_{\theta,i}(K)\,d_\theta(x^i,x'^i)$$
for all points $x^0,\dots,x^i,\dots,x^{n-1},x'^i$ in $\Omega$.

Theorem 3.1. Suppose that $\phi$ satisfies one of the following conditions:
1. $W_n(\phi)=O(\theta^n)$ (that is, $\phi$ is $d_\theta$-Lipschitz);
2. $W_n(\phi)=O(n^{-\alpha})$ for some $\alpha>1$;
3. $W_n(\phi)=O(\theta^{(\log n)^\alpha})$ for some $\theta\in(0,1)$ and $\alpha>1$;
4. $W_n(\phi)=O(e^{-cn^\alpha})$ for some $c>0$ and $\alpha\in(0,1)$.
Then the process $(x,Tx,\dots)$, with $x$ distributed according to $\mu_\phi$, satisfies the following Gaussian concentration bound: there exists $C_{(3.1)}>0$ such that for any $n\in\mathbb N$ and for any separately $d_\theta$-Lipschitz function $K:\Omega^n\to\mathbb R$, we have
$$\int\exp\big(K(x,Tx,\dots,T^{n-1}x)\big)\,\mathrm d\mu_\phi(x)\le\exp\left(\int K(x,Tx,\dots,T^{n-1}x)\,\mathrm d\mu_\phi(x)\right)\exp\left(C_{(3.1)}\sum_{j=0}^{n-1}\mathrm{Lip}_{\theta,j}(K)^2\right).\qquad(3.1)$$

Three remarks are in order. First, we conjecture that this theorem is valid under the condition $\sum_n\mathrm{var}_n(\phi)<\infty$. Second, it would be useful to have an explicit formula for $C_{(3.1)}$ in (3.1). Unfortunately, this constant is proportional to $C_{(2.1)}$ (see Theorem 2.1), which is cumbersome since it involves the eigendata of $P_\phi$. Third, for the sake of simplicity, we considered the full shift $A^{\mathbb N}$. In fact, our results remain true if $\Omega\subset A^{\mathbb N}$ is a topologically mixing one-sided subshift of finite type. Moreover, one can extend Theorem 3.1 to bilateral subshifts of finite type by a trick used in [7].

We now give some corollaries of our main theorem that will be used in the section on applications. First, by (2.3) we immediately obtain the following corollary.

Corollary 3.2. If there exists $\alpha>2$ such that $\mathrm{var}_n(\phi)=O\big(\frac{1}{n^\alpha}\big)$, then the Gaussian concentration bound (3.1) holds.
Next, we get the following concentration inequalities from (3.1).

Corollary 3.3. For all $u>0$, we have
$$\mu_\phi\left(x\in\Omega:K(x,Tx,\dots,T^{n-1}x)-\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)\ge u\right)\le\exp\left(-\frac{u^2}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^2}\right)\qquad(3.2)$$
and
$$\mu_\phi\left(x\in\Omega:\left|K(x,Tx,\dots,T^{n-1}x)-\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)\right|\ge u\right)\le2\exp\left(-\frac{u^2}{4C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^2}\right).\qquad(3.3)$$
Proof. Inequality (3.2) follows by a well-known trick referred to as Chernoff's bounding method [2]. Let us give the proof for completeness. Let $u>0$. For any random variable $Y$, Markov's inequality tells us that $\mathbb P(Y\ge u)\le e^{-\xi u}\,\mathbb E\big[e^{\xi Y}\big]$ for all $\xi>0$. Now let
$$Y=K(x,Tx,\dots,T^{n-1}x)-\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y).$$
Using (3.1) and optimizing over $\xi$, we get (3.2). Inequality (3.3) follows by applying (3.2) to $-K$ and then summing up the two bounds.
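The optimization over $\xi$ is elementary: applying (3.1) to $\xi K$ gives $\mathbb E\,e^{\xi Y}\le e^{CL\xi^2}$ with $L=\sum_i\mathrm{Lip}_{\theta,i}(K)^2$, so $\mathbb P(Y\ge u)\le\inf_{\xi>0}e^{-\xi u+CL\xi^2}$, minimized at $\xi=u/(2CL)$, which yields $e^{-u^2/(4CL)}$. A quick numerical sanity check of this minimization (with arbitrary illustrative constants):

```python
import math

# Chernoff bound: P(Y >= u) <= exp(-xi*u + C*L*xi^2) for every xi > 0,
# where L stands for sum_i Lip_i(K)^2. Minimize the right-hand side on a grid.
u, C, L = 1.7, 0.8, 2.5

xi_star = u / (2 * C * L)                    # optimum from calculus
closed_form = math.exp(-u ** 2 / (4 * C * L))

grid = [xi_star * k / 1000 for k in range(1, 2001)]  # grid containing xi_star
numeric = min(math.exp(-xi * u + C * L * xi ** 2) for xi in grid)

print(numeric, closed_form)  # the two values coincide
```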
The last corollary we want to state is about the variance of any separately $d_\theta$-Lipschitz function.

Corollary 3.4. We have
$$\int\left(K(x,Tx,\dots,T^{n-1}x)-\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)\right)^2\mathrm d\mu_\phi(x)\le2C_{(3.1)}\sum_{i=0}^{n-1}\mathrm{Lip}_{\theta,i}(K)^2.\qquad(3.4)$$
Proof. To alleviate notations, we simply write $K$ instead of $K(x,Tx,\dots,T^{n-1}x)$, $\int K$ instead of $\int K(y,Ty,\dots,T^{n-1}y)\,\mathrm d\mu_\phi(y)$, and so on and so forth. Applying (3.1) to $\xi K$, where $\xi$ is any real number different from $0$, we get
$$\int\exp\left(\xi\Big(K-\int K\Big)\right)\le\exp\left(C_{(3.1)}\xi^2\sum_{j=0}^{n-1}\mathrm{Lip}_{\theta,j}(K)^2\right).$$
Now by Taylor expansion we get
$$1+\frac{\xi^2}{2}\int\Big(K-\int K\Big)^2+o(\xi^2)\le1+C_{(3.1)}\xi^2\sum_{j=0}^{n-1}\mathrm{Lip}_{\theta,j}(K)^2+o(\xi^2).$$
Dividing by $\xi^2$ on both sides and then taking the limit $\xi\to0$, we obtain the desired inequality.
Although we were not able to prove the Gaussian concentration bound for separately $d_\phi$-Lipschitz functions, for many applications separately $d_\theta$-Lipschitz functions are more natural. Furthermore, there is a notable class of separately $d_\phi$-Lipschitz functions, namely Birkhoff sums of the potential itself, for which our theorem holds. Indeed, when $\phi\in\mathcal L_\phi$, the function $K(x,\dots,T^{n-1}x)=S_n\phi(x)$ is obviously separately $d_\phi$-Lipschitz and $\mathrm{Lip}_{\phi,j}(K)=\mathrm{Lip}_\phi(\phi)$ for all $j$. We have the following result.

Theorem 3.5. Under the hypotheses of Theorem 3.1, there exists $C_{(3.5)}>0$ such that, for any $\psi\in\mathcal L_\phi$, for all $u>0$, and for all $n\in\mathbb N$, we have
$$\mu_\phi\left(x\in\Omega:\frac1nS_n\psi(x)-\int\psi\,\mathrm d\mu_\phi\ge u\right)\le\exp\left(-\frac{nu^2}{4C_{(3.5)}\mathrm{Lip}_\phi(\psi)^2}\right).\qquad(3.5)$$

The proof is left to the reader. The main (simple) modification lies in the proof of Lemma 4.3, in which considering a Birkhoff sum of a $d_\phi$-Lipschitz function works fine, whereas we are stuck for a general separately $d_\phi$-Lipschitz function. We will apply this result with $\psi=-\phi$ to derive concentration bounds for hitting times. Note that under the assumptions of this theorem, $\{\psi(T^nx)\}_{n\ge0}$ satisfies the central limit theorem [18, Chapter 2].
3.2 Related works
The novelty here is to prove a Gaussian concentration bound for potentials with a variation decaying subexponentially. For $\phi$ Lipschitz, Theorem 3.1 was proved in [7]. The main goal of [7] was then to deal with nonuniformly hyperbolic systems modeled by a Young tower. For a tower with a return time to the base with exponential tails, the authors of [7] proved a Gaussian concentration bound. For polynomial tails, they proved moment concentration bounds. For $C^{1+\eta}$ maps of the unit interval with an indifferent fixed point, which are thus nonuniformly expanding, we are in the latter situation. In view of Remark 2.2 above, we deal here with maps whose derivative is not Hölder continuous, but which are still uniformly expanding.

Let us also mention the paper [14] in which the authors prove a Gaussian concentration bound for $\phi$ of summable variation (whereas we need a bit more than summable variation). Their proof is based on coupling. However, they consider functions $K$ on $A^n$, not on $(A^{\mathbb N})^n=\Omega^n$ as in this paper. For such functions, the analogue of $\mathrm{Lip}_{\theta,i}(K)$ is $\delta_i(K)=\sup\{|K(a_0,\dots,a_i,\dots,a_{n-1})-K(b_0,\dots,b_i,\dots,b_{n-1})|:a_j=b_j,\ \forall j\ne i\}$. It is clear that a Gaussian concentration bound for functions $K:(A^{\mathbb N})^n\to\mathbb R$ implies a Gaussian concentration bound for functions $K:A^n\to\mathbb R$, but the converse is not true.
3.3 Applications
We now give several applications of the Gaussian concentration bound (3.1) and its corollaries. Throughout this section, $\mu_\phi$ is the equilibrium state for a potential $\phi$ satisfying one of the conditions 1-4 in Theorem 3.1.
3.3.1 Birkhoff sums
Let $f:\Omega\to\mathbb R$ be a $d_\theta$-Lipschitz function and define $K(x^0,\dots,x^{n-1})=f(x^0)+\cdots+f(x^{n-1})$, whence $K(x,Tx,\dots,T^{n-1}x)=f(x)+f(Tx)+\cdots+f(T^{n-1}x)=:S_nf(x)$ is the Birkhoff sum of $f$. Clearly, $\mathrm{Lip}_{\theta,i}(K)=\mathrm{Lip}_\theta(f)$ for all $i=0,\dots,n-1$. Applying Corollary 3.3 we immediately get
$$\mu_\phi\left(x:\left|\frac{S_nf(x)}{n}-\int f\,\mathrm d\mu_\phi\right|\ge u\right)\le2\exp\big(-c_{(3.6)}nu^2\big)\qquad(3.6)$$
for all $n\ge1$ and $u\in\mathbb R_+$, where
$$c_{(3.6)}=\frac{1}{4C_{(3.1)}\mathrm{Lip}_\theta(f)^2}.$$
This bound can be compared with the large deviation asymptotics (1.2). We see that it has the right behavior in $n$. Replacing $u$ by $u/\sqrt n$ in (3.6) we get
$$\mu_\phi\left(x:\left|S_nf(x)-n\int f\,\mathrm d\mu_\phi\right|\ge u\sqrt n\right)\le2\exp\big(-c_{(3.6)}u^2\big)$$
for all $n$ and $u>0$. This can be compared with the central limit theorem (1.1); we can see that the previous bound is consistent with that theorem. Note that the central limit theorem is about convergence in law, whereas here we obtain a (non-asymptotic) bound from which one cannot deduce a convergence in law.
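The $e^{-cnu^2}$ scaling in (3.6) is easy to see numerically. The sketch below uses the uniform Bernoulli measure on two symbols as a stand-in sampler for $\mu_\phi$ (a genuinely locally constant potential falls outside Definition 2.1, so this is only an illustration of the scaling, not an instance of the theorem), with a $d_\theta$-Lipschitz observable depending on the first two coordinates:

```python
import random

random.seed(0)
theta = 0.5

def f(w):
    # observable depending on the first two symbols; d_theta-Lipschitz
    return w[0] + theta * w[1]

mean_f = 0.5 + theta * 0.5  # expectation of f under the uniform Bernoulli measure

def deviation_prob(n, u, trials=2000):
    # Monte Carlo estimate of mu(|S_n f / n - int f dmu| >= u)
    hits = 0
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n + 1)]
        s = sum(f(x[i:i + 2]) for i in range(n)) / n
        hits += abs(s - mean_f) >= u
    return hits / trials

for n in (50, 200, 800):
    print(n, deviation_prob(n, 0.1))
```

For fixed $u$, the estimated probability drops sharply as $n$ grows, consistent with the exponential bound.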
3.3.2 Empirical frequency of blocks

Take $f(x)=\mathbf 1_{[a_{0,k-1}]}(x)$ where
$$[a_{0,k-1}]=\{x\in\Omega:x_i=a_i,\ i=0,\dots,k-1\}$$
is a given $k$-cylinder. Let
$$f_n(x,a_{0,k-1})=\frac{\sum_{j=0}^{n-k}\mathbf 1_{[a_{0,k-1}]}(T^jx)}{n-k+1}.$$
This is the 'empirical frequency' of the block $a_{0,k-1}\in A^k$ in the orbit of $x$ up to time $n-k$. By Birkhoff's ergodic theorem, we know that, for each $a_{0,k-1}$, $f_n(x,a_{0,k-1})$ goes to $\mu_\phi([a_{0,k-1}])$ for $\mu_\phi$-almost all $x$. The next theorem quantifies this asymptotic statement. Notice that we can control the fluctuations of $f_n(x,a_{0,k-1})$ around $\mu_\phi([a_{0,k-1}])$ uniformly in $a_{0,k-1}$.

Theorem 3.6. For all $n\in\mathbb N$, for all $1\le k\le n$ and for all $u>0$ we have
$$\mu_\phi\left(x:\max_{a_{0,k-1}}\big|f_n(x,a_{0,k-1})-\mu_\phi([a_{0,k-1}])\big|\ge\frac{(u+c\sqrt k)\,\theta^{-k}}{\sqrt{n-k+1}}\right)\le e^{-\frac{u^2}{4C_{(3.1)}}}$$
where $c=2\sqrt{2C_{(3.1)}\log|A|}$. Moreover, if $k=k(n)=\zeta\log n$ for some $\zeta>0$, then
$$\mu_\phi\left(x:\max_{a_{0,k(n)-1}}\big|f_n(x,a_{0,k(n)-1})-\mu_\phi([a_{0,k(n)-1}])\big|\ge\frac{(u+c'\sqrt{\log n})\,n^{\zeta|\log\theta|}}{\sqrt{n-k(n)+1}}\right)\le e^{-\frac{u^2}{4C_{(3.1)}}}$$
where $c'=2\sqrt{2\zeta C_{(3.1)}\log|A|}$.
Proof. Define the function $K:\Omega^{n-k+1}\to\mathbb R$ by
$$K(x^0,\dots,x^{n-k})=\max_{a_{0,k-1}}Z(a_{0,k-1};x^0,\dots,x^{n-k})$$
where
$$Z(a_{0,k-1};x^0,\dots,x^{n-k})=\left|\frac{\sum_{j=0}^{n-k}\mathbf 1_{[a_{0,k-1}]}(x^j)}{n-k+1}-\mu_\phi([a_{0,k-1}])\right|.$$
It is left to the reader to check that $\mathrm{Lip}_{\theta,j}(K)=\frac{\mathrm{Lip}_\theta(f)}{n-k+1}=\frac{1}{\theta^k(n-k+1)}$, so we get immediately from (3.2)
$$\mu_\phi\left(x\in\Omega:K(x,Tx,\dots,T^{n-k}x)\ge u+\int K(y,Ty,\dots,T^{n-k}y)\,\mathrm d\mu_\phi(y)\right)\le\exp\left(-\frac{\theta^{2k}(n-k+1)u^2}{4C_{(3.1)}}\right)$$
for all $n\ge1$ and $u>0$. To complete the proof, we need a good upper bound for $\int K(y,Ty,\dots,T^{n-k}y)\,\mathrm d\mu_\phi(y)$. Actually, this can be done by using again the Gaussian concentration bound. Using (3.1) and Jensen's inequality we get, for any $\xi>0$,
$$\exp\left(\xi\int K(x,Tx,\dots,T^{n-k}x)\,\mathrm d\mu_\phi(x)\right)\le\int\exp\left(\xi\max_{a_{0,k-1}}Z(a_{0,k-1};x,Tx,\dots,T^{n-k}x)\right)\mathrm d\mu_\phi(x)$$
$$\le\sum_{a_{0,k-1}\in A^k}\int\exp\big(\xi Z(a_{0,k-1};x,Tx,\dots,T^{n-k}x)\big)\,\mathrm d\mu_\phi(x)\le2|A|^k\exp\left(\frac{C_{(3.1)}\theta^{-2k}\xi^2}{n-k+1}\right).$$
The second inequality is obtained by using the trivial inequality
$$e^{\max_{i=1,\dots,p}a_i}\le\sum_{i=1}^pe^{a_i},$$
and the factor $2$ in the last one comes from bounding $e^{|w|}\le e^w+e^{-w}$ and applying (3.1) to $\pm Z$ for each block. Taking logarithms on both sides and then dividing by $\xi$, we have
$$\int K(x,Tx,\dots,T^{n-k}x)\,\mathrm d\mu_\phi(x)\le\frac{\log2+k\log|A|}{\xi}+\frac{C_{(3.1)}\theta^{-2k}\xi}{n-k+1}.$$
There is a unique $\xi>0$ minimizing the right-hand side, hence
$$\int K(x,Tx,\dots,T^{n-k}x)\,\mathrm d\mu_\phi(x)\le2\theta^{-k}\sqrt{\frac{C_{(3.1)}(k+1)\log|A|}{n-k+1}},$$
where we used that $\log2\le\log|A|$. Hence we get the desired estimate.

Note that $\log|A|$ is the topological entropy of the full shift with alphabet $A$.
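For intuition on Theorem 3.6, the following sketch computes the maximal deviation of empirical $k$-block frequencies over all blocks in $A^k$; the uniform Bernoulli measure is again used as an illustrative stand-in sampler, so the numbers only illustrate the $\theta^{-k}\sqrt k/\sqrt n$ scale of the bound:

```python
import itertools
import random
from collections import Counter

random.seed(1)
A = "01"
k, n = 3, 5000

# one sample orbit segment x_0 .. x_{n-1} from the uniform Bernoulli measure
x = "".join(random.choice(A) for _ in range(n))

counts = Counter(x[j:j + k] for j in range(n - k + 1))
true_prob = 1 / len(A) ** k  # mu([a_{0,k-1}]) for the uniform measure

# max over all |A|^k blocks of |f_n(x, a) - mu([a])|
max_dev = max(abs(counts["".join(b)] / (n - k + 1) - true_prob)
              for b in itertools.product(A, repeat=k))
print(max_dev)
```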
3.3.3 Hitting times and entropy

For $x,y\in\Omega$, let
$$T_{x_{0,n-1}}(y)=\inf\{j\ge1:y_{j,j+n-1}=x_{0,n-1}\}.$$
This is the first time that the $n$ first symbols of $x$ appear in $y$. We assume that $\phi$ satisfies
$$\mathrm{var}_n(\phi)=O\Big(\frac{1}{n^\alpha}\Big)\ \text{ for some }\alpha>2.\qquad(3.7)$$
One can prove (see [9]) that
$$\lim_{n\to\infty}\frac1n\log T_{x_{0,n-1}}(y)=h(\mu_\phi),\ \text{ for }\mu_\phi\otimes\mu_\phi\text{-almost every }(x,y).$$
Roughly, this means that, if we pick $x$ and $y$ independently, each one according to $\mu_\phi$, then the time it takes to see the first $n$ symbols of $x$ appearing in $y$ for the first time is $\approx e^{nh(\mu_\phi)}$.

Theorem 3.7. If $\phi$ satisfies (3.7), then there exist strictly positive constants $c_1,c_2$ and $u_0$ such that, for all $n$ and for all $u>u_0$,
$$(\mu_\phi\otimes\mu_\phi)\left\{(x,y):\frac1n\log T_{x_{0,n-1}}(y)\ge h(\mu_\phi)+u\right\}\le c_1e^{-c_2nu^2}$$
and
$$(\mu_\phi\otimes\mu_\phi)\left\{(x,y):\frac1n\log T_{x_{0,n-1}}(y)\le h(\mu_\phi)-u\right\}\le c_1e^{-c_2nu}.$$

These bounds were obtained in [8] when $\phi$ is Lipschitz. Observe that the probability of being above $h(\mu_\phi)$ is bounded above by $c_1e^{-c_2nu^2}$, whereas the probability of being below $h(\mu_\phi)$ is bounded above by $c_1e^{-c_2nu}$. The proof of this theorem being very similar to that given in [8], we omit the details and only sketch it. We cannot directly deal with $T_{x_{0,n-1}}(y)$, but we have
$$\log T_{x_{0,n-1}}(y)=\log\big(T_{x_{0,n-1}}(y)\,\mu_\phi([x_{0,n-1}])\big)-\log\mu_\phi([x_{0,n-1}]).$$
Then we use Theorem 3.5 for $\psi=-\phi$, assuming (without loss of generality) that $P(\phi)=0$, that is, $h(\mu_\phi)=-\int\phi\,\mathrm d\mu_\phi$, because we can control, uniformly in $x$, the approximation $-\log\mu_\phi([x_{0,n-1}])\approx S_n(-\phi)(x)$. To control the other term, we use that the law of $T_{x_{0,n-1}}(y)\,\mu_\phi([x_{0,n-1}])$ is well approximated by an exponential law.

Another estimator of $h(\mu_\phi)$ is the so-called plug-in estimator. We could also obtain concentration bounds for it in the spirit of [8].
3.3.4 Speed of convergence of the empirical measure
Instead of looking at the frequency of a block $a_{0,k-1}$ we can consider a global object, namely the empirical measure
$$\mathcal E_n(x)=\frac1n\sum_{j=0}^{n-1}\delta_{T^jx}.$$
For $\mu_\phi$-almost every $x$, we know that
$$\frac1n\sum_{j=0}^{n-1}\delta_{T^jx}\xrightarrow[n\to\infty]{}\mu_\phi,$$
where the convergence is in the weak topology on the space $\mathcal M(\Omega)$ of probability measures on $\Omega$. This is a consequence of Birkhoff's ergodic theorem. The natural question is: how fast does this convergence take place? We can answer this question by using the Kantorovich distance $d_K$, which metrizes the weak topology on $\mathcal M(\Omega)$:
$$d_K(\nu_1,\nu_2)=\sup\left\{\int f\,\mathrm d\nu_1-\int f\,\mathrm d\nu_2:f:\Omega\to\mathbb R\text{ such that }\mathrm{Lip}_\theta(f)=1\right\}.$$
We have the following result.

Theorem 3.8. For all $u>0$ and all $n\ge1$ we have
$$\mu_\phi\left(x:\left|d_K(\mathcal E_n(x),\mu_\phi)-\int d_K(\mathcal E_n(y),\mu_\phi)\,\mathrm d\mu_\phi(y)\right|\ge u\right)\le2e^{-c_{(3.8)}nu^2}\qquad(3.8)$$
where $c_{(3.8)}=(4C_{(3.1)})^{-1}$.

Proof. Let
$$K(x^0,\dots,x^{n-1})=\sup\left\{\frac1n\sum_{j=0}^{n-1}f(x^j)-\int f\,\mathrm d\mu_\phi:f:\Omega\to\mathbb R\text{ with }\mathrm{Lip}_\theta(f)=1\right\}.$$
Of course, $K(x,Tx,\dots,T^{n-1}x)=d_K(\mathcal E_n(x),\mu_\phi)$. It is left to the reader to check that
$$\mathrm{Lip}_{\theta,i}(K)\le\frac1n,\quad i=0,\dots,n-1.$$
The result follows at once by applying inequality (3.3).

It is natural to ask for a good upper bound for $\int d_K(\mathcal E_n(y),\mu_\phi)\,\mathrm d\mu_\phi(y)$, because this would give a control on the fluctuations of $d_K(\mathcal E_n(x),\mu_\phi)$ around $0$. Getting such a bound turns out to be difficult. In [5, Section 8] it is proved that
$$\int d_K(\mathcal E_n(y),\mu_\phi)\,\mathrm d\mu_\phi(y)\lesssim\frac{1}{n^{\frac{1}{2(1+\log|A|)}}},$$
where, for two positive sequences $(a_n)$, $(b_n)$, $a_n\lesssim b_n$ means that $\limsup_n\frac{\log a_n}{\log b_n}\le1$. One could in principle get a non-asymptotic but messy bound.
3.3.5 Relative entropy, $\bar d$-distance and speed of Markov approximation

Given $n\in\mathbb N$ and $x_{0,n-1},y_{0,n-1}\in A^n$, the (non-normalized) Hamming distance between $x_{0,n-1}$ and $y_{0,n-1}$ is
$$\bar d_n(x_{0,n-1},y_{0,n-1})=\sum_{i=0}^{n-1}\mathbf 1_{\{x_i\ne y_i\}}.\qquad(3.9)$$
Now, given two shift-invariant probability measures $\mu,\nu$ on $\Omega$, denote by $\mu_n$ and $\nu_n$ their projections on $A^n$, and define their $\bar d_n$-distance by
$$\bar d_n(\mu_n,\nu_n)=\inf\sum_{x_{0,n-1}\in A^n}\sum_{y_{0,n-1}\in A^n}\bar d_n(x_{0,n-1},y_{0,n-1})\,P_n(x_{0,n-1},y_{0,n-1})$$
where the infimum is taken over all the joint shift-invariant probability distributions $P_n$ on $A^n\times A^n$ such that $\sum_{y_{0,n-1}\in A^n}P_n(x_{0,n-1},y_{0,n-1})=\mu_n(x_{0,n-1})$ and $\sum_{x_{0,n-1}\in A^n}P_n(x_{0,n-1},y_{0,n-1})=\nu_n(y_{0,n-1})$. By [20, Theorem I.9.6, p. 92], the following limit exists:
$$\bar d(\mu,\nu):=\lim_{n\to\infty}\frac1n\bar d_n(\mu_n,\nu_n)\qquad(3.10)$$
and defines a distance on the set of shift-invariant probability measures. It induces a finer topology than the weak topology and, in particular, the $\bar d$-limit of ergodic measures is ergodic, and the entropy is $\bar d$-continuous on the class of ergodic measures (see footnote 4). Next, given $n\in\mathbb N$ and a shift-invariant probability measure $\nu$ on $\Omega$, define the $n$-block relative entropy of $\nu$ with respect to $\mu_\phi$ by
$$H_n(\nu|\mu_\phi)=\sum_{x_{0,n-1}\in A^n}\nu_n(x_{0,n-1})\log\frac{\nu_n(x_{0,n-1})}{\mu_{\phi,n}(x_{0,n-1})}.$$
One can easily prove that the following limit exists and defines the relative entropy of $\nu$ with respect to $\mu_\phi$:
$$\lim_{n\to\infty}\frac1nH_n(\nu_n|\mu_{\phi,n})=:h(\nu|\mu_\phi)=P(\phi)-\int\phi\,\mathrm d\nu-h(\nu)\qquad(3.11)$$
where $P(\phi)$ is the topological pressure of $\phi$:
$$P(\phi)=\lim_{n\to\infty}\frac1n\log\sum_{a_{0,n-1}\in A^n}e^{\sup\{S_n\phi(x)\,:\,x\in[a_{0,n-1}]\}}.$$
This limit exists for any continuous $\phi$. (To prove (3.11), we use that there exists a positive sequence $(\varepsilon_n)_n$ going to $0$ such that, for any $a_{0,n-1}\in A^n$ and any $x\in[a_{0,n-1}]$, $\mu_\phi([a_{0,n-1}])/\exp(-nP(\phi)+S_n\phi(x))$ is bounded below by $\exp(-n\varepsilon_n)$ and above by $\exp(n\varepsilon_n)$.) By the variational principle, $h(\nu|\mu_\phi)\ge0$ with equality if and only if $\nu=\mu_\phi$ (recall that $\mu_\phi$ is the unique equilibrium state of $\phi$). We refer to [21] for details. We can now formulate the first theorem of this section.
Theorem 3.9. For every shift-invariant probability measure $\nu$ on $\Omega$ and for all $n\in\mathbb N$, we have
$$\bar d_n(\nu_n,\mu_{\phi,n})\le c_{(3.9)}\sqrt{nH_n(\nu_n|\mu_{\phi,n})}\qquad(3.12)$$
where $c_{(3.9)}=\sqrt{2C_{(3.1)}}$. In particular,
$$\bar d(\nu,\mu_\phi)\le c_{(3.9)}\sqrt{h(\nu|\mu_\phi)}.\qquad(3.13)$$

Footnote 4: These two properties are false in the weak topology.

Proof. For a function $f:A^n\to\mathbb R$, define for each $i=0,\dots,n-1$
$$\delta_i(f)=\sup\{|f(a_{0,n-1})-f(b_{0,n-1})|:a_j=b_j,\ \forall j\ne i\}.$$
We obviously have that, for all $a_{0,n-1},b_{0,n-1}\in A^n$,
$$|f(a_{0,n-1})-f(b_{0,n-1})|\le\sum_{j=0}^{n-1}\mathbf 1_{\{a_j\ne b_j\}}\,\delta_j(f).$$
A function $f:A^n\to\mathbb R$ such that $\delta_j(f)\le1$ for $j=0,\dots,n-1$ is $1$-Lipschitz for the Hamming distance (3.9). We now consider the set of functions
$$\mathcal H(n,\phi)=\left\{f:A^n\to\mathbb R:f\text{ $1$-Lipschitz for }\bar d_n,\ \int_{A^n}f\,\mathrm d\mu_{\phi,n}=0\right\}.$$
We can identify a function $f\in\mathcal H(n,\phi)$ with a function $\tilde f:\Omega^n\to\mathbb R$ in a natural way: $\tilde f(x^0,\dots,x^{n-1})=f(\pi(x^0),\dots,\pi(x^{n-1}))$, where $\pi:\Omega\to A$ is defined by $\pi(x)=x_0$. We obviously have $\int\tilde f\,\mathrm d\mu_\phi=0$ and it is easy to check that $\mathrm{Lip}_j(\tilde f)=\delta_j(f)\le1$, $j=0,\dots,n-1$. Therefore we can apply the Gaussian concentration bound (3.1) to get
$$\int_{A^n}e^{\xi f}\,\mathrm d\mu_{\phi,n}\le e^{C_{(3.1)}n\xi^2},\ \text{ for all }f\in\mathcal H(n,\phi)\text{ and for all }\xi\in\mathbb R.\qquad(3.14)$$
We now apply an abstract result [1, Theorem 3.1] which says that (3.14) is equivalent to
$$\bar d_n(\nu_n,\mu_{\phi,n})\le\sqrt{2C_{(3.1)}nH_n(\nu_n|\mu_{\phi,n})}$$
for all probability measures $\nu_n$ on $A^n$. Hence (3.12) is proved. To get (3.13), divide by $n$ on both sides, take the limit $n\to\infty$, and use (3.10) and (3.11).
We now give an application of inequality (3.13). Let
\[
\phi_1(x)=\log \mu_\phi([x_0]) \quad\text{and}\quad \phi_n(x)=\log \mu_\phi(x_{n-1}\,|\,x_{0,n-2}),\quad n\ge 2.
\]
The equilibrium state for $\phi_n$ is an $(n-1)$-step Markov measure. One can prove that $(\mu_{\phi_n})_n$ converges to $\mu_\phi$ in the weak topology, but one cannot get any speed of convergence. We get the following upper bound on the speed of convergence of $(\mu_{\phi_n})_n$ to $\mu_\phi$ in the finer $\bar d$ topology.
Corollary 3.10. Assume, without loss of generality, that $\phi$ is normalized in the sense that
\[
\sum_{a\in A} \mathrm{e}^{\phi(ax)}=1, \quad \forall x\in\Omega.
\]
Then there exists $n_\phi\ge 1$ such that, for all $n\ge n_\phi$, we have
\[
\bar d(\mu_{\phi_n},\mu_\phi) \le \rho_\phi\, \mathrm{var}_n(\phi) \tag{3.15}
\]
where
\[
\rho_\phi=\sqrt{2|A|\,C_{(3.1)}}\,(\mathrm{e}-1)\,\mathrm{e}^{\frac{3}{2}\|\phi\|_\infty}.
\]
More details on how to normalize a potential are given in Subsection 4.1.
Proof. Using (3.11) and the variational principle, we get
\[
h(\mu_{\phi_n}|\mu_\phi) = -\int \phi\,\mathrm{d}\mu_{\phi_n} - h(\mu_{\phi_n}) = \int (\phi_n-\phi)\,\mathrm{d}\mu_{\phi_n}. \tag{3.16}
\]
Indeed, since $\phi$ and $\phi_n$ are normalized, we have in particular that $P(\phi)=P(\phi_n)=0$, and by the variational principle $h(\mu_{\phi_n})=-\int \phi_n\,\mathrm{d}\mu_{\phi_n}$. Now
\[
\int (\phi_n-\phi)\,\mathrm{d}\mu_{\phi_n} = \int \log\frac{\mathrm{e}^{\phi_n}}{\mathrm{e}^{\phi}}\,\mathrm{d}\mu_{\phi_n} = \int \log\Big(1+\frac{\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}}{\mathrm{e}^{\phi}}\Big)\mathrm{d}\mu_{\phi_n} \le \int \frac{\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}}{\mathrm{e}^{\phi}}\,\mathrm{d}\mu_{\phi_n} \tag{3.17}
\]
where we used the inequality $\log(1+u)\le u$, valid for all $u>-1$. Now, using the shift-invariance of $\mu_{\phi_n}$ and replacing $\mathrm{e}^{\phi_n}$ by $\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}+\mathrm{e}^{\phi}$, we get
\begin{align*}
\int \frac{\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}}{\mathrm{e}^{\phi}}\,\mathrm{d}\mu_{\phi_n}
&= \int \mathrm{d}\mu_{\phi_n}(x) \sum_{a\in A} \mathrm{e}^{\phi_n(ax)}\,\frac{\mathrm{e}^{\phi_n(ax)}-\mathrm{e}^{\phi(ax)}}{\mathrm{e}^{\phi(ax)}}\\
&= \int \mathrm{d}\mu_{\phi_n}(x) \sum_{a\in A} \frac{\big(\mathrm{e}^{\phi_n(ax)}-\mathrm{e}^{\phi(ax)}\big)^2}{\mathrm{e}^{\phi(ax)}}
+ \int \mathrm{d}\mu_{\phi_n}(x) \sum_{a\in A} \big(\mathrm{e}^{\phi_n(ax)}-\mathrm{e}^{\phi(ax)}\big)\\
&\le |A|\,\mathrm{e}^{\|\phi\|_\infty}\,\big(\|\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}\|_\infty\big)^2 \tag{3.18}
\end{align*}
where we used that $\sum_{a\in A}\big(\mathrm{e}^{\phi_n(ax)}-\mathrm{e}^{\phi(ax)}\big)=0$. Combining (3.13), (3.16), (3.17) and (3.18), we thus obtain
\[
\bar d(\mu_{\phi_n},\mu_\phi) \le \sqrt{2|A|\,C_{(3.1)}\,\mathrm{e}^{\|\phi\|_\infty}}\;\|\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}\|_\infty.
\]
It remains to estimate $\|\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}\|_\infty$ in terms of $\mathrm{var}_n(\phi)$. We have
\[
\|\mathrm{e}^{\phi_n}-\mathrm{e}^{\phi}\|_\infty = \big\|\mathrm{e}^{\phi}\big(\mathrm{e}^{\phi_n-\phi}-1\big)\big\|_\infty \le \mathrm{e}^{\|\phi\|_\infty}\,\big\|\mathrm{e}^{\phi_n-\phi}-1\big\|_\infty \le (\mathrm{e}-1)\,\mathrm{e}^{\|\phi\|_\infty}\,\|\phi-\phi_n\|_\infty
\]
provided that $\|\phi-\phi_n\|_\infty<1$, where we used the inequality $|\mathrm{e}^u-1|\le(\mathrm{e}-1)|u|$, valid for $|u|<1$. Finally, since $\|\phi-\phi_n\|_\infty\le\mathrm{var}_n(\phi)$, we define $n_\phi$ to be the smallest integer such that $\mathrm{var}_n(\phi)<1$, and we can take
\[
\rho_\phi=\sqrt{2|A|\,C_{(3.1)}}\,(\mathrm{e}-1)\,\mathrm{e}^{\frac{3}{2}\|\phi\|_\infty}.
\]
We thus proved (3.15).
Let us mention the paper [13], in which the authors obtain the same bound for the speed of convergence of the Markov approximation, up to the constant. Their approach is a direct estimation of $\bar d(\mu_{\phi_n},\mu_\phi)$ by a coupling method. The point here is to obtain the same speed of convergence as an easy corollary of inequality (3.13). Let us remark that from (4.8) we would get a worse result, since we would end up with a bound proportional to $\sqrt{\mathrm{var}_n(\phi)}$. The trick which leads to the correct bound was told to us by Daniel Takahashi.
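For intuition about the objects $\mu_{\phi_n}$, here is a small numerical sketch estimating the conditional probabilities $\mu_\phi(x_{n-1}|x_{0,n-2})$ that define $\phi_n$ from a long sample path. The two-state chain and its transition matrix are hypothetical stand-ins for $\mu_\phi$; for a genuine 1-step Markov measure, the 2-block conditionals recover the transition matrix itself:

```python
import random
from collections import Counter

random.seed(0)
P = {0: (0.8, 0.2), 1: (0.3, 0.7)}  # illustrative transition matrix

def sample_path(length):
    # Sample a path of the 2-state chain standing in for mu_phi.
    x, out = 0, []
    for _ in range(length):
        out.append(x)
        x = 0 if random.random() < P[x][0] else 1
    return out

def empirical_conditionals(path, n):
    # Counts of n-blocks and (n-1)-blocks give mu(x_{n-1} | x_{0,n-2}).
    blocks = Counter(tuple(path[i:i + n]) for i in range(len(path) - n + 1))
    prefixes = Counter(tuple(path[i:i + n - 1]) for i in range(len(path) - n + 2))
    return {b: blocks[b] / prefixes[b[:-1]] for b in blocks}

cond = empirical_conditionals(sample_path(200_000), 2)
assert abs(cond[(0, 0)] - 0.8) < 0.01  # estimate of P(0 -> 0)
```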
3.3.6 Shadowing of orbits

Let $A$ be a Borel subset of $\Omega$ such that $\mu_\phi(A)>0$, and define for all $n\in\mathbb{N}$
\[
S_A(x,n)=\frac{1}{n}\inf_{y\in A}\sum_{j=0}^{n-1} d_\theta(T^j x, T^j y).
\]
A basic example of a set $A$ is a cylinder set $[a_{0,k-1}]$. The quantity $S_A(x,n)$, which lies between $0$ and $1$, measures how well we can trace, in the best possible way, the orbit of some initial condition not in $A$ by an orbit starting in $A$.
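A small sketch may help parse the definition. Here $A$ is replaced by a finite set of points taken from cylinders, all sharing the tail of $x$ (a simplifying assumption so that everything reduces to finitely many coordinates), and $T$ is the shift acting by slicing:

```python
# Sketch: S_A(x, n) on the full shift over {0,1} with the metric
# d_theta(x, y) = theta^{min{k : x_k != y_k}}, truncated to finite words.
theta = 0.5

def d_theta(x, y):
    for k, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return theta ** k
    return 0.0  # the truncations agree on all compared coordinates

def S_A(x, A, n):
    # (1/n) inf over y in A of sum_j d_theta(T^j x, T^j y);
    # T is the shift, so T^j x is x[j:].
    return min(sum(d_theta(x[j:], y[j:]) for j in range(n)) for y in A) / n

x = (1, 0, 1, 1, 0, 1, 0, 0)
A = [(0, 0) + x[2:], (0, 1) + x[2:]]  # points of the cylinders [00], [01]
val = S_A(x, A, 4)
assert 0.0 <= val <= 1.0
```

The point $y=(0,0)+x_{2:}$ shadows the orbit of $x$ after one shift, so only the $j=0$ term contributes and $S_A(x,\cdot,4)=1/4$ here.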
Theorem 3.11. For any Borel subset $A\subset\Omega$ such that $\mu_\phi(A)>0$, for any $n\in\mathbb{N}$ and for any $u>0$,
\[
\mu_\phi\Big(x\in\Omega : S_A(x,n)\ge \frac{u_A+u}{\sqrt n}\Big) \le \mathrm{e}^{-\frac{u^2}{4C_{(3.1)}}}
\]
where
\[
u_A=2\sqrt{-C_{(3.1)}\ln\mu_\phi(A)}\,.
\]
We give a shorter and simpler proof than in [11].
Proof. Let $K(x_0,\dots,x_{n-1})=\frac{1}{n}\inf_{y\in A}\sum_{j=0}^{n-1} d_\theta(x_j,T^j y)$. One can easily check that
\[
\mathrm{Lip}_{\theta,i}(K)=\frac{1}{n}, \quad \forall i=0,\dots,n-1.
\]
It follows from (3.2) that
\[
\mu_\phi\Big(S_A(x,n)\ge \int S_A(y,n)\,\mathrm{d}\mu_\phi(y)+\frac{u}{\sqrt n}\Big) \le \mathrm{e}^{-\frac{u^2}{4C_{(3.1)}}}
\]
for all $n\ge 1$ and for all $u>0$. We now need an upper bound for $\int S_A(y,n)\,\mathrm{d}\mu_\phi(y)$. We simply observe that, by (3.1) and the definition of $S_A(\cdot,n)$ (which vanishes on $A$),
\[
\mu_\phi(A)=\int \mathrm{e}^{-\xi S_A(x,n)}\,\mathbf{1}_A(x)\,\mathrm{d}\mu_\phi(x) \le \int \mathrm{e}^{-\xi S_A(x,n)}\,\mathrm{d}\mu_\phi(x) \le \mathrm{e}^{-\xi\int S_A(y,n)\,\mathrm{d}\mu_\phi(y)}\,\mathrm{e}^{\frac{C_{(3.1)}\xi^2}{n}}
\]
for all $\xi>0$. Hence
\[
\int S_A(y,n)\,\mathrm{d}\mu_\phi(y) \le \frac{C_{(3.1)}\,\xi}{n}+\frac{1}{\xi}\ln\big(\mu_\phi(A)^{-1}\big).
\]
Optimizing this bound over $\xi>0$ gives
\[
\int S_A(y,n)\,\mathrm{d}\mu_\phi(y) \le 2\sqrt{\frac{C_{(3.1)}\ln\big(\mu_\phi(A)^{-1}\big)}{n}}\,.
\]
The theorem follows at once.
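For completeness, the optimization over $\xi$ in the last step is the elementary minimization of $\xi\mapsto C_{(3.1)}\xi/n+B/\xi$ with $B=\ln\big(\mu_\phi(A)^{-1}\big)$:

```latex
g(\xi) = \frac{C_{(3.1)}\,\xi}{n} + \frac{B}{\xi}, \qquad
g'(\xi^*) = \frac{C_{(3.1)}}{n} - \frac{B}{(\xi^*)^2} = 0
\;\Longleftrightarrow\;
\xi^* = \sqrt{\frac{nB}{C_{(3.1)}}}, \qquad
g(\xi^*) = 2\sqrt{\frac{C_{(3.1)}\,B}{n}}.
```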
3.3.7 Almost-sure central limit theorem

It was proved in [18, Chapter 2] that $(\Omega,T,\mu_\phi)$ satisfies the central limit theorem for the class of $d_\theta$-Lipschitz functions $f:\Omega\to\mathbb{R}$ such that $\int f\,\mathrm{d}\mu_\phi=0$; that is, for any such $f$, the process $\{f\circ T^n\}_{n\ge0}$ satisfies
\[
\mu_\phi\Big(x : \frac{S_k f(x)}{\sqrt k}\le t\Big) = \int \mathbf{1}_{\big\{\frac{S_k f(x)}{\sqrt k}\le t\big\}}\,\mathrm{d}\mu_\phi(x) \xrightarrow[k\to\infty]{} G_{0,\sigma^2(f)}((-\infty,t]) \tag{3.19}
\]
where
\[
\sigma^2(f)=\int f^2\,\mathrm{d}\mu_\phi+2\sum_{i\ge1}\int f\cdot f\circ T^i\,\mathrm{d}\mu_\phi \in [0,+\infty).
\]
If $\sigma^2(f)>0$, $G_{0,\sigma^2(f)}$ denotes the law of a Gaussian random variable with mean $0$ and variance $\sigma^2(f)$, that is,
\[
\mathrm{d}G_{0,\sigma^2(f)}(u)=\frac{1}{\sigma(f)\sqrt{2\pi}}\,\mathrm{e}^{-\frac{u^2}{2\sigma^2(f)}}\,\mathrm{d}u, \quad u\in\mathbb{R}.
\]
When $\sigma^2(f)=0$, we set $G_{0,0}=\delta_0$, the Dirac mass at zero.

Remark 3.1. In fact, a more general statement was proved in [18, Chapter I.2]. Namely, (3.19) holds when $\phi$ satisfies a suitable summability condition and $f\in L_\phi$.

Now, for each $N\ge1$ and $x\in\Omega$, define the probability measure
\[
A_N(x)=\frac{1}{L_N}\sum_{n=1}^{N}\frac{1}{n}\,\delta_{\frac{S_n f(x)}{\sqrt n}} \tag{3.20}
\]
where $L_N=\sum_{n=1}^{N}\frac{1}{n}$ and where, as usual, $\delta_u$ is the Dirac mass at the point $u\in\mathbb{R}$. Of course, $L_N=\log N+O(1)$. Notice that $A_N$ is a random probability measure. Finally, the Wasserstein distance between two probability measures $\nu,\nu'$ on the Borel sigma-algebra $\mathcal{B}(\mathbb{R})$ is
\[
W_1(\nu,\nu')=\inf_{\pi\in\Pi(\nu,\nu')}\int |x-x'|\,\mathrm{d}\pi(x,x')
\]
where the infimum is taken over all probability measures $\pi$ on $\mathbb{R}\times\mathbb{R}$ such that
\[
\pi(B\times\mathbb{R})=\nu(B) \quad\text{and}\quad \pi(\mathbb{R}\times B)=\nu'(B)
\]
for any Borel subset $B$ of $\mathbb{R}$. By the Kantorovich–Rubinstein duality theorem, $W_1(\nu,\nu')$ is equal to the Kantorovich distance, which is the supremum of $\int \ell\,\mathrm{d}\nu-\int \ell\,\mathrm{d}\nu'$ over the set of $1$-Lipschitz functions $\ell:\mathbb{R}\to\mathbb{R}$. We refer to [12] for background and proofs.
Now we can formulate the almost-sure central limit theorem.

Theorem 3.12. Let $f:\Omega\to\mathbb{R}$ be a $d_\theta$-Lipschitz function. Then, for $\mu_\phi$-almost every $x\in\Omega$, we have
\[
W_1\big(A_N(x), G_{0,\sigma^2(f)}\big)\xrightarrow[N\to+\infty]{} 0.
\]
We make several comments. Recall that the Wasserstein distance metrizes the weak topology on the set of probability measures on $\mathcal{B}(\mathbb{R})$. Moreover, if $(\nu_n)_{n\ge1}$ is a sequence of probability measures on $\mathcal{B}(\mathbb{R})$ and $\nu$ is a probability measure on $\mathcal{B}(\mathbb{R})$, then
\[
\lim_{n\to\infty} W_1(\nu_n,\nu)=0 \iff \nu_n\xrightarrow{\mathrm{law}}\nu \ \text{ and }\ \int |u|\,\mathrm{d}\nu_n(u)\xrightarrow[n\to+\infty]{}\int |u|\,\mathrm{d}\nu(u)
\]
where ``$\xrightarrow{\mathrm{law}}$'' means weak convergence of probability measures on $\mathcal{B}(\mathbb{R})$.
To compare with (3.19), observe that Theorem 3.12 implies that, for $\mu_\phi$-almost every $x$, $A_N(x)\xrightarrow{\mathrm{law}} G_{0,\sigma^2(f)}$, which in turn implies that
\[
\int \mathbf{1}_{\{u\le t\}}\,\mathrm{d}A_N(u)=\frac{1}{L_N}\sum_{n=1}^{N}\frac{1}{n}\,\mathbf{1}_{\{S_n f/\sqrt n\le t\}} \xrightarrow[N\to+\infty]{} G_{0,\sigma^2(f)}((-\infty,t]).
\]
Therefore, the expectation with respect to $\mu_\phi$ in (3.19) is replaced by a pathwise logarithmic average in the almost-sure central limit theorem.
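To see the logarithmic average (3.20) in action numerically, one can replace the dynamical sequence $f\circ T^n$ by i.i.d. $\pm1$ steps (an illustrative assumption only; then $\sigma^2(f)=1$) and inspect the CDF of $A_N$ at a point:

```python
import math
import random

# A_N(x) from (3.20): weights (1/n)/L_N on the atoms S_n f(x)/sqrt(n),
# with the dynamical orbit replaced by i.i.d. +-1 steps for illustration.
random.seed(1)
N = 5000
steps = [random.choice((-1.0, 1.0)) for _ in range(N)]  # sigma^2 = 1

s, atoms, weights = 0.0, [], []
for n, step in enumerate(steps, start=1):
    s += step
    atoms.append(s / math.sqrt(n))   # S_n f / sqrt(n)
    weights.append(1.0 / n)
L_N = sum(weights)                   # L_N = log N + O(1)
weights = [w / L_N for w in weights]

# The CDF of A_N at t = 0 should approach G_{0,1}((-inf, 0]) = 1/2,
# though the logarithmic averaging makes the convergence quite slow.
cdf0 = sum(w for w, a in zip(weights, atoms) if a <= 0)
```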
Proof. The proof follows from an abstract theorem proved in [6]. In words, that theorem says the following. Let $(X_n)_{n\ge0}$ be a stationary stochastic process where the $X_n$'s are random variables taking values in $\Omega$. Assume that if $f:\Omega\to\mathbb{R}$ is $d$-Lipschitz and such that $\mathbb{E}[f(X_0)]=0$, then it satisfies the central limit theorem, that is, for all $u\in\mathbb{R}$,
\[
\mathbb{P}\left(\frac{\sum_{j=0}^{n-1} f(X_j)}{\sqrt n}\le u\right)\xrightarrow[n\to\infty]{} G_{0,\sigma^2(f)}((-\infty,u])
\]
where $\sigma^2(f):=\mathbb{E}[f^2(X_0)]+2\sum_{\ell\ge1}\mathbb{E}[f(X_0)f(X_\ell)]$ is assumed to be nonzero. Moreover, assume that the process $(X_n)_{n\ge0}$ satisfies the following variance inequality: there exists $C>0$ such that, for all $n$ and all separately Lipschitz functions $K:\Omega^n\to\mathbb{R}$ (with respect to some distance $d$ on $\Omega$),
\[
\mathbb{E}\Big[\big(K(X_0,\dots,X_{n-1})-\mathbb{E}[K(X_0,\dots,X_{n-1})]\big)^2\Big] \le C\sum_{i=0}^{n-1}\mathrm{Lip}_i(K)^2.
\]
Then the conclusion is that, almost surely, $\frac{1}{L_N}\sum_{n=1}^{N}$