Large deviations for independent random variables - Application to Erdös-Renyi's functional law of large numbers

(1)

April 2005, Vol. 9, p. 116–142 DOI: 10.1051/ps:2005006

LARGE DEVIATIONS FOR INDEPENDENT RANDOM VARIABLES – APPLICATION TO ERD ¨ OS-RENYI’S FUNCTIONAL LAW OF LARGE

NUMBERS

Jamal Najim

¹

Abstract. A Large Deviation Principle (LDP) is proved for the family _n¹_n

1f(xⁿi)·Ziⁿwhere the de- terministic probability measure_n¹_n

1δxⁿ_i converges weakly to a probability measureRand (Zⁿi)_i∈Nare R^d-valued independent random variables whose distribution depends onxⁿi and satisfies the following exponential moments condition:

sup

i,n Ee^α^∗^|Zⁿⁱ^|<+∞ for some 0< α^∗<+∞.

In this context, the identification of the rate function is non-trivial due to the absence of equidistribu- tion. We rely on fine convex analysis to address this issue. Among the applications of this result, we extend Erd¨os and R´enyi’s functional law of large numbers.

Mathematics Subject Classification. 46E30, 60F10, 60G57.

Received November 13, 2003.

1. Introduction

Let (Z_iⁿ)i≤n, n∈N be a sequence ofR^d-valued independent random variables and let (xⁿ_i; 1≤i≤n; n≥1) be a deterministicX-valued sequence of elements satisfying

1 n

n 1

δxⁿ

i

weakly

−−−−→

n→∞ R

whereX is a topological space endowed with its Borelσ-ﬁeld. In this article, we investigate the Large Deviations (LD) of the empirical mean

Ln,f= 1 n

n 1

f(xⁿ_i)·Z_iⁿ where

1 n

n 1

f(xⁿ_i)·Z_iⁿ= 1 n

n 1





f1(xⁿ_i)·Z_iⁿ ... fm(xⁿ_i)·Z_iⁿ



,

Keywords and phrases. Large deviations, epigraphical convergence, Erd¨os-R´enyi’s law of large numbers.

1CNRS, École Nationale Supérieure des Télécommunications, 46 rue Barrault 75634 Paris Cedex 13, France;[email protected] c EDP Sciences, SMAI 2005

(2)

thefk’s being the rows of the m×dmatrixf. Here, eachfk is a bounded continuous function fromX toR^d and·denotes the scalar product inR^d. In particular the random variableLn,fisR^m-valued.

The random variablesZ_iⁿ are independent and satisfy the following exponential moment condition:

sup

i,n

Ee^α^∗^|^Zⁱⁿ^|<∞ for some α^∗>0. (1.1)

The law ofZ_iⁿ (denoted by L(Ziⁿ)) depends onxⁿ_i in the sens that if xⁿ_i and xⁿ_j are close, then so are L(Ziⁿ) andL(Zjⁿ) for the following distance between probability measures:

dOW(P, Q) = inf

η inf a >0;

R^d×R^dτ

z−z a

η(dzdz)≤1

,

whereη is a probability with given marginalsP andQandτ(z) = e^|^z^|−1. This distance allows us to quantify easily the “exponential proximity” of two probability measures.

The large deviation principle

The main result of the article is the Large Deviations Principle (LDP) established for Ln,f under the previously mentioned assumptions: Cram´er’s condition (1.1) and variablesZ_iⁿ which are independent but not identically distributed.

The Large Deviations associated to this model are of interest in gaz theory and have been studied in the one-dimensional case by Ellis, Gough and Pul´e in [15] in the case whereL(Z_iⁿ)∝e^g(xⁿⁱ⁾^·^zP(dz). Related LDPs have been established by Dembo and Zajic [10] and Gamboa and Gassiat in [17]. In both cases [15, 17], it is important to consider random variables which do not have all their exponential moments (1.1). This model is also of interest in information theory where Z_iⁿ =f(xⁿ_i, Y_iⁿ), theY_iⁿ’s being i.i.d., and have been studied by Dembo and Kontoyiannis [9] and Chi [3, 4] in the case where the processY satisﬁes some mixing conditions.

There are two key issues to establish the LDP.

1. Since no steepness is assumed, we are not able to use Gärtner-Ellis’ theorem nor to adapt in a different way Cramér’s exponential change of measure argument. Our main tools to overcome it are the technique developed in [22] in the i.i.d. setting and a coupling argument based on the distancedOW.

2. The identification of the rate function is non trivial. In fact, the technique developed in [22] relies heavily on exponential approximation and thus yields to a very abstract rate function. The model being more complex here, so is the identification of the rate function. Convex Analysis provides us with very efficient tools to address this point.

Let us mention that this kind of problems often arises when dealing with LD of independent but not identically distributed random variables (Dawson and G¨artner [6, 7], Djellout, Guillin and Wu [13]). In a diﬀerent fashion, let us mention the work of Zabell [29] around Mosco-convergence and Large Deviations.

Applications of the LDP

LDPs in infinite dimension. The LDP of Ln,f is the cornerstone to get LDPs in inﬁnite dimensional settings via Dawson and G¨artner’s projective limit approach. We derive LDPs for the empirical measure Ln

and the random walk ¯Zn where

Ln= 1 n

n i=1

Z_iⁿδxⁿ

i and Z¯n(t) = 1 n

[nt]

i=1

Z_iⁿ, t∈[0,1].

These results extend those of [22] to the case where the random variablesZ_iⁿare not identically distributed and yield to rate functions with an extra term (see Ths. 4.1 and 4.3) `a la Lynch and Sethuraman [21]. For instance,

(3)

the rate function of the LDP satisﬁed byLn is given by:

I(µ) =I1(µa) +I2(µs),

where µ=µa+µs is the Lebesgue decomposition ofµwith respect toR. In particular,I(µ) can be ﬁnite for measures µwhich are not absolutely continuous with respect toR. One shall notice that the LDP for ¯Zn has been studied by Schuette in [28] under diﬀerent conditions.

Erd¨os and R´enyi’s functional law of large numbers (FLLN). Assume that the Z_iⁿ’s are as previously, that EZ_iⁿ= 0 and consider the process

ηn,m:t → 1 An

[m+tA_n] i=m

Zi, t∈[0,1]

where An = [logn] ([x] denotes the integer part of x) andm lies between 1 and n−An. Our main result is the description of the cluster points of the sets Kn ={ηn,m, 1 ≤ m ≤n−An}. This problem is known as Erdös-Rényi’s FLLN and has been studied by several authors among which Deheuvels [8], Borovkov [2], Sanchis [26, 27] and Gantert [18]. Let us first remind the crucial observation of Erdös and Rényi. Let

Un= max

0≤m≤n−A_n

Zm+1+· · ·+Zm+A_n

An ·

If An = n, then the limit of Un is a.s. zero (law of large numbers). If An is finite, say An = 1, then the limit is infinite (provided that the Z’s are not bounded). Erdös and Rényi [16] have shown that for a scaling in-between, namelyAn= [logn], then the limit ofUn exists, is non trivial and might by expressed by means of the rate function associated to the LDP of the empirical mean (Cramér’s theorem).

We describe the set of cluster points by means of the rate function associated to the LDP of ¯Zn. Our results extend Deheuvels’ result to the multidimensional setting and extends Borovkov’s result to the case where the random variablesZi do not have all their exponential moments. We also relax the i.i.d. assumption.

Outline of the paper

The paper is organized as follows: in Section 2, the notations and the assumptions are given, the distancedOW

is defined and examples fulfilling the assumptions are studied (Sect. 2.4); the LDP together with the identification of the rate function are stated and proved in Section 3. Large Deviations for the empirical measureLn and the random walk ¯Zn are proved in Section 4. Finally, Section 5 is devoted to Erdös and Rényi’s functional law of large numbers.

2. Notations and assumptions 2.1. Notations

LetX be a topological vector space endowed with its Borelσ-ﬁeldB(X) and letRbe a probability measure onX. We denote byC(X) (resp. Cd(X)) the set ofR-valued (resp. R^d-valued) continuous bounded functions onX; byL¹_d(X) the set ofR^d-valuedR-integrable functions onX and byP(X) the set of probability measures onX. We shall sometimes dropX and denote the previous sets by C, Cd, L¹_d.

Let| · |denote a norm on any ﬁnite-dimensional vector space (usuallyR,R^d or R^m^×^d). We denote by · the supremum norm on the space of bounded continuous functions from X with values in R, R^d or R^m^×^d, i.e. f= sup_x_∈X|f(x)|. As usual,δa is the Dirac measure at a. Let a be a m×d matrix and letz ∈R^d.

(4)

We denote by·the usual matrix product, that is

a·z=



 a1·z

... am·z





whereaj is thej-th row of the matrixa. Hence,·denotes the scalar productλ·z or the matrix producta·z, depending on the context. Let f :X →R^m^×^d be a (matrix-valued) continuous bounded function andθ∈R^m, then

f(x)·z=





f1(x)·z ... fm(x)·z



 and θ·f(x) = m j=1

θjfj(x)

where eachfj ∈Cd(X) is thej-th row of the matrixf. Letu:X →R^dbe a measurable function, we denote by

X

f(x)·u(x)R(dx) =





Xf1(x)·u(x)R(dx) ...

Xfm(x)·u(x)R(dx)



.

In the sequel, we shall follow the convention thatx∈ X,y andθare elements ofR^mand zandλ, ofR^d. We will denote by ¯A(resp. int(A)) the closure (resp. the interior) of the setA.

2.2. The Orlicz-Wasserstein distance

We introduce here a distance on the space of probability measures having some exponential moments:

R^de^α^|^z^|P(dz)<∞ for some α >0.

We call it the Orlicz-Wasserstein distance. This distance appears to be very useful to quantify the “exponential proximity” of two probability measures and is described below. It is then natural to ask to the distributions of the family (Z_iⁿ) to change continuously (with respect toxⁿ_i) with respect to this distance.

Letτ(z) = e^|^z^|−1,z∈R^dand let us consider

Pτ(R^d) = P∈ P(R^d), ∃ a >0

R^dτ z

a

P(dz)<∞

= P∈ P(R^d), ∃ α >0

R^de^α^|^z^|P(dz)<∞

Pτ is the set of probability distributions having some exponential moments. We denote byM(P, Q) the set of all laws onR^d×R^d with given marginalsP andQ. Consider the following Orlicz-Wasserstein distance deﬁned onPτ(R^d) by:

dOW(P, Q) = inf

η∈M(P,Q)inf a >0;

R^d×R^dτ

z−z a

η(dzdz)≤1

. Lemma 2.1. dOW is a distance on Pτ(R^d).

Remark 2.1. Let us comparedOW with the usual Wasserstein distance:

W(P, Q) = inf

R^d×R^d|z−z|η(dzdz), η∈M(P, Q)

.

(5)

Then,W(P, Q)≤dOW(P, Q). In fact,|z| ≤τ(z) therefore

τ

z−z a

η(dzdz)≤1⇒

|z−z|η(dzdz)≤a.

The proof of this lemma follows closely the standard proof of Dudley ([14], Lem. 11.8.3) and is omitted.

In the following, we shall endowPτ(R^d) with the topology induced by dOW.

2.3. Assumptions

Let us now introduce some assumptions on the model:

Assumption A-1. R is a probability measure on(X,B(X))satisfying:

ifU is a nonempty open set, then R(U)>0.

Assumption A-2. The family (xⁿ_i)1≤i≤n, 1≤n ⊂ X satisfies 1

n n

1

δxⁿ_i −−−−→weakly

n→∞ R. (2.1)

Remark 2.2. The combination of Assumptions (A-1) and (A-2) is standard (see [1, 15, 17]). It has been shown in [22] that the LDP forLn,fmight fail to hold if Assumption (A-1) is not fulﬁlled.

Assumption A-3. X is a compact space.

Assumption A-4. Let (Z_iⁿ) be a sequence of independent and R^d-valued random variables. There exists a family of probability measures (P(x,·), x∈ X) overR^d and a sequence (xⁿ_i; 1≤i≤n; 1≤n)with values in X such that the law of eachZ_iⁿ is given by:

L(Z_iⁿ)∼P(xⁿ_i,dz).

We shall call(P(x,·), x∈ X)thedistribution kernelassociated to the family(Z_iⁿ). We will equally writeP(x,·), Px orPx(dz).

We can associate to each kernel (P(x,·), x∈ X) a cumulant generating function deﬁned by Λ(x, λ) = log

R^de^λ^·^zP(x,dz) where λ∈R^d, and its convex conjugate

Λ^∗(x, z) = sup

λ∈R^d{λ·z−Λ(x, λ)} where z∈R^d.

Assumption A-5. Let(P(x,·);x∈ X)⊂ Pτ(R^d)be a given kernel. The applicationx →P(x, A)is measurable whenever the setA⊂R^d is borel. Moreover, the function

Γ :X → Pτ(R^d) x →P(x,·)

is continuous when Pτ(R^d)is endowed with the topology induced by the distancedOW.

(6)

Remark 2.3. One shall notice that the two assumptions (A-3) and (A-5) yield the existence of a real number α^∗>0 such that

S^∗= sup

x∈X

R^de^α^∗^|^z^|Px(dz)<∞. (2.2)

In fact,x →dOW(Px, Px₀) is a continuous application and is therefore bounded on the compact setX, that is dOW(Px, Px₀)≤afor everyx∈ X. In particular, for allx∈ X, there exists a probabilityηx∈M(Px, Px₀) such

that

e^|z−z|^a ηx(dzdz)≤2.

Letα+β= 1. Chooseb large enough so thatα b > a and

e^|z|^{β b}Px₀(dz)<∞are satisﬁed. Then

e^|z|^b Px(dz) =

e^|z−z

+z|

b ηx(dzdz)

≤

e^|z−z|^{α b} ηx(dzdz) α

e^|z|^{β b}Px₀(dz) β

≤2^α

e^|z|^a Px₀(dz) β

<∞ and (2.2) is established by takingα^∗=b⁻¹.

Assumption (A-5) is probably the harder to check. Once (A-5) is fulﬁlled by a family of kernels (P(x,·))x, it is easy to build independent random variables whose law is given by L(Z_iⁿ) ∼ P(xⁿ_i,dz). In Section 2.4, Assumption (A-5) is proved to hold for the examples mentioned in the introduction.

2.4. Examples of families satisfying Assumption (A-5)

2.4.1. A family defined by its densities with respect to some probability measure P

LetP be such that

R^de^α^∗^|^z^|P(dz)<∞ for someα^∗>0, (2.3) and consider the family of probability distributionsPx deﬁned by

dPx

dP (z) = e^g(x)^·^z

R^de^g(x)^·^zP(dz),

where g is a bounded continuous function from the compact set X ⊂ R^k into R^d. We will assume that P(dz) =f(z) dz where dz stands for the Lebesgue measure on R^d and thatf(z) =f(|z|),i.e. f is radial. We also assume:

0<|g(z)|=λ < λ^∗. (2.4) Under these assumptions,

R^de^g(x)^·^zP(dz) does not depend onx. In order to prove that limx→x₀dOW(Px, Px₀) = 0, it is suﬃcient to prove that for every >0, there existsδ >0 and a joint probabilityη such that

R^d×R^de^|z−z| η(dz,dz)≤2

where|x−x0|< δ andη∈M(Px₀, Px). By (2.4), there exists an orthogonal matrixOxsuch that:

Oxg(x0) =g(x) and lim

x→x₀|I−Ox|= 0, (2.5)

(7)

where I denotes the identity matrix. Consider now the joint probability measure η associated to the random variable (Z, OxZ) where

L(Z)∼ e^g(x⁰⁾^·^z

R^de^g(x⁰⁾^·^zP(dz)P(dz).

Then,η∈M(Px₀, Px). Let A(x) be deﬁned by A(x) =

R^d×R^d

e^|z−z| η(dz,dz) =

R^d

e^{|(I−Ox)·z|} Px₀(dz).

Notice thatA(x0) = 1. The Dominated Convergence theorem together with (2.4) and (2.5) yields then that

xlim→x₀A(x) = 1.

In particular, ifx−x0 is small enough say|x−x0|< δ thenA(x)≤2.

2.4.2. Twisted i.i.d. random variables

LetZ be aR^d-valued random variable with distributionQ. We introduce the following Orlicz space (recall that τ(z) = e^|^z^|−1):

Lτ= f :R^d→R^d; ∃a >0,

R^dτ f(z)

a

Q(dz)<∞

. We endow it with the norm

fτ = inf a >0;

τ

f a

dQ≤1

.

Then (Lτ, · τ) is a Banach space. Note that

f ∈Lτ⇔ ∃α >0, Ee^α^|^f(Z)^|<∞.

Let X be a compact metric space and considerC(X;Lτ), the set of continuous functions fromX to Lτ. The fact thatφ∈C(X;Lτ) implies two things:

• φ(x,·)∈Lτ ∀x∈ X;

• x →φ(x,·) is continuous with respect to the norm · τ.

Denote byPxthe distribution ofφ(x, Z). Then the family (Px)x∈X satisﬁes (A-5). In fact, consider the random variablesY =φ(x, Z) and ˜Y =φ(x, Z). Sinceφis continuous,

Eτ

|Y −Y˜|

=Eτ

|φ(x, Z)−φ(x, Z)|

≤1

forx close tox. ThusdOW(Px, Px)≤ .

Here is an example of a family for which G¨artner-Ellis’ theorem might not apply: let g : R^d → R and f :X →R(X compact) be bounded and continuous and letZ be a random variable with values inR^d having some exponential moments:

∃α >0, Ee^α^|^Z^|<∞. Then the family of laws associated toZx=₁₊_|_f(x)u(Z)^Z _| satisﬁes (A-5).

(8)

2.4.3. Csisz`ar’s example

We study here a family of probability measures with non steep logarithmic moment generating functions.

Consider the probability distributionPa associated with the following distribution function:

Fa(x) = 1− e⁻^ax

(1 +x)⁴, x≥0 where a∈[α, β]⊂(0,∞).

This kind of distribution has been introduced by Csiszàr [5] to emphasize some side effects in conditional limit theorems (see also Léonard and Najim [20] for further information).

It is straightforward to check that Λ(λ) = log_∞

0 e^λxPa(dx) is not steep for each value ofa∈[α, β]. We shall prove here that Assumption (A-5) holds in this situation. In other words ifa0∈[α, β] and [α, β]a→a0, then

alim→a₀dOW(Pa, Pa₀) = 0.

LetF_a⁻¹ be deﬁned by

1− e⁻^aF^a⁻¹^(u)

(1 +Fa⁻¹(u))⁴ =u, u∈[0,1). (2.6)

It is straightforward to check that

F_β⁻¹(u)≤Fa⁻¹(u)≤Fα⁻¹(u), ∀a∈[α, β]. (2.7) Consider now the joint probability distributionη associated to the random variable (F_a⁻¹(U), F_a⁻₀¹(U)) whereU is uniformily distributed over [0,1]. Then η’s marginals arePa andPa₀. From (2.6), we deduce that

F_a⁻¹(u) =−1

alog(1−u)−4

alog(1 +F_a⁻¹(u))≤ −1

alog(1−u). (2.8)

On the other hand, (2.7) yields

F_a⁻¹(u)≥ −1

alog(1−u)−4

alog(1 +F_α⁻¹(u)) = α

aF_α⁻¹(u) (2.9)

where the last equality follows from (2.8). Standard considerations about implicit functions (see for example Dieudonn´e [12], Chap. 3) yield that

F_α⁻¹(u) =−1

αlog(1−u)− (α, u) log(1−u) where (α, u)−−−→

u→1 0. (2.10)

In particular, equations (2.8) and (2.9) together with (2.10) yield F_a⁻¹(u) +1

alog(1−u) ≤α

a (α, u) log(1−u). (2.11) Notice that (α, u) does not depend ona.

Let us prove now that for every >0, there existsδ^∗ >0 such that|a−a0| ≤δ^∗ implies that

R⁺×R⁺

e^|z−z| η(dz,dz) = 1

0

e

|F−1 a (u)−F−1

a0(u)|

du≤2.

(9)

First choose u0 close enough to 1 such that1 u₀e^|F

a−1(u)−F−1 a0(u)|

du ≤1 whenever a−a0 is small enough, say

|a−a0|< δ (use (2.11)). Then choosea−a0 smaller enough, say|a−a0|< δ^∗< δ, such that u₀

0 e

|F−1 a (u)−F−1

a0(u)|

du≤1.

This is possible via the Dominated Convergence theorem since lima→a₀F_a⁻¹(u) = F_a⁻₀¹(u) by the Implicit Function theorem and|Fa⁻¹(u)−F_a⁻₀¹(u)| ≤2F_α⁻¹(u) which is bounded over [0, u0].

3. The large deviation principle

We can now state the Large Deviation Principle.

Theorem 3.1 (the LDP). Let f :X → R^m^×^d be a bounded continuous function. Assume that (A-1), (A-2), (A-3), (A-4) and (A-5) hold. Then, the family

Ln,f= 1 n

n 1

f(xⁿ_i)·Z_iⁿ

satisfies the LDP in (R^m,B(R^m))with some good rate function Υ.

One can have a look at (3.8) to get more insight on the rate function Υ.

Theorem 3.2(identiﬁcation of the rate function). Under the assumptions of Theorem 3.1, let (Px)x∈X be the distribution kernel associated to the family (Z_iⁿ)and letI_f be defined by

I_f(y) = sup

θ∈R^m θ·y−

XΛ(x, θ·f(x))R(dx)

, y∈R^m, whereΛ(x, λ) is the cumulant generating function associated to(Px). Then

Υ =I_f.

3.1. Proof of the LDP

The proof follows closely proof of Theorem 2.2 in [22]. Let us brieﬂy outline it in the real case. Consider 1

n n i=1

f(xⁿi)Ziⁿ

where f and theZ_iⁿ’s are real-valued and satisfy the assumptions of Theorem 3.1. Let ˜f(x) =p

k=1ak1A_k(x) be a step function and let (Pk)1≤k≤p bepprobability measures with some exponential moments. Assume that there are independent random variables ˜Z_iⁿ such that if xⁿ_i ∈ Ak then L(Ziⁿ) =Pk. Under these conditions, one can easily establish a LDP for

1 n

n i=1

f(x˜ ⁿ_i) ˜Z_iⁿ= a1

n

j₁∈{i:xⁿ

i∈A₁}

Z˜_jⁿ₁+· · ·+ap

n

j_p∈{i:xⁿ

i∈A_p}

Z˜_jⁿ

p

(10)

viaCram´er’s theorem (this is the content of Lem. 3.3). The whole point of the proof of Theorem 3.1 is to build an exponential approximation of _n¹n

i=1f(xⁿ_i)Z_iⁿbased on simpler objects_n¹n

i=1f˜(xⁿ_i) ˜Z_iⁿ(this is the content of Lem. 3.4) where

(1) the step function ˜f approximatesf for the sup-normf˜−f ≤ ;

(2) the random variable ˜Z_iⁿ approximatesZ_iⁿ with respect to the distancedOW:

dOW(L(Ziⁿ),L( ˜Ziⁿ))≤ . (3.1)

One should notice that in [22], step (2) is not necessary since the variables are i.i.d.

Lemma 3.3. Assume that (A-4) holds with the following kernel:

P(x,dz) = p k=1

1A_k(x)Pk(dz)

where Pk ∈ Pτ(R^d) and (Ak; 1 ≤ k ≤ p) is a measurable partition of X (in particular, L(Ziⁿ) ∼ Pk(dz) if xⁿ_i ∈Ak).

Assume moreover that R(Ak) > 0 and R(∂Ak) = 0 where ∂Ak = ¯Ak \int(Ak) and consider the (matrix valued) step function:

f :X −→ R^m^×^d x −→ f(x) =

p k=1

a_k1A_k(x)

wherea_k are m×dmatrices. Then,

Ln,f= 1 n

n 1

f(xⁿ_i)·Z_iⁿ

= 1 n

j₁∈{i:xⁿ_i∈A₁}

a₁·Z_jⁿ₁+· · ·+1 n

j_p∈{i:xⁿ_i∈A_p}

a_p·Z_jⁿ₁

satisfies the LDP in (R^m,B(R^m))with the good rate function I_f(y) = sup

θ∈R^m y·θ−

XΛ (x, θ·f(x))R(dx)

, y∈R^m (3.2)

whereΛ(x,·)is the cumulant generating function associated to(Px)x∈X.

Remark 3.1 (on the assumptionsR(Ak)>0 andR(∂Ak) = 0). In [22], a probability measureP with some exponential moments is built such that

(1) ifZ is a random variable with L(Z) =P, then the LDP does not hold for ^Z_n; (2) ifZi are i.i.d. P-distributed then the LDP does not hold for ¹_nn−1

1 Zi−^Z_nⁿ; (3) however, if ^N_n⁽ⁿ⁾→c >0, then _n¹N(n)

1 Zisatisﬁes a LDP.

(11)

Assumptions R(Ak) > 0 and R(∂Ak) = 0 are made in order to avoid the eﬀects (1) and (2). Denote by NA_k(n) ={i:, xⁿ_i ∈Ak}, then ^N^Ak_n⁽ⁿ⁾→R(Ak) and the LDP holds for

1 n

j_k∈{i:,xⁿ_i∈A_k}

ak·Z_jⁿ

k

by virtue of (3).

Proof of Lemma 3.3. The proof is a direct adaptation of Lemmas 5.2 and 5.6 in [22]. The LDP is established viaCram´er’s theorem (Lem. 5.2) and the identiﬁcation of the rate function relies on duality arguments

(Lem. 5.6).

Lemma 3.4. Assume that (A-1), (A-2), (A-3) and (A-5) hold and letf :X →R^m^×^d be a bounded continuous function.

(i) There exists a family of random variables (Z_iⁿ)i,n with distribution kernel (Px) given by (A-5) which satisfy (A-4).

(ii) There exists a family of independent random variables (Z_i^n,)i,n associated to a kernel(Px)x∈X and a step function f such that

Ln,f= 1 n

n i=1

f(xⁿ_i)·Z_i^n, satisfies the LDP with good rate function given by

I_f(y) = sup

θ∈R^m y·θ−

X

Λ(x, θ·f(x))R(dx)

, y∈R^m

whereΛ(x,·)is associated to the kernel(Px).

(iii) The familyLn,fis an exponential approximation ofLn,f, i.e.:

lim→0lim sup

n→∞

1

nlogP{|Ln,f − Ln,f|> δ}=−∞ for every δ >0.

(iv) The random variable Ln,fsatisfies the full LDP with good rate function Υ(y) = sup

η>0lim inf

→0 inf

y∈B(y,η)I_f(y),

= sup

η>0lim sup

→0 inf

y∈B(y,η)I_f(y), y∈R^m whereB(y, ) ={y∈R^m,|y−y|< }.

Proof.

• The step function f and the kernel P. In the sequel, we shall build a step functionf and a probability kernelP

f(x) = q ι=1

αι1D_ι(x), P(x,dz) = q ι=1

P˜ι(dz)1D_ι(x)

(12)

such that

(1) R(Dι)>0 andR(∂Dι) = 0;

(2) f−f< ;

(3) dOW(P(x,dz), P(x,dz))< ifx∈Dι,

in order to fulﬁll Assumptions of Lemma 3.3. We will rely heavily on Assumption (A-2). This is a key step.

Let >0 be ﬁxed and considerf :X →R^m^×^d. By Proposition A.1 in [22], there exista₁, . . . ,a_p∈R^m^×^dand

1, . . . , p ∈(0, ] such that the open sets B^f_k =f⁻¹B(ak, k) (where B(ak, k) ={a ∈R^m^×^d, |a−ak|< k}) areR-continuous and form a cover ofX:

X ⊂ ∪^p_k=1B_k^f.

Similarly, as Γ :x →Pxis continuous andX is compact by (A-3), Proposition A.1 in [22] yields the existence of P1, . . . , Pp ∈ Pτ(R^d) and ₁, . . . , _p ∈(0, ] such that the open setsC_l^P = Γ⁻¹BOW(Pl, _l) (whereBOW(Pl, _l) = {P∈ Pτ; dOW(P, Pl)< _l}) areR-continuous and form a cover ofX:

X ⊂ ∪^p_l=1 C_l^P.

Thus, the family (B_k^f∩C_l^P)k,l is a cover ofX based onR-continuous open sets. Assumption (A-1) implies that each set is either empty or with strictly positive R-measure. Let us keep the non-empty sets. The previous family satisﬁes the assumptions of Lemma A.2 in [22]. Therefore, there exists a partition (Dι; 1 ≤ ι ≤ q) satisfying the properties of Lemma A.2 in [22]: each of theDι isR-continuous withR-measure strictly positive.

Moreover, for everyι, there existk andlsuch that Dι⊂B^f_k∩C_l^P

⊂f⁻¹B(ak, k)∩Γ⁻¹BOW(Pl, _l). (3.3) Consider now a pairing which associates to eachιa unique couple (k, l) = (k(ι), l(ι)) such that (3.3) is satisﬁed.

We denote by

f= q ι=1

a_k(ι) 1D_ι.

It is then straightforward to check that f−f ≤ and that eachDι is R-continuous with strictly positive measure. Moreover, ifx∈Dι, then

dOW(Px, Pl(ι))≤ . We denote byPthe following kernel:

P(x,dz) = q ι=1

1D_ι(x)Pl(ι)(dz). (3.4)

We can now build properly the families (Z_iⁿ)i,n and (Z_i^n,)i,n.

• The families(Z_iⁿ)i,n and(Z_i^n,)i,n.

Letxⁿ_i ∈ X. There exists a uniqueDι such thatxⁿ_i ∈Dι. In particular dOW(Pxⁿ_i, P_l(ι))< ,

where l(ι) is associated to ι via the pairing introduced after (3.3). Therefore, there exists a probability mea- sureη^i,n on (R^d×R^d,B(R^d×R^d)) with given marginalsPxⁿ_i andPl(ι) such that

R^d×R^d

τ

x−y

η^i,n(dxdy)≤1.

(13)

Endow (R^d×R^d,B(R^d×R^d)) with the probability measure η^i,n and set Z_iⁿ(x, y) = x and Z_i^n,(x, y) = y.

ThenZ_iⁿ isPxⁿ_i-distributed,Z_i^n, isP_l(ι)-distributed and Eτ

Z_iⁿ−Z_i^n,

≤1 ⇒ Ee^|Znⁱ^−Z

n, i |

≤2. (3.5)

The usual countable extension ⊗i,nηî,n on the cartesian product Πi,nR^d ×R^d yields the existence of a family (Z_iⁿ)i,nsatisfying (A-4). The previous construction also yields the fact that the distribution kernel associated to (Z_i^n,) is given by (3.4). Thus, Ln,f satisfies the assumptions of Lemma 3.3 and therefore satisfies the LDP with good rate functionI_f.

• The exponential approximation.

First notice that

P{|Ln,f − L_n,f|> δ} ≤ P 1 n

n 1

f(xi)·(Z_iⁿ−Z_i^n,) > δ

2

+P

1 n

n 1

(f(xi)−f(xi))·Z_i^n, > δ

2

.

Thus

lim sup

n

1

nlogP{|Ln,f − Ln,f|> δ} ≤ lim sup

n

1

nlogP 1 n

n 1

f(xi)·(Z_iⁿ−Z_i^n,) >δ

2

∨lim sup

n

1

nlogP 1 n

n 1

2

,

wherea∨b= max(a, b). Consider ﬁrst:

P 1 n

n 1

2

≤P

1 n

n 1

|Z_iⁿ−Z_i^n,|> δ 2f

≤exp

− nδ 2f

n 1

Ee^|Znⁱ^−Z

n,i |

≤exp

− nδ 2f

2ⁿ,

where the last inequality comes from (3.5). Thus

lim sup

n

1

nlogP 1 n

n 1

2

≤ − nδ

2f + log 2. (3.6)

Consider now

P 1 n

n 1

2

≤P

1 n

n 1

|Zi^n,|> δ 2

≤exp

−α^∗nδ 2

n 1

Ee^α^∗^|^Zⁱ^n,^|,