Laplace transform estimates and deviation inequalities

(1)

LAPLACE TRANSFORM ESTIMATES AND DEVIATION INEQUALITIES

ESTIMÉES DE LA TRANSFORMÉE DE LAPLACE ET INÉGALITÉS DE DÉVIATION

Olivier CATONI

Laboratoire de probabilités et modèles aléatoires, U.M.R. 7599 du C.N.R.S., case 188, Université Paris 6, bureau 4 E 19, bât Chevaleret, 4, place Jussieu, 75252 Paris cedex 05, France

Received 16 February 2000, revised 11 December 2001

ABSTRACT. – We derive deviation inequalities from non-asymptotic bounds of the log-Laplace transform of a function ofNrandom variables. We assume either that these random variables are independent or that they form a Markov chain. We assume also that the partial finite differences of order one and two of the function are suitably bounded, or more generally that they have some exponential moments. The estimates we get are sharp enough to induce a central limit theorem whenN goes to infinity and to prove non-asymptotic “almost Gaussian” deviation bounds.

2003 Éditions scientifiques et médicales Elsevier SAS MSC: 60E15; 60F05; 94A17; 60G42

Keywords: Concentration of product measures; Deviation inequalities; Markov chains;

Maximal coupling; Central limit theorem

RÉSUMÉ. – Nous démontrons des inégalités de déviation à partir d’estimées du logarithme de la transformée de Laplace d’une fonction deN variables aléatoires. Nous supposons soit que ces variables sont indépendantes, soit qu’elles forment une chaîne de Markov. Nous supposons aussi que les différences finies partielles d’ordre un et deux de la fonction sont convenablement bornées, ou plus généralement qu’elles ont certains moments exponentiels. Nos estimées sont suffisamment précises pour induire un théorème de la limite centrale quandN tend vers l’infini et pour prouver des inégalités de déviation “presque Gaussiennes”.

2003 Éditions scientifiques et médicales Elsevier SAS

E-mail address: [email protected] (O. Catoni).

(2)

Introduction

We are going to give “almost Gaussian” finite sample bounds for the log-Laplace transform of some functions of the type f (X1, . . . , XN), where the random variables (X1, . . . , XN)are assumed to be independent, or to form a Markov chain.

We will use throughout the paper a normalisation that parallels the classical case of the sum

√1 N

N i=1

Xi

of real valued random variables. As is usual in these matters, we will deduce from upper log-Laplace estimates finite sample almost sub-Gaussian deviation inequalities for f (X1, . . . , XN). We will also obtain that the central limit theorem holds when N goes to infinity. Although limit laws are not the main subject of this paper, they will be a guide for using a relevant normalisation of constants.

To make sure that this is feasible, it is necessary to make assumptions not only on the first order partial derivatives (or more generally first order partial finite differences) of f, but also on the second order partial derivatives (or more generally on the second order partial finite differences) off. Indeed, the simple example of

f (X1, . . . , XN)=g 1

√N N i=1

Xi

,

should immediately convince the reader that Lipschitz conditions are not enough to enforce a Gaussian limit.

A proper normalisation being chosen, we will be interested in expansions of Z ^def= f (X1, . . . , XN)−E(f (X1, . . . , XN))of the type

logEexp(λZ)=λ²

2V(Z)+ · · ·,

whereλis “of order one”,V(Z)=E{[Z−E(Z)]²}and the remaining terms are small whenN is large.

Our line of proof will be a combination of the martingale difference sequence approach initiated by Hoeffding [6] and Yurinskii [21] and the statistical mechanics philosophy we already used in [4]. The martingale approach to deviation inequalities is also reviewed in [10, p. 30] and [16]. More precisely, we will decompose Z into its martingale difference sequence Z =^Ni=1Fi and we will take appropriate partial derivatives of the log-Laplace transform

(λ1, . . . , λN)→logEexp _N

i=1

λiFi

.

We will consider first the case of independent random variablesX1, . . . , XN, ranging in some product of probability spaces ^N_i₌₁(Xi,B_i, µi). In the first section, we will

(3)

assume that the partial finite differences of f (x1, . . . , xN) of order one and two are bounded. In the second section, we will assume that they have exponential moments instead, and in the third section, we will study the case of Markov chains.

1. Bounded range functionals of independent variables

Let the collection of random variables X =(X1, . . . , XN) take its values in some product of measurable spaces ^N_i₌₁(Xi,Bi). We will assume in the following that (X1, . . . , XN) is the canonical process. Let P = ^Ni=1µi be a product probability measure on ^N_i₌₁(Xi,B_i).

For any bounded measurable functionW of(X1, . . . , XN), we will define the modified probability distributionPW by

dPW= exp(W ) E[exp(W )]dP,

and we will use the notation EW for the expectation operator with respect toPW. On the other hand, ifFis some sub-sigma algebra of ^N_i₌₁Bi, thenE^Fwill be used as a short notation for the conditional expectation with respect toF.

As(E^F)W =(EW)^F, we will simply writeE^FW for this conditional expectation which we can more explicitely define as

E^FW(U )=E[Uexp(W )|F] E[exp(W )|F] .

Note that we have EW(E^FW)= EW, whereas E(E^FW) =EW and EW(E^F) =EW in general. In the same way we will use the notation VW(U )forEW{[U−EW(U )]²}and the notation M³W(U ) for EW{[U −EW(U )]³}. Note that for any bounded measurable functionW,

∂

∂λlogEexp(λW )=EW(W ),

∂²

∂λ²logEexp(λW )=VW(W ), and

∂³

∂λ³logEexp(λW )=M³W(W ).

For eachi=1, . . . , N, letF_i be the sigma algebra generated by(X1, . . . , Xi)and let Gi be the sigma algebra generated by(X1, . . . , Xi−1, Xi+1, . . . , XN).

To stress the role of the independence assumption, we will put the superscript “i” on the equalities and inequalities requiring this assumption.

Let us introduce some notations linked with the martingale differences of a bounded random variableW measurable with respect toF_N. We will put

Gi(W )=W −E^Gⁱ(W ), Fi(W )=E^Fⁱ(W )−E^Fⁱ⁻¹(W )

=i E^Fⁱ(Gi).

(4)

As we explained in the introduction, we will study the log-Laplace transform of Z=f (X)−Ef (X)

of some bounded measurable real valued functionf:^N_i₌₁X_i →R. We will decompose Z into the sum of its martingale differences

Z= N i=1

Fi(Z), and use the short notationZi=E^Fⁱ(Z).

In this context, it is natural to assume that for some positive constants Bj, j = 1, . . . , N, for any(x1, . . . , xN)∈^Nj=1X_j, for anyi=1, . . . , N, anyyi ∈X_i,

f (x1, . . . , xN)−f (x1, . . . , xi−1, yi, xi+1, . . . , xN) Bi

√N.

The reader should understand that we are interested mainly in the case when the constantsBi are of order one. Although all these scaling factors are not really needed for finite sample bounds, we have found them useful to indicate what should be considered to be small and what should not.

To ensure that f (X) is almost Gaussian, we need to make also an assumption on its second partial differences. This corresponds to conditions on the second partial derivatives off in the case when the random variables (X1, . . . , XN) take their values in some finite dimensional vector space, are bounded, andf is a smooth function.

For simplicity, we will use the short notationx₁^N for(x1, . . . , xN). Let us put for any x₁^N∈^Nj=1X_j, anyyi ∈X_i,

ifx₁^N, yi

=fx₁^N−fx₁ⁱ⁻¹, yi, x_i^N₊₁.

For a fixed value of yi,if may be seen as a function ofx₁^N, and when we will write jif (x₁^N, yi, yj), we will mean that we applyj to this function and toyj. (A more accurate but lengthy notation would have beenj(if (·, yi))(x₁^N, yj).)

Let us assume that for some nonnegative exponentζ, for anyi=j, for some positive constantCi,j, and for anyx₁^N∈^Nk=1X_k,yi ∈X_i,yj∈X_j,

ijfx₁^N, yj, yi

Ci,j

N^3/2⁻^ζ.

Note that ijf (x₁^N, yj, yi)=jif (x^N₁, yi, yj) and therefore that we can assume thatCi,j =Cj,i. We will moreover assume by convention thatCi,i=0.

The normalisation is made so thatζ =0 corresponds to the case of f (X1, . . . , XN)=√

N g X1

N , . . . ,XN

N

,

(5)

where g:[0,1]^N →R is a smooth real function of N real bounded arguments with bounded first and second partial derivatives.

Another class of functions satisfying these hypotheses are the functions of the type fx₁^N= 1

N^3/2

1i<jN

ψi,j(xi, xj),

whereψi,j are bounded measurable functions. Hereζ=0, and we can take Bi= 2

N _{j <i}ψj,i∞+

j >i

ψi,j∞

, and

Ci,j =Cj,i=4ψi,j∞, i < j.

More generally

f (x^N₁)=N^1/2⁻^r

1i1<···<irN

ψ(i₁,...,ir)(xi₁, . . . , xir) also satisfies our hypotheses, when the functionsψ(i₁,...,i_r)are bounded.

THEOREM 1.1. – Under the previous hypotheses, for any positiveλ, logEexpλf (X)−λEf (X)−λ²

2 Vf (X)

λ³

N^1/2⁻^ζ BCB

4N² + λ³

√N N i=1

B_i³ 3N.

COROLLARY 1.1. – Thusf (X)satisfies the following deviation inequalities:

Pf (X)Ef (X)+εexp

− ε²

2(V(f (X))+_V_{(f (X))}^ηε )

, Pf (X)Ef (X)−εexp

− ε²

2(V(f (X))+_V_{(f (X))}^ηε )

, with

η= 1

2N^1/2⁻^ζ BCB

N² + 2 3√ N

N i=1

B_i³ N .

Remark 1.1. – We obtain for some constant K depending on max1iNBi and max1i,jNCi,j, but not onN, that

logEexpλf (X)−λEf (X)−λ²

2Vf (X) Kλ³ N^1/2⁻^ζ.

Therefore if we consider a sequence of problems indexed byNsuch that the constantsBi

andCi,j stay bounded, and such thatV(f (X))converges, we get a central limit theorem

(6)

as soon as ζ <1/2 (with the caveat that the limiting distribution may degenerate to a Dirac mass if the asymptotic variance is 0): f (X)−E(f (X))converges in distribution to a Gaussian measure.

Remark 1.2. – The critical valueζc=1/2 is sharp, since when f (X)=g

1

√N N

i=1

Xi

,

the central limit theorem obviously does not hold in general, andζ =1/2.

Proof. – After decomposingZinto the sum of its martingale differences, we can view the log-Laplace transform ofZas a function ofN equal “temperatures”:

logEexp(λZ)=logE

exp N

i=1

λiFi(Z)

, λi=λ, i=1, . . . , N .

The first step is to take three derivatives with respect to λi, for i ranging from N backward to 1:

logEexp(λZi)=logEexp(λZi−1)+ λ 0

EλZ_i₋₁+αF_i(Z)

Fi(Z)dα.

Therefore

logEexp(λZ)= N i=1

λ 0

EλZ_i₋₁+αF_i(Z)

Fi(Z)dα

= N i=1

λ 0

(λ−β)VλZ_i−1+βFi(Z)

Fi(Z)dβ

= N i=1

λ² 2 EλZ_i₋₁

Fi(Z)²+ λ 0

(λ−γ )²

2 M³λZ_i₋₁+γ F_i(Z)

Fi(Z)dγ .

Thus, using the fact that for any real random variable

Eξ−E(ξ )³2ξ_∞Eξ−E(ξ )²2ξ_∞Eξ²2ξ³_∞, we obtain

LEMMA 1.1. –

logEexp(λZ)−λ² 2

N i=1

E_λ_E^Fi−1(Z)

Fi(Z)²

N i=1

λ³B_i³ 3N^3/2.

(7)

Remark 1.3. – The upper bound can easily be improved in the following way: notice that for any nonnegative γ, EλZ_i−1+γ F_i(Z)[Fi(Z)]0, because this expression is equal to zero when γ =0 and is nondecreasing with respect to γ. As moreover for any real random variable r →E[(ξ −r)³] is nonincreasing, we see that when E(ξ )0, E{[ξ−E(ξ )]³}E(ξ³)ξ³_∞. This shows that indeed

N i=1

E_λ_E^Fi−1(Z)

Fi(Z)² N

i=1

λ³B_i³ 6N^3/2.

To proceed in the proof of Theorem 1.1, we have now to approximateEλZ_i₋₁[Fi(Z)²] byE[Fi(Z)²], in order to get the variance ofZ, that can be written as

N i=1

EFi(Z)².

Let us put for shortVi=Fi(Z)²and let us introduce its martingale differences:

EλZ_i−1

Vi−E(Vi)=

i−1

j=1

EλZ_i−1

Fj(Vi).

To deal with the jth term of this sum, we introduce the conditional expectation with respect toGj:

E^GλZ^ji−1

Fj(Vi)=E^GλG^jj(Zi−1)

Fj(Vi)

=E^G^jFj(Vi)

=0i

+ λ 0

E^GαG^jj(Zi−1)

Fj(Vi)Gj(Zi−1)

−E^GαG^j_j(Z_i₋₁)Gj(Zi−1)dα. (1) As a consequence, lettingU=Fj(Vi)and W =(Gj(Zi−1)−E^GαG^jj(Zi−1)Gj(Zi−1))and applying the Cauchy–Schwartz inequality, we get

E^GλZ^j_i₋₁

Fj(Vi) λ 0

E^GαG^j_j(Z_i₋₁)

U²E^GαG^j_j(Z_i₋₁)

W²dα.

Reminding that we are analysing the case whenj < i, we can now observe that Gj(Zi−1)=ⁱ E^Fⁱ⁻¹Gj(Z),

and therefore that its conditional range is upper bounded by ess supGj(Zi−1)|Gj

−ess infGj(Zi−1)|Gj

Bj

√N.

This implies that its variance is bounded by ^B

2 j

4N.

(8)

(Let us remind that by definition, for any real random variable W on a probability space(),F,P), and any sub sigma algebraG⊂F,

ess sup(W |G)^def=supλ∈R: P(Wλ|G) >0. It can equivalently be defined by the relation

ess sup(W |G) > λ=P(W λ|G) >0, λ∈Q,

which shows that it belongs toL⁰(),G,P)– the set of measurable functions factorised by almost sure equality with respect to P. Here and below, inequalities involving conditional expectations and conditional essential suprema and infima are meant without further notice to hold almost surely.)

Let us consider now on some enlarged probability space two independent random variables X_i and X_j, such that (X1, . . . , XN, X_j, X_i) is distributed according to

N

k=1µk⊗µj⊗µi. We have Fj

Fi(Z)²=Fj

E^Fⁱif (X^N₁, X_i)². Moreover for any functionh(X^N₁),

Fj(h(X)²)=E^F^jjh²(X^N₁, X_j)

=E^F^jh(X₁^N)+h(X^j₁⁻¹, X_j, X^N_j₊₁)jh(X₁^N, X_j) 2 ess suph(X)E^F^jjh(X^N₁, X_j).

Applying this toh(X)=E^Fⁱ(if (X^N₁, X_i)), we get Fj

Fi(Z)²2 Bi

√NE^F^jE^Fⁱjif (X^N₁, X_i, X_j)2BiCi,j

N²⁻^ζ . Therefore

EλZi−1Fj(Vi)λBiCi,jBj

N^5/2⁻^ζ .

This, combined with Lemma 1.1, ends the proof of Theorem 1.1. The derivation of its corollary is standard: it is obtained by taking

λ= ε

V(f )+ηε/V(f ), and by applying successively the theorem tof and−f. ✷

2. Extension to unbounded ranges

The boundedness assumption of the finite differences of f can be relaxed to exponential moment assumptions. To achieve this, we will suitably modify the bounds of the previous section, still assuming that f is bounded, and will afterwards let more general functions be approximated by bounded ones.

(9)

THEOREM 2.1. – Let λ be a positive real constant. Let ^N_i₌₁(Xi,B_i, µi) be some product of probability spaces. Letf ∈L²( ^N_i₌₁(Xi,B_i, µi))be such that

Eexpλf (X)<+∞,

whereX=(X1, . . . , XN)is the canonical process. Let(X1, . . . , XN, X₁, . . . , X_N)be the canonical process on( ^N_i₌₁(Xi,B_i))^⊗²and letEbe the expectation with respect to the probability measure ( ^N_i₌₁µi)^⊗² defined on this enlarged space. Let us introduce the short notations

i=if (X₁^N, X_i), 1iN,

j,i=jif (X^N₁, X_i, X_j), 1j < iN, ,(r)=exp(r)−1−r−r²

2 = r

0

(r−u)²

2 exp(u) du, and let us decide by convention that 0×(+∞)= +∞. Then

2 Vf (X)

5

N i=1

ess supE^Gⁱ,λ|i| +√

2λ² N i=1

i−1

j=1

ess supE^Gⁱ(²_i)^1/2ess supE^F^j⁻¹(²_j,i)^1/2

×ess supE^G^jλ|j|exp(2λ|j|)−1^1/2. Let us begin with a lemma.

LEMMA 2.1. – In the case whenf ∈L^∞,

N i=1

EλZ_i₋₁

Fi(Z)² 5

N i=1

ess supE^Fⁱ⁻¹,λFi(Z). Proof. – We can notice that for any bounded real random variableξ

E[ξ−E(ξ )]³E|ξ|(ξ−E(ξ ))²+E(|ξ|)E(ξ−E(ξ ))² E(|ξ|³)+2E(ξ²)E(|ξ|)+E(|ξ|)³+E(|ξ|)E(ξ²) 5E(|ξ|³).

(Indeed E(ξ²)E(|ξ|) E(|ξ|³), from the convexity of g:β → log[E(|ξ|^β)], which implies thatg(1)−g(0)g(3)−g(2).)

Coming back to the proof of Lemma 1.1 we can write the following chain of inequalities:

λ o

(λ−γ )²

2 M³λZ_i−1+γFi(z)

Fi(Z)dγ

(10)

5 2 λ 0

(λ−γ )²EλZ_i−1+γ Fi(Z)

|Fi(Z)|³dγ

EλZ_i₋₁

5 2 λ

0

(λ−γ )²E^Fⁱ⁻¹expγ|Fi(Z)||Fi(Z)|³dγ

5

2ess supE^Fⁱ⁻¹ ^λ

0

(λ−γ )²|Fi(Z)|³expγ|Fi(Z)|dγ

.

(As in the case of Lemma 1.1 – see Remark 1.3 – the upper bound can be improved, noticing that for any real random variable E{[ξ−E(ξ )]³}E(ξ³)as soon asE(ξ ) 0.) ✷

Proof of Theorem 2.1. – Let us prove the theorem first whenf ∈L^∞. In addition to those introduced in Theorem 2.1, we will need the following short notation:

j

i=if(X1, . . . , Xj−1, X_j, X_j^N₊₁), X_i, 1j < iN.

Let us come back to Eq. (1) and notice that

Fj(Vi)=E^F^jE^Fⁱi+^ji

E^Fⁱ(j,i), Gj(Zi−1)=E^Fⁱ⁻¹(i),

and therefore, from the Cauchy–Schwartz inequality and the convexity ofr→r², that Fj(Vi)E^F^ji+^ji

21/2

E^F^j²_j,i^1/2. We can thus bound the integrand ofE^GGj^j(Z_i−1)in the last term of Eq. (1) by

Fj(Vi)Gj(Zi−1)−E^GGj^j(Z_i−1)Gj(Zi−1)A×B, where

A=E^F^j²_j,i^1/2exp

−α

2Gj(Zi−1)

, B=exp

α

2Gj(Zi−1)E^F^ji +^ji

21/2Gj(Zi−1)+E^GGj^j(Z_i−1)Gj(Zi−1). Applying the Cauchy–Schwartz inequality and noticing that for any real random variable ξ such thatE(ξ )0,

Eexp(αξ )expαE(ξ )1, we get

EλZi−1

Fj(Vi)EλZi−1

λ 0

E^G_αG^j_j_(Z_i₋₁₎A²^1/2E^G_αG^j_j_(Z_i₋₁₎B²^1/2dα

(11)

EλZ_i−1

λ 0

E^G^jexpαGj(Zi−1)A²₀^1/2E^GαG^jj(Zi−1)

B²^1/2dα

2EλZ_i₋₁

E^F^j−1²_j,i^1/2

× λ 0

E^GαG^j_j(Z_i₋₁)

expαGj(Zi−1)Gj(Zi−1)²

+E^GαG^jj(Z_i−1)Gj(Zi−1)²^1/2dα

×ess supE^F^j²_i+ess supE^F^j^j²_i^1/2,

where we have also used the fact that(a+b)²2(a²+b²). To simplify the bound, we can now notice that for any random variableξsuch thatE(ξ )0,[E(ξ )]²E[exp(αξ )] E[ξ²exp(αξ )](because[E(ξ )]²[Eαξ(ξ )]²Eαξ(ξ²)which is the inequality we seek).

We can also notice that

ess supE^F^j²_i=ess supE^F^j^j²_iess supE^Gⁱ²_i. These two remarks lead to

EλZ_i₋₁

Fj(Vi) 4EλZi−1

E^F^j⁻¹²_j,i^1/2

× λ 0

E^G^jexp2αGj(Zi−1)Gj(Zi−1)²^1/2dα

ess supE^Gⁱ²_i^1/2. Remembering that Gj(Zi−1)=E^Fⁱ⁻¹(j), using the convexity ofr →r²exp(2αr)on the positive real axis and another round of the Cauchy–Schwartz inequality, we get

λ 0

E^G^jexp2αGj(Zi−1)Gj(Zi−1)²^1/2dα

λ 2

E^G^jE^Fⁱ⁻¹exp(2λ|j|)−1|j|^1/2

λ 2

ess supE^G^jexp(2λ|j|)−1|j|^1/2. Thus EλZi−1

Fj(Vi)

2√

2EλZi−1

E^F^j⁻¹²_j,i^1/2

×ess supE^G^jλ|j|exp(2λ|j|)−1^1/2ess supE^Gⁱ²_i^1/2.

(12)

This combined with Lemma 2.1 proves the following slight improvement of Theorem 2.1 in the case whenf ∈L^∞:

2 Vf (X)

5

N i=1

ess supE^Fⁱ⁻¹,λ|i| +√

2λ² N i=1

i−1

j=1

ess supE^Gⁱ²_i^1/2EλZ_i−1

E^F^j−1²_j,i^1/2

×ess supE^G^jλ|j|exp2λ|j|−1^1/2. (2) When f satisfies only the weaker assumptions of Theorem 2.1, we can introduce the bounded functionsf^T(X)=min{max{f (X),−T}, T},T ∈N.

The terms of the left-hand side of Eq. (2) for f^T converge from the dominated convergence theorem to their counterpart forf. Indeed

expλf^T(X)max1,expλf (X)∈L¹, f^T(X)|f (X)| ∈L².

In the right-hand side, we can use the bounds

if^T(X, X_i)if (X, X_i),

and apply the dominated convergence theorem to prove that, introducing the notations j,i(f )=jif (X, X_i, X_j),

Zi(f )=E^Fⁱ[f (X)] −E[f (X)],

T→+∞lim EλZ_i−1(f^T)

E^F^j−1j,i(f^T)²^1/2=EλZ_i₋₁(f )

E^F^j−1[j,i(f )]²^1/2,

as soon as ess supE^Gⁱ(²_i) <+∞ (in the other case, Theorem 2.1 is trivial). Indeed EλZ_i−1(f )=E_λ_E^Fi−1[f (X)],

expλE^Fⁱ⁻¹f^T(X)max1,E^Fⁱ⁻¹expλf (X)∈L¹ and, putting^ji(f )=if (X^j, X_i), (so thatj,i(f )=i(f )−^ji(f )),

E^F^j⁻¹[j,i(f^T)]²2E^F^j⁻¹i(f^T)²+^ji(f^T)²4 ess supE^Gⁱi(f )². This proves that inequality (2), and consequently Theorem 2.1 which is weaker, holds forf. ✷

(13)

3. Generalisation to Markov chains

We will study here the case when(X1, . . . , XN)is a Markov chain. The assumptions on f will be the same as in the first section. Therefore we will assume throughout this section that

ifx₁^N, yi

Bi

√N, i=1, . . . , N, and that

jifx₁^N, yi, yj

Ci,j

N^3/2⁻^ζ, 1j < iN.

When the random variables (X1, . . . , XN) are dependent, we have to modify the definition of the operators Gi(Z) and of the sigma algebras G_i. Indeed to generalise the first part of the proof we would like to have the identity

E^Fⁱ(Gi(Z))=Fi(Z),

whereGi is “as small as possible”. This identity does not hold in the general case with the definition we had forGi, so we will have to change it.

To generalise the second part of the proof, we need to consider a new definition of the sigma algebraGj for which

E^G^jFj(W )=0, andZ−E^G^j(Z)is as small as possible.

We propose a solution here where the two objectsGi andGj are built with the help of coupled processes. This use of coupling was inspired by the works of Katalin Marton [11–13].

Let us remind first some general facts about maximal coupling: given some probability space (),S), and two probability measures µ and ν on (),S), a maximal coupling between µ and ν is a probability measure ρ on ()×),S⊗S) such thatρ(dω1)= µ(dω1), ρ(dω2)=ν(dω2)and ρ(ω1=ω2) is maximal (whereω1 andω2 are the first and second coordinates on)×)). In words, a maximal coupling is a joint distribution with prescribed marginals and a maximal weight given to the diagonal. The maximal weight of the diagonal is obviously (µ∧ν)()), whereµ∧ν is defined asd(µ∧ν)= min{_d(µ^dµ₊_ν),_d(µ^dν₊_ν)}d(µ+ν) and is necessarily the trace of any maximal coupling distribution on the diagonal; it is also related to the variation distance µ−νvar

def=

1

2|µ−ν|())betweenµandνby the formula(µ∧ν)())=1− µ−νvar. A maximal coupling betweenµ and ν is not unique (its off diagonal behaviour being arbitrary, as long as the marginal constraints are satisfied), however one possible explicit construction is the following: consider on the product space()×)× [0,1],S^⊗²⊗B), whereBis the Borel sigma algebra, the product probability measureµ⊗ µ−ν⁻_var¹(ν−µ)₊⊗U, where (ν−µ)₊=ν−(ν∧µ) is the positive part of the signed measure ν−µ and whereU is the Lebesgue measure on[0,1]. Let(ω1, ω2, r) be the three coordinates of