Large Deviations for a matching problem related to the $\infty$-Wasserstein distance

(1)

HAL Id: hal-00641378

https://hal.archives-ouvertes.fr/hal-00641378v2

Submitted on 15 Feb 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de

Large Deviations for a matching problem related to the

∞ -Wasserstein distance

José Trashorras

To cite this version:

José Trashorras. Large Deviations for a matching problem related to the ∞ -Wasserstein distance.

ALEA : Latin American Journal of Probability and Mathematical Statistics, Instituto Nacional de

Matemática Pura e Aplicada, 2018, 15, pp.247-278. �hal-00641378v2�

(2)

Large Deviations for a matching problem related to the ∞-Wasserstein distance

Jos´ e Trashorras

^∗

February 15, 2017

Abstract

Let (E, d) be a compact metric space, X = (X

1

, . . . , X

n

, . . . ) and Y = (Y

1

, . . . , Y

n

, . . . ) two independent sequences of independent E-valued random variables and (L

^X_n

)

n≥1

and (L

^Y_n

)

n≥1

the associated sequences of empirical measures. We establish a large deviation principle for (W

∞

(L

^X_n

, L

^Y_n

))

n≥1

where W

∞

is the ∞-Wasserstein distance, W

∞

(L

^Xn

, L

^Yn

) = min

σ∈S_n

max

1≤i≤n

d(X

i

, Y

σ(i)

) where S

n

stands for the set of permutations of {1, . . . , n}.

1 Introduction

We say that a sequence of Borel probability measures (P

ⁿ

)

n≥1

on a topological space Y obeys a Large Deviation Principle (hereafter abbreviated LDP) with rate function I if I is a non-negative, lower semi-continuous function defined on Y such that

− inf

y∈A^o

I(y) ≤ lim inf

n→∞

1 n log P

ⁿ

(A) ≤ lim sup

n→∞

1 n log P

ⁿ

(A) ≤ − inf

y∈A¯

I(y) for any measurable set A ⊂ Y, whose interior is denoted by A

^o

and closure by A. If the level sets ¯ {y : I(y) ≤ α} are compact for every α < ∞, I is called a good rate function. With a slight abuse of language we say that a sequence of random variables obeys an LDP when the sequence of measures induced by these random variables obeys an LDP. For a background on the theory of large deviations see Dembo and Zeitouni [4] and references therein.

Let (E, d) be a metric space. There has been a lot of interest for years in considering the space M

¹

(E) of Borel probability measures on E endowed with the so-called p-Wasserstein distances

W

_p

(ν, γ) = inf

Q∈C(ν,γ)

( Z

E×E

d(x, y)

^p

Q(dx, dy)

^1/p

)

∗Universit´e Paris-Dauphine, Ceremade, Place du mar´echal de Lattre de Tassigny, 75775 Paris Cedex 16 France. Email: xose@ceremade.dauphine.fr

(3)

where p ∈ [1, ∞) and C(ν, γ) stands for the set of Borel probability measures on E

²

with first marginal Q

₁

= ν and second marginal Q

₂

= γ, see Chapter 6 in [20] for a broad review. However, the ∞-Wasserstein distance equivalently defined by either

W

_∞

(ν, γ) = lim

p→∞

W

p

(ν, γ) or

W

_∞

(ν, γ) = inf

Q∈C(ν,γ)

sup

u, u ∈ S(Q ◦ d

⁻¹

) (1) where S(Q◦ d

⁻¹

) ⊂ E

²

stands for the support of the probability measure Q◦ d

⁻¹

has attracted much less attention so far.

¹

Our framework is the following : We are given a compact metric space (E, d). Without loss of generality we can assume that sup

_x,y∈E

d(x, y) = 1. Let (X

n

)

_n≥1

and (Y

n

)

_n≥1

be two independent sequences of E-valued independent random variables defined on the same probability space (Ω, A, P). We assume that all the X

i

’s (resp. Y

i

’s) have the same distribution µ

¹

(resp. µ

²

). For every n ≥ 1 we consider the empirical measures

L

^X_n

= 1 n

n

X

i=1

δ

Xi

and L

^Y_n

= 1 n

n

X

i=1

δ

Yi

.

In [7] Ganesh and O’Connell conjecture that (W

∞

(L

^X_n

, L

^Y_n

))

n≥1

obeys an LDP with rate function

I

_∞

(x) = inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=x

H (ν

¹

|µ

¹

) + H (ν

²

|µ

²

)

where for any two ν, µ ∈ M

¹

(E) H (ν|µ) =

R

E

log

^dν_dµ

dν if ν µ

∞ otherwise.

If instead of I

_∞

one considers its lower semi-continuous regularization J

_∞

which is defined by

J

_∞

(x) = sup

δ>0

inf

y∈B(x,δ)

I

_∞

(y), see e.g. Chapter 1 in [15], then

Theorem 1.1 The sequence (W

∞

(L

^X_n

, L

^Y_n

))

n≥1

satisfies an LDP on [0, 1] with good rate function J

∞

(x).

In Section 3.1 we show that if e.g. E = [0, 1] is endowed with the usual Euclidean distance and µ

¹

= µ

²

= µ admits f (x) =

³₂

1

_[0,1

3]

(x) +

³₂

1

_[2

3,1]

(x) as a density w.r.t. the Lebesgue measure then, due to the disconnectedness of the support of µ, I

_∞

fails to be lower semi-continuous. Hence, I

_∞

and J

_∞

do not always coincide. This example illustrates a general feature of the problem considered

1Let us recall that the supportS(P) of a Borel probability measureP on a metric space (E, d) is defined by x∈ S(P) if and only if for everyε > 0, P(B(x, ε))> 0 whereB(x, ε) stands for the open ball centered atxwith radiusε. IfE is separable, in particular ifE is compact,S(P) is closed, see e.g. Theorem 2.1 in [14].

(4)

here : the connectedness properties of the support of the reference measures are the key ingredient in the behavior of (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

. We will say more about that below, in Proposition 1.3 below.

Since for every n ≥ 1 every Q ∈ C(L

^X_n

, L

^Y_n

) can be represented as a bi- stochastic matrix and since, according to the Birkhoff-Von Neumann Theorem, every bi-stochastic matrix is a convex combination of permutation matrices (see e.g. Theorem 5.5.1 in [17]) we have

W

_∞

(L

^X_n

, L

^Y_n

) = min

σ∈Sn

1≤i≤n

max d(X

i

, Y

_σ(i)

) (2) where S

n

stands for the set of permutations of {1, . . . , n}. Hence, computing W

∞

(L

^X_n

, L

^Y_n

) is nothing but solving a minimax matching problem which is a fundamental combinatorial question. Equality (2) can be rephrased as

W

_∞

(L

^X_n

, L

^Y_n

) = d

_H

(S(L

^X_n

), S(L

^Y_n

)) (3) where S(L

^X_n

) = {X

1

, . . . , X

_n

}, S(L

^Y_n

) = {Y

1

, . . . , Y

_n

} and d

_H

is the so-called Hausdorff distance. Lets us recall that the Hausdorff distance is defined on the set of non-empty closed subsets of E by

d

_H

(A, B) = max

sup

x∈A

inf

y∈B

d(x, y), sup

y∈B

inf

x∈A

d(x, y)

see e.g. Section 6.2.2 in [3]. It is the essential tool in many applications like image processing [8], object matching [18], face detection [10] and evolutionary optimization [16] to name just a few.

Now that we have stated our main result, let us briefly review some facts about W

_∞

. The reference paper on the subject is [1] by Champion, De Pascale and Juutinen. They consider measures with support on compact subsets of R

^d

(d ≥ 1). Some of their results have been generalized from R

^d

to the setting of Polish spaces and general cost functions by Jylh¨ a in [9]. We recall them for latter use.

Lemma 1.1 (Proposition 2.1 in [1] and Theorem 2.6 in [9]) For any two ν, γ ∈ M

¹

(E) there exists at least one Q ∈ C(ν, γ) such that

W

_∞

(ν, γ) = sup

u, u ∈ S(Q ◦ d

⁻¹

) .

They further establish the existence of nice solutions to the problem (1) which they call infinitely cyclically monotone couplings.

Definition 1.1 A probability measure P ∈ M

¹

(E

²

) is called infinitely cyclically monotone if and only if for every integer n ≥ 2, every (x

₁

, y

₁

), . . . , (x

_n

, y

_n

) ∈ S(P ) and every σ ∈ S

_n

we have

max

1≤i≤n

d(x

_i

, y

_i

) ≤ max

1≤i≤n

d(x

_i

, y

_σ(i)

). (4)

Lemma 1.2 (Theorem 3.2 and Theorem 3.4 in [1] and Theorem 2.16 in [9]) For any two γ, ν ∈ M

¹

(E) there exists at least one infinitely cyclically monotone P ∈ C(γ, ν) such that

W

_∞

(γ, ν) = sup

u, u ∈ S (P ◦ d

⁻¹

) .

(5)

In order to check that P ∈ M

¹

(E

²

) is infinitely cyclically monotone we only need to know its support. The way mass is spread over S(P) does not matter : this property is concerned with subsets of E

²

rather than with fully specified probability measures. This is the reason why we will also call infinitely cyclically monotone subsets of E

²

that satisfy (4). Infinitely cyclically monotone couplings are nice solutions to the problem (1) in the sense that

Lemma 1.3 (Theorem 3.4 in [1] and Theorem 2.17 in [9]) Any infinitely cyclically monotone P ∈ M

¹

(E

²

) satisfies

W

∞

(P

1

, P

2

) = sup

u, u ∈ S(P ◦ d

⁻¹

) .

Indeed, once an infinitely cyclically monotone A ⊂ E

²

is given, no matter how one spreads mass over A provided the couple that maximizes (x, y) 7→ d(x, y) over A is covered, the marginals of the resulting probability measure over E

²

will always be at the same W

_∞

distance. This will show useful when constructing approximations. For any two γ, ν ∈ M

¹

(E) we shall denote by C

∞

(ν, γ) the set of infinitely cyclically monotone couplings of ν and γ.

Let us give an elementary example in order to illustrate the difference between the W

p

’s and W

∞

. Let E = [0, 1] and consider the sequence (ν

ⁿ

= (1 −

¹_n

)δ

₀

+

_n¹

δ

₁

)

_n≥1

. The ν

ⁿ

’s are perturbations of δ

₀

in the sense that for every p ∈ [1, ∞) we have lim

n→∞

W

_p

(ν

ⁿ

, δ

₀

) = 0. Nevertheless for every n ≥ 1 we have W

_∞

(ν

ⁿ

, δ

₀

) = 1 since S (ν

ⁿ

) = {0, 1} while S (δ

₀

) = {0}. This shows that perturbations so small that they do not impact the W

_p

distances in the limit can have dramatic consequences when using the W

_∞

distance instead.

Again, an illustration of how this singularity affects (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

is given in Proposition 1.3 below.

Next we briefly review the literature on results involving matching problems and the W

_p

’s and W

_∞

distances. The first results on (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

have been focused on concentration inequalities for independent [0, 1]

²

-valued X

_i

’s and Y

_i

’s with common distribution µ

¹

= µ

²

= λ where λ stands for the uniform distribution over [0, 1]

²

. In [13] Leighton and Shor establish that there exists a K > 0 such that

1 K n

^−1/2

(log n)

^3/4

≤ W

_∞

(L

^X_n

, L

^Y_n

) ≤ Kn

^−1/2

(log n)

^3/4

with probability 1 − o(1). Using majorizing measures, Talagrand showed in [19] that the latter result is a particular case of a general property of empirical discrepancies.

Concerning Large Deviations, in [7] the authors prove that if E = [0, 1]

²

and µ

¹

= λ is the uniform distribution over E then (W

_∞

(L

^X_n

, λ))

_n≥1

obeys an LDP with good rate function

T

_∞

(x) = inf

ν∈M¹(E)

W∞(ν,λ)=x

{H (ν |λ)} .

To obtain this result they prove that in this particular case the map ν 7→

W

_∞

(ν, λ) is continuous with respect to the weak convergence topology, and

then apply the contraction principle, see Theorem 4.2.1 in [4]. They further

show in the same framework as here, i.e. E-valued X

i

’s and Y

i

’s where (E, d)

(6)

is any compact metric space, that (W

₁

(L

^X_n

, L

^Y_n

))

_n≥1

obeys an LDP with good rate function

I

1

(x) = inf

ν¹,ν²∈M¹(E)

W1 (ν1,ν2 )=x

H (ν

¹

|µ

¹

) + H(ν

²

|µ

²

) .

Their proof relies on the fact that, according to the Kantorovitch-Rubinstein Theorem (see e.g. Theorem 11.8.2 in [5]), when E is compact W

1

generates the weak convergence topology. As a consequence (L

^X_n

)

n≥1

and (L

^Y_n

)

n≥1

satisfy an LDP on M

¹

(E) endowed with W

1

and again the contraction principle leads to an LDP for (W

1

(L

^X_n

, L

^Y_n

))

n≥1

. Following the same approach, one can deduce from Theorem 1.1 in [21] that for every p ∈ [1, ∞) the sequence (W

p

(L

^X_n

, L

^Y_n

))

n≥1

obeys an LDP with good rate function I

_p

(x) = inf

ν¹,ν²∈M¹(E)

Wp(ν1,ν2 )=x

H (ν

¹

|µ

¹

) + H(ν

²

|µ

²

) .

One might wonder why this idea is not applicable to the analysis of the LD properties of (W

∞

(L

^X_n

, L

^Y_n

))

n≥1

. Actually an LDP for (L

^X_n

)

n≥1

can not hold when M

¹

(E) is endowed with W

∞

, at least at this level of generality. Indeed, consider the probability measure µ

¹

on E = [0, 1] which density with respect to the Lebesgue measure is g(x) = 2(1

_[0,¹

4]

(x)+1

_[³

4,1]

(x)) and assume that (L

^X_n

)

n≥1

satisfies an LDP on (M

¹

(E), W

_∞

) with some rate function R. Clearly for every odd integer n we have P(W

_∞

(L

^X_n

, µ

¹

) < 3/8) = 0 hence for every ν ∈ B(µ

¹

, 3/8) we should necessarily have R(ν) = ∞. So we would get

lim sup

n→∞

1 n log P(W

_∞

(L

^X_n

, µ

¹

) ≤ 1

4 ) = −∞ (5)

but for every even n we have P(W

_∞

(L

^X_n

, µ

¹

) ≤

¹₄

) ≥

_n/2ⁿ

2

⁻ⁿ

which contradicts (5). Again this is due to the lack of connectedness of the support of µ

¹

and, in a sense, Theorem 1.1 is the only possible “Sanov-like” statement involving W

_∞

without any additional assumption.

Now let us give some facts about the zeros of J

_∞

. First notice that since 0 ≤ I

_∞

(W

_∞

(µ

¹

, µ

²

)) ≤ H(µ

¹

|µ

¹

) + H(µ

²

|µ

²

) = 0

we necessarily have J

∞

(W

∞

(µ

¹

, µ

²

)) = 0. Actually, due to the highly discon- tinuous nature of W

∞

the zeros of J

∞

need not be reduced to W

∞

(µ

¹

, µ

²

). We shall first prove the following

Lemma 1.4 Let x ∈ [0, 1] be such that J

_∞

(x) = 0. Necessarily x ≥ W

_∞

(µ

¹

, µ

²

).

The zeros of J

_∞

are related to some particular elements of E

²

as defined next Definition 1.2 We shall say that (a, b) ∈ E

²

is a couple of directly connected points and write a ↔ b if and only if a ∈ S(µ

¹

), b ∈ S(µ

²

) and there exists Q ∈ C

_∞

(µ

¹

, µ

²

) such that for every integer N ≥ 2, every (α

2

, β

2

), . . . , (α

N

, β

N

) ∈ S(Q) and every σ ∈ S

N

we have d(a, b) ≤ max

i=1,...,N

d(α

i

, β

_σ(i)

) where

(α

1

, β

1

) = (a, b).

(7)

Notice that in this definition (a, b) need not be an element of S(Q). In words, a ↔ b if and only if there exists Q ∈ C

_∞

(µ

¹

, µ

²

) such that {(a, b)} ∪ S(Q) is still an infinitely cyclically monotone subset of E

²

. Consider the set

Z

_µ1,µ²

= {x ∈ [0, 1] : ∃(a, b) ∈ E

²

such that a ↔ b and x = d(a, b)}

Proposition 1.1 J

∞

(x) = 0 if and only if x ∈ Z

µ¹,µ²

.

The notion of directly connected points takes a simpler form when µ

¹

= µ

²

. Proposition 1.2 If µ

¹

= µ

²

= µ then a, b ∈ E are directly connected if and only if a, b ∈ S(µ) and for every integer N ≥ 3 and every sequence α

1

, . . . , α

N

of elements of S(µ) such that α

1

= a and α

N

= b there is at least one integer L such that 1 ≤ L ≤ N − 1, and d(a, b) ≤ d(α

L

, α

L+1

).

In words, if µ

¹

= µ

²

= µ then a ↔ b if and only if one can not decompose a journey from a to b into several stages between elements of S(µ) such that each of these stages has length strictly smaller than d(a, b). As an example consider again E = [0, 1] and µ

¹

= µ

²

= µ with density f (x) =

³₂

1

_[0,1

3]

(x) +

³₂

1

_[2 3,1]

(x) w.r.t. the Lebesgue measure. Then

¹₃

↔

²₃

hence J

∞

(

¹₃

) = 0 as it is established through a direct computation in Section 3.1.

Finally we investigate the almost sure behavior of (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

as n → ∞ when µ

¹

= µ

²

= µ. The elements of Z

µ

, hence the asymptotic behavior of (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

, are directly related to the connectedness properties of S(µ). So let us write S(µ) = ∪

^D_i=1

A

i

where the A

i

are the connected components of S (µ). Here D = 1 means that S (µ) is connected and we do not exclude the possibility that D = ∞. We have

Lemma 1.5 Z

µ

= {0} if and only if S(µ) is connected.

For every x ∈ S(µ) let us denote by A

_i(x)

the connected component of S(µ) that contains x and for any two closed B, C ⊂ E we set

d(B, C) = inf

(x,y)∈B×C

d(x, y).

Lemma 1.6 For every a, b ∈ S(µ), if a ↔ b then d(a, b) = d(A

_i(a)

, A

_i(b)

).

For any two connected components A

i

, A

j

of S(µ) we shall write A

i

↔ A

j

if and only if there exists a ∈ A

i

and b ∈ A

j

such that a ↔ b. It directly follows from Lemma 1.6 that

Z

µ

=

d(A

_i

, A

_j

), (i, j) ∈ {1, . . . , D}

²

such that A

_i

↔ A

_j

. In order to formulate our result we need the following

Lemma 1.7 If S(µ) is not connected every x ∈ Z

µ

such that x 6= 0 is an isolated point.

Hence, if S (µ) is not connected, we can order the elements of Z

µ

β

₁

> β

₂

> · · · > β

_n

> · · · where β

1

= max Z

µ

, and for every integer i ≥ 2

β

_i

=

max Z

µ

\ {β

1

, . . . , β

_i−1

} if β

_i−1

6= 0

0 if β

_i−1

= 0.

(8)

Proposition 1.3 1. If S(µ) is connected then W

_∞

(L

^X_n

, L

^Y_n

) → 0 almost surely.

2. If S(µ) is not connected and

(a) there exists only two connected components A

i

, A

j

of S (µ) such that A

i

↔ A

j

and d(A

i

, A

j

) = β

1

then

lim inf W

_∞

(L

^X_n

, L

^Y_n

) = β

₂

and lim sup W

_∞

(L

^X_n

, L

^Y_n

) = β

₁

(6) almost surely.

(b) there are more than two connected components A

i

, A

j

of S(µ) such that A

i

↔ A

j

and d(A

i

, A

j

) = β

1

then W

_∞

(L

^X_n

, L

^Y_n

) → β

1

almost surely.

Hence, if E = [0, 1] and µ admits f (x) =

³₂

1

_[0,¹

3]

(x) +

³₂

1

_[²

3,1]

(x) as a density w.r.t. the Lebesgue measure then lim sup W

_∞

(L

^X_n

, L

^Y_n

) =

¹₃

almost surely and lim inf W

_∞

(L

^X_n

, L

^Y_n

) = 0 almost surely.

The paper is structured as follows : Section 2 is devoted to the proof of Theorem 1.1. In Section 3 we discuss the properties of I

_∞

and J

_∞

. Section 4 is concerned with the almost sure asymptotic behavior of (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

while Section 5 deals with the proof of some complementary results.

2 Proof of Theorem 1.1

The basic idea in the proof of Theorem 1.1 is to partition E in such a way that we are essentially led to consider finite set valued X

i

’s and Y

i

’s. Indeed, on the one hand we will see that W

_∞

is well-behaved with respect to partitioning (see Lemma 2.1 below) and on the other hand proceeding this way reduces the computation of probabilities to simple classical combinatorial estimates. In order to go from the particular case, i.e. E finite, to the general one we shall need some results on the weak convergence of nets of probability measures. In Section 2.1 we give an account on partitions, nets and the weak convergence topology. The proof of all the statements there is postponed to Section 5. The LD lower bound is established in Section 2.2 while the LD upper bound is derived in Section 2.3.

2.1 Some facts about partitions of E, nets and the weak convergence topology

Let P be the set of finite measurable partitions of E into non-empty sets. To every Π = (A

₁

, . . . , A

_L

) ∈ P we associate once for all through Section 2 a family (s

₁

, . . . , s

_L

) ∈ E

^L

such that for every 1 ≤ i ≤ L we have s

_i

∈ A

_i

. We further associate to every Π = (A

₁

, . . . , A

_L

) ∈ P a map π as follows

π : M

¹

(E) → M

¹

({s

1

, . . . , s

L

})

ν 7→ P

L

i=1

ν(A

i

)δ

s_i

.

Finally for every Π = (A

1

, . . . , A

L

) ∈ P we define its maximal diameter as

∆(Π) = max

1≤i≤L

sup

x,y∈A_i

d(x, y).

(9)

The following result links the W

_∞

distance between elements of M

¹

(E) to the analogue W

_∞

distance between their contractions through π.

Lemma 2.1 For every ν

¹

, ν

²

∈ M

¹

(E) and every Π ∈ P we have

|W

∞

(ν

¹

, ν

²

) − W

_∞

(π(ν

¹

), π(ν

²

))| ≤ 2∆(Π).

Let Π = (A

₁

, . . . , A

_L

) and K = (B

₁

, . . . , B

_R

) be two elements of P. We say that K is a refinement of Π if and only if for every 1 ≤ i ≤ L, there exists J

_i

⊂ {1, . . . , R} such that A

_i

= ∪

_j∈J_i

B

_j

, and we denote it by Π K. This makes (P , ) a directed set. Let us recall that (J, E ) is a directed set if and only if J is a non-empty set and E is a reflexive and transitive relation on J such that for every i, j ∈ J there exists a k ∈ J such that i E k and j E k. We introduce a general directed set (J, E ) since we will use both (P, ) and (N, ≤) as directed sets and we do not want to be too specific in the results below. We call net any map (P

^j

)

_j∈J

defined on a directed set (J, E ). A topological space (T, T )-valued net (P

^j

)

_j∈J

is said to converge to some P ∈ T if and only if for every neighborhood U of P there exists a j(U) ∈ J such that for every k ∈ J satisfying j(U ) E k we have P

^k

∈ U . We shall denote this lim

j∈J

P

^j

= P. We call subnet of (P

^j

)

j∈J

any sub-family (P

^l

)

l∈L

parametrized by a cofinal L, i.e.

a subset L ⊂ J such that for every j ∈ J there exists l ∈ L satisfying j E l.

(While this is not the most general definition of a subnet it will be sufficient for the problem considered here). Let us recall that a topological space (T, T ) is compact if and only if every net in T admits a subnet that converges to some point in T . Finally, for every real-valued net (P

^j

)

_j∈J

we shall consider as usual

lim sup

j∈J

P

^j

= inf

j∈J

sup

i:jEi

P

ⁱ

and lim inf

j∈J

P

^j

= sup

j∈J

inf

i:jEi

P

ⁱ

.

For this and other questions related to nets we refer to e.g. [12]. The following lemmas are a consequence of a kind of Portmanteau result for nets of probability measures that will be derived in Section 5.

Lemma 2.2 Every net (P

^j

)

_j∈J

of probability measures supported on R that converges weakly to some probability measure P satisfies

lim sup

j∈J

sup S (P

^j

) ≥ sup S(P ). (7)

Lemma 2.3 Let (P

^j

)

_j∈J

be a net of Borel probability measures on a compact metric space (Y, δ) that converges weakly to some probability measure P . For every x ∈ S (P ) and every j ∈ J there exists an x

^j

∈ S (P

^j

) such that the net (x

^j

)

_j∈J

converges to x.

Lemma 2.4 Every P ∈ M

¹

(E

²

) which is the limit in the weak convergence topology of a net (P

^j

)

_j∈J

of infinitely cyclically monotone elements of M

¹

(E

²

) is infinitely cyclically monotone.

2.2 LD lower bound

For every integer n ≥ 1 we consider M

^1,n

(E)=

(

ν ∈M

¹

(E) : there exists (x

1

, . . . , x

n

) ∈ E

ⁿ

such that ν = 1 n

n

X

i=1

δ

x_i

)

.

The following lemma is the key point in the proof of the LD lower bound

(10)

Lemma 2.5 For every ε > 0, every Π = (A

₁

, . . . , A

_L

) ∈ P with ∆(Π) ≤ ε/2 and every ν

¹

, ν

²

∈ M

¹

(E) there exists two sequences (ν

^1,n

)

_n≥1

and (ν

^2,n

)

_n≥1

such that

1. For every n ≥ 1 we have ν

^1,n

, ν

^2,n

∈ M

^1,n

(E).

2. For every 1 ≤ i ≤ L we have ν

^1,n

(A

i

) → ν

¹

(A

i

) and ν

^2,n

(A

i

) → ν

²

(A

i

).

3. There exists an N

0

such that for every n ≥ N

0

we have

|W

_∞

(ν

¹

, ν

²

) − W

_∞

(ν

^1,n

, ν

^2,n

)| ≤ ε.

Proving Lemma 2.5 actually requires one more lemma

Lemma 2.6 For every Q ∈ M

¹

(E

²

) and every Π = (A

1

, . . . , A

L

) ∈ P there exists a sequence (Q

ⁿ

=

_n¹

P

n

i=1

δ

_(xⁿ

i,y_iⁿ)

)

_n≥1

of elements of M

¹

(E

²

) such that 1. For every 1 ≤ u, v ≤ L we have Q

ⁿ

(A

_u

× A

_v

) → Q(A

_u

× A

_v

).

2. The sequence (Q

ⁿ

)

_n≥1

converges weakly to Q.

3. If Q(A

u

× A

v

) = 0 then for every n ≥ 1 we have Q

ⁿ

(A

u

× A

v

) = 0.

Proof of Lemma 2.6 Let Q be an element of M

¹

(E

²

) and Π = (A

₁

, . . . , A

_L

) ∈ P . Let (X

i

, Y

i

)

_i≥1

be a sequence of independent and Q−identically distributed random couples. According to the strong Law of Large Numbers there exists an event B of probability 1 such that for every 1 ≤ u, v ≤ L

1 n

n

X

i=1

1

_A_u_×A_v

(X

_i

, Y

_i

) → Q(A

_u

× A

_v

)

on B. Moreover, since E

²

is separable (it is compact), according to Varadara- jan’s Lemma, see e.g. Theorem 11.4.1 in [5], there exists an event C of probability 1 such that

1 n

n

X

i=1

δ

_(X_i_,Y_i₎

→

^w

Q

on C where →

^w

stands for the weak convergence of probability measures. Hence almost every realization of

_n¹

P

n

i=1

δ

_(X_i_,Y_i₎

n≥1

can play the role of (Q

ⁿ

)

_n≥1

and the conclusions of Lemma 2.6 follow.

Proof of Lemma 2.5 Let ν

¹

and ν

²

be two elements of M

¹

(E), ε > 0 and Π ∈ P with ∆(Π) ≤ ε/2. According to Lemma 1.2 there is a Q ∈ C(ν

¹

, ν

²

) such that

W

_∞

(ν

¹

, ν

²

) = sup S(Q ◦ d

⁻¹

).

Let (Q

ⁿ

)

_n≥1

be the sequence of elements of M

¹

(E

²

) associated to Q and Π by Lemma 2.6. We shall prove that (ν

^1,n

= Q

ⁿ₁

)

_n≥1

an (ν

^2,n

= Q

ⁿ₂

)

_n≥1

meet the conditions of Lemma 2.5. First, (1) and (2) in Lemma 2.5 are clearly satisfied with these (ν

^1,n

)

_n≥1

and (ν

^2,n

)

_n≥1

. Now, due to the definition of W

_∞

, for every n ≥ 1 we have

W

_∞

(ν

^1,n

, ν

^2,n

) ≤ sup S(Q

ⁿ

◦ d

⁻¹

)

(11)

and due to (1) and (3) in Lemma 2.6 there exists N

₁

such that for every n ≥ N

₁

we have

sup S(Q

ⁿ

◦ d

⁻¹

) ≤ sup S (Q ◦ d

⁻¹

) + ε

hence W

_∞

(ν

^1,n

, ν

^2,n

) ≤ W

_∞

(ν

¹

, ν

²

) +ε. So we are left to prove that there exists an N

2

such that for every n ≥ N

2

we have W

_∞

(ν

¹

, ν

²

) − ε ≤ W

_∞

(ν

^1,n

, ν

^2,n

).

Let us assume that this is not true i.e. that there exists a sequence (n

k

)

_k≥1

such that for every k ≥ 1 we have

W

∞

(ν

¹

, ν

²

) − ε > W

∞

(ν

^1,n^k

, ν

^2,n^k

). (8) According to Lemma 1.1 for every k ≥ 1 there exists a C

ⁿ^k

∈ C(ν

^1,n^k

, ν

^2,n^k

) such that

W

_∞

(ν

^1,n^k

, ν

^2,n^k

) = sup S(C

ⁿ^k

◦ d

⁻¹

).

The sequence (C

ⁿ^k

)

_k≥1

admits a weakly converging subsequence since M

¹

(E

²

) is compact for the weak convergence topology, see Theorem 6.4 in [14]. By a slight abuse of notation we still denote (C

ⁿ^k

)

_k≥1

this converging subsequence.

Let C be its limit. Due to (2) in Lemma 2.6 we necessarily have C

₁

= ν

¹

and C

2

= ν

²

. Moreover (C

ⁿ^k

◦ d

⁻¹

)

_k≥1

is a sequence of probability measures on R that weakly converges to C ◦ d

⁻¹

. Combining (8) with Lemma 2.2 we have

W

_∞

(ν

¹

, ν

²

) − ε ≥ lim sup

k→∞

W

_∞

(ν

^1,n^k

, ν

^2,n^k

)

= lim sup

k→∞

sup S (C

ⁿ^k

◦ d

⁻¹

)

≥ sup S(C ◦ d

⁻¹

)

≥ W

_∞

(ν

¹

, ν

²

)

which can not be. The announced result follows.

Proof of the LD lower bound. As usual, in order to prove the LD lower bound it is sufficient to prove that for every x ∈ [0, 1] and every ε > 0 we have

lim inf

n→∞

1 n log P(W

_∞

(L

^X_n

, L

^Y_n

) ∈]x − ε, x + ε[) ≥ −J

_∞

(x). (9) In particular we can assume that J

_∞

(x) < ∞ for otherwise (9) trivially holds.

If J

_∞

(x) < ∞ then for every m ≥ 1 there exists an y

m

such that |x − y

m

| <

1/m, I

_∞

(y

m

) < ∞ and lim

_m→∞

I

_∞

(y

m

) = J

_∞

(x). Now let us assume that for every m and every ν

¹

, ν

²

∈ M

¹

(E) such that W

_∞

(ν

¹

, ν

²

) = y

m

we have

lim inf

n→∞

1 n log P(W

_∞

(L

^X_n

, L

^Y_n

) ∈]y

m

− ε/4, y

m

+ ε/4[) ≥ −H (ν

¹

|µ

¹

) − H(ν

²

|µ

²

).

(10) It follows that for every m large enough and every ν

¹

, ν

²

∈ M

¹

(E) such that W

_∞

(ν

¹

, ν

²

) = y

m

lim inf

n→∞

1 n log P(W

_∞

(L

^X_n

, L

^Y_n

) ∈]x − ε, x + ε[) ≥

≥ lim inf

n→∞

1 n log P(W

_∞

(L

^X_n

, L

^Y_n

) ∈]y

m

− ε/4, y

_m

+ ε/4[)

≥ −H (ν

¹

|µ

¹

) − H(ν

²

|µ

²

)

(12)

hence lim inf

n→∞

1 n log P(W

∞

(L

^X_n

, L

^Y_n

) ∈]x − ε, x + ε[) ≥

≥ sup

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=ym

−H(ν

¹

|µ

¹

) − H (ν

²

|µ

²

)

≥ − inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=ym

H (ν

¹

|µ

¹

) + H(ν

²

|µ

²

)

≥ −I

_∞

(y

m

)

which leads to (9) by letting m → ∞. Hence it is sufficient to establish (10) and this is what we do now.

Let ε > 0, y

m

∈ [0, 1] be such that I

_∞

(y

m

) < ∞ and ν

¹

, ν

²

∈ M

¹

(E) be such that W

_∞

(ν

¹

, ν

²

) = y

m

. According to Lemma 2.5 to any Π = (A

1

, . . . , A

L

) ∈ P with ∆(Π) ≤ ε/32 we can associate two sequences (ν

^1,n

)

_n≥1

and (ν

^2,n

)

_n≥1

and an integer N

0

such that for every n ≥ 1 we have ν

^1,n

, ν

^2,n

∈ M

^1,n

(E) and for every n ≥ N

0

one has

P(W

∞

(L

^X_n

, L

^Y_n

) ∈]y

m

− ε/4, y

m

+ ε/4[) =

= P(|W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(ν

¹

, ν

²

)| < ε/4)

≥ P(|W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(ν

^1,n

, ν

^2,n

)| ≤ ε/8).

We obviously have

|W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(ν

^1,n

, ν

^2,n

)| ≤ |W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(π(L

^X_n

), π(L

^Y_n

))|+

+|W

∞

(π(L

^X_n

), π(L

^Y_n

)) − W

∞

(π(ν

^1,n

), π(ν

^2,n

))|+

+|W

∞

(π(ν

^1,n

), π(ν

^2,n

)) − W

∞

(ν

^1,n

, ν

^2,n

)|

so due to Lemma 2.1 we get

{π(L

^X_n

) = π(ν

^1,n

)} ∩ {π(L

^Y_n

) = π(ν

^2,n

)} ⊂

⊂ {|W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(ν

^1,n

, ν

^2,n

)| ≤ ε/8}

hence

P(|W

_∞

(L

^X_n

, L

^Y_n

) − W

_∞

(ν

^1,n

, ν

^2,n

)| ≤ ε/8) ≥

≥ P({π(L

^X_n

) = π(ν

^1,n

)} ∩ {π(L

^Y_n

) = π(ν

^2,n

)})

= P({π(L

^X_n

) = π(ν

^1,n

)})P({π(L

^Y_n

) = π(ν

^2,n

)}) since the sequences (X

_n

)

_n≥1

and (Y

_n

)

_n≥1

are independent. It follows from elementary combinatorics (see e.g. Lemma 2.1.9 in [4]) that

P({π(L

^X_n

) = π(ν

^1,n

)}) ≥ (n + 1)

^−L

exp

^−nH(π(ν^1,n^)|π(µ¹⁾⁾

and the analogue for P({π(L

^Y_n

) = π(ν

^2,n

)}) also holds true. As a consequence lim inf

n→∞

1 n log P(W

∞

(L

^X_n

, L

^Y_n

) ∈]y

m

− ε/4, y

m

+ ε/4[) ≥

≥ − lim sup

n→∞

(H(π(ν

^1,n

)|π(µ

¹

)) + H (π(ν

^2,n

)|π(µ

²

)))

= −H (π(ν

¹

)|π(µ

¹

)) − H(π(ν

²

)|π(µ

²

)) (11)

≥ −H (ν

¹

|µ

¹

) − H (ν

²

|µ

²

) (12)

where (11) is due to (2) in Lemma 2.5 and (12) comes from the fact that

H (π(ν

¹

)|π(µ

¹

)) ≤ H(ν

¹

|µ

¹

) for every Π ∈ P, see e.g. Theorem 1.4.3 in [6].

(13)

2.3 LD upper bound

We first establish the LD upper bound under the assumption that E is finite.

Then we extend this result to the general case.

2.3.1 A particular case

Here we assume that E = {a

₁

, . . . , a

_N

}. Since for every n ≥ 1 we have W

_∞

(L

^X_n

, L

^Y_n

) ∈ Γ = {d(a

_i

, a

_j

), 1 ≤ i, j ≤ N} which is a finite set it is sufficient in order to establish the LD upper bound to show that for every δ ∈ Γ we have

lim sup

n→∞

1 n log P(W

_∞

(L

^X_n

, L

^Y_n

) = δ) ≤ − inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=δ

H(ν

¹

|µ

¹

) + H (ν

²

|µ

²

) . (13) For every n ≥ 1 we get

P(W

_∞

(L

^X_n

, L

^Y_n

) = δ) (14)

= X

ν^1,n,ν^2,n∈M^1,n(E)

W∞(ν1,n ,ν2,n)=δ

P(L

^X_n

= ν

^1,n

, L

^Y_n

= ν

^2,n

)

≤ X

ν^1,n,ν^2,n∈M^1,n(E)

W∞(ν1,n ,ν2,n)=δ

exp −nH (ν

^1,n

|µ

¹

) − nH(ν

^2,n

|µ

²

)

(15)

≤ (n + 1)

^2N

exp





 −n inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=δ

H(ν

¹

|µ

¹

) + H (ν

²

|µ

²

)







see e.g. Lemma 2.1.9 in [4] for the elementary combinatorial estimate (15). The announced (13) follows.

2.3.2 The general case

Let us introduce some more notations. For every Π ∈ P we consider I

_∞^Π

(x) = inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )=x

H (π(ν

¹

)|π(µ

¹

)) + H (π(ν

²

)|π(µ

²

))

and we shall denote by J

_∞^Π

the lower semi-continuous regularization of I

_∞^Π

. Since [0, 1] is compact the sequence (W

_∞

(L

^X_n

, L

^Y_n

))

_n≥1

is naturally exponentially tight so it is sufficient in order to establish the LD upper bound to consider events of the form {W

_∞

(L

^X_n

, L

^Y_n

) ∈ [a, b]}, see Lemma 1.2.18 in [4]. According to Lemma 2.1 for every a, b ∈ [0, 1], a ≤ b, and every Π ∈ P we have for every n ≥ 1 that

P(W

_∞

(L

^X_n

, L

^Y_n

) ∈ [a, b]) ≤ P(W

_∞

(π(L

^X_n

), π(L

^Y_n

)) ∈ [a − 2∆(Π), b + 2∆(Π)]).

Hence, due to the computation carried out assuming that E is finite we get for

every Π ∈ P

(14)

lim sup

n→∞

1 n log P(W

∞

(L

^X_n

, L

^Y_n

) ∈ [a, b]) ≤

≤ − inf

ν¹,ν²∈M¹({s1,...,s_L})

W∞(ν1,ν2 )∈[a−2∆(Π),b+2∆(Π)]

H (ν

¹

|π(µ

¹

)) + H (ν

²

|π(µ

²

))

≤ − inf

ν¹,ν²∈M¹(E)

W∞(ν1,ν2 )∈[a−2∆(Π),b+2∆(Π)]

H (π(ν

¹

)|π(µ

¹

)) + H (π(ν

²

)|π(µ

²

))

≤ − inf

x∈[a−2∆(Π),b+2∆(Π)]

I

_∞^Π

(x)

≤ − inf

x∈[a−2∆(Π),b+2∆(Π)]

J

_∞^Π

(x).

Hence, to conclude the proof of the LD upper bound we are left to prove that sup

Π∈P

inf

x∈[a−2∆(Π),b+2∆(Π)]

J

_∞^Π

(x) ≥ inf

x∈[a,b]

J

_∞

(x). (16) Assume for a while the following

Lemma 2.7 For every x ∈ [0, 1] we have sup

_Π

J

_∞^Π

(x) = J

_∞

(x).

Assume also that (16) does not hold : There exists a δ > 0 such that for every Π ∈ P

inf

x∈[a−2∆(Π),b+2∆(Π)]

J

_∞^Π

(x) < inf

x∈[a,b]

J

_∞

(x) − δ.

Since for every Π ∈ P the map x 7→ J

_∞^Π

(x) is lower semi-continuous there exists a net (x

_Π

)

_Π∈P

such that x

_Π

∈ [a − 2∆(Π), b + 2∆(Π)] and

J

_∞^Π

(x

Π

) − δ/2 < inf

x∈[a,b]

J

_∞

(x) − δ.

Due to the log sum inequality, see e.g. Theorem 2.7.1 in [2], for every K, K

⁰

∈ P such that K K

⁰

and every x ∈ [0, 1] we have I

_∞^K

(x) ≤ I

_∞^K⁰

(x) hence J

_∞^K

(x) ≤ J

_∞^K⁰

(x) so for every Π

⁰

∈ P such that Π

⁰

Π we have

J

_∞^Π⁰

(x

Π

) − δ/2 < inf

x∈[a,b]

J

_∞

(x) − δ.

Since (x

Π

)

_Π∈P

is a [0, 1]-valued net it admits a converging subnet which limit we denote x

^∗

∈ [a, b]. Due to the fact that J

_∞^Π⁰

is lower semi-continuous we get

lim inf

Π∈P

J

_∞^Π⁰

(x

_Π

) ≥ J

_∞^Π⁰

(x

^∗

) hence

J

_∞^Π⁰

(x

^∗

) < inf

x∈[a,b]

J

∞

(x) − δ/2

which, according to Lemma 2.7, implies J

∞

(x

^∗

) < inf

_x∈[a,b]

J

∞

(x) − δ/2. Since the latter can not be (16) holds and the LD upper bound follows.

Proof of Lemma 2.7 According to Theorem 1.4.3 in [6] for every x ∈ [0, 1]

and every Π ∈ P I

_∞^Π

(x) ≤ I

_∞

(x) hence J

_∞^Π

(x) ≤ J

_∞

(x) whence sup

_Π

J

_∞^Π

(x) ≤

J

_∞

(x). Conversely let x be a fixed element of [0, 1]. We shall assume that

(15)

sup

_Π

J

_∞^Π

(x) < ∞ for otherwise the claimed equality trivially holds. Hence for every Π ∈ P and every δ > 0 there exists y

_δ,Π

such that |x − y

_δ,Π

| < δ, lim

_δ→0

I

_∞^Π

(y

δ,Π

) = J

_∞^Π

(x) and

I

_∞^Π

(y

_δ,Π

) − δ/2 < inf

y∈B(x,δ)

I

_∞^Π

(y). (17)

In particular we can assume that I

_∞^Π

(y

_δ,Π

) < ∞ thus there exists ν

₁^Π,δ

, ν

₂^Π,δ

∈ M

¹

(E) such that W

_∞

(ν

₁^Π,δ

, ν

₂^Π,δ

) = y

_δ,Π

and

H (π(ν

₁^Π,δ

)|π(µ

¹

)) + H (π(ν

₂^Π,δ

)|π(µ

²

)) − δ ≤ I

_∞^Π

(y

δ,Π

) − δ/2. (18) Next we define ρ

^Π,δ₁

, ρ

^Π,δ₂

∈ M

¹

(E) by

ρ

^Π,δ₁

(F ) = X

Ai∈Π

π(ν

₁^Π,δ

)(A

i

)

µ

¹

(A

i

) µ

¹

(F ∩ A

i

) and accordingly

ρ

^Π,δ₂

(F ) = X

A_i∈Π

π(ν

₂^Π,δ

)(A

i

)

µ

²

(A

i

) µ

²

(F ∩ A

_i

)

for every borelian F ⊂ E. Clearly π(ν

₁^Π,δ

) = π(ρ

^Π,δ₁

) and π(ν

₂^Π,δ

) = π(ρ

^Π,δ₂

) hence

|W

∞

(ρ

^Π,δ₁

, ρ

^Π,δ₂

) − W

∞

(ν

₁^Π,δ

, ν

₂^Π,δ

)|

≤ |W

_∞

(ρ

^Π,δ₁

, ρ

^Π,δ₂

) − W

_∞

(π(ρ

^Π,δ₁

), π(ρ

^Π,δ₂

))| +

+|W

∞

(π(ρ

^Π,δ₁

), π(ρ

^Π,δ₂

)) − W

∞

(π(ν

₁^Π,δ

), π(ν

₂^Π,δ

))| + +|W

_∞

(π(ν

₁^Π,δ

), π(ν

₂^Π,δ

)) − W

_∞

(ν

₁^Π,δ

, ν

₂^Π,δ

)|

≤ 4∆(Π).

According to Theorem 1.4.3 in [6] and Theorem 2.7.1 in [2] we have H (ρ

^Π,δ₁

|µ

¹

) = sup

K∈P

X

Bj∈K

ρ

^Π,δ₁

(B

j

) log ρ

^Π,δ₁

(B

j

) µ

¹

(B

_j

)

= sup

K∈P

KΠ

X

Ai∈Π

X

Bj∈K

Bj⊂Ai

ρ

^Π,δ₁

(B

j

) log ρ

^Π,δ₁

(B

j

) µ

¹

(B

j

)

= sup

K∈P

KΠ

X

Ai∈Π

X

Bj∈K

Bj⊂Ai

π(ν

₁^Π,δ

)(A

i

)

µ

¹

(A

i

) µ

¹

(B

j

) log π(ν

₁^Π,δ

)(A

i

) µ

¹

(A

i

)

= sup

K∈P

KΠ

X

Ai∈Π

π(ν

₁^Π,δ

)(A

i

) log π(ν

₁^Π,δ

)(A

i

) µ

¹

(A

i

)

= H(π(ν

₁^Π,δ

)|π(µ

¹

))

and H (ρ

^Π,δ₂

|µ

²

) = H (π(ν

₂^Π,δ

)|π(µ

²

)) as well, hence

H (ρ

^Π,δ₁

|µ

¹

) + H (ρ

^Π,δ₂

|µ

²

) = H (π(ν

₁^Π,δ

)|π(µ

¹

)) + H (π(ν

₂^Π,δ

)|π(µ

²

)).

(16)

Thus it follows from the preceding, (17) and (18) that for every δ > 0 and every Π ∈ P we have

inf

u∈B(yδ,Π,4∆(Π))

I

_∞

(u) − δ ≤ inf

y∈B(x,δ)

I

_∞^Π

(y) hence

y∈B(x,δ+4∆(Π))

inf I

∞

(y) − δ ≤ inf

y∈B(x,δ)

I

_∞^Π

(y) whence

y∈B(x,δ+4∆(Π))

inf I

∞

(y) − δ ≤ sup

Π∈P

sup

δ>0

y∈B(x,δ)

inf I

_∞^Π

(y)

= sup

Π

J

_∞^Π

(x).

But for every δ > 0 we have sup

Π∈P

inf

y∈B(x,δ+4∆(Π))

I

_∞

(y) = sup

Π∈P

∆(Π)<δ

inf

y∈B(x,δ+4∆(Π))

I

_∞

(y) hence

inf

y∈B(x,6δ)

I

∞

(y) ≤ sup

Π∈P

J

_∞^Π

(x)

whence sup

_Π

J

_∞^Π

(x) ≥ J

_∞

(x).

3 On some properties of I _∞ and J _∞ .

3.1 I

∞

can fail to be lower semi-continuous

All through Section 3.1 we consider that E = [0, 1] is endowed with the usual Euclidean distance and that µ

¹

= µ

²

= µ admits f (x) =

³₂

1

_[0,1

3]

(x) +

³₂

1

_[2 3,1]

(x) as a density w.r.t. the Lebesgue measure. That in this setting I

∞

is not lower semi-continuous is an immediate consequence of the following two lemmas.

Lemma 3.1 Let ν

¹

and ν

²

be two elements of M

¹

(E) such that W

_∞

(ν

¹

, ν

²

) =

1

3

, S (ν

¹

) ⊂ S(µ) and S(ν

²

) ⊂ S(µ). Necessarily at least one of ν

¹

or ν

²

has an atom.

Lemma 3.2 The sequence (ρ

ⁿ

)

_n≥4

of elements of M

¹

(E) defined by the densi- ties

g

ⁿ

(x) = 3 2 1

_[0,1

3−¹_n]

(x) + 3

2 − 1 n

1

_[1

3−_n¹,¹₃]

(x)+

+ 3

2 + 1 n

1

_[2

3,²₃+_n¹]

(x) + 3 2 1

_[2

3+¹_n,1]

(x) is such that for every n ≥ 4 we have

1 3 < W

_∞

(ρ

ⁿ

, µ) ≤ 1 3 + 2

n and

H (ρ

ⁿ

|µ) = 3 2n

1 − 2

3n

log

1 − 2 3n

+ 3

2n

1 + 2 3n

log

1 + 2

3n

.

(17)

Indeed, let ν

¹

, ν

²

∈ M

¹

(E) be such that W

_∞

(ν

¹

, ν

²

) =

¹₃

. We have two possi- bilities :

i) S(ν

¹

) or S (ν

²

) is not included in S(µ) which implies that H(ν

¹

|µ) + H(ν

²

|µ) = ∞;

ii) both S(ν

¹

) and S(ν

²

) are included in S(µ) which, according to Lemma 3.1, implies that at least one of ν

¹

or ν

²

has an atom, hence H (ν

¹

|µ) + H(ν

²

|µ) = ∞.

Thus I

∞

(

¹₃

) = ∞. Meanwhile, it follows from Lemma 3.2 that the sequence (x

n

)

n≥4

defined by x

n

= W

∞

(ρ

ⁿ

, µ) is such that x

n

→

¹₃

and 0 ≤ I

∞

(x

n

) ≤ H (ρ

ⁿ

|µ) + H (µ|µ) → 0 as n → ∞, whence I

∞

is not lower semi-continuous.

From the preceding we deduce that J

_∞

(

¹₃

) = 0.

3.1.1 Proof of Lemma 3.1

Lemma 3.1 is an immediate consequence of the following three lemmas

Lemma 3.3 Let ν

¹

and ν

²

be two elements of M

¹

(E) such that S(ν

¹

) ⊂ S(µ) and S(ν

²

) ⊂ S(µ). If W

∞

(ν

¹

, ν

²

) ≤

¹₃

then necessarily at least one of these two statements is true

i) ν

¹

([0,

¹₃

]) = ν

²

([0,

¹₃

]);

ii) at least one of ν

¹

or ν

²

has an atom.

Lemma 3.4 Let ν

¹

and ν

²

be two elements of M

¹

(E) such that S(ν

¹

) ⊂ S(µ), S(ν

²

) ⊂ S(µ), W

_∞

(ν

¹

, ν

²

) ≤

¹₃

and ν

¹

([0,

¹₃

]) = ν

²

([0,

¹₃

]) = α with α ∈]0, 1[.

Then at least one of these two statements is true i) we have

W

_∞

(ν

¹

, ν

²

) = max

W

_∞

(γ

¹

, γ

²

), W

_∞

(θ

¹

, θ

²

) where γ

¹

, γ

²

∈ M

¹

([0,

¹₃

]) and θ

¹

, θ

²

∈ M

¹

([

²₃

, 1]) are defined by

γ

¹

(A) = ν

¹

(A)

ν

¹

([0,

¹₃

]) , θ

¹

(B) = ν

¹

(B ) ν

¹

([

²₃

, 1])

for every measurable A ⊂ [0,

¹₃

] and B ⊂ [

²₃

, 1], γ

²

and θ

²

being defined on the ground of ν

²

accordingly;

ii) at least one of ν

¹

or ν

²

has an atom.

Lemma 3.5 Let ζ

¹

and ζ

²

be two elements of M

¹

([0,

¹₃

]) (resp. M

¹

([

²₃

, 1])) such that W

∞

(ζ

¹

, ζ

²

) =

¹₃

. Then at least one of ζ

¹

or ζ

²

has an atom.

Proof of Lemma 3.3. Let ν

¹

and ν

²

be two elements of M

¹

(E) such that

S(ν

¹

) ⊂ S(µ) and S(ν

²

) ⊂ S (µ). We shall prove the following, which is equiv-

alent to the announced statement : If ν

¹

([0,

¹₃

]) 6= ν

²

([0,

¹₃

]) and neither ν

¹

nor

ν

²

have an atom then W

_∞

(ν

¹

, ν

²

) >

¹₃

. To this end assume that ν

¹

([0,

¹₃

]) >