HAL Id: hal-00641378
https://hal.archives-ouvertes.fr/hal-00641378v2
Submitted on 15 Feb 2017
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de
Large Deviations for a matching problem related to the
∞ -Wasserstein distance
José Trashorras
To cite this version:
José Trashorras. Large Deviations for a matching problem related to the ∞ -Wasserstein distance.
ALEA : Latin American Journal of Probability and Mathematical Statistics, Instituto Nacional de
Matemática Pura e Aplicada, 2018, 15, pp.247-278. �hal-00641378v2�
Large Deviations for a matching problem related to the ∞-Wasserstein distance
Jos´ e Trashorras
∗February 15, 2017
Abstract
Let (E, d) be a compact metric space, X = (X
1, . . . , X
n, . . . ) and Y = (Y
1, . . . , Y
n, . . . ) two independent sequences of independent E-valued ran- dom variables and (L
Xn)
n≥1and (L
Yn)
n≥1the associated sequences of em- pirical measures. We establish a large deviation principle for (W
∞(L
Xn, L
Yn))
n≥1where W
∞is the ∞-Wasserstein distance, W
∞(L
Xn, L
Yn) = min
σ∈Sn
max
1≤i≤n
d(X
i, Y
σ(i)) where S
nstands for the set of permutations of {1, . . . , n}.
1 Introduction
We say that a sequence of Borel probability measures (P
n)
n≥1on a topological space Y obeys a Large Deviation Principle (hereafter abbreviated LDP) with rate function I if I is a non-negative, lower semi-continuous function defined on Y such that
− inf
y∈Ao
I(y) ≤ lim inf
n→∞
1
n log P
n(A) ≤ lim sup
n→∞
1
n log P
n(A) ≤ − inf
y∈A¯
I(y) for any measurable set A ⊂ Y, whose interior is denoted by A
oand closure by A. If the level sets ¯ {y : I(y) ≤ α} are compact for every α < ∞, I is called a good rate function. With a slight abuse of language we say that a sequence of random variables obeys an LDP when the sequence of measures induced by these random variables obeys an LDP. For a background on the theory of large deviations see Dembo and Zeitouni [4] and references therein.
Let (E, d) be a metric space. There has been a lot of interest for years in considering the space M
1(E) of Borel probability measures on E endowed with the so-called p-Wasserstein distances
W
p(ν, γ) = inf
Q∈C(ν,γ)
( Z
E×E
d(x, y)
pQ(dx, dy)
1/p)
∗Universit´e Paris-Dauphine, Ceremade, Place du mar´echal de Lattre de Tassigny, 75775 Paris Cedex 16 France. Email: xose@ceremade.dauphine.fr
where p ∈ [1, ∞) and C(ν, γ) stands for the set of Borel probability measures on E
2with first marginal Q
1= ν and second marginal Q
2= γ, see Chapter 6 in [20] for a broad review. However, the ∞-Wasserstein distance equivalently defined by either
W
∞(ν, γ) = lim
p→∞
W
p(ν, γ) or
W
∞(ν, γ) = inf
Q∈C(ν,γ)
sup
u, u ∈ S(Q ◦ d
−1) (1) where S(Q◦ d
−1) ⊂ E
2stands for the support of the probability measure Q◦ d
−1has attracted much less attention so far.
1Our framework is the following : We are given a compact metric space (E, d). Without loss of generality we can assume that sup
x,y∈Ed(x, y) = 1. Let (X
n)
n≥1and (Y
n)
n≥1be two independent sequences of E-valued independent random variables defined on the same probability space (Ω, A, P). We assume that all the X
i’s (resp. Y
i’s) have the same distribution µ
1(resp. µ
2). For every n ≥ 1 we consider the empirical measures
L
Xn= 1 n
n
X
i=1
δ
Xiand L
Yn= 1 n
n
X
i=1
δ
Yi.
In [7] Ganesh and O’Connell conjecture that (W
∞(L
Xn, L
Yn))
n≥1obeys an LDP with rate function
I
∞(x) = inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=x
H (ν
1|µ
1) + H (ν
2|µ
2)
where for any two ν, µ ∈ M
1(E) H (ν|µ) =
R
E
log
dνdµdν if ν µ
∞ otherwise.
If instead of I
∞one considers its lower semi-continuous regularization J
∞which is defined by
J
∞(x) = sup
δ>0
inf
y∈B(x,δ)
I
∞(y), see e.g. Chapter 1 in [15], then
Theorem 1.1 The sequence (W
∞(L
Xn, L
Yn))
n≥1satisfies an LDP on [0, 1] with good rate function J
∞(x).
In Section 3.1 we show that if e.g. E = [0, 1] is endowed with the usual Euclidean distance and µ
1= µ
2= µ admits f (x) =
321
[0,13]
(x) +
321
[23,1]
(x) as a density w.r.t. the Lebesgue measure then, due to the disconnectedness of the support of µ, I
∞fails to be lower semi-continuous. Hence, I
∞and J
∞do not always coincide. This example illustrates a general feature of the problem considered
1Let us recall that the supportS(P) of a Borel probability measureP on a metric space (E, d) is defined by x∈ S(P) if and only if for everyε > 0, P(B(x, ε))> 0 whereB(x, ε) stands for the open ball centered atxwith radiusε. IfE is separable, in particular ifE is compact,S(P) is closed, see e.g. Theorem 2.1 in [14].
here : the connectedness properties of the support of the reference measures are the key ingredient in the behavior of (W
∞(L
Xn, L
Yn))
n≥1. We will say more about that below, in Proposition 1.3 below.
Since for every n ≥ 1 every Q ∈ C(L
Xn, L
Yn) can be represented as a bi- stochastic matrix and since, according to the Birkhoff-Von Neumann Theorem, every bi-stochastic matrix is a convex combination of permutation matrices (see e.g. Theorem 5.5.1 in [17]) we have
W
∞(L
Xn, L
Yn) = min
σ∈Sn
1≤i≤n
max d(X
i, Y
σ(i)) (2) where S
nstands for the set of permutations of {1, . . . , n}. Hence, computing W
∞(L
Xn, L
Yn) is nothing but solving a minimax matching problem which is a fundamental combinatorial question. Equality (2) can be rephrased as
W
∞(L
Xn, L
Yn) = d
H(S(L
Xn), S(L
Yn)) (3) where S(L
Xn) = {X
1, . . . , X
n}, S(L
Yn) = {Y
1, . . . , Y
n} and d
His the so-called Hausdorff distance. Lets us recall that the Hausdorff distance is defined on the set of non-empty closed subsets of E by
d
H(A, B) = max
sup
x∈A
inf
y∈B
d(x, y), sup
y∈B
inf
x∈A
d(x, y)
see e.g. Section 6.2.2 in [3]. It is the essential tool in many applications like image processing [8], object matching [18], face detection [10] and evolutionary optimization [16] to name just a few.
Now that we have stated our main result, let us briefly review some facts about W
∞. The reference paper on the subject is [1] by Champion, De Pascale and Juutinen. They consider measures with support on compact subsets of R
d(d ≥ 1). Some of their results have been generalized from R
dto the setting of Polish spaces and general cost functions by Jylh¨ a in [9]. We recall them for latter use.
Lemma 1.1 (Proposition 2.1 in [1] and Theorem 2.6 in [9]) For any two ν, γ ∈ M
1(E) there exists at least one Q ∈ C(ν, γ) such that
W
∞(ν, γ) = sup
u, u ∈ S(Q ◦ d
−1) .
They further establish the existence of nice solutions to the problem (1) which they call infinitely cyclically monotone couplings.
Definition 1.1 A probability measure P ∈ M
1(E
2) is called infinitely cyclically monotone if and only if for every integer n ≥ 2, every (x
1, y
1), . . . , (x
n, y
n) ∈ S(P ) and every σ ∈ S
nwe have
max
1≤i≤n
d(x
i, y
i) ≤ max
1≤i≤n
d(x
i, y
σ(i)). (4)
Lemma 1.2 (Theorem 3.2 and Theorem 3.4 in [1] and Theorem 2.16 in [9]) For any two γ, ν ∈ M
1(E) there exists at least one infinitely cyclically monotone P ∈ C(γ, ν) such that
W
∞(γ, ν) = sup
u, u ∈ S (P ◦ d
−1) .
In order to check that P ∈ M
1(E
2) is infinitely cyclically monotone we only need to know its support. The way mass is spread over S(P) does not matter : this property is concerned with subsets of E
2rather than with fully specified probability measures. This is the reason why we will also call infinitely cyclically monotone subsets of E
2that satisfy (4). Infinitely cyclically monotone couplings are nice solutions to the problem (1) in the sense that
Lemma 1.3 (Theorem 3.4 in [1] and Theorem 2.17 in [9]) Any infinitely cycli- cally monotone P ∈ M
1(E
2) satisfies
W
∞(P
1, P
2) = sup
u, u ∈ S(P ◦ d
−1) .
Indeed, once an infinitely cyclically monotone A ⊂ E
2is given, no matter how one spreads mass over A provided the couple that maximizes (x, y) 7→ d(x, y) over A is covered, the marginals of the resulting probability measure over E
2will always be at the same W
∞distance. This will show useful when constructing approximations. For any two γ, ν ∈ M
1(E) we shall denote by C
∞(ν, γ) the set of infinitely cyclically monotone couplings of ν and γ.
Let us give an elementary example in order to illustrate the difference be- tween the W
p’s and W
∞. Let E = [0, 1] and consider the sequence (ν
n= (1 −
1n)δ
0+
n1δ
1)
n≥1. The ν
n’s are perturbations of δ
0in the sense that for every p ∈ [1, ∞) we have lim
n→∞W
p(ν
n, δ
0) = 0. Nevertheless for every n ≥ 1 we have W
∞(ν
n, δ
0) = 1 since S (ν
n) = {0, 1} while S (δ
0) = {0}. This shows that perturbations so small that they do not impact the W
pdistances in the limit can have dramatic consequences when using the W
∞distance instead.
Again, an illustration of how this singularity affects (W
∞(L
Xn, L
Yn))
n≥1is given in Proposition 1.3 below.
Next we briefly review the literature on results involving matching problems and the W
p’s and W
∞distances. The first results on (W
∞(L
Xn, L
Yn))
n≥1have been focused on concentration inequalities for independent [0, 1]
2-valued X
i’s and Y
i’s with common distribution µ
1= µ
2= λ where λ stands for the uniform distribution over [0, 1]
2. In [13] Leighton and Shor establish that there exists a K > 0 such that
1
K n
−1/2(log n)
3/4≤ W
∞(L
Xn, L
Yn) ≤ Kn
−1/2(log n)
3/4with probability 1 − o(1). Using majorizing measures, Talagrand showed in [19] that the latter result is a particular case of a general property of empirical discrepancies.
Concerning Large Deviations, in [7] the authors prove that if E = [0, 1]
2and µ
1= λ is the uniform distribution over E then (W
∞(L
Xn, λ))
n≥1obeys an LDP with good rate function
T
∞(x) = inf
ν∈M1(E)
W∞(ν,λ)=x
{H (ν |λ)} .
To obtain this result they prove that in this particular case the map ν 7→
W
∞(ν, λ) is continuous with respect to the weak convergence topology, and
then apply the contraction principle, see Theorem 4.2.1 in [4]. They further
show in the same framework as here, i.e. E-valued X
i’s and Y
i’s where (E, d)
is any compact metric space, that (W
1(L
Xn, L
Yn))
n≥1obeys an LDP with good rate function
I
1(x) = inf
ν1,ν2∈M1(E)
W1 (ν1,ν2 )=x
H (ν
1|µ
1) + H(ν
2|µ
2) .
Their proof relies on the fact that, according to the Kantorovitch-Rubinstein Theorem (see e.g. Theorem 11.8.2 in [5]), when E is compact W
1generates the weak convergence topology. As a consequence (L
Xn)
n≥1and (L
Yn)
n≥1satisfy an LDP on M
1(E) endowed with W
1and again the contraction principle leads to an LDP for (W
1(L
Xn, L
Yn))
n≥1. Following the same approach, one can deduce from Theorem 1.1 in [21] that for every p ∈ [1, ∞) the sequence (W
p(L
Xn, L
Yn))
n≥1obeys an LDP with good rate function I
p(x) = inf
ν1,ν2∈M1(E)
Wp(ν1,ν2 )=x
H (ν
1|µ
1) + H(ν
2|µ
2) .
One might wonder why this idea is not applicable to the analysis of the LD properties of (W
∞(L
Xn, L
Yn))
n≥1. Actually an LDP for (L
Xn)
n≥1can not hold when M
1(E) is endowed with W
∞, at least at this level of generality. Indeed, consider the probability measure µ
1on E = [0, 1] which density with respect to the Lebesgue measure is g(x) = 2(1
[0,14]
(x)+1
[34,1]
(x)) and assume that (L
Xn)
n≥1satisfies an LDP on (M
1(E), W
∞) with some rate function R. Clearly for every odd integer n we have P(W
∞(L
Xn, µ
1) < 3/8) = 0 hence for every ν ∈ B(µ
1, 3/8) we should necessarily have R(ν) = ∞. So we would get
lim sup
n→∞
1
n log P(W
∞(L
Xn, µ
1) ≤ 1
4 ) = −∞ (5)
but for every even n we have P(W
∞(L
Xn, µ
1) ≤
14) ≥
n/2n2
−nwhich contradicts (5). Again this is due to the lack of connectedness of the support of µ
1and, in a sense, Theorem 1.1 is the only possible “Sanov-like” statement involving W
∞without any additional assumption.
Now let us give some facts about the zeros of J
∞. First notice that since 0 ≤ I
∞(W
∞(µ
1, µ
2)) ≤ H(µ
1|µ
1) + H(µ
2|µ
2) = 0
we necessarily have J
∞(W
∞(µ
1, µ
2)) = 0. Actually, due to the highly discon- tinuous nature of W
∞the zeros of J
∞need not be reduced to W
∞(µ
1, µ
2). We shall first prove the following
Lemma 1.4 Let x ∈ [0, 1] be such that J
∞(x) = 0. Necessarily x ≥ W
∞(µ
1, µ
2).
The zeros of J
∞are related to some particular elements of E
2as defined next Definition 1.2 We shall say that (a, b) ∈ E
2is a couple of directly connected points and write a ↔ b if and only if a ∈ S(µ
1), b ∈ S(µ
2) and there exists Q ∈ C
∞(µ
1, µ
2) such that for every integer N ≥ 2, every (α
2, β
2), . . . , (α
N, β
N) ∈ S(Q) and every σ ∈ S
Nwe have d(a, b) ≤ max
i=1,...,Nd(α
i, β
σ(i)) where
(α
1, β
1) = (a, b).
Notice that in this definition (a, b) need not be an element of S(Q). In words, a ↔ b if and only if there exists Q ∈ C
∞(µ
1, µ
2) such that {(a, b)} ∪ S(Q) is still an infinitely cyclically monotone subset of E
2. Consider the set
Z
µ1,µ2= {x ∈ [0, 1] : ∃(a, b) ∈ E
2such that a ↔ b and x = d(a, b)}
Proposition 1.1 J
∞(x) = 0 if and only if x ∈ Z
µ1,µ2.
The notion of directly connected points takes a simpler form when µ
1= µ
2. Proposition 1.2 If µ
1= µ
2= µ then a, b ∈ E are directly connected if and only if a, b ∈ S(µ) and for every integer N ≥ 3 and every sequence α
1, . . . , α
Nof elements of S(µ) such that α
1= a and α
N= b there is at least one integer L such that 1 ≤ L ≤ N − 1, and d(a, b) ≤ d(α
L, α
L+1).
In words, if µ
1= µ
2= µ then a ↔ b if and only if one can not decompose a journey from a to b into several stages between elements of S(µ) such that each of these stages has length strictly smaller than d(a, b). As an example consider again E = [0, 1] and µ
1= µ
2= µ with density f (x) =
321
[0,13]
(x) +
321
[2 3,1](x) w.r.t. the Lebesgue measure. Then
13↔
23hence J
∞(
13) = 0 as it is established through a direct computation in Section 3.1.
Finally we investigate the almost sure behavior of (W
∞(L
Xn, L
Yn))
n≥1as n → ∞ when µ
1= µ
2= µ. The elements of Z
µ, hence the asymptotic behavior of (W
∞(L
Xn, L
Yn))
n≥1, are directly related to the connectedness properties of S(µ). So let us write S(µ) = ∪
Di=1A
iwhere the A
iare the connected components of S (µ). Here D = 1 means that S (µ) is connected and we do not exclude the possibility that D = ∞. We have
Lemma 1.5 Z
µ= {0} if and only if S(µ) is connected.
For every x ∈ S(µ) let us denote by A
i(x)the connected component of S(µ) that contains x and for any two closed B, C ⊂ E we set
d(B, C) = inf
(x,y)∈B×C
d(x, y).
Lemma 1.6 For every a, b ∈ S(µ), if a ↔ b then d(a, b) = d(A
i(a), A
i(b)).
For any two connected components A
i, A
jof S(µ) we shall write A
i↔ A
jif and only if there exists a ∈ A
iand b ∈ A
jsuch that a ↔ b. It directly follows from Lemma 1.6 that
Z
µ=
d(A
i, A
j), (i, j) ∈ {1, . . . , D}
2such that A
i↔ A
j. In order to formulate our result we need the following
Lemma 1.7 If S(µ) is not connected every x ∈ Z
µsuch that x 6= 0 is an isolated point.
Hence, if S (µ) is not connected, we can order the elements of Z
µβ
1> β
2> · · · > β
n> · · · where β
1= max Z
µ, and for every integer i ≥ 2
β
i=
max Z
µ\ {β
1, . . . , β
i−1} if β
i−16= 0
0 if β
i−1= 0.
Proposition 1.3 1. If S(µ) is connected then W
∞(L
Xn, L
Yn) → 0 almost surely.
2. If S(µ) is not connected and
(a) there exists only two connected components A
i, A
jof S (µ) such that A
i↔ A
jand d(A
i, A
j) = β
1then
lim inf W
∞(L
Xn, L
Yn) = β
2and lim sup W
∞(L
Xn, L
Yn) = β
1(6) almost surely.
(b) there are more than two connected components A
i, A
jof S(µ) such that A
i↔ A
jand d(A
i, A
j) = β
1then W
∞(L
Xn, L
Yn) → β
1almost surely.
Hence, if E = [0, 1] and µ admits f (x) =
321
[0,13]
(x) +
321
[23,1]
(x) as a density w.r.t. the Lebesgue measure then lim sup W
∞(L
Xn, L
Yn) =
13almost surely and lim inf W
∞(L
Xn, L
Yn) = 0 almost surely.
The paper is structured as follows : Section 2 is devoted to the proof of Theorem 1.1. In Section 3 we discuss the properties of I
∞and J
∞. Section 4 is concerned with the almost sure asymptotic behavior of (W
∞(L
Xn, L
Yn))
n≥1while Section 5 deals with the proof of some complementary results.
2 Proof of Theorem 1.1
The basic idea in the proof of Theorem 1.1 is to partition E in such a way that we are essentially led to consider finite set valued X
i’s and Y
i’s. Indeed, on the one hand we will see that W
∞is well-behaved with respect to partitioning (see Lemma 2.1 below) and on the other hand proceeding this way reduces the computation of probabilities to simple classical combinatorial estimates. In order to go from the particular case, i.e. E finite, to the general one we shall need some results on the weak convergence of nets of probability measures. In Section 2.1 we give an account on partitions, nets and the weak convergence topology. The proof of all the statements there is postponed to Section 5. The LD lower bound is established in Section 2.2 while the LD upper bound is derived in Section 2.3.
2.1 Some facts about partitions of E, nets and the weak convergence topology
Let P be the set of finite measurable partitions of E into non-empty sets. To every Π = (A
1, . . . , A
L) ∈ P we associate once for all through Section 2 a family (s
1, . . . , s
L) ∈ E
Lsuch that for every 1 ≤ i ≤ L we have s
i∈ A
i. We further associate to every Π = (A
1, . . . , A
L) ∈ P a map π as follows
π : M
1(E) → M
1({s
1, . . . , s
L})
ν 7→ P
Li=1
ν(A
i)δ
si.
Finally for every Π = (A
1, . . . , A
L) ∈ P we define its maximal diameter as
∆(Π) = max
1≤i≤L
sup
x,y∈Ai
d(x, y).
The following result links the W
∞distance between elements of M
1(E) to the analogue W
∞distance between their contractions through π.
Lemma 2.1 For every ν
1, ν
2∈ M
1(E) and every Π ∈ P we have
|W
∞(ν
1, ν
2) − W
∞(π(ν
1), π(ν
2))| ≤ 2∆(Π).
Let Π = (A
1, . . . , A
L) and K = (B
1, . . . , B
R) be two elements of P. We say that K is a refinement of Π if and only if for every 1 ≤ i ≤ L, there exists J
i⊂ {1, . . . , R} such that A
i= ∪
j∈JiB
j, and we denote it by Π K. This makes (P , ) a directed set. Let us recall that (J, E ) is a directed set if and only if J is a non-empty set and E is a reflexive and transitive relation on J such that for every i, j ∈ J there exists a k ∈ J such that i E k and j E k. We introduce a general directed set (J, E ) since we will use both (P, ) and (N, ≤) as directed sets and we do not want to be too specific in the results below. We call net any map (P
j)
j∈Jdefined on a directed set (J, E ). A topological space (T, T )-valued net (P
j)
j∈Jis said to converge to some P ∈ T if and only if for every neighborhood U of P there exists a j(U) ∈ J such that for every k ∈ J satisfying j(U ) E k we have P
k∈ U . We shall denote this lim
j∈JP
j= P. We call subnet of (P
j)
j∈Jany sub-family (P
l)
l∈Lparametrized by a cofinal L, i.e.
a subset L ⊂ J such that for every j ∈ J there exists l ∈ L satisfying j E l.
(While this is not the most general definition of a subnet it will be sufficient for the problem considered here). Let us recall that a topological space (T, T ) is compact if and only if every net in T admits a subnet that converges to some point in T . Finally, for every real-valued net (P
j)
j∈Jwe shall consider as usual
lim sup
j∈J
P
j= inf
j∈J
sup
i:jEi
P
iand lim inf
j∈J
P
j= sup
j∈J
inf
i:jEi
P
i.
For this and other questions related to nets we refer to e.g. [12]. The following lemmas are a consequence of a kind of Portmanteau result for nets of probability measures that will be derived in Section 5.
Lemma 2.2 Every net (P
j)
j∈Jof probability measures supported on R that converges weakly to some probability measure P satisfies
lim sup
j∈J
sup S (P
j) ≥ sup S(P ). (7)
Lemma 2.3 Let (P
j)
j∈Jbe a net of Borel probability measures on a compact metric space (Y, δ) that converges weakly to some probability measure P . For every x ∈ S (P ) and every j ∈ J there exists an x
j∈ S (P
j) such that the net (x
j)
j∈Jconverges to x.
Lemma 2.4 Every P ∈ M
1(E
2) which is the limit in the weak convergence topology of a net (P
j)
j∈Jof infinitely cyclically monotone elements of M
1(E
2) is infinitely cyclically monotone.
2.2 LD lower bound
For every integer n ≥ 1 we consider M
1,n(E)=
(
ν ∈M
1(E) : there exists (x
1, . . . , x
n) ∈ E
nsuch that ν = 1 n
n
X
i=1
δ
xi)
.
The following lemma is the key point in the proof of the LD lower bound
Lemma 2.5 For every ε > 0, every Π = (A
1, . . . , A
L) ∈ P with ∆(Π) ≤ ε/2 and every ν
1, ν
2∈ M
1(E) there exists two sequences (ν
1,n)
n≥1and (ν
2,n)
n≥1such that
1. For every n ≥ 1 we have ν
1,n, ν
2,n∈ M
1,n(E).
2. For every 1 ≤ i ≤ L we have ν
1,n(A
i) → ν
1(A
i) and ν
2,n(A
i) → ν
2(A
i).
3. There exists an N
0such that for every n ≥ N
0we have
|W
∞(ν
1, ν
2) − W
∞(ν
1,n, ν
2,n)| ≤ ε.
Proving Lemma 2.5 actually requires one more lemma
Lemma 2.6 For every Q ∈ M
1(E
2) and every Π = (A
1, . . . , A
L) ∈ P there exists a sequence (Q
n=
n1P
ni=1
δ
(xni,yin)
)
n≥1of elements of M
1(E
2) such that 1. For every 1 ≤ u, v ≤ L we have Q
n(A
u× A
v) → Q(A
u× A
v).
2. The sequence (Q
n)
n≥1converges weakly to Q.
3. If Q(A
u× A
v) = 0 then for every n ≥ 1 we have Q
n(A
u× A
v) = 0.
Proof of Lemma 2.6 Let Q be an element of M
1(E
2) and Π = (A
1, . . . , A
L) ∈ P . Let (X
i, Y
i)
i≥1be a sequence of independent and Q−identically distributed random couples. According to the strong Law of Large Numbers there exists an event B of probability 1 such that for every 1 ≤ u, v ≤ L
1 n
n
X
i=1
1
Au×Av(X
i, Y
i) → Q(A
u× A
v)
on B. Moreover, since E
2is separable (it is compact), according to Varadara- jan’s Lemma, see e.g. Theorem 11.4.1 in [5], there exists an event C of proba- bility 1 such that
1 n
n
X
i=1
δ
(Xi,Yi)→
wQ
on C where →
wstands for the weak convergence of probability measures. Hence almost every realization of
n1P
ni=1
δ
(Xi,Yi)n≥1
can play the role of (Q
n)
n≥1and the conclusions of Lemma 2.6 follow.
Proof of Lemma 2.5 Let ν
1and ν
2be two elements of M
1(E), ε > 0 and Π ∈ P with ∆(Π) ≤ ε/2. According to Lemma 1.2 there is a Q ∈ C(ν
1, ν
2) such that
W
∞(ν
1, ν
2) = sup S(Q ◦ d
−1).
Let (Q
n)
n≥1be the sequence of elements of M
1(E
2) associated to Q and Π by Lemma 2.6. We shall prove that (ν
1,n= Q
n1)
n≥1an (ν
2,n= Q
n2)
n≥1meet the conditions of Lemma 2.5. First, (1) and (2) in Lemma 2.5 are clearly satisfied with these (ν
1,n)
n≥1and (ν
2,n)
n≥1. Now, due to the definition of W
∞, for every n ≥ 1 we have
W
∞(ν
1,n, ν
2,n) ≤ sup S(Q
n◦ d
−1)
and due to (1) and (3) in Lemma 2.6 there exists N
1such that for every n ≥ N
1we have
sup S(Q
n◦ d
−1) ≤ sup S (Q ◦ d
−1) + ε
hence W
∞(ν
1,n, ν
2,n) ≤ W
∞(ν
1, ν
2) +ε. So we are left to prove that there exists an N
2such that for every n ≥ N
2we have W
∞(ν
1, ν
2) − ε ≤ W
∞(ν
1,n, ν
2,n).
Let us assume that this is not true i.e. that there exists a sequence (n
k)
k≥1such that for every k ≥ 1 we have
W
∞(ν
1, ν
2) − ε > W
∞(ν
1,nk, ν
2,nk). (8) According to Lemma 1.1 for every k ≥ 1 there exists a C
nk∈ C(ν
1,nk, ν
2,nk) such that
W
∞(ν
1,nk, ν
2,nk) = sup S(C
nk◦ d
−1).
The sequence (C
nk)
k≥1admits a weakly converging subsequence since M
1(E
2) is compact for the weak convergence topology, see Theorem 6.4 in [14]. By a slight abuse of notation we still denote (C
nk)
k≥1this converging subsequence.
Let C be its limit. Due to (2) in Lemma 2.6 we necessarily have C
1= ν
1and C
2= ν
2. Moreover (C
nk◦ d
−1)
k≥1is a sequence of probability measures on R that weakly converges to C ◦ d
−1. Combining (8) with Lemma 2.2 we have
W
∞(ν
1, ν
2) − ε ≥ lim sup
k→∞
W
∞(ν
1,nk, ν
2,nk)
= lim sup
k→∞
sup S (C
nk◦ d
−1)
≥ sup S(C ◦ d
−1)
≥ W
∞(ν
1, ν
2)
which can not be. The announced result follows.
Proof of the LD lower bound. As usual, in order to prove the LD lower bound it is sufficient to prove that for every x ∈ [0, 1] and every ε > 0 we have
lim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]x − ε, x + ε[) ≥ −J
∞(x). (9) In particular we can assume that J
∞(x) < ∞ for otherwise (9) trivially holds.
If J
∞(x) < ∞ then for every m ≥ 1 there exists an y
msuch that |x − y
m| <
1/m, I
∞(y
m) < ∞ and lim
m→∞I
∞(y
m) = J
∞(x). Now let us assume that for every m and every ν
1, ν
2∈ M
1(E) such that W
∞(ν
1, ν
2) = y
mwe have
lim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]y
m− ε/4, y
m+ ε/4[) ≥ −H (ν
1|µ
1) − H(ν
2|µ
2).
(10) It follows that for every m large enough and every ν
1, ν
2∈ M
1(E) such that W
∞(ν
1, ν
2) = y
mlim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]x − ε, x + ε[) ≥
≥ lim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]y
m− ε/4, y
m+ ε/4[)
≥ −H (ν
1|µ
1) − H(ν
2|µ
2)
hence lim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]x − ε, x + ε[) ≥
≥ sup
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=ym
−H(ν
1|µ
1) − H (ν
2|µ
2)
≥ − inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=ym
H (ν
1|µ
1) + H(ν
2|µ
2)
≥ −I
∞(y
m)
which leads to (9) by letting m → ∞. Hence it is sufficient to establish (10) and this is what we do now.
Let ε > 0, y
m∈ [0, 1] be such that I
∞(y
m) < ∞ and ν
1, ν
2∈ M
1(E) be such that W
∞(ν
1, ν
2) = y
m. According to Lemma 2.5 to any Π = (A
1, . . . , A
L) ∈ P with ∆(Π) ≤ ε/32 we can associate two sequences (ν
1,n)
n≥1and (ν
2,n)
n≥1and an integer N
0such that for every n ≥ 1 we have ν
1,n, ν
2,n∈ M
1,n(E) and for every n ≥ N
0one has
P(W
∞(L
Xn, L
Yn) ∈]y
m− ε/4, y
m+ ε/4[) =
= P(|W
∞(L
Xn, L
Yn) − W
∞(ν
1, ν
2)| < ε/4)
≥ P(|W
∞(L
Xn, L
Yn) − W
∞(ν
1,n, ν
2,n)| ≤ ε/8).
We obviously have
|W
∞(L
Xn, L
Yn) − W
∞(ν
1,n, ν
2,n)| ≤ |W
∞(L
Xn, L
Yn) − W
∞(π(L
Xn), π(L
Yn))|+
+|W
∞(π(L
Xn), π(L
Yn)) − W
∞(π(ν
1,n), π(ν
2,n))|+
+|W
∞(π(ν
1,n), π(ν
2,n)) − W
∞(ν
1,n, ν
2,n)|
so due to Lemma 2.1 we get
{π(L
Xn) = π(ν
1,n)} ∩ {π(L
Yn) = π(ν
2,n)} ⊂
⊂ {|W
∞(L
Xn, L
Yn) − W
∞(ν
1,n, ν
2,n)| ≤ ε/8}
hence
P(|W
∞(L
Xn, L
Yn) − W
∞(ν
1,n, ν
2,n)| ≤ ε/8) ≥
≥ P({π(L
Xn) = π(ν
1,n)} ∩ {π(L
Yn) = π(ν
2,n)})
= P({π(L
Xn) = π(ν
1,n)})P({π(L
Yn) = π(ν
2,n)}) since the sequences (X
n)
n≥1and (Y
n)
n≥1are independent. It follows from elementary combinatorics (see e.g. Lemma 2.1.9 in [4]) that
P({π(L
Xn) = π(ν
1,n)}) ≥ (n + 1)
−Lexp
−nH(π(ν1,n)|π(µ1))and the analogue for P({π(L
Yn) = π(ν
2,n)}) also holds true. As a consequence lim inf
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈]y
m− ε/4, y
m+ ε/4[) ≥
≥ − lim sup
n→∞
(H(π(ν
1,n)|π(µ
1)) + H (π(ν
2,n)|π(µ
2)))
= −H (π(ν
1)|π(µ
1)) − H(π(ν
2)|π(µ
2)) (11)
≥ −H (ν
1|µ
1) − H (ν
2|µ
2) (12)
where (11) is due to (2) in Lemma 2.5 and (12) comes from the fact that
H (π(ν
1)|π(µ
1)) ≤ H(ν
1|µ
1) for every Π ∈ P, see e.g. Theorem 1.4.3 in [6].
2.3 LD upper bound
We first establish the LD upper bound under the assumption that E is finite.
Then we extend this result to the general case.
2.3.1 A particular case
Here we assume that E = {a
1, . . . , a
N}. Since for every n ≥ 1 we have W
∞(L
Xn, L
Yn) ∈ Γ = {d(a
i, a
j), 1 ≤ i, j ≤ N} which is a finite set it is suffi- cient in order to establish the LD upper bound to show that for every δ ∈ Γ we have
lim sup
n→∞
1
n log P(W
∞(L
Xn, L
Yn) = δ) ≤ − inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=δ
H(ν
1|µ
1) + H (ν
2|µ
2) . (13) For every n ≥ 1 we get
P(W
∞(L
Xn, L
Yn) = δ) (14)
= X
ν1,n,ν2,n∈M1,n(E)
W∞(ν1,n ,ν2,n)=δ
P(L
Xn= ν
1,n, L
Yn= ν
2,n)
≤ X
ν1,n,ν2,n∈M1,n(E)
W∞(ν1,n ,ν2,n)=δ
exp −nH (ν
1,n|µ
1) − nH(ν
2,n|µ
2)
(15)
≤ (n + 1)
2Nexp
−n inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=δ
H(ν
1|µ
1) + H (ν
2|µ
2)
see e.g. Lemma 2.1.9 in [4] for the elementary combinatorial estimate (15). The announced (13) follows.
2.3.2 The general case
Let us introduce some more notations. For every Π ∈ P we consider I
∞Π(x) = inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )=x
H (π(ν
1)|π(µ
1)) + H (π(ν
2)|π(µ
2))
and we shall denote by J
∞Πthe lower semi-continuous regularization of I
∞Π. Since [0, 1] is compact the sequence (W
∞(L
Xn, L
Yn))
n≥1is naturally exponentially tight so it is sufficient in order to establish the LD upper bound to consider events of the form {W
∞(L
Xn, L
Yn) ∈ [a, b]}, see Lemma 1.2.18 in [4]. According to Lemma 2.1 for every a, b ∈ [0, 1], a ≤ b, and every Π ∈ P we have for every n ≥ 1 that
P(W
∞(L
Xn, L
Yn) ∈ [a, b]) ≤ P(W
∞(π(L
Xn), π(L
Yn)) ∈ [a − 2∆(Π), b + 2∆(Π)]).
Hence, due to the computation carried out assuming that E is finite we get for
every Π ∈ P
lim sup
n→∞
1
n log P(W
∞(L
Xn, L
Yn) ∈ [a, b]) ≤
≤ − inf
ν1,ν2∈M1({s1,...,sL})
W∞(ν1,ν2 )∈[a−2∆(Π),b+2∆(Π)]
H (ν
1|π(µ
1)) + H (ν
2|π(µ
2))
≤ − inf
ν1,ν2∈M1(E)
W∞(ν1,ν2 )∈[a−2∆(Π),b+2∆(Π)]
H (π(ν
1)|π(µ
1)) + H (π(ν
2)|π(µ
2))
≤ − inf
x∈[a−2∆(Π),b+2∆(Π)]
I
∞Π(x)
≤ − inf
x∈[a−2∆(Π),b+2∆(Π)]
J
∞Π(x).
Hence, to conclude the proof of the LD upper bound we are left to prove that sup
Π∈P
inf
x∈[a−2∆(Π),b+2∆(Π)]
J
∞Π(x) ≥ inf
x∈[a,b]
J
∞(x). (16) Assume for a while the following
Lemma 2.7 For every x ∈ [0, 1] we have sup
ΠJ
∞Π(x) = J
∞(x).
Assume also that (16) does not hold : There exists a δ > 0 such that for every Π ∈ P
inf
x∈[a−2∆(Π),b+2∆(Π)]
J
∞Π(x) < inf
x∈[a,b]
J
∞(x) − δ.
Since for every Π ∈ P the map x 7→ J
∞Π(x) is lower semi-continuous there exists a net (x
Π)
Π∈Psuch that x
Π∈ [a − 2∆(Π), b + 2∆(Π)] and
J
∞Π(x
Π) − δ/2 < inf
x∈[a,b]
J
∞(x) − δ.
Due to the log sum inequality, see e.g. Theorem 2.7.1 in [2], for every K, K
0∈ P such that K K
0and every x ∈ [0, 1] we have I
∞K(x) ≤ I
∞K0(x) hence J
∞K(x) ≤ J
∞K0(x) so for every Π
0∈ P such that Π
0Π we have
J
∞Π0(x
Π) − δ/2 < inf
x∈[a,b]
J
∞(x) − δ.
Since (x
Π)
Π∈Pis a [0, 1]-valued net it admits a converging subnet which limit we denote x
∗∈ [a, b]. Due to the fact that J
∞Π0is lower semi-continuous we get
lim inf
Π∈P
J
∞Π0(x
Π) ≥ J
∞Π0(x
∗) hence
J
∞Π0(x
∗) < inf
x∈[a,b]
J
∞(x) − δ/2
which, according to Lemma 2.7, implies J
∞(x
∗) < inf
x∈[a,b]J
∞(x) − δ/2. Since the latter can not be (16) holds and the LD upper bound follows.
Proof of Lemma 2.7 According to Theorem 1.4.3 in [6] for every x ∈ [0, 1]
and every Π ∈ P I
∞Π(x) ≤ I
∞(x) hence J
∞Π(x) ≤ J
∞(x) whence sup
ΠJ
∞Π(x) ≤
J
∞(x). Conversely let x be a fixed element of [0, 1]. We shall assume that
sup
ΠJ
∞Π(x) < ∞ for otherwise the claimed equality trivially holds. Hence for every Π ∈ P and every δ > 0 there exists y
δ,Πsuch that |x − y
δ,Π| < δ, lim
δ→0I
∞Π(y
δ,Π) = J
∞Π(x) and
I
∞Π(y
δ,Π) − δ/2 < inf
y∈B(x,δ)
I
∞Π(y). (17)
In particular we can assume that I
∞Π(y
δ,Π) < ∞ thus there exists ν
1Π,δ, ν
2Π,δ∈ M
1(E) such that W
∞(ν
1Π,δ, ν
2Π,δ) = y
δ,Πand
H (π(ν
1Π,δ)|π(µ
1)) + H (π(ν
2Π,δ)|π(µ
2)) − δ ≤ I
∞Π(y
δ,Π) − δ/2. (18) Next we define ρ
Π,δ1, ρ
Π,δ2∈ M
1(E) by
ρ
Π,δ1(F ) = X
Ai∈Π
π(ν
1Π,δ)(A
i)
µ
1(A
i) µ
1(F ∩ A
i) and accordingly
ρ
Π,δ2(F ) = X
Ai∈Π
π(ν
2Π,δ)(A
i)
µ
2(A
i) µ
2(F ∩ A
i)
for every borelian F ⊂ E. Clearly π(ν
1Π,δ) = π(ρ
Π,δ1) and π(ν
2Π,δ) = π(ρ
Π,δ2) hence
|W
∞(ρ
Π,δ1, ρ
Π,δ2) − W
∞(ν
1Π,δ, ν
2Π,δ)|
≤ |W
∞(ρ
Π,δ1, ρ
Π,δ2) − W
∞(π(ρ
Π,δ1), π(ρ
Π,δ2))| +
+|W
∞(π(ρ
Π,δ1), π(ρ
Π,δ2)) − W
∞(π(ν
1Π,δ), π(ν
2Π,δ))| + +|W
∞(π(ν
1Π,δ), π(ν
2Π,δ)) − W
∞(ν
1Π,δ, ν
2Π,δ)|
≤ 4∆(Π).
According to Theorem 1.4.3 in [6] and Theorem 2.7.1 in [2] we have H (ρ
Π,δ1|µ
1) = sup
K∈P
X
Bj∈K
ρ
Π,δ1(B
j) log ρ
Π,δ1(B
j) µ
1(B
j)
= sup
K∈P
KΠ
X
Ai∈Π
X
Bj∈K
Bj⊂Ai
ρ
Π,δ1(B
j) log ρ
Π,δ1(B
j) µ
1(B
j)
= sup
K∈P
KΠ
X
Ai∈Π
X
Bj∈K
Bj⊂Ai
π(ν
1Π,δ)(A
i)
µ
1(A
i) µ
1(B
j) log π(ν
1Π,δ)(A
i) µ
1(A
i)
= sup
K∈P
KΠ
X
Ai∈Π
π(ν
1Π,δ)(A
i) log π(ν
1Π,δ)(A
i) µ
1(A
i)
= H(π(ν
1Π,δ)|π(µ
1))
and H (ρ
Π,δ2|µ
2) = H (π(ν
2Π,δ)|π(µ
2)) as well, hence
H (ρ
Π,δ1|µ
1) + H (ρ
Π,δ2|µ
2) = H (π(ν
1Π,δ)|π(µ
1)) + H (π(ν
2Π,δ)|π(µ
2)).
Thus it follows from the preceding, (17) and (18) that for every δ > 0 and every Π ∈ P we have
inf
u∈B(yδ,Π,4∆(Π))
I
∞(u) − δ ≤ inf
y∈B(x,δ)
I
∞Π(y) hence
y∈B(x,δ+4∆(Π))
inf I
∞(y) − δ ≤ inf
y∈B(x,δ)
I
∞Π(y) whence
y∈B(x,δ+4∆(Π))
inf I
∞(y) − δ ≤ sup
Π∈P
sup
δ>0
y∈B(x,δ)
inf I
∞Π(y)
= sup
Π
J
∞Π(x).
But for every δ > 0 we have sup
Π∈P
inf
y∈B(x,δ+4∆(Π))
I
∞(y) = sup
Π∈P
∆(Π)<δ
inf
y∈B(x,δ+4∆(Π))
I
∞(y) hence
inf
y∈B(x,6δ)
I
∞(y) ≤ sup
Π∈P
J
∞Π(x)
whence sup
ΠJ
∞Π(x) ≥ J
∞(x).
3 On some properties of I ∞ and J ∞ .
3.1 I
∞can fail to be lower semi-continuous
All through Section 3.1 we consider that E = [0, 1] is endowed with the usual Euclidean distance and that µ
1= µ
2= µ admits f (x) =
321
[0,13]
(x) +
321
[2 3,1](x) as a density w.r.t. the Lebesgue measure. That in this setting I
∞is not lower semi-continuous is an immediate consequence of the following two lemmas.
Lemma 3.1 Let ν
1and ν
2be two elements of M
1(E) such that W
∞(ν
1, ν
2) =
1
3
, S (ν
1) ⊂ S(µ) and S(ν
2) ⊂ S(µ). Necessarily at least one of ν
1or ν
2has an atom.
Lemma 3.2 The sequence (ρ
n)
n≥4of elements of M
1(E) defined by the densi- ties
g
n(x) = 3 2 1
[0,13−1n]
(x) + 3
2 − 1 n
1
[13−n1,13]
(x)+
+ 3
2 + 1 n
1
[23,23+n1]
(x) + 3 2 1
[23+1n,1]