IVAR EKELAND AND WALTER SCHACHERMAYER
Abstract. A classical theorem due to R. Phelps states that if $C$ is a weakly compact set in a Banach space $E$, the strongly exposing functionals form a dense subset of the dual space $E'$. In this paper, we look at the concrete situation where $C \subset L^1(\mathbb{R}^d)$ is the closed convex hull of the set of random variables $Y \in L^1(\mathbb{R}^d)$ having a given law $\nu$. Using the theory of optimal transport, we show that every random variable $X \in L^\infty(\mathbb{R}^d)$, the law of which is absolutely continuous with respect to Lebesgue measure, strongly exposes the set $C$. Of course these random variables are dense in $L^\infty(\mathbb{R}^d)$.
1. Introduction
Throughout this paper we deal with a fixed probability space $(\Omega, \mathcal{F}, P)$. It will be assumed that $(\Omega, \mathcal{F}, P)$ has no atoms. The space of $d$-dimensional random vectors will be denoted by $L^0(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, and the space of $p$-integrable ones by $L^p(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, shortened to $L^0$ and $L^p$ if there is no ambiguity. The law $\mu_X$ of a random vector $X$ is the probability on $\mathbb{R}^d$ defined by:
$$\forall f \in C_b(\mathbb{R}^d), \quad \int_\Omega f(X(\omega))\, dP = \int_{\mathbb{R}^d} f(x)\, d\mu_X$$
where $C_b(\mathbb{R}^d)$ is the space of continuous and bounded functions on $\mathbb{R}^d$. The last term is, as usual, denoted by $E_{\mu_X}[f]$. Clearly, $X \in L^p(\mathbb{R}^d)$ iff $E_{\mu_X}[|x|^p] < \infty$. Our aim is to prove the following result:
Theorem 1. Let $X \in L^1(\mathbb{R}^d)$ be given, and let $C \subset L^1(\mathbb{R}^d)$ be the closed convex hull of all random variables $Y$ such that $\mu_Y = \mu_X$. Take any $Z \in L^\infty(\mathbb{R}^d)$ the law of which is absolutely continuous with respect to Lebesgue measure. Then there exists a unique $\bar X \in C$ where $Z$ attains its maximum on $C$. The law of $\bar X$ is $\mu_X$, and for every sequence $X_n \in C$ such that
$$\langle Z, X_n\rangle \to \langle Z, \bar X\rangle$$
we have $\|X_n - \bar X\|_1 \to 0$.
This will be proved as Theorem 17 at the end of this paper. In addition, Theorem 18 will provide a converse.
2. Preliminaries
2.1. Law-invariant subsets and functions. We shall write $X_1 \sim X_2$ to mean that $X_1$ and $X_2$ have the same law. This is an equivalence relation on the space of random vectors. A set $C \subset L^0$ will be called law-invariant if:
$$[X_1 \in C \text{ and } X_1 \sim X_2] \Longrightarrow X_2 \in C,$$
and a function $\varphi : L^0 \to \mathbb{R}$ is law-invariant if $\varphi(X_1) = \varphi(X_2)$ whenever $X_1 \sim X_2$.
Date: May 2012.
Given $\mu \in \mathcal{P}(\mathbb{R}^d)$, we shall denote by $M(\mu)$ the equivalence class consisting of all $X$ with law $\mu$:
$$M(\mu) := \{X \mid \mu_X = \mu\}$$
The set $M(\mu)$ is not convex in general.
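Although not part of the paper's formal development, the failure of convexity is easy to see on a four-point probability space. The following sketch (Python, with hypothetical coin-flip variables) exhibits two random variables with the same law whose average has a different law:

```python
import itertools

# Probability space: four equally likely outcomes (two independent fair coins).
omega = list(itertools.product([-1, 1], repeat=2))  # each with P = 1/4

X1 = {w: w[0] for w in omega}  # law: uniform on {-1, 1}
X2 = {w: w[1] for w in omega}  # same law as X1

def law(X):
    """Return the distribution of X as a dict value -> probability."""
    d = {}
    for w in omega:
        d[X[w]] = d.get(X[w], 0.0) + 0.25
    return d

Y = {w: 0.5 * (X1[w] + X2[w]) for w in omega}  # convex combination of X1, X2

print(law(X1) == law(X2))  # True: X1 ~ X2, so both lie in M(mu)
print(law(Y))              # {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25} -- a different law
```

So the midpoint of two elements of $M(\mu)$ leaves $M(\mu)$: the set is law-invariant but not convex.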
Lemma 2. If $\mu$ has finite $p$-moment, $\int |x|^p\, d\mu < \infty$, for $1 \le p \le \infty$, then the set $M(\mu)$ is closed in the $L^p$-norm.

Proof. If $X_n \in M(\mu)$ and $\|X_n - X\|_p \to 0$, then we can extract a subsequence which converges almost everywhere. If $f \in C_b(\mathbb{R}^d)$, applying Lebesgue's dominated convergence theorem, we have $\int f(X)\, dP = \lim_n \int f(X_n)\, dP$. But the right-hand side is equal to $\int f(x)\, d\mu$ for every $n$. $\square$
We shall say that $\sigma : \Omega \to \Omega$ is a measure-preserving transformation if it is a bijection, $\sigma$ and $\sigma^{-1}$ are measurable, and $P(\sigma^{-1}(A)) = P(A) = P(\sigma(A))$ for all $A \in \mathcal{F}$. The set $\mathcal{T}$ of all measure-preserving transformations is a group which operates on random vectors and preserves the law:
$$\forall \sigma \in \mathcal{T},\ \forall X \in L^0,\quad X \sim X \circ \sigma.$$
The converse is not true: given two variables $X_1$ and $X_2$ with $X_1 \sim X_2$, there may be no $\sigma \in \mathcal{T}$ such that $X_1 = X_2 \circ \sigma$. However, it comes close. By Lemma A.4 from [2], we have:

Proposition 3. Let $C$ be a norm-closed subset of $L^p(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, $1 \le p \le \infty$. Then $C$ is law-invariant if and only if it is transformation-invariant. As a consequence:
$$\forall X \in M(\mu),\quad M(\mu) = \overline{\{X \circ \sigma \mid \sigma \in \mathcal{T}\}},$$
the closure being taken in the $L^p$-norm.
2.2. Choquet ordering of probability laws. Denote by $\mathcal{P}(\mathbb{R}^d)$ the space of probability laws on $\mathbb{R}^d$, and endow it with the weak* topology induced by $C_0(\mathbb{R}^d)$, the space of continuous functions on $\mathbb{R}^d$ which go to zero at infinity. It is known that there is a complete metric on $\mathcal{P}(\mathbb{R}^d)$ which is compatible with this topology:
$$[\mu_n \to \mu \text{ weak*}] \iff \left[\forall f \in C_0(\mathbb{R}^d),\ \int f\, d\mu_n \to \int f\, d\mu\right]$$
Denote by $\mathcal{P}_1(\mathbb{R}^d)$ the set of probability laws on $\mathbb{R}^d$ which have finite first moment:
$$(2.1)\qquad \mu \in \mathcal{P}_1(\mathbb{R}^d) \iff \int_{\mathbb{R}^d} |x|\, d\mu < \infty$$
Note that $\mathcal{P}_1(\mathbb{R}^d)$ is convex, but not closed in $\mathcal{P}(\mathbb{R}^d)$. If $\mu \in \mathcal{P}_1(\mathbb{R}^d)$, every linear function $f(x)$ is $\mu$-integrable. The point
$$\bar x := \int_{\mathbb{R}^d} y\, d\mu(y)$$
will be called the barycenter of the probability $\mu$.
Since every convex function on $\mathbb{R}^d$ is bounded below by an affine function, we find that $E_\mu[f]$ is well-defined (possibly $+\infty$) for every real convex function $f$. So the following definition makes sense:
Definition 4. For $\mu$ and $\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$, we shall say that $\mu \preccurlyeq \nu$ if, for every convex function $f : \mathbb{R}^d \to \mathbb{R}$, we have:
$$\int_{\mathbb{R}^d} f(x)\, d\mu \le \int_{\mathbb{R}^d} f(x)\, d\nu$$
For technical reasons, in order to avoid infinities, we shall introduce an equivalent definition. Denote by $\mathcal{C}$ the set of convex functions $f : \mathbb{R}^d \to \mathbb{R}$ which are the pointwise supremum of finitely many affine functions, i.e. $f(x) = \max_{i \in I}\{\langle y_i, x\rangle - a_i\}$ for some finite family $(y_i, a_i) \in \mathbb{R}^d \times \mathbb{R}$. Because of (2.1), if $f \in \mathcal{C}$ and $\mu \in \mathcal{P}_1(\mathbb{R}^d)$, then $\int f(x)\, d\mu < \infty$.
Clearly, for any convex function $g$, there is an increasing sequence $f_n \in \mathcal{C}$ such that $g = \sup_n f_n$.
Lemma 5. For $\mu$ and $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, we have $\mu \preccurlyeq \nu$ iff:
$$(2.2)\qquad \forall f \in \mathcal{C},\quad \int f(x)\, d\mu \le \int f(x)\, d\nu$$
Proof. For any convex $g$ we have, by the preceding remark, $g = \sup_m f_m$ for some increasing sequence $f_m \in \mathcal{C}$. The inequality holds for each $f_m$, and we conclude by Lebesgue's monotone convergence theorem. $\square$

We note the following for future use:
Lemma 6. Suppose we have an equi-integrable sequence $X_n$ in $L^1(\mathbb{R}^d)$ such that their laws $\mu_n$ converge weak* to $\mu$. Then:
$$\forall f \in \mathcal{C},\quad \int f(x)\, d\mu_n \to \int f(x)\, d\mu$$
Proof. If $f \in \mathcal{C}$, it must have linear growth at infinity: there are constants $m$ and $M$ such that $|f(x)| \le m + M|x|$. Let $\varphi \in C_0(\mathbb{R}^d)$ be such that $\varphi(x) = 1$ for $|x| \le 1$ and $\varphi(x) = 0$ for $|x| \ge 2$, with $\varphi(x) \ge 0$ everywhere. For any $\varepsilon > 0$, by the equi-integrability property, we can find $R$ so large that, for all $n$,
$$\left| \int f(x)\, \varphi\!\left(\frac{x}{R}\right) d\mu_n - \int f(x)\, d\mu_n \right| \le \varepsilon.$$
Since $\mu_n$ converges weak* to $\mu$, the first term converges to $\int f(x)\, \varphi\!\left(\frac{x}{R}\right) d\mu$. Letting $R \to \infty$, we find the desired result. $\square$
Relation (2.2) defines an (incomplete) order relation on the set of probability measures with finite first moment. It is known in potential theory as the Choquet ordering (see [5], chapter XI.2). Note that if $f$ is linear, both $f$ and $-f$ are convex, so that, if $\mu \preccurlyeq \nu$, then:
$$\int_{\mathbb{R}^d} f(x)\, d\mu = \int_{\mathbb{R}^d} f(x)\, d\nu$$
In particular, if $\mu \preccurlyeq \nu$, then $\mu$ and $\nu$ have the same barycenter.
Informally speaking, $\mu \preccurlyeq \nu$ means that $\mu$ and $\nu$ have the same barycenter, but $\nu$ is more spread out than $\mu$. In potential theory, this is traditionally expressed by saying that "$\nu$ est une balayée de $\mu$", that is, "$\nu$ is swept out from $\mu$". The following elementary properties illustrate this basic intuition:
(1) (certainty equivalence) If $x_0 = E_\nu[x]$ ($x_0$ is the barycenter of $\nu$) and $\delta_{x_0}$ is the Dirac mass at $x_0$, then $\delta_{x_0} \preccurlyeq \nu$: for convex $f$, Jensen's inequality gives $f(x_0) \le \int f\, d\nu$.
(2) (diversification) If $X_1 \sim X_2$ have law $\nu$, and $Y = \frac{1}{2}(X_1 + X_2)$ has law $\mu$, then $\mu \preccurlyeq \nu$. Indeed, if $f$ is convex:
$$\int_{\mathbb{R}^d} f(x)\, d\mu = \int f(Y)\, dP \le \frac{1}{2}\int f(X_1)\, dP + \frac{1}{2}\int f(X_2)\, dP = \left(\frac{1}{2} + \frac{1}{2}\right)\int_{\mathbb{R}^d} f(x)\, d\nu = \int_{\mathbb{R}^d} f(x)\, d\nu$$
Lemma 7. Let $\nu \in \mathcal{P}_1(\mathbb{R}^d)$ and let $I[\nu]$ be the Choquet order interval of $\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$:
$$I[\nu] = \{\mu \in \mathcal{P}_1(\mathbb{R}^d) : \mu \preccurlyeq \nu\}.$$
Then $I[\nu]$ is a compact subset of $\mathcal{P}_1(\mathbb{R}^d)$ with respect to the weak-star topology induced by $C_0(\mathbb{R}^d)$.

Proof. As the weak* topology on $\mathcal{P}_1(\mathbb{R}^d)$ is metrisable, it will suffice to show that every sequence $(\mu_n)_{n=1}^\infty$ in $I[\nu]$ has a cluster point.
The relation $\mu_n \preccurlyeq \nu$ implies in particular that the first moments of the $\mu_n$ are bounded by the first moment of $\nu$. This in turn implies that Prokhorov's condition is satisfied, i.e. for $\varepsilon > 0$ there is a compact $K \subset \mathbb{R}^d$ such that $\mu_n(K) \ge 1 - \varepsilon$ for all $n \in \mathbb{N}$.
By Prokhorov's theorem we may find a subsequence, still denoted by $(\mu_n)_{n=1}^\infty$, converging weak* to a probability measure $\mu \in \mathcal{P}(\mathbb{R}^d)$. To show that $\mu \in I[\nu]$, let $f : \mathbb{R}^d \to \mathbb{R}$ be convex. By the weak* lower semi-continuity of the function $\mu \mapsto \langle f, \mu\rangle$ on $\mathcal{P}(\mathbb{R}^d)$, we obtain
$$\langle f, \mu\rangle \le \liminf_{n\to\infty} \langle f, \mu_n\rangle \le \langle f, \nu\rangle. \qquad\square$$
The relationship with weak convergence in $L^1$ is given by the next result. To motivate it, consider a sequence of i.i.d. random variables $X_n$ such that $P[X_n = 1] = 1/2 = P[X_n = -1]$. Then $X_n \to 0$ weakly, and the law of the limit is $\delta_0$; but all the $X_n$ have the law $\frac12\delta_{-1} + \frac12\delta_1$. Clearly $\delta_0 \preccurlyeq \frac12\delta_{-1} + \frac12\delta_1$.
Proposition 8. Suppose $X_n$ is a sequence in $L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$ converging weakly to $Y$. Denote by $\mu_n$ the law of $X_n$ and by $\mu$ the law of $Y$. Suppose $\mu_n$ converges weak* to some $\nu \in \mathcal{P}_1(\mathbb{R}^d)$. Then $\mu \preccurlyeq \nu$, with $\mu = \nu$ if and only if $\|X_n - Y\|_1 \to 0$.
Proof. First note that the barycenter of $\nu$ is $E[Y]$. Indeed, if $f \in \mathcal{C}$, we have, by Jensen's inequality:
$$\int f(x)\, d\mu_n = \int f(X_n)\, dP \ge f(E[X_n])$$
By Lemma 6 and the equi-integrability of the $X_n$, the left-hand side converges to $\int f(x)\, d\nu$ while the right-hand side converges to $f(E[Y])$.
Now consider a finite $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$. Denote by $\mathcal{A}$ the collection of atoms of $\mathcal{G}$. We have:
$$\int f(x)\, d\mu_n = \int E[f(X_n) \mid \mathcal{G}]\, dP \ge \int f(E[X_n \mid \mathcal{G}])\, dP$$
and by the same method we show that:
$$\int f(x)\, d\nu \ge \sum_{A \in \mathcal{A}} P[A]\, f\!\left(\frac{E[Y \mathbf{1}_A]}{P[A]}\right) = \int f(E[Y \mid \mathcal{G}])\, dP.$$
Now let $(\mathcal{G}_k)$, $k \in \mathbb{N}$, be a sequence of finite sub-$\sigma$-algebras of $\mathcal{F}$ such that $Y$ is measurable w.r.t. $\sigma\left(\bigcup_k \mathcal{G}_k\right)$. Denoting by $\mu_k$ the law of $E[Y \mid \mathcal{G}_k]$, we have by the above argument:
$$\mu_k \preccurlyeq \nu \quad\text{for all } k,$$
and hence $\mu \preccurlyeq \nu$ by taking the limit when $k \to \infty$.
Turning to the final assertion, it follows from Lebesgue's dominated convergence theorem that, if $X_n$ converges to $Y$ in the $L^1$ norm, then the law $\mu_n$ of $X_n$ converges to the law $\mu$ of $Y$ weak* in $\mathcal{P}_1(\mathbb{R}^d)$.
Conversely suppose that $(X_n)_{n=1}^\infty$ converges to $Y$ weakly in $L^1(\mathbb{R}^d)$ and $\nu = \mu$. We claim that for every $A \in \mathcal{F}$ and every function $f \in \mathcal{C}$ we then have
$$(2.3)\qquad \lim_{n\to\infty} E[f(X_n)\mathbf{1}_A] = E[f(Y)\mathbf{1}_A]$$
Indeed, by Jensen's inequality, we have:
$$\liminf_{n\to\infty} E[f(X_n)\mathbf{1}_A] \ge E[f(Y)\mathbf{1}_A], \qquad \liminf_{n\to\infty} E[f(X_n)\mathbf{1}_{\Omega\setminus A}] \ge E[f(Y)\mathbf{1}_{\Omega\setminus A}].$$
On the other hand, by Lemma 6, we have:
$$\lim_{n\to\infty} E[f(X_n)] = \lim_{n\to\infty} \langle f, \mu_n\rangle = \langle f, \nu\rangle = \langle f, \mu\rangle = E[f(Y)].$$
So equality must hold in (2.3), as announced.
Now suppose that $(X_n)_{n=1}^\infty$ fails to converge to $Y$ in the norm of $L^1(\mathbb{R}^d)$, i.e., there is $1 > \alpha > 0$ such that
$$P[|X_n - Y| \ge \alpha] \ge \alpha,$$
for infinitely many $n$. By approximating $Y$ by step functions we may find a set $A \in \mathcal{F}$ with $P[A] > 0$, and a point $y_0 \in \mathbb{R}^d$ such that $|Y - y_0| < \frac{\alpha^2}{5}$ on $A$ and $P[A \cap \{|X_n - y_0| \ge \frac{\alpha}{2}\}] \ge \frac{\alpha}{2} P[A]$. We then have $E[|Y - y_0|\mathbf{1}_A] \le \frac{\alpha^2}{5} P[A]$ while $E[|X_n - y_0|\mathbf{1}_A] \ge \frac{\alpha^2}{4} P[A]$, a contradiction to (2.3). $\square$
The Choquet ordering can be completely characterized in terms of Markov kernels.

Definition 9. A Borel map $x \mapsto \tau_x$ from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^d)$ is a Markov kernel if, for every $x \in \mathbb{R}^d$, the barycenter of $\tau_x$ is $x$:
$$\forall x \in \mathbb{R}^d,\quad \int_{\mathbb{R}^d} y\, d\tau_x = x$$
If $\tau$ is a Markov kernel and $\mu \in \mathcal{P}(\mathbb{R}^d)$, we define $\tau\mu := \int_{\mathbb{R}^d} \tau_x\, d\mu \in \mathcal{P}(\mathbb{R}^d)$ by:
$$\int_{\mathbb{R}^d} f(y)\, d(\tau\mu) = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu$$
Theorem 10. If $\mu$ and $\nu$ are in $\mathcal{P}_1(\mathbb{R}^d)$, we have $\mu \preccurlyeq \nu$ if and only if there exists a Markov kernel $\tau_x$ such that $\nu = \int \tau_x\, d\mu$.

Proof. Suppose there exists such a Markov kernel. For any convex function $f$, since $x$ is the barycenter of $\tau_x$, Jensen's inequality implies that $\tau_x(f) \ge f(x)$. Integrating, we get:
$$\int_{\mathbb{R}^d} f(y)\, d\nu = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu \ge \int_{\mathbb{R}^d} f(x)\, d\mu,$$
so $\mu \preccurlyeq \nu$. The converse is known as Strassen's theorem (see [7], [5]). $\square$
2.3. Optimal transport. In the sequel, $\mu$ and $\nu$ will be given in $\mathcal{P}_1(\mathbb{R}^d)$, and $\nu$ will have bounded support. We are interested in the following problem: maximize
$$\int_{\mathbb{R}^d} \langle x, T(x)\rangle\, d\nu$$
among all Borel maps $T : \mathbb{R}^d \to \mathbb{R}^d$ which map $\nu$ on $\mu$:
$$T\sharp\nu = \mu \iff \left[\int f(y)\, d\mu = \int f(T(x))\, d\nu \quad \forall f \in C_0(\mathbb{R}^d)\right]$$
In the sequel, this will be referred to as the basic problem, and denoted by (BP[$\nu$, $\mu$]). If there is an optimal solution $T$, it has the property that if $X$ is any r.v. with law $\nu$, then, among all r.v. $Y$ with law $\mu$, the one for which the correlation $E[\langle X, Y\rangle]$ is maximal is $T(X)$.
There is also a relaxed problem, denoted (RP[$\nu$, $\mu$]). It consists of maximizing:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi$$
among all probability measures $\pi$ on $\mathbb{R}^d \times \mathbb{R}^d$ which have $\nu$ and $\mu$ as marginals. Obviously, we have $\sup(BP) \le \sup(RP)$, and the latter is finite because $\nu$ has bounded support and $\mu$ has finite first moment.
Finally, there is a dual problem, denoted (DP[$\nu$, $\mu$]), which consists of minimizing
$$\int_{\mathbb{R}^d} \varphi(x)\, d\nu + \int_{\mathbb{R}^d} \psi(y)\, d\mu$$
over all pairs of functions $\varphi(x)$ and $\psi(y)$ such that $\varphi(x) + \psi(y) \ge \langle x, y\rangle$.
The following theorem summarizes results due to Kantorovitch [3], Kellerer [4], Rachev and Rüschendorf [6], and Brenier [1]. It was originally formulated for the case when $\mu$ and $\nu$ have finite second moment, and this is also what is found in [8]. Indeed, in this case, since $T\sharp\nu = \mu$, we have:
$$\int \|x - T(x)\|^2\, d\nu = \int \|x\|^2\, d\nu + \int \|T(x)\|^2\, d\nu - 2\int \langle x, T(x)\rangle\, d\nu = \int \|x\|^2\, d\nu + \int \|y\|^2\, d\mu - 2\int \langle x, T(x)\rangle\, d\nu$$
Since the first two terms on the right-hand side do not depend on $T$, the problem of maximising $\int \langle x, T(x)\rangle\, d\nu$ (bilinear cost) is equivalent to the problem of minimizing $\int \|x - T(x)\|^2\, d\nu$ (quadratic cost), for which general techniques are available. In the case at hand, we will not assume that $\mu$ has finite second moment, so this approach is not available: the square distance is not defined, while the correlation maximisation still makes sense.
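For finitely supported, equally weighted measures, the equivalence between the bilinear and the quadratic cost can be verified by brute force over all transport maps (permutations). This sketch (Python, with two small hypothetical point clouds) checks that both criteria select the same map:

```python
import itertools

# Two clouds of 4 points in R^2, each point carrying mass 1/4 (hypothetical data).
xs = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (0.5, 2.0)]
ys = [(1.0, 1.0), (-1.0, 0.5), (0.0, 2.0), (2.0, 0.0)]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def sqdist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

perms = list(itertools.permutations(range(4)))
# Maximal correlation (bilinear cost) vs. minimal quadratic cost.
best_corr = max(perms, key=lambda p: sum(dot(xs[i], ys[p[i]]) for i in range(4)))
best_quad = min(perms, key=lambda p: sum(sqdist(xs[i], ys[p[i]]) for i in range(4)))
print(best_corr == best_quad)  # True: both problems pick the same transport map
```

The agreement is forced by the algebraic identity above: for any permutation, the quadratic cost equals a constant minus twice the bilinear cost.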
Theorem 11. Suppose $\nu$ has compact support and is absolutely continuous w.r.t. Lebesgue measure. Suppose also $\mu$ has finite first moment. Then the basic problem (BP[$\nu$, $\mu$]) has a solution $T$, which is unique up to $\nu$-negligible subsets, and there is a convex function $\varphi : \mathbb{R}^d \to \mathbb{R}$ such that $T(x) = \nabla\varphi(x)$ a.e.
The relaxed problem (RP[$\nu$, $\mu$]) has $\pi = \int \delta_{(x, T(x))}\, d\nu(x)$ as a unique solution.
Denoting by $\psi$ the Fenchel transform of $\varphi$, all solutions to the dual problem (DP[$\nu$, $\mu$]) are of the form $(\varphi + a, \psi - a)$ for some constant $a$, up to $\nu$-, resp. $\mu$-, a.s. equivalence. The values of the minimum in problem (DP) and of the maximum in problems (BP) and (RP) are equal:
$$(2.4)\qquad \max(BP[\nu, \mu]) = \max(RP[\nu, \mu]) = \min(DP[\nu, \mu])$$
Let us denote by $mc[\nu, \mu]$ this common value. We shall call it the maximal correlation between $\nu$ and $\mu$. It follows from the theorem that for any $T'$, $\pi'$, $\varphi'$, $\psi'$ satisfying the admissibility conditions, we have:
$$\int_{\mathbb{R}^d} \langle x, T'(x)\rangle\, d\nu \le mc[\nu, \mu], \qquad \int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi' \le mc[\nu, \mu], \qquad \int_{\mathbb{R}^d} \varphi'(x)\, d\nu + \int_{\mathbb{R}^d} \psi'(y)\, d\mu \ge mc[\nu, \mu]$$
As an interesting consequence, we have:
Proposition 12. Let $\nu, \mu_1, \mu_2$ be probability measures on $\mathbb{R}^d$ such that $\nu$ is absolutely continuous w.r.t. the Lebesgue measure and has bounded support, while $\mu_1$ and $\mu_2$ have finite first moment. Suppose $\mu_1 \preccurlyeq \mu_2$ and $\mu_1 \ne \mu_2$. Then $mc[\nu, \mu_1] < mc[\nu, \mu_2]$.
Proof. By Theorem 10, there is a Markov kernel $\tau$ such that:
$$(2.5)\qquad \mu_2 = \int_{\mathbb{R}^d} \tau_x\, d\mu_1$$
Let $T_1$ be the optimal solution of (BP[$\nu$, $\mu_1$]). Consider the probability measure $\pi$ on $\mathbb{R}^d \times \mathbb{R}^d$ defined by:
$$(2.6)\qquad \int f(x, y)\, d\pi(x, y) = \int d\nu(x) \int f(x, y)\, d\tau_{T_1(x)}(y)$$
Since $\tau_{T_1(x)}$ is a probability measure, the first marginal of $\pi$ is $\nu$. Let us compute the second marginal. We have, for any $f \in C_0(\mathbb{R}^d)$,
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} f(y)\, d\pi(x, y) = \int_{\mathbb{R}^d} \tau_{T_1(x)}(f)\, d\nu(x) = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu_1(x) = \mu_2(f),$$
where the second equality comes from the fact that $T_1$ maps $\nu$ on $\mu_1$ and the third from equation (2.5). A similar computation gives:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi(x, y) = \int_{\mathbb{R}^d} \left\langle x, \int_{\mathbb{R}^d} y\, d\tau_{T_1(x)}(y)\right\rangle d\nu(x) = \int_{\mathbb{R}^d} \langle x, T_1(x)\rangle\, d\nu(x) = mc[\nu, \mu_1]$$
Since $\pi$ has marginals $\nu$ and $\mu_2$, it is admissible in the relaxed problem (RP[$\nu$, $\mu_2$]), so that the left-hand side is at most $mc[\nu, \mu_2]$ while the right-hand side is equal to $mc[\nu, \mu_1]$. It follows that $mc[\nu, \mu_1] \le mc[\nu, \mu_2]$. If there is equality, then $\pi$ is an optimal solution to (RP[$\nu$, $\mu_2$]). By the uniqueness part of Theorem 11, we must have $\pi = \int \delta_{(x, T_1(x))}\, d\nu(x)$. Comparing with equation (2.6), we find $\tau_y = \delta_y$, holding true $\mu_1$-almost surely. Writing this in equation (2.5) we get $\mu_1 = \mu_2$. $\square$
2.4. Strongly exposed points. Let $E$ be a Banach space, and $C \subset E$ a closed subset. For $v \in E'$, consider the optimization problem:
$$(2.7)\qquad \sup_{u \in C} \langle v, u\rangle$$
Definition 13. We say that $v \in E'$ exposes $\bar u \in C$ if $\bar u$ solves problem (2.7) and is the unique solution. We shall say that $v \in E'$ strongly exposes $\bar u \in C$ if it exposes $\bar u$ and all maximizing sequences in problem (2.7) converge to $\bar u$:
$$\left[ u_n \in C,\ \lim_n \langle v, u_n\rangle = \langle v, \bar u\rangle \right] \Longrightarrow \lim_n \|\bar u - u_n\| = 0$$
We shall say that $u \in C$ is an exposed point (resp. strongly exposed point) if it is exposed (resp. strongly exposed) by some continuous linear functional $v$. It is a classical result of Phelps that every weakly compact convex subset $C$ of $E$ is the closed convex hull of its strongly exposed points.
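A concrete finite-dimensional illustration (a sketch, not taken from the paper): on the closed unit ball of $\mathbb{R}^2$, any $v \ne 0$ strongly exposes $\bar u = v/\|v\|$, because for unit vectors $u$ one has the identity $\|u - \bar u\|^2 = 2(\|v\| - \langle v, u\rangle)/\|v\|$, which converts pairing gaps into norm gaps. The script checks this identity at sampled directions:

```python
import math

v = (3.0, 4.0)
norm_v = math.hypot(v[0], v[1])              # ||v|| = 5.0
u_star = (v[0] / norm_v, v[1] / norm_v)      # the exposed point v/||v||

def gaps(theta):
    """Pairing gap and squared distance to u_star for u = (cos t, sin t)."""
    u = (math.cos(theta), math.sin(theta))
    pairing_gap = (norm_v - (v[0] * u[0] + v[1] * u[1])) / norm_v
    dist_sq = (u[0] - u_star[0]) ** 2 + (u[1] - u_star[1]) ** 2
    return pairing_gap, dist_sq

# dist_sq == 2 * pairing_gap everywhere, so <v, u_n> -> ||v|| forces u_n -> u_star.
checks = [abs(2 * g - d) < 1e-12 for g, d in (gaps(t / 7.0) for t in range(10))]
print(all(checks))  # True
```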
3. Some geometric properties of law-invariant subsets of $L^1(\mathbb{R}^d)$

Recall that, given $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, we have defined subsets $M(\nu)$ and $C(\nu)$ of $L^1(\mathbb{R}^d)$ by:
$$M(\nu) = \{X \in L^1 \mid \mu_X = \nu\}$$
$$C(\nu) = \{X \in L^1 \mid \mu_X \preccurlyeq \nu\}$$
$M(\nu)$ is closed in $L^1$ but not convex. To investigate the relation between $M(\nu)$ and $C(\nu)$, we shall need the following result:
Proposition 14. Let $Y \in L^1(\mathbb{R}^d)$ with law$(Y) = \mu$, and $\nu \in \mathcal{P}_1(\mathbb{R}^d)$ such that $\mu \preccurlyeq \nu$. Then there is a sequence $(X_n)_{n=1}^\infty$ in $M(\nu)$ such that $(X_n)_{n=1}^\infty$ converges weakly to $Y$ in $L^1(\mathbb{R}^d)$. As a consequence, there is a sequence $(Y_n)_{n=1}^\infty$ in $\operatorname{conv}(M(\nu))$ converging strongly to $Y$ in $L^1(\mathbb{R}^d)$.
We start by recalling a well-known result from ergodic theory.
Lemma 15. Let $\Omega = \{-1, 1\}^{\mathbb{Z}}$ be equipped with the Borel $\sigma$-algebra $\mathcal{F}$ and Haar measure $P$, and let $T_n$ be the $n$-shift, that is:
$$\forall k \in \mathbb{Z},\quad [T_n(\omega)]_k = \omega_{k-n}$$
For any $Z \in L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, the sequence of functions $Z \circ T_n$ converges weakly to the constant $E[Z]$.

Proof. Suppose that $Z$ depends only on finitely many coordinates and let $A \in \mathcal{F}$ also depend only on finitely many coordinates of $\{-1, 1\}^{\mathbb{Z}}$. Then, for $n$ large enough, $Z_n := Z \circ T_n$ is independent of $A$, so that
$$E[Z_n \mid A] = E[Z_n] = E[Z].$$
The general case follows by approximation. $\square$
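A finite-dimensional caricature of this mixing argument (a Python sketch with hypothetical choices of $Z$ and $A$): on coordinates $-4, \ldots, 4$ of $\{-1,1\}^{\mathbb{Z}}$, take $Z = \mathbf{1}_{\{\omega_0 = 1,\ \omega_1 = 1\}}$ and $A = \{\omega_1 = 1\}$. Once the shift has moved $Z$ off the coordinate defining $A$, the conditional expectation equals $E[Z] = 1/4$:

```python
import itertools

omegas = list(itertools.product([-1, 1], repeat=9))   # coords -4..4, Haar measure

def coordinate(w, k):
    return w[k + 4]

def E_given_A(n):
    """E[Z o T_n | A] for Z = 1_{w_0 = 1, w_1 = 1} and A = {w_1 = 1}."""
    hits = [w for w in omegas if coordinate(w, 1) == 1]
    # Z o T_n depends on coordinates -n and 1 - n.
    good = [w for w in hits
            if coordinate(w, -n) == 1 and coordinate(w, 1 - n) == 1]
    return len(good) / len(hits)

print([E_given_A(n) for n in range(4)])  # [0.5, 0.25, 0.25, 0.25]
```

At $n = 0$ the conditioning matters; from $n = 1$ on, $Z \circ T_n$ is independent of $A$ and the conditional expectation is already $E[Z]$.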
Proof (of Proposition 14). Assume (w.l.o.g.) that $L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$ is separable. Recall that $(\Omega, \mathcal{F}, P)$ has no atoms. Suppose first that $Y$ takes only finitely many values, i.e.
$$Y = \sum_{j=1}^N y_j \mathbf{1}_{A_j},$$
where $(y_j)_{j=1}^N \subset \mathbb{R}^d$ and $(A_1, \ldots, A_N)$ forms a partition of $\Omega$ into sets in $\mathcal{F}$ with strictly positive measure.
By Theorem 10 we may find a Markov kernel $\tau = (\tau_{y_j})_{j=1}^N$ such that the barycenter of $\tau_{y_j}$ is $y_j$ and:
$$(3.1)\qquad \nu = \sum_{j=1}^N P[A_j]\, \tau_{y_j}$$
Each of the sets $A_j$, equipped with the normalized measure $P[A_j]^{-1} P|_{A_j}$, is Borel isomorphic to $\{-1, 1\}^{\mathbb{Z}}$ equipped with Haar measure. Hence, by the preceding lemma, for each $j = 1, \ldots, N$ we may find a random variable $Z_j : A_j \to \mathbb{R}^d$ under $P[A_j]^{-1} P|_{A_j}$ such that law$(Z_j) = \tau_{y_j}$, as well as a sequence $(T_{j,n})_{n=1}^\infty$ of measure-preserving transformations of $A_j$ such that, in the weak topology of $L^1(\mathbb{R}^d)$:
$$\lim_{n\to\infty} (Z_j \circ T_{j,n}) \mathbf{1}_{A_j} = y_j \mathbf{1}_{A_j}, \qquad j = 1, \ldots, N.$$
Letting
$$X_n = \sum_{j=1}^N (Z_j \circ T_{j,n}) \mathbf{1}_{A_j},$$
we obtain by (3.1) a sequence in $L^1(\mathbb{R}^d)$ with law$(X_n) = \nu$ and converging weakly to $Y = \sum_{j=1}^N y_j \mathbf{1}_{A_j}$.
Now drop the assumption that $Y$ is a simple function and fix a sequence $(\mathcal{G}_m)_{m=1}^\infty$ of finite sub-$\sigma$-algebras of $\mathcal{F}$, generating $\mathcal{F}$. Note that if $Y_m = E[Y \mid \mathcal{G}_m]$ and $\mu_m$ is the law of $Y_m$, we have $\mu_m \preccurlyeq \nu$ by Jensen's inequality.
By the first part we may find, for each $m \ge 1$, a sequence $(X_{m,n})_{n=1}^\infty$ in $M(\nu)$ such that $(X_{m,n})_{n=1}^\infty$ converges weakly to $Y_m$. Noting that $(Y_m)_{m=1}^\infty$ converges to $Y$ (in the norm of $L^1(\mathbb{R}^d)$ and therefore also weakly) we may find a sequence $(n_m)_{m=1}^\infty$ tending sufficiently fast to infinity, such that $(X_{m,n_m})_{m=1}^\infty$ converges weakly to $Y$.
The final assertion follows from the Hahn-Banach theorem. $\square$

The relationship between $C(\nu)$ and $M(\nu)$ now follows:
Theorem 16. The set $C(\nu)$ is convex, weakly compact, and equals the weak closure of $M(\nu)$.

Proof. Obviously $\overline{M(\nu)}^w \subset C(\nu)$. Conversely, take any $X \in C(\nu)$. By Proposition 14, there is a sequence $X_n$ in $M(\nu)$ such that $X_n \to X$ weakly, so $X \in \overline{M(\nu)}^w$. This shows that $C(\nu) = \overline{M(\nu)}^w$.
By Proposition 8, $C(\nu)$ is convex. It remains to show that it is weakly compact. Since $C(\nu)$ is the weak closure of $M(\nu)$, it is enough to show that $M(\nu)$ is weakly relatively compact. To do that, we shall use the Dunford-Pettis criterion. We claim that $M(\nu)$ is equi-integrable. Indeed, fix some $X \in M(\nu)$. For any other $Y \in M(\nu)$, and any $m > 0$, we have:
$$\int_{\{|Y| \ge m\}} |Y|\, dP = \int_{\{|x| \ge m\}} |x|\, d\nu(x) = \int_{\{|X| \ge m\}} |X|\, dP,$$
which goes to $0$ when $m \to \infty$, independently of $Y$. The result follows. $\square$
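The key point of this proof, that the tail integral depends only on the law, is easy to check on a discrete example (a sketch with hypothetical values):

```python
# E[|Y| 1_{|Y| >= m}] depends only on the law of Y: a rearrangement of the
# same values over equally likely outcomes produces identical tails.
P = [0.25, 0.25, 0.25, 0.25]          # four equally likely outcomes
Y1 = [0.5, -3.0, 1.0, 7.0]            # a random variable in M(nu)
Y2 = [7.0, 1.0, -3.0, 0.5]            # same law, different map omega -> R

def tail(Y, m):
    return sum(p * abs(y) for p, y in zip(P, Y) if abs(y) >= m)

levels = [0.0, 1.0, 2.0, 5.0, 8.0]
print([tail(Y1, m) == tail(Y2, m) for m in levels])  # all True
print(tail(Y1, 8.0))                                  # 0: the common tail vanishes
```

Since the common tail bound is furnished by any single $X \in M(\nu)$, the whole class is equi-integrable at once.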
We now investigate strongly exposing functionals and strongly exposed points of $C(\nu)$. We will show that any $Z \in L^\infty$, the law of which is a.c. w.r.t. Lebesgue measure, strongly exposes a point of $C(\nu)$ (which must then belong to $M(\nu)$) and, conversely, provided $\nu$ is absolutely continuous w.r.t. Lebesgue measure, that any point of $M(\nu)$ is strongly exposed by such a $Z$.
Theorem 17. Let $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, $Z \in L^\infty$, and suppose the law of $Z$ is absolutely continuous with respect to Lebesgue measure. Then $Z$ strongly exposes $C(\nu)$, and the exposed point in fact belongs to $M(\nu)$.

Proof. Let $\eta$ be the law of $Z$ and consider the maximal correlation problem (BP[$\eta$, $\nu$]). By Theorem 11, it has a unique solution $T$. Set $\bar X = T(Z)$. Clearly $\bar X$ has law $\nu$, and by uniqueness:
$$(3.2)\qquad [X' \in M(\nu),\ X' \ne \bar X] \Longrightarrow \langle Z, \bar X\rangle > \langle Z, X'\rangle$$
So $\bar X$ is an exposed point in $M(\nu)$. Take any $Y \in C(\nu)$, so that $\mu_Y \preccurlyeq \nu$. By Proposition 12, we have $\langle Z, \bar X\rangle \ge \langle Z, Y\rangle$, and if $\langle Z, \bar X\rangle = \langle Z, Y\rangle$, then $\mu_Y = \nu$. So $Y$ must belong to $M(\nu)$, and by formula (3.2), we must have $Y = \bar X$. So $\bar X$ is an exposed point in $C(\nu)$ as well.
It remains to prove that it is strongly exposed. For this, take a maximizing sequence $X_n$ in $C(\nu)$. Since $C(\nu)$ is weakly compact and $\mu_n \preccurlyeq \nu$, where $\mu_n$ is the law of $X_n$, there is a subsequence $X_{n_k}$ which converges weakly to some $X' \in C(\nu)$. By Lemma 7, the set of all $\mu \preccurlyeq \nu$ is weak* compact, so we may assume that the laws $\mu_{n_k}$ converge weak* to some $\lambda$. Obviously $X'$ maximizes $\langle Z, \cdot\rangle$ over $C(\nu)$, and since $Z$ exposes $\bar X$, we must have $X' = \bar X$. So the $X_{n_k}$ converge weakly to $\bar X$, and, by Proposition 8, $\nu = \mu_{\bar X} \preccurlyeq \lambda$.
On the other hand, take any convex function $f$ with linear growth. Since $\mu_{n_k} \preccurlyeq \nu$ we have:
$$\int f(x)\, d\mu_{n_k} \le \int f(x)\, d\nu$$
Letting $k \to \infty$, we get from Lemma 6:
$$\int f(x)\, d\lambda = \lim_k \int f(x)\, d\mu_{n_k} \le \int f(x)\, d\nu$$
So $\lambda = \nu$, and Proposition 8 then implies that $\|X_{n_k} - \bar X\|_1 \to 0$. Since the limit does not depend on the subsequence, the whole sequence $X_n$ converges, and $\bar X$ is strongly exposed, as announced. $\square$

Here is a kind of converse:
Theorem 18. Fix two measures $\mu$ and $\nu$ on $\mathbb{R}^d$, the first one having finite first moment and the second one compact support. Suppose both of them are absolutely continuous with respect to Lebesgue measure. Then, for every $X$ with law $\mu$, there is a unique $Z$ with law $\nu$ which strongly exposes $X$ in $C(\mu)$.

Proof. Consider the maximal correlation problem (BP[$\nu$, $\mu$]). It has a unique solution $T : \mathbb{R}^d \to \mathbb{R}^d$ verifying $T\sharp\nu = \mu$. Since both $\mu$ and $\nu$ are absolutely continuous with respect to Lebesgue measure, the problem (BP[$\mu$, $\nu$]) also has a unique solution $S : \mathbb{R}^d \to \mathbb{R}^d$ verifying $S\sharp\mu = \nu$. Clearly $S = T^{-1}$ and $T = S^{-1}$. Define $Z = S(X)$. It is then the case that the law of $Z$ is $\nu$ and $T(Z) = T \circ S(X) = X$. Repeating the preceding proof, we find that $Z$ strongly exposes $X$ in $C(\mu)$. $\square$
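In dimension one the optimal map of Theorems 17–18 is the monotone rearrangement, and the exposed point can be found by brute force. The sketch below (Python, hypothetical values) checks that, among all $X$ with a given discrete law, the pairing $\langle Z, X\rangle$ is maximized exactly by the comonotone coupling $X = T(Z)$:

```python
import itertools

# 5-point uniform probability space; Z takes distinct values, and the
# candidates X in M(nu) are the rearrangements of the target values.
z_vals = [0.3, -1.2, 2.0, 0.9, -0.4]     # values of Z (law: uniform on these)
nu_vals = [5.0, 1.0, -2.0, 0.0, 3.0]     # target law nu: uniform on these

def pairing(x_vals):
    return sum(z * x for z, x in zip(z_vals, x_vals)) / len(z_vals)

best = max(itertools.permutations(nu_vals), key=pairing)

# Monotone rearrangement T: sort nu's values against the ranks of Z.
ranks = sorted(range(5), key=lambda i: z_vals[i])
comonotone = [0.0] * 5
for r, i in enumerate(ranks):
    comonotone[i] = sorted(nu_vals)[r]

print(list(best) == comonotone)  # True: the maximizer of <Z, X> is T(Z)
```

With all values distinct, the rearrangement inequality makes this maximizer unique, which is the discrete counterpart of (3.2).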
Note that the condition that $\mu$ be absolutely continuous with respect to the Lebesgue measure cannot be dropped from the preceding theorem. This may be seen by a variant of a well-known example in optimal transport theory ([9], Example 4.9). On $\mathbb{R}^2$, consider the measure $\mu$ which is uniformly distributed on the segment $\{0\} \times [0, 1]$, while $\nu$ is uniformly distributed on the rectangle $[-1, 1] \times [0, 1]$. Then $\nu$ is absolutely continuous w.r.t. Lebesgue measure, while $\mu$ is not. Clearly the optimal transport $T$ from $\nu$ to $\mu$ for the maximal correlation problem is given by the projection on the vertical axis. This map is not invertible.
Let $(\Omega, \mathcal{F}, P)$ be given by $\Omega = [0, 1]$ equipped with the Lebesgue measure $P$ on the Borel $\sigma$-algebra. Define a random vector $X \in L^1(\Omega, \mathcal{F}, P; \mathbb{R}^2)$ by $X(\omega) = (0, \omega)$, so that the law of $X$ is $\mu$. Let us now calculate the maximal correlation between $\mu$ and $\nu$. Let $Z_0 \in L^\infty$ have law $\nu$ and define $X_0 = T(Z_0)$, so that $X_0$ has law $\mu$. By the proof of Theorem 17 we get:
$$mc(\nu, \mu) = \int \langle X_0, Z_0\rangle\, dP = \int_{\mathbb{R}^2} \langle x, T(x)\rangle\, d\nu = \frac{1}{2}\int_{-1}^{1}\int_0^1 x_2^2\, dx_2\, dx_1 = \int_0^1 x_2^2\, dx_2 = \frac{1}{3}.$$
On the other hand, we claim that:
$$(3.3)\qquad \int \langle X, Z_0\rangle\, dP < \frac{1}{3}.$$
Since this holds for any $Z_0$ with law $\nu$, it shows that $X$ is not exposed in $C(\mu)$ by any $Z_0$ with law $\nu$. This is the desired counterexample. To prove (3.3), write $Z_0(\omega) = (Z_{0,1}(\omega), Z_{0,2}(\omega))$ and note that $P[Z_{0,2} \ne X_2] > 0$. Indeed, assume otherwise, so that $Z_{0,2}(\omega) = X_2(\omega) = \omega$ almost surely. Then $Z_{0,1}(\omega)$ is fully determined by $Z_{0,2}(\omega)$, meaning that, in the image of $P$ by $Z_0$, the coordinate $z_1$ is determined by the coordinate $z_2$. This clearly contradicts the fact that the law of $Z_0$ is $\nu$. Since the law of $Z_{0,2}$ is the Lebesgue measure on $[0, 1]$, but $Z_{0,2}$ does not coincide with $X_2(\omega) = \omega$, we have, from the uniqueness of the Brenier map:
$$\int \langle X, Z_0\rangle\, dP = \int X_2 Z_{0,2}\, dP < \int X_2^2\, dP = \frac{1}{3}.$$
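The two numerical claims of the example can be checked directly (a sketch; the particular $Z_{0,2}(\omega) = 1 - \omega$ below is one hypothetical representative with the correct law):

```python
# Midpoint Riemann sums on [0, 1].
N = 20_000
mid = [(k + 0.5) / N for k in range(N)]

# mc(nu, mu) = (1/2) int_{-1}^{1} int_0^1 x2^2 dx2 dx1 = int_0^1 x2^2 dx2 = 1/3.
mc = sum(w * w for w in mid) / N
print(abs(mc - 1 / 3) < 1e-6)  # True

# X(w) = (0, w), so <X, Z0> integrates w * Z0_2(w). With Z0_2(w) = 1 - w
# (law: Lebesgue on [0, 1], but different from the identity) the pairing is 1/6.
pairing = sum(w * (1 - w) for w in mid) / N
print(abs(pairing - 1 / 6) < 1e-6, pairing < mc)  # True True
```

This particular $Z_{0,2}$ is the anticomonotone extreme; the strict inequality in (3.3) holds for every admissible $Z_0$ other than the (excluded) identity coupling.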
Let us summarize our findings: there are measures $\mu$ and $\nu$ on $\mathbb{R}^2$ with compact support, $\nu$ being absolutely continuous with respect to Lebesgue measure, and some $X \in L^1(\mathbb{R}^2)$ with law $\mu$, such that there is no $Z \in L^\infty(\mathbb{R}^2)$ with law $\nu$ which exposes $X$ in $C(\mu)$, let alone strongly exposes it.
References
[1] Brenier, Yann, "Polar factorization and monotone rearrangement of vector-valued functions", Comm. Pure Appl. Math. 44 (1991), 375-417
[2] Jouini, Elyès, Walter Schachermayer and Nizar Touzi, "Law invariant risk measures have the Fatou property", Adv. Math. Econ. 9 (2006), 49-71
[3] Kantorovitch, L.V., "On the transfer of masses", Doklady Akad. Nauk SSSR (1942), 113-161
[4] Kellerer, H.G., "Duality theorems for marginal problems", Z. Wahrsch. Verw. Gebiete 67 (1984), 399-432
[5] Meyer, Paul-André, "Probabilités et potentiel", Hermann (1966)
[6] Rachev, S.T. and Rüschendorf, L., "A characterization of random variables with minimum L2-distance", J. Multivariate Anal. 32 (1990), 48-54
[7] Strassen, V., "The existence of probability measures with given marginals", Ann. Math. Statist. 36 (1965), 423-439
[8] Villani, Cédric, "Topics in optimal transportation", Graduate Studies in Mathematics, American Mathematical Society (2003)
[9] Villani, Cédric, "Optimal transport, old and new", Springer Verlag (2009)