IVAR EKELAND AND WALTER SCHACHERMAYER
Abstract. A classical theorem due to R. Phelps states that if $C$ is a weakly compact set in a Banach space $E$, the strongly exposing functionals form a dense subset of the dual space $E'$. In this paper, we look at the concrete situation where $C \subset L^1(\mathbb{R}^d)$ is the closed convex hull of the set of random variables $Y \in L^1(\mathbb{R}^d)$ having a given law $\nu$. Using the theory of optimal transport, we show that every random variable $X \in L^\infty(\mathbb{R}^d)$, the law of which is absolutely continuous with respect to Lebesgue measure, strongly exposes the set $C$. Of course these random variables are dense in $L^\infty(\mathbb{R}^d)$.
1. Introduction
Throughout this paper we deal with a fixed probability space $(\Omega, \mathcal{F}, P)$. It will be assumed that $(\Omega, \mathcal{F}, P)$ has no atoms. The space of $d$-dimensional random vectors will be denoted by $L^0(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, and the space of $p$-integrable ones by $L^p(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, shortened to $L^0$ and $L^p$ if there is no ambiguity. The law $\mu_X$ of a random vector $X$ is the probability on $\mathbb{R}^d$ defined by:
$$\forall f \in C_b(\mathbb{R}^d), \quad \int_\Omega f(X(\omega))\, dP = \int_{\mathbb{R}^d} f(x)\, d\mu_X$$
where $C_b(\mathbb{R}^d)$ is the space of continuous and bounded functions on $\mathbb{R}^d$. The last term is, as usual, denoted by $E_{\mu_X}[f]$. Clearly, $X \in L^p(\mathbb{R}^d)$ iff $E_{\mu_X}[|x|^p] < \infty$. Our aim is to prove the following result:
Theorem 1. Let $X \in L^1(\mathbb{R}^d)$ be given, and let $C \subset L^1(\mathbb{R}^d)$ be the closed convex hull of all random variables $Y$ such that $\mu_Y = \mu_X$. Take any $Z \in L^\infty(\mathbb{R}^d)$ the law of which is absolutely continuous with respect to Lebesgue measure. Then there exists a unique $\bar X \in C$ where $Z$ attains its maximum on $C$. The law of $\bar X$ is $\mu_X$, and for every sequence $X_n \in C$ such that
$$\langle Z, X_n\rangle \to \langle Z, \bar X\rangle$$
we have $\|X_n - \bar X\|_1 \to 0$.
This will be proved as Theorem 17 at the end of this paper. In addition, Theorem 18 will provide a converse.
2. Preliminaries
2.1. Law-invariant subsets and functions. We shall write $X_1 \sim X_2$ to mean that $X_1$ and $X_2$ have the same law. This is an equivalence relation on the space of random vectors. A set $C \subset L^0$ will be called law-invariant if:
$$[X_1 \in C \text{ and } X_1 \sim X_2] \Longrightarrow X_2 \in C,$$
and a function $\varphi : L^0 \to \mathbb{R}$ is law-invariant if $\varphi(X_1) = \varphi(X_2)$ whenever $X_1 \sim X_2$.
Date: May 2012.
Given $\mu \in \mathcal{P}(\mathbb{R}^d)$, we shall denote by $M(\mu)$ the equivalence class consisting of all $X$ with law $\mu$:
$$M(\mu) := \{X \mid \mu_X = \mu\}$$
The set $M(\mu)$ is not convex in general.
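Although not part of the paper's formal development, the failure of convexity is easy to see on a four-point probability space. The following sketch (Python, with hypothetical coin-flip variables) exhibits two random variables with the same law whose average has a different law:

```python
import itertools

# Probability space: four equally likely outcomes (two independent fair coins).
omega = list(itertools.product([-1, 1], repeat=2))  # each with P = 1/4

X1 = {w: w[0] for w in omega}  # law: uniform on {-1, 1}
X2 = {w: w[1] for w in omega}  # same law as X1

def law(X):
    """Return the distribution of X as a dict value -> probability."""
    d = {}
    for w in omega:
        d[X[w]] = d.get(X[w], 0.0) + 0.25
    return d

Y = {w: 0.5 * (X1[w] + X2[w]) for w in omega}  # convex combination of X1, X2

print(law(X1) == law(X2))  # True: X1 ~ X2, so both lie in M(mu)
print(law(Y))              # {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25} -- a different law
```

So the midpoint of two elements of $M(\mu)$ leaves $M(\mu)$: the set is law-invariant but not convex.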
Lemma 2. If $\mu$ has finite $p$-moment, $\int |x|^p\, d\mu < \infty$, for $1 \le p \le \infty$, then the set $M(\mu)$ is closed in the $L^p$-norm.

Proof. If $X_n \in M(\mu)$ and $\|X_n - X\|_p \to 0$, then we can extract a subsequence which converges almost everywhere. If $f \in C_b(\mathbb{R}^d)$, applying Lebesgue's dominated convergence theorem, we have $\int f(X)\, dP = \lim_n \int f(X_n)\, dP$. But the right-hand side is equal to $\int f(x)\, d\mu$ for every $n$. $\square$
We shall say that $\sigma : \Omega \to \Omega$ is a measure-preserving transformation if it is a bijection, $\sigma$ and $\sigma^{-1}$ are measurable, and $P(\sigma^{-1}(A)) = P(A) = P(\sigma(A))$ for all $A \in \mathcal{F}$. The set $\mathcal{T}$ of all measure-preserving transformations is a group which operates on random vectors and preserves the law:
$$\forall \sigma \in \mathcal{T},\ \forall X \in L^0,\quad X \sim X \circ \sigma.$$
The converse is not true: given two variables $X_1$ and $X_2$ with $X_1 \sim X_2$, there may be no $\sigma \in \mathcal{T}$ such that $X_1 = X_2 \circ \sigma$. However, it comes close. By Lemma A.4 from [2], we have:

Proposition 3. Let $C$ be a norm-closed subset of $L^p(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, $1 \le p \le \infty$. Then $C$ is law-invariant if and only if it is transformation-invariant. As a consequence:
$$\forall X \in M(\mu),\quad M(\mu) = \overline{\{X \circ \sigma \mid \sigma \in \mathcal{T}\}},$$
the closure being taken in the $L^p$-norm.
2.2. Choquet ordering of probability laws. Denote by $\mathcal{P}(\mathbb{R}^d)$ the space of probability laws on $\mathbb{R}^d$, and endow it with the weak* topology induced by $C_0(\mathbb{R}^d)$, the space of continuous functions on $\mathbb{R}^d$ which go to zero at infinity. It is known that there is a complete metric on $\mathcal{P}(\mathbb{R}^d)$ which is compatible with this topology:
$$[\mu_n \to \mu \text{ weak*}] \iff \left[\forall f \in C_0(\mathbb{R}^d),\ \int f\, d\mu_n \to \int f\, d\mu\right]$$
Denote by $\mathcal{P}_1(\mathbb{R}^d)$ the set of probability laws on $\mathbb{R}^d$ which have finite first moment:
$$(2.1)\qquad \mu \in \mathcal{P}_1(\mathbb{R}^d) \iff \int_{\mathbb{R}^d} |x|\, d\mu < \infty$$
Note that $\mathcal{P}_1(\mathbb{R}^d)$ is convex, but not closed in $\mathcal{P}(\mathbb{R}^d)$. If $\mu \in \mathcal{P}_1(\mathbb{R}^d)$, every linear function $f(x)$ is $\mu$-integrable. The point
$$\bar x := \int_{\mathbb{R}^d} y\, d\mu(y)$$
will be called the barycenter of the probability $\mu$.
Since every convex function on $\mathbb{R}^d$ is bounded below by an affine function, we find that $E_\mu[f]$ is well-defined (possibly $+\infty$) for every real convex function $f$. So the following definition makes sense:
Definition 4. For $\mu$ and $\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$, we shall say that $\mu \preccurlyeq \nu$ if, for every convex function $f : \mathbb{R}^d \to \mathbb{R}$, we have:
$$\int_{\mathbb{R}^d} f(x)\, d\mu \le \int_{\mathbb{R}^d} f(x)\, d\nu$$
For technical reasons, in order to avoid infinities, we shall introduce an equivalent definition. Denote by $\mathcal{C}$ the set of convex functions $f : \mathbb{R}^d \to \mathbb{R}$ which are the pointwise supremum of finitely many affine functions, i.e. $f(x) = \max_{i \in I}\{\langle y_i, x\rangle - a_i\}$ for some finite family $(y_i, a_i) \in \mathbb{R}^d \times \mathbb{R}$. Because of (2.1), if $f \in \mathcal{C}$ and $\mu \in \mathcal{P}_1(\mathbb{R}^d)$, then $\int f(x)\, d\mu < \infty$.
Clearly, for any convex function $g$, there is an increasing sequence $f_n \in \mathcal{C}$ such that $g = \sup_n f_n$.
Lemma 5. For $\mu$ and $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, we have $\mu \preccurlyeq \nu$ iff:
$$(2.2)\qquad \forall f \in \mathcal{C},\quad \int f(x)\, d\mu \le \int f(x)\, d\nu$$
Proof. For any convex $g$ we have, by the preceding remark, $g = \sup_m f_m$ for some increasing sequence $f_m \in \mathcal{C}$. The inequality holds for each $f_m$, and we conclude by Lebesgue's monotone convergence theorem. $\square$

We note the following for future use:
Lemma 6. Suppose we have an equi-integrable sequence $X_n$ in $L^1(\mathbb{R}^d)$ such that their laws $\mu_n$ converge weak* to $\mu$. Then:
$$\forall f \in \mathcal{C},\quad \int f(x)\, d\mu_n \to \int f(x)\, d\mu$$
Proof. If $f \in \mathcal{C}$, it must have linear growth at infinity: there are constants $m$ and $M$ such that $|f(x)| \le m + M|x|$. Let $\varphi \in C_0(\mathbb{R}^d)$ be such that $\varphi(x) = 1$ for $|x| \le 1$ and $\varphi(x) = 0$ for $|x| \ge 2$, with $\varphi(x) \ge 0$ everywhere. For any $\varepsilon > 0$, by the equi-integrability property, we can find $R$ so large that, for all $n$,
$$\left| \int f(x)\, \varphi\!\left(\frac{x}{R}\right) d\mu_n - \int f(x)\, d\mu_n \right| \le \varepsilon.$$
Since $\mu_n$ converges weak* to $\mu$, the first term converges to $\int f(x)\, \varphi\!\left(\frac{x}{R}\right) d\mu$. Letting $R \to \infty$, we find the desired result. $\square$
Relation (2.2) defines an (incomplete) order relation on the set of probability measures with finite first moment. It is known in potential theory as the Choquet ordering (see [5], chapter XI.2). Note that if $f$ is linear, both $f$ and $-f$ are convex, so that, if $\mu \preccurlyeq \nu$, then:
$$\int_{\mathbb{R}^d} f(x)\, d\mu = \int_{\mathbb{R}^d} f(x)\, d\nu$$
In particular, if $\mu \preccurlyeq \nu$, then $\mu$ and $\nu$ have the same barycenter.
Informally speaking, $\mu \preccurlyeq \nu$ means that $\mu$ and $\nu$ have the same barycenter, but $\nu$ is more spread out than $\mu$. In potential theory, this is traditionally expressed by saying that "$\nu$ est une balayée de $\mu$", that is, "$\nu$ is swept out from $\mu$". The following elementary properties illustrate this basic intuition:
(1) (certainty equivalence) If $x_0 = E_\nu[x]$ ($x_0$ is the barycenter of $\nu$) and $\delta_{x_0}$ is the Dirac mass at $x_0$, then $\delta_{x_0} \preccurlyeq \nu$: for convex $f$, Jensen's inequality gives $f(x_0) \le \int f\, d\nu$.
(2) (diversification) If $X_1 \sim X_2$ have law $\nu$, and $Y = \frac{1}{2}(X_1 + X_2)$ has law $\mu$, then $\mu \preccurlyeq \nu$. Indeed, if $f$ is convex:
$$\int_{\mathbb{R}^d} f(x)\, d\mu = \int f(Y)\, dP \le \frac{1}{2}\int f(X_1)\, dP + \frac{1}{2}\int f(X_2)\, dP = \left(\frac{1}{2} + \frac{1}{2}\right)\int_{\mathbb{R}^d} f(x)\, d\nu = \int_{\mathbb{R}^d} f(x)\, d\nu$$
Lemma 7. Let $\nu \in \mathcal{P}_1(\mathbb{R}^d)$ and let $I[\nu]$ be the Choquet order interval of $\nu$ in $\mathcal{P}_1(\mathbb{R}^d)$:
$$I[\nu] = \{\mu \in \mathcal{P}_1(\mathbb{R}^d) : \mu \preccurlyeq \nu\}.$$
Then $I[\nu]$ is a compact subset of $\mathcal{P}_1(\mathbb{R}^d)$ with respect to the weak-star topology induced by $C_0(\mathbb{R}^d)$.

Proof. As the weak* topology on $\mathcal{P}_1(\mathbb{R}^d)$ is metrisable, it will suffice to show that every sequence $(\mu_n)_{n=1}^\infty$ in $I[\nu]$ has a cluster point.
The relation $\mu_n \preccurlyeq \nu$ implies in particular that the first moments of the $\mu_n$ are bounded by the first moment of $\nu$. This in turn implies that Prokhorov's condition is satisfied, i.e. for $\varepsilon > 0$ there is a compact $K \subset \mathbb{R}^d$ such that $\mu_n(K) \ge 1 - \varepsilon$ for all $n \in \mathbb{N}$.
By Prokhorov's theorem we may find a subsequence, still denoted by $(\mu_n)_{n=1}^\infty$, converging weak* to a probability measure $\mu \in \mathcal{P}(\mathbb{R}^d)$. To show that $\mu \in I[\nu]$, let $f : \mathbb{R}^d \to \mathbb{R}$ be convex. By the weak* lower semi-continuity of the function $\mu \mapsto \langle f, \mu\rangle$ on $\mathcal{P}(\mathbb{R}^d)$, we obtain
$$\langle f, \mu\rangle \le \liminf_{n\to\infty} \langle f, \mu_n\rangle \le \langle f, \nu\rangle. \qquad\square$$
The relationship with weak convergence in $L^1$ is given by the next result. To motivate it, consider a sequence of i.i.d. random variables $X_n$ such that $P[X_n = 1] = 1/2 = P[X_n = -1]$. Then $X_n \to 0$ weakly, and the law of the limit is $\delta_0$; but all the $X_n$ have the law $\frac12\delta_{-1} + \frac12\delta_1$. Clearly $\delta_0 \preccurlyeq \frac12\delta_{-1} + \frac12\delta_1$.
Proposition 8. Suppose $X_n$ is a sequence in $L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$ converging weakly to $Y$. Denote by $\mu_n$ the law of $X_n$ and by $\mu$ the law of $Y$. Suppose $\mu_n$ converges weak* to some $\nu \in \mathcal{P}_1(\mathbb{R}^d)$. Then $\mu \preccurlyeq \nu$, with $\mu = \nu$ if and only if $\|X_n - Y\|_1 \to 0$.
Proof. First note that the barycenter of $\nu$ is $E[Y]$. Indeed, if $f \in \mathcal{C}$, we have, by Jensen's inequality:
$$\int f(x)\, d\mu_n = \int f(X_n)\, dP \ge f(E[X_n])$$
By Lemma 6 and the equi-integrability of the $X_n$, the left-hand side converges to $\int f(x)\, d\nu$ while the right-hand side converges to $f(E[Y])$.
Now consider a finite $\sigma$-algebra $\mathcal{G} \subset \mathcal{F}$. Denote by $\mathcal{A}$ the collection of atoms of $\mathcal{G}$. We have:
$$\int f(x)\, d\mu_n = \int E[f(X_n) \mid \mathcal{G}]\, dP \ge \int f(E[X_n \mid \mathcal{G}])\, dP$$
and by the same method we show that:
$$\int f(x)\, d\nu \ge \sum_{A \in \mathcal{A}} P[A]\, f\!\left(\frac{E[Y \mathbf{1}_A]}{P[A]}\right) = \int f(E[Y \mid \mathcal{G}])\, dP.$$
Now let $(\mathcal{G}_k)$, $k \in \mathbb{N}$, be a sequence of finite sub-$\sigma$-algebras of $\mathcal{F}$ such that $Y$ is measurable w.r.t. $\sigma\left(\bigcup_k \mathcal{G}_k\right)$. Denoting by $\mu_k$ the law of $E[Y \mid \mathcal{G}_k]$, we have by the above argument:
$$\mu_k \preccurlyeq \nu \quad\text{for all } k,$$
and hence $\mu \preccurlyeq \nu$ by taking the limit when $k \to \infty$.
Turning to the final assertion, it follows from Lebesgue's dominated convergence theorem that, if $X_n$ converges to $Y$ in the $L^1$ norm, then the law $\mu_n$ of $X_n$ converges to the law $\mu$ of $Y$ weak* in $\mathcal{P}_1(\mathbb{R}^d)$.
Conversely suppose that $(X_n)_{n=1}^\infty$ converges to $Y$ weakly in $L^1(\mathbb{R}^d)$ and $\nu = \mu$. We claim that for every $A \in \mathcal{F}$ and every function $f \in \mathcal{C}$ we then have
$$(2.3)\qquad \lim_{n\to\infty} E[f(X_n)\mathbf{1}_A] = E[f(Y)\mathbf{1}_A]$$
Indeed, by Jensen's inequality, we have:
$$\liminf_{n\to\infty} E[f(X_n)\mathbf{1}_A] \ge E[f(Y)\mathbf{1}_A], \qquad \liminf_{n\to\infty} E[f(X_n)\mathbf{1}_{\Omega\setminus A}] \ge E[f(Y)\mathbf{1}_{\Omega\setminus A}].$$
On the other hand, by Lemma 6, we have:
$$\lim_{n\to\infty} E[f(X_n)] = \lim_{n\to\infty} \langle f, \mu_n\rangle = \langle f, \nu\rangle = \langle f, \mu\rangle = E[f(Y)].$$
So equality must hold in (2.3), as announced.
Now suppose that $(X_n)_{n=1}^\infty$ fails to converge to $Y$ in the norm of $L^1(\mathbb{R}^d)$, i.e., there is $1 > \alpha > 0$ such that
$$P[|X_n - Y| \ge \alpha] \ge \alpha,$$
for infinitely many $n$. By approximating $Y$ by step functions we may find a set $A \in \mathcal{F}$ with $P[A] > 0$, and a point $y_0 \in \mathbb{R}^d$ such that $|Y - y_0| < \frac{\alpha^2}{5}$ on $A$ and $P[A \cap \{|X_n - y_0| \ge \frac{\alpha}{2}\}] \ge \frac{\alpha}{2} P[A]$. We then have $E[|Y - y_0|\mathbf{1}_A] \le \frac{\alpha^2}{5} P[A]$ while $E[|X_n - y_0|\mathbf{1}_A] \ge \frac{\alpha^2}{4} P[A]$, a contradiction to (2.3). $\square$
The Choquet ordering can be completely characterized in terms of Markov kernels.

Definition 9. A Borel map $x \mapsto \tau_x$ from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^d)$ is a Markov kernel if, for every $x \in \mathbb{R}^d$, the barycenter of $\tau_x$ is $x$:
$$\forall x \in \mathbb{R}^d,\quad \int_{\mathbb{R}^d} y\, d\tau_x = x$$
If $\tau$ is a Markov kernel and $\mu \in \mathcal{P}(\mathbb{R}^d)$, we define $\tau\mu := \int_{\mathbb{R}^d} \tau_x\, d\mu \in \mathcal{P}(\mathbb{R}^d)$ by:
$$\int_{\mathbb{R}^d} f(y)\, d(\tau\mu) = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu$$
Theorem 10. If $\mu$ and $\nu$ are in $\mathcal{P}_1(\mathbb{R}^d)$, we have $\mu \preccurlyeq \nu$ if and only if there exists a Markov kernel $\tau_x$ such that $\nu = \int \tau_x\, d\mu$.

Proof. Suppose there exists such a Markov kernel. For any convex function $f$, since $x$ is the barycenter of $\tau_x$, Jensen's inequality implies that $\tau_x(f) \ge f(x)$. Integrating, we get:
$$\int_{\mathbb{R}^d} f(y)\, d\nu = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu \ge \int_{\mathbb{R}^d} f(x)\, d\mu,$$
so $\mu \preccurlyeq \nu$. The converse is known as Strassen's theorem (see [7], [5]). $\square$
2.3. Optimal transport. In the sequel, $\mu$ and $\nu$ will be given in $\mathcal{P}_1(\mathbb{R}^d)$, and $\nu$ will have bounded support. We are interested in the following problem: maximize
$$\int_{\mathbb{R}^d} \langle x, T(x)\rangle\, d\nu$$
among all Borel maps $T : \mathbb{R}^d \to \mathbb{R}^d$ which map $\nu$ on $\mu$:
$$T\sharp\nu = \mu \iff \left[\int f(y)\, d\mu = \int f(T(x))\, d\nu \quad \forall f \in C_0(\mathbb{R}^d)\right]$$
In the sequel, this will be referred to as the basic problem, and denoted by (BP[$\nu$, $\mu$]). If there is an optimal solution $T$, it has the property that if $X$ is any r.v. with law $\nu$, then, among all r.v. $Y$ with law $\mu$, the one for which the correlation $E[\langle X, Y\rangle]$ is maximal is $T(X)$.
There is also a relaxed problem, denoted (RP[$\nu$, $\mu$]). It consists of maximizing:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi$$
among all probability measures $\pi$ on $\mathbb{R}^d \times \mathbb{R}^d$ which have $\nu$ and $\mu$ as marginals. Obviously, we have $\sup(BP) \le \sup(RP)$, and the latter is finite because $\nu$ has bounded support and $\mu$ has finite first moment.
Finally, there is a dual problem, denoted (DP[$\nu$, $\mu$]), which consists of minimizing
$$\int_{\mathbb{R}^d} \varphi(x)\, d\nu + \int_{\mathbb{R}^d} \psi(y)\, d\mu$$
over all pairs of functions $\varphi(x)$ and $\psi(y)$ such that $\varphi(x) + \psi(y) \ge \langle x, y\rangle$.
The following theorem summarizes results due to Kantorovitch [3], Kellerer [4], Rachev and Rüschendorf [6], and Brenier [1]. It was originally formulated for the case when $\mu$ and $\nu$ have finite second moment, and this is also what is found in [8]. Indeed, in this case, since $T\sharp\nu = \mu$, we have:
$$\int \|x - T(x)\|^2\, d\nu = \int \|x\|^2\, d\nu + \int \|T(x)\|^2\, d\nu - 2\int \langle x, T(x)\rangle\, d\nu = \int \|x\|^2\, d\nu + \int \|y\|^2\, d\mu - 2\int \langle x, T(x)\rangle\, d\nu$$
Since the first two terms on the right-hand side do not depend on $T$, the problem of maximising $\int \langle x, T(x)\rangle\, d\nu$ (bilinear cost) is equivalent to the problem of minimizing $\int \|x - T(x)\|^2\, d\nu$ (quadratic cost), for which general techniques are available. In the case at hand, we will not assume that $\mu$ has finite second moment, so this approach is not available: the square distance is not defined, while the correlation maximisation still makes sense.
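For finitely supported, equally weighted measures, the equivalence between the bilinear and the quadratic cost can be verified by brute force over all transport maps (permutations). This sketch (Python, with two small hypothetical point clouds) checks that both criteria select the same map:

```python
import itertools

# Two clouds of 4 points in R^2, each point carrying mass 1/4 (hypothetical data).
xs = [(0.0, 0.0), (1.0, 0.5), (2.0, -1.0), (0.5, 2.0)]
ys = [(1.0, 1.0), (-1.0, 0.5), (0.0, 2.0), (2.0, 0.0)]

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def sqdist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

perms = list(itertools.permutations(range(4)))
# Maximal correlation (bilinear cost) vs. minimal quadratic cost.
best_corr = max(perms, key=lambda p: sum(dot(xs[i], ys[p[i]]) for i in range(4)))
best_quad = min(perms, key=lambda p: sum(sqdist(xs[i], ys[p[i]]) for i in range(4)))
print(best_corr == best_quad)  # True: both problems pick the same transport map
```

The agreement is forced by the algebraic identity above: for any permutation, the quadratic cost equals a constant minus twice the bilinear cost.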
Theorem 11. Suppose $\nu$ has compact support and is absolutely continuous w.r.t. Lebesgue measure. Suppose also $\mu$ has finite first moment. Then the basic problem (BP[$\nu$, $\mu$]) has a solution $T$, which is unique up to $\nu$-negligible subsets, and there is a convex function $\varphi : \mathbb{R}^d \to \mathbb{R}$ such that $T(x) = \nabla\varphi(x)$ a.e.
The relaxed problem (RP[$\nu$, $\mu$]) has $\pi = \int \delta_{(x, T(x))}\, d\nu(x)$ as a unique solution.
Denoting by $\psi$ the Fenchel transform of $\varphi$, all solutions to the dual problem (DP[$\nu$, $\mu$]) are of the form $(\varphi + a, \psi - a)$ for some constant $a$, up to $\nu$-, resp. $\mu$-, a.s. equivalence. The values of the minimum in problem (DP) and of the maximum in problems (BP) and (RP) are equal:
$$(2.4)\qquad \max(BP[\nu, \mu]) = \max(RP[\nu, \mu]) = \min(DP[\nu, \mu])$$
Let us denote by $mc[\nu, \mu]$ this common value. We shall call it the maximal correlation between $\nu$ and $\mu$. It follows from the theorem that for any $T'$, $\pi'$, $\varphi'$, $\psi'$ satisfying the admissibility conditions, we have:
$$\int_{\mathbb{R}^d} \langle x, T'(x)\rangle\, d\nu \le mc[\nu, \mu], \qquad \int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi' \le mc[\nu, \mu], \qquad \int_{\mathbb{R}^d} \varphi'(x)\, d\nu + \int_{\mathbb{R}^d} \psi'(y)\, d\mu \ge mc[\nu, \mu]$$
As an interesting consequence, we have:
Proposition 12. Let $\nu, \mu_1, \mu_2$ be probability measures on $\mathbb{R}^d$ such that $\nu$ is absolutely continuous w.r.t. the Lebesgue measure and has bounded support, while $\mu_1$ and $\mu_2$ have finite first moment. Suppose $\mu_1 \preccurlyeq \mu_2$ and $\mu_1 \ne \mu_2$. Then $mc[\nu, \mu_1] < mc[\nu, \mu_2]$.
Proof. By Theorem 10, there is a Markov kernel $\tau$ such that:
$$(2.5)\qquad \mu_2 = \int_{\mathbb{R}^d} \tau_x\, d\mu_1$$
Let $T_1$ be the optimal solution of (BP[$\nu$, $\mu_1$]). Consider the probability measure $\pi$ on $\mathbb{R}^d \times \mathbb{R}^d$ defined by:
$$(2.6)\qquad \int f(x, y)\, d\pi(x, y) = \int d\nu(x) \int f(x, y)\, d\tau_{T_1(x)}(y)$$
Since $\tau_{T_1(x)}$ is a probability measure, the first marginal of $\pi$ is $\nu$. Let us compute the second marginal. We have, for any $f \in C_0(\mathbb{R}^d)$,
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} f(y)\, d\pi(x, y) = \int_{\mathbb{R}^d} \tau_{T_1(x)}(f)\, d\nu(x) = \int_{\mathbb{R}^d} \tau_x(f)\, d\mu_1(x) = \mu_2(f),$$
where the second equality comes from the fact that $T_1$ maps $\nu$ on $\mu_1$ and the third from equation (2.5). A similar computation gives:
$$\int_{\mathbb{R}^d \times \mathbb{R}^d} \langle x, y\rangle\, d\pi(x, y) = \int_{\mathbb{R}^d} \left\langle x, \int_{\mathbb{R}^d} y\, d\tau_{T_1(x)}(y)\right\rangle d\nu(x) = \int_{\mathbb{R}^d} \langle x, T_1(x)\rangle\, d\nu(x) = mc[\nu, \mu_1]$$
Since $\pi$ has marginals $\nu$ and $\mu_2$, it is admissible in the relaxed problem (RP[$\nu$, $\mu_2$]), so that the left-hand side is at most $mc[\nu, \mu_2]$ while the right-hand side is equal to $mc[\nu, \mu_1]$. It follows that $mc[\nu, \mu_1] \le mc[\nu, \mu_2]$. If there is equality, then $\pi$ is an optimal solution to (RP[$\nu$, $\mu_2$]). By the uniqueness part of Theorem 11, we must have $\pi = \int \delta_{(x, T_1(x))}\, d\nu(x)$. Comparing with equation (2.6), we find $\tau_y = \delta_y$, holding true $\mu_1$-almost surely. Writing this in equation (2.5) we get $\mu_1 = \mu_2$. $\square$
2.4. Strongly exposed points. Let $E$ be a Banach space, and $C \subset E$ a closed subset. For $v \in E'$, consider the optimization problem:
$$(2.7)\qquad \sup_{u \in C} \langle v, u\rangle$$
Definition 13. We say that $v \in E'$ exposes $\bar u \in C$ if $\bar u$ solves problem (2.7) and is the unique solution. We shall say that $v \in E'$ strongly exposes $\bar u \in C$ if it exposes $\bar u$ and all maximizing sequences in problem (2.7) converge to $\bar u$:
$$\left[ u_n \in C,\ \lim_n \langle v, u_n\rangle = \langle v, \bar u\rangle \right] \Longrightarrow \lim_n \|\bar u - u_n\| = 0$$
We shall say that $u \in C$ is an exposed point (resp. strongly exposed point) if it is exposed (resp. strongly exposed) by some continuous linear functional $v$. It is a classical result of Phelps that every weakly compact convex subset $C$ of $E$ is the closed convex hull of its strongly exposed points.
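A concrete finite-dimensional illustration (a sketch, not taken from the paper): on the closed unit ball of $\mathbb{R}^2$, any $v \ne 0$ strongly exposes $\bar u = v/\|v\|$, because for unit vectors $u$ one has the identity $\|u - \bar u\|^2 = 2(\|v\| - \langle v, u\rangle)/\|v\|$, which converts pairing gaps into norm gaps. The script checks this identity at sampled directions:

```python
import math

v = (3.0, 4.0)
norm_v = math.hypot(v[0], v[1])              # ||v|| = 5.0
u_star = (v[0] / norm_v, v[1] / norm_v)      # the exposed point v/||v||

def gaps(theta):
    """Pairing gap and squared distance to u_star for u = (cos t, sin t)."""
    u = (math.cos(theta), math.sin(theta))
    pairing_gap = (norm_v - (v[0] * u[0] + v[1] * u[1])) / norm_v
    dist_sq = (u[0] - u_star[0]) ** 2 + (u[1] - u_star[1]) ** 2
    return pairing_gap, dist_sq

# dist_sq == 2 * pairing_gap everywhere, so <v, u_n> -> ||v|| forces u_n -> u_star.
checks = [abs(2 * g - d) < 1e-12 for g, d in (gaps(t / 7.0) for t in range(10))]
print(all(checks))  # True
```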
3. Some geometric properties of law-invariant subsets of $L^1(\mathbb{R}^d)$

Recall that, given $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, we have defined subsets $M(\nu)$ and $C(\nu)$ of $L^1(\mathbb{R}^d)$ by:
$$M(\nu) = \{X \in L^1 \mid \mu_X = \nu\}$$
$$C(\nu) = \{X \in L^1 \mid \mu_X \preccurlyeq \nu\}$$
$M(\nu)$ is closed in $L^1$ but not convex. To investigate the relation between $M(\nu)$ and $C(\nu)$, we shall need the following result:
Proposition 14. Let $Y \in L^1(\mathbb{R}^d)$ with law$(Y) = \mu$, and $\nu \in \mathcal{P}_1(\mathbb{R}^d)$ such that $\mu \preccurlyeq \nu$. Then there is a sequence $(X_n)_{n=1}^\infty$ in $M(\nu)$ such that $(X_n)_{n=1}^\infty$ converges weakly to $Y$ in $L^1(\mathbb{R}^d)$. As a consequence, there is a sequence $(Y_n)_{n=1}^\infty$ in $\operatorname{conv}(M(\nu))$ converging strongly to $Y$ in $L^1(\mathbb{R}^d)$.
We start by recalling a well-known result from ergodic theory.
Lemma 15. Let $\Omega = \{-1, 1\}^{\mathbb{Z}}$ be equipped with the Borel $\sigma$-algebra $\mathcal{F}$ and Haar measure $P$, and let $T_n$ be the $n$-shift, that is:
$$\forall k \in \mathbb{Z},\quad [T_n(\omega)]_k = \omega_{k-n}$$
For any $Z \in L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$, the sequence of functions $Z \circ T_n$ converges weakly to the constant $E[Z]$.

Proof. Suppose that $Z$ depends only on finitely many coordinates and let $A \in \mathcal{F}$ also depend only on finitely many coordinates of $\{-1, 1\}^{\mathbb{Z}}$. Then, for $n$ large enough, $Z_n := Z \circ T_n$ is independent of $A$, so that
$$E[Z_n \mid A] = E[Z_n] = E[Z].$$
The general case follows by approximation. $\square$
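A finite-dimensional caricature of this mixing argument (a Python sketch with hypothetical choices of $Z$ and $A$): on coordinates $-4, \ldots, 4$ of $\{-1,1\}^{\mathbb{Z}}$, take $Z = \mathbf{1}_{\{\omega_0 = 1,\ \omega_1 = 1\}}$ and $A = \{\omega_1 = 1\}$. Once the shift has moved $Z$ off the coordinate defining $A$, the conditional expectation equals $E[Z] = 1/4$:

```python
import itertools

omegas = list(itertools.product([-1, 1], repeat=9))   # coords -4..4, Haar measure

def coordinate(w, k):
    return w[k + 4]

def E_given_A(n):
    """E[Z o T_n | A] for Z = 1_{w_0 = 1, w_1 = 1} and A = {w_1 = 1}."""
    hits = [w for w in omegas if coordinate(w, 1) == 1]
    # Z o T_n depends on coordinates -n and 1 - n.
    good = [w for w in hits
            if coordinate(w, -n) == 1 and coordinate(w, 1 - n) == 1]
    return len(good) / len(hits)

print([E_given_A(n) for n in range(4)])  # [0.5, 0.25, 0.25, 0.25]
```

At $n = 0$ the conditioning matters; from $n = 1$ on, $Z \circ T_n$ is independent of $A$ and the conditional expectation is already $E[Z]$.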
Proof (of Proposition 14). Assume (w.l.o.g.) that $L^1(\Omega, \mathcal{F}, P; \mathbb{R}^d)$ is separable. Recall that $(\Omega, \mathcal{F}, P)$ has no atoms. Suppose first that $Y$ takes only finitely many values, i.e.
$$Y = \sum_{j=1}^N y_j \mathbf{1}_{A_j},$$
where $(y_j)_{j=1}^N \subset \mathbb{R}^d$ and $(A_1, \ldots, A_N)$ forms a partition of $\Omega$ into sets in $\mathcal{F}$ with strictly positive measure.
By Theorem 10 we may find a Markov kernel $\tau = (\tau_{y_j})_{j=1}^N$ such that the barycenter of $\tau_{y_j}$ is $y_j$ and:
$$(3.1)\qquad \nu = \sum_{j=1}^N P[A_j]\, \tau_{y_j}$$
Each of the sets $A_j$, equipped with the normalized measure $P[A_j]^{-1} P|_{A_j}$, is Borel isomorphic to $\{-1, 1\}^{\mathbb{Z}}$ equipped with Haar measure. Hence, by the preceding lemma, for each $j = 1, \ldots, N$ we may find a random variable $Z_j : A_j \to \mathbb{R}^d$ under $P[A_j]^{-1} P|_{A_j}$ such that law$(Z_j) = \tau_{y_j}$, as well as a sequence $(T_{j,n})_{n=1}^\infty$ of measure-preserving transformations of $A_j$ such that, in the weak topology of $L^1(\mathbb{R}^d)$:
$$\lim_{n\to\infty} (Z_j \circ T_{j,n}) \mathbf{1}_{A_j} = y_j \mathbf{1}_{A_j}, \qquad j = 1, \ldots, N.$$
Letting
$$X_n = \sum_{j=1}^N (Z_j \circ T_{j,n}) \mathbf{1}_{A_j},$$
we obtain by (3.1) a sequence in $L^1(\mathbb{R}^d)$ with law$(X_n) = \nu$ and converging weakly to $Y = \sum_{j=1}^N y_j \mathbf{1}_{A_j}$.
Now drop the assumption that $Y$ is a simple function and fix a sequence $(\mathcal{G}_m)_{m=1}^\infty$ of finite sub-$\sigma$-algebras of $\mathcal{F}$, generating $\mathcal{F}$. Note that if $Y_m = E[Y \mid \mathcal{G}_m]$ and $\mu_m$ is the law of $Y_m$, we have $\mu_m \preccurlyeq \nu$ by Jensen's inequality.
By the first part we may find, for each $m \ge 1$, a sequence $(X_{m,n})_{n=1}^\infty$ in $M(\nu)$ such that $(X_{m,n})_{n=1}^\infty$ converges weakly to $Y_m$. Noting that $(Y_m)_{m=1}^\infty$ converges to $Y$ (in the norm of $L^1(\mathbb{R}^d)$ and therefore also weakly) we may find a sequence $(n_m)_{m=1}^\infty$ tending sufficiently fast to infinity, such that $(X_{m,n_m})_{m=1}^\infty$ converges weakly to $Y$.
The final assertion follows from the Hahn-Banach theorem. $\square$

The relationship between $C(\nu)$ and $M(\nu)$ now follows:
Theorem 16. The set $C(\nu)$ is convex, weakly compact, and equals the weak closure of $M(\nu)$.

Proof. Obviously $\overline{M(\nu)}^w \subset C(\nu)$. Conversely, take any $X \in C(\nu)$. By Proposition 14, there is a sequence $X_n$ in $M(\nu)$ such that $X_n \to X$ weakly, so $X \in \overline{M(\nu)}^w$. This shows that $C(\nu) = \overline{M(\nu)}^w$.
By Proposition 8, $C(\nu)$ is convex. It remains to show that it is weakly compact. Since $C(\nu)$ is the weak closure of $M(\nu)$, it is enough to show that $M(\nu)$ is weakly relatively compact. To do that, we shall use the Dunford-Pettis criterion. We claim that $M(\nu)$ is equi-integrable. Indeed, fix some $X \in M(\nu)$. For any other $Y \in M(\nu)$, and any $m > 0$, we have:
$$\int_{\{|Y| \ge m\}} |Y|\, dP = \int_{\{|x| \ge m\}} |x|\, d\nu(x) = \int_{\{|X| \ge m\}} |X|\, dP,$$
which goes to $0$ when $m \to \infty$, independently of $Y$. The result follows. $\square$
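The key point of this proof, that the tail integral depends only on the law, is easy to check on a discrete example (a sketch with hypothetical values):

```python
# E[|Y| 1_{|Y| >= m}] depends only on the law of Y: a rearrangement of the
# same values over equally likely outcomes produces identical tails.
P = [0.25, 0.25, 0.25, 0.25]          # four equally likely outcomes
Y1 = [0.5, -3.0, 1.0, 7.0]            # a random variable in M(nu)
Y2 = [7.0, 1.0, -3.0, 0.5]            # same law, different map omega -> R

def tail(Y, m):
    return sum(p * abs(y) for p, y in zip(P, Y) if abs(y) >= m)

levels = [0.0, 1.0, 2.0, 5.0, 8.0]
print([tail(Y1, m) == tail(Y2, m) for m in levels])  # all True
print(tail(Y1, 8.0))                                  # 0: the common tail vanishes
```

Since the common tail bound is furnished by any single $X \in M(\nu)$, the whole class is equi-integrable at once.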
We now investigate strongly exposing functionals and strongly exposed points of $C(\nu)$. We will show that any $Z \in L^\infty$, the law of which is a.c. w.r.t. Lebesgue measure, strongly exposes a point of $C(\nu)$ (which must then belong to $M(\nu)$) and, conversely, provided $\nu$ is absolutely continuous w.r.t. Lebesgue measure, that any point of $M(\nu)$ is strongly exposed by such a $Z$.
Theorem 17. Let $\nu \in \mathcal{P}_1(\mathbb{R}^d)$, $Z \in L^\infty$, and suppose the law of $Z$ is absolutely continuous with respect to Lebesgue measure. Then $Z$ strongly exposes $C(\nu)$, and the exposed point in fact belongs to $M(\nu)$.

Proof. Let $\eta$ be the law of $Z$ and consider the maximal correlation problem (BP[$\eta$, $\nu$]). By Theorem 11, it has a unique solution $T$. Set $\bar X = T(Z)$. Clearly $\bar X$ has law $\nu$, and by uniqueness:
$$(3.2)\qquad [X' \in M(\nu),\ X' \ne \bar X] \Longrightarrow \langle Z, \bar X\rangle > \langle Z, X'\rangle$$
So $\bar X$ is an exposed point in $M(\nu)$. Take any $Y \in C(\nu)$, so that $\mu_Y \preccurlyeq \nu$. By Proposition 12, we have $\langle Z, \bar X\rangle \ge \langle Z, Y\rangle$, and if $\langle Z, \bar X\rangle = \langle Z, Y\rangle$, then $\mu_Y = \nu$. So $Y$ must belong to $M(\nu)$, and by formula (3.2), we must have $Y = \bar X$. So $\bar X$ is an exposed point in $C(\nu)$ as well.
It remains to prove that it is strongly exposed. For this, take a maximizing sequence $X_n$ in $C(\nu)$. Since $C(\nu)$ is weakly compact and $\mu_n \preccurlyeq \nu$, where $\mu_n$ is the law of $X_n$, there is a subsequence $X_{n_k}$ which converges weakly to some $X' \in C(\nu)$. By Lemma 7, the set of all $\mu \preccurlyeq \nu$ is weak* compact, so we may assume that the laws $\mu_{n_k}$ converge weak* to some $\lambda$. Obviously $X'$ maximizes $\langle Z, \cdot\rangle$ over $C(\nu)$, and since $Z$ exposes $\bar X$, we must have $X' = \bar X$. So the $X_{n_k}$ converge weakly to $\bar X$, and, by Proposition 8, $\nu = \mu_{\bar X} \preccurlyeq \lambda$.
On the other hand, take any convex function $f$ with linear growth. Since $\mu_{n_k} \preccurlyeq \nu$ we have:
$$\int f(x)\, d\mu_{n_k} \le \int f(x)\, d\nu$$
Letting $k \to \infty$, we get from Lemma 6:
$$\int f(x)\, d\lambda = \lim_k \int f(x)\, d\mu_{n_k} \le \int f(x)\, d\nu$$
So $\lambda = \nu$, and Proposition 8 then implies that $\|X_{n_k} - \bar X\|_1 \to 0$. Since the limit does not depend on the subsequence, the whole sequence $X_n$ converges, and $\bar X$ is strongly exposed, as announced. $\square$

Here is a kind of converse:
Theorem 18. Fix two measures $\mu$ and $\nu$ on $\mathbb{R}^d$, the first one having finite first moment and the second one compact support. Suppose both of them are absolutely continuous with respect to Lebesgue measure. Then, for every $X$ with law $\mu$, there is a unique $Z$ with law $\nu$ which strongly exposes $X$ in $C(\mu)$.

Proof. Consider the maximal correlation problem (BP[$\nu$, $\mu$]). It has a unique solution $T : \mathbb{R}^d \to \mathbb{R}^d$ verifying $T\sharp\nu = \mu$. Since both $\mu$ and $\nu$ are absolutely continuous with respect to Lebesgue measure, the problem (BP[$\mu$, $\nu$]) also has a unique solution $S : \mathbb{R}^d \to \mathbb{R}^d$ verifying $S\sharp\mu = \nu$. Clearly $S = T^{-1}$ and $T = S^{-1}$. Define $Z = S(X)$. It is then the case that the law of $Z$ is $\nu$ and $T(Z) = T \circ S(X) = X$. Repeating the preceding proof, we find that $Z$ strongly exposes $X$ in $C(\mu)$. $\square$
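In dimension one the optimal map of Theorems 17–18 is the monotone rearrangement, and the exposed point can be found by brute force. The sketch below (Python, hypothetical values) checks that, among all $X$ with a given discrete law, the pairing $\langle Z, X\rangle$ is maximized exactly by the comonotone coupling $X = T(Z)$:

```python
import itertools

# 5-point uniform probability space; Z takes distinct values, and the
# candidates X in M(nu) are the rearrangements of the target values.
z_vals = [0.3, -1.2, 2.0, 0.9, -0.4]     # values of Z (law: uniform on these)
nu_vals = [5.0, 1.0, -2.0, 0.0, 3.0]     # target law nu: uniform on these

def pairing(x_vals):
    return sum(z * x for z, x in zip(z_vals, x_vals)) / len(z_vals)

best = max(itertools.permutations(nu_vals), key=pairing)

# Monotone rearrangement T: sort nu's values against the ranks of Z.
ranks = sorted(range(5), key=lambda i: z_vals[i])
comonotone = [0.0] * 5
for r, i in enumerate(ranks):
    comonotone[i] = sorted(nu_vals)[r]

print(list(best) == comonotone)  # True: the maximizer of <Z, X> is T(Z)
```

With all values distinct, the rearrangement inequality makes this maximizer unique, which is the discrete counterpart of (3.2).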
Note that the condition that $\mu$ be absolutely continuous with respect to the Lebesgue measure cannot be dropped from the preceding theorem. This may be seen by a variant of a well-known example in optimal transport theory ([9], Example 4.9). On $\mathbb{R}^2$, consider the measure $\mu$ which is uniformly distributed on the segment $\{0\} \times [0, 1]$, while $\nu$ is uniformly distributed on the rectangle $[-1, 1] \times [0, 1]$. Then $\nu$ is absolutely continuous w.r.t. Lebesgue measure, while $\mu$ is not. Clearly the optimal transport $T$ from $\nu$ to $\mu$ for the maximal correlation problem is given by the projection on the vertical axis. This map is not invertible.
Let $(\Omega, \mathcal{F}, P)$ be given by $\Omega = [0, 1]$ equipped with the Lebesgue measure $P$ on the Borel $\sigma$-algebra. Define a random vector $X \in L^1(\Omega, \mathcal{F}, P; \mathbb{R}^2)$ by $X(\omega) = (0, \omega)$, so that the law of $X$ is $\mu$. Let us now calculate the maximal correlation between $\mu$ and $\nu$. Let $Z_0 \in L^\infty$ have law $\nu$ and define $X_0 = T(Z_0)$, so that $X_0$ has law $\mu$. By the proof of Theorem 17 we get:
$$mc(\nu, \mu) = \int \langle X_0, Z_0\rangle\, dP = \int_{\mathbb{R}^2} \langle x, T(x)\rangle\, d\nu = \frac{1}{2}\int_{-1}^{1}\int_0^1 x_2^2\, dx_2\, dx_1 = \int_0^1 x_2^2\, dx_2 = \frac{1}{3}.$$
On the other hand, we claim that:
$$(3.3)\qquad \int \langle X, Z_0\rangle\, dP < \frac{1}{3}.$$
Since this holds for any $Z_0$ with law $\nu$, it shows that $X$ is not exposed in $C(\mu)$ by any $Z_0$ with law $\nu$. This is the desired counterexample. To prove (3.3), write $Z_0(\omega) = (Z_{0,1}(\omega), Z_{0,2}(\omega))$ and note that $P[Z_{0,2} \ne X_2] > 0$. Indeed, assume otherwise, so that $Z_{0,2}(\omega) = X_2(\omega) = \omega$ almost surely. Then $Z_{0,1}(\omega)$ is fully determined by $Z_{0,2}(\omega)$, meaning that, in the image of $P$ by $Z_0$, the coordinate $z_1$ is determined by the coordinate $z_2$. This clearly contradicts the fact that the law of $Z_0$ is $\nu$. Since the law of $Z_{0,2}$ is the Lebesgue measure on $[0, 1]$, but $Z_{0,2}$ does not coincide with $X_2(\omega) = \omega$, we have, from the uniqueness of the Brenier map:
$$\int \langle X, Z_0\rangle\, dP = \int X_2 Z_{0,2}\, dP < \int X_2^2\, dP = \frac{1}{3}.$$
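The two numerical claims of the example can be checked directly (a sketch; the particular $Z_{0,2}(\omega) = 1 - \omega$ below is one hypothetical representative with the correct law):

```python
# Midpoint Riemann sums on [0, 1].
N = 20_000
mid = [(k + 0.5) / N for k in range(N)]

# mc(nu, mu) = (1/2) int_{-1}^{1} int_0^1 x2^2 dx2 dx1 = int_0^1 x2^2 dx2 = 1/3.
mc = sum(w * w for w in mid) / N
print(abs(mc - 1 / 3) < 1e-6)  # True

# X(w) = (0, w), so <X, Z0> integrates w * Z0_2(w). With Z0_2(w) = 1 - w
# (law: Lebesgue on [0, 1], but different from the identity) the pairing is 1/6.
pairing = sum(w * (1 - w) for w in mid) / N
print(abs(pairing - 1 / 6) < 1e-6, pairing < mc)  # True True
```

This particular $Z_{0,2}$ is the anticomonotone extreme; the strict inequality in (3.3) holds for every admissible $Z_0$ other than the (excluded) identity coupling.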
Let us summarize our findings: there are measures $\mu$ and $\nu$ on $\mathbb{R}^2$ with compact support, $\nu$ being absolutely continuous with respect to Lebesgue measure, and some $X \in L^1(\mathbb{R}^2)$ with law $\mu$, such that there is no $Z \in L^\infty(\mathbb{R}^2)$ with law $\nu$ which exposes $X$ in $C(\mu)$, let alone strongly exposes it.
References
[1] Brenier, Yann, "Polar factorization and monotone rearrangement of vector-valued functions", Comm. Pure Appl. Math. 44 (1991), 375-417
[2] Jouini, Elyès, Walter Schachermayer and Nizar Touzi, "Law invariant risk measures have the Fatou property", Adv. Math. Econ. 9 (2006), 49-71
[3] Kantorovitch, L.V., "On the transfer of masses", Doklady Akad. Nauk SSSR (1942), 113-161
[4] Kellerer, H.G., "Duality theorems for marginal problems", Z. Wahrsch. Verw. Gebiete 67 (1984), 399-432
[5] Meyer, Paul-André, "Probabilités et potentiel", Hermann (1966)
[6] Rachev, S.T. and Rüschendorf, L., "A characterization of random variables with minimum L2-distance", J. Multivariate Anal. 32 (1990), 48-54
[7] Strassen, V., "The existence of probability measures with given marginals", Ann. Math. Statist. 36 (1965), 423-439
[8] Villani, Cédric, "Topics in optimal transportation", Graduate Studies in Mathematics, American Mathematical Society (2003)
[9] Villani, Cédric, "Optimal transport, old and new", Springer Verlag (2009)