
Contents lists available at ScienceDirect

Computer Aided Geometric Design

www.elsevier.com/locate/cagd

A geometric view of optimal transportation and generative model

Na Lei a,f, Kehua Su b, Li Cui c, Shing-Tung Yau d, Xianfeng David Gu e,∗

a DUT-RUISE, Dalian University of Technology, Dalian, China
b School of Computer Science, Wuhan University, Wuhan, China
c Mathematics Department, Beijing Normal University, Beijing, China
d Mathematics Department, Harvard University, Cambridge, USA
e State University of New York at Stony Brook, Stony Brook, USA
f Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian, China

Article info

Article history:
Available online 9 November 2018

Keywords:
Optimal Mass Transportation
Monge–Ampère
GAN
Wasserstein distance

Abstract

In this work, we give a geometric interpretation to the Generative Adversarial Networks (GANs). The geometric view is based on the intrinsic relation between Optimal Mass Transportation (OMT) theory and convex geometry, and leads to a variational approach to solve the Alexandrov problem: constructing a convex polytope with prescribed face normals and volumes.

By using the optimal transportation view of the GAN model, we show that the discriminator computes the Wasserstein distance via the Kantorovich potential, and the generator calculates the transportation map. For a large class of transportation costs, the Kantorovich potential can give the optimal transportation map by a closed-form formula. Therefore, it is sufficient to solely optimize the discriminator. This shows the adversarial competition can be avoided, and the computational architecture can be simplified.

Preliminary experimental results show the geometric method outperforms the traditional Wasserstein GAN for approximating probability measures with multiple clusters in low dimensional space.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

GAN model. Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) aim at learning a mapping from a simple distribution to a given distribution. A GAN model consists of a generator G and a discriminator D, both represented as deep neural networks (DNNs). The generator captures the data distribution and generates samples; the discriminator estimates the probability that a sample came from the training data rather than from the generator. The generator and the discriminator are trained simultaneously. The competition drives both of them to improve their performance until the generated samples are indistinguishable from the genuine data samples. At the Nash equilibrium (Zhao et al., 2016), the distribution generated by G equals the real data distribution. GANs have several advantages: they can automatically generate samples and reduce the amount of real data samples needed; furthermore, GANs do not need an explicit expression of the distribution of the given data.

* Corresponding author.
E-mail addresses: [email protected] (N. Lei), [email protected] (K. Su), [email protected] (L. Cui), [email protected] (S.-T. Yau), [email protected] (X.D. Gu).

https://doi.org/10.1016/j.cagd.2018.10.005
0167-8396

Fig. 1. Wasserstein Generative Adversarial Networks (W-GAN) framework.

Fig. 2. The GAN model, OMT theory and convex geometry have intrinsic relations.

Recently, GANs have received an exploding amount of attention. For example, GANs have been widely applied to numerous computer vision tasks such as image inpainting (Pathak et al., 2016; Yeh et al., 2017; Li et al., 2017b), image super resolution (Ledig et al., 2016; Iizuka et al., 2017), semantic segmentation (Zhu and Xie, 2016; Luc et al., 2016), object detection (Radford et al., 2015; Li et al., 2017a; Wang et al., 2017), video prediction (Mathieu et al., 2015; Vondrick et al., 2016), image translation (Isola et al., 2016; Zhu et al., 2017; Dong et al., 2017; Liu et al., 2017), 3D vision (Wu et al., 2016; Park et al., 2017), face editing (Larsen et al., 2015; Liu and Tuzel, 2016; Perarnau et al., 2016; Shen and Liu, 2017; Brock et al., 2017; Shu et al., 2017; Huang et al., 2017), etc. Also, in the machine learning field, GANs have been applied to semi-supervised learning (Odena, 2016; Kumar et al., 2017; Salimans et al., 2016), clustering (Springenberg, 2016), cross domain learning (Taigman et al., 2016; Kim et al., 2017), and ensemble learning (Tolstikhin et al., 2017).

Optimal transportation view. In deep learning, the “data distribution hypothesis” is well accepted: natural data sets distribute close to low dimensional manifolds. Therefore, the central goal of deep learning is to learn these manifolds and the distributions on them. Generative models, such as Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE), achieve this by mapping a data manifold embedded in the ambient space to the low dimensional latent space and manipulating the mapping to adjust the pushforward distributions on the latent space.

Recently, Optimal Mass Transportation (OMT) theory has been applied to improve VAEs and GANs. The Wasserstein distance has been adopted by GANs as the loss function for the discriminator, as in WGAN (Arjovsky et al., 2017), WGAN-GP (Gulrajani et al., 2017) and RWGAN (Guo et al., 2017). When the supports of two distributions have no overlap, the Wasserstein distance still provides a suitable gradient for the generator to update. The Wasserstein distance in VAE is calculated using a linear programming method in Liu et al. (2018), which gives more transparent and accurate results.

Fig. 1 shows the optimal mass transportation point of view of WGAN (Arjovsky et al., 2017). The ambient image space is $\mathcal{X}$, with the real data distribution $\nu$. The latent space is $\mathcal{Z}$, with much lower dimension. The generator G can be treated as a “decoding map” from the latent space to the sample space, $g_\theta:\mathcal{Z}\to\mathcal{X}$, realized by a deep neural network with parameter $\theta$. Let $\zeta$ be a fixed distribution in the latent space, such as a uniform or Gaussian distribution. The generator G pushes $\zeta$ forward to a distribution $\mu_\theta = g_{\theta\#}\zeta$ in the ambient space $\mathcal{X}$. The discriminator D uses a power of the Euclidean distance as the cost function and computes the Wasserstein distance between $\mu_\theta$ and $\nu$, $W_c(\mu_\theta,\nu)$, realized by another deep neural network with parameter $\xi$. Calculating the Wasserstein distance $W_c(\mu_\theta,\nu)$ is equivalent to finding the so-called Kantorovich potential $\varphi_\xi$. Therefore, G improves the decoding map $g_\theta$ to approximate $\nu$ by $g_{\theta\#}\zeta$; D improves the Kantorovich potential $\varphi_\xi$ to increase the approximation accuracy of the Wasserstein distance. The generator G and the discriminator D are trained alternately until the competition reaches an equilibrium.

In summary, the Generative Adversarial Network (GAN) model has a natural connection with Optimal Mass Transportation (OMT) theory:

1. In the generator G, the generating map $g_\theta$ in GAN is equivalent to the optimal transportation map in OMT;

2. In the discriminator D, computing the Wasserstein distance between distributions is equivalent to finding the Kantorovich potential $\varphi_\xi$;

3. The alternating training process of W-GAN is the min-max optimization of expectations:

$$\min_\theta \max_\xi \; E_{z\sim\zeta}\big(\varphi_\xi(g_\theta(z))\big) + E_{y\sim\nu}\big(\varphi_\xi^c(y)\big).$$

The deep neural network of D computes the Wasserstein distance by the maximization process, approximating the Kantorovich potential $\varphi_\xi$, parameterized by $\xi$; the network G calculates the optimal transportation map $g_\theta$ by the minimization process, parameterized by $\theta$ (see Fig. 1).

The GAN model and convex geometry are connected by optimal transportation theory (see Fig. 2).

Geometric interpretation. Optimal Mass Transportation theory has intrinsic connections with convex geometry. A special type of OMT problem is equivalent to the Alexandrov problem in convex geometry: specifically, finding the optimal transportation map with L2 cost is equivalent to constructing a convex polytope with user prescribed normals and face volumes. The geometric view leads to a practical algorithm, which finds the generating map by a convex optimization. Furthermore, the optimization can be carried out using Newton’s method with explicit geometric meaning. The geometric interpretation also gives the direct relation between the transportation map $g_\theta$ for G and the Kantorovich potential $\varphi_\xi$ for D.

These concepts can be explained using the plain language of computational geometry (Edelsbrunner, 1987):

1. the Kantorovich potential $\varphi_\xi$ corresponds to the power distance;

2. the optimal transportation map $g_\theta$ represents the mapping from the power diagram to the power centers; each power cell is mapped to the corresponding site.

Imaginary adversary. In the current work, we use Optimal Mass Transportation theory to show that, by carefully designing the model and choosing special distance functions $c$, the generator map $g_\theta$ and the discriminator function (Kantorovich potential) $\varphi_\xi$ are equivalent: one can be deduced from the other by a simple closed formula. Therefore, once the Kantorovich potential reaches the optimum, the generator map can be obtained directly without training. One of the deep neural networks for G or D is redundant, and one of the training processes is wasteful. The competition between the generator G and the discriminator D is unnecessary, and imaginary.

Contributions. The major contributions of the current work are as follows:

1. Based on the connection between convex geometry and Optimal Transportation, develop an explicit geometric construction of the optimal transportation map for the purpose of Generative Adversarial Networks;

2. Demonstrate in Theorem 3.7 that if the cost function is $c(x,y)=h(x-y)$, where $h$ is a strictly convex function, then once the optimal discriminator is obtained, the generator can be written down in an explicit formula. In this situation, the competition between the discriminator and the generator is unnecessary and the computational architecture can be simplified;

3. Propose a novel framework for generative models, which uses the geometric construction of the optimal mass transportation map;

4. Conduct preliminary experiments for the proof of concept.

Organization. The article is organized as follows: section 2 explains the Optimal Mass Transportation view of WGAN in detail; section 3 lists the main theory of OMT; section 4 gives a detailed exposition of the Minkowski and Alexandrov theorems in convex geometry and their close relation with power diagram theory in computational geometry, and gives an explicit computational algorithm to solve Alexandrov’s problem; section 5 analyzes the semi-discrete optimal transportation problem and connects Alexandrov’s problem with the optimal transportation map; preliminary experiments conducted for proof of concept are reported in section 6. The work concludes in section 7.

2. Optimal transportation view of GAN

In this section, the GAN model is interpreted from the optimal transportation point of view. We show that the discriminator mainly looks for the Kantorovich potential.

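Let $\mathcal{X}\subset\mathbb{R}^n$ be the (ambient) image space and $\mathcal{P}(\mathcal{X})$ be the Wasserstein space of all probability measures on $\mathcal{X}$. Assume the real data distribution is $\nu\in\mathcal{P}(\mathcal{X})$, in practice approximated by an empirical distribution

$$\nu := \frac{1}{n}\sum_{j=1}^{n}\delta_{y_j}, \tag{1}$$

where $y_j\in\mathcal{X}$, $j=1,\dots,n$, are data samples and $\delta_{y_j}$ is the Dirac measure at $y_j$. A generative model produces a parametric family of probability distributions $\mu_\theta$, $\theta\in\Theta$. A Minimum Kantorovich Estimator for $\theta$ is defined as any solution to the problem

$$\min_\theta W_c(\mu_\theta,\nu),$$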

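where $W_c$ is the Kantorovich cost on $\mathcal{P}(\mathcal{X})$ for the ground cost function $c:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$. When $c$ is a power of the Euclidean distance, $W_c$ is the Wasserstein distance between $\mu$ and $\nu$,

$$W_c(\mu,\nu) = \min_{\rho\in\mathcal{P}(\mathcal{X}\times\mathcal{X})}\left\{\int_{\mathcal{X}\times\mathcal{X}} c(x,y)\,d\rho(x,y) \;\middle|\; \pi_{x\#}\rho=\mu,\ \pi_{y\#}\rho=\nu\right\}, \tag{2}$$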

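where $\pi_x$ and $\pi_y$ are projectors, and $\pi_{x\#}$ and $\pi_{y\#}$ are marginalization operators. In a generative model, the image samples are encoded to a low dimensional latent space (or feature space) $\mathcal{Z}\subset\mathbb{R}^m$, $m\ll n$. Let $\zeta$ be a fixed distribution supported on $\mathcal{Z}$. A Wasserstein GAN (WGAN) produces a parametric mapping $g_\theta:\mathcal{Z}\to\mathcal{X}$, which is treated as a “decoding map” from the latent space $\mathcal{Z}$ to the original image space $\mathcal{X}$. The map $g_\theta$ pushes $\zeta$ forward to $\mu_\theta\in\mathcal{P}(\mathcal{X})$, $\mu_\theta=g_{\theta\#}\zeta$. The minimal Kantorovich estimator in WGAN is formulated as

$$\min_\theta E(\theta) := \min_\theta W_c(g_{\theta\#}\zeta,\nu).$$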

According to the optimal transportation theory (Villani, 2003), the Kantorovich problem has a dual formulation

$$E(\theta) = \max_{\varphi,\psi}\left\{\int_{\mathcal{Z}}\varphi(g_\theta(z))\,d\zeta(z) + \int_{\mathcal{X}}\psi(y)\,d\nu(y) \;:\; \varphi(x)+\psi(y)\le c(x,y)\right\}. \tag{3}$$

The gradient of the dual energy with respect to $\theta$ can be written as

$$\nabla E(\theta) = \int_{\mathcal{Z}} [\partial_\theta g_\theta(z)]^T\,\nabla\varphi(g_\theta(z))\,d\zeta(z),$$

where $\varphi$ is the optimal Kantorovich potential. In practice, $\psi$ can be replaced by the c-transform of $\varphi$, defined as

$$\varphi^c(y) := \inf_x \big(c(x,y)-\varphi(x)\big).$$

The function $\varphi$ is called the Kantorovich potential. According to the optimal transportation theory (Villani, 2003, 2008), $\psi=\varphi^c$ and symmetrically $\varphi=\psi^c$. Since $\nu$ is discrete, $\psi$ is only defined on the support $Y:=\{y_i\}$ of $\nu$, and $\psi^c=\varphi$. Let $\psi(y_i)=\psi_i$; the optimization over $\{\psi_i\}$ can then be achieved using stochastic gradient descent, as in Genevay et al. (2016).

In WGAN (Arjovsky et al., 2017), the dual problem Eqn. (3) is solved by approximating the Kantorovich potential $\varphi$ by the so-called “adversarial” map $\varphi_\xi:\mathcal{X}\to\mathbb{R}$, which is represented by a discriminative deep network with parameter $\xi$. This leads to the Wasserstein-GAN problem

$$\min_\theta\max_\xi \int_{\mathcal{Z}}\varphi_\xi(g_\theta(z))\,d\zeta(z) + \frac{1}{n}\sum_{j=1}^{n}\varphi_\xi^c(y_j). \tag{4}$$

The generator produces $g_\theta$ and the discriminator estimates $\varphi_\xi$; through simultaneous training, the competition reaches an equilibrium. In WGAN (Arjovsky et al., 2017), $c(x,y)=|x-y|$, and the c-transform of $\varphi_\xi$ equals $-\varphi_\xi$, subject to $\varphi_\xi$ being a 1-Lipschitz function. This is used to replace $\varphi_\xi^c$ by $-\varphi_\xi$ in Eqn. (4), with a deep network made of ReLU units whose Lipschitz constant is upper-bounded by 1.

3. Optimal mass transport theory

In this section, we give a brief review of classical optimal mass transportation theory for engineering purposes, neglecting the technical and delicate aspects of the theory, such as the conditions for the existence of a feasible plan, the existence of optimal Kantorovich potentials and their regularity. For more rigorous and thorough treatments, we refer readers to Villani’s books (Villani, 2003, 2008). Theorem 3.7 shows the intrinsic relation between the Wasserstein distance (equivalent to the Kantorovich potential) and the optimal transportation map (equivalent to the Brenier potential). This demonstrates that once the optimal discriminator is known, the generator is automatically obtained. The game between the discriminator and the generator is unnecessary.

The problem of finding a map that minimizes the inter-domain transportation cost while preserving measure quantities was first studied by Monge in the 18th century (Bonnotte, 2012). Let $X$ and $Y$ be two metric spaces with probability measures $\mu$ and $\nu$ respectively. Assume $X$ and $Y$ have equal total measure

$$\int_X d\mu = \int_Y d\nu.$$

Definition 3.1 (Push-forward measure). Let a map $T:X\to Y$ be given. If for any measurable set $B\subset Y$,

$$\mu(T^{-1}(B)) = \nu(B), \tag{5}$$

then $\nu$ is said to be the push-forward of $\mu$ by $T$, and we write $\nu=T_\#\mu$.
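For finitely many atoms, Eqn. (5) can be checked directly. The following is a minimal sketch, not part of the original paper: the helper names `push_forward` and `measure` and the four-atom example are hypothetical, chosen only to illustrate the definition.

```python
from collections import defaultdict

def push_forward(mu, T):
    """Push a finite measure mu (dict: point -> mass) forward through the map T."""
    nu = defaultdict(float)
    for x, mass in mu.items():
        nu[T(x)] += mass  # all mass sitting at x is moved to T(x)
    return dict(nu)

def measure(m, B):
    """Mass that the measure m assigns to the set B."""
    return sum(mass for point, mass in m.items() if point in B)

# Hypothetical example: four atoms on the integers, folded by T(x) = |x|.
mu = {-2: 0.1, -1: 0.2, 1: 0.3, 2: 0.4}
T = abs
nu = push_forward(mu, T)

# Check Eqn. (5) on B = {1}: the preimage T^{-1}(B) is {-1, 1}.
B = {1}
preimage = {x for x in mu if T(x) in B}
print(measure(mu, preimage), measure(nu, B))
```

Both printed numbers are the mass of the preimage of B under mu and the mass of B under the push-forward, which Definition 3.1 requires to agree.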

If the mapping $T:X\to Y$ is differentiable, $X$ and $Y$ are the same Euclidean space $\mathbb{R}^d$, and $\mu$ and $\nu$ have Lebesgue densities, which are identified with the measures themselves, then Eqn. (5) can be formulated as the following Jacobian equation, $\mu(x)\,dx=\nu(T(x))\,dT(x)$,

$$\det(DT(x)) = \frac{\mu(x)}{\nu\circ T(x)}. \tag{6}$$

Let us denote the transportation cost for sending $x\in X$ to $y\in Y$ by $c(x,y)$; then the total transportation cost is given by

$$C(T) := \int_X c(x,T(x))\,d\mu(x). \tag{7}$$

Problem 3.2 (Monge’s Optimal Mass Transportation, Bonnotte, 2012). Given measures $\mu$ and $\nu$, and a transportation cost function $c:X\times Y\to\mathbb{R}$, find the transportation map $T:X\to Y$ that minimizes the total transportation cost

$$(MP)\qquad W_c(\mu,\nu) = \inf_{T:X\to Y}\left\{\int_X c(x,T(x))\,d\mu(x) \;:\; T_\#\mu=\nu\right\}. \tag{8}$$

If $c$ is a power of the Euclidean distance, the total transportation cost $W_c(\mu,\nu)$ is called the Wasserstein distance between the two measures $\mu$ and $\nu$.

3.1. Kantorovich’s approach

In the Monge formulation the infimum is not attained in general. In the 1940s, Kantorovich introduced a relaxation of Monge’s problem (Kantorovich, 1948). Any strategy for sending $\mu$ onto $\nu$ can be represented by a joint measure $\rho$ on $X\times Y$, such that for all Borel subsets $A\subset X$ and $B\subset Y$,

$$\rho(A\times Y)=\mu(A),\qquad \rho(X\times B)=\nu(B). \tag{9}$$

$\rho(A\times B)$ is called a transportation plan, which represents the share to be moved from $A$ to $B$. We denote the projections to $X$ and $Y$ as $\pi_x$ and $\pi_y$ respectively; then $\pi_{x\#}\rho=\mu$ and $\pi_{y\#}\rho=\nu$. The total cost of the transportation plan $\rho$ is

$$C(\rho) := \int_{X\times Y} c(x,y)\,d\rho(x,y). \tag{10}$$

The Monge–Kantorovich problem consists in finding the plan $\rho$, among all the suitable transportation plans with marginals $\mu$ and $\nu$, that minimizes $C(\rho)$ in Eqn. (10),

$$(KP)\qquad W_c(\mu,\nu) := \min_\rho\left\{\int_{X\times Y} c(x,y)\,d\rho(x,y) \;:\; \pi_{x\#}\rho=\mu,\ \pi_{y\#}\rho=\nu\right\}. \tag{11}$$

When $\mu$ is a diffuse measure (i.e. $\mu(\{x\})=0$ for every $x\in X$) and $c$ is continuous, the infimum of (MP) equals the minimum of (KP).
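To make the relaxation concrete, here is a minimal sketch of (KP) for two source atoms and two target atoms. It is an illustration under stated assumptions, not the authors’ method: the function `kantorovich_2x2` and the data are hypothetical. With fixed marginals, a 2×2 plan has a single free parameter $t=\rho_{11}$, and since the cost is linear in $t$, the minimum sits at an endpoint of the feasible interval.

```python
def kantorovich_2x2(mu, nu, cost):
    """Solve the 2x2 Kantorovich problem exactly by checking both endpoints."""
    a1, a2 = mu
    b1, b2 = nu
    # rho_11 = t must keep all four entries of the plan nonnegative.
    lo, hi = max(0.0, b1 - a2), min(a1, b1)

    def plan(t):
        return [[t, a1 - t], [b1 - t, a2 - b1 + t]]

    def total_cost(rho):
        return sum(cost[i][j] * rho[i][j] for i in range(2) for j in range(2))

    best = min((plan(lo), plan(hi)), key=total_cost)
    return best, total_cost(best)

# Hypothetical data: atoms at x = (0, 1) and y = (0, 1), cost |x - y|.
mu = (0.5, 0.5)
nu = (0.25, 0.75)
cost = [[0.0, 1.0], [1.0, 0.0]]
rho, w = kantorovich_2x2(mu, nu, cost)
print(rho, w)
```

In this example the first row of the optimal plan splits the mass of one source atom between both targets, which no map $T$ can do; this is the kind of situation the joint measure $\rho$ is designed to handle.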

3.2. Kantorovich dual formulation

Because Eqn. (11) is a linear program, it has a dual formulation, known as the Kantorovich dual problem (Villani, 2008):

$$(DP)\qquad W_c(\mu,\nu) := \max_{\varphi,\psi}\left\{\int_X\varphi(x)\,d\mu(x) + \int_Y\psi(y)\,d\nu(y) \;:\; \varphi(x)+\psi(y)\le c(x,y)\right\}, \tag{12}$$

where $\varphi:X\to\mathbb{R}$ and $\psi:Y\to\mathbb{R}$ are real functions defined on $X$ and $Y$ respectively. Equivalently, we can replace $\psi$ by the c-transform of $\varphi$.

Definition 3.3 (c-transform). Given a real function $\varphi:X\to\mathbb{R}$, the c-transform of $\varphi$ is defined by

$$\varphi^c(y) = \inf_{x\in X}\big(c(x,y)-\varphi(x)\big).$$
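On a finite grid the infimum in Definition 3.3 becomes a minimum, which makes the c-transform easy to evaluate. The sketch below is illustrative only: the name `c_transform`, the grid and the test potential are assumptions. It also checks the special case used by WGAN, where for $c(x,y)=|x-y|$ and a 1-Lipschitz $\varphi$ the c-transform equals $-\varphi$.

```python
def c_transform(phi, xs, ys, c):
    """Discrete c-transform: phi^c(y) = min over x in xs of c(x, y) - phi(x)."""
    return [min(c(x, y) - phi(x) for x in xs) for y in ys]

# Hypothetical 1D grid, shared by X and Y.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

def phi(x):
    return 0.5 * x  # slope 1/2, hence 1-Lipschitz

def l1_cost(x, y):
    return abs(x - y)

phi_c = c_transform(phi, grid, grid, l1_cost)

# For the L1 cost and a 1-Lipschitz potential, phi^c should coincide with -phi.
for y, value in zip(grid, phi_c):
    print(y, value, -phi(y))
```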

Then the Kantorovich problem can be reformulated as the following dual problem:

$$(DP)\qquad W_c(\mu,\nu) := \max_\varphi\left\{\int_X\varphi(x)\,d\mu(x) + \int_Y\varphi^c(y)\,d\nu(y)\right\}. \tag{13}$$

Any optimal $\varphi$ at which the maximum in Eqn. (13) is attained is called a Kantorovich potential. Since $\varphi$ and $\psi$ play symmetric roles in Eqn. (12), $\psi$ can be treated as a Kantorovich potential as well. Computing the Wasserstein distance is equivalent to finding a Kantorovich potential.

When $X=Y$, for the L1 transportation cost $c(x,y)=|x-y|$ in $\mathbb{R}^n$, if the Kantorovich potential $\varphi$ is 1-Lipschitz, then its c-transform satisfies the special relation $\varphi^c=-\varphi$. The Wasserstein distance is given by

$$W_c(\mu,\nu) := \max_\varphi\left\{\int_X\varphi(x)\,d\mu(x) - \int_Y\varphi(y)\,d\nu(y)\right\}. \tag{14}$$

For the L2 transportation cost $c(x,y)=\tfrac12|x-y|^2$ in $\mathbb{R}^n$, the c-transform and the classical Legendre transform have a special relation.

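Definition 3.4. Given a function $\varphi:\mathbb{R}^n\to\mathbb{R}$, its Legendre transform is defined as

$$\varphi^*(y) := \sup_x\big(\langle x,y\rangle - \varphi(x)\big). \tag{15}$$

We can show that the following relation holds when $c=\tfrac12|x-y|^2$:

$$\frac12|y|^2 - \varphi^c = \Big(\frac12|x|^2 - \varphi\Big)^*. \tag{16}$$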
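Relation (16) follows by expanding the square in the definition of the c-transform and collecting the terms that do not depend on $x$. A short derivation, using only Definitions 3.3 and 3.4:

```latex
\begin{align*}
\varphi^c(y)
  &= \inf_x \Big( \tfrac12 |x-y|^2 - \varphi(x) \Big) \\
  &= \tfrac12 |y|^2 + \inf_x \Big( \tfrac12 |x|^2 - \langle x, y\rangle - \varphi(x) \Big) \\
  &= \tfrac12 |y|^2 - \sup_x \Big( \langle x, y\rangle - \big( \tfrac12 |x|^2 - \varphi(x) \big) \Big) \\
  &= \tfrac12 |y|^2 - \Big( \tfrac12 |x|^2 - \varphi \Big)^{*}(y).
\end{align*}
```

Moving $\varphi^c(y)$ and the Legendre transform to opposite sides gives Eqn. (16).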

3.3. Brenier’s approach

At the end of the 1980s, Brenier (1991) discovered the intrinsic connection between the optimal mass transport map and convex geometry (see also for instance Villani, 2003, Theorem 2.12(ii) and Theorem 2.32).

Assume $X=Y=\mathbb{R}^n$ and that the function $u:X\to\mathbb{R}$ is a $C^2$ convex function, namely its Hessian matrix is positive semi-definite. Its gradient map $\nabla u:X\to Y$ is defined as $x\mapsto\nabla u(x)$.

Theorem 3.5 (Brenier, 1991). Suppose $X$ and $Y$ are the Euclidean space $\mathbb{R}^n$, and the transportation cost is the quadratic Euclidean distance $c(x,y)=|x-y|^2$. If $\mu$ is absolutely continuous and $\mu$ and $\nu$ have finite second order moments, then there exists a convex function $u:X\to\mathbb{R}$ such that the gradient map $\nabla u$ gives the unique solution to Monge’s problem, where $u$ is called Brenier’s potential and $\nabla u$ is called the Brenier map or the optimal mass transportation map. In general, $u$ is not unique.

This theorem converts Monge’s problem to solving the following Monge–Ampère partial differential equation:

$$\det\left(\frac{\partial^2 u}{\partial x_i\,\partial x_j}\right)(x) = \frac{\mu(x)}{\nu\circ\nabla u(x)}. \tag{17}$$

The function $u:X\to\mathbb{R}$ is called the Brenier potential. Brenier also proved the polar factorization theorem.

Theorem 3.6 (Brenier factorization, Brenier, 1991). Suppose $X$ and $Y$ are the Euclidean space $\mathbb{R}^n$, $\mu$ is absolutely continuous with respect to Lebesgue measure, and a mapping $\varphi:X\to Y$ pushes $\mu$ forward to $\nu$, $\varphi_\#\mu=\nu$. Then there exists a convex function $u:X\to\mathbb{R}$ such that

$$\varphi = \nabla u\circ s,$$

where $s:X\to X$ is measure-preserving, $s_\#\mu=\mu$. Furthermore, this factorization is unique.

The following theorem is well known in optimal transportation theory; the proof can be found in Villani’s book (Villani, 2003) and in the book of Ambrosio et al. (2008). We apply this theorem to Deep Learning and show that the generator and the discriminator in the WGAN model with L2 cost are equivalent. For completeness, we give the detailed proof here.

Theorem 3.7 (Generator–discriminator equivalence). Given $\mu$ and $\nu$ on a compact domain $\Omega\subset\mathbb{R}^n$, there exists an optimal transport plan $\rho$ for the cost $c(x,y)=h(x-y)$ with $h$ strictly convex. It is unique and of the form $(\mathrm{id},T)_\#\mu$, provided $\mu$ is absolutely continuous and $\partial\Omega$ is negligible. Moreover, there exists a Kantorovich potential $\varphi$, and $T$ can be represented as

$$T(x) = x - (\nabla h)^{-1}(\nabla\varphi(x)).$$

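Proof. Assume $\rho$ is the joint probability satisfying the conditions $\pi_{x\#}\rho=\mu$, $\pi_{y\#}\rho=\nu$, and $(x_0,y_0)$ is a point in the support of $\rho$. By definition $\varphi^c(y_0)=\inf_x\big(c(x,y_0)-\varphi(x)\big)$, and on the support of the optimal plan this infimum is attained at $x=x_0$, hence $\nabla_x\big(c(x,y_0)-\varphi(x)\big)\big|_{x=x_0}=0$, that is,

$$\nabla\varphi(x_0) = \nabla_x c(x_0,y_0) = \nabla h(x_0-y_0).$$

Because $h$ is strictly convex, $\nabla h$ is invertible,

$$x_0-y_0 = (\nabla h)^{-1}(\nabla\varphi(x_0)),$$

hence $y_0=x_0-(\nabla h)^{-1}(\nabla\varphi(x_0))$. □

When $c(x,y)=\tfrac12|x-y|^2$, we have

$$T(x) = x-\nabla\varphi(x) = \nabla\Big(\frac{|x|^2}{2}-\varphi(x)\Big) = \nabla u(x).$$

In this case, Brenier’s potential $u$ and Kantorovich’s potential $\varphi$ are related by

$$u(x) = \frac{|x|^2}{2}-\varphi(x). \tag{18}$$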
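As a sanity check on relation (18), consider a one-dimensional worked example that is not in the original text: $\mu=N(0,1)$, $\nu=N(m,\sigma^2)$ with $\sigma>0$, and cost $c(x,y)=\tfrac12|x-y|^2$. The increasing affine map below pushes $\mu$ forward to $\nu$ and is the gradient of a convex function, so by Theorem 3.5 it is the optimal map; the potentials then follow from Eqn. (18).

```latex
\begin{align*}
T(x) &= m + \sigma x, \\
u(x) &= m x + \tfrac{\sigma}{2} x^2, \qquad u'(x) = T(x), \quad u''(x) = \sigma > 0, \\
\varphi(x) &= \tfrac12 x^2 - u(x) = \tfrac{1-\sigma}{2} x^2 - m x, \\
x - \varphi'(x) &= x - (1-\sigma) x + m = m + \sigma x = T(x), \\
\int_{\mathbb{R}} \tfrac12 |x - T(x)|^2 \, d\mu(x)
  &= \tfrac12 \big( (1-\sigma)^2 + m^2 \big).
\end{align*}
```

The fourth line is the formula $T(x)=x-\nabla\varphi(x)$ of Theorem 3.7 for the quadratic cost: knowing the Kantorovich potential $\varphi$ is enough to write down the transport map.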

As discussed in section 2, in the Wasserstein GAN framework, the discriminator D computes the Wasserstein distance, which is equivalent to finding the Kantorovich potential $\varphi$; the generator G computes the optimal transportation map $\nabla u$, which is equivalent to finding the Brenier potential $u$. This shows that the computational results of the discriminator and the generator are closely related: the result obtained by the discriminator can be used directly by the generator, and vice versa.

Corollary 3.8. Under the conditions of the Brenier theorem, the Monge–Kantorovich problem

$$\min_\rho\left\{\int_{X\times Y} c(x,y)\,d\rho(x,y) \;:\; (\pi_x)_\#\rho=\mu,\ (\pi_y)_\#\rho=\nu\right\}$$

is equivalent to the Kantorovich dual problem,

$$\max_{\varphi,\psi}\left\{\int_X\varphi(x)\,d\mu(x) + \int_Y\psi(y)\,d\nu(y) \;:\; \varphi(x)+\psi(y)\le c(x,y)\right\}.$$

Proof. The quadratic cost of a general transport plan can be written as

$$\begin{aligned}
\frac12\int_{X\times Y}|x-y|^2\,d\rho
&= \frac12\int_{X\times Y}|x|^2\,d\rho + \frac12\int_{X\times Y}|y|^2\,d\rho - \int_{X\times Y}\langle x,y\rangle\,d\rho \\
&= \frac12\int_X|x|^2\,d\mu(x) + \frac12\int_Y|y|^2\,d\nu(y) - \int_{X\times Y}\langle x,y\rangle\,d\rho.
\end{aligned}$$

Up to subtracting the quadratic moments of $\mu$ and $\nu$, it is equivalent to minimizing the cost $c(x,y)=-\langle x,y\rangle$. The inverse $T^{-1}:Y\to X$ of the optimal transportation map $T:X\to Y$ is also optimal; by the Brenier theorem, there exists a convex function $v:Y\to\mathbb{R}$ such that $\nabla v=T^{-1}$. Furthermore, the following relations hold:

$$u(x) = \frac12|x|^2-\varphi(x),\qquad v(y) = \frac12|y|^2-\psi(y).$$

A similar decomposition holds for the dual problem:

$$\begin{aligned}
\int_X\varphi(x)\,d\mu(x) + \int_Y\psi(y)\,d\nu(y)
&= \int_X\Big(\frac12|x|^2-u(x)\Big)d\mu(x) + \int_Y\Big(\frac12|y|^2-v(y)\Big)d\nu(y) \\
&= \frac12\int_X|x|^2\,d\mu(x) + \frac12\int_Y|y|^2\,d\nu(y) - \Big(\int_X u(x)\,d\mu(x) + \int_Y v(y)\,d\nu(y)\Big) \\
&= \frac12\int_X|x|^2\,d\mu(x) + \frac12\int_Y|y|^2\,d\nu(y) - \int_{X\times Y}\big(u(x)+v(y)\big)\,d\rho.
\end{aligned}$$

From the condition

$$\varphi(x)+\psi(y) \le \frac12|x-y|^2$$

we obtain

$$u(x)+v(y) = \frac12|x|^2-\varphi(x) + \frac12|y|^2-\psi(y) \ge \langle x,y\rangle.$$

Hence

$$\int_X\varphi(x)\,d\mu(x) + \int_Y\psi(y)\,d\nu(y) \le \frac12\int_X|x|^2\,d\mu(x) + \frac12\int_Y|y|^2\,d\nu(y) - \int_{X\times Y}\langle x,y\rangle\,d\rho. \qquad\square$$

Fig. 3. Minkowski and Alexandrov theorems for convex polytopes with prescribed normals and areas.

4. Convex geometry

This section introduces the Minkowski and Alexandrov problems in convex geometry, which can be described by the Monge–Ampère equation as well. This intrinsic connection gives a geometric interpretation to the optimal mass transportation map with L2 transportation cost.

4.1. Alexandrov’s theorem

Minkowski proved the existence and the uniqueness of a convex polytope with user prescribed face normals and areas (see Fig. 3, left frame).

Theorem 4.1 (Minkowski). Suppose $n_1,\dots,n_k$ are unit vectors which span $\mathbb{R}^n$ and $\nu_1,\dots,\nu_k>0$ so that $\sum_{i=1}^k\nu_i n_i=0$. There exists a compact convex polytope $P\subset\mathbb{R}^n$ with exactly $k$ codimension-1 faces $F_1,\dots,F_k$ so that $n_i$ is the outward normal vector to $F_i$ and the volume of $F_i$ is $\nu_i$. Furthermore, such $P$ is unique up to parallel translation.

Minkowski’s proof is variational and suggests an algorithm to find the polytope. The Minkowski theorem for unbounded convex polytopes was considered and solved by A.D. Alexandrov and his student A. Pogorelov. In his book on convex polyhedra (Alexandrov, 2005), Alexandrov proved the following fundamental theorem (Theorem 7.3.2 and Theorem 6.4.2 in Alexandrov, 2005):

Theorem 4.2 (Alexandrov, 2005). Suppose $\Omega$ is a compact convex polytope with non-empty interior in $\mathbb{R}^n$, $n_1,\dots,n_k\subset\mathbb{R}^{n+1}$ are $k$ distinct unit vectors whose $(n+1)$-th coordinates are negative, and $\nu_1,\dots,\nu_k>0$ so that $\sum_{i=1}^k\nu_i=\mathrm{vol}(\Omega)$. Then there exists a convex polytope $P\subset\mathbb{R}^{n+1}$ with exactly $k$ codimension-1 faces $F_1,\dots,F_k$, so that $n_i$ is the normal vector to $F_i$ and the intersection between $\Omega$ and the projection of $F_i$ has volume $\nu_i$. Furthermore, such $P$ is unique up to vertical translation (see Fig. 3, right frame).

Alexandrov’s proof is based on algebraic topology and is non-constructive. Gu et al. (2016) gave a variational proof for the generalized Alexandrov theorem stated in terms of convex functions.

Given $y_1,\dots,y_k\in\mathbb{R}^n$ and $h=(h_1,\dots,h_k)\in\mathbb{R}^k$, we define a piecewise linear convex function $u_h(x)$ as

$$u_h(x) = \max_{i=1}^{k}\{\langle x,y_i\rangle + h_i\}.$$

The graph of $u_h$ is a convex polytope in $\mathbb{R}^{n+1}$; its projection induces a cell decomposition of $\mathbb{R}^n$. Each cell is a closed convex polytope,

$$W_i(h) = \{x\in\mathbb{R}^n \mid \nabla u_h(x) = y_i\}.$$

Some cells may be empty or unbounded. Given a probability measure $\mu$ defined on $\Omega$, the $\mu$-volume of $W_i(h)$ is defined as

$$w_i(h) := \mu(W_i(h)\cap\Omega) = \int_{W_i(h)\cap\Omega} d\mu.$$

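Theorem 4.3 (Gu et al., 2016). Let $\Omega$ be a compact convex domain in $\mathbb{R}^n$, $\{y_1,\dots,y_k\}$ be a set of distinct points in $\mathbb{R}^n$ and $\mu$ a probability measure on $\Omega$. Then for any $\nu_1,\dots,\nu_k>0$ with $\sum_{i=1}^k\nu_i=\mu(\Omega)$, there exists $h=(h_1,\dots,h_k)\in\mathbb{R}^k$, unique up to adding a constant $(c,\dots,c)$, so that $w_i(h)=\nu_i$ for all $i$. The vectors $h$ are exactly the maximum points of the concave function

$$E(h) = \sum_{i=1}^{k} h_i\nu_i - \int_0^h\sum_{i=1}^{k} w_i(\eta)\,d\eta_i \tag{19}$$

on the open convex set

$$H = \{h\in\mathbb{R}^k \mid w_i(h)>0,\ \forall i\}.$$

Furthermore, $\nabla u_h$ minimizes the quadratic cost

$$\int_\Omega |x-T(x)|^2\,d\mu(x)$$

among all transport maps $T_\#\mu=\nu$, where the Dirac measure is $\nu=\sum_{i=1}^k\nu_i\,\delta(y-y_i)$.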

For the convenience of discussion, we define Alexandrov’s potential as follows:

Definition 4.4 (Alexandrov potential). Under the above condition, the convex function

$$A(h) = \int^h\sum_{i=1}^{k} w_i(\eta)\,d\eta_i \tag{20}$$

is called the Alexandrov potential.

We define the admissible space for the height vector $h$,

$$H := \{h\in\mathbb{R}^k \mid w_i(h)>0\} \cap \Big\{\sum_{i=1}^{k} h_i = 0\Big\}.$$

By the Brunn–Minkowski inequality, we can show that $H$ is a convex open set in $\mathbb{R}^k$. From Eqn. (24), it is easy to show the following symmetric relation:

$$\frac{\partial w_i(h)}{\partial h_j} = \frac{\partial w_j(h)}{\partial h_i},$$

therefore the differential form

$$\omega := \sum_{i=1}^{k} w_i(h)\,dh_i$$

is a closed one-form. Because $H$ is simply connected, $\omega$ is exact, hence the Alexandrov potential Eqn. (20) is well-defined.

The geometric meaning of $A(h)$ is the volume under the upper envelope. Details can be found in Gu et al. (2016).

Fig. 4. Geometric interpretation of the optimal transport map: Brenier potential $u_h:\Omega\to\mathbb{R}$, Legendre dual $u_h^*$, optimal transportation map $\nabla u_h: W_i(h)\to y_i$, power diagram $V$, weighted Delaunay triangulation $T$.

4.2. Power diagram

Alexandrov’s theorem is closely related to the conventional power diagram. We can use a power diagram algorithm to solve Alexandrov’s problem.

Definition 4.5 (power distance). Given a point $y_i\in\mathbb{R}^n$ with a power weight $\psi_i$, the power distance is given by

$$\mathrm{pow}(x,y_i) = \frac12|x-y_i|^2 - \frac12\psi_i.$$

Definition 4.6 (power diagram). Given weighted points $\{(y_1,\psi_1),(y_2,\psi_2),\dots,(y_k,\psi_k)\}$, the power diagram is the cell decomposition of $\mathbb{R}^n$, denoted as $V(\psi)$,

$$\mathbb{R}^n = \bigcup_{i=1}^{k} W_i(\psi),$$

where each cell is a convex polytope

$$W_i(\psi) = \{x\in\mathbb{R}^n \mid \mathrm{pow}(x,y_i)\le\mathrm{pow}(x,y_j),\ \forall j\}.$$

The weighted Delaunay triangulation, denoted as $T(\psi)$, is the Poincaré dual to the power diagram: if $W_i(\psi)\cap W_j(\psi)\neq\emptyset$ then there is an edge connecting $y_i$ and $y_j$ in the weighted Delaunay triangulation (see Fig. 5).

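Note that $\mathrm{pow}(x,y_i)\le\mathrm{pow}(x,y_j)$ is equivalent to

$$\langle x,y_i\rangle + \frac12\big(\psi_i-|y_i|^2\big) \ge \langle x,y_j\rangle + \frac12\big(\psi_j-|y_j|^2\big).$$

Let

$$h_i = \frac12\big(\psi_i-|y_i|^2\big), \tag{21}$$

then $\mathrm{pow}(x,y_i)\le\mathrm{pow}(x,y_j)$ is equivalent to $\langle x,y_i\rangle+h_i\ge\langle x,y_j\rangle+h_j$. The upper envelope of the planes $\{\langle x,y_i\rangle+h_i=0\}$ is the graph of the convex function

$$u_h(x) = \max_i\{\langle x,y_i\rangle + h_i\}. \tag{22}$$

The projection of the graph of $u_h$ gives the power diagram $V(\psi)$.

Fig. 5. Power diagram (blue) and its dual weighted Delaunay triangulation (black); the power weight $\psi_i$ equals the square of the radius $r_i$ (red circle). (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)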
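The equivalence between power cells and the upper envelope in Eqns. (21) and (22) can be checked numerically. The following is a small sketch under stated assumptions: the power distance is taken as $\tfrac12|x-y_i|^2-\tfrac12\psi_i$, and the sites, weights, sample grid and function names are all hypothetical.

```python
def power_cell(x, sites, weights):
    """Index i minimizing pow(x, y_i) = 1/2 |x - y_i|^2 - 1/2 psi_i (assumed form)."""
    def pow_dist(y, psi):
        return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) - 0.5 * psi
    values = [pow_dist(y, psi) for y, psi in zip(sites, weights)]
    return values.index(min(values))

def envelope_cell(x, sites, weights):
    """Index i maximizing <x, y_i> + h_i with h_i = 1/2 (psi_i - |y_i|^2), Eqn. (21)."""
    def plane(y, psi):
        h = 0.5 * (psi - sum(b * b for b in y))
        return sum(a * b for a, b in zip(x, y)) + h
    values = [plane(y, psi) for y, psi in zip(sites, weights)]
    return values.index(max(values))

# Hypothetical sites and power weights in the plane.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = [0.0, 0.3, -0.1]

# Sample points on a coarse grid covering [-1, 2] x [-1, 2].
samples = [(i / 4.0, j / 4.0) for i in range(-4, 9) for j in range(-4, 9)]
agree = all(
    power_cell(x, sites, weights) == envelope_cell(x, sites, weights)
    for x in samples
)
print(agree)
```

The two labelings differ only by the term $\tfrac12|x|^2$, which is common to all sites, so they select the same cell at every sample point.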

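4.3. Convex optimization

Now, we can use the power diagram to explain the gradient and the Hessian of the energy Eqn. (19). By definition,

$$\nabla E(h) = \big(\nu_1-w_1(h),\ \nu_2-w_2(h),\ \dots,\ \nu_k-w_k(h)\big)^T. \tag{23}$$

The Hessian matrix is given by the power diagram and the weighted Delaunay triangulation. Since $\nabla E=\nu-w$, the Hessian of $E$ is the negative of the Jacobian of $w$. For adjacent cells in the power diagram,

$$\frac{\partial^2 E(h)}{\partial h_i\,\partial h_j} = -\frac{\partial w_i(h)}{\partial h_j},\qquad \frac{\partial w_i(h)}{\partial h_j} = -\frac{\mu(W_i(h)\cap W_j(h))}{|y_j-y_i|}. \tag{24}$$

Suppose edge $e_{ij}$ is in the weighted Delaunay triangulation, connecting $y_i$ and $y_j$. It has a unique dual cell $D_{ij}$ in the power diagram; then

$$\frac{\partial w_i(h)}{\partial h_j} = -\frac{\mu(D_{ij})}{|e_{ij}|},$$

the volume ratio between the dual cells. The diagonal element in the Hessian is

$$\frac{\partial^2 E(h)}{\partial h_i^2} = -\frac{\partial w_i(h)}{\partial h_i},\qquad \frac{\partial w_i(h)}{\partial h_i} = -\sum_{j\neq i}\frac{\partial w_i(h)}{\partial h_j}. \tag{25}$$

Therefore, in order to solve Alexandrov’s problem of constructing the convex polytope with user prescribed normals and face volumes, we can optimize the energy in Eqn. (19) using the classical Newton’s method directly.

Let us observe the convex function $u_h^*$: its graph is the convex hull $C(h)$. Then the discrete Hessian determinant of $u_h^*$ assigns to each vertex $v$ of $C(h)$ the volume of the convex hull of the gradients of $u_h^*$ at the top-dimensional cells adjacent to $v$.

Therefore, solving Alexandrov’s problem is equivalent to solving a discrete Monge–Ampère equation.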
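The optimization of the energy in Eqn. (19) can be sketched in one dimension. This is a simplified illustration, not the paper’s implementation: plain gradient ascent stands in for Newton’s method, $\mu$ is taken as the uniform measure on $[0,1]$ approximated by equally weighted sample points, and the sites, target masses, step size and function names are assumptions. The update uses the gradient $\nu_i-w_i(h)$ from Eqn. (23).

```python
def cell_volumes(h, sites, samples):
    """mu-volume w_i(h) of each cell W_i(h), with mu approximated by equal-weight samples."""
    counts = [0] * len(sites)
    for x in samples:
        # x lies in the cell whose plane <x, y_i> + h_i is highest (Eqn. (22)).
        values = [x * y + hi for y, hi in zip(sites, h)]
        counts[values.index(max(values))] += 1
    return [c / len(samples) for c in counts]

def solve_heights(sites, nu, samples, step=0.1, iterations=300):
    """Gradient ascent on E(h), using grad E(h) = nu - w(h) from Eqn. (23)."""
    h = [-0.5 * y * y for y in sites]  # psi = 0, i.e. start from the Voronoi diagram
    for _ in range(iterations):
        w = cell_volumes(h, sites, samples)
        h = [hi + step * (ni - wi) for hi, ni, wi in zip(h, nu, w)]
    return h

n_samples = 1000
samples = [(i + 0.5) / n_samples for i in range(n_samples)]  # uniform mu on [0, 1]
sites = [0.1, 0.5, 0.9]   # hypothetical target points y_i
nu = [0.2, 0.3, 0.5]      # hypothetical prescribed cell masses

h = solve_heights(sites, nu, samples)
w = cell_volumes(h, sites, samples)
print(h, w)
```

Because the cell volumes are computed from samples, they only match the prescribed masses up to the sampling resolution. A Newton iteration would additionally use the Hessian of Eqns. (24) and (25), which in this 1D setting is tridiagonal.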

5. Semi-discrete optimal mass transport

In this section, we solve the semi-discrete optimal transportation problem from a geometric point of view. This special case is useful in practice.

Suppose the support of $\mu$, denoted $\Omega$, is a compact convex subset of the Euclidean space $X$,

$$\Omega = \operatorname{supp}\mu = \overline{\{x\in X \mid \mu(x)>0\}}.$$

The space $Y$ is discretized to $Y=\{y_1,y_2,\dots,y_k\}$ with Dirac measure $\nu=\sum_{j=1}^k\nu_j\delta_{y_j}$. The total masses are equal,

$$\int_\Omega d\mu(x) = \sum_{i=1}^{k}\nu_i.$$

5.1. Kantorovich dual approach

We define the discrete Kantorovich potential $\psi:Y\to\mathbb{R}$, $\psi_j:=\psi(y_j)$, where $Y$ is a discrete set in $X=\mathbb{R}^n$; then

$$\int_Y\psi\,d\nu = \sum_{j=1}^{k}\psi_j\nu_j. \tag{26}$$

The c-transform of $\psi$ is given by

$$\psi^c(x) = \min_{1\le j\le k}\{c(x,y_j)-\psi_j\}. \tag{27}$$

This induces a cell decomposition of $X$,

$$X = \bigcup_{i=1}^{k} W_i(\psi),$$

where each cell is given by

$$W_i(\psi) = \{x\in X \mid c(x,y_i)-\psi_i \le c(x,y_j)-\psi_j,\ 1\le j\le k\}.$$

According to the dual formulation of the Wasserstein distance Eqn. (13) and the integration Eqn. (26), we define the energy

$$E(\psi) = \int_X\psi^c\,d\mu + \int_Y\psi\,d\nu,$$

and then obtain the formula

$$E(\psi) = \sum_{i=1}^{k}\psi_i\big(\nu_i-w_i(\psi)\big) + \sum_{j=1}^{k}\int_{W_j(\psi)} c(x,y_j)\,d\mu, \tag{28}$$

where $w_i(\psi)$ is the measure of the cell $W_i(\psi)$,

$$w_i(\psi) = \mu(W_i(\psi)) = \int_{W_i(\psi)} d\mu(x). \tag{29}$$

Then the Wasserstein distance between $\mu$ and $\nu$ equals

$$W_c(\mu,\nu) = \max_\psi E(\psi).$$
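The step from the definition of $E(\psi)$ to Eqn. (28) uses only the cell decomposition: on $W_j(\psi)$ the minimum in Eqn. (27) is attained at index $j$, so $\psi^c(x)=c(x,y_j)-\psi_j$ there. Spelled out, with cell boundaries assumed to have zero $\mu$-measure:

```latex
\begin{align*}
\int_X \psi^c \, d\mu
  &= \sum_{j=1}^{k} \int_{W_j(\psi)} \big( c(x, y_j) - \psi_j \big) \, d\mu
   = \sum_{j=1}^{k} \int_{W_j(\psi)} c(x, y_j) \, d\mu - \sum_{j=1}^{k} \psi_j \, w_j(\psi), \\
E(\psi)
  &= \int_X \psi^c \, d\mu + \sum_{j=1}^{k} \psi_j \nu_j
   = \sum_{i=1}^{k} \psi_i \big( \nu_i - w_i(\psi) \big)
     + \sum_{j=1}^{k} \int_{W_j(\psi)} c(x, y_j) \, d\mu .
\end{align*}
```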

5.2. Brenier’s approach

Kantorovich’s dual approach applies to general cost functions. When the cost function is the L2 distance $c(x,y)=\tfrac12|x-y|^2$, we can apply Brenier’s approach directly.

We define a height vector $h=(h_1,h_2,\dots,h_k)\in\mathbb{R}^k$, consisting of $k$ real numbers. For each $y_i\in Y$, we construct a hyperplane defined on $X$, $\pi_i(h):\langle x,y_i\rangle+h_i=0$. We define the Brenier potential function as

$$u_h(x) = \max_{i=1}^{k}\{\langle x,y_i\rangle+h_i\}; \tag{30}$$

then $u_h(x)$ is a convex function. The graph of $u_h(x)$ is an infinite convex polyhedron with supporting planes $\pi_i(h)$. The projection of the graph induces a polygonal partition of $\Omega$,

$$\Omega = \bigcup_{i=1}^{k} W_i(h), \tag{31}$$

where each cell $W_i(h)$ is the projection of a facet of the graph of $u_h$ onto $\Omega$,

$$W_i(h) = \{x\in X \mid \nabla u_h(x)=y_i\}\cap\Omega. \tag{32}$$

The measure of $W_i(h)$ is given by

$$w_i(h) = \int_{W_i(h)} d\mu. \tag{33}$$

The convex function $u_h$ on each cell $W_i(h)$ is a linear function $\pi_i(h)$; therefore, the gradient map

$$\nabla u_h: W_i(h)\to y_i,\qquad i=1,2,\dots,k, \tag{34}$$

maps each $W_i(h)$ to a single point $y_i$. According to Alexandrov’s theorem and the Gu–Luo–Yau theorem, we obtain the following corollary:
