model A geometric view of optimal transportation and generative Computer Aided Geometric Design

(1)

Contents lists available atScienceDirect

Computer Aided Geometric Design

www.elsevier.com/locate/cagd

A geometric view of optimal transportation and generative model

Na Lei

^a^,^f

, Kehua Su

^b

, Li Cui

^c

, Shing-Tung Yau

^d

, Xianfeng David Gu

^e^,∗

aDUT-RUISE,DalianUniversityofTechnology,Dalian,China bSchoolofComputeScience,Wuhanuniversity,Wuhan,China cMathematicsDepartment,BeijingNormalUniversity,Beijing,China dMathematicsDepartment,HarvardUniversity,Cambridge,USA eStateUniversityofNewYorkatStonyBrook,StonyBrook,USA

fKeyLaboratoryforUbiquitousNetworkandServiceSoftwareofLiaoningProvince,Dalian,China

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Availableonline9November2018

Keywords:

OptimalMassTransportation Monge–Ampere

GAN

Wassersteindistance

Inthiswork,wegiveageometricinterpretationto theGenerativeAdversarialNetworks (GANs). The geometric view is based on the intrinsic relation between Optimal Mass Transportation (OMT)theory and convexgeometry, and leads toa variationalapproach to solve the Alexandrov problem: constructing a convexpolytope with prescribed face normalsandvolumes.

ByusingtheoptimaltransportationviewofGAN model,weshowthatthediscriminator computestheWassersteindistanceviatheKantorovichpotential,thegeneratorcalculates thetransportationmap.Foralargeclassoftransportationcosts,theKantorovichpotential cangivetheoptimaltransportationmapbyaclose-formformula.Therefore,itissuﬃcient to solely optimize the discriminator. This shows the adversarial competition can be avoided,andthecomputationalarchitecturecanbesimpliﬁed.

Preliminaryexperimentalresultsshowthegeometricmethodoutperformsthetraditional Wasserstein GAN for approximating probability measures with multiple clusters inlow dimensionalspace.

©²⁰¹⁸^Elsevier^B.V.^All^rights^reserved.

1. Introduction

GANmodel GenerativeAdversarial Networks (GANs) (Goodfellow etal., 2014) aim atlearning a mapping froma simple distribution toa givendistribution. AGAN modelconsistsof a generator G anda discriminator D, both arerepresented as deepneural networks(DNNs). The generator captures the datadistribution and generates samples, the discriminator estimates the probability that a sample came from the training data ratherthan the generator. Both the generator and thediscriminatorare trainedsimultaneously. Thecompetitiondrivesbothofthemtoimprovetheirperformance untilthe generatedsamples are indistinguishablefromthe genuine datasamples.At theNash equilibrium (Zhao etal., 2016),the distribution generated by G equals to the real data distribution. GANs have several advantages: they can automatically generatesamplesandreduce theamountofrealdatasamples;furthermore,GANs donotneed theexplicitexpressionof thedistributionofgivendata.

*

Correspondingauthor.

E-mailaddresses:[email protected](N. Lei),[email protected](K. Su),[email protected](L. Cui),[email protected](S.-T. Yau),[email protected] (X.D. Gu).

https://doi.org/10.1016/j.cagd.2018.10.005 0167-8396/©²⁰¹⁸^Elsevier^B.V.^All^rights^reserved.

(2)

Fig. 1.Wasserstein Generative Adversarial Networks (W-GAN) framework.

Fig. 2.The GAN model, OMT theory and convex geometry has intrinsic relations.

Recently, GANs receive an exploding amount of attention. For examples, GANs have been widely applied to numer- ous computer visiontasks such asimage inpainting (Pathak etal., 2016; Yeh etal., 2017; Lietal., 2017b), image super resolution (Ledig etal., 2016; Iizukaetal.,2017),semanticsegmentation (ZhuandXie,2016; Lucetal., 2016), objectde- tection (Radford etal., 2015; Lietal., 2017a; Wangetal., 2017), videoprediction (Mathieu etal., 2015; Vondricketal., 2016),imagetranslation (Isolaetal.,2016;Zhuetal.,2017;Dongetal.,2017;Liuetal.,2017),3Dvision (Wuetal.,2016;

Park etal., 2017), faceediting (Larsenetal., 2015; LiuandTuzel,2016;Perarnau etal.,2016;Shen andLiu,2017; Brock et al., 2017; Shuet al., 2017; Huang etal., 2017),etc. Also, in machinelearning ﬁeld,GANs have beenapplied tosemi- supervisedlearning (Odena,2016;Kumaretal.,2017;Salimansetal.,2016),clustering (Springenberg,2016),crossdomain learning (Taigmanetal.,2016;Kimetal.,2017),andensemblelearning (Tolstikhinetal.,2017).

Optimaltransportationview In deep learning, the “data distribution hypothesis” is well accepted: natural data sets dis- tribute close to low dimensional manifolds.Therefore, the central goalof deep learning is to learn thesemanifolds and thedistributionsonthem.Generativemodels,suchasGenerativeAdversarialNetworks(GAN)andVariationalAutoencoders (VAE), achievethisbymappinga datamanifoldembedded intheambientspaceto thelowdimensionallatentspaceand manipulatingthemappingtoadjustthepushforwarddistributionsonthelatentspace.

Recently, Optimal Mass Transportation (OMT) theory has been applied to improve VAEs andGANs. The Wasserstein distance has been adapted by GANs as the loss function as the discriminator, such as WGAN (Arjovsky et al., 2017), WGAN-GP (Gulrajani et al., 2017) and RWGAN (Guoetal., 2017). Whenthe supports oftwo distributions have noover- lap,Wassersteindistancestillprovidesasuitable gradientforthegeneratortoupdate.TheWassersteindistanceinVAEis calculatedusinglinearprogrammingmethodinLiuetal.(2018),whichgivesmoretransparentandaccurateresults.

Fig.1showstheoptimalmasstransportation pointofview ofWGAN (Arjovsky etal., 2017). Theambientimagespace isX^,^with^the^real^datadistribution

ν

.ThelatentspaceisZ ^with^much^lower^dimension.^The^generator ^G^can^be^treated as a “decoding map” from the latent space to the sample space, g_θ:Z→X^, ^realized ^by ^a ^deep ^neural ^network ^with parameter θ.Letζ be a ﬁxed distributionin thelatent space, such asuniformdistribution orGaussian distribution.The generator G pushesforwardζ toa distribution

μ

θ =^gθ#ζ inthe ambientspaceX^.^The discriminator D uses thepower ofEuclideandistanceasthecostfunction andcomputestheWassersteindistancebetween

μ

θ and

ν

, Wc(

μ

θ,

ν

),realized byanotherdeepneuralnetworkwithparameterξ.CalculatingtheWassersteindistanceWc(

μ

θ,

ν

)isequivalenttoﬁnding theso-calledKantorovichpotential

ϕ

ξ.Therefore,G improvesthedecodingmapg_θ toapproximate

ν

by g_θ#ζ; D improves theKantorovich potential

ϕ

ξ toincreasetheapproximationaccuracytotheWassersteindistance.Thegenerator Gandthe discriminator D aretrainedalternatively,untilthecompetitionreachesanequilibrium.

Insummary,theGenerativeAdversarialNetworkmodel(GAN)hasnaturalconnectionwiththeOptimalMassTransporta- tion(OMT)theory:

1. IngeneratorG,thegeneratingmap g_θ inGANisequivalenttotheoptimaltransportationmapinOMT;

2. Indiscriminator D,theWassersteindistancebetweendistributionsisequivalenttotheKantorovichpotential

ϕ

ξ. 3. ThealternativetrainingprocessofW-GANisthemin-maxoptimizationofexpectations:

minθ max

ξ

E

z∼ζ

( ϕ

ξ

(

gθ

(

z

))) + E

y∼ν

( ϕ

_ξ^c

(

y

)).

The deepneural networkof D computesthe Wassersteindistanceby the maximizationprocess toapproximate Kan- torovich potentials

ϕ

ξ, parameterized by ξ; the network G calculates the optimal transportation map g_θ by the minimizationprocess,parameterizedbyθ (seeFig.1).

TheGANmodelandtheconvexgeometryareconnectedbytheoptimaltransportationtheory(seeFig.2).

(3)

Geometricinterpretation The Optimal MassTransportation theory hasintrinsic connections withthe convex geometry.A specialtypeofOMT problemisequivalenttotheAlexandrovprobleminconvexgeometry,specifically,findingtheoptimal transportation map with L² cost is equivalent to constructing a convexpolytope withuser prescribed normals andface volumes.The geometricviewleads toapracticalalgorithm,whichfindsthegeneratingmap gθ byaconvexoptimization.

Furthermore,theoptimizationcanbecarriedout usingNewton’smethodwithexplicitgeometric meaning.Thegeometric interpretation also gives the direct relation betweenthe transportation map g_θ for G and the Kantorovich potential

ϕ

ξ forD.

Theseconceptscanbeexplainedusingtheplainlanguageincomputationalgeometry(Edelsbrunner,1987), 1. theKantorovichpotential

ϕ

ξ correspondstothepowerdistance;

2. theoptimaltransportationmap g_θ representsthemappingfromthepowerdiagramtothepowercenters,eachpower cellismappedtothecorrespondingsite.

Imaginaryadversary Inthecurrentwork, weuse OptimalMass Transportationtheory to showthe factthat: by carefully designingthemodelandchoosingspecialdistancefunctionsc,thegeneratormap g_θ andthediscriminator function(Kan- torovichpotential)

ϕ

ξ areequivalent, onecanbe deducedfromtheother bya simpleclosedformula.Therefore,oncethe Kantorovichpotential reachestheoptimum,thegeneratormapcanbe obtaineddirectlywithouttraining.Oneofthedeep neuralnetworksforGorD isredundant,oneofthetrainingprocessesiswasteful.Thecompetitionbetweenthegenerator G andthediscriminator D isunnecessary,andimaginary.

Contributions Themajorcontributionsofthecurrentworkareasfollows:

1. BasedontheconnectionbetweenconvexgeometryandOptimalTransportation,developanexplicitgeometricconstruc- tionforoptimaltransportationmapforthepurposeofGenerativeAdversarialNetworks;

2. Demonstrate in Theorem 3.7that if thecost function c(x,y)=^h(x−^y),where h is a strictly convexfunction, then oncetheoptimaldiscriminatorisobtained,thegeneratorcanbewrittendowninanexplicitformula.Inthissituation, thecompetitionbetweenthediscriminatorandthegeneratorisunnecessaryandthecomputationalarchitecturecanbe simpliﬁed;

3. Proposeanovelframeworkforgenerativemodel,whichusesgeometricconstructionoftheoptimalmasstransportation map;

4. Conductpreliminaryexperimentsfortheproofofconcepts.

Organization The articleis organized asfollows: section 2 explains the Optimal Mass Transportation view of WGANin details; section 3 lists the main theory of OMT; section 4 gives the detailed exposition of Minkowski and Alexandrov theorems in convex geometry, andits close relation with power diagram theory in computational geometry, an explicit computational algorithm is givento solve Alexandrov’sproblem; section 5 analyzes semi-discrete optimal transportation problem, andconnectsAlexandrov problemwiththeoptimaltransportation map;preliminaryexperimentsare conducted forproofofconcept,whicharereportedinsection6.Theworkconcludesinthesection7.

2. OptimaltransportationviewofGAN

Inthissection,theGANmodelisinterpretedfromtheoptimaltransportationpointofview.Weshowthatthediscrimi- natormainlylooksfortheKantorovichpotential.

LetX⊂Rⁿ^be^the^(ambient)^image^space,P(X)betheWassersteinspaceofallprobabilitymeasuresonX^.^Assume^the realdatadistributionis

ν

_∈P(X),inpracticeapproximatedbyanempiricaldistribution

ν :=

¹ n

n j=¹

δ

yj

,

(1)

where y_j∈X,j=¹,. . . ,n aredatasamples,δyj istheDiracfunction.Agenerativemodel producesaparametric familyof probabilitydistributions

μ

θ,θ∈,aMinimumKantorovitchEstimatorforθ isdeﬁnedasanysolutiontotheproblem

min

θ W_c

( μ

θ

, ν ),

where W_c is the Kantorovich cost on P(X) andthe groundcost function c:X×X→R^. ^When ^c îs â ^power ôf ^the Euclideandistance,Wc istheWassersteindistancebetween

μ

and

ν

,

W_c

( μ , ν ) =

^min

ρ∈P(X×X)

⎧ ⎨

⎩

X×X

c

(

x

,

y

)

d

ρ (

x

,

y

)| π

x#

ρ = μ , π

y#

ρ = ν

⎫ ⎬

⎭

⁽²⁾

(4)

where

π

x and

π

y areprojectors,

π

x#and

π

y# aremarginalizationoperators.Inagenerativemodel,theimagesamplesare encoded toa low dimensional latentspace (ora feature space)Z⊂R^m^,^m^n. ^Letζ be a fixed distributionsupported onZ^.A WassersteinGAN(WGAN)producesaparametricmapping g_θ:Z→X^,^whichîs^treatedâsâ^“decoding^map”^from thelatentspaceZ ^to^theôriginalîmage^spaceX^.^gθ pushesζforwardto

μ

θ∈P(X),

μ

θ=^gθ#ζ.TheminimalKantorovich estimatorinWGANisformulatedas

min

θ E

(θ ) :=

^min

θ W_c

(

g_θ_#

ζ, ν ).

Accordingtotheoptimaltransportationtheory (Villani,2003),theKantorovichproblemhasadualformulation

E

(θ ) =

max ϕ,ψ

⎧ ⎨

⎩

Z

ϕ (

g_θ

(

z

))

d

ζ (

z

) +

X

ψ (

y

)

d

ν (

y

) : ϕ (

x

) + ψ (

y

) ≤

c

(

x

,

y

)

⎫ ⎬

⎭

⁽³⁾

Thegradientofthedualenergywithrespecttoθ canbewrittenas

∇

^E

(θ ) =

Z

[∂

θg_θ

(

z

)]

^T

∇ ϕ

(

g_θ

(

z

))

d

ζ (

z

),

where

ϕ

istheoptimalKantorovichpotential.Inpractice,ψ canbereplacedbythec-transform of

ϕ

,deﬁnedas

ϕ

^c

(

y

) :=

inf

x c

(

x

,

y

) − ϕ (

x

).

The function

ϕ

is called the Kantorovichpotential. According to the optimal transportation theory (Villani, 2003, 2008), ψ=

ϕ

^c andsymmetrically

ϕ

₌ψ^c.Since

ν

is discrete,ψ is justdeﬁnedonthe support Y:= {^yi}^of

ν

,andψ^c=

ϕ

.Let ψ(y_i)=ψi,theoptimizationover{ψi}^can^then^beâchievedûsing^stochastic^gradient^descent,âs^{in Genevay}êtâl.^(2016).

InWGAN (Arjovsky etal.,2017),thedualproblemEqn. (3) is solvedby approximatingtheKantorovich potential

ϕ

by the so-called“adversarial”map,

ϕ

ξ :X →R^,^where ξ is representedby adiscriminative deepnetwork.This leadsto the Wasserstein-GANproblem

minθ max ξ

Z

ϕ

ξ

◦

gθ

(

z

)

d

ζ (

z

) +

¹ n

n j=1

ϕ

_ξ^c

(

yj

).

(4)

The generatorproduces gθ,thediscriminatorestimates

ϕ

ξ,by simultaneoustraining,thecompetitionreachestheequilib- rium. In WGAN (Arjovskyet al., 2017), c(x,y)= |^x−^y|^,^then ^the c-transform of

ϕ

ξ equals to −

ϕ

ξ, subjectto

ϕ

ξ being a 1-Lipschitz function. Thisis used to replace

ϕ

^c_ξ by −

ϕ

ξ inEqn. (4) and usedeep network madeof ReLuunits whose Lipschitzconstantisupper-boundedby1.

3. Optimalmasstransporttheory

In this section, we give a briefreview forthe classical optimal masstransportation theory forengineering purposes, neglecting thetechnicalanddelicate aspectsofthetheory,suchastheconditionsfortheexistence ofa feasibleplan,the existence of optimal Kantorovich potentials and their regularities. Formore rigorous and thorough treatments, we refer readerstoVillani’sbooks(Villani,2003,2008).Theorem3.7showstheintrinsicrelationbetweentheWassersteindistance (equivalent to the Kantorovich potential) and the optimal transportation map (equivalent to the Brenier potential), this demonstratesthatoncetheoptimaldiscriminatorisknown,thegeneratorisautomaticallyobtained.Thegamebetweenthe discriminatorandthegeneratorisunnecessary.

The problemofﬁndinga mapthatminimizestheinter-domaintransportationcost whilepreservesmeasurequantities was ﬁrststudiedbyBonnotte(2012) in the18thcentury.Let X andY betwometricspaceswithprobabilitymeasures

μ

and

ν

respectively.Assume X andY haveequaltotalmeasure

X

d

μ =

Y

d

ν .

Definition3.1(Push-forwardmeasure).AmapT:^X→^Y îs^given,îf^forâny^measurable^set^B⊂^Y^,

μ (

T⁻¹

(

B

)) = ν (

B

),

(5)

then

ν

issaidtobethepush-forwardof

μ

byT,andwewrite

ν

=^T#

μ

.

(5)

IfthemappingT:^X→^Y ^isdifferentiable, X andY arethesameEuclideanspaceR^d^,

μ

and

ν

haveLebesguedensities, which are identiﬁedwith themeasures themselves, then Eqn. (5) canbe formulated asthefollowing Jacobian equation,

μ

(x)dx=

ν

(T(x))dT(x), det

(

D T

(

x

)) = μ (

x

)

ν _◦

T

(

x

) .

(6)

Letusdenotethetransportationcostforsending x∈^X ^to ^y∈^Y ^by^c(x,y),thenthetotaltransportationcostisgivenby C

(

T

) :=

X

c

(

x

,

T

(

x

))

d

μ (

x

).

(7)

Problem3.2(Monge’sOptimalMassTransportation, Bonnotte,2012).Givenmeasures

μ

and

ν

,andatransportationcostfunc- tionc:^X×^Y→R^,^ﬁnd^thetransportationmapT:^X→^Y ^that^minimizes^the^totaltransportationcost

(

M P

)

Wc

( μ , ν ) =

inf

T:X→Y

⎧ ⎨

⎩

X

c

(

x

,

T

(

x

))

d

μ (

x

) :

T#

μ = ν

⎫ ⎬

⎭ .

(8) If c is the power of the Euclidean distance, the total transportation cost Wc(

μ

,

ν

) is called the Wassersteindistance betweenthetwomeasures

μ

and

ν

.

3.1. Kantorovich’sapproach

IntheMongeformulationtheinﬁmumisnotattainedingeneral.Inthe1940s,Kantorovichintroducedtherelaxationof Monge’sproblem (Kantorovich,1948).Anystrategysending

μ

onto

ν

canberepresentedby ajointmeasure

ρ

on X×^Y^, suchthatforevery A,BBorelsubsetsofX,Y respectively,

ρ (

A

×

^Y

) = μ (

A

), ρ (

X

×

^B

) = ν (

B

),

(9)

ρ

(A×^B)iscalledatransportationplan,whichrepresentsthesharetobemovedfrom Ato B.Wedenotetheprojectionto X andY as

π

x and

π

yrespectively,then

π

x#

ρ

=

μ

and

π

y#

ρ

=

ν

.Thetotalcostofthetransportationplan

ρ

is

C

( ρ ) :=

X×^Y

c

(

x

,

y

)

d

ρ (

x

,

y

).

(10)

The Monge–Kantorovichproblemconsistsinﬁnding the

ρ

, amongallthe suitable transportationplans withmarginals

μ

and

ν

,minimizingC(

ρ

)inEqn. (10),

(

K P

)

W_c

( μ , ν ) :=

^min

ρ

⎧ ⎨

⎩

X×^Y

c

(

x

,

y

)

d

ρ (

x

,

y

) : π

x#

ρ = μ , π

y#

ρ = ν

⎫ ⎬

⎭

⁽¹¹⁾

When

μ

is adiffusemeasure (i.e.

μ

({^x})=^{0 for}êvery ^x∈^X⁾ ând^c îscontinuous, the infimumof(MP)isequals tothe minimumof(KP).

3.2. Kantorovichdualformulation

BecauseEqn. (11) isalinearprogram,ithasadualformulation,knownastheKantorovichproblem(Villani,2008):

(

D P

)

W_c

( μ , ν ) :=

^max

ϕ,ψ

⎧ ⎨

⎩

X

ϕ (

x

)

d

μ (

x

) +

Y

ψ (

y

)

d

ν (

y

) : ϕ (

x

) + ψ (

y

) ≤

^c

(

x

,

y

)

⎫ ⎬

⎭

⁽¹²⁾

where

ϕ

_:X→Rândψ:^Y→Râre^real^functions^definedôn ^X ând^Y respectively.Equivalently,wecanreplaceψ bythe c-transformof

ϕ

.

Deﬁnition3.3(c-transform).Givenarealfunction

ϕ

_:X→R^,^thec-transformof

ϕ

isdeﬁnedby

ϕ

^c

(

y

) =

^inf

x∈^X

(

c

(

x

,

y

) − ϕ (

x

)) .

(6)

ThentheKantorovichproblemcanbereformulatedasthefollowingdualproblem:

(

D P

)

Wc

( μ , ν ) :=

^max

ϕ

⎧ ⎨

⎩

X

ϕ (

x

)

d

μ (

x

) +

Y

ϕ

^c

(

y

)

d

ν (

y

)

⎫ ⎬

⎭ ,

(13) anyoptimal

ϕ

wherethe maximumisattainedinEqn. (13) iscalleda Kantorovichpotential.

ϕ

andψ playsa symmetric role inEqn. (12),ψ canbetreatedastheKantorovichpotentialaswell.ComputingtheWassersteindistanceisequivalent toﬁndingaKantorovichpotential.

When X=^Y^, ^for ^L¹ transportation cost c(x,y)= |^x−^y|ⁱⁿ Rⁿ^,^if^the Kantorovich potential

ϕ

is1-Lipschitz, thenits c-transformhasaspecialrelation

ϕ

^c= −

ϕ

.TheWassersteindistanceisgivenby

Wc

( μ , ν ) :=

max ϕ

⎧ ⎨

⎩

X

ϕ (

x

)

d

μ (

x

) −

Y

ϕ (

y

)

d

ν (

y

)

⎫ ⎬

⎭ .

(14) ForL²transportationcostc(x,y)=¹/2|^x−^y|²ⁱⁿRⁿ^,^thec-transformandtheclassicalLegendretransformhavespecial relations.

Deﬁnition3.4.Givenafunction

ϕ

:Rⁿ→R^,îts^Legendre^transformîs^definedâs

ϕ

^∗

(

y

) :=

^sup

x

(

^x

,

y

− ϕ (

x

)) .

(15)

Wecanshowthefollowingrelationholdswhenc=¹/2|^x−^y|²^, 1

2

|

y

|

²

− ϕ

^c

=

1

2

|

x

|

²

− ϕ

_∗

.

(16)

3.3. Brenier’sapproach

Attheendof1980’s,Brenier(1991) discoveredtheintrinsicconnectionbetweenoptimalmasstransportmapandconvex geometry (seealsoforinstanceVillani,2003,Theorem2.12(ii),andTheorem2.32).

Assume X=^Y=Rⁿ^,^functionû:^X→Rîsâ^C² ^continuous^convex^function,^namelyîts^Hessian^matrixîssemi-positive definite.Itsgradientmap∇û:^X→^Y îs^definedâs^x→ ∇û(x).

Theorem3.5(Brenier,1991).Suppose X andY aretheEuclideanspaceRⁿ^,^and^thetransportationcostisthequadraticEuclidean distancec(x,y)= |^x−^y|²^.^If

μ

isabsolutelycontinuousand

μ

and

ν

havefinitesecondordermoments,thenthereexistsaconvex functionu:^X→R^,^such^that^the^gradient^map∇^{u gives}^theûnique^solution^to^the^Monge’s^problem,^where^{u is}^called^Brenier’s potential,∇^{u is}^called^Brenier^mapôr^theôptimal^masstransportationmap.Ingeneral,u isnotunique.

ThistheoremconvertstheMonge’sproblemtosolvingthefollowingMonge–Ampèrepartialdifferentialequation:

det

∂

²u

∂

x_i

∂

x_j

(

x

) = μ (

x

)

ν _{◦ ∇}

u

(

x

) .

(17)

Thefunctionu:^X→R^is^called^the^Brenier^potential.^Brenier^proved^the^polarfactorizationtheorem.

Theorem3.6(Brenierfactorization,Brenier,1991).Suppose X andY aretheEuclideanspaceRⁿ^,

μ

isabsolutelycontinuouswith respecttoLebesguemeasure,amapping

ϕ

:X→Y pushes

μ

forwardto

ν

,

ϕ

#

μ

=

ν

.Thenthereexistsaconvexfunctionu:X→R, suchthat

ϕ = ∇

^u

◦

^s

,

wheres:^X→^{X is}measure-preserving,s#

μ

=

μ

.Furthermore,thisfactorizationisunique.

Thefollowingtheoremiswellknowninoptimaltransportationtheory,theproofcanbefoundinVillani’sbook (Villani, 2003)andinthebookofAmbrosioetal.(2008).WeapplythistheoremtoDeepLearningandshowthatthegeneratorand thediscriminatorinWGANmodelwithL²costareequivalent.Forthecompleteness,wegivethedetailedproofhere.

(7)

Theorem3.7(Generator–discriminatorequivalence).Given

μ

and

ν

onacompactdomain⊂Rⁿ^thereêxistsânôptimal^transport plan

ρ

forthecostc(x,y)=^h(x−^y)withh strictlyconvex.Itisuniqueandoftheform(id,T_#)

μ

,provided

μ

isabsolutelycontinuous and∂isnegligible.Moreover,thereexistsaKantorovichpotential

ϕ

,andT canberepresentedas

T

(

x

) =

^x

− ( ∇

^h

)

⁻¹

( ∇ ϕ (

x

)).

Proof. Assume

ρ

isthejointprobability,satisfyingtheconditions

π

x#

ρ

=

μ

,

π

y#

ρ

=

ν

,(x0,y0)isapointinthesupport of

ρ

,bydeﬁnition

ϕ

^c(y0)=^infx(c(x,y0)−

ϕ

(x)),hence∇x(c(x,y0)−

ϕ

(x))|x=^x0=^0,

∇ ϕ (

x0

) = ∇

xc

(

x0

,

y0

) = ∇

h

(

x0

−

y0

).

Becausehisstrictlyconvex,therefore∇^h^isinvertible, x0

−

^y0

= ( ∇

^h

)

⁻¹

( ∇ ϕ (

x0

)),

hence y0=^x0−(∇^h)⁻¹(∇

ϕ

(x0)). 2 Whenc(x,y)=¹₂|^x−^y|²^,^we^have

T

(

x

) =

x

− ∇ ϕ (

x

) = ∇

x²

2

− ϕ (

x

)

= ∇

u

(

x

).

Inthiscase,theBrenier’spotentialuandtheKantorovich’spotential

ϕ

isrelatedby

u

(

x

) =

^x²

2

− ϕ (

x

).

(18)

Asdiscussedinsection2,inWassersteinGANframework,thediscriminatorD computestheWassersteindistance,which isequivalentto ﬁndtheKantorovich potential

ϕ

;thegeneratorG computestheoptimaltransportationmap ∇u,whichis equivalenttoﬁndtheBrenierpotentialu.Thisshowsthecomputationalresultsofthediscriminatorandthegeneratorare closelyrelated,theresultobtainedbythediscriminatorcanbeuseddirectlybythegenerator,andviceversa.

Corollary3.8.UndertheconditionsofBreniertheorem,theMonge–Kantorovichproblem

minρ

⎧ ⎨

⎩

X×^Y

c

(

x

,

y

)

d

ρ (

x

,

y

) : ( π

x

)

_#

ρ = μ , ( π

y

)

_#

ρ = ν

⎫ ⎬

⎭ ,

isequivalenttotheKantorovichdualproblem,

maxϕ,ψ

⎧ ⎨

⎩

X

ϕ (

x

)

d

μ (

x

) +

Y

ψ (

y

)

d

ν (

y

) : ϕ (

x

) + ψ (

y

) ≤

^c

(

x

,

y

)

⎫ ⎬

⎭ .

Proof. Thequadraticcostofageneraltransportplancanbewrittenas 1

2

X×Y

|

x

−

y

|

²d

ρ =

¹ 2

X×Y

|

x

|

²d

ρ +

¹ 2

X×Y

|

y

|

²d

ρ −

X×Y

x

,

y

d

ρ

=

¹ 2

X

|

^x

|

²^d

μ (

x

) +

¹ 2

Y

|

^y

|

²^d

ν (

y

) −

X×^Y

^x

,

y

^d

ρ

uptosubstractingthequadraticmomentsof

μ

and

ν

,itisequivalenttominimizethecostc(x,y)= −^x,y^.^Theînverseôf theoptimaltransportationmapT:^X→^Y^,^T⁻¹:^Y→^X îsâlsoôptimal,^by^Briener^theorem,^thereêxistsâ^convex^function v:^Y→R^,^such^that∇^v=^T⁻¹^.Furthermore,thefollowingrelationshold

u

(

x

) =

¹

2

|

x

|

²

− ϕ (

x

),

v

(

y

) =

¹

2

|

y

|

²

− ψ (

y

).

Asimilardecompositionholdsforthedualproblem:

(8)

Fig. 3.Minkowski and Alexandrov theorems for convex polytopes with prescribed normals and areas.

X

ϕ (

x

)

d

μ (

x

) +

Y

ψ (

y

)

d

ν (

y

) =

X

1

2

|

^x

|

²

−

^u

(

x

)

d

μ (

x

) +

Y

1

2

|

^y

|

²

−

^v

(

y

)

d

ν (

y

)

=

¹ 2

X

|

x

|

²d

μ (

x

) +

¹ 2

Y

|

y

|

²d

ν (

y

) −

⎛

⎝

X

u

(

x

)

d

μ (

x

) +

Y

v

(

y

)

d

ν (

y

)

⎞

⎠

=

¹ 2

X

|

x

|

²d

μ (

x

) +

¹ 2

Y

|

y

|

²d

ν (

y

) −

X×Y

(

u

(

x

) +

v

(

y

))

d

ρ

fromthecondition

ϕ (

x

) + ψ (

y

) ≤

¹ 2

|

^x

−

^y

|

²

weobtain

u

(

x

) +

v

(

y

) =

¹

2

|

x

|

²

− ϕ (

x

) +

¹

2

|

y

|

²

− ψ (

y

)

≥

^x

,

y

Hence

X

ϕ (

x

)

d

μ (

x

) +

Y

ψ (

y

)

d

ν (

y

) ≤

¹ 2

X

|

^x

|

²^d

μ (

x

) +

¹ 2

Y

|

^y

|

²^d

ν (

y

) −

X×^Y

^x

,

y

^d

ρ ₂

4. Convexgeometry

This section introduces MinkowskiandAlexandrov problemsin convexgeometry,which canbe described by Monge–

Ampère equation aswell. Thisintrinsic connection givesa geometric interpretation to optimalmass transportation map withL²transportationcost.

4.1. Alexandrov’stheorem

Minkowski proved theexistence andthe uniqueness of convexpolytope withuserprescribed face normals andareas (seeFig.3leftframe).

Theorem4.1(Minkowski).Supposen₁,...,n_kareunitvectorswhichspanRⁿ^and

ν

1,...,

ν

k>0sothat_k

i=1

ν

in_i=^0.^Thereêxistsâ compactconvexpolytopeP⊂Rⁿ^withêxactlyk codimension-1facesF1,...,F_ksothatniistheoutwardnormalvectortoFiandthe volumeofFiis

ν

i.Furthermore,suchP isuniqueuptoparalleltranslation.

Minkowski’sproofisvariationalandsuggestsanalgorithmtoﬁndthepolytope.Minkowskitheoremforunboundedcon- vexpolytopeswasconsideredandsolvedbyA.D.AlexandrovandhisstudentA.Pogorelov.Inhisbookonconvexpolyhedra (Alexandrov,2005),Alexandrovprovedthefollowingfundamentaltheorem(Theorem7.3.2andtheorem6.4.2inAlexandrov, 2005):

(9)

Theorem4.2(Alexandrov,2005).Supposeisacompactconvexpolytopewithnon-emptyinteriorinRⁿ^,ⁿ1,...,nk⊂Rⁿ⁺¹ ^are distinctk unitvectors,the(n+¹)-thcoordinatesarenegative,and

ν

1,...,

ν

k>0sothatk

i=¹

ν

i=^vol().Thenthereexistsconvex polytopeP⊂Rⁿ⁺¹^with^exactk codimension-1facesF₁,. . . ,F_k,sothatn_iisthenormalvectortoF_iandtheintersectionbetween andtheprojectionofF_iiswithvolume

ν

i.Furthermore,suchP isuniqueuptoverticaltranslation (seeFig.3rightframe).

Alexandrov’sproofisbasedonalgebraictopologyandnon-constructive.Guetal.(2016) gaveavariationalproofforthe generalizedAlexandrovtheoremstatedintermsofconvexfunctions.

Giveny1,. . . ,y_k∈Rⁿ ând^h=(h1,. . . ,h_k)∈R^k^,^we^defineâ^piecewise^linear^convex^functionûh(x)as u_h

(

x

) =

^max^k

i=1

{

^x

,

y_i

+

^hi

} .

The graph of u_h is a convex polytope in Rⁿ⁺¹^, ^the ^projection înduces â ^cell decomposition ofRⁿ^. Êach^cell îsâ ^closed convexpolytope,

W_i

(

h

) =

x

∈ R

ⁿ

|∇

^uh

(

x

) =

^yi

.

Somecells maybeemptyorunbounded.Givena probabilitymeasure

μ

deﬁnedon,the

μ

-volumeofW_i(h)isdeﬁned as

w_i

(

h

) := μ (

W_i

(

h

) ∩ ) =

Wi(h)∩

d

μ .

Theorem4.3(Guetal.,2016).Let beacompactconvexdomaininRⁿ^,{^y1,...,yk}^beâ^setôf^distinct^pointsⁱⁿRⁿând

μ

a probabilitymeasureon.Thenforany

ν

1,...,

ν

k>0with_k

i=¹

ν

i=

μ

(),thereexistsh=(h₁,...,h_k)∈R^k^,ûniqueûp^toâddingâ constant(c,...,c),sothatwi(h)=

ν

i,foralli.Thevectorsh areexactlymaximumpointsoftheconcavefunction

E

(

h

) =

k

i=¹ h_i

ν

_i

₋

h 0

k i=¹

w_i

( η )

d

η

_i (19)

ontheopenconvexset

H

= {

^h

∈ R

^k

|

^wi

(

h

) >

0

,∀

ⁱ

}.

Furthermore,∇^uhminimizesthequadraticcost

|

^x

−

^T

(

x

)|

²^d

μ (

x

)

amongalltransportmapsT#

μ

=

ν

,wheretheDiracmeasure

ν

=k

i=¹

ν

iδ(y−^yi).

Fortheconvenienceofdiscussion,wedeﬁnetheAlexandrov’spotentialasfollows:

Deﬁnition4.4(Alexandrovpotential).Undertheabovecondition,theconvexfunction

A

(

h

) =

h

k

i=¹

w_i

( η )

d

η

i (20)

iscalledtheAlexandrovpotential.

Wedeﬁnetheadmissiblespacefortheheightvectorh

H

:=

h

∈ R

^k

|

^wi

(

h

) >

0

∩

_k

i=¹ h_k

=

⁰

ByBrunn–Minkowskiinequality,wecanshowthatHisaconvexopensetinR^k. FromEqn. (24),itiseasytoshowthefollowingsymmetricrelation:

∂

w_i

(

h

)

∂

h_j

= ∂

w_j

(

h

)

∂

h_i

,

(10)

Fig. 4.GeometricInterpretationtoOptimalTransportMap:Brenierpotentialuh:→R,Legendredualu^∗_h,optimaltransportationmap∇uh:Wi(h)→yi, powerdiagramV^,^weighted^DelaunaytriangulationT^.

thereforethedifferentialform

ω _:=

k i=¹

w_i

(

h

)

dh_i

isaclosedoneform.BecauseH^is^simply^connected,

ω

isexact,hencetheAlexandrov potentialEqn. (20) iswell-deﬁned.

ThegeometricmeaningofA(h)isthevolumeundertheupperenvelope.Detailscanbefoundin Guetal.(2016).

4.2. Powerdiagram

Alexandrov’stheoremhascloserelationwiththeconventionalpowerdiagram.Wecanusepowerdiagramalgorithmto solvetheAlexandrov’sproblem.

Deﬁnition4.5(powerdistance).Givenapoint yi∈Rⁿ ^with^a^power^weightψi,thepowerdistanceisgivenby pow

(

x

,

yi

) =

¹

2

|

x

−

yi

|

²

− ψ

i

.

Definition4.6(powerdiagram).Given weighted points {(y1,ψ1),(y2,ψ2),. . . ,(y_k,ψ_k)}^, ^the^power ^diagram îs^the ^cell ^de- compositionofRⁿ^,^denotedâsV(ψ),

R

ⁿ

=

k i=¹

W_i

(ψ ),

whereeachcellisaconvexpolytope

W_i

(ψ ) = {

^x

∈ R

ⁿ

|

^pow

(

x

,

y_i

) ≤

^pow

(

x

,

y_j

),∀

^j

}.

TheweightedDelaunaytriangulation,denotedasT(ψ),isthePoincarédualtothepowerdiagram,ifWi(ψ)∩^Wj(ψ)= ∅ thenthereisanedgeconnectingyi andyj intheweightedDelaunaytriangulation (seeFig.5).

Notethatpow(x,y_i)≤^pow(x,y_j)isequivalentto

^x

,

y_i

+

¹

2

(ψ

i

− |

^yi

|

²

) ≥

^x

,

y_j

+

¹

2

(ψ

j

− |

^yj

|

²

),

let

h_i

=

¹

/

2

(ψ

i

− |

^yi

|

²

),

(21)

thenpow(x,yi)≤^pow(x,yj)isequivalentto^x,yi+^hi≥ ^x,yj+^hj.Theupperenvelopeoftheplanes{^x,yi+^hi=⁰}^is thegraphoftheconvexfunction

u_h

(

x

) =

^max

i

{

^x

,

y_i

+

^hi

}.

⁽²²⁾

Theprojectionofthegraphofu_h givesthepowerdiagramV(ψ).

(11)

Fig. 5.Powerdiagram(blue)anditsdualweightedDelaunaytriangulation(black),thepowerweightψi equaltothesquareofradiusri(redcircle).(For interpretationofthecolorsintheﬁgure(s),thereaderisreferredtothewebversionofthisarticle.)

4.3. Convexoptimization

Now,wecanusethepowerdiagramtoexplainthegradientandtheHessianoftheenergyEqn. (19),bydeﬁnition

∇

^E

(

h

) = ( ν

1

−

^w1

(

h

), ν

2

−

^w2

(

h

), · · · , ν

k

−

^wk

(

h

))

^T

.

(23) TheHessianmatrixisgivenbypowerdiagram–weightedDelaunaytriangulation,foradjacentcellsinthepowerdiagram,

∂

²E

(

h

)

∂

hi

∂

hj

= ∂

w_i

(

h

)

∂

hj

= − μ (

Wi

(

h

) ∩

^Wj

(

h

) ∩ )

|

^yj

−

^yi

|

⁽²⁴⁾

Supposeedgeei j isintheweightedDelaunaytriangulation,connectingyiandyj.IthasauniquedualcellDi j inthepower diagram,then

∂

w_i

(

h

)

∂

h_j

= − μ (

D_{i j}

)

|

^ei j

| ,

thevolumeratiobetweenthedualcells.ThediagonalelementintheHessianis

∂

²E

(

h

)

∂

h²_i

= ∂

w_i

(

h

)

∂

h_i

= −

j=ⁱ

∂

w_i

(

h

)

∂

h_j

.

(25)

Therefore,inorderto solveAlexandrov’sproblemto constructtheconvexpolytopewithuserprescribednormalandface volume,wecanoptimizetheenergyinEqn. (19) usingclassicalNewton’smethoddirectly.

Let’sobserve the convexfunction u^∗_h, its graphis the convexhullC(^h). Then the discrete Hessian determinantof u^∗_h assignseach vertexv ofC(h)thevolumeoftheconvexhullofthegradientsofu^∗_h attop-dimensionalcells adjacentto v.

Therefore,solvingAlexandrov’sproblemisequivalenttosolveadiscreteMonge–Ampèreequation.

5. Semi-discreteoptimalmasstransport

In thissection, we solvethe semi-discrete optimal transportation problemfromgeometric point ofview. Thisspecial caseisusefulinpractice.

Supposetheclosureof

μ

,isacompactconvexsubsetoftheEuclideanspace X,

=

^supp

μ _{= {}

x

∈

^X

| μ (

x

) >

0

} .

ThespaceY isdiscretizedtoY= {^y1,y₂,· · ·,y_k}^with^Dirac^measure

ν

=k

j=¹

ν

jδyj.Thetotalmassisequal

d

μ (

x

) =

k i=1

ν

i

.

5.1. Kantorovichdualapproach

WedeﬁnethediscreteKantorovichpotentialψ:^Y→R^,ψj:=ψ(yj),hereY isadiscretesetin X=Rⁿ^,^then

Y

ψ

d

ν ₌

k

j=¹

ψ

_j

ν

_j

.

(26)

(12)

Thec-transformationofψ isgivenby

ψ

^c

(

x

) =

^min

1≤j≤k

{

^c

(

x

,

y_j

) − ψ

j

}.

⁽²⁷⁾

Thisinducesacelldecompositionof X,

X

=

k i=1

Wi

(ψ ),

whereeachcellisgivenby W_i

(ψ ) =

x

∈

^X

|

^c

(

x

,

y_i

) − ψ

i

≤

^c

(

x

,

y_j

) − ψ

j

, ∀

¹

≤

^j

≤

^k

.

AccordingtothedualformulationoftheWassersteindistanceEqn. (13) andintegrationEqn. (26),wedeﬁnetheenergy E

(ψ ) =

X

ψ

^cd

μ ₊

Y

ψ

d

ν

thenobtaintheformula

E

(ψ ) =

k i=1

ψ

i

( ν

i

−

^wi

(ψ )) +

k

j=1

W_j(ψ )

c

(

x

,

yj

)

d

μ ,

(28)

where wi(ψ)isthemeasureofthecell Wi(ψ), w_i

(ψ ) = μ (

W_i

(ψ )) =

W_i(ψ )

d

μ (

x

).

(29)

ThentheWassersteindistancebetween

μ

and

ν

equalsto W_c

( μ , ν ) =

^max

ψ E

(ψ ).

5.2. Brenier’sapproach

Kantorovich’sdualapproachisforgeneralcostfunctions.WhenthecostfunctionistheL² distancec(x,y)=¹/2|^x−^y|²^, wecanapplyBrenier’sapproachdirectly.

Wedefine aheightvector h=(h1,h2,· · ·,hk)∈Rⁿ^,^consistingôf^k ^real^numbers.^Forêach ^yi∈^Y^,^we^constructâ^hyper- planedefinedon X,

π

i(h): ^x,y_i+^hi=^0.^We^deﬁne^the^Brenier^potential^function^as

u_h

(

x

) =

^max^k

i=1

{

^x

,

y_i

+

^hi

},

⁽³⁰⁾

then uh(x) is a convexfunction. The graph of uh(x) is an inﬁnite convexpolyhedron with supporting planes

π

i(h). The projectionofthegraphinducesapolygonalpartitionof,

=

k i=1

Wi

(

h

),

(31)

whereeachcellWi(h)istheprojectionofafacetofthegraphofuh onto,

W_i

(

h

) = {

^x

∈

^X

|∇

^uh

(

x

) =

^yi

} ∩ .

(32)

ThemeasureofWi(h)isgivenby w_i

(

h

) =

W_i(h)

d

μ .

(33)

Theconvexfunctionu_h oneachcell W_i(h)isalinearfunction

π

i(h),therefore,thegradientmap

∇

^uh

:

^Wi

(

h

) →

^yi

,

i

=

¹

,

2

, · · · ,

k

,

(34) maps each W_i(h) to a single point y_i. Accordingto Alexandrov’stheorem, andthe Gu–Luo–Yautheorem, we obtain the followingcorollary: