
Gromov-Wasserstein Distances between Gaussian Distributions



HAL Id: hal-03197398

https://hal.archives-ouvertes.fr/hal-03197398v2

Preprint submitted on 16 Apr 2021


Antoine Salmona, Julie Delon, Agnès Desolneux

To cite this version:

Antoine Salmona, Julie Delon, Agnès Desolneux. Gromov-Wasserstein Distances between Gaussian Distributions. 2021. ⟨hal-03197398v2⟩


Antoine Salmona¹, Julie Delon², Agnès Desolneux¹

¹ ENS Paris-Saclay, CNRS, Centre Borelli UMR 9010
² Université de Paris, CNRS, MAP5 UMR 8145 and Institut Universitaire de France

April 16, 2021

Abstract

The Gromov-Wasserstein distances were proposed a few years ago to compare distributions which do not lie in the same space. In particular, they offer an interesting alternative to the Wasserstein distances for comparing probability measures living on Euclidean spaces of different dimensions. In this paper, we focus on the Gromov-Wasserstein distance with a ground cost defined as the squared Euclidean distance, and we study the form of the optimal plan between Gaussian distributions. We show that when the optimal plan is restricted to Gaussian distributions, the problem has a very simple linear solution, which is also a solution of the linear Gromov-Monge problem. We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov-Wasserstein distance between Gaussian distributions.

Keywords — optimal transport, Wasserstein distance, Gromov-Wasserstein distance, Gaussian distributions.

MSC 2020 subject classifications: 60E99, 68T09, 62H25, 49Q22.

1 Introduction

Optimal transport (OT) theory has become nowadays a major tool to compare probability distributions. It has been increasingly used over the past years in various applied fields such as economics [11], image processing [20, 21], machine learning [4, 5], or more generally data science [18], with applications to domain adaptation [9] or generative models [3, 12], to name just a few.

Given two probability distributions µ and ν on two Polish spaces $(\mathcal{X}, d_{\mathcal{X}})$ and $(\mathcal{Y}, d_{\mathcal{Y}})$ and a positive lower semi-continuous cost function $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}_+$, optimal transport focuses on solving the following optimization problem
$$\inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{Y}} c(x, y)\, d\pi(x, y), \qquad (1.1)$$

where $\Pi(\mu, \nu)$ is the set of measures on $\mathcal{X}\times\mathcal{Y}$ with marginals µ and ν. When $\mathcal{X}$ and $\mathcal{Y}$ are equal and Euclidean, typically $\mathbb{R}^d$, and $c(x, y) = \|x - y\|^p$ with $p \geq 1$, Equation (1.1) induces a distance over the set of measures with finite moment of order p, known as the p-Wasserstein distance $W_p$:
$$W_p(\mu, \nu) = \left( \inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathbb{R}^d\times\mathbb{R}^d} \|x - y\|^p\, d\pi(x, y) \right)^{\frac{1}{p}}, \qquad (1.2)$$

or equivalently
$$W_p^p(\mu, \nu) = \inf_{X\sim\mu,\, Y\sim\nu} \mathbb{E}\left[ \|X - Y\|^p \right], \qquad (1.3)$$

The authors acknowledge support from the French Research Agency through the PostProdLEAP project (ANR-19-CE23-0027-01) and the MISTIC project (ANR-19-CE40-005).


where the notation $X \sim \mu$ means that X is a random variable with probability distribution µ. It is known that Equation (1.1) always admits a solution [27, 26, 22], i.e. the infimum is always reached.

Moreover, in the case of $W_2$, it is known [6] that if µ is absolutely continuous, then the optimal transport plan $\pi^*$ is unique and has the form $\pi^* = (\mathrm{Id}, T)\#\mu$, where $\#$ is the push-forward operator and $T : \mathbb{R}^d \to \mathbb{R}^d$ is an application called the optimal transport map, satisfying $T\#\mu = \nu$. The 2-Wasserstein distance $W_2$ admits a closed-form expression [10, 24] when $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ are two Gaussian measures with means $m_0 \in \mathbb{R}^d$, $m_1 \in \mathbb{R}^d$ and covariance matrices $\Sigma_0 \in \mathbb{R}^{d\times d}$ and $\Sigma_1 \in \mathbb{R}^{d\times d}$, that is given by
$$W_2^2(\mu, \nu) = \|m_1 - m_0\|^2 + \mathrm{tr}\left( \Sigma_0 + \Sigma_1 - 2\left( \Sigma_0^{\frac{1}{2}} \Sigma_1 \Sigma_0^{\frac{1}{2}} \right)^{\frac{1}{2}} \right), \qquad (1.4)$$
where for any symmetric semi-definite positive M, $M^{\frac{1}{2}}$ is the unique symmetric semi-definite positive square root of M. Moreover, if $\Sigma_0$ is non-singular, then the optimal transport map T is affine and is given by
$$\forall x \in \mathbb{R}^d, \quad T(x) = m_1 + \Sigma_0^{-\frac{1}{2}} \left( \Sigma_0^{\frac{1}{2}} \Sigma_1 \Sigma_0^{\frac{1}{2}} \right)^{\frac{1}{2}} \Sigma_0^{-\frac{1}{2}} (x - m_0), \qquad (1.5)$$
and the corresponding optimal transport plan $\pi^*$ is a degenerate Gaussian measure.
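For concreteness, the closed form (1.4) and the map (1.5) translate directly into code. The following is a minimal NumPy sketch (the function names are ours; the covariance inputs are assumed symmetric positive semi-definite, with $\Sigma_0$ non-singular for the map):

    import numpy as np

    def sqrtm_psd(M):
        # Symmetric PSD square root via eigendecomposition.
        w, V = np.linalg.eigh(M)
        w = np.clip(w, 0.0, None)
        return (V * np.sqrt(w)) @ V.T

    def w2_gaussian(m0, S0, m1, S1):
        # Squared 2-Wasserstein distance (1.4) between N(m0, S0) and N(m1, S1).
        S0h = sqrtm_psd(S0)
        cross = sqrtm_psd(S0h @ S1 @ S0h)
        return np.sum((m1 - m0) ** 2) + np.trace(S0 + S1 - 2.0 * cross)

    def w2_map(m0, S0, m1, S1):
        # Affine optimal map (1.5), assuming S0 is non-singular.
        S0h = sqrtm_psd(S0)
        S0h_inv = np.linalg.inv(S0h)
        A = S0h_inv @ sqrtm_psd(S0h @ S1 @ S0h) @ S0h_inv
        return lambda x: m1 + A @ (x - m0)

    m0, S0 = np.zeros(2), np.diag([2.0, 1.0])
    m1, S1 = np.ones(2), np.diag([1.0, 0.5])
    print(w2_gaussian(m0, S0, m1, S1))

The matrix square root is computed by eigendecomposition, which is sufficient for symmetric positive semi-definite matrices.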

For some applications such as shape matching or word embedding, an important limitation of classic OT lies in the fact that it is not invariant to rotations and translations, and more generally to isometries. Moreover, OT requires defining a relevant cost function to compare the spaces $\mathcal{X}$ and $\mathcal{Y}$. Thus, when for instance µ is a measure on $\mathbb{R}^2$ and ν a measure on $\mathbb{R}^3$, it is not straightforward to design a cost function $c : \mathbb{R}^2 \times \mathbb{R}^3 \to \mathbb{R}$, and so one cannot easily define an OT distance to compare µ with ν. To overcome these limitations, several extensions of OT have been proposed [1, 7, 17]. Among them, the most famous one is probably the Gromov-Wasserstein (GW) problem [16]: given two Polish spaces $(\mathcal{X}, d_{\mathcal{X}})$ and $(\mathcal{Y}, d_{\mathcal{Y}})$, each endowed respectively with probability measures µ and ν, and given two measurable functions $c_{\mathcal{X}} : \mathcal{X}\times\mathcal{X} \to \mathbb{R}$ and $c_{\mathcal{Y}} : \mathcal{Y}\times\mathcal{Y} \to \mathbb{R}$, it aims at finding

$$\mathrm{GW}_p(c_{\mathcal{X}}, c_{\mathcal{Y}}, \mu, \nu) = \left( \inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathcal{X}^2\times\mathcal{Y}^2} \left| c_{\mathcal{X}}(x, x') - c_{\mathcal{Y}}(y, y') \right|^p d\pi(x, y)\, d\pi(x', y') \right)^{\frac{1}{p}}, \qquad (1.6)$$

with $p \geq 1$. As for classic OT, it can be shown that Equation (1.6) always admits a solution (see [25]). The GW problem can be seen as a quadratic optimization problem in π, as opposed to OT, which is a linear optimization problem in π. It induces a distance over the space of metric measure spaces (i.e. the triplets $(\mathcal{X}, d_{\mathcal{X}}, \mu)$) quotiented by the strong isomorphisms [18]¹. The fundamental metric properties of $\mathrm{GW}_p$ have been studied in depth in [23, 16, 8]. In the Euclidean setting, when $\mathcal{X} = \mathbb{R}^m$ and $\mathcal{Y} = \mathbb{R}^n$, with m not necessarily equal to n, and for the natural choice of costs $c_{\mathcal{X}} = \|\cdot\|^2_{\mathbb{R}^m}$ and $c_{\mathcal{Y}} = \|\cdot\|^2_{\mathbb{R}^n}$, where $\|\cdot\|_{\mathbb{R}^m}$ denotes the Euclidean norm on $\mathbb{R}^m$, it can be easily shown that $\mathrm{GW}_2(c_{\mathcal{X}}, c_{\mathcal{Y}}, \mu, \nu)$ is invariant to isometries. With a slight abuse of notation, we will write in the following $\mathrm{GW}_2(\mu, \nu)$ instead of $\mathrm{GW}_2(\|\cdot\|^2_{\mathbb{R}^m}, \|\cdot\|^2_{\mathbb{R}^n}, \mu, \nu)$.

In this work, we focus on the problem of Gromov-Wasserstein between Gaussian measures. Given $\mu = \mathcal{N}(m_0, \Sigma_0)$, with $m_0 \in \mathbb{R}^m$ and covariance matrix $\Sigma_0 \in \mathbb{R}^{m\times m}$, and $\nu = \mathcal{N}(m_1, \Sigma_1)$, with $m_1 \in \mathbb{R}^n$ and covariance matrix $\Sigma_1 \in \mathbb{R}^{n\times n}$, we aim to solve

$$\mathrm{GW}_2^2(\mu, \nu) = \inf_{\pi\in\Pi(\mu,\nu)} \iint \left( \|x - x'\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y'), \qquad (\mathrm{GW})$$
or equivalently
$$\mathrm{GW}_2^2(\mu, \nu) = \inf_{(X,Y),\,(X',Y')\sim\pi\otimes\pi} \mathbb{E}\left[ \left( \|X - X'\|^2_{\mathbb{R}^m} - \|Y - Y'\|^2_{\mathbb{R}^n} \right)^2 \right], \qquad (1.7)$$

¹We say that $(\mathcal{X}, d_{\mathcal{X}}, \mu)$ is strongly isomorphic to $(\mathcal{Y}, d_{\mathcal{Y}}, \nu)$ if there exists a bijection $\phi : \mathcal{X} \to \mathcal{Y}$ such that φ is an isometry ($d_{\mathcal{Y}}(\phi(x), \phi(x')) = d_{\mathcal{X}}(x, x')$) and $\phi\#\mu = \nu$.


where for $x, x' \in \mathbb{R}^m$ and $y, y' \in \mathbb{R}^n$, $(\pi\otimes\pi)(x, x', y, y') = \pi(x, y)\,\pi(x', y')$. In particular, can we find formulas equivalent to (1.4) and (1.5) in the case of Gromov-Wasserstein? In Section 2, we derive an equivalent formulation of the Gromov-Wasserstein problem. This formulation is not specific to Gaussian measures but holds for all measures with finite fourth-order moment. It takes the form of a sum of two terms depending respectively on the co-moments of order 2 and 4 of π. Then, in Section 3, we derive a lower bound by simply optimizing both terms separately. In Section 4, we show that the problem restricted to Gaussian optimal plans admits an explicit solution, and that this solution is closely related to Principal Component Analysis (PCA). In Section 5, we study the tightness of the bounds found in the previous sections and we exhibit a particular case where we are able to compute exactly the value of $\mathrm{GW}_2^2(\mu, \nu)$ and the optimal plan $\pi^*$ which achieves it. Finally, Section 6 discusses the form of the solution in the general case, and the possibility that the optimal plan between two Gaussian distributions is always Gaussian.

Notations

We define in the following some of the notations that will be used in the paper.

• The notation $Y \sim \mu$ means that Y is a random variable with probability distribution µ.

• If µ is a positive measure on $\mathcal{X}$ and $T : \mathcal{X} \to \mathcal{Y}$ is an application, $T\#\mu$ stands for the push-forward measure of µ by T, i.e. the measure on $\mathcal{Y}$ such that for every measurable set $A \subset \mathcal{Y}$, $(T\#\mu)(A) = \mu(T^{-1}(A))$.

• If X and Y are random vectors on $\mathbb{R}^m$ and $\mathbb{R}^n$, we denote by $\mathrm{Cov}(X, Y)$ the matrix of size $m \times n$ of the form $\mathbb{E}\left[ (X - \mathbb{E}[X])(Y - \mathbb{E}[Y])^T \right]$.

• The notation $\mathrm{tr}(M)$ denotes the trace of a matrix M.

• $\|M\|_F$ stands for the Frobenius norm of a matrix M, i.e. $\|M\|_F = \sqrt{\mathrm{tr}(M^T M)}$.

• $\mathrm{rk}(M)$ stands for the rank of a matrix M.

• $I_n$ is the identity matrix of size n.

• $\tilde{I}_n$ stands for any matrix of size n of the form $\mathrm{diag}((\pm 1)_{i\leq n})$.

• Suppose $n \leq m$. For $A \in \mathbb{R}^{m\times m}$, we denote by $A^{(n)} \in \mathbb{R}^{n\times n}$ the submatrix containing the first n rows and the first n columns of A.

• Suppose $n \leq m$. For $A \in \mathbb{R}^{n\times n}$, we denote by $A^{[m]} \in \mathbb{R}^{m\times m}$ the matrix of the form $\begin{pmatrix} A & 0 \\ 0 & 0 \end{pmatrix}$.

• We denote by $S_n(\mathbb{R})$ the set of symmetric matrices of size n, $S_n^{+}(\mathbb{R})$ the set of positive semi-definite matrices, and $S_n^{++}(\mathbb{R})$ the set of positive definite matrices.

• $\mathbb{1}_{n,m} = (1)_{i\leq n, j\leq m}$ denotes the matrix of ones with n rows and m columns.

• $\|x\|_{\mathbb{R}^n}$ stands for the Euclidean norm of $x \in \mathbb{R}^n$. We will write $\|x\|$ when there is no ambiguity about the dimension.

• $\langle x, x'\rangle_n$ stands for the Euclidean inner product in $\mathbb{R}^n$ between x and x'.


2 Derivation of the general problem

In this section, we derive an equivalent² formulation of problem (GW) which takes the form of a functional of the co-moments of order 2 and 4 of π. This formulation is not specific to Gaussian measures but holds for all measures with finite fourth-order moment.

Theorem 2.1. Let µ be a probability measure on $\mathbb{R}^m$ with mean vector $m_0 \in \mathbb{R}^m$ and covariance matrix $\Sigma_0 \in \mathbb{R}^{m\times m}$ such that $\int \|x\|^4 d\mu < +\infty$, and ν a probability measure on $\mathbb{R}^n$ with mean vector $m_1$ and covariance matrix $\Sigma_1 \in \mathbb{R}^{n\times n}$ such that $\int \|y\|^4 d\nu < +\infty$. Let $P_0, D_0$ and $P_1, D_1$ be respective diagonalizations of $\Sigma_0$ ($= P_0 D_0 P_0^T$) and $\Sigma_1$ ($= P_1 D_1 P_1^T$). Let us define $T_0 : x \in \mathbb{R}^m \mapsto P_0^T(x - m_0)$ and $T_1 : y \in \mathbb{R}^n \mapsto P_1^T(y - m_1)$. Then problem (GW) is equivalent to problem
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu}\ \sum_{i,j} \mathrm{Cov}(X_i^2, Y_j^2) + 2\,\|\mathrm{Cov}(X, Y)\|_F^2, \qquad (\mathrm{supCOV})$$
where $X = (X_1, X_2, \ldots, X_m)^T$, $Y = (Y_1, Y_2, \ldots, Y_n)^T$, and $\|\cdot\|_F$ is the Frobenius norm.
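To build intuition for (supCOV), its value can be estimated from samples of any coupling with finite fourth-order moments. The sketch below (the function and variable names are our own, purely for illustration) evaluates the empirical objective for the deterministic coupling $Y = AX$ of a centered Gaussian, so that ν is here the image measure $A\#\mu$:

    import numpy as np

    def supcov_objective(X, Y):
        # Empirical value of sum_{i,j} Cov(X_i^2, Y_j^2) + 2 * ||Cov(X, Y)||_F^2
        # from paired samples X (k x m) and Y (k x n).
        X2, Y2 = X ** 2, Y ** 2
        # Cross-covariance of the squared coordinates, summed over all (i, j).
        c4 = ((X2 - X2.mean(0)).T @ (Y2 - Y2.mean(0)) / (len(X) - 1)).sum()
        # Cross-covariance of the coordinates themselves.
        C = (X - X.mean(0)).T @ (Y - Y.mean(0)) / (len(X) - 1)
        return c4 + 2.0 * np.linalg.norm(C, "fro") ** 2

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100000, 3)) * np.sqrt(np.array([3.0, 2.0, 1.0]))
    A = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])   # a linear map R^3 -> R^2
    print(supcov_objective(X, X @ A.T))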

This theorem is a direct consequence of the two following intermediary results.

Lemma 2.1. We denote by $\mathcal{O}_m = \{O \in \mathbb{R}^{m\times m}\ |\ O^T O = I_m\}$ the set of orthogonal matrices of size m. Let µ and ν be two probability measures on $\mathbb{R}^m$ and $\mathbb{R}^n$. Let $T_m : x \mapsto O_m x + x_m$ and $T_n : y \mapsto O_n y + y_n$ be two affine applications with $x_m \in \mathbb{R}^m$, $O_m \in \mathcal{O}_m$, $y_n \in \mathbb{R}^n$, and $O_n \in \mathcal{O}_n$. Then $\mathrm{GW}_2(T_m\#\mu, T_n\#\nu) = \mathrm{GW}_2(\mu, \nu)$.

Lemma 2.2 (Vayer, 2020, [25]). Suppose there exist scalars a, b, c such that $c_{\mathcal{X}}(x, x') = a\|x\|^2_{\mathbb{R}^m} + b\|x'\|^2_{\mathbb{R}^m} + c\langle x, x'\rangle_m$, where $\langle\cdot,\cdot\rangle_m$ denotes the inner product on $\mathbb{R}^m$, and $c_{\mathcal{Y}}(y, y') = a\|y\|^2_{\mathbb{R}^n} + b\|y'\|^2_{\mathbb{R}^n} + c\langle y, y'\rangle_n$. Let µ and ν be two probability measures respectively on $\mathbb{R}^m$ and $\mathbb{R}^n$. Then
$$\mathrm{GW}_2^2(c_{\mathcal{X}}, c_{\mathcal{Y}}, \mu, \nu) = C_{\mu,\nu} - 2 \sup_{\pi\in\Pi(\mu,\nu)} Z(\pi), \qquad (2.1)$$
where $C_{\mu,\nu} = \int c_{\mathcal{X}}^2\, d\mu\, d\mu + \int c_{\mathcal{Y}}^2\, d\nu\, d\nu - 4ab \int \|x\|^2_{\mathbb{R}^m} \|y\|^2_{\mathbb{R}^n}\, d\mu\, d\nu$ and
$$Z(\pi) = (a^2 + b^2) \int \|x\|^2_{\mathbb{R}^m} \|y\|^2_{\mathbb{R}^n}\, d\pi(x, y) + c^2 \left\| \int x y^T\, d\pi(x, y) \right\|_F^2 + (a + b)c \int \left( \|x\|^2_{\mathbb{R}^m} \langle \mathbb{E}_{Y\sim\nu}[Y], y\rangle_n + \|y\|^2_{\mathbb{R}^n} \langle \mathbb{E}_{X\sim\mu}[X], x\rangle_m \right) d\pi(x, y). \qquad (2.2)$$

Proof of Theorem 2.1. Using Lemma 2.1, we can focus without any loss of generality on centered Gaussian measures with diagonal covariance matrices. Thus, defining $T_0 : x \in \mathbb{R}^m \mapsto P_0^T(x - m_0)$ and $T_1 : y \in \mathbb{R}^n \mapsto P_1^T(y - m_1)$ and then applying Lemma 2.2 to $\mathrm{GW}_2(T_0\#\mu, T_1\#\nu)$ with $a = 1$, $b = 1$, and $c = -2$, while remarking that the last term in Equation (2.2) is null because $\mathbb{E}_{X\sim T_0\#\mu}[X] = 0$ and $\mathbb{E}_{Y\sim T_1\#\nu}[Y] = 0$, it comes that problem (GW) is equivalent to
$$\sup_{\pi\in\Pi(T_0\#\mu,\, T_1\#\nu)} \int \|x\|^2_{\mathbb{R}^m} \|y\|^2_{\mathbb{R}^n}\, d\pi(x, y) + 2 \left\| \int x y^T\, d\pi(x, y) \right\|_F^2. \qquad (2.3)$$
Since $T_0\#\mu$ and $T_1\#\nu$ are centered, we have that $\int x y^T\, d\pi(x, y) = \mathrm{Cov}(X, Y)$ where $X \sim T_0\#\mu$ and $Y \sim T_1\#\nu$. Furthermore, it can be easily computed that
$$\int \|x\|^2_{\mathbb{R}^m} \|y\|^2_{\mathbb{R}^n}\, d\pi(x, y) = \sum_{i,j} \mathrm{Cov}(X_i^2, Y_j^2) + \sum_{i,j} \mathbb{E}[X_i^2]\,\mathbb{E}[Y_j^2]. \qquad (2.4)$$
Since the second term does not depend on π, we get that problem (GW) is equivalent to problem (supCOV).

²We say that two optimization problems are equivalent if the solutions of one are readily obtained from the solutions of the other, and vice versa.


The left-hand term of (supCOV) is closely related to the sum of symmetric co-kurtosis and so depends on the co-moments of order 4 of π. On the other hand, the right-hand term is directly related to the co-moments of order 2 of π. For this reason, problem (supCOV) is hard to solve because it requires optimizing simultaneously the co-moments of order 2 and 4 of π, and hence knowing the probabilistic rule which links them. This rule is well known when π is Gaussian (Isserlis' lemma), but to the best of our knowledge this is not the case in general, and there is no reason for the solution of problem (supCOV) to be Gaussian.

3 Study of the general problem

Since problem (supCOV) is hard to solve because of its dependence on the co-moments of order 2 and 4 of π, one can optimize both terms separately in order to find a lower bound of $\mathrm{GW}_2(\mu, \nu)$. In the rest of the paper we suppose, for convenience and without any loss of generality, that $n \leq m$.

Theorem 3.1. Suppose without any loss of generality that $n \leq m$. Let $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ be two Gaussian measures on $\mathbb{R}^m$ and $\mathbb{R}^n$. Let $P_0, D_0$ and $P_1, D_1$ be the respective diagonalizations of $\Sigma_0$ ($= P_0 D_0 P_0^T$) and $\Sigma_1$ ($= P_1 D_1 P_1^T$) which sort eigenvalues in decreasing order. We suppose that $\Sigma_0$ is non-singular. A lower bound for $\mathrm{GW}_2(\mu, \nu)$ is then
$$\mathrm{GW}_2^2(\mu, \nu) \geq \mathrm{LGW}_2^2(\mu, \nu), \qquad (3.1)$$
where
$$\mathrm{LGW}_2^2(\mu, \nu) = 4\left( \mathrm{tr}(D_0) - \mathrm{tr}(D_1) \right)^2 + 4\left( \|D_0\|_F - \|D_1\|_F \right)^2 + 4\,\|D_0^{(n)} - D_1\|_F^2 + 4\left( \|D_0\|_F^2 - \|D_0^{(n)}\|_F^2 \right). \qquad (\mathrm{LGW})$$
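The bound (LGW) only involves the two spectra, so it is cheap to evaluate. Here is a direct NumPy transcription (a sketch, with function names of our own, assuming $n \leq m$ and symmetric positive semi-definite inputs):

    import numpy as np

    def lgw2_lower_bound(S0, S1):
        # Lower bound (LGW) on GW_2^2 between N(m0, S0) on R^m and N(m1, S1) on R^n, n <= m.
        d0 = np.sort(np.linalg.eigvalsh(S0))[::-1]   # eigenvalues of Sigma_0, decreasing
        d1 = np.sort(np.linalg.eigvalsh(S1))[::-1]   # eigenvalues of Sigma_1, decreasing
        n = len(d1)
        return (4 * (d0.sum() - d1.sum()) ** 2
                + 4 * (np.linalg.norm(d0) - np.linalg.norm(d1)) ** 2
                + 4 * np.sum((d0[:n] - d1) ** 2)
                + 4 * (np.sum(d0 ** 2) - np.sum(d0[:n] ** 2)))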

The proof of this theorem is divided into smaller intermediary results. First we recall the Isserlis lemma (see [14]), which expresses the co-moments of order 4 of a Gaussian distribution as a function of its co-moments of order 2.

Lemma 3.1 (Isserlis, 1918, [14]). Let X be a zero-mean Gaussian vector of size n. Then
$$\forall i, j, k, l \leq n, \quad \mathbb{E}[X_i X_j X_k X_l] = \mathbb{E}[X_i X_j]\,\mathbb{E}[X_k X_l] + \mathbb{E}[X_i X_k]\,\mathbb{E}[X_j X_l] + \mathbb{E}[X_i X_l]\,\mathbb{E}[X_j X_k]. \qquad (3.2)$$
Then we derive the following general optimization lemmas. The proofs of these two lemmas are postponed to the Appendix (Section 8).
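As a side illustration (not part of the original argument), identity (3.2) is easy to check by Monte Carlo on an arbitrary Gaussian vector; a minimal sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    S = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 0.7]])          # an arbitrary covariance matrix
    X = rng.multivariate_normal(np.zeros(3), S, size=1_000_000)

    i, j, k, l = 0, 1, 2, 1
    lhs = np.mean(X[:, i] * X[:, j] * X[:, k] * X[:, l])
    rhs = S[i, j] * S[k, l] + S[i, k] * S[j, l] + S[i, l] * S[j, k]
    print(lhs, rhs)                          # the two values agree up to Monte Carlo error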

Lemma 3.2. Suppose that $n \leq m$. Let Σ be a semi-definite positive matrix of size $m + n$ of the form
$$\Sigma = \begin{pmatrix} \Sigma_0 & K \\ K^T & \Sigma_1 \end{pmatrix},$$
with $\Sigma_0 \in S_m^{++}(\mathbb{R})$, $\Sigma_1 \in S_n^{+}(\mathbb{R})$ and $K \in \mathbb{R}^{m\times n}$. Let $P_0, D_0$ and $P_1, D_1$ be the respective diagonalizations of $\Sigma_0$ ($= P_0^T D_0 P_0$) and $\Sigma_1$ ($= P_1^T D_1 P_1$) which sort the eigenvalues in decreasing order. Then
$$\max_{\Sigma_1 - K^T \Sigma_0^{-1} K \,\in\, S_n^{+}(\mathbb{R})} \|K\|_F^2 = \mathrm{tr}\left( D_0^{(n)} D_1 \right), \qquad (3.3)$$
and the maximum is achieved at any
$$K^* = P_0^T \begin{pmatrix} \tilde{I}_n \left( D_0^{(n)} \right)^{\frac{1}{2}} D_1^{\frac{1}{2}} \\ 0_{m-n,n} \end{pmatrix} P_1, \qquad (\mathrm{opKl2})$$
where $\tilde{I}_n$ is of the form $\mathrm{diag}((\pm 1)_{i\leq n})$.
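In the already-diagonal case (so that the orthogonal matrices play no role), the maximizer (opKl2) and the optimal value (3.3) can be checked directly; a small self-contained sketch with arbitrary illustrative values:

    import numpy as np

    d0 = np.array([3.0, 2.0, 1.0])           # eigenvalues of Sigma_0 (decreasing), m = 3
    d1 = np.array([1.5, 0.5])                # eigenvalues of Sigma_1 (decreasing), n = 2
    m, n = len(d0), len(d1)

    I_tilde = np.diag([1.0, -1.0])           # any diag(+-1) works
    K_top = I_tilde @ np.diag(np.sqrt(d0[:n])) @ np.diag(np.sqrt(d1))
    K = np.vstack([K_top, np.zeros((m - n, n))])     # the maximizer (opKl2)

    schur = np.diag(d1) - K.T @ np.linalg.inv(np.diag(d0)) @ K
    print(np.linalg.eigvalsh(schur))          # >= 0 (here identically zero): K is feasible
    print(np.linalg.norm(K, "fro") ** 2, np.sum(d0[:n] * d1))   # both equal tr(D0^(n) D1)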


Lemma 3.3. Suppose that $n \leq m$. Let Σ be a semi-definite positive matrix of size $m + n$ of the form
$$\Sigma = \begin{pmatrix} \Sigma_0 & K \\ K^T & \Sigma_1 \end{pmatrix},$$
where $\Sigma_0 \in S_m^{++}(\mathbb{R})$, $\Sigma_1 \in S_n^{+}(\mathbb{R})$, and $K \in \mathbb{R}^{m\times n}$. Let $A \in \mathbb{R}^{n\times m}$ be a matrix with rank 1. Then
$$\max_{\Sigma_1 - K^T \Sigma_0^{-1} K \,\in\, S_n^{+}(\mathbb{R})} \mathrm{tr}(KA) = \sqrt{\mathrm{tr}\left( A \Sigma_0 A^T \Sigma_1 \right)}. \qquad (3.4)$$
In particular, if $\Sigma_0 = \mathrm{diag}(\alpha)$ and $\Sigma_1 = \mathrm{diag}(\beta)$ with $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^n$, then
$$\max_{\Sigma_1 - K^T \Sigma_0^{-1} K \,\in\, S_n^{+}(\mathbb{R})} \mathrm{tr}(K \mathbb{1}_{n,m}) = \sqrt{\mathrm{tr}(\Sigma_0)\,\mathrm{tr}(\Sigma_1)}, \qquad (3.5)$$
with $\mathbb{1}_{n,m} = (1)_{i\leq n, j\leq m}$, and the maximum is achieved at
$$K^* = \frac{\alpha \beta^T}{\sqrt{\mathrm{tr}(\Sigma_0)\,\mathrm{tr}(\Sigma_1)}}. \qquad (\mathrm{opKl1})$$

Proof of Theorem 3.1. For $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$, we denote by $P_0, D_0$ and $P_1, D_1$ the respective diagonalizations of $\Sigma_0$ and $\Sigma_1$ which sort the eigenvalues in decreasing order. Let $T_0 : x \in \mathbb{R}^m \mapsto P_0^T(x - m_0)$ and $T_1 : y \in \mathbb{R}^n \mapsto P_1^T(y - m_1)$. For $\pi \in \Pi(T_0\#\mu, T_1\#\nu)$ and $(X, Y) \sim \pi$, we denote by Σ the covariance matrix of π and by $\tilde{\Sigma}$ the covariance matrix of $(X^2, Y^2)$, with $X^2 := ([XX^T]_{i,i})_{i\leq m}$ and $Y^2 := ([YY^T]_{j,j})_{j\leq n}$. Using Isserlis' lemma to compute $\mathrm{Cov}(X^2, X^2)$ and $\mathrm{Cov}(Y^2, Y^2)$, it comes that Σ and $\tilde{\Sigma}$ are of the form
$$\Sigma = \begin{pmatrix} D_0 & K \\ K^T & D_1 \end{pmatrix} \quad \text{and} \quad \tilde{\Sigma} = \begin{pmatrix} 2D_0^2 & \tilde{K} \\ \tilde{K}^T & 2D_1^2 \end{pmatrix}. \qquad (3.6)$$
In order to find a supremum for each term of (supCOV), we use a necessary condition for π to be in $\Pi(T_0\#\mu, T_1\#\nu)$, which is that Σ and $\tilde{\Sigma}$ must be semi-definite positive. To do so, we can use the equivalent condition that the Schur complements of Σ and $\tilde{\Sigma}$, namely $D_1 - K^T D_0^{-1} K$ and $2D_1^2 - \frac{1}{2}\tilde{K}^T D_0^{-2}\tilde{K}$, must also be semi-definite positive. Remarking that the left-hand term in (supCOV) can be rewritten $\mathrm{tr}(\tilde{K}\mathbb{1}_{n,m})$, we have the two following inequalities
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \sum_{i,j} \mathrm{Cov}(X_i^2, Y_j^2) \leq \max_{2D_1^2 - \frac{1}{2}\tilde{K}^T D_0^{-2}\tilde{K} \,\in\, S_n^{+}(\mathbb{R})} \mathrm{tr}(\tilde{K}\mathbb{1}_{n,m}), \qquad (3.7)$$
and
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \|\mathrm{Cov}(X, Y)\|_F^2 \leq \max_{D_1 - K^T D_0^{-1} K \,\in\, S_n^{+}(\mathbb{R})} \|K\|_F^2. \qquad (3.8)$$
Applying Lemmas 3.2 and 3.3 to both right-hand terms, we get on one hand
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \|\mathrm{Cov}(X, Y)\|_F^2 \leq \mathrm{tr}\left( D_0^{(n)} D_1 \right), \qquad (3.9)$$
and on the other hand
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \sum_{i,j} \mathrm{Cov}(X_i^2, Y_j^2) \leq 2\sqrt{\mathrm{tr}(D_0^2)\,\mathrm{tr}(D_1^2)} = 2\,\|D_0\|_F \|D_1\|_F. \qquad (3.10)$$
Furthermore, using Lemma 2.2, it comes that
$$\begin{aligned} \mathrm{GW}_2^2(\mu, \nu) &= C_{\mu,\nu} - 4 \sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \left( \sum_{i,j} \mathrm{Cov}(X_i^2, Y_j^2) + \sum_{i,j} \mathbb{E}[X_i^2]\,\mathbb{E}[Y_j^2] + 2\,\|\mathrm{Cov}(X, Y)\|_F^2 \right) \\ &\geq C_{\mu,\nu} - 8\sqrt{\mathrm{tr}(D_0^2)\,\mathrm{tr}(D_1^2)} - 4\,\mathrm{tr}(D_0)\,\mathrm{tr}(D_1) - 8\,\mathrm{tr}\left( D_0^{(n)} D_1 \right), \end{aligned} \qquad (3.11)$$
where
$$\begin{aligned} C_{\mu,\nu} &= \mathbb{E}_{U\sim\mathcal{N}(0, 2D_0)}\left[ \|U\|^4_{\mathbb{R}^m} \right] + \mathbb{E}_{V\sim\mathcal{N}(0, 2D_1)}\left[ \|V\|^4_{\mathbb{R}^n} \right] - 4\,\mathbb{E}_{X\sim\mu}\left[ \|X\|^2_{\mathbb{R}^m} \right] \mathbb{E}_{Y\sim\nu}\left[ \|Y\|^2_{\mathbb{R}^n} \right] \\ &= 8\,\mathrm{tr}(D_0^2) + 4(\mathrm{tr}(D_0))^2 + 8\,\mathrm{tr}(D_1^2) + 4(\mathrm{tr}(D_1))^2 - 4\,\mathrm{tr}(D_0)\,\mathrm{tr}(D_1). \end{aligned} \qquad (3.12)$$
Finally
$$\begin{aligned} \mathrm{GW}_2^2(\mu, \nu) &\geq 4(\mathrm{tr}(D_0))^2 + 4(\mathrm{tr}(D_1))^2 - 8\,\mathrm{tr}(D_0)\,\mathrm{tr}(D_1) + 8\,\mathrm{tr}(D_0^2) + 8\,\mathrm{tr}(D_1^2) \\ &\quad - 8\sqrt{\mathrm{tr}(D_0^2)\,\mathrm{tr}(D_1^2)} - 8\,\mathrm{tr}\left( D_0^{(n)} D_1 \right) \\ &= \mathrm{LGW}_2^2(\mu, \nu). \end{aligned} \qquad (3.13)$$

Inequalities (3.7) and (3.8) become equalities if one can exhibit a plan π whose matrices Σ or $\tilde{\Sigma}$ are such that $\|K\|_F^2$ or $\mathrm{tr}(\tilde{K}\mathbb{1}_{n,m})$ are maximized. This is the case for (3.8), where we can exhibit the Gaussian plan $\pi^*$ such that K is of the form (opKl2), but it seems more tricky to exhibit such a plan for inequality (3.7). Indeed, it can be shown that there does not exist a Gaussian plan such that $\tilde{K}$ is of the form (opKl1).

The lower bound $\mathrm{LGW}_2$ is reached if there exists a plan π which optimizes both terms simultaneously. This seems rather unlikely, because if a probability distribution has a covariance matrix such that K is of the form (opKl2), then it is necessarily Gaussian thanks to the equality case in Cauchy-Schwarz: if $D_0 = \mathrm{diag}(\alpha)$ and $D_1 = \mathrm{diag}(\beta)$ with $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^n$, and if π has a covariance matrix such that K is of the form (opKl2), then for all $i \leq n$, $\mathrm{Cov}(X_i, Y_i) = \pm\sqrt{\alpha_i \beta_i}$ and $Y_i$ depends linearly on $X_i$. As an outcome, π is Gaussian and we can compute, using Isserlis' lemma, that $\mathrm{tr}(\tilde{K}\mathbb{1}_{n,m}) = 2\,\mathrm{tr}(D_0^{(n)} D_1)$, and so $\tilde{K}$ cannot be of the form (opKl1). However, we did not prove that the solution of the form (opKl2) is unique, so there may exist another solution which does not impose that π is Gaussian.

4 Problem restricted to Gaussian transport plans

In this section, we study the following problem, where we constrain the optimal transport plan to be Gaussian:
$$\mathrm{GGW}_2^2(\mu, \nu) = \inf_{\pi\in\Pi(\mu,\nu)\cap\mathcal{N}_{m+n}} \iint \left( \|x - x'\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y'), \qquad (\mathrm{GaussGW})$$
where $\mathcal{N}_{m+n}$ is the set of Gaussian measures on $\mathbb{R}^{m+n}$. We show the following main result.

Theorem 4.1. Suppose without any loss of generality that $n \leq m$. Let $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ be two Gaussian measures on $\mathbb{R}^m$ and $\mathbb{R}^n$. Let $P_0, D_0$ and $P_1, D_1$ be the respective diagonalizations of $\Sigma_0$ ($= P_0 D_0 P_0^T$) and $\Sigma_1$ ($= P_1 D_1 P_1^T$) which sort eigenvalues in decreasing order. We suppose that $\Sigma_0$ is non-singular (µ is not degenerate). Then problem (GaussGW) admits a solution of the form $\pi^* = (I_m, T)\#\mu$ with T affine of the form
$$\forall x \in \mathbb{R}^m, \quad T(x) = m_1 + P_1 A P_0^T (x - m_0), \qquad (4.1)$$
where $A \in \mathbb{R}^{n\times m}$ is written
$$A = \begin{pmatrix} \tilde{I}_n D_1^{\frac{1}{2}} \left( D_0^{(n)} \right)^{-\frac{1}{2}} & 0_{n,m-n} \end{pmatrix},$$
where $\tilde{I}_n$ is of the form $\mathrm{diag}((\pm 1)_{i\leq n})$. Moreover
$$\mathrm{GGW}_2^2(\mu, \nu) = 4\left( \mathrm{tr}(D_0) - \mathrm{tr}(D_1) \right)^2 + 8\,\|D_0^{(n)} - D_1\|_F^2 + 8\left( \|D_0\|_F^2 - \|D_0^{(n)}\|_F^2 \right). \qquad (\mathrm{GGW})$$
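Theorem 4.1 is fully constructive. The following sketch (function names are ours, and all signs of $\tilde{I}_n$ are taken equal to +1) returns the closed-form value (GGW) and the affine map T of (4.1):

    import numpy as np

    def ggw2_and_map(m0, S0, m1, S1):
        # Closed form (GGW) and the affine map T of Theorem 4.1 (sketch, assuming n <= m
        # and S0 non-singular); the signs in I_tilde are all taken equal to +1 here.
        d0, P0 = np.linalg.eigh(S0); d0, P0 = d0[::-1], P0[:, ::-1]   # decreasing order
        d1, P1 = np.linalg.eigh(S1); d1, P1 = d1[::-1], P1[:, ::-1]
        m, n = len(d0), len(d1)
        value = (4 * (d0.sum() - d1.sum()) ** 2
                 + 8 * np.sum((d0[:n] - d1) ** 2)
                 + 8 * (np.sum(d0 ** 2) - np.sum(d0[:n] ** 2)))
        A = np.hstack([np.diag(np.sqrt(d1 / d0[:n])), np.zeros((n, m - n))])
        T = lambda x: m1 + P1 @ A @ P0.T @ (x - m0)
        return value, T

    S0, S1 = np.diag([3.0, 2.0, 1.0]), np.diag([2.0, 0.5])
    val, T = ggw2_and_map(np.zeros(3), S0, np.zeros(2), S1)
    print(val, T(np.array([1.0, 1.0, 1.0])))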

Proof. This theorem is a direct consequence of Isserlis' lemma (Lemma 3.1): indeed, the left term in Equation (supCOV) can in that case be rewritten $2\,\|\mathrm{Cov}(X, Y)\|_F^2$, and so problem (GaussGW) is equivalent to
$$\sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \|\mathrm{Cov}(X, Y)\|_F^2. \qquad (4.2)$$
Applying Lemma 3.2, we can exhibit a Gaussian optimal plan $\pi^* \in \Pi(\mu, \nu)$ with covariance matrix Σ of the form
$$\Sigma = \begin{pmatrix} \Sigma_0 & K^* \\ K^{*T} & \Sigma_1 \end{pmatrix}, \qquad (4.3)$$
with
$$K^* = P_0^T \begin{pmatrix} \tilde{I}_n \left( D_0^{(n)} \right)^{\frac{1}{2}} D_1^{\frac{1}{2}} \\ 0_{m-n,n} \end{pmatrix} P_1. \qquad (4.4)$$
Thus, using the equality case in Cauchy-Schwarz, we can exhibit an optimal transport map T of the form
$$\forall x \in \mathbb{R}^m, \quad T(x) = m_1 + P_1 A P_0^T (x - m_0), \qquad (4.5)$$
with
$$A = \begin{pmatrix} \tilde{I}_n D_1^{\frac{1}{2}} \left( D_0^{(n)} \right)^{-\frac{1}{2}} & 0_{n,m-n} \end{pmatrix},$$
where $\tilde{I}_n$ is of the form $\mathrm{diag}((\pm 1)_{i\leq n})$. Moreover, using Lemmas 2.2 and 3.2, it comes that
$$\begin{aligned} \mathrm{GGW}_2^2(\mu, \nu) &= C_{\mu,\nu} - 4\,\mathrm{tr}(D_0)\,\mathrm{tr}(D_1) - 16 \sup_{X\sim T_0\#\mu,\ Y\sim T_1\#\nu} \|\mathrm{Cov}(X, Y)\|_F^2 \\ &= 8\,\mathrm{tr}(D_0^2) + 4(\mathrm{tr}(D_0))^2 + 8\,\mathrm{tr}(D_1^2) + 4(\mathrm{tr}(D_1))^2 - 8\,\mathrm{tr}(D_0)\,\mathrm{tr}(D_1) - 16\,\mathrm{tr}\left( D_0^{(n)} D_1 \right) \\ &= 4\left( \mathrm{tr}(D_0) - \mathrm{tr}(D_1) \right)^2 + 8\,\mathrm{tr}\left( (D_0^{(n)} - D_1)^2 \right) + 8\left( \mathrm{tr}(D_0^2) - \mathrm{tr}\left( (D_0^{(n)})^2 \right) \right). \end{aligned} \qquad (4.6)$$

Link with Gromov-Monge. The previous result generalizes Theorem 4.2.6 in [25], which studies the solutions of the linear Gromov-Monge problem between Gaussian distributions
$$\inf_{T\#\mu=\nu,\ T \text{ linear}} \iint \left( \|x - x'\|^2_{\mathbb{R}^m} - \|T(x) - T(x')\|^2_{\mathbb{R}^n} \right)^2 d\mu(x)\, d\mu(x'). \qquad (4.7)$$
Indeed, solutions of (4.7) necessarily provide Gaussian transport plans $\pi = (I_m, T)\#\mu$ if T is linear. Conversely, Theorem 4.1 shows that restricting the optimal plan to be Gaussian in the Gromov-Wasserstein problem between two Gaussian distributions yields an optimal plan of the form $\pi = (I_m, T)\#\mu$ with a linear T, whatever the dimensions m and n of the two Euclidean spaces.


Link with Principal Component Analysis. We can easily draw connections between $\mathrm{GGW}_2^2$ and PCA. Indeed, the optimal plan can be derived by performing PCA on both distributions µ and ν in order to obtain distributions $\tilde{\mu}$ and $\tilde{\nu}$ with zero mean vectors and diagonal covariance matrices with eigenvalues in decreasing order ($\tilde{\mu} = T_0\#\mu$ and $\tilde{\nu} = T_1\#\nu$), then by keeping only the n first components of $\tilde{\mu}$, and finally by deriving the optimal transport plan which achieves $W_2^2$ between the obtained truncated distribution and $\tilde{\nu}$. In other terms, denoting by $P_n : \mathbb{R}^m \to \mathbb{R}^n$ the linear mapping which, for $x \in \mathbb{R}^m$, keeps only its n first components (n-frame), and by $T_{W_2}$ the optimal transport map such that $\pi_{W_2} = (I_n, T_{W_2})\#P_n\#\tilde{\mu}$ achieves $W_2(P_n\#\tilde{\mu}, \tilde{\nu})$, it comes that the optimal plan $\pi_{\mathrm{GGW}_2}$ which achieves $\mathrm{GGW}_2(\tilde{\mu}, \tilde{\nu})$ can be written
$$\pi_{\mathrm{GGW}_2} = (I_m,\ \tilde{I}_n\#T_{W_2}\#P_n)\#\tilde{\mu}. \qquad (4.8)$$
An example of $\pi_{\mathrm{GGW}_2}$ can be found in Figure 1 in the case m = 2 and n = 1.

Figure 1: Transport plan $\pi_{\mathrm{GGW}_2}$ solution of problem (GaussGW) with m = 2 and n = 1. In that case, $\pi_{\mathrm{GGW}_2}$ is the degenerate Gaussian distribution supported by the plane of equation $y = T_{W_2}(x)$, where $T_{W_2}$ is the classic $W_2$ optimal transport map when the distributions are rotated and centered first.

Case of equal dimensions. When m = n, the optimal plan $\pi_{\mathrm{GGW}_2}$ which achieves $\mathrm{GGW}_2(\mu, \nu)$ is closely related to the optimal transport plan $\pi_{W_2} = (I_m, T_{W_2})\#T_0\#\mu$. Indeed, $\pi_{\mathrm{GGW}_2}$ can be simply derived by applying the transformations $T_0$ and $T_1$ to respectively µ and ν, then by computing $\pi_{W_2}$ between $T_0\#\mu$ and $T_1\#\nu$, and finally by applying the inverse transformations $T_0^{-1}$ and $T_1^{-1}$. In other terms, $\pi_{\mathrm{GGW}_2}$ can be written
$$\pi_{\mathrm{GGW}_2} = (I_m,\ T_1^{-1}\#\tilde{I}_n\#T_{W_2}\#T_0)\#\mu. \qquad (4.9)$$
An example of transport between two Gaussian measures in dimension 2 is shown in Figure 2.

As illustrated in Figure 3, the $\mathrm{GGW}_2$ optimal transport map $T_{\mathrm{GGW}_2}$ defined in Equation (4.1) is not equivalent to the $W_2$ optimal transport map $T_{W_2}$ defined in (1.5), even when the dimensions m and n are equal. More precisely, if $\Sigma_0$ and $\Sigma_1$ can be diagonalized in the same orthonormal basis with eigenvalues in the same order (decreasing or increasing), then $T_{W_2}$ and $T_{\mathrm{GGW}_2}$ are equivalent (top of Figure 3). On the other hand, if $\Sigma_0$ and $\Sigma_1$ can be diagonalized in the same orthonormal basis but with eigenvalues not in the same order, $T_{W_2}$ and $T_{\mathrm{GGW}_2}$ will have very different behaviors (bottom of Figure 3). Between those two extreme cases, we can say that the closer the columns of $P_0$ are to being collinear with the columns of $P_1$ (with the eigenvalues in decreasing order), the more $T_{W_2}$ and $T_{\mathrm{GGW}_2}$ will tend to have similar behaviors (middle of Figure 3).


Figure 2: Solution of (GaussGW) between two Gaussian measures in dimension 2. First the distributions are centered and rotated. Then a classic $W_2$ transport is applied between the two aligned distributions.


Figure 3: Comparison between $W_2$ and $\mathrm{GGW}_2$ mappings between empirical distributions. Left: 2D source distribution (colored) and target distribution (transparent). Middle: resulting mapping of Wasserstein $T_{W_2}$. Right: resulting mapping of Gaussian Gromov-Wasserstein $T_{\mathrm{GGW}_2}$. The colors are added in order to visualize where each sample has been sent.


Link with Gromov-Wasserstein with inner product as cost function. If µ and ν are centered Gaussian measures, let us consider the following problem
$$\mathrm{GW}_2^2(\langle\cdot,\cdot\rangle_m, \langle\cdot,\cdot\rangle_n, \mu, \nu) = \inf_{\pi\in\Pi(\mu,\nu)} \iint \left( \langle x, x'\rangle_m - \langle y, y'\rangle_n \right)^2 d\pi(x, y)\, d\pi(x', y'). \qquad (\mathrm{innerGW})$$
Notice that the above problem is not restricted to Gaussian plans, but the following proposition shows that in fact its solution is Gaussian.

Proposition 4.1. Suppose $n \leq m$. Let $\mu = \mathcal{N}(0, \Sigma_0)$ and $\nu = \mathcal{N}(0, \Sigma_1)$ be two centered Gaussian measures respectively on $\mathbb{R}^m$ and $\mathbb{R}^n$. Then the solution of problem (GaussGW) exhibited in Theorem 4.1 is also a solution of problem (innerGW).

Proof. The proof of this proposition is a direct consequence of Lemma 2.2: indeed, applying it with a = 0, b = 0, and c = 1, it comes that problem (innerGW) is equivalent to
$$\sup_{\pi\in\Pi(\mu,\nu)} \left\| \int x y^T\, d\pi(x, y) \right\|_F^2. \qquad (4.10)$$
Since µ and ν are centered, it comes that problem (innerGW) is equivalent to
$$\sup_{X\sim\mu,\ Y\sim\nu} \|\mathrm{Cov}(X, Y)\|_F^2. \qquad (4.11)$$
Applying Lemma 3.2, it comes that the solution exhibited in Theorem 4.1 is also a solution of problem (innerGW).

Since $\mathrm{GGW}_2$ is the Gromov-Wasserstein problem restricted to Gaussian transport plans, it is clear that (GaussGW) is an upper bound of (GW). Combining this result with Theorem 3.1, we get the following simple but important result.

Proposition 4.2. If $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ and $\Sigma_0$ is non-singular, then
$$\mathrm{LGW}_2^2(\mu, \nu) \leq \mathrm{GW}_2^2(\mu, \nu) \leq \mathrm{GGW}_2^2(\mu, \nu). \qquad (4.12)$$
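The ordering between the two closed forms in (4.12) can be checked numerically on random covariance matrices; a minimal sketch (function names are ours, transcribing Theorems 3.1 and 4.1):

    import numpy as np

    def spectrum(S):
        return np.sort(np.linalg.eigvalsh(S))[::-1]

    def lgw2(S0, S1):
        d0, d1 = spectrum(S0), spectrum(S1); n = len(d1)
        return (4 * (d0.sum() - d1.sum()) ** 2
                + 4 * (np.linalg.norm(d0) - np.linalg.norm(d1)) ** 2
                + 4 * np.sum((d0[:n] - d1) ** 2)
                + 4 * (np.sum(d0 ** 2) - np.sum(d0[:n] ** 2)))

    def ggw2(S0, S1):
        d0, d1 = spectrum(S0), spectrum(S1); n = len(d1)
        return (4 * (d0.sum() - d1.sum()) ** 2
                + 8 * np.sum((d0[:n] - d1) ** 2)
                + 8 * (np.sum(d0 ** 2) - np.sum(d0[:n] ** 2)))

    rng = np.random.default_rng(0)
    for _ in range(5):
        A, B = rng.normal(size=(4, 4)), rng.normal(size=(2, 2))
        S0, S1 = A @ A.T, B @ B.T               # random PSD covariances, m = 4 >= n = 2
        assert lgw2(S0, S1) <= ggw2(S0, S1) + 1e-9
    print("LGW_2^2 <= GGW_2^2 on all draws, as predicted by (4.12)")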

5 Tightness of the bounds and particular cases

5.1 Bound on the difference

Proposition 5.1. Suppose without loss of generality that $n \leq m$. If $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$, then
$$\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) \leq 8\,\|\Sigma_0\|_F \|\Sigma_1\|_F \left( 1 - \frac{1}{\sqrt{m}} \right). \qquad (5.1)$$

To prove this proposition, we will use the following technical result (the proof is postponed to the Appendix (Section 8)):

Lemma 5.1. Let $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^m$ be two unit vectors with non-negative coordinates ordered in decreasing order. Then
$$u^T v \geq \frac{1}{\sqrt{m}}, \qquad (5.2)$$
with equality if $u = \left( \frac{1}{\sqrt{m}}, \frac{1}{\sqrt{m}}, \ldots, \frac{1}{\sqrt{m}} \right)^T$ and $v = (1, 0, \ldots, 0)^T$.


Proof of Proposition 5.1. By subtracting (LGW) from (GGW), it comes that
$$\begin{aligned} \mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) &= 8\left( \|D_0\|_F \|D_1\|_F - \mathrm{tr}\left( D_0^{(n)} D_1 \right) \right) \\ &= 8\left( \|D_0\|_F \|D_1^{[m]}\|_F - \mathrm{tr}\left( D_0 D_1^{[m]} \right) \right), \end{aligned} \qquad (5.3)$$
where $D_1^{[m]} = \begin{pmatrix} D_1 & 0 \\ 0 & 0 \end{pmatrix} \in \mathbb{R}^{m\times m}$. Denoting by $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^m$ the vectors of eigenvalues of $D_0$ and $D_1^{[m]}$, it comes
$$\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) = 8\left( \|\alpha\|\|\beta\| - \alpha^T\beta \right) = 8\,\|\alpha\|\|\beta\|\left( 1 - u^T v \right), \qquad (5.4)$$
where $u = \frac{\alpha}{\|\alpha\|}$ and $v = \frac{\beta}{\|\beta\|}$. Applying Lemma 5.1, we get directly that
$$\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) \leq 8\,\|D_0\|_F \|D_1^{[m]}\|_F \left( 1 - \frac{1}{\sqrt{m}} \right) = 8\,\|\Sigma_0\|_F \|\Sigma_1\|_F \left( 1 - \frac{1}{\sqrt{m}} \right). \qquad (5.5)$$

The difference between $\mathrm{GGW}_2^2(\mu, \nu)$ and $\mathrm{LGW}_2^2(\mu, \nu)$ can be seen as the difference between the right and left terms of the Cauchy-Schwarz inequality applied to the two vectors of eigenvalues $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^m$. The difference is maximized when the vectors α and β are the least collinear possible. This happens when the eigenvalues of $D_0$ are all equal and n = 1, or when ν is degenerate of true dimension 1. On the other hand, this difference is null when α and β are collinear. Between those two extremal cases, we can say that the difference between $\mathrm{GGW}_2^2(\mu, \nu)$ and $\mathrm{LGW}_2^2(\mu, \nu)$ will be relatively small if the last $m - n$ eigenvalues of $D_0$ are small compared to the n first eigenvalues, and if the n first eigenvalues are close to being proportional to the eigenvalues of $D_1$. An example in the case where m = 2 and n = 1 can be found in Figure 4.


Figure 4: Plot of $\mathrm{GGW}_2^2(\mu, \nu)$ and $\mathrm{LGW}_2^2(\mu, \nu)$ as functions of $\alpha_2$, for $\mu = \mathcal{N}(0, \mathrm{diag}(\alpha))$, $\nu = \mathcal{N}(0, \beta_1)$, $\alpha = (\alpha_1, \alpha_2)^T$, with $(\alpha_1, \beta_1) = (1, 1)$ (left), $(\alpha_1, \beta_1) = (1, 2)$ (middle), and $(\alpha_1, \beta_1) = (1, 10)$ (right). One can easily compute using (GGW) and (LGW) that $\mathrm{GGW}_2^2(\mu, \nu) = 12\alpha_2^2 + 8\alpha_2(\alpha_1 - \beta_1) + 12(\alpha_1 - \beta_1)^2$ and $\mathrm{LGW}_2^2(\mu, \nu) = 12\alpha_2^2 + 8\alpha_2(\alpha_1 - \beta_1) - 8\sqrt{\alpha_2^2 + \alpha_1^2}\,\beta_1 + 12(\alpha_1 - \beta_1)^2 + 8\alpha_1\beta_1$.

5.2 Explicit case

As seen before, the difference between $\mathrm{GGW}_2^2(\mu, \nu)$ and $\mathrm{LGW}_2^2(\mu, \nu)$, with $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$, is null when the two vectors of eigenvalues of $\Sigma_0$ and $\Sigma_1$ (sorted in decreasing order) are collinear. When we suppose $\Sigma_0$ non-singular, this implies that m = n and that the eigenvalues of $\Sigma_1$ are proportional to the eigenvalues of $\Sigma_0$ (rescaling). This case includes the more particular case where m = n = 1. In that case $\mu = \mathcal{N}(m_0, \sigma_0^2)$ and $\nu = \mathcal{N}(m_1, \sigma_1^2)$, and $\sigma_1$ is always proportional to $\sigma_0$.


Proposition 5.2. Suppose m = n. Let $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ be two Gaussian measures on $\mathbb{R}^m = \mathbb{R}^n$. Let $P_0, D_0$ and $P_1, D_1$ be the respective diagonalizations of $\Sigma_0$ ($= P_0 D_0 P_0^T$) and $\Sigma_1$ ($= P_1 D_1 P_1^T$) which sort eigenvalues in non-increasing order. Suppose $\Sigma_0$ is non-singular and that there exists a scalar $\lambda \geq 0$ such that $D_1 = \lambda D_0$. In that case, $\mathrm{GW}_2^2(\mu, \nu) = \mathrm{GGW}_2^2(\mu, \nu) = \mathrm{LGW}_2^2(\mu, \nu)$ and the problem admits a solution of the form $(I_m, T)\#\mu$ with T affine of the form
$$\forall x \in \mathbb{R}^m, \quad T(x) = m_1 + \sqrt{\lambda}\, P_1 \tilde{I}_m P_0^T (x - m_0), \qquad (5.6)$$
where $\tilde{I}_m$ is of the form $\mathrm{diag}((\pm 1)_{i\leq m})$. Moreover
$$\mathrm{GW}_2^2(\mu, \nu) = (\lambda - 1)^2 \left( 4(\mathrm{tr}(\Sigma_0))^2 + 8\,\|\Sigma_0\|_F^2 \right). \qquad (5.7)$$

Proof. From (5.3), we have
$$\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) = 8\left( \|D_0\|_F \|D_1\|_F - \mathrm{tr}(D_0 D_1) \right). \qquad (5.8)$$
Denoting by $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^m$ the eigenvalue vectors of $D_0$ and $D_1$, it comes
$$\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) = 8\left( \|\alpha\|\|\beta\| - \alpha^T\beta \right). \qquad (5.9)$$
Since there exists $\lambda \geq 0$ such that $D_1 = \lambda D_0$, we have $\beta = \lambda\alpha$, and so $\alpha^T\beta = \|\alpha\|\|\beta\|$. Thus $\mathrm{GGW}_2^2(\mu, \nu) - \mathrm{LGW}_2^2(\mu, \nu) = 0$ and, using Proposition 4.2, we get that $\mathrm{GW}_2^2(\mu, \nu) = \mathrm{GGW}_2^2(\mu, \nu) = \mathrm{LGW}_2^2(\mu, \nu)$. We get (5.6) and (5.7) by simply reinjecting in (4.1) and (GGW).

Corollary 5.1. Let $\mu = \mathcal{N}(m_0, \sigma_0^2)$ and $\nu = \mathcal{N}(m_1, \sigma_1^2)$ be two Gaussian measures on $\mathbb{R}$. Then
$$\mathrm{GW}_2^2(\mu, \nu) = 12\left( \sigma_0^2 - \sigma_1^2 \right)^2, \qquad (5.10)$$
and the optimal transport plan $\pi^*$ has the form $(I_1, T)\#\mu$ with T affine of the form
$$\forall x \in \mathbb{R}, \quad T(x) = m_1 \pm \frac{\sigma_1}{\sigma_0}(x - m_0). \qquad (5.11)$$
Thus, the solution of $W_2^2(\mu, \nu)$ is also a solution of $\mathrm{GW}_2^2(\mu, \nu)$.
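As a sanity check, the one-dimensional value (5.10) coincides with the general formula (GGW) specialized to m = n = 1; a tiny sketch with arbitrary standard deviations:

    import numpy as np

    s0, s1 = 1.3, 0.4                     # standard deviations of the two 1D Gaussians
    gw2 = 12 * (s0 ** 2 - s1 ** 2) ** 2   # closed form (5.10)

    # Same number from the general formula (GGW) with m = n = 1.
    d0, d1 = np.array([s0 ** 2]), np.array([s1 ** 2])
    ggw2 = 4 * (d0.sum() - d1.sum()) ** 2 + 8 * np.sum((d0 - d1) ** 2)
    print(gw2, ggw2)                      # both equal 12 * (s0^2 - s1^2)^2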

5.3 Case of degenerate measures

In all the results exposed above, we have supposed $\Sigma_0$ non-singular, which means that µ is not degenerate. Yet, if $\Sigma_0$ is not full rank, one can easily extend the previous results thanks to the following proposition.

Proposition 5.3. Let $\mu = \mathcal{N}(0, D_0)$ and $\nu = \mathcal{N}(0, D_1)$ be two centered Gaussian measures on $\mathbb{R}^m$ and $\mathbb{R}^n$ with diagonal covariance matrices $D_0$ and $D_1$ with eigenvalues in decreasing order. We denote by $r = \mathrm{rk}(D_0)$ the rank of $D_0$ and we suppose that $r < m$. Let us define $P_r = \begin{pmatrix} I_r & 0_{r,m-r} \end{pmatrix} \in \mathbb{R}^{r\times m}$. Then $\mathrm{GW}_2^2(\mu, \nu) = \mathrm{GW}_2^2(P_r\#\mu, \nu)$, $\mathrm{GGW}_2^2(\mu, \nu) = \mathrm{GGW}_2^2(P_r\#\mu, \nu)$, and $\mathrm{LGW}_2^2(\mu, \nu) = \mathrm{LGW}_2^2(P_r\#\mu, \nu)$.

#µ, ν).

Proof. For $r < m$, we denote by $\Gamma_r(\mathbb{R}^m)$ the set of vectors $x = (x_1, \ldots, x_m)^T$ of $\mathbb{R}^m$ such that $x_{r+1} = \cdots = x_m = 0$. For $\pi \in \Pi(\mu, \nu)$, one can remark that for any Borel set $A \subset \mathbb{R}^m \setminus \Gamma_r(\mathbb{R}^m)$ and any Borel set $B \subset \mathbb{R}^n$, we have $\pi(A, B) = 0$ and so
$$\begin{aligned} \mathrm{GW}_2^2(\mu, \nu) &= \inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathbb{R}^m\times\mathbb{R}^n} \int_{\mathbb{R}^m\times\mathbb{R}^n} \left( \|x - x'\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') \\ &= \inf_{\pi\in\Pi(\mu,\nu)} \int_{\Gamma_r(\mathbb{R}^m)\times\mathbb{R}^n} \int_{\Gamma_r(\mathbb{R}^m)\times\mathbb{R}^n} \left( \|x - x'\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') \\ &= \inf_{\pi\in\Pi(\mu,\nu)} \int_{\Gamma_r(\mathbb{R}^m)\times\mathbb{R}^n} \int_{\Gamma_r(\mathbb{R}^m)\times\mathbb{R}^n} \left( \|P_r(x - x')\|^2_{\mathbb{R}^r} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y'). \end{aligned} \qquad (5.12)$$
Now, observe that for $\pi \in \Pi(\mu, \nu)$, $(P_r, I_n)\#\pi \in \Pi(P_r\#\mu, \nu)$. It follows that
$$\mathrm{GW}_2^2(\mu, \nu) \leq \inf_{\pi\in\Pi(P_r\#\mu,\nu)} \int_{\mathbb{R}^r\times\mathbb{R}^n} \int_{\mathbb{R}^r\times\mathbb{R}^n} \left( \|x - x'\|^2_{\mathbb{R}^r} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') = \mathrm{GW}_2^2(P_r\#\mu, \nu). \qquad (5.13)$$
Conversely, since µ has no mass outside of $\Gamma_r(\mathbb{R}^m)$, $P_r^T\#P_r\#\mu = \mu$, which implies that for $\pi \in \Pi(P_r\#\mu, \nu)$, $(P_r^T, I_n)\#\pi \in \Pi(\mu, \nu)$. It follows that
$$\begin{aligned} \mathrm{GW}_2^2(P_r\#\mu, \nu) &= \inf_{\pi\in\Pi(P_r\#\mu,\nu)} \int_{\mathbb{R}^r\times\mathbb{R}^n} \int_{\mathbb{R}^r\times\mathbb{R}^n} \left( \|x - x'\|^2_{\mathbb{R}^r} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') \\ &= \inf_{\pi\in\Pi(P_r\#\mu,\nu)} \int_{\mathbb{R}^r\times\mathbb{R}^n} \int_{\mathbb{R}^r\times\mathbb{R}^n} \left( \|P_r^T(x - x')\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') \\ &\leq \inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathbb{R}^m\times\mathbb{R}^n} \int_{\mathbb{R}^m\times\mathbb{R}^n} \left( \|x - x'\|^2_{\mathbb{R}^m} - \|y - y'\|^2_{\mathbb{R}^n} \right)^2 d\pi(x, y)\, d\pi(x', y') \\ &\leq \mathrm{GW}_2^2(\mu, \nu). \end{aligned} \qquad (5.14)$$
The exact same reasoning can be made in the case of $\mathrm{GGW}_2$. Moreover, it can be easily seen when looking at (LGW) that $\mathrm{LGW}_2^2(\mu, \nu) = \mathrm{LGW}_2^2(P_r\#\mu, \nu)$.

Thus, when $\Sigma_0$ is not full rank, one can apply Proposition 5.3 and consider directly the Gromov-Wasserstein distance between the projected (non-degenerate) measure $P_r\#\mu$ on $\mathbb{R}^r$ and ν, and so Proposition 4.2 still holds when µ is degenerate.

In the case of $\mathrm{GGW}_2$, an explicit optimal transport plan can still be exhibited. In the following, we denote by $r_0$ and $r_1$ the ranks of $\Sigma_0$ and $\Sigma_1$, and we suppose without loss of generality that $r_0 \geq r_1$, but this time not necessarily that $m \geq n$. If $\mu = \mathcal{N}(m_0, \Sigma_0)$ and $\nu = \mathcal{N}(m_1, \Sigma_1)$ are two Gaussian measures on $\mathbb{R}^m$ and $\mathbb{R}^n$, and $(P_0, D_0)$ and $(P_1, D_1)$ are the respective diagonalizations of $\Sigma_0$ ($= P_0 D_0 P_0^T$) and $\Sigma_1$ ($= P_1 D_1 P_1^T$) which sort the eigenvalues in decreasing order, an optimal transport plan which achieves $\mathrm{GGW}_2(\mu, \nu)$ is of the form $\pi^* = (I_m, T)\#\mu$ with
$$\forall x \in \mathbb{R}^m, \quad T(x) = m_1 + P_1 A P_0^T (x - m_0), \qquad (5.15)$$
where $A \in \mathbb{R}^{n\times m}$ is of the form
$$A = \begin{pmatrix} \tilde{I}_{r_1} \left( D_1^{(r_1)} \right)^{\frac{1}{2}} \left( D_0^{(r_1)} \right)^{-\frac{1}{2}} & 0_{r_1, m-r_1} \\ 0_{n-r_1, r_1} & 0_{n-r_1, m-r_1} \end{pmatrix},$$
where $\tilde{I}_{r_1}$ is any matrix of the form $\mathrm{diag}((\pm 1)_{i\leq r_1})$.
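Proposition 5.3 can be illustrated directly on the closed form (GGW): trailing zero eigenvalues of $\Sigma_0$ do not change the value. A small sketch with arbitrary numbers (assuming the rank stays at least n):

    import numpy as np

    def ggw2_diag(d0, d1):
        # Closed form (GGW) for centered Gaussians with sorted diagonal covariances.
        n = len(d1)
        return (4 * (d0.sum() - d1.sum()) ** 2
                + 8 * np.sum((d0[:n] - d1) ** 2)
                + 8 * (np.sum(d0 ** 2) - np.sum(d0[:n] ** 2)))

    d0 = np.array([3.0, 1.0, 0.0, 0.0])    # rank r = 2 covariance on R^4
    d1 = np.array([2.0, 0.5])              # covariance on R^2
    print(ggw2_diag(d0, d1), ggw2_diag(d0[:2], d1))   # identical values, as in Proposition 5.3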

6 Behavior of the empirical solution

To complete the previous study, we perform a simple experiment to illustrate the behavior of the solution of the Gromov-Wasserstein problem. In this experiment, we draw independently k samples $(X_j)_{j\leq k}$ and $(Y_i)_{i\leq k}$ from respectively $\mu = \mathcal{N}(0, \mathrm{diag}(\alpha))$ and $\nu = \mathcal{N}(0, \mathrm{diag}(\beta))$ with $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}^n$. Then we compute the Gromov-Wasserstein distance between the two histograms X and Y with the algorithm proposed in [19], using the Python Optimal Transport library³. In Figure 5, we plot the first coordinate of the samples $Y_i$ as a function of the first coordinate of the samples $X_j$ they have been assigned to by the algorithm (blue dots). We also draw the line of equation $y = \pm\sqrt{\beta}x$ to compare with the theoretical solution of the Gaussian restricted problem (orange line) for k = 2000,

³The library is accessible here: https://pythonot.github.io/index.html


Figure 5: Plot of the first coordinate of the samples $Y_i$ as a function of the first coordinate of their assigned samples $X_j$ (blue dots), and line of equation $y = \pm\sqrt{\beta}x$ (orange line), for k = 2000, $\alpha = (1, 0.1)^T$ and $\beta = 2$ (top left); k = 2000, $\alpha = (1, 0.1)^T$ and $\beta = (2, 0.3)^T$ (top right); k = 2000, $\alpha = (1, 0.1, 0.01)^T$ and $\beta = 2$ (middle left); k = 7000, $\alpha = (1, 0.3)$ and $\beta = 2$ (middle right); k = 7000, $\alpha = (1, 0.1)^T$ and $\beta = (2, 1)^T$ (bottom left); and k = 7000, $\alpha = (1, 0.3, 0.1)$ and $\beta = 2$ (bottom right).

$\alpha = (1, 0.1)^T$ and $\beta = 2$ (top left), k = 2000, $\alpha = (1, 0.1)^T$ and $\beta = (2, 0.3)^T$ (top right), k = 2000, $\alpha = (1, 0.1, 0.01)^T$ and $\beta = 2$ (middle left), k = 7000, $\alpha = (1, 0.3)$ and $\beta = 2$ (middle right), k = 7000, $\alpha = (1, 0.1)^T$ and $\beta = (2, 1)^T$ (bottom left), and k = 7000, $\alpha = (1, 0.3, 0.1)$ and $\beta = 2$ (bottom right). Observe that the empirical solution seems to behave exactly in the same way as the theoretical solution exhibited in Theorem 4.1 as soon as α and β are close to being collinear. However, when α and β are further away from collinearity, determining the behavior of the empirical solution becomes more complex. Solving Gromov-Wasserstein numerically, even approximately, is a particularly hard task, therefore we cannot conclude if the empirical solution does not behave in the same way as
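For completeness, the experiment of this section can be reproduced along the following lines. This is a sketch only: it assumes a recent version of the POT package (whose solver signature may vary between releases) and uses a smaller k than in the paper to keep the run short.

    import numpy as np
    import ot   # Python Optimal Transport (POT), https://pythonot.github.io

    rng = np.random.default_rng(0)
    k, alpha, beta = 500, np.array([1.0, 0.1]), np.array([2.0])
    X = rng.normal(size=(k, 2)) * np.sqrt(alpha)    # samples from N(0, diag(alpha))
    Y = rng.normal(size=(k, 1)) * np.sqrt(beta)     # samples from N(0, diag(beta))

    C1 = ot.dist(X, X)                              # squared Euclidean cost matrices
    C2 = ot.dist(Y, Y)
    p = q = ot.unif(k)
    T = ot.gromov.gromov_wasserstein(C1, C2, p, q, 'square_loss')  # coupling (k x k)

    # For each sample X_j, look at the Y_i it is mostly sent to, and compare its first
    # coordinate with the theoretical Gaussian-restricted map y = +- sqrt(beta_1) * x_1.
    assign = T.argmax(axis=1)
    print(np.c_[X[:5, 0], Y[assign[:5], 0]])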
