
HAL Id: hal-03117283

https://hal.inria.fr/hal-03117283

Preprint submitted on 21 Jan 2021



To cite this version:

Vincent Divol. A short proof on the rate of convergence of the empirical measure for the Wasserstein distance. 2021. ⟨hal-03117283⟩


A short proof on the rate of convergence of the empirical measure for the Wasserstein distance

Vincent Divol^a

^a Université Paris-Saclay and Inria Saclay, France

Abstract

We provide a short proof that the Wasserstein distance between the empirical measure of an $n$-sample and the estimated measure is of order $n^{-1/d}$, if the measure has a lower and upper bounded density on the $d$-dimensional flat torus.

For $1 \le p < \infty$, let $W_p$ be the $p$-Wasserstein distance between measures, defined for two probability measures $\mu, \nu$ with finite $p$th moments supported on a metric space $(\Omega, \rho)$ by
$$W_p(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} C_p(\pi)^{1/p}, \qquad (1)$$
where $\Pi(\mu, \nu)$ is the set of transport plans between $\mu$ and $\nu$, that is, the set of probability measures on $\Omega \times \Omega$ with first marginal $\mu$ and second marginal $\nu$, and $C_p(\pi) = \iint \rho(x, y)^p \, \mathrm{d}\pi(x, y)$ is the cost of the plan $\pi$. The distance $W_\infty$ is defined by replacing the quantity $C_p(\pi)^{1/p}$ by the $\pi$-essential supremum of $\rho$.
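For instance (a standard example, not spelled out in the note): if $\mu = \delta_x$ and $\nu = \delta_y$ are two Dirac masses, the only transport plan is $\delta_{(x,y)}$, so that
$$W_p(\delta_x, \delta_y) = \rho(x, y) \quad \text{for every } 1 \le p \le \infty,$$
and if $\mu = \frac{1}{n}\sum_i \delta_{x_i}$ and $\nu = \frac{1}{n}\sum_i \delta_{y_i}$ are two empirical measures on $n$ points, the infimum in (1) is attained at a permutation (by the Birkhoff–von Neumann theorem), i.e. $W_p(\mu, \nu)^p = \min_\sigma \frac{1}{n}\sum_{i=1}^n \rho(x_i, y_{\sigma(i)})^p$.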

Let $\mu$ be a probability measure on some metric space $(\Omega, \rho)$, and let $\mu_n$ be the empirical measure associated with an $n$-sample $X_1, \dots, X_n$ of law $\mu$. The question of studying rates of convergence between $\mu$ and $\mu_n$ for Wasserstein distances $W_p$ has attracted a lot of attention over recent years (see e.g. [5, 6]). If no bounds on the density are assumed, then the quantity $\mathbb{E} W_p(\mu_n, \mu)$ is known to be bounded by a quantity of order $n^{-1/(2p)} + n^{-1/d}$ when $\Omega$ is a $d$-dimensional domain, and this bound is tight (see e.g. [5]). For $p = \infty$, Nicolás García Trillos and Dejan Slepčev [6] have shown that $\mathbb{E} W_\infty(\mu_n, \mu)$ is of order $(\log n / n)^{1/d}$ (for $d \ge 3$) in the case where $\mu$ has a density $f$ which is lower bounded and upper bounded on some convex domain $\Omega$. As $W_p \le W_\infty$, the same rate also holds for any $1 \le p \le \infty$. This exhibits the following phenomenon: when $2p > d$, the problem of reconstructing $\mu$ for the Wasserstein distance is strictly harder if no bounds on the underlying density are assumed.
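For instance, take $p = 2$ and $d = 3$, so that $2p > d$: without assumptions on the density, the general bound reads
$$\mathbb{E} W_2(\mu_n, \mu) \lesssim n^{-1/4} + n^{-1/3} \asymp n^{-1/4},$$
and this is tight, whereas under two-sided density bounds the rate improves to $n^{-1/3}$, as shown by the theorem below.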

In this note, we propose to give a short proof of the fact that $\mathbb{E} W_p(\mu_n, \mu) \lesssim n^{-1/d}$ (for $d \ge 3$) for bounded densities. We restrict to the case where $\Omega$ is the $d$-dimensional flat torus in order to avoid complications due to boundary effects. Let $\mathcal{P}_0$ be the set of probability distributions on $\Omega$ having a density $f$ satisfying $f_{\min} \le f \le f_{\max}$ for some $f_{\max} \ge f_{\min} > 0$.

Theorem. Let $\mu \in \mathcal{P}_0$ and $1 \le p < \infty$. Then, there exists a constant $C$ such that
$$\mathbb{E} W_p(\mu_n, \mu) \le C \begin{cases} n^{-1/d} & \text{if } d \ge 3, \\ (\log n)^{1/2} n^{-1/2} & \text{if } d = 2, \\ n^{-1/2} & \text{if } d = 1. \end{cases} \qquad (2)$$

The standard approach for bounding the distance $W_p(\mu_n, \mu)$ consists in precisely assessing the masses given by the measures $\mu_n$ and $\mu$ on dyadic partitions of the domain $\Omega$ (see e.g. [6]). We propose to take a different route by relying on a result from [3] which asserts that the Wasserstein distance is controlled by the pointed negative Sobolev distance when comparing measures having lower bounded densities. The proof is then completed by using tools from Fourier analysis.

We also note that minimax results from [7] (proven for measures on the cube) can be straightforwardly adapted to the setting of the flat torus. In particular, those results imply that the rates exhibited in the theorem are optimal on the class $\mathcal{P}_0$ (up to a logarithmic factor for $d = 2$).

The proof

As $W_p \ge W_q$ if $p \ge q$, we may assume that $p \ge 2$. The proof of the theorem is heavily based on the following result of optimal transport theory, appearing in [3, 2]. Let $p^*$ be the conjugate exponent of $p$. For $\phi \in L_p$ with $\int \phi = 0$, introduce the pointed negative Sobolev norm
$$\|\phi\|_{\dot H^{-1}_p} := \sup\left\{ \int \phi \psi, \ \|\nabla \psi\|_{L_{p^*}} \le 1 \right\}, \qquad (3)$$


where the supremum is taken over all smooth functions $\psi$ defined on $\Omega$.

Lemma 1. Let $\mu, \nu$ be two measures on $\Omega$ having densities $f, g$. Assume that $f \ge f_{\min}$. Then,
$$W_p(\mu, \nu) \le p f_{\min}^{1/p - 1} \|f - g\|_{\dot H^{-1}_p}. \qquad (4)$$

Let $K$ be a smooth radial nonnegative function with $\int K = 1$, supported on the unit ball and, for $h > 0$, let $K_h = h^{-d} K(\cdot / h)$. Let $\mu_{n,h}$ be the measure having density $K_h * \mu_n$ on $\Omega$, i.e. the density at a point $x \in \Omega$ is given by $f_{n,h}(x) := \sum_{j=1}^n K_h(x - X_j)/n$.

Lemma 2. We have $W_p(\mu_n, \mu_{n,h}) \le C_0 h$, where $C_0 = \left(\int |x|^p K(x) \, \mathrm{d}x\right)^{1/p}$.

Proof. Consider the unique transport plan $\pi_j$ between $K_h * \delta_{X_j}$ and $\delta_{X_j}$. The cost of $\pi_j$ is equal to $\int |x - X_j|^p K_h(x - X_j) \, \mathrm{d}x = h^p \int |x|^p K(x) \, \mathrm{d}x$. The measure $\frac{1}{n}\sum_{j=1}^n \pi_j$ is a transport plan between $\mu_{n,h}$ and $\mu_n$, with associated cost equal to $h^p \int |x|^p K(x) \, \mathrm{d}x$.

By Lemmas 1 and 2,
$$\mathbb{E} W_p(\mu_n, \mu) \le \mathbb{E} W_p(\mu_n, \mu_{n,h}) + \mathbb{E} W_p(\mu_{n,h}, \mu) \le C_0 h + p f_{\min}^{1/p - 1} \, \mathbb{E}\|f_{n,h} - f\|_{\dot H^{-1}_p}. \qquad (5)$$
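Explicitly, the cost of $\pi_j$ follows from the substitution $u = (x - X_j)/h$ (a routine step, spelled out here for convenience):
$$\int |x - X_j|^p K_h(x - X_j) \, \mathrm{d}x = \int |hu|^p \, h^{-d} K(u) \, h^d \, \mathrm{d}u = h^p \int |u|^p K(u) \, \mathrm{d}u.$$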

To further bound this quantity, we use the following relation between the negative Sobolev norm and the Fourier decomposition of a signal. Given $\phi \in L_p$, we let $\hat\phi$ be the sequence of Fourier coefficients of $\phi$ (indexed by $\mathbb{Z}^d$) and denote by $^\vee$ the inverse Fourier transform. Let $|x| := \sum_{i=1}^d |x_i|$ for $x \in \mathbb{R}^d$. A multiplier $s$ is a bounded sequence indexed by $\mathbb{Z}^d$ such that the operator $\phi \in L_p \mapsto (s\hat\phi)^\vee \in L_p$ is bounded. A sufficient condition for a sequence to be a multiplier is given by the Mikhlin multiplier theorem [1, Theorem 3.6.7, Theorem 5.2.7].

Lemma 3. Let $s : \mathbb{R}^d \to \mathbb{R}$ be a smooth function such that $|\partial^\alpha s(\xi)| \le B |\xi|^{-|\alpha|}$ for every multi-index $\alpha$ with $|\alpha| \le d/2 + 1$. Then, the sequence $(s(m))_{m \in \mathbb{Z}^d}$ is a multiplier with corresponding operator of norm smaller than $C_{p,d} B$.

Let $a : \mathbb{R}^d \to \mathbb{R}$ be a smooth function with $a(\xi) = 1/|\xi|$ for $|\xi| \ge 1$ and $a(0) = 0$. Let $A$ be the associated multiplier operator (by Lemma 3), defined by $A(\phi) = (a\hat\phi)^\vee$.

Lemma 4. Let $\phi \in L_p$ with $\int \phi = 0$. Then, $\|\phi\|_{\dot H^{-1}_p} \le C_5 \|A(\phi)\|_{L_p}$.

Proof. Let $\psi : \Omega \to \mathbb{R}$ be a smooth function with $\|\nabla \psi\|_{L_{p^*}} \le 1$. As $\hat\phi(0) = 0$, we have
$$\int \phi \psi = \sum_{m \in \mathbb{Z}^d} \hat\phi(m) \hat\psi(m) = \sum_{m \in \mathbb{Z}^d} a(m) \hat\phi(m) \, |m| \hat\psi(m) \le \|A(\phi)\|_{L_p} \, \|(|\cdot| \hat\psi)^\vee\|_{L_{p^*}}.$$
Note that $|\cdot| = \sum_{i=1}^d \varepsilon_i e_i$, where $e_i(m) = m_i$ and $\varepsilon_i(m)$ is the sign of $m_i$. As $\varepsilon_i$ is a multiplier (by Lemma 3), we have $\|(|\cdot| \hat\psi)^\vee\|_{L_{p^*}} \le c \sum_{i=1}^d \|(e_i \hat\psi)^\vee\|_{L_{p^*}} = c \sum_{i=1}^d \|\partial_i \psi\|_{L_{p^*}} \le C_5$.

Hence, to conclude, it suffices to bound $\mathbb{E}\|A(f_{n,h} - f)\|_{L_p} \le \|A(f_h - f)\|_{L_p} + \mathbb{E}\|A(f_{n,h} - f_h)\|_{L_p}$, where $f_h := K_h * f$.

Bound of the bias. Let $\kappa$ be the Fourier transform of $K$. As $K$ is smooth and compactly supported, $\kappa$ is a multiplier by Lemma 3. Also, the function $M = a \cdot (\kappa - 1)$ is a multiplier, as a product of multipliers. Remark that $\hat f_h - \hat f = (\kappa(h\,\cdot) - 1)\hat f$, so that $A(f_h - f) = h (M(h\,\cdot)\hat f)^\vee$. As the multiplier norms of $M$ and $M(h\,\cdot)$ are equal [1, Theorem 3.6.7], we have
$$\|A(f_h - f)\|_{L_p} \le h C_6 \|f\|_{L_p} \le h C_6 f_{\max}. \qquad (6)$$
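The identity $\hat f_h - \hat f = (\kappa(h\,\cdot) - 1)\hat f$ can be checked directly. With the convention $\hat\phi(m) = \int_\Omega \phi(x) e^{-2i\pi m \cdot x} \, \mathrm{d}x$ (any fixed normalization works the same way) and $h$ small enough for $K_h$ to be supported in a fundamental domain of the torus,
$$\hat f_h(m) = \widehat{K_h * f}(m) = \hat K_h(m)\,\hat f(m) = \Big( \int K(u) e^{-2i\pi (hm) \cdot u} \, \mathrm{d}u \Big) \hat f(m) = \kappa(hm)\,\hat f(m).$$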

Bound of the fluctuations. Finally, we bound
$$\mathbb{E}\|A(f_{n,h} - f_h)\|_{L_p} \le \mathbb{E}\left[\|A(f_{n,h} - f_h)\|_{L_p}^p\right]^{1/p}. \qquad (7)$$

The random variable $A(f_{n,h})$ is equal to $n^{-1}\sum_{j=1}^n U_j$, where $U_j := A(K_h * \delta_{X_j}) = A(K_h)(\cdot - X_j)$ and $\mathbb{E} U_j = A(f_h)$. We control the expectation of the $L_p$-norm of the sum of i.i.d. centered functions thanks to the next lemma, which is a direct consequence of the Rosenthal inequality [4].

Lemma 5. Let $U_1, \dots, U_n$ be i.i.d. functions in $L_p$. Then, the expectation $\mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^n (U_i - \mathbb{E} U_i)\right\|_{L_p}^p$ is smaller than
$$C_p n^{-p/2} \int \left(\mathbb{E}|U_1(x)|^2\right)^{p/2} \mathrm{d}x + C_p n^{1-p} \int \mathbb{E}\left[|U_1(x)|^p\right] \mathrm{d}x. \qquad (8)$$

Let $v_h$ be the sequence in $\ell_{p^*}(\mathbb{Z}^d)$ defined by $v_h(m) = a(m)\kappa(hm)$ for $m \in \mathbb{Z}^d$. By a change of variables, we obtain
$$\mathbb{E}\left[|U_1(x)|^p\right] = \int f(y)\,|A(K_h)(x - y)|^p \, \mathrm{d}y \le f_{\max} \|A(K_h)\|_{L_p}^p \le f_{\max} \|v_h\|_{\ell_{p^*}}^p, \qquad (9)$$
where, for the last inequality, we applied the Hausdorff–Young inequality [8, Section XII.2]. The last step consists in bounding $\|v_h\|_{\ell_{p^*}}^{p^*}$. We separate this quantity into two parts: $S_0 = \sum_{|hm| \le 1} |v_h(m)|^{p^*}$ and $S_1 = \sum_{|hm| > 1} |v_h(m)|^{p^*}$. To bound $S_0$, we use that $\kappa$ is bounded on the unit ball, so that $S_0$ is of the order
$$\sum_{|hm| \le 1} |m|^{-p^*} \lesssim \begin{cases} h^{p^* - d} & \text{if } d \ge 3 \text{ or } (d = 2 \text{ and } p > 2), \\ -\log h & \text{if } p = d = 2, \\ 1 & \text{if } d = 1. \end{cases} \qquad (10)$$
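The order of magnitude in (10) follows from comparing the sum with an integral (a routine estimate, written out for the reader's convenience): since there are of order $r^{d-1}$ points $m \in \mathbb{Z}^d$ with $|m| \asymp r$,
$$\sum_{1 \le |m| \le 1/h} |m|^{-p^*} \asymp \int_1^{1/h} r^{d-1-p^*} \, \mathrm{d}r \asymp \begin{cases} h^{p^* - d} & \text{if } p^* < d, \\ -\log h & \text{if } p^* = d, \\ 1 & \text{if } p^* > d, \end{cases}$$
and, as $p \ge 2$ forces $p^* \le 2$, the three cases correspond exactly to those listed in (10).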

To bound $S_1$, we use that $|\kappa(hm)| \le C_\gamma |hm|^{-\gamma}$ for any $\gamma > 0$. Choosing $\gamma$ such that $\gamma p^* + p^* > d$, we obtain that $S_1$ is of the order
$$h^{-\gamma p^*} \sum_{|hm| > 1} |m|^{-\gamma p^* - p^*} \lesssim h^{p^* - d}. \qquad (11)$$

Putting together inequalities (8), (10) and (11) yields that, for $h$ of the order $n^{-1/d}$, the expectation $\mathbb{E}\|A(f_{n,h} - f_h)\|_{L_p}$ is of the order
$$\begin{cases} h/\sqrt{n h^d} \lesssim n^{-1/d} & \text{if } d \ge 3, \\ \sqrt{(-\log h)/n} \lesssim (\log n)^{1/2} n^{-1/2} & \text{if } d = 2, \\ n^{-1/2} & \text{if } d = 1. \end{cases} \qquad (12)$$

We conclude the proof by putting together the estimates (5), (6) and (12).
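Explicitly, with $C_5$ the constant of Lemma 4 and $h = n^{-1/d}$, the chain of estimates reads, in the case $d \ge 3$,
$$\mathbb{E} W_p(\mu_n, \mu) \le C_0 h + p f_{\min}^{1/p-1} C_5 \left( C_6 f_{\max}\, h + \mathbb{E}\|A(f_{n,h} - f_h)\|_{L_p} \right) \lesssim h + n^{-1/d} \lesssim n^{-1/d},$$
and the cases $d = 1, 2$ are handled identically with the corresponding rates from (12).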

Remark 1. For $p = 2$, the Mikhlin multiplier theorem can be replaced by Parseval's theorem, further simplifying the proof.
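Indeed, for $p = p^* = 2$, Parseval's theorem gives, for any bounded sequence $s$,
$$\|(s\hat\phi)^\vee\|_{L_2} = \|s\hat\phi\|_{\ell_2} \le \sup_{m \in \mathbb{Z}^d} |s(m)| \, \|\hat\phi\|_{\ell_2} = \sup_{m \in \mathbb{Z}^d} |s(m)| \, \|\phi\|_{L_2},$$
so that every bounded sequence is a multiplier on $L_2$ and Lemma 3 is not needed.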

Remark 2. A similar proof shows that the risk of the measure $\mu_{n,h}$ satisfies $\mathbb{E} W_p(\mu_{n,h}, \mu) \lesssim n^{-(s+1)/(2s+d)}$ if $f$ is assumed to be of regularity $s$. Indeed, we can exploit the regularity of $f$ to show that, if $\kappa$ has sufficiently many zero derivatives at $0$, then the bias term is of order $h^{s+1}$, while the fluctuation term is bounded in the same way. We then obtain the desired rate by choosing $h$ of the order $n^{-1/(2s+d)}$. This rate is in accordance with the minimax result of [7], where a modified wavelet density estimator is shown to attain the same rate of convergence.
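As a quick check of the exponent, with $h = n^{-1/(2s+d)}$ the bias term is $h^{s+1} = n^{-(s+1)/(2s+d)}$, while the fluctuation term $h/\sqrt{nh^d}$ from (12) (case $d \ge 3$) becomes
$$n^{-\frac{1}{2s+d}} \cdot n^{-\frac{1}{2}} \cdot n^{\frac{d}{2(2s+d)}} = n^{-\frac{2 + (2s+d) - d}{2(2s+d)}} = n^{-\frac{s+1}{2s+d}},$$
so the two terms are balanced at this choice of $h$.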

References


[2] Sloan Nietert, Ziv Goldfeld, and Kengo Kato. From smooth Wasserstein distance to dual Sobolev norm: Empirical approximation and statistical applications, 2021.

[3] Rémi Peyre. Comparison between $W_2$ distance and $\dot H^{-1}$ norm, and localization of Wasserstein distance. ESAIM: Control, Optimisation and Calculus of Variations, 24(4), 2018.

[4] Haskell P. Rosenthal. On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables. Israel Journal of Mathematics, 8(3):273–303, 1970.

[5] Shashank Singh and Barnabás Póczos. Minimax distribution estimation in Wasserstein distance. arXiv preprint arXiv:1802.08855, 2018.

[6] Nicolás García Trillos and Dejan Slepčev. On the rate of convergence of empirical measures in $\infty$-transportation distance. Canadian Journal of Mathematics, 67(6):1358–1383, 2015.

[7] Jonathan Weed and Quentin Berthet. Estimation of smooth densities in Wasserstein distance. In Conference on Learning Theory, pages 3118–3119, 2019.

[8] A. Zygmund and R. Fefferman. Trigonometric Series. Cambridge Mathematical Library. Cambridge University Press, 2003.
