• Aucun résultat trouvé

Convergence of an estimator of the Wasserstein distance between two continuous probability distributions

N/A
N/A
Protected

Academic year: 2021

Partager "Convergence of an estimator of the Wasserstein distance between two continuous probability distributions"

Copied!
7
0
0

Texte intégral

(1)

HAL Id: hal-01526879

https://hal.archives-ouvertes.fr/hal-01526879

Preprint submitted on 23 May 2017

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Convergence of an estimator of the Wasserstein distance between two continuous probability distributions

Thierry Klein, Jean-Claude Fort, Philippe Berthet

To cite this version:

Thierry Klein, Jean-Claude Fort, Philippe Berthet. Convergence of an estimator of the Wasserstein

distance between two continuous probability distributions. 2017. �hal-01526879�

(2)

Convergence of an estimator of the Wasserstein distance between two continuous probability distributions

Philippe Berthet

a

, Jean-Claude Fort

b

, Thierry Klein

a,c

aIMT- institut math´ematiques de Toulouse, Universit´e de Toulouse, France.

bMAP5, Universit´e Paris Descartes, SPC, 45 rue des Saints P`eres, 75006 Paris, France

cENAC - Ecole Nationale de l’Aviation Civile , Universit´e de Toulouse, France

Abstract

This article is dedicated to the estimation of Wasserstein distances and Wasserstein costs between two distinct continuous distributions F and GonR. The estimator is based on the order statistics of (possibly dependent) samples ofF resp.G. We prove the consistency and the asymptotic normality of our estimators.

Nous ´etudions la convergence d’estimateurs des distances et des coˆuts de Wasserstein entre deux lois de probabi- lit´es continuesF etGsurRdistinctes. L’estimateur est construit `a partir des statistiques d’ordres des ´echantillons (possiblement d´ependants) deF et deG. Nous montrons leur consistance et un th´eor`eme de la limite centrale.

1. Introduction

The motivation of this work is to be found in the fast development of computer experiments. Nowadays the output of many computer codes is not only a real multidimensional variable but frequently a function computed on numerous sample points. In particular this function may be the density or the cumulative distribution function (c.d.f) of a real random variable (r.v.). To analyze such outputs one needs to choose a distance to compare variousc.d.f.. Among the large possibilities offered by the literature the Wasserstein distances are now commonly used - for more details on general Wasserstein distances we refer to [10].

As many of the computer codes provide large samples of the underlying distributions, the statistical study of such distances is of primordial importance. For one dimensional probability distributions the p-Wasserstein distance is theLp distance between simulatedr.v.from a universal simulatorU uniform on [0,1]:Wpp(F, G) =R1

0 |F(u)−G(u)|pdu=E|F(U)−G(U)|p, whereFis the generalized inverse of F. The most relevant cases for applications arep= 2 andp= 1. It is then natural to estimateWpp(F, G)

Email addresses: philippe.berthet@math.univ-toulouse.fr(Philippe Berthet),fort43@gmail.com(Jean-Claude Fort),thierry.klein@math.univ-toulouse.fr or thierry01.klein@enac.fr(Thierry Klein).

(3)

by its empirical counterpart that isWpp(Fn,Gn) whereFnandGnare the empirical distribution functions of F andG built throughi.i.d.samples of F and G. The two samples are not necessarily independent, for instance they may be issued from simultaneous experimentations.

Many authors were interested in the convergence ofWpp(Fn, F), see the survey paper [3] and [8,6,7,1].

Up to our knowledge there are few works [9] trying to quantify the convergence of Wpp(Fn,Gn) or for more general Wasserstein costs.

This note is organized as follows. Section 2 is dedicated to the definition of the general Wasserstein costs we are considering and their estimators. In Section 3 we state the asymptotic properties of our estimators. In Section??we give a very brief sketch of the proof.

2. Wasserstein distances and costs for probability distributions on R

2.1. Wasserstein costs

LetF andGtwoc.d.f.onR. Thep-Wasserstein distance betweenF andGis defined to be Wpp(F, G) = min

X∼F,Y∼GE|X−Y|p, (1)

whereX ∼F, Y ∼Gmeans that X and Y arer.v.s with respective distributionF andG. The minimum in (3) has the following explicit expression,

Wpp(F, G) = Z 1

0

|F(u)−G(u)|pdu. (2) The Wassertein distances can be generalized to Wasserstein costs. Given a real non negative function c(x, y) of two real variables we consider

Wc(F, G) = min

X∼F,Y∼GEc(X, Y). (3)

Among the costsc, those that define a negative measure onR2 are of special interest. They satisfy the

”measure property”P defined by

P:c(x0, y0)−c(x0, y)−c(x, y0) +c(x, y)≤0, x≤x0 , y≤y0.

From Theorem 2 of [4], we optain an explicit formula of Wc for cost functions satisfying property P, extending the formula (2). Namely ifU is ar.v.uniformly distributed on [0,1] then

Wc(F, G) = Z 1

0

c(F(u), G(u))du=Ec(F(U), G(U)). (4) Remark 1 It is obvious that c(x, y) =−xy satisfies the P property and if c satisfiesP then any function of the forma(x) +b(y) +c(x, y)also satisfiesP. In particular(x−y)2=x2+y2−2xy satisfiesP. More generally ifρ is a convex real function then c(x, y) =ρ(x−y) satisfiesP. This is the case for|x−y|p, p>1.

2.2. Empirical Wasserstein costs

Let us assume that ai.i.d.sample (Xi, Yi)1≤i≤n of r.v.with marginalc.d.f. F andGis available. We writeFn and Gn the empiricalc.d.f built on the two marginal samples and denote by X(i)theithorder

2

(4)

statistic of the sample (Xi)1≤i≤n,X(1)< . . . < X(n). Let us study the following natural estimator ofWc(F, G)

Wcn:=Wc(Fn,Gn) = 1 n

n

X

i=1

c(X(i), Y(i)). (5)

Wcn is a consistent estimator ofWc(F, G)

Theorem 2.1 Assume that the cost c(x,y) satisfies:c(x, y)≤a(f(x) +f(y))with f >0andE[f(X)]<

∞,E[f(Y)]<∞, thenWcn converges almost surely toWc(F, G).

3. A Central Limit Theorem for Wcn

The main result of this note is the weak convergence of

√n(Wc(Fn,Gn)−Wc(F, G)). (6)

we need to fix the relative position of the two tails ofF andG. Moreover these tails induce restrictions on the allowed costs. We assume thatF andGare supported on wholeRbut we only deal with the right hand side of the real line in order to give a more synthetic presentation. Hence we restrict properties of the key functions toR+ and (m,+∞) for somem >0 large enough.

3.1. Assumptions onF andG.

Fork∈NdenoteCk the set of functions that arektimes continuously differentiable onR, andC0 the set of continuous functions. We assume that F and G are C2 and that the densitiesf =F0 andg =G0 are positive and

(FG1) sup

x>m

1−F(x) f(x)

1

x+|f0(x)|

f(x)

<∞and sup

x>m

1−G(x) g(x)

1

x+|g0(x)|

g(x)

<∞ (7) Moreover, the tails have to be strictly separated. Let us denoteF−1andG−1 the quantile functions. We assume that there existsτ0>0 such that foru>F(m),

(FG2) F−1(u)−G−1(u)>τ0. 3.2. Assumptions on the costc.

Among Wasserstein costs satisfying propertyP we consider the regular ones.

LetM2(m) be the subset of functionsϕ∈ C2such that ϕ00is monotone on (m,+). Formlarge enough ϕ,ϕ0 are also monotone on (m,+∞). LetM0(m) be the set of continuous functions monotone on (m).

WriteRV(γ) the set of regularly varying functions at +∞with indexγ>0. We introduceRV2+(γ, m) = RV(γ)∩ M2(m), γ >0.

WhenL∈RV(0) we assume that l1

x 6L0(x) = ε1(x)L(x)

x , lim

x→+∞ε1(x) = 0, l1>1. (8) Whence we defineRV2+(0, m) ={L:L∈RV(0)∩ M2(m,+∞) such that (8) holds}.

(5)

We impose (wlog) thatc(x, x) = 0 and assume that (C1) c(x, y)>0, c∈ C2

(C2)c(x, y) = exp(l(|x−y|)), for (x, y)∈(m,+∞)2 and l∈RV+2(γ,0), γ>0.

Thuscis asymptotically regular and symetric. Finally we need the following contraction ofc(x, y) along the diagonalx=y. We assume that there existsd(m, τ)→0 asτ→0 such that

(C3) |c(x0, y0)−c(x, y)|6d(m, τ) (|x0−x|+|y0−y|) for (x, y),(x0, y0)∈Dm(τ), whereDm(τ) ={(x, y) : max(|x|,|y|)6m,|x−y|6τ}.

3.3. Cross assumptions between c,F andG.

Consider the tail functionsψX(x) =−logP(X > x) andψY(x) =−logP(Y > x).

We require that for someθ >2 it holds, forx∈(l(m),+∞), (CFG) (ψX◦l−1)0(x)>2 +2θ

x.

We point out that (CFG) impliesψX(x)>2l(x)+2θlogl(x) which in turn guaranties that the heaviest tail satisfies

Z +∞

m

p

P(exp(l(X))> x)dx <+∞.

The later condition is close to necessity for the finiteness of the limiting variance in Theorem 3.1 below.

This is the same kind of condition (3.4) in [3] that ensures the convergence ofW1(Fn, F) at rate√ n.

3.4. The main theorem

We say that conditions (C), (FG) and (CFG) hold if they are satisfied by the right and the left tails of thec.d.f F and G(possibly exchangingF andG). We denotehX =f◦F−1, hY =g◦G−1the so-called density quantile functions. Now define

Π(u, v) =P X6F−1(u), Y 6G−1(v) , then the covariance matrix

Σ(u, v) =

min(u, v)−uv hX(u)hX(v)

Π(u, v)−uv hX(u)hY(v) Π(v, u)−uv

hX(v)hY(u)

min(u, v)−uv hY(v)hY(u)

, (9)

and the gradient

∇(u) = ∂

∂xc(F−1(u), G−1(u)), ∂

∂yc(F−1(u), G−1(u))

. (10)

Theorem 3.1 If(C),(FG)and(CFG)hold then

√n(Wc(Fn,Gn)−Wc(F, G))→ NL (0, σ2(Π, c)) with

σ2(Π, c) = Z 1

0

Z 1 0

∇(u)Σ(u, v)∇(v)dudv <+∞. (11) 4

(6)

3.4.1. Sketch of proof Letcn(u) =√

n c(F−1n (u),G−1n (u))−c(F−1(u), G−1(u))

. We write Z 1

1 2

cn(u)du= Z u0

1 2

cn(u)du+ Z 1−ξn

u0

cn(u)du+ Z 1−εn

1−ξn

cn(u)du+ Z 1

1−εn

cn(u)du :=I1(n) +I2(n) +I3(n) +I4(n).

Under (CFG) and (C2) we can findεn→0 such thatI4(n)→0 in probability.

The sequence√

n→0 is an explicit function ofninvolving the functionsl(x),hX(x) andhY(x).

Then our condition (FG1) implies the minimal conditions we indeed need on hX(x), hY(x) and their companion functions (1−u)/FX(x)hX(x) and (1−u)/FY(x)hY(x) to apply the strong gaussian approxi- mation of [5]. When combined with (CFG) and (C2), we are able to show that for anyξn →0 such that

√nξn→+∞we haveI3(n)→0 a.s..

If ξn is slow enough we establish a distribution free a joint Brownian strong approximation of the quantile processes (F−1n (u),G−1n (u)) that we derive from [2].

Then, thanks to the regularity (C3) ofl(x) and the stochastic ordering (FG2), whenu0→1 the term I2(n) is small in probability asn→ ∞.

Finally, the latter strong approximation allows under (C1) to show that the main term I1(n)

is tight and has a limit law that is arbitrarily close to the desired Gaussian limit law whenu0 → 1.

Working with the strongly approximated versions entails the weak convergence claimed at Theorem 3.1.

Moreover (CFG) implies that the variance of the limiting Gaussian process is integrable andσ2(Π, c)<

+∞.

4. Conclusion

In this work we established consistency and asymptotic normality of the natural estimators of a large class of Wasserstein costs between two smooth distributionsF andGhaving separated tails.

The case W1 is not included in the previous theorem since it does not satisfy (C3), but the theorem still holds thanks to a specific proof. Theorem 3.1 applies to all the classical distributions with regularly decreasing tail when choosing an adapted cost. For instance we may take two distributions with poly- nomial (Pareto) tail of same orderβ >2 but shifted and apply our result when choosing a Wasserstein distanceWpp withp < β/2. For Gaussian tailed distributions one may consider exponential costs of type e|x−y|γ− 1, γ <2.

Clearly this result allows to build a test for equality of two smooth distributions.

References

[1] P. C. ´Alvarez-Esteban, E. del Barrio, J. A. Cuesta-Albertos, and C. Matr´an. Uniqueness and approximate computation of optimal incomplete transportation plans. Ann. Inst. Henri Poincar´e Probab. Stat., 47(2):358–375, 2011.

[2] P. Berthet and D. M. Mason. Revisiting two strong approximation results of Dudley and Philipp. InHigh dimensional probability, volume 51 ofIMS Lecture Notes Monogr. Ser., pages 155–172. Inst. Math. Statist., Beachwood, OH, 2006.

[3] M. Bobkov, S. G.and Ledoux. One-dimensional empirical measures, order statistics and kantorovich transport distances.

To appear in: Memoirs of the AMS, Preprint 2016.

[4] Stamatis Cambanis, Gordon Simons, and William Stout. Inequalities forEk(X, Y) when the marginals are fixed. Z.

Wahrscheinlichkeitstheorie und Verw. Gebiete, 36(4):285–294, 1976.

(7)

[5] M. Cs¨org¨o and P. R´ev´esz. Strong approximations of the quantile process.Ann. Statist., 6(4):882–894, 1978.

[6] E. del Barrio, E. Gin´e, and C. Matr´an. Central limit theorems for the Wasserstein distance between the empirical and the true distributions. Ann. Probab., 27(2):1009–1071, 1999.

[7] E. del Barrio, E. Gin´e, and C. Matr´an. Correction: “Central limit theorems for the Wasserstein distance between the empirical and the true distributions” [Ann. Probab.27(1999), no. 2, 1009–1071; MR1698999 (2000g:60034)]. Ann.

Probab., 31(2):1142–1143, 2003.

[8] E. del Barrio, E. Gin´e, and F. Utzet. Asymptotics forL2functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli, 11(1):131–189, 2005.

[9] M. Sommerfeld and A. Munk. Inference for Empirical Wasserstein Distances on Finite Spaces.ArXiv e-prints, October 2016.

[10] C. Villani.Topics in Optimal Transportation. American Mathematical Society, 2003.

6

Références

Documents relatifs

Thanks to a given noise level δ and a well-suitable inte- gration length window, we show that the derivative estimator error can be O (δ n+1+q q+1 ) where q is the order of

We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov-Wasserstein distance between Gaussian

Our equations come from the modelisation of kidney nephrons, in which a concentration gradient is amplified by an active transport pump [16], which plays a fundamental role for

L’augmentation des constantes de vitesse de dégradation du MeO, RMe, BBP et BBT, en présence des deux catalyseurs étudiés avec l’augmentation de la température

Antimicrobial resistance of specific animal pathogens raises veterinary problems in terms of efficacy but, for two reasons, has no direct impact on human medicine: (i) these

In this section we derive a functional inequality ensuring the uniform exponential conver- gence (18) of the solutions to the steady state ν = e −V , give practical criteria for it

Hence, in that case, the condition (4.26) is weaker than the condition (5.2), and is in fact equiv- alent to the minimal condition to get the central limit theorem for partial sums

As a matter of fact, the statistical analysis of the estimates and their asymptotic behaviour in distribution require a particular study of the asymptotic expan- sion of M n that