
Exact rate of convergence of the expected $W_{2}$ distance between the empirical and true Gaussian distribution


HAL Id: hal-02495832

https://hal.archives-ouvertes.fr/hal-02495832

Submitted on 2 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Exact rate of convergence of the expected $W_2$ distance between the empirical and true Gaussian distribution

Philippe Berthet, Jean Fort

To cite this version:

Philippe Berthet, Jean Fort. Exact rate of convergence of the expected $W_2$ distance between the empirical and true Gaussian distribution. Electronic Journal of Probability, Institute of Mathematical Statistics (IMS), 2020, 25, pp. 12. 10.1214/19-EJP410. hal-02495832


arXiv:2001.09817v1 [math.PR] 27 Jan 2020

EXACT RATE OF CONVERGENCE OF THE EXPECTED $W_2$ DISTANCE BETWEEN THE EMPIRICAL AND TRUE GAUSSIAN DISTRIBUTION

A PREPRINT

Philippe Berthet, Jean Claude Fort

January 28, 2020

ABSTRACT

We study the Wasserstein distance $W_2$ for Gaussian samples. We establish the exact rate of convergence $\sqrt{\log\log n/n}$ of the expected value of the $W_2$ distance between the empirical and true c.d.f.'s for the normal distribution. We also show that the rate of weak convergence is unexpectedly $1/n$ in the case of two correlated Gaussian samples.

Keywords: Gaussian empirical c.d.f. · quadratic Wasserstein distance · central limit theorem · empirical processes · strong approximation

AMS Subject Classification: 62G30; 62G20; 60F05; 60F17

1 Introduction

In this article we investigate in detail the asymptotic behaviour of the quadratic Wasserstein distance between the empirical cumulative distribution function (c.d.f.) of a sample $X_1,\dots,X_n$ of independent standard Gaussian random variables, denoted by $F_n$, and the standard normal c.d.f., denoted by $\Phi$. Thus we consider the random variable

$$W_2^2(F_n,\Phi) = \int_0^1 \left|F_n^{-1}(u) - \Phi^{-1}(u)\right|^2 du.$$

More precisely we are interested in the exact rate of convergence of $E(W_2^2(F_n,\Phi))$. Define $h(u) = \varphi(\Phi^{-1}(u))$ for $u \in (0,1)$, where $\varphi$ denotes the standard normal density. First note that Corollary 19 in [1] does not apply in this specific case where $b = 2$, and indeed we almost surely have $\lim_{n\to+\infty} n W_2^2(F_n,\Phi) = +\infty$. Secondly, to our knowledge the most precise result about the behaviour of $W_2(F_n,\Phi)$ is given by Theorem 4.6 (ii) in [9], which implies, as $n \to +\infty$, the convergence in distribution

$$n W_2^2(F_n,\Phi) - \int_{1/n}^{1-1/n} \frac{u(1-u)}{h^2(u)}\,du \;\longrightarrow\; \int_0^1 \frac{B^2(u) - E(B^2(u))}{h^2(u)}\,du, \qquad (1)$$

where $B$ is a standard Brownian bridge. This is not enough to control $n\,E(W_2^2(F_n,\Phi))$ since the deterministic centering integral is diverging. In [4] specific bounds on $n\,E(W_p^p(F_n,F))$ are given for log-concave distributions $F$. In the standard Gaussian case Corollary 6.14 of [4] reads

$$c\,\frac{\log\log n}{n} \le E(W_2^2(F_n,\Phi)) \le C\,\frac{\log\log n}{n}, \qquad (2)$$

where $0 < c < C < +\infty$. The main achievement below is to compute the exact asymptotic constant in (2). As far as we know this is the first result of this kind.

Institut de Mathématiques de Toulouse UMR 5219; Université Paul Sabatier, France. philippe.berthet@math.univ-toulouse.fr

MAP5 UMR 8145; Université Paris-Descartes, France. jean-claude.fort@parisdescartes.fr


In the spirit of [1] we moreover extend the investigation from the one-sample case to the case of two correlated samples.

More precisely, we study the random quantity $W_2^2(F_n, G_n)$ where $F_n, G_n$ are the marginal empirical c.d.f.'s obtained from an $n$-sample $(X_i, Y_i)_{1\le i\le n}$ of standard Gaussian couples with correlation $\rho$. If the Gaussian marginals $\Phi_X$ and $\Phi_Y$ were not identical, the general Theorem 14 in [2] would imply the convergence in distribution
$$\sqrt{n}\left(W_2^2(F_n,G_n) - W_2^2(\Phi_X,\Phi_Y)\right) \;\longrightarrow\; \mathcal{N}\left(0, \sigma^2(\Sigma)\right) \qquad (3)$$
where $\Sigma$ is the covariance matrix of $(X_1,Y_1)$ and $\sigma^2(\Sigma)$ has a closed form expression that explicitly depends on $\Sigma$. In particular, Corollary 18 of [2] shows that for two independent samples from two distinct Gaussian distributions $\mathcal{N}(\nu,\zeta^2)$ and $\mathcal{N}(\mu,\xi^2)$ it holds $\sigma^2(\Sigma) = 4(\zeta^2+\xi^2)(\nu-\mu)^2 + 2(\zeta^2+\xi^2)(\zeta-\xi)^2$.

Surprisingly, the second result below establishes that whenever the marginals are the same, $\Phi_X = \Phi_Y = \Phi$, and the samples are not independent, that is $\rho \ne 0$, the rate of weak convergence of $W_2^2(F_n,G_n)$ is $1/n$ and the limiting distribution is a slight variation of the one given at Theorem 11 in [1], even if the sufficient condition of the latter result is not satisfied.

2 The results

First we provide the limiting constant in (2).

Theorem 1. Let $F_n$ be the empirical c.d.f. of an i.i.d. standard normal sample of size $n$ and $\Phi$ the c.d.f. of the standard normal distribution. Then it holds
$$\lim_{n\to+\infty} \frac{n}{\log\log n}\, E\left(W_2^2(F_n,\Phi)\right) = 1, \qquad \lim_{n\to+\infty} \sqrt{\frac{n}{\log\log n}}\, E\left(W_2(F_n,\Phi)\right) = 1.$$

Remark 2. This result is consistent with (1) and the fact that, by [3], we have
$$\int_{1/n}^{1-1/n} \frac{u(1-u)}{h^2(u)}\,du = \log\log n + \log 2 + \gamma_0 + o(1),$$
which implies that $\frac{n}{\log\log n} W_2^2(F_n,\Phi) \to 1$ in probability.

Remark 3. In the case of a sample of unstandardized normal random variables with variance $\sigma^2$ the expected $W_2$-distance between the empirical and the true distribution has the same rate as above and limiting constants $\sigma^2$ and $\sigma$, respectively.
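Theorem 1 can also be probed by simulation. For a fixed sample, $W_2^2(F_n,\Phi)$ has a closed form: on each cell $((i-1)/n, i/n]$ the empirical quantile equals $X_{(i)}$, and the Gaussian integrals $\int z\varphi$, $\int z^2\varphi$ telescope, giving $W_2^2(F_n,\Phi) = \frac{1}{n}\sum_i X_{(i)}^2 - 2\sum_i X_{(i)}(\varphi(z_{i-1}) - \varphi(z_i)) + 1$ with $z_j = \Phi^{-1}(j/n)$. The Monte Carlo sketch below is our own (sample size and replication count are arbitrary choices, not from the paper):

```python
import numpy as np
from scipy.stats import norm

EULER_GAMMA = 0.5772156649015329


def w2_squared(sample):
    """Exact W_2^2 between the empirical c.d.f. of `sample` and Phi.

    Uses the antiderivative a^2 Phi(z) + 2a phi(z) + Phi(z) - z phi(z)
    of (a - z)^2 phi(z) on each cell ((i-1)/n, i/n]; boundary terms telescope.
    """
    x = np.sort(sample)
    n = x.size
    pdf = norm.pdf(norm.ppf(np.arange(n + 1) / n))  # phi on the grid, 0 at +-inf
    return np.dot(x, x) / n - 2.0 * np.dot(x, pdf[:-1] - pdf[1:]) + 1.0


rng = np.random.default_rng(0)
n, reps = 1000, 300
mean_w2sq = np.mean([w2_squared(rng.standard_normal(n)) for _ in range(reps)])
# Sharper deterministic centering from Remark 2: (log log n + log 2 + gamma_0)/n.
centering = (np.log(np.log(n)) + np.log(2.0) + EULER_GAMMA) / n
print(n * mean_w2sq, n * centering)
```

At moderate $n$ the ratio of the two printed quantities is only close to $1$ up to the $o(1)$ terms, consistent with the very slow $\log\log n$ rate.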

Remark 4. If $G_n$ is a second empirical c.d.f. independent of $F_n$ and built from another sample, we see that $E(W_2^2(F_n,G_n)) = E(W_2^2(F_n,\Phi)) + E(W_2^2(G_n,\Phi))$ since $E\left(\int_0^1 (F_n^{-1}(u) - \Phi^{-1}(u))\,du\right) = 0$. Therefore, in this independent case we have
$$\lim_{n\to+\infty} \frac{n}{\log\log n}\, E\left(W_2^2(F_n,G_n)\right) = 2,$$
which is in contrast with the forthcoming dependent sample case.

Second, in the setting of [2] and [1] we also get the rate of weak convergence in the two correlated samples case.

Theorem 5. Let $F_n$ and $G_n$ denote the marginal empirical c.d.f.'s of a size $n$ i.i.d. sample of correlated bivariate standard normal with covariance $\rho$, $0 < |\rho| < 1$. Let
$$C_\rho(u,v) = P\left(X \le \Phi^{-1}(u),\, Y \le \Phi^{-1}(v)\right), \quad u,v \in (0,1), \qquad G(u) = \frac{B_X(u)}{h(u)} - \frac{B_Y(u)}{h(u)}, \quad u \in (0,1),$$
where $(B_X, B_Y)$ are two standard Brownian bridges with cross covariance
$$\mathrm{Cov}(B_X(u), B_Y(v)) = C_\rho(u,v) - uv, \quad u,v \in (0,1).$$
Then we have the convergence in distribution
$$n\, W_2^2(F_n,G_n) \;\longrightarrow\; \|G\|_2^2 = \int_0^1 G(u)^2\,du,$$
and the limiting random variable is almost surely finite with finite expectation.

(4)

Remark 6. By Theorem 5 it holds $\sqrt{n}\, W_2(F_n,G_n) \to \|G\|_2$ with a CLT rate and a non-degenerate limiting distribution with finite variance. This was not expected since in the case of two independent samples, that is $\rho = 0$, it holds
$$E\left(\|G\|_2^2\right) = \int_0^1 E\left(G(u)^2\right) du = 2\int_0^1 \frac{u(1-u)}{h^2(u)}\,du = +\infty,$$
which proves by Theorem 1.3 of [8] that $P(\|G\|_2 = +\infty) = 1$, and is consistent with the similar case where $G_n$ is replaced with $\Phi$ as shown by Theorem 1.

Remark 7. Theorem 5 is an extension of Theorem 11 in [1] for Gaussian correlated samples that proves that the dependency between two i.i.d. samples expressed through the joint law may influence the rate of convergence of $W_2^2(F_n,G_n)$ if the marginal distributions are the same. In the general CLT formulated at Theorem 14 of [2], only the limiting finite variance of $\sqrt{n}(W_2^2(F_n,G_n) - W_2^2(\Phi_X,\Phi_Y))$ was affected by the joint law if the marginal distributions are different, not the rate $1/\sqrt{n}$ as recalled at (3) above.
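For two empirical measures with $n$ atoms each, the optimal quadratic coupling sorts both samples, so $W_2^2(F_n,G_n) = \frac{1}{n}\sum_i (X_{(i)} - Y_{(i)})^2$. The sketch below is our own illustration (the choice $\rho = 0.95$ and the sample sizes are arbitrary): it contrasts the bounded behaviour of $n\,W_2^2(F_n,G_n)$ under strong correlation with its slow growth in the independent case.

```python
import numpy as np


def w2sq_two_samples(x, y):
    """W_2^2 between two empirical c.d.f.'s with equally many atoms:
    the optimal coupling matches order statistics."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)


def mean_n_w2sq(rho, n, reps, rng):
    """Monte Carlo mean of n * W_2^2(F_n, G_n) for N(0,1) marginals, corr. rho."""
    total = 0.0
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
        total += n * w2sq_two_samples(x, y)
    return total / reps


rng = np.random.default_rng(1)
corr = mean_n_w2sq(0.95, 2000, 50, rng)
indep = mean_n_w2sq(0.0, 2000, 50, rng)
print(corr, indep)
```

The independent case behaves like $2(\log\log n + \log 2 + \gamma_0)$ by Remark 4, while the strongly correlated case stays an order of magnitude smaller, in line with Theorem 5.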

3 Proofs

3.1 Preliminaries

Note that the density quantile function $h(u) = \varphi(\Phi^{-1}(u))$ is symmetric on $(0,1)$ about $u = 1/2$. Straightforward computations yield, as $x \to +\infty$,
$$\psi(x) = -\log(1-\Phi(x)) = \frac{x^{2}}{2} + \log x + \frac{1}{2}\log(2\pi) + O\!\left(\frac{1}{x^{2}}\right),$$
$$\psi^{-1}(x) = \sqrt{2}\,\sqrt{x - \frac{1}{2}\log x - \frac{1}{2}\log(2\pi) - \frac{1}{2}\log 2 + O\!\left(\frac{\log x}{x}\right)}.$$
As a consequence, we have, as $u \to 1$,
$$\Phi^{-1}(u) = \psi^{-1}\!\left(\log\frac{1}{1-u}\right) = \sqrt{2\left(\log\frac{1}{1-u} - \frac{1}{2}\log\log\frac{1}{1-u} - \frac{1}{2}\log(4\pi) + O\!\left(\frac{\log\log(1/(1-u))}{\log(1/(1-u))}\right)\right)}, \qquad (4)$$
and
$$h(u) = \varphi(\Phi^{-1}(u)) = \sqrt{2}\,(1-u)\sqrt{\log\frac{1}{1-u}}\left(1 + O\!\left(\frac{\log\log(1/(1-u))}{\log(1/(1-u))}\right)\right). \qquad (5)$$

Let us extend the results concerning the first and second moments of the extreme order statistics of a Gaussian sample stated at page 376 in [6].

Lemma 8. Let $Z_1 \le \cdots \le Z_n$ denote the order statistics of $X_1,\dots,X_n$. Let $1 \le \theta \le 2$ and $C > 0$. For any $k \le C(\log n)^{\theta}$ it holds
$$E(Z_{n-k}) = \sqrt{2\log n} - \frac{\log\log n + 2(s^{1}_{k} - \gamma_0) + \log(4\pi)}{\sqrt{8\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right),$$
$$V(Z_{n-k}) = \frac{\pi^{2}/6 - s^{2}_{k}}{2\log n} + O\!\left(\frac{1}{(\log n)^{2}}\right),$$
where, for $k > 0$, $s^{1}_{k} = \sum_{j=1}^{k} 1/j$, $s^{2}_{k} = \sum_{j=1}^{k} 1/j^{2}$ and $\gamma_0$ is the Euler constant.

Proof of Lemma 8. Following [6], let $\xi_{n-k+1} = n(1 - \Phi(Z_{n-k+1}))$ for $k \ge 1$. Since the random variables $\xi_1/n < \cdots < \xi_n/n$ are the order statistics of $n$ independent uniform random variables, we see that $\xi_{n-k+1}$ has density
$$f_{\xi_{n-k+1}}(x) = \binom{n-1}{k-1}\left(\frac{x}{n}\right)^{k-1}\left(1-\frac{x}{n}\right)^{n-k}\mathbf{1}_{[0,n]}(x).$$


Step 1. Write $\Gamma(k) = (k-1)!$ and observe that
$$\binom{n-1}{k-1}\left(\frac{1}{n}\right)^{k-1} = \exp\left(\sum_{j=1}^{k-1}\log\left(1-\frac{j}{n}\right)\right)\frac{1}{\Gamma(k)} = \left(1 + O\!\left(\frac{(\log n)^{2\theta}}{n}\right)\right)\frac{1}{\Gamma(k)}$$
since we have
$$-\sum_{j=1}^{k}\frac{j}{n} - \sum_{j=1}^{k}\left(\frac{j}{n}\right)^{2} \le \sum_{j=1}^{k}\log\left(1-\frac{j}{n}\right) \le -\sum_{j=1}^{k}\frac{j}{n},$$
$$\max_{1\le k\le C(\log n)^{\theta}}\left|\sum_{j=1}^{k}\log\left(1-\frac{j}{n}\right) + \sum_{j=1}^{k}\frac{j}{n}\right| \le \frac{1}{n}\sum_{j=1}^{[C(\log n)^{\theta}]}\frac{j^{2}}{n} = O\!\left(\frac{(\log n)^{3\theta}}{n^{2}}\right).$$

Step 2. For $k \ge 1$ we have
$$E(Z_{n-k+1}) = E\left(\Phi^{-1}\left(1-\frac{\xi_{n-k+1}}{n}\right)\right) = \frac{(n-1)\cdots(n-k+1)}{\Gamma(k)}\int_{0}^{n}\left(\frac{x}{n}\right)^{k-1}\left(1-\frac{x}{n}\right)^{n-k}\Phi^{-1}\left(1-\frac{x}{n}\right)dx$$
$$= \left(1-\frac{1}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)\int_{0}^{n}\frac{x^{k-1}}{\Gamma(k)}\left(1-\frac{x}{n}\right)^{n-k}\Phi^{-1}\left(1-\frac{x}{n}\right)dx = \exp\left(-\frac{1}{n}\sum_{j=1}^{k-1}j - \frac{1+o(1)}{2n^{2}}\sum_{j=1}^{k-1}j^{2}\right)\left(E_{1,n} + E_{2,n}\right)$$
where, for $p > \theta+1$, $x(n) = (\log n)^{p}$ and $f_{\Gamma(k)}(x) = \frac{x^{k-1}}{\Gamma(k)}e^{-x}$ for $x > 0$,
$$E_{1,n} = (1+o(1))\int_{0}^{x(n)}\Phi^{-1}\left(1-\frac{x}{n}\right)f_{\Gamma(k)}(x)\,dx, \qquad E_{2,n} = \int_{x(n)}^{n}\Phi^{-1}\left(1-\frac{x}{n}\right)\frac{x^{k-1}}{\Gamma(k)}\left(1-\frac{x}{n}\right)^{n-k}dx.$$

Assume that $k \le C(\log n)^{\theta}$. By (4) it holds, for some $K > 0$ and all $n$ large enough,
$$|E_{2,n}| \le \int_{x(n)}^{n} x^{k-1}\left(1-\frac{x}{n}\right)^{n-k}\left|\Phi^{-1}\left(1-\frac{x}{n}\right)\right| dx$$
$$\le K\sqrt{\log n}\int_{x(n)}^{n/2}\exp\left(-\frac{(n-k)x}{n} + (k-1)\log x\right)dx + K\int_{n/2}^{n} x^{k-1}\left(1-\frac{x}{n}\right)^{n-k}\sqrt{\log\frac{1}{1-x/n}}\,dx$$
$$\le K\exp\left(-x(n) + C(\log n)^{\theta} + \frac{\log\log n}{2} + C(\log n)^{\theta+1}\right) + K\int_{n/2}^{n} x^{k-1}\left(1-\frac{x}{n}\right)^{n-k-1}dx$$
$$\le K\exp\left(-\frac{(\log n)^{p}}{2}\right) + K n^{k}\left(\frac{1}{2}\right)^{n-k-1} \le K\exp\left(-(1+o(1))\frac{(\log n)^{p}}{2}\right).$$
Now turn to

$$\int_{0}^{x(n)}\Phi^{-1}\left(1-\frac{x}{n}\right)\binom{n-1}{k-1}\left(\frac{x}{n}\right)^{k-1}\left(1-\frac{x}{n}\right)^{n-k}dx$$
where, for $0 < x < x(n)$, we have, by (4),
$$\Phi^{-1}\left(1-\frac{x}{n}\right) = \sqrt{2\left(\log\frac{n}{x} - \frac{1}{2}\log\log\frac{n}{x} - \frac{1}{2}\log(4\pi) - O\!\left(\frac{\log\log(n/x(n))}{\log(n/x(n))}\right)\right)}$$
$$= \sqrt{2\log n} - \frac{2\log x + \log\log(n/x) + \log(4\pi)}{\sqrt{8\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right) \qquad (6)$$
which is integrable near $0$ with respect to the above density since
$$0 < \log(\log n - \log x) = \log\log n + \log\left(1 - \frac{\log x}{\log n}\right) \le \log\log n + \frac{|\log x|}{\log n} \qquad (7)$$
and $\log x$, $(\log x)^{2}$ are integrable with respect to any Gamma distribution. Hence

$$E_{1,n} = \int_{0}^{x(n)}\left(\sqrt{2\log n} - \frac{2\log x + \log\log n + \log(4\pi)}{\sqrt{8\log n}} + O\!\left(\frac{(\log\log n + |\log x|)^{2}}{(\log n)^{3/2}}\right)\right)f_{\Gamma(k)}(x)\,dx$$
$$= O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right) + \int_{0}^{x(n)}\left(\sqrt{2\log n} - \frac{2\log x + \log\log n + \log(4\pi)}{\sqrt{8\log n}}\right)f_{\Gamma(k)}(x)\,dx$$
$$= O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right) + \int_{0}^{+\infty}\left(\sqrt{2\log n} - \frac{2\log x + \log\log n + \log(4\pi)}{\sqrt{8\log n}}\right)f_{\Gamma(k)}(x)\,dx$$
since we have $x(n) = (\log n)^{p}$, $p > 1$, thus, for any $s > 1$,
$$\int_{x(n)}^{+\infty} f_{\Gamma(k)}(x)\,dx = o\!\left(\frac{1}{n^{s}}\right), \qquad \int_{x(n)}^{+\infty} |\log x|\, f_{\Gamma(k)}(x)\,dx = o\!\left(\frac{1}{n^{s}}\right),$$
and moreover – see [6] – it holds
$$\int_{0}^{+\infty} \log x\, f_{\Gamma(k)}(x)\,dx = s^{1}_{k-1} - \gamma_0,$$
which yields the conclusion.

Similar computations give the claimed result for the variance. More precisely, in Step 2, when substituting $\Phi^{-1}\left(1-\frac{x}{n}\right)^{2}$ for $\Phi^{-1}\left(1-\frac{x}{n}\right)$ in $E_{1,n}$ and $E_{2,n}$, it again appears that we can only consider integrals up to $x(n)$. Then it remains to compute, by substituting the expression of $E(Z_{n-k})$ and using equation (6) for $\Phi^{-1}\left(1-\frac{x}{n}\right)$:
$$\int_{0}^{x(n)}\left(\Phi^{-1}\left(1-\frac{x}{n}\right) - E(Z_{n-k})\right)^{2}\binom{n-1}{k-1}\left(\frac{x}{n}\right)^{k-1}\left(1-\frac{x}{n}\right)^{n-k}dx$$
$$= \int_{0}^{x(n)}\left(\frac{2\left(\log x - (s^{1}_{k} - \gamma_0)\right) + \log\log(n/x) - \log\log n}{\sqrt{8\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right)\right)^{2}\binom{n-1}{k-1}\left(\frac{x}{n}\right)^{k-1}\left(1-\frac{x}{n}\right)^{n-k}dx.$$
We conclude along the same lines as above by the upper bound (7) and the fact that the variance of the logarithm of a variable with distribution $\Gamma(k)$ is $\pi^{2}/6 - s^{2}_{k-1}$.

3.2 Proof of Theorem 1

We intend to mimic the scheme of proof worked out in [2] and [1], specialized to the simpler case of the distance between the empirical and true c.d.f.'s instead of two correlated empirical ones. However all arguments have to be reconsidered since the almost sure controls by means of the law of the iterated logarithm and strong approximations cannot be turned easily into $L^1$ controls. Indeed, what happens now is that the main part of the random integral we consider is also built from the extreme parts rather than the inner part only. Moreover, only a very short extreme interval can be neglected and the remaining extreme intervals define a divergent integral to be precisely evaluated as a series. This is why the expectation rate is no longer a CLT rate. Note that the $\log\log n$ in this paper only comes from the primitive of $u(1-u)/h(u)^{2}$. Introduce the following decomposition, for $C > 0$, $\gamma > 1$ and $1 < \theta \le 2$,
$$A_n = \int_{1-1/(n(\log n)^{\gamma})}^{1}\left(Z_n - \Phi^{-1}(u)\right)^{2}du, \qquad B_n = \int_{1-1/n}^{1-1/(n(\log n)^{\gamma})}\left(Z_n - \Phi^{-1}(u)\right)^{2}du,$$
$$C_n = \int_{1-[C(\log n)^{\theta}]/n}^{1-1/n}\left(F_n^{-1}(u) - \Phi^{-1}(u)\right)^{2}du, \qquad D_n = \int_{1/2}^{1-[C(\log n)^{\theta}]/n}\left(F_n^{-1}(u) - \Phi^{-1}(u)\right)^{2}du.$$


Step 1. We have, for $\gamma > 1$,
$$\frac{n A_n}{\log\log n} \le \frac{2 Z_n^{2}}{(\log n)^{\gamma}\log\log n} + \frac{2n}{\log\log n}\int_{1-1/(n(\log n)^{\gamma})}^{1}\Phi^{-1}(u)^{2}\,du$$
where
$$\lim_{n\to+\infty}\frac{E(Z_n^{2})}{(\log n)^{\gamma}\log\log n} = 0$$
and
$$\int_{1-1/(n(\log n)^{\gamma})}^{1}\Phi^{-1}(u)^{2}\,du = \int_{1-1/(n(\log n)^{\gamma})}^{1} 2\log\frac{1}{1-u}\,(1+o(1))\,du = \left[-2(1-u)\left(1 + \log\frac{1}{1-u}\right)\right]_{1-1/(n(\log n)^{\gamma})}^{1} = O\!\left(\frac{1}{n(\log n)^{\gamma-1}}\right),$$
hence
$$\lim_{n\to+\infty}\frac{n E(A_n)}{\log\log n} = 0.$$

Step 2. Notice that for all $u \in [1-1/n, 1-1/(n(\log n)^{\gamma})]$, we have
$$\Phi^{-1}(u) = \sqrt{2\log n} + O\!\left(\frac{\log\log n}{\sqrt{\log n}}\right).$$
Next observe that
$$E(B_n) = \frac{V(Z_n)}{n}\left(1 - \frac{1}{(\log n)^{\gamma}}\right) + \int_{1-1/n}^{1-1/(n(\log n)^{\gamma})}\left(E(Z_n) - \Phi^{-1}(u)\right)^{2}du = O\!\left(\frac{1}{n\log n}\right) + O\!\left(\frac{(\log\log n)^{2}}{n\log n}\right),$$
hence
$$\lim_{n\to+\infty}\frac{n E(B_n)}{\log\log n} = 0.$$

Step 3. Start with
$$C_n = \sum_{k=1}^{[C(\log n)^{\theta}]}\int_{1-(k+1)/n}^{1-k/n}\left(Z_{n-k} - \Phi^{-1}(u)\right)^{2}du.$$
Recall that
$$s^{1}_{k} - \gamma_0 = \log k + \frac{1}{2k} + O\!\left(\frac{1}{k^{2}}\right).$$
Now, for $1 \le k \le \left[C(\log n)^{\theta}\right]$ and $u \in [1-(k+1)/n, 1-k/n]$ we have
$$\Phi^{-1}(u) = \sqrt{2\left(\log\frac{1}{1-u} - \frac{1}{2}\log\log\frac{1}{1-u} - \frac{1}{2}\log(4\pi) - O\!\left(\frac{\log\log n}{\log n}\right)\right)} = \sqrt{2\log n} - \frac{2\log k + \log\log n + \log(4\pi)}{\sqrt{8\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right)$$
thus, by Lemma 8, we have, uniformly in $k$,
$$V(Z_{n-k}) = \frac{\pi^{2}/6 - s^{2}_{k}}{2\log n} + O\!\left(\frac{1}{(\log n)^{2}}\right)$$
then
$$E\left(Z_{n-k} - \Phi^{-1}(u)\right)^{2} = V(Z_{n-k}) + \left(E(Z_{n-k}) - \Phi^{-1}(u)\right)^{2} = \frac{\pi^{2}/6 - s^{2}_{k}}{2\log n} + O\!\left(\frac{1}{(\log n)^{2}}\right) + \left(\frac{\log k - (s^{1}_{k} - \gamma_0)}{\sqrt{2\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right)\right)^{2}$$
$$= \frac{\pi^{2}/6 - s^{2}_{k}}{2\log n} + O\!\left(\frac{1}{(\log n)^{2}}\right) + \left(\frac{1+O(1/k)}{2k\sqrt{2\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right)\right)^{2}.$$
As a consequence,
$$E(C_n) = \frac{1}{n}\sum_{k=1}^{[C(\log n)^{\theta}]}\frac{\pi^{2}/6 - s^{2}_{k}}{2\log n} + O\!\left(\frac{1}{n(\log n)^{2-\theta}}\right) + \frac{1}{n}\sum_{k=1}^{[C(\log n)^{\theta}]}\left(\frac{1+O(1/k)}{2k\sqrt{2\log n}} + O\!\left(\frac{(\log\log n)^{2}}{(\log n)^{3/2}}\right)\right)^{2}$$
$$= O\!\left(\frac{(\log n)^{\theta/2}}{n\log n}\right) + \frac{1}{n}\sum_{k=[(\log n)^{\theta/2}]}^{[C(\log n)^{\theta}]}\frac{\pi^{2}/6 - s^{2}_{k}}{\log n} + O\!\left(\frac{1}{n\log n}\right) + O\!\left(\frac{(\log\log n)^{3}}{n(\log n)^{3-\theta}}\right)$$
$$\le \frac{C(\log n)^{\theta}}{n\log n}\sum_{j=[(\log n)^{\theta/2}]}^{+\infty}\frac{1}{j^{2}} + O\!\left(\frac{(\log n)^{\theta/2}}{n\log n}\right) = O\!\left(\frac{(\log n)^{\theta/2}}{n\log n}\right).$$
Thus, for any $\theta \le 2$ we have
$$\lim_{n\to+\infty}\frac{n E(C_n)}{\log\log n} = 0.$$

Step 4. Now we compute the limit of the main deterministic contribution to the main stochastic term $D_n$, namely
$$D_{1,n} = \int_{1/2}^{1-[C(\log n)^{\theta}]/n}\frac{u(1-u)}{h^{2}(u)}\,du.$$
Let $v_n$ be such that $\log v_n = (\log n)^{\varepsilon_n}$, $\lim_{n\to+\infty}\varepsilon_n = 0$, $\lim_{n\to+\infty}\varepsilon_n\log\log n = +\infty$. By using (5) it holds
$$\frac{1}{\log\log n}\int_{1-1/v_n}^{1-[C(\log n)^{\theta}]/n}\frac{u(1-u)}{h^{2}(u)}\,du = \frac{1+o(1)}{2\log\log n}\left(\log\left(\log n - \log\left(\left[C(\log n)^{\theta}\right]\right)\right) - \log\log v_n\right)$$
$$= \frac{1+o(1)}{2\log\log n}\log\left(\frac{(1+o(1))\log n}{\log v_n}\right) = \frac{1+o(1)}{2}\left(1 - \varepsilon_n\right)$$
and
$$\frac{1}{\log\log n}\int_{1/2}^{1-1/v_n}\frac{u(1-u)}{h^{2}(u)}\,du \le \frac{1+o(1)}{2\log\log n}\log\log v_n = \frac{1+o(1)}{2}\,\varepsilon_n.$$
Therefore
$$\lim_{n\to+\infty}\frac{D_{1,n}}{\log\log n} = \frac{1}{2}. \qquad (8)$$
Compared with the result of [3] recalled at Remark 2, the truncation at level $1/v_n$ instead of $1/n$ preserves the same first order.
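The limit (8) can be illustrated numerically, though convergence is very slow since the remainder decays like $1/\log\log n$; with $C = 1$ and $\theta = 2$ (arbitrary choices of ours) the ratio is still roughly $0.65$ even for astronomically large $n$. The sketch below, our own code, uses the change of variable $u = 1 - e^{-t}$ to keep the integrand smooth up to the cut:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm


def d1n_over_loglog(n, c=1.0, theta=2.0):
    """Evaluate D_{1,n}/log log n for the cut d_n = [c (log n)^theta]/n.

    After u = 1 - e^{-t} the integrand u(1-u)/h(u)^2 becomes a smooth
    function of t on [log 2, log(1/d_n)].
    """
    d_n = np.floor(c * np.log(n) ** theta) / n

    def g(t):
        q = np.exp(-t)
        z = norm.isf(q)
        return (1.0 - q) * q * q / norm.pdf(z) ** 2

    val, _ = integrate.quad(g, np.log(2.0), np.log(1.0 / d_n), limit=200)
    return val / np.log(np.log(n))


print(d1n_over_loglog(1e8), d1n_over_loglog(1e16))
```

The printed ratios stay well above $1/2$ at these values of $n$, which is consistent with (8) once the $O(1)$ additive constant in $D_{1,n}$ is divided by the very slowly growing $\log\log n$.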

Step 5. To show that $E(D_n)$ behaves as $D_{1,n} + o(1)$ we proceed as in [2] with strong approximation arguments. First, we substitute the uniform quantile process for the general quantile process with a sharp control of the expectation of the random error terms in the Taylor-Lagrange expansion. For short, write $d_n = \left[C(\log n)^{\theta}\right]/n$ and $\beta_n^X(u) = \sqrt{n}\left(F_n^{-1}(u) - \Phi^{-1}(u)\right)$ so that
$$\frac{n D_n}{\log\log n} = \frac{1}{\log\log n}\int_{1/2}^{1-d_n}\left(\beta_n^X(u)\right)^{2}du.$$
Defining $U_i = \Phi(X_i)$, which is uniform on $(0,1)$, we obviously have $U_{(i)} = \Phi(X_{(i)})$. Let $F_n^U$ denote the uniform empirical c.d.f. associated with the $U_i$ and define the underlying uniform quantile process to be
$$\beta_n^U(u) = \sqrt{n}\left((F_n^U)^{-1}(u) - u\right) = \sqrt{n}\left(\Phi(F_n^{-1}(u)) - u\right).$$
