• Aucun résultat trouvé

On fine properties of mixtures with respect to concentration of measure and Sobolev type inequalities

N/A
N/A
Protected

Academic year: 2022

Partager "On fine properties of mixtures with respect to concentration of measure and Sobolev type inequalities"

Copied!
25
0
0

Texte intégral

(1)

www.imstat.org/aihp 2010, Vol. 46, No. 1, 72–96

DOI:10.1214/08-AIHP309

© Association des Publications de l’Institut Henri Poincaré, 2010

On fine properties of mixtures with respect to concentration of measure and Sobolev type inequalities

Djalil Chafaï

a

and Florent Malrieu

b

aUMR181 INRA ENVT, 23 Chemin des Capelles F-31076 Toulouse Cedex 3, France. UMR5583 CNRS Institut de Mathématiques de Toulouse (IMT), Université de Toulouse III, 118 route de Narbonne, F-31062, Toulouse Cedex, France. E-mail:[email protected]

bUMR 6625 CNRS Institut de Recherche Mathématique de Rennes (IRMAR), Université de Rennes I, Campus de Beaulieu, F-35042 RennesCEDEX, France. E-mail:[email protected]

Received 7 May 2008; revised 10 November 2008; accepted 10 December 2008

Abstract. Mixtures are convex combinations of laws. Despite this simple definition, a mixture can be far more subtle than its mixed components. For instance, mixing Gaussian laws may produce a potential with multiple deep wells. We study in the present work fine properties of mixtures with respect to concentration of measure and Sobolev type functional inequalities. We provide sharp Laplace bounds for Lipschitz functions in the case of generic mixtures, involving a transportation cost diameter of the mixed family. Additionally, our analysis of Sobolev type inequalities for two-component mixtures reveals natural relations with some kind of band isoperimetry and support constrained interpolation via mass transportation. We show that the Poincaré constant of a two-component mixture may remain bounded as the mixture proportion goes to 0 or 1 while the logarithmic Sobolev constant may surprisingly blow up. This counter-intuitive result is not reducible to support disconnections, and appears as a reminiscence of the variance-entropy comparison on the two-point space. As far as mixtures are concerned, the logarithmic Sobolev inequality is less stable than the Poincaré inequality and the sub-Gaussian concentration for Lipschitz functions. We illustrate our results on a gallery of concrete two-component mixtures. This work leads to many open questions.

Résumé. Les mélanges dont il est question ici sont des combinaisons convexes de lois de probabilité. Malgré cette défnition simple, un mélange peut étre beaucoup plus subtil que ses composants. Un mélange de lois gaussiennes par exemple peut donner lieu à des potentiels à profonds puits multiples. Dans ce travail, nous étudions les propriétés fines des mélanges vis à vis de la concentration de la mesure et des inégalités de type Sobolev. Nous proposons des bornes sur la transformée de Laplace faisant intervenir le diamètre de la famille mélangée pour une distance de transport. Notre analyse des inégalités de type Sobolev pour les mélanges à deux composants révèle des relations naturelles avec une forme d’isopérimétrie pour les bandes, ainsi qu’avec le transport optimal sous contrainte de support. Nous établissons que la constante de Poincaré peut rester bornée lorsque la proportion du mélange tend vers 0 tandis que la constante de Sobolev logarithmique peut exploser. Ce phénomène contre intuitif n’est pas réductible à un problème de support et peut être vu comme une trace de la comparaison variance-entropie sur l’espace à deux points. Pour les mélanges, la propriété de concentration de la mesure sous-gaussienne et l’inégalité de Poincaré sont plus stables que l’inégalité de Sobolev logarithmique. Nous illustrons nos résultats avec une collection d’exemples à deux composants concrets.

Ce travail conduit à plusieurs questions ouvertes.

MSC:60E15; 49Q20; 46E35; 62E99

Keywords:Transportation cost distances; Mallows or Wasserstein distance; Mixtures of distributions; Finite Gaussian mixtures; Concentration of measure; Gaussian bounds; Tails probabilities; Deviation inequalities; Functional inequalities; Poincaré inequalities; Gross logarithmic Sobolev inequalities; Band isoperimetry; Transportation of measure; Mass transportation

(2)

1. Introduction

Mixtures of distributions are ubiquitous in stochastic analysis, modelling, simulation, and statistics, see, for instance, the monographs [16,18,39,40,50]. Recall that a mixture of distributions is nothing else but aconvex combinationof these distributions. For instance, ifμ0andμ1are two laws on the same space, and ifp∈ [0,1]andq=1−p, then the law1+0is a “two-component mixture.” More generally, afinite mixturetakes the formp1μ1+ · · · +pnμn whereμ1, . . . , μnare probability measures on a common measurable space andp1δ1+ · · · +pnδnis a finite discrete probability measure. A widely used example is given by finite mixtures of Gaussians for whichμi=N(mi, σi2)for every 1≤in. In that case, for certain choices ofm1, . . . , mnandσ1, . . . , σn, the mixture

p1N m1, σ12

+ · · · +pnN mn, σn2

is multi-modal and its log-density is a multiple wells potential. For instance, each componentμi may correspond typ- ically in statistics to a sub-population, in information theory to a channel, and in statistical physics to an equilibrium.

Another very natural example is given by the invariant measures of finite Markov chains, which are mixtures of the invariant measures uniquely associated to each recurrent class of the chain. A more subtle example is the local field of the Sherrington–Kirkpatrick model of spin glasses which gives rise to a mixture of two univariate Gaussians with equal variances, see, for instance, [13].

At this point, it is enlightening to introduce a more abstract point of view. Letνbe a probability measure on some measurable spaceΘandθ)θΘbe a collection of probability measures on some common fixed measurable spaceX, such that the mapθEμθf is measurable for any fixed bounded continuousf:X→R. The mixtureM(ν, μθΘ) is the law onX defined for any bounded measurablef:X→Rby

EM(ν,μθ∈Θ)f =

Θ

Xf (x)θ(x)dν(θ )=EνEμθf ).

Hereνis themixing lawwhereasθ)θΘ are themixed lawsor themixture componentsor even themixed family.

With these new notations, and for the finite mixture example mentioned earlier we have Θ= {1, . . . , n}andν= p1δ1+ · · · +pnδnand

M

ν, (μθ)θΘ

=M

p1δ1+ · · · +pnδn,{μ1, . . . , μn}

=p1μ1+ · · · +pnμn.

The mixtureM(ν, μθΘ)can be seen as a sort of general convex combination in the convex set of probability mea- sures on X. It appears for a certain class of ν as a particular Choquet’s integral, see [43] and [17]. On the other hand, the case where the mixture components are product measures is also related to exchangeability and De Finetti’s theorem, see, for instance, [7]. In terms of random variables, if(X, Y )is a couple of random variables then the law L(X)ofXis a mixture of the family of conditional lawsL(X|Y =y)with the mixing lawL(Y ). By this way, mixing appears as the dual of the so-called disintegration of measure. Here and in the whole sequel, the term “mixing” refers to the mixture of distributions as defined above and has a priori nothing to do with weak dependence.

Our first aim is to investigate the fine behavior of concentration of measure for mixtures, for instance, for a two- component mixture1+0as min(p, q)goes to 0. It is well known that Poincaré and (Gross) logarithmic Sobolev functional inequalities are powerful tools in order to obtain concentration of measure. Also, our second aim is to in- vestigate the fine behavior of these functional inequalities for mixtures, and in particular for two-component mixtures.

Our work reveals striking unexpected phenomena. In particular, our results suggest that the logarithmic Sobolev in- equality, which implies sub-Gaussian concentration, is very sensitive to mixing, in contrast with the sub-Gaussian concentration itself which is far more stable. As in [20] and [3], our work is connected to the more general problem of the behavior of optimal constants for sequences of probability measures.

Let us start with the notion ofconcentration of measurefor Lipschitz functions. We denote by · 2the Euclidean norm ofRd. A functionf:Rd→Ris Lipschitz when

fLip=sup

x=y

|f (x)f (y)| xy2

<.

(3)

Letμ be a law onRd such that Eμ|f|<∞for every Lipschitz function f. This holds true for instance whenμ has a finite first moment. We always make implicitly this assumption in the sequel. We define now the log-Laplace transformαμ:R→ [0,∞]ofμby

αμ(λ)=log sup

fLip1

Eμ

eλ(fEμf )

. (1)

The Cramér–Chernov–Chebyshev inequality gives, for everyr >0, βμ(r)= sup

fLip1

μ

|fEμf| ≥r

≤2 exp −sup

λ>0

αμ(λ)

(2) and the supremum in the right-hand side is a Fenchel–Legendre transform ofαμ. Note thatβμ is a uniform upper bound on the tails probabilities of Lipschitz images ofμ. We are interested in the control ofβμviaαμin the case whereμ=M(ν, (μθ)θΘ), in terms of the mixing law ν and of the log-Laplace boundsμθ)θΘ for the mixed family.

We say thatμsatisfies asub-Gaussian concentration of measurefor Lipschitz functions when there exists a con- stantC(0,)such that for every real numberλ,

αμ(λ)≤1

42. (3)

The log-Laplace–Lipschitz quadratic bound (3) implies via (2) that for everyr >0, βμ(r)≤2 exp

r2 C

. (4)

Actually, it was shown (see [15] and [9]) that up to constants, (3) and (4) are equivalent, and are also equivalent to the existence of a constantς(0,)andx0∈Rdsuch that

Rdeς|xx0|2μ(dx) <. (5)

Linear or quadratic upper bounds forαμmay be deduced from functional inequalities such as Poincaré and (Gross) logarithmic Sobolev inequalities [23,24]. We say thatμsatisfies a Poincaré inequality of constantC(0,)when for every smoothh:Rd→R,

Varμ(h)CE

|∇h|2

, (6)

whereVarμ(h)=Eμ(h2)(Eμh)2is thevarianceofhfor μ. The smallest possible constantCis called theopti- mal Poincaré constant ofμand is denotedCPI(μ) with the convention inf∅= ∞. Similarly,μsatisfies a (Gross) logarithmic Sobolev inequality of constantC(0,)when

Entμ

h2

CE

|∇h|2

(7) for every smooth functionf:Rd→R, whereEntμ(h2)=Eμ(h2logh2)Eμ(h2)logEμ(h2)is theentropyorfree energyofh2for μ, with the convention 0 log(0)=0. As for the Poincaré inequality, the smallest possibleC is the optimal logarithmic Sobolev constant ofμand is denotedCGI(μ)with inf∅= ∞. Standard linearization arguments give that

ρ(Kμ)CPI(μ)≤1

2CGI(μ), (8)

whereρ(Kμ) stands for the spectral radius of the covariance matrixKμ of μ defined by(Kμ)i,j =Eμ(xixj)Eμ(xi)Eμ(xj)wherexiandxj are the coordinate functions. More precisely, the first inequality in (8) follows from (6)

(4)

by takingh= ·, uwhereuruns over the unit sphere while the second inequality in (8) follows by considering the directional derivative of both sides of (7) at the constant function 1.

A basic example is given by Gaussian laws for which equalities are achieved in (8). A wide class of laws satisfy Poincaré and logarithmic Sobolev inequalities. Beyond Gaussian laws, a criterion due to Bakry and Émery [1,2] (see also [12,42,46] and [10,11]) states that ifμhas Lebesque density eV onRdsuch thatxV (x)1|x|2is convex for some fixed realκ >0 thenCPI(μ)κandCGI(μ)≤2κwith equality in both cases whenμis Gaussian. This log- concave criterion appears as a comparison with Gaussians. Note that in general,CGI(μ) <∞impliesCPI(μ) <∞ but the converse is false. For instance, the law with density proportional to exp(−|x|a)on Rsatisfies a Poincaré inequality iffa≥1 and a logarithmic Sobolev inequality iffa≥2, see, for example, [1], Chapter 6. Note also that ifμhas disconnected support, then necessarilyCPI(μ)=CGI(μ)= ∞. To see it, consider a non-constanthwhich is constant on each connected component of the support ofμ. This is for instance the case for the two-component mixtureμ=1+0=M(pδ1+0,{μ0, μ1})withp(0,1)andq=1−p whereμ0andμ1have disjoint supports.

The logarithmic Sobolev inequality (7) implies a sub-Gaussian concentration of measure for Lipschitz images ofμ. Namely, using (7) withh=exp(12λf )for a real numberλand a smooth Lipschitz functionf:Rd→Rgives via Rademacher’s theorem and a standard argument attributed to Herbst [34], Chapter 5, that for any realsλandr >0

αμ(λ)≤1

4CGI(μ)λ2 and βμ(r)≤2 exp

r2 CGI(μ)

. (9)

The same method yields from (6) a sub-exponential upper bound forβμof the formc1exp(−c2r)for some constants c1, c2>0, see, for instance, [22] and [33], Section 2.5.

Both Poincaré and logarithmic Sobolev inequalities are invariant by the action of the translation group and the orthogonal group. More generally, let us denote byf·μthe image measure ofμby the mapf. Both (6) and (7) are stable by Lipschitz maps in the sense thatCPI(f·μ)f2LipCP I(μ)andCGI(f·μ)f2LipCGI(μ). On the real line,CPIandCGIcan be controlled via “simple” variational bounds such as (18). Both (6) and (7) are also stable by bounded perturbations on the log-density ofμ, see [25,27] and [3] for further details. In view of sub-exponential or sub-Gaussian concentration bounds, the main advantage of (6) and (7) over a direct approach based onαμorβμlies in the stability by tensor products of (6) and (7), see, for example, [1], Chapters 1 and 3, [9] and [21].

The case of mixtures

The integral criterion (5) shows that if the components of a mixture uniformly satisfy a sub-Gaussian concentration of measure for Lipschitz functions, and if the mixing law has compact support, then the mixture also satisfies a sub- Gaussian concentration of measure for Lipschitz functions. Such bounds appear, for instance, in [6]. However, this observation does not give any fine quantitative estimate on the dependency over the weights for a finite mixture.

Regarding Poincaré and logarithmic Sobolev inequalities, it is clear that a finite mixture of Gaussians will satisfy such inequalities since its log-density is a bounded perturbation of a uniformly concave function. Here again, this does not give any fine control on the constants.

An upper bound for the Poincaré constant of univariate finite Gaussian mixture was provided by Johnson [29], Theorem 1.1 and Section 2. Unfortunately, this upper bound blows up when the minimum weight of the mixing law goes to 0. A more general upper bound for finite mixtures of overlapping densities was obtained by Madras and Randall [35], Theorem 1.2 and Section 5. Here again, the bound blows up when the minimum weight of the mixing law goes to 0. Some aspects of Poisson mixtures are considered by Kontoyannis and Madiman [30,31] in connection with compound Poisson processes and discrete modified logarithmic Sobolev inequalities.

Outline of the article

Recall that the aim of the present work is to study the fine properties of mixture of law with respect to concentration of measure and Sobolev type functional inequalities. The analysis of various elementary examples shows actually that such a general objective is very ambitious. Also, we decided to focus in the present work on more tractable situations.

Section2provides Laplace bounds for Lipschitz functions in the case of generic mixtures. These upper bounds onαμ

(5)

(and thusβμ) for a mixtureμinvolve theW1-diameter (see Section2for a precise definition) of the mixed family.

Section3is devoted to upper bounds onαμfor two-component mixturesμ=μp=1+0. Our result is mainly based on a Laplace–Lipschitz counterpart of the optimal logarithmic Sobolev inequality for asymmetric Bernoulli measures. In particular, we show that ifμ0andμ1satisfy a sub-Gaussian concentration for Lipschitz functions, then it is also the case for the mixtureμp, with a quite satisfactory and intuitive behavior as min(p, q)goes to 0. In Section4, we study Poincaré and logarithmic Sobolev inequalities for two-component mixtures. A decomposition of variance and entropy allows us to reduce the problem to the Poincaré and logarithmic Sobolev inequalities for each component, to discrete inequalities for the Bernoulli mixing law1+0, and to the control of a mean-difference term. This last term can be controlled in turn by using some support-constrained transportation, leading to very interesting open questions in dimension>1. The Poincaré constant of the two-component mixture can remain bounded as min(p, q) goes to 0, while the logarithmic Sobolev constant may surprisingly blow up at speed−log(min(p, q)). This counter- intuitive result shows that as far as the mixture of laws is concerned, the logarithmic Sobolev inequality does not behave like the sub-Gaussian concentration for Lipschitz functions. We also illustrate our results on a gallery of concrete two-component mixtures. In particular, we show that the blow up of the logarithmic Sobolev constant as min(p, q)goes to 0 is not necessarily related to support problems.

Open problems

The study of Poincaré and logarithmic Sobolev inequalities for multivariate or non-finite mixtures is an interesting open problem, for which we give some clues at the end of Section4 in terms of support-constrained transportation interpolation. There is maybe a link with the decomposition approach used in [28] for Markov chains. One can also explore the tensor products of mixtures, which are again mixtures. Another interesting problem is the development of a direct approach for transportation cost and measure-capacities inequalities (see [5]) for mixtures, even in the finite univariate case.

2. General Laplace bounds for Lipschitz functions

Intuitively, the concentration of measure of a finite mixture may be controlled by the worst concentration of the components and some sort of diameter of the mixed family. We shall confirm, extend, and illustrate this intuition for a not necessarily finite mixture. The notion of diameter that we shall use is related to coupling and transportation cost.

Recall that for everyk≥1, the Wasserstein (or transportation cost) distance of orderkbetween two lawsμ1andμ2 onRd is defined by (see [51,52] and [44,47])

Wk1, μ2)=inf

π

Rd×Rd|xy|kdπ(x, y) k1

, (10)

whereπ runs over the set of laws onRd×Rd with marginalsμ1andμ2. TheWk-convergence is equivalent to the weak convergence together with the convergence of moments up to orderk. In dimensiond=1, we have, by denoting F1 andF2 the cumulative distribution functions of μ1 andμ2, with generalized inversesF11andF21, for every k≥1,

Wk1, μ2)k= 1

0

F11(x)F21(x) kdx and W11, μ2)=

R

F1(x)F2(x) dx, (11)

where the last expression ofW1follows from the Kantorovich–Rubinstein dual formulation W11, μ2)= sup

fLip1

Rdf1

Rdf2

. (12)

Note that ifμ1does not give mass to points thenμ2=(F21F1)·μ1. The transportation cost distances lead to the so-calledtransportation cost inequalities, popularized by Marton [36,37], Talagrand [49], and Bobkov and Götze [8].

See, for instance, the books [34,51,52] for a review. The link with concentration of measure was recently deeply explored by Gozlan, see [21]. We will not use this interesting line of research in the present paper.

(6)

Theorem 2.1 (General Laplace–Lipschitz bound via diameter). Letμ=M(ν, (μθ)θΘ)be a general mixture.If this mixture satisfies the uniform bounds

α=sup

θΘ

αθ<and W= sup

θ,θΘ

W1θ, μθ) <

then for everyλ >0we have αμ(λ)α(λ)+1

8min

8W λ, W2λ2 .

Proof of Theorem2.1. The key point is that iffLip≤1 then for everyλ >0, Eμ(eλf)

eλEμf =eλEμf

Θ

Eμθ

eλf ν(dθ )

Θ

eαθ(λ)+λ(EμθfEμf )ν(dθ ). (13) As a consequence, we get

αμ(λ)α(λ)+ sup

fLip1

log

Θ

eλ(EμθfEμf )ν(dθ ). (14)

Thanks to the relation (12), we obtain EμθfEμf =

Θ

(EμθfEμθf )ν

Θ

W1θ, μθ

W .

This shows that the second term in the right-hand side of (14) is bounded byW λ. Alternatively, one can use the Hoeffding bound [26] which says that ifXis a centered bounded random variable with oscillationc=supX−infX then

E eλX

≤eλ2c2/8.

The desired bound in terms of W2λ2 follows by taking X=EμYfEμf where Yν and noticing that c

supθ,θ(EμθfEμθf )=W.

Example 2.2 (Finite mixtures). For a finite mixture μ=p1μ1+ · · · +pnμn =M(ν, (μi)1in) where ν = p1δ1+ · · · +pnδn,the mixing measure ν is supported by a finite set.In that case,Theorem2.1gives an immedi- ate Laplace bound,involving the worst bound for the mixture components i)1in (this cannot be improved in general).However,in Section3,we provide sharper bounds by improving the dependency overν in the case where n=2.

Example 2.3 (Bounded mixtures of multivariate Gaussians). Here μθ =N(m(θ ), Γ (θ ))wherem:Θ→Rd and Γ :RdSd+are two measurable bounded functions andSd+is the cone of symmetric non-negative(d×d)-matrices.

Note thatΓ (θ )is allowed to be singular,that is,not of full rank.The spectrum ofΓ (θ )is real and non-negative.If λ1(θ )≥ · · · ≥λd(θ )are the eigenvalues ofΓ (θ ),we defineρ=supθΘλ1(θ )=supθΘΓ (θ )22.Now fix some mixing lawνonΘand consider the mixtureμ=M(ν, (μθ)θΘ).Then for everyλ >0,

αμ(λ)ρ 2λ2+1

8min

8W λ, W2λ2 .

One can deduce an upper bound forW from the following lemma.

(7)

Lemma 2.4 (W1-distance of two multivariate Gaussian laws). Letμ0=N(m(0), Γ (0))andμ1=N(m(1), Γ (1)) be two Gaussian laws onRd.Forθ∈ {0,1},we denote by

λ1(θ )≥ · · · ≥λd(θ )

the ordered spectrum ofΓ (θ )and by(vi(θ ))1idan associated orthonormal basis of eigenvectors.Assume,without loss of generality,thatvi(0)·vi(1)≥0for every1≤id where “·” stands for the Euclidean scalar product ofRd. ThenW10, μ1)is bounded above by

m(1)−m(0) + d

i=1

λi(1)λi(0)2

+2

λi(1)λi(0)

1−vi(1)·vi(0) .

The reader may find in [48], Theorem 3.2, a formula in the same spirit forW20, μ1).

Proof of Lemma2.4. The triangle inequality for theW1distance gives W10, μ1)W1

μ0,N

m(1), Γ (0) +W1

N

m(1), Γ (0) , μ1

≤ m(1)−m(0) +W1 N

0, Γ (0) ,N

0, Γ (1) .

Now, if(Yi)1id are i.i.d. real random variables of lawN(0,1)then the law of Xθ=

d i=1

Yi

λi(θ )vi(θ )

isN(0, Γ (θ ))forθ∈ {0,1}. Moreover, from (10) and Jensen’s inequality, we get W1

N

0, Γ (0) ,N

0, Γ (1)2

E|X1X0|2

≤E

|X1X0|2 .

At this step, we note that

|X1X0|2= d i=1

Yi2 λi(1)vi(1)

λi(0)vi(0) 2 +2

i<j

YiYj

λi(1)vi(1)

λi(0)vi(0)

·

λi(1)vi(1)

λi(0)vi(0) .

Since(Yi)are i.i.d.N(0,1)and(vi(θ ))1idis orthonormal forθ∈ {0,1}, one has E

|X1X0|2

= d i=1

λi(1)vi(1)

λi(0)vi(0) 2

= d i=1

λi(1)λi(0)2

+2

λi(1)λi(0)

1−vi(1)·vi(0)

.

Of course the assumptions of Theorem2.1may be relaxed. Instead of trying to deal with generic abstract results, let us provide some highlighting examples.

Example 2.5 (Gaussian mixture of translated Gaussians). HereΘ=Randμθ=N(θ, σ2)for some fixedσ >0, and the mixing law is also Gaussianν=N(0, τ2)for some fixedτ >0.In this case,α(λ)=12σ2λ2butW is infinite since

W1θ, μθ)= θθ .

(8)

In particular,Theorem2.1is useless.Nevertheless,the function θg(θ )=EμθfEμf

is Lipschitz since g(θ )−g

θE f (X+θ )f

X+θ ≤ θ−θ , whereXN(0,1).As a consequence,we get

fsupLip1

log

Θ

eλ(EμθfEμf )ν(dθ )τ2λ2 2 and for anyλ >0

αμ(λ)σ2+τ2 2 λ2.

The same argument may be used more generally for “position” mixtures.For instance,ifηis some fixed probability measure onRdandμθ=ηδθ forθ∈Rdthenλ >0,

αμ(λ)αη(λ)+αμ(λ).

In this particular case,μ=νηand the bound above follows also by tensorization!

Example 2.6 (Mixture of scaled Gaussians: from exponential to Gaussian tails). Here we takeΘ= [0,∞)and μθ=N(0, θ2)with a mixing measureνof density

θγ

Γ (γ1)exp

θγ

1[0,)(θ ),

whereγ≥2is some fixed real number.Note thatνhas a non-compact support and thatμdoes not satisfy the integral criterion(5).This means thatμcannot have sub-Gaussian tails.Note also that bothα(λ)andWare infinite since

αθ(λ)=θ2λ2

2 and W1θ, μθ)= 2

π θ−θ ,

where we used(11)forW1.Starting from(13),one has by the Cauchy–Schwarz inequality Eμ(eλf)

eλEμf 2

Θ

eθ2λ2ν(dθ )

Θ

e2λ(EμθfEμf )ν(dθ ). (15)

Note thatνsatisfies condition(5)andαν(λ)2for some real constantC >0.Here and in the sequel,the constant C may vary from line to line and may be chosen independent ofγ.On the other hand,the centered functiong(θ )= EμθfEμf is1-Lipschitz since

g(θ )g

θ = Ef (θ X)Ef

θX θθ E

|X| ,

whereXN(0,1).Also,for the second term in the right-hand side of(15)we have

Θ

e2λ(EμθfEμf )ν(dθ )≤eαν(2λ)≤e4Cλ2.

Ifγ=2thenαμ(λ)≤2Cλ214log(1−λ2)≤2C−14log(1−λ)ifλ <1,which gives,after some computations,the deviation bound,for some other constantsC>0andC>0,

μ(FEμfr)CeCr.

(9)

Assume in contrast thatγ >2.Sinceθ2λ2γ1θγ+C0λ2γ /(γ2)for some constantC0>0which may depend onγ but not onλandθ,we get,for some constantsC1>0andC2>0,

0

exp θ2λ2

ν(dθ )C1exp

C2λ2γ /(γ2) .

This givesαμ(λ)C3λ2γ /(γ2)+C4for some constantsC3>0andC4>0,which yields a deviation bound of the form(for some constantsC5>0andC6>0)

μ(fEμfr)C5exp

C6r24/(γ+2) .

Note thatνgoes to the uniform law on[0,1]asγ→ ∞and the Gaussian tail reappears.

3. Concentration bounds for two-component mixtures

In this section, we investigate the special case where the mixing measureνis the Bernoulli measureB(p)=1+0 whereq=1−p. We are interested in the study of the sharp dependence of the concentration bounds onp, especially whenpis close to 0 or 1.

Theorem 3.1 (Two-component mixture). Letμ0andμ1be two probability measures onX andμ=1+0

withp∈ [0,1]andq=1−p.Definexp=max(p, q)/(2cp)where cp= qp

4(log(q)−log(p))

with the continuity conventionsc1/2=1/8andc0=c1=0.Then for anyλ >0, αμ(λ)≤max(αμ0, αμ1)(λ)+

cpλ2W10, μ1)2 if λW10, μ1)xp, max(p, q)

λW10, μ1)12xp

otherwise.

Note that if min(p, q)→0, thencp∼ −(4 log(p))1→0 and xp→ ∞, and we thus recover an upper bound of the formαμ≤max(αμ1, αμ2)as min(p, q)→0, which is satisfactory. The two different upper bounds given by Theorem3.1provide two different upper bounds for the concentration of measure of the mixtureμ, illustrated by the following corollary (the proof of the corollary is immediate and is left to the reader).

Corollary 3.2 (Two-component mixtures with sub-Gaussian tails). Let μ0 andμ1 be two probability measures onX andμ=1+0for somep∈ [0,1]withq=1−p.If there exists a real constantC >0such that for any λ >0

max(αμ0, αμ1)(λ)≤1 22

then for everyr≥0,withW=W10, μ1),

βμ(r)≤2

⎧⎨

⎩ exp

r2

2C+4cpW2

if r≤max(p, q) C

2cpW +W , exp

2C1

r−max(p, q)W2

max(p,q)4cp 2

otherwise.

Proof of Theorem3.1. We haveμ=0+1=M(ν,{μ0, μ1})whereν:=0+1. For this finite mixture, we get, as in the general case, for anyf∈Lip(X,R)andλ >0,

log

Eμ(eλf) eλEμf

≤max(αμ0, αμ1)(λ)+log

Eν(eλg) eλEνg

,

(10)

whereg(i):=Eμif. At this step, we use the particular nature ofν, which gives Eν(eλg)

eλEνg =coshp

λ

g(1)g(0) ,

where coshp(x):=peqx+qepx. Sinceg(1)g(0)=Eμ1fEμ0f, we get by (12)

W10, μ1)g(1)g(0)W10, μ1).

Since coshp(x)=coshq(x)for anyx∈R, we get for anyλ >0,

fsupLip1

Eν(eλg) eλEνg

=max(coshp,coshq)

λW10, μ1) .

Putting all together, we obtain, for anyλ >0,

αμ(λ)≤max(αμ0, αμ1)(λ)+log max(coshp,coshq)

λW10, μ1) .

Since(coshq−coshp)(x)=2pq(cosh(px)−cosh(qx)), one has, for everyx≥0, max(coshp,coshq)(x)=coshmin(p,q)(x).

Let us assume thatpq. Lemma3.3ensures that, for everyx≥0, log max(coshp,coshq)(x)=log coshp(x)cpx2.

On the other hand,

log coshp(x)=qx+log

p+qex

qx.

Now, forx=xp, the slope ofxcpx2is equal toq and the tangent isy=q(xxp/2). On the other hand, the convexity ofx→log coshp(x)yields log coshp(x)q(xxp)forxxp(drawing a picture may help the reader).

The desired conclusion follows immediately.

The proof of Theorem3.1relies on Lemma3.3below, which provides a Gaussian bound for the Laplace transform of a Lipschitz function with respect to a Bernoulli law. This lemma is an optimal version of the Hoeffding bound [26]

in the case of a Bernoulli law.

Lemma 3.3 (Two-point lemma). For any0≤p≤1/2,we have sup

x>0

x2log

peqx+qepx

=cp= qp

4(log(q)−log(p)) (16)

with the natural conventionsc0=0andc1/2=1/8as in Theorem3.1.Moreover,the supremum inx is achieved for x=2(log(q)−log(p)).

The constantcpis also equal, as it will appear in the proof, to supλ>0αB(p)(λ)/λ2. The classical Hoeffding bound for this supremum isc1/2=1/8 which is the maximum ofcpoverp. Additionally, the quantity 1/(4cp)is the optimal constant of the logarithmic Sobolev inequality for the asymmetric Bernoulli measure0+1(see Lemma4.1).

Proof of Lemma3.3. Let us definexp=log(q/p)andβ(x)=x2ψ (x)where ψ (x)=log

peqx+qepx .

(11)

The functionψis “strongly convex” at the origin (ψ (0)=ψ(0)=0 andψ(0)=pqandψ(0) >0) and linear at infinity (ψ()=q). Therefore, the supremum ofβ is achieved for somex >0. The derivative ofβhas the sign of γ (x):=(x)−2ψ (x). Furthermore,

γ(x)=(x)ψ(x) and γ(x)=(x).

As a consequence,γ has the sign ofψ which is positive on(0,xp)and negative on(xp,+∞). Sinceγ(0)=0 andγ achieves its maximum forx=xp andγ goes to −q at infinity and there exists an uniqueyp>0 (in fact yp>xp) such thatγ(yp)=0. As a conclusion, sinceγ (0)=0 andγ is increasing on(0, yp)andγ (x)goes to−∞

asx goes to infinity,γ (x)is equal to zero exactly two times: forx=0 andx=zp> yp>xp. In fact,zpis equal to 2xp. Indeed, we have

ψ(x)=pq eqx−epx peqx+qepx. Now, we compute

ψ(2xp)=pq (q/p)2q(p/q)2p

p(q/p)2q+q(p/q)2p = · · · =q2p2=qp and

2ψ (2xp)=2 log

p(q/p)2q+q(p/q)2p

=2 log

(q+p)(q/p)qp

=2xpψ(2xp).

Thus, 2xp is the unique positive solution of 2ψ (x)=(x). As a conclusion, we getcp=ψ (2xp)/(4xp2), which

gives the desired formula after some algebra.

Remark 3.4 (Advantage of direct Laplace bounds). Consider a mixtureμ=1+0of two Gaussian laws μ0

andμ1onRwith same varianceσ2and different means.Corollary3.2ensures that for everyr≥0, βμ(r)≤2 exp

r2

2+4cpW10, μ1)2

.

This bound remains relevant asσ→0since we recover the bound for the Bernoulli mixing lawν=1+0.On the other hand,any concentration bound deduced from a logarithmic Sobolev inequality would blow up asσ goes to zero,as we shall see in Section4.

Remark 3.5 (Inhomogeneous tails). It is satisfactory to recover,whenpgoes to0 (resp. 1),the concentration bound ofμ0(resp.μ1)and not only the maximum of the bounds of the two components.It is possible to exhibit two regimes, corresponding to small and big values ofλ.Assume thatμi=N(0, θi2)fori∈ {0,1}withθ1> θ0>0.Theorem2.1 gives

αμ(λ)θ12λ2

2 +1θ0)λ.

On the other hand,one has logEμ(eλf)

eEμ(λf )

αμθ(λ)ν(dθ )+log

eHλ(θ )+λg(θ )ν(dθ ), where

Hλ(θ )=αμθ(λ)

αμθ(λ)ν

and g(θ )=EμθfEμf.

(12)

Then,Lemma3.3ensures that for everyε >0, log

eHλ(θ )+λg(θ )ν(dθ )cp

Hλ(1)+λg(1)Hλ(0)λg(0)2

cp

1

ε Hλ(1)Hλ(0) 2+ε λg(1)λg(0) 2 .

Choosingε=λleads to

log

eHλ(θ )+λg(θ )ν(dθ )cp

12θ02)2

4 +1θ0)2

λ3.

As a conclusionαμcan be controlled in(at least)these two ways:

αμ(λ)

⎧⎨

θ12λ2

2 +1θ0)λ,

12+02λ2

2 +cp

12θ02)2

4 +1θ0)2 λ3.

The second one provides sharp bounds forλf (1/cp)whereas the first one is useful forλf (1/cp)(wheref is an increasing function which is computable).

4. Gross–Poincaré inequalities for two-component mixtures

It is known that functional inequalities such as Poincaré and (Gross) logarithmic Sobolev inequalities provide, via Laplace exponential bounds, dimension free concentration bounds, see, for instance, [34]. It is quite natural to ask for such functional inequalities for mixtures. Before attacking the problem, some facts have to be emphasized.

As already mentioned in the introduction, a lawμwith disconnected support cannot satisfy a Poincaré or a log- arithmic Sobolev inequality. In particular, a mixture of laws with disjoint supports cannot satisfy such functional inequalities. This observation suggests that in order to obtain a functional inequality for a mixture, one has probably to control the considered functional inequality for each component of the mixture and to ensure that the support of the mixture is connected. It is important to realize that such a connectivity problem is due to the peculiarities of the functional inequalities, but does not pose a real problem for the concentration of measure properties, as suggested by Theorem3.1and Remark3.4, for instance. In the sequel, we will focus on the case of two-component mixtures, and try to get sharp bounds on the Poincaré and logarithmic Sobolev constants. The two-component case is fundamental.

The extension of the results to more general finite mixtures is possible by following roughly the same scheme, see Remark4.2below.

For the logarithmic Sobolev inequality of two-component mixtures, we will make use of the following optimal two-point lemma, obtained years ago independently by Diaconis and Saloff-Coste and Higushi and Yoshida. An elementary proof due to Bobkov is given by Saloff-Coste in his Saint-Flour Lecture Notes [45].

Lemma 4.1 (Optimal logarithmic Sobolev inequality for Bernoulli measures). For everyp(0,1)and every f:{0,1} →R,and with the convention(log(q)−log(p))/(q−p)=2ifp=q=1/2,we have

Ent1+0

f2

≤log(q)−log(p) qp pq

f (0)f (1)2

.

Moreover,the function ofpin front of the right-hand side cannot be improved.

Note that the constant in front of the right-hand side of the inequality provided by Lemma 4.1is nothing else butpq/(4cp)wherecp is as in Theorem3.1and Lemma3.3. At this stage, it is important to understand the deep difference between the Poincaré and the logarithmic Sobolev inequalities at the level of the two-point space. On the

Références

Documents relatifs

This paper is devoted to the study of the deviation of the (random) average L 2 −error associated to the least–squares regressor over a family of functions F n (with

In the appendices we give a new proof of Gaussian concentration from the exis- tence of an exponential moment of the square distance function, and provide a general approximation

Given estimates of the projection coefficients, we then construct bona fide series estimators of the component densities and a least-squares estimator of the mixing proportions..

Estimating extreme quantiles of Weibull tail- distributions. Comparison of Weibull

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

We also study the problem without restriction on the optimal plan, and provide lower and upper bounds for the value of the Gromov-Wasserstein distance between Gaussian

Once transfected colonies were selected, cells were stimulated with INF-  (15 ng/ml) for 24 h in the presence or absence of the HO-1 inducers tert-butyl hydroquinone (tBH), carnosol

Then, after having underlined the importance of using downbeat position information in the develop- ment of chord sequence prediction models, we introduced two different approach