• Aucun résultat trouvé

An exponential inequality for suprema of empirical processes with heavy tails on the left

N/A
N/A
Protected

Academic year: 2021

Partager "An exponential inequality for suprema of empirical processes with heavy tails on the left"

Copied!
9
0
0

Texte intégral

(1)

HAL Id: hal-02092094

https://hal.archives-ouvertes.fr/hal-02092094v3

Submitted on 5 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

An exponential inequality for suprema of empirical processes with heavy tails on the left

Antoine Marchina

To cite this version:

Antoine Marchina. An exponential inequality for suprema of empirical processes with heavy tails on the left. Comptes Rendus. Mathématique, Académie des sciences (Paris), 2019,

�10.1016/j.crma.2019.06.008�. �hal-02092094v3�

(2)

Contents lists available atScienceDirect

C. R. Acad. Sci. Paris, Ser. I

www.sciencedirect.com

Probability theory/Statistics

An exponential inequality for suprema of empirical processes with heavy tails on the left

Une inégalité exponentielle pour les suprema de processus empiriques avec queues lourdes sur la gauche

Antoine Marchinaa,b

aUniversitéParis-Est,LAMA(UMR8050),UPEM,UPEC,CNRS,77454Marne-la-Vallée,France

bLaboratoiredemathématiquesdeVersailles,UVSQ,CNRS,UniversitéParis-Saclay,78035Versailles,France

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Received11October2018 Acceptedafterrevision17June2019 Availableonline3July2019 PresentedbyMichelTalagrand

InthisNote,weprovideexponentialinequalitiesforsupremaofempiricalprocesseswith heavytailsontheleft.Ourapproachisbasedonamartingaledecomposition,associated with comparison inequalities overa cone ofconvex functions originally introduced by Pinelis.Furthermore,theconstantsinourinequalitiesareexplicit.

©2019Académiedessciences.PublishedbyElsevierMassonSAS.Thisisanopenaccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

r é s um é

Dans cette Note, nous donnons des inégalités exponentielles pour les suprema de processus empiriques avec queues lourdessur lagauche. Notreapproche est basée sur unedécompositionenmartingale, associéeàdesinégalitésdecomparaisonsuruncône de fonctions convexes, initialement introduit par Pinelis. Les constantes données sont explicites.

©2019Académiedessciences.PublishedbyElsevierMassonSAS.Thisisanopenaccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Let X,X1,. . . ,Xn beasequenceofindependentrandomvariablesvaluedinsomemeasurablespace(X,F),andidenti- callydistributedaccordingtoalawP.LetF beacountableclassofmeasurablefunctionsfromX into]−∞,1]suchthat P(f)=0 forall f∈F.Let1<p<2.Wesupposethat,forall f∈F, f(X)satisfiesthefollowingbehaviorontheleft:

P(f(X)≤ −t)c t

p

for anyt>0, (1.1)

forsomec>0.Let Znbethereal-valuedrandomvariabledefinedby

E-mailaddress:antoine.marchina@u-pem.fr.

https://doi.org/10.1016/j.crma.2019.06.008

1631-073X/©2019Académiedessciences.PublishedbyElsevierMassonSAS.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

(3)

538 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544

Zn:=supn

k=1

f(Xk): f∈F

. (1.2)

TheaimofthisNoteistogiveanexponentialboundfortheprobabilityofdeviationofZn aboveitsmean.Throughoutthe paper,wedenoteby ηaPareto(ontheleft)randomvariablewithparameters(c,p),thatis,itsdistributionfunction Fη is definedby

Fη(t)=c t

p

1 and Fη(t)=Fη(0)=1,for anyt>0. (1.3)

The control oftherandom fluctuationsofan empiricalprocess hasa centralrole inmathematical statisticsandmachine learning. Forgeneralpresentationsoftheseconnections betweenempiricalprocesstheory andstatistics,see,forinstance, thebooksofVanderVaartandWellner[24],Massart[15],orKoltchinskii[9].Forexample,inthecontextofnonparametric estimation,thecalibrationofadaptivemethodsisstronglyrelatedtothecontrolofanempiricalprocess:see,forinstance, [10] fortheEmpiricalRiskMinimizationmethod,[2] fortheCross-Validationmethodand[7] fortheGoldenshluger–Lepski method.

SinceTalagrand’s[23] andLedoux’s[12] pioneeringwork,concentrationinequalitiesforsupremaofempiricalprocesses have been the subject ofintense research. Mainly, the aim is to reach optimal counterparts for Zn of classical Hoeffd- ing’s, Bernstein’s, andBennett’s exponential inequalities for sums ofi.i.d. random variables, which correspond to classes F reduced to one element. The reader is referred to Chapter 12 of the book by Boucheron, Lugosi, and Massart [5]

for an overviewof this subject. Recently, some effortshave been made to consider heaviertails by only assuming that supf∈F|f(X)| isLr-integrableforsome r>2: see,Boucheron, Bousquet,Lugosi, andMassart[4], Adamczak[1], vande Geer and Lederer [11], and Marchina [13]. However, the common point of all these results is that supf∈FVar(f(X)) is finite,whichwedonotwanttoassumeinthiswork.Instead,weassume(1.1) whichstatesthefirst-orderstochasticdomi- nanceofηoverf(X),forall f∈F.Inotherwords,Fηistheextremal(inthesenseoffirst-orderstochasticdominance) distribution that f(X) could have.The Paretodistributionis theprototype of“power-tailed”distributions, whichare im- portantparticularcasesofheavy-taileddistributions.Moreprecisely,adistributionissaidpower-tailedifitstailfunctionis oftheform xα, α>0,forlarge x.Ithasthesingular propertythatevery momentgreaterthan the α-thisinfinite(see, forinstance,thebookofFoss etal.[6]).Here,sincewe assume1<p<2,thefirstmomentof ηisfinite,andthesecond one is infinite.In particular, combinedwith(1.1), it impliesthat Var(f(X)) could be infinitefor every f ∈F. Moreover, we emphasize that heavy-tailed dataare commonlyencountered, forexample,inthe areasofcomputer science, finance, biology,andastronomy,amongothers.Thentheassumption(1.1) isalsoofinterestforapplications.

Tothebestofourknowledge,theonlyresultwithouttheassumptionofsquareintegrabilityof f(X), f∈F,isprovided inRio[20,Theorem2].The authorgivesan upperbound ofthelog-Laplacetransformof Zn−E[Zn]involvingsquaresof positive partsandtruncatednegativepartsof f(X)forall f∈F.Hisproof reliesona martingaledecompositionof Zn− E[Zn] associatedwithanexponential inequalityforpositiveself-boundingfunctions(basedon Ledoux’sentropymethod) proved in Rio [21]. Our approach uses the same martingale decompositionused by Rio. However, the difference lies in the control of the martingale increments. To give an upper bound on their log-Laplace transform, we resort to convex comparisoninequalities,similartothoseofHoeffding [8] forboundedrandomvariables.Toourknowledge,thisresulthas nocounterpartintheexistingliterature.Thenitisnotthateasytostudytheoptimalityoftheconstantsappearinginour inequalities.Nevertheless,wethinkthatthemethodusedmaybeencouragingforfurtherworks.Actually,itseemsbeyond the scope of traditional functional analysistools to handle the case of nonfiniteness of supf∈FVar(f(X)). Furthermore, we haverecentlyshownthat martingalemethodscanbe usedto relaxclassical hypotheses(asthe uniformboundedness conditions) in concentration inequalities for separately convexfunctions of independent random variables, especially for supremaofempiricalprocesses(see[14,13]).

2. Result

Letusfirstgive somenotation.Foranyreal-valuedrandomvariable X, FX andFX1 denoterespectivelythedistribution functionof X andthecàdlàginverseofFX.Forallreals xand α,x+:=max(0,x)andxα

+:=(x+)α.Wedenoteby(.)the usualgammafunction.

Forthesakeofclarity,letusrecallthesettingweworkwith.Let X,X1,. . . ,Xn beasequenceofi.i.d.randomvariables valued in(X,F) withcommondistribution P. Let1<p<2. Let η be a random variablewithdistribution function Fη definedby (1.3).We recallthat Fη dependsonc>0 and p.LetF be acountable classofmeasurablefunctionsfromX intoR.Wemakethefollowingassumptions.

Assumption2.1.Forall f∈F andallxX,

f(x)1 and P(f):=E[f(X)] =0. (2.1)

(4)

Assumption2.2.Theparameterc issuchthat cp1

2p−1 1/p

. (2.2)

Thiscondition on c ispurely technical.Note that thefunction h:pp1

2p1

1/p

isincreasing on [1,2].Since h(2)=

1/3,ifc≥√

1/3,thenthecondition(2.2) issatisfied.

Assumption2.3.Foranyt>0,

E[(tf(X))+] ≤E[(tη)+]. (2.3)

Thisassumptioncorresponds tothesecond-order stochasticdominance ofη overf(X),forall f∈F.Itisweaker thanthecondition(1.1),whichcorrespondstothefirst-orderstochasticdominance,andissufficientforourresult.

Thefollowingresultthenholds.

Theorem2.1.LetZnbedefinedby(1.2).Letq0betherealin]0,1[suchthat q0c p

p1(1−q0)11/p=0. (2.4)

UnderAssumptions2.1,2.2and2.3,foranyxq0(2p),

P(Zn−E[Zn] ≥nx)exp

nγpxp/(p1)(1−εp(x))

, (a)

where

γp=(pαp)1/(p1)p1

p , αp= cp

p1(2−p), (2.5)

andεp(x)= p p1

k=2

q0

k!x(kp)/(p1)(pαp)−(k1)/(p1). (2.6) Moreover,foranyx>q0(2p),

P(Zn−E[Zn] ≥nx)≤exp

n(x(1−q0)1/pc1βp)

, (b)

where

βp=−(2p)

p1q0(2−p)

p11+exp((1−q0)1/pc1)(1−q0)1/pc1

. (2.7)

Roughlyspeaking,Inequality(a)statesthat,forsmallenoughx>0,

P(Zn−E[Zn] ≥n1/px)≤exp

Kpxq 1+O

x(2p)/(p1)n2/p

, (2.8)

whereq=p/(p1)istheHölderexponentconjugateofpandKp isaconstantdependingonlyonp.UnderAssumption2.1 andsupf∈FVar(f(X))<,itisknownthat ZnsatisfiesaBennett-typeinequality(see,forinstance,Rio[22,Theorem!1.1]).

Itleadstothefollowinginequalityforsmallenoughx>0:

P(Zn−E[Zn] ≥√

nx)≤exp − x2

2v 1+O

xn1/2

, (2.9)

wherev:=σ2+2n1E[Zn]and σ2:=supf∈F P(f2).Thus,Inequality(a)mayberegardedasanextensionof(2.9).

Remark2.2(Explanationof(2.4)).ForanyboundedrandomvariableaYb,a,b∈R,Hoeffding[8] showsthatY ismore concentrate for the convexfunctions than the two-valued random variable θ taking the values a and b and such that E[θ]=E[Y] (seehis inequalities(4.1)and(4.2)).It meansthatE[ϕ(Y)]≤E[ϕ(θ )] forall convexfunctions ϕ.Thisresult hasbeenextendedtounboundedrandomvariables:therealsa,barereplacedbysomerandomvariables α,β andaYb isreplacedby αYβ forsomestochastic order (seeBentkus [3] andMarchina[14]).Here,one hasη2 f(X)1, where 2 denotes the usual second-order stochastic dominance. Then, with respect to a class of convex functions, the distributionof f(X)ismoreconcentratethanthedistribution μ[q0] definedby

μ[q0](A)=μ(A∩ ]Fη1(1−q0) ,+∞[)+q0δ1(A), (2.10)

(5)

540 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544

Fig. 1.Impact of the value ofcon the functionpq0(2p).

where μdenotesthedistributionof η, A⊂R(measurable),δ1 standsfortheDiracmeasurecenteredon1 andq0∈ [0,1] issuchthat

tμ[q0](dt)=E[f(X)]=0.Infact,thislastequationisequivalentto(2.4).

By(2.4),q0 dependsonc.Theimpactofc onthewindow[0,q0(2p)],inwhich(a)isvalid, isshowninFig.1.The function pq0(2p)forthevaluesc=3,c=1,c=√

1/3 andc=0.3 isrepresented.

3. Proof

Westartinthesamewayasintheproofofthemainresultsin[13],thatis,byamartingaledecompositionofZn−E[Zn]. Letusrecallthemainpoints. Thereaderisreferredto[13] formoredetails.First,byvirtueofthemonotoneconvergence theorem,wecansupposethatF isafiniteclassoffunctions.SetF0:= {∅,}andforallk=1,. . . ,n,Fk:=σ(X1,. . . ,Xk), andFnk:=σ(X1,. . . ,Xk1,Xk+1,. . . ,Xn).LetEk (respectively Ekn) denotetheconditional expectationoperator associated withFk(resp.Fnk).Setalso Zn(k):=supf∈F

j=kf(Xj)and Zk:=Ek[Zn].LetusnumberthefunctionsoftheclassF and considertherandomindices

τ:=inf

i>0: n k=1

fi(Xk)=Zn

τk:=inf

⎧⎨

i>0: n

j=1

fi(Xj)fi(Xk)=Z(nk)

⎫⎬

.

Defineξk:=Ek[fτk(Xk)]andrk:=(Zk−Ek[Zn(k)])ξk0.Notethatwehave

ξkξk+rk≤Ek[fτ(Xk)]. (3.1)

BythecenteringassumptionontheelementsofF,Ek1k]=0,leadingto

Zn−E[Zn] = n k=1

k wherek:=ξk+rk−Ek1[rk]. (3.2)

Theproofismadeinthreesteps:

1. UsingtheresultsofSection3inMarchina[14],wecomparegeneralized(conditional)momentsofk withthoseofa randomvariableζq0 withdistribution μ[q0] givenin(2.10).Inparticular,theclassofgeneralizedmomentsonwhichwe obtainacomparisoninequalitycontainsincreasingexponentialfunctionsxetxforeveryt0.

2. Wegiveanupperboundontheexponentialmomentsofζq0. 3. WeconcludetheproofbytheusualCramér–Chernoffcalculation.

(6)

Step1:Comparisoninequality

Letusdenotebyζq,foranyq∈ [0,1],arandomvariablewithdistributionfunctiongivenby

Fq(x):=

⎧⎨

Fη(x) ifx<aq, 1−q ifaqx<1,

1 ifx≥1,

(3.3)

whereaq:=Fη1(1q).Noticethat

E[ζq] =qc p

p1(1−q)11/p, (3.4)

whichensuresthat(2.4) isequivalenttoE[ζq0]=0.Letusalsodefine H2+:=

ϕC1(R):ϕis convex, and lim

x→−∞ϕ(x)= lim

x→−∞ϕ(x)=0

Thispartisdevotedtoprovethefollowinglemma.

Lemma3.1.Forany ϕ∈H2+andanyk=1,. . . ,n,

Ek1[ϕ(k)] ≤E[ϕ(ζq0)]. (a)

Consequently,foranyt0,

logE[exp(t(Zn−E[Zn]))] ≤nlogE[exp(tζq0)]. (b) Remark3.2.Comparison inequalitieswithrespecttotheclass offunctionsH2+ (or moregenerallyH+α, α>0) havebeen widelystudiedbyPinelis(theseinclude,amongothers,[16–19],andwereferthereadertothesepapersformoredetails).

Weonlyrecallthatwehavethefollowingequivalence:

(i) E[ϕ(X)]≤E[ϕ(Y)]forany ϕ∈H+2 (ii) E[(Xt)2+]≤E[(Yt)2+]foranyt∈R.

TheproofofLemma3.1liesonthefollowingresults,whichwereestablishedinMarchina[14].

Lemma3.3(Lemmas4.3and4.6(i)in[14]).Let ηbedefinedby(1.3).

(i) Let X be an integrablerandom variablesuch that X1 andforanyrealt,E[(tX)+]≤E[(tη)+].Then forany convexfunction ϕ,

E[ϕ(X−E[X])] ≤E[ϕ(ζq−E[ζq])],

whereq∈ [0,1]issuchthatE[ζq]=E[X].

(ii) Letq˜:=inf{q1/2:1+Fη1(1q)2E[ζq]}.Forallt∈R,thefunction q→E[(ζq−E[ζq] −t)2+]

isnonincreasingonq,1].

Proof of Lemma3.1. SinceH2+containsallincreasingexponentialfunctions,taking ϕ(x)=etx witht0 in(a)leadsto(b) byan inductiononn.Moreover,inview ofRemark3.2,we onlyhavetoprove(a) forthefunctions ϕ(x)=(xt)2+,with t∈R.Lett>0.Sincerk0,Jensen’sinequalityimpliesthat

Ek1[(−tk+rk))+] ≤Ek1[(−tξk)+]

≤Ek1[(−tfτk(Xk))+]

≤E[(−tη)+], (3.5)

wherethelast inequalityfollowsfromAssumption2.3.Furthermore,wecandirectlyverifythat(3.5) holdsfort0.Since f1 forany f∈F,(3.1) impliesthat ξk+rk1.Hence,applyingLemma3.3(i)conditionallyto Fk1,with X=ξk+rk, yieldsthat

Ek1[ϕ(k)] ≤Ek1[ϕ(ζqˆ−E[ζqˆ])], (3.6)

Références

Documents relatifs

In this paper, we provide left deviation inequalities for suprema of unbounded empirical processes associated with independent and iden- tically distributed random variables by means

In this paper, we provide new concentration inequalities for suprema of (possibly) non-centered and unbounded empirical processes associ- ated with independent and

Using the Lya- punov function approach we prove that such measures satisfy different kind of functional inequalities such as weak Poincar´e and weak Cheeger, weighted Poincar´e

However this time change ar- gument does not apply to the jump-diffusion case, in addition in the pure jump case it cannot be used when (J t ) t∈R + is non-constant, and in the

(III) Pareto distribution We use a slight modification of the Hill estimator. The Hill estimator allows to compute the slope of such a graph, using a weighted least squares

In this paper a concentration inequality is proved for the deviation in the ergodic theorem in the case of discrete time observations of diffusion processes.. The proof is based on

In the whole paper (X i ) 16i6n is a sequence of centred random variables. We shall obtain here a value of ρ which depends on conditional moments of the variables. We also want

The estimators withing this family are aggregated based on the exponential weighting method and the performance of the aggregated estimate is measured by the excess risk