HAL Id: hal-02092094
https://hal.archives-ouvertes.fr/hal-02092094v3
Submitted on 5 Mar 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
An exponential inequality for suprema of empirical processes with heavy tails on the left
Antoine Marchina
To cite this version:
Antoine Marchina. An exponential inequality for suprema of empirical processes with heavy tails on the left. Comptes Rendus. Mathématique, Académie des sciences (Paris), 2019,
�10.1016/j.crma.2019.06.008�. �hal-02092094v3�
Contents lists available atScienceDirect
C. R. Acad. Sci. Paris, Ser. I
www.sciencedirect.com
Probability theory/Statistics
An exponential inequality for suprema of empirical processes with heavy tails on the left
Une inégalité exponentielle pour les suprema de processus empiriques avec queues lourdes sur la gauche
Antoine Marchinaa,b
aUniversitéParis-Est,LAMA(UMR8050),UPEM,UPEC,CNRS,77454Marne-la-Vallée,France
bLaboratoiredemathématiquesdeVersailles,UVSQ,CNRS,UniversitéParis-Saclay,78035Versailles,France
a r t i c l e i n f o a b s t r a c t
Articlehistory:
Received11October2018 Acceptedafterrevision17June2019 Availableonline3July2019 PresentedbyMichelTalagrand
InthisNote,weprovideexponentialinequalitiesforsupremaofempiricalprocesseswith heavytailsontheleft.Ourapproachisbasedonamartingaledecomposition,associated with comparison inequalities overa cone ofconvex functions originally introduced by Pinelis.Furthermore,theconstantsinourinequalitiesareexplicit.
©2019Académiedessciences.PublishedbyElsevierMassonSAS.Thisisanopenaccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
r é s um é
Dans cette Note, nous donnons des inégalités exponentielles pour les suprema de processus empiriques avec queues lourdessur lagauche. Notreapproche est basée sur unedécompositionenmartingale, associéeàdesinégalitésdecomparaisonsuruncône de fonctions convexes, initialement introduit par Pinelis. Les constantes données sont explicites.
©2019Académiedessciences.PublishedbyElsevierMassonSAS.Thisisanopenaccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Let X,X1,. . . ,Xn beasequenceofindependentrandomvariablesvaluedinsomemeasurablespace(X,F),andidenti- callydistributedaccordingtoalawP.LetF beacountableclassofmeasurablefunctionsfromX into]−∞,1]suchthat P(f)=0 forall f∈F.Let1<p<2.Wesupposethat,forall f∈F, f(X)satisfiesthefollowingbehaviorontheleft:
P(f(X)≤ −t)≤c t
p
for anyt>0, (1.1)
forsomec>0.Let Znbethereal-valuedrandomvariabledefinedby
E-mailaddress:antoine.marchina@u-pem.fr.
https://doi.org/10.1016/j.crma.2019.06.008
1631-073X/©2019Académiedessciences.PublishedbyElsevierMassonSAS.ThisisanopenaccessarticleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).
538 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544
Zn:=supn
k=1
f(Xk): f∈F
. (1.2)
TheaimofthisNoteistogiveanexponentialboundfortheprobabilityofdeviationofZn aboveitsmean.Throughoutthe paper,wedenoteby ηaPareto(ontheleft)randomvariablewithparameters(c,p),thatis,itsdistributionfunction Fη is definedby
Fη(−t)=c t
p
∧1 and Fη(t)=Fη(0)=1,for anyt>0. (1.3)
The control oftherandom fluctuationsofan empiricalprocess hasa centralrole inmathematical statisticsandmachine learning. Forgeneralpresentationsoftheseconnections betweenempiricalprocesstheory andstatistics,see,forinstance, thebooksofVanderVaartandWellner[24],Massart[15],orKoltchinskii[9].Forexample,inthecontextofnonparametric estimation,thecalibrationofadaptivemethodsisstronglyrelatedtothecontrolofanempiricalprocess:see,forinstance, [10] fortheEmpiricalRiskMinimizationmethod,[2] fortheCross-Validationmethodand[7] fortheGoldenshluger–Lepski method.
SinceTalagrand’s[23] andLedoux’s[12] pioneeringwork,concentrationinequalitiesforsupremaofempiricalprocesses have been the subject ofintense research. Mainly, the aim is to reach optimal counterparts for Zn of classical Hoeffd- ing’s, Bernstein’s, andBennett’s exponential inequalities for sums ofi.i.d. random variables, which correspond to classes F reduced to one element. The reader is referred to Chapter 12 of the book by Boucheron, Lugosi, and Massart [5]
for an overviewof this subject. Recently, some effortshave been made to consider heaviertails by only assuming that supf∈F|f(X)| isLr-integrableforsome r>2: see,Boucheron, Bousquet,Lugosi, andMassart[4], Adamczak[1], vande Geer and Lederer [11], and Marchina [13]. However, the common point of all these results is that supf∈FVar(f(X)) is finite,whichwedonotwanttoassumeinthiswork.Instead,weassume(1.1) whichstatesthefirst-orderstochasticdomi- nanceof−ηover−f(X),forall f∈F.Inotherwords,Fηistheextremal(inthesenseoffirst-orderstochasticdominance) distribution that f(X) could have.The Paretodistributionis theprototype of“power-tailed”distributions, whichare im- portantparticularcasesofheavy-taileddistributions.Moreprecisely,adistributionissaidpower-tailedifitstailfunctionis oftheform x−α, α>0,forlarge x.Ithasthesingular propertythatevery momentgreaterthan the α-thisinfinite(see, forinstance,thebookofFoss etal.[6]).Here,sincewe assume1<p<2,thefirstmomentof ηisfinite,andthesecond one is infinite.In particular, combinedwith(1.1), it impliesthat Var(f(X)) could be infinitefor every f ∈F. Moreover, we emphasize that heavy-tailed dataare commonlyencountered, forexample,inthe areasofcomputer science, finance, biology,andastronomy,amongothers.Thentheassumption(1.1) isalsoofinterestforapplications.
Tothebestofourknowledge,theonlyresultwithouttheassumptionofsquareintegrabilityof f(X), f∈F,isprovided inRio[20,Theorem2].The authorgivesan upperbound ofthelog-Laplacetransformof Zn−E[Zn]involvingsquaresof positive partsandtruncatednegativepartsof f(X)forall f∈F.Hisproof reliesona martingaledecompositionof Zn− E[Zn] associatedwithanexponential inequalityforpositiveself-boundingfunctions(basedon Ledoux’sentropymethod) proved in Rio [21]. Our approach uses the same martingale decompositionused by Rio. However, the difference lies in the control of the martingale increments. To give an upper bound on their log-Laplace transform, we resort to convex comparisoninequalities,similartothoseofHoeffding [8] forboundedrandomvariables.Toourknowledge,thisresulthas nocounterpartintheexistingliterature.Thenitisnotthateasytostudytheoptimalityoftheconstantsappearinginour inequalities.Nevertheless,wethinkthatthemethodusedmaybeencouragingforfurtherworks.Actually,itseemsbeyond the scope of traditional functional analysistools to handle the case of nonfiniteness of supf∈FVar(f(X)). Furthermore, we haverecentlyshownthat martingalemethodscanbe usedto relaxclassical hypotheses(asthe uniformboundedness conditions) in concentration inequalities for separately convexfunctions of independent random variables, especially for supremaofempiricalprocesses(see[14,13]).
2. Result
Letusfirstgive somenotation.Foranyreal-valuedrandomvariable X, FX andF−X1 denoterespectivelythedistribution functionof X andthecàdlàginverseofFX.Forallreals xand α,x+:=max(0,x)andxα
+:=(x+)α.Wedenoteby(.)the usualgammafunction.
Forthesakeofclarity,letusrecallthesettingweworkwith.Let X,X1,. . . ,Xn beasequenceofi.i.d.randomvariables valued in(X,F) withcommondistribution P. Let1<p<2. Let η be a random variablewithdistribution function Fη definedby (1.3).We recallthat Fη dependsonc>0 and p.LetF be acountable classofmeasurablefunctionsfromX intoR.Wemakethefollowingassumptions.
Assumption2.1.Forall f∈F andallx∈X,
f(x)≤1 and P(f):=E[f(X)] =0. (2.1)
Assumption2.2.Theparameterc issuchthat c≥ p−1
2p−1 1/p
. (2.2)
Thiscondition on c ispurely technical.Note that thefunction h:p→ p−1
2p−1
1/p
isincreasing on [1,2].Since h(2)=
√1/3,ifc≥√
1/3,thenthecondition(2.2) issatisfied.
Assumption2.3.Foranyt>0,
E[(−t−f(X))+] ≤E[(−t−η)+]. (2.3)
Thisassumptioncorresponds tothesecond-order stochasticdominance of−η over−f(X),forall f∈F.Itisweaker thanthecondition(1.1),whichcorrespondstothefirst-orderstochasticdominance,andissufficientforourresult.
Thefollowingresultthenholds.
Theorem2.1.LetZnbedefinedby(1.2).Letq0betherealin]0,1[suchthat q0−c p
p−1(1−q0)1−1/p=0. (2.4)
UnderAssumptions2.1,2.2and2.3,foranyx≤q0(2−p),
P(Zn−E[Zn] ≥nx)≤exp
−nγpxp/(p−1)(1−εp(x))
, (a)
where
γp=(pαp)−1/(p−1)p−1
p , αp= cp
p−1(2−p), (2.5)
andεp(x)= p p−1
∞ k=2
q0
k!x(k−p)/(p−1)(pαp)−(k−1)/(p−1). (2.6) Moreover,foranyx>q0(2−p),
P(Zn−E[Zn] ≥nx)≤exp
−n(x(1−q0)1/pc−1−βp)
, (b)
where
βp=−(2−p)
p−1 −q0 −(2−p)
p−1 −1+exp((1−q0)1/pc−1)−(1−q0)1/pc−1
. (2.7)
Roughlyspeaking,Inequality(a)statesthat,forsmallenoughx>0,
P(Zn−E[Zn] ≥n1/px)≤exp
−Kpxq 1+O
x(2−p)/(p−1)n−2/p
, (2.8)
whereq=p/(p−1)istheHölderexponentconjugateofpandKp isaconstantdependingonlyonp.UnderAssumption2.1 andsupf∈FVar(f(X))<∞,itisknownthat ZnsatisfiesaBennett-typeinequality(see,forinstance,Rio[22,Theorem!1.1]).
Itleadstothefollowinginequalityforsmallenoughx>0:
P(Zn−E[Zn] ≥√
nx)≤exp − x2
2v 1+O
xn−1/2
, (2.9)
wherev:=σ2+2n−1E[Zn]and σ2:=supf∈F P(f2).Thus,Inequality(a)mayberegardedasanextensionof(2.9).
Remark2.2(Explanationof(2.4)).Foranyboundedrandomvariablea≤Y≤b,a,b∈R,Hoeffding[8] showsthatY ismore concentrate for the convexfunctions than the two-valued random variable θ taking the values a and b and such that E[θ]=E[Y] (seehis inequalities(4.1)and(4.2)).It meansthatE[ϕ(Y)]≤E[ϕ(θ )] forall convexfunctions ϕ.Thisresult hasbeenextendedtounboundedrandomvariables:therealsa,barereplacedbysomerandomvariables α,β anda≤Y≤b isreplacedby αYβ forsomestochastic order (seeBentkus [3] andMarchina[14]).Here,one hasη2 f(X)≤1, where 2 denotes the usual second-order stochastic dominance. Then, with respect to a class of convex functions, the distributionof f(X)ismoreconcentratethanthedistribution μ[q0] definedby
μ[q0](A)=μ(A∩ ]Fη−1(1−q0) ,+∞[)+q0δ1(A), (2.10)
540 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544
Fig. 1.Impact of the value ofcon the functionp→q0(2−p).
where μdenotesthedistributionof η, A⊂R(measurable),δ1 standsfortheDiracmeasurecenteredon1 andq0∈ [0,1] issuchthat
tμ[q0](dt)=E[f(X)]=0.Infact,thislastequationisequivalentto(2.4).
By(2.4),q0 dependsonc.Theimpactofc onthewindow[0,q0(2−p)],inwhich(a)isvalid, isshowninFig.1.The function p→q0(2−p)forthevaluesc=3,c=1,c=√
1/3 andc=0.3 isrepresented.
3. Proof
Westartinthesamewayasintheproofofthemainresultsin[13],thatis,byamartingaledecompositionofZn−E[Zn]. Letusrecallthemainpoints. Thereaderisreferredto[13] formoredetails.First,byvirtueofthemonotoneconvergence theorem,wecansupposethatF isafiniteclassoffunctions.SetF0:= {∅,}andforallk=1,. . . ,n,Fk:=σ(X1,. . . ,Xk), andFnk:=σ(X1,. . . ,Xk−1,Xk+1,. . . ,Xn).LetEk (respectively Ekn) denotetheconditional expectationoperator associated withFk(resp.Fnk).Setalso Zn(k):=supf∈F
j=kf(Xj)and Zk:=Ek[Zn].LetusnumberthefunctionsoftheclassF and considertherandomindices
τ:=inf
i>0: n k=1
fi(Xk)=Zn
τk:=inf
⎧⎨
⎩i>0: n
j=1
fi(Xj)−fi(Xk)=Z(nk)
⎫⎬
⎭.
Defineξk:=Ek[fτk(Xk)]andrk:=(Zk−Ek[Zn(k)])−ξk≥0.Notethatwehave
ξk≤ξk+rk≤Ek[fτ(Xk)]. (3.1)
BythecenteringassumptionontheelementsofF,Ek−1[ξk]=0,leadingto
Zn−E[Zn] = n k=1
k wherek:=ξk+rk−Ek−1[rk]. (3.2)
Theproofismadeinthreesteps:
1. UsingtheresultsofSection3inMarchina[14],wecomparegeneralized(conditional)momentsofk withthoseofa randomvariableζq0 withdistribution μ[q0] givenin(2.10).Inparticular,theclassofgeneralizedmomentsonwhichwe obtainacomparisoninequalitycontainsincreasingexponentialfunctionsx→etxforeveryt≥0.
2. Wegiveanupperboundontheexponentialmomentsofζq0. 3. WeconcludetheproofbytheusualCramér–Chernoffcalculation.
Step1:Comparisoninequality
Letusdenotebyζq,foranyq∈ [0,1],arandomvariablewithdistributionfunctiongivenby
Fq(x):=
⎧⎨
⎩
Fη(x) ifx<aq, 1−q ifaq≤x<1,
1 ifx≥1,
(3.3)
whereaq:=F−η1(1−q).Noticethat
E[ζq] =q−c p
p−1(1−q)1−1/p, (3.4)
whichensuresthat(2.4) isequivalenttoE[ζq0]=0.Letusalsodefine H2+:=
ϕ∈C1(R):ϕis convex, and lim
x→−∞ϕ(x)= lim
x→−∞ϕ(x)=0
Thispartisdevotedtoprovethefollowinglemma.
Lemma3.1.Forany ϕ∈H2+andanyk=1,. . . ,n,
Ek−1[ϕ(k)] ≤E[ϕ(ζq0)]. (a)
Consequently,foranyt≥0,
logE[exp(t(Zn−E[Zn]))] ≤nlogE[exp(tζq0)]. (b) Remark3.2.Comparison inequalitieswithrespecttotheclass offunctionsH2+ (or moregenerallyH+α, α>0) havebeen widelystudiedbyPinelis(theseinclude,amongothers,[16–19],andwereferthereadertothesepapersformoredetails).
Weonlyrecallthatwehavethefollowingequivalence:
(i) E[ϕ(X)]≤E[ϕ(Y)]forany ϕ∈H+2 (ii) E[(X−t)2+]≤E[(Y−t)2+]foranyt∈R.
TheproofofLemma3.1liesonthefollowingresults,whichwereestablishedinMarchina[14].
Lemma3.3(Lemmas4.3and4.6(i)in[14]).Let ηbedefinedby(1.3).
(i) Let X be an integrablerandom variablesuch that X≤1 andforanyrealt,E[(t−X)+]≤E[(t−η)+].Then forany convexfunction ϕ,
E[ϕ(X−E[X])] ≤E[ϕ(ζq−E[ζq])],
whereq∈ [0,1]issuchthatE[ζq]=E[X].
(ii) Letq˜:=inf{q≥1/2:1+Fη−1(1−q)≤2E[ζq]}.Forallt∈R,thefunction q→E[(ζq−E[ζq] −t)2+]
isnonincreasingon[˜q,1].
Proof of Lemma3.1. SinceH2+containsallincreasingexponentialfunctions,taking ϕ(x)=etx witht≥0 in(a)leadsto(b) byan inductiononn.Moreover,inview ofRemark3.2,we onlyhavetoprove(a) forthefunctions ϕ(x)=(x−t)2+,with t∈R.Lett>0.Sincerk≥0,Jensen’sinequalityimpliesthat
Ek−1[(−t−(ξk+rk))+] ≤Ek−1[(−t−ξk)+]
≤Ek−1[(−t−fτk(Xk))+]
≤E[(−t−η)+], (3.5)
wherethelast inequalityfollowsfromAssumption2.3.Furthermore,wecandirectlyverifythat(3.5) holdsfort≤0.Since f ≤1 forany f∈F,(3.1) impliesthat ξk+rk≤1.Hence,applyingLemma3.3(i)conditionallyto Fk−1,with X=ξk+rk, yieldsthat
Ek−1[ϕ(k)] ≤Ek−1[ϕ(ζqˆ−E[ζqˆ])], (3.6)