An exponential inequality for suprema of empirical processes with heavy tails on the left

(1)

HAL Id: hal-02092094

https://hal.archives-ouvertes.fr/hal-02092094v3

Submitted on 5 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

An exponential inequality for suprema of empirical processes with heavy tails on the left

Antoine Marchina

To cite this version:

Antoine Marchina. An exponential inequality for suprema of empirical processes with heavy tails on the left. Comptes Rendus. Mathématique, Académie des sciences (Paris), 2019,

�10.1016/j.crma.2019.06.008�. �hal-02092094v3�

(2)

Contents lists available atScienceDirect

C. R. Acad. Sci. Paris, Ser. I

www.sciencedirect.com

Probability theory/Statistics

An exponential inequality for suprema of empirical processes with heavy tails on the left

Une inégalité exponentielle pour les suprema de processus empiriques avec queues lourdes sur la gauche

Antoine Marchina^a^,^b

aUniversitéParis-Est,LAMA(UMR8050),UPEM,UPEC,CNRS,77454Marne-la-Vallée,France

bLaboratoiredemathématiquesdeVersailles,UVSQ,CNRS,UniversitéParis-Saclay,78035Versailles,France

a r t i c l e i n f o a b s t r a c t

Articlehistory:

Received11October2018 Acceptedafterrevision17June2019 Availableonline3July2019 PresentedbyMichelTalagrand

InthisNote,weprovideexponentialinequalitiesforsupremaofempiricalprocesseswith heavytailsontheleft.Ourapproachisbasedonamartingaledecomposition,associated with comparison inequalities overa cone ofconvex functions originally introduced by Pinelis.Furthermore,theconstantsinourinequalitiesareexplicit.

©²⁰¹⁹Âcadémie^des^sciences.^Published^byÊlsevier^Masson^SAS.^Thisîsânôpenâccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

r é s um é

Dans cette Note, nous donnons des inégalités exponentielles pour les suprema de processus empiriques avec queues lourdessur lagauche. Notreapproche est basée sur unedécompositionenmartingale, associéeàdesinégalitésdecomparaisonsuruncône de fonctions convexes, initialement introduit par Pinelis. Les constantes données sont explicites.

©²⁰¹⁹Âcadémie^des^sciences.^Published^byÊlsevier^Masson^SAS.^Thisîsânôpenâccess articleundertheCCBY-NC-NDlicense (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

Let X,X1,. . . ,Xn beasequenceofindependentrandomvariablesvaluedinsomemeasurablespace(X,F),andidenti- callydistributedaccordingtoalawP.LetF beacountableclassofmeasurablefunctionsfromX ^into]−∞,1]^such^that P(f)=0 forall f∈F.Let1<p<2.Wesupposethat,forall f∈F, f(X)satisﬁesthefollowingbehaviorontheleft:

P(f(X)≤ −^t)≤_c t

p

for anyt>0, (1.1)

forsomec>0.Let Znbethereal-valuedrandomvariabledeﬁnedby

E-mailaddress:antoine.marchina@u-pem.fr.

https://doi.org/10.1016/j.crma.2019.06.008

1631-073X/©²⁰¹⁹Âcadémie^des^sciences.^Published^byÊlsevier^Masson^SAS.^Thisîsânôpenâccessârticleûnder^the^CC^BY-NC-ND^license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

(3)

538 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544

Z_n:=^supⁿ

k=¹

f(X_k): ^f∈F

. (1.2)

TheaimofthisNoteistogiveanexponentialboundfortheprobabilityofdeviationofZn aboveitsmean.Throughoutthe paper,wedenoteby ηaPareto(ontheleft)randomvariablewithparameters(c,p),thatis,itsdistributionfunction Fη is deﬁnedby

F_η(−^t)=_c t

p

∧^{1 and} ^Fη(t)=^Fη(0)=¹,for anyt>0. (1.3)

The control oftherandom ﬂuctuationsofan empiricalprocess hasa centralrole inmathematical statisticsandmachine learning. Forgeneralpresentationsoftheseconnections betweenempiricalprocesstheory andstatistics,see,forinstance, thebooksofVanderVaartandWellner[24],Massart[15],orKoltchinskii[9].Forexample,inthecontextofnonparametric estimation,thecalibrationofadaptivemethodsisstronglyrelatedtothecontrolofanempiricalprocess:see,forinstance, [10] fortheEmpiricalRiskMinimizationmethod,[2] fortheCross-Validationmethodand[7] fortheGoldenshluger–Lepski method.

SinceTalagrand’s[23] andLedoux’s[12] pioneeringwork,concentrationinequalitiesforsupremaofempiricalprocesses have been the subject ofintense research. Mainly, the aim is to reach optimal counterparts for Z_n of classical Hoeffd- ing’s, Bernstein’s, andBennett’s exponential inequalities for sums ofi.i.d. random variables, which correspond to classes F reduced to one element. The reader is referred to Chapter 12 of the book by Boucheron, Lugosi, and Massart [5]

for an overviewof this subject. Recently, some effortshave been made to consider heaviertails by only assuming that sup_f_∈F|^f(X)| îsL^r-integrableforsome r>2: see,Boucheron, Bousquet,Lugosi, andMassart[4], Adamczak[1], vande Geer and Lederer [11], and Marchina [13]. However, the common point of all these results is that sup_f_∈FVar(f(X)) is finite,whichwedonotwanttoassumeinthiswork.Instead,weassume(1.1) whichstatesthefirst-orderstochasticdomi- nanceof−ηover−^f(X),forall f∈F.Inotherwords,Fηistheextremal(inthesenseoffirst-orderstochasticdominance) distribution that f(X) could have.The Paretodistributionis theprototype of“power-tailed”distributions, whichare im- portantparticularcasesofheavy-taileddistributions.Moreprecisely,adistributionissaidpower-tailedifitstailfunctionis oftheform x⁻α, α>0,forlarge x.Ithasthesingular propertythatevery momentgreaterthan the α-thisinfinite(see, forinstance,thebookofFoss etal.[6]).Here,sincewe assume1<p<2,thefirstmomentof ηisfinite,andthesecond one is infinite.In particular, combinedwith(1.1), it impliesthat Var(f(X)) could be infinitefor every f ∈F. Moreover, we emphasize that heavy-tailed dataare commonlyencountered, forexample,inthe areasofcomputer science, finance, biology,andastronomy,amongothers.Thentheassumption(1.1) isalsoofinterestforapplications.

Tothebestofourknowledge,theonlyresultwithouttheassumptionofsquareintegrabilityof f(X), f∈F,isprovided inRio[20,Theorem2].The authorgivesan upperbound ofthelog-Laplacetransformof Zn−E[^Zn]învolving^squaresôf positive partsandtruncatednegativepartsof f(X)forall f∈F.Hisproof reliesona martingaledecompositionof Zn− E[^Zn] âssociated^withânexponential inequalityforpositiveself-boundingfunctions(basedon Ledoux’sentropymethod) proved in Rio [21]. Our approach uses the same martingale decompositionused by Rio. However, the difference lies in the control of the martingale increments. To give an upper bound on their log-Laplace transform, we resort to convex comparisoninequalities,similartothoseofHoeffding [8] forboundedrandomvariables.Toourknowledge,thisresulthas nocounterpartintheexistingliterature.Thenitisnotthateasytostudytheoptimalityoftheconstantsappearinginour inequalities.Nevertheless,wethinkthatthemethodusedmaybeencouragingforfurtherworks.Actually,itseemsbeyond the scope of traditional functional analysistools to handle the case of nonfiniteness of sup_f_∈FVar(f(X)). Furthermore, we haverecentlyshownthat martingalemethodscanbe usedto relaxclassical hypotheses(asthe uniformboundedness conditions) in concentration inequalities for separately convexfunctions of independent random variables, especially for supremaofempiricalprocesses(see[14,13]).

2. Result

Letusﬁrstgive somenotation.Foranyreal-valuedrandomvariable X, FX andF⁻_X¹ denoterespectivelythedistribution functionof X andthecàdlàginverseofF_X.Forallreals xand α,x₊:=^max(0,x)andxα

+:=(x₊)^α.Wedenoteby(.)the usualgammafunction.

Forthesakeofclarity,letusrecallthesettingweworkwith.Let X,X1,. . . ,Xn beasequenceofi.i.d.randomvariables valued in(X,F) withcommondistribution P. Let1<p<2. Let η be a random variablewithdistribution function Fη deﬁnedby (1.3).We recallthat Fη dependsonc>0 and p.LetF be acountable classofmeasurablefunctionsfromX intoR.Wemakethefollowingassumptions.

Assumption2.1.Forall f∈F andallx∈X,

f(x)≤¹ ^and ^P(f):=E[^f(X)] =⁰. (2.1)

(4)

Assumption2.2.Theparameterc issuchthat c≥ _p−¹

2p−¹ 1/p

. (2.2)

Thiscondition on c ispurely technical.Note that thefunction h:^p→ _p₋₁

2p−1

1/p

isincreasing on [¹,2]^.^Since ^h(2)=

√1/3,ifc≥√

1/3,thenthecondition(2.2) issatisﬁed.

Assumption2.3.Foranyt>0,

E[(−^t−^f(X))₊] ≤E[(−^t−η)₊]. (2.3)

Thisassumptioncorresponds tothesecond-order stochasticdominance of−η over−^f(X),forall f∈F.Itisweaker thanthecondition(1.1),whichcorrespondstotheﬁrst-orderstochasticdominance,andissuﬃcientforourresult.

Thefollowingresultthenholds.

Theorem2.1.LetZnbedeﬁnedby(1.2).Letq0betherealin]⁰,1[^such^that q₀−^c ^p

p−¹(1−^q0)¹⁻¹^/^p=⁰. (2.4)

UnderAssumptions2.1,2.2and2.3,foranyx≤^q0(2−^p),

P(Z_n−E[^Zn] ≥^nx)≤^exp

−ⁿγpx^p^/(^p⁻¹⁾(1−εp(x))

, (a)

where

γp=(pαp)⁻¹^/(^p⁻¹⁾^p−¹

p , αp= ^c^p

p−¹(2−^p), (2.5)

andεp(x)= ^p p−¹

∞ k=²

q₀

k!^x⁽^k⁻^p^)/(^p⁻¹⁾(pαp)⁻⁽^k⁻¹^)/(^p⁻¹⁾. (2.6) Moreover,foranyx>q0(2−^p),

P(Zn−E[Zn] ≥nx)≤exp

−n(x(1−q0)¹^/^pc⁻¹−βp)

, (b)

where

βp=−(²−^p)

p−¹ −^q0 −(2−^p)

p−¹ −¹+^exp((1−^q0)¹^/^pc⁻¹)−(1−^q0)¹^/^pc⁻¹

. (2.7)

Roughlyspeaking,Inequality(a)statesthat,forsmallenoughx>0,

P(Zn−E[Zn] ≥n¹^/^px)≤exp

−Kpx^q 1+O

x⁽²⁻^p^)/(^p⁻¹⁾n⁻²^/^p

, (2.8)

whereq=^p/(p−¹)istheHölderexponentconjugateofpandKp isaconstantdependingonlyonp.UnderAssumption2.1 andsup_f_∈FVar(f(X))<∞^,îtîs^known^that ^ZnsatisfiesaBennett-typeinequality(see,forinstance,Rio[22,Theorem!1.1]).

Itleadstothefollowinginequalityforsmallenoughx>0:

P(Zn−E[Zn] ≥√

nx)≤exp − ^x²

2v 1+O

xn⁻¹^/²

, (2.9)

wherev:=σ²₊2n⁻¹E[^Zn]^and σ²:=^supf∈F P(f²).Thus,Inequality(a)mayberegardedasanextensionof(2.9).

Remark2.2(Explanationof(2.4)).Foranyboundedrandomvariablea≤^Y≤^b,â,b∈R,Hoeffding[8] showsthatY ismore concentrate for the convexfunctions than the two-valued random variable θ taking the values a and b and such that E[θ]=E[^Y] ^(see^his inequalities(4.1)and(4.2)).It meansthatE[ϕ(Y)]≤E[ϕ(θ )] ^forâll ^convex^functions ϕ.Thisresult hasbeenextendedtounboundedrandomvariables:therealsa,barereplacedbysomerandomvariables α,β anda≤^Y≤^b isreplacedby α^Yβ forsomestochastic order ^(see^Bentkus ^{[3] and}^Marchina^[14]).^Here,ône ^hasη2 f(X)≤^1, where 2 denotes the usual second-order stochastic dominance. Then, with respect to a class of convex functions, the distributionof f(X)ismoreconcentratethanthedistribution μ^[^q⁰^] definedby

μ^[^q⁰^](A)=μ(A∩ ]^F_η⁻¹(1−^q0) ,+∞[)+^q0δ1(A), (2.10)

(5)

540 A. Marchina / C. R. Acad. Sci. Paris, Ser. I 357 (2019) 537–544

Fig. 1.Impact of the value ofcon the functionp→^q0(2−^p).

where μdenotesthedistributionof η, A⊂R(measurable),δ1 standsfortheDiracmeasurecenteredon1 andq0∈ [⁰,1] issuchthat

tμ^[^q⁰^](dt)=E[^f(X)]=^0.În^fact,^this^lastêquationîsêquivalent^to^(2.4).

By(2.4),q0 dependsonc.Theimpactofc onthewindow[0,q0(2−p)],inwhich(a)isvalid, isshowninFig.1.The function p→^q0(2−^p)forthevaluesc=^3,^c=^1,^c=√

1/3 andc=⁰.3 isrepresented.

3. Proof

Westartinthesamewayasintheproofofthemainresultsin[13],thatis,byamartingaledecompositionofZ_n−E[^Zn]^. Letusrecallthemainpoints. Thereaderisreferredto[13] formoredetails.First,byvirtueofthemonotoneconvergence theorem,wecansupposethatF isafiniteclassoffunctions.SetF0:= {∅,}ând^forâll^k=¹,. . . ,n,Fk:=σ(X1,. . . ,Xk), andFn^k:=σ(X₁,. . . ,X_k₋₁,X_k₊₁,. . . ,X_n).LetE_k (respectively E^k_n) denotetheconditional expectationoperator associated withFk(resp.Fn^k).Setalso Z_n⁽^k⁾:=^supf∈F

j=^kf(X_j)and Z_k:=E_k[^Zn]^.^Let^us^number^the^functions^of^the^classF and considertherandomindices

τ_:=inf

i>0: n k=¹

f_i(X_k)=^Zn

τk:=^inf

⎧⎨

⎩ⁱ>0: n

j=¹

f_i(X_j)−^fi(X_k)=^Z⁽n^k⁾

⎫⎬

⎭.

Deﬁneξk:=Ek[^fτk(Xk)]^and^rk:=(Zk−Ek[^Zn⁽^k⁾])−ξk≥^0.^Note^that^we^have

ξk≤ξk+^rk≤E_k[^fτ(X_k)]. ^(3.1)

BythecenteringassumptionontheelementsofF,E_k₋₁[ξk]=^0,^leading^to

Zn−E[Zn] = n k=1

k wherek:=ξk+rk−Ek−¹[rk]. (3.2)

Theproofismadeinthreesteps:

1. UsingtheresultsofSection3inMarchina[14],wecomparegeneralized(conditional)momentsof_k withthoseofa randomvariableζq0 withdistribution μ^[^q⁰^] givenin(2.10).Inparticular,theclassofgeneralizedmomentsonwhichwe obtainacomparisoninequalitycontainsincreasingexponentialfunctionsx→^e^tx^for^every^t≥^0.

2. Wegiveanupperboundontheexponentialmomentsofζq0. 3. WeconcludetheproofbytheusualCramér–Chernoffcalculation.

(6)

Step1:Comparisoninequality

Letusdenotebyζq,foranyq∈ [⁰,1]^,^a^random^variable^withdistributionfunctiongivenby

F_q(x):=

⎧⎨

⎩

F_η(x) ifx<a_q, 1−^q ^if^aq≤^x<1,

1 ifx≥1,

(3.3)

wherea_q:=^F⁻_η¹(1−^q).Noticethat

E[ζq] =^q−^c ^p

p−¹(1−^q)¹⁻¹^/^p, (3.4)

whichensuresthat(2.4) isequivalenttoE[ζq0]=^0.^Letûsâlso^define H²₊:=

ϕ∈C¹(R):ϕis convex, and lim

x→−∞ϕ(x)= lim

x→−∞ϕ(x)=0

Thispartisdevotedtoprovethefollowinglemma.

Lemma3.1.Forany ϕ∈H²₊^and^any^k=¹,. . . ,n,

Ek−¹[ϕ(k)] ≤E[ϕ(ζq0)]. (a)

Consequently,foranyt≥^0,

logE[exp(t(Zn−E[Zn]))] ≤nlogE[exp(tζq₀)]. (b) Remark3.2.Comparison inequalitieswithrespecttotheclass offunctionsH²₊ ^(or ^more^generallyH₊^α^, α>0) havebeen widelystudiedbyPinelis(theseinclude,amongothers,[16–19],andwereferthereadertothesepapersformoredetails).

Weonlyrecallthatwehavethefollowingequivalence:

(i) E[ϕ(X)]≤E[ϕ(Y)]^for^any ϕ∈H₊² (ii) E[(X−^t)²₊]≤E[(Y−^t)²₊]^for^any^t∈R.

TheproofofLemma3.1liesonthefollowingresults,whichwereestablishedinMarchina[14].

Lemma3.3(Lemmas4.3and4.6(i)in[14]).Let ηbedeﬁnedby(1.3).

(i) Let X be an integrablerandom variablesuch that X≤^{1 and}^for^any^real^t,E[(t−^X)₊]≤E[(t−η)₊]^.^Then ^for^any convexfunction ϕ,

E[ϕ(X−E[X])] ≤E[ϕ(ζq−E[ζq])],

whereq∈ [⁰,1]^is^such^thatE[ζq]=E[^X]^.

(ii) Let^q˜:=^inf{^q≥¹/2:¹+^F_η⁻¹(1−^q)≤²E[ζq]}^.^For^all^t∈R,thefunction q→E[(ζq−E[ζq] −^t)²₊]

isnonincreasingon[˜^q,1]^.

Proof of Lemma3.1. SinceH²₊^containsâllîncreasingexponentialfunctions,taking ϕ(x)=ê^tx ^with^t≥^{0 in}^(a)^leads^to^(b) byan inductiononn.Moreover,inview ofRemark3.2,we onlyhavetoprove(a) forthefunctions ϕ(x)=(x−^t)²₊,with t∈R.Lett>0.Sincer_k≥^0,^Jensen’sînequalityîmplies^that

Ek−¹[(−^t−(ξk+^rk))₊] ≤Ek−¹[(−^t−ξk)₊]

≤Ek−¹[(−^t−^fτk(X_k))₊]

≤E[(−^t−η)₊], ^(3.5)

wherethelast inequalityfollowsfromAssumption2.3.Furthermore,wecandirectlyverifythat(3.5) holdsfort≤^0.^Since f ≤^{1 for}^any ^f∈F,(3.1) impliesthat ξk+^rk≤^1.^Hence,^applying^Lemma^3.3⁽ⁱ⁾conditionallyto Fk−¹,with X=ξk+^rk, yieldsthat

Ek−¹[ϕ(k)] ≤Ek−¹[ϕ(ζ_q_ˆ−E[ζq_ˆ])], ^(3.6)