Estimating the conditional tail index with an integrated conditional log-quantile estimator in the random covariate case


HAL Id: hal-01074694

https://hal.archives-ouvertes.fr/hal-01074694

Preprint submitted on 15 Oct 2014



Laurent Gardes, Gilles Stupfler

To cite this version:

Laurent Gardes, Gilles Stupfler. Estimating the conditional tail index with an integrated conditional log-quantile estimator in the random covariate case. 2014. ⟨hal-01074694⟩


Laurent Gardes (1) & Gilles Stupfler (2)

(1) Université de Strasbourg & CNRS, IRMA, UMR 7501, 7 rue René Descartes, 67084 Strasbourg Cedex, France

(2) Aix Marseille Université, CNRS, EHESS, Centrale Marseille, GREQAM UMR 7316, 13002 Marseille, France

Abstract. It is well known that the tail behavior of a heavy-tailed distribution is controlled by a parameter called the tail index. Such a parameter is therefore of primary interest in extreme value analysis, particularly to estimate extreme quantiles. In various applications, the random variable of interest can be linked to a finite-dimensional random covariate. In such a situation, the tail index is a function of the covariate and is referred to as the conditional tail index. The goal of this paper is to provide a class of estimators of this quantity. The pointwise weak consistency and asymptotic normality of these estimators are established. We illustrate the finite sample performance of our technique on a simulation study and on a real hurricane data set.

AMS Subject Classifications: 62G05, 62G20, 62G30, 62G32.

Keywords: Heavy-tailed distribution, tail index, random covariate, consistency, asymptotic normality.

1 Introduction

Studying extreme events is relevant in numerous fields of statistical applications. In hydrology, for example, it is of interest to estimate the maximum level reached by seawater along a coast over a given period, or to study extreme rainfall at a given location; in actuarial science, a major problem for an insurance firm is to estimate the probability that a claim so large that it represents a threat to its solvency is filed. A particular branch of extreme value analysis focuses on the study of heavy-tailed random variables, that is, those random variables whose distribution function $F$ is such that, for all $\lambda > 0$, $(1-F(\lambda x))/(1-F(x)) \to \lambda^{-1/\gamma}$ as $x$ goes to infinity, where $\gamma > 0$ is the so-called tail index. The parameter $\gamma$ drives the asymptotic behavior of $F$ in its right tail, which makes its estimation […]

The estimation of the tail index has therefore been extensively studied in the literature. Recent overviews on univariate tail index estimation can be found in Beirlant et al. [2] and de Haan and Ferreira [22].

In practical applications, the variable of interest $Y$ can often be linked to a covariate $X$. For instance, the value of rainfall at a given location depends on its geographical coordinates; in actuarial science, the claim size depends on the sum insured by the policy. In this situation, the tail index of the random variable $Y$ given $X = x$ is a function of $x$, which we shall refer to as the conditional tail index. Its estimation was first considered in the fixed design case, namely when the covariates are nonrandom. Smith [30] and Davison and Smith [12] considered a regression model, while Hall and Tajvidi [23] used a semi-parametric approach to estimate the conditional tail index. Fully nonparametric methods have been developed using splines (see Chavez-Demoulin and Davison [7]), local polynomials (see Davison and Ramesh [11]), a moving window approach (see Gardes and Girard [15]), a nearest neighbor approach (see Gardes and Girard [16]) and a conditional quantile-based technique (see Gardes et al. [18]), among others.

Despite the great interest in practice, the study of the random covariate case has been initiated only recently. We refer to the works of Wang and Tsai [32], based on a maximum likelihood approach; Daouia et al. [9], who used a fixed number of nonparametric conditional quantile estimators to estimate the conditional tail index, later generalized in Daouia et al. [10] to a regression context with conditional response distributions belonging to the general max-domain of attraction; Gardes and Girard [17], who introduced a local generalized Pickands-type estimator (see Pickands [27]); Goegebeur et al. [20], who studied a nonparametric regression estimator whose strong uniform properties are examined in Goegebeur et al. [21]; Stupfler [31], who introduced a generalization of the popular moment estimator of Dekkers et al. [13]; and Gardes and Stupfler [19], who worked on a smoothed local Hill estimator (see Hill [24]) related to the work of Resnick and Stărică [28].

The aim of this paper is to introduce an estimator of the conditional tail index based on the integration of a conditional log-quantile estimator. This type of estimator is similar to the one of Gardes and Girard [15]; our aim is to prove its consistency and asymptotic normality when the covariates are random, as well as to examine its applicability on numerical examples and on real data. Our paper is organized as follows: we define our conditional tail index estimator in Section 2, its asymptotic properties are stated in Section 3, a simulation study is provided in Section 4 and we showcase our estimator on a set of real hurricane data in Section 5. We offer a couple of concluding remarks in Section 6. All the auxiliary results and proofs are deferred to the Appendix.

We let $(X_1, Y_1), \dots, (X_n, Y_n)$ be $n$ independent copies of a random pair $(X, Y) \in E \times \mathbb{R}_+$, where $(E, d)$ is a metric space. We assume that for any $x \in E$, the conditional survival function $y \mapsto 1 - F(y|x)$ of $Y$ given $X = x$, where $F(y|x) := P(Y \le y \mid X = x)$ denotes the conditional distribution function, belongs to the set $RV_{-1/\gamma(x)}$ of regularly varying functions (at infinity) of index $-1/\gamma(x) < 0$. Recall that a function $G$ belongs to $RV_a$, $a \in \mathbb{R}$, if $G$ is nonnegative and for all $\lambda > 0$, $G(\lambda y)/G(y) \to \lambda^a$ as $y$ goes to infinity. This is the adaptation of the standard extreme-value framework to the case when there is a covariate.

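An equivalent assumption (see Bingham et al. [5, Proposition 1.5.15]) is:

(M1) For any $x \in E$, the conditional quantile function $\alpha \mapsto q(\alpha|x) := F^{\leftarrow}(1-\alpha|x) = \inf\{y \in \mathbb{R} \mid F(y|x) \ge 1-\alpha\}$ belongs to $RV_{-\gamma(x)}$ (regular variation at 0).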

Our goal is to estimate the conditional tail index $\gamma$ at a point $x \in E$. Remark first that, under (M1), for $u \in (0,1)$ small enough and $\alpha \in (0,u)$, $\log[q(\alpha|x)/q(u|x)] \approx \gamma(x)\log(u/\alpha)$. Hence, for any measurable function $\Psi(\cdot|x,u)$ on $(0,u)$ such that

$$\int_0^u \Psi(\alpha|x,u)\log(u/\alpha)\,d\alpha = 1, \tag{1}$$

one has

$$\int_0^u \Psi(\alpha|x,u)\log\frac{q(\alpha|x)}{q(u|x)}\,d\alpha \approx \gamma(x). \tag{2}$$
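A quick numerical check of (1) and (2) may help fix ideas. The Python sketch below (an illustration only, not part of the method) takes the two weight functions $\Psi(\cdot|x,u)=u^{-1}$ and $\Psi(\cdot|x,u)=u^{-1}(\log(u/\cdot)-1)$ considered at the end of this section, together with a hypothetical Fréchet-type quantile function $q(\alpha)=(-\log(1-\alpha))^{-\gamma}$, which satisfies (M1) without being exactly Pareto. Integrals over $(0,u)$ are computed after the substitution $\alpha=ue^{-s}$, which turns the logarithmic behavior at 0 into an exponentially decaying integrand; the helper names, the cut-off `s_max` and the grid size are arbitrary choices.

```python
import numpy as np

def integral_0_u(g, u, s_max=40.0, n=200_001):
    """Approximate int_0^u g(alpha) d(alpha) through alpha = u * exp(-s).

    Since d(alpha) = -alpha ds, the integral equals int_0^inf g(alpha) * alpha ds,
    evaluated here with the trapezoidal rule on [0, s_max] (the tail is neglected).
    """
    s = np.linspace(0.0, s_max, n)
    alpha = u * np.exp(-s)
    y = g(alpha) * alpha
    ds = s[1] - s[0]
    return ds * (y.sum() - 0.5 * (y[0] + y[-1]))

def psi_hill(alpha, u):
    """Psi(alpha | x, u) = 1/u."""
    return np.full_like(alpha, 1.0 / u)

def psi_zipf(alpha, u):
    """Psi(alpha | x, u) = (log(u/alpha) - 1)/u."""
    return (np.log(u / alpha) - 1.0) / u

def log_q_frechet(alpha, gamma):
    """Hypothetical example: log q(alpha) with q(alpha) = (-log(1 - alpha))^(-gamma)."""
    return -gamma * np.log(-np.log1p(-alpha))

gamma, u = 0.5, 0.01
for name, psi in [("Hill", psi_hill), ("Zipf", psi_zipf)]:
    lhs1 = integral_0_u(lambda a: psi(a, u) * np.log(u / a), u)  # left-hand side of (1)
    lhs2 = integral_0_u(
        lambda a: psi(a, u) * (log_q_frechet(a, gamma) - log_q_frechet(u, gamma)), u
    )  # left-hand side of (2)
    print(f"{name}: (1) gives {lhs1:.6f}, (2) gives {lhs2:.4f}, gamma = {gamma}")
```

For this particular $q$, a first-order expansion suggests that the left-hand side of (2) exceeds $\gamma$ by about $\gamma u/4$ with the first weight and $\gamma u/8$ with the second, so the approximation error vanishes as $u \downarrow 0$.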

We propose to estimate $\gamma(x)$ by replacing, in the previous approximation, the conditional quantile function $q(\cdot|x)$ by a consistent estimator of this quantity. To this end, let $\mathbb{I}\{\cdot\}$ denote the indicator function and, for any $h > 0$, let $B(x,h) := \{x' \in E \mid d(x,x') \le h\}$ denote the closed ball in $E$ with center $x$ and radius $h$. The total number of covariates belonging to the ball $B(x,h)$ is given by

$$M(x,h) = \sum_{i=1}^n \mathbb{I}\{X_i \in B(x,h)\}.$$

The conditional distribution function $F(\cdot|x)$ is estimated by

$$\hat F_n(y|x,h_x) = \frac{1}{M(x,h_x)}\sum_{i=1}^n \mathbb{I}\{Y_i \le y\}\,\mathbb{I}\{X_i \in B(x,h_x)\},$$

where $h_x = h_x(n)$ is a positive sequence converging to 0. The associated estimator of the conditional quantile function $q(\cdot|x)$ is then, for $\alpha \in (0,1)$,

$$\hat q_n(\alpha|x,h_x) = \hat F_n^{\leftarrow}(1-\alpha|x,h_x) = \inf\{y \in \mathbb{R} \mid \hat F_n(y|x,h_x) \ge 1-\alpha\}.$$
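Both estimators are simple to compute: $\hat F_n(y|x,h_x)$ is the proportion of responses not exceeding $y$ among the $M(x,h_x)$ pairs whose covariate falls in $B(x,h_x)$, and $\hat q_n(\alpha|x,h_x)$ is the order statistic of rank $\lceil M(x,h_x)(1-\alpha)\rceil$ of these responses. Below is a minimal Python sketch, assuming $E=\mathbb{R}^d$ with the Euclidean distance (the text allows a general metric $d$); the function names `ball_responses`, `local_cdf` and `local_quantile` are hypothetical.

```python
import numpy as np

def ball_responses(x, X, Y, h):
    """Sorted responses Y_i whose covariate X_i lies in the closed ball B(x, h).

    X has shape (n,) or (n, d); the Euclidean distance is assumed for d(., .).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    dist = np.linalg.norm(X - np.asarray(x, dtype=float).reshape(1, -1), axis=1)
    Z = np.sort(Y[dist <= h])
    if Z.size == 0:
        raise ValueError("the ball B(x, h) contains no covariate")
    return Z

def local_cdf(y, x, X, Y, h):
    """hat F_n(y | x, h): share of responses <= y among the M(x, h) points in the ball."""
    Z = ball_responses(x, X, Y, h)
    return np.searchsorted(Z, y, side="right") / Z.size

def local_quantile(alpha, x, X, Y, h, tol=1e-9):
    """hat q_n(alpha | x, h) = inf{y : hat F_n(y | x, h) >= 1 - alpha}, alpha in (0, 1).

    hat F_n(y) = #{Z_j <= y} / M, so the infimum is the order statistic Z_(j) with
    j = ceil(M (1 - alpha)); tol absorbs floating-point error in M (1 - alpha).
    """
    Z = ball_responses(x, X, Y, h)
    j = int(np.ceil(Z.size * (1.0 - alpha) - tol))
    return Z[max(j, 1) - 1]
```

For example, with covariates $0.1, 0.2, 0.5, 0.9$, responses $3, 1, 2, 10$, $x = 0.2$ and $h = 0.15$, only the first two pairs fall in the ball, so $M(x,h) = 2$, $\hat F_n(2|x,h) = 0.5$ and $\hat q_n(0.25|x,h) = 3$.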

Replacing $q(\cdot|x)$ by $\hat q_n(\cdot|x,h_x)$ in (2), our class of estimators of $\gamma(x)$ is given, for a $(0,1)$-valued measurable function $u_x$ converging to 0 at infinity, by

$$\hat\gamma(x,u_x,h_x) = \int_0^{U_x}\Psi(\alpha|x,U_x)\log\frac{\hat q_n(\alpha|x,h_x)}{\hat q_n(U_x|x,h_x)}\,d\alpha, \tag{3}$$

in which $U_x = u_x(M(x,h_x))$ and $\Psi(\cdot|x,u)$ is an integrable function on $(0,u)$ satisfying (1). The estimator $\hat\gamma(x,u_x,h_x)$ is thus a weighted integral of an estimator of the conditional log-quantile function.

We conclude this section by pointing out that particular choices of the function $\Psi(\cdot|x,u)$ actually yield generalizations of some well-known tail index estimators to the conditional framework. Let $k_x := U_x M(x,h_x)$. The choice $\Psi(\cdot|x,u) = u^{-1}$ yields

$$\hat\gamma^H(x,u_x,h_x) = \frac{1}{k_x}\sum_{i=1}^{\lfloor k_x\rfloor}\log\frac{\hat q_n((i-1)/M(x,h_x)|x,h_x)}{\hat q_n(k_x/M(x,h_x)|x,h_x)}, \tag{4}$$

which is the straightforward adaptation of the classical Hill estimator (see Hill [24]). Similarly, letting $\Psi(\cdot|x,u) = u^{-1}(\log(u/\cdot) - 1)$ entails, after some algebra,

$$\hat\gamma^Z(x,u_x,h_x) = \frac{1}{k_x}\sum_{i=1}^{\lfloor k_x\rfloor} i\log\left(\frac{k_x}{i}\right)\log\frac{\hat q_n((i-1)/M(x,h_x)|x,h_x)}{\hat q_n(i/M(x,h_x)|x,h_x)}.$$
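Because $\hat q_n((i-1)/M(x,h_x)|x,h_x)$ is the $i$-th largest response in $B(x,h_x)$, and $\hat q_n(k_x/M(x,h_x)|x,h_x)$ is the $(\lfloor k_x\rfloor+1)$-th largest one, both (4) and the Zipf-type formula reduce to operations on the upper order statistics of the local sample. The Python sketch below implements them under the same Euclidean-distance assumption as before. The rule `u_of_m(m) = m ** -0.3` (so that $u_x \in RV_{-0.3}$), the function name `local_hill_zipf` and the simulated Pareto model with $\gamma(t) = 0.5 + 0.25t$ are hypothetical choices made for illustration only.

```python
import numpy as np

def local_hill_zipf(x, X, Y, h, u_of_m):
    """Local Hill estimator (4) and local Zipf-type estimator at the point x.

    u_of_m is a user-chosen map m -> u_x(m) in (0, 1); U_x = u_of_m(M) and
    k_x = U_x * M with M = M(x, h). Responses are assumed positive (Y in R_+).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    dist = np.linalg.norm(X - np.asarray(x, dtype=float).reshape(1, -1), axis=1)
    Z = np.sort(Y[dist <= h])[::-1]          # decreasing order: Z[0] is the largest
    M = Z.size
    if M < 2:
        raise ValueError("fewer than two responses in B(x, h)")
    k = u_of_m(M) * M                         # k_x = U_x M(x, h_x)
    kf = int(np.floor(k))
    if not 1 <= kf <= M - 1:
        raise ValueError("need 1 <= floor(k_x) <= M(x, h_x) - 1")
    # hat q_n((i-1)/M) = Z[i-1], hat q_n(i/M) = Z[i] and hat q_n(k_x/M) = Z[kf].
    logZ = np.log(Z[: kf + 1])
    i = np.arange(1, kf + 1)
    hill = np.sum(logZ[:kf] - logZ[kf]) / k
    zipf = np.sum(i * np.log(k / i) * (logZ[:kf] - logZ[1:])) / k
    return hill, zipf

# Hypothetical example: X uniform on [0, 1] and, given X = t, a Pareto response
# with P(Y > y | X = t) = y^(-1/gamma(t)), where gamma(t) = 0.5 + 0.25 t.
rng = np.random.default_rng(0)
n = 200_000
X = rng.uniform(0.0, 1.0, n)
gamma_fn = lambda t: 0.5 + 0.25 * t
Y = (1.0 - rng.uniform(0.0, 1.0, n)) ** (-gamma_fn(X))
print(local_hill_zipf(0.5, X, Y, h=0.05, u_of_m=lambda m: m ** -0.3), gamma_fn(0.5))
```

With these arbitrary tuning choices, about $M(x,h_x) \approx 20\,000$ pairs fall in the ball and roughly $k_x \approx M(x,h_x)^{0.7} \approx 1\,000$ upper order statistics are used.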

This estimator can be seen as a generalization of the Zipf estimator (see Kratz and Resnick [26], Schultze and Steinebach [29]).

3 Asymptotic properties

3.1 Main results

We start by stating the weak consistency of the estimator (3). To this end, an additional hypothesis is required.

(A1) The function $\Psi(\cdot|x,u)$ satisfies
$$\limsup_{u\downarrow 0}\int_0^u |\Psi(\alpha|x,u)|\,d\alpha < \infty,$$
and for all $u \in (0,1)$ and $\beta \in (0,u]$,
$$\frac{u}{\beta}\int_0^\beta \Psi(\alpha|x,u)\,d\alpha = \Phi(\beta/u|x),$$
where $\Phi(\cdot|x)$ is a square-integrable nonincreasing probability density function on $(0,1)$.
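Under (A1), the integral (3) can be written as a finite sum, because $\hat q_n(\cdot|x,h_x)$ is a step function: on $[(i-1)/M, i/M)$, with $M = M(x,h_x)$, it equals the $i$-th largest response in the ball. Using $\int_0^\beta \Psi(\alpha|x,U_x)\,d\alpha = (\beta/U_x)\Phi(\beta/U_x|x)$ and a summation by parts (our own rearrangement, not a formula stated in the text), one obtains
$$\hat\gamma(x,u_x,h_x) = \frac{1}{k_x}\sum_{i=1}^{\lfloor k_x\rfloor} i\,\Phi(i/k_x|x)\log\frac{\hat q_n((i-1)/M|x,h_x)}{\hat q_n(i/M|x,h_x)},$$
a weighted sum of log-spacings of the upper order statistics. For $\Psi(\cdot|x,u) = u^{-1}$, (A1) gives $\Phi \equiv 1$; for $\Psi(\cdot|x,u) = u^{-1}(\log(u/\cdot)-1)$, it gives $\Phi(t|x) = -\log t$, which reproduces the Zipf-type formula of Section 2. The Python sketch below implements this sum for a user-supplied $\Phi$ and, as a check, compares the case $\Phi \equiv 1$ with a direct evaluation of (4); the function names and the simulated Pareto sample are hypothetical.

```python
import numpy as np

def integrated_estimator(Z_desc, k, phi):
    """Estimator (3) as a weighted sum of log-spacings (summation by parts under (A1)).

    Z_desc: responses in B(x, h_x) sorted in decreasing order (Z_desc[0] = largest);
    k: k_x = U_x * M(x, h_x); phi: the function Phi(.|x) of condition (A1).
    """
    Z_desc = np.asarray(Z_desc, dtype=float)
    kf = int(np.floor(k))
    if not 1 <= kf <= Z_desc.size - 1:
        raise ValueError("need 1 <= floor(k) <= M - 1")
    i = np.arange(1, kf + 1)
    log_spacings = np.log(Z_desc[:kf]) - np.log(Z_desc[1 : kf + 1])
    return np.sum(i * phi(i / k) * log_spacings) / k

def hill_form_4(Z_desc, k):
    """Direct evaluation of (4); hat q_n(k/M) is the (floor(k)+1)-th largest response."""
    Z_desc = np.asarray(Z_desc, dtype=float)
    kf = int(np.floor(k))
    return np.sum(np.log(Z_desc[:kf] / Z_desc[kf])) / k

phi_hill = lambda t: np.ones_like(t)   # Phi for Psi(.|x, u) = 1/u
phi_zipf = lambda t: -np.log(t)        # Phi for Psi(.|x, u) = (log(u/.) - 1)/u

rng = np.random.default_rng(1)
Z_desc = np.sort((1.0 - rng.uniform(size=5000)) ** -0.5)[::-1]  # Pareto sample, gamma = 0.5
for k in (100.0, 137.4):
    print(k, integrated_estimator(Z_desc, k, phi_hill), hill_form_4(Z_desc, k),
          integrated_estimator(Z_desc, k, phi_zipf))
```

The identity also holds for non-integer $k_x$, since $\hat q_n(U_x|x,h_x)$ then coincides with $\hat q_n(\lfloor k_x\rfloor/M|x,h_x)$.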

Particular consequences of this condition include that $F(q(\alpha|x)|x) = 1-\alpha$ for any $\alpha \in (0,1)$ and that, given $X = x$, $Y$ has an absolutely continuous distribution with probability density function $f(\cdot|x)$. For $0 < \alpha_1 < \alpha_2 < 1$, we finally introduce the quantity

$$\omega(\alpha_1,\alpha_2,x,h_x) = \sup_{\alpha\in[\alpha_1,\alpha_2]}\ \sup_{x'\in B(x,h_x)}\left|\log\frac{q(\alpha|x')}{q(\alpha|x)}\right|,$$

which is the uniform oscillation of the log-quantile function in its second argument. Such a quantity is also studied in Gardes and Stupfler [19], for instance.
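To make $\omega$ concrete, consider a hypothetical pure-Pareto model on $E = [0,1]$ with $q(\alpha|t) = \alpha^{-\gamma(t)}$ and $\gamma(t) = 0.5 + 0.25t$. Then $|\log[q(\alpha|x')/q(\alpha|x)]| = |\gamma(x')-\gamma(x)|\,|\log\alpha|$, so on $[m^{-1-\delta}, 1-m^{-1-\delta}]$ one gets $\omega = (1+\delta)\log m \cdot \sup_{x'\in B(x,h_x)}|\gamma(x')-\gamma(x)| = 0.25\,h_x(1+\delta)\log m$. This illustrates why a Hölder-type condition on $\gamma$ together with $h_x^\beta\log m_x(h_x)\to 0$ makes (5) hold. The Python sketch evaluates $\omega$ by brute force on grids and compares it with the closed form; `omega_grid`, the value $m = 1000$ standing for $m_x(h_x)$, and the grid sizes are illustrative assumptions.

```python
import numpy as np

def omega_grid(log_q, x, h, alpha1, alpha2, n_alpha=1001, n_x=1001):
    """Brute-force approximation of omega(alpha1, alpha2, x, h) for E = R, d(x, x') = |x - x'|.

    log_q(alpha, t) must return log q(alpha | t) and broadcast over numpy arrays.
    """
    alphas = np.geomspace(alpha1, alpha2, n_alpha)[:, None]  # grid on [alpha1, alpha2]
    xs = np.linspace(x - h, x + h, n_x)[None, :]              # grid on the ball B(x, h)
    return np.max(np.abs(log_q(alphas, xs) - log_q(alphas, x)))

# Hypothetical pure-Pareto model: q(alpha | t) = alpha^(-gamma(t)), gamma(t) = 0.5 + 0.25 t.
gamma_fn = lambda t: 0.5 + 0.25 * t
log_q = lambda alpha, t: -gamma_fn(t) * np.log(alpha)

x, h, m, delta = 0.5, 0.05, 1000.0, 0.1
a1 = m ** (-1.0 - delta)   # alpha_1 = m^(-1-delta), as in condition (5)
print(omega_grid(log_q, x, h, a1, 1.0 - a1), 0.25 * h * (1.0 + delta) * np.log(m))
```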

Letting $m_x(h_x) = nP(X \in B(x,h_x))$ be the average number of covariates which belong to $B(x,h_x)$, the weak consistency of our family of estimators is established in the following theorem.

Theorem 1. Assume that conditions (M1) and (A1) are satisfied. Assume further that $m_x(h_x) \to \infty$ as $n \to \infty$ and that $u_x \in RV_{-a(x)}$ with $a(x) \in (0,1)$. If, for some $\delta > 0$,

$$\omega\left([m_x(h_x)]^{-1-\delta},\, 1-[m_x(h_x)]^{-1-\delta},\, x,\, h_x\right) \to 0, \tag{5}$$

then it holds that $\hat\gamma(x,u_x,h_x) \xrightarrow{P} \gamma(x)$ as $n \to \infty$.

Note that $u_x(m_x(h_x))\,m_x(h_x)$, which tends to infinity, is the average number of observations used to compute our estimator of $\gamma(x)$. The conditions in Theorem 1 are thus analogues of the classical hypotheses in the estimation of the tail index. Besides, condition (5) ensures that the distribution of $Y$ given $X = x'$ is close enough to that of $Y$ given $X = x$ when $x'$ is in a sufficiently small neighborhood of $x$.

Our aim is now to establish an asymptotic normality result. First, recall that under (M1), the conditional quantile function may be written as follows:

$$\forall t > 1,\quad q(t^{-1}|x) = c(t|x)\exp\left(\int_1^t \frac{\gamma(x) + \Delta(v|x)}{v}\,dv\right),$$

where $c(\cdot|x)$ is a positive function converging to a positive constant at infinity and $\Delta(\cdot|x)$ is a measurable function converging to 0 at infinity, see Bingham et al. [5, Theorem 1.3.1]. We introduce the following classical second-order condition:

(M2) Condition (M1) holds, $c(\cdot|x)$ is a constant function equal to $c(x) > 0$, the function $\Delta(\cdot|x)$ has ultimately constant sign at infinity and $|\Delta(\cdot|x)| \in RV_{\rho(x)}$, with $\rho(x) < 0$.
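For a concrete instance of (M2), consider the hypothetical Hall-type quantile function $q(\alpha|x) = C\alpha^{-\gamma}(1+D\alpha^{-\rho})$ with $C > 0$, $\gamma > 0$, $\rho < 0$ and $D > 0$ (the dependence on $x$ is suppressed). Writing $\log q(t^{-1}|x) = \log q(1|x) + \int_1^t (\gamma + \Delta(v))/v\,dv$ and differentiating with respect to $\log t$ gives a constant $c(\cdot|x) = C(1+D)$ and $\Delta(t) = D\rho t^{\rho}/(1+Dt^{\rho})$, which is negative and satisfies $|\Delta| \in RV_\rho$. The Python sketch checks this expression for $\Delta$ by central finite differences in $\log t$ and checks the regular variation numerically; the parameter values are arbitrary.

```python
import numpy as np

# Hypothetical parameter values for the illustration.
C, D, gamma, rho = 1.0, 0.5, 0.5, -1.0

def log_q_inv(t):
    """log q(1/t) for the Hall-type quantile q(alpha) = C alpha^-gamma (1 + D alpha^-rho)."""
    return np.log(C) + gamma * np.log(t) + np.log1p(D * t ** rho)

def delta_closed_form(t):
    """Delta(t) = D rho t^rho / (1 + D t^rho)."""
    return D * rho * t ** rho / (1.0 + D * t ** rho)

# Delta(t) = d log q(1/t) / d log t - gamma, via a central difference in log t.
t = np.geomspace(2.0, 1e6, 7)
eps = 1e-5
num = (log_q_inv(t * np.exp(eps)) - log_q_inv(t * np.exp(-eps))) / (2.0 * eps) - gamma
print(np.max(np.abs(num - delta_closed_form(t))))

# Regular variation of |Delta| with index rho: the ratio tends to lam ** rho.
lam = 3.0
print(delta_closed_form(lam * 1e8) / delta_closed_form(1e8), lam ** rho)
```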

In condition (M2), $\rho(x)$ is called the conditional second-order parameter of the distribution. This condition is commonly used when studying tail index estimators and makes it possible to control the asymptotic bias of the estimator $\hat\gamma(x,u_x,h_x)$. We also introduce a further assumption on the weighting function $\Phi(\cdot|x)$, which is similar in spirit to a condition introduced in Beirlant et al. [1]. To write down this condition, we note that if (A1) holds then

$$\forall \beta \in (0,1),\quad 0 \le \beta\Phi(\beta|x) \le \int_0^{\beta/2}|\Psi(\alpha|x,1/2)|\,d\alpha$$

and the right-hand side converges to 0 as $\beta \downarrow 0$, so that we may extend the definition of the map $t \mapsto t\Phi(t|x)$ by saying it is 0 at $t = 0$.

(A2) Condition (A1) holds, there is $\kappa > 0$ such that $\Phi^{2+\kappa}(\cdot|x)$ is integrable on $(0,1)$ and there exists a positive function $g(\cdot|x)$, which is either continuous on $[0,1]$ or nonincreasing on $(0,1)$, such that for any $k > 1$ and $i \in [1,k)$,
$$|i\Phi(i/k|x) - (i-1)\Phi((i-1)/k|x)| \le g(i/k|x),$$
where the function $g(\cdot|x)\max(\log(1/\cdot), 1)$ is integrable on $(0,1)$.
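The inequality in (A2) is easy to spot-check numerically for a given $\Phi$. For the two weights of Section 2, (A1) gives $\Phi(\cdot|x) \equiv 1$ when $\Psi(\cdot|x,u) = u^{-1}$ and $\Phi(t|x) = -\log t$ when $\Psi(\cdot|x,u) = u^{-1}(\log(u/\cdot)-1)$. The Python sketch below checks the inequality with $g = 1$ and $g = 1 - \log(\cdot)$ respectively, over integers $i \in [1,k)$ and an arbitrary finite set of values of $k$, using the convention that $t\Phi(t|x)$ is 0 at $t = 0$. It is a finite spot check, not a proof; `t_phi` and `check_A2` are hypothetical helper names.

```python
import numpy as np

def t_phi(t, phi):
    """Map t -> t * Phi(t), extended by 0 at t = 0 as in the text."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * phi(t[pos])
    return out

def check_A2(phi, g, ks, tol=1e-12):
    """Check |i Phi(i/k) - (i-1) Phi((i-1)/k)| <= g(i/k) for all integers 1 <= i < k."""
    for k in ks:
        i = np.arange(1, int(np.ceil(k)))       # integers i in [1, k)
        # i Phi(i/k) = k * (i/k) Phi(i/k), hence the factor k below.
        lhs = np.abs(k * (t_phi(i / k, phi) - t_phi((i - 1) / k, phi)))
        if np.any(lhs > g(i / k) + tol):
            return False
    return True

ks = list(np.arange(2.0, 400.0, 1.0)) + [2.5, 10.3, 77.7]
print(check_A2(lambda t: np.ones_like(t), lambda t: np.ones_like(t), ks))  # Phi = 1, g = 1
print(check_A2(lambda t: -np.log(t), lambda t: 1.0 - np.log(t), ks))       # Phi = -log, g = 1 - log
```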

Note that condition (A2) is satisfied, for instance, by the functions $\Psi(\cdot|x,u) = u^{-1}$ and $\Psi(\cdot|x,u) = u^{-1}(\log(u/\cdot) - 1)$ mentioned at the end of Section 2, with $g(\cdot|x) = 1$ for the first one and $g(\cdot|x) = -\log(\cdot) + 1$ for the second one. Our asymptotic normality result is the following:

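Theorem 2. Assume that conditions (M2) and (A2) are satisfied. Assume further that $m_x(h_x) \to \infty$ as $n \to \infty$, that $u_x \in RV_{-a(x)}$ with $a(x) \in (0,1)$ and that $(zu_x(z))^{1/2}\Delta(1/u_x(z)|x) \to \lambda(x) \in \mathbb{R}$ as $z \to \infty$. If, for some $\delta > 0$,

$$v_x^{1/2}\,\omega\left([m_x(h_x)]^{-1-\delta},\, 1-[m_x(h_x)]^{-1-\delta},\, x,\, h_x\right) \to 0, \tag{6}$$

where $v_x = m_x(h_x)u_x(m_x(h_x))$, then it holds that

$$v_x^{1/2}\left(\hat\gamma(x,u_x,h_x) - \gamma(x)\right) \xrightarrow{d} \mathcal{N}\left(\lambda(x)AB_x(\Phi,\rho(x)),\ \gamma^2(x)AV_x(\Phi)\right)$$

as $n \to \infty$, with

$$AB_x(\Phi,\rho(x)) = \int_0^1 \Phi(\alpha|x)\alpha^{-\rho(x)}\,d\alpha \quad\text{and}\quad AV_x(\Phi) = \int_0^1 \Phi^2(\alpha|x)\,d\alpha.$$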
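The constants in Theorem 2 have closed forms for the two weights of Section 2: $\Phi \equiv 1$ gives $AB_x = 1/(1-\rho(x))$ and $AV_x = 1$, the classical Hill constants, while $\Phi(t|x) = -\log t$ gives $AB_x = 1/(1-\rho(x))^2$ and $AV_x = 2$. The Zipf-type weight therefore trades a smaller bias factor for twice the asymptotic variance. The Python sketch below recovers these values by numerical integration after the substitution $\alpha = e^{-s}$; the helper names, grid and cut-off are arbitrary choices.

```python
import numpy as np

def integral_0_1(f, s_max=60.0, n=600_001):
    """int_0^1 f(alpha) d(alpha) through alpha = exp(-s), trapezoidal rule on [0, s_max]."""
    s = np.linspace(0.0, s_max, n)
    y = f(np.exp(-s)) * np.exp(-s)
    ds = s[1] - s[0]
    return ds * (y.sum() - 0.5 * (y[0] + y[-1]))

def AB(phi, rho):
    """AB_x(Phi, rho) = int_0^1 Phi(alpha) alpha^(-rho) d(alpha)."""
    return integral_0_1(lambda a: phi(a) * a ** (-rho))

def AV(phi):
    """AV_x(Phi) = int_0^1 Phi(alpha)^2 d(alpha)."""
    return integral_0_1(lambda a: phi(a) ** 2)

phi_hill = lambda a: np.ones_like(a)
phi_zipf = lambda a: -np.log(a)

rho = -1.0
print(AB(phi_hill, rho), 1 / (1 - rho), AV(phi_hill))
print(AB(phi_zipf, rho), 1 / (1 - rho) ** 2, AV(phi_zipf))
```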

Our asymptotic normality result thus holds under generalizations of the common hypotheses on the model and on $u_x$ and $h_x$, provided the conditional distributions of $Y$ at two neighboring points are sufficiently close.

We conclude this paragraph by noting that these results are similar in spirit to results obtained in the literature for other conditional tail index or conditional extreme-value index estimators, see e.g. Gardes and Stupfler [19] and Stupfler [31]. The main disadvantage of formulating the hypotheses in terms of the uniform oscillation $\omega$ is that they cannot immediately be translated in terms of conditions on $u_x$ and $h_x$. In our next paragraph, we give alternative, simple conditions for our main results to hold.

3.2 Discussion of the hypotheses

As a starting point, we note that if $X$ has a probability density function $f$ with respect to the Lebesgue measure on $E = \mathbb{R}^d$ equipped with the Euclidean norm $\|\cdot\|$, then sufficient conditions for $m_x(h_x) \to \infty$ are that $h_x \to 0$, $nh_x^d \to \infty$, $f(x) > 0$ and $f$ is continuous at $x$. Indeed, in this case, if $V$ denotes the volume of the unit ball of $\mathbb{R}^d$, a change of variables entails

$$m_x(h_x) = n\int_{B(x,h_x)} f(s)\,ds = nh_x^d f(x)\left(V + \int_{\|v\|\le 1}\left(\frac{f(x+h_x v)}{f(x)} - 1\right)dv\right).$$

Since $f$ is continuous at $x$, we get $m_x(h_x) = nh_x^d V f(x)(1+o(1)) \to \infty$. Furthermore, we point out that if the functions $\gamma$, $\log c(t|\cdot)$ and $\Delta(t|\cdot)$ satisfy a Hölder condition, namely

$$\sup_{x'\in B(x,h_x)}|\gamma(x') - \gamma(x)| = O(h_x^\beta),\qquad \sup_{t^{-1}\in K_{x,\delta}(h_x)}\ \sup_{x'\in B(x,h_x)}|\log c(t|x') - \log c(t|x)| = O(h_x^\beta)$$

and

$$\sup_{t^{-1}\in K_{x,\delta}(h_x)}\ \sup_{x'\in B(x,h_x)}|\Delta(t|x') - \Delta(t|x)| = O(h_x^\beta),$$

where $\beta > 0$ and $K_{x,\delta}(h_x)$ is the interval $[(m_x(h_x))^{-1-\delta}, 1-(m_x(h_x))^{-1-\delta}]$, then (5) is a consequence of the convergence $h_x^\beta\log m_x(h_x) \to 0$. In the aforementioned context when $X$ has a probability density function, this condition becomes $h_x^\beta\log n \to 0$ as $n \to \infty$. Such conditions were already considered in Stupfler [31].
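To illustrate the approximation $m_x(h_x) = nh_x^dVf(x)(1+o(1))$, the Python sketch below takes a hypothetical design with $X$ uniform on $[0,1]^2$, so that $f \equiv 1$ around the center $x = (0.5, 0.5)$ and the $o(1)$ term vanishes once the ball lies inside the square. It computes the unit-ball volume $V = \pi^{d/2}/\Gamma(d/2+1)$ and compares the observed count $M(x,h_x)$ with $m_x(h_x) = nh_x^dV$. The sample size, bandwidth, seed and the helper name `unit_ball_volume` are arbitrary.

```python
import math
import numpy as np

def unit_ball_volume(d):
    """Volume V of the Euclidean unit ball of R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

rng = np.random.default_rng(2)
n, d, h = 200_000, 2, 0.05
X = rng.uniform(size=(n, d))       # hypothetical design: density f = 1 on [0, 1]^2
x = np.full(d, 0.5)
M = int(np.sum(np.linalg.norm(X - x, axis=1) <= h))   # observed count M(x, h_x)
m = n * h ** d * unit_ball_volume(d) * 1.0            # m_x(h_x) = n h^d V f(x), f(x) = 1
print(M, m)
```

Here $m_x(h_x) = 200\,000 \times 0.05^2 \times \pi \approx 1571$, and $M(x,h_x)$ is a binomial draw with this mean.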

As an illustration, we now compute the optimal rate of convergence of our estimator when $E = \mathbb{R}^d$ and $X$ has a probability density function. Let $a(x) \in (0,1)$ and $b(x) \in (0,1/d)$. We take $\log(h_x) = -b(x)\log(n)$ and $\log(nu_x(n)) = (1-a(x))\log(n)$. In this context, the rate of convergence of the estimator is essentially $(m_x(h_x)u_x(m_x(h_x)))^{1/2} = n^{(1-db(x))(1-a(x))/2}$. Besides, since $|\Delta(\cdot|x)|$ is regularly varying with index $\rho(x) < 0$, the conditions for Theorem 2 to hold are then essentially

$$1 - a(x) + 2a(x)\rho(x) \le 0 \quad\text{and}\quad (1-db(x))(1-a(x)) - 2\beta b(x) \le 0.$$

The problem thus amounts to maximizing the function $(a,b) \mapsto (1-db)(1-a)$ under these conditions. The solution is

$$(a^*(x), b^*(x)) = \left(\frac{1}{1-2\rho(x)},\ \frac{\rho(x)}{d\rho(x) + \beta(2\rho(x)-1)}\right),$$

which yields the optimal rate of convergence $n^{\beta\rho(x)/(d\rho(x)+\beta(2\rho(x)-1))}$. Note that setting $d = 0$, i.e. considering the case when there is no covariate, we recover the optimal rate of convergence of the Hill estimator, see e.g. de Haan and Ferreira [22].
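The optimization can be checked numerically. The Python sketch below encodes the two conditions as $1-a+2a\rho \le 0$ (bias) and $(1-db)(1-a)-2\beta b \le 0$ (the requirement $v_x^{1/2}h_x^\beta \to 0$ with $v_x = n^{(1-db)(1-a)}$, ignoring the logarithmic factor), which is the form for which the stated $(a^*, b^*)$ is the maximizer. It compares the closed-form exponent with a brute-force grid search and recovers the Hill exponent $\rho/(2\rho-1)$ when $d = 0$. The function names and parameter values are hypothetical.

```python
import numpy as np

def optimal_exponents(rho, beta, d):
    """(a*, b*) and the exponent e* of the optimal rate n^e*."""
    denom = d * rho + beta * (2.0 * rho - 1.0)
    a_star = 1.0 / (1.0 - 2.0 * rho)
    b_star = rho / denom
    e_star = beta * rho / denom
    return a_star, b_star, e_star

def brute_force(rho, beta, d, n_grid=1001):
    """Maximize (1 - d b)(1 - a)/2 over a grid of (a, b) in (0, 1) x (0, 1/d) under both conditions."""
    a = np.linspace(0.0, 1.0, n_grid)[1:-1, None]
    b = np.linspace(0.0, 1.0 / d, n_grid)[None, 1:-1]
    val = (1.0 - d * b) * (1.0 - a)
    ok = (1.0 - a + 2.0 * a * rho <= 0.0) & (val - 2.0 * beta * b <= 0.0)
    return np.max(np.where(ok, val / 2.0, -np.inf))

rho, beta, d = -1.0, 1.0, 2
print(optimal_exponents(rho, beta, d), brute_force(rho, beta, d))
print(optimal_exponents(rho, beta, 0)[2], rho / (2.0 * rho - 1.0))  # d = 0: Hill exponent
```

For $\rho = -1$, $\beta = 1$ and $d = 2$, the closed form gives $a^* = 1/3$, $b^* = 0.2$ and the rate $n^{0.2}$, and the grid search approaches this value from below.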

4 Simulation study

We examine the behavior of our estimator in several finite-sample situations. To make it easier to showcase our results, we focus on the case $E = [0,1]$