• Aucun résultat trouvé

Estimation of the extreme value index in a censorship framework: asymptotic and finite sample behaviour

N/A
N/A
Protected

Academic year: 2022

Partager "Estimation of the extreme value index in a censorship framework: asymptotic and finite sample behaviour"

Copied!
27
0
0

Texte intégral

(1)

HAL Id: hal-01768990

https://hal.archives-ouvertes.fr/hal-01768990

Submitted on 17 Apr 2018

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Estimation of the extreme value index in a censorship framework: asymptotic and finite sample behaviour

Jan Beirlant, Julien Worms, Rym Worms

To cite this version:

Jan Beirlant, Julien Worms, Rym Worms. Estimation of the extreme value index in a censorship

framework: asymptotic and finite sample behaviour. Journal of Statistical Planning and Inference,

Elsevier, 2019, 202, pp.31-56. �hal-01768990�

(2)

Estimation of the extreme value index in a censorship framework: asymptotic and finite sample behaviour

Jan Beirlant(1) , Julien Worms

1

(2) & Rym Worms (3)

(1) KU Leuven

Department of Mathematics and Leuven Statistics Research Center, KU Leuven, Belgium, Department of Mathematical Statistics and Actuarial Science, University of the Free State,

South Africa

e-mail : [email protected]

(2) Universit´ e de Versailles-Saint-Quentin-En-Yvelines Laboratoire de Math´ ematiques de Versailles (CNRS UMR 8100),

F-78035 Versailles Cedex, France, e-mail : [email protected]

(3) Universit´ e Paris-Est

Laboratoire d’Analyse et de Math´ ematiques Appliqu´ ees (CNRS UMR 8050),

UPEMLV, UPEC, F-94010, Cr´ eteil, France, e-mail : [email protected]

Abstract

We revisit the estimation of the extreme value index for randomly censored data from a heavy tailed dis- tribution. We introduce a new class of estimators which encompasses earlier proposals given in Worms and Worms (2014) and Beirlant et al. (2018), which were shown to have good bias properties compared with the pseudo maximum likelihood estimator proposed in Beirlant et al. (2007) and Einmahl et al. (2008).

However the asymptotic normality of the type of estimators first proposed in Worms and Worms (2014) was still lacking. We derive an asymptotic representation and the asymptotic normality of the larger class of es- timators and consider their finite sample behaviour. Special attention is paid to the case of heavy censoring, i.e. where the amount of censoring in the tail is at least 50%. We obtain the asymptotic normality with a classical ?

k rate wherek denotes the number of top data used in the estimation, depending on the degree of censoring.

AMS Classification. Primary 62G32 ; Secondary 62N02

Keywords and phrases. Extreme value index. Tail inference. Random censoring. Asymptotic representation.

1Corresponding author J. Worms

1

(3)

1. Introduction

Starting from Beirlant et al. (2007), the estimation of the extreme value index in a censorship framework is of growing interest. Suppose we observe a sample ofnindependent couplespZi, δiq1ďiďn where

Zi“minpXi, Ciq and δi “IXiďCi.

The i.i.d. samplespXiqiďnandpCiqiďn, of respective continuous distribution functionsF andG, are samples from the variable of interest X and of the censoring variable C, measured onnindividual items (insurance claims, hospitalized patients, ...). The variablesXandCare supposed to be independent and, for convenience only, we will suppose in this work that they are non-negative. We will denote by Z1,n ď . . . ď Zi,n ď . . .ďZn,n the order statistics associated to the observed sample, and by pδ1,n, . . . , δn,nqthe corresponding indicators of non-censorship.

Einmahl et al. (2008) presented a general method for adapting estimators of the extreme value index in this censorship framework. Worms and Worms (2014) proposed a more survival analysis-oriented approach restricted to the heavy tail case, while Diop et al. (2014) extended the framework to data with covariate information. Beirlant et al. (2016) and Beirlant et al. (2018) proposed bias-reduced versions of two existing estimators. See also Brahimi et al. (2015), Brahimi et al. (2016) and Brahimi et al. (2018) for other papers on the subject.

In this paper, we propose a new class of estimators that encompasses one of the estimators proposed in Worms and Worms (2014) and propose a novel approach to prove the asymptotic normality of these estimators which was unknown up to now for the caseβ“0. We consider here that the distributionsF and Gare heavy-tailed, with positive and respective extreme value indices (EVI)γ1andγ2, i.e.

F¯pxq “1´Fpxq “x´1{γ1lFpxq and Gpyq “¯ 1´Gpyq “y´1{γ2lGpyq,

where lF and lG are slowly varying at infinity. Our target is the EVI γ1, which we try to recover from our randomly censored observations.

Denoting the distribution function ofZ withH, by independence ofX andC we readily obtain ¯Hpzq “ 1´Hpzq “ z´1{γlHpzq, where lH “ lFlG and the EVI γ of Z is related to those of X and C via the important relation 1{γ “ 1{γ1`1{γ2. Further in this paper, we will denote by p the crucial quantity p“γ{γ1 “γ2{pγ12q Ps0,1r, which has to be interpreted as the asymptotic proportion of non-censored observations in the tail.

We assume in this work that the slowly varying functions lF andlG satisfy the second order condition first proposed by Hall and Welsh (1985). This yields the so called ”Hall-type” model, i.e. asx, yÑ `8,

F¯pxq “C1x´1{γ1`

1`D1x´β1p1`op1qq˘

(1) Gpyq “¯ C2y´1{γ2`

1`D2y´β2p1`op1qq˘

(2) whereβ1, β2,C1,C2 are positive constants andD1,D2are real constants. Then, setting

C“C1C2, β˚“minpβ1, β2q, and D˚

$

&

%

D1 if β1ăβ2, D2 if β2ăβ1, D1`D2 if β1“β2, we have, aszÑ 8,

Hpzq “¯ Cz´1{γ`

1`D˚z´β˚p1`op1qq˘

. (3)

Correspondingly, withH´puq “inftz:Hpzq ěuu(0ăuă1) the quantile function corresponding toH, we considerUHpxq “H´p1´1{xq, the right-tail function ofH, for which asxÑ 8,

UHpxq “Cγxγ`

1`γD˚C´β˚γx´β˚γp1`op1qq˘

. (4)

Let us now explain how we build our new family of estimators ofγ1. For some real numberβ, consider the Box-Cox transformk´βpuq “şu

1t´β´1dtforuą1, with the caseβ “0 leading tok0puq “logpuq. Based on the relation

tÑ8limErk´βpX{tq |Xąts “ lim

tÑ8

ż8 1

Fputqs

Fsptq dk´βpuq “ γ1 1`βγ1

, (5)

(4)

and estimatingFs by the Kaplan-Meier estimatorFsnKM defined fortăZn,n by FsnKMptq “ ź

Zi,nďt

ˆ n´i n´i`1

˙δi,n

, (6)

we introduce the following class of statistics Tpkpβq:“

k

ÿ

j“2

FsnKMpZn´j`1,nq FsnKMpZn´k,nq

ˆ k´β

ˆZn´j`1,n Zn´k,n

˙

´k´β

ˆZn´j,n Zn´k,n

˙˙

(7) wherek“kndenotes an integer sequence satisfyingknÑ 8andkn “opnq. Withβ“0 we thus obtain the estimator

pW1,kq:“Tpkp0q “

k

ÿ

j“2

FsnKMpZn´j`1,nq FsnKMpZn´k,nq log

ˆZn´j`1,n

Zn´j,n

˙

(8) of γ1 which was considered in Worms and Worms (2014) and Beirlant et al. (2018). In fact pγ1,kpWq turns out to be very close to the estimatorřk

j“1

FsnKMpZn´j`1,n´ q

FsnKMpZn´k,nq logZZn´j`1,n

n´j,n defined in equation (12) of Worms and Worms (2014) based on ideas issued from the so-called Leurgans approach in survival regression analysis.

The difference concerns a different way to circumvent the use ofFsnKM at Zn,n: whether using left-limits or deletingFsnKMpZn,nqas inpγpW1,kq.

Note that the statisticsTpkpβqwere used in Beirlant et al. (2018) to obtain a bias-reduced version of the estimatorpγ1,kpWq:

1,kpBRq“pγpW1,kq´

p1`β11,kpWqq2p1`2β11,kpWqq pβ11,kpWqq2

˜

Tpk1q ´ pγ1,kpWq 1`β11,kpWq

¸

, (9)

whereβ1denotes the second order parameter ofF in assumption (1).

Now, it is clear from (5) that we can construct the following estimator ofγ1 when the tuning parameter β is supposed to be larger than ´1{γ1:

γp1,kpβq “ Tpkpβq

1´βTpkpβq. (10)

We will compare these estimators with the pseudo maximum likelihood estimator which was first proposed in the random censoring context by Beirlant et al. (2007) and Einmahl et al. (2008):

1,kpHq“ 1 ppn

1 k

k

ÿ

i“1

logZn´i`1,n Zn´k,n

where ppn“ 1 k

k

ÿ

i“1

δn´i`1,n. (11)

In Beirlant et al. (2018) a small sample simulation study was performed using all those available estimators and it was found thatpγ1,kpWqoverall shows quite good bias and MSE performance. However, since no results on the asymptotic normality of this estimator were available yet, these authors proposed the use of a bootstrap algorithm to construct confidence intervals. In this paper we prove the asymptotic normality of pγ1,kpβq in the case p`βγ ą 12. Hence this paper provides the first complete proof of the asymptotic normality forpγ1,kpWq in case pą 12, issued from an explicit asymptotic development stated in Theorem 1 of the next section. In the deterministic threshold case, this central limit result (for pγ1,kpWq) had already been obtained in Worms and Worms (2018), where a more general competing risks setting was considered, and using a different approach from the present proof.

The restriction pą 12 is rather restrictive for instance in insurance problems such as those discussed in Beirlant et al. (2018) where heavy censoring appears. The introduction of the class of estimators pγ1,kpβq helps to circumvent this problem when consideringβ ą0.

Finally, in the next section, we will see that our results also lead to the statement of the asymptotic normality of the bias-reduced estimatorpγpBRq1,k , which was not known so far.

Our paper is organized as follows: in Section 2, we state and discuss the asymptotic normality result 3

(5)

for pγ1,kpβq and pγ1,kpBRq. Section 3 is devoted to the proof. Technical aspects of the proof are postponed to the Appendix. In Section 4 we discuss the finite sample behavior of the different estimators pγ1,kpβq with βą ´1{γ1, and ofpγ1,kpBRq.

2. Results

Our first and main result states the asymptotic behavior of the statisticsTpkpβqdefined in (7). This result entails the asymptotic normality of the estimatorpγ1,kpWq ofγ1 by considering the particular caseβ“0. The main condition is that the heaviness of the tail of the censoring variableC should be sufficiently high with respect to the one of the variableX. More precisely, introducing the notationpβ“p`γβ“pp1`γ1βq, the condition is be thatpβ must be larger than 1{2 (i.e. γ2ąγ1{p1`2γ1βq).

Theorem 1. Let conditionsp1qandp2qhold. We assume further thatpβą12, and

?

kpk{nqγβ˚ nÑ8ÝÑ λ, (12)

and, ifλ“0, that n“OpkBnqfor some large enoughBą0. We then have, asnÑ 8,

?k ˆ

Tpkpβq ´ γ1 1`γ1β

˙

“Gn`λmβ`oPp1q where Gnd γ pβ

?1 k

k

ÿ

i“2

upi,kβ´1pppEi´1q ´ pIUiďp´pqq with pEiq and pUiq denoting independent iid samples with, respectively, standard exponential and standard uniform distributions, and

mβ

"

´γ2β1D1C´γβ1p´1β ppβ`γβ1q´1 ifβ1ďβ2,

0 ifβ1ąβ2.

Therefore, asnÑ 8,

? k

ˆ

Tpkpβq ´ γ1

1`γ1β

˙

ÝÑd Npλmβ, σ2βq where σ2β“ γ2 p2β

p

2pβ´1 “γ12 p 2p´1

p2 p2β

2p´1 2pβ´1.

Since Gn is a sum of independent random variables, it is then easy, using Lyapunov’s CLT and the delta-method, to derive the following asymptotic normality result for the family of estimatorspγ1,kpβqof γ1 defined by (10).

Corollary 1. Under the conditions of Theorem 1, asnÑ 8,

?

kppγ1,kpβq ´γ1qÝÑd Npλmγ1, σ2γ

1q where

σγ2

1 “ γ2 p2β

p

2pβ´1p1`βγ1q4“γ21 p

2p´1p1`βγ1q2 2p´1 2pβ´1 and

mγ1

"

´γ2β1D1C´γβ1p´1β ppβ`γβ1q´1p1`βγ1q2 if β1ďβ2,

0 if β1ąβ2.

Remark 1. Since pγ1,kpWq “ Tpkp0q “ pγ1,kp0q, taking β “ 0 in Theorem 1 or in Corollary 1 entails the asymptotic normality for pγ1,kpWq when pą 1{2, i.e. when γ2 ąγ1. When β ą0, the asymptotic normality forpγ1,kpβqholds under the weaker assumption pβą1{2, i.e. γ2ąγ1{p1`2γ1βq, and therefore allowing for stronger censoring in the tail. On the other hand the restriction becomes worse for negativeβ.

Whenβ1ďβ2 the absolute value of the asymptotic bias of pγ1,kpβqis increasing inβ. For a bias comparison for the caseβ1ąβ2one needs third order assumptions. On the other hand the asymptotic variance ofpγ1,kpβq is decreasing in β as long as pβ ă1 and is increasing as pβ ą1. It is difficult to say anything in general about the comparison of the asymptotic mean-squared error of pγ1,kpβq with respect toγp1,kpWq. It is of course, whenβ ą0 andpgets close to the value1{2, in favor ofpγ1,kpβq, at least from a theoretical point of view.

(6)

Remark 2. From Einmahl et al. (2008) it follows that the asymptotic variance of pγ1,kpHq is given by 1kγp12, which, for all1{2ăpă1is lower than the asymptotic variance 1k2p´121 of pγ1,kpWq.

On the other hand, in case β1 ďβ2 it follows from Beirlant et al. (2016) that the absolute value of the asymptotic bias of pγ1,kpHq equals pk{nqγβ˚|mγ1,0|1`γ1`γβ1β1

1 , which is larger than pk{nqγβ˚|mγ1,0| stated in the above theorem.

Remark 3. The asymptotic distribution of pγ1,kpWq in casepď 12, and in general of pγ1,kpβqin case pβ ď 12, is not known. The authors conjecture that asymptotic normality still holds, however with a slower rate than k´1{2, presumablyk´p whenpă1{2, but the method of proof outlined below could not be carried through in that case.

Combining the asymptotic developments of γp1,kpWqand Tpkpβq forβ “β1, which are both weighted sums of the same i.i.d. random variablesppEi´1q ´ pIUiďp´pq, and relying on the two-dimensional Lyapunov’s CLT and the delta-method, it is now possible to deduce the following asymptotic normality result for the bias-reduced version ofpγ1,kpWq introduced in Beirlant et al. (2018). The proof is omitted for brevity.

Corollary 2. Under the conditions of Theorem 1 and assuming that pą1{2, asnÑ 8, we have

?

kppγpBRq1,k ´γ1qÝÑd Np0, σpBRq2 q where, withδ“pβ1´p“γβ1,

σpBRq2 :“γ12 p 2p´1

pp`δq2ppp`δq2` p1´pq2`δ`δ2q δ2p2p´1`δqp2p´1`2δq .

Remark 4. While the asymptotic bias ofpγ1,kpBRqis always0, its asymptotic variance is in general larger than those of the competing estimators.

3. Proof of Theorem 1

Let us introduce the following important notations with 1ďi, jďk:

ξj“jlogZn´j`1,n

Zn´j,n

and ui,k“ i

k`1, (13)

as well as the ratios

RFyj “FsnKMpZn´j`1,nq

FsnKMpZn´k,nq and RFj“ FpZs n´j`1,nq

FpZs n´k,nq . (14)

If we also defineξj,k,β“ξj ifβ“0 and otherwise ξj,k,β “ j

ˆ k´β

ˆZn´j`1,n

Zn´k,n

˙

´k´β

ˆZn´j,n

Zn´k,n

˙˙

“ j β

˜ˆ Zn´j,n

Zn´k,n

˙´β

´

ˆZn´j`1,n

Zn´k,n

˙´β¸

then, fromp7q, we have

Tpkpβq:“

k

ÿ

j“2

FsnKMpZn´j`1,nq FsnKMpZn´k,nq

ξj,k,β

j where, using a Taylor expansion (of order 2) ,

ξj,k,β “ j β

ˆ

exp´βlog

´Zn´j,n

Zn´k,n

¯

´exp´βlog

´Zn´j`1,n

Zn´k,n

¯˙

“ ξj

ˆZn´j`1,n Zn´k,n

˙´β

`βξ2j 2j

˜ Z˜j,n Zn´k,n

¸´β

, (15) for some variables ˜Zj,n satisfyingZn´j,nďZ˜j,nďZn´j`1,n.

The overall objective is to appropriately use the relation between the variablesξjand standard exponential order statisticsEjpnqdefined below, as well as between the ratiosRFjandpZn´j`1,n{Zn´k,nq´β and uniform order statisticsVj,k(with meanuj,k) also defined below, in order to prove Theorem 1. Indeed, letpYiqdenote i.i.d. standard Pareto rv’s defined byZi“UHpYiq, and let

k´j`1,k“Yn´j`1,n{Yn´k,n, Vj,k“1{Y˜k´j`1,k, and Ejpnq“jlogpYn´j`1,n{Yn´j,nq, 1ďj ďk. (16) 5

(7)

It is then known that pV1,k, . . . , Vj,k, . . . , Vk,kq follows the distribution of the vector of order statistics of a standard uniform random sample of size k, and that the variables pE1pnq, . . . , Ekpnqq are jointly equal in distribution to a sample of sizek of independent standard exponential rv’s.

Beirlant et. al. (2002) showed that the rv’sξj andEjpnqare related as follows:

ξj “ξj1`Rn,j, where we defineξ1j“ pγ`uγβj,k˚bn,kqEjpnq, (17) wherebn,k is asymptotically equivalent to´γ2β˚D˚C´γβ˚´

k`1 n`1

¯γβ˚

, ask, nÑ 8andk{nÑ0. Properties of the remainder termRn,j will be detailed in Subsection 3.1 . Equationp17qthus implies that

ξj,k,β “ξj,k,β1 `Rn,j,β, (18)

where

ξj,k,β1 “ ξj1

ˆZn´j`1,n

Zn´k,n

˙´β

(19)

Rn,j,β “ Rn,j

ˆZn´j`1,n

Zn´k,n

˙´β

`βξ2j 2j

˜ Z˜j,n

Zn´k,n

¸´β

. (20)

We can now start breaking downTpkpβq ´1`γγ1

1β into several terms by writing:

Tpkpβq ´ γ1 1`γ1β “

k

ÿ

j“2

RFyj

ξj,k,β

j ´ γ1 1`γ1β “

˜ k ÿ

j“2

RFyj

ξj,k,β1

j ´ γ1 1`γ1β

¸

`

k

ÿ

j“2

yRFj

Rn,j,β j

k

ÿ

j“2

˜ RFyj

RFj

´1

¸ RFj

ξj,k,β1 j

`

˜ k ÿ

j“2

RFj

ξj,k,β1

j ´ γ

k`1

k

ÿ

j“2

upj,kβ´1

¸

`

˜ γ k`1

k

ÿ

j“2

upj,kβ´1´ γ pβ

¸

`

k

ÿ

j“2

RFyj

Rn,j,β

j

“Tk,np1q`Tk,np2q`Rp0qn `Rnp1q, (21) with

Tk,np1q

k

ÿ

j“2

´

logRFyj´logRFj

¯ RFj

ξ1j,k,β

j `

k

ÿ

j“2

#

´logRFyj

RFj ´

˜

1´RFyj

RFj

¸+

RFj

ξ1j,k,β j

“Tk,np1,1q ` Tk,np1,2q. (22)

The termTk,np1,1q is introduced in order to make logarithms of the Kaplan-Meier product appear, leading to manageable sums. Indeed, by definition ofFsnKM we find that

logRFyj

k

ÿ

i“j

δn´i`1,nlog`i´1

i

˘ and logRFj “ ´1 γ1

k

ÿ

i“j

ξi i `

ˆ

logRFj` 1 γ1

logZn´j`1,n Zn´k,n

˙ . Consequently, defining the following important notations

RFj,β “RFj

ˆZn´j`1,n Zn´k,n

˙´β

i“2, . . . , k, (23)

and

Si,k,β“ 1 i

i

ÿ

j“2

RFj,β

ξj1

j , i“2, . . . , k, (24)

(8)

by inverting sums we obtain Tk,np1,1q

k

ÿ

i“2

„1 γ1

i´γq ``

δn´i`1,nilog`i´1

i

˘` p˘

Si,k,β´

k

ÿ

j“2

ˆ

logRFj` 1 γ1

logZn´j`1,n

Zn´k,n

˙ RFj,β

ξj1 j

“Tk,np1,1,1q ´ Tk,np1,1,2q. (25)

To summarize,

Tpkpβq ´ γ1

1`γ1β “Tk,np1,1,1q´Tk,np1,1,2q`Tk,np1,2q`Tk,np2q`Rp0qn `Rp1qn . (26) Introducing now the additional notations

ci “1`ilogi´1

i , Ai,n“ppEipnq´1q ´ pδn´i`1,n´pq, and Bi,n“ 1 γ1

bn,kuβi,k˚γEpnqi , and usingp17q, one readily obtains the following formula for the main termTk,np1,1,1q:

Tk,np1,1,1q

k

ÿ

i“2

Ai,nSi,k,β`

k

ÿ

i“2

Bi,nSi,k,β`

k

ÿ

i“2

δn´i`1,nciSi,k,β` 1 γ1

k

ÿ

i“2

Rn,iSi,k,β. (27)

In the sequel, we will show that the variablesSi,k,β can be approximated appropriately by pγ

β

1

k`1upi,kβ´1. Also, as it is explained in Einmahl et al. (2008), on one hand the parameter p “ γ{γ1γγ2

12 is the limit of ppzq “ Ppδ “ 1|Z “ zq as z Ñ 8, and on the other hand the original observations pZi, δiqiďn

have the same distribution as the variablespZi1, δ1iqiďn, wherepZi1qiďnis an independent copy of the sequence pZiqiďn, δi1 “IUiďppZi1qandpUiqiďndenotes some given i.i.d. sequence of standard uniform random variables (shortened to rv’s), which are independent of the sequencepZi1qiďn. We thus carry on the proof by considering from now on that the observationsδi andZi are related by the formula

δi“IUiďppZiq.

Mimicking what is done in Einmahl et al. (2008), we will later (see proof of Lemma 8) approximate the rv’sδn´i`1,n by i.i.d Bernoulli rv’sIUiďp.

The main goal will thus be to prove that the termřk

i“2Ai,nSi,k,β above is (up to a bias term) close to the main random term appearing in Theorem 1

γ pβ

1 k`1

k

ÿ

i“2

tppEi´1q ´ pIUiďp´pquupi,kβ´1 (28) The other terms inp27qwill be bias or remainder terms, noting that the coefficients ci are close to 0.

The second termTk,np1,1,2q inp26qturns out to be adding to the bias since it only involves the slowly varying functionlF. The treatment of the third termTk,np1,2qabove is very important since it strongly participates to the approximation of a ratio of the formFsnKMpxq{FsnKMpyqby the ratioFspxq{Fpyq, for very large values ofs xandy. Such approximation is delicate. Invoking results from survival analysis, we will show however that Tk,np1,2qis a remainder term.

Next,Tk,np2q is decomposed using the variablespVj,kqintroduced inp16q:

Tk,np2q“ γ k`1

k

ÿ

j“2

´

Vj,kpβ´upj,k,β ¯ u´1j,k`

k

ÿ

j“2

Vj,kpβξj1 ´γ j `

k

ÿ

j“2

´

RFj,β´Vj,kpβ¯ξj1

j. (29)

According to the definition of ξ1j, we can see that the second term of this decomposition is close to

γ k`1

řk

j“2pEj ´1qupj,kβ´1. While this is part of the main term described in p28q, we will find in Proposi- tion 3 that this term is neutralized by another part ofTk,np2q, so thatTk,np2qis just a bias term. FinallyRp0qn and Rp1qn will also turn out to be remainder terms.

The rest of the section is organised as follows. In subsection 3.1, we set additional notations and state 7

(9)

some preliminary approximation results needed in the sequel. In subsection 3.2 we state the asymptotic results for all terms in (26) and conclude the proof.

3.1. Additional notations and important preliminary results

‚First, in the sequel we will regularly work under the following event, for someαą1 arbitrary close to 1, En,α“ @1ďjďk , α´1uj,kďVj,kďαuj,k

(, (30)

whereuj,kandVj,k are defined inp13qandp16q. According to Shorack and Wellner (1986) (chapter 8), for everyαą1 we have limnÑ8PpEn,αq “1. In the proof section, working ”on the eventEn,α” will thus mean stating bounds or results which are valid with an arbitrary large probability.

‚ Secondly, the remainder term Rn,j defined in the second-order exponential representation of the log- spacingsp17qsatisfy, according to Theorem 2.1 in Beirlant et. al. (2002),

ˇ ˇ ˇ

řk j“i

Rn,j j

ˇ ˇ

ˇ“oPpbn,klog`pu1

i,kqq. (31)

‚Thirdly, under assumptionsp1qandp2q, sinceZi“UHpYiq, one can show usingp1qandp4qthat RFj,β“ FspZn´j`1,nq

FspZn´k,nq

ˆZn´j`1,n

Zn´k,n

˙´β

“Vj,kpβp1`Cj,k,βq, (32)

whereCj,k,β“Yn´k,n´γβ˚DβC´γβ˚pY˜k´j`1,k´γβ˚ ´1qp1`oPp1qqandDβ“D´γβD˚withD“ ´γγ

1D˚ifβ2ăβ1, D“D1´γγ

1D˚ ifβ1ďβ2.

‚ Finally, using R´enyi representation (see for example p4.3q in Beirlant et. al. (2004)) and a Taylor ex- pansion, one obtains that for every 2ďjďk,

Vj,kpβ´upj,k,β “ ´pβupj,k,β

˜ k ÿ

i“j

Eipnq´1 i

¸

´ pβupj,k,β

˜ k ÿ

i“j

1 i ´log

ˆk`1 j

˙¸

` p2β

2

j,kpβplogpVj,k{uj,kqq2, (33) where ˜Vj,k lies between Vj,k and uj,k. The combination of p32qand p33q thus means that the ratioRFj,β

will be appropriately approximated by the deterministic weightsupj,k,β . 3.2. Asymptotics for the terms in (26)and conclusion of the proof

The first result stated concerns the termTk,np1,1,1q, which contains the main term of the decomposition of Tpkpβq ´1`γγ1

1β (see relationsp27qandp28q).

Proposition 1. Under the conditions of Theorem 1, asnÑ 8, we have paq ?

k

i“2Ai,nSi,k,β“Gn`λbβ`oPp1q, where bβ“ ´γ p

pβp1´pqpDγq˚β˚C´γβ˚{ppβ`γβ˚q and pDγq˚

$

&

%

γ1D1 if β1ăβ2

´γ2D2 if β2ăβ1

γ1D1´γ2D2 if β1“β2

andGn is equal in distribution to γ pβ

?1 k

k

ÿ

i“2

upi,kβ´1pppEi´1q ´ pIUiďp´pqq,

where pEiqandpδiqare independent iid samples with distributions standard exponential and standard uniform. The variable Gn is asymptotically centred gaussian distributed with varianceσ2βγ

2

p2β p 2pβ´1. pbq ?

k

i“2Bi,nSi,k,β“λb˚`oPp1q, whereb˚“ ´γ2pp

βD˚β˚C´γβ˚{ppβ`γβ˚q.

pcq řk

i“2δn´i`1,nciSi,k,β “oPpk´1{2q

(10)

pdq řk

i“2Rn,iSi,k,β“oPpk´1{2q

The following proposition concerns the terms Rp0qn ,Rp1qn ,Tk,np1,2qandTk,np1,1,2q. The last two of these terms result from the replacement of the ratios of Kaplan-Meier estimates yRFj by the ratios of the true survival function valuesRFj.

Proposition 2. Under the conditions of Theorem 1, asnÑ 8,

paqRp0qn “opk´1{2q, pbqRp1qn “oPpk´1{2q, pcqTk,np1,2q“oPpk´1{2q, pdqTk,np1,1,2q“D1p1`oPp1qqZn´k,n´β1 řk

j“2

ˆ´Z

n´j`1,n

Zn´k,n

¯´β1

´1

˙ RFj,βξ

1 j

j ` řk

j“2Ln,jRFj,βξ

1 j

j, where 0ďLn,jďD12pZn´j`1,n´β1 ´Zn´k,n´β1 q2p1`oPp1qq.

Moreover,Tk,np1,1,2q“bKMpk{nqγβ˚`oPpk´1{2q, wherebKM is equal to´γp2

βD1β1C´γβ1{ppβ`γβ1qifβ1ďβ2

and to0 ifβ1ąβ2.

The last result concerns the behaviour ofTk,np2q : it turns out that it only generates a bias term.

Proposition 3. We have

Tk,np2q“ ´ pβγ k`1

k

ÿ

j“2

pEjpnq´1q

˜ 1 j

j

ÿ

i“2

upi,kβ´1´ 1 pβupj,kβ´1

¸

´ pβγ k`1

k

ÿ

j“2

pEpnqj ´1qupj,kβ´1

˜ k ÿ

i“j

Eipnq´1 i

¸

` bn,k

k`1

k

ÿ

j“2

upj,kβ´1`γβ˚Epnqj ` bn,k

k`1

k

ÿ

j“2

1

uj,kpVj,kpβ´upj,k,β quγβj,k˚Ejpnq `

k

ÿ

j“2

Vj,kpβCj,k,β

ξj1 j

´ pβγ k`1

k

ÿ

j“2

upj,kβ´1

˜ k ÿ

i“j

1

i ´logk`1 j

¸

Epnqj ` ppβq2γ 2pk`1q

k

ÿ

j“2

1 uj,k

j,kpβ ˆ

log ˆVj,k

uj,k

˙˙2

Ejpnq. (34) Moreover, under the conditions of Theorem 1, when nÑ 8we have

Tk,np2q“˜b˚pk{nqγβ˚`oPpk´1{2q, where˜b˚“ ´γ

2β˚C´γβ˚

pβ`γβ˚ pD˚`Dpβ

βq.

The proofs of all these results can be found in the Appendix. Now, since

?k ˆ

Tpkpβq ´ γ1 1`γ1β

˙

?kTk,np1,1,1q`

?kTk,np1,2q´

?kTk,np1,1,2q`

?kTk,np1,1,3q`

?kTk,np2q`

?kRp0qn `

?kRp1qn and assumption (12) holds, by combination of relation (27) and propositions 1, 2 and 3, we have proved that Theorem 1 holds,i.e. that

? k

ˆ

Tpkpβq ´ γ1 1`γ1β

˙

“Gn`λmβ`oPp1qÝÑd Npλmβ, σβ2q,

because it can be checked thatbβ`b˚´bKM`˜b˚is actually equal to the valuemβdescribed in the statement of Theorem 1.

4. Finite sample comparisons

In this section, we consider a comparison (using finite sample simulations) in terms of observed bias and mean squared error (MSE) of the estimators considered in this paper : pγpHq1,k, pγ1,kpWq “γp1,kp0q, pγ1,kpβqwith β ‰0, and pγ1,kpBRq. Forγp1,kpβq, we consider three different values ofβ (´1, 0.5 and 1.5). In the expression of pγpBRq1,k , the second order parameter β1 of F should be estimated. Instead, we proceed as in Beirlant et al. (2018) (see equations p13q and p14q therein) by reparametrizing β11,kpWq by ´ρ1 and we consider two different values ofρ1 (´1.5 and´2) in the following formula

1,kpBRq1q “γp1,kpWq´p1´ρ1q2p1´2ρ1q ρ21

˜ Tpk

`´ρ1{pγpW1,kq˘

´ pγ1,kpWq 1´ρ1

¸ . 9

Références

Documents relatifs

This work presents a precise quantitative, finite sample analysis of the double descent phenomenon in the estimation of linear and non-linear models.. We make use of a

This is in line with Figure 1(b) of [11], which does not consider any covariate information and gives a value of this extreme survival time between 15 and 19 years while using

The main result of this paper is a theorem giving sufficient conditions to ensure the asymp- totic normality of the Sobol index estimators based on the Monte-Carlo procedure

Their asymptotic normality is derived under quite natural and general extreme value conditions, without Lipschitz conditions on the boundary and without recourse to assumptions

A model of conditional var of high frequency extreme value based on generalized extreme value distribution [j]. Journal of Systems & Management,

The population under Chinese rule have lived with agents of censorship and daily practices of self-censorship in imperial times, under warlords during the Republican

In order to validate the assumption of underlying conditional Pareto-type distributions for the claim sizes, we constructed local Pareto quantile plots of claim sizes for which

(2011), who used a fixed number of extreme conditional quantile estimators to estimate the conditional tail index, for instance using the Hill (Hill, 1975) and Pickands (Pickands,