Extreme value statistics for censored data with heavy tails under competing risks

(1)

HAL Id: hal-01418370

https://hal.archives-ouvertes.fr/hal-01418370v2

Submitted on 13 Jan 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Extreme value statistics for censored data with heavy tails under competing risks

Julien Worms, Rym Worms

To cite this version:

Julien Worms, Rym Worms. Extreme value statistics for censored data with heavy tails under com-

peting risks. Metrika, Springer Verlag, 2018, 81 (7), pp.849-889. �10.1007/s00184-018-0662-3�. �hal-

01418370v2�

(2)

Extreme value statistics for censored data with heavy tails under competing risks

Julien Worms (1) & Rym Worms

¹

(2)

(1) Universit´ e Paris-Saclay / Universit´ e de Versailles-Saint-Quentin-En-Yvelines Laboratoire de Math´ ematiques de Versailles (CNRS UMR 8100),

F-78035 Versailles Cedex, France, e-mail : julien.worms@uvsq.fr

(2) Universit´ e Paris-Est

Laboratoire d’Analyse et de Math´ ematiques Appliqu´ ees (CNRS UMR 8050),

UPEMLV, UPEC, F-94010, Cr´ eteil, France, e-mail : rym.worms@u-pec.fr

1Corresponding author

(3)

Extreme value statistics for censored data with heavy tails under competing risks

Abstract

This paper addresses the problem of estimating, in the presence of random censoring as well as competing risks, the extreme value index of the (sub)-distribution function associated to one partic- ular cause, in the heavy-tail case. Asymptotic normality of the proposed estimator (which has the form of an Aalen-Johansen integral, and is the first estimator proposed in this context) is es- tablished. A small simulation study exhibits its performances for finite samples. Estimation of extreme quantiles of the cumulative incidence function is also addressed.

AMS Classification. Primary 62G32 ; Secondary 62N02

Keywords and phrases. Extreme value index. Tail inference. Random censoring. Competing Risks. Aalen- Johansen estimator.

2

(4)

1. Introduction

The study of duration data (lifetime, failure time, re-employment time...) subject to random censoring is a major topic of the domain of statistics, which finds applications in many areas (in the sequel we will, for convenience, talk about lifetimes to refer to these observed durations, but without restricting our scope to lifetime data analysis). In general, the interest lies in obtaining informations about the central characteristics of the underlying lifetime distribution (mean lifetime or survival probabilities for instance), often with the objective of comparing results between different conditions under which the lifetime data are acquired. In this work, we will address the problem of inferring about the (upper) tail of the lifetime distribution, for data subject both to random (right) censoring and competing risks.

Suppose indeed that we are interested in the lifetimes of n individuals or items, which are subject to K different causes of death or failure, and to random censorship (from the right) as well. We are particularly interested in one of these causes (this main cause will be considered as cause number k thereafter, where k P t1, . . . , Ku), and we suppose that all causes are exclusive and are likely to be dependent on the others.

The censoring time is assumed to be independent of the different causes of death or failure and of the observed lifetime itself. However, since the other causes (different from the k-th cause of interest) generally cannot be considered as independent of the main cause, in no way they can be included in the censoring mechanism.

This prevents us from relying on the basic independent censoring statistical framework, and we are thus in the presence of what is called a competing risks framework (see Moeschberger and Klein (1995)).

For instance, if a patient is suffering from a very serious disease and starts some treatment, then the final outcome of the treatment can be death due to the main disease, or death due to other causes (nosocomial infection for instance). And censoring can occur due to loss of follow up or end of the clinical study. Another example, in a reliability experiment, is that the failure of some mechanical system can be due to the failure of a particular subpart, or component, of the system : since separating the different components for studying the reliability of only one of them is generally not possible, accounting for these different competing causes of failure is necessary. Another field where competing risks often arise are labor economics, for instance in re-employment studies (see Fermanian (2003) for practical examples).

One way of formalising this is to say that we observe a sample of n independent couples pZ

_i

, ξ

_i

q

1ďiďn

where

Z

i

“ minpX

i

, C

i

q, δ

i

“ I

XiďCi

, ξ

i

“

"

0 if δ

_i

“ 0, C

i

if δ

_i

“ 1.

The i.i.d. samples pX

_i

q

iďn

and pC

_i

q

iďn

, of respective continuous distribution functions F and G, represent the lifetimes and censoring times of the individuals, and are supposed to be independent. For convenience, we will suppose in this work that they are non-negative. The variables pC

i

q

iďn

form a discrete sample with values in t1, . . . , Ku, and represent the causes of failure or death of the n individuals or items. It is important to note that these causes are observed only when the data is uncensored (i.e. when δ

i

“ 1), therefore we only observe the ξ

i

’s, not the complete C

i

’s.

One way of considering the failure times X

_i

is to write

X

i

“ minpX

i,1

, . . . , X

i,K

q,

where the variable X

i,k

is a (rather artificial) variable representing the imaginary latent lifetime of the i-th individual when the latter is only affected by the k-th cause (the other causes being absent). This viewpoint may be interesting in its own right, but we will not keep on considering it in the sequel, one reason being that such variables X

i,1

, . . . , X

i,K

cannot be realistically considered as independent, and their respective distributions are of no practical use or interpretability (as explained and demonstrated in the competing risks literature, these distributions are in fact not statistically identifiable, see Tsiatis (1975) for example).

The object of interest is the probability that a subject dies or fails after some given time t, due to the k-th cause, for high values of t. This quantity, denoted by

F s

^pkq

ptq “ P r X ą t , C “ k s, is related to the so-called cumulative incidence function F

^pkq

defined by

F

^pkq

ptq “ P r X ď t , C “ k s.

Note that F s

^pkq

ptq is not equal to 1´F

^pkq

ptq, but to P pC “ kq´F

^pkq

ptq, because F

^pkq

is only a sub-distribution function. However we have F s

^pkq

ptq “ ş

8

dF

^pkq

puq. In the sequel, the notation ¯ Sp.q “ Sp8q ´ Sp.q will be

(5)

used, for any non-decreasing function S.

In this paper, we are interested in investigating the behaviour of F s

^pkq

ptq for large values of t. This amounts to statistically study extreme values in a context of censored data under competing risks, and will lead us to consider some extreme value index γ

k

related to F s

^pkq

, which will be defined in a few lines. Equivalently, the object of interest is the high quantile x

^pkqp

“ p F s

^pkq

q

^´

ppq “ inft x P R ; F s

^pkq

pxq ě p u when p is close to 0, which can be interpreted as follows (in the context of lifetimes of individuals or failure times of systems) : in the presence of the other competing causes, a given individual (or item) will die (or fail), due to cause k after such a time x

^pkqp

, only with small probability p. A nonparametric inference for quantiles of fixed (and therefore not extreme) order, in the competing risk setting, has been already proposed in Peng and Fine (2007).

One way of addressing this problem could be through a parametric point of view (see Crowder (2001) for further methods in the competing risk setting), however, the non-parametric approach is the most common choice of people faced with data presenting censorship or competing risks. Of course, the standard Kaplan- Meier method for survival analysis does not yield valid results for a particular risk if failures from other causes are treated as censoring times, because the other causes cannot always be considered independent of the particular cause of interest.

The commonly used nonparametric estimator of the cumulative incidence function F

^pkq

is the so-called Aalen-Johansen estimator (see Aalen and Johansen (1978), or Geffray (2009) equation p7q) defined by

F

_n^pkq

ptq “ ÿ

Z_iďt

δ

i

I

_Ci“k

n G s

n

pZ

_i^´

q ,

where G s

n

denotes the standard Kaplan-Meier estimator of G (and G s

n

pt

^´

q denotes lim

sÒt

G s

n

psq), so that we can introduce the following estimator for F s

^pkq

:

F s

_n^pkq

ptq “ ÿ

Z_iąt

δ

i

I

_Ci“k

n G s

n

pZ

_i^´

q .

But if the value t considered is so high that only very few (if any) observations Z

_i

(such that C

i

“ k) exceed t, then this purely nonparametric approach will lead to very unstable estimations F s

n^pkq

ptq of F s

^pkq

ptq. This is why a semiparametric approach is desirable, and the one we will consider here is the one inspired by classical extreme value theory.

First note that in this paper, we will only consider situations where the underlying distributions F and G of the variables X and C are supposed to present power-like tails (also commonly named heavy tails), and we will focus on the evaluation of the order of this tail. Our working hypothesis will be thus that the different functions F s

^pkq

(for k “ 1, . . . , K) as well as G s “ 1 ´ G belong to the Fr´ echet maximum domain of attraction. In other words, we assume that they are (see Definition 1 in the Appendix) regularly varying at infinity, with respective negative indices ´1{γ

₁

, . . . , ´1{γ

_K

and ´1{γ

_C

@1 ď k ď K, @x ą 0, lim

tÑ`8

F s

^pkq

ptxq{ F s

^pkq

ptq “ x

^´1{γ^k

and lim

tÑ`8

Gptxq{ s Gptq “ s x

^´1{γ^C

. (1) Consequently, F s “ 1´F “ ř

K

k“1

F s

^pkq

and H s “ F s G s (the survival function of Z) are regularly varying (at `8) with respective indices ´1{γ

F

and ´1{γ, where γ

F

“ maxpγ

1

, . . . , γ

K

q and γ satisfies γ

^´1

“ γ

F´1

` γ

C´1

(these relations are constantly used in this paper).

The estimation of γ

_F

has been already studied in the literature, as it corresponds to the random (right) censoring framework, without competing risks. We can cite Beirlant et al. (2007) and Einmahl et al. (2008), where the authors propose to use consistent estimators of γ divided by the proportion of non-censored observations in the tail, or Worms and Worms (2014), where two Hill-type estimators are proposed for γ

F

, based on survival analysis techniques. However, our target here is γ

k

(for a fixed k “ 1, . . . , K) and the point is that there seems to be no way to deduce an estimator of γ

k

from an estimator of γ

F

. Note that the useful trick used in Beirlant et al. (2007) and Einmahl et al. (2008) to construct an estimator of γ

F

does not seem to be extendable to this competing risks setting. To the best of our knowledge, our present paper is the first one addressing the problem of estimating the cause-specific extreme value index γ

k

.

4

(6)

Considering assumption (1), it is simple to check that, for a given k, we have

tÑ`8

lim 1 F s

^pkq

ptq

ż

`8 t

logpu{tq dF

^pkq

ptq “ γ

k

.

It is therefore most natural to propose the following (Hill-type) estimator of γ

k

, for some given threshold value t

n

(assumptions on this threshold are detailed in the next section) :

γ p

_n,k

“ ż

φ p

_n

puqdF

_n^pkq

puq where φ p

_n

puq “ 1 F s

n^pkq

pt

_n

q

log ˆ u

t

_n

˙ I

uątn

,

which can be also written as p γ

_n,k

“ 1

n F s

n^pkq

pt

_n

q

n

ÿ

i“1

logpZ

_i

{t

_n

q

G s

_n

pZ

_i^´

q I

ξi“k

I

Ziątn

“ 1 n F s

n^pkq

pt

_n

q

ÿ

Z_piqątn

logpZ

_piq

{t

n

q

G s

_n

pZ

_pi´1q

q δ

_piq

I

Cpiq“k

,

where Z

_p1q

ď . . . ď Z

_pnq

are the ordered random variables associated to Z

₁

, . . . , Z

_n

, and δ

_piq

and C

piq

are the censoring indicator and cause number which correspond to the order statistic Z

_piq

. It is clear that this estimator is a generalisation of one of the estimators proposed in Worms and Worms (2014), in which the situation K “ 1 (with only one cause of failure/death) was considered. The asymptotic result we prove in the present work is then valid in the situation studied in the latter, where only consistency was proved and a random threshold was used.

Our paper is organized as follows: in Section 2, we state the asymptotic normality result of the proposed estimator, and of a corresponding estimator of an extreme quantile of the cumulative incidence function.

Section 5 is devoted to the proofs. In Section 3, we present some simulations in order to illustrate finite sample behaviour of our estimator. Some technical aspects of the proofs are postponed to the Appendix.

2. Assumptions and Statement of the results

The central limit theorem which is going to be proved has the rate ?

v

_n

where v

_n

“ n F s

n^pkq

pt

_n

q Gpt s

_n

q and t

_n

is a threshold tending to 8 with the following constraint

v

n

nÑ8

ÝÑ `8 such that n

^´η⁰

v

n

nÑ8

ÝÑ `8 for some η

0

ą 0. (2) If we note l

_k

the slowly varying function associated to F s

^pkq

(i.e. such that F s

^pkq

pxq “ x

^´1{γ^k

l

_k

pxq in condition (1)), the second order condition we consider is the classical SR2 condition for l

_k

(see Bingham, Goldie and Teugels (1987)),

@x ą 0, l

k

ptxq

l

k

ptq ´ 1

^tÑ8

„ h

ρ_k

pxq gptq p@x ą 1q, (3) where g is a positive measurable function, slowly varying with index ρ

k

ď 0, and h

ρ_k

pxq “

^x^ρk_ρ^´1

k

when

ρ

k

ă 0, or h

ρ_k

pxq “ log x when ρ

k

“ 0.

Theorem 1. Under assumptions p1q, p2q and p3q, if there exists λ ě 0 such that ?

v

_n

gpt

_n

q

^nÑ8

ÝÑ λ, and if γ

_k

ă γ

_C

then we have

? v

n

p p γ

n,k

´ γ

k

q ÝÑ

^d

N pλm, σ

²

q as n Ñ 8 where

m “

$

&

%

γ_k²

1´γkρk

if ρ

k

ă 0, γ

_k²

if ρ

k

“ 0,

and σ

²

“ γ

_k²

p1 ´ rq

³

` p1 ` r

²

q ´ 2cr ˘ ,

with c “ lim

xÑ8

F s

^pkq

pxq{ Fpxq P r0, s 1s and r “ γ

k

{γ

C

Ps0, 1r.

Remark 1. Note that when γ

k

ă γ

F

, then c “ 0, and, when γ

k

“ γ

F

and c “ 1 (for instance when there is only one cause of failure/death), then σ

²

reduces to γ

_F²

{p1 ´ rq.

Proposition 1. Under assumptions p1q and p2q, we have p γ

n,k P

ÝÑ γ

k

as n Ñ 8.

Remark 2. The condition γ

k

ă γ

C

(weak censoring) is not necessary for the consistency of γ

n,k

.

(7)

Now, concerning the estimation of an extreme quantile x

^pkqpn

(of order p

n

tending to 0) associated to F s

^pkq

, we propose the usual Weissman-type estimator (in this heavy tailed context), associated to the threshold t

n

used in the estimation of γ

k

,

ˆ x

^pkq_p

n,t_n

“ t

_n

˜

F s

n^pkq

pt

_n

q p

n

¸

γp_n,k

,

where p

n

is assumed to satisfy the constraint p

n

“ o `

F s

^pkq

pt

n

q ˘

. Remind that by definition F s

^pkq

px

^pkqp_n

q “ p

n

, and thus the definition of this estimator is based on the fact that, by the assumed regular variation of F s

^pkq

, the ratio F s

^pkq

px

^pkqp_n

q{ F s

^pkq

pt

_n

q is close to px

^pkqp_n

{t

_n

q

^´1{γ^k

.

Corollary 1. Under the assumptions of Theorem 1, if in addition ρ

_k

ă 0 (in (3)) and d

_n

“ F s

^pkq

pt

_n

q{p

_n

Ñ 8 satisfies the condition

? v

n

{ logpd

n

q

^nÑ8

ÝÑ 8, (4)

then (with λ, m and σ

²

being defined in the statement of Theorem 1)

? v

_n

logpd

n

q

˜ x ˆ

^pkq_p

n,t_n

x

^pkqpn

´ 1

¸

ÝÑ

d

N pλm, σ

²

q as n Ñ 8.

3. Simulations

In this section, a small simulation study is conducted in order to illustrate the finite-sample behaviour of our new estimator in some simple cases, and discuss the main issues associated with the competing risks setting.

For simplicity, we focus on the situation with two competing risks (K “ 2), also called causes below, and our aim is the extreme value index γ

1

associated to the first cause. Data are generated from one of the following two models : for c

1

, c

2

non-negative constants satisfying c

1

` c

2

“ 1, we consider the following (sub-)distribution for each cause-specific function F s

^pkq

(k P t1, 2u) :

´ Fr´ echet : F s

^pkq

ptq “ c

k

expp´t

^´1{γ^k

q, for t ě 0 ;

´ Burr : F s

^pkq

ptq “ c

k

p1 ` t

^τ^k

{βq

^´1{pγ^k^τ^k^q

, for t ě 1, where τ

k

ą 0, β ą 0.

The lifetime X , of survival function ¯ F “ F ¯

^p1q

` F ¯

^p2q

, is generated by the inversion method (with numerical computation of F s

^´1

). Censoring times are then generated from a Fr´ echet or a Burr distribution :

Gptq “ s expp´t

^´1{γ^C

q pt ě 0q or Gptq “ p1 s ` t

^τ^C

{βq

^´1{pγ^C^τ^C^q

pt ě 1q.

In this section, we consider (as it is often done in simulation studies) that the threshold t

_n

used in the definition of our new estimator ˆ γ

_n,1

is taken equal to Z

_pn´k_n_q

(i.e. we consider it as random). One aim of this section is to show how our estimator (with random threshold)

ˆ

γ

1

“ 1

n F s

n^p1q

pZ

_pn´k_n_q

q

kn

ÿ

i“1

logpZ

_pn´i`1q

{Z

_pn´k_n_q

q

G s

n

pZ

_pn´i,nq

q δ

_pn´i`1q

I

_Cpn´i`1q“1

of γ

1

behaves when the proportion c

1

of cause 1 events varies : we consider c

1

P t1, 0.9, 0.7, 0.5u, the case c

₁

“ 1 corresponding to the simple censoring framework, without competing risk.

Another aim is to illustrate the impact of dependency between the causes, when estimating the tail. The starting point is that, if cause 2 could be considered independent of cause 1, then we could (and would) include it in the censoring mechanism and we would be in the simple random censoring setting, without competing risk. In this case, it would be possible to estimate γ

₁

by one of the following two estimators, the first one being proposed in Beirlant et al. (2007) (a Hill estimator weighted with a constant weight), and the second one in Worms and Worms (2014) (a Hill estimator weighted with varying Kaplan-Meier weights):

ˆ

γ

₁^{pBDF Gq}

“ 1 k

_n

k_n

ÿ

i“1

1 ˆ

p

₁

logpZ

_pn´i`1q

{Z

_pn´k_n_q

q (5)

ˆ

γ

₁^pKM^q

“ 1 n F s

n,b

pZ

_pn´k_n_q

q

k_n

ÿ

i“1

δ

_pn´i`1q

I

Cpn´i`1q“1

G s

n,b

pZ

_pn´i,nq

q logpZ

_pn´i`1q

{Z

_pn´k_n_q

q, (6)

6

(8)

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(a) Fr´echet case, γ1“0.1, γ2“0.25, γ_C“0.3

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(b) Same case as (a) but for Burr distribution

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(c) Fr´echet case, γ1“0.1, γ2“0.25, γC“0.2

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(d) Same case as (c) but for Burr distribution

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(e) Fr´echet case, γ1“0.25, γ2“0.1, γC“0.45

100 300

−0.010.03

kn

Bias c1=1

c1=0.9 c1=0.7 c1=0.5

100 300

0.0000.003

kn

MSE c1=1

c1=0.9 c1=0.7 c1=0.5

(f) Same case as (e) but for Burr distribution

Figure 1: Comparison of bias and MSE (respectively left and right in each subfigure) ofpγn,kfor different values ofc1; in figures paq,pcqandpeq,X andCare Fr´echet distributed, but in figurespbq,pdqandpfqthey are Burr distributed.

(9)

100 300

−0.010.03

kn

Bias

100 300

0.00000.0025

kn

MSE

(a) Fr´echet case, γ1“0.1, γ2“0.25, γC“0.3, andc1“1

100 300

−0.010.03

kn

Bias

100 300

0.00000.0025

kn

MSE

(b) Case (a) but withc1“0.9

100 300

−0.010.03

kn

Bias

100 300

0.00000.0025

kn

MSE

(c) Fr´echet case, γ1“0.1, γ2“0.25, γC“0.2, andc1“1

100 300

−0.010.03

kn

Bias

100 300

0.00000.0025

kn

MSE

(d) Case (c) but withc1“0.9

100 300

−0.010.03

kn

Bias

100 300

0.0000.003

kn

MSE

(e) Fr´echet case, γ1“0.1, γ2“0.25, γC“0.45, andc1“1

100 300

−0.010.03

kn

Bias

100 300

0.0000.003

kn

MSE

(f) Case (e) but withc1“0.9

Figure 2: Comparison of bias and MSE (respectively left and right in each subfigure) forγp_n,k (plain thick), ˆγ^{pBDF Gq}₁ (plain thin) and ˆγ₁^pKMq (dashed), for Fr´echet distributedXandC.

where, in Equation p5q, ˆ p

₁

“

_k¹

n

ř

kn

i“1

δ

_pn´i`1q

I

Cpn´i`1q“1

, and in Equation p6q, the Kaplan Meier estimators F s

_n,b

and G s

_n,b

are based on the ˜ δ

_i

“ δ

_i

I

Ci“1

. These two estimators consider the uncensored lifetimes associated to cause 2 as independent censoring times. Comparing our new estimator with these latter two estimators, when c

1

ă 1, will empirically prove that considering cause 2 as a competing risk independent of cause 1 has a great (negative) impact on the estimation of γ

1

. Note that when c

1

“ 1, the new estimator ˆ

γ

1

and ˆ γ

₁^pKMq

are exactly the same (therefore the thick and dashed lines in sub-figures (a), (c) and (e) of Figures 2 and 3 are overlapping, identical).

We address these two aims for each set-up (Fr´ echet, or Burr), by generating 2000 datasets of size 500, with three configurations of the triplet pγ

₁

, γ

₂

, γ

_C

q : p0.1, 0.25, 0.3q (γ

₁

ă γ

₂

, moderate censoring γ

_C

ą γ

_F

), p0.1, 0.25, 0.2q (γ

₁

ă γ

₂

, heavy censoring γ

_C

ă γ

_F

), or p0.25, 0.1, 0.45q (γ

₁

ą γ

₂

, moderate censoring γ

_C

ă γ

F

). Median bias and mean squared error (MSE) of the different estimators are plotted against different values of k

n

, the number of excesses used. When Burr distributions are simulated, the parameter β is taken equal to 1, and the parameters pτ

1

, τ

2

, τ

C

q are taken equal to p12, 6, 5q in configurations 1 and 2, and to p6, 12, 5q in configuration 3.

Figure 1 illustrates the behaviour of our estimator when c

1

varies. In terms of bias and MSE, we can see that the first configuration is a little better than the second one, which is itself much better than the third one. We observed this phenomenon in many other cases, not reported here : our estimator behaves best when it is the smallest parameter γ

k

which is estimated, and when the censoring is not too strong. Our simulations also show that the quality of our estimator (especially in terms of the MSE) diminishes with c

1

. Figures 2 and 3 present the comparison between our new estimator and the ones described in (5) and (6). A general conclusion (confirmed by other simulations not reported here) is that ˆ γ

₁^{pBDF Gq}

and ˆ γ

₁^pKM^q

8

(10)

100 300

0.000.06

kn

Bias

100 300

0.0000.005

kn

MSE

(a) Burr case, γ1“0.1, γ2“0.25, γC“0.3, andc1“1

100 300

0.000.06

kn

Bias

100 300

0.0000.005

kn

MSE

(b) Case (a) but withc1“0.9

100 300

0.000.06

kn

Bias

100 300

0.0000.005

kn

MSE

(c) Burr case, γ1“0.1, γ2“0.25, γC“0.2, andc1“1

100 300

0.000.06

kn

Bias

100 300

0.0000.005

kn

MSE

(d) Case (c) but withc1“0.9

100 300

−0.040.04

kn

Bias

100 300

0.0000.005

kn

MSE

(e) Burr case, γ1“0.1, γ2“0.25, γC“0.45, andc1“1

100 300

−0.040.04

kn

Bias

100 300

0.0000.005

kn

MSE

(f) Case (e) but withc1“0.9

Figure 3: Comparison of bias and MSE (respectively left and right in each subfigure) forγpn,k (plain thick), ˆγ^{pBDF Gq}₁ (plain thin) and ˆγ₁^pKMq (dashed), for Burr distributedX andC.

(11)

behave worse in most cases, even for a value of c

₁

of 0.9, which is only a slight modification of the situation without competing risk (c

₁

“ 1). Therefore, a contamination of the cause 1 distribution by another cause rapidly yield inadequate estimations of γ

1

if dependency between causes is ignored ; this conclusion is true for both ˆ γ

₁^{pBDF Gq}

and ˆ γ

₁^pKM^q

, but to a greater extent for ˆ γ

₁^{pBDF Gq}

. In the third configuration pγ

₁

, γ

₂

, γ

_C

q “ p0.25, 0.1, 0.45q, the improvement provided by ˆ γ

₁

(with respect to ˆ γ

^pKMq₁

) becomes notable when c

₁

drops below 0.7.

4. Conclusion

In this paper, we consider heavy tailed lifetime data subject to random censoring and competing risks, and use the Aalen-Johansen estimator of the cumulative incidence function to construct an estimator for the extreme value index associated to the main cause of interest. To the best of our knowledge, this is the first estimator proposed in this context. Its asymptotic normality is proved and a small simulation study exhibiting its finite-sample performance shows that accounting for the dependency of the different causes is important, but that the bias can be particularly high. Estimating second order tail parameters would then be interesting in order to reduce this bias. A first step towards this aim could be to study the following moments

M

_n^pαq

“ 1 n F s

n^pkq

pt

n

q

n

ÿ

i“1

log

^α

pZ

_i

{t

_n

q

G s

n

pZ

_i^´

q I

ξi“k

I

Ziątn

,

which asymptotic behaviour can be derived following the same lines as in the proof of Theorem 1.

5. Proofs

This section is essentially devoted to the proof of the main Theorem 1. Some hints about the proof of the consistency result contained in Proposition 1 are given in Subsection 5.3, and Corollary 1 is proved in Subsection 5.4.

We adopt a strategy developed by Stute in Stute (1995) in order to prove his Theorem 1.1, a well- known result which states that a Kaplan-Meier integral of the form ş

φ dF

n

can be approximated by a sum of independent terms. This idea is used in Suzukawa (2002) in the context of competing risks. We thus intend to approximate p γ

n,k

by the integral r γ

n,k

“ ş

φ

n

dF

n^pkq

of some deterministic function φ

n

, with respect to the Aalen-Johansen estimator, and approximate this integral by the mean q γ

n,k

of independent variables U

_i,n

(defined a few lines below). The passage from p γ

_n,k

to r γ

_n,k

(which amounts to replacing F s

n^pkq

pt

_n

q by F s

^pkq

pt

_n

q in the denominator of p γ

_n,k

) will imply an additional sum of independent variables V

_i,n

, which will participate to the asymptotic variance of our estimator.

However, a major difference with Stute (1995) or Suzukawa (2002) is that the function we integrate here, φ

n

puq “

¹

Fs^pkqptnq

logpu{t

n

q I

uąt_n

, is not only an unbounded function, depending on n, but it also has a ”sliding” support rt

n

, `8r, which is therefore always close to the endpoint `8 of the distribution H . In Stute (1995), a crucial point of the proof consists in temporarily considering that the integrated function φ has a support which is bounded away from the endpoint of H (condition (2.3) there). Considering the kind of function φ

n

we have to deal with here, we cannot follow the same strategy : dealing with the remainder terms will thus be a particularly challenging part of our work. Finally note that, in order to deal with the ratio F s

n^pkq

pt

n

q{ F s

^pkq

pt

n

q (and somehow approximate γ p

n,k

by r γ

n,k

) we will have to consider simultaneously integrals (with respect to F

n^pkq

) of φ

_n

and of another function g

_n

, defined below, which basically shares the same flaws as φ

_n

.

10

(12)

Let us first recall or define the following objects : φ p

n

puq “ 1

F s

n^pkq

pt

n

q log

ˆ u t

n

˙ I

uąt_n

φ

_n

puq “ 1 F s

^pkq

pt

_n

q log

ˆ u t

_n

˙ I

uątn

γ

_n,k

“ ż

φ

_n

puqdF

^pkq

puq

^nÑ8

ÝÑ γ

_k

r γ

_n,k

“

ż

φ

_n

puqdF

_n^pkq

puq p γ

n,k

“

ż

φ p

n

puqdF

_n^pkq

puq.

We thus have p γ

n,k

“ ∆

^´1_n

r γ

n,k

, where

∆

_n

“ F s

_n^pkq

pt

_n

q{ F s

^pkq

pt

_n

q “ ż

g

_n

puqdF

_n^pkq

puq and g

_n

puq “ 1

F s

^pkq

pt

_n

q I

uątn

,

and we now introduce the following new quantities, related to the Stute-like decomposition of r γ

n,k

and ∆

n

: U

_i,n^p1q

“ φ

_n

pZ

_i

q

GpZ s

_i

q δ

_i

I

Ci“k

and V

_i,n^p1q

“ g

_n

pZ

_i

q GpZ s

_i

q δ

_i

I

Ci“k

U

_i,n^p2q

“ 1 ´ δ

_i

HpZ s

_i

q ψpφ

_n

, Z

_i

q and V

_i,n^p2q

“ 1 ´ δ

_i

H s pZ

_i

q ψpg

_n

, Z

_i

q U

_i,n^p3q

“

ż

Z_i 0

ψpφ

n

, uq dC puq and V

_i,n^p3q

“ ż

Z_i

0

ψpg

n

, uq dCpuq

U

_i,n

“ U

_i,n^p1q

` U

_i,n^p2q

´ U

_i,n^p3q

and V

_i,n

“ V

_i,n^p1q

` V

_i,n^p2q

´ V

_i,n^p3q

where, for any function f : R

`

Ñ R , we note (for any given z ě 0)

ψpf, zq “ ż

`8

z

f ptqdF

^pkq

ptq and Cpzq “ ż

z

0

dGptq Hptq s Gptq s . This enables us to finally define the important objects

q γ

_n,k

“ 1 n

n

ÿ

i“1

U

_i,n

and ∆ p

_n

“ 1 n

n

ÿ

i“1

V

_i,n

(7)

which are the triangular sums of independent terms which will respectively approximate r γ

n,k

and ∆

n

. At the beginning of section 5.1, it will be proved that EpU

_i,n^p1q

q “ γ

n,k

and EpV

_i,n^p1q

q “ 1, while EpU

_i,n^p2q

q “ EpU

_i,n^p3q

q and E pV

_i,n^p2q

q “ E pV

_i,n^p3q

q, yielding E p q γ

_n,k

q “ γ

_n,k

and E p ∆ p

_n

q “ 1 ; the terms U

_i,n^p2q

, U

_i,n^p3q

, V

_i,n^p2q

and V

_i,n^p3q

only participate to the variance component of the estimator. The relation between all these quantities is made clearer in the following Lemma :

Lemma 1. We have

? v

n

p p γ

n,k

´ γ

k

q “ ∆

^´1_n

p Z

n

` ?

v

n

R

n

` ?

v

n

pγ

n,k

´ γ

k

q q (8) where

Z

_n

“ ? v

_n

´

p γ q

_n,k

´ γ

_n,k

q ´ γ

_k

p ∆ p

_n

´ 1q ¯

and R

_n

“ p γ r

_n,k

´ q γ

_n,k

q ´ γ

_k

p∆

_n

´ ∆ p

_n

q The proof of Lemma 1 is simple :

? v

_n

p p γ

_n,k

´ γ

_k

q “ ∆

^´1_n

?

v

_n

p r γ

_n,k

´ ∆

_n

γ

_k

q

“ ∆

^´1_n

?

v

n

tp q γ

n,k

´ γ

n,k

q ` γ

k

p1 ´ ∆

n

q ` p γ r

n,k

´ q γ

n,k

q ` pγ

n,k

´ γ

k

qu which leads to the desired relation (8).

The main theorem thus becomes an immediate consequence of the following four results, the second one

being the most difficult to establish.

(13)

Proposition 2. Under condition p1q and assuming that v

n

nÑ8

ÝÑ `8, (9)

if γ

_k

ă γ

_C

, then

Z

n

ÝÑ

d

N p0, σ

²

q as n Ñ 8 where σ

²

is defined in the statement of Theorem 1.

Proposition 3. Under conditions p1q and p2q , if γ

k

ă γ

C

, then

r γ

_n,k

“ q γ

_n,k

` R

_n,φ

and ∆

_n

“ ∆ p

_n

` R

_n,g

(10) where R

_n,φ

, R

_n,g

(and consequently R

_n

too) are o

_P

pv

^´1{2n

q.

Corollary 2. Under the conditions of Proposition 3, ?

v

_n

p∆

_n

´1q converges in distribution to N p0, 1{p1´rqq where r “ γ

k

{γ

C

Ps0, 1r.

Lemma 2. Under conditions p1q, p3q and ?

v

n

gpt

n

q Ñ λ ě 0, the bias term ?

v

n

pγ

n,k

´ γ

k

q in (8) converges to λm as n Ñ 8, where m is defined in Theorem 1.

Propositions 2 and 3 will be proved in Sections 5.1 and 5.2 respectively, sometimes with the help of other results stated and established in the Appendix. The proofs of Corollary 2 and Lemma 2 are short, we state them below.

Concerning Corollary 2, once the proof of Proposition 2 has been gone through, it will become clear to the reader that ?

v

n

p ∆ ˆ

n

´ 1q converges in distribution to the centred gaussian distribution of variance 1{p1 ´ rq, because ?

v

n

p ∆ ˆ

n

´ 1q “ ř

n

i“1

W ˜

i,n

where ˜ W

i,n

“

?v_n n

V ˜

i,n

“

?v_n

n

pV

i,n

´ 1q are centred, and V arp W ˜

_i,n

q “

_n¹_1´r¹

` op1{nq (this is proved similarly as (11) and (15)). Since Proposition 3 states that

∆

n

“ ∆ ˆ

n

` o

_P

pv

^1{2n

q, the same central limit theorem holds for ∆

n

and the corollary is proved.

Concerning now Lemma 2, remind that γ

_n,k

“ ş

φ

_n

puqdF

n^pkq

puq. An integration by parts and the fact that F s

^pkq

pxq “ x

^´1{γ^k

l

k

pxq yield

? v

n

pγ

n,k

´ γ

k

q “ ? v

n

ż

`8 1

y

^´1{γ^k^´1

ˆ l

k

pyt

n

q l

k

pt

n

q ´ 1

˙ dy,

and, using assumption p3q and Proposition 3.1 in de Haan and Ferreira (2006), we can write ż

`8

1

y

^´1{γ^k^´1

ˆ l

k

pyt

n

q l

_k

pt

_n

q ´ 1

˙

dy “ gpt

n

q ż

`8

1

y

^´1{γ^k^´1

h

ρ_k

pyqdy ` opgpt

n

qq.

The result then follows from assumption ?

v

n

gpt

n

q Ñ λ ě 0 and the fact that ş

`8

1

y

^´1{γ^k^´1

h

ρ_k

pyqdy “ m.

In the rest of the paper, we will very often handle the well-known sub-distributions functions H

^p0q

and H

^p1,kq

defined, for all t ě 0, by

H

^p0q

ptq “ P pZ ď t, δ “ 0q and H

^p1,kq

ptq “ P pZ ď t, ξ “ kq.

Note that we have

dH

^p0q

“ F dG s and dH

^p1,kq

“ GdF s

^pkq

. 5.1. Proof of Proposition 2

We first write Z

n

“

n

ÿ

i“1

W

i,n

where W

i,n

“

? v

_n

n

´ U ˜

i,n

´ γ

k

V ˜

i,n

¯ and

" U ˜

i,n

“ U

i,n

´ γ

n,k

V ˜

i,n

“ V

i,n

´ 1

where W

i,n

, ˜ U

i,n

and ˜ V

i,n

are centred, because the random variables U

i,n

and V

i,n

have expectations respec- tively equal to γ

n,k

and 1. Indeed, we have

E

´ U

_i,n^p1q

¯

“ E

´

_φ

npZ_iq GpZs iq

δ

i

I

Ci“k

¯

“ ş

φ_npuq

Gpuqs

dH

^p1,kq

puq “ ş

φ

n

puqdF

^pkq

puq “ γ

n,k

and E

´ U

_i,n^p2q

¯

“ E

´

1´δ_i

HpZĎ _iq

ψpφ

n

, Z

i

q

¯

“ ş ş

₁

ĎHpuq

I

tąu

φ

n

ptq dF

^pkq

ptq dH

^p0q

puq “ ş ş

₁

Gpuqs

I

tąu

φ

n

ptq dF

^pkq

ptq dGpuq

12

(14)

as well as E

´ U

_i,n^p3q

¯

“ E

´ ş

Z_i

0

ψpφ

n

, uq dC puq

¯

“ ş ş ş

I

ząu

I

tąu

φ

n

ptq dH pzq dF

^pkq

ptq

^dGpuq

Gpuqs HpuqĎ

“ E

´ U

_i,n^p2q

¯ . The proof for E pV

i,n

q “ 1 is similar.

We will now prove the asymptotic normality of Z

n

by using the Lyapunov criteria.

Lemma 3. Under the conditions p1q and p9q , if γ

k

ă γ

C

: piq we have

V arp U ˜

1,n

´ γ

k

V ˜

1,n

q “ E pU

_1,n²

q ` γ

_k²

E pV

_1,n²

q ´ 2γ

k

E pU

1,n

V

1,n

q ` op1q (11) piiq we have

E pU

_1,n²

q “

ż φ

²_n

puq

Gpuq s dF

^pkq

puq ´ ż

pψpφ

_n

, uqq

²

dC puq (12) E pV

_1,n²

q “

ż g

_n²

puq

Gpuq s dF

^pkq

puq ´ ż

pψpg

_n

, uqq

²

dCpuq (13) E pU

_1,n

V

_1,n

q “

ż φ

_n

puqg

_n

puq

Gpuq s dF

^pkq

puq ´ ż

ψpφ

_n

, uqψpg

_n

, uq dC puq (14) piiiq we have, noting r “ γ

k

{γ

C

(which belongs to s0, 1r under our conditions) as well as p “ γ{γ

F

“

γ

C

{pγ

F

` γ

C

q Ps0, 1r,

V arp U ˜

1,n

´ γ

k

V ˜

1,n

q “ 1 F s

n^pkq

pt

n

q Gpt s

n

q

ˆ γ

²_k

p1 ` r

²

q p1 ´ rq

³

` op1q

˙

´ 1 ´ p H s pt

n

q

ˆ 2γ

_k³

γp2 ´ γ

k

{γq

³

` op1q

˙

` op1q (15) Lemma 4. Under the conditions p1q and p9q , if γ

k

ă γ

C

, then

ř

n

i“1

E |W

_i,n

|

^2`δ

Ñ 0, as n tends to infinity, for some δ ą 0.

We can then immediately prove Proposition 2. Indeed, since Z

_n

“ ř

n

i“1

W

_i,n

, Lemma 3 yields V arpZ

_n

q “ n V arpW

_1,n

q “ v

_n

n V arp U ˜

_1,n

´ γ

_k

V ˜

_1,n

q.

which, since v

_n

“ n F s

n^pkq

pt

_n

q Gpt s

_n

q, becomes V arpZ

_n

q “ γ

²_k

p1 ` r

²

qp1 ´ rq

^´3

´ `

2p1 ´ pqγ

_k³

γ

^´1

p2 ´ γ

_k

{γq

^´3

˘ ´

F s

^pkq

pt

_n

q{ F s pt

_n

q

¯

` op1q.

Therefore, depending on the limit c of the ratio F s

^pkq

pt

_n

q{ F s pt

_n

q when n Ñ 8 (for instance, it converges to 0 when γ

_k

ă γ

_F

), it is simple to check that the variance of Z

_n

converges to the value σ

²

described in the statement of Theorem 1. Thanks to Lemma 4, the Lyapunov CLT applies and Proposition 2 is proved.

The two subsections 5.1.1 and 5.1.2 are now respectively devoted to the proofs of Lemmas 3 and 4.

5.1.1. Proof of Lemma 3

Part piq of the lemma is straightforward : since ˜ U

1,n

and ˜ V

1,n

are centred, we have indeed V arp U ˜

1,n

´ γ

k

V ˜

1,n

q “ Epp U ˜

1,n

´ γ

k

V ˜

1,n

q

²

q “ Ep U ˜

_1,n²

q ` γ

_k²

Ep V ˜

_1,n²

q ´ 2γ

k

Ep U ˜

1,n

V ˜

1,n

q

“ p E pU

_1,n²

q ´ γ

_n,k²

q ` γ

_k²

p E pV

_1,n²

q ´ 1q ´ 2γ

_k

p E pU

_1,n

V

_1,n

q ´ γ

_n,k

q and the result comes by using the fact that γ

n,k

converges to γ

k

as n Ñ 8.

Now we proceed to the proof of part piiq, and will only prove (12) because, by definition of φ

n

and g

n

, the proofs for (13) and (14) will be completely similar. First of all, we obviously have

pU

_1,n

q

²

“ pU

_1,n^p1q

q

²

` pU

_1,n^p2q

q

²

` pU

_1,n^p3q

q

²

` 2U

_1,n^p1q

U

_1,n^p2q

´ 2U

_1,n^p1q

U

_1,n^p3q

´ 2U

_1,n^p2q

U

_1,n^p3q

(16) The first term in the right-hand side of (12) is equal to E ppU

_1,n^p1q

q

²

q, and the second one (without the minus sign) is equal to E ppU

_1,n^p2q

q

²

q and to E pU

_1,n^p1q

U

_1,n^p3q

q because

E ppU

_1,n^p2q

q

²

q “ E

ˆ 1 ´ δ

1

H

²

pZ pψpφ

n

, Z

1

qq

²

˙

“ ż 1

H

²

pzq pψpφ

n

, zqq

²

dH

^p0q

pzq “ ż

pψpφ

n

, zqq

²

dCpzq

(15)

and

EpU

_1,n^p1q

U

_1,n^p3q

q “

ż φ

n

pzq Gpzq s

ˆż

z 0

ψpφ

n

, uqdC puq

˙

dH

^p1,kq

pzq

“ ż

ψpφ

n

, uq ˆż

₈

u

φ

n

pzqdF

^pkq

pzq

˙

dC puq “ ż

pψpφ

n

, zqq

²

dCpzq

The expectation E pU

_1,n^p1q

U

_1,n^p2q

q equals 0 because δ

₁

p1 ´ δ

₁

q is constantly 0, and we are now going to prove that E ppU

_1,n^p3q

q

²

q “ 2 E pU

_1,n^p2q

U

_1,n^p3q

q, which ends the proof of (12) in view of (16). Indeed, noting hpzq “ ş

z

0

ψpφ

n

, uqdCpuq and using the simple fact that hpzq “ hpyq ` ş

z

y

ψpφ

n

, uqdCpuq for every y ă z, we have E ppU

_1,n^p3q

q

²

q “

ż

8 0

ˆż

z 0

ψpφ

_n

, yqhpzqdCpyq

˙ dH pzq

“ ż

8

0

ˆż

z 0

ψpφ

_n

, yqhpyqdC pyq

˙

dHpzq ` ż

8

0

"ż

z 0

ˆż

z y

ψpφ

_n

, uqdC puq

˙

ψpφ

_n

, yqdCpyq

* dHpzq

“ ż

8

0

ˆż

z 0

ψpφ

n

, yqhpyqdC pyq

˙

dHpzq ` ż

8

0

"ż

z 0

ˆż

u 0

ψpφ

n

, yqdCpyq

˙

ψpφ

n

, uqdC puq

* dHpzq

“ 2 ż

8

0

ˆż

z 0

ψpφ

n

, yqhpyqdC pyq

˙ dHpzq

and

E pU

_1,n^p2q

U

_1,n^p3q

q “ E

ˆ 1 ´ δ

₁

H s pZ

₁

q ψpφ

_n

, Z

₁

q hpZ

₁

q

˙

“ ż

Hpyqhpyqψpφ s

_n

, yq dGpyq H s pyq Gpyq s

“ ż

8

0

ˆż

8 y

dHpzq

˙

hpyqψpφ

n

, yq dC pyq

“ ż

8

0

ˆż

z 0

ψpφ

n

, yqhpzqdC pyq

˙

dHpzq “ 1

2 E ppU

_1,n^p3q

q

²

q as announced.

We can now start proving part piiiq of the lemma, in which the exact nature of the function φ

_n

matters.

First remind that functions F s

^pkq

, G s and C are regularly varying of respective orders ´1{γ

_k

, ´1{γ

_C

and 1{γ (for C, this is proved in Lemma 8 with δ “ 1). Let us define the constants c

j

and d

j

(j “ 0, 1, 2) by

c

j

“ j!γ

_k^j

p1 ´ qq

^j`1

and d

j

“ j!γ

_k^j`1

γp2 ´ γ

k

{γq

^j`1

.

Since γ

k

ă γ

C

was assumed, then according to Lemma 7 part piiq (applied first with a ` b “ 1{γ

C

´ 1{γ

k

ă 0 for c

j

, and then with a ` b “ ´2{γ

k

` 1{γ “ p1{γ

C

´ 1{γ

k

q ` p1{γ

F

´ 1{γ

k

q ă 0 for d

j

) , we have

ż

8 t_n

log

^j

ˆ u

t

n

˙ dF

^pkq

puq Gpuq s „ c

j

F s

^pkq

pt

n

q Gpt s

n

q and

ż

8 t_n

log

^j

ˆ u

t

n

˙ ˆ

F s

^pkq

puq F s

^pkq

pt

n

q

˙

²

dC puq

Cpt

n

q

nÑ8

ÝÑ d

j

. (17) Hence, by definition of φ

n

, g

n

, the first terms of EpU

_1,n²

q, EpV

_1,n²

q and EpU

1,n

V

1,n

q in relations (12), (13) and (14) are respectively equivalent (as n Ñ 8) to c

2

Dpt

n

q, c

0

Dpt

n

q and c

1

Dpt

n

q where Dpt

n

q denotes

Dpt

_n

q “ 1 F s

^pkq

pt

_n

q Gpt s

_n

q .

Since c

₂

` γ

²_k

c

₀

´ 2γ

_k

c

₁

is found to be equal to γ

_k²

p1 ` r

²

q{p1 ´ rq

³

, then in view of (11) this proves the first term in relation (15). We now need to obtain equivalent expressions for the quantities ş

pψpφ

_n

, uqq

²

dC puq, ş pψpg

_n

, uqq

²

dC puq and ş

ψpφ

_n

, uqψpg

_n

, uq dC puq in order to prove the second part of relation (15) and there- fore finish the proof of Lemma 3.

For saving space, we will use temporarily the following notations :

l

n

puq “ log pu{t

n

q , R

n

puq “ F s

^pkq

puq{ F s

^pkq

pt

n

q

According to the technical Lemma 9 of the Appendix and, after splitting the integral into ş

`8

0

and ş

`8 tn

, we

14

(16)

can write ż

pψpφ

_n

, uqq

²

dCpuq “ γ

_n,k²

ż

tn

0

dC puq

` ż

`8

tn

´

l

²_n

puqR

²_n

puq `γ

_k²

pu{t

n

q

^´2{γ^k

`2γ

k

l

n

puqR

n

puq pu{t

n

q

^´1{γ^k

¯

dC puq ` opCpt

n

qq, (18) where opCpt

_n

qq in p18q is due to part piiq of Lemma 7 and to the fact that

_n

puq in Lemma 9 converges to 0 uniformly in u. According to the second part of relation p17q, we thus have

ż

pψpφ

_n

, uqq

²

dCpuq “ pγ

_k²

` d

₂

` γ

²_k

d

₀

` 2γ

_k

d

₁

qCpt

_n

q ` opCpt

_n

qq (19) The other terms are treated similarly (using the fact that ψpg

_n

, uq “ 1 when u ď t

_n

, and “ F s

^pkq

puq{ F s

^pkq

pt

_n

q when u ą t

_n

) and we obtain

ż

pψpg

_n

, uqq

²

dC puq “ p1 ` d

₀

q Cpt

_n

q ` opCpt

_n

qq, (20) ż

ψpφ

n

, uqψpg

n

, uq dC puq “ pγ

k

` d

1

` γ

k

d

0

q Cpt

n

q ` opCpt

n

qq. (21) In view of (11), combining p19q, p20q and p21q and using Remark 3 (following Lemma 8) to write that Cpt

n

q „ p1 ´ pq{ H s pt

n

q (as n Ñ 8), this proves the second term in relation (15).

5.1.2. Proof of Lemma 4

We have to prove that, for some δ ą 0 small enough, n E |W

_1,n

|

^2`δ

tends to 0, as n Ñ 8. In the sequel, cst denotes an unspecified absolute positive constant. According to the definition of W

_1,n

, it is clear that

n |W

1,n

|

^2`δ

ď cst v

n^1`δ{2

n

^1`δ

˜

₃

ÿ

j“1

ˇ ˇ ˇ U

_1,n^pjq

ˇ ˇ ˇ

2`δ

` γ

_k^2`δ

3

ÿ

j“1

ˇ ˇ ˇ V

_1,n^pjq

ˇ ˇ ˇ

2`δ

` |γ

n,k

´ γ

k

|

^2`δ

¸

First, we clearly have n

^´1´δ

v

n^1`δ{2

|γ

n,k

´ γ

k

|

^2`δ

ÝÑ 0 as n Ñ 8. Secondly, since V

_1,n^pjq

has the same form as U

_1,n^pjq

, with g

_n

instead of φ

_n

(i.e. without the log factor), we will only prove that there exists some δ ą 0 such that, as n Ñ 8,

n

^´1´δ

v

_n^1`δ{2

E ˇ ˇ ˇ U

_1,n^pjq

ˇ ˇ ˇ

2`δ

“ n

^´δ{2

´

F s

^pkq

pt

_n

q Gpt ¯

_n

q

¯

1`δ{2

E ˇ ˇ ˇ U

_1,n^pjq

ˇ ˇ ˇ

2`δ nÑ8

ÝÑ 0 for j P t1, 2, 3u (22) For j “ 1, we have

E ˇ ˇ ˇ U

_1,n^p1q

ˇ ˇ ˇ

2`δ

“ ż

`8

0

ˇ ˇ ˇ ˇ

φ

n

pzq Gpzq ¯ ˇ ˇ ˇ ˇ

2`δ

dH

^p1,kq

pzq “ p F s

^pkq

pt

n

qq

^´2´δ

ż

`8

t_n

p Gpzqq s

^´2´δ

plogpz{t

n

qq

^2`δ

GpzqdF ¯

^pkq

pzq

“

´

F s

^pkq

pt

n

q Gpt ¯

n

q

¯

´1´δ

ż

`8 t_n

plogpz{t

n

qq

^2`δ

ˆ Gpt ¯

n

q Gpzq ¯

˙

^1`δ

dF

^pkq

pzq F s

^pkq

pt

n

q .

Applying part piiq of Lemma 7 for α “ 2 ` δ, a “ p1 ` δq{γ

C

and b “ ´1{γ

k

(with δ sufficiently small so that a ` b “ 1{γ

C

´ 1{γ

k

` δ{γ

C

is kept ă 0), and using the fact that v

n

“ n F s

^pkq

pt

n

q Gpt s

n

q Ñ 8, this ends the proof of (22) for j “ 1.

For j “ 2, we have

E ˇ ˇ ˇ U

_1,n^p2q

ˇ ˇ ˇ

2`δ

“ ż

`8

0

ˇ ˇ ˇ ˇ

ψpφ

n

, zq Hpzq ¯

ˇ ˇ ˇ ˇ

2`δ

FpzqdGpzq. s

By definition of ψ, φ

n

, and γ

n,k

, we have ψpφ

n

, zq “ γ

n,k

when z ď t

n

. Therefore, splitting the integral above into two integrals ş

t_n

0

and ş

`8

t_n

we obtain E

ˇ ˇ ˇ U

_1,n^p2q

ˇ ˇ ˇ

2`δ

“ I

1

pt

n

q ` I

2

pt

n

q, where, on one hand,

I

₁

pt

_n

q “ pγ

_n,k

q

^2`δ

ż

tn

0

F ¯ pzqdGpyq

p Hpzqq ¯

^2`δ

“ pγ

_n,k

q

^2`δ

ż

tn

0

dCpzq

p Hpzqq ¯

^δ

ď pγ

_n,k

q

^2`δ

Cpt

n

q

p Hpt ¯

n

qq

^δ