• Aucun résultat trouvé

Estimation of extreme quantiles from heavy and light tailed distributions

N/A
N/A
Protected

Academic year: 2022

Partager "Estimation of extreme quantiles from heavy and light tailed distributions"

Copied!
23
0
0

Texte intégral

(1)

HAL Id: hal-00627964

https://hal.archives-ouvertes.fr/hal-00627964v3

Submitted on 19 Apr 2012

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Estimation of extreme quantiles from heavy and light tailed distributions

Jonathan El Methni, Laurent Gardes, Stéphane Girard, Armelle Guillou

To cite this version:

Jonathan El Methni, Laurent Gardes, Stéphane Girard, Armelle Guillou. Estimation of extreme

quantiles from heavy and light tailed distributions. Journal of Statistical Planning and Inference,

Elsevier, 2012, 142 (10), pp.2735-2747. �10.1016/j.jspi.2012.03.025�. �hal-00627964v3�

(2)

Estimation of extreme quantiles from heavy and light tailed distributions

Jonathan El Methni

(1)

, Laurent Gardes

(2)

, St´ ephane Girard

(1)

& Armelle Guillou

(2)

(1)Team Mistis, INRIA Rhˆone-Alpes & LJK, Inovall´ee, 655, av. de l’Europe, Montbonnot, 38334 Saint-Ismier cedex, France

(2) Universit´e de Strasbourg & CNRS, IRMA, UMR 7501, 7 rue Ren´e Descartes, 67084 Strasbourg cedex, France

Abstract

In [18], a new family of distributions is introduced, depending on two parametersτ andθ, which encompasses Pareto-type distributions as well as Weibull tail-distributions. Estimators forθand extreme quantiles are also proposed, but they both depend on the unknown param- eterτ, making them useless in practical situations. In this paper, we propose an estimator of τwhich is independent ofθ. Plugging our estimator ofτ in the two previous ones allows us to estimate extreme quantiles from Pareto-type and Weibull tail-distributions in an unified way.

The asymptotic distributions of our three new estimators are established and their efficiency is illustrated on a small simulation study and on a real data set.

AMS Subject Classifications: 62G05, 62G20, 62G30.

Keywords: Weibull tail-distributions, Pareto-type distributions, extreme quantile, maxi- mum domain of attraction, asymptotic normality.

1 Introduction

Let X1, . . . , Xn be a sequence of independent and identically distributed random variables with a cumulative distribution function F and let X1,n ≤ · · · ≤ Xn,n denote the order statistics as- sociated to this sample. The Gnedenko theorem [20] insures that for a large class of cumulative distribution functions, the maximumXn,n (after proper renormalization) converges in distribution to an extreme-value distribution with shape parameter γ. Depending on its sign, three possible maximum domains of attraction forF are possible: Fr´echet (γ >0), Gumbel (γ= 0) and Weibull (γ <0). Since distributions in the Weibull maximum domain of attraction have a finite right tail, in most applications this maximum domain of attraction is not relevant. In this paper, we focus on the Fr´echet and Gumbel maximum domains of attraction.

Distributions in the Fr´echet maximum domain of attraction can be characterized through their survival functionF = 1−F asF(x) =x−1/γL(x) whereγ >0 andLis a slowly varying function at infinityi.e. L(λx)/L(x)→1 asx→ ∞for allλ≥1.F is said to be regularly varying at infinity with index −1/γ. This property is denoted by F ∈ R−1/γ, (see [6] for more details on regular variations theory) andF is called a Pareto-type distribution. To make inference on the distribution tail, most approaches consist of using thekn upper order statisticsXn−kn+1,n≤ · · · ≤Xn,n since the tail information is only contained in the extreme upper part of the sample. Here, (kn) is an

(3)

intermediate sequence of integersi.e. such that

n→∞lim kn =∞and lim

n→∞kn/n= 0.

A large part of the extreme-value literature is devoted to the estimation of the tail-index γ >0, the most known estimator being the Hill estimator [24]. It can be equivalently defined in terms of log-excesses log(Xn−i+1,n/Xn−kn+1,n) or in terms of log-spacings log(Xn−i+1,n/Xn−i,n):

Hn(kn) = 1 kn−1

kn−1

X

i=1

log

Xn−i+1,n Xn−kn+1,n

= 1

kn−1

kn−1

X

i=1

ilog

Xn−i+1,n Xn−i,n

. (1)

At the opposite, there is no simple representation for distributions in the Gumbel maximum domain of attraction. However, an interesting subset of the Gumbel maximum domain of attrac- tion is the Weibull tail-distributions family. It encompasses for instance Weibull, Gaussian and Gamma distributions. Let us recall that a cumulative distribution functionF has a Weibull tail if F(x) = exp (−H(x)), whereH(t) := inf{x, H(x)≥t} ∈ Rβ. The functionHis the so-called generalized inverse of H. The tail of such distributions is driven by the shape parameter β > 0 called the Weibull tail-coefficient. Many papers are dedicated to the estimation of the Weibull tail-coefficient. A first family of approaches [2, 3, 7, 13] is based on the log-excesses while a second one relies on the log-spacings [5, 11, 16, 17, 19, 21, 22]. All these estimators are thus similar to the Hill statistic (1).

In order to understand the similarity between most estimators of the Weibull tail-coefficient and the Hill estimator, a new family of distributions has been proposed in [18]. These distributions depend on two parametersτ∈[0,1] andθ >0. More specifically, lettingKx(y) =Ry

1 ux−1duwhere x∈R, the considered family of survival distribution functions is given by:

(A1(τ, θ)) F(x) = exp(−Kτ(logH(x))) forx≥x>0 withτ∈[0,1].

Here, H is an increasing function such thatH∈ Rθ whereθ >0. The parameterτ allows us to represent a large panel of distribution tails ranging from Weibull-type tails (in this caseτ= 0 and θcoincides with the Weibull tail-coefficientβ) to distributions belonging to the Fr´echet maximum domain of attraction (in this caseτ = 1 andθcorresponds to the tail indexγ).

In [18], an estimator ofθbased on the Hill statistic is also introduced:

θbn,τ(kn) = Hn(kn)

µτ(log(n/kn)), (2)

with, for allt >0,

µτ(t) = Z

0

(Kτ(x+t)−Kτ(t)) e−xdx,

and an estimator of the extreme quantile xpn=F(pn) withpn →0 asn→ ∞is derived:

bxp

n,bθn,τ =Xn−kn+1,nexp

n,τ(kn) [Kτ(log(1/pn))−Kτ(log(n/kn))]

. (3)

We refer to [14, 26, 15] for applications of extreme quantiles respectively in reliability, hydrology and finance. The asymptotic distributions of bθn,τ and bxp

n,bθn,τ are established in [18]. Let us highlight that estimators (2) and (3) are only of theoretical interest since, in practical situationsτ is unknown, and therefore, they cannot be used.

This paper builds on the work of [18]. We propose an estimator bτn of τ, independent of θ.

Replacing τ by bτn in (2) and (3) yields two new estimators which can be computed in practical

(4)

situations. As a result, we are able to estimate extreme quantiles from Pareto-type and Weibull tail-distributions in an unified way. The asymptotic normality of our three new estimators is also established.

The paper is organized as follows. The estimators are defined in Section 2, their asymptotic properties are established in Section 3. The behavior of the extreme quantile estimator is illustrated on simulated data in Section 4 and on a real data set in Section 5. Proofs of the main results are presented in Section 6 while proofs of auxiliary results are postponed to the Appendix.

2 Definition of the estimators

Let us first describe the construction of an estimator ofτ. Let (kn) and (k0n) withk0n> kn be two intermediate sequences of integers such thatθbn,τ(kn)−→P θandθbn,τ(kn0)−→P θ. It straightforwardly follows that

θbn,τ(kn)

θbn,τ(kn0)= Hn(kn) Hn(k0n)

µτ(log(n/kn0)) µτ(log(n/kn))

−→P 1.

Introducing for all t > t0 > 0 the function defined by ψ(.;t;t0) : R → (−∞,exp(t−t0)) and ψ(x;t, t0) =µx(t)/µx(t0), it follows that

Hn(kn) Hn(kn0)

P ψ(τ; log(n/kn),log(n/kn0)). (4) Moreover, it can be shown (see Lemma 3 in Section 6) thatψ(.; log(n/kn),log(n/k0n)) is a bijection from Rto (−∞, kn0/kn). As a consequence, the following estimator ofτ is considered:

n =

ψ−1H

n(kn)

Hn(k0n); log(n/kn),log(n/k0n)

if HHn(kn)

n(k0n) < kk0n

n

u if HHn(kn)

n(k0n)kk0n

n,

(5) where uis the realization of a standard uniform distribution. In practice, only the first situation has to be considered, since Lemma 5 in Section 6 shows that, for nlarge enough,Hn(kn)/Hn(kn0) is almost surely smaller thankn0/kn. It is thus possible to plugbτn in (2) to obtain a new estimator ofθ:

θbn,

bτn(kn) = Hn(kn) µ

bτn(log(n/kn)). (6)

Similarly, replacingτandθbn,τ by their estimates in (3) yields a new estimator of extreme quantiles:

xbp

n,bθn,bτn =Xn−kn+1,nexp

θbn,bτn(kn) [K

bτn(log(1/pn))−K

bτn(log(n/kn))]

. (7)

The asymptotic behavior of the three new estimators bτn,θbn,τbn and bxp

n,bθn,bτn is established in the next section.

3 Asymptotic properties

As a first result, we establish the consistency ofτbn under the following assumption:

H(t) =tθ`(t) =c tθexp Z x

1

ε(u) u du

(8) withca positive constant andε(s)→0 ass→ ∞. Let us highlight that (8) amounts to supposing that H∈ Rθ and that the slowly varying function at infinity`is normalised, see [6], page 15.

(5)

Proposition 1 Suppose that (A1(τ, θ))holds with `(.) a normalised slowly varying function. If (kn)and(k0n)are two intermediate sequences of integers such thatkn/kn0 →0, then bτn−→P τ.

Let us note that the consistency ofτbn is established for allθ >0 andτ ∈[0,1] in an unified way.

In this sense, the asymptotic behavior of this estimator is more a consequence of the log-spacings property than a tail behavior (which can be exponential forτ= 0 as well as polynomial forτ = 1).

Next, to establish the asymptotic normality of the three estimators (5), (6) and (7), a second-order condition on `is necessary:

(A2(ρ)) There existρ <0 andb(x)→0 such that uniformly locally onλ≥λ0>0 log

`(λx)

`(x)

∼b(x)Kρ(λ), whenx→ ∞, with|b|asymptotically decreasing.

It can be shown that necessarily |b| ∈ Rρ. The second order parameter ρ < 0 tunes the rate of convergence of `(λx)/`(x) to 1. The closer is ρ to 0, the slower is the convergence. Condition (A2(ρ))is the cornerstone in all the proofs of asymptotic normality for extreme value estimators.

It is used in [4, 23, 24] to prove the asymptotic normality of several estimators of the extreme value index. Let log2(.) = log(log(.)). The first theorem establishes the asymptotic normality ofτbn: Theorem 1 Suppose that (A1(τ, θ)) and (A2(ρ)) hold. If (kn) and (kn0) are two intermediate sequences of integers such that kn/k0n→0 and

pk0nb(expKτ(logn/kn0))→0, (9) pkn(log2(n/kn)−log2(n/kn0))→ ∞, (10) log(n/k0n) (log2(n/kn)−log2(n/kn0))→ ∞, (11) then

pkn(log2(n/kn)−log2(n/kn0)) (τbn−τ)−→ Nd (0,1).

Condition (9) is standard in extreme-value theory. It imposes that the bias induced by the slowly varying function is asymptotically negligible. Condition (10) forces the speed of convergence ofτbn

to tend to infinity. Finally, (11) is of the same nature as (10): It imposes some minimal spacing between the two sequences (kn) and (k0n). Besides, if τ = 0, conditions (9) and (10) imply that xb(x)→0 asx→ ∞. As a consequence, ifτ= 0 andρ >−1, it is not possible to choose sequences (kn) and (k0n) satisfying the above assumptions. In such a case, only the consistency ofbτn can be guaranteed.

In our next result, it is established thatθbn,

bτn inherits from the asymptotic normality of bτn. Theorem 2 Suppose the assumptions of Theorem 1 hold. If, moreover,

(log2(n/kn)−log2(n/k0n))/log2(n/kn)→0, (12) pkn(log2(n/kn)−log2(n/k0n))/log2(n/kn)→ ∞, (13)

then √

kn(log2(n/kn)−log2(n/k0n)) log2(n/kn)

θbn,bτn(kn)−θ d

−→ N(0, θ2).

(6)

It appears that the estimation of τ has a cost in terms of rates of convergence. Condition (12) implies thatbθn,τbn converges slower thanθbn,τ, see Lemma 7 in Section 6. As previously, (13) forces the speed of convergence of bθn,

bτn(kn) to tend to infinity. Note that this condition implies (10) in Theorem 1. Similarly as for Theorem 1, it can be shown that, if τ = 0, (9) and (13) imply xlog(x)b(x)→ 0 as x→ ∞. Thus, again no sequences (kn) and (kn0) exist in case τ = 0 and ρ > −1. Now, if τ ∈ (0,1] or if τ = 0 and ρ < −1, a possible choice for the two intermediate sequences is log(kn) =aKτ(log(n)) and log(k0n) =a0Kτ(log(n)) with the following restrictions on (a, a0)∈R:

0< a < a0<2ρ/(2ρ−1) if τ= 1 0< a < a0 <−2ρ if 0< τ <1 2< a < a0 <−2ρ if τ= 0.

(14)

Finally, in the case where τ= 0 andρ=−1, the existence of sequences (kn) and (k0n) depends on the underlying distribution.

The last result is dedicated to the asymptotic distribution of the extreme quantile estimator.

Theorem 3 Suppose the assumptions of Theorem 2 hold. If, moreover,

pkn(log2(n/kn)−log2(n/kn0)) /log2(1/pn)→ ∞, (15) (log(n/kn))1−τ[Kτ(log(1/pn))−Kτ(log(n/kn))]→ ∞, (16)

log2(n/kn)[Kτ(log(1/pn))−Kτ(log(n/kn))]

,Z log(1/pn) log(n/kn)

log(u)uτ−1du →0, (17)

then √

kn(log2(n/kn)−log2(n/k0n)) Rlog(1/pn)

log(n/kn)log(u)uτ−1du

bxp

n,bθn,bτn

xpn

−1

!

−→ Nd (0, θ2).

A sufficient condition to verify (16) and (17) is log2(1/pn)/log2(n/kn) → ∞. This imposes an upper bound on the order pn of the extreme quantile. At the opposite, condition (15) provides a lower bound on pn. A possible choice for the order pn of the extreme quantile is given by log2(1/pn) = [log2(n)]α for all α >1. The finite sample performances of xbp

n,bθn,bτn are illustrated in the next section.

4 A small simulation study

In this section, our extreme quantile estimator is compared to the Moment estimator of Dekkers et al. [9] and the Peaks Over Threshold (POT), see for instance [8], on simulated data. The POT approach relies on the approximation of the excesses distribution, over a high threshold, by a Generalized Pareto Distribution (GPD). Among the numerous methods available for estimating the GPD parameters, we focus on the moments method which yields the best results in our simulations.

Let us emphasize that the Moment and POT estimators are designed to work in any domain of attraction. Let us recall that the extreme quantile estimator (7) requires the numerical inversion of the function ψ. This computation is achieved thanks to a dichotomy procedure since Lemma 3(i) ensures thatψis increasing.

(7)

The comparison is achieved on twelve different distributions:

• five Pareto-type distributions (τ = 1): the absolute value of the standard Cauchy distribution (θ = 1 andρ=−2), standard Pareto distribution (θ= 2 and ρ=−∞), the absolute value of the Student distribution with two degrees of freedom (θ= 1/2 andρ=−2) and two Burr distributions (θ = 1/2 and ρ ∈ {−1,−1/2}) with quantile function F(x) = (xρ−1)−θρ, x∈(0,1),

• five Weibull tail-distributions (τ= 0): the absolute value of the standard Gaussian distribu- tion (θ = 1/2 and ρ= −1), a Weibull distribution with 2 as shape parameter (θ = 2 and ρ=−∞), a Gamma distribution with 2 as shape parameter (θ = 1 and ρ=−1) and two distributions with quantile functionF(x) = (−logx)θ(1 + (ρ+θ)(−logx)ρ),x∈(0,1) with θ= 1/2 andρ∈ {−1/2,−1/4},

• two log-Weibull tail distributions (τ = 1/2), see [18], paragraph 2.2: the standard lognormal distribution (θ = √

2/2 and ρ = 0) and the distribution with quantile function F(x) = exp{√

2[(−logx)1/2−1]},x∈(0,1) for whichθ=√

2/2 andρ=−∞.

These distributions represent various situations (τ ∈ {0,1/2,1}, θ ∈ {1/2,√

2/2,1,2} and ρ ∈ {−∞,−2,−1,−1/2,−1/4,0}) in which the asymptotic normality of our estimator is not always established. In the following, we take pn = 10−3 and simulate N = 100 samples (Xn,j)j=1,...,N

of size n = 500. For each sample Xn,j, the estimator bxp

n,bθn,bτn is computed for kn0 = 3, . . . ,500 and kn =bck0ncwith c = 0.1. This value has been chosen on the basis of intensive Monte-Carlo simulations. With such a choice, our estimator and the Moment and POT estimators depend on only one intermediate sequence of integers (k0n). For all the considered distributions, the mean- squared errors associated to the estimators are computed as functions of k0n and are reported on Figures 1–3. It appears that our estimator outperforms the Moment and POT estimators for almost all the values ofkn0. Also forkn0 ≥50, the mean-squared errors associated to our estimator are almost constant as a function ofk0n, whatever the distribution is.

5 A real data set

The performance of our estimator (7) is illustrated through the analysis of extreme events on the Nidd river data set, which is common in extreme values studies, see for instance [8]. It consists of 154 exceedances of the levels 65m3s−1 by the river Nidd (Yorkshire, England) during the period 1934-1969 (35 years). There is no general agreement on a maximum domain of attraction for this data set. In [10], a Fr´echet maximum domain of attraction is assumed and heavy tailed- distributions are considered as a possible model for such data. However, according to [25], the Nidd data may reasonably be assumed to come from a distribution in the Gumbel maximum domain of attraction. This result was in accordance with [12] where it has been shown that a Weibull tail-distribution could be considered for such a data. The estimation ofτ is therefore of great interest. The estimated values ofτ andθare depicted on Figure 4 as functions of kn0 (recall that kn =b0.1k0nc). It appears that the estimators become stable for kn0 ≥80 with bτn ' 1 and θbn,τbn(kn)'0.3. These results indicate that the data may be assumed to come from a distribution in the Fr´echet maximum domain of attraction. They are in accordance with the ones obtained by Bayesian methods [10], Figure 7 where the tail index is also estimated at 0.3.

The standard quantity of interest in environmental studies is theN-year return level, defined as the level which is exceeded on average once inN years. Here, we focus on the estimation of the 50- and 100- year return levels. In Figure 4, our extreme quantile estimator is compared to the Moment and POT estimators, plotting the associated N-year return level as a function of kn0 forN = 50

(8)

andN = 100. It appears that our estimator and the Moment estimator yield similar curves. The POT sample path is different, but for all methods, choosing kn0 '60, we obtain an estimation of the 50-year return level which belongs approximately to the interval [340m3s−1,375m3s−1], and an estimation of the 100-year return in [400m3s−1,470m3s−1]. Again, these results are in accordance with the the credibility intervals obtained in [10], Table 1.

6 Proofs

We first give some preliminary lemmas. Their proofs are postponed to the appendix.

6.1 Preliminary lemmas

In the following, C is a compact subset such that [0,1] ⊂ C ⊂ (−∞,2). The first lemma is a standard result on the behavior at infinity of the Laplace transform.

Lemma 1 Let x∈C andhx∈C(R+). Seti= min{j∈N/h(j)x (0)6= 0}. If sup

x∈C y≥0

h(i+1)x (y) <∞,

then

t→∞lim sup

x∈C

ti+1ehx(t)−h(i)x (0) = 0, withehx(t) =R+∞

0 exp(−tu)hx(u)du is the Laplace transform ofhx.

Lemma 1 is the key tool for establishing the uniform asymptotic expansions of µx and ∂µx/∂x given in Lemma 2:

Lemma 2 For all x∈C andt >0, we have (i) lim

t→∞sup

x∈C

µx(t) tx−1 −1

= 0, (ii) lim

t→∞sup

x∈C

∂xµx(t)−log(t)µx(t)

tx−2 −1

= 0.

As a consequence of the above expansions, some important properties of ψcan be derived in the two next lemmas:

Lemma 3 For all t > t0 >0 andx∈R, we have (i)x→ψ(x;t, t0)is an increasing function, (ii) lim

x→∞ψ(x;t, t0) = exp(t−t0).

Lemma 4 For all x∈C andt > t0 >0, we have

∂xψ(x;t, t0) = log(t/t0)ψ(x;t, t0)

1 +O

1 t0log(t/t0)

, as t0→ ∞.

The following lemma ensures that the estimator τbn is well-defined. In (5), the situation where Hn(kn)/Hn(k0n)≥k0n/kn is eventually impossible.

(9)

Lemma 5 Let (kn) and (kn0) be two intermediate sequences of integers such that kn/kn0 →0. If θbn,τ(kn)/bθn,τ(k0n)−→P 1 then

P

Hn(kn) Hn(kn0) ≥ kn0

kn

→0asn→ ∞.

We now prove that θbn,τ(kn) is a consistent estimator for θ when τ is known and under general assumptions.

Lemma 6 Suppose that (A1(τ, θ)) holds with a normalised slowly varying function `(.). If (kn) is an intermediate sequence of integers, then θbn,τ(kn)−→P θ.

The asymptotic normality of bθn,τ andxbp

n,bθn,τ has already been established in the particular case whereτ is known by [18]. Let us quote two results from this work.

Lemma 7 (Theorem 1, [18]). Suppose that(A1(τ, θ))and(A2(ρ))hold. If (kn)is an intermedi- ate sequence such that√

knb(expKτ(log(n/kn)))→0, then pkn

θbn,τ(kn) θ −1

!

−→ Nd (0,1). (18)

Lemma 8 (Theorem 2, [18]). Under the assumptions of Lemma 7 and if, moreover, (log(n/kn))1−τ(Kτ(log(1/pn))−Kτ(log(n/kn)))→ ∞,

then,

√kn

Kτ(log(1/pn))−Kτ(log(n/kn))log bxp

n,bθn,τ

xpn

!

−→ Nd (0, θ2).

The next lemma establishes that θ can be replaced by θbn,τ(kn0) in (18) without changing the asymptotic distribution.

Lemma 9 Suppose that (A1(τ, θ)) and (A2(ρ)) hold. Let (kn) and (kn0) be two intermediate sequences of integers such that p

kn0b(expKτ(logn/kn0))→0 andkn/kn0 →0. We have pkn θbn,τ(kn)

θbn,τ(kn0)−1

!

−→ Nd (0,1).

The last lemma quantifies the effect of estimatingτ in θbn,τ(kn).

Lemma 10 Suppose that (A1(τ, θ)) and (A2(ρ)) hold. Let (kn) and (kn0) be two intermediate sequences of integers such that

kn/kn0 →0, p

kn(log2(n/kn)−log2(n/k0n))/log2(n/kn)→ ∞,

pkn0b(expKτ(logn/k0n))→0 and log(n/kn0) (log2(n/kn)−log2(n/k0n))→ ∞.

We have √

kn(log2(n/kn)−log2(n/kn0)) log2(n/kn)

θbn,

bτn(kn) θbn,τ(kn) −1

!

−→ Nd (0,1).

(10)

6.2 Proofs of the main results

Proof of Proposition 1−For allε >0, let us write

P(|bτn−τ|> ε) =P(bτn > τ+ε) +P(τbn< τ −ε) and consider the two terms separately. Clearly, one has

P(bτn > τ+ε) = P

{τbn> τ +ε}

Hn(kn) Hn(k0n)≥ k0n

kn P

Hn(kn) Hn(k0n) ≥k0n

kn

+ P

{τbn> τ +ε} ∩

Hn(kn) Hn(k0n) <k0n

kn

≤ P

Hn(kn) Hn(k0n) ≥kn0

kn

+P

Hn(kn)

Hn(k0n) ≥ψ(τ+ε; log(n/kn),log(n/k0n))

.(19) Focusing on the first probability of (19), note that,θbn,τ(kn) andθbn,τ(k0n) are consistent estimators forθ in probability by Lemma 6. Thusbθn,τ(kn)/bθn,τ(kn0)−→P 1 and Lemma 5 entail

P

Hn(kn) Hn(k0n) ≥kn0

kn

→0.

The second probability in (19) tends to 0 by (4) and Lemma 3(i). Similarly, we can prove that P(bτn < τ−ε)→0 which achieves the proof.

Proof of Theorem 1 − Letting vn = √

kn(log2(n/kn)−log2(n/kn0)), our goal is to prove that P(vn(bτn−τ)≤s) → Φ(s), for all s ∈ R where Φ is the cumulative distribution function of the standard Gaussian distribution. To this aim, let us first remark that Lemma 9 yields θbn,τ(kn)/bθn,τ(k0n)−→P 1. Thus, introducingEn(s) ={vn(τbn−τ)≤s}, we have

P(En(s)) = P

En(s)∩

Hn(kn) Hn(k0n) <k0n

kn

+P

En(s)

Hn(kn) Hn(kn0) ≥ k0n

kn P

Hn(kn) Hn(k0n)≥ k0n

kn

=: Tn(1)(s) +o(1),

by Lemma 5. From the definition of τbn given in (5) and recalling that, from Lemma 3(i), ψ(.; log(n/kn),log(n/kn0)) is an increasing function, we obtain

Tn(1)(s) = P

Hn(kn)

Hn(kn0) ≤ψ(τ+s/vn; log(n/kn),log(n/kn0))

Hn(kn) Hn(k0n) <kn0

kn

= P

Hn(kn) Hn(kn0) ≤min

ψ(τ+s/vn; log(n/kn),log(n/kn0)),kn0 kn

.

Lemma 3(ii) thus yields

Tn(1)(s) = P

Hn(kn)

Hn(kn0) ≤ψ(τ+s/vn; log(n/kn),log(n/kn0))

= P

Hn(kn)

Hn(kn0) ≤ µτ+s/vn(log(n/kn)) µτ+s/vn(log(n/k0n))

,

and, from (2) and the fact that ζn:=p

kn

θbn,τ(kn) θbn,τ(k0n)−1

!

−→ Nd (0,1),

(11)

(see Lemma 9), we have Tn(1)(s) = P

ζn ≤p

kn

µτ(log(n/kn0)) µτ(log(n/kn))

µτ+s/vn(log(n/kn)) µτ+s/vn(log(n/kn0))−1

= P

ζn ≤p

kn

µτ(log(n/k0n))

µτ(log(n/kn))[ψ(τ+s/vn; log(n/kn),log(n/kn0))−ψ(τ; log(n/kn),log(n/kn0))]

.

A first order Taylor expansion leads to Tn(1)(s) = P

ζn ≤ s

(log2(n/kn)−log2(n/kn0))

µτ(log(n/kn0)) µτ(log(n/kn))

∂xψ(τ0; log(n/kn),log(n/k0n))

, where τ0=τ+sη0/vn with η0 ∈(0,1). Since τ0 →τ, for nlarge enough, τ0 <2 and Lemma 4 entails

Tn(1)(s) = P

ζn ≤sµτ0(log(n/kn)) µτ0(log(n/k0n))

µτ(log(n/kn0)) µτ(log(n/kn))

1 +O

1

log(n/k0n) (log2(n/kn)−log2(n/kn0))

= P

ζn ≤sµτ0(log(n/kn)) µτ0(log(n/k0n))

µτ(log(n/kn0))

µτ(log(n/kn))(1 +o(1))

.

Besides, Lemma 2(i) entails that sµτ0(log(n/kn))

µτ0(log(n/kn0))

µτ(log(n/kn0)) µτ(log(n/kn)) =s

log(n/kn) log(n/kn0)

τ0−τ

(1 +o(1)) =sexp sη0

√kn

(1 +o(1)) −→

n→∞s, and thus

Tn(1)(s) =P(ζn≤s(1 +o(1)))≤Φ(s) + sup

x∈R

|Φ(x)−P(ζn≤x)| →Φ(s) by [15], page 552. This achieves the proof of Theorem 1.

Proof of Theorem 2−On the first hand, Lemma 7 shows that θbn,τ(kn)−θ= ζnθ

√kn where ζn

−→ Nd (0,1), and on the other hand, Lemma 10 entails

θbn,

τbn(kn)

θbn,τ(kn) −1 = δnlog2(n/kn)

√kn(log2(n/kn)−log2(n/kn0)) where δn −→ Nd (0,1). (20) Collecting these results, it follows that

θbn,

τbn(kn)−θ = θbn,τ(kn) θbn,bτn(kn) θbn,τ(kn) −1

! +

θbn,τ(kn)−θ

= θbn,τ(kn) δnlog2(n/kn)

√kn(log2(n/kn)−log2(n/k0n))+ ζnθ

√kn. Consequently, one immediately has

√kn(log2(n/kn)−log2(n/kn0)) log2(n/kn)

θbn,

bτn(kn)−θ

= θbn,τ(knnnθlog2(n/kn)−log2(n/k0n) log2(n/kn) and Theorem 2 follows.

(12)

Proof of Theorem 3−Let us consider the following expansion:

log xbp

n,bθn,bτn

xpn

!

= log xbp

n,bθn,bτn

bxp

n,bθn,τ

!

+ log bxp

n,bθn,τ

xpn

!

=:Tn(2)+Tn(3). By (3), we have

Tn(2) = θbn,

bτn(kn)[K

bτn(log(1/pn))−K

τbn(log(n/kn))]−θbn,τ(kn)[Kτ(log(1/pn))−Kτ(log(n/kn))]

= [Kτ(log(1/pn))−Kτ(log(n/kn))]

θbn,bτn(kn)−θbn,τ(kn) +bθn,

τbn(kn) [(K

bτn(log(1/pn))−K

τbn(log(n/kn)))−(Kτ(log(1/pn))−Kτ(log(n/kn)))]

=: Tn(2,1)+Tn(2,2).

Focusing on the first term, (20) leads to

Tn(2,1)= log2(n/kn)[Kτ(log(1/pn))−Kτ(log(n/kn))]

√kn(log2(n/kn)−log2(n/kn0)) bθn,τ(knn

which implies that

√kn(log2(n/kn)−log2(n/k0n)) Rlog(1/pn)

log(n/kn)uτ−1log(u)du

Tn(2,1)nlog2(n/kn)[Kτ(log(1/pn))−Kτ(log(n/kn))]

Rlog(1/pn)

log(n/kn)uτ−1log(u)du

θbn,τ(kn) =oP(1).

Let us now consider Tn(2,2). Theorem 1 states that bτn=τ+ ξn

√kn(log2(n/kn)−log2(n/k0n)) =:τ+ξnσn where ξn−→ Nd (0,1).

Replacing inTn(2,2), we obtain Tn(2,2)=bθn,

τbn(kn)[(Kτ+σnξn(log(1/pn))−Kτ+σnξn(log(n/kn)))−(Kτ(log(1/pn))−Kτ(log(n/kn)))].

By definition

Kτ(log(1/pn))−Kτ(log(n/kn)) =

Z log(1/pn) log(n/kn)

uτ−1du,

and therefore it immediately follows that Tn(2,2)=bθn,

τbn(kn)

Z log(1/pn) log(n/kn)

uτ+σnξn−1−uτ−1

du=θbn,

bτn(kn)

Z log(1/pn) log(n/kn)

uτ−1 uσnξn−1 du.

Lettingϕ(x) := exp(x)−1−x, we deduce that Tn(2,2)=θbn,τbn(knnξn

Z log(1/pn) log(n/kn)

uτ−1log(u)du+θbn,bτn(kn)

Z log(1/pn) log(n/kn)

uτ−1ϕ(σnξnlog(u))du.

Now, there exists c > 0 such that x < log(c) implies |ϕ(x)| < c2x2. As a consequence, since σnlog2(1/pn)→0 andσnlog2(n/kn)→0, fornlarge enough, we have

Z log(1/pn) log(n/kn)

uτ−1ϕ(σnξnlog(u))du

Z log(1/pn) log(n/kn)

uτ−1|ϕ(σnξnlog(u))|du

≤ c 2σn2ξ2n

Z log(1/pn) log(n/kn)

uτ−1[log(u)]2du.

(13)

Thus,

Rlog(1/pn)

log(n/kn)uτ−1ϕ(σnξnlog(u))du σnRlog(1/pn)

log(n/kn)uτ−1log(u)du

≤ cσnξ2nRlog(1/pn)

log(n/kn)uτ−1[log(u)]2du 2Rlog(1/pn)

log(n/kn)uτ−1log(u)du

≤ cξn2log2(1/pn) 2√

kn(log2(n/kn)−log2(n/k0n)) =oP(1), and, replacing inTn(2,2), we obtain

√kn(log2(n/kn)−log2(n/kn0)) Rlog(1/pn)

log(n/kn)log(u)uτ−1du

Tn(2,2)nθbn,

bτn(kn) +oP(1)−→ Nd (0, θ2).

Finally, Lemma 8 states that

Tn(3)= %n[Kτ(log(1/pn))−Kτ(log(n/kn))]

√kn where %n

−→ Nd (0, θ2), and thus, under our assumptions,

√kn(log2(n/kn)−log2(n/kn0)) Rlog(1/pn)

log(n/kn)uτ−1log(u)du

Tn(3)=oP(1).

Combining the above results and using the delta method, Theorem 3 follows.

Acknowledgments

The authors are grateful to the referees for their helpful comments.

References

[1] Artin, E., (1964),The gamma function, Holt, Rinehart and Winston.

[2] Beirlant, J., Bouquiaux, C., Werker, B., (2006), Semiparametric lower bounds for tail index estimation,Journal of Statistical Planning and Inference,136, 705–729.

[3] Beirlant, J., Broniatowski, M., Teugels, J.L., Vynckier, P., (1995), The mean residual life function at great age: Applications to tail estimation, Journal of Statistical Planning and Inference, 45, 21–48.

[4] Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., (1999), Tail index estimation and an exponential regression model,Extremes, 2, 177–200.

[5] Beirlant, J., Teugels, J., Vynckier, P., (1996), Practical analysis of extreme values, Leuven university press, Leuven.

[6] Bingham, N.H., Goldie, C.M., Teugels, J.L., (1987),Regular Variation, Cambridge University Press.

[7] Broniatowski, M., (1993), On the estimation of the Weibull tail coefficient,Journal of Statis- tical Planning and Inference, 35, 349–366.

(14)

[8] Davison, A., Smith, R. (1990), Models for exceedances over high thresholds,Journal of the Royal Statistical Society Series B,52, 3, 393–442.

[9] Dekkers, A. L. M., Einmahl, J. H. J., de Haan, L., (1989), A moment estimator for the index of an extreme-value distribution,The Annals of Statistics,17, 1833–1855.

[10] Diebolt, J., El-Aroui, M., Garrido, M., Girard, S., (2005), Quasi-Conjugate Bayes Estimates for GPD Parameters and Application to Heavy Tails Modeling,Extremes, 8, 57–78.

[11] Diebolt, J., Gardes, L., Girard, S., Guillou, A., (2008), Bias-reduced estimators of the Weibull tail-coefficient,Test,17, 311–331.

[12] Diebolt, J., Gardes, L., Girard, S., Guillou, A., (2008), Bias-reduced extreme quantiles es- timators of Weibull tail-distributions, Journal of Statistical Planning and Inference, 138, 1389–1401.

[13] Dierckx, G., Beirlant, J., De Waal, D., Guillou, A. (2009), A new estimation method for Weibull-type tails based on the mean excess function,Journal of the Statistical Planning and Inference, 139, 1905–1920.

[14] Ditlevsen, O., (1994), Distribution Arbitrariness in Structural Reliability, Structural Safety and Reliability, 1241–1247, Balkema, Rotterdam.

[15] Embrechts, P., Kl¨uppelberg, C., Mikosch, T., (1997),Modelling extremal events, Springer.

[16] Gardes, L., Girard, S., (2006), Comparison of Weibull tail-coefficient estimators,REVSTAT - Statistical Journal, 4, 163–188.

[17] Gardes, L., Girard, S., (2008), Estimation of the Weibull tail-coefficient with linear combina- tion of upper order statistics,Journal of Statistical Planning and Inference,138, 1416–1427.

[18] Gardes, L., Girard, S., Guillou, A., (2011), Weibull tail-distributions revisited: a new look at some tail estimators,Journal of Statistical Planning and Inference,141, 429–444.

[19] Girard, S., (2004), A Hill type estimate of the Weibull tail-coefficient, Communication in Statistics - Theory and Methods,33(2), 205–234.

[20] Gnedenko, B.V., (1943), Sur la distribution limite du terme maximum d’une s´erie al´eatoire, Annals of Mathematics,44, 423–453.

[21] Goegebeur,Y., Beirlant, J., de Wet, T. (2010), Generalized kernel estimators for the Weibull- tail coefficient,Communications in Statistics - Theory and Methods,39(20), 3695–3716.

[22] Goegebeur, Y., Guillou, A. (2010), Goodness-of-fit testing for Weibull-type behavior,Journal of Statistical Planning and Inference,140, 1417–1436.

[23] H¨ausler, E., Teugels, J.L. (1985), On asymptotic normality of Hill’s estimator for the exponent of regular variation,The Annals of Statistics,13, 743–756.

[24] Hill, B.M., (1975), A simple general approach to inference about the tail of a distribution, The Annals of Statistics,3, 1163–1174.

[25] Hosking, J., Wallis, J., (1987), Parameter and quantile estimation for the generalized Pareto distribution,Technometrics,29, 339–349.

[26] Smith, J., (1991), Estimating the upper tail of flood frequency distributions,Water Resources Research,23(8), 1657–1666.

(15)

Appendix: Proof of auxiliary results

Proof of Lemma 1−A (i+ 1)thorder Taylor expansion leads to hx(u) =ui

i!h(i)x (0) + ui+1

(i+ 1)!h(i+1)x (ηu), withη∈(0,1) and consequently,

ehx(t) = Z

0

exp(−tu)ui

i!h(i)x (0)du+ Z

0

exp(−tu) ui+1

(i+ 1)!h(i+1)x (ηu)du=:Tx(1)(t) +Tx(2)(t).

It follows that

Tx(1)(t) = h(i)x (0) i!

Z 0

uiexp(−tu)du=h(i)x (0) ti+1 , and the change of variable v=tuyields

Tx(2)(t) = 1 (i+ 1)!

1 ti+2

Z 0

exp(−v)vi+1h(i+1)x ηv

t

dv.

Therefore, we have, uniformly inx∈C:

|Tx(2)(t)| ≤sup

z∈Cy≥0

h(i+1)z (y)

1 (i+ 1)!

1 ti+2

Z 0

exp(−v)vi+1dv= 1 ti+2 sup

z∈Cy≥0

h(i+1)z (y) =O

1 ti+2

and thus

ehx(t) = h(i)x (0) ti+1 +O

1 ti+2

= 1 ti+1

h(i)x (0) +O 1

t

, which achieves the proof.

Proof of Lemma 2−(i) By definition, for all x∈Randt >0, we have µx(t) =

Z 0

(Kx(u+t)−Kx(t)) exp(−u)du= Z

0

Z u+t t

yx−1dy

exp(−u)du.

Using Fubini’s theorem, it follows µx(t) =

Z t

yx−1 Z

y−t

exp(−u)du

dy= exp(t) Z

t

yx−1exp(−y)dy, (21) and the change of variable u=y/t−1 yields

µx(t) =tx Z

0

exp(−tu)(u+ 1)x−1du=txehx(t)

withhx(t) = (t+ 1)x−1. Applying Lemma 1 withi= 0 concludes the proof of (i).

(ii) From (21) we obtain, for x∈Randt >0, µx(t) = exp(t)

Z t

yx−1exp(−y)dy= exp(t)Γ(x, t), (22) where Γ(x, t) is the upper incomplete gamma function. We thus have (see for instance [1])

∂xµx(t) = exp(t) Z

t

exp(−y) log(y)yx−1dy, (23)

(16)

and the change of variable u=y/t−1 yields

∂xµx(t) = exp(t) Z

0

exp(−tu) exp(−t)tx−1(u+ 1)x−1log(t(u+ 1))tdu

= tx Z

0

exp(−tu)(u+ 1)x−1log(u+ 1)du+ log(t)tx Z

0

exp(−tu)(u+ 1)x−1du

=: tx Z

0

exp(−tu)gx(u)du+ log(t)µx(t),

withgx(u) := (u+ 1)x−1log(u+ 1). Applying Lemma 1 withi= 1, the conclusion follows.

Proof of Lemma 3 − (i) First of all, remark thatψ(x;t, t0) >0 for allx∈ Rand t > t0 >0.

Besides, routine calculations show that

∂xψ(x;t, t0) = ψ(x;t, t0)

∂/∂x(µx(t))

µx(t) −∂/∂x(µx(t0)) µx(t0)

(24)

=: ψ(x;t, t0) (Qx(t)−Qx(t0)),

where, by (21) and (23),

Qx(z) = R

z log(y)yx−1exp(−y)dy R

z yx−1exp(−y)dy . Let us remark that Qxis an increasing function on (0,∞) since

Q0x(z) = zx−1exp(−z)R

z yx−1exp(−y) log(y/z)dy R

z yx−1exp(−y)dy2 >0.

As a consequence t > t0 implies ∂/∂x(ψ(x;t, t0)) =ψ(x;t, t0)(Qx(t)−Qx(t0))> 0 and concludes the first part of the proof.

(ii) From (22), we have 1

ψ(x;t, t0)

exp(−t0)

exp(−t) −1 = µx(t0) exp(−t0) µx(t) exp(−t) −1 =

Rt

t0yx−1exp(−y)dy R

t yx−1exp(−y)dy =:Mx(t, t0).

Besides, considering the following inequalities, 0<

Z t t0

yx−1exp(−y)dy < tx−1(exp(−t0)−exp(−t)),

and Z

t

yx−1exp(−y)dy >

Z 2t

yx−1exp(−y)dy >(2t)x−1exp(−2t), it follows that

Mx(t, t0)<21−xexp(−t0)−exp(−t) exp(−2t) , and thusMx(t, t0)→0 asx→ ∞. The conclusion follows.

Références

Documents relatifs

Estimating extreme quantiles of Weibull tail- distributions. Comparison of Weibull

In this chapter we propose inference (i.e. estimation and testing) for vast dimensional elliptical distributions. Estimation is based on quantiles, which always exist regardless of

Bias-reduced extreme quantiles estimators of Weibull tail-distributions. A new estimation method for Weibull-type tails based on the mean

However, the results on the extreme value index, in this case, are stated with a restrictive assumption on the ultimate probability of non-censoring in the tail and there is no

Abstract This paper presents new approaches for the estimation of the ex- treme value index in the framework of randomly censored (from the right) samples, based on the ideas

The quantile function at an arbitrarily high extreme level can then be consistently deduced from its value at a typically much smaller level provided γ can be consistently

In this communication, we first build simple extreme analogues of Wang distortion risk measures and we show how this makes it possible to consider many standard measures of

2 we recall the known results for the sample covariance matrix estimator under light-tailed assumptions, together with some recent high-probability results for an alternative