HAL Id: hal-00008881
https://hal.archives-ouvertes.fr/hal-00008881
Submitted on 20 Sep 2005
Bias-reduced estimators of the Weibull tail-coefficient
Jean Diebolt, Laurent Gardes, Stéphane Girard, Armelle Guillou
To cite this version:
Jean Diebolt, Laurent Gardes, Stéphane Girard, Armelle Guillou. Bias-reduced estimators of the Weibull tail-coefficient. Test, Spanish Society of Statistics and Operations Research/Springer, 2008, 17 (2), pp. 311-331. 10.1007/s11749-006-0034-6. hal-00008881
BIAS-REDUCED ESTIMATORS OF THE WEIBULL TAIL-COEFFICIENT

Jean Diebolt (1), Laurent Gardes (2), Stéphane Girard (3,∗) and Armelle Guillou (4)

(1) CNRS, Université de Marne-la-Vallée, Équipe d'Analyse et de Mathématiques Appliquées, 5 boulevard Descartes, Bâtiment Copernic, Champs-sur-Marne, 77454 Marne-la-Vallée Cedex 2
(2) Université Grenoble 2, LabSAD, 1251 Avenue centrale, B.P. 47, 38040 Grenoble Cedex 9
(3,∗) Corresponding author. Université Grenoble 1, LMC-IMAG, 51 rue des Mathématiques, B.P. 53, 38041 Grenoble Cedex 9. Phone: +(33) 4 76 51 45 53, Fax: +(33) 4 76 63 12 63, Stephane.Girard@imag.fr
(4) Université Paris VI, Laboratoire de Statistique Théorique et Appliquée, Boîte 158, 175 rue du Chevaleret, 75013 Paris
Abstract. In this paper, we consider the problem of estimating a Weibull tail-coefficient θ. In particular, we propose a regression model, from which we derive a bias-reduced estimator of θ based on a least-squares approach. The asymptotic normality of this estimator is also established, and a small simulation study illustrates its efficiency.
Key words and phrases. Weibull tail-coefficient, bias reduction, least-squares approach, asymptotic normality.
AMS Subject classifications. 62G05, 62G20, 62G30.
1. Introduction.
Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables with distribution function $F$, and let $X_{1,n} \leq \ldots \leq X_{n,n}$ denote the order statistics associated to this sample.
In the present paper, we address the problem of estimating the Weibull tail-coefficient $\theta > 0$ defined as
$$1 - F(x) = \exp(-H(x)) \quad \text{with} \quad H^{-1}(x) := \inf\{t : H(t) \geq x\} = x^{\theta} \ell(x), \qquad (1)$$
where $\ell$ is a slowly varying function at infinity satisfying
$$\frac{\ell(\lambda x)}{\ell(x)} \longrightarrow 1, \text{ as } x \to \infty, \text{ for all } \lambda > 0. \qquad (2)$$
Girard (2004) investigated this estimation problem and proposed the following estimator of $\theta$:
$$\tilde{\theta}_n = \frac{\displaystyle\sum_{i=1}^{k_n} \left( \log X_{n-i+1,n} - \log X_{n-k_n+1,n} \right)}{\displaystyle\sum_{i=1}^{k_n} \left( \log\log\frac{n}{i} - \log\log\frac{n}{k_n} \right)}, \qquad (3)$$
where $k_n$ is an intermediate sequence, i.e. a sequence such that $k_n \to \infty$ and $k_n/n \to 0$ as $n \to \infty$.
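Estimator (3) is a ratio of empirical log-excesses to deterministic log-log spacings and can be computed in a few lines. A minimal sketch (the function name and the Weibull simulation are ours, not the paper's):

```python
import math
import random

def girard_estimator(sample, k):
    """Estimator (3) of the Weibull tail-coefficient theta (Girard, 2004).

    sample -- positive observations X_1, ..., X_n
    k      -- number of upper order statistics (the intermediate sequence k_n)
    """
    n = len(sample)
    x = sorted(sample)  # x[0] <= ... <= x[n-1], so X_{n-i+1,n} = x[n-i]
    num = sum(math.log(x[n - i]) - math.log(x[n - k]) for i in range(1, k + 1))
    den = sum(math.log(math.log(n / i)) - math.log(math.log(n / k))
              for i in range(1, k + 1))
    return num / den

# Sanity check on a Weibull sample: if 1 - F(x) = exp(-x**4), then
# H(x) = x**4, H^{-1}(x) = x**(1/4) and theta = 1/4.
random.seed(1)
sample = [random.expovariate(1.0) ** 0.25 for _ in range(500)]
print(girard_estimator(sample, 100))  # close to 0.25
```

Note that the deterministic denominator requires $k_n < n/e$, which is automatic for an intermediate sequence.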
We refer to Beirlant et al. (1995) and Broniatowski (1993) for other propositions and to Beirlant et al. (2005) for Local Asymptotic Normality (LAN) results. Estimator (3) is close in spirit to the Hill estimator (see Hill, 1975) in the case of Pareto-type distributions. In Girard (2004), the asymptotic normality of $\tilde{\theta}_n$ is established under suitable assumptions. To prove such a result, a second-order condition is required in order to specify the bias term. This assumption can be expressed in terms of the slowly varying function $\ell$ as follows:
Assumption $(R_\ell(b,\rho))$. There exist a constant $\rho \leq 0$ and a rate function $b$ satisfying $b(x) \to 0$ as $x \to \infty$, such that for all $\varepsilon > 0$ and $1 < A < \infty$, we have
$$\sup_{\lambda \in [1,A]} \left| \frac{\log\big( \ell(\lambda x)/\ell(x) \big)}{b(x) K_\rho(\lambda)} - 1 \right| \leq \varepsilon, \quad \text{for } x \text{ sufficiently large, with } K_\rho(\lambda) = \int_1^{\lambda} t^{\rho-1}\,dt.$$
It can be shown that necessarily $|b|$ is regularly varying with index $\rho$ (see e.g. Geluk and de Haan, 1987). Moreover, we focus on the case where the convergence (2) is slow, and thus when the bias term in $\tilde{\theta}_n$ is large. This situation is described by the following assumption:
$$x\, b(x) \to \infty \text{ as } x \to \infty. \qquad (4)$$
Let us note that this condition implies $\rho \geq -1$. Gamma and Gaussian distributions fulfill (4), whereas Weibull distributions do not (see Table 1) since, in this case, the bias term vanishes.
Using this framework, we will establish rigorously in Section 2 the following approximation for the log-spacings of upper order statistics:
$$Z_j := j \log\frac{n}{j} \left( \log X_{n-j+1,n} - \log X_{n-j,n} \right) \approx \left( \theta + b\left( \log\frac{n}{k_n} \right) \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} \right) f_j, \qquad (5)$$
for $1 \leq j \leq k_n$, where $(f_1, \ldots, f_{k_n})$ is a vector of independent and standard exponentially distributed random variables.
This exponential regression model is similar to the ones proposed by Beirlant et al.
(1999, 2002) and Feuerverger and Hall (1999) in the case of Pareto-type distributions.
Ignoring $b\left( \log\frac{n}{k_n} \right) \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho}$ in (5) leads to the maximum likelihood estimator
$$\check{\theta}_n = \frac{1}{k_n} \sum_{j=1}^{k_n} Z_j,$$
which turns out to be an alternative to $\tilde{\theta}_n$.
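The rescaled log-spacings $Z_j$ and the resulting mean $\check{\theta}_n$ are computed directly from the order statistics. A minimal sketch (function names are ours):

```python
import math
import random

def log_spacings(sample, k):
    """Z_j = j log(n/j) (log X_{n-j+1,n} - log X_{n-j,n}), j = 1, ..., k."""
    n = len(sample)
    x = sorted(sample)
    return [j * math.log(n / j) * (math.log(x[n - j]) - math.log(x[n - j - 1]))
            for j in range(1, k + 1)]

def theta_check(sample, k):
    """Maximum likelihood estimator obtained by ignoring the bias term in (5)."""
    z = log_spacings(sample, k)
    return sum(z) / k

# On a Weibull sample (theta = 1/4 and vanishing bias term b = 0) the mean
# of the Z_j should be close to theta.
random.seed(2)
sample = [random.expovariate(1.0) ** 0.25 for _ in range(500)]
print(theta_check(sample, 100))
```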
The full model (5) allows us to generate bias-corrected estimates $\hat{\theta}_n$ of $\theta$ through maximum likelihood estimation of $\theta$, $b(\log n/k_n)$ and $\rho$ for each $1 \leq k_n \leq n-1$. An alternative to this approach consists in using a canonical choice for $\rho$ and estimating the two other parameters by a least-squares (LS) method. For the canonical choice of $\rho$ we can use, for instance, the value $-1$, which coincides with the one proposed by Feuerverger and Hall (1999) for the regression model in the case of Pareto-type distributions. The asymptotic normality of the resulting LS-estimator is established in Section 3. In order to illustrate the usefulness of the bias term, we provide a small simulation study in Section 4. The proofs of our results are postponed to Section 6.
2. Exponential regression model
In this section, we formalize (5). First, remark that
$$F^{-1}(x) = \left[ -\log(1-x) \right]^{\theta} \ell\left( -\log(1-x) \right).$$
Since $X_{n-j+1,n} \stackrel{d}{=} F^{-1}(U_{n-j+1,n})$, $1 \leq j \leq k_n$, where $U_{j,n}$ denotes the $j$-th order statistic of a uniform sample of size $n$, we have
$$X_{n-j+1,n} \stackrel{d}{=} \left[ -\log(1-U_{n-j+1,n}) \right]^{\theta} \ell\left( -\log(1-U_{n-j+1,n}) \right),$$
which implies that
$$\log X_{n-j+1,n} \stackrel{d}{=} \theta \log\left[ -\log(1-U_{n-j+1,n}) \right] + \log\left[ \ell\left( -\log(1-U_{n-j+1,n}) \right) \right].$$
Moreover, considering the order statistics from an independent standard exponential sample, $E_{n-j+1,n} \stackrel{d}{=} -\log(1-U_{n-j+1,n})$. Therefore
$$\log X_{n-j+1,n} \stackrel{d}{=} \theta \log(E_{n-j+1,n}) + \log\left[ \ell(E_{n-j+1,n}) \right] =: A_n(j) + B_n(j).$$
Our basic result now reads as follows.
Theorem 1. Suppose (1) holds together with $(R_\ell(b,\rho))$ and (4). Then, if $k_n \to \infty$ and $\log k_n / \log n \to 0$, we have
$$\sup_{1 \leq j \leq k_n} \left| j \log\frac{n}{j} \left( \log X_{n-j+1,n} - \log X_{n-j,n} \right) - \left( \theta + b\left( \log\frac{n}{k_n} \right) \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} \right) f_j \right| = o_P\left( b\left( \log\frac{n}{k_n} \right) \right), \qquad (6)$$
where $(f_1, \ldots, f_{k_n})$ is a vector of independent and standard exponentially distributed random variables.
The proof of this theorem is based on the following two lemmas.

Lemma 1. Suppose (1) holds together with $(R_\ell(b,\rho))$ and (4). Then, if $k_n \to \infty$ and $k_n/n \to 0$, we have
$$\sup_{1 \leq j \leq k_n} \left| j \log\frac{n}{j} \left[ A_n(j) - A_n(j+1) \right] - \theta f_j \right| = o_P\left( b\left( \log\frac{n}{k_n} \right) \right), \qquad (7)$$
and

Lemma 2. Suppose (1) holds together with $(R_\ell(b,\rho))$. Then, if $k_n \to \infty$ and $\log k_n / \log n \to 0$, we have
$$\sup_{1 \leq j \leq k_n} \left| j \log\frac{n}{j} \left[ B_n(j) - B_n(j+1) \right] - b\left( \log\frac{n}{k_n} \right) \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} f_j \right| = o_P\left( b\left( \log\frac{n}{k_n} \right) \right). \qquad (8)$$
The proof of these lemmas is postponed to Section 6.
Remark 1. Under the assumptions of Theorem 1, we also have
$$\sup_{1 \leq j \leq k_n} \left| j \log\frac{n}{j} \left( \log X_{n-j+1,n} - \log X_{n-j,n} \right) - \left( \theta + b\left( \log\frac{n}{k_n} \right) \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{-1} \right) f_j \right| = o_P\left( b\left( \log\frac{n}{k_n} \right) \right),$$
where $(f_1, \ldots, f_{k_n})$ is a vector of independent and standard exponentially distributed random variables.
This implies that one can plug the canonical choice $\rho = -1$ in the regression model (6) without perturbing the approximation. From model (6) we can easily deduce the asymptotic normality of the estimator $\check{\theta}_n$, given in the next theorem.
Theorem 2. Suppose (1) holds together with $(R_\ell(b,\rho))$ and (4). Then, if $k_n \to \infty$, $\sqrt{k_n}\, b(\log(n/k_n)) \to \lambda \in \mathbb{R}$ and, if $\lambda = 0$, $\log k_n / \log n \to 0$, we have
$$\sqrt{k_n} \left( \check{\theta}_n - \theta - b\left( \log\frac{n}{k_n} \right) \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} \right) \stackrel{d}{\longrightarrow} \mathcal{N}(0, \theta^2).$$
Model (6) now plays a central role in the remainder of this paper: it allows us to generate bias-corrected estimates of θ, as we show in the next section.
3. Bias-reduced estimates of θ
In order to reduce the bias of the estimator $\check{\theta}_n$, we can either estimate simultaneously $\theta$, $b(\log n/k_n)$ and $\rho$ by a maximum likelihood method, or estimate $\theta$ and $b$ by a least-squares approach after substituting a canonical choice for $\rho$. In fact, this second-order parameter is difficult to estimate in practice, and simulations readily show that fixing its value does not greatly influence the result. This problem has already been discussed in Beirlant et al. (1999, 2002) and Feuerverger and Hall (1999), where similar observations were made in the case of Pareto-type distributions. The canonical choice $\rho = -1$ is often used, although other choices could be motivated by performing a model selection.
In all the sequel, we estimate $\theta$ and $b(\log(n/k_n))$ by the LS-method after substituting $\rho$ with the value $-1$. In that case, we find the following LS-estimators:
$$\hat{\theta}_n = \overline{Z}_{k_n} - \hat{b}\left( \log\frac{n}{k_n} \right) \overline{x}_{k_n}, \qquad \hat{b}\left( \log\frac{n}{k_n} \right) = \frac{\sum_{j=1}^{k_n} (x_j - \overline{x}_{k_n})\, Z_j}{\sum_{j=1}^{k_n} (x_j - \overline{x}_{k_n})^2},$$
where
$$x_j = \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{-1}, \qquad \overline{x}_{k_n} = \frac{1}{k_n} \sum_{j=1}^{k_n} x_j \qquad \text{and} \qquad \overline{Z}_{k_n} = \frac{1}{k_n} \sum_{j=1}^{k_n} Z_j.$$
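These two formulas are an ordinary least-squares fit of the $Z_j$ against the design points $x_j$: $\hat{b}$ is the slope, $\hat{\theta}_n$ the intercept. A minimal sketch (function names ours):

```python
import math
import random

def ls_estimators(sample, k):
    """LS-estimators (theta_hat, b_hat) in model (5) with the canonical
    choice rho = -1, i.e. design points x_j = (log(n/j)/log(n/k))^{-1}."""
    n = len(sample)
    x = sorted(sample)
    # rescaled log-spacings Z_j of Section 1
    z = [j * math.log(n / j) * (math.log(x[n - j]) - math.log(x[n - j - 1]))
         for j in range(1, k + 1)]
    xs = [math.log(n / k) / math.log(n / j) for j in range(1, k + 1)]
    xbar, zbar = sum(xs) / k, sum(z) / k
    b_hat = (sum((u - xbar) * v for u, v in zip(xs, z))
             / sum((u - xbar) ** 2 for u in xs))   # slope
    return zbar - b_hat * xbar, b_hat              # intercept, slope

# Gamma(0.25, 1) has theta = 1 and a non-vanishing bias term.
random.seed(3)
sample = [random.gammavariate(0.25, 1.0) for _ in range(500)]
theta_hat, b_hat = ls_estimators(sample, 150)
print(theta_hat, b_hat)
```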
Our next goal is to establish, under suitable assumptions, the asymptotic normality of $\hat{\theta}_n$. This is done in the following theorem.
Theorem 3. Suppose (1) holds together with $(R_\ell(b,\rho))$ and (4). Then, if $k_n$ is such that $k_n \to \infty$,
$$\frac{\sqrt{k_n}}{\log(n/k_n)}\, b\left( \log\frac{n}{k_n} \right) \to \Lambda \in \mathbb{R} \quad \text{and, if } \Lambda = 0, \quad \frac{\log^2 k_n}{\log(n/k_n)} \to 0 \text{ and } \frac{\sqrt{k_n}}{\log(n/k_n)} \to \infty, \qquad (9)$$
we have
$$\frac{\sqrt{k_n}}{\log(n/k_n)} \left( \hat{\theta}_n - \theta \right) \stackrel{d}{\longrightarrow} \mathcal{N}(0, \theta^2).$$
Remark that the rate of convergence of $\check{\theta}_n$ is the same as that of $\hat{\theta}_n$ in the cases where both $\lambda$ and $\Lambda$ differ from 0.
The proof of this theorem is postponed to Section 6.
In order to illustrate the usefulness of the bias-term in the model (6), we will provide a small simulation study in the next section.
4. A small simulation study
The finite-sample performances of the estimators $\hat{\theta}_n$, $\tilde{\theta}_n$ and $\check{\theta}_n$ are investigated on 5 different distributions: $\Gamma(0.25, 1)$, $\Gamma(4, 1)$, $\mathcal{N}(1.1, 1)$, $\mathcal{W}(0.25, 0.25)$ and $\mathcal{W}(4, 4)$. We limit ourselves to these three estimators, since it is shown in Girard (2004) that $\tilde{\theta}_n$ gives better results than the other approaches (Beirlant et al., 1995; Broniatowski, 1993). In each case, $N = 100$ samples $(X_{n,i})_{i=1,\ldots,N}$ of size $n = 500$ were simulated. On each sample $(X_{n,i})$, the estimates $\hat{\theta}_{n,i}(k_n)$, $\tilde{\theta}_{n,i}(k_n)$ and $\check{\theta}_{n,i}(k_n)$ were computed for $k_n = 2, \ldots, 360$. Finally, the Hill-type plots were built by drawing the points
$$\left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \hat{\theta}_{n,i}(k_n) \right), \quad \left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \tilde{\theta}_{n,i}(k_n) \right) \quad \text{and} \quad \left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \check{\theta}_{n,i}(k_n) \right).$$
We also present the associated MSE (mean square error) plots obtained by plotting the points
$$\left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \left( \hat{\theta}_{n,i}(k_n) - \theta \right)^2 \right), \quad \left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \left( \tilde{\theta}_{n,i}(k_n) - \theta \right)^2 \right) \quad \text{and} \quad \left( k_n,\ \frac{1}{N} \sum_{i=1}^{N} \left( \check{\theta}_{n,i}(k_n) - \theta \right)^2 \right).$$
The results are presented in Figures 1–5. In all the plots, the graphs associated with $\tilde{\theta}_n$ and $\check{\theta}_n$ are similar, with a slightly better behaviour of $\check{\theta}_n$. The bias-corrected estimator $\hat{\theta}_n$ always yields a smaller bias than the two previous ones, leading to better results for Gamma and Gaussian distributions (Figures 1–3). On Weibull distributions (Figures 4–5), it presents a larger variance.
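A reduced version of this study (smaller $N$, a single value of $k_n$, and our own function names) can be sketched as follows, comparing $\check{\theta}_n$ and the bias-corrected $\hat{\theta}_n$ on $\Gamma(4,1)$, for which $\theta = 1$:

```python
import math
import random

def estimators(sample, k):
    """Return (theta_check, theta_hat) computed from one sample."""
    n = len(sample)
    x = sorted(sample)
    z = [j * math.log(n / j) * (math.log(x[n - j]) - math.log(x[n - j - 1]))
         for j in range(1, k + 1)]
    xs = [math.log(n / k) / math.log(n / j) for j in range(1, k + 1)]
    xbar, zbar = sum(xs) / k, sum(z) / k
    b_hat = (sum((u - xbar) * v for u, v in zip(xs, z))
             / sum((u - xbar) ** 2 for u in xs))
    return zbar, zbar - b_hat * xbar

random.seed(4)
theta, n, k, N = 1.0, 500, 100, 50
check, hat = [], []
for _ in range(N):
    sample = [random.gammavariate(4.0, 1.0) for _ in range(n)]
    c, h = estimators(sample, k)
    check.append(c)
    hat.append(h)

mse = lambda est: sum((e - theta) ** 2 for e in est) / N
print("mean/MSE of theta_check:", sum(check) / N, mse(check))
print("mean/MSE of theta_hat:  ", sum(hat) / N, mse(hat))
```

Looping over a grid of $k_n$ values and plotting the two summaries reproduces the Hill-type and MSE plots described above.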
5. Concluding remarks
In this paper, we introduce a regression model from which we derive a bias-reduced estimator of the Weibull tail-coefficient θ. Its asymptotic normality is established and its efficiency is illustrated in a small simulation study. However, in many cases of practical interest, the problem of estimating a quantile $x_{p_n} = F^{-1}(1-p_n)$, with $p_n < 1/n$, is much more important. Such a problem has already been studied in Gardes and Girard (2005), where the following Weissman-type estimator has been introduced:
$$\tilde{x}_{p_n} = X_{n-k_n+1,n} \left( \frac{\log(1/p_n)}{\log(n/(k_n+1))} \right)^{\tilde{\theta}_n}.$$
It is, however, desirable to refine $\tilde{x}_{p_n}$ with the additional information about the slowly varying function $\ell$ that is provided by the LS-estimates of $\theta$ and $b$. To this aim, condition $(R_\ell(b,\rho))$ is used to approximate the ratio $F^{-1}(1-p_n)/X_{n-k_n+1,n}$. Noting that $X_{n-k_n+1,n} \stackrel{d}{=} F^{-1}(U_{n-k_n+1,n})$, with $U_{1,n} \leq \ldots \leq U_{n,n}$ the order statistics of a uniform $(0,1)$ sample of size $n$,
$$\frac{x_{p_n}}{X_{n-k_n+1,n}} \stackrel{d}{=} \frac{F^{-1}(1-p_n)}{F^{-1}(U_{n-k_n+1,n})} \stackrel{d}{=} \frac{(-\log p_n)^{\theta}}{\left( -\log(1-U_{n-k_n+1,n}) \right)^{\theta}} \cdot \frac{\ell(-\log p_n)}{\ell\left( -\log(1-U_{n-k_n+1,n}) \right)} \simeq \left( \frac{\log(1/p_n)}{\log(n/(k_n+1))} \right)^{\theta} \exp\left[ b\left( \log\frac{n}{k_n} \right) \frac{\left( \frac{\log(1/p_n)}{\log(n/(k_n+1))} \right)^{\rho} - 1}{\rho} \right].$$
The last step follows from replacing $U_{k_n+1,n}$ (resp. $E_{n-k_n+1,n}$) by $(k_n+1)/n$ (resp. $\log(n/(k_n+1))$). Hence, we arrive at the following estimator of extreme quantiles:
$$\hat{x}_{p_n} = X_{n-k_n+1,n} \left( \frac{\log(1/p_n)}{\log(n/(k_n+1))} \right)^{\hat{\theta}_n} \exp\left[ \hat{b}\left( \log\frac{n}{k_n} \right) \frac{\left( \frac{\log(1/p_n)}{\log(n/(k_n+1))} \right)^{\hat{\rho}} - 1}{\hat{\rho}} \right].$$
Here, the LS-estimators of θ and b can be used after substituting ρ by the canonical choice −1. The study of the asymptotic properties of such an estimator is beyond the scope of the present paper and is left for further investigation.
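Both quantile estimators share the same structure, the refined one simply multiplying by the exponential correction factor. A minimal sketch (function names ours; the refined call would take the LS-estimates and the canonical $\hat{\rho} = -1$):

```python
import math
import random

def weissman_quantile(sample, k, p, theta, b=None, rho=-1.0):
    """Weissman-type estimator of x_p = F^{-1}(1-p) for p < 1/n.

    With b=None this is the estimator of Gardes and Girard (2005);
    otherwise the multiplicative exp-correction derived above is applied."""
    n = len(sample)
    x = sorted(sample)
    ratio = math.log(1.0 / p) / math.log(n / (k + 1))
    q = x[n - k] * ratio ** theta        # X_{n-k+1,n} * ratio^theta
    if b is not None:
        q *= math.exp(b * (ratio ** rho - 1.0) / rho)
    return q

# Exact check on the unit exponential: theta = 1, ell = 1, b = 0, and the
# true quantile is x_p = log(1/p).
random.seed(5)
sample = [random.expovariate(1.0) for _ in range(1000)]
print(weissman_quantile(sample, 100, 0.001, theta=1.0))  # true value is log(1000) = 6.91
```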
6. Proofs of our results
6.1 Preliminary lemmas
Lemma 3. For all $1 \leq j \leq k_n$ such that $k_n \to \infty$ and $k_n/n \to 0$, we have
$$\frac{E_{n-j,n}}{\log(n/j)} = 1 + O_P\left( \frac{1}{\log(n/k_n)} \right) \quad \text{uniformly in } j.$$
Proof of Lemma 3. According to Rényi's representation, we have
$$E_{n-j,n} \stackrel{d}{=} \sum_{\ell=1}^{n-j} \frac{f_{n-\ell-j+1}}{\ell+j-1}, \quad \text{where the } f_i \text{ are i.i.d. } \mathrm{Exp}(1).$$
Since
$$\mathrm{Var}\left( \sum_{\ell=1}^{n-j} \frac{f_{n-\ell-j+1}}{\ell+j-1} \right) = O(1),$$
denoting
$$T_{j,n} := \sum_{\ell=1}^{n-j} \left[ \frac{f_{n-\ell-j+1}}{\ell+j-1} - \mathbb{E}\left( \frac{f_{n-\ell-j+1}}{\ell+j-1} \right) \right],$$
we have, using Kolmogorov's inequality (see e.g. Shorack and Wellner, 1986, p. 843), that
$$\mathbb{P}\left( \max_{1 \leq j \leq k_n} |T_{j,n}| \geq \lambda \right) \leq \frac{\mathrm{Var}(T_{1,n})}{\lambda^2}, \qquad \lambda > 0.$$
This implies that $T_{j,n} = O_P(1)$ uniformly in $j$. Taking into account the fact that
$$\left| \sum_{\ell=j}^{n} \frac{1}{\ell} - \log\frac{n}{j} \right| = O(1) \quad \text{uniformly in } j,\ 1 \leq j \leq k_n,$$
it is easy to deduce Lemma 3. □
Let us introduce the $E_m$-function defined by the integral
$$E_m(x) := \int_1^{\infty} \frac{e^{-xt}}{t^m}\,dt$$
for a positive integer $m$. The asymptotic expansion of this integral is given in the following lemma.

Lemma 4. As $x \to \infty$, for any fixed positive integers $m$ and $p$, we have
$$E_m(x) = \frac{e^{-x}}{x} \left\{ 1 - \frac{m}{x} + \frac{m(m+1)}{x^2} + \ldots + (-1)^p\, \frac{m(m+1)\cdots(m+p-1)}{x^p} + O\left( \frac{1}{x^{p+1}} \right) \right\}.$$
The proof of this lemma is straightforward from Abramowitz and Stegun (1972, pp. 227–233), and the $O$-term can be obtained by a Taylor expansion with an integral remainder.
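Lemma 4 is easy to check numerically by comparing a brute-force quadrature of $E_m(x)$ with the truncated expansion. A rough sketch (helper names ours):

```python
import math

def E_m(m, x, steps=100000):
    """Composite-trapezoid value of E_m(x) = int_1^inf e^{-xt} / t^m dt.
    The integrand is negligible beyond t = 1 + 40/x for the x used here."""
    a, b = 1.0, 1.0 + 40.0 / x
    h = (b - a) / steps
    s = 0.5 * (math.exp(-x * a) / a ** m + math.exp(-x * b) / b ** m)
    for i in range(1, steps):
        t = a + i * h
        s += math.exp(-x * t) / t ** m
    return s * h

def E_m_asymptotic(m, x, p):
    """Expansion of Lemma 4 truncated after the term of order x^{-p}."""
    s, term = 1.0, 1.0
    for i in range(1, p + 1):
        term *= -(m + i - 1) / x   # builds (-1)^i m(m+1)...(m+i-1) / x^i
        s += term
    return math.exp(-x) / x * s

# At x = 30, m = 2, p = 2 the relative error is of order m(m+1)(m+2)/x^3 ~ 1e-3.
print(E_m(2, 30.0), E_m_asymptotic(2, 30.0, 2))
```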
Denote
$$\mu_p := \frac{1}{k_n} \sum_{j=1}^{k_n} \left( x_j - \overline{x}_{k_n} \right)^p, \qquad p \in \mathbb{N}^*.$$
The next lemma provides a first-order expansion of this Riemann sum.

Lemma 5. If $k_n \to \infty$, $k_n/n \to 0$, $k_n/\log(n/k_n) \to \infty$ and $\log^2 k_n/\log(n/k_n) \to 0$, then
$$\mu_p \sim C_p \left( \log\frac{n}{k_n} \right)^{-p} \text{ as } n \to \infty, \qquad \text{where } C_p = \int_0^1 \left( \log x + 1 \right)^p dx < \infty.$$

Proof of Lemma 5. Denote $\alpha_n = 1/\log(n/k_n)$. Then $\overline{x}_{k_n}$ can be rewritten as
$$\overline{x}_{k_n} = \frac{1}{k_n} + \left( \frac{1}{k_n} \sum_{j=1}^{k_n-1} f_n(j/k_n) - \int_0^1 f_n(x)\,dx \right) + \int_0^1 f_n(x)\,dx =: \frac{1}{k_n} + T_1 + T_2,$$
where $f_n(x) = (1 - \alpha_n \log x)^{-1}$, $x \in [0,1]$.
Denoting by $f_n^{(i)}$, $i \in \{1,2\}$, the $i$-th derivative of $f_n$, we infer that
$$T_1 = \sum_{j=1}^{k_n-1} \int_{j/k_n}^{(j+1)/k_n} \left( \frac{j}{k_n} - t \right) f_n^{(1)}\left( \frac{j}{k_n} \right) dt + \sum_{j=1}^{k_n-1} \int_{j/k_n}^{(j+1)/k_n} \int_{j/k_n}^{t} (x-t)\, f_n^{(2)}(x)\,dx\,dt - \int_0^{1/k_n} f_n(x)\,dx =: T_3 + T_4 + T_5.$$
Remark that
$$T_3 = -\frac{1}{2k_n} \left( \frac{1}{k_n} \sum_{j=1}^{k_n-1} f_n^{(1)}\left( \frac{j}{k_n} \right) - \int_{1/k_n}^{1} f_n^{(1)}(t)\,dt \right) - \frac{1}{2k_n} \int_{1/k_n}^{1} f_n^{(1)}(t)\,dt =: -\frac{1}{2k_n} T_6 + T_7.$$
Since $f_n^{(1)}$ is positive and decreasing on $[1/k_n, 1]$ for $n$ sufficiently large, we can prove that
$$|T_4| \leq \frac{1}{2k_n^2} \left| f_n^{(1)}\left( \frac{1}{k_n} \right) - f_n^{(1)}(1) \right| = o\left( \frac{1}{k_n} \right), \qquad T_5 = O\left( \frac{1}{k_n} \right),$$
$$|T_6| \leq \frac{1}{k_n} \left| f_n^{(1)}\left( \frac{1}{k_n} \right) - f_n^{(1)}(1) \right| = o(1), \qquad T_7 = -\frac{1}{2k_n} \left( f_n(1) - f_n\left( \frac{1}{k_n} \right) \right) = o\left( \frac{1}{k_n} \right),$$
and consequently $T_1 = O(1/k_n)$. Besides, a direct application of Lemma 4 provides $T_2 = 1 - \alpha_n + O(\alpha_n^2)$. Therefore
$$\overline{x}_{k_n} = 1 - \alpha_n + O\left( \frac{1}{k_n} \right) + O(\alpha_n^2).$$
Now, we can check that
$$\mu_p = \alpha_n^p \left\{ \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \log\frac{j}{k_n} + 1 \right)^p + R_n \right\},$$
where
$$R_n = \frac{1}{k_n} \sum_{j=1}^{k_n-1} \left\{ \left( \log\frac{j}{k_n} + 1 + \varepsilon_n \right)^p - \left( \log\frac{j}{k_n} + 1 \right)^p \right\}$$
with $\varepsilon_n = O(\alpha_n \log^2 k_n) + O\left( \frac{1}{k_n \alpha_n} \right)$, which tends to 0 by assumption. Since
$$\frac{1}{C_p}\, \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \log\frac{j}{k_n} + 1 \right)^p \to 1,$$
in order to conclude the proof of Lemma 5, we only have to remark that $R_n \to 0$. □
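The constants $C_p$ admit a closed form, since $\int_0^1 (\log x)^i\,dx = (-1)^i i!$; in particular $C_2 = 1$ and $C_4 = 9$, the two values that enter the proof of Theorem 3 below. A quick check (helper names ours):

```python
import math

def C(p):
    """C_p = int_0^1 (log x + 1)^p dx = sum_i binom(p, i) (-1)^i i!."""
    return sum(math.comb(p, i) * (-1) ** i * math.factorial(i)
               for i in range(p + 1))

def C_numeric(p, steps=500000):
    """Midpoint-rule value of int_0^1 (log x + 1)^p dx."""
    h = 1.0 / steps
    return sum((math.log((i + 0.5) * h) + 1.0) ** p for i in range(steps)) * h

print([C(p) for p in range(1, 5)])  # [0, 1, -2, 9]
```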
6.2 Proof of Lemma 1
Remark that
$$\alpha_{j,n} := j \log\frac{n}{j} \left[ A_n(j) - A_n(j+1) \right] = \theta\, j \log\frac{n}{j}\, \log\frac{E_{n-j+1,n}}{E_{n-j,n}} = \theta \log\frac{n}{j}\, \frac{j\left( E_{n-j+1,n} - E_{n-j,n} \right)}{E^*_{n-j,n}} \stackrel{d}{=} \theta f_j\, \frac{\log(n/j)}{E^*_{n-j,n}},$$
where $E^*_{n-j,n} \in \left[ E_{n-j,n},\ E_{n-j+1,n} \right]$. Consequently, from Lemma 3,
$$\alpha_{j,n} = \theta f_j + O_P\left( \frac{1}{\log(n/k_n)} \right) = \theta f_j + o_P\left( b\left( \log\frac{n}{k_n} \right) \right), \qquad (10)$$
by the assumption $x\, b(x) \to \infty$ as $x \to \infty$, with a $o_P$-term which is uniform in $j$. Lemma 1 is therefore proved. □
6.3 Proof of Lemma 2
We consider
$$\beta_{j,n} := j \log\frac{n}{j} \left[ B_n(j) - B_n(j+1) \right].$$
In order to study this term, we will use the notations $\lambda_{1j} = E_{n-j+1,n}/E_{n-k_n+1,n}$, $\lambda_{2j} = E_{n-j,n}/E_{n-k_n+1,n}$ and $y_{k_n} = E_{n-k_n+1,n}$, and we rewrite $\beta_{j,n}$ as
$$\beta_{j,n} = j \log\frac{n}{j} \left\{ \log \ell\left( \frac{\lambda_{1j}}{\lambda_{2j}}\, \lambda_{2j}\, y_{k_n} \right) - \log \ell\left( \lambda_{2j}\, y_{k_n} \right) \right\}.$$
It is clear that $1 \leq \lambda_{1j}/\lambda_{2j} \stackrel{P}{\longrightarrow} 1$ uniformly in $j$ by Lemma 3, and therefore, for $n \geq N_0$, $\lambda_{1j}/\lambda_{2j} \in [1,2]$ in probability. Under our assumption $(R_\ell(b,\rho))$ on the slowly varying function, we deduce that
$$\beta_{j,n} = j \log\frac{n}{j} \left\{ b(\lambda_{2j}\, y_{k_n})\, K_\rho\left( \frac{\lambda_{1j}}{\lambda_{2j}} \right) (1 + o_P(1)) \right\}.$$
Now, since $\lambda_{2j} \stackrel{P}{\longrightarrow} 1$ uniformly in $j$ and $b(\cdot)$ is regularly varying with index $\rho$,
$$b(\lambda_{2j}\, y_{k_n}) = \lambda_{2j}^{\rho}\, b(y_{k_n}) (1 + o_P(1))$$
with a $o_P(1)$-term uniform in $j$. Therefore
$$\beta_{j,n} = j \log\frac{n}{j}\, b(y_{k_n}) \left\{ \lambda_{2j}^{\rho}\, K_\rho\left( \frac{\lambda_{1j}}{\lambda_{2j}} \right) (1 + o_P(1)) \right\}.$$
Again, uniformly in $j$,
$$K_\rho\left( \frac{\lambda_{1j}}{\lambda_{2j}} \right) = \left( \frac{\lambda_{1j}}{\lambda_{2j}} - 1 \right) (1 + o_P(1)),$$
which implies that $\beta_{j,n}$ can be rewritten as follows:
$$\beta_{j,n} = -j \log\frac{n}{j}\, b(y_{k_n})\, (\lambda_{2j} - \lambda_{1j})\, \lambda_{2j}^{\rho-1} (1 + o_P(1)).$$
Therefore, we have
$$\beta_{j,n} = f_j \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} b(y_{k_n}) (1 + o_P(1)),$$
with a $o_P(1)$-term which is uniform in $j$. This achieves the proof of Lemma 2. □

Remark that, since $\log(n/j)/\log(n/k_n) \to 1$ uniformly in $j$, one also has
$$\beta_{j,n} = f_j \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{-1} b(y_{k_n}) (1 + o_P(1)),$$
with a $o_P(1)$-term which is uniform in $j$, and this proves Remark 1.
6.4 Proof of Theorem 2
From model (6), we infer that
$$\sqrt{k_n} \left( \check{\theta}_n - \theta - b\left( \log\frac{n}{k_n} \right) \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} \right) = \sqrt{k_n}\, \theta\, \frac{1}{k_n} \sum_{j=1}^{k_n} (f_j - 1) + \sqrt{k_n}\, b\left( \log\frac{n}{k_n} \right) \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} (f_j - 1) + o_P\left( \sqrt{k_n}\, b\left( \log\frac{n}{k_n} \right) \right).$$
Now, an application of Chebyshev's inequality gives that
$$\frac{1}{k_n} \sum_{j=1}^{k_n} \left( \frac{\log(n/j)}{\log(n/k_n)} \right)^{\rho} (f_j - 1) = o_P(1).$$
Then, under our assumptions, Theorem 2 follows by an application of the Central Limit Theorem. □
6.5 Proof of Theorem 3
From Remark 1, we have
$$\frac{\sqrt{k_n}}{\log(n/k_n)} \left( \hat{\theta}_n - \theta \right) = \frac{\sqrt{k_n}}{\log(n/k_n)}\, \frac{1}{k_n} \sum_{j=1}^{k_n} \left( \theta + b\left( \log\frac{n}{k_n} \right) x_j \right) \left( 1 - \frac{x_j - \overline{x}_{k_n}}{\mu_2}\, \overline{x}_{k_n} \right) (f_j - 1) + o_P\left( \frac{\sqrt{k_n}}{\log(n/k_n)}\, b\left( \log\frac{n}{k_n} \right) \right).$$
Since we have (9), the $o_P$-term is negligible. The first term can be viewed as a weighted mean of independent and identically distributed variables. Now, using Lyapunov's theorem, we only have to show that
$$\lim_{n \to \infty} \frac{1}{s_{k_n}^4} \sum_{j=1}^{k_n} \mathbb{E}\, X_j^4 = 0,$$
where
$$X_j = \left( \theta + b\left( \log\frac{n}{k_n} \right) x_j \right) \left( 1 - \frac{x_j - \overline{x}_{k_n}}{\mu_2}\, \overline{x}_{k_n} \right) (f_j - 1), \quad j = 1, \ldots, k_n, \qquad \text{and} \qquad s_{k_n}^2 = \sum_{j=1}^{k_n} \mathrm{Var}\, X_j.$$
We remark that
$$s_{k_n}^2 \sim \theta^2 \sum_{j=1}^{k_n} \left( 1 - \frac{x_j - \overline{x}_{k_n}}{\mu_2}\, \overline{x}_{k_n} \right)^2 \quad \text{and} \quad \sum_{j=1}^{k_n} \mathbb{E}\, X_j^4 \sim 9\, \theta^4 \sum_{j=1}^{k_n} \left( 1 - \frac{x_j - \overline{x}_{k_n}}{\mu_2}\, \overline{x}_{k_n} \right)^4 \quad \text{as } n \to \infty,$$
from which we deduce by direct computations that
$$\frac{1}{s_{k_n}^4} \sum_{j=1}^{k_n} \mathbb{E}\, X_j^4 \sim \frac{9}{k_n}\, \frac{\mu_2^4 + 6\, \overline{x}_{k_n}^2\, \mu_2^3 - 4\, \overline{x}_{k_n}^3\, \mu_2\, \mu_3 + \overline{x}_{k_n}^4\, \mu_4}{\left[ \mu_2^2 + \overline{x}_{k_n}^2\, \mu_2 \right]^2} \sim \frac{9\, C_4}{k_n}$$
by Lemma 5. Our Theorem 3 now follows from the fact that $s_{k_n}^2 \sim \theta^2 k_n \log^2(n/k_n)$. □
References
[1] Abramowitz, M., Stegun, I., (1972), Handbook of Mathematical Functions, Dover.
[2] Beirlant, J., Bouquiaux, C., Werker, B., (2005), Semiparametric lower bounds for tail index estimation, Journal of Statistical Planning and Inference, to appear.
[3] Beirlant, J., Broniatowski, M., Teugels, J.L., Vynckier, P., (1995), The mean residual life function at great age: Applications to tail estimation, Journal of Statistical Planning and Inference, 45, 21–48.
[4] Beirlant, J., Dierckx, G., Goegebeur, Y., Matthys, G., (1999), Tail index estimation and an exponential regression model, Extremes, 2, 177–200.
[5] Beirlant, J., Dierckx, G., Guillou, A., Starica, C., (2002), On exponential representa- tions of log-spacings of extreme order statistics, Extremes, 5 (2), 157–180.
[6] Broniatowski, M., (1993), On the estimation of the Weibull tail coefficient, Journal of Statistical Planning and Inference, 35, 349–366.
[7] Feuerverger, A., Hall, P., (1999), Estimating a Tail Exponent by Modelling Depar- ture from a Pareto Distribution, Annals of Statistics, 27, 760–781.
[8] Gardes, L., Girard, S., (2005), Estimating extreme quantiles of Weibull tail-distributions, Communications in Statistics - Theory and Methods, 34, 1065–1080.
[9] Geluk, J.L., de Haan, L., (1987), Regular Variation, Extensions and Tauberian Theorems, Math Centre Tracts, 40, Centre for Mathematics and Computer Science, Amsterdam.
[10] Girard, S., (2004), A Hill type estimate of the Weibull tail-coefficient, Communications in Statistics - Theory and Methods, 33(2), 205–234.
[11] Hill, B.M., (1975), A simple general approach to inference about the tail of a distribution, Annals of Statistics, 3, 1163–1174.
[12] Shorack, G.R., Wellner, J.A., (1986), Empirical Processes with Applications to Statistics, Wiley, New York.
Distribution           θ      b(x)                ρ
Gaussian N(µ, σ²)      1/2    (1/4) (log x)/x     −1
Gamma Γ(α, β)          1      (1 − α) (log x)/x   −1
Weibull W(α, λ)        1/α    0                   −∞

Table 1: Parameters θ, ρ and the function b(x) associated to some usual distributions
[Figures 1–5 about here: Hill-type plots for the five distributions of Section 4, Γ(0.25, 1), Γ(4, 1), N(1.1, 1), W(0.25, 0.25) and W(4, 4). Each figure displays, for the three estimators, (a) the mean of the estimates and (b) the mean square error, as functions of k_n.]