
Estimation of multivariate critical layers: Applications to rainfall data


HAL Id: hal-00940089

https://hal.archives-ouvertes.fr/hal-00940089v3

Submitted on 11 Dec 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Estimation of multivariate critical layers: Applications to rainfall data

Elena Di Bernardino, Didier Rullière

To cite this version:

Elena Di Bernardino, Didier Rullière. Estimation of multivariate critical layers: Applications to rainfall data. Journal de la Société Française de Statistique, Société Française de Statistique et Société Mathématique de France, 2015, 156 (1), pp. 11-50. ⟨hal-00940089v3⟩


Estimation of multivariate critical layers: Applications to rainfall data

Elena Di Bernardino¹ and Didier Rullière²

¹ Conservatoire National des Arts et Métiers, Département IMATH, EA4629, Paris Cedex 03, France.

² Université de Lyon, Université Lyon 1, ISFA, Laboratoire SAF, EA2429, 69366 Lyon, France.

December 9, 2014

Abstract

Calculating return periods and critical layers (i.e., multivariate quantile curves) in a multivariate environment is a difficult problem. A possible consistent theoretical framework for the calculation of the return period in a multi-dimensional environment is essentially based on the notion of copula and on level sets of the multivariate probability distribution. In this paper we propose a fast parametric methodology to estimate the multivariate critical layers of a distribution and the associated return periods. The model is based on transformations of the marginal distributions and transformations of the dependence structure within the class of Archimedean copulas. The model has a tunable number of parameters, and we show that it is possible to get a competitive estimation without any search for a global optimum. We also get parametric expressions for the critical layers and return periods. The methodology is illustrated on a real 5-dimensional rainfall data-set. On this data-set we obtain a good quality of estimation, and we compare the obtained results with some classical parametric competitors. Finally, we provide a simulation study.

Keywords: multivariate probability transformations, level sets estimation, copulas, hyperbolic conversion functions, risk assessment, multivariate return periods.

1 Introduction

1.1 Return Periods

The notion of Return Period (RP) is frequently used in environmental sciences for the identification of dangerous events, and provides a means for rational decision making and risk assessment. Roughly speaking, the RP can be considered as an analogue of the “Value-at-Risk” in Economics and Finance, since it is used to quantify and assess the risk (see, e.g., Nappo and Spizzichino (2009)). In engineering practice, finance, insurance and environmental science, the choice of the RP depends on the impact/magnitude of the considered event and the consequences of its realisation.

Equally important is the related concept of design quantile, usually defined as “the value of the variable characterizing the event associated with a given RP”. In the univariate case the design quantile is usually identified without ambiguity. Conversely, in the multivariate setting different definitions are possible (see Serfling (2002)).

For this reason, the identification problem of design events in a multivariate context has recently attracted the attention of many researchers. The interested reader is referred for example to Embrechts and Puccetti (2006), Belzunce et al. (2007), Nappo and Spizzichino (2009) in the economics and finance context, and to Chebana and Ouarda (2009), Chebana and Ouarda (2011) (and references therein) in the hydrological context.

In recent years, researchers in environmental fields have joined efforts to properly answer the following crucial question: “How is it possible to calculate the critical design event(s) in the multivariate case?” (see for instance Salvadori et al. (2007)). In this sense, a possible consistent theoretical framework for the calculation of the design event(s) and the associated return period(s) in a multi-dimensional environment is proposed, e.g., by Salvadori et al. (2011), Salvadori et al. (2012), Gräler et al. (2013). In particular, the authors define the multivariate return period using the notion of upper and lower level sets of the multivariate probability distribution $F$ and of the associated Kendall’s measure.

Authors' e-mail addresses: elena.di_bernardino@cnam.fr, didier.rulliere@univ-lyon1.fr

In the following, we will consider a sequence $X = \{X_1, X_2, \ldots\}$ of independent and identically distributed $d$-dimensional random vectors, with $d > 1$. Thus each $X_k$, $k \in \mathbb{N}$, has the same multivariate distribution $F_X : \mathbb{R}^d_+ \to [0,1]$ as the nonnegative real-valued random vector $X \sim F_X = C(F_{X_1}, \ldots, F_{X_d})$ describing the hydrological phenomenon under investigation. The function $C$ is the $d$-dimensional copula associated to $F$ (see Nelsen (1999)). We write $I = \{1, \ldots, d\}$ for the set of indexes of the considered random variables and of their associated cumulative distribution functions, i.e., $F_{X_i}(x_i) = P(X_i \le x_i)$, for $i \in I$.

In the following we will consider multivariate distribution functions $F_X$ satisfying these regularity conditions:

- for all $u \in [0,1]$, the diagonal of the copula $C$, i.e. $C(u, \ldots, u)$, is a strictly increasing function of $u$;

- for any $i \in I$, the marginal $F_{X_i}$ is a continuous and strictly monotonic distribution function.

In this setting, we introduce the notion of critical layer (see, e.g., Salvadori et al. (2011), Salvadori et al. (2012), Gräler et al. (2013)).

Definition 1.1 (Critical layer). The critical layer $\partial L(\alpha)$ associated to the multivariate distribution function $F_X$ of level $\alpha \in (0,1)$ is defined as

$$\partial L(\alpha) = \{x \in \mathbb{R}^d : F_X(x) = \alpha\}.$$

Then $\partial L(\alpha)$ is the iso-hyper-surface (with dimension $d-1$) where $F$ equals the constant value $\alpha$. Thus, $\partial L(\alpha)$ is a (iso)line for bivariate distributions, a (iso)surface for trivariate ones, and so on. The critical layer $\partial L(\alpha)$ partitions $\mathbb{R}^d$ into three non-overlapping and exhaustive regions:

$$L^{<}(\alpha) = \{x \in \mathbb{R}^d : F_X(x) < \alpha\}, \qquad \partial L(\alpha) = \text{the critical layer itself}, \qquad L^{>}(\alpha) = \{x \in \mathbb{R}^d : F_X(x) > \alpha\}.$$

Practically, at any occurrence of the considered hydrological phenomenon, only three mutually exclusive events may happen: the realization of the considered hydrological event lies in exactly one of the three Borel sets $L^{<}(\alpha)$, $\partial L(\alpha)$, or $L^{>}(\alpha)$.

In the applications, usually, the event of interest is of the type $\{X \in A\}$, where $A$ is a non-empty Borel set in $\mathbb{R}^d$ collecting all the values judged to be “dangerous” according to some suitable criterion. A natural choice for $A$ is the set $L^{>}(\alpha)$ (see Salvadori et al. (2011), Gräler et al. (2013)). The first random index $N$ at which $X_k$ reaches the set $L^{>}(\alpha)$ is $N = \min\{k \in \mathbb{N} : X_k \in L^{>}(\alpha)\}$. Assuming $P[X \in L^{>}(\alpha)] \in (0,1)$, one easily shows that $N$ is a geometric random variable. The Return Period is defined as the average time required for reaching the set $L^{>}(\alpha)$, that is:

$$RP^{>}(\alpha) = \Delta t \cdot E[N] = \frac{\Delta t}{P[X \in L^{>}(\alpha)]}, \qquad (1)$$

where $\Delta t > 0$ is the (deterministic) average time elapsing between $X_k$ and $X_{k+1}$, $k \in \mathbb{N}$. The probability that a realization of this vector belongs to $L^{<}(\alpha)$ is given by Kendall’s function, which only depends on the copula $C$ of this random vector, i.e.,

$$K_C(\alpha) = P[X \in L^{<}(\alpha)] = P[C(U_1, \ldots, U_d) \le \alpha], \quad \text{for } \alpha \in (0,1), \qquad (2)$$

where $U_i = F_{X_i}(X_i)$, $i \in I$. Then, the considered Return Period can be expressed using Kendall’s function in (2): $RP^{>}(\alpha) = \Delta t \cdot \frac{1}{1 - K_C(\alpha)}$. Obviously, Return Periods can naturally be associated to sets other than $L^{>}(\alpha)$; the interested reader is referred for example to Salvadori et al. (2011).
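As a small numerical illustration of Equations (1) and (2), the following Python sketch evaluates the return period from a value of Kendall's function. It assumes, purely for illustration, that the copula is the $d$-dimensional independence copula, for which Kendall's function has the classical closed form $K_C(\alpha) = \alpha \sum_{i=0}^{d-1} (-\ln \alpha)^i / i!$; this closed form and the function names `kendall_independence` and `return_period` are not taken from the paper.

```python
import math

def kendall_independence(alpha, d):
    """Kendall's function K_C(alpha) of the d-dimensional independence copula.

    Assumption of this sketch: C is the independence copula, so that
    K_C(alpha) = alpha * sum_{i<d} (-ln alpha)^i / i!.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    s = -math.log(alpha)
    return alpha * sum(s ** i / math.factorial(i) for i in range(d))

def return_period(kendall_value, dt=1.0):
    """RP^>(alpha) = dt / (1 - K_C(alpha)), i.e. Equation (1) combined with (2)."""
    if not 0.0 <= kendall_value < 1.0:
        raise ValueError("K_C(alpha) must lie in [0, 1)")
    return dt / (1.0 - kendall_value)

# Example: bivariate independence copula, level alpha = 0.5, yearly observations.
rp = return_period(kendall_independence(0.5, d=2), dt=1.0)
```

Any other estimate of $K_C(\alpha)$ (parametric or empirical) could be passed to `return_period` in the same way.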

This paper aims at:

• giving a parametric representation of the multivariate distribution $F$ of a random vector $X$, here representing rain measurements (for applications see Section 6),

• giving a direct estimation procedure for this representation,

• giving closed parametric expressions, both for critical layers in Definition 1.1 and Return Periods in (1),

• adapting this methodology to some asymmetric dependencies (as, for instance, non-exchangeable random vectors; for a possible investigation in this sense see Section 7).

In the next section, we introduce the model used to answer the issues introduced above.


1.2 The model

We consider the following model, which is detailed in Di Bernardino and Rullière (2013a),

$$\tilde F(x_1, \ldots, x_d) = T \circ C_0\big(T_1^{-1} \circ F_1(x_1), \ldots, T_d^{-1} \circ F_d(x_d)\big), \qquad (3)$$

or equivalently

$$\begin{cases} \tilde F(x_1, \ldots, x_d) = \tilde C(\tilde F_1(x_1), \ldots, \tilde F_d(x_d)), \\ \text{with } \tilde C(u_1, \ldots, u_d) = T \circ C_0(T^{-1}(u_1), \ldots, T^{-1}(u_d)), \\ \tilde F_i(x) = T \circ T_i^{-1} \circ F_i(x), \quad \text{for } i \in I, \end{cases} \qquad (4)$$

where $F_1, \ldots, F_d$ are given parametric initial marginal cumulative distribution functions, and where $C_0$ is a given initial copula. Hence the distribution $\tilde F(x_1, \ldots, x_d)$ is built from transformed marginals $\tilde F_i$, $i \in I$, and from a transformed copula $\tilde C$, under regularity conditions. The transformation $T$ makes it possible to transform the initial dependence structure $C_0$. For a given $T$, the transformations $T_i$, $i \in I$, make it possible to transform the marginals. All these transformations are described hereafter.

As we will see in the following, the initial copula $C_0$ in (3) is not estimated but is chosen at the beginning of the estimation procedure. So it can represent some kind of a priori belief on the dependence structure of the data or on the considered problem, which will be transformed in order to improve the fit.

Furthermore, as in Di Bernardino and Rullière (2013b), we will assume in the following that $C_0$ is an Archimedean copula. This means that in this paper, we mainly consider copulas that can be written as

$$C_\phi(u_1, \ldots, u_d) = \phi(\phi^{-1}(u_1) + \ldots + \phi^{-1}(u_d)),$$

where the function $\phi$ is called the generator of the Archimedean copula $C_\phi$. Some conditions like $d$-monotonicity are given in McNeil and Nešlehová (2009). Here we choose a strict generator, i.e., $\phi(t) > 0$ for all $t \ge 0$ and $\lim_{t \to +\infty} \phi(t) = 0$, with proper inverse $\phi^{-1}$ such that $\phi \circ \phi^{-1}(t) = t$.

The function $T : [0,1] \to [0,1]$ is a continuous and increasing function on the interval $[0,1]$, with $T(0) = 0$, $T(1) = 1$, with supplementary assumptions that will be chosen to guarantee that $\tilde C$ is also a copula (detailed hereafter). Internal transformations $T_i : [0,1] \to [0,1]$ are continuous non-decreasing functions, such that $T_i(0) = 0$, $T_i(1) = 1$, for $i \in I$. Conditions on transformations such that $\tilde C$ is a copula are discussed for example in Durante et al. (2010), Di Bernardino and Rullière (2013a), Di Bernardino and Rullière (2013b).
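To make the transformed copula $\tilde C = T \circ C_0(T^{-1}(\cdot), \ldots, T^{-1}(\cdot))$ in (4) concrete, the following Python sketch takes the independence copula as $C_0$ and a hypothetical external transformation $T(u) = (1 - \theta \ln u)^{-1/\theta}$, $\theta > 0$. This particular $T$ is an illustrative choice, not one prescribed by the paper: it is selected because $T \circ \phi_0$, with $\phi_0(t) = e^{-t}$, is then the Clayton generator $(1 + \theta t)^{-1/\theta}$, so the transformed copula should coincide with the Clayton copula.

```python
import numpy as np

def independence_copula(u):
    """Initial copula C0, Archimedean with generator phi0(t) = exp(-t)."""
    return float(np.prod(u))

def make_T(theta):
    """Hypothetical external transformation T and its inverse (illustration only)."""
    def T(u):
        return (1.0 - theta * np.log(u)) ** (-1.0 / theta)
    def T_inv(v):
        return np.exp((1.0 - v ** (-theta)) / theta)
    return T, T_inv

def transformed_copula(u, T, T_inv, C0=independence_copula):
    """C~(u_1,...,u_d) = T(C0(T^{-1}(u_1),...,T^{-1}(u_d))), for u in (0,1)^d."""
    u = np.asarray(u, dtype=float)
    return float(T(C0(T_inv(u))))

def clayton_copula(u, theta):
    """Reference closed form of the Clayton copula, used only as a check."""
    u = np.asarray(u, dtype=float)
    return float((np.sum(u ** (-theta)) - u.size + 1.0) ** (-1.0 / theta))

T, T_inv = make_T(theta=2.0)
value = transformed_copula([0.3, 0.6, 0.8], T, T_inv)
```

Under these assumptions `value` agrees with `clayton_copula([0.3, 0.6, 0.8], 2.0)` up to rounding, which illustrates how a transformation of the initial dependence structure can produce another Archimedean copula.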

Note that among the problems generated by transformations of Archimedean copulas, one can point out in particular the problem of uniqueness. Transformations of a given initial copula leading to a given target copula are not unique. This raises some problems for the analysis of the convergence of estimators of the transformation. It also makes it difficult to compare transformations and to understand their impact on the dependence structure. A further analysis shows that the generator of an Archimedean copula is not unique either, causing the same kind of problems. For this reason, Di Bernardino and Rullière (2013b) define equivalence classes for both transformations and generators, in order to select some standardized forms for practical use and for the comparison and interpretation of the obtained distribution functions. Firstly, equivalence classes for transformations can be characterized. Furthermore, one can ensure the uniqueness of the transformation $T$ by requiring that it passes through the point $(x_0, y_0)$, among the invariant class of transformations (see Lemma 2.2 and Corollary 2.1 of the aforementioned paper). Obviously, in an iterative estimation procedure, the uniqueness of the transformation $T$ is essential in order to permit the convergence of the procedure and the identifiability of the considered transformation model.

In the following we detail the proposed semi-parametric estimation procedure in order to easily fit the model in (3). We show in particular how to estimate the transformations $T$ and $T_i$, $i \in I$.

Organization of the paper

The paper is organized as follows. In Section 2 we develop the estimation procedure for the chosen model in (3). In Section 2.1 we focus on a central tool in our estimation procedure: the estimation of the diagonal section of a copula. This estimated diagonal section is used in the non-parametric estimation of the external transformation $T$ and the internal transformations $T_i$ (see Section 2.2). The parametric estimation using composite hyperbolic transformations is detailed in Section 3. Then in Section 4 we propose the parametric form for the desired quantities: the multivariate distribution function $F$, its critical layers $\partial L(\alpha)$ and the associated Kendall’s function $K_C(\alpha)$, for $\alpha \in (0,1)$. Section 6 is devoted to a detailed study of a 5-dimensional rainfall data-set. Furthermore, a nested model is proposed for this 5-dimensional data-set, using the whole estimation procedure presented in the paper (see Section 7). Finally, we provide a simulation study in Section 8. Directions for future research and the conclusion are in Section 9.

2 Nonparametric estimation

2.1 Initial diagonal section estimation

In order to give a non-parametric estimation of the external transformation $T$, we will propose a non-parametric estimator of the copula within the Archimedean class of copulas. Several non-parametric estimators of the generator of an Archimedean copula are available. One can cite for example those of Dimitrova et al. (2008) or Genest et al. (2011), both based on the empirical Kendall’s function. Here, the proposed estimations will rely on an initial estimation of the diagonal section of the empirical copula. Firstly, we propose some (classical) estimators of the diagonal section $\delta_1$ of a copula,

$$\delta_1(u) = C(u, \ldots, u), \quad u \in [0,1].$$

Remark that if $(U_1, U_2, \ldots, U_d)$ has the copula $C$ as distribution function, then

$$P[U_1 \le u, \ldots, U_d \le u] = C(u, \ldots, u) = P[\max\{U_1, U_2, \ldots, U_d\} \le u] = \delta_1(u).$$

As a consequence, estimators based on the diagonal section of a copula rely on the distribution of the maximum of $U_1, \ldots, U_d$, as explained in Sungur and Yang (1996). As pointed out in Di Bernardino and Rullière (2013b), the diagonal of an Archimedean copula, under some suitable conditions, is essential to describe the copula. So, in the following we recall important assumptions (which are fulfilled by many Archimedean copulas, including the independence copula) for the unique determination of an Archimedean copula starting from its diagonal section (see, for instance, Erdely et al. (2014) and references therein). Some constructions of copulas starting from the diagonal section are given for example in Nelsen et al. (2008) and Wysocki (2012).

Proposition 2.1 (Identity of Archimedean copulas, Theorem 3.5 by Erdely et al. (2014)). Let $C$ be a $d$-dimensional Archimedean copula whose diagonal section $\delta_1$ satisfies $\delta_1'(1) = d$. Then $C$ is uniquely determined by its diagonal.

The condition in Proposition 2.1 is referred to as Frank’s condition in Erdely et al. (2014) (see their Theorems 1.2 and 3.5). Note that if $|\phi'(0)| < +\infty$ then the condition on the diagonal in Proposition 2.1 is automatically satisfied. In this case, the function $\phi$ can be reconstructed from the diagonal $\delta_1$ (see also Segers (2011)). As pointed out by Embrechts and Hofert (2011), a possible limitation is that if $\phi$ has a finite right-hand derivative at zero, the Archimedean copula generated by $\phi$ has an upper tail independent structure. To show that the situation of many Archimedean copulas having the same diagonal is far from exceptional, a recipe to construct further examples is given in Segers (2011). For further details the interested reader is referred to Section 3 in Di Bernardino and Rullière (2013b).

It has been shown however that if one aims at fitting both lower and upper tail dependence, then some specific dependence structures can be suited to this purpose, see Di Bernardino and Rullière (2014). Here we will not focus on very extreme tail behavior, but on the main part of the distribution. Using the necessarily few data in the tails would require some specific estimators of tail dependence coefficients, as detailed in the above mentioned article.

In the following we present different estimators of the diagonal section. Some empirical copula estimators for $\delta_1$ are given in Deheuvels (1979). A comparison between several more recent estimators is presented in Omelka et al. (2009).

Empirical diagonal

We present here an estimator that is detailed in Omelka et al. (2009), and directly inspired by the one of Deheuvels (1979). Consider observations $\{X^{(k)} = (X_1^{(k)}, \ldots, X_d^{(k)})\}_{k \in \{1, \ldots, n\}}$ of the random vector $X$. Define pseudo-observations $U^{(k)} = (U_1^{(k)}, \ldots, U_d^{(k)})$ by setting every component $i$ for observation number $k$ to

$$U_i^{(k)} = \frac{1}{n+1} \sum_{j=1}^{n} \mathbf{1}\{X_i^{(j)} \le X_i^{(k)}\},$$

with $i \in I$, $k \in \{1, \ldots, n\}$. One can check that for any $i \in I$, $k \in \{1, \ldots, n\}$, $U_i^{(k)} \in (0,1)$.
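The pseudo-observations above are rescaled componentwise ranks. A minimal Python sketch, assuming the data are stored as an `(n, d)` NumPy array (the function name `pseudo_observations` is a hypothetical helper, not the authors' code), could read:

```python
import numpy as np

def pseudo_observations(x):
    """Return the (n, d) array of U_i^(k) = #{j : X_i^(j) <= X_i^(k)} / (n + 1)."""
    x = np.asarray(x, dtype=float)          # shape (n, d): n observations, d margins
    n = x.shape[0]
    # leq[k, j, i] is True when X_i^(j) <= X_i^(k)
    leq = x[None, :, :] <= x[:, None, :]
    counts = leq.sum(axis=1)                # sum over j, shape (n, d)
    return counts / (n + 1.0)

sample = np.array([[1.0, 10.0],
                   [3.0, 5.0],
                   [2.0, 7.0]])
u_obs = pseudo_observations(sample)
```

The division by $n+1$ rather than $n$ keeps every pseudo-observation strictly inside $(0,1)$, as stated above. The pairwise comparison uses $O(n^2 d)$ memory, which is adequate for a sketch but not for very large samples.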

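The empirical estimator $\hat C$ of the copula of the vector $X$ is

$$\hat C(u_1, \ldots, u_d) = \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{U_1^{(k)} \le u_1, \ldots, U_d^{(k)} \le u_d\}.$$

We thus obtain, for any $u \in [0,1]$, the empirical estimation of the diagonal of $C$ and of its inverse, i.e.,

$$\begin{cases} \hat\delta_1^{emp}(u) = \hat C(u, \ldots, u), \\ \hat\delta_{-1}^{emp}(u) = \operatorname{arginf}\{x \in [0,1] : \hat\delta_1^{emp}(x) \ge u\}. \end{cases}$$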
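Since $\hat C(u, \ldots, u)$ counts the observations whose largest pseudo-observation is at most $u$, the empirical diagonal is the empirical cdf of the componentwise maxima. The following Python sketch illustrates this, together with the generalized inverse; the function names are hypothetical and the input `u_obs` is assumed to be an `(n, d)` array of pseudo-observations.

```python
import numpy as np

def empirical_diagonal(u_obs, u):
    """delta_1^emp(u) = (1/n) #{k : max_i U_i^(k) <= u}."""
    row_max = np.max(np.asarray(u_obs, dtype=float), axis=1)
    return float(np.mean(row_max <= u))

def empirical_diagonal_inverse(u_obs, level):
    """Generalized inverse: smallest x in [0, 1] with delta_1^emp(x) >= level."""
    row_max = np.sort(np.max(np.asarray(u_obs, dtype=float), axis=1))
    n = row_max.size
    if level <= 0.0:
        return 0.0
    # Smallest j with j / n >= level; the small offset guards against rounding.
    j = max(1, int(np.ceil(level * n - 1e-12)))
    return float(row_max[j - 1]) if j <= n else 1.0

u_obs = np.array([[0.25, 0.75],
                  [0.75, 0.25],
                  [0.50, 0.50]])
d_half = empirical_diagonal(u_obs, 0.5)
x_inv = empirical_diagonal_inverse(u_obs, 0.3)
```

Both functions are step functions of their argument, which is one motivation for the smooth versions discussed next.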

Smooth empirical diagonal

We do not present here all possible smooth estimators of a copula and the associated smoothed estimates of the diagonal section. We restrict ourselves to estimators based on transformations, in the same spirit as the other parts of this paper. These estimators perform reasonably well (see Omelka et al. (2009)), especially with respect to the Cramér-von Mises distance.

Using a smooth estimation of the empirical cumulative distribution of $U$ could create some bias since the distribution support must be $[0,1]$; this is the reason why we consider a transformation of the pseudo-observations.

For given smoothing coefficients $h_1, \ldots, h_d$, we can define $L^{(k)} = (L_1^{(k)}, \ldots, L_d^{(k)})$ where

$$L_i^{(k)} = G^{-1}(U_i^{(k)}),$$

with $i \in I$, $k \in \{1, \ldots, n\}$, and where $G^{-1}$ is the inverse of a cumulative distribution function $G$, for example $G^{-1}(x) = \mathrm{logit}(x)$. For this choice one can check that $(G'(x))^2 / G(x)$ is bounded, which is a required condition detailed in Omelka et al. (2009).

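A smooth version $\check C$ of $\hat C$ can be defined by:

$$\check C(u_1, \ldots, u_d) = \frac{1}{n} \sum_{k=1}^{n} \prod_{i \in I} K\left(\frac{G^{-1}(u_i) - L_i^{(k)}}{h_i}\right),$$

where $K$ is a suited kernel function (we took here a multiplicative multivariate kernel).

We thus obtain, for any $u \in [0,1]$, a smooth estimator $\hat\delta_1$ of the diagonal section of $C$, and a smooth estimator $\hat\delta_{-1}$ of the inverse function of this diagonal section:

$$\begin{cases} \hat\delta_1(u) = \check C(u, \ldots, u), \\ \hat\delta_{-1}(u) = \operatorname{arginf}\{x \in [0,1] : \hat\delta_1(x) \ge u\}. \end{cases} \qquad (5)$$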

Note that the initial pseudo-data can also be smoothed, by defining

$$U_i^{(k)} = \frac{1}{n+1} \sum_{j \in \{1, \ldots, n\}} \Phi\left(\frac{X_i^{(k)} - X_i^{(j)}}{h_i}\right).$$

Discussions on this last choice can be found in Chen and Huang (2007) and Omelka et al. (2009).

A discussion on the possible choices for $h_1, \ldots, h_d$ is given in Chiu (1996) (for $d = 1$), Wand and Jones (1993), Wand and Jones (1994), Zhang et al. (2006) (for $d \ge 2$) and references therein. However, according to Omelka et al. (2009), “a good bandwidth selection rule is missing, for the moment, and is subject of further research.”

A summary of the needed input parameters and of the estimation of $\delta_1$ and $\delta_{-1}$ is given in Algorithm 1.

Algorithm 1 Smooth initial diagonal estimation

Input parameters

Choose $G^{-1}$, the inverse of a cdf, e.g., $G^{-1}(x) = \mathrm{logit}(x)$

Choose bandwidth sizes $h_i$, $i \in I$, e.g., the ones proposed by Silverman’s Rule of Thumb

Estimation

For any $x \in [0,1]$, get $\hat\delta_1(x)$ and $\hat\delta_{-1}(x)$ by Equation (5)
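Algorithm 1 can be sketched in Python as follows. This is only an illustration under explicit assumptions: $G^{-1}$ is the logit function, the kernel function $K$ is taken to be the standard normal cdf (an integrated Gaussian kernel, chosen here for convenience), the bandwidths `h` are supplied by the user, and the generalized inverse is computed by bisection. None of these implementation choices is specified by the paper.

```python
import math
import numpy as np

# Standard normal cdf, applied elementwise (assumed choice for K).
_norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))

def logit(p):
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def smooth_diagonal(u_obs, u, h):
    """Smooth estimate of delta_1(u) = C(u,...,u), for u in (0, 1)."""
    L = logit(u_obs)                       # transformed pseudo-observations, (n, d)
    h = np.asarray(h, dtype=float)         # one bandwidth per margin, shape (d,)
    z = (logit(u) - L) / h                 # kernel arguments, shape (n, d)
    return float(np.mean(np.prod(_norm_cdf(z), axis=1)))

def smooth_diagonal_inverse(u_obs, level, h, tol=1e-10):
    """Smallest x with smooth_diagonal(x) >= level, for level in (0, 1).

    Bisection is valid because the smooth estimate is nondecreasing in u.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if smooth_diagonal(u_obs, mid, h) >= level:
            hi = mid
        else:
            lo = mid
    return hi
```

In practice `u_obs` would be the pseudo-observations of the data and `h` would come from a bandwidth rule such as Silverman's Rule of Thumb applied to the transformed sample.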

2.2 Nonparametric estimation of transformations $T$ and $T_i$, $i \in I$

A non-parametric estimator of the external transformation $T$ in (3) is given in Di Bernardino and Rullière (2013b), for $x \in (0,1)$, by

$$\hat T(x) = \hat\delta_{r(x)}(y_0), \quad \text{with } r(x) \text{ such that } \delta^0_{r(x)}(x_0) = x, \qquad (6)$$

where $\hat\delta_r$ refers to the estimator of the self-nested diagonal $\delta_r$ of the target copula $C$, and $\delta^0_{r(x)}(x_0)$ refers to the self-nested diagonal of the initial copula $C_0$. These estimators $\hat\delta_r$ and the self-nested diagonals are defined hereafter.

In the case where the initial copula $C_0$ is an Archimedean copula with generator $\phi_0$, then

$$\delta^0_{r(x)}(x_0) = \phi_0\big(d^{\,r(x)} \phi_0^{-1}(x_0)\big) \quad \text{and} \quad r(x) = \frac{1}{\ln d} \ln\left(\frac{\phi_0^{-1}(x)}{\phi_0^{-1}(x_0)}\right).$$

In particular, if $C_0$ is the independence copula, with generator $\phi_0(x) = \exp(-x)$, and setting for example $x_0 = y_0 = \exp(-1)$, then

$$\hat T(x) = \hat\delta_{\ln(-\ln(x))/\ln d}(e^{-1}).$$

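Let $\hat\delta_1$ be an estimator of $\delta_1$, and $\hat\delta_{-1}$ be an estimator of the inverse function $\delta_{-1}$ as in Equation (5). At a relative integer order $k \in \mathbb{Z}$, the self-nested diagonal estimators are defined as

$$\begin{cases} \hat\delta_k(u) = \hat\delta_1 \circ \ldots \circ \hat\delta_1(u), & (k \text{ times}), \ k \in \mathbb{N}, \\ \hat\delta_{-k}(u) = \hat\delta_{-1} \circ \ldots \circ \hat\delta_{-1}(u), & (k \text{ times}), \ k \in \mathbb{N}, \\ \hat\delta_0(u) = u. \end{cases} \qquad (7)$$

At any real order $r \in \mathbb{R}$, an estimator $\hat\delta_r$ of the self-nested diagonal is

$$\hat\delta_r(x) = z\Big(\big(z^{-1} \circ \hat\delta_k(x)\big)^{1-\alpha} \big(z^{-1} \circ \hat\delta_{k+1}(x)\big)^{\alpha}\Big), \quad x \in [0,1],$$

with $\alpha = r - \lfloor r \rfloor$ and $k = \lfloor r \rfloor$, where $\lfloor r \rfloor$ denotes the integer part of $r$, and where $z$ is a function driving the interpolation, ideally (if known) the generator of the considered copula $C$. Some consistency results about this estimator $\hat T$ are detailed in Di Bernardino and Rullière (2013b).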
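The interpolation between integer orders can be sketched in Python as below. The function `self_nested_diagonal` is a hypothetical helper: it takes the order-one diagonal, its inverse and the interpolation function $z$ as callables, and assumes $x \in (0,1)$ and a positive $z^{-1}$. The usage example relies on the independence copula in dimension $d = 3$, for which $\delta_1(u) = u^3$ and, with $z(t) = e^{-t}$, the interpolation should return $u^{3^r}$ for any real $r$.

```python
import math

def self_nested_diagonal(delta1, delta1_inv, r, x, z, z_inv):
    """Self-nested diagonal of real order r at x, interpolating orders k and k+1."""
    k = math.floor(r)
    a = r - k                                # fractional part, in [0, 1)

    def iterate(u, order):
        # Compose delta1 (order > 0) or its inverse (order < 0) |order| times.
        f = delta1 if order >= 0 else delta1_inv
        for _ in range(abs(order)):
            u = f(u)
        return u

    lower = z_inv(iterate(x, k))
    upper = z_inv(iterate(x, k + 1))
    return z(lower ** (1.0 - a) * upper ** a)

# Independence copula, d = 3 (illustrative choice).
d = 3
value = self_nested_diagonal(
    delta1=lambda u: u ** d,
    delta1_inv=lambda u: u ** (1.0 / d),
    r=1.5,
    x=0.8,
    z=lambda t: math.exp(-t),
    z_inv=lambda u: -math.log(u),
)
```

In an estimation setting, `delta1` and `delta1_inv` would be replaced by the estimators of Equation (5), and the integer-order values could be pre-computed and stored.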

For invertible marginal transformations $T_i$, one easily shows

$$T_i = F_i \circ \tilde F_i^{-1} \circ T, \quad i \in I. \qquad (8)$$

Then, by replacing the transformed marginal distribution $\tilde F_i$ in (8) by an estimator of the target $i$-th marginal distribution, denoted by $\hat F_i$ (e.g., the empirical cdf), and given the external transformation or its consistent estimation $\hat T$, a non-parametric estimation of $T_i$, for $i \in I$, is provided by

$$\hat T_i(u) = F_i \circ \hat F_i^{-1} \circ \hat T(u), \quad u \in (0,1), \qquad (9)$$

where $F_i$ is the chosen $i$-th initial marginal.

The following Definition 2.1 summarizes the expressions of the non-parametric estimators for both $T$ and $T_i$, $i \in I$.

Definition 2.1 (Non-parametric estimators of $T$ and $T_i$). For a given arbitrary couple $(x_0, y_0) \in (0,1)^2$, a non-parametric estimator of $T$ is given by

$$\hat T(x) = \hat\delta_{r(x)}(y_0), \quad \text{for all } x \in (0,1), \text{ with } r(x) \text{ such that } \delta^0_{r(x)}(x_0) = x,$$

where $\delta^0_{r(x)}$ refers to the self-nested diagonal of the initial copula $C_0$. In particular, if the initial copula $C_0$ is the independence copula, $r(x) = \frac{1}{\ln d} \ln\left(\frac{\ln x}{\ln x_0}\right)$. For any $i \in I$, non-parametric estimators of $T_i$ are

$$\hat T_i(x) = F_i \circ \hat F_i^{-1} \circ \hat T(x).$$
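The estimator of $T$ in Definition 2.1 can be illustrated in Python when the initial copula is the independence copula. In the sketch below, `delta_r` is any callable `(r, u) -> value` returning a self-nested diagonal of real order; the names `r_independence` and `T_hat` are hypothetical. For the check, the self-nested diagonal is not estimated but taken exactly as $\delta_r(u) = \phi(d^r \phi^{-1}(u))$ for a Clayton generator $\phi(t) = (1 + \theta t)^{-1/\theta}$, which is an assumption made only to obtain a verifiable closed form.

```python
import math

def r_independence(x, x0, d):
    """r(x) = ln(ln x / ln x0) / ln d, for the independence initial copula."""
    return math.log(math.log(x) / math.log(x0)) / math.log(d)

def T_hat(x, delta_r, x0, y0, d):
    """T(x) = delta_{r(x)}(y0), with delta_r a callable (r, u) -> value."""
    return delta_r(r_independence(x, x0, d), y0)

# Illustrative target: Clayton copula with theta = 2 in dimension d = 3.
theta, d = 2.0, 3
phi = lambda t: (1.0 + theta * t) ** (-1.0 / theta)
phi_inv = lambda u: (u ** (-theta) - 1.0) / theta
clayton_delta_r = lambda r, u: phi(d ** r * phi_inv(u))

x0 = math.exp(-1.0)
y0 = phi(1.0)          # chosen so that phi_inv(y0) = 1
t_value = T_hat(0.4, clayton_delta_r, x0, y0, d)
```

With this choice of $(x_0, y_0)$ the result coincides with $(1 - \theta \ln x)^{-1/\theta}$ at $x = 0.4$, i.e. with $\phi \circ \phi_0^{-1}$; another couple $(x_0, y_0)$ would give another member of the same equivalence class of transformations.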

A summary of the steps for the non-parametric estimation of $T$ and $T_i$ is given in Algorithm 2.

Algorithm 2 Non-parametric estimation of $\hat T$ and $\hat T_i$

Input parameters

Choose $(x_0, y_0)$ in $(0,1)^2$, initial copula $C_0$ and initial marginals $F_i$, for $i \in I$

Choose $k_{max}$, the pre-computation range

Possible pre-calculations

Get $\hat\delta_1$ and $\hat\delta_{-1}$ by Algorithm 1

Calculate and store $\hat\delta_k(y_0)$, $k \in \{-k_{max}, \ldots, 0, \ldots, k_{max}\}$, by Equation (7)

Non-parametric estimation

Get $\hat T$ by Equation (6), using the previous pre-computations

Get $\hat T_i$ by Equation (9)

2.3 Subset of points of transformations

In practice, we will propose in Section 3 some parametric estimators for the transformations $T$ and $T_i$, by requiring that these transformations pass through a finite set of points. The interested reader is also referred to Di Bernardino and Rullière (2013a). We aim here at determining a good set of points to be chosen.

Firstly, we define some sets of points to be chosen, starting from given sets of quantile levels. The chosen expressions for these sets are motivated by several interesting properties (see Proposition 2.2 below).

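Definition 2.2 (Set $\Omega$). Let $J \subset \mathbb{N}$ be a finite set of indexes and $\mathcal{Q} = \{q_j^{(T)}\}_{j \in J}$ be a given set of targeted percentiles, $q_j^{(T)} \in (0,1)$, $j \in J$. One defines $\Omega(\mathcal{Q}) = \{(\alpha_j, \beta_j)\}_{j \in J}$, with

$$\begin{cases} \alpha_j = q_j^{(T)}, \\ \beta_j = \hat T(\alpha_j), \end{cases}$$

where $\hat T$ is the estimator in Equation (6).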

Definition 2.3 (Sets $\Omega_i$, $i \in I$). Let $J_i \subset \mathbb{N}$ be finite sets of indexes and $\mathcal{Q}_i = \{q_j^{(i)}\}_{j \in J_i}$ be finite given sets of targeted percentiles, $q_j^{(i)} \in (0,1)$, $j \in J_i$, $i \in I$. For a given transformation $\tau$, one defines the sets $\Omega_i(\mathcal{Q}_i, \tau) = \{(\alpha_j^i, \beta_j^i)\}_{j \in J_i}$ where

$$\begin{cases} \alpha_j^i = \tau^{-1}(q_j^{(i)}), \\ \beta_j^i = F_i \circ \hat F_i^{-1}(q_j^{(i)}), \end{cases} \qquad (10)$$

where $\hat F_i^{-1}$ is the inverse of an estimator of the target $i$-th marginal distribution, denoted by $\hat F_i$.

Given some sets of points $\Omega$ and $\Omega_i$, for $i \in I$, as in Definitions 2.2 and 2.3, we firstly derive some properties of parametric transformations $T$ and $T_i$, $i \in I$, that are chosen to pass through these sets $\Omega$ and $\Omega_i$, for $i \in I$ (see Proposition 2.2 below). These properties justify the choice of the sets $\Omega$ and $\Omega_i$ in our estimation procedure.

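Proposition 2.2 (Set of points for $T_i$). Assume that the estimators $\hat T$ and $\hat F_i$ are given continuous and invertible functions. Let $J_i \subset \mathbb{N}$ be finite sets of indexes and $\mathcal{Q}_i = \{q_j^{(i)}\}_{j \in J_i}$ be finite given sets of targeted percentiles, $q_j^{(i)} \in (0,1)$, $j \in J_i$, $i \in I$. Then, for the transformations $\hat T_i$ defined in Equation (9), $i \in I$,

$$\hat T_i \text{ is passing through all points of } \Omega_i(\mathcal{Q}_i, \hat T),$$

where $\Omega_i(\mathcal{Q}_i, \hat T)$ is defined as in Definition 2.3. Furthermore, for any invertible transformation $T_i$ passing through all points of $\Omega_i(\mathcal{Q}_i, \hat T)$, $i \in I$,

$$\tilde F_i^{-1}(q) = \hat F_i^{-1}(q) \quad \text{for all } q \in \mathcal{Q}_i, \qquad (11)$$

where $\tilde F_i = \hat T \circ T_i^{-1} \circ F_i$ is the transformed marginal distribution which uses $T_i$.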

Proof: Let $i \in I$. For any $q \in \mathcal{Q}_i$, one can define $(\alpha, \beta) \in \Omega_i(\mathcal{Q}_i, \hat T)$ by Equation (10), with $\alpha = \hat T^{-1}(q)$ and $\beta = F_i \circ \hat F_i^{-1}(q)$. We assume here that all estimators are continuous and invertible functions, so that for example $\hat T \circ \hat T^{-1} = \mathrm{Id}$.

Firstly, one can check that $\hat T_i(\alpha) = \hat T_i \circ \hat T^{-1}(q)$ and, from Equation (9), $\hat T_i(\alpha) = F_i \circ \hat F_i^{-1} \circ \hat T \circ \hat T^{-1}(q) = F_i \circ \hat F_i^{-1}(q) = \beta$, so that $\hat T_i(\alpha) = \beta$ and the first result holds.

Secondly, assuming all estimators are invertible, one can write $\tilde F_i^{-1} = F_i^{-1} \circ T_i \circ \hat T^{-1}$, so that $\tilde F_i^{-1}(q) = F_i^{-1} \circ T_i(\alpha)$. By assumption $T_i(\alpha) = \beta$ since $T_i$ is passing through all points of $\Omega_i(\mathcal{Q}_i, \hat T)$, so that finally $\tilde F_i^{-1}(q) = F_i^{-1}(\beta) = F_i^{-1} \circ F_i \circ \hat F_i^{-1}(q)$ and $\tilde F_i^{-1}(q) = \hat F_i^{-1}(q)$, which gives the second result.

Once the thresholds $q_j^{(i)}$, $j \in J_i$, for which we want to identify the transformed margins with the targeted margins have been chosen, one thus gets a finite set of passage points for $T_i$. Indeed, Proposition 2.2 shows that, using $\Omega_i$ from Definition 2.3, one can select a further parametric estimator $T_i$ passing through the points of $\Omega_i$ and such that the quantiles of the target $\hat F_i$ are identified with the quantiles of the transformed margin using $T_i$ (see Equation (11)), for each quantile level $q \in \mathcal{Q}_i$ and for any $i \in I$. Figure 1 illustrates this identification of quantile levels.

[Figure 1: plot titled “Univariate distribution F”; horizontal axis $x$, vertical axis $F(x)$.]

Figure 1: Graphical illustration of the procedure described in Proposition 2.2, for $d = 1$. Here a univariate data-set and the set $\mathcal{Q} = \{0.1, 0.5, 0.9\}$ are chosen. The black line represents the empirical cdf of the data, $\hat F$, and the red one the cdf $\tilde F$ obtained using the procedure in Proposition 2.2. As proved before, the quantiles of $\hat F$ are identified with the quantiles of the obtained transformed $\tilde F$ for each quantile level $q \in \mathcal{Q}$ (black points and associated dotted lines).

Such a result would be difficult to establish for the external transformation $T$. A weaker form is given by the following Proposition 2.3.

Proposition 2.3 (Set of points for $T$). Let $J \subset \mathbb{N}$ be a finite set of indexes and $\mathcal{Q} = \{q_j^{(T)}\}_{j \in J}$ be a given set of targeted percentiles, $q_j^{(T)} \in (0,1)$, $j \in J$. Define $\Omega(\mathcal{Q})$ as in Definition 2.2. Then obviously

$$\hat T \text{ is passing through all points of } \Omega(\mathcal{Q}).$$

Furthermore, if $T$ is another transformation passing through all points of $\Omega(\mathcal{Q})$, then

$$\max_{(\alpha_j, \beta_j) \in \Omega} \left| \hat C(\beta_j, \ldots, \beta_j) - \tilde C(\beta_j, \ldots, \beta_j) \right| \le \sup_{u \in [0,1]} \left| \hat T(u) - T(u) \right|,$$

where $\tilde C(u_1, \ldots, u_d) = T \circ C_0(T^{-1}(u_1), \ldots, T^{-1}(u_d))$ is the transformed copula using the transformation $T$, $\hat C(u_1, \ldots, u_d) = \hat T \circ C_0(\hat T^{-1}(u_1), \ldots, \hat T^{-1}(u_d))$ is the transformed copula using the full non-parametric transformation $\hat T$, and $C_0$ is the initial copula.

Proof: From Definition 2.2, it is obvious that $\hat T$ is passing through all points of $\Omega(\mathcal{Q})$. For any $(\alpha_j, \beta_j) \in \Omega(\mathcal{Q})$, $\tilde C(\beta_j, \ldots, \beta_j) = T \circ C_0(T^{-1}(\beta_j), \ldots, T^{-1}(\beta_j))$ and $\hat C(\beta_j, \ldots, \beta_j) = \hat T \circ C_0(\hat T^{-1}(\beta_j), \ldots, \hat T^{-1}(\beta_j))$. Since both $\hat T$ and $T$ are passing through all points of $\Omega(\mathcal{Q})$, it follows that $T^{-1}(\beta_j) = \hat T^{-1}(\beta_j) = \alpha_j$. As a consequence,

$$\left| \hat C(\beta_j, \ldots, \beta_j) - \tilde C(\beta_j, \ldots, \beta_j) \right| = \left| \hat T(v_j) - T(v_j) \right|,$$

with $v_j = C_0(\alpha_j, \ldots, \alpha_j)$, $v_j \in [0,1]$. Hence the result.

Proposition 2.3 gives an interpretation of the set $\mathcal{Q}$ defined in Definition 2.2. Indeed, $\mathcal{Q}$ can be interpreted as a set of percentiles for which we want to minimize the difference between the diagonals of the transformed non-parametric copula and of a transformed parametric copula using a transformation $T$ instead of $\hat T$.

Once these finite sets of passage points for $\hat T_i$ and $\hat T$ have been obtained, one can find parametric estimators without any optimization procedure. A summary of the estimation procedure for obtaining piecewise linear estimators of $T$ and $T_i$, $i \in I$, is given in Algorithm 3.

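Algorithm 3 Subsets from non-parametric estimation

Input parameters

Choose $\mathcal{Q}$ and $\mathcal{Q}_i$, $i \in I$, initial sets of quantile levels, e.g. $\mathcal{Q} = \mathcal{Q}_1 = \ldots = \mathcal{Q}_d = \{0.25, 0.5, 0.75\}$.

Subsets of points

Get $\hat T$ and $\hat T_i$, $i \in I$, by Algorithm 2

Get $\Omega(\mathcal{Q})$ by Definition 2.2

Get $\Omega_i(\mathcal{Q}_i, \hat T)$ by Definition 2.3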

We have seen that, starting from given thresholds $q_j^{(i)} \in (0,1)$, it was possible to propose some sets of points $\Omega_i$, $i \in I$, and $\Omega$, as in Propositions 2.2 and 2.3, such that each $\hat T_i$ is passing through the points of $\Omega_i$, and $\hat T$ is passing through the points of $\Omega$. Starting from these considerations, one can introduce, in the following definition, some piecewise linear estimators of $T$ and $T_i$, $i \in I$.

Definition 2.4 (Piecewise linear estimators of $T$ and $T_i$). One can define two piecewise linear estimators of the external and internal transformations $T$ and $T_i$, $i \in I$.

• $\hat T^{PL}$ is defined as a piecewise linear function passing through the points $(0,0)$, all points of $\Omega(\mathcal{Q})$, and $(1,1)$, where $\Omega(\mathcal{Q})$ is given in Definition 2.2.

• for each $i \in I$, $\hat T_i^{PL}$ is defined as a piecewise linear function passing through the points $(0,0)$, all points of $\Omega_i(\mathcal{Q}_i, \hat T^{PL})$, and $(1,1)$, where $\Omega_i(\mathcal{Q}_i, \hat T^{PL})$, $i \in I$, is given in Definition 2.3.

The piecewise linear estimators $\hat T^{PL}$ and $\hat T_i^{PL}$, $i \in I$, are estimators relying on a chosen finite number of parameters. However, these estimators are not differentiable everywhere on $(0,1)$. In the following, we will try to find differentiable parametric estimators passing through the points of the considered subsets $\Omega$ and $\Omega_i$, $i \in I$.
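The construction of Definitions 2.3 and 2.4 and the quantile identification of Equation (11) can be checked numerically with the following Python sketch. It is only an illustration: the data, the points of $\Omega(\mathcal{Q})$, the initial marginal (a unit exponential cdf) and the helper names `piecewise_linear` and `omega_i` are all hypothetical, and the target quantile function $\hat F_i^{-1}$ is taken to be `np.quantile`.

```python
import numpy as np

def piecewise_linear(points):
    """Return (T, T_inv) for the piecewise linear map through (0,0), points, (1,1).

    The points are assumed strictly increasing in both coordinates.
    """
    pts = sorted(points)
    a = np.array([0.0] + [p[0] for p in pts] + [1.0])
    b = np.array([0.0] + [p[1] for p in pts] + [1.0])
    return (lambda u: np.interp(u, a, b)), (lambda v: np.interp(v, b, a))

def omega_i(q_levels, tau_inv, F_i, F_hat_i_inv):
    """Points (alpha_j, beta_j) of Equation (10) for one margin."""
    return [(float(tau_inv(q)), float(F_i(F_hat_i_inv(q)))) for q in q_levels]

# Hypothetical inputs for one margin.
data = np.array([0.1, 0.4, 0.2, 0.9, 0.35, 0.6, 0.05, 1.3, 0.5, 0.75])
F_i = lambda x: 1.0 - np.exp(-x)            # chosen initial marginal cdf
F_i_inv = lambda p: -np.log(1.0 - p)
F_hat_inv = lambda q: np.quantile(data, q)  # target quantile estimator
Q = [0.25, 0.5, 0.75]

# External piecewise linear transformation through a hypothetical Omega(Q).
T_pl, T_pl_inv = piecewise_linear([(0.25, 0.2), (0.5, 0.45), (0.75, 0.8)])

# Internal piecewise linear transformation through Omega_i(Q_i, T_pl).
Ti_pl, Ti_pl_inv = piecewise_linear(omega_i(Q, T_pl_inv, F_i, F_hat_inv))

# Quantiles of the transformed margin F~_i = T o T_i^{-1} o F_i at the levels in Q.
transformed_quantiles = [float(F_i_inv(Ti_pl(T_pl_inv(q)))) for q in Q]
```

Under these assumptions `transformed_quantiles` reproduces `np.quantile(data, Q)`, which is the identification stated in Proposition 2.2; the design choice of `np.interp` also gives the inverse map for free by swapping the two coordinate arrays.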

3 Parametric estimation

3.1 Composite hyperbolic transformations

As initially defined in Bienvenüe and Rullière (2011) and Bienvenüe and Rullière (2012), we recall here the definition of hyperbolic composite transformations.

Definition 3.1 (Conversion and transformation functions). Let $f : \mathbb{R} \to \mathbb{R}$ be any bijective increasing function. It is said to be a conversion function. Furthermore, the transformation function $T_f$ associated to $f$ is defined by $T_f : [0,1] \to [0,1]$ such that

$$T_f(u) = \begin{cases} 0 & \text{if } u = 0, \\ \mathrm{logit}^{-1}(f(\mathrm{logit}(u))) & \text{if } 0 < u < 1, \\ 1 & \text{if } u = 1. \end{cases}$$

Remark that $T_f$ is a continuous non-decreasing function, such that $T_f(0) = 0$, $T_f(1) = 1$. Furthermore, we remark that the transformation functions in Definition 3.1 are chosen so as to be easily invertible. In particular, they satisfy $T_f \circ T_g = T_{f \circ g}$ and $T_f^{-1} = T_{f^{-1}}$. When $f$ is easily invertible, these readily invertible transformations help sampling transformed distributions (see Bienvenüe and Rullière (2011)).
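A direct Python transcription of Definition 3.1 is sketched below, with a numerical check of the composition and inversion properties. The conversion functions `f` and `g` used in the example are arbitrary increasing bijections of $\mathbb{R}$ chosen for illustration; they are not taken from the paper.

```python
import math

def logit(u):
    return math.log(u / (1.0 - u))

def expit(x):
    # Numerically stable inverse of logit.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def transformation(f):
    """Build T_f : [0, 1] -> [0, 1] from a conversion function f (Definition 3.1)."""
    def T(u):
        if u <= 0.0:
            return 0.0
        if u >= 1.0:
            return 1.0
        return expit(f(logit(u)))
    return T

# Illustrative conversion functions (increasing bijections of the real line).
f = lambda x: 2.0 * x + 1.0
f_inv = lambda y: (y - 1.0) / 2.0
g = lambda x: x ** 3

T_f, T_g = transformation(f), transformation(g)
T_fg = transformation(lambda x: f(g(x)))
T_f_inv = transformation(f_inv)

lhs, rhs = T_f(T_g(0.3)), T_fg(0.3)      # T_f o T_g versus T_{f o g}
back = T_f_inv(T_f(0.3))                 # T_{f^{-1}} o T_f should be the identity
```

Working on the logit scale is what makes both properties immediate: composition of transformations reduces to composition of conversion functions on $\mathbb{R}$.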

In this section we consider the following particular class of hyperbolic conversion functions (for further details see Bienvenüe and Rullière (2012)).

Definition 3.2 (A class of hyperbolas). The considered hyperbola $H$ is

$$H_{m,h,\rho_1,\rho_2}(x) = m - h + (e^{\rho_1} + e^{\rho_2}) \frac{x - m - h}{2} - (e^{\rho_1} - e^{\rho_2}) \sqrt{\left(\frac{x - m - h}{2}\right)^2 + e^{\eta - \frac{\rho_1 + \rho_2}{2}}},$$

with $m, h, \rho_1, \rho_2 \in \mathbb{R}$, and one smoothing parameter $\eta \in \mathbb{R}$. After some calculations, one can check that

$$H^{-1}_{m,h,\rho_1,\rho_2}(x) = H_{m,-h,-\rho_1,-\rho_2}(x).$$

The functions $H$ in Definition 3.2 are thus readily invertible: a simple change of the sign of some parameters leads to the inverse function. As a consequence, the transformations $T_H$ based on conversion functions $H$ will also be readily invertible. In the following, we consider the generic hyperbolic conversion function defined in Definition 3.2.
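The inversion property of Definition 3.2 can be verified numerically with the short Python sketch below. The function names and the parameter values are illustrative assumptions; the smoothing parameter $\eta$ is kept unchanged when the signs of $h$, $\rho_1$ and $\rho_2$ are flipped.

```python
import math

def hyperbola(x, m, h, rho1, rho2, eta):
    """Hyperbolic conversion function H_{m,h,rho1,rho2}(x) with smoothing eta."""
    u = 0.5 * (x - m - h)
    a, b = math.exp(rho1), math.exp(rho2)
    c = math.exp(eta - 0.5 * (rho1 + rho2))
    return m - h + (a + b) * u - (a - b) * math.sqrt(u * u + c)

def hyperbola_inverse(y, m, h, rho1, rho2, eta):
    """Inverse obtained by changing the signs of h, rho1 and rho2."""
    return hyperbola(y, m, -h, -rho1, -rho2, eta)

# Illustrative parameter values.
params = dict(m=0.3, h=-0.2, rho1=0.7, rho2=-0.4, eta=-1.0)
round_trip = [hyperbola_inverse(hyperbola(x, **params), **params)
              for x in (-5.0, -0.5, 0.0, 1.2, 8.0)]
```

Each entry of `round_trip` should reproduce the corresponding input up to rounding error, and the slopes of `hyperbola` far to the left and far to the right approach $e^{\rho_1}$ and $e^{\rho_2}$ respectively.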

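First remark that when the smoothing parameter $\eta$ tends to $-\infty$, the hyperbola $H$ tends to the angle function:

$$A_{m,h,\rho_1,\rho_2}(x) = m - h + (x - m - h)\left(e^{\rho_1}\,\mathbf{1}_{\{x < m + h\}} + e^{\rho_2}\,\mathbf{1}_{\{x > m + h\}}\right). \tag{12}$$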

As remarked in Bienvenüe and Rullière (2011), hyperbolic transformations thus have the advantage of being smooth versions of angle functions. They show in their paper that initial parameters for the estimation are easy to obtain with angle compositions. Another advantage of the considered hyperbolic transformations is the flexibility in the estimation of the tail parameters. We discuss this point in Remark 1.
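The convergence of the hyperbola towards the angle function can be observed numerically. The sketch below, under illustrative (hypothetical) parameter values, measures the largest gap between the two functions on a grid as $\eta$ decreases.

```python
import math

def hyperbola(x, m, h, rho1, rho2, eta):
    """Hyperbolic conversion function of Definition 3.2."""
    z = (x - m - h) / 2.0
    root = math.sqrt(z * z + math.exp(eta - (rho1 + rho2) / 2.0))
    return (m - h
            + (math.exp(rho1) + math.exp(rho2)) * z
            - (math.exp(rho1) - math.exp(rho2)) * root)

def angle(x, m, h, rho1, rho2):
    """Angle function of Equation (12): slope e^rho1 left of m+h, e^rho2 right of it."""
    slope = math.exp(rho1) if x < m + h else math.exp(rho2)
    return m - h + (x - m - h) * slope

def max_gap(eta, params=(0.2, -0.4, 0.7, -0.5)):
    """Largest |H - A| over a grid of x values in [-5, 5]."""
    grid = [i / 10.0 for i in range(-50, 51)]
    return max(abs(hyperbola(x, *params, eta) - angle(x, *params)) for x in grid)

for eta in (0.0, -5.0, -10.0, -30.0):
    print(eta, max_gap(eta))
```

The gap is largest near the corner $x = m + h$ and shrinks as $\eta$ decreases, so $\eta$ acts as a trade-off between smoothness and closeness to the piecewise-linear shape.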

Remark 1 (Tail behavior). In Di Bernardino and Rullière (2014), it is shown that for the hyperbolic transformations defined above (see Definition 3.2) and starting from some particular initial Archimedean copulas $C_0$, it is possible to produce Archimedean copulas having tunable regular variation properties, and thus to get specific targeted multivariate lower and upper tail coefficients. Indeed, using the hyperbolic conversion functions $H$, the aforementioned paper proposes a generic way to construct families of Archimedean generators presenting a chosen couple of lower and upper tail coefficients. This construction is based on the fact that the conversion function $H$ in Definition 3.2 has an asymptote $ax + b$ at $-\infty$ with $a = e^{\rho_1}$, and an asymptote $\alpha x + \beta$ at $+\infty$ with $\alpha = e^{\rho_2}$. Illustrations proposed in Section 4 of the above-mentioned article show that, when fitting some data, it is thus possible to propose a fit that respects some estimated tail dependence coefficients, by deducing parameters $\rho_1$ and $\rho_2$ from tail coefficients and by estimating the other parameters $m$, $h$ and $\eta$.

For the sake of clarity, we recall below some definitions from Di Bernardino and Rullière (2013a). These definitions of composite transformations and suited parameters will be useful in the following.

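Definition 3.3 (Composite transformations). Let $k \in \mathbb{N}$. Consider $\eta \in \mathbb{R}$ and a given parameter vector $\theta = (m, h, \rho_1, \rho_2, a_1, r_1, \ldots, a_k, r_k)$ if $k \geq 1$, or $\theta = (m, h, \rho_1, \rho_2)$ if $k = 0$. We define the angle composite transformation $A_\theta$ as:

$$A_\theta = T_{f_\theta}, \quad \text{with } f_\theta = \begin{cases} A_{a_k,0,0,r_k} \circ \cdots \circ A_{a_1,0,0,r_1} \circ A_{m,h,\rho_1,\rho_2} & \text{if } k \geq 1, \\ A_{m,h,\rho_1,\rho_2} & \text{if } k = 0, \end{cases}$$

and the hyperbolic composite transformation $H_{\theta,\eta}$ as:

$$H_{\theta,\eta} = T_{f_{\theta,\eta}}, \quad \text{with } f_{\theta,\eta} = \begin{cases} H_{a_k,0,0,r_k} \circ \cdots \circ H_{a_1,0,0,r_1} \circ H_{m,h,\rho_1,\rho_2} & \text{if } k \geq 1, \\ H_{m,h,\rho_1,\rho_2} & \text{if } k = 0, \end{cases}$$

with $A_{m,h,\rho_1,\rho_2}$ as in Equation (12) and $H_{m,h,\rho_1,\rho_2}$ as in Definition 3.2.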

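Remark 2 (Inverse composite transformations). Let $k \in \mathbb{N}$. Consider $\eta \in \mathbb{R}$ and a given parameter vector $\theta = (m, h, \rho_1, \rho_2, a_1, r_1, \ldots, a_k, r_k)$ if $k \geq 1$, or $\theta = (m, h, \rho_1, \rho_2)$ if $k = 0$. Since $T_f^{-1} = T_{f^{-1}}$, the inverse angle composite transformation $A_\theta^{-1}$ is such that:

$$A_\theta^{-1} = T_{f_\theta^{-1}}, \quad \text{with } f_\theta^{-1} = \begin{cases} A_{m,-h,-\rho_1,-\rho_2} \circ A_{a_1,0,0,-r_1} \circ \cdots \circ A_{a_k,0,0,-r_k} & \text{if } k \geq 1, \\ A_{m,-h,-\rho_1,-\rho_2} & \text{if } k = 0. \end{cases}$$

The inverse hyperbolic composite transformation $H_{\theta,\eta}^{-1}$ is such that:

$$H_{\theta,\eta}^{-1} = T_{f_{\theta,\eta}^{-1}}, \quad \text{with } f_{\theta,\eta}^{-1} = \begin{cases} H_{m,-h,-\rho_1,-\rho_2} \circ H_{a_1,0,0,-r_1} \circ \cdots \circ H_{a_k,0,0,-r_k} & \text{if } k \geq 1, \\ H_{m,-h,-\rho_1,-\rho_2} & \text{if } k = 0. \end{cases}$$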


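Definition 3.4 (Suited parameters from $\Omega$). Let $k \in \mathbb{N}$. Consider one given set $\Omega = \{\omega_1, \ldots, \omega_{3+k}\}$, $\omega_j \in (0,1)^2$. Denote by $u_j$ and $v_j$ the two respective components of each $\omega_j$ in the logit scale, such that $\omega_j = (\operatorname{logit}^{-1} u_j, \operatorname{logit}^{-1} v_j)$, $j \in \{1, \ldots, 3+k\}$. Assume that $u_j$ and $v_j$ are increasing sequences of $j$. We define:

$$\Theta(\Omega) = \begin{cases} (m, h, \rho_1, \rho_2, a_1, r_1, \ldots, a_k, r_k) & \text{if } k \geq 1, \\ (m, h, \rho_1, \rho_2) & \text{if } k = 0, \end{cases}$$

where

$$m = \frac{u_2 + v_2}{2}, \quad h = \frac{u_2 - v_2}{2}, \quad \rho_1 = \ln\left(\frac{v_2 - v_1}{u_2 - u_1}\right), \quad \rho_2 = \ln\left(\frac{v_3 - v_2}{u_3 - u_2}\right),$$

$$r_k = \ln\left(\frac{v_{3+k} - v_{2+k}}{u_{3+k} - u_{2+k}} \cdot \frac{u_{2+k} - u_{1+k}}{v_{2+k} - v_{1+k}}\right), \quad a_k = v_{2+k}, \quad k \geq 1.$$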

Proposition 3.1 (Suited composite transformations). Let $k \in \mathbb{N}$. Consider one given set $\Omega = \{\omega_1, \ldots, \omega_{3+k}\}$, $\omega_j \in (0,1)^2$, and a smoothing parameter $\eta \in \mathbb{R}$. Set $\theta = \Theta(\Omega)$, then

- the transformation $A_\theta(x)$ is piecewise linear in the logit scale and will be called logit-piecewise linear. It links the point $(0,0)$, the points of $\Omega$, and the point $(1,1)$.

- the transformation $H_{\theta,\eta}$ converges pointwise to $A_\theta$ as $\eta$ tends to $-\infty$. As a result, the continuous and differentiable transformation $H_{\theta,\eta}$ can fit the set of points $\Omega$ as precisely as desired when $\eta$ tends to $-\infty$.

Proof: The first result is proved in Bienvenüe and Rullière (2011). It simply comes from the fact that $A_{\Theta(\Omega)}(u_j) = v_j$ for all $j \in \{1, \ldots, 3+k\}$, where $\omega_j = (\operatorname{logit}^{-1} u_j, \operatorname{logit}^{-1} v_j)$. The convergence of the hyperbolic composite transformation toward the angle composite transformation is straightforward and is also mentioned in Bienvenüe and Rullière (2011).
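The interpolation property in the first item can be made concrete. The following sketch computes the suited parameters of Definition 3.4 for a set of points given directly in the logit scale and checks that the angle composite conversion function passes through them. The point values, the function names and the returned data layout are illustrative assumptions.

```python
import math

def angle(x, m, h, rho1, rho2):
    """Angle function of Equation (12)."""
    slope = math.exp(rho1) if x < m + h else math.exp(rho2)
    return m - h + (x - m - h) * slope

def suited_parameters(points):
    """Theta(Omega) for points (u_j, v_j), j = 1..3+k, given in the logit scale,
    with u_j and v_j strictly increasing in j."""
    (u1, v1), (u2, v2), (u3, v3) = points[:3]
    base = ((u2 + v2) / 2.0,                      # m
            (u2 - v2) / 2.0,                      # h
            math.log((v2 - v1) / (u2 - u1)),      # rho1
            math.log((v3 - v2) / (u3 - u2)))      # rho2
    extra = []
    for j in range(3, len(points)):               # one extra angle per extra point
        (ua, va), (ub, vb), (uc, vc) = points[j - 2], points[j - 1], points[j]
        r = math.log((vc - vb) / (uc - ub) * (ub - ua) / (vb - va))
        extra.append((vb, r))                     # (a_k, r_k) with a_k = v_{2+k}
    return base, extra

def angle_composite(x, base, extra):
    """f_theta = A_{a_k,0,0,r_k} o ... o A_{a_1,0,0,r_1} o A_{m,h,rho1,rho2}."""
    y = angle(x, *base)
    for a, r in extra:
        y = angle(y, a, 0.0, 0.0, r)
    return y

# Hypothetical passage points in the logit scale (k = 2 extra points).
points = [(-2.0, -1.5), (-0.5, -0.2), (0.4, 0.9), (1.5, 1.3), (3.0, 2.0)]
base, extra = suited_parameters(points)

for u, v in points:
    print(u, v, angle_composite(u, base, extra))
```

All parameters are obtained in closed form from the points, which is the reason why no optimization step is needed at this stage.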

3.2 Smoothed parametric estimators

For the estimation of transformations $T$ and $T_i$, we assume that the following are given:

• Initial estimators of $\delta$ and $\delta^{-1}$ (see Section 2.1).

• The sets of quantile levels $Q = \{q_j^{(T)}\}_{j \in J}$, $Q_i = \{q_j^{(i)}\}_{j \in J_i}$, $i \in I$ (see Section 2.3).

• Some smoothing parameters $\eta \in \mathbb{R}$ and $\eta_i \in \mathbb{R}$, $i \in I$ (see Section 3.1).

In the following we introduce the smooth parametric estimators T and Ti, i ∈ I, for external and internal transformations respectively.

Estimation of $T$

Using estimators of $\delta$ and $\delta^{-1}$ (see Section 2.1), one easily gets the resulting set $\Omega(Q)$ by Proposition 2.3 and the suited parameters by Definition 3.4:

$$\hat\theta = \Theta(\hat\Omega(Q)). \tag{13}$$

Then one defines: $T = H_{\hat\theta,\eta}$.

Estimation of $T_i$, $i \in I$

Once the smooth estimator $T$ of the external transformation is available, one gets the passage set $\Omega_i$, for $i \in I$, by Proposition 2.2 and the suited associated parameters $\theta_i$ by Definition 3.4, i.e.,

$$\hat\theta_i = \Theta(\hat\Omega_i(Q_i, T)). \tag{14}$$

Note that once the threshold sets $Q$ and $Q_i$ and the smoothing parameters $\eta$ and $\eta_i$ are given, all estimated parameters are directly and analytically defined, so that we do not need here any inversion or optimization procedure.

Then one defines: $T_i = H_{\hat\theta_i,\eta_i}$, for $i \in I$.

A summary of the expressions of the smooth estimators $T$ and $T_i$, $i \in I$, is given in the following definition.

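Definition 3.5 (Smooth estimation of $T$ and $T_i$, $i \in I$). Let $Q$ and $Q_i$, $i \in I$, be given sets of quantile levels. Let $\eta \in \mathbb{R}$ and $\eta_i \in \mathbb{R}$, $i \in I$, be given smoothing parameters. One defines

$$\begin{cases} T = H_{\hat\theta,\eta}, \\ T_i = H_{\hat\theta_i,\eta_i}, \quad i \in I, \end{cases} \tag{15}$$

where parameters $\hat\theta$ and $\hat\theta_i$ are given by

$$\begin{cases} \hat\theta = \Theta(\hat\Omega(Q)), \\ \hat\theta_i = \Theta(\hat\Omega_i(Q_i, T)), \quad i \in I. \end{cases}$$

The sets $\Omega$ and $\Omega_i$ are defined in Definitions 2.2 and 2.3. The function $\Theta$ is defined in Definition 3.4.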


The estimation procedure for obtaining the smoothed parametric estimators $T$ and $T_i$, $i \in I$, is gathered in Algorithm 4.

Algorithm 4 Parametric estimation

Input parameters
  Choose $Q$ and $Q_i$, $i \in I$, the sets of quantile levels, e.g. $Q = Q_1 = \ldots = Q_d = \{0.25, 0.5, 0.75\}$,
  Choose smoothing parameters $\eta \in \mathbb{R}$ and $\eta_i \in \mathbb{R}$, $i \in I$,

Parametric estimation
  Get $\hat\Omega(Q)$ and $\hat\Omega_i(Q_i, T)$, $i \in I$, by Algorithm 3
  Get $\hat\theta$ by Equation (13)
  Get $\hat\theta_i$ by Equation (14)
  Get smooth estimators $T$ and $T_i$, $i \in I$, by Equation (15).

One can define the complete parameter vector presented in Algorithms 3 and 4:

$$\boldsymbol{\Theta} = (\hat\theta_1, \ldots, \hat\theta_d, \hat\theta, \eta_1, \ldots, \eta_d, \eta).$$

What is noticeable here is that, given thresholds $Q$, $Q_i$ and smoothing parameters $\eta$, $\eta_i$, one has direct expressions for the parametric estimators $\hat\theta$ and $\hat\theta_i$, $i \in I$. The estimation can also change depending on some other estimation choices (the generator among its equivalence class, determined by $(x_0, y_0)$, the bandwidths $h_i$, $i \in I$, or the kernel function $K$), but one aims here at finding good estimators for any reasonable value of these parameters. We illustrate this point in the numerical Section 6. Finding a global optimum in high dimension would run into the curse of dimensionality and numerical problems. As a consequence, it is very important to get correct estimators without jointly optimizing many parameters in high dimension.

4 Final parametric results

Once all parameters are estimated as in Section 3, the previous parametric model provides various analytical results for the multivariate distribution function, its associated critical layers, Kendall's function and multivariate return periods.

Firstly, we obtain the parametric expression for the transformed copula $\tilde C$:

$$\tilde C(u_1, \ldots, u_d) = \tilde\phi(\tilde\phi^{-1}(u_1) + \ldots + \tilde\phi^{-1}(u_d)), \tag{16}$$

where $\tilde\phi$ is the final parametric transformed Archimedean generator (see (4)). This generator can be easily written in terms of the external transformation $T$, i.e.,

$$\tilde\phi(t) = T(\phi_0(t)), \tag{17}$$

where $\phi_0$ is the generator associated with the initial copula $C_0$ in (4) and $T$ is the smooth estimator of the external transformation presented in Section 3.2.

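Using the model in (4), the corresponding estimated transformed multivariate distribution is given by:

$$\tilde F_{\boldsymbol{\Theta}}(x_1, \ldots, x_d) = H_{\hat\theta,\eta} \circ C_0\left(H^{-1}_{\hat\theta_1,\eta_1} \circ F_1(x_1), \ldots, H^{-1}_{\hat\theta_d,\eta_d} \circ F_d(x_d)\right), \tag{18}$$

where $\boldsymbol{\Theta}$ is the complete estimated parameter vector presented in Algorithms 3 and 4.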
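Expression (18) can be evaluated directly once the parameters are known. The sketch below is a simplified illustration under explicit assumptions: $C_0$ is the independence copula (the choice used in the numerical illustrations of the paper), each transformation uses a single hyperbola ($k = 0$), the initial margins are unit exponential, and all parameter values are hypothetical.

```python
import math

def logit(u):
    return math.log(u / (1.0 - u))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def hyperbola(x, m, h, rho1, rho2, eta):
    """Hyperbolic conversion function of Definition 3.2."""
    z = (x - m - h) / 2.0
    root = math.sqrt(z * z + math.exp(eta - (rho1 + rho2) / 2.0))
    return (m - h + (math.exp(rho1) + math.exp(rho2)) * z
            - (math.exp(rho1) - math.exp(rho2)) * root)

def T_hyp(u, theta, eta):
    """Hyperbolic transformation H_{theta,eta} with k = 0."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return inv_logit(hyperbola(logit(u), *theta, eta))

def T_hyp_inv(u, theta, eta):
    """Inverse transformation: flip the signs of h, rho1 and rho2 (Remark 2)."""
    m, h, rho1, rho2 = theta
    return T_hyp(u, (m, -h, -rho1, -rho2), eta)

def transformed_cdf(x, margins, theta, eta, thetas, etas):
    """Equation (18) with C0 taken as the independence copula."""
    u = [T_hyp_inv(F(xi), th, et)
         for xi, F, th, et in zip(x, margins, thetas, etas)]
    return T_hyp(math.prod(u), theta, eta)

# Hypothetical initial margins and parameters for d = 2.
exp_cdf = lambda x: 1.0 - math.exp(-x) if x > 0.0 else 0.0
margins = [exp_cdf, exp_cdf]
theta, eta = (0.1, -0.2, 0.5, -0.3), -1.0
thetas = [(0.0, 0.1, 0.2, 0.4), (-0.3, 0.0, -0.1, 0.3)]
etas = [-3.0, -3.0]

print(transformed_cdf([1.0, 2.0], margins, theta, eta, thetas, etas))
```

With all transformation parameters set to $(0, 0, 0, 0)$ each hyperbola is the identity, and the expression reduces to the product of the initial margins, which gives a simple sanity check.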

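Furthermore, using the expression in (18) for the estimated transformed multivariate distribution $\tilde F$, the associated parametric $\alpha$ critical layers are given by

$$\partial \tilde L_{\boldsymbol{\Theta}}(\alpha) = \left\{\left(F_1^{-1} \circ H_{\hat\theta_1,\eta_1}(u_1), \ldots, F_d^{-1} \circ H_{\hat\theta_d,\eta_d}(u_d)\right),\ (u_1, \ldots, u_d) \in (0,1)^d,\ C_0(u_1, \ldots, u_d) = H^{-1}_{\hat\theta,\eta}(\alpha)\right\},$$

where a direct analytic expression of $H^{-1}_{\hat\theta,\eta}$ is given by Remark 2 (see also Proposition 2.4 in Di Bernardino and Rullière (2013a)).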

As presented in the introduction, we aim at providing a parametric estimation of the multivariate Return Period

$$RP_{>}(\alpha) = \Delta t \cdot \frac{1}{1 - K_C(\alpha)}.$$

Genest and Rivest (2001) obtained the following explicit expression for the Kendall distribution in the case of multivariate Archimedean copulas with a given generator $\varphi$, i.e.,

$$K_C(\alpha) = \alpha + \sum_{i=1}^{d-1} \frac{1}{i!} \left(-\varphi^{-1}(\alpha)\right)^{i} \varphi^{(i)}\left(\varphi^{-1}(\alpha)\right), \quad \text{for } \alpha \in (0,1),$$

where the notation $f^{(i)}$ corresponds to the $i$-th derivative of a function $f$.
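As an illustration of this formula, consider the independence copula, whose generator is $\varphi(t) = e^{-t}$. Then $\varphi^{-1}(\alpha) = -\ln\alpha$ and $\varphi^{(i)}(t) = (-1)^i e^{-t}$, so the sum collapses to $K_C(\alpha) = \alpha \sum_{i=0}^{d-1} (-\ln\alpha)^i / i!$. The sketch below evaluates this special case and the associated return period; the function names and the default $\Delta t = 1$ are assumptions made for the example.

```python
import math

def kendall_independence(alpha, d):
    """K_C(alpha) for the d-dimensional independence copula (phi(t) = exp(-t))."""
    t = -math.log(alpha)                       # phi^{-1}(alpha)
    return alpha * sum(t ** i / math.factorial(i) for i in range(d))

def return_period(alpha, d, delta_t=1.0):
    """RP_>(alpha) = delta_t / (1 - K_C(alpha))."""
    return delta_t / (1.0 - kendall_independence(alpha, d))

for d in (1, 2, 5):
    print(d, kendall_independence(0.9, d), return_period(0.9, d))
```

For $d = 1$ this reduces to $K_C(\alpha) = \alpha$ and the usual univariate return period $\Delta t / (1 - \alpha)$; for a fixed level $\alpha$, the return period grows with the dimension $d$.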

Consider the case of the transformed generator $\tilde\phi(t) = T(\phi_0(t))$ in (16)-(17). Then, the analytical formula for the estimated transformed Kendall distribution $\tilde K_C$ can be easily written as:

$$\tilde K_C(\alpha) = \alpha + \sum_{i=1}^{d-1} \frac{1}{i!} \left(-\phi_0^{-1}(T^{-1}(\alpha))\right)^{i} \left(T \circ \phi_0\right)^{(i)}\left(\phi_0^{-1}(T^{-1}(\alpha))\right), \quad \text{for } \alpha \in (0,1). \tag{19}$$

Remark that the hyperbolic transformations $T = H_{\hat\theta,\eta}$ in this paper are chosen so as to be easily invertible. The estimated smooth inverse transformation $T^{-1}$ in (19) is then straightforwardly obtained by changing the signs of some parameters of the hyperbolic composition (see Remark 2).
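As a worked special case of (19), take $d = 2$ and assume that $C_0$ is the independence copula, so that $\phi_0(t) = e^{-t}$. The shorthand $s = T^{-1}(\alpha)$ and $t_\alpha = \phi_0^{-1}(s)$ is introduced here only for the derivation.

```latex
\begin{align*}
t_\alpha &= \phi_0^{-1}(s) = -\ln s, \qquad s = T^{-1}(\alpha), \\
(T \circ \phi_0)'(t_\alpha) &= T'(\phi_0(t_\alpha))\,\phi_0'(t_\alpha) = -s\,T'(s), \\
\tilde K_C(\alpha) &= \alpha + (-t_\alpha)\,\bigl(-s\,T'(s)\bigr)
                   = \alpha - s \ln(s)\, T'(s).
\end{align*}
```

When $T$ is the identity, $s = \alpha$ and $T' = 1$, and one recovers the Kendall function $\alpha - \alpha \ln \alpha$ of the bivariate independence copula. The derivation also shows why a differentiable estimator of $T$ is needed: $T'$ enters the expression directly.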

5 Comprehensive scheme of the estimation procedure

The method presented in this paper involves different notations progressively introduced in different sections. Moreover, the final scheme of the procedure relies on many tuning parameters. To improve clarity, a comprehensive scheme of the estimation procedure is presented below (see Table 1). The relationship between the different algorithms presented above is pointed out using arrows and implications. Moreover, the nature of the considered parameters is specified; in particular, we distinguish between tuning parameters and estimated/calculated quantities.

As one can see in Table 1 (left column), the algorithm depends on many tuning parameters. We discuss here the sensitivity of the results to these choices. The results expressed below refer to variations of mean absolute errors and are based on our simulation study (see Section 8, case $n = 1000$, using as a reference $x_0 = y_0 = e^{-1}$, $Q = Q_1 = Q_2 = \{0.25, 0.5, 0.75\}$, $\eta = -1$, $\eta_1 = \eta_2 = -3$, with very small bandwidths).

• We always used in our numerical illustrations (see Sections 6, 7 and 8) $G^{-1} = \operatorname{logit}(x)$, in order to simplify inversion procedures, and $C_0$ the independent copula. We believe that these choices might have more important implications for the tails. A deep analysis in this sense is developed in Di Bernardino and Rullière (2014).

• The use of positive bandwidth sizes $h_i$ ensures continuity and possible inversion of the fitted copula diagonal section and marginal distributions. Using small bandwidth sizes may increase the variance of some non-parametric estimators, but in practice this amounts to estimating the parametric copula and margins from the empirical diagonal and empirical margins instead of smoothed ones. The impact on the parametric estimation is thus limited. As an example, in the simulation study (see Section 8), using a bandwidth given by the classical rule of thumb or dividing this bandwidth by 100 leads to an absolute variation of 0.6% on the resulting mean absolute errors.

• Concerning the choice of the point $(x_0, y_0)$, we observed absolute variations of order 4% in mean absolute errors compared to the choice of $x_0 = y_0 = e^{-1}$ (which simplifies some calculations).

• Concerning the choice of the quantile level sets $Q$, $Q_i$, for $i \in I$, using regular thresholds $Q = Q_1 = Q_2 = \{0.25, 0.5, 0.75\}$, without any optimization, leads to a mean absolute error of MAE = 3.6% (instead of approx. 1% for optimized thresholds), which is still reasonable for non-optimized parameters.

• Lastly, the choices of $\eta$ and $\eta_i$, $i \in I$, mostly depend on the desired smoothness of the final resulting distribution. Except for clearly excessive smoothing, the impact on the resulting distribution is quite small. As an example on the simulated data of Section 8, using $\eta = 0$ and $\eta_1 = \eta_2 = -2$ or using $\eta = -2$ and $\eta_1 = \eta_2 = -4$ leads to absolute variations of 0.5% on the resulting mean absolute errors.


Scheme of the estimation procedure

Chosen tuning parameters in Algorithm (left column):

1) $G^{-1}$ (inverse of a cdf)
2) $h_i$, $i \in I$, bandwidth sizes
3) Point $(x_0, y_0)$ in $(0,1)^2$
4) Initial copula $C_0$
5) Initial marginals $F_i$, for $i \in I$
6) $Q$ and $Q_i$, $i \in I$ (initial sets of quantile levels)
7) $\eta \in \mathbb{R}$ and $\eta_i \in \mathbb{R}$, $i \in I$ (smoothing parameters)

Calculated quantities (right column):

a) Using 1) and 2), estimate the diagonal $\hat\delta_1(x)$ and $\hat\delta_{-1}(x)$, for any $x \in [0,1]$.
b) Using 3), 4) and a), get $\hat T$.
c) Estimate the target $i$-th marginal distribution, $\hat F_i$ for $i \in I$ (e.g., empirical cdf).
d) Using 5), b) and c), get $\hat T_i$, for $i \in I$.
e) Using 6) and b), get $\hat\Omega(Q)$.
f) Using 5), 6), b) and c), get $\hat\Omega_i(Q_i, T)$, for $i \in I$.
g) Using 7), e) and f), get $\hat\theta = \Theta(\hat\Omega(Q))$ and $\hat\theta_i = \Theta(\hat\Omega_i(Q_i, T))$ for $i \in I$, where $T = H_{\hat\theta,\eta}$.
h) Using 4), 5) and g), finally get
   $\tilde F_{\boldsymbol{\Theta}}(x_1, \ldots, x_d) = H_{\hat\theta,\eta} \circ C_0\left(H^{-1}_{\hat\theta_1,\eta_1} \circ F_1(x_1), \ldots, H^{-1}_{\hat\theta_d,\eta_d} \circ F_d(x_d)\right)$,
   where $\boldsymbol{\Theta} = (\hat\theta_1, \ldots, \hat\theta_d, \hat\theta, \eta_1, \ldots, \eta_d, \eta)$.

Table 1: Comprehensive scheme of the global estimation procedure. We distinguish between tuning parameters (left column) and estimated/calculated quantities (right column).

The fairly small sensitivity to these parameters is important. Indeed, in high dimension, due to the curse of dimensionality, it is not possible in practice to find the global optimum of a function. It is thus essential that any chosen tuning parameter leads to small average errors, so that the calculated quantities are near a local optimizer of the considered error, and so that the difference between local optima and the global one is reduced. The resulting calculated parameters, like $(m, h, \rho_1, \rho_2)$ for each transformation, can still be optimized once they are close to one optimizer.

In further numerical illustrations (see Sections 6, 7 and 8), we never optimize resulting calculated quantities, but we sometimes choose tuning parameters that permit to get reduced final errors.

6 Numerical results on the rainfall real data

In the following we illustrate the methodology presented in the previous sections using a 5-dimensional rainfall data-set.

6.1 Presentation of the data-set

Data comes from the website CISL Research Data Archive (RDA), http://rda.ucar.edu, and is available for registered users. The user is granted the right to use the Site for non-commercial, non-profit research, or educational purposes only, without any fee or cost, as specified in the terms of use of the website (January 2014).

The name of the data in this website is: ds570.0, and the direct url to this data is http://rda.ucar.edu/datasets/ds570.0/. The whole citation information for this data is:
