Semi-parametric approximation of Kendall’s distribution function and multivariate Return Periods

(1)

Vol. 154No. 1 (2013)

Semi-parametric approximation of Kendall’s distribution function and multivariate Return

Periods

^*

Titre:Approximation semi-paramétrique de la fonction de distribution de Kendall et périodes de retour multivariées

Gianfausto Salvadori¹, Fabrizio Durante²and Elisa Perrone³

Abstract:In this work we outline a constructive approach for the approximation of Kendall’s distribution function and Kendall’s Return Period in the bivariate case. First, we introduce a suitable theoretical framework, based on the Theory of Copulas, where to embed the issue. Then, we outline an original construction procedure to approximate the empirical Kendall distribution function estimated using the available data. The whole approach is semi-parametric: the empirical Kendall distribution function is approximated via a (suitable) continuous piece-wise linear function on the unit interval. A sensitivity analysis is carried out via a simulation procedure, in order to investigate the robustness of the approach proposed against several relevant factors.

Résumé :Dans le cas bivarié, une approximation de la fonction de distribution de Kendall ainsi que de la période de retour de Kendall sont proposées. Le contexte théorique est celui de la théorie des copules. L’approche semi- paramétrique suggérée consiste à approcher la fonction de distribution de Kendall empirique par une fonction linéaire par morceaux sur l’intervalle unité. La robustesse de l’approche proposée est étudiée empiriquement par le biais de simulations.

Keywords:Kendall’s distribution function, multivariate return periods, copulas, risk assessment

Mots-clés :fonction de distribution de Kendall, périodes de retour multivariés, copules, évaluation de risque AMS 2000 subject classifications:62G05, 62H12, 86A05

* Helpful discussions with C. Sempi (Università del Salento, Italy) are acknowledged. The comments and the suggestions of two anonymous Referees are also gratefully acknowledged. [GS] The research was partially supported by the Italian M.I.U.R. via the project “Metodi stocastici in finanza matematica” (PRIN 2010). Also the support of the CMCC—Centro Mediterraneo Cambiamenti Climatici - Lecce (Italy) is acknowledged. [FD] The Author acknowledges the support of School of Economics and Management, Free University of Bozen-Bolzano (Italy), via the project “Risk and Dependence”.

1 Dipartimento di Matematica e Fisica, Università del Salento, I-73100 Lecce, Italy.

E-mail:[email protected]

2 School of Economics and Management, Free University of Bozen-Bolzano, I-39100 Bolzano, Italy.

3 Institut für Angewandte Statistik, Johannes Kepler Universität, A-4040 Linz, Austria.

(2)

1. Introduction

The notion of Return Period (hereinafter,RP) is frequently used in environmental sciences for the identification of dangerous events, and provides a means for rational decision making and risk assessment (for a review, see [Singh et al., 2007,AghaKouchak et al., 2013], and references therein). Roughly speaking, the RP can be considered [Nappo and Spizzichino, 2009] as an analogue of the “Value-at-Risk” in Economics and Finance, since it is used to quantify and assess the risk.

The traditional definition of the RP is as “the average time elapsing between two successive realizations of a prescribed event”, which clearly has a statistical base. Equally important is the related concept ofdesign quantile, usually defined as “the value of the variable(s) characterizing the event associated with a given RP”. In engineering practice, the choice of the RP depends upon the importance of the structure, and the consequences of its failure. While in the univariate case the design quantile is usually identified without ambiguity, in the multivariate one this is not so.

Indeed, the identification problem of design events in a multivariate context has recently attracted the attention of many researches (see, e.g., [Serfling, 2002,Belzunce et al., 2007,Chebana and Ouarda, 2009,Chebana and Ouarda, 2011,Chaouch and Goga, 2010,AghaKouchak et al., 2013], and references therein).

As we shall show later, the calculation of the RP is strictly related to the notion of Copula, which is the restriction of a joint distribution with Uniform margins toI^d= [0,1]^d,d>1. The link between a multivariate distributionFand the associatedd-dimensional copulaCis given by the functional identity stated by Sklar’s Theorem [Sklar, 1959,Durante et al., 2012]:

F(x₁, . . . ,xd) =C(F₁(x₁), . . . ,Fd(xd)) (1)

for allx∈R^d, where theF_i’s are the univariate margins ofF. If all theF_i’s are continuous, thenC is unique. Most importantly, theFi’s in Eq. (1) only play the role of (geometrically) re-mapping the probabilities induced byCon the subsets ofI^donto suitable subsets ofR^d, without changing their values: viz., the dependence structure modeled byCplays a central role in tuning the probabilities of joint occurrences. For a thorough theoretical introduction to copulas see [Joe, 1997,Nelsen, 2006,Durante and Sempi, 2010]; for a practical approach see [Salvadori et al., 2007,Jaworski et al., 2010].

In order to avoid troublesome situations, hereinafter we shall assume that Fis continuous (but not necessarily absolutely continuous), and strictly increasing in each marginal. In turn, any pointx∈R^d can be uniquely re-mapped ontou∈I^d (and vice-versa) via the Probability Integral Transform (PIT):

(u₁, . . . ,ud) = (F₁(x₁), . . . ,Fd(xd)). (2)

Later we shall use theKendall’s distribution(ormeasure) functionK_C:I→I[Genest and Rivest, 1993,Genest and Rivest, 2001] given by

K_C(t) =P(W≤t) =P(C(U₁, . . . ,Ud)≤t), (3) wheret∈Iis a probability level,W=C(U₁, . . . ,Ud)is a univariate random variable (hereinafter, r.v.) taking value on I, and the U_i’s are Uniform r.v.s on I with copulaC. Note that KC(t) practically measures the probability that a random event will appear in the region ofI^d defined by

(3)

the inequalityC(u)≤t: this function was introduced as a generalization of the PIT, and stands its name by the fact that it is related to Kendall’s measure of association — see [Genest and Rivest, 2001,Nappo and Spizzichino, 2009].

While in some cases (e.g., for bivariate Extreme Value copulas [Ghoudi et al., 1998], or Archimedean copulas [Barbe et al., 1996,McNeil and Nešlehová, 2009]) a close formula forK_C is available, in general it is possible to resort to simulations for estimating specific values ofK_C (see, e.g., the procedures outlined in [Barbe et al., 1996,Salvadori et al., 2011], and references therein). As we shall see,K_Cturns out to be a fundamental tool for calculating a copula-based RP for multivariate events.

The paper is organized as follows. In Section2we recall consistent definitions of RP’s and quantiles in a multivariate environment. In Section3we present a constructive method for the approximation of Kendall’s distribution in the bivariate case. In Section4we carry out a simulation study, in order to test the performance of the approach outlined in the paper. Finally, in Section5 we draw some conclusions and discuss possible future perspectives.

2. Multivariate Return period

In order to provide a consistent theory of RP’s in a multivariate environment, it is first necessary to define the abstract framework where to embed the question. Preliminary studies can be found in [Salvadori, 2004,Salvadori and De Michele, 2004,Durante and Salvadori, 2010,Salvadori and De Michele, 2010], and some applications are presented in [De Michele et al., 2007,Salvadori and De Michele, 2010,Vandenberghe et al., 2010,Salvadori et al., 2011]. Hereinafter, we shall consider as the object of our investigation a sequenceX ={X₁,X₂, . . .}of independent and identically distributedd-dimensional random vectors, withd>1: thus, eachXihas the same multivariate distributionFas of the random vectorX∼F=C(F₁, . . . ,Fd)describing the phenomenon under investigation, with suitable marginalsF_i’s andd-copulaC.

In applications, usually, the event of interest is of the type{X∈D}, whereD is a non-empty Borel set inR^d collecting all the values judged to be “dangerous” according to suitable criteria.

Let µ >0 be the average inter-arrival time of the realizations in X (viz., µ is the average time elapsing betweenXiandXi+1). According to [Salvadori et al., 2011], we may introduce a consistent notion of RP as follows.

Definition 1. The RP associated with the event{X∈D}is given byµ_D=µ/P(X∈D).

Definition1 is a very general one: the setD may be constructed in order to satisfy broad requirements, useful in different applications. Indeed, most of the approaches already present in literature are particular cases of the one outlined above. It is important to stress that the RP is a quantity associated with a proper event. However, with a slight abuse of language, we may also speak of “the RP of a realization”, meaning in fact “the RP of the event {Xbelongs to the dangerous regionDx^∗ identified by the given realizationx^∗}”. In a univariate framework, usually the assignment ofx^∗uniquely identifies the corresponding regionDx^∗; instead, this is not the case in a multivariate framework. Below we show how it is possible to associate a given multi-dimensional realizationx^∗∈R^dwith a dangerous regionDx^∗⊂R^d. First of all we need to introduce the following notion.

(4)

Definition 2. Given ad-dimensional distributionF=C(F₁, . . . ,F_d)andt∈(0,1), thecritical layerLt^F of leveltis defined as

Lt^F={x∈R^d:F(x) =t}. (4) Clearly,Lt^F is the iso-hyper-surface (having dimensiond−1) whereFequals the constant valuet: thus,Lt^Fis a (iso)line for bivariate distributions, a (iso)surface for trivariate ones, and so on. Evidently, for any givenx∈R^d, there exists auniquecritical layerLt^Fsupportingx: namely, the one identified by the levelt=F(x). Note that, thanks to Eq. (2), there exists a one-to-one correspondence between the two iso-hyper-surfacesLt^C={u∈I^d:C(u) =t}(pertaining toC inI^d) andLt^F(pertaining toFinR^d).

The critical layerLt^FpartitionsR^dinto three non-overlapping and exhaustive regions:

1. Rt^<={x∈R^d:F(x)<t};

2. Lt^F, the critical layer itself;

3. Rt^>={x∈R^d:F(x)>t}.

Practically, at any occurrence of the phenomenon, only three mutually exclusive things may happen: either a realization ofXlies inRt^<, or overLt^F, or inRt^>. Note that all these three regions are Borel sets.

Thanks to the above discussion, it is now clear that the following (multivariate) notion of RP is meaningful, and coincides with the one used in the univariate framework.

Definition 3. LetXbe a multivariate r.v. with distributionF=C(F₁, . . . ,Fd). Also, letLt^Fbe the critical layer supporting a realizationxofX(i.e.,t=F(x)). Then, the RP associated withxis defined as

1. for the regionRt^>

T_x^>=µ/P(X∈Rt^>), (5)

2. for the regionRt^<

T_x^<=µ/P(X∈Rt^<). (6) For instance, in hydrology,T_x^<may be of interest if droughts are investigated, while the study of floods may require the use ofT_x^>. In the sequel we shall concentrate only upon Rt^>: the corresponding formulas forRt^<could easily be derived. Now, in view of the results outlined in [Nelsen et al., 2001,Nelsen et al., 2003], it is immediate to show that

T_x^>= µ

1−K_C(t), (7)

whereK_Cis the Kendall’s distribution function associated withC. Clearly,T_x^>is a function of the critical leveltidentified by the relationt=F(x). It is then convenient to denote the above RP via a special notation as follows.

Definition 4. The quantityκx=T_x^>is called theKendall’s RPof the realizationxbelonging to Lt^F(hereinafter,KRP).

(5)

An advantage of the approach outlined in this work is that realizations lying over the same critical layer do always identify the same “dangerous” region (i.e.,Rt^>). In particular, all the realizationsyhaving a KRPκy<κxmust lie inRt^<, whereas all thoseyhaving a KRPκy>κx

must lie inRt^>, while the realizations lying overLt^Fshare the same KRPκx.

Traditionally, in the univariate framework, once a RP (say, T) is fixed (e.g., by design or regulation constraints), the corresponding critical probability level pis calculated as 1−p= P(X >xp) =µ/T, and by inverting the distributionFX ofX it is then immediate to obtain the critical quantilexp=F_X⁽⁻¹⁾(p), which is usually unique. Then,xpis used in practice for design purposes. As shown below, the same approach can also be adopted in a multivariate environment.

Definition 5. Given a d-dimensional distribution F=C(F₁, . . . ,Fd) with d-copula C, and a probability levelp∈I, theKendall’s quantile qp∈Iof order pis defined as

q_p=inf{t∈I:KC(t) =p}=K⁽⁻¹⁾_C (p), (8)

whereK⁽⁻¹⁾_C is the (generalized) inverse ofKC.

Definition5provides a close analogy with the definition of univariate quantile: indeed, recall thatK_Cis a univariate distribution function (see Eq. (3)), and henceqp is simply the quantile of orderpofKC. In turn, pis the probability measure induced byCon the regionRq^<_p, while (1−p)is the one ofRq^>_p. From a practical point of view this means that, in a large simulation ofnindependentd-dimensional vectors extracted fromF,nprealizations are expected to lie in Rq^<_p, and the others inRq^>_p. A heuristic procedure for estimatingqp(if it cannot be calculated analytically) is outlined in [Salvadori et al., 2011].

Remark1. It is worth stressing that a common error is to confuse the value of the copulaCwith the probability induced byConI^d (and, hence, onR^d): on the critical layerLq^C_p it isC=qp, but the corresponding regionRq^<_p has probability p=K_C(qp)≥q_p, sinceK_Cis usually non-linear (the same rationale holds for the regionRq^>p).

3. Approximation of K_C(bivariate case)

Given the discussion in the previous Sections, it is now clear the importance of Kendall’s distribution function within the multivariate RP theory. In this Section we present a constructive method for the approximation of a Kendall’s distribution in the bivariate case. Multivariate generalizations will be discussed later.

Here the fundamental point is as follows. Given a random sample ofmmultivariate observations {X₁, . . . ,Xm}extracted from a common joint distributionFwith copulaC, it is possible to estimate the corresponding values ofK_Cat any point inI(e.g., via the procedure outlined in [Barbe et al., 1996]). Then, the true (but unknown!)K_Ccan be approximated as shown below. As a consequence, the KRP’s of the events of interest can be estimated, without knowing the true copula model ruling the actual multivariate statistics.

(6)

3.1. Construction and features

LetTnbe a dyadic partition of ordern>0 of the unit intervalI, i.e.:

t_i= i

2ⁿ, i=0, . . . ,2ⁿ. (9)

Note that we use here a dyadic partition for the sake of simplicity only: actually, the present approach can easily be generalized to an arbitrary (finite) partition ofIincluding the points{0}

and{1}. LetYnbe the set of values

y_i=K(tˆ i), i=0, . . . ,2ⁿ, (10) where the ˆK(ti)’s are the values, or empirical estimates, of the (unknown) Kendall’s distribution functionKCassociated with an (unknown) copulaC. Clearly,y₀=0 andy₂ⁿ =1; also, in order to avoid troublesome cases — which would require anad hoctreatment — we only use thosey_i’s such thatyi>ti(see the admissibility discussion at the beginning of Section4.2, the constraint on Knbelow Eq. (16), and the note on the coefficientsc_i,n’s below Eq. (19)).

Now, the idea is to approximate the true K_C, whose values Yn are assumed to be known (estimated) at the pointsTn, with a continuous, piecewise linear, distribution functionKn, which depends upon the ordernof the partitionTn. Such a choice is a very natural one: on the one hand,Knis the simplest continuous polynomial, and its construction is easy and requires a least amount of parameters; on the other hand, the expression of the associated Archimedean generator (see below) is both analytically and computationally tractable.En passantwe note that, given the constraints stated above,Knturns out to be a proper Kendall’s distribution function.

In turn, we simply need to find suitable parametersa_i,n’s andb_i,n’s such that

Kn(t) =a_i,n+b_i,nt, t∈[ti−1,ti], (11) withi=1, . . . ,2ⁿ, and to solve the following system of equations:

(a_i,n+b_i,nt_i−1=y_i−1

a_i,n+b_i,nt_i=y_i . (12)

The solutions are

(a_i,n=y_i−i(yi−y_i−1)

b_i,n=2ⁿ(yi−y_i−1) , (13)

withi=1, . . . ,2ⁿ. Clearly,b_i,n≥0 for all indicesi’s: indeed,Knshould represent a distribution function overI— see Eq. (3), and hence it must be increasing. In addition,Knconverges pointwise a.e. toK_CinIas the ordernof the partitionTnand the sample sizemincrease.

Now, sinceKnis a (Kendall’s) distribution function, Theorem 4.3.4 in [Nelsen, 2006] states that

Kn(t) =t− γn(t)

γ_n⁰(t⁺), t∈(0,1), (14) where the functionγnis the inner generator of a suitable Archimedean bivariate copulaCn.

(7)

Remark2. We recall that a (strict) inner Archimedean generatorγnmust be positive, convex, and strictly decreasing onI, withγn(0) = +∞andγn(1) =0. In turn,γnis continuous inI.

In view of Eq. (14), it is then possible to calculateγn, and eventually to construct an Archi- medean copulaCnassociated withKn. Note that, in the present case,t−Kn(t)is continuos and negative by construction for anyn, and hence so is the ratioγn/γ_n⁰ in(0,1).

Remark3. The Archimedean copulaCnassociated withKncould be used to generate random samples having a Kendall’s distribution approximating the unknown one of the available data (i.e.,K_C). Clearly, the whole approach is expected to provide reasonable simulations in case the underlying observations exhibit some kind of Archimedean dependence (e.g., exchangeability, or, at least, a symmetric dependence structure). Recent tests to check the presence of these features can be found in [Jaworski, 2010,Bücher et al., 2012,Genest et al., 2012,Kojadinovic and Yan, 2012].

Practically, we need to solve the set of differential equations γi,n(t)

γ_i,n⁰ (t) =t−Kn(t) =−a_i,n+ (1−b_i,n)t, t∈[ti−1,ti], (15) in eachi-th interval of the partitionTn, withi=1, . . . ,2ⁿ. The corresponding solutions are

γi,n(t) =

(c_i,nexp(−t/a_i,n), ifb_i,n=1 c_i,n(| −a_i,n+ (1−b_i,n)t|)

1

1−bi,n, ifb_i,n6=1

, (16)

where thec_i,n’s are suitable positive constants (see below),t∈[ti−1,ti], andi=1, . . . ,2ⁿ. Using Eq. (15), and the constraintKn(t)>t(except fort=0 andt=1, whereKn(t) =t), it follows that the argument of the absolute value is strictly negative for allt∈[ti−1,ti]. As a consequence, the inner Archimedean generatorγnhas the following piecewise representation:

γ_i,n(t) =

(c_i,nexp(−t/a_i,n), ifb_i,n=1 c_i,n(a_i,n+ (b_i,n−1)t)

1

, (17)

witht∈[ti−1,ti]andi=1, . . . ,2ⁿ. In turn, the full expression ofγnis as follows:

γn(t) =

2ⁿ

∑

i=1

γi,n(t)1_[t_i−1_,t_i_](t). (18) The Lemmas given below show that γn features all the properties of a proper (strict) inner Archimedean generator.

Lemma 1. The functionγn, given by the piecewise representation (18), satisfies the boundary conditionsγn(0) = +∞andγn(1) =0.

Proof. Due to the constraint Kn(t)>t for t∈(0,1), one cannot have b_1,n=1 or b₂ⁿ_,n =1, otherwise the first or last segments of the piecewise linear function Kn would coincide with the diagonal of the unit square: more particularly, one must haveb_1,n>1 andb₂ⁿ_,n<1, since Kn(0) =0 and Kn(1) =1. In turn, the first and last portions ofγn necessarily only admit the power-law representation given by Eq. (17), and hence the boundary conditionsγn(0) = +∞and γn(1) =0 are trivially satisfied, beingKn(0) =a_1,n=0 andKn(1) =a₂ⁿ_,n+b₂ⁿ_,n=1.

(8)

According to Remark2, suitable conditions must be satisfied by the coefficientsc_i,n’s, for all indicesi’s, in order to yield a proper inner Archimedean generator: here we shall only deal with the non-trivial caseb_i,n6=1. First of all, we fix backwards thec_i,n’s in such a way thatγn, given by Eq. (18), be continuous inI, i.e.γi,n(ti) =γi+1,n(ti). This yields

c_i,n=c_i+1,n(ai+1,n+ (bi+1,n−1)t_i)

1 1−bi+1,n

(a_i,n+ (b_i,n−1)ti)

1 1−bi,n

(19) fori=1, . . . ,2ⁿ−1. Clearly,c_i,n>0 for all indicesi’s, sinceKn(ti)>ti. Note that it is possible to choose any arbitrary starting valuec₂ⁿ_,n>0, since the inner generator of an Archimedean copula is uniquely defined up to a positive multiplicative constant. The following Lemmas show thatγn

is strictly decreasing and convex overI.

Lemma 2. The function γn, given by the piecewise representation (18), has a continuous and strictly negative first derivative overI, and hence is strictly decreasing.

Proof. A simple calculation shows that

γ_n⁰(t) =γ_i,n⁰ (t) =







−^c_a^i,n

i,n exp(−t/a_i,n), ifb_i,n=1

−c_i,n(ai,n+ (bi,n−1)t)

bi,n

, (20)

witht∈[ti−1,ti]andi=1, . . . ,2ⁿ. Thus,γ_n⁰ is strictly negative, and by using Eq. (19) it is immediate to show thatγ_n⁰ is also continuous, i.e.γ_i,n⁰ (ti) =γ_i+1,n⁰ (ti).

Lemma 3. The functionγn, given by the piecewise representation (18), is convex overI.

Proof. In order to show the convexity ofγnit is enough to prove that

γ_n⁰(y)/γ_n⁰(x)≤1 (21)

for allx≤yinI, i.e.γ_n⁰ is a non-decreasing function. We first note thatγ_n⁰ is continuous and strictly negative, as shown in Lemma2, and that all the piecewise componentsγi,n’s are convex. Then, we proceed by induction over the partitionTn.

First consider the componentsγ_1,n⁰ andγ_2,n⁰ defined over, respectively,I₁= [t0,t1]andI₂= [t1,t2].

If x and y both belong either to I₁ or I₂, then the inequality (21) trivially holds, being γ_1,n⁰ andγ_2,n⁰ both convex. Thus, lett₀≤x<t₁<y≤t₂. In turn, sinceγ_n⁰ is continuous int₁, then γ_1,n⁰ (t1) =γ_2,n⁰ (t1), and hence

γ_2,n⁰ (y)

γ_1,n⁰ (x) = γ_2,n⁰ (y)

γ_2,n⁰ (t1)·γ_1,n⁰ (t₁) γ_1,n⁰ (x) ≤1,

beingγ_1,n⁰ andγ_2,n⁰ both convex. Thus,γnis convex in[t₀,t₂]. By the same token, we can then prove thatγnis convex in[t0,t3], in[t0,t4], and so on, till the end pointt₂ⁿ of the unit intervalI.

Remark4. An alternative proof can be given by exploiting the (equivalent) integral representation ofγngiven in [Genest and Rivest, 1993]:

γn(t) =exp _Z _t

x₀

1 x−Kn(x)dx

,

(9)

wheret∈Iandx₀∈(0,1)is arbitrary. Given the particular expression ofKn, it is easy to show that the inequality (21) holds for allx≤yinI(however, the tedious calculations are not shown).

Now, given the fact thatγnis a proper inner Archimedean generator, we can provide the explicit expression of the corresponding Archimedean copulaCnvia the standard formula

Cn(u,v) =γ_n⁻¹(γn(u) +γn(v)), (22) where(u,v)∈I². Clearly,CnhasKnas its Kendall’s distribution function. A further feature of Cnis as follows.

Proposition 1. TheCn-measure of the level curves ofCnis zero.

Proof. For anyt∈(0,1), thet-level curve ofCnis defined via the relationγn(u) +γn(v) =γn(t).

According to Theorem 4.3.3 in [Nelsen, 2006], the correspondingCn-measure is given by γn(t)

γ_n⁰(t⁻)− γn(t) γ_n⁰(t⁺).

Since the ratioγn/γ_n⁰ is continuos in(0,1)by construction, the claim follows.

As is well known [Nelsen et al., 2003], Kendall’s distribution function induces an equivalence relation on the set of copulas as follows.

Definition 6. LetC₁andC₂be two copulas with respective Kendall’s distribution functionsK₁ andK₂. Then,C₁is in relation withC₂, and we writeC₁≡_KC₂, if and only ifK₁(t) =K₂(t)for allt∈I.

As a consequence, there is an entire class of copulas sharing the same Kendall’s distribution function. In turn, the Archimedean copula Cn given by Eq. (22) may play the role as of the

“representative member” of the≡_K_n-equivalence class. Actually, in view of the results shown in [Nelsen et al., 2009,Genest et al., 2011a,Genest et al., 2011b], there is exactly one Archimedean copula in each equivalence class (however, it should be noted that such a uniqueness result is only proven for dimensions two and three).

The last point to be investigated is the convergence ofCnto a proper copula ˜C: here we use the following result shown in [Charpentier and Segers, 2008].

Proposition 2. Letγ_n⁰ andγ˜⁰be the right-hand derivatives of, respectively, the inner Archimedean generatorsγnandγ. Then, set˜

λn(t) =γn(t)

γ_n⁰(t), λ˜(t) = γ(t)˜

γ˜⁰(t), Kn(t) =t−λn(t), K(t) =˜ t−λ(t),

whereKnandK˜ are, respectively, Kendall’s distribution functions of the Archimedean copulas CnandC. Then, the following five conditions are equivalent:˜

1. limn→∞Cn(u,v) =C(u,v)˜ for all(u,v)∈I². 2. limn→∞γn(u)

γ_n⁰(v)=_γ^γ(u)_˜^˜0(v)for every u∈(0,1]and v∈(0,1)such thatγ˜⁰is continuous in v.

3. limn→∞λn(u) =λ˜(u)for every u∈(0,1)such thatλ˜(u)is continuous in u.

4. There exist positive constants k_nsuch thatlimn→∞k_nγn(u) =γ(u)˜ for all u∈I.

(10)

5. limn→∞Kn(t) =K(t)˜ for every t∈(0,1)such thatK˜ is continuous in t.

In our case, the last claim follows directly from the pointwise convergence ofKntoK_C. In turn, Proposition2guarantees the pointwise convergence of the sequence of copulasCnto the Archimedean copula ˜C, havingK_Cas its Kendall’s distribution function. As a consequence, ˜C may play the role as of the (Archimedean) “representative member” of the equivalence class of copulas sharing the originalKCKendall’s distribution function, which obviously includes the (unknown) copulaC.

3.2. Global and local simulation

Besides the explicit construction of the approximating copulaCn, the crucial point in applications is the possibility to generate random samples having such a dependence structure, yielding the corresponding Kendall’s distribution function Kn of interest here. Since Cn is Archimedean, several algorithms are available: here we use a well known one, providing a global simulation.

Algorithm 1. (See Exercise 4.15 in [Nelsen, 2006])

1. Generate two independent variatessandtUniform on(0,1).

2. Setw=γn(K⁻¹_n (t)).

3. Setu=γ_n⁻¹(s w)andv=γ_n⁻¹((1−s)w).

4. The desired pair is then(u,v).

Instead, in order to provide “local” scenarios (e.g., to simulate on a given critical layer of interest), here we adapt an algorithm based on the Conditional Inverse Method for exchangeable Archimedean copulas (see Algorithm 3.8 in [Brechmann, 2012]).

Algorithm 2. Letp∈(0,1)be a fixed probability level.

1. Calculate Kendall’s quantileq_p=K⁻¹_n (p)(see Definition5).

2. Setw=γn(qp).

3. Generate a variatesUniform on(0,1).

4. Setu=γ_n⁻¹(s w)andv=γ_n⁻¹((1−s)w).

5. The desired pair is then(u,v).

Essentially, the difference between the two procedures given above is that in Algorithm1two independent variatessandtare needed, whereas in Algorithm2only one is used: this is obvious, since in the latter case the pair(u,v)is constrained to lie on the critical layer (onedimensional isoline)Lq^C_pⁿ. Clearly, the realization generated by Algorithm2has a KRP equal toµ/(1−p).

4. A simulation study (bivariate case)

In order to test the techniques outlined in the previous Sections we adopt a simulation approach:

first, we generate random samples from known copulas, and then we check how the corresponding Kendall’s distribution functions (and, hence, the estimates of the return periods) are approximated by the techniques outlined above. Below we describe the testing strategy.

(11)

4.1. Design of the testing procedure

We briefly outline here the steps of the testing procedure adopted in this work.

1. A bivariate copulaC(with suitable parameterθ) is chosen among five well known families:

Gumbel, Frank, Clayton, Gaussian, and Cuadras-Augé. This gives the possibility to test the performance of the approximating model against different structures of dependence, whose features are briefly summarized below [Nelsen, 2006,Salvadori et al., 2007]. Note that the range of the parameters may also include the limiting cases corresponding to the 2-copulasW₂(the Fréchet-Hoeffding lower bound),Π₂(the Product copula), orM₂(the Fréchet-Hoeffding upper bound), with Kendall’sτequal to−1, 0, and+1, respectively.

Gumbel. This family is Archimedean and Extreme Value, with Kendall’s τ ranging in [0,1], is absolutely continuous, and has upper tail dependence. The expressions of the copula and the corresponding generator are as follows:

C_θ(u,v) =exp

−h

(−lnu)^θ+ (−lnv)^θi_θ¹

, (23)

withθ∈[1,+∞), and

γ_θ(t) = (−lnt)^θ. (24)

Frank. This family is Archimedean, with Kendall’sτ ranging in [−1,1], is absolutely continuous, and has no tail dependence. The expressions of the copula and the corresponding generator are as follows:

Cθ(u,v) =−1 θln

1+(e^−θu−1)(e^−θv−1) e^−θ−1

, (25)

withθ∈R, and

γ_θ(t) =−lne^−θ^t−1

e^−θ−1. (26)

Clayton. This family is Archimedean, with Kendall’sτ ranging in[−1,1], is absolutely continuous, and has lower tail dependence. The expressions of the copula and the corresponding generator are as follows:

C_θ(u,v) =h max

u^−θ+v^−θ−1,0i−¹

θ, (27)

withθ∈[−1,+∞), and

γ_θ(t) = 1 θ

t^−θ−1

. (28)

Gaussian. This family is not Archimedean, with Kendall’sτ ranging in[−1,1], is absolutely continuous, and has no tail dependence The expression of the copula is as follows:

C_θ(u,v) = 1 2π√

1−θ²

Z _Φ⁻¹(u)

−∞

Z _Φ⁻¹(v)

−∞ exp

−s²−2θst+t² 2(1−θ²)

ds dt, (29) withθ∈[−1,+1], whereΦ⁻¹(·)denotes the inverse of the Normal distribution.

(12)

0 0.2 0.4 0.6 0.8 1 0

0.2 0.4 0.6 0.8 1

U

V

Simulated sample (Ref. Cop.)

FIGURE1. Example of a sample of size m=50extracted from the “reference” Gumbel copula — see text. Also shown are the isolines of the Gumbel copula, for the selected levels t=0.1,0.2, . . . ,0.9.

Cuadras-Augé. This family is not Archimedean, with Kendall’sτ ranging in[0,1], has both an absolutely continuous and a singular component, and has upper tail dependence. The expression of the copula is as follows:

C_θ(u,v) = (uv)^θ min{u^1−θ,v^θ}, (30) withθ∈[0,1].

2. Three different values of the copula parameterθare selected, in order to yield the following values of Kendall’sτ associated withC:τ₁=0.25 (forθ=θ₁),τ₂=0.5 (forθ=θ₂), and τ₃=0.75 (forθ=θ₃). This gives the possibility to test the robustness of the approximating procedure against different levels of association / concordance.

3. Three different ordersnof the partitionTnare selected:n₁=3,n₂=4, andn₃=5. Thus, T1 contains 9 points,T2 contains 17 points, andT3 contains 33 points. This gives the possibility to check the role played by the order of the partition in terms of “goodness” of approximation.

4. In order to carry out the tests, it is necessary to use a sample{X₁, . . . ,Xm}of bivariate observations. Three different values of the sample sizemare selected:m₁=50,m₂=500, andm₃=5000. In particular,m₁ is chosen in order to represent the typical size of the samples frequently found, e.g., in hydrological or environmental applications (i.e., small sizes). Instead,m₃should provide a statistically significant sample size, i.e. large enough

(13)

0 0.2 0.4 0.6 0.8 1 0

0.2 0.4 0.6 0.8 1

t

K

Kendall’s measure

Gumbel Empirical Bound Approx.

Inv. Appr.

FIGURE2. Comparison between Kendall’s distribution functionK_Cof the “reference” Gumbel copula, its empirical estimateK, and the corresponding approximationˆ Kn. Also shown are the lower boundKM₂, as well as the inverse functionK⁻¹_n . Here the random sample used to calculateKˆ andK_nis the one shown in Figure1, and the dyadic partitionTnis of order n=4.

to draw reasonable conclusions from a statistical point of view. The value chosen form₂ represents an intermediate case betweenm₁andm₃, and is used to fully assess the behavior of the techniques proposed in this work.

5. The performance of the approach outlined in this work will be investigated via simulation techniques: thus, for each combination(Ci,θj,n_k,ml), N=1000 independent bivariate random samples of sizeml are extracted from the “reference” copulaCi with parameter θj, and analyzed by using a partition of ordernk. The value N=1000 is large enough to construct reliable empirical confidence intervals for the quantities of interest here. As an illustration, in Figure1we show an example of (starting) sample extracted from the

“reference” Gumbel copula, with parameterθ=2 andτ=0.5: hereinafter, this copula will be used to illustrate the results.

4.2. Details of the procedure

For each combination(Ci,θj,nk,ml), and for each of theN independent samples of sizeml

extracted from the “reference” copulaCiwith parameterθj, the following steps are carried out by using a partition of ordern_k. Without loss of generality, here we fix the average inter-arrival time of the realizations asµ=1 year (see Section2). Thus, the temporal unit of the Return Periods calculated in the sequel will be in years.

1. An empirical estimate ˆKof the trueK_Cis calculated at the abscissas of the partitionTn,

(14)

0 0.2 0.4 0.6 0.8 1

−0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

t

a

Coefficients ’’a’’

0 0.2 0.4 0.6 0.8 1

0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

t

b

Coefficients ’’b’’

0 0.2 0.4 0.6 0.8 1

−2 0 2 4 6 8 10 12

t Log10 c

Coefficients ’’c’’

0 0.2 0.4 0.6 0.8 1

0 1 2 3 4 5 6 7 8

t

γ

Approximating Archimedean Generator

Gumbel Approx.

FIGURE3. Estimates of the coefficients ai,n’s, bi,n’s, and ci,n’s by using the random sample shown in Figure1:

here the order of the dyadic partitionTnis n=4. Also shown is the Archimedean generatorγnassociated with the approximating functionKnshown in Figure2, as well as (a version of) the one of the “reference” Gumbel copula.

generating the set of valuesYn. Since the estimating procedure is not “intrinsic”, some of the estimates ˆK’s may be smaller than the lower boundKM₂(t) =tat some abscissas ti’s. In turn, these latter are discarded, and only the consistent estimates are eventually used. As an illustration, in Figure2we show a comparison between Kendall’s distribution functionK_Cof the “reference” Gumbel copula, and its empirical estimate ˆK. In particular, a non-admissible empirical estimatey₁₅is well visible in the right upper corner of Figure2 at the abscissat₁₅=15/16: once discarded, the approximating functionKnsimply joins the closest admissible valuesy₁₄andy₁₆.

Note that the arbitrary dismissal of someyi’s may yield a biased estimate ofK_C, whose precise features are difficult to quantify. However, on the one hand, we believe thata priori it makes little sense to construct a possibly non-consistent approximationKn; on the other hand, the regularity (smoothness) of the approximating functionKngenerally improves by discarding the non-admissibleyi’s, as shown in Figure2: this may yield random simulations extracted fromCnless prone to show bizarre (practically unreliable) structures — see later the discussion of the features of Figure4.

(15)

0 0.2 0.4 0.6 0.8 1 0

0.2 0.4 0.6 0.8 1

U

V

Simulated sample (App. Cop.)

FIGURE4. Simulated sample of size 10,000 extracted from the approximating copulaC_nconstructed using the Archimedean generatorγnshown in Figure3. Also shown are the isolines of the “reference” Gumbel copula, for the selected levels t=0.1,0.2, . . . ,0.9.

2. The sequences of the relevant parameters a_i,n’s, b_i,n’s, and c_i,n’s are calculated. As an illustration, in Figure3we show the estimates of these coefficients, by using the sample shown in Figure 1. It is worth noting how the calculation of thec_i,n’s is a numerically ill-conditioned problem: in fact, the estimates range over quite a few orders of magnitude, with local abrupt jumps. In addition, some numerical care is needed when fixing theb_i,n’s:

in fact, ifb_i,n≈1 (within a software-dependent numerical tolerance), then the power-law term in Eq. (17) may “explode”, and the corresponding exponential expression has to be used instead — here we adopt the numerical criterion|b_i,n−1|<0.03. It is also interesting to note that, essentially due to a limited sample size, the empirical estimate ˆKof Kendall’s distribution functionK_Cmay sometimes be constant at two (or more) successive abscissas ti’s: this is well evident in Figure2, at the abscissast₇–t₈andt₁₂–t₁₃, and is due to a lack of sample observations between the isolines of levels 0.4–0.5 and 0.7–0.8 shown in Figure 1. Then, in order to improve the usability ofKn (see later the discussion of the features of Figure4), these pathological values, yielding a flat approximating distribution function Knwith negligible density, are discarded by simply joining the closest valuesy₇–y9and y₁₂–y₁₄, as shown in Figure2.

In turn, the approximating functionKn is constructed, yielding the generatorγnand the corresponding Archimedean copulaCn via Eq. (22). As an illustration, in Figure 2we show a comparison between Kendall’s distribution functionK_Cof the “reference” Gumbel

(16)

copula, and the corresponding approximationKn: overall, the interpolation looks empiri- cally acceptable. Also, in Figure3we show the approximating Archimedean generatorγn

associated withKn, as well as (a version of) the one of the “reference” Gumbel copula: a direct comparison is meaningless, since Archimedean generators are uniquely defined up to a multiplicative constant.

3. The KRP’sT=10,20,50,100,200,500,1000 years, of interest in practical applications, are selected, and the corresponding probability levelsp=1−µ/T and Kendall’s quantilesq_p’s are calculated (by inverting Eq. (11)). Then, by inverting theNapproximating functions Kn’s (calculated over all the N simulations mentioned above), estimates of the mean values of Kendall’s quantilesq_p’s are computed, as well as suitable corresponding empirical Confidence Intervals. In fact, here the target is to investigate how good (or bad) the approach outlined in this work may approximate the “reference” Kendall’s distributions. In particular, we check to what extent the approximation is correct (i.e., unbiased).

As an illustration, considering the case of the “reference” Gumbel copula, in Figure5we plot the estimates of the mean values of Kendall’s quantilesq_p’s, and approximate 90%

Confidence Intervals, for all the values of the parametersmandnused in this work. The estimates should be compared with the exact values of Kendall’s quantiles calculated by using the “reference” Kendall’s distributions plotted in the same pictures.

4. In order to test whether the approximating dependence structureCnmay generate samples having the required KRP’s, we calculate the percentages of simulated pairs “below” each critical layer Lq^C_pⁿ (see Definition 2, and also the discussion ensuing Definition 5) for the KRP’sT =10,20,50,100,200,500,1000 years mentioned above, and compare these estimates with the expected values given byp=1−µ/T.

For this purpose, a large simulation of size 10,000 is generated from the approximating copulaCn via Algorithm1: see Figure4for an illustration. It is interesting to note the very particular structure of the simulated pairs, as well as the link with the corresponding approximating functionKnshown in Figure2. Evidently, the density of points in the unit square is proportional to the density ofKngiven byK⁰_n(t) =dKn(t)/dt =b_i,n, for some suitable indexidepending upont. The plot of the coefficientsb_i,n’s shown in Figure3 provides the graph ofK⁰_n: in fact, theb_i,n’s are simply the slopes of the linear segments formingKn. In turn, intervals inIwhereKnis particularly flat (or, equivalently,K⁰_nis small) correspond to “circular strips” inI²where only a few pairs can be generated, coherently with the probabilistic meaning of Kendall’s distribution function (see Eq. (3)): a small value ofb_i,nin thei-th interval means that it is unlikely that random realizations could be simulated in the corresponding circular strip.

4.3. Analysis of the results

In this Section we briefly discuss the results of the tests mentioned in the previous Sections. For three of the families of copulas considered here (namely, the Gumbel, the Gaussian, and the Cuadras-Augé), in Tables1–3below we report the quantities

∆=100·q_p−qp

qp

, (31)

(17)

0.750.80.850.90.951

0.9

0.925

0.950.975

1 Critical quantile

Kendall’s mesure

m=50; n=3 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

% below Critical Layer

m=50; n=3 0.750.80.850.90.951

0.9

0.925

0.950.975

m=50; n=4 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=50; n=4 0.750.80.850.90.951

0.9

0.925

0.950.975

m=50; n=5 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=50; n=5 0.80.850.90.951

0.9

0.925

0.950.975

m=500; n=3 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=500; n=3 0.80.850.90.951

0.9

0.925

0.950.975

m=500; n=4 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=500; n=4 0.80.850.90.951

0.9

0.925

0.950.975

m=500; n=5 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=500; n=5 0.850.90.951

0.9

0.925

0.950.975

m=5000; n=3 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=5000; n=3 0.80.850.90.951

0.9

0.925

0.950.975

m=5000; n=4 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=5000; n=4 0.80.850.90.951

0.9

0.925

0.950.975

m=5000; n=5 101102103

90

91

92

93

94

95

96

97

98

99100 KRP (years)

m=5000; n=5 FIGURE5.EstimatesofthemeanvaluesofKendall’squantilesqp’s,andprobabilitiesofthesub-criticalregions,forallthevaluesoftheparametersmandnusedinthis work:herethecaseofthe“reference”Gumbelcopulaisconsidered.(Oddcolumns)MeanvaluesofKendall’squantilesqp’s(markers)—seealsoTable1:theapproximate horizontalConfidenceIntervalshavelevel90%.(Evencolumns)Exactprobabilitiesofthesub-criticalregions,fortheKRP’sT=10,20,50,100,200,500,1000years (circles),andcorrespondingestimates(asterisks):theapproximateverticalConfidenceIntervalshavelevel90%.