Estimating the parameters of a seasonal Markov-modulated Poisson process


HAL Id: hal-00965279

https://hal.archives-ouvertes.fr/hal-00965279v2

Submitted on 24 Feb 2017

Armelle Guillou, Stéphane Loisel, Gilles Stupfler

To cite this version:

Armelle Guillou, Stéphane Loisel, Gilles Stupfler. Estimating the parameters of a seasonal Markov-modulated Poisson process. Statistical Methodology, Elsevier, 2015, 26, pp.103-123. ⟨hal-00965279v2⟩

Armelle Guillou (1), Stéphane Loisel (2) & Gilles Stupfler (3)

(1) Université de Strasbourg & CNRS, IRMA, UMR 7501, 7 rue René Descartes, 67084 Strasbourg Cedex, France

(2) Université Lyon 1, Institut de Science Financière et d’Assurances, 50 avenue Tony Garnier, 69007 Lyon, France

(3) Aix Marseille Université, CNRS, EHESS, Centrale Marseille, GREQAM UMR 7316, 13002 Marseille, France

Abstract. Motivated by seasonality and regime-switching features of some insurance claim counting processes, we study the statistical analysis of a Markov-modulated Poisson process featuring seasonality. We prove the strong consistency and the asymptotic normality of a maximum split-time likelihood estimator of the parameters of this model, and present an algorithm to compute it in practice. The method is illustrated on a small simulation study and a real data analysis.

Keywords: Markov-modulated Poisson process, seasonality, split-time likelihood, strong consistency, asymptotic normality.

MSC 2010 Subject Classifications: Primary 62M05, 62F12; Secondary 60F05, 60F15.

1 Introduction

It is often the case that the insurance claim frequency is affected by environment variables. For instance, flood risk is higher in a period of frequent heavy rains, and fire risk is more intense when the weather is particularly dry. Such environment variables may be hidden to some extent from the practitioner: for instance, it is now accepted that the probabilities of severe floods in Australia, strong snowstorms in North America or hurricanes on the East Coast of the United States increase during La Niña episodes (see Neumann et al. [17], Cole and Pfaff [5], Parisi and Lund [20] and Landreneau [10]). This is now taken seriously by most reinsurers as well as Lloyd’s and the UK Met Office [13]. However, observing and understanding the role of those variables is not easy, which makes it realistic to treat these variables as unobserved for the time being.

To take such a dependency into account, one may for instance assume that the underlying environment process is a Markov process J in continuous time and that in each state of J, the claim counting process N is a Poisson process. The resulting bivariate process (J, N) is then called a Markov-Modulated Poisson Process (MMPP). MMPPs have been used in different fields during the past forty years, in particular in data traffic systems (see e.g. Salvador et al. [27]), for ATM sources (see Kesidis et al. [9]), in manufacturing systems (see e.g. Ching et al. [4]) or even in ecology (see for example Skaug [29] for applications of MMPPs to clustered line transect data). The MMPP cookbook by Fischer and Meier-Hellstern [6] sums up the main results and ideas that were behind the rise of MMPP applications. The idea of considering Markov modulation in insurance was first introduced by Asmussen [2]; the resulting model can capture the fact that the insurance claim frequency may change when climatic, political or economic factors change. Such a model has gained considerable attention recently: see for instance Lu and Li [15], Ng and Yang [18], Zhu and Yang [33] and Wei et al. [32]. The parameters of an MMPP are often estimated using a Maximum Likelihood Estimator (MLE), whose consistency was proved in Rydén [22]. Various methods have been suggested to compute the MLE; a standard tool is the Expectation-Maximization (EM) algorithm, see Rydén [25] for the implementation of this procedure for the estimation of the parameters of an MMPP. We finally mention that in a recent paper, Guillou et al. [7] introduced a new MMPP-driven loss process in insurance with several lines of business, showed the strong consistency of the MLE and fitted their model to real sets of insurance data using an adaptation of the EM algorithm.

Of course, once the MMPP model is fitted, it is possible to use Bayesian techniques to determine the probabilities of being in each state and consequently the average number of events during the next period (see e.g. Scott [28]). If external information is present, then it is possible to enrich this Bayesian estimation. This is for example the case for some long-tailed non-life insurance businesses, where indices of sectorial inflation can provide useful information. For reinsurance cycles, large claims that may cause a cycle phase change as well as other aspects of competition or adverse development of reserves can sometimes be plugged into the Bayesian estimation process. For some other risks however, even if we feel that a phenomenon might have an impact on the claim frequency, it may be very hard to come up with a measure of such a phenomenon (for instance, the El Niño-La Niña phenomena). In that case our non-Bayesian framework is of interest for actuarial risk assessment.

Furthermore, many practical applications in insurance display some sort of seasonal variation. For example, thefts in garages are more frequent before Christmas as people tend to store Christmas gifts there, fire risk is more intense in the summer, and hurricanes occur mostly between June and November on the East Coast of the United States. These random, cyclic factors and their impact on insurance risk, which need to be taken into account to carry out a proper regime-switching analysis, are yet to be understood and forecast. In an inhomogeneous context with deterministic intensity function, Lu and Garrido [14] have fitted doubly periodic Poisson intensity rates to hurricane data, for particular parametric forms (like double-beta and sine-beta intensities). Helmers et al. [8] have provided an in-depth theoretical statistical analysis of such doubly periodic intensities. We aim at carrying out a theoretical statistical analysis in a stochastic intensity framework with seasonality.

An important aspect of pricing in non-life insurance concerns segmentation: thanks to generalized linear models or more sophisticated techniques, the insurer takes into account explanatory variables to adapt the price of the contract and avoid adverse selection (see Ohlsson and Johansson [19]). Besides, individual ratemaking is updated thanks to credibility adjustments in order to take into account the claim history of each policyholder or contract (see Bühlmann and Gisler [3]). Our approach is only operational and interesting at the aggregated risk management level, and it would be very challenging to combine it with regression techniques, from a theoretical as well as a practical point of view, since a very large number of data points would be needed to ensure that the estimation is reasonably accurate. The only simple way to combine the two approaches would be to assume that the seasonal and Markovian intensities are multiplied for each class of contract by a number that corresponds to the risk of each class and depends neither on the season nor on the state of the environment. In other words, the ratio between claim intensities for classes A and B, say, should remain constant over time. This is of course a strong limitation; in wind-related risk for instance, some buildings are more subject to damage coming from wind effects than others, and consequently the rise of their risk should be sharper during windy episodes.

In this paper, we thus consider an MMPP featuring seasonality, and study estimation issues for this process when the environment process is unobserved. A further motivation for the introduction of this type of process appears in the Solvency II insurance regulation framework. In this framework, it is required to carry out an ORSA (Own Risk and Solvency Assessment). The ORSA has both qualitative and quantitative components, including the continuous compliance requirement: the insurer must be able to provide evidence that he/she satisfies solvency requirements at all times (see Vedani and Ramaharobandro [30]). This leads insurers to define confidence regions or to use a continuous time model. The MMPP with seasonality is one interesting potential tool for this new challenge for some non-life insurance companies. A problem with this type of process, however, is that, contrary to the case without seasonality, the random sequence of the inter-event times is not ergodic. Studying the asymptotic properties of the MLE, as Rydén [22] and Guillou et al. [7] do, is thus very difficult; furthermore, the particular structure of the likelihood makes it hard to compute the MLE in practice. To tackle this issue, we borrow an idea of Rydén [23,24] and we introduce a Split-Time Likelihood (STL). The logarithm of this quantity is shown to be a sum of ergodic random variables; maximizing the STL then yields a Maximum Split-Time Likelihood Estimator (MSTLE) whose consistency and asymptotic normality can be proven using regenerative theory.

The outline of the paper is as follows. We give a precise definition of our model in Section 2, where we also introduce our estimator. The asymptotic properties are studied in Section 3. We explain how to implement our estimation technique in practice in Section 4, where we illustrate the performance of our estimator on a small simulation study and a real data analysis. Our paper ends with a conclusion and discussion of the results in Section 5. Technical lemmas are given and proven in the appendix.

2 Our model and estimation procedure

2.1 Our model

We consider an irreducible continuous-time Markov process $J$ with generator $L$ on the state space $\{1, \ldots, r\}$, where $r \ge 2$. Consider further a counting process $N$ and real numbers $\tau_0 = 0 < \tau_1 < \cdots < \tau_{k-1} < 1 = \tau_k$ such that for every $q \in \mathbb{N}$, on the interval $[q + \tau_{s-1}, q + \tau_s)$, if $J$ is in state $i$, then $N$ is a Poisson process with jump rate $\lambda_i^{(s)}$, where $\lambda_i^{(s)} \ge 0$. The time interval $[q + \tau_{s-1}, q + \tau_s)$ represents season $s$ of the period $q + 1$. The case $k = 1$, in which there is actually no seasonality, is just the standard MMPP, which was considered from a statistical point of view by Rydén [22]; in this paper, we shall focus on the case $k \ge 2$. The jump intensity of the counting process $N$ is thus modulated not only by the random switches of the Markov process $J$ but also by switches from one season to another, which happen at nonrandom times. The context of our work is the following: let us assume that the process $N$ has been observed until time $n \in \mathbb{N} \setminus \{0\}$, so that the available data consists of

1. the number r of states of J and the times τ1, . . . , τk−1,

2. the full knowledge of the process N between time 0 and time n.
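To make the observation scheme concrete, the following minimal sketch simulates the jump times of N over n periods. It is an illustration only: the function name `simulate_seasonal_mmpp`, the 0-based indexing of states and seasons, and the choice of a fixed initial state `j0` (rather than a draw from the stationary distribution of J) are assumptions made for this example.

```python
import random

def simulate_seasonal_mmpp(L, lam, tau, n, rng, j0=0):
    """Simulate (J, N) on [0, n) and return the sorted jump times of N.

    L   : r x r generator of J (rows sum to 0), states indexed 0..r-1
    lam : lam[s][i] = jump rate of N in season s (0-based) and state i
    tau : season boundaries [0, tau_1, ..., 1]
    """
    k = len(tau) - 1
    # Every instant at which the season changes, over the n periods.
    bounds = [q + t for q in range(n) for t in tau[:-1]] + [float(n)]
    events, t, j = [], 0.0, j0
    next_switch = t + rng.expovariate(-L[j][j])
    for b in range(len(bounds) - 1):
        s, end = b % k, bounds[b + 1]      # current season and its end
        while t < end:
            seg_end = min(end, next_switch)
            # Homogeneous Poisson events on [t, seg_end) at rate lam[s][j].
            rate, u = lam[s][j], t
            while rate > 0:
                u += rng.expovariate(rate)
                if u >= seg_end:
                    break
                events.append(u)
            t = seg_end
            if t == next_switch:
                # Jump of J: next state chosen proportionally to L[j][m].
                weights = [L[j][m] if m != j else 0.0 for m in range(len(L))]
                j = rng.choices(range(len(L)), weights=weights)[0]
                next_switch = t + rng.expovariate(-L[j][j])
    return events
```

Restarting the exponential clock of N at each season boundary and at each jump of J is legitimate here because, between two such instants, N is a homogeneous Poisson process and is therefore memoryless.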

The goal is to estimate the unknown parameters of the model, namely the elements $\ell_{ij}$ of the transition intensity matrix $L$ of $J$ and the jump intensities $\lambda_i^{(s)}$ of $N$. Since the modulating Markov process $J$ is not observed, estimating the parameters of this model is not straightforward. For brevity, we let $\Phi$ be the global parameter of the model: $\Phi$ consists of the values $\ell_{ij}$, for $1 \le i, j \le r$ and $i \ne j$, and of the $\lambda_i^{(s)}$, $1 \le i \le r$ and $1 \le s \le k$. The distribution of the process with parameter $\Phi$ is denoted by $P_\Phi$ and we let $\mathcal{E}$ be the parameter space

$$\mathcal{E} = \left\{ \Phi \;\middle|\; L(\Phi) \text{ is irreducible and } \min_{1 \le i \le r} \min_{1 \le s \le k} \lambda_i^{(s)}(\Phi) > 0 \right\}.$$

The space $\mathcal{E}$ can be seen as the set of those parameters for which in any state of $J$ and in any season, an event, namely a jump of $N$, can occur with positive probability, while the irreducibility assumption makes sure that all states of $J$ are visited infinitely often and thus […] $|\mathcal{E}| = r(r-1) + rk$ parameters. Finally, define, for $1 \le s \le k$ and $0 \le y \le \tau_s - \tau_{s-1}$, the matrix-valued functions

$$f(y, s, \Phi) = \exp\big(y(L(\Phi) - \Lambda^{(s)}(\Phi))\big)\,\Lambda^{(s)}(\Phi) \quad \text{and} \quad F(y, s, \Phi) = \exp\big(y(L(\Phi) - \Lambda^{(s)}(\Phi))\big)$$

where $\Lambda^{(s)}(\Phi) = \mathrm{diag}(\lambda_1^{(s)}(\Phi), \ldots, \lambda_r^{(s)}(\Phi))$. The functions $f$ and $F$, which will be instrumental in defining our procedure, have a nice probabilistic interpretation (see Meier-Hellstern [16]): for any $i, j \in \{1, \ldots, r\}$,

$$f_{ij}(y, s, \Phi)\,dy := P(T_1^{(s)} \in dy,\ J(y) = j \mid J(0) = i)$$

and

$$F_{ij}(y, s, \Phi) := P(T_1^{(s)} > y,\ J(y) = j \mid J(0) = i),$$

where $T_1^{(s)}$ is the time of the first event of an MMPP whose Markov-modulating process is $J$, with jump intensities given by $\Lambda^{(s)}$. The functions $f(\cdot, s, \Phi)$ and $F(\cdot, s, \Phi)$ can thus be seen as the probability density and survival functions of the inter-event times for given states of $J$ in season $s$, respectively.
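As a quick check of this interpretation, the following worked identities follow directly from the two definitions; they are added here as an illustration, with $1_r$ denoting the column vector of size $r$ with all entries equal to 1 and $I_r$ the identity matrix. They only use the fact that the rows of a generator sum to zero, that is $L(\Phi) 1_r = 0$.

```latex
F(0, s, \Phi) = I_r, \qquad f(0, s, \Phi) = \Lambda^{(s)}(\Phi),
\\[4pt]
\frac{\partial}{\partial y} F(y, s, \Phi)\, 1_r
  = \exp\!\big(y(L(\Phi) - \Lambda^{(s)}(\Phi))\big)\,
    \big(L(\Phi) - \Lambda^{(s)}(\Phi)\big)\, 1_r
  = - f(y, s, \Phi)\, 1_r,
\\[4pt]
F(y, s, \Phi)\, 1_r + \int_0^y f(u, s, \Phi)\, 1_r \, du = 1_r .
```

The last line is the usual relation between a survival function and a density: starting from any state of $J$, either no event has occurred by time $y$ or the first event occurred at some time $u \le y$.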

2.2 A Maximum Split-Time Likelihood Estimator

It is impossible to apply any known results on MMPPs here since the jump intensity in a given state of $J$ changes as time goes by; in particular, given that $J$ is in state $j$, $N$ is an inhomogeneous Poisson process. To overcome this issue, we introduce some notation: let $W_{q,s}$ be the number of jumps of $N$ during season $s$ of the period $q$, and let $T_1^{(q,s)}, \ldots, T_{W_{q,s}}^{(q,s)}$ be the successive jump times of $N$ during this season. Let $Y_l^{(q,s)} = T_l^{(q,s)} - T_{l-1}^{(q,s)}$ be the inter-event times in season $s$ (with $T_0^{(q,s)} = (q-1) + \tau_{s-1}$). It is assumed that the starting distribution of $J$ is its unique stationary distribution $a(\Phi)$ on $\{1, \ldots, r\}$, that is, the only row vector $a(\Phi)$ such that $a(\Phi) L(\Phi) = 0$ and the sum of the entries of $a(\Phi)$ is equal to 1.
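The stationary distribution is the solution of a small linear system, which can be sketched as follows. The function name `stationary_distribution` and the plain Gaussian elimination are choices made for this illustration; any linear solver would do.

```python
def stationary_distribution(L):
    """Solve a L = 0 with sum(a) = 1 for an irreducible generator L."""
    r = len(L)
    # Row j of A is column j of L, i.e. the equation sum_i a_i L[i][j] = 0;
    # the last equation is replaced by the normalisation sum(a) = 1.
    A = [[L[i][j] for i in range(r)] for j in range(r)]
    A[-1] = [1.0] * r
    b = [0.0] * (r - 1) + [1.0]
    for c in range(r):
        # Partial pivoting, then elimination below the pivot.
        p = max(range(c, r), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for i in range(c + 1, r):
            m = A[i][c] / A[c][c]
            A[i] = [x - m * y for x, y in zip(A[i], A[c])]
            b[i] -= m * b[c]
    # Back substitution.
    a = [0.0] * r
    for i in reversed(range(r)):
        tail = sum(A[i][j] * a[j] for j in range(i + 1, r))
        a[i] = (b[i] - tail) / A[i][i]
    return a
```

For a two-state chain with $\ell_{12} = 1$ and $\ell_{21} = 2$, this returns $a = (2/3, 1/3)$, in agreement with the closed form $a = (\ell_{21}, \ell_{12}) / (\ell_{12} + \ell_{21})$.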

Let $Z_q = (W_{q,s}, Y_1^{(q,s)}, \ldots, Y_{W_{q,s}}^{(q,s)})_{1 \le s \le k}$ be the random vector representing the information available for period number $q$. With this notation, the process $(Z_q)_{q \ge 1}$ is stationary because $J$ is a stationary Markov process. Besides, given the states of the irreducible Markov chain $(J(q))_{q \in \mathbb{N}}$, the random variables $Z_q$, $q \ge 1$, are independent, so that arguing along the lines of the proof of Lemma 1 in Leroux [12] shows that the process $(Z_q)_{q \ge 1}$ is ergodic. We denote by $Z = (W_s, Y_1^{(s)}, \ldots, Y_{W_s}^{(s)})_{1 \le s \le k}$ a random vector which shares the distribution of the $Z_q$, $q \ge 1$.

Let then $\mathcal{L}_1(Z, \Phi)$ be the likelihood of the observations over one period, computed under the parameter $\Phi$: if $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1 \le s \le k}$,

$$\mathcal{L}_1(z, \Phi) = a(\Phi) \left[ \prod_{s=1}^{k} \left\{ \left( \prod_{l=1}^{w_s} f(y_l^{(s)}, s, \Phi) \right) F\!\left( \tau_s - \tau_{s-1} - \sum_{l=1}^{w_s} y_l^{(s)},\, s,\, \Phi \right) \right\} \right] 1_r$$

where $1_r$ is the column vector of size $r$ having all entries equal to 1. This quantity can be expanded as:

$$\mathcal{L}_1(z, \Phi) = \sum_{i_0, \ldots, i_k} a_{i_0}(\Phi) \prod_{s=1}^{k} \left[ e_{i_{s-1}}' \left( \prod_{l=1}^{w_s} f(y_l^{(s)}, s, \Phi) \right) F\!\left( \tau_s - \tau_{s-1} - \sum_{l=1}^{w_s} y_l^{(s)},\, s,\, \Phi \right) e_{i_s} \right] \qquad (2.1)$$

where $e_j$ is the column vector of size $r$ having all entries equal to 0 except the $j$th which is equal to 1, and $e_j'$ is the transpose of $e_j$. Following Rydén [24], we let

$$\mathrm{STL}_n(\Phi) = \prod_{q=1}^{n} \mathcal{L}_1(Z_q, \Phi)$$

be the split-time likelihood (STL) of the observations. If the random variables $Z_q$, $q \ge 1$, were independent, the STL would be the total likelihood of the observations. While these variables are actually not independent, the sequence $(Z_q)_{q \ge 1}$ possesses regenerative properties, which implies in particular that it possesses independence properties given a suitable sequence of increasing random times; this is the key idea behind the proofs of our main results (see the proof of Theorem 2). An MSTL is then any parameter that maximizes the STL, or equivalently the log-STL

$$\log \mathrm{STL}_n(\Phi) = \sum_{q=1}^{n} \log \mathcal{L}_1(Z_q, \Phi)$$

and an MSTLE is any estimator of an MSTL.
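The log-STL can be evaluated by propagating a row vector through the matrix product that defines the one-period likelihood. The sketch below is one possible way to do so and is not taken from the paper: the helper names (`mat_mul`, `expm`, `log_l1`, `log_stl`), the pure-Python matrix exponential (scaling and squaring with a truncated Taylor series) and the rescaling of the vector after each factor, used to avoid numerical underflow, are all assumptions of this illustration. States and seasons are 0-based.

```python
import math

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expm(M, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    r = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    k = (max(0, math.ceil(math.log2(norm))) + 1) if norm > 0 else 0
    S = [[x / 2 ** k for x in row] for row in M]
    E = [[float(i == j) for j in range(r)] for i in range(r)]
    term = [row[:] for row in E]
    for n in range(1, terms + 1):
        term = [[x / n for x in row] for row in mat_mul(term, S)]
        E = [[e + t for e, t in zip(row_e, row_t)] for row_e, row_t in zip(E, term)]
    for _ in range(k):
        E = mat_mul(E, E)
    return E

def log_l1(z, L, lam, tau, a):
    """log of the one-period likelihood; z[s] lists the inter-event times of season s."""
    r = len(L)
    v, log_lik = list(a), 0.0
    for s, ys in enumerate(z):
        # G = L - Lambda^(s)
        G = [[L[i][j] - (lam[s][i] if i == j else 0.0) for j in range(r)] for i in range(r)]
        rest = tau[s + 1] - tau[s] - sum(ys)   # time from the last event to the season end
        for y, is_event in [(y, True) for y in ys] + [(rest, False)]:
            F = expm([[y * g for g in row] for row in G])
            v = [sum(v[i] * F[i][j] for i in range(r)) for j in range(r)]
            if is_event:                        # f = F Lambda^(s) for an observed event
                v = [v[j] * lam[s][j] for j in range(r)]
            c = sum(v)                          # rescale and accumulate the log factor
            v = [x / c for x in v]
            log_lik += math.log(c)
    return log_lik

def log_stl(zs, L, lam, tau, a):
    """Log split-time likelihood: sum of the one-period log-likelihoods."""
    return sum(log_l1(z, L, lam, tau, a) for z in zs)
```

A simple sanity check is the degenerate situation where the jump intensities do not depend on the state of J: N is then a Poisson process with a piecewise constant rate, and `log_l1` reduces to the sum over seasons of $w_s \log \lambda^{(s)} - \lambda^{(s)} (\tau_s - \tau_{s-1})$.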

3 Asymptotic results

We shall write $\Phi \sim \Phi'$ whenever the distributions of $Z$ under $P_\Phi$ and $P_{\Phi'}$ agree. The […] modulo $\sim$ is a difficult question; for the case of an MMPP, this was discussed in Rydén [26]. In the special case when there is a season for which the jump intensities of $N$ are distinct, which shall be the case we consider in our simulation study and real data analysis, this problem does however possess a simple solution:

Proposition 1. Let $\Phi \in \mathcal{E}$. Assume that for some $s \in \{1, \ldots, k\}$, the matrix $\Lambda^{(s)}(\Phi)$ possesses distinct diagonal elements. Then the equivalence class of $\Phi$ modulo $\sim$ reduces to those parameters $\Phi'$ obtained by permutation of the states of the underlying Markov process $J$.

Proof of Proposition 1. Let $\Phi'$ be such that $\Phi \sim \Phi'$ and let $(J, N)$ (resp. $(J', N')$) have distribution $P_\Phi$ (resp. $P_{\Phi'}$). By stationarity, the restriction of the process $(J, N)$ (resp. $(J', N')$) to the time interval $[\tau_{s-1}, \tau_s]$ actually has the same distribution as the restriction of an MMPP $(\mathcal{J}, \mathcal{N})$ (resp. $(\mathcal{J}', \mathcal{N}')$) with parameters $(L(\Phi), \Lambda^{(s)}(\Phi))$ (resp. $(L(\Phi'), \Lambda^{(s)}(\Phi'))$) to the time interval $[0, \tau_s - \tau_{s-1}]$. Using again the stationarity property of the underlying Markov process, it is then enough to show that these two processes are deduced from one another (in the distributional sense) by a permutation of the states. Applying Corollary 1 in Rydén [26] completes the proof.

Roughly speaking, in this case, the strong identifiability constraint on season s carries over, in some sense, to the whole process by the stationarity property of J. Back to the general case, the following consistency result holds:

Theorem 1. Let $\Phi_0 \in \mathcal{E}$ be the true value of the parameter and $\overline{\Phi}_0$ be the equivalence class of $\Phi_0$ modulo $\sim$. Let $K$ be a compact subset of $\mathcal{E}$ such that $\Phi_0 \in K$ and $\widehat{\Phi}_n$ be the MSTLE for $\Phi_0$ on $K$, computed over $n$ periods. Then if $O \subset K$ is an open set containing $K \cap \overline{\Phi}_0$, one has $\widehat{\Phi}_n \in O$ almost surely for $n$ large enough.

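Proof of Theorem 1. The proof is similar to that of Theorem 1 in Rydén [24]: let $\Phi \in \mathcal{E}$ be such that $\Phi \not\sim \Phi_0$ and $G_\Phi$ be a neighborhood of $\Phi$ as in Lemma 2. If $B(\Phi, 1/q)$ denotes the Euclidean open ball in $\mathcal{E}$ with center $\Phi$ and radius $1/q$, the continuity of the map $\varphi \mapsto \mathcal{L}_1(Z, \varphi)$ yields

$$\sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log \mathcal{L}_1(Z, \varphi) \to \log \mathcal{L}_1(Z, \Phi) \quad \text{as } q \to \infty.$$

Noticing that

$$\left| \sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log \mathcal{L}_1(Z, \varphi) \right| \le \left| \sup_{\varphi \in G_\Phi} \log \mathcal{L}_1(Z, \varphi) \right| + \left| \log \mathcal{L}_1(Z, \Phi) \right|,$$

the dominated convergence theorem implies

$$E_{\Phi_0}\!\left[ \sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log \mathcal{L}_1(Z, \varphi) \right] \to E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi)] \quad \text{as } q \to \infty. \qquad (3.1)$$

Since $\Phi \not\sim \Phi_0$, the information inequality (see Rao [21]) gives

$$E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi)] + 2\varepsilon < E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi_0)] \qquad (3.2)$$

for some $\varepsilon > 0$. It is thus a consequence of (3.1) and (3.2) that there exists a (possibly different) neighborhood $G_\Phi$ of $\Phi$ with

$$E_{\Phi_0}\!\left[ \sup_{\varphi \in G_\Phi} \log \mathcal{L}_1(Z, \varphi) \right] \le E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi_0)] - \varepsilon. \qquad (3.3)$$

Besides, since $(Z_q)_{q \ge 1}$ is ergodic,

$$\frac{1}{n} \sup_{\varphi \in G_\Phi} \log \mathrm{STL}_n(\varphi) \le \frac{1}{n} \sum_{q=1}^{n} \sup_{\varphi \in G_\Phi} \log \mathcal{L}_1(Z_q, \varphi) \to E_{\Phi_0}\!\left[ \sup_{\varphi \in G_\Phi} \log \mathcal{L}_1(Z, \varphi) \right]$$

and

$$\frac{1}{n} \log \mathrm{STL}_n(\Phi_0) = \frac{1}{n} \sum_{q=1}^{n} \log \mathcal{L}_1(Z_q, \Phi_0) \to E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi_0)]$$

almost surely as $n \to \infty$. Hence

$$\limsup_{n \to \infty} \frac{1}{n} \sup_{\varphi \in G_\Phi} \log \mathrm{STL}_n(\varphi) \le E_{\Phi_0}\!\left[ \sup_{\varphi \in G_\Phi} \log \mathcal{L}_1(Z, \varphi) \right] \le E_{\Phi_0}[\log \mathcal{L}_1(Z, \Phi_0)] - \varepsilon$$

almost surely, by (3.3). Finally, remark that the compact set $O^c \cap K$, where $O^c$ is the complement of $O$, may be covered by a finite number of such neighborhoods $G_{\Phi_i}$, $1 \le i \le d$; this yields

$$\sup_{\varphi \in O^c \cap K} \{ \log \mathrm{STL}_n(\varphi) - \log \mathrm{STL}_n(\Phi_0) \} \le \max_{1 \le i \le d} \left\{ \sup_{\varphi \in G_{\Phi_i}} \log \mathrm{STL}_n(\varphi) - \log \mathrm{STL}_n(\Phi_0) \right\}$$

which tends to $-\infty$ almost surely as $n \to \infty$. As a consequence, necessarily $\widehat{\Phi}_n \in O$ for $n$ large enough.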

We now wish to obtain an asymptotic normality result for our estimator. In what follows, we pick $i_0 \in \{1, \ldots, r\}$ and we let $\omega_k$ be the successive times when the Markov chain $(J(q))$ reaches $i_0$:

$$\omega_1 = \min\{ q \ge 1 \mid J(q) = i_0 \} \quad \text{and} \quad \forall k \ge 1,\ \omega_{k+1} = \min\{ q > \omega_k \mid J(q) = i_0 \}.$$

Let further $P_\Phi^{i_0}(\cdot) = P_\Phi(\cdot \mid J(0) = i_0)$ be the probability measure deduced from $P_\Phi$ given that $J$ starts at $i_0$.
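On a simulated path, where the states $J(0), J(1), J(2), \ldots$ at integer times are known, these return times can be read off directly. The short sketch below uses hypothetical function names and sets $\omega_0 = 0$ when computing the lengths of the cycles between successive returns; it is only meant for simulated data, since $J$ is not observed in practice.

```python
def regeneration_times(states, i0):
    """Return [omega_1, omega_2, ...] given states = [J(0), J(1), J(2), ...]."""
    # omega_1 is the first q >= 1 with J(q) = i0, so index 0 is skipped.
    return [q for q, j in enumerate(states) if q >= 1 and j == i0]

def cycle_lengths(omegas):
    """Lengths omega_q - omega_{q-1} of the successive cycles, with omega_0 = 0."""
    return [b - a for a, b in zip([0] + omegas[:-1], omegas)]
```

For instance, with states `[1, 2, 1, 1, 2, 1]` and `i0 = 1`, the return times are 2, 3 and 5, and the cycle lengths are 2, 1 and 2.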

Note that for every $\Phi \in \mathcal{E}$, since $L(\Phi)$ is the generator of an irreducible continuous-time Markov chain on a finite state space, 0 is an eigenvalue of the transpose $L'(\Phi)$ of $L(\Phi)$ with multiplicity 1 and related normalized eigenvector $a'(\Phi)$. Since the map $\varphi \mapsto L'(\varphi)$ is infinitely continuously differentiable in a neighborhood of $\Phi$, a straightforward extension of Theorem 8 in Chapter 9 of Lax [11] shows that the map $\varphi \mapsto a'(\varphi)$ is infinitely continuously differentiable in a neighborhood of $\Phi$. The function $\varphi \mapsto \log \mathcal{L}_1(Z, \varphi)$ is thus infinitely continuously differentiable in a neighborhood of the true parameter $\Phi_0$; if $E_\Phi^{i_0}$ denotes the expectation under the measure $P_\Phi^{i_0}$, we can set for $\Phi$ close enough to $\Phi_0$

$$h_i(z, \Phi) = \frac{\partial \log \mathcal{L}_1}{\partial \varphi_i}(z, \Phi), \quad A_{ij}(\Phi) = E_\Phi^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi)\, h_j(Z_q, \Phi) \right), \quad V_{ij}(\Phi) = E_\Phi^{i_0}\!\left( \sum_{p,q=1}^{\omega_1} h_i(Z_p, \Phi)\, h_j(Z_q, \Phi) \right).$$

We assume that the matrix $A(\Phi) = (A_{ij}(\Phi))$ is invertible for $\Phi = \Phi_0$ and we let $V(\Phi) = (V_{ij}(\Phi))$, $C(\Phi) = [a_{i_0}(\Phi)]^{-1} A^{-1}(\Phi) V(\Phi) A^{-1}(\Phi)$ for $\Phi$ in a neighborhood of $\Phi_0$.

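Our asymptotic normality result for $\widehat{\Phi}_n$ is as follows:

Theorem 2. Let $\Phi_0 \in \mathcal{E}$ be the true value of the parameter and $\overline{\Phi}_0$ be the equivalence class of $\Phi_0$ modulo $\sim$. Assume that $\Phi_0$ lies in the interior of $\mathcal{E}$ and let $K$ be a compact subset of $\mathcal{E}$, whose interior contains $\Phi_0$, such that $K \cap \overline{\Phi}_0 = \{\Phi_0\}$. Let $\widehat{\Phi}_n$ be the MSTLE computed on $K$ over $n$ periods. Then

$$\sqrt{n}\,(\widehat{\Phi}_n - \Phi_0) \xrightarrow{d} \mathcal{N}(0, C(\Phi_0)) \quad \text{as } n \to \infty.$$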

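Proof of Theorem 2. We start as in the proof of Theorem 4 in Rydén [24]. Condition $K \cap \overline{\Phi}_0 = \{\Phi_0\}$ ensures that $\widehat{\Phi}_n \to \Phi_0$ almost surely as $n \to \infty$. In particular, with probability 1, one has $\widehat{\Phi}_n \in K$ for $n$ large enough by Theorem 1. For such $n$, a Taylor expansion of the $i$-th partial derivative of $\Phi \mapsto \log \mathrm{STL}_n(\Phi)$ at $\Phi_0$ implies that

$$T_{1,n} := \frac{1}{\sqrt{n}} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) = T_{2,n} + T_{3,n} \qquad (3.4)$$

where

$$T_{2,n} := - \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \left[ \frac{1}{n} \sum_{q=1}^{n} \frac{\partial^2 \log \mathcal{L}_1}{\partial \varphi_j \partial \varphi_i}(Z_q, \Phi_0) \right],$$

$$T_{3,n} := - \frac{1}{2} \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \left[ \sum_{k=1}^{|\mathcal{E}|} (\widehat{\Phi}_{n,k} - \Phi_{0,k})\, \frac{1}{n} \sum_{q=1}^{n} \frac{\partial^3 \log \mathcal{L}_1}{\partial \varphi_k \partial \varphi_j \partial \varphi_i}(Z_q, \widetilde{\Phi}_n) \right],$$

with $\widetilde{\Phi}_n$ some point on the line segment connecting $\Phi_0$ and $\widehat{\Phi}_n$. We start by dealing with the right-hand side of equality (3.4). To this end, use Lemma 4 and the fact that the process $(Z_q)_{q \ge 1}$ is ergodic to obtain

$$T_{2,n} = - \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, E_{\Phi_0}\!\left( \frac{\partial^2 \log \mathcal{L}_1}{\partial \varphi_j \partial \varphi_i}(Z, \Phi_0) \right) (1 + o_P(1)).$$

Moreover, since

$$\frac{\partial^2 \log \mathcal{L}_1}{\partial \varphi_j \partial \varphi_i} = \frac{1}{\mathcal{L}_1} \frac{\partial^2 \mathcal{L}_1}{\partial \varphi_j \partial \varphi_i} - \frac{1}{\mathcal{L}_1^2} \frac{\partial \mathcal{L}_1}{\partial \varphi_i} \frac{\partial \mathcal{L}_1}{\partial \varphi_j},$$

it is a consequence of Lemma 4 and of a differentiation under the expectation sign that

$$T_{2,n} = \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, E_{\Phi_0}\big(h_i(Z, \Phi_0)\, h_j(Z, \Phi_0)\big)(1 + o_P(1)) = a_{i_0}(\Phi_0) \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, E_{\Phi_0}^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0)\, h_j(Z_q, \Phi_0) \right)(1 + o_P(1)), \qquad (3.5)$$

by Lemma 6. Besides, Lemma 4, the ergodicity of $(Z_q)_{q \ge 1}$ and the consistency of $\widehat{\Phi}_n$ entail

$$T_{3,n} = o_P\!\left( \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \right) \qquad (3.6)$$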

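as $n \to \infty$. Collecting (3.5) and (3.6), we obtain

$$T_{2,n} + T_{3,n} = a_{i_0}(\Phi_0) \sum_{j=1}^{|\mathcal{E}|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, E_{\Phi_0}^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0)\, h_j(Z_q, \Phi_0) \right)(1 + o_P(1)) \qquad (3.7)$$

as $n \to \infty$. We now examine the asymptotic properties of $T_{1,n}$: notice that $(Z_q)_{q \ge 1}$ is regenerative with associated cycle lengths $(C_q = \omega_q - \omega_{q-1})_{q \ge 1}$ (where we set $\omega_0 = 0$ for convenience), that is:

• for each $l \ge 2$, the random process $(C_{l+1+q}, Z_{\omega_l + q})_{q \ge 1}$ is independent of the $\omega_j$, $1 \le j \le l - 1$, and its distribution does not depend on $l$;

• given $(\omega_q)_{q \ge 1}$, $q \ge 1$, the random vectors $Z_j$, $\omega_{q-1} \le j \le \omega_q - 1$, are independent.

We may thus apply a law of large numbers similar to the one proposed in Theorem 2 in Rydén [24] (see also Asmussen [1]): this leads to

$$\frac{1}{\sqrt{n}}\, T_{1,n} = \frac{1}{n} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) \xrightarrow{P} a_{i_0}(\Phi_0)\, E_{\Phi_0}^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) \quad \text{as } n \to \infty.$$

Besides, thanks to Lemma 4, we may once again differentiate under the integral sign to obtain $E_{\Phi_0}(h_i(Z, \Phi_0)) = 0$. Lemma 4 and the ergodicity of $(Z_q)_{q \ge 1}$ thus yield

$$\frac{1}{\sqrt{n}}\, T_{1,n} = \frac{1}{n} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) \to 0 \quad \text{almost surely as } n \to \infty.$$

Combining these last two convergences entails

$$E_{\Phi_0}^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) = 0.$$

Finally,

$$E_{\Phi_0}^{i_0}\!\left[ \left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right)^{2} \right] < \infty,$$

see Lemmas 4 and 5; applying a central limit theorem similar to the one proposed in Theorem 3 in Rydén [24] then gives

$$T_{1,n} = \frac{1}{\sqrt{n}} \sum_{q=1}^{n} \left[ h_i(Z_q, \Phi_0) - a_{i_0}(\Phi_0)\, E_{\Phi_0}^{i_0}\!\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) \right] \xrightarrow{d} \mathcal{N}\big(0,\, a_{i_0}(\Phi_0) V(\Phi_0)\big) \quad \text{as } n \to \infty. \qquad (3.8)$$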

The interesting point about MSTLEs is that confidence regions for $\Phi_0$ may be computed fairly easily, see Lemma 7: it is enough to compute the MSTLE $\widehat{\Phi}_n$ with a numerical method, to generate random copies of the process under the law $P_{\widehat{\Phi}_n}$ and to apply the strong law of large numbers to get an approximation of $C(\widehat{\Phi}_n)$ and hence of $C(\Phi_0)$.

We close this section by noting that Lemma 6 makes it possible to give further insight into the non-singularity of $A(\Phi_0)$. If $A(\Phi_0)$ were singular then, by Lemma 6, we would have

$$\exists j \in \{1, \ldots, |\mathcal{E}|\},\ \forall i \in \{1, \ldots, |\mathcal{E}|\}, \quad E_{\Phi_0}\big(h_i(Z, \Phi_0)\, h_j(Z, \Phi_0)\big) = \sum_{\substack{\ell = 1 \\ \ell \ne j}}^{|\mathcal{E}|} \gamma_\ell\, E_{\Phi_0}\big(h_i(Z, \Phi_0)\, h_\ell(Z, \Phi_0)\big)$$

where the $\gamma_\ell$ are real constants. It is then straightforward that

$$E_{\Phi_0}\!\left[ \left( h_j(Z, \Phi_0) - \sum_{\substack{\ell = 1 \\ \ell \ne j}}^{|\mathcal{E}|} \gamma_\ell\, h_\ell(Z, \Phi_0) \right)^{2} \right] = 0$$

so that the partial derivatives of $\log \mathcal{L}_1(\cdot, \Phi_0)$ are linearly dependent. The structure of the log-likelihood (over one period) being highly non-linear, one may therefore expect that $A(\Phi_0)$ will be invertible in the vast majority of cases.

4 Simulation study and real data analysis

4.1 Practical implementation

The MSTLE is typically computed using a (quasi-)Newton algorithm. Such an algorithm being iterative, we give a method to start it in practice. Let $w_{q,s}$ be the number of jumps of $N$ during season $s$ of the period $q$, $t_1^{(q,s)}, \ldots, t_{w_{q,s}}^{(q,s)}$ be the successive jump times of $N$ during this season and let

$$u_l^{(q,s)} = t_l^{(q,s)} - \tau_{s-1} - (q-1)\big(1 - (\tau_s - \tau_{s-1})\big), \quad 1 \le l \le w_{q,s}.$$

For all $1 \le s \le k$, we treat the data $D_s = (u_l^{(q,s)})_{q,l}$ as if it were a sample of a univariate MMPP with parameters $\Phi^{(s)} = (L, \lambda_1^{(s)}, \ldots, \lambda_r^{(s)})$. An EM algorithm is then used to provide a first estimate of $\Phi^{(s)}$; the initial estimation procedure is a straightforward adaptation of the one in Guillou et al. [7].
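The time change defining $u_l^{(q,s)}$ simply removes the other seasons and glues the successive occurrences of season $s$ end to end. A minimal sketch of this step, with a hypothetical function name and with seasons numbered from 1 as in the text, could read:

```python
def glue_season(event_times, tau, s):
    """Map the events falling in season s (1-based) onto a single time axis.

    event_times : jump times of N on [0, n)
    tau         : [tau_0 = 0, tau_1, ..., tau_k = 1]
    """
    length = tau[s] - tau[s - 1]
    glued = []
    for t in event_times:
        q = int(t) + 1                      # period number, starting at 1
        if tau[s - 1] <= t - (q - 1) < tau[s]:
            # u = t - tau_{s-1} - (q - 1) * (1 - (tau_s - tau_{s-1}))
            glued.append(t - tau[s - 1] - (q - 1) * (1 - length))
    return glued
```

With two seasons of equal length ($\tau_1 = 1/2$), an event at time 1.6 lies in season 2 of period 2 and is mapped to $1.6 - 0.5 - 0.5 = 0.6$, that is, 0.1 time units into the second glued copy of season 2.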

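Let $(\widetilde{L}^{(s)}, \widetilde{\lambda}_1^{(s)}, \ldots, \widetilde{\lambda}_r^{(s)})$ be the EM estimations obtained with this procedure. The EM […] square matrix having size $r$ and $\sigma$ is a permutation of $\{1, \ldots, r\}$, $M \circ \sigma = (m_{\sigma(i)\sigma(j)})$. Let, for such permutations $\sigma_1, \ldots, \sigma_k$,

$$\widetilde{L}^{(\sigma_1, \ldots, \sigma_k)} = \frac{1}{k} \sum_{s=1}^{k} \widetilde{L}^{(s)} \circ \sigma_s \quad \text{and} \quad \widetilde{\Phi}^{(\sigma_1, \ldots, \sigma_k)} = \Big( \widetilde{L}^{(\sigma_1, \ldots, \sigma_k)},\ \big( \widetilde{\lambda}_{\sigma_s(1)}^{(s)}, \ldots, \widetilde{\lambda}_{\sigma_s(r)}^{(s)} \big)_{1 \le s \le k} \Big).$$

We then run the optimization algorithm starting from each of these values: in doing so, we obtain estimates $\widehat{\Phi}^{(\sigma_1, \ldots, \sigma_k)}$ of the parameters. Finally, we compute

$$(\sigma_1^0, \ldots, \sigma_k^0) = \arg\max_{\sigma_1, \ldots, \sigma_k} \mathrm{STL}_n\big(\widehat{\Phi}^{(\sigma_1, \ldots, \sigma_k)}\big)$$

and our MSTLE is $\widehat{\Phi} = \widehat{\Phi}^{(\sigma_1^0, \ldots, \sigma_k^0)}$.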
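The enumeration of starting values can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, states are 0-based, and the per-season EM estimates are taken as given inputs. Permutations are needed because the labelling of the hidden states returned by each per-season EM fit is arbitrary.

```python
from itertools import permutations, product

def permute_matrix(M, sigma):
    """Return M o sigma, the matrix with entries m[sigma(i)][sigma(j)]."""
    r = len(M)
    return [[M[sigma[i]][sigma[j]] for j in range(r)] for i in range(r)]

def starting_values(L_em, lam_em):
    """Yield (sigmas, L_start, lam_start) for every choice of permutations.

    L_em[s]   : EM estimate of the generator obtained from season s
    lam_em[s] : EM estimates of the r jump intensities in season s
    """
    k, r = len(L_em), len(L_em[0])
    for sigmas in product(permutations(range(r)), repeat=k):
        permuted = [permute_matrix(L_em[s], sigmas[s]) for s in range(k)]
        # Average of the relabelled generators over the k seasons.
        L_start = [[sum(P[i][j] for P in permuted) / k for j in range(r)]
                   for i in range(r)]
        # Relabelled jump intensities, season by season.
        lam_start = [[lam_em[s][sigmas[s][i]] for i in range(r)] for s in range(k)]
        yield sigmas, L_start, lam_start
```

There are $(r!)^k$ candidate starting points, which remains small for the values $r = 2$ and $k = 2$ used in the numerical sections; the optimizer is then run from each of them and the run with the largest STL is retained.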

4.2 Simulation study

In this section, we examine how our estimator behaves in a finite-sample situation. We are motivated by climate-related statistical problems such as the impact of the El Niño-La Niña cycle on hurricane risk. We choose reasonable parameters for a simple illustration, but cannot claim that those parameters are estimated on a real data set, simply because it is very hard to understand the impact of this effect on hurricane frequency. We consider the situation $r = 2$, that is, the underlying Markov process $J$ is a two-state continuous-time Markov process. We further choose $k = 2$ and $\tau_1 = 1/2$, so that any period is divided into two seasons of equal length. The following cases are considered:

• Case 1: $\ell_{12} = 1$, $\ell_{21} = 2$, $\lambda_1^{(1)} = 1$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5$, $\lambda_2^{(2)} = 25$.

• Case 2: $\ell_{12} = 3$, $\ell_{21} = 10$, $\lambda_1^{(1)} = 1$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5$, $\lambda_2^{(2)} = 25$.

• Case 3: $\ell_{12} = 1$, $\ell_{21} = 2$, $\lambda_1^{(1)} = 1/2$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5/2$, $\lambda_2^{(2)} = 25$.

For each of these cases, we consider the situations $n = 50$ and $n = 100$. Our estimation procedure is carried out on $S = 100$ replications of the considered process. In each case, we compute the mean and median $L^1$-error related to each parameter. We also computed […] display the mean and median half-length of these intervals as well as the estimated coverage probabilities in Tables 1–3.

As intuition suggests, the jump intensities are estimated more accurately when the ratio of the claim frequencies in different states is further from 1. This is because identifying the hidden states of J is easier when both regimes of the process are further apart, and thus more information on the process can be recovered in such cases. Besides, estimating the elements of L is harder than estimating the jump intensities since J is unobserved, which is confirmed by our results. Coverage probabilities are satisfactory overall; although they are fairly low in case 2 for n = 50, they improve to reasonable levels for n = 100.
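For a single parameter, summary statistics of this kind can be computed from the replications along the following lines. This is a sketch only: the function name is hypothetical, and it assumes that each replication provides an interval that is symmetric around the estimate, described by its half-length, which may differ from the construction actually used for the tables.

```python
from statistics import mean, median

def summarize(estimates, half_lengths, truth):
    """Summarize S replications for one parameter with true value `truth`."""
    errors = [abs(e - truth) for e in estimates]
    # The interval [e - h, e + h] covers the truth when |e - truth| <= h.
    covered = [abs(e - truth) <= h for e, h in zip(estimates, half_lengths)]
    return {
        "mean_error": mean(errors),
        "median_error": median(errors),
        "mean_half_length": mean(half_lengths),
        "median_half_length": median(half_lengths),
        "coverage": sum(covered) / len(covered),
    }
```

The estimated coverage probability is the proportion of replications whose interval contains the true value, to be compared with the nominal level of the intervals.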

From this simulation study, one can see that in practice, if an insurance company had around 50 or 100 years of data, which would be an ideal situation, for intensities like the ones we chose (as far as the El Niño-La Niña-related problem on hurricane risk is concerned, 5 times as many hurricanes during the hurricane season as in the other one, and 5 times as many hurricanes during a bad El Niño-La Niña phase as in a favorable one), it is feasible to estimate the parameters with an average relative error of around 10 to 20%. Of course, we have taken here a quite favorable case, as we have only two states of the environment and two seasons with fixed intensities. Considering 5 or more states and 4 seasons would require a period of observation too long for the approach to be reasonable. However, in most hidden Markov models in insurance and finance, the number of states is 2 or 3. In the near future, one could probably refine these estimations thanks to partial observations of the environment process, as climatologists understand better and better the El Niño-La Niña phenomenon and its impact on insurance perils.

4.3 Real data analysis

In this section, we focus on extreme climate events (large bushfires, cyclones, floods, hail, storms, tornados, tsunamis) experienced by Australia from March 1967 to March 2012. During this timeframe, 193 such extreme events were observed. Their frequency is largely influenced by a seasonality effect; the first “season” we consider is fall-winter, which in Australia roughly corresponds to the end of March until the end of September, and the second one is spring-summer, namely from the beginning of October to mid-March. Moreover, extreme climate events tend to be influenced by external effects such as the El Niño-La Niña phenomenon or even global phenomena like climate change, which motivates the use of our technique on this data set. Our unit of time is the standard calendar year; our data is represented in Fig. 1 and our estimates are given in Table 4. A reconstruction of the underlying process J, carried out by a straightforward adaptation of the Viterbi algorithm (see Viterbi [31]) to our case, is given in Fig. 2.

The algorithm seems to identify an underlying process J with very small transition rates. A consequence of this is that the confidence intervals for the elements of L are poor. This is not the case however for the jump intensities, and the algorithm confirms that in a given state of J, the intensities in both seasons are significantly different, thus emphasizing the importance of taking into account the seasonality effect. Furthermore, the frequency of extreme climate events is significantly higher in the spring-summer season, which was expected at least as far as bushfires, cyclones and storms were concerned. We underline that, due to the fact that there are very few transitions of J, the Markov assumption on this process is questionable; nevertheless, as already observed in Guillou et al. [7] on real sets of non-life and life insurance data, the algorithm correctly matches a point where the slope of N sharply increases with a breakpoint in the data set where the hidden process switches from state 2 to state 1, thus leading to higher frequencies of jumps of the counting process. We thus believe that our approach is superior to a naive model using seasonal versions of several different Poisson processes since such an approach would not be helpful in identifying potential unknown breakpoints in the global structure of the model. It is also superior to the use of a classical MMPP approach as in Guillou et al. [7] which would not be able to handle the seasonal components of the data.

5 Conclusion and discussion

In this paper, we introduce an MMPP featuring seasonality for which we discuss estimation issues when the environment process is unobserved. After establishing the main asymptotic properties of the resulting estimator, we illustrate its performance on a small simulation study in which we observe that the number of states is crucial. We conclude our numerical section with a real data analysis on extreme climate events in Australia on which our developed methodology seems to perform well. In the real data study, the Markov assumption is of course questionable, as well as the absence of impact of climate change on the parameters. Extending the number of states in order to get a Markov process is feasible in theory, but would lead to estimation problems that would not be tractable in practice. Another problem, when considering climate-related problems such as the motivating hurricane risk example, is that in different states of the environment, the seasons may be longer or shorter, as conditions are more or less met for events to happen. This is not considered here and would require further theoretical analysis; linked to this concern, if one had further details about the events and access to meteorological data, it would of course be relevant to distinguish between categories of extreme events and to carry out a more detailed analysis. Having said that, even if ENSO values were available, their measurement would not be precise enough and their link with claim intensity and frequency would still be too loose to use credibility adjustments at this stage, although it is likely to be possible in the near future. Finally, note that we focus here on frequency of extreme events, but of course for the insurer the claim amounts are important. Nevertheless, our approach accounts for periods of time when there are significantly more events, which is very important for reinsurance and risk management purposes.

Acknowledgments

The authors thank the associate editor and two anonymous referees for their helpful comments which led to significant improvements of this article.

References

[1] S. Asmussen, Applied probability and queues, Wiley, New York, 1987.

[2] S. Asmussen, Risk theory in a Markovian environment, Scand. Actuar. J. 2 (1989) 69–100.

[3] H. Bühlmann, A. Gisler, A course in credibility theory and its applications, Springer,

[4] W.K. Ching, R.H. Chan, X.Y. Zhou, Circulant preconditioners for Markov-modulated Poisson processes and their applications to manufacturing systems, SIAM J. Matrix Anal. Appl. 18 (1997) 464–481.

[5] J.D. Cole, S.R. Pfaff, A climatology of tropical cyclones affecting the Texas coast during El Niño/non-El Niño years: 1990-1996, Technical Attachment SR/SSD 97-37, National Weather Service Office, Corpus Christi, Texas, available at http://www.srh.noaa.gov/topics/attach/html/ssd97-37.htm (1997).

[6] W. Fischer, K.S. Meier-Hellstern, The Markov-modulated Poisson process (MMPP) cookbook, Perf. Eval. 18 (1993) 149–171.

[7] A. Guillou, S. Loisel, G. Stupfler, Estimation of the parameters of a Markov-modulated loss process in insurance, Insurance Math. Econom. 53 (2013) 388–404.

[8] R. Helmers, I.W. Mangku, R. Zitikis, A non-parametric estimator for the doubly periodic Poisson intensity function, Stat. Methodol. 4 (2007) 481–492.

[9] G. Kesidis, J. Walrand, C.-S. Chang, Effective bandwidths for multiclass Markov fluids and other ATM sources, IEEE/ACM Trans. Netw. 1 (1993) 424–428.

[10] D. Landreneau, Atlantic tropical storms and hurricanes affecting the United States: 1899-2000, NOAA Technical Memorandum NWS SR-206 (updated through 2002), National Weather Service Office, Lake Charles, Louisiana, available at http://www.srh.noaa.gov/lch/?n=tropical (2001).

[11] P.D. Lax, Linear Algebra and its applications, Wiley, New York, 2007.

[12] B.G. Leroux, Maximum-likelihood estimation for hidden Markov models, Stochastic Process. Appl. 40 (1992) 127–143.

[13] Lloyd’s Forecasting risk, The value of long-range forecasting for the insurance industry, joint report with the UK Met Office, 2010.

[14] Y. Lu, J. Garrido, Doubly periodic non-homogeneous models for hurricane data, Stat. Methodol. 2 (2005) 17–35.


[15] Y. Lu, S. Li, On the probability of ruin in a Markov-modulated risk model, Insurance Math. Econom. 37 (2005) 522–532.

[16] K.S. Meier-Hellstern, A fitting algorithm for Markov-modulated Poisson processes having two arrival rates, European J. Oper. Res. 29 (1987) 970–977.

[17] C.J. Neumann, B.R. Jarvinen, C.J. McAdie, J.D. Elms, Tropical Cyclones of the North Atlantic Ocean, 1871-1992, Historical Climatology Series 6-2, National Climatic Data Center, Asheville, North Carolina, 1993.

[18] A. Ng, H. Yang, On the joint distribution of surplus prior and immediately after ruin under a Markovian regime switching model, Stochastic Process. Appl. 116 (2006) 244–266.

[19] E. Ohlsson, B. Johansson, Non-life insurance pricing with generalized linear models, Springer, 2010.

[20] F. Parisi, R. Lund, Seasonality and return periods of landfalling Atlantic basin hurricanes, Aust. N. Z. J. Stat. 42 (2000) 271–282.

[21] C.R. Rao, Linear statistical inference and its applications, Wiley, New York, 1973.

[22] T. Rydén, Parameter estimation for Markov modulated Poisson processes, Comm. Statist. Stochastic Models 10 (1994) 795–829.

[23] T. Rydén, Consistent and asymptotically normal parameter estimates for hidden Markov models, Ann. Statist. 22 (1994) 1884–1895.

[24] T. Rydén, Consistent and asymptotically normal parameter estimates for Markov modulated Poisson processes, Scand. J. Stat. 22 (1995) 295–303.

[25] T. Rydén, An EM algorithm for estimation in Markov-modulated Poisson processes, Comput. Statist. Data Anal. 21 (1996) 431–447.

[26] T. Rydén, On identifiability and order of continuous-time aggregated Markov chains, Markov-modulated Poisson processes, and phase-type distributions, J. Appl. Probab. 33 (1996) 640–653.


[27] P. Salvador, R. Valadas, A. Pacheco, Multiscale fitting procedure using Markov modulated Poisson processes, Telecommun. Syst. 23 (2003) 123–148.

[28] S.L. Scott, Bayesian analysis of a two-state Markov modulated Poisson process, J. Comput. Graph. Statist. 8 (1999) 662–670.

[29] H.J. Skaug, Markov modulated Poisson processes for clustered line transect data, Environ. Ecol. Stat. 13 (2006) 199–211.

[30] J. Vedani, F. Ramaharobandro, Continuous compliance: a proxy-based monitoring framework, preprint, available at https://hal.archives-ouvertes.fr/hal-00866531 (2013).

[31] A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inform. Theory IT-13 (1967) 260–269.

[32] J. Wei, H. Yang, R. Wang, On the Markov-modulated insurance risk model with tax, Blätter der DGVFM 31 (2010) 65–78.

[33] J. Zhu, H. Yang, Ruin theory for a Markov regime-switching model under a threshold dividend strategy, Insurance Math. Econom. 42 (2008) 311–318.


Appendix: technical lemmas and proofs

The first result is a necessary step to obtain the strong consistency of our estimator.

Lemma 1. For every $\phi \in E$ and $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1\le s\le k}$, one has
\[ m\,(rm)^{k+2}\,(rm)^{w_1+\cdots+w_k} \le L_1(z,\phi) \le M\,(rM)^{k+2}\,(rM)^{w_1+\cdots+w_k} \]
where
\[ M := \max\Big\{1,\ \max_{j,s}\lambda_j^{(s)}(\phi)\Big\} < \infty \]
and, if $K_s := [0, \tau_s - \tau_{s-1}]$,
\[ m := \min\Big\{\min_i a_i(\phi),\ \min_{i,j,s}\min_{y\in K_s} f_{ij}(y,s,\phi),\ \min_{i,j,s}\min_{y\in K_s} F_{ij}(y,s,\phi)\Big\} > 0. \]

Proof of Lemma 1. Start by remarking that
\[ F_{\alpha\beta}(y,s,\phi) = \big[\exp\big(y(L(\phi) - \Lambda^{(s)}(\phi))\big)\big]_{\alpha\beta} = \mathbb{P}\big(J(y)=\beta,\ N^{(s)}(y)=0 \mid J(0)=\alpha\big) \le 1 \]
when $(J, N^{(s)})$ is an MMPP with transition intensity matrix $L(\phi)$ and jump intensity matrix $\Lambda^{(s)}(\phi)$, see Meier-Hellstern [16]. Therefore, for every $y \ge 0$, one has $F_{\alpha\beta}(y,s,\phi) \le M$ and $f_{\alpha\beta}(y,s,\phi) \le M$. Using (2.1), we immediately obtain
\[ L_1(z,\phi) \le M\,(rM)^{k+2}\,(rM)^{w_1+\cdots+w_k}. \]
Besides, the compactness of $K_s$ and the continuity of the maps involved entail
\[ \min_{i,j,s}\min_{y\in K_s} f_{ij}(y,s,\phi) > 0 \quad\text{and}\quad \min_{i,j,s}\min_{y\in K_s} F_{ij}(y,s,\phi) > 0 \]
so that $m > 0$. Hence, using (2.1), we obtain the inequality
\[ L_1(z,\phi) \ge m\,(rm)^{k+2}\,(rm)^{w_1+\cdots+w_k}, \]
which completes the proof.
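The bound $F_{\alpha\beta}(y,s,\phi) \le 1$ used in the proof can be checked numerically. The Python sketch below builds $\exp(y(L - \Lambda))$ for a two-state example and verifies that its entries lie in $[0,1]$ with row sums below 1, as expected for probabilities of the event "no jump before time $y$". The matrix exponential routine (scaling and squaring with a truncated Taylor series), the function names and all numerical values are our own illustrative choices, not taken from the paper.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(a)
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:          # halve until the scaled matrix is small
        norm /= 2.0
        squarings += 1
    scaled = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def no_event_matrix(y, L, lam):
    """F(y) = exp(y (L - Lambda)) with Lambda = diag(lam)."""
    n = len(L)
    a = [[y * (L[i][j] - (lam[i] if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    return mat_exp(a)

# Hypothetical two-state example (values chosen for illustration only).
L = [[-0.5, 0.5], [0.2, -0.2]]   # transition intensities of J
lam = [5.0, 1.0]                 # jump intensities within one season
F = no_event_matrix(0.5, L, lam)
```

Each row sum of `F` is the probability of observing no event on $[0, y]$ from the corresponding starting state, which is why it cannot exceed 1.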

A second pivotal tool is the following technical lemma:

Lemma 2. For every $\Phi \in E$, there exists a neighborhood $G_\Phi$ of $\Phi$ in $E$ such that
\[ \mathbb{E}_{\Phi_0}\Big|\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\Big| < \infty. \]


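Proof of Lemma 2. Let $G_\Phi$ be a neighborhood of $\Phi$, $A = \big\{\sup_{\phi\in G_\Phi} L_1(Z,\phi) \le 1\big\}$ and write
\[
\begin{aligned}
\mathbb{E}_{\Phi_0}\Big|\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\Big|
&= \mathbb{E}_{\Phi_0}\Big[\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\,\mathbf{1}_{A^c}\Big] - \mathbb{E}_{\Phi_0}\Big[\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\,\mathbf{1}_{A}\Big] \\
&\le \mathbb{E}_{\Phi_0}\Big[\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\,\mathbf{1}_{A^c}\Big] - \mathbb{E}_{\Phi_0}\big[\log L_1(Z,\Phi)\,\mathbf{1}_{A}\big].
\end{aligned}
\tag{B.1}
\]
The goal is to show that the quantity in the right-hand side of this inequality is finite for a suitable choice of the neighborhood $G_\Phi$ of $\Phi$ in $E$. Let $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1\le s\le k}$ and $G_\Phi$ be a compact neighborhood of $\Phi$ such that $\phi \mapsto a(\phi)$ is continuous on $G_\Phi$ and
\[ M := \sup_{\phi\in G_\Phi}\max\Big\{1,\ \max_{j,s}\lambda_j^{(s)}(\phi)\Big\} < \infty. \]
Since the logarithm function is increasing, using Lemma 1 we deduce that
\[ \mathbb{E}_{\Phi_0}\Big[\sup_{\phi\in G_\Phi}\log L_1(Z,\phi)\,\mathbf{1}_{A^c}\Big] \le \log M + \Big[k + 2 + \sum_{s=1}^{k}\mathbb{E}_{\Phi_0}(W_s)\Big]\log(rM) < \infty \tag{B.2} \]
since $\mathbb{E}_{\Phi_0}(W_s) < \infty$ for all $s$. Furthermore, using Lemma 1, we have
\[ -\,\mathbb{E}_{\Phi_0}\big[\log L_1(Z,\Phi)\,\mathbf{1}_{A}\big] \le |\log m| + \Big[k + 2 + \sum_{s=1}^{k}\mathbb{E}_{\Phi_0}(W_s)\Big]\,|\log(rm)| < \infty \tag{B.3} \]
for some $m > 0$. Using together (B.1), (B.2) and (B.3) concludes the proof.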

The next result shows that for every $i_0 \in \{1,\ldots,r\}$, the survival function of the random time $\omega_1 = \min\{q \ge 1 \mid J(q) = i_0\}$ converges to 0 geometrically fast.

Lemma 3. For every $i_0 \in \{1,\ldots,r\}$, there exists a neighborhood $G$ of $\Phi_0$ and a constant $c \in (0,1)$ such that
\[ \forall k \in \mathbb{N}, \quad \sup_{\Phi\in G}\mathbb{P}^{i_0}_{\Phi}(\omega_1 > k) \le c^k. \]
In particular, $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1^k) < \infty$ for every $k \ge 1$.

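Proof of Lemma 3. The result is obviously true for $k = 0$. Pick $k \ge 1$ and note that
\[
\begin{aligned}
\mathbb{P}^{i_0}_{\Phi}(\omega_1 > k)
&= \sum_{j\neq i_0}\mathbb{P}^{i_0}_{\Phi}\big(J(1)\neq i_0, \ldots, J(k-2)\neq i_0,\ J(k-1)=j,\ J(k)\neq i_0\big) \\
&= \sum_{j\neq i_0}\mathbb{P}^{i_0}_{\Phi}\big(J(1)\neq i_0, \ldots, J(k-2)\neq i_0,\ J(k-1)=j\big)\,\mathbb{P}^{j}_{\Phi}\big(J(1)\neq i_0\big).
\end{aligned}
\tag{B.4}
\]
Set, for $i, j \in \{1,\ldots,r\}$, $P_{ij}(\Phi) := \mathbb{P}^{i}_{\Phi}(J(1)=j) = [\exp(L(\Phi))]_{ij}$. In particular, the maps $\Phi \mapsto P_{ij}(\Phi)$ are continuous. Moreover, since the Markov process $J$ is irreducible, it holds that $P_{ij}(\Phi) > 0$ for all $i$ and $j$. Consequently
\[ 0 < c = \max_{j\neq i_0}\sup_{\Phi\in G}\mathbb{P}^{j}_{\Phi}\big(J(1)\neq i_0\big) < 1. \]
Using (B.4) entails
\[ \sup_{\Phi\in G}\mathbb{P}^{i_0}_{\Phi}(\omega_1 > k) \le c\,\sup_{\Phi\in G}\mathbb{P}^{i_0}_{\Phi}(\omega_1 > k-1) \]
which gives the desired result by induction on $k$.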
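The geometric bound can be illustrated for a single parameter value (the lemma itself takes a supremum over a neighborhood $G$, which this sketch does not attempt). For a two-state chain with $L = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$, the matrix $\exp(L)$ has a well-known closed form, so the Python code below computes $\mathbb{P}^{i_0}(\omega_1 > k)$ exactly and compares it with $c^k$. The function names and the rates $a = 1$, $b = 0.5$ are hypothetical.

```python
import math

def two_state_skeleton(a, b):
    """One-step matrix P = exp(L) for L = [[-a, a], [b, -b]] (closed form)."""
    e = math.exp(-(a + b))
    s = a + b
    return [[(b + a * e) / s, a * (1 - e) / s],
            [b * (1 - e) / s, (a + b * e) / s]]

def survival(P, i0, kmax):
    """Return [P^{i0}(omega_1 > k) for k = 0..kmax] for the chain with matrix P."""
    others = [j for j in range(len(P)) if j != i0]
    # v[j] = probability of sitting in j at step k without visiting i0 at steps 1..k
    v = {j: P[i0][j] for j in others}
    out = [1.0, sum(v.values())]
    for _ in range(2, kmax + 1):
        v = {j: sum(v[i] * P[i][j] for i in others) for j in others}
        out.append(sum(v.values()))
    return out[:kmax + 1]

def contraction_constant(P, i0):
    """c = max over j != i0 of P^j(J(1) != i0), for one parameter value."""
    return max(1.0 - P[j][i0] for j in range(len(P)) if j != i0)

P = two_state_skeleton(1.0, 0.5)      # hypothetical rates
c = contraction_constant(P, 0)
surv = survival(P, 0, 20)
bound_holds = all(surv[k] <= c ** k + 1e-12 for k in range(21))
```

In this two-state case each further step multiplies the survival probability by exactly `c`, which is the recursion obtained from (B.4).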

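A key element in the proof of Theorem 2 is to show the following technical result:

Lemma 4. There exists a neighborhood $G$ of $\Phi_0$ in $E$ and positive constants $C, C'$ such that for any $i, j, \ell$:
\[ \sup_{\phi\in G}\max\Big\{\frac{1}{L_1(Z,\phi)},\ L_1(Z,\phi),\ \Big|\frac{\partial L_1}{\partial\phi_i}(Z,\phi)\Big|,\ \Big|\frac{\partial^2 L_1}{\partial\phi_i\partial\phi_j}(Z,\phi)\Big|,\ \Big|\frac{\partial^3 L_1}{\partial\phi_i\partial\phi_j\partial\phi_\ell}(Z,\phi)\Big|\Big\} \le C\exp\Big(C'\sum_{s=1}^{k}W_s\Big). \]
In particular, there exist (possibly different) positive constants $C, C'$ such that for any $i, j, \ell$:
\[ \sup_{\phi\in G}\max\Big\{|\log L_1(Z,\phi)|,\ \Big|\frac{\partial\log L_1}{\partial\phi_i}(Z,\phi)\Big|,\ \Big|\frac{\partial^2\log L_1}{\partial\phi_i\partial\phi_j}(Z,\phi)\Big|,\ \Big|\frac{\partial^3\log L_1}{\partial\phi_i\partial\phi_j\partial\phi_\ell}(Z,\phi)\Big|\Big\} \le C\exp\Big(C'\sum_{s=1}^{k}W_s\Big), \]
the right-hand side of the above inequality defines a random variable with finite $\mathbb{P}_{\Phi_0}$-moments and for all $i_1$, $i$ and $j$,
\[ \mathbb{E}^{i_1}_{\Phi_0}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\Big) < \infty. \]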

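Proof of Lemma 4. Assume that $G$ is a neighborhood of $\Phi_0$ such that $\phi \mapsto a(\phi)$ is continuous on $G$ and $\inf_{\phi\in G}\min_{j,s}\lambda_j^{(s)}(\phi) > 0$. If $K_s = [0, \tau_s - \tau_{s-1}]$ then for any $\phi \in G$, the ideas of the proof of Lemma 1 yield
\[ m := \inf_{\phi\in G}\min\Big\{\min_i a_i(\phi),\ \min_{i,j,s}\min_{y\in K_s} f_{ij}(y,s,\phi),\ \min_{i,j,s}\min_{y\in K_s} F_{ij}(y,s,\phi)\Big\} > 0. \]
For all $\phi \in G$ and every $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1\le s\le k}$, Lemma 1 entails
\[ L_1(z,\phi) \ge m\,(rm)^{k+2}\,(rm)^{w_1+\cdots+w_k}. \tag{B.5} \]
If moreover $G$ is a neighborhood of $\Phi_0$ with $\sup_{\phi\in G}\max_{i,j}\ell_{ij}(\phi) < \infty$ and $\sup_{\phi\in G}\max_{j,s}\lambda_j^{(s)}(\phi) < \infty$ then similarly
\[ M_1 := \sup_{\phi\in G}\max\Big\{\max_i a_i(\phi),\ \max_{i,j,s}\max_{y\in K_s} f_{ij}(y,s,\phi),\ \max_{i,j,s}\max_{y\in K_s} F_{ij}(y,s,\phi)\Big\} < \infty. \]
Let further
\[ M_2 = \sup_{\phi\in G}\max_{n,p,s}\max_{\ell}\max_{y\in K_s}\Big\{\Big|\frac{\partial f_{np}}{\partial\phi_\ell}(y,s,\phi)\Big|,\ \Big|\frac{\partial F_{np}}{\partial\phi_\ell}(y,s,\phi)\Big|\Big\}, \]
\[ M_3 = \sup_{\phi\in G}\max_{n,p,s}\max_{\ell,q}\max_{y\in K_s}\Big\{\Big|\frac{\partial^2 f_{np}}{\partial\phi_\ell\partial\phi_q}(y,s,\phi)\Big|,\ \Big|\frac{\partial^2 F_{np}}{\partial\phi_\ell\partial\phi_q}(y,s,\phi)\Big|\Big\} \]
and
\[ M_4 = \sup_{\phi\in G}\max_{n,p,s}\max_{\ell,q,r}\max_{y\in K_s}\Big\{\Big|\frac{\partial^3 f_{np}}{\partial\phi_\ell\partial\phi_q\partial\phi_r}(y,s,\phi)\Big|,\ \Big|\frac{\partial^3 F_{np}}{\partial\phi_\ell\partial\phi_q\partial\phi_r}(y,s,\phi)\Big|\Big\} \]
and note that $M = \max\{M_1, M_2, M_3, M_4\} < \infty$. Equation (2.1), inequality (B.5) and tedious computations may then be used to show that one may find positive constants $C, C'$ such that
\[ \sup_{\phi\in G}\max\Big\{\frac{1}{L_1(Z,\phi)},\ L_1(Z,\phi),\ \Big|\frac{\partial L_1}{\partial\phi_i}(Z,\phi)\Big|,\ \Big|\frac{\partial^2 L_1}{\partial\phi_i\partial\phi_j}(Z,\phi)\Big|,\ \Big|\frac{\partial^3 L_1}{\partial\phi_i\partial\phi_j\partial\phi_\ell}(Z,\phi)\Big|\Big\} \le C\exp\Big(C'\sum_{s=1}^{k}W_s\Big). \]
Consequently
\[ \sup_{\phi\in G}\max\Big\{|\log L_1(Z,\phi)|,\ \Big|\frac{\partial\log L_1}{\partial\phi_i}(Z,\phi)\Big|,\ \Big|\frac{\partial^2\log L_1}{\partial\phi_i\partial\phi_j}(Z,\phi)\Big|,\ \Big|\frac{\partial^3\log L_1}{\partial\phi_i\partial\phi_j\partial\phi_\ell}(Z,\phi)\Big|\Big\} \le C\exp\Big(C'\sum_{s=1}^{k}W_s\Big), \]
for (possibly different) positive constants $C, C'$. Finally, since for every $s \in \{1,\ldots,k\}$, $W_s$ is a Poisson distributed random variable, one has $\mathbb{E}_{\Phi_0}(x^{W_s}) < \infty$ for all $s$ and $x > 0$, which proves that the right-hand side of the inequality above defines a random variable having finite $\mathbb{P}_{\Phi_0}$-moments. The second part of the lemma is thus a straightforward consequence of the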


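The next result shall be used to check a couple of integrability conditions used in the proof of Theorem 2.

Lemma 5. Assume that $\psi$ is a Borel measurable nonnegative function. We consider the random variable $U = \sum_{q=1}^{\omega_1}\psi(Z_q)$.

• If $\mathbb{E}_{\Phi_0}(\psi(Z)) < \infty$ then for every $i_0 \in \{1,\ldots,r\}$, $\mathbb{E}^{i_0}_{\Phi_0}(U) < \infty$.

• If $\mathbb{E}_{\Phi_0}(\psi^2(Z)) < \infty$ then for every $i_0 \in \{1,\ldots,r\}$, $\mathbb{E}^{i_0}_{\Phi_0}(U^2) < \infty$.

Proof of Lemma 5. We start by remarking that for any $l \ge 1$, if $\mathbb{E}_{\Phi_0}(\psi^l(Z)) < \infty$ then for any $i_1$:
\[ \mathbb{E}^{i_1}_{\Phi_0}|\psi^l(Z)| \le \frac{\mathbb{E}_{\Phi_0}|\psi^l(Z)|}{a_{i_1}(\Phi_0)} < \infty \]
since $a_{i_1}(\Phi_0) > 0$. To prove the first statement, we then write
\[ \mathbb{E}^{i_0}_{\Phi_0}(U) = \sum_{N=1}^{\infty}\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{N}\psi(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\Big) = \sum_{q=1}^{\infty}\mathbb{E}^{i_0}_{\Phi_0}\big(\psi(Z_q)\,\mathbf{1}_{\{\omega_1\ge q\}}\big). \]
Since $\{\omega_1 \ge q\} = \bigcap_{l=0}^{q-1}\{J(l)\neq i_0\}$, the hidden Markov structure of $(Z_q)$ yields
\[ \mathbb{E}^{i_0}_{\Phi_0}(U) = \sum_{q=1}^{\infty}\sum_{i_1\neq i_0}\mathbb{E}^{i_1}_{\Phi_0}(\psi(Z))\,\mathbb{P}^{i_0}_{\Phi_0}(\omega_1\ge q,\ J(q)=i_1) \le \mathbb{E}^{i_0}_{\Phi_0}(\omega_1)\,\max_{i_1}\mathbb{E}^{i_1}_{\Phi_0}(\psi(Z)). \]
Since the Markov chain $(J(q))$ is irreducible on $\{1,\ldots,r\}$ and has stationary distribution $a(\Phi_0)$, one has $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1) = 1/a_{i_0}(\Phi_0) < \infty$, from which we deduce that the right-hand side is finite.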
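The identity $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1) = 1/a_{i_0}(\Phi_0)$ is the classical mean return time formula for an irreducible chain. As a sanity check, the Python sketch below computes both sides for a small chain: the stationary distribution by repeated multiplication, and the mean return time by summing the survival probabilities $\mathbb{P}^{i_0}(\omega_1 > k)$. The three-state transition matrix is a hypothetical stand-in for the skeleton chain $(J(q))$ and the function names are ours.

```python
def stationary(P, iters=2000):
    """Stationary distribution of a positive stochastic matrix by power iteration."""
    r = len(P)
    a = [1.0 / r] * r
    for _ in range(iters):
        a = [sum(a[i] * P[i][j] for i in range(r)) for j in range(r)]
    return a

def mean_return_time(P, i0, tol=1e-15):
    """E^{i0}(omega_1) computed as the sum over k >= 0 of P^{i0}(omega_1 > k)."""
    others = [j for j in range(len(P)) if j != i0]
    # v[j] = probability of sitting in j at step k without visiting i0 at steps 1..k
    v = {j: P[i0][j] for j in others}
    total = 1.0                      # term k = 0
    mass = sum(v.values())           # term k = 1
    while mass > tol:
        total += mass
        v = {j: sum(v[i] * P[i][j] for i in others) for j in others}
        mass = sum(v.values())
    return total

# Hypothetical one-step transition matrix of the skeleton chain (J(q)).
P = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
a = stationary(P)
gaps = [abs(mean_return_time(P, i0) - 1.0 / a[i0]) for i0 in range(3)]
```

The entries of `gaps` are at the level of floating-point rounding error, in line with the formula used in the proof.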

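We now turn to the second part of the lemma. Note that
\[
\begin{aligned}
\mathbb{E}^{i_0}_{\Phi_0}(U^2)
&= \sum_{N=1}^{\infty}\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{p,q=1}^{N}\psi(Z_p)\psi(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\Big) \\
&= \sum_{N=1}^{\infty}\sum_{q=1}^{N}\mathbb{E}^{i_0}_{\Phi_0}\big(\psi^2(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\big) + 2\sum_{N=1}^{\infty}\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{\substack{p,q=1\\ p<q}}^{N}\psi(Z_p)\psi(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\Big).
\end{aligned}
\]
We already know from the first statement of the lemma that
\[ \sum_{N=1}^{\infty}\sum_{q=1}^{N}\mathbb{E}^{i_0}_{\Phi_0}\big(\psi^2(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\big) = \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}\psi^2(Z_q)\Big) < \infty. \]
Further, for all Borel nonnegative functions $f_1, \ldots, f_N$ and all $i_1, \ldots, i_N$, the hidden Markov structure of $(Z_q)$ entails
\[
\begin{aligned}
\mathbb{E}^{i_0}_{\Phi_0}\big(f_1(Z_1)\cdots f_N(Z_N)\,\mathbf{1}_{\{J(1)=i_1,\ldots,J(N)=i_N\}}\big)
&= \Big[\prod_{q=1}^{N}\mathbb{E}^{i_0}_{\Phi_0}\big(f_q(Z_q) \mid J(q-1)=i_{q-1},\ J(q)=i_q\big)\Big]\,\mathbb{P}^{i_0}_{\Phi_0}\Big(\bigcap_{q=1}^{N}\{J(q)=i_q\}\Big) \\
&= \Big[\prod_{q=1}^{N}\mathbb{E}^{i_{q-1}}_{\Phi_0}\big(f_q(Z_1) \mid J(1)=i_q\big)\Big]\,\mathbb{P}^{i_0}_{\Phi_0}\Big(\bigcap_{q=1}^{N}\{J(q)=i_q\}\Big).
\end{aligned}
\]
Since we may write
\[ \{\omega_1 = N\} = \Big[\bigcup_{j_1,\ldots,j_{N-1}\neq i_0}\ \bigcap_{q=1}^{N-1}\{J(q)=j_q\}\Big] \cap \{J(N)=i_0\} \]
it is straightforward that
\[ \sum_{N=1}^{\infty}\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{\substack{p,q=1\\ p<q}}^{N}\psi(Z_p)\psi(Z_q)\,\mathbf{1}_{\{\omega_1=N\}}\Big) \le \Big[\max_{i_1,i_2}\mathbb{E}^{i_1}_{\Phi_0}\big(\psi(Z) \mid J(1)=i_2\big)\Big]^2\,\mathbb{E}^{i_0}_{\Phi_0}(\omega_1^2) \]
which is finite since $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1^2) < \infty$, see Lemma 3.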

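Lemma 6 is the key to the conclusion of the proof of Theorem 2.

Lemma 6. Assume that $\psi$ is a Borel measurable function such that $\mathbb{E}_{\Phi_0}|\psi(Z)| < \infty$. Then
\[ \mathbb{E}_{\Phi_0}(\psi(Z)) = a_{i_0}(\Phi_0)\,\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}\psi(Z_q)\Big). \]

Proof of Lemma 6. The ergodicity of the process $(Z_q)_{q\ge 1}$ entails
\[ \frac{1}{n}\sum_{q=1}^{n}\psi(Z_q) \to \mathbb{E}_{\Phi_0}(\psi(Z)) \quad\text{almost surely as } n\to\infty. \]
Besides, Lemma 5 yields $\mathbb{E}\big|\sum_{q=1}^{\omega_1}\psi(Z_q)\big| < \infty$, so that the regenerative properties of $(Z_q)$ and a law of large numbers similar to the one proposed in Theorem 2 in Rydén [24] give
\[ \frac{1}{n}\sum_{q=1}^{n}\psi(Z_q) \xrightarrow{\ \mathbb{P}\ } a_{i_0}(\Phi_0)\,\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}\psi(Z_q)\Big) \quad\text{as } n\to\infty \]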


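Lemma 7 below is the essential result needed to construct confidence regions for the estimator.

Lemma 7. If $\Phi_n \to \Phi_0$ as $n \to \infty$, one has $C(\Phi_n) \to C(\Phi_0)$ as $n \to \infty$.

Proof of Lemma 7. We shall show that if $\Phi_n \to \Phi_0$, then $A(\Phi_n) \to A(\Phi_0)$ as $n \to \infty$. Let $n$ be so large that $\Phi_n$ belongs to a (compact) neighborhood $G$ of $\Phi_0$ as in Lemma 4. Using the definition of $A$, one has:
\[
\begin{aligned}
|A_{ij}(\Phi_n) - A_{ij}(\Phi_0)|
&\le \Big|\mathbb{E}^{i_0}_{\Phi_n}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big) - \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big)\Big| \\
&\quad + \Big|\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big) - \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_0)h_j(Z_q,\Phi_0)\Big)\Big|.
\end{aligned}
\]
Applying Lemma 6 and differentiating under the expectation sign yields for any $\Phi \in G$
\[ \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi)h_j(Z_q,\Phi)\Big) = \frac{1}{a_{i_0}(\Phi_0)}\,\mathbb{E}_{\Phi_0}\Big(-\frac{\partial^2\log L_1}{\partial\phi_i\partial\phi_j}(Z,\Phi)\Big). \]
Lemma 4 and the dominated convergence theorem thus entail
\[ \Big|\mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big) - \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_0)h_j(Z_q,\Phi_0)\Big)\Big| \to 0 \quad\text{as } n\to\infty. \]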

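Afterwards, notice that for every $\Phi \in E$
\[ \mathbb{E}^{i_0}_{\Phi}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big) = \sum_{i_1\neq i_0}\sum_{q=1}^{\infty}\mathbb{E}^{i_1}_{\Phi}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big)\,\mathbb{P}^{i_0}_{\Phi}(\omega_1\ge q,\ J(q-1)=i_1) \]
so that
\[ \Big|\mathbb{E}^{i_0}_{\Phi_n}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big) - \mathbb{E}^{i_0}_{\Phi_0}\Big(\sum_{q=1}^{\omega_1}h_i(Z_q,\Phi_n)h_j(Z_q,\Phi_n)\Big)\Big| \le r_{1,n} + r_{2,n} \]
with
\[ r_{1,n} = C\sum_{\substack{i_1\neq i_0\\ q\ge 1}}\Big|\mathbb{P}^{i_0}_{\Phi_n}(\omega_1\ge q,\ J(q-1)=i_1) - \mathbb{P}^{i_0}_{\Phi_0}(\omega_1\ge q,\ J(q-1)=i_1)\Big| \]
and
\[ r_{2,n} = \max_{i_1}\Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big)\Big|\,\sum_{q\ge 1}\mathbb{P}^{i_0}_{\Phi_n}(\omega_1\ge q) \]
where we set
\[ C = \max_{i_1}\mathbb{E}^{i_1}_{\Phi_0}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\Big) \]
which is finite by Lemma 4. We start by controlling $r_{1,n}$. Set $P_{ij}(\Phi) = \mathbb{P}^{i}_{\Phi}(J(1)=j)$ as in the proof of Lemma 3. For all $q \ge 1$ and $j \neq i_0$,
\[
\begin{aligned}
\mathbb{P}^{i_0}_{\Phi_n}(\omega_1\ge q,\ J(q-1)=j)
&= \sum_{i_1,\ldots,i_{q-2}\neq i_0}P_{i_0,i_1}(\Phi_n)\cdots P_{i_{q-2},j}(\Phi_n) \\
&\to \sum_{i_1,\ldots,i_{q-2}\neq i_0}P_{i_0,i_1}(\Phi_0)\cdots P_{i_{q-2},j}(\Phi_0) \quad\text{as } n\to\infty \\
&= \mathbb{P}^{i_0}_{\Phi_0}(\omega_1\ge q,\ J(q-1)=j).
\end{aligned}
\]
Besides, Lemma 3 entails that there exists a constant $c \in (0,1)$ such that for all $j \neq i_0$ and $q \ge 1$:
\[ \sup_{n\ge 1}\Big|\mathbb{P}^{i_0}_{\Phi_n}(\omega_1\ge q,\ J(q-1)=j) - \mathbb{P}^{i_0}_{\Phi_0}(\omega_1\ge q,\ J(q-1)=j)\Big| \le 2c^{q-1} \]
so that the dominated convergence theorem gives $r_{1,n} \to 0$ as $n \to \infty$. To control $r_{2,n}$, use Lemma 3 to get for all $n$
\[ r_{2,n} \le \frac{1}{1-c}\,\max_{i_1}\Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big)\Big|. \]
To prove that the right-hand side of this inequality converges to 0 as $n \to \infty$, pick $p \ge 1$ and let $\|W\| = \sum_{s=1}^{k}W_s$ be the total number of events over one period. For any $i_1$, it holds that
\[
\begin{aligned}
&\Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\big)\Big| \\
&\quad\le \Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|\le p\}}\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|\le p\}}\big)\Big| \\
&\qquad + \mathbb{E}^{i_1}_{\Phi_n}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big) + \mathbb{E}^{i_1}_{\Phi_0}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big).
\end{aligned}
\]
Since the state space of $J$ is finite, it is enough to prove that the right-hand side of this inequality converges to 0 as $n \to \infty$. Using Lemma 4, we may find positive constants $C_1, C_2$ such that
\[
\begin{aligned}
&\mathbb{E}^{i_1}_{\Phi_n}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big) + \mathbb{E}^{i_1}_{\Phi_0}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big) \\
&\quad\le C_1\Big[\mathbb{E}^{i_1}_{\Phi_n}\big(\exp(C_2\|W\|)\,\mathbf{1}_{\{\|W\|>p\}}\big) + \mathbb{E}^{i_1}_{\Phi_0}\big(\exp(C_2\|W\|)\,\mathbf{1}_{\{\|W\|>p\}}\big)\Big].
\end{aligned}
\]
Pick now $\varepsilon > 0$. Since $\sup_{\phi\in G}\max_{i,j}\lambda_j^{(i)}(\phi) < \infty$, one has
\[ \sup_{\Phi\in G}\mathbb{E}^{i_1}_{\Phi}\big(C_3^{\|W\|}\exp(C_4\|W\|)\big) < \infty. \]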

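Consequently, one can choose $p \in \mathbb{N}\setminus\{0\}$ such that for every $n$
\[ \mathbb{E}^{i_1}_{\Phi_n}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big) + \mathbb{E}^{i_1}_{\Phi_0}\Big(\sup_{\phi\in G}|h_i(Z,\phi)h_j(Z,\phi)|\,\mathbf{1}_{\{\|W\|>p\}}\Big) \le 2\varepsilon/3. \]
It is then enough to prove that for all fixed $q \ge 0$,
\[ \Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|=q\}}\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|=q\}}\big)\Big| \to 0 \]
as $n \to \infty$. Note that $Z$ may be written $Z = (W, Y)$ where $W = (W_1, \ldots, W_k)$ and $Y$ is the list of inter-event times in year one. Therefore, if $L_1(w,y,\Phi_0 \mid i_1)$ is the likelihood of $(w,y)$ given that $J(0) = i_1$, we may write
\[
\begin{aligned}
&\Big|\mathbb{E}^{i_1}_{\Phi_n}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|=q\}}\big) - \mathbb{E}^{i_1}_{\Phi_0}\big(h_i(Z,\Phi_n)h_j(Z,\Phi_n)\,\mathbf{1}_{\{\|W\|=q\}}\big)\Big| \\
&\quad\le \sum_{\|w\|=q}\int\sup_{\phi\in G}|h_i(w,y,\phi)h_j(w,y,\phi)|\,\mathbf{1}_{\{\|w\|=q\}}\,\big|L_1(w,y,\Phi_n \mid i_1) - L_1(w,y,\Phi_0 \mid i_1)\big|\,dy
\end{aligned}
\]
where, for a given $w = (w_1, \ldots, w_k)$, the integral in the right-hand side is on the $y = (y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1\le s\le k}$ whose components are nonnegative and are such that for every $s \in \{1,\ldots,k\}$, $\sum_{l=1}^{w_s}y_l^{(s)} \le \tau_s - \tau_{s-1}$. Since $L_1$ is a continuous function of the parameters, the integrand goes to 0 everywhere as $n \to \infty$. Furthermore, this function is bounded from above by the function
\[ (w,y) \mapsto 2\sup_{\phi\in G}|h_i(w,y,\phi)h_j(w,y,\phi)|\,\mathbf{1}_{\{\|w\|=q\}}\,\sup_{\phi\in G}L_1(w,y,\phi \mid i_1). \]
By Lemma 4 and the irreducibility of $J$, there exists a positive constant $C_q$ such that
\[ 2\sup_{\phi\in G}|h_i(w,y,\phi)h_j(w,y,\phi)|\,\mathbf{1}_{\{\|w\|=q\}}\,\sup_{\phi\in G}L_1(w,y,\phi \mid i_1) \le C_q. \]
Finally, for every $t > 0$, the simplex
\[ \Big\{(y_1,\ldots,y_{w_s}) \ \Big|\ \sum_{i=1}^{w_s}y_i \le t \ \text{and}\ \forall i,\ y_i \ge 0\Big\} \]
is contained in the hypercube $\prod_{i=1}^{w_s}\{y_i \mid 0 \le y_i \le t\}$, whose volume is $t^{w_s} < \infty$. The dominated convergence theorem consequently gives $A(\Phi_n) \to A(\Phi_0)$ as $n \to \infty$. The convergence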


Table 1: Mean and median $L^1$-errors associated with the estimators in case 1.

Case 1                                  $\ell_{12}$  $\ell_{21}$  $\lambda_1^{(1)}$  $\lambda_2^{(1)}$  $\lambda_1^{(2)}$  $\lambda_2^{(2)}$

n = 50
Mean $L^1$-error                        0.303        0.683        0.381              0.866              0.774              2.506
Median $L^1$-error                      0.247        0.544        0.239              0.586              0.708              1.893
Mean half-length of the 95% ACI         0.818        1.740        0.791              2.177              1.611              5.380
Median half-length of the 95% ACI       0.766        0.837        0.886              2.082              1.570              5.184
Coverage probabilities of the 95% ACI   0.93         0.93         0.92               0.95               0.92               0.88

n = 100
Mean $L^1$-error                        0.307        0.547        0.248              0.657              0.487              1.629
Median $L^1$-error                      0.234        0.423        0.176              0.510              0.384              1.259
Mean half-length of the 95% ACI         0.632        1.208        0.561              1.529              1.194              3.704
Median half-length of the 95% ACI       0.613        1.097        0.551              1.460              1.155              3.582
Coverage probabilities of the 95% ACI   0.88         0.92         0.94               0.91               0.93               0.94


Table 4: Estimates of the parameters on our real data set.

Parameters           Estimates   95% ACI
$\ell_{12}$          0.0771      [−1.162, 1.316]
$\ell_{21}$          0.0230      [−0.345, 0.391]
$\lambda_1^{(1)}$    5.434       [2.329, 8.540]
$\lambda_2^{(1)}$    0.762       [0.142, 1.381]
$\lambda_1^{(2)}$    10.659      [6.567, 14.750]
$\lambda_2^{(2)}$    5.361       [3.945, 6.776]

[Fig. 1: horizontal axis from 0 to 45, vertical axis from 0 to 200.]

[Fig. 2: horizontal axis from 0 to 45, vertical axis from 1.0 to 2.0.]