HAL Id: hal-00965279
https://hal.archives-ouvertes.fr/hal-00965279v2
Submitted on 24 Feb 2017
Estimating the parameters of a seasonal
Markov-modulated Poisson process
Armelle Guillou, Stéphane Loisel, Gilles Stupfler
To cite this version:
Armelle Guillou, Stéphane Loisel, Gilles Stupfler. Estimating the parameters of a seasonal
Markov-modulated Poisson process. Statistical Methodology, Elsevier, 2015, 26, pp.103-123. ⟨hal-00965279v2⟩
Armelle Guillou (1), Stéphane Loisel (2) & Gilles Stupfler (3)
(1) Université de Strasbourg & CNRS, IRMA, UMR 7501, 7 rue René Descartes, 67084 Strasbourg Cedex, France
(2) Université Lyon 1, Institut de Science Financière et d’Assurances, 50 avenue Tony Garnier, 69007 Lyon, France
(3) Aix Marseille Université, CNRS, EHESS, Centrale Marseille, GREQAM UMR 7316, 13002 Marseille, France
Abstract. Motivated by seasonality and regime-switching features of some insurance claim counting processes, we study the statistical analysis of a Markov-modulated Poisson process featuring seasonality. We prove the strong consistency and the asymptotic normality of a maximum split-time likelihood estimator of the parameters of this model, and present an algorithm to compute it in practice. The method is illustrated on a small simulation study and a real data analysis.
Keywords: Markov-modulated Poisson process, seasonality, split-time likelihood, strong
consistency, asymptotic normality.
MSC 2010 Subject Classifications: Primary 62M05, 62F12; Secondary 60F05, 60F15.
1 Introduction
It is often the case that the insurance claim frequency is impacted by environment variables. For instance, flood risk is higher in a period of frequent heavy rains, and fire risk is more
intense when the weather is particularly dry. Such environment variables may be hidden to some extent to the practitioner: for instance, it is now accepted that the probabilities of severe floods in Australia, strong snowstorms in North America or hurricanes on the East Coast of the United States increase during La Niña episodes (see Neumann et al. [17], Cole and Pfaff [5], Parisi and Lund [20] and Landreneau [10]). This is now taken seriously by most reinsurers as well as Lloyd’s and the UK Met Office [13]. However, observing and understanding the role of those variables is not easy, which makes it realistic to consider these variables as unobserved so far.
To take such a dependency into account, one may for instance assume that the underlying environment process is a Markov process J in continuous time and that in each state of J, the claim counting process N is a Poisson process. The resulting bivariate process (J, N ) is then called a Markov-Modulated Poisson Process (MMPP). MMPPs have been used in different fields during the past forty years, in particular in data traffic systems (see e.g. Salvador et al. [27]), for ATM sources (see Kesidis et al. [9]), in manufacturing systems (see e.g. Ching et al. [4]) or even in ecology (see for example Skaug [29] for applications of MMPPs to clustered line transect data). The MMPP cookbook by Fischer and Meier-Hellstern [6] sums up the main results and ideas that were behind the rise of MMPP applications. The idea of considering
Markov modulation in insurance was first introduced by Asmussen [2]; the obtained model
can capture the fact that the insurance claim frequency may be modified if climatic, political or economic factors change. Such a model has gained considerable attention recently: see for instance Lu and Li [15], Ng and Yang [18], Zhu and Yang [33] and Wei et al. [32]. The parameters of an MMPP are often estimated using a Maximum Likelihood Estimator (MLE), whose consistency was proved in Rydén [22]. Various methods have been suggested to compute the MLE; a standard tool is the Expectation-Maximization (EM) algorithm, see Rydén [25] for the implementation of this procedure for the estimation of the parameters of an MMPP. We finally mention that in a recent paper, Guillou et al. [7] introduced a new MMPP-driven loss process in insurance with several lines of business, showed the strong consistency of the MLE and fitted their model to real sets of insurance data using an adaptation of the EM algorithm. Of course, once the MMPP model is fitted, it is possible to use Bayesian techniques to determine probabilities to be in each state and consequently the average number of events during the
next period (see e.g. Scott [28]). If external information is present, then it is possible to enrich this Bayesian estimation. This is for example the case for some long-tailed non-life insurance businesses, where indices of sectorial inflation can provide useful information. For reinsurance cycles, large claims that may cause a cycle phase change as well as other aspects of competition or adverse development of reserves can sometimes be plugged into the Bayesian estimation process. For some other risks however, even if we feel that a phenomenon might have an impact on the claim frequency, it may be very hard to come up with a measure of such a phenomenon (for instance, the El Niño-La Niña phenomena). In that case our non-Bayesian framework is of interest for actuarial risk assessment.
Furthermore, many examples of practical applications in insurance display some sort of seasonal variation. For example, thefts in garages are more frequent before Christmas as people tend to store Christmas gifts there, fire risk is more intense in the summer, and hurricanes occur mostly between June and November on the East Coast of the United States. These random, cyclic factors and their impact on insurance risk, which need to be taken into account to carry out a proper regime switching analysis, are yet to be understood and forecasted. In an inhomogeneous context with deterministic intensity function, Lu and Garrido [14] have fitted doubly periodic Poisson intensity rates to hurricane data, for particular parametric forms (like double-beta and sine-beta intensities). Helmers et al. [8] have provided an in-depth theoretical statistical analysis of such doubly periodic intensities. We aim at carrying out a theoretical statistical analysis in a stochastic intensity framework with seasonality.
An important aspect of pricing in non-life insurance concerns segmentation: thanks to generalized linear models or more sophisticated techniques, the insurer takes into account explanatory variables to adapt the price of the contract and avoid adverse selection (see Ohlsson and Johansson [19]). Besides, individual ratemaking is updated thanks to credibility adjustments in order to take into account the claim history of each policyholder or contract (see Bühlmann and Gisler [3]). Our approach is only operational and interesting at the aggregated risk management level, and it would be very challenging to try to combine it with regression techniques, from a theoretical point of view as well as from a practical point of view, since a very large number of data points would be needed to ensure that the estimation is reasonably accurate. The only simple way to combine the two approaches would be to assume that the seasonal and Markovian intensities are multiplied for each class of contract by a number that corresponds to the risk of each class and depends neither on the season nor on the state of the environment. In other words, the ratio between claim intensities for classes A and B, say, should remain constant over time. This is of course a strong limitation; in wind-related risk for instance, some buildings are more subject to damage coming from wind effects than others, and consequently the rise of their risk should be sharper during windy episodes.
In this paper, we thus consider an MMPP featuring seasonality, and study estimation issues for this process when the environment process is unobserved. A further motivation for the introduction of this type of process appears in the Solvency II insurance regulation framework. In this framework, it is required to carry out an ORSA (Own Risk and Solvency Assessment). The ORSA has both qualitative and quantitative components, including the continuous compliance requirement: the insurer must be able to provide evidence that he/she satisfies solvency requirements at all times (see Vedani and Ramaharobandro [30]). This leads insurers to define confidence regions or to use a continuous time model. The MMPP with seasonality is one interesting potential tool for this new challenge for some non-life insurance companies. A problem with this type of process, however, is that, contrary to the case without seasonality, the random sequence of the inter-event times is not ergodic. Studying the asymptotic properties of the MLE, as Rydén [22] and Guillou et al. [7] do, is thus very difficult; furthermore, the particular structure of the likelihood makes it hard to compute the MLE in practice. To tackle this issue, we borrow an idea of Rydén [23,24] and we introduce a Split-Time Likelihood (STL). The logarithm of this quantity is shown to be a sum of ergodic random variables; maximizing the STL then yields a Maximum Split-Time Likelihood Estimator (MSTLE) whose consistency and asymptotic normality can be proven using regenerative theory.
The outline of the paper is as follows. We give a precise definition of our model in Section 2, where we also introduce our estimator. The asymptotic properties are studied in Section 3. We explain how to implement our estimation technique in practice in Section 4, where we illustrate the performance of our estimator on a small simulation study and a real data analysis. Our paper ends with a conclusion and discussion of the results in Section 5. Technical lemmas are given and proven in the appendix.
2 Our model and estimation procedure
2.1 Our model
We consider an irreducible continuous-time Markov process $J$ with generator $L$ on the state space $\{1, \dots, r\}$, where $r \ge 2$. Consider further a counting process $N$ and real numbers $\tau_0 = 0 < \tau_1 < \cdots < \tau_{k-1} < 1 = \tau_k$ such that for every $q \in \mathbb{N}$, on the intervals $[q + \tau_{s-1}, q + \tau_s)$, if $J$ is in state $i$, then $N$ is a Poisson process with jump rate $\lambda_i^{(s)}$, where $\lambda_i^{(s)} \ge 0$. The time interval $[q + \tau_{s-1}, q + \tau_s)$ represents season $s$ of the period $q + 1$. The case $k = 1$, in which there is actually no seasonality, is just the standard MMPP, which was considered from a statistical point of view by Rydén [22]; in this paper, we shall focus on the case $k \ge 2$. The jump intensity of the counting process $N$ is thus modulated not only by the random switches of the Markov process $J$ but also by switches from one season to another, which happen at nonrandom times. The context of our work is the following: let us assume that the process $N$ has been observed until time $n \in \mathbb{N} \setminus \{0\}$, so that the available data consists of
1. the number r of states of J and the times τ1, . . . , τk−1,
2. the full knowledge of the process N between time 0 and time n.
The goal is to estimate the unknown parameters of the model, namely the elements $\ell_{ij}$ of the transition intensity matrix $L$ of $J$ and the jump intensities $\lambda_i^{(s)}$ of $N$. Since the modulating Markov process $J$ is not observed, estimating the parameters of this model is not straightforward. For the sake of brevity, we let $\Phi$ be the global parameter of the model: $\Phi$ consists of the values $\ell_{ij}$, for $1 \le i, j \le r$ and $i \ne j$, and of the $\lambda_i^{(s)}$, $1 \le i \le r$ and $1 \le s \le k$. The distribution of the process with parameter $\Phi$ is denoted by $P_\Phi$ and we let $E$ be the parameter space
$$E = \left\{ \Phi \;\middle|\; L(\Phi) \text{ is irreducible and } \min_{1 \le i \le r} \min_{1 \le s \le k} \lambda_i^{(s)}(\Phi) > 0 \right\}.$$
The space $E$ can be seen as the set of those parameters for which in any state of $J$ and in any season, an event, namely a jump of $N$, can occur with positive probability, while the irreducibility assumption makes sure that all states of $J$ are visited infinitely often. The model thus has $|E| = r(r - 1) + rk$ parameters. Finally, define, for $1 \le s \le k$ and $0 \le y \le \tau_s - \tau_{s-1}$, the matrix-valued functions
$$f(y, s, \Phi) = \exp\bigl(y(L(\Phi) - \Lambda^{(s)}(\Phi))\bigr)\,\Lambda^{(s)}(\Phi) \quad \text{and} \quad F(y, s, \Phi) = \exp\bigl(y(L(\Phi) - \Lambda^{(s)}(\Phi))\bigr),$$
where $\Lambda^{(s)}(\Phi) = \operatorname{diag}(\lambda_1^{(s)}(\Phi), \dots, \lambda_r^{(s)}(\Phi))$. The functions $f$ and $F$, which will be instrumental
in defining our procedure, have a nice probabilistic interpretation (see Meier-Hellstern [16]): for any $i, j \in \{1, \dots, r\}$,
$$f_{ij}(y, s, \Phi)\,dy := P(T_1^{(s)} \in dy,\ J(y) = j \mid J(0) = i) \quad \text{and} \quad F_{ij}(y, s, \Phi) := P(T_1^{(s)} > y,\ J(y) = j \mid J(0) = i),$$
where $T_1^{(s)}$ is the time of the first event of an MMPP whose Markov-modulating process is $J$, with jump intensities given by $\Lambda^{(s)}$. The functions $f(\cdot, s, \Phi)$ and $F(\cdot, s, \Phi)$ can thus be seen as the probability density and survival functions of the inter-event times for given states of $J$ in season $s$, respectively.
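As an illustration of these definitions, the following is a minimal pure-Python sketch that evaluates the matrices $F(y, s, \Phi)$ and $f(y, s, \Phi)$ for one season. The function names (`mat_mul`, `mat_exp`, `f_and_F`), the scaling-and-squaring scheme used for the matrix exponential and the numerical values are assumptions made for this example; they are not taken from the paper.

```python
import math

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=18):
    """exp(A) by scaling and squaring with a truncated Taylor series
    (an illustrative choice, adequate for small matrices)."""
    r = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    X = [[x / 2 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(r)] for i in range(r)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in mat_mul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(r)]
                  for i in range(r)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def f_and_F(y, L, lam_s):
    """Return (f, F) at elapsed time y for a season with intensities lam_s."""
    r = len(L)
    M = [[y * (L[i][j] - (lam_s[i] if i == j else 0.0)) for j in range(r)]
         for i in range(r)]
    F = mat_exp(M)                      # F = exp(y (L - Lambda))
    f = [[F[i][j] * lam_s[j] for j in range(r)] for i in range(r)]  # F Lambda
    return f, F

# Hypothetical two-state example (values chosen for illustration only).
L = [[-1.0, 1.0], [2.0, -2.0]]
f, F = f_and_F(0.4, L, [1.0, 5.0])
```

Each row sum of `F` is the probability that no event has occurred by time `y` given the starting state, so it should lie between 0 and 1.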
2.2 A Maximum Split-Time Likelihood Estimator
It is impossible to apply any known results on MMPPs here since the jump intensity in a given state of $J$ changes as time goes by; in particular, given that $J$ is in state $j$, $N$ is an inhomogeneous Poisson process. To overcome this issue, we introduce some notation: let $W_{q,s}$ be the number of jumps of $N$ during season $s$ of the period $q$, and let $T_1^{(q,s)}, \dots, T_{W_{q,s}}^{(q,s)}$ be the successive jump times of $N$ during this season. Let $Y_l^{(q,s)} = T_l^{(q,s)} - T_{l-1}^{(q,s)}$ be the inter-event times in season $s$ (with $T_0^{(q,s)} = (q - 1) + \tau_{s-1}$). It is assumed that the starting distribution of $J$ is its unique stationary distribution $a(\Phi)$ on $\{1, \dots, r\}$, that is, the only row vector $a(\Phi)$ such that $a(\Phi)L(\Phi) = 0$ and the sum of the entries of $a(\Phi)$ is equal to 1.
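The stationary distribution can be obtained numerically by solving the linear system $aL = 0$ together with the normalization constraint. The sketch below does this with plain Gaussian elimination; the function name `stationary_distribution` and the example generator are assumptions for illustration, and the code presumes that the generator is irreducible so that the system has a unique solution.

```python
def stationary_distribution(L):
    """Solve a L = 0 with sum(a) = 1 for an irreducible generator L."""
    r = len(L)
    # Balance equations for the first r-1 columns, plus the normalization.
    A = [[L[i][j] for i in range(r)] for j in range(r - 1)] + [[1.0] * r]
    b = [0.0] * (r - 1) + [1.0]
    # Gaussian elimination with partial pivoting.
    for col in range(r):
        piv = max(range(col, r), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, r):
            factor = A[row][col] / A[col][col]
            for c in range(col, r):
                A[row][c] -= factor * A[col][c]
            b[row] -= factor * b[col]
    # Back substitution.
    a = [0.0] * r
    for row in reversed(range(r)):
        tail = sum(A[row][c] * a[c] for c in range(row + 1, r))
        a[row] = (b[row] - tail) / A[row][row]
    return a

# Hypothetical two-state generator: the result is (l21, l12) / (l12 + l21).
a = stationary_distribution([[-1.0, 1.0], [2.0, -2.0]])
```

For two states this reduces to the closed form $a = (\ell_{21}, \ell_{12})/(\ell_{12} + \ell_{21})$.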
Let $Z_q = (W_{q,s}, Y_1^{(q,s)}, \dots, Y_{W_{q,s}}^{(q,s)})_{1 \le s \le k}$ be the random vector representing the information available for period number $q$. With this notation, the process $(Z_q)_{q \ge 1}$ is stationary because $J$ is a stationary Markov process. Besides, given the states of the irreducible Markov chain $(J(q))_{q \in \mathbb{N}}$, the random variables $Z_q$, $q \ge 1$, are independent, so that arguing along the lines of the proof of Lemma 1 in Leroux [12] shows that the process $(Z_q)_{q \ge 1}$ is ergodic. We denote by $Z = (W_s, Y_1^{(s)}, \dots, Y_{W_s}^{(s)})_{1 \le s \le k}$ a random vector which shares the distribution of the $Z_q$, $q \ge 1$.
Let then $L_1(Z, \Phi)$ be the likelihood of the observations over one period, computed under the parameter $\Phi$: if $z = (w_s, y_1^{(s)}, \dots, y_{w_s}^{(s)})_{1 \le s \le k}$,
$$L_1(z, \Phi) = a(\Phi) \left[ \prod_{s=1}^{k} \left\{ \left( \prod_{l=1}^{w_s} f(y_l^{(s)}, s, \Phi) \right) F\!\left( \tau_s - \tau_{s-1} - \sum_{l=1}^{w_s} y_l^{(s)}, s, \Phi \right) \right\} \right] \mathbf{1}_r,$$
where $\mathbf{1}_r$ is the column vector of size $r$ having all entries equal to 1. This quantity can be expanded as:
$$L_1(z, \Phi) = \sum_{i_0, \dots, i_k} a_{i_0}(\Phi) \prod_{s=1}^{k} \left[ e'_{i_{s-1}} \left( \prod_{l=1}^{w_s} f(y_l^{(s)}, s, \Phi) \right) \times F\!\left( \tau_s - \tau_{s-1} - \sum_{l=1}^{w_s} y_l^{(s)}, s, \Phi \right) e_{i_s} \right] \qquad (2.1)$$
where $e_j$ is the column vector of size $r$ having all entries equal to 0 except the $j$th which is equal to 1, and $e'_j$ is the transpose of $e_j$. Following Rydén [24], we let
$$\mathrm{STL}_n(\Phi) = \prod_{q=1}^{n} L_1(Z_q, \Phi)$$
be the split-time likelihood (STL) of the observations. If the random variables $Z_q$, $q \ge 1$, were independent, the STL would be the total likelihood of the observations. While these variables are actually not independent, the sequence $(Z_q)_{q \ge 1}$ possesses regenerative properties, which implies in particular that it possesses independence properties given a suitable sequence of increasing random times; this is the key idea of the proofs of our main results (see the proof of Theorem 2). An MSTL is then any parameter that maximizes the STL, or equivalently the log-STL
$$\log \mathrm{STL}_n(\Phi) = \sum_{q=1}^{n} \log L_1(Z_q, \Phi),$$
and an MSTLE is any estimator of an MSTL.
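To make the structure of the one-period likelihood and of the log-STL concrete, here is a minimal pure-Python sketch. It multiplies the stationary row vector by the matrices $f$ and $F$ season by season, then sums the logarithms over periods. All function names, the data layout (`z[s]` is the list of inter-event times observed in season `s + 1`, with 0-based indices) and the numerical values are assumptions for this example, not the authors' implementation.

```python
import math

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=18):
    """exp(A) by scaling and squaring with a truncated Taylor series."""
    r = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    X = [[x / 2 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(r)] for i in range(r)]
    term = [row[:] for row in result]
    for m in range(1, terms + 1):
        term = [[x / m for x in row] for row in mat_mul(term, X)]
        result = [[result[i][j] + term[i][j] for j in range(r)]
                  for i in range(r)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def F_matrix(y, s, L, lam):
    """F(y, s) = exp(y (L - Lambda^(s))), with lam[s][i] the intensities."""
    r = len(L)
    return mat_exp([[y * (L[i][j] - (lam[s][i] if i == j else 0.0))
                     for j in range(r)] for i in range(r)])

def one_period_likelihood(z, a, L, lam, tau):
    """L_1(z) = a [prod_s (prod_l f(y_l, s)) F(remaining time, s)] 1_r."""
    r = len(L)
    row = [list(a)]                                  # 1 x r row vector
    for s, gaps in enumerate(z):
        for y in gaps:
            Fy = F_matrix(y, s, L, lam)
            fy = [[Fy[i][j] * lam[s][j] for j in range(r)] for i in range(r)]
            row = mat_mul(row, fy)
        remaining = tau[s + 1] - tau[s] - sum(gaps)  # time after last event
        row = mat_mul(row, F_matrix(remaining, s, L, lam))
    return sum(row[0])

def log_stl(periods, a, L, lam, tau):
    """Log split-time likelihood: sum of one-period log-likelihoods."""
    return sum(math.log(one_period_likelihood(z, a, L, lam, tau))
               for z in periods)

# Hypothetical example with two states and two seasons of equal length.
L = [[-1.0, 1.0], [2.0, -2.0]]
a = [2.0 / 3.0, 1.0 / 3.0]          # stationary distribution of L
lam = [[1.0, 5.0], [5.0, 25.0]]
tau = [0.0, 0.5, 1.0]
value = log_stl([[[0.1, 0.2], [0.05, 0.3]], [[], [0.25]]], a, L, lam, tau)
```

For periods containing many events the product of matrices may underflow; rescaling the row vector after each multiplication and accumulating the logarithms of the scaling factors is a common remedy, which is omitted here for brevity.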
3 Asymptotic results
We shall write $\Phi \sim \Phi'$ whenever the distributions of $Z$ under $P_\Phi$ and $P_{\Phi'}$ agree. The characterization of the equivalence classes of $E$ modulo $\sim$ is a difficult question; for the case of an MMPP, this was discussed in Rydén [26]. In the special case when there is a season for which the jump intensities of $N$ are distinct, which shall be the case we consider in our simulation study and real data analysis, this problem does however possess a simple solution:
Proposition 1. Let $\Phi \in E$. Assume that for some $s \in \{1, \dots, k\}$, the matrix $\Lambda^{(s)}(\Phi)$ possesses distinct diagonal elements. Then the equivalence class of $\Phi$ modulo $\sim$ reduces to those parameters $\Phi'$ obtained by permutation of the states of the underlying Markov process $J$.
Proof of Proposition 1. Let $\Phi'$ be such that $\Phi \sim \Phi'$ and $(J, N)$ (resp. $(J', N')$) have distribution $P_\Phi$ (resp. $P_{\Phi'}$). By stationarity, the restriction of the process $(J, N)$ (resp. $(J', N')$) to the time interval $[\tau_{s-1}, \tau_s]$ actually has the same distribution as the restriction of an MMPP $(\mathcal{J}, \mathcal{N})$ (resp. $(\mathcal{J}', \mathcal{N}')$) with parameters $(L(\Phi), \Lambda^{(s)}(\Phi))$ (resp. $(L(\Phi'), \Lambda^{(s)}(\Phi'))$) to the time interval $[0, \tau_s - \tau_{s-1}]$. Using again the stationarity property of the underlying Markov process, it is then enough to show that these two processes are deduced from one another (in the distributional sense) by a permutation of the states. Applying Corollary 1 in Rydén [26] completes the proof.
Roughly speaking, in this case, the strong identifiability constraint on season s carries over, in some sense, to the whole process by the stationarity property of J. Back to the general case, the following consistency result holds:
Theorem 1. Let $\Phi_0 \in E$ be the true value of the parameter and $\overline{\Phi}_0$ be the equivalence class of $\Phi_0$ modulo $\sim$. Let $K$ be a compact subset of $E$ such that $\Phi_0 \in K$ and $\widehat{\Phi}_n$ be the MSTLE for $\Phi_0$ on $K$, computed over $n$ periods. Then if $O \subset K$ is an open set containing $K \cap \overline{\Phi}_0$, one has $\widehat{\Phi}_n \in O$ almost surely for $n$ large enough.
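Proof of Theorem 1. The proof is similar to that of Theorem 1 in Rydén [24]: let $\Phi \in E$ be such that $\Phi \not\sim \Phi_0$ and $G_\Phi$ be a neighborhood of $\Phi$ as in Lemma 2. If $B(\Phi, 1/q)$ denotes the Euclidean open ball in $E$ with center $\Phi$ and radius $1/q$, the continuity of the map $\varphi \mapsto L_1(Z, \varphi)$ yields
$$\sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log L_1(Z, \varphi) \to \log L_1(Z, \Phi) \quad \text{as } q \to \infty.$$
Noticing that
$$\left| \sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log L_1(Z, \varphi) \right| \le \left| \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \right| + \left| \log L_1(Z, \Phi) \right|,$$
the dominated convergence theorem implies
$$\mathbb{E}_{\Phi_0}\left[ \sup_{\varphi \in G_\Phi \cap B(\Phi, 1/q)} \log L_1(Z, \varphi) \right] \to \mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi)] \quad \text{as } q \to \infty. \qquad (3.1)$$
Since $\Phi \not\sim \Phi_0$, the information inequality (see Rao [21]) gives
$$\mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi)] + 2\varepsilon < \mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi_0)] \qquad (3.2)$$
for some $\varepsilon > 0$. It is thus a consequence of (3.1) and (3.2) that there exists a (possibly different) neighborhood $G_\Phi$ of $\Phi$ with
$$\mathbb{E}_{\Phi_0}\left[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \right] \le \mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi_0)] - \varepsilon. \qquad (3.3)$$
Besides, since $(Z_q)_{q \ge 1}$ is ergodic,
$$\frac{1}{n} \sup_{\varphi \in G_\Phi} \log \mathrm{STL}_n(\varphi) \le \frac{1}{n} \sum_{q=1}^{n} \sup_{\varphi \in G_\Phi} \log L_1(Z_q, \varphi) \to \mathbb{E}_{\Phi_0}\left[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \right]$$
and
$$\frac{1}{n} \log \mathrm{STL}_n(\Phi_0) = \frac{1}{n} \sum_{q=1}^{n} \log L_1(Z_q, \Phi_0) \to \mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi_0)]$$
almost surely as $n \to \infty$. Hence
$$\limsup_{n \to \infty} \frac{1}{n} \sup_{\varphi \in G_\Phi} \log \mathrm{STL}_n(\varphi) \le \mathbb{E}_{\Phi_0}\left[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \right] \le \mathbb{E}_{\Phi_0}[\log L_1(Z, \Phi_0)] - \varepsilon$$
almost surely, by (3.3). Finally, remark that the compact set $O^c \cap K$, where $O^c$ is the complement of $O$, may be covered by a finite number of such neighborhoods $G_{\Phi_i}$, $1 \le i \le d$; this yields
$$\sup_{\varphi \in O^c \cap K} \{\log \mathrm{STL}_n(\varphi) - \log \mathrm{STL}_n(\Phi_0)\} \le \max_{1 \le i \le d} \left\{ \sup_{\varphi \in G_{\Phi_i}} \log \mathrm{STL}_n(\varphi) - \log \mathrm{STL}_n(\Phi_0) \right\}$$
which tends to $-\infty$ almost surely as $n \to \infty$. As a consequence, necessarily $\widehat{\Phi}_n \in O$ for $n$ large enough.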
We now wish to obtain an asymptotic normality result for our estimator. In what follows, we pick $i_0 \in \{1, \dots, r\}$ and we let $\omega_k$ be the successive times when the Markov chain $(J(q))$ reaches $i_0$:
$$\omega_1 = \min\{q \ge 1 \mid J(q) = i_0\} \quad \text{and} \quad \forall k \ge 1,\ \omega_{k+1} = \min\{q > \omega_k \mid J(q) = i_0\}.$$
Let further $P_\Phi^{i_0}(\cdot) = P_\Phi(\cdot \mid J(0) = i_0)$ be the probability measure deduced from $P_\Phi$ given that $J$ starts at $i_0$.
Note that for every $\Phi \in E$, since $L(\Phi)$ is the generator of an irreducible continuous-time Markov chain on a finite state space, 0 is an eigenvalue of the transpose $L'(\Phi)$ of $L(\Phi)$ with multiplicity 1 and related normalized eigenvector $a'(\Phi)$. Since the map $\varphi \mapsto L'(\varphi)$ is infinitely continuously differentiable in a neighborhood of $\Phi$, a straightforward extension of Theorem 8 in Chapter 9 of Lax [11] shows that the map $\varphi \mapsto a'(\varphi)$ is infinitely continuously differentiable in a neighborhood of $\Phi$. The function $\varphi \mapsto \log L_1(Z, \varphi)$ is thus infinitely continuously differentiable in a neighborhood of the true parameter $\Phi_0$; if $\mathbb{E}_\Phi^{i_0}$ denotes the expectation under the measure $P_\Phi^{i_0}$, we can set for $\Phi$ close enough to $\Phi_0$
$$h_i(z, \Phi) = \frac{\partial \log L_1}{\partial \varphi_i}(z, \Phi), \quad A_{ij}(\Phi) = \mathbb{E}_\Phi^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi) h_j(Z_q, \Phi) \right), \quad V_{ij}(\Phi) = \mathbb{E}_\Phi^{i_0}\left( \sum_{p,q=1}^{\omega_1} h_i(Z_p, \Phi) h_j(Z_q, \Phi) \right).$$
We assume that the matrix $A(\Phi) = (A_{ij}(\Phi))$ is invertible for $\Phi = \Phi_0$ and we let $V(\Phi) = (V_{ij}(\Phi))$, $C(\Phi) = [a_{i_0}(\Phi)]^{-1} A^{-1}(\Phi) V(\Phi) A^{-1}(\Phi)$ for $\Phi$ in a neighborhood of $\Phi_0$.
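Our asymptotic normality result for $\widehat{\Phi}_n$ is as follows:
Theorem 2. Let $\Phi_0 \in E$ be the true value of the parameter and $\overline{\Phi}_0$ be the equivalence class of $\Phi_0$ modulo $\sim$. Assume that $\Phi_0$ lies in the interior of $E$ and let $K$ be a compact subset of $E$, whose interior contains $\Phi_0$, such that $K \cap \overline{\Phi}_0 = \{\Phi_0\}$. Let $\widehat{\Phi}_n$ be the MSTLE computed on $K$ over $n$ periods. Then
$$\sqrt{n}\,(\widehat{\Phi}_n - \Phi_0) \xrightarrow{d} \mathcal{N}(0, C(\Phi_0)) \quad \text{as } n \to \infty.$$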
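Proof of Theorem 2. We start as in the proof of Theorem 4 in Rydén [24]. Condition $K \cap \overline{\Phi}_0 = \{\Phi_0\}$ ensures that $\widehat{\Phi}_n \to \Phi_0$ almost surely as $n \to \infty$. In particular, with probability 1, one has $\widehat{\Phi}_n \in K$ for $n$ large enough by Theorem 1. For such $n$, a Taylor expansion of the $i$-th partial derivative of $\Phi \mapsto \log \mathrm{STL}_n(\Phi)$ at $\Phi_0$ implies that
$$T_{1,n} := \frac{1}{\sqrt{n}} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) = T_{2,n} + T_{3,n} \qquad (3.4)$$
where
$$T_{2,n} := -\sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \left[ \frac{1}{n} \sum_{q=1}^{n} \frac{\partial^2 \log L_1}{\partial \varphi_j \partial \varphi_i}(Z_q, \Phi_0) \right],$$
$$T_{3,n} := -\frac{1}{2} \sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \left[ \sum_{k=1}^{|E|} (\widehat{\Phi}_{n,k} - \Phi_{0,k}) \frac{1}{n} \sum_{q=1}^{n} \frac{\partial^3 \log L_1}{\partial \varphi_k \partial \varphi_j \partial \varphi_i}(Z_q, \widetilde{\Phi}_n) \right],$$
with $\widetilde{\Phi}_n$ some point on the line connecting $\Phi_0$ and $\widehat{\Phi}_n$. We start by dealing with the right-hand side of equality (3.4). To this end, use Lemma 4 and the fact that the process $(Z_q)_{q \ge 1}$ is ergodic to obtain
$$T_{2,n} = -\sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, \mathbb{E}_{\Phi_0}\left[ \frac{\partial^2 \log L_1}{\partial \varphi_j \partial \varphi_i}(Z, \Phi_0) \right] (1 + o_P(1)).$$
Moreover, since
$$\frac{\partial^2 \log L_1}{\partial \varphi_j \partial \varphi_i} = \frac{1}{L_1} \frac{\partial^2 L_1}{\partial \varphi_j \partial \varphi_i} - \frac{1}{L_1^2} \frac{\partial L_1}{\partial \varphi_i} \frac{\partial L_1}{\partial \varphi_j},$$
it is a consequence of Lemma 4 and of a differentiation under the expectation sign that
$$T_{2,n} = \sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, \mathbb{E}_{\Phi_0}(h_i(Z, \Phi_0) h_j(Z, \Phi_0)) (1 + o_P(1)) = a_{i_0}(\Phi_0) \sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, \mathbb{E}_{\Phi_0}^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) h_j(Z_q, \Phi_0) \right) (1 + o_P(1)), \qquad (3.5)$$
by Lemma 6. Besides, Lemma 4, the ergodicity of $(Z_q)_{q \ge 1}$ and the consistency of $\widehat{\Phi}_n$ entail
$$T_{3,n} = o_P\left( \sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j}) \right) \qquad (3.6)$$
as $n \to \infty$. Collecting (3.5) and (3.6), we obtain
$$T_{2,n} + T_{3,n} = a_{i_0}(\Phi_0) \sum_{j=1}^{|E|} \sqrt{n}\,(\widehat{\Phi}_{n,j} - \Phi_{0,j})\, \mathbb{E}_{\Phi_0}^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) h_j(Z_q, \Phi_0) \right) (1 + o_P(1)) \qquad (3.7)$$
as $n \to \infty$. We now examine the asymptotic properties of $T_{1,n}$: notice that $(Z_q)_{q \ge 1}$ is regenerative with associated cycle lengths $(C_q = \omega_q - \omega_{q-1})_{q \ge 1}$ (where we set $\omega_0 = 0$ for convenience), that is:
• for each $l \ge 2$, the random process $(C_{l+1+q}, Z_{\omega_l + q})_{q \ge 1}$ is independent of the $\omega_j$, $1 \le j \le l - 1$, and its distribution does not depend on $l$;
• given $(\omega_q)_{q \ge 1}$, for every $q \ge 1$, the random vectors $Z_j$, $\omega_{q-1} \le j \le \omega_q - 1$, are independent.
We may thus apply a law of large numbers similar to the one proposed in Theorem 2 in Rydén [24] (see also Asmussen [1]): this leads to
$$\frac{1}{\sqrt{n}} T_{1,n} = \frac{1}{n} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) \xrightarrow{P} a_{i_0}(\Phi_0)\, \mathbb{E}_{\Phi_0}^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) \quad \text{as } n \to \infty.$$
Besides, thanks to Lemma 4, we may once again differentiate under the integral sign to obtain $\mathbb{E}_{\Phi_0}(h_i(Z, \Phi_0)) = 0$. Lemma 4 and the ergodicity of $(Z_q)_{q \ge 1}$ thus yield
$$\frac{1}{\sqrt{n}} T_{1,n} = \frac{1}{n} \sum_{q=1}^{n} h_i(Z_q, \Phi_0) \to 0 \quad \text{almost surely as } n \to \infty.$$
Combining these last two convergences entails
$$\mathbb{E}_{\Phi_0}^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) = 0.$$
Finally $\mathbb{E}_{\Phi_0}^{i_0}\left[ \left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right)^2 \right] < \infty$, see Lemmas 4 and 5; applying a central limit theorem similar to the one proposed in Theorem 3 in Rydén [24] then gives
$$T_{1,n} = \frac{1}{\sqrt{n}} \sum_{q=1}^{n} \left[ h_i(Z_q, \Phi_0) - a_{i_0}(\Phi_0)\, \mathbb{E}_{\Phi_0}^{i_0}\left( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) \right) \right] \xrightarrow{d} \mathcal{N}(0, a_{i_0}(\Phi_0) V(\Phi_0)) \quad \text{as } n \to \infty. \qquad (3.8)$$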
The interesting point about MSTLEs is that confidence regions for $\Phi_0$ may be computed fairly easily, see Lemma 7: it is enough to compute the MSTLE $\widehat{\Phi}_n$ with a numerical method, to generate random copies of the process under the law $P_{\widehat{\Phi}_n}$ and to apply the strong law of large numbers to get an approximation of $C(\widehat{\Phi}_n)$ and hence of $C(\Phi_0)$.
We close this section by noting that Lemma 6 makes it possible to give further insight into the non-singularity of $A(\Phi_0)$. If $A(\Phi_0)$ were singular then, by Lemma 6, we would have
$$\exists j \in \{1, \dots, |E|\},\ \forall i \in \{1, \dots, |E|\}, \quad \mathbb{E}_{\Phi_0}(h_i(Z, \Phi_0) h_j(Z, \Phi_0)) = \sum_{\substack{\ell = 1 \\ \ell \ne j}}^{|E|} \gamma_\ell\, \mathbb{E}_{\Phi_0}(h_i(Z, \Phi_0) h_\ell(Z, \Phi_0))$$
where the $\gamma_\ell$ are real constants. It is then straightforward that
$$\mathbb{E}_{\Phi_0}\left| h_j(Z, \Phi_0) - \sum_{\substack{\ell = 1 \\ \ell \ne j}}^{|E|} \gamma_\ell h_\ell(Z, \Phi_0) \right|^2 = 0$$
so that the partial derivatives of $\log L_1(\cdot, \Phi_0)$ are linearly dependent. The structure of the log-likelihood (over one period) being highly non-linear, one may therefore expect that $A(\Phi_0)$ will be invertible in the vast majority of cases.
4 Simulation study and real data analysis
4.1 Practical implementation
The MSTLE is typically computed using a (quasi-)Newton algorithm. Such an algorithm being iterative, we give a method to start it in practice. Let $w_{q,s}$ be the number of jumps of $N$ during season $s$ of the period $q$, $t_1^{(q,s)}, \dots, t_{w_{q,s}}^{(q,s)}$ be the successive jump times of $N$ during this season and let
$$u_l^{(q,s)} = t_l^{(q,s)} - \tau_{s-1} - (q - 1)(1 - (\tau_s - \tau_{s-1})), \quad 1 \le l \le w_{q,s}.$$
For all $1 \le s \le k$, we treat the data $D_s = (u_l^{(q,s)})_{q,l}$ as if it were a sample of a univariate MMPP with parameters $\Phi^{(s)} = (L, \lambda_1^{(s)}, \dots, \lambda_r^{(s)})$. An EM algorithm is then used to provide a first estimate of $\Phi^{(s)}$; the initial estimation procedure is a straightforward adaptation of the one in Guillou et al. [7].
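The time shift defining the $u_l^{(q,s)}$ glues together the successive occurrences of season $s$ on a single time axis. A minimal sketch of this pooling step is given below; the function name `pooled_season_times` and the example event times are assumptions for illustration, and the EM step itself is not reproduced.

```python
def pooled_season_times(event_times, tau, s):
    """Map event times falling in season s (1-based) onto one time axis.

    tau is the list [tau_0, ..., tau_k]; the shifted time is
    u = t - tau_{s-1} - (q - 1) * (1 - (tau_s - tau_{s-1})),
    where q is the (1-based) period containing t.
    """
    length = tau[s] - tau[s - 1]
    pooled = []
    for t in event_times:
        q = int(t) + 1                       # period number of the event
        if tau[s - 1] <= t - (q - 1) < tau[s]:
            pooled.append(t - tau[s - 1] - (q - 1) * (1 - length))
    return pooled

# Hypothetical event times over three periods, two seasons of equal length.
tau = [0.0, 0.5, 1.0]
events = [0.25, 0.75, 1.75, 2.1]
season_1 = pooled_season_times(events, tau, 1)
season_2 = pooled_season_times(events, tau, 2)
```

With these values, the events at times 0.75 and 1.75 both fall in season 2 and are mapped to 0.25 and 0.75, so that the pooled sample lives on consecutive intervals of length $\tau_2 - \tau_1$.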
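Let $(\widetilde{L}^{(s)}, \widetilde{\lambda}_1^{(s)}, \dots, \widetilde{\lambda}_r^{(s)})$ be the EM estimates obtained with this procedure. The EM estimates obtained for different seasons need not label the states of $J$ in the same order. If $M = (m_{ij})$ is a square matrix having size $r$ and $\sigma$ is a permutation of $\{1, \dots, r\}$, we let $M \circ \sigma = (m_{\sigma(i)\sigma(j)})$. Let, for such permutations $\sigma_1, \dots, \sigma_k$,
$$\widetilde{L}^{(\sigma_1, \dots, \sigma_k)} = \frac{1}{k} \sum_{s=1}^{k} \widetilde{L}^{(s)} \circ \sigma_s \quad \text{and} \quad \widetilde{\Phi}^{(\sigma_1, \dots, \sigma_k)} = \left( \widetilde{L}^{(\sigma_1, \dots, \sigma_k)}, (\widetilde{\lambda}_{\sigma_s(1)}^{(s)}, \dots, \widetilde{\lambda}_{\sigma_s(r)}^{(s)})_{1 \le s \le k} \right).$$
We then run the optimization algorithm starting from each of these values: in doing so, we obtain estimates $\widehat{\Phi}^{(\sigma_1, \dots, \sigma_k)}$ of the parameters. Finally, we compute
$$(\sigma_1^0, \dots, \sigma_k^0) = \arg\max_{\sigma_1, \dots, \sigma_k} \mathrm{STL}_n(\widehat{\Phi}^{(\sigma_1, \dots, \sigma_k)})$$
and our MSTLE is $\widehat{\Phi} = \widehat{\Phi}^{(\sigma_1^0, \dots, \sigma_k^0)}$.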
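The construction of the starting values can be sketched as follows: every combination of per-season permutations yields one averaged generator and one relabelled set of intensities. The function names and the numerical EM outputs below are hypothetical; states are indexed from 0 in the code, and the subsequent optimization runs and the final comparison of STL values are left out.

```python
from itertools import permutations, product

def permute_matrix(M, sigma):
    """Return M o sigma = (m_{sigma(i) sigma(j)})."""
    r = len(M)
    return [[M[sigma[i]][sigma[j]] for j in range(r)] for i in range(r)]

def starting_values(L_tilde, lam_tilde):
    """Enumerate starting points, one per k-tuple of state permutations.

    L_tilde[s]   : generator estimate obtained from season s + 1
    lam_tilde[s] : intensity estimates obtained from season s + 1
    """
    k, r = len(L_tilde), len(L_tilde[0])
    starts = []
    for sigmas in product(permutations(range(r)), repeat=k):
        permuted = [permute_matrix(L_tilde[s], sigmas[s]) for s in range(k)]
        L_avg = [[sum(P[i][j] for P in permuted) / k for j in range(r)]
                 for i in range(r)]
        lam = [[lam_tilde[s][sigmas[s][i]] for i in range(r)]
               for s in range(k)]
        starts.append((sigmas, L_avg, lam))
    return starts

# Hypothetical EM outputs in which season 2 labels the states in reverse.
L_tilde = [[[-1.0, 1.0], [2.0, -2.0]], [[-3.0, 3.0], [1.0, -1.0]]]
lam_tilde = [[1.0, 5.0], [25.0, 5.0]]
starts = starting_values(L_tilde, lam_tilde)
```

The number of starting points is $(r!)^k$, which stays small for the values of $r$ and $k$ considered in this paper but grows quickly otherwise.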
4.2 Simulation study
In this section, we examine how our estimator behaves in a finite-sample situation. We are motivated by climate-related statistical problems such as the impact of the El Niño-La Niña cycle on hurricane risk. We choose reasonable parameters for a simple illustration, but cannot claim that those parameters are estimated on a real data set, simply because it is very hard to understand the impact of this effect on hurricane frequency. We consider the situation $r = 2$, that is, the underlying Markov process $J$ is a two-state continuous-time Markov process. We further choose $k = 2$ and $\tau_1 = 1/2$, so that any period is divided into two seasons of equal length. The following cases are considered:
• Case 1: $\ell_{12} = 1$, $\ell_{21} = 2$, $\lambda_1^{(1)} = 1$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5$, $\lambda_2^{(2)} = 25$.
• Case 2: $\ell_{12} = 3$, $\ell_{21} = 10$, $\lambda_1^{(1)} = 1$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5$, $\lambda_2^{(2)} = 25$.
• Case 3: $\ell_{12} = 1$, $\ell_{21} = 2$, $\lambda_1^{(1)} = 1/2$, $\lambda_2^{(1)} = 5$, $\lambda_1^{(2)} = 5/2$, $\lambda_2^{(2)} = 25$.
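For readers who wish to reproduce such an experiment, a trajectory of the seasonal MMPP can be generated with competing exponential clocks, restarting the clocks at each season boundary (which is legitimate by the memoryless property). The sketch below uses the Case 1 parameters; the function name `simulate_seasonal_mmpp`, the seed and the 0-based indexing are assumptions for this example, not a description of the code used by the authors.

```python
import random

def simulate_seasonal_mmpp(L, lam, tau, n, a, rng):
    """Simulate the event times of N on [0, n).

    L   : generator of J (rows sum to 0, irreducible)
    lam : lam[s][i] = jump rate of N in season s + 1 when J is in state i
    tau : season boundaries [tau_0 = 0, ..., tau_k = 1]
    a   : initial distribution of J (here its stationary distribution)
    """
    r, k = len(L), len(tau) - 1
    state = rng.choices(range(r), weights=a)[0]
    events = []
    for q in range(n):                       # period q + 1
        for s in range(k):                   # season s + 1
            t, end = q + tau[s], q + tau[s + 1]
            while True:
                total = -L[state][state] + lam[s][state]
                t += rng.expovariate(total)  # next jump of J or N
                if t >= end:
                    break                    # season over, clocks restart
                if rng.random() * total < lam[s][state]:
                    events.append(t)         # event of N
                else:                        # transition of J
                    weights = [L[state][j] if j != state else 0.0
                               for j in range(r)]
                    state = rng.choices(range(r), weights=weights)[0]
    return events

# Case 1 parameters, simulated over n = 50 periods.
L = [[-1.0, 1.0], [2.0, -2.0]]
lam = [[1.0, 5.0], [5.0, 25.0]]
tau = [0.0, 0.5, 1.0]
a = [2.0 / 3.0, 1.0 / 3.0]
events = simulate_seasonal_mmpp(L, lam, tau, 50, a, random.Random(0))
```

Under Case 1 the stationary mean number of events per period is $\tfrac12 \cdot \tfrac73 + \tfrac12 \cdot \tfrac{35}{3} = 7$, so a run over 50 periods should contain a few hundred events.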
For each of these cases, we consider the situations $n = 50$ and $n = 100$. Our estimation procedure is carried out on $S = 100$ replications of the considered process. In each case, we compute the mean and median $L^1$-error related to each parameter. We also computed confidence intervals for the parameters; we display the mean and median half-length of these intervals as well as the estimated coverage probabilities in Tables 1–3.
As intuition suggests, the jump intensities are estimated more accurately when the ratio of the claim frequencies in different states is further from 1. This is because identifying the hidden states of J is easier when both regimes of the process are further apart, and thus more information on the process can be recovered in such cases. Besides, estimating the elements of L is harder than estimating the jump intensities since J is unobserved, which is confirmed by our results. Coverage probabilities are satisfactory overall; although they are fairly low in case 2 for n = 50, they improve to reasonable levels for n = 100.
From this simulation study, one can see that in practice, if an insurance company had around 50 or 100 years of data, which would be an ideal situation, for intensities like the ones we chose (as far as the El Niño-La Niña-related problem on hurricane risk is concerned, 5 times as many hurricanes during the hurricane season as in the other one, and 5 times as many hurricanes during a bad El Niño-La Niña phase as in a favorable one), it is feasible to estimate the parameters with an average relative error of around 10 to 20%. Of course, we have taken here a quite favorable case, as we have only two states of the environment and two seasons with fixed intensities. Considering 5 or more states and 4 seasons would require a period of observation too long for the approach to be reasonable. However, in most hidden Markov models in insurance and finance, the number of states is 2 or 3. In the near future, one could probably refine these estimations thanks to partial observations of the environment process, as climatologists understand better and better the El Niño-La Niña phenomenon and its impact on insurance perils.
4.3 Real data analysis
In this section, we focus on extreme climate events (large bushfires, cyclones, floods, hail, storms, tornados, tsunamis) experienced by Australia from March 1967 to March 2012. During this timeframe, 193 such extreme events were observed. Their frequency is largely influenced by a seasonality effect; the first “season” we consider is fall-winter, which in Australia roughly corresponds to the end of March until the end of September, and the second one is spring-summer, namely from the beginning of October to mid-March. Moreover, extreme climate
events tend to be influenced by external effects such as the El Niño-La Niña phenomenon or even global phenomena like climate change, which motivates the use of our technique on this data set. Our unit of time is the standard calendar year; our data is represented in Fig. 1 and our estimates are given in Table 4. A reconstruction of the underlying process J, carried out by a straightforward adaptation of the Viterbi algorithm (see Viterbi [31]) to our case, is given in Fig. 2.
The algorithm seems to identify an underlying process J with very small transition rates. A consequence of this is that the confidence intervals for the elements of L are poor. This is not the case however for the jump intensities, and the algorithm confirms that in a given state of J, the intensities in both seasons are significantly different, thus emphasizing the importance of taking into account the seasonality effect. Furthermore, the frequency of extreme climate events is significantly higher in the spring-summer season, which was expected at least as far as bushfires, cyclones and storms were concerned. We underline that, due to the fact that there are very few transitions of J, the Markov assumption on this process is questionable; nevertheless, as already observed in Guillou et al. [7] on real sets of non-life and life insurance data, the algorithm correctly matches a point where the slope of N sharply increases with a breakpoint in the data set where the hidden process switches from state 2 to state 1, thus leading to higher frequencies of jumps of the counting process. We thus believe that our approach is superior to a naive model using seasonal versions of several different Poisson processes since such an approach would not be helpful in identifying potential unknown breakpoints in the global structure of the model. It is also superior to the use of a classical MMPP approach as in Guillou et al. [7] which would not be able to handle the seasonal components of the data.
5 Conclusion and discussion
In this paper, we introduce an MMPP featuring seasonality for which we discuss estimation issues when the environment process is unobserved. After establishing the main asymptotic properties of the resulting estimator, we illustrate its performance on a small simulation study in which we observe that the number of states is crucial. We conclude our numerical section with a real data analysis of extreme climate events in Australia, on which our methodology
seems to perform well. In the real data study, the Markov assumption is of course questionable, as well as the absence of impact of climate change on the parameters. Extending the number of states in order to get a Markov process is feasible in theory, but would lead to estimation problems that would not be tractable in practice. Another problem, when considering climate-related problems such as the motivating hurricane risk example, is that in different states of the environment, the seasons may be longer or shorter, as conditions are more or less met for events to happen. This is not considered here and would require further theoretical analysis; linked to this concern, if one had further details about the events and access to meteorological data, it would of course be relevant to distinguish between categories of extreme events and to carry out a more detailed analysis. Having said that, even if ENSO values were available, their measure would not be precise enough and their link with claim intensity and frequency would still be too loose to use credibility adjustments at this stage, although it is likely to be possible in the near future. Finally, note that we focus here on frequency of extreme events, but of course for the insurer the claim amounts are important. Nevertheless, our approach accounts for periods of time when there are significantly more events, which is very important for reinsurance and risk management purposes.
Acknowledgments
The authors thank the associate editor and two anonymous referees for their helpful comments which led to significant improvements of this article.
References
[1] S. Asmussen, Applied probability and queues, Wiley, New York, 1987.
[2] S. Asmussen, Risk theory in a Markovian environment, Scand. Actuar. J. 2 (1989) 69–100.
[3] H. Bühlmann, A. Gisler, A course in credibility theory and its applications, Springer,
[4] W.K. Ching, R.H. Chan, X.Y. Zhou, Circulant preconditioners for Markov-modulated Poisson processes and their applications to manufacturing systems, SIAM J. Matrix Anal. Appl. 18 (1997) 464–481.
[5] J.D. Cole, S.R. Pfaff, A climatology of tropical cyclones affecting the Texas coast during El Niño/non-El Niño years: 1990-1996, Technical Attachment SR/SSD 97-37, National Weather Service Office, Corpus Christi, Texas, available at http://www.srh.noaa.gov/topics/attach/html/ssd97-37.htm (1997).
[6] W. Fischer, K.S. Meier-Hellstern, The Markov-modulated Poisson process (MMPP) cookbook, Perf. Eval. 18 (1993) 149–171.
[7] A. Guillou, S. Loisel, G. Stupfler, Estimation of the parameters of a Markov-modulated loss process in insurance, Insurance Math. Econom. 53 (2013) 388–404.
[8] R. Helmers, I.W. Mangku, R. Zitikis, A non-parametric estimator for the doubly periodic Poisson intensity function, Stat. Methodol. 4 (2007) 481–492.
[9] G. Kesidis, J. Walrand, C.-S. Chang, Effective bandwidths for multiclass Markov fluids and other ATM sources, IEEE/ACM Trans. Netw. 1 (1993) 424–428.
[10] D. Landreneau, Atlantic tropical storms and hurricanes affecting the United States: 1899-2000, NOAA Technical Memorandum NWS SR-206 (updated through 2002), National Weather Service Office, Lake Charles, Louisiana, available at http://www.srh.noaa.gov/lch/?n=tropical (2001).
[11] P.D. Lax, Linear Algebra and its applications, Wiley, New York, 2007.
[12] B.G. Leroux, Maximum-likelihood estimation for hidden Markov models, Stochastic Process. Appl. 40 (1992) 127–143.
[13] Lloyd’s Forecasting risk, The value of long-range forecasting for the insurance industry, joint report with the UK Met Office, 2010.
[14] Y. Lu, J. Garrido, Doubly periodic non-homogeneous models for hurricane data, Stat. Methodol. 2 (2005) 17–35.
[15] Y. Lu, S. Li, On the probability of ruin in a Markov-modulated risk model, Insurance Math. Econom. 37 (2005) 522–532.
[16] K.S. Meier-Hellstern, A fitting algorithm for Markov-modulated Poisson processes having two arrival rates, European J. Oper. Res. 29 (1987) 970–977.
[17] C.J. Neumann, B.R. Jarvinen, C.J. McAdie, J.D. Elms, Tropical Cyclones of the North Atlantic Ocean, 1871-1992, Historical Climatology Series 6-2, National Climatic Data Center, Asheville, North Carolina, 1993.
[18] A. Ng, H. Yang, On the joint distribution of surplus prior and immediately after ruin under a Markovian regime switching model, Stochastic Process. Appl. 116 (2006) 244–266.
[19] E. Ohlsson, B. Johansson, Non-life insurance pricing with generalized linear models, Springer, 2010.
[20] F. Parisi, R. Lund, Seasonality and return periods of landfalling Atlantic basin hurricanes, Aust. N. Z. J. Stat. 42 (2000) 271–282.
[21] C.R. Rao, Linear statistical inference and its applications, Wiley, New York, 1973.
[22] T. Rydén, Parameter estimation for Markov modulated Poisson processes, Comm. Statist. Stochastic Models 10 (1994) 795–829.
[23] T. Rydén, Consistent and asymptotically normal parameter estimates for hidden Markov models, Ann. Statist. 22 (1994) 1884–1895.
[24] T. Rydén, Consistent and asymptotically normal parameter estimates for Markov modulated Poisson processes, Scand. J. Stat. 22 (1995) 295–303.
[25] T. Rydén, An EM algorithm for estimation in Markov-modulated Poisson processes, Comput. Statist. Data Anal. 21 (1996) 431–447.
[26] T. Rydén, On identifiability and order of continuous-time aggregated Markov chains, Markov-modulated Poisson processes, and phase-type distributions, J. Appl. Probab. 33 (1996) 640–653.
[27] P. Salvador, R. Valadas, A. Pacheco, Multiscale fitting procedure using Markov modulated Poisson processes, Telecommun. Syst. 23 (2003) 123–148.
[28] S.L. Scott, Bayesian analysis of a two-state Markov modulated Poisson process, J. Comput. Graph. Statist. 8 (1999) 662–670.
[29] H.J. Skaug, Markov modulated Poisson processes for clustered line transect data, Environ. Ecol. Stat. 13 (2006) 199–211.
[30] J. Vedani, F. Ramaharobandro, Continuous compliance: a proxy-based monitoring framework, preprint, available at https://hal.archives-ouvertes.fr/hal-00866531 (2013).
[31] A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inform. Theory IT-13 (1967) 260–269.
[32] J. Wei, H. Yang, R. Wang, On the Markov-modulated insurance risk model with tax, Blätter der DGVFM 31 (2010) 65–78.
[33] J. Zhu, H. Yang, Ruin theory for a Markov regime-switching model under a threshold dividend strategy, Insurance Math. Econom. 42 (2008) 311–318.
Appendix: technical lemmas and proofs
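The first result is a necessary step to obtain the strong consistency of our estimator.
Lemma 1. For every $\varphi \in E$ and $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1 \le s \le k}$, one has
\[ m (rm)^{k+2} (rm)^{w_1 + \cdots + w_k} \le L_1(z, \varphi) \le M (rM)^{k+2} (rM)^{w_1 + \cdots + w_k} \]
where $M := \max\{1, \max_{j,s} \lambda_j^{(s)}(\varphi)\} < \infty$ and, if $K_s := [0, \tau_s - \tau_{s-1}]$,
\[ m := \min\Big\{ \min_i a_i(\varphi),\ \min_{i,j,s} \min_{y \in K_s} f_{ij}(y, s, \Phi),\ \min_{i,j,s} \min_{y \in K_s} F_{ij}(y, s, \Phi) \Big\} > 0. \]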
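Proof of Lemma 1. Start by remarking that
\[ F_{\alpha\beta}(y, s, \varphi) = \big[\exp\big(y(L(\varphi) - \Lambda^{(s)}(\varphi))\big)\big]_{\alpha\beta} = \mathbb{P}\big(J(y) = \beta,\ N^{(s)}(y) = 0 \mid J(0) = \alpha\big) \le 1 \]
when $(J, N^{(s)})$ is an MMPP with transition intensity matrix $L(\varphi)$ and jump intensity matrix $\Lambda^{(s)}(\varphi)$, see Meier-Hellstern [16]. Therefore, for every $y \ge 0$, one has $F_{\alpha\beta}(y, s, \varphi) \le M$ and $f_{\alpha\beta}(y, s, \varphi) \le M$. Using (2.1), we immediately obtain
\[ L_1(z, \varphi) \le M (rM)^{k+2} (rM)^{w_1 + \cdots + w_k}. \]
Besides, the compactness of $K_s$ and the continuity of the maps involved entail
\[ \min_{i,j,s} \min_{y \in K_s} f_{ij}(y, s, \Phi) > 0 \quad \text{and} \quad \min_{i,j,s} \min_{y \in K_s} F_{ij}(y, s, \Phi) > 0 \]
so that $m > 0$. Hence, using (2.1), the inequality
\[ L_1(z, \varphi) \ge m (rm)^{k+2} (rm)^{w_1 + \cdots + w_k} \]
holds, which completes the proof.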
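The bound $F_{\alpha\beta}(y, s, \varphi) \le 1$ used at the start of the proof can be checked numerically on a toy example. The sketch below is a minimal illustration, assuming a two-state model with made-up values for $L$ and $\Lambda^{(s)}$ (the constants `L_HYP` and `LAMBDA_HYP` and all function names are hypothetical). It computes the matrix exponential with a plain truncated Taylor series combined with scaling and squaring, which is adequate for small matrices but is not the method used in the paper.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=30, squarings=10):
    """Matrix exponential: Taylor series of exp(a / 2**squarings), then squaring."""
    n = len(a)
    scale = 2.0 ** squarings
    small = [[x / scale for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes small**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def no_event_matrix(y, generator, intensities):
    """F(y) = exp(y (L - Lambda)): entry (a, b) is P(J(y) = b, no jump on [0, y] | J(0) = a)."""
    n = len(generator)
    exponent = [[y * (generator[i][j] - (intensities[i] if i == j else 0.0))
                 for j in range(n)] for i in range(n)]
    return mat_exp(exponent)


# Hypothetical two-state example (values are NOT taken from the paper).
L_HYP = [[-0.5, 0.5], [0.2, -0.2]]   # transition intensity matrix of J
LAMBDA_HYP = [5.0, 1.0]              # diagonal of the jump intensity matrix for one season

for y in (0.1, 0.5, 1.0):
    F = no_event_matrix(y, L_HYP, LAMBDA_HYP)
    print(y, [[round(x, 4) for x in row] for row in F])
```

Every entry printed lies between 0 and 1 and each row sums to less than 1, as expected for the probability of moving between states without any jump of the counting process.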
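A second pivotal tool is the following technical lemma:
Lemma 2. For every $\Phi \in E$, there exists a neighborhood $G_\Phi$ of $\Phi$ in $E$ such that
\[ \mathbb{E}_{\Phi_0} \Big| \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \Big| < \infty. \]
Proof of Lemma 2. Let $G_\Phi$ be a neighborhood of $\Phi$, $A = \big\{ \sup_{\varphi \in G_\Phi} L_1(Z, \varphi) \le 1 \big\}$ and write
\[ \mathbb{E}_{\Phi_0} \Big| \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \Big| = \mathbb{E}_{\Phi_0} \Big[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \mathbf{1}_{A^c} \Big] - \mathbb{E}_{\Phi_0} \Big[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \mathbf{1}_{A} \Big] \le \mathbb{E}_{\Phi_0} \Big[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \mathbf{1}_{A^c} \Big] - \mathbb{E}_{\Phi_0} \big[ \log L_1(Z, \Phi) \mathbf{1}_{A} \big]. \tag{B.1} \]
The goal is to show that the quantity in the right-hand side of this inequality is finite for a suitable choice of the neighborhood $G_\Phi$ of $\Phi$ in $E$. Let $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1 \le s \le k}$ and $G_\Phi$ be a compact neighborhood of $\Phi$ such that $\varphi \mapsto a(\varphi)$ is continuous on $G_\Phi$ and $M := \sup_{\varphi \in G_\Phi} \max\big\{1, \max_{j,s} \lambda_j^{(s)}(\varphi)\big\} < \infty$. Since the logarithm function is increasing, using Lemma 1 we deduce that
\[ \mathbb{E}_{\Phi_0} \Big[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \mathbf{1}_{A^c} \Big] \le \log M + \Big[ k + 2 + \sum_{s=1}^{k} \mathbb{E}_{\Phi_0}(W_s) \Big] \log(rM) < \infty \tag{B.2} \]
since $\mathbb{E}_{\Phi_0}(W_s) < \infty$ for all $s$. Furthermore, using Lemma 1, we have
\[ - \mathbb{E}_{\Phi_0} \big[ \log L_1(Z, \Phi) \mathbf{1}_{A} \big] \le |\log m| + \Big[ k + 2 + \sum_{s=1}^{k} \mathbb{E}_{\Phi_0}(W_s) \Big] |\log(rm)| < \infty \tag{B.3} \]
for some $m > 0$. Using together (B.1), (B.2) and (B.3) concludes the proof.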
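The step from Lemma 1 to (B.2) can be spelled out as follows. This is only a sketch of the intermediate computation, written under the assumption that $rM \ge 1$ (which holds as soon as $r \ge 1$, since $M \ge 1$ by definition); it is not part of the original proof.

```latex
% Taking logarithms in the upper bound of Lemma 1, for every \varphi in G_\Phi:
\[
  \log L_1(z, \varphi)
  \le \log\!\Big( M \, (rM)^{k+2} \, (rM)^{w_1 + \cdots + w_k} \Big)
  = \log M + \Big( k + 2 + \sum_{s=1}^{k} w_s \Big) \log(rM).
\]
% The right-hand side is nonnegative and does not depend on \varphi, so it also
% bounds the supremum over G_\Phi multiplied by the indicator of A^c.
% Replacing z by Z and taking expectations under \Phi_0 gives
\[
  \mathbb{E}_{\Phi_0}\Big[ \sup_{\varphi \in G_\Phi} \log L_1(Z, \varphi) \mathbf{1}_{A^c} \Big]
  \le \log M + \Big[ k + 2 + \sum_{s=1}^{k} \mathbb{E}_{\Phi_0}(W_s) \Big] \log(rM).
\]
```

The lower bound (B.3) follows in the same way from the left-hand inequality of Lemma 1, with $m$ in place of $M$ and absolute values around the logarithms.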
The next result shows that for every $i_0 \in \{1, \ldots, r\}$, the survival function of the random time $\omega_1 = \min\{q \ge 1 \mid J(q) = i_0\}$ converges to 0 geometrically fast.
Lemma 3. For every $i_0 \in \{1, \ldots, r\}$, there exists a neighborhood $G$ of $\Phi_0$ and a constant $c \in (0, 1)$ such that
\[ \forall k \in \mathbb{N}, \quad \sup_{\Phi \in G} \mathbb{P}^{i_0}_{\Phi}(\omega_1 > k) \le c^k. \]
In particular, $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1^k) < \infty$ for every $k \ge 1$.
Proof of Lemma 3. The result is obviously true for $k = 0$. Pick $k \ge 1$ and note that
\[ \mathbb{P}^{i_0}_{\Phi}(\omega_1 > k) = \sum_{j \ne i_0} \mathbb{P}^{i_0}_{\Phi}\big(J(1) \ne i_0, \ldots, J(k-2) \ne i_0,\ J(k-1) = j,\ J(k) \ne i_0\big) = \sum_{j \ne i_0} \mathbb{P}^{i_0}_{\Phi}\big(J(1) \ne i_0, \ldots, J(k-2) \ne i_0,\ J(k-1) = j\big)\, \mathbb{P}^{j}_{\Phi}(J(1) \ne i_0). \tag{B.4} \]
Set, for $i, j \in \{1, \ldots, r\}$, $P_{ij}(\Phi) := \mathbb{P}^{i}_{\Phi}(J(1) = j) = [\exp(L(\Phi))]_{ij}$. In particular, the maps $\Phi \mapsto P_{ij}(\Phi)$ are continuous. Moreover, since the Markov process $J$ is irreducible, it holds that $P_{ij}(\Phi) > 0$ for all $i$ and $j$. Consequently
\[ 0 < c = \max_{j \ne i_0} \sup_{\Phi \in G} \mathbb{P}^{j}_{\Phi}(J(1) \ne i_0) < 1. \]
Using (B.4) entails
\[ \sup_{\Phi \in G} \mathbb{P}^{i_0}_{\Phi}(\omega_1 > k) \le c \sup_{\Phi \in G} \mathbb{P}^{i_0}_{\Phi}(\omega_1 > k - 1) \]
which gives the desired result by induction on $k$.
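The geometric decay in Lemma 3 can be illustrated on a two-state chain, for which exp(L) has a well-known closed form. The sketch below is a hedged illustration for a single parameter value (the lemma itself takes a supremum over a neighborhood G, which is not reproduced here); the transition rates and function names are hypothetical.

```python
import math


def one_step_matrix(l12, l21):
    """Closed form of exp(L) for the two-state generator L = [[-l12, l12], [l21, -l21]]."""
    total = l12 + l21
    decay = math.exp(-total)
    return [
        [(l21 + l12 * decay) / total, l12 * (1.0 - decay) / total],
        [l21 * (1.0 - decay) / total, (l12 + l21 * decay) / total],
    ]


def survival_of_return_time(p, i0, k):
    """P^{i0}(omega_1 > k): the chain sampled at integer times avoids i0 at times 1..k."""
    if k == 0:
        return 1.0
    j = 1 - i0  # the only state different from i0
    return p[i0][j] * p[j][j] ** (k - 1)


# Hypothetical transition rates (not taken from the paper).
P_HYP = one_step_matrix(0.5, 0.2)
I0 = 0
C = P_HYP[1 - I0][1 - I0]  # P^j(J(1) != i0) for the single state j != i0

for k in range(6):
    print(k, round(survival_of_return_time(P_HYP, I0, k), 6), round(C ** k, 6))
```

In each printed row the survival probability stays below `C ** k`, which is the single-parameter analogue of the bound stated in the lemma.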
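A key element in the proof of Theorem 2 is to show the following technical result:
Lemma 4. There exists a neighborhood $G$ of $\Phi_0$ in $E$ and positive constants $C, C'$ such that for any $i, j, \ell$:
\[ \sup_{\varphi \in G} \max\Big\{ \frac{1}{L_1(Z, \varphi)},\ L_1(Z, \varphi),\ \Big| \frac{\partial L_1}{\partial \varphi_i}(Z, \varphi) \Big|,\ \Big| \frac{\partial^2 L_1}{\partial \varphi_i \partial \varphi_j}(Z, \varphi) \Big|,\ \Big| \frac{\partial^3 L_1}{\partial \varphi_i \partial \varphi_j \partial \varphi_\ell}(Z, \varphi) \Big| \Big\} \le C \exp\Big( C' \sum_{s=1}^{k} W_s \Big). \]
Especially, there exist (possibly different) positive constants $C, C'$ such that for any $i, j, \ell$:
\[ \sup_{\varphi \in G} \max\Big\{ |\log L_1(Z, \varphi)|,\ \Big| \frac{\partial \log L_1}{\partial \varphi_i}(Z, \varphi) \Big|,\ \Big| \frac{\partial^2 \log L_1}{\partial \varphi_i \partial \varphi_j}(Z, \varphi) \Big|,\ \Big| \frac{\partial^3 \log L_1}{\partial \varphi_i \partial \varphi_j \partial \varphi_\ell}(Z, \varphi) \Big| \Big\} \le C \exp\Big( C' \sum_{s=1}^{k} W_s \Big), \]
the right-hand side of the above inequality defines a random variable with finite $\mathbb{P}_{\Phi_0}$-moments and for all $i_1$, $i$ and $j$,
\[ \mathbb{E}^{i_1}_{\Phi_0} \Big[ \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \Big] < \infty. \]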
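Proof of Lemma 4. Assume that $G$ is a neighborhood of $\Phi_0$ such that $\varphi \mapsto a(\varphi)$ is continuous on $G$ and $\inf_{\varphi \in G} \min_{j,s} \lambda_j^{(s)}(\varphi) > 0$. If $K_s = [0, \tau_s - \tau_{s-1}]$ then for any $\varphi \in G$, the ideas of the proof of Lemma 1 yield
\[ m := \inf_{\varphi \in G} \min\Big\{ \min_i a_i(\varphi),\ \min_{i,j,s} \min_{y \in K_s} f_{ij}(y, s, \Phi),\ \min_{i,j,s} \min_{y \in K_s} F_{ij}(y, s, \Phi) \Big\} > 0. \]
For all $\varphi \in G$ and every $z = (w_s, y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1 \le s \le k}$, Lemma 1 entails
\[ L_1(z, \varphi) \ge m (rm)^{k+2} (rm)^{w_1 + \cdots + w_k}. \tag{B.5} \]
If moreover $G$ is a neighborhood of $\Phi_0$ with $\sup_{\varphi \in G} \max_{i,j} \ell_{ij}(\varphi) < \infty$ and $\sup_{\varphi \in G} \max_{j,s} \lambda_j^{(s)}(\varphi) < \infty$ then similarly
\[ M_1 := \sup_{\varphi \in G} \max\Big\{ \max_i a_i(\varphi),\ \max_{i,j,s} \max_{y \in K_s} f_{ij}(y, s, \Phi),\ \max_{i,j,s} \max_{y \in K_s} F_{ij}(y, s, \Phi) \Big\} < \infty. \]
Let further
\[ M_2 = \sup_{\varphi \in G} \max_{n,p,s} \max_{\ell} \max_{y \in K_s} \max\Big\{ \Big| \frac{\partial f_{np}}{\partial \varphi_\ell}(y, s, \varphi) \Big|,\ \Big| \frac{\partial F_{np}}{\partial \varphi_\ell}(y, s, \varphi) \Big| \Big\}, \]
\[ M_3 = \sup_{\varphi \in G} \max_{n,p,s} \max_{\ell,q} \max_{y \in K_s} \max\Big\{ \Big| \frac{\partial^2 f_{np}}{\partial \varphi_\ell \partial \varphi_q}(y, s, \varphi) \Big|,\ \Big| \frac{\partial^2 F_{np}}{\partial \varphi_\ell \partial \varphi_q}(y, s, \varphi) \Big| \Big\} \]
and
\[ M_4 = \sup_{\varphi \in G} \max_{n,p,s} \max_{\ell,q,r} \max_{y \in K_s} \max\Big\{ \Big| \frac{\partial^3 f_{np}}{\partial \varphi_\ell \partial \varphi_q \partial \varphi_r}(y, s, \varphi) \Big|,\ \Big| \frac{\partial^3 F_{np}}{\partial \varphi_\ell \partial \varphi_q \partial \varphi_r}(y, s, \varphi) \Big| \Big\} \]
and note that $M = \max\{M_1, M_2, M_3, M_4\} < \infty$. Equation (2.1), inequality (B.5) and tedious computations may then be used to show that one may find positive constants $C, C'$ such that
\[ \sup_{\varphi \in G} \max\Big\{ \frac{1}{L_1(Z, \varphi)},\ L_1(Z, \varphi),\ \Big| \frac{\partial L_1}{\partial \varphi_i}(Z, \varphi) \Big|,\ \Big| \frac{\partial^2 L_1}{\partial \varphi_i \partial \varphi_j}(Z, \varphi) \Big|,\ \Big| \frac{\partial^3 L_1}{\partial \varphi_i \partial \varphi_j \partial \varphi_\ell}(Z, \varphi) \Big| \Big\} \le C \exp\Big( C' \sum_{s=1}^{k} W_s \Big). \]
Consequently
\[ \sup_{\varphi \in G} \max\Big\{ |\log L_1(Z, \varphi)|,\ \Big| \frac{\partial \log L_1}{\partial \varphi_i}(Z, \varphi) \Big|,\ \Big| \frac{\partial^2 \log L_1}{\partial \varphi_i \partial \varphi_j}(Z, \varphi) \Big|,\ \Big| \frac{\partial^3 \log L_1}{\partial \varphi_i \partial \varphi_j \partial \varphi_\ell}(Z, \varphi) \Big| \Big\} \le C \exp\Big( C' \sum_{s=1}^{k} W_s \Big), \]
for (possibly different) positive constants $C, C'$. Finally, since for every $s \in \{1, \ldots, k\}$, $W_s$ is a Poisson distributed random variable, one has $\mathbb{E}_{\Phi_0}(x^{W_s}) < \infty$ for all $s$ and $x > 0$, which proves that the right-hand side of the inequality above defines a random variable having finite $\mathbb{P}_{\Phi_0}$-moments. The second part of the lemma is thus a straightforward consequence of the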
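The next result shall be used to check a couple of integrability conditions used in the proof of Theorem 2.
Lemma 5. Assume that $\psi$ is a Borel measurable nonnegative function. We consider the random variable $U = \sum_{q=1}^{\omega_1} \psi(Z_q)$.
• If $\mathbb{E}_{\Phi_0}(\psi(Z)) < \infty$ then for every $i_0 \in \{1, \ldots, r\}$, $\mathbb{E}^{i_0}_{\Phi_0}(U) < \infty$.
• If $\mathbb{E}_{\Phi_0}(\psi^2(Z)) < \infty$ then for every $i_0 \in \{1, \ldots, r\}$, $\mathbb{E}^{i_0}_{\Phi_0}(U^2) < \infty$.
Proof of Lemma 5. We start by remarking that for any $l \ge 1$, if $\mathbb{E}_{\Phi_0}(\psi^l(Z)) < \infty$ then for any $i_1$:
\[ \mathbb{E}^{i_1}_{\Phi_0} |\psi^l(Z)| \le \frac{\mathbb{E}_{\Phi_0} |\psi^l(Z)|}{a_{i_1}(\Phi_0)} < \infty \]
since $a_{i_1}(\Phi_0) > 0$. To prove the first statement, we then write
\[ \mathbb{E}^{i_0}_{\Phi_0}(U) = \sum_{N=1}^{\infty} \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{N} \psi(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \Big) = \sum_{q=1}^{\infty} \mathbb{E}^{i_0}_{\Phi_0} \big( \psi(Z_q) \mathbf{1}_{\{\omega_1 \ge q\}} \big). \]
Since $\{\omega_1 \ge q\} = \bigcap_{l=0}^{q-1} \{J(l) \ne i_0\}$, the hidden Markov structure of $(Z_q)$ yields
\[ \mathbb{E}^{i_0}_{\Phi_0}(U) = \sum_{q=1}^{\infty} \sum_{i_1 \ne i_0} \mathbb{E}^{i_1}_{\Phi_0}(\psi(Z))\, \mathbb{P}^{i_0}_{\Phi_0}(\omega_1 \ge q,\ J(q) = i_1) \le \mathbb{E}^{i_0}_{\Phi_0}(\omega_1) \max_{i_1} \mathbb{E}^{i_1}_{\Phi_0}(\psi(Z)). \]
Since the Markov chain $(J(q))$ is irreducible on $\{1, \ldots, r\}$ and has stationary distribution $a(\Phi_0)$, one has $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1) = 1/a_{i_0}(\Phi_0) < \infty$, from which we deduce that the right-hand side is finite.
We now turn to the second part of the lemma. Note that
\[ \mathbb{E}^{i_0}_{\Phi_0}(U^2) = \sum_{N=1}^{\infty} \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{p,q=1}^{N} \psi(Z_p) \psi(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \Big) = \sum_{N=1}^{\infty} \sum_{q=1}^{N} \mathbb{E}^{i_0}_{\Phi_0} \big( \psi^2(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \big) + 2 \sum_{N=1}^{\infty} \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{\substack{p,q=1 \\ p<q}}^{N} \psi(Z_p) \psi(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \Big). \]
We already know from the first statement of the lemma that
\[ \sum_{N=1}^{\infty} \sum_{q=1}^{N} \mathbb{E}^{i_0}_{\Phi_0} \big( \psi^2(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \big) = \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} \psi^2(Z_q) \Big) < \infty. \]
Further, for all Borel nonnegative functions $f_1, \ldots, f_N$ and all $i_1, \ldots, i_N$, the hidden Markov structure of $(Z_q)$ entails
\[ \mathbb{E}^{i_0}_{\Phi_0} \big( f_1(Z_1) \cdots f_N(Z_N) \mathbf{1}_{\{J(1) = i_1, \ldots, J(N) = i_N\}} \big) = \Big[ \prod_{q=1}^{N} \mathbb{E}^{i_0}_{\Phi_0} \big( f_q(Z_q) \mid J(q-1) = i_{q-1},\ J(q) = i_q \big) \Big] \mathbb{P}^{i_0}_{\Phi_0} \Big( \bigcap_{q=1}^{N} \{J(q) = i_q\} \Big) = \Big[ \prod_{q=1}^{N} \mathbb{E}^{i_{q-1}}_{\Phi_0} \big( f_q(Z_1) \mid J(1) = i_q \big) \Big] \mathbb{P}^{i_0}_{\Phi_0} \Big( \bigcap_{q=1}^{N} \{J(q) = i_q\} \Big). \]
Since we may write
\[ \{\omega_1 = N\} = \bigcup_{j_1, \ldots, j_{N-1} \ne i_0}\ \bigcap_{q=1}^{N-1} \{J(q) = j_q\} \cap \{J(N) = i_0\} \]
it is straightforward that
\[ \sum_{N=1}^{\infty} \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{\substack{p,q=1 \\ p<q}}^{N} \psi(Z_p) \psi(Z_q) \mathbf{1}_{\{\omega_1 = N\}} \Big) \le \Big[ \max_{i_1, i_2} \mathbb{E}^{i_1}_{\Phi_0} \big( \psi(Z) \mid J(1) = i_2 \big) \Big]^2 \mathbb{E}^{i_0}_{\Phi_0}(\omega_1^2) \]
which is finite since $\mathbb{E}^{i_0}_{\Phi_0}(\omega_1^2) < \infty$, see Lemma 3.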
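Lemma 6 is the key to the conclusion of the proof of Theorem 2.
Lemma 6. Assume that $\psi$ is a Borel measurable function such that $\mathbb{E}_{\Phi_0} |\psi(Z)| < \infty$. Then
\[ \mathbb{E}_{\Phi_0}(\psi(Z)) = a_{i_0}(\Phi_0)\, \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} \psi(Z_q) \Big). \]
Proof of Lemma 6. The ergodicity of the process $(Z_q)_{q \ge 1}$ entails
\[ \frac{1}{n} \sum_{q=1}^{n} \psi(Z_q) \to \mathbb{E}_{\Phi_0}(\psi(Z)) \quad \text{almost surely as } n \to \infty. \]
Besides, Lemma 5 yields $\mathbb{E} \big| \sum_{q=1}^{\omega_1} \psi(Z_q) \big| < \infty$, so that the regenerative properties of $(Z_q)$ and a law of large numbers similar to the one proposed in Theorem 2 in Rydén [24] give
\[ \frac{1}{n} \sum_{q=1}^{n} \psi(Z_q) \xrightarrow{\ \mathbb{P}\ } a_{i_0}(\Phi_0)\, \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} \psi(Z_q) \Big) \quad \text{as } n \to \infty. \]
Since the two limits must coincide, the result follows.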
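The identity of Lemma 6 can be verified exactly on a small finite chain. The sketch below makes a simplifying assumption for illustration: $\psi(Z_q)$ is replaced by its conditional mean given $J(q-1)$, written `g[i]`, in the spirit of the decomposition over $\{\omega_1 \ge q,\ J(q-1) = i_1\}$ used in the proof of Lemma 7. The transition matrix `P_HYP`, the vector `G_HYP` and the function names are hypothetical.

```python
def stationary_distribution(p, iterations=2000):
    """Stationary distribution of an irreducible aperiodic chain by power iteration."""
    n = len(p)
    a = [1.0 / n] * n
    for _ in range(iterations):
        a = [sum(a[i] * p[i][j] for i in range(n)) for j in range(n)]
    total = sum(a)
    return [x / total for x in a]


def expected_cycle_sum(p, g, i0, horizon=2000):
    """E^{i0}[ sum_{q=1}^{omega_1} g(J(q-1)) ], omega_1 being the first return time to i0.

    At step q, weights[i] holds P^{i0}(omega_1 >= q, J(q-1) = i).
    The infinite sum over q is truncated at `horizon` (the terms decay geometrically).
    """
    n = len(p)
    weights = [1.0 if i == i0 else 0.0 for i in range(n)]  # q = 1: J(0) = i0
    total = 0.0
    for _ in range(horizon):
        total += sum(weights[i] * g[i] for i in range(n))
        # One step of the chain, discarding paths that return to i0 (taboo step).
        weights = [
            0.0 if j == i0 else sum(weights[i] * p[i][j] for i in range(n))
            for j in range(n)
        ]
    return total


# Hypothetical one-step transition matrix and conditional means (not from the paper).
P_HYP = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
G_HYP = [2.0, 0.5, 1.5]

a = stationary_distribution(P_HYP)
stationary_mean = sum(a[i] * G_HYP[i] for i in range(3))
for i0 in range(3):
    print(i0, stationary_mean, a[i0] * expected_cycle_sum(P_HYP, G_HYP, i0))
```

For each choice of `i0` the two printed values agree up to rounding. Taking `g` identically equal to 1 recovers the mean return time $1/a_{i_0}$ used in the proof of Lemma 5.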
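Lemma 7 below is the essential result needed to construct confidence regions for the estimator.
Lemma 7. If $\Phi_n \to \Phi_0$ as $n \to \infty$, one has $C(\Phi_n) \to C(\Phi_0)$ as $n \to \infty$.
Proof of Lemma 7. We shall show that if $\Phi_n \to \Phi_0$, then $A(\Phi_n) \to A(\Phi_0)$ as $n \to \infty$. Let $n$ be so large that $\Phi_n$ belongs to a (compact) neighborhood $G$ of $\Phi_0$ as in Lemma 4. Using the definition of $A$, one has:
\[ |A_{ij}(\Phi_n) - A_{ij}(\Phi_0)| \le \Big| \mathbb{E}^{i_0}_{\Phi_n} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) - \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) \Big| + \Big| \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) - \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) h_j(Z_q, \Phi_0) \Big) \Big|. \]
Applying Lemma 6 and differentiating under the expectation sign yields for any $\Phi \in G$
\[ \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi) h_j(Z_q, \Phi) \Big) = \frac{1}{a_{i_0}(\Phi_0)}\, \mathbb{E}_{\Phi_0} \Big( - \frac{\partial^2 \log L_1}{\partial \varphi_i \partial \varphi_j}(Z, \Phi) \Big). \]
Lemma 4 and the dominated convergence theorem thus entail
\[ \Big| \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) - \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_0) h_j(Z_q, \Phi_0) \Big) \Big| \to 0 \quad \text{as } n \to \infty. \]
Afterwards, notice that for every $\Phi \in E$
\[ \mathbb{E}^{i_0}_{\Phi} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) = \sum_{i_1 \ne i_0} \sum_{q=1}^{\infty} \mathbb{E}^{i_1}_{\Phi} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big)\, \mathbb{P}^{i_0}_{\Phi}(\omega_1 \ge q,\ J(q-1) = i_1) \]
so that
\[ \Big| \mathbb{E}^{i_0}_{\Phi_n} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) - \mathbb{E}^{i_0}_{\Phi_0} \Big( \sum_{q=1}^{\omega_1} h_i(Z_q, \Phi_n) h_j(Z_q, \Phi_n) \Big) \Big| \le r_{1,n} + r_{2,n} \]
with
\[ r_{1,n} = C \sum_{\substack{i_1 \ne i_0 \\ q \ge 1}} \Big| \mathbb{P}^{i_0}_{\Phi_n}(\omega_1 \ge q,\ J(q-1) = i_1) - \mathbb{P}^{i_0}_{\Phi_0}(\omega_1 \ge q,\ J(q-1) = i_1) \Big| \]
and
\[ r_{2,n} = \max_{i_1} \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) \Big| \sum_{q \ge 1} \mathbb{P}^{i_0}_{\Phi_n}(\omega_1 \ge q) \]
where we set
\[ C = \max_{i_1} \mathbb{E}^{i_1}_{\Phi_0} \Big[ \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \Big] \]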
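which is finite by Lemma 4. We start by controlling $r_{1,n}$. Set $P_{ij}(\Phi) = \mathbb{P}^{i}_{\Phi}(J(1) = j)$ as in the proof of Lemma 3. For all $q \ge 1$ and $j \ne i_0$,
\[ \mathbb{P}^{i_0}_{\Phi_n}(\omega_1 \ge q,\ J(q-1) = j) = \sum_{i_1, \ldots, i_{q-2} \ne i_0} P_{i_0, i_1}(\Phi_n) \cdots P_{i_{q-2}, j}(\Phi_n) \to \sum_{i_1, \ldots, i_{q-2} \ne i_0} P_{i_0, i_1}(\Phi_0) \cdots P_{i_{q-2}, j}(\Phi_0) = \mathbb{P}^{i_0}_{\Phi_0}(\omega_1 \ge q,\ J(q-1) = j) \quad \text{as } n \to \infty. \]
Besides, Lemma 3 entails that there exists a constant $c \in (0, 1)$ such that for all $j \ne i_0$ and $q \ge 1$:
\[ \sup_{n \ge 1} \Big| \mathbb{P}^{i_0}_{\Phi_n}(\omega_1 \ge q,\ J(q-1) = j) - \mathbb{P}^{i_0}_{\Phi_0}(\omega_1 \ge q,\ J(q-1) = j) \Big| \le 2 c^{q-1} \]
so that the dominated convergence theorem gives $r_{1,n} \to 0$ as $n \to \infty$. To control $r_{2,n}$, use Lemma 3 to get for all $n$
\[ r_{2,n} \le \frac{1}{1 - c} \max_{i_1} \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) \Big|. \]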
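To prove that the right-hand side of this inequality converges to 0 as $n \to \infty$, pick $p \ge 1$ and let $\|W\| = \sum_{s=1}^{k} W_s$ be the total number of events over one period. For any $i_1$, it holds that
\[ \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \big) \Big| \le \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| \le p\}} \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| \le p\}} \big) \Big| + \mathbb{E}^{i_1}_{\Phi_n} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big) + \mathbb{E}^{i_1}_{\Phi_0} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big). \]
Since the state space of $J$ is finite, it is enough to prove that the right-hand side of this inequality converges to 0 as $n \to \infty$. Using Lemma 4, we may find positive constants $C_1, C_2$ such that
\[ \mathbb{E}^{i_1}_{\Phi_n} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big) + \mathbb{E}^{i_1}_{\Phi_0} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big) \le C_1 \Big[ \mathbb{E}^{i_1}_{\Phi_n} \big( \exp(C_2 \|W\|) \mathbf{1}_{\{\|W\| > p\}} \big) + \mathbb{E}^{i_1}_{\Phi_0} \big( \exp(C_2 \|W\|) \mathbf{1}_{\{\|W\| > p\}} \big) \Big]. \]
Pick now $\varepsilon > 0$. Since $\sup_{\varphi \in G} \max_{i,j} \lambda_j^{(i)}(\varphi) < \infty$, one has
\[ \sup_{\Phi \in G} \mathbb{E}^{i_1}_{\Phi} \big( C_3^{\|W\|} \exp(C_4 \|W\|) \big) < \infty. \]
Consequently, one can choose $p \in \mathbb{N} \setminus \{0\}$ such that for every $n$
\[ \mathbb{E}^{i_1}_{\Phi_n} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big) + \mathbb{E}^{i_1}_{\Phi_0} \Big( \sup_{\varphi \in G} |h_i(Z, \varphi) h_j(Z, \varphi)| \mathbf{1}_{\{\|W\| > p\}} \Big) \le 2\varepsilon/3. \]
It is then enough to prove that for all fixed $q \ge 0$,
\[ \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| = q\}} \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| = q\}} \big) \Big| \to 0 \]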
as $n \to \infty$. Note that $Z$ may be written $Z = (W, Y)$ where $W = (W_1, \ldots, W_k)$ and $Y$ is the list of inter-event times in year one. Therefore, if $L_1(w, y, \Phi_0 \mid i_1)$ is the likelihood of $(w, y)$ given that $J(0) = i_1$, we may write
\[ \Big| \mathbb{E}^{i_1}_{\Phi_n} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| = q\}} \big) - \mathbb{E}^{i_1}_{\Phi_0} \big( h_i(Z, \Phi_n) h_j(Z, \Phi_n) \mathbf{1}_{\{\|W\| = q\}} \big) \Big| \le \sum_{\|w\| = q} \int \sup_{\varphi \in G} |h_i(w, y, \varphi) h_j(w, y, \varphi)| \mathbf{1}_{\{\|w\| = q\}} \big| L_1(w, y, \Phi_n \mid i_1) - L_1(w, y, \Phi_0 \mid i_1) \big| \, dy \]
where, for a given $w = (w_1, \ldots, w_k)$, the integral in the right-hand side is on the $y = (y_1^{(s)}, \ldots, y_{w_s}^{(s)})_{1 \le s \le k}$ whose components are nonnegative and are such that for every $s \in \{1, \ldots, k\}$, $\sum_{l=1}^{w_s} y_l^{(s)} \le \tau_s - \tau_{s-1}$. Since $L_1$ is a continuous function of the parameters, the integrand goes to 0 everywhere as $n \to \infty$. Furthermore, this function is bounded from above by the function
\[ (w, y) \mapsto 2 \sup_{\varphi \in G} |h_i(w, y, \varphi) h_j(w, y, \varphi)| \mathbf{1}_{\{\|w\| = q\}} \sup_{\varphi \in G} L_1(w, y, \varphi \mid i_1). \]
By Lemma 4 and the irreducibility of $J$, there exists a positive constant $C_q$ such that
\[ 2 \sup_{\varphi \in G} |h_i(w, y, \varphi) h_j(w, y, \varphi)| \mathbf{1}_{\{\|w\| = q\}} \sup_{\varphi \in G} L_1(w, y, \varphi \mid i_1) \le C_q. \]
Finally, for every $t > 0$, the simplex
\[ \Big\{ (y_1, \ldots, y_{w_s}) \ \Big|\ \sum_{i=1}^{w_s} y_i \le t \ \text{and}\ \forall i,\ y_i \ge 0 \Big\} \]
is contained in the hypercube $\prod_{i=1}^{w_s} \{y_i \mid 0 \le y_i \le t\}$, whose volume is $t^{w_s} < \infty$. The dominated convergence theorem consequently gives $A(\Phi_n) \to A(\Phi_0)$ as $n \to \infty$. The convergence
Table 1: Mean and median L1-errors associated to the estimators in case 1.

Case 1                                   ℓ12     ℓ21     λ_1^(1)  λ_2^(1)  λ_1^(2)  λ_2^(2)
n = 50
  Mean L1-error                          0.303   0.683   0.381    0.866    0.774    2.506
  Median L1-error                        0.247   0.544   0.239    0.586    0.708    1.893
  Mean half-length of the 95% ACI        0.818   1.740   0.791    2.177    1.611    5.380
  Median half-length of the 95% ACI      0.766   0.837   0.886    2.082    1.570    5.184
  Coverage probabilities of the 95% ACI  0.93    0.93    0.92     0.95     0.92     0.88
n = 100
  Mean L1-error                          0.307   0.547   0.248    0.657    0.487    1.629
  Median L1-error                        0.234   0.423   0.176    0.510    0.384    1.259
  Mean half-length of the 95% ACI        0.632   1.208   0.561    1.529    1.194    3.704
  Median half-length of the 95% ACI      0.613   1.097   0.551    1.460    1.155    3.582
  Coverage probabilities of the 95% ACI  0.88    0.92    0.94     0.91     0.93     0.94
Table 2: Mean and median L1-errors associated to the estimators in case 2.

Case 2                                   ℓ12     ℓ21     λ_1^(1)  λ_2^(1)  λ_1^(2)  λ_2^(2)
n = 50
  Mean L1-error                          1.593   4.228   0.692    2.448    1.207    5.341
  Median L1-error                        1.372   3.504   0.472    1.773    1.030    4.182
  Mean half-length of the 95% ACI        3.528   9.221   1.219    4.549    2.510    11.258
  Median half-length of the 95% ACI      3.036   7.936   1.034    3.682    2.362    10.473
  Coverage probabilities of the 95% ACI  0.79    0.83    0.84     0.79     0.86     0.91
n = 100
  Mean L1-error                          1.304   3.145   0.483    1.715    0.847    3.599
  Median L1-error                        0.816   2.184   0.262    1.272    0.654    2.980
  Mean half-length of the 95% ACI        2.893   7.323   0.882    3.115    1.875    8.269
  Median half-length of the 95% ACI      2.429   6.393   0.764    2.788    1.749    7.860
  Coverage probabilities of the 95% ACI  0.9     0.95    0.86     0.87     0.92     0.93
Table 3: Mean and median L1-errors associated to the estimators in case 3.

Case 3                                   ℓ12     ℓ21     λ_1^(1)  λ_2^(1)  λ_1^(2)  λ_2^(2)
n = 50
  Mean L1-error                          0.272   0.551   0.255    0.812    0.370    1.671
  Median L1-error                        0.213   0.490   0.197    0.611    0.289    1.435
  Mean half-length of the 95% ACI        0.715   1.487   0.593    1.972    1.073    4.707
  Median half-length of the 95% ACI      0.720   1.452   0.594    1.963    1.059    4.709
  Coverage probabilities of the 95% ACI  0.94    0.96    0.91     0.92     0.98     0.96
n = 100
  Mean L1-error                          0.197   0.410   0.194    0.568    0.298    1.392
  Median L1-error                        0.169   0.332   0.173    0.505    0.238    1.200
  Mean half-length of the 95% ACI        0.479   0.978   0.421    1.386    0.757    3.240
  Median half-length of the 95% ACI      0.476   0.951   0.416    1.371    0.753    3.221
  Coverage probabilities of the 95% ACI  0.91    0.92    0.9      0.9      0.96     0.96
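One simple sanity check on these tables is to compare the interval half-lengths at n = 50 and n = 100. If the estimator converges at the usual root-n rate suggested by asymptotic normality, the half-lengths should shrink by a factor close to the square root of 2 when n doubles. The sketch below does this arithmetic for the mean half-lengths of case 3; the variable and function names are hypothetical, and the comparison is only a rough heuristic, not a formal test.

```python
import math

# Mean half-lengths of the 95% ACI copied from Table 3 (case 3).
PARAMS = ["l12", "l21", "lambda_1^(1)", "lambda_2^(1)", "lambda_1^(2)", "lambda_2^(2)"]
HALF_N50 = [0.715, 1.487, 0.593, 1.972, 1.073, 4.707]
HALF_N100 = [0.479, 0.978, 0.421, 1.386, 0.757, 3.240]


def shrink_ratios(half_small_n, half_large_n):
    """Ratio of half-lengths when moving from the smaller to the larger sample size."""
    return [small / large for small, large in zip(half_small_n, half_large_n)]


ratios = shrink_ratios(HALF_N50, HALF_N100)
expected = math.sqrt(100 / 50)  # heuristic root-n scaling factor

for name, ratio in zip(PARAMS, ratios):
    print(f"{name:>13}: ratio {ratio:.3f} (root-n heuristic {expected:.3f})")
```

For case 3, all six ratios fall within about 0.11 of the heuristic value, which is in line with the root-n behaviour one would expect from the asymptotic normality result.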
Table 4: Estimates of the parameters on our real data set.

Parameters   Estimates   95% ACI
ℓ12          0.0771      [−1.162, 1.316]
ℓ21          0.0230      [−0.345, 0.391]
λ_1^(1)      5.434       [2.329, 8.540]
λ_2^(1)      0.762       [0.142, 1.381]
λ_1^(2)      10.659      [6.567, 14.750]
λ_2^(2)      5.361       [3.945, 6.776]