TWO INCOMPATIBLE OBJECTIVES WITH INDIVIDUAL RESERVE MODELS: AN APPROACH WITH MULTIVARIATE ADAPTATIVE REGRESSION SPLINE MODELS

(1)

M

ODELS

:

AN

A

PPROACH WITH

M

ULTIVARIATE

A

DAPTATIVE

R

EGRESSION

S

PLINE

M

ODELS

A PREPRINT

Jean-Philippe Boucher

Chaire Co-operators en analyse des risques actuariels Departement de mathematiques

Universite du Quebec a Montreal [email protected]

March 17, 2021

A

BSTRACT

Although individual techniques are usually proposed as alternatives to collective methods for the computation of actuarial liabilities in financial statements, individual methods could also be used to dynamically track the liabilities of an insurance company at any time, or to have a precise estimate of ultimate costs of all opened claims in a portfolio. However, in this paper we show that reserves methods used to estimate the overall liability on an insurer cannot be similar to individual techniques used to have a precise estimation. Indeed, when the insurer expect that the average cost of a claim will increase over time since its opening, granular reserving methods cannot be used to satisfy these two objectives. Simulations are used to expose and show the overall problem. To illustrate the situation with real insurance data from a major Canadian insurance company, we develop a new granular reserving model based on Multivariate Adaptive Regression Spline (MARS) models, which are well known to have an interesting bias-variance trade-off. We show that the hinge functions used in the MARS model are useful for obtaining an analytical form of each individual reserve at any time. Keywords Micro-reserving, Solvency, Multivariate Adaptive Regression Spline (MARS)

(2)

1 Introduction

Historically, P&C insurance companies, as well as actuarial researchers, have used and developed provisioning techniques based on aggregated costs, mainly in the form of runoff triangles (see Wutrich & Merz (2008) or Friedland (2010) for an overview). Instead of using collectiver methods, Arjas (1989) and Norberg (1986) proposed to use individual claims data. With the recent development of granular information about each claim, there are an increasing number of publications in actuarial sciences that goes in that direction by proposing individual reserving methods. Those techniques generally suppose that the cost of each claim of the portfolio can be individually predicted, see for example Pigeon et. al. (2013), Antonio and Plat (2014) or Duval and Pigeon (2019). The individual techniques are now seen as good alternatives to collective methods in calculating the reserves that an insurance company must enter in its financial statements. For more details, we refer to papers that compare individual and collective approaches, such as Huang et al. (2015), Charpentier and Pigeon (2016) or Wüthrich (2018).

Instead of only using granular reserving methods to estimate the overall actuarial liability on an insurer, we want to focus on individual precision of each claim prediction. Indeed, in this paper, we show that individual methods could also be used to dynamically track the liabilities of an insurance company at any time. In addition, by having an accurate estimate of each claim, individual models could also help the claims department to have a better image of the final cost of a claim, which could help claims managers choose which actions to take for each claim. Moreover, by using the prediction of the ultimate cost of each claim, actuaries from ratemaking teams could potentially use the whole portfolio without relying only on closed claims. To our knowledge, this other use of granular reserving methods has not been proposed yet in the literature.

Formally, by having an instantaneous estimate S(x) of each claim at each instant x, we can say that a P&C insurance company is willing to have a granular reserving model satisfying the following two objectives:

O1 : An adequate prediction of the amounts paid for each claim; O2 : An adequate value for the company’s actuarial liability at all times.

In this paper, we show that when we expect that the average cost of a claim will increase over time since its opening (which could be referred to the age of the claim), the two objectives mentioned above cannot be jointly satisfied by a granular reserving model. In such a situation, instead of trying to meet both objectives, we will show that the ideal to be reached is:

1. Respect objective O2;

2. Optimize objective O1, by minimizing as much as possible the difference between the prediction and the true cost of the claim at closure.

1.1 Simple Example

To illustrate the problem, we use a simple example. Suppose, that we have a portfolio with only 3 claims, all of which occur at the same moment and are all reported at the same time t = 0:

• Claim # 1 which will be closed at time (age) τ1and which will cost $200;

with τ1 < τ2< τ3. Obviously, the insurer does not know which claim will cost $200, $500 or $2000, but by using

standard reserving models, he is able to know (approximately) the total amount of claims to be paid in the future. Given that this example does not suppose incurred but not reported (IBNR) claims, the insurer will suppose that the sum of all individual reserves corresponds to its actuarial liability. Based on this simple example, we can assume that there are

(3)

two main ways to estimate individual reserves for this portfolio: a static individual reserve estimation, and a dynamic reserve estimation.

1.1.1 Static Estimation

The static estimation of the individual reserves means that at time t = 0, when all claims occur and are reported, a reserve is assigned to each claim and that this reserve does not change over time. We can then separate the analysis into time periods:

1. At t = 0, knowing that the total of amount of claims will be (approximately) $2700, each claim is reserved for $900.

2. Between time [0, τ1[, the sum of all individual reserves is $2700, for a total of ultimate cost of claims of $2700.

The actuarial liabilities are thus correctly valued.

3. At t = τ1, claim # 1, reserved for $900, is paid at $200. A positive difference of $700 is observed between the

prediction and the outcome.

4. Between time [τ1, τ2[, the sum of all individual reserves is now $1800, for total ultimate claim of $2500. The

actuarial liability is then undervalued by $700.

prediction and the outcome.

6. Between time [τ2, τ3[, the sum of all individual reserves is now $900, when we know that the final claim will

be paid at $2000. The actuarial liability is then undervalued by $1100.

7. At t = τ3, claim # 3, reserved for $900 is paid at $2000. A negative difference of 1100$ is observed between

the prediction and the outcome.

In this static estimation of individual claim, we can see that the sum of all individual reserves at the end time τ1, τ2and

τ3($900 + $900 + $900) is equal to the total cost of claims ($2700), meaning that objective O1 is satisfied. However,

objective O2 is not satisfied, as we are in an actuarial deficit situation between time τ1and τ3.

1.1.2 Dynamic Estimation

Static estimation cannot satisfy both O1 and O2. Moreover, it seems unrealistic in practice not to change the individual

reserves over time. Indeed, after the payment of the first claim at τ1, knowing that the total amount of future claims will

be 2500$, an insurance company will certainly increase its individual reserve for the two remaining claims. This is what we call a dynamic estimation of individual claims. In such a scenario, we can once again separate the analysis into time periods to analyze the impact of this type of reserve approach:

1. At t = 0, knowing that the total of amount of claims will be $2700, each claim is reserved for $900.

2. Between time [0, τ1[, the sum of all individual reserves is $2700, for a total of ultimate cost of claims of $2700.

prediction and the outcome. Because we know that remaining claims will cost $2500, each of the remaining two claims is then reserved at $1250.

4. Between time [τ1, τ2[, the sum of all individual reserves is now $2500, for a total of ultimate claims of $2500.

5. At t = τ2, claim # 2, reserved for $1250, is paid at $500. A positive difference of $750 is observed between

the prediction and the outcome. Because we know that the remaining claim will cost $2000, the last claim is then reserved at 2000$.

6. Between time [τ2, τ3[, the sum of all individual reserves is now 2000$, when we know that the final claim will

(4)

7. At t = τ3, claim # 3, reserved for $2000 is paid at $2000. There is no difference between the prediction and the outcome for that claim.

In summary, throughout the period analyzed, the actuarial liabilities have always been correctly valued. However, the total sum of individual reserves recorded at the time of payment of each claim ($900 + $1250 + $2000 = $3150) is much higher than the total sum of claims ($2700). For the dynamic estimation approach, O2 is satisfied, but O1 is not.

1.2 Structure of the paper

It is easy to understand the problems raised by this simple example.

1. In the case of a static reserve, as mentioned above, no correction is made over time. When the average cost of claim increases over time, the average individual reserve by claim should also increase. Because we are in a static approach, we cannot modify each individual reserves, and this causes a deficit in the actuarial liability. 2. For the dynamic model, knowing that the reserve of a specific claim can be seen as the average of all future claims, and that the claims are in increasing order, the next claim payment will always be lower than the expected average of future claims (until we reach the last claim). That means that the sum of all individual claims will necessarily be less than the sum of the amounts reserved for each claim at the time of closure. The better understand the problem, and to propose ways to solve them, this paper will be structured as follows. In Section 2, we use simulations to study two more sophisticated examples. Without going into a precise and rigorous analysis of the stochastic processes that can represent the dynamic evolution of the reserve over time, we can nevertheless mention some conclusions about our observations. We will see that in order to obtain an individual claims reserving model that satisfies objectives O1 and O2, no increasing trend based on the age of the claim must be observed in the modeling. If an increasing trend is nevertheless present, we will see that objectives O1 and O2 cannot be jointly satisfied. We will show that the best individual reserve model must then favor O2 over O1, and more precisely, that our goal will become to favor O2 while trying to get as close as possible to objective O1 - without however, reaching it. In Section 3, we will apply our conclusions from Section 2 to a real insurance dataset from a major Canadian insurance company to see if the previous problem is observed. To see how objectives O1 and O2 are fulfiled with real data, we will develop a new individual reserve model based on Multivariate Adaptive Regression Spline (MARS). This allows to show that the MARS approach is an interesting way to model the relationship between a random variable and a numeric covariate. We also show that the MARS model can be used to obtain an analytical form of the estimate of the reserve at all times, a condition often essential for being able to regularly use the individual reserve models in practice. A cross-validation approach is used to find the hyperparameters. The proposed MARS models is shown to partially satisfy objective O2, and is closer to satisfying objective O1 than other, simpler models. In the conclusion, we suggest some other ideas for research.

2 Modeling and Simulations

The simple example at the beginning of the paper helps to understand the problem of an expected growth in the cost of a claim when it becomes older. However, simulation of larger insurance portfolios must also by analyzed in order to grasp how we should model individual reserves in such a situation.

2.1 Waiting Times and Expected Value of Truncated Costs

First, to study the impact of cost growth as a function of the age of a claim, we will assume a claims process with the following structure:

• We work with an insurance product where there is only one indemnity payment occurring on the closing date of the claim, as shown in Figure 1. This type of claims is similar to the claims we will analyze with real insurance data.

(5)

• We assume that the age of a claim at closure follows an exponential distribution of mean 180 days.

• We assume that the cost of a claim S follows a gamma distribution of mean µ = 10, 000 + 50τ , and of variance

µ2_{α, with α = 10, 000.}

• We simulate 10 claims per day, over a calendar time period ranging from 1, . . . , 15, 000 days.

S

Occurrence Date Closing Date

τ

Figure 1: Illustration of a claim’s life

We thus create a model where there is a growth of cost as the age of the claim increases. Figure 2 thus shows the simulated database, showing the link between the age of a claim at closure (τ ) and its closing value (S).

Figure 2: Simulated Data

2.1.1 Calculation of Individual Reserves

We believe there are at least three possibilities to consider when calculating the reserve for each claim, at any age x: 1. Identical and unique reserves for all claims, at any age x (a.k.a. static reserve). The reserve for each claim i

would therefore be equal to:

R1(x) = E[S] = E[E[S|T = t]] = Z ∞ 0 (β0+ β1t)f (t)dt = Z ∞ 0 (β0+ β1t)λ exp(−λt)dt = β0+ β1 λ

(6)

As we saw in the introduction, the problem with R1(x) is that the value of the reserve does not depend on the

age of the claim at the evaluation x. Given the value of β1> 0, if we analyze an opened claim at age x, it

would be logical to believe that the expected value of the claim is greater than it was when x was equal to 0. This scenario is therefore by no means realistic in practice: with the increase of x, the expected value of the claim must also increase.

2. Reserve a claim at any age x, as if x corresponds exactly to the time the claim is closed:

R2(x) = β0+ β1x

We could think that this approach should be considered when we fit a basic regression model linking the amount paid S with the age of payment τ . It is however a naive approach having no theoretical foundation. Indeed, the approach only estimates the reserve at age x as if the claim closed exactly at this same age x.

Knowing that the value of claim S increases over time, we can expect that E[S|x] > R2(x) for all x > 0.

That’s the reason why we did not consider this approach for the example in the introduction. However, as it may sound logical at first, we wanted to include it in our analysis to show its limitations.

3. Reserve as the conditional expectation of S, knowing that the claim has survived until the time x (a.k.a. dynamic reserve): R3(x) = E[S|F > x] = E[E[S|F > x, T = t]] (1) = Z ∞ x (β0+ β1t) f (t) 1 − F (x)dt = 1 1 − F (x) Z ∞ x β0f (t)dt + 1 1 − F (x) Z ∞ x β1tf (t)dt = β0+ β1x + β1 λ

This approach seems to be the most logical and coherent form for the individual reserve. However, in

connection with the expression β0+ β1x used to simulate the data, we quickly see that an extra elementβ_λ1 is

added with R3(x).

2.1.2 Temporal Analysis of Individual Reserves

Thanks to the simulated data, we have at hand the time of occurrence of each claim, its final cost and the time that the

claim will take before being closed. We can analyze the evolution of the individual reserves R1(x), R2(x) and R3(x)

when x corresponds to the time since the opening of the claim (age of claim), but also when x corresponds to calendar time. By discretizing the time per day, Figures 3 and 4 thus correspond to the daily evolution of the reserves when x is the age of the claim, or when x is calendar time.

Figure 3, based on the age of the claim, shows the difference between the various methods of reserving in the calculation

of the actuarial liability, and the estimate of the reserve at the time the claim closes. As expected, the method R1(x) (in

purple), which assigns a single and unique reserve to all claims from their opening date strongly under-evaluates the actuarial liability when the time since the opening of claims increases. When the age of the claim increases, the average

amount of claims becomes more and more significant, but R1(x) cannot modify the individual value of the reserves.

The figure on the right, showing the total value of reserves at the time the claim closes, indicates that the method R1(x)

is exact: the sum of the individual reserves (at closure) converges towards the total sum of claims. This is not surprising

given that the sum R1(x) is effectively equal to the sum of the claims. In summary, we can see that R1(x) satisfies

objective O1, but not objective O2.

Analysis of R2(x), shown in red, reveals that the situation is even worse than R1(x) for O2 objective because the claims

liabilities are always undervalued, even for small values of x. It is not surprising to see such a result for this method

because the reserve R2(x) = β0+ β1x will be constantly undervalued for x < τ , with τ corresponding to the exact

(7)

0e+00 1e+09 2e+09

0 500 1000 1500 2000

x=Time since Opening

Reser v e V alue 0e+00 1e+09 2e+09 3e+09 4e+09 0 500 1000 1500 2000

x=Time since Opening

Reser v e V alue at Closing R1(x) R2(x) R3(x) Real Value

Figure 3: Total value of the individual reserves based on the age of the claim (x). The graph on the left shows the total value of reserves for open claims, while the graph on the right shows the value of the reserve when the claim was closed

fact that R2(x) is precisely constructed to have an exact correspondence between the reserve and the amount paid when

x = τ . In summary, we therefore see that R2(x) satisfies the objective O1, but performs even less well for objective O2

than R1(x).

Finally, R3(x), in blue, under the black curve in the figure on the left, shows an almost perfect correspondence between

the ultimate costs and the sum of the reserves, when x is the age of the claim. Conversely, the figure on the right shows that the sum of the individual reserves at the time of closing is much higher than the total that is really paid. Compared

to the method R2(x) which generated an exact result for objective O1, the difference can be explained by the presence

of the element β1

λ of the equation of R3(x). The difference between the total claim amounts and R3(x) is then simply

equal to n × β1

λ, with n corresponding to total number of claims analyzed. In summary, in opposition to R1(x) and

R2(x), the method R3(x) satisfies the objective O2 but does not satisfy O1.

0e+00 2e+07 4e+07 0 5000 10000 15000 x=Calendar Time Reser v e V alue 0e+00 1e+09 2e+09 3e+09 4e+09 0 5000 10000 15000 x=Calendar Time Reser v e V alue at Closing R1(x) R2(x) R3(x) Real Value

Figure 4: Total value of the individual reserves based on calendar time (x). The graph on the left shows the total value of reserves for open claims, while the one on the right shows the value of the reserve when the claim was closed

It is also interesting to analyze the evolution of individual reserves as a function of calendar time, as shown in Figure 4.

We can see a relative similarity between R1(x) and R2(x) for objectives O1 and O2: systematic underestimation of

actuarial liability at all times x, but an exact estimate of the sum of ultimate claims. It is interesting to see, moreover,

that R3(x) seems to satisfy on average the criterion for O2, without always being completely sufficient. Indeed, on a

few occasions (for example in the interval [6250, 6500]), the sum of the amounts reserved is not sufficient to pay for the sum of expected claims. The random variation nature of claims reserving means that a safety margin greater than the

complement β1

λ should be necessary to satisfy O2, at the cost of reserving more than what the sum of all the claims will

(8)

Type of Claim Proportion λ β0 β1 α

Minor 40% 1/10 1, 000 20 1, 000

Moderate 45% 1/50 2, 500 30 1, 000

Major 10% 1/600 20, 000 50 1, 000

Catastrophic 5% 1/1, 000 50, 000 80 1, 000

Table 1: Simulations parameters

2.2 Covariates

One of the great advantages of analyzing reserves at the individual level is that precise individual information about a claim, or information from the insured, can be used in the modeling. In this sense, to complete our simulations analysis, we will assume a slightly more complex claim process in order to see the impact of covariates in the pursuit of objectives O1 and O2. We will keep the same working hypotheses as those set out in Section 2.1, with τ ∼ Exponential(λ) and

S|τ ∼ Gamma(µ = β0+ β1τ, α). However, we will introduce 4 types of claims, each having their own distribution

parameters as indicated in Table 1.

We can thus see that the more a claim is severe, the higher its average cost, the higher its trend, and the expected time to close the claim is also much more important. As we will see in our application with real data, this kind of dependence between the complexity of the claim, its cost and its time before being settled is realistic for a bodily injury coverage. Figure 5 thus shows the simulated database, showing the link between the age of a claim (τ ) and its closing value (S).

Figure 5: Simulated Data with covariates, with linear and quadratic relationship with the age of claims on the right We propose different approaches to better understand the impact of the covariates in our analysis:

A) We first assume a total ignorance of the covariates: covariates that affect the age of the claim at closure τ and the severity of the claim S are not considered. This can represent a situation when actuaries forgot to use covariates,

or when covariates are unavailable for analysis. In such a case, we will only assume that 1

b

λ = τ . To model the

link between the age of a claim (τ ) and the closing value of the claim (S), a GLM regression assuming a gamma distribution with an identity link is used:

µi= γ0+ γ1τi+ γ2τi2

for the ith_{claim, where S}

i∼ Gamma(µi, α). The fit of the data by this regression model is illustrated in Figure 5.

To be more precise, a model with splines could be used. However, for the sake of illustration, adding a square term is sufficient to obtain a reasonable adjustment.

B) We consider covariates in the analysis of the age of the claim τ , but ignore it for the severity of the claim S. In this

situation, the GLM model described in method A will be used for S. The exact values of λiare used (based on the

type of claim).

C) We consider covariates for the modeling the age of the claim τ and the cost of claim S. The exact values of λi, β0,i

and β1will be considered, with β0,iand λicorresponding respectively to the appropriate intercept of the claim i

(9)

As in the previous section, there are still 3 possibilities for the calculation of the reserve for each claim at the time x. 1. Static reserve R1(x): R1A(x) = γb0+ b γ1 b λ + c 2γ₂ b λ2 R1B(x) = γb0+ b γ1 λ + 2_bγ2 λ2 R1C(x) = β0,i+ β1 λ

where R1A(x) and R1B(x) come from:

E[E[S|T = t]] = Z ∞ 0 (γ0+ γ1t + γ2t2)f (t)dt = γ0+ γ1 λ + 2γ2 λ2 2. Naive approach R2(x): R2A(x) = R2B(x) = γ_b0+γ_b1x +γ_b2x2 R2C(x) = β0+ β1x 3. Dynamic approach R3(x): R3A(x) = γb0+γb1 x + 1 b λ +γ_b2 x2+2x b λ + 2 b λ2 R3B(x) = γb0+γb1 x + 1 λ +γ_b2 x2+2x λ + 2 λ2 R3C(x) = β0+ β1x + β1 λ with: E[E[S|F > x, T = t]] = Z ∞ x (γ0+ γ1t + γ2t2) f (t) 1 − F (x)dt = 1 1 − F (x) Z ∞ x γ0f (t)dt + 1 1 − F (x) Z ∞ x γ1tf (t)dt + 1 1 − F (x) Z ∞ x γ2t2f (t)dt = γ0+ γ1 x + 1 λ + γ2 x2+2x λ + 2 λ2

As in the analysis of the first simulated data set, it might be interesting to analyze the trend in claim liabilities over

calendar time. Figure 6 illustrates this evolution for the first two types of reserves, namely R1(x) and R2(x), for the

three ways (A-B-C) of approaching covariates. We obtain substantially the same conclusions as for the first dataset for

the methods R1(x), R2(x): they underestimate the claim liabilities. However, with the exception of R1A(x), which

shows a slight difference of just over 5% (probably due to the error in estimating the real trend β0and the approximation

of the parameter λ), the methods R1(x) and R2(x) manage to satisfy objective O1.

The evolution of the reserves generated by the models of the family R3(x) are illustrated in Figure 6. The model R3A,

based on estimates of parameters γ0, γ1and γ2to measure the trend of the age of the claim on the cost, and on estimate

(10)

0.0e+00 5.0e+07 1.0e+08 1.5e+08 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue 0.0e+00 5.0e+08 1.0e+09 1.5e+09 2.0e+09 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue at Closing R1A(x) R1B(x) R1C(x) Real Value 0.0e+00 5.0e+07 1.0e+08 1.5e+08 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue 0.0e+00 5.0e+08 1.0e+09 1.5e+09 2.0e+09 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue at Closing R2A(x) R2C(x) Real Value

Figure 6: Total value of the individual reserves based on the calendar time (x) for reserve models R1and R2. The graph

on the left shows the total value of reserves for open claims, while the graph on the right shows the value of the reserve when the claim was closed

information on the data generation process, generally succeeds in satisfying the actuarial liability objective. What is

surprising is the very good performance of the R3B(x) model. Thus, based on this first summary study, we see that

even if it is difficult to grasp the link between costs and the age of claims, an adequate modeling of the random variable

τ could allow us to get closer to our O2 objective. However, R3B(x) and R3C(x) are not always similar: as shown

in Figure 8, by ignoring covariates, the subgroups of the portfolio may not be adequately reserved. Knowing that the calculation of the amounts in individual provisioning sometimes have the objective of being used in pricing, we must therefore be careful. 0.0e+00 5.0e+07 1.0e+08 1.5e+08 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue 0e+00 1e+09 2e+09 3e+09 4e+09 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue at Closing R3A(x) R3B(x) R3C(x) Real Value

Figure 7: Total value of the individual reserves based on the calendar time (x) for reserve models R3. The graph on the

left shows the total value of reserves for open claims, while the graph on the right shows the value of the reserve when the claim was closed BBBB

Regarding objective O1, Table 2.2 indicates the difference between what was predicted as total costs by the

model R3(x), compared to the true final costs. Model R3A overestimates the total cost of claims by 90.08%

(while not satisfying criterion O2!). Model R3B, which compared favorably to R3C regarding the criteria for

(11)

0e+00 1e+05 2e+05 0 5000 10000 15000 x=Calendar Time Reser v e V alue R3B(x) R3C(x) Real Value 0e+00 1e+06 2e+06 3e+06 0 5000 10000 15000 x=Calendar Time Reser v e V alue R3B(x) R3C(x) Real Value 0e+00 2e+07 4e+07 6e+07 0 5000 10000 15000 x=Calendar Time Reser v e V alue R3B(x) R3C(x) Real Value 0e+00 3e+07 6e+07 9e+07 0 5000 10000 15000 20000 x=Calendar Time Reser v e V alue R3B(x) R3C(x) Real Value

Figure 8: Total value of the individual reserves based on calendar time (x) for reserve models R3by type of claim,

respectively Minor, Moderate, Major and Catastrophic BBB

Reserve Models Total Cost Total Predicted Difference % Difference

R3A 2,062.9 3,921.1 1,858.3 90.08%

R3B 2,062.9 3,860.4 1,797.6 87.14%

R3C 2,062.9 3,239.0 1,176.2 57.02%

Table 2: Prediction versus observation (in millions $)

the link between the costs and the age of the loss thus seems to impact objective O1 more than objective O2.

Finally, the best model remains R3C. Despite the introduction of all possible covariates in the modeling, the

model nevertheless shows a huge difference of 57.02% between observed and predicted. As indicated in the introduction to this paper, when there is a significant growth in claims that depends on the age of the claim, it is

not possible to satisfy O1 and O2 simultaneously. The R3C model, even if it generates a significant difference

be-tween the predicted and the observed, nevertheless corresponds to the best possible model for micro-reserving prediction.

We can also show that the absolute difference of 1, 176.2 between the prediction and the observation simply corresponds

toPn

i=1 β1

λi. Conditionally to the satisfaction of criterion for objective O2, the only way to decrease this difference is

thus to decrease the value of β1, namely the increasing trend of the cost of claims according to its age. In this second

set of simulations, it is not possible to decrease this value. However, in practice, the trend factor β1(or any increasing

trend factor) can perhaps be reduced by introducing regressors which better explain the severity of the claim. As we will see in the practical application in the next section, age, gender, type of accident or any specific information about

the victim or the claim is likely to decrease the value of β1.

3 Application with Real Insurance Data

With simulated data based on known parameters, it might be possible to obtain an analytical form for the value of the actuarial liability at any time x for the portfolio, and we might also be able to compute the difference between what was predicted and what was observed. However, we cannot be so lucky with real insurance data: the impact of covariates on

(12)

random variables cannot be measured precisely, some important variables might be missing, the distribution of the age of the claim is not so well defined, and the link between the severity of a claim and its age can be caused by unobserved variables. In this section, real insurance data will be used to study how much objectives O1 and O2 are satisfied.

3.1 Description of Data

We are using a sample dataset from a specific segment of an insurance portfolio from a major Canadian insurance company. We analyze the the bodily-injury (BI) coverage. This coverage is protecting the insured from the financial

damages he can cause to a third-party (TP) when he is at least partially (legally) responsible for the loss1_{. BI coverage,}

compared with other forms of coverage in the Canadian P&C industry, has some particularities:

• The damages are usually paid to a third-party. Each TP involved in an accident has their own damages, and thus can receive separate indemnity. That means that a single accident can have multiple victims. Formally, a TP is called an exposure, and each accident is called a claim. Micro-level reserving method must estimate the ultimate cost of each exposure.

• The insurance company has a lot of structured and verified information about their own insureds: their date of birth, address, credit score, marital status, address of workplace, etc. However, for a TP, the same basic information is not so simple to obtain and is not automatically verified and validated. Obviously, if important payments are made to a TP, basic information about this TP might be available, but in an unformatted way. In other words, even if the insurer has a lot of information on the TP, and on the injury of the TP, sometimes, text-mining is needed to obtain simple information.

• As the damages are paid to a TP, a unique and final payment of indemnity is often seen. Indeed, throughout negotiations with the TP (or his or her lawyer) or after a court decision, the insurer will issue a unique payment for the suffered damage.

For our modeling study, we will make the following working assumptions:

1. We will restrict ourselves to the modeling of the indemnity payment, and we will suppose that all payments to an exposure are made at the closing date. This is similar to what we illustrated in 1, and also identical to what we supposed in our simulations study. As we will see it from the analysis of our dataset, this assumption is realistic.

2. We will not consider the potential dependence between exposures for the same claim. For BI coverage in automobile insurance, many exposures can be injured from the same car accident and we can expect some similarities between all the victims of the same accident. Future analyses where dependence between exposures, from the same claim, should be considered.

3. Even if we consider all the available covariates of the database, only a few of them are interesting.

4. We will suppose that the covariates do not change over time, and are fully available at the time the claim is reported to the insurer. For future studies, and when this kind of data will be available for individual claims reserving methods, dynamic covariates should be included to improve the model, such as the evolution of the injury over time, the hiring of a lawyer by the TP or the amount paid for allocated loss adjustment expenses (ALAE) before the final settlement.

3.1.1 Basic Statistics

We use exposures from claims with date of loss between 2011 to 2016 (for a total of 2000 days). This results in 23,807

exposures. Table 3.1.1 shows basic statistics for this database2. Indemnity paid corresponds to our interest variable S,

which depends on the age of the exposure at closure τ . Almost three quarters (74.2%) of all exposures closes with no indemnity paid. This can be explained by a precautionary principle, where the claim adjuster will automatically open an

1

Special situations where the insured is not responsible is also possible, but we do not want to explain all those situations in this paper

2

(13)

exposure for each TP involved in a claim, even if no damage seems apparent. Only 7.2% of exposures have more than one payment of indemnity, and more than 95% of the total amount of indemnity paid come from single payment at the closing date.

Figure 9 shows an histogram of S when the indemnity paid is positive, and the distribution of the age of exposure.

Average Std.Err Minimum Maximum 25th percentile 50th percentile 75th percentile

Indemnity Paid 17,985.21 85,747.89 0.00 ≈ 4,000,000.00 0.00 0.00 118.33

Age at closure 367.97 478.86 0.00 3,706.00 61.00 180.00 477.00

Table 3: Indemnity paid and Age at closure for the BI sample database

log(Indemnity) Number of e xposures 0 5 10 15 20 0 500 1000 1500

Age of the exposure

Number of e xposures 0 1000 2000 3000 0 2000 6000 10000

Figure 9: Distribution of the logarithm of the indemnity paid (for positive value of indemnity), and the distribution of the age of the exposure for a sample dataset

3.1.2 Covariates

Covariates are available to analyze the cost of the exposure, and the age of the exposure at closure. In Table 3.1.2, we refer to reporting delay that represents the time in days between the date of loss and the date of reporting it, while exposure delay refers to the time in days between the reporting date and the date the claim adjuster opens the exposure for a TP. To detect trends, or modification in claim managing, the date of loss is also an important continuous covariate to considerer in the modeling.

Figure 10 shows all the other available covariates for the modeling of τ and S, as well as the proportion of each modalities. At first sight, some variables might not seem to be predictive, but because we do not have a lot of covariates to include in the models, we kept them to see if they have some impacts.

• Kind of loss;

• How the claim was reported; • Severity of the injury: • Fault of the insured;

• Is the vehicle (of the insured) driveable after the accident;

• Are the airbags (of the insured’s vehicle) deployed during the accident;

Average Std. Err. Min. Max. 25th pct. 50th pct. 75th pct.

Reporting Delay 52.25 154.57 0.00 2 343.00 2.00 5.00 20.00

Exposure Delay 9.33 41.80 0.00 1 432.00 0.00 1.00 5.00

(14)

• Total number of exposures on the same claim; • Type of injury; • Presence of an ambulance. 0% 25% 50% 75%

Multi−vehicle Hit Pedestrian Single−vehicle Other

Kind of loss Propor tion 0.0% 10.0% 20.0% 30.0% 40.0% 50.0%

phone agent afterhours fax other

How reported Propor tion 0% 25% 50% 75%

Minor Moderate Major Fatal

Severity of the Injury

Propor tion 0% 20% 40% 60%

Insured at fault Insured not at fault Insured partly at fault

Fault Propor tion 0% 20% 40% Yes No Vehicle Driveable Propor tion 0% 20% 40% 60% 80% No Yes Airbags deployed Propor tion 0.0% 10.0% 20.0% 30.0% 40.0% 50.0% 1 2 3 4 5 6+

Number of expositions in claim

Propor tion 0% 20% 40% 60% 80%

Soft Tissue Other Fracture Pain Psycho. Brain Fatality

Propor tion 0% 25% 50% 75% 0 1

Propor

tion

Figure 10: Proportion of different modalities for available categorical variables Finally, for the modeling, we separated the database into a validation set (70%) and a test set (30%).

3.2 Modeling Approach

Our modeling strategy is a two-steps approach, where the age of the exposure at closure (τ ) will be modeled first, and the cost of exposure (S) will be modeled thereafter. With this two models, as already seen with equation (1), the reserve for each claim at time x will be defined and computed as:

R(x) = E[E[S|τ > x]] = 1

1 − Fτ(x)

Z ∞

x

E[S(t)]fτ(t)dt (2)

That means that we are looking to model only the mean of S(τ ), but that a defined distribution of the age of the claim at

closure (fτ(t)) might be needed if we want to analytically express the reserve as the expected future cost of an exposure

at any time x. In practice, this constraint of having an analytical equation to compute the reserve might be important: at any time, a claim adjuster or a claim manager can be interested in the reserve value of a specific claim, or the total liability value of the claim portfolio. As a result, models based on simulations, or on numerical approximations might not be a solution as they are not able to be quick enough to generate a number.

3.2.1 Age of Exposure at Closure

For our numerical analysis, we select the exponential, the gamma as well as the Weibull distribution for the distribution

fτ(t) of the age of the exposure at closure. Without any covariates, Figure 11 shows the qq-plots of these three

(15)

at the qqplot tells us that the exponential distribution underestimate the tail of the distribution, while the Weibull does the opposite. The gamma distributions seems well suited to model this random variable.

0 500 1000 1500 2000 2500 0 500 1000 1500 2000 Theoritical Quantiles Sample Quantile 0 500 1000 1500 2000 2500 0 1000 2000 Theoritical Quantiles Sample Quantile 0 500 1000 1500 2000 2500 0 1000 2000 3000 Theoritical Quantiles Sample Quantile

Figure 11: Qq-plots for the modeling of the age of exposure at closure, for the exponential, the gamma and the Weibull distributions

To improve the adjustment, we include covariates into the mean parameter of each of the three distribution. We tried several approaches, for example GLM, GAM, GAMLSS or classification trees, to estimate the parameters. We finally chose to use the model with the best adjustment, the random forest, with hyper-parameters calibrated by cross-validation. The most important variables to explain the age at closure are shown in Figure 12. The two most important variables are mainly related to the injury of the victim. This was expected given that a more severe injury will indeed be longer to settle. Using the estimated mean for each exposure, on the training set, we then estimated by maximum likelihood the dispersion parameters of the gamma (0.7267) and the Weibull distribution (0.8314), while the dispersion parameter of the exponential distribution is automatically 1. We chose not to include covariates in the dispersion parameters: we think this could be an interesting analysis to do in the future.

3.2.2 Cost of Indemnity for each Exposure

We then modeled the final cost S of each exposure. First, to understand the data, we chose to analyze the link between the age of the exposure at closure, and the final cost. To do this, a scatter graph is illustrated in Figure 13. Exposures have been grouped together by considering an interval of 50 days, and each point of the graph corresponds to the average of the indemnity paid for each group (that means that the number of exposures per point is not the same). We can see an almost linear relationship between the indemnity paid according to its age at closure. This relationship is not surprising because complex claims tend to take time before closing. On the other hand, as highlighted in the example at the beginning of the document, this relationship must be explained by explanatory variables if we want to satisfy both objectives O1 and O2 of the granular reserving models.

We then subsequently modeled the final cost of each exposure according to all available covariates. Several exploratory analyses have been carried out and we do not have enough space to show everything. However, to illustrate the situation, a simple random forest can show us the classification of the most important covariates that explain the total cost of indemnity. The result is shown in Figure 14. It shows that the age of the exposure at closure is by far the most important covariate, followed by the severity of the injury, and the type of injury. The level of fault and the date of loss also seem somewhat important, but the other covariates do not seem really predictive.

The importance of the age of the claim in a context of increasing cost trends, as illustrated in the introductory example, will make it impossible to expect both a good individual estimate of each exposure and an adequate total of reserve in the calculation of the actuarial liability. However, we still propose to build a granular reserve model adapted to the

(16)

AMBULANCE_CG HOWREPORTEDTYPE_TYPECODE EXPO_DELAY LOSSKIND_L_EN_CA NUMDATE FAULTRATING_L_EN_CA EXPO_COUNT REP_DELAY SEVERITY BODILYINJURYTYPE_DESC 0 10000 20000 30000 Importance AMBULANCE_CG LOSSKIND_L_EN_CA HOWREPORTEDTYPE_TYPECODE FAULTRATING_L_EN_CA EXPO_COUNT EXPO_DELAY REP_DELAY NUMDATE SEVERITY BODILYINJURYTYPE_DESC 0 50000000 100000000 150000000 200000000 250000000 Importance

Figure 12: Importance of variables from a random forest model for the age of the exposure at closure (respectively permutation and impurity)

●●●●●●●● ●●●● ●● ●● ●●● ● ● ●● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0 100000 200000 300000 0 1000 2000 3000

Age of the Exposure

Mean Indemnity Number of exposures ● ● ● 1000 2000 3000

Figure 13: Relation between the indemnity paid at the age of the exposure (at closure)

available data. This will allow us to explicitly see how the age of the exposure impacts O1 and O2, and will allow us to verify whether a more complex modeling of the reserve could minimize this impact.

In order to be able to calculate the reserve R(x) at any time x, according to the form described in equation (2), an explicit form of the mean as a function of x seems desirable. That means that a simple possibility is thus to take the following model:

E[S(τ )] = X0β + γ1τ

where τ is the age of the exposure at closure. Other covariates would be included in X0δ, while γ1would be used to

model the linear trend of the age of the exposure at closure. Then, to obtain the reserve at any time x, we simply have to integrate all possible closing times of the exposure. Concretely, by supposing for example a gamma distribution for the age of the exposure, we can thus obtain:

(17)

DRIVEABLE EXPO_COUNT NUMDATE EXPO_DELAY LOSSKIND_L_EN_CA REP_DELAY FAULTRATING_L_EN_CA SEVERITY BODILYINJURYTYPE_DESC AGE_EXPO 0 500000000 1000000000 Importance AMBULANCE_CG HOWREPORTEDTYPE_TYPECODE EXPO_COUNT EXPO_DELAY FAULTRATING_L_EN_CA REP_DELAY NUMDATE SEVERITY BODILYINJURYTYPE_DESC AGE_EXPO 0 5000000000000 10000000000000 Importance

Figure 14: Importance of variables from a random forest model for the cost of the exposure (respectively permutation and impurity) R(x) = 1 1 − Fτ(x; α, β) Z ∞ x E[S(t)]fτ(t)dt = 1 1 − Fτ(x; α, β) ( Z ∞ x (X0δ + γ1t)fτ(t)dt) = 1 1 − Fτ(x; α, β) (X0δS(x; α, β) + γ1 α βSτ(x; α + 1, τ )) (3)

with Fτ(x; α, β), Sτ(x; α, β) the cumulative and the survival functions of the gamma distributions with parameters

α, β. We could obviously add a term τ2_{in the estimation of the model, and we might wonder why we don’t also add}

another term τ3or more generally any function of τ to better understand the relationship between the cost of indemnity

and the age of the exposure. A smoothing function should even be considered. However, not all the splines could be

used if we want to easily computeR_x∞E[S(t)]f (t)dt.

To introduce better flexibility to our approach while being able to easily compute R(x), we chose to use multivariate adaptative splines regression (MARS) models. Without going into too much details (see Duncan et al. 2016 for an application of MARS model in actuarial sciences), the MARS model is a non-parametric technique that extends the linear regression model. The MARS model supposes the following weighted sum:

E[S] = p X

i=1

ciBi(y)

where each ci, i = 1, . . . , p are parameters to be estimated. The term Bi(y) is a function of covariate y. MARS can

handle both categorical and numeric covariates. For categorical covariate y, Bi(y) can represents a dummy function

(18)

the age of the exposure at closure τ , Bi(τ ) is a hinge function h(.), with Bi(τ ) = h(ai− τ ) = max(ai− τ, 0) or

Bi(τ ) = h(τ − ai) = max(τ − ai, 0) for a specific value of aiestimated from the data. Another interesting property of

the MARS model is that it can also include interaction between covariates. That means, for example, that the model can

include terms like Bi(τ, y) = h(ai− τ ) × h(y − bi), where both covariates τ and y are used in a single Bi(.) function.

As mentionned by Kuhn & Johnson (2013), MARS models have other interesting properties: the model does not need

data to be prepared before the modeling, the model automatically calibrates the hinge functions by selecting aiwhich

partition the data, the MARS model does automatic variables selection, etc. However, as we use hinge functions, note that fitted functions will not be smooth.

Applyinbg this to our context, we then suppose the following structure to model the cost of the exposure, which depends on τ : E[S(τ )] = n X j=1 cjBj(y) + m X k=1 ckBk(τ )

where we separate B() functions into 2 categories: those that depend on the age of the exposure at closure τ , and those

that do not. Because Bk(t) is a hinge function, it becomes easy to compute

R∞

x E[S(t)]fτ(t)dt. Indeed, we have:

R(x) = 1 1 − Fτ(x) Z ∞ x E[S(t)]fτ(t)dt = 1 1 − Fτ(x) Z ∞ x   n X j=1 cjBj(x) + m X k=1 ckBk(t)  fτ(t)dt = 1 1 − Fτ(x)   Z ∞ x n X j=1 cjBj(x)fτ(t)dt + Z ∞ x m X k=1 ckBk(t)fτ(t)dt   = 1 1 − Fτ(x)   n X j=1 cjBj(x)Sτ(x) + m X k=1 ck Z ∞ x Bk(t)fτ(t)dt  . (4)

Each elements of this last equation can be solved easily. For example, we could have:

Z ∞ x Bk(t)f (t)dt = Z ∞ x h(ak− t)f (t)dt = Z ∞ x max(ak− t, 0)fτ(t)dt = (Rak x (ak− t, 0)fτ(t)dt if x < ak 0 otherwise (5)

Depending on the distribution of τ , R(x) can then be easily computed for any possible values of x.

3.2.3 Parameters Estimation

The MARS model proposed in this paper requires that several hyperparameters be chosen before estimating the other parameters. Several techniques preprogrammed in various R packages allow, by cross-validation, to determine the level of interaction between covariates (degree) and the number of terms in the model (nprune). However, it is good to remember that our objective is not to find the best model, in the sense of the best relationship, between the covariates and E[S]. Instead, our objective is rather to set up a method of estimating individual reserves.

The measurement of the quality of the prediction must therefore be done, in our opinion, on a comparison between

the reserve of the exposure i, Ri(x) at any time x, with the final cost of this exposure Si. As x is not fixed, it is not

simple to define a measure of precision. So, while its ultimate cost Siis fixed and known (as it does not change for

(19)

Distribution of τ Degree Nb. Prune Age Limit Cost Limit P M

exponential 2 30 1950 210,000 51,217,558

gamma 2 32 2050 185,000 42,033,388

Weibull 2 32 2000 180,000 45,904,175

Table 5: Hyperparameters for the MARS models

simulations, the reserve Ri(x) can be different every day and must therefore be compared every day with the final value

Si. Consequently, the prediction measure to define is complex and can cause many problems. For example, given that

the exposure can be opened for a short or long period of time, the total length of observation of Ri(x) is not the same

for each i: the total prediction error for each exposure i should therefore consider some kind of averaging over the life of the exposure.

We considered many possible prediction measures. Among all the possibilities, we finally chose to define the prediction measure to minimize as:

P M (age) =

r Pm

x=1nx(Ri(t) − Si)2

m (6)

where x is defined as the age of the exposure, nxthe number of exposures still opened after x days and m the largest

value observed for the age at closure. The P M (age) measure corresponds to the sum of the daily difference between the reserve of all exposures aged x, and the total indemnity amount of these opened exposures, weighted by the number of opened exposures with age at closure greater than or equal to x. Using this measure P M (age), we calibrated the reserving model as follows:

1. We separated the validation dataset into 5 binds, where 4/5 are used to estimate the parameters and 1/5 is used to check for prediction;

2. We used many hyperparameters for the MARS model: (1) allowing interactions between B(.) functions or not

3_{, and (2) the number of prune terms.}

3. The MARS model, through hinge functions, finds a relation between S and τ . Even if the largest value of τ in the database is around 3500 days, the model can be used to estimate the expected value of S for value of τ far larger than 3500. Given that the reserve function at time x integrates all values of S for τ > x, the value of the reserve can be unjustifiably impacted by what the model projects for larger values of τ . We then include two other hyperparameters, namely an age limit, and the cost of the exposure at this limit. These hyperparameters determine the expected value of the exposure once the age of the exposure reaches this age limit.

4. For each set of hyperparameters, we estimate the parameters and compute the measure P M (age) on the prediction bind. We then obtain an average measure P M (age), on 5 binds, for each set of hyperparameters. We perform a grid search over almost 2000 combinations of hyperparameters for models with exponential, gamma and Weibull distributions for τ . The hyperparameters that generate the best tuning on the validation set are shown in Table 3.2.3. Based on the validation set, gamma distribution for the distribution of τ seems to offer the best adjustment, as compared to exponential and Weibull. Calibration also shows us that it is best to limit the value of the reserve for all exposures aged older than approximately 2000 days. All models are better when interaction of degree 2 is allowed, and models with 32 parameters (or 30 for the exponential) offer the best prediction measure. Without showing all the 32 parameters retained in the models, let’s say that 11 of those parameters are related to the age of the exposure at closure. Based on eqaution (4), that means that 11 integrations based on equation (5) have to be done for those models. Including hinge functions with different cutting values, and interaction with other categorical covariates (mainly the type of injury for our final model) allows the MARS approach to propose an interesting non-linear and complex relationship between the cost of the exposure and its age.

3

(20)

Using the hyperparameters shown previously, we estimated all the parameters from the validation set of our data, and then calculated predictions from the test set. To show the advantages of the MARS approach, we also included a simpler linear model based on the gamma distribution for τ , where only the age of the exposure at closure τ was included to model E[S]. The first simple model does not have a limit for the value of its individual reserve, while the other one supposes that claims older than 2050 days are reserved at 185, 000.

As was done in the simulations studies, and shown in Figure 3, we can see the evolution of the individual reserving models over time, as the exposure becomes older in Figure 15, where we compares it with the ultimate cost of all claims still opened after x days. We can see that there is almost no difference between the models based on exponential, gamma or Weibull distribution. Compared with simpler models, models based on the MARS approach seems to better predict the value of the total reserve, based on the age of the exposure.

Another way of comparing models, depending on its age, is to analyze the mean reserve of all opened claims for x = 1, . . . , X. Figure 16 illustrates it. We can see that the average reserve increases with the age of the exposure. All three MARS models seem to overestimate the reserve for old exposures. This is explained by the fact that the model was fitted on the validation set where old exposures had highest amount of claim. We can also see how the hyperparameters that limit the cost of old exposures affect the average reserves. Indeed, the graph on the right of the same figure illustrates that the average reserve of the simpler model without limit (the blue line) still increases for age older than 2050 days.

0 50000000 100000000 150000000 0 1000 2000 3000 Age of exposure T otal Reser v e/Ultimate Cost MARS expo. MARS gamma MARS Weibull Real value 0 50000000 100000000 150000000 0 1000 2000 3000 Age of exposure T otal Reser v e/Ultimate Cost LM (limit) LM (no limit) Real value

Figure 15: Reserve of each models depending on the age of the exposure

0 50000 100000 150000 200000 250000 0 1000 2000 3000 Age of exposure A v er age Reser v e/Ultimate Cost MARS expo. MARS gamma MARS Weibull Real value 0 50000 100000 150000 200000 250000 0 1000 2000 3000 Age of exposure T otal Reser v e/Ultimate Cost LM (limit) LM (no limit) Real value

Figure 16: Mean reserve of each models depending on the age of the exposure

3.3 Analyzing the Reserves

For each exposure, we have the loss date, the reported date, the opening date, and the ultimate amount of indemnity paid. This means that for each calendar day, starting from November 1st, 2011, we can compute the ultimate total amount of indemnity paid for all opened exposures in the portfolio. We can compare this value, again for each day, with the total amount of individual reserves. Results of this comparisons are shown in the first two columns of Table 3.3. The first column shows the difference, by day, between the total amount of individual reserves, and the total amount of

(21)

Distribution of τ MSEP - (open/day) % over/day MSEP - (close) Total close (/1000) % diff. MARS approach

exponential 157,798,070 71.78% 83,670.99 262,784.1 207.25%

gamma 141,419,027 72.06% 83,000.13 263,381.2 207.73%

Weibull 146,170,516 72.10% 83,081.56 267 204.8 210.74%

Simpler models, with onlyτ as regressor

gamma (no limits) 212,302,894 74.85% 87,656.37 338,243.4 266.77%

gamma (with limits) 202,570,297 74.31% 87,819.70 336,477.3 265.38%

Table 6: Reserves for the individual model, based on calendar time

indemnity that will be paid for opened exposures. The second column shows the proportion of days where the total reserve was higher than the total amount of ultimate indemnity. Those statistics are associated with objective O1: it shows that the total amount of reserves is similar to the total ultimate amount of indemnities.

Figure 17 illustrates how objective O1 is fulfilled, by day, for each model. We can again see that the distribution of τ does not have an important impact on the individual reserves. We can also see that the MARS models perform better than the simpler ones, but the difference is not so significant. We saw in the previous section of the paper, based on the age of the exposure, that the individual reserves seemed to be correctly calibrated, based on the distribution of τ , on the modeling of E[S(τ )] and on all available covariates. However, Figure 17 shows us that the total amount of individual reserves underestimates the actuarial liability until day 1500. Conversely, from day 1500 onward, the total amount of individual reserve overestimates the actuarial liability. This is not surprising: it is often the case with real insurance data, the portfolio is not homogeneous:

• Seasonality in the data;

• Trends in the severity of the exposure;

• There are almost certainly changes in the way each exposure is treated by the managing teams over time, and this might results in changes to the average age of the exposure before closing;

• Characteristics of the accidents change over time, meaning that the typical exposure is not the same in 2011 compared with 2015, in terms of covariates.

That means, for example, that on average the model might correctly fit and predict the data, but as shown as an example with Figure 8, it does not mean that all sub-elements of the portfolio are well adjusted. If the portfolio is heterogeneous over calendar time, it might highlight this misfitting. However, even more important, as mention at the beginning of the paper, the use of the age of the exposure at closure makes analysis difficult and complex. One possibility to correct this situation might be to use a quantile of S(τ ) instead of E[S(τ )] for the individual estimation. This modification, however, means that the value of each individual exposures should increase. As shown in the last three columns of Table 3.3, this means that it would make the situation worse related to objective O2. Indeed, on the day the exposure is closed, we compared the individual reserve value with the true amount paid. The third column of the table shows the average difference between the estimation and the observation. All models proposed in this paper, on average, makes an $ 83,000 prediction error, which is huge to say the least. The fourth and fifth columns show the impact on the whole portfolio. The total amount paid in the portfolio was $126,792,700. Even if the prediction error is large, we see that MARS models are way better than simpler models, meaning that improvements on the modeling can get closer to attaining objective O2 without, however, reaching it.

3.3.1 Discussion of IBNR Exposures

On all models and data analyzed, only opened exposure was used to compute the total amount of reserves for the portfolio. Given that we only used opened exposures to compared the total amount of individuals reserves with the total amount of indemnity ultimately paid, all of our previous analyses were coherent. However, if an insurer wants to know what its true actuarial liability is, incurred but not reported (IBNR) exposures, but also reported but not opened (RBNO) exposures should be included in the modeling. From this point of view, it is not enough to simply sum all individual

(22)

0 20000000 40000000 60000000 80000000 0 1000 2000 3000 4000 5000 Calendar Time T otal Reser v e/Ultimate Cost MARS expo. MARS gamma MARS Weibull Real value 0 20000000 40000000 60000000 80000000 0 1000 2000 3000 4000 5000 Calendar Time T otal Reser v e/Ultimate Cost LM (limit) LM (no limit) Real value

Figure 17: Reserve of each models depending on the calendar time

reserves of the claims portfolio to obtain the total reserve. Figure 18 shows the impact of IBNR and RBNO exposures on the total portfolio. We see that RBNO exposures are non significant, but IBNR exposures, as expected, are important in calculating the total actuarial liability.

0 20000000 40000000 60000000 0 1000 2000 3000 4000 5000 Calendar Time Ultimate Cost + RBNO + RBNO + IBNR REPORTED

Figure 18: Total value of ultimate indemnity, with INBR and RBNO, by calendar time

The purpose on this paper was to project the cost of individual exposures to see the impact of the age of the exposure at closure. To do so, IBNR (and RBNO) exposures were not considered. When the purpose of a granular reserving method is to project the ultimate indemnity for individual exposures, inclusion of IBNR and RBNO exposures is not important. However, if the individual reserve model has to be used for solvency purposes, or for ratemaking, for example, all exposures should be included, even exposures not yet reported, or not yet opened.

4 Conclusion

The development of micro-reserve methods has increased in recent years and a good number of these methods have been developed as alternatives to runoff triangles in the annual calculation of actuarial liabilities. Insurance companies quickly realized that other uses could be made of these approaches. We can thus imagine:

• For claims managers: real time follow up on opened claims to see their evolution; • Avoiding the subjective judgment of claims adjusters to estimate individual reserves;

• By being able to estimate the final cost of each claims, it becomes easier to analyze in real time the profitability of some insurance markets or insurance products (if IBNR exposures are included in one way or another in the model);

• Similarly, actuaries can directly use the claims estimates for ratemaking (if the IBNR exposures are included in one way or another in the model).

(23)

In this paper, we show that in certain circumstances, we should be careful before using micro-reserve models. Indeed, we show that when the average cost of a claim increases with its age, problem can arises. Indeed, in this case, it become impossible to have a granular reserving model which provides [O1] an adequate prediction of the amounts paid for each claim, and [O2] which provides an adequate value for the total actuarial liability.

With real data, coming from bodily injury coverage in automobile insurance, we show a real case where estimation of individual reserves becomes problematic. We proposed a new model to estimate the individual reserves in real time. By assuming that the all the indemnity is paid when the exposure closes, we developed a reserve model based on waiting time distribution with the MARS theory. The MARS approach provides a way to model the relationship between a random variable and a numeric covariate. By using hinge function, by automatically adjusting certain characteristics of the model, and by allowing interactions with other covariates, the MARS model is an interesting alternative to several other models used in data mining. In addition, a major advantage to using the MARS model is that it makes it possible to obtain an analytical form of the estimate of the reserve at all times, a condition often essential for being able to regularly use the individual reserve models in practice. A cross-validation approach was used to find the hyperparameters. The model proposed in this paper can be generalized in several ways. The MARS model is an extension of a linear regression model, and therefore assumes a gaussian distribution. Generalizations of the MARS model, in order to be able to instead use exponential family distributions might be an interesting avenue to look at. Moreover, because the cost of an exposure S should probably be better modeled in two parts, a model with a dichotomous part indicating whether or not there is a payment, and another part indicating the value of the compensation if there has been a payment, should be considered. Finally, as we saw in our reserve analysis based on calendar day, instead of evaluating the reserve as the expected value of S, a quantile might be needed.

In order to have a more precise individual reserve, we should use new covariates that better describe the claim, the exposure, and the injury. While the MARS approach proposed in this paper is constructed with static covariates known from the opening of the exposure, we believe that another modeling approach using a series of dynamic covariates will greatly improve the prediction quality of the model. Some possible covariates include the evolution of the injury over time, the presence of a lawyer, the partial payments, and the allocated loss adjustment expenses (ALAE) through the life of the exposure. It would be interesting to see how those new covariates will impact objectives O1 and O2. Such an approach, however, will probably require major changes to the model proposed here.

Acknowledgements

The author would like to thank Mathieu Pigeon for his comments, as well as Andra Crainic, Alexandre LeBlanc and Veronika Gousseva from Co-operators General Insurance Company for their advices. Finally, the author would like to thank the Co-operators Chair in Actuarial Risk Analysis, and the Natural Sciences and Engineering Research Council (NSERC) of Canada for their financial support.

References

[1] Arjas, E. The claims reserving problem in non-life insurance: some structural ideas. ASTIN Bulletin, 19(2). 1989. [2] Antonio, K., & Plat, R. Micro-level stochastic loss reserving for general insurance. Scandinavian Actuarial Journal,

2014(7), 649-669. 2014.

[3] Charpentier, A. & Pigeon, M. Macro vs. Micro methods in non-life claims reserving (an econometric perspective). Risks, 4(2):12, 2016.

[4] Duncan, I., Loginov, M., & Ludkovski, M. Testing alternative regression frameworks for predictive modeling of health care costs. North American Actuarial Journal, 20(1), 65-87, 2016.

[5] Duval, F., & Pigeon, M. Individual loss reserving using a gradient boosting-based approach. Risks, 7(3), 79. 2019. [6] Friedland, J. Estimating unpaid claims using basic techniques. Casualty Actuarial Society, vol. 201. 2010. [7] Huang, J., Qiu, C. & Wu, X. Stochastic loss reserving in discrete time: individual vs. aggregate data models

(24)

[8] Kuhn, M. & Johnson, K. Applied Predictive Modeling. New York, NY: Springer New York, 2013

[9] Norberg, R. A contribution to modeling of IBNR claims. Scandinavian Actuarial Journal, 155-203, 1986 [10] Pigeon, M., Antonio, K., & Denuit, M. Individual loss reserving with the multivariate skew normal framework.

ASTIN Bulletin, 43, 399-428, 2013

[11] Wuthrich, M. V. Machine learning in individual claims reserving. Scandinavian Actuarial Journal, vol. 2018, no 6, p. 465-480. 2018

[12] Wuthrich, M. V., & Merz, M. Stochastic claims reserving methods in insurance John Wiley & Sons., 2008 Wüthrich, M.V. (2018). Machine learning in individual claims reserving, Scandinavian Actuarial Journal, in press.