HAL Id: hal-01885117
https://hal.archives-ouvertes.fr/hal-01885117
Submitted on 1 Oct 2018
Probability estimation by an adapted genetic algorithm in web insurance
Anne-Lise Bedenel, Laetitia Jourdan, Christophe Biernacki
To cite this version:
Anne-Lise Bedenel, Laetitia Jourdan, Christophe Biernacki. Probability estimation by an adapted
genetic algorithm in web insurance. LION 12 - Learning and Intelligent Optimization Conference,
Jun 2018, Kalamata, Greece. ⟨hal-01885117⟩
Anne-Lise Bedenel (1,2,3), Laetitia Jourdan (2), Christophe Biernacki (3)
(1) MeilleureAssurance, France, [email protected]
(2) Université Lille 1 CRIStAL, UMR 9189, France, [email protected]
(3) Inria, Université Lille 1, France, christophe.biernacki@{inria,math.univ-lille1}.fr
Abstract. In the insurance comparison domain, data constantly evolve, which makes them difficult to exploit directly. Indeed, most classical learning methods require the same data descriptors in both the learning and test samples. To meet business expectations, the online forms from which the data come are regularly modified. This constant modification of features and data descriptors makes statistical analysis more complex.
A first approach based on statistical methods has been developed. It relies on likelihood maximization and on model selection with the Bayesian information criterion. Unfortunately, this method is very expensive in computation time. Moreover, it requires all models to be compared exhaustively, which is practically unattainable, so the search space is limited to a specific family of models.
In this work, we propose to use a genetic algorithm (GA) specifically adapted to overcome the shortcomings of the statistical method, and we show its performance on real datasets provided by the company MeilleureAssurance.com.
Keywords: Genetic Algorithms, BIC, insurance, WEB
1 Introduction
The objective of online insurance comparators is to present each web user with the offer best suited to their expectations, according to their profile. Most online insurance comparators compare offers on a single criterion: the price. To improve the comparison for web users, the company MeilleureAssurance.com wishes to build a model that predicts the best offer according to web user profiles.
This is a classical objective in statistics, but the way an insurance comparator works prevents the use of standard methods. To obtain an online comparison, a web user has to fill in a form of questions. Once the form is filled in, the data are sent to partner insurers through a web service, so that they can send the real price of their offer back to the company. An insurance comparator regularly adapts and changes its forms:
– For insurers: each insurer has its own pricing system, so questions are not homogeneous across all partner insurers.
– For web users: questions are regularly adapted for clarity and simplicity.
This adaptability is specific to insurance comparators, and it makes building a supervised classification model on these features complex.
A first approach to this problem has been developed using statistical tools such as the likelihood to estimate the parameters. In this first work, the modeling involves many constraints and raises a model selection problem [1]. Model selection is carried out by an exhaustive search: with the statistical process, all available models are compared with an information criterion [2]. This method gives good results for both estimation and model selection. However, it is time-consuming, and the number of models to compare has to be limited. To avoid the exhaustive search, an optimization approach is considered.
Many papers in the literature use a genetic algorithm for model selection [8], [9], [10], [11] and for parameter estimation [12]. In this work, we propose a new genetic algorithm, together with new operators designed to make the metaheuristic as effective as possible, to overcome the shortcomings of the statistical method. Section 2 describes the statistical work and the modeling problem. Section 3 presents the algorithm and the proposed operators. Section 4 reports experiments comparing the statistical method and the genetic algorithm on simulated and real datasets. Section 5 gives conclusions and perspectives for future work.
2 Modeling of the problem
2.1 Probabilistic modeling
To introduce the general modeling, we study a use case with the following question:
“How do web users react when the data descriptors of a feature change?”
To answer this question, the feature Coverage levels is studied.
Initially, this feature had four data descriptors: {Third-party (T), Third-party++ (T++), third-party, fire and theft (TPFT), comprehensive (C)}. This feature is denoted by $X \in \{1, \dots, I\}$, where $I$ is the number of data descriptors ($I = 4$ in our use case). Later, it was split into seven data descriptors: {Third-party (T), Third-party+ (T+), Third-party++ (T++), third-party, fire and theft (TPFT), comprehensive (C), comprehensive+ (C+), comprehensive++ (C++)}. This feature is symmetrically denoted by $Y \in \{1, \dots, J\}$, where $J$ is the new number of data descriptors ($J = 7$ in our use case).
Another specificity of an insurance comparator is that it keeps no history of web users' data. Consequently, the available observations of X and Y are never matched. More precisely:
Period before the modification: $N^-$ web users have filled in the feature X, so there are observed realisations $X^- = (X_1^-, \dots, X_{N^-}^-)$. The feature Y has never been filled in, so its realisations $Y^- = (Y_1^-, \dots, Y_{N^-}^-)$ are unobserved.
Period after the modification: symmetrically, $N^+$ web users have filled in the feature Y, so there are observed realisations $Y^+ = (Y_1^+, \dots, Y_{N^+}^+)$. The feature X has never been filled in, so its realisations $X^+ = (X_1^+, \dots, X_{N^+}^+)$ are unobserved.
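The unmatched structure of the data can be illustrated with a small simulation. The Python sketch below is an illustration under assumptions, not the company's data pipeline: the marginal `p_x`, the transition matrix `trans` and the helper `simulate_unmatched` are hypothetical. Before the modification only $X^-$ is recorded; after it, $X^+$ is drawn but hidden and only $Y^+$ is kept, so no pair $(X_n, Y_n)$ is ever observed together.

```python
import random
from collections import Counter

def simulate_unmatched(p_x, trans, n_minus, n_plus, seed=0):
    """Return (X^-, Y^+): the only samples an insurance comparator observes."""
    rng = random.Random(seed)
    levels_x = range(len(p_x))
    levels_y = range(len(trans[0]))
    # Before the modification: web users fill in X only.
    x_minus = rng.choices(levels_x, weights=p_x, k=n_minus)
    # After the modification: X^+ exists but is never recorded...
    x_plus_hidden = rng.choices(levels_x, weights=p_x, k=n_plus)
    # ...and only Y^+, drawn from the row of the hidden X level, is observed.
    y_plus = [rng.choices(levels_y, weights=trans[i])[0] for i in x_plus_hidden]
    return x_minus, y_plus

# Hypothetical parameters for I = 4 levels of X and J = 7 levels of Y.
p_x = [0.3, 0.2, 0.3, 0.2]
trans = [
    [0.6, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.7, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
x_minus, y_plus = simulate_unmatched(p_x, trans, n_minus=1000, n_plus=800)

# Observed counts N_i^- and N_j^+.
count_x, count_y = Counter(x_minus), Counter(y_plus)
n_i_minus = [count_x[i] for i in range(len(p_x))]
n_j_plus = [count_y[j] for j in range(len(trans[0]))]
```

Only the counts $N_i^-$ and $N_j^+$ summarize what is observed; the individual links between X and Y are lost.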
2.2 Parameters estimation and models selection
We assume that each pair $(X_n^*, Y_n^*)$, with $n = 1, \dots, N^*$ and $* \in \{-, +\}$, is an independent and identically distributed realization of the pair $(X, Y)$. The distribution of $(X, Y)$ can be written as:
$$P(X = i, Y = j) = p_{ij}\, p_i \tag{1}$$
where $p_{ij} = P(Y = j \mid X = i)$ and $p_i = P(X = i)$, with $i = 1, \dots, I$ and $j = 1, \dots, J$. The quantities of interest here are the transition probabilities $p_{ij}$ between the data descriptors of X and Y. The objective is to estimate the whole set of transition probabilities $p_{..} = (p_{ij})$. Note that $p_{.} = (p_i)$ is also an unknown parameter.
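Equation 1 implies that the distribution of Y is the mixture $P(Y = j) = \sum_{i=1}^{I} p_{ij}\, p_i$, which is all that the after-period data $Y^+$ can reveal about $p_{..}$. A minimal Python sketch with hypothetical values ($I = 2$, $J = 3$); the function names are illustrative only:

```python
def joint_distribution(p_x, trans):
    """P(X = i, Y = j) = p_ij * p_i (Equation 1)."""
    return [[trans[i][j] * p_x[i] for j in range(len(trans[i]))]
            for i in range(len(p_x))]

def y_marginal(p_x, trans):
    """P(Y = j) = sum_i p_ij * p_i, obtained by summing the joint over X."""
    joint = joint_distribution(p_x, trans)
    return [sum(row[j] for row in joint) for j in range(len(trans[0]))]

# Hypothetical example: two levels of X, three levels of Y.
p_x = [0.25, 0.75]
trans = [[0.5, 0.5, 0.0],
         [0.0, 0.2, 0.8]]
p_y = y_marginal(p_x, trans)
```

Many different matrices `trans` can produce the same `p_y`, which is the identifiability issue discussed next.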
Fig. 1. Graph of possible matchings between the X feature before modification and the Y feature after modification.
The set of transition (matching) probabilities $p_{..}$ is illustrated by the graph in Fig. 1, where the oriented edges represent the parameters estimated in the use case. Note that the number of parameters in $p_{..}$ is larger than the number of parameters of the distribution of Y. The model is therefore statistically over-parameterized and has multiple solutions, whose range can be explored by repeating the optimization from different starting points. More precisely, in the use case there are 28 matching probabilities (hence 24 free parameters, since each of the 4 rows sums to one).
However, the distribution of Y has only 6 free parameters, so the model is said to be unidentifiable. To obtain an identifiable model, constraints have to be imposed on $p_{..}$ to limit the number of free parameters to 6 or fewer ($\dim(p_{..}) \le J - 1$). To satisfy this identifiability constraint, it has been proposed to fix some values of $p_{..}$ to 0. Each such set of constraints defines a model, denoted m, which leads to a set of models $\mathcal{M} = \{m\}$ to be compared. For a given model m, the parameters $p_{..}$ are estimated by maximizing the log-likelihood of the observed data [3], defined by:
$$\ell_m(p_{..}, p_{.}; X^-, Y^+) = \sum_{j=1}^{J} N_j^+ \ln\Big(\sum_{i=1}^{I} p_{ij}\, p_i\Big) + \sum_{i=1}^{I} N_i^- \ln p_i \tag{2}$$
where $N_i^- = \#\{n : X_n^- = i,\ n = 1, \dots, N^-\}$ and $N_j^+ = \#\{n : Y_n^+ = j,\ n = 1, \dots, N^+\}$.
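Equation 2 translates directly into code once the counts $N_i^-$ and $N_j^+$ are available. The Python sketch below only evaluates $\ell_m$ for given parameter values; the maximization itself (for example with a constrained numerical optimizer) is not shown, and the function name is hypothetical.

```python
import math

def log_likelihood(trans, p_x, n_i_minus, n_j_plus):
    """Observed-data log-likelihood l_m of Equation 2.

    trans[i][j] = p_ij, p_x[i] = p_i, n_i_minus[i] = N_i^-, n_j_plus[j] = N_j^+.
    Transitions fixed to 0 by the model m are simply zeros in `trans`.
    """
    ll = 0.0
    # Y^+ term: each observed Y level has probability sum_i p_ij * p_i.
    for j, n in enumerate(n_j_plus):
        if n == 0:
            continue
        mix = sum(trans[i][j] * p_x[i] for i in range(len(p_x)))
        if mix <= 0.0:
            return -math.inf  # an observed Y level that the model cannot produce
        ll += n * math.log(mix)
    # X^- term: X is observed directly before the modification.
    for n, p in zip(n_i_minus, p_x):
        if n == 0:
            continue
        if p <= 0.0:
            return -math.inf
        ll += n * math.log(p)
    return ll

# Hypothetical toy data with I = J = 2.
ll = log_likelihood([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], [2, 2], [1, 3])
```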
Maximizing this log-likelihood yields the estimators $\hat p_{.}$ and $\hat p_{..}$ of $p_{.}$ and $p_{..}$. To choose the best model m in the set $\mathcal{M}$, the conditional BIC (Bayesian Information Criterion) [2] is used:
$$\mathrm{BIC}_m = -2\,\ell_m(\hat p_{..}, \hat p_{.}; Y^+ \mid X^-) + \nu_m \ln N^+ \tag{3}$$
where $\nu_m = \dim(m)$ is the number of free parameters of the model m.
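A sketch of Equation 3 in Python. It assumes that the conditional log-likelihood $\ell_m(\hat p_{..}, \hat p_{.}; Y^+ \mid X^-)$ is the full log-likelihood of Equation 2 minus the log-likelihood of $X^-$ alone, i.e. the $Y^+$ term; this reading, and the helper names, are assumptions rather than details given in the text.

```python
import math

def conditional_log_likelihood(trans, p_x, n_j_plus):
    """Assumed form of l_m(Y^+ | X^-): the Y^+ term of Equation 2."""
    total = 0.0
    for j, n in enumerate(n_j_plus):
        if n == 0:
            continue
        mix = sum(trans[i][j] * p_x[i] for i in range(len(p_x)))
        if mix <= 0.0:
            return -math.inf
        total += n * math.log(mix)
    return total

def bic(trans_hat, p_x_hat, n_j_plus, nu_m):
    """Equation 3: BIC_m = -2 l_m(...; Y^+ | X^-) + nu_m ln N^+ (lower is better)."""
    n_plus = sum(n_j_plus)
    fit = conditional_log_likelihood(trans_hat, p_x_hat, n_j_plus)
    return -2.0 * fit + nu_m * math.log(n_plus)

# Hypothetical comparison: same fit, different numbers of free parameters.
n_j_plus = [10, 10]
bic_a = bic([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], n_j_plus, nu_m=0)
bic_b = bic([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5], n_j_plus, nu_m=2)
```

Both hypothetical models explain the toy data equally well, so the penalty $\nu_m \ln N^+$ decides in favour of the sparser one.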
The model $\hat m$ with the lowest BIC value is selected. This method gives good results for both estimation and model selection [1]. However, it is an exhaustive method and is time-consuming, because it has to handle two problems: finding the values of the estimated parameters, and selecting the best model. The exhaustive method thus combines a continuous problem (parameter estimation) for each model with a combinatorial problem (model selection). For example, in the use case, the feature X has 4 levels and Y has 7 levels. Choosing the 6 free parameters among the 6 × 4 = 24 candidate transition probabilities therefore gives $\binom{6 \times 4}{6} = 134\,596$ possible models.
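The parameter and model counts of the use case can be checked with a few lines of Python (3.8+ for `math.comb`). The sketch assumes, consistently with the model family defined below, that the $6 \times 4$ candidates are the transitions towards the first $J - 1$ levels of Y, the last level being fixed in every row.

```python
from math import comb

I, J = 4, 7  # levels of X (before) and Y (after) in the use case

n_matching = I * J           # all transition probabilities p_ij
n_free_full = I * J - I      # minus one sum-to-one constraint per X level
n_free_y = J - 1             # free parameters of the distribution of Y
print(n_matching, n_free_full, n_free_y)  # 28 24 6

# Assumed reading of the count: (J - 1) * I candidate transitions,
# of which J - 1 are chosen as free parameters.
n_models = comb((J - 1) * I, J - 1)
print(n_models)  # 134596
```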
For each model, the probability values have to be estimated, so there are 134 596 continuous problems to solve, after which all models have to be compared to select the best one. Some of these possibilities are not allowed, because each X level and each Y level has to be joined. This reduces the number of possibilities, but not enough to make the exhaustive method feasible. To make it feasible, only a specific model family is compared, defined by:
– The number of free parameters is fixed and equal to the number of Y levels minus one.
– The parameters are probabilities, so for each level $X_i$ the transition probabilities must sum to one. This constraint imposes, for each level $X_i$, that the transition probability towards the last level of Y equals $p_{iJ} = 1 - \sum_{j=1}^{J-1} p_{ij}$, as illustrated in Figure 1. The last level of Y is therefore not a free parameter: it cannot be set to 0 and is not estimated.
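The two constraints defining the model family can be sketched in Python as follows. The helpers `complete_row` and `in_family` are hypothetical names; the sketch assumes that a model stores, for each X level, only its first $J - 1$ transitions, the last one being deduced from the sum-to-one constraint. The condition that every level must be joined is not checked here.

```python
def complete_row(free_probs):
    """Append p_iJ = 1 - sum_{j<J} p_ij to the first J - 1 transitions of one X level.

    Returns None when the row cannot be a probability distribution or when
    p_iJ would be 0, which the model family forbids.
    """
    if any(p < 0.0 or p > 1.0 for p in free_probs):
        return None
    last = 1.0 - sum(free_probs)
    if last <= 0.0:
        return None
    return list(free_probs) + [last]

def in_family(free_part, j_levels):
    """Hypothetical check: exactly J - 1 non-zero (estimated) entries
    among the first J - 1 columns of all rows."""
    n_free = sum(1 for row in free_part for p in row if p != 0.0)
    return n_free == j_levels - 1

row = complete_row([0.2, 0.0, 0.3, 0.0, 0.0, 0.0])  # last entry is 1 - 0.5
model = [
    [0.2, 0.0, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.5, 0.2],
]
```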
Even with these constraints, the exhaustive method remains time-consuming. For example, in the use case, estimating and comparing 4 095 models takes 1 h 07 min, which is very long in a business context. Moreover, MeilleureAssurance.com may have features with more levels, hence more parameters to estimate and a larger combinatorial problem. The objective is to reduce the computation time of estimation and comparison, which makes a non-exhaustive search attractive. The method must handle the constraints flexibly while still delivering a high-quality solution, and a stochastic method meets these expectations. Moreover, given the nature of the problem, a population-based metaheuristic (P-metaheuristic), and more specifically an evolutionary method, is well suited. To challenge the statistical method, we propose to use a genetic algorithm [16].
3 Algorithms
A genetic algorithm should allow a good solution to be obtained quickly. Moreover, genetic algorithms are well suited to real-world problems. To speed up convergence, a steady-state genetic algorithm (ssGA) [15] is used.
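The steady-state scheme can be sketched as a generic loop in Python. This is a minimal illustration of one common ssGA variant, not the authors' implementation: each iteration produces a single offspring, which replaces the worst individual only if it has a lower (better) fitness. The toy operators (`tournament`, `blend`, `gauss_mutation`) and the `sphere` objective are hypothetical stand-ins for the BIC-based operators of Section 3.2.

```python
import random

def steady_state_ga(init_pop, fitness, select, crossover, mutate, n_iter, seed=0):
    """One offspring per iteration; it replaces the worst individual if better."""
    rng = random.Random(seed)
    pop = [(fitness(ind), ind) for ind in init_pop]
    for _ in range(n_iter):
        parent_a = select(pop, rng)
        parent_b = select(pop, rng)
        child = mutate(crossover(parent_a, parent_b, rng), rng)
        f_child = fitness(child)
        # Steady-state replacement: only the worst member can be overwritten.
        worst = max(range(len(pop)), key=lambda k: pop[k][0])
        if f_child < pop[worst][0]:  # minimisation, as with the BIC
            pop[worst] = (f_child, child)
    return min(pop, key=lambda entry: entry[0])

# --- Toy operators (hypothetical, for illustration only) ---
def tournament(pop, rng):
    """Keep the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a[1] if a[0] <= b[0] else b[1]

def blend(x, y, rng):
    """Arithmetic crossover with a random weight."""
    w = rng.random()
    return [w * xi + (1.0 - w) * yi for xi, yi in zip(x, y)]

def gauss_mutation(x, rng, sigma=0.1):
    """Add small Gaussian noise to every gene."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def sphere(v):
    return sum(t * t for t in v)

init_rng = random.Random(1)
init = [[init_rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(20)]
best_f, best = steady_state_ga(init, sphere, tournament, blend, gauss_mutation, n_iter=2000)
```

Because only the worst individual can be replaced, the best fitness in the population never gets worse, which is one reason steady-state variants tend to converge quickly.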
3.1 Encoding and evaluation
A genetic algorithm is defined by a potential solution, a population, an environment (search space) and a fitness function.
Definition of the solution The solution is defined from the probabilistic modeling. A model m is a set of transition probabilities $p_{..}$, each of which is either estimated or fixed to 0. Working with probabilities implies that each estimated probability $p_{ij}$ is a real number in $[0, 1]$. A solution corresponds to a model.
Figure 2 shows an example of a model m and Figure 3 its matrix form. In a solution, the probabilities must sum to 1 for each X level. For example, in Figure 3 the level T of X is matched to the levels T and T++ of Y with estimated probabilities 0.6 and 0.4.
Fig. 2. Graph of possible matchings between the X feature before modification and the Y feature after modification for a fixed model m.
X \ Y   T     T+    T++   TPFT  C     C+    C++
T       0.6   0     0.4   0     0     0     0
T++     0     0.9   0     0     0     0     0.1
TPFT    0     0     0     0.7   0     0.3   0
C       0     0     0     0     1     0     0
Fig. 3. Transition matrix corresponding to the model m between the levels of X and Y.
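The real-valued encoding of Fig. 3 can be sketched in Python as a list of rows, one per X level. The level names follow the use case; the helper functions are hypothetical and only illustrate the constraints stated above (entries in [0, 1], rows summing to one) and the fact that the position of the zeros defines the structure of the model. The free-parameter count assumes $\nu_m$ is the number of non-zero entries minus one per row.

```python
LEVELS_X = ["T", "T++", "TPFT", "C"]
LEVELS_Y = ["T", "T+", "T++", "TPFT", "C", "C+", "C++"]

# Transition matrix of the model m of Fig. 3: rows are X levels, columns Y levels.
solution = [
    [0.6, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.7, 0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]

def is_valid_solution(matrix, tol=1e-9):
    """Every entry lies in [0, 1] and every X row sums to one."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in matrix
    )

def edges(matrix):
    """Non-zero transitions as (X level, Y level, probability): the graph of Fig. 2."""
    return [(LEVELS_X[i], LEVELS_Y[j], p)
            for i, row in enumerate(matrix)
            for j, p in enumerate(row) if p > 0.0]

def n_free_parameters(matrix):
    """nu_m: in each X row, the non-zero entries minus the sum-to-one constraint."""
    return sum(max(sum(1 for p in row if p > 0.0) - 1, 0) for row in matrix)
```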
Population The population is composed of models from the search space, i.e. the set of models $\mathcal{M}$. Unlike in the statistical method, the set $\mathcal{M}$ does not need to be reduced.
Fitness function To evaluate a solution (model), the BIC defined by Equation 3 is used as the fitness function, which has to be minimized.
3.2 Operators
A genetic algorithm relies on three operators: selection, crossover and mutation. The choice of these operators depends on the encoding, and a real encoding is used for this problem. Selection is performed with a classical operator. The choice of crossover and mutation operators is more complex, and these operators have to be adapted to the problem: models contain many zeros, and the statistical process shows that the position of the zeros carries real information. The first objective is therefore to design operators that handle this zero information.
Selection operator Selection is performed with a classical operator, Binary Tournament Selection [6]. This operator randomly draws two solutions $m_1$ and $m_2$ from the population, evaluates them with $\mathrm{BIC}_m$, and keeps the better one $\hat m$, i.e. the solution with the lower BIC:
$$\hat m = \operatorname*{argmin}_{m \in \{m_1, m_2\}} \mathrm{BIC}_m \tag{4}$$
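Equation 4 can be sketched in Python as follows. The function name and the `bic_of` callback (returning $\mathrm{BIC}_m$ for a solution) are hypothetical; ties are resolved in favour of the first drawn solution, a choice the text does not specify.

```python
import random

def binary_tournament(population, bic_of, rng=random):
    """Binary Tournament Selection (Equation 4): draw two distinct solutions
    uniformly at random and keep the one with the lower BIC."""
    m1, m2 = rng.sample(population, 2)
    return m1 if bic_of(m1) <= bic_of(m2) else m2

# Hypothetical population of three models with precomputed BIC values.
bics = {"m_a": 120.5, "m_b": 98.2, "m_c": 130.0}
winner = binary_tournament(list(bics), bics.get, random.Random(0))
```

The model with the highest BIC can never win a tournament, while weaker-but-not-worst models keep some chance of being selected, which preserves diversity.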
Crossover operators Two crossover operators relying on real encoding are compared to find the more efficient one:
– Uniform crossover [5]: this operator is very simple: when crossover is applied (according to the crossover probability), each parameter is exchanged between the two parents with probability 0.5.
– SBX crossover (Simulated Binary Crossover) [4], [13]: SBX is an operator that adapts to the evolutionary algorithm according to the fitness of the parents and their offspring. From two parents $p_1(i)$ and $p_2(i)$, it generates two offspring $c_1(i)$ and $c_2(i)$ by the following relations:
$$c_1(i) = 0.5\,\big[(1 + \beta)\,p_1(i) + (1 - \beta)\,p_2(i)\big]$$
$$c_2(i) = 0.5\,\big[(1 - \beta)\,p_1(i) + (1 + \beta)\,p_2(i)\big]$$
where β is a spread factor given by:
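$$\beta = \begin{cases} (2u)^{\frac{1}{\eta+1}} & \text{if } u < 0.5,\\[4pt] \left(\dfrac{1}{2(1-u)}\right)^{\frac{1}{\eta+1}} & \text{otherwise.} \end{cases}$$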