THE EDA PARADIGM

Many combinatorial optimization algorithms have no mechanism for capturing the relationships among the variables of the problem. The related literature has many papers proposing different heuristics in order to implicitly capture these relationships. GAs implicitly capture these relationships by concentrating samples on combinations of high-performance members of the current population through the use of the crossover operator. Crossover combines the information contained within pairs of selected ‘parent’ solutions by placing random subsets of each parent’s bits into their respective positions in a new ‘child’ solution. In GAs no explicit information is kept about which groups of variables jointly contribute to the quality of candidate solutions. The crossover operation is randomized and could disrupt many of these relationships among the variables; therefore, most of the crossover operations yield unproductive results and the discovery of the global optimum could be delayed.

On the other hand, GAs are also criticized in the literature for three aspects (Larrañaga, Etxeberria, Lozano & Peña, 2000):

• the large number of parameters and their associated preferred optimal selection or tuning process (Grefenstette, 1986);

• the extreme difficulty of predicting the movements of the populations in the search space; and

• their inability to solve the well-known deceptive problems (Goldberg, 1989).

Linkage Learning (LL) is the identification of groups of related variables (or Building Blocks, BBs). Instead of extending the GA, the idea of explicitly discovering these relationships during the optimization process itself has emerged from the roots of the GA community. One way to discover these relationships is to estimate the joint distribution of promising solutions and to use this estimate to generate new individuals. The general scheme of algorithms based on this principle is called the Estimation of Distribution Algorithm (EDA) (Mühlenbein & Paaß, 1996). In EDA there are no crossover or mutation operators; instead, the new population of individuals (solutions) is sampled from a probability distribution estimated from the selected individuals. This is the basic scheme of the EDA paradigm:

1. D_0 ← Generate N individuals (the initial population) randomly
2. Repeat for l = 1, 2, ... until a stop criterion is met:
   2.1. D^S_{l-1} ← Select S ≤ N individuals from D_{l-1} according to a selection method
   2.2. p_l(x) = p(x | D^S_{l-1}) ← Estimate the joint probability distribution of the selected individuals
   2.3. D_l ← Sample N individuals (the new population) from p_l(x)
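To make the scheme concrete, the following minimal Python sketch implements the loop above for binary problems, using truncation selection and an independent (univariate) model as the distribution estimate; the function and parameter names, and the particular selection rule, are our own choices for illustration, not part of the original formulation.

    import numpy as np

    def eda(fitness, n_vars, pop_size=100, n_selected=50, n_generations=50, seed=None):
        # Minimal EDA loop for binary variables (illustrative names and defaults).
        rng = np.random.default_rng(seed)
        # Step 1: D_0 <- N individuals generated uniformly at random.
        population = rng.integers(0, 2, size=(pop_size, n_vars))
        for _ in range(n_generations):
            # Step 2.1: select S <= N individuals (truncation selection on fitness).
            scores = np.array([fitness(ind) for ind in population])
            selected = population[np.argsort(scores)[-n_selected:]]
            # Step 2.2: estimate p_l(x); here, independent per-variable frequencies.
            probs = selected.mean(axis=0)
            # Step 2.3: D_l <- sample N new individuals from p_l(x).
            population = (rng.random((pop_size, n_vars)) < probs).astype(int)
        return population[np.argmax([fitness(ind) for ind in population])]

    # Example use on the OneMax function (maximize the number of ones).
    best = eda(fitness=lambda x: x.sum(), n_vars=20)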

However, estimating the distribution is a critical task in EDA. The simplest way to estimate the distribution of good solutions is to assume independence between the features of the problem: new candidate solutions are sampled by considering the frequencies of each variable's values independently of the remaining variables. Population Based Incremental Learning (PBIL) (Baluja, 1994), the Compact Genetic Algorithm (cGA) (Harik, Lobo & Goldberg, 1997), the Univariate Marginal Distribution Algorithm (UMDA) (Mühlenbein, 1997) and Bit-Based Simulated Crossover (BSC) (Syswerda, 1993) are four algorithms of this type. In our work we use the PBIL and BSC algorithms.

In PBIL, the probability distribution used to sample each variable of an individual of the new population is learned in the following way:

p_l(x_i) = (1 - α) · p_{l-1}(x_i | D_{l-1}) + α · p_{l-1}(x_i | D^S_{l-1})

• x_i is a value of the i-th variable X_i;

• p_{l-1}(x_i | D_{l-1}) and p_{l-1}(x_i | D^S_{l-1}) are the probability distributions of the i-th variable in the old population and among the selected individuals, respectively; and

• α is a user parameter, which we fix to 0.5.
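As an illustration, the sketch below applies this update rule to binary variables, with populations stored as 0/1 matrices; the function names are ours, and α defaults to the value 0.5 used in our experiments.

    import numpy as np

    def pbil_probabilities(old_population, selected, alpha=0.5):
        # p_l(x_i = 1) = (1 - alpha) * freq(x_i = 1 | D_{l-1})
        #              +      alpha  * freq(x_i = 1 | D^S_{l-1})
        p_old = np.asarray(old_population).mean(axis=0)  # marginals in the old population
        p_sel = np.asarray(selected).mean(axis=0)        # marginals among selected individuals
        return (1.0 - alpha) * p_old + alpha * p_sel

    def sample_from_marginals(probs, pop_size, seed=None):
        # Sample a new binary population, each variable drawn independently.
        rng = np.random.default_rng(seed)
        return (rng.random((pop_size, probs.size)) < probs).astype(int)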

For each possible value of every variable, BSC assigns a probability proportional to the sum of the evaluation function values of the individuals that take that value in the current generation:

p_l(x_i) = e(I_i) / e(I)

• e(I_i) is the sum of the evaluation function values of the individuals that take value x_i in variable X_i; and

• e(I) is the sum of the evaluation function values of all individuals.
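A minimal sketch of this estimate for binary variables follows, assuming non-negative evaluation values (a maximization setting) and a population stored as a 0/1 matrix; the function name is ours.

    import numpy as np

    def bsc_probabilities(population, evaluations):
        # p_l(x_i = 1) = (sum of evaluations of individuals with x_i = 1) / (sum of all evaluations)
        population = np.asarray(population)
        evaluations = np.asarray(evaluations, dtype=float)
        return (population * evaluations[:, None]).sum(axis=0) / evaluations.sum()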

The estimation of the joint probability distribution can also be done quickly without assuming independence between the variables of the problem (an assumption which, in some problems, is far from reality), by taking into account only dependencies between pairs of variables and discarding dependencies among larger groups of variables.

Efforts covering pairwise interactions among the features of the problem have produced several algorithms: MIMIC, which uses simple chain distributions (De Bonet, Isbell & Viola, 1997); the dependency trees used by Baluja and Davies (1997); and the Bivariate Marginal Distribution Algorithm (BMDA) created by Pelikan and Mühlenbein (1999). The method proposed by Baluja and Davies (1997) was inspired by the work of Chow and Liu (1968). In this chapter, we call TREE the dependency tree algorithm presented in the work of Chow and Liu (1968). In our work we use the MIMIC and TREE algorithms.

In MIMIC, the joint probability distribution is factorized by a chain structure.

Given a permutation Π = (i_1, i_2, ..., i_d) of the numbers between 1 and d, MIMIC searches for the permutation of the d variables whose chain distribution is closest, with respect to the Kullback-Leibler distance, to the empirical distribution of the selected individuals:

p_l(x) = p(X_{i_1} | X_{i_2}) · p(X_{i_2} | X_{i_3}) ··· p(X_{i_{d-1}} | X_{i_d}) · p(X_{i_d})
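A sketch of MIMIC's greedy chain construction is given below: the last variable of the permutation is the one with lowest empirical entropy, and each preceding variable is the remaining one with lowest conditional entropy given its successor. The helper names are ours, and the selected individuals are assumed to form a binary 0/1 matrix.

    import numpy as np

    def entropy(column):
        # Empirical entropy of one binary variable (in nats).
        p = column.mean()
        if p in (0.0, 1.0):
            return 0.0
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    def conditional_entropy(col_a, col_b):
        # Empirical conditional entropy H(A | B) for binary columns.
        h = 0.0
        for b in (0, 1):
            mask = col_b == b
            if mask.any():
                h += mask.mean() * entropy(col_a[mask])
        return h

    def mimic_chain(selected):
        # Greedy permutation: i_d = argmin H(X_j), then i_k = argmin H(X_j | X_{i_{k+1}}).
        selected = np.asarray(selected)
        remaining = set(range(selected.shape[1]))
        last = min(remaining, key=lambda j: entropy(selected[:, j]))
        order = [last]
        remaining.remove(last)
        while remaining:
            nxt = min(remaining,
                      key=lambda j: conditional_entropy(selected[:, j], selected[:, order[0]]))
            order.insert(0, nxt)
            remaining.remove(nxt)
        return order  # order = [i_1, ..., i_d]; p_l(x) factorizes along this chain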

The TREE algorithm induces the optimal dependency tree in the sense that, among all possible trees, its probabilistic structure maximizes the likelihood of the selected individuals when they are drawn from an unknown distribution. See Figure 1 for a graphical representation of the MIMIC and TREE models.

Figure 1: Graphical representation of MIMIC and TREE probabilistic models (MIMIC structure; TREE structure)
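The TREE model can be induced with the classical Chow-Liu procedure. The sketch below builds a maximum-weight spanning tree over pairwise mutual information using Prim's method, assuming binary 0/1 data; the helper names and the choice of root are ours.

    import numpy as np

    def mutual_information(col_a, col_b):
        # Empirical mutual information between two binary columns (in nats).
        mi = 0.0
        for a in (0, 1):
            for b in (0, 1):
                p_ab = np.mean((col_a == a) & (col_b == b))
                p_a = np.mean(col_a == a)
                p_b = np.mean(col_b == b)
                if p_ab > 0:
                    mi += p_ab * np.log(p_ab / (p_a * p_b))
        return mi

    def chow_liu_tree(selected):
        # Maximum-weight spanning tree over mutual information (Prim's method);
        # each non-root variable gets exactly one parent.
        selected = np.asarray(selected)
        d = selected.shape[1]
        in_tree = {0}            # arbitrary root: variable 0
        parents = {0: None}
        while len(in_tree) < d:
            best = max(((i, j, mutual_information(selected[:, i], selected[:, j]))
                        for i in in_tree for j in range(d) if j not in in_tree),
                       key=lambda t: t[2])
            parents[best[1]] = best[0]
            in_tree.add(best[1])
        return parents           # maps each variable to its parent in the tree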

Several approaches covering higher-order interactions use Bayesian networks to factorize the joint probability distribution of the selected individuals; EBNA (Etxeberria & Larrañaga, 1999) and BOA (Pelikan, Goldberg & Cantú-Paz, 1999) are two algorithms of this type. The multivariate algorithm FDA (Mühlenbein & Mahnig, 1999) uses a previously fixed factorization of the joint probability distribution. Harik (1999) presents the Extended compact Genetic Algorithm (EcGA), whose basic idea is to factorize the joint probability distribution into a product of marginal distributions of variable size; each of these marginal distributions is defined over the variables contained in the same group, together with the probability distribution associated with them. For a more complete review of the different EDA approaches, see the work of Larrañaga, Etxeberria, Lozano, Sierra, Inza and Peña (1999).
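To illustrate the marginal product idea behind EcGA, the following sketch samples new individuals from the empirical joint distribution of each variable group, assuming the partition into groups is already given; the search for that partition, which is the heart of EcGA, is not shown, and all names are ours.

    import numpy as np

    def sample_marginal_product_model(selected, groups, pop_size, seed=None):
        # 'groups' is an assumed, given partition of the variable indices into
        # disjoint lists; the joint distribution is the product of the empirical
        # joint distributions of the variables within each group.
        rng = np.random.default_rng(seed)
        selected = np.asarray(selected)
        new_pop = np.empty((pop_size, selected.shape[1]), dtype=selected.dtype)
        for group in groups:
            # Empirical distribution over the joint configurations of the group.
            configs, counts = np.unique(selected[:, group], axis=0, return_counts=True)
            idx = rng.choice(len(configs), size=pop_size, p=counts / counts.sum())
            new_pop[:, group] = configs[idx]
        return new_pop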
