
2.2 WHAT IS SOFT COMPUTING?

2.2.5 Genetic algorithms

Genetic algorithms (GAs) [53, 54] are adaptive and robust computational search procedures, modeled on the mechanics of natural genetic systems. They act as a biological metaphor and try to emulate some of the processes observed


in natural evolution. While evolution operates on encodings of biological entities in the form of a collection of genes called a chromosome, GAs operate on string representations of possible solutions in terms of individuals or chromosomes containing the features. The feature value, the string structure, and the string structure's decoded value in a GA correspond, respectively, to the allele, genotype, and phenotype in natural evolution.

The components of a GA consist of

• Population of individuals

• Encoding or decoding mechanism of the individuals

• Objective function and an associated fitness evaluation criterion

• Selection procedure

• Genetic operators like recombination or crossover, mutation

• Probabilities to perform the genetic operations

• Replacement technique

• Termination conditions

Let us consider, as an example, the optimization of a function y = f(x_1, x_2, ..., x_p). A binary vector is used as a chromosome to represent real values of the variables x_i, with the length of the vector depending on the required precision.

A population is a set of individuals (chromosomes) representing the concatenated parameter set x_1, x_2, ..., x_p, where each member refers to a coded possible solution. For example, a sample chromosome

0000|0100|...|1100

could correspond to x_1 = 0000, x_2 = 0100, and x_p = 1100. The chromosomes can be of fixed or variable size. Selection obeys the Darwinian survival-of-the-fittest strategy, with the objective function playing the role of Nature (the environment). Variation is introduced into the population through genetic operations such as recombination (crossover) and mutation. Normally the initial population is chosen randomly.

Encoding is used to convert parameter values into a chromosomal representation. In the case of continuous-valued parameters, a decimal-to-binary conversion is used. For example, using a 5-bit representation, 13 is encoded as 01101. In the case of parameters taking categorical values, a particular bit position in the chromosomal representation is set to 1 if the value comes from the corresponding category. For example, the gender of a person can take values from {male, female}, such


that male/female is represented by the string 10/01. These strings (representing the parameters of a problem) are concatenated to form a chromosome.
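The encoding scheme just described can be sketched as follows. This is an illustrative sketch, not code from the book; the helper names (encode_continuous, encode_categorical) are hypothetical, while the 5-bit and {male, female} examples follow the text.

```python
def encode_continuous(value, bits_used=5):
    """Encode a non-negative integer parameter as a fixed-width bit string."""
    return format(value, "0{}b".format(bits_used))

def encode_categorical(value, categories):
    """One bit position per category: set the matching position to 1."""
    return "".join("1" if value == c else "0" for c in categories)

# 13 in a 5-bit representation, as in the text:
print(encode_continuous(13))                              # 01101
# Gender from {male, female}: male -> 10, female -> 01
print(encode_categorical("male", ["male", "female"]))     # 10
# Concatenating the parameter strings yields the chromosome:
print(encode_continuous(13) + encode_categorical("female", ["male", "female"]))  # 0110101
```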

Decoding is the reverse of encoding. For a continuous-valued parameter the binary representation is converted to a continuous value by the expression

lower_bound + (decoded integer value / (2^bits_used - 1)) * (upper_bound - lower_bound).

Hence 01101 in five bits (bits_used = 5) is decoded back to 13, using lower_bound = 0 and upper_bound = 31. In the case of categorical parameters, the value is found by consulting the original mapping.

The fitness function provides a measure of a chromosome's performance.

Selection gives more chance to better-fitted individuals, thereby mimicking the natural selection procedure. Some of the popular selection techniques include roulette wheel selection, stochastic universal sampling, linear normalization selection, and tournament selection. The roulette wheel selection procedure initially sums the fitness values f_i of all the N chromosomes in the population, and stores them in slots sized accordingly. Let this sum be total_fitness. The probability of selection p_i for the ith chromosome is expressed as

p_i = f_i / total_fitness,    (2.30)

while the cumulative probability q_i after inclusion of the ith chromosome is given by

q_i = Σ_{j=1}^{i} p_j.

Selection is made by spinning the roulette wheel N times, on each occasion generating a random number nr in [0, total_fitness]. The spin returns the first chromosome whose fitness, when added to the fitness of the preceding population members, is greater than or equal to nr. Equivalently, drawing nr in [0, 1] and using the cumulative probabilities, we have in rule form

IF nr ≤ q_1 THEN select the first chromosome,

ELSE select the ith chromosome such that q_{i-1} < nr ≤ q_i.

For example, let there be five chromosomes with fitness values 40, 30, 18, 10, and 2, so that total_fitness = 100. These constitute slots sized 40%, 30%, 18%, 10%, and 2% of the area of the wheel. Each time a chromosome must be selected, for applying crossover or mutation, a simple spin of the roulette wheel is made with a random nr. Here, with nr = 45, the algorithm selects the second chromosome, since 40 < 45 ≤ 40 + 30.
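The roulette wheel procedure can be sketched as below; the function name is hypothetical, and the five-chromosome example reproduces the one in the text.

```python
import random

def roulette_select(fitnesses, nr=None):
    """Return the index of the first chromosome whose cumulative fitness
    reaches the spin value nr in [0, total_fitness]."""
    total_fitness = sum(fitnesses)
    if nr is None:
        nr = random.uniform(0, total_fitness)  # one spin of the wheel
    cumulative = 0.0
    for i, f in enumerate(fitnesses):
        cumulative += f
        if cumulative >= nr:
            return i
    return len(fitnesses) - 1

# The five-chromosome example from the text, with nr = 45:
print(roulette_select([40, 30, 18, 10, 2], nr=45))   # 1 (the second chromosome)
```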

Recombination or crossover is modeled by choosing mating pairs from the selected chromosomes. Crossover probability pc is used to determine whether a pair should be crossed over, and then the corresponding chromosome segments are interchanged. A random number nrc is generated in the range [0, 1]. If


nrc < pc, the corresponding chromosome pair is selected for crossover. Again, crossover can be one-point, two-point, multipoint, or uniform. Let us consider, as an example, two parent chromosomes xyxyxyxy and abababab, where x, y, a, b are binary. In one-point crossover at the 6th bit, involving the parent chromosomes

xyxyx|yxy ababa|bab,

one generates the children

xyxyx|bab ababa|yxy.

Here the segment involving bits 6 to 8 is interchanged between the parents.

In the case of two-point crossover at the 3rd and 6th bits, involving the parent chromosomes

xy|xyx|yxy ab|aba|bab,

we obtain the children chromosomes

xy|aba|yxy ab|xyx|bab.

Here the segment constituting bits 3 to 5 is swapped between the parents to generate the pair of offspring.
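Both crossover variants above amount to simple string surgery; the following sketch (with hypothetical function names) reproduces the two examples from the text.

```python
def one_point_crossover(p1, p2, point):
    """Exchange the tails of two parent strings after the given cut point."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, a, b):
    """Exchange the middle segment with indices [a, b) between two parents."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

# One-point crossover at the 6th bit (cut after the 5th position):
print(one_point_crossover("xyxyxyxy", "abababab", 5))    # ('xyxyxbab', 'ababayxy')
# Two-point crossover swapping bits 3 to 5 (indices 2..4):
print(two_point_crossover("xyxyxyxy", "abababab", 2, 5)) # ('xyabayxy', 'abxyxbab')
```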

Mutation is used to introduce diversity into the population. The mutation probability pm determines whether a bit should be mutated, in which case the corresponding location is flipped. For example, a mutation at the 3rd bit would transform the chromosome 00|1|000 to 00|0|000. The probabilities pc and pm can be fixed or variable; they typically take values in the ranges 0.6 to 0.9 and 0.001 to 0.01, respectively.
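A minimal sketch of bitwise mutation follows; the function names are hypothetical, and the 3rd-bit example matches the one in the text.

```python
import random

def mutate(chromosome, pm, rng=random):
    """Flip each bit independently with mutation probability pm."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < pm else bit
        for bit in chromosome
    )

def flip_bit(chromosome, position):
    """Deterministically flip one bit (1-indexed), as in the text's example."""
    i = position - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]

# Mutation at the 3rd bit transforms 001000 into 000000:
print(flip_bit("001000", 3))   # 000000
```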

Let us consider a simple example related to minimizing the surface area A of a solid cylinder, given radius r and height h, to illustrate the working principle of GAs. Here the fitness function can be expressed as

A = 2π r h + 2π r^2 = 2π r (h + r).

We need to encode the parameters r and h in a chromosome. Using a 3-bit representation for each, we demonstrate encoding, crossover, and mutation. For r_1 = 3, h_1 = 4 and r_2 = 4, h_2 = 3, we generate the parent chromosomes 011|100 and 100|011, with A_1 ≈ 131.95 and A_2 ≈ 175.93, respectively. Let there be one-point crossover at bit 4, producing the children chromosomes 011|011 and 100|100.

These are decoded as r_1c = 3, h_1c = 3 and r_2c = 4, h_2c = 4, with A_1c ≈ 113.10 and A_2c ≈ 201.06, respectively. Now, let there be mutation at bit 5 of the first child. This generates the chromosome 0110|0|1, for r_1cm = 3 and h_1cm = 1,


with A_1cm ≈ 75.40. This is the minimum fitness value obtained so far.

Consecutive applications of the genetic operations of selection, crossover, and mutation, up to termination, enable the minimization (optimization) of the chosen fitness function.
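The complete loop of selection, crossover, mutation, and replacement can be sketched for the cylinder example as follows. Everything beyond the area formula is an illustrative assumption rather than the book's prescription: the 3-bit genes, population size, probabilities, elitism, and the inverted fitness 1/(1 + A), used so that roulette selection favors small areas in this minimization problem.

```python
import math
import random

BITS = 3  # each of r and h is a 3-bit gene, as in the cylinder example

def area(chrom):
    """Decode r and h from the chromosome and compute A = 2*pi*r*(h + r)."""
    r = int(chrom[:BITS], 2)
    h = int(chrom[BITS:], 2)
    return 2 * math.pi * r * (h + r)

def fitness(chrom):
    # Minimization: smaller area -> larger fitness (assumed inversion).
    return 1.0 / (1.0 + area(chrom))

def select(pop):
    # Roulette wheel selection over the inverted fitness values.
    return random.choices(pop, weights=[fitness(c) for c in pop], k=1)[0]

def crossover(p1, p2, pc=0.8):
    if random.random() < pc:
        point = random.randint(1, 2 * BITS - 1)  # one-point crossover
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1, p2

def mutate(chrom, pm=0.01):
    return "".join(b if random.random() >= pm else ("1" if b == "0" else "0")
                   for b in chrom)

def run_ga(pop_size=10, generations=30):
    pop = ["".join(random.choice("01") for _ in range(2 * BITS))
           for _ in range(pop_size)]
    best = min(pop, key=area)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            c1, c2 = crossover(select(pop), select(pop))
            children += [mutate(c1), mutate(c2)]
        pop = children[:pop_size]               # generational replacement
        best = min(pop + [best], key=area)      # elitism: keep the best so far
    return best, area(best)

random.seed(0)
best, best_area = run_ga()
print(best, round(best_area, 2))
```

Note that without a constraint such as r ≥ 1, the unconstrained minimum of A is the degenerate cylinder r = 0; the sketch is meant only to show the mechanics of the loop.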

The replacement techniques can be

1. Generational, where all the n individuals are replaced at a time by the n children created by reproduction. Elitism is often introduced to retain the best solution obtained so far.

2. Steady state, where m < n members are replaced at a time by the m children reproduced.

The terminating criterion for the algorithm can be on the basis of

• execution for a fixed number of generations or iterations,

• a bound on the fitness value of the generated solution, or

• acquiring of a certain degree of homogeneity by the population.

GAs have been applied to diverse problems involving optimization, scheduling, graph coloring, genetic programming, pattern recognition, image processing, data mining, artificial immune systems, and financial prediction or bidding strategies.
