
4.5 DYNAMIC SYSTEM MODELS OF GENETIC ALGORITHMS

In this section we use the Markov model of the previous section to derive a dynamic system model of GAs. The Markov model gives us the probability of occurrence of each population distribution as the number of generations approaches infinity.

The dynamic system model that we derive here is quite different; it will give us the percentage of each individual in the population as a function of time as the population size approaches infinity. The view of a GA as a dynamic system was originally published in [Nix and Vose, 1992], [Vose, 1990], [Vose and Liepins, 1991], and is explained further in [Reeves and Rowe, 2003], [Vose, 1999].

Recall from Equation (4.22) that $v = [\,v_1 \cdots v_n\,]^T$ is the population vector, $v_i$ is the number of $x_i$ individuals in the population, and the elements of $v$ sum to $N$, which is the population size. We define the proportionality vector as

$$p = v / N \qquad (4.49)$$

which means that the elements of $p$ sum to 1.

4.5.1 Selection

To find a dynamic system model of a GA with selection only (i.e., no mutation or crossover), we can divide the numerator and denominator of Equation (4.36) by $N$ to write the probability of selecting individual $x_i$ from a population described by population vector $v$ as follows:


$$P_s(x_i \mid v) = \frac{p_i f_i}{\sum_{j=1}^n p_j f_j} = \frac{p_i f_i}{f^T p} \qquad (4.50)$$

where $f$ is the column vector of fitness values. Writing Equation (4.50) for $i \in [1, n]$ and combining all $n$ equations gives

" Pe(Xl\v) ~

Ps(x\v) = diag(/)j> , .

~ΎΓ

( 4

·

5 1 )

Ps(Xn\v)

where diag(/) is the nxn diagonal matrix whose diagonal entries are comprised of the elements of / .

The law of large numbers tells us that the average of the results obtained from a large number of trials should be close to the expected value of a single trial [Grinstead and Snell, 1997]. This means that as the population size becomes large, the proportion of selections of each individual $x_i$ will be close to $P_s(x_i \mid v)$. But the number of selections of $x_i$ is simply equal to $v_i$ at the next generation. Therefore, for large population sizes, Equation (4.50) can be written as

$$p_i(t) = \frac{p_i(t-1)\, f_i}{f^T p(t-1)} \qquad (4.52)$$

where $t$ is the generation number.

Now suppose that

$$p_i(t) = \frac{p_i(0)\, f_i^t}{\sum_{k=1}^n f_k^t\, p_k(0)}. \qquad (4.53)$$

This is clearly true for $t = 1$, as can be seen from Equation (4.52). Supposing that Equation (4.53) holds for $t - 1$, the numerator of Equation (4.52) can be written as

$$f_i\, p_i(t-1) = \frac{f_i\, p_i(0)\, f_i^{t-1}}{\sum_{k=1}^n f_k^{t-1}\, p_k(0)} = \frac{p_i(0)\, f_i^t}{\sum_{k=1}^n f_k^{t-1}\, p_k(0)} \qquad (4.54)$$

and the denominator of Equation (4.52) can be written as

$$f^T p(t-1) = \sum_{j=1}^n f_j\, p_j(t-1) = \frac{\sum_{j=1}^n p_j(0)\, f_j^t}{\sum_{k=1}^n f_k^{t-1}\, p_k(0)}. \qquad (4.55)$$

Substituting Equations (4.54) and (4.55) into Equation (4.52) gives

$$p_i(t) = \frac{p_i(0)\, f_i^t}{\sum_{k=1}^n f_k^t\, p_k(0)}. \qquad (4.56)$$

This equation gives the proportionality vector as a function of time, as a function of the fitness values, and as a function of the initial proportionality vector, when only selection (no mutation or crossover) is implemented in a GA.

■ EXAMPLE 4.11

As in Example 4.9, we consider the three-bit one-max problem with fitness values

f(000) = 1,  f(001) = 2,  f(010) = 2,  f(011) = 3,
f(100) = 2,  f(101) = 3,  f(110) = 3,  f(111) = 4.    (4.57)

Suppose the initial proportionality vector is

$$p(0) = [\,0.93 \;\; 0.01 \;\; 0.01 \;\; 0.01 \;\; 0.01 \;\; 0.01 \;\; 0.01 \;\; 0.01\,]^T. \qquad (4.58)$$

That is, 93% of the initial population is comprised of the least fit individual, and only 1% of the population is comprised of the most fit individual. Figure 4.4 shows a plot of Equation (4.56). We see that as the GA population evolves, $x_4$, $x_6$, and $x_7$, which are the second-best individuals, initially gain much of the population that originally belonged to $p_1$. The least fit individual, $x_1$, is quickly removed from the population by the selection process. $p_2$, $p_3$, and $p_5$ are not shown in the figure. It does not take very many generations before the entire population converges to $x_8$, the optimal individual.

Figure 4.4  Population proportionality vector evolution for Example 4.11. Even though the best individual, $x_8$, starts with only 1% of the population, it quickly converges to 100%. The least fit individual, $x_1$, starts with 93% of the population but quickly decreases to 0%.

□
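The trajectory plotted in Figure 4.4 follows directly from the closed form of Equation (4.56). The following sketch is not from the text; it is a minimal NumPy check (variable names are our own) that the recursion of Equation (4.52) and the closed form of Equation (4.56) agree for the fitness values and initial proportions of Example 4.11.

```python
import numpy as np

# Fitness values of the three-bit one-max problem, Equation (4.57)
f = np.array([1, 2, 2, 3, 2, 3, 3, 4], dtype=float)
# Initial proportionality vector, Equation (4.58)
p0 = np.array([0.93, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

# Iterate the recursion of Equation (4.52): p(t) = diag(f) p(t-1) / (f^T p(t-1))
T = 20
p = p0.copy()
for _ in range(T):
    p = f * p / (f @ p)

# Closed form of Equation (4.56): p_i(T) = p_i(0) f_i^T / sum_k f_k^T p_k(0)
p_closed = p0 * f**T / np.sum(f**T * p0)

print(np.allclose(p, p_closed))   # True: the recursion and the closed form agree
print(p_closed[7])                # proportion of the optimal individual x_8 after 20 generations
```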


We have discussed the dynamic system model for fitness-proportional selection, but other types of selection, such as tournament selection and rank selection, can also be modeled as a dynamic system [Reeves and Rowe, 2003], [Vose, 1999].

4.5.2 Mutation

Equation (4.51), along with the law of large numbers, tells us that

$$p(t) = \frac{\mathrm{diag}(f)\, p(t-1)}{f^T p(t-1)} \quad \text{(selection only)}. \qquad (4.59)$$

If selection is followed by mutation, and $M_{ji}$ is the probability that $x_j$ mutates to $x_i$, then we can use a derivation similar to Equation (4.38) to obtain

$$p(t) = \frac{M^T \mathrm{diag}(f)\, p(t-1)}{f^T p(t-1)} \quad \text{(selection and mutation)}. \qquad (4.60)$$

At steady state, $p(t) = p(t-1) = p_{ss}$, so Equation (4.60) gives

$$\left(f^T p_{ss}\right) p_{ss} = M^T \mathrm{diag}(f)\, p_{ss}. \qquad (4.61)$$

Recall that if $Ap = \lambda p$ for some scalar $\lambda$, then $p$ is an eigenvector of $A$. We see that the steady-state proportionality vector of a selection-mutation GA (i.e., no crossover) is an eigenvector of $M^T \mathrm{diag}(f)$.
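As a concrete illustration of this eigenvector calculation, the sketch below (not from the text; variable names are our own) builds a mutation matrix $M$ under the assumption of independent per-bit mutation, forms $M^T \mathrm{diag}(f)$, and extracts the single nonnegative eigenvector. It uses the deceptive-problem fitness values and 2% per-bit mutation rate of Example 4.12, which follows.

```python
import numpy as np
from itertools import product

q = 3                    # number of bits per individual
n = 2 ** q               # search space size
pm = 0.02                # per-bit mutation rate (as in Example 4.12 below)
f = np.array([5, 2, 2, 3, 2, 3, 3, 4], dtype=float)     # deceptive fitnesses, Eq. (4.62)

# Assumed mutation model: M[j, i] = Pr(x_j mutates to x_i) = pm^d (1 - pm)^(q - d),
# where d is the Hamming distance between x_j and x_i.
bits = np.array(list(product([0, 1], repeat=q)))         # x_1 = 000, ..., x_8 = 111
d = (bits[:, None, :] != bits[None, :, :]).sum(axis=2)   # pairwise Hamming distances
M = pm ** d * (1 - pm) ** (q - d)

A = M.T @ np.diag(f)                     # the matrix whose eigenvectors Eq. (4.61) requires
eigvals, eigvecs = np.linalg.eig(A)
for k in range(n):
    v = np.real(eigvecs[:, k])
    v = v / v.sum()                      # scale so the elements sum to 1, per Eq. (4.49)
    if np.all(v >= -1e-12):              # only one eigenvector is entirely nonnegative
        print("steady-state proportions:", np.round(v, 5))   # compare with Eq. (4.64)
```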

■ EXAMPLE 4.12

As in Example 4.10, we consider the three-bit deceptive problem with fitness values

f(000) = 5,  f(001) = 2,  f(010) = 2,  f(011) = 3,
f(100) = 2,  f(101) = 3,  f(110) = 3,  f(111) = 4.    (4.62)

We use a mutation rate of 2% per bit in this example. For this problem, we obtain the $8 \times 8$ matrix $M^T \mathrm{diag}(f)$ of Equation (4.63).

We calculate the eigenvectors of $M^T \mathrm{diag}(f)$ as indicated by Equation (4.61) and scale each eigenvector so that its elements sum to 1. Recall that eigenvectors of a matrix are invariant up to a scaling value; that is, if $p$ is an eigenvector, then $cp$ is also an eigenvector for any nonzero constant $c$. Since each eigenvector represents a proportionality vector, its elements must sum to 1 as indicated by Equation (4.49). We obtain eight eigenvectors, but only one of them is comprised entirely of positive elements, and so there is only one steady-state proportionality vector:

$$p_{ss}(1) = [\,0.90074 \;\; 0.03070 \;\; 0.03070 \;\; 0.00221 \;\; 0.03070 \;\; 0.00221 \;\; 0.00221 \;\; 0.0005\,]^T. \qquad (4.64)$$

This indicates that the GA will converge to a population consisting of 90.074% $x_1$ individuals, 3.07% each of $x_2$, $x_3$, and $x_5$ individuals, and so on. Over 90% of the GA population will consist of optimal individuals. However, there is also an eigenvector of $M^T \mathrm{diag}(f)$ that contains only one negative element:

$$p_{ss}(2) = [\,-0.0008 \;\; 0.0045 \;\; 0.0045 \;\; 0.0644 \;\; 0.0045 \;\; 0.0644 \;\; 0.0644 \;\; 0.7941\,]^T. \qquad (4.65)$$

This is called a metastable point [Reeves and Rowe, 2003], and it includes a high percentage (79.41%) of $x_8$ individuals, which is the second most fit individual in the population. Any proportionality vector close to $p_{ss}(2)$ will tend to stay there, since $p_{ss}(2)$ is a fixed point of Equation (4.61). However, $p_{ss}(2)$ is not a valid proportionality vector since it has a negative element, and so even though the GA population is attracted to $p_{ss}(2)$, the population will eventually drift away from it and converge to $p_{ss}(1)$. Figure 4.5 shows the results of a simulation of the selection-mutation GA. We used a population size $N = 500$ and an initial proportionality vector of

$$p(0) = [\,0.0 \;\; 0.0 \;\; 0.0 \;\; 0.1 \;\; 0.0 \;\; 0.1 \;\; 0.1 \;\; 0.7\,]^T \qquad (4.66)$$

which is close to the metastable point $p_{ss}(2)$. We see from Figure 4.5 that for about 30 generations the population stays close to its original distribution, which is comprised of 70% $x_8$ individuals and which is close to the metastable point $p_{ss}(2)$. After about 30 generations, the population quickly converges to the stable point $p_{ss}(1)$, which is comprised of about 90% $x_1$ individuals. Note that if the simulation is run again it will give different results because of the random number generator that is used for selection and mutation.

Figure 4.6 shows $p_1$ and $p_8$ from Equation (4.60) for 100 generations. This gives the exact proportions of $x_1$ and $x_8$ individuals as the population size approaches infinity. We can see that Figures 4.5 and 4.6 are similar, but Figure 4.5 is the result of a finite population size simulation and will change each time the simulation is run due to the random number generator that is used. Figure 4.6, on the other hand, is exact.
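For reference, a finite-population simulation like the one behind Figure 4.5 can be sketched as follows. This is not the author's code; it assumes roulette-wheel (fitness-proportional) selection of $N$ individuals followed by independent mutation of each selected individual, with the fitness vector $f$ and mutation matrix $M$ built as in the earlier sketch.

```python
import numpy as np

def simulate_selection_mutation(f, M, p0, N=500, generations=100, seed=None):
    """Finite-population selection-mutation GA, tracking the proportionality vector."""
    rng = np.random.default_rng(seed)
    n = len(f)
    p = np.asarray(p0, dtype=float)
    history = [p]
    for _ in range(generations):
        sel = f * p / (f @ p)                    # selection probabilities, Eq. (4.50)
        counts = rng.multinomial(N, sel)         # N independent roulette-wheel selections
        mutated = np.zeros(n)
        for j in range(n):                       # each selected x_j mutates per row M[j]
            mutated += rng.multinomial(counts[j], M[j])
        p = mutated / N
        history.append(p)
    return np.array(history)                     # rows are p(0), p(1), ..., p(generations)
```

Plotting the first and last columns of the returned array for the initial vector of Equation (4.66) should produce curves resembling Figure 4.5, with run-to-run variation from the random number generator.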



Figure 4.5  Simulation results for Example 4.12. The population hovers around the metastable point, which is comprised of about 70% $x_8$ individuals, before eventually converging to the stable point of about 90% $x_1$ individuals. Results will change from one simulation to the next due to the stochastic nature of the simulation.


Figure 4.6 Analytical results for Example 4.12. Compare with Figure 4.5. Analytical results do not depend on random number generators.

□

4.5.3 Crossover

As in Section 4.4.3, we use $r_{jki}$ to denote the probability that $x_j$ and $x_k$ cross to form $x_i$. If the population is specified by the proportionality vector $p$ in an infinite population, then the probability that $x_i$ is obtained from a random crossover is derived as follows:


$$\begin{aligned}
P_c(x_i \mid p) &= \sum_{j=1}^n \sum_{k=1}^n p_j\, p_k\, r_{jki} \\
&= \sum_{k=1}^n p_k\, [\,p_1 \cdots p_n\,] \begin{bmatrix} r_{1ki} \\ \vdots \\ r_{nki} \end{bmatrix}
= [\,p_1 \cdots p_n\,] \sum_{k=1}^n p_k \begin{bmatrix} r_{1ki} \\ \vdots \\ r_{nki} \end{bmatrix} \\
&= [\,p_1 \cdots p_n\,] \begin{bmatrix} r_{11i} & \cdots & r_{1ni} \\ \vdots & & \vdots \\ r_{n1i} & \cdots & r_{nni} \end{bmatrix} p \\
&= p^T R_i\, p \qquad (4.67)
\end{aligned}$$

where the element in the $j$-th row and $k$-th column of $R_i$ is $r_{jki}$, the probability that $x_j$ and $x_k$ cross to form $x_i$. We again use the law of large numbers [Grinstead and Snell, 1997] to find that, in the limit as the population size $N$ approaches infinity, crossover changes the proportion of $x_i$ individuals as follows:

$$p_i = P_c(x_i \mid p) = p^T R_i\, p. \qquad (4.68)$$

Although $R_i$ is often nonsymmetric, the quadratic $P_c(x_i \mid p)$ can always be written using a symmetric matrix, as follows.

$$P_c(x_i \mid p) = p^T R_i\, p = \tfrac{1}{2}\, p^T R_i\, p + \tfrac{1}{2} \left(p^T R_i\, p\right)^T \qquad (4.69)$$

where the second equality follows because $p^T R_i\, p$ is a scalar, and the transpose of a scalar is equal to the scalar. Therefore, recalling that $(ABC)^T = C^T B^T A^T$,

$$P_c(x_i \mid p) = \tfrac{1}{2}\, p^T R_i\, p + \tfrac{1}{2}\, p^T R_i^T\, p = \tfrac{1}{2}\, p^T \left(R_i + R_i^T\right) p = p^T \bar{R}_i\, p \qquad (4.70)$$

where the symmetric matrix $\bar{R}_i$ is given by

$$\bar{R}_i = \tfrac{1}{2}\left(R_i + R_i^T\right). \qquad (4.71)$$


■ EXAMPLE 4.13

As in Example 4.8, suppose we have a four-element search space with individuals $x = \{x_1, x_2, x_3, x_4\} = \{00, 01, 10, 11\}$. We implement crossover by randomly setting $b = 1$ or $b = 2$ with equal probability, and then concatenating bits $1 \rightarrow b$ from the first parent with bits $(b+1) \rightarrow 2$ from the second parent. Enumerating the crossover possibilities for each ordered pair of parents gives the $r_{jk1}$ crossover probabilities, which are the probabilities that $x_j$ and $x_k$ cross to give $x_1 = 00$. These probabilities result in the crossover matrix

$$R_1 = \begin{bmatrix} 1 & 1/2 & 1 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

$R_1$ is clearly nonsymmetric, but $P_c(x_1 \mid p)$ can still be written using the symmetric matrix

$$\bar{R}_1 = \tfrac{1}{2}\left(R_1 + R_1^T\right) = \begin{bmatrix} 1 & 1/2 & 1/2 & 1/4 \\ 1/2 & 0 & 1/4 & 0 \\ 1/2 & 1/4 & 0 & 0 \\ 1/4 & 0 & 0 & 0 \end{bmatrix}.$$

The other $R_i$ matrices can be found similarly.
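The construction of $R_1$ can be checked mechanically. The sketch below (not from the text; names are our own) enumerates the stated crossover rule for every ordered parent pair, accumulates the probability of producing $x_1 = 00$, then forms the symmetric matrix of Equation (4.71) and confirms that both matrices give the same quadratic form.

```python
import numpy as np

X = ["00", "01", "10", "11"]               # x_1, x_2, x_3, x_4
n = len(X)

R1 = np.zeros((n, n))
for j, parent1 in enumerate(X):
    for k, parent2 in enumerate(X):
        for b in (1, 2):                   # crossover point b, each chosen with probability 1/2
            child = parent1[:b] + parent2[b:]
            if child == "00":              # r_{jk1}: probability that the child is x_1 = 00
                R1[j, k] += 0.5

R1_bar = (R1 + R1.T) / 2                   # symmetric matrix of Equation (4.71)

# Both matrices give the same crossover probability P_c(x_1 | p) = p^T R_1 p
p = np.array([0.4, 0.3, 0.2, 0.1])         # an arbitrary proportionality vector
print(R1)
print(np.isclose(p @ R1 @ p, p @ R1_bar @ p))   # True
```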

Now suppose we have a GA with selection, mutation, and crossover, in that order. We have a proportionality vector $p$ at generation $t-1$. Selection and mutation modify $p$ as shown in Equation (4.60):

$$p(t) = \frac{M^T \mathrm{diag}(f)\, p(t-1)}{f^T p(t-1)}. \qquad (4.76)$$

Crossover modifies $p_i$ as shown in Equation (4.68). However, $p$ on the right side of Equation (4.68) has already been modified by selection and mutation to give the $p$ shown in Equation (4.76). Therefore, the sequence of selection, mutation, and crossover results in the $p_i$ of Equation (4.68), but with the $p$ on the right side of Equation (4.68) replaced by the $p$ resulting from the selection and mutation of Equation (4.76):

$$\begin{aligned}
p_i(t) &= \left[\frac{M^T \mathrm{diag}(f)\, p(t-1)}{f^T p(t-1)}\right]^T R_i \left[\frac{M^T \mathrm{diag}(f)\, p(t-1)}{f^T p(t-1)}\right] \\
&= \frac{p^T(t-1)\, \mathrm{diag}(f)\, M\, R_i\, M^T \mathrm{diag}(f)\, p(t-1)}{\left(f^T p(t-1)\right)^2}. \qquad (4.77)
\end{aligned}$$

$R_i$ can be replaced with $\bar{R}_i$ in Equation (4.77) to give an equivalent expression.

Equation (4.77) gives an exact, analytic expression for the dynamics of the proportion of $x_i$ individuals in an infinite population.

We see that we need to calculate the dynamic system model of Equation (4.77) for $i \in [1, n]$ at each generation, where $n$ is the search space size. The matrices in Equation (4.77) are $n \times n$, and the computational effort of matrix multiplication is proportional to $n^3$ if implemented with standard algorithms. Therefore, the dynamic system model requires computation on the order of $n^4$ per generation. This is much less computational effort than the Markov model requires, but it still grows very rapidly as the search space size $n$ increases, and it still requires unattainable computational resources for even moderately sized problems.
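A direct, deliberately unoptimized implementation of one generation of Equation (4.77) might look like the sketch below. It is not from the text; it assumes that the fitness vector $f$, the mutation matrix $M$, and the list of crossover matrices $R_i$ have already been constructed (for example, as in the earlier sketches), and it simply applies selection and mutation via Equation (4.76) followed by crossover via Equation (4.68).

```python
import numpy as np

def dynamic_system_step(p_prev, f, M, R):
    """One generation of the infinite-population model, Equation (4.77).

    p_prev : proportionality vector p(t-1), length n
    f      : fitness vector, length n
    M      : n x n mutation matrix, M[j, i] = Pr(x_j mutates to x_i)
    R      : list of n crossover matrices, R[i][j, k] = r_{jki}
    """
    p_sm = M.T @ (f * p_prev) / (f @ p_prev)        # selection + mutation, Eq. (4.76)
    return np.array([p_sm @ Ri @ p_sm for Ri in R])  # crossover, Eq. (4.68), for each i

def run_model(p0, f, M, R, generations):
    """Iterate Equation (4.77) and return the trajectory p(0), ..., p(generations)."""
    p = np.asarray(p0, dtype=float)
    traj = [p]
    for _ in range(generations):
        p = dynamic_system_step(p, f, M, R)
        traj.append(p)
    return np.array(traj)
```

Note that Equation (4.77) as written assumes crossover is always applied; a crossover probability less than 100%, as in Example 4.14, would require mixing this crossover step with the uncrossed vector, and that refinement is omitted from the sketch.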

■ EXAMPLE 4.14

Once again we consider the three-bit one-max problem (see Example 4.9), in which each individual's fitness is proportional to the number of ones. We use a crossover probability of 90%, a mutation probability of 1% per bit, a population size of 1,000, and an initial population proportionality vector of

$$p(0) = [\,0.8 \;\; 0.1 \;\; 0.1 \;\; 0.0 \;\; 0.0 \;\; 0.0 \;\; 0.0 \;\; 0.0\,]^T. \qquad (4.78)$$

Figure 4.7 shows the percent of optimal individuals in the population from a single simulation, along with the exact theoretical results of Equation (4.77).

The simulation results match the theory nicely, but the simulation results are approximate and will vary from one run to the next, while the theory is exact.

Now suppose that we change the initial population proportionality vector to

$$p(0) = [\,0.0 \;\; 0.1 \;\; 0.1 \;\; 0.0 \;\; 0.0 \;\; 0.0 \;\; 0.0 \;\; 0.8\,]^T. \qquad (4.79)$$


Figure 4.8 shows the percent of least-fit individuals from a single simulation, along with the exact theoretical results. Since the probability of obtaining a least-fit individual is so low, the simulation results show a couple of spikes in the graph due to random mutations. The spikes look large given the graph scale, but they are actually quite small, peaking at only 0.2%. The theoretical results, however, are exact. They show that the proportion of least-fit individuals initially increases for a few generations due to mutation, and then quickly decreases to the steady-state value of precisely 0.00502%. It would take many, many simulations to arrive at this conclusion. Even after thousands of simulations, the wrong conclusion may be reached, depending on the integrity of the random number generator that is used.


Figure 4.7 Proportion of most-fit individuals for Example 4.14.


Figure 4.8 Proportion of least-fit individuals for Example 4.14.

4.6 CONCLUSION

In this chapter we outlined Markov models and dynamic system models for GAs.

These models, which were first developed in the 1990s, give theoretically exact results, whereas simulations change from one run to the next due to the random number generator that is used for selection, crossover, and mutation. The size of the Markov model increases factorially with the population size and with the search space cardinality. The computational effort of the dynamic system model increases with $n^4$, where $n$ is the search space cardinality. These computational requirements restrict the application of Markov models and dynamic system models to very small problems. However, the models are still useful for comparing different implementations of GAs and for comparing different EAs, as we see in [Simon et al., 2011b]. Some additional ideas and developments along these directions can be found in [Reeves and Rowe, 2003], [Vose, 1999].

Markov modeling and dynamic system modeling are very mature fields, and many general results have been obtained. There is a lot of room for the additional application of these subjects to GAs and other EAs.

Other methods can also be used to model or analyze the behavior of GAs. For example, the field of statistical mechanics involves averaging many molecular particles to model the behavior of a group of molecules, and we can use this idea to model GA behavior with large populations [Reeves and Rowe, 2003, Chapter 7].

We can also use the Fourier and Walsh transforms to analyze GA behavior [Vose and Wright, 1998a], [Vose and Wright, 1998b]. Finally, we can use Price's selection and covariance theorem to mathematically model GAs [Poli et al., 2008, Chapter 3].

The ideas presented in this chapter could be applied to many other EAs besides GAs. We do this in [Simon et al., 2011a] and [Simon, 2011a] for biogeography-based optimization, and other researchers apply these ideas to other EAs, but there is still a lot of room for the application of Markov models and dynamic system models to EAs. This would allow for comparisons and contrasts between various EAs on an analytical level, at least for small problems, rather than reliance on simulation.

Simulation is necessary in our study of EAs, but it should be used to support theory.


PROBLEMS

Written Exercises

4.1 How many schemata of length 2 exist? How many of them are order 0, how many are order 1, and how many are order 2?
