


7.2.4 Function Set

What is the function set for the evolving computer programs? This set describes the functions that can appear at the non-terminal nodes of the syntax trees, such as the following.

• Standard mathematical operators can be included in the function set (for example, addition, subtraction, multiplication, division, absolute value).

• Problem-specific functions that we think are important for our particular optimization problem can be included in the function set (for example, exponential functions, logarithmic functions, trigonometric functions, filters, integrators, differentiators).


Number of extra variables    Probability of success (percent)
 0                           99.8
 1                           96.6
 4                           84.0
 8                           67.0
16                           57.0
32                           35.0

Table 7.1 GP probability of success after 50 generations for discovering the program x³ + x² + x. The population size was 1,000. The data is obtained from [Koza, 1992, Chapter 24].

• Conditional tests can be included in the function set (for example, greater than, less than, equal to).

• Logic functions can be included in the function set, if we think that they could be applicable to the solution of our particular optimization problem (for example, and, nand, or, xor, nor, not).

• Variable assignment functions can be included in the function set.

• Loop statements can be included in the function set (for example, while loops, for loops).

• Subroutine calls can be included in the function set, if we have a set of pre-defined functions that we have created for our problem.

The syntax trees in Figure 7.3 include five functions: addition, subtraction, multiplication, division, and absolute value. We need to find the right balance in our definition of the function set and the terminal set. The sets need to be large enough to be able to represent a solution to our problem, but if they are too large, then the search space will be so large that the GP will have a hard time finding a good solution.
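To make this concrete, the sketch below shows one way to write down a function set and terminal set as Lisp data, with each function paired with the number of arguments (the arity) that it requires; the initialization routines later in this section need this arity information. The variable names, the dotted-pair representation, and the particular terminals are illustrative choices rather than notation from the text.

(defparameter *function-set*                        ; each function paired with its arity;
  '((+ . 2) (- . 2) (* . 2) (DIV . 2) (abs . 1)))   ; DIV is the protected division defined below
(defparameter *terminal-set*
  '(x 1.0))                                         ; problem variables and constants

(defun random-element (list)
  "Return a uniformly random element of LIST."
  (nth (random (length list)) list))

With this bookkeeping in place, a tree-generation routine can look up how many subtrees each function requires.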

Some functions need to be modified for GP because the syntax trees that evolve might not have legal function arguments. For example, GP could evolve the s-expression (/ x 0), which is division by zero. This would result in a Lisp error, which would cause the GP to terminate. Therefore, instead of using the standard division operator in Lisp, we can define a division operator DIV that protects against division by zero, and that also protects against overflow due to division by a very small number:

(defun DIV (x y)                            ; define a protected division function
  (if (< (abs y) e) (return-from DIV 1))    ; return 1 if the divisor is very small
  (return-from DIV (/ x y)))                ; else return x/y

(7.4)

where e is a very small positive constant, like 10⁻²⁰. Equation (7.4) shows the Lisp syntax for defining a protected division routine (note that any text following a semicolon in a Lisp function is interpreted as a comment). The DIV function returns 1 if the divisor has a very small magnitude. We may need to redefine other functions in a similar way (logarithm functions, inverse trigonometric functions, and so on) to make sure that the functions in our function set can handle all possible inputs.
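As one illustration of this kind of protection, the sketch below wraps the natural logarithm in the same style as DIV. The name RLOG, the decision to return 0 for near-zero arguments, and the use of the absolute value of the argument are illustrative conventions for this sketch, not requirements.

(defparameter e 1e-20)                       ; the small positive constant e of Equation (7.4)

(defun RLOG (x)                              ; a protected natural logarithm
  (if (< (abs x) e) (return-from RLOG 0))    ; return 0 if the argument is near zero
  (return-from RLOG (log (abs x))))          ; else return the logarithm of |x|

The same pattern extends to square roots, inverse trigonometric functions, and any other function whose domain does not include every argument that GP might evolve.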

7.2.5 Initialization

How should we generate the initial population of computer programs? We have two basic options for initialization, which are referred to as the full method and the grow method. We can also combine these options to get a third option, which is referred to as the ramped half-and-half method [Koza, 1992].

The full method creates programs such that the number of nodes from each terminal node to the top-level node is Dc, a user-specified constant. Dc is called the depth of the syntax tree. As an example, Parent 1 in Figure 7.3 has a depth of three, while Parent 2 has a depth of four. Parent 1 in Figure 7.3 is a full syntax tree because there are three nodes from each terminal node to the top-level addition node. However, Parent 2 is not a full syntax tree because some of the program branches have a depth of four while others only have a depth of three.

We can use recursion to generate random syntax trees. For example, if we want to generate a syntax tree with a structure like Parent 2 in Figure 7.3, we first generate the subtraction node at the top level and note that it requires two arguments. For the first argument, we generate the multiplication node and note that it requires two arguments. This process continues for each node and each argument until we have generated enough levels to reach the desired depth. When we reach the desired depth, we generate a random terminal node to complete that branch of the syntax tree. Figure 7.6 illustrates the concept for a recursive algorithm that generates random computer programs. We can generate a random syntax tree by calling routine GrowProgramFull(Dc, 1), where Dc is our desired syntax tree depth.

GrowProgramFull calls itself each time it needs to add another layer in its growing syntax tree.

The grow method of initialization creates programs such that the number of nodes from each terminal node to the top-level node is less than or equal to Dc. If the parents in Figure 7.3 were created by random initialization, then Parent 1 might have been generated with either the full method or the grow method, while Parent 2 was definitely generated with the grow method since it is not a full syntax tree.

The grow method can be implemented the same way as the full method, except that when we generate a random node at depths less than Dc, either a function or terminal node can be generated. If a function node is generated, the syntax tree continues to grow. As with the full method, when we reach the maximum depth Dc, we generate a random terminal to complete that branch of the syntax tree.

Figure 7.7 illustrates the concept for a recursive algorithm that generates random computer programs with the grow method.

The ramped half-and-half method generates half of the initial population with the full method, and half with the grow method. Also, it generates an equal number of syntax trees for each value of depth between 2 and Dc, which is the maximum allowable depth specified by the user. Figure 7.8 illustrates the concept of ramped half-and-half syntax tree initialization.


function [SyntaxTree] = GrowProgramFull(Depth, NumArgs)
SyntaxTree ← ∅
For i = 1 to NumArgs
    If Depth = 1
        SyntaxTree ← Random terminal
    else
        NewFunction ← Randomly chosen function
        NewNumArgs ← Number of arguments required by NewFunction
        SyntaxTree ← (NewFunction + GrowProgramFull(Depth−1, NewNumArgs))
    End
Next i
End

Figure 7.6 A conceptual view of a recursive algorithm to grow random syntax trees in s-expression form with the full method. This routine is initially called with the syntax GrowProgramFull(Dc, 1), where Dc is the desired depth of the random syntax tree. The plus operator indicates string concatenation. Note that this algorithm is conceptual; it does not include all of the details required for valid syntax tree generation, such as correct parenthesis placement.
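For readers who want something executable, the following Common Lisp sketch implements the full method by building the s-expression directly as a Lisp list instead of by string concatenation. It assumes the illustrative *function-set*, *terminal-set*, and random-element definitions sketched earlier in this section, and the function name is our own.

(defun grow-program-full (depth)
  "Grow a random syntax tree in which every branch has exactly DEPTH nodes
from its terminal node to the top-level node (the full method)."
  (if (= depth 1)
      (random-element *terminal-set*)             ; bottom level: choose a random terminal
      (let ((f (random-element *function-set*)))  ; otherwise choose a random function and
        (cons (car f)                             ; recursively grow one subtree for each
              (loop repeat (cdr f)                ; argument that the function requires
                    collect (grow-program-full (1- depth)))))))

Calling (grow-program-full 3), for example, might return an s-expression such as (+ (DIV X 1.0) (ABS X)), in which every branch contains exactly three nodes.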

function [SyntaxTree] = GrowProgramGrow(Depth, NumArgs)
SyntaxTree ← ∅
For i = 1 to NumArgs
    If Depth = 1
        SyntaxTree ← Random terminal
    else
        NewNode ← Randomly chosen function or terminal
        If NewNode is a terminal
            SyntaxTree ← (SyntaxTree + NewNode)
        else
            NewNumArgs ← Number of arguments required by NewNode
            SyntaxTree ← (NewNode + GrowProgramGrow(Depth−1, NewNumArgs))
        End
    End
Next i
End

Figure 7.7 A conceptual view of a recursive algorithm to grow random syntax trees in s-expression form with the grow method. This routine is initially called with the syntax GrowProgramGrow(Dc, 1), where Dc is the desired depth of the random syntax tree. As with Figure 7.6, the plus operator indicates string concatenation, but this algorithm does not include all of the details required for implementation.
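A matching sketch of the grow method differs only in that, at depths less than the maximum, each node is drawn from the combined function and terminal sets, so a branch ends as soon as a terminal happens to be chosen. Again, this builds on the illustrative definitions above.

(defun grow-program-grow (depth)
  "Grow a random syntax tree whose branches have at most DEPTH nodes
from terminal node to top-level node (the grow method)."
  (let ((node (if (= depth 1)
                  (random-element *terminal-set*)              ; must choose a terminal at maximum depth
                  (random-element (append *function-set*       ; otherwise choose from the combined
                                          *terminal-set*)))))  ; function and terminal sets
    (if (consp node)                                           ; a (function . arity) pair:
        (cons (car node)                                       ; grow one subtree per argument
              (loop repeat (cdr node)
                    collect (grow-program-grow (1- depth))))
        node)))                                                ; a terminal: this branch ends here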

Dc = maximum syntax tree depth
N = population size
For i = 1 to N
    Depth ← U[2, Dc]
    r ← U[0, 1]
    If r < 0.5
        SyntaxTree(i) ← GrowProgramGrow(Depth, 1)
    else
        SyntaxTree(i) ← GrowProgramFull(Depth, 1)
    End
Next i

Figure 7.8 Algorithm to create an initial GP population with the ramped half-and-half method. U[2, Dc] is a random integer uniformly distributed on [2, Dc], and U[0, 1] is a random real number uniformly distributed on [0, 1]. This algorithm calls the routines of Figures 7.6 and 7.7.
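The two routines can then be combined into a sketch of ramped half-and-half initialization that mirrors Figure 7.8; the population is returned as a list of s-expressions, and the function name is again our own.

(defun ramped-half-and-half (population-size max-depth)
  "Create an initial population with depths uniform on [2, MAX-DEPTH], half of
the trees (on average) from the grow method and half from the full method."
  (loop repeat population-size
        for depth = (+ 2 (random (- max-depth 1)))        ; random integer on [2, max-depth]
        collect (if (< (random 1.0) 0.5)                  ; grow method with probability 0.5,
                    (grow-program-grow depth)             ; otherwise the full method
                    (grow-program-full depth))))

Calling (ramped-half-and-half 1000 6), for example, would produce a population of 1,000 random programs with depths spread between 2 and 6.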

Koza experimented with the three different types of initializations described above for some simple GP problems [Koza, 1992, Chapter 25]. He found a difference in the probability of GP success depending on which initialization method was used, as shown in Table 7.2. The table shows that the ramped half-and-half initialization method is generally much better than the other two initialization methods.

Problem                 Full    Grow    Ramped Half-and-Half
Symbolic Regression      3%     17%     23%
Boolean Logic           42%     53%     66%
Artificial Ant          14%     50%     46%
Linear Equation          6%     37%     53%

Table 7.2 GP probability of success for various problems and various initialization methods. This data is obtained from [Koza, 1992, Chapter 25].

To conclude our discussion of initialization, we note that it is often advantageous to seed the initial population of an EA with some known good individuals. These good individuals may be user-generated individuals, or they may come from some other optimization algorithm or other source. However, seeding does not necessarily improve EA performance. If there are only a few good individuals in the initial population, and the rest of the individuals are relatively poor randomly-generated individuals, then the few good individuals could dominate the selection process, and the poor individuals might quickly die out. This could result in an evolutionary dead end and premature convergence, otherwise known as "survival of the mediocre" [Koza, 1992, page 104]. However, the chance that this negative event occurs depends on the type of selection that we use (see Section 8.7). If we use roulette-wheel selection, then selection pressure is high and a few fit individuals are likely to quickly dominate the population. If we use tournament selection, then selection pressure is much lower and the probability of a few fit individuals dominating the population is correspondingly lower.
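As a small illustration of seeding, the sketch below simply places user-supplied individuals at the front of an otherwise randomly initialized population; the function name and the choice to fill the remainder with ramped half-and-half programs are illustrative choices that build on the earlier sketches, not a recipe from the text.

(defun seeded-initial-population (seeds population-size max-depth)
  "Start from known good individuals (SEEDS) and fill out the rest of the
population with random ramped half-and-half programs."
  (append seeds
          (ramped-half-and-half (max 0 (- population-size (length seeds)))
                                max-depth)))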
