

From the document EVOLUTIONARY OPTIMIZATION ALGORITHMS (pages 189-192)

A precursor to modern GP was the variable-length GA developed by Stephen Smith in his 1980 doctoral dissertation [Smith, 1980], in which each individual in

7.2 THE FUNDAMENTALS OF GENETIC PROGRAMMING

7.2.6 Genetic Programming Parameters

What are the parameters that control GP execution? These include the parameters used in other EAs, along with several GP-specific parameters.

1. We need to specify the selection method by which parents are chosen to participate in crossover. We could use fitness-proportional selection, tournament selection, or some other method. In fact, we could use any of the selection methods discussed in Section 8.7.

This is also a good place to mention that we could implement tree-based crossover more intelligently than simply selecting random crossover points.

There are some subtrees that are more useful than others, and we may not want to break up those subtrees. We could quantify the fitness of subtrees by obtaining correlations between crossover points and the fitness of child programs, and then using those correlations to bias the selection of future crossover points [Iba and de Garis, 1996].
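As a concrete illustration of one of the options above, tournament selection can be sketched as follows. The function names and the higher-is-better fitness convention are assumptions for this sketch, not part of the text:

```python
import random

def tournament_select(population, fitnesses, k=3, rng=random):
    """Sample k individuals uniformly at random and return the fittest.

    `population` and `fitnesses` are parallel lists; higher fitness is
    assumed better. All names here are illustrative.
    """
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

With k equal to the population size this degenerates to pure elitist selection; smaller k values give weaker selection pressure.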

2. We need to specify the population size. Since there are so many degrees of freedom in computer programs, GP usually has larger populations than other EAs: typically at least 500 individuals, and often several thousand.

3. We need to specify the mutation method. Various GP mutation methods have been used over the years, some of which are described as follows.

(a) We can select a random node, and replace everything below that node with a randomly-generated syntax subtree. This is called subtree mutation [Koza, 1992, page 106]. This is equivalent to crossing a program with a randomly generated program, and is also called headless chicken crossover [Angeline, 1997].

(b) Expansion mutation replaces a terminal with a randomly-generated tree. This is equivalent to subtree mutation if the replaced node in subtree mutation is a terminal.

(c) We can replace a randomly selected node or terminal with a new randomly generated node or terminal. This is called point mutation or node replacement mutation, and requires that the arity of the replaced node be equal to the arity of the replacement node.² For example, we could replace an addition operation with a multiplication operation, or we could replace an absolute value operation with a sine operation.

(d) Hoist mutation creates a new program that is a randomly selected subtree of the parent program [Kinnear, 1994].

(e) Shrink mutation replaces a randomly chosen syntax subtree with a randomly selected terminal [Angeline, 1996a]; this is also called collapse subtree mutation. Hoist mutation and shrink mutation were originally introduced to reduce code bloat (see Section 7.4).

² The arity of a function is equal to the number of its arguments. For example, a constant has an arity of 0, the absolute value function has an arity of 1, and an addition function can have an arity of two or more.

(f) Permutation mutation randomly permutes the arguments of a randomly selected function [Koza, 1992]. For example, we could exchange the x and y arguments of a division function. Of course, this type of mutation does not have any effect on commutative functions.

(g) We can randomly mutate constants in a program [Schoenauer et al., 1996].
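To make the arity constraint of point mutation concrete, here is a small sketch. The nested-list program encoding and the primitive set `FUNCS` are assumptions made for this example, not from the text:

```python
import random

# Hypothetical primitive set mapping each function name to its arity.
FUNCS = {'+': 2, '-': 2, '*': 2, 'abs': 1, 'sin': 1}

def function_nodes(tree, path=()):
    """Yield the path (sequence of child indices) to every function node.

    A program is a nested list [function, child, ...]; anything else is
    a terminal.
    """
    if isinstance(tree, list):
        yield path
        for i, child in enumerate(tree[1:], start=1):
            yield from function_nodes(child, path + (i,))

def point_mutate(tree, rng=random):
    """Swap one randomly chosen function for another of the same arity.

    Mutates the tree in place and returns it; pass a copy to preserve
    the original.
    """
    paths = list(function_nodes(tree))
    if not paths:
        return tree  # a lone terminal has no function node to mutate
    target = rng.choice(paths)
    node = tree
    for i in target:
        node = node[i]
    same_arity = [f for f in FUNCS
                  if FUNCS[f] == len(node) - 1 and f != node[0]]
    if same_arity:
        node[0] = rng.choice(same_arity)
    return tree
```

Because the replacement is drawn only from functions of matching arity, the mutated tree always remains syntactically valid.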

We often implement mutation in such a way that the mutated program replaces the original program only if it is more fit. This idea of replace-only-if-more-fit can be applied to mutation in any EA.
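The replace-only-if-more-fit idea can be sketched as a small wrapper. The `mutate` and `fitness` arguments are caller-supplied functions, and higher fitness is assumed better; these conventions are assumptions of the sketch:

```python
def mutate_if_fitter(program, mutate, fitness):
    """Apply `mutate`, but keep the child only if it beats the parent.

    `mutate` and `fitness` are caller-supplied; higher fitness is
    assumed better.
    """
    child = mutate(program)
    return child if fitness(child) > fitness(program) else program
```

For example, with integer "programs," a mutation that increments its argument, and a fitness rewarding closeness to 10, mutating 5 is accepted while mutating 10 is rejected.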

4. We need to specify the mutation probability pm. This is similar to other EAs.

Mutation in a GP with N individuals is often implemented with a method similar to the following:

For each candidate computer program xi, where i ∈ [1, N]
    Generate a random number r uniformly distributed in [0, 1]
    If r < pm
        Randomly select a node k in computer program xi
        Replace the subtree starting at node k with a
            randomly-generated subtree
Next computer program
End

The large population size that is used in GP, along with the large number of possible nodes at which crossover can occur, usually means that good GP results do not depend on mutation [Koza, 1992, Chapter 25]. Often we can get good results with pm = 0. However, mutation may still be desirable just in case an important terminal or function is lost from the population. If that occurs, mutation is the only way that it could re-enter the population.
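The mutation loop above can be rendered in Python as follows. The nested-list program encoding and the tiny terminal and function sets inside `random_subtree` are assumptions made for this sketch:

```python
import copy
import random

def random_subtree(depth, rng):
    """Grow a small random program over illustrative primitive sets."""
    if depth <= 1 or rng.random() < 0.3:
        return rng.choice(['x', 'y', 1, 2])
    return [rng.choice(['+', '*']),
            random_subtree(depth - 1, rng),
            random_subtree(depth - 1, rng)]

def all_paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from all_paths(child, path + (i,))

def mutate_population(pop, pm, rng=random):
    """Subtree-mutate each program with probability pm (the loop above)."""
    out = []
    for prog in pop:
        if rng.random() < pm:
            prog = copy.deepcopy(prog)       # do not disturb the original
            path = rng.choice(list(all_paths(prog)))
            new = random_subtree(3, rng)
            if path:
                node = prog
                for i in path[:-1]:
                    node = node[i]
                node[path[-1]] = new         # splice in the random subtree
            else:
                prog = new                   # the root itself was selected
        out.append(prog)
    return out
```

With pm = 0 the population passes through unchanged, matching the observation above that mutation is often unnecessary in GP.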

5. We need to specify the crossover probability pc. This is similar to GAs. After selecting two parents in Figure 7.5, we can either use crossover to combine them, or we can instead clone them for the next generation. The line:

Mate p1 and p2 to create children c1 and c2

in Figure 7.5 would then be replaced with something like the following:

Generate a random number r uniformly distributed on [0, 1]
If r < pc
    Mate p1 and p2 to create children c1 and c2
else
    c1 ← p1
    c2 ← p2
End
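This crossover-or-clone decision can be sketched as follows; the `mate` argument stands in for any crossover operator, and the names are illustrative:

```python
import random

def crossover_or_clone(p1, p2, mate, pc=0.9, rng=random):
    """Apply `mate` with probability pc; otherwise clone the parents.

    `mate` is a caller-supplied crossover operator (illustrative name).
    """
    if rng.random() < pc:
        return mate(p1, p2)
    return p1, p2
```

Setting pc = 0 always clones, and pc = 1 always mates, with the recommended pc > 0.9 lying near the latter extreme.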


Most experience suggests that crossover is an important aspect of GP and should be used with a probability pc > 0.9 [Koza, 1992, Chapter 25].

6. We need to decide whether or not to use elitism. As with any other EA, we can save the best m computer programs in GP from one generation to the next to make sure they are not lost in the following generation (see Section 8.4). The parameter m is called the elitism parameter. Elitism can be implemented in several different ways. For example, we could archive the best m individuals at the end of a generation, create the children for the next generation as usual, and then replace the worst m children with the elites from the previous generation. Alternatively, we could copy the m elites to the first m children each generation, and then create only (N − m) additional children each generation (where N is the population size).
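The first elitism variant (archive and replace) can be sketched as follows. The function names and the higher-is-better fitness convention are assumptions of this sketch:

```python
def apply_elitism(parents, children, fitness, m):
    """Replace the worst m children with the best m parents.

    This is the archive-and-replace elitism variant; `fitness` is a
    caller-supplied function, and higher fitness is assumed better.
    """
    elites = sorted(parents, key=fitness, reverse=True)[:m]
    keep = sorted(children, key=fitness, reverse=True)[:len(children) - m]
    return keep + elites
```

The returned population has the same size as `children`, so the elites displace exactly the m worst offspring.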

7. We need to specify Di, the maximum program size of the initial population. A program's size can be quantified by its depth, which measures the maximum number of nodes between the highest level and the lowest level (inclusive).

For example, Parent 1 in Figure 7.3 has a depth of three, while Parent 2 has a depth of four.
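Depth as defined here (counting nodes, inclusive) can be computed recursively. The nested-list program encoding is an assumption of this sketch:

```python
def depth(tree):
    """Number of nodes on the longest root-to-leaf path, inclusive.

    Programs are nested lists [function, child, ...]; anything else is
    a terminal (illustrative encoding).
    """
    if not isinstance(tree, list):
        return 1
    return 1 + max(depth(child) for child in tree[1:])
```

For example, the program (+ x (* y 2)) has depth 3: the root +, the * node, and a leaf.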

8. We also need to specify Dc, the maximum depth of child programs. During GP operation, child programs can grow larger and larger with each succeeding generation. If a maximum depth is not enforced, then child programs can become unreasonably long, wasting space and execution time; this is called GP bloat (Section 7.4). The maximum depth Dc can be enforced in several ways. One way is to replace a child with one of its parents if the child's depth exceeds Dc. Another way is to redo the crossover operation if the child's depth exceeds Dc. Yet another way is to examine the parent syntax trees before choosing their crossover points, and constrain the randomly selected crossover points so that Dc will not be exceeded by the children's depths.
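Two of the enforcement schemes above (redo the crossover, and fall back to the parents) can be combined in one sketch. The `mate` and `depth` arguments are caller-supplied functions, and the retry limit is an assumption of this sketch:

```python
def bounded_crossover(p1, p2, mate, depth, d_max, max_tries=10):
    """Redo crossover until both children respect the depth limit d_max;
    fall back to cloning the parents if no attempt succeeds.

    `mate` and `depth` are caller-supplied functions; all names here
    are illustrative.
    """
    for _ in range(max_tries):
        c1, c2 = mate(p1, p2)
        if depth(c1) <= d_max and depth(c2) <= d_max:
            return c1, c2
    return p1, p2
```

Falling back to the parents guarantees that the returned pair always satisfies the depth bound, provided the parents themselves do.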

9. We need to decide whether or not we want to allow a terminal node in a syntax tree to be replaced with a subtree during crossover. Figure 7.4 shows that the z terminal in Parent 1 is selected for crossover, and is replaced with a subtree in Child 1. We use pi to denote the probability of crossover at an internal node. When selecting a crossover point, we generate a random number r uniformly distributed on [0, 1]. If r is less than pi, then we select an s-expression for crossover; that is, we select a subtree that is surrounded by matching left and right parentheses. However, if r is greater than pi, then we select a terminal node for crossover; that is, we select a symbol in the syntax tree that is not immediately preceded by a left parenthesis.
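This biased choice of crossover point can be sketched as follows, using a nested-list encoding in place of s-expressions (an assumption of this sketch, as are the names):

```python
import random

def choose_crossover_point(tree, p_internal=0.9, rng=random):
    """Return the path to a crossover node: an internal (function) node
    with probability p_internal, otherwise a terminal.

    Programs are nested lists [function, child, ...]; anything else is
    a terminal. A path is the sequence of child indices from the root.
    """
    internals, terminals = [], []
    def walk(node, path):
        if isinstance(node, list):
            internals.append(path)
            for i, child in enumerate(node[1:], start=1):
                walk(child, path + (i,))
        else:
            terminals.append(path)
    walk(tree, ())
    pool = internals if internals and rng.random() < p_internal else terminals
    return rng.choice(pool)
```

If the program is a single terminal, there are no internal nodes and the terminal pool is used regardless of p_internal.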

10. We need to decide whether or not to worry about duplicate individuals in the population. Duplicate individuals are a waste of computer resources. In EAs with relatively small search spaces or small populations, duplicates can arise quite often, and dealing with duplicates can be an important aspect of the EA (see Section 8.6.1). However, in GP, the search space is so large that duplicates rarely occur. Therefore, we usually do not need to worry about duplicate individuals in GP.
