Code Growth Control Using Alternative Crossover Mechanisms

It has been recognised from the early stages in GP that the choice of genetic recombination operator can influence the growth in code size. Sims (1993) was an early attempt to limit code growth using a set of mutation operators that were designed to bias against an increase in code size. Rosca and Ballard (1996) approached the problem from a different angle, suggesting that the genetic operators themselves change dynamically to deal with increases in the size of members of the population selected for recombination. Their approach focuses on the effect of varying the probability of destructive genetic operators relative to the complexity of structures present in the population.

Standard crossover should not be regarded as the best method of genetic recombination simply because it is the most widely used. Alternate forms are possible, but existing off-the-shelf GP systems may only provide standard crossover (or perhaps also mutation), and implementing nonstandard forms of crossover can be time consuming and may produce no extra benefits for a data mining project.

However, research is taking crossover in the direction of domain-specific forms of crossover (Langdon, 2000) and it is likely that in the future crossover may be more directed to the domain of the problem.

SUMMARY

This chapter has outlined areas in which GP may be successfully applied to data mining including clustering and categorisation. GP has faced problems with scaling and these have been investigated. Some of the recent research into identifying why the scaling problem exists and what can be done to improve the performance of GP on harder problems has also been described.

GP can be a potentially useful tool for data mining applications, but its strengths and weaknesses need to be clearly understood. This chapter describes some of the common problems that are encountered in using GP and suggests potential solutions.

Code growth is identified as a major problem GP and four methods of attack are described in increasing order of difficulty. In trying to apply GP to any problem, it is important to identify the method that is most likely to lead to an enhanced search capability and apply it. However, it is important also to recognise that some methods of enhancing search are easier to implement than others and it is useful to try easy

methods first. In the future many of these methods are almost certain to be provided without any explicit effort by the user.

REFERENCES

Altenberg, L. (1994). The evolution of evolvability in genetic programming. In Kinnear K.J.

(ed.) Advances in genetic programming, pp. 47-74. MIT Press, Cambridge, Mass.

Angeline P.J.(1997). Subtree Crossover: Building Block Engine or Macromutation? Ge-netic Programming 1997: Proceedings of the Second Annual Conference, pp. 9-17, Morgan Kaufmann, San Francisco.

Banzhaf W. ,Nordin P., Keller R. E. and Francone, F. D. (1998). Genetic Programming: An Iintroduction. Morgan Kaufmann, San Francisco.

Barnum, H., Bernstein H.J. and Spector L. (2000). Quantum circuits for OR and AND of Ors Journal of Physics A: Mathematical and General, 33(45), 8047-8057.

Bergstrom, A., Jaksetic, P. and Nordin P. (2000). Acquiring textual relations automatically on the web using genetic programming. Genetic Programming: Proceedings of EuroGP’2000, LNCS, Vol. 1802, pp. 237-246, Springer-Verlag, 2000.

Choenni, S. (1999) On the suitability of genetic-based algorithms for data mining Advances in Database Technologies, LNCS, Vol. 1552, pp. 55-67, Springer-Verlag, 1999.

Davidson, J.W., Savic, D.A. and Walters, G.A. (2000). Rainfall runoff modelling using a new polynomial regression method. Proceedings of the 4^th International Conference on HydroInformatics.

Duffy, J. and Engle-Warnick, J. (1999). Using symbolic regression to infer strategies from experimental data. Proceedings of the fifth International Conference: Computing in Economics and Finance, 150-155.

Eggermont, J. and Van Hemert, J.I. (2001). Adaptive Genetic Programming Applied to New and Existing Simple Regression Problems. In Miller J. et al. (Eds.) Genetic Program-ming: Proceedings of the 4^th European Conference on Genetic Programming, EUROGP2001, Lake Como, Italy. pp. 23-35. Lecture Notes in Computer Science Number 2038. Springer-Verlag, Berlin.

Evett, M. and Fernandez, T. (1998). Numeric Mutation Improves the discovery of Numeric Constants by Genetic Programming. In Koza, J.R. et al. (Eds.) Genetic Programming 98:

Proceedings of the Third International Conference, pp. 223-231. Morgan Kaufmann.

Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). From data mining to knowledge discovery: an overview. In Fayyad U., Piatetsky-Shapiro G., Smyth P. and Uthurusamy R. (eds.) Advances in Knowledge Discovery and Data Mining. pp. 1-32. AAAI Press, Cambridge, Mass. Regression in GP. Evolutionary Programming VII. Proceedings of the 7^th Annual Conference.

Freitas, A. (1997). A genetic programming framework for two data mining tasks: classifi-cation and generalised rule induction. In Koza J.R. et al. (eds.) Genetic Programming 1997: Proceedings of the second International Conference, pp. 96-101, Morgan Kaufmann.

Gathercole, C. and Ross, P. (1996). An adverse interaction between crossover and restricted tree depth in genetic programming. In Koza J.R. et al. (eds.) Proceedings of the First International Conference on Genetic Programming. pp. 291-296. MIT Press, Cam-bridge, Mass.

Geum, Yong Lee (1999). Genetic recursive regression for modelling and forecasting

real-world chaotic time series. In Spector L. (ed.) Advances in Genetic Programming 3, pp.

401-423, MIT Press, 1999.

Howley, B. (1996) Genetic programming of near-minimum-time spacecraft attitude ma-neuvers Genetic Programming 1996: Proceedings of the First Annual Conference, pp.

98-106, MIT Press, Cambridge, Mass.

Iba, H., de Garis, H. and Sato, T. (1994) Genetic programming using a minimum description length principle. In Kinnear K.E. (ed.) Advances in Genetic Programming, pp. 265-284 MIT Press, Cambridge, Mass.

Kaboudan, M.A. (1999) Genetic evolution of regression models for business and economic forecasting. Proceedings of the Congress on Evolutionary Computation, Vol. 2, pp.

1260-1268. IEEE Press.

Kinnear, K.E. (1993) Generality and difficulty in genetic programming: Evolving a sort. In Forrest S. (ed.) Proceedings of the Fifth International Conference on Genetic Algo-rithms, pp. 287-294. University of Illinois at Urbana-Champaign. Morgan Kaufmann, San Francisco, CA.

Koza, J.R. (1992). Genetic programming: on the programming of computers by means of natural selection. MIT Press, Cambridge, Mass.

Langdon, W.B. (2000) Size fair and homologous tree genetic programming crossovers.

Genetic Programming and Evolvable Machines, 1,2, pp. 95-119.

Langdon, W.B. and Poli, R. (1997b). Fitness causes bloat. Technical Report CSRP-97-09 University of Birmingham, School of Computer Science.

Langdon, W.B. and Buxton, B.F. (2001). Evolving Receiver Operating Characteristics for Data Fusion. In Miller, J. et al. (Eds.) Genetic Programming: Proceedings of the 4^th European Conference on Genetic Programming, EUROGP2001, Lake Como, Italy. pp.

23-35. Lecture Notes in Computer Science Number 2038. Springer-Verlag, Berlin.

Luke, S. (2000). Issues in Scaling Genetic Programming: Breeding Strategies, Tree Generation, and Code Bloat. PhD Thesis, Department of Computer Science, University of Maryland.

Marmelstein, R.E. and Lamont, G. (1998). Pattern classification using a hybrid genetic program-decision tree approach. In Koza J.R. et al. (Eds.) Genetic Programming 98:

Proceedings of the Third International Conference, pp. 223-231. Morgan Kaufmann.

McKay, B., Willis, M.J. and Barton, G.W. (1997). Steady-state modelling of chemical process systems using genetic programming. Computers and Chemical Engineering, 21(9), 981-996.

Nachbar, R.B. (2000). Molecular evolution: automated manipulation of hierarchical chemi-cal topology and its application to average molecular structures Genetic Programming And Evolvable Machines, 1(1/2), 57-94.

Nordin, P. and Banzhaf, W. (1995). Complexity compression and evolution. In Eshelman L.J. (Ed.) Proceedings of the Sixth International Conference on Genetic Algorithms, pp.

310-317. Morgan Kaufmann.

Piatetsky-Shapiro, G. and Matheus, C. (1994). The interestingness of deviations. In Proceedings of KDD-1994. Fayyad U. and Uthurusamy R. (eds.) Technical Report WS-03. Menlo Park Ca, AAAI Press.

Reynolds, C.W. (1994). Evolution of Obstacle Avoidance Behaviour: Using Noise to Promote Robust Solutions. In Kinnear K.E. (ed.) Advances in Genetic Programming, pp.

221-242 MIT Press, Cambridge, Mass.

Rosca, J.P. and Ballard, D.H. (1996). Complexity drift in evolutionary computation with tree representations. Technical Report NRL5, University of Rochester.

Ross, Q. J. (1993). C4.5 – Programs for machine learning. Morgan Kaufmann, 1993.

Silberschatz, A. and Tuzhilin, A. (1995) On subjective measures of interestingness in knowledge discovery. In Proceedings of KDD-95: First International Conference on Knowledge Discovery and Data Mining, pp. 275-281. Menlo Park, Ca: AAAI.

Sims, K. (1993) Interactive evolution of equations for procedural models. The Visual Computer,9 pp. 466-476, 1993.

Smith, P.W.H. and Harries, K. (1998). Code growth, introns and alternative selection schemes. In Evolutionary Computation. Cambridge: MIT Press. 6(4), 339-360.

Smith, P.W.H. (1999) Controlling code growth in genetic programming. In John R. and Birkenhead R. (eds.) Advances In Soft Computing, pp. 166-171. Physica-Verlag, 1999.

Soule, T. and Foster, J.A. (1997). Code size and depth flows in genetic programming. In Koza J.R. et al. (eds.) Genetic Programming 1997: Proceedings of the Second Interna-tional Conference, pp. 313-320, Morgan Kaufmann.

Soule, T., Foster, J.A. and Dickinson, J. (1996). Code growth in genetic programming. In John R. Koza et al. (eds.) Genetic Programming 1996: Proceedings of the First International Conference, pp. 215-223, Stanford Unversity. MIT Press, Camb, Mass.

Taylor, J., Goodacre, R., Winson, M.K., Rowland, J.J., Gilbert ,R.J. and Kell, D.B. (1998).

Genetic programming in the interpretation of fourier transform infrared spectra: quan-tification of metabolites of pharmaceutical importance. In Koza J.R. et al. (eds.)Genetic Programming 98: Proceedings of the Third International Conference, 377-380. Morgan Kaufmann.

Zhang, B. T. and Mühlenbein, H. (1995). Balancing accuracy and parsimony in genetic programming. Evolutionary Computation. Cambridge: MIT Press. 3(1), 17-38.

Chapter IX

A Building Block Approach

Dans le document Data Mining: A Heuristic Approach (Page 180-184)