CONCLUSIONS AND FUTURE WORK

As an alternative to GAs, the application of the EDA paradigm to solve the well-known FSS problem on datasets of large dimensionality has been studied.

As the induction of Bayesian networks is ruled out for tasks of such large dimensionality, four simple probabilistic models (PBIL, BSC, MIMIC and TREE) have been used within the EDA paradigm to factorize the probability distribution of the best individuals in the population of possible solutions. MIMIC and TREE can capture interactions between pairs of variables, while PBIL and BSC assume independence among the variables of the problem. With three of these four models (BSC, MIMIC and TREE), GA approaches need more generations than the EDA approach to discover solutions of similar fitness. We show this behavior on a set of natural and artificial datasets, where these three EDA approaches discover the feature relationships and the underlying structure of the problem faster than the other approaches. When the wrapper approach is used, this fast discovery of high-fitness solutions is highly desirable because it saves CPU time. However, because of the high CPU times needed to induce the order-two models in the Internet advertisements domain, the CPU time savings produced by this reduction in the number of evaluated solutions, relative to GA approaches, are noticeably diminished.
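To make the univariate case concrete, the following is a minimal sketch of a PBIL-style EDA for FSS; it is an illustration under stated assumptions, not the implementation evaluated in this chapter. Individuals are bit masks over the features, and the probability vector is the factorized model estimated from the best individuals. The `fitness` callable, the parameter values and the toy problem are all illustrative assumptions; in the wrapper approach, `fitness` would be the (CPU-expensive) estimated accuracy of the inducer on the selected features.

```python
import numpy as np

def pbil_fss(fitness, n_features, pop_size=100, n_select=50,
             alpha=0.1, generations=50, rng=None):
    """Minimal PBIL-style EDA sketch for feature subset selection.

    fitness: callable mapping a boolean mask of shape (n_features,) to a
             score (in a wrapper approach, e.g. cross-validated accuracy).
    Returns the best mask found and its score.
    """
    rng = np.random.default_rng(rng)
    p = np.full(n_features, 0.5)          # univariate model: P(feature selected)
    best_mask, best_score = None, -np.inf
    for _ in range(generations):
        # Sample a population of candidate subsets from the current model.
        pop = rng.random((pop_size, n_features)) < p
        scores = np.array([fitness(mask) for mask in pop])
        # Select the best individuals and shift the marginals toward them.
        elite = pop[np.argsort(scores)[-n_select:]]
        p = (1 - alpha) * p + alpha * elite.mean(axis=0)
        if scores.max() > best_score:
            best_score = scores.max()
            best_mask = pop[scores.argmax()].copy()
    return best_mask, best_score

# Toy usage (hypothetical problem): the relevant features are the first
# ten of fifty; the fitness stands in for a real wrapper score.
if __name__ == "__main__":
    target = np.zeros(50, dtype=bool)
    target[:10] = True
    toy_fitness = lambda mask: -(mask ^ target).sum()
    mask, score = pbil_fss(toy_fitness, n_features=50, rng=0)
    print(mask.nonzero()[0], score)
```

BSC differs from this sketch mainly in how the marginals are re-estimated from the selected individuals, while MIMIC and TREE would replace the independent marginals with a chain or tree of pairwise dependencies.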

As future work, we envision the use of other probabilistic models for large-dimensionality datasets, models that assume only low-order dependencies among the variables of the domain. Another interesting possibility is the use of parallel algorithms to induce Bayesian networks in these kinds of tasks (Xiang & Chu, 1999). When dimensionality exceeds 1,000 variables, research is also needed on reducing the CPU times associated with probabilistic order-two approaches.
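As a rough illustration of why order-two approaches become costly at high dimensionality, the sketch below fits a Chow-Liu dependency tree (Chow & Liu, 1968), the structure underlying the TREE model: it computes all O(n²) pairwise mutual informations before extracting a maximum-weight spanning tree. The function names are ours, not the chapter's.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information between two binary (0/1) variables."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Fit a dependency tree over the columns of a binary data matrix:
    a maximum-weight spanning tree on pairwise mutual information.
    Returns a list of (parent, child) edges; variable 0 is the root.
    """
    n_vars = data.shape[1]
    # The O(n^2) pairwise pass: this is the order-two statistics
    # computation whose CPU cost dominates in high dimensions.
    mi = np.zeros((n_vars, n_vars))
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j])
    # Prim's algorithm for the maximum-weight spanning tree.
    in_tree, edges = {0}, []
    while len(in_tree) < n_vars:
        parent, child = max(
            ((i, j) for i in in_tree for j in range(n_vars) if j not in in_tree),
            key=lambda e: mi[e],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges
```

For more than 1,000 features the pairwise pass dominates the run time, which is precisely the cost that parallel induction schemes aim to amortize.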

Biological Data Mining is an interesting application area for FSS techniques (Ben-Dor, Bruhn, Friedman, Nachman, Schummer, & Yakhini, 2000). Since efficient and relatively cheap methods for acquiring biological data became available, data sequences of high dimensionality have been collected, and the application of an FSS procedure has become an essential task.

REFERENCES

Aha, D.W., & Bankert, R.L. (1994). Feature selection for case-based classification of cloud types: An empirical comparison. In Proceedings of the AAAI’94 Workshop on Case-Based Reasoning (pp. 106-112).

Alpaydin, E. (1999). Combined 5x2cv F test for comparing supervised classification learning algorithms. Neural Computation, 11(8), 1885-1892.

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice. New York: Oxford University Press.

Baluja, S. (1994). Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Pittsburgh, PA: Technical Report CMU-CS-94-163, Carnegie Mellon University.

Baluja, S., & Davies, S. (1997). Using optimal dependency-trees for combinatorial optimization: Learning the structure of the search space. In Proceedings of the Fourteenth International Conference on Machine Learning, 30-38.

Ben-Dor, A., Bruhn, L., Friedman, N., Nachman, I., Schummer, M., & Yakhini, Z. (2000). Tissue Classification with Gene Expression Profiles. Journal of Computational Biology, 7(3-4), 559-583.

Blum, A.L., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97, 245-271.

Cestnik, B. (1990). Estimating Probabilities: A crucial task in Machine Learning. In Proceedings of the European Conference on Artificial Intelligence, 147-149.

Chow, C., & Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14, 462-467.

De Bonet, J.S., Isbell, C.L., & Viola, P. (1997). MIMIC: Finding optima by estimating probability densities. In Advances in Neural Information Processing Systems, Vol. 9. Cambridge, MA: MIT Press.

Doak, J. (1992). An evaluation of feature selection methods and their application to computer security. Davis, CA: Technical Report CSE-92-18, University of California at Davis.

Etxeberria, R., & Larrañaga, P. (1999). Global Optimization with Bayesian networks. In Proceedings of the Second Symposium on Artificial Intelligence, 332-339.

Friedman, N., & Yakhini, Z. (1996). On the Sample Complexity of Learning Bayesian Networks. In Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, 274-282.

Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.

González, C., Lozano, J.A., & Larrañaga, P. (1999). The convergence behavior of PBIL algorithm: a preliminary approach. Donostia - San Sebastián, Spain: Technical Report EHU-KZAA-IK-3/99, University of the Basque Country.

Harik, G. (1999). Linkage Learning via Probabilistic Modelling in the ECGA. Urbana-Champaign, IL: IlliGAL Report 99010, University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms Laboratory.

Harik, G.R., Lobo, F.G., & Goldberg, D.E. (1997). The compact genetic algorithm. Urbana-Champaign, IL: IlliGAL Report 97006, University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms Laboratory.

Henrion, M. (1988). Propagating uncertainty in Bayesian networks by probabilistic logic sampling. In Uncertainty in Artificial Intelligence, Vol. 2 (pp. 149-163). Amsterdam, The Netherlands: Elsevier Science Publishers, B.V.

Inza, I., Larrañaga, P., Etxeberria, R., & Sierra, B. (2000). Feature Subset Selection by Bayesian network based optimization. Artificial Intelligence, 123 (1-2), 157-184.

Inza, I., Larrañaga, P., & Sierra, B. (2001). Feature Subset Selection by Bayesian networks: A comparison with genetic and sequential algorithms. International Journal of Approximate Reasoning, 27(2), 143-164.

Jain, A.K., & Chandrasekaran, R. (1982). Dimensionality and sample size considerations in pattern recognition practice. In Krishnaiah, P.R. & Kanal, L.N. (Eds.), Handbook of Statistics, Vol. 2 (pp. 835-855). Amsterdam, The Netherlands: North-Holland.

Jain, A., & Zongker, D. (1997). Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 153-158.

Kittler, J. (1978). Feature Set Search Algorithms. In Chen, C.H. (Ed.), Pattern Recognition and Signal Processing (pp. 41-60). Alphen aan den Rijn, The Netherlands: Sijthoff and Noordhoff.

Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 273-324.

Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using MLC++, a Machine Learning Library in C++. International Journal of Artificial Intelligence Tools, 6, 537-566.

Kudo, M., & Sklansky, J. (2000). Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33, 25-41.

Langley, P., & Sage, S. (1994). Induction of selective Bayesian classifiers. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, 399-406.

Larrañaga, P., Etxeberria, R., Lozano, J.A., Sierra, B., Inza, I., & Peña, J.M. (1999). A review of the cooperation between evolutionary computation and probabilistic graphical models. In Proceedings of the Second Symposium on Artificial Intelligence, 314-324.

Larrañaga, P., Etxeberria, R., Lozano, J.A., & Peña, J.M. (2000). Combinatorial Optimization by Learning and Simulation of Bayesian Networks. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 343-352.

Liu, H., & Motoda, H. (1998). Feature Selection for Knowledge Discovery and Data Mining. Norwell, MA: Kluwer Academic Publishers.

Miller, A.J. (1998). Subset Selection in Regression. Washington, DC: Chapman and Hall.

Mladenic, D. (1998). Feature subset selection in text-learning. In Proceedings of the Tenth European Conference on Machine Learning, 95-100.

Mühlenbein, H. (1996). The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3), 303-346.

Mühlenbein, H., & Mahnig, T. (1999). FDA: A scalable evolutionary algorithm for the optimization of additively decomposed functions. Evolutionary Computation, 7(4), 353-376.

Mühlenbein, H., & Paaß, G. (1996). From recombination of genes to the estimation of distributions I. Binary parameters. In Lecture Notes in Computer Science 1141: Parallel Problem Solving from Nature - PPSN IV, 178-187.

Murphy, P. (1995). UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.

Narendra, P., & Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, C-26(9), 917-922.

Ng, A.Y. (1997). Preventing ‘Overfitting’ of Cross-Validation Data. In Proceedings of the Fourteenth International Conference on Machine Learning, 245-253.

Pelikan, M., Goldberg, D.E., & Cantú-Paz, E. (1999). BOA: The Bayesian Optimization Algorithm. Urbana-Champaign, IL: IlliGAL Report 99003, University of Illinois at Urbana-Champaign, Illinois Genetic Algorithms Laboratory.

Pelikan, M., & Mühlenbein, H. (1999). The Bivariate Marginal Distribution Algorithm. In Advances in Soft Computing: Engineering Design and Manufacturing (pp. 521-535). London, England: Springer-Verlag.

Pudil, P., Novovicova, J., & Kittler, J. (1994). Floating Search Methods in Feature Selection. Pattern Recognition Letters, 15(11), 1119-1125.

Sangüesa, R., Cortés, U., & Gisolfi, A. (1998). A parallel algorithm for building possibilistic causal networks. International Journal of Approximate Reasoning, 18(3-4), 251-270.

Siedlecki, W., & Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2, 197-220.

Syswerda, G. (1993). Simulated crossover in genetic algorithms. In Whitley, L.D. (Ed.), Foundations of Genetic Algorithms, Vol. 2 (pp. 239-255). San Mateo, CA: Morgan Kaufmann.

Thierens, D., & Goldberg, D.E. (1993). Mixing in Genetic Algorithms. In Proceedings of the Fifth International Conference on Genetic Algorithms, 38-45.

Vafaie, H., & De Jong, K. (1993). Robust feature selection algorithms. In Proceedings of the Fifth International Conference on Tools with Artificial Intelligence (pp. 356-363).

Xiang, Y., & Chu, T. (1999). Parallel Learning of Belief Networks in Large and Difficult Domains. Data Mining and Knowledge Discovery, 3(3), 315-338.

Yang, Y., & Pedersen, J.O. (1997). A Comparative Study on Feature Selection in Text Categorization. In Proceedings of the Fourteenth International Conference on Machine Learning, 412-420.
