
Application and Comparison of Classification Techniques in Controlling Credit Risk

5. Conclusions and Future Work

Controlling credit risk is crucial for commercial banks, which must identify the clients that are likely to breach their contracts in the future. Although a credit rating system provides an effective tool for this purpose, it is not feasible to rate every client or to repeat the rating frequently. Data mining and computational intelligence techniques, especially classification techniques, can be applied to learn and predict credit ratings automatically, thus helping commercial banks detect potential high-risk clients accurately and in a timely manner.

This chapter has described a comprehensive examination of several well-known classifiers. All of these classifiers were applied to 244 rated companies, drawn mainly from the Industrial and Commercial Bank of China. The results revealed that traditional statistical models had the poorest outcomes, and that C4.5 and SVM did not perform as well as expected. CBA, an associative classification technique, on the other hand, appeared to be the most appropriate choice.
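To make the idea of associative classification more concrete, the following is a simplified sketch in the spirit of CBA (Liu, Hsu and Ma, 1998), not the implementation used in the experiments. Class association rules of the form itemset → class are enumerated by brute force (up to two items, instead of Apriori-style candidate generation), ranked by confidence, then support, then antecedent length, and a rule is kept only if it correctly classifies at least one training case not yet covered by an earlier rule. CBA's error-based truncation of the rule list is omitted. The attribute names, thresholds and the six training cases are hypothetical, and the fallback to the overall majority class when every case is covered is an assumption of this sketch.

```python
from itertools import combinations

# Hypothetical training cases: (set of discretized "attribute=value" items, rating class).
TRAIN = [
    ({"debt_ratio=high", "roa=low"}, "risky"),
    ({"debt_ratio=high", "roa=mid"}, "risky"),
    ({"debt_ratio=low", "roa=high"}, "safe"),
    ({"debt_ratio=low", "roa=mid"}, "safe"),
    ({"debt_ratio=mid", "roa=high"}, "safe"),
    ({"debt_ratio=high", "roa=high"}, "safe"),
]


def mine_rules(records, min_support, min_confidence, max_len=2):
    """Enumerate rules (antecedent, class, support, confidence) by brute force."""
    n = len(records)
    items = sorted({item for itemset, _ in records for item in itemset})
    classes = sorted({label for _, label in records})
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            ante = frozenset(antecedent)
            # Class labels of all cases whose item set contains the antecedent.
            covered = [label for itemset, label in records if ante <= itemset]
            if not covered:
                continue
            for cls in classes:
                hits = covered.count(cls)
                support = hits / n
                confidence = hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((ante, cls, support, confidence))
    # CBA-style precedence: higher confidence, then higher support, then shorter rule.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def build_classifier(rules, records):
    """Keep rules that correctly cover at least one still-uncovered case."""
    remaining = list(records)
    selected = []
    for ante, cls, _support, _confidence in rules:
        covered = [label for itemset, label in remaining if ante <= itemset]
        if cls in covered:
            selected.append((ante, cls))
            remaining = [(s, lab) for s, lab in remaining if not (ante <= s)]
        if not remaining:
            break
    # Default class: majority of uncovered cases (overall majority if none remain).
    pool = remaining if remaining else records
    labels = [label for _, label in pool]
    default = max(sorted(set(labels)), key=labels.count)
    return selected, default


def classify(classifier, itemset):
    """Return the class of the first (highest-ranked) matching rule, else the default."""
    selected, default = classifier
    for ante, cls in selected:
        if ante <= itemset:
            return cls
    return default


if __name__ == "__main__":
    clf = build_classifier(mine_rules(TRAIN, min_support=0.2, min_confidence=0.6), TRAIN)
    print(classify(clf, {"debt_ratio=high", "roa=low"}))  # risky
```

With these thresholds the sketch keeps three rules (roa=high → safe, debt_ratio=low → safe, debt_ratio=high → risky); because rules are tried in rank order, a company with both a high debt ratio and a high ROA is rated by the higher-confidence rule.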

Future work may focus on collecting more data for experiments and applications, particularly through further exploration of the structure of Chinese credit rating data. In this chapter, the feature selection/transformation methods used, such as ANOVA and PCA, are independent of the classification methods, and they did not improve the classifiers' predictive abilities. A future investigation might therefore apply a different type of feature selection method, one that depends on the classification algorithm, in order to find the best feature combination for each classifier.
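The contrast between classifier-independent (filter) and classifier-dependent (often called wrapper) feature selection can be sketched as follows, under stated assumptions. The filter ranks each feature by its one-way ANOVA F statistic, without consulting any classifier. The wrapper performs greedy forward selection, scoring each candidate feature set by the leave-one-out accuracy of a 1-nearest-neighbour classifier; 1-NN is only a stand-in chosen for brevity, and any of the chapter's classifiers could be passed as `score`. The feature matrix, labels and function names are hypothetical; feature 2 is deliberately a copy of feature 0.

```python
import math
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single numeric feature across class groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())  # between groups
    ssw = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)    # within groups
    if ssw == 0:
        return math.inf
    return (ssb / (k - 1)) / (ssw / (n - k))


def filter_ranking(X, y):
    """Filter: rank feature indices by ANOVA F, independently of any classifier."""
    scores = [anova_f([row[f] for row in X], y) for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: -scores[f])


def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour restricted to `features`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, math.inf
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_select(X, y, score):
    """Wrapper: greedily add the feature that most improves `score`; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        cand = max(remaining, key=lambda f: score(X, y, selected + [f]))
        cand_score = score(X, y, selected + [cand])
        if cand_score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best


# Hypothetical data: feature 0 is informative, feature 1 is noise, feature 2 duplicates feature 0.
X = [
    [0.9, 0.1, 0.9],
    [0.8, 0.7, 0.8],
    [0.7, 0.4, 0.7],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.1],
    [0.3, 0.9, 0.3],
]
y = ["risky", "risky", "risky", "safe", "safe", "safe"]

if __name__ == "__main__":
    print(filter_ranking(X, y))
    print(forward_select(X, y, loo_1nn_accuracy))
```

On this toy data the filter ranks the redundant copy (feature 2) second, because each feature is scored in isolation, whereas the wrapper stops after feature 0: adding the copy does not improve the classifier's accuracy, so it is never selected.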

Acknowledgements

The work was partly supported by the National Natural Science Foundation of China (79925001/70231010/70321001), and the MOE Funds for Doctoral Programs (20020003095).

References

Agrawal, R., Srikant, R. (1994). A fast algorithm for mining association rules. The 20th International Conference on Very Large Data Bases, Santiago, Chile.

Agrawal, R., Imielinski, T., Swami, A. (1993). Mining association rules between sets of items in large databases. The ACM SIGMOD Conference on Management of Data, Washington, D.C., U.S.A.

Aha, D., Kibler, D. (1991). Instance-based learning algorithms, Machine Learning, 6, 37-66.

Baesens, B. (2003). Developing intelligent systems for credit scoring using machine learning techniques. Department of Applied Economic Sciences, Leuven, Belgium, Leuven University: 221.

Biggs, D., Ville, B. (1991). A method of choosing multiway partitions for classification and decision trees, Applied Statistics, 18, 49-62.

Bishop, C.M. (1995). Neural networks for pattern recognition, Oxford University Press.

Breiman, L., Friedman, J., Olshen, R., Stone, C. (1984). Classification and Regression Trees. Wadsworth and Brooks: Monterey, CA, U.S.A.

Burges, C. J. C. (1998). A tutorial on Support Vector Machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2), 121-167.

Chang, C.-C., Lin, C.-J. (2003). LIBSVM: a library for support vector machines, (URL: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf).

Chen, G. Q., Liu, H. Y., Yu, L., Wei, Q., Zhang, X. (2007). A New Approach to Classification Based on Association Rule Mining. Decision Support Systems, (to appear).

Cristianini, N., Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines, Cambridge University Press.

Dong, G., Zhang, X., Wong, L., Li, J. (1999). CAEP: Classification by aggregating emerging patterns. 2nd International Conference on Discovery Science (DS’99), Lecture Notes in Artificial Intelligence 1721, Tokyo, Japan, Springer-Verlag.

Duda, R. O., Hart, P. E., Stork, D. G. (2001). Pattern Classification, John Wiley and Sons.

Dutta, S., Shekhar, S. (1988). Bond rating: a non-conservative application of neural networks. IEEE International Conference on Neural Networks.

Ederington, H. L. (1985). Classification models and bond ratings, Financial Review, 20(4): 237-262.

Fayyad, U. M., and Irani, K. B. (1993). Multi-interval discretization of continuous valued attributes for classification learning. The Thirteenth International Joint Conference on Artificial Intelligence (IJCAI), Morgan Kaufmann: Chambery, France.

Friedman, N., Geiger, D., Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.

Galindo, J., and Tamayo, P. (2000). Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications, Computational Economics, 15(1-2), 107-143.

Guo, M. H. (2003). Credit rating, China Renmin University Press, China.

Han, J., and Kamber, M. (2001). Data Mining: Concepts and Techniques, Morgan Kaufmann: San Francisco, CA, U.S.A.

Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.

Haykin, S. (1998). Neural networks: a comprehensive foundation, Prentice Hall.

Hosmer, D. W., and Lemeshow, S. (2000). Applied logistic regression. Wiley: New York, NY, U.S.A.

Hsu, C.-W., and Lin, C.-J. (2002). A comparison of methods for multi-class support vector machines, IEEE Transactions on Neural Networks, 13(2), 415-425.

Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., and Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems, 37(4), 543-558.

Johnson, R. A., and Wichern, D. W. (2002). Applied multivariate statistical analysis. Prentice Hall: Upper Saddle River, N.J., U.S.A.

Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data, Applied Statistics, 29, 119-127.

Keerthi, S., Shevade, S., Bhattacharyya, C., and Murthy, K. (1999). Improvements to Platt’s SMO algorithm for SVM classifier design. Bangalore, India, Dept. of CSA.

Kim, J. W. (1993). Expert systems for bond rating: a comparative analysis of statistical, rule-based and neural network systems, Expert Systems, 10, 167-171.

Langley, P., Iba, W., Thompson, K. (1992). An analysis of Bayesian classifiers, the tenth National Conference on Artificial Intelligence (AAAI’92), San Jose, CA, U.S.A., AAAI Press.

Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining, the 4th International Conference on Knowledge Discovery and Data Mining, New York, NY, U.S.A.

Li, W., Han, J., Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules, ICDM’01, San Jose, CA, U.S.A.

DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, Biometrics, 44, 837-845.

Maher, J. J., and Sen, T. K. (1997). Predicting bond ratings using neural networks: a comparison with logistic regression, Intelligent Systems in Accounting, Finance and Management, 6, 59-72.

Moody, J., and Utans, J. (1995). Architecture selection strategies for neural networks application to corporate bond rating, Neural Networks in the Capital Markets, 277-300.

Osuna, E., Freund, R. and Girosi, F. (1997). Improved training algorithm for Support Vector Machines, Proc. IEEE NNSP ’97.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann: San Francisco, CA, U.S.A.

Pinches, G. E., and Mingo, K. A. (1973). A multivariate analysis of industrial bond ratings, The Journal of Finance, 28(1): 1-18.

Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization, in Advances in Kernel Methods - Support Vector Learning, B. Scholkopf, C. Burges and A. J. Smola (Eds.). The MIT Press: Cambridge, MA, U.S.A.

Platt, J. (2000). Probabilistic outputs for support vector machines and comparison to regularized likelihood methods, in Advances in Large Margin Classifiers, A.J. Smola, P.L. Bartlett, B. Scholkopf and D. Schuurmans (Eds.), The MIT Press: Cambridge, MA, U.S.A.

Platt, J. C., Cristianini, N. and Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification, Advances in Neural Information Processing Systems, 12, 547-553.

Provost, F., and Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. 3rd International Conference on Knowledge Discovery and Data Mining, AAAI Press: Huntington Beach, CA, U.S.A.

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning, Morgan Kaufmann: San Francisco, CA, U.S.A.

Ripley, B. D. (1996). Pattern Recognition and Neural Networks, Cambridge University Press.

Sacha, J. (1999). jBNC: Bayesian Network Classifier Toolbox. (Online Version: http://jbnc.sourceforge.net/).

Shin, K.-S., and Han, I. (1999). Case-based reasoning supported by genetic algorithms for corporate bond rating, Expert Systems with Applications, 16, 85-95.

Shin, K.-S., and Lee, Y. J. (2002). A genetic algorithm application in bankruptcy prediction modeling, Expert Systems with Applications, 23, 321-328.

Singleton, J. C., and Surkan, A. J. (1990). Neural networks for bond rating improved by multiple hidden layers, in Proc. of IEEE International Conference on Neural Networks.

Swets, J. A., and Pickett, R.M. (1982). Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. Academic Press: New York, NY, U.S.A.

Vapnik, V. (1995). The nature of statistical learning theory, Springer-Verlag, New York, NY, U.S.A.

Wang, K., and Zhou, S. (2000). Growing decision trees on support-less association rules, in KDD’00, Boston, MA, U.S.A.

Weston, J., and Watkins, C. (1999). Multi-class support vector machines, in the 7th European Symposium on Artificial Neural Networks, Brussels, Belgium.

Yang, L., Hua, L. and Yu, W. B. (2003). Credit risk management in Bank: Theory, Technology and Practice, Economic Management Press.

Yin, X., and Han, J. (2003). CPAR: Classification based on predictive association rules, in the 2003 SIAM International Conference on Data Mining (SDM’03), San Francisco, CA, U.S.A.

Authors’ Biographical Statements

Lan Yu graduated from the School of Economics and Management, Tsinghua University (Beijing, China), in 2006 with a doctoral degree in management. In recent years he has been doing research on data mining, focusing on the improvement and application of classification techniques.

His research publications have appeared in several international journals, including Decision Support Systems and Expert Systems with Applications. Dr. Yu is currently a post-doctoral researcher in the Department of Computer Science and Technology at Tsinghua University, where he works on extending business intelligence knowledge to banks in China.

Guoqing Chen received his PhD from the Catholic University of Leuven (K.U. Leuven, Belgium), and is now the EMC² Chair Professor of information systems at the School of Economics and Management, Tsinghua University (Beijing, China). His research interests include information systems management, business intelligence and decision support, and soft computing.

He has published internationally and served as area editor/associate editor/editorial board member for international journals such as Information Sciences, Information Processing & Management, Journal of Strategic Information Systems, Information & Management, and Fuzzy Sets and Systems. Prof. Chen is the founding president of the Association for Information Systems (AIS) China Chapter (CNAIS), and has served as chair/co-chair for several international conferences, including the IFSA2005 World Congress, IEEE ICEBE2005 and IESM2007.

Andy Koronios earned his PhD from the University of Queensland (Brisbane, Australia), and is now a professor of information systems at the School of Computer & Information Science, University of South Australia (Adelaide, Australia).

His research interests include electronic commerce, data quality and security, and multimedia and online learning systems. He has a major role in the CRC for Integrated Engineering Asset Management (CIEAMP) as a research program leader in the area of systems integration and IT for asset management. Professor Koronios has numerous publications in international journals, edited volumes and conference proceedings.

Shiwu Zhu received his PhD from the Shanghai University of Finance and Economics (Shanghai, China), and is currently an associate professor of finance at the School of Economics and Management, Tsinghua University (Beijing, China). His research interests include fixed income, risk management, credit derivative pricing, and financial databases. Dr. Zhu has been the Principal Investigator for a number of research grants, including a grant awarded by the National Natural Science Foundation of China (NSFC).

Xunhua Guo received his doctoral degree from Tsinghua University (Beijing, China) in 2005, and is currently an assistant professor of information systems at the School of Economics and Management, Tsinghua University. His research interests include information systems and organizational evolution, systems analysis and design, and data management. His academic publications have appeared in international journals such as Communications of the ACM, Information Sciences and Journal of Enterprise Information Systems. He has co-authored books on information systems management, and recently co-developed a case on Digital China, published by Harvard Business School in 2007. Dr. Guo has served as a Co-Chair for the International Conference on Industrial Engineering and Systems Management (IESM2007).



In: Recent Advances in Data Mining of Enterprise Data: Algorithms and Applications (pages 169-177)