
4.5 RBF Neural Networks Dealing with Unbalanced Data

4.5.3 Training RBF Neural Networks on Unbalanced Data

Many data sets contain unbalanced cases, in which the sample sizes of different classes are unbalanced. Unbalanced training data may lead to an unbalanced architecture in training. In our work, we add larger weights to the minority classes in order to attract more attention to the minority members during training.

Assume that the number of samples in class i is N_i. The total number of samples in the data set is N = N_1 + ··· + N_i + ··· + N_M. The error function shown in Eq. (4.19) can be written as:

E_0(W) = \frac{1}{2} \sum_{i=1}^{M} \sum_{X_n \in A_i} \sum_{m=1}^{M} \bigl( y_m(X_n) - t_{nm} \bigr)^2 ,    (4.21)

where A_i denotes the set of samples belonging to class i. During the training of neural networks with unbalanced training data, a general error function such as Eq. (4.16) or Eq. (4.21) cannot lead to a balanced classification performance on all classes in the data set, because majority classes contribute more to the error than minority classes and therefore receive more weight adjustments. In supervised training algorithms, neural networks are constructed by minimizing a network error function whose variables are the weights connecting layers. Thus, the training procedure is biased towards frequently occurring classes.

In order to increase the contribution of minority classes in the weight adjustments, we change Eq. (4.21) to:

E(W) = \frac{1}{2} \sum_{i=1}^{M} \beta_i \sum_{X_n \in A_i} \sum_{m=1}^{M} \bigl( y_m(X_n) - t_{nm} \bigr)^2 ,    (4.22)

where β_i is the weight given to class i, chosen inversely proportional to the class size N_i. Differentiate E with respect to w_{mj}, and let


\frac{\partial E(W)}{\partial w_{mj}} = 0 .    (4.24)

Substituting Eq. (4.22) into Eq. (4.24), we obtain:

\sum_{i=1}^{M} \beta_i \sum_{X_n \in A_i} \bigl( y_m(X_n) - t_{nm} \bigr)\, \phi_j(X_n) = 0 .    (4.25)

We introduce a new parameter r_n replacing β_i:

r_n = \beta_i \quad \text{when } X_n \in A_i .    (4.26)

Substituting Eq. (4.26) into Eq. (4.25), we obtain:

\sum_{n=1}^{N} r_n \Bigl( \sum_{k} w_{mk}\, \phi_{nk} - t_{nm} \Bigr)\, \phi_{nj} = 0 ,    (4.27)

where φ_{nj} = φ_j(X_n). Similarly to the derivation in [22], this leads to the following new pseudo-inverse equation for calculating the weight matrix W:

\bigl( \Phi^{\mathrm{T}} \Phi \bigr)\, W^{\mathrm{T}} = \Phi^{\mathrm{T}} T .    (4.29)

Different from the pseudo-inverse equation shown in Eq. (4.12), here the elements of Φ and T are replaced by φ_{nj} → φ_{nj}\sqrt{r_n} and t_{ni} → t_{ni}\sqrt{r_n}.
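The following sketch illustrates one way to implement this weighted linear least-squares step with NumPy. The function name, the array layout (rows of Phi holding the hidden-unit outputs φ_nj of each sample, rows of T holding the targets t_ni), and the particular normalization β_i = N/(M·N_i) are illustrative assumptions, not the exact formulation of [103]:

```python
import numpy as np

def solve_output_weights(Phi, T, labels):
    """Solve the weighted pseudo-inverse equation (Phi'^T Phi') W^T = Phi'^T T'
    of Eq. (4.29), where the rows of Phi and T are scaled by sqrt(r_n)."""
    N = Phi.shape[0]
    counts = np.bincount(labels)          # N_i, the number of samples of each class
    beta = N / (len(counts) * counts)     # assumed choice: beta_i proportional to 1/N_i
    r = beta[labels]                      # r_n = beta_i when X_n belongs to class i (Eq. 4.26)

    sqrt_r = np.sqrt(r)[:, None]
    Phi_w = Phi * sqrt_r                  # phi_nj -> phi_nj * sqrt(r_n)
    T_w = T * sqrt_r                      # t_ni  -> t_ni  * sqrt(r_n)

    # Linear least-squares solution of the weighted normal equations
    W_T, *_ = np.linalg.lstsq(Phi_w, T_w, rcond=None)
    return W_T.T                          # weight matrix W, shape (M, number of hidden units)
```

Given the solved W, the network outputs for a set of patterns are obtained as Y = Phi_new @ W.T.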

As indicated in the above equations, we have taken the unbalanced data into consideration when training RBF neural networks. The parameters r_n introduce biased weights that are inversely related to the proportions of the classes in the data set. The effect of the weight parameters r_n is shown in Sect. 4.5.4. Compared with the training method that does not consider the unbalanced condition in the data, the classification accuracy of the minority classes is improved.

We also allow large overlaps between clusters of the same class to reduce the number of hidden units [102][104].

The modified training algorithm for RBF neural networks, in which small overlaps between clusters of different classes and large overlaps between clusters of the same class are allowed, is used in this section.

4.5.4 Experimental Results

The car evaluation data set in Chap. 3 is used here to demonstrate our algorithm. The data set is divided into three parts, i.e., training, validation, and test sets. Each experiment is repeated five times with different initial conditions and the average results are recorded.

We generate an unbalanced car data set based on function 5 shown in Chap. 3. There are nine attributes and two classes: Class A and Class B.

Samples which do not meet the conditions of Class A are samples of Class B in the car data set. There are 4000 patterns in the training data set and 2000 patterns in the testing data set, of which 507 and 205 patterns, respectively, belong to class 1 (Class A). The 2000 testing patterns are further divided into a validation set and a testing set of 1000 patterns each. Class A is the minority class and Class B is the majority class.

Classification error rates and the numbers of hidden units are compared between allowing only small overlaps and allowing large overlaps among clusters of the same class. When large overlaps among clusters of the same class are allowed, the number of hidden units is reduced from 328 to 303, while the classification error rate on the test data set increases only slightly, from 4.1% to 4.5%.

Table 4.6 compares the overall classification error rates with and without considering the unbalanced condition; here large overlaps are allowed between clusters with the same class label. Table 4.6 also shows that, when the unbalanced condition in the data set is taken into account, the classification error rate of the minority class decreases from 34.65% to 8.73%, while the error rate of the majority class increases slightly from 1.37% to 4.1%. Since, in most cases, the minority class carries important information, improving the individual accuracy of the minority class is critical.

In this section, we described a modification [103] to the algorithm for constructing and training the RBF network on unbalanced data, which increases the bias towards the minority classes. Each class is given a weight in the MSE function that is inversely proportional to its number of patterns. Experimental results show that the proposed method is effective in improving the classification accuracy of minority classes while maintaining the overall classification performance.
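As an illustrative calculation (assuming the simple normalization β_i = N/(M·N_i), which is one possible choice and not necessarily the exact formula used in [103]): for the car data set of Sect. 4.5.4, with N = 4000 training patterns, N_A = 507 and N_B = 3493, this gives β_A = 4000/(2 × 507) ≈ 3.94 and β_B = 4000/(2 × 3493) ≈ 0.57, so each minority-class pattern contributes roughly seven times as much to the error function as a majority-class pattern.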

4.6 Summary

In this chapter, we described a modified training algorithm for RBF neural networks, which we proposed earlier [107]. This modified algorithm leads to fewer hidden units while maintaining the classification accuracy of RBF classifiers. Training is carried out without knowing in advance the number of hidden units and without making any assumptions on the data.

We described two useful modifications to Roy et al.'s algorithm for the construction and training of an RBF network, by allowing for large overlaps among clusters of the same class and dynamically determining the cluster overlaps of different classes.

Table 4.6. Comparison of classification error rates of the RBF neural network for each class of the car data set, with and without considering the unbalanced condition, when allowing large overlaps between clusters with the same class label (average results of five independent runs). (© 2005 IEEE) We thank the IEEE for allowing the reproduction of this table, which first appeared in [103].

                                            Training set   Validation set   Testing set
Without considering unbalanced condition
  Overall error rates                       1.89%          5.0%             4.8%
  Class A                                   11.69%         27.69%           34.65%
  Class B                                   0.77%          2.41%            1.37%
Considering unbalanced condition
  Overall error rates                       1.2%           5.1%             4.5%
  Class A                                   4.27%          4.58%            8.73%
  Class B                                   0.85%          5.15%            4.1%

In RBF neural network classifiers, larger overlaps between clusters of different classes lead to higher classification errors. However, large overlaps between clusters with the same class label do not degrade classification performance, since such overlaps occur within a single class; with this modification, the number of hidden units is reduced while the classification error rate is reduced or maintained.

The ratio between the number of patterns of a certain class (in-class patterns) and the total number of patterns in a cluster represents the overlap between different classes. A dynamic parameter θ is applied to control this ratio according to the training condition: if the number of trials for finding a qualified cluster reaches a certain threshold, θ is decreased and the search for clusters continues, as sketched below.
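A minimal sketch of this control logic is given below. The function and parameter names, the initial value of θ, its decrement, and the trial threshold are hypothetical placeholders; the actual schedule used in the algorithm may differ:

```python
def search_qualified_cluster(propose_cluster, theta=0.95, theta_min=0.5,
                             theta_decay=0.05, max_trials=20):
    """Repeatedly propose candidate clusters until one is 'qualified', i.e. its
    ratio of in-class patterns to all patterns reaches the current theta.

    propose_cluster : callable returning (in_class_count, total_count) for a
                      newly generated candidate cluster (a hypothetical hook
                      into the clustering step of the training algorithm)
    """
    trials = 0
    while theta >= theta_min:
        in_class, total = propose_cluster()
        if total > 0 and in_class / total >= theta:
            return in_class, total, theta     # qualified cluster found
        trials += 1
        if trials >= max_trials:              # too many failed trials:
            theta -= theta_decay              # relax the purity requirement
            trials = 0
    return None                               # give up below the minimum theta
```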

The two modifications may help reduce detrimental effects from noisy patterns and isolated patterns while maintaining classification performance.

There are two training stages in the training algorithm. In the first stage, the widths and centers of the Gaussian kernel functions are determined by searching for clusters based on the proposed modifications. In the second stage, the weights connecting the hidden layer and the output layer are determined by the LLS method. Experimental results show that the modifications are effective in reducing the number of hidden units while maintaining or even increasing the classification accuracy.
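As an illustration of the quantities involved in the two stages, the sketch below (assuming NumPy and hypothetical array names) computes the hidden-layer output matrix from the centers and widths found in the first stage, together with the one-of-M target matrix used in the second stage:

```python
import numpy as np

def gaussian_design_matrix(X, centers, widths):
    """Hidden-layer outputs phi_nj = exp(-||X_n - c_j||^2 / (2 * sigma_j^2)).

    X       : (N, d) training patterns
    centers : (J, d) kernel centers from the first training stage
    widths  : (J,)   kernel widths from the first training stage
    """
    # Squared Euclidean distance between every pattern and every center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def one_hot_targets(labels, num_classes):
    """Target matrix t_nm: 1 for the true class of pattern X_n, 0 elsewhere."""
    T = np.zeros((len(labels), num_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T
```

The output weights then follow from a linear least-squares solve of Φ Wᵀ = T, or of its weighted form in Eq. (4.29) when the unbalanced condition is considered.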

This new approach can feasibly be used for classification when the underlying distributions of the data are unknown. Its accuracy is comparable with that of Roy et al.'s method [264], although its computational time is greater. Based on the experimental results, there is room for further research to speed up the training algorithm. In future work, the present approach could be enhanced by analyzing the relationships among the clusters in order to improve classification accuracy and reduce computational time.

In addition, a new algorithm was presented for the construction and training of an RBF neural network with unbalanced data. In applications, minority classes with far fewer samples are often present in data sets, and the learning process of a neural network is usually biased towards classes with majority populations. Our study focused on improving the classification accuracy of minority classes while maintaining the overall classification performance.

5

Attribute Importance Ranking for Data Dimensionality Reduction

Large-scale data can only be handled with the aid of computers. However, processing commands may need to be entered manually by data analysts, and data mining results can be fully used by decision makers only when the results can be understood explicitly. The removal of irrelevant or redundant attributes helps us make decisions and analyze data efficiently.

Data miners are expected to present discovered knowledge in an easily understandable way. Data dimensionality reduction (DDR) is an essential part of the data mining process. Drawing on methods from pattern recognition and statistics, DDR is developed to fulfill objectives such as improving the accuracy of prediction models, scaling data mining models, reducing computational cost, and providing a better understanding of the extracted knowledge.

5.1 Introduction

DDR plays an important role in data mining tasks, since semi-automated or automated methods perform better on lower-dimensional data, from which irrelevant or redundant attributes have been removed, than on higher-dimensional data. Irrelevant or redundant attributes carry no useful information and often interfere with useful attributes. In the classification task, the main aim of DDR is to reduce the number of attributes used in classification while maintaining an acceptable classification accuracy.

The problem of DDR is to select a subset of attributes which represents the concept of the data without losing important information. Feature (attribute) extraction and feature selection are two techniques of DDR. LDA (linear discriminant analysis) [168][198] and PCA (principal component analysis) [166] are common feature extraction methods. However, through the transformation operation in feature extraction, new features are generated which are linear or non-linear combinations of the original features, and unwanted artifacts often come with these new features. In addition, a non-linear transformation is usually not reversible, which brings difficulties in understanding the data through the extracted features.

Feature selection does not generate unwanted artifacts, since it is carried out in the original measurement space; dimensionality reduction is achieved by removing redundant or irrelevant attributes without losing the original concept of the data.

In optimal feature selection, all possible feature combinations would have to be inspected. Though some methods have been explored to reduce this work [43], the high computational cost remains an unsolved problem. Under this circumstance, suboptimal feature selection algorithms are an alternative. Though suboptimal feature selection algorithms do not guarantee the optimal solution, the selected feature subset usually leads to higher performance of the induction system (such as a classifier).

One wishes to find a measure that can identify irrelevant attributes at little computational cost. Consider two samples with different class labels in a data set, each represented by a set of attributes. Differences are observed in the two samples' attributes, i.e., there are correlations between attributes and class labels. Irrelevant attributes do not reflect this correlation when changing from one sample to another, so the correlations may be used to rank attribute importance.

On the other hand, a large distance between classes is desirable in order to distinguish different classes. Irrelevant attributes have no positive influence on separating distinct classes, and the removal of redundant attributes has no negative influence on forming distinct classes. Hence, class separability can be used as a criterion to evaluate attribute importance.

Feature selection can be performed based on the evaluation of attribute importance. Dash et al. [71] proposed an entropy measure to rank attribute importance. In mutual-information-based feature selection (MIFS) [18][27], the information content of each attribute (feature) is evaluated with respect to the classes and the other attributes, and the importance level of each attribute is determined by this evaluation criterion. However, the number of attributes included in the selected attribute subset has to be predefined, which requires prior knowledge of the data. Kononenko [180] introduced the Relief-F method to rank attribute importance in order to reduce data dimensionality. In the Relief-F method, for a given instance, nearest neighbors are searched for from each class, the difference in each attribute is calculated for each pair of instances, and the importance level of the attribute is evaluated from the probabilities of these differences.
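For illustration, a heavily simplified two-class, Relief-style sketch is shown below (one nearest hit and one nearest miss per sampled instance, attributes assumed scaled to [0, 1]); the full Relief-F of [180] averages over k neighbours from every class and weights the misses by class priors:

```python
import numpy as np

def relief_scores(X, y, num_iterations=100, rng=None):
    """Rank attributes by a simplified Relief score.

    X : (N, d) data matrix, assumed scaled to [0, 1] per attribute
    y : (N,) binary class labels
    """
    rng = np.random.default_rng(rng)
    N, d = X.shape
    scores = np.zeros(d)
    for _ in range(num_iterations):
        i = rng.integers(N)
        dist = np.abs(X - X[i]).sum(axis=1)               # Manhattan distance to all samples
        dist[i] = np.inf                                   # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))      # nearest neighbour of the same class
        miss = np.argmin(np.where(~same, dist, np.inf))    # nearest neighbour of the other class
        # Attributes that differ for the miss but not for the hit are rewarded
        scores += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return scores / num_iterations
```

Attributes can then be ranked by sorting the returned scores in descending order.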

In this chapter, we describe a novel separability-correlation measure (SCM), first proposed in [107], for determining the importance of the original attributes. Different attribute subsets obtained from the attribute ranking results are then used as inputs to RBF classifiers. The classification results are used to evaluate the feature subsets in order to reduce the data dimensionality, and the RBF network architecture can be simplified with the reduced attribute subsets. The SCM includes two parts: the ratio of the intraclass distance to the interclass distance, and an attribute-class correlation measure.
