
4.4.3 Monk3 Data Set

The Monk3 data set is a binary classification problem over a six-attribute discrete domain. The task is to learn a binary function defined over this domain from a set of training examples of this function.

There are 122 patterns in the training set and 432 patterns in the test set. We divide the test set into 200 patterns for validation and 221 patterns for testing since the size of the validation set affects the generalization performance of RBF classifiers.

Table 4.3 shows that when large overlaps among clusters of the same class are permitted, the number of hidden units is reduced from 33.2 to 19.6 on average, and the classification error rate increases slightly from 5.91% to 6.88%.

Table 4.3. Same as Table 4.1, with the Monk3 data set.

Performance criteria        Monk3
                            Small overlaps    Large overlaps
Training error              4.10%             4.92%
Validation error            8.0%              5.76%
Testing error               5.91%             6.88%
Number of hidden neurons    33.2              19.6

4.4.4 Breast Cancer Data Set

This breast cancer data set was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison. There are nine attributes and 699 patterns in the breast cancer data set. Sixteen patterns with missing attribute values are removed. Of the 683 patterns left, 444 are benign and the rest are malignant. Of the 683 patterns, 410 are used for training, 136 for validation, and 137 for testing.

The attributes are clump thickness, uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, and mitoses. There are two classes, i.e., benign and malignant.

Table 4.4 shows that when large overlaps among clusters of the same class are permitted, the number of hidden units is decreased from 31 to 11 on average, and the classification error rate is reduced from 2.92% to 1.46%.

Table 4.4. Same as Table 4.1, with the breast cancer data set.

Performance criteria        Breast Cancer
                            Small overlaps    Large overlaps
Training error              2.44%             0.73%
Validation error            2.92%             1.96%
Testing error               2.92%             1.46%
Number of hidden neurons    31                11

4.4.5 Mushroom Data Set

The mushroom data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota families.

There are 22 nominal attributes and 8124 patterns in the Mushroom data set. Among the 8124 patterns, 4500 patterns are for training, 1812 are for validation, and 1812 are for testing. There are two classes, edible and poisonous.

The attributes are:

1. cap-shape: bell (b), conical (c), convex (x), flat (f), knobbed (k), sunken (s)
2. cap-surface: fibrous (f), grooves (g), scaly (y), smooth (s)
3. cap-color: brown (n), buff (b), cinnamon (c), gray (g), green (r), pink (p), purple (u), red (e), white (w), yellow (y)
4. bruises?: bruises (t), no (f)
5. odor: almond (a), anise (l), creosote (c), fishy (y), foul (f), musty (m), none (n), pungent (p), spicy (s)
6. gill-attachment: attached (a), descending (d), free (f), notched (n)
7. gill-spacing: close (c), crowded (w), distant (d)
8. gill-size: broad (b), narrow (n)
9. gill-color: black (k), brown (n), buff (b), chocolate (h), gray (g), green (r), orange (o), pink (p), purple (u), red (e), white (w), yellow (y)
10. stalk-shape: enlarging (e), tapering (t)
11. stalk-root: bulbous (b), club (c), cup (u), equal (e), rhizomorphs (z), rooted (r), missing (?)
12. stalk-surface-above-ring: fibrous (f), scaly (y), silky (k), smooth (s)
13. stalk-surface-below-ring: fibrous (f), scaly (y), silky (k), smooth (s)
14. stalk-color-above-ring: brown (n), buff (b), cinnamon (c), gray (g), orange (o), pink (p), red (e), white (w), yellow (y)
15. stalk-color-below-ring: brown (n), buff (b), cinnamon (c), gray (g), orange (o), pink (p), red (e), white (w), yellow (y)
16. veil-type: partial (p), universal (u)
17. veil-color: brown (n), orange (o), white (w), yellow (y)
18. ring-number: none (n), one (o), two (t)
19. ring-type: cobwebby (c), evanescent (e), flaring (f), large (l), none (n), pendant (p), sheathing (s), zone (z)
20. spore-print-color: black (k), brown (n), buff (b), chocolate (h), green (r), orange (o), purple (u), white (w), yellow (y)
21. population: abundant (a), clustered (c), numerous (n), scattered (s), several (v), solitary (y)
22. habitat: grasses (g), leaves (l), meadows (m), paths (p), urban (u), waste (w), woods (d)

Table 4.5 shows that when large overlaps among clusters of the same class are permitted, the number of hidden units is decreased from 35 to 29 on average, while the classification error rate remains approximately the same.

Table 4.5. Same as Table 4.1, with the Mushroom data set.

Performance criteria        Mushroom
                            Small overlaps    Large overlaps
Training error              0.8%              1.0%
Validation error            0.5%              0.7%
Testing error               1.1%              1.1%
Number of hidden neurons    35                29


4.5 RBF Neural Networks Dealing with Unbalanced Data

4.5.1 Introduction

In neural network training, if some classes have far fewer samples than the other classes, the neural network may respond wrongly for the minority classes because the overwhelming number of samples in the majority classes dominates the weight-adjustment procedure during training.

In [19], a cost-sensitive neural network was proposed, in which different costs were associated with making errors in different classes. When the sum of squared errors was calculated for the multilayer perceptron (MLP) neural network, each term was multiplied by a class-dependent factor (cost). This idea has a much earlier origin in machine learning, namely the loss matrix [22], which deals with different risks (costs) associated with making errors in different classes. For example, in the classification of medical images, these class-dependent risk factors need to be selected based on practical experience. By assigning larger costs to minority classes, Berardi and Zhang [19] were able to improve the classification accuracies for minority classes. In that work, as in earlier discussions on risk matrices, cost factors were selected in an ad hoc manner. In this work, we propose a method that determines these cost factors automatically such that all classes, i.e., minority and majority classes, are roughly equally important. We demonstrate this method in the case of the RBF classifier.
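The class-dependent cost idea can be summarized in a few lines of code. The following minimal numpy sketch scales each pattern's squared-error term by the cost of its class; the function name, the 0.5 factor, and the interface are illustrative assumptions, not the exact formulation of [19].

```python
import numpy as np

def cost_sensitive_sse(y_pred, y_target, class_idx, costs):
    """Cost-weighted sum of squared errors (illustrative sketch).
    y_pred, y_target: (N, M) network outputs and targets;
    class_idx: (N,) integer class label of each pattern;
    costs: (C,) class-dependent cost factors (larger for minority classes)."""
    per_pattern = np.sum((y_pred - y_target) ** 2, axis=1)  # squared error per pattern
    return 0.5 * np.sum(costs[class_idx] * per_pattern)     # each term scaled by its class cost
```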

In [201], two methods were presented for handling unbalanced data sets. In the first method, the samples of minority classes were duplicated to increase their effects on training neural networks. In the second method, the so-called snowball method, proposed in [330] for multi-font character recognition, was used to improve the accuracy of the minority class [201]. In the snowball method, neural networks were first trained with the samples in the minority class, which favors the minority populations. Next, samples of majority classes were added gradually while training the neural network dynamically. The two methods were used in the MLP neural network, the ART (adaptive resonance theory) neural network, and the RBF neural network. However, it was found that the two methods mentioned above had no effect on the MLP and RBF neural networks.

We explore a training method of RBF neural networks for handling unbalanced data [103]. Constructing an RBF neural network with a compact architecture that is also robust to unbalanced data is a challenging task.

In this section, we present an algorithm, first proposed in [103], which deals with unbalanced data by increasing the contribution of minority samples to the MSE (mean squared error) function. The MSE function is a commonly used objective function in neural network training. When learning a data set with a minority class, the weights of a neural network will be dominated by the majority classes. For example, when the back-propagation method is used to train the weights, the weights are updated according to the variation in the error function of the neural network [201]. The weights obtained at the end of training therefore reflect the nature of the majority classes but not much of the minority class. This motivates us to increase the weighting of the minority classes to balance the influence of the unbalanced classes, i.e., the errors contributed by different classes are weighted by parameters inversely proportional to the numbers of samples in the classes.

4.5.2 The Standard RBF Neural Network Training Algorithm for Unbalanced Data Sets

For a neural network classifier, its training algorithm is developed based on its classification performance. The MSE function (Eq. (4.9)) is usually used as the objective function in neural networks.

Here the MSE function of an RBF neural network is written as follows:

E_0(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{M}\left(\sum_{j=1}^{K} w_{mj}\,\phi_{nj} - t_{nm}\right)^2,    (4.16)

where W is the weight vector, N is the number of patterns, M is the number of outputs, and K is the number of hidden units. w_{mj} is the weight connecting the jth hidden unit with the mth output unit. \phi_{nj} represents the output of the jth kernel function for the nth input pattern. t_{nm} represents the target output of the mth output unit when the nth pattern is presented.

Assume that there are M classes in a data set and M output neurons in the network. The mth output of an RBF neural network corresponding to the nth input vector is:

y_m(X_n) = \sum_{j=1}^{K} w_{mj}\,\phi_j(X_n) + w_{m0}\,b_m.    (4.17)

Here, X_n is the nth input pattern vector, m = 1, 2, ..., M, and K is the number of hidden units. w_{mj} is the weight connecting the jth hidden unit to the mth output node. b_m is the bias. w_{m0} is the weight connecting the bias and the mth output node. \phi_j(X_n) is the activation function of the jth hidden unit for the nth input vector:

\phi_j(X_n) = \exp\left(-\frac{\|X_n - C_j\|^2}{2\sigma_j^2}\right),    (4.18)

where C_j and \sigma_j are the center and the width of the jth hidden unit, respectively, which are adjusted during learning. The Euclidean distance measure is employed when calculating the distance between input patterns and the centers of hidden units.
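As a concrete illustration of Eqs. (4.17) and (4.18), the following numpy sketch computes the Gaussian activations and the network outputs for a batch of patterns; the function name, array shapes, and the bias value b are assumptions made for the example, not part of the original algorithm.

```python
import numpy as np

def rbf_forward(X, centers, widths, W, w0, b=1.0):
    """RBF network outputs following Eqs. (4.17)-(4.18) (illustrative sketch).
    X: (N, d) input patterns; centers: (K, d) kernel centers C_j;
    widths: (K,) kernel widths sigma_j; W: (M, K) hidden-to-output weights;
    w0: (M,) bias weights; b: bias value."""
    # Squared Euclidean distances ||X_n - C_j||^2, shape (N, K)
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    phi = np.exp(-d2 / (2.0 * widths ** 2))   # Gaussian activations, Eq. (4.18)
    y = phi @ W.T + w0 * b                    # outputs y_m(X_n), Eq. (4.17)
    return y, phi
```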

Substituting Eq. (4.17) into Eq. (4.16) gives:

E_0(W) = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{M}\left(\sum_{j=1}^{K} w_{mj}\,\phi_j(X_n) + w_{m0}\,b_m - t_{nm}\right)^2.    (4.19)

Differentiate E_0 with respect to w_{mj} and let

\frac{\partial E_0(W)}{\partial w_{mj}} = 0,    (4.20)

then, as shown in an earlier section, Eq. (4.11) is obtained. Equation (4.11) can be written in matrix notation (Eq. (4.12)), which leads to the pseudo-inverse solution for the weights.
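The pseudo-inverse step can be carried out with a standard least-squares routine. The sketch below assumes the activations and targets are already assembled as matrices, and the bias handling shown is one of several possible conventions rather than the exact formulation of Eq. (4.12).

```python
import numpy as np

def solve_output_weights(phi, T):
    """Least-squares solution of the hidden-to-output weights (cf. Eqs. (4.11)-(4.12)).
    phi: (N, K) hidden-unit activations; T: (N, M) target outputs.
    Returns W of shape (K + 1, M); the last row holds the bias weights."""
    phi_b = np.hstack([phi, np.ones((phi.shape[0], 1))])  # append a constant bias column
    W, *_ = np.linalg.lstsq(phi_b, T, rcond=None)         # minimizes ||phi_b @ W - T||^2
    return W
```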

4.5.3 Training RBF Neural Networks on Unbalanced Data Sets

The above equations pay no particular attention to unbalanced cases, in which the sample sizes of the different classes in a data set differ greatly. Unbalanced training data may lead to an unbalanced architecture during training. In our work, we place larger weights on minority classes so that minority members receive more attention during training.

Assume that the number of samples in class i is N_i. The total number of samples in the data set is N = N_1 + \cdots + N_i + \cdots + N_M. The error function shown in Eq. (4.19) can be written as:

E_0(W) = \frac{1}{2}\sum_{i=1}^{M}\sum_{X_n \in A_i}\sum_{m=1}^{M}\bigl(y_m(X_n) - t_{nm}\bigr)^2,    (4.21)

where A_i denotes class i. During the training of neural networks with unbalanced training data, a general error function such as Eq. (4.16) or Eq. (4.21) cannot lead to a balanced classification performance on all classes in the data set, because the majority classes contribute more than the minority classes and therefore cause more weight adjustments in their favor. In supervised training algorithms, neural networks are constructed by minimizing a network error function whose variables are the weights connecting the layers. Thus, the training procedure is biased towards frequently occurring classes.

In order to increase the contribution of the minority classes to the weight adjustments, we change Eq. (4.21) to:

E(W) = \frac{1}{2}\sum_{i=1}^{M}\beta_i\sum_{X_n \in A_i}\sum_{m=1}^{M}\bigl(y_m(X_n) - t_{nm}\bigr)^2,    (4.22)

where \beta_i is a class-dependent weighting parameter, chosen inversely proportional to the number of samples N_i in class i. Differentiate E with respect to w_{mj}, and let


\frac{\partial E(W)}{\partial w_{mj}} = 0.    (4.24)

Substituting Eq. (4.22) into Eq. (4.24), we obtain:

\sum_{i=1}^{M}\beta_i\sum_{X_n \in A_i}\bigl(y_m(X_n) - t_{nm}\bigr)\,\phi_j(X_n) = 0, \quad j = 1, 2, \ldots, K.    (4.25)

We introduce a new parameter r_n replacing \beta_i:

r_n = \beta_i \quad \text{when } X_n \in A_i.    (4.26)

Substituting Eq. (4.26) into Eq. (4.25) gives:

\sum_{n=1}^{N} r_n\bigl(y_m(X_n) - t_{nm}\bigr)\,\phi_j(X_n) = 0, \quad j = 1, 2, \ldots, K.    (4.27)

Similarly, as stated in [22], we obtain the following new pseudo-inverse equation for calculating the weights W:

(\Phi^{\mathrm{T}}\Phi)\,W^{\mathrm{T}} = \Phi^{\mathrm{T}}T.    (4.29)

Different from the pseudo-inverse equation shown in Eq. (4.12), here the elements of \Phi are \phi_{nj}\sqrt{r_n} and the elements of T are t_{nm}\sqrt{r_n}.

As indicated in the above equations, we have taken the unbalanced data into consideration when training RBF neural networks. The parameters r_n introduce biased weights that are inversely proportional to the class proportions in the data set. The effect of the weight parameters r_n is shown in Sect. 4.5.4. Compared with the training method that does not consider the unbalanced condition in the data, the classification accuracy of the minority classes is improved.
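Putting Eqs. (4.26)-(4.29) together, the weighted pseudo-inverse amounts to scaling each row of the activation matrix and of the target matrix by \sqrt{r_n} before the usual least-squares solve. The sketch below assumes \beta_i = 1/N_i, i.e., weights simply inversely proportional to the class sizes; the exact normalization used in [103] may differ.

```python
import numpy as np

def solve_weights_unbalanced(phi, T, class_idx, class_counts):
    """Weighted least-squares solve for unbalanced data (cf. Eqs. (4.22)-(4.29)).
    phi: (N, K) hidden-unit activations; T: (N, M) targets;
    class_idx: (N,) class label of each pattern; class_counts: (C,) sample counts N_i."""
    beta = 1.0 / np.asarray(class_counts, dtype=float)  # beta_i inversely proportional to N_i (assumed)
    r = beta[class_idx]                                 # r_n = beta_i when X_n belongs to class i, Eq. (4.26)
    s = np.sqrt(r)[:, None]
    phi_w, T_w = phi * s, T * s                         # phi_nj -> phi_nj*sqrt(r_n), t_nm -> t_nm*sqrt(r_n)
    W, *_ = np.linalg.lstsq(phi_w, T_w, rcond=None)     # solves (Phi^T Phi) W = Phi^T T, Eq. (4.29)
    return W
```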

We also allow large overlaps between clusters of the same class to reduce the number of hidden units [102][104].

The modified training algorithm for RBF neural networks, in which small overlaps between clusters of different classes and large overlaps between clusters of the same class are allowed, is used in this section.

4.5.4 Experimental Results

The car evaluation data set in Chap. 3 is used here to demonstrate our algorithm. The data set is divided into three parts, i.e., training, validation, and test sets. Each experiment is repeated five times with different initial conditions and the average results are recorded.

We generate an unbalanced car data set based on function 5 shown in Chap. 3. There are nine attributes and two classes: Class A and Class B.

Samples that do not meet the conditions of Class A belong to Class B in the car data set. There are 4000 patterns in the training data set and 2000 patterns in the testing data set. There are 507 patterns of class 1 (Class A) in the training data set, and 205 patterns of class 1 in the testing data set. The 2000 test patterns are further divided into two subsets, a validation set and a testing set, with 1000 patterns each. Class A is the minority class and Class B is the majority class.

Classification error rates and the numbers of hidden units are compared between small overlaps and large overlaps among clusters of the same class. When large overlaps among clusters of the same class are allowed, the number of hidden units is reduced from 328 to 303, and the classification error rate on the test data set increases slightly from 4.1% to 4.5%.

Table 4.6 compares the overall classification error rates with and without considering the unbalanced condition; here large overlaps are allowed between clusters with the same class label. Table 4.6 also shows that, when the unbalanced condition in the data set is considered, the classification error rate of the minority class decreases from 34.65% to 8.73%, while the error rate of the majority class increases slightly from 1.37% to 4.1%. Since, in most cases, the minority class carries important information, improving the individual accuracy of the minority class is critical.

In this section, a modification [103] to the algorithm for the construction and training of the RBF network on unbalanced data was described, which increases the bias towards the minority classes. Each class is given a weight in the MSE function that is inversely proportional to its number of patterns. Experimental results show that the proposed method is effective in improving the classification accuracy of minority classes while maintaining the overall classification performance.

4.6 Summary

In this chapter, we described a modified training algorithm for RBF neural networks, which we proposed earlier [107]. This modified algorithm leads to fewer hidden units while maintaining the classification accuracy of RBF classifiers. Training is carried out without knowing in advance the number of hidden units and without making any assumptions on the data.

We described two useful modifications to Roy et al.'s algorithm for the construction and training of an RBF network, by allowing for large overlaps among clusters of the same class and dynamically determining the cluster overlaps of different classes.

Table 4.6. Comparison of classification error rates of the RBF neural network for each class of the car data set, with and without considering the unbalanced condition, when large overlaps are allowed between clusters with the same class label (average results of five independent runs). (©2005 IEEE) We thank the IEEE for allowing the reproduction of this table, which first appeared in [103].

Without considering unbalanced condition
                        Training set    Validation set    Testing set
Overall error rates     1.89%           5.0%              4.8%
Class A                 11.69%          27.69%            34.65%
Class B                 0.77%           2.41%             1.37%

Considering unbalanced condition
                        Training set    Validation set    Testing set
Overall error rates     1.2%            5.1%              4.5%
Class A                 4.27%           4.58%             8.73%
Class B                 0.85%           5.15%             4.1%

In RBF neural network classifiers, larger overlaps between different classes lead to higher classification errors. However, large overlaps between clusters with the same class label do not degrade classification performance, because the overlaps occur within a single class: the number of hidden units is reduced, and the classification error rate is reduced or maintained by this modification.

The ratio between the number of patterns of a certain class (in-class patterns) and the total number of patterns in a cluster represents the overlap between different classes. A dynamic parameter θ is applied to control this ratio according to the training condition. If the number of trials in the search for a qualified cluster reaches a certain threshold, θ is decreased to relax the search, as sketched below.
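The following sketch illustrates the dynamic adjustment of θ only. The cluster-proposal step, the fixed radius, the decay factor, and the trial threshold are hypothetical simplifications for illustration, not the exact procedure of the proposed algorithm.

```python
import numpy as np

def find_qualified_cluster(X, y, target_class, radius,
                           theta=0.9, theta_decay=0.95, max_trials=20, rng=None):
    """Search for a cluster whose in-class ratio reaches theta (illustrative sketch).
    If max_trials proposals fail, theta is decreased and the search continues."""
    rng = rng or np.random.default_rng(0)
    candidates = np.where(y == target_class)[0]
    trials = 0
    while True:
        center = X[rng.choice(candidates)]                    # propose a cluster center
        members = np.linalg.norm(X - center, axis=1) <= radius
        in_class_ratio = np.mean(y[members] == target_class)  # in-class patterns / cluster size
        if in_class_ratio >= theta:
            return center, members, theta                     # qualified cluster found
        trials += 1
        if trials >= max_trials:                              # too many failed trials:
            theta *= theta_decay                              # relax the requirement
            trials = 0
```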

The two modifications may help reduce detrimental effects from noisy patterns and isolated patterns while maintaining classification performance.

There are two training stages in the training algorithm. By searching for clusters based on the proposed modifications, the widths and centers of the Gaussian kernel functions are determined in the first training stage. The weights connecting the hidden layer and the output layer are determined in the second training stage by the LLS method. Experimental results show that the modifications are effective in reducing the number of hidden units while maintaining or even increasing the classification accuracy.

This new approach can be used for classification when the underlying distributions of the data are unknown. Its accuracy is comparable with that of Roy et al.'s method [264], but its computational time is greater. Based on the experimental results, there is room for further research to speed up the training algorithm. In future work, the present approach could be enhanced by analyzing the relationships among the clusters to improve classification accuracy and reduce computational time.

In addition, a new algorithm is presented for the construction and training of an RBF neural network with unbalanced data. In applications, minority classes with much fewer samples are often present in data sets. The learning process of a neural network is usually biased towards classes with majority populations. Our study focused on improving the classification accuracy of minority classes while maintaining the overall classification performance.

5 Attribute Importance Ranking for Data Dimensionality Reduction

Large-scale data can only be handled with the aid of computers. However, processing commands may need to be entered manually by data analysts, and data mining results can be fully used by decision makers only when the results can be understood explicitly. The removal of irrelevant or redundant attributes helps us make decisions and analyze data efficiently. Data miners are expected to present discovered knowledge in an easily understandable way. Data dimensionality reduction (DDR) is an essential part of the data mining process. Drawing on methods from pattern recognition and statistics, DDR is developed to fulfill objectives such as improving the accuracy of prediction models, scaling data mining models, reducing computational cost, and providing a better understanding of the extracted knowledge.

5.1 Introduction

DDR plays an important role in data mining tasks, since semi-automated or automated methods perform better on lower-dimensional data, obtained by removing irrelevant or redundant attributes, than on higher-dimensional data. Irrelevant or redundant attributes often interfere with useful information. In the classification task, the main aim of DDR is to reduce the number of attributes used in classification while maintaining an acceptable classification accuracy.

The problem of DDR is to select a subset of attributes which represents the concept of the data without losing important information. Feature (attribute) extraction and feature selection are two techniques of DDR. LDA (linear discriminant analysis) [168][198] and PCA (principal component analysis) [166] are common feature extraction methods. However, through the transformation operation in feature extraction, new features are generated which are linear or non-linear combinations of the original features. Unwanted artifacts often come with the new features. In addition, a non-linear transformation is usually not reversible, which brings difficulties in understanding the data through the extracted features.
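To make the interpretability issue concrete, the short numpy sketch below performs a PCA-style transformation: every extracted feature is a linear combination of all original attributes, so the meaning of individual attributes is lost. The function name and interface are illustrative assumptions.

```python
import numpy as np

def pca_extract(X, n_components):
    """PCA feature extraction (illustrative sketch).
    X: (N, d) data matrix. Returns the (N, n_components) transformed data and
    the (n_components, d) weight matrix of the linear combinations."""
    Xc = X - X.mean(axis=0)                          # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                   # each row mixes all original attributes
    return Xc @ components.T, components
```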

Feature selection does not generate unwanted artifacts, i.e., feature
