

In the document Advanced Information and Knowledge Processing (Page 177-182)

7.4 Rule Extraction Combining GAs and the RBF Neural Network 165

7.4.6 More Compact Rules

We have so far implicitly assumed that all attributes contribute to each rule. However, some attributes contribute little to the description of the data, or even do not contribute to the description at all, i.e., those attributes are irrelevant or redundant for some rules and hence should be removed.

Based on our methods in the preceding sections, if there are m hidden neurons in the trained RBF neural network, m rules will be adjusted by a GA.

Originally, the premise part of each rule is composed of n conditions (n is the number of attributes). For example, for rule i, condition j has the form: 'IF attribute j is within the interval (Lji, Uji)'. If an attribute does not affect a certain rule, its variation in the rule will not affect the final rule decision. It is desirable to remove this attribute from the rule. Assume that the minimum value and the maximum value in the data for attribute j are minj and maxj, respectively. If the following conditions are satisfied:

Lji ≤ minj, (7.9)

and

Uji ≥ maxj, (7.10)

condition j will be removed from the original rule i. Thus, rule i will be replaced by a new rule with fewer conditions.
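The pruning criterion of Eqs. (7.9) and (7.10) can be sketched in a few lines (a minimal illustration; the dictionary-based rule representation is our own assumption, not the book's code):

```python
def prune_rule(intervals, data_min, data_max):
    """Drop conditions whose interval covers the attribute's full data range.

    intervals: dict mapping attribute index j -> (L_ji, U_ji) for rule i.
    data_min, data_max: per-attribute minima and maxima of the data set.
    A condition is redundant when L_ji <= min_j and U_ji >= max_j
    (Eqs. (7.9) and (7.10)): varying that attribute cannot change the
    rule's decision, so the condition is removed.
    """
    return {
        j: (lo, hi)
        for j, (lo, hi) in intervals.items()
        if not (lo <= data_min[j] and hi >= data_max[j])
    }

# Example: the condition on attribute 0 spans the whole data range
# for that attribute, so it is pruned; the condition on attribute 2 stays.
rule = {0: (0.0, 8.0), 2: (2.08, 5.26)}
pruned = prune_rule(rule,
                    data_min=[0.0, 2.0, 1.0, 0.1],
                    data_max=[7.9, 4.4, 6.9, 2.5])
```

The resulting rule has fewer conditions but makes exactly the same decisions on the data.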

7.4.7 Experimental Results

We use the Iris and Thyroid data sets [223] to demonstrate our method.

Each data set is divided into three parts, i.e., training, validation, and test sets. 90 patterns of the Iris data set are used for training, 30 patterns for validation, and 30 patterns for testing. There are 215 patterns in the Thyroid data set: 115 patterns are used for training, 50 patterns for validation, and 50 patterns for testing.

We set the ratio between the number of in-class patterns and the total number of patterns in a cluster as θ = 100% (the initial θ-criterion) in our experiments.
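As a minimal sketch (the ratio computation is our own illustration), the θ-criterion amounts to checking the fraction of majority-class patterns in each cluster; with θ = 100%, only pure clusters qualify:

```python
from collections import Counter

def in_class_ratio(labels):
    """Ratio between the number of in-class (majority-class) patterns
    and the total number of patterns in a cluster."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

theta = 1.0  # the initial theta-criterion: 100%
pure = in_class_ratio([1, 1, 1]) >= theta    # pure cluster passes
mixed = in_class_ratio([1, 1, 2]) >= theta   # mixed cluster fails
```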

Iris Data Set

Different population sizes may affect the rule extraction based on GAs. The results shown in Table 7.1 are the average values of five independent experiments with randomly selected initial cluster centers. The smallest number of hidden neurons required to construct an RBF neural network classifier is three for the Iris data set.

Next we use GAs to search for optimal rules based on the compact RBF neural network obtained above. The results in Table 7.1 show that the number of generations needed for reaching the top rule accuracy is reduced with the increase of population size.

Table 7.1. The results under different population sizes for the Iris data set (average of five runs).

Population size                             40      80      160     200     240
Rule error rate                             6.67%   6.67%   3.3%    2.67%   2.67%
Generations needed to reach the accuracy    33.5    31      13.2    16      9

Thyroid Data Set

In Chap. 4, it was shown that when large overlaps among clusters of the same class are permitted, both the number of hidden neurons and the classification error rate are reduced. For the Thyroid data set, at least six hidden neurons are needed.

Referring to the weight matrix of the Thyroid data set shown in Chap. 6, the simplified weight matrix is:

W1=

172 7 Rule Extraction from RBF Neural Networks

Next we use a GA to search for optimal rules based on the compact RBF neural network obtained above. The results with different population sizes are compared in Table 7.2, which shows that, as the population size increases, the number of generations needed to approach the optimal results decreases and the rule accuracy increases, eventually saturating. Chromosomes in the initial population are initialized to be around the centers of the Gaussian kernel functions. The number of bits for each G(p)ji in Eq. (7.3) is k = 6. The crossover probability is 90% and the mutation rate is dynamic.

The initial mutation rate is 10%. If the number of identical members in a population exceeds 25%, the mutation rate is increased by 1%. The number of elite chromosomes, which remain unchanged and live from one generation to the next, is two. The GA searching procedure is stopped if the accuracy of the extracted rules does not change by more than 0.5% for the validation set for five consecutive generations.

Table 7.2. The results under different population sizes for the Thyroid data set (average of five runs).

Population size                             40      80      160    200    240     280
Rule error rate                             25.2%   24.3%   22%    21%    20%     20%
Generations needed to reach the accuracy    36.5    28      23     38     38.2    26

After the searching procedure of the GA based on the smallest trained RBF neural network, the redundant premises in each rule are checked according to Eqs. (7.9) and (7.10). We obtain three symbolic rules for the Iris data set. The average number of premises in each rule is two. The accuracy of the symbolic rules that we obtain through the proposed method is 97.33% for the Iris data set. There are six rules for the Thyroid data set. The average number of premises in each rule is 4.2, and the accuracy of the extracted rules is 80%. In addition, there is a default rule for each data set. The dominant class will be the class label for the default rule if the data set is biased. If the classes in the data set have the same number of patterns, then the default rule corresponds to the last class. The smallest population size needed to obtain the best accuracy is 200 for the Iris data set and 240 for the Thyroid data set (Tables 7.1 and 7.2).
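The default-rule selection just described can be sketched as follows (function and argument names are ours):

```python
from collections import Counter

def default_rule_class(labels, class_order):
    """Pick the default rule's class label: the dominant class if the
    data set is biased (class frequencies differ), otherwise the last
    class in the data set's class ordering."""
    counts = Counter(labels)
    if len(set(counts.values())) == 1:       # all classes equally frequent
        return class_order[-1]
    return counts.most_common(1)[0][0]       # dominant class

# Biased data set -> dominant class; balanced data set -> last class.
biased = default_rule_class(['hypo', 'normal', 'normal'], ['hyper', 'hypo', 'normal'])
balanced = default_rule_class(['Setosa', 'Virginica'], ['Setosa', 'Versicolor', 'Virginica'])
```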

The linguistic rules for describing the two data sets obtained are as follows. For the Iris data set, three rules are obtained using our method; the accuracy is 97.33% for the test data set. These rules are not the same as rules extracted from the MLP [28][145], but have the same accuracy.

Rule 1:

IF the petal length is within the interval (0.00, 2.08) AND the petal width is within the interval (0.00, 1.11)

THEN the class label is Setosa.

Rule 2:

IF the petal length is within the interval (2.08, 5.26) AND the petal width is within the interval (0.00, 1.71) THEN the class label is Versicolor.

Rule 3:

IF the petal length is within the interval (0.77, 6.9) AND the petal width is within the interval (1.79, 2.5) THEN the class label is Virginica.

Default rule:

the class label is Virginica.
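As a usage illustration (ours, not the authors'), the three rules and the default rule translate directly into a classifier. Whether the interval endpoints are open or closed is not stated, so open intervals are assumed here:

```python
def classify_iris(petal_length, petal_width):
    """Apply the extracted Iris rules in order; the default rule fires
    when no explicit rule matches (attribute values in cm)."""
    if 0.00 < petal_length < 2.08 and 0.00 < petal_width < 1.11:
        return "Setosa"          # Rule 1
    if 2.08 < petal_length < 5.26 and 0.00 < petal_width < 1.71:
        return "Versicolor"      # Rule 2
    if 0.77 < petal_length < 6.9 and 1.79 < petal_width < 2.5:
        return "Virginica"       # Rule 3
    return "Virginica"           # default rule

print(classify_iris(1.4, 0.2))   # a typical Setosa pattern
print(classify_iris(4.5, 1.4))   # a typical Versicolor pattern
print(classify_iris(6.0, 2.2))   # a typical Virginica pattern
```

Note that only the petal attributes appear: the sepal conditions were pruned away by Eqs. (7.9) and (7.10).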

For the Thyroid data set, six rules are obtained, and the accuracy is 80% for the test data set.

Rule 1:

IF attribute 1 is within the interval (116.64, 118.94) AND attribute 3 is within the interval (0.00, 7.63) AND attribute 4 is within the interval (0.00, 56.40) AND attribute 5 is within the interval (0.00, 35.36) THEN the class label is normal.

Rule 2:

IF attribute 1 is within the interval (33.22, 144.00) AND attribute 2 is within the interval (13.54, 25.30) AND attribute 3 is within the interval (0.00, 9.03) AND attribute 4 is within the interval (0.00, 30.70) THEN the class label is hyper.

Rule 3:

IF attribute 1 is within the interval (63.55, 140.36) AND attribute 2 is within the interval (0.00, 13.35) AND attribute 3 is within the interval (0.00, 7.20) AND attribute 4 is within the interval (8.42, 8.93) AND attribute 5 is within the interval (0.00, 53.84) THEN the class label is normal.

Rule 4:

IF attribute 1 is within the interval (68.40, 144.00) AND attribute 2 is within the interval (0.00, 19.06) AND attribute 4 is within the interval (0.15, 15.94) THEN the class label is hypo.

Rule 5:

IF attribute 1 is within the interval (11.19, 144.00) AND attribute 2 is within the interval (24.11, 24.11) AND attribute 4 is within the interval (0.00, 42.15) AND attribute 5 is within the interval (0.00, 11.37) THEN the class label is hyper.

Rule 6:

IF attribute 1 is within the interval (51.26, 108.86)


AND attribute 2 is within the interval (14.57, 25.30) AND attribute 3 is within the interval (7.33, 10.00) AND attribute 4 is within the interval (0.00, 34.74) AND attribute 5 is within the interval (0.00, 18.65) THEN the class label is hyper.

Default rule:

the class label is normal.

Halgamuge et al. [128] extracted rules based on RBF neural networks; however, five or six rules were needed to represent the concept of the Iris data set (the accuracy is not available). Huber and Berthold [150] used eight rules to represent the Iris data set (the accuracy is not available). In order to obtain a small rule base, unimportant rules were pruned according to ranking [150]; however, the accuracy of the rules was reduced at the same time. McGarry et al. [212][213][214] extracted rules from RBF neural networks directly based on the parameters of Gaussian kernel functions and weights. In [212], the accuracy reached 100%, but the number of rules was large (for the Iris data set, 53 rules are needed). In [213] and [214], the number of rules for the Iris data set was small, i.e., three, but the accuracy of the extracted rules was only 40% and around 80%, respectively. The rule set for the Iris data set extracted by McGarry et al. [213] is included in Appendix A for comparison. The results of the extracted rules for the Thyroid data set using other methods are not available.

In order to evaluate the complexity of the extracted rule set, a complexity measure [115] was calculated for comparison. The complexity measure C was defined as:

C = 0.6 NR + 0.4 NP, (7.11)

where NR is the number of rules and NP is the number of premises (Table 7.3).
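As a quick check of Eq. (7.11) with the figures reported above (three Iris rules averaging two premises each, six Thyroid rules averaging 4.2 premises each):

```python
def complexity(num_rules, num_premises):
    """Complexity measure C = 0.6*NR + 0.4*NP of Eq. (7.11)."""
    return 0.6 * num_rules + 0.4 * num_premises

iris_c = complexity(3, 3 * 2)        # Iris: 0.6*3 + 0.4*6 = 4.2
thyroid_c = complexity(6, 6 * 4.2)   # Thyroid: 0.6*6 + 0.4*25.2 = 13.68
```

Lower C means a smaller, simpler rule set; the measure trades off rule count against total premise count.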

Much work has been carried out on extracting rules using MLPs [28][145]. However, prior knowledge about how to divide the range of each attribute into several parts is needed. In most cases, it is difficult to obtain this knowledge.

In addition, in [28], no explicit rule corresponds to the Virginica class, i.e., the patterns belonging to the Virginica class cannot be selected from the data set independently, in contrast to other methods in Table 7.3. The rules extracted by Bologna et al. [28] are shown in Appendix A. The rules extracted by Hruschka and Ebecken [145] are also shown in Appendix A. In the presented algorithm, rules with hyper-rectangular decision boundaries are obtained directly without the need for transforming continuous attributes into discrete ones.

7.4.8 Summary

We have described a novel rule-extraction algorithm that we proposed earlier by combining GAs and RBF networks. Rule extraction is carried out from

7.5 Rule Extraction by Gradient Descent 175
