Rule Extraction After Data Dimensionality Reduction

Data dimensionality reduction (DDR) is usually carried out before patterns are input to classifiers. To obtain good results in data mining, careful selection of relevant data is desirable, because irrelevant or redundant attributes interfere with knowledge discovery from data sets. In this section, we carry out rule extraction after the data dimensionality reduction described in Chap. 5.

According to the attribute ranking results, the attribute subsets which lead to the best classification results are selected and used as inputs to a classifier, such as an RBF neural network used in this book. The complexity of the classifier can thus be reduced and its classification performance improved. The results are input to our rule extraction system to discover knowledge from data sets. Rules with hyper-rectangular decision boundaries are extracted based on the trained RBF neural networks and DDR using the gradient descent method.

7.6.1 Experimental Results

Iris Data Set

The 150 patterns of the Iris data set are divided into three sets: 90 patterns for training, 30 for validation, and 30 for testing. Based on the attribute subset {3, 4} selected in Chap. 5, two rules are obtained with two antecedents per rule. The rule accuracy is 100% on the test set (Table 7.8). Table 7.9 compares our rule extraction results for the Iris data set with those of other methods.

Table 7.8. Rule accuracy and numbers of rules for the Iris data set based on DDR.

Rule accuracy   Training accuracy      100%
                Validation accuracy    96.67%
                Testing accuracy       100%
The number of premises/rule            2
The number of rules                    2
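The partition and attribute selection described for the Iris data set can be sketched as follows. This is only an illustration: the text does not say how patterns were assigned to the three sets, so the random shuffle, the seed, and the helper name `split_patterns` are assumptions, and a placeholder array stands in for the real Iris values.

```python
import numpy as np

def split_patterns(X, y, sizes, seed=0):
    """Shuffle the patterns and cut them into consecutive parts of the given sizes."""
    if sum(sizes) != len(X):
        raise ValueError("sizes must add up to the number of patterns")
    order = np.random.default_rng(seed).permutation(len(X))
    bounds = np.cumsum(sizes)[:-1]          # cut points between the parts
    return [(X[idx], y[idx]) for idx in np.split(order, bounds)]

# Placeholder with the shape of the Iris data: 150 patterns, 4 attributes, 3 classes.
X = np.zeros((150, 4))
y = np.repeat([0, 1, 2], 50)

# Attribute subset {3, 4} from the text, converted from 1-based to 0-based columns.
subset = [3, 4]
X_reduced = X[:, [a - 1 for a in subset]]

# 90 patterns for training, 30 for validation, 30 for testing.
train, val, test = split_patterns(X_reduced, y, sizes=(90, 30, 30))
```

Only the two selected attributes reach the classifier, which is what reduces its complexity.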

Monk3 Data Set

There are six attributes in the Monk3 data set. The Monk3 data set has a training set with 122 patterns and a test set with 421 patterns. We divide the test set into 200 patterns for validation and 221 patterns for testing.

For the Monk3 data set, based on the attribute subset {2, 4, 5} selected in Chap. 5, we obtain three rules with three antecedents per rule (Table 7.10).

The rule accuracy is 98% on the test set. Setiono [286] extracted two rules with 5.83 antecedents per rule and obtained 100% rule accuracy for the Monk3 data set based on the pruned MLP.

182 7 Rule Extraction from RBF Neural Networks

Table 7.9. A comparison of results for the Iris data set obtained with different methods.

Methodology                                                                  Rule accuracy   Type of decision boundary
Modified RX algorithm based on MLP [145]                                     97.33%          Hyper-plane
Inputs are transformed into discrete ones artificially, based on IMLP [28]   97.33%          Hyper-rectangular
Based on RBF [214]                                                           80%             Hyper-rectangular
Based on RBF [212]                                                           100%            Hyper-rectangular
Our algorithm                                                                100%            Hyper-rectangular

Table 7.10. Rule accuracy and numbers of rules for the Monk3 data set based on DDR.

Rule accuracy   Training accuracy      99.4%
                Validation accuracy    96.6%
                Testing accuracy       98%
The number of premises/rule            3
The number of rules                    3

Thyroid Data Set

The Thyroid data set has five attributes and 215 patterns: 115 patterns for training, 50 for validation, and 50 for testing.

In Chap. 5, χ = 0.4 is selected because it leads to the smallest attribute subset, {2, 3, 5}, with the lowest classification error rates.

Table 7.11 shows the properties of the rules for the Thyroid data set. Fewer rules are obtained than with GAs (shown in Sect. 7.4), and the rule accuracy is also higher.

Table 7.11. Rule accuracy and numbers of rules for the Thyroid data set based on DDR.

Rule accuracy   Training accuracy      95.65%
                Validation accuracy    95%
                Testing accuracy       95%
The number of premises/rule            3
The number of rules                    4

Rules for the Thyroid data set obtained by the present approach are:

Rule 1:

IF attribute 2 is within the interval [11, 25.3]

AND attribute 3 is within the interval [2.9, 10]

AND attribute 5 is within the interval [0, 10.9]

THEN the class label is hyper-thyroid.

Rule 2:

IF attribute 2 is within the interval [15, 23.5]

AND attribute 3 is within the interval [0.77, 2.9]

AND attribute 5 is within the interval [0, 10.2]

THEN the class label is hyper-thyroid.

Rule 3:

IF attribute 2 is within the interval [0.0, 5.12]

AND attribute 3 is within the interval [0.0, 2.12]

AND attribute 5 is within the interval [0, 19.6]

THEN the class label is hypo-thyroid.

Rule 4:

IF attribute 2 is within the interval [0.0, 8.5]

AND attribute 3 is within the interval [0.0, 3.0]

AND attribute 5 is within the interval [14.2, 56.3]

THEN the class label is hypo-thyroid.

Default rule:

the class label is normal.
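As an illustration of how such hyper-rectangular rules are applied, the sketch below encodes the four Thyroid rules and the default rule. The intervals are copied from the rules above; the closed interval bounds, the first-match evaluation order, and the names `THYROID_RULES` and `classify` are assumptions made for the example, and the three input patterns are made up rather than taken from the data set.

```python
# Each rule: ({1-based attribute number: (low, high)}, class label).
THYROID_RULES = [
    ({2: (11.0, 25.3), 3: (2.9, 10.0), 5: (0.0, 10.9)}, "hyper-thyroid"),
    ({2: (15.0, 23.5), 3: (0.77, 2.9), 5: (0.0, 10.2)}, "hyper-thyroid"),
    ({2: (0.0, 5.12), 3: (0.0, 2.12), 5: (0.0, 19.6)}, "hypo-thyroid"),
    ({2: (0.0, 8.5), 3: (0.0, 3.0), 5: (14.2, 56.3)}, "hypo-thyroid"),
]
DEFAULT_LABEL = "normal"

def classify(pattern, rules=THYROID_RULES, default=DEFAULT_LABEL):
    """Return the label of the first rule whose intervals all contain the pattern.

    `pattern` maps attribute numbers to values. Intervals are treated as
    closed, and the default label is used when no rule fires.
    """
    for intervals, label in rules:
        if all(low <= pattern[a] <= high for a, (low, high) in intervals.items()):
            return label
    return default

# Made-up patterns, for illustration only.
print(classify({2: 20.0, 3: 5.0, 5: 1.0}))   # hyper-thyroid (Rule 1 fires)
print(classify({2: 3.0, 3: 1.0, 5: 30.0}))   # hypo-thyroid (Rule 4 fires)
print(classify({2: 9.5, 3: 1.8, 5: 1.5}))    # normal (no rule fires)
```

Rules 1 and 2 require attribute 2 to be at least 11, while Rules 3 and 4 require it to be at most 8.5, so no pattern can satisfy rules of two different classes and the evaluation order does not change the result here.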

Breast Cancer Data Set

For the Breast cancer data set, based on the attribute subset {2, 3, 7} selected in Chap. 5, we obtain four rules with three antecedents per rule (Table 7.12).

The rule accuracy is 97.8% for the test data set. Setiono [286] extracted 2.9 rules and obtained 94.04% accuracy for the Breast cancer data set based on the pruned MLP.

Table 7.12. Rule accuracy and numbers of rules for the Breast cancer data set based on DDR.

Rule accuracy   Training accuracy      96%
                Validation accuracy    97%
                Testing accuracy       97.8%
The number of premises/rule            3
The number of rules                    4

The rules for the Breast cancer data set are given below:

Rule 1:

IF uniformity of cell shape is within [2, 10]

AND bland chromatin is within [4, 10]

THEN this case is Malignant.

Rule 2:

IF uniformity of cell shape is within [5, 10]

AND bland chromatin is within [2, 10]

THEN this case is Malignant.

Rule 3:

IF uniformity of cell size is within [3, 10]

AND uniformity of cell shape is within [3, 10]

THEN this case is Malignant.

Default rule:

this case is benign.
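The rule accuracy reported in the tables is the fraction of patterns whose rule-assigned label matches the true label. A minimal sketch of that computation, using the three Breast cancer rules above, might look as follows. The dictionary keys, the function names `predict` and `rule_accuracy`, and the four labelled cases are hypothetical: the cases are invented to exercise the code and are not drawn from the data set, so the resulting figure says nothing about the real accuracy.

```python
# Malignant rules copied from the text; keys are hypothetical attribute names.
BREAST_RULES = [
    {"cell_shape": (2, 10), "bland_chromatin": (4, 10)},
    {"cell_shape": (5, 10), "bland_chromatin": (2, 10)},
    {"cell_size": (3, 10), "cell_shape": (3, 10)},
]

def predict(case):
    """Label a case malignant if any rule fires, otherwise apply the default rule."""
    fires = any(
        all(low <= case[name] <= high for name, (low, high) in rule.items())
        for rule in BREAST_RULES
    )
    return "malignant" if fires else "benign"

def rule_accuracy(cases, labels):
    """Fraction of cases whose predicted label equals the true label."""
    hits = sum(predict(case) == label for case, label in zip(cases, labels))
    return hits / len(labels)

# Invented cases and labels, for illustration only.
cases = [
    {"cell_size": 1, "cell_shape": 1, "bland_chromatin": 3},
    {"cell_size": 8, "cell_shape": 7, "bland_chromatin": 5},
    {"cell_size": 4, "cell_shape": 3, "bland_chromatin": 1},
    {"cell_size": 1, "cell_shape": 2, "bland_chromatin": 2},
]
labels = ["benign", "malignant", "malignant", "malignant"]

print(rule_accuracy(cases, labels))  # 0.75: the last case falls to the default rule
```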

Mushroom Data Set

There are 22 nominal attributes and 8124 patterns in the Mushroom data set. Among the 8124 patterns, 4500 patterns are for training, 1812 are for validation, and 1812 are for testing.

For the Mushroom data set, based on the attribute subset {5, 9, 20} selected in Chap. 5, we obtain 16 rules with three antecedents per rule (Table 7.13). The rule accuracy is 98.86% on the test set. By the RulEx method [9], four rules, 78 premises, and a rule accuracy of 97.02% were obtained.

Setiono [286] extracted three rules with 4.3 premises per rule and obtained 98.12% accuracy for the Mushroom data set based on the pruned MLP. Better rule accuracy is obtained by our method, though the number of rules is larger.

Table 7.13. Rule accuracy and numbers of rules for the Mushroom data set based on DDR.

Rule accuracy   Training accuracy      99.33%
                Validation accuracy    100%
                Testing accuracy       98.86%
The number of premises/rule            3
The number of rules                    16

7.6.2 Summary

In this section, we extracted rules from trained RBF neural networks after data dimensionality reduction. Our SCM is used to rank importance of attributes

7.7 Rule Extraction Based on Class-dependent Features 185

From the document Advanced Information and Knowledge Processing (pages 187-192)