

In the document Advanced Information and Knowledge Processing (Pages 162-167)


6.5.3 Wine Data Set

The Wine data set was obtained from chemical analysis of wines produced in the same region of Italy but derived from three different cultivars.

There are 13 attributes and 178 patterns in the Wine data set: 106 patterns are used for training, 36 for validation, and 36 for testing. The three classes correspond to the three different cultivars.

With class-dependent features, the classification error rates are 2.83% for the training set, 0% for the validation set, and 2.78% for the test set.

Without feature masks, the classification error rates are 3.77% for the training set, 2.78% for the validation set, and 2.78% for the test set.

The feature masks (Table 6.4) show that features 1 and 8 do not play any important role in discriminating the classes. For class 1, the feature subset {2, 4, 5, 6, 7, 9, 11, 12} discriminates it from the other classes. Features 3, 4, 5, 6, 7, 10, 11, 12, and 13 are used to discriminate class 2 from the other classes.

The feature subset {3, 4, 11, 12, 13} is used to discriminate class 3 from the other classes. The average number of features used for each class is seven, compared to the original 13 features.

Table 6.4. The feature mask found by GAs for the Wine data set.

Classes   Feature masks (features 1-13)
Class 1   0 1 0 1 1 1 1 0 1 0 1 1 0
Class 2   0 0 1 1 1 1 1 0 0 1 1 1 1
Class 3   0 0 1 1 0 0 0 0 0 0 1 1 1
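Read as binary vectors, the masks in Table 6.4 make the class-dependent selection mechanical. A minimal sketch in Python (the helper names are illustrative, not from the chapter):

```python
# Table 6.4 masks, one 13-bit vector per class (1 = feature used).
masks = {
    1: [0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0],
    2: [0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1],
}

def select_features(pattern, class_label):
    """Keep only the features enabled by the class-dependent mask."""
    return [v for v, keep in zip(pattern, masks[class_label]) if keep]

# Features whose bit is 0 in every mask play no role for any class.
unused = [i + 1 for i in range(13)
          if all(m[i] == 0 for m in masks.values())]
avg_used = sum(sum(m) for m in masks.values()) / len(masks)

print(unused)              # [1, 8]
print(round(avg_used, 1))  # 7.3
```

The computation confirms the text: features 1 and 8 are disabled in every mask, and each class uses about seven of the 13 features on average.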

6.6 Summary

In this chapter, we have selected class-dependent features for each class and described a novel RBF classifier with class-dependent features which are selected by GAs based on RBF classification performance. The feature subset is selected for each class individually based on its ability to discriminate that class from the other classes, which brings out the relationship between the feature subset and the class concerned. The Glass, Thyroid, and Wine data sets are used to demonstrate the algorithm. Experimental results show that the algorithm is effective in simultaneously reducing the number of feature inputs and improving classification accuracy.

Data dimensionality reduction (DDR) is often the first step in data mining tasks. The class-dependent feature selection results obtained above provide new information for analyzing the relationship between features and classes. The reduction in dimensionality can lead to compact rules in the rule extraction task. Extracting rules based on the classification results obtained above will be shown in a later chapter.

7

Rule Extraction from RBF Neural Networks

In this chapter, we first review rule extraction techniques. Next, a type of data mining system, the rule extraction system, is discussed from the viewpoint of its components. Then, four decompositional rule extraction methods based on RBF neural networks are described. The first rule extraction method extracts rules from trained RBF neural networks through a GA: the GA is used to determine the rule premises [100]. The second extracts rules from trained RBF neural networks by a gradient descent method [101][333]. In the third rule extraction method, we extract rules based on the result of data dimensionality reduction using the gradient descent method [102]. The fourth rule extraction method utilizes the results of class-dependent feature selection to extract rules [106].

7.1 Introduction

A major problem in data mining using neural networks is that knowledge hidden in a trained neural network is not comprehensible to humans. Linguistic rule extraction [28][102][269][287][317][324] aims at solving this problem.

Rule extraction can facilitate data mining in many aspects:

• Increase comprehensibility and help human beings better understand the decisions of learning models. This advantage of rule extraction is extremely helpful in medical diagnosis.

• Refine initial domain knowledge. Irrelevant or redundant attributes tend to be absent in extracted rules. In future data collection, labor costs can be reduced by skipping redundant or irrelevant attributes.

• Explain data concepts to clients by linguistic rules.

• Find active attributes in decision making. Many attributes may play roles in decision making, but some attributes may be more active than others. Learning models are usually opaque in identifying active attributes; rule extraction provides a solution to this problem.


In the literature, extracted rules are mainly evaluated based on two criteria [115][146]:

• rule accuracy,
• rule complexity.

An additional criterion is fidelity [286]. Fidelity is defined as the ratio between the outputs correctly classified by the extracted rules and the outputs correctly classified by the corresponding classifier. Fidelity reflects how well a rule extraction method matches its corresponding classifier.
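The fidelity criterion as defined above can be computed directly from the two sets of predictions. A hedged sketch (the label vectors are invented toy data, not results from the book):

```python
def fidelity(rule_preds, model_preds, true_labels):
    """Ratio of rule-correct to classifier-correct classifications."""
    rule_correct = sum(r == t for r, t in zip(rule_preds, true_labels))
    model_correct = sum(m == t for m, t in zip(model_preds, true_labels))
    return rule_correct / model_correct

# Toy example: 10 patterns, classifier gets 9 right, rules get 8 right.
y     = [0, 0, 1, 1, 1, 2, 2, 2, 0, 1]
model = [0, 0, 1, 1, 1, 2, 2, 2, 0, 0]  # 9 correct
rules = [0, 0, 1, 1, 1, 2, 2, 0, 0, 0]  # 8 correct
print(round(fidelity(rules, model, y), 3))  # 0.889
```

Note that fidelity compares the rules against the classifier, not against the data concept, which is exactly why (as argued below in the original text) it can be a misleading criterion when the underlying classifier itself is inaccurate.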

The objective of rule extraction is, in most cases, to obtain a comprehensible description of the data, rather than a description of the network structure. Based on this, we conclude that the fidelity measurement is not always necessary for evaluating rule extraction, especially when the models from which rules are extracted do not have high learning accuracy. High fidelity under this condition does not correspond to high performance of the extracted rules, but to high mapping accuracy from the extracted rules to the learning models. In some cases, the accuracy of the extracted rules might be higher than the accuracy of the learning models, i.e., high consistency between the data concept and the extracted rules is obtained rather than high consistency between the extracted rules and the learning models. Zhou [362] questioned whether rule extraction is implemented using neural networks or for neural networks, i.e., to compensate for the ‘black-box’ behavior of neural networks. Zhou [362] also concluded that rule extraction using neural networks and rule extraction for neural networks are two different tasks, and that different criteria should be used for evaluating rule extraction techniques for the two tasks.

Given a data set, learning models from artificial intelligence and machine learning are employed to abstract the essence of the data by training the models on a set of training patterns. Rule extraction is considered a procedure to discover hidden information from data sets and represent it in explicit rules, which are relatively easy to understand and potentially provoke new ideas in further data analysis.

For a data set with tens or hundreds of attributes and thousands of data patterns, it is hard to identify the roles of the attributes in classifying new patterns without any aid from learning models. Neural networks can be trained on these patterns to abstract their essence and store the learned knowledge as parameters in the network. However, though essential knowledge has been captured and embedded in the trained neural network, humans cannot tell exactly why a new pattern is classified into a class; this is sometimes referred to as the ‘black-box’ characteristic of neural networks. In the medical domain, a disjunctive explanation such as ‘If medical measurement A is a1, and medical measurement B is b1, ..., then conclusion.’ is preferable to a complex mathematical decision function hidden in a neural network.
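An IF-THEN rule of the kind preferred in the medical example is just a conjunction of per-attribute conditions with a conclusion. A minimal sketch (the attribute names and interval bounds are invented for illustration):

```python
# A crisp rule: a conjunction of per-attribute interval premises.
# Attribute names and bounds here are purely illustrative.
rule = {
    "premises": {"A": (0.0, 1.5), "B": (2.0, 4.0)},  # attr: (low, high)
    "conclusion": "class 1",
}

def fires(rule, pattern):
    """True if every premise interval contains the pattern's value."""
    return all(lo <= pattern[attr] <= hi
               for attr, (lo, hi) in rule["premises"].items())

pattern = {"A": 1.0, "B": 3.2}
if fires(rule, pattern):
    print(rule["conclusion"])  # class 1
```

A rule in this form can be read aloud directly ("If A is between 0.0 and 1.5, and B is between 2.0 and 4.0, then class 1"), which is the comprehensibility advantage over a trained network's decision function.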

Rule extraction from neural networks has been an active research topic in recent years. In early rule extraction work, Gallant [116] used trained neural networks to develop an expert-system engine and interpreted the knowledge embedded in neural network models by IF-THEN rules. In the decade and more since, rule extraction [1][139][288][301] has been shown capable of delivering comprehensible descriptions of data concepts from complex machine learning models.

Rule extraction techniques are usually based on machine learning methods such as neural networks, genetic algorithms, statistical methods, rough sets, decision trees, and fuzzy logic.

Many methods [9][70][81][218][309] have been proposed for rule extraction from neural networks. These rule extraction methods can be characterized by:

1. forms of allowed variables: continuous, discrete, or both continuous and discrete variables.

Data may have discrete, continuous, or mixed attributes. A few methods deal with discrete variables [269][287][317] or continuous variables [284][305], and some methods deal with both continuous and discrete variables [28][102].

2. forms of extracted rule decision boundaries: hyper-rectangular, hyper-plane, hyper-ellipse, and fuzzy boundaries (Fig. 7.1).

Crisp rule extraction methods extract rules with hyper-plane decision boundaries [111][145] or hyper-rectangular decision boundaries [28][154][200]. Rules with hyper-ellipse decision boundaries can be obtained directly from RBF-based rule extraction methods; however, the complexity of the extracted rules makes this type of rule unpopular. For some applications, fuzzy rules are preferred over crisp rules when approximate reasoning is desirable rather than exact reasoning. A detailed survey of fuzzy rule extraction based on neural networks can be found in [218].

3. approaches for searching rules: pedagogical, decompositional, and eclectic approaches [269].

The pedagogical algorithms consider a learning model (such as a neural network model) as a black box and use only the activation values of the input and output units in the model when extracting rules from the trained model. In contrast, the decompositional algorithms consider each unit in a learning model and unify them into the rules corresponding to the model. Compared with the former algorithms, the latter can utilize each single unit of trained models and can obtain detailed rules [273]. The eclectic approach [9] is a combination of the above two categories (Fig. 7.2).

4. Learning models constructed before extracting rules: classification models and regression models.

Most rule extraction methods are developed based on constructed classifiers, and extracted rules are used for representing decisions of the classifiers in classification and prediction. Some methods [276][288] extract rules from regression models (Fig. 7.3).
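The crisp boundary forms from item 2 above amount to three different membership tests for a rule premise. A hedged sketch (all thresholds, weights, and centers are invented for illustration):

```python
def in_hyper_rectangle(x, lows, highs):
    # Rectangular premise: every attribute lies within its interval.
    return all(lo <= v <= hi for v, lo, hi in zip(x, lows, highs))

def in_half_space(x, w, b):
    # Hyper-plane boundary: the sign of w . x + b decides the side.
    return sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0

def in_hyper_ellipse(x, center, widths):
    # RBF-style premise: weighted squared distance within unit radius.
    return sum(((xi - ci) / wi) ** 2
               for xi, ci, wi in zip(x, center, widths)) <= 1.0

x = [0.5, 0.5]
print(in_hyper_rectangle(x, [0, 0], [1, 1]))  # True
print(in_half_space(x, [1, -1], 0.0))         # True
print(in_hyper_ellipse(x, [0, 0], [1, 1]))    # True
```

The hyper-rectangular test is the one that reads directly as an interval IF-THEN rule; the hyper-ellipse test mirrors an RBF unit's receptive field, which is why such rules fall out of RBF networks naturally but are harder to read.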


Fig. 7.1. Rule extraction boundaries: rule boundaries divide into crisp boundaries (hyper-rectangular, hyper-plane, and hyper-ellipse) and fuzzy boundaries.

Fig. 7.2. Rule extraction approaches: the pedagogical approach, the decompositional approach, and the eclectic approach.
