
7.2 Rule Extraction Based on Classification Models

In the document Advanced Information and Knowledge Processing (Pages 167-171)

Neural networks, support vector machines (SVMs), and decision trees are popular due to their good generalization capabilities. In this section, major rule extraction techniques are reviewed according to types of learning models from which rules are extracted.

Fig. 7.3. Rule extraction based on different learning models. (The figure shows data feeding two families of learning models, each leading to rules: classification models, with NN-based, SVM-based, and decision-tree-based rule extraction, and regression models, both linear and non-linear.)

7.2.1 Rule Extraction Based on Neural Network Classifiers

In classification and prediction tasks, neural network classifiers have shown remarkable generalization capabilities. Consider a data set (xi, yi), where i = 1, 2, ..., N, input xi ∈ R^L, and class label yi ∈ R. Without any prior knowledge about the relationship between input samples and their corresponding class labels, neural networks are used to map this relationship to network parameters. A neural network may have multiple layers. The weights connecting one layer to the next are real valued, and the activation functions of hidden-layer neurons are non-linear. These parameters lead to complex decision functions, which may be non-linear and non-monotonic. These factors impede an explicit description of how and why an unseen sample is classified to a certain class.
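The opacity described above can be made concrete with a minimal sketch. The one-hidden-layer network below uses hypothetical, hand-picked weights (not a trained model); composing a few tanh units is already enough to produce an output that rises and falls as the input grows, i.e., a non-monotonic decision function that no single weight explains.

```python
import math

# Hypothetical weights for a 1-input, 3-hidden-neuron, 1-output network,
# chosen purely for illustration.
W1 = [2.0, -3.0, 1.5]    # input-to-hidden weights
b1 = [0.5, 1.0, -2.0]    # hidden biases
W2 = [1.0, 2.0, -1.5]    # hidden-to-output weights
b2 = 0.1                 # output bias

def predict(x):
    """Forward pass: non-linear tanh hidden layer, linear output."""
    hidden = [math.tanh(w * x + b) for w, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

# Sweep the input over [-3, 3]: the output both increases and decreases,
# so the learned function is non-monotonic in the input.
outputs = [predict(x / 10.0) for x in range(-30, 31)]
```

Because the output is a sum of shifted, scaled non-linearities, no interval of the input maps cleanly to one class by inspection, which is exactly what rule extraction methods try to recover.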

Much research work has been carried out on extracting rules from MLPs [155][200][270][284]. Generally speaking, approaches to extracting rules from MLPs can be categorized according to how they deal with individual neurons. In [116] and [305], rules are extracted by interpreting outputs in terms of inputs. However, hidden neurons are assumed to work independently in [116] and [305], which limits the capability of the proposed rule extraction methods. These two methods can be considered the earliest pedagogical rule extraction approaches. The validity interval analysis (VIA) approach [305] improves generalization compared to Gallant's approach. Validity intervals are tuned to constrain inputs and outputs by detecting and excluding activation values that are not consistent with the trained neural network. This method is also considered pedagogical due to the direct mapping from inputs to outputs.

Other early pedagogical rule extraction work includes the RULENEG algorithm [244], which extracts rules from a trained neural network by stepwise negation, and the DEDEC method [309], which finds the minimal information separating a pattern from the others based on the trained neural network. Narazaki et al. [226] proposed a rule extraction method analyzing the function learned by a trained NN. The rule boundaries are based on the relationship between inputs and outputs learned by the NN, as well as on the class labels predicted by the NN. In [155], continuous inputs are represented by linguistic inputs, and each possible combination of linguistic inputs is examined for generating rules. The number of rules is 4^L, where L is the data dimensionality; in addition, a GA is applied to select a small rule set. Jiang et al. [162] transformed continuous data attributes into categorical values, and the roles of an attribute's categorical values are inspected in order to generate rules with the attribute as antecedent. The drawback lies in that it is difficult to categorize attributes without sufficient prior knowledge of the data distribution, and it is not practical to check all combinations of categorical attributes for generating rules when the data have high dimensionality.
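The 4^L growth of the combinatorial approach can be sketched directly. In the snippet below the four linguistic terms are hypothetical labels (the fourth playing the role of a don't-care condition); the point is only to show how enumerating one term per attribute yields 4^L candidate antecedents, which is why a GA is then needed to select a small rule set.

```python
from itertools import product

# Hypothetical linguistic terms for each continuous attribute; "any" acts
# as a don't-care term, as in combinatorial linguistic-rule approaches.
TERMS = ["small", "medium", "large", "any"]

def candidate_antecedents(n_attributes):
    """Enumerate every combination of one linguistic term per attribute."""
    return list(product(TERMS, repeat=n_attributes))

L = 3                                  # illustrative data dimensionality
rules = candidate_antecedents(L)
print(len(rules))                      # 4**L = 64 candidate antecedents
```

Even at L = 10 the candidate set exceeds one million antecedents, which makes the subsequent GA-based selection step essential rather than optional.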

Pedagogical rule extraction approaches have also been developed for recurrent networks. Vahed and Omlin [323] used a polynomial-time symbolic learning algorithm to extract rules solely from changes in the inputs and outputs of a trained network. The clustering phase is eliminated in this rule extraction approach, which increases the fidelity of the extracted knowledge.

Craven and Shavlik developed the TREPAN algorithm [66], which extracts rules by forming decision trees and querying the class labels of samples through the trained neural network; it can deal with both continuous and discrete attributes. Schmitz et al. [276] proposed an artificial neural network decision-tree algorithm (ANN-DT), in which a univariate decision tree is generated from a trained neural network.
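The core idea shared by TREPAN and ANN-DT can be sketched in a few lines: the trained network is treated purely as a labelling oracle, and an axis-aligned split is fitted to mimic its answers. The oracle below is a hypothetical stand-in function, not a real trained network, and the single-split search is a deliberately minimal version of tree induction.

```python
def oracle(x):
    # Hypothetical stand-in for querying a trained network on a 1-D input.
    return 1 if x > 0.37 else 0

def best_univariate_split(samples):
    """Pick the threshold whose rule 'x > t -> class 1' best matches the
    oracle's labels (pedagogical: only inputs and queried outputs are used)."""
    labelled = sorted((x, oracle(x)) for x in samples)
    best_t, best_acc = None, -1.0
    for i in range(1, len(labelled)):
        t = (labelled[i - 1][0] + labelled[i][0]) / 2.0   # candidate midpoint
        acc = sum((x > t) == bool(y) for x, y in labelled) / len(labelled)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

samples = [i / 20.0 for i in range(21)]        # queried inputs in [0, 1]
threshold, fidelity = best_univariate_split(samples)
# Extracted rule: IF x > threshold THEN class 1 ELSE class 0
```

A full TREPAN-style extractor would grow the tree recursively and draw extra query samples where training data are sparse; the fidelity measure above (agreement with the oracle, not with the true labels) is the quantity these methods optimize.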

In [162] and [361], rules are extracted from a neural network ensemble by pedagogical approaches. Similar to the method in [66], Zhou and Jiang [361] combined C4.5 with neural network ensembles to extract rules. Neural network ensembles are expected to deliver better generalization compared to single networks.

Decompositional rule extraction approaches are local methods since the basic components of neural networks, including interconnected weights and neurons, are decomposed to represent the relationship between the input and the output.

In a number of decompositional rule extraction methods [146][200], clustering techniques are used for grouping the activation values of hidden neurons. In these approaches, the connected weights and activation values are approximated according to the clustering results.
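A toy version of this clustering step can be written directly. The activation values below are hypothetical, and the greedy one-dimensional merge is a stand-in for whatever clustering algorithm a given method uses; the point is that each hidden neuron's continuous activations collapse to a few discrete centres, which is what makes rule reading possible.

```python
def cluster_1d(values, eps=0.1):
    """Greedily merge sorted activation values that lie within eps of the
    previous value, yielding groups of nearby activations."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= eps:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

# Hypothetical activation values observed at one hidden neuron
activations = [0.02, 0.05, 0.48, 0.52, 0.97, 0.99]
clusters = cluster_1d(activations)
centres = [sum(c) / len(c) for c in clusters]
# Each activation is then approximated by its cluster centre, so the
# neuron effectively takes only len(centres) discrete values.
```

Replacing a continuous activation by its cluster centre is exactly the approximation referred to above: downstream weights then operate on a small discrete alphabet instead of a continuum.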

GAs have been widely used for practical problem solving and for scientific modelling. With the capability of searching for desirable solutions in the problem space, GAs have been employed for extracting rules from neural networks. Fukumi and Akamatsu [111] used GAs to prune the connections in neural networks before extracting rules. Hruschka and Ebecken [146] proposed a clustering genetic algorithm (CGA) to cluster the activation values of the hidden units of a trained neural network; rules were then extracted based on the results from the CGA. Ishibuchi et al. [152][153][154][155][156] used GAs to obtain concise rules by selecting important members from the rules extracted from a neural network.

Decision trees are often combined with neural networks in both pedagogical and decompositional rule extraction approaches [273][283]. In the decompositional approach proposed in [273], neural networks are first trained to extract the essential relationship between the input and the output. The relationship is thus embedded in the connection weights and hidden neurons of the trained neural networks. Then, decision trees are applied to decompose the relationship between inputs and hidden neurons, as well as the relationship between hidden neurons and outputs. The results from the decision trees are combined to deliver rules.

7.2.2 Rule Extraction Based on Support Vector Machine Classifiers

In recent years, support vector machines (SVMs) [39][40][42][163] have attracted much interest for their capability of solving classification and regression problems. Successful applications of SVMs have been reported in various areas, such as communication [122], time-series prediction [119], and bioinformatics [34][222]. In many applications, it is desirable to know not only the classification decisions but also what leads to them. However, SVMs offer little insight into how their final results are reached.

In [231], rules are extracted after clustering: the distances from support vectors to each cluster center are checked in order to generate a rule for that cluster.

The RulExSVM method [108] extracts hyper-rectangular rules, which are then fine-tuned, with redundant rules merged to produce a compact rule set. This method is described in a later chapter.
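The hyper-rectangular form of such rules can be illustrated with a minimal sketch. The support vectors below are hypothetical, and the single bounding box stands in for the per-support-vector rectangle construction and fine-tuning of the actual RulExSVM procedure; it shows only how axis-aligned bounds become readable antecedents.

```python
def bounding_box(points):
    """Axis-aligned hyper-rectangle enclosing the given points."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return lo, hi

# Hypothetical support vectors assigned to class +1
support_vectors = [(0.2, 0.9), (0.4, 0.7), (0.3, 1.1)]
lo, hi = bounding_box(support_vectors)

# Each dimension's interval becomes one antecedent of the rule
rule = " AND ".join(
    f"{l:.1f} <= x{d} <= {h:.1f}" for d, (l, h) in enumerate(zip(lo, hi))
)
# rule reads: "0.2 <= x0 <= 0.4 AND 0.7 <= x1 <= 1.1"
```

In the full method each rectangle is subsequently shrunk or grown against the SVM decision boundary and overlapping rectangles are merged, which is the fine-tuning step mentioned above.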

7.2.3 Rule Extraction Based on Decision Trees

As stated in previous paragraphs, decision trees are employed to facilitate rule extraction from neural networks. Rules can also be obtained from decision tree classifiers directly, since each distinct path through a decision tree produces a distinct rule. To generate rules, each path is traced from the root node to a leaf node, recording the test outcomes along the path as the antecedents and the leaf-node classification as the consequence. Decision trees are easy to construct automatically from labelled instances. Two well-known programs for constructing decision trees are C4.5 [249] and CART (classification and regression trees) [30]. Decision trees can thus be regarded as rule-based systems.
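The path-tracing procedure just described can be sketched directly. The tree below is a small hypothetical example encoded as nested dictionaries; the traversal records each test (or its negation) as an antecedent and emits one rule per leaf.

```python
# Hypothetical decision tree: internal nodes hold a test, leaves hold a class.
tree = {
    "test": "x1 <= 2.5",
    "yes": {"class": "A"},
    "no": {
        "test": "x2 <= 0.7",
        "yes": {"class": "B"},
        "no": {"class": "A"},
    },
}

def paths_to_rules(node, antecedents=()):
    """Trace every root-to-leaf path, collecting tests as antecedents and
    the leaf class as the consequence."""
    if "class" in node:
        cond = " AND ".join(antecedents) if antecedents else "TRUE"
        return [f"IF {cond} THEN {node['class']}"]
    t = node["test"]
    return (paths_to_rules(node["yes"], antecedents + (t,))
            + paths_to_rules(node["no"], antecedents + (f"NOT({t})",)))

rules = paths_to_rules(tree)
# One rule per distinct path, e.g. "IF x1 <= 2.5 THEN A"
```

Since every leaf yields exactly one rule and the rules cover the whole input space without overlap, the tree and the extracted rule set are equivalent classifiers, which is why decision trees can be regarded as rule-based systems.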


Knowledge learned by decision trees can be transformed into expressive rules easily. However, decision trees are prone to errors when data are noisy.

Decision trees [126][321] can form concise rules, in contrast to neural networks. However, the accuracy of decision trees is often lower than that of neural networks on noisy data, and it is difficult for decision trees to handle dynamic data. Zhao [360] constructed a decision tree with each node being an expert neural network, thereby combining the advantages of both decision trees and neural networks. Tsang et al. [315] and Umano et al. [321] combined neural networks with decision trees to obtain better performance in rule extraction.

7.2.4 Rule Extraction Based on Regression Models

Most rule extraction methods for analyzing data or explaining the functions learned by trained models are developed based on classification models. Classification models have discrete outputs, i.e., the categories of the inputs. Regression and approximation models are constructed for approximating continuous outputs. Fewer methods have been developed for extracting rules from learning models with continuous outputs.

Setiono et al. [288] proposed REFANN (rule extraction from function approximating neural networks) to extract rules from neural networks trained for non-linear function approximation or regression. The trained neural network is first pruned by removing redundant inputs and hidden neurons. Then, either a three-piece or a five-piece linear function is used to approximate the continuous activation function of each hidden neuron. Finally, the input space is divided into subregions. In each subregion, the function values of samples are computed as a linear function of the inputs. That is, the antecedents of a rule are a subregion represented by attributes, and the consequence of the rule is a linear function of the inputs, serving as the final approximation of the non-linear function learned by the neural network.
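The three-piece approximation step can be illustrated with a minimal sketch. The breakpoint X0 below is a hypothetical choice, not the value derived in the original paper; the sketch only shows how replacing tanh by a saturating linear function keeps the error modest while making each neuron, and hence the pruned network, piecewise linear in its inputs.

```python
import math

X0 = 1.0   # hypothetical breakpoint between the linear and saturated pieces

def tanh_3piece(x):
    """Three-piece linear stand-in for tanh: saturate at -1 and +1,
    linear through the origin in between."""
    if x <= -X0:
        return -1.0
    if x >= X0:
        return 1.0
    return x / X0

# Compare against the true activation over [-3, 3]
errors = [abs(tanh_3piece(x / 10.0) - math.tanh(x / 10.0))
          for x in range(-30, 31)]
max_error = max(errors)
# The worst-case gap occurs near the breakpoint and stays below 0.25
```

Once every hidden activation is piecewise linear, the network output restricted to any subregion (a fixed linear piece per neuron) is a single linear function of the inputs, which is precisely the rule consequence REFANN emits.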

Tsukimoto [318] extracted rules from regression models and mathematical formulae. In the rule extraction technique called LRA (logical regression analysis) [318], the trained neural network is decomposed into neurons, and the output of each neuron is assumed to be monotone increasing.
