2.4 ROLE OF NEURAL NETWORKS IN DATA MINING

Neural networks were earlier thought to be unsuitable for data mining because of their inherent black-box nature. No information was available from them in symbolic form, suitable for verification or interpretation by humans.

However, recent investigations have concentrated on extracting the embedded knowledge in trained networks in the form of symbolic rules [25]. Unlike fuzzy sets, the main contribution of neural nets towards data mining stems from rule extraction and clustering [3].

2.4.1 Rule extraction

In general, the primary input to a connectionist rule extraction algorithm is a representation of a trained (layered) neural network, in terms of its nodes, links, and sometimes the dataset. One or more hidden and output units are used to automatically derive the rules, which may later be combined and simplified to arrive at a more comprehensible rule set. These rules can also provide new insights into the application domain. The use of neural nets helps in (i) incorporating parallelism and (ii) tackling optimization problems in the data domain. The models are usually suitable in data-rich environments.

Typically, a network is first trained to achieve the required accuracy rate.

Redundant connections of the network are then removed using a pruning algorithm. The link weights and activation values of the hidden units in the network are analyzed, and classification rules are generated [25, 86]. Further details on rule generation can be obtained in Section 8.2.
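To make the train-prune-analyze pipeline concrete, the following is a minimal sketch in Python/NumPy. It assumes a small trained feedforward network stored as weight matrices; the magnitude-based pruning threshold and the crude IF-THEN rule template are illustrative stand-ins, not the specific algorithms of [25, 86].

```python
import numpy as np

def prune_weights(W, threshold=0.1):
    """Zero out (remove) connections whose magnitude falls below a threshold."""
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return W

def extract_rules(W_hidden, W_out, feature_names, class_names, top_k=2):
    """Form crude IF-THEN rules by tracing the strongest surviving connections
    from the inputs through each hidden unit to the most excited output unit."""
    rules = []
    for h in range(W_hidden.shape[0]):            # one candidate rule per hidden unit
        in_w = W_hidden[h]                        # incoming weights of hidden unit h
        antecedents = np.argsort(-np.abs(in_w))[:top_k]
        consequent = int(np.argmax(W_out[:, h]))  # class most excited by this unit
        conds = [f"{feature_names[i]} is {'HIGH' if in_w[i] > 0 else 'LOW'}"
                 for i in antecedents if in_w[i] != 0.0]
        if conds:
            rules.append(f"IF {' AND '.join(conds)} THEN class = {class_names[consequent]}")
    return rules

# Toy 4-input, 3-hidden, 2-class network (in practice the weights come from training).
rng = np.random.default_rng(0)
W_hidden = prune_weights(rng.normal(size=(3, 4)), threshold=0.5)
W_out = rng.normal(size=(2, 3))
for r in extract_rules(W_hidden, W_out, ["x1", "x2", "x3", "x4"], ["c1", "c2"]):
    print(r)
```

In practice the antecedents are derived from the analyzed activation ranges of the hidden units rather than from raw weight signs, as discussed in Section 8.2.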

2.4.2 Rule evaluation

Here we provide some quantitative measures to evaluate the performance of the generated rules [87]. This relates to the goodness of fit chosen for the rules. Let the $(i, j)$th element of an $l \times l$ matrix, $n_{ij}$, indicate the number of objects (patterns) actually belonging to class $i$ but classified as class $j$. A short computational sketch of several of these measures is provided after the list of definitions.


• Accuracy: It is the correct classification percentage provided by the rules on a test set, defined as

$$\frac{n_{ic}}{n_i} \times 100,$$

where $n_i$ is the number of objects in class $i$ such that $n_{ic}$ of these are correctly classified.

• User's accuracy: It gives a measure of the confidence that a classifier attributes to a region as belonging to a class. If $n_i'$ objects are found to be classified into class $i$, then the user's accuracy is defined as

$$U = \frac{n_{ic}}{n_i'} \times 100.$$

In other words, it denotes the level of purity associated with a region.

• Kappa: The coefficient of agreement, kappa, measures the relationship of beyond-chance agreement to expected disagreement. It uses all the cells in the confusion matrix, not just the diagonal elements. The kappa value for class $i$ ($K_i$) is defined as

$$K_i = \frac{N \, n_{ic} - n_i \, n_i'}{N \, n_i' - n_i \, n_i'},$$

where $N$ indicates the total number of data samples. The estimate of kappa is the proportion of agreement, after chance agreement is removed from consideration. The numerator and denominator of overall kappa are obtained by summing the respective numerators and denominators of $K_i$ separately over all classes.

• Fidelity: It is measured as the percentage of the test set for which the network and the rulebase outputs agree [87].

• Confusion: This measure quantifies the goal that the "confusion should be restricted within a minimum number of classes." Let $\hat{n}_{ij}$ be the mean of all $n_{ij}$ for $i \neq j$. Then [87]

$$\mathit{Conf} = \frac{\mathrm{Card}\{n_{ij} : n_{ij} \geq \hat{n}_{ij},\ i \neq j\}}{l}$$

for an $l$-class problem. The lower the value of confusion, the smaller the number of classes between which confusion occurs.

• Coverage: The percentage of examples from a test set for which no rules are fired is used as a measure of the uncovered region. A rulebase having a smaller uncovered region is superior.


• Rulebase size: This is measured in terms of the number of rules. The lower its value, the more compact the rulebase. This leads to better understandability.

• Computational complexity: This is measured in terms of the CPU time required.

• Confidence: The confidence of the rules is defined by a confidence factor cf. We use [87]

$$cf = \inf_{j:\ \text{all nodes in the path}} \frac{\sum_i w_{ji} - \theta_j}{\sum_i w_{ji}},$$

where $w_{ji}$ is the $i$th incoming link weight to node $j$ and $\theta_j$ is its threshold.
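As the computational sketch promised above, the confusion-matrix-based measures can be computed as follows. This is a minimal illustration, assuming the per-class kappa and confusion formulas reconstructed above; the function name rule_evaluation_measures and the example matrix are hypothetical.

```python
import numpy as np

def rule_evaluation_measures(conf_mat):
    """Compute per-class accuracy, user's accuracy, per-class kappa, and the
    confusion measure from an l x l confusion matrix n[i, j]
    (row i = true class, column j = predicted class)."""
    n = np.asarray(conf_mat, dtype=float)
    l = n.shape[0]
    N = n.sum()
    n_i = n.sum(axis=1)          # objects actually belonging to class i
    n_i_prime = n.sum(axis=0)    # objects classified into class i
    n_ic = np.diag(n)            # correctly classified objects of class i

    accuracy = 100.0 * n_ic / n_i                # correct classification percentage
    users_accuracy = 100.0 * n_ic / n_i_prime    # purity of each predicted class

    # Per-class kappa: user's accuracy with chance agreement removed.
    kappa_i = (N * n_ic - n_i * n_i_prime) / (N * n_i_prime - n_i * n_i_prime)

    # Confusion: fraction of off-diagonal cells at least as large as their mean.
    off_diag = n[~np.eye(l, dtype=bool)]
    conf = np.count_nonzero(off_diag >= off_diag.mean()) / l

    return accuracy, users_accuracy, kappa_i, conf

# Example 3-class confusion matrix.
cm = [[50, 3, 2],
      [4, 45, 6],
      [1, 5, 44]]
print(rule_evaluation_measures(cm))
```

Fidelity, coverage, rulebase size, and computational complexity are measured directly from the rulebase and the test runs rather than from the confusion matrix, so they are omitted from this sketch.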

2.4.3 Clustering and self-organization

One of the big challenges to data mining is the organization and retrieval of documents from archives. Kohonen et al. [88] have demonstrated the utility of a huge self-organizing map (SOM) with more than one million nodes to partition a little less than seven million patent abstracts, where the documents are represented by 500-dimensional feature vectors. Very large text collections have been automatically organized into document maps that are suitable for visualization and intuitive exploration of the information space. Vesanto et al. [89] employ a stepwise strategy by partitioning the data with a SOM, followed by its clustering. Alahakoon et al. [90] perform hierarchical clustering of SOMs, based on a spread factor which is independent of the dimensionality of the data. Further details of these algorithms are provided in Section 6.5.2.

Shalvi and De Claris [91] have designed a data mining technique, combining Kohonen's self-organizing neural network with data visualization, for clustering a set of pathological data containing information regarding the patients' drugs, topographies (body locations) and morphologies (physiological abnormalities). Koenig [92] has combined SOM and Sammon's nonlinear mapping for reducing the dimension of data representation for visualization purposes.
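The following is a minimal NumPy sketch of the basic SOM training that underlies these document-clustering approaches. The map size, decay schedules, and random "document" vectors are illustrative only and far smaller than the WEBSOM-scale maps described above; they are not the configurations used in [88]-[92].

```python
import numpy as np

def train_som(data, rows=10, cols=15, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small 2-D self-organizing map on the row vectors in `data`.
    Returns the codebook of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    codebook = rng.normal(scale=0.1, size=(rows, cols, dim))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - t / epochs), 0.5)  # shrinking neighbourhood
        for x in rng.permutation(data):
            # Best-matching unit: the codebook vector closest to x.
            dists = np.linalg.norm(codebook - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighbourhood around the BMU on the map grid.
            grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))[..., None]
            codebook += lr * h * (x - codebook)
    return codebook

# Toy "document" vectors (tf-idf or similar features would be used in practice).
docs = np.random.default_rng(1).random((200, 50))
som = train_som(docs, rows=5, cols=5, epochs=10)
print(som.shape)  # (5, 5, 50)
```

Each trained map unit then acts as a cluster prototype; documents mapped to nearby units have similar content, which is the property exploited for visualization and browsing.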

2.4.4 Regression

Neural networks have also been used for a variety of classification and regression tasks [93]. Time series prediction has been attempted by Lee and Liu [94]. They have employed a neural oscillatory elastic graph matching model, with hybrid radial basis functions, for tropical cyclone identification and tracking.

2.4.5 Information retrieval

The SOM has been used for information retrieval [95]. A map of text documents arranged using the SOM is organized in a meaningful manner, so that items with similar content appear at nearby locations of the two-dimensional map display and the data is thereby clustered. This results in an approximate model of the data distribution in the high-dimensional document space. A document map is automatically organized for browsing and visualization, and it has been successfully utilized in speeding up document retrieval while maintaining high perceived quality. The objective of the search is to locate a small number $N'$ of best documents, in the order of goodness, corresponding to a query. The strategy is outlined below and illustrated by a short sketch at the end of this section.

• Indexing phase: Apply the SOM to partition a document collection of D documents into K subsets or clusters, representing each subset by its centroid.

• Search phase: For a given query,

— Pre-select: select the best subsets, based on comparison with the centroids, and collect the documents in these subsets until $K'$ documents ($K' > N'$) are obtained.

— Refine: perform an exhaustive search among the K' prospective documents and return the N' best ones in the order of goodness.

A collection of 1460 document vectors was organized on a 10 x 15 SOM, using the WEBSOM principles, so that each map unit contained an average of 10 documents.
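The sketch below illustrates the pre-select/refine strategy, assuming cosine similarity as the goodness measure and a simple nearest-centroid assignment standing in for the SOM partition of the indexing phase; the function names and the parameters k_prime and n_best are illustrative.

```python
import numpy as np

def cosine_sim(query, matrix):
    """Cosine similarity between a query vector and each row of `matrix`."""
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def search(query, centroids, clusters, docs, n_best=10, k_prime=50):
    """Two-stage retrieval: pre-select whole clusters in decreasing order of
    centroid similarity until at least k_prime candidate documents are
    collected, then rank the candidates exhaustively and return the n_best."""
    order = np.argsort(-cosine_sim(query, centroids))   # best clusters first
    candidates = []
    for c in order:
        candidates.extend(clusters[c])
        if len(candidates) >= k_prime:
            break
    cand = np.array(candidates)
    scores = cosine_sim(query, docs[cand])              # exhaustive refine step
    return cand[np.argsort(-scores)[:n_best]]

# Toy setup: 1000 "documents" in 64 dimensions, partitioned into 20 clusters
# by nearest centroid (a crude stand-in for the SOM indexing phase).
rng = np.random.default_rng(0)
docs = rng.random((1000, 64))
centroids = rng.random((20, 64))
labels = np.argmax(docs @ centroids.T, axis=1)
clusters = {c: np.flatnonzero(labels == c).tolist() for c in range(20)}
print(search(rng.random(64), centroids, clusters, docs))
```

The pre-select step keeps the search cost proportional to the number of clusters plus the candidate set, rather than to the whole collection, which is why the approach scales to the large document maps discussed above.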
