
SELF-ORGANIZING MAPS

Kohonen networks were introduced in 1982 by the Finnish researcher Teuvo Kohonen [1].

Although applied initially to image and sound analysis, Kohonen networks are nevertheless an effective mechanism for clustering analysis. Kohonen networks represent a type of self-organizing map (SOM), which itself represents a special class of neural networks, which we studied in Chapter 7.

The goal of self-organizing maps is to convert a complex high-dimensional input signal into a simpler low-dimensional discrete map [2]. Thus, SOMs are well suited to cluster analysis, where underlying hidden patterns among records and fields are sought. SOMs structure the output nodes into clusters of nodes, where nodes in closer proximity are more similar to each other than to nodes that are farther apart. Ritter [3] has shown that SOMs represent a nonlinear generalization of principal components analysis, another dimension-reduction technique.

Self-organizing maps are based on competitive learning, where the output nodes compete among themselves to be the winning node (or neuron), the only node to be activated by a particular input observation. As Haykin [2] describes it: "The neurons become selectively tuned to various input patterns (stimuli) or classes of input patterns in the course of a competitive learning process." A typical SOM architecture is shown in Figure 9.1. The input layer is shown at the bottom of the figure, with one input node for each field. Just as with neural networks, these input nodes do no processing themselves but simply pass the field input values along downstream.


[Figure: input layer nodes (age, income), connections with weights, output layer.]

Figure 9.1 Topology of a simple self-organizing map for clustering records by age and income.

Like neural networks, SOMs are feedforward and completely connected. Feedforward networks do not allow looping or cycling. Completely connected means that every node in a given layer is connected to every node in the next layer, although not to other nodes in the same layer. Like neural networks, each connection between nodes has a weight associated with it, which at initialization is assigned randomly to a value between zero and 1. Adjusting these weights represents the key to the learning mechanism in both neural networks and self-organizing maps. Variable values need to be normalized or standardized, just as for neural networks, so that certain variables do not overwhelm others in the learning algorithm.
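As an illustration, the following Python sketch applies min-max normalization to a small batch of records; the function name min_max_normalize and the sample values are hypothetical, and standardization (z-scores) would serve equally well:

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each column of X to [0, 1] so that no single field
    dominates the distance computations (assumes each column has
    a nonzero range)."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Example: age and income live on very different scales before rescaling.
records = [[25, 30000.0], [47, 85000.0], [62, 60000.0]]
print(min_max_normalize(records))
```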

Unlike most neural networks, however, SOMs have no hidden layer. The data from the input layer is passed along directly to the output layer. The output layer is represented in the form of a lattice, usually in one or two dimensions, and typically in the shape of a rectangle, although other shapes, such as hexagons, may be used. The output layer shown in Figure 9.1 is a 3×3 square.

For a given record (instance), a particular field value is forwarded from a particular input node to every node in the output layer. For example, suppose that the normalized age and income values for the first record in the data set are 0.69 and 0.88, respectively. The 0.69 value would enter the SOM through the input node associated with age, and this node would pass this value of 0.69 to every node in the output layer. Similarly, the 0.88 value would be distributed through the income input node to every node in the output layer. These values, together with the weights assigned to each of the connections, would determine the values of a scoring function (such as Euclidean distance) for each output node. The output node with the "best" outcome from the scoring function would then be designated as the winning node.
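To make this concrete, here is a minimal Python sketch of the scoring step for the 3×3 map of Figure 9.1; the randomly initialized weights stand in for a network at the start of training, and Euclidean distance is used as the scoring function:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((9, 2))   # one weight vector per output node (3x3 map)
x = np.array([0.69, 0.88])     # normalized (age, income) for the first record

# Euclidean distance between the input vector and each node's weight vector.
distances = np.sqrt(((weights - x) ** 2).sum(axis=1))
winner = distances.argmin()    # the "best" (smallest) score wins
print(f"Winning node: {winner}, distance: {distances[winner]:.4f}")
```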

Self-organizing maps exhibit three characteristic processes:

1. Competition. As mentioned above, the output nodes compete with each other to produce the best value for a particular scoring function, most commonly the Euclidean distance. In this case, the output node that has the smallest Euclidean distance between the field inputs and the connection weights would be declared the winner. Later, we examine in detail an example of how this works.

2. Cooperation. The winning node therefore becomes the center of a neighborhood of excited neurons. This emulates the behavior of human neurons, which are sensitive to the output of other neurons in their immediate neighborhood. In self-organizing maps, all the nodes in this neighborhood share in the "excitement" or "reward" earned by the winning node: that of adaptation. Thus, even though the nodes in the output layer are not connected directly, they tend to share common features, due to this neighborliness parameter.

3. Adaptation. The nodes in the neighborhood of the winning node participate in adaptation, that is, learning. The weights of these nodes are adjusted so as to further improve the scoring function. In other words, these nodes will thereby have an increased chance of winning the competition once again, for a similar set of field values. A sketch of these last two processes follows the list.
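The sketch below illustrates cooperation and adaptation for the 3×3 map of Figure 9.1. The row-major node coordinates, the grid (Chebyshev) definition of the neighborhood, and the helper names neighborhood and adapt are assumptions made for illustration; the weight update anticipates equation (9.1) in the next section.

```python
import numpy as np

# (row, col) coordinates for the nine output nodes, in row-major order.
coords = np.array([(i // 3, i % 3) for i in range(9)])

def neighborhood(winner, R):
    """Indices of all nodes within grid distance R of the winning node."""
    d = np.abs(coords - coords[winner]).max(axis=1)
    return np.where(d <= R)[0]

def adapt(weights, x, winner, eta, R):
    """Move the weights of the winner's neighborhood toward the input x."""
    for j in neighborhood(winner, R):
        weights[j] += eta * (x - weights[j])   # equation (9.1)
    return weights
```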

KOHONEN NETWORKS

Kohonen networks are self-organizing maps that exhibit Kohonen learning. Suppose that we consider the set of m field values for the nth record to be an input vector x_n = (x_n1, x_n2, ..., x_nm), and the current set of m weights for a particular output node j to be a weight vector w_j = (w_1j, w_2j, ..., w_mj). In Kohonen learning, the nodes in the neighborhood of the winning node adjust their weights using a linear combination of the input vector and the current weight vector:

w_ij,new = w_ij,current + η(x_ni − w_ij,current)    (9.1)

where η, 0 < η < 1, represents the learning rate, analogous to the neural networks case. Kohonen [4] indicates that the learning rate should be a decreasing function of training epochs (runs through the data set) and that a linearly or geometrically decreasing η is satisfactory for most purposes.
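For illustration, here are two such decreasing schedules in Python; the starting rate eta0 and the decay constants are arbitrary choices, not values prescribed by Kohonen:

```python
def eta_linear(epoch, n_epochs, eta0=0.5):
    """Learning rate decreasing linearly from eta0 toward zero."""
    return eta0 * (1 - epoch / n_epochs)

def eta_geometric(epoch, decay=0.95, eta0=0.5):
    """Learning rate decreasing geometrically by a fixed factor per epoch."""
    return eta0 * decay ** epoch
```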

The algorithm for Kohonen networks (after Fausett [5]) is shown in the accompanying box. At initialization, the weights are assigned randomly, unless firm a priori knowledge exists regarding the proper value for the weight vectors. Also at initialization, the learning rate η and neighborhood size R are assigned. The value of R may start out moderately large but should decrease as the algorithm progresses.

Note that nodes that do not attract a sufficient number of hits may be pruned, thereby improving algorithm efficiency.

KOHONEN NETWORKS ALGORITHM

For each input vector x_n, do:

• Competition. For each output node j, calculate the value D(w_j, x_n) of the scoring function. For example, for Euclidean distance, D(w_j, x_n) = sqrt[ Σ_i (w_ij − x_ni)² ]. Find the winning node J that minimizes D(w_j, x_n) over all output nodes.

• Cooperation. Identify all output nodes j within the neighborhood of J defined by the neighborhood size R. For these nodes, do the following for all input record fields:

Adaptation. Adjust the weights:

w_ij,new = w_ij,current + η(x_ni − w_ij,current)

• Adjust the learning rate and neighborhood size, as needed.

• Stop when the termination criteria are met.
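Putting the three steps together, the following sketch trains a 3×3 Kohonen map on synthetic normalized data. The grid layout, the decay schedules for η and R, and the fixed epoch count used as the termination criterion are all illustrative assumptions, not part of the algorithm as stated:

```python
import numpy as np

rng = np.random.default_rng(42)
grid = np.array([(i // 3, i % 3) for i in range(9)])  # node coordinates
weights = rng.random((9, 2))                          # random initialization
X = rng.random((100, 2))     # stand-in for normalized training records
eta, R, n_epochs = 0.5, 2, 50

for epoch in range(n_epochs):
    for x in X:
        # Competition: the node with the smallest Euclidean distance wins.
        J = np.sqrt(((weights - x) ** 2).sum(axis=1)).argmin()
        # Cooperation: all nodes within grid distance R of the winner.
        neighbors = np.abs(grid - grid[J]).max(axis=1) <= R
        # Adaptation: equation (9.1) applied across the neighborhood.
        weights[neighbors] += eta * (x - weights[neighbors])
    eta *= 0.95                  # geometrically decreasing learning rate
    R = max(0, 2 - epoch // 20)  # shrink the neighborhood as training proceeds
```

After training, each record can be assigned to the cluster of its winning node, which is how cluster membership is read off the map.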
