
4.2 RBF Neural Networks for Classification


In a classification task, for each class $C_i$ ($i = 1, 2, \ldots, M$) with $N_i$ samples, the posterior probability is [22]:

$$P(C_i|X) = \frac{p(X|C_i)P(C_i)}{p(X)} = \frac{p(X|C_i)P(C_i)}{\sum_{k=1}^{M} p(X|C_k)P(C_k)}, \qquad (4.1)$$

where $X$ is a data sample and $\sum_{i=1}^{M} N_i = N$.

The probability mentioned above can be represented as an RBF neural network with radial basis kernel functions given by [22]:

$$\phi_i(X) = \frac{p(X|C_i)}{\sum_{k=1}^{M} p(X|C_k)P(C_k)} \qquad (4.2)$$

and second-layer connections consisting of one weight from each hidden unit to the corresponding output unit, with value $P(C_i)$. Thus, the posterior probabilities can be approximated by the RBF neural network.
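As a concrete illustration of this correspondence, the following minimal sketch (assuming hypothetical one-dimensional Gaussian class-conditional densities and NumPy; none of the numbers come from the text) evaluates the basis functions of Eq. (4.2), applies the second-layer weights $P(C_i)$, and checks that the result reproduces the Bayes posterior of Eq. (4.1).

```python
import numpy as np

# Hypothetical one-dimensional example: two Gaussian class-conditional
# densities p(X|C_i) with priors P(C_i); any densities would do.
def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = np.array([0.3, 0.7])                    # P(C_1), P(C_2)
X = 0.8                                          # a data sample
p_x_given_c = np.array([gaussian(X, 0.0, 1.0),   # p(X|C_1)
                        gaussian(X, 2.0, 1.5)])  # p(X|C_2)

# Eq. (4.1): Bayes posterior P(C_i|X)
posterior = p_x_given_c * priors / np.sum(p_x_given_c * priors)

# Eq. (4.2): basis functions phi_i(X); the second-layer weights are P(C_i)
phi = p_x_given_c / np.sum(p_x_given_c * priors)
network_output = priors * phi                    # one weight P(C_i) per hidden unit

print(np.allclose(network_output, posterior))    # True
```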

An RBF neural network may be considered a mixture model representing the distribution of the data set. Assume that K radial basis kernel functions are generated in an RBF neural network in order to represent all the class-conditional densities. Each class-conditional density can then be expressed in terms of these kernels as follows [22]:

$$p(X|C_i) = \sum_{j=1}^{K} p(X|j)P(j|C_i). \qquad (4.3)$$

The unconditional density is:

$$p(X) = \sum_{j=1}^{K} p(X|j)P(j), \qquad (4.4)$$

where the probabilities of the radial basis functions are:

$$P(j) = \sum_{i=1}^{M} P(j|C_i)P(C_i). \qquad (4.5)$$

The unconditional density of the input data, $p(X)$, can therefore be represented by a mixture model in which the component densities are given by the radial basis functions shown in Eq. (4.4).

We obtain the posterior probabilities of class membership by substituting Eq. (4.3) and Eq. (4.4) into Eq. (4.1) [22]:

$$P(C_i|X) = \sum_{j=1}^{K} w_{ij}\,\phi_j(X). \qquad (4.6)$$

Equation (4.6) represents an RBF neural network with K radial basis kernel functions, in which the weights are given by [22]:

$$w_{ij} = \frac{P(j|C_i)P(C_i)}{P(j)} = P(C_i|j) \qquad (4.7)$$

and the $j$th kernel function is given by [22]:

$$\phi_j(X) = \frac{p(X|j)P(j)}{\sum_{j'=1}^{K} p(X|j')P(j')} = P(j|X). \qquad (4.8)$$

Thus, the output of an RBF neural network classifier can be interpreted in terms of posterior probabilities: the radial basis functions represent the posterior probabilities $P(j|X)$ of the kernels given the input, and the weights connecting the hidden units with the outputs represent the posterior probabilities $P(C_i|j)$ of the classes given the kernels.
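A minimal sketch of this interpretation (reusing the hypothetical two-class, three-kernel quantities from the previous example, assuming NumPy) builds the weights of Eq. (4.7) and the kernel outputs of Eq. (4.8) and verifies that their combination in Eq. (4.6) equals the Bayes posterior.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Same hypothetical quantities as in the previous sketch: M = 2, K = 3.
P_c = np.array([0.4, 0.6])
P_j_given_c = np.array([[0.7, 0.3, 0.0],
                        [0.0, 0.2, 0.8]])
X = 1.0
p_x_given_j = np.array([gaussian(X, -1.0, 0.5),
                        gaussian(X,  0.5, 1.0),
                        gaussian(X,  2.0, 0.8)])

P_j = P_c @ P_j_given_c                                # Eq. (4.5)
w = (P_j_given_c * P_c[:, None]) / P_j                 # Eq. (4.7): w_ij = P(C_i|j)
phi = p_x_given_j * P_j / np.sum(p_x_given_j * P_j)    # Eq. (4.8): phi_j(X) = P(j|X)
rbf_posterior = w @ phi                                # Eq. (4.6): sum_j w_ij phi_j(X)

# Direct Bayes posterior from Eqs. (4.1) and (4.3) for comparison
p_x_given_c = P_j_given_c @ p_x_given_j
bayes_posterior = p_x_given_c * P_c / np.sum(p_x_given_c * P_c)

print(np.allclose(rbf_posterior, bayes_posterior))     # True
```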

4.2.1 The Pseudo-inverse Method

For a neural network classifier, training is based on its classification performance. The MSE (mean squared error) function is usually used as the objective function in neural networks:

$$E = \frac{1}{2}\,|d - Y|^2, \qquad (4.9)$$

where $d$ is the target vector and $Y$ is the output vector.

In an RBF neural network, the MSE function is as follows:

$$E = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{M}\left(\sum_{j=1}^{K} w_{mj}\,\phi_{nj} - t_{nm}\right)^2, \qquad (4.10)$$

where N is the number of patterns, M is the number of outputs, and K is the number of hidden units. $w_{mj}$ is the weight connecting the $j$th hidden unit with the $m$th output unit, $\phi_{nj}$ represents the output of the $j$th kernel function for the $n$th input pattern, and $t_{nm}$ represents the target output of the $m$th output unit when the $n$th pattern is input.
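In matrix terms this objective is simply half the squared Frobenius norm of $\phi W^T - T$; the sketch below (random hypothetical matrices, assuming NumPy) evaluates Eq. (4.10) that way.

```python
import numpy as np

# Hypothetical dimensions: N patterns, K hidden units, M outputs.
rng = np.random.default_rng(0)
N, K, M = 6, 3, 2
Phi = rng.random((N, K))          # phi_nj: hidden-unit outputs
W = rng.random((M, K))            # w_mj: output-layer weights
T = rng.random((N, M))            # t_nm: target outputs

# Eq. (4.10): E = 1/2 * sum_n sum_m (sum_j w_mj * phi_nj - t_nm)^2
E = 0.5 * np.sum((Phi @ W.T - T) ** 2)
print(E)
```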

Assume that the parameters of the hidden layer (the number of hidden units and the centers and widths of the hidden units) have been fixed at the first training stage. The weights between the hidden layer and the output layer then need to be determined. In order to minimize the MSE, Eq. (4.10) is differentiated with respect to $w_{mj}$ and the derivative is set to zero [22]:

$$\sum_{n=1}^{N}\left(\sum_{j'=1}^{K} w_{mj'}\,\phi_{nj'} - t_{nm}\right)\phi_{nj} = 0. \qquad (4.11)$$

Equation (4.11) is written in the form of a matrix:

$$(\phi^T\phi)\,W^T = \phi^T T, \qquad (4.12)$$

where $\phi$, with elements $\phi_{nj}$, has dimensions $N \times K$; $W$ is an $M \times K$ matrix with elements $w_{mj}$; and $T$ has dimensions $N \times M$ with elements $t_{nm}$. The matrix $\phi^T\phi$ in Eq. (4.12) is a square matrix with dimensions $K \times K$. If $\phi^T\phi$ is a non-singular matrix, the solution to Eq. (4.12) is given [120][253] as follows:

$$W^T = \phi^{\dagger} T, \qquad (4.13)$$

where $\phi^{\dagger}$ is a $K \times N$ matrix known as the pseudo-inverse of $\phi$:

$$\phi^{\dagger} \equiv (\phi^T\phi)^{-1}\phi^T. \qquad (4.14)$$

It is noted that, if the matrix $\phi^T\phi$ is singular, Eqs. (4.13) and (4.14) do not exist, i.e., there is no unique solution to Eq. (4.12). In this case the pseudo-inverse of $\phi$ is redefined as:

$$\phi^{\dagger} \equiv \lim_{\epsilon \to 0}\,(\phi^T\phi + \epsilon I)^{-1}\phi^T, \qquad (4.15)$$

where $I$ is the unit matrix and $\epsilon > 0$. It is clear that the limit always exists [22], and it can be used to minimize the MSE.
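A minimal sketch of this second training stage (hypothetical sizes and random data, assuming NumPy) solves Eq. (4.13) with a numerical pseudo-inverse and compares it with the regularized form of Eq. (4.15) for a small fixed $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 100, 10, 3                # hypothetical sizes
Phi = rng.random((N, K))            # hidden-unit outputs phi_nj (fixed in the first stage)
T = rng.random((N, M))              # targets t_nm

# Eqs. (4.13)-(4.14): W^T = phi_dagger T, with phi_dagger = (phi^T phi)^{-1} phi^T
W_T = np.linalg.pinv(Phi) @ T       # pinv also covers the singular case

# Eq. (4.15): regularized form (phi^T phi + eps*I)^{-1} phi^T for a small eps > 0
eps = 1e-8
W_T_reg = np.linalg.solve(Phi.T @ Phi + eps * np.eye(K), Phi.T @ T)

print(np.allclose(W_T, W_T_reg, atol=1e-6))   # True when phi^T phi is well conditioned
```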

4.2.2 Comparison between the RBF and the MLP

Though both the RBF neural network [33][85][219][245] and the MLP neural network are powerful tools for function approximation, classification, and data mining tasks, there are differences in their performance and applications, which arise from the following characteristics:

1. Activation function [22].

In RBF neural networks, the activation of a hidden unit is determined by the transformed distance between the input pattern and the center of the hidden unit. The transformation of the distance is by an activation function with a localized nature, such as the Gaussian kernel function. The activations of hidden units of MLP neural networks depend on weighted linear summations of the inputs transformed by activation functions, such as the sigmoidal function and the hyperbolic tangent function, which are not local in nature.

2. Data space partition.

An MLP neural network separates the data space by hyper-planes, while an RBF neural network generates hyper-spheres to partition the input space. This difference is a direct consequence of the difference in activation functions used in the RBF and MLP networks.

3. Training procedure.

The parameters of an MLP neural network are usually determined in a single training procedure. An RBF neural network's parameters are typically trained in several stages: the parameters of the kernel functions are trained first by unsupervised techniques, and the weights connecting the hidden layer and the output layer are determined at the second stage (a minimal two-stage sketch follows this list).

4. The weights.

All weights in an MLP neural network are usually adjustable. The weights between the input layer and the hidden layer in a typical RBF neural network are fixed as 1’s. The weights connecting hidden units and output units can be obtained by the linear least square (LLS) method.

5. The local minima problem.

The error function to be minimized may have numerous local minima in the parameter space. The weights between the hidden layer and the output layer in an RBF network can be determined by the LLS method, which does not suffer from local minima; however, the adjustment of the hidden-neuron parameters by clustering may still be trapped in local minima.
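To make points 3 and 4 concrete, the following sketch (a hypothetical two-class toy data set, assuming NumPy and scikit-learn's KMeans; the width heuristic is an illustrative choice, not the book's method) trains an RBF classifier in two stages: unsupervised k-means for the centers, then the pseudo-inverse method of Sect. 4.2.1 for the output weights.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical two-class toy data set; any labelled data would do.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
T = np.zeros((100, 2))              # one-of-M target coding
T[:50, 0] = 1.0
T[50:, 1] = 1.0

# Stage 1 (unsupervised): centers by k-means, a single width from a simple heuristic.
K = 6
centers = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_
width = np.mean([np.linalg.norm(c1 - c2) for c1 in centers for c2 in centers]) / 2

# Hidden layer: Gaussian kernel of the distance to each center (input weights fixed at 1).
def hidden(X):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * width ** 2))

# Stage 2 (supervised): output weights by the linear least square / pseudo-inverse method.
Phi = hidden(X)
W_T = np.linalg.pinv(Phi) @ T

# Classification: pick the output unit with the largest response.
predictions = np.argmax(Phi @ W_T, axis=1)
print(np.mean(predictions == np.argmax(T, axis=1)))   # training accuracy
```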

