
4.2 RBF Neural Networks for Classification


In a classification task, for each class $C_i$ ($i = 1, 2, \ldots, M$) with $N_i$ samples, the posterior probability is [22]:

$$P(C_i|X) = \frac{p(X|C_i)P(C_i)}{p(X)} = \frac{p(X|C_i)P(C_i)}{\sum_{k=1}^{M} p(X|C_k)P(C_k)}, \qquad (4.1)$$

where $X$ is a data sample and $\sum_{i=1}^{M} N_i = N$.

The probability mentioned above can be represented as an RBF neural network with radial basis kernel functions given by [22]:

$$\phi_i(X) = \frac{p(X|C_i)}{\sum_{k=1}^{M} p(X|C_k)P(C_k)} \qquad (4.2)$$

and second-layer connections consisting of one weight from each hidden unit to the corresponding output unit, with value $P(C_i)$. Thus, the posterior probabilities can be approximated by the RBF neural network.
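As a concrete illustration of this correspondence, the following minimal sketch (assuming hypothetical one-dimensional Gaussian class-conditional densities and NumPy; none of the numbers come from the text) evaluates the basis functions of Eq. (4.2), applies the second-layer weights $P(C_i)$, and checks that the result reproduces the Bayes posterior of Eq. (4.1).

```python
import numpy as np

# Hypothetical one-dimensional example: two Gaussian class-conditional
# densities p(X|C_i) with priors P(C_i); any densities would do.
def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

priors = np.array([0.3, 0.7])                    # P(C_1), P(C_2)
X = 0.8                                          # a data sample
p_x_given_c = np.array([gaussian(X, 0.0, 1.0),   # p(X|C_1)
                        gaussian(X, 2.0, 1.5)])  # p(X|C_2)

# Eq. (4.1): Bayes posterior P(C_i|X)
posterior = p_x_given_c * priors / np.sum(p_x_given_c * priors)

# Eq. (4.2): basis functions phi_i(X); the second-layer weights are P(C_i)
phi = p_x_given_c / np.sum(p_x_given_c * priors)
network_output = priors * phi                    # one weight P(C_i) per hidden unit

print(np.allclose(network_output, posterior))    # True
```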

An RBF neural network may be considered a mixture model representing the distribution of the data set. Assume that K radial basis kernel functions are generated in an RBF neural network in order to represent all the class-conditional densities. Each class-conditional density can then be expressed in terms of these kernels as follows [22]:

$$p(X|C_i) = \sum_{j=1}^{K} p(X|j)P(j|C_i). \qquad (4.3)$$

The unconditional density is:

$$p(X) = \sum_{j=1}^{K} p(X|j)P(j), \qquad (4.4)$$

where the probabilities of the radial basis functions are:

$$P(j) = \sum_{i=1}^{M} P(j|C_i)P(C_i). \qquad (4.5)$$

The unconditional density of the input data, $p(X)$, can therefore be represented by a mixture model in which the component densities are given by the radial basis functions shown in Eq. (4.4).

We obtain the posterior probabilities of class membership by substituting Eq. (4.3) and Eq. (4.4) into Eq. (4.1) [22]:

$$P(C_i|X) = \sum_{j=1}^{K} w_{ij}\,\phi_j(X). \qquad (4.6)$$

Equation (4.6) represents an RBF neural network with K radial basis kernel functions, in which the weights are given by [22]:

$$w_{ij} = \frac{P(j|C_i)P(C_i)}{P(j)} = P(C_i|j) \qquad (4.7)$$

and the $j$th kernel function is given by [22]:

$$\phi_j(X) = \frac{p(X|j)P(j)}{\sum_{j'=1}^{K} p(X|j')P(j')} = P(j|X). \qquad (4.8)$$

Thus, the output of an RBF neural network classifier can be interpreted in terms of posterior probabilities: the radial basis functions represent the posterior probabilities $P(j|X)$ of the kernels given the input, and the weights connecting the hidden units with the outputs represent the posterior probabilities $P(C_i|j)$ of the classes given the kernels.
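A minimal sketch of this interpretation (reusing the hypothetical two-class, three-kernel quantities from the previous example, assuming NumPy) builds the weights of Eq. (4.7) and the kernel outputs of Eq. (4.8) and verifies that their combination in Eq. (4.6) equals the Bayes posterior.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Same hypothetical quantities as in the previous sketch: M = 2, K = 3.
P_c = np.array([0.4, 0.6])
P_j_given_c = np.array([[0.7, 0.3, 0.0],
                        [0.0, 0.2, 0.8]])
X = 1.0
p_x_given_j = np.array([gaussian(X, -1.0, 0.5),
                        gaussian(X,  0.5, 1.0),
                        gaussian(X,  2.0, 0.8)])

P_j = P_c @ P_j_given_c                                # Eq. (4.5)
w = (P_j_given_c * P_c[:, None]) / P_j                 # Eq. (4.7): w_ij = P(C_i|j)
phi = p_x_given_j * P_j / np.sum(p_x_given_j * P_j)    # Eq. (4.8): phi_j(X) = P(j|X)
rbf_posterior = w @ phi                                # Eq. (4.6): sum_j w_ij phi_j(X)

# Direct Bayes posterior from Eqs. (4.1) and (4.3) for comparison
p_x_given_c = P_j_given_c @ p_x_given_j
bayes_posterior = p_x_given_c * P_c / np.sum(p_x_given_c * P_c)

print(np.allclose(rbf_posterior, bayes_posterior))     # True
```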

4.2.1 The Pseudo-inverse Method

For a neural network classifier, training is based on its classification performance. The MSE (mean squared error) function is usually used as the objective function in neural networks:

$$E = \frac{1}{2}\,|d - Y|^2, \qquad (4.9)$$

where $d$ is the target vector and $Y$ is the output vector.

In an RBF neural network, the MSE function is as follows:

$$E = \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{M}\left(\sum_{j=1}^{K} w_{mj}\,\phi_{nj} - t_{nm}\right)^2, \qquad (4.10)$$

where N is the number of patterns, M is the number of outputs, and K is the number of hidden units. $w_{mj}$ is the weight connecting the $j$th hidden unit with the $m$th output unit, $\phi_{nj}$ represents the output of the $j$th kernel function for the $n$th input pattern, and $t_{nm}$ represents the target output of the $m$th output unit when the $n$th pattern is input.
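In matrix terms this objective is simply half the squared Frobenius norm of $\phi W^T - T$; the sketch below (random hypothetical matrices, assuming NumPy) evaluates Eq. (4.10) that way.

```python
import numpy as np

# Hypothetical dimensions: N patterns, K hidden units, M outputs.
rng = np.random.default_rng(0)
N, K, M = 6, 3, 2
Phi = rng.random((N, K))          # phi_nj: hidden-unit outputs
W = rng.random((M, K))            # w_mj: output-layer weights
T = rng.random((N, M))            # t_nm: target outputs

# Eq. (4.10): E = 1/2 * sum_n sum_m (sum_j w_mj * phi_nj - t_nm)^2
E = 0.5 * np.sum((Phi @ W.T - T) ** 2)
print(E)
```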

Assume that the parameters of the hidden layer (the number of hidden units and the centers and widths of the hidden units) have been fixed at the first training stage. The weights between the hidden layer and the output layer then need to be determined. In order to minimize the MSE, Eq. (4.10) is differentiated with respect to $w_{mj}$ and the derivative is set to zero [22]:

$$\sum_{n=1}^{N}\left(\sum_{j'=1}^{K} w_{mj'}\,\phi_{nj'} - t_{nm}\right)\phi_{nj} = 0. \qquad (4.11)$$

Equation (4.11) is written in the form of a matrix:

$$(\phi^T\phi)\,W^T = \phi^T T, \qquad (4.12)$$

where $\phi$, with elements $\phi_{nj}$, has dimensions $N \times K$; $W$ is an $M \times K$ matrix with elements $w_{mj}$; and $T$ has dimensions $N \times M$ with elements $t_{nm}$. The matrix $\phi^T\phi$ in Eq. (4.12) is a square matrix with dimensions $K \times K$. If $\phi^T\phi$ is a non-singular matrix, the solution to Eq. (4.12) is given [120][253] as follows:

$$W^T = \phi^{\dagger} T, \qquad (4.13)$$

where $\phi^{\dagger}$ is a $K \times N$ matrix known as the pseudo-inverse of $\phi$:

$$\phi^{\dagger} \equiv (\phi^T\phi)^{-1}\phi^T. \qquad (4.14)$$

It is noted that, if the matrix $\phi^T\phi$ is singular, Eqs. (4.13) and (4.14) do not exist, i.e., there is no unique solution to Eq. (4.12). In this case the pseudo-inverse of $\phi$ is redefined as:

$$\phi^{\dagger} \equiv \lim_{\epsilon \to 0}\,(\phi^T\phi + \epsilon I)^{-1}\phi^T, \qquad (4.15)$$

where $I$ is the unit matrix and $\epsilon > 0$. It is clear that the limit always exists [22], and it can be used to minimize the MSE.
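A minimal sketch of this second training stage (hypothetical sizes and random data, assuming NumPy) solves Eq. (4.13) with a numerical pseudo-inverse and compares it with the regularized form of Eq. (4.15) for a small fixed $\epsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 100, 10, 3                # hypothetical sizes
Phi = rng.random((N, K))            # hidden-unit outputs phi_nj (fixed in the first stage)
T = rng.random((N, M))              # targets t_nm

# Eqs. (4.13)-(4.14): W^T = phi_dagger T, with phi_dagger = (phi^T phi)^{-1} phi^T
W_T = np.linalg.pinv(Phi) @ T       # pinv also covers the singular case

# Eq. (4.15): regularized form (phi^T phi + eps*I)^{-1} phi^T for a small eps > 0
eps = 1e-8
W_T_reg = np.linalg.solve(Phi.T @ Phi + eps * np.eye(K), Phi.T @ T)

print(np.allclose(W_T, W_T_reg, atol=1e-6))   # True when phi^T phi is well conditioned
```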

4.2.2 Comparison between the RBF and the MLP

Though both the RBF neural network [33][85][219][245] and the MLP neural network are powerful tools for function approximation, classification, and data mining tasks, there are differences in their performance and applications, which arise from the following characteristics:

1. Activation function [22].

In RBF neural networks, the activation of a hidden unit is determined by the transformed distance between the input pattern and the center of the hidden unit. The transformation of the distance is by an activation function with a localized nature, such as the Gaussian kernel function. The activations of hidden units of MLP neural networks depend on weighted linear summations of the inputs transformed by activation functions, such as the sigmoidal function and the hyperbolic tangent function, which are not local in nature.

2. Data space partition.

An MLP neural network separates the data space by hyper-planes, while an RBF neural network generates hyper-spheres to partition the input space. This difference is a direct consequence of the difference in activation functions used in the RBF and MLP networks.

3. Training procedure.

The parameters of an MLP neural network are usually determined in a single training procedure. An RBF neural network's parameters are typically trained in several stages: the parameters of the kernel functions are trained first by unsupervised techniques, and the weights connecting the hidden layer and the output layer are determined at the second stage (a minimal two-stage sketch follows this list).

4. The weights.

All weights in an MLP neural network are usually adjustable. The weights between the input layer and the hidden layer in a typical RBF neural network are fixed as 1’s. The weights connecting hidden units and output units can be obtained by the linear least square (LLS) method.

5. The local minima problem.

The error function to be minimized may have numerous local minima in the parameter space. The weights between the hidden layer and the output layer in an RBF network can be determined by the LLS method, which does not suffer from local minima; however, the adjustment of the hidden-neuron parameters by clustering may still be trapped in local minima.
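To make points 3 and 4 concrete, the following sketch (a hypothetical two-class toy data set, assuming NumPy and scikit-learn's KMeans; the width heuristic is an illustrative choice, not the book's method) trains an RBF classifier in two stages: unsupervised k-means for the centers, then the pseudo-inverse method of Sect. 4.2.1 for the output weights.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical two-class toy data set; any labelled data would do.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
T = np.zeros((100, 2))              # one-of-M target coding
T[:50, 0] = 1.0
T[50:, 1] = 1.0

# Stage 1 (unsupervised): centers by k-means, a single width from a simple heuristic.
K = 6
centers = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X).cluster_centers_
width = np.mean([np.linalg.norm(c1 - c2) for c1 in centers for c2 in centers]) / 2

# Hidden layer: Gaussian kernel of the distance to each center (input weights fixed at 1).
def hidden(X):
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * width ** 2))

# Stage 2 (supervised): output weights by the linear least square / pseudo-inverse method.
Phi = hidden(X)
W_T = np.linalg.pinv(Phi) @ T

# Classification: pick the output unit with the largest response.
predictions = np.argmax(Phi @ W_T, axis=1)
print(np.mean(predictions == np.argmax(T, axis=1)))   # training accuracy
```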

