
In the classical RBF system, we used a clustering technique that computes Gaussian kernels irrespective of the exact recognition task to be solved. One can view the clustering operation as building a separate distribution-based model for each digit class using spherical Gaussian clusters. The RBF digit recognizers classify new digit patterns by determining how "similar" they are to each of the 10 digit manifolds, based on distance measurements to the Gaussian kernels. In this second system, we build a similar Gaussian RBF-based 10-class digit recognizer using a different initialization technique, called the support vector algorithm [26], which concentrates Gaussian kernels at feature space locations critical for the recognition task at hand. The support vector algorithm is a general procedure that sieves through example databases for useful data subsets relevant to a given learning task. The algorithm works for many different learning machine architectures, and the resulting data subsets (i.e. the support vector sets) for different architectures are often almost identical. Interestingly, for RBF networks, the support vector sets also serve well as locations for Gaussian centers. We shall only briefly describe the support vector algorithm, with particular emphasis on its role as a mechanism for defining and locating Gaussian kernels in RBF networks. The interested reader should refer to the following papers for further details: [102] [14] [26].

Digit Recognizer      0    1    2    3    4    5    6    7    8    9
# Support Vectors   274  104  377  361  334  388  236  235  342  263

Table 3.4: Number of support vectors for each digit recognizer. Notice that for each digit recognizer, the support vector set contains both positive and negative example patterns, i.e. patterns from within and outside the target class. The same digit pattern can be a support vector for two or more recognizers. Table 3.3 shows the number of distinct patterns from each digit class selected as support vectors.
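The idea of reusing a trained machine's support vector set as the Gaussian centers of an RBF network can be sketched as follows. This is a minimal illustration, not the thesis's implementation: scikit-learn's `SVC` and its small digits dataset stand in for the support vector algorithm and the USPS database, and the digit class, variance value, and `rbf_responses` helper are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
target = (y == 3).astype(int)           # one-vs-rest: digit "3" vs. the other nine

sigma2 = 38.4                           # fixed, user-specified Gaussian variance
svm = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma2))
svm.fit(X, target)

# The support vector set doubles as the set of Gaussian kernel centers.
centers = svm.support_vectors_          # shape: (n_support_vectors, n_features)

def rbf_responses(x, centers, sigma2):
    """Gaussian kernel responses of pattern x against each selected center."""
    d2 = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma2))

responses = rbf_responses(X[0], centers, sigma2)
print(centers.shape, responses.shape)
```

Note that the support vectors include both positive and negative example patterns, so the resulting centers sit on both sides of the decision surface rather than inside a single class cluster.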

The support vector algorithm is based in part on the idea of structural risk minimization [102], whose motivation can be summarized as follows: In example-based function-approximation learning, the goal is to synthesize an approximation function that (1) maps input examples onto their respective output values, and (2) reasonably predicts output values at input locations where no examples are available. This second property is commonly known as the learner's generalization ability. Together, one can quantify the above two constraints in terms of a risk measure that depends on the number of training examples and the VC-dimension [100] [101] [1] (i.e. complexity) of the approximation function class.
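For concreteness, a standard textbook statement of such a risk measure (following Vapnik's formulation, not reproduced from this thesis) bounds the true risk by the empirical error plus a capacity term: with probability at least $1 - \eta$, for a class of indicator functions of VC-dimension $h$ trained on $l$ examples,

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  \;+\; \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
```

so that, at a fixed level of training error, a lower VC-dimension $h$ (a simpler function class) tightens the guarantee on the true risk.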

We refer the reader to [80] for a more detailed and mathematical treatment of structural risk minimization and function-approximation learning.

When available training data is limited, one must constrain the learning machine's structural complexity in order to minimize risk and generalize reasonably. Structural risk minimization chooses the function of "optimal" complexity from an approximation function class so that the resulting risk is minimal. The support vector algorithm essentially performs structural risk minimization on an approximation function class whose structure is a set of hyperplanes. For spherical Gaussian RBF networks, the algorithm minimizes risk by determining the number of Gaussian kernels that leads to best generalization. In our current RBF support vector algorithm formulation, we deal with a structure in which all Gaussian kernels must have the same fixed user-specified variance.

We use the support vector algorithm to construct 10 RBF-based single-digit recognizers with fixed Gaussian variances of σ² = 38.4, each trained to separate a given digit from the other 9. We experimented with several σ² values and chose the setting with the best recognition result on the USPS test database. For each single-digit recognizer, the support vector algorithm selects a set of positive and negative example digit patterns from the USPS training database as Gaussian kernel centers. Table 3.4 shows the number of support vectors selected for each recognizer. Notice that the same digit pattern can be chosen as a support vector for two or more digit recognizers. In Table 3.3, we show the number of distinct patterns from each digit class that have been selected as support vectors. We use these figures in the first classical RBF system as an appropriate number of Gaussian kernels for each digit class.

                              Classification Error Rate
USPS Database               Original   Classical RBF   Support Vector RBF
Training (7291 patterns)       0.33%           1.73%                0.01%
Test (2007 patterns)           5.33%           6.73%                4.88%

Table 3.5: 10-class digit recognition error rates for three different system architectures. The first system is based on the pattern identification framework within our proposed object and pattern class detection approach. The other two are the Gaussian RBF-based systems we trained, one with a classical RBF algorithm and the second with the support vector algorithm. The test results show that our proposed pattern identification framework compares reasonably well against classical digit recognition architectures, hence suggesting that it is indeed general enough to even model and capture pattern variations in problem domains that are essentially pattern recognition in spirit.
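The construction of the 10 one-vs-rest recognizers, and the counting behind Tables 3.3 and 3.4, can be sketched as follows. Again this is an illustrative stand-in, not the thesis's code: scikit-learn's `SVC` replaces the support vector algorithm, its digits dataset replaces the USPS training database, and the support vector counts it produces will differ from the thesis's figures.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
gamma = 1.0 / (2.0 * 38.4)              # fixed variance sigma^2 = 38.4

# One support vector set per one-vs-rest single-digit recognizer.
support_idx = {}
for digit in range(10):
    svm = SVC(kernel="rbf", gamma=gamma)
    svm.fit(X, (y == digit).astype(int))
    support_idx[digit] = set(svm.support_)   # training-set indices of the SVs

# Per-recognizer counts (cf. Table 3.4). The same pattern may serve several
# recognizers, so the number of distinct support patterns (cf. Table 3.3)
# is at most the sum of the per-recognizer counts.
counts = {d: len(s) for d, s in support_idx.items()}
distinct = set().union(*support_idx.values())
print(counts)
print(len(distinct))
```

The distinct-pattern counts are what the classical RBF system above reuses as its number of Gaussian kernels per digit class.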

3.5.4 Results

We ran our original 10-class digit recognizer and the two spherical Gaussian RBF-based systems described above on the USPS test digit database. For each test pattern, the arbitration procedure in all three systems simply returns the digit class whose recognizer gives the strongest response. Table 3.5 shows the 10-class digit recognition error rates for our original system and the two RBF-based systems. The results should also be compared with values achieved on the same test database by a five-layer multi-layer perceptron net, 5.1% [29], a two-layer multi-layer perceptron net, 5.9%, and human performance, 2.5% [16].
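The arbitration step is a simple argmax over the 10 recognizer responses. A minimal sketch (the `arbitrate` name and the response values are illustrative, not taken from the thesis):

```python
import numpy as np

def arbitrate(scores):
    """Return the digit class whose single-digit recognizer responds most strongly.

    scores: array of shape (10,), one response per one-vs-rest recognizer.
    """
    return int(np.argmax(scores))

# Hypothetical responses for one test pattern: recognizer 7 is most confident.
responses = np.array([-1.2, -0.8, -1.5, -0.3, -2.0, -1.1, -0.9, 0.6, -0.4, -1.7])
print(arbitrate(responses))   # -> 7
```

Because all three systems share this arbitration rule, differences in the error rates of Table 3.5 reflect the underlying recognizers rather than the combination strategy.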

On the whole, the test results show that our proposed pattern identification framework compares reasonably well against classical digit recognition architectures in classifying digit patterns, hence suggesting that our framework is indeed general enough to even model and capture pattern variations in problem domains that are essentially pattern recognition in spirit.

Chapter 4