
Generalizing the Object and Pattern Detection Approach

3.1 Overview of the General Object and Pattern Detection Approach

3.1.3 Learning a Similarity Measure between New Patterns and the Distribution-based Target Model

The final task in designing a pattern identification procedure is to derive a suitable similarity measure for comparing new image patterns with the distribution-based target model.

We approach this problem by: (1) defining a set of distance measurements that coarsely locates each test pattern with respect to the distribution-based target model in the chosen feature space; and (2) learning from examples an approximation function that combines the proposed distance measurements into an empirical similarity measure for matching test patterns against the target pattern class. During detection, one simply computes and thresholds this similarity measure to determine whether a new test pattern belongs to the target pattern class. One can view the empirical similarity function as a class identity predictor for input patterns. From a probabilistic standpoint, the similarity measure can also be interpreted as a conditional probability density, P(Class(x) = Target | x), where x is the input test pattern.

We describe first the distance measurements we use for comparing test patterns against our Gaussian mixture target model. Ideally, we want a set of measurements that clearly reflects the test pattern's location relative to the target pattern distribution as represented by the Gaussian mixture model. Such a set of measurements can serve as a highly discriminative set of input features for distinguishing between target and distractor image patterns.

The actual set of measurements we use is a vector of distances between the input test pattern and all the Gaussian model's cluster centroids in the chosen feature space. One way of interpreting the vector of distances is to treat each distance measurement as the test pattern's actual displacement from some reference location along the target pattern distribution. The entire set of distances can thus be viewed as a crude model-centered reference system that encodes an overall "difference" notion between the test pattern and the entire target pattern class.

Our approach uses a distribution-dependent 2-value metric to represent individual distances between test patterns and each model centroid. For each Gaussian cluster in our mixture model, the first distance component is a directionally dependent Mahalanobis distance between the test pattern and the cluster centroid, in a vector sub-space spanned by the cluster's larger eigenvectors. This component computes a normalized pattern difference along the main elongation directions of the local data distribution represented by the current cluster. The normalized distance measure penalizes pattern differences less severely along the local data distribution, and more heavily against it. This results in a distance value that better reflects the notion of "difference" between a test pattern and the local target pattern class. The second distance component is a standard Euclidean distance between the test pattern and its projection in the sub-space of larger eigenvectors. This distance component is a robust measure that accounts for pattern differences in the smaller eigenvector directions, not captured by the first distance component.

We use a directionally independent distance measure for the second component because we believe the eigenvectors spanning this orthogonal sub-space may have significantly inaccurate eigenvalues. Normalizing pattern differences in this sub-space can therefore lead to meaningless if not adverse results. Our full set of measurements between each test pattern and the target model is thus a vector of several 2-value distances, whose dimensionality is twice the number of Gaussian clusters in our target model.
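To make the 2-value metric concrete, the following NumPy sketch computes both components for one cluster and stacks them over all clusters. The function names, and the choice of subspace dimension `k`, are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def two_value_distance(x, mu, cov, k):
    """2-value distance between test pattern x and one Gaussian cluster
    (centroid mu, covariance cov).

    D1: Mahalanobis distance in the sub-space of the k largest eigenvectors.
    D2: Euclidean distance from x to its projection onto that sub-space,
        i.e. the unnormalized residual in the smaller eigenvector directions.
    """
    # Eigen-decomposition of the cluster covariance, sorted largest-first.
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    d = x - mu
    # Coordinates of the difference vector in the top-k eigenvector basis.
    coords = vecs[:, :k].T @ d
    # D1: variance-normalized (Mahalanobis) distance within the sub-space.
    d1 = np.sqrt(np.sum(coords ** 2 / vals[:k]))
    # D2: Euclidean length of the residual outside the top-k directions.
    residual = d - vecs[:, :k] @ coords
    d2 = np.linalg.norm(residual)
    return d1, d2

def distance_features(x, clusters, k):
    """Stack the 2-value distances to every cluster: 2*K features total."""
    feats = []
    for mu, cov in clusters:
        feats.extend(two_value_distance(x, mu, cov, k))
    return np.array(feats)
```

For a target model with K clusters, `distance_features` returns the 2K-dimensional measurement vector described above, ready to feed to the similarity function.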

Given a set of 2-value distance measurements as input features, our approach uses a trained multi-layer perceptron net to implement a similarity function for comparing new test patterns against the target pattern class. We train the multi-layer perceptron net on distance measurements from a comprehensive but tractable set of target and distractor image patterns to perform the desired similarity computation. During training, we teach the net to output a similarity value of "1" for input distance measurements arising from target patterns, and "0" otherwise. During detection, the net outputs a continuous similarity value in the [0, 1] range, which can be interpreted as the probability of the test pattern belonging to the target class. One of the most critical issues in any non-trivial learning task is the problem of obtaining high-quality example patterns for training. In Chapter 4, we shall examine in greater detail some principled techniques of selecting only useful training examples from among redundant ones to keep the learning task tractable.
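The training scheme just described can be sketched with a small NumPy multi-layer perceptron fitted to synthetic distance measurements. The network size, learning rate, iteration count, data ranges, and the cross-entropy output gradient are all our illustrative assumptions, not values taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2*K-dimensional distance measurements (K = 3 clusters here):
# target patterns lie close to the model (small distances), distractors far.
K, n = 3, 200
targets = rng.uniform(0.0, 1.0, size=(n, 2 * K))
distractors = rng.uniform(2.0, 4.0, size=(n, 2 * K))
X = np.vstack([targets, distractors])
y = np.concatenate([np.ones(n), np.zeros(n)])  # teach "1" target, "0" otherwise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of sigmoid units, one sigmoid output unit.
H = 8
W1 = rng.normal(scale=0.5, size=(2 * K, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, 1));     b2 = np.zeros(1)

lr = 0.5
for _ in range(2000):
    h = sigmoid(X @ W1 + b1)               # hidden activations
    out = sigmoid(h @ W2 + b2).ravel()     # similarity value in [0, 1]
    g_out = (out - y)[:, None]             # cross-entropy output gradient
    g_h = (g_out @ W2.T) * h * (1.0 - h)   # backpropagate to hidden layer
    W2 -= lr * h.T @ g_out / len(X); b2 -= lr * g_out.mean(axis=0)
    W1 -= lr * X.T @ g_h / len(X);   b1 -= lr * g_h.mean(axis=0)

# Detection: threshold the net's continuous similarity output at 0.5.
similarity = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel()
accuracy = ((similarity > 0.5).astype(float) == y).mean()
```

The continuous `similarity` output plays the role of the class-conditional probability estimate described above; thresholding it yields the target/non-target decision.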

In closing, we briefly discuss why we believe multi-layer perceptron nets are suitable, as an approximation function class, for combining our proposed intermediate distance measurements into an empirical indicator function for identifying target patterns. Multi-layer perceptron nets are non-local approximators that partition the input space with hyperplanes into regions of different output classes. We have argued that one can view the intermediate distance measurements we compute as a new model-centered co-ordinate system, where each axis represents the test pattern's distance from one of the model's several Gaussian clusters. In this new model-centered co-ordinate system, one can expect most target and background image patterns to occupy very distinctive regions that can be well separated by a small number of multi-layer perceptron hyperplanes. This is because in the original distribution-based modeling feature space, most target patterns tend to be located near the "positive" model clusters, which, in our new model-centered co-ordinate system, corresponds to a characteristic set of small distance values along certain axes. Most background patterns, on the other hand, tend to be located either near the "negative" clusters, or far away from the entire Gaussian model, which, in the new co-ordinate system, corresponds to a very different characteristic set of distance values. In fact, we shall see in the next section that for our human face detection example, even a single perceptron hyperplane does an almost perfect job at partitioning the space of model-centered distance measurements into separate target and background pattern regions.
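The single-hyperplane separability argument can be illustrated with a toy experiment: synthetic model-centered distance features in which target patterns have uniformly small distances and background patterns do not, fit by one perceptron unit with the classic perceptron update rule. The data generation and training settings are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic model-centered distance features: targets sit near the model
# (small distance values on every axis), background patterns far from it.
n, d = 300, 6
targets = rng.uniform(0.0, 1.0, size=(n, d))
background = rng.uniform(2.0, 4.0, size=(n, d))
X = np.vstack([targets, background])
y = np.concatenate([np.ones(n), np.zeros(n)])

# A single perceptron unit: one hyperplane w.x + b = 0 in distance space.
w = np.zeros(d)
b = 0.0
for _ in range(20):                      # a few passes suffice: large margin
    for xi, yi in zip(X, y):
        pred = 1.0 if xi @ w + b > 0 else 0.0
        w += (yi - pred) * xi            # classic perceptron update
        b += (yi - pred)

accuracy = (((X @ w + b) > 0).astype(float) == y).mean()
```

Because the two classes occupy well-separated regions of the distance space, the perceptron converges to a perfect (or near-perfect) linear partition, mirroring the single-hyperplane result cited for the face detection example.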