
Variants of Interest Point Detectors


Correspondences between model and scene image descriptors can only be established reliably if their interest points are detected accurately enough in both the scene and the training images. Therefore, interest point detector design is an important issue if reliable object recognition is to be achieved with this kind of algorithm.

For example, the invariance properties of the detector are especially important in the presence of illumination changes, object deformations, or changes of camera viewpoint.

A considerable number of alternatives to the DoG detector has been reported in the literature. Basically, they can be divided into two categories:

Corner-based detectors respond well to structured regions, but rely on the presence of sufficient gradient information.

Region-based detectors respond well to uniform regions and are also suited to regions with smooth brightness transitions.

7.3.1 Harris and Hessian-Based Detectors

A broadly used detector falling into the first category is based on a method reported by Harris and Stephens [9]. Its main idea is that the location of an interest point is well defined if there is a considerable brightness change in two directions, e.g., at a corner point of a rectangular structure.

Imagine a small rectangular window shifted over an image. In case the window is located on top of a corner point, the intensities of some pixels located within the window change considerably if the window is shifted by a small distance, regardless of the shift direction. Points with such changes can be detected with the help of the second moment matrix M consisting of the partial derivatives I_x and I_y of the image intensities (i.e., the gray value gradients in the x- and y-directions):

M = \begin{pmatrix} a & b \\ b & c \end{pmatrix} = \begin{pmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{pmatrix}

In order to reduce the sensitivity of the operator to noise, each matrix element is usually smoothed spatially by convolution with a Gaussian kernel: for example, the spatially smoothed value of matrix element a at pixel (x, y) is obtained by convolving the a's of the pixels in the vicinity of (x, y) with a Gaussian kernel. Corner points are indicated if the cornerness function f_c based on the Gaussian-smoothed second moment matrix M_G

f_c = \det(M_G) - k \cdot \operatorname{tr}(M_G)^2 = ac - b^2 - k \cdot (a + c)^2 \qquad (7.5)

attains a local maximum. Here, tr(·) denotes the trace and det(·) the determinant of the matrix M_G, and k denotes a regularization constant whose value has to be chosen empirically; in the literature, values around 0.1 have been reported to be a good choice.

The combined usage of the trace and the determinant has the advantage of making the detector insensitive to straight line edges.
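To make the computation concrete, the following Python sketch (using NumPy and SciPy, which are assumptions of this example rather than tools mentioned in the text) evaluates the cornerness function of Equation (7.5) for a gray value image. The Sobel gradient operator, the smoothing scale sigma, the constant k, and the response threshold are illustrative choices, not values prescribed here.

import numpy as np
from scipy.ndimage import gaussian_filter, sobel, maximum_filter

def harris_response(img, sigma=1.5, k=0.06):
    # Cornerness f_c = det(M_G) - k * tr(M_G)^2, cf. Equation (7.5).
    img = np.asarray(img, dtype=float)
    ix = sobel(img, axis=1)                  # gradient in x-direction (I_x)
    iy = sobel(img, axis=0)                  # gradient in y-direction (I_y)
    a = gaussian_filter(ix * ix, sigma)      # smoothed matrix element a = I_x^2
    b = gaussian_filter(ix * iy, sigma)      # smoothed matrix element b = I_x I_y
    c = gaussian_filter(iy * iy, sigma)      # smoothed matrix element c = I_y^2
    return a * c - b ** 2 - k * (a + c) ** 2

def harris_corners(img, threshold=1e6, sigma=1.5, k=0.06):
    # Corner points are local maxima of f_c; the threshold depends on the image
    # range and the gradient operator and has to be tuned for the application.
    fc = harris_response(img, sigma, k)
    is_max = (fc == maximum_filter(fc, size=3)) & (fc > threshold)
    return np.argwhere(is_max)               # (row, col) positions of corners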

Similar calculations can be done with the Hessian matrix H consisting of the second order derivatives of the image intensity function:

H = \begin{pmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{pmatrix} \qquad (7.6)

where I_{xx}, I_{xy}, and I_{yy} denote the second-order derivatives of the image intensity in the x- and y-directions.

Interest points are detected at locations where the determinant of H reaches a local maximum. In contrast to the Harris-based detector the Hessian-based detector responds to blob- and ridge-like structures.
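A corresponding Python sketch for the Hessian-based detector computes det(H) per pixel; taking finite differences on a Gaussian-smoothed image (scale sigma) is an assumption of this example rather than a prescription of the text.

import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant(img, sigma=2.0):
    # det(H) = I_xx * I_yy - I_xy^2, cf. Equation (7.6); interest points are
    # located at local maxima of this response.
    smoothed = gaussian_filter(np.asarray(img, dtype=float), sigma)
    iy, ix = np.gradient(smoothed)           # first derivatives (rows = y, cols = x)
    ixy, ixx = np.gradient(ix)               # second derivatives I_xy and I_xx
    iyy, _ = np.gradient(iy)                 # second derivative I_yy
    return ixx * iyy - ixy ** 2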

7.3.1.1 Rating

These two detectors have the advantage that they can be calculated rather fast, but, on the other hand, they determine neither scale nor orientation. In order to overcome this disadvantage, modifications have been suggested that incorporate invariance with respect to scale (often denoted as Harris–Laplace and Hessian–Laplace detector, respectively, as they are a combination of the Harris- or Hessian-based detector with a Laplacian of Gaussian (LoG) function for scale detection, cf. [16], for example) or even affine transformations (often referred to as Harris-affine and Hessian-affine detector, respectively; see the paper of Mikolajczyk and Schmid [22]

for details). The price for invariance, however, is a considerable speed loss.

7.3.2 The FAST Detector for Corners

Another detector for corner-like structures is the FAST detector (Features from Accelerated Segment Test) proposed by Rosten and Drummond [27]. The basic idea behind this approach is to reduce as much as possible the number of calculations necessary at each pixel in order to decide whether a keypoint is detected at that pixel. This is done by placing a circle consisting of 16 pixels centered at the pixel under investigation. For the corner test, only the gray value differences between each of the 16 circle pixels and the center pixel are evaluated, resulting in very fast computations (cf. Fig. 7.6).

Fig. 7.6 Demonstration of the application of the FAST detector for the dark center point of the zoomed region shown on the right. A circle is placed around the center point (marked red) and consists of 16 pixels (marked blue). For typical values of t, the cornerness criterion (Equation 7.7a) is fulfilled by all circle pixels except for the pixel on top of the center pixel

In the first step, a center pixel p is labeled as “corner” if there exist at least n consecutive “circle pixels” c which are all either at least t gray values brighter than p or, as a second possibility, all at least t gray values darker than p:

I_c \geq I_p + t \quad \text{for } n \text{ consecutive pixels} \qquad (7.7a)

or

I_c \leq I_p - t \quad \text{for } n \text{ consecutive pixels} \qquad (7.7b)

where I_c denotes the gray value of circle pixel c and I_p the gray value of the center pixel p, respectively.

After this step, a corner usually is indicated by a connected region of pixels where this condition holds and not, as desired, by a single pixel position. Therefore, a feature is detected by non-maximum suppression in a second step. To this end, a function value v is assigned to each “corner candidate pixel” found in the first step, e.g., the maximum value of n for which p is still a corner or the maximum value t for which p is still a corner. Each pixel with at least one adjacent pixel with higher v (8-neighborhood) is removed from the corner candidates.

The initial proposition was to choose n = 12, because with n = 12 additional speedup can be achieved by testing only the top, right, bottom, and left pixels of the circle. If p is a corner, the criterion defined above must hold for at least three of them; only then do all circle pixels have to be examined. It is shown in [27] that it is also possible to achieve similar speedup with other choices of n. However, n should not be chosen lower than n = 9, as for n ≤ 8 the detector responds to straight line edges as well.
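The segment test of the first step can be sketched in a few lines of Python (NumPy and an 8-bit gray value image assumed). The sketch omits the high-speed pre-test on the four compass pixels, the machine-learned decision tree described in [27], and the non-maximum suppression of the second step; the circle offsets are those of the commonly used discretized circle of radius 3.

import numpy as np

# Offsets (row, col) of the 16 circle pixels around the candidate pixel.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2), (3, 1),
          (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3), (-2, -2), (-3, -1)]

def is_fast_corner(img, r, c, t=20, n=9):
    # Candidate p is a corner if at least n consecutive circle pixels are all at
    # least t gray values brighter (Eq. 7.7a) or all darker (Eq. 7.7b) than p.
    p = int(img[r, c])
    vals = [int(img[r + dr, c + dc]) for dr, dc in CIRCLE]
    brighter = [v >= p + t for v in vals]
    darker = [v <= p - t for v in vals]
    for flags in (brighter, darker):
        flags = flags + flags[:n - 1]        # wrap around to catch circular runs
        run = 0
        for f in flags:
            run = run + 1 if f else 0
            if run >= n:
                return True
    return False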

7.3.2.1 Rating

Compared to the other corner detectors presented above, Rosten and Drummond [27] report FAST to be significantly faster (about 20 times faster than the Harris detector and about 50 times faster than the DoG detector of the SIFT scheme).

Surprisingly, tests of Rosten and Drummond with empirical data revealed that the reliability of keypoint detection of the FAST detector is equal to or even superior to that of other corner detectors in many situations.

On the other hand, FAST is more sensitive to noise (which stems from the fact that, for speed reasons, the number of pixels evaluated at each position is reduced) and provides neither scale nor rotation information for the descriptor calculation.

7.3.3 Maximally Stable Extremal Regions (MSER)

The maximally stable extremal region (MSER) detector described by Matas et al. [20] is a further example of a detector for blob-like structures. Its algorithmic principle is based on thresholding the image with a variable brightness threshold. Imagine a binarization of a scene image depending on a gray value threshold t. All pixels with a gray value below t are set to zero/black in the thresholded image; all pixels with a gray value equal to or above t are set to one/bright. Starting from t = 0 the

threshold is increased successively. In the beginning the thresholded image is completely bright. As t increases, black areas appear in the binarized image, which grow and finally merge together. Some black areas remain stable for a large range of t. These are the MSER regions, revealing a position (e.g., the center point) as well as a characteristic scale derived from the region size as input data for the region descriptor calculation. Altogether, all regions of the scene image which are significantly darker than their surroundings are detected. Inverting the image and repeating the procedure with the inverted image reveals the characteristic bright regions, respectively.
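The threshold-sweep principle can be illustrated with a small Python sketch (NumPy/SciPy assumed). It merely binarizes the image for a series of thresholds and reports the sizes of the dark connected components; a real MSER implementation instead tracks the components efficiently over all thresholds (e.g., with a union-find/component-tree structure) and keeps those whose area changes least over a large range of t.

import numpy as np
from scipy.ndimage import label

def dark_component_areas(img, step=16):
    # Sweep the binarization threshold t and watch the dark regions appear,
    # grow, and merge; areas that stay stable over many values of t hint at
    # maximally stable extremal regions.
    img = np.asarray(img)
    for t in range(0, 256, step):
        dark = img < t                                    # pixels set to "black"
        labels, n = label(dark)                           # connected dark components
        areas = np.sort(np.bincount(labels.ravel())[1:])[::-1]
        print(f"t={t:3d}: {n} dark components, largest areas: {areas[:5]}")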

7.3.3.1 Rating

In contrast to the regions found by many other detectors, the MSER regions are of arbitrary shape, but they can be approximated by an ellipse for descriptor calculation. The MSER detector reveals rather few regions, but their detection is very stable. Additionally, the MSER detector is invariant with respect to affine transformations, which makes it suitable for applications which have to deal with viewpoint changes.

7.3.4 Comparison of the Detectors

The well-designed SIFT method often serves as a benchmark for performance evaluation of region descriptor-based object recognition. As far as detector performance is concerned, Mikolajczyk et al. [24] reported the results of a detailed empirical evaluation of the performance of several region detectors (Mikolajczyk also maintains a website giving detailed information about his research relating to region detectors as well as region descriptors3). They evaluated the repeatability rate of the detectors for pairs of images, i.e., the percentage of detected interest regions which exhibit “sufficient” spatial overlap between the two images of an image pair. The repeatability rate is determined for different kinds of image modifications, e.g., JPEG compression artefacts, viewpoint or scale changes, etc., and different scene types (structured or textured scenes); see [24] for details.
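As a rough, hypothetical illustration of the repeatability criterion, the following Python function counts correspondences between two sets of circular regions (x, y, r) that are assumed to be mapped into a common reference frame already. The actual protocol of [24] works with elliptical regions, a ground-truth homography, and an area-overlap error threshold, so the matching rule used here is a simplification.

import math

def repeatability(regions_a, regions_b, max_center_dist=2.0, max_scale_ratio=1.5):
    # Fraction of regions that find a counterpart with similar position and size.
    matched = 0
    for (xa, ya, ra) in regions_a:
        for (xb, yb, rb) in regions_b:
            close = math.hypot(xa - xb, ya - yb) <= max_center_dist
            similar = max(ra, rb) / min(ra, rb) <= max_scale_ratio
            if close and similar:
                matched += 1
                break
    return matched / min(len(regions_a), len(regions_b))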

As a result, there was no detector that clearly outperformed all others for all variations or scene types. In many cases, but by far not all, the MSER detector achieved the best results, followed by the Hessian-affine detector. There were considerable differences between the detectors as far as the number of detected regions as well as their detected size is concerned. Furthermore, different detectors respond to different region types (e.g., highly structured regions or regions with rather uniform gray values).

This gives evidence to the claim that different detectors should be used in parallel in order to achieve the best performance of the overall object recognition scheme: the complementary properties of different detectors increase the suitability for different object types.

3 http://www.robots.ox.ac.uk/vgg/research/affine/index.html (link active 13 January 2010).

Another aspect is invariance: some of the detectors are invariant to more kinds of transformations than others. For example, the MSER detector is invariant to affine transformations. Compared to that, the FAST detector is only rotation invariant.

While the enhanced invariance of MSER offers advantages in situations where the objects to be detected actually have undergone an affine projection, it is often not advisable to use detectors featuring more invariance than actually needed.
