Randomized LaTeX Symbols - Further Analysis of a Detection

8.5.2 Randomized LaTeX Symbols

Displays with randomized deformations of LaTeX symbols are useful for investigating properties of the sparse models. The background has objects with many similarities to the one being detected, making the problem of particular interest. We use 32 randomly perturbed versions of the prototype, as shown in figure 8.10, for training.

The perturbations consist of a random rotation of up to 15° and a scaling of ±25% applied independently in each coordinate. The 20 local features identified in the training step are given in the bottom panel of figure 8.10. The reference grid is approximately 30×30.
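As a minimal sketch of how such a training set of perturbations could be drawn, the following generates 32 random linear maps combining a rotation of up to 15° with independent ±25% scalings of each coordinate. The uniform sampling, the order of composition (scaling first, then rotation about the origin), and the names `random_perturbation` and `apply_map` are assumptions made for illustration; the text does not specify them.

```python
import math
import random


def random_perturbation(rng, max_rot_deg=15.0, max_scale_dev=0.25):
    """Return a 2x2 matrix R(theta) * diag(sx, sy) as nested tuples.

    Assumption: angle and scale factors are sampled uniformly; the text only
    gives the ranges (up to 15 degrees, +/-25% per coordinate).
    """
    theta = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    sx = 1.0 + rng.uniform(-max_scale_dev, max_scale_dev)
    sy = 1.0 + rng.uniform(-max_scale_dev, max_scale_dev)
    c, s = math.cos(theta), math.sin(theta)
    # Scale each coordinate independently, then rotate.
    return ((c * sx, -s * sy),
            (s * sx, c * sy))


def apply_map(matrix, point):
    """Apply a 2x2 matrix to an (x, y) point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
training_maps = [random_perturbation(rng) for _ in range(32)]

# Example: where one corner of a 30x30 reference grid lands under the first map.
corner = apply_map(training_maps[0], (30.0, 30.0))
```

Each map would then be applied to the pixel coordinates of the prototype to produce one perturbed training image; the resampling of the image itself is omitted here.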

169 8.5 Examples

Figure 8.10 (Top) Thirty-two randomly perturbed E's used for training. (Bottom) The 20 features of the model represented on the reference grid. The three reference points are shown in black.

Here it is possible to produce a detector from the prototype alone by simply picking some existing arrangement of edges extracted from the prototype in each region of the reference grid. Such a procedure may pick an edge or a particular arrangement that is very unstable and present only in the particular image of the prototype. Producing a small randomly perturbed training set, as above, avoids this: unstable arrangements are eliminated.
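The idea of discarding arrangements that appear only in the prototype can be illustrated with a simple frequency filter. This is a hedged sketch, not the training procedure of the chapter: the representation of each training image as a set of `(region, feature)` pairs, the function name `stable_features`, and the threshold `min_fraction` are all hypothetical.

```python
from collections import Counter


def stable_features(training_sets, min_fraction=0.5, max_features=20):
    """Keep, per grid region, the feature seen most often across training images.

    training_sets: one set of (region, feature) pairs per training image,
    assumed to be already registered to the reference grid (hypothetical format).
    """
    n_images = len(training_sets)
    counts = Counter()
    for present in training_sets:
        counts.update(set(present))  # count each pair at most once per image

    best = {}
    for (region, feature), k in sorted(counts.items()):
        fraction = k / n_images
        if fraction < min_fraction:
            continue  # unstable: present in too few perturbed images
        if region not in best or fraction > best[region][1]:
            best[region] = (feature, fraction)

    ranked = sorted(best.items(), key=lambda item: (-item[1][1], item[0]))
    return [(region, feature, fraction)
            for region, (feature, fraction) in ranked[:max_features]]


# Toy data: "edge_b" and "edge_c" each occur in only one image.
training = [
    {((0, 0), "edge_a"), ((0, 1), "edge_b")},
    {((0, 0), "edge_a")},
    {((0, 0), "edge_a"), ((1, 1), "edge_c")},
    {((0, 0), "edge_a")},
]
selected = stable_features(training)
```

In this toy data a detector built from the first image alone could have picked `edge_b`, whereas the filter retains only `edge_a`, which survives every perturbation.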

Some detections in randomly generated scenes containing a large number of randomly deformed LaTeX symbols are shown in figure 8.11. Again, these are obtained from six different resolutions. There are at most three instances of an E in each image. The top panels show the outcome of steps I and II; the bottom panels show the outcome of running the classifier of object versus false positives, based on registered edges. In the bottom panels, only the triangle vertices are shown, to enhance the view of the underlying symbols. The use of a classifier among different object classes, detected by the same model, is discussed in chapter 10. The top row of figure 8.12 shows a deformable contour initialized with the scale and location of the detection, followed by the final result of the deformable contour algorithm. The bottom row shows a similar experiment with the deformable curve algorithm.

Very similar results are obtained for any of the symbols in the database. However, certain symbols are more “generic” in their shape and share many features with a substantial number of other symbols. The best example is the digit 0, which has much in common with many round symbols. The representations trained for these symbols usually produce more false positives. Although this is a drawback from the point of view of accurate detection, we turn this vice into a virtue in chapter 10, where we discuss ways to integrate detection and recognition.

Figure 8.11 (Top) Three examples of detections of the E model, steps I and II. Detection triangles are shown. (Bottom) Results after applying the classifier of object vs. false positives. Triangles are shown as three points for better viewing.

The random LaTeX displays are artificial and do not represent all the complexities of real scenes. However, in many respects these scenes force us to deal with some important issues such as clutter, confusing classes, and detection at a variety of poses.

The background clutter in these images contains many components and parts of the detected object itself. This reinforces the point that detection cannot rely on individual local features and forces us to think about detecting configurations. The existence of objects very similar to the one we want to detect forces us to deal with recognition as well as detection. It seems there is still much to be explored in this controlled synthetic context.

Figure 8.12 (Top row) A deformable contour algorithm initialized by a detection in the middle panel of figure 8.11. On the left is the initial contour; on the right, the final contour. (Bottom row) The result of the deformable curve algorithm initialized the same way.

8.5.3 3D Object—Range of Viewing Angles

Figure 8.13 A sample of the synthetically generated training images for the clip, and the 20 features identified in training.

Detecting a rigid 3D object from multiple views is a complex problem. Some of the original work in the field attempted to construct 3D models and match them to data. Many of these models are described in Grimson (1990) and Haralick and Shapiro (1992). We take the view-based approach, in which significantly different views of the object are considered as different 2D objects that are linked together symbolically, much as if in the LaTeX scene we set out to detect a subset of symbols as opposed to a particular one. Thus detection involves a sequence of searches for each of these possible objects or views.
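The "sequence of searches" over views can be sketched as a loop that runs one 2D detector per view and pools the labeled results. The function `detect_object`, the toy character-grid "image", and `make_symbol_detector` are hypothetical stand-ins chosen for illustration; a real view detector would be a sparse model of the kind described in this chapter.

```python
def detect_object(image, view_detectors):
    """Run each view's 2D detector in turn and pool the detections.

    view_detectors: list of (view_name, detector) pairs, where detector(image)
    returns a list of locations (hypothetical interface).
    """
    detections = []
    for view_name, detector in view_detectors:
        for location in detector(image):
            detections.append((view_name, location))
    return detections


def make_symbol_detector(symbol):
    """Toy stand-in for a view detector: finds a character in a text grid."""
    def detector(image):
        return [(r, c)
                for r, row in enumerate(image)
                for c, ch in enumerate(row)
                if ch == symbol]
    return detector


# Toy scene: detect a subset of symbols ("E" and "F") rather than a single one.
scene = ["a.E.",
         "..b.",
         "F..E"]
views = [("E", make_symbol_detector("E")), ("F", make_symbol_detector("F"))]
found = detect_object(scene, views)
```

The cost of this scheme grows linearly with the number of views, since each view requires its own search.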

Here we focus on developing a sparse model to detect an object from a limited range of views. The problem then reduces to the detection of a nonrigid two-dimensional object and can be addressed with the tools we have developed. The object model is trained in the same way as the LaTeX symbols, using only one prototype image, which is deformed 32 times with random affine transformations. A sample of these deformed images is shown in figure 8.13, together with the model, which is constructed of 20 local features of complexity n_r = 3 on a reference grid of approximately 40×40. Note that the local features describe a rectangle of a certain aspect ratio to which two additional wirelike rectangular structures are attached, one on the top and one on the bottom.

The outcomes of the counting detector are shown on images in figure 8.14. No final classifier has been implemented. The algorithm is run at the six resolutions. The detections appear to be invariant to changes in viewing angle around the vertical axis, clutter, occlusion, variable lighting, and articulation of the handles, despite the fact that no such examples were shown in training. Here we see the advantage of treating a rigid object in the same way as deformable objects. In the discussion section and in chapter 10, we offer some preliminary ideas on the concept of detecting 3D objects as unions of their 2D views. How can this be integrated into a more general scheme of object detection and recognition, and how can we avoid the combinatorial explosion of representing multiple objects at multiple views?

Figure 8.14 Eight examples of detections of steps I and II combined.

From the document 2D Object Detection and Recognition (pages 187-192)