
Fundamentals of Machine Vision

7.2.2 Intermediate-Level Vision

In the previous subsection we considered how low-level processing could be achieved with the use of local operations. In this subsection we show how the information gleaned during low-level processing can be used to extract global image structures. This is the domain of intermediate-level vision. In particular, features detected by low-level template matching operations are grouped together to demonstrate the presence of structures that reflect the existence of objects in the scene [Davies, 1997].

7.2.2.1 Boundary Pattern Analysis

One of the most basic intermediate-level vision techniques is that of tracking around the boundaries of objects, following edge segments that are detected by local operators. Once a connected boundary has been identified, the centroid of the object can be computed, and the boundary can be mapped into a one-dimensional polar (r, θ) representation called a centroidal profile (see Figure 7.3(a)). An examination of this profile allows for straightforward identification of circles, as these have constant values of r over the complete range of θ. Square objects are slightly more complex to identify because they have four corners that are quite clear from the profile, but the straight sides of the square map into sec θ curves, which can only be confirmed to represent straight lines by careful analysis (see Figure 7.3(b)). However, measurement of the size and orientation of the square is a simple process. Other shapes, such as rectangles and ellipses, are only a little more difficult to recognize, but it is not profitable to explore the situation further here [Davies, 1997].
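As a concrete illustration of the profile construction described above, here is a minimal Python sketch. It assumes the object boundary has already been traced into an ordered list of (x, y) points by a low-level edge follower; the function name, the angular binning scheme, and the use of NumPy are illustrative choices, not part of the handbook's own treatment.

```python
import numpy as np

def centroidal_profile(boundary, n_bins=360):
    """Map a closed object boundary to a 1-D (r, theta) centroidal profile.

    boundary : (N, 2) array of (x, y) points traced around the object.
    Returns the theta bin centres (radians) and the mean radius in each bin.
    """
    boundary = np.asarray(boundary, dtype=float)
    centroid = boundary.mean(axis=0)               # reference point for all radii
    dx, dy = (boundary - centroid).T
    r = np.hypot(dx, dy)                           # radial distance of each point
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)  # angle in [0, 2*pi)

    # Average r within each angular bin to obtain a regularly sampled profile.
    bins = np.linspace(0.0, 2 * np.pi, n_bins + 1)
    idx = np.digitize(theta, bins) - 1
    profile = np.full(n_bins, np.nan)
    for k in range(n_bins):
        sel = idx == k
        if sel.any():
            profile[k] = r[sel].mean()
    return 0.5 * (bins[:-1] + bins[1:]), profile
```

With this sketch, a circle yields an essentially flat profile, while each side of a square produces a sec-shaped arc whose peaks correspond to the corners, matching the behaviour shown in Figure 7.3.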

Unfortunately, once objects are distorted by breakage or partial occlusion, or even by one object merely touching another, the situation becomes considerably more complex. Not only do the centroidal profiles change shape in obvious ways to match the shapes of the objects, but they also become distorted because the centroid—which is the reference point from which all boundary measurements are made—becomes shifted (see Figure 7.3(c)). At this point the centroidal profiles become difficult to interpret, and the simplicity of the technique is lost; it must be regarded as nonrobust. For this reason recourse must be made to techniques, such as the Hough transform, that are intrinsically robust.

FIGURE 7.3 Centroidal profile and its problems: (a) circle; (b) square; and (c) broken square, together with their centroidal profiles. In (c), it is difficult to get any useful information from the centroidal profile.

7.2.2.2 The Hough Transform Approach

The Hough transform [Hough, 1962] is more robust than many more basic techniques because it concentrates on searching for evidence about the existence of objects and ignores any data that do not support this evidence. For instance, when searching for circles, it aims to accumulate evidence about them by building up votes at potential circle center positions. To this end it examines each edge segment in the image, works out where the center of a circle of radius R would be if that edge segment were part of the circle, and accumulates a vote at that location in a separate image space called a parameter space. When all such votes have been included in the parameter space, the locations of any peaks are noted and taken as possible circle centers. Significant peaks are more likely to correspond to circle center locations than to random accumulations of data, but taking any one as a circle amounts to a hypothesis; in principle, such hypotheses need to be checked by reference to other data in the original image.

The Hough transform calculates the position of candidate center locations by moving a distance R along the edge normal direction from any given edge segment (see Figure 7.4). Thus, it is important to use an edge detector that is capable of giving accurate edge-orientation information (see Section 7.2.1.1). When the value of R is unknown, several values of R can be tried, and the solutions corresponding to the highest peaks in parameter space are the ones most likely to correspond to circle centers and to correct values of R.
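The voting scheme just described can be sketched in a few lines of Python. The sketch assumes that gradient magnitude and orientation images are already available (for instance from a Sobel operator); the array names, the threshold, and the two-sided voting are illustrative assumptions rather than the handbook's own notation.

```python
import numpy as np

def circle_hough(edge_mag, edge_dir, radius, mag_thresh=0.3):
    """Accumulate circle-centre votes a distance `radius` along each edge normal.

    edge_mag, edge_dir : gradient magnitude and orientation (radians) images.
    Returns the vote accumulator, the same size as the input image.
    """
    h, w = edge_mag.shape
    acc = np.zeros((h, w), dtype=np.int32)
    ys, xs = np.nonzero(edge_mag > mag_thresh * edge_mag.max())
    for y, x in zip(ys, xs):
        # The centre lies along the local gradient direction; vote on both
        # sides so that either contrast polarity is covered.
        for sign in (+1, -1):
            cx = int(round(x + sign * radius * np.cos(edge_dir[y, x])))
            cy = int(round(y + sign * radius * np.sin(edge_dir[y, x])))
            if 0 <= cx < w and 0 <= cy < h:
                acc[cy, cx] += 1
    return acc
```

The highest peaks in the accumulator are hypothesised circle centres; when the radius is unknown, the accumulation is simply repeated over a range of radius values, as noted above.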

The Hough transform approach can be used for locating other shapes such as ellipses or even general shapes. The necessary methodology is covered in texts such as Davies [1997]. Here, there is only space to cover one other application of the Hough transform—that used for detecting straight lines in digital images. This case is especially useful in the context of mobile robots (see Section 7.3.4).

FIGURE 7.4 Hough transform for circle. A circle is partly occluded by a rectangular object and is shown together with the Hough transform. Note that the straight sides of the rectangle lead to straight lines of votes indicated by the five dotted lines, which give low ridges rather than peaks; thus, the peak arising from the partial circle is readily detected.

To detect straight lines, all the edge segments in the image are located; then, an extended line is constructed through each edge segment Ei with the same orientation θi as Ei, and its distance ρi from the origin is calculated; next, a vote is accumulated in an abstract parameter space with coordinates (ρi, θi); finally, peaks are sought in this parameter space, and the peak coordinates are taken as those of likely lines (or hypotheses of lines) in the original image space. Again, in principle all hypotheses should be checked by reference to the original image data, but those corresponding to the highest peaks are most likely to represent valid lines in the image.
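A corresponding Python sketch for line detection follows. As in the circle case it assumes precomputed gradient magnitude and orientation images, and it takes the line orientation directly from the local edge direction, as described above, so that each edge point casts a single (ρ, θ) vote; the resolutions and threshold are illustrative assumptions.

```python
import numpy as np

def line_hough(edge_mag, edge_dir, rho_res=1.0, theta_res=np.pi / 180,
               mag_thresh=0.3):
    """Accumulate (rho, theta) line votes, one vote per strong edge point.

    The line through an edge point is taken to be normal to the local
    gradient, so theta comes directly from edge_dir and
        rho = x * cos(theta) + y * sin(theta).
    """
    h, w = edge_mag.shape
    rho_max = float(np.hypot(h, w))
    n_rho = int(2 * rho_max / rho_res) + 1
    n_theta = int(round(np.pi / theta_res))
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)

    ys, xs = np.nonzero(edge_mag > mag_thresh * edge_mag.max())
    for y, x in zip(ys, xs):
        theta = float(np.mod(edge_dir[y, x], np.pi))    # normal direction of the line
        rho = x * np.cos(theta) + y * np.sin(theta)     # signed distance from origin
        i = min(int(round((rho + rho_max) / rho_res)), n_rho - 1)
        j = min(int(theta / theta_res), n_theta - 1)
        acc[i, j] += 1
    return acc
```

Peaks in the accumulator give (ρ, θ) line hypotheses, each of which should in principle be verified against the original image data.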

7.2.2.3 The Graph-Matching Approach

The Hough transform is particularly suited to the robust location of objects from their edge features. When objects are to be located from sparse, widely separated point features such as corners or small holes, it has been common to use an alternative approach called graph matching. This involves matching the graph joining the point features in the real image against the graph representing the ideal object template. In fact, it is necessary to match subgraphs in each case because (a) some points may be obliterated from the image by damage or occlusion, and (b) other points may appear in the image because of noise or irrelevant background, or indeed objects other than the ones being extracted. The maximal clique graph-matching approach [Bolles and Cain, 1982] works by searching for the set of correspondences between the two graphs that forms the largest completely consistent subset; that is, all features of one subset correspond to all features of the other subset. This is checked by ensuring that all distances between feature points are identical in the two subsets. This is a highly rigorous technique: any inconsistency suggests that the evidence being collated is unreliable and so it does not represent a totally valid hypothesis; thus, only smaller pairs of subsets can be matched together exactly. Figure 7.5 shows some planar brackets that have been identified and accurately located by this means.
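A compact sketch of the distance-consistency test and the clique search is given below. It is a generic illustration of the idea rather than Bolles and Cain's own implementation; the tolerance parameter and the simple branch-and-bound enumeration are assumptions made for clarity.

```python
import itertools
import numpy as np

def maximal_clique_match(model_pts, image_pts, tol=2.0):
    """Match point features by a maximal-clique search over correspondences.

    A node is a (model index, image index) pairing; two nodes are compatible
    when the inter-point distance is the same (within tol) in both the model
    and the image. The largest mutually compatible set of pairings is returned.
    """
    model_pts = np.asarray(model_pts, float)
    image_pts = np.asarray(image_pts, float)
    nodes = list(itertools.product(range(len(model_pts)), range(len(image_pts))))

    def compatible(a, b):
        (i, j), (k, l) = a, b
        if i == k or j == l:                      # enforce one-to-one assignment
            return False
        dm = np.linalg.norm(model_pts[i] - model_pts[k])
        di = np.linalg.norm(image_pts[j] - image_pts[l])
        return abs(dm - di) < tol                 # distances must agree

    best = []

    def grow(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique[:]
        for n, cand in enumerate(candidates):
            rest = [c for c in candidates[n + 1:] if compatible(cand, c)]
            if len(clique) + 1 + len(rest) > len(best):   # simple pruning bound
                grow(clique + [cand], rest)

    grow([], nodes)
    return best   # list of (model index, image index) correspondences
```

The exhaustive nature of the search is what makes the method exact but expensive; the exponential growth in computation with the number of features, noted below, is visible directly in the branching of `grow`.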

FIGURE 7.5 Object recognition by graph matching: (a) planar bracket which forms a template for the recognition process; (b) two brackets of the specified type that have been located by graph matching. The method is not confused by the holes of a different type of bracket that also appears in the image. (Thanks are due to Dr. Simon Barker for permission to reproduce this figure from his Ph.D. thesis, Royal Holloway, University of London, 1989.)


While this approach is highly effective and about as robust as the Hough transform approach, it is computationally costly; the total computation increases approximately exponentially with the number of image and template features. As a result it works well with objects containing up to five or six features, but for objects with more than 10 or 12 features, alternative methods are generally sought.

Some time ago it was found [Davies, 1992] that the Hough transform approach could also be used for matching point features, with a marked reduction in the amount of computation, by estimating the position of a reference point on the object from each pair of feature points found on it; by recording a vote at each such location, peaks in an image-like parameter space could again provide hypotheses for the locations of the specific type of object being sought. For further details of this and the maximal clique technique, see, for example, Davies [1997].
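The following sketch illustrates the pairwise voting idea under simple assumptions (rigid motion, no scale change, and a known model reference point); it is not Davies' original formulation, and the function name and tolerance are hypothetical.

```python
import itertools
import numpy as np

def pairwise_point_hough(model_pts, model_ref, image_pts, img_shape, tol=2.0):
    """Accumulate votes for an object reference point from pairs of features.

    Every pair of image points whose separation matches a pair of model points
    defines a rigid (rotation + translation) mapping; the model reference point
    is projected through that mapping and a vote is recorded at the result.
    """
    model_pts = np.asarray(model_pts, float)
    image_pts = np.asarray(image_pts, float)
    model_ref = np.asarray(model_ref, float)
    acc = np.zeros(img_shape, dtype=np.int32)

    model_pairs = list(itertools.permutations(range(len(model_pts)), 2))
    for a, b in itertools.combinations(range(len(image_pts)), 2):
        d_img = np.linalg.norm(image_pts[b] - image_pts[a])
        for i, j in model_pairs:
            if abs(np.linalg.norm(model_pts[j] - model_pts[i]) - d_img) > tol:
                continue
            # Rigid transform taking model point i -> image point a and j -> b.
            va = image_pts[b] - image_pts[a]
            vm = model_pts[j] - model_pts[i]
            ang = np.arctan2(va[1], va[0]) - np.arctan2(vm[1], vm[0])
            c, s = np.cos(ang), np.sin(ang)
            R = np.array([[c, -s], [s, c]])
            ref = image_pts[a] + R @ (model_ref - model_pts[i])
            x, y = int(round(ref[0])), int(round(ref[1]))
            if 0 <= y < img_shape[0] and 0 <= x < img_shape[1]:
                acc[y, x] += 1
    return acc   # peaks mark hypothesised reference-point positions
```

As with the other Hough-style accumulators, the peaks are only hypotheses and should be confirmed against the original image data.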
