
Introduction to the Polly system

2.6 Other visually guided mobile robots

2.6.3 Outdoor road following

A large segment of the work in visual navigation is devoted to the problem of outdoor road following. A great deal of early road following work involved the construction of explicit 3D models of the road. A large amount of work has been done on recovering the three dimensional structure of visible road fragments from a single monocular view. These systems use the extracted contours of the road, together with some set of a priori constraints on road shape, to recover the shape of the road. Waxman, LeMoingne, Davis, Liang, and Siddalingaiah describe a system for reconstructing road geometry by computing vanishing points of edge segments [111]. Turk, Morgenthaler, Gremban, and Marra used a simpler system based on the assumption that the vehicle and the visible portion of the road rested on the same flat plane [105]. This was called the flat-earth model. The flat-earth model allowed much faster processing and was sufficient for simple roads. Later, they substituted the constraint that the road have constant width for the constraint that the road lie in a plane. This allowed the system to handle roads which rose and fell and so it was termed the hill and dale model. Because the hill and dale model cannot account for curved roads, DeMenthon proposed an algorithm based on the zero-bank constraint which allows both hills and turns but does not allow the road to bank (turn about its own axis) [33].
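The appeal of the flat-earth model is that it makes road geometry recoverable by simple back-projection: a pixel below the horizon maps to a unique point on the ground plane. The sketch below illustrates this under assumed camera parameters (focal length, principal point, camera height); these numbers are illustrative, not taken from any of the systems cited above.

```python
def ground_point(u, v, f=500.0, cx=320.0, cy=240.0, h=1.5):
    """Back-project pixel (u, v) onto the ground plane under the
    flat-earth assumption: a camera at height h meters whose optical
    axis is parallel to a flat road. f is the focal length in pixels
    and (cx, cy) is the principal point. Illustrative parameters only.

    Returns (X, Z): lateral offset and distance ahead, in meters,
    or None for pixels at or above the horizon row cy."""
    dv = v - cy
    if dv <= 0:
        return None  # ray never intersects the ground plane
    Z = f * h / dv          # distance ahead: similar triangles, h below camera
    X = (u - cx) * Z / f    # lateral offset at that depth
    return X, Z
```

Note how rows nearer the horizon map to points farther away, which is why small pixel errors near the horizon produce large depth errors in flat-earth systems.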

Much of the work in outdoor road following has been done at Carnegie Mellon.

Early work at CMU was done by Wallace et al. [110][109][108]. They used a sensor-based algorithm, driven in image coordinates, for motor control. However, they took the approach of extracting road edges rather than segmenting the road and using the slicing technique. They also made extensive use of color information [108].

Using the CMU Warp parallel processor, a 10-100 MIPS floating-point processor optimized for low-level vision, they have reported speeds of up to 1.08 km/hour using a servo-loop time of one frame every three seconds. More recently, Crisman [30] implemented a color-based road tracker which was able to properly distinguish road pixels from other pixels, even in the presence of complicated shadows. Pomerleau [81] has described a neural network that efficiently learns to follow roads using low-resolution images.

Arkin has reported an architecture based on the schema concept [6]. A schema is a description of what action is appropriate to a given situation, similar to my notion of a tactical routine. Arkin defined a number of motor schemas for moving along a path and avoiding obstacles which, when run concurrently, were able to navigate about the UMass campus.

Dickmanns et al. [34] have described a number of road-following systems which can drive on the autobahn at speeds of up to 100 km/hour. The systems use Kalman filtering to efficiently search for road edges within small windows of the image. By using multiple processors, their system is able to process images at 25 frames per second.
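The windowed search can be sketched with a scalar Kalman filter tracking the lateral image position of one road edge: the filter's predicted position and uncertainty define a small window to scan in the next frame, and each detection shrinks that window. This is a minimal illustration with made-up noise parameters, not Dickmanns' actual (far richer) dynamical model.

```python
class EdgeTracker:
    """Scalar Kalman filter for the lateral pixel position of a road
    edge. q and r are assumed process and measurement noise variances
    (illustrative values, not from the cited systems)."""

    def __init__(self, x0, p0=25.0, q=4.0, r=9.0):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r

    def search_window(self, k=3.0):
        """Pixel interval to scan for the edge: +/- k predicted sigmas."""
        sigma = (self.p + self.q) ** 0.5
        return self.x - k * sigma, self.x + k * sigma

    def update(self, z):
        """Fold in a measured edge position z (static-position model)."""
        p = self.p + self.q               # predict: variance grows
        gain = p / (p + self.r)           # Kalman gain
        self.x = self.x + gain * (z - self.x)
        self.p = (1.0 - gain) * p         # correct: variance shrinks
        return self.x
```

Restricting edge detection to `search_window` is what makes frame-rate operation feasible: only a thin strip of the image is processed per edge, rather than the full frame.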


Chapter 3

Lightweight vision

In this chapter I will argue that vision can be very cheap and suggest some general techniques for simplifying visual processing. This is not an argument that all visual tasks can be solved cheaply. My goal is to convince the reader, particularly the reader who is not a vision researcher, that cheap real-time vision systems are feasible for a variety of tasks. This is, therefore, a theory of how to build task-specific vision systems as cheaply as possible, not a theory of how the human vision system works or of how to build a general programmable vision system. Nevertheless, the issues raised in building task-specific vision systems overlap a great deal with recent discussions on the nature of general vision systems.

3.1 Background

Much of the previous work in vision has focused on the construction of modules of a hypothesized general vision system. Early work viewed vision as a domain-independent system for creating monolithic models of the outside world. Proposals vary as to the nature of the models, and the processing performed to create them, but the general approach is to use a series of transformations of the sensory input to move from what I will call the "surface structure" of the input to its "deep structure."

I borrow the terms from Chomsky [27], who used them to refer to different levels of structure within sentences. I will use them as relative terms. For example, one might assume that a vision system takes images, transforms them into edge maps, then into depth maps via stereo matching, and finally into collections of geometric descriptions of individual objects. One often thinks of the objects as being "what's really out there," and of the image as being only a shadow; that the objects are the true reality and the images are mere appearance. Thus the object descriptions capture "what's really out there" better than the images, and so we can think of the object descriptions as the "deep structure" which is buried beneath the surface of the images.


In vision, the deep structure people have traditionally tried to extract is the detailed geometry of the environment (see Aloimonos and Rosenfeld [5] for a historical survey, or Marr [68] or Feldman [38] for examples of specific proposals). This is sometimes referred to as the "reconstruction approach" (see Aloimonos and Rosenfeld [5]). The approach has a number of appealing features. First, a complete description of the geometry of the environment, suitably annotated with other surface information such as color, seems to be a convenient form from which to compute any information about the environment which may be required. Another appealing characteristic is that it is fully domain independent, in the sense that any information needed about the environment can be derived from the model, provided the model is sufficiently detailed.