MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY

A.I. Memo No. 1376, September 1992


Localization and Positioning using Combinations of Model Views

Ehud Rivlin and Ronen Basri

Abstract

A method for localization and positioning in an indoor environment is presented. Localization is the act of recognizing the environment, and positioning is the act of computing the exact coordinates of a robot in the environment. The method is based on representing the scene as a set of 2D views and predicting the appearance of novel views by linear combinations of the model views. The method accurately approximates the appearance of scenes under weak perspective projection. Analysis of this projection as well as experimental results demonstrate that in many cases this approximation is sufficient to accurately describe the scene. When the orthographic approximation is invalid, either a larger number of models can be acquired or an iterative solution to account for the perspective distortions can be employed.

The presented method has several advantages over existing methods. It uses relatively rich representations, the representations are 2D rather than 3D, and localization can be done from a single 2D view only. The same principal method is applied both for the localization as well as the positioning problems, and a simple algorithm for repositioning, the task of returning to a previously visited position defined by a single view, is derived from this method.

© Massachusetts Institute of Technology (1992)

This report describes research done at the Massachusetts Institute of Technology within the Artificial Intelligence Laboratory and the McDonnell-Pew Center for Cognitive Neuroscience. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-91-J-4038. Ronen Basri is supported by the McDonnell-Pew and the Rothchild postdoctoral fellowships. Ehud Rivlin is at the University of Maryland, College Park, MD.


Basic tasks in autonomous robot navigation are localization and positioning. Localization is the act of recognizing the environment, that is, assigning consistent labels to different locations; positioning is the act of computing the coordinates of the robot in the environment. Positioning is a task complementary to localization, in the sense that position (e.g., "1.5 meters northwest of table T") is often specified in a place-specific coordinate system ("in room 911"). In this paper we suggest a method of both localization and positioning using vision alone. A variant of the positioning problem, referred to as repositioning, involving the return to a previously visited place, is also discussed.

Previous studies have examined the problems of localization and positioning under a variety of conditions, defined by the kind of sensor(s) employed, the nature of the environment, and the representations used. We can distinguish between active and passive sensing, indoor and outdoor navigation tasks, and metric and topological representations. The metric approach attempts to utilize a detailed geometric description of the environment, while the topological approach uses a more qualitative description including a graph with nodes representing places and arcs representing sequences of actions that would result in moving the robot from one node to another.

In the paper we consider a robot that uses a passive sensor, vision, in an indoor environment. The environment cannot be changed by the robot to improve its performance; neither beacons nor floor or wall markings are employed. The paper addresses both the localization and the positioning problems. Solutions to these problems are presented based on object recognition techniques. The method, based on the linear combinations scheme of [17], represents scenes by sets of their 2D images. Localization is achieved by comparing the observed image to linear combinations of model views, and the position of the robot is computed by analyzing the coefficients of the linear combination that aligns the model to the image. Also, a simple "qualitative" solution to the repositioning problem using the linear combinations scheme is presented.

The rest of the paper is organized as follows. The next section describes the localization and positioning problems and surveys previous solutions. The method of localization and positioning using linear combinations of model views is described in Section 3. The method assumes weak perspective projection. An iterative scheme to account for perspective distortions is presented in Section 4. An analysis of the error resulting from the projection assumption is presented in Section 5. Constraints imposed on the motion of the robot as a result of special properties of indoor environments can be used to reduce the complexity of the method presented here. This topic is covered in Section 6. Experimental results follow.


Localization and positioning from visual input are defined in the following way: given a familiar environment, identify the observed environment, and then find your position in that environment. Localization resembles the task of object recognition, with objects replaced by scenes. Once localization is accomplished, positioning can be performed.

One problem a system for localization and positioning should address is the variability of images due to viewpoint changes. The inexactness of practical systems makes it difficult for a robot to return to a specified position on subsequent visits. The visual data available to the robot between visits varies in accordance with the viewing position of the robot. A localization system should be able to recognize scenes from different positions and orientations.

Another problem is that of changes in the scene. At subsequent visits the same place may look different due to changes in the arrangement of the objects, the introduction of new objects, and the removal of others. In general, some objects tend to be more static than others. While chairs and books are often moved, tables, closets, and pictures tend to change their position much less, and walls are almost guaranteed to be static. Static cues naturally are more reliable than mobile ones. Confining the system to static cues, however, may in some cases result in failure to recognize the scene due to insufficient cues. The system should therefore attempt to rely on static cues, but should not ignore the dynamic cues.

Solutions to the problem of localization from visual data require a large memory and heavy computation. Existing systems often try to reduce this cost by using sparse representations and by exploiting contextual information. Sparse representations are introduced in [10, 14]. Mataric [10] represents scenes as sequences of landmarks (such as walls, doors, etc.) extracted by tracing the boundaries of the scene using a sonar and a compass. Metric information about and between the landmarks is not stored. Sarachik [14] recognizes a room by its dimensions, which are measured by identifying and locating the top corners of the room using stereo data (obtained from four cameras). In both cases the representation is very sparse, and the scene is therefore often ambiguous.

Richer representations are used in [3, 5], where higher success rates are reported. Braunegg [3] represents the scene by an occupancy table, a 2D bit array which contains a 1 at every location occupied by some object. The table is constructed by taking stereo pictures covering 360° from the middle of the room and projecting the obtained 3D data onto the floor. The method suffers from loss of information due to the projection onto the floor.

Engelson et al. [5] represent the scene by a set of invariant "signatures". A signature is usually composed of low-resolution gray-level or range data obtained by blurring an image. A set of signatures taken from different viewpoints is stored. A scene is recognized if the robot encounters a signature similar to one of the stored signatures.

Systems that use the full information provided by the image (e.g., [6, 12]) usually rely on contextual information to avoid scanning all the models in the memory and to reduce the computational cost of comparing a model to the image. The system follows a predetermined [...] a verification problem. Path continuity in many cases is essential, and the so-called "drop-off" problem is not addressed. The emphasis in these systems is on positioning, which is used to keep the robot on the path. It is typical for these systems (e.g., [1, 6, 12]) to use a 3D model of the environment.

Onoguchi et al. [12], among others, represent the environment by a set of landmarks selected from pairs of stereo images by a human operator. These landmarks are transformed by an image processing program which is designed so as to identify the specific landmark using specific extraction instructions (such as what features to look for and at what locations). Localization is achieved by applying the extraction procedure specified for the next landmark. Once the landmark is identified, the position of the robot relative to that landmark is determined by comparing the dimensions of the observed landmark with those of the stored model.

The method presented in this paper represents the environment using a set of edge maps. Localization and positioning are achieved by comparing images of the environment to linear combinations of the model views. The method uses rich visual information to represent the scene. The system is flexible. In many cases it is capable of recognizing its location from one image only (360° coverage is not required). When one image is not sufficient, additional images can be acquired to solve the localization problem. Context can be used to determine the order of comparison of the models to the observed image and to increase the confidence of a given match, but context is not essential: the system can also, by performing more extensive computations, solve the "drop-off" problem.

3 The Method

The problems of localization and object recognition are similar in many ways. Both problems require the matching of visual images to stored models, either of the environment or of the observed objects. Both problems face similar difficulties, such as varying illumination conditions and changes in appearance due to viewpoint changes. Similar methodologies therefore can be used for solving both problems.

A particular application of an object recognition scheme, the Linear Combinations (LC) scheme [17], to the problems of localization and positioning is discussed below. The environment is represented in this scheme by a small set of views obtained from different viewpoints and by the correspondence between the views. A novel view is recognized by comparing it to linear combinations of the stored views. Positioning is achieved by recovering the position of the camera relative to its position in the model views from the coefficients of the aligning linear combination. In the rest of this section we review the linear combinations approach and describe its application to both localization and positioning. The section concludes with a solution to the problem of repositioning, that is, the problem of returning to a previously visited position by "locking" into an image acquired in that position.

3.1 Localization

The problem of localization is defined as follows: given $P$, a 2D image of a place, and $M$, a set of stored models, find a model $M^i \in M$ such that $P$ matches $M^i$. Localization is the recognition of a place. It can therefore potentially benefit from using an object recognition method.

A common approach to handling the problem of recognition from different viewpoints is by comparing the stored models to the observed environment after the viewpoint is recovered and compensated for. This approach, called alignment, is used in a number of studies of object recognition [2, 7, 8, 9, 15, 16]. We apply the alignment approach to the problem of localization. The system described below uses the "Linear Combinations" (LC) scheme, which was suggested by Ullman and Basri [17].

We begin with a brief review of the LC scheme. LC is defined as follows. Given an image, we construct two view vectors from the feature points in the image: one contains the x-coordinates of the points, and the other contains the y-coordinates of the points. An object (in our case, the environment) is modeled by a set of such views, where the points in these views are ordered in correspondence. The appearance of a novel view of the object is predicted by applying linear combinations to the stored views. The predicted appearance is then compared with the actual image, and the object is recognized if the two match. The advantage of this method is twofold. First, viewer-centered representations are used rather than object-centered ones; namely, models are composed of 2D views of the observed scene. Second, novel appearances are predicted in a simple and accurate way (under weak perspective projection).

Formally, given $P$, a 2D image of a scene, and $M$, a set of stored models, the objective is to find a model $M^i \in M$ such that $P = \sum_{j=1}^{k} \alpha_j M^i_j$ for some constants $\alpha_j \in \mathbb{R}$. It has been shown that this scheme accurately predicts the appearance of rigid objects under weak perspective projection (orthographic projection and scale). The limitations of this projection model are discussed later in this paper.

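More concretely, let $p_i = (x_i, y_i, z_i)$, $1 \le i \le n$, be a set of $n$ object points. Under weak perspective projection, the positions $p'_i = (x'_i, y'_i)$ of these points in the image are given by

\[
\begin{aligned}
x'_i &= s r_{11} x_i + s r_{12} y_i + s r_{13} z_i + t_x \\
y'_i &= s r_{21} x_i + s r_{22} y_i + s r_{23} z_i + t_y
\end{aligned}
\tag{1}
\]

where $r_{ij}$ are the components of a $3 \times 3$ rotation matrix, and $s$ is a scale factor. Rewriting this in vector equation form we obtain

\[
\begin{aligned}
\mathbf{x}' &= s r_{11} \mathbf{x} + s r_{12} \mathbf{y} + s r_{13} \mathbf{z} + t_x \mathbf{1} \\
\mathbf{y}' &= s r_{21} \mathbf{x} + s r_{22} \mathbf{y} + s r_{23} \mathbf{z} + t_y \mathbf{1}
\end{aligned}
\tag{2}
\]

where $\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{x}', \mathbf{y}' \in \mathbb{R}^n$ are the vectors of $x_i$, $y_i$, $z_i$, $x'_i$ and $y'_i$ coordinates respectively, and $\mathbf{1} = (1, 1, \ldots, 1)$. Consequently,

\[
\mathbf{x}', \mathbf{y}' \in \operatorname{span}\{\mathbf{x}, \mathbf{y}, \mathbf{z}, \mathbf{1}\}
\tag{3}
\]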

or, in other words, $\mathbf{x}'$ and $\mathbf{y}'$ belong to a four-dimensional linear subspace of $\mathbb{R}^n$. (Notice that $\mathbf{z}'$, the vector of depth coordinates of the projected points, also belongs to this subspace. This fact is used in Section 4 below.) A four-dimensional space is spanned by any four linearly independent vectors of the space. Two views of the scene supply four such vectors [13, ...]. Denote by $\mathbf{x}_1$, $\mathbf{y}_1$ and $\mathbf{x}_2$, $\mathbf{y}_2$ the location vectors of the $n$ points in the two images; then there exist coefficients $a_1, a_2, a_3, a_4$ and $b_1, b_2, b_3, b_4$ such that

\[
\begin{aligned}
\mathbf{x}' &= a_1 \mathbf{x}_1 + a_2 \mathbf{y}_1 + a_3 \mathbf{x}_2 + a_4 \mathbf{1} \\
\mathbf{y}' &= b_1 \mathbf{x}_1 + b_2 \mathbf{y}_1 + b_3 \mathbf{x}_2 + b_4 \mathbf{1}
\end{aligned}
\tag{4}
\]

(Note that the vector $\mathbf{y}_2$ already depends on the other four vectors.) Since $R$ is a rotation matrix, the coefficients satisfy the following two quadratic constraints:

\[
\begin{aligned}
a_1^2 + a_2^2 + a_3^2 - b_1^2 - b_2^2 - b_3^2 &= 2(b_1 b_3 - a_1 a_3) r_{11} + 2(b_2 b_3 - a_2 a_3) r_{12} \\
a_1 b_1 + a_2 b_2 + a_3 b_3 + (a_1 b_3 + a_3 b_1) r_{11} + (a_2 b_3 + a_3 b_2) r_{12} &= 0
\end{aligned}
\tag{5}
\]

To derive these constraints the transformation between the two model views should be recovered. This can be done under weak perspective using a third image. Alternatively, the constraints can be ignored, in which case the system would confuse rigid transformations with affine ones. This usually does not prevent successful localization since generally scenes are fairly different from one another.

An LC scheme for the problem of localization is as follows. The environment is modeled by a set of images with correspondence between the images. For example, a spot can be modeled by two of its corresponding views. The corresponding quadratic constraints may also be stored. Localization is achieved by recovering the linear combination that aligns the model to the observed image. The coefficients are determined using four model points and their corresponding image points by solving a linear set of equations. Three points are sufficient to determine the coefficients if the quadratic constraints are also considered. Additional points may be used to reduce the effect of noise.
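The alignment step just described can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the helper names (`rotation_y_x`, `weak_perspective`, `solve_linear`, `lc_coefficients`, `predict`), the synthetic scene points, and the choice of transformations are all assumptions made for the example. Two model views and a novel view are generated with Eq. (1), the coefficients of Eq. (4) are solved from four correspondences, and the prediction is then checked on all points.

```python
import math

def rotation_y_x(yaw, pitch):
    """3x3 rotation: rotate about the Y axis by yaw, then about X by pitch."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def weak_perspective(points, rot, scale, tx, ty):
    """Project 3D points as in Eq. (1): rotate, scale, translate, drop depth."""
    view = []
    for p in points:
        x = scale * sum(rot[0][j] * p[j] for j in range(3)) + tx
        y = scale * sum(rot[1][j] * p[j] for j in range(3)) + ty
        view.append((x, y))
    return view

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def lc_coefficients(view1, view2, values):
    """Coefficients of Eq. (4), solved from the first four correspondences."""
    rows = [[view1[i][0], view1[i][1], view2[i][0], 1.0] for i in range(4)]
    return solve_linear(rows, list(values[:4]))

def predict(view1, view2, coeffs):
    """Apply a linear combination of (x1, y1, x2, 1) to every model point."""
    return [coeffs[0] * p1[0] + coeffs[1] * p1[1] + coeffs[2] * p2[0] + coeffs[3]
            for p1, p2 in zip(view1, view2)]

# Synthetic scene (assumed data): six non-coplanar points.
points = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.5), (0.3, 1.1, -0.4),
          (-0.7, 0.5, 1.2), (0.9, -0.8, 0.7), (-0.4, -0.6, -1.0)]
view1 = weak_perspective(points, rotation_y_x(0.0, 0.0), 1.0, 0.0, 0.0)
view2 = weak_perspective(points, rotation_y_x(0.4, 0.1), 1.1, 0.3, -0.2)
novel = weak_perspective(points, rotation_y_x(0.25, -0.15), 0.9, -0.5, 0.4)

a = lc_coefficients(view1, view2, [p[0] for p in novel])
b = lc_coefficients(view1, view2, [p[1] for p in novel])
max_err_x = max(abs(u - v[0]) for u, v in zip(predict(view1, view2, a), novel))
max_err_y = max(abs(u - v[1]) for u, v in zip(predict(view1, view2, b), novel))
```

Only the first four points are used to solve for the coefficients; the remaining two act as a check that the same combination predicts points it was not fitted to. With noisy measurements one would replace the exact solve by a least-squares fit over all available correspondences.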

The LC scheme uses viewer-centered models, that is, representations that are composed of images. It has a number of advantages over methods that build full three-dimensional models to represent the scene. First, by using viewer-centered models that cover relatively small transformations we avoid the need to handle occlusions in the scene. If from some viewpoints the scene appears different because of occlusions we utilize a new model for these viewpoints. Second, viewer-centered models are easier to build and to maintain than object-centered ones. The models contain only images and correspondences. By limiting the transformation between the model images one can find the correspondence using motion methods. If large portions of the environment are changed between visits a new model can be constructed by simply replacing old images with new ones.

One problem with using the LC scheme for localization is due to the weak perspective approximation. In contrast with the problem of object recognition, where we can generally assume [...] the environment surrounds the robot, and perspective distortions cannot be neglected. The limitations of weak perspective modeling are discussed both mathematically and empirically in the next two sections. It is shown that in many practical cases weak perspective is sufficient to enable accurate localization. The main reason is that the problem of localization does not require accurate measurements in the entire image; it only requires identifying a sufficient number of spots to guarantee accurate naming. If these spots are relatively close to the center of the image, or if the depth differences they create are relatively small (as in the case of looking at a wall when the line of sight is nearly perpendicular to the wall), the perspective distortions are relatively small, and the system can identify the scene with high accuracy. Also, views related by a translation parallel to the image plane form a linear space even when perspective distortions are large. This case and other simplifications are discussed in Section 6.

By using weak perspective we avoid stability problems that frequently occur in perspective computations. We can therefore compute the alignment coefficients by looking at a relatively narrow field of view. The entire scheme can be viewed as an accumulative process. Rather than acquiring images of the entire scene and comparing them all to a full scene model (as in [...]) we recognize the scene image by image, spot by spot, until we accumulate sufficient convincing information that indicates the identity of the place.

When perspective distortions are relatively large and weak perspective is insufficient to model the environment, two approaches can be used. One possibility is to construct a larger number of models so as to keep the possible changes between the familiar and the novel views small. Alternatively, an iterative computation can be applied to compensate for these distortions. Such an iterative method is described in Section 4.

3.2 Positioning

Positioning is the problem of recovering the exact position of the robot. This position can be specified in a fixed coordinate system associated with the environment (i.e., room coordinates), or it can be associated with some model, in which case location is expressed with respect to the position from which the model views were acquired. In this section we discuss an application of the LC scheme to the positioning problem.

The idea is the following. We assume a model composed of two images, $P_1$ and $P_2$; their relative position is given. Given a novel image $P'$, we first align the model with the image (i.e., localization). By considering the coefficients of the linear combination the robot's position relative to the model images is recovered. To recover the absolute position of the robot in the room the absolute positions of the model views should also be provided.

Assuming $P_2$ is obtained from $P_1$ by a rotation $R$, translation $t = (t_x, t_y)$, and scaling $s$, the coordinates of a point in $P'$, $(x', y')$, can be written as linear combinations of the corresponding model points in the following way:

\[
\begin{aligned}
x' &= a_1 x_1 + a_2 y_1 + a_3 x_2 + a_4 \\
y' &= b_1 x_1 + b_2 y_1 + b_3 x_2 + b_4
\end{aligned}
\tag{6}
\]

Substituting for $x_2$ we obtain

\[
\begin{aligned}
x' &= a_1 x_1 + a_2 y_1 + a_3 (s r_{11} x_1 + s r_{12} y_1 + s r_{13} z_1 + t_x) + a_4 \\
y' &= b_1 x_1 + b_2 y_1 + b_3 (s r_{11} x_1 + s r_{12} y_1 + s r_{13} z_1 + t_x) + b_4
\end{aligned}
\tag{7}
\]

and rearranging these equations we obtain

\[
\begin{aligned}
x' &= (a_1 + a_3 s r_{11}) x_1 + (a_2 + a_3 s r_{12}) y_1 + (a_3 s r_{13}) z_1 + (a_3 t_x + a_4) \\
y' &= (b_1 + b_3 s r_{11}) x_1 + (b_2 + b_3 s r_{12}) y_1 + (b_3 s r_{13}) z_1 + (b_3 t_x + b_4)
\end{aligned}
\tag{8}
\]

Using these equations we can derive all the parameters of the transformation between the model and the image. Assume the image is obtained by a rotation $U$, translation $t_n$, and scaling $s_n$. Using the orthonormality constraint we can first derive the scale factor

\[
\begin{aligned}
s_n^2 &= (a_1 + a_3 s r_{11})^2 + (a_2 + a_3 s r_{12})^2 + (a_3 s r_{13})^2 \\
      &= a_1^2 + a_2^2 + a_3^2 s^2 + 2 a_3 s (a_1 r_{11} + a_2 r_{12})
\end{aligned}
\tag{9}
\]

From Equations (8) and (9), by deriving the components of the translation vector, $t_n$, we can obtain the position of the robot in the image relative to its position in the model views:

\[
\begin{aligned}
\Delta x &= a_3 t_x + a_4 \\
\Delta y &= b_3 t_x + b_4 \\
\Delta z &= f \left( \frac{1}{s_n} - \frac{1}{s} \right)
\end{aligned}
\tag{10}
\]

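Note that $\Delta z$ is derived from the change in scale of the object. The rotation matrix $U$ between $P_1$ and $P'$ follows from the coefficients of $x_1$, $y_1$, and $z_1$ in Eq. (8), divided by $s_n$:

\[
\begin{aligned}
u_{11} &= \frac{a_1 + a_3 s r_{11}}{s_n} &\quad
u_{12} &= \frac{a_2 + a_3 s r_{12}}{s_n} &\quad
u_{13} &= \frac{a_3 s r_{13}}{s_n} \\
u_{21} &= \frac{b_1 + b_3 s r_{11}}{s_n} &\quad
u_{22} &= \frac{b_2 + b_3 s r_{12}}{s_n} &\quad
u_{23} &= \frac{b_3 s r_{13}}{s_n}
\end{aligned}
\tag{11}
\]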

As was already mentioned, the position of the robot is computed here relative to the position of the camera when the first model image, $P_1$, was acquired. $\Delta x$ and $\Delta z$ represent the motion of the robot from $P_1$ to $P'$, and the rest of the parameters represent its 3D rotation and elevation. To obtain the relative position the transformation parameters between the model views, $P_1$ and $P_2$, are required.
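As a numerical illustration of Eqs. (9) to (11), the following sketch recovers the scale, the horizontal offset, and the first row of the rotation from a set of alignment coefficients. The function names and the synthetic ground-truth values are assumptions made for this example; the coefficients are built by inverting the $x'$ row of Eq. (8) for a known transformation, so that the recovered values can be compared against it.

```python
import math

def scale_from_coefficients(a, s, r_row):
    """Eq. (9): scale s_n of the novel view from a1..a3 and the model transform."""
    a1, a2, a3 = a[0], a[1], a[2]
    s_n_squared = (a1 * a1 + a2 * a2 + a3 * a3 * s * s
                   + 2.0 * a3 * s * (a1 * r_row[0] + a2 * r_row[1]))
    return math.sqrt(s_n_squared)

def rotation_row(a, s, r_row, s_n):
    """Eq. (11): first row (u11, u12, u13) of the rotation between P1 and P'."""
    a1, a2, a3 = a[0], a[1], a[2]
    return [(a1 + a3 * s * r_row[0]) / s_n,
            (a2 + a3 * s * r_row[1]) / s_n,
            (a3 * s * r_row[2]) / s_n]

def horizontal_offset(a, t_x):
    """Eq. (10): Delta x = a3 * t_x + a4."""
    return a[2] * t_x + a[3]

# Assumed transformation between the model views P1 and P2:
# rotation about the Y axis by 0.5 rad, scale 1.2, translation 0.3.
s, t_x, alpha = 1.2, 0.3, 0.5
r_row = [math.cos(alpha), 0.0, math.sin(alpha)]   # first row of R

# Assumed ground truth for the novel view (u_row is a unit vector).
s_n, t_n = 0.8, -0.4
u_row = [math.cos(0.2) * math.cos(0.1), math.sin(0.1),
         math.sin(0.2) * math.cos(0.1)]

# Coefficients a1..a4 that Eq. (8) implies for this ground truth.
a3 = s_n * u_row[2] / (s * r_row[2])
a = [s_n * u_row[0] - a3 * s * r_row[0],
     s_n * u_row[1] - a3 * s * r_row[1],
     a3,
     t_n - a3 * t_x]

recovered_scale = scale_from_coefficients(a, s, r_row)
recovered_row = rotation_row(a, s, r_row, recovered_scale)
recovered_offset = horizontal_offset(a, t_x)
```

In practice the coefficients would come from aligning the model with an observed image rather than from a known transformation; the round trip here only checks that the closed forms are mutually consistent.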

3.3 Repositioning

An interesting variant of the positioning problem, referred to as repositioning, is defined as follows. Given an image, called the target image, position yourself in the location from which this image was observed.¹ One way to solve this problem is to extract the exact position from which the target image was obtained and direct the robot to that position. In this section we are interested in a more qualitative approach. Under this approach position is not computed. Instead, the robot observes the environment and extracts only the direction to the target location. Unlike the exact approach, the method presented here does not require the recovery of the transformation between the model views.

We assume we are given a model of the environment together with a target image. The robot is allowed to take new images as it is moving towards the target. We assume a horizontally moving platform. (In other words, we assume three degrees of freedom rather than six; the robot is allowed to rotate around the vertical axis and translate horizontally. The validity of this constraint is discussed in Section 6.) Below we give a simple computation that determines a path which terminates in the target location. At each time step the robot acquires a new image and aligns it with the model. By comparing the alignment coefficients with the coefficients for the target image the robot determines its next step. The algorithm is divided into two stages. In the first stage the robot fixates on one identifiable point and moves along a circular path around the fixation point until the line of sight to this point coincides with the line of sight to the corresponding point in the target image. In the second stage the robot advances forward or retreats backward until it reaches the target location.

Given a model composed of two images, $P_1$ and $P_2$, $P_2$ is obtained from $P_1$ by a rotation about the Y-axis by an angle $\alpha$, horizontal translation $t_x$, and scale factor $s$. Given a target image $P_t$, $P_t$ is obtained from $P_1$ by a similar rotation by an angle $\theta$, translation $t_t$, and scale $s_t$. Using Eq. (4) the position of a target point $(x_t, y_t)$ can be expressed as

\[
\begin{aligned}
x_t &= a_1 x_1 + a_3 x_2 + a_4 \\
y_t &= b_2 y_1
\end{aligned}
\tag{12}
\]

(The rest of the coefficients are zero since the platform moves horizontally.) In fact, the coefficients are given by

\[
\begin{aligned}
a_1 &= \frac{s_t \sin(\alpha - \theta)}{\sin \alpha} \\
a_3 &= \frac{s_t \sin \theta}{s \sin \alpha} \\
a_4 &= t_t - \frac{t_x s_t \sin \theta}{s \sin \alpha} \\
b_2 &= s_t
\end{aligned}
\tag{13}
\]

(The derivation is given in the Appendix.)

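At every time step the robot acquires an image and aligns it with the above model. Assume that image $P_p$ is obtained as a result of a rotation by an angle $\phi$, translation $t_p$, and scale $s_p$.

¹ This problem can be considered as a variant of the homing problem. A discussion of the general homing problem with a "signature-based" solution can be found in [11].

The position of a point $(x_p, y_p)$ in $P_p$ can be expressed as

\[
\begin{aligned}
x_p &= c_1 x_1 + c_3 x_2 + c_4 \\
y_p &= d_2 y_1
\end{aligned}
\tag{14}
\]

where the coefficients are given by

\[
\begin{aligned}
c_1 &= \frac{s_p \sin(\alpha - \phi)}{\sin \alpha} \\
c_3 &= \frac{s_p \sin \phi}{s \sin \alpha} \\
c_4 &= t_p - \frac{t_x s_p \sin \phi}{s \sin \alpha} \\
d_2 &= s_p
\end{aligned}
\tag{15}
\]

The step performed by the robot is determined by

\[
\delta = \frac{c_1}{c_3} - \frac{a_1}{a_3}
\tag{16}
\]

That is, with $\alpha$, $\theta$, and $\phi$ denoting the rotation angles of $P_2$, $P_t$, and $P_p$ relative to $P_1$,

\[
\delta = \frac{s \sin(\alpha - \phi)}{\sin \phi} - \frac{s \sin(\alpha - \theta)}{\sin \theta} = s \sin \alpha \, (\cot \phi - \cot \theta)
\tag{17}
\]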

The robot should now move so as to reduce the absolute value of the step measure $\delta$ of Eq. (16). The direction of motion depends on the sign of $\delta$. The robot can deduce the direction by moving slightly to the side and checking if this motion results in an increase or decrease of $\delta$. The motion is defined as follows. The robot moves to the right (or to the left, depending on which direction reduces $|\delta|$) by a step $\Delta x$.

A new image $P_n$ is now acquired, and the fixated point is located in this image. Denote its new position by $x_n$. Since the motion is parallel to the image plane the depth values of the point in the two views, $P_p$ and $P_n$, are identical. We now want to rotate the camera so as to return the fixated point to its original position. The angle of rotation, $\beta$, can be deduced from the equation

\[
x_p = x_n \cos \beta + \sin \beta
\tag{18}
\]

This equation has two solutions. We choose the one that counters the translation (namely, if the translation is to the right, the camera should rotate to the left), and that keeps the angle of rotation small. In the next time step the new picture $P_n$ replaces $P_p$ and the procedure is repeated until $\delta$ vanishes. The resulting path is circular around the point of focus.
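The quantities used in the first stage can be illustrated with a short sketch. It evaluates the coefficients of Eqs. (13) and (15), the step measure of Eq. (16), and a solution of Eq. (18). The function names, the numeric values, and the symbols `alpha`, `theta`, `phi`, `beta` for the rotation angles are assumptions of this example. Eq. (18) is taken as written, which presumes image coordinates expressed in units of the focal length, and `fixation_rotation` simply returns the smaller-magnitude root rather than also checking the direction of the translation.

```python
import math

def alignment_coefficients(scale, angle, shift, s, alpha, t_x):
    """Eqs. (13)/(15): (c1, c3, c4) for a view rotated by `angle` about Y."""
    c1 = scale * math.sin(alpha - angle) / math.sin(alpha)
    c3 = scale * math.sin(angle) / (s * math.sin(alpha))
    c4 = shift - t_x * c3
    return c1, c3, c4

def step_measure(current, target):
    """Eq. (16): delta = c1/c3 - a1/a3."""
    return current[0] / current[1] - target[0] / target[1]

def fixation_rotation(x_p, x_n):
    """Smaller-magnitude solution beta of Eq. (18): x_p = x_n cos(b) + sin(b)."""
    radius = math.hypot(x_n, 1.0)           # x_n cos b + sin b = radius * cos(b - gamma)
    gamma = math.atan2(1.0, x_n)
    spread = math.acos(x_p / radius)        # requires |x_p| <= radius
    candidates = [gamma - spread, gamma + spread]
    wrapped = [math.atan2(math.sin(b), math.cos(b)) for b in candidates]
    return min(wrapped, key=abs)

# Assumed model transformation (P1 -> P2) and view angles.
s, alpha, t_x = 1.2, 0.6, 0.3
theta, phi = 0.35, 0.2                      # target and current rotation angles

target = alignment_coefficients(0.9, theta, -0.1, s, alpha, t_x)
current = alignment_coefficients(1.1, phi, 0.4, s, alpha, t_x)

delta = step_measure(current, target)
closed_form = s * math.sin(alpha) * (1.0 / math.tan(phi) - 1.0 / math.tan(theta))

beta = fixation_rotation(0.10, 0.15)        # fixated point drifted from 0.10 to 0.15
```

Note that `delta` depends only on the two angles and on the model transformation, not on the scales or translations of the current and target views, which is what allows the first stage to align the lines of sight before the distance is adjusted.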

Once the robot arrives at a position for which $\delta = 0$ (namely, its line of sight coincides with that of the target image, and $\phi = \theta$) it should now advance forward or retreat backward to adjust its position along the line of sight. Several measures can be used to determine the direction of motion; one example is the term $c_1 / a_1$, which satisfies

\[
\frac{c_1}{a_1} = \frac{s_p}{s_t}
\tag{19}
\]

when the two lines of sight coincide. The objective at this stage is to bring this measure to 1.


The linear combination scheme presented above accurately handles changes in viewpoint assuming the images are obtained under weak perspective projection. Error analysis and experimental results demonstrate that in many practical cases this assumption is valid. In cases where perspective distortions are too large to be handled by a weak perspective approximation, matching between the model and the image can be facilitated in two ways. One possibility is to avoid cases of large perspective distortion by augmenting the library of stored models with additional models. In a relatively dense library there usually exists a model that is related to the image by a sufficiently small transformation, avoiding such distortions. The second alternative is to improve the match between the model and the image using an iterative process. In this section we consider the second option.

The suggested iterative process is based on a Taylor expansion of the perspective coordinates. As described below, this expansion results in a polynomial consisting of terms each of which can be approximated by linear combinations of views. The first term of this series represents the orthographic approximation. The process resembles a method of matching 3D points with 2D points described recently by DeMenthon and Davis [4]. In this case, however, the method is applied to 2D models rather than 3D ones. In our application the 3D coordinates of the model points are not provided; instead they are approximated from the model views.

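An image point $(x, y) = (fX/Z, fY/Z)$ is the projection of some object point, $(X, Y, Z)$, in the image, where $f$ denotes the focal length. Consider the following Taylor expansion of $1/Z$ around some depth value $Z_0$:

\[
\begin{aligned}
\frac{1}{Z} &= \sum_{k=0}^{\infty} \frac{f^{(k)}(Z_0)}{k!} (Z - Z_0)^k
 = \frac{1}{Z_0} + \sum_{k=1}^{\infty} \frac{(-1)^k}{(k-1)!} \frac{(Z - Z_0)^k}{Z_0^{k+1}} \\
&= \frac{1}{Z_0} \left[ 1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{(k-1)!} \left( \frac{Z - Z_0}{Z_0} \right)^k \right]
\end{aligned}
\tag{20}
\]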

The Taylor series describing the position of a point $x$ is therefore given by

\[
x = \frac{fX}{Z} = \frac{fX}{Z_0} \left[ 1 + \sum_{k=1}^{\infty} \frac{(-1)^k}{(k-1)!} \left( \frac{Z - Z_0}{Z_0} \right)^k \right]
\tag{21}
\]

Notice that the zero term contains the orthographic approximation for $x$. Denote by $\Delta^{(k)}$ the $k$th term of the series:

\[
\Delta^{(k)} = \frac{fX}{Z_0} \, \frac{(-1)^k}{(k-1)!} \left( \frac{Z - Z_0}{Z_0} \right)^k
\tag{22}
\]

A recursive definition of the above series is given below.

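Initialization:

\[
x^{(0)} = \Delta^{(0)} = \frac{fX}{Z_0}
\]

Iterative step:

\[
\begin{aligned}
\Delta^{(k)} &= -\frac{Z - Z_0}{(k-1) Z_0} \, \Delta^{(k-1)} \\
x^{(k)} &= x^{(k-1)} + \Delta^{(k)}
\end{aligned}
\]

where $x^{(k)}$ represents the $k$th order approximation for $x$, and $\Delta^{(k)}$ represents the highest order term in $x^{(k)}$.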

According to the orthographic approximation both $X$ and $Z$ can be expressed as linear combinations of the model views (Eq. (4)). We therefore apply the above procedure, approximating $X$ and $Z$ at every step using the linear combination that best aligns the model points with the image points. The general idea is therefore the following. First, we estimate $x^{(0)}$ and $\Delta^{(0)}$ by solving the orthographic case. Then at each step of the iteration we improve the estimate by seeking the linear combination that best estimates the factor

\[
-\frac{Z - Z_0}{(k-1) Z_0} \approx \frac{x - x^{(k-1)}}{\Delta^{(k-1)}}
\tag{23}
\]

Denote by $\mathbf{x} \in \mathbb{R}^n$ the vector of image point coordinates, and denote by

\[
P = [\mathbf{x}_1, \mathbf{y}_1, \mathbf{x}_2, \mathbf{1}]
\tag{24}
\]

an $n \times 4$ matrix containing the position of the points in the two model images. Denote by $P^+ = (P^T P)^{-1} P^T$ the pseudo-inverse of $P$ (we assume $P$ is overdetermined). Also denote by $\mathbf{a}^{(k)}$ the coefficients computed for the $k$th step. $P \mathbf{a}^{(k)}$ represents the linear combination computed at that step to approximate the $X$ or the $Z$ values. Since at every step $Z_0$, $f$, and $k$ are constant they can be merged into the linear combination. Denote by $\mathbf{x}^{(k)}$ and $\boldsymbol{\Delta}^{(k)}$ the vectors of computed values of $x$ and $\Delta$ at the $k$th step. An iterative procedure to align a model to the image is described below.

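Initialization:

Solve the orthographic approximation, namely

\[
\begin{aligned}
\mathbf{a}^{(0)} &= P^+ \mathbf{x} \\
\mathbf{x}^{(0)} &= \boldsymbol{\Delta}^{(0)} = P \mathbf{a}^{(0)}
\end{aligned}
\]

Iterative step:

\[
\begin{aligned}
\mathbf{q}^{(k)} &= (\mathbf{x} - \mathbf{x}^{(k-1)}) \div \boldsymbol{\Delta}^{(k-1)} \\
\mathbf{a}^{(k)} &= P^+ \mathbf{q}^{(k)} \\
\boldsymbol{\Delta}^{(k)} &= (P \mathbf{a}^{(k)}) \otimes \boldsymbol{\Delta}^{(k-1)} \\
\mathbf{x}^{(k)} &= \mathbf{x}^{(k-1)} + \boldsymbol{\Delta}^{(k)}
\end{aligned}
\]

where the vector operations $\otimes$ and $\div$ are defined element-wise as

\[
\begin{aligned}
\mathbf{u} \otimes \mathbf{v} &= (u_1 v_1, \ldots, u_n v_n) \\
\mathbf{u} \div \mathbf{v} &= \left( \frac{u_1}{v_1}, \ldots, \frac{u_n}{v_n} \right)
\end{aligned}
\]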
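The initialization and iterative step above translate fairly directly into code. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the pseudo-inverse is applied by solving the normal equations with a small Gaussian elimination routine, and the early exit when an entry of the previous correction is numerically zero is a guard added for this example (the element-wise ratio is undefined there), not part of the procedure as stated. The demonstration uses an image that lies exactly in the span of the model views, so the orthographic solution is already exact.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def least_squares(p, q):
    """Apply the pseudo-inverse P+ to q by solving (P^T P) a = P^T q."""
    cols = len(p[0])
    ptp = [[sum(row[i] * row[j] for row in p) for j in range(cols)]
           for i in range(cols)]
    ptq = [sum(row[i] * v for row, v in zip(p, q)) for i in range(cols)]
    return solve_linear(ptp, ptq)

def combine(p, a):
    """Evaluate the linear combination P a."""
    return [sum(c * v for c, v in zip(a, row)) for row in p]

def iterative_alignment(p, x, steps=3, tiny=1e-12):
    """Return the successive approximations x^(0), x^(1), ... of the image x."""
    approx = combine(p, least_squares(p, x))        # x^(0) = P a^(0)
    delta = approx[:]                               # Delta^(0) = x^(0)
    history = [approx]
    for _ in range(steps):
        if min(abs(d) for d in delta) < tiny:       # added guard: ratio undefined
            break
        q = [(xi - ai) / di for xi, ai, di in zip(x, approx, delta)]
        fit = combine(p, least_squares(p, q))       # P a^(k)
        delta = [f * d for f, d in zip(fit, delta)] # element-wise product
        approx = [ai + di for ai, di in zip(approx, delta)]
        history.append(approx)
    return history

# Assumed model data: rows of P = [x1, y1, x2, 1] for six points.
x1 = [0.0, 1.0, 0.3, -0.7, 0.9, -0.4]
y1 = [0.0, 0.2, 1.1, 0.5, -0.8, -0.6]
x2 = [0.3, 1.5, 0.4, 0.2, 1.6, -0.5]
model = [[u, v, w, 1.0] for u, v, w in zip(x1, y1, x2)]

# An image that is an exact linear combination of the model views.
true_coeffs = [0.5, -0.2, 0.7, 0.3]
image = combine(model, true_coeffs)

fitted = least_squares(model, image)
history = iterative_alignment(model, image)
final_error = max(abs(u - v) for u, v in zip(history[-1], image))
```

For an image taken under true perspective, `image` would hold the measured coordinates instead, and the entries of `history` could be compared against it to see whether the successive corrections reduce the residual for that particular scene.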

5 Projection Model – Error Analysis

In this section we estimate the error obtained by using the linear combination method. The method assumes a weak perspective projection model. We compare this assumption with the more accurate perspective projection model.

A point $(X, Y, Z)$ is projected under the perspective model to $(x, y) = (fX/Z, fY/Z)$ in the image, where $f$ denotes the focal length. Under our weak perspective model the same point is approximated by $(\hat{x}, \hat{y}) = (sX, sY)$, where $s$ is a scaling factor. The best estimate for the scaling factor is given by $s = f/Z_0$, where $Z_0$ is the average depth of the observed environment. Denote the error by

\[
E = |\hat{x} - x|
\tag{25}
\]

The error is expressed by

\[
E = \left| fX \left( \frac{1}{Z_0} - \frac{1}{Z} \right) \right|
\tag{26}
\]

Changing to image coordinates,

\[
E = \left| xZ \left( \frac{1}{Z_0} - \frac{1}{Z} \right) \right|
\tag{27}
\]

or

\[
E = |x| \left| \frac{Z}{Z_0} - 1 \right|
\tag{28}
\]

The error is small when the measured feature is close to the optical axis, or when our estimate for the depth, $Z_0$, is close to the real depth, $Z$. This supports the basic intuition that for images with low depth variance and for fixated regions (regions near the center of the image), the obtained perspective distortions are relatively small, and the system can therefore describe the scene with high accuracy. Figures 1 and 2 show the depth ratio $Z/Z_0$ as a function of $x$ for $\epsilon = 10$ and 20 pixels, and Table 1 shows a number of examples for this function. The allowed depth variance, $Z/Z_0$, is computed as a function of $x$ and the tolerated error, $\epsilon$. For example, a 10 pixel error tolerated in a field of view of up to $\pm 50$ pixels is equivalent to allowing depth variations of 20%. From this discussion it is apparent that when a model is aligned to the image, the results of this alignment should be judged differently at different points of the image. The farther away a point is from the center, the more discrepancy should be tolerated between the prediction and the actual image. A five pixel error at position $x = 50$ is equivalent to a 10 pixel error at position $x = 100$.
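The allowed depth ratio follows from Eq. (28) by requiring $E \le \epsilon$, which gives $Z/Z_0 \le 1 + \epsilon/|x|$ on the far side of the average depth. The short sketch below evaluates this limit; the function name `allowed_depth_ratio` and the chosen sample values are illustrative assumptions.

```python
def allowed_depth_ratio(x: float, eps: float) -> float:
    """Largest Z / Z0 for which Eq. (28) keeps the error within eps pixels."""
    if x == 0:
        raise ValueError("the error vanishes on the optical axis; any ratio is allowed")
    return 1.0 + eps / abs(x)


# Depth ratios for a few half-widths x and tolerated errors eps (in pixels).
table = {
    x: [round(allowed_depth_ratio(x, eps), 2) for eps in (5, 10, 15, 20)]
    for x in (25, 50, 75, 100)
}
for x, ratios in table.items():
    print(x, ratios)
```

For instance, a tolerance of 10 pixels at $x = 50$ gives a ratio of 1.2, which is the 20% depth variation quoted in the text, and a tolerance of 5 pixels at $x = 50$ allows the same ratio as 10 pixels at $x = 100$.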

Figure 1: $Z/Z_0$ as a function of $x$ for $\epsilon = 10$ pixels (plotted curve: $10/x + 1$).

Figure 2: $Z/Z_0$ as a function of $x$ for $\epsilon = 20$ pixels (plotted curve: $20/x + 1$).

  x      ε = 5    ε = 10   ε = 15   ε = 20
  25     1.2      1.4      1.6      1.8
  50     1.1      1.2      1.3      1.4
  75     1.07     1.13     1.2      1.27
  100    1.05     1.1      1.15     1.2

Table 1: Allowed depth ratios, $Z/Z_0$, as a function of $x$ (half the width of the field considered) and the error allowed ($\epsilon$, in pixels).

So far we have considered the discrepancies between the weak perspective and the perspective projections of points. The accuracy of the LC scheme depends on the validity of the weak perspective projection both in the model views and for the incoming image. In the rest of this section we develop an error term for the LC scheme assuming that both the model views and the incoming image are obtained by perspective projection.

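The error obtained by using the LC scheme is given by

$$E = |x - a x_1 - b y_1 - c x_2 - d| \qquad (29)$$

Since the scheme accurately predicts the appearances of points under weak perspective projection, it satisfies

$$\hat{x} = a \hat{x}_1 + b \hat{y}_1 + c \hat{x}_2 + d \qquad (30)$$

where accented letters represent orthographic approximations. Assume that in the two model pictures the depth ratios are roughly equal:

$$\frac{Z_0^M}{Z^M} = \frac{Z_{01}}{Z_1} \approx \frac{Z_{02}}{Z_2} \qquad (31)$$

(This condition is satisfied, for example, when between the two model images the camera only translates along the image plane.) Using the fact that

$$x = \frac{fX}{Z} = \frac{fX}{Z_0} \frac{Z_0}{Z} = \hat{x} \frac{Z_0}{Z} \qquad (32)$$

we obtain

$$
\begin{aligned}
E &= |x - a x_1 - b y_1 - c x_2 - d| \\
&\approx \left| \hat{x} \frac{Z_0}{Z} - a \hat{x}_1 \frac{Z_0^M}{Z^M} - b \hat{y}_1 \frac{Z_0^M}{Z^M} - c \hat{x}_2 \frac{Z_0^M}{Z^M} - d \right| \\
&= \left| \hat{x} \frac{Z_0}{Z} - (a \hat{x}_1 + b \hat{y}_1 + c \hat{x}_2) \frac{Z_0^M}{Z^M} - d \right| \\
&= \left| \hat{x} \frac{Z_0}{Z} - (\hat{x} - d) \frac{Z_0^M}{Z^M} - d \right| \\
&= \left| \hat{x} \left( \frac{Z_0}{Z} - \frac{Z_0^M}{Z^M} \right) + d \left( \frac{Z_0^M}{Z^M} - 1 \right) \right| \\
&\le |\hat{x}| \left| \frac{Z_0}{Z} - \frac{Z_0^M}{Z^M} \right| + |d| \left| \frac{Z_0^M}{Z^M} - 1 \right|
\end{aligned}
\qquad (33)
$$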

The error therefore depends on two terms. The first gets smaller as the image points get closer to the center of the frame and as the difference between the depth ratios of the model and the image gets smaller. The second gets smaller as the translation component gets smaller and as the model gets close to orthographic.

Following this analysis, weak perspective can be used as a projection model when the depth variations in the scene are relatively low and when the system concentrates on the center of the image. We conclude that, by fixating on distinguished parts of the environment, the linear combinations scheme can be used for localization and positioning.

6 Imposing Constraints

Localization and positioning require a large memory and a great deal of on-line computation. A large number of models must be stored to enable the robot to navigate and manipulate in relatively large and complicated environments. The computational cost of model-image comparison is high, and if context (such as path history) is not available the number of required comparisons may get very large. To reduce this computational cost a number of constraints may be employed. These constraints take advantage of the structure of the robot, the properties of indoor environments, and the natural properties of the navigation task. This section examines some of these constraints.

One thing a system may attempt to do is to build the set of models so as to reduce the effect of perspective distortions, in order to avoid performing iterative computations. Views of the environment obtained when the system looks relatively deep into the scene usually satisfy this condition. When perspective distortions are large the system may consider modeling subsets of views related by a translation parallel to the image plane (perpendicular to the line of sight). In this case the depth values of the points are roughly equal across all views considered, and it can be shown that novel views can be expressed by linear combinations of two model views even in the presence of large perspective distortions. This becomes apparent from the following derivation. Let $(X_i, Y_i, Z_i)$, $1 \le i \le n$, be a point projected in the image to $(x_i, y_i) = (fX_i/Z_i, fY_i/Z_i)$, and let $(x'_i, y'_i)$ be the projected point after applying a rigid transformation. Assuming that $Z'_i = Z_i$ (and taking $f = 1$) we obtain

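$$
\begin{aligned}
Z_i x'_i &= r_{11} X_i + r_{12} Y_i + r_{13} Z_i + t_x \\
Z_i y'_i &= r_{21} X_i + r_{22} Y_i + r_{23} Z_i + t_y
\end{aligned}
\qquad (34)
$$

Dividing by $Z_i$ we obtain

$$
\begin{aligned}
x'_i &= r_{11} x_i + r_{12} y_i + r_{13} + t_x \frac{1}{Z_i} \\
y'_i &= r_{21} x_i + r_{22} y_i + r_{23} + t_y \frac{1}{Z_i}
\end{aligned}
\qquad (35)
$$

Rewriting this in vector equation form gives

$$
\begin{aligned}
\mathbf{x}' &= r_{11} \mathbf{x} + r_{12} \mathbf{y} + r_{13} \mathbf{1} + t_x \mathbf{z}^{-1} \\
\mathbf{y}' &= r_{21} \mathbf{x} + r_{22} \mathbf{y} + r_{23} \mathbf{1} + t_y \mathbf{z}^{-1}
\end{aligned}
\qquad (36)
$$

where $\mathbf{x}$, $\mathbf{y}$, $\mathbf{x}'$, and $\mathbf{y}'$ are the vectors of $x_i$, $y_i$, $x'_i$, and $y'_i$ values respectively, $\mathbf{1}$ is a vector of all 1s, and $\mathbf{z}^{-1}$ is a vector of $1/Z_i$ values. Consequently, as in the weak perspective case, novel views obtained by a translation parallel to the image plane can be expressed by linear combinations of four vectors.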
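The following sketch checks Eq. (35) numerically for a single point. It assumes $f = 1$ and, as one example of a depth-preserving motion, a rotation by an angle `phi` about the optical axis followed by a translation $(t_x, t_y, 0)$; the function names and sample values are hypothetical.

```python
import math
from typing import Tuple


def project(X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Perspective projection with focal length f = 1."""
    return X / Z, Y / Z


def move_in_plane(X: float, Y: float, Z: float,
                  phi: float, tx: float, ty: float) -> Tuple[float, float, float]:
    """Rotate about the optical (Z) axis and translate parallel to the image plane."""
    c, s = math.cos(phi), math.sin(phi)
    return c * X - s * Y + tx, s * X + c * Y + ty, Z  # depth is unchanged


def predict(x: float, y: float, inv_z: float,
            phi: float, tx: float, ty: float) -> Tuple[float, float]:
    """Eq. (35): combine x, y, 1, and 1/Z linearly (here r13 = r23 = 0)."""
    c, s = math.cos(phi), math.sin(phi)
    return c * x - s * y + tx * inv_z, s * x + c * y + ty * inv_z


# Hypothetical point and motion.
X, Y, Z = 1.0, 2.0, 4.0
phi, tx, ty = 0.3, 0.5, -0.3
x, y = project(X, Y, Z)
print(project(*move_in_plane(X, Y, Z, phi, tx, ty)))
print(predict(x, y, 1.0 / Z, phi, tx, ty))
```

The two printed pairs coincide up to rounding, because the new image position is obtained from the old one using only the coefficients of the four vectors $\mathbf{x}$, $\mathbf{y}$, $\mathbf{1}$, and $\mathbf{z}^{-1}$.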

An indoor environment usually provides the robot with a flat, horizontal support. Consequently, the motion of the camera is often constrained to rotation about the vertical ($Y$) axis and to translation in the $XZ$-plane. Such motion has only three degrees of freedom instead of the six degrees of freedom in the general case. Under this constraint fewer correspondences are required to align the model with the image. For example, in Eq. (4) (above) the coefficients $a_2 = b_1 = b_3 = b_4 = 0$. Three points rather than four are required to determine the coefficients by solving a linear system. Two, rather than three, are required if the quadratic constraints are also considered. Another advantage to considering only horizontal motion is the fact that this motion constrains the possible epipolar lines between images. This fact can be used to simplify the task of correspondence seeking.

Objects in indoor environments sometimes appear in roughly planar settings. In particular, the relatively static objects tend to be located along walls. Such objects include windows, shelves, pictures, closets and tables. When the assumption of orthographic projection is valid (for example, when the robot is relatively distant from the wall, or when the line of sight is roughly perpendicular to the wall) the transformation between any two views can be described by a 2D affine transformation. The dimension of the space of views of the scene is then reduced to three (rather than four), and Eq. (4) becomes

$$
\begin{aligned}
\mathbf{x}' &= a_1 \mathbf{x}_1 + a_2 \mathbf{y}_1 + a_4 \mathbf{1} \\
\mathbf{y}' &= b_1 \mathbf{x}_1 + b_2 \mathbf{y}_1 + b_4 \mathbf{1}
\end{aligned}
\qquad (37)
$$

($a_3 = b_3 = 0$.) Only one view is therefore sufficient to model the scene.

Most office-like indoor environments are composed of rooms connected by corridors. Navigating in such an environment involves maneuvering through the corridors, entering and exiting the rooms. Not all points in such an environment are equally important. Junctions, places where the robot faces a number of options for changing its direction, are more important than other places for navigation. In an indoor environment these places include the thresholds of rooms and the beginnings and ends of corridors. A navigation system would therefore tend to store more models for these points than for others.

One important property shared by many junctions is that they are confined to relatively small areas. Consider for example the threshold of a room. It is a relatively narrow place, and a common behavior there includes stepping through the door, looking into the room, and identifying it before a decision is made to enter the room or to avoid it. The set of interesting images for this task includes the set of views of the room from its entrance. Provided that thresholds are narrow, these views are related to each other almost exclusively by rotation around the vertical axis. Under perspective projection, such a rotation is relatively easy to recover. The position of points in novel views can be recovered from one model view only. This is apparent from the following derivation. Consider a point $p = (X, Y, Z)$. Its position in a model view is given by $(x, y) = (fX/Z, fY/Z)$. Now, consider another view obtained by a rotation $R$ around the camera. The location of $p$ in the new view is given by (assuming $f = 1$)

$$(x', y') = \left( \frac{r_{11} X + r_{12} Y + r_{13} Z}{r_{31} X + r_{32} Y + r_{33} Z}, \; \frac{r_{21} X + r_{22} Y + r_{23} Z}{r_{31} X + r_{32} Y + r_{33} Z} \right) \qquad (38)$$

implying that

$$(x', y') = \left( \frac{r_{11} x + r_{12} y + r_{13}}{r_{31} x + r_{32} y + r_{33}}, \; \frac{r_{21} x + r_{22} y + r_{23}}{r_{31} x + r_{32} y + r_{33}} \right) \qquad (39)$$

Depth is therefore not a factor in determining the relation between the views. Eq. (39) becomes even simpler if only rotations about the $Y$-axis are considered:

$$(x', y') = \left( \frac{x \cos\alpha + \sin\alpha}{-x \sin\alpha + \cos\alpha}, \; \frac{y}{-x \sin\alpha + \cos\alpha} \right) \qquad (40)$$

where $\alpha$ is the angle of rotation. In this case $\alpha$ can be recovered merely from a single correspondence.
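One way to recover the angle from a single correspondence, not spelled out in the text, is to note that with $x = \tan\varphi$ the first component of Eq. (40) is $x' = \tan(\varphi + \alpha)$, so that $\alpha = \arctan x' - \arctan x$. The sketch below illustrates this under the assumptions that $f = 1$ and that the point stays in front of the camera; the function names and the sample point are hypothetical.

```python
import math
from typing import Tuple

Point3 = Tuple[float, float, float]


def rotate_about_y(p: Point3, alpha: float) -> Point3:
    """Rotate a 3D point about the Y-axis (r11 = r33 = cos, r13 = -r31 = sin)."""
    X, Y, Z = p
    c, s = math.cos(alpha), math.sin(alpha)
    return c * X + s * Z, Y, -s * X + c * Z


def project(p: Point3) -> Tuple[float, float]:
    """Perspective projection with focal length f = 1."""
    X, Y, Z = p
    return X / Z, Y / Z


def predict(x: float, y: float, alpha: float) -> Tuple[float, float]:
    """Eq. (40): position in the rotated view from the model view alone."""
    denom = -x * math.sin(alpha) + math.cos(alpha)
    return (x * math.cos(alpha) + math.sin(alpha)) / denom, y / denom


def recover_angle(x: float, x_new: float) -> float:
    """Rotation angle from one correspondence, using x' = tan(atan(x) + alpha)."""
    return math.atan(x_new) - math.atan(x)


# Hypothetical point and rotation of 0.1 radians.
p, alpha = (0.5, 0.2, 4.0), 0.1
x, y = project(p)
x_new, y_new = project(rotate_about_y(p, alpha))
print(recover_angle(x, x_new), predict(x, y, alpha), (x_new, y_new))
```

The recovered angle can then be passed to `predict` to transfer every other model point to the new view without any depth information.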

7 Experiments

The LC method was implemented and applied to images taken in an indoor environment. Images of two offices, A and B, that have similar structures were taken using a Panasonic camera with a focal length of 700 pixels. Semi-static objects, such as heavy furniture and pictures, were used to distinguish between the offices. Figure 3 shows two model views of office A. The views were taken at a distance of about 4m from the wall. Correspondences were picked manually. The results of aligning the model views to images of the two offices are presented in Figure 4. The left image contains an overlay of a predicted image (the thick white lines), constructed by linearly combining the two views, and an actual image of office A. A good match between the two was achieved. The right image contains an overlay of a predicted image constructed from a model of office B and an image of office A. Because the offices share a similar structure the static cues (the wall corners) were perfectly aligned. The semi-static cues, however, did not match any features in the image.

Figure 5 shows the matching of the model of office A with an image of the same office obtained by a relatively large motion forward (about 2m) and to the side (about 1.5m). Although the distances are relatively short, most perspective distortions are negligible, and a good match between the model and the image is obtained.

Figure 4: Matching a model of office A to an image of office A (left), and matching a model of office B to the same image (right).

Figure 5: Matching a model of office A to an image of the same office obtained by a relatively large motion.

Figure 7: Matching the corridor model with two images of the corridor. The right image was obtained by a relatively large motion forward (about half of the corridor length) and to the right.

Another set of images was taken in a corridor. Here, because of the deep structure of the corridor, perspective distortions are noticeable. Nevertheless, the alignment results demonstrate an accurate match in large portions of the image. Figure 6 shows two model views of the corridor. Figure 7 (left) shows an overlay of a linear combination of the model views with an image of the corridor. It can be seen that the parts that are relatively distant match perfectly. Figure 7 (right) shows the matching of the corridor model with an image obtained by a relatively large motion (about half of the corridor length). Because of perspective distortions the relatively near features no longer align (e.g., the near door edges). The relatively far features, however, still match.

The next experiment shows the application of the iterative process presented in Section 4 [...] and Figure 9 shows the results of matching a linear combination of the model views to an image of the same office. In this case, because the image was taken from a relatively close distance, perspective distortions cannot be neglected. The effects of perspective distortions can be noticed on the right corner of the board, and on the edges of the hanger on the top right. Perspective effects were reduced by using the iterative process. The results of applying this procedure after one and three iterations are shown in Figure 10.

The experimental results demonstrate that the LC method achieves accurate localization in many cases, and that when the method fails because of large perspective distortions an iterative computation can be used to improve the quality of the match.

8 Conclusions

A method of localization and positioning in an indoor environment was presented. The method is based on representing the scene as a set of 2D views and predicting the appearance of novel views by linear combinations of the model views. The method accurately approximates the appearances of scenes under weak perspective projection. Analysis of this projection as well as experimental results demonstrate that in many cases this approximation is sufficient to accurately describe the scene. When the weak perspective approximation is invalid, either a larger number of models can be acquired or an iterative solution can be employed to account for the perspective distortions.

The method presented in this paper has several advantages over existing methods. It uses relatively rich representations; the representations are 2D rather than 3D, and localization can be done from a single 2D view only. The same basic method is used in both the localization and positioning problems, and a simple algorithm for repositioning is derived from this method.

Future work includes handling the problem of acquisition and maintenance of models, developing efficient and robust algorithms for solving the correspondence problem, and building maps using visual input.

Appendix

In this appendix we derive the explicit values of the coefficients of the linear combinations for the case of horizontal motion. Consider a point $p = (x, y, z)$ that is projected by weak perspective to three images, $P_1$, $P_2$, and $P'$; $P_2$ is obtained from $P_1$ by a rotation about the $Y$-axis by an angle $\alpha$, translation $t_m$, and scale factor $s_m$, and $P'$ is obtained from $P_1$ by a rotation about the $Y$-axis by an angle $\theta$, translation $t_p$, and scale $s_p$. The position of $p$ in the three images is given by

$$
\begin{aligned}
(x_1, y_1) &= (x, y) \\
(x_2, y_2) &= (s_m x \cos\alpha + s_m z \sin\alpha + t_m, \; s_m y) \\
(x', y') &= (s_p x \cos\theta + s_p z \sin\theta + t_p, \; s_p y)
\end{aligned}
$$

Figure 9: Matching the model to an image obtained by a relatively large motion. Perspective distortions can be seen in the table, the board, and the hanger at the upper right.

Figure 10: The results of applying the iterative process to reduce perspective distortions.

The point $(x', y')$ can be expressed by a linear combination of the first two points:

$$
\begin{aligned}
x' &= a_1 x_1 + a_2 x_2 + a_3 \\
y' &= b y_1
\end{aligned}
$$

Rewriting these equations we get

$$
\begin{aligned}
s_p x \cos\theta + s_p z \sin\theta + t_p &= a_1 x + a_2 (s_m x \cos\alpha + s_m z \sin\alpha + t_m) + a_3 \\
s_p y &= b y
\end{aligned}
$$

Equating the values for the coefficients in both sides of these equations we obtain

$$
\begin{aligned}
s_p \cos\theta &= a_1 + a_2 s_m \cos\alpha \\
s_p \sin\theta &= a_2 s_m \sin\alpha \\
t_p &= a_2 t_m + a_3 \\
s_p &= b
\end{aligned}
$$

and the coefficients are therefore given by

$$
\begin{aligned}
a_1 &= \frac{s_p \sin(\alpha - \theta)}{\sin\alpha} \\
a_2 &= \frac{s_p \sin\theta}{s_m \sin\alpha} \\
a_3 &= t_p - \frac{t_m s_p \sin\theta}{s_m \sin\alpha} \\
b &= s_p
\end{aligned}
$$
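The closed-form coefficients can be checked numerically. The sketch below builds the three weak perspective views for one point and verifies that the novel view is reproduced by the linear combination; the function names and all numeric values are hypothetical, and the angle of the second model view is assumed to satisfy $\sin\alpha \ne 0$.

```python
import math
from typing import Tuple


def weak_perspective_view(p: Tuple[float, float, float],
                          s: float, angle: float, t: float) -> Tuple[float, float]:
    """Image of p after a rotation about the Y-axis, scaling s, and translation t."""
    x, y, z = p
    return s * x * math.cos(angle) + s * z * math.sin(angle) + t, s * y


def lc_coefficients(s_m: float, alpha: float, t_m: float,
                    s_p: float, theta: float, t_p: float
                    ) -> Tuple[float, float, float, float]:
    """Coefficients (a1, a2, a3, b) for horizontal motion; requires sin(alpha) != 0."""
    a1 = s_p * math.sin(alpha - theta) / math.sin(alpha)
    a2 = s_p * math.sin(theta) / (s_m * math.sin(alpha))
    a3 = t_p - a2 * t_m
    return a1, a2, a3, s_p


# Hypothetical point and view parameters.
p = (1.0, 2.0, 3.0)
s_m, alpha, t_m = 1.1, 0.4, 0.5      # second model view P2
s_p, theta, t_p = 0.9, 0.15, -0.2    # novel view P'

x1, y1 = p[0], p[1]                  # first model view P1
x2, _ = weak_perspective_view(p, s_m, alpha, t_m)
x_new, y_new = weak_perspective_view(p, s_p, theta, t_p)

a1, a2, a3, b = lc_coefficients(s_m, alpha, t_m, s_p, theta, t_p)
print(a1 * x1 + a2 * x2 + a3, x_new)
print(b * y1, y_new)
```

In each printed pair the two numbers agree up to rounding, independently of the depth $z$ of the point, which is what allows the same coefficients to be used for every point in the view.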
