MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY

A.I. Memo No. 1391 December, 1992

Recognition by Prototypes

Ronen Basri

Abstract

A scheme for recognizing 3D objects from single 2D images is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a class is very low, because correspondence is computed once for the whole class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series of simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.

Copyright (c) Massachusetts Institute of Technology, 1993

This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology and the McDonnell-Pew Center for Cognitive Neuroscience. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-91-J-4038. Ronen Basri is supported by the McDonnell-Pew and the Rothschild postdoctoral fellowships.


Our world contains an overwhelming variety of objects. While people demonstrate outstanding abilities to memorize and recognize thousands of objects [27, 37, 38], computer vision applications largely fail to accommodate these numbers. Apparently, the main tool that enables people to handle this massive number of objects effectively is categorization. By dividing the objects into classes, the visual system is capable of inferring properties of unfamiliar objects from their resemblance to familiar ones. For familiar objects, categorization offers an indexing tool into the stored library of object representations.

Recognition can be performed at different "levels of abstraction". For example, the same object can be recognized as a face, a human face, or as a specific person's face. Psychological studies suggest the existence of a preferred level for recognition, called "the basic level of abstraction" [33]. Existing computational schemes usually approach recognition at one of two levels. Several schemes attempt to classify objects at their basic level of abstraction (we refer to this task as categorization), while other schemes attempt to determine the specific identity of objects (we refer to this task as identification). This paper presents a novel approach to recognition that combines the two tasks.

To see how the two tasks are related, consider the following example. Suppose you are walking down a street, and someone is coming towards you. You look at the person's face, and it looks familiar, but you cannot tell who it is. So you try to picture the people you know who look like the person you see, until finally, you realize who the person is.

A number of hypotheses can be drawn from this story. First, recognition can be broken into two stages: categorization and identification, where categorization is believed to precede identification. Second, during the course of recognition the image is compared against a number of object models. Assuming that indeed categorization precedes identification, only models that belong to the object's class need to be considered. Finally, when a new model is compared to the image, the comparison process may benefit from the use of information acquired during categorization. Note that the situation described here is not specific to faces. One can imagine that similar situations occur when other objects, such as animals, cars, and chairs, are observed.

To see how information acquired during categorization can be used for identification, consider the example of face recognition. When a face is recognized, the image positions of its parts and features are known. In particular, an observer already knows where the eyes, nose, and mouth are and can even infer the direction of gaze and expression. The person's identity is not essential for extracting and locating these features. Instead, they are matched against features in a "generic" representation. In addition, other features, such as a beard, hair style, and wrinkles, that may better distinguish between different persons may be located. More generally, we can postulate that, during categorization, sub-structures of the object are extracted and located with respect to a generic model, and the object's pose is determined.

Following this example, I propose a scheme for recognizing 3D objects from single 2D views that combines the two stages, categorization and identification. Categorization is achieved by aligning the image to prototype objects. The prototype that appears most similar to the image determines the class identity of the object. After the object is categorized, its specific identity is determined by aligning the observed object to individual models of its class. By first categorizing the object, not only is the number of models considered for identification reduced, but the cost of comparing each model to the image also decreases significantly. This is achieved by reusing the correspondence and pose computed for the prototype in the categorization stage to align the image with the individual models. We show in this paper that, although a perfect match between the prototype and the image is not obtainable, the correspondence and pose can be computed for the prototype, and can be used to bring the image and the object's model into alignment. Consequently, recovering the correspondence and pose for the individual models becomes unnecessary, and identification is reduced to a series of simple template comparisons.

The rest of this paper is organized as follows. Section 2 reviews the main existing approaches to categorization and identification. Section 3 presents the scheme of recognition by prototypes. Section 4 proposes an algorithm for generating optimal prototypes for the scheme. Section 5 discusses the relevance of the scheme to human recognition. Implementation results are presented in Section 6.

2 Previous Approaches

Existing schemes for categorization often use a "reductionist" approach. The image, which contains a detailed appearance of an object, is transformed into a compact representation that is invariant for all objects of the same class. One common approach to generating such a representation is to decompose the object into parts. Parts are extracted by cutting the object at concavities [17, 22, 43] and labeled according to their general shape. The labels, together with the spatial relationships between the parts, are used to identify the class of the object [4, 6, 7, 26]. A second approach extracts the parts of the object that fulfill certain functions. The list of functions is used to determine the object's class [16, 39, 47].

Schemes that break objects into parts are insufficient to explain all the aspects of recognition for the following reasons. First, in many cases objects that belong to the same class differ only in their detailed shape, while they share roughly the same set of parts. Moreover, even objects that at some level may be considered as belonging to different classes, such as a cat and a dog, may also share roughly the same set of parts. To solve this problem several systems also store, in addition to the part structure of the objects, the detailed shape of the parts [2, 6, 7]. Another problem is that many of the techniques for recognizing objects by part decomposition rely on finding [...]


To recognize the specific identity of objects, a relatively detailed representation of the object's shape is compared with the image. An example of such methods is alignment [3, 9, 12, 13, 18, 25, 40, 41]. Alignment involves recovering the position and orientation (pose) in which the object is observed and comparing the appearance of the object from that pose with the image. Only a few attempts have been made in the past to extend the alignment scheme to the problem of object categorization (e.g., [36]). The main difficulty in applying the alignment approach is the recovery of the pose of the observed object. In most implementations this involves a time-consuming stage for finding the correspondence between the model and the image. The process becomes impractical when the image is compared against a large library of objects, because typically the correspondence is established between the image and each of the models in the library separately.

To handle large libraries, indexing methods were proposed (e.g., [20, 46, 14]). The basic idea is the following. A certain function is defined and applied to the views of all the objects in the library. The object models are arranged in a look-up table indexed by the obtained function values. When an image is given, the function is applied to the image, and the obtained value is used to index into the table. To reduce the size of the table and the complexity of its preparation, invariant functions, that is, functions that when applied to different views of an object return the same value regardless of viewpoint, are often used as the indexing functions.

Indexing methods suffer from several shortcomings. First, existing indexing methods handle only rigid objects. Extending these methods to handle classes of objects has not been discussed. Second, because of complexity issues, indexing functions usually are applied to small numbers of features. As a result, high rates of false positives are obtained, and the effectiveness of the indexing is reduced.

The scheme presented in this paper is designed to work where traditional approaches to categorization and indexing fail. The scheme combines both categorization and identification of objects, and uses fairly detailed representations for objects. Rather than indexing directly to the specific object model, the scheme indexes into the library of objects by categorizing the object. The classes handled by the scheme include objects with relatively similar shapes. To fit into the scheme, in some cases basic level classes are broken into sub-classes. The general problem of categorization therefore may require additional tools.

3 Recognition by Prototypes

The recognition by prototypes scheme proceeds as follows. A library of 3D object models is stored in memory. The models in the library are divided into classes, and 3D prototype objects are selected to represent the classes. For every class, the models in the class are aligned in the library with the prototype object. The role of this 3D alignment will become clear shortly.

[...] matched against all of the prototypes. For each prototype object, the system attempts to recover the view of the prototype that most resembles the image. To do so, the system recovers the correspondence between the prototype and the image, and, using this correspondence, it determines the transformation that best aligns the prototype with the image. This transformation, referred to as the prototype transform, is then applied to the prototype, and the similarity between the transformed prototype and the actual image is evaluated. Since the observed object in general differs from the prototype object, a perfect match between the two is not anticipated. The system therefore seeks a prototype that reasonably matches the image. Once such a prototype is found, the class identity of the object is determined.

After the object's class is determined, the system attempts to recover the specific identity of the object. At this stage, the image is matched against all the models of the object's class. For each of these models, the system seeks to recover the transformation that aligns the model with the image. As will be shown below, since the models are aligned in the library with the prototype, the transformation that best aligns the prototype with the image is identical to the transformation that aligns the model to the image. The prototype transform therefore is applied to the specific models, and their appearance from this pose is compared with the image. The model that aligns with the image, if there is one, determines the specific identity of the object.

The rest of this section is organized as follows. In Section 3.1 the object representation used in our scheme is presented. Section 3.2 describes the categorization stage, and Section 3.3 describes the identification stage.

3.1 Object representation - the linear combination scheme

In our scheme, an object is modeled by a matrix $M$ of size $n \times k$, where $n$ is the number of feature points, and $k$ represents the degrees of freedom of the object. A vector $\vec{a} \in \mathbb{R}^k$, referred to as the transform vector, represents the transformation applied to the object in a certain view, and the object's appearance from this view is given by

$$\vec{v} = M\vec{a} \qquad (1)$$

In the rest of this section we explain the use of this notation. The notation follows from the linear combination scheme [42], which is briefly reviewed below.

Under the linear combination scheme an object is modeled by a small set of views, each represented by a vector containing point positions, where the points in these views are ordered in correspondence. Novel views of the object are obtained by applying linear combinations to the stored views. Additional constraints may apply to the coefficients of this linear combination. Computing the object pose therefore requires recovering the coefficients of the linear combination that align the model with the image and verifying that the recovered coefficients indeed satisfy the constraints. The method handles rigid objects under weak-perspective projection (namely, orthographic projection followed by a uniform [...] of objects with smooth bounding surfaces and to handle articulated objects. In our representation, the columns of the model matrix $M$ contain views of the object, and the coefficients of the linear combination that align the model with the image are given by the transform vector $\vec{a}$.

For concreteness, we review the linear combination scheme for rigid objects. Consider a 3D object $O$ that contains $n$ feature points $(X_i, Y_i, Z_i)$, $1 \le i \le n$. Under weak-perspective projection, the position of the object following a rotation $R$, translation $\vec{t}$, and scaling $s$ is given by

$$\begin{aligned}
x_i &= s r_{11} X_i + s r_{12} Y_i + s r_{13} Z_i + t_x \\
y_i &= s r_{21} X_i + s r_{22} Y_i + s r_{23} Z_i + t_y
\end{aligned} \qquad (2)$$

where $r_{ij}$ are the components of the rotation matrix $R$, and $t_x$, $t_y$ are the horizontal and vertical components of the translation vector $\vec{t}$, respectively.

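Denoting by $\vec{X}, \vec{Y}, \vec{Z}, \vec{x}, \vec{y} \in \mathbb{R}^n$ the vectors of $X_i$, $Y_i$, $Z_i$, $x_i$, and $y_i$ values, respectively, and denoting $\vec{1} = (1, \ldots, 1) \in \mathbb{R}^n$, we can rewrite Eq. 2 as a vector equation as follows:

$$\begin{aligned}
\vec{x} &= a_1 \vec{X} + a_2 \vec{Y} + a_3 \vec{Z} + a_4 \vec{1} \\
\vec{y} &= b_1 \vec{X} + b_2 \vec{Y} + b_3 \vec{Z} + b_4 \vec{1}
\end{aligned} \qquad (3)$$

where

$$\begin{aligned}
a_1 &= s r_{11} & b_1 &= s r_{21} \\
a_2 &= s r_{12} & b_2 &= s r_{22} \\
a_3 &= s r_{13} & b_3 &= s r_{23} \\
a_4 &= t_x & b_4 &= t_y
\end{aligned}$$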

Therefore

$$\vec{x}, \vec{y} \in \mathrm{span}\{\vec{X}, \vec{Y}, \vec{Z}, \vec{1}\} \qquad (4)$$

Different views of the object are obtained by changing the rotation, scale, and translation parameters, and these changes result in changing the coefficients in Eq. 3. We may therefore conclude that all the views of a rigid object are contained in a 4D linear space.

This property, that the views of a rigid object are contained in a 4D linear space, provides a method for constructing viewer-centered representations for the object. The idea is to use images of the object to construct a basis for this space. In general, two views provide sufficiently many vectors. Therefore, any novel view is a linear combination of two views [30, 42].

Not every linear combination is a valid view of a rigid object. Following the orthonormality of the row vectors of the rotation matrix, the coefficients in Eq. 3 must satisfy the two quadratic constraints

$$\begin{aligned}
a_1^2 + a_2^2 + a_3^2 &= b_1^2 + b_2^2 + b_3^2 \\
a_1 b_1 + a_2 b_2 + a_3 b_3 &= 0
\end{aligned} \qquad (5)$$

When the constraints are not satisfied, distorted (by stretch or shear) pictures of the objects are generated. In case a viewer-centered representation is used, the constraints change in accordance with the selected basis. A third view of the object can be used to recover the new constraints.
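The two quadratic constraints can be checked numerically. The following is a minimal sketch, assuming NumPy; the function name `satisfies_rigidity` and the tolerance `tol` are hypothetical and not part of the original scheme. It tests coefficient vectors built from a scaled in-plane rotation and from a stretched version of it.

```python
import numpy as np

def satisfies_rigidity(a, b, tol=1e-9):
    """Check the two quadratic constraints on the first three coefficients.

    a, b: length-4 coefficient vectors (a_1..a_4), (b_1..b_4).
    tol:  hypothetical numerical tolerance (an assumption of this sketch).
    """
    a3 = np.asarray(a[:3], dtype=float)
    b3 = np.asarray(b[:3], dtype=float)
    equal_norm = abs(a3 @ a3 - b3 @ b3) <= tol   # a1^2+a2^2+a3^2 = b1^2+b2^2+b3^2
    orthogonal = abs(a3 @ b3) <= tol             # a1*b1 + a2*b2 + a3*b3 = 0
    return bool(equal_norm and orthogonal)

# Coefficients taken from the first two rows of a scaled rotation about the Z axis.
theta, s = 0.7, 2.0
a_rigid = [s * np.cos(theta), -s * np.sin(theta), 0.0, 0.3]
b_rigid = [s * np.sin(theta),  s * np.cos(theta), 0.0, -0.1]

# Stretching one coefficient breaks the equal-norm constraint.
a_stretched = [1.5 * a_rigid[0], a_rigid[1], a_rigid[2], a_rigid[3]]

rigid_ok = satisfies_rigidity(a_rigid, b_rigid)
stretched_ok = satisfies_rigidity(a_stretched, b_rigid)
```

The translation components $a_4$ and $b_4$ do not enter the check, since the constraints involve only the rotation and scale terms.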

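For the purpose of this paper a model for a rigid object can be constructed by building the following $n \times 4$ model matrix

$$\begin{aligned}
M &= (\vec{X}, \vec{Y}, \vec{Z}, \vec{1}) \\
\vec{x} &= M\vec{a} \\
\vec{y} &= M\vec{b}
\end{aligned} \qquad (6)$$

where $\vec{a} = (a_1, a_2, a_3, a_4)$ and $\vec{b} = (b_1, b_2, b_3, b_4)$ are the coefficients from Eq. 3. Notice that the two linear systems can be merged into one by constructing a modified model matrix in the following way

$$\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} =
\begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix}
\begin{pmatrix} \vec{a} \\ \vec{b} \end{pmatrix} \qquad (7)$$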
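As an illustration of the model matrix, the sketch below (assuming NumPy) builds $M$ for a small set of made-up feature points and confirms that $M\vec{a}$ and $M\vec{b}$ reproduce the weak-perspective projection of Eq. 2. The helper names `rotation_matrix`, `model_matrix`, and `transform_vectors`, as well as the point data, are hypothetical and chosen only for this example.

```python
import numpy as np

def rotation_matrix(ax, ay, az):
    """3x3 rotation composed from rotations about the X, Y and Z axes (radians)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def model_matrix(points):
    """Build the n x 4 model matrix M = (X, Y, Z, 1) from an n x 3 point array."""
    points = np.asarray(points, dtype=float)
    return np.hstack([points, np.ones((points.shape[0], 1))])

def transform_vectors(R, s, t):
    """Coefficient vectors a, b of Eq. 3 for rotation R, scale s, translation (tx, ty)."""
    a = np.append(s * R[0], t[0])
    b = np.append(s * R[1], t[1])
    return a, b

rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))          # made-up object with n = 6 feature points
M = model_matrix(points)

R = rotation_matrix(0.3, -0.5, 1.1)
s, t = 1.7, np.array([0.4, -0.2])
a, b = transform_vectors(R, s, t)
x, y = M @ a, M @ b                       # views via the model matrix (Eq. 6)

# Direct weak-perspective projection (Eq. 2) for comparison.
projected = s * (points @ R.T)[:, :2] + t
```

Both routes give the same image coordinates, which is the content of Eq. 3: changing the pose only changes the coefficient vectors, never $M$.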

Similar constructions can be obtained for objects with smooth bounding surfaces and for articulated objects. The width of $M$, $k$, should then be modified according to the degrees of freedom of the modeled object. As was mentioned above, viewer-centered representations can be obtained by constructing a basis for the 4D space from images of the object. Therefore, viewer-centered models can be obtained by replacing the column vectors of $M$ with the constructed basis.

To summarize, following the linear combination scheme we can represent an object by a matrix $M$ and construct views of the object by applying it to transform vectors $\vec{a}$. For rigid objects not every transform vector is valid; the components of the transform vector must satisfy the two quadratic constraints. Recognition involves recovering the transform vector $\vec{a}$ and verifying that its components satisfy the two constraints. Ignoring these constraints will result in recognizing the object even when it undergoes a general 3D affine transformation. In the analysis below we largely ignore the quadratic constraints. These constraints, however, can be verified both during the categorization stage and during the identification stage.

3.2 Categorization

The recognition by prototypes scheme begins by determining the object's category. This is achieved by comparing the observed object to prototype objects, objects that are "typical exemplars" for their classes. For a given prototype, the view of the prototype that most resembles the image is recovered and compared to the actual image, and the result of this comparison determines the class identity of the object.

We begin our description of the categorization stage by defining the data structures used by the scheme. A class $C = (P, \{M_1, M_2, \ldots, M_l\})$ is a pair that includes a prototype $P$ and a set of object models $M_1, M_2, \ldots, M_l$. Both the prototype and the models are represented by $n \times k$ matrices, where $n$ denotes the number of feature points considered, and $k$ denotes the degrees of freedom of the objects. For the sake of simplicity we assume here that all the objects in the class share the same number of feature points, $n$, and that they have similar degrees of freedom, $k$. Note that similar objects tend to have similar degrees of freedom (e.g., all of them are rigid). Neither assumption is strict, however. The scheme can be modified to tolerate both a varying number of feature points and different degrees of freedom. The details will be discussed later in this paper. Note that [...] viewer-centered representations. In case viewer-centered representations are used we shall assume that the models represent the objects from the same range of viewpoints.

A class in our scheme contains objects with similar shapes. These objects share roughly the same topologies, and there exists a "natural" correspondence between them. Consider, for instance, the two chairs in Figure 1. Although the shapes of these chairs are different, and some parts (e.g., the arms) appear only in one chair and not in the other, a natural correspondence between features in the two objects can be determined.

In the library of models, the natural correspondence between objects is made explicit. It is specified by the order of the row vectors of the models. Specifically, given a prototype $P$ and object models $M_1, \ldots, M_l$, we order the rows of these models such that the first feature point of $P$ corresponds to the first feature point of each of the models $M_1, \ldots, M_l$, and so forth.

Given the library of objects and given an incoming image, the recognition by prototypes scheme begins by categorizing the object observed in the image. To achieve this goal, the prototype objects are aligned and compared to the image. For every prototype, the correspondence between the image and the prototype is first resolved, and, using this correspondence, the nearest prototype view is recovered. By doing so, the scheme decouples the two factors that affect the appearance of the object in the image, namely, view variations and shape variations. By selecting the nearest prototype view to the image, the scheme compensates for view variations. Then, by evaluating the similarity between the nearest prototype view and the actual image, it accounts for the differences in shape between the prototype and the observed object.

The first stage in matching the prototype to the image involves the recovery of correspondence between prototype and image features. In existing systems for recognizing the specific identity of objects, establishing the correspondence between images and object models involves a time-consuming process in which sophisticated algorithms are applied [10, 13, 15, 18, 23, 25, 35, 41]. These algorithms rely on the property that, when the correct correspondence between a model and an image is established, a near-perfect match between the two is obtained. While this assumption is valid for identification, it cannot be used under our scheme since the prototype and the image generally represent different objects.

To determine the correspondence between the prototype and the image, we define an objective function that is applied to the prototype and the image under a given correspondence and that obtains its minimum under the correct correspondence. The objective function will measure the quality of the match between the prototype and the image. Namely, under this measure the correct correspondence is the one that brings the prototype into its best alignment with the image. Given this objective function, correspondence is a combinatorial optimization problem, and so minimization techniques can be used to resolve the correspondence between the prototype and the image. This paper does not propose a specific tech- [...]

Assuming the correspondence problem can be solved, the scheme proceeds as follows. Given a prototype $P$ and an image $I$, we generate a view vector $\vec{v}$ from the image by extracting the location of feature points and arranging them in a vector. The points in $\vec{v}$ are ordered in correspondence to the prototype points; that is, the first point in $\vec{v}$ corresponds to the first point in $P$ and so forth. The prototype transform is the transformation that brings the prototype points as close as possible to their corresponding image points. The prototype transform, therefore, is the transform vector $\vec{b}$ that minimizes the Euclidean distance between the prototype and image points, namely

$$\min_{\vec{b}'} \| P\vec{b}' - \vec{v} \| \qquad (8)$$

A solution for (8) is obtained as follows. Assume $P$ is overdetermined, that is, $P$ is $n \times k$ where $n > k$ and $\mathrm{rank}(P) = k$, and denote by $P^+ = (P^T P)^{-1} P^T$ the pseudo-inverse of $P$. The prototype transform, $\vec{b}$, is given by

$$\vec{b} = P^+ \vec{v} \qquad (9)$$

and the nearest prototype view $\vec{p}$ is obtained by applying $P$ to the prototype transform, $\vec{b}$, that is

$$\vec{p} = P\vec{b} = PP^+\vec{v} \qquad (10)$$
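The least-squares solution can be sketched in a few lines, assuming NumPy. The function names `prototype_transform` and `nearest_prototype_view` and the random test data are hypothetical; `np.linalg.lstsq` is used here as a numerically safer stand-in for forming $(P^T P)^{-1} P^T$ explicitly, and it yields the same $\vec{b}$ when $P$ has full column rank.

```python
import numpy as np

def prototype_transform(P, v):
    """Least-squares transform b = P^+ v (Eq. 9), assuming rank(P) = k."""
    b, *_ = np.linalg.lstsq(P, v, rcond=None)
    return b

def nearest_prototype_view(P, v):
    """Nearest prototype view p = P P^+ v (Eq. 10)."""
    return P @ prototype_transform(P, v)

rng = np.random.default_rng(1)
n = 8
P = np.hstack([rng.normal(size=(n, 3)), np.ones((n, 1))])  # made-up 8 x 4 prototype
v = rng.normal(size=n)                                      # made-up view vector

b = prototype_transform(P, v)
p = nearest_prototype_view(P, v)

# The same transform computed from the closed form (P^T P)^{-1} P^T v.
b_closed_form = np.linalg.inv(P.T @ P) @ P.T @ v
```

At the minimum, the residual $\vec{p} - \vec{v}$ is orthogonal to every column of $P$, which gives a simple sanity check on any implementation.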

The nearest prototype view is now compared to the image, and their resemblance determines the class identity of the object. The quality of the match between the prototype and the image is defined by

$$D(P, \vec{v}) = \|\vec{p} - \vec{v}\| = \|(PP^+ - I)\vec{v}\| \qquad (11)$$

To eliminate effects due to scaling of the object, this measure should be normalized, as is illustrated by the example below. Consider an object seen from some view $\vec{v}_1$. Its distance to the prototype is given by $D(P, \vec{v}_1)$. Suppose the object is now seen from a new view $\vec{v}_2$ that is identical to $\vec{v}_1$, except that the object is now twice as close to the camera. Under these conditions $\vec{v}_2 = 2\vec{v}_1$, and its distance to the prototype is given by $D(P, \vec{v}_2) = 2D(P, \vec{v}_1)$. Clearly, we should have a measure that is independent of the distance of the object to the camera. One way to obtain such a measure is by dividing $D(P, \vec{v})$ by the norm $\|\vec{v}\|$:

$$\hat{D}(P, \vec{v}) = \frac{\|(PP^+ - I)\vec{v}\|}{\|\vec{v}\|} \qquad (12)$$
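The effect of the normalization can be seen in a short sketch, assuming NumPy. The names `match_distance` and `normalized_distance` and the random data are hypothetical; the code doubles a view vector, as in the example above, and compares the two measures.

```python
import numpy as np

def match_distance(P, v):
    """Unnormalized distance D(P, v) = ||P P^+ v - v|| (Eq. 11)."""
    p = P @ (np.linalg.pinv(P) @ v)
    return np.linalg.norm(p - v)

def normalized_distance(P, v):
    """Scale-independent distance D^(P, v) = D(P, v) / ||v|| (Eq. 12)."""
    return match_distance(P, v) / np.linalg.norm(v)

rng = np.random.default_rng(5)
n = 8
P = np.hstack([rng.normal(size=(n, 3)), np.ones((n, 1))])  # made-up prototype
v1 = rng.normal(size=n)                                     # made-up view
v2 = 2.0 * v1                                               # same view, doubled in size

d1, d2 = match_distance(P, v1), match_distance(P, v2)
dn1, dn2 = normalized_distance(P, v1), normalized_distance(P, v2)

# A vector that is an exact view of the prototype has zero distance.
exact_view = P @ rng.normal(size=4)
d_exact = normalized_distance(P, exact_view)
```

Here `d2` is twice `d1`, while `dn1` and `dn2` agree, so a single threshold on the normalized measure behaves the same at any viewing distance.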

$\hat{D}(P, \vec{v})$ is proposed here as an objective function for establishing the correspondence between the prototype and the image. In other words, we expect that if the object belongs to the prototype's class then $\hat{D}(P, \vec{v})$ obtains its minimal value when $\vec{v}$ is ordered in correspondence to $P$. Any other permutation will increase the value of $\hat{D}$. Formally, denoting by $\Pi$ a permutation matrix, we assume that

$$\hat{D}(P, \vec{v}) = \min_{\Pi} \hat{D}(P, \Pi\vec{v}) \qquad (13)$$

The measure $\hat{D}(P, \vec{v})$ has a second role. Since it measures the similarity between the prototype and the image, it can also be used to determine the object's class. An object observed in a view $\vec{v}$ belongs to the class represented by a prototype $P$ if

$$\hat{D}(P, \vec{v}) < \epsilon \qquad (14)$$

for some constant $\epsilon > 0$. We refer to (14) as the categorization criterion.

The categorization stage proceeds as follows. Given an image $I$ and a prototype $P$, the correspondence between $P$ and $I$ is resolved by minimizing the measure $\hat{D}(P, \Pi\vec{v})$ over all possible permutations $\Pi$ of $\vec{v}$, and if the obtained minimum $\hat{D}(P, \vec{v})$ is below the threshold $\epsilon$, then the class identity of the object is determined.
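The categorization stage can be sketched as an exhaustive search, assuming NumPy. This is only an illustration for very small $n$: the paper does not prescribe a minimization technique, and brute force over all $n!$ orderings is an assumption of this example. The names `residual_projector` and `categorize`, the threshold value, and the data are hypothetical. The image is held as an $n \times 2$ array of $(x, y)$ positions, so the same ordering is applied to both coordinates, in the spirit of the merged system of Eq. 7.

```python
import itertools
import numpy as np

def residual_projector(P):
    """Matrix (P P^+ - I), which maps a view to its residual (Eq. 11)."""
    return P @ np.linalg.pinv(P) - np.eye(P.shape[0])

def categorize(P, image_points, eps):
    """Search all orderings of the image points for the smallest normalized distance.

    image_points: n x 2 array of (x, y) positions in unknown order.
    eps:          categorization threshold (hypothetical value chosen by the caller).
    Returns (best_order, best_distance, accepted).
    """
    Q = residual_projector(P)
    scale = np.linalg.norm(image_points)          # invariant under reordering
    best_order, best = None, np.inf
    for order in itertools.permutations(range(len(image_points))):
        d = np.linalg.norm(Q @ image_points[list(order)]) / scale
        if d < best:
            best_order, best = order, d
    return best_order, best, bool(best < eps)

rng = np.random.default_rng(2)
n = 6
proto_pts = rng.normal(size=(n, 3))
P = np.hstack([proto_pts, np.ones((n, 1))])

# A made-up class member: the prototype shape with a small perturbation.
member_pts = proto_pts + 0.01 * rng.normal(size=(n, 3))
M = np.hstack([member_pts, np.ones((n, 1))])

coeffs = rng.normal(size=(4, 2))                  # columns play the roles of a and b
view = M @ coeffs                                 # n x 2, rows ordered like P
image_points = view[rng.permutation(n)]           # ordering hidden from the matcher

order, dist, accepted = categorize(P, image_points, eps=0.25)
```

With $n = 6$ the search visits 720 orderings. For realistic numbers of features a combinatorial optimization method would be needed in place of the exhaustive loop.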

Note that in our scheme the prototype and the categorization criterion determine the actual division of objects into classes; an object belongs to a certain class if its views are sufficiently similar, according to the categorization criterion, to views of the prototype. Under the above definition, an object belongs to a prototype's class if the total difference between its feature points and their corresponding prototype points does not exceed the threshold $\epsilon$.

The measure $\hat{D}(P, \vec{v})$ defined here determines the similarity between the prototype $P$ and the view $\vec{v}$ using only the distances between feature points. In general, since correspondence is difficult to achieve, such a measure would not be robust. Including additional information about the features in the similarity measure may increase the robustness of the scheme. Also, measures that consider only the proximity of feature points are limited in terms of dividing the library into classes, since they induce classes of objects with highly similar shapes. Measures that consider additional information can extend the classes to include larger sets of objects.

The measure $\hat{D}(P, \vec{v})$ can be enriched by considering the similarity between corresponding points. A simple example of a measure that considers both the proximity and similarity between feature points is the following. Each feature point is associated with a label (such as a corner or an inflection point). Again, the measure $\hat{D}(P, \vec{v})$ is applied, but this time only correspondences between points with similar labels are allowed; namely, corners in the image can only match corners in the prototype, and, similarly, inflection points can only match inflection points. Other examples of measures that combine proximity and similarity include measures that retain the tangent or the curvature of points. More sophisticated measures may compare the topologies of the objects in the two views, or, in other words, verify that the objects share similar part structures in 2D.
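One way to read the label restriction is as a pruning of the set of candidate correspondences. The sketch below uses only the Python standard library; the function name `label_consistent_orders` and the label strings are hypothetical. It enumerates just those orderings in which every image point is assigned to a prototype point carrying the same label.

```python
import itertools

def label_consistent_orders(proto_labels, image_labels):
    """Yield orderings where image_labels[order[j]] == proto_labels[j] for every j.

    Yields nothing if the two label multisets differ.
    """
    image_by_label, slots_by_label = {}, {}
    for idx, lab in enumerate(image_labels):
        image_by_label.setdefault(lab, []).append(idx)
    for j, lab in enumerate(proto_labels):
        slots_by_label.setdefault(lab, []).append(j)

    counts_image = {lab: len(ids) for lab, ids in image_by_label.items()}
    counts_proto = {lab: len(ids) for lab, ids in slots_by_label.items()}
    if counts_image != counts_proto:
        return

    labels = list(slots_by_label)
    per_label_perms = (itertools.permutations(image_by_label[lab]) for lab in labels)
    for combo in itertools.product(*per_label_perms):
        order = [None] * len(proto_labels)
        for lab, perm in zip(labels, combo):
            for slot, image_idx in zip(slots_by_label[lab], perm):
                order[slot] = image_idx
        yield tuple(order)

proto_labels = ["corner", "corner", "inflection", "corner"]
image_labels = ["inflection", "corner", "corner", "corner"]
orders = list(label_consistent_orders(proto_labels, image_labels))
```

In this example the three corners can be permuted among themselves and the single inflection point is fixed, so 6 orderings remain out of the 24 unconstrained ones. Each surviving ordering would then be scored with $\hat{D}$ as before.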

A useful technique in measuring the similarity between the image and the nearest prototype view is to consider a different set of features than the set used to determine the prototype transform. The rationale behind this technique is that it is generally difficult to recover exact feature-to-feature correspondence, and while such correspondences are necessary for recovering the prototype transform, similarity measures can be successfully applied even in the absence of exact feature-to-feature correspondence. This idea resembles the basic principle of the alignment algorithm [18, 41], in which a small set of points is used to compute the object pose, while a larger set of points is used to verify this pose.

It should be noted that the general flow of the scheme and, in particular, the identification stage are independent of the specific choice of similarity measure. As has been noted above, the measure affects the division of model libraries into classes and the selection of optimal prototypes for these classes. An example of selecting the optimal prototype for a given class under the mea- [...] -tures) is described in Section 4.

Finally, although the main objective of the categorization stage is to determine the class identity of the object, the categorization scheme described above is useful even if the object's category cannot be determined. Section 3.3 below shows that the prototype transform can be reused to align the image with the specific models. Consequently, following the categorization stage the cost of comparing the image to each of the specific models is substantially reduced since the difficult part of recovering the transformation that relates the models to the image is applied only to the prototype objects. As a result, if the class identity of the object cannot be determined we still need to consider all the specific models in the library, but the overall cost of comparing the models to the image would be low because correspondence is computed once for the whole class.

3.3 Identification

After the observed object is categorized, the system turns to recovering its individual identity. At this stage the image is matched to all the models in the object's class. For each model, the system seeks to recover the transformation that aligns the model to the image, if there is one. In previous schemes this required recovering the correspondence between the image and each of the models separately. In our scheme, however, this no longer is necessary, since the object transform is determined directly from the prototype transform. We show in this section that the prototype and the object transforms are related by a simple transformation, which can be computed in advance, and which can in fact be undone already in the library of stored models. Consequently, the prototype transform can be reused in the identification stage to align the individual models with the image.

The initial stage of categorization recovers three pieces of information that can be used for identification. The three are (i) the object class, (ii) the correspondence between the prototype and the image, and (iii) the prototype transform. This information is used in the identification stage as follows. First, since the object's class is determined, only models that belong to this class are considered. Second, using the correspondence between the prototype and the image established in the categorization stage, and using the stored correspondence between the prototype and the object models, the correspondence between the models and the image is immediately recovered. Finally, as is shown below, the model transform, namely, the transformation that aligns the model with the image, is recovered from the prototype transform.

Assume we are given a view $\vec{v}$ of some object model $M_i$, namely

$$\vec{v} = M_i\vec{a} \qquad (15)$$

for some transform vector $\vec{a}$. When the identification process begins, it is still unknown which of the models $M_1, \ldots, M_l$ of the object's class accounts for the image and what the transform vector $\vec{a}$ is. The first task faced [...] -form, $\vec{a}$. This is done, as is explained below, using the prototype transform $\vec{b} = P^+\vec{v}$ defined in (9). Once $\vec{a}$ is recovered, it is applied to all the models $M_1, \ldots, M_l$, and the model for which a near-perfect match is obtained determines the object's identity.

Theorem 1 below establishes that the model transform $\vec{a}$ can be recovered directly from the prototype transform $\vec{b}$ by applying a linear transformation which is referred to as the prototype-to-model transform. This transform has two interesting properties. First, it is view-independent; namely, for any given view of the object, the same transform maps the prototype transform that corresponds to this view to the correct model transform. The prototype-to-model transform therefore can be computed in advance and stored in the library of models. Second, the prototype-to-model transform can be used to recover the model transform regardless of the quality of match between the prototype and the image. In other words, even if the prototype aligns poorly with the image, the transformation that aligns the model with the image is determined correctly in this process.

Theorem 1: Given a view $\vec{v} = M_i \vec{a}$, let $\vec{b} = P^+ \vec{v}$ be the prototype transform, that is, the transform vector that best aligns the prototype with the image. The model transform, $\vec{a}$, can be recovered from the prototype transform, $\vec{b}$, by applying a matrix $A_i$, namely

$$\vec{a} = A_i \vec{b}$$

$A_i$ is referred to as the prototype-to-model transform.

Proof: Notice that

$$\vec{b} = P^+ \vec{v} = P^+ M_i \vec{a}$$

Assume $P^+ M_i$ is invertible, and let

$$A_i = (P^+ M_i)^{-1}$$

We obtain that

$$\vec{a} = A_i \vec{b}$$

$\Box$
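Theorem 1 can be checked numerically. The sketch below is illustrative only and is not the implementation described in the paper: it assumes NumPy, uses randomly generated $n \times k$ matrices for the prototype and the model (the model is taken to be a small perturbation of the prototype so that $P^+ M_i$ is well conditioned), and the helper name `prototype_to_model` is hypothetical.

```python
import numpy as np

def prototype_to_model(P, M):
    """Return A = (P^+ M)^{-1}; assumes P^+ M is square and invertible."""
    return np.linalg.inv(np.linalg.pinv(P) @ M)

rng = np.random.default_rng(0)
n, k = 20, 4                                  # assumed sizes, for illustration only
P = rng.standard_normal((n, k))               # prototype matrix
M_i = P + 0.1 * rng.standard_normal((n, k))   # a model similar to the prototype

A_i = prototype_to_model(P, M_i)              # depends only on P and M_i

a = rng.standard_normal(k)                    # "unknown" model transform
v = M_i @ a                                   # observed view, v = M_i a
b = np.linalg.pinv(P) @ v                     # prototype transform, b = P^+ v
a_recovered = A_i @ b                         # Theorem 1: a = A_i b
```

Because `A_i` is computed without reference to `a` or `v`, the same matrix can be reused for every view of the model.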

Corollary 2: The prototype-to-model transform is view-independent.

Proof: The prototype-to-model transform, $A_i$, is independent of both pose vectors, $\vec{a}$ and $\vec{b}$. Changing the image $\vec{v}$ will result in a new pair of pose vectors, $\vec{a}$ and $\vec{b}$, but, as with the old pair, the new pair is related through the same transform $A_i$. The prototype-to-model transform $A_i$ therefore can be used to recover the object pose for any view of $M_i$. $\Box$

$A_i$ exists if $P^+ M_i$ is invertible. This condition is equivalent to requiring that the two column spaces of $P$ and $M_i$ not be orthogonal in any direction. The condition holds, in general, when the two objects are fairly similar. This is illustrated by the following example. Consider the case that both column spaces of $P$ and $M_i$ are one-dimensional; namely, each represents a line through the origin. The only case in this one-dimensional example in which $A_i$ does not exist is when $P$ and $M_i$ are orthogonal. But these lines are farthest


objects are relatively similar $A_i$ would exist.

Since it depends only on the prototype $P$ and the model $M_i$, the prototype-to-model transform $A_i$ can be pre-computed and stored in the library of models. Every model $M_i \in C$ is associated with its own transform $A_i$ that relates, for every possible view of $M_i$, the prototype transform to the model transform. To compare the image to the model $M_i$, the model transform should first be recovered. This is achieved by applying $A_i$ to the prototype transform computed in the categorization stage.

Also, the prototype-to-model transform, $A_i$, can be used to align the model $M_i$ with the prototype $P$ in 3D. Denote the aligned model by $M'_i$. $M'_i$ models the same object as $M_i$ does, since their column vectors span the same space. In addition, the aligned model $M'_i$ has the property that it is brought by the prototype transform, $\vec{b}$, to a perfect alignment with the image. Consequently, if the models are aligned in the library with the prototype, the prototype transform computed in the categorization stage can be reused for identification with no further manipulations. This is established in Theorem 3 below.

Theorem 3: Let $M'_i = M_i A_i$ be the model $M_i$ aligned with the prototype $P$. For any view $\vec{v} = M_i \vec{a}$, the prototype transform for this view, $\vec{b} = P^+ \vec{v}$, is identical to the model transform for this view; that is, $\vec{v} = M'_i \vec{b}$.

Proof: Since

$$M'_i = M_i A_i$$

we obtain that

$$M'_i \vec{b} = M_i A_i \vec{b} = M_i \vec{a} = \vec{v}$$

$\Box$
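The alignment in Theorem 3 can be illustrated with a few lines of NumPy. This is a sketch under assumptions, not the paper's implementation: the matrices are random, the model is a small perturbation of the prototype so that $P^+ M_i$ is invertible, and the variable names (`M_aligned`, `alignment_check`) are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 4                                  # assumed sizes, for illustration only
P = rng.standard_normal((n, k))               # prototype
M_i = P + 0.1 * rng.standard_normal((n, k))   # model similar to the prototype

P_pinv = np.linalg.pinv(P)
A_i = np.linalg.inv(P_pinv @ M_i)             # prototype-to-model transform
M_aligned = M_i @ A_i                         # M'_i = M_i A_i, computed once, offline

# For several views of M_i, applying b to the aligned model reproduces the view.
for _ in range(3):
    a = rng.standard_normal(k)
    v = M_i @ a                               # view of the original model
    b = P_pinv @ v                            # prototype transform for that view
    assert np.allclose(M_aligned @ b, v)      # Theorem 3: v = M'_i b

# After alignment, P^+ M'_i = (P^+ M_i)(P^+ M_i)^{-1} is the identity matrix.
alignment_check = P_pinv @ M_aligned
```

Storing `M_aligned` instead of `M_i` moves the matrix inversion out of recognition time.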

Using Theorem 3, the identification scheme is simplified as follows. The models $M_1, \ldots, M_l$ are aligned in the library with the prototype $P$ by applying the corresponding prototype-to-model transforms, $A_1, \ldots, A_l$. At recognition time, the prototype transform, $\vec{b} = P^+ \vec{v}$, is applied to the aligned models $M'_1, \ldots, M'_l$. According to Theorems 1 and 3, by transforming the models by $\vec{b}$ the correct model, $M'_i$, would perfectly align with the image.
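The simplified identification stage can be sketched as a loop of template comparisons. The code below is a hypothetical illustration assuming NumPy, noiseless views, full correspondence, and randomly generated models that are small perturbations of the prototype; the function names `align_library` and `identify`, and the use of the Euclidean residual as the match score, are assumptions made here rather than details taken from the paper.

```python
import numpy as np

def align_library(P, models):
    """Offline step: replace each model M by M' = M (P^+ M)^{-1}."""
    P_pinv = np.linalg.pinv(P)
    return [M @ np.linalg.inv(P_pinv @ M) for M in models]

def identify(P, aligned_models, v):
    """Online step: compute b = P^+ v once, then compare M'_j b with v for each j."""
    b = np.linalg.pinv(P) @ v
    residuals = [float(np.linalg.norm(Mp @ b - v)) for Mp in aligned_models]
    return int(np.argmin(residuals)), residuals

rng = np.random.default_rng(2)
n, k, l = 20, 4, 3                            # assumed sizes, for illustration only
P = rng.standard_normal((n, k))
models = [P + 0.1 * rng.standard_normal((n, k)) for _ in range(l)]
aligned = align_library(P, models)

true_index = 1
v = models[true_index] @ rng.standard_normal(k)   # a view of model 1
best, residuals = identify(P, aligned, v)
```

Only one pseudo-inverse product is needed per image; each candidate model then costs a single matrix-vector product and a norm.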

In the scheme above we assumed that full feature-to-feature correspondence is established between the prototype and the image. This assumption is not mandatory. Methods for estimating the prototype transform using partial correspondence or by considering other types of features (such as line segments) can also be used. Note that in case the prototype transform can only be approximated, the accuracy of the model transform obtained is determined by the quality of this approximation and by the condition number of the prototype-to-model transform $A_i$. The condition number of $A_i$ affects the match even if Theorem 3 is applied, namely, even if the models are aligned with the prototype in advance. Consequently, the condition number of the prototype-to-model transform $A_i$ should be taken into account when the library is divided into classes.

Finally, the scheme can be extended to handle classes of objects with different degrees of freedom. Consider,

folding. Obviously, the folding chairs have more degrees of freedom than the regular, rigid chairs, and therefore they would be represented in the library by wider matrices than the rigid chairs are. As is explained below, the chairs can be handled in a common class, and the prototype for the class would itself be a folding chair.

More generally, let $M_1, \ldots, M_l$ be a class of models of different widths, and denote by $k_1, \ldots, k_l$ the widths of $M_1, \ldots, M_l$ respectively. Let $P$ be the prototype for this class, and denote by $k_p$ the width of $P$. We set $k_p$ to be

$$k_p = \max\{k_1, \ldots, k_l\} \tag{16}$$

In other words, we require the prototype to have the same degrees of freedom as the most flexible object in the class. We can set $k_p$ according to our goal since, as is shown in Section 4, the prototype $P$ is obtained in our scheme by manipulating the objects in the class. The prototype-to-model transform $A_i$ is defined in this case by

$$A_i = (P^+ M_i)^+ \tag{17}$$

where $A_i$ is $k_i \times k_p$ (it maps the $k_p$-dimensional prototype transform $\vec{b}$ to the $k_i$-dimensional model transform $\vec{a}$). It is straightforward to extend Theorem 1 to also include this case. Consequently, for any view of $M_i$, the model transform $\vec{a}$ can be recovered from its corresponding prototype transform $\vec{b}$ by applying the prototype-to-model transform $A_i$ to $\vec{b}$. Note that since $k_p \geq k_i$ the prototype can appear in poses that do not match any possible model pose (and therefore, in noiseless conditions, they are impossible to obtain). In case the object is observed from such a view, $A_i$ would map this unmatched prototype transform to the model transform that corresponds to the nearest matched prototype transform. By setting $k_p$ to be as large as the maximum of $k_1, \ldots, k_l$ we avoid cases where there exist views of the object that cannot be accounted for by the prototype. Model transforms that correspond to such views cannot be recovered from prototype transforms.
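The case of unequal widths can be sketched in the same way, with the inverse replaced by a pseudo-inverse as in (17). The example below is an assumption-laden illustration, not the paper's code: it uses NumPy, a random prototype of width `k_p = 6`, and a model of width `k_i = 4` built (hypothetically) as a perturbation of the first four prototype columns so that $P^+ M_i$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k_p, k_i = 20, 6, 4                        # assumed sizes, with k_p >= k_i
P = rng.standard_normal((n, k_p))             # prototype with more degrees of freedom
M_i = P[:, :k_i] + 0.1 * rng.standard_normal((n, k_i))   # narrower model

P_pinv = np.linalg.pinv(P)                    # shape (k_p, n)
A_i = np.linalg.pinv(P_pinv @ M_i)            # (P^+ M_i)^+, shape (k_i, k_p)

a = rng.standard_normal(k_i)                  # model transform (length k_i)
v = M_i @ a                                   # observed view
b = P_pinv @ v                                # prototype transform (length k_p)
a_recovered = A_i @ b                         # extended Theorem 1: a = A_i b
```

The recovery is exact here because $P^+ M_i$ has full column rank, so its pseudo-inverse is a left inverse.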

3.4 Summary

We presented in this section a scheme for recognizing 3D objects from single 2D views that proceeds in two stages, categorization and identification. In the categorization stage the image is compared against the stored prototypes. For every prototype, the correspondence between the image and the prototype is recovered, and the nearest view of the prototype is constructed. The similarity between this view and the image is evaluated, and, if the two are found similar, the class identity of the object is determined. In the identification stage the observed object is compared against the models of its class. Since the prototype and the models were brought into alignment in the library, the same transformation that aligns the prototype to the image also aligns the object model to the image. The prototype transform therefore is applied to the models, and the obtained views are compared with the image. The view that is found to be identical up to noise and occlusion to the image determines the individual identity of the object.

The presented scheme is based on several key principles. Recognition is divided into two sub-processes, cat-


els are aligned with the image, and the identity of the object is determined by a 2D comparison; 3D reconstruction of the observed object from the image is not performed. The difficult component of the alignment approach, namely, the recovery of correspondence and object pose, is performed only once for each class; the prototype transform is reused in the identification stage to align the image with the individual models.

4 Constructing optimal prototypes

In the scheme above we assumed that the classes in the library of models are represented by prototype objects. Since categorization is achieved by matching the image to prototype objects, the question of how to select the best prototype should be addressed. In this section we present an algorithm for constructing optimal prototypes.

Given a class of objects, the optimal prototype for this class is the object that resembles the objects of the class the most. Under our formulation, such an object would share as many features as possible with the objects of its class, the position of these features on the prototype would be as close as possible to their position on the objects, and the prototype-to-model transform for these objects would be as stable as possible. Below we show that the optimal prototype can effectively be computed using principal component analysis; that is, by computing the dominant eigenvectors for some matrix determined by the models of the class.

Principal component analysis often is used in classification problems to construct classes and prototypes [11]. In existing applications, an object is represented by a point in some high dimensional space, where each component of this point contains an invariant attribute of the object. A hyperplane in that space represents a class of objects. The goal of the principal component analysis is, given a set of points (objects), to recover the class that these points induce. Our case is somewhat different. In our case an object is represented by a continuous linear space rather than by a point. Whereas the use of hyperplanes in other schemes often is arbitrary and made primarily for convenience, their use in our scheme is appropriate following the linear combination scheme [42] (see Section 3.1).

The differences outlined above also imply differences in the proof that principal component analysis applies to our case. We show below that the optimal prototype can be computed by principal component analysis. The traditional proof needs to be extended since in our case objects are represented by continuous spaces rather than by discrete points.

The prototype constructed in this process is a 3D object obtained by manipulating the objects in its class. To allow the construction, it seems as if the objects in the class should first be brought into alignment. In particular, if the objects are represented by viewer-centered models (that is, by sets of their views; see Section 3.1 for details), the different objects would then have to be represented by images taken from similar viewpoints. Nevertheless, the process presented below does not require

is obtained in this process even when the objects are not aligned.

We now turn to constructing the optimal prototype. First, we define an objective function. Given a prototype $P$ and an object model $M_i$, we define the similarity between $P$ and $M_i$ as follows. Let $\vec{v}_i$ be a view of $M_i$. We measure the similarity between the prototype $P$ and the view $\vec{v}_i$ using (12). Then, we sum the measure over all possible views of $M_i$. Assuming without loss of generality that $\|\vec{v}_i\| = 1$, (14) can be rewritten as

$$\hat{D}(P, \vec{v}_i) = \|(P P^+ - I)\vec{v}_i\| \tag{18}$$

Without loss of generality, we can assume that the constructed prototype, $P$, is composed of orthonormal columns. Note that an overdetermined matrix $P$ with orthonormal columns satisfies $P^+ = P^T$. We can therefore rewrite (18) as

$$\hat{D}(P, \vec{v}_i) = \|(P P^T - I)\vec{v}_i\| \tag{19}$$
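The measure in (19) is the length of the component of a view that lies outside the column space of $P$. The following NumPy sketch illustrates this on synthetic data; the helper name `view_distance`, the random test vectors, and the use of a reduced QR factorization to obtain orthonormal columns are assumptions made for illustration.

```python
import numpy as np

def view_distance(P, v):
    """||(P P^T - I) v|| for a matrix P with orthonormal columns."""
    return float(np.linalg.norm(P @ (P.T @ v) - v))

rng = np.random.default_rng(4)
n, k = 20, 4                                  # assumed sizes, for illustration only
P, _ = np.linalg.qr(rng.standard_normal((n, k)))   # n x k, orthonormal columns

# A unit-length view inside the column space of P has distance 0.
v_in = P @ rng.standard_normal(k)
v_in /= np.linalg.norm(v_in)

# A unit-length view orthogonal to the column space of P has distance 1.
w = rng.standard_normal(n)
v_out = w - P @ (P.T @ w)
v_out /= np.linalg.norm(v_out)

d_in = view_distance(P, v_in)
d_out = view_distance(P, v_out)

# For orthonormal columns the pseudo-inverse equals the transpose.
pinv_matches_transpose = bool(np.allclose(np.linalg.pinv(P), P.T))
```

For unit-length views the measure therefore ranges between 0 (the view is a view of the prototype) and 1 (the view is orthogonal to every prototype view).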

The distance between $P$ and the model $M_i$ is now given by summing $\hat{D}(P, \vec{v}_i)$ over all unit-length (to eliminate scaling effects) views of $M_i$, namely

$$\hat{D}(P, M_i) = \int_{\|\vec{v}_i\|=1} \|(P P^T - I)\vec{v}_i\| \tag{20}$$

To obtain the objective function, we sum these distances over all models

$$E(P) = \sum_{i=1}^{l} \int_{\|\vec{v}_i\|=1} \|(P P^T - I)\vec{v}_i\| \tag{21}$$

The object $P$ that minimizes this function is defined to be the optimal prototype.

Note that (21) is not the only possible objective function for this purpose. An alternative "worst case" approach is to measure the distance between the prototype and the farthest model in the class (rather than summing this distance over all models). Besides being difficult to compute, this measure also is sensitive to "outlier" models.

The prototype that minimizes (21) can be constructed in a process that includes the following steps.

1. To simplify the process we assume the column vectors of each of the model matrices $M_i$ ($1 \leq i \leq l$) are orthonormal. (In case they are not, we first apply a Gram-Schmidt process to them. Such a process obviously does not alter the space of views implied by the models.)

2. Build the $n \times n$ symmetric matrix

$$F = \sum_{i=1}^{l} M_i M_i^T$$

3. Find the $k$ dominant eigenvectors of $F$. The optimal matrix $P$ is constructed from these eigenvectors.
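The three steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: a reduced QR factorization stands in for the Gram-Schmidt process, the models are random perturbations of a common matrix, the function names are hypothetical, and the comparison uses the sum of squared residuals over the basis columns as a convenient proxy rather than the integral in (21).

```python
import numpy as np

def optimal_prototype(models, k):
    """Steps 1-3: orthonormalize each model, build F, take the k dominant eigenvectors."""
    bases = [np.linalg.qr(M)[0] for M in models]       # step 1 (QR instead of Gram-Schmidt)
    F = sum(Q @ Q.T for Q in bases)                    # step 2: n x n symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(F)               # eigenvalues in ascending order
    return eigvecs[:, -k:], eigvals                    # step 3: last k columns are dominant

def squared_cost(P, models):
    """Sum over models of ||(P P^T - I) Q||_F^2, with Q an orthonormal basis of the model."""
    total = 0.0
    for M in models:
        Q = np.linalg.qr(M)[0]
        total += np.linalg.norm(P @ (P.T @ Q) - Q, 'fro') ** 2
    return total

rng = np.random.default_rng(5)
n, k, l = 20, 4, 5                                     # assumed sizes, for illustration only
base = rng.standard_normal((n, k))
models = [base + 0.2 * rng.standard_normal((n, k)) for _ in range(l)]

P, eigvals = optimal_prototype(models, k)
random_P = np.linalg.qr(rng.standard_normal((n, k)))[0]   # arbitrary competitor subspace
cost_opt = squared_cost(P, models)
cost_rand = squared_cost(random_P, models)
```

`np.linalg.eigh` is used because $F$ is symmetric; it returns real eigenvalues sorted in ascending order, which is why the dominant eigenvectors are the last `k` columns.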


totype object that would belong to the given class. This condition determines the choice of width $k$ for the prototype. If all the models share the same width then the prototype would assume this width. In the rigid case, for example, $k = 4$ (see Section 3.1). As mentioned in Section 3.3 above, in case the objects have different degrees of freedom, $k$ is set to be the maximum of $k_1, \ldots, k_l$, where $k_1, \ldots, k_l$ are the widths of $M_1, \ldots, M_l$ respectively. In case more than $k$ large eigenvalues are obtained, one may ignore these guidelines and construct a prototype that has higher degrees of freedom than the objects in the class (see for example [31]).

Theorem 4 below establishes that the algorithm above produces the optimal prototype. We consider here the case that all the objects share similar degrees of freedom. The same procedure can be applied with slight modifications to include the case of objects with different degrees of freedom.

Theorem 4: Let $M_1, M_2, \ldots, M_l$ be a set of models belonging to some class $C$. Assume every model $M_i$ is represented by an $n \times k$ matrix with orthonormal column vectors. The prototype $P$ that minimizes the term

$$E(P) = \sum_{i=1}^{l} \int_{\|\vec{v}_i\|=1} \|(P P^T - I)\vec{v}_i\|$$

where the integration is done over all the unit-length views $\vec{v}_i$ of each model $M_i$, is composed of the $k$ eigenvectors of the matrix

$$F = \sum_{i=1}^{l} M_i M_i^T$$

that correspond to its $k$ largest eigenvalues.

Proof: Let $P$ be composed of the $k$ dominant eigenvectors of $F$. According to regression principles, $P$ minimizes the term

$$\sum_{i=1}^{l} \sum_{j=1}^{k} \|(P P^T - I)\vec{m}_{ij}\|$$

where $\vec{m}_{ij}$ is the $j$'th column vector of $M_i$. In other words, consider $\vec{m}_{ij}$ as a point in $R^n$. The space spanned by the column vectors of $P$ is the nearest $k$-dimensional hyperplane to these points, $\vec{m}_{ij}$. The rest of this proof extends the claim from the discrete sum over the column vectors of $M_i$ to the continuous integral over all views spanned by these vectors. According to our assumptions, each matrix $M_i$ contains an orthonormal set of column vectors. Replacing these vectors by another orthonormal basis for $M_i$ will not change the matrix $P$; that is, $P$ is independent of the choice of orthonormal basis for the models. This is illustrated by the following derivation. To obtain a new orthonormal basis for the column space of $M_i$ we can apply a $k \times k$ rotation matrix $R$ to $M_i$ (namely, $M_i R$). $P$ is the best vector space for the new set as well, since

$$M_i R (M_i R)^T = M_i R R^T M_i^T = M_i I M_i^T = M_i M_i^T$$

tors for $M_1, \ldots, M_n$, and so its dominant eigenvectors represent the best vector space for any orthonormal representation of the objects. Consequently, $P$ minimizes the objective function regardless of choice of basis for the models, and therefore it also minimizes the required term

$$E(P) = \sum_{i=1}^{l} \int_{\|\vec{v}_i\|=1} \|(P P^T - I)\vec{v}_i\|$$

$\Box$
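The basis-invariance step of the proof, $M_i R (M_i R)^T = M_i M_i^T$, can be confirmed numerically. The sketch below is illustrative and assumes NumPy; the matrix `R` is a random $k \times k$ orthogonal matrix obtained from a QR factorization (so it may be a rotation composed with a reflection, which satisfies $R R^T = I$ equally well).

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 20, 4                                         # assumed sizes, for illustration only
M, _ = np.linalg.qr(rng.standard_normal((n, k)))     # model with orthonormal columns
R, _ = np.linalg.qr(rng.standard_normal((k, k)))     # random orthogonal change of basis

M_rotated = M @ R                                    # a different orthonormal basis, same space

# The contribution of the model to F is unchanged by the change of basis.
same_projector = bool(np.allclose(M_rotated @ M_rotated.T, M @ M.T))
still_orthonormal = bool(np.allclose(M_rotated.T @ M_rotated, np.eye(k)))
```

Since each term $M_i M_i^T$ is unchanged, the sum $F$ and hence its dominant eigenvectors do not depend on which orthonormal basis represents each model.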

To summarize, we showed that given a class of object models, the optimal prototype for this class is given by the dominant eigenvectors of the matrix $F$, which is constructed from the object models. Note that in proving Theorem 4 we showed that the prototype is independent of choice of basis for the models. This implies that, in order to construct the prototype, the object models $M_1, \ldots, M_l$ do not need to first be brought into alignment. The process above guarantees to output the same prototype object even if the models are not aligned.

5 Relevance to human vision

The recognition by prototypes scheme uses the general shape of objects as the cue for recognizing them. As was already mentioned, classes in our scheme contain objects with fairly similar shapes. In contrast, the human visual system recognizes objects using shape cues as well as many other cues, such as color, texture, motion, and context, and objects are categorized in their basic level of abstraction [33]. Only little is currently known about the underlying processes for recognition used by the visual system. From what is known, in spite of the differences pointed out above, the recognition by prototypes scheme seems to be consistent in several key issues with psychological and physiological findings. In this section we briefly review these findings.

The scheme presented in this paper promotes the notion that categorization and identification are performed using similar tools. In both cases view variations first are compensated for, and then a view of either the hypothesized prototype or object model is compared with the image. This is in contrast to methods (such as part decomposition and functional description) that in general handle either categorization or identification, but do not extend to deal with both problems. The available studies in this case are inconclusive. Some evidence seems to indicate that the two processes are handled separately by the visual system. Agnosic and prosopagnosic patients often demonstrate degraded identification abilities, whereas their performance in categorization remains intact. Double dissociation between the two processes, however, has not been found, and so the assumption that the two processes are handled separately in the brain has not been established. In fact, cells that respond to general faces and cells that respond to specific faces were found lying side by side within the same brain area, STS, of the macaque monkey [29]. The vulnerability of the identification process to brain lesions can be explained by the fact that the process requires a relatively large memory to encode the detailed shapes of objects as


cover a detailed description of the observed object from the image (see e.g., [19]).

Another idea proposed here is that categorization involves two stages: a stage of compensating for view variations followed by a stage of 2D comparison to account for shape differences. A decoupling of view variation and semantic categorization was suggested by Lissauer [24]. Warrington and Taylor [44, 45] found that patients who suffer from lesions in the posterior lobe of the right hemisphere demonstrate difficulties in categorizing objects from unconventional views, whereas their performance in categorization of objects from conventional views remains intact. Additional evidence for the effect of view variations on categorization performance was found for healthy subjects. Subjects who are asked to name objects respond more slowly when the objects appear in unconventional views [28]. Also, mental rotation effects, namely, response time that grows linearly with the tilt of the object, were observed in naming tasks of natural objects [21].

Finally, the process of categorization presented here is achieved by comparing the image to prototype objects, and these prototype objects can be constructed by manipulating the familiar objects of the class. Recent studies indicate that response time in naming tasks is typically shorter and error rates are lower when the observed object is similar to the prototype [5]. Similarly, shorter reaction time is obtained when subjects are asked to answer questions of the type "does the object X belong to the class Y?" [34]. Other studies reported that children learn good examples of classes before they learn poor ones [1, 32] and that subjects recall having seen the prototype or average configuration of studied face images even if this configuration was not studied [8].

To summarize, although the presented scheme generally does not recognize objects in their basic level of abstraction, it is consistent with psychological and physiological findings in several key issues, including a single approach for the two sub-problems of recognition, categorization and identification, view dependency of the two sub-processes, and the role of prototypes in categorization. The findings discussed here obviously are inconclusive, since psychological and physiological studies, including the ones discussed here, have more than one possible interpretation.

6 Implementation

To test the ideas presented in the paper, we have implemented the scheme and applied it to several objects. In our implementation, the library of models included two classes. The first (Figure 2) contained two four-legged chairs (denoted by A and B), and the second (Figure 3) included two car models, a VW and a Saab.

To demonstrate categorization, we used chair A as a prototype and matched it to an image of chair B. Correspondences between the prototype and the image were picked manually, and, using these correspondences, the prototype transform was recovered and applied to the prototype. The results of matching the transformed prototype with the image are seen in Figure 4. It can be seen

the same orientation as the observed object (left figure), and that the match between the two is good considering that the objects have different shapes. Note that in this implementation we allowed the objects to undergo general affine transformations in 3D, including stretch and shear, and so the match between the prototype and the image was better than if only rigid transformations were allowed. Additional examples using chair B and the two cars as the prototypes are shown in Figures 5-7.

In Figures 8-9 we tried to match the prototypes to the images with wrong correspondences. The results of these matches were significantly worse than when the correct matches were used. This is consistent with the idea discussed in Section 3.2 that the quality of the match can be used as the objective function for resolving the correct correspondence.

Figure 10 shows the results of matching a prototype four-legged chair to a single-legged office chair. It can be seen that the upper portions of the chairs match relatively well, while the legs of the chairs do not find appropriate matches.

Figure 11 shows the result of matching a prototype chair to an image of a Saab car. As an anecdotal example, we matched the hole below the back of the chair to the windshield of the car and the seat to the hood. In general, whatever correspondence is used, the two objects would match poorly relative to matching the prototypes to objects of their class.

Figures 12-13 demonstrate the identification stage. In the library we first aligned the model for chair A with the prototype chair (chair B) using the prototype-to-model transform. Then, an image of chair A was categorized (Figure 5) by matching it to the prototype chair, and the prototype transform was computed. In the next step, the prototype transform was applied to the specific model of chair A. The result of this application is seen in Figure 12. It can be seen that a near-perfect alignment was achieved in this process. A similar process was applied to the VW car in Figure 13 using the Saab car as the prototype. (The result of the corresponding categorization stage is shown in Figure 6.) These figures demonstrate that although a perfect match between the prototype and the image could not be obtained, the prototype transform can still be used to align the observed object with its specific model.

7 Summary

We introduced in this paper a recognition scheme that proceeds in two stages: categorization and identification. Categorization is achieved by aligning the image to prototype objects. For every prototype, the nearest prototype view is recovered, and the similarity between this view and the image is evaluated. The prototype that most resembles the observed object determines its class identity. Likewise, identification is achieved by aligning the observed object to the individual models of its class. At this stage the prototype transform computed in the categorization stage is reused to align the models with the image. The model that matches the observed object determines its specific identity. In addition, we


were constructed from single images using symmetry [31].

Figure 3: Pictures of two cars used as models. Left: a VW model. Right: a Saab model. Models for the two cars were borrowed from [42].

Figure 4: Matching a prototype chair (chair A) to an image of chair B. This figure, as well as the rest of the figures, contains three pictures. Left: the image to be recognized. Middle: the appearance of the prototype following the application of the prototype transform. Right: an overlay of the left and the middle pictures.


Figure 6: Matching a prototype car (Saab) to an image of a VW car.

Figure 7: Matching a prototype car (VW) to an image of a Saab car.


Figure 9: Matching a prototype car (Saab) to an image of a VW car with wrong correspondence.

Figure 10: Matching a four-legged chair to an image of an office chair.


Figure 12: Matching a model of chair A to an image of the same chair using the prototype transform computed in the categorization stage.

Figure 13: Matching a model of a VW car to an image of the same car using the prototype transform computed in the categorization stage.


totypes and discussed the relevance of the scheme to human recognition.

An important issue conveyed by our scheme is that categorization can be used to facilitate the identification of objects. We showed that by first categorizing the object, the difficult stages of the alignment process, namely, the recovery of the object pose and the correspondence between the image and the model, can be performed only once per class. Consequently, identification is reduced in this scheme to a series of simple template comparisons.

The scheme presented in this paper differs from existing categorization schemes in two important aspects. The existing schemes (e.g., [4]) first attempt to recover the part structure (geons) of the object from the image alone. This structure is assumed to be almost invariant both to rotation of the object and across objects of the same class. In contrast, our scheme does not attempt to recover any 3D information from the image alone. Moreover, it separates the two effects that determine the object's appearance: view variation effects and deformations due to class variability. View variations are compensated for by recovering the view of the prototype that most resembles the image, and the amount of deformation that separates the prototype from the specific object is evaluated by assessing the difference (in 2D) between the nearest prototype view and the image.

Open problems for future research include solving the correspondence between prototypes and images, combining the scheme with existing indexing approaches, defining effective measures to evaluate the quality of matches, and extending the system to incorporate additional cues, such as color and texture.

Acknowledgement

I wish to thank Shimon Ullman for encouragement and advice, Tao Alter and Yael Moses for many fruitful discussions, Dror Bar Natan for his assistance in verifying the proof for Theorem 4, and Eric Grimson, John Harris, and Tomaso Poggio for comments on earlier drafts.

References

[1] Anglin, J., 1976. Les premiers termes de reference de l'enfant. In S. Enrlich and E. Tulving (Eds.), La memoire semantique. Paris: Bulletin de Psychologie.

[2] Bajcsy, R. and Solina, F., 1987. Three dimensional object representation revisited. Proc. of 1st ICCV Conference, London, 231-240.

[3] Basri, R. and Ullman, S., 1988. The alignment of objects with smooth surfaces. Proc. of 2nd Int. Conf. of Computer Vision, Florida, 482-488.

[4] Biederman, I., 1985. Human image understanding: recent research and a theory. Computer Vision, Graphics, and Image Processing, 32, 29-73.

[5] Biederman, I., 1988. Aspects and extensions of a theory of human image understanding. In Z. Pylyshyn (Ed.), Computational Processes in Human

NJ: Ablex, 370-428.

[6] Binford, T.O., 1971. Visual perception by computer. IEEE Conf. on Systems and Control.

[7] Brooks, R., 1981. Symbolic reasoning among 3-dimensional models and 2-dimensional images. Artificial Intelligence, 17, 285-349.

[8] Bruce, V., 1990. Perceiving and recognizing faces. Mind and Language, 5(4), 342-364.

[9] Chien, C.H. and Aggarwal, J.K., 1987. Shape recognition from single silhouette. Proc. of ICCV Conf., London, 481-490.

[10] Davis, L.S., 1979. Shape matching using relaxation techniques. IEEE Trans. on Pattern Analysis and Machine Intel., 1(1), 60-72.

[11] Duda, R.O. and Hart, P.E., 1973. Pattern classification and scene analysis. Wiley-Interscience Publication, John Wiley and Sons, Inc.

[12] Faugeras, O.D. and Hebert, M., 1986. The representation, recognition and location of 3D objects. Int. J. Robotics Research, 5(3), 27-52.

[13] Fischler, M.A. and Bolles, R.C., 1981. Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography. Com. of the A.C.M., 24(6), 381-395.

[14] Forsyth, D., Mundy, J.L., Zisserman, A., Coelho, C., Heller, A., and Rothwell, C., 1991. Invariant descriptors for 3-D object recognition and pose. IEEE Trans. on Pattern Analysis and Machine Intel., 13, 971-991.

[15] Grimson, W.E.L. and Lozano-Perez, T., 1984. Model-based recognition and localization from sparse data. Int. J. of Robotics Research, 3, 3-35.

[16] Ho, S., 1987. Representing and using functional definitions for visual recognition. Ph.D. Dissertation, University of Wisconsin, Madison.

[17] Hoffman, D. and Richards, W., 1984. Parts of recognition. Cognition, 1865-1896.

[18] Huttenlocher, D.P. and Ullman, S., 1990. Recognizing Solid Objects by Alignment with an Image. Int. J. Computer Vision, 5(2), 195-212.

[19] Humphreys, G.W. and Riddoch, M.J., 1987. To see but not to see. A case study of visual agnosia. Lawrence Erlbaum Associates, Pub., London.

[20] Jacobs, D.W., 1992. Space efficient 3D model indexing. Proc. of Image Understanding Workshop, 717-725.

[21] Jolicoeur, P., 1985. The time to name disoriented natural objects. Memory and Cognition, 13(4), 289-303.

[22] Koenderink, J.J. and Van Doorn, A.J., 1982. The shape of smooth objects and the way contours end. Perception, 11, 129-137.

[23] Lamdan, Y., Schwartz, J.T., and Wolfson, H., 1987. On recognition of 3-D objects from 2-D images. Courant Inst. of Math. Sci., Rob. TR 122.