ARTIFICIAL INTELLIGENCE LABORATORY
A.I. Memo No. 1391 December, 1992
Recognition by Prototypes
Ronen Basri
Abstract
A scheme for recognizing 3D objects from single 2D images is introduced. The scheme proceeds in two stages. In the first stage, the categorization stage, the image is compared to prototype objects. For each prototype, the view that most resembles the image is recovered, and, if the view is found to be similar to the image, the class identity of the object is determined. In the second stage, the identification stage, the observed object is compared to the individual models of its class, where classes are expected to contain objects with relatively similar shapes. For each model, a view that matches the image is sought. If such a view is found, the object's specific identity is determined. The advantage of categorizing the object before it is identified is twofold. First, the image is compared to a smaller number of models, since only models that belong to the object's class need to be considered. Second, the cost of comparing the image to each model in a class is very low, because correspondence is computed once for the whole class. More specifically, the correspondence and object pose computed in the categorization stage to align the prototype with the image are reused in the identification stage to align the individual models with the image. As a result, identification is reduced to a series of simple template comparisons. The paper concludes with an algorithm for constructing optimal prototypes for classes of objects.
Copyright © Massachusetts Institute of Technology, 1993
This report describes research done at the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology and the McDonnell-Pew Center for Cognitive Neuroscience. Support for the laboratory's artificial intelligence research is provided in part by the Advanced Research Projects Agency of the Department of Defense under Office of Naval Research contract N00014-91-J-4038. Ronen Basri is supported by the McDonnell-Pew and the Rothchild postdoctoral fellowships.
Our world contains an overwhelming variety of objects. While people demonstrate outstanding abilities to memorize and recognize thousands of objects [27, 37, 38], computer vision applications largely fail to accommodate these numbers. Apparently, the main tool that enables people to handle this massive number of objects effectively is categorization. By dividing objects into classes, the visual system can infer properties of unfamiliar objects from their resemblance to familiar ones. For familiar objects, categorization offers an indexing tool into the stored library of object representations.
Recognition can be performed at different "levels of abstraction." For example, the same object can be recognized as a face, a human face, or a specific person's face. Psychological studies suggest the existence of a preferred level for recognition, called "the basic level of abstraction" [33]. Existing computational schemes usually approach recognition at one of two levels. Several schemes attempt to classify objects at their basic level of abstraction (we refer to this task as categorization), while other schemes attempt to determine the specific identity of objects (we refer to this task as identification). This paper presents a novel approach to recognition that combines the two tasks.
To see how the two tasks are related, consider the following example. Suppose you are walking down a street, and someone is coming towards you. You look at the person's face, and it looks familiar, but you cannot tell who it is. So you try to picture the people you know who look like the person you see, until finally you realize who the person is.
A number of hypotheses can be drawn from this story. First, recognition can be broken into two stages, categorization and identification, where categorization is believed to precede identification. Second, during the course of recognition the image is compared against a number of object models. Assuming that categorization indeed precedes identification, only models that belong to the object's class need to be considered. Finally, when a new model is compared to the image, the comparison process may benefit from the use of information acquired during categorization. Note that the situation described here is not specific to faces. One can imagine that similar situations occur when other objects, such as animals, cars, and chairs, are observed.
To see how information acquired during categorization can be used for identification, consider the example of face recognition. When a face is recognized, the image positions of its parts and features are known. In particular, an observer already knows where the eyes, nose, and mouth are and can even infer the direction of gaze and the expression. The person's identity is not essential for extracting and locating these features. Instead, they are matched against features in a "generic" representation. In addition, other features that may better distinguish between different persons, such as a beard, hair style, and wrinkles, may be located. More generally, we can postulate that, during categorization, sub-structures of the object are identified and located with respect to a generic model, and the object's pose is determined.
To follow this example, I propose a scheme for recognizing 3D objects from single 2D views that combines the two stages, categorization and identification. Categorization is achieved by aligning the image to prototype objects. The prototype that appears most similar to the image determines the class identity of the object. After the object is categorized, its specific identity is determined by aligning the observed object to individual models of its class. By first categorizing the object, not only is the number of models considered for identification reduced, but the cost of comparing each model to the image also decreases significantly. This is achieved by reusing the correspondence and pose computed for the prototype in the categorization stage to align the image with the individual models. We show in this paper that, although a perfect match between the prototype and the image is not obtainable, the correspondence and pose can be computed for the prototype and used to bring the image and the object's model into alignment. Consequently, recovering the correspondence and pose for the individual models becomes unnecessary, and identification is reduced to a series of simple template comparisons.
The rest of this paper is organized as follows. Section 2 reviews the main existing approaches to categorization and identification. Section 3 presents the scheme of recognition by prototypes. Section 4 proposes an algorithm for generating optimal prototypes for the scheme. Section 5 discusses the relevance of the scheme to human recognition. Implementation results are presented in Section 6.
2 Previous Approaches
Existing schemes for categorization often use a "reductionist" approach. The image, which contains a detailed appearance of an object, is transformed into a compact representation that is invariant for all objects of the same class. One common approach to generating such a representation is to decompose the object into parts. Parts are extracted by cutting the object at concavities [17, 22, 43] and are labeled according to their general shape. The labels, together with the spatial relationships between the parts, are used to identify the class of the object [4, 6, 7, 26]. A second approach extracts the parts of the object that fulfill certain functions. The list of functions is used to determine the object's class [16, 39, 47].
Schemes that break objects into parts are insufficient to explain all aspects of recognition, for the following reasons. First, in many cases objects that belong to the same class differ only in their detailed shape, while sharing roughly the same set of parts. Moreover, even objects that at some level may be considered to belong to different classes, such as a cat and a dog, may also share roughly the same set of parts. To solve this problem, several systems also store, in addition to the part structure of the objects, the detailed shape of the parts [2, 6, 7]. Another problem is that many of the techniques for recognizing objects by part decomposition rely on finding
To recognize the specific identity of objects, a relatively detailed representation of the object's shape is compared with the image. An example of such methods is alignment [3, 9, 12, 13, 18, 25, 40, 41]. Alignment involves recovering the position and orientation (pose) in which the object is observed and comparing the appearance of the object from that pose with the image. Only a few attempts have been made in the past to extend the alignment scheme to the problem of object categorization (e.g., [36]). The main difficulty in applying the alignment approach is the recovery of the pose of the observed object. In most implementations this involves a time-consuming stage for finding the correspondence between the model and the image. The process becomes impractical when the image is compared against a large library of objects, because typically the correspondence is established between the image and each of the models in the library separately.
To handle large libraries, indexing methods were proposed (e.g., [20, 46, 14]). The basic idea is the following. A certain function is defined and applied to the views of all the objects in the library. The object models are arranged in a look-up table indexed by the obtained function values. When an image is given, the function is applied to the image, and the obtained value is used to index into the table. To reduce the size of the table and the complexity of its preparation, invariant functions, that is, functions that return the same value when applied to different views of an object regardless of viewpoint, are often used as the indexing functions.
Indexing methods suffer from several shortcomings. First, existing indexing methods handle only rigid objects. Extending these methods to handle classes of objects has not been discussed. Second, because of complexity issues, indexing functions usually are applied to small numbers of features. As a result, high rates of false positives are obtained, and the effectiveness of the indexing is reduced.
The scheme presented in this paper is designed to work where traditional approaches to categorization and indexing fail. The scheme combines both categorization and identification of objects, and uses fairly detailed representations for objects. Rather than indexing directly to the specific object model, the scheme indexes into the library of objects by categorizing the object. The classes handled by the scheme include objects with relatively similar shapes. To fit into the scheme, in some cases basic-level classes are broken into sub-classes. The general problem of categorization therefore may require additional tools.
3 Recognition by Prototypes
The recognition by prototypes scheme proceeds as follows. A library of 3D object models is stored in memory. The models in the library are divided into classes, and 3D prototype objects are selected to represent the classes. For every class, the models in the class are aligned in the library with the prototype object. The role of this 3D alignment will become clear shortly.
matched against all of the prototypes. For each prototype object, the system attempts to recover the view of the prototype that most resembles the image. To do so, the system recovers the correspondence between the prototype and the image, and, using this correspondence, it determines the transformation that best aligns the prototype with the image. This transformation, referred to as the prototype transform, is then applied to the prototype, and the similarity between the transformed prototype and the actual image is evaluated. Since the observed object in general differs from the prototype object, a perfect match between the two is not anticipated. The system therefore seeks a prototype that reasonably matches the image. Once such a prototype is found, the class identity of the object is determined.
After the object's class is determined, the system attempts to recover the specific identity of the object. At this stage, the image is matched against all the models of the object's class. For each of these models, the system seeks to recover the transformation that aligns the model with the image. As will be shown below, since the models are aligned in the library with the prototype, the transformation that best aligns the prototype with the image is identical to the transformation that aligns the model to the image. The prototype transform therefore is applied to the specific models, and their appearance from this pose is compared with the image. The model that aligns with the image, if there is such a model, determines the specific identity of the object.
The rest of this section is organized as follows. In Section 3.1 the object representation used in our scheme is presented. Section 3.2 describes the categorization stage, and Section 3.3 describes the identification stage.
3.1 Object representation: the linear combination scheme
In our scheme, an object is modeled by a matrix $M$ of size $n \times k$, where $n$ is the number of feature points and $k$ represents the degrees of freedom of the object. A vector $\vec{a} \in \mathbb{R}^k$, referred to as the transform vector, represents the transformation applied to the object in a certain view, and the object's appearance from this view is given by
$$\vec{v} = M\vec{a} \tag{1}$$
In the rest of this section we explain the use of this notation. The notation follows from the linear combination scheme [42], which is briefly reviewed below.
Under the linear combination scheme an object is modeled by a small set of views, each represented by a vector containing point positions, where the points in these views are ordered in correspondence. Novel views of the object are obtained by applying linear combinations to the stored views. Additional constraints may apply to the coefficients of this linear combination. Computing the object pose therefore requires recovering the coefficients of the linear combination that align the model with the image and verifying that the recovered coefficients indeed satisfy the constraints. The method handles rigid objects under weak-perspective projection (namely, orthographic projection followed by a uniform
of objects with smooth bounding surfaces and to handle articulated objects. In our representation, the columns of the model matrix $M$ contain views of the object, and the coefficients of the linear combination that align the model with the image are given by the transform vector $\vec{a}$.
For concreteness, we review the linear combination scheme for rigid objects. Consider a 3D object $O$ that contains $n$ feature points $(X_i, Y_i, Z_i)$, $1 \le i \le n$. Under weak-perspective projection, the position of the object following a rotation $R$, translation $\vec{t}$, and scaling $s$ is given by
$$\begin{aligned} x_i &= s r_{11} X_i + s r_{12} Y_i + s r_{13} Z_i + t_x \\ y_i &= s r_{21} X_i + s r_{22} Y_i + s r_{23} Z_i + t_y \end{aligned} \tag{2}$$
where $r_{ij}$ are the components of the rotation matrix $R$, and $t_x$, $t_y$ are the horizontal and vertical components of the translation vector $\vec{t}$, respectively.
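Denote by $\vec{X}, \vec{Y}, \vec{Z}, \vec{x}, \vec{y} \in \mathbb{R}^n$ the vectors of $X_i$, $Y_i$, $Z_i$, $x_i$, and $y_i$ values, respectively, and denote $\vec{1} = (1, \ldots, 1) \in \mathbb{R}^n$. We can then rewrite Eq. 2 as a vector equation:
$$\begin{aligned} \vec{x} &= a_1 \vec{X} + a_2 \vec{Y} + a_3 \vec{Z} + a_4 \vec{1} \\ \vec{y} &= b_1 \vec{X} + b_2 \vec{Y} + b_3 \vec{Z} + b_4 \vec{1} \end{aligned} \tag{3}$$
where
$$a_1 = s r_{11}, \quad a_2 = s r_{12}, \quad a_3 = s r_{13}, \quad a_4 = t_x, \qquad b_1 = s r_{21}, \quad b_2 = s r_{22}, \quad b_3 = s r_{23}, \quad b_4 = t_y$$
Therefore
$$\vec{x}, \vec{y} \in \operatorname{span}\{\vec{X}, \vec{Y}, \vec{Z}, \vec{1}\} \tag{4}$$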
Different views of the object are obtained by changing the rotation, scale, and translation parameters, and these changes result in changing the coefficients in Eq. 3. We may therefore conclude that all the views of a rigid object are contained in a 4D linear space.
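The 4D-subspace claim can be checked numerically. The sketch below assumes NumPy; `random_rotation`, `weak_perspective_view`, `model_matrix`, and `span_residual` are hypothetical helper names introduced for illustration. It projects a random rigid point set with Eq. 2 and measures the least-squares distance of $\vec{x}$ and $\vec{y}$ from $\operatorname{span}\{\vec{X}, \vec{Y}, \vec{Z}, \vec{1}\}$.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation: QR of a Gaussian matrix, with signs fixed so det = +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # flip column signs so the factorization is unique
    if np.linalg.det(q) < 0:         # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def weak_perspective_view(points, R, s, t):
    """Eq. (2): rotate, keep the first two rows of R, scale by s, translate by t.
    points is n x 3; returns the image vectors x and y (each of length n)."""
    xy = s * points @ R[:2].T + t
    return xy[:, 0], xy[:, 1]

def model_matrix(points):
    """M = (X, Y, Z, 1), an n x 4 matrix."""
    return np.column_stack([points, np.ones(len(points))])

def span_residual(M, v):
    """Euclidean distance from v to the column space of M (least-squares fit)."""
    coeffs, *_ = np.linalg.lstsq(M, v, rcond=None)
    return np.linalg.norm(M @ coeffs - v)

rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 3))       # n = 10 feature points
M = model_matrix(pts)
x, y = weak_perspective_view(pts, random_rotation(rng), s=1.7, t=np.array([3.0, -2.0]))
print(span_residual(M, x), span_residual(M, y))   # both at round-off level
print(span_residual(M, rng.normal(size=10)))      # a random vector is generally not a view
```

The recovered least-squares coefficients are exactly $(s r_{11}, s r_{12}, s r_{13}, t_x)$ from Eq. 3, since $M$ has full column rank for generic points.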
This property, that the views of a rigid object are contained in a 4D linear space, provides a method for constructing viewer-centered representations for the object. The idea is to use images of the object to construct a basis for this space. In general, two views provide sufficiently many vectors (generically, $\vec{x}_1$, $\vec{y}_1$, and $\vec{x}_2$ from two views, together with $\vec{1}$, span the space). Therefore, any novel view is a linear combination of two views [30, 42].
Not every linear combination is a valid view of a rigid object. Because the row vectors of the rotation matrix are orthonormal, the coefficients in Eq. 3 must satisfy the two quadratic constraints
$$\begin{aligned} a_1^2 + a_2^2 + a_3^2 &= b_1^2 + b_2^2 + b_3^2 \\ a_1 b_1 + a_2 b_2 + a_3 b_3 &= 0 \end{aligned} \tag{5}$$
When the constraints are not satisfied, distorted (stretched or sheared) pictures of the objects are generated. In case a viewer-centered representation is used, the constraints change in accordance with the selected basis. A third view of the object can be used to recover the new constraints.
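A minimal sketch of checking Eq. 5 for object-centered coefficient vectors follows (it does not cover the viewer-centered case, whose constraints depend on the chosen basis). NumPy is assumed; `constraint_residuals`, `is_rigid_view`, and the relative tolerance `tol` are hypothetical choices for illustration.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation via QR, with det forced to +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def constraint_residuals(a, b):
    """Left side minus right side of both equations in Eq. (5).
    Only the first three components (the s*r_ij terms) enter; a_4, b_4 are translations."""
    a3, b3 = np.asarray(a)[:3], np.asarray(b)[:3]
    return a3 @ a3 - b3 @ b3, a3 @ b3

def is_rigid_view(a, b, tol=1e-9):
    """True if (a, b) satisfy both quadratic constraints up to tol, relative to s^2."""
    norm_diff, dot = constraint_residuals(a, b)
    scale = max(float(np.dot(a[:3], a[:3])), 1e-12)
    return abs(norm_diff) < tol * scale and abs(dot) < tol * scale

rng = np.random.default_rng(3)
R, s = random_rotation(rng), 1.3
a = np.append(s * R[0], 0.4)     # (s r11, s r12, s r13, t_x)
b = np.append(s * R[1], -1.0)    # (s r21, s r22, s r23, t_y)
print(is_rigid_view(a, b))                             # True: a genuine rigid view
print(is_rigid_view(np.append(2 * s * R[0], 0.4), b))  # False: horizontal stretch
print(is_rigid_view(a, b + 0.5 * a))                   # False: shear
```

The stretch breaks the first equation of Eq. 5 and the shear breaks both, matching the distorted pictures described above.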
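For the purpose of this paper, a model for a rigid object can be constructed by building the following $n \times 4$ model matrix
$$M = (\vec{X}, \vec{Y}, \vec{Z}, \vec{1}), \qquad \vec{x} = M\vec{a}, \qquad \vec{y} = M\vec{b} \tag{6}$$
where $\vec{a} = (a_1, a_2, a_3, a_4)$ and $\vec{b} = (b_1, b_2, b_3, b_4)$ are the coefficients from Eq. 3. Notice that the two linear systems can be merged into one by constructing a modified model matrix in the following way
$$\begin{pmatrix} \vec{x} \\ \vec{y} \end{pmatrix} = \begin{pmatrix} M & 0 \\ 0 & M \end{pmatrix} \begin{pmatrix} \vec{a} \\ \vec{b} \end{pmatrix} \tag{7}$$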
Similar constructions can be obtained for objects with smooth bounding surfaces and for articulated objects. The width of $M$, $k$, should then be modified according to the degrees of freedom of the modeled object. As was mentioned above, viewer-centered representations can be obtained by constructing a basis for the 4D space from images of the object. Therefore, viewer-centered models can be obtained by replacing the column vectors of $M$ with the constructed basis.
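The sketch below illustrates Eq. 7 and the viewer-centered alternative under stated assumptions: NumPy is available, the basis is taken as $(\vec{x}_1, \vec{y}_1, \vec{x}_2, \vec{1})$ from two generic views (one common choice, not the only one), and all helper names are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Random 3x3 rotation via QR, with det forced to +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def project(points, R, s=1.0, t=(0.0, 0.0)):
    """Weak-perspective projection (Eq. 2); returns x and y as separate vectors."""
    xy = s * points @ R[:2].T + np.asarray(t)
    return xy[:, 0], xy[:, 1]

def stacked_model(M):
    """Block-diagonal matrix of Eq. (7): (x; y) = [[M, 0], [0, M]] (a; b)."""
    Z = np.zeros_like(M)
    return np.block([[M, Z], [Z, M]])

def viewer_centered_model(view1, view2):
    """Replace (X, Y, Z, 1) by a basis built from two images: (x1, y1, x2, 1)."""
    (x1, y1), (x2, _) = view1, view2
    return np.column_stack([x1, y1, x2, np.ones_like(x1)])

rng = np.random.default_rng(5)
pts = rng.normal(size=(12, 3))
M = np.column_stack([pts, np.ones(12)])
Ms = stacked_model(M)                         # 24 x 8

# Eq. (7): a single linear system recovers (a; b) for a novel view
R, s, t = random_rotation(rng), 0.9, (0.2, -0.7)
x, y = project(pts, R, s, t)
ab, *_ = np.linalg.lstsq(Ms, np.concatenate([x, y]), rcond=None)

# Viewer-centered model from two images; the novel view is a linear combination of it
V = viewer_centered_model(project(pts, random_rotation(rng)), project(pts, random_rotation(rng)))
coef, *_ = np.linalg.lstsq(V, x, rcond=None)
print(np.abs(V @ coef - x).max())             # round-off level
```

Note that the coefficients found against the viewer-centered basis no longer satisfy Eq. 5 directly, which is why a third view is needed to recover the constraints in that case.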
To summarize, following the linear combination scheme we can represent an object by a matrix $M$ and construct views of the object by applying it to transform vectors $\vec{a}$. For rigid objects not every transform vector is valid; the components of the transform vector must satisfy the two quadratic constraints. Recognition involves recovering the transform vector $\vec{a}$ and verifying that its components satisfy the two constraints. Ignoring these constraints will result in recognizing the object even when it undergoes a general 3D affine transformation. In the analysis below we largely ignore the quadratic constraints. These constraints, however, can be verified both during the categorization stage and during the identification stage.
3.2 Categorization
The recognition by prototypes scheme begins by determining the object's category. This is achieved by comparing the observed object to prototype objects, objects that are "typical exemplars" of their classes. For a given prototype, the view of the prototype that most resembles the image is recovered and compared to the actual image, and the result of this comparison determines the class identity of the object.
We begin our description of the categorization stage by defining the data structures used by the scheme. A class $C = (P, \{M_1, M_2, \ldots, M_l\})$ is a pair that includes a prototype $P$ and a set of object models $M_1, M_2, \ldots, M_l$. Both the prototype and the models are represented by $n \times k$ matrices, where $n$ defines the number of feature points considered and $k$ denotes the degrees of freedom of the objects. For the sake of simplicity we assume here that all the objects in the class share the same number of feature points, $n$, and that they have similar degrees of freedom, $k$. Note that similar objects tend to have similar degrees of freedom (e.g., all of them are rigid). Neither assumption is strict, however. The scheme can be modified to tolerate both a varying number of feature points and different degrees of freedom. The details will be discussed later in this paper. Note that
viewer-centered representations. In case viewer-centered representations are used, we shall assume that the models represent the objects from the same range of viewpoints.
A class in our scheme contains objects with similar shapes. These objects share roughly the same topologies, and there exists a "natural" correspondence between them. Consider, for instance, the two chairs in Figure 1. Although the shapes of these chairs are different, and some parts (e.g., the arms) appear only in one chair and not in the other, a natural correspondence between features in the two objects can be determined.
In the library of models, the natural correspondence between objects is made explicit. It is specified by the order of the row vectors of the models. Specifically, given a prototype $P$ and object models $M_1, \ldots, M_l$, we order the rows of these models such that the first feature point of $P$ corresponds to the first feature point of each of the models $M_1, \ldots, M_l$, and so forth.
Given the library of objects and an incoming image, the recognition by prototypes scheme begins by categorizing the object observed in the image. To achieve this goal, the prototype objects are aligned and compared to the image. For every prototype, the correspondence between the image and the prototype is first resolved, and, using this correspondence, the nearest prototype view is recovered. By doing so, the scheme decouples the two factors that affect the appearance of the object in the image, namely, view variations and shape variations. By selecting the prototype view nearest to the image, the scheme compensates for view variations. Then, by evaluating the similarity between the nearest prototype view and the actual image, it accounts for the differences in shape between the prototype and the observed object.
The first stage in matching the prototype to the image involves the recovery of correspondence between prototype and image features. In existing systems for recognizing the specific identity of objects, establishing the correspondence between images and object models involves a time-consuming process in which sophisticated algorithms are applied [10, 13, 15, 18, 23, 25, 35, 41]. These algorithms rely on the property that, when the correct correspondence between a model and an image is established, a near-perfect match between the two is obtained. While this assumption is valid for identification, it cannot be used under our scheme, since the prototype and the image generally represent different objects.
To determine the correspondence between the prototype and the image, we define an objective function that is applied to the prototype and the image under a given correspondence and that obtains its minimum under the correct correspondence. The objective function will measure the quality of the match between the prototype and the image. Namely, under this measure the correct correspondence is the one that brings the prototype into its best alignment with the image. Given this objective function, correspondence is a combinatorial optimization problem, and so minimization techniques can be used to resolve the correspondence between the prototype and the image. This paper does not propose a specific technique
Assuming the correspondence problem can be solved, the scheme proceeds as follows. Given a prototype $P$ and an image $I$, we generate a view vector $\vec{v}$ from the image by extracting the locations of feature points and arranging them in a vector. The points in $\vec{v}$ are ordered in correspondence with the prototype points; that is, the first point in $\vec{v}$ corresponds to the first point in $P$, and so forth. The prototype transform is the transformation that brings the prototype points as close as possible to their corresponding image points. The prototype transform, therefore, is the transform vector $\vec{b}$ that minimizes the Euclidean distance between the prototype and image points, namely
$$\min_{\vec{b}'} \lVert P\vec{b}' - \vec{v} \rVert \tag{8}$$
A solution for (8) is obtained as follows. Assume $P$ is overdetermined; that is, $P$ is $n \times k$, where $n > k$ and $\operatorname{rank}(P) = k$, and denote by $P^{+} = (P^T P)^{-1} P^T$ the pseudo-inverse of $P$. The prototype transform, $\vec{b}$, is given by
$$\vec{b} = P^{+}\vec{v} \tag{9}$$
and the nearest prototype view $\vec{p}$ is obtained by applying $P$ to the prototype transform, $\vec{b}$, that is
$$\vec{p} = P\vec{b} = PP^{+}\vec{v} \tag{10}$$
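Eqs. 9 and 10 amount to an ordinary linear least-squares fit. The NumPy sketch below (function names are hypothetical) computes $\vec{b}$ and $\vec{p}$ and checks two facts implied by the text: $P^{+}$ agrees with $(P^T P)^{-1} P^T$ when $P$ has full column rank, and the residual $\vec{p} - \vec{v}$ is orthogonal to the columns of $P$.

```python
import numpy as np

def prototype_transform(P, v):
    """Eq. (9): b = P^+ v, the minimizer of ||P b' - v|| in Eq. (8)."""
    return np.linalg.pinv(P) @ v

def nearest_prototype_view(P, v):
    """Eq. (10): p = P P^+ v, the orthogonal projection of v onto the column space of P."""
    return P @ prototype_transform(P, v)

rng = np.random.default_rng(2)
P = rng.normal(size=(9, 4))        # n = 9 feature points, k = 4, full column rank
v = rng.normal(size=9)             # an image vector that P cannot match exactly

b = prototype_transform(P, v)
p = nearest_prototype_view(P, v)

# For full column rank, the pseudo-inverse equals the closed form (P^T P)^{-1} P^T
assert np.allclose(np.linalg.pinv(P), np.linalg.inv(P.T @ P) @ P.T)
# The residual p - v is orthogonal to every column of P (the normal equations)
assert np.allclose(P.T @ (p - v), 0.0)
```

In practice `np.linalg.lstsq(P, v, rcond=None)` gives the same $\vec{b}$ without forming $P^{+}$ explicitly; forming $P^{+}$ once per prototype is still convenient when many image vectors are tested against the same prototype.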
The nearest prototype view is now compared to the image, and their resemblance determines the class identity of the object. The quality of the match between the prototype and the image is defined by
$$D(P, \vec{v}) = \lVert \vec{p} - \vec{v} \rVert = \lVert (PP^{+} - I)\vec{v} \rVert \tag{11}$$
To eliminate effects due to scaling of the object, this measure should be normalized, as is illustrated by the example below. Consider an object seen from some view $\vec{v}_1$. Its distance to the prototype is given by $D(P, \vec{v}_1)$. Suppose the object is now seen from a new view $\vec{v}_2$ that is identical to $\vec{v}_1$, except that the object is now twice as close to the camera. Under these conditions $\vec{v}_2 = 2\vec{v}_1$, and its distance to the prototype is given by $D(P, \vec{v}_2) = 2D(P, \vec{v}_1)$. Clearly, we should have a measure that is independent of the distance of the object to the camera. One way to obtain such a measure is by dividing $D(P, \vec{v})$ by the norm $\lVert \vec{v} \rVert$:
$$\hat{D}(P, \vec{v}) = \frac{\lVert (PP^{+} - I)\vec{v} \rVert}{\lVert \vec{v} \rVert} \tag{12}$$
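The scaling argument can be reproduced directly. The NumPy sketch below (function names hypothetical) doubles a view vector and compares $D$ from Eq. 11 with $\hat{D}$ from Eq. 12. Because $\lVert (PP^{+} - I)\vec{v} \rVert$ is the length of a projection residual, $\hat{D}$ always lies between 0 and 1.

```python
import numpy as np

def distance(P, v):
    """Eq. (11): D(P, v) = ||(P P^+ - I) v||."""
    residual = P @ (np.linalg.pinv(P) @ v) - v
    return np.linalg.norm(residual)

def normalized_distance(P, v):
    """Eq. (12): D(P, v) / ||v||, unchanged by a uniform scaling of the image."""
    return distance(P, v) / np.linalg.norm(v)

rng = np.random.default_rng(4)
P = np.column_stack([rng.normal(size=(7, 3)), np.ones(7)])   # a rigid prototype (X, Y, Z, 1)
v1 = rng.normal(size=7)                                      # some observed view
v2 = 2.0 * v1                                                # same view, object twice as close
print(distance(P, v2) / distance(P, v1))                     # approximately 2
print(normalized_distance(P, v2) - normalized_distance(P, v1))  # approximately 0
```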
$\hat{D}(P, \vec{v})$ is proposed here as an objective function for establishing the correspondence between the prototype and the image. In other words, we expect that if the object belongs to the prototype's class, then $\hat{D}(P, \vec{v})$ obtains its minimal value when $\vec{v}$ is ordered in correspondence with $P$. Any other permutation will increase the value of $\hat{D}$.
Formally, denoting by $\Pi$ a permutation matrix, we assume that
$$\hat{D}(P, \vec{v}) = \min_{\Pi} \hat{D}(P, \Pi\vec{v}) \tag{13}$$
The measure $\hat{D}(P, \vec{v})$ has a second role. Since it measures the similarity between the prototype and the image, it can also be used to determine the object's class. An object observed in a view $\vec{v}$ belongs to the class represented by a prototype $P$ if
$$\hat{D}(P, \vec{v}) < \epsilon \tag{14}$$
for some constant $\epsilon > 0$. We refer to (14) as the categorization criterion.
The categorization stage proceeds as follows. Given an image $I$ and a prototype $P$, the correspondence between $P$ and $I$ is resolved by minimizing the measure $\hat{D}(P, \Pi\vec{v})$ over all possible permutations $\Pi$ of $\vec{v}$, and, if the obtained minimum $\hat{D}(P, \vec{v})$ is below the threshold $\epsilon$, then the class identity of the object is determined.
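The paper deliberately leaves the correspondence search open. Purely for illustration, the sketch below resolves Eq. 13 by exhaustive search over point orderings, which costs $O(n!)$ and is only usable for a handful of points, and then applies the criterion of Eq. 14. It assumes NumPy, a rigid prototype used through the stacked model of Eq. 7, and that a permutation reorders whole image points (so the $x$ and $y$ entries of $\vec{v}$ are permuted together). The function names and the threshold `eps` are hypothetical.

```python
import numpy as np
from itertools import permutations

def random_rotation(rng):
    """Random 3x3 rotation via QR, with det forced to +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def residual_operator(P):
    """(Ps Ps^+ - I) for the stacked prototype Ps = [[P, 0], [0, P]] of Eq. (7).
    Computed once per prototype and reused for every candidate correspondence."""
    Z = np.zeros_like(P)
    Ps = np.block([[P, Z], [Z, P]])
    return Ps @ np.linalg.pinv(Ps) - np.eye(Ps.shape[0])

def d_hat(Q, pts):
    """Eq. (12) for an ordered n x 2 array of image points, with v = (x; y)."""
    v = np.concatenate([pts[:, 0], pts[:, 1]])
    return np.linalg.norm(Q @ v) / np.linalg.norm(v)

def categorize(P, img, eps):
    """Minimize Eq. (12) over all point orderings (Eq. 13), then test Eq. (14).
    Returns (order, score, in_class), where img[order] is aligned with the rows of P."""
    Q = residual_operator(P)
    best = min(permutations(range(len(img))), key=lambda o: d_hat(Q, img[list(o)]))
    order = list(best)
    score = d_hat(Q, img[order])
    return order, score, score < eps

rng = np.random.default_rng(7)
proto_pts = rng.normal(size=(6, 3))                  # n = 6 gives 720 orderings
P = np.column_stack([proto_pts, np.ones(6)])
view = 1.5 * proto_pts @ random_rotation(rng)[:2].T + np.array([0.3, -0.2])
shuffle = rng.permutation(6)
img = view[shuffle]                                  # image points arrive unordered
order, score, in_class = categorize(P, img, eps=0.05)
print(in_class, score)
```

Here the image is an exact view of the prototype, so the minimum is at round-off level; for a different member of the class the minimum would be positive and the threshold decides membership.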
Note that in our scheme the prototype and the categorization criterion determine the actual division of objects into classes; an object belongs to a certain class if its views are sufficiently similar, according to the categorization criterion, to views of the prototype. Under the above definition, an object belongs to a prototype's class if the total (normalized) difference between its feature points and their corresponding prototype points does not exceed $\epsilon$.
The measure $\hat{D}(P, \vec{v})$ defined here determines the similarity between the prototype $P$ and the view $\vec{v}$ using only the distances between feature points. In general, since correspondence is difficult to achieve, such a measure would not be robust. Including additional information about the features in the similarity measure may increase the robustness of the scheme. Also, measures that consider only the proximity of feature points are limited in terms of dividing the library into classes, since they induce classes of objects with highly similar shapes. Measures that consider additional information can extend the classes to include larger sets of objects.
The measure $\hat{D}(P, \vec{v})$ can be enriched by considering the similarity between corresponding points. A simple example of a measure that considers both the proximity and the similarity between feature points is the following. Each feature point is associated with a label (such as a corner or an inflection point). Again, the measure $\hat{D}(P, \vec{v})$ is applied, but this time only correspondences between points with similar labels are allowed; namely, corners in the image can only match corners in the prototype, and, similarly, inflection points can only match inflection points. Other examples of measures that combine proximity and similarity include measures that retain the tangent or the curvature of points. More sophisticated measures may compare the topologies of the objects in the two views, or, in other words, verify that the objects share similar part structures in 2D.
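One way to realize the label restriction is to enumerate only orderings that map each prototype point to an image point with the same label, which also shrinks the combinatorial search. The pure-Python generator below is a hypothetical sketch (labels are plain strings, and the name `label_consistent_orders` is illustrative). Each candidate ordering it yields could then be scored with $\hat{D}$ (Eq. 12).

```python
from itertools import permutations, product

def label_consistent_orders(proto_labels, img_labels):
    """Yield orderings `order` (order[i] = image index matched to prototype point i)
    such that img_labels[order[i]] == proto_labels[i] for every i."""
    if sorted(proto_labels) != sorted(img_labels):
        return                                  # label counts differ: nothing admissible
    labels = sorted(set(proto_labels))
    proto_idx = {L: [i for i, l in enumerate(proto_labels) if l == L] for L in labels}
    img_idx = {L: [j for j, l in enumerate(img_labels) if l == L] for L in labels}
    # Permute image points only within each label group, then combine the groups
    for choice in product(*(permutations(img_idx[L]) for L in labels)):
        order = [0] * len(proto_labels)
        for L, assigned in zip(labels, choice):
            for i, j in zip(proto_idx[L], assigned):
                order[i] = j
        yield order

proto = ["corner", "corner", "inflection", "corner", "inflection", "inflection"]
img = ["inflection", "corner", "corner", "inflection", "corner", "inflection"]
orders = list(label_consistent_orders(proto, img))
print(len(orders))   # 3! * 3! = 36 candidates instead of 6! = 720
```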
A useful technique in measuring the similarity between the image and the nearest prototype view is to consider a different set of features than the set used to determine the prototype transform. The rationale behind this technique is that it is generally difficult to recover exact feature-to-feature correspondence, and while such correspondences are necessary for recovering the prototype transform, similarity measures can be successfully applied even in the absence of exact feature-to-feature correspondence. This idea resembles the basic principle of the alignment algorithm [18, 41], in which a small set of points is used to compute the object pose, while a larger set of points is used to verify this pose.
It should be noted that the general flow of the scheme, and in particular the identification stage, is independent of the specific choice of similarity measure. As has been noted above, the measure affects the division of model libraries into classes and the selection of optimal prototypes for these classes. An example of selecting the optimal prototype for a given class under the mea-
tures) is described in Section 4.
Finally, although the main objective of the categorization stage is to determine the class identity of the object, the categorization scheme described above is useful even if the object's category cannot be determined. Section 3.3 below shows that the prototype transform can be reused to align the image with the specific models. Consequently, following the categorization stage, the cost of comparing the image to each of the specific models is substantially reduced, since the difficult part of recovering the transformation that relates the models to the image is applied only to the prototype objects. As a result, if the class identity of the object cannot be determined, we still need to consider all the specific models in the library, but the overall cost of comparing the models to the image is low, because correspondence is computed once for the whole class.
3.3 Identification
After the observed object is categorized, the system turns to recovering its individual identity. At this stage the image is matched to all the models in the object's class. For each model, the system seeks to recover the transformation that aligns the model to the image, if there is such a transformation. In previous schemes this required recovering the correspondence between the image and each of the models separately. In our scheme, however, this is no longer necessary, since the object transform is determined directly from the prototype transform. We show in this section that the prototype and the object transforms are related by a simple transformation, which can be computed in advance, and which can in fact be undone already in the library of stored models. Consequently, the prototype transform can be reused in the identification stage to align the individual models with the image.
The initial stage of categorization recovers three pieces of information that can be used for identification: (i) the object class, (ii) the correspondence between the prototype and the image, and (iii) the prototype transform. This information is used in the identification stage as follows. First, since the object's class is determined, only models that belong to this class are considered. Second, using the correspondence between the prototype and the image established in the categorization stage, and using the stored correspondence between the prototype and the object models, the correspondence between the models and the image is immediately recovered. Finally, as is shown below, the model transform, namely, the transformation that aligns the model with the image, is recovered from the prototype transform.
Assume we are given a view $\vec{v}$ of some object model $M_i$, namely
$$\vec{v} = M_i\vec{a} \tag{15}$$
for some transform vector $\vec{a}$. When the identification process begins, it is still unknown which of the models $M_1, \ldots, M_l$ of the object's class accounts for the image and what the transform vector $\vec{a}$ is. The first task faced
form, $\vec{a}$. This is done, as is explained below, using the prototype transform $\vec{b} = P^{+}\vec{v}$ defined in (9). Once $\vec{a}$ is recovered, it is applied to all the models $M_1, \ldots, M_l$, and the model for which a near-perfect match is obtained determines the object's identity.
Theorem 1 below establishes that the model transform $\vec{a}$ can be recovered directly from the prototype transform $\vec{b}$ by applying a linear transformation, which is referred to as the prototype-to-model transform. This transform has two interesting properties. First, it is view-independent; namely, for any given view of the object, the same transform maps the prototype transform that corresponds to this view to the correct model transform. The prototype-to-model transform therefore can be computed in advance and stored in the library of models. Second, the prototype-to-model transform can be used to recover the model transform regardless of the quality of the match between the prototype and the image. In other words, even if the prototype aligns poorly with the image, the transformation that aligns the model with the image is determined correctly in this process.
Theorem 1: Given a view $\vec{v} = M_i\vec{a}$, let $\vec{b} = P^{+}\vec{v}$ be the prototype transform, that is, the transform vector that best aligns the prototype with the image. The model transform, $\vec{a}$, can be recovered from the prototype transform, $\vec{b}$, by applying a matrix $A_i$, namely
$$\vec{a} = A_i\vec{b}$$
$A_i$ is referred to as the prototype-to-model transform.
Proof: Notice that
$$\vec{b} = P^{+}\vec{v} = P^{+}M_i\vec{a}$$
Assuming $P^{+}M_i$ is invertible, let
$$A_i = (P^{+}M_i)^{-1}$$
We obtain that
$$\vec{a} = A_i\vec{b}$$
□
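A numerical check of Theorem 1 follows, assuming NumPy and a model $M_i$ whose points are a perturbed copy of the prototype's points. The perturbation is a hypothetical stand-in for a "similar" object, and `prototype_to_model` is an illustrative name. The prototype does not fit the view exactly, yet $\vec{a}$ is recovered to round-off precision.

```python
import numpy as np

def prototype_to_model(P, M):
    """A_i = (P^+ M_i)^{-1} (Theorem 1); requires the k x k matrix P^+ M_i to be invertible."""
    return np.linalg.inv(np.linalg.pinv(P) @ M)

rng = np.random.default_rng(8)
base = rng.normal(size=(10, 3))
P = np.column_stack([base, np.ones(10)])                                   # prototype
M = np.column_stack([base + 0.2 * rng.normal(size=(10, 3)), np.ones(10)])  # similar model

a = rng.normal(size=4)          # unknown model transform (constraints of Eq. 5 ignored)
v = M @ a                       # Eq. (15): the observed view
b = np.linalg.pinv(P) @ v       # prototype transform, Eq. (9)
A = prototype_to_model(P, M)
a_rec = A @ b                   # Theorem 1

print(np.linalg.norm(P @ b - v) / np.linalg.norm(v))   # prototype fits only approximately
print(np.abs(a_rec - a).max())                         # model transform recovered to round-off
```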
Corollary 2: The prototype-to-model transform is view-independent.
Proof: The prototype-to-model transform, $A_i$, is independent of both pose vectors, $\vec{a}$ and $\vec{b}$. Changing the image $\vec{v}$ will result in a new pair of pose vectors, $\vec{a}$ and $\vec{b}$, but, like the old pair, the new pair is related through the same transform $A_i$. The prototype-to-model transform $A_i$ therefore can be used to recover the object pose for any view of $M_i$. □
$A_i$ exists if $P^{+}M_i$ is invertible. This condition is equivalent to requiring that no direction in the column space of $M_i$ be orthogonal to the column space of $P$. The condition holds, in general, when the two objects are fairly similar. This is illustrated by the following example. Consider the case that both column spaces of $P$ and $M_i$ are one-dimensional; namely, each represents a line through the origin. The only case in this one-dimensional example in which $A_i$ does not exist is when $P$ and $M_i$ are orthogonal. But these lines are farthest apart when they are orthogonal, so as long as the objects are relatively similar, $A_i$ would exist.
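The one-dimensional example can be worked numerically. In this minimal sketch (NumPy assumed; the helper `one_dim_transform` is hypothetical), $P$ and $M_i$ are unit vectors at angle $\theta$ in the plane, so $P^{+}M_i = \cos\theta$ and $A_i = 1/\cos\theta$, which grows without bound as the two lines approach orthogonality.

```python
import numpy as np

def one_dim_transform(theta):
    """A_i for 1-D column spaces: unit vectors p (prototype) and m (model) at angle theta."""
    p = np.array([[1.0], [0.0]])                        # P, an n x 1 matrix (n = 2)
    m = np.array([[np.cos(theta)], [np.sin(theta)]])    # M_i, an n x 1 matrix
    PtM = np.linalg.pinv(p) @ m                         # 1 x 1 matrix equal to cos(theta)
    return 1.0 / PtM[0, 0]                              # A_i = (P^+ M_i)^{-1}

for deg in (0, 30, 60, 85, 89):
    print(deg, round(float(one_dim_transform(np.radians(deg))), 3))
# At 90 degrees P^+ M_i = cos(90 deg) = 0, so A_i does not exist.
```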
Since it depends only on the prototype $P$ and the model $M_i$, the prototype-to-model transform $A_i$ can be pre-computed and stored in the library of models. Every model $M_i \in C$ is associated with its own transform $A_i$ that relates, for every possible view of $M_i$, the prototype transform to the model transform. To compare the image to the model $M_i$, the model transform should first be recovered. This is achieved by applying $A_i$ to the prototype transform computed in the categorization stage.
Also, the prototype-to-model transform, $A_i$, can be used to align the model $M_i$ with the prototype $P$ in 3D. Denote the aligned model by $M'_i$. $M'_i$ models the same object as $M_i$ does, since their column vectors span the same space (because $A_i$ is invertible). In addition, the aligned model $M'_i$ has the property that it is brought by the prototype transform, $\tilde{b}$, to a perfect alignment with the image. Consequently, if the models are aligned in the library with the prototype, the prototype transform computed in the categorization stage can be reused for identification with no further manipulations. This is established in Theorem 3 below.
Theorem 3: Let $M'_i = M_iA_i$ be the model $M_i$ aligned with the prototype $P$. For any view $\tilde{v} = M_i\tilde{a}$, the prototype transform for this view, $\tilde{b} = P^{+}\tilde{v}$, is identical to the model transform of the aligned model $M'_i$ for this view; that is, $\tilde{v} = M'_i\tilde{b}$.
Proof: Since
$$M'_i = M_iA_i$$
we obtain that
$$M'_i\tilde{b} = M_iA_i\tilde{b} = M_i\tilde{a} = \tilde{v}$$
□
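A minimal check of Theorem 3, assuming NumPy and random stand-in matrices: the aligned model $M'_i = M_iA_i$ reproduces the view directly from the prototype transform, and stacking $M_i$ with $M'_i$ does not increase the rank, which confirms that both span the same column space.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 10, 4
M = rng.standard_normal((n, k))               # stand-in model M_i
P = M + 0.3 * rng.standard_normal((n, k))     # stand-in prototype P
P_pinv = np.linalg.pinv(P)
A = np.linalg.inv(P_pinv @ M)                 # prototype-to-model transform A_i
M_aligned = M @ A                             # M'_i = M_i A_i, stored in the library

a = rng.standard_normal(k)
v = M @ a                                     # a view of M_i
b = P_pinv @ v                                # prototype transform from categorization
print(np.allclose(M_aligned @ b, v))          # Theorem 3: v = M'_i b  -> True

# M'_i spans the same column space as M_i (A_i is invertible), so it models the same object.
print(np.linalg.matrix_rank(np.hstack([M, M_aligned])) == k)   # True
```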
Using Theorem 3, the identification scheme is simplified as follows. The models $M_1, \ldots, M_l$ are aligned in the library with the prototype $P$ by applying the corresponding prototype-to-model transforms, $A_1, \ldots, A_l$. At recognition time, the prototype transform, $\tilde{b} = P^{+}\tilde{v}$, is applied to the aligned models $M'_1, \ldots, M'_l$. According to Theorems 1 and 3, when the models are transformed by $\tilde{b}$, the correct model, $M'_i$, would perfectly align with the image.
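The simplified identification scheme can be sketched as an offline alignment step followed by an online loop of template comparisons. This is a sketch under stated assumptions, not the memo's implementation: NumPy is assumed, `build_library` and `identify` are hypothetical names, the models are random perturbations of a common base shape, that base shape stands in for the class prototype, and the view is noiseless.

```python
import numpy as np

def build_library(P, models):
    """Offline: align every model of the class with the prototype (M'_i = M_i A_i)."""
    P_pinv = np.linalg.pinv(P)
    return [M @ np.linalg.inv(P_pinv @ M) for M in models]

def identify(P, aligned_models, v):
    """Online: reuse one prototype transform b for all models of the class."""
    b = np.linalg.pinv(P) @ v                       # computed once, in categorization
    residuals = [np.linalg.norm(Mp @ b - v) for Mp in aligned_models]
    return int(np.argmin(residuals)), residuals     # plain template comparisons

rng = np.random.default_rng(3)
n, k, l = 20, 4, 5
base = rng.standard_normal((n, k))
models = [base + 0.3 * rng.standard_normal((n, k)) for _ in range(l)]
P = base                                            # stand-in prototype for the class
aligned = build_library(P, models)

v = models[2] @ rng.standard_normal(k)              # a noiseless view of model 2
best, residuals = identify(P, aligned, v)
print(best)                                         # 2
```

Note that the pseudo-inverse of $P$ is the only per-image computation shared by all models; each model then costs one matrix-vector product and one norm.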
In the scheme above we assumed that full feature-to-feature correspondence is established between the prototype and the image. This assumption is not mandatory. Methods for estimating the prototype transform using partial correspondence or by considering other types of features (such as line segments) can also be used. Note that in case the prototype transform can only be approximated, the accuracy of the model transform obtained is determined by the quality of this approximation and by the condition number of the prototype-to-model transform $A_i$. The condition number of $A_i$ affects the match even if Theorem 3 is applied, namely, even if the models are aligned with the prototype in advance. Consequently, the condition number of the prototype-to-model transform $A_i$ should be taken into account when the library is divided into classes.
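To illustrate the role of the condition number, the following sketch (NumPy assumed, random stand-in matrices) perturbs the prototype transform by a small error, as an approximate correspondence might, and measures how much that relative error is amplified in the recovered model transform. The standard linear-algebra bound says the amplification never exceeds $\mathrm{cond}(A_i)$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 10, 4
M = rng.standard_normal((n, k))               # stand-in model M_i
a = rng.standard_normal(k)                    # true model transform
v = M @ a

results = []
for spread in (0.1, 1.0, 3.0):                # how far the stand-in prototype is from M_i
    P = M + spread * rng.standard_normal((n, k))
    P_pinv = np.linalg.pinv(P)
    A = np.linalg.inv(P_pinv @ M)
    b = P_pinv @ v
    delta = 1e-3 * np.linalg.norm(b) * rng.standard_normal(k)   # error in estimated b
    rel_err_a = np.linalg.norm(A @ (b + delta) - a) / np.linalg.norm(a)
    rel_err_b = np.linalg.norm(delta) / np.linalg.norm(b)
    cond = np.linalg.cond(A)
    results.append((cond, rel_err_a / rel_err_b))
    # Standard bound: relative error in a <= cond(A_i) * relative error in b
    print(f"spread={spread}: cond(A_i)={cond:.1f}, amplification={rel_err_a / rel_err_b:.1f}")
```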
Finally, the scheme can be extended to handle classes of objects with different degrees of freedom. Consider, for example, a class of chairs in which some chairs are rigid and others are folding. Obviously, the folding chairs have more degrees of freedom than the regular, rigid chairs, and therefore they would be represented in the library by wider matrices than the rigid chairs are. As is explained below, the chairs can be handled in a common class, and the prototype for the class would itself be a folding chair.
More generally, let $M_1, \ldots, M_l$ be a class of models of different widths, and denote by $k_1, \ldots, k_l$ the widths of $M_1, \ldots, M_l$ respectively. Let $P$ be the prototype for this class, and denote by $k_p$ the width of $P$. We set $k_p$ to be
$$k_p = \max\{k_1, \ldots, k_l\} \qquad (16)$$
In other words, we require the prototype to have the same degrees of freedom as the most flexible object in the class. We can set $k_p$ according to our goal since, as is shown in Section 4, the prototype $P$ is obtained in our scheme by manipulating the objects in the class. The prototype-to-model transform $A_i$ is defined in this case by
$$A_i = (P^{+}M_i)^{+} \qquad (17)$$
where $A_i$ is $k_i \times k_p$ (it maps the $k_p$-vector $\tilde{b}$ to the $k_i$-vector $\tilde{a}$). It is straightforward to extend Theorem 1 to also include this case. Consequently, for any view of $M_i$, the model transform $\tilde{a}$ can be recovered from its corresponding prototype transform $\tilde{b}$ by applying the prototype-to-model transform $A_i$ to $\tilde{b}$. Note that since $k_p \geq k_i$, the prototype can appear in poses that do not match any possible model pose (and therefore in noiseless conditions they are impossible to obtain). In case the object is observed from such a view, $A_i$ would map this unmatched prototype transform to the model transform that corresponds to the nearest matched prototype transform (in the least-squares sense). By setting $k_p$ to be as large as the maximum of $k_1, \ldots, k_l$, we avoid cases where there exist views of the object that cannot be accounted for by the prototype. Model transforms that correspond to such views cannot be recovered from prototype transforms.
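A minimal sketch of the unequal-width case, assuming NumPy and random stand-in matrices: a width-4 model and a width-6 prototype that contains the model's directions plus two extra ones. It uses (17) for $A_i$, confirms the $k_i \times k_p$ shape, recovers the model transform, and checks that an unmatched prototype transform is mapped to the least-squares solution, as the pseudo-inverse guarantees when $P^{+}M_i$ has full column rank.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k_i, k_p = 12, 4, 6                      # a rigid model and a more flexible prototype
M = rng.standard_normal((n, k_i))
P = np.hstack([M, rng.standard_normal((n, k_p - k_i))]) + 0.2 * rng.standard_normal((n, k_p))

A = np.linalg.pinv(np.linalg.pinv(P) @ M)   # (17): A_i = (P^+ M_i)^+, shape k_i x k_p
print(A.shape)                              # (4, 6)

a = rng.standard_normal(k_i)
b = np.linalg.pinv(P) @ (M @ a)             # prototype transform of a view of M_i
print(np.allclose(A @ b, a))                # True: Theorem 1 extends to this case

# A prototype transform with no matching model pose is mapped to the model
# transform whose prototype transform (P^+ M_i) a is nearest in the least-squares sense.
b_unmatched = b + rng.standard_normal(k_p)
a_nearest = A @ b_unmatched
a_lstsq = np.linalg.lstsq(np.linalg.pinv(P) @ M, b_unmatched, rcond=None)[0]
print(np.allclose(a_nearest, a_lstsq))      # True
```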
3.4 Summary
We presented in this section a scheme for recognizing 3D objects from single 2D views that proceeds in two stages, categorization and identification. In the categorization stage the image is compared against the stored prototypes. For every prototype, the correspondence between the image and the prototype is recovered, and the nearest view of the prototype is constructed. The similarity between this view and the image is evaluated, and, if the two are found similar, the class identity of the object is determined. In the identification stage the observed object is compared against the models of its class. Since the prototype and the models were brought into alignment in the library, the same transformation that aligns the prototype to the image also aligns the object model to the image. The prototype transform therefore is applied to the models, and the obtained views are compared with the image. The view that is found to be identical to the image up to noise and occlusion determines the individual identity of the object.
The presented scheme is based on several key principles. Recognition is divided into two sub-processes, categorization and identification. In both, prototypes or models are aligned with the image, and the identity of the object is determined by a 2D comparison; 3D reconstruction of the observed object from the image is not performed. The difficult component of the alignment approach, namely, the recovery of correspondence and object pose, is performed only once for each class; the prototype transform is reused in the identification stage to align the image with the individual models.
4 Constructing optimal prototypes
In the scheme above we assumed that the classes in the library of models are represented by prototype objects. Since categorization is achieved by matching the image to prototype objects, the question of how to select the best prototype should be addressed. In this section we present an algorithm for constructing optimal prototypes.
Given a class of objects, the optimal prototype for this class is the object that resembles the objects of the class the most. Under our formulation, such an object would share as many features as possible with the objects of its class, the position of these features on the prototype would be as close as possible to their position on the objects, and the prototype-to-model transform for these objects would be as stable as possible. Below we show that the optimal prototype can effectively be computed using principal component analysis; that is, by computing the dominant eigenvectors of a matrix determined by the models of the class.
Principal component analysis often is used in classification problems to construct classes and prototypes [11]. In existing applications, an object is represented by a point in some high dimensional space, where each component of this point contains an invariant attribute of the object. A hyperplane in that space represents a class of objects. The goal of the principal component analysis is, given a set of points (objects), to recover the class that these points induce. Our case is somewhat different. In our case an object is represented by a continuous linear space rather than by a point. Whereas the use of hyperplanes in other schemes often is arbitrary and made primarily for convenience, their use in our scheme is appropriate following the linear combination scheme [42] (see Section 3.1).
The differences outlined above also imply differences in the proof that principal component analysis applies to our case. We show below that the optimal prototype can be computed by principal component analysis. The traditional proof needs to be extended since in our case objects are represented by continuous spaces rather than by discrete points.
The prototype constructed in this process is a 3D object obtained by manipulating the objects in its class. To allow the construction, it seems as if the objects in the class should first be brought into alignment. In particular, if the objects are represented by viewer-centered models (that is, by sets of their views, see Section 3.1 for details), the different objects would then have to be represented by images taken from similar viewpoints. Nevertheless, the process presented below does not require such an alignment; the same prototype is obtained in this process even when the objects are not aligned.
We now turn to constructing the optimal prototype. First, we define an objective function. Given a prototype $P$ and an object model $M_i$, we define the similarity between $P$ and $M_i$ as follows. Let $\tilde{v}_i$ be a view of $M_i$; we measure the similarity between the prototype $P$ and the view $\tilde{v}_i$ using (12). Then, we sum the measure over all possible views of $M_i$. Assuming without loss of generality that $\|\tilde{v}_i\| = 1$, (14) can be rewritten as
$$\hat{D}(P, \tilde{v}_i) = \|(PP^{+} - I)\tilde{v}_i\| \qquad (18)$$
Without loss of generality, we can assume that the constructed prototype, $P$, is composed of orthonormal columns. Note that an overdetermined matrix $P$ with orthonormal columns satisfies $P^{T}P = I$, and hence $P^{+} = (P^{T}P)^{-1}P^{T} = P^{T}$. We can therefore rewrite (18) as
$$\hat{D}(P, \tilde{v}_i) = \|(PP^{T} - I)\tilde{v}_i\| \qquad (19)$$
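A minimal sketch of the view-to-prototype distance (19), assuming NumPy. The hypothetical helper `view_distance` normalizes the view to unit length, and the prototype is given orthonormal columns via a QR factorization so that $P^{+} = P^{T}$ holds; views that the prototype can produce have zero distance.

```python
import numpy as np

def view_distance(P, v):
    """(19): distance of a unit-length view v from the space of views of P,
    assuming P has orthonormal columns (so P^+ = P^T)."""
    v = v / np.linalg.norm(v)
    return np.linalg.norm((P @ P.T - np.eye(P.shape[0])) @ v)

rng = np.random.default_rng(6)
n, k = 10, 4
P, _ = np.linalg.qr(rng.standard_normal((n, k)))   # orthonormal columns
print(np.allclose(np.linalg.pinv(P), P.T))          # True: P^+ = P^T

v_in = P @ rng.standard_normal(k)                   # a view the prototype can produce
v_out = rng.standard_normal(n)                      # an arbitrary view
print(view_distance(P, v_in) < 1e-12, view_distance(P, v_out))
```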
The distance between $P$ and the model $M_i$ is now given by summing $\hat{D}(P, \tilde{v}_i)$ over all unit-length (to eliminate scaling effects) views of $M_i$, namely
$$\hat{D}(P, M_i) = \int_{\|\tilde{v}_i\|=1} \|(PP^{T} - I)\tilde{v}_i\| \qquad (20)$$
To obtain the objective function, we sum these distances over all models
$$E(P) = \sum_{i=1}^{l} \int_{\|\tilde{v}_i\|=1} \|(PP^{T} - I)\tilde{v}_i\| \qquad (21)$$
The object $P$ that minimizes this function is defined to be the optimal prototype.
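The integral in (21) can be approximated by Monte Carlo sampling. This sketch assumes NumPy, a uniform measure over the unit sphere of each model's view space, and models that all share one width (so that replacing each integral by an average changes $E(P)$ only by a common constant factor). The helper `objective` is hypothetical, and the models are random perturbations of a base shape.

```python
import numpy as np

def objective(P, models, samples=2000, rng=None):
    """Monte Carlo estimate of E(P) in (21) up to a constant factor (hypothetical helper).
    Assumes P has orthonormal columns, all models share one width, and views are
    drawn uniformly from the unit sphere of each model's view space."""
    rng = np.random.default_rng() if rng is None else rng
    R = P @ P.T - np.eye(P.shape[0])              # (P P^T - I)
    total = 0.0
    for M in models:
        Q, _ = np.linalg.qr(M)                    # orthonormal basis for the views of M_i
        c = rng.standard_normal((Q.shape[1], samples))
        V = Q @ (c / np.linalg.norm(c, axis=0))   # uniform unit-length views (columns)
        total += np.linalg.norm(R @ V, axis=0).mean()
    return total

rng = np.random.default_rng(7)
n, k = 10, 4
base = rng.standard_normal((n, k))
models = [base + 0.2 * rng.standard_normal((n, k)) for _ in range(5)]
P_near, _ = np.linalg.qr(base)                    # a prototype close to the class
P_rand, _ = np.linalg.qr(rng.standard_normal((n, k)))
e_near = objective(P_near, models, rng=rng)
e_rand = objective(P_rand, models, rng=rng)
print(f"E(near) ~ {e_near:.3f}, E(random) ~ {e_rand:.3f}")
```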
Note that (21) is not the only possible objective function for this purpose. An alternative "worst case" approach is to measure the distance between the prototype and the farthest model in the class (rather than summing this distance over all models). Besides being difficult to compute, this measure also is sensitive to "outlier" models.
The prototype that minimizes (21) can be constructed in a process that includes the following steps.
1. To simplify the process we assume the column vectors of each of the model matrices $M_i$ ($1 \leq i \leq l$) are orthonormal. (In case they are not, we first apply a Gram-Schmidt process to them. Such a process obviously does not alter the space of views implied by the models.)
2. Build the $n \times n$ symmetric matrix
$$F = \sum_{i=1}^{l} M_iM_i^{T}$$
3. Find the $k$ dominant eigenvectors of $F$. The optimal matrix $P$ is constructed from these eigenvectors.
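Steps 1-3 can be sketched in a few lines, assuming NumPy; `optimal_prototype` and `squared_residual` are hypothetical names, and a QR factorization stands in for the Gram-Schmidt process of step 1 (both yield an orthonormal basis of the same column space). The check at the end uses the squared-residual criterion that standard principal component analysis minimizes, for which the dominant eigenvectors of $F$ are guaranteed to do at least as well as any other orthonormal $k$-column matrix.

```python
import numpy as np

def optimal_prototype(models, k):
    """Sketch of steps 1-3 (hypothetical helper name)."""
    n = models[0].shape[0]
    F = np.zeros((n, n))
    for M in models:
        Q, _ = np.linalg.qr(M)           # step 1: orthonormal columns, same column space
        F += Q @ Q.T                     # step 2: F = sum_i M_i M_i^T
    eigvals, eigvecs = np.linalg.eigh(F) # F is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    P = eigvecs[:, order[:k]]            # step 3: eigenvectors of the k largest eigenvalues
    return P, eigvals[order]

def squared_residual(P, models):
    """sum_i ||(P P^T - I) Q_i||_F^2 over orthonormalized models (the squared PCA criterion)."""
    R = P @ P.T - np.eye(P.shape[0])
    return sum(np.linalg.norm(R @ np.linalg.qr(M)[0]) ** 2 for M in models)

rng = np.random.default_rng(8)
n, k, l = 10, 4, 6
base = rng.standard_normal((n, k))
models = [base + 0.3 * rng.standard_normal((n, k)) for _ in range(l)]
P, eigvals = optimal_prototype(models, k)
P_rand, _ = np.linalg.qr(rng.standard_normal((n, k)))
print(np.allclose(P.T @ P, np.eye(k)))                            # orthonormal columns
print(squared_residual(P, models) <= squared_residual(P_rand, models))
```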
We require that $P$ represent a prototype object that would belong to the given class. This condition determines the choice of width $k$ for the prototype. If all the models share the same width then the prototype would assume this width. In the rigid case, for example, $k = 4$ (see Section 3.1). As mentioned in Section 3.3 above, in case the objects have different degrees of freedom, $k$ is set to be the maximum of $k_1, \ldots, k_l$, where $k_1, \ldots, k_l$ are the widths of $M_1, \ldots, M_l$ respectively.
In case more than $k$ large eigenvalues are obtained, one may ignore these guidelines and construct a prototype that has more degrees of freedom than the objects in the class (see, for example, [31]).
Theorem 4 below establishes that the algorithm above produces the optimal prototype. We consider here the case that all the objects share similar degrees of freedom. The same procedure can be applied with slight modifications to include the case of objects with different degrees of freedom.
Theorem 4: Let $M_1, M_2, \ldots, M_l$ be a set of models belonging to some class $C$. Assume every model $M_i$ is represented by an $n \times k$ matrix with orthonormal column vectors. The prototype $P$ that minimizes the term
$$E(P) = \sum_{i=1}^{l} \int_{\|\tilde{v}_i\|=1} \|(PP^{T} - I)\tilde{v}_i\|$$
where the integration is done over all the unit-length views $\tilde{v}_i$ of each model $M_i$, is composed of the $k$ eigenvectors of the matrix
$$F = \sum_{i=1}^{l} M_iM_i^{T}$$
that correspond to its $k$ largest eigenvalues.
Proof: Let $P$ be composed of the $k$ dominant eigenvectors of $F$. According to regression principles, $P$ minimizes the term
$$\sum_{i=1}^{l} \sum_{j=1}^{k} \|(PP^{T} - I)\tilde{m}_{ij}\|$$
where $\tilde{m}_{ij}$ is the $j$'th column vector of $M_i$. In other words, consider $\tilde{m}_{ij}$ as a point in $\mathbb{R}^n$. The space spanned by the column vectors of $P$ is the nearest $k$-dimensional hyperplane to these points, $\tilde{m}_{ij}$. The rest of this proof extends the claim from the discrete sum over the column vectors of $M_i$ to the continuous integral over all views spanned by these vectors. According to our assumptions, each matrix $M_i$ contains an orthonormal set of column vectors. Replacing these vectors by another orthonormal basis for $M_i$ will not change the matrix $P$; that is, $P$ is independent of the choice of orthonormal basis for the models. This is illustrated by the following derivation. To obtain a new orthonormal basis for the column space of $M_i$ we can apply a $k \times k$ rotation matrix $R$ to $M_i$ (namely, $M_iR$). $P$ is the best vector space for the new set as well, since
$$M_iR(M_iR)^{T} = M_iRR^{T}M_i^{T} = M_iIM_i^{T} = M_iM_i^{T}$$
Hence $F$ is identical for any choice of orthonormal basis vectors for $M_1, \ldots, M_l$, and so its dominant eigenvectors represent the best vector space for any orthonormal representation of the objects. Consequently, $P$ minimizes the objective function regardless of choice of basis for the models, and therefore it also minimizes the required term
$$E(P) = \sum_{i=1}^{l} \int_{\|\tilde{v}_i\|=1} \|(PP^{T} - I)\tilde{v}_i\|$$
□
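The basis-invariance step of the proof can be checked numerically. In this minimal sketch (NumPy assumed, random orthonormal models), each $M_i$ is replaced by $M_iR_i$ for a random orthogonal $R_i$ (a rotation, or possibly a reflection; either satisfies $R_iR_i^{T} = I$). The matrix $F$ is unchanged, and so is the dominant eigenspace that defines the prototype. The helper `dominant_projector` is hypothetical.

```python
import numpy as np

def dominant_projector(F, k):
    """Projector onto the span of the k dominant eigenvectors of symmetric F.
    Projectors are compared instead of eigenvectors, whose signs may flip."""
    _, U = np.linalg.eigh(F)            # eigenvalues in ascending order
    P = U[:, -k:]                       # eigenvectors of the k largest eigenvalues
    return P @ P.T

rng = np.random.default_rng(9)
n, k, l = 10, 4, 3
models = [np.linalg.qr(rng.standard_normal((n, k)))[0] for _ in range(l)]   # orthonormal M_i
changes = [np.linalg.qr(rng.standard_normal((k, k)))[0] for _ in range(l)]  # orthogonal R_i

F = sum(M @ M.T for M in models)
F_new = sum((M @ R) @ (M @ R).T for M, R in zip(models, changes))   # new bases M_i R_i
print(np.allclose(F, F_new))                                          # True
print(np.allclose(dominant_projector(F, k), dominant_projector(F_new, k)))
```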
To summarize, we showed that given a class of object models, the optimal prototype for this class is given by the dominant eigenvectors of the matrix $F$, which is constructed from the object models. Note that in proving Theorem 4 we showed that the prototype is independent of choice of basis for the models. This implies that, in order to construct the prototype, the object models $M_1, \ldots, M_l$ do not need to first be brought into alignment. The process above is guaranteed to output the same prototype object even if the models are not aligned.
5 Relevance to human vision
The recognition by prototypes scheme uses the general shape of objects as the cue for recognizing them. As was already mentioned, classes in our scheme contain objects with fairly similar shapes. In contrast, the human visual system recognizes objects using shape cues as well as many other cues, such as color, texture, motion, and context, and objects are categorized in their basic level of abstraction [33]. Only little is currently known about the underlying processes for recognition used by the visual system. From what is known, in spite of the differences pointed out above, the recognition by prototypes scheme seems to be consistent in several key issues with psychological and physiological findings. In this section we briefly review these findings.
The scheme presented in this paper promotes the notion that categorization and identification are performed using similar tools. In both cases view variations first are compensated for, and then a view of either the hypothesized prototype or object model is compared with the image. This is in contrast to methods (such as part decomposition and functional description) that in general handle either categorization or identification, but do not extend to deal with both problems. The available studies in this case are inconclusive. Some evidence seems to indicate that the two processes are handled separately by the visual system. Agnosic and prosopagnosic patients often demonstrate degraded identification abilities, whereas their performance in categorization remains intact. Double dissociation between the two processes, however, has not been found, and so the assumption that the two processes are handled separately in the brain has not been established. In fact, both cells that respond to general faces and cells that respond to specific faces were found lying side by side within the same
brain area, STS, of the macaque monkey [29]. The vulnerability of the identification process to brain lesions can be explained by the fact that the process requires a relatively large memory to encode the detailed shapes of objects, as well as the ability to recover a detailed description of the observed object from the image (see e.g., [19]).
Another idea proposed here is that categorization involves two stages: a stage of compensating for view variations followed by a stage of 2D comparison to account for shape differences. A decoupling of view variation and semantic categorization was suggested by Lissauer [24]. Warrington and Taylor [44, 45] found that patients that suffer from lesions in the posterior lobe of the right hemisphere demonstrate difficulties in categorizing objects from unconventional views, whereas their performance in categorization of objects from conventional views remains intact. Additional evidence for the effect of view variations on categorization performance was found for healthy subjects. Subjects that are asked to name objects respond more slowly when the objects appear in unconventional views [28]. Also, mental rotation effects, namely, response time that grows linearly with the tilt of the object, were observed in naming tasks of natural objects [21].
Finally, the process of categorization presented here is achieved by comparing the image to prototype objects, and these prototype objects can be constructed by manipulating the familiar objects of the class. Recent studies indicate that response time in naming tasks is typically shorter and error rates are lower when the observed object is similar to the prototype [5]. Similarly, shorter reaction time is obtained when subjects are asked to answer questions of the type "does the object X belong to the class Y?" [34]. Other studies reported that children learn good examples of classes before they learn poor ones [1, 32] and that subjects recall having seen the prototype or average configuration of studied face images even if this configuration was not studied [8].
To summarize, although the presented scheme generally does not recognize objects in their basic level of abstraction, it is consistent with psychological and physiological findings in several key issues, including a single approach for the two sub-problems of recognition, categorization and identification, view dependency of the two sub-processes, and the role of prototypes in categorization. The findings discussed here obviously are inconclusive, since psychological and physiological studies, including the ones discussed here, have more than one possible interpretation.
6 Implementation
To test the ideas presented in the paper, we have implemented the scheme and applied it to several objects. In our implementation, the library of models included two classes. The first (Figure 2) contained two four-legged chairs (denoted by A and B), and the second (Figure 3) included two car models, a VW and a Saab.
To demonstrate categorization, we used chair A as a prototype and matched it to an image of chair B. Correspondences between the prototype and the image were picked manually, and, using these correspondences, the prototype transform was recovered and applied to the prototype. The results of matching the transformed prototype with the image are seen in Figure 4. It can be seen that the transformed prototype (middle figure) appears in the same orientation as the observed object (left figure), and that the match between the two is good considering that the objects have different shapes. Note that in this implementation we allowed the objects to undergo general affine transformations in 3D, including stretch and shear, and so the match between the prototype and the image was better than if only rigid transformations were allowed. Additional examples using chair B and the two cars as the prototypes are shown in Figures 5-7.
In Figures 8-9 we tried to match the prototypes to the images with wrong correspondences. The results of these matches were significantly worse than when the correct matches were used. This is consistent with the idea discussed in Section 3.2 that the quality of the match can be used as the objective function for resolving the correct correspondence.
Figure 10 shows the results of matching a prototype four-legged chair to a single-legged office chair. It can be seen that the upper portions of the chairs match relatively well, while the legs of the chairs do not find appropriate matches.
Figure 11 shows the result of matching a prototype chair to an image of a Saab car. As an anecdotal example, we matched the hole below the back of the chair to the windshield of the car and the seat to the hood. In general, whatever correspondence is used, the two objects would match poorly relative to matching the prototypes to objects of their class.
Figures 12-13 demonstrate the identification stage. In the library we first aligned the model for chair A with the prototype chair (chair B) using the prototype-to-model transform. Then, an image of chair A was categorized (Figure 5) by matching it to the prototype chair, and the prototype transform was computed. In the next step, the prototype transform was applied to the specific model of chair A. The result of this application is seen in Figure 12. It can be seen that a near-perfect alignment was achieved in this process. A similar process was applied to the VW car in Figure 13 using the Saab car as the prototype. (The result of the corresponding categorization stage is shown in Figure 6.) These figures demonstrate that although a perfect match between the prototype and the image could not be obtained, the prototype transform can still be used to align the observed object with its specific model.
7 Summary
We introduced in this paper a recognition scheme that proceeds in two stages: categorization and identification. Categorization is achieved by aligning the image to prototype objects. For every prototype, the nearest prototype view is recovered, and the similarity between this view and the image is evaluated. The prototype that most resembles the observed object determines its class identity. Likewise, identification is achieved by aligning the observed object to the individual models of its class. At this stage the prototype transform computed in the categorization stage is reused to align the models with the image. The model that matches the observed object determines its specific identity. In addition, we presented an algorithm for constructing optimal prototypes and discussed the relevance of the scheme to human recognition.

Figure 2: [...] were reconstructed from single images using symmetry [31].
Figure 3: Pictures of two cars used as models. Left: a VW model. Right: a Saab model. Models for the two cars were borrowed from [42].
Figure 4: Matching a prototype chair (chair A) to an image of chair B. This figure, as well as the rest of the figures, contains three pictures. Left: the image to be recognized. Middle: the appearance of the prototype following the application of the prototype transform. Right: an overlay of the left and the middle pictures.
Figure 6: Matching a prototype car (Saab) to an image of a VW car.
Figure 7: Matching a prototype car (VW) to an image of a Saab car.
Figure 9: Matching a prototype car (Saab) to an image of a VW car with wrong correspondence.
Figure 10: Matching a four-legged chair to an image of an office chair.
Figure 12: Matching a model of chair A to an image of the same chair using the prototype transform computed in the categorization stage.
Figure 13: Matching a model of a VW car to an image of the same car using the prototype transform computed in the categorization stage.
An important issue conveyed by our scheme is that categorization can be used to facilitate the identification of objects. We showed that by first categorizing the object, the difficult stages of the alignment process, namely, the recovery of the object pose and the correspondence between the image and the model, can be performed only once per class. Consequently, identification is reduced in this scheme to a series of simple template comparisons.
The scheme presented in this paper differs from existing categorization schemes in two important aspects. The existing schemes (e.g., [4]) first attempt to recover the part structure (geons) of the object from the image alone. This structure is assumed to be almost invariant both to rotation of the object and across objects of the same class. In contrast, our scheme does not attempt to recover any 3D information from the image alone. Moreover, it separates the two effects that determine the object's appearance: view variation effects and deformations due to class variability. View variations are compensated for by recovering the view of the prototype that most resembles the image, and the amount of deformation that separates the prototype from the specific object is evaluated by assessing the difference (in 2D) between the nearest prototype view and the image.
Open problems for future research include solving the correspondence between prototypes and images, combining the scheme with existing indexing approaches, defining effective measures to evaluate the quality of matches, and extending the system to incorporate additional cues, such as color and texture.
Acknowledgement
I wish to thank Shimon Ullman for encouragement and advice, Tao Alter and Yael Moses for many fruitful discussions, Dror Bar Natan for his assistance in verifying the proof for Theorem 4, and Eric Grimson, John Harris, and Tomaso Poggio for comments on earlier drafts.
References
[1] Anglin, J., 1976. Les premiers termes de référence de l'enfant. In S. Ehrlich and E. Tulving (Eds.), La mémoire sémantique. Paris: Bulletin de Psychologie.
[2] Bajcsy, R. and Solina, F., 1987. Three dimensional object representation revisited. Proc. of 1st ICCV Conference, London, 231-240.
[3] Basri, R. and Ullman, S., 1988. The alignment of objects with smooth surfaces. Proc. of 2nd Int. Conf. on Computer Vision, Florida, 482-488.
[4] Biederman, I., 1985. Human image understanding: recent research and a theory. Computer Vision, Graphics, and Image Processing, 32, 29-73.
[5] Biederman, I., 1988. Aspects and extensions of a theory of human image understanding. In Z. Pylyshyn (Ed.), Computational Processes in Human Vision. NJ: Ablex, 370-428.
[6] Binford, T.O., 1971. Visual perception by computer. IEEE Conf. on Systems and Control.
[7] Brooks, R., 1981. Symbolic reasoning among 3-dimensional models and 2-dimensional images. Artificial Intelligence, 17, 285-349.
[8] Bruce, V., 1990. Perceiving and recognizing faces. Mind and Language, 5(4), 342-364.
[9] Chien, C.H. and Aggarwal, J.K., 1987. Shape recognition from single silhouette. Proc. of ICCV Conf., London, 481-490.
[10] Davis, L.S., 1979. Shape matching using relaxation techniques. IEEE Trans. on Pattern Analysis and Machine Intel., 1(1), 60-72.
[11] Duda, R.O. and Hart, P.E., 1973. Pattern classification and scene analysis. Wiley-Interscience Publication, John Wiley and Sons, Inc.
[12] Faugeras, O.D. and Hebert, M., 1986. The representation, recognition and location of 3D objects. Int. J. Robotics Research, 5(3), 27-52.
[13] Fischler, M.A. and Bolles, R.C., 1981. Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography. Com. of the A.C.M., 24(6), 381-395.
[14] Forsyth, D., Mundy, J.L., Zisserman, A., Coelho, C., Heller, A., and Rothwell, C., 1991. Invariant descriptors for 3-D object recognition and pose. IEEE Trans. on Pattern Analysis and Machine Intel., 13, 971-991.
[15] Grimson, W.E.L. and Lozano-Perez, T., 1984. Model-based recognition and localization from sparse data. Int. J. of Robotics Research, 3, 3-35.
[16] Ho, S., 1987. Representing and using functional definitions for visual recognition. Ph.D. Dissertation, University of Wisconsin, Madison.
[17] Hoffman, D. and Richards, W., 1984. Parts of recognition. Cognition, 18, 65-96.
[18] Huttenlocher, D.P. and Ullman, S., 1990. Recognizing solid objects by alignment with an image. Int. J. Computer Vision, 5(2), 195-212.
[19] Humphreys, G.W. and Riddoch, M.J., 1987. To see but not to see: a case study of visual agnosia. Lawrence Erlbaum Associates, London.
[20] Jacobs, D.W., 1992. Space efficient 3D model indexing. Proc. of Image Understanding Workshop, 717-725.
[21] Jolicoeur, P., 1985. The time to name disoriented natural objects. Memory and Cognition, 13(4), 289-303.
[22] Koenderink, J.J. and Van Doorn, A.J., 1982. The shape of smooth objects and the way contours end. Perception, 11, 129-137.
[23] Lamdan, Y., Schwartz, J.T., and Wolfson, H., 1987. On recognition of 3-D objects from 2-D images. Courant Inst. of Math. Sci., Rob. TR 122.