Learning Task-Relevant Features From Robot Data

Nikos Vlassis, Roland Bunschoten, Ben Kröse

RWCP, Autonomous Learning Functions SNN
Computer Science Institute
Faculty of Science
University of Amsterdam
The Netherlands

{vlassis,bunschot,krose}@science.uva.nl

http://www.science.uva.nl/research/ias

Abstract

Feature extraction from robot sensor data is a standard way to deal with the high dimensionality and redundancy of such data. An automatic, commonly used way to learn such features from a set of robot observations is Principal Component Analysis (PCA). However, as we argued in previous work, PCA can yield features with little discriminatory power between robot positions, leading to suboptimal localization performance of the robot. In order to get optimal task-relevant features, PCA must be replaced by a supervised projection method.

In this paper we extend our previously proposed supervised linear feature extraction method in two ways: (i) the projection matrix is optimized simultaneously over all columns under the constraint of orthonormality, (ii) a Jacobi parametrization of the matrix allows the use of unconstrained nonlinear optimization algorithms. The new algorithm is more efficient and many times faster than the old version. We show experimental results in extracting features from panoramic images of a mobile robot. The results compare favorably to the PCA solutions.

1 Introduction

In several mobile robot applications where a model of the environment must be built and used for navigation, appropriate landmarks or features must be extracted from the raw robot sensor measurements prior to modeling. The rationale is that normally the dimensionality of these data is very high, making any

The features that are extracted from robot sensor data can be classified as local or global. The former usually refer to location-dependent distinctive characteristics of the environment like doors, hallways, etc. (natural landmarks), or landmarks realized through specialized devices like beacons (artificial landmarks) [1]. On the other hand, a global feature is normally location-independent and aims at providing good robot localization on the average.

Recently there has been a growing interest in automatic procedures that learn such features from a set of data (see, e.g., [13]). Automatic learning of features is a natural objective because on the one hand it obviates the need for human interference in the feature extraction process, while on the other hand it makes the process (potentially) environment independent.

Learning features from a set of robot observations is most often carried out with statistical methods, and the easiest and most commonly used is Principal Component Analysis (PCA) [10]. This is a global feature extraction method which projects a set of robot observations linearly to a low-dimensional subspace, computed by solving a matrix eigenvalue problem. The nice thing about PCA is that it combines many optimality properties and is very simple to implement [10]. Recent reports on the use of PCA on mobile robots are [8, 2, 6, 11, 15, 5].

However, when the robot observations are collected in a 'supervised' manner, i.e., when they are annotated in the sample with the position of the robot where each observation was taken, then, as argued in [16], PCA can be suboptimal. The reason is that PCA is an unsupervised feature extraction method that uses only the observed sensor vectors to compute the projection

little discriminatory power between robot positions. If feature extraction is to be used for tasks like robot localization and navigation, then PCA should be substituted by a supervised projection method [16].

In the current paper we extend the results in [16] in two main ways. First, in the above work the projection directions were learned in a greedy fashion, namely, a projection to an optimal direction was computed, then a second optimal direction was sought which was orthogonal to the first, etc. This strategy can be suboptimal and it is not difficult to devise artificial data sets that show this suboptimal behavior. In this paper we optimize the projection matrix (see below) simultaneously for all dimensions while keeping its columns pairwise orthonormal.

Second, we adopt an optimization strategy which obviates the need for constrained nonlinear optimization by parametrizing the projection matrix as a product of Jacobi matrices satisfying the orthogonality constraint during optimization. These two improvements make the method more efficient and much faster than the original version.

In the following we first describe the proposed method and then show experimental results from its application in panoramic image data collected by a mobile robot in a typical indoor environment. The average localization performance (evaluated through an appropriate risk function) when using the proposed method vs. PCA, and the visualization of the projected data manifold in the reduced subspace permit a quantitative and qualitative verification of our theoretical claims.

2 Feature extraction and the localization risk

For clarity of exposition and visualization we will limit our analysis to a robot that follows a predefined one-dimensional trajectory in its workspace. The results extend directly to the general case. For each position (offset) $s$ of the robot on the trajectory we assume that the sensors provide an observation vector $x \in \mathbb{R}^d$. For our analysis we assume a supervised training set $\{s_i, x_i\}$, $1 \le i \le n$, of observations $x_i$ collected at respective trajectory positions $s_i$.

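Linear feature extraction amounts to reducing the dimensionality of the data $x_i$ by linearly projecting them to a subspace $\mathbb{R}^q$, $1 < q < d$, multiplying them by a $d \times q$ projection matrix $W$ with orthonormal columns:

$$ y_i = W^T x_i, \quad 1 \le i \le n, \qquad W^T W = I_q \tag{1} $$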

where $I_q$ stands for the $q$-dimensional identity matrix. Moreover, we assume a probabilistic model that associates robot locations with sensor observations. For an observation $x$ that is projected through (1) to a feature vector $y$ we assume a model for $p(s \mid y)$, the conditional density of the robot position $s$ given $y$.
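As a minimal illustration of the projection in (1), the following NumPy sketch builds a matrix with orthonormal columns and projects toy data. The sizes, the random data, and the use of a QR factorization to obtain $W$ are assumptions made for this example only; they are not part of the method.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, q = 50, 6, 2                  # hypothetical sample size and dimensions
X = rng.normal(size=(n, d))         # row i holds the observation x_i

# Any d x q matrix with orthonormal columns is admissible in (1);
# here one is obtained from the QR factorization of a random matrix.
W, _ = np.linalg.qr(rng.normal(size=(d, q)))

Y = X @ W                           # row i holds y_i = W^T x_i
```

The constraint $W^T W = I_q$ can be checked with `np.allclose(W.T @ W, np.eye(q))`.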

To assess the quality of an individual projection we must define an appropriate risk function that measures the average localization performance of the robot using the extracted features $y_i$. For this purpose the following risk function was proposed in [13]:

$$ R_L = \frac{1}{n} \sum_{i=1}^{n} \int |s - s_i| \, p(s \mid y_i) \, ds, \tag{2} $$

i.e., the average over the training set of the mean absolute distance to the true location $s_i$, conditioned on the feature vector $y_i$. This risk penalizes position estimates that appear on the average far from the true position of the robot. The above formula was approximated in [13] from the training set with complexity $O(n^3)$.

In [16] we proposed an alternative risk which is $O(n^2)$. This risk is based on the simple observation that, for a given observation $x_i$ which is projected through (1) to $y_i$, the density $p(s \mid y_i)$ will always exhibit a mode on $s = s_i$. Thus, an approximate measure of divergence from this mode is the Kullback-Leibler distance between $p(s \mid y_i)$ and a unimodal density sharply peaked at $s = s_i$, giving the approximate estimate $-\log p(s_i \mid y_i)$ plus a constant. Averaging over all points $y_i$ we have to minimize the risk

$$ R_K = -\frac{1}{n} \sum_{i=1}^{n} \log p(s_i \mid y_i) \tag{3} $$

which can be regarded as the average negative log-likelihood of the data given the model of $p(s_i \mid y_i)$ and the projection matrix $W$.
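The step from the Kullback-Leibler distance to the negative log-density can be sketched as follows. We assume here, as one possible reading, that the divergence is taken from a density $\delta_i(s)$ sharply peaked at $s_i$ to the model density $p(s \mid y_i)$; the symbol $\delta_i$ is introduced only for this sketch.

$$ \mathrm{KL}(\delta_i \,\|\, p) = \int \delta_i(s) \log \delta_i(s) \, ds - \int \delta_i(s) \log p(s \mid y_i) \, ds \approx \mathrm{const} - \log p(s_i \mid y_i) $$

The first integral depends only on the shape of the peaked density and not on $W$, so it acts as a constant during optimization. The second integral concentrates on $s = s_i$ as the peak sharpens, which leaves $-\log p(s_i \mid y_i)$.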

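From (3) we see that a nonparametric estimate of $p(s \mid y)$ is needed. For an appropriate sequence of weights $\lambda_j(y)$, $1 \le j \le n$, such an estimate is [12]

$$ p(s \mid y) = \sum_{j=1}^{n} \lambda_j(y) \, \phi_{h_s}(s - s_j) \tag{4} $$

where

$$ \phi_{h_s}(s) = \frac{1}{\sqrt{2\pi} \, h_s} \exp\left( -\frac{s^2}{2 h_s^2} \right) \tag{5} $$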

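is the univariate Gaussian kernel with bandwidth $h_s$ defining a local smoothing region around $s$. A weight function $\lambda_j(y)$ which satisfies the conditions in [12] and makes the above estimate a smooth function of the projection matrix $W$ is

$$ \lambda_j(y) = \frac{\phi_{h_y}(y - y_j)}{\sum_{k=1}^{n} \phi_{h_y}(y - y_k)} \tag{6} $$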

where

$$ \phi_{h_y}(y) = \frac{1}{(2\pi)^{q/2} \, h_y^q} \exp\left( -\frac{\|y\|^2}{2 h_y^2} \right) \tag{7} $$

is the $q$-dimensional spherical Gaussian kernel with bandwidth $h_y$. The two kernel bandwidths $h_y$ and $h_s$ are the only free parameters of the model $p(s \mid y)$ and their values affect the resulting projections. Substituting $p(s \mid y)$ from above into (3) we get a risk with complexity $O(n^2)$.
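The risk obtained by substituting (4)-(7) into (3) can be sketched in a few lines of NumPy. The function name `risk_rk`, the toy data, and the bandwidth values below are assumptions chosen for illustration. The sum over $j$ includes the term $j = i$, as in (4) as written.

```python
import numpy as np

def risk_rk(Y, s, h_y, h_s):
    """R_K = -(1/n) sum_i log p(s_i | y_i), with p from (4)-(7). Cost is O(n^2)."""
    q = Y.shape[1]
    # Pairwise squared feature distances and pairwise position differences.
    d2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    ds = s[:, None] - s[None, :]
    phi_y = np.exp(-d2 / (2 * h_y**2)) / ((2 * np.pi) ** (q / 2) * h_y**q)  # (7)
    phi_s = np.exp(-ds**2 / (2 * h_s**2)) / (np.sqrt(2 * np.pi) * h_s)      # (5)
    lam = phi_y / phi_y.sum(axis=1, keepdims=True)   # lam[i, j] = lambda_j(y_i), (6)
    p = np.sum(lam * phi_s, axis=1)                  # p(s_i | y_i), (4)
    return -np.mean(np.log(p))                       # (3)

# Toy comparison: a feature that determines s versus a feature unrelated to s.
rng = np.random.default_rng(0)
n = 100
s = np.linspace(-1.7, 1.7, n)            # toy positions, roughly unit variance
Y_informative = s[:, None]
Y_noise = rng.normal(size=(n, 1))
h_y, h_s = 0.3, 0.4                      # arbitrary demo bandwidths
risk_informative = risk_rk(Y_informative, s, h_y, h_s)
risk_noise = risk_rk(Y_noise, s, h_y, h_s)
```

On this toy data the informative feature gives the lower risk, which is the behavior the risk is designed to reward.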

3 Model selection and optimization

3.1 Kernel smoothing

Using a nonparametric estimate of a density via (4) and (5)-(7) requires a choice for the smoothing parameters $h_s$ and $h_y$. Our approach was to assign constant values to these two bandwidths during optimization. For projections to 2-d we set $h_y = n^{-2/7}$, which can be kept fixed during optimization after sphering the data (see next). This value is within the optimal bounds $O(n^{-1/3})$ and $O(n^{-1/4})$ given in [4, Sec. 4] for the related problem of projection pursuit regression, while it was found to give good results in practice. For the $s$-bandwidth we chose the Gaussian MISE optimal value $h_s = (3n/4)^{-1/5}$ [17, Ch. 3.2].
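The two bandwidth rules are simple enough to evaluate directly. The sketch below uses $n = 104$, the training-set size reported in the experiments, and assumes sphered features and unit-variance positions as described in the next subsection.

```python
n = 104                          # number of training images in the experiments

h_y = n ** (-2 / 7)              # feature-space bandwidth for 2-d projections
h_s = (3 * n / 4) ** (-1 / 5)    # position bandwidth (Gaussian MISE rule, unit variance)

# The exponent -2/7 lies between -1/3 and -1/4, so h_y falls between the two bounds.
lower, upper = n ** (-1 / 3), n ** (-1 / 4)
```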

3.2 Sphering

A sphering of the data $x_i$, namely, a normalization to zero mean and identity covariance matrix, makes the kernel bandwidth $h_y$ independent of the projection. Then $h_y$ can be kept constant during optimization, leading to considerable computational savings. Sphering means a rotation of the data to their PCA directions and then standardization of the individual variances to one. To avoid modeling noise in the data, it is typical to ignore directions with small eigenvalues, and a heuristic method to do this is by putting a threshold to the ratio of the cumulative variance (added eigenvalues) to the total variance.
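The cumulative-variance heuristic can be sketched as follows. The eigenvalues and the threshold are hypothetical values chosen so that the arithmetic is easy to follow.

```python
import numpy as np

eigvals = np.array([5.0, 2.0, 1.5, 1.0, 0.5])   # hypothetical, sorted in decreasing order
threshold = 0.6                                  # hypothetical fraction of variance to keep

ratio = np.cumsum(eigvals) / eigvals.sum()       # cumulative variance over total variance
# Smallest number of leading directions whose ratio reaches the threshold.
k = int(np.searchsorted(ratio, threshold) + 1)
```

Here the cumulative ratios are 0.5, 0.7, 0.85, 0.95, 1.0, so the first two directions are kept.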

One way to sphere the data is by singular value decomposition [9]. Let $X$ be the $n \times d$ matrix whose rows are the data $x_i$ after they have been normalized to zero mean. For $n > d$, we compute the singular value decomposition $X = U L V^T$ of the matrix $X$ and form the matrix $A = \sqrt{n} \, V L^{-1}$. The points $XA$ are then sphered [10].
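A short NumPy sketch of this sphering step for $n > d$ follows. The toy data and variable names are assumptions. Since $XA = \sqrt{n}\,U$, the sample covariance of the result is the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))   # correlated toy data
X = X - X.mean(axis=0)                                   # normalize to zero mean

# Thin SVD: X = U diag(sing) Vt, with singular values in decreasing order.
U, sing, Vt = np.linalg.svd(X, full_matrices=False)
A = np.sqrt(n) * Vt.T / sing        # A = sqrt(n) V L^{-1}; column j is scaled by 1/sing[j]
Z = X @ A                           # sphered points
cov = Z.T @ Z / n                   # sample covariance of the sphered data
```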

For $n \le d$ the data $x_i$ lie in general in an $(n-1)$-dimensional Euclidean subspace of $\mathbb{R}^d$. In this case it is more convenient to compute the principal directions through eigenanalysis of $K = X X^T$, the inner products matrix of the zero mean data. We compute its singular value decomposition $K = U L V^T$ and remove the last column of $V$ and the last column and row of $L$ (the last eigenvalue of $K$ will always be zero). Then we form the matrix $A = \sqrt{n} \, V L^{-1}$. The points $KA$ are $(n-1)$-dimensional and sphered [7].
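The same idea for $n \le d$ can be sketched with the inner products matrix. The toy sizes below are assumptions; the point is that only an $n \times n$ matrix is decomposed even though $d$ is large.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 20, 500                      # fewer samples than dimensions
X = rng.normal(size=(n, d))
X = X - X.mean(axis=0)              # normalize to zero mean

K = X @ X.T                         # n x n inner products matrix
U, L, Vt = np.linalg.svd(K)         # singular values in decreasing order
V, L = Vt.T[:, :-1], L[:-1]         # drop the direction with zero eigenvalue
A = np.sqrt(n) * V / L              # A = sqrt(n) V L^{-1}
Z = K @ A                           # n points in (n-1) dimensions
cov = Z.T @ Z / n
```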

Moreover, all projections of sphered data $x_i$ in the form of (1) also give sphered data $y_i$ because

$$ E[y y^T] = W^T E[x x^T] W = I_q \tag{8} $$

due to the constraint of orthonormal columns of $W$. This frees us from having to reestimate (co)variances of the projected data in each step of the optimization algorithm. In the following we assume that the data $x_i$ have already been sphered and the position data $s_i$ have been normalized to zero mean and unit variance.

3.3 Optimization

The smooth form of the risk $R_K$ as a function of $W$ allows the minimization of the former with nonlinear optimization. For constrained optimization we must compute the gradient of $R_K$ and the gradient of the constraint function $W^T W - I_q$ with respect to $W$, and then plug these estimates in a constrained nonlinear optimization routine to optimize with respect to $R_K$ [3].

An alternative approach which avoids the use of constrained nonlinear optimization, in a similar problem using kernel smoothing for discriminant analysis, has been recently proposed in [14]. The idea is to parametrize the projection matrix $W$ by a product of Jacobi rotation matrices [9] and then optimize with respect to the angle parameters involved in each matrix. For projections from $\mathbb{R}^d$ to $\mathbb{R}^q$ this parametrization takes the form

$$ W = \prod_{o=1}^{q} \prod_{u=q+1}^{d} G_{ou} \tag{9} $$

where $G_{ou}$ is a Jacobi rotation matrix which equals $I$ except for the elements $g_{oo} = \cos\theta_{ou}$, $g_{ou} = \sin\theta_{ou}$, $g_{uo} = -\sin\theta_{ou}$, and $g_{uu} = \cos\theta_{ou}$ for an angle $\theta_{ou}$ which depends on $o$ and $u$. For simplicity we let in the above notation $g_{oo}$, $g_{ou}$, etc., denote the $(o,o)$-th, $(o,u)$-th, etc., elements of the matrix $G_{ou}$, respectively. To ensure that $W$ is $d \times q$, only the first $q$ columns of the last matrix $G_{qd}$ in (9) are retained, while multiplications must be carried from right to left to reduce the evaluation cost.

Figure 1: The robot trajectory, with position A marked.

Multiplication with a matrix $G_{ou}$ causes a rotation by $\theta_{ou}$ along the plane defined by the dimensions $o$ and $u$, while the range of indices in (9) ensures that all rotations take place along planes defined by at least one non-projective direction, i.e., one among the $d - q$ remaining dimensions. This fact also reduces the total number of parameters from $qd$ in the constrained optimization case (elements of matrix $W$) to $q(d - q)$ here (angles $\theta_{ou}$).

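The derivative of the risk $R_K$ with respect to an angle $\theta_{kl}$ is (we skip an analytical derivation here)

$$ \partial_{\theta_{kl}} R_K = \operatorname{trace}\left[ (\nabla_W R_K)^T \, \partial_{\theta_{kl}} W \right] \tag{10} $$

where the first term in the trace is

$$ \nabla_W R_K = \frac{1}{n h_y^2} X^T \left[ B + B^T - \operatorname{diag}(\mathbf{1}^T B) \right] X W \tag{11} $$

where $X$ is the $n \times d$ matrix of the sphered data, $\mathbf{1}$ is a column vector of all ones, $\operatorname{diag}(\cdot)$ transforms a vector to a diagonal matrix, and $B$ is the $n \times n$ matrix with elements

$$ b_{ij} = \lambda_j(y_i) - \frac{\phi_{h_y}(y_i - y_j) \, \phi_{h_s}(s_i - s_j)}{\sum_{k=1}^{n} \phi_{h_y}(y_i - y_k) \, \phi_{h_s}(s_i - s_k)}. \tag{12} $$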

Figure 2: Panoramic snapshot from position A.

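The second term of the trace is

$$ \partial_{\theta_{kl}} W = \prod_{o=1}^{q} \prod_{u=q+1}^{d} \partial_{\theta_{kl}} G_{ou} \tag{13} $$

where

$$ \partial_{\theta_{kl}} G_{ou} = \begin{cases} G'_{ou} & \text{if } k = o \text{ and } l = u \\ G_{ou} & \text{otherwise} \end{cases} \tag{14} $$

and $G'_{ou}$ is the matrix $G_{ou}$ with the ones substituted by zeros and the trigonometric functions substituted by their derivatives.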
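The gradient in (10)-(14) can be sketched and checked against central finite differences as follows. All function names, the toy data, and the bandwidths are assumptions for this example, with 0-based indices in the code. The normalizing constant of (7) is omitted because it cancels in (6) and (12).

```python
import numpy as np

def givens(d, o, u, theta, deriv=False):
    """G_ou of (9); with deriv=True, the matrix G'_ou described after (14)."""
    c, s = np.cos(theta), np.sin(theta)
    G = np.eye(d)
    if deriv:
        G = np.zeros((d, d))      # ones replaced by zeros
        c, s = -s, c              # cos -> -sin, sin -> cos
    G[o, o], G[o, u], G[u, o], G[u, u] = c, s, -s, c
    return G

def product(thetas, d, skip=None):
    """First q columns of the product (9); factor `skip` is replaced by its derivative."""
    q = thetas.shape[0]
    M = np.eye(d)[:, :q]
    for o in reversed(range(q)):
        for u in reversed(range(q, d)):
            M = givens(d, o, u, thetas[o, u - q], deriv=(skip == (o, u))) @ M
    return M

def kernels(Y, s, h_y, h_s):
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    phi_y = np.exp(-d2 / (2 * h_y**2))
    phi_s = np.exp(-(s[:, None] - s[None, :]) ** 2 / (2 * h_s**2)) / (np.sqrt(2 * np.pi) * h_s)
    return phi_y, phi_s

def risk(thetas, X, s, h_y, h_s):
    phi_y, phi_s = kernels(X @ product(thetas, X.shape[1]), s, h_y, h_s)
    return -np.mean(np.log((phi_y * phi_s).sum(1) / phi_y.sum(1)))   # (3)

def grad(thetas, X, s, h_y, h_s):
    n, d = X.shape
    q = thetas.shape[0]
    W = product(thetas, d)
    phi_y, phi_s = kernels(X @ W, s, h_y, h_s)
    joint = phi_y * phi_s
    B = phi_y / phi_y.sum(1, keepdims=True) - joint / joint.sum(1, keepdims=True)  # (12)
    grad_W = X.T @ (B + B.T - np.diag(B.sum(0))) @ X @ W / (n * h_y**2)            # (11)
    g = np.zeros_like(thetas)
    for k in range(q):
        for l in range(q, d):
            dW = product(thetas, d, skip=(k, l))        # (13)-(14)
            g[k, l - q] = np.trace(grad_W.T @ dW)       # (10)
    return g

rng = np.random.default_rng(0)
n, d, q = 30, 4, 2
X = rng.normal(size=(n, d))
s = X[:, 2] + 0.1 * rng.normal(size=n)                  # toy positions
thetas = rng.uniform(-np.pi / 2, np.pi / 2, size=(q, d - q))
h_y, h_s, eps = 0.5, 0.5, 1e-6

analytic = grad(thetas, X, s, h_y, h_s)
numeric = np.zeros_like(thetas)
for idx in np.ndindex(*thetas.shape):
    step = np.zeros_like(thetas)
    step[idx] = eps
    numeric[idx] = (risk(thetas + step, X, s, h_y, h_s)
                    - risk(thetas - step, X, s, h_y, h_s)) / (2 * eps)
```

Agreement between `analytic` and `numeric` is a useful sanity check on the signs in (11) and (12).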

A point we should note is that the mixture density form of (4) and the additional trigonometric functions in (9) can make the landscape of the risk $R_K$ have numerous local minima. For this reason, combining a gradient-free optimization method like, e.g., Nelder-Mead [9], with gradient-based nonlinear optimization is requisite. Also, an appropriate dimension reduction through sphering prior to optimization can significantly facilitate the search. In any case, the optimization algorithm must be applied many times and the solution with the minimum risk must be retained.
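One possible way to organize such a multi-start search is sketched below with SciPy's `scipy.optimize.minimize`. The pairing of a short Nelder-Mead run followed by BFGS from its result, the number of restarts, the iteration cap, the toy data, and the bandwidths are all assumptions for illustration. BFGS here uses SciPy's default finite-difference gradient rather than the analytic one.

```python
import numpy as np
from scipy.optimize import minimize

def build_w(flat_thetas, d, q):
    """d x q matrix from q(d - q) Jacobi angles, multiplied right to left."""
    thetas = flat_thetas.reshape(q, d - q)
    M = np.eye(d)[:, :q]
    for o in reversed(range(q)):
        for u in reversed(range(q, d)):
            c, s = np.cos(thetas[o, u - q]), np.sin(thetas[o, u - q])
            G = np.eye(d)
            G[o, o], G[o, u], G[u, o], G[u, u] = c, s, -s, c
            M = G @ M
    return M

def risk(flat_thetas, X, s, q, h_y, h_s):
    """Kernel-smoothed negative log-likelihood risk as a function of the angles."""
    Y = X @ build_w(flat_thetas, X.shape[1], q)
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    phi_y = np.exp(-d2 / (2 * h_y**2))
    phi_s = np.exp(-(s[:, None] - s[None, :]) ** 2 / (2 * h_s**2)) / (np.sqrt(2 * np.pi) * h_s)
    return -np.mean(np.log((phi_y * phi_s).sum(1) / phi_y.sum(1)))

rng = np.random.default_rng(0)
n, d, q = 40, 4, 2
X = rng.normal(size=(n, d))
s = X[:, 3] + 0.1 * rng.normal(size=n)       # toy positions tied to the last axis
args = (X, s, q, 0.5, 0.5)

best, start_risks = None, []
for _ in range(5):                            # several random restarts
    x0 = rng.uniform(-np.pi / 2, np.pi / 2, size=q * (d - q))
    start_risks.append(risk(x0, *args))
    coarse = minimize(risk, x0, args=args, method="Nelder-Mead", options={"maxiter": 200})
    fine = minimize(risk, coarse.x, args=args, method="BFGS")
    if best is None or fine.fun < best.fun:   # retain the solution with minimum risk
        best = fine

W = build_w(best.x, d, q)
```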

4 Experiments

We applied the above algorithm to data collected by a Nomad Scout robot following a predefined trajectory in our mobile robot lab and the adjoining hall as shown in Fig. 1. The omnidirectional imaging device which is mounted on top of the robot consists of a vertically mounted standard camera aimed upward looking into a spherical mirror. The data set contains 104 omnidirectional images (320 × 240 pixels) captured every 25 centimeters along the robot path. Each image is transformed to a panoramic image (64 × 256) and this set of 104 panoramic images constitutes the training set of our algorithm. A typical panoramic image shot at the position A of the trajectory is shown in Fig. 2.

In order to apply our supervised projection method, we first sphered the panoramic image data using the inner products matrix as explained above and kept the first 10 dimensions explaining about 60% of the total variance. Then we applied our method projecting the sphered data points from 10-d to 2-d. The resulting two-dimensional points are shown on the left part of Fig. 3. For optimization we ran several times a combined search using the Nelder-Mead algorithm with random initial values for the Jacobi angles in $[-\pi/2, \pi/2]$, together with nonlinear optimization with the BFGS algorithm [3, 9]. Running only BFGS required many more runs with random initial guesses to reach the global minimum, leading to comparable total expenses. Each execution of the optimization algorithm took a couple of seconds on a Sparc Ultra 5 machine.

[Figure 3 plots: two 2-d panels with 'start' and 'end' markers; panel titles "Proposed method: R= 85" (left) and "PCA: R= 72" (right).]

Figure 3: Projection of the sphered panoramic image data from 10-d to 2-d: using the proposed method (left), projection on the first two principal components (right). The 'start' and 'end' points are the projections of the panoramic images captured by the robot at the beginning and end, respectively, of its trajectory.

On the right part of Fig. 3 we show the result of projecting the sphered 10-d points on the first two principal components of the data. We clearly see the advantage of the proposed method over PCA. The risk is smaller, while from the shape of the projected manifold we see that taking into account the pose information during projection can significantly improve the resulting features: there are fewer self-intersections of the projected manifold in our method than in PCA which, in turn, means better robot position estimation on the average.

Finally, in Fig. 4 we show the first two feature vectors (points in the original space of panoramic images) learned by our method and by PCA. In the PCA case these are the familiar first two eigenimages of the panoramic data which, as is normally observed in typical data sets, exhibit low spatial frequencies. We see that the proposed supervised projection method yields very different feature vectors than PCA, namely, images with higher spatial frequencies and distinct characteristics.

5 Conclusions

We proposed a method for learning task-relevant linear features from high-dimensional robot observations. Our method is supervised in the sense that the position of the robot in the sample is also taken into account during optimization. This makes the method superior to PCA, which is unsupervised. We showed results of linear feature extraction from panoramic robot data when the robot was moving in a typical office environment. The results show clearly the superiority of the proposed method over PCA.

Our method can be useful in various robotic settings and is not limited to mobile robots. In particular, it can be used in any case where global feature extraction from supervised robot observations is in order. The extension of the method to handle nonlinear features is possible (e.g., by using a neural network) but then additional issues have to be addressed (complexity of the network, overfitting, etc.). Besides, the wide use of PCA in robotic problems shows that linear feature

[Figure 4 panel labels: "2nd optimal feature vector" (left), "2nd eigenvector" (right).]

Figure 4: The first two feature vectors using our method (left), and PCA (right).

References

[1] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Techniques. A. K. Peters, Ltd, Wellesley, MA, 1996.

[2] J. L. Crowley, F. Wallner, and B. Schiele. Position estimation using principal components of range data. In Proc. IEEE Int. Conf. on Robotics and Automation, Leuven, Belgium, May 1998.

[3] P. E. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, London, 1981.

[4] P. Hall. On projection pursuit regression. Ann. Statist., 17(2):573-588, 1989.

[5] M. Jogan and A. Leonardis. Robust localization using the eigenspace of spinning-images. In Proc. IEEE Workshop on Omnidirectional Vision, South Carolina, June 2000.

[6] B. Kröse and R. Bunschoten. Probabilistic localization by appearance models and active vision. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 2255-2260, Detroit, Michigan, May 1999.

[7] V. Kumar and H. Murakami. Efficient calculation of primary images from a set of images. IEEE Trans. Pattern Analysis and Machine Intelligence, 4(5):511-515, 1982.

[8] S. K. Nayar, H. Murase, and S. A. Nene. Learning, positioning, and tracking visual appearance. In Proc. IEEE Int. Conf. on Robotics and Automation.

[9] W. H. Press, S. A. Teukolsky, B. P. Flannery, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.

[10] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, U.K., 1996.

[11] R. Sim and G. Dudek. Learning visual landmarks for pose estimation. In Proc. IEEE Int. Conf. on Robotics and Automation, Detroit, Michigan, May 1999.

[12] C. J. Stone. Consistent nonparametric regression (with discussion). Ann. Statist., 5:595-645, 1977.

[13] S. Thrun. Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1), 1998.

[14] K. Torkkola and W. Campbell. Mutual information in learning feature transformations. In Proc. Int. Conf. on Machine Learning, Stanford, CA, June 2000.

[15] N. Vlassis and B. Kröse. Robot environment modeling via principal component regression. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 677-682, Kyongju, Korea, Oct. 1999.

[16] N. Vlassis, Y. Motomura, and B. Kröse. Supervised linear feature extraction for mobile robot localization. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 2979-2984, San Francisco, CA, Apr. 2000.

[17] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995.