Learning task-relevant features from robot data
Nikos Vlassis   Roland Bunschoten   Ben Kröse
RWCP, Autonomous Learning Functions SNN
Computer Science Institute
Faculty of Science
University of Amsterdam
The Netherlands
{vlassis,bunschot,krose}@science.uva.nl
http://www.science.uva.nl/research/ias
Abstract
Feature extraction from robot sensor data is a standard way to deal with the high dimensionality and redundancy of such data. An automatic, commonly used way to learn such features from a set of robot observations is Principal Component Analysis (PCA). However, as we argued in previous work, PCA can yield features with little discriminatory power between robot positions, leading to suboptimal localization performance of the robot. In order to get optimal task-relevant features, PCA must be replaced by a supervised projection method.
In this paper we extend our previously proposed supervised linear feature extraction method in two ways: (i) the projection matrix is optimized simultaneously over all columns under the constraint of orthonormality, (ii) a Jacobi parametrization of the matrix allows the use of unconstrained nonlinear optimization algorithms. The new algorithm is more efficient and many times faster than the old version. We show experimental results in extracting features from panoramic images of a mobile robot. The results compare favorably to the PCA solutions.
1 Introduction
In several mobile robot applications where a model of the environment must be built and used for navigation, appropriate landmarks or features must be extracted from the raw robot sensor measurements prior to modeling. The rationale is that normally the dimensionality of these data is very high, making any
The features that are extracted from robot sensor data can be classified as local or global. The former usually refer to location-dependent distinctive characteristics of the environment like doors, hallways, etc. (natural landmarks), or landmarks realized through specialized devices like beacons (artificial landmarks) [1]. On the other hand, a global feature is normally location-independent and aims at providing good robot localization on the average.
Recently there has been a growing interest in automatic procedures that learn such features from a set of data (see, e.g., [13]). Automatic learning of features is a natural objective because on the one hand it obviates the need for human interference in the feature extraction process, while on the other hand it makes the process (potentially) environment independent.
Learning features from a set of robot observations is most often carried out with statistical methods, and the easiest and most commonly used is Principal Component Analysis (PCA) [10]. This is a global feature extraction method which projects a set of robot observations linearly to a low-dimensional subspace, computed by solving a matrix eigenvalue problem. The nice thing about PCA is that it combines many optimality properties and is very simple to implement [10]. Recent reports on the use of PCA on mobile robots are [8, 2, 6, 11, 15, 5].
However, when the robot observations are collected in a 'supervised' manner, i.e., when they are annotated in the sample with the position of the robot where each observation was taken, then, as argued in [16], PCA can be suboptimal. The reason is that PCA is an unsupervised feature extraction method that uses only the observed sensor vectors to compute the projection matrix, and can therefore yield features with little discriminatory power between robot positions. If feature extraction is to be used for tasks like robot localization and navigation, then PCA should be substituted by a supervised projection method [16].
In the current paper we extend the results in [16] in two main ways. First, in the above work the projection directions were learned in a greedy fashion, namely, a projection to an optimal direction was computed, then a second optimal direction was sought which was orthogonal to the first, etc. This strategy can be suboptimal and it is not difficult to devise artificial data sets that show this suboptimal behavior. In this paper we optimize the projection matrix (see below) simultaneously for all dimensions while keeping its columns pairwise orthonormal.
Second, we adopt an optimization strategy which obviates the need for constrained nonlinear optimization by parametrizing the projection matrix as a product of Jacobi matrices satisfying the orthogonality constraint during optimization. These two improvements make the method more efficient and much faster than the original version.
In the following we first describe the proposed method and then show experimental results from its application in panoramic image data collected by a mobile robot in a typical indoor environment. The average localization performance (evaluated through an appropriate risk function) when using the proposed method vs. PCA, and the visualization of the projected data manifold in the reduced subspace, permit a quantitative and qualitative verification of our theoretical claims.
2 Feature extraction and the localization risk
For clarity of exposition and visualization we will limit our analysis to a robot that follows a predefined one-dimensional trajectory in its workspace. The results extend directly to the general case. For each position (offset) $s$ of the robot on the trajectory we assume that the sensors provide an observation vector $x \in \mathbb{R}^d$. For our analysis we assume a supervised training set $\{s_i, x_i\}$, $1 \le i \le n$, of observations $x_i$ collected at respective trajectory positions $s_i$.
Linear feature extraction amounts to reducing the dimensionality of the data $x_i$ by linearly projecting them to a subspace $\mathbb{R}^q$, $1 < q < d$, multiplying them with the transpose of a $d \times q$ matrix $W$ with orthonormal columns:
$$y_i = W^T x_i, \quad 1 \le i \le n, \quad W^T W = I_q \tag{1}$$
where $I_q$ stands for the $q$-dimensional identity matrix.
Moreover, we assume a probabilistic model that associates robot locations with sensor observations. For an observation $x$ that is projected through (1) to a feature vector $y$ we assume a model for $p(s|y)$, the conditional density of the robot position $s$ given $y$.
To assess the quality of an individual projection we must define an appropriate risk function that measures the average localization performance of the robot using the extracted features $y_i$. For this purpose the following risk function was proposed in [13]:
$$R_L = \frac{1}{n} \sum_{i=1}^{n} \int |s - s_i| \, p(s|y_i) \, ds, \tag{2}$$
i.e., the average over the training set of the mean absolute distance to the true location $s_i$, conditioned on the feature vector $y_i$. This risk penalizes position estimates that appear on the average far from the true position of the robot. The above formula was approximated in [13] from the training set with complexity $O(n^3)$.
In [16] we proposed an alternative risk which is $O(n^2)$. This risk is based on the simple observation that, for a given observation $x_i$ which is projected through (1) to $y_i$, the density $p(s|y_i)$ will always exhibit a mode on $s = s_i$. Thus, an approximate measure of divergence from this mode is the Kullback-Leibler distance between $p(s|y_i)$ and a unimodal density sharply peaked at $s = s_i$, giving the approximate estimate $-\log p(s_i|y_i)$ plus a constant. Averaging over all points $y_i$ we have to minimize the risk
$$R_K = -\frac{1}{n} \sum_{i=1}^{n} \log p(s_i|y_i) \tag{3}$$
which can be regarded as the average negative log-likelihood of the data given the model of $p(s_i|y_i)$ and the projection matrix $W$.
From (3) we see that a nonparametric estimate of $p(s|y)$ is needed. For an appropriate sequence of weights $\lambda_j(y)$, $1 \le j \le n$, such an estimate is [12]
$$p(s|y) = \sum_{j=1}^{n} \lambda_j(y) \, \phi_{h_s}(s - s_j) \tag{4}$$
where
$$\phi_{h_s}(s) = \frac{1}{\sqrt{2\pi}\, h_s} \exp\left(-\frac{s^2}{2 h_s^2}\right) \tag{5}$$
is the univariate Gaussian kernel with bandwidth $h_s$, defining a local smoothing region around $s$. A weight function $\lambda_j(y)$ which satisfies the conditions in [12] and makes the above estimate a smooth function of the projection matrix $W$ is
$$\lambda_j(y) = \frac{\phi_{h_y}(y - y_j)}{\sum_{k=1}^{n} \phi_{h_y}(y - y_k)} \tag{6}$$
where
$$\phi_{h_y}(y) = \frac{1}{(2\pi)^{q/2} h_y^q} \exp\left(-\frac{\|y\|^2}{2 h_y^2}\right) \tag{7}$$
is the $q$-dimensional spherical Gaussian kernel with bandwidth $h_y$. The two kernel bandwidths $h_y$ and $h_s$ are the only free parameters of the model $p(s|y)$ and their values affect the resulting projections. Substituting $p(s|y)$ from above into (3) we get a risk with complexity $O(n^2)$.
3 Model selection and optimization
3.1 Kernel smoothing
Using a nonparametric estimate of a density through (4) and (5)–(7) requires a choice for the smoothing parameters $h_s$ and $h_y$. Our approach was to assign constant values to these two bandwidths during optimization. For projections to 2-d we set $h_y = n^{-2/7}$, which can be kept fixed during optimization after sphering the data (see next). This value is within the optimal bounds $O(n^{-1/3})$ and $O(n^{-1/4})$ given in [4, Sec. 4] for the related problem of projection pursuit regression, while it was found to give good results in practice. For the $s$-bandwidth we chose the Gaussian MISE optimal value $h_s = (3n/4)^{-1/5}$ [17, Ch. 3.2].
3.2 Sphering
A sphering of the data $x_i$, namely, a normalization to zero mean and identity covariance matrix, makes the kernel bandwidth $h_y$ independent of the projection. Then $h_y$ can be kept constant during optimization leading to considerable computational savings. Sphering means a rotation of the data to their PCA directions and then standardization of the individual variances to one. To avoid modeling noise in the data, it is typical to ignore directions with small eigenvalues, and a heuristic method to do this is by putting a threshold to the ratio of the cumulative variance (added eigenvalues) to the total variance.
One way to sphere the data is by singular value decomposition [9]. Let $X$ be the $n \times d$ matrix whose rows are the data $x_i$ after they have been normalized to zero mean. For $n > d$, we compute the singular value decomposition $X = U L V^T$ of the matrix $X$ and form the matrix $A = \sqrt{n}\, V L^{-1}$. The points $XA$ are then sphered [10].
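The SVD-based sphering for $n > d$ can be sketched in NumPy as follows. The function name `sphere_svd` and its `var_ratio` parameter (the threshold on the cumulative-variance ratio) are assumptions for this example rather than part of the original method description.

```python
import numpy as np

def sphere_svd(X_raw, var_ratio=1.0):
    """Sphere n x d data (n > d): zero mean, identity covariance."""
    X = X_raw - X_raw.mean(axis=0)           # normalize to zero mean
    n = X.shape[0]
    _, sing, Vt = np.linalg.svd(X, full_matrices=False)   # X = U L V^T
    eig = sing ** 2                           # proportional to the PCA eigenvalues
    cum = np.cumsum(eig) / eig.sum()          # cumulative variance ratio
    m = min(int(np.searchsorted(cum, var_ratio - 1e-12)) + 1, len(sing))
    A = np.sqrt(n) * Vt[:m].T / sing[:m]      # A = sqrt(n) V L^{-1}, first m directions
    return X @ A, A

# Hypothetical correlated toy data with a nonzero mean.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5)) + 3.0
Z, A = sphere_svd(X_raw)
print(np.allclose(Z.T @ Z / len(Z), np.eye(Z.shape[1])))
```

Passing a smaller `var_ratio` (for example 0.9) keeps only the leading directions, which corresponds to the heuristic of discarding directions with small eigenvalues.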
For $n \le d$ the data $x_i$ lie in general in an $(n-1)$-dimensional Euclidean subspace of $\mathbb{R}^d$. In this case it is more convenient to compute the principal directions through eigenanalysis of $K = X X^T$, the inner products matrix of the zero mean data. We compute its singular value decomposition $K = U L V^T$ and remove the last column of $V$ and the last column and row of $L$ (the last eigenvalue of $K$ will always be zero). Then we form the matrix $A = \sqrt{n}\, V L^{-1}$. The points $KA$ are $(n-1)$-dimensional and sphered [7].
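A corresponding sketch for the case $n \le d$, working with the $n \times n$ inner products matrix instead of the $d \times d$ covariance, might look as follows. The function name `sphere_gram` and the random toy data are assumptions for this example; it also assumes the centered data have full rank $n-1$.

```python
import numpy as np

def sphere_gram(X_raw):
    """Sphere n x d data with n <= d through the inner products matrix K = X X^T."""
    X = X_raw - X_raw.mean(axis=0)        # normalize to zero mean
    n = X.shape[0]
    K = X @ X.T                           # n x n inner products matrix
    _, L, Vt = np.linalg.svd(K)           # K = U L V^T, values sorted descending
    V = Vt.T[:, : n - 1]                  # drop the last column of V
    L = L[: n - 1]                        # drop the last (zero) eigenvalue
    A = np.sqrt(n) * V / L                # A = sqrt(n) V L^{-1}
    return K @ A                          # (n-1)-dimensional sphered points

# Hypothetical high-dimensional toy data: 20 observations of dimension 50.
rng = np.random.default_rng(0)
X_raw = rng.standard_normal((20, 50))
Z = sphere_gram(X_raw)
print(Z.shape, np.allclose(Z.T @ Z / len(Z), np.eye(Z.shape[1])))
```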
Moreover, all projections of sphered data $x_i$ in the form of (1) give also sphered data $y_i$ because
$$E[y y^T] = W^T E[x x^T] W = I_q \tag{8}$$
due to the constraint of orthonormal columns of $W$. This frees us from having to re-estimate (co)variances of the projected data in each step of the optimization algorithm. In the following we assume that the data $x_i$ have already been sphered and the position data $s_i$ have been normalized to zero mean and unit variance.
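The intermediate steps behind (8) can be written out as follows, assuming only that the sphered data satisfy $E[x] = 0$ and $E[x x^T] = I_d$ and that $W^T W = I_q$:

```latex
\begin{align*}
E[y] &= W^T E[x] = 0, \\
E[y y^T] &= E[W^T x x^T W] = W^T E[x x^T] W = W^T I_d W = W^T W = I_q .
\end{align*}
```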
3.3 Optimization
The smooth form of the risk $R_K$ as a function of $W$ allows the minimization of the former with nonlinear optimization. For constrained optimization we must compute the gradient of $R_K$ and the gradient of the constraint function $W^T W - I_q$ with respect to $W$, and then plug these estimates in a constrained nonlinear optimization routine to minimize $R_K$ [3].
An alternative approach which avoids the use of constrained nonlinear optimization, in a similar problem using kernel smoothing for discriminant analysis, has been recently proposed in [14]. The idea is to parametrize the projection matrix $W$ by a product of Jacobi rotation matrices [9] and then optimize with respect to the angle parameters involved in each matrix. For projections from $\mathbb{R}^d$ to $\mathbb{R}^q$ this parametrization takes the form
$$W = \prod_{o=1}^{q} \prod_{u=q+1}^{d} G_{ou} \tag{9}$$
where $G_{ou}$ is a Jacobi rotation matrix which equals $I$ except for the elements $g_{oo} = \cos\theta_{ou}$, $g_{ou} = \sin\theta_{ou}$, $g_{uo} = -\sin\theta_{ou}$, and $g_{uu} = \cos\theta_{ou}$ for an angle $\theta_{ou}$ which depends on $o$ and $u$. For simplicity we let in the above notation $g_{oo}$, $g_{ou}$, etc., denote the $(o,o)$-th, $(o,u)$-th, etc., elements of the matrix $G_{ou}$, respectively. To ensure that $W$ is $d \times q$, only the first $q$ columns of the last matrix $G_{qd}$ in (9) are retained, while multiplications must be carried from right to left to reduce the evaluation cost.
Figure 1: The robot trajectory.
Multiplication with a matrix $G_{ou}$ causes a rotation by $\theta_{ou}$ along the plane defined by the dimensions $o$ and $u$, while the range of indices in (9) ensures that all rotations take place along planes defined by at least one non-projective direction, i.e., one among the $d - q$ remaining dimensions. This fact also reduces the total number of parameters from $qd$ in the constrained optimization case (elements of matrix $W$) to $q(d - q)$ here (angles $\theta_{ou}$).
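The derivative of the risk $R_K$ with respect to an angle $\theta_{kl}$ is (we skip an analytical derivation here)
$$\partial_{\theta_{kl}} R_K = \operatorname{trace}\left[ (\nabla_W R_K)^T \, \partial_{\theta_{kl}} W \right] \tag{10}$$
where the first term in the trace is
$$\nabla_W R_K = \frac{1}{n h_y^2} X^T \left[ B + B^T - \operatorname{diag}(\mathbf{1}^T B) \right] X W \tag{11}$$
where $X$ is the $n \times d$ matrix of the sphered data, $\mathbf{1}$ is a column vector of all ones, $\operatorname{diag}(\cdot)$ transforms a vector to a diagonal matrix, and $B$ is the $n \times n$ matrix with elements
$$b_{ij} = \lambda_j(y_i) - \frac{\phi_{h_y}(y_i - y_j)\, \phi_{h_s}(s_i - s_j)}{\sum_{k=1}^{n} \phi_{h_y}(y_i - y_k)\, \phi_{h_s}(s_i - s_k)}. \tag{12}$$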
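The gradient expression (11)-(12) can be checked numerically. The sketch below is an illustration under assumed names (`risk_and_grad`, `numerical_grad`) and random toy data; it treats $W$ as an unconstrained $d \times q$ matrix and compares the closed form against central finite differences of the risk (3).

```python
import numpy as np

def risk_and_grad(W, X, s, h_y, h_s):
    """Risk R_K of eq. (3) and its gradient with respect to W, eqs. (11)-(12)."""
    n = X.shape[0]
    Y = X @ W
    # Unnormalized feature kernel: its constant cancels in every ratio below.
    K_y = np.exp(-np.sum((Y[:, None] - Y[None]) ** 2, axis=-1) / (2 * h_y ** 2))
    K_s = np.exp(-(s[:, None] - s[None]) ** 2 / (2 * h_s ** 2)) / (np.sqrt(2 * np.pi) * h_s)
    joint = K_y * K_s
    risk = -np.mean(np.log(joint.sum(axis=1) / K_y.sum(axis=1)))
    lam = K_y / K_y.sum(axis=1, keepdims=True)            # lambda_j(y_i), eq. (6)
    B = lam - joint / joint.sum(axis=1, keepdims=True)    # eq. (12)
    M = B + B.T - np.diag(B.sum(axis=0))                  # diag(1^T B): column sums
    grad = X.T @ M @ X @ W / (n * h_y ** 2)               # eq. (11)
    return risk, grad

def numerical_grad(W, X, s, h_y, h_s, eps=1e-6):
    """Central finite differences of the risk, one entry of W at a time."""
    G = np.zeros_like(W)
    for a in range(W.shape[0]):
        for b in range(W.shape[1]):
            E = np.zeros_like(W)
            E[a, b] = eps
            G[a, b] = (risk_and_grad(W + E, X, s, h_y, h_s)[0]
                       - risk_and_grad(W - E, X, s, h_y, h_s)[0]) / (2 * eps)
    return G

# Hypothetical toy problem: 30 points, 4-d observations, 2-d projection.
rng = np.random.default_rng(1)
X = rng.standard_normal((30, 4))
s = rng.standard_normal(30)
W = rng.standard_normal((4, 2))
risk, grad = risk_and_grad(W, X, s, 0.8, 0.8)
print(np.allclose(grad, numerical_grad(W, X, s, 0.8, 0.8), atol=1e-5))
```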
Figure 2: Panoramic snapshot from position A.
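The second term of the trace is
$$\partial_{\theta_{kl}} W = \prod_{o=1}^{q} \prod_{u=q+1}^{d} \partial_{\theta_{kl}} G_{ou} \tag{13}$$
where
$$\partial_{\theta_{kl}} G_{ou} = \begin{cases} G'_{ou} & \text{if } k = o \text{ and } l = u \\ G_{ou} & \text{otherwise} \end{cases} \tag{14}$$
and $G'_{ou}$ is the matrix $G_{ou}$ with the ones substituted by zeros and the trigonometric functions substituted by their derivatives.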
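A small sketch of (13)-(14) follows, with assumed helper names `rotation` and `jacobi_product` and 0-based indices. For readability it multiplies full $d \times d$ matrices from left to right, which is less economical than the right-to-left evaluation recommended in the text, and it verifies the analytic derivative against finite differences.

```python
import numpy as np

def rotation(d, o, u, theta, derivative=False):
    """G_ou of eq. (9), or G'_ou of eq. (14) when derivative=True."""
    c, s = np.cos(theta), np.sin(theta)
    if derivative:
        G = np.zeros((d, d))                              # ones replaced by zeros
        G[o, o], G[o, u], G[u, o], G[u, u] = -s, c, -c, -s  # derivatives of the trig terms
    else:
        G = np.eye(d)
        G[o, o], G[o, u], G[u, o], G[u, u] = c, s, -s, c
    return G

def jacobi_product(theta, d, q, deriv_index=None):
    """W of eq. (9), or its derivative (13) w.r.t. theta_kl when deriv_index=(k, l)."""
    P = np.eye(d)
    for o in range(q):
        for u in range(q, d):
            P = P @ rotation(d, o, u, theta[o, u - q], deriv_index == (o, u))
    return P[:, :q]                                       # retain the first q columns

# Hypothetical example: 5-d to 2-d, derivative with respect to the angle (k, l) = (1, 3).
d, q = 5, 2
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi / 2, np.pi / 2, size=(q, d - q))
dW = jacobi_product(theta, d, q, deriv_index=(1, 3))
step = np.zeros_like(theta)
step[1, 3 - q] = 1e-6
fd = (jacobi_product(theta + step, d, q) - jacobi_product(theta - step, d, q)) / 2e-6
print(np.allclose(dW, fd, atol=1e-6))
```

Given the gradient of the risk with respect to $W$, the derivative (10) with respect to one angle is then the trace of the product of its transpose with `dW`.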
A point we should note is that the mixture density form of (4) and the additional trigonometric functions in (9) can make the landscape of the risk $R_K$ have numerous local minima. For this reason, combining a gradient-free optimization method like, e.g., Nelder-Mead [9], with gradient-based nonlinear optimization is requisite. Also an appropriate dimension reduction through sphering prior to optimization can significantly facilitate the search. In any case, the optimization algorithm must be applied many times and the solution with the minimum risk must be retained.
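The restart strategy described above can be sketched with SciPy's general-purpose `scipy.optimize.minimize`, used here as a stand-in for the Numerical Recipes routines cited in the text. The function `multistart_minimize` and the toy objective `toy_risk` are assumptions for this example; in practice the objective would be the risk $R_K$ as a function of the Jacobi angles.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_minimize(risk, n_angles, n_restarts=20, seed=0):
    """Nelder-Mead followed by BFGS from random starts; keep the best solution."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        theta0 = rng.uniform(-np.pi / 2, np.pi / 2, size=n_angles)  # random initial angles
        coarse = minimize(risk, theta0, method="Nelder-Mead",
                          options={"maxiter": 200 * n_angles})      # gradient-free stage
        fine = minimize(risk, coarse.x, method="BFGS")              # gradient-based refinement
        if best is None or fine.fun < best.fun:
            best = fine
    return best

def toy_risk(theta):
    """Hypothetical multimodal objective standing in for R_K(theta)."""
    theta = np.asarray(theta)
    return float(np.sum(np.sin(3.0 * theta) + 0.1 * theta ** 2))

best = multistart_minimize(toy_risk, n_angles=2)
print(best.x, best.fun)
```

Here BFGS estimates gradients by finite differences; supplying the analytic derivative (10) through the `jac` argument would be the natural refinement.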
4 Experiments
We applied the above algorithm to data collected by a Nomad Scout robot following a predefined trajectory in our mobile robot lab and the adjoining hall as shown in Fig. 1. The omnidirectional imaging device which is mounted on top of the robot consists of a vertically mounted standard camera aimed upward looking into a spherical mirror. The data set contains 104 omnidirectional images (320 × 240 pixels) captured every 25 centimeters along the robot path. Each image is transformed to a panoramic image (64 × 256) and this set of 104 panoramic images constitutes the training set of our algorithm. A typical panoramic image shot at the position A of the trajectory is shown in Fig. 2.
In order to apply our supervised projection method, we first sphered the panoramic image data using the inner products matrix as explained above and kept the first 10 dimensions explaining about 60% of the total variance. Then we applied our method projecting the sphered data points from 10-d to 2-d. The resulting two-dimensional points are shown on the left part of Fig. 3. For optimization we ran several times a combined search using the Nelder-Mead algorithm with random initial values for the Jacobi angles in $[-\pi/2, \pi/2]$, together with nonlinear optimization with the BFGS algorithm [3, 9]. Running only BFGS required many more runs with random initial guesses to reach the global minimum, leading to comparable total expenses. Each execution of the optimization algorithm took a couple of seconds on a Sparc Ultra 5 machine.
[Figure 3 plots: two 2-d scatter plots of the projected points with 'start' and 'end' markers. Panel titles as extracted: "Proposed method: R= 85" (left), "PCA: R= 72" (right).]
Figure 3: Projection of the sphered panoramic image data from 10-d to 2-d: using the proposed method (left), projection on the first two principal components (right). The 'start' and 'end' points are the projections of the panoramic images captured by the robot at the beginning and end, respectively, of its trajectory.
On the right part of Fig. 3 we show the result of projecting the sphered 10-d points on the first two principal components of the data. We clearly see the advantage of the proposed method over PCA. The risk is smaller, while from the shape of the projected manifold we see that taking into account the pose information during projection can significantly improve the resulting features: there are fewer self-intersections of the projected manifold in our method than in PCA which, in turn, means better robot position estimation on the average.
Finally, in Fig. 4 we show the first two feature vectors (points in the original space of panoramic images) learned by our method and by PCA. In the PCA case these are the familiar first two eigenimages of the panoramic data which, as is normally observed in typical data sets, exhibit low spatial frequencies. We see that the proposed supervised projection method yields very different feature vectors than PCA, namely, images with higher spatial frequencies and distinct characteristics.
5 Conclusions
We proposed a method for learning task-relevant linear features from high-dimensional robot observations. Our method is supervised in the sense that the position of the robot in the sample is also taken into account during optimization. This makes the method superior to PCA which is unsupervised. We showed results of linear feature extraction from panoramic robot data when the robot was moving in a typical office environment. The results show clearly the superiority of the proposed method over PCA.
Our method can be useful in various robotic settings and is not limited to mobile robots. In particular, it can be used in any case where global feature extraction from supervised robot observations is in order. The extension of the method to handle nonlinear features is possible (e.g., by using a neural network) but then additional issues have to be addressed (complexity of the network, overfitting, etc.). Besides, the wide use of PCA in robotic problems shows that linear feature
[Figure 4 panel titles: 2nd optimal feature vector (left), 2nd eigenvector (right).]
Figure 4: The first two feature vectors using our method (left), and PCA (right).
References
[1] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Techniques. A. K. Peters, Ltd, Wellesley, MA, 1996.
[2] J. L. Crowley, F. Wallner, and B. Schiele. Position estimation using principal components of range data. In Proc. IEEE Int. Conf. on Robotics and Automation, Leuven, Belgium, May 1998.
[3] P. E. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, London, 1981.
[4] P. Hall. On projection pursuit regression. Ann. Statist., 17(2):573–588, 1989.
[5] M. Jogan and A. Leonardis. Robust localization using the eigenspace of spinning-images. In Proc. IEEE Workshop on Omnidirectional Vision, South Carolina, June 2000.
[6] B. Kröse and R. Bunschoten. Probabilistic localization by appearance models and active vision. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 2255–2260, Detroit, Michigan, May 1999.
[7] V. Kumar and H. Murakami. Efficient calculation of primary images from a set of images. IEEE Trans. Pattern Analysis and Machine Intelligence, 4(5):511–515, 1982.
[8] S. K. Nayar, H. Murase, and S. A. Nene. Learning, positioning, and tracking visual appearance. In Proc. IEEE Int. Conf. on Robotics and Automation
[9] W. H. Press, S. A. Teukolsky, B. P. Flannery, and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press, 2nd edition, 1992.
[10] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, U.K., 1996.
[11] R. Sim and G. Dudek. Learning visual landmarks for pose estimation. In Proc. IEEE Int. Conf. on Robotics and Automation, Detroit, Michigan, May 1999.
[12] C. J. Stone. Consistent nonparametric regression (with discussion). Ann. Statist., 5:595–645, 1977.
[13] S. Thrun. Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1), 1998.
[14] K. Torkkola and W. Campbell. Mutual information in learning feature transformations. In Proc. Int. Conf. on Machine Learning, Stanford, CA, June 2000.
[15] N. Vlassis and B. Kröse. Robot environment modeling via principal component regression. In Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 677–682, Kyongju, Korea, Oct. 1999.
[16] N. Vlassis, Y. Motomura, and B. Kröse. Supervised linear feature extraction for mobile robot localization. In Proc. IEEE Int. Conf. on Robotics and Automation, pages 2979–2984, San Francisco, CA, Apr. 2000.
[17] M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995.