
Proceedings of the 2000 IEEE International Conference on Robotics & Automation, San Francisco, CA, April 2000

Supervised Linear Feature Extraction for Mobile Robot Localization

Nikos Vlassis*    Yoichi Motomura†    Ben Kröse*

*RWCP, Autonomous Learning Functions SNN
University of Amsterdam
Kruislaan 403, 1098 SJ Amsterdam
The Netherlands
e-mail: {vlassis, krose}@wins.uva.nl

†Electrotechnical Laboratory
Umezono 1-1-4, Tsukuba
Ibaraki 305-8568, Japan
e-mail: motomura@etl.go.jp

Abstract

We are seeking linear projections of supervised high-dimensional robot observations and an appropriate environment model that optimize the robot localization task. We show that an appropriate risk function to minimize is the conditional entropy of the robot positions given the projected observations. We propose a method of iterative optimization through a probabilistic model based on kernel smoothing. To obtain good starting solutions for the optimization we use canonical correlation analysis. We apply our method on a real experiment involving a mobile robot equipped with an omnidirectional camera in an office setup.

1 Introduction

The current trend in mobile robot technology is towards building fully autonomous mobile robots, i.e., robots that can operate without external guidance in unstructured or natural environments. To localize themselves accurately and then plan paths in their workspace, the robots must use their perception mechanism, e.g., vision, often in combination with a dead-reckoning device, e.g., an odometer.

From a statistical viewpoint the robot localization task can be regarded as a prediction problem. Given an a priori model of the environment and a new sensor observation, the task is to predict the position of the robot as accurately as possible. Such a model, called a map, is often built through supervised learning from a set of known robot position-observation pairs [7, 11, 5, 12].

Sensor technology provides high-dimensional data such as images or range profiles. To deal with the abundance and the inherent redundancy in the data (e.g., too many correlated measurements) an appropriate feature extraction scheme should precede the modeling step. The extracted features can be natural landmarks, i.e., distinctive features of the environment [1], or landmarks formed by some mathematical transformation on the original observations [11, 5, 12]. In this paper we deal with the latter case, specifically the extraction of linear features from omnidirectional image data to be used for map building and localization. Previous work in our group has investigated the use of principal component analysis (PCA) for this purpose [5, 12]. However, PCA is an unsupervised method which optimizes a reconstruction error and may not necessarily be good for localization.

In this paper we look for supervised linear projections of the robot observations and an appropriate environment model so that the localization performance of the robot is optimized. In the following we describe the proposed model, the localization criterion to optimize, and the optimization method we use to obtain the optimal features. We demonstrate our method on a real robot equipped with an omnidirectional camera in an office environment. The results show that our method outperforms PCA as a linear feature extraction method for robot localization.

2 The robot localization problem

Consider a robot at some position x* in its workspace, observing a d-dimensional vector¹ z*, e.g., an image. The robot localization problem concerns the prediction of x* given z*.

To deal with the noise inherent in robot sensing we adopt a probabilistic framework. Let x be a stochastic vector describing the position of the robot and p(x|z*) the conditional density of x given a robot observation z*, called the posterior density henceforth. The localization problem can be formulated as finding an estimate of the posterior density that is as peaked as possible at the real robot position x*.

First we note that due to perceptual aliasing the posterior density may exhibit multiple modes in different regions of the workspace. In other words, for the same observation z* two or more positions in the workspace can be candidates for x*. This implies that a solution to the localization problem would be to directly model the posterior density p(x|z*) as a mixture of conditional densities and fit it from the data [9]. However, for realistic robot localization this approach is not adequate, since we need to integrate old position estimates and actions into a single position estimate. One way to achieve this is by means of Bayes' rule. We write the posterior density as

p(x|z*) = p(z*|x) p(x) / p(z*)   (1)

where p(z*|x) is the likelihood of the observation given x, p(x) a prior density on the robot positions, and p(z*) the unconditional density of observations alone. By repeatedly applying (1), using the posterior density as prior for the next estimate and independently updating the prior from robot actions, we get a localization procedure that can be used for robot navigation. Kalman filters and hidden Markov models are examples of iterative localization procedures. See [11] for details and references.
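The recursive use of (1), with each posterior serving as prior for the next step, can be sketched on a discretized workspace. The grid, likelihood values, and motion kernel below are hypothetical stand-ins for a real sensor and motion model, not the paper's actual estimator:

```python
import numpy as np

def bayes_update(prior, likelihood):
    # One application of Eq. (1): posterior is proportional to
    # p(z*|x) p(x); dividing by the sum normalizes by p(z*).
    posterior = likelihood * prior
    return posterior / posterior.sum()

def motion_update(belief, kernel):
    # Spread the belief to account for noisy robot motion
    # before the next observation arrives.
    spread = np.convolve(belief, kernel, mode="same")
    return spread / spread.sum()

# Toy 1-D workspace discretized into 5 candidate positions.
prior = np.full(5, 0.2)                           # flat prior p(x)
likelihood = np.array([0.1, 0.1, 0.6, 0.1, 0.1])  # hypothetical p(z*|x)

belief = bayes_update(prior, likelihood)          # first observation
belief = motion_update(belief, np.array([0.25, 0.5, 0.25]))
belief = bayes_update(belief, likelihood)         # posterior reused as prior
```

Repeated observations consistent with position 2 sharpen the belief there, which is exactly the iterative behavior the text describes.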

3 Feature extraction

We see from the above that robot localization requires a model for p(z|x), the conditional density of observations given robot positions. To build such a model we assume we are given a supervised training set in the form D = {x_n, z_n}, n = 1, ..., N, consisting of N pairs of known robot positions with their associated observations.

In pattern recognition there are many ways to fit conditional densities like p(z|x) from a training set by using parametric (e.g., a neural network) or nonparametric methods [9]. However, in both cases the efficiency of the learning machine is highly influenced by the dimensionality of the observations, and it turns out that for accurate modeling of high-dimensional data we need a very large number of training points [3, ch. 6.12].

¹All vectors are assumed column vectors.

This fact suggests reducing the dimensionality of the observations prior to modeling, in other words, extracting appropriate features from the original high-dimensional data to be used for robot localization. We restrict our attention to linear features; thus for a q-dimensional (q < d) vector y extracted from an observation z we can write

y = W^T z,   (2)

with W a d × q projection (or feature) matrix. A usual statistical requirement in such problems is that the components of y must be uncorrelated. This can be interpreted as a geometrical constraint on the matrix W if the data z are already sphered, i.e., rotated and scaled so that their covariance matrix is the identity matrix. This is always possible and is equivalent to applying principal component analysis and scaling all components to unit variance [6, 4]. The constraint of uncorrelatedness then becomes a constraint of orthonormality² for the matrix W

W^T W = I_q   (3)

where I_q stands for the q-dimensional identity matrix. In the following we will assume that the original z points have already been sphered and have zero mean. The mapping (2) under the constraint (3) in effect rotates the sphered z space and then retains only the first q most 'useful' coordinates for localization. Since the mapping (2) is deterministic, the robot localization problem can be reformulated in the projected space by building a model p(y|x; θ) in this space, parametrized³ on θ. Since each observation z corresponds to some projected vector y, the Bayes rule (1) can be equally applied substituting y for z, while for building a model of p(y|x) we can use the transformed training set D′ = {x_n, y_n}, n = 1, ..., N.
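The sphering step and constraint (3) can be illustrated with a small synthetic sketch (dimensions and data are arbitrary): PCA whitening makes the covariance of z the identity, after which any W with orthonormal columns yields uncorrelated projected components.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated "observations" standing in for image data.
Z = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8))

# Sphering: center, then rotate and scale via PCA so cov(Zs) = I.
Zc = Z - Z.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(Zc, rowvar=False))
Zs = (Zc @ eigvec) / np.sqrt(eigval)

# Any W with orthonormal columns satisfies constraint (3): W^T W = I_q.
q = 2
W, _ = np.linalg.qr(rng.normal(size=(8, q)))
Y = Zs @ W   # the mapping (2), applied row-wise: y = W^T z
```

Because cov(Zs) = I, the projected covariance is W^T W = I_q, i.e., the components of y are uncorrelated with unit variance, exactly as the constraint intends.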

The rest of the paper concerns building such a model and estimating, in a supervised manner (i.e., using both the z_n and x_n points of the training set), the projection matrix W that gives good localization. For that we need a 'goodness' measure.


Figure 1: The predicted posterior p(x|y*) and the delta-peaked density q(x) on the real robot position x*.

4 The localization criterion

In our analysis we consider the worst case where the robot localizes with a flat prior p(x). Moreover, to simplify matters, we assume that the points x_n in the training set were sampled from the same prior p(x), an assumption which leads to mathematical tractability in the models below.

Suppose the robot is at the position x* and observes a vector y* derived from (2). Assume also a selected model p(y|x) which, through Bayes' rule, gives a posterior density p(x|y*). A measure of closeness of the predicted posterior and the real robot position can be derived by taking the cross-entropy between a delta-peaked density q(x) on x* and the posterior (Fig. 1)

c = −∫ q(x) log p(x|y*) dx ≈ −log p(x*|y*),   (4)

the approximation being justified by the fact that q(x) is peaked on x*.

Averaging the above loss function over the joint x-y space we arrive at the definition of the conditional entropy [8] of the positions x given the projected vectors y

H(x|y) = −∫∫ p(x, y) log p(x|y) dx dy   (5)

as the expected risk using the model p(y|x) and the mapping (2). The integral can be approximated using the empirical distribution of the training points to get the empirical risk

R = −(1/N) Σ_{n=1}^{N} log p(x_n|y_n) = −(1/N) Σ_{n=1}^{N} log [p(x_n, y_n) / p(y_n)]   (6)

which can also be regarded as the negative log-likelihood of the training set with respect to the density p(x_n|y_n). Optimal localization is then achieved by minimizing (6).
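To see why (5) is a sensible risk, consider a discrete analogue (a toy illustration, not the paper's continuous estimator): when the feature y pins down the position x, the conditional entropy vanishes, and when y is uninformative it equals the prior entropy H(x).

```python
import numpy as np

def cond_entropy(p_xy):
    # H(x|y) = H(x, y) - H(y) for a discrete joint distribution,
    # the discrete counterpart of Eq. (5).
    p_y = p_xy.sum(axis=0)
    H_xy = -np.sum(p_xy[p_xy > 0] * np.log(p_xy[p_xy > 0]))
    H_y = -np.sum(p_y[p_y > 0] * np.log(p_y[p_y > 0]))
    return H_xy - H_y

# Rows index positions x, columns index feature values y.
informative = np.array([[0.5, 0.0],    # y determines x exactly
                        [0.0, 0.5]])
useless = np.array([[0.25, 0.25],      # y independent of x
                    [0.25, 0.25]])
```

For the informative joint, H(x|y) = 0; for the independent one, H(x|y) = H(x) = log 2, the worst case under a flat prior.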

5 Model description

To minimize the empirical risk (6) we need models for the densities p(x, y) and p(y). An attractive choice that makes no particular assumptions about the statistical nature of the observations is through kernel smoothing [13, 9]. A spherical multivariate Gaussian kernel of width h is centered on each training vector (x_n, y_n), giving rise to the approximations

p(x, y) = (1/N) Σ_{n=1}^{N} K_2(x − x_n) K_q(y − y_n)   (7)

p(x) = (1/N) Σ_{n=1}^{N} K_2(x − x_n)   (8)

p(y) = (1/N) Σ_{n=1}^{N} K_q(y − y_n)   (9)

with

K_2(x − x_n) = 1/(2πh²) exp(−||x − x_n||² / (2h²))   (10)

K_q(y − y_n) = 1/(2πh²)^{q/2} exp(−||y − y_n||² / (2h²))   (11)

a bivariate and a q-variate Gaussian kernel, respectively. The width h reflects the degree of smoothness or overfitting of the model, and its value for a particular problem can be computed by, e.g., cross-validation techniques [13, 9]. Finally, the use of kernel smoothing for density estimation provides the model p(y|x) for localization as

p(y|x) = p(x, y) / p(x)   (12)

with p(x, y) and p(x) from (7) and (8), respectively.

Although direct implementation of the risk quantity (6) using the above formulas is possible, its computational cost is O(N²), with N the size of the training set, which can be regarded as infeasible for large data sets. However, a very efficient implementation through binned kernel density estimators and the Fast Fourier Transform drops the cost to O(N) [10, 13].
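The naive O(N²) evaluation of (6) under the kernel model (7)-(11) can be sketched as follows (synthetic data; the binned/FFT speed-up of [10, 13] is not shown). A projection that is informative about the positions should yield a lower empirical risk than pure noise:

```python
import numpy as np

def empirical_risk(X, Y, h):
    # Naive O(N^2) evaluation of the empirical risk (6):
    # R = -(1/N) sum_n log[ p(x_n, y_n) / p(y_n) ].
    N, q = Y.shape
    dx2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # ||x_n - x_m||^2
    dy2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # ||y_n - y_m||^2
    Kx = np.exp(-dx2 / (2 * h**2)) / (2 * np.pi * h**2)            # Eq. (10)
    Ky = np.exp(-dy2 / (2 * h**2)) / (2 * np.pi * h**2) ** (q / 2)  # Eq. (11)
    p_xy = (Kx * Ky).mean(axis=1)   # Eq. (7) at the training points
    p_y = Ky.mean(axis=1)           # Eq. (9)
    return -np.mean(np.log(p_xy / p_y))

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))               # robot positions
Y_good = X[:, :1] + 0.05 * rng.normal(size=(200, 1))    # informative feature
Y_bad = rng.normal(size=(200, 1))                       # uninformative feature
h = 0.2
```

Here `empirical_risk(X, Y_good, h)` comes out below `empirical_risk(X, Y_bad, h)`, which is the behavior the criterion is designed to reward.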

6 Optimization

To minimize the empirical risk (6) with respect to the projection matrix W under the constraint (3) we follow an iterative approach. We first solve for a one-dimensional feature which minimizes the risk, then for a second one which is orthogonal to the first, and so on for a specific number of features or until the risk shows no significant decrease.

For one-dimensional projections y = w^T z the only constraint imposed by (3) is that the norm of the projection vector w must be one. Thus the transformation

ŵ = w / ||w||   (13)

and computation of the risk using y = ŵ^T z reduce the problem to an unconstrained nonlinear optimization problem which can be solved by regular techniques [2]. To compute the gradient of the risk (6) with respect to the vector w we need the gradient of y, which is

∇_w y = ∇_w (ŵ^T z) = (z − (ŵ^T z) ŵ) / ||w||.   (14)

However, care must be taken when using gradient-based optimization: due to the nonlinearities of the risk (6) and the kernels (10) and (11), the optimization may easily get stuck in local minima.
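The one-dimensional step can be sketched with SciPy's BFGS on the unconstrained parametrization (13). Everything below is synthetic and illustrative: the risk is the naive kernel version of (6) restated for self-containment, and the starting vector follows the CCA-style rule w_i = E[x_i z] of Section 6.1.

```python
import numpy as np
from scipy.optimize import minimize

def risk(w, Z, X, h):
    # Empirical risk (6) of the single feature y = (w/||w||)^T z;
    # the normalization (13) makes the problem unconstrained.
    y = Z @ (w / np.linalg.norm(w))
    dy2 = (y[:, None] - y[None, :]) ** 2
    dx2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Ky = np.exp(-dy2 / (2 * h**2)) / np.sqrt(2 * np.pi * h**2)  # univariate (11)
    Kx = np.exp(-dx2 / (2 * h**2)) / (2 * np.pi * h**2)         # bivariate (10)
    return -np.mean(np.log((Kx * Ky).mean(1) / Ky.mean(1)))

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(150, 2))
# Synthetic sphered-like observations: only the first column informs x_1.
Z = np.c_[X[:, 0], rng.normal(size=(150, 3))]
Z = (Z - Z.mean(0)) / Z.std(0)

w0 = Z.T @ X[:, 0] / len(X)        # CCA-style start, w_i = E[x_i z] (Sec. 6.1)
res = minimize(risk, w0, args=(Z, X, 0.2), method="BFGS")
w_opt = res.x / np.linalg.norm(res.x)   # final feature, back on the unit sphere
```

Starting from the CCA direction keeps the local search in a good basin, which is the role Section 6.1 assigns to it; the optimized feature loads almost entirely on the informative coordinate of z.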

6.1 Canonical correlation analysis

One way to obtain a good starting solution for the optimization routine is through canonical correlation analysis (CCA) [6]. This statistical method seeks linear transformations of two data sets so that the correlation between transformed variables from different sets is maximized. In our case we apply CCA between the z points and each coordinate of x = [x_1, x_2] separately to find two projection directions, and then choose the direction with the smallest risk.

Formally, CCA finds an optimal projection vector w_i so that the correlation coefficient between the projected variable y = w_i^T z and the i-th coordinate x_i of x (i = 1, 2) is maximized. For x_i of zero mean and unit variance and sphered z the CCA optimal solution becomes [6]

w_i = E[x_i z]   (15)

which, in effect, maximizes the effectiveness of z in predicting x_i. The vector w_i that gives the smallest risk for i = 1, 2 can be used as initial guess in the optimization routine.⁴

⁴The use of CCA to obtain a good first solution to the optimization problem can also be justified by the following consideration. Assume a one-dimensional projected variable y and a one-dimensional x, and expand the conditional entropy (5) as

H(x|y) = H(y|x) + H(x) − H(y).   (16)

The first term reflects the noise after the projection, while the second and third terms are the entropies of x and y, respectively. Let us then try to find the best mapping y = f(x) that minimizes (16). In the low-noise limit the first term can be ignored, while for the term H(y) we can write [8, p. 565]

H(y) = H(x) + E[log |f′(x)|].   (17)

6.2 Extracting more features

Assume at some iteration that r feature vectors have been extracted, forming the columns of the projection matrix W. This matrix forms an r-dimensional basis in the z space, and we can find its orthogonal complement W⊥ by orthogonalizing W (e.g., with Gram-Schmidt) so that

W^T W⊥ = 0.   (18)

Then we project the data points z on the complementary subspace by multiplying them with W⊥^T and apply the one-dimensional optimization procedure described above to get an optimal feature w⊥ in this space. The resulting vector is transformed back to the original space by w = W⊥ w⊥ and is orthogonal to all other features (columns) of W. Moreover, the constraint (3) ensures that the new projected variable y = w^T z is uncorrelated with all other projected variables (components) of y. We can iteratively apply this procedure for a specific number of features or until the risk shows no significant decrease.

7 Experiments

For our experiments we used the MEMORABLE robot database. This database is provided by the Tsukuba Research Center, Japan, for the Real World Computing Project, and contains a supervised set of about 8000 robot positions and associated measurements of sonars, infrared sensors, and images from an omnidirectional camera (Fig. 2), taken by a Nomad mobile robot moving in a typical office environment (Fig. 3). To check our method we used a subset of 300 images obtained randomly from the whole environment. The omnidirectional images were first transformed to 64 × 256 pixel panoramic images and then sphered through PCA to 299 dimensions. The x data were normalized to unit variance per dimension. The optimization

Figure 2: A snapshot of the omnidirectional camera.

Figure 3: The office environment.

method described in the previous sections was used to extract the optimal features.

To compute the first feature we applied CCA between the z points and each individual coordinate x_1 and x_2 of x to get two solutions w_1 and w_2, respectively. It appeared that w_1 had the smallest risk, and this vector was used as initial guess in the nonlinear optimization routine. In the left part of Fig. 4 we show the projections of the z points on the first linear feature as a function of the x coordinates after nonlinear optimization. To compare with results from our previous research on PCA we show in the right part the projection of the data on the first principal component. We note that in the first case there is an almost linear relationship between the projected variable and the robot positions, a natural indication of good localization performance.

We iterated the procedure to compute the first 10 features. In Fig. 5 we show the risk as a function of the number of features (keeping the kernel width constant) for the proposed method and for PCA. We see that for a small number of features our method outperforms PCA, while for larger numbers of features the two methods converge. For the nonlinear optimization we used the BFGS algorithm [2], while for the kernel width we empirically found that h = N^(−2/7) gives good generalization performance.

Figure 5: The risk as a function of the number of features.

8 Conclusions and discussion

We proposed a method for supervised linear feature extraction for robot localization. The main idea is the use of a supervised projection scheme that optimizes an appropriate localization criterion, a sort of supervised projection pursuit with a localization index [4]. The use of nonparametric density estimation provides a smooth objective function which can be optimized with respect to the projection matrix with nonlinear optimization, while canonical correlation analysis provides good starting solutions for the optimization routine. As the experiments indicate, the method outperforms PCA as a linear feature extraction method for small numbers of features.

A similar approach to the problem of optimal feature extraction was proposed in [11], where a neural network was used to extract nonlinear features from images. In principle, nonlinear feature extraction can provide more relevant features for localization than a linear feature extraction method; however, issues like model selection (e.g., deciding on the architecture of the neural network) or overfitting are more difficult to deal with than in the linear case.

Our method compares favorably to the method in [11] since it uses an objective function which can be computed in O(N), with N the size of the training set, in contrast to the objective function in [11] which is O(N³). Moreover, canonical correlation analysis provides good initial solutions to the optimization problem, possibly avoiding the local minima of the objective function.


Figure 4: Projected observations on the first linear feature as a function of x. (Left) Supervised projection: R = 0.2897. (Right) Projection on the first principal component: R = 0.7763.

An interesting problem which we currently investigate is the estimation from the data, using cross-validation techniques, of the optimal h and the optimal number of features that lead to good generalization, especially when the size of the training set is small. Finally, recent experiments showed that constrained nonlinear optimization can provide better solutions than iterative optimization; these results together with mathematical details will appear elsewhere.

Acknowledgments

We would like to thank the people in the Real World Computing Project and the Tsukuba Research Center in Japan for providing us with the MEMORABLE robot database. Many thanks also to Roland Bunschoten for valuable discussions and various implementations.

References

[1] J. Borenstein, B. Everett, and L. Feng. Navigating Mobile Robots: Systems and Techniques. A. K. Peters, Ltd, Wellesley, MA, 1996.

[2] P. E. Gill, W. Murray, and M. Wright. Practical Optimization. Academic Press, London, 1981.

[3] S. Haykin. Neural Networks. Macmillan College Publishing Company, New York, 1994.

[4] M. C. Jones and R. Sibson. What is projection pursuit? Journal of the Royal Statistical Society A, 150:1-36, 1987.

[5] B. Kröse and R. Bunschoten. Probabilistic localization by appearance models and active vision. In Proc. ICRA'99, IEEE Int. Conf. on Robotics and Automation, Detroit, Michigan, May 1999.

[6] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, 1979.

[7] S. Oore, G. E. Hinton, and G. Dudek. A mobile robot that learns its place. Neural Computation, 9:683-699, 1997.

[8] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 3rd edition, 1991.

[9] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, U.K., 1996.

[10] B. W. Silverman. Kernel density estimation using the Fast Fourier Transform. Appl. Statist., 31:93-99, 1982.

[11] S. Thrun. Bayesian landmark learning for mobile robot localization. Machine Learning, 33(1), 1998.

[12] N. Vlassis and B. Kröse. Robot environment modeling via principal component regression. In Proc. IROS'99, IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 677-682, Kyongju, Korea, Oct. 1999.
