Fast, handset-based GSM fingerprints for indoor localization

(1)

HAL Id: hal-00961277

https://hal.archives-ouvertes.fr/hal-00961277

Submitted on 24 Mar 2014

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Fast, handset-based GSM fingerprints for indoor localization

Ye Tian, Bruce Denby, Iness Ahriz Roula, Pierre Roussel, Gérard Dreyfus

To cite this version:

Ye Tian, Bruce Denby, Iness Ahriz Roula, Pierre Roussel, Gérard Dreyfus. Fast, handset-based GSM fingerprints for indoor localization. 2012 International Symposium on Wireless Communication Systems (ISWCS), Aug 2012, France. pp.641-645, �10.1109/ISWCS.2012.6328446�. �hal-00961277�

(2)

Fast, Handset-Based GSM Fingerprints for Indoor Localization

Ye Tian, Bruce Denby SIGMA Laboratory and Université Pierre et Marie Curie

Paris, France ye.tian@etu.upmc.fr,

denby@ieee.org

Iness Ahriz

LAETITIA/CEDRIC Laboratory Conservatoire National des Arts et

Métiers Paris, France iness.ahriz@cnam.fr

Pierre Roussel, Gérard Dreyfus SIGMA Laboratory

ESPCI ParisTech Paris, France pierre.roussel@espci.fr, gerard.dreyfus@espci.fr Abstract—Accurately localizing users in indoor environments

remains an important and challenging task. The article presents new results on room-level indoor localization, using cellular Received Signal Strength fingerprints collected with a standard cellular handset programmed to perform fast scans of the 900 and 1800 Megahertz GSM bands as a user explores an indoor environment at a normal walking pace. Support Vector Machines are used to deal with the high dimensionality of the fingerprints.

The study demonstrates that an appropriately programmed standard cellular handset can provide a simple, inexpensive solution for accurate room-level indoor localization.

Keywords-localization; indoor; fingerprint; machine learning;

support vector machine

I. INTRODUCTION

The inability of GPS receivers to function adequately in

„urban canyon‟ and indoor environments has prompted a search for new techniques of indoor localization that can provide seamless and ubiquitous service for mobile users. Accurately and reliably locating persons and objects in indoor environments is a challenging, but attractive, goal that holds promise for a variety of location-based services and applications [1]. For many such services, room-level precision, in which the localization system discriminates between rooms rather than estimating coordinates per se, is an adequate goal;

this is the approach that will be adopted here.

A variety of indoor localization techniques have been proposed. Methods based on Received Signal Strength (RSS), in Wi-Fi and Bluetooth networks, for example, or using infrared or acoustic signals, appear promising [2-7]. A drawback of these approaches, however, is that they necessitate the deployment and maintenance of an infrastructure, which can be time consuming and costly.

In addition to such short-range signals, indoor localization based on fingerprints from wide-area radiotelephone networks, such as GSM and CDMA, have also been proposed [8-13]. The full coverage, near-ubiquity and relative stability of GSM networks may provide an attractive alternative for indoor localization.

Recent results have suggested that accurate and efficient indoor localization can be achieved using RSS information

acquired from very large numbers of GSM channels [10-13].

Those studies, however, did not represent a genuinely practical solution since the RSS scanning devices operated without user intervention only at a small number of representative points within each room. In this work, we present a more realistic solution, in which a user can be localized at room level regardless of his exact position in a room. This is achieved using a handheld acquisition device – actually a standard cellphone with a software modification – that can obtain an RSS fingerprint of the entire 900 and 1800 MHz GSM bands in only about 300 milliseconds. This enables the collection of large amounts of data on a reasonable timescale, at points throughout the interiors of the rooms, rather than at only a few representative points, while moving at a normal walking pace.

Our results show that GSM fingerprints acquired in this way can be used to differentiate rooms of about 10 square meters size in some 94% of cases, indicating that the method may indeed be used as part of a simple, practical, inexpensive indoor localization system. The data collection procedure employed is described in section II, and the room classification algorithms in section III. Results are presented in section IV, while conclusions and future perspectives appear in the final section.

II. DATA COLLECTION AND DATASETS

The data used in the experiments was obtained by scanning the entire GSM band in 7 rooms of a 4^th floor laboratory building (steel frame, concrete and plaster walls) in central Paris, France. The data acquisition device used was the GSM trace mobile “TEMS Pocket”, which is in fact a standard Sony Ericsson W995 mobile phone to which network investigation software has been added by the manufacturer [14]. In April of 2012, on a Saturday afternoon from 2pm to 6pm, 5500 scans (representing about one half hour of recording per room) were recorded in each of the 7 unoccupied rooms and manually labeled with the corresponding room numbers, as illustrated in Figure 1. Each scan contains the RSS of all 548 carriers in the GSM900 and GSM1800 bands, with values ranging from -117 to -38dBm. All scans were made via “random walks” in the 7 rooms with the TEMS Pocket handheld by the user. The exact positions of the individual scans within a room were not recorded; indeed all points in a given room are treated as

(3)

belonging to that room, consistent with the room-level indoor localization approach adopted here.

Office 1

Office 2

Office 3

Office 4 Office

Office 5 7

Data Center

6

Exterior Wall Interior Wall

Window Door

Figure 1. Layout of the laboratory where the data set was recorded

III. CLASSIFICATION ALGORITHMS

The room-level indoor localization problem is considered as a multi-class classification problem, where each room is a class. As is usual in data-driven classification problems, the algorithm works in a two-stage process. The first stage is off- line training, in which the equations of the discriminant functions are determined using training data with known labels.

The second stage is on-line testing, in which, given a fingerprint that is not present in the training dataset, the classifier must provide the label of the room where it was measured, using the previously defined separating surfaces.

Only the first off-line stage may require heavy computations, the second stage merely needs to compute the values of the discriminant functions. As a starting point for multi-class classification, a pairwise (also termed „two-class or „binary‟) classifier is introduced first.

A. Pairwise Classifier

Since the number of variables is very large and the size of the training set is relatively limited, Support Vector Machine (SVM) classifiers were deemed appropriate because of their built-in regularization mechanism [15].

Consider a set of M examples of items belonging to either of two classes A and B, each example being described by a p-

dimensional vector xi. Further assume that the examples are linearly separable, i.e. that there are, in descriptor space, linear surfaces of equation f(x) = 0 that separate all examples without error: f(xi) > 0 for all examples i belonging to class A and f(xi)

< 0 otherwise. It can be proved that f(x) can be written under the form

0 1

( )  ( ) 





_i^M ⁱ ⁱ ⁱ 

f x y x x (1)

where the ⁱ (i = 0..M) are parameters whose values are estimated from the examples, yi = +1 if example i belongs to class A and yi = 1 otherwise.

A linear SVM is a linear classifier such that the minimum distance between the separation surface f(x) = 0 and the examples that are closest to it (called support vectors) is maximum, thereby guaranteeing the best generalization given the available data. The values of the parameters ⁱ of such a classifier are obtained by solving a quadratic optimization problem under linear inequality constraints. The support vectors are the only examples whose ⁱ are nonzero.

If the examples are not linearly separable, one resorts to nonlinear SVMs, whereby the separation surface is of the form

0 1

( )  ( ) 





_i^M ⁱ ⁱ ⁱ 

f x y K x x (2)

where K(x, y) is a kernel function that must be such that the (M, M) matrix of general term K(xi, xj) is positive semi- definite. As for linear SVMs, the i are obtained by solving a quadratic optimization problem under constraints. If the constraints can be satisfied only if a large proportion of examples are support vectors, i.e. if the classifier has a large number of nonzero parameters, the constraint that all examples are classified without error and lie outside the margin can be relaxed; that “soft-margin” approach reduces the complexity of the classifier by performing a tradeoff between accuracy of classification of the training examples and ability to generalize; the price to pay is the introduction of a

“regularization” constant whose value must be chosen appropriately.

There exists a repertoire of valid kernel functions, among which the RBF kernel

(3) with appropriate width , is used in the present study. The values of  and the regularization constant are chosen by cross- validation

To summarize, a GSM environment described by the fingerprint x is assigned to room A or room B according to the sign of f(x), defined by (1) or (2) depending for linear or nonlinear SVM classification respectively. xi is the fingerprint dataset entry i, i.e. row i of RSS, GSM900 or GSM1800 depending on the fingerprint used by the classifier.

The SVMs used in our study, both with linear and RBF kernels, were implemented using the Spider toolbox [16].

(4)

In order to obtain a “baseline” result, nearest neighbor (1- NN) and k-nearest neighbor (k-NN) classifiers using the Euclidean distance in RSS-space were also implemented. The hyper parameter k was determined by the same cross-validation procedure as for the hyper parameters of SVMs.

B. Decision Rules for Multiclass Discrimination

When the discrimination problem involves more than two classes, it is necessary, for pairwise classifiers such as SVM, to define a method that allows combining multiple pairwise classifiers into a single multiclass classifier. This can be done in two ways: one-vs-one and one-vs-all.

1) One-vs-one

This approach decomposes the multiclass problem into the set of all possible one-vs-one problems. Thus, for an n-class problem, classifiers must be designed. Figure 2 illustrates the architecture associated with this method.

The decision rule in this case is based on a vote. First, the outputs of all classifiers are calculated. Now let Cij be the output of the classifier specializing in separating class i from class j. If Cij is 1, the tally for class i is increased by 1; if it is -1, the class tally of class j is increased by 1. Finally, the class assigned to the example is that having the highest vote tally.

A disadvantage of the one-vs-one technique is of course the increase in the number of classifiers required as compared to one-vs-all discussed below. In our case of seven classes, 21 classifiers are required, which still remains manageable.

C₁vs C₂ C₁vs C₃ C₁vs C₄ C_N-1vs C_N

Decision Rule

Predicted Class

RSS of the example to be localized

Figure 2. One-vs-one classification

2) One-vs-all

The one-vs-all approach consists of dividing the n-class problem into an ensemble of n pairwise classification problems, each of which is specialized in separating one class from all others. Figure 3 illustrates the procedure. In the first stage, each of the n classifiers is trained separately, and in the second stage, the following decision rule is applied : the outputs of all n classifiers are first calculated and, following the conventional procedure, the predicted class is taken to be that of the classifier with the largest magnitude of f(x) (relation (1) or (2))

[17]. The one-vs-all technique is advantageous from a computational standpoint, in that it only requires a number of classifiers equal to the number of classes, in our case, 7.

C1 vs all C2 vs all C3 vs all CN vs all

Decision Rule

Predicted Class

RSS of the example to be localized

Figure 3. One-vs-all classification

IV. RESULTS

The performance of each classifier is presented as the percentage of correctly classified test examples. In our dataset, in each room, the first 3000 examples are used off-line for training the classifiers (quadratic optimization under constraints) and finding the appropriate values of the hyper parameters by cross-validation. The final 2500 of the 5500 scans make up the test set. Testing involves the computation of the sign of f(x) from relations (1) or (2), which is very fast.

Experimental results are shown in Table I. The soft margin parameter C and RBF kernel parameter  are selected through cross-validation, giving C = 10^-4 in linear one-vs-one and linear one-vs-all classifiers, C = 10^-4 and  = 100 in RBF one-vs-one and one-vs-all classifiers. Results for 1-NN and k-NN classifiers are also given for comparison, where the parameter k was optimized by cross validation.

TABLE I. PERCENTAGE OF CORRECT CLASSIFICATION ON TEST SET

Classifier Fingerprint Type

GSM900 GSM1800 Both Bands

1-NN 56.2% 54.4% 62.7%

k-NN 62.4%( k=77) 61.2%( k=21) 67.9%(k=14)

Linear 1-vs-1 86.3% 83.2% 93.9%

Linear 1-vs-rest 87.1% 85.0% 94.2%

RBF 1-vs-1 85.5% 84.6% 93.0%

RBF 1-vs-rest 87.2% 85.7% 94.1%

As can be seen in the table, the SVM room classifiers give correct results about 94% of the time with no significant difference between linear and nonlinear kernels. As expected, results from nearest-neighbor classifiers are significantly

(5)

poorer. We also note that the GSM900 (174 carriers) and GSM1800 (374 carriers) bands are complementary in that better localization accuracy is obtained when both bands are present in the fingerprint. In Figure 4 we also show how the accuracy improves with fingerprint size, for the linear one-vs- all algorithm, by increasing the number of carriers in steps of 50, according to the ordered sequence numbers of the GSM carriers.

Figure 4. Classification results as a function of fingerprint size

In figure 5 we examine the effect of increasing the number of training examples for each of the 7 rooms, again using the linear one-vs-one algorithm. The figure plots the percentage of correct room classifications as a function of the training set size.

We see that a rather substantial reduction in training set size gives only a very moderate degradation in performance. For example, 93% of the test examples were correctly classified using only 1000 training examples. This is a very interesting result as far as acquisition time is concerned, as the TEMS Pocket requires less than 10 minutes to record 1000 training examples.

Figure 5. Classification results as a function of training set size

Table II presents the confusion matrix for the case of the linear one-vs-one algorithm, showing how the mis-classified

examples are distributed. It can be seen that most confusions occur between adjacent rooms, as could be expected. Rooms located on opposite sides of the corridor are easily discriminated.

TABLE II. CONFUSION MATRIX FOR 7ROOMS CLASSIFICATION Predicted

Class

True Class

1 2 3 4 5 6 7

1 91.1% 4.7% 4.2% 0% 0% 0% 0%

2 3.9% 94.9% 1.2% 0% 0% 0% 0%

3 2.6% 1.7% 95.7% 0% 0% 0% 0%

4 0% 0% 0% 96% 1.1% 0% 2.9%

5 0% 0% 0% 0.8% 97.1% 0.5% 1.6%

6 0% 0% 0% 0% 0% 99.9% 0.1%

7 0% 0.6% 0% 4.1% 4.3% 3% 88%

V. CONCLUSIONS AND PERSPECTIVES

We have presented an approach for indoor localization based on the use of RSS with very large numbers of GSM carriers, which has been tested on a dataset acquired in a laboratory building under realistic conditions. Data was collected in such a way as to explore the entire surface area of a room, using a standard cellular handset as the acquisition device. Experimental results demonstrate that out of a total of 17500 (2500*7) test fingerprints, the correct room label was obtained 94% of the time, thus indicating that the method can indeed serve as the basis for a simple, inexpensive indoor localization system.

Future tests will involve studying how performance evolves over longer periods of time, as well as experimenting with different methods for combining the scans from the two GSM bands used, and investigating whether W-CDMA network data or other types of variables can also be incorporated into our scans. Also, the experiments reported here were performed during a weekend, so that the presence of people in the environment will need to be investigated. We note that only one TEMS pocket device was used in these tests; in the future, will want to perform tests with several terminals, to verify device independence of our results. Finally we intend to try integrating a priori information and notions of physical trajectories into our location estimation algorithms, via particle or other types of filters. Such an approach will allow considering the entire indoor environment, not only rooms.

ACKNOWLEDGMENT

The authors wish to acknowledge the support of the China Scholarship Council.

REFERENCES

[1] A. Küpper, Location-Based Services: Fundamentals and Operation, John Wiley & Sons, New York, NY, USA, 2005.

[2] AM. Ladd, KE. Bekris, A. Rudys, LE. Kavraki, DS. Wallach, “On the feasibility of using wireless ethernet for indoor localization,” IEEE

(6)

Trans on Robotics and Automation. vol. 20, no. 3, pp.555-559, June 2004.

[3] Q. Yang, S. Jialin Pan, V. Wenchen Zheng, “Estimating location using Wi-Fi,” IEEE Intelligent Systems, vol. 23, no. 1, pp.8-13, January/February 2008.

[4] S-H. Fang, T-N. Lin, P-C. Lin, “Location fingerprinting in a decorrelated space,” IEEE Trans on Knowledge Data Engineering. vol.

20, no. 5, pp.685-691, May 2008.

[5] S-H. Fang, T-N. Lin, “Indoor location system based on discriminant- adaptive neural network in IEEE 802.11 environments,” IEEE Trans on Neural Network, vol. 19, no. 11, pp.1973-1978, Nov 2008.

[6] S-H. Hong, B-K. Kim, D-S Eom, “Localization algorithm in wireless sensor networks with network mobility,” IEEE Trans on Consumer Eectronic. vol. 55, no. 4, pp.1921-1928, Nov 2009.

[7] S-P. Kuo, Y-C. Tseng, “A scrambling method for fingerprint positioning based on temporal diversity and spatial dependency,”. IEEE Trans Knowledge Data Eng. vol. 20, no. 5, pp.678-684, May 2008

[8] V. Otsason, A. Varshavsky, A. LaMarca, E. de Lara, “Accurate GSM indoor localization,” in Proceedings of the 7th International Conference on Ubiquitous Computing, pp.141-158, Tokyo, Japan, September 2005.

[9] W. ur Rehman, E. de Lara, S. Saroiu, “CILoS, a CDMA indoor localization system,” in Proceedings of the 10th International Conference on Ubiquitous Computing, pp.21-24, Seoul, South Korea, September 2008.

[10] B. Denby, Y. Oussar, I. Ahriz, G. Dreyfus, “High-performance indoor localization with full-band GSM fingerprints,” in Proceedings IEEE International Conference on Communications, Workshop on Synergies in Communication and Localization, pp. 1-5, Dresden, Germany, June 2009

[11] I. Ahriz, Y. Oussar, B. Denby, G. Dreyfus, “Carrier relevance study for indoor localization using GSM,” in Proceedings of the 7th Workshop on Positioning, Navigation and Communication, pp.11-12, Dresden, Germany, March 2010.

[12] I. Ahriz, Y. Oussar, B. Denby, G. Dreyfus, “Full-band GSM fingerprints for indoor localization using a machine learning approach,” International Journal of Navigation and Observation. vol. 2010

[13] Y. Oussar, I. Ahriz, B. Denby, G. Dreyfus, “Indoor localization based on cellular telephony RSSI fingerprints containing very large numbers of carriers,” EURASIP Journal on Wireless Communications and Networking, vol. 2011, no. 1, pp.1-14.

[14] Test Mobile System. [Online]: www.ascom.com/tems

[15] N. Cristianini, J. Shawe-Taylor. Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000 [16] The Spider. [Online]: http://people.kyb.tuebingen.mpg.de/spider/

[17] Y. Lee, Y. Lin, and G. Wahba, “Multicategory support vector machines:

theory and application to the classification of microarray data and satellite radiance data,” Journal of the American Statistical Association, vol. 99, no. 465, pp. 67–81, 2004.