Locating in crowdsourcing-based dataSpace: wireless indoor localization without special devices

(1)

HAL Id: hal-02284568

https://hal.archives-ouvertes.fr/hal-02284568

Submitted on 12 Sep 2019

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Locating in crowdsourcing-based dataSpace: wireless

indoor localization without special devices

Yuanfang Chen, Lei Shu, Antonio Ortiz, Noel Crespi, Lin Lv

To cite this version:

Yuanfang Chen, Lei Shu, Antonio Ortiz, Noel Crespi, Lin Lv. Locating in crowdsourcing-based

dataSpace: wireless indoor localization without special devices. Mobile Networks and Applications,

Springer Verlag, 2014, 19 (4), pp.534-542. �10.1007/s11036-014-0517-8�. �hal-02284568�

(2)

(will be inserted by the editor)

Locating in Crowdsourcing-based DataSpace: Wireless Indoor

Localization without Special Devices

Yuanfang Chen · Lei Shu · Antonio M. Ortiz · Noel Crespi · Lin Lv

Received: date / Accepted: date

Abstract Locating a target in an indoor social envi-ronment with a Mobile Network is important and dif-ficult for location-based applications and services such as targeted advertisements, geosocial networking and emergency services. A number of radio-based solutions have been proposed. However, these solutions, more or less, require a special infrastructure or extensive pre-training of a site survey. Since people habitually carry their mobile devices with them, the opportunity using a large amount of crowd-sourced data on human behavior to design an indoor localization system is rapidly ad-vancing. In this study, we first confirm the existence of crowd behavior and the fact that it can be recognized using location-based wireless mobility information. On this basis, we design “Locating in Crowdsourcing-based DataSpace” (LiCS) algorithm, which is based on sens-ing and analyzsens-ing wireless information. The process of LiCS is crowdsourcing-based. We implement the proto-type system of LiCS. Experimental results show that LiCS achieves comparable location accuracy to previ-ous approaches even without any special hardware.

Yuanfang Chen, Antonio M. Ortiz, Noel Crespi Institut Mines-Télécom, Télécom SudParis, France E-mail: yuanfang.chen@etu.upmc.fr,

antonio.ortiz torres@telecom-sudparis.eu, noel.crespi@mines-telecom.fr

Lei Shu

Guangdong University of Petrochemical Technology, China E-mail: lei.shu@lab.gdupt.edu.cn

Lv Lin

Dalian University of Technology, China E-mail: lvlin george@mail.dlut.edu.cn

1 Introduction

Mobile indoor localization and navigation have become very popular in recent years [8, 18]. Some mobile in-door localization systems have been successfully used in a great number of applications such as the location detection of products stored in a warehouse, the loca-tion detecloca-tion of medical personnel or equipment in a hospital, the location detection of firemen in a building on fire, the location detection of police dogs trained to find explosives in a building, and finding tagged main-tenance tools and equipment scattered all over a plant. However, a challenging issue remains for mobile indoor localization. A number of existing approaches require infrastructures (e.g., indoor beacons) to achieve reli-able accuracy or extensive pre-training before system deployment (e.g., WiFi signal fingerprinting).

With the dramatic increase of the size and frequency of mass events and the potential functionality of mobile devices, the study of crowd dynamics based on Mobile Networks has become an important research area [9, 12]. Firstly, as a base of crowdsourcing-based technolo-gy, the crowd behavior of human is worthy to be deeply studied, whether in the virtual world or in the real world. By analyzing and studying the crowd behavior, we can extract some useful conclusions about how hu-mans behave when they are in large groups. For the crowdsourcing-based technologies such as Quora, Ya-hoo Answers and Google Answers, using the extracted conclusions at group and community levels [22] (a group or a community can be considered as a crowd) will be helpful to develop and improve this type of technolo-gy. Furthermore, predicting the formation of a crowd is helpful in some emergency situations, e.g., evacuation route control; and even the prediction is also beneficial for studying and improving the performance of

(3)

pub-2 Yuanfang Chen et al.

lic infrastructures, e.g., network usage during a mass event. Secondly, with the increasing ubiquity of sensing the real world context with smart mobile devices (e.g., the physical location of a mobile device) [10, 17], we investigate the rising opportunities for mobile crowd-sourcing to obtain real-time physical data. Based on the data, a bridge between the virtual world and the real world can be built. It also means that the crowd-sourcing is extended from the virtual world to the real world to help solve many real-world problems. Hence, the crowdsourcing technology can be used for the loca-tion estimaloca-tion problem in the physical world, and can solve the challenging problem of mobile indoor localiza-tion, to achieve “special-infrastructure-free”1 and even “pre-training-free”.

In this article, we first focus on recognizing human crowd behavior by analyzing the data measured by inter-net-accessible mobile phones from a locatiaware on-line social network. By crowd behavior recognition, we understand that the movement of a large number of individuals has a pattern and can be attributed, de-pending on relevant parameters such as the friendship between individuals2_{and check-in locations (with time)}

of these individuals.

Then, we propose LiCS, an indoor localization algo-rithm that considers trace data from individuals’ mobile devices and a location estimation model. The trace da-ta includes the following information: MAC addresses of devices, MAC addresses of signal transmitters and cor-responding RSSIs (generally, the RSSI (Received Signal Strength Indication) can be used to indicate the value of RSS). With LiCS, mobile devices periodically report their trace data to a Data Analysis Center (DAC). The DAC runs a machine learning algorithm that accepts the wireless-based trace data as features of user mobili-ty patterns, and periodically estimates the locations of mobile devices in real time. Since we use wireless infor-mation obtained from the social environment around us, LiCS can achieve fine-grained localization.

To validate this design, we implement a prototype system to conduct long-term experiments in two re-search laboratories and a corridor of a middle-size aca-demic building covering over 39, 725m2_{. Experimental}

results show that LiCS achieves comparable location accuracy to previous approaches even without a site

1 _{A “special infrastructure” means that the infrastructure}

consists of customized equipment. LiCS is based on Received Signal Strength (RSS) that exists in any wireless equipment, so LiCS can be directly supported by existing wireless infras-tructures around us.

2 _{For a location-aware online social network, if B is in the}

friend list of A, we consider that there is friendship between A and B, and the relationship is directed.

survey. Moreover, LiCS provides a room-level localiza-tion service.

The main contributions of this work are summarized as follows:

1. We design a crowd recognition model. One of the main challenges in crowd behavior recognition is to infer the most likely crowd behavior using the da-ta collected from a set of persons. We use check-in times and locations (T ime and Location id; we con-vert each latitude/longitude coordinate of the earth into a unique Location id) to quantify the trace of each individual. Then, a clustering algorithm3 _is

used to find the likely crowds.

2. We design LiCS, an indoor localization algorithm for social environments based on the target’s trace data. LiCS utilizes automatic selftraining for target trace data without any specific configuration for mobile devices.

3. We implement the prototype system of LiCS on An-droid devices, and perform an extensive set of ex-periments.

The rest of this article is organized as follows. Sec-tion 2 introduces related work. SecSec-tion 3 describes our system model. Section 4 shows the existence of human crowd behavior in our daily lives and how to recognize it from the trace data of individuals. The “Locating in Crowdsourcing-based DataSpace” (LiCS) algorithm is presented in Section 5, while the evaluation of LiCS algorithm and the analysis of results are shown in Sec-tion 6. The article concludes with a brief summary in Section 7.

2 Related Work

In this study, the related work concerns two main as-pects: (i) mobile-device-based indoor localization; (i-i) based technology and crowdsourcing-based indoor localization (crowd-sourced data can be used to improve the performance of indoor localiza-tion).

2.1 Wireless Indoor Localization

In the mobile computing community, a user can carry a sensing device (such as a smart phone) to move random-ly or within a field of fixed sensors [3]. In either case, the

3 _{An Expectation-Maximization (EM) clustering}

algorith-m [7] is used in this article. The EM assigns a probability distribution for each trace record (instance), which indicates the probability of each instance belonging to each of the clus-ters. The EM can automatically decide how many clusters to create.

(4)

knowledge of user’s location can be useful for some ap-plications (called Location-Based Services (LBS) [6]). Outdoor localization is well solved by GPS, but in-door localization remains a challenge in many cases. For the indoor localization issue, a number of algorithms have been proposed in the past two decades. Generally, these algorithms fall into two categories: (i) some work-s are bawork-sed on work-senwork-sorwork-s or beaconwork-s which are inwork-stalled throughout the environment [15][16][2]; so special hard-ware is required to run these algorithms; (ii) in recen-t approaches, recen-the locarecen-tion is derecen-termined by observing RSS. Two techniques are widely used in this kind of algorithms: fingerprinting-based and model-based tech-nique. A large body of indoor localization algorithms adopt fingerprint matching as the basic scheme of lo-cation determination. The main idea of this technique is to fingerprint the surrounding signatures at some lo-cations in the area of interest and then build a finger-print database. The location of a user is then estimated by matching the new measured fingerprint of the us-er against the database. Many kinds of signatures have been exploited by researchers such as WiFi signals [19] (e.g., LiFS algorithm [20]), Radio Frequency (RF) nals [1][21] and Frequency Modulation (FM) radio sig-nals [4]. The considerable manual costs and efforts, in addition to the inflexibility to environmental dynamis-m, are the main drawbacks of fingerprinting-based al-gorithms. The model-based technique calculates loca-tions based on geometrical or statistical models rather than searches for best-fit signatures from pre-built ref-erence databases [14]. For instance, the prevalent Log-Distance Path Loss (LDPL) model [11] builds up a semi-statistical function between RSS values and RF propagation distances. However, these model-based al-gorithms trade the measurement efforts at the cost of decreasing localization accuracy.

2.2 Crowdsourcing-based Technology

Crowdsourcing is one specific form of harvesting wis-dom and contributions from individuals of crowds; based on this harvesting scheme, some applications or ser-vices are developed and the performance of some algo-rithms can be improved: it can be called crowdsourcing-based technology. Some researchers explore crowdsourc-ing based on the use of mobile devices. Moreover, as an important basic service, wireless mobile-device-based indoor localization technologies such as GSM-based, Blu-etooth-based or WiFi-based localization, have been ex-tended by using crowdsourcing data to support and im-prove the performance of location estimation [13] [20]. Current crowdsourcing-based localization approaches de-pend on calibration of the space of interest. Such

cal-ibration tends to be onerous, because it has to be re-peated for every new space and each time there is a significant change in a given space (e.g., a change in the placement of signal transmitter). LiCS is aimed at eliminating the need for such explicit calibration effort and the need for any kind of map concerning the space of interest. LiCS exploits the advantage of model-based technique (versatility and conciseness) and avoids its drawback (accuracy loss for localization) by training a localization model using real-time trace data of individ-uals in crowds from the physical world. Figure 1 shows an example for the motivation of applying LiCS and the system architecture of LiCS.

3 System Model

The prototype system is developed on Android smart-phones and follows mobile-based network-assisted ar-chitecture (Fig. 1(b)). Our system model can be con-sidered as a wireless network where there are N fixed signal transmitters T = {t1, t2, ..., ti, ..., tN} and M

mo-bilizable signal receivers R = {r1, r2, ..., ri, ..., rM}. The

parameter p(t) denotes the estimate of a mobile termi-nal location at time t. And the parameter p0(t) is used to denote the real location of a mobile terminal at time t. Moreover, the parameter d(t) is the measured dis-tance between the estimated location p(t) and the real location p0(t), and we use the “step” to measure the distance between p(t) and p0(t) as the localization error at time t in our evaluation experiments (Section 6).

The problem of crowdsourcing-based location esti-mation can be defined as an identification procedure. The matched fingerprints can be identified from the model-assisted fingerprint database. And the model learn-s the real environment with wirelelearn-slearn-s learn-signallearn-s (RSS) and can be trained in real time with the wireless information that is submitted by mobile terminals.

4 Crowd Behavior Recognition

As the basis of crowdsourcing-based location estima-tion, we need to confirm the existence of crowd behav-ior; and it can be recognized from wireless mobile data of people’s daily lives. In this section we formalize a se-ries of processing steps which can be used to infer crowd behavior from location-based wireless information. We also present a mathematical model for recognizing the crowd behavior of a population.

First, based on the collected data from a location-based mobile social network using its public API [5]4,

4 _{The collected data with anonymous mobile devices from}

(5)

4 Yuanfang Chen et al.

(a) Motivation example. The location-finding abil-ity of indoor localization systems heavily depends on the broadcasts from wireless equipment. For LiCS, any individual can use the trained model to estimate his/her location, and then his/her RSS can be used to train the model again. With the crowd-sourced RSS, the model is trained to make its parameters accordant with the practical situa-tions.

(b) System architecture. As an improve-ment, LiCS exploits the advantage of model-based technique (versatility and conciseness) and avoids its drawback with the training for a location estimation model: crowd-sourced data is used to train the model and the model is installed in distributed servers.

Fig. 1 Motivation example and system architecture.

1071 10972 45914 7201 44126 8724 10650 45913 646 6908 71556933 7170 2908 2897 629 179 174 4 2832 1076 21818 8240 2808 30 10752 1 156 2975 11 2947 0 10986 2894 2850 6208 15542 167 10658 13501 10343 107176782 6677 2953 2822 1386 680 143 7231 1053610767 7705 10436 890710669 10668 7115 2834 146 9569 10612 2844 2821 217 8467 2 26792 1875 1037410350 8869 7798 3173 1077 651 281981458924 13265 2833 7620 19113 10988 2806 7715 15313 7790 10426 2920 620 2912 11746 7602 15377 308010632 10653 159 336 1453 35288 10411 13252 7517 3189 11029 3146 15565 15385 1515 3014 2297 10974 10554 965 878 1156 49566 34 21121 6606 7537 13361 3063 46708 8696 13560 64 2874562 28539 3670 1701 24659 27997 24601 34913 37296 23841 10729 6825 25 366811646112046692 7546 2949 2969 13323 3058 29912 12873 53458 30058 3680 2910 55 382037436 23908 10622 2918 1860 1075 3160 7681 28678 10967 7794 6904 10614 7281 2987 291113 46488 3083 23796 1307 22518 40799 5604 6625 1499 9406 7162 11016 119267103 15815 24597 2892 1446 235 41954 1777 1257494036863 679 13653 6614 698843796 1555811504 9130 36188 2951 2929 1308 279 19389 13548 9971 30705 39405 10714 10671 6600 7376 2924 23838 16451 6842 11519 28059 24604 301 108097843 33361 2818613535 7755 7647 3650 3206 7783 232228750 15820 70841867 7621 21629 232 1074310353 132 15350 15829 20355 15569 56071 4612 3233 2264 13479 12908470 683615844 419 263 1908419731984 11901 7590 7594 6589 6587 259 219 1594529903 14814 399994078435833 1944 9 11003 1880 7847 16447 56344 47768 7837 9592 9594 1985 875 1520 1693 20477 16620 21696 15375 3 17023 17315 400026213 1940 832 3302 11014 17795 13579 11299 99771914 1285 987 627 8902 1697 11663 11528 9416 6916 10959 317 28553109 10559 10685 28451876 23 2 9417 1036215852 28652839 28887 385 6902 154 26 6599 2861 7528 3009 10333 28888 10966 2993 1419 124 42 10354 10980 4 81256604 5404 3313 2880 2818 267 7 21822 135037518 12985 10375 10423 2902 2825 7558 652 8889 0 26494 634 11282 2896 103511935 109638946 10400 6915 3332 241 2954 10380 2893 9847 7527 10990 15727 55514 6980 6638 2905 2176 23544 35129 18848 9593 155577251 16448 2847 253 157 7793 11261 28518 577137619 2134 1771 293 144 13834 22271 11609 11824 10336 9985 9916 7526 6948 15335 35562 26137 26405 37001 11267 24103 1 6709 5628 2206 9918 10801 16450 8248 1859 310 7848 51389 6841 53515 16696 7542 17045 35 7756 37380 25003 15262 21695 6913 16529 5655 1095 662 13419 6956 277 3093 695511836 35128 8249673828776 2873 632 628 623 209 1491413 1142 15 26320 10382 10590 11264 1864 262 14 6249 11816 11260 10875 6588 62 1923 7529 698 (a) 13419 8145 26792 8249 7715 6677 2951 2894 2821 179 10986 26137 8924 7705 10411 2833 167 143 146 15377 13501 10653 7602 6915 15852662 277 5 35833 6614 7231 2920 652 10426 10374 10671 10729 10436 7798 10668 15313 336 35288 629 6208 1326510988 6933 10767 10423 10658 7155 540410400 3332 2908 10343 28501386 3080 2822 1453 7620 262 159 2905 2912 10669 3093 620 8240 6956 10967 7790 156 2806 2819 2897 10963 2954 279 174 1 2825 7170 8467 10336 2844 2902 132 2947 651 2832 15542 6913 6782132520 680 10350 235 10354 1875 1864 2818 1076 6588 241 7755 3173 55 21818 27997 7 0 7528 6904 28888 30 10375 2808 7115 26320 1876 1446 1419 1071646 9 10380 7756 21822 11014 3109 2839 124 26 2873 6863 11504 10362 12985 15385 8902 6587 46708 40002 3146 28059 24601 3058 987 627 8946 10632 1087510351 88898248 14 15350 16620 1940 170231914 1701 36806213 28678 563441582911646 263 11901 11519 15727 21696 1156 7103 13 11003 7681 1059011746 1584429912 7847 11609 9130 10959 2910 2892 16450 10717 1142 22271 24103 15820 40784 28518 35129 2834 8750 12873 3 6842 7619 10974 1152812574 6841 13479 28186 40799 9847 2134 1944 708423222 11663 28539 10714 9416 13579 3014 1307419 219 53458 13653 1911310622 7281 6916 2911 2847 1308 7647 1192633361 2383811836 41954 16448 9406 8470 7542 4 6948 6249 5628 1859 878 62 15557 10536 16529 6738 11824 1923 2206 1095 632 23841 7518 1126145913 45914 16696 9971 7529310 149 15 2949 6692 8869 13548 144 875 6709 9977 6955 169713834198435128 1075 2924 2176 26494 6604 2874 28887 10333 10650 1880 42 23 2896 679 623 11267 1771293 1693 21695 3313 2929 209 10685 1935 9417 3009 2880 38521629 11282 28652993 267 11 4 25 9569 2953 10382 7558 10612 7517 10559 6902 11299 154 13503 10809 634 11016 7201 7527 2845 10972 4412664 7526 11260 10966 10980 1860 34 2861 6599 6980 8125 8724 2975 10752 2855 1777 1413 2 30705 3160 19389 47768 3302 17795 7621 355623171520 9918 56071 24659 2969 13560 46488 37296 25003 3206 7794 11264 9592 13361 7376 24597 2987 14814 15945 6606 3233 1985 2893 7537 1499 299039985 37436 15375 43796 628 2918 10614 15335 17315 232 217 7783 20355 7837 16451 7162 2264 22518 21121 869655514 10801 5138972517546 6600 832 53515 23544 20477 39405 5655 1867 1077 9594 15565 6638 16447 11204 18848 3820 1515 259 35 7590 36188373801 157 110297594 419737843 9593 4612 2297 13323 23796 9916 57713 7793 6836 6589 1285 562 34913 15815 36709403 1290 965 301 152623063 5604 3083 30058 7848 13535 39999 698 11816 6825 23908 6988 10990 15558 6908 15569 3189 24604370018907 10743 6625 3650 1908 253 3668 4956617045 26405 28776 10353 10554 (b)

Fig. 2 (a) Clustering result with F riendship attribute. (b) Clustering result with Location id attribute. For clearness, we only show one-month clustering result using F riendship and Location id attribute as the classes of clustering evalu-ation, respectively. 5 clusters can be found for the two clus-tering processes, and we use different colours to distinguish different clusters. 1 2 3 4 5 0 500 1000 1500 2000 2500 3000 3500 Cluster number Number of users (a) 1 2 3 4 5 0 500 1000 1500 2000 2500 3000 3500 4000 Cluster number Number of users (b)

Fig. 3 Number of users for each cluster. (a) Number of user-s for each cluuser-ster uuser-sing F rienduser-ship attribute auser-s the clauser-suser-s of clustering evaluation. (b) Number of users for each cluster using Location id attribute as the class of clustering evalua-tion.

the relationships between the check-in time, locations,

friend-we confirm whether the crowd behavior of individual-s can be identified. Figure 2 individual-showindividual-s the reindividual-sultindividual-s of data clustering with F riendship and Location id attribute, respectively (these two attributes are evaluation class-es of clustering), using the Expectation-Maximization (EM) algorithm [7]. From Fig. 2, the crowd behavior (we define “crowd” in Definition 1) has been recognized using the processed data (in our study we process the original collected data). Moreover, we can find that dif-ferent evaluation classes have difdif-ferent clustering accu-racy levels, so different attributes have different impacts on crowd behavior. Figure 3 shows the number of user-s for each cluuser-ster correuser-sponding to Fig. 2 (Fig. 2 only shows the users who are “1-hop” friends, but in Fig. 3, the users who are multihop friends are also counted for each cluster). For Fig. 3, more detailed explanations are necessary: why some clusters are composed by t-housands of users when we use “Friendship” attribute as the class of clustering evaluation? Because (i) we use “multihop” friends in the clustering evaluation; that is to say, if A is a friend of B within the range r and B is a friend of C within the range r, A, B and C all will belong to a same crowd; (ii) the error of clustering e-valuation is existent; (iii) we use the ee-valuation dataset which is from one special month, e.g., some data of the dataset is from a major celebration (a great number of users are gathered together), to show the impact of attributes on crowd behavior5.

ship and crowd behavior of users in 772, 966 distinct places. The data consists of 58, 228 nodes (users) and 214, 078 friend edges (friendship is directed between any two nodes).

5 _{Even if the dataset is incomplete, it still can be used to}

show that “the impact of attributes (friendship and check-in locations) is existent on crowd behavior”.

(6)

Definition 1 A crowd can be defined as follows: a group of individuals at a “same” physical location (the range radius r for each user is 10m) and at the “same” time (the time range for a crowd is set to 15 minutes; name-ly, if the check-in time difference between two users is within 15 minutes, we consider that they are “at the same time”).

The characteristics of crowd behavior for each single person can therefore be inferred from his/her check-in records. We refer to this as their “individual behavior”. This shows which individuals participate in a specif-ic crowd. From the EM algorithm, a given record be-longs to each cluster with certain probabilities. More-over, likelihood is a measurement of “how good” a clus-tering process is and it increases in the successive itera-tions of EM algorithm. It is worth mentioning that the higher the likelihood, the better the model fits the data. Second, the clustering process (crowd behavior recog-nition model) is described as follows. We define two pa-rameters: (i) the user u’s check-in data Su _{which is a}

sequence of activity observations for user u; (ii) a set of unknown values θ (i.e., the serial numbers of clusters). These two parameters are used along with a Maximum Likelihood Estimation (MLE): L(θ; Su_{) = p(S}u_{|θ). Our}

purpose is to seek the MLE of marginal likelihood. In other words, we need to find the most probable θ which the user u belongs. The EM algorithm iteratively ap-plies the following two steps to achieve our purpose:

1. Expectation step (E step): calculate the expected value of the log-likelihood function under the cur-rent established clusters (θ(t)_):

Q(θ|θ(t)) = ESu_,θ(t)[log L(θ; Su)];

2. Maximization step (M step): find the appropriate value of parameter θ, which maximizes this quanti-ty:

θmle= arg max

θ Q(θ|θ (t)_).

MLE estimates θ by finding a value of θ that maximizes Q(θ|θ(t)_{), and the estimation result can be flagged as:}

θmle.

Further, based on our recognition model, we add the spatio-temporal pattern of crowd into crowd behavior (clustering). We use a triple q = (θ, pi, ti) to replace θ.

Then the expectation-maximization process becomes: 1. Expectation step: Q(q|q(ti)_{) = E}

Su_,q(ti)[log L(q; Su)],

where pi is the position characteristic of location

measurement mi, and ti is the timestamp of mi.

Moreover, q(ti) _{is a set of current established}

clus-ters with their locations and timestamps;

2. Maximization step: choose q to maximize Q(.), qmle= arg max

q Q(q|q (ti)_).

Finally, in order to preserve the integrity of our model, the recognition accuracy must be measured. In our crowd behavior recognition model, the value of the log-likelihood can be used to measure the accuracy. For instance, using F riendship attribute as the eval-uation class, based on one-month data, the log likeli-hood of crowd identification is: −16.42186, and using Location id attribute as the evaluation class, the log likelihood is: −16.87742. Their accuracy is different, and F riendship attribute is more effective for improv-ing the recognition ability of the model.

5 Locating in Crowdsourcing-based DataSpace As an effective measurement, RSS is easily available from various wireless signals which are from most off-the-shelf wireless equipment such as WiFi- or Bluetooth-compatible devices. And a large number of RSS-based indoor localization algorithms are proposed. However, considering RSS as a database to support indoor local-ization (e.g., RSS fingerprint space), it is time-consuming and labor-intensive. Especially, from extensive experi-ments, we observe that the RSS is vulnerable due to en-vironmental dynamism (an example is shown in Fig. 4(a)). How to avoid these weaknesses to improve the perfor-mance of RSS-based indoor localization? It is worth noting that the trend of RSS change is obvious between different locations (Figure 4(b)).

0 20 40 60 80 −95 −90 −85 −80 −75 −70 −65 −60 −55 −50 Location number

RSS from WiFi Router 3 (dBm

)

(a) An example: the fluctua-tion range of RSS for some lo-cations. For example, at the location 60, the RSS is not a value, it is a range of values. So the fingerprint of a location cannot be denoted by the ab-solute value of RSS. −10 −5 0 5 10 15 −80 −75 −70 −65 −60 −55 −50 Location number RSS from W iFi Router 3 (dBm) (b) Changing trend of RSS. location number = 0 is the location of a signal trans-mitter; as a signal receiver moves away from the trans-mitter, the RSS is decreas-ing.

Fig. 4 Instability of RSS and changing trend between differ-ent locations.

To try to avoid these weaknesses, we present LiCS and the details are shown as follows.

(7)

6 Yuanfang Chen et al.

Input: Signal triples6 from individuals.

First, at time t + 1, the value p(t + 1) can be esti-mated based on “observed locations” p(t), p(t−1), p(t− 2), ..., p(t − k + 1). Moreover, the relationship between the output p(t + 1) and the input p(t), p(t − 1), p(t − 2), ..., p(t − k + 1) has the following mathematical rep-resentation (Eq. 1): p(t + 1) = α0+ q X j=1 αjg(β0j+ p X i=1 βijp(t − i + 1)) + , (1) where αj(j = 0, 1, 2, ..., q) and βij(i = 0, 1, 2, ..., p; j =

1, 2, 3, ..., q) are the connection weights between time series, p is the number of “observed locations”, q is the number of nodes of hidden layer7 _{and is noise of the}

estimation. The logistic function g(x) = _1+e1−x is used

as a hidden-layer transfer function. In this study, an optimal location estimation model is built by training the model with the wireless data collected from the re-al physicre-al space around us (the collected data can be denoted as signal triples). The training steps are shown as follows:

Step 1: Cluster the signal triples. Partition all triples into several clusters, using an Expectation-Maximiza-tion (EM) clustering algorithm. Moreover a cluster center can be obtained for each cluster. Each cluster is given a unique number as its location.

Step 2: Input some selected time-serial signal triples with corresponding locations of clusters into Eq. 1 for learning the optimal configuration of parameter-s, αj, βij and . A location estimation model with

optimal parameter configuration can be obtained. Moreover, the growth of logistic function g(.) sat-isfies: the initial stage of growth is approximately exponential; then, as saturation begins, the growth slows, and at maturity, the growth stops. So if the training time is long enough, the parameter con-figuration of Eq. 1 will gradually converge to the optimal solution.

Then, a target can be located with the optimal lo-cation estimation model. (i) Give the target a start location p(0). Calculate the Euclidean distances be-tween a received new signal triple and all cluster cen-ters (the new signal triple is from the target). If the

6 _{A triple can be denoted as [RSS, M AC}

T, M ACR]. For

the specific RSS of a location, M ACT is the MAC address

of corresponding signal transmitter and M ACR is the MAC

address of corresponding signal receiver.

7 _{In machine learning, using the hidden layer enables}

greater processing power and system flexibility. The nodes of hidden layer are named as hidden nodes. Hidden nodes are the nodes that are neither in the input layer nor the output layer. These nodes are essentially hidden from view, and their number and organization can typically be treated as a black box to people who are interfacing with the system.

shortest Euclidean distance is relative to the cluster k, p(0) = k. (ii) Calculate the location at time t + 1. Us-ing the trained Eq. 1 as “optimal location estimation model”, from the start location p(0), we can obtain time-serial locations. And based on “observed location-s”, [p(t), p(t − 1), ..., p(0)], p(t + 1) can be calculated.

The real-time location of a target is obtained by our algorithm. Moreover, (i) the optimal location estima-tion model is periodically trained by new signal triples; and (ii) if more signal transmitters can be detected by receivers, it will help to distinguish different locations more effectively for achieving higher localization accu-racy.

Output: The real-time location (cluster number) of a target. For some special applications, if the abso-lute coordinates of clusters are available, the absoabso-lute physical location of target will be known.

Note that Eq. 1 is the core of LiCS. From above de-scriptions of the algorithm LiCS, we can find that these parameters will affect the accuracy of LiCS: the con-nection weights between time series, αj and βij. The

parameter αj reflects the importance of hidden-layer

transfer for the location estimation of time t+1. In oth-er words, it denotes the degree of correlation between different time series. The parameter βij reflects the

im-portance of the jth _{node in the hidden layer, when the}

hidden layer transfers the influence of observed loca-tion p(t − i + 1) to p(t + 1). The appropriate values of these parameters for the model of location estimation at time t + 1 will be helpful to improve the accuracy of localization.

The variables that are used in LiCS are summarized in Tab. 1.

6 Evaluation

We develop the prototype system of LiCS on the in-creasingly popular Android OS which supports WiFi and Bluetooth. We conduct long-term experiments in two laboratories (84m2 and 53m2, respectively) and a corridor (Fig. 5) of a middle-size academic building where a number of WiFi routers without location in-formation have been installed. Moreover, in each ex-perimental site, we install three Bluetooth transmitters (laptop-embedded Bluetooth transmitters are used in our experiment, so they are not special devices; the sig-nal of Bluetooth is full-coverage for each experimental site). The experiment lasts one month using 9 volun-teers. We measure WiFi signals and Bluetooth signals.

(8)

Table 1 Variables and explanations

Variables Explanations

[RSS, M ACT, M ACR] A triple consists of three elements, where RSS is Received Signal Strength received by

a mobile terminal, M ACT is the MAC address of corresponding signal transmitter and

M ACRis the MAC address of corresponding signal receiver. The triple is used to train

the location estimation model and structure the fingerprint database. p(t) We use the variable to denote the location of a mobile terminal at time t.

αj The variable is the connection weight between time series, and it reflects the importance

of hidden-layer transfer for the location estimation of time t + 1.

βij The variable is the connection weight between time series, and it reflects the importance

of jth_{node in the hidden layer, when the hidden layer transfers the influence of observed}

location p(t − i + 1) to p(t + 1).

(a) (b)

(c)

Fig. 5 Floor plans of experimental sites. (a) Laboratory covering over 84m2_{. (b) Laboratory covering over 53m}2_{. (c) Corridor}

covering over 302m2_.

6.1 Experimental Setup

Each volunteer carries a mobile phone and can take any activity in any area of experiment. The trace data of each volunteer is recorded every 30 seconds during working hours (from 9:00 a.m. to 10:00 p.m.). Moreover, the trace data from volunteers covers most of the areas of experiment.

In our experiments, for LiCS, we use WiFi and Blue-tooth signals. BlueBlue-tooth is a wireless technology stan-dard for exchanging data over short distances, so Blue-tooth signals attenuate more rapidly with distance com-pared with WiFi signals.

We compare our algorithm with LiFS [20] under the same experimental conditions. LiFS is an RSS-based in-door localization algorithm (using WiFi signals). The

key idea behind LiFS is that human motion can be ap-plied to connect previously independent radio finger-prints under certain semantics. In LiFS, absolute val-ues of RSS are used to establish a fingerprint database. When a user sends a location query with his/her current RSS fingerprint, LiFS retrieves the fingerprint database and returns the matched fingerprints as well as the cor-responding locations.

6.2 Performance Evaluation

In this section, we show the comparative results of LiFS and LiCS. We estimate 248 location queries, and cumu-late all localization errors of these queries (Cumulative

(9)

8 Yuanfang Chen et al. 0 2 4 6 8 10 12 14 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Error (step) Percent of Queries Empirical CDF LiFS LiCS (WiFi) LiCS (Bluetooth) 0 5 10 15 20 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Error (step) Percent of Queries Empirical CDF LiCS (WiFi) LiCS (Bluetooth) LiFS 0 5 10 15 20 25 30 35 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Error (step) Percent of Queries Empirical CDF LiFS LiCS (WiFi) LiCS (Bluetooth) (a) (b) (c)

Fig. 6 CDFs of localization errors for both algorithms in three different experimental sites. (a) CDF of localization errors in the laboratory covering over 84m2_{. (b) CDF of localization errors in the laboratory covering over 53m}2_{. (c) CDF of localization}

errors in the corridor covering over 302m2_.

Distribution Function (CDF)8_{) for both algorithms,}

re-spectively. The results are shown in Fig. 6 and Fig. 7.

0 5 10 15 20 25 30 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Error (step) Percent of Queries Empirical CDF LiCS (Bluetooth) LiFS LiCS (WiFi)

Fig. 7 CDF of localization errors for the average of three experimental sites.

The unit of the estimated error for localization is step ≈ 1.2m. The average localization error of LiCS with Bluetooth signals is 2.6575m, LiCS with WiFi sig-nals is 4.35m, and LiFS is 5.95m. The maximum local-ization error of LiCS with Bluetooth signals is 33.6m, LiCS with WiFi signals is 34.8m and LiFS is 40.8m.

For comparing the overall performance of three ex-perimental sites for both algorithms, we calculate the average results of three experimental sites for LiCS (Blue-tooth), LiCS (WiFi) and LiFS. From Fig. 7, we can find that our LiCS is better than LiFS for the average of three experimental sites. For example, the localiza-tion errors of 95% queries are less than 6m for LiC-S (Bluetooth), 69% queries are less than 6m for LiCLiC-S

8 _{Cumulative Distribution Function describes the}

probabil-ity that a real-valued random variable X with a given proba-bility distribution will be found at a value less than or equal to x. It can be formulated as FX(x) = P (X ≤ x).

(WiFi) and 60% for LiFS. LiCS uses the model training with real-time data, so the localization accuracy for lo-cation queries is improved compared with LiFS. More-over, why the localization accuracy of LiCS (Bluetooth) is higher than LiCS (WiFi)? Based on the attenuation characteristics of Bluetooth signals and WiFi signals, the differences of RSS between different locations for Bluetooth are greater than the differences for WiFi. So using the RSS of Bluetooth to distinguish different lo-cations is more accurate than using the RSS of WiFi. If we can distinguish different locations more clearly, we can obtain higher localization accuracy based on an RSS-based fingerprint database.

Furthermore, from the experimental results of Fig. 6, we can find that: (i) on average, for localization accu-racy, LiCS is better than LiFS, in the three differen-t experimendifferen-tal sidifferen-tes. For example, differen-the localizadifferen-tion er-rors of 80% queries are less than 2.4m for LiCS (Blue-tooth), and 70% for LiFS, in the laboratory covering over 84m2_{; (ii) the localization accuracy of LiCS}

(Blue-tooth) is better than LiCS (WiFi) and LiFS. For ex-ample, the localization errors of 50% queries are under 2.4m for LiCS (Bluetooth), while about 30% for LiCS (WiFi) and about 25% for LiFS, in the laboratory cov-ering over 53m2_{. Bluetooth improves the average}

local-ization error up to 39% compared with LiCS (WiFi), and up to 55% compared with LiFS. Because the signal strength of Bluetooth is changed sharply between lo-cations, which makes the distinction of signal strength between different locations more remarkable (the “re-markable” is conducive to improving the accuracy of localization).

Moreover, LiFS is based on a priori database (some human intervention is necessary in the build phase of a database). LiCS is crowd-sourced, so only wireless in-formation is required, which is received by humans in their daily lives. And the locating process of LiCS is

(10)

automatic and priori-database-free. LiCS is based on WiFi and Bluetooth information which is readily avail-able. The above-mentioned features of LiCS make the rapid deployment of system possible.

7 Conclusion

Crowdsourcing is a distributed problem-solving scheme that has emerged in recent years. It exploits the poten-tial and wisdom of crowds to support various applica-tions and to improve the performance of various algo-rithms in a cost-effective fashion. In our study, we inves-tigate the existence of crowd behavior in the real world and it can be recognized using location-aware wireless mobility information of people’s daily lives. Further-more, we present a model for crowd behavior recog-nition. On this basis, by sensing and collecting WiFi and Bluetooth information from the social surroundings around us, we propose a time-serial location estimation model. Using this model, we design and implement LiC-S, an indoor target localization algorithm based on two aspects, (i) mobile devices carried by individuals, and (ii) a location estimation model which is trained by the collected data from individuals about WiFi and Blue-tooth information. The experimental results show that LiCS achieves competitive location accuracy without any special infrastructure. This work sets up a novel perspective to crowd-sourced indoor localization algo-rithms.

Acknowledgment

Lei Shu’s work is supported by the Guangdong Uni-versity of Petrochemical Technology Internal Project (2012RC0106).

References

1. Bahl, P., Padmanabhan, V.N.: Radar: An in-building rf-based user location and tracking system. In: Proceedings of the 19th Annual International Conference on Com-puter Communications (INFOCOM), pp. 775–784. IEEE (2000)

2. Borriello, G., Liu, A., Offer, T., Palistrant, C., Sharp, R.: Walrus: wireless acoustic location with room-level reso-lution using ultrasound. In: Proceedings of the 3rd In-ternational Conference on Mobile systems, Applications, and Services (MobiSys), pp. 191–203. ACM (2005) 3. Chen, M., Gonzalez, S., Vasilakos, A., Cao, H., Leung,

V.C.: Body area networks: A survey. Mobile Networks and Applications 16(2), 171–193 (2011)

4. Chen, Y., Lymberopoulos, D., Liu, J., Priyantha, B.: Fm-based indoor localization. In: Proceedings of the 10th In-ternational Conference on Mobile Systems, Applications, and Services (MobiSys), pp. 169–182. ACM (2012)

5. Cho, E., Myers, S.A., Leskovec, J.: Friendship and mo-bility: user movement in location-based social networks. In: Proceedings of the 17th International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 1082–1090. ACM (2011)

6. Dey, A., Hightower, J., de Lara, E., Davies, N.: Location-based services. Pervasive Computing 9(1), 11–12 (2010) 7. Do, C.B., Batzoglou, S.: What is the expectation maxi-mization algorithm? Nature Biotechnology 26(8), 897– 899 (2008)

8. Gu, Y., Lo, A., Niemegeers, I.: A survey of indoor po-sitioning systems for wireless personal networks. IEEE Communications Surveys & Tutorials 11(1), 13–32 (2009)

9. Gupta, A., Kalra, A., Boston, D., Borcea, C.: Mobisoc: a middleware for mobile social computing applications. Mobile Networks and Applications 14(1), 35–52 (2009) 10. Gy˝orb´ıró, N., Fábián, Á., Hományi, G.: An activity

recognition system for mobile phones. Mobile Networks and Applications 14(1), 82–91 (2009)

11. Jung, S., Lee, C.o., Han, D.: Wi-fi fingerprint-based ap-proaches following log-distance path loss model for indoor positioning. In: Proceedings of the 1st MTT-S Interna-tional Microwave Workshop Series on Intelligent Radio for Future Personal Terminals, pp. 1–2. IEEE (2011) 12. Kjærgaard, M.B., Wirz, M., Roggen, D., Troster, G.:

Mo-bile sensing of pedestrian flocks in indoor environments using wifi signals. In: IEEE International Conference on Pervasive Computing and Communications (PerCom), p-p. 95–102. IEEE (2012)

13. Rai, A., Chintalapudi, K.K., Padmanabhan, V.N., Sen, R.: Zee: Zero-effort crowdsourcing for indoor localization. In: Proceedings of the 18th Annual International Confer-ence on Mobile Computing and Networking (MobiCom), pp. 293–304. ACM (2012)

14. Roos, T., Myllymaki, P., Tirri, H.: A statistical modeling approach to location estimation. IEEE Transactions on Mobile Computing 1(1), 59–69 (2002)

15. Savvides, A., Han, C.C., Strivastava, M.B.: Dynamic fine-grained localization in ad-hoc networks of sensors. In: Proceedings of the 7th Annual International Conference on Mobile Computing and Networking (MOBICOM), pp. 166–179. ACM (2001)

16. Scott, J., Dragovic, B.: Audio location: Accurate low-cost location sensing. Pervasive Computing 3468, 1–18 (2005) 17. Tang, F., Guo, S., Guo, M., Li, M., Wang, C.l.: Towards context-aware ubiquitous transaction processing: a model and algorithm. In: IEEE International Conference on Communications (ICC), pp. 1–5. IEEE (2011)

18. Varshney, U., Vetter, R.: Mobile commerce: framework, applications and networking support. Mobile Networks and Applications 7(3), 185–198 (2002)

19. Wu, K., Xiao, J., Yi, Y., Gao, M., Ni, L.M.: Fila: fine-grained indoor localization. In: Proceedings of the 31st Annual International Conference on Computer Commu-nications (INFOCOM), pp. 2210–2218. IEEE (2012) 20. Yang, Z., Wu, C., Liu, Y.: Locating in fingerprint space:

wireless indoor localization with little human interven-tion. In: Proceedings of the 18th Annual International Conference on Mobile Computing and Networking (Mo-biCom), pp. 269–280. ACM (2012)

21. Youssef, M., Agrawala, A.: The horus wlan location de-termination system. In: Proceedings of the 3rd Interna-tional Conference on Mobile Systems, Applications, and Services (MobiSys), pp. 205–218. ACM (2005)

22. Zhang, D., Guo, B., Yu, Z.: The emergence of social and community intelligence. Computer 44(7), 21–28 (2011)