Reality mining: digging the impact of friendship and location on crowd behavior

(1)

HAL Id: hal-01285971

https://hal.archives-ouvertes.fr/hal-01285971

Submitted on 10 Mar 2016

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Reality mining: digging the impact of friendship and

location on crowd behavior

Yuanfang Chen, Antonio M. Ortiz, Noel Crespi, Lei Shu, Lin Lv

To cite this version:

Yuanfang Chen, Antonio M. Ortiz, Noel Crespi, Lei Shu, Lin Lv. Reality mining: digging the impact

of friendship and location on crowd behavior. MOBIQUITOUS 2013 : 10th International Conference

on Mobile and Ubiquitous Systems: Computing, Networking and Services, Dec 2013, Tokyo, Japan.

pp.142 - 154, �10.1007/978-3-319-11569-6_12�. �hal-01285971�

(2)

Location on Crowd Behavior

Yuanfang Chen1, Antonio M. Ortiz1, Noel Crespi1, Lei Shu2, and Lin Lv3

1 _{Institut Mines-Télécom, Télécom SudParis, France}

2 _{Guangdong University of Petrochemical Technology, China}

3 _{School of Software, Dalian University of Technology, China}

{yuanfang.chen,antonio.ortiz_torres,noel.crespi}@telecom-sudparis.eu,lei. shu@lab.gdupt.edu.cn,lvlin1023@gmail.com

Abstract. Crowd behavior is a natural instinct of human, which directly impacts how we form opinions and make decisions. It is a subject that deserves to be stud-ied since it is common that people change their behavior when being in a group. In pervasive computing research, plenty of work has been directed towards dis-covering human movement patterns based on wireless networks, mainly focusing on movements of individuals. It is surprising that social interactions among in-dividuals in a crowd is largely neglected. Mobile phones offer on-body tracking and they are already deployed on a large scale, allowing the characterization of user behavior through large amounts of wireless information collected by mo-bile phones. In this paper, we observe and analyze the impact of friendship and location attributes on crowd behavior, using location-based wireless mobility in-formation. This is a cornerstone for predicting crowd behavior, which can be used in a large number of applications such as traffic management, crowd safety, and infrastructure deployment.

Key words: Crowd behavior, Mobile phones, Wearable computing, Complex so-cial networks

1 Introduction

With the increasing size and frequency of mass events, such as traffic congestion on a highway, swarming at a tourist attraction, and clogging at weekend shopping sale, the study of crowd dynamics has become an important research area [28, 7]. Howev-er, even successful modeling approaches such as those inspired by Newtonian force models are still not fully consistent with empirical observations and are sometimes d-ifficult to be adapted for crowd prediction. With the prevalence of smart devices (such as Smart Cellphone, Tablet PC, etc.), on-body sensing, computing and communication have become widespread [16] (carry-on smart devices can be called “social sensors”). These developments have made it possible to obtain real-time and comprehensive em-pirical data required by crowd behavior analyses [24]. This “reality mining” [18] is deemed adequate to provide objective measurements of human interactions, and it can be called “honest signals” [25]. These options open up new pathways in Computational Social Science [17], where large amounts of information obtained from wireless

(3)

mo-bile devices can lead to new perspectives for the analysis of crowd behavior and social dynamics.

In order to develop reliable prediction models for traffic management, urban infras-tructure deployment, or crowd safety, it is necessary to understand what laws determine the formation of a crowd. While a lot of studies know the “physics” of crowd behavior, it is surprising that social interactions among individuals in a crowd has been large-ly neglected. Indeed, the great majority of existing studies investigate a crowd as a collection of isolated individuals, and each individual has its own motion speed and direction [20, 1]. However, it turns out that in practice, the majority of individuals do not take actions alone, but in groups with social relationships [26, 10].

In this paper, we first focus on recognizing human crowd behavior by analyzing the data measured by internet-accessible mobile phones from a location-aware online social network. By crowd behavior recognition, we understand that the movement of a large number of individuals has a pattern and can be attributed, depending on relevant param-eters such as the friendship between individuals and check-in locations (with time) of these individuals. Based on the recognition, we can realize that it is possible to predict the formation of a crowd from wireless information related to individuals. For instance, the formation of downtown pedestrian flows is related to the interactions between indi-viduals, and these flows can be distinguished by collecting the wireless information of pedestrians.

Second, we investigate how “friendship” and “location” impact human crowd be-havior. Nathan Eagle et al. have obtained an important result: “Data collected from mobile phones has the potential to provide insight into the relational dynamics of indi-viduals. Furthermore, it is possible to accurately infer 95% of friendship relations only considering the observational wireless data” [9]. This implies that there is a relationship between the friendship and the behavioral patterns of humans. Based on this result, we observe the patterns of friendship in different crowds.

The contributions of this paper are listed as follows.

– We design a crowd recognition model. One of the main challenges in crowd behavior recognition is to infer the most likely crowd behavior using the data collected from a set of persons. We use check-in time and location (T ime and Location id. We convert each latitude/longitude coordinate of the earth into a unique Location id) to quantify the track of each individual. Then, a clustering algorithm1 is used to find the likely crowds.

– We investigate how friendship influences crowd behavior. In order to measure this influence, we use the friendship degree of each user and the probability distribution of various degrees.

– We investigate how individuals’ locations influence crowd behavior. For measuring the influence, we investigate these relationships for different clusters: (i) users and their locations; (ii) check-in time and users’ locations.

1_{An Expectation-Maximization (EM) clustering algorithm is used in this paper. The EM assigns} a probability distribution for each track record (instance), which indicates the probability of each instance belonging to each of the clusters. The EM can automatically decide how many clusters to create.

(4)

The paper is structured as follows. Section 2 introduces the related work. Section 3 presents the data used in our study and statistic analysis. Section 4 shows how to rec-ognize human crowd behavior from datasets. Section 5 reveals the impact of attributes (check-in time, check-in location and friendship) on crowd behavior. Finally, some con-clusions are given in Section 6.

2 Related Work

Recently, a number of scientific communities, from computer science to physics, have been working in human dynamics. Pedestrian movement patterns have been studied using wireless-based personal location data [23]. In physics, for human crowd behav-ior analysis, many approaches have been proposed inspired by using fluid dynamic-s [13], dynamic-swarmdynamic-s [3] and cellular automata [2]. Car-badynamic-sed human movement patterndynamic-s have also been studied by utilizing the data from GPS-equipped vehicles [27]. Thereinto, the approaches which are based on wireless mobility information are more reliable, objective and environment-independent compared with model-based approaches, e.g., habit-based model [15]. Model-based approaches are environment-sensitive. Moreover, because the factors of environment influence each other, a simple model is not su ffi-cient to reflect the interactions between these factors. Furthermore, the performance of a model is related to the experience of modeler.

As an important aspect of human dynamics, “crowd dynamics” is worthy to be deeply analyzed, since it helps to extract some useful conclusions about how human-s behave when they are in large grouphuman-s. Further, predicting the formation of a crowd is helpful in some emergency situations, e.g., evacuation route control; and even the prediction is also beneficial for studying and improving the performance of public in-frastructures, e.g., network usage during a mass event. However, the prediction of crowd events is still a challenge, even if a lot of new technologies can be used, e.g., GPS-based human trace tracking technology. Moreover, GSM, bluetooth or WiFi localization tech-nologies have been explored to be used to collect sufficient data for analyzing crowd behavior [6].

The prediction of human behavior is the main topic of a number of publication-s. The method proposed in [14] estimates an object’s future locations, by considering the patterns of recent movements. They present the concept of a Trajectory pattern (T-pattern), a special association rule with a timestamp that is able to define a sequence of locations with a certain probability. They also propose the Trajectory Pattern Tree (TP-T), a data access method that indexes trajectory patterns to efficiently answer predictive queries, and finally, they detail a Hybrid Prediction Algorithm (HPA) that provides ac-curate prediction for both near and distant time queries. T-patterns are also used in [19], where WhereNext is presented, which is a technique to predict the next location of a moving object. It uses an evaluation function that efficiently creates TPTs by consid-ering the previous movements of all moving objects in a certain area. The model pre-sented in [21] is based on behavioral heuristics that predict individual trajectories and collective motion patterns such as the spontaneous formation of unidirectional lanes or stop-and-go waves. These heuristics consider visual information to describe the motion of pedestrians.

(5)

Most of the previous research in crowd dynamics has ignored the internal connec-tion between social relaconnec-tionships and crowd formaconnec-tion. While discrete observaconnec-tions of an individual’s idiosyncratic behavior seem to be merely random, and most studies of crowd behavior consider only interactions among isolated individuals, the results pre-sented in [22] show that up to 70% of people in a crowd are actually moving in group-s group-such agroup-s friendgroup-s, couplegroup-s, or familiegroup-s, concluding that the group-social relationgroup-shipgroup-s affect crowd forming. In addition, group sizes are commonly distributed according to a pois-son distribution [12]. Thus, social ties between individuals impact the forming of crowd. In this work, we consider friendship and location relationships between individuals as an important parameter for crowd recognition.

3 Data Description

The dataset used for analyzing crowd behavior consists of anonymous check-in data from mobile devices collected by a location-based social networking service provider where users share their locations by check-in, and the friendship between these mobile users is collected using their public API [5]. This aggregated and anonymous mobile device information is used to correlate, model, evaluate and analyze the relationships between the check-in time, locations, friendship and crowd behavior of users in 772, 966 distinct places. The dataset consists of 58, 228 nodes (users) and 214, 078 friend edges (friendship is directed between any two nodes).

Check-in behavior of users. Based on users’ check-in behavior, we can obtain users’ locations, and our recognition and observation are check-in-location-aware. Moreover, through check-in frequency, some special places can be inferred. For ex-ample, the home location of a user can be defined as the location which has maximum average number of check-ins for a period of time. Manual inspection shows that this infers home locations with 85% accuracy [5]. We can deduce that there is a relation-ship between users’ social relations (e.g., the kinrelation-ship of users at home) and check-in locations with check-in frequency.

Friendship and mobility. Our study aims at understanding how the location of user A’s friend B affects the movement of A. Intuitively we are more likely to move to a place in which we have friends (crowd behavior is a natural instinct of human). To quantify this effect we proceed as follows. User A “visits” the location of friend B, if A checks in within radius r of B’s location, and we aim at computing p(d), which measures the probability that A “visits” a friend (near the friend) within the range, r= d (we set r = 10m). In the dataset, using the “location id”, we can draw the “ranges” corresponding to different friends and obtain relevant probabilities to which range A belongs. Moreover, In Fig. 1, we show, for each user, the number of records and to which cluster belongs based on January, 2010 dataset (Fig. 1(a)) and February, 2010 dataset (Fig. 1(b)).

The original dataset provided by the service provider can not be directly used to mine the basic laws that govern human crowd behavior, or even the impact of these laws, so we apply a process to perform an association of the mobility check-in data with the friendship information. The process involves two steps:

(6)

0 1 2 3 4 5 6 x 104 0 50 100 150 200 250 300 UserID Number of records

(a) Result of January, 2010 dataset. 0 1 2 3 4 5 6 x 104 0 50 100 150 200 250 300 350 400 450 500 UserID Number of records (b) Result of February, 2010 dataset.

Fig. 1. Number of records and which cluster belongs for each user in the January and February, 2010 datasets (different colours represent different clusters).

1. Perform a spatial-temporal analysis of the records to detect which users form a crowd (e.g., places in which some people have stopped for a sufficiently long time); 2. Infer in every crowd which users are friends and what kind of friendship they have

(1, 2, 3, ..., N-hop friends, e.g., “1-hop” is for direct friends).

In order to infer “where and when crowds are forming?” from a large number of records, we first characterize the individual activity by mathematical modeling. Each location measurement mi, collected for every mobile device, is characterized by a

po-sition pi expressed in latitude and longitude, and a timestamp ti. We also measure the

interval time between different check-in activities [11]. The average interval time mea-sured for the whole population is 30 minutes. So within the interval we can detect a gathering of humans from the dataset. Moreover, this time interval is comparable to the average length of real social events.

To confirm the friendship in a crowd, we merge the friendship dataset with the check-in record dataset, e.g., for a crowd, if there is a record of any two users in the friendship dataset, the two users are 1-hop friends, and then we add this information into the check-in record dataset as a value of a new attribute column (friendship). Multihop friendship will be also recognized, and we record the identity number of minimum-hop-count friend for every user as the value of the friendship attribute column.

4 Crowd Behavior Recognition

We formalize a series of processing steps which can be used to infer crowd behav-ior from location-based wireless information. In this section we build a mathematical model for recognizing the crowd behavior of population.

First, based on the processed dataset, we confirm whether the crowd behavior of in-dividuals can be identified. Fig. 2 shows the results of data clustering with Friendship and Location id attributes, respectively (these two attributes are evaluation classes for clustering), using Expectation-Maximization (EM) algorithm [8]. From Fig. 2, the

(7)

crowd behavior has been recognized using the processed dataset (in our study a crowd is defined in Definition 1). Moreover, we can find that different evaluation classes have different clustering accuracy levels, so the impacts of different attributes on the crowd behavior are different. Fig. 3 shows the number of users for each cluster corresponding to Fig. 2 (Fig. 2 only shows the users who are “1-hop” friends, but in Fig. 3, the users who have multihop friendship are also counted for each cluster).

Definition 1 (A Crowd). A group of individuals at the same physical location (the range radius r= 10m) at the same time.

1071 10972 45914 7201 44126 8724 10650 45913 646 6908 71556933 7170 2908 2897 629 179 174 4 2832 1076 21818 8240 2808 30 10752 1 156 2975 11 2947 0 10986 2894 2850 6208 15542 167 10658 13501 10343 107176782 6677 2953 2822 1386 680 143 7231 1053610767 7705 10436 890710669 10668 7115 2834 146 9569 10612 2844 2821 217 8467 2 26792 1875 1037410350 8869 7798 3173 1077 651 2819 81458924 13265 28337620 19113 10988 2806 7715 15313 7790 10426 2920 620 2912 11746 7602 15377 308010632 10653 159 336 1453 35288 10411 13252 7517 3189 11029 3146 15565 15385 1515 3014 2297 10974 10554 965 878 1156 49566 34 21121 6606 7537 13361 3063 46708 8696 13560 64 2874562 28539 3670 1701 24659 27997 24601 34913 37296 23841 10729 6825 25 366811646112046692 7546 2949 2969 13323 3058 29912 12873 53458 30058 3680 2910 55 382037436 23908 10622 2918 1860 1075 3160 7681 28678 10967 7794 6904 10614 7281 2987 291113 46488 3083 23796 1307 22518 40799 5604 6625 1499 9406 7162 11016 119267103 15815 24597 2892 1446 235 41954 1777 1257468639403 679 13653 6614 6988 43796 1555811504 9130 36188 2951 2929 1308 279 19389 13548 9971 30705 39405 10714 10671 6600 7376 2924 23838 16451 6842 11519 28059 24604 301 10809 7843 33361 2818613535 7755 7647 3650 3206 7783 232228750 15820 70841867 7621 21629 232 1074310353 132 15350 15829 20355 15569 56071 4612 3233 2264 13479 12908470 683615844 419 263 1908419731984 11901 75907594 6589 6587 259 219 1594529903 14814 399994078435833 1944 9 11003 1880 7847 16447 56344 47768 7837 9592 9594 1985 875 1520 1693 20477 16620 21696 15375 3 17023 17315 400026213 1940 832 3302 11014 17795 13579 11299 9977 1914 1285 987 627 8902 1697 11663 11528 9416 6916 10959 317 28553109 10559 10685 28451876 23 2 9417 1036215852 28652839 28887 385 6902 154 26 6599 2861 7528 3009 10333 28888 10966 2993 1419 124 42 10354 10980 4 81256604 5404 3313 2880 2818 267 7 21822135037518 12985 10375 10423 2902 2825 7558 652 8889 0 26494 634 11282 2896 103511935 109638946 10400 6915 3332 241 2954 10380 2893 9847 7527 10990 15727 55514 6980 6638 2905 2176 23544 35129 18848 9593 155577251 16448 2847 253 157 779311261 28518 577137619 2134 1771 293 144 13834 22271 11609 11824 10336 9985 9916 7526 6948 15335 35562 26137 26405 37001 11267 24103 1 6709 5628 2206 9918 10801 16450 8248 1859 310 7848 51389 6841 53515 16696 7542 17045 35 7756 37380 25003 15262 21695 6913 16529 5655 1095 662 13419 6956 277 3093 695511836 35128 8249673828776 2873628632623 209 1491413 1142 15 26320 10382 10590 11264 1864 262 14 6249 11816 11260 10875 6588 62 1923 7529 698

(a) Clustering result with

Friendshipattribute. 13419 8145 26792 8249 7715 6677 2951 2894 2821 179 10986 26137 8924 7705 10411 2833 167 143 146 15377 13501 10653 7602 6915 15852662 277 5 35833 6614 7231 2920 652 10426 10374 10671 10729 10436 7798 10668 15313 336 35288 629 6208 1326510988 6933 10767 10423 10658 7155 540410400 3332 2908 10343 28501386 3080 2822 1453 7620 262 159 2905 2912 10669 3093 620 8240 6956 10967 7790 1562806 2819 2897 10963 2954 279 174 1 2825 7170 8467 10336 2844 2902 132 2947 651 2832 15542 6913 6782132520 680 10350 235 10354 1875 1864 2818 1076 6588 241 7755 3173 55 21818 27997 7 0 7528 6904 28888 30 10375 2808 7115 26320 1876 1446 1419 1071646 9 10380 7756 21822 11014 3109 2839 124 26 2873 6863 11504 10362 12985 15385 8902 6587 46708 40002 3146 28059 24601 3058 987 627 8946 106321087510351 88898248 14 15350 16620 1940 170231914 1701 36806213 28678 563441582911646 263 11901 11519 15727 21696 1156 7103 13 11003 7681 1059011746 1584429912 7847 11609 9130 10959 2910 2892 16450 10717 1142 22271 24103 15820 40784 28518 35129 2834 8750 12873 3 6842 7619109741152812574 6841 13479 28186 40799 9847 2134 1944 708423222 11663 28539 10714 9416 135793014 1307419 219 53458 13653 1911310622 7281 6916 2911 2847 1308 7647 1192633361 238381183641954 16448 9406 8470 7542 4 6948 6249 5628 1859 878 62 15557 10536 16529 6738 11824 1923 2206 1095 632 23841 7518 1126145913 4591416696 9971 7529310 149 15 2949 6692 8869 13548 144 875 6709 9977 6955 1697138341984 35128 1075 2924 2176 26494 6604 2874 28887 10333 10650 1880 42 23 2896 679 623 11267 1771293 1693 21695 3313 2929 209 10685 1935 9417 3009 2880 385 21629 11282 2865 2993 267 11 4 25 9569 2953 10382 7558 10612 7517 10559 6902 11299 154 13503 10809 634 11016 7201 7527 2845 10972 4412664 7526 11260 10966 10980 1860 34 2861 6599 6980 8125 8724 2975 10752 2855 1777 1413 2 30705 3160 19389 47768 3302 17795 7621 355623171520 9918 56071 24659 2969 13560 46488 37296 25003 3206 7794 11264 9592 13361 7376 24597 2987 14814 15945 6606 3233 1985 2893 7537 1499 29903 998537436 15375 43796 628 2918 10614 15335 17315 232 217 7783 20355 7837 16451 7162 2264 22518 21121 869655514 10801 5138972517546 6600 832 53515 23544 20477 39405 5655 1867 1077 9594 15565 6638 16447 11204 18848 3820 1515 259 35 7590 36188373801 157 110297594 419737843 9593 4612 2297 13323 23796 9916 57713 7793 6836 6589 1285 562 34913 15815 3670 9403 1290 965 301 15262 3063 5604 3083 300587848 13535 39999 698 11816 6825 23908 6988 10990 15558 6908 15569 3189 24604370018907 10743 6625 3650 1908 253 3668 4956617045 26405 28776 10353 10554

(b) Clustering result with Location idattribute.

Fig. 2. For clearness, we only show one-month clustering result using Friendship and Location idattributes as the classes of clustering evaluation, respectively. 5 clusters can be found for two clustering processes, and we use different colours to distinguish different clusters. Note that the shape for the center node of any cluster is square.

The characteristics of the crowd behavior of each single person can thus be in-ferred from his/her check-in records. We refer to this as their “individual behavior”. This shows which individuals participate in a specific crowd. From the EM algorithm, a given record belongs to each cluster with certain probabilities. Moreover, the likeli-hood is a measurement of “how good” a clustering process is and it is increased at each iteration of the EM algorithm. It is worth mentioning that the higher the likelihood, the better the model fits the data.

Second, the clustering process (crowd behavior recognition model) is as follows. We define two parameters: (i) the user u’s check-in data Su which is a sequence of activity observations about u; (ii) a set of unknown values θ (i.e., the serial numbers of clusters). These two parameters are used along with a Maximum Likelihood Estimation (MLE): L(θ; Su)= p(Su|θ). Our purpose is to seek the MLE of marginal likelihood. In other words, we need to find the most probable θ which the user u belongs. The EM algorithm iteratively applies the following two steps to achieve our purpose:

1. Expectation step (E step): calculate the expected value of the log-likelihood func-tion under the current established clusters (θ(t)_):

(8)

1 2 3 4 5 0 500 1000 1500 2000 2500 3000 3500 Cluster number Number of users

(a) Number of users for each cluster using Friendship at-tribute as the class of clustering evaluation. 1 2 3 4 5 0 500 1000 1500 2000 2500 3000 3500 4000 Cluster number Number of users

(b) Number of users for each cluster using Location id at-tribute as the class of clustering evaluation.

Fig. 3. Number of users for each cluster.

2. Maximization step (M step): find the appropriate value of parameter θ, which max-imizes this quantity:

θmle_{= arg max} θ Q(θ|θ

(t)_).

MLE estimates θ by finding a value of θ that maximizes Q(θ|θ(t)_{), and the estimation}

result can be flagged as: θmle_.

Further, based on our recognition model, we add the spatio-temporal pattern into the crowd behavior (clustering). We use a triple q = (θ, pi, ti) to replace θ. Then the

expectation step becomes: 1. Expectation step:

Q(q|q(ti)₎= E

Su_,q(ti)[log L(q; Su)],

where piis the position characteristic of location measurement mi, and tiis

times-tamp of mi. Moreover, q(ti)is a set of current established clusters with their locations

and timestamps;

2. Maximization step: choose q to maximize Q(.), qmle= arg max

q Q(q|q (ti)_).

Finally, in order to preserve the model integrity, the recognition accuracy must be measured. In our crowd behavior recognition model, the value of the log-likelihood can be used to measure the accuracy. For instance, using the Friendship attribute as the evaluation class, based on one-month data, the log likelihood of crowd identification is: −16.42186, and using the Location id attribute as the evaluation class, the log likeli-hood is: −16.87742. Their accuracy is different, and the Friendship attribute is more effective for improving the recognition ability of the model.

(9)

5 Impact of Attributes on Crowd Behavior

Most previous studies of crowd behavior only consider interactions among isolated in-dividuals, and assume that the motion of individuals is random. The work in [22] af-firms that the walking behavior of pedestrians is affected by social relationships, such as friends, couples, or families walking together. The results presented in Fig. 2, show that some population attributes impact the crowd motion of humans, and these impacts are different for diverse attributes. In this section, we go deeper into the impact of these effects by analyzing how friendship and location affect crowd behavior.

5.1 Impact of Friendship on Crowd Behavior

In this section, we analyze the impact of social relationship (friendship) on the complex dynamics of crowd behavior. For this, we use the empirical data of the motion of indi-viduals by means of check-in recordings of public areas. Observations are made under varying-density collections of population.

0 1 2 3 4 5 6 x 104 0 10 20 30 40 50 60 70 UserID Degree (k) User−Degree Distribution January data February data

(a) Friendship degree of each us-er. 0 10 20 30 40 50 60 70 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Degree (k) P(k)

Degree Probability Distribution

February data January data

(b) Probability distribution of var-ious degrees.

Fig. 4. Friendship degree of each user and probability distribution of various degrees (January and February, 2010 datasets).

First of all, we analyze the friendship of a group of people. Fig. 4 shows the friend-ship degree of each user and the probability distribution of various degrees (based on the data of January and February, 2010). We can find that friendship exists in almost all observed users. It means that friendship is an important influencing factor for human behavior. It is worth noting that a Poisson distribution is met for the friendship degree probability distribution of all observed users. Moreover, from Fig. 4(b), friendship de-gree is less than 10 for almost 90% users. So the crowding propensity of an individual is not primarily oriented by friendship.

Secondly, several crowds exist in the observed group of people and we investigate the friendship degree probability distribution of each crowd. Fig. 5 shows the friendship degree distributions of 5 clusters which are based on January, 2010 dataset. For some

(10)

crowds, friendship degrees show approximative scale-free power-law distributions, e.g., crowds 0, 1, 2, and 3. 100 ₁₀1 ₁₀2 100 101 102 Degree (k) Number of users 0 100 ₁₀1 ₁₀2 100 102 104 Degree (k) Number of users 1 100 ₁₀1 ₁₀2 100 101 102 Degree (k) Number of users 2 100 ₁₀1 ₁₀2 100 102 104 Degree (k) Number of users 3 100 ₁₀1 ₁₀2 100 101 102 Degree (k) Number of users 4

Fig. 5. Relationship between friendship degree and number of users for each crowd (January, 2010 dataset). 0 1 2 3 4 0 1 2 3 4 5 6 x 104 Cluster FriendID

Fig. 6. IDs of users (there is friendship among these users) for each crowd.

Finally, how many users have friendship between each other for each cluster? Fig. 6 exposes an interesting phenomenon: even though the average friendship degree of clus-ter 2 is larger than that of clusclus-ter 3 (shown in Fig. 5), the number of users who have friendship in cluster 2 is less compared to cluster 3. This means that there is no relation between the number of users (who have friendship) and friendship degree in a crowd. 5.2 Impact of Location on Crowd Behavior

With the increasing ubiquity of location sensing included in mobile devices, we realize the arising opportunity for human crowd analysis through mobile wireless information from the real world context. Thereby, the spatial locations of individuals become a

(11)

kind of important and available information for extending the analysis and modeling of human crowd behavior to the physical world. Most prediction models of crowd events use some kind of location information. In [4], Calabrese et al. believe that the attendees of crowd events are related to areas: “Sport events such as baseball games attract about double the number of people which normally live in the Fenway Park area. Moreover, those events seem to be predominantly attended by people living in the surrounding of the baseball stadium, as well as the south Boston area”. From our investigation, we can find the impact of location on crowd behavior. Fig. 7 shows location id distribution of users. Location_id UserID 0 60000 8000 4000 6000 2000 30000 15000 45000 0

(a) Location distribution of clus-tered users. Check-in time Location_id 0 0 10000 5000 2500 7500 7000 3500

(b) Along with the change of check-in time, the location distribution of clustered users.

Fig. 7. Relationship among users, check-in time, clusters and locations (different colours denote different clusters).

Analyzing Fig. 7, we can see that users are obviously clustered, and an approximate diagonal line divides the User-Location (see Fig. 7(a)) and Check-in time-Location (see Fig. 7(b)) space. Moreover, we can observe: the range of activity for any user is limited; and duration time is different for different crowds in different locations. We can assert that human crowd behavior is centralized; namely “hot spot” exists. We use this char-acter of crowd behavior to do centralized population monitoring or urban public facility deployment.

Cluster

Location_id

Fig. 8. Location distribution of users in each cluster.

Figure 8 shows the location distribution of users for each crowd. We can find that the distances between users are different (the location distribution of users is uneven) in

(12)

any crowd. For instance, in cluster 3, there are two small subclusters. We believe that the small-world phenomenon exists in crowd behavior.

6 Conclusion

The analysis of crowd behavior can give us an idea of how we behave when we are part of a group. Different actions can be taken by individuals when being surrounded by others. In this paper, we have focused on analyzing crowd behavior, considering check-in data from location-aware mobile social networks.

There are several parameters that affect the forming way of a crowd. Social relation-ships, e.g., friendship, as well as current locations, can impact the way that a crowd is formed. Understanding how these parameters determine crowd formation and evolution is the first step prior to the creation of predictive approaches. After grasping the rela-tionship between these parameters and crowd behavior, our ongoing research focuses on the prediction model of crowd behavior. This model will be useful for preventing disasters from the pushing of crowds, facilitating efficient massive event planning, or even for traffic management.

Acknowledgment

This work was supported by the EU ITEA 2 Project 11020, “Social Internet of Things-Apps by and for the Crowd” (SITAC).

References

1. Antonini, G., Bierlaire, M., Weber, M.: Discrete choice models of pedestrian walking behav-ior. Transportation Research Part B: Methodological 40(8), 667–687 (2006)

2. Bandini, S., Manzoni, S., Vizzari, G.: Crowd behaviour modeling: from cellular automata to multi-agent systems. Multi-Agent Systems: Simulation and Applications pp. 204–230 (2009)

3. Bellomo, N.: Modeling crowds and swarms: congested and panic flows. Modeling Complex Living Systems pp. 169–188 (2008)

4. Calabrese, F., Pereira, F., Di Lorenzo, G., Liu, L., Ratti, C.: The geography of taste: analyzing cell-phone mobility and social events. Pervasive Computing 6030, 22–37 (2010)

5. Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of the 17th SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090. ACM (2011)

6. Cook, D.J., Das, S.K.: Pervasive computing at scale: transforming the state of the art. Perva-sive and Mobile Computing 8(1), 22–35 (2012)

7. Davis, L.: Predicting travel time to limit congestion at a highway bottleneck. Physica A: Statistical Mechanics and its Applications 389(17), 3588–3599 (2010)

8. Do, C.B., Batzoglou, S.: What is the expectation maximization algorithm? Nature biotech-nology 26(8), 897–899 (2008)

(13)

9. Eagle, N., Pentland, A.S., Lazer, D.: Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences 106(36), 15,274–15,278 (2009)

10. Ge, W., Collins, R.T., Ruback, R.B.: Vision-based analysis of small groups in pedestrian crowds. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(5), 1003–1016 (2012)

11. Gonzalez, M.C., Hidalgo, C.A., Barabasi, A.L.: Understanding individual human mobility patterns. Nature 453(7196), 779–782 (2008)

12. Griesser, M., Ma, Q., Webber, S., Bowgen, K., Sumpter, D.J.: Understanding animal group-size distributions. PloS one 6(8), e23,438:1–9 (2011)

13. Helbing, D., Molnar, P.: Social force model for pedestrian dynamics. Physical review E 51(5), 4282–4286 (1995)

14. Jeung, H., Liu, Q., Shen, H.T., Zhou, X.: A hybrid prediction model for moving objects. In: Proceedings of the 24th International Conference on Data Engineering, pp. 70–79. IEEE (2008)

15. Jiao, Y., Liu, Y., Wang, J., Wang, J.: Model for human dynamics based on habit. Chinese Science Bulletin 55(24), 2744–2749 (2010)

16. Ko, M.H., West, G., Venkatesh, S., Kumar, M.: Online context recognition in multisensor systems using dynamic time warping. In: Intelligent Sensors, Sensor Networks and Infor-mation Processing Conference, pp. 283–288. IEEE (2005)

17. Lazer, D., Pentland, A., Adamic, L., Aral, S., Barab´asi, A.L., Brewer, D., Christakis, N., Con-tractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., Van Alstyne, M.: Computational social science. Science 323(5915), 721–723 (2009)

18. Mitchell, T.M.: Mining our reality. Science 326(5960), 1644–1645 (2009)

19. Monreale, A., Pinelli, F., Trasarti, R., Giannotti, F.: Wherenext: a location predictor on tra-jectory pattern mining. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 637–646. ACM (2009)

20. Moussa¨ıd, M., Helbing, D., Garnier, S., Johansson, A., Combe, M., Theraulaz, G.: Experi-mental study of the behavioural mechanisms underlying self-organization in human crowds. Proceedings of the Royal Society B: Biological Sciences 276(1668), 2755–2762 (2009) 21. Moussaid, M., Helbing, D., Theraulaz, G.: How simple rules determine pedestrian behavior

and crowd disasters. Proceedings of the National Academy of Sciences (PNAS) 108(17), 6884–6888 (2011)

22. Moussa¨ıd, M., Perozo, N., Garnier, S., Helbing, D., Theraulaz, G.: The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PloS one 5(4), e10,047:1–7 (2010)

23. Paul, U., Subramanian, A.P., Buddhikot, M.M., Das, S.R.: Understanding traffic dynamics in cellular data networks. In: Proceedings of the 30th International Conference on Computer Communications, pp. 882–890. IEEE (2011)

24. Pentland, A., Choudhury, T., Eagle, N., Singh, P.: Human dynamics: computation for orga-nizations. Pattern Recognition Letters 26(4), 503–511 (2005)

25. Pentland, A.S., Pentland, S.: Honest signals: how they shape our world. MIT press (2008) 26. Tang, L., Liu, H.: Toward predicting collective behavior via social dimension extraction.

Intelligent Systems 25(4), 19–25 (2010)

27. Trasarti, R., Pinelli, F., Nanni, M., Giannotti, F.: Mining mobility user profiles for car pool-ing. In: Proceedings of the 17th SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1190–1198. ACM (2011)

28. Zhang, X., Weng, W., Yuan, H., Chen, J.: Empirical study on unidirectional dense crowd during a real mass event. Physica A: Statistical Mechanics and its Applications 392(12), 2781–2791 (2013)