
Deciphering human activities in complex urban systems : mining big data for sustainable urban future


Academic year: 2021


Deciphering Human Activities in Complex Urban Systems: Mining Big Data for Sustainable Urban Future

by

Shan Jiang

Master of Science in Transportation and Master in City Planning Massachusetts Institute of Technology (2009)

Submitted to the Department of Urban Studies and Planning in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Urban and Regional Planning at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2015

© 2015 Shan Jiang. All Rights Reserved.

The author hereby grants to MIT the permission to reproduce and to distribute publicly paper and electronic copies of the thesis document in whole or in part in any medium now known or hereafter created.

Author: Signature redacted
Department of Urban Studies and Planning
August 31, 2015

Certified by: Signature redacted
Joseph Ferreira, Jr.
Professor of Urban Planning and Operations Research
Department of Urban Studies and Planning
Dissertation Supervisor

Accepted by: Signature redacted
Professor Lawrence J. Vale
Chair, PhD Committee
Department of Urban Studies and Planning


Deciphering Human Activities in Complex Urban Systems: Mining Big Data for Sustainable Urban Future

by

Shan Jiang

Submitted to the Department of Urban Studies and Planning on August 31, 2015 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Urban and Regional Planning

ABSTRACT

"Big Data" is in vogue, and the explosion of urban sensors, mobile phone traces, and other windows onto urban activities has generated much hype about the advent of a new 'urban science.' However, translating such Big Data into a planning-relevant understanding of activity patterns and travel behavior presents a number of obstacles. This dissertation examines some of these obstacles and develops data processing pipelines and urban activity modeling techniques that can complement traditional travel surveys and facilitate the development of richer models of activity patterns and land use-transportation interactions. This study develops methods and tests their usefulness using the Singapore metropolitan area as an example, employing data mining and statistical learning methods to distill useful spatiotemporal information on human activities by people and by place from traditional travel survey data, semantically enriched GIS data, massive and passive call detail records (CDR) data, and Wi-Fi augmented mobile positioning data. I illustrate that regularity and heterogeneity exist in individuals' daily activity patterns in the metropolitan area. I test the hypothesis that by characterizing and clustering individuals' activity profiles, and incorporating them into household decision choice models, we can characterize household lifestyles in ways that enhance our understanding and enable us to predict important decision-making processes within the urban system. I also demonstrate ways of integrating Big Data with traditional data sources in order to identify human mobility patterns, urban structures, and semantic themes of places reflected by human activities. Finally, I discuss how the enriched understanding of cities, human mobility, activity, and behavior choices derived from Big Data can make a difference in land use planning, urban growth management, and transportation policies.


Dissertation Supervisor: Joseph Ferreira, Jr.
Title: Professor of Urban Planning and Operations Research

Dissertation Committee Members:
Marta C. González, Associate Professor of Civil and Environmental Engineering
P. Christopher Zegras, Associate Professor of Transportation and Urban Planning


ACKNOWLEDGMENT

I am indebted to many people who made my experience at MIT a true adventure and lifetime treasure.

I would like to express my deepest gratitude to my mentor and committee chair, Professor Joseph Ferreira, for his excellent advice, guidance, encouragement, and friendship. Joe gave me every possible support that I could have asked for from an adviser: sharing his knowledge, inspiring me intellectually, coaching me through research and teaching, providing me with opportunities, and especially helping me with a smooth academic transition in the last year when I became a new parent. Joe guided me through each step from enrollment, general exam, and colloquium to dissertation writing and defense. I am also very grateful to Joe's wife, Lissa, for her support, encouragement, and friendship through important life events over my years at MIT.

I would also like to express my tremendous appreciation to Professor Marta González for playing a pivotal role in shaping my research: she introduced me to a new field of inquiry that uses a statistical physics approach to study human behavior, and she inspired and influenced my interdisciplinary perspective. Marta also provided me with enormous help and support in completing this dissertation. My sincere gratitude also goes to Professor Chris Zegras, for his terrific guidance, advice, support, friendship, and critical discussions and comments on my research and teaching. It has been my true pleasure and honor to study and work with my committee.

I thank Professors Tridib Banerjee, Moshe Ben-Akiva, Mi Diao, Emilio Frazzoli, Ralph Gakenheimer, Cristian Angelo Guevara, Ricardo Hurtubia, Matthieu de Lapparent, Tunney Lee, Frank Levy, Francisco Martinez, Francisco Pereira, and Jinhua Zhao for their valuable inputs, discussions, and comments on my earlier research that led to this dissertation. I am also grateful to Professors Eran Ben-Joseph, Yung Ho Chang, Joseph Ferreira, Dennis Frenchman, Ralph Gakenheimer, David Geltner, Amy Glasmeier, Stephen Hammer, Yu-Hung Hong, Annette Kim, Tunney Lee, Karen R. Polenske, Albert Saiz, Bish Sanyal, Larry Vale, Chris Zegras, and Jinhua Zhao for their help, support, and encouragement for the student-led discussions on China Urban Development (CUD) that I co-organized in the past few years, with initial funding from the Office of the Dean for Graduate Education (ODGE) Graduate Student Life Grants.

I thank Professors Roy Bahl, James Buckley, Gabriella Carolini, Calestous Juma, Bin Lu, Meg Rithmire, Edward Steinfeld, Phillip Thompson, Jing Wang, Weiping Wu, Jiawen Yang, and Degao Zheng for sharing their knowledge and insights in the CUD discussions. And I thank Professors John Attanucci, Mikel Murga, Nigel Wilson, Michael Woo, Fred Salvucci, and Joseph Sussman for their help and advice while I studied at MIT.

I am particularly grateful for the outstanding administrative support that I received from DUSP and CEE, especially from Sandy Wellford, Karen Yegian, Sue Delaney, Janine Marchese, Ezra Glenn, Duncan Kincaid, Tom Fitzgerald, Philip Thompson, Roberta Pizzinato, Ginny Siggia, Kirsten Greco, Mike Foster, and other CRON staff. I also thank Nancy Cain, Wei Ling Janet Long, and Bingran Zuo, from the Singapore-MIT Alliance for Research and Technology (SMART) Center, for their administrative help.

This dissertation would not have been possible without the generous financial support from MIT, through a Presidential Fellowship, DUSP teaching fellowships, and an ODGE and a DUSP fellowship on childbirth accommodation; from the Singapore National Research Foundation (NRF) through the SMART Future Urban Mobility (FM) project; and from the Center for Complex Engineering Systems (CCES) at KACST and MIT. I thank the Singapore Urban Redevelopment Authority, the Land Transport Authority, a local telecommunications company, and Skyhook (a Boston-based technology company) for providing data for this research.

My gratitude also goes to friends and colleagues who helped me at various points of my time at MIT. Many of them also contributed to the CUD student group and discussion series. They are: Yang Chen, Zhiyu Chen, Tuan Yee Ching, Lu Gao, Zhan Guo, Feiya Huang, Xiaoting Jia, Weifeng Li, Xin Li, Xiongjiu Liao, Yu Lu, Kyung-min Nam, Jingsi Shaw, Zhengzhen Tan, Xiaodong Wang, Ying Wang, Jie Xia, Yuan Xiao, Wenjing Xu, Hongliang Zhang, Yuming Zhang, Yi Zhu, and their family members; Lauren Alexander, Ana Alves, Clio Andris, Jie Bai, Giulia Cernicchiaro, Serdar Colak, Jieping Chen, Liqun Chen, Wanli Fang, Gaston Fiore, Feng Fu, Pedro Gandol, Robert Goodspeed, Yafei Han, Jianxiang Huang, Zheng Jia, David Lee, Jae Seung Lee, Christa Lee-Chuvala, Corinna Li, Jieping Li, Menghan Li, Weixuan Li, Zelin Li, Qiao Liang, Qian Lin, Roberto Lopez, Yingying Lu, Xiaosu Ma, Omar Masud, Dietmar Offenhuber, Meng Ren, Victor Rocco, Filipe Rodrigues, Chetan Rogbeer, Todd Schenk, Christian Schneider, Andres Sevtsuk, Shomon Shamsuddin, Jameson Toole, Anthony Vanky, Dong Wang, Pu Wang, Yuqi Wang, Chia Yang Weng, Ning Wu, Yunke Xiang, Liyan Xu, Yingxiang Yang, Laurie Zapalac, Linyi Zou, Qianqian Zhang, Xin Zhang, Ruishan Zheng, and many others.

Finally, I am profoundly grateful to my family: to Tao, for his wholehearted love and incredible support and patience over the years; to Emily, for the greatest joy that she has brought to me; and to my parents, for their unconditional love and belief in me. This dissertation is dedicated to them.

CONTENTS

ABSTRACT
ACKNOWLEDGMENT
CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 Background and Motivation
1.2 Research Questions
1.3 Research Contribution
1.4 Dissertation Framework

CHAPTER 2 LITERATURE REVIEW
2.1 Human Activity, Mobility, Residential Choice, and Lifestyles
2.1.1 Integrated Urban Models
2.1.2 Lifestyles: The Missing Link
2.2 Big Data for Urban Modeling and Planning
2.2.1 Mobility and Transport
2.2.2 Dynamic Land Use and Urban Structure

CHAPTER 3 STUDY AREA AND DATA
3.1 Study Area
3.2 Data
3.2.1 Household Travel Survey Data
3.2.2 Call Detail Record Data
3.2.3 Geographic Units
3.2.4 Census Data
3.2.5 Wireless Location Positioning Data
3.2.6 Land Use, POI and Employment Data
3.2.7 Housing Transaction Data

CHAPTER 4 HUMAN ACTIVITIES, RESIDENTIAL CHOICE, AND LIFESTYLES
4.1 Activity Profiles
4.1.2 Comparing Activity Profiles by Clusters in Singapore: 2004-2008
4.2 Activity Profiles in Spatiotemporal Context
4.2.1 Activity Density and Intensity Patterns
4.2.2 Activity Spatiotemporal Patterns by Clusters
4.3 Activity Profiles, Residential Choice, and Lifestyles
4.3.1 Theoretical Framework
4.3.2 Household Lifestyle Indicators: Activity Profiles
4.3.3 Model Structure
4.3.4 Model Specification
4.3.5 Model Estimation Results
4.4 Summary

CHAPTER 5 HUMAN ACTIVITIES AND DAILY MOBILITY
5.1 Mining Activity and Mobility Patterns from Mobile Phone Data
5.1.1 Parsing Trajectory to Extract Stays
5.1.2 Detecting Home
5.1.3 Filtering Phone User-Day Samples
5.1.4 Identifying Activity-Based Mobility Patterns
5.1.5 Expanding Phone Sample to Population
5.1.6 Aggregating Daily Trips and Mobility Patterns
5.2 Spatial Patterns of Activity-based Human Mobility
5.2.1 The 2-node Home-based Tour
5.2.2 The 1-node, 3-node and 4-node Motifs
5.3 Summary

CHAPTER 6 HUMAN ACTIVITIES, LAND USE, AND URBAN STRUCTURE
6.1 Linking Human Activities in Cyber Space to Urban Space
6.1.2 Clustering Urban Place by Human Cyber Activity Rhythm
6.2 Examining Population and Employment by Urban Clusters
6.3 Enriching Semantics of Urban Places Using POIs

CHAPTER 7 CONCLUSIONS AND POLICY IMPLICATIONS FOR SUSTAINABLE URBAN FUTURES
7.1 Summary of Findings and Policy Implications
7.1.1 Activity Profiles, Household Lifestyles, and Residential Choice
7.1.2 Individual Activities and Daily Mobility Patterns
7.1.3 Collective Human Activities, Land Use and Urban Structure
7.2 Big Data and Sustainable Urban Development
7.2.1 Real-time and Short-term Urban Operations and Management
7.2.2 Long-term Planning for Sustainable Urban Futures
7.3 Limitations and Opportunities for Future Research
7.3.1 Big Data: limitations and opportunities
7.3.2 Future Research

APPENDICES
Appendix I Method on Eigenactivities and Activity Clustering
1.1 The Setting
1.2 Principal Component Analysis/Eigen Decomposition
1.3 Daily Activity Profile Clustering
Appendix II Latent Class Choice Model Estimation Results

LIST OF FIGURES

Figure 3.1 Singapore neighborhoods, housing, and transportation
Figure 3.2 Singapore development guide planning (DGP) zone, transportation analysis zone (MTZ), and cellular tower distribution
Figure 3.3 Singapore DGP zone, MTZ zone and postcode spatial distribution
Figure 3.4 Grid cells with Wi-Fi augmented mobile positioning data covering Singapore for a canonical week in 2010
Figure 3.5 Singapore Master Plan 2008
Figure 3.6 Employment size by industrial sector aggregated in 100-by-100 meter grid cells
Figure 3.7 Singapore CPI from 1995 to 2014
Figure 3.8 Singapore housing transaction prices for HDB and private housing units
Figure 4.1 Illustration of the interrelations and interactions between life situation, lifestyle, location choice, activity, and travel behavior within a representative household
Figure 4.2 Snapshots of human activities at different time-of-day in Singapore on a representative weekday
Figure 4.3 Activity profile clustering in Singapore in 2004
Figure 4.4 Activity profile clustering in Singapore in 2008
Figure 4.5 Activity participation duration and rate (a), and mode share of trip stages (b) of each activity profile cluster in Singapore, 2004-2008
Figure 4.6 Trip characteristics by activity profile cluster in Singapore, 2004-2008
Figure 4.7 Density and intensity of home, work, shopping, and recreation activities in Singapore, 2004-2008
Figure 4.8 Intensity of home, work, shopping, and recreation activities by individuals with different activity profiles in 2008 in Singapore
Figure 4.9 Changes of intensity for home, work, shopping, and recreation activities by individuals with various activity profiles in Singapore, 2004-2008
Figure 4.10 Household decisions in the long-, mid-, and short-terms
Figure 4.11 Factor loadings of principal component factors for household activity profiles
Figure 4.12 Framework of household lifestyle decisions: latent class choice model with indicators
Figure 4.13 Age distribution of Singapore residential population
Figure 4.14 Regional transit accessibility to manufacturing and office employment
Figure 4.15 Zonal median housing prices by housing types in Singapore
Figure 4.16 Sociodemographic profiles of the four lifestyle-classes (for 4-LCCM with indicators)
Figure 4.17 Household activity profiles by household lifestyle
Figure 4.18 Individual activity profiles by household lifestyle
Figure 5.1 Workflow of population estimation and mobility pattern extraction from CDR data
Figure 5.2 Stay point detection algorithm illustration
Figure 5.3 User-day statistics after sample filtering
Figure 5.4 Examples of daily mobility networks
Figure 5.5 Entity relation diagram of the processed phone user, user-day mobility data from CDR, and geographic spatial units
Figure 5.6 Distribution of expansion factors
Figure 5.7 Estimation of population in high spatial resolution
Figure 5.8 Daily motifs and trips from phone data, and comparison with survey data
Figure 5.9 Spatial patterns of daily motifs in Singapore
Figure 5.10 Spatial patterns of daily motifs with 3 nodes in Singapore
Figure 5.11 Spatial patterns of daily motifs with 4 nodes in Singapore
Figure 5.12 Neighborhoods with high concentration of residents with 3- and 4-node daily motifs
Figure 6.1 Skyhook SpotRank data for a canonical week in Singapore
Figure 6.2 Eigen places and reconstruction error
Figure 6.3 The first three Eigen places in Singapore
Figure 6.4 Dunn Index to determine cluster size for urban place types
Figure 6.5 Hourly temporal rhythm of urban places by cluster
Figure 6.6 Overlay of urban clusters identified from Wi-Fi augmented mobile positioning data
Figure 6.7 Spatial distributions of urban places by cluster
Figure 6.8 Population and employment density at the grid-cell level
Figure 6.9 Cumulative distribution function for population and employment size (s) in grid cell by urban cluster
Figure 6.10 Spatial distribution of population size in grid cell by urban clusters
Figure 6.11 Spatial distribution of employment size in grid cell by urban clusters
Figure 6.12 Cumulative distribution function for employment size (s) in grid cell by industrial sector and urban cluster
Figure 6.13 Urban cluster themes
Figure 7.1 Integration of real-time information with land use-transportation feedback cycle
Figure 7.2 Sociodemographic profiles (for 3-class LCCM without indicators)

LIST OF TABLES

Table 3.1 Example of call detail records (CDR) data for one anonymous user
Table 3.2 POI classification
Table 4.1 Summary of individual activity profiles
Table 4.2 Activity profiles by household in detailed categories
Table 4.3 Activity profiles by household in aggregated categories
Table 4.4 Factors and loadings on household activity profiles
Table 4.5 Descriptive statistics for variables in household class membership model
Table 4.6 Estimation results on residential choice with no lifestyle segmentation
Table 4.7 Summary of model selection statistics
Table 4.8 Estimation results for class-specific residential choice model (for 4-LCCM with indicators)
Table 4.9 Estimation results for lifestyle class-membership model (for 4-LCCM with indicators)
Table 4.10 Sociodemographic profiles for 4 lifestyle-classes (for 4-LCCM with indicators)
Table 4.11 Estimation results for lifestyle measurement equations on activity profiles (for 4-LCCM with indicators)
Table 4.12 Household activity profiles by lifestyle
Table 4.13 Individual activity profile by household lifestyle
Table 6.1 Population and employment statistics by urban cluster
Table 7.1 Summary of model statistics for LCCM without activity profile indicators
Table 7.2 Estimation results: class-specific residential choice model (for 3-class LCCM without indicators)
Table 7.3 Estimation results: lifestyle class-membership model (for 3-class LCCM without indicators)
Table 7.4 Sociodemographic profiles (for 3-class LCCM without indicators)
Table 7.5 Estimation results: class-specific residential choice model (for 4-class LCCM without indicators)
Table 7.6 Estimation results: lifestyle class-membership model (for 4-class LCCM without indicators)

CHAPTER 1

INTRODUCTION

"We recognize that people are at the center of sustainable development and, in this regard, we strive for a world that is just, equitable and inclusive, and we commit to work together to promote sustained and inclusive economic growth, social development and environmental protection and thereby to benefit all."

-- The Future We Want: Outcome document adopted at Rio+20

1.1 Background and Motivation

Cities, home to billions of people worldwide, are complex systems (Batty 2009; Portugali et al. 2012). Today, more than 54 percent of the world's population (3.9 billion) resides in urban areas. Asia, despite its lower level of urbanization, is home to 53 percent of the world's urban population. One in eight of the world's urban dwellers lives in one of 28 mega-cities of more than 10 million inhabitants. Urbanization has been, and will continue to be, a major challenge in the decades to come, especially for countries in the Global South, where today's largest cities are concentrated and the pace of urbanization is fastest (United Nations 2014). Cities are growing at a speed unprecedented in human history, and human beings are facing enormous challenges, including environmental degradation, increased energy consumption, climate change, declining quality of life, and unsustainable development (Dimitriou & Gakenheimer 2011). For example, in 2011, energy use in the transport sector alone reached 103 quadrillion British thermal units (Btu) globally, accounting for 20 percent of total global energy consumption.¹ Over the past fifteen years, carbon dioxide emissions from the transportation sector doubled in non-OECD countries, due to the rapid growth of private vehicle ownership and freight traffic.² In the United States, the average daily vehicle miles traveled (VMT) per household has increased by almost 60 percent in the last three decades, and reached 54 miles per day by 2009.³ The total fuel wasted in congestion has increased by 480 percent (from 0.5 billion gallons in 1982 to 2.9 billion gallons in 2011). The total carbon dioxide produced due to congestion has increased by 460 percent (from 10 billion lbs. to 56 billion lbs. in 2011).⁴

1 Total energy use includes losses in electricity generation, transmission, and distribution. Online reference: http://www.eia.gov/tools/faqs/faq.cfm?id=447&t=1. In 2011, the industrial sector consumed 50% of the world total energy produced; the residential and commercial sectors consumed 18% and 12%, respectively.

2 International Energy Agency, 2015, World Energy Outlook Special Report: Energy and Climate Change.

In order to change the traditional direction of unsustainable development, world leaders from the UN Member States gathered more than 20 years ago at the 1992 Rio Earth Summit to adopt Agenda 21 (United Nations 1992) and to send a worldwide message that "nothing less than a transformation of our attitudes and behavior would bring about the necessary changes." Twenty years after Rio, we still have a long way to go to "leave a livable world to our children and grandchildren."⁵

In order to achieve sustainable development, it is important to reflect on how human beings live, work, play, and travel in today's diverse and complex urban systems, and to target "change" based on an understanding of these human activities. Data on human activities in urban areas have been collected via a variety of methods. In developed countries like the US, human activities have traditionally been investigated through survey data. Metropolitan planning organizations (MPOs) depend heavily on travel surveys to develop travel demand models. Such models can predict future travel demand to inform planning for transportation infrastructure, assess environmental impact, and set investment priorities (Oppenheim 1995). Travel surveys contain rich details on individuals and households. However, they are expensive to implement, small in sample size, low in frequency, and short in observation duration. For example, travel surveys collected by American MPOs cover only 1 percent of households in a metropolitan area, are conducted once per decade, and cover only one or two travel days per survey respondent.⁶ In many developing countries, survey data may not even exist, due to the high cost of collection, limited administrative capacity to undertake surveys, or doubts about the validity of data collected amid rapid urbanization.

4 Reported by the Texas A&M Transportation Institute, 2012, Urban Mobility Report.

5 United Nations. 2012. "Rio+20 The Future We Want". Retrieved online from http://www.un.org/en/sustainablefuture/about.shtml

6 One example is the CMAP Travel Tracker Survey in 2007-2008 for northeast Illinois. Retrieved online

The advent of Big Data, accompanied by rapidly advancing information and communication technologies (ICT), presents exceptional opportunities for urbanists to examine new ways to learn about human footprints and the social-technological ecology of cities. The Big Data gathered from urban sensing technologies (such as GPS, transit smart cards, mobile phones, and social media) can cover large proportions of global populations. By the end of 2010, mobile networks were accessible to over 90 percent of the global population, and over 75 percent of the global population (5.3 billion) subscribed to mobile telephone services, among which more than 70 percent (3.8 billion) lived in the developing world.⁷ Expansive urban sensing data and the high penetration of telecommunication in modern societies have transformed cities into repositories for exabytes of digital traces of human activities, with fine-grained spatiotemporal information worldwide. With various degrees of data privacy protection, we can observe human activities over longer periods, in new dimensions, and on a very large scale.

1.2 Research Questions

In addition to the limitations of small samples, limited geographical coverage, and short-term observations, an even bigger challenge for the traditional land use-transportation modeling approach based on travel survey data is that travel patterns in metropolitan areas have become difficult to predict. Ever-changing demographics, diverse workforce participation, rapid employee turnover, emerging destinations (recommended to travelers by social networks such as Facebook and Yelp), and alternative transportation modes enabled by technology innovations (such as Uber,⁸ Zipcar,⁹ and autonomous vehicles¹⁰) can make travel patterns in cities more complex than the traditionally prominent journey-to-work commuting flows. Household decisions about where to live, work, and play could change in ways that are not well modeled using traditional travel surveys and snapshots of residential and work locations. However, modeling these decisions sensibly, so that they reflect household behavior more accurately, helps to predict urban activities robustly and therefore makes proactive and effective policy interventions possible for a sustainable urban future.

7 International Telecommunication Union, "Access to mobile networks available to over 90% of world population 143 countries offer 3G services." 2010.

8 Uber is a transportation network company that allows users to submit ride requests through smartphone apps; the requests are then routed to Uber drivers who use their own cars.

9 Zipcar is a car sharing company in the United States.

10 "Google Signs Agreement with NYC Mayor to Replace NYC Taxi With Driverless Google Cabs", 2015, April 01. Retrieved online from http://inhabitat.com/nyc/google-signs-agreement-with-nyc-mayor-to-replace-nyc-taxis-with-driverless-google-cabs/

While ubiquitous urban sensing data can bring great opportunities for urbanists to examine cities through new lenses, a great many challenges exist in adopting them. The goal of this dissertation is to overcome some of these challenges and identify unique opportunities presented by big urban sensing data to tackle critical planning questions in land use-transportation studies for sustainable urban development. These questions have been difficult to address in the past, due to limitations in data and methodology. In particular, I take advantage of mobile phone call detail records (CDR) data and Wi-Fi augmented mobile positioning data, together with traditional travel survey data, census data, and massive GIS data on the built environment, land use, and points-of-interest (POI) in a metropolitan area, to understand the dynamics and complexity of cities. Mobile phone data is produced for billing purposes by mobile telecommunication carriers worldwide. Compared to traditional survey data, phone data can record human whereabouts in space and time at much higher frequency, at a larger scale, and at low cost. Following the wide adoption of mobile phones worldwide, developing methods to fully utilize mobile phone data can be powerful for understanding human activities and mobility (Gonzalez et al. 2008) and informing planning for sustainable cities. Challenges to utilizing such mobile phone data include its massive, passive, and noisy nature. For example, unlike GPS data generated from active tracking, CDR data show spatiotemporal information of mobile phones only when devices are in use. Moreover, due to privacy and legal restrictions and economic constraints, passive CDR data contain little or no information about the sociodemographic characteristics of anonymous users or about their activities. Travel survey data, on the other hand, while also limited as outlined above, can be rich in detail and shed light on the behavioral reasoning behind human activities.
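To make the passive nature of CDR data concrete, the following minimal Python sketch counts the hours of a day in which a phone leaves any trace. Everything in it is an assumption made for illustration: the record fields, the user and tower identifiers, and the timestamps are made up and are not taken from the data set used in this study, and the sketch is not the processing pipeline developed in later chapters.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CdrRecord:
    """One hypothetical call detail record: who, when, and which tower."""
    user_id: str         # anonymized identifier (hypothetical format)
    timestamp: datetime  # time of the call or message event
    tower_id: str        # cell tower that handled the event (hypothetical)


def observed_hours(records, user_id, day):
    """Return the sorted hours of `day` in which `user_id` left a trace."""
    return sorted({
        r.timestamp.hour
        for r in records
        if r.user_id == user_id and r.timestamp.date() == day
    })


# Made-up example: the phone is used four times in one day.
records = [
    CdrRecord("u001", datetime(2011, 3, 14, 8, 5), "T17"),
    CdrRecord("u001", datetime(2011, 3, 14, 8, 40), "T17"),
    CdrRecord("u001", datetime(2011, 3, 14, 12, 30), "T42"),
    CdrRecord("u001", datetime(2011, 3, 14, 19, 15), "T17"),
]

hours = observed_hours(records, "u001", datetime(2011, 3, 14).date())
coverage = len(hours) / 24  # share of the day with any observation
print(hours, coverage)
```

In this toy day the user is visible in only three of 24 hours, which illustrates why passive records need careful filtering before they can stand in for an activity diary.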

Big urban sensing data such as mobile phone CDR data show great promise for complementing small-sample travel survey data, in order to develop sensitive urban land use and transportation models that differ from the traditional approach. However, a question remains: how can we make sense of massive, passive, and noisy data to reveal human activity and mobility in the city in ways that are useful and important for land use and transportation planning? Anchored on this major question, I address the following two sets of specific questions in this dissertation.

* What are the inherent structures and patterns governing daily human activity in the city? What are the spatiotemporal patterns of activities by different groups of people in the city? How are these activity patterns linked to household residential choices and their long-term lifestyles?

* How can big urban sensing data be mined to reveal differences in individual activity and mobility patterns, and be translated from individual human activities to metropolitan-scale urban knowledge to inform sustainable land use planning, urban growth management, and transportation planning?


I choose Singapore as a case study in order to answer the above questions for three

reasons. First, as a city-state, Singapore has limited resources. The scarcity of land is one of the key factors that contribute to its high density development and exacerbate other challenges such as housing and transportation provision. To overcome the land constraints, Singapore government has put great efforts in long-term and integrated planning which emphasize planning decisions for a balanced outcome in terms of the environment, economy and social equity." For this reason, it makes Singapore a rich case for examining human activities and residents' interactions with residential choice, lifestyle and mobility, and urban structure in an integrated approach. Second, Singapore's government has endeavored to "use technology to make a difference to people's lives", and it has prioritized policies and programs to build a "Smart Nation.. .where people live meaningful and fulfilled lives, enabled seamlessly by technology, offering exciting opportunities for all." 12 As a place with rich data on

various aspects of the urban system, Singapore is an ideal test bed for employing big data for urban planning. Finally, under the Sustainable Singapore Blueprint released in 2009, Singapore had set aside more than 700 million US dollars over five years to support sustainable development" and Singapore government had also made a commitment to support research to achieve this goal. This research was made possible with a support by Singapore National Research Foundation (NRF) through the Future Urban Mobility (FM) research project of the Singapore-MIT Alliance for Research and Technology (SMART) Program.1 4 The SMART FM project provided data access and research support to this study.

1.3 Research Contribution

The major contributions of this dissertation are fourfold. First, by applying data mining and statistical learning methods to travel survey data, I reveal the inherent structure in human daily activities and extract typologies of human activity profiles at the metropolitan scale. I test the hypothesis that the extracted human activity profiles can be used as lifestyle indicators to model household residential choices and latent lifestyle classes, and thereby to explain the heterogeneity of household preferences for neighborhood amenities, socioeconomic environment, accessibility, and housing types. I argue that incorporating household lifestyle into an integrated land use and transportation (LUT) model will improve the model's behavioral validity and make it sensitive enough to test policy scenarios that address the heterogeneous preferences of households with different lifestyles. Second, by mining millions of anonymous mobile phone users' CDR data, I test the hypothesis that anonymous CDR data can be processed to reveal human daily mobility networks, or motifs, similar to what travel surveys provide. I infer patterns of motifs and relate them to the urban context to examine the spatial patterns of activity-based human mobility in the metropolis. I suggest that long-term observations of individual daily motifs can be used to enrich understanding of land use and transport interactions, and help planners gain a better understanding of travel behaviors at both neighborhood and metropolitan levels. Third, I develop a new method that uses human cyber activities revealed by Wi-Fi augmented mobile positioning data, combined with POIs, to detect and monitor dynamic land use at the metropolitan scale. I test the hypothesis that, using this method, we can quantify and measure urban structure and discover a richer semantic sense of urban places than can be revealed via traditional data sources and methods. Finally, drawing on these examples, I argue that the methods presented in this study may be useful for enriching integrated urban models and identifying the impacts of urban policies on human activities. I demonstrate how Big Data can be translated into an understanding of how urban policies motivate changes in human activity and mobility behavior.

11 Khoo Teng Chye, 2014. "Urbanization in Singapore. Making small more livable." May 31. http://www.mayorsandcities.com/politics/world/urbanization-singapore-making-small-livable

12 Launch of Smart Nation Initiative, Prime Minister Lee Hsien Loong, 2014. Retrieved online from https://www.ida.gov.sg/About-Us/Newsroom/Speeches/2014/Transcript-of-prime-minister-lee-hsien-loong-speech-at-smart-nation-launch-on-24-november

13 http://www.mewr.gov.sg/ssb/

1.4 Dissertation Framework

Urban systems are dynamic and complex, composed not only of land, buildings, infrastructures and other types of physical artifacts, but also of footprints and flows of human activities within the system. In this dissertation, I attempt to identify key behavioral choices at the individual and household levels that can help to explain how observed human activity patterns are affected by urban structure, mobility options and urban planning policies, using Singapore as an example. The dissertation is organized as follows:

* In Chapter 2, I review and highlight how this research fits within the literature on human activities and Big Data applications in urban computing.

* In Chapter 3, I introduce Singapore as a case study and describe the dissertation's data sets. I reference traditional data sources such as census data, household travel survey data, and GIS data to measure the built environment and sociodemographic characteristics at various geographical levels. I use Big Data such as call detail records (CDR) and Wi-Fi augmented mobile positioning data to examine large-scale human activities from the perspectives of people and place in the city.

* By applying statistical learning and data mining methods to traditional travel survey data in Singapore, in Chapter 4 I cluster human daily activity profiles by their temporal pattern, and examine their spatiotemporal distribution and changes between 2004 and 2008. I then employ a structural equation modeling approach, the latent class choice model (LCCM), to study the relationship between activity profiles, residential choices, and household lifestyles.

* In Chapter 5, I develop a pipeline to extract activity-based daily mobility patterns from raw CDR data for 1.55 million anonymous users and 6.28 million user-day observations, and examine them within the urban spatial context of Singapore. I define such patterns as "motifs" and identify a dozen motifs covering approximately 90 percent of all daily mobility patterns in the metropolitan area. The results concur with previous studies (Schneider et al. 2013) regarding individual daily mobility patterns. However, the CDR data reveal a different distribution of daily mobility motifs from the one that traditional survey data indicate for Singapore. This suggests that traditional, paper-based travel diary data tend to under-represent diverse travel patterns.

* Using Wi-Fi augmented mobile positioning data to reflect human urban activities, in Chapter 6, I decompose the rhythm of urban places into Eigen Places. I cluster their temporal characteristics and enrich the meaning of urban places by using POI semantics. I utilize passive urban sensing data to detect the functional themes of places and urban structure in the metropolitan area.

* In Chapter 7, I synthesize the findings of each chapter, and elaborate on the policy implications of using Big Data and an urban computing approach to study human activities for planning a sustainable urban future. I also discuss limitations and avenues for future research.


CHAPTER 2

LITERATURE REVIEW

For decades, urban planners, geographers, and economists have studied the structure and organization of cities (Anas et al. 1998; Fujita 1989; Alonso 1971; Florida et al. 2008; Lynch 1964), as well as their function and role in people's daily lives (Anas & Xu, 1999; Chapin, 1974; Glaeser, Kolko, & Saiz, 2001; Hägerstrand, 1970; Wheaton, 1977). Different facets of the spatiotemporal characteristics of human activities have long been studied by researchers in sociology (Geerken & Gove 1983), social ecology (Taylor & Parkes 1975; Goodchild & Janelle 1984; Chapin 1974), psychology (Freud 1953; Maslow & Frager 1987), geography (Shaw & Yu 2009; Harvey & Taylor 2000; Hanson & Hanson 1980; Hanson & Kwan 2008; Shaw et al. 2008; Hägerstrand 1970), economics (Becker, 1977, 1991), and urban and transportation studies (Ben-Akiva & Bowman 1998; Bhat & Koppelman 1999; Axhausen et al. 2002). Studies in these fields can benefit from the innovations in both data collection and analytical methods that have inspired a new generation of researchers to examine the dynamics of human mobility and activities in the city (Farrahi & Gatica-Perez 2008; Wegener 2012; Cranshaw et al. 2012; Noulas et al. 2011; Wang et al. 2012; González et al. 2008; Toole et al. 2015). In this chapter, I review existing literature analyzing human activities in the fields of urban and transportation modeling, present challenges to further development, and demonstrate opportunities presented by Big Data.

2.1 Human Activity, Mobility, Residential Choice, and Lifestyles

2.1.1 Integrated Urban Models

The interrelationship between land use and transportation has been extensively examined and informs general knowledge in the urban transportation field (Wegener et al. 2004; Wang et al. 2011). An increasing number of transportation agencies have started to shift travel demand modeling practices from a trip-based approach to an activity-based approach (ABA) (Bhat & Koppelman 1999; Transportation Research Board 2015). The former approach, also known as the four-step model (FSM), was developed and institutionalized in the 1960s to evaluate travel demand changes in response to highway development (McNally 2008). The FSM includes the sequential steps of trip generation, trip distribution, mode choice, and trip assignment. It extrapolates trips and generates origin-destination (OD) matrices from travel surveys, without explicitly modeling activity patterns and choices that could result in chained or shifted trips in response to changes in land use or accessibility. An ABA, on the other hand, explicitly models travel as a derived demand in pursuit of activities. In the transportation field, it adopts a framework that incorporates the interaction between activities and travel and requires time-use (or activity-based travel) survey data to construct an entire sequence of activity and travel, in order to model activity episode generation and scheduling processes (Bhat & Koppelman 1999; Bowman & Ben-Akiva 2001).

Beginning in the 1990s in the United States, with increasing concerns about urban challenges and unsustainable development (exhibited by urban sprawl, an increase in vehicle miles traveled, traffic congestion, energy consumption, and environmental degradation), US legislation such as the Intermodal Surface Transportation Efficiency Act of 1991 (ISTEA) required that metropolitan and statewide transportation plans be integrated with land use plans. A growing number of studies in the transportation field have since focused on various aspects of short-term individual daily activity and travel patterns and their connections to long-term household location choice and vehicle ownership (Hunt, 1997; Ben-Akiva, Bowman, & Gopinath, 1996; Ben-Akiva & Bowman, 1998; Ben-Akiva & De Palma, 1986; Bowman & Ben-Akiva, 2001; Krizek, 2003; Palma & Rochat, 2000). For example, Ben-Akiva and Bowman (1998) developed a prototype discrete choice model system for the Boston metropolitan area to integrate household residential location choice with models of activity and travel, including "tours" characterized by destinations, time of day, travel modes, etc. They integrate long-term and short-term household decisions by assigning each household member a utility-based accessibility measure, estimated from the activity and travel models, that reflects the expected maximum utility among available daily activity schedules conditioned on chosen tour patterns.
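To make the idea of an "expected maximum utility" accessibility measure concrete, the following is a minimal sketch. It assumes a multinomial logit structure, under which the expected maximum utility over a set of alternatives is the familiar logsum of their systematic utilities. The function name, the scale parameter, and the utility values are illustrative assumptions, not the specification used by Ben-Akiva and Bowman (1998).

```python
import math


def logsum(utilities, scale=1.0):
    """Expected maximum utility over alternatives under a multinomial logit model.

    utilities: systematic utilities of the available daily activity schedules
        (hypothetical values supplied by the caller).
    scale: logit scale parameter (assumed to be 1.0 by default).
    The constant offset (Euler's constant divided by the scale) is omitted,
    as is common when only differences in accessibility matter.
    """
    if not utilities:
        raise ValueError("at least one alternative is required")
    m = max(utilities)  # subtract the maximum for numerical stability
    total = sum(math.exp(scale * (u - m)) for u in utilities)
    return m + math.log(total) / scale
```

In this sketch, adding an alternative never lowers the measure, which is the property that lets a logsum carry information about the richness of short-term activity options into a long-term location choice model.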

With the restructuring of urban economies and societies and the increasingly frequent use of ICT in everyday life, changes in travel options and arrangements in activities exhibit increasing complexity. Planners need to understand these changing patterns of activities and travel by millions of individuals in a metropolitan area. The large-scale integrated land use and transportation models have gradually gained attention from regional planning agencies, because they need policy analysis tools to make informed decisions, answer complex policy questions, and test scenarios for urban growth patterns, emerging transport options, and environmental impacts resulting from changing travel behavior (Chapin 2012; Wang et al. 2011). Examples of these integrated urban models include UrbanSim (Waddell 2002), ILUTE (Salvini & Miller 2005), and PECAS (Hunt & Abraham 2005).

Several obstacles limit the wide adoption of large-scale integrated urban models. Waddell (2011) systematically examines challenges for (a) integrated planning and (b) integrated modeling. Waddell argues that integrated planning challenges include conflicting institutions, values, epistemologies, and policies. Integrated modeling challenges include transparency, behavioral validity, empirical validity, ease of use, computational performance, flexibility, data availability and quality, and uncertainty. To help urban planners overcome some of the difficulties of understanding and interpreting activity-based models and their policy implications, I propose to extract higher-level individual daily activity profiles that present individuals' tendencies in activity participation, as well as their time use behaviors. The focus here is to understand human activity patterns in a broader sense, a departure from the detailed and precise schedule-based activity program modeling with which ABA travel demand modeling is concerned.15

2.1.2 Lifestyles: The Missing Link

Among the challenges for integrated urban modeling is behavioral validity (Waddell 2011). Although explored in some studies, heterogeneous household lifestyles are unaccounted for within existing integrated urban models. Most urban models assume homogeneous lifestyles across all households within broadly defined sociodemographic groups when modeling and simulating their long-term decisions and behaviors. This is especially problematic in today's complex world, in which cities become consumption centers (Glaeser et al. 2001) and long-term decisions on where to live and what type of housing units to buy vary significantly, depending upon household lifestyles (Fu et al. 2014).

This is an important issue, as organizations, governments, and researchers have increasingly realized that a fundamental solution to sustainability cannot be obtained without addressing the question of lifestyle (Goodland et al. 1992; Devuyst et al. 2001; Brundtland et al. 1987). Lifestyle may be defined as "the habits, attitudes, tastes, moral standards, economic level, etc., that together constitute the mode of living of an individual or group."16 The first use of the term "style of life" is credited to Adler (1930), who emphasized individual psychological aspects. Lazer (1963) first introduced the term within marketing literature as a system construct and argued that lifestyles reflect aggregated behavioral patterns of group segmentation. Recent marketing studies show that human behaviors are the manifestation of our values, attitudes, habits, and perceptions, all of which impact lifestyles (VALS 2010).

15 This method and analysis are explained in detail in Chapter 4.


Despite increasing evidence that household lifestyles can vary widely and influence decisions on housing, transportation, and energy consumption (Christensen 1997), progress on operationalizing lifestyles in urban modeling has been slow due to their complexity. The traditional urban modeling approach overlooks the important relationship between lifestyle and household behavior and could lead to ineffective policies aiming to curb urban sprawl, mitigate congestion, or reduce energy consumption and greenhouse gas emissions (Kitamura 1988; Scheiner & Kasper 2003). The ability to model household lifestyles and their influences on individual and household behaviors thus becomes important for designing policy-sensitive integrated urban models.

Urban and transport-related studies have used lifestyle as a concept and framework to examine residential location and housing choices, vehicle ownership choices, travel demand and patterns, activity participation and patterns, and energy demand and related emissions, etc. For example, Kitamura (1988) hypothesizes that lifestyle can be revealed by consumer expenditure patterns, life-cycle stages (e.g., couples with or without children, single individuals or parents, etc.), social demographic characteristics (age, gender, employment status, income), car ownership, and travel environment (consumer technologies and products, telecommunications, urban systems). Ben-Akiva and Bowman (1998) propose to incorporate the concept of household lifestyles into an integrated, activity-based modeling framework and argue that household lifestyles can be manifested by household members' behavioral patterns, household formation, labor force participation, location choice, and leisure orientation. Salomon, Waddell, and Wegener (2002) further design a framework to demonstrate the possibility of incorporating lifestyles into urban modeling.

Analytical approaches to household lifestyles and their interdependence with household behaviors can be categorized into two groups. The first group applies the cluster analysis method that has been widely used in marketing research to segment lifestyles (Mitchell 1983; Bagley & Mokhtarian 1999). For example, Salomon & Ben-Akiva (1983) employ travel survey data to define five types of lifestyle clusters, including upper socioeconomic classes with large and small households, young adults, low socioeconomic and undereducated, and the elderly. Krizek and Waddell (2002) apply a factor analysis method to measure travel characteristics, activity frequency, automobile ownership, and urban form. They cluster households into nine segments: retirees, single/busy urbanites, elderly, urbanites with higher income, transit users, suburban errand runners, family- and activity-oriented, suburbanites with double incomes, and ex-urban family commuters. Fan & Khattak (2012) employ negative binomial models for trip frequency, Tobit models for travel time, and a factor analysis method applied to the 2003 American Time Use Survey to identify five lifestyles: passive leisure-, socializing-, family-, recreation-, and community-oriented lifestyles. They find that family-oriented lifestyles exhibit the greatest automobile dependence, while recreation-oriented lifestyles display the least.

The second group employs structural equation modeling (SEM) techniques, including linear SEM and latent class choice methods. For example, Scheiner and Holz-Rau (2007) use a linear SEM to examine the interrelations among life situation, lifestyle, residential location choice, travel behavior, and car ownership. A life situation is the combination of an individual's life-cycle stage (e.g., marital status) and social demographic status (age, gender, employment status, income). They define lifestyle as a long-term latent-class manifestation of an individual's behavior. Walker and Li (2007) employ a latent class choice model to estimate three classes of household lifestyles while estimating residential location choices. The three identified lifestyles are: (a) suburban, automobile, and school oriented; (b) transit and house oriented; and (c) high density, near urban activity, and auto oriented. Vij et al. (2011) use behavioral mixture models captured by latent class and a longitudinal data set to study a sub-dimension of individual lifestyle: the modality style. They quantify influences on individuals' long-term mode choices for both work and non-work related travel. Chen (2012) adopts the method proposed by Walker and Li (2007) to examine bundled household residential and mobility choices that reflect household lifestyles in the context of urban China. Chen finds that households belonging to different lifestyle groups exhibit different decision-making mechanisms, and identifies four lifestyle groups in the city of Jinan that reflect orientations toward a job, child, budget, or amenities.

The existing literature illustrates the interrelationship between household long-term decisions (e.g., on residential choice), activity patterns (Ben-Akiva & Bowman 1998), and lifestyles. Although the SEM approach is useful for identifying lifestyles through observable socioeconomic variables and illustrative behaviors, as Walker and Li (2007) maintain, "there are things about lifestyle preferences that are unobserved and that cannot be explained well by observable socioeconomic characteristics." Furthermore, they add, "life-cycle variables alone are not sufficient to capture behavior." For these reasons, I propose to use high-level activity profiles as lifestyle indicators to measure latent household lifestyles. Given its behavioral advantages, I adopt the latent class choice model with indicators proposed by Walker and Li to model the interdependency between latent lifestyle and household residential decisions. The specifics of this model are discussed in Chapter 4.

While this approach sounds promising in theory, there is one major difficulty in practice: for measuring tendencies in long-term lifestyles, the one- or two-day activity observations collected from travel surveys are not as robust as long-term observations. For examining lifestyle heterogeneity, repeated longitudinal observations (e.g., panel data) will enhance model estimations. Big Data, when combined with statistical learning and data mining methods, can offer solutions to this challenge. A further examination of the literature reveals the possibilities for utilizing Big Data for urban modeling.

2.2 Big Data for Urban Modeling and Planning

With advancements in ICT, the affordability of significantly improved computational power, and the rapid market expansion of mobile devices and mobile applications, the number of people worldwide using mobile devices with Internet connectivity has increased significantly. Every year, thousands of exabytes of data are generated and stored in virtualized storage infrastructures known as clouds. This enables users' digital footprints to be recorded as part of Big Data,17 which is beyond the ability of typical database software tools to capture, store, manage, and analyze.

Emerging Big Data studies examine dynamic population distribution (Ratti et al. 2006; Horanont & Shibasaki 2011; Kaiser & Pozdnoukhov 2011; Sterly et al. 2013) in high spatial and temporal resolution (Douglass et al. 2015), human mobility (Candia et al. 2007; Song, Qu, et al. 2010; González et al. 2008), traffic patterns (Wang et al. 2012; Toole et al. 2015), land use (Toole et al. 2012), and social networks (Eagle et al. 2009; Lazer et al. 2009; D. Wang et al. 2011; Toole et al. 2015). Many of the urban issues that have confounded planners, geographers, and social scientists have attracted computer scientists and physicists to work together in this exciting interdisciplinary domain and to forge the new fields of urban computing (Zheng et al. 2014) and urban science (Batty 2013).

Big Data presents excellent opportunities to tackle important issues in an era of rapid urbanization and has the potential to help urban and transportation planners understand the complexities of human activity, travel, and land use patterns (Jiang et al. 2013; Batty et al. 2012). In this section, I focus on literature that specifically examines human activity and mobility related to land use and transportation studies, in order to highlight the opportunities and challenges of applying Big Data for urban modeling and planning purposes.

17 Gantz, J. and Reinsel, D. (2011) "Extracting Value from Chaos," IDC Report. Retrieved online, July 2015.


2.2.1 Mobility and Transport

In the transportation field, researchers have advanced methods for using mobile phone CDR data to understand and estimate human mobility and travel (Caceres et al. 2007; Simini et al. 2012; González et al. 2008). Mobile phones move along with their users. Even though mobile phone CDR data may be sparse spatially and temporally and only record phone users' geographic locations when devices are in use, the large scale and long observation period of such data allow human footprints to be inferred. Human mobility revealed by CDR data exhibits a pattern of "preferential returns" to previously visited locations and "explorations" of new places as a general and universal feature (Song, Koren, et al. 2010; Song, Qu, et al. 2010). Based on this feature, it is possible to estimate meaningful locations for human activities using CDR data. CDR data are not as structured as traditional travel survey data, which contain location and time information for meaningful activity destinations, or as precise as GPS data, which provide higher frequency and accuracy (Zheng et al. 2010). However, as a byproduct of the billing routinely carried out by mobile service carriers, CDR data can be obtained at a much lower cost and on a greater scale.

2.2.1.1 Trip-based Approach

CDR data may include spatiotemporal information on people's movements relative to cell towers or triangulated locations, depending on the location positioning technology used by the mobile service carrier. Based on this feature of CDR data, Wang et al. (2012) develop a method to generate tower-to-tower transient OD matrices for different time periods and further convert them into intersection-to-intersection transient OD matrices in the road network for areas of interest (Boston and San Francisco). By assigning the transient OD matrices to the road network and using a bipartite network framework, they present a method to analyze road usage patterns and pinpoint the areas that act as driver sources contributing to major traffic congestion in the study areas. Lacking traffic volume data for ground truthing, the authors use probe vehicle GPS data to validate estimated travel times on road segments and report high correlations between estimated travel times (from CDR data) and observed travel times (from GPS data). Following a similar approach, Iqbal et al. (2014) use CDR data collected in Dhaka, Bangladesh over one month, combined with traffic count data, to estimate intersection-to-intersection transient OD matrices. Using an optimization-based approach, they generate expansion factors for node-to-node transient OD matrices and compare the results with limited traffic count data.
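The notion of a tower-to-tower transient OD matrix can be illustrated with a minimal sketch. The record schema, the function name, and the one-hour cutoff between consecutive records are assumptions made for illustration only; they are not taken from Wang et al. (2012) or Iqbal et al. (2014), whose procedures include further steps such as mapping towers to road intersections and scaling the counts.

```python
from collections import Counter


def transient_od(records, period_of, max_gap_hours=1.0):
    """Count tower-to-tower transitions by time period.

    records: iterable of (user_id, hour, tower_id) tuples, a hypothetical
        schema in which `hour` is a float hour within a single day.
    period_of: function mapping an hour to a period label such as "AM".
    max_gap_hours: assumed cutoff; consecutive records further apart than
        this are not treated as one movement.
    """
    by_user = {}
    for user, hour, tower in records:
        by_user.setdefault(user, []).append((hour, tower))

    od = Counter()
    for observations in by_user.values():
        observations.sort()  # chronological order for each user
        for (h1, t1), (h2, t2) in zip(observations, observations[1:]):
            # A transient trip is a change of tower between two
            # consecutive records that are close enough in time.
            if t1 != t2 and h2 - h1 <= max_gap_hours:
                od[(period_of(h1), t1, t2)] += 1
    return od
```

Each key of the returned counter is a (period, origin tower, destination tower) triple, so the counts for one period form one transient OD matrix in sparse form.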


While the transient OD generation method produces and scales OD tables by time of day from CDR data, it resembles a trip-based approach because it mimics trip segments during human travel in order to generate transient ODs. Rather than modeling travel flows between activity destinations, it models segments of travel based on the appearance of people in space and time, as presented in phone data. This approach can be problematic, especially when CDR data are of low spatial resolution. For example, if the spacing between cell towers is wide (more than a few kilometers) but the road networks within each tower coverage area are dense, then transient OD matrices can introduce biases: traffic may be assigned to detoured local roads, even though the assigned travel path is not necessarily a direct route from the "true" origin to the destination. To address this issue, it is important to parse the trajectories observed in CDR data into stay locations.

A vast body of computer science literature has emerged on the topic of trajectory mining, due to the wide adoption of smartphones and location-based mobile applications. A comprehensive review of trajectory mining methods can be found in Zheng (2015). In general, the goal is to find suitable algorithms to extract meaningful stay locations from noisy Big Data for further analysis.
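As an illustration of what extracting stay locations can involve, the sketch below implements a simple threshold-based stay-point detector of the kind commonly described in the trajectory mining literature. It is not the algorithm of any particular study cited here; the input format (time in seconds, projected coordinates in meters) and both thresholds are assumptions chosen for the example.

```python
import math


def stay_points(points, max_dist=300.0, min_duration=600.0):
    """Detect stay locations in one user's time-ordered trajectory.

    points: list of (t_seconds, x_m, y_m) tuples sorted by time
        (hypothetical format with projected coordinates in meters).
    max_dist: assumed roaming radius around the first point of a stay.
    min_duration: assumed minimum dwell time for a stay, in seconds.
    Returns a list of (x_mean, y_mean, t_arrive, t_leave) tuples.
    """
    stays = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while points remain close to the anchor point i.
        while j < n and math.dist(points[i][1:], points[j][1:]) <= max_dist:
            j += 1
        window = points[i:j]
        if window[-1][0] - window[0][0] >= min_duration:
            x_mean = sum(p[1] for p in window) / len(window)
            y_mean = sum(p[2] for p in window) / len(window)
            stays.append((x_mean, y_mean, window[0][0], window[-1][0]))
            i = j  # continue after the detected stay
        else:
            i += 1  # the anchor is a pass-by point; move on
    return stays
```

Points that are observed only in passing, such as a single record along a commute, are dropped, which is what separates activity destinations from the travel segments between them.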

By applying algorithms to parse passive CDR trajectories into stay locations, Alexander et al. (2015) present methods to estimate average daily OD trip tables from fine-grained triangulated mobile phone traces for two million users over two months for the Boston metropolitan area. By extracting stay locations from triangulated CDR data, inferring the activity types of "home," "work," and "other," and expanding phone samples to represent the residential population, they generate estimates comparable to the traditional four-step-based OD tables by day type (weekday, weekend), trip purpose (e.g., HBW, HBO, NHB18), and hour of departure (e.g., AM, Midday, PM, and rest of day). By comparing the trip estimates with CTPP19 and Household Travel Survey data in the same region, Alexander et al. identify a strong correlation among the three sources at the metropolitan level. Although a comparison in finer spatial detail demonstrates a strong correlation between CDR and CTPP estimates at the town level, the correlation at the census tract level is weaker, likely due to geolocation errors embedded in these two sources.
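The step of expanding a phone sample to represent the residential population can be sketched in a simplified form. The version below assumes one expansion factor per zone, computed as the census population divided by the number of phone users whose inferred home lies in that zone; the function names and data structures are hypothetical, and the published procedure of Alexander et al. (2015) involves additional filtering and averaging not shown here.

```python
from collections import Counter, defaultdict


def expansion_factors(home_zone, population):
    """Population represented by each phone user, by home zone.

    home_zone: dict mapping user_id to the zone of the inferred home.
    population: dict mapping zone to census population (assumed input).
    """
    users_per_zone = Counter(home_zone.values())
    return {
        zone: population[zone] / count
        for zone, count in users_per_zone.items()
        if zone in population
    }


def expanded_od(trips, home_zone, factors):
    """Sum expansion factors over observed trips to form an OD table.

    trips: iterable of (user_id, origin_zone, destination_zone) tuples.
    Users whose home zone has no factor contribute nothing.
    """
    od = defaultdict(float)
    for user, origin, destination in trips:
        od[(origin, destination)] += factors.get(home_zone[user], 0.0)
    return dict(od)
```

Weighting each trip by its maker's factor, rather than scaling the whole table uniformly, corrects for zones where phone users are over- or under-represented relative to residents.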

Two recent studies synthesize the methods of processing CDR data to estimate travel demand and propose an innovative framework to derive estimated traffic on road networks and understand road usage patterns from raw CDR data for cities on different continents (see Colak et al. 2015; Toole et al. 2015). These efforts have greatly enhanced our knowledge of how to use CDR data to understand human mobility, to produce city-scale OD tables at low cost, and to estimate travel times on road networks. However, due to the complexity of implementation, the predominant approach continues to be trip-based, rather than activity-based.

18 HBW, HBO, and NHB represent home-based work trips, home-based other trips, and non-home-based trips.

19 CTPP is the Census Transportation Planning Products conducted and prepared by the U.S. Department of Transportation Federal Highway Administration.

2.2.1.2 Activity-based Approach

Given the theoretical advantages of an activity-based approach, the development of methods to translate big urban data (particularly CDR data) for travel analysis in an activity-based approach is relevant and important for urban and transportation modeling.

By adopting the concept of "motifs" from complex network theory (Milo et al. 2002), Schneider et al. (2013) examine daily mobility networks from CDR data for Paris over a period of six months and from travel survey data for Paris and Chicago for one or two days. Using a simple method of extracting meaningful activity locations, they retain towers with visit frequencies above a certain threshold as potential stay locations for the analysis. Although the treatment of stay locations in this study is a broad-brush approach, it is groundbreaking because it abstracts human daily mobility in a way similar to the activity-based approach. Schneider et al. find that only seventeen unique motifs are needed to capture 90 percent of the travel patterns observed in both the survey and the phone datasets for the metropolitan areas.
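A daily motif can be thought of as the directed network formed by one person's sequence of stay locations in a day, with the place identities abstracted away. The sketch below is a simplified illustration: it relabels locations by order of first visit rather than performing a full graph-isomorphism test, so it is an assumption-laden stand-in for, not a reproduction of, the procedure in Schneider et al. (2013). The function name and input format are hypothetical.

```python
def daily_motif(stay_sequence):
    """Abstract one day's ordered stay locations into a small directed graph.

    stay_sequence: list of location identifiers in visiting order
        (hypothetical input, e.g. tower or stay-cluster IDs).
    Returns (number_of_nodes, frozenset_of_directed_edges), where nodes are
    relabeled 0, 1, 2, ... by order of first visit. This relabeling is a
    simplification of a true graph-isomorphism comparison.
    """
    labels = {}
    edges = set()
    previous = None
    for location in stay_sequence:
        node = labels.setdefault(location, len(labels))
        # Record a directed edge for each move between distinct locations.
        if previous is not None and previous != node:
            edges.add((previous, node))
        previous = node
    return len(labels), frozenset(edges)
```

Because the result is hashable, the motifs of many user-days can be tallied with `collections.Counter` to obtain the share of days covered by the most frequent motifs.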

Through more careful treatment of stay locations in fine-grained triangulated mobile phone CDR data for one million users in the Boston metropolitan area over two months, Jiang et al. (2013) apply an approach similar to that of Schneider et al. to extract human daily motifs. They report findings similar to the results obtained for Paris and Chicago and propose a probabilistic inference method that uses motifs, time of day, activity sequence, and land use information (such as land use classification and POIs) to further infer activity types and to assign traffic to transportation networks based on the travel generated in this approach.

Widhalm et al. (2015) further implement the idea of inferring activity types (such as home, work, shopping, leisure, and others) for extracted stay locations from mobile phone data and land use data for Boston and Vienna. By using the Relational Markov Network (RMN) method, they infer activity chains from mobile phone data. While Widhalm et al. demonstrate the promise of using fine-grained phone data and additional land use information to infer activity types, they also find that the
