Beyond City Size: Characterizing and predicting

the location of urban amenities

by

Elisa Castaner Ensenat

B.S., M.I.T (2014)

Submitted to the Department of Electrical Engineering and

Computer Science

in partial fulfillment of the requirements for the degree of

Masters of Engineering in Electrical Engineering and Computer

Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2015

© Massachusetts Institute of Technology 2015. All rights reserved.

Author . . . .

Department of Electrical Engineering and Computer Science

May 8, 2015

Certified by . . . .

Cesar A. Hidalgo

Associate Professor

Thesis Supervisor

Accepted by . . . .

Prof. Albert R. Meyer

Chairman, Masters of Engineering Thesis Committee


Beyond City Size: Characterizing and predicting the

location of urban amenities

by

Elisa Castaner Ensenat

Submitted to the Department of Electrical Engineering and Computer Science on May 8, 2015, in partial fulfillment of the

requirements for the degree of

Masters of Engineering in Electrical Engineering and Computer Science

Abstract

Intercity studies have shown that a city’s characteristics, ranging from infrastructure to crime, scale as a power of its population. These studies, however, have not been extended to the intra-city scale, leaving open the question of how urban characteristics are distributed within a city. Here we study the spatial organization of one important urban characteristic: its amenities, such as restaurants, cafes, and libraries. We use a dataset summarizing the position of more than 1.2 million amenities disaggregated into 74 distinct categories and covering 47 U.S. cities to show that: (i) the spatial distribution of amenities within a city is characterized by dense agglomerations of amenities (which we call micro-clusters), (ii) unlike in the intercity case, size is a poor predictor of the amenities of each type that locate in each micro-cluster, and (iii) the number of amenities of each type in a micro-cluster is better predicted using information on the collocation of amenities observed across all micro-clusters than using the micro-cluster’s size. Finally, we use these findings to create a recommendation algorithm that suggests amenities that are missing in a micro-cluster and can inform the efforts of developers and planners looking to construct and regulate the development of new and existing neighborhoods.

Thesis Supervisor: Cesar A. Hidalgo
Title: Associate Professor


Acknowledgments

I would like to thank my supervisor, Cesar Hidalgo, for the patient guidance, encouragement, and advice he has provided me throughout my time as his student. I have been extremely lucky to have a supervisor who cared so much about my work, and who responded to my questions and queries promptly. I would also like to thank the rest of the Macro Connections group at the MIT Media Lab for the support and feedback they have given me throughout the year. The completion of this project wouldn’t have been possible without their help.


Contents

1 Introduction
  1.1 Multi-Centers
2 Data
3 Results
  3.1 From the intercity to the intra-city scale
  3.2 Micro-clusters
  3.3 Intra-city scaling
  3.4 Recommender System
4 Discussion
A Supplementary Material
  A.1 Data
  A.2 Intercity Scaling
  A.3 Clustering
    A.3.1 Effective number of amenities
    A.3.2 Identifying cluster centers
    A.3.3 Assigning points to clusters
  A.5 Predictions

List of Figures

3-1 Intercity Scaling Relations
3-2 Clustering algorithm
3-3 Intercity Scaling Relations
3-4 City micro-agglomerations
3-5 Prediction of amenities in Boston’s micro-clusters
A-1 Clustering Algorithm: Boston, SF, NY


List of Tables

A.1 Merged amenity types
A.2 Total amenity count and amenity categories
A.3 Cities population and amenity count
A.4 Intercity Scaling parameters per amenity type
A.5 $R^2$s of intercity and intra-city models
A.6 AIC and BIC values of intercity and intra-city models


Chapter 1

Introduction

During the last decade the empirical study of cities has been characterized by a strong emphasis on scaling relationships connecting the size of a city, measured by its population, with attributes ranging from the availability of infrastructure to the presence of crime [1, 2]. This growing literature has shown that these scaling relationships hold across cities from different cultures and time periods [2, 3]. Yet, these intercity relationships teach us little about the way in which these attributes are spatially distributed within a city. In fact, one could easily construct a model where attributes follow a random spatial distribution within a city and that also satisfies the intercity scaling relationships documented in the literature. In this paper we add to this literature by bringing the quantitative study of cities to the intra-city scale and by showing the statistical principles that explain the frequency, composition, and location of amenities within a city.

But why is the intra-city scale important? On the one hand, understanding the distribution of amenities is important for the planners and developers who shape cities. Planners need to create urban designs that stimulate the virtuous social interactions that encourage economic activity, reduce levels of crime, and lower traffic congestion [4, 5, 6, 7, 8]. Developers, who construct buildings looking for profits, need to create buildings and units that are attractive to residents and shop owners, and hence need to understand which types of buildings and units are better pre-adapted to the uses that a neighborhood might require. On the other hand, a city’s citizens and visitors can benefit from maps representing the city at a meso-scale. These meso-scale maps can focus on clusters of amenities instead of individual units, helping uncover the presence of neighborhoods with an active urban life. Finally, small business owners may also benefit from a statistical understanding of cities at the intra-city scale, as the empirical laws describing the location of amenities in neighborhoods can be used to uncover instances of unsatisfied demand that shop owners can use to identify new business locations (information that is now only available to large franchising operations, such as Starbucks [9]). So a better understanding of cities at the intra-city scale can benefit both the planners and developers that shape a city and the citizens and visitors who use a city’s streets.

Here, we move beyond the intercity scale and studies focused on the size of cities by looking at data summarizing the precise location of amenities (such as restaurants, cafes, and libraries) within a city. Our contribution consists of two parts. First, we introduce a clustering algorithm to show that the spatial organization of cities is based on hundreds of highly localized micro-clusters of urban activity, and that the size of a micro-cluster is a poor predictor of the number of amenities of a certain type that are present in it. This suggests that, to recover the predictability of the intercity studies (which we reproduce with our data), we need to use information on the types of amenities that are present in each micro-cluster. Our second contribution involves the development of a simple prediction algorithm that exploits information on the patterns of collocation of amenities observed across thousands of micro-clusters. We use this algorithm to identify anomalies in the data, which can represent instances of unsatisfied demand, and use these anomalies to suggest both new amenities for each cluster and the clusters that are in the direst need of a specific amenity. Together these results help extend the study of cities to the intra-city scale, and also open new avenues of research that focus on the composition of amenities at the neighborhood scale.

1.1 Multi-Centers

The idea that highly localized clusters of economic activity characterize the distribution of amenities in a city has a long academic tradition. On the one hand, scholars have conducted empirical studies looking to identify and characterize micro-clusters (or neighborhood scale agglomerations), and on the other hand, we have models that have been used to explain why economic and social activity agglomerates.

On the empirical side, people have used employment densities [10], commuting patterns [11], the floor space used by businesses [10], mobile phone and social media activity [12, 13, 14], and the spatial collocation of commercial units [15] to identify micro-clusters of urban activity. On the theoretical side, people have developed supply side and demand side theories to explain agglomeration. Supply side theories of agglomeration focus on externalities, such as knowledge spillovers [16], shared capacities [17], and transportation costs [18], to explain the collocation of businesses and/or manufacturing activities. Demand side theories focus on the ability of agglomerations to attract shared customers. The quintessential demand side model of agglomeration is Hotelling’s 1929 model, which predicts that similar businesses would collocate to maximize their catchment area. These demand side stories, of course, also apply to businesses that are not necessarily similar but complementary, such as shoe stores and clothing stores, and also explain why businesses that are not closely related, such as car repair shops and ice-cream stores, tend not to collocate. The rise of novel high-resolution data sources summarizing the location of urban amenities, however, allows us to explore the collocation of amenities empirically, helping us both validate these theories and provide new empirical facts that we could use to test new theories and models.


Chapter 2

Data

We collect data from the Google Places API containing the latitude, longitude, and type of amenity (e.g. cafe, restaurant, library), for more than 1.26 million amenities across 47 US cities (see SM for details). Additionally, we collect data on the population of each of these cities by identifying all of the administrative units contained within the area of our amenities data (see SM). For instance, in the case of Boston, our amenities data includes the areas of Cambridge, Somerville, and Brookline, so we estimate the population of the larger city of Boston by summing the populations of these and other administrative units (see SM).

Going forward we use the word city to refer to the naturally occurring urban agglomerations that people refer to colloquially as “cities”, and not to the narrowly defined administrative units that exist within them (i.e. we use Boston to refer to the union of the administrative units of Boston, Cambridge, Brookline, Somerville, Newton, etc., and not to the administrative area controlled by Boston’s City Hall). We adopt this use of the word city because our data involves contiguous areas that transcend individual administrative units.

Certainly the data from the Google Places API is not free of biases and limitations. The amenities data registered in the Google Places API focuses on customer-facing businesses and places of interest (from hair salons and bakeries to airports and cemeteries). Therefore, the Google Places API data fails to include information on other forms of economic activity, such as manufacturing or business-to-business activities. Also, the data might have coding issues, such as having a restaurant registered as a bar. Moreover, businesses that shut down, either because they went broke or relocated, might not be updated in Google Maps, and therefore the data can contain outdated information. Yet, despite these limitations, the Google Places API is accurate enough to be the backbone of the world’s most used mapping service (Google Maps) and is used daily by millions of individuals to find the location of businesses. This makes the Google Places API data an imperfect, yet attractive dataset to study the spatial organization of amenities at the intra-city scale.

Finally, we remind the reader that any results derived in this paper should be interpreted in the narrow context of the data from which these results were derived. This is data from an online mapping service and for U.S. cities only. The question of whether the results presented below can be generalized to other locations, and also, of whether these results hold for other datasets, is beyond the scope of this paper.


Chapter 3

Results

3.1 From the intercity to the intra-city scale

We begin by reproducing the well-known intercity scaling laws of Bettencourt et al. [19, 20, 21] using our urban amenities data. By reproducing these laws we validate our data in the context of intercity research before presenting our intra-city contributions.

Figure 3-1 shows the total number of amenities $Y_c$ in a city as a function of its population. The total number of amenities in a city ($Y_c$) scales sub-linearly with a city’s population $N_c$ as $Y_c = Y_0 N_c^{\beta}$, with $Y_0 = 2.03$ and $\beta = 0.68$ (Fig. 3-1a, $R^2 = 90\%$, p-value $\ll 1 \times 10^{-5}$), matching the scaling laws of Bettencourt et al. [1, 19, 20, 21]. The sub-linearity of this scaling law indicates the presence of scale economies, since it means that the number of per capita amenities in a city decreases with a city’s total population.
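A scaling law of this form is commonly estimated by ordinary least squares on log-transformed values. The sketch below assumes that approach (this section does not state the fitting procedure) and uses synthetic, noise-free data generated from the reported parameters; the function name `fit_power_law` is hypothetical.

```python
import math

def fit_power_law(population, amenities):
    """Fit Y = Y0 * N**beta by least squares on (log N, log Y).

    Returns (Y0, beta, r_squared). Assumes all inputs are positive
    and that the populations are not all identical.
    """
    xs = [math.log(n) for n in population]
    ys = [math.log(y) for y in amenities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space
    log_y0 = mean_y - beta * mean_x       # intercept in log-log space
    r_squared = sxy ** 2 / (sxx * syy)    # squared correlation of the logs
    return math.exp(log_y0), beta, r_squared

# Synthetic cities generated from the parameters reported in the text
# (illustration only, not the thesis data).
populations = [5e5, 1e6, 2e6, 5e6, 1e7]
counts = [2.03 * n ** 0.68 for n in populations]
y0, beta, r2 = fit_power_law(populations, counts)
```

With real, noisy counts the recovered exponent and $R^2$ would of course differ from the generating values.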

Next, we explore the exponent of this scaling relationship for amenities of different types. We find that some amenities, such as museums, religious centers, and art galleries, scale slowly with a city’s population (roughly as the square root of a city’s total population, $\beta \approx 0.5$). Other amenities, such as restaurants, bakeries, and dentists, scale almost linearly with a city’s population ($\beta > 0.8$) (for a summary of all exponents, see SM Table A.4). This diversity of scaling relationships tells us that the composition of amenities in a city changes with a city’s population. For instance, in a city with a population of only half a million people we expect to find, on average, 46 restaurants per museum, but in a city with ten times that population we expect to find almost double that (76 restaurants per museum). These different ratios are direct expressions of the difference in scaling exponents characterizing the dependence of restaurants and museums on a city’s size (Figure 3-1b).
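The jump from 46 to 76 restaurants per museum can be checked against the two exponents reported in the caption of Figure 3-1 ($\beta = 0.81$ for restaurants and $\beta = 0.59$ for museums). The short derivation below assumes both fitted laws hold exactly, and writes $r(N)$ for the restaurants-per-museum ratio (a symbol introduced here for illustration):

```latex
r(N) = \frac{Y_{\mathrm{rest}}(N)}{Y_{\mathrm{mus}}(N)} \propto N^{0.81 - 0.59} = N^{0.22},
\qquad
\frac{r(10N)}{r(N)} = 10^{0.22} \approx 1.66,
\qquad
46 \times 1.66 \approx 76 .
```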

But not all amenities correlate strongly with a city’s population. In fact, the relationship between the number of amenities and a city’s population is noisy for many amenities. To distinguish the amenities that correlate strongly with a city’s population from those that don’t, we use the $R^2$ statistic of the scaling relationship connecting a city’s population with the number of amenities of each type (Figure 3-1c). A high $R^2$ ($R^2 > 0.5$), such as that characterizing the scaling of restaurants, schools, and shoe stores, means that a city’s population is a strong predictor of the number of amenities of that type in a city. A low $R^2$ ($R^2 < 0.5$), such as that characterizing the scaling of museums, embassies, and universities, means that a city’s population is an incomplete predictor of the number of amenities of that type in a city. Note that the observed $R^2$s, but not the scaling exponents, would be almost the same if we were to use the total number of amenities instead of population as a measure of city size, since a city’s population correlates almost perfectly with that city’s total number of amenities (Fig. 3-1a, $R^2 = 90\%$).


Figure 3-1: Intercity Scaling Relations. (a) Scaling of the total number of amenities in a city $c$ ($Y_c$), as a function of a city’s population ($N_c$). The total number of amenities in a city scales as $Y_c = Y_0 N_c^{\beta}$ with $Y_0 = 2.03$, $\beta = 0.68$, and $R^2 = 0.90$. Each point represents one of the 47 US cities in our dataset. (b) Scaling of the total number of restaurants and museums in a city as a function of a city’s population. Each point represents the number of restaurants (yellow) or museums (blue) in a different city. The figure shows that the scaling exponent of restaurants ($\beta = 0.81$) is larger than the scaling exponent of museums ($\beta = 0.59$), meaning that the number of restaurants per museum increases with a city’s population. (c) The scaling exponent (horizontal axis) and the goodness of fit ($R^2$, vertical axis) of the scaling relationship of each amenity type. The horizontal dashed line separates amenities whose number correlates strongly with a city’s population ($R^2 > 0.5$) from those characterized by a milder correlation ($R^2 < 0.5$). The vertical dashed line separates amenities that scale with population faster than the total number of amenities in a city ($\beta > 0.68$) from those that scale slower than that.


3.2 Micro-clusters

But do these scaling relationships hold at the intra-city scale? To explore the intra-city scale we first need to divide the city into meaningful intra-city units. To perform this division we introduce a clustering algorithm that splits cities into micro-clusters, which are spatially localized and bounded agglomerations of amenities. Then, we study the city at the intra-city scale by using micro-clusters as our unit of study. As a measure of the size of a micro-cluster we use the total number of amenities present in it. Switching from cities to micro-clusters as our unit of study will reveal that the size of a micro-cluster, unlike that of a city, is a poor predictor of the number of amenities of each type present in it. Yet, as we will show, we can recover some of the predictability lost when moving to the intra-city scale by using data on the types of amenities that are present in each micro-cluster (controlling for over-fitting with both the Akaike and the Bayesian information criteria).

We begin the spatial clustering of urban amenities by calculating the effective number of amenities present at each location $i$. We define the effective number of amenities at location $i$ ($I_i$) as the number of amenities that can be reached by walking from that location. Formally, the effective number of amenities at location $i$ is the scalar function $I_i$:

$$I_i = \sum_{j=1}^{Y_c} e^{-\gamma d_{ij}}$$

where $d_{ij}$ is the distance (in kilometers) between amenity $i$ and amenity $j$, $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$, and $Y_c$ is the total number of amenities in city $c$. To interpret the values of $I$ it is useful to note that an amenity at the location where the measurement is taking place (i.e. with $d_{ii} = 0$) contributes one to the effective number of amenities at that location. An amenity $j$ at distance $d_{ij} = 1/\gamma$, which would imply walking $1/\gamma$ kilometers from amenity $i$, contributes only $1/e$ to the effective number of amenities of location $i$ ($I_i$). We find that our algorithm finds meaningful clusters when we set $\gamma = 16$, which implies that the contribution of an amenity to the effective number of amenities of a location falls by a factor of $e$ (to roughly 37%) every 62.5 meters and becomes negligible at about 500 meters (for reference, the short side of a Manhattan block is 80 meters long).
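As an illustration of the definition of $I_i$, the following sketch evaluates the sum directly. It assumes amenity positions have already been projected to planar coordinates in kilometers and uses straight-line rather than walking distance; both are simplifications made here, and the function name `effective_amenities` is hypothetical.

```python
import math

def effective_amenities(points_km, gamma=16.0):
    """Return I_i = sum_j exp(-gamma * d_ij) for every point i.

    points_km: list of (x, y) positions in kilometers.
    The sum includes j = i, so every amenity contributes 1 to itself.
    """
    result = []
    for xi, yi in points_km:
        total = 0.0
        for xj, yj in points_km:
            total += math.exp(-gamma * math.hypot(xi - xj, yi - yj))
        result.append(total)
    return result

# Two amenities at the same spot and a third one 1/gamma km (62.5 m) away.
points = [(0.0, 0.0), (0.0, 0.0), (1.0 / 16.0, 0.0)]
I = effective_amenities(points)
# I[0] = 1 + 1 + 1/e and I[2] = 1 + 2/e, as the definition implies.
```

The double loop is quadratic in the number of amenities; a city-scale computation would likely need a spatial index that skips pairs more than roughly 500 meters apart, where the contribution is negligible.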

Figure 3-2 illustrates our clustering algorithm using the city of Boston as an example. The bottom layer (Fig. 3-2a) is a map of Boston used for spatial reference. The center layer (Fig. 3-2b) shows Boston’s effective number of amenities ($I$) for all the locations where an amenity is present. The top layer (Fig. 3-2c) shows the clusters identified using our algorithm (with different colors).

To identify the amenities belonging to each cluster we begin by identifying each local peak on the landscape defined by the effective number of amenities $I$ (Fig. 3-2b) as the center of a potential micro-cluster. We identify these local peaks by searching for locations that have an effective number of amenities $I$ larger than that of their $n$ nearest neighbors (using a functional heuristic to find the $n$ that works best for each $I$; see SM). Then, we assign amenities to a micro-cluster by using the following greedy algorithm: (i) We initialize clusters by assigning to each cluster center all amenities that are in close proximity to it (less than 0.5 km). (ii) We calculate the distance between each unassigned amenity and the amenities that have been assigned to a cluster. (iii) We assign to a cluster only the unassigned amenity that is closest to an amenity that has already been assigned to a cluster. And (iv) we recalculate the distance between assigned and unassigned amenities and repeat steps (iii) and (iv) until all amenities have been assigned to a cluster. An example of the clusters found for the city of Boston is shown in Figure 3-2c (see SM for more examples).
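The peak search and the greedy steps (i) to (iv) can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: it assumes planar coordinates in kilometers, a fixed `n_neighbors` in place of the heuristic described in the SM, and, when an amenity lies within 0.5 km of several centers, assignment to the nearest one (a tie-breaking choice made here). All function names are hypothetical.

```python
import math

def dist(a, b):
    """Straight-line distance between two (x, y) points in km."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def find_peaks(points, I, n_neighbors):
    """Indices whose I is larger than that of their n nearest neighbors."""
    peaks = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        if all(I[i] > I[j] for j in others[:n_neighbors]):
            peaks.append(i)
    return peaks

def assign_clusters(points, centers, radius_km=0.5):
    """Greedy assignment of every point to a cluster (label = center rank)."""
    if not centers:
        raise ValueError("at least one cluster center is required")
    labels = [None] * len(points)
    # Step (i): seed clusters with the amenities close to a center.
    for i, p in enumerate(points):
        d, k = min((dist(p, points[c]), k) for k, c in enumerate(centers))
        if d < radius_km:
            labels[i] = k
    # Steps (ii)-(iv): repeatedly attach the unassigned amenity that is
    # closest to any already-assigned amenity, recomputing each round.
    unassigned = {i for i, lab in enumerate(labels) if lab is None}
    while unassigned:
        _, u, a = min((dist(points[u], points[a]), u, a)
                      for u in unassigned
                      for a in range(len(points)) if labels[a] is not None)
        labels[u] = labels[a]
        unassigned.remove(u)
    return labels

# Toy example: two groups of amenities along a line (positions in km).
points = [(0.0, 0), (0.1, 0), (0.7, 0), (5.0, 0), (5.1, 0), (5.8, 0)]
I = [3, 2, 1, 3, 2, 1]                  # made-up effective numbers
centers = find_peaks(points, I, n_neighbors=2)
labels = assign_clusters(points, centers)
```

Recomputing all distances in every round keeps the sketch close to the description in the text but is slow; a priority queue over candidate pairs would be a natural refinement.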


Figure 3-2: Clustering algorithm. (a) Map of Boston. (b) The number of effective amenities ($I$) at each location where an amenity is present in Boston. Peaks represent locations with a high number of effective amenities and valleys represent locations with a low number of effective amenities. The black dots represent the local maxima identified by our clustering algorithm. These points represent the centers of micro-clusters (for example, Kendall/MIT or the North End). (c) Clusters identified using our clustering algorithm. Each cluster is expressed as a set of dots of the same color, each dot representing an amenity. The center of each cluster is marked using a black dot.


Overall, we find that the clusters identified using this algorithm correspond to well-known centers of urban activity. In the case of Boston these clusters include Harvard Square and Central Square in Cambridge and The North End and Coolidge Corner in Boston, among others.

We also note that the distribution of the effective number of amenities in a city is characterized by some universal properties. Figure 3-3a shows the distribution of the effective number of amenities ($I$) for every city in our dataset, while Figure 3-3b shows the same distribution after normalizing the effective number of amenities in a city by that city’s average ($\langle I \rangle = \sum_i I_i / Y_c$). For comparison, we also show the same distributions for an ensemble of cities where the location of each amenity has been randomized. These randomized cities are characterized by a narrow distribution for their effective number of amenities, meaning that these random cities lack the high concentrations of amenities that indicate the presence of micro-clusters in real cities. More importantly, Figure 3-3b shows that once we normalize the effective number of amenities in a city by that city’s average, all cities follow the same lognormal distribution

$$P\left(\frac{I_i}{\langle I \rangle} = x\right) = \ln\mathcal{N}(\mu, \sigma)$$

with $\mu = -0.404$ and $\sigma = 0.89$. The existence of a universal distribution for the effective number of amenities across all cities in our sample means that all of these cities have the same relative frequency of peaks and valleys of a given magnitude when the magnitude of these peaks and valleys is measured in units of that city’s average.
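One simple way to estimate lognormal parameters of this kind is to take the mean and standard deviation of $\ln(I_i / \langle I \rangle)$. The sketch below assumes that estimator (the thesis does not state which one was used) and the function name is hypothetical. As a consistency check, a lognormal variable with unit mean satisfies $\mu = -\sigma^2/2$, which for $\sigma = 0.89$ gives about $-0.396$, close to the reported $\mu = -0.404$.

```python
import math
import statistics

def normalized_lognormal_params(I):
    """Estimate (mu, sigma) of ln(I_i / <I>) for one city.

    I: list of positive effective numbers of amenities.
    """
    mean_I = sum(I) / len(I)
    logs = [math.log(v / mean_I) for v in I]
    return statistics.fmean(logs), statistics.pstdev(logs)

# Made-up values, for illustration only.
mu, sigma = normalized_lognormal_params([0.5, 1.0, 1.5, 4.0, 9.0])
```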


Figure 3-3: Intercity Scaling Relations. (a) The distribution of the effective number of amenities in each US city. Blue lines show the distribution observed in our urban amenities data and orange lines show the distribution observed after randomizing the location of amenities for each city. (b) The distribution of the effective number of amenities in each US city normalized by the average effective number of amenities in that city. Blue lines show the distribution observed in the cities data and orange lines show the distribution observed in the same cities but after randomizing the location of amenities.


3.3 Intra-city scaling

Now that we have identified micro-clusters for all cities in our data, we analyze whether the scaling relationships that hold at the intercity scale also hold at the scale of micro-clusters (i.e. we test whether the number of amenities of each type in a cluster scales with the size of that cluster). Figure 3-4a compares the scaling relationships observed at the intercity scale with the scaling relationships observed at the intra-city scale for a subset of amenities and two different models (for all amenities see SM Table A.5). In light colors (light blue and vermillion) we show the accuracy of models predicting the number of amenities of a given type in a city or a micro-cluster using only information on that city or cluster’s size. The dark bars (navy and crimson) show the accuracy of a model using information on the composition of amenities in a city or micro-cluster (which we will explain later). The comparison between the size-based models shows that amenities such as schools, doctors, and shoe stores, which correlate strongly with the total number of amenities in a city (average intercity scaling $R^2 > 70\%$), do not scale well with the total number of amenities in a micro-cluster (average intra-city $R^2 < 18\%$). This indicates that the scaling laws observed at the intercity scale fail to hold, for most amenities, at the intra-city scale.

Next, we try to recover some of the predictability lost at the intra-city scale by introducing a model based on the composition of a micro-cluster—the types of amenities present in it.

3.4 Recommender System

We begin the construction of the composition-based model by studying the collocation of pairs of amenities across all clusters. Figure 3-4b shows the network of correlations between pairs of amenities calculated using Spearman’s rank correlation across all clusters. We build the skeleton of this network using a Maximum Spanning Tree algorithm and then add edges between amenities that have a pairwise correlation equal to or larger than 0.3 (see SM for the full correlation matrix) [22]. The network shows that amenities tend to collocate with other amenities of similar types. For example, car repair shops collocate with car dealers (Spearman’s $\rho = 0.45$), religious centers collocate with schools (Spearman’s $\rho = 0.46$), and nightclubs collocate with bars (Spearman’s $\rho = 0.36$). Also, the network shows that amenities sometimes tend to collocate with amenities from different categories. For instance, clothing stores collocate with restaurants and beauty salons (Spearman’s $\rho = 0.52$ and $\rho = 0.45$, respectively). What is more important, however, is that these patterns of collocation suggest that it is possible to create a parsimonious model to predict the number of amenities of a type in a cluster using information on the presence of other amenities in it, since the network indicates that the presence of a set of amenities in a cluster carries information about the presence of other amenities.
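The network construction described above (pairwise Spearman correlations, a maximum spanning tree as skeleton, plus every edge at or above the 0.3 threshold) can be sketched with the standard library alone. The amenity names and counts below are invented for illustration, the helper names are hypothetical, and the sketch assumes no amenity type has identical counts in every cluster (which would make the correlation undefined).

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def collocation_network(counts, threshold=0.3):
    """counts maps amenity type -> list of counts, one per micro-cluster."""
    names = sorted(counts)
    weights = {(a, b): spearman(counts[a], counts[b])
               for i, a in enumerate(names) for b in names[i + 1:]}
    parent = {name: name for name in names}

    def find(name):  # union-find root lookup with path halving
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    edges = set()
    # Kruskal's algorithm on edges sorted from strongest to weakest.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            edges.add((a, b))
    # Add every remaining edge at or above the threshold.
    edges |= {pair for pair, w in weights.items() if w >= threshold}
    return edges, weights

counts = {"bar": [0, 1, 2, 3, 4],
          "night_club": [0, 1, 1, 2, 3],
          "zoo": [4, 0, 3, 1, 2]}
edges, weights = collocation_network(counts)
```

The spanning tree keeps every amenity type connected to the network even when none of its correlations reaches the threshold, which is why "zoo" still receives one edge in this toy example.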

Figure 3-4: Micro-Cluster Composition. (a) Light blue and light red bars correspond, respectively, to the $R^2$ of the predictions obtained using the size of a city (left) and the size of each micro-cluster (right). The dark blue and dark red bars correspond, respectively, to the $R^2$ of the predictions obtained using the composition of cities (left) and the composition of micro-clusters (right). (For all amenities see SM.) (b) The nodes in the network represent different types of amenities and the edges connect amenities that are likely to collocate in a micro-cluster (see SM). The width of the edge connecting a pair of nodes is proportional to the Spearman correlation obtained from the collocation of the two types of amenities across all micro-clusters. The size of a node is proportional to the number of times that an amenity is present in our data set. The color of each node represents the category that the amenity belongs to.

Finally, we use the collocation of amenities in a cluster to create an algorithm that we can use to predict the number of amenities that should locate in each micro-cluster, and a recommender system that we can use to identify micro-clusters where particular amenities are over- or under-supplied. To create this algorithm we need to go beyond pairwise correlations, as the high clustering of the network of collocations (Fig. 3-4b) indicates that the information about the presence of an amenity in a cluster carried by the presence of other amenities is likely to have some redundancy. Going forward, we go beyond pairwise correlations by using a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of the presence of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (see SM). In addition, we validate the models resulting from this forward selection algorithm by using both Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). By using AIC and BIC we ensure that the models that we obtain are not better than the models using size simply because they include more variables.
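To make the role of AIC and BIC concrete, the sketch below uses their usual form for a least-squares regression with Gaussian errors, written up to an additive constant that cancels when two models are compared on the same data. Whether the thesis used exactly this form is an assumption, the function name is hypothetical, and the numbers are invented for illustration.

```python
import math

def aic_bic(rss, n_obs, n_params):
    """AIC and BIC of a least-squares fit, up to a shared constant.

    rss: residual sum of squares, n_obs: number of micro-clusters,
    n_params: number of fitted coefficients (including the intercept).
    Lower values indicate the preferred model.
    """
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_obs)
    return aic, bic

# Hypothetical size-only model (2 coefficients) versus a composition
# model with more predictors (8 coefficients) on the same 1000 clusters.
size_aic, size_bic = aic_bic(rss=830.0, n_obs=1000, n_params=2)
comp_aic, comp_bic = aic_bic(rss=650.0, n_obs=1000, n_params=8)
prefer_composition = comp_bic < size_bic
```

Both criteria penalize every added coefficient, so a composition model is preferred only when its reduction in residual error outweighs the cost of its extra variables.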

The red bars of Figure 3-4a (vermillion and crimson) compare the $R^2$ of the models constructed using the size of micro-clusters with the $R^2$ of the models constructed using the composition of micro-clusters. In most cases (66/74 = 89%), the BIC test chooses the regression using the composition of a micro-cluster over the regression using its size (the exceptions are airports, aquariums, bus stations, car rentals, casinos, convenience stores, gas stations, and zoos). Also, we note that these results are not just statistically significant, but characterized by large effect sizes. On average, for the 66 amenity types for which the composition model works better, the $R^2$ of the composition model is twice that of the model using size only ($R^2 = 17\%$ on average using size vs. $R^2 = 35\%$ on average using composition), meaning that the increase in predictive power obtained by considering the composition of amenities in a cluster is not only statistically significant, but also substantial.

Finally, we use the composition model described above to create a recommender system [22, 24] to suggest amenities that might be missing in an urban cluster. We predict missing amenities by calculating, for each cluster and amenity type, the difference between the number of amenities predicted by the composition model and the number of amenities of that type observed in the cluster.
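The recommendation step reduces to ranking clusters by the gap between predicted and observed counts. The sketch below shows that ranking for a single amenity type; the cluster names and numbers are invented for illustration, and the function name `rank_missing` is hypothetical.

```python
def rank_missing(observed, predicted):
    """Return (cluster, predicted - observed) pairs, largest gap first.

    A large positive gap flags a cluster where the composition model
    expects more amenities of this type than are observed.
    """
    gaps = [(cluster, predicted[cluster] - observed[cluster])
            for cluster in observed]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

# Invented counts of one amenity type in three clusters.
observed = {"cluster_a": 2, "cluster_b": 9, "cluster_c": 5}
predicted = {"cluster_a": 6.5, "cluster_b": 4.0, "cluster_c": 5.2}
ranking = rank_missing(observed, predicted)
```

Running the same ranking across amenity types for a fixed cluster would give the complementary view: which amenities a given cluster appears to lack most.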

Figure 3-5 compares the number of car parks, hotels, and beauty salons, observed and predicted, for each micro-cluster in Boston. Points above the lines, such as Harvard Square in car parks (Figure 3-5a), the North End in hotels (Figure 3-5b), and Central Square in beauty salons (Figure 3-5c), suggest instances of unsatisfied demand. Points below the lines, such as Boston’s Theatre District in car parks, Coolidge Corner in hotels, and Winthrop in beauty salons, suggest instances of excess supply. Of course, these suggestions should not be taken literally. For instance, a decision to build new parking in Harvard Square is a decision that requires considering many aspects of Harvard Square that are not included in our model, such as the aesthetics of its architecture [25, 26] or the externalities caused by cars. Nevertheless, this validation shows that our model automatically captures the under-supply of parking that characterizes Harvard Square (and that is well known to Cambridge residents). Figure 3-5b, on the other hand, shows that our model suggests a lack of hotels in the North End, a well-known tourist spot where only a handful of hotels are present. This could mean that there is great potential for new hotels to locate in Boston’s North End, but once again, this is a decision that would need to incorporate other factors, such as the North End’s famously idiosyncratic architecture and active resident community [4].


Figure 3-5: Prediction of amenities in Boston’s micro-clusters. Observed vs. predicted number of a car parks, b hotels, and c beauty salons for each micro-cluster in Boston. Points above the lines represent micro-clusters where the predicted number of amenities is higher than the observed, suggesting instances of unsatisfied demand (or missing data). Points below the lines represent micro-clusters where the predicted number of amenities is lower than the observed, suggesting instances of excess supply.


Chapter 4

Discussion

In recent years, the quantitative study of cities has focused extensively on intercity studies and, in particular, on intercity scaling laws. These intercity studies, however, do not tell us much about the spatial distribution of a city’s characteristics. In this paper we extended this literature to the intra-city scale by focusing on micro-clusters of urban amenities and by showing that the scaling laws that hold at the intercity scale need to be replaced by multivariate statistical models that exploit information on the composition of micro-clusters to predict the number of amenities of each type present in each micro-cluster.

Of course, our results and models are not free of biases and limitations. Beyond the data biases described above, our model is limited by its simplicity, which bounds the total amount of variance in the presence of amenities that we can explain. Our statistical model predicts the number of amenities that locate in a micro-cluster using regressions without interaction terms. This means that the models could potentially be improved by using more complex functional forms, but also by adding information that is not expressed in the presence of amenities, such as the aesthetic appeal of a neighborhood’s architecture [25, 26], its foot traffic as captured by mobile phone data [27], or the centrality of the urban micro-cluster in the context of the city.

Still, the results and methods presented here point to interesting new avenues of research. For example, time-resolved data sources for both amenities and streetscapes could be used to explore the interaction between the dynamics of the amenities that locate in a micro-cluster and the types of buildings being constructed in it. These results could also help inform which types of business permits should be granted to balance the micro-clusters of a city’s neighborhoods. On the computational side, the information uncovered here could be used to create new meso-scale city maps that help users understand a city’s micro-clusters and that deliver the recommendations for each micro-cluster produced by our algorithm or similar algorithms. Together, our results, and the new avenues of research they open, should help stimulate further quantitative study of the multivariate statistical laws that characterize cities at the intra-city scale.


Appendix A

Supplementary Material

A.1 Data

Amenities Data: We collected data from the Google Places API containing the latitude, longitude and type (cafe, restaurant, library, etc.) of the urban amenities located in 47 US cities. The original data set contains 95 different types of amenities but we merged them into 74 categories by aggregating data on amenities that fulfill similar functions (Table A.1) and excluding amenities that are unspecific (such as the "store" category) or for which little data is available. The amenities we exclude are: taxi stand, campground, store, subway station, RV park, movie rental, and shopping mall. The resulting amenities are shown in Table A.2.

Population Data: We collect data on the population of each city from Wikipedia. Table B.1 shows all the administrative units in each city that overlap with our amenities data, and their population as indicated in Wikipedia. To obtain each city’s population we aggregate the population of each of the administrative units that overlap with our amenities data for that city. The final population of cities and their total number of amenities are shown in Table A.3.


Original Amenities                                                       New Amenities
Hindu temple, Mosque, Place of worship, Synagogue, Church                Religious center
Meal delivery, Meal takeaway, Food, Restaurant                           Restaurant
Health, Doctor                                                           Doctor
Finance, Bank                                                            Finance
Roofing Contractor, Electrician, Plumber, Painter, General Contractor    Construction contractor

Table A.1: The left column shows the amenities that were merged into a new amenity type, shown in the right column.

Amenity                 Points  Category        Amenity                 Points  Category
Accounting              17280   Services        Gym                     5934    Health
Airport                 1535    Transportation  Hardware store          4595    Shopping
Amusement park          1017    Entertainment   Home goods store        29537   Shopping
Aquarium                492     Entertainment   Hospital                7942    Health
Art gallery             5358    Entertainment   Hotel and lodging       11452   Services
ATM                     30753   Services        Insurance agency        27866   Services
Bakery                  9255    Food & Drinks   Jewelry store           6751    Shopping
Bar                     21506   Food & Drinks   Laundry                 14391   Services
Beauty salon            41851   Services        Lawyer                  37611   Services
Bicycle store           1409    Shopping        Library                 3466    Education
Bowling alley           366     Entertainment   Local Government Office 10081   Government
Bus station             110642  Transportation  Locksmith               2182    Services
Cafe                    9485    Food & Drinks   Movie Theater           1232    Entertainment
Car dealer              11603   Services        Moving Company          12744   Services
Car rental              2968    Services        Museum                  2161    Entertainment
Car repair              40215   Services        Night Club              5675    Food & Drinks
Car wash                3202    Services        Park                    25723   Other
Casino                  172     Entertainment   Parking                 5527    Transportation
Cemetery                2386    Other           Pet Store               2270    Shopping
City hall               140     Government      Pharmacy                15204   Shopping
Clothing store          29806   Shopping        Physiotherapist         7929    Health
Construction contractor 86044   Services        Police                  1613    Government
Convenience store       13818   Shopping        Post Office             2723    Services
Courthouse              717     Government      Real Estate Agency      39484   Services
Dentist                 26071   Health          Religious Centers       58468   Other
Department store        3515    Shopping        Restaurant              112430  Food & Drinks
Doctor                  153772  Health          School                  46516   Education
Electronics store       11876   Shopping        Shoe Store              8612    Shopping
Embassy                 688     Government      Spa                     2843    Health
Finance                 32221   Services        Stadium                 1245    Entertainment
Fire station            2050    Government      Storage                 5849    Services
Florist                 5102    Shopping        Train Station           1262    Transportation
Funeral home            2761    Services        Travel Agency           7394    Services
Furniture store         12379   Shopping        University              6597    Education
Gas station             2552    Services        Veterinary Care         5373    Services
Grocery or supermarket  15206   Shopping        Zoo                     114     Entertainment
Total                   1,262,374

Table A.2: Total number of amenities of each type in the Google Places data set for the 47 US cities in our study. The Category column shows the category we assign each amenity type to when we study the collocation of amenities.


City          Population  Number of Amenities  City            Population  Number of Amenities
Atlanta       447,841     19,050               Nashville       737,796     21,619
Austin        885,400     22,592               New Orleans     570,943     14,607
Baltimore     642,587     14,434               New York        8,405,837   75,081
Birmingham    389,250     15,066               Oklahoma        922,506     21,010
Boston        1,121,438   19,769               Orlando         493,524     20,559
Buffalo       258,959     7,409                Philadelphia    1,945,795   40,410
Charlotte     850,880     19,954               Phoenix         2,046,991   39,354
Chicago       3,618,465   64,531               Pittsburgh      466,879     15,714
Cincinnati    453,968     13,818               Portland        609,456     21,043
Cleveland     685,931     18,496               Providence      290,459     6,653
Columbus      1,128,075   27,854               Raleigh         582,834     15,884
Dallas        2,435,949   44,358               Richmond        262,944     9,437
Denver        1,757,830   32,731               Sacramento      767,408     20,372
Detroit       973,284     21,776               Salt Lake       210,806     9,444
Houston       3,362,560   80,011               San Antonio     1,511,307   35,255
Indianapolis  1,468,843   21,296               San Diego       2,297,970   46,614
Jacksonville  1,007,094   20,466               San Francisco   837,442     18,984
Las Vegas     1,850,966   29,009               San Jose        1,472,951   30,868
Los Angeles   6,428,879   114,002              Seattle         622,155     20,514
Louisville    840,601     22,425               St Louis        361,273     12,125
Memphis       832,803     21,350               Tampa           742,583     25,285
Miami         800,216     13,403               Virginia Beach  448,479     10,619
Milwaukee     822,777     20,590               Washington      1,267,943   20,310
Naples        95,796      5,970                Total           60,940,877  1,236,151

Table A.3: Population and total number of amenities of each city.

A.2 Intercity Scaling

We explore the scaling exponent $\beta$ of the scaling relationship $Y_c^k = Y_0 N_c^{\beta}$ between the number of amenities of each type $k$ in a city $c$, $Y_c^k$, and the population of that city, $N_c$, finding that scaling exponents vary greatly across amenity types. Table A.4 shows $Y_0$, $\beta$, and $R^2$ for each amenity type.

Amenity                 Y_0     β      R^2     Amenity                 Y_0     β      R^2
Accounting              0.014   0.727  0.751   Gym                     0.002   0.797  0.869
Airport                 0.003   0.655  0.362   Hardware store          0.009   0.658  0.783
Amusement park          0.000   0.769  0.309   Home goods store        0.028   0.710  0.709
Aquarium                0.000   0.765  0.621   Hospital                0.006   0.725  0.837
Art gallery             0.010   0.653  0.742   Hotel and lodging       0.003   0.797  0.712
ATM                     0.041   0.690  0.695   Insurance agency        0.013   0.763  0.582
Bakery                  0.001   0.847  0.936   Jewelry store           0.003   0.766  0.833
Bar                     0.016   0.728  0.799   Laundry                 0.003   0.814  0.775
Beauty salon            0.025   0.745  0.816   Lawyer                  0.536   0.520  0.627
Bicycle store           0.001   0.733  0.699   Library                 0.005   0.681  0.755
Book store              0.002   0.735  0.829   Liquor store            0.001   0.897  0.775
Bowling alley           0.002   0.589  0.421   Local Government Office 0.094   0.553  0.780
Bus station             18.313  0.320  0.250   Locksmith               0.000   0.861  0.526
Cafe                    0.004   0.767  0.756   Movie Theater           0.001   0.704  0.776
Car dealer              0.015   0.684  0.420   Moving Company          0.009   0.734  0.544
Car rental              0.000   0.839  0.681   Museum                  0.010   0.592  0.692
Car repair              0.024   0.743  0.613   Night Club              0.003   0.762  0.852
Car wash                0.001   0.790  0.601   Park                    0.014   0.751  0.736
Casino                  0.004   0.461  0.035   Parking                 0.010   0.667  0.765
Cemetery                0.006   0.634  0.124   Pet Store               0.000   0.891  0.890
City hall               0.004   0.465  0.285   Pharmacy                0.007   0.763  0.908
Clothing store          0.012   0.772  0.846   Physiotherapist         0.008   0.706  0.830
Construction contractor 0.179   0.656  0.620   Police                  0.001   0.737  0.744
Convenience store       0.054   0.611  0.462   Post Office             0.002   0.736  0.888
Courthouse              0.001   0.710  0.717   Real Estate Agency      0.041   0.704  0.710
Dentist                 0.002   0.902  0.823   Religious Centers       0.156   0.639  0.748
Department store        0.005   0.680  0.519   Restaurant              0.024   0.817  0.945
Doctor                  0.205   0.689  0.818   School                  0.015   0.790  0.925
Electronics store       0.003   0.803  0.790   Shoe Store              0.002   0.827  0.887
Embassy                 0.000   0.838  0.153   Spa                     0.000   0.853  0.687
Finance                 0.024   0.729  0.798   Stadium                 0.003   0.643  0.460
Fire station            0.011   0.583  0.371   Storage                 0.003   0.747  0.454
Florist                 0.002   0.766  0.843   Train Station           0.005   0.558  0.133
Furniture store         0.011   0.716  0.796   University              0.149   0.482  0.225
Gas station             0.005   0.652  0.289   Veterinary Care         0.002   0.764  0.648
Grocery or supermarket  0.006   0.766  0.878   Zoo                     0.009   0.398  0.364

Table A.4: Values of the parameters $Y_0$, $\beta$, and $R^2$ of the scaling relationship between the total number of amenities of each type in a city, $Y_c^k$, and that city’s population, $N_c$, expressed as $Y_c^k = Y_0 N_c^{\beta}$.
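A power law of this form is commonly estimated by ordinary least squares on the logarithms, since $\log Y_c^k = \log Y_0 + \beta \log N_c$. The text does not state which fitting procedure was used, so the following Python sketch is an assumption; the function name `fit_power_law` is hypothetical, and cities with zero amenities of a given type would have to be excluded before taking logarithms.

```python
import math


def fit_power_law(populations, counts):
    """Fit counts = Y0 * populations**beta by least squares in log-log space.

    Returns (Y0, beta, r_squared), where r_squared is computed on the logs.
    All populations and counts must be strictly positive.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    log_y0 = mean_y - beta * mean_x
    ss_res = sum((y - (log_y0 + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return math.exp(log_y0), beta, 1.0 - ss_res / ss_tot


# Synthetic check: data generated exactly from Y = 0.02 * N**0.8.
pops = [1e5, 5e5, 1e6, 5e6]
y0, beta, r2 = fit_power_law(pops, [0.02 * p ** 0.8 for p in pops])
```

On the synthetic data the fit recovers the generating parameters; on real counts, $R^2$ measures how well city size alone explains the number of amenities of a given type.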

A.3 Clustering

A.3.1 Effective number of amenities

We begin our clustering procedure by calculating the effective number of amenities at each location. The effective number of amenities, $I_i$, at a location $i$ represents the number of amenities that can be reached by walking from that location. We define $I_i$ as:

$$I_i = \sum_{j=1}^{Y_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{k} e^{-\gamma d_{ij}} + \sum_{j=k+1}^{Y_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{k} e^{-\gamma d_{ij}} + \epsilon$$

where $d_{ij}$ is the distance (in km) between amenity $i$ and amenity $j$, and $Y_c$ is the total number of amenities in city $c$. $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$. We set $\gamma = 16$, meaning that the contribution of an amenity to the effective number of amenities at a location falls by a factor of $e$ (to roughly 37%) every 62.5 meters and becomes negligible at about 500 meters. To simplify the calculation of the effective number of amenities at a location we use the $k$ closest amenities instead of all $Y_c$. In theory, all of the amenities in a city should contribute to a location’s effective number of amenities, but since amenities that are far from a location are discounted by an exponential factor, considering the contribution of the $k$ closest amenities already gives a good approximation (the neglected remainder is the term $\epsilon$ above). In general, we find that the effective number of amenities for a location does not change after considering the first few hundred amenities, indicating that $k = 2000$ provides a set that is large enough to give a good estimate of a location’s effective number of amenities.
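A minimal Python sketch of this calculation follows. It assumes that amenity positions have already been projected to planar coordinates in kilometers and that the amenity at location $i$ counts itself (distance zero, weight one); neither detail is specified in the text, and the function name `effective_amenities` is hypothetical.

```python
import math


def effective_amenities(i, coords, gamma=16.0, k=2000):
    """Effective number of amenities at the location of amenity i.

    coords: list of (x_km, y_km) planar positions, one per amenity.
    Only the k closest amenities are summed; farther ones are dropped
    because their exponential weights are negligible.
    """
    xi, yi = coords[i]
    dists = sorted(math.hypot(xi - x, yi - y) for x, y in coords)
    return sum(math.exp(-gamma * d) for d in dists[:k])


# With gamma = 16, an amenity 62.5 m away has weight exp(-1) (about 0.37)
# and one 500 m away has weight exp(-8) (about 0.0003).
coords = [(0.0, 0.0), (0.0625, 0.0), (10.0, 0.0)]
value = effective_amenities(0, coords)
```

Sorting all distances for every location costs quadratic time overall; a spatial index would be the natural replacement for city-sized inputs.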

A.3.2 Identifying cluster centers

We continue our clustering procedure by identifying the centers of each micro-cluster as the local peaks of the landscape defined by the effective number of amenities. We identify local peaks by searching for locations whose effective number of amenities, $I_i$, is larger than that of their $n_i$ nearest neighbors. We define $n_i = 3 I_i + 50$, i.e., as a function of the effective number of amenities at location $i$, so that the center of a very dense cluster is required to have a larger $I_i$ than a large number of neighboring amenities, while the center of a very sparse cluster is required to have a larger $I_i$ than only a small number of neighboring amenities. By letting $n_i$ grow with $I_i$ we avoid assigning multiple cluster centers to areas with a high density of amenities, and we avoid leaving areas with a low density of amenities without any cluster center.
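The peak search can be sketched as follows. The sketch assumes planar coordinates in kilometers, truncates $n_i$ to an integer, and uses a strict inequality so that ties do not produce centers; these choices, and the function name `cluster_centers`, are assumptions rather than details given in the text.

```python
import math


def cluster_centers(coords, effective):
    """Return indices of amenities that are local peaks.

    coords:    list of (x_km, y_km) positions, one per amenity
    effective: list of effective numbers of amenities I_i, same order
    Amenity i is a center if I_i exceeds I_j for each of its
    n_i = 3 * I_i + 50 nearest neighbours.
    """
    centers = []
    for i, (xi, yi) in enumerate(coords):
        n_i = int(3 * effective[i] + 50)
        # Other amenities ordered by distance to i (brute force).
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(xi - coords[j][0], yi - coords[j][1]),
        )
        if all(effective[i] > effective[j] for j in others[:n_i]):
            centers.append(i)
    return centers


# Toy example: four amenities, so every amenity is compared with all others.
toy_coords = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (3.0, 0.0)]
toy_centers = cluster_centers(toy_coords, [3.0, 1.0, 2.0, 0.5])
```

Because $n_i$ is at least 50, a peak in a small data set must dominate every other amenity, which is why the toy example yields a single center.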

A.3.3 Assigning points to clusters

Finally, we assign points to micro-clusters using the cluster centers we obtained. First, we remove the 10% of the points in each city with the lowest effective number of amenities, to eliminate isolated amenities that are not part of a micro-cluster. Next, we assign every amenity that is within 0.5 km of a cluster center to that cluster center. Then, we calculate the distance from each unassigned point to each assigned point and iterate the following steps:

1. Find the unassigned point 𝑢 that is closest to an assigned point 𝑎.

2. Assign point 𝑢 to the cluster that point 𝑎 belongs to.

3. Calculate the distance from each unassigned point to the newly assigned point 𝑢.

The algorithm terminates once all points have been assigned to a cluster. Figure A-1 shows the effective number of amenities in the cities of Boston, San Francisco, and New York (left figures), and the corresponding assignments of amenities to clusters (right figures).
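The assignment procedure can be sketched in Python as below. The sketch assumes that the lowest 10% of points have already been removed, that coordinates are planar and in kilometers, and that a point within 0.5 km of several centers goes to the nearest one; the function name `assign_clusters` is hypothetical.

```python
import math


def assign_clusters(coords, centers, radius_km=0.5):
    """Label every amenity with the index of its cluster center.

    coords:  list of (x_km, y_km) positions
    centers: list of indices into coords that are cluster centers
    """
    def dist(a, b):
        return math.hypot(coords[a][0] - coords[b][0], coords[a][1] - coords[b][1])

    label = [None] * len(coords)
    # Seed step: points within the radius of a center join that center.
    for p in range(len(coords)):
        near = [(dist(p, c), c) for c in centers if dist(p, c) <= radius_km]
        if near:
            label[p] = min(near)[1]

    assigned = [p for p, l in enumerate(label) if l is not None]
    unassigned = {p for p, l in enumerate(label) if l is None}
    if unassigned and not assigned:
        raise ValueError("at least one cluster center is required")
    # For each unassigned point, remember its nearest assigned point.
    best = {u: min((dist(u, a), a) for a in assigned) for u in unassigned}

    while unassigned:
        u = min(unassigned, key=lambda p: best[p][0])  # closest to the assigned set
        label[u] = label[best[u][1]]                   # inherit that point's cluster
        unassigned.remove(u)
        for v in unassigned:                           # update distances to the new point
            d = dist(v, u)
            if d < best[v][0]:
                best[v] = (d, u)
    return label


pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.6, 0.0), (5.0, 0.0), (5.2, 0.0)]
labels = assign_clusters(pts, centers=[0, 4])
```

In the example, the points at 1.0 km and 1.6 km lie outside the 0.5 km radius of either center, so they are attached one at a time to the growing cluster around the first center.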

A.4 Collocation of amenities

To study the collocation patterns of amenities, we calculate the Spearman correlation between all pairs of amenities across clusters. We show the resulting correlations in the form of a network, where nodes represent amenity types and edges connect amenities that are highly correlated across micro-clusters. To construct this network we first create a Maximum Spanning Tree (MST) of the network and then add edges only between amenities that have a pairwise correlation equal to or larger than 0.3.

Here, we show the values of all Spearman correlations between amenities across clusters in the form of a matrix (Figure A-2). We cluster amenities using Ward linkages.
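The network construction can be sketched in plain Python as follows. The sketch computes Spearman correlations as Pearson correlations of average ranks, builds the maximum spanning tree with Kruskal's algorithm, and then adds the edges at or above the threshold. The function names and the toy counts are hypothetical, and an amenity type with the same count in every cluster would need to be dropped first, because its correlation is undefined.

```python
def ranks(values):
    """Average ranks (1-based), with ties sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def collocation_network(counts, threshold=0.3):
    """counts: dict amenity type -> list of counts, one entry per cluster."""
    names = sorted(counts)
    pairs = sorted(
        ((spearman(counts[a], counts[b]), a, b)
         for i, a in enumerate(names) for b in names[i + 1:]),
        reverse=True,
    )
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    edges = set()
    for rho, a, b in pairs:  # strongest first gives a maximum spanning tree
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            edges.add((a, b))
    edges.update((a, b) for rho, a, b in pairs if rho >= threshold)
    return edges


toy = {"bar": [1, 2, 3, 4, 5], "cafe": [2, 3, 5, 7, 11], "zoo": [5, 1, 4, 2, 3]}
edges = collocation_network(toy)
```

The spanning tree keeps the network connected even for amenity types whose correlations all fall below the threshold, which is why it is built before the threshold edges are added.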

A.5 Predictions

We construct four regression models to predict the number of amenities of each type at the intercity and intra-city scales using two different metrics: size and composition. At the intercity scale, we predict the number of each type of amenity in a city using the total number of amenities in the city and the composition of amenities in the city. At the intra-city scale, we predict the number of each type of amenity in a micro-cluster using the size of micro-clusters and the composition of amenities in each micro-cluster. We create a model that uses the total number of amenities in a micro-cluster to predict the number of each type of amenity in that micro-cluster. To construct the composition models we use a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (below we explain how we use AIC and BIC to verify our model selection). Table A.5 shows the $R^2$ obtained for each of these models.

Figure A-1: Clustering Algorithm: Boston, SF, NY. The figures on the right show the effective number of amenities at each location in the cities of a Boston, b San Francisco, and c New York. Red lines correspond to areas with a high effective number of amenities and blue lines correspond to areas with a low effective number of amenities. The black dots represent the locations we assign as cluster centers. The figures on the left show the corresponding assignment of amenities to micro-clusters. Each dot represents an amenity, and sets of dots of the same color constitute a micro-cluster.

Figure A-2: Amenities correlations matrix. Matrix showing the Spearman correlation between each pair of amenities. Amenities are clustered using Ward linkages.
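The forward selection procedure can be illustrated with the sketch below. The text specifies only the stopping rule (a p-value above 0.001); the test statistic used here, a partial F-test whose p-value is approximated by the large-sample chi-square distribution with one degree of freedom, is an assumption, as are all function names. The regression is solved through the normal equations in plain Python, which is adequate for small, well-conditioned examples and fails on collinear predictors.

```python
import math


def solve(A, c):
    """Solve A x = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    M = [A[i][:] + [c[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    cols = [[1.0] * n] + columns
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    c = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(A, c)
    return sum((y[t] - sum(b * col[t] for b, col in zip(beta, cols))) ** 2
               for t in range(n))


def forward_select(features, y, alpha=0.001):
    """features: dict name -> column of values. Returns names in order of entry."""
    selected, current, n = [], rss([], y), len(y)
    while len(selected) < len(features):
        # Candidate whose addition leaves the smallest residual sum of squares.
        name, new = min(
            ((f, rss([features[s] for s in selected] + [features[f]], y))
             for f in features if f not in selected),
            key=lambda t: t[1],
        )
        dof = n - len(selected) - 2  # residual degrees of freedom after adding
        if dof <= 0:
            break
        f_stat = float("inf") if new <= 0 else (current - new) / (new / dof)
        # Large-sample approximation: F(1, dof) is close to chi-square(1).
        p_value = math.erfc(math.sqrt(max(f_stat, 0.0) / 2.0))
        if p_value > alpha:
            break
        selected.append(name)
        current = new
    return selected


# Synthetic example: y depends on "a" plus a small deterministic perturbation.
a = [float(i) for i in range(30)]
b = [float((2 * i) % 5) for i in range(30)]
y = [2.0 * a[i] + 0.1 * ((3 * i) % 4 - 1.5) for i in range(30)]
chosen = forward_select({"a": a, "b": b}, y)
```

In the example the predictor that generated the response enters first; whether further predictors enter depends on the threshold.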

Given that these four models use a different number of samples and parameters, we calculate the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of each of the models. These criteria allow us to differentiate the models: the lower the AIC and BIC values, the more desirable the model (better fit and less overfitted). The AIC and BIC values obtained for each model are summarized in Table A.6.
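For a least-squares regression with Gaussian errors, both criteria can be computed from the residual sum of squares, up to an additive constant that does not affect comparisons between models fitted to the same data. The text does not say which variant was used, so the sketch below is an assumption, and the function name `aic_bic` is hypothetical.

```python
import math


def aic_bic(rss, n, k):
    """AIC and BIC of a least-squares fit, up to an additive constant.

    rss: residual sum of squares
    n:   number of samples
    k:   number of fitted parameters
    """
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)


aic, bic = aic_bic(rss=10.0, n=100, k=3)
```

Under these definitions BIC exceeds AIC by $k(\ln n - 2)$, so the two criteria differ only in how strongly they penalize additional parameters. For one parameter and 47 cities this gap is about 1.85, which is consistent with the differences of roughly 1.9 between the intercity size-model AIC and BIC values in Table A.6.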

                          Intercity Scaling         Intra-City Scaling
Amenity                   Size       Composition    Size       Composition
Accounting                0.946      0.985          0.291      0.448
Airport                   0.575      0.816          0.016      0.114
Amusement Park            0.382      0.724          0.002      0.005
Aquarium                  0.709      0.880          0.014      0.028
Art Gallery               0.603      0.930          0.114      0.271
ATM                       0.911      0.967          0.320      0.465
Bakery                    0.777      0.980          0.364      0.543
Bar                       0.649      0.966          0.462      0.750
Beauty Salon              0.952      0.989          0.449      0.615
Bicycle Store             0.594      0.919          0.080      0.183
Book Store                0.878      0.980          0.245      0.344
Bowling Alley             0.478      0.702          0.004      0.014
Bus Station               0.242      0.431          0.023      0.237
Cafe                      0.649      0.956          0.505      0.670
Car Dealer                0.608      0.850          0.003      0.231
Car Rental                0.831      0.942          0.042      0.118
Car Repair                0.867      0.976          0.016      0.437
Car Wash                  0.828      0.970          0.005      0.071
Casino                    0.016      0.000          0.002      0.008
Cemetery                  0.126      0.585          0.001      0.015
City Hall                 0.379      0.449          0.031      0.151
Clothing Store            0.884      0.993          0.298      0.718
Construction Contractor   0.824      0.978          0.135      0.456
Convenience Store         0.629      0.928          0.042      0.134
Courthouse                0.676      0.738          0.088      0.446
Dentist                   0.954      0.974          0.262      0.439
Department Store          0.673      0.945          0.016      0.200
Doctor                    0.957      0.986          0.408      0.694
Electronics Store         0.924      0.966          0.224      0.355
Embassy                   0.102      0.419          0.046      0.114
Finance                   0.953      0.983          0.424      0.610
Fire Station              0.490      0.632          0.018      0.058
Florist                   0.889      0.981          0.207      0.259
Funeral Home              0.476      0.787          0.018      0.146
Furniture Store           0.912      0.980          0.173      0.444
Gas Station               0.443      0.777          0.000      0.028
Grocery or Supermarket    0.791      0.955          0.116      0.377
Gym                       0.911      0.984          0.229      0.339
Hardware Store            0.896      0.953          0.020      0.194
Home Goods Store          0.908      0.986          0.213      0.517
Hospital                  0.958      0.979          0.096      0.546
Hotel and Lodging         0.795      0.824          0.250      0.435
Insurance Agency          0.825      0.981          0.234      0.433
Jewelry Store             0.902      0.978          0.208      0.352
Laundry                   0.933      0.984          0.180      0.354
Lawyer                    0.871      0.894          0.359      0.570
Library                   0.610      0.937          0.180      0.416
Liquor Store              0.753      0.815          0.175      0.301
Local Government Office   0.901      0.937          0.181      0.567
Locksmith                 0.671      0.752          0.033      0.053
Movie Theater             0.780      0.952          0.125      0.190
Moving Company            0.721      0.931          0.012      0.131
Museum                    0.499      0.951          0.221      0.412
Night Club                0.735      0.957          0.326      0.606
Park                      0.669      0.745          0.149      0.320
Parking                   0.666      0.938          0.374      0.610
Pet Store                 0.812      0.943          0.077      0.192
Pharmacy                  0.878      0.949          0.169      0.371
Physiotherapist           0.863      0.931          0.081      0.260
Police                    0.681      0.866          0.052      0.201
Post Office               0.859      0.964          0.090      0.130
Religious Centers         0.744      0.868          0.171      0.430
Restaurant                0.921      0.995          0.659      0.826
School                    0.948      0.976          0.251      0.438
Shoe Store                0.916      0.966          0.153      0.648
Spa                       0.784      0.940          0.182      0.297
Stadium                   0.613      0.749          0.010      0.107
Storage                   0.632      0.912          0.010      0.123
Train Station             0.099      0.414          0.047      0.087
Travel Agency             0.813      0.931          0.292      0.402
University                0.238      0.351          0.020      0.328
Veterinary Care           0.814      0.966          0.020      0.115
Zoo                       0.343      0.680          0.001      0.011

Table A.5: 𝑅2 of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city).

                          Intercity Scale                     Intra-City Scale
                          Size              Comp.             Size                  Comp.
Amenity                   AIC      BIC      AIC      BIC      AIC        BIC        AIC        BIC
Accounting                387.1    389.0    610.2    615.7    7233.6     7240.7     5467.9     5630.0
Airport                   283.6    285.4    534.2    536.0    -14564.4   -14557.4   -14565.1   -14480.5
Amusement Park            252.4    254.3    492.8    496.5    -6923.4    -6916.4    -6940.3    -6933.2
Aquarium                  160.8    162.6    395.3    399.0    -24141.2   -24134.2   -23744.2   -23709.0
Art Gallery               404.3    406.2    600.4    605.9    15448.2    15455.3    13780.7    13893.4
ATM                       458.2    460.1    614.1    617.8    14260.7    14267.7    12446.0    12664.5
Bakery                    437.6    439.5    479.8    483.5    2507.6     2514.7     -349.5     -208.5
Bar                       508.4    510.2    724.1    731.5    20416.0    20423.0    14125.5    14358.1
Beauty Salon              471.4    473.3    625.0    628.7    19820.0    19827.0    16895.3    17113.8
Bicycle Store             268.8    270.6    283.5    285.4    -16203.0   -16196.0   -17083.1   -16970.3
Book Store                278.1    280.0    510.0    517.4    -7584.9    -7577.8    -8461.1    -8313.1
Bowling Alley             132.9    134.8    414.8    418.5    -28639.3   -28632.3   -28784.8   -28756.6
Bus Station               667.5    669.3    735.0    736.9    34335.6    34342.7    34766.9    34936.1
Cafe                      443.1    445.0    461.9    465.6    4533.6     4540.6     1134.4     1338.8
Car Dealer                461.3    463.1    714.4    716.2    10190.0    10197.1    8802.5     8873.0
Car Rental                290.1    291.9    443.9    449.4    -3146.7    -3139.7    -3181.3    -3110.8
Car Repair                521.0    522.8    731.4    738.8    22908.0    22915.1    18230.6    18371.6
Car Wash                  293.8    295.6    460.0    467.4    -11654.7   -11647.6   -11747.7   -11663.2
Casino                    216.3    218.2    443.5    443.5    -35421.5   -35414.5   -35127.2   -35113.1
Cemetery                  341.0    342.9    586.3    590.0    -21285.1   -21278.0   -21423.3   -21402.2
City Hall                 76.2     78.0     293.2    295.0    -37437.7   -37430.7   -38618.9   -38562.5
Clothing Store            500.4    502.2    592.9    600.3    31184.7    31191.8    23911.6    24024.4
Construction Contractor   591.7    593.6    574.4    578.1    23556.0    23563.1    20067.1    20243.3
Convenience Store         455.2    457.1    570.9    572.8    1726.9     1733.9     2246.0     2394.0
Courthouse                160.9    162.8    404.9    406.8    -13810.3   -13803.3   -17901.4   -17788.6
Dentist                   436.7    438.6    714.8    718.5    19519.7    19526.7    17163.9    17311.9
Department Store          327.4    329.2    510.3    514.0    -6448.4    -6441.4    -7219.3    -7064.3
Doctor                    578.7    580.5    607.2    614.6    42180.3    42187.3    36517.8    36672.9
Electronics Store         386.1    388.0    587.0    590.7    3688.0     3695.0     2189.0     2322.9
Embassy                   329.3    331.1    632.8    634.7    -2578.3    -2571.2    -3268.8    -3205.4
Finance                   440.6    442.5    615.9    623.3    20231.0    20238.1    16860.1    17036.3
Fire Station              293.9    295.8    600.6    602.4    -17404.5   -17397.5   -17771.8   -17722.5
Florist                   332.2    334.1    524.5    530.0    -4705.5    -4698.5    -5303.1    -5197.4
Funeral Home              336.3    338.1    568.3    570.1    -10028.7   -10021.7   -10844.2   -10703.3
Furniture Store           389.5    391.4    441.7    447.2    10673.1    10680.1    7599.0     7697.7
Gas Station               333.2    335.0    543.6    545.5    -13926.6   -13919.5   -11923.4   -11867.0
Grocery or Supermarket    467.7    469.6    592.6    596.3    9280.9     9288.0     6500.8     6677.1
Gym                       321.2    323.1    463.7    469.2    -2721.9    -2714.9    -4013.8    -3837.6
Hardware Store            299.0    300.9    516.4    522.0    -8239.8    -8232.8    -9960.1    -9854.4
Home Goods Store          469.2    471.0    645.5    651.0    17169.9    17177.0    13170.9    13290.7
Hospital                  310.3    312.1    428.4    433.9    11386.1    11393.2    5907.4     6027.2
Hotel and Lodging         413.6    415.5    734.4    736.3    12585.9    12592.9    10292.7    10483.0
Insurance Agency          496.8    498.7    692.3    697.9    14861.7    14868.7    12397.0    12538.0
Jewelry Store             353.3    355.1    444.2    447.9    14860.2    14867.3    13143.0    13269.9
Laundry                   400.6    402.4    566.7    570.4    4144.1     4151.2     2146.9     2316.0
Lawyer                    479.9    481.8    728.9    730.8    38846.6    38853.6    35662.3    35831.4
Library                   343.4    345.3    429.1    432.8    -5993.1    -5986.1    -8949.8    -8808.9
Liquor Store              405.4    407.2    632.6    634.4    -1736.3    -1729.2    -2355.6    -2242.8
Local Government Office   331.1    332.9    481.5    487.0    12849.6    12856.6    7505.0     7638.9
Locksmith                 309.2    311.0    542.0    543.9    -14495.9   -14488.8   -14640.9   -14591.5
Movie Theater             223.5    225.4    352.4    356.1    -16822.2   -16815.1   -17422.3   -17337.7
Moving Company            443.6    445.5    628.4    632.1    300.1      307.2      -457.0     -372.4
Museum                    318.3    320.1    385.0    392.4    -4793.4    -4786.3    -6985.0    -6872.2
Night Club                377.0    378.9    545.3    550.8    6321.7     6328.7     1774.4     1922.5
Park                      504.8    506.7    683.4    685.3    11027.2    11034.2    10194.1    10363.3
Parking                   372.0    373.9    596.1    599.8    5373.6     5380.6     1963.1     2153.4
Pet Store                 285.8    287.6    452.0    455.7    -13001.7   -12994.6   -14005.4   -13885.6
Pharmacy                  429.4    431.2    643.7    647.4    7366.7     7373.7     5035.9     5176.9
Physiotherapist           353.1    355.0    667.6    673.1    791.0      798.0      -544.4     -459.9
Police                    252.8    254.6    349.2    352.9    -15255.4   -15248.4   -16701.6   -16602.9
Post Office               269.4    271.3    504.4    508.1    -12492.7   -12485.7   -12755.2   -12670.6
Real Estate Agency        515.7    517.6    668.7    672.4    20820.6    20827.7    19273.1    19449.4
Religious Centers         565.0    566.9    729.1    732.8    24793.2    24800.3    21728.3    21883.4
Restaurant                602.3    604.2    745.4    752.8    32182.1    32189.1    26651.6    26912.4
School                    488.4    490.3    741.3    745.0    15330.2    15337.2    13283.1    13445.2
Shoe Store                363.4    365.2    607.3    611.0    18001.5    18008.6    11416.1    11528.9
Spa                       307.5    309.4    456.4    460.1    -8683.6    -8676.6    -9852.6    -9732.7
Stadium                   225.0    226.8    553.0    554.9    -13695.6   -13688.5   -13931.8   -13875.4
Storage                   394.6    396.4    582.3    586.0    -6999.5    -6992.4    -7697.7    -7606.1
Train Station             334.7    336.6    545.1    547.0    -10105.8   -10098.8   -10424.3   -10389.0
Travel Agency             398.7    400.6    545.0    548.7    3926.0     3933.1     2523.5     2671.5
University                403.4    405.3    557.5    559.3    24047.2    24054.3    21500.1    21627.0
Veterinary Care           336.8    338.6    619.2    624.8    -3348.5    -3341.5    -3679.7    -3538.8
Zoo                       70.8     72.7     144.4    148.1    -43402.7   -43395.6   -42907.2   -42879.0

Table A.6: AIC and BIC values of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city).


Appendix B

Cities Administrative Units and Populations

City Administrative District Population Total City

Population Atlanta Atlanta 447,841 447,841 Austin Austin 885,400 885,400 Baltimore Baltimore 622,104 Arbutus 20,483 Halethorpe N/A 642,587 Birmingham Birmingham 212,237 Vestavia Hills 34,018 Mountain Brook 20,359 Homewood 25,750 Bessemer 27,053 Fultondale 8,752 Gardendale 13,735 Tarrant 6,285 Center Point 16,864 Chalkville 3,829 Trussville 20,368 389,250 Boston Boston 645,966 Quincy 92,271 Milton 27,003

(52)

Dedham 24,729 Brookline 58,732 Somerville 75,754 Cambridge 105,162 Watertown 31,915 Chelsea 35,177 Belmont 24,729 1,121,438 Buffalo Buffalo 258,959 258,959 Charlotte Charlotte 792,862 Mint Hill 23,341 Matthews 27,198 Pineville 7,479 850,880 Chicago Chicago 2,718,782 Lincolnwood 12,590 Park Ridge 37,480 Rosemont 4,202 Schiller Park 11,793 Norridge 14,572 Hardwood Heights 8,612 Bensenville 18,352 Franklin Park 18,333 River Groove 10,227 Elmwood Park 24,883 Northlake 12,323 Stone Park 4,946 Melrose Park 25,411 River Forest 11,172 Oak Park 51,878 Maywood 24,090 Bellwood 19,071 Berkeley 5,209 Hillside 8,193 Forest Park 14,167 Broadview 7,932 Westchester 16,718 North Riverside 6,672 Berwyn 56,800 Cicero 84,103 La Grange Park 13,579 Riverside 8,875 Brookfield 18,978 Lyons 10,729 Stickney 6,786

(53)

Forest View 698 La Grange 15,550 Western Springs 12,975 Hinsdale 16,816 Mc Cook 228 Summit 11,054 Countryside 5,895

Indian Head Park 3,809

Hodgkins 1,897 Burr Ridge 10,559 Palos Park 4,847 Palos Heights 12,515 Crestwood 10,950 Willow Springs 5,524 Justice 12,926 Bedford Park 580 Bridgeview 16,446 Hickory Hills 14,049 Palos Hills 17,484 Chicago Ridge 14,305 Worth 10,789 Hometown 4,349 Oak Lawn 56,690 Evergreen Park 19,852 Alsip 19,277 Merrionette Park 1,900 Robbins 5,337 Blue Island 23,706 3,618,465 Cincinati Cincinnati 296,943 Delhi 29,510 Covedale 6,447 Mack 11,585 Bridgetown North 12,569 Dent 10,497 Cheviot 8,375 Monfort Heights 11,948 White Oak 19,167

North College Hill 9,397

Groesbeck 6,788 Finneytown 12,741 Amberley 3,585 Deer Park 5,736 Kenwood 6,981 Fairfax Mariemont 1,699 453,968

(54)

Cleveland Cleveland 396,815 Cleveland Heights 46,121 University Heights 13,539 Shaker Heights 28,448 Maple Heights 23,138 Garfield Heights 28,849 Parma 81,601 Brook Park 19,212 Brooklyn 11,169 Rooky River 20,213 Fairview Park 16,826 685,931 Columbus Columbus 787,033 Westerville 36,120 Huber Ridge 4,883 Worthington 13,575 Dublin 41,751 Hilliard 28,435 Upper Arlington 33,771 Marble Cliff 573 Grandview Heights 6,536 Lincoln Village 9,482 Urbancrest 960 Grove City 36,832 Obetz 4,628 Groveport 5,540 Blacklick Estates 9,518 Reynoldsburg 36,347 Bexley 13,057 Whitehall 18,062 Gahanna 33,248 New Albany 7,724 1,128,075 Dallas Dallas 1,197,816 Richardson 103,297 Garland 226,876 Farmers Branch 28,616 Carrollton 126,700 Irving 228,653 University Park 23,068 Highland Park 8,564 Grand Prairie 175,396 Duncanville 38,524 Hutchins 5,338 Seagoville 14,835 Balch Springs 23,728

(55)

Mesquite 139,824 Sunnyvale 5,130 Rowlett 56,199 Sachse 20,329 Addison 13,056 2,435,949 Denver Denver 649,495 Glendale 4,184 Englewood 30,255 Sheridian 5,664

Cherry Hills village 5,987

Greenwood Village 13,925 Littleton 41,737 Lakewood 142,980 Edgewater 5,170 Wheat Ridge 30,166 Arvada 111,707 Berkley 11,207 Twin Lakes 171 Westminster 106,114 Sherrelwood 18,287 Welby 14,846 Commerce City 45,913 Derby 7,685 Thornton 118,772 Federal Heights 11,973 Northglenn 35,789 Aurora 345,803 1,757,830 Detroit Detroit 681,090 Lincoln Park 38,144 Dearborn 95,884 Melvindale 10,525 Dearborn Heights 57,774 Highland Park 11,629 Hamtramck 22,423

Grosse Pointe Woods 15,838

Harper Woods 13,990

Grosse Pointe Farms 9,316

Grosse Pointe 5,326

Grosse Pointe Park 11,345

973,284

Houston Houston 2,195,914

Seabrook 11,952

Kemah 3,334

(56)

Friendswood 35,805 Pearland 108,715 Fresno 19,069 Sugar Land 83,860 Greatwood 6,640 Rosenberg 31,676 Richmond 11,081 Pecan Grove 15,881 Mission Bend 36,501 Cinco Ranch 18,274 Katy 14,102 Cypress 122,803 Jersey Village 7,620

Hunters Creek Village 4,367

Bellaire 16,855 Spring 54,298 Aldine 15,869 Tomball 10,753 Humble 15,133 Porter 25,627 Atascocita 65,844 Huffman 12,116 Crosby 2,299 Highlands 7,522 Channelview 38,289 Jacinto City 10,553 Galena Park 10,887 Deer Park 32,010 La Porte 33,800 Pasadena 149,043 South Houston 16,983 Sheldon 1,990 Barrett 3,199 Cloverleaf 22,942 Four Corners 2,954 Meadows Place 4,660 Missouri City 67,358 Fifth Street 2,059 Brookside Village 1,523 3,362,560 Indianapolis Indianapolis 843,393 Lawrence 46,001 Beech Grove 14,192 Warren 1,239 Franklin Township 54,594 Perry Township 108,972

(57)

Decatur 9,362 Speedway 11,930 Wayne 136,828 Camby 32,388 Pike Township 77,895 Washington township 132,049 1,468,843 Jacksonville Jacksonville 821,784 Lakeside 30,943 Orange Park 8,412 Oakleaf Plantation 20,315 Bellair-Meadowbrook Ter-race 13,343 Atlantic Beach 12,895 Neptune Beach 7,124 Jacksonville Beach 21,823

Ponte Vedra Beach 37,924

Sawgrass 4,942 Palm Valley 19,860 Baldwin 1,430 Nassau Village-Ratliff 5,337 Callahan 962 1,007,094

Las Vegas Las Vegas 583,736

North Las Vegas 216,961

Whitney 38,585 Winchester 27,978 Paradise 223,167 Henderson 257,729 Spring Valley 178,395 Summerlin South 24,085 Enterprise 108,481 Nellis AFB 2,187 Sunrise Manor 189,372 Blue Diamond 290 1,850,966

Los Angeles: Los Angeles 3,884,307; Santa Monica 89,736; Marina del Rey 8,866; Beverly Hills 34,290; Culver City 38,883; Inglewood 109,673; Burbank 103,340; La Crescenta-Montrose 19,653; La Canada Flintridge 20,246; Glendale 196,021; Pasadena 137,122; East Los Angeles 126,496; South Pasadena 25,619; San Marino 13,147; Vernon 112; Huntington Park 58,114; Bell 35,477; Bell Gardens 42,072; Florence-Graham 63,387; South Gate 94,396; Lynwood 69,772; Compton 96,455; Willowbrook 35,983; Long Beach 462,257; Carson 91,714; West Carson 21,699; View Park-Windsor Hills 11,075; Westmont 31,853; Lennox 22,753; Hawthorne 84,293; Gardena 58,829; El Segundo 16,654; Manhattan Beach 35,135; Redondo Beach 66,748; Torrance 147,478; Lomita 20,256; Rolling Hills 1,860; Palos Verdes Peninsula; Rancho Palos Verdes 41,643; Signal Hill 11,465. Total: 6,428,879

Louisville: Louisville 609,893; New Albany 36,372; Clarksville 21,724; Jeffersonville 44,953; Oak Park 5,379; Buckner 4,000; Crestwood 1,999; Mt Washington 9,117; Hillview 8,172; Brooks 2,401; Shepherdsville 11,222; Shively 15,157; St Matthews 15,852; Lyndon 11,002; Northfield 970; Rolling Hills 907; Anchorage 2,264; Middletown 7,218; Hurstbourne 3,884.

Memphis: Memphis 653,450; West Memphis 26,245; Bartlett 55,055; Lakeland 12,430; Germantown 39,161; Collierville 46,462. Total: 832,803

Miami: Miami 419,777; Coral Gables 49,631; Coral Terrace 24,380; West Miami 5,965; Miami Springs 13,809; Gladeview 14,468; Hialeah 224,669; West Little River 34,699; El Portal 2,325; Miami Shores 10,493. Total: 800,216

Milwaukee: Milwaukee 599,164; Shorewood 13,162; Whitefish Bay 14,137; Glendale 12,872; Brown Deer 12,088; Bayside 4,411; Wauwatosa 47,068; West Allis 60,732; Greenfield 37,072; Greendale 14,325; Hales Corners 7,746. Total: 822,777

Naples: Naples 19,537; Vineyards 3,375; Golden Gate 23,961; Lely 3,451; Naples Manor 5,562; Lely Resort 4,646; Pelican Bay 6,346; Naples Park 5,967; East Naples 22,951. Total: 95,796

Nashville: Nashville 626,681; Ashland City 4,541; Millersville 7,471; Goodlettsville 16,813; Hendersonville 54,068; Mt Juliet 28,222. Total: 737,796

New Orleans: New Orleans 378,715; Marrero 33,141; Harvey 20,348; Gretna 17,736; Terrytown 23,319; Timberlane 10,243; Arabi 8,093; Chalmette 17,119; Meraux 10,192; Violet 8,555; St Bernard 43,482. Total: 570,943

New York: New York 8,405,837. Total: 8,405,837

Oklahoma: Oklahoma 610,613; Mustang 17,395; Yukon 22,709; Bethany 19,563; Piedmont 5,720; Edmond 81,405; The Village; Nichols Hills 3,710; Moore 55,081; Del City 21,332; Midwest City 54,371; Spencer 3,746; Jones 2,517; Choctaw 15,205; Harrah 5,095; McLoud 4,044. Total: 922,506

Orlando: Orlando 244,483; Clarcona 2,990; Pine Hills 60,076; Orlovista 6,123; Doctor Phillips 10,981; Williamsburg 7,646; Hunters Creek 14,321; Oak Ridge 22,685; Pine Castle 10,805; Conway 13,467; Belle Isle 5,988; Taft 2,205; Meadow Woods 25,558; Azalea Park 12,556; Winter Park 29,203; Goldenrod 12,039; Eatonville 2,159; Fairview Shores 10,239. Total: 493,524

Philadelphia: Philadelphia 1,553,165; Westville 4,288; Gloucester City 11,402; Mt Ephraim 4,676; Bellmawr 11,540; Barrington 6,983; Haddonfield 11,507; Collingswood 13,850; Camden 76,903; Cherry Hill 71,722; Pennsauken Township 35,830; Maple Shade Township 19,043; Riverton 2,779; Cinnaminson 16,763; Cheltenham 4,810; Glenside 8,384; Abington 55,234; Wyncote 3,044; Jenkintown 4,422; Rockledge 2,550; Flourtown 4,538; Wyndmoor 5,498; Plymouth Meeting 6,177; Darby 10,687. Total: 1,945,795

Phoenix: Phoenix 1,445,632; Tolleson 6,756; Glendale 226,721; Peoria 162,592; Sun City 37,499; Tempe 161,719; Guadalupe 6,072. Total: 2,046,991

Pittsburgh: Pittsburgh 305,841; Homestead 3,165; Whitaker 1,271; Munhall 11,380; Brentwood 9,643; Whitehall 13,938; Castle Shannon 8,316; Mt Oliver 3,403; Dormont 8,593; Scott Township 17,024; Green Tree 4,431; Carnegie 7,972; Ingram 3,330; Crafton 5,951; Rosslyn Farms 427; McKees Rocks 6,104; Stowe Township 6,362; Avalon 4,705; Bellevue 8,370; Reserve Township 3,333; Millvale 3,744; Sharpsburg 3,446; Aspinwall 2,801; Wilkinsburg 15,930; Edgewood 3,118; Rankin 2,122; Braddock 2,159. Total: 466,879

Portland: Portland 609,456. Total: 609,456

Providence: Providence 177,994; North Providence 32,078; Cranston 80,387. Total: 290,459

Raleigh: Raleigh 431,746; Cary 151,088. Total: 582,834

Richmond: Richmond 214,114; Bon Air 16,366; Bensley 5,819; East Highland Park 14,796; Lakeside 11,849. Total: 262,944

Sacramento: Sacramento 475,122; Rio Linda 15,106; North Highlands 42,694; Arden-Arcade 92,186; La Riviera 10,802; Rosemont 22,681; Parkway-South Sacramento 36,468; Florin 47,513; Vineyard 24,836. Total: 767,408

Salt Lake: Salt Lake 186,440; South Salt Lake 24,366. Total: 210,806

San Antonio: San Antonio 1,409,019; Somerset 1,550; Macdona 559; Helotes 7,341; Leon Valley 10,151; Terrell Hills 4,878; Castle Hills 4,116; Kirby 8,673; Shavano Park 3,035; Windcrest 5,364; Converse 18,198; Live Oak 9,156; Universal City N/A; Adkins N/A; Cibolo 19,580; Northcliff 1,819; Garden Ridge 1,882; Fair Oaks Ranch 5,986. Total: 1,511,307

San Diego: San Diego 1,345,895; Chula Vista 243,916; National City 58,582; Bonita 12,538; La Presa 34,126; Coronado 24,697; Spring Valley 28,205; La Mesa 57,065; Rancho San Diego 21,208; El Cajon 99,478; Santee 53,413; Granite Hills 3,035; Winter Gardens 20,631; Lakeside 20,648; Poway 47,811; Fairbanks Ranch 3,148; Rancho Santa Fe 3,117; Encinitas 59,518; Solana Beach 12,867; Del Mar 4,161; Escondido 143,911. Total: 2,297,970

San Francisco: San Francisco 837,442. Total: 837,442

San Jose: San Jose 1,000,536; Sunnyvale 140,081; Santa Clara 116,468; Fruitdale 935; Campbell 39,349; Saratoga 29,926; Los Gatos 29,413; Morgan Hill 37,882; East Foothills 8,269; Milpitas 70,092. Total: 1,472,951

Seattle: Seattle 608,660; White Center 13,495. Total: 622,155

St Louis: St Louis 319,294; Castle Point 3,962; Bellefontaine Neighbors 10,828; Jennings 14,712; Normandy 5,008; Northwoods 4,208; Pine Lawn 3,261. Total: 361,273

Tampa: Tampa 347,645; Town ’N’ Country 78,442; Egypt Lake-Leto 35,282; Greater Carrollwood N/A; Lake Magdalene 28,509; Cheval 10,702; Greater Northdale 22,079; Lutz 19,344; Thonotosassa 13,014; Temple Terrace 24,541; Del Rio N/A; Mango 11,313; Seffner 7,579; Brandon 103,483; Palm River-Clair Mel 21,024; Progress Village 5,392; Gibsonton 14,234. Total: 742,583

Virginia Beach: Virginia Beach 448,479; Greenbrier East N/A.

Washington: Washington 658,893; Bethesda 63,374; Silver Spring 76,716; Friendship Village 4,512; Takoma Park 16,715; Hyattsville 17,865; Coral Hills 9,895; Suitland-Silver Hill 33,515; Hillcrest Heights 16,469; Marlow Heights 5,618; Temple Hills 7,852; Alexandria 148,892; Arlington 207,627. Total: 1,267,943

Table B.1: Administrative units that overlap with our amenities data and their respective populations, taken from Wikipedia.
