Beyond City Size: Characterizing and predicting
the location of urban amenities
by
Elisa Castaner Ensenat
B.S., M.I.T (2014)
Submitted to the Department of Electrical Engineering and
Computer Science
in partial fulfillment of the requirements for the degree of
Masters of Engineering in Electrical Engineering and Computer
Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . .
Department of Electrical Engineering and Computer Science
May 8, 2015
Certified by . . . .
Cesar A. Hidalgo
Associate Professor
Thesis Supervisor
Accepted by . . . .
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
Abstract
Intercity studies have shown that a city’s characteristics, ranging from infrastructure to crime, scale as a power of its population. These studies, however, have not been extended to the intra-city scale, leaving open the question of how urban characteristics are distributed within a city. Here we study the spatial organization of one important urban characteristic: its amenities, such as restaurants, cafes, and libraries. We use a dataset summarizing the position of more than 1.2 million amenities disaggregated into 74 distinct categories and covering 47 U.S. cities to show that: (i) the spatial distribution of amenities within a city is characterized by dense agglomerations of amenities (which we call micro-clusters), (ii) unlike in the intercity case, size is a poor predictor of the amenities of each type that locate in each micro-cluster, and (iii) the number of amenities of each type in a micro-cluster is better predicted using information on the collocation of amenities observed across all micro-clusters than using the micro-cluster’s size. Finally, we use these findings to create a recommendation algorithm that suggests amenities that are missing in a micro-cluster and can inform the efforts of developers and planners looking to construct and regulate the development of new and existing neighborhoods.
Thesis Supervisor: Cesar A. Hidalgo Title: Associate Professor
Acknowledgments
I would like to thank my supervisor, Cesar Hidalgo, for the patient guidance, encouragement, and advice he has provided me throughout my time as his student. I have been extremely lucky to have a supervisor who cared so much about my work, and who responded to my questions and queries promptly. I would also like to thank the rest of the Macro Connections group at the MIT Media Lab for the support and feedback they have given me throughout the year. The completion of this project wouldn’t have been possible without their help.
Contents
1 Introduction
1.1 Multi-Centers
2 Data
3 Results
3.1 From the intercity to the intra-city scale
3.2 Micro-clusters
3.3 Intra-city scaling
3.4 Recommender System
4 Discussion
A Supplementary Material
A.1 Data
A.2 Intercity Scaling
A.3 Clustering
A.3.1 Effective number of amenities
A.3.2 Identifying cluster centers
A.3.3 Assigning points to clusters
A.5 Predictions
List of Figures
3-1 Intercity Scaling Relations
3-2 Clustering algorithm
3-3 Intercity Scaling Relations
3-4 City micro-agglomerations
3-5 Prediction of amenities in Boston’s micro-clusters
A-1 Clustering Algorithm: Boston, SF, NY
List of Tables
A.1 Merged amenity types
A.2 Total amenity count and amenity categories
A.3 Cities population and amenity count
A.4 Intercity Scaling parameters per amenity type
A.5 $R^2$ values of intercity and intra-city models
A.6 AIC and BIC values of intercity and intra-city models
Chapter 1
Introduction
During the last decade the empirical study of cities has been characterized by a strong emphasis on scaling relationships connecting the size of a city, measured by its population, with attributes ranging from the availability of infrastructure to the presence of crime [1, 2]. This growing literature has shown that these scaling relationships hold across cities from different cultures and time periods [2, 3]. Yet, these intercity relationships teach us little about the way in which these attributes are spatially distributed within a city. In fact, one could easily construct a model where attributes follow a random spatial distribution within a city and that also satisfies the intercity scaling relationships documented in the literature. In this paper we add to this literature by bringing the quantitative study of cities to the intra-city scale and by showing the statistical principles that explain the frequency, composition, and location of amenities within a city.
But why is the intra-city scale important? On the one hand, understanding the distribution of amenities is important for the planners and developers who shape cities. Planners need to create urban designs that stimulate the virtuous social interactions that encourage economic activity, reduce levels of crime, and lower traffic congestion [4, 5, 6, 7, 8]. Developers, who construct buildings looking for profits, need to create buildings and units that are attractive to residents and shop owners, and hence need to understand which types of buildings and units are better pre-adapted to the uses that a neighborhood might require. On the other hand, a city’s citizens and visitors can benefit from maps representing the city at a meso-scale. These meso-scale maps can focus on clusters of amenities instead of individual units, helping uncover the presence of neighborhoods with an active urban life. Finally, small business owners may also benefit from a statistical understanding of cities at the intra-city scale, as the empirical laws describing the location of amenities in neighborhoods can be used to uncover instances of unsatisfied demand that shop owners can use to identify new business locations (information that is now only available to large franchising operations, such as Starbucks [9]). So a better understanding of cities at the intra-city scale can benefit both the planners and developers who shape a city and the citizens and visitors who use a city’s streets.
Here, we move beyond the intercity scale and studies focused on the size of cities by looking at data summarizing the precise location of amenities (such as restaurants, cafes, and libraries) within a city. Our contribution consists of two parts. First, we introduce a clustering algorithm to show that the spatial organization of cities is based on hundreds of highly localized micro-clusters of urban activity, and that the size of a micro-cluster is a poor predictor of the number of amenities of a certain type that are present in it. This suggests that, to recover the predictability of the intercity studies (which we reproduce with our data), we need to use information on the types of amenities that are present in each micro-cluster. Our second contribution involves the development of a simple prediction algorithm that exploits information on the patterns of collocation of amenities observed across thousands of micro-clusters. We use this algorithm to identify anomalies in the data, which can represent instances of unsatisfied demand, and use these anomalies to suggest both new amenities for each cluster and the clusters that are in the direst need of a specific amenity. Together these results help extend the study of cities to the intra-city scale and open new avenues of research that focus on the composition of amenities at the neighborhood scale.
1.1 Multi-Centers
The idea that highly localized clusters of economic activity characterize the distribution of amenities in a city has a long academic tradition. On the one hand, scholars have conducted empirical studies looking to identify and characterize micro-clusters (or neighborhood scale agglomerations), and on the other hand, they have developed models to explain why economic and social activity agglomerates.
On the empirical side, researchers have used employment densities [10], commuting patterns [11], the floor space used by businesses [10], mobile phone and social media activity [12, 13, 14], and the spatial collocation of commercial units [15] to identify micro-clusters of urban activity. On the theoretical side, researchers have developed supply side and demand side theories to explain agglomeration. Supply side theories of agglomeration focus on externalities, such as knowledge spillovers [16], shared capacities [17], and transportation costs [18], to explain the collocation of businesses and/or manufacturing activities. Demand side theories focus on the ability of agglomerations to attract shared customers. The quintessential demand side model of agglomeration is Hotelling’s 1929 model, which predicts that similar businesses would collocate to maximize their catchment area. These demand side stories, of course, also apply to businesses that are not necessarily similar but complementary, such as shoe stores and clothing stores, and they also explain why businesses that are not closely related, such as car repair shops and ice-cream stores, tend not to collocate. The rise of novel high-resolution data sources summarizing the location of urban amenities, however, allows us to explore the collocation of amenities empirically, helping us both validate these theories and provide new empirical facts that we could use to test new theories and models.
Chapter 2
Data
We collect data from the Google Places API containing the latitude, longitude, and type of amenity (e.g., cafe, restaurant, library), for more than 1.26 million amenities across 47 US cities (see SM for details). Additionally, we collect data on the population of each of these cities by identifying all of the administrative units contained within the area of our amenities data (see SM). For instance, in the case of Boston, our amenities data includes the areas of Cambridge, Somerville, and Brookline, so we estimate the population of the larger city of Boston by summing the populations of these and other administrative units (see SM).
Going forward we use the word city to refer to the naturally occurring urban agglomerations that people refer to colloquially as ’cities’, and not to the narrowly defined administrative units that exist within them (i.e. we use Boston to refer to the union of the administrative units of Boston, Cambridge, Brookline, Somerville, Newton, etc., and not to the administrative area controlled by Boston’s City Hall). We adopt this use of the word city because our data involves contiguous areas that transcend individual administrative units.
Certainly the data from the Google Places API is not free of biases and limitations. The amenities data registered in the Google Places API focuses on customer-facing businesses and places of interest (from hair salons and bakeries to airports and cemeteries). Therefore, the Google Places API data fails to include information on other forms of economic activity, such as manufacturing or business-to-business activities. Also, the data might have coding issues, such as having a restaurant registered as a bar. Moreover, businesses that shut down, either because they went broke or relocated, might not be updated in Google Maps, and therefore the data can contain outdated information. Yet, despite these limitations, the Google Places API is accurate enough to be the backbone of the world’s most used mapping service (Google Maps) and is used daily by millions of individuals to find the location of businesses. This makes the Google Places API data an imperfect, yet attractive dataset to study the spatial organization of amenities at the intra-city scale.
Finally, we remind the reader that any results derived in this paper should be interpreted in the narrow context of the data from which these results were derived. This is data from an online mapping service and for U.S. cities only. The question of whether the results presented below can be generalized to other locations, and also, of whether these results hold for other datasets, is beyond the scope of this paper.
Chapter 3
Results
3.1 From the intercity to the intra-city scale
We begin by reproducing the well-known intercity scaling laws of Bettencourt et al. [19, 20, 21] using our urban amenities data. By reproducing these laws we validate our data in the context of intercity research before presenting our intra-city contributions.
Figure 3-1 shows the total number of amenities $Y_c$ in a city as a function of its population. The total number of amenities in a city ($Y_c$) scales sub-linearly with a city’s population $N_c$ as $Y_c = Y_0 N_c^{\beta}$, with $Y_0 = 2.03$ and $\beta = 0.68$ (Fig. 3-1a, $R^2 = 90\%$, p-value $\ll 1 \times 10^{-5}$), matching the scaling laws of Bettencourt et al. [1, 19, 20, 21]. The sub-linearity of this scaling law indicates the presence of scale economies, since it means that the number of amenities per capita in a city decreases with a city’s total population.
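A power law of the form $Y_c = Y_0 N_c^{\beta}$ becomes a straight line on log-log axes, so one common way to estimate $Y_0$ and $\beta$ is ordinary least squares on the logarithms. The sketch below illustrates that approach under stated assumptions: the function name `fit_power_law` and the synthetic city populations are hypothetical, and the text does not specify which fitting procedure was actually used.

```python
import math

def fit_power_law(populations, counts):
    """Fit counts = Y0 * population**beta by least squares in log-log space.

    Returns (Y0, beta, r_squared); r_squared is computed on the logarithms.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    beta = sxy / sxx                     # slope of the log-log line
    log_y0 = mean_y - beta * mean_x      # intercept of the log-log line
    r_squared = sxy ** 2 / (sxx * syy)
    return math.exp(log_y0), beta, r_squared

# Synthetic, noise-free cities generated from the reported parameters
# (Y0 = 2.03, beta = 0.68); the populations are invented for illustration.
populations = [5e5, 1e6, 2e6, 5e6, 1e7]
counts = [2.03 * n ** 0.68 for n in populations]
y0, beta, r2 = fit_power_law(populations, counts)
```

On noise-free data the fit recovers the generating parameters; with real city counts the same function would return the fitted exponent together with the $R^2$ used to judge how well population predicts the number of amenities.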
Next, we explore the exponent of this scaling relationship for amenities of different types. We find that some amenities, such as museums, religious centers, and art galleries, scale slowly with a city’s population (roughly as the square root of a city’s total population, $\beta \approx 0.5$). Other amenities, such as restaurants, bakeries, and dentists, scale almost linearly with a city’s population ($\beta > 0.8$; for a summary of all exponents, see SM Table A.4). This diversity of scaling relationships tells us that the composition of amenities in a city changes with a city’s population. For instance, in a city with a population of only half a million people we expect to find, on average, 46 restaurants per museum, but in a city with ten times that population we expect to find almost double that (76 restaurants per museum). These different ratios are direct expressions of the difference in scaling exponents characterizing the dependence of restaurants and museums on a city’s size (Figure 3-1b).
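The restaurant-to-museum ratios quoted above follow directly from the two exponents: if both amenity types obey a power law, their ratio scales as population raised to the difference of the exponents. The short check below uses the exponents reported in Figure 3-1b (0.81 for restaurants, 0.59 for museums); the variable names are illustrative only.

```python
beta_restaurants = 0.81   # exponent reported in Figure 3-1b
beta_museums = 0.59       # exponent reported in Figure 3-1b
ratio_small_city = 46.0   # restaurants per museum at 500,000 people
population_factor = 10.0  # the larger city has ten times the population

# The ratio of two power laws scales as N ** (beta_1 - beta_2).
growth = population_factor ** (beta_restaurants - beta_museums)
ratio_large_city = ratio_small_city * growth

print(round(growth, 2), round(ratio_large_city))  # prints: 1.66 76
```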
But not all amenities correlate strongly with a city’s population. In fact, the relationship between the number of amenities and a city’s population is noisy for many amenities. To distinguish the amenities that correlate strongly with a city’s population from those that don’t we use the $R^2$ statistic of the scaling relationship connecting a city’s population with the number of amenities of each type (Figure 3-1c). A high $R^2$ ($R^2 > 0.5$), such as that characterizing the scaling of restaurants, schools, and shoe stores, means that a city’s population is a strong predictor of the number of amenities of that type in a city. A low $R^2$ ($R^2 < 0.5$), such as that characterizing the scaling of museums, embassies, and universities, means that a city’s population is an incomplete predictor of the number of amenities of that type in a city. Note that the observed $R^2$ values, but not the scaling exponents, would be almost the same if we were to use the total number of amenities instead of population as a measure of city size, since a city’s population correlates almost perfectly with that city’s total number of amenities (Fig. 3-1a, $R^2 = 90\%$).
Figure 3-1: Intercity Scaling Relations. a Scaling of the total number of amenities in a city $c$ ($Y_c$) as a function of a city’s population ($N_c$). The total number of amenities in a city scales as $Y_c = Y_0 N_c^{\beta}$ with $Y_0 = 2.03$, $\beta = 0.68$, and $R^2 = 0.90$. Each point represents one of the 47 US cities in our dataset. b Scaling of the total number of restaurants and museums in a city as a function of a city’s population. Each point represents the number of restaurants (yellow) or museums (blue) in a different city. The figure shows that the scaling exponent of restaurants ($\beta = 0.81$) is larger than the scaling exponent of museums ($\beta = 0.59$), meaning that the number of restaurants per museum increases with a city’s population. c The scaling exponent (horizontal axis) and the goodness of fit ($R^2$, vertical axis) of the scaling relationship of each amenity type. The horizontal dashed line separates amenities whose number correlates strongly with a city’s population ($R^2 > 0.5$) from those characterized by a milder correlation ($R^2 < 0.5$). The vertical dashed line separates amenities that scale with population faster than the total amount of amenities in a city ($\beta > 0.68$) from those that scale slower than that.
3.2 Micro-clusters
But do these scaling relationships hold at the intra-city scale? To explore the intra-city scale we first need to divide the city into meaningful intra-city units. To perform this division we introduce a clustering algorithm that splits cities into micro-clusters, which are spatially localized and bounded agglomerations of amenities. Then, we study the city at the intra-city scale by using micro-clusters as our unit of study. As a measure of the size of a micro-cluster we use the total number of amenities present in it. Switching from cities to micro-clusters as our unit of study will reveal that the size of a micro-cluster, unlike that of a city, is a poor predictor of the number of amenities of each type present in it. Yet, as we will show, we can recover some of the predictability lost when moving to the intra-city scale by using data on the types of amenities that are present in each micro-cluster (and controlling for over-fitting by using both Akaike’s and the Bayesian Information Criteria).
We begin the spatial clustering of urban amenities by calculating the effective number of amenities that are present in each location $i$. We define the effective number of amenities in location $i$ ($I_i$) as the number of amenities that can be reached by walking from that location. Formally, the effective number of amenities in location $i$ is the scalar function $I_i$:

$$I_i = \sum_{j=1}^{Y_c} e^{-\gamma d_{ij}}$$

where $d_{ij}$ is the distance between amenity $i$ and amenity $j$, $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$, and $Y_c$ is the total number of amenities in city $c$. To interpret the values of $I$ it is useful to note that an amenity at the location where the measurement is taking place (i.e. with $d_{ii} = 0$) contributes one to the effective number of amenities in that location. An amenity $j$ at distance $d_{ij} = 1/\gamma$, which would imply walking $1/\gamma$ kilometers from amenity $i$, will contribute only $1/e$ to location $i$’s effective number of amenities ($I_i$). We find that our algorithm finds meaningful clusters when we set $\gamma = 16$ (with distances measured in kilometers), which implies that the contribution of an amenity to the effective number of amenities of a location falls by a factor of $e$ (to roughly 37%) every 62.5 meters and becomes negligible at about 500 meters (for reference, the short side of a Manhattan block is 80 meters long).
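As a minimal sketch of the definition above, the effective number of amenities can be computed with a double loop over amenity positions. The function name `effective_amenities` and the use of flat planar coordinates in kilometers are assumptions made for illustration; real latitude and longitude data would first need to be projected or converted to distances.

```python
import math

GAMMA = 16.0  # decay parameter per kilometer, the value reported in the text

def effective_amenities(points_km, gamma=GAMMA):
    """Return I_i = sum_j exp(-gamma * d_ij) for every amenity i.

    points_km: list of (x, y) positions in kilometers on a flat plane
    (an assumption; the source data are latitude/longitude pairs).
    """
    result = []
    for xi, yi in points_km:
        total = 0.0
        for xj, yj in points_km:
            # The j == i term has distance zero and contributes exactly one.
            total += math.exp(-gamma * math.hypot(xi - xj, yi - yj))
        result.append(total)
    return result

# Two amenities 1/gamma km (62.5 m) apart and one isolated amenity far away.
points = [(0.0, 0.0), (1.0 / GAMMA, 0.0), (5.0, 5.0)]
values = effective_amenities(points)
```

In this toy layout the first amenity scores one for itself plus $e^{-1}$ for its neighbor 62.5 meters away, while the isolated amenity scores essentially one.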
Figure 3-2 illustrates our clustering algorithm using the city of Boston as an example. The bottom layer (Fig. 3-2a) is a map of Boston used for spatial refer-ence. The center layer (Fig. 3-2b) shows Boston’s effective number of amenities (𝐼) for all the locations where an amenity is present. The top layer (Fig. 3-2c) shows the clusters identified using our algorithm (with different colors).
To identify the amenities belonging to each cluster we begin by identifying each local peak on the effective number of amenities landscape defined by $I$ (Fig. 3-2b) as the center of a potential micro-cluster. We identify these local peaks by searching for locations that have an effective number of amenities $I$ larger than that of their $n$ nearest neighbors (using a functional heuristic to find the $n$ that works best for each $I$; see SM). Then, we assign amenities to a micro-cluster by using the following greedy algorithm: (i) We initialize clusters by assigning to each cluster center all amenities that are in close proximity to it (less than 0.5 km). (ii) We calculate the distance between each unassigned amenity and the amenities that have been assigned to a cluster. (iii) We assign to a cluster only the unassigned amenity that is closest to an amenity that has already been assigned to a cluster. (iv) We recalculate the distance between assigned and unassigned amenities and repeat steps (iii) and (iv) until all amenities have been assigned to a cluster. An example of the clusters found for the city of Boston is shown in Figure 3-2c (see SM for more examples).
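The peak search and the greedy assignment described above can be sketched as follows. This is an illustrative, unoptimized version under several assumptions: coordinates are planar and in kilometers, the number of neighbors `n_neighbors` is fixed by hand rather than chosen by the heuristic described in the SM, an amenity within 0.5 km of several centers is seeded to the nearest one, and the function names are hypothetical.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def find_peaks(points, intensity, n_neighbors):
    """Indices whose intensity exceeds that of their n nearest neighbors."""
    peaks = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        if all(intensity[i] > intensity[j] for j in others[:n_neighbors]):
            peaks.append(i)
    return peaks

def assign_clusters(points, peaks, radius_km=0.5):
    """Greedy assignment; returns, for each point, the index of its peak."""
    labels = [None] * len(points)
    # Step (i): seed each cluster with the amenities close to its center.
    for i, p in enumerate(points):
        nearest = min(peaks, key=lambda c: dist(p, points[c]))
        if dist(p, points[nearest]) < radius_km:
            labels[i] = nearest
    # Steps (ii)-(iv): repeatedly attach the unassigned amenity that is
    # closest to any already-assigned amenity, then recompute.
    while any(label is None for label in labels):
        best = None
        for i, p in enumerate(points):
            if labels[i] is not None:
                continue
            for j, q in enumerate(points):
                if labels[j] is None:
                    continue
                d = dist(p, q)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        labels[i] = labels[j]
    return labels

# Toy layout: two tight groups plus two stragglers (positions in km).
points = [(0.0, 0.0), (0.05, 0.0), (-0.05, 0.0), (0.9, 0.0),
          (3.0, 0.0), (3.05, 0.0), (2.95, 0.0), (2.2, 0.0)]
intensity = [sum(math.exp(-16 * dist(p, q)) for q in points) for p in points]
peaks = find_peaks(points, intensity, n_neighbors=2)
labels = assign_clusters(points, peaks)
```

In the toy layout the two group centers are found as peaks, and each straggler is absorbed by the cluster holding its nearest already-assigned amenity. The nested loops make this cubic in the number of amenities, so a spatial index would be needed for city-sized data.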
Figure 3-2: Clustering algorithm. a Map of Boston. b The number of effective amenities ($I$) at each location where an amenity is present in Boston. Peaks represent locations with a high number of effective amenities and valleys represent locations with a low number of effective amenities. The black dots represent the local maxima identified by our clustering algorithm. These points represent the centers of micro-clusters (for example, Kendall/MIT or the North End). c Clusters identified using our clustering algorithm. Each cluster is expressed as a set of dots of the same color, each dot representing an amenity. The center of each cluster is marked using a black dot.
Overall, we find that the clusters identified using this algorithm correspond to well-known centers of urban activity. In the case of Boston these clusters include Harvard Square and Central Square in Cambridge and The North End and Coolidge Corner in Boston, among others.
We also note that the distribution of the effective number of amenities in a city is characterized by some universal properties. Figure 3-3a shows the distribution of the effective number of amenities ($I$) for every city in our dataset while Figure 3-3b shows the same distribution after normalizing the effective number of amenities in a city by that city’s average ($\langle I \rangle = \sum_i I_i / Y_c$). For comparison, we also show the same distributions for an ensemble of cities where the location of each amenity has been randomized. These randomized cities are characterized by a narrow distribution for their effective number of amenities, meaning that these random cities lack the high concentrations of amenities that indicate the presence of micro-clusters in real cities. More importantly, Figure 3-3b shows that once we normalize the effective number of amenities in a city by that city’s average, all cities follow the same lognormal distribution

$$P\left(\frac{I_i}{\langle I \rangle} = x\right) = \ln\mathcal{N}(\mu, \sigma)$$

with $\mu = -0.404$ and $\sigma = 0.89$. The existence of a universal distribution for the effective number of amenities across all cities in our sample means that all of these cities have an equal number of peaks and valleys of a given magnitude when the magnitude of these peaks and valleys is measured in units of that city’s average.
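One simple way to estimate the lognormal parameters of the normalized values is to take the mean and standard deviation of their logarithms. The sketch below does this on synthetic data; the function name `lognormal_parameters` and the synthetic sample are assumptions for illustration, not the fitting procedure used in the text.

```python
import math
import random

def lognormal_parameters(values):
    """Normalize values by their mean and estimate (mu, sigma) of ln(x)."""
    mean = sum(values) / len(values)
    logs = [math.log(v / mean) for v in values]
    mu = sum(logs) / len(logs)
    variance = sum((x - mu) ** 2 for x in logs) / (len(logs) - 1)
    return mu, math.sqrt(variance)

# Synthetic "city": lognormal values with an arbitrary scale and sigma = 0.89.
random.seed(0)
sample = [random.lognormvariate(2.0, 0.89) for _ in range(20000)]
mu, sigma = lognormal_parameters(sample)
```

A useful sanity check is that a lognormal variable whose mean equals one must satisfy $\mu = -\sigma^2/2$. For $\sigma = 0.89$ this gives $\mu \approx -0.396$, which is close to the reported $\mu = -0.404$.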
Figure 3-3: Intercity Scaling Relations. a The distribution of the effective number of amenities in each US city. Blue lines show the distribution observed in our urban amenities data and orange lines show the distribution observed after randomizing the location of amenities for each city. b The distribution of the effective number of amenities in each US city normalized by the average effective number of amenities in that city. Blue lines show the distribution observed in the cities data and orange lines show the distribution observed in the same cities but after randomizing the location of amenities
3.3 Intra-city scaling
Now that we have identified micro-clusters for all cities in our data we analyze whether the scaling relationships that hold at the intercity scale also hold at the scale of micro-clusters (i.e. we test whether the number of amenities of each type in a cluster scales with the size of that cluster). Figure 3-4a compares the scaling relationships observed at the intercity scale with the scaling relationships observed at the intra-city scale for a subset of amenities and two different models (for all amenities see SM Table A.5). In light colors (light blue and vermilion) we show the accuracy of models predicting the number of amenities of a given type in a city or a micro-cluster using only information on that city or cluster’s size. The dark bars (navy and crimson) show the accuracy of a model using information on the composition of amenities in a city or micro-cluster (which we will explain later). The comparison between the size-based models shows that amenities, such as schools, doctors, and shoe stores, which correlate strongly with the total number of amenities in a city (average intercity scaling $R^2 > 70\%$), do not scale well with the total number of amenities in a micro-cluster (average intra-city $R^2 < 18\%$). This indicates that the scaling laws observed at the intercity scale fail to hold, for most amenities, at the intra-city scale.
Next, we try to recover some of the predictability lost at the intra-city scale by introducing a model based on the composition of a micro-cluster—the types of amenities present in it.
3.4 Recommender System
We begin the construction of the composition-based model by studying the collocation of pairs of amenities across all clusters. Figure 3-4b shows the network of correlations between pairs of amenities calculated using Spearman’s rank correlation across all clusters. We build the skeleton of this network using a Maximum Spanning Tree algorithm and then add edges between amenities that have a pairwise correlation equal to or larger than 0.3 (see SM for the full correlations matrix) [22]. The network shows that amenities tend to collocate with other amenities of similar types. For example, car repair shops collocate with car dealers (Spearman’s $\rho = 0.45$), religious centers collocate with schools (Spearman’s $\rho = 0.46$), and nightclubs collocate with bars (Spearman’s $\rho = 0.36$). Also, the network shows that amenities sometimes tend to collocate with amenities from different categories. For instance, clothing stores collocate with restaurants and beauty salons (Spearman’s $\rho = 0.52$ and $\rho = 0.45$, respectively). What is more important, however, is that these patterns of collocation suggest that it is possible to create a parsimonious model to predict the number of amenities of a type in a cluster using information on the presence of other amenities in it, since the network indicates that the presence of a set of amenities in a cluster carries information about the presence of other amenities.
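The network construction described above (Spearman correlations, a maximum spanning tree skeleton, plus every pair at or above the 0.3 threshold) can be sketched in plain Python. The function names, the use of Kruskal’s algorithm for the spanning tree, and the toy counts are assumptions for illustration; the text does not say which spanning tree implementation was used.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average
        start = end + 1
    return result

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def spearman(xs, ys):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

def collocation_network(counts, threshold=0.3):
    """counts: dict mapping amenity type -> list of counts per micro-cluster."""
    names = sorted(counts)
    pairs = [(spearman(counts[a], counts[b]), a, b)
             for i, a in enumerate(names) for b in names[i + 1:]]
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Kruskal's algorithm on descending weights: a maximum spanning tree.
    edges = set()
    for rho, a, b in sorted(pairs, reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            edges.add((a, b))
    # Add every remaining pair whose correlation reaches the threshold.
    edges.update((a, b) for rho, a, b in pairs if rho >= threshold)
    return edges

# Toy counts for five micro-clusters, invented for illustration only.
counts = {
    "bar": [5, 3, 0, 1, 4],
    "night_club": [4, 2, 0, 1, 5],
    "car_repair": [0, 1, 6, 4, 0],
    "car_dealer": [0, 2, 5, 3, 1],
}
edges = collocation_network(counts)
```

Note that the spanning tree keeps the network connected even when the best available link between two groups is weak or negative, which is why the toy result contains one edge joining the nightlife pair to the car-related pair.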
Figure 3-4: Micro-Cluster Composition. a Light blue and light red bars, respectively, correspond to the $R^2$ of the predictions obtained using the size of a city (left) and the size of each micro-cluster (right). The dark blue and dark red bars correspond, respectively, to the $R^2$ of the predictions obtained using the composition of cities (left) and the composition of micro-clusters (right). (For all amenities see SM.) b The nodes in the network represent different types of amenities and the edges connect amenities that are likely to collocate in a micro-cluster (see SM). The width of the edges connecting a pair of nodes is proportional to the Spearman correlation obtained from the collocation of the two types of amenities across all micro-clusters. The size of a node is proportional to the number of times that an amenity is present in our data set. The color of each node represents the category that the amenity belongs to.
Finally, we use the collocation of amenities in a cluster to create an algorithm that we can use to predict the number of amenities that should locate in each micro-cluster and create a recommender system that we can use to identify micro-clusters where particular amenities are over- or under-supplied. To create this algorithm we need to go beyond pairwise correlations, as the high clustering of the network of collocations (Fig. 3-4b) indicates that the information about the presence of an amenity in a cluster carried by the presence of other amenities is likely to have some redundancy. Going forward, we go beyond pairwise correlations by using a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of the presence of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (see SM). In addition, we validate the models resulting from this forward selection algorithm by using both Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). By using AIC and BIC we ensure that the models that we obtain are not better than the models using size simply because they include more variables.
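To illustrate how AIC and BIC penalize extra variables, the sketch below uses the standard expressions for a least-squares fit with Gaussian errors, dropping constants that are shared by all models fitted to the same data. The function name `aic_bic`, the number of micro-clusters, the total sum of squares, and the parameter counts are hypothetical; only the two $R^2$ values (17% and 35%) echo averages reported in the text, and the exact formulas used for the reported values are not stated there.

```python
import math

def aic_bic(rss, n_obs, n_params):
    """AIC and BIC for a least-squares fit with Gaussian errors.

    Additive constants shared by all models on the same data are dropped,
    so only differences between models are meaningful.
    """
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_obs)
    return aic, bic

# Hypothetical setting: 1000 micro-clusters, total sum of squares 500.
n_clusters, tss = 1000, 500.0
# Residual sum of squares is TSS * (1 - R^2) for each model.
size_model = aic_bic(rss=tss * (1 - 0.17), n_obs=n_clusters, n_params=2)
composition_model = aic_bic(rss=tss * (1 - 0.35), n_obs=n_clusters, n_params=8)
```

Lower values are better for both criteria. In this hypothetical setting the composition model wins on both despite using more parameters, because its improvement in fit outweighs the penalty; BIC penalizes each extra parameter by $\ln n$ rather than 2, so it is the stricter test once $n$ exceeds about 7.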
The red bars of Figure 3-4a (vermilion and crimson) compare the $R^2$ of the models constructed using the size of micro-clusters with the $R^2$ of the models constructed using the composition of micro-clusters. In most cases (66/74 = 89%), the BIC test chooses the regression using the composition of a micro-cluster over the regression using its size (the exceptions are airports, aquariums, bus stations, car rentals, casinos, convenience stores, gas stations, and zoos). Also, we note that these results are not just statistically significant, but characterized by large effect sizes. On average, for the 66 amenity types in which the composition model works better, the $R^2$ of the composition model is twice that of the model using size only ($R^2 = 17\%$ on average using size vs. $R^2 = 35\%$ on average using composition), meaning that the increase in predictive power obtained by considering the composition of amenities in a cluster is not only statistically significant, but also substantial.
Finally, we use the composition model described above to create a recom-mender system [22, 24] to suggest amenities that might be missing in an urban cluster. We predict missing amenities by calculating the difference between the number of amenities in a cluster predicted by the composition model and the number of amenities of that type observed in each cluster.
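The recommendation step amounts to ranking the gap between predicted and observed counts. A minimal sketch follows, assuming predictions are already available as nested dictionaries; the function name `rank_missing_amenities`, the data layout, and all numbers are hypothetical and invented for illustration.

```python
def rank_missing_amenities(predicted, observed):
    """Return (cluster, amenity, gap) tuples sorted by predicted minus observed.

    A large positive gap suggests an amenity that may be missing; a negative
    gap suggests more amenities than the model expects.
    """
    gaps = []
    for cluster, amenities in predicted.items():
        for amenity, expected in amenities.items():
            gap = expected - observed[cluster].get(amenity, 0)
            gaps.append((cluster, amenity, gap))
    return sorted(gaps, key=lambda item: item[2], reverse=True)

# Hypothetical predicted and observed counts for two micro-clusters.
predicted = {"cluster_a": {"hotel": 6.0, "car_park": 3.0},
             "cluster_b": {"hotel": 2.0, "car_park": 5.0}}
observed = {"cluster_a": {"hotel": 1, "car_park": 4},
            "cluster_b": {"hotel": 2, "car_park": 1}}
recommendations = rank_missing_amenities(predicted, observed)
```

Reading the ranking from the top gives the amenities most under-represented relative to the model, which corresponds to the two uses mentioned earlier: suggesting amenities for a given cluster, or finding the clusters most in need of a given amenity.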
Figure 3-5 compares the observed and predicted number of car parks, hotels, and beauty salons for each micro-cluster in Boston. Points above the lines, such as Harvard Square in car parks (Figure 3-5a), the North End in hotels (Figure 3-5b), and Central Square in beauty salons (Figure 3-5c), suggest instances of unsatisfied demand. Points below the lines, such as Boston’s Theatre District in car parks, Coolidge Corner in hotels, and Winthrop in beauty salons, suggest instances of excess supply. Of course, these suggestions should not be taken literally. For instance, a decision to build new parking in Harvard Square is a decision that requires considering many aspects of Harvard Square that are not included in our model, such as the aesthetics of its architecture [25, 26] or the externalities caused by cars. Nevertheless, this validation shows that our model automatically captures the under-supply of parking that characterizes Harvard Square (and that is well known to Cambridge residents). Figure 3-5b, on the other hand, shows that our model suggests a lack of hotels in the North End, a well-known tourist spot where only a handful of hotels are present. This could mean that there is a great potential for new hotels to locate in Boston’s North End, but once again, this is a decision that would need to incorporate other factors, such as the North End’s famous idiosyncratic architecture and active resident community [4].
Figure 3-5: Prediction of amenities in Boston’s micro-clusters. a Observed vs. predicted number of car parks, b hotels, and c beauty salons for each micro-cluster in Boston. Points above the lines represent micro-clusters where the predicted number of amenities is higher than the observed, suggesting instances of unsatisfied demand (or missing data). Points below the lines represent micro-clusters where the predicted number of amenities is lower than the observed, suggesting instances of excess supply.
Chapter 4
Discussion
During recent years the quantitative study of cities has focused extensively on intercity studies, and in particular, on intercity scaling laws. These intercity studies, however, do not tell us much about the spatial distribution of a city’s characteristics. In this paper we extended this literature to the intra-city scale by focusing on micro-clusters of urban amenities and by showing that the scaling laws that hold at the intercity scale need to be replaced by multivariate statistical models that exploit information on the composition of micro-clusters to predict the number of amenities of each type that is present in each micro-cluster.
Of course, our results and models are not free of biases and limitations. Beyond the data biases described above, our model is limited by its simplicity, which bounds the total amount of variance in the presence of amenities that we can explain. Our statistical model predicts the number of amenities that locate in a micro-cluster using regressions without interaction terms. This means that the models could potentially be improved by using more complex functional forms, and also by adding to them information that is not expressed in the presence of amenities, such as the aesthetic appeal of a neighborhood’s architecture [25, 26], its foot traffic as captured by mobile phone data [27], or the centrality of the urban micro-cluster in the context of the city.
Still, the results and methods presented here point to interesting new avenues of research. For example, time-resolved data sources for both amenities and streetscapes could be used to explore the interaction between the dynamics of the amenities that locate in a micro-cluster and the types of buildings being constructed in it. Also, these results could be used to help inform what types of business permits need to be given out to help balance the micro-clusters of a city’s neighborhoods. On the computational side, the information uncovered here could be used to create new meso-scale city maps that help users understand a city’s micro-clusters and that deliver the recommendations for each micro-cluster uncovered by our algorithm or similar algorithms. Together, our results, and the new avenues of research they open, should help stimulate further quantitative study of the multivariate statistical laws that characterize cities at the intra-city scale.
Appendix A
Supplementary Material
A.1 Data
Amenities Data: We collected data from the Google Places API containing the latitude, longitude and type (cafe, restaurant, library, etc.) of the urban amenities located in 47 US cities. The original data set contains 95 different types of amenities but we merged them into 74 categories by aggregating data on amenities that fulfill similar functions (Table A.1) and excluding amenities that are unspecific (such as the "store" category) or for which little data is available. The amenities we exclude are: taxi stand, campground, store, subway station, RV park, movie rental, and shopping mall. The resulting amenities are shown in Table A.2.
Population Data: We collect data on the population of each city from Wikipedia. Table B.1 shows all the administrative units in each city that overlap with our amenities data, and their population as indicated in Wikipedia. To obtain each city’s population we aggregate the population of each of the administrative units that overlap with our amenities data for that city. The final population of cities and their total number of amenities are shown in Table A.3.
Original Amenities       New Amenities
Hindu temple             Religious center
Mosque
Place of worship
Synagogue
Church
Meal delivery            Restaurant
Meal takeaway
Food
Restaurant
Health                   Doctor
Doctor
Finance                  Finance
Bank
Roofing Contractor       Construction contractor
Electrician
Plumber
Painter
General Contractor
Table A.1: The left column shows the amenities that were merged into a new amenity type, shown in the right column.
Amenity                  Points   Category         Amenity                  Points   Category
Accounting               17280    Services         Gym                      5934     Health
Airport                  1535     Transportation   Hardware store           4595     Shopping
Amusement park           1017     Entertainment    Home goods store         29537    Shopping
Aquarium                 492      Entertainment    Hospital                 7942     Health
Art gallery              5358     Entertainment    Hotel and lodging        11452    Services
ATM                      30753    Services         Insurance agency         27866    Services
Bakery                   9255     Food & Drinks    Jewelry store            6751     Shopping
Bar                      21506    Food & Drinks    Laundry                  14391    Services
Beauty salon             41851    Services         Lawyer                   37611    Services
Bicycle store            1409     Shopping         Library                  3466     Education
Bowling alley            366      Entertainment    Local Government Office  10081    Government
Bus station              110642   Transportation   Locksmith                2182     Services
Cafe                     9485     Food & Drinks    Movie Theater            1232     Entertainment
Car dealer               11603    Services         Moving Company           12744    Services
Car rental               2968     Services         Museum                   2161     Entertainment
Car repair               40215    Services         Night Club               5675     Food & Drinks
Car wash                 3202     Services         Park                     25723    Other
Casino                   172      Entertainment    Parking                  5527     Transportation
Cemetery                 2386     Other            Pet Store                2270     Shopping
City hall                140      Government       Pharmacy                 15204    Shopping
Clothing store           29806    Shopping         Physiotherapist          7929     Health
Construction contractor  86044    Services         Police                   1613     Government
Convenience store        13818    Shopping         Post Office              2723     Services
Courthouse               717      Government       Real Estate Agency       39484    Services
Dentist                  26071    Health           Religious Centers        58468    Other
Department store         3515     Shopping         Restaurant               112430   Food & Drinks
Doctor                   153772   Health           School                   46516    Education
Electronics store        11876    Shopping         Shoe Store               8612     Shopping
Embassy                  688      Government       Spa                      2843     Health
Finance                  32221    Services         Stadium                  1245     Entertainment
Fire station             2050     Government       Storage                  5849     Services
Florist                  5102     Shopping         Train Station            1262     Transportation
Funeral home             2761     Services         Travel Agency            7394     Services
Furniture store          12379    Shopping         University               6597     Education
Gas station              2552     Services         Veterinary Care          5373     Services
Grocery or supermarket   15206    Shopping         Zoo                      114      Entertainment
Total                    1,262,374
Table A.2: Total number of amenities of each type in the Google Places data set in the 47 US cities in our study. The Category column shows the category we assign each amenity type to when we study the collocation of amenities.
City            Population   Number of Amenities   City            Population   Number of Amenities
Atlanta         447,841      19,050                Nashville       737,796      21,619
Austin          885,400      22,592                New Orleans     570,943      14,607
Baltimore       642,587      14,434                New York        8,405,837    75,081
Birmingham      389,250      15,066                Oklahoma        922,506      21,010
Boston          1,121,438    19,769                Orlando         493,524      20,559
Buffalo         258,959      7,409                 Philadelphia    1,945,795    40,410
Charlotte       850,880      19,954                Phoenix         2,046,991    39,354
Chicago         3,618,465    64,531                Pittsburgh      466,879      15,714
Cincinnati      453,968      13,818                Portland        609,456      21,043
Cleveland       685,931      18,496                Providence      290,459      6,653
Columbus        1,128,075    27,854                Raleigh         582,834      15,884
Dallas          2,435,949    44,358                Richmond        262,944      9,437
Denver          1,757,830    32,731                Sacramento      767,408      20,372
Detroit         973,284      21,776                Salt Lake       210,806      9,444
Houston         3,362,560    80,011                San Antonio     1,511,307    35,255
Indianapolis    1,468,843    21,296                San Diego       2,297,970    46,614
Jacksonville    1,007,094    20,466                San Francisco   837,442      18,984
Las Vegas       1,850,966    29,009                San Jose        1,472,951    30,868
Los Angeles     6,428,879    114,002               Seattle         622,155      20,514
Louisville      840,601      22,425                St Louis        361,273      12,125
Memphis         832,803      21,350                Tampa           742,583      25,285
Miami           800,216      13,403                Virginia Beach  448,479      10,619
Milwaukee       822,777      20,590                Washington      1,267,943    20,310
Naples          95,796       5,970                 Total           60,940,877   1,236,151
Table A.3: Population and total number of amenities of each city.
A.2 Intercity Scaling
We explore the scaling exponent $\beta$ of the scaling relationship $Y_c^k = Y_0 N_c^{\beta}$ between the number of amenities of each type $k$ in a city $c$, $Y_c^k$, and the population of that city, $N_c$, finding that scaling exponents vary greatly across amenity types. Table A.4 shows $Y_0$, $\beta$, and $R^2$ for each amenity type.
Amenity                  $Y_0$    $\beta$  $R^2$    Amenity                  $Y_0$    $\beta$  $R^2$
Accounting               0.014    0.727    0.751    Gym                      0.002    0.797    0.869
Airport                  0.003    0.655    0.362    Hardware store           0.009    0.658    0.783
Amusement park           0.000    0.769    0.309    Home goods store         0.028    0.710    0.709
Aquarium                 0.000    0.765    0.621    Hospital                 0.006    0.725    0.837
Art gallery              0.010    0.653    0.742    Hotel and lodging        0.003    0.797    0.712
Atm                      0.041    0.690    0.695    Insurance agency         0.013    0.763    0.582
Bakery                   0.001    0.847    0.936    Jewelry store            0.003    0.766    0.833
Bar                      0.016    0.728    0.799    Laundry                  0.003    0.814    0.775
Beauty salon             0.025    0.745    0.816    Lawyer                   0.536    0.520    0.627
Bicycle store            0.001    0.733    0.699    Library                  0.005    0.681    0.755
Book store               0.002    0.735    0.829    Liquor store             0.001    0.897    0.775
Bowling alley            0.002    0.589    0.421    Local Government Office  0.094    0.553    0.780
Bus station              18.313   0.320    0.250    Locksmith                0.000    0.861    0.526
Cafe                     0.004    0.767    0.756    Movie Theater            0.001    0.704    0.776
Car dealer               0.015    0.684    0.420    Moving Company           0.009    0.734    0.544
Car rental               0.000    0.839    0.681    Museum                   0.010    0.592    0.692
Car repair               0.024    0.743    0.613    Night Club               0.003    0.762    0.852
Car wash                 0.001    0.790    0.601    Park                     0.014    0.751    0.736
Casino                   0.004    0.461    0.035    Parking                  0.010    0.667    0.765
Cemetery                 0.006    0.634    0.124    Pet Store                0.000    0.891    0.890
City hall                0.004    0.465    0.285    Pharmacy                 0.007    0.763    0.908
Clothing store           0.012    0.772    0.846    Physiotherapist          0.008    0.706    0.830
Construction contractor  0.179    0.656    0.620    Police                   0.001    0.737    0.744
Convenience store        0.054    0.611    0.462    Post Office              0.002    0.736    0.888
Courthouse               0.001    0.710    0.717    Real Estate Agency       0.041    0.704    0.710
Dentist                  0.002    0.902    0.823    Religious Centers        0.156    0.639    0.748
Department store         0.005    0.680    0.519    Restaurant               0.024    0.817    0.945
Doctor                   0.205    0.689    0.818    School                   0.015    0.790    0.925
Electronics store        0.003    0.803    0.790    Shoe Store               0.002    0.827    0.887
Embassy                  0.000    0.838    0.153    Spa                      0.000    0.853    0.687
Finance                  0.024    0.729    0.798    Stadium                  0.003    0.643    0.460
Fire station             0.011    0.583    0.371    Storage                  0.003    0.747    0.454
Florist                  0.002    0.766    0.843    Train Station            0.005    0.558    0.133
Furniture store          0.011    0.716    0.796    University               0.149    0.482    0.225
Gas station              0.005    0.652    0.289    Veterinary Care          0.002    0.764    0.648
Grocery or supermarket   0.006    0.766    0.878    Zoo                      0.009    0.398    0.364
Table A.4: Values of the parameters $Y_0$, $\beta$, and $R^2$ of the scaling relationship between the total number of amenities of each type in a city, $A_c^k$, and that city’s population, $N_c$, expressed as $A_c^k = Y_0 N_c^{\beta}$.
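The text does not say how the parameters in Table A.4 were estimated. A common choice for power laws of this form is ordinary least squares on the log-transformed variables, and the sketch below illustrates that approach under this assumption. The function name `fit_scaling_law` and the synthetic data are hypothetical and are not taken from the thesis.

```python
import numpy as np

def fit_scaling_law(population, counts):
    """Fit counts = Y0 * population**beta by least squares in log-log space.

    Returns (Y0, beta, R2), where R2 is computed on the log-transformed data.
    """
    population = np.asarray(population, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # Zero counts have no logarithm; dropping them is an assumption of this sketch.
    mask = (population > 0) & (counts > 0)
    log_n = np.log(population[mask])
    log_y = np.log(counts[mask])
    # Straight-line fit: log(Y) = log(Y0) + beta * log(N)
    beta, log_y0 = np.polyfit(log_n, log_y, 1)
    residuals = log_y - (log_y0 + beta * log_n)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((log_y - log_y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return float(np.exp(log_y0)), float(beta), r2

# Synthetic check: data generated exactly from Y = 0.02 * N**0.8
pop = np.array([1e5, 5e5, 1e6, 5e6])
y0, beta, r2 = fit_scaling_law(pop, 0.02 * pop ** 0.8)
```

On noiseless synthetic data the fit recovers the generating parameters; on real counts the estimates depend on how zero counts are handled.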
A.3 Clustering
A.3.1 Effective number of amenities
We begin our clustering procedure by calculating the effective number of amenities at each location. The effective number of amenities, $I_i$, in a location $i$ represents the number of amenities that can be reached by walking from that location. We define $I_i$ as:
$$I_i = \sum_{j=1}^{Y_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{k} e^{-\gamma d_{ij}} + \sum_{j=k+1}^{Y_c} e^{-\gamma d_{ij}} = \sum_{j=1}^{k} e^{-\gamma d_{ij}} + \epsilon$$
where $d_{ij}$ is the distance (in km) between amenity $i$ and amenity $j$, $Y_c$ is the total number of amenities in a city $c$, and $\epsilon$ is the small contribution of the amenities beyond the $k$ closest ones. $\gamma$ is a decay parameter that discounts amenities based on their distance to location $i$. We set $\gamma = 16$, meaning that the contribution of an amenity to the effective number of amenities at a location falls by a factor of $e$ (to about 37%) every 62.5 meters and becomes negligible ($e^{-8} \approx 0.0003$) at about 500 meters. To simplify the calculation of the effective number of amenities in a location we use the $k$ closest amenities instead of all $Y_c$. Theoretically all of the amenities in a city should contribute to a location’s effective number of amenities, but since amenities that are far from a location are discounted by an exponential factor, considering the contribution of the $k$ closest amenities already gives a good approximation. In general, we find that the effective number of amenities for a location does not change after considering the first few hundred amenities, indicating that $k = 2{,}000$ provides a set that is large enough to give a good estimate of a location’s effective number of amenities.
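The definition above can be sketched in a few lines. This is a minimal illustration, not the thesis code: it assumes coordinates already projected to a plane in kilometers (the projection is not specified in the text), uses a brute-force distance computation, and counts the amenity itself ($d_{ii} = 0$, contributing 1) among the $k$ nearest, which is one possible reading of the sum. The function name `effective_amenities` is hypothetical.

```python
import numpy as np

GAMMA = 16.0  # decay parameter per km, the value given in the text

def effective_amenities(coords_km, k=2000, gamma=GAMMA):
    """Effective number of amenities I_i at each amenity location.

    coords_km: (n, 2) array of planar coordinates in km (assumed projection).
    Only the k nearest amenities (including the amenity itself) are summed.
    """
    coords_km = np.asarray(coords_km, dtype=float)
    n = len(coords_km)
    k = min(k, n)
    result = np.empty(n)
    for i in range(n):
        # Brute-force distances from amenity i to every amenity: O(n) per point.
        d = np.sqrt(np.sum((coords_km - coords_km[i]) ** 2, axis=1))
        # Keep the k smallest distances without fully sorting.
        nearest = np.partition(d, k - 1)[:k]
        result[i] = np.exp(-gamma * nearest).sum()
    return result

# Two amenities 62.5 m apart and a third one 10 km away.
example = np.array([[0.0, 0.0], [0.0625, 0.0], [10.0, 0.0]])
intensity = effective_amenities(example)
```

For city-sized data sets a spatial index would replace the inner brute-force search; the quadratic loop is kept here only for readability.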
A.3.2 Identifying cluster centers
We continue our clustering procedure by identifying the centers of each micro-cluster as the local peaks of the landscape of effective number of amenities. We identify local peaks by searching for locations that have an effective number of amenities, $I_i$, larger than that of their $n_i$ nearest neighbors. We define $n_i$ as $n_i = 3 I_i + 50$, i.e. a function of the effective number of amenities at location $i$, so that the centers of very dense clusters are required to have a larger $I_i$ than a large number of neighboring amenities, while centers of very sparse clusters are required to have a larger $I_i$ than a small number of neighboring amenities. By making $n_i$ grow with $I_i$ we avoid assigning multiple cluster centers to areas with a high density of amenities, and we avoid leaving areas with a low density of amenities without any cluster center.
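The peak-finding rule can be illustrated as follows. This is a sketch under stated assumptions: $n_i$ is rounded up to an integer and capped at the number of other amenities, and ties in $I_i$ disqualify a point from being a center. Neither choice is specified in the text, and the function name `find_cluster_centers` is hypothetical.

```python
import numpy as np

def find_cluster_centers(coords_km, intensity):
    """Return indices of amenities that are local peaks of `intensity`.

    Amenity i is a center if I_i is strictly larger than the I of each of
    its n_i = 3 * I_i + 50 nearest neighbours (n_i rounded up and capped at
    n - 1; both are assumptions of this sketch).
    """
    coords_km = np.asarray(coords_km, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = len(coords_km)
    centers = []
    for i in range(n):
        n_i = min(int(np.ceil(3 * intensity[i] + 50)), n - 1)
        d = np.sqrt(np.sum((coords_km - coords_km[i]) ** 2, axis=1))
        d[i] = np.inf  # exclude the point itself from its own neighbours
        neighbours = np.argsort(d)[:n_i]
        if np.all(intensity[i] > intensity[neighbours]):
            centers.append(i)
    return centers
```

Because $n_i$ is at least 50, a peak must dominate a fairly large neighbourhood even in sparse areas, which is what suppresses spurious centers.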
A.3.3 Assigning points to clusters
Finally, we assign points to micro-clusters using the cluster centers we obtained. First, we remove the 10% of the points in each city with the lowest effective number of amenities, to eliminate isolated amenities that are not part of a micro-cluster. After that, we assign all amenities that are within a distance of 0.5 km of a cluster center to that cluster center. Then, we calculate the distance from each unassigned point to each assigned point. We then iterate the following steps:
1. Find the unassigned point $u$ that is closest to an assigned point $a$.
2. Assign point $u$ to the cluster that point $a$ belongs to.
3. Calculate the distance from each unassigned point to the newly assigned point $u$.
The algorithm terminates once all points have been assigned to a cluster. Figure A-1 shows the effective number of amenities in the cities of Boston, San Francisco, and New York (left figures), and the corresponding assignments of amenities to clusters (right figures).
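The assignment procedure can be sketched as a nearest-neighbour growth process. Several details below are assumptions made for illustration: the point picked at each iteration is taken to be the unassigned point closest to any assigned point, a point within 0.5 km of several centers goes to the nearest one, centers are never dropped, and removed points receive the label -2. The function name `assign_to_clusters` is hypothetical.

```python
import numpy as np

def assign_to_clusters(coords_km, intensity, centers,
                       radius_km=0.5, drop_fraction=0.10):
    """Label each amenity with the index (into `centers`) of its cluster.

    Returns an integer array; -2 marks points removed as isolated.
    """
    coords_km = np.asarray(coords_km, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n = len(coords_km)
    labels = np.full(n, -1)
    # Remove the 10% of points with the lowest effective number of amenities.
    labels[np.argsort(intensity)[:int(drop_fraction * n)]] = -2
    labels[list(centers)] = -1  # assumption: a center is never dropped
    # Assign every remaining point within radius_km of a center.
    d_center = np.linalg.norm(
        coords_km[:, None, :] - coords_km[list(centers)][None, :, :], axis=2)
    close = (d_center.min(axis=1) <= radius_km) & (labels == -1)
    labels[close] = d_center.argmin(axis=1)[close]
    # For each unassigned point, track its nearest assigned point's label.
    assigned = np.flatnonzero(labels >= 0)
    remaining = set(np.flatnonzero(labels == -1).tolist())
    best_d = np.full(n, np.inf)
    best_label = np.full(n, -1)
    for u in remaining:
        d = np.linalg.norm(coords_km[assigned] - coords_km[u], axis=1)
        best_d[u], best_label[u] = d.min(), labels[assigned[d.argmin()]]
    while remaining:
        u = min(remaining, key=lambda p: best_d[p])  # closest unassigned point
        labels[u] = best_label[u]                    # inherit the cluster
        remaining.discard(u)
        for p in remaining:                          # update distances to u
            d = np.linalg.norm(coords_km[p] - coords_km[u])
            if d < best_d[p]:
                best_d[p], best_label[p] = d, labels[u]
    return labels
```

Tracking only the nearest assigned point per unassigned point avoids recomputing all pairwise distances at every iteration, in the same spirit as step 3 of the list above.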
A.4 Collocation of amenities
To study the collocation patterns of amenities, we calculate the Spearman correlation between all pairs of amenities across clusters. We show the resulting correlations in the form of a network, where nodes represent amenity types and edges connect amenities that are highly correlated across micro-clusters. To construct this network we first create a Maximum Spanning Tree (MST) of the network and then add edges only between amenities that have a pairwise correlation equal to or larger than 0.3.
Here, we show the values of all Spearman correlations between amenities across clusters in the form of a matrix (Figure A-2). We cluster amenities using Ward linkages.
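The network construction can be sketched with NumPy alone. This is an illustration, not the thesis code: Spearman correlations are computed as Pearson correlations of average ranks, the maximum spanning tree is built with Kruskal's algorithm (the text does not say which algorithm was used), and the function names are hypothetical. Amenity types with constant counts across clusters would produce undefined correlations and are assumed to be absent.

```python
import numpy as np

def rank_average(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def spearman_matrix(counts):
    """counts: (n_clusters, n_types) -> (n_types, n_types) correlations."""
    cols = np.asarray(counts, dtype=float).T
    ranked = np.column_stack([rank_average(c) for c in cols])
    return np.corrcoef(ranked, rowvar=False)

def collocation_network(corr, threshold=0.3):
    """Edges of the maximum spanning tree plus all pairs with corr >= threshold."""
    n = len(corr)
    pairs = sorted(((float(corr[i][j]), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = set()
    for _, i, j in pairs:          # Kruskal: strongest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.add((i, j))
    for w, i, j in pairs:          # add strongly correlated pairs
        if w >= threshold:
            edges.add((i, j))
    return edges
```

The spanning tree keeps every amenity type connected to the network even when none of its correlations reaches the 0.3 threshold.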
Figure A-1: Clustering Algorithm: Boston, SF, NY. The figures on the right show the effective number of amenities at each location in the cities of (a) Boston, (b) San Francisco, and (c) New York. Red lines correspond to areas with a high effective number of amenities and blue lines correspond to areas with a low effective number of amenities. The black dots represent the locations we assign as cluster centers. The figures on the left show the corresponding assignment of amenities to micro-clusters. Each dot represents an amenity, and sets of dots of the same color constitute a micro-cluster.
Figure A-2: Amenities correlations matrix. Matrix showing the Spearman correlation between each pair of amenities. Amenities are clustered using Ward linkages.
A.5 Predictions
We construct four regression models to predict the number of amenities of each type at the intercity and intra-city scales using two different metrics: size and composition. At the intercity scale, we predict the number of each type of amenity in a city using the total number of amenities in the city and the composition of amenities in the city. At the intra-city scale, we predict the number of each type of amenity in a micro-cluster using the size of the micro-cluster and the composition of amenities in the micro-cluster. The size model uses the total number of amenities in a micro-cluster to predict the number of each type of amenity in that micro-cluster. To construct the composition-based models we use a forward selection algorithm that iteratively adds types of amenities to a regression until the contribution of the presence of a new amenity type to the predictive power of the regression is characterized by a p-value of more than 0.001 (below we explain how we use AIC and BIC to verify our model selection). Table A.5 shows the $R^2$ obtained for each of these models.
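A forward selection loop with a p-value stopping rule can be sketched as follows. The details are assumptions made for illustration: the regression is ordinary least squares with an intercept, the p-value of the newly added predictor is obtained from its t-statistic using a large-sample normal approximation (an exact test would use the Student t distribution), and predictors are assumed not to be collinear. The function names `ols` and `forward_selection` are hypothetical.

```python
import math
import numpy as np

def ols(X, y):
    """Least squares with intercept: coefficients, standard errors, RSS."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = float(resid @ resid)
    dof = len(y) - A.shape[1]
    cov = rss / dof * np.linalg.inv(A.T @ A)  # assumes no collinearity
    return coef, np.sqrt(np.diag(cov)), rss

def forward_selection(X, y, p_enter=0.001):
    """Add predictors one at a time while the best candidate has p <= p_enter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best_p, best_j = None, None
        for j in remaining:
            coef, se, _ = ols(X[:, selected + [j]], y)
            z = abs(coef[-1] / se[-1])            # statistic of the new term
            p = math.erfc(z / math.sqrt(2.0))     # two-sided normal approx.
            if best_p is None or p < best_p:
                best_p, best_j = p, j
        if best_p > p_enter:
            break  # no remaining predictor passes the threshold
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

In this setting the columns of `X` would hold the counts of the other amenity types in each micro-cluster (or city) and `y` the counts of the amenity type being predicted.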
Given that these four models use a different number of samples and parameters, we calculate the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of each of the models. These criteria allow us to differentiate the models: the lower the AIC and BIC values, the more desirable the model (better fit and less overfitted). The AIC and BIC values obtained for each model are summarized in Table A.6.
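The text does not state which form of the criteria was used. For a least-squares regression with Gaussian errors, a standard form (up to an additive constant shared by models fit to the same data) is AIC = n ln(RSS/n) + 2p and BIC = n ln(RSS/n) + p ln(n), where n is the number of samples, p the number of parameters, and RSS the residual sum of squares. The sketch below uses this form as an assumption; the function name `aic_bic` is hypothetical.

```python
import math

def aic_bic(rss, n_samples, n_params):
    """AIC and BIC of a least-squares fit, up to a shared additive constant.

    rss: residual sum of squares; n_samples: number of observations;
    n_params: number of fitted parameters.
    """
    fit_term = n_samples * math.log(rss / n_samples)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_samples)
    return aic, bic

# Same residual error, more parameters: both criteria get worse (larger).
small = aic_bic(rss=100.0, n_samples=50, n_params=3)
large = aic_bic(rss=100.0, n_samples=50, n_params=10)
```

Both criteria can be negative when the mean squared residual is below one, and BIC penalizes each extra parameter more heavily than AIC whenever n exceeds about 7 samples (ln n > 2).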
                          Intercity Scaling       Intra-City Scaling
Amenity                   Size     Composition    Size     Composition
Accounting                0.946    0.985          0.291    0.448
Airport                   0.575    0.816          0.016    0.114
Amusement Park            0.382    0.724          0.002    0.005
Aquarium                  0.709    0.880          0.014    0.028
Art Gallery               0.603    0.930          0.114    0.271
ATM                       0.911    0.967          0.320    0.465
Bakery                    0.777    0.980          0.364    0.543
Bar                       0.649    0.966          0.462    0.750
Beauty Salon              0.952    0.989          0.449    0.615
Bicycle Store             0.594    0.919          0.080    0.183
Book Store                0.878    0.980          0.245    0.344
Bowling Alley             0.478    0.702          0.004    0.014
Bus Station               0.242    0.431          0.023    0.237
Cafe                      0.649    0.956          0.505    0.670
Car Dealer                0.608    0.850          0.003    0.231
Car Rental                0.831    0.942          0.042    0.118
Car Repair                0.867    0.976          0.016    0.437
Car Wash                  0.828    0.970          0.005    0.071
Casino                    0.016    0.000          0.002    0.008
Cemetery                  0.126    0.585          0.001    0.015
City Hall                 0.379    0.449          0.031    0.151
Clothing Store            0.884    0.993          0.298    0.718
Construction Contractor   0.824    0.978          0.135    0.456
Convenience Store         0.629    0.928          0.042    0.134
Courthouse                0.676    0.738          0.088    0.446
Dentist                   0.954    0.974          0.262    0.439
Department Store          0.673    0.945          0.016    0.200
Doctor                    0.957    0.986          0.408    0.694
Electronics Store         0.924    0.966          0.224    0.355
Embassy                   0.102    0.419          0.046    0.114
Finance                   0.953    0.983          0.424    0.610
Fire Station              0.490    0.632          0.018    0.058
Florist                   0.889    0.981          0.207    0.259
Funeral Home              0.476    0.787          0.018    0.146
Furniture Store           0.912    0.980          0.173    0.444
Gas Station               0.443    0.777          0.000    0.028
Grocery or Supermarket    0.791    0.955          0.116    0.377
Gym                       0.911    0.984          0.229    0.339
Hardware Store            0.896    0.953          0.020    0.194
Home Goods Store          0.908    0.986          0.213    0.517
Hospital                  0.958    0.979          0.096    0.546
Hotel and Lodging         0.795    0.824          0.250    0.435
Insurance Agency          0.825    0.981          0.234    0.433
Jewelry Store             0.902    0.978          0.208    0.352
Laundry                   0.933    0.984          0.180    0.354
Lawyer                    0.871    0.894          0.359    0.570
Library                   0.610    0.937          0.180    0.416
Liquor Store              0.753    0.815          0.175    0.301
Local Government Office   0.901    0.937          0.181    0.567
Locksmith                 0.671    0.752          0.033    0.053
Movie Theater             0.780    0.952          0.125    0.190
Moving Company            0.721    0.931          0.012    0.131
Museum                    0.499    0.951          0.221    0.412
Night Club                0.735    0.957          0.326    0.606
Park                      0.669    0.745          0.149    0.320
Parking                   0.666    0.938          0.374    0.610
Pet Store                 0.812    0.943          0.077    0.192
Pharmacy                  0.878    0.949          0.169    0.371
Physiotherapist           0.863    0.931          0.081    0.260
Police                    0.681    0.866          0.052    0.201
Post Office               0.859    0.964          0.090    0.130
Religious Centers         0.744    0.868          0.171    0.430
Restaurant                0.921    0.995          0.659    0.826
School                    0.948    0.976          0.251    0.438
Shoe Store                0.916    0.966          0.153    0.648
Spa                       0.784    0.940          0.182    0.297
Stadium                   0.613    0.749          0.010    0.107
Storage                   0.632    0.912          0.010    0.123
Train Station             0.099    0.414          0.047    0.087
Travel Agency             0.813    0.931          0.292    0.402
University                0.238    0.351          0.020    0.328
Veterinary Care           0.814    0.966          0.020    0.115
Zoo                       0.343    0.680          0.001    0.011
Table A.5: 𝑅2 of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city).
                          Intercity Scale                 Intra-City Scale
                          Size            Comp.           Size                  Comp.
Amenity                   AIC     BIC     AIC     BIC     AIC        BIC        AIC        BIC
Accounting                387.1   389.0   610.2   615.7   7233.6     7240.7     5467.9     5630.0
Airport                   283.6   285.4   534.2   536.0   -14564.4   -14557.4   -14565.1   -14480.5
Amusement Park            252.4   254.3   492.8   496.5   -6923.4    -6916.4    -6940.3    -6933.2
Aquarium                  160.8   162.6   395.3   399.0   -24141.2   -24134.2   -23744.2   -23709.0
Art Gallery               404.3   406.2   600.4   605.9   15448.2    15455.3    13780.7    13893.4
Atm                       458.2   460.1   614.1   617.8   14260.7    14267.7    12446.0    12664.5
Bakery                    437.6   439.5   479.8   483.5   2507.6     2514.7     -349.5     -208.5
Bar                       508.4   510.2   724.1   731.5   20416.0    20423.0    14125.5    14358.1
Beauty Salon              471.4   473.3   625.0   628.7   19820.0    19827.0    16895.3    17113.8
Bicycle Store             268.8   270.6   283.5   285.4   -16203.0   -16196.0   -17083.1   -16970.3
Book Store                278.1   280.0   510.0   517.4   -7584.9    -7577.8    -8461.1    -8313.1
Bowling Alley             132.9   134.8   414.8   418.5   -28639.3   -28632.3   -28784.8   -28756.6
Bus Station               667.5   669.3   735.0   736.9   34335.6    34342.7    34766.9    34936.1
Cafe                      443.1   445.0   461.9   465.6   4533.6     4540.6     1134.4     1338.8
Car Dealer                461.3   463.1   714.4   716.2   10190.0    10197.1    8802.5     8873.0
Car Rental                290.1   291.9   443.9   449.4   -3146.7    -3139.7    -3181.3    -3110.8
Car Repair                521.0   522.8   731.4   738.8   22908.0    22915.1    18230.6    18371.6
Car Wash                  293.8   295.6   460.0   467.4   -11654.7   -11647.6   -11747.7   -11663.2
Casino                    216.3   218.2   443.5   443.5   -35421.5   -35414.5   -35127.2   -35113.1
Cemetery                  341.0   342.9   586.3   590.0   -21285.1   -21278.0   -21423.3   -21402.2
City Hall                 76.2    78.0    293.2   295.0   -37437.7   -37430.7   -38618.9   -38562.5
Clothing Store            500.4   502.2   592.9   600.3   31184.7    31191.8    23911.6    24024.4
Construction Contractor   591.7   593.6   574.4   578.1   23556.0    23563.1    20067.1    20243.3
Convenience Store         455.2   457.1   570.9   572.8   1726.9     1733.9     2246.0     2394.0
Courthouse                160.9   162.8   404.9   406.8   -13810.3   -13803.3   -17901.4   -17788.6
Dentist                   436.7   438.6   714.8   718.5   19519.7    19526.7    17163.9    17311.9
Department Store          327.4   329.2   510.3   514.0   -6448.4    -6441.4    -7219.3    -7064.3
Doctor                    578.7   580.5   607.2   614.6   42180.3    42187.3    36517.8    36672.9
Electronics Store         386.1   388.0   587.0   590.7   3688.0     3695.0     2189.0     2322.9
Embassy                   329.3   331.1   632.8   634.7   -2578.3    -2571.2    -3268.8    -3205.4
Finance                   440.6   442.5   615.9   623.3   20231.0    20238.1    16860.1    17036.3
Fire Station              293.9   295.8   600.6   602.4   -17404.5   -17397.5   -17771.8   -17722.5
Florist                   332.2   334.1   524.5   530.0   -4705.5    -4698.5    -5303.1    -5197.4
Funeral Home              336.3   338.1   568.3   570.1   -10028.7   -10021.7   -10844.2   -10703.3
Furniture Store           389.5   391.4   441.7   447.2   10673.1    10680.1    7599.0     7697.7
Gas Station               333.2   335.0   543.6   545.5   -13926.6   -13919.5   -11923.4   -11867.0
Grocery or Supermarket    467.7   469.6   592.6   596.3   9280.9     9288.0     6500.8     6677.1
Gym                       321.2   323.1   463.7   469.2   -2721.9    -2714.9    -4013.8    -3837.6
Hardware Store            299.0   300.9   516.4   522.0   -8239.8    -8232.8    -9960.1    -9854.4
Home Goods Store          469.2   471.0   645.5   651.0   17169.9    17177.0    13170.9    13290.7
Hospital                  310.3   312.1   428.4   433.9   11386.1    11393.2    5907.4     6027.2
Hotel and Lodging         413.6   415.5   734.4   736.3   12585.9    12592.9    10292.7    10483.0
Insurance Agency          496.8   498.7   692.3   697.9   14861.7    14868.7    12397.0    12538.0
Jewelry Store             353.3   355.1   444.2   447.9   14860.2    14867.3    13143.0    13269.9
Laundry                   400.6   402.4   566.7   570.4   4144.1     4151.2     2146.9     2316.0
Lawyer                    479.9   481.8   728.9   730.8   38846.6    38853.6    35662.3    35831.4
Library                   343.4   345.3   429.1   432.8   -5993.1    -5986.1    -8949.8    -8808.9
Liquor Store              405.4   407.2   632.6   634.4   -1736.3    -1729.2    -2355.6    -2242.8
Local Government Office   331.1   332.9   481.5   487.0   12849.6    12856.6    7505.0     7638.9
Locksmith                 309.2   311.0   542.0   543.9   -14495.9   -14488.8   -14640.9   -14591.5
Movie Theater             223.5   225.4   352.4   356.1   -16822.2   -16815.1   -17422.3   -17337.7
Moving Company            443.6   445.5   628.4   632.1   300.1      307.2      -457.0     -372.4
Museum                    318.3   320.1   385.0   392.4   -4793.4    -4786.3    -6985.0    -6872.2
Night Club                377.0   378.9   545.3   550.8   6321.7     6328.7     1774.4     1922.5
Park                      504.8   506.7   683.4   685.3   11027.2    11034.2    10194.1    10363.3
Parking                   372.0   373.9   596.1   599.8   5373.6     5380.6     1963.1     2153.4
Pet Store                 285.8   287.6   452.0   455.7   -13001.7   -12994.6   -14005.4   -13885.6
Pharmacy                  429.4   431.2   643.7   647.4   7366.7     7373.7     5035.9     5176.9
Physiotherapist           353.1   355.0   667.6   673.1   791.0      798.0      -544.4     -459.9
Police                    252.8   254.6   349.2   352.9   -15255.4   -15248.4   -16701.6   -16602.9
Post Office               269.4   271.3   504.4   508.1   -12492.7   -12485.7   -12755.2   -12670.6
Real Estate Agency        515.7   517.6   668.7   672.4   20820.6    20827.7    19273.1    19449.4
Religious Centers         565.0   566.9   729.1   732.8   24793.2    24800.3    21728.3    21883.4
Restaurant                602.3   604.2   745.4   752.8   32182.1    32189.1    26651.6    26912.4
School                    488.4   490.3   741.3   745.0   15330.2    15337.2    13283.1    13445.2
Shoe Store                363.4   365.2   607.3   611.0   18001.5    18008.6    11416.1    11528.9
Spa                       307.5   309.4   456.4   460.1   -8683.6    -8676.6    -9852.6    -9732.7
Stadium                   225.0   226.8   553.0   554.9   -13695.6   -13688.5   -13931.8   -13875.4
Storage                   394.6   396.4   582.3   586.0   -6999.5    -6992.4    -7697.7    -7606.1
Train Station             334.7   336.6   545.1   547.0   -10105.8   -10098.8   -10424.3   -10389.0
Travel Agency             398.7   400.6   545.0   548.7   3926.0     3933.1     2523.5     2671.5
University                403.4   405.3   557.5   559.3   24047.2    24054.3    21500.1    21627.0
Veterinary Care           336.8   338.6   619.2   624.8   -3348.5    -3341.5    -3679.7    -3538.8
Zoo                       70.8    72.7    144.4   148.1   -43402.7   -43395.6   -42907.2   -42879.0
Table A.6: AIC and BIC values of the intercity and intra-city models we construct using metrics of size and composition of cities (in the case of the intercity) and micro-clusters (in the case of the intra-city).
Appendix B
Cities Administrative Units and Populations
City Administrative District Population Total City
Population Atlanta Atlanta 447,841 447,841 Austin Austin 885,400 885,400 Baltimore Baltimore 622,104 Arbutus 20,483 Halethorpe N/A 642,587 Birmingham Birmingham 212,237 Vestavia Hills 34,018 Mountain Brook 20,359 Homewood 25,750 Bessemer 27,053 Fultondale 8,752 Gardendale 13,735 Tarrant 6,285 Center Point 16,864 Chalkville 3,829 Trussville 20,368 389,250 Boston Boston 645,966 Quincy 92,271 Milton 27,003
Dedham 24,729 Brookline 58,732 Somerville 75,754 Cambridge 105,162 Watertown 31,915 Chelsea 35,177 Belmont 24,729 1,121,438 Buffalo Buffalo 258,959 258,959 Charlotte Charlotte 792,862 Mint Hill 23,341 Matthews 27,198 Pineville 7,479 850,880 Chicago Chicago 2,718,782 Lincolnwood 12,590 Park Ridge 37,480 Rosemont 4,202 Schiller Park 11,793 Norridge 14,572 Hardwood Heights 8,612 Bensenville 18,352 Franklin Park 18,333 River Groove 10,227 Elmwood Park 24,883 Northlake 12,323 Stone Park 4,946 Melrose Park 25,411 River Forest 11,172 Oak Park 51,878 Maywood 24,090 Bellwood 19,071 Berkeley 5,209 Hillside 8,193 Forest Park 14,167 Broadview 7,932 Westchester 16,718 North Riverside 6,672 Berwyn 56,800 Cicero 84,103 La Grange Park 13,579 Riverside 8,875 Brookfield 18,978 Lyons 10,729 Stickney 6,786
Forest View 698 La Grange 15,550 Western Springs 12,975 Hinsdale 16,816 Mc Cook 228 Summit 11,054 Countryside 5,895
Indian Head Park 3,809
Hodgkins 1,897 Burr Ridge 10,559 Palos Park 4,847 Palos Heights 12,515 Crestwood 10,950 Willow Springs 5,524 Justice 12,926 Bedford Park 580 Bridgeview 16,446 Hickory Hills 14,049 Palos Hills 17,484 Chicago Ridge 14,305 Worth 10,789 Hometown 4,349 Oak Lawn 56,690 Evergreen Park 19,852 Alsip 19,277 Merrionette Park 1,900 Robbins 5,337 Blue Island 23,706 3,618,465 Cincinati Cincinnati 296,943 Delhi 29,510 Covedale 6,447 Mack 11,585 Bridgetown North 12,569 Dent 10,497 Cheviot 8,375 Monfort Heights 11,948 White Oak 19,167
North College Hill 9,397
Groesbeck 6,788 Finneytown 12,741 Amberley 3,585 Deer Park 5,736 Kenwood 6,981 Fairfax Mariemont 1,699 453,968
Cleveland Cleveland 396,815 Cleveland Heights 46,121 University Heights 13,539 Shaker Heights 28,448 Maple Heights 23,138 Garfield Heights 28,849 Parma 81,601 Brook Park 19,212 Brooklyn 11,169 Rooky River 20,213 Fairview Park 16,826 685,931 Columbus Columbus 787,033 Westerville 36,120 Huber Ridge 4,883 Worthington 13,575 Dublin 41,751 Hilliard 28,435 Upper Arlington 33,771 Marble Cliff 573 Grandview Heights 6,536 Lincoln Village 9,482 Urbancrest 960 Grove City 36,832 Obetz 4,628 Groveport 5,540 Blacklick Estates 9,518 Reynoldsburg 36,347 Bexley 13,057 Whitehall 18,062 Gahanna 33,248 New Albany 7,724 1,128,075 Dallas Dallas 1,197,816 Richardson 103,297 Garland 226,876 Farmers Branch 28,616 Carrollton 126,700 Irving 228,653 University Park 23,068 Highland Park 8,564 Grand Prairie 175,396 Duncanville 38,524 Hutchins 5,338 Seagoville 14,835 Balch Springs 23,728
Mesquite 139,824 Sunnyvale 5,130 Rowlett 56,199 Sachse 20,329 Addison 13,056 2,435,949 Denver Denver 649,495 Glendale 4,184 Englewood 30,255 Sheridian 5,664
Cherry Hills village 5,987
Greenwood Village 13,925 Littleton 41,737 Lakewood 142,980 Edgewater 5,170 Wheat Ridge 30,166 Arvada 111,707 Berkley 11,207 Twin Lakes 171 Westminster 106,114 Sherrelwood 18,287 Welby 14,846 Commerce City 45,913 Derby 7,685 Thornton 118,772 Federal Heights 11,973 Northglenn 35,789 Aurora 345,803 1,757,830 Detroit Detroit 681,090 Lincoln Park 38,144 Dearborn 95,884 Melvindale 10,525 Dearborn Heights 57,774 Highland Park 11,629 Hamtramck 22,423
Grosse Pointe Woods 15,838
Harper Woods 13,990
Grosse Pointe Farms 9,316
Grosse Pointe 5,326
Grosse Pointe Park 11,345
973,284
Houston Houston 2,195,914
Seabrook 11,952
Kemah 3,334
Friendswood 35,805 Pearland 108,715 Fresno 19,069 Sugar Land 83,860 Greatwood 6,640 Rosenberg 31,676 Richmond 11,081 Pecan Grove 15,881 Mission Bend 36,501 Cinco Ranch 18,274 Katy 14,102 Cypress 122,803 Jersey Village 7,620
Hunters Creek Village 4,367
Bellaire 16,855 Spring 54,298 Aldine 15,869 Tomball 10,753 Humble 15,133 Porter 25,627 Atascocita 65,844 Huffman 12,116 Crosby 2,299 Highlands 7,522 Channelview 38,289 Jacinto City 10,553 Galena Park 10,887 Deer Park 32,010 La Porte 33,800 Pasadena 149,043 South Houston 16,983 Sheldon 1,990 Barrett 3,199 Cloverleaf 22,942 Four Corners 2,954 Meadows Place 4,660 Missouri City 67,358 Fifth Street 2,059 Brookside Village 1,523 3,362,560 Indianapolis Indianapolis 843,393 Lawrence 46,001 Beech Grove 14,192 Warren 1,239 Franklin Township 54,594 Perry Township 108,972
Decatur 9,362 Speedway 11,930 Wayne 136,828 Camby 32,388 Pike Township 77,895 Washington township 132,049 1,468,843 Jacksonville Jacksonville 821,784 Lakeside 30,943 Orange Park 8,412 Oakleaf Plantation 20,315 Bellair-Meadowbrook Ter-race 13,343 Atlantic Beach 12,895 Neptune Beach 7,124 Jacksonville Beach 21,823
Ponte Vedra Beach 37,924
Sawgrass 4,942 Palm Valley 19,860 Baldwin 1,430 Nassau Village-Ratliff 5,337 Callahan 962 1,007,094
Las Vegas Las Vegas 583,736
North Las Vegas 216,961
Whitney 38,585 Winchester 27,978 Paradise 223,167 Henderson 257,729 Spring Valley 178,395 Summerlin South 24,085 Enterprise 108,481 Nellis AFB 2,187 Sunrise Manor 189,372 Blue Diamond 290 1,850,966
Los Angeles Los Angeles 3,884,307
Santa Monica 89,736
Marina del Rey 8,866
Beverly Hills 34,290 Culver City 38,883 Inglewood 109,673 Burbank 103,340 La Crescenta Montroes 19,653 La Canada Flintridge 20,246 Glendale 196,021
Pasadena 137,122
East Los Angeles 126,496
South Pasadena 25,619 San Marino 13,147 Vernon 112 Huntington Park 58,114 Bell 35,477 Bell Gardens 42,072 Florence-Graham 63,387 South Gate 94,396 Lynwood 69,772 Compton 96,455 Willowbrook 35,983 Long Beach 462,257 Carson 91,714 West Carson 21,699
View Park-Windsor Hills 11,075
Westmont 31,853 Lennox 22,753 Hawthorne 84,293 Gardena 58,829 El Segundo 16,654 Manhattan Beach 35,135 Redondo Beach 66,748 Torrance 147,478 Lomita 20,256 Rolling Hills 1,860
Palos Verdes Peninsula
Rancho Palos Verdes 41,643
Signal Hill 11,465 6,428,879 Louisville Louisville 609,893 New Albany 36,372 Clarksville 21,724 Jeffersonville 44,953 Oak Park 5,379 Buckner 4,000 Crestwood 1,999 Mt Washington 9,117 Hillview 8,172 Brooks 2,401 Shepherdsville 11,222 Shively 15,157 St Matthews 15,852 Lyndon 11,002 Northfield 970
Rolling Hills 907 Anchorage 2,264 Middletown 7,218 Hurstbourne 3,884 Memphis Memphis 653,450 West Memphis 26,245 Bartlett 55,055 Lakeland 12,430 Germantown 39,161 Collierville 46,462 832,803 Miami Miami 419,777 Coral Gables 49,631 Coral Terrace 24,380 West Miami 5,965 Miami Springs 13,809 Gladeview 14,468 Hialeah 224,669
West Little River 34,699
El Portal 2,325 Miami Shores 10,493 800,216 Milwaukee Milwaukee 599,164 Shorewood 13,162 Whitefish Bay 14,137 Glendale 12,872 Brown Deer 12,088 Bayside 4,411 Wauwatosa 47,068 West Allis 60,732 Greenfield 37,072 Greendale 14,325 Hales Corners 7,746 822,777 Naples Naples 19,537 Vineyards 3,375 Golden Gate 23,961 Lely 3,451 Naples Manor 5,562 Lely Resort 4,646 Pelican Bay 6,346 Naples Park 5,967 East Naples 22,951 95,796 Nashville Nashville 626,681 Ashland City 4,541
Millersville 7,471
Goodlettsville 16,813
Hendersonville 54,068
Mt Juliet 28,222
737,796
New Orleans New Orleans 378,715
Marrero 33,141 Harvey 20,348 Gretna 17,736 Terrytown 23,319 Timerlane 10,243 Arabi 8,093 Chalmette 17,119 Meraux 10,192 Violet 8,555 St Bernard 43,482 570,943
New York: New York 8,405,837. Total: 8,405,837
Oklahoma: Oklahoma 610,613; Mustang 17,395; Yukon 22,709; Bethany 19,563; Piedmont 5,720; Edmond 81,405; The Village; Nichols Hills 3,710; Moore 55,081; Del City 21,332; Midwest City 54,371; Spencer 3,746; Jones 2,517; Choctaw 15,205; Harrah 5,095; McLoud 4,044. Total: 922,506
Orlando: Orlando 244,483; Clarcona 2,990; Pine Hills 60,076; Orlovista 6,123; Doctor Phillips 10,981; Williamsburg 7,646; Hunters Creek 14,321; Oak Ridge 22,685; Pine Castle 10,805; Conway 13,467; Belle Isle 5,988; Taft 2,205; Meadow Woods 25,558; Azalea Park 12,556; Winter Park 29,203; Goldenrod 12,039; Eatonville 2,159; Fairview Shores 10,239. Total: 493,524
Philadelphia: Philadelphia 1,553,165; Westville 4,288; Gloucester City 11,402; Mt Ephraim 4,676; Bellmawr 11,540; Barrington 6,983; Haddonfield 11,507; Collingswood 13,850; Camden 76,903; Cherry Hill 71,722; Pennsauken Township 35,830; Maple Shade Township 19,043; Riverton 2,779; Cinnaminson 16,763; Cheltenham 4,810; Glenside 8,384; Abington 55,234; Wyncote 3,044; Jenkintown 4,422; Rockledge 2,550; Flourtown 4,538; Wyndmoor 5,498; Plymouth Meeting 6,177; Darby 10,687. Total: 1,945,795
Phoenix: Phoenix 1,445,632; Tolleson 6,756; Glendale 226,721; Peoria 162,592; Sun City 37,499; Tempe 161,719; Guadalupe 6,072. Total: 2,046,991
Pittsburgh: Pittsburgh 305,841; Homestead 3,165; Whitaker 1,271; Munhall 11,380; Brentwood 9,643; Whitehall 13,938; Castle Shannon 8,316; Mt Oliver 3,403; Dormont 8,593; Scott Township 17,024; Green Tree 4,431; Carnegie 7,972; Ingram 3,330; Crafton 5,951; Rosslyn Farms 427; McKees Rocks 6,104; Stowe Township 6,362; Avalon 4,705; Bellevue 8,370; Reserve Township 3,333; Millvale 3,744; Sharpsburg 3,446; Aspinwall 2,801; Wilkinsburg 15,930; Edgewood 3,118; Rankin 2,122; Braddock 2,159. Total: 466,879
Portland: Portland 609,456. Total: 609,456
Providence: Providence 177,994; North Providence 32,078; Cranston 80,387. Total: 290,459
Raleigh: Raleigh 431,746; Cary 151,088. Total: 582,834
Richmond: Richmond 214,114; Bon Air 16,366; Bensley 5,819; East Highland Park 14,796; Lakeside 11,849. Total: 262,944
Sacramento: Sacramento 475,122; Rio Linda 15,106; North Highlands 42,694; Arden-Arcade 92,186; La Riviera 10,802; Rosemont 22,681; Parkway-South Sacramento 36,468; Florin 47,513; Vineyard 24,836. Total: 767,408
Salt Lake: Salt Lake 186,440; South Salt Lake 24,366. Total: 210,806
San Antonio: San Antonio 1,409,019; Somerset 1,550; Macdona 559; Helotes 7,341; Leon Valley 10,151; Terrell Hills 4,878; Castle Hills 4,116; Kirby 8,673; Shavano Park 3,035; Windcrest 5,364; Converse 18,198; Live Oak 9,156; Universal City N/A; Adkins N/A; Cibolo 19,580; Northcliff 1,819; Garden Ridge 1,882; Fair Oaks Ranch 5,986. Total: 1,511,307
San Diego: San Diego 1,345,895; Chula Vista 243,916; National City 58,582; Bonita 12,538; La Presa 34,126; Coronado 24,697; Spring Valley 28,205; La Mesa 57,065; Rancho San Diego 21,208; El Cajon 99,478; Santee 53,413; Granite Hills 3,035; Winter Gardens 20,631; Lakeside 20,648; Poway 47,811; Fairbanks Ranch 3,148; Rancho Santa Fe 3,117; Encinitas 59,518; Solana Beach 12,867; Del Mar 4,161; Escondido 143,911. Total: 2,297,970
San Francisco: San Francisco 837,442. Total: 837,442
San Jose: San Jose 1,000,536; Sunnyvale 140,081; Santa Clara 116,468; Fruitdale 935; Campbell 39,349; Saratoga 29,926; Los Gatos 29,413; Morgan Hill 37,882; East Foothills 8,269; Milpitas 70,092. Total: 1,472,951
Seattle: Seattle 608,660; White Center 13,495. Total: 622,155
St Louis: St Louis 319,294; Castle Point 3,962; Bellefontaine Neighbors 10,828; Jennings 14,712; Normandy 5,008; Northwoods 4,208; Pine Lawn 3,261. Total: 361,273
Tampa: Tampa 347,645; Town ’N’ Country 78,442; Egypt Lake-Leto 35,282; Greater Carrollwood N/A; Lake Magdalene 28,509; Cheval 10,702; Greater Northdale 22,079; Lutz 19,344; Thonotosassa 13,014; Temple Terrace 24,541; Del Rio N/A; Mango 11,313; Seffner 7,579; Brandon 103,483; Palm River-Clair Mel 21,024; Progress Village 5,392; Gibsonton 14,234. Total: 742,583
Virginia Beach: Virginia Beach 448,479; Greenbrier East N/A
Washington: Washington 658,893; Bethesda 63,374; Silver Spring 76,716; Friendship Village 4,512; Takoma Park 16,715; Hyattsville 17,865; Coral Hills 9,895; Suitland-Silver Hill 33,515; Hillcrest Heights 16,469; Marlow Heights 5,618; Temple Hills 7,852; Alexandria 148,892; Arlington 207,627. Total: 1,267,943
Table B.1: Administrative units that overlap with our amenities data and their respective populations, taken from Wikipedia.