T ASKS , CITIESANDURBANWAGEPREMIA

(1)

T ^ASKS , CITIES AND URBAN WAGE PREMIA ^∗

Anja Grujovic ^†

Click here for latest version

September 6, 2019

Abstract

Combining rich administrative data for Germany with representative workforce surveys, I find that job task content is robustly predictive of differences in urban wage premia across otherwise observationally equivalent individuals. Based on this, I propose a model where productive advantages of cities are inherently task-specific. Workers of higher ability have a comparative advantage in the tasks whose production benefits the most from urban spillovers.

In equilibrium, bigger cities generate larger externalities for more able agents and urban wage premia is skill-biased. I estimate the model using German worker panel data on 336 districts, 331 occupations, 3 education categories and 3 tasks. I find that one standard deviation increase in abstract task intensity is associated with a 0.3 percentage point increase in the elasticity of earnings with respect to population size. Differences in task-specific urban wage premia remain significant even after controlling for skill premia of larger cities.

∗

I am extremely grateful to Céline Carrère and Frédéric Robert-Nicoud for their guidance and supervision, and to Diego Puga for advice and support. I am also thankful to Manuel Arellano, Pierre-Philippe Combes, Donald Davis, Gilles Duranton, Giacomo di Giorgi, Joan Monras, Steve Redding, Esteban Rossi-Hansberg, Jose de Sousa, and Lin Tian for their helpful comments, and to numerous seminar and conference participants for their insightful discussions.

This research is partially funded by the European Research Council under the European Union’s Horizon 2020 Programme (ERC Advanced Grant agreement 695107), by the Swiss National Science Foundation under the Early PostDoc Mobility Fellowship (SNSF Grant number 178064), and by the Spanish Government’s “Unit of Excellence Maria de Maeztu” award (Grant number MDM-2016-0684).

†

Centro de Estudios Monetarios y Financieros ( CEMFI ), Casado del Alisal 5, 28014 Madrid, Spain (e-mail:

(2)

1 Introduction

Productivity is increasing in agglomeration size. One of the ways this is manifested in the data is in individual earnings. Figure 1 plots average daily earnings for male workers against log population density of West German cities. There is a strong positive relationship between earnings and city size. A ten percent increase in population density corresponds to a one percent increase in average earnings. Moreover, these urban wage premia are heavily skill-biased. In the densest city, Munich, a worker with university degree earns 35% more than an observationally equivalent worker in the least dense district. For a worker with no post-graduate qualifications, this premium is a mere 18%. Previous empirical studies on other developed economies report comparable estimates of urban wage premia and its corresponding skill-bias. ¹

Traditionally, urban economics has focused on estimating skill price differences across locations of different sizes. However, I document that even after controlling for city size differences in skill prices, a significant heterogeneity in urban wage premia remains across occupations. Figure 2 plots such estimates for a set of nine occupations. These estimates are obtained from a pooled OLS regression of individual log wages on a set of observables, as well as on two interaction terms - education interacted with log-population and occupation interacted with log-population.

The coefficients on the latter term represent occupation-specific urban wage premia that remain significant even after controlling for any skill price differences across locations of different size.

I document that the estimates of these occupation-specific city size premia are strongly correlated with the intensity in abstract task of the occupation. This finding motivates an investigation of how agglomeration economies depend on task demands, and what role these task demands play in explaining the overall skill-premia.

To explain these observations, I build a model where productive advantages of locations are inherently task-specific. I distinguish between three sources of heterogeneity across workers: (i) innate ability, which is given ex-ante and exogenous; (ii) skill, or education, which is regarded as investment into human capital; and (iii) choice of occupation, where each occupation is regarded as a unique bundle of tasks that must be completed simultaneously for output to be produced. In the model, productive advantages of locations act on task productivities directly. Cities reward the production of tasks differently and these task productivities are allowed to differ across locations due to occupation level indivisibilities. Workers of heterogeneous ability choose an occupation and location in which to produce. Comparative advantage across tasks governs the sorting of heterogeneous individuals into occupations, as in Ricardo-Roy models (Costinot and Vogel, 2015).

Frictions to mobility imply imperfect sorting of workers across locations.

1

For estimates see Glaeser, 2011, for the United States, Combes, Duranton and Gobillon, 2008, for France, and de

la Roca and Puga, 2017, for Spain. Some of their results are discussed later in the text.

(3)

Figure 1: Average wages and population density

Notes: Average daily wages for West German urban districts over the period 1995-2014, in real 2010 EUR. Sample includes only male workers of Germany nationality between the ages of 20 and 65. Population density is calculated as 2011 population over the surface area of the administrative district.

In equilibrium, task-specific productivity differences across locations and worker comparative advantage across tasks combine to deliver a rich set of predictions. Workers of higher ability choose to invest more in skill and sort into occupations that are characterised by higher urban productive advantages. As a result, the combination of these two effects gives rise to the observed skill-bias in urban wage premia. In equilibrium, bigger cities are more skill-intensive and have higher wages.

Using high-quality administrative data for Germany for the period 1995-2014, I find evidence

of considerable heterogeneity in city size earning premia across tasks. I find that estimates of task-

specific urban wage premia are significant and robust to controlling for skill-level of workers, with

largest coefficients on tasks usually considered as more complex. One standard deviation increase

in abstract task intensity is associated with a 0.3 percentage point (or 21 percent for the average

worker) increase in the elasticity of earnings with respect to population size. These results suggest

that variations in job requirements are an important determinant in understanding wage inequalities

(4)

Figure 2: Heterogeneity in urban wage premia by profession, after controlling for city-size differences in returns to education

Notes: Estimated occupation-specific population density wage premia, obtained using a pooled OLS regression of log monthly wages on a set of city, occupation, education, sector, and time indicators, as well as two interaction terms - education interacted with log population density and occupation interacted with log population density. Coefficients on the latter term are represented in the graph.

Estimation is based on a sample of full-time working males in Western Germany over the period 1995-2014.

across space within comparable skill groups.

Existing theoretical frameworks of heterogeneous urban wage premia focus mostly on the differences in the economic value of skill across local labour markets. The concept of skill in these frameworks, rooted in Becker’s (1964) human capital model, regards skill as a durable investment that can be acquired by the workers through training or education. Though this human capital approach has been successful in capturing the differences in returns to education across different locations, it however remains silent on how and why these geographical differences in skill prices have evolved over time due to changing skill requirements of jobs. My model, building on the task framework approach of Autor, Levy, and Murnane (2003), provides a tractable framework for capturing how skill-biased urban wage premia evolve with varying job requirements.

This paper contributes to the recent theoretical literature studying the distribution of skills and

(5)

wages across cities. ² Building on the task framework, it delivers similar predictions on the spatial variation of skill premia as Davis and Dingle (2012). In turn, the task framework is advantageous as it allows us to study the heterogeneity in urban wage premia across workers of same skill level but different occupations, and, in addition, could be useful for understanding how skill-premia evolve with technological changes to job demands. This paper also complements existing frameworks that model spatial sorting of heterogeneous agents (Behrens, Duranton, Robert-Nicoud, 2014, Davis and Dingle, 2014, amongst others). It also contributes to the vast labour literature that explores shifts in wage structure using the task-based framework. ³ Unlike my paper, however, existing literature in this field remains silent on how returns to tasks vary across local labour markets.

Empirically, this paper contributes to the growing literature documenting the nature of the relationship between wages, city size and skills. ⁴ The estimated magnitudes of the urban wage premium are in line with recent studies that rely on the panel structure of administrative datasets to control for unobservable heterogeneity using individual fixed effects (Glaeser and Mar´e, 2001;

Combes, Duranton, and Gobillon, 2008; de la Roca and Puga, 2017; and Eckert, Hejlesen and Walsh, 2019). My findings are also consistent with Bacold, Blum and Strange’s (2009) evidence that returns to cognitive and people skills are higher in larger agglomerations. They are also in line with Michaels et al. (2016) who find that occupations intensive in abstract tasks are disproportionately present in larger cities.

Applying the task-based approach to the analysis of agglomeration economies is relevant for several reasons. Firstly, this approach sheds light on how agglomeration effects depend on the type, or complexity, of job requirements. Secondly, insofar as job requirements differ for workers of same education level, this approach is useful in understanding the heterogeneity in the urban wage premium across workers of same skill. Thirdly, in light of job requirements that shift over time, this model provides a tractable framework for rationalizing changes in the skill-bias of agglomeration economies due to technological change. And lastly, this task-based approach presents a possible micro-foundation for the skill-bias in agglomeration economies.

The paper is organized as follows. Section II develops a model of heterogenous agents and task-specific productive advantages of locations. Section III presents the empirical strategy to estimate task-specific agglomeration economies. Section IV presents the German administrative dataset. Section V provides the estimation results. Section VI concludes.

2

For a recent literature review see Duraton and Puga (2004) and Behrens and Robert-Nicoud (2015).

3

See Autor and Handel (2013) for a recent literature review of the task-based approach.

4

Rosenthal and Strange (2004) provide a review of the literature.

(6)

2 Theoretical framework

This section develops a model in which task-driven productivity differences across locations give rise to skill-biased agglomeration economies. I consider an economy in which a continuum of heterogeneous individuals choose an occupation and a location in which to produce. There is a continuum of available occupations, each defined as a unique bundle of tasks, and a discrete set of locations (c ∈ C = 1, ..., C).

2.0.1 Production

Individuals consume a freely traded final good, whose price is set as the numeraire. Producing the final good requires a continuum of occupations indexed by σ ∈ Σ ≡ [σ, σ]. Assuming a CES production function, the output of the final good is given by

Q = Z

σ∈Σ

D(σ)Q(σ)

⁻¹

dσ

₋₁

, (1)

where Q(σ) ≥ 0 is the quantity of output of occupation σ, which is freely traded, 0 < < ∞ is the elasticity of substitution between occupations, and D(σ) is an exogenous demand shifter. Profits by final goods producers are given by

Π = Q − Z

σ∈Σ

p(σ)Q(σ)dσ. (2)

Producing occupational output requires only labour that is supplied by a mass of L heterogeneous individuals characterised by their ability α. I denote by f (α) ≥ 0 the density of workers of skill α and by A ≡ [α, α] the set of abilities available in the economy.

I consider an occupation, or a job, to be an indivisible bundle of tasks that must be completed simultaneously for the output to be produced. In this framework, tasks can be thought of as “a unit of work activity that produces output” (Acemoglu and Autor, 2011). I assume that there exists a continuum of tasks, indexed by τ ∈ T ≡ [τ , τ ]. Each occupation requires the input of all tasks, though occupations vary in the intensities at which they demand these tasks. An occupation therefore is defined as a unique bundle of task demands.

The output q of a given occupation σ depends on the intensity at which it uses each tasks, on the ability α of the worker completing the tasks, and on the location c in which it is being produced:

q(σ, c; α) = e ^α+ ^R

^τ∈T

^Z(τ,c)H ^(τ,σ)dτ , (3)

where Z (τ, c) is the efficiency of task τ production in city c per unit of time, and H(τ, σ) is the

share of time an individual in occupation σ is required to spend on task τ. This share of time,

(7)

or ‘task intensity’, is exogenous, occupation-specific and defined such that H(τ, σ) ∈ [0, 1] and R

τ∈τ H(τ, σ )dτ = 1 for all σ.

The relative efficiency of task τ production Z(τ, c), depends only on the location in which it is produced. This parameter is the only source of agglomeration forces in the model, and is taken as given by the worker. There are many reasons why locations might provide productive advantages that are task-specific and I do not make attempts to distinguish between them. ⁵ These task efficiencies are allowed to differ across locations due to occupation level indivisibilities.

Importantly, I assume that Z(τ, c) is twice-differentiable, strictly supermodular in τ and c, and strictly increasing in c. The latter states that cities are indexed by their total factor productivity ( TFP ), so that higher-c cities are more productive across all tasks. The former dictates comparative advantage in productive advantages of locations, so that the production of higher-τ tasks benefits more from being in higher-c locations than the production of a lower-τ task. As a result, average task efficiency is increasing in τ .

For ease of analysis, I also make assumptions on the functional form of occupation-specific task intensities. I assume that H(τ, σ) is monotonic (non-increasing or non-decreasing), and twice differentiable with ^∂H(τ,σ) _{∂τ ∂σ} > 0. Combined, these assumptions imply that occupations indexed by higher-σ are relatively more intensive in higher-τ tasks. Consequently, the higher-σ occupations are simultaneously more productive on average and have the largest comparative advantage in higher-c locations. For simplicity of exposition, we can then rewrite (3) as

q(σ, c; α) = e ^α+A(σ,c) , (4)

where A(σ, c) = R

τ∈T Z (τ, c)H(τ, σ)dτ is the relative occupation-specific productivity across locations. By the assumptions on Z (τ, c) and H(τ, σ) specified above, A(σ, c) is twice- differentiable, strictly supermodular in σ and c, and strictly increasing in c.

Under these assumptions, developing the model in terms of occupation-specific or task-specific agglomeration economies is observationally equivalent. However, formulating this mechanism on task demands is important for several reasons. Firstly, it allows us to quantify differences in occupation types. Without an observable dimension by which to measure distances between occupations, a definition of an occupation would be entirely meaningless other than as an indicator in an estimation. Secondly, it provides a means to study the evolution of occupation-specific urban wage premia following technological shocks which might alter task demands. This could

5

Several mechanisms can give rise to this effect. Marshall (1920) identifies three key sources of agglomeration

economies: labour market pooling, input sharing, and knowledge spillovers. My approach is most closely linked with

learning spillovers. One can imagine a setting in which task-specific knowledge is ‘spilt’ or exchanged voluntarily

between workers. Such learning spillovers of task-specific human capital would give rise to agglomeration economies

(8)

be particularly relevant in explaining how location specific returns to different occupations have evolved over time following technological shifts in job task requirements over past decades as observed by Autor et al. (2003). Lastly, a mechanism based on tasks is inherently a micro- foundation for occupation-specific agglomeration economies. As such it provides a deeper understanding of factors driving urban wage premia. It also allows for calibration of the model, which can be used to make out-of-sample predictions. In the remained of paper, I therefore intermittently use both the task definition of agglomeration economies and its occupation-specific equivalent, as is convenient for exposition.

Finally, I define wages. Each individual inelastically supplies one unit of labour, so her income is her physical productivity times the price of the occupational output produced

w(σ, c; α) = p(σ)q(σ, c; α). (5)

Labour market and preferences

Each individual chooses both an occupation and a location in which to produce. Workers differ in their ability to train for an occupation, and in their idiosyncratic preferences for locations. For ease of exposition, let us consider this process as a sequential decision making, where workers first train for an occupation, before learning their location taste and sorting across cities. This assumption is without loss of generality. The exercise could easily be adapted into a simultaneous decision- making framework, since worker’s comparative advantage across occupations does not depend on their choice of location.

In the first instance, individuals choose an occupation to maximise their expected utility, given heterogeneous training costs for learning each task. The total cost of training for an occupation depends on the cost of learning each task and on the intensity at which that task is demanded in the occupation. It is equal to

B(σ; α) ≡ Z

τ∈T

b(τ, α)H(τ, σ)dτ (6)

where b(τ, α) > 0 is the cost for learning a task τ by an individual of ability α. I assume that b(τ, α) is strictly positive, twice differentiable, and strictly submodular in τ and α, and is strictly decreasing in α. This implies that higher-α individuals have a comparative advantage in learning tasks which are indexed with higher-τ - that is, in tasks that benefit more from agglomeration economies (as described in section above). In Appendix A.1, I show that B(σ, α) is strictly submodular in σ and α, so that higher α individuals have a comparative advantage in training for higher σ occupations.

Note that this cost B(σ; α) can also be seen as an investment into education, which is generally

(9)

referred to as ‘skill’ in labour economics. As such, this specification allows for an important distinction between skill, a property which can be acquired or invested into, and ability, which is given ex-ante and exogenous. Though in equilibrium they are perfectly correlated, these are two separate concepts. Also note that, as before, it is equivalent whether we had made the assumption of submodularity at task level, b(τ, α), or directly at occupation level, B(σ, α). The benefit of micro-founding the assumption at task level allows us to capture potentially how skill supply in the economy responds to changes in job task demands.

Once individuals have trained for an occupation, and these costs are sunk, serendipity occurs and individuals learn of their preferences for different locations. Workers then choose a location to maximise their utility, given their occupation. To model this location choice, I make use of the conditional logit model, first formulated in the utility maximization context by McFadden (1973).

Thus, I assume that the log utility function in location c for an individual of ability α who has chosen occupation σ is:

ln U (C(σ, c; α), N (σ, c; α)) = ρ + (1 − β) ln C(σ, c; α) + β ln N (σ, c; α) + η(α, c) (7) where C(σ, c; α) is the consumption of final good, N (σ, c; α) is the consumption of housing, and η(α, c) is an i.i.d. preference for a location. ⁶ I assume that η(α, c) is drawn from a Type I Extreme Value distribution with a shape parameter λ. Utility is maximised subject to budget constraint C(σ, c; α) + r(c)N (σ, c; α) ≤ w(σ, c; α), where r(c) is the price of local housing at c.

Given this set-up, the optimisation framework can be specified as follows. First, individuals choose an occupation to maximise their expected utility

max σ E ^c [U (σ, c; α)] − B (σ; α). (8) Next, they observe their preference shock, and choose a location to maximise their utility

max c U (c; α, σ(α)). (9)

Locations and housing market

There is an inelastic supply of housing, N, in each location, owned by absentee landlords. The housing markets are perfectly competitive, so that the market-clearing housing price in location c, consistent with assumption on preferences in (7), is equal to

r(c) = β w(c)L(c) ¯

N . (10)

where w(c) ¯ is the average wage in that location and local population is given by L(c). Note that L(c) also stands for population density, as housing supply is assumed constant across locations.

6

The i.i.d. preference shocks introduce mobility frictions, which will be relevant for obtaining imperfect sorting of

(10)

Definition of a competitive equilibrium

In a competitive equilibrium, individuals maximise their utility, final-good producers and landowners maximise profits, and markets clear.

To qualify the equilibrium, I denote by g(σ, α) the share of α-individuals choosing optimally occupation σ, and by π(σ, α) the share of α-individuals choosing location c. Then, the resulting distribution of individuals across occupations and location is such that

g(σ, α) > 0 ⇐⇒ {σ} ∈ arg max E c [U (σ, c; α)] − B(σ; α) (11) and

π(c, α) > 0 ⇐⇒ {c} ∈ arg max U(c; α, σ(α)). (12) Profit maximisation by final-good producers yields demands for occupational output

Q(σ) = I

p(σ) D(σ)

−

, (13)

where I ≡ L P

c

R

σ

R

α p(σ)q(σ, c, α)π(c, α)g(σ, α)f(α)dαdσ is the total income of all workers in the economy. By free entry, these producer’s profits are zero.

Market clearing requires that the housing market clears, that the demand and supply of occupational output are equal, and that every individual is located somewhere.

N = βL

r(c) Z

σ∈Σ

Z

α∈A

p(σ)q(c, σ ; α)π(c, α)g (σ, α)f (α)dσdα ∀c (14)

Q(σ) = X

c∈C

Q(σ, c) = L X

c∈C

Z

α∈A

q(c, σ; α)π(c, α)g(σ, α)f(α)dα ∀σ (15) f (α) = X

c∈C

π(c, α) = X

c∈C

Z

σ∈Σ

π(c, α)g(σ, α)f(α)dσ ∀α (16)

A competitive equilibrium is a set of functions Q : Σ → R ⁺ , g : Σ × A → R ⁺ , π : C × A , → R ⁺ , r : C → R ⁺ and p : Σ → R ⁺ , such that conditions (10) to (16) hold.

Existence of a competitive equilibrium

To solve for a competitive equilibrium, I start by characterising an individual’s optimal choice of occupation and location. Then, I set out conditions for the competitive equilibrium to exist and be unique. Finally, I consider some relevant properties of this equilibrium.

L EMMA 1 (Occupational assignments.) In a competitive equilibrium, there exists a continuous

and strictly increasing matching function M : A → Σ, such that (i) g(σ, α) > 0 if and only if

M (α) = σ, and (ii) M (α) = σ and M ( ¯ α) = ¯ σ.

(11)

The proof of Lemma 1 builds on the analogous exposition in Costinot and Vogel (2010) and is provided in Appendix A.2. Lemma 1 implies that higher-α individuals sort into higher- σ occupations. Given the strict submodularity of B(σ, α), it follows immediately that individuals of higher ability have a comparative advantage in occupations with higher training costs. As a consequence, individuals of higher innate ability are overrepresented in occupations which are on average more productive, but that also require higher training costs. Observationally, this would imply that more able individuals are also those with higher levels of skill (where skill is measured by level of education). Note that in what follows, σ is a function of α (as given by the matching function σ = M (α) in Lemma 1), but for clarity of exposition I denote it simply as σ.

Given sunk choice of occupation, the workers choose a location to maximise (7). The indirect log utility function for living in each location is

ln v(c, σ, α) = ln p(σ) + α + A(σ, c) − βr(c) + η(α, c).

I exploit the fact that locational choice does not depend on the price of occupation nor on the ability of worker (other than through the preference shock η(α, c)), so that the maximisation problem can be rewritten

arg max

c ln p(σ) + α + A(σ, c) − βr(c) + η _α,c = arg max

c ln V (σ, c) + η(α, c). (17) where I have defined ln V (σ, c) ≡ A(σ, c) − βr(c).

By the properties of the conditional logit model, the outcome of this maximisation gives the share of workers of occupation σ that decide to live in city c

π(σ, c) = V (σ, c) ^1/λ P

k (V (σ, k)) ^1/λ . (18)

Substituting in for V (σ, c), we obtain the following lemma.

L EMMA 2 (Locational assignments) The share of individuals of occupation σ choosing to produce in city c is

π(σ, c) = (e ^A(σ,c) r(c) ^−β ) ^1/λ P

k (e ^A(σ,k) r(k) ^−β ) ^1/λ .

The relative supply of workers of some occupation σ between two cities depends only on the

relative (within-occupation) real wages between these locations. Locations with higher average

productivity, and lower housing prices, attract more people.

(12)

The model being block-recursive, we only need to solve for a vector of housing rents r to determine the full equilibrium. ⁷ I proceed as follows. Taking (14), and substituting in using (13), (4) as well as Lemmas 1 and 2, we can obtain the following system of equations that governs housing rents in each location c:

r(c) = X

k

r(k)

! 1/ Z

σ

e (1+λ)A(σ,c) r(c) ^−β 1/λ

P

k [e (1+λ)A(σ,k) r(k) ^−β ] ^1/λ ( P

k

e (1+λ)A(σ,k) r(k) ^−β 1/λ

P

k [e ^A(σ,k) r(k) ^−β ] ^1/λ

) ^1−1/

ζ(σ)dσ (19) where ζ(σ) ≡ D(σ)

h βL

N e ^M

⁻¹

^(σ) f (M ⁻¹ (σ)) i 1−1/

is a set of variables and parameters that do not depend on r nor c. The equilibrium vector r then determines π(σ, c) as per Lemma 2, which in turn is sufficient to characterise all endogenous variables as per (13)-(16).

In what follows, I consider the existence and uniqueness of this equilibrium. I show that a sufficient condition for the existence and uniqueness of equilibrium is that the elasticity of substitution ≥ 1. In the case of < 1, when occupations are too strong a complements, it is not clear that a unique equilibrium does exist.

L EMMA 3 (Existence and uniqueness) For ≥ 1, a competitive equilibrium characterised by equations (10)-(16) exists and is unique.

Proof. We can consider solving equations in (19) as finding the zeros of an analogous system of

“scaffold” functions F . The scaffold function for each location c is obtained by rewriting (19) such that:

F (r ⁰ , r(c)) = (20)

X

k

r ⁰ (k)

! λ/(λ+β) 





Z

σ

e (1+λ)/λA(σ,c)

P

k

e (1+λ)A(σ,k) r ⁰ (k) ^−β 1/λ ^−1/

P

k [e ^A(σ,k) r ⁰ (k) ^−β ] ^1/λ 1−1/ ζ(σ)dσ







λ λ+β

− r(c)

I verify that the following properties hold for equation (20):

(i) For all r ⁰ ∈ R ^c ++ , there exists r(i) such that F (r ⁰ , r(i)) = 0, (ii) ∂F (r ⁰ , r(i))

∂r(i)

∂F (r ⁰ , r(i))

∂r(j) < 0 for all j,

(iii) There exists r ⁰ such that for r(i) defined in F (sr ⁰ , r(i)) = 0, r(i) = o(s).

7

Note that the housing prices r(c) are proportional to the location’s total revenue w(c)L(c) ¯ by (10). Solving for

r(c) is equivalent to solving for w(c)L(c). ¯

(13)

Then, the existence of an equilibrium vector r ^∗ follows from Lemma 1 of Allen, Arkolakis and Li (2015). ⁸

To prove uniqueness, I check that the following properties also hold:

(iv) F (r ⁰ , r) satisfies gross-substitution,

(v) F (r ⁰ , r(i)) can be decomposed as F (r ⁰ , r(i)) = g(r(i)) − h(r(i)) where g(r(i)) and h(r(i)) are, respectively, homogeneous of degree α and β, with α < β.

A sufficient condition for both propositions to hold is ≥ 1. ⁹ Then, uniqueness follows from Theorem 2 of Allen, Arkolakis and Li (2015). This completes the proof of Lemma 3.

Properties of a competitive equilibrium

I proceed to consider some relevant properties of this equilibrium. ¹⁰ First, I examine the relationship between location size L(c) and the location’s TFP , indexed by c. Aggregating population shares given in Lemma 2, it can be shown that location size is increasing in c.

L EMMA 4 (Population size) For a large enough λ/β, population size is increasing in total factor productivity of the location, indexed by c.

Proof. Combining Lemma 2 and (14), we obtain the following expression for the equilibrium population size of a given location c:

L(c) =





 Z

σ

e

^1+λ^λ

^A(σ,c)+M

⁻¹

^(σ) p(σ)κ(σ)dσ

| {z }

x

1







−β λ+β





 Z

σ

e

^λ¹

^A(σ,c) κ(σ)dσ

| {z }

x

2







λ−β λ+β

,

where κ(σ) ≡ f (M ⁻¹ (σ)) h P

k e

¹^λ

^A(σ,k) r(k) ⁻

^β^λ

i ⁻¹

> 0 is a collection of occupation-specific variables that are common to all locations.

8

It is obvious that condition (i) holds since setting r(c) equal to the first term on the right-hand side of the equation implies F (r

⁰

, r(i)) = 0. Condition (ii) also holds because

^∂F(r_∂r(i)⁰^,r(i))

= −1 and

^∂F(r

0,r(i))

∂r(j)

> 0. Setting F(sr

⁰

, r(i)) implies r(i) ∝ s

λ+(λ+β)

(λ+β)

, then condition (iii) also holds. Hence there exists a set of {r(i)} that satisfy equation (20).

9

That F (r

⁰

, r) satisfies gross-substitution is evident from the inspection of (20). That property (v) holds as well can be shown as follows. Set g(r(i)) equal to the first term of the expression on the right-hand side, and h(r(i)) = r(i). Then g(r(i)) is homogeneous of degree

_(λ+β)^λ

−

₁₋¹

, and g(r(i)) of degree 1. A sufficient condition for

λ

(λ+β)

−

₁₋¹

< 1 is ≥ 1.

10

Note that, in all of the discussion that follows, none of the results depend on the value of the elasticity of

substitution . Therefore, they hold true for any 0 < < ∞.

(14)

This function is increasing in c if and only if λ

β > 1 + ∂ ln(x ₁ )/∂ ln(c)

∂ ln(x ₂ )/∂ ln(c)

where all terms on the right-hand side are positive by the strict supermodularity assumption on A(σ, c).

This inequality can be interpreted as follows. Remember that λ governs dispersion in preferences across locations, and therefore mobility frictions, while β governs the consumption share of housing, which capture congestion costs. The lower is dispersion in preferences (higher λ) and the lower is the share of spending on housing, the more likely it is that agglomeration forces out-power congestion forces and guarantee that location size is increasing in location TFP . If congestion costs are over-bearing however, it is possible to observe locations which are at the top of productivity distribution, but smaller in size, as rents crowd out low-income workers. One such example would be for instance the Silicon Valley, which has some of the highest average wages, but low population density due to high real estate prices.

Given that the correlation of a location’s TFP and its population density is 0.98 for German cities in my sample, I proceed with the assumption that we are in the case where Lemma 4 holds.

That is, the city size is increasing in c. Automatically then, for the remainder of this paper, c becomes the index for city’s population as well.

L EMMA 5 (Rent schedule) Rents are increasing in city size, indexed by c.

Proof. Rearranging (19), the rents in location c can be expressed as r(c) =

Z

σ

˜

κ(σ)e

^1+λ^λ

^A(σ,c) dσ

_λ+β^λ

,

where ˜ κ(σ) is again a collection of variables that are independent of c. ¹¹ By the strict supermodularity on A(σ, c), it is evident that r(c) is increasing in c.

Finally we can show that if Lemma 5 holds, average wages are increasing in city size.

L EMMA 6 (A city’s productivity) For a large enough λ, average wages are increasing in city size.

Proof. By combining Lemmas 4 and 5, we obtain the following expression for average wages in location c:

¯ w(c) =

Z

σ

e

^1+λ^λ

^A(σ,c)+M

⁻¹

^(σ) p(σ)κ(σ)dσ Z

σ

e

^λ¹

^A(σ,c) κ(σ)dσ ⁻¹

A sufficient condition for this term to be increasing in c is that λ > 0 be large enough.

11

Specifically, κ(σ) ˜ ≡ [ P

k

r(k)]

^1/

n P

k

e

(1+λ)A(σ,k)

r(k)

^−β

^1/λ

o

−1/

n P

k

e

^A(σ,k)

r(k)

^−β

^1/λ

o

1/−1

ζ(σ).

(15)

Given the primitives of the model, we can now derive two key properties of interest, which will serve as the basis for the empirical estimation in Section 3.

First, note that by (4) and (5), the log earnings of an individual of ability α, choosing optimally to produce in occupation σ = M(α) and location c, is given by:

ln(w(α, σ, c)) = ln(p(σ)) + α + Z

τ∈T

Z(τ, c)H(τ, σ)dτ. (21)

where, as we have defined before A(σ, c) ≡ R

τ∈T Z (τ, c)H(τ, σ)dτ. It is useful to keep this equation in mind as it will serve as a reference for the empirical analysis in next section. It also brings us directly to the first proposition.

P ROPOSITION 1 (Ability-biased urban wage premia) Wages are log-supermodular in worker ability α and city size c.

Proof. By (21), for any α ⁰ > α and c ⁰ > c:

ln

w(α ⁰ , M (α ⁰ ), c ⁰ ) · w(α, M (α), c) w(α ⁰ , M (α ⁰ ), c) · w(α, M (α), c ⁰ )

= A(M (α ⁰ ), c ⁰ )+A(M (α), c)−A(M (α ⁰ ), c)−A(M (α), c ⁰ ) > 0 by strict supermodularity of A(σ, c) and by monotonically increasing M (α).

Occupations which benefit the most from agglomeration economies, are also those in which high-ability individuals possess a comparative advantage. In equilibrium, this gives rise to the urban-wage premia that is increasing in worker ex-ante ability. Since high-α individuals are also those that invest more into skill, then wages are also log-supermodular in skill and city size. ¹²

This complementarity in ability and city size also drives the sorting of workers across locations of different size, as next proposition states.

P ROPOSITION 2 (Sorting across locations) Distribution of workers is log-supermodular in worker ability and city size.

Proof. By Lemma 2, we have that for any two workers α ⁰ > α and locations c ⁰ > c:

ln

π(M (α ⁰ ), c ⁰ ) π(M (α ⁰ ), c)

π(M(α), c) π(M (α), c ⁰ )

= 1

λ [A(M (α ⁰ ), c ⁰ ) + A(M (α), c) − A(M (α ⁰ ), c) − A(M (α), c ⁰ )] > 0.

This expression is strictly positive by strict supermodularity on A(σ, c) and monotonically increasing M (α).

12

Recall that B(σ, α) is increasing in σ by assumption. Then B(M (α), α) is increasing in α by Lemma 1.

(16)

3 Estimation of the model

In the previous sections I showed how task-specific agglomeration economies can give rise to urban wage premia which are skill-biased in equilibrium. In this section, I bring the model to data.

Specifically I wish to test (i) if there are significant differences in agglomeration effects across tasks, and if so, (ii) how these task-specific urban productivity advantages relate to the observed skill-bias in urban earnings. Additionally, I test whether these differences in task agglomeration economies can explain sorting patters of workers across locations, as the model would be predict.

3.1 Specification: Task specific urban wage premia

I start by considering Proposition I. Direct estimation of equation (21) requires controlling for unobservable worker ability when estimating the relationship between wages and city size. I follow the approach used by Glaeser and Mar´e (2001), Combes, Duranton, and Gobillon (2008), and de la Roca and Puga (2017) who estimate static urban wage premia while including worker fixed- effects to control for unobserved heterogeneity. In addition to their specification, I also allow for heterogeneity in task-specific city size earning premia. Using a panel dataset of individual earnings, I estimate the following baseline specification:

ln(w _cσ(it) ) = µ _c + µ _i + µ _σ + µ _t + X it β + X

τ

δ _τ h _{τ σ(it)} × ln(dens _c ) + ε _it (22) where w _cσ(it) is the wage of an individual i at a time t who is observed to be working in location c and occupation σ; µ _c and µ _σ are indicators for city and occupation, while µ _i and µ _t are individual and time fixed-effects, respectively; X it is a set of worker time-varying characteristics such as tenure, experience and sector indicators, as well as an indicator for the highest level of education achieved; h _{τ σ(it)} is the observed share of time spent on task τ by an occupation σ; dens _c is the population density of city c in which the individual is currently working; and ε _it is an error term.

Parameter δ _τ captures task-specific agglomeration economies. The indicator for city µ _c absorbs the differences in productivity across locations for the reference occupation. ¹³ The indicator for occupation µ _σ captures the differences in productivity across occupations in the reference location. ¹⁴ Therefore the parameter on the interacted task-intensity and location density, δ _τ , captures any remaining heterogeneity in city size earnings premia across occupations. The strength

13

The reference occupation is the one with the highest intensity in the dropped task.

14

In addition to controlling for differences in average returns per task in the reference location, the occupation indicators also might capture any additional occupation-specific productive advantages not covered by my model.

That is why it is preferable to control for occupation averages rather than task averages.

(17)

of the task-based approach is that we can quantify these differences in agglomeration effects across occupations along a tractable dimension - the task intensity.

A possible source of bias in this estimation is that the task-based approach might be capturing differences to factor prices across locations. If the true mechanism of agglomeration effects is through the skill-bias, the estimates of δ _τ would be upward biased for tasks which are intensively used by skilled occupations. To control for this potential confounding effect, I run a robustness check and control for the skill-bias in agglomeration effects by adding the term δ s I ^sit × ln(dens c ) to the baseline specification (22), where I ^sit is an indicator function for the level of education of worker i at time t. The parameter δ s then measures variation in city size earning premia across skill groups, not captured by variation in task demands.

3.2 Specification: Sorting across locations

In addition, I wish to test whether the observed local labour force composition is consistent with Proposition II. The model predicts that occupations intensive in high-δ _τ tasks should be over- represented in larger cities. Specifically, by Lemma 2, the relative sorting rate between two occupations σ ⁰ > σ into some city c is

ln

π(σ ⁰ , c) π(σ, c)

= 1 λ

Z

τ∈T

Z (τ, c) [H(τ, σ ⁰ ) − H(τ, σ)] dτ + ln P

k (e ^A(σ,k) r(k) ^−β ) ^1/λ P

k (e ^A(σ

⁰

^,k) r(k) ^−β ) ^1/λ . (23) To estimate, I run directly the equivalent of equation (23) on a cross-section of observed differences in sorting between each occupation-pair into some location c:

ln

π(σ ⁰ , c) π(σ, c)

= ν _σ + ν _σ

⁰

+ X

τ

θ _τ (h _{τ σ} − h _{τ σ}

⁰

) × ln(dens _c ) + ι _σσ

⁰

_c (24) where π(σ, c) is the fraction of individuals of occupation σ working in location c; ν σ is the indicator for occupation σ, which controls for the term ν σ = ln P

k (e ^A(σ,k) r(k) ^−β ) ^1/λ

in equation (23);

and ι σσ

⁰

c is the error term. The estimated parameter θ τ governs sorting patterns across tasks. It captures how differences in task intensities between two occupations correlate with their relative sorting rates into larger cities. We should expect the estimates of θ τ to be correlated with the estimates of δ τ from (22), since according to the theoretical framework θ τ = λδ τ = ^∂Z(τ,c) _∂c . In other words, an occupation’s intensity in the tasks that benefit more from agglomeration effects, should be predictive of its higher sorting into larger cities.

4 Data

To estimate wage differentials across tasks and locations, while controlling for any unobservable

(18)

locations. The social security records for Germany (SIAB) fulfils these needs. Two additional advantages make the German dataset particularly appropriate. Firstly, the SIAB administrative dataset contains an extremely detailed measure of occupations, coded at around 331 different values. This makes it particularly amenable to analysing occupation-level differences in earnings at a level which cannot be achieved by alternative social security datasets. Secondly, Germany is one of the few countries with representative data on task content of occupations available through rich labour market surveys - The Qualification and Career Surveys (QCS). These surveys are available in six cross-sections in the period of 1979-2012, and are coded on the same occupation classification as SIAB. By mapping along the dimension of 331 occupation codes, I am able to approximate the task content of workers in the SIAB panel dataset.

4.1 SIAB

The dataset “Sample of Integrated Labour Market Biographies (SIAB)” is provided by the Institute for Employment Research (IAB) of the German Federal Employment Agency (BA). This rolling panel dataset covers a 2% random sample of administrative social security records from 1975 to 2014 and is representative for 80% of the German workforce. It covers all employees subject to social security contributions, which includes all white- and blue-collar workers as well as apprentices, but excludes civil servants, the self-employed and individuals performing military service.

The dataset includes 1,757,925 individuals whose employment histories are covered in 58,220,255 lines of data. The unit of observation in the data is any change in the individual’s employment status, which includes changes in occupation, wage or type of contract within the same firm. As such, the data in this spell-based format is exact to the day. I convert this dataset into a monthly panel that includes each individual’s occupation, wage, and work location as well as some socio-demographics for the longest spell in that month.

Work location of every employment spell is observable at the level of administrative districts.

There are 401 administrative districts in Germany of which 107 are “urban” and 294 are “rural”

districts. ¹⁵ By definition, urban districts are cities that constitute districts in their own right. Rural districts on the other hand may comprise one or more smaller towns. Seeing as some smaller towns are comprised within the rural districts, I keep all districts in the analysis sample. This choice is also motivated by the observation that elasticity of wages to population does not appear to be discontinuous between rural and urban districts, as can be seen in Figure 1. In the appendix, I run robustness tests on the subset of urban districts only.

15

District shapefile can be seen in maps in Appendix B.4, which are discussed later in the paper.

(19)

4.2 QCS

To obtain data on the intensity of tasks performed by each occupation, I make use of the BIBB Qualification and Career Survey (QCS) by the German Federal Institute for Vocational Training (Bundesinstitut f¨ur Berufsbildung; BIBB) for the year 2006. Amongst other question, survey respondents were required to indicate how relevant a set of 16 general tasks is to their job.

Following Spitz-Oener (2006) I group the 16 tasks into three categories: abstract, routine or manual. Appendix B.1 shows the mapping of tasks. By standardising this variable, I obtain an estimate of the average time spent on each of the three task types for 301 occupational groups included in the dataset.

Since both SIAB and QCS are based on same occupation identifiers, I am able to merge the QCS occupation-level averages of task intensities into the SIAB dataset along the occupation dimension. As there might be concerns about individual level variation in task intensities across locations, I run a regression of factors determining the task intensities across individuals in the QCS dataset. The estimates are shown in Appendix B.2. As can be seen, there are significant differences in task intensities across city size, but this significance disappears once we control for occupation (except for some estimates on the manual task). This finding would suggest that most of the heterogeneity in task intensity across locations can be explained by sorting of occupations across cities. As such, averaging task intensities by occupation should not lead to any biases from this perspective.

The resulting panel dataset includes full employment histories, with data on wage growth, occupational and sociodemographic changes, as well as occupation-specific data on task intensities.

4.3 Sample selection

I limit the sample to the period between 1995-2014. This is done for two reasons. Firstly, following

unification, data for the entire Berlin district (which is partially included under East Germany) only

became credible in 1995. Secondly, as I am using the data on task intensities from the 2006 QCS

wave, this represents a neat mid-point in the period studied. Furthermore I restrict the sample

under analysis to West Germany only. Due to obvious historical reasons, there are considerable

differences in wages between East and West Germany that persist even to this day. Average wages

in East Germany are significantly lower than those in the West, even after controlling for city size

and observable worker characteristics, as Figure 4 in Appendix B.5 makes evident. For robustness

of exercise and comparability to studies on other countries, I restrict my analysis to West Germany

only. Finally, I limit the sample to full-time working, males of Germany nationality between the

(20)

An apparent particularity of Germany is that its capital and second city by density, Berlin, is estimated to have one of the lowest fixed-effects in the sample. The reason is evidently historical. ¹⁶ Despite it being an outlier, I keep Berlin in the sample for the entire analysis as at the very worst it could be introducing a downward bias on the relationship between earnings premia and city size, and so goes against the mechanisms I wish to estimate.

The daily wages, obtained directly from the SIAB dataset, are calculated based on the fixed- period wages reported by the employer and the duration of the (unsplit) original notification period in calendar days. These wages are converted into real daily wages using annual consumer-price indices for West Germany. Additionally, I drop observations where real daily wage is below 10EUR, in 2010 real terms, as such observations are assumed unreasonable as described in B¨uttner and R¨assler (2008). In what follows, when I refer to wages, it is implied that they are in logs and corrected for inflation.

Finally, wages are top-censored at the censoring level which differs by year. I compute the wages above the censoring threshold using the same method as Card, Heining, and Kline (2013), with the exception that I treat districts as they treat firms. Therefore, I run a series of 800 tobit imputations for 4 age groups (20-29, 30-39, 40-49, 50-65), 10 aggregate occupation groups, and 20 years, separately. In each tobit estimation, the observed censored log wage is regressed on a constant, age, education, and an indicator variable for each month. Furthermore, I include the individual’s average log wages (excluding the current period) and the fraction of top-censored wage observations over her career (excluding current censoring status). Instead of including firm mean log-wages as in Card, Heining and Kline (2013), I include instead annual mean log-wages in the district as well as the log density of the district population (where density is calculated as population over area). ¹⁷ Given the predicted values X ⁰ β from the tobit regressions, as well as the estimated standard deviation σ, I proceed to impute the censored wages w ^s as follows: w ^s = X ⁰ β +σΦ ⁻¹ [κ+u(1−κ)], where u ∼ U [0, 1] and κ = Φ[(s−X ⁰ β)/σ] and s is the official censoring limit. Censored wages are then replaced by these imputed values, while uncensored wages are

16

The Economist covers this phenomenon in a recent article, calling out Berlin for being “poor but sexy.” In the article entitled “Why is Berlin so dysfunctional? - Unlike other capitals, Germany’s is a drain on the rest of the country,” The Economist goes on to explain that this situation was not always the case. Before the second world war, Berlin was a prominent industrial hub of the country. However, following the division, Berlin was no longer an attractive business environment which led most of the firms to relocate their factories to West Germany. Consequently, even after the wall fell, most of the firms, now well established in West Germany, had little incentive to move back.

Instead, Berlin attracted artists and bohemians that flocked into abandoned factories and warehouses, turning Berlin into an art hub of Europe. As such, Berlin is unique among major European capitals to make the country poorer on average. Without Berlin, Germany’s GDP per person would be 0.2% higher.

17

I also include a dummy whenever the individual is observed only once in the entire sample, in which case the

mean life-time wage and fraction of censored observations take the value of the sample mean.

(21)

kept as in the original data source. Appendix B.3 lists selected percentiles of the distribution of simulated earnings. ¹⁸

Due to considerable amount of missing values for education level, I correct these by grouping education into three main categories, as in Fitzernberger, Osikominu, and V¨olter (2006). The corrected educational categories can take the values: low (without postsecondary education), medium (apprenticeship or Abitur) and high (university degree). I then drop all the observations for which the corrected educational category is missing. This results in substantial attrition, taking the sample from 48,098,327 to 41,703,558 observations.

For certain estimation steps, it is convenient to work with aggregated occupation groups. For this purpose, I create a new variable which aggregates the 3-digit occupations into nine professional groups following the classification in B¨ohm, von Gaudecker and Schran (2017). The resulting variable, which I refer to as a “profession,” to differentiate it from the 3-digit occupation coding, can take the value of any of the following nine categories: Managers, Professionals, Technicians, Craftspeople, Sale personnel, Office workers, Production workers, Operators/labourers, and Service personnel.

Finally, I drop all observations for which the 3-digit occupation is not available. This is the case for 655,870 observations. I also drop all occupations for which there is no information on task intensities in the QCS dataset, which leads to a loss of an additional 804,257 observations. The final sample, which covers the working histories of German men in all districts of West Germany in the period of 1995-2014 counts 40,243,431 observations. In parts of the analysis where I limit the sample to urban districts only for robustness checks, the sample decreases further to 16,497,568 observations.

4.4 Summary statistics and descriptive patterns

Table 1 provides some descriptive statistics of the dataset. For a clear overview, professions are ordered by increasing intensity of the abstract task, relative to non-abstract tasks. Production profession spends the least amount of time doing abstract tasks - 39% of the time, while Managers perform abstract tasks most frequently - 55% of the time. The opposite holds true for both routine and manual tasks. Several observations can be made just by looking at these descriptives, that go in support of my model’s features.

Firstly, as the model would predict, there is a strong correlation between the intensity of the abstract task and the level of education. The majority of the workers in top-abstract professions

18

Using equivalent extrapolation method on the Spanish social security data, De la Roca and Puga (2017) and De

la Roca (2017) show that extrapolated wages match well the observed wages for the years in which observations were

not capped.

(22)

Table 1: Descriptive statistics, by profession

Pr oduction

Operators Craftspeople Ser vice

Technician Sales Office w ork

ers

Pr ofessionals

Managers Ov erall

Task intensity

Abstract 39% 40% 42% 44% 48% 51% 52% 53% 55% 45%

Routine 32% 30% 29% 24% 27% 24% 23% 23% 22% 27%

Manual 29% 30% 29% 32% 25% 25% 25% 24% 23% 27%

Education level

Low 20% 24% 7% 12% 2% 3% 3% 1% 1% 10%

Medium 78% 73% 90% 72% 78% 77% 71% 32% 40% 69%

High 2% 3% 3% 17% 20% 20% 26% 67% 59% 21%

Average daily wage

(in real 2010 EUR) 98 87 97 94 140 125 129 167 185 113

Profession intensity by population density (pop/km

²

)

ln(dens) < 7.0 1.15 1.05 1.15 0.90 0.96 0.93 0.86 0.80 0.79

7.0 < ln(dens) < 7.5 0.90 0.98 0.84 1.02 1.13 1.09 1.07 1.18 1.05

7.5 < ln(dens) < 7.9 0.77 0.93 0.75 1.12 1.11 1.14 1.31 1.27 1.30

7.9 < ln(dens) 0.54 0.85 0.64 1.39 0.97 1.12 1.32 1.62 1.73

Number of obs.

(in millions)

9.80 4.55 6.16 3.03 2.51 2.74 3.42 6.25 1.78 40.24

Note: Table shows averages by profession group over the period 1995-2014 for all West German districts. Professions are ordered by abstract task intensity, ranging from lowest to highest. First part of the table, task intensity, shows the average time spent on each task by profession. Second part, gives the share of workers in each profession by level of education. Profession intensity is calculated as the share of workers of a given profession working in that location group, relative to the share of total population working in said location group (

R ^π(σ,c)

σπ(σ,c)dσ

.)

(23)

(Managers and Professionals) have the highest education degree, the majority of those in middle- abstract professions have a medium-level degree, while almost the entire work-force in the bottom three abstract professions (Production, Craftspeople and Operators) only have a low- or medium-education level. The gradient of education level is monotonically increasing across professions, along the dimension of the abstract task. This can be seen more clearly from Table 2. Here, individuals of each education group are ranked according to quartiles of the frequency at which they perform the abstract task. Individuals in lowest education are predominatnly in the lowest abstract task quartile, while the opposite holds true for the high education group. These observations are in line with Lemma 1.

Table 2: Abstract task intensity by education group Education

Low Medium High Abstract task quartile

1st 59% 25% 2%

2nd 25% 29% 3%

3rd 12% 28% 31%

4th 4% 18% 64%

Note: Table shows the repartition of workers by education level into occupations by quartiles of task intensity. Averages are calculated for all West German districts, over the period 1995-2014.

Secondly, wages are increasing in the intensity of abstract task, as can be observed from Table 1. Managers benefit from highest average daily wages, while Operators earn the least on average.

Bar few exceptions, wages are monotonically increasing between the two extremes of abstract task intensity.

Thirdly, professions more intensive in abstract tasks are over-represented in denser locations.

Table 1 displays the profession intensity by location density. ¹⁹ As purported by the model, densest locations are most intensive in high-abstract professions. Cities with log population density above 7.9 (equivalent to 3,000 individuals per squared kilometre), have overrepresented shares of workers in abstract professions such as Service, Sales, Professionals and Managers. Low-density locations, on the other hand, are intensive in non-abstract professions, such as Production and Crafts. For a better geographical overview, I show the geographical repartition of average task intensities by

19

Profession intensity is defined as the share of a profession in a given location, relative to total population share in

that location (

^R^π(σ,c)_π(σ,c)

.)

(24)

district in Appendix B.4. Here, districts are ranked according to the average time spent on each task by their working population. As can be observed from the maps, denser districts (represented by smaller surface area) tend to specialise in abstract tasks, while the more spread-out districts (larger surface area) specialise predominantly in manual tasks.

5 Empirical Results

Before turning to the results of the baseline estimations, I first run some descriptive regressions to show comparability to previous studies and to highlight the role occupations play in explaining the heterogeneity in city size earning premia. An impatient reader can skip directly to sections 5.2 and 5.3 for main results.

5.1 Occupations and urban wage premia: Descriptive regressions

For illustration purposes, I first run a simple pooled ordinary least squares regression to estimate the average static earning premium per location:

ln(w _ict ) = µ _c + µ _t + X _it β + ξ _it , (25) omitting all other variables from (22). Time-varying individual observables such as experience, tenure and sector indicators are included, but not occupation indicators. Column 1 of Table 3 reports the results. All coefficients are of the expected sign.

Next, as in de la Roca and Puga (2017), I regress the estimates of city indicators from Column 1 on log city size. Results of this regression are shown in Column 2. The elasticity of the earnings premium with respect to city size is estimated to be 0.0373. When I restrict the sample to urban districts only, the estimated elasticity increases to 0.065, as shown in Table 10 in the Appendix B.6. Both of these estimates are within the range of previous results in comparable regressions.

Combes, Duranton, Gobillon, and Roux (2010) estimate an elasticity of 0.051 for France, Glaeser and Resseger (2010) find an elasticity of 0.041 for United States, and De la Roca and Puga (2017) obtain an estimate of 0.046 for Spain.

Appendix B.5 show the plots of these estimated city fixed-effects in Column 1 against log city size. In this Appendix, Figure 5 shows the estimated fixed-effects of all districts in the sample, while Figure 6 shows the equivalent estimates when the sample is restricted to urban districts only. There is a strong and positive linear relationship between city size and earnings premia, even after controlling for all observables other than the occupation, and this relationship remains robust regardless of restrictions on district type.

Next, I consider the role of sorting of occupations across locations in explaining city size wage

premia. In column 3, I re-estimate (25) while in addition allowing for 331 occupation indicators.

(25)

Table 3: Descriptive regression: estimation of city-size wage premia

(1) (2) (3) (4) (5) (6)

Dependent variable: Log City indicator Log City indicator Log City indicator earnings coefficients earnings coefficients earnings coefficients

column (1) column (2) column (3)

Log city density 0.0373* 0.0290* 0.0142***

(0.0027) (0.0023) (0.0016)

City indicators Yes Yes Yes

Occupation indicators Yes Yes

Worker fixed-effects Yes

Experience 0.030* 0.025* 0.036***

(0.0000) (0.0000) (0.0000)

Experience

²

-0.001* -0.001* -0.001***

(0.0000) (0.0000) (0.0000)

Medium education 0.196* 0.101*

(0.0002) (0.0002)

High education 0.560* 0.281*

(0.0002) (0.0002)

Observations 41,044,262 326 41,044,262 326 41,044,262 326

Adj. R

²

0.47 0.37 0.57 0.33 0.20 0.20

Notes: All specifications include a constant term. Columns (1), (3), and (5) include month and year indicators, two-digit

sector indicators, as well as a linear and a non-linear term for days of tenure. There are 326 city indicators and 331

occupation indicators. Coefficients are reported with robust standard errors in parenthesis, which are clustered by worker

in columns (1), (3) and (5). , , and indicate significance at the 1, 5, and 10 percent levels. The R

²

reported in

column (6) is within workers. Worker values of experience and tenure are calculated on the basis of actual months worked

and expressed in years.

(26)

The estimated elasticity of earnings premium with respect to city size decreases by around 20%

to 0.0290 (Column 4). Sorting of workers across locations by occupation profiles has a strong predictive power on wage premia of larger cities. Similar magnitude of the effect is estimated when restricting the sample to urban districts only (see Column 4 in Table 10 in Appendix B.6.).

Further allowing for worker-fixed effects (Column 5) decreases the estimate by another 50%.

This drop in earnings elasticity following the introduction of worker fixed-effects is comparable to the that in previous studies: Combes, Duranton, Gobillon, and Roux (2010) find a drop of 35% for France, while de la Roca and Puga (2017) find a decline of 50% for Spain in similar exercises. This significant decrease in the earnings premium relative to city size after controlling for worker fixed- effects would suggests the relevance of sorting by workers across locations on unobserved ability, even within very granulated occupation categories. Combined, controlling for sorting by workers on occupations and unobserved ability explains some 60% of the original elasticity of earnings premia in Column 2. Overall, this would suggest that productivity differences across occupations are an important factor in explaining earning differences across space, though unobserved ability of workers within occupations continues to play a major role.

Finally, I explore how the wage premia of larger cities varies across skill- and profession-groups of workers. Starting with a simple pooled ordinary least squares estimation (as in Column 3 of previous example) I additionally allow for a term capturing skill-bias in agglomeration economies (δ s I ^sit × ln(dens c )). Column 1 in Table (4) shows results. In line with previous empirical studies, I find that productivity gains of bigger cities increase in individual’s skill level. A worker with university degree earns 35% more in the densest city (Munich) than an observationally equivalent worker in the least dense district. For a worker with vocational training only, this premium drops to 28%, and for someone with a high-school diploma only, this premium is a mere 18%.

In Column 2, I additionally allow for city size earnings premia to vary with 81 occupation codes. ²⁰ Allowing for heterogeneity in earnings premia across types of occupations decreases the estimates of skill-bias in agglomeration economies by roughly one half. These estimates of occupation specific city size premia depend positively on the abstract task intensity of the occupation, as my model would predict. In Column 3, I regress the estimated coefficients of occupation-specific agglomeration premia on the average occupational intensity in abstract task.

The coefficient is positive (0.006) and highly significant. If instead we allow the density earning premium to vary directly with abstract task intensity, as I do in Column 4, we obtain comparable results. Earnings premia of bigger cities increases in the complexity of work performed, and this heterogeneity in task-specific urban premia explains around one half of the observed skill-bias of agglomeration economies in a pooled ordinary least squares estimation.

20

When all 331 occupation indicators are interacted with log city density, majority of estimates are not significant.

(27)

Table 4: Heterogeneity in city size earnings premia

(1) (2) (3) (4) (5)

Dependent variable: Log Log Occupation Log Log

earnings earnings x pop-density earnings earnings coefficients

column (2)

City indicators Yes Yes Yes Yes

Occupation indicators Yes Yes Yes Yes

Worker fixed-effects Yes

Medium education 0.038* 0.075* 0.060***

(0.0008) (0.0008) (0.0008)

High education 0.168* 0.212* 0.237***

(0.0010) (0.0012) (0.0010)

Medium education x log pop-density 0.010* 0.004* 0.006* 0.006*

(0.0001) (0.0001) (0.0001) (0.0002)

High education x log pop-density 0.017* 0.010* 0.007* 0.012*

(0.0001) (0.0000) (0.0002) (0.0003)

Abstract task x log pop-density 0.007***

(0.0000) 81 occupation indicators x log pop-density Yes

Abstract task 0.006***

(0.0003)

Observations 41,044,262 41,044,262 81 40,240,350 40,240,350

R2 0.57 0.57 0.06 0.57 0.22

Notes: All specifications include a constant term. Columns (1)-(2) and (4)-(5) also include month and year indicators,

two-digit sector indicators, as well as non-linear terms for days of experience and tenure. There are 326 city indicators

and 331 occupation indicators, unless otherwise indicated. Task intensities are standardised by their respective standard

errors. Coefficients are reported with standard errors in parenthesis. , , and indicate significance at the 1, 5, and

10 percent levels. The R

²

reported in column (3) is within workers.

T ASKS , CITIESANDURBANWAGEPREMIA

T ASKS , CITIES AND URBAN WAGE PREMIA ∗

Anja Grujovic †

Click here for latest version

September 6, 2019

Abstract

Centro de Estudios Monetarios y Financieros ( CEMFI ), Casado del Alisal 5, 28014 Madrid, Spain (e-mail:

1 Introduction

The coefficients on the latter term represent occupation-specific urban wage premia that remain significant even after controlling for any skill price differences across locations of different size.

Frictions to mobility imply imperfect sorting of workers across locations.

For estimates see Glaeser, 2011, for the United States, Combes, Duranton and Gobillon, 2008, for France, and de

la Roca and Puga, 2017, for Spain. Some of their results are discussed later in the text.

Figure 1: Average wages and population density

Notes: Average daily wages for West German urban districts over the period 1995-2014, in real 2010 EUR. Sample includes only male workers of Germany nationality between the ages of 20 and 65. Population density is calculated as 2011 population over the surface area of the administrative district.

Using high-quality administrative data for Germany for the period 1995-2014, I find evidence

of considerable heterogeneity in city size earning premia across tasks. I find that estimates of task-

specific urban wage premia are significant and robust to controlling for skill-level of workers, with

largest coefficients on tasks usually considered as more complex. One standard deviation increase

in abstract task intensity is associated with a 0.3 percentage point (or 21 percent for the average

worker) increase in the elasticity of earnings with respect to population size. These results suggest

that variations in job requirements are an important determinant in understanding wage inequalities

Figure 2: Heterogeneity in urban wage premia by profession, after controlling for city-size differences in returns to education

Estimation is based on a sample of full-time working males in Western Germany over the period 1995-2014.

across space within comparable skill groups.

This paper contributes to the recent theoretical literature studying the distribution of skills and

For a recent literature review see Duraton and Puga (2004) and Behrens and Robert-Nicoud (2015).

See Autor and Handel (2013) for a recent literature review of the task-based approach.

Rosenthal and Strange (2004) provide a review of the literature.

2 Theoretical framework

2.0.1 Production

Individuals consume a freely traded final good, whose price is set as the numeraire. Producing the final good requires a continuum of occupations indexed by σ ∈ Σ ≡ [σ, σ]. Assuming a CES production function, the output of the final good is given by

Q = Z

σ∈Σ

D(σ)Q(σ)

dσ

, (1)

where Q(σ) ≥ 0 is the quantity of output of occupation σ, which is freely traded, 0 < < ∞ is the elasticity of substitution between occupations, and D(σ) is an exogenous demand shifter. Profits by final goods producers are given by

Π = Q − Z

σ∈Σ

p(σ)Q(σ)dσ. (2)

Producing occupational output requires only labour that is supplied by a mass of L heterogeneous individuals characterised by their ability α. I denote by f (α) ≥ 0 the density of workers of skill α and by A ≡ [α, α] the set of abilities available in the economy.

The output q of a given occupation σ depends on the intensity at which it uses each tasks, on the ability α of the worker completing the tasks, and on the location c in which it is being produced:

q(σ, c; α) = e α+ R

Z(τ,c)H (τ,σ)dτ , (3)

where Z (τ, c) is the efficiency of task τ production in city c per unit of time, and H(τ, σ) is the

share of time an individual in occupation σ is required to spend on task τ. This share of time,

or ‘task intensity’, is exogenous, occupation-specific and defined such that H(τ, σ) ∈ [0, 1] and R

τ∈τ H(τ, σ )dτ = 1 for all σ.

q(σ, c; α) = e α+A(σ,c) , (4)

where A(σ, c) = R

τ∈T Z (τ, c)H(τ, σ)dτ is the relative occupation-specific productivity across locations. By the assumptions on Z (τ, c) and H(τ, σ) specified above, A(σ, c) is twice- differentiable, strictly supermodular in σ and c, and strictly increasing in c.

Several mechanisms can give rise to this effect. Marshall (1920) identifies three key sources of agglomeration

economies: labour market pooling, input sharing, and knowledge spillovers. My approach is most closely linked with

learning spillovers. One can imagine a setting in which task-specific knowledge is ‘spilt’ or exchanged voluntarily

between workers. Such learning spillovers of task-specific human capital would give rise to agglomeration economies

Finally, I define wages. Each individual inelastically supplies one unit of labour, so her income is her physical productivity times the price of the occupational output produced

w(σ, c; α) = p(σ)q(σ, c; α). (5)

Labour market and preferences

B(σ; α) ≡ Z

τ∈T

b(τ, α)H(τ, σ)dτ (6)

Note that this cost B(σ; α) can also be seen as an investment into education, which is generally

Thus, I assume that the log utility function in location c for an individual of ability α who has chosen occupation σ is:

Given this set-up, the optimisation framework can be specified as follows. First, individuals choose an occupation to maximise their expected utility

max σ E c [U (σ, c; α)] − B (σ; α). (8) Next, they observe their preference shock, and choose a location to maximise their utility

max c U (c; α, σ(α)). (9)

Locations and housing market

There is an inelastic supply of housing, N, in each location, owned by absentee landlords. The housing markets are perfectly competitive, so that the market-clearing housing price in location c, consistent with assumption on preferences in (7), is equal to

r(c) = β w(c)L(c) ¯

N . (10)

where w(c) ¯ is the average wage in that location and local population is given by L(c). Note that L(c) also stands for population density, as housing supply is assumed constant across locations.

The i.i.d. preference shocks introduce mobility frictions, which will be relevant for obtaining imperfect sorting of

Definition of a competitive equilibrium

In a competitive equilibrium, individuals maximise their utility, final-good producers and landowners maximise profits, and markets clear.

To qualify the equilibrium, I denote by g(σ, α) the share of α-individuals choosing optimally occupation σ, and by π(σ, α) the share of α-individuals choosing location c. Then, the resulting distribution of individuals across occupations and location is such that

g(σ, α) > 0 ⇐⇒ {σ} ∈ arg max E c [U (σ, c; α)] − B(σ; α) (11) and

π(c, α) > 0 ⇐⇒ {c} ∈ arg max U(c; α, σ(α)). (12) Profit maximisation by final-good producers yields demands for occupational output

Q(σ) = I

p(σ) D(σ)

−

T ^ASKS , CITIES AND URBAN WAGE PREMIA ^∗

Anja Grujovic ^†

q(σ, c; α) = e ^α+ ^R

^Z(τ,c)H ^(τ,σ)dτ , (3)

q(σ, c; α) = e ^α+A(σ,c) , (4)

max σ E ^c [U (σ, c; α)] − B (σ; α). (8) Next, they observe their preference shock, and choose a location to maximise their utility

A competitive equilibrium is a set of functions Q : Σ → R ⁺ , g : Σ × A → R ⁺ , π : C × A , → R ⁺ , r : C → R ⁺ and p : Σ → R ⁺ , such that conditions (10) to (16) hold.

c ln p(σ) + α + A(σ, c) − βr(c) + η _α,c = arg max