Topics in Applied Econometrics : Panel Data

Ch 3. Discrete Choice Panel Data Models

Pr. Philippe Polomé, Université Lumière Lyon 2

M2 Equade & M2 GAEXA

2015 – 2016

Outline

Introduction to Binary Choices

RE Model

FE Model

Random Parameters (“Mixed”) Multinomial Model

Introduction to Binary Choices

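Model

- Underlying latent model (usual notation)

  $$y_{it}^* = x_{it}'\beta + \alpha_i + \epsilon_{it} \qquad (1)$$

- Assume $\epsilon_{it}$ i.i.d., independent of the $x$'s
  - and with a symmetric distribution function $F(\cdot)$
- $y_{it}^*$ is not observed
  - $y_{it} = 1$ if $y_{it}^* = x_{it}'\beta + \alpha_i + \epsilon_{it} > 0$
  - else $y_{it} = 0$
- Estimation is by Maximum Likelihood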

Incidental Parameters $\alpha$

- Log-likelihood function

  $$\ln L(\beta, \alpha_1, \ldots, \alpha_N) = \sum_{i,t} y_{it} \ln F(\alpha_i + x_{it}'\beta) + \sum_{i,t} (1 - y_{it}) \ln\left(1 - F(\alpha_i + x_{it}'\beta)\right)$$

- For fixed $T$ and $N \to \infty$, the estimators are inconsistent because the number of parameters $\to \infty$
  - Incidental parameter problem: the inconsistency of $\hat\alpha$ carries over to the estimator of $\beta$
  - The problem was also there with linear models, but there we could eliminate the $\alpha_i$ by differencing
  - Can we carry that over to a non-linear model?
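As a reminder of what differencing does in the linear case, and why it does not carry over directly, here is a short sketch in the notation of equation (1). The linear model written here is the standard one and is assumed rather than taken from this chapter.

```latex
% Linear panel model: first differences remove \alpha_i
y_{it} = x_{it}'\beta + \alpha_i + \epsilon_{it}
\;\Rightarrow\;
y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\epsilon_{it} - \epsilon_{i,t-1})

% Binary choice model: \alpha_i sits inside the nonlinear F, so it does not cancel
\Pr(y_{it} = 1) - \Pr(y_{i,t-1} = 1) = F(\alpha_i + x_{it}'\beta) - F(\alpha_i + x_{i,t-1}'\beta)
```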

RE Model

RE probability

- Assume
  - Zero correlation between $\alpha_i$ and $x_{it}$
  - $\alpha_i \sim$ iid $(0, \sigma_\alpha^2)$
  - Also independence across $i$, and $\epsilon_{it}$ iid
- The (conditional) probability of observing a certain outcome for $i$ is $f(y_{i1}, \ldots, y_{iT} \mid x_{i1}, \ldots, x_{iT}, \beta, \alpha_i)$
  - This is the whole sequence for $i$, over all periods $t = 1, \ldots, T$
- Assume independence between the $\epsilon_{it}$ within each individual, then

  $$f(y_{i1}, \ldots, y_{iT} \mid x_{i1}, \ldots, x_{iT}, \beta, \alpha_i) = \prod_t F(y_{it} \mid x_{it}, \beta, \alpha_i)$$

Maximum Likelihood

- If $\alpha_i$ were known, say equal to $c_i$
  - then we could estimate the parameters $\beta$ of that distribution from the classical binary choice likelihood

  $$\prod_{i=1}^{N} \prod_{t=1}^{T} \left[F(x_{it}'\beta + c_i)\right]^{y_{it}} \left[1 - F(x_{it}'\beta + c_i)\right]^{1 - y_{it}}$$

Conditional Maximum Likelihood

- But $c_i$ is not known and we cannot estimate $\alpha_i$ consistently
  - so instead we want to get rid of it
  - The way of getting rid of $\alpha_i$ is to integrate it out
  - That is, take the expectation over its density $g(\alpha_i)$
- The resulting likelihood (for one $i$) is called Conditional Maximum Likelihood

  $$f(y_{i1}, \ldots, y_{iT} \mid x_{i1}, \ldots, x_{iT}, \beta) = \int_{-\infty}^{+\infty} \left[\prod_t F(y_{it} \mid x_{it}, \beta, \alpha_i)\right] g(\alpha_i)\, d\alpha_i$$

- For the sample, the log-likelihood function of the binary choice model is the sum over $i$ of the log of this Conditional ML contribution (the product over $t$ stays inside the integral)
  - Although it is now unconditional on $\alpha_i$

Integration

- The integral usually has no closed form
  - It cannot be solved analytically
  - We use numerical integration techniques
- Gauss-Hermite quadrature is very common
  - It is essentially a rule for choosing where to place the “rectangles” and how much weight to give each
- Later, we will see simulation techniques
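A minimal sketch of Gauss-Hermite quadrature for one individual's RE logit probability, assuming a scalar regressor, a logistic $F$ and $\alpha_i \sim N(0, \sigma_\alpha^2)$. It uses the 3-point rule only because its nodes and weights have simple closed forms; real software uses more points. The function name `re_logit_prob` and the data are hypothetical.

```python
import math
from itertools import product

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# 3-point Gauss-Hermite rule for integrals of the form: integral of exp(-v^2) f(v) dv
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)

def re_logit_prob(y, x, beta, sigma_a):
    """Approximate Pr(y_1..y_T | x) with alpha ~ N(0, sigma_a^2) integrated out."""
    total = 0.0
    for v, w in zip(GH_NODES, GH_WEIGHTS):
        alpha = math.sqrt(2.0) * sigma_a * v  # change of variable alpha = sqrt(2) * sigma * v
        seq = 1.0
        for y_t, x_t in zip(y, x):
            p = logistic(alpha + x_t * beta)
            seq *= p if y_t == 1 else 1.0 - p
        total += w * seq
    return total / math.sqrt(math.pi)

# Hypothetical individual observed over T = 3 periods
x_i = [0.5, -1.0, 2.0]
probs = {y: re_logit_prob(y, x_i, beta=0.8, sigma_a=1.0)
         for y in product((0, 1), repeat=3)}
print(sum(probs.values()))  # the 8 possible sequences should account for all the probability
```

Because the quadrature weights sum to $\sqrt{\pi}$, the approximated probabilities of all possible sequences still add up to one, whatever the number of points.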

Integration in Stata

- In principle quadrature works with any distribution for $\alpha$ and $\epsilon$
  - In Stata, both the xtlogit and xtprobit commands use the normal distribution for $\alpha$
  - If $\alpha$ is in fact distributed differently ...
  - The choice in Stata is on the distribution for $\epsilon$: Logit or Probit
  - In the RE model, that may not have much importance
- The number of points of integration
  - i.e. the number of rectangles
  - can be adjusted
  - It should not be too small
  - Increase the number of these points until the results are stable
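The “increase until stable” advice can be sketched outside Stata. The code below is not what xtlogit does internally: it is an illustration with a plain midpoint (“rectangle”) rule, a logistic $F$ and a normal $\alpha$, doubling the number of rectangles until the integral stops moving. All names and numbers are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def seq_prob_given_alpha(y, x, beta, alpha):
    """Probability of the whole sequence y for a given alpha (scalar regressor)."""
    p = 1.0
    for y_t, x_t in zip(y, x):
        q = logistic(alpha + x_t * beta)
        p *= q if y_t == 1 else 1.0 - q
    return p

def rectangle_integral(y, x, beta, sigma_a, n_points, width=8.0):
    """Midpoint rule over alpha in [-width*sigma_a, +width*sigma_a]."""
    lo = -width * sigma_a
    h = 2.0 * width * sigma_a / n_points
    total = 0.0
    for k in range(n_points):
        a = lo + (k + 0.5) * h
        density = math.exp(-0.5 * (a / sigma_a) ** 2) / (sigma_a * math.sqrt(2.0 * math.pi))
        total += seq_prob_given_alpha(y, x, beta, a) * density * h
    return total

def integrate_until_stable(y, x, beta, sigma_a, start=4, tol=1e-8, max_points=4096):
    """Double the number of rectangles until two successive results agree within tol."""
    n, previous = start, rectangle_integral(y, x, beta, sigma_a, start)
    while n < max_points:
        n *= 2
        current = rectangle_integral(y, x, beta, sigma_a, n)
        if abs(current - previous) < tol:
            return current, n
        previous = current
    return previous, n

value, n_used = integrate_until_stable([1, 0, 1], [0.5, -1.0, 2.0], beta=0.8, sigma_a=1.0)
print(n_used, value)
```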

Serial Correlation

- When independence between the $\epsilon_{it}$ cannot be assumed within each individual, then
  - We cannot write $f(y_{i1}, \ldots, y_{iT} \mid x_{i1}, \ldots, x_{iT}, \beta, \alpha_i)$ as $\prod_t F(y_{it} \mid x_{it}, \beta, \alpha_i)$ in the Conditional Likelihood Function
- That does not invalidate the conditional approach, but
  - There is now a multiple integral, to integrate over all the periods jointly
  - Above 3 such integrals (4 periods and more), quadratures do not work nicely
- Instead, use simulation-based methods
  - They are not implemented in Stata
  - Stata does not appear to allow correlation between the $\epsilon_{it}$
  - Possibly, use R instead
  - The mlogit package below does not do that

FE Model

Logit Probability

- Fixed effects estimation is possible for the panel logit model, using the conditional MLE
  - But not for other binary panel models such as panel probit
  - Strict exogeneity (of the $x$) must hold
  - Independence of the $\epsilon_{it}$ across $t$ must hold
  - And of course across $i$ as usual
- For the logit model, it is possible to show that the (conditional) probability of observing a certain outcome for $i$ (over all periods $t$) is

  $$f(y_{i1}, \ldots, y_{iT} \mid x_{i1}, \ldots, x_{iT}, \beta, \alpha_i) = \frac{\exp\left(\alpha_i \sum_t y_{it}\right) \exp\left(\left(\sum_t y_{it} x_{it}'\right)\beta\right)}{\prod_t \left[1 + \exp(\alpha_i + x_{it}'\beta)\right]}$$

Unchanging Behavior

- If $\sum_t y_{it} = 0$ or $= T$, then substituting in the above conditional probability shows that
  - Changes in $x_{it}$ from one period to another cannot explain choices for $y$
  - since such choices do not change over $t$
  - $\alpha_i$, being time-invariant, is sufficient to explain the choice for either $y_{it} = 0\ \forall t$ or $y_{it} = 1\ \forall t$
- Such $i$ can be dropped from the likelihood
  - as they add no information on $\beta$
  - Stata does it automatically
- So, only $i$ that change status/behavior at least once are relevant for estimating $\beta$

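Changing Behavior: Take $T = 2$ and $\sum_t y_{it} = 1$

- Then only the sequences $\{0, 1\}$ or $\{1, 0\}$ are possible:

  $$\Pr\left\{(0,1) \,\middle|\, \sum_t y_{it} = 1, \alpha_i, \beta\right\} = \frac{\Pr\{(0,1) \mid \alpha_i, \beta\}}{\Pr\{(0,1) \mid \alpha_i, \beta\} + \Pr\{(1,0) \mid \alpha_i, \beta\}}$$

- $\Pr\{(0,1) \mid \alpha_i, \beta\} = \Pr\{y_{i1} = 0 \mid \alpha_i, \beta\}\, \Pr\{y_{i2} = 1 \mid \alpha_i, \beta\}$
- With the logistic:
  - $\Pr\{y_{it} = 1 \mid \alpha_i, \beta\} = \dfrac{\exp(\alpha_i + x_{it}'\beta)}{1 + \exp(\alpha_i + x_{it}'\beta)}$
  - $\Pr\{y_{it} = 0 \mid \alpha_i, \beta\} = \dfrac{1}{1 + \exp(\alpha_i + x_{it}'\beta)}$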

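Changing Behavior: Take $T = 2$ and $\sum_t y_{it} = 1$

- So,

  $$\Pr\{(0,1) \mid \alpha_i, \beta\} = \frac{1}{1 + \exp(\alpha_i + x_{i1}'\beta)} \cdot \frac{\exp(\alpha_i + x_{i2}'\beta)}{1 + \exp(\alpha_i + x_{i2}'\beta)}$$

- The two denominators are common to the probabilities of $(0,1)$ and $(1,0)$ and cancel in the ratio, so

  $$\Pr\left\{(0,1) \,\middle|\, \sum_t y_{it} = 1, \alpha_i, \beta\right\} = \frac{\exp(\alpha_i + x_{i2}'\beta)}{\exp(\alpha_i + x_{i2}'\beta) + \exp(\alpha_i + x_{i1}'\beta)} = \frac{\exp(x_{i2}'\beta)}{\exp(x_{i2}'\beta) + \exp(x_{i1}'\beta)} = \frac{\exp\left((x_{i2} - x_{i1})'\beta\right)}{1 + \exp\left((x_{i2} - x_{i1})'\beta\right)}$$

- Thus $\alpha_i$ drops out of the model
  - Much like in a first-difference linear model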
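A quick numerical check of this result, assuming a scalar regressor and arbitrary illustrative values: the conditional probability is computed from the two sequence probabilities for several values of $\alpha_i$ and compared with the closed form in $(x_{i2} - x_{i1})\beta$.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_01_given_switch(alpha, x1, x2, beta):
    """Pr{(0,1) | y_i1 + y_i2 = 1, alpha, beta}, built from the two sequence probabilities."""
    p01 = (1.0 - logistic(alpha + x1 * beta)) * logistic(alpha + x2 * beta)
    p10 = logistic(alpha + x1 * beta) * (1.0 - logistic(alpha + x2 * beta))
    return p01 / (p01 + p10)

x1, x2, beta = 0.3, 1.7, 0.9                      # hypothetical values
closed_form = logistic((x2 - x1) * beta)          # expression without alpha
values = [prob_01_given_switch(a, x1, x2, beta) for a in (-3.0, 0.0, 2.5)]
print(values, closed_form)                        # same number whatever alpha is
```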

Estimation

- This means that we can estimate the FE logit model for $T = 2$ using a standard logit
  - with $x_{i2} - x_{i1}$ (a first difference!) as explanatory variables and
  - the change in $y_{it}$ as the endogenous event (1 for a positive change, 0 for a negative one)
- For the case with larger $T$ it is more cumbersome to derive all the necessary conditional probabilities
  - but in principle it is a straightforward extension of the above case
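A sketch of this $T = 2$ recipe on simulated data, under assumptions of my own choosing: one regressor, true $\beta = 1$, and an $\alpha_i$ that is correlated with $x_{it}$ (the case where RE would fail). The logit is fitted by a hand-written Newton-Raphson without a constant, since $\alpha_i$ has dropped out; a statistical package would normally be used instead.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_switchers(n, beta, rng):
    """Return (x_i2 - x_i1, d_i) for individuals whose y changes; d_i = 1 for the sequence (0,1)."""
    rows = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        x = [alpha + rng.gauss(0.0, 1.0) for _ in range(2)]   # x is correlated with alpha
        y = [1 if rng.random() < logistic(alpha + x_t * beta) else 0 for x_t in x]
        if y[0] != y[1]:                                      # non-switchers carry no information
            rows.append((x[1] - x[0], 1 if y[1] == 1 else 0))
    return rows

def logit_newton(rows, iterations=25):
    """Standard logit of d on dx, no intercept, scalar coefficient."""
    b = 0.0
    for _ in range(iterations):
        score = sum((d - logistic(dx * b)) * dx for dx, d in rows)
        hessian = -sum(logistic(dx * b) * (1.0 - logistic(dx * b)) * dx * dx for dx, _ in rows)
        b -= score / hessian
    return b

rng = random.Random(12345)
rows = simulate_switchers(20000, beta=1.0, rng=rng)
beta_hat = logit_newton(rows)
print(len(rows), beta_hat)   # beta_hat should be close to the true value of 1
```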

Other FE Binary Panel Issues

- A similar transformation exists for a dynamic panel binary model
  - Provided there are at least four time periods
  - It imposes conditions on the $y_{it}$ time series that may be difficult to test
  - See Cameron & Trivedi 23.4.4
  - Not implemented in Stata
- The elimination of the individual effect $\alpha_i$
  - Changes the interpretation
  - e.g. a one-unit difference in $x_{it}$ versus $x_{i,t-1}$ induces a change in the probability of the sequence $\{y_{it}, y_{i,t-1}\}$
  - compared to a certain probability if $x_{it} = x_{i,t-1}$

To sum up

- If it can be assumed that $\alpha_i$ is independent of $x_{it}$
  - RE ML estimator
  - Probit or Logit, does not matter
- Otherwise, we only have the FE Logit estimator
  - With a first-difference interpretation
  - only $i$ that change status/behavior at least once are relevant for estimating $\beta$
- These approaches rely on independence of the $\epsilon_{it}$
  - Not of the $y_{it}$
  - It is essential that the $\alpha_i$ and $x_{it}$ “filter out” any correlation in the $y_{it}$
  - In general, any remaining correlation will cause inconsistency
- Packages
  - Stata: xtlogit or menu stat → Panel → Binary outcomes
  - R: mlogit

Example: Unionization of Women in the US

- Setup: webuse union
  - Loads the dataset from the Stata website: >4000 women observed 1 to 12 times
- Random-effects logit model (the xtlogit default)
  - xtlogit union age grade not_smsa south##c.year
  - The latter (##) is so that each variable and their product are in the regression
  - south is a dichotomous variable
  - south##year induces one dummy for each year and for (south=1 and year=t): long
  - south##c.year treats year as continuous
- Fixed-effects logit model
  - xtlogit union age grade not_smsa south##c.year, fe
  - 2744 groups (14165 obs) dropped because of all positive or all negative outcomes

Random Parameters (“Mixed”) Multinomial Model

Random Utility Models

- We are interested in individual discrete choices among $J$ exclusive alternatives, $j = 1, \ldots, J$
  - We assume that each alternative $j$ provides a certain utility to the individual
  - Who then compares the $J$ alternatives on that basis
  - e.g. transport mode choice
- The utility, and therefore the choice, is purely deterministic from the individual's point of view
  - It is random from the researcher's point of view
  - because some of the determinants of the utility are unobserved,
  - which implies that the choice can only be analyzed in terms of probabilities.

Random Utility in Cross-section

- $i$'s utility for alternative $j$ is

  $$U_{ij} = \alpha_j + \beta x_{ij} + \gamma_j z_i + \delta_j w_{ij} + \epsilon_{ij}$$

  where there may be
  - alternative specific variables $x_{ij}$
    - with a generic coefficient $\beta$
    - e.g. transport mode time to work
  - individual specific variables $z_i$
    - with alternative specific coefficients $\gamma_j$
    - e.g. age
  - alternative specific variables $w_{ij}$
    - with an alternative specific coefficient $\delta_j$
    - e.g. transport mode safety

Utility Differences

- Utility being ordinal, only utility differences are relevant to model the choice of one alternative
  - The difference between the utilities of two different alternatives $j$ and $k$ is

  $$U_{ij} - U_{ik} = (\alpha_j - \alpha_k) + \beta (x_{ij} - x_{ik}) + (\gamma_j - \gamma_k) z_i + (\delta_j w_{ij} - \delta_k w_{ik}) + (\epsilon_{ij} - \epsilon_{ik})$$

  so that the $z_i$ terms drop out when $\gamma_j = \gamma_k$
  - For $J$ alternatives, there are $J - 1$ such utility differences

Utility Differences

- Moreover, only differences of these coefficients are relevant and may be identified
  - For example, with three alternatives 1, 2 and 3,
  - the three coefficients associated with an individual specific variable cannot be identified,
  - but only two linear combinations of them.
  - Therefore, a choice of normalization is necessary
  - the simplest one is $\gamma_1 = 0$
- Coefficients for alternative specific variables may (or may not) be alternative-specific
  - For example, transport time is alternative specific
  - And individual-specific since it depends on location
  - So there could be a constant $\beta$
  - But maybe 10 min in public transport do not have the same value as 10 min in a car
  - So that would be a $\delta_j$

Conditional Probabilities

- If alternative $l$ is chosen,
  - Rewrite the utility differences as $U_l - U_j = V_l - V_j + \epsilon_l - \epsilon_j$, where $V_j = \alpha_j + \beta x_{ij} + \gamma_j z_i + \delta_j w_{ij}$
  - Possibly without $z_i$ in earlier models
- The general expression of the probability of choosing alternative $l$ is then:

  $$(P_l \mid \epsilon_l) = \Pr\{U_l > U_1, \ldots, U_l > U_J\} = F_{-l}(\epsilon_1 < V_l - V_1 + \epsilon_l, \ldots, \epsilon_J < V_l - V_J + \epsilon_l)$$

  where $F_{-l}$ is the multivariate distribution of the $J - 1$ errors other than $\epsilon_l$
  - $F_{-l}$ is a $J - 1$ dimensional integral that depends on the unobservable $\epsilon_l$

Unconditional Probabilities

- Since $\epsilon_l$ is unobserved,
  - We must first remove it from $P_l$
- This is done by taking the expectation

  $$P_l = \int (P_l \mid \epsilon_l) f_l(\epsilon_l)\, d\epsilon_l$$

  that is, integrating $F_{-l}$ over $\epsilon_l$
  - A $J$ dimensional integral in total

Log-Likelihood Function

- Write $y_{ij}$ equal to 1 if $i$ chose $j$, and zero otherwise
  - For any given choice situation, there are $J$ such variables
- If the $\epsilon_{ij}$ are independent across alternatives $j$,
  - the probability of the choice made by $i$ is $P_i = \prod_j P_{ij}^{y_{ij}}$, which collapses to $P_{il}$ for any particular choice $l$
  - that is the $P_l$ from above
  - with an added $i$ index in the context of a sample
  - in log: $\ln P_i = \sum_j y_{ij} \ln P_{ij}$
- Over a sample of independent observations:

  $$\ln L = \sum_i \ln P_i = \sum_i \sum_j y_{ij} \ln P_{ij}$$

- We seek to maximize this function over the set of parameters of the utility differences

Standard Multinomial Logit

- The standard multinomial logit model probability $P_{il}$ is

  $$P_{il} = \frac{\exp(\beta' x_{il})}{\sum_j \exp(\beta' x_{ij})}$$

- This is due to McFadden 1974,
  - Assuming a “Gumbel” distribution and iid for $\epsilon$
  - And non-random coefficients
  - Alternative-invariant regressors are impossible unless $\beta$ becomes $\beta_j$
  - That is the next model
- I assume you know this model
  - If not → Cameron & Trivedi, Chapter 15
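For readers who want to see the formula at work, here is a minimal sketch of the multinomial logit probabilities and the log-likelihood $\sum_i \sum_j y_{ij} \ln P_{ij}$. The data, the coefficient values and the function names are hypothetical. The last lines also illustrate that a regressor shifted by the same amount for every alternative leaves the probabilities unchanged, which is why alternative-invariant regressors need a $\beta_j$.

```python
import math

def mnl_probabilities(x_alternatives, beta):
    """P_il = exp(beta'x_il) / sum_j exp(beta'x_ij) for one choice situation."""
    v = [sum(b * x for b, x in zip(beta, x_j)) for x_j in x_alternatives]
    v_max = max(v)                      # subtracting the max is harmless: only differences matter
    expv = [math.exp(v_j - v_max) for v_j in v]
    denom = sum(expv)
    return [e / denom for e in expv]

def log_likelihood(sample, beta):
    """ln L = sum_i ln P_i, where P_i is the probability of the alternative i actually chose."""
    total = 0.0
    for x_alternatives, chosen in sample:
        total += math.log(mnl_probabilities(x_alternatives, beta)[chosen])
    return total

# Hypothetical sample: 3 alternatives described by (price, time), plus the index of the choice
sample = [
    ([(2.0, 30.0), (3.5, 20.0), (1.0, 45.0)], 1),
    ([(2.5, 25.0), (3.0, 35.0), (1.5, 40.0)], 0),
]
beta = (-0.4, -0.05)
probs = mnl_probabilities(sample[0][0], beta)
# Adding the same amount to the price of every alternative changes nothing
shifted = mnl_probabilities([(p + 10.0, t) for p, t in sample[0][0]], beta)
print(probs, shifted, log_likelihood(sample, beta))
```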

Random Parameter (“Mixed”) Multinomial Logit

- The mixed logit model is

  $$(P_{il} \mid \beta_i) = \frac{\exp(\beta_i' x_{il})}{\sum_j \exp(\beta_i' x_{ij})}$$

  the $\beta_i$ coefficients are treated as random
  - Just as the $\alpha_i$ in panel linear models
  - Therefore, the probability becomes conditional on the vector of random coefficients: $P_{il} \mid \beta_i$
- See Train 2003 for a complete presentation
  - The $\epsilon_{ij}$ are already integrated out as in the McFadden formula
  - Random parameters are one way to address the Independence of Irrelevant Alternatives issue

Unconditional Mixed Logit Probabilities

- As earlier, the conditional probability must be made unconditional by taking expectations
  - This time, over the $\beta_i$:

  $$P_{il} = E\left[(P_{il} \mid \beta_i)\right] = \int (P_{il} \mid \beta_i) f(\beta_i)\, d\beta_i$$

  where the integration
  - is multiple, over all the elements of $\beta$, with possible correlations
  - is over the support of $\beta$, usually $(-\infty, +\infty)$
  - implies that we assume a distribution $f$ for $\beta$
- The question is how to compute this integral

Panel Discrete Choice

- In a panel context, the iid hypothesis on the $y_{it}$ is untenable
  - Since successive observations of a single individual are likely correlated
  - But independence of the $\epsilon_{it}$ will be assumed as before
  - The $\alpha_i$ (individual-specific constant) also have to be integrated out
  - Which is only possible by assuming $E(\alpha_i) = \alpha\ \forall i$
  - But here, that is also the case for any slope coefficient $\beta_i$
  - So only in a RE model
- More specifically, we compute one probability for each $i$, and it is this probability that is included in the log-likelihood function

Panel Discrete Choice Probabilities

- For a given vector of coefficients $\beta_i$, the probability that alternative $l$ is chosen for the $t^{th}$ observation of $i$ is

  $$P_{itl} = \frac{\exp(\beta_i' x_{itl})}{\sum_j \exp(\beta_i' x_{itj})}$$

  - Across all alternatives, that is $P_{it} = \prod_l (P_{itl})^{y_{itl}}$
  - The joint probability for the $T$ observations of individual $i$ is

  $$P_i = \prod_t \prod_l P_{itl}^{y_{itl}}$$

  - In this formulation, the $\epsilon_{itj}$ are independent over time
  - But correlation in the behavior is modelled because the $\beta_i$ coefficients are constant over time
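The joint probability can be sketched as follows for a given $\beta_i$. The attributes, the coefficient values and the function names are hypothetical. Since $y_{itl}$ is 1 only for the chosen alternative, the double product reduces to a product over $t$ of the probabilities of the chosen alternatives.

```python
import math
from itertools import product

def choice_probs(beta_i, x_t):
    """Logit probabilities of all alternatives in one choice situation, given beta_i."""
    v = [sum(b * x for b, x in zip(beta_i, x_j)) for x_j in x_t]
    m = max(v)
    e = [math.exp(v_j - m) for v_j in v]
    s = sum(e)
    return [e_j / s for e_j in e]

def sequence_prob(beta_i, x_i, choices):
    """P_i = prod_t P_{it,l_t}, where l_t is the alternative chosen at t."""
    p = 1.0
    for x_t, l_t in zip(x_i, choices):
        p *= choice_probs(beta_i, x_t)[l_t]
    return p

# Hypothetical individual: T = 3 choice situations, 2 alternatives, attributes (price, time)
x_i = [
    [(2.0, 30.0), (3.0, 20.0)],
    [(2.5, 25.0), (2.0, 35.0)],
    [(1.5, 40.0), (3.5, 15.0)],
]
beta_i = (-0.5, -0.06)
p_observed = sequence_prob(beta_i, x_i, choices=(1, 0, 1))
total = sum(sequence_prob(beta_i, x_i, c) for c in product((0, 1), repeat=3))
print(p_observed, total)   # total covers every possible sequence of choices
```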

Panel Discrete Choice

- In this case, panel methods are often applied to survey data
  - where several similar choice situations are often presented to respondents
  - e.g. choice of transport modes under different attributes
  - Prices, frequency, duration...
  - As indicated, experimental data are also suitable
- Lagged dependent variables can be added to mixed logit
  - without adjusting the probability formula
  - or simulation method
  - Provided they “behave”
  - This is not made explicit in the literature
- We now focus on the integration technique

Maximum Simulated Likelihood Principle

Simulation

- The probabilities for the random parameter logit
  - are integrals with no closed form
  - the degree of integration is high
  - Quadrature techniques become intractable
- Instead, simulation techniques are used
  - i.e. the expected value is replaced by an arithmetic mean
- Simulations of a random variable (rv) are pseudo-random draws from that rv
  - Most computer packages have a routine for the Uniform
  - e.g. RAND() in Excel, runif in R
  - From the uniform, draws from other distributions can be obtained by transformation (e.g. through the inverse CDF)
- Application of the ideas on simulation to ML estimation
  - Key result: Simulation can lead to an estimator with the same distribution as the MLE
  - Provided the number of simulation draws made to compute the probability for each observation $\to \infty$
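Both ideas, transforming uniform draws and replacing an expectation by an arithmetic mean, can be illustrated in a few lines. The example assumes the standard Gumbel distribution, whose CDF $\exp(-\exp(-x))$ is easy to invert and whose mean is the Euler-Mascheroni constant; the seed and the number of draws are arbitrary.

```python
import math
import random

def gumbel_draw(rng):
    """Inverse-CDF transform: if U ~ Uniform(0,1), then -ln(-ln U) is standard Gumbel."""
    u = rng.random()
    while u <= 0.0:          # random() can return exactly 0.0; redraw so the log is defined
        u = rng.random()
    return -math.log(-math.log(u))

rng = random.Random(2015)
draws = [gumbel_draw(rng) for _ in range(100_000)]
simulated_mean = sum(draws) / len(draws)        # arithmetic mean in place of the expectation
EULER_GAMMA = 0.5772156649015329                # exact mean of the standard Gumbel
print(simulated_mean, EULER_GAMMA)
```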

Numerical Approximation Maximum Likelihood

- Assume:
  - independence over observations
  - and that $y$ has conditional density $f(y \mid x, \theta)$
  - or probabilities for the discrete choice case
  - but $f(y \mid x, \theta)$ has no closed-form expression
  - an intractable integral
- Replace the integral by a numerical approximation $\tilde f(y \mid x, \theta)$,
- maximize $\ln \tilde L_N(\theta) = \sum_{i=1}^N \ln \tilde f(y_i \mid x_i, \theta)$ with respect to $\theta$

Numerical Approximation Maximum Likelihood Properties

- The estimator will be
  - consistent and have the same asymptotic distribution as ML
  - if $\tilde f(y \mid x, \theta)$ is a good approximation
- The resulting first-order conditions
  - are usually nonlinear
  - and are solved by iterative methods
  - but we do not discuss that

Simulator

- Let $f(y \mid x, \theta)$ take the following general form

  $$f(y_i \mid x_i, \theta) = \int h(y_i \mid x_i, \theta, u_i)\, g(u_i)\, du_i$$

  without a closed-form solution
  - and $u_i$ is unobservable
  - so the estimated parameter vector $\theta$ cannot depend on it
  - We say $u_i$ must be integrated out (taking expectations)
- The direct simulator for $f(y_i \mid x_i, \theta)$ is the Monte Carlo sum

  $$\tilde f(y_i \mid x_i, u_{iS}, \theta) = \frac{1}{S} \sum_{s=1}^{S} h(y_i \mid x_i, \theta, u_i^s)$$

  where $u_{iS}$ is a vector of $S$ draws $u_i^s$, $s = 1, \ldots, S$
  - that are independent simulated draws from $g(u_i)$
  - We assume the distribution of the unobserved $u_i$ is $g(u_i)$
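A sketch of the direct simulator for the RE logit probability seen earlier, where $u_i$ plays the role of $\alpha_i$ and $g$ is assumed normal. A fine midpoint rule is added only as a benchmark to see that the Monte Carlo sum lands close to the integral. Data, parameter values and function names are hypothetical.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(y, x, beta, u):
    """Sequence probability conditional on the unobserved effect u (logit case, scalar x)."""
    p = 1.0
    for y_t, x_t in zip(y, x):
        q = logistic(u + x_t * beta)
        p *= q if y_t == 1 else 1.0 - q
    return p

def direct_simulator(y, x, beta, draws):
    """f_tilde = (1/S) * sum_s h(y | x, theta, u^s)."""
    return sum(h(y, x, beta, u) for u in draws) / len(draws)

def reference_integral(y, x, beta, sigma, n=2000, width=8.0):
    """Fine midpoint rule over u ~ N(0, sigma^2), used only as a benchmark."""
    step = 2.0 * width * sigma / n
    total = 0.0
    for k in range(n):
        u = -width * sigma + (k + 0.5) * step
        total += h(y, x, beta, u) * math.exp(-0.5 * (u / sigma) ** 2) * step
    return total / (sigma * math.sqrt(2.0 * math.pi))

rng = random.Random(42)
sigma = 1.0
draws = [rng.gauss(0.0, sigma) for _ in range(20_000)]   # S draws from the assumed g(u)
y_i, x_i, beta = [1, 0, 1], [0.5, -1.0, 2.0], 0.8
f_tilde = direct_simulator(y_i, x_i, beta, draws)
f_ref = reference_integral(y_i, x_i, beta, sigma)
print(f_tilde, f_ref)
```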

Simulator Properties

- Such $\tilde f_{iS}$ is unbiased, and consistent for $f_i$
  - as the number of draws $S \to \infty$
  - So we drop $u_{iS}$ from the notation
- The direct simulator is one case of simulator
  - Other simulators exist
  - in some cases doing a better job at approximating $f_i$
  - depending on the distribution $g(u_i)$
- Generally we want the simulator $\tilde f_i$ to be differentiable
  - so that gradient methods may be used to optimize the likelihood function
  - Gradient methods: based on first-order (or second-order) derivatives

Maximum Simulated Likelihood

- In general, the Maximum Simulated Likelihood estimator is simply the $\hat\theta_{MSL}$ that maximises

  $$\ln \tilde L_N(\theta) = \sum_{i=1}^{N} \ln \tilde f(y_i \mid x_i, \theta) = \sum_{i=1}^{N} \ln \frac{1}{S} \sum_{s=1}^{S} h(y_i \mid x_i, \theta, u_i^s)$$

- To eliminate “chatter” caused by simulation and help numerical convergence
  - the underlying Monte Carlo draws used to construct $\tilde f_i$ should not be redrawn
  - as otherwise the simulated likelihood would change across the optimization iterations, even at an unchanged $\theta$
  - Reminder: $u_{iS}$ is a vector of $S$ draws $u_i^s$
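The “chatter” point can be seen directly: with fixed draws the simulated log-likelihood is an ordinary deterministic function of $\theta$, whereas redrawing gives a different value at the same $\theta$. The sketch assumes the RE logit setting with $\theta = (\beta, \sigma)$ and keeps standard normal draws that are rescaled by $\sigma$, so the same draws serve for every $\theta$. Data and names are hypothetical.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulated_loglik(theta, data, draws_by_i):
    """ln L_tilde(theta) = sum_i ln (1/S) sum_s h(y_i | x_i, theta, u_i^s), logit h, scalar x."""
    beta, sigma = theta
    total = 0.0
    for (y, x), draws in zip(data, draws_by_i):
        f_tilde = 0.0
        for u in draws:                      # u is a standard normal draw, scaled by sigma below
            p = 1.0
            for y_t, x_t in zip(y, x):
                q = logistic(sigma * u + x_t * beta)
                p *= q if y_t == 1 else 1.0 - q
            f_tilde += p
        total += math.log(f_tilde / len(draws))
    return total

def draw_set(rng, n, s):
    """One vector of S standard normal draws per individual."""
    return [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(n)]

data = [([1, 0, 1], [0.5, -1.0, 2.0]), ([0, 0, 1], [-0.3, 0.1, 1.2])]   # hypothetical (y_i, x_i)
theta = (0.8, 1.0)
rng = random.Random(7)
fixed = draw_set(rng, len(data), 50)
same_1 = simulated_loglik(theta, data, fixed)
same_2 = simulated_loglik(theta, data, fixed)                           # identical: no chatter
redrawn = simulated_loglik(theta, data, draw_set(rng, len(data), 50))   # differs at the same theta
print(same_1, same_2, redrawn)
```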

Random Parameter MN Logit Max Simulated Likelihood

- More precisely, for the multinomial mixed logit case

1. Make an initial hypothesis about the distribution of the random parameter
   - e.g. $\beta_i \sim \text{normal}(\mu, \sigma)$
2. Draw $R$ numbers from that distribution
   - And keep them throughout
3. For each draw $\beta_i^r$, compute the probability

   $$P_{il}^r = \frac{\exp(\beta_i^{r\prime} x_{il})}{\sum_j \exp(\beta_i^{r\prime} x_{ij})}$$

4. Compute the average of these probabilities: $\bar P_{il} = \sum_{r=1}^{R} P_{il}^r / R$
5. Use these simulated probabilities in the log-likelihood
   - Which is then a “pseudo”-likelihood
6. Numerical maximization of this $\ln L$ as usual
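Steps 1 to 5 can be sketched for a single individual and a single choice situation. This is an illustration under my own assumptions, not the mlogit implementation: two independent normal coefficients, standard normal draws generated once and rescaled as $\beta_i^r = \mu + \sigma z^r$, and made-up attribute values. Step 6 (the maximization over $\mu$ and $\sigma$) is left out.

```python
import math
import random

def mnl_probs(beta, x_alternatives):
    """Logit probabilities of all alternatives for a given coefficient vector."""
    v = [sum(b * x for b, x in zip(beta, x_j)) for x_j in x_alternatives]
    m = max(v)
    e = [math.exp(v_j - m) for v_j in v]
    s = sum(e)
    return [e_j / s for e_j in e]

def simulated_probs(mu, sigma, x_alternatives, std_draws):
    """Steps 3-4: average the logit probabilities over the R draws of beta_i."""
    avg = [0.0] * len(x_alternatives)
    for z in std_draws:
        beta_r = [m + s * z_k for m, s, z_k in zip(mu, sigma, z)]   # beta_i^r = mu + sigma * z
        for l, p in enumerate(mnl_probs(beta_r, x_alternatives)):
            avg[l] += p / len(std_draws)
    return avg

# Step 1: hypothetical assumption, independent normal coefficients on (price, time)
mu, sigma = (-0.5, -0.06), (0.2, 0.03)
# Step 2: R standard normal draws, generated once and reused for every (mu, sigma)
rng = random.Random(1)
R = 200
std_draws = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(R)]

x_i = [(2.0, 30.0), (3.0, 20.0), (1.0, 45.0)]   # hypothetical attributes of 3 alternatives
chosen = 1
p_bar = simulated_probs(mu, sigma, x_i, std_draws)
loglik_i = math.log(p_bar[chosen])               # step 5: contribution of i to ln L
print(p_bar, loglik_i)
```

In a panel, the average over draws would be taken on the product over $t$ of the chosen-alternative probabilities rather than on each $P_{il}^r$ separately.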

Application: Transport Mode Choice

In R, with package mlogit installed:

- library("mlogit")
- data("Train", package = "mlogit")
  - Loads the data in memory
- Tr <- mlogit.data(Train, shape = "wide", varying = 4:11, choice = "choice", sep = "", opposite = c("price", "time", "change", "comfort"), alt.levels = c("choice1", "choice2"), id = "id")
  - Reshapes the data into a form suitable for the mlogit command
- ml <- mlogit(choice ~ price + time + change + comfort, Tr, panel = TRUE, rpar = c(time = "cn", change = "n", comfort = "ln"), correlation = TRUE, R = 20, tol = 10, halton = NA)
  - Regresses “choice” on 4 regressors
  - With random parameters for the last 3
  - With distributions cn (censored normal), n (normal), ln (log-normal)
  - Correlation between parameters allowed