Ch 3. Discrete Choice Panel Data Models 2015-16
Topics in Applied Econometrics : Panel Data
Ch 3. Discrete Choice Panel Data Models
Pr. Philippe Polomé, Université Lumière Lyon 2
M2 Equade & M2 GAEXA
2015 – 2016
Outline
Introduction to Binary Choices
RE Model
FE Model
Random Parameters (“Mixed”) Multinomial Model
Model
- Underlying latent model (usual notation)
\[ y_{it}^* = x_{it}'\beta + \alpha_i + \epsilon_{it} \tag{1} \]
- Assume $\epsilon_{it}$ i.i.d., independent of the $x$'s
  - and with a symmetric distribution function $F(.)$
  - $y^*$ is not observed
  - $y_{it} = 1$ if $y_{it}^* > 0$
  - else $y_{it} = 0$
- Estimation is by Maximum Likelihood
Incidental Parameters $\alpha$
- Log-likelihood function
\[ \ln L(\beta,\alpha_1,\dots,\alpha_N) = \sum_{i,t} y_{it}\ln F\left(\alpha_i + x_{it}'\beta\right) + \sum_{i,t}(1-y_{it})\ln\left(1 - F\left(\alpha_i + x_{it}'\beta\right)\right) \]
- For fixed $T$ and $N \to \infty$, the estimators are inconsistent because the number of parameters $\to \infty$
  - Incidental parameter problem: the inconsistency of $\hat\alpha$ carries over to the estimator for $\beta$
  - The $\alpha_i$ were also present in linear models, but there we could eliminate them by differencing
  - Can we translate that to a non-linear model?
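To make the parameter count concrete, here is a minimal sketch of this log-likelihood, assuming a logistic $F$ and a single scalar regressor. The function names and the toy panel are hypothetical. There is one $\alpha_i$ to supply per individual, so the list `alphas` grows with $N$.

```python
import math

def logistic_cdf(z):
    """Logistic CDF, one symmetric choice for F(.)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(beta, alphas, x, y, cdf=logistic_cdf):
    """ln L(beta, alpha_1..alpha_N) for a scalar regressor.

    x[i][t], y[i][t]: regressor and 0/1 outcome of individual i in period t.
    alphas[i]: individual effect alpha_i (one parameter per individual).
    """
    total = 0.0
    for i, alpha_i in enumerate(alphas):
        for x_it, y_it in zip(x[i], y[i]):
            p = cdf(alpha_i + x_it * beta)
            total += y_it * math.log(p) + (1 - y_it) * math.log(1.0 - p)
    return total

# Hypothetical toy panel: N = 2 individuals, T = 3 periods.
x = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
y = [[1, 0, 1], [0, 0, 1]]
# With beta = 0 and alpha_i = 0 every observation contributes ln(0.5).
print(log_likelihood(beta=0.0, alphas=[0.0, 0.0], x=x, y=y))
```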
RE Model
RE probability
- Assume
  - Zero correlation between $\alpha_i$ and $x_{it}$
  - $\alpha_i \sim \text{iid}(0, \sigma_\alpha^2)$
  - Also independence across $i$, and $\epsilon_{it}$ iid
- The (conditional) probability of observing a certain outcome for $i$ is
\[ f(y_{i1},\dots,y_{iT} \mid x_{i1},\dots,x_{iT},\beta,\alpha_i) \]
  - This is the whole sequence for $i$, over all periods $t = 1,\dots,T$
- Assume independence between the $\epsilon_{it}$ within each individual, then
\[ f(y_{i1},\dots,y_{iT} \mid x_{i1},\dots,x_{iT},\beta,\alpha_i) = \prod_t F(y_{it} \mid x_{it},\beta,\alpha_i) \]
Maximum Likelihood
- If $\alpha_i$ were known, say equal to $c_i$,
  - then we could estimate the parameters of that distribution ($\beta$) from the classical binary choice likelihood
\[ \prod_{i=1}^{N}\prod_{t=1}^{T}\left[F(x_{it}'\beta + c_i)\right]^{y_{it}}\left[1 - F(x_{it}'\beta + c_i)\right]^{1-y_{it}} \]
Conditional Maximum Likelihood
- But $c_i$ is not known and we cannot estimate $\alpha_i$ consistently
  - so instead we want to get rid of it
  - The way to get rid of $\alpha_i$ is to integrate it out
  - That is, take the expectation over its density $g(\alpha_i)$
- The resulting likelihood (for one $i$) is called the Conditional Maximum Likelihood
\[ f(y_{i1},\dots,y_{iT} \mid x_{i1},\dots,x_{iT},\beta) = \int_{-\infty}^{+\infty}\Big[\prod_t F(y_{it} \mid x_{it},\beta,\alpha_i)\Big]\, g(\alpha_i)\, d\alpha_i \]
- For the sample, the log-likelihood function of the binary choice model is the sum over $i$ of the log of this Conditional ML
  - Although it is now unconditional on $\alpha_i$
Integration
- The integral usually has no closed form
  - It cannot be solved analytically
  - We use numerical integration techniques
- Gauss-Hermite quadrature is very common
  - It is essentially a way of choosing where to place the rectangles and how much weight each one gets
- Later, we will see simulation techniques
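The idea of integrating $\alpha_i$ out with rectangles can be sketched as follows. This is a plain midpoint rule, not Gauss-Hermite quadrature, and it assumes a logistic $F$, a scalar regressor and $\alpha_i \sim N(0,\sigma^2)$. All names and numbers are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_pdf(a, sigma):
    return math.exp(-0.5 * (a / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def re_likelihood_i(beta, sigma, x_i, y_i, n_points=200, width=8.0):
    """Integrate alpha_i out of prod_t F^y (1 - F)^(1 - y) with a midpoint rule.

    Assumes alpha_i ~ N(0, sigma^2); the grid spans +/- width * sigma.
    """
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / n_points
    total = 0.0
    for k in range(n_points):
        a = lo + (k + 0.5) * step            # midpoint of rectangle k
        prod = 1.0
        for x_it, y_it in zip(x_i, y_i):
            p = logistic_cdf(a + x_it * beta)
            prod *= p if y_it == 1 else 1.0 - p
        total += prod * normal_pdf(a, sigma) * step
    return total

# Increase the number of rectangles until the value stops moving.
for n in (5, 20, 80, 320):
    print(n, re_likelihood_i(0.7, 1.0, [0.2, -1.0, 1.3], [1, 0, 1], n_points=n))
```

The loop at the end mimics the usual practical check: the result is trusted once adding points no longer changes it.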
Integration in Stata
- In principle quadrature works with any distribution for $\alpha$ and $\epsilon$
  - In Stata, both the xtlogit and xtprobit commands use the normal distribution for $\alpha$
  - If $\alpha$ is in fact distributed differently ...
  - The choice in Stata is on the distribution for $\epsilon$: Logit or Probit
  - In the RE model, that may not have much importance
  - The number of points of integration
    - i.e. the number of rectangles
    - can be adjusted
    - It should not be too small
    - Increase the number of these points until the results are stable
Serial Correlation
- When independence between the $\epsilon_{it}$ cannot be assumed within each individual, then
  - We cannot write $f(y_{i1},\dots,y_{iT} \mid x_{i1},\dots,x_{iT},\beta,\alpha_i)$ as $\prod_t F(y_{it} \mid x_{it},\beta,\alpha_i)$ in the Conditional Likelihood Function
- That does not invalidate the conditional approach, but
  - There is now a multiple integral, to integrate over all the periods jointly
  - Above 3 such integrals (4 periods and more), quadrature does not work well
- Instead, use simulation-based methods
  - They are not implemented in Stata
  - Stata does not appear to allow correlation between the $\epsilon_{it}$
  - Possibly, use R instead
  - The mlogit package below does not do that
FE Model
Logit Probability
- Fixed effects estimation is possible for the panel logit model, using the conditional MLE
  - But not for other binary panel models such as panel probit
  - Strict exogeneity (of the $x$) must hold
  - Independence of the $\epsilon_{it}$ across $t$ must hold
  - And of course across $i$ as usual
- For the logit model, it is possible to show that the (conditional) probability of observing a certain outcome for $i$ (over all periods $t$) is
\[ f(y_{i1},\dots,y_{iT} \mid x_{i1},\dots,x_{iT},\beta,\alpha_i) = \frac{\exp\left(\alpha_i\sum_t y_{it}\right)\exp\left(\left(\sum_t y_{it}x_{it}'\right)\beta\right)}{\prod_t\left[1+\exp\left(\alpha_i + x_{it}'\beta\right)\right]} \]
Unchanging Behavior
- If $\sum_t y_{it} = 0$ or $= T$, then substituting in the above conditional probability shows that
  - Changes in $x_{it}$ from $t$ to $t'$ cannot explain choices for $y$
  - since such choices do not change over $t$
  - $\alpha_i$, being time-invariant, is sufficient to explain the choice for either $y_{it} = 0\ \forall t$ or $y_{it} = 1\ \forall t$
- Such $i$ can be dropped from the likelihood
  - as they add no information on $\beta$
  - Stata does it automatically
- So, only $i$ that change status/behavior at least once are relevant for estimating $\beta$
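As an illustration of the dropping rule, a minimal sketch that keeps only the individuals whose outcome changes at least once. The function name and the toy panel are hypothetical.

```python
def switchers(y):
    """Indices of individuals whose 0/1 outcome changes at least once.

    y[i] is the outcome sequence of individual i; sequences summing to 0
    or to T are the ones dropped from the FE logit likelihood.
    """
    return [i for i, y_i in enumerate(y) if 0 < sum(y_i) < len(y_i)]

y = [[0, 0, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0]]   # hypothetical toy panel
print(switchers(y))  # [1, 3]
```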
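Changing Behavior: Take $T = 2$ and $\sum_t y_{it} = 1$
- Then only the sequences $\{0,1\}$ or $\{1,0\}$ are possible:
\[ \Pr\Big\{(0,1) \,\Big|\, \sum_t y_{it} = 1, \alpha_i, \beta\Big\} = \frac{\Pr\{(0,1) \mid \alpha_i,\beta\}}{\Pr\{(0,1) \mid \alpha_i,\beta\} + \Pr\{(1,0) \mid \alpha_i,\beta\}} \]
- By independence over $t$,
\[ \Pr\{(0,1) \mid \alpha_i,\beta\} = \Pr\{y_{i1} = 0 \mid \alpha_i,\beta\}\,\Pr\{y_{i2} = 1 \mid \alpha_i,\beta\} \]
- With the logistic:
  - $\Pr\{y_{it} = 1 \mid \alpha_i,\beta\} = \dfrac{\exp(\alpha_i + x_{it}'\beta)}{1+\exp(\alpha_i + x_{it}'\beta)}$
  - $\Pr\{y_{it} = 0 \mid \alpha_i,\beta\} = \dfrac{1}{1+\exp(\alpha_i + x_{it}'\beta)}$
- So,
\[ \Pr\{(0,1) \mid \alpha_i,\beta\} = \frac{1}{1+\exp(\alpha_i + x_{i1}'\beta)} \cdot \frac{\exp(\alpha_i + x_{i2}'\beta)}{1+\exp(\alpha_i + x_{i2}'\beta)} \]
- And
\[ \Pr\Big\{(0,1) \,\Big|\, \sum_t y_{it} = 1, \alpha_i, \beta\Big\} = \frac{\exp(\alpha_i + x_{i2}'\beta)}{\exp(\alpha_i + x_{i2}'\beta) + \exp(\alpha_i + x_{i1}'\beta)} = \frac{\exp(x_{i2}'\beta)}{\exp(x_{i2}'\beta) + \exp(x_{i1}'\beta)} = \frac{\exp\left((x_{i2} - x_{i1})'\beta\right)}{1+\exp\left((x_{i2} - x_{i1})'\beta\right)} \]
- Thus $\alpha_i$ drops out of the model
  - Much like in a first-difference linear model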
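A quick numerical check of this result, assuming a scalar regressor and hypothetical values for $\beta$, $x_{i1}$ and $x_{i2}$: the conditional probability is computed from its unconditional pieces for several values of $\alpha_i$ and compared with the logistic of the first difference.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob_01_given_one_switch(alpha, beta, x1, x2):
    """Pr{(0,1) | y_i1 + y_i2 = 1, alpha_i, beta}, built from the unconditional pieces."""
    p1 = logistic_cdf(alpha + x1 * beta)   # Pr{y_i1 = 1}
    p2 = logistic_cdf(alpha + x2 * beta)   # Pr{y_i2 = 1}
    pr_01 = (1.0 - p1) * p2
    pr_10 = p1 * (1.0 - p2)
    return pr_01 / (pr_01 + pr_10)

beta, x1, x2 = 0.8, 0.3, 1.5               # hypothetical values
for alpha in (-3.0, 0.0, 5.0):
    print(alpha, prob_01_given_one_switch(alpha, beta, x1, x2))
print(logistic_cdf((x2 - x1) * beta))      # same value, computed without alpha
```

The printed values agree up to floating-point rounding whatever the value of `alpha`.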
Estimation
- This means that we can estimate the FE logit model for $T = 2$ using a standard logit
  - with $x_{i2} - x_{i1}$ (a first difference!) as explanatory variables and
  - the change in $y_{it}$ as the endogenous event (1 for a positive change, 0 for a negative one)
- For larger $T$ it is more cumbersome to derive all the necessary conditional probabilities
  - but in principle it is a straightforward extension of the above case
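For reference, the standard form of the extension to general $T$ conditions on the number of ones $s = \sum_t y_{it}$. The set notation $B_s$ is introduced here for illustration and is not from the slides:

\[
\Pr\Big\{y_{i1},\dots,y_{iT} \,\Big|\, \sum_t y_{it} = s,\ x_{i1},\dots,x_{iT},\ \beta\Big\}
= \frac{\exp\left(\sum_t y_{it}\,x_{it}'\beta\right)}{\sum_{d \in B_s}\exp\left(\sum_t d_t\,x_{it}'\beta\right)},
\qquad
B_s = \Big\{d \in \{0,1\}^T : \sum_t d_t = s\Big\}
\]

With $T = 2$ and $s = 1$, $B_1 = \{(0,1),(1,0)\}$ and the expression reduces to $\exp(x_{i2}'\beta) / \left(\exp(x_{i2}'\beta) + \exp(x_{i1}'\beta)\right)$ for the sequence $(0,1)$. The number of terms in the denominator grows quickly with $T$, which is what makes the general case cumbersome.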
Other FE Binary Panel Issues
- A similar transformation exists for a dynamic panel binary model
  - Provided there are at least four time periods
  - It imposes conditions on the $y_{it}$ time series that may be difficult to test
  - See Cameron & Trivedi 23.4.4
  - Not implemented in Stata
- The elimination of the individual effect $\alpha_i$
  - Changes the interpretation
  - e.g. a one-unit difference in $x_{it}$ versus $x_{i,t-1}$ induces a change in the probability of the sequence $\{y_{it}, y_{i,t-1}\}$
  - compared to a certain probability if $x_{it} = x_{i,t-1}$
To sum up
- If it can be assumed that $\alpha_i$ is independent of $x_{it}$
  - RE ML estimator
  - Probit or Logit, does not matter
- Otherwise, we only have the FE Logit estimator
  - With a first-difference interpretation
  - only $i$ that change status/behavior at least once are relevant for estimating $\beta$
- These approaches rely on independence of the $\epsilon_{it}$
  - Not of the $y_{it}$
  - It is essential that the $\alpha_i$ and $x_{it}$ “filter out” any correlation in the $y_{it}$
  - In general, any remaining correlation will cause inconsistency
- Packages
  - Stata: xtlogit or menu stat → Panel → Binary outcomes
  - R: mlogit
Example: Unionization of Women in the US
- Setup: webuse union
  - Loads the dataset from the Stata website: more than 4000 women observed 1 to 12 times
- Random-effects logit model (the xtlogit default)
  - xtlogit union age grade not_smsa south##c.year
  - The ## operator puts each variable and their product in the regression
  - south is a dichotomous variable
  - south##year induces one dummy for each year and one for each (south = 1 and year = t): long
  - south##c.year treats year as continuous
- Fixed-effects logit model
  - xtlogit union age grade not_smsa south##c.year, fe
  - 2744 groups (14165 obs) dropped because of all positive or all negative outcomes
Random Parameters (“Mixed”) Multinomial Model
Random Utility Models
- We are interested in individual discrete choices among $J$ exclusive alternatives, $j = 1,\dots,J$
  - We assume that each alternative $j$ provides a certain utility to the individual
  - Who then compares the $J$ alternatives on that basis
  - e.g. transport mode choice
- The utility, and therefore the choice, is purely deterministic from the individual's point of view
  - It is random from the researcher's point of view
  - because some of the determinants of the utility are unobserved,
  - which implies that the choice can only be analyzed in terms of probabilities.
Random Utility in Cross-section
- $i$'s utility for alternative $j$ is
\[ U_{ij} = \alpha_j + \beta x_{ij} + \gamma_j z_i + \delta_j w_{ij} + \epsilon_{ij} \]
where there may be
  - alternative specific variables $x_{ij}$
    - with a generic coefficient $\beta$
    - e.g. transport mode time to work
  - individual specific variables $z_i$
    - with alternative specific coefficients $\gamma_j$
    - e.g. age
  - alternative specific variables $w_{ij}$
    - with an alternative specific coefficient $\delta_j$
    - e.g. transport mode safety
Utility Differences
- Utility being ordinal, only utility differences are relevant to model the choice of one alternative
  - The difference between the utilities of two different alternatives $j$ and $k$ is
\[ U_{ij} - U_{ik} = (\alpha_j - \alpha_k) + \beta(x_{ij} - x_{ik}) + (\gamma_j - \gamma_k) z_i + (\delta_j w_{ij} - \delta_k w_{ik}) + \epsilon_{ij} - \epsilon_{ik} \]
so that the $z_i$ terms drop out when $\gamma_j = \gamma_k$
  - For $J$ alternatives, there are $J - 1$ such utility differences
Utility Differences
- Moreover, only differences of these coefficients are relevant and may be identified
  - For example, with three alternatives 1, 2 and 3,
  - the three coefficients associated with an individual specific variable cannot be identified,
  - but only two linear combinations of them.
  - Therefore, a choice of normalization is necessary
  - the simplest one is $\gamma_1 = 0$
- Coefficients for alternative specific variables may (or may not) be alternative-specific
  - For example, transport time is alternative specific
  - And individual-specific since it depends on location
  - So there could be a constant $\beta$
  - But maybe 10 minutes in public transport do not have the same value as 10 minutes in a car
  - So that would be a $\delta_j$
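The need for a normalization can be seen numerically. In the sketch below, which assumes logit choice probabilities and hypothetical values for $z_i$ and the $\gamma_j$, subtracting $\gamma_1$ from every coefficient leaves all choice probabilities unchanged, so the data cannot tell the two coefficient sets apart.

```python
import math

def choice_probs(utilities):
    """Multinomial logit probabilities from systematic utilities V_j."""
    m = max(utilities)                        # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    s = sum(expv)
    return [e / s for e in expv]

z_i = 35.0                                    # hypothetical individual-specific variable (age)
gammas = [0.02, 0.05, -0.01]                  # hypothetical gamma_j, J = 3
shifted = [g - gammas[0] for g in gammas]     # normalization gamma_1 = 0

p_raw = choice_probs([g * z_i for g in gammas])
p_norm = choice_probs([g * z_i for g in shifted])
print(p_raw)
print(p_norm)   # same probabilities: only gamma_j - gamma_1 is identified
```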
Conditional Probabilities
- If alternative $l$ is chosen,
  - Rewrite the utility differences as
\[ U_l - U_j = V_l - V_j + \epsilon_l - \epsilon_j \quad\text{where}\quad V_j = \alpha_j + \beta x_{ij} + \gamma_j z_i + \delta_j w_{ij} \]
  - Possibly without $z_i$ in earlier models
- The general expression of the probability of choosing alternative $l$ is then:
\[ P_l \mid \epsilon_l = \Pr\{U_l > U_1,\dots,U_l > U_J\} = F_{-l}(\epsilon_1 < V_l - V_1 + \epsilon_l,\ \dots,\ \epsilon_J < V_l - V_J + \epsilon_l) \]
where $F_{-l}$ is the multivariate distribution of the $J - 1$ other error terms
  - $F_{-l}$ is a $J - 1$ dimensional integral that depends on the unobservable $\epsilon_l$
Unconditional Probabilities
- Since $\epsilon_l$ is unobserved,
  - We must first remove it from $P_l$
- This is done by taking the expectation
\[ P_l = \int (P_l \mid \epsilon_l)\, f_l(\epsilon_l)\, d\epsilon_l \]
that is, integrating $F_{-l}$ over $\epsilon_l$
  - A $J$-dimensional integral
Log-Likelihood Function
- Write $y_{ij}$ equal to 1 if $i$ chose $j$, and zero otherwise
  - For any given choice situation, there are $J$ such variables
- If the $\epsilon_{ij}$ are independent across alternatives $j$,
  - the probability of the choice made by $i$ is:
\[ P_i = \prod_j P_{ij}^{y_{ij}} \]
which collapses to $P_{il}$ for any particular choice $l$
  - that is the $P_l$ from above
  - with an added $i$ index in the context of a sample
  - in logs: $\ln P_i = \sum_j y_{ij}\ln P_{ij}$
- Over a sample of independent observations:
\[ \ln L = \sum_i \ln P_i = \sum_i\sum_j y_{ij}\ln P_{ij} \]
- We seek to maximize this function over the set of parameters of the utility differences
Standard Multinomial Logit
- The standard multinomial logit model probability $P_{il}$ is
\[ P_{il} = \frac{\exp\{\beta'x_{il}\}}{\sum_j \exp\{\beta'x_{ij}\}} \]
- This is due to McFadden 1974,
  - Assuming a “Gumbel” distribution and iid for $\epsilon$
  - And non-random coefficients
  - Alternative-invariant regressors are impossible unless $\beta$ becomes $\beta_j$
  - That is the next model
- I assume you know this model
  - If not → Cameron & Trivedi, Chapter 15
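A minimal sketch of this probability formula, with hypothetical coefficients and attribute vectors (price, time) for three transport modes. It also illustrates a known property of the standard model: the ratio of two choice probabilities does not depend on which other alternatives are available.

```python
import math

def mnl_probs(beta, x_i):
    """P_il = exp(beta'x_il) / sum_j exp(beta'x_ij) for one individual.

    x_i[j] is the attribute vector of alternative j.
    """
    v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in x_i]
    m = max(v)                                 # subtract the max for numerical stability
    expv = [math.exp(v_j - m) for v_j in v]
    s = sum(expv)
    return [e / s for e in expv]

beta = [-0.1, -0.05]                           # hypothetical price and time coefficients
car, bus, train = [4.0, 20.0], [2.0, 35.0], [3.0, 25.0]

p3 = mnl_probs(beta, [car, bus, train])
p2 = mnl_probs(beta, [car, bus])               # train removed from the choice set
print(p3[0] / p3[1], p2[0] / p2[1])            # the car/bus ratio is the same in both
```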
Random Parameter (“Mixed”) Multinomial Logit
- The mixed logit model is
\[ P_{il} \mid \beta_i = \frac{\exp\{\beta_i'x_{il}\}}{\sum_j \exp\{\beta_i'x_{ij}\}} \]
  - the $\beta_i$ coefficients are treated as random
  - Just as the $\alpha_i$ in panel linear models
  - Therefore, the probability becomes conditional on the vector of random coefficients: $P_{il} \mid \beta_i$
- See Train 2003 for a complete presentation
  - The $\epsilon_{ij}$ are already integrated out as in the McFadden formula
  - Random parameters are one way to address the Independence of Irrelevant Alternatives issue
Unconditional Mixed Logit Probabilities
- As earlier, the conditional probability must be made unconditional by taking expectations
  - This time, over the $\beta_i$:
\[ P_{il} = E\left[P_{il} \mid \beta_i\right] = \int (P_{il} \mid \beta_i)\, f(\beta_i)\, d\beta_i \]
where the integration is
  - multiple, over all the elements of $\beta$, with possible correlations
  - over the support of $\beta$, usually $(-\infty, +\infty)$
  - and implies that we assume a distribution $f$ for $\beta$
- The question is how to compute this integral
Panel Discrete Choice
- In a panel context, the iid hypothesis on the $y_{it}$ is untenable
  - Since successive observations of a single individual are likely correlated
  - But independence of the $\epsilon_{it}$ will be assumed as before
  - The $\alpha_i$ (individual-specific constants) also have to be integrated out
  - Which is only possible by assuming $E(\alpha_i) = \alpha\ \forall i$
  - But here, that is also the case for any slope coefficient $\beta_i$
  - So only in a RE model
- More specifically, we compute one probability for each $i$, and it is this probability that enters the log-likelihood function
Panel Discrete Choice Probabilities
- For a given vector of coefficients $\beta_i$, the probability that alternative $l$ is chosen for the $t$th observation of $i$ is
\[ P_{itl} = \frac{\exp\{\beta_i'x_{itl}\}}{\sum_j \exp\{\beta_i'x_{itj}\}} \]
  - Across all alternatives that is $P_{it} = \prod_l (P_{itl})^{y_{itl}}$
  - The joint probability for the $T$ observations of individual $i$ is
\[ P_i = \prod_t \prod_l P_{itl}^{y_{itl}} \]
  - In this formulation, the $\epsilon_{itj}$ are independent over time
  - But correlation in the behavior is modelled because the $\beta_i$ coefficients are constant over time
Panel Discrete Choice
- Panel data in this case often come from surveys
  - where several similar choice situations are presented to respondents
  - e.g. choice of transport modes under different attributes
  - Prices, frequency, duration...
  - As indicated, experimental data are also suitable
- Lagged dependent variables can be added to mixed logit
  - without adjusting the probability formula
  - or the simulation method
  - Provided they “behave”
  - The literature does not spell this out
- We now focus on the integration technique
Maximum Simulated Likelihood Principle
Simulation
- The probabilities for the random parameter logit
  - are integrals with no closed form
  - the degree of integration is high
  - Quadrature techniques become intractable
- Instead, simulation techniques are used
  - i.e. the expected value is replaced by an arithmetic mean
- Simulations of a random variable (rv) are pseudo-random draws from that rv
  - Most computer packages have a routine for the Uniform
  - e.g. rand() in Excel, runif in R
  - From the uniform, formulas exist to generate draws from the other distributions
- Application of the ideas on simulation to ML estimation
  - Key result: simulation can lead to an estimator with the same distribution as the MLE
  - Provided the number of simulation draws made to compute the probability for each observation $\to \infty$
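One common way of going from the uniform to another distribution is to apply the inverse CDF of the target distribution. The sketch below does this for a normal using only the Python standard library. The function name, seed and parameter values are hypothetical, and this is one possible method among several.

```python
import random
from statistics import NormalDist, mean

def normal_draws(n, mu=0.0, sigma=1.0, seed=12345):
    """Turn Uniform(0,1) pseudo-random draws into Normal(mu, sigma) draws
    by applying the inverse normal CDF (inverse-transform method)."""
    rng = random.Random(seed)                 # fixed seed: reproducible draws
    dist = NormalDist(mu, sigma)
    draws = []
    for _ in range(n):
        u = rng.random()                      # uniform on [0, 1)
        while u == 0.0:                       # inv_cdf is undefined at 0
            u = rng.random()
        draws.append(dist.inv_cdf(u))
    return draws

draws = normal_draws(10_000, mu=1.0, sigma=2.0)
print(mean(draws))   # arithmetic mean standing in for the expected value 1.0
```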
Numerical Approximation Maximum Likelihood
- Assume:
  - independence over observations
  - and that $y$ has conditional density $f(y \mid x,\theta)$
  - or probabilities for the discrete choice case
  - but $f(y \mid x,\theta)$ has no closed-form expression
  - an intractable integral
- Replace the integral by a numerical approximation $\tilde f(y \mid x,\theta)$,
- maximize $\ln \tilde L_N(\theta) = \sum_{i=1}^{N} \ln \tilde f(y_i \mid x_i,\theta)$ with respect to $\theta$
Numerical Approximation Maximum Likelihood Properties
- The estimator will be
  - consistent and have the same asymptotic distribution as ML
  - if $\tilde f(y \mid x,\theta)$ is a good approximation
- The resulting first-order conditions
  - are usually nonlinear
  - and are solved by iterative methods
  - but we do not discuss that here
Simulator
- Let $f(y \mid x,\theta)$ take the following general form
\[ f(y_i \mid x_i,\theta) = \int h(y_i \mid x_i,\theta,u_i)\, g(u_i)\, du_i \]
without a closed-form solution
  - and $u_i$ is unobservable
  - so the estimated parameter vector $\theta$ cannot depend on it
  - We say $u_i$ must be integrated out (taking expectations)
- The direct simulator for $f(y_i \mid x_i,\theta)$ is the Monte Carlo sum
\[ \tilde f(y_i \mid x_i,u_{iS},\theta) = \frac{1}{S}\sum_{s=1}^{S} h(y_i \mid x_i,\theta,u_i^s) \]
where $u_{iS}$ is a vector of $S$ draws $u_i^s$, $s = 1,\dots,S$
  - that are independent simulated draws from $g(u_i)$
  - We assume the distribution of the unobserved $u_i$ is $g(u_i)$
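A minimal sketch of the direct simulator, assuming $h$ is a logit probability with an additive unobserved shift $u$ and that $g$ is the standard normal. Both choices, and all names and values, are hypothetical illustrations rather than the course's specification.

```python
import math
import random

def h(y, x, theta, u):
    """Conditional probability h(y | x, theta, u): logit with an unobserved shift u."""
    p = 1.0 / (1.0 + math.exp(-(x * theta + u)))
    return p if y == 1 else 1.0 - p

def direct_simulator(y, x, theta, draws):
    """Monte Carlo average (1/S) * sum_s h(y | x, theta, u^s)."""
    return sum(h(y, x, theta, u) for u in draws) / len(draws)

rng = random.Random(2016)
draws = [rng.gauss(0.0, 1.0) for _ in range(20_000)]   # assumes g is N(0, 1)

# With theta = 0 the true value is 0.5 by symmetry of g.
f1 = direct_simulator(1, 0.5, 0.0, draws)
f0 = direct_simulator(0, 0.5, 0.0, draws)
print(f1, f0)
```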
Simulators Properties
- Such $\tilde f_{iS}$ is unbiased and consistent for $f_i$
  - as the number of draws $S \to \infty$
  - So we drop $u_{iS}$ from the notation
- The direct simulator is one case of simulator
  - Other simulators exist
  - in some cases doing a better job at approximating $f_i$
  - depending on the distribution $g(u_i)$
- Generally we want the simulator $\tilde f_i$ to be differentiable
  - so that gradient methods may be used to optimize the likelihood function
  - Gradient methods: based on first-order (or second-order) derivatives
Maximum Simulated Likelihood
- In general, the Maximum Simulated Likelihood estimator is simply $\hat\theta_{MSL}$ that maximises
\[ \ln \tilde L_N(\theta) = \sum_{i=1}^{N} \ln \tilde f(y_i \mid x_i,\theta) = \sum_{i=1}^{N} \ln \frac{1}{S}\sum_{s=1}^{S} h(y_i \mid x_i,\theta,u_i^s) \]
- To eliminate “chatter” caused by simulation and help numerical convergence
  - the underlying Monte Carlo draws used to construct $\tilde f_i$ should not be redrawn
  - as otherwise the simulated likelihood would change across the optimization iterations even at an unchanged $\theta$
  - Reminder: $u_{iS}$ is a vector of $S$ draws $u_i^s$
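The point about not redrawing can be illustrated with a small sketch, assuming the same kind of hypothetical logit $h$ with a standard normal $u$. Evaluated twice at the same $\theta$ with the same stored draws, the simulated log-likelihood returns exactly the same number, whereas fresh draws give a different value at the same $\theta$.

```python
import math
import random

def simulated_loglik(theta, data, draws):
    """ln L~_N(theta) = sum_i ln( (1/S) sum_s h(y_i | x_i, theta, u_i^s) )."""
    total = 0.0
    for (y_i, x_i), u_i in zip(data, draws):
        sims = []
        for u in u_i:
            p = 1.0 / (1.0 + math.exp(-(x_i * theta + u)))
            sims.append(p if y_i == 1 else 1.0 - p)
        total += math.log(sum(sims) / len(sims))
    return total

data = [(1, 0.4), (0, -1.2), (1, 2.0)]                 # hypothetical (y_i, x_i) pairs
rng = random.Random(7)
fixed = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in data]   # drawn once, then stored

a = simulated_loglik(0.3, data, fixed)
b = simulated_loglik(0.3, data, fixed)                 # same stored draws
fresh = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in data]
c = simulated_loglik(0.3, data, fresh)                 # redrawn: simulation noise changes
print(a, b, c)
```

An optimizer comparing `a` with `c` would be reacting to simulation noise rather than to a change in $\theta$.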
Random Parameter MN Logit Max Simulated Likelihood
- More precisely, for the multinomial mixed logit case
1. Make an initial hypothesis about the distribution of the random parameter
  - e.g. $\beta_i \sim \text{normal}(\mu, \Sigma)$
2. Draw $R$ numbers from that distribution
  - And keep them throughout
3. For each draw $\beta_i^r$, compute the probability
\[ P_{ilr} = \frac{\exp\{\beta_i^{r\prime}x_{il}\}}{\sum_j \exp\{\beta_i^{r\prime}x_{ij}\}} \]
4. Compute the average of these probabilities $\bar P_{il} = \sum_{r=1}^{R} P_{ilr}/R$
5. Use these simulated probabilities in the log-likelihood
  - Which is then a “pseudo”-likelihood
6. Numerical maximization of this $\ln L$ as usual
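Steps 1 to 5 can be sketched as follows for one individual, assuming independent normal coefficients (no correlation) and hypothetical attribute values. Standard normal draws are generated once and rescaled by the current $\mu$ and $\sigma$, which is one way of keeping the draws fixed while the parameters move. Step 6, the maximization, is not shown.

```python
import math
import random

def mnl_probs(beta, x_i):
    """Logit probabilities for one coefficient vector beta."""
    v = [sum(b * x for b, x in zip(beta, x_ij)) for x_ij in x_i]
    m = max(v)
    expv = [math.exp(v_j - m) for v_j in v]
    s = sum(expv)
    return [e / s for e in expv]

def standard_draws(K, R, seed=1):
    """Step 2: R draws of K independent N(0,1) numbers, generated once."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(R)]

def simulated_probs(mu, sigma, eta, x_i):
    """Steps 3-4: average the logit probabilities over beta^r = mu + sigma * eta^r."""
    avg = [0.0] * len(x_i)
    for eta_r in eta:
        beta_r = [m + s * e for m, s, e in zip(mu, sigma, eta_r)]
        for j, p in enumerate(mnl_probs(beta_r, x_i)):
            avg[j] += p / len(eta)
    return avg

x_i = [[4.0, 20.0], [2.0, 35.0], [3.0, 25.0]]     # hypothetical attributes, J = 3
eta = standard_draws(K=2, R=200)
p_bar = simulated_probs([-0.1, -0.05], [0.05, 0.02], eta, x_i)
chosen = 2                                         # hypothetical observed choice l
print(p_bar, math.log(p_bar[chosen]))              # step 5: contribution to ln L
```

Setting every element of `sigma` to zero makes all draws equal to `mu`, so the simulated probabilities reduce to the standard multinomial logit ones.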
Application: Transport Mode Choice
In R, with package mlogit installed:
- library("mlogit")
- data("Train", package = "mlogit")
  - Loads the data in memory
- Tr <- mlogit.data(Train, shape = "wide", varying = 4:11, choice = "choice", sep = "", opposite = c("price", "time", "change", "comfort"), alt.levels = c("choice1", "choice2"), id = "id")
  - Reshapes the data into a form suitable for the mlogit command
- ml <- mlogit(choice ~ price + time + change + comfort, Tr, panel = TRUE, rpar = c(time = "cn", change = "n", comfort = "ln"), correlation = TRUE, R = 20, tol = 10, halton = NA)
  - Regresses choice on 4 regressors
  - With random parameters for the last 3
  - With distributions: cn censored normal, n normal, ln log-normal
  - Correlation between parameters allowed