Topics in Applied Econometrics: Panel Data
Ch. 1. Linear Non-Dynamic Panel Data Models
Pr. Philippe Polomé, Université Lumière Lyon 2
M2 Equade & M2 GAEXA
2015 – 2016
Overview of Ch. 1
- Panel Data Models
  - Fixed Effects & Random Effects
- Panel Data Estimators
  - Pooled OLS Estimator
  - Between Estimator
  - Within Estimator
  - First-Differences Estimator
  - Random Effects Estimator
  - Fixed vs. Random Effects
  - Hours & Wages Example
- Panel Data Inference
  - Panel-Robust Inference
  - Bootstrap Standard Errors
  - Hours & Wages Example
- Fixed Effects vs. Random Effects
  - Non-Test Elements of Choice
  - Hausman Test
- Unbalanced Panel Data
Outline: Panel Data Models
Models & Estimators
- Wider range of models and estimators than with cross-section data
- 3 standard models
  - Presented in this section
  - Several estimators presented in the next section
  - Same logic with more sophisticated models
- The different estimators may be applied to the different models
  - With varying results
  - Summarized in a table later in the chapter
General Panel Data Model
- A very general linear model for panel data
  - intercept & slope coefficients vary over both i & t
$$y_{it} = \alpha_{it} + x_{it}'\beta_{it} + u_{it}$$
  - i = 1, ..., N: individual (or firm or country); t = 1, ..., T: time
  - $y_{it}$: scalar dependent variable
  - $x_{it}$: K × 1 vector of independent variables
  - $u_{it}$: scalar disturbance term
- Too general
  - Not estimable: more parameters to estimate than observations
  - Further restrictions needed
    - on the extent to which $\alpha_{it}$ and $\beta_{it}$ vary with i and t
    - on the behavior of the error $u_{it}$
Pooled Model
- The most restrictive model is a pooled model that specifies constant coefficients
$$y_{it} = \alpha + x_{it}'\beta + u_{it} \qquad (1)$$
- If this is correctly specified
  - and regressors are uncorrelated with the error,
  - then it can be consistently estimated as a cross-section
  - That is: simply by OLS
Individual and Time Dummies
- A simple variant of the pooled model (1) has
  - intercepts that vary across individuals and over time
  - constant slopes
$$y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + u_{it} \qquad (2)$$
or
$$y_{it} = \sum_{j=1}^{N} \alpha_j d_{j,it} + \sum_{s=2}^{T} \gamma_s d_{s,it} + x_{it}'\beta + u_{it}$$
where the N individual dummies are $d_{j,it} = 1$ if $i = j$ and $= 0$ otherwise, and the T − 1 time dummies are $d_{s,it} = 1$ if $t = s$ and $= 0$ otherwise
Individual and Time Dummies
- $x_{it}$ does not include an intercept
  - If an intercept is included, then one of the N individual dummies must be dropped
  - Many packages do that
- Focus on short panels, where N → ∞ but T does not
  - Then the time intercepts $\gamma_t$ can be consistently estimated
    - At least in the sense that there is a finite number of them
    - The T − 1 time dummies are simply incorporated into the regressors $x_{it}$
  - But inserting the full set of N individual dummies $d_{j,it}$ would cause problems as N → ∞
    - We cannot consistently estimate an infinite number of parameters
    - Information on each $\alpha_i$ does not increase as N increases
- Challenge: estimating the parameters β while controlling for the N individual intercepts $\alpha_i$
Outline: Panel Data Models / Fixed Effects & Random Effects
Individual-Specific Effects Model
- Individual-specific effects model:
  - each cross-sectional unit has a different intercept term but all slopes are the same
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \qquad (3)$$
where $\varepsilon_{it}$ is iid over i and t
- A more parsimonious way to express (2)
  - Time dummies are included in the regressors $x_{it}$
  - The "standard" linear non-dynamic panel data model
    - no lagged dependent variable $y_{i,t-s}$ in $x_{it}$
  - $\alpha_i$ are random variables
    - They capture unobserved heterogeneity
    - = unobserved time-invariant individual characteristics
    - In effect: a random-parameter model
Reminder: Unobserved Heterogeneity
- The correct model is $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
  - But the estimated model is $Y = \beta_0 + \beta_1 x_1 + \nu$
  - The effect of the missing regressor on Y is absorbed into the error of the estimated model: $\nu = \beta_2 x_2 + \varepsilon$
  - = unobserved heterogeneity: unobserved (individual) factors influence the LHS variable
- If the missing regressor is correlated with an included regressor
  - then ν is correlated with at least one included regressor
  - LS is inconsistent
  - Furthermore, possibly:
    - heteroskedasticity if $\mathrm{var}(x_{2t}) \neq \mathrm{var}(x_{2s}),\ t \neq s$
    - autocorrelation if $\mathrm{corr}(x_{2t}, x_{2s}) \neq 0,\ t \neq s$
Reminder: Unobserved Heterogeneity
[Figure: Same slopes]
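Exogeneity
- Throughout this chapter: assume strong/strict exogeneity
$$E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \ldots, x_{iT}] = 0, \quad t = 1, \ldots, T \qquad (4)$$
- So that $\varepsilon_{it}$ is assumed to have mean zero conditional on past, current, and future values of the regressors
  - Hence zero covariance between $\varepsilon_{it}$ and the regressors of every period
  - Nothing is said about the relation between the random term $\alpha_i$ and $x_{it}$
- Strong exogeneity rules out models with lagged dependent variables or with endogenous variables as regressors (Ch. 2)
  - With $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$ as a regressor, it is often hard to maintain that $E(\varepsilon_{it}\varepsilon_{i,t-1}) = 0$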
Fixed Effects Model
- 2 variants of model (3), according to the hypotheses on $\alpha_i$
  - Both are models with "2" errors, $\alpha_i$ and $\varepsilon_{it}$
  - Error component models
  - Both variants treat $\alpha_i$ as an unobserved random variable
- Variant 1 of model (3): fixed effects (FE) model
  - $\alpha_i$ is potentially correlated with the (time-invariant part of the) observed regressors $x_{it}$
  - A form of unobserved heterogeneity
  - "Fixed" because early treatments viewed $\alpha_i$ as (non-random) parameters to be estimated
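Random Effects Model
- Variant 2 of model (3): random effects (RE) model
  - $\alpha_i$ is distributed independently of $x_{it}$
  - It usually makes the additional assumption that both the random effects $\alpha_i$ and the error term $\varepsilon_{it}$ in (3) are iid:
$$\alpha_i \sim [\alpha, \sigma_\alpha^2], \qquad \varepsilon_{it} \sim [0, \sigma_\varepsilon^2] \qquad (5)$$
- No distribution has been specified in (5): $[\cdot, \cdot]$ only gives the mean and the variance
- $\varepsilon_{it}$ may nonetheless show autocorrelation
  - Often $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0$, $t \neq s$, is allowed
  - While both $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ and $\mathrm{cov}(\alpha_i, \alpha_j) = 0$, $i \neq j$, are assumed
    - Except in spatial models
- $\alpha$ can be treated as the intercept of the model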
Random Effects Model
- Other names for this model:
  - One-way individual-specific effects model
    - Two-way = inclusion of time dummies or time-specific random effects
  - Random intercept model
    - To distinguish the model from more general random effects models, e.g. random slopes
  - Random components model
    - because the error term is $\alpha_i + \varepsilon_{it}$
- The term fixed effect is potentially misleading
  - As said, the effects are in fact random in both models
  - The random effects are "purely" random effects: uncorrelated with the regressors
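Equicorrelated Random Effects Model
- The RE model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - can be viewed as a regression of $y_{it}$ on $x_{it}$
  - with composite error term $u_{it} = \alpha_i + \varepsilon_{it}$
  - The RE hypothesis (5) ($\alpha_i$ and $\varepsilon_{it}$ iid) implies that
$$\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \begin{cases} \sigma_\alpha^2, & t \neq s \\ \sigma_\alpha^2 + \sigma_\varepsilon^2, & t = s \end{cases} \qquad (6)$$
- The RE model thus imposes the constraint that the composite error $u_{it}$ is equicorrelated
  - since $\mathrm{Cor}[u_{it}, u_{is}] = \sigma_\alpha^2 / (\sigma_\alpha^2 + \sigma_\varepsilon^2)$ for $t \neq s$ does not vary with the time difference $t - s$
  - The RE model is also called the equicorrelated model or exchangeable errors model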
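The equicorrelation property can be checked numerically. The sketch below, in plain Python, uses illustrative variance components (σ_α² = 0.5, σ_ε² = 1.5 and T = 4 are assumptions, not estimates), builds the T × T block implied by (6), and confirms that every off-diagonal correlation equals σ_α²/(σ_α² + σ_ε²), whatever the lag t − s.

```python
def re_cov_block(sigma2_alpha, sigma2_eps, T):
    """Cov[u_it, u_is] for u_it = alpha_i + eps_it under (6), as a T x T list of rows."""
    return [[sigma2_alpha + (sigma2_eps if t == s else 0.0) for s in range(T)]
            for t in range(T)]

def corr_from_cov(cov):
    """Convert a covariance matrix into a correlation matrix."""
    n = len(cov)
    return [[cov[t][s] / (cov[t][t] * cov[s][s]) ** 0.5 for s in range(n)]
            for t in range(n)]

sigma2_alpha, sigma2_eps, T = 0.5, 1.5, 4        # illustrative values only
cov = re_cov_block(sigma2_alpha, sigma2_eps, T)
corr = corr_from_cov(cov)
rho = sigma2_alpha / (sigma2_alpha + sigma2_eps)  # 0.5 / 2.0 = 0.25

# Every off-diagonal correlation equals rho, whatever the lag t - s
off_diag = [corr[t][s] for t in range(T) for s in range(T) if t != s]
print(rho, all(abs(c - rho) < 1e-12 for c in off_diag))   # prints: 0.25 True
```

By contrast, an AR(1) error would give correlations that decay with |t − s|, which is why the RE structure is called equicorrelated.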
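Synthesis of Panel Data Models
| Model | Specification | Assumptions |
|---|---|---|
| Pooled model (1) | $y_{it} = \alpha + x_{it}'\beta + u_{it}$ | $u_{it} \sim [0, \sigma_u^2]$ |
| Fixed-effects model (3) | $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ | $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$ |
| Random-effects model (3) & (5) | $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ | $\alpha_i \sim [\alpha, \sigma_\alpha^2]$, $\varepsilon_{it} \sim [0, \sigma_\varepsilon^2]$ |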
Outline: Panel Data Estimators
Panel Data Estimators
- Several commonly used panel data estimators of β
  - In this non-dynamic, no-endogeneity context: LS variants
- They differ in the extent to which cross-section and time-series variation in the data are used
  - Their properties vary according to which model is appropriate
- A regressor $x_{it}$ may be either
  - time-invariant, $x_{it} = x_i$ for t = 1, ..., T,
  - or time-varying
  - For some estimators only the coefficients of time-varying regressors are identified
Outline: Panel Data Estimators / Pooled OLS Estimator
Definition & Properties
- Stack the data over i & t into one long regression with NT observations
- Estimate $y_{it} = \alpha + x_{it}'\beta + u_{it}$ by OLS
- Pooled OLS is consistent (as N → ∞, T fixed) if
  - $\mathrm{Cov}[u_{it}, x_{it}] = 0$ and
  - the pooled model (1) is appropriate, or
  - the RE model is appropriate
- The OLS variance matrix based on iid errors is not appropriate
  - as the errors for a given individual i are almost certainly positively correlated over t
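Variance Matrix
- For a given i we expect correlation in y over time:
  - $\mathrm{Cor}[y_{it}, y_{is}]$ is high
  - Pooled model $y_{it} = \alpha + x_{it}'\beta + u_{it}$
  - Even after inclusion of regressors, $\mathrm{Cor}[u_{it}, u_{is}]$ may remain ≠ 0
  - Call $\mathrm{Cov}[u_{it}, u_{is}] = \sigma_{its}$; when t = s, $\sigma_{itt} = \sigma_{it}^2$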
Panel Block-Diagonal Var-Cov Matrix of the Errors Σ
$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma_{i1}^2 & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma_{i2}^2 & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \mathrm{SYM} & & & \sigma_{iT}^2 \end{pmatrix}$$
Variance Matrix
- The RE model (partly) accommodates this correlation
  - From (6): $\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \sigma_\alpha^2$ for $t \neq s$, and $\sigma_\alpha^2 + \sigma_\varepsilon^2$ for $t = s$
- OLS output treats each of the T years as independent information, but
  - the information content is less than this
  - given the positive error correlation
  - It therefore tends to overstate estimator precision
- Use panel-corrected standard errors when OLS is applied in a panel
  - Many possible corrections, depending on the assumed correlation and heteroskedasticity, and on whether the panel is short or long
FE Model
- Pooled OLS is inconsistent if the true model is the FE model
- Rewrite $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ as
$$y_{it} = \alpha + x_{it}'\beta + (\alpha_i - \alpha + \varepsilon_{it})$$
- Then pooled OLS of $y_{it}$ on $x_{it}$ and an intercept leads to an inconsistent estimator of β if the individual effect $\alpha_i$ is correlated with $x_{it}$
  - since such correlation implies that the combined error term $(\alpha_i - \alpha + \varepsilon_{it})$ is correlated with the regressors
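A small simulation illustrates this inconsistency. The sketch assumes a hypothetical FE data-generating process with one regressor, $x_{it} = \alpha_i + e_{it}$, where $\alpha_i$, $e_{it}$ and $\varepsilon_{it}$ are independent N(0, 1) and β = 1. Under these assumptions the probability limit of pooled OLS is $\beta + \mathrm{Cov}(x_{it}, \alpha_i)/\mathrm{Var}(x_{it}) = 1 + 1/2 = 1.5$ rather than 1.

```python
import random

random.seed(1)
N, T, beta = 2000, 5, 1.0          # hypothetical sample sizes and true slope

x, y = [], []
for i in range(N):
    alpha_i = random.gauss(0, 1)                      # individual effect
    for t in range(T):
        x_it = alpha_i + random.gauss(0, 1)           # regressor correlated with alpha_i (FE model)
        y_it = alpha_i + beta * x_it + random.gauss(0, 1)
        x.append(x_it)
        y.append(y_it)

def pooled_ols_slope(x, y):
    """Slope from OLS of y on x and an intercept, stacking all NT observations."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx

b_pols = pooled_ols_slope(x, y)
print(round(b_pols, 3))            # near 1.5, not near the true beta = 1
```

Increasing N does not help: the estimate settles near 1.5, because the bias comes from the correlation between $\alpha_i$ and $x_{it}$, not from sampling noise.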
Outline: Panel Data Estimators / Between Estimator
Definition
- Pooled OLS uses variation over both time and cross-sectional units to estimate β
- The between estimator uses just the cross-sectional variation
  - Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over all years: $\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i$
    - arithmetic means over time, per individual, e.g. $\bar y_i = T^{-1}\sum_{t=1}^{T} y_{it}$
- Between estimator = OLS estimator from the regression of $\bar y_i$ on an intercept and $\bar x_i$, so implicitly on the between model
$$\bar y_i = \alpha + \bar x_i'\beta + (\alpha_i - \alpha + \bar\varepsilon_i), \quad i = 1, \ldots, N \qquad (7)$$
Properties
- Uses variation between different individuals
  - Is the analogue of cross-section regression
  - Variation within individuals is discarded
- Between is consistent if the regressors $\bar x_i$ are independent of the composite error $(\alpha_i - \alpha + \bar\varepsilon_i)$ in (7)
  - True for the pooled model (1) and the RE model
  - Between is inconsistent for the FE model
    - as $\alpha_i$ is then correlated with $x_{it}$, and hence with $\bar x_i$
- Between is not normally used, as it throws away a lot of information
  - But it is instructive
  - Do not normally use it in applications
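The between estimator is simply OLS on the N individual means. The sketch below, in plain Python with a hypothetical data-generating process (one regressor, β = 1, all shocks N(0, 1)), runs it once on an RE design, where $\alpha_i$ is independent of $x_{it}$, and once on an FE design, where $x_{it} = \alpha_i + e_{it}$. Under the FE design its probability limit is $1 + \mathrm{Cov}(\bar x_i, \alpha_i)/\mathrm{Var}(\bar x_i) = 1 + 1/1.2 \approx 1.83$.

```python
import random

def between_slope(xs, ys):
    """Between estimator: OLS of y_bar_i on an intercept and x_bar_i (one obs per individual)."""
    xb = [sum(r) / len(r) for r in xs]
    yb = [sum(r) / len(r) for r in ys]
    n = len(xb)
    mx, my = sum(xb) / n, sum(yb) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xb, yb))
    sxx = sum((a - mx) ** 2 for a in xb)
    return sxy / sxx

def simulate(N, T, beta, fe, seed):
    """Hypothetical DGP: alpha_i enters x_it only if fe=True (FE model), else RE model."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(N):
        a = rng.gauss(0, 1)
        xi = [(a if fe else 0.0) + rng.gauss(0, 1) for _ in range(T)]
        xs.append(xi)
        ys.append([a + beta * v + rng.gauss(0, 1) for v in xi])
    return xs, ys

b_re = between_slope(*simulate(5000, 5, 1.0, fe=False, seed=2))   # consistent: near 1
b_fe = between_slope(*simulate(5000, 5, 1.0, fe=True, seed=2))    # inconsistent: near 1.83
print(round(b_re, 3), round(b_fe, 3))
```

Only N = 5000 averages enter each regression, so the within-individual variation of the 25 000 observations is indeed discarded.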
Outline: Panel Data Estimators / Within Estimator
Within Model
- Principle: individual-specific deviations of the dependent variable from its time-averaged value
  - are explained by
  - individual-specific deviations of the regressors from their time-averaged values
- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over time: $\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i$
  - Subtract: the $\alpha_i$ terms cancel, giving the within model
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \quad i = 1, \ldots, N,\ t = 1, \ldots, T \qquad (8)$$
Within / Fixed Effects Estimator
- Within estimator = OLS estimator of $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$
  - Consistent for β in the FE model
- Called the fixed effects estimator by analogy with the FE model
  - This does not imply that the $\alpha_i$ are fixed
- Each i must be observed at least twice in the sample
  - Else $x_{it} - \bar x_i = 0$
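The within transformation and the subsequent OLS fit can be sketched in a few lines of plain Python for a single regressor. The data-generating process is hypothetical (FE design with $x_{it} = \alpha_i + e_{it}$, β = 1); individuals observed only once are skipped, since their demeaned values are all zero.

```python
import random

def within_slope(xs, ys):
    """Within (FE) estimator: OLS of (y_it - y_bar_i) on (x_it - x_bar_i), no intercept."""
    sxy = sxx = 0.0
    for xi, yi in zip(xs, ys):
        if len(xi) < 2:
            continue                      # a singleton has x_it - x_bar_i = 0: no information
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xi, yi))
        sxx += sum((a - mx) ** 2 for a in xi)
    return sxy / sxx

rng = random.Random(3)
N, T, beta = 2000, 5, 1.0
xs, ys = [], []
for _ in range(N):
    a = rng.gauss(0, 1)                                # alpha_i
    xi = [a + rng.gauss(0, 1) for _ in range(T)]       # correlated with alpha_i: FE model
    xs.append(xi)
    ys.append([a + beta * v + rng.gauss(0, 1) for v in xi])

b_within = within_slope(xs, ys)
print(round(b_within, 3))                              # near the true beta = 1
```

On the same kind of data for which pooled OLS converges to 1.5, the within estimate stays close to 1, since $\alpha_i$ has been differenced out.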
Consistency of Fixed Effects Estimator
- FE treats the $\alpha_i$ as nuisance parameters
  - They can be ignored when interest lies in β
  - They do not need to be consistently estimated to obtain consistent estimates of the slope parameters β
  - This result need not carry over to nonlinear FE models
- Consistency further requires $E(\varepsilon_{it} - \bar\varepsilon_i \mid x_{it} - \bar x_i) = 0$ in the within model $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$
  - Because of the averages, this requires more than $E(\varepsilon_{it} \mid x_{it}) = 0$
  - It requires the strict exogeneity assumption (4): $E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \ldots, x_{iT}] = 0,\ t = 1, \ldots, T$
Fixed Effects Estimates
- If the fixed effects $\alpha_i$ are of interest, they can also be estimated as $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$
  - an unbiased estimator of $\alpha_i$
  - In short (small T) panels the $\hat\alpha_i$ are always inconsistent, because information on them never accumulates
  - Their distribution, or their variation with a key variable, may nonetheless be informative
- If N is not too large, an alternative way to compute Within is Least-Squares Dummy Variable (LSDV) estimation
  - Directly estimate $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ by OLS of $y_{it}$ on $x_{it}$ and N individual dummy variables
  - This yields the Within estimator for β, along with estimates of the N fixed effects
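The equivalence between Within and LSDV can be checked without building the N dummies explicitly. The sketch below, in plain Python with a hypothetical small panel (N = 50, T = 4, β = 0.8), computes $\hat\beta$ by Within and $\hat\alpha_i = \bar y_i - \bar x_i\hat\beta$, then verifies that these values satisfy the LSDV normal equations: residuals orthogonal to every individual dummy and to $x_{it}$. With a full-rank design those equations have a unique solution, so this confirms that LSDV returns the same estimates.

```python
import random

rng = random.Random(4)
N, T, beta = 50, 4, 0.8              # small N: LSDV with N dummies remains feasible
alphas = [rng.gauss(0, 1) for _ in range(N)]
xs = [[alphas[i] + rng.gauss(0, 1) for _ in range(T)] for i in range(N)]
ys = [[alphas[i] + beta * v + rng.gauss(0, 1) for v in xs[i]] for i in range(N)]

# Within estimate of beta
mx = [sum(r) / T for r in xs]
my = [sum(r) / T for r in ys]
sxy = sum((xs[i][t] - mx[i]) * (ys[i][t] - my[i]) for i in range(N) for t in range(T))
sxx = sum((xs[i][t] - mx[i]) ** 2 for i in range(N) for t in range(T))
b_hat = sxy / sxx

# Fixed-effect estimates: alpha_hat_i = y_bar_i - x_bar_i * beta_hat
a_hat = [my[i] - mx[i] * b_hat for i in range(N)]

# LSDV normal equations: residuals orthogonal to each dummy d_j and to x
resid = [[ys[i][t] - a_hat[i] - b_hat * xs[i][t] for t in range(T)] for i in range(N)]
dummy_eqs = [sum(r) for r in resid]              # one equation per individual dummy
x_eq = sum(xs[i][t] * resid[i][t] for i in range(N) for t in range(T))
print(max(abs(e) for e in dummy_eqs) < 1e-9, abs(x_eq) < 1e-9)   # prints: True True
```

With T = 4 each $\hat\alpha_i$ rests on four observations only, which is the practical meaning of the inconsistency of the fixed-effect estimates in short panels.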
Time-Invariant Regressors
- Major limitation of Within
  - The coefficients of time-invariant regressors are not identified
  - Since if $x_{it} = x_i$, then $\bar x_i = x_i$, so $(x_{it} - \bar x_i) = 0$
- Many studies seek to estimate the effect of time-invariant regressors
  - For example, in panel wage regressions: the effect of gender or race
- For this reason many practitioners prefer not to use the within estimator
- Pooled OLS or RE estimators permit estimation of coefficients of time-invariant regressors
  - but are inconsistent if the FE model is the correct model
Outline: Panel Data Estimators / First-Differences Estimator
First-Differences Model
- Principle: individual-specific one-period changes in the dependent variable
  - are explained by
  - individual-specific one-period changes in the regressors
- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Lag one period: $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$
  - Subtract: the first-differences model
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \quad i = 1, \ldots, N,\ t = 2, \ldots, T \qquad (9)$$
First-Differences Estimator
- The first-differences estimator is OLS in the first-differences model (9)
- It gives consistent estimates of β in the FE model
  - The coefficients of time-invariant regressors are not identified
- First-differences is less efficient than the within estimator
  - if $\varepsilon_{it}$ is iid (for T > 2)
- However, it may safeguard against I(1) variables
  - which would otherwise lead to inconsistency
  - See time-series
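The first-differences estimator can be sketched in plain Python for one regressor, following (9) without an intercept. The data-generating process is hypothetical (FE design, β = 1). The sketch also checks a property used later in this chapter: with T = 2 the first-differences and within estimates coincide, because the within deviations are then $\pm\Delta x_{i2}/2$ and $\pm\Delta y_{i2}/2$.

```python
import random

def fd_slope(xs, ys):
    """First-differences estimator (9): OLS of dy_it on dx_it, t = 2..T, no intercept."""
    sxy = sxx = 0.0
    for xi, yi in zip(xs, ys):
        for t in range(1, len(xi)):
            dx, dy = xi[t] - xi[t - 1], yi[t] - yi[t - 1]
            sxy += dx * dy
            sxx += dx * dx
    return sxy / sxx

def within_slope(xs, ys):
    """Within estimator (8), for comparison."""
    sxy = sxx = 0.0
    for xi, yi in zip(xs, ys):
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xi, yi))
        sxx += sum((a - mx) ** 2 for a in xi)
    return sxy / sxx

def simulate_fe(N, T, beta, seed):
    """Hypothetical FE-model DGP: alpha_i is correlated with x_it."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(N):
        a = rng.gauss(0, 1)
        xi = [a + rng.gauss(0, 1) for _ in range(T)]
        xs.append(xi)
        ys.append([a + beta * v + rng.gauss(0, 1) for v in xi])
    return xs, ys

xs, ys = simulate_fe(2000, 5, 1.0, seed=5)
b_fd = fd_slope(xs, ys)                       # close to the true beta = 1

# With T = 2 the two estimators coincide, up to rounding
xs2, ys2 = simulate_fe(500, 2, 1.0, seed=6)
print(round(b_fd, 3), abs(fd_slope(xs2, ys2) - within_slope(xs2, ys2)) < 1e-10)
```

Each individual contributes T − 1 differences, so with T = 5 one period of information per individual is used up by the differencing.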
Outline: Panel Data Estimators / Random Effects Estimator
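Random Effects Model
- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Assume the RE model with iid $\alpha_i$ and $\varepsilon_{it}$, as in RE hypothesis (5): $\alpha_i \sim [\alpha, \sigma_\alpha^2]$, $\varepsilon_{it} \sim [0, \sigma_\varepsilon^2]$
- Pooled OLS is consistent
  - But pooled GLS will be more efficient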
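Reminder: GLS in a cross-section
- When all the hypotheses of the linear model are satisfied but the error covariance matrix Σ is not proportional to the identity, then
  - OLS is consistent
  - but it is not efficient if we know Σ
- Let the classical linear (cross-section) model be $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - Let $P'P = \Sigma^{-1}$
    - a nonunique decomposition (e.g. Cholesky) of the real symmetric positive-definite matrix $\Sigma^{-1}$
  - Premultiply the linear model by P: $Py = PX\beta + P\varepsilon$
  - i.e. $y^* = X^*\beta + \varepsilon^*$
  - Then $\mathrm{Var}(\varepsilon^*) = E(P\varepsilon\varepsilon'P') = P\,E(\varepsilon\varepsilon')\,P'$
  - $= P\Sigma P' = P(P'P)^{-1}P' = PP^{-1}(P')^{-1}P' = I$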
Reminder: GLS in a cross-section
- So the transformed model has spherical disturbances
  - Applying OLS to the transformed data gives an efficient estimator
  - That is GLS
- Since Σ is unknown in practice, we need an estimate
  - Any consistent estimate $\hat\Sigma$ of Σ yields the feasible (consistent) GLS estimator
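Panel Block-Diagonal Var-Cov Matrix of the Errors Σ
$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{pmatrix} = \sigma_\varepsilon^2 I_T + \sigma_\alpha^2 \iota_T\iota_T'$$
where $\iota_T$ is a T × 1 vector of ones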
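Random Effects Estimator
- The feasible GLS estimator of the RE model can be calculated from OLS estimation of the transformed model:
$$y_{it} - \hat\theta\bar y_i = (1 - \hat\theta)\mu + (x_{it} - \hat\theta\bar x_i)'\beta + \nu_{it} \qquad (10)$$
where $\nu_{it} = (1 - \hat\theta)\alpha_i + (\varepsilon_{it} - \hat\theta\bar\varepsilon_i)$ is asymptotically iid, and
  - $\hat\theta$ is consistent for
$$\theta = 1 - \sqrt{\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_\alpha^2}} \qquad (11)$$
- Called the RE estimator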
Random Effects Estimator
- The nonrandom scalar intercept μ is added to normalize the random effects $\alpha_i$ to have zero mean
  - as in the RE hypothesis
- Cameron & Trivedi provide a derivation of (10) and ways to estimate $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, and hence θ
  - Not detailed here
- Note
  - $\hat\theta = 0$ corresponds to pooled OLS
  - $\hat\theta = 1$ corresponds to within estimation
  - $\hat\theta \to 1$ as $T \to \infty$ (see formula (11))
- This is a two-step estimator of β
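The role of θ in (10) and (11) can be checked with known variance components. The sketch below, in plain Python, assumes illustrative values σ_ε² = 1, σ_α² = 0.5 and T = 4 (not estimates). It computes θ from (11), applies the quasi-demeaning $y_{it} - \theta\bar y_i$ as the matrix $P = I_T - (\theta/T)\iota_T\iota_T'$, and verifies that the transformed equicorrelated errors have covariance $\sigma_\varepsilon^2 I_T$, i.e. are spherical, which is why OLS on (10) is GLS. It also shows the limiting cases listed above.

```python
def theta(sigma2_eps, sigma2_alpha, T):
    """Quasi-demeaning parameter of the RE estimator, formula (11)."""
    return 1.0 - (sigma2_eps / (sigma2_eps + T * sigma2_alpha)) ** 0.5

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

s2e, s2a, T = 1.0, 0.5, 4            # illustrative variance components, not estimates
th = theta(s2e, s2a, T)               # 1 - sqrt(1/3), about 0.4226

# Sigma_i under (6), and the quasi-demeaning matrix P = I - (theta / T) * ones(T, T)
Sigma = [[s2a + (s2e if t == s else 0.0) for s in range(T)] for t in range(T)]
P = [[(1.0 if t == s else 0.0) - th / T for s in range(T)] for t in range(T)]
V = matmul(matmul(P, Sigma), transpose(P))    # covariance of the transformed errors

# The transformed errors are homoskedastic and uncorrelated: V = sigma2_eps * I
spherical = all(abs(V[t][s] - (s2e if t == s else 0.0)) < 1e-12
                for t in range(T) for s in range(T))

# theta = 0 when sigma2_alpha = 0 (pooled OLS); theta -> 1 as T grows (within)
path = [round(theta(s2e, s2a, n), 3) for n in (2, 10, 100, 10_000)]
print(round(th, 4), spherical, theta(s2e, 0.0, T), path)
```

In practice σ_α² and σ_ε² are replaced by first-step estimates, which is what makes the RE estimator a two-step (feasible GLS) procedure.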
Random Effects Estimator Properties
- The RE estimator is
  - fully efficient under the RE model
    - The efficiency gain compared to pooled OLS (applied to the RE model) need not be great
  - possibly still inefficient if the equicorrelation hypothesis does not hold
    - in particular, under AR(1) error processes
  - inconsistent if the FE model is correct, since then $\alpha_i$ is correlated with $x_{it}$
RE Discussion
- Most disciplines in applied statistics other than microeconometrics treat any unobserved individual heterogeneity as being distributed independently of the regressors
  - Then the effects are random effects
  - or rather: purely random effects
- Compared to FE models, this stronger assumption has the advantage of permitting consistent estimation of all parameters
  - including the coefficients of time-invariant regressors
  - However, RE and pooled OLS are inconsistent if the true model is FE
- Economists often view the assumptions of the RE model as being unsupported by the data
Outline: Panel Data Estimators / Fixed vs. Random Effects
Identification of the Individual-Specific Effects
- In $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ the individual effect is a random variable (random coefficient) in both the fixed and the random effects models
  - Both models assume that $E[y_{it} \mid \alpha_i, x_{it}] = \alpha_i + x_{it}'\beta$
  - $\alpha_i$ is unknown and cannot be consistently estimated
    - unless $T \to \infty$
  - So we cannot estimate $E[y_{it} \mid \alpha_i, x_{it}]$
    - contrary to what we usually do with OLS
  - That is reasonable, as $\alpha_i$ includes unobserved individual characteristics
    - possibly with a non-zero mean
- But take the expectation conditional on $x_{it}$ only:
$$E[y_{it} \mid x_{it}] = E[\alpha_i \mid x_{it}] + x_{it}'\beta$$
  - That is: what is the (conditional) expected value of $\alpha_i$?
  - FE and RE have different takes on this expectation
- RE: it is assumed that $\alpha_i \mid x_{it} \sim [\alpha, \sigma_\alpha^2]$, so $E[y_{it} \mid x_{it}] = \alpha + x_{it}'\beta$
  - Hence $E[y_{it} \mid x_{it}]$ is identified
    - since we consistently estimate a single intercept as $NT \to \infty$
  - But the key RE assumption that $E[\alpha_i \mid x_{it}]$ is constant across i might not hold in many microeconometrics applications
- FE: $E[\alpha_i \mid x_{it}]$ varies with $x_{it}$, and it is not known how it varies
  - So we cannot identify $E[y_{it} \mid x_{it}]$
  - Nonetheless the Within & First-Diff estimators consistently estimate β with short panels
    - Thus they identify the marginal effect $\beta = \partial E[y_{it} \mid \alpha_i, x_{it}] / \partial x_{it}$
    - e.g. the effect on earnings of 1 additional year of schooling
    - But only for time-varying regressors
      - so the marginal effect of race or gender, for example, is not identified
    - And not the expected individual $y_{it}$, as we do not know the individual effect $\alpha_i$
Random Effects vs. Fixed Effects
- The two models have different focuses
- RE
  - Time-series structure
  - Efficiency
- FE
  - Endogeneity of unobserved heterogeneity
  - Consistency
Summary Models & Estimators
Table: Linear Panel Model: Common Estimators and Models
| Estimator of β | Pooled model (1) | Random effects model (3) & (5) | Fixed effects model (3) |
|---|---|---|---|
| Pooled OLS (1) | Consistent | Consistent | Inconsistent |
| Between (7) | Consistent | Consistent | Inconsistent |
| Within (Fixed Effects) (8) | Consistent | Consistent | Consistent |
| First Differences (9) | Consistent | Consistent | Consistent |
| Random Effects (10) | Consistent | Consistent | Inconsistent |
This table considers only the consistency of estimators of β. For correct computation of standard errors, see the next section.
The only fully efficient estimator is RE under the RE model.
Outline: Panel Data Estimators / Hours & Wages Example
Effect of Wage on Labor Supply
- Labor economics: responsiveness of labor supply to wages
- The standard textbook model of labor supply suggests that, for people already working, the effect of a wage increase on labor supply is ambiguous
  - An income effect pushing in the direction of less work offsets a (leisure-)substitution effect in the direction of more work
Cross-Section & Panel
- Cross-section analysis for adult males finds a relatively small positive response of hours worked to wages
  - However, this association may be spurious
  - reflecting a greater unobserved desire to work being positively associated with higher wages
    - e.g. those who like to work get better/faster promotion (or similar): that is $\alpha_i$
- Panel data analysis can control for this
  - under the assumption that the unobserved desire to work is time-invariant
  - e.g. Within: measures the extent to which an individual works above-average hours in periods with above-average wages
Data
- Data on 532 males for each of the 10 years from 1979 to 1988
  - File provided on Cameron & Trivedi's website
  - mom9.dta file on the course page
  - Balanced panel: don't do that in your report
- From the Panel Study of Income Dynamics (PSID)
  - Used by Ziliak (1997)
- 5 320 observations; sample means of lnhrs and lnwg: respectively 7.66 & 2.61
  - Geometric means of 2 120 hours and $13.60 per hour (geometric means, since these are means of logs)
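The conversion from mean logs to geometric means is one exponentiation each, since $\exp\left(\frac{1}{n}\sum \ln z\right) = \left(\prod z\right)^{1/n}$. A minimal Python check using the two sample means reported above:

```python
import math

# Sample means of the logs reported above
mean_lnhrs, mean_lnwg = 7.66, 2.61

# exp(mean of logs) = geometric mean of the original variable
geo_hours = math.exp(mean_lnhrs)
geo_wage = math.exp(mean_lnwg)
print(round(geo_hours), round(geo_wage, 2))
```

This gives about 2 122 hours and $13.60; the small gap to 2 120 hours comes from the mean log being rounded to two decimals.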
Panel Study of Income Dynamics
- Begun in 1968, the PSID is a longitudinal study of a representative sample of U.S. individuals (men, women, and children)
  - and of the family units in which they reside
  - Dynamic aspects of economic and demographic behavior
- Low attrition rates & success in following young adults as they form their own families, and recontact efforts (of those declining an interview in prior years)
  - Sample size has grown from 4,800 families in 1968 to more than 7,000 families in 2001
- By the conclusion of the 2003 data collection, the PSID had collected information about > 65 000 individuals, spanning as much as 36 years of their lives
Model
$$\ln hrs_{it} = \alpha_i + \beta \ln wg_{it} + \varepsilon_{it}$$
- lnhrs: natural logarithm of annual hours worked
- Single explanatory variable: lnwg, the natural log of the hourly wage
- Individual-specific effect $\alpha_i$
  - Unobserved individual time-invariant characteristics
  - e.g. education, abilities
- $\beta$ measures the wage elasticity of labor supply
- $\varepsilon_{it}$ is assumed to be independent over i, but may be correlated over t for given i
Model
- Ziliak (1997) additionally included age², # of children, an indicator for bad health & year dummies
  - This makes a small difference to the estimate of $\beta$ and its standard error
  - For simplicity, these regressors are omitted here
- Ch. 2: more general models
  - Endogenous lnwg
    - FE: $\alpha_i$ correlated with $\ln wg_{it}$
    - Endogeneity: $\varepsilon_{it}$ correlated with $\ln wg_{it}$
  - Lags of lnhrs as regressor
  - If you work more, you will earn a higher hourly wage
Stata
- Load the data
  - .dta: double-click
  - limited import capacity: .csv
- Declare the dataset to be panel
  - Menu: longitudinal / panel data
  - ID = i
  - Time = t
- Menu
  - longitudinal / panel data → linear models → linear regression
  - or linear models for OLS
Obtaining the Results in Stata 1/2
- Pooled OLS: $\hat\alpha$, $\hat\beta$ and other statistics directly in the output
- Between model (OLS regression on the per-individual averages): obtained similarly to POLS
- Within model (= fixed effects)
  - Individual $\hat\alpha_i$ estimates can be recovered after estimation (not consistent)
- Stata presents an intercept in the FE estimates
  - Rewriting model (3) $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ as $y_{it} = \alpha + \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - leads to perfect multicollinearity
  - We need to normalize
  - In theory, we chose α = 0 for simplicity
  - Instead Stata has chosen $\sum_i \alpha_i = 0$, because of the analogous assumption in RE: $E(\alpha_i) = 0$
  - In all cases, this has no bearing on the estimates of β
Obtaining the Results in Stata 2/2
- The First-Differences estimator is not readily available in Stata
  - In my version at least
  - Define the first differences first, then apply POLS
  - Lag 1 period in Stata: bysort i (t): gen lnwgL1 = lnwg[_n-1] (_n indexes observations; bysort i (t) sorts by t within each individual i, so the lag is taken within individual)
  - Then gen lnwgD1 = lnwg - lnwgL1 for the 1st difference; proceed in the same way for lnhrs
- RE: 2 versions
  - GLS (OLS estimation of the transformed model, as seen in the section on estimators)
  - ML (not detailed here)
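Linear Panel Data Estimates
| | POLS | Between | Within | First Diff | RE-GLS | RE-MLE |
|---|---|---|---|---|---|---|
| $\alpha$ | 7.44 | 7.48 | 7.22 | .001 | 7.35 | 7.35 |
| $\beta$ | .083 | .067 | .168 | .109 | .119 | .120 |
| $\theta$ | .000 | – | .624 | – | .585 | .586 |
| N | 5320 | 532 | 5320 | 4788 | 5320 | 5320 |
- $\theta$ is the one from the RE estimator (10): $y_{it} - \hat\theta\bar y_i = (1 - \hat\theta)\mu + (x_{it} - \hat\theta\bar x_i)'\beta + \nu_{it}$
  - It can be inferred for some of the other estimators as well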
Slope Parameter Estimates
- The estimates of the slope parameter β differ across the estimation methods
- The between estimate (0.067), which uses only cross-section variation, is less than the pooled OLS estimate
- The within (= fixed effects) estimate of 0.168 is much higher than the pooled OLS estimate of 0.083
- The first-differences estimate of 0.109 is also higher than that of pooled OLS
  - but is considerably less than the within estimate
- The RE estimates of 0.119 or 0.120 lie between the between and within estimates
  - This is expected, as RE estimates can be shown to be a weighted average of the between and within estimates
  - The two RE estimates are very close to each other
Which estimates are preferred?
- The within and first-difference estimators are consistent under all models (pooled, RE, and FE)
  - The other estimators are inconsistent under the FE model
- The most robust estimates are therefore the within or first-differences estimates of 0.168 or 0.109
- Efficiency loss in using these more robust estimators: next section
- Hausman test (after the next section): whether or not the FE model is appropriate
  - It turns out that the Hausman test rejects the null hypothesis of RE
  - That seems natural, given the large differences between the coefficient estimates
First-difference vs. Fixed-effects
- Both are consistent under all models (pooled, RE, and FE)
- If T = 2 they are identical
- If $u_{it}$ has no serial correlation, FE is in principle better
  - because it does not throw away one period of data
- If $u_{it}$ is a random walk, FD is in principle better
  - because differencing transforms the error into an integrated-of-order-0 (stationary) series
- If there is correlation between $x_{it}$ and $u_{it}$ (endogeneity)
  - both FE and FD become inconsistent
- Testing is complicated
  - More details require introducing time-series issues
Outline: Panel Data Inference / Panel-Robust Inference
Panel-Robust Statistical Inference
- The various panel models include error terms: $u_{it}$, $\varepsilon_{it}$, $\alpha_i$
- In many microeconometrics applications:
  - it is reasonable to assume independence over i
- The errors are potentially
  1. serially correlated (correlated over t for given i)
  2. heteroskedastic (at least across i)
- Valid statistical inference requires controlling for both of these factors
- e.g. an RE error structure with heteroskedasticity across i:
$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 \end{pmatrix}$$
- The White heteroskedasticity-consistent estimator can be extended to short panels
  - since for the i-th individual the error variance matrix $\Sigma_i$ is of finite dimension T × T, while N → ∞
Reminder: The White heteroskedastic-consistent estimator
- Classical linear model $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - OLS is unbiased and consistent
  - $\mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2(X'X)^{-1}$
- For pure heteroskedasticity, White (1980) shows that
$$S = \frac{1}{N}\sum_{i=1}^{N} \hat\varepsilon_i^2 x_i x_i'$$
  - where $\hat\varepsilon_i$ is the OLS residual
  - is a consistent estimate of $\frac{1}{N}X'\Sigma X$ under general conditions
- The formula can be extended to autocorrelation
  - But autocorrelation often reveals time-series properties
  - that need to be investigated in more detail
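For a single regressor with an intercept, the White sandwich for the slope reduces to a scalar formula, because by Frisch-Waugh-Lovell the slope is OLS of y on $x_i - \bar x$. The sketch below, in plain Python, assumes a hypothetical cross-section whose error standard deviation is proportional to |x| (intercept 1, slope 0.5), and compares the classical variance $s^2/\sum(x_i - \bar x)^2$ with the White (HC0, no small-sample correction) variance $\sum(x_i - \bar x)^2\hat\varepsilon_i^2 / \left(\sum(x_i - \bar x)^2\right)^2$.

```python
import random

rng = random.Random(5)
n, beta0, beta1 = 2000, 1.0, 0.5
x = [rng.gauss(0, 1) for _ in range(n)]
# Heteroskedastic errors: standard deviation proportional to |x| (hypothetical DGP)
y = [beta0 + beta1 * xi + 2.0 * abs(xi) * rng.gauss(0, 1) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
xt = [xi - xbar for xi in x]                      # x residualized on the constant (FWL)
sxx = sum(v * v for v in xt)
b1 = sum(v * (yi - ybar) for v, yi in zip(xt, y)) / sxx
b0 = ybar - b1 * xbar
e = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals

# Classical variance of the slope, assuming homoskedastic errors
s2 = sum(ei * ei for ei in e) / (n - 2)
var_classical = s2 / sxx
# White (HC0) sandwich for the slope
var_white = sum(v * v * ei * ei for v, ei in zip(xt, e)) / sxx ** 2
print(round(var_white / var_classical, 2))        # well above 1 with this DGP
```

With this particular heteroskedasticity the classical formula understates the slope variance roughly threefold, because large errors coincide with large values of $(x_i - \bar x)^2$.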
Panel-Robust Statistical Inference
- Panel-robust standard errors can thus be obtained
  - without assuming specific functional forms for within-individual error correlation or heteroskedasticity
- So we use inefficient estimators
  - but at least we get their variance right
  - Only the RE estimator in the RE model is efficient
  - More efficient estimators using GMM: Ch. 2
- The panel commands in many computer packages calculate default standard errors assuming iid errors
  - leading to erroneous inference
  - Ignoring this can lead to underestimated standard errors and over-estimated t-statistics
- FE or RE tend to reduce the serial correlation in errors, but do not eliminate it
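Derivation of the White heteroskedastic-consistent estimator
- Rewrite the panel estimators as OLS estimation of θ in
$$\tilde y_{it} = \tilde w_{it}'\theta + \tilde u_{it} \qquad (12)$$
  - $\tilde y_{it}$ is a known function of only $y_{i1}, \ldots, y_{iT}$; similarly, $\tilde w_{it}$ is a function of $w_{it}' = [1 \;\; x_{it}']$ and $\tilde u_{it}$ of $u_{it}$
  - Here θ collects the coefficients to be estimated (not the RE parameter of (11))
  - Pooled OLS: no transformation, $\theta = [\alpha \;\; \beta']'$
  - Within: $\tilde y_{it} = y_{it} - \bar y_i$, $\tilde w_{it} = x_{it} - \bar x_i$, only time-varying regressors
    - θ: coefficients of the time-varying regressors
  - ...
- !! Such transformations will induce serial correlation even if the underlying errors are uncorrelated !!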
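As a worked illustration of this warning, assume (for this example only) that the untransformed errors $\varepsilon_{it}$ are iid over t with variance $\sigma^2$. Using $\mathrm{Cov}(\varepsilon_{it}, \bar\varepsilon_i) = \mathrm{Var}(\bar\varepsilon_i) = \sigma^2/T$, the within and first-difference transformations give:

```latex
\begin{aligned}
\mathrm{Var}(\varepsilon_{it}-\bar\varepsilon_i)
  &= \sigma^2 - \tfrac{2\sigma^2}{T} + \tfrac{\sigma^2}{T} = \sigma^2\left(1-\tfrac{1}{T}\right) \\
\mathrm{Cov}(\varepsilon_{it}-\bar\varepsilon_i,\ \varepsilon_{is}-\bar\varepsilon_i)
  &= 0 - \tfrac{\sigma^2}{T} - \tfrac{\sigma^2}{T} + \tfrac{\sigma^2}{T} = -\tfrac{\sigma^2}{T}, \qquad t \neq s \\
\mathrm{Cor}(\varepsilon_{it}-\bar\varepsilon_i,\ \varepsilon_{is}-\bar\varepsilon_i)
  &= \frac{-\sigma^2/T}{\sigma^2 (T-1)/T} = -\frac{1}{T-1} \\
\mathrm{Cor}(\Delta\varepsilon_{it},\ \Delta\varepsilon_{i,t-1})
  &= \frac{\mathrm{Cov}(\varepsilon_{it}-\varepsilon_{i,t-1},\ \varepsilon_{i,t-1}-\varepsilon_{i,t-2})}{2\sigma^2}
   = \frac{-\sigma^2}{2\sigma^2} = -\frac{1}{2}
\end{aligned}
```

So with T = 10, as in the hours and wages data, the within-transformed errors are correlated at -1/9 within each individual, and first-differenced errors at -0.5 between adjacent periods, even though the original errors are uncorrelated.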
Notation
- Stack observations over time periods for a given individual:
  - $\vec y_i = \vec W_i\theta + \vec u_i$, where
  - $\vec y_i$: T × 1
    - for the first-differences model, (T − 1) × 1
  - $\vec W_i$: T × q
- OLS estimator
$$\hat\theta_{OLS} = \left[\sum_{i=1}^{N} \vec W_i'\vec W_i\right]^{-1}\sum_{i=1}^{N} \vec W_i'\vec y_i$$
OLS Variance
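- Asymptotic variance of $\hat\theta_{OLS}$:
$$V[\hat\theta_{OLS}] = \left[\sum_{i=1}^{N} \vec W_i'\vec W_i\right]^{-1}\left(\sum_{i=1}^{N} \vec W_i' E[\vec u_i\vec u_i' \mid \vec W_i]\vec W_i\right)\left[\sum_{i=1}^{N} \vec W_i'\vec W_i\right]^{-1}$$
  - = the variance of the OLS estimates of the stacked model
  - We need a consistent estimate of it to perform classical inference, e.g. t-tests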
Panel-Robust “Sandwich” Variance
- Consistent estimation of $V[\hat\theta_{OLS}]$ in this panel setting
  - is analogous to the cross-section problem of obtaining a consistent estimate of $V[\hat\theta_{OLS}]$ that is robust to heteroskedasticity of unknown form
  - Complication: the vector $\vec u_i$ rather than a scalar $u_i$
- Panel-robust estimate of $V[\hat\theta_{OLS}]$
  - controlling for both serial correlation and heteroskedasticity
$$\hat V[\hat\theta_{OLS}] = \left[\sum_{i=1}^{N} \vec W_i'\vec W_i\right]^{-1}\left(\sum_{i=1}^{N} \vec W_i'\hat{\vec u}_i\hat{\vec u}_i'\vec W_i\right)\left[\sum_{i=1}^{N} \vec W_i'\vec W_i\right]^{-1} \qquad (13)$$
where $\hat{\vec u}_i = \vec y_i - \vec W_i\hat\theta$
Panel-Robust “Sandwich” Variance
- Estimator (13) assumes independence over i and N → ∞
  - but permits $V[u_{it}]$ and $\mathrm{Cov}[u_{it}, u_{is}]$ to vary with i, t, and s
  - This is the case for short panels
- Panel-robust standard errors based on (13) can be computed by use of a regular OLS command
  - if the command has a cluster-robust standard error option
  - as in Stata: cluster on the individual i
- Common error: estimating OLS of $\tilde y_{it} = \tilde w_{it}'\theta + \tilde u_{it}$ using the standard robust s.e. option
  - It only adjusts for heteroskedasticity
  - In practice, in a panel it is more important to correct for serial correlation
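For one regressor plus an intercept, the slope element of (13) reduces to $\sum_i \left(\sum_t \tilde x_{it}\hat u_{it}\right)^2 / \left(\sum_{i,t}\tilde x_{it}^2\right)^2$, with $\tilde x_{it} = x_{it} - \bar x$ (Frisch-Waugh-Lovell). The sketch below, in plain Python, applies it to pooled OLS on a hypothetical RE data-generating process in which both the regressor and the error contain a persistent individual component, and compares it with the default iid variance. It omits the small-sample correction factor that packages such as Stata typically apply to cluster-robust variances.

```python
import random

rng = random.Random(6)
N, T, beta = 500, 10, 1.0
ids, x, y = [], [], []
for i in range(N):
    c_i = rng.gauss(0, 1)      # persistent component of the regressor
    a_i = rng.gauss(0, 1)      # individual effect, independent of x (RE model)
    for t in range(T):
        x_it = c_i + rng.gauss(0, 1)
        ids.append(i)
        x.append(x_it)
        y.append(a_i + beta * x_it + rng.gauss(0, 1))

# Pooled OLS with an intercept; slope via FWL (x residualized on the constant)
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
xt = [v - xbar for v in x]
sxx = sum(v * v for v in xt)
b = sum(v * (w - ybar) for v, w in zip(xt, y)) / sxx
a = ybar - b * xbar
u = [w - a - b * v for v, w in zip(x, y)]

# Default variance assuming iid errors
var_iid = sum(e * e for e in u) / (n - 2) / sxx

# Panel-robust (13): square the within-individual sums of xt_it * u_it, then add up
score = [0.0] * N
for i, v, e in zip(ids, xt, u):
    score[i] += v * e
var_cluster = sum(s * s for s in score) / sxx ** 2

print(round(var_iid ** 0.5, 4), round(var_cluster ** 0.5, 4))
```

Because positive serial correlation in both $x_{it}$ and $u_{it}$ makes the T observations of an individual far from independent, the panel-robust standard error here is clearly larger than the default one.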
Outline: Panel Data Inference / Bootstrap Standard Errors
Reminder: The Bootstrap
- Bootstrap hypothesis: if we could resample the population under the same conditions, we would observe something similar to a resampling with replacement of the observed sample
  - "Mediocrity principle"
  - Not the same as representativeness, as our sample might not be representative
- Principle
  - Resample the current sample of size n with replacement
    - make n draws, each observation having probability 1/n
    - called the "pairs bootstrap", as both y and X are resampled
  - Replicate that process B times: B different pseudo-samples
  - For each pseudo-sample $(Y_b, X_b)$: one vector $\hat\theta_b$
Reminder: The Bootstrap 2
- To construct a confidence interval for one element $\theta_k$ of θ
  - We have B estimates $\hat\theta_{kb}$
  - Take B = 10 000 and order those estimates from smallest to largest
  - then estimates number 250 and 9 750 are, respectively, the lower and upper bounds of the 95% confidence interval
Why is that interesting?
1. No distributional hypothesis
  1.1 Although there must be no correlation between observations
  1.2 Therefore, in panels, resampling is on i only, using all t for each i in the new sample
2. Confidence intervals can be calculated
  2.1 for any function of the estimated parameters, including non-linear ones
  2.2 for parameters estimated from models without exact finite-sample properties
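The percentile rule above can be sketched in plain Python for the simplest possible statistic, a sample mean (so there is no X and each "pair" is a single observation). The observed sample is hypothetical (50 draws from N(10, 4)); B = 10 000 resamples with replacement are drawn, and the 250th and 9 750th ordered estimates give the 95% interval.

```python
import random

rng = random.Random(7)
data = [rng.gauss(10, 2) for _ in range(50)]     # hypothetical observed sample, n = 50
n, B = len(data), 10_000

# Bootstrap of the sample mean: B resamples of size n, drawn with replacement
boot = sorted(sum(rng.choices(data, k=n)) / n for _ in range(B))

# 95% percentile interval: 250th and 9750th ordered estimates (1-based ranks)
lower, upper = boot[249], boot[9749]
print(round(lower, 2), round(upper, 2))
```

The Python list is 0-based, hence indices 249 and 9749 for ranks 250 and 9 750. For panel data, the unit drawn with replacement would be the individual i with all its periods, as stated in point 1.2.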
Panel Bootstrap Variance
- For each of the B pseudo-samples: OLS of $\tilde y_{it}$ on $\tilde w_{it}$
  - B estimates $\hat\theta_b$, b = 1, ..., B
- Panel bootstrap "empirical" estimate of the variance matrix:
$$\hat V_{Boot}[\hat\theta] = \frac{1}{B-1}\sum_{b=1}^{B} \left(\hat\theta_b - \bar{\hat\theta}\right)\left(\hat\theta_b - \bar{\hat\theta}\right)' \qquad (14)$$
where $\bar{\hat\theta} = B^{-1}\sum_b \hat\theta_b$
- May be slow: see e.g. Cameron & Trivedi
- Given independence over i
  - consistent as N → ∞
  - asymptotically equivalent to the panel-robust "sandwich"
  - valid for any form of heteroskedasticity or autocorrelation (as White)
  - can be applied to any panel estimator of the form (12)
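Formula (14) can be sketched for the within estimator with one regressor. The sketch below, in plain Python, assumes a hypothetical FE data-generating process with AR(1)-type idiosyncratic errors (coefficient 0.5) to create serial correlation; it resamples individuals with replacement, keeping all their periods, re-estimates the within slope on each of B = 300 pseudo-samples, and applies (14) in its scalar form. The values of N, T, B and the AR coefficient are illustrative choices.

```python
import random

def within_slope(panels):
    """Within estimator from a list of (x_list, y_list) pairs, one per individual."""
    sxy = sxx = 0.0
    for xi, yi in panels:
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        sxy += sum((a - mx) * (b - my) for a, b in zip(xi, yi))
        sxx += sum((a - mx) ** 2 for a in xi)
    return sxy / sxx

rng = random.Random(8)
N, T, beta, B = 200, 5, 1.0, 300
panels = []
for _ in range(N):
    a = rng.gauss(0, 1)
    xi = [a + rng.gauss(0, 1) for _ in range(T)]
    e, yi = 0.0, []
    for v in xi:
        e = 0.5 * e + rng.gauss(0, 1)     # serially correlated idiosyncratic error
        yi.append(a + beta * v + e)
    panels.append((xi, yi))

b_hat = within_slope(panels)

# Panel bootstrap: resample individuals i with replacement, keeping all t for each i
draws = [within_slope(rng.choices(panels, k=N)) for _ in range(B)]
mean_draw = sum(draws) / B
var_boot = sum((d - mean_draw) ** 2 for d in draws) / (B - 1)   # formula (14), scalar case
print(round(b_hat, 3), round(var_boot ** 0.5, 4))
```

Because whole individuals are drawn, the within-individual error correlation is preserved in every pseudo-sample, which is what makes the bootstrap standard error robust to it.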
Outline: Panel Data Inference / Hours & Wages Example