Dynamic Panel Data
Ch 1. Reminder on Linear Non Dynamic Models
Pr. Philippe Polomé, Université Lumière Lyon 2
M2 EcoFi
2016 – 2017
Overview of Ch. 1
- Data
- Panel Data Models
- Panel Data Estimators
  - Within Estimator
  - First-Differences Estimator
  - Random Effects Estimator
  - Fixed vs. Random Effects
- Panel Data Inference
  - Panel-Robust Inference
- Fixed Effects vs. Random Effects
  - Non-Test Elements of Choice
  - Hausman Test
- Unbalanced Panel Data
Data
Panel Data
- $i = 1, \dots, N$: agent (individual, firm, country...)
- $t = 1, \dots, T$: time
  - Generally the number of periods $T_i$ differs from agent to agent
  - Unbalanced panel (this is the norm)
  - Attrition: agents drop out of the sample over time
  - To simplify notation, the theory uses a common $T$
  - But all computer packages handle $T_i$, so you need not balance your sample
- $y_{it}$: one observation of the dependent variable $y$
- $x_{it}$: one observation of the $K \times 1$ vector of independent variables
  - the "regressors"
  - possibly endogenous (Ch. 2)
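Data management

| obs | agent $i$ | time $t$ | $y$ | $x_1$ | ... | $x_K$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 1 | $y_{11}$ | $x_{1,11}$ | ... | $x_{K,11}$ |
| ... | | | | | | |
| $t$ | 1 | $t$ | $y_{1t}$ | $x_{1,1t}$ | ... | $x_{K,1t}$ |
| ... | | | | | | |
| $T$ | 1 | $T$ | $y_{1T}$ | $x_{1,1T}$ | ... | $x_{K,1T}$ |
| $T+1$ | 2 | 1 | $y_{21}$ | $x_{1,21}$ | ... | $x_{K,21}$ |
| ... | | | | | | |
| $(i-1)T+t$ | $i$ | $t$ | $y_{it}$ | $x_{1,it}$ | ... | $x_{K,it}$ |
| ... | | | | | | |
| $NT$ | $N$ | $T$ | $y_{NT}$ | $x_{1,NT}$ | ... | $x_{K,NT}$ |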
Panel Data Models
Typical Linear Panel Data Model
- The typical panel data model is
$$y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + u_{it} \qquad (1)$$
where
  - $u_{it}$ is a scalar disturbance term
  - the intercepts $\alpha_i$ vary across agents
  - the intercepts $\gamma_t$ vary over time
  - the slopes $\beta$ are constant
Typical Linear Panel Data Model
- A mathematically proper way to write this model is
$$y_{it} = \sum_{j=1}^{N} \alpha_j d_{j,it} + \sum_{s=2}^{T} \gamma_s d_{s,it} + x_{it}'\beta + u_{it}$$
where
  - the $N$ individual dummies are $d_{j,it} = 1$ if $i = j$ and $0$ otherwise
  - the $T-1$ time dummies are $d_{s,it} = 1$ if $t = s$ and $0$ otherwise
- $x_{it}$ does not include an intercept
  - If an intercept is included, then one of the $N$ individual dummies must be dropped
  - Many packages do that automatically
Time dummies
- We focus on short panels, where $N \to \infty$ but $T$ does not
  - Then the time intercepts $\gamma_t$ can be consistently estimated
  - in the sense that there is only a finite number of them
  - The $T-1$ time dummies are simply incorporated into the regressors $x_{it}$
  - We do not discuss them any further
- "Long" panels are treated using time-series methods
  - The panel dimension is abandoned
Individual dummies
- If we inserted the full set of $N$ individual intercepts $d_{j,it}$
  - it would cause problems as $N \to \infty$
  - We cannot consistently estimate an infinite number of parameters
  - Information on the $\alpha_i$ does not increase as $N$ increases
- Challenge: estimating the parameters $\beta$
  - consistently
  - while controlling for the $N$ individual intercepts $\alpha_i$
- In this sense, the $\alpha_i$ are not the focus of the regression
  - They represent individual unobservables that do not have much interpretation
  - They are nuisance parameters
  - We are not interested in them
  - but we must find a way to deal with them
Individual-Specific Effects Model
- Individual-specific effects model
$$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \qquad (2)$$
where $\varepsilon_{it}$ is iid over $i$ and $t$
- A more parsimonious way to express the previous model (1) with all the dummies
  - Time dummies may be included in the regressors $x_{it}$
- The "standard" linear non-dynamic panel data model
  - no lagged dependent variable $y_{i,t-s}$ in $x_{it}$
- The $\alpha_i$ are random variables
  - They capture unobserved heterogeneity
  - i.e. unobserved time-invariant individual characteristics
  - In effect, a random parameter model
Reminder: Unobserved Heterogeneity
- The correct model is $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
- But the estimated model is $Y = \beta_0 + \beta_1 x_1 + \nu$
- The effect of the missing regressor on $Y$ is absorbed into the error of the estimated model: $\nu = \beta_2 x_2 + \varepsilon$
  - This is unobserved heterogeneity: unobserved (individual) factors influence the LHS variable
- If the missing regressor is correlated with an included regressor
  - then $\nu$ is correlated with at least one included regressor
  - and LS is inconsistent
- Furthermore, possibly:
  - heteroskedasticity if $var(x_{2t}) \neq var(x_{2s})$, $t \neq s$
  - autocorrelation if $corr(x_{2t}, x_{2s}) \neq 0$, $t \neq s$
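A small simulation makes the omitted-variable argument concrete. It is a sketch under assumed values chosen only for illustration ($\beta_1 = 1$, $\beta_2 = 2$, $Cor(x_1, x_2) = 0.5$, unit variances, Gaussian draws): regressing $Y$ on $x_1$ alone, the slope settles near $\beta_1 + \beta_2 Cov(x_1, x_2)/Var(x_1) = 2$ rather than $\beta_1 = 1$.

```python
import random

def ols_slope(x, y):
    """Slope of the OLS regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(n=100_000, b1=1.0, b2=2.0, rho=0.5, seed=0):
    """Simulate Y = b1*x1 + b2*x2 + eps, then regress Y on x1 only (x2 omitted)."""
    rng = random.Random(seed)
    x1, y = [], []
    for _ in range(n):
        a = rng.gauss(0, 1)
        # x2 has unit variance and Cov(x1, x2) = rho
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1)
        x1.append(a)
        y.append(b1 * a + b2 * b + rng.gauss(0, 1))
    return ols_slope(x1, y)

slope = short_regression_slope()
print(slope)  # close to b1 + b2 * rho = 2.0, not b1 = 1.0
```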
Reminder: Unobserved Heterogeneity
[Figure: individual regression lines with the same slopes]
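Exogeneity
- Throughout this chapter we assume strong (strict) exogeneity:
$$E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0, \qquad t = 1, \dots, T \qquad (3)$$
- So $\varepsilon_{it}$ is assumed to have mean zero conditional on past, current, and future values of the regressors
  - which implies zero covariance between $\varepsilon_{it}$ and each of $x_{i1}, \dots, x_{iT}$
  - Nothing is said about the relation between the random term $\alpha_i$ and $x_i$
- Strong exogeneity rules out models with lagged dependent variables or with endogenous variables as regressors (Ch. 2)
  - Take $y_{it} = \alpha_i + x_{it}'\beta + \rho y_{i,t-1} + \varepsilon_{it}$
  - Thus $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \rho y_{i,t-2} + \varepsilon_{i,t-1}$
  - It is often hard to maintain that $E(\varepsilon_{it}\varepsilon_{i,t-1}) = 0$
  - Moreover, $y_{it}$ contains $\varepsilon_{it}$ and is itself a regressor in period $t+1$, so $\varepsilon_{it}$ cannot have mean zero conditional on all the regressors
  - Strong exogeneity does not hold in dynamic models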
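To see why a lagged dependent variable breaks strict exogeneity, the sketch below simulates an assumed pure AR(1) panel ($y_{it} = \alpha_i + \rho y_{i,t-1} + \varepsilon_{it}$ with $\rho = 0.5$, Gaussian shocks, $N = 2000$, six periods; all values are illustrative) and applies the within estimator presented later in this chapter. Because the regressor $y_{i,t-1}$ is correlated with shocks that enter the individual time averages, the estimate stays well below $\rho$ even with many agents. This is the Nickell bias, a dynamic-panel issue.

```python
import random

def simulate_dynamic_panel(n_agents, n_periods, rho, seed=1, burn_in=50):
    """Simulate y_it = alpha_i + rho * y_i,t-1 + eps_it (no other regressors)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_agents):
        alpha = rng.gauss(0, 1)
        y = alpha / (1 - rho)  # start at the agent's long-run mean
        path = []
        for t in range(burn_in + n_periods):
            y = alpha + rho * y + rng.gauss(0, 1)
            if t >= burn_in:
                path.append(y)
        panel.append(path)
    return panel

def within_ar1(panel):
    """Within estimate of rho: OLS of demeaned y_it on demeaned y_i,t-1, agent by agent."""
    num = den = 0.0
    for path in panel:
        y, ylag = path[1:], path[:-1]
        my, ml = sum(y) / len(y), sum(ylag) / len(ylag)
        num += sum((a - my) * (b - ml) for a, b in zip(y, ylag))
        den += sum((b - ml) ** 2 for b in ylag)
    return num / den

rho_hat = within_ar1(simulate_dynamic_panel(n_agents=2000, n_periods=6, rho=0.5))
print(rho_hat)  # well below the true rho = 0.5
```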
Fixed Effects Model
- There are two variants of model (2), depending on the hypotheses on $\alpha_i$
  - Both are models with "two" errors, $\alpha_i$ and $\varepsilon_{it}$
  - i.e. error component models
  - Both variants treat $\alpha_i$ as an unobserved random variable
- Variant 1 of model (2): the fixed effects (FE) model
  - $\alpha_i$ is potentially correlated with the (time-invariant part of the) observed regressors $x_{it}$
  - A form of unobserved heterogeneity
  - Called "fixed" because early treatments regarded the $\alpha_i$ as (non-random) parameters to be estimated
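Random Effects Model
- Variant 2 of model (2): the random effects (RE) model
  - $\alpha_i$ is distributed independently of $x$
  - It usually makes the additional assumption that both the random effects $\alpha_i$ and the error term $\varepsilon_{it}$ in (2) are iid:
$$\alpha_i \sim [\alpha, \sigma^2_\alpha], \qquad \varepsilon_{it} \sim [0, \sigma^2_\varepsilon] \qquad (4)$$
- No distribution has been specified in (4), only means and variances
- In more general versions, $\varepsilon_{it}$ may show autocorrelation
  - $cov(\varepsilon_{it}, \varepsilon_{is}) \neq 0$ is often allowed
  - while both $cov(\varepsilon_{it}, \varepsilon_{jt}) = 0$ and $cov(\alpha_i, \alpha_j) = 0$, $i \neq j$, are assumed
  - except in spatial models
- $\alpha$ can be treated as the intercept of the model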
Other names for the Random Effects Model
- One-way individual-specific effects model
  - Two-way = inclusion of time dummies or time-specific random effects
- Random intercept model
  - to distinguish it from more general random effects models, e.g. random slopes
- Random components model
  - because the error term is $\alpha_i + \varepsilon_{it}$
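Equicorrelated Random Effects Model
- The RE model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ can be viewed as a regression of $y_{it}$ on $x_{it}$
  - with composite error term $u_{it} = \alpha_i + \varepsilon_{it}$
- The RE hypothesis (4) ($\alpha_i$ and $\varepsilon_{it}$ iid) implies that
$$Cov[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \begin{cases} \sigma^2_\alpha, & t \neq s \\ \sigma^2_\alpha + \sigma^2_\varepsilon, & t = s \end{cases} \qquad (5)$$
- The RE model thus imposes the constraint that the composite error $u_{it}$ is equicorrelated
  - since $Cor[u_{it}, u_{is}] = \sigma^2_\alpha / (\sigma^2_\alpha + \sigma^2_\varepsilon)$ for $t \neq s$ does not vary with the time difference $t - s$
  - The RE model is also called the equicorrelated model or exchangeable errors model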
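A short sketch of the covariance block implied by (5) for one agent, with illustrative values $\sigma^2_\alpha = 1$, $\sigma^2_\varepsilon = 3$ and $T = 4$ (assumptions, not estimates). Every off-diagonal correlation equals $\sigma^2_\alpha / (\sigma^2_\alpha + \sigma^2_\varepsilon) = 0.25$, whatever the time distance.

```python
def re_error_cov(T, sigma2_alpha, sigma2_eps):
    """T x T covariance of u_it = alpha_i + eps_it for one agent, as in (5)."""
    return [[sigma2_alpha + (sigma2_eps if t == s else 0.0) for s in range(T)]
            for t in range(T)]

def corr_from_cov(cov):
    """Convert a covariance matrix (list of lists) into a correlation matrix."""
    sd = [cov[i][i] ** 0.5 for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

block = re_error_cov(T=4, sigma2_alpha=1.0, sigma2_eps=3.0)
cor_u = corr_from_cov(block)
# Same value at lag 1 and at lag 3: equicorrelation
print(cor_u[0][1], cor_u[0][3])  # 0.25 0.25
```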
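Synthesis of Panel Data Models
- Fixed-effects model: $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ (2), with $Cov(\alpha_i, x_{it}) \neq 0$
- Random-effects model: (2) with $\alpha_i \sim [\alpha, \sigma^2_\alpha]$ and $\varepsilon_{it} \sim [0, \sigma^2_\varepsilon]$ (4), $\alpha_i$ independent of $x_{it}$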
Panel Data Estimators
- Three commonly used panel data estimators of $\beta$
  - In this non-dynamic, no-endogeneity context: least-squares variants
  - They differ in the extent to which cross-section and time-series variation in the data are used
  - Their properties vary according to which model is appropriate
- A regressor $x_{it}$ may be time-invariant
  - $x_{it} = x_i$ for $t = 1, \dots, T$
  - so that $\bar x_i = T^{-1}\sum_t x_{it} = x_i$
  - For some estimators only the coefficients of time-varying regressors are identified
Variance Matrix
- For a given $i$ we expect correlation in $y$ over time:
  - $Cor[y_{it}, y_{is}]$ is high
  - Even after inclusion of regressors, $Cor[u_{it}, u_{is}]$ may remain $\neq 0$
  - Denote $Cov[u_{it}, u_{is}] = \sigma_{its}$; when $t = s$, $\sigma_{its} = \sigma^2_{it}$
Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$
$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad
\Sigma_i = \begin{pmatrix} \sigma^2_{i1} & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma^2_{i2} & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \text{SYM} & & & \sigma^2_{iT} \end{pmatrix}$$
Variance Matrix
- The RE model accommodates (part of) this correlation
  - From (5): $Cov[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \sigma^2_\alpha$ for $t \neq s$, and $\sigma^2_\alpha + \sigma^2_\varepsilon$ for $t = s$
- OLS output treats each of the $T$ years as independent information, but
  - the information content is less than this
  - given the positive error correlation
  - so it tends to overstate estimator precision
- Always use panel-corrected standard errors when OLS is applied in a panel
  - Many corrections are possible, depending on the assumed correlation and heteroskedasticity and on whether the panel is short or long
  - The default standard errors in software are not panel-corrected
Panel Data Estimators Within Estimator
Within Model
- Principle: individual-specific deviations of the dependent variable from its time-averaged value
  - are explained by
  - individual-specific deviations of the regressors from their time-averaged values
- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over time: $\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i$
  - Subtract: the $\alpha_i$ terms cancel, giving the within model
$$y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \qquad i = 1, \dots, N, \; t = 1, \dots, T \qquad (6)$$
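The within transformation can be sketched on simulated data. The design is an assumption for illustration: $x_{it} = \alpha_i + e_{it}$, so that $Cov(\alpha_i, x_{it}) \neq 0$ (the FE case), with $\beta = 1$, Gaussian draws, $N = 5000$ and $T = 4$. Pooled OLS picks up the correlation between $\alpha_i$ and $x_{it}$ and converges to 1.5 in this design, while OLS on the demeaned data of model (6) stays close to $\beta$.

```python
import random

def simulate_fe_panel(n_agents, n_periods, beta, seed=2):
    """Simulate y_it = alpha_i + beta * x_it + eps_it with x_it correlated with alpha_i."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_agents):
        alpha = rng.gauss(0, 1)
        xs = [alpha + rng.gauss(0, 1) for _ in range(n_periods)]  # Cov(alpha_i, x_it) = 1
        ys = [alpha + beta * x + rng.gauss(0, 1) for x in xs]
        panel.append((xs, ys))
    return panel

def pooled_ols_slope(panel):
    """OLS slope of y on a constant and x, stacking all NT observations."""
    x = [v for xs, _ in panel for v in xs]
    y = [v for _, ys in panel for v in ys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def within_slope(panel):
    """OLS on the within model (6): deviations from each agent's time averages."""
    num = den = 0.0
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    return num / den

panel = simulate_fe_panel(n_agents=5000, n_periods=4, beta=1.0)
b_pooled, b_within = pooled_ols_slope(panel), within_slope(panel)
print(b_pooled, b_within)  # pooled OLS near 1.5 (inconsistent), within near beta = 1.0
```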
Within / Fixed Effects Estimator
- Within estimator = OLS estimator on $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$
  - Consistent for $\beta$ in the FE model
- Called the fixed effects estimator by analogy with the FE model
  - This does not imply that the $\alpha_i$ are fixed
- Each $i$ must be observed at least twice in the sample
  - Otherwise $x_{it} - \bar x_i = 0$
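Consistency of Fixed Effects Estimator
- FE treats the $\alpha_i$ as nuisance parameters
  - They can be ignored when interest lies in $\beta$
  - They do not need to be consistently estimated to obtain consistent estimates of the slope parameters
- Consistency further requires $E(\varepsilon_{it} - \bar\varepsilon_i \mid x_{it} - \bar x_i) = 0$ in the within model $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$
  - Because $\bar\varepsilon_i$ and $\bar x_i$ average over all periods, this requires more than $E(\varepsilon_{it} \mid x_{it}) = 0$
  - It requires the strict exogeneity assumption (3): $E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0$, $t = 1, \dots, T$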
Fixed Effects Estimates
- If the fixed effects $\alpha_i$ are of interest, they can also be estimated
- If $N$ is not too large, an alternative way to compute Within is
  - Least-Squares Dummy Variable (LSDV) estimation
  - It directly estimates $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ by OLS of $y_{it}$ on $x_{it}$ and $N$ individual dummy variables
  - It yields the Within estimator for $\beta$,
  - along with estimates of the $N$ fixed effects: $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$
  - $\hat\alpha_i$ is an unbiased estimator of $\alpha_i$
  - But in short (small $T$) panels the $\hat\alpha_i$ are always inconsistent
  - because information never accumulates for them
  - Their distribution or their variation with a key variable may nevertheless be informative
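A sketch checking numerically that LSDV reproduces the within estimator and the fixed effects $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$, for one regressor and simulated data (assumed design: $N = 30$, $T = 5$, $\beta = 0.8$, Gaussian draws). The small Gaussian-elimination solver is written here only to keep the example dependency-free; in practice one would call a linear-algebra library.

```python
import random

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z

def lsdv(panel):
    """OLS of y_it on x_it and N individual dummies (no common intercept); returns (beta, alphas)."""
    n = len(panel)
    rows, y = [], []
    for i, (xs, ys) in enumerate(panel):
        for x, v in zip(xs, ys):
            rows.append([x] + [1.0 if j == i else 0.0 for j in range(n)])
            y.append(v)
    k = n + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    return coef[0], coef[1:]

def within_with_effects(panel):
    """Within slope, then alpha_hat_i = ybar_i - xbar_i * beta_hat."""
    num = den = 0.0
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    beta = num / den
    alphas = [sum(ys) / len(ys) - beta * sum(xs) / len(xs) for xs, ys in panel]
    return beta, alphas

rng = random.Random(3)
panel = []
for _ in range(30):
    alpha = rng.gauss(0, 1)
    xs = [alpha + rng.gauss(0, 1) for _ in range(5)]
    panel.append((xs, [alpha + 0.8 * x + rng.gauss(0, 1) for x in xs]))

beta_lsdv, alpha_lsdv = lsdv(panel)
beta_within, alpha_within = within_with_effects(panel)
print(beta_lsdv, beta_within)
print(max(abs(a - b) for a, b in zip(alpha_lsdv, alpha_within)))  # numerically zero
```

With only $T = 5$ observations per agent, each $\hat\alpha_i$ remains a noisy estimate of $\alpha_i$ however large $N$ becomes.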
Time-Invariant Regressors
- Major limitation of Within
  - The coefficients of time-invariant regressors are not identified
  - Since if $x_{it} = x_i$ for all $t$, then $\bar x_i = x_i$, so $x_{it} - \bar x_i = 0$
- Many studies seek to estimate the effect of time-invariant regressors
  - For example, in panel wage regressions: the effect of gender or race
- For this reason many practitioners prefer not to use the within estimator
- The RE estimator permits estimation of the coefficients of time-invariant regressors
  - but it is inconsistent if the FE model is the correct model
Panel Data Estimators First-Differences Estimator
First-Differences Model
- Principle: individual-specific one-period changes in the dependent variable
  - are explained by
  - individual-specific one-period changes in the regressors
- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Lag one period: $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$
  - Subtract: the first-differences model
$$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \qquad i = 1, \dots, N, \; t = 2, \dots, T \qquad (7)$$
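A sketch of the D1 estimator on simulated FE data (assumed design: $x_{it}$ correlated with $\alpha_i$, $\beta = 1$, Gaussian draws). It also checks a known special case: with $T = 2$, OLS on the first differences and OLS on the within model (6) give the same estimate, since $x_{i2} - \bar x_i = -(x_{i1} - \bar x_i) = \Delta x_{i2}/2$.

```python
import random

def simulate_fe_panel(n_agents, n_periods, beta, seed):
    """FE-type data: x_it correlated with alpha_i."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_agents):
        alpha = rng.gauss(0, 1)
        xs = [alpha + rng.gauss(0, 1) for _ in range(n_periods)]
        panel.append((xs, [alpha + beta * x + rng.gauss(0, 1) for x in xs]))
    return panel

def first_difference_slope(panel):
    """OLS without intercept of (y_it - y_i,t-1) on (x_it - x_i,t-1), t = 2..T, as in (7)."""
    num = den = 0.0
    for xs, ys in panel:
        for t in range(1, len(xs)):
            dx, dy = xs[t] - xs[t - 1], ys[t] - ys[t - 1]
            num += dx * dy
            den += dx * dx
    return num / den

def within_slope(panel):
    """OLS on the within model (6)."""
    num = den = 0.0
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    return num / den

b_fd = first_difference_slope(simulate_fe_panel(5000, 4, beta=1.0, seed=8))
two_periods = simulate_fe_panel(200, 2, beta=1.0, seed=9)
print(b_fd)  # close to beta = 1
print(first_difference_slope(two_periods), within_slope(two_periods))  # equal (up to rounding) when T = 2
```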
First-Differences Estimator
- The First-differences estimator D1 is OLS on the first-differences model (7)
- It gives consistent estimates of $\beta$ in the FE model
  - The coefficients of time-invariant regressors are not identified
- D1 is less efficient than Within
  - if $\varepsilon_{it}$ is iid (for $T > 2$)
- However, it may safeguard against I(1) / unit-root variables
  - which would otherwise lead to inconsistency
Panel Data Estimators Random Effects Estimator
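Random Effects Model
- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Assume the RE model with iid $\alpha_i$ and $\varepsilon_{it}$ as in RE hypothesis (4): $\alpha_i \sim [\alpha, \sigma^2_\alpha]$, $\varepsilon_{it} \sim [0, \sigma^2_\varepsilon]$
- OLS would be consistent
  - but GLS will be more efficient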
Reminder: GLS in a cross-section
- When all the hypotheses of the linear model are satisfied but the error covariance matrix $\Sigma$ is not proportional to the identity, then
  - OLS is consistent
  - but it is not efficient if we know $\Sigma$
- Let the classical linear (cross-section) model be $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - Let $P'P = \Sigma^{-1}$
  - Unique Cholesky decomposition of the real symmetric positive definite matrix $\Sigma^{-1}$
  - Premultiply the linear model by $P$: $Py = PX\beta + P\varepsilon$
  - i.e. $y^* = X^*\beta + \varepsilon^*$
  - Then $Var(\varepsilon^*) = E(P\varepsilon\varepsilon'P') = P\,E(\varepsilon\varepsilon')\,P'$
  - $= P\Sigma P' = P(P'P)^{-1}P' = PP^{-1}(P')^{-1}P' = I$
Reminder: GLS in a cross-section
- So the transformed model has spherical disturbances
  - Applying OLS to the transformed data gives an efficient estimator
  - That is GLS
- Since $\Sigma$ is unknown in practice, we need an estimate
  - Any consistent estimate $\hat\Sigma$ of $\Sigma$ yields a feasible (consistent) GLS estimator
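A numerical check of the GLS algebra above, using a $3 \times 3$ RE-type covariance block with illustrative values $\sigma^2_\alpha = 1$, $\sigma^2_\varepsilon = 2$. As an assumption of this sketch, $P$ is taken as $C^{-1}$, where $C$ is the lower-triangular Cholesky factor of $\Sigma$ ($\Sigma = CC'$). This $P$ is lower rather than upper triangular, but any $P$ with $P'P = \Sigma^{-1}$ gives $P\Sigma P' = I$.

```python
def cholesky(a):
    """Lower-triangular C with C C' = a (a symmetric positive definite)."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = s ** 0.5 if i == j else s / c[j][j]
    return c

def lower_inverse(c):
    """Inverse of a lower-triangular matrix, column by column, by forward substitution."""
    n = len(c)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(c[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / c[i][i]
    return inv

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# RE-type block: sigma2_alpha = 1 off the diagonal, sigma2_alpha + sigma2_eps = 3 on it
sigma = [[1.0 + (2.0 if t == s else 0.0) for s in range(3)] for t in range(3)]
p = lower_inverse(cholesky(sigma))                        # P = C^{-1}, so P'P = Sigma^{-1}
var_transformed = matmul(matmul(p, sigma), transpose(p))  # P Sigma P', should be I
check_inverse = matmul(sigma, matmul(transpose(p), p))    # Sigma (P'P), should be I
for row in var_transformed:
    print([round(v, 10) for v in row])
```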
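RE Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$
$$\Sigma = \begin{pmatrix} \Omega & 0 & \cdots & 0 \\ 0 & \Omega & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Omega \end{pmatrix}, \qquad
\Omega = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{pmatrix}$$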
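Random Effects Estimator
- The feasible GLS estimator of the RE model
  - can be calculated from OLS estimation of the transformed model
$$y_{it} - \hat\theta\bar y_i = (1 - \hat\theta)\mu + (x_{it} - \hat\theta\bar x_i)'\beta + \nu_{it} \qquad (8)$$
where $\nu_{it} = (1 - \hat\theta)\alpha_i + (\varepsilon_{it} - \hat\theta\bar\varepsilon_i)$ is asymptotically iid, and
  - $\hat\theta$ is consistent for
$$\theta = 1 - \sqrt{\frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + T\sigma^2_\alpha}} \qquad (9)$$
- Called the RE estimator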
Random Effects Estimator
- The nonrandom scalar intercept $\mu$ is added to normalize the random effects $\alpha_i$ to have zero mean
  - as in the RE hypothesis
- Cameron & Trivedi provide a derivation of (8) and ways to estimate $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$, and hence $\theta$
  - Not detailed here
- Note
  - $\hat\theta = 0$ corresponds to pooled OLS
  - $\hat\theta = 1$ corresponds to within estimation
  - $\hat\theta \to 1$ as $T \to \infty$ (see formula (9))
- This is a two-step estimator of $\beta$
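A sketch of the quasi-demeaning weight (9) and of the transformation in (8), with illustrative variance values (assumptions). It reproduces the limiting cases listed above: $\sigma^2_\alpha = 0$ gives $\theta = 0$ (pooled OLS), $\theta = 1$ gives exact demeaning (within), and $\theta$ rises towards 1 as $T$ grows. In practice $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$ are replaced by first-step estimates, which is what makes the RE estimator two-step.

```python
import math

def re_theta(sigma2_eps, sigma2_alpha, T):
    """Quasi-demeaning weight theta of (9)."""
    return 1.0 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))

def quasi_demean(values, theta):
    """One agent's series transformed as v_it - theta * vbar_i, as on both sides of (8)."""
    mean = sum(values) / len(values)
    return [v - theta * mean for v in values]

# Illustrative variances sigma2_eps = sigma2_alpha = 1: theta grows with T
for T in (2, 5, 20, 1000):
    print(T, round(re_theta(1.0, 1.0, T), 3))

print(re_theta(1.0, 0.0, 5))               # 0.0: no individual effect, pooled OLS
print(quasi_demean([1.0, 2.0, 6.0], 1.0))  # theta = 1: within deviations [-2.0, -1.0, 3.0]
```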
Random Effects Estimator Properties
- The RE estimator is
  - Fully efficient under the RE model
  - The efficiency gain compared to pooled OLS (applied to the RE model) need not be great
  - Possibly still inefficient if the equicorrelation hypothesis does not hold
  - In particular, under AR(1) error processes
  - Inconsistent if the FE model is correct
  - since then $\alpha_i$ is correlated with $x_{it}$
RE Discussion
- Most disciplines in applied statistics,
  - other than microeconometrics,
  - treat any unobserved individual heterogeneity as being distributed independently of the regressors
  - Then the effects are random effects
  - or rather: purely random effects
- Compared to FE models,
  - this stronger assumption has the advantage of permitting consistent estimation of all parameters
  - including the coefficients of time-invariant regressors
  - However, RE and pooled OLS are inconsistent if the true model is FE
- Economists often view the assumptions of the RE model as unsupported by the data
Panel Data Estimators Fixed vs. Random Effects
Identification of the Individual-Specific Effects
- In $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, the individual effect is a random variable (random coefficient)
  - in both fixed and random effects models
  - Both models assume that $E[y_{it} \mid \alpha_i, x_{it}] = \alpha_i + x_{it}'\beta$
  - $\alpha_i$ is unknown and cannot be consistently estimated
  - unless $T \to \infty$
  - So we cannot estimate $E[y_{it} \mid \alpha_i, x_{it}]$
  - Prediction is therefore not possible
  - contrary to what we usually do with OLS
  - That is reasonable, as $\alpha_i$ includes unobserved individual characteristics
  - possibly with a non-zero mean
- But condition on $x_{it}$ only: $E[y_{it} \mid x_{it}] = E[\alpha_i \mid x_{it}] + x_{it}'\beta$
  - That is, what is the (conditional) expected value of $\alpha_i$?
  - FE and RE have different takes on this expectation
Random Effects vs. Fixed Effects
- RE: it is assumed that $E[\alpha_i \mid x_{it}] = \alpha$, so $E[y_{it} \mid x_{it}] = \alpha + x_{it}'\beta$
  - Hence $E[y_{it} \mid x_{it}]$ is identified
  - since we estimate a single intercept consistently as $NT \to \infty$
  - But the key RE assumption that $E[\alpha_i \mid x_{it}]$ is constant across $i$ might not hold in many microeconometrics applications
- FE: $E[\alpha_i \mid x_{it}]$ varies with $x_{it}$, and it is not known how
  - So we cannot identify $E[y_{it} \mid x_{it}]$
  - Nonetheless, the Within and First-Differences estimators consistently estimate $\beta$ with short panels
  - Thus they identify the marginal effect $\beta = \partial E[y_{it} \mid \alpha_i, x_{it}] / \partial x_{it}$
  - e.g. the effect on earnings of one additional year of schooling
  - But only for time-varying regressors
  - so the marginal effect of race or gender, for example, is not identified
  - Nor is the expected individual $y_{it}$, as we do not know the individual effect $\alpha_i$
Random Effects vs. Fixed Effects
- The two models have different focuses
- RE
  - Time-series structure
  - Efficiency
- FE
  - Endogeneity of unobserved heterogeneity
  - Consistency
Summary Models & Estimators
Table: Linear Panel Model: Common Estimators and Models

| Estimator of $\beta$ | Random Effects model (2) & (4) | Fixed Effects model (2) |
|---|---|---|
| Within (Fixed Effects) (6) | Consistent | Consistent |
| First Differences (7) | Consistent | Consistent |
| Random Effects (8) | Consistent & efficient | Inconsistent |

This table considers only the consistency of estimators of $\beta$. For correct computation of standard errors, see the next section.
The only fully efficient estimator is RE under the RE model.
Example Arellano-Bond
- Unbalanced panel of 140 U.K. manufacturing companies over the period 1976-1984
  - Download with `webuse abdata`
  - year = $t$, n = log of employment, w = log of real wage, k = log of gross capital, ys = log of industry output, id = firm index ($i$)
- Panel structure declared in Stata with `xtset id year, yearly`
- Arellano & Bond are interested in a dynamic employment equation (labour demand)
$$n_{it} = \alpha_1 n_{i,t-1} + \alpha_2 n_{i,t-2} + \beta'(L)x_{it} + \lambda_t + \eta_i + \nu_{it}$$
where $\beta(L)$ indicates a vector of polynomials in the lag operator, so that various lags of $x$ may be used
  - AB use $w_t, w_{t-1}, k_t, k_{t-1}, ys_t, ys_{t-1}, ys_{t-2}$
  - and time dummies for all years
Example Arellano-Bond
- The AB model is dynamic
- In this chapter, we estimate it
  - without the lags of $n$ in the regressors
  - with them
  - by FE, D1 and RE → AB.do
  - All this is, in principle, known
First differences in Stata
- The First-Differences estimator is not readily available
  - Define the first differences first, then apply OLS
  - This is fairly unsatisfactory, as it does not really account for the panel structure of the error term
- Lag 1 period: `by id: gen xL1 = x[_n-1]`
  - `_n` indexes observations
  - `by id` lags within the groups defined by the `id` variable
- Then `by id: gen xD1 = x - xL1` for the first difference
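For readers working outside Stata, a Python sketch of what `by id: gen xL1 = x[_n-1]` and `xD1 = x - xL1` compute, assuming rows are already sorted by id and then by year. Like `x[_n-1]`, it takes the previous row, not the previous year, so a gap in an unbalanced panel would yield a difference over more than one period.

```python
def lag_by_group(ids, values):
    """Previous row's value within the same id (like Stata's `by id: gen xL1 = x[_n-1]`).
    Assumes rows are sorted by id, then by time; the first row of each id gets None (missing)."""
    out, prev_id, prev_val = [], object(), None
    for i, v in zip(ids, values):
        out.append(prev_val if i == prev_id else None)
        prev_id, prev_val = i, v
    return out

def diff_by_group(ids, values):
    """First difference x - xL1 within each id; None where the lag is missing."""
    return [None if lag is None else v - lag
            for v, lag in zip(values, lag_by_group(ids, values))]

ids = [1, 1, 1, 2, 2]
x = [10.0, 12.0, 15.0, 7.0, 4.0]
print(lag_by_group(ids, x))   # [None, 10.0, 12.0, None, 7.0]
print(diff_by_group(ids, x))  # [None, 2.0, 3.0, None, -3.0]
```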
First-differencing time dummies
- Take $d_t$ a time dummy
  - Recall that the one-period lag of $x$ takes, at time $t+1$, the value that $x$ had at $t$
- By construction, $L1d_t$ is one at $t+1$ and zero elsewhere
  - with a missing value at $t = 1$ (the first observed period)
- Thus, e.g., yr1980L1 = 1 in 1981 and 0 in other years
  - so yr1980D1 = yr1980 - yr1980L1 = -1 in 1981, 1 in 1980, 0 in other years, missing in 1976
  - Also, yr1984L1 is zero everywhere (missing in 1976), since 1984 is the last observed year
  - So yr1984D1 cannot be used, as it is identical to yr1984
- Interpretation of the first difference of a time dummy is hard
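The same construction applied to one firm's year dummies over 1976-1984 reproduces the pattern described above. This is a sketch for a single id; with several firms the lag is taken within each id.

```python
years = list(range(1976, 1985))   # the 1976-1984 sample period, one firm

def dummy(year):
    return [1 if y == year else 0 for y in years]

def lag(series):
    """One-period lag; missing (None) in the first observed year."""
    return [None] + series[:-1]

def diff(series):
    return [None if prev is None else v - prev for v, prev in zip(series, lag(series))]

yr1980_D1 = diff(dummy(1980))
yr1984_L1 = lag(dummy(1984))
yr1984_D1 = diff(dummy(1984))
print(dict(zip(years, yr1980_D1)))
# {1976: None, 1977: 0, 1978: 0, 1979: 0, 1980: 1, 1981: -1, 1982: 0, 1983: 0, 1984: 0}
print(yr1984_L1)  # [None, 0, 0, 0, 0, 0, 0, 0, 0]: zero wherever it is defined
```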
Table: Coef. Estimates – no lags of n

| Variable | OLS | FE | D1 | RE |
|---|---|---|---|---|
| w | -0.229 | -0.524 | -0.543 | -0.503 |
| wL1 | -0.289 | -0.077 | 0.041 | -0.052 |
| k | 0.320 | 0.493 | 0.399 | 0.553 |
| kL1 | 0.493 | 0.142 | 0.166 | 0.196 |
| ys | -1.801 | 0.344 | 0.532 | 0.263 |
| ysL1 | -0.468 | -0.198 | -0.268 | -0.266 |
| ysL2 | 2.136 | -0.076 | -0.001 | -0.048 |
| yr1979 | -0.057 | -0.016 | 0.006 | -0.017 |
| yr1980 | -0.233 | -0.017 | 0.022 | -0.024 |
| yr1981 | -0.467 | -0.048 | 0.004 | -0.058 |
| yr1982 | -0.392 | -0.065 | -0.013 | -0.069 |
| yr1983 | -0.235 | -0.058 | -0.013 | -0.056 |
| yr1984 | -0.264 | -0.022 | omitted | -0.011 |
| Intercept | 3.748 | 2.907 | -0.010 | 3.396 |
Example Arellano-Bond Results
Table: Coef. Estimates – with lags of n; time dummies not presented

| Variable | OLS | FE | D1 | RE |
|---|---|---|---|---|
| nL1 | 1.096 | 0.736 | 0.130 | 1.096 |
| nL2 | -0.132 | -0.154 | -0.035 | -0.132 |
| w | -0.534 | -0.560 | -0.556 | -0.534 |
| wL1 | 0.486 | 0.316 | 0.124 | 0.486 |
| k | 0.355 | 0.393 | 0.392 | 0.355 |
| kL1 | -0.325 | -0.098 | 0.127 | -0.325 |
| ys | 0.465 | 0.475 | 0.560 | 0.465 |
| ysL1 | -0.787 | -0.633 | -0.368 | -0.787 |
| ysL2 | 0.314 | 0.056 | 0.034 | 0.314 |
| Intercept | 0.215 | 1.810 | -0.009 | 0.215 |

It is interesting to compare the parameter estimates, but we postpone this to the next chapter.
Panel Data Inference Panel-Robust Inference
Panel-Robust Statistical Inference
- The various panel models include the error terms $u_{it}$, $\varepsilon_{it}$, $\alpha_i$
- In many microeconometrics applications
  - it is reasonable to assume independence over $i$
- The errors are potentially
  1. serially correlated (correlated over $t$ for given $i$)
  2. heteroskedastic (at least across $i$)
- Valid statistical inference requires controlling for both of these factors
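Het. & Autoc. Block-Diagonal Errors Var-Cov Matrix $\Sigma$
$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad
\Sigma_i = \begin{pmatrix} \sigma^2_{i1} & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma^2_{i2} & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \text{SYM} & & & \sigma^2_{iT} \end{pmatrix}$$
- Not enough structure: each block $\Sigma_i$ has $T(T+1)/2$ distinct elements of its own, too many to estimate without further assumptions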
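Under RE, every block is the same equicorrelated matrix:
$$\Sigma = \begin{pmatrix} \Omega & 0 & \cdots & 0 \\ 0 & \Omega & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Omega \end{pmatrix}, \qquad
\Omega = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{pmatrix}$$
- Equicorrelation implies
  - homoskedasticity
  - a limited form of autocorrelation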
Heteroskedastic RE Block-Diagonal Errors Var-Cov Matrix $\Sigma$
$$\Sigma = \begin{pmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Omega_N \end{pmatrix}, \qquad
\Omega_i = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_{\varepsilon i} & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_{\varepsilon i} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_{\varepsilon i} \end{pmatrix}$$
- A small generalisation of RE to heteroskedasticity
- The White heteroskedasticity-consistent estimator can be extended to short panels
  - since for the $i$-th individual the error variance matrix $\Sigma_i$ is of finite dimension $T \times T$ while $N \to \infty$
Reminder: The White heteroskedasticity-consistent estimator
- Classical linear model $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - OLS is unbiased and consistent
  - $Var(\hat\beta_{OLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2(X'X)^{-1}$
- For pure heteroskedasticity, White (1980) shows that
$$S = \frac{1}{N}\sum_{i=1}^N \hat\varepsilon_i^2 x_i x_i'$$
  - where $\hat\varepsilon_i$ is the OLS residual
  - is a consistent estimate of $\frac{1}{N}X'\Sigma X$ under general conditions
- The formula can be extended to autocorrelation
  - But autocorrelation often reveals time-series properties
  - that need to be investigated in more detail
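A sketch of White's estimator for a regression of $y$ on a constant and one regressor, with simulated heteroskedastic errors (assumed design: error standard deviation proportional to $x$). It computes both the usual $s^2(X'X)^{-1}$ and the sandwich $(X'X)^{-1}\left(\sum_i \hat\varepsilon_i^2 x_i x_i'\right)(X'X)^{-1}$, i.e. $N^{-1}(N^{-1}X'X)^{-1} S (N^{-1}X'X)^{-1}$, without any finite-sample correction (the version often labelled HC0).

```python
import random

def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mm2(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def ols_with_white(x, y):
    """OLS of y on (1, x): returns coefficients, usual variance and White (HC0) sandwich variance."""
    n = len(x)
    rows = [[1.0, v] for v in x]
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(2)] for a in range(2)]
    xty = [sum(r[a] * v for r, v in zip(rows, y)) for a in range(2)]
    bread = inv2(xtx)
    coef = [sum(bread[a][b] * xty[b] for b in range(2)) for a in range(2)]
    resid = [v - coef[0] - coef[1] * u for u, v in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    usual = [[s2 * bread[a][b] for b in range(2)] for a in range(2)]
    meat = [[sum(e * e * r[a] * r[b] for e, r in zip(resid, rows)) for b in range(2)]
            for a in range(2)]
    white = mm2(mm2(bread, meat), bread)
    return coef, usual, white

rng = random.Random(4)
x = [rng.uniform(0, 3) for _ in range(2000)]
y = [1.0 + 0.5 * v + v * rng.gauss(0, 1) for v in x]  # error s.d. proportional to x
coef, usual, white = ols_with_white(x, y)
print("usual se:", usual[1][1] ** 0.5, "White se:", white[1][1] ** 0.5)
```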
Panel-Robust Statistical Inference
- Panel-robust standard errors can thus be obtained
  - following White's principle
  - They are called "sandwich" or "robust" estimators
  - without assuming specific functional forms for within-individual error correlation or heteroskedasticity
  - However, the estimators themselves still assume a constant covariance, as in RE
- So we use inefficient estimators
  - but at least we estimate their variance better than with the OLS formulas
  - If there are AR(1) or I(1) errors, we might still be very wrong
  - Only the RE estimator in the RE model is efficient
  - More efficient estimators use GMM: Ch. 2
- FE or RE tend to reduce the serial correlation in the errors
  - but not to eliminate it
- The panel commands in many computer packages calculate default standard errors assuming iid errors
  - leading to erroneous inference
  - Ignoring the correlation can lead to underestimated standard errors
  - and thus over-estimated t-statistics
Stata commands
- The robust estimator assumes independence over $i$ and $N \to \infty$
  - but permits $V[u_{it}]$ and $Cov[u_{it}, u_{is}]$ to vary with $i$, $t$, and $s$
  - which is the case for short panels
- Panel-robust standard errors based on White can be computed with a regular panel command
  - if the command has a cluster-robust standard error option
  - in Stata, cluster on the individual $i$
  - Common error: using the standard robust se option
  - It only adjusts for heteroskedasticity
  - In practice, in a panel, it is more important to correct for serial correlation
  - In Stata, in a panel estimator, robust automatically accounts for clustering
- Bootstrap: computes panel-robust standard errors by bootstrap
  - Fewer hypotheses
  - Slower, depends on the number of replications
  - Do not specify a cluster variable when in a panel model
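A sketch of a cluster-robust (panel-robust) standard error for the within estimator with one regressor, clustering on the individual $i$. The simulated design is an assumption: both $x_{it}$ and $\varepsilon_{it}$ follow persistent AR(1) processes (coefficient 0.9) over $T = 10$ periods, for $N = 500$ agents. The scores $\tilde x_{it}\hat\varepsilon_{it}$ are summed within each individual before squaring, which allows any correlation over $t$ for a given $i$. No finite-sample adjustment is applied, so the numbers may differ slightly from packaged output, which typically applies one.

```python
import random

def ar1_path(rng, T, rho):
    """Stationary AR(1) path of length T with unit-variance Gaussian innovations."""
    v = rng.gauss(0, 1) / (1 - rho ** 2) ** 0.5
    out = []
    for _ in range(T):
        v = rho * v + rng.gauss(0, 1)
        out.append(v)
    return out

def within_with_se(panel):
    """Within slope with default (iid) and cluster-robust (by individual) standard errors."""
    dem = []
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        dem.append(([a - mx for a in xs], [b - my for b in ys]))
    sxx = sum(a * a for xd, _ in dem for a in xd)
    beta = sum(a * b for xd, yd in dem for a, b in zip(xd, yd)) / sxx
    resid = [[b - beta * a for a, b in zip(xd, yd)] for xd, yd in dem]
    n_obs = sum(len(r) for r in resid)
    s2 = sum(e * e for r in resid for e in r) / (n_obs - len(panel) - 1)  # NT - N - K, K = 1
    default_se = (s2 / sxx) ** 0.5
    # Sum x_tilde * e_hat within each individual before squaring
    meat = sum(sum(a * e for a, e in zip(xd, r)) ** 2 for (xd, _), r in zip(dem, resid))
    cluster_se = (meat / sxx ** 2) ** 0.5
    return beta, default_se, cluster_se

rng = random.Random(5)
panel = []
for _ in range(500):
    alpha = rng.gauss(0, 1)
    xs = ar1_path(rng, 10, 0.9)    # persistent regressor
    eps = ar1_path(rng, 10, 0.9)   # serially correlated error
    panel.append((xs, [alpha + 1.0 * x + e for x, e in zip(xs, eps)]))

beta, default_se, cluster_se = within_with_se(panel)
print(beta, default_se, cluster_se)  # the cluster-robust se exceeds the default se here
```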
Example Arellano-Bond Results
Table: p-values – FE models w/ 2 lags of n; time dummies not presented

| Variable | Standard | (Cluster-) Robust | Bootstrap (500 rep.) |
|---|---|---|---|
| nL1 | 0.000 | 0.000 | 0.000 |
| nL2 | 0.000 | 0.027 | 0.032 |
| w | 0.000 | 0.001 | 0.001 |
| wL1 | 0.000 | 0.029 | 0.033 |
| k | 0.000 | 0.000 | 0.000 |
| kL1 | 0.002 | 0.032 | 0.028 |
| ys | 0.000 | 0.006 | 0.005 |
| ysL1 | 0.000 | 0.003 | 0.002 |
| ysL2 | 0.677 | 0.672 | 0.693 |
| Intercept | 0.000 | 0.005 | 0.003 |

Robust is interpreted as cluster-robust; the clustering variable is id, the panel $i$.
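Note: Variance Decomposition
- The total sum of squares of a series $x_{it}$ can be decomposed as
$$\sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x)^2 = \sum_{i=1}^N\sum_{t=1}^T \left[(x_{it} - \bar x_i) + (\bar x_i - \bar x)\right]^2 = \sum_{i=1}^N\sum_{t=1}^T (x_{it} - \bar x_i)^2 + \sum_{i=1}^N\sum_{t=1}^T (\bar x_i - \bar x)^2$$
as the cross-product term sums to zero (since $\sum_t (x_{it} - \bar x_i) = 0$ for each $i$).
- Dividing by $NT$, the total variance $s^2$ is
  - $s^2_w$, the within variance [individual deviations around the individual means, summed across individuals]
  - $+\ s^2_b$, the between variance [deviations of the individual means around the grand mean]
  - With the degrees-of-freedom corrections used in software output, this relation holds only approximately
- The between and within $R^2$ are defined similarly
  - $R^2$ is often small with panel data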
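A small numerical check of the decomposition for a balanced panel with $N = 3$, $T = 3$ (made-up values), using the $1/NT$ normalization under which the identity is exact; the between term reduces to $N^{-1}\sum_i (\bar x_i - \bar x)^2$.

```python
def variance_decomposition(panel):
    """Total, within and between variances of a balanced panel, normalized by NT (between: by N)."""
    n, t = len(panel), len(panel[0])
    grand = sum(sum(xs) for xs in panel) / (n * t)
    means = [sum(xs) / t for xs in panel]
    total = sum((v - grand) ** 2 for xs in panel for v in xs) / (n * t)
    within = sum((v - m) ** 2 for xs, m in zip(panel, means) for v in xs) / (n * t)
    between = sum((m - grand) ** 2 for m in means) / n
    return total, within, between

x = [[1.0, 2.0, 3.0],   # agent 1
     [4.0, 4.0, 7.0],   # agent 2
     [0.0, 1.0, 2.0]]   # agent 3
total, within, between = variance_decomposition(x)
print(total, within + between)  # both equal 4.0 up to rounding
```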
Fixed Effects vs. Random Effects
Fixed Effects vs. Random Effects Non-Test Elements of Choice
Causation
- The FE model can establish causation under weaker assumptions than those needed with
  - cross-section data
  - panel data models without fixed effects: pooled & RE models
- In some studies causation is clear, so RE may be appropriate
  - For example, in a controlled experiment causation is clear
  - e.g. crop yield from different amounts of fertilizer applied to different fields in an experiment
  - $x_i$ is assigned randomly to cases, and is thus uncorrelated with $\alpha_i$
- In other cases it may be sufficient to use an RE analysis to measure the extent of correlation
  - Determination of causation is left to other approaches
  - e.g. the effect of smoking on lung cancer
Causation
- Economists are unusual in preferring an FE approach, because of a desire to measure causation with observational rather than experimental data
  - There is the possibility that, instead of measuring causation, we measure only a spurious correlation due to unobserved variables that are correlated with the variables included in the regression
- FE eliminates those unobserved variables that are time-invariant by differencing, so that
  - the causal effect of $x$ on $y$ is measured by the association between individual changes in $y$ and in $x$
Fixed Effects Weaknesses in Practice
- Estimation of the coefficient of any time-invariant regressor is not possible with FE
- Coefficients of time-varying regressors are estimable, but may be imprecise if most of the variation in a regressor is cross-sectional rather than over time
  - as the within transformation then removes most of this variation
- Prediction of the conditional mean is not consistent, since the individual effects are not consistently estimated
  - Only changes in the conditional mean caused by changes in time-varying regressors can be predicted
- FE still requires the assumption that the unobservables $\alpha_i$ are time-invariant (no $\alpha_{it}$)
Fixed Effects vs. Random Effects Hausman Test
Reminder: Hausman Test
- Principle: if two estimators are consistent, then their difference should not be statistically different from zero, asymptotically
- Consider two estimators $\hat\theta$ and $\tilde\theta$ (in the same model)
  - We test $H_0: \operatorname{plim}(\hat\theta - \tilde\theta) = 0$ against $H_a: \operatorname{plim}(\hat\theta - \tilde\theta) \neq 0$
- Under $H_0$, the difference between the two estimators converges to a normal with zero mean:
$$\sqrt{N}(\hat\theta - \tilde\theta) \xrightarrow{d} \mathcal{N}[0, V_H]$$
  - where $V_H$ is the variance matrix in the limiting distribution
- Hausman test statistic
$$H = (\hat\theta - \tilde\theta)'\left(N^{-1}\hat V_H\right)^{-1}(\hat\theta - \tilde\theta)$$
  - asymptotically $\chi^2(q)$ under $H_0$, with $q = \dim(\theta)$
  - reject $H_0$ at level $\alpha$ if $H > \chi^2_\alpha(q)$
- The question in practice is to find an estimate $\hat V_H$ of $V_H$

Hausman Test for Panel Data
- If the individual effects are fixed
  - the within estimator $\hat\beta_W$ is consistent
  - the RE estimator $\tilde\beta_{RE}$ is inconsistent
  - $\beta$ here is the vector of coefficients of the time-varying regressors only
- Hausman test on the presence of fixed effects
  - $H_0$: no systematic difference between the coefficient estimates
  - If it holds, prefer RE, as it is more efficient
  - In principle, maybe not if errors are I(1)
  - Works on any pair of estimators with similar properties
  - e.g. first differences versus pooled OLS
Hausman Test for Panel Data
- A large value of $H$ leads to rejection of the null hypothesis
  - We infer that since $\hat\beta_W$ is consistent, if $\tilde\beta_{RE}$ is much different, it must be inconsistent
  - So the individual-specific effects are correlated with the regressors
- It may still be possible to avoid using an FE estimator
  - If the regressors are correlated with the individual-specific effects because of omitted variables
  - then maybe add further regressors
  - It may be possible to estimate an RE model using instrumental variables methods (Ch. 2)
Hausman Test Computation When RE IS Fully Efficient
- Assume the true model is the RE model with
  - $\alpha_i$ iid $[0, \sigma^2_\alpha]$, uncorrelated with the regressors
  - error $\varepsilon_{it}$ iid $[0, \sigma^2_\varepsilon]$
- Then $\tilde\beta_{RE}$ is fully efficient, and the Hausman test statistic simplifies to
$$H = (\tilde\beta_{1,RE} - \hat\beta_{1,W})'\left(\hat V[\hat\beta_{1,W}] - \hat V[\tilde\beta_{1,RE}]\right)^{-1}(\tilde\beta_{1,RE} - \hat\beta_{1,W})$$
- where $\beta_1$ denotes the subcomponent of $\beta$ corresponding to time-varying regressors
  - since only that component can be estimated by the within estimator
  - This test statistic is asymptotically $\chi^2(\dim[\beta_1])$ under $H_0$
- Very easy, since the $\hat V$ matrices are regular outputs of the estimation
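With a single time-varying coefficient ($\dim[\beta_1] = 1$), the simple-form statistic is a scalar and its $\chi^2(1)$ p-value can be computed from the normal distribution. The estimates and variances below are hypothetical numbers, not results from the Arellano-Bond data. A non-positive $\hat V[\hat\beta_{1,W}] - \hat V[\tilde\beta_{1,RE}]$ makes the simple form unusable, which this sketch reports as an error.

```python
import math

def chi2_1_sf(h):
    """Upper-tail probability P(chi2(1) > h) = P(|Z| > sqrt(h)) for standard normal Z."""
    return math.erfc(math.sqrt(h / 2.0))

def hausman_scalar(b_re, b_w, var_re, var_w):
    """Simple-form Hausman statistic for one coefficient (valid only if RE is fully efficient)."""
    dv = var_w - var_re
    if dv <= 0:
        raise ValueError("V[b_W] - V[b_RE] is not positive: simple form unusable")
    h = (b_re - b_w) ** 2 / dv
    return h, chi2_1_sf(h)

# Hypothetical estimates and variances, for illustration only
h, p = hausman_scalar(b_re=0.50, b_w=0.60, var_re=0.0010, var_w=0.0014)
print(h, p)  # H is about 25, far above the 5% critical value 3.84
```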
Hausman Test When RE IS NOT Fully Efficient
- The above simple form of the Hausman test is invalid if $\alpha_i$ or $\varepsilon_{it}$ are not iid
  - e.g. with the heteroskedasticity inherent in much microeconometrics data
- Then the RE estimator is not fully efficient under the null hypothesis
- The expression $\hat V[\hat\beta_{1,W}] - \hat V[\tilde\beta_{1,RE}]$ in the formula for $H$ needs to be replaced by the more general $\hat V[\tilde\beta_{1,RE} - \hat\beta_{1,W}]$
  - That is NOT implemented in Stata
  - For short panels this variance matrix can be consistently estimated by bootstrap resampling over $i$
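Hausman Test When RE IS NOT Fully Efficient 2
- A panel-robust Hausman test statistic is
$$H_{Robust} = (\tilde\beta_{1,RE} - \hat\beta_{1,W})'\left(\hat V_{boot}[\tilde\beta_{1,RE} - \hat\beta_{1,W}]\right)^{-1}(\tilde\beta_{1,RE} - \hat\beta_{1,W})$$
  - where
$$\hat V_{boot}[\tilde\beta_{1,RE} - \hat\beta_{1,W}] = \frac{1}{B-1}\sum_{b=1}^B (\hat\delta_b - \bar{\hat\delta})(\hat\delta_b - \bar{\hat\delta})'$$
  - $\hat\delta_b$ is the $b$-th of $B$ bootstrap replications of $\hat\delta = \tilde\beta_{1,RE} - \hat\beta_{1,W}$, and $\bar{\hat\delta}$ is their average
- This test statistic can
  - be applied to subcomponents of $\beta_1$
  - use other estimators, such as $\tilde\beta_{1,POLS}$ in place of $\tilde\beta_{1,RE}$ and $\hat\beta_{1,FD}$ in place of $\hat\beta_{1,W}$
- There are user implementations available on the Internet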
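A sketch of the bootstrap version for one coefficient, using the substitution mentioned above: pooled OLS in place of RE (both are consistent when $\alpha_i$ is independent of $x_{it}$), compared with the within estimator. Whole individuals (their full time series) are resampled with replacement, $B = 200$ times. The two simulated designs, with and without correlation between $\alpha_i$ and $x_{it}$, are assumptions for illustration.

```python
import random

def pooled_slope(panel):
    """Pooled OLS slope of y on a constant and x."""
    x = [a for xs, _ in panel for a in xs]
    y = [b for _, ys in panel for b in ys]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

def within_slope(panel):
    """Within (FE) slope: OLS on deviations from individual means."""
    num = den = 0.0
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        den += sum((a - mx) ** 2 for a in xs)
    return num / den

def bootstrap_hausman(panel, n_boot=200, seed=0):
    """H_Robust for one coefficient: delta_hat^2 / V_boot[delta_hat], delta = pooled - within.
    Resamples whole individuals with replacement."""
    rng = random.Random(seed)
    delta = pooled_slope(panel) - within_slope(panel)
    draws = []
    for _ in range(n_boot):
        sample = [panel[rng.randrange(len(panel))] for _ in range(len(panel))]
        draws.append(pooled_slope(sample) - within_slope(sample))
    mean = sum(draws) / n_boot
    v_boot = sum((d - mean) ** 2 for d in draws) / (n_boot - 1)
    return delta ** 2 / v_boot

def simulate(n_agents, corr, seed):
    """x_it = corr * alpha_i + noise; corr = 0 gives alpha_i independent of x_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_agents):
        alpha = rng.gauss(0, 1)
        xs = [corr * alpha + rng.gauss(0, 1) for _ in range(4)]
        panel.append((xs, [alpha + x + rng.gauss(0, 1) for x in xs]))
    return panel

h_correlated = bootstrap_hausman(simulate(300, corr=1.0, seed=6))
h_independent = bootstrap_hausman(simulate(300, corr=0.0, seed=7))
# Compare each with the chi2(1) 5% critical value, 3.84
print(h_correlated, h_independent)
```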
Example Arellano-Bond Results
- How it works in Stata
- e.g. to compare FE & RE
  - `xtreg ..., fe`
  - `estimates store EstimEF`
  - `xtreg ..., re`
  - `hausman EstimEF .`
  - Take care to insert the final dot, which means "last estimates computed"
  - Menu: Stat → Postestimation → Tests → Hausman
- If you try to use vce(robust) or any option other than the default
  - an error message results
  - That is fair, as Stata only implements the "fully efficient" version of Hausman
Example Arellano-Bond Results
- The output is fairly complete
  - `Test: Ho: difference in coefficients not systematic`
  - `chi2(15) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 169.57`
  - `Prob>chi2 = 0.0000 (V_b-V_B is not positive definite)`
  - The last remark is probably because the differences between some variances are machine-zero
  - So what conclusion?
- The two estimators must have the same number of coefficient estimates
  - It may be necessary to remove time-invariant regressors from FE