Dynamic Panel Data Ch 1. Reminder on Linear Non Dynamic Models

Academic year: 2022


(1)

Dynamic Panel Data

Ch 1. Reminder on Linear Non Dynamic Models

Pr. Philippe Polomé, Université Lumière Lyon 2

M2 EcoFi

2016 – 2017

(2)

Overview of Ch. 1

- Data
- Panel Data Models
- Panel Data Estimators
  - Within Estimator
  - First-Differences Estimator
  - Random Effects Estimator
  - Fixed vs. Random Effects
- Panel Data Inference
  - Panel-Robust Inference
- Fixed Effects vs. Random Effects
  - Non-Test Elements of Choice
  - Hausman Test
- Unbalanced Panel Data

(3)

Data


(4)

Panel Data

- $i = 1, \dots, N$: agent (individual, firm, country...)
- $t = 1, \dots, T$: time
  - Generally $T_i$, the number of periods, differs from agent to agent
  - Unbalanced panel (this is the norm)
  - Attrition: the property that agents drop out of the sample
  - To simplify notation, the theory uses $T$
  - But all computer packages handle $T_i$, so there is no need to balance your sample
- $y_{it}$: one observation of the dependent variable $y$
- $x_{it}$: one observation of the $K \times 1$ vector of independent variables
  - "regressors"
  - possibly endogenous (Ch. 2)

(5)

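Data management

obs          agent i   time t   y      x_1      ...   x_K
1            1         1        y_11   x_111    ...   x_K11
...          ...       ...      ...    ...            ...
t            1         t        y_1t   x_11t    ...   x_K1t
...          ...       ...      ...    ...            ...
T            1         T        y_1T   x_11T    ...   x_K1T
T+1          2         1        y_21   x_121    ...   x_K21
...          ...       ...      ...    ...            ...
(i-1)T+t     i         t        y_it   x_1it    ...   x_Kit
...          ...       ...      ...    ...            ...
NT           N         T        y_NT   x_1NT    ...   x_KNT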
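The layout above can be mirrored in code. The following is a minimal sketch. It assumes a tiny, hypothetical long-format panel stored as a list of `(i, t, y, x1)` tuples; all names and values are illustrative only. Rows are sorted by agent, then by period. Counting rows per agent gives each $T_i$, which immediately shows whether the panel is balanced.

```python
from collections import Counter

# Hypothetical long-format panel, one row per (agent i, period t), sorted by i then t.
# Agent 2 drops out after t = 2 (attrition), so the panel is unbalanced.
rows = [
    # (i, t, y, x1)
    (1, 1, 2.0, 0.5), (1, 2, 2.4, 0.7), (1, 3, 2.9, 1.1),
    (2, 1, 1.1, 0.2), (2, 2, 1.3, 0.4),
    (3, 1, 3.0, 1.5), (3, 2, 3.1, 1.4), (3, 3, 3.6, 1.9),
]

# Check the sort order used in the layout: by agent, then by time.
assert rows == sorted(rows, key=lambda r: (r[0], r[1]))

T_i = Counter(i for i, t, *_ in rows)       # number of periods observed per agent
N = len(T_i)                                 # number of agents
balanced = len(set(T_i.values())) == 1       # True only if every agent has the same T_i

print(dict(T_i), N, balanced)                # {1: 3, 2: 2, 3: 3} 3 False
```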


(7)

Panel Data Models

Typical Linear Panel Data Model

- The typical panel data model
  $$y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + u_{it} \qquad (1)$$
  where
  - $u_{it}$ is a scalar disturbance term
  - intercepts $\alpha_i$ vary across agents
  - intercepts $\gamma_t$ vary over time
  - slopes $\beta$ are constant

(8)

Typical Linear Panel Data Model

- A mathematically proper way to write this model is
  $$y_{it} = \sum_{j=1}^{N} \alpha_j d_{j,it} + \sum_{s=2}^{T} \gamma_s d_{s,it} + x_{it}'\beta + u_{it}$$
  where
  - the $N$ individual dummies are $d_{j,it} = 1$ if $i = j$, and $0$ otherwise
  - the $T-1$ time dummies are $d_{s,it} = 1$ if $t = s$, and $0$ otherwise
- $x_{it}$ does not include an intercept
  - If an intercept is included, then one of the $N$ individual dummies must be dropped
  - Many packages do that automatically

(9)

Time dummies

- Focus on short panels, where $N \to \infty$ but $T$ does not
  - Then the time intercepts $\gamma_t$ can be consistently estimated, since there is only a finite number ($T-1$) of them
  - The $T-1$ time dummies are simply incorporated into the regressors $x_{it}$
  - We do not discuss them further
- "Long" panels are treated using time-series methods
  - The panel dimension is abandoned

(10)

Individual dummies

- If we inserted the full set of $N$ individual dummies $d_{j,it}$
  - it would cause problems as $N \to \infty$
  - We cannot consistently estimate an infinite number of parameters
  - Information on each $\alpha_i$ does not increase as $N$ increases
- Challenge: estimating the parameters $\beta$
  - consistently
  - while controlling for the $N$ individual intercepts $\alpha_i$
- In this sense, the $\alpha_i$ are not the focus of the regression
  - They represent individual unobservables that do not have much interpretation
  - They are nuisance parameters
  - We are not interested in them, but we must find a way to deal with them

(11)

Individual-Specific Effects Model

- Individual-specific effects model
  $$y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it} \qquad (2)$$
  where $\varepsilon_{it}$ is iid over $i$ and $t$
- A more parsimonious way to express model (1) with all the dummies
  - Time dummies may be included in the regressors $x_{it}$
  - This is the "standard" linear non-dynamic panel data model
  - Non-dynamic: no lagged dependent variable $y_{i,t-s}$ in $x_{it}$
- The $\alpha_i$ are random variables
  - They capture unobserved heterogeneity, i.e. unobserved time-invariant individual characteristics
  - In effect, this is a random parameter model

(12)

Reminder : Unobserved Heterogeneity

- The correct model is
  $$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
- But the estimated model is
  $$Y = \beta_0 + \beta_1 x_1 + \nu$$
- The effect of the missing regressor on $Y$ is absorbed into the error of the estimated model: $\nu = \beta_2 x_2 + \varepsilon$
  - This is unobserved heterogeneity: unobserved (individual) factors influence the LHS variable
- If the missing regressor is correlated with an included regressor
  - then $\nu$ is correlated with at least one included regressor
  - and LS is inconsistent
- Furthermore, there may be:
  - heteroskedasticity if $\mathrm{var}(x_{2t}) \neq \mathrm{var}(x_{2s})$, $t \neq s$
  - autocorrelation if $\mathrm{corr}(x_{2t}, x_{2s}) \neq 0$, $t \neq s$
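A quick simulation makes the inconsistency concrete. The following is a minimal sketch under illustrative assumptions: $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = 3$, and an omitted regressor built as $x_2 = 0.5x_1 + $ noise. The short regression of $Y$ on $x_1$ then converges to $\beta_1 + \beta_2\,\mathrm{cov}(x_1,x_2)/\mathrm{var}(x_1) = 3.5$, not to $\beta_1 = 2$.

```python
import random

random.seed(0)
n = 20000
b0, b1, b2 = 1.0, 2.0, 3.0                               # illustrative true coefficients
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]          # omitted, correlated with x1
y = [b0 + b1 * a + b2 * c + random.gauss(0, 1) for a, c in zip(x1, x2)]

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

b1_short = ols_slope(x1, y)     # estimated model: x2 is left in the error nu
# plim = b1 + b2 * cov(x1, x2) / var(x1) = 2 + 3 * 0.5 = 3.5
print(f"slope on x1 in the short regression: {b1_short:.3f}")
```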

(13)

Reminder : Unobserved Heterogeneity

[Figure: same slopes]

(14)

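Exogeneity

- Throughout this chapter, assume strong (strict) exogeneity:
  $$E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0, \quad t = 1, \dots, T \qquad (3)$$
- So $\varepsilon_{it}$ is assumed to have mean zero conditional on past, current, and future values of the regressors
  - This implies zero covariance between $\varepsilon_{it}$ and the regressors of all periods
  - Nothing is said about the relation between the random term $\alpha_i$ and $x_i$
- Strong exogeneity rules out models with lagged dependent variables or with endogenous variables as regressors (Ch. 2)
  - Take $y_{it} = \alpha_i + x_{it}'\beta + \rho y_{i,t-1} + \varepsilon_{it}$
  - Then $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \rho y_{i,t-2} + \varepsilon_{i,t-1}$
  - The regressor $y_{i,t-1}$ contains $\varepsilon_{i,t-1}$, so $\varepsilon_{i,t-1}$ cannot have mean zero conditional on the regressors of all periods
  - Moreover, it is often hard to maintain that $E(\varepsilon_{it}\varepsilon_{i,t-1}) = 0$; if it fails, $\varepsilon_{it}$ is even correlated with the current regressor $y_{i,t-1}$
  - Strong exogeneity does not hold in dynamic models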
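To see what goes wrong when strict exogeneity fails, the following minimal sketch simulates a pure AR(1) panel. It uses illustrative values: $y_{it} = \alpha_i + \rho y_{i,t-1} + \varepsilon_{it}$ with $\rho = 0.5$, no $x$, $T = 5$, $N = 2000$, and a burn-in so the process is roughly stationary. The within estimator of $\rho$ turns out strongly biased downward. This happens because the demeaned lag contains the individual mean of past $y$, which is correlated with the demeaned error. This short-$T$ bias, usually called the Nickell bias, is treated with the dynamic models of Ch. 2.

```python
import random

random.seed(1)
N, T, rho = 2000, 5, 0.5

pairs = []  # (demeaned y_it, demeaned y_{i,t-1}) for all i and t = 1..T
for i in range(N):
    alpha_i = random.gauss(0, 1)
    y = alpha_i / (1 - rho)          # start near the individual's long-run mean
    for _ in range(50):              # burn-in so the process is roughly stationary
        y = alpha_i + rho * y + random.gauss(0, 1)
    path = [y]
    for _ in range(T):
        path.append(alpha_i + rho * path[-1] + random.gauss(0, 1))
    y_cur, y_lag = path[1:], path[:-1]          # periods t = 1..T and t-1 = 0..T-1
    m_cur, m_lag = sum(y_cur) / T, sum(y_lag) / T
    pairs += [(a - m_cur, b - m_lag) for a, b in zip(y_cur, y_lag)]

# Within (FE) estimate of rho: OLS of demeaned y on demeaned lagged y.
rho_within = sum(a * b for a, b in pairs) / sum(b * b for _, b in pairs)
print(f"within estimate of rho: {rho_within:.3f} (true rho = {rho})")
```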

(15)

Fixed Effects Model

- Two variants of model (2), according to the hypotheses on $\alpha_i$
  - Both are models with "two" errors, $\alpha_i$ and $\varepsilon_{it}$: error component models
  - Both variants treat $\alpha_i$ as an unobserved random variable
- Variant 1 of model (2): fixed effects (FE) model
  - $\alpha_i$ is potentially correlated with the (time-invariant part of the) observed regressors $x_{it}$
  - A form of unobserved heterogeneity
  - Called "fixed" because early treatments treated the $\alpha_i$ as (non-random) parameters to be estimated

(16)

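Random Effects Model

- Variant 2 of model (2): random effects (RE) model
  - $\alpha_i$ is distributed independently of $x$
  - It usually makes the additional assumption that both the random effects $\alpha_i$ and the error term $\varepsilon_{it}$ in (2) are iid:
  $$\alpha_i \sim [\alpha, \sigma^2_\alpha], \qquad \varepsilon_{it} \sim [0, \sigma^2_\varepsilon] \qquad (4)$$
- No distribution has been specified in (4), only the mean and the variance
- $\varepsilon_{it}$ may nonetheless show autocorrelation
  - Often $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0$ is allowed
  - while both $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ and $\mathrm{cov}(\alpha_i, \alpha_j) = 0$, $i \neq j$, are assumed
  - except in spatial models
- $\alpha$ can be treated as the intercept of the model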

(17)

Other names for the Random Effects Model

- One-way individual-specific effects model
  - Two-way: inclusion of time dummies or time-specific random effects
- Random intercept model
  - To distinguish the model from more general random effects models, e.g. random slopes
- Random components model
  - Because the error term is $\alpha_i + \varepsilon_{it}$

(18)

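Equicorrelated Random Effects Model

- The RE model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - can be viewed as a regression of $y_{it}$ on $x_{it}$
  - with composite error term $u_{it} = \alpha_i + \varepsilon_{it}$
- The RE hypothesis (4) ($\alpha_i$ and $\varepsilon_{it}$ iid) implies
  $$\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \begin{cases} \sigma^2_\alpha, & t \neq s \\ \sigma^2_\alpha + \sigma^2_\varepsilon, & t = s \end{cases} \qquad (5)$$
- The RE model thus imposes the constraint that the composite error $u_{it}$ is equicorrelated
  - since $\mathrm{Cor}[u_{it}, u_{is}] = \sigma^2_\alpha / (\sigma^2_\alpha + \sigma^2_\varepsilon)$ for $t \neq s$ does not vary with the time difference $t - s$
  - The RE model is therefore also called the equicorrelated model or exchangeable errors model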
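Equation (5) can be checked directly. The following is a minimal sketch under assumed, illustrative variance components ($\sigma^2_\alpha = 0.5$, $\sigma^2_\varepsilon = 1.5$, $T = 4$). It builds the $T \times T$ covariance block of $u_{it}$ for one agent and collects the correlation for every pair $t \neq s$. Only one value appears, whatever the gap $|t - s|$.

```python
import math

sigma2_alpha, sigma2_eps, T = 0.5, 1.5, 4   # illustrative variance components

# T x T covariance of u_it = alpha_i + eps_it for one agent, from (5).
Sigma_i = [[sigma2_alpha + (sigma2_eps if t == s else 0.0) for s in range(T)]
           for t in range(T)]

# Correlation for every pair t != s: the same value whatever the gap |t - s|.
corrs = {Sigma_i[t][s] / math.sqrt(Sigma_i[t][t] * Sigma_i[s][s])
         for t in range(T) for s in range(T) if t != s}

print(corrs)   # {0.25}, i.e. sigma2_alpha / (sigma2_alpha + sigma2_eps)
```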

(19)

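Synthesis of Panel Data Models

- Fixed-effects model: $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ (2), with $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$
- Random-effects model: (2) with $\alpha_i \sim [\alpha, \sigma^2_\alpha]$ and $\varepsilon_{it} \sim [0, \sigma^2_\varepsilon]$ (4)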


(21)

Panel Data Estimators

- Three commonly used panel data estimators of $\beta$
  - In this non-dynamic, no-endogeneity context: LS variants
  - They differ in the extent to which cross-section and time-series variation in the data are used
  - Their properties vary according to which model is appropriate
- A regressor $x_{it}$ may be time-invariant
  - $x_{it} = x_i$ for $t = 1, \dots, T$
  - so that $\bar{x}_i = T^{-1}\sum_t x_{it} = x_i$
  - For some estimators, only the coefficients of time-varying regressors are identified

(22)

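Variance Matrix

- For a given $i$, we expect correlation in $y$ over time:
  - $\mathrm{Cor}[y_{it}, y_{is}]$ is high
  - Even after inclusion of regressors, $\mathrm{Cor}[u_{it}, u_{is}]$ may remain $\neq 0$
  - Call $\mathrm{Cov}[u_{it}, u_{is}] = \sigma_{its}$; when $t = s$, $\sigma_{its} = \sigma^2_{it}$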

(23)

Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$

$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma^2_{i1} & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma^2_{i2} & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \text{SYM} & & & \sigma^2_{iT} \end{pmatrix}$$

(24)

Variance Matrix

- The RE model (partly) accommodates this correlation
  - From (5): $\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \sigma^2_\alpha$ for $t \neq s$, and $\sigma^2_\alpha + \sigma^2_\varepsilon$ for $t = s$
- OLS output treats each of the $T$ years as independent information, but
  - the information content is less than this, given the positive error correlation
  - so OLS output tends to overstate estimator precision
- Always use panel-corrected standard errors when OLS is applied in a panel
  - Many corrections are possible, depending on the assumed correlation and heteroskedasticity, and on whether the panel is short or long
  - The default output is not panel-corrected

(25)

Panel Data Estimators Within Estimator


(26)

Within Model

- Principle: individual-specific deviations of the dependent variable from its time-averaged value
  - are explained by
  - individual-specific deviations of the regressors from their time-averaged values
- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over time: $\bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{\varepsilon}_i$
  - Subtract: the $\alpha_i$ terms cancel, giving the within model
  $$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i), \quad i = 1, \dots, N, \; t = 1, \dots, T \qquad (6)$$

(27)

Within / Fixed Effects Estimator

- Within estimator = OLS estimator on $y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$
  - Consistent for $\beta$ in the FE model
- Called the fixed effects estimator by analogy with the FE model
  - This does not imply that the $\alpha_i$ are fixed
- Each $i$ must be observed at least twice in the sample
  - Otherwise $x_{it} - \bar{x}_i = 0$
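The following minimal sketch simulates the FE case with illustrative values $N = 500$, $T = 5$, $\beta = 1$. The regressor is built as $x_{it} = \alpha_i + $ noise, so it is correlated with $\alpha_i$. Pooled OLS then drifts to $1.5$, while OLS on the within model (6) recovers $\beta$. The data-generating process and all names are assumptions made for the example.

```python
import random

random.seed(2)
N, T, beta = 500, 5, 1.0

data = []                                    # (i, x_it, y_it)
for i in range(N):
    alpha_i = random.gauss(0, 1)
    for t in range(T):
        x = alpha_i + random.gauss(0, 1)     # x_it correlated with alpha_i: FE case
        y = alpha_i + beta * x + random.gauss(0, 1)
        data.append((i, x, y))

def slope(pairs):
    """OLS slope (with intercept) of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return (sum((x - mx) * (y - my) for x, y in pairs)
            / sum((x - mx) ** 2 for x, _ in pairs))

# Individual time averages x_bar_i and y_bar_i.
totals = {}
for i, x, y in data:
    sx, sy, n = totals.get(i, (0.0, 0.0, 0))
    totals[i] = (sx + x, sy + y, n + 1)
means = {i: (sx / n, sy / n) for i, (sx, sy, n) in totals.items()}

pooled = slope([(x, y) for _, x, y in data])            # ignores alpha_i
within = slope([(x - means[i][0], y - means[i][1])       # OLS on the within model (6)
                for i, x, y in data])

print(f"pooled OLS: {pooled:.3f}   within: {within:.3f}   (true beta = 1)")
```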

(28)

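Consistency of Fixed Effects Estimator

- FE treats the $\alpha_i$ as nuisance parameters
  - They can be ignored when interest lies in $\beta$
  - They do not need to be consistently estimated to obtain consistent estimates of the slope parameters
- Consistency further requires $E[\varepsilon_{it} - \bar{\varepsilon}_i \mid x_{it} - \bar{x}_i] = 0$ in the within model $y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$
  - Because of the averages, this requires more than $E[\varepsilon_{it} \mid x_{it}] = 0$: $\bar{\varepsilon}_i$ involves the errors of all periods, and $\bar{x}_i$ the regressors of all periods
  - It requires the strict exogeneity assumption (3): $E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0$, $t = 1, \dots, T$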

(29)

Fixed Effects Estimates

- If the fixed effects $\alpha_i$ are of interest, they can also be estimated
- If $N$ is not too large, an alternative way to compute Within is
  - Least-Squares Dummy Variable (LSDV) estimation
  - Directly estimate $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ by OLS of $y_{it}$ on $x_{it}$ and $N$ individual dummy variables
  - This yields the Within estimator for $\beta$
  - along with estimates of the $N$ fixed effects: $\hat{\alpha}_i = \bar{y}_i - \bar{x}_i'\hat{\beta}$
  - $\hat{\alpha}_i$ is an unbiased estimator of $\alpha_i$
  - But in short (small $T$) panels the $\hat{\alpha}_i$ are always inconsistent, because information never accumulates for them
  - Their distribution, or their variation with a key variable, may still be informative
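The following minimal sketch illustrates why $\hat{\alpha}_i$ is unbiased but not consistent in short panels. It simulates an FE panel with illustrative values $T = 4$, $\sigma^2_\varepsilon = 1$, and computes $\hat{\alpha}_i = \bar{y}_i - \bar{x}_i\hat{\beta}$ from the within slope. The estimation error averages to about zero. Its variance, however, stays near $\sigma^2_\varepsilon / T = 0.25$ however large $N$ is made, since each $\alpha_i$ only ever uses its own $T$ observations.

```python
import random

random.seed(3)
N, T, beta = 2000, 4, 1.0
alpha = [random.gauss(0, 1) for _ in range(N)]
x = [[alpha[i] + random.gauss(0, 1) for _ in range(T)] for i in range(N)]
y = [[alpha[i] + beta * xit + random.gauss(0, 1) for xit in x[i]] for i in range(N)]

xbar = [sum(row) / T for row in x]
ybar = [sum(row) / T for row in y]

# Within estimator of beta (numerically the same slope as LSDV).
num = sum((x[i][t] - xbar[i]) * (y[i][t] - ybar[i]) for i in range(N) for t in range(T))
den = sum((x[i][t] - xbar[i]) ** 2 for i in range(N) for t in range(T))
beta_w = num / den

# LSDV fixed-effect estimates: alpha_hat_i = ybar_i - xbar_i * beta_w.
alpha_hat = [ybar[i] - xbar[i] * beta_w for i in range(N)]

# Estimation error per agent. Its variance stays close to sigma_eps^2 / T = 0.25
# whatever N is, because each alpha_i is estimated from its own T observations only.
err = [a_hat - a for a_hat, a in zip(alpha_hat, alpha)]
mean_err = sum(err) / N
var_err = sum((e - mean_err) ** 2 for e in err) / (N - 1)

print(f"beta_w = {beta_w:.3f}, mean error = {mean_err:.3f}, error variance = {var_err:.3f}")
```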

(30)

Time-Invariant Regressors

- Major limitation of Within
  - The coefficients of time-invariant regressors are not identified
  - Since if $x_{it} = x_i$, then $\bar{x}_i = x_i$, so $x_{it} - \bar{x}_i = 0$
- Many studies seek to estimate the effect of time-invariant regressors
  - For example, in panel wage regressions: the effect of gender or race
- For this reason, many practitioners prefer not to use the within estimator
- The RE estimator permits estimation of the coefficients of time-invariant regressors
  - but its estimates are inconsistent if the FE model is the correct model
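The identification problem can be seen on a single agent. In the following sketch, the hypothetical series `z` is time-invariant (think of a gender dummy) and `x` varies over time; the values are made up. After the within transformation, `z` is identically zero, so it carries no information about its coefficient.

```python
# One agent observed T = 4 times (hypothetical values).
z = [1.0, 1.0, 1.0, 1.0]      # time-invariant regressor, e.g. a gender dummy
x = [0.3, 0.9, 1.4, 2.0]      # time-varying regressor

def demean(v):
    """Within transformation for one agent: subtract the agent's time average."""
    m = sum(v) / len(v)
    return [a - m for a in v]

z_within = demean(z)
x_within = demean(x)

print(z_within)   # [0.0, 0.0, 0.0, 0.0]: no variation left to identify z's coefficient
```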

(31)

Panel Data Estimators First-Differences Estimator


(32)

First-Differences Model

- Principle: individual-specific one-period changes in the dependent variable
  - are explained by
  - individual-specific one-period changes in the regressors
- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Lag one period: $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$
  - Subtract: the first-differences model
  $$y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \quad i = 1, \dots, N, \; t = 2, \dots, T \qquad (7)$$

(33)

First-Differences Estimator

- The first-differences estimator D1 is OLS in the first-differences model (7)
- It gives consistent estimates of $\beta$ in the FE model
  - The coefficients of time-invariant regressors are not identified
- D1 is less efficient than Within
  - if $\varepsilon_{it}$ is iid (for $T > 2$)
- However, it may safeguard against I(1) / unit-root variables
  - that would otherwise lead to inconsistency
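The following minimal sketch applies D1 to simulated FE data with illustrative values $N = 1000$, $T = 5$, $\beta = 1$, and $x_{it}$ correlated with $\alpha_i$. It differences each agent's series, which removes $\alpha_i$, and then runs OLS without an intercept, as in (7). The simulation design is an assumption for illustration only.

```python
import random

random.seed(4)
N, T, beta = 1000, 5, 1.0

dx, dy = [], []                                # first differences, t = 2..T as in (7)
for i in range(N):
    alpha_i = random.gauss(0, 1)
    x = [alpha_i + random.gauss(0, 1) for _ in range(T)]   # FE case: x depends on alpha_i
    y = [alpha_i + beta * xt + random.gauss(0, 1) for xt in x]
    for t in range(1, T):
        dx.append(x[t] - x[t - 1])             # alpha_i cancels in the difference
        dy.append(y[t] - y[t - 1])

# D1: OLS of dy on dx, no intercept since (7) has none.
beta_d1 = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
print(f"first-differences estimate: {beta_d1:.3f} (true beta = 1)")
```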


(35)

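Panel Data Estimators Random Effects Estimator

Random Effects Model

- Individual-specific effects model (2): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Assume the RE model, with iid $\alpha_i$ and $\varepsilon_{it}$ as in RE hypothesis (4):
  $$\alpha_i \sim [\alpha, \sigma^2_\alpha], \qquad \varepsilon_{it} \sim [0, \sigma^2_\varepsilon]$$
- OLS would be consistent
  - but GLS will be more efficient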

(36)

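Reminder : GLS in a cross-section

- When all the hypotheses of the linear model are satisfied but the error covariance matrix is not (proportional to) the identity, then
  - OLS is consistent
  - but it is not efficient if we know $\Sigma$
- Consider the classical linear (cross-section) model $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - Let $P'P = \Sigma^{-1}$
  - e.g. from the unique Cholesky decomposition of the real positive definite matrix $\Sigma^{-1}$
  - Premultiply the linear model by $P$: $Py = PX\beta + P\varepsilon$, i.e. $y^* = X^*\beta + \varepsilon^*$
  - Then
  $$\mathrm{Var}(\varepsilon^*) = E(P\varepsilon\varepsilon'P') = P\,E(\varepsilon\varepsilon')\,P' = P\Sigma P' = P(P'P)^{-1}P' = PP^{-1}(P')^{-1}P' = I$$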

(37)

Reminder : GLS in a cross-section

- So the transformed model has spherical disturbances
  - Applying OLS to the transformed data gives an efficient estimator
  - That is GLS
- Since $\Sigma$ is unknown in practice, we need an estimate
  - Any consistent estimate $\hat{\Sigma}$ of $\Sigma$ yields a Feasible (consistent) GLS estimator
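The whitening step can be checked numerically. The following is a minimal sketch in plain Python, assuming a small equicorrelated $3 \times 3$ block with $\sigma^2_\alpha = 0.5$ and $\sigma^2_\varepsilon = 1.5$ as $\Sigma$. For convenience it takes $P = L^{-1}$, where $\Sigma = LL'$ is the Cholesky factor of $\Sigma$ itself. This $P$ satisfies $P'P = (LL')^{-1} = \Sigma^{-1}$, but it is a different choice from the Cholesky factor of $\Sigma^{-1}$ mentioned above. Any $P$ with $P'P = \Sigma^{-1}$ gives $P\Sigma P' = I$.

```python
import math

def cholesky(A):
    """Lower-triangular L with A = L L' (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def invert_lower(L):
    """Inverse of a lower-triangular matrix by forward substitution, column by column."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):                       # solve L v = e_c; v is column c of L^{-1}
        for i in range(n):
            rhs = (1.0 if i == c else 0.0) - sum(L[i][k] * inv[k][c] for k in range(i))
            inv[i][c] = rhs / L[i][i]
    return inv

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

# Illustrative equicorrelated block: sigma2_alpha = 0.5, sigma2_eps = 1.5, T = 3.
Sigma = [[0.5 + (1.5 if t == s else 0.0) for s in range(3)] for t in range(3)]

P = invert_lower(cholesky(Sigma))                      # P = L^{-1}, so P'P = Sigma^{-1}
whitened = matmul(matmul(P, Sigma), transpose(P))      # P Sigma P', should be I
max_dev = max(abs(whitened[i][j] - (i == j)) for i in range(3) for j in range(3))
print(f"max |P Sigma P' - I| = {max_dev:.2e}")
```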

(38)

RE Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$

$$\Sigma = \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{pmatrix}$$

(39)

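Random Effects Estimator

- The feasible GLS estimator of the RE model can be calculated by OLS estimation of the transformed model
  $$y_{it} - \hat{\theta}\bar{y}_i = (1 - \hat{\theta})\mu + (x_{it} - \hat{\theta}\bar{x}_i)'\beta + \nu_{it} \qquad (8)$$
  where $\nu_{it} = (1 - \hat{\theta})\alpha_i + (\varepsilon_{it} - \hat{\theta}\bar{\varepsilon}_i)$ is asymptotically iid, and $\hat{\theta}$ is consistent for
  $$\theta = 1 - \sqrt{\frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + T\sigma^2_\alpha}} \qquad (9)$$
- Called the RE estimator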

(40)

Random Effects Estimator

- The nonrandom scalar intercept $\mu$ is added to normalize the random effects $\alpha_i$ to have zero mean
  - so that $\mu$ plays the role of $\alpha$ in the RE hypothesis (4)
- Cameron & Trivedi provide a derivation of (8) and ways to estimate $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$, and hence to estimate $\theta$
  - Not detailed here
- Note
  - $\hat{\theta} = 0$ corresponds to pooled OLS
  - $\hat{\theta} = 1$ corresponds to within estimation
  - $\hat{\theta} \to 1$ as $T \to \infty$ (see (9))
- This is a two-step estimator of $\beta$
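The following minimal sketch evaluates $\theta$ from (9) and runs the quasi-demeaned regression (8) on simulated RE data. It assumes illustrative values $\mu = 2$, $\beta = 1$, $\sigma^2_\alpha = \sigma^2_\varepsilon = 1$, $T = 4$, $N = 1000$. For simplicity it plugs in the true variance components, so it is the infeasible GLS version. It is not the two-step feasible estimator, which would first estimate $\sigma^2_\alpha$ and $\sigma^2_\varepsilon$.

```python
import math
import random

def theta(sigma2_alpha, sigma2_eps, T):
    """RE quasi-demeaning weight theta from (9)."""
    return 1 - math.sqrt(sigma2_eps / (sigma2_eps + T * sigma2_alpha))

# theta -> 1 as T grows (toward Within); theta = 0 when sigma2_alpha = 0 (pooled OLS).
for T_ in (1, 4, 20, 1000):
    print(T_, round(theta(1.0, 1.0, T_), 3))

# Illustrative RE data: alpha_i independent of x_it, mu = 2, beta = 1.
random.seed(5)
N, T, mu, beta, s2a, s2e = 1000, 4, 2.0, 1.0, 1.0, 1.0
th = theta(s2a, s2e, T)   # true variances used here, i.e. infeasible GLS

xs, ys = [], []
for i in range(N):
    alpha_i = random.gauss(0, math.sqrt(s2a))
    x = [random.gauss(0, 1) for _ in range(T)]
    y = [mu + alpha_i + beta * xt + random.gauss(0, math.sqrt(s2e)) for xt in x]
    xb, yb = sum(x) / T, sum(y) / T
    xs += [xt - th * xb for xt in x]        # x_it - theta * xbar_i
    ys += [yt - th * yb for yt in y]        # y_it - theta * ybar_i

# OLS of the quasi-demeaned y on a constant and quasi-demeaned x, as in (8).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
beta_re = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
           / sum((u - mx) ** 2 for u in xs))
mu_re = (my - beta_re * mx) / (1 - th)      # fitted intercept estimates (1 - theta) * mu
print(f"theta = {th:.3f}, beta_re = {beta_re:.3f}, mu_re = {mu_re:.3f}")
```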

(41)

Random Effects Estimator Properties

- The RE estimator is
  - fully efficient under the RE model
  - The efficiency gain compared to pooled OLS (applied to the RE model) need not be great
  - It might still be inefficient if the equicorrelation hypothesis is not true, in particular under AR(1) processes
  - inconsistent if the FE model is correct, since then $\alpha_i$ is correlated with $x_{it}$

(42)

RE Discussion

- Most disciplines in applied statistics, other than microeconometrics, treat any unobserved individual heterogeneity as being distributed independently of the regressors
  - Then the effects are random effects
  - or rather, purely random effects
- Compared to FE models, this stronger assumption has the advantage of permitting consistent estimation of all parameters
  - including the coefficients of time-invariant regressors
  - However, RE and pooled OLS are inconsistent if the true model is FE
- Economists often view the assumptions of the RE model as being unsupported by the data

(43)

Panel Data Estimators Fixed vs. Random Effects


(44)

Identification of the Individual-Specific Effects

- In $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - the individual effect $\alpha_i$ is a random variable (random coefficient)
  - in both the fixed and random effects models
  - Both models assume that $E[y_{it} \mid \alpha_i, x_{it}] = \alpha_i + x_{it}'\beta$
- $\alpha_i$ is unknown and cannot be consistently estimated, unless $T \to \infty$
  - So we cannot estimate $E[y_{it} \mid \alpha_i, x_{it}]$
  - Prediction is therefore not possible, contrary to what we usually do with OLS
  - That is reasonable, as $\alpha_i$ includes unobserved individual characteristics, possibly with a non-zero mean
- But take the expectation conditional on $x_{it}$ only (integrating out $\alpha_i$):
  $$E[y_{it} \mid x_{it}] = E[\alpha_i \mid x_{it}] + x_{it}'\beta$$
  - That is, what is the (conditional) expected value of $\alpha_i$?
  - FE and RE have different takes on this expectation

(45)

Random Effects vs. Fixed Effects

- RE: it is assumed that $E[\alpha_i \mid x_{it}] = \alpha$, so $E[y_{it} \mid x_{it}] = \alpha + x_{it}'\beta$
  - Hence $E[y_{it} \mid x_{it}]$ is identified
  - since a single intercept can be consistently estimated as $NT \to \infty$
  - But the key RE assumption that $E[\alpha_i \mid x_{it}]$ is constant across $i$ might not hold in many microeconometrics applications
- FE: $E[\alpha_i \mid x_{it}]$ varies with $x_{it}$, and it is not known how it varies
  - So we cannot identify $E[y_{it} \mid x_{it}]$
  - Nonetheless, the Within and First-Differences estimators consistently estimate $\beta$ in short panels
  - They thus identify the marginal effect $\beta = \partial E[y_{it} \mid \alpha_i, x_{it}] / \partial x_{it}$
  - e.g. the effect on earnings of one additional year of schooling
  - but only for time-varying regressors: the marginal effect of race or gender, for example, is not identified
  - They do not identify the expected individual $y_{it}$, since we do not know the individual effect $\alpha_i$

(46)

Random Effects vs. Fixed Effects

- The two models have different focuses
- RE
  - time-series structure
  - efficiency
- FE
  - endogeneity of unobserved heterogeneity
  - consistency

(47)

Summary Models & Estimators

Table: Linear Panel Model: Common Estimators and Models

Estimator of $\beta$            RE model, (2) & (4)       FE model, (2)
Within (Fixed Effects) (6)      Consistent                Consistent
First Differences (7)           Consistent                Consistent
Random Effects (8)              Consistent & efficient    Inconsistent

This table considers only the consistency of the estimators of $\beta$. For the correct computation of standard errors, see the next section.

The only fully efficient estimator is RE under the RE model.

(48)

Example Arellano-Bond

- Unbalanced panel of 140 U.K. manufacturing companies over the period 1976-1984
  - Download in Stata with webuse abdata
  - year = t, n = log of employment, w = log of real wage, k = log of gross capital, ys = log of industry output, id = firm index (i)
- Panel structure in Stata:
  xtset id year, yearly
- Arellano & Bond are interested in a dynamic employment equation (labour demand)
  $$n_{it} = \alpha_1 n_{i,t-1} + \alpha_2 n_{i,t-2} + \beta'(L)x_{it} + \lambda_t + \eta_i + \nu_{it}$$
  where $\beta(L)$ indicates a vector of polynomials in the lag operator, so that various lags of $x$ might be used
  - AB use $w_t, w_{t-1}, k_t, k_{t-1}, ys_t, ys_{t-1}, ys_{t-2}$
  - and time dummies for all years

(49)

Example Arellano-Bond

- The AB model is dynamic
- In this chapter, we estimate it
  - without the lags of $n$ in the regressors
  - with them
  - by FE, D1 and RE → AB.do
- All of this is, in principle, known material

(50)

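First-difference in Stata

- The First-Differences estimator is not readily available
  - Define the first differences first, then apply OLS
  - This is fairly unsatisfactory, as there is no real account of the panel structure of the error term
  - Lag 1 period: bysort id (year): gen xL1 = x[_n-1]
    - _n indexes observations
    - by id lags within the groups defined by the id variable; sorting on year within id makes _n-1 the previous year
  - Then gen xD1 = x - xL1 for the first difference
  - After xtset id year, the time-series operators give the same result and also respect gaps in the years: gen xL1 = L.x and gen xD1 = D.x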

(51)

First-differencing time dummies

- Take $d_t$, a time dummy
  - Recall that the one-period lag of $x$ gives, at time $t+1$, the value that $x$ had at $t$
- By construction, $L1.d_t$ must be one at $t+1$ and zero elsewhere
  - with a missing value at $t = 1$ (the first observed period)
- Thus, e.g., yr1980L1 = 1 in 1981 and 0 in other years
  - so yr1980D1 = yr1980 - yr1980L1 = -1 in 1981, 1 in 1980, 0 in other years, and missing in 1976
  - Also, yr1984L1 is zero everywhere (missing in 1976), since 1984 is the last observed year
  - So yr1984D1 is identical to yr1984 (except in 1976) and cannot be used
- Interpretation of the first difference of a time dummy is hard
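The dummy arithmetic above can be reproduced for one firm. The following minimal sketch assumes the firm is observed in every year from 1976 to 1984, with `None` standing for Stata's missing value. The helpers are hypothetical, written for this illustration; they are not Stata functions.

```python
years = list(range(1976, 1985))          # 1976..1984, the abdata period

def dummy(year):
    """Time dummy d_t for one firm observed in every year (an assumption here)."""
    return [1 if t == year else 0 for t in years]

def lag(v):
    """One-period lag; missing (None) in the first observed year."""
    return [None] + v[:-1]

def diff(v):
    """First difference v_t - v_{t-1}; missing where the lag is missing."""
    return [None if prev is None else cur - prev for cur, prev in zip(v, lag(v))]

yr1980D1 = dict(zip(years, diff(dummy(1980))))
yr1984D1 = dict(zip(years, diff(dummy(1984))))

print(yr1980D1)
# {1976: None, 1977: 0, 1978: 0, 1979: 0, 1980: 1, 1981: -1, 1982: 0, 1983: 0, 1984: 0}
```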

(52)

Table: Coef. Estimates – no lags of n

Variable     OLS      FE       D1       RE
w           -0.229   -0.524   -0.543   -0.503
wL1         -0.289   -0.077    0.041   -0.052
k            0.320    0.493    0.399    0.553
kL1          0.493    0.142    0.166    0.196
ys          -1.801    0.344    0.532    0.263
ysL1        -0.468   -0.198   -0.268   -0.266
ysL2         2.136   -0.076   -0.001   -0.048
yr1979      -0.057   -0.016    0.006   -0.017
yr1980      -0.233   -0.017    0.022   -0.024
yr1981      -0.467   -0.048    0.004   -0.058
yr1982      -0.392   -0.065   -0.013   -0.069
yr1983      -0.235   -0.058   -0.013   -0.056
yr1984      -0.264   -0.022   omitted  -0.011
Intercept    3.748    2.907   -0.010    3.396

(53)

Example Arellano-Bond Results

Table: Coef. Estimates – with lags of n; time dummies not presented

Variable     OLS      FE       D1       RE
nL1          1.096    0.736    0.130    1.096
nL2         -0.132   -0.154   -0.035   -0.132
w           -0.534   -0.560   -0.556   -0.534
wL1          0.486    0.316    0.124    0.486
k            0.355    0.393    0.392    0.355
kL1         -0.325   -0.098    0.127   -0.325
ys           0.465    0.475    0.560    0.465
ysL1        -0.787   -0.633   -0.368   -0.787
ysL2         0.314    0.056    0.034    0.314
Intercept    0.215    1.810   -0.009    0.215

It is interesting to compare the parameter estimates, but we postpone this to the next chapter.


(55)

Panel Data Inference Panel-Robust Inference


(56)

Panel-Robust Statistical Inference

- The various panel models include the error terms $u_{it}$, $\varepsilon_{it}$, $\alpha_i$
- In many microeconometrics applications, it is reasonable to assume independence over $i$
- The errors are potentially
  1. serially correlated (correlated over $t$ for given $i$)
  2. heteroskedastic (at least across $i$)
- Valid statistical inference requires controlling for both of these factors

(57)

Het. & Autoc. Block-Diagonal Errors Var-Cov Matrix $\Sigma$

$$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma^2_{i1} & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma^2_{i2} & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \text{SYM} & & & \sigma^2_{iT} \end{pmatrix}$$

- Not enough structure: each block $\Sigma_i$ has $T(T+1)/2$ distinct elements

(58)

$$\Sigma = \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_\varepsilon & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_\varepsilon \end{pmatrix}$$

- Equicorrelation implies
  - homoskedasticity
  - a limited form of autocorrelation

(59)

Heteroskedastic RE Block-Diagonal Errors Var-Cov Matrix $\Sigma$

$$\Sigma = \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_N \end{pmatrix}, \qquad \Sigma_i = \begin{pmatrix} \sigma^2_\alpha + \sigma^2_{\varepsilon i} & \sigma^2_\alpha & \cdots & \sigma^2_\alpha \\ \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_{\varepsilon i} & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma^2_\alpha \\ \sigma^2_\alpha & \cdots & \sigma^2_\alpha & \sigma^2_\alpha + \sigma^2_{\varepsilon i} \end{pmatrix}, \quad i = 1, \dots, N$$

- A small generalisation of RE for heteroskedasticity
- The White heteroskedasticity-consistent estimator can be extended to short panels
  - since, for the $i$th individual, the error variance matrix $\Sigma_i$ is of finite dimension $T \times T$ while $N \to \infty$

(60)

Reminder : The White heteroskedasticity-consistent estimator

- Classical linear model $y = X\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - OLS is unbiased and consistent
  - $\mathrm{Var}(\hat{\beta}_{OLS}) = (X'X)^{-1} X'\Sigma X (X'X)^{-1} \neq \sigma^2 (X'X)^{-1}$
- For pure heteroskedasticity, White (1980) shows that
  $$S = \frac{1}{N}\sum_{i=1}^{N} \hat{\varepsilon}_i^2 x_i x_i'$$
  where $\hat{\varepsilon}_i$ is the OLS residual, is a consistent estimate of $\frac{1}{N}X'\Sigma X$ under general conditions
- The formula can be extended to autocorrelation
  - but autocorrelation often reveals time-series properties
  - that need to be investigated in more detail
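The following minimal sketch compares the classical and White standard errors on simulated cross-section data. For simplicity it uses a single regressor and no intercept. The design is an assumption: the error standard deviation equals $|x_i|$. The classical formula $\sigma^2(X'X)^{-1}$ then understates the sampling variance. For standard normal $x$ the ratio of standard errors should be close to $\sqrt{3} \approx 1.73$, since $E[x^4]/(E[x^2])^2 = 3$.

```python
import math
import random

random.seed(6)
n, beta = 20000, 1.0
x = [random.gauss(0, 1) for _ in range(n)]
# Heteroskedastic errors: the standard deviation of eps_i is |x_i|.
y = [beta * xi + abs(xi) * random.gauss(0, 1) for xi in x]

sxx = sum(xi * xi for xi in x)                           # X'X (scalar here)
b = sum(xi * yi for xi, yi in zip(x, y)) / sxx           # OLS, no intercept
e = [yi - b * xi for xi, yi in zip(x, y)]                # OLS residuals

# Classical formula sigma^2 (X'X)^{-1}.
se_classical = math.sqrt(sum(ei * ei for ei in e) / (n - 1) / sxx)
# White sandwich (X'X)^{-1} [sum e_i^2 x_i x_i'] (X'X)^{-1}, i.e. N * S in the middle.
se_white = math.sqrt(sum((ei * xi) ** 2 for xi, ei in zip(x, e)) / sxx ** 2)

print(f"b = {b:.3f}, classical se = {se_classical:.4f}, White se = {se_white:.4f}")
```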

(61)

Panel-Robust Statistical Inference

- Panel-robust standard errors can thus be obtained
  - following White's principle
  - called "sandwich" or "robust" estimators
  - without assuming specific functional forms for within-individual error correlation or heteroskedasticity
  - However, we assume a constant covariance as in RE
- So we use inefficient estimators
  - but at least we estimate their variance better than with the OLS formulas
  - With AR(1) or I(1) errors, we might still be very wrong
  - Only the RE estimator in the RE model is efficient
  - More efficient estimators using GMM: Ch. 2
- FE and RE tend to reduce the serial correlation in the errors, but do not eliminate it
- The panel commands in many computer packages calculate default standard errors assuming iid errors
  - leading to erroneous inference
  - Ignoring serial correlation can lead to underestimated standard errors
  - and thus over-estimated t-statistics

(62)

Stata commands

- The robust estimator assumes independence over $i$ and $N \to \infty$
  - but permits $V[u_{it}]$ and $\mathrm{Cov}[u_{it}, u_{is}]$ to vary with $i$, $t$, and $s$
  - which is the case for short panels
- Panel-robust standard errors based on White can be computed with a regular panel command
  - if the command has a cluster-robust standard error option
  - in Stata, cluster on the individual $i$, e.g. vce(cluster id)
  - Common error: using the standard robust se option of a non-panel command (e.g. pooled OLS), which only adjusts for heteroskedasticity
  - In practice, in a panel it is more important to correct for serial correlation
  - In Stata, in a panel estimator such as xtreg, vce(robust) automatically accounts for clustering on the panel variable
- Bootstrap: computes panel-robust standard errors based on the bootstrap
  - fewer hypotheses
  - slower; depends on the number of replications
  - do not specify a cluster variable when in a panel model
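The following minimal sketch shows what clustering on $i$ computes for the within estimator with a single regressor. The simulated design is illustrative: $N = 1000$, $T = 10$, and both $x_{it}$ and $\varepsilon_{it}$ follow AR(1) processes with $\rho = 0.8$ inside each agent. The default iid formula treats the $N(T-1)$ demeaned observations as independent. The cluster-robust "meat" instead sums, over agents, the square of $\sum_t \ddot{x}_{it}\hat{e}_{it}$. With positively autocorrelated $x$ and errors, the cluster-robust standard error comes out clearly larger. Packages typically also multiply the cluster-robust variance by a finite-sample correction factor, which is omitted here.

```python
import math
import random

random.seed(7)
N, T, beta, rho = 1000, 10, 1.0, 0.8

def ar1(length, rho):
    """Stationary AR(1) series with unit innovation variance."""
    v = [random.gauss(0, 1 / math.sqrt(1 - rho ** 2))]   # stationary starting value
    for _ in range(length - 1):
        v.append(rho * v[-1] + random.gauss(0, 1))
    return v

clusters = []                                  # one list of (x_dd, y_dd) per agent
for i in range(N):
    alpha_i = random.gauss(0, 1)
    x = [alpha_i + v for v in ar1(T, rho)]     # serially correlated regressor
    eps = ar1(T, rho)                          # serially correlated error
    y = [alpha_i + beta * xt + et for xt, et in zip(x, eps)]
    xb, yb = sum(x) / T, sum(y) / T
    clusters.append([(xt - xb, yt - yb) for xt, yt in zip(x, y)])

sxx = sum(xd * xd for c in clusters for xd, _ in c)
b_w = sum(xd * yd for c in clusters for xd, yd in c) / sxx    # within estimator

resid = [[(xd, yd - b_w * xd) for xd, yd in c] for c in clusters]
# Default iid formula: N(T-1) - 1 degrees of freedom (N absorbed means, one slope).
s2 = sum(e * e for c in resid for _, e in c) / (N * (T - 1) - 1)
se_default = math.sqrt(s2 / sxx)
# Cluster-robust by agent: the "meat" sums (sum_t x_dd * e)^2 over agents i.
meat = sum(sum(xd * e for xd, e in c) ** 2 for c in resid)
se_cluster = math.sqrt(meat / sxx ** 2)

print(f"b_w = {b_w:.3f}, default se = {se_default:.4f}, cluster se = {se_cluster:.4f}")
```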

(63)

Example Arellano-Bond Results

Table: p-values – FE models w/ 2 lags of n; time dummies not presented

Variable     Standard   (Cluster-) Robust   Bootstrap (500 rep)
nL1          0.000      0.000               0.000
nL2          0.000      0.027               0.032
w            0.000      0.001               0.001
wL1          0.000      0.029               0.033
k            0.000      0.000               0.000
kL1          0.002      0.032               0.028
ys           0.000      0.006               0.005
ysL1         0.000      0.003               0.002
ysL2         0.677      0.672               0.693
Intercept    0.000      0.005               0.003

Robust is interpreted as cluster-robust; the clustering variable is id, the panel $i$.

(64)

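Note: Variance Decomposition

- The total variance $s^2$ of a series $x_{it}$ can be decomposed as
  $$s^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x})^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\left[(x_{it} - \bar{x}_i) + (\bar{x}_i - \bar{x})\right]^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)^2 + \frac{1}{N}\sum_{i=1}^{N}(\bar{x}_i - \bar{x})^2$$
  as the cross-product term sums to zero (for each $i$, $\sum_t (x_{it} - \bar{x}_i) = 0$)
- Total variance $s^2$ =
  - $s_w^2$, the within variance [individual deviations around the individual means, summed across individuals]
  - + $s_b^2$, the between variance [deviations of the individual means around the grand mean]
  - With degrees-of-freedom corrections (e.g. $1/(NT-1)$ and $1/(N-1)$) the decomposition holds only approximately
- The between and within $R^2$ are defined similarly
  - $R^2$ is often small with panel data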
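The decomposition is an exact identity with the $1/(NT)$ normalisation, so it can be checked to rounding error. The following minimal sketch uses a hypothetical balanced panel with a strong individual component (between standard deviation 1, within standard deviation 0.5). Neither the data nor the parameter values come from the slides.

```python
import random

random.seed(8)
N, T = 200, 6
x = []                                          # hypothetical balanced panel x[i][t]
for i in range(N):
    m_i = random.gauss(0, 1)                    # individual (between) component
    x.append([m_i + random.gauss(0, 0.5) for _ in range(T)])

grand = sum(map(sum, x)) / (N * T)              # grand mean x_bar
means = [sum(row) / T for row in x]             # individual means x_bar_i

total = sum((xit - grand) ** 2 for row in x for xit in row) / (N * T)
within = sum((xit - m) ** 2 for row, m in zip(x, means) for xit in row) / (N * T)
between = sum((m - grand) ** 2 for m in means) / N

print(f"total {total:.4f} = within {within:.4f} + between {between:.4f}")
```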

(65)

Fixed Effects vs. Random Effects



(67)

Fixed Effects vs. Random Effects Non-Test Elements of Choice

Causation

- The FE model can establish causation under weaker assumptions than those needed with
  - cross-section data
  - panel data models without fixed effects: pooled and RE models
- In some studies causation is clear, so RE may be appropriate
  - For example, in a controlled experiment causation is clear: crop yield from different amounts of fertilizer applied to different fields in a laboratory
  - $x_i$ is assigned randomly to cases, and is thus uncorrelated with $\alpha_i$
- In other cases, it may be sufficient to use an RE analysis to measure the extent of correlation
  - Determination of causation is left to other approaches
  - e.g. the effect of smoking on lung cancer

(68)

Causation

- Economists are unusual in preferring an FE approach, because of a desire to measure causation with observational rather than experimental data
  - Instead of measuring causation, we may be measuring only a spurious correlation, due to unobserved variables that are correlated with the variables included in the regression
- FE eliminates the time-invariant unobserved variables by differencing, so that
  - the causal effect of $x$ on $y$ is measured by the association between individual changes in $y$ and in $x$

(69)

Fixed Effects Weaknesses in Practice

- Estimation of the coefficient of any time-invariant regressor is not possible with FE
- Coefficients of time-varying regressors are estimable, but may be imprecise if most of the variation in a regressor is cross-sectional rather than over time
  - as the within transformation then removes most of this variation
- Prediction of the conditional mean is not consistent, since the individual effects are not consistently estimated
  - Only changes in the conditional mean caused by changes in time-varying regressors can be predicted
- FE still requires the assumption that the unobservables $\alpha_i$ are time-invariant (no $\alpha_{it}$)


(71)

Fixed Effects vs. Random Effects Hausman Test

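Reminder: Hausman Test

- Principle: if two estimators are both consistent, their difference should not be statistically different from zero, asymptotically
- Consider two estimators $\hat{\theta}$ and $\tilde{\theta}$ (of the same model)
  - We test $H_0: \operatorname{plim}(\hat{\theta} - \tilde{\theta}) = 0$ against $H_a: \operatorname{plim}(\hat{\theta} - \tilde{\theta}) \neq 0$
- Under $H_0$, the scaled difference between the two estimators converges to a normal distribution with zero mean:
  $$\sqrt{N}\,(\hat{\theta} - \tilde{\theta}) \xrightarrow{d} \mathcal{N}[0, V_H]$$
  - where $V_H$ is the variance matrix of the limiting distribution
- Hausman test statistic:
  $$H = (\hat{\theta} - \tilde{\theta})' \left(N^{-1}\hat{V}_H\right)^{-1} (\hat{\theta} - \tilde{\theta})$$
  - asymptotically $\chi^2(q)$ under $H_0$, where $q$ is the number of coefficients being compared
  - reject $H_0$ at level $\alpha$ if $H > \chi^2_\alpha(q)$, the upper-$\alpha$ critical value of $\chi^2(q)$
- The question in practice is to find an estimate $\hat{V}_H$ of $V_H$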
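The statistic and its decision rule can be sketched as follows. This is an illustrative implementation, assuming an estimate of $N^{-1}\hat{V}_H$ (the variance of $\hat{\theta} - \tilde{\theta}$) is already available, since obtaining it is precisely the practical problem. The helper chi2_sf is a hypothetical stdlib-only implementation of the $\chi^2$ upper tail for integer degrees of freedom, based on the recurrence $Q(\nu+2,x) = Q(\nu,x) + (x/2)^{\nu/2}e^{-x/2}/\Gamma(\nu/2+1)$; where SciPy is available, scipy.stats.chi2.sf does the same job.

```python
import math

import numpy as np


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi2(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    # Start from chi2(1) or chi2(2), whose tails have closed forms
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    # Climb two degrees of freedom at a time
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def hausman(theta_hat, theta_tilde, var_diff):
    """H = d' [var_diff]^{-1} d with d = theta_hat - theta_tilde, where var_diff
    estimates Var(theta_hat - theta_tilde) = N^{-1} V_H.
    Returns (H, q, p-value) with q = number of coefficients compared."""
    d = np.asarray(theta_hat, dtype=float) - np.asarray(theta_tilde, dtype=float)
    stat = float(d @ np.linalg.solve(np.asarray(var_diff, dtype=float), d))
    q = d.size
    return stat, q, chi2_sf(stat, q)


# Hypothetical numbers, for illustration only
stat, q, p = hausman([0.5, 1.2], [0.4, 1.0], np.diag([0.01, 0.04]))
print(f"H = {stat:.2f}, q = {q}, p-value = {p:.3f}")  # H = 2.00, q = 2, p-value = 0.368
```

With $H = 2$ on $\chi^2(2)$ the p-value is $e^{-1} \approx 0.37$, so $H_0$ would not be rejected at usual levels.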


Hausman Test for Panel Data

- If the individual effects are fixed
  - the within estimator $\hat{\beta}_W$ is consistent
  - the RE estimator $\tilde{\beta}_{RE}$ is inconsistent
  - here $\beta$ is the vector of coefficients of the time-varying regressors only
- Hausman test for the presence of fixed effects
  - $H_0$: no systematic difference between the coefficient estimates
  - If $H_0$ holds, prefer RE, as it is more efficient
  - In principle, this may not apply if the errors are I(1)
- The same logic works for any pair of estimators with analogous properties (one consistent under both $H_0$ and $H_a$, the other consistent only under $H_0$)
  - e.g., first differences versus pooled OLS


Hausman Test for Panel Data

- A large value of $H$ leads to rejection of the null hypothesis
  - Since $\hat{\beta}_W$ is consistent, if $\tilde{\beta}_{RE}$ is very different from it, we infer that $\tilde{\beta}_{RE}$ must be inconsistent
  - Hence the individual-specific effects are correlated with the regressors
- It may still be possible to avoid using a FE estimator
  - If the regressors are correlated with the individual-specific effects because of omitted variables,
    - then it may help to add further regressors
  - It may be possible to estimate a RE model using instrumental-variables methods (Ch. 2)


Hausman Test Computation When RE IS Fully Efficient

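- Assume the true model is the RE model, with
  - $\alpha_i \sim \text{iid}\,[0, \sigma^2_\alpha]$, uncorrelated with the regressors
  - error $\varepsilon_{it} \sim \text{iid}\,[0, \sigma^2_\varepsilon]$
- Then $\tilde{\beta}_{RE}$ is fully efficient, and the Hausman test statistic simplifies to
  $$H = (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W})' \left[\hat{V}[\hat{\beta}_{1,W}] - \hat{V}[\tilde{\beta}_{1,RE}]\right]^{-1} (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W})$$
  - where $\beta_1$ denotes the subcomponent of $\beta$ corresponding to the time-varying regressors
  - since only that component can be estimated by the within estimator
  - This test statistic is asymptotically $\chi^2(\dim[\beta_1])$ under $H_0$
- This is very easy to compute, since the $\hat{V}$ matrices are standard outputs of the FE and RE estimations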
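Because the simplified statistic only needs the two coefficient vectors and their reported variance matrices, it takes a few lines once the estimates are in hand. A minimal sketch, assuming the FE and RE results are available as (names, coefficients, variance matrix); the regressor names and all numbers below are hypothetical. Only the coefficients common to both fits (the time-varying regressors, i.e. $\beta_1$) enter the statistic.

```python
import numpy as np


def hausman_fe_re(names_w, b_w, v_w, names_re, b_re, v_re):
    """Simplified Hausman statistic, valid only if RE is fully efficient under H0:
    H = (b_RE - b_W)' [V(b_W) - V(b_RE)]^{-1} (b_RE - b_W),
    computed on the coefficients present in both fits (beta_1).
    Returns (H, dim(beta_1))."""
    common = [k for k in names_w if k in names_re]
    iw = [names_w.index(k) for k in common]
    ir = [names_re.index(k) for k in common]
    d = np.asarray(b_re, dtype=float)[ir] - np.asarray(b_w, dtype=float)[iw]
    v_diff = (np.asarray(v_w, dtype=float)[np.ix_(iw, iw)]
              - np.asarray(v_re, dtype=float)[np.ix_(ir, ir)])
    return float(d @ np.linalg.solve(v_diff, d)), len(common)


# Hypothetical estimates, for illustration only
names_w = ["hours", "exper"]                      # FE: time-varying regressors only
b_w = [0.10, 0.05]
v_w = np.diag([0.0004, 0.0001])
names_re = ["hours", "exper", "female", "_cons"]  # RE also estimates time-invariant terms
b_re = [0.08, 0.04, -0.20, 1.00]
v_re = np.diag([0.0003, 0.00005, 0.01, 0.04])

h, df = hausman_fe_re(names_w, b_w, v_w, names_re, b_re, v_re)
print(f"H = {h:.2f} on chi2({df})")  # H = 6.00 on chi2(2)
```

If $\hat{V}[\hat{\beta}_{1,W}] - \hat{V}[\tilde{\beta}_{1,RE}]$ is not positive definite in a finite sample, this statistic can become unreliable or even negative; this is the situation behind Stata's "(V_b-V_B is not positive definite)" remark.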


Hausman Test When RE IS NOT Fully Efficient

- The above simple form of the Hausman test is invalid if $\alpha_i$ or $\varepsilon_{it}$ are not iid
  - e.g., with the heteroskedasticity inherent in much microeconometric data
- Then the RE estimator is not fully efficient under the null hypothesis
- The expression $\hat{V}[\hat{\beta}_{1,W}] - \hat{V}[\tilde{\beta}_{1,RE}]$ in the formula for $H$ must be replaced by the more general $\hat{V}[\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W}]$
  - That is NOT implemented in Stata
  - For short panels, this variance matrix can be consistently estimated by bootstrap resampling over $i$


Hausman Test When RE IS NOT Fully Efficient 2

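- A panel-robust Hausman test statistic is
  $$H_{\text{Robust}} = (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W})' \left(\hat{V}_{\text{boot}}[\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W}]\right)^{-1} (\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W})$$
  - where $\hat{V}_{\text{boot}}[\tilde{\beta}_{1,RE} - \hat{\beta}_{1,W}] = \frac{1}{B-1}\sum_{b=1}^{B} (\hat{\delta}_b - \bar{\hat{\delta}})(\hat{\delta}_b - \bar{\hat{\delta}})'$
  - $\hat{\delta} = \tilde{\beta}_{1,RE} - \hat{\beta}_{1,W}$, $\hat{\delta}_b$ is its value in the $b$th of $B$ bootstrap replications, and $\bar{\hat{\delta}}$ is the average of the $\hat{\delta}_b$
- This test statistic can
  - be applied to subcomponents of $\beta_1$
  - use other estimators, such as $\tilde{\beta}_{1,POLS}$ in place of $\tilde{\beta}_{1,RE}$ and $\hat{\beta}_{1,FD}$ in place of $\hat{\beta}_{1,W}$
- There are user implementations available on the Internet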
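The bootstrap version can be sketched as follows. This is a minimal illustration, not one of the user-written implementations: it uses the variant allowed on this slide with pooled OLS in place of RE (which avoids estimating variance components) against the within estimator, resamples whole individuals $i$ with replacement, and applies $\hat{V}_{\text{boot}}$ as defined above. The data-generating process, with heteroskedastic errors, is hypothetical.

```python
import numpy as np


def pooled_ols(x, y):
    """Pooled OLS slopes (intercept estimated, then dropped); x: (n, t, k), y: (n, t)."""
    n, t, k = x.shape
    X = np.column_stack([np.ones(n * t), x.reshape(n * t, k)])
    coef, *_ = np.linalg.lstsq(X, y.reshape(n * t), rcond=None)
    return coef[1:]


def within(x, y):
    """Within (FE) slopes: OLS on individual-demeaned data."""
    n, t, k = x.shape
    xw = (x - x.mean(axis=1, keepdims=True)).reshape(n * t, k)
    yw = (y - y.mean(axis=1, keepdims=True)).reshape(n * t)
    coef, *_ = np.linalg.lstsq(xw, yw, rcond=None)
    return coef


def robust_hausman(x, y, n_boot=199, seed=0):
    """Panel-robust Hausman statistic comparing pooled OLS and within slopes,
    with Var(delta) estimated by bootstrap resampling over individuals."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    delta = pooled_ols(x, y) - within(x, y)
    draws = np.empty((n_boot, delta.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw whole individuals, keeping each time series intact
        draws[b] = pooled_ols(x[idx], y[idx]) - within(x[idx], y[idx])
    centered = draws - draws.mean(axis=0)
    v_boot = centered.T @ centered / (n_boot - 1)
    return float(delta @ np.linalg.solve(v_boot, delta)), delta.size


def simulate(n=300, t=4, corr=0.0, seed=1):
    """Hypothetical DGP: y = x + alpha_i + eps, with Cov(x, alpha_i) = corr
    and an error scale that differs across individuals (heteroskedasticity)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=(n, 1))
    x = corr * alpha + rng.normal(size=(n, t))
    scale = rng.uniform(0.5, 2.0, size=(n, 1))
    y = x + alpha + scale * rng.normal(size=(n, t))
    return x[:, :, None], y


for corr in (0.0, 1.0):
    h, q = robust_hausman(*simulate(corr=corr))
    print(f"Cov(x, alpha) = {corr}: H_robust = {h:.2f} on chi2({q})")
```

When the effects are correlated with $x$, pooled OLS and within differ by about 0.5 and $H_{\text{Robust}}$ lands far above the 5% critical value 3.84; without correlation it behaves like a $\chi^2(1)$ draw.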


Example Arellano-Bond Results

- How it works in Stata
- e.g., to compare FE and RE:
  - xtreg ..., fe
  - estimates store EstimEF
  - xtreg ..., re
  - hausman EstimEF .
  - Take care to include the final dot, which means "last estimates computed"
  - Menu: Statistics > Postestimation > Tests > Hausman
- If you try to use vce(robust), or any variance option other than the default,
  - an error message results
  - That is fair, as Stata only implements the "fully efficient" version of the Hausman test


Example Arellano-Bond Results

- The output is fairly complete:
  - Test: Ho: difference in coefficients not systematic
  - chi2(15) = (b-B)'[(V_b-V_B)^(-1)](b-B) = 169.57
  - Prob>chi2 = 0.0000 (V_b-V_B is not positive definite)
  - The last remark is probably because some of the variance differences are numerically zero (machine zero)
  - So what conclusion?
- The two estimators must have the same number of coefficient estimates
  - It may be necessary to remove the time-invariant regressors (which FE cannot estimate) from the specification
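When Stata reports that V_b-V_B is not positive definite, it can help to look at the eigenvalues of that variance difference. The sketch below is one possible diagnostic and workaround, an assumption on our part and not a description of what Stata's hausman does internally: it inverts V_b-V_B only on the directions with clearly positive eigenvalues (a generalized inverse) and uses the number of such directions as the degrees of freedom. The function name and numbers are hypothetical.

```python
import numpy as np


def hausman_generalized(d, v_diff, rel_tol=1e-10):
    """Hausman statistic d' V^+ d, where V^+ inverts v_diff = V_b - V_B only on
    eigen-directions with eigenvalues above rel_tol * (largest |eigenvalue|).
    Returns (H, df, is_positive_definite)."""
    d = np.asarray(d, dtype=float)
    v_diff = np.asarray(v_diff, dtype=float)
    eigval, eigvec = np.linalg.eigh(v_diff)  # v_diff assumed symmetric
    keep = eigval > rel_tol * np.abs(eigval).max()
    # Sum over kept directions of v_j v_j' / lambda_j
    v_plus = (eigvec[:, keep] / eigval[keep]) @ eigvec[:, keep].T
    return float(d @ v_plus @ d), int(keep.sum()), bool(keep.all())


# Hypothetical case: the second variance difference is numerically zero
d = np.array([0.02, 0.0])
v_diff = np.diag([1e-4, 1e-20])
h, df, pos_def = hausman_generalized(d, v_diff)
print(f"H = {h:.2f} on chi2({df}), positive definite: {pos_def}")
```

Dropping directions changes the degrees of freedom, so conclusions from such a statistic should be read with care; the bootstrap $H_{\text{Robust}}$ avoids the issue, since a bootstrap sample covariance matrix is positive semi-definite by construction.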
