Topics in Applied Econometrics : Panel Data

Ch 1. Linear Non Dynamic Panel Data Models

Pr. Philippe Polomé, Université Lumière Lyon 2

M2 Equade & M2 GAEXA

2015 – 2016

Overview of Ch. 1

Panel Data Models
  Fixed Effects & Random Effects
Panel Data Estimators
  Pooled OLS Estimator
  Between Estimator
  Within Estimator
  First-Differences Estimator
  Random Effects Estimator
  Fixed vs. Random Effects
  Hours & Wages Example
Panel Data Inference
  Panel-Robust Inference
  Bootstrap Standard Errors
  Hours & Wages Example
Fixed Effects vs. Random Effects
  Non-Test Elements of Choice
  Hausman Test
Unbalanced Panel Data

Section: Panel Data Models

Models & Estimators

- Wider range of models and estimators than with cross-section data
- 3 standard models
  - Presented in this section
  - Several estimators are presented in the next section
  - The same logic applies to more sophisticated models
- The different estimators may be applied to the different models
  - With varying results
  - A table summarizes them at the end of the estimators section

General Panel Data Model

- A very general linear model for panel data
  - Intercept and slope coefficients vary over both $i$ and $t$

    $y_{it} = \alpha_{it} + x_{it}'\beta_{it} + u_{it}$

    $i = 1, \dots, N$: individual (or firm or country); $t = 1, \dots, T$: time
  - $y_{it}$: scalar dependent variable
  - $x_{it}$: $K \times 1$ vector of independent variables
  - $u_{it}$: scalar disturbance term
- Too general
  - Not estimable: more parameters to estimate than observations
  - Further restrictions are needed
    - on the extent to which $\alpha_{it}$ and $\beta_{it}$ vary with $i$ and $t$
    - on the behavior of the error $u_{it}$

Pooled Model

- The most restrictive model is a pooled model that specifies constant coefficients

  $y_{it} = \alpha + x_{it}'\beta + u_{it}$   (1)

- If this model is correctly specified
  - and the regressors are uncorrelated with the error,
  - then it can be consistently estimated as a cross-section
  - That is: simply by OLS

Individual and Time Dummies

- A simple variant of the pooled model (1) has
  - Intercepts that vary across individuals and over time
  - Constant slopes

  $y_{it} = \alpha_i + \gamma_t + x_{it}'\beta + u_{it}$   (2)

  or

  $y_{it} = \sum_{j=1}^{N} \alpha_j d_{j,it} + \sum_{s=2}^{T} \gamma_s d_{s,it} + x_{it}'\beta + u_{it}$

  where
  - the $N$ individual dummies $d_{j,it} = 1$ if $i = j$ and $= 0$ otherwise
  - the $T - 1$ time dummies $d_{s,it} = 1$ if $t = s$ and $= 0$ otherwise

Individual and Time Dummies

- $x_{it}$ does not include an intercept
  - If an intercept is included, then one of the $N$ individual dummies must be dropped
  - Many packages do that automatically
- Focus on short panels, where $N \to \infty$ but $T$ does not
  - Then the $\gamma_t$ (time intercepts) can be consistently estimated
    - At least in the sense that there is a finite number of them
  - The $T - 1$ time dummies are simply incorporated into the regressors $x_{it}$
  - But inserting the full set of $N$ individual dummies $d_{j,it}$ would cause problems as $N \to \infty$
    - We cannot consistently estimate an infinite number of parameters
    - Information on the $\alpha_i$ does not increase as $N$ increases
- Challenge: estimating the parameters $\beta$
  - while controlling for the $N$ individual intercepts $\alpha_i$

Section: Fixed Effects & Random Effects

Individual-Specific Effects Model

- Individual-specific effects model:
  - each cross-sectional unit has a different intercept term, but all slopes are the same

  $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$   (3)

  where $\varepsilon_{it}$ is iid over $i$ and $t$
- A more parsimonious way to express the previous model (2)
  - Time dummies are included in the regressors $x_{it}$
  - The "standard" linear non-dynamic panel data model
    - no $y_{i,t-s}$ in $x_{it}$
- The $\alpha_i$ are random variables
  - They capture unobserved heterogeneity
    - = unobserved time-invariant individual characteristics
  - In effect: a random parameter model

Reminder: Unobserved Heterogeneity

- The correct model is

  $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

- But the estimated model is

  $Y = \beta_0 + \beta_1 x_1 + \nu$

- The effect of the missing regressor on $Y$ is absorbed into the error of the estimated model: $\nu = \beta_2 x_2 + \varepsilon$
  - = unobserved heterogeneity: unobserved (individual) factors influence the LHS variable
- If the missing regressor is correlated with an included regressor
  - Then $\nu$ is correlated with at least one included regressor
  - LS is inconsistent
  - Furthermore, possibly:
    - Heteroscedasticity if $\mathrm{var}(x_{2t}) \neq \mathrm{var}(x_{2s})$, $t \neq s$
    - Autocorrelation if $\mathrm{corr}(x_{2t}, x_{2s}) \neq 0$, $t \neq s$

Reminder: Unobserved Heterogeneity

[Figure not recovered. Label: Same slopes]

Exogeneity

- Throughout this chapter: assume strong/strict exogeneity

  $E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0, \quad t = 1, \dots, T$   (4)

- So $\varepsilon_{it}$ is assumed to have mean zero conditional on past, current, and future values of the regressors
  - Zero covariance
  - Nothing is assumed about the relation between the random term $\alpha_i$ and $x_i$
- Strong exogeneity rules out models with lagged dependent variables or with endogenous variables as regressors (Ch. 2)
  - $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$: it is often hard to maintain that $E(\varepsilon_{it}\varepsilon_{i,t-1}) = 0$

Fixed Effects Model

- Two variants of model (3), according to the hypotheses on $\alpha_i$
  - Both are models with "2" errors, $\alpha_i$ and $\varepsilon_{it}$
    - Error component models
  - Both variants treat $\alpha_i$ as an unobserved random variable
- Variant 1 of model (3): fixed effects (FE) model
  - $\alpha_i$ is potentially correlated with the (time-invariant part of the) observed regressors $x_{it}$
    - A form of unobserved heterogeneity
  - "Fixed" because early treatments regarded the $\alpha_i$ as (non-random) parameters to be estimated

Random Effects Model

- Variant 2 of model (3): random effects (RE) model
  - $\alpha_i$ is distributed independently of $x$
  - Usually makes the additional assumptions that both the random effects $\alpha_i$ and the error term $\varepsilon_{it}$ in (3) are iid:

    $\alpha_i \sim [\alpha, \sigma_\alpha^2]$
    $\varepsilon_{it} \sim [0, \sigma_\varepsilon^2]$   (5)

- No distribution has been specified in (5)
- $\varepsilon_{it}$ may show autocorrelation
  - Often $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{is}) \neq 0$ is allowed
  - While both $\mathrm{cov}(\varepsilon_{it}, \varepsilon_{jt}) = 0$ and $\mathrm{cov}(\alpha_i, \alpha_j) = 0$ are assumed
    - Except in spatial models
- $\alpha$ can be treated as the intercept of the model

Random Effects Model

- Other names for this model:
  - One-way individual-specific effects model
    - Two-way = inclusion of time dummies or time-specific random effects
  - Random intercept model
    - To distinguish it from more general random effects models, e.g. with random slopes
  - Random components model
    - because the error term is $\alpha_i + \varepsilon_{it}$
- The term "fixed effect" is potentially misleading
  - As noted, these effects are in fact random
  - The random effects of the RE model are "purely" random effects: uncorrelated with the regressors

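Equicorrelated Random Effects Model

- RE model $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - can be viewed as a regression of $y_{it}$ on $x_{it}$
  - with composite error term $u_{it} = \alpha_i + \varepsilon_{it}$
  - The RE hypothesis (5) ($\alpha_i$ and $\varepsilon_{it}$ iid) implies that

    $\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \begin{cases} \sigma_\alpha^2, & t \neq s \\ \sigma_\alpha^2 + \sigma_\varepsilon^2, & t = s \end{cases}$   (6)

- The RE model thus imposes the constraint that the composite error $u_{it}$ is equicorrelated
  - Since $\mathrm{Cor}[u_{it}, u_{is}] = \sigma_\alpha^2 / [\sigma_\alpha^2 + \sigma_\varepsilon^2]$ for $t \neq s$ does not vary with the time difference $t - s$
  - The RE model is also called the equicorrelated model or exchangeable errors model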
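
The equicorrelation property in (6) can be checked numerically. The sketch below builds the $T \times T$ error covariance block for one individual and converts it to correlations; the variance values are illustrative assumptions, not estimates from any data set, and the function name is hypothetical.

```python
import numpy as np

def re_error_block(sigma2_alpha, sigma2_eps, T):
    """T x T covariance matrix of u_it = alpha_i + eps_it for one individual."""
    # sigma2_alpha in every cell, plus sigma2_eps on the diagonal, as in (6)
    return sigma2_alpha * np.ones((T, T)) + sigma2_eps * np.eye(T)

# Illustrative variance components (assumed values)
sigma2_alpha, sigma2_eps, T = 2.0, 1.0, 4
block = re_error_block(sigma2_alpha, sigma2_eps, T)

# Convert covariances to correlations
sd = np.sqrt(np.diag(block))
corr = block / np.outer(sd, sd)

# Theoretical correlation for t != s
rho = sigma2_alpha / (sigma2_alpha + sigma2_eps)
print(corr.round(3))  # every off-diagonal entry equals rho, whatever |t - s|
```

Because the off-diagonal correlation is the same at every lag, the RE model cannot represent errors whose correlation decays with the time distance.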

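Synthesis of Panel Data Models

Pooled model (1):        $y_{it} = \alpha + x_{it}'\beta + u_{it}$,  $u_{it} \sim [0, \sigma_u^2]$
Fixed-effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$,  $\mathrm{Cov}(\alpha_i, x_{it}) \neq 0$
Random-effects model:    same equation (3), with (5): $\alpha_i \sim [\alpha, \sigma_\alpha^2]$,  $\varepsilon_{it} \sim [0, \sigma_\varepsilon^2]$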

Section: Panel Data Estimators

Panel Data Estimators

- Several commonly used panel data estimators of $\beta$
  - In this non-dynamic, no-endogeneity context: LS variants
- They differ in the extent to which cross-section and time-series variation in the data are used
  - Their properties vary according to which model is appropriate
- A regressor $x_{it}$ may be either
  - time-invariant, $x_{it} = x_i$ for $t = 1, \dots, T$,
  - or time-varying
  - For some estimators only the coefficients of time-varying regressors are identified

Section: Pooled OLS Estimator

Definition & Properties

- Stack the data over $i$ and $t$ into one long regression with $NT$ observations
- Estimate $y_{it} = \alpha + x_{it}'\beta + u_{it}$ by OLS
- Pooled OLS is consistent (when $N \to \infty$, $T$ constant) if
  - $\mathrm{Cov}[u_{it}, x_{it}] = 0$ and
  - the pooled model (1) is appropriate, or
  - the RE model is appropriate
- The OLS variance matrix based on iid errors is not appropriate
  - as the errors for a given individual $i$ are almost certainly positively correlated over $t$

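Variance Matrix

- For a given $i$ we expect correlation in $y$ over time:
  - $\mathrm{Cor}[y_{it}, y_{is}]$ is high
  - Pooled model: $y_{it} = \alpha + x_{it}'\beta + u_{it}$
  - Even after inclusion of regressors, $\mathrm{Cor}[u_{it}, u_{is}]$ may remain $\neq 0$
  - Call $\mathrm{Cov}[u_{it}, u_{is}] = \sigma_{its}$
    - When $t = s$, $\sigma_{its} = \sigma_{it}^2$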

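Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$

$\Sigma = \begin{pmatrix} \Sigma_1 & 0 & \cdots & 0 \\ 0 & \Sigma_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Sigma_N \end{pmatrix}$, with one symmetric $T \times T$ block per individual:

$\Sigma_i = \begin{pmatrix} \sigma_{i1}^2 & \sigma_{i12} & \cdots & \sigma_{i1T} \\ & \sigma_{i2}^2 & \ddots & \vdots \\ & & \ddots & \sigma_{i(T-1)T} \\ \text{SYM} & & \cdots & \sigma_{iT}^2 \end{pmatrix}, \quad i = 1, \dots, N$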

Variance Matrix

- The RE model accommodates (partly) this correlation
  - From (6):

    $\mathrm{Cov}[(\alpha_i + \varepsilon_{it}), (\alpha_i + \varepsilon_{is})] = \begin{cases} \sigma_\alpha^2, & t \neq s \\ \sigma_\alpha^2 + \sigma_\varepsilon^2, & t = s \end{cases}$

- OLS output treats each of the $T$ years as independent information, but
  - The information content is less than this
    - given the positive error correlation
  - This tends to overstate estimator precision
- Use panel-corrected standard errors when OLS is applied to a panel
  - Many corrections are possible, depending on the assumed correlation and heteroskedasticity, and on whether the panel is short or long

FE Model

- Pooled OLS is inconsistent if the true model is the FE model
- Rewrite $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ as

  $y_{it} = \alpha + x_{it}'\beta + (\alpha_i - \alpha + \varepsilon_{it})$

- Then pooled OLS of $y_{it}$ on $x_{it}$ and an intercept leads to an inconsistent estimator of $\beta$ if the individual effect $\alpha_i$ is correlated with $x_{it}$
  - Since such correlation implies that the combined error term $(\alpha_i - \alpha + \varepsilon_{it})$ is correlated with the regressors
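
A small simulation can make the inconsistency concrete. The sketch below assumes a scalar regressor built as $x_{it} = \alpha_i + \text{noise}$, so that $\mathrm{Cov}(x_{it}, \alpha_i) = 1$ and $\mathrm{Var}(x_{it}) = 2$; all parameter values are illustrative. Under these assumptions pooled OLS converges to $\beta + \mathrm{Cov}(x, \alpha)/\mathrm{Var}(x) = 1.5$ rather than to $\beta = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 2000, 5, 1.0

alpha = rng.normal(size=(N, 1))               # individual effects alpha_i
x = alpha + rng.normal(size=(N, T))           # regressor correlated with alpha_i
y = alpha + beta * x + rng.normal(size=(N, T))

def ols_slope(xv, yv):
    """Slope from an OLS regression of yv on an intercept and xv."""
    xc = xv - xv.mean()
    return float(xc @ (yv - yv.mean()) / (xc @ xc))

# Pooled OLS: stack all N*T observations and ignore the panel structure
b_pooled = ols_slope(x.ravel(), y.ravel())
print(b_pooled)  # close to 1.5 in this design, not to the true beta = 1
```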

Section: Between Estimator

Definition

- Pooled OLS uses variation over both time and cross-sectional units to estimate $\beta$
- The between estimator uses just the cross-sectional variation
  - Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over all years: $\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i$
    - arithmetic means over time, per individual
- Between estimator = OLS estimator from the regression of $\bar y_i$ on an intercept and $\bar x_i$
  - so implicitly based on the between model

    $\bar y_i = \alpha + \bar x_i'\beta + (\alpha_i - \alpha + \bar\varepsilon_i), \quad i = 1, \dots, N$   (7)

Properties

- Uses variation between different individuals
  - It is the analogue of a cross-section regression
  - Variation within individuals is discarded
- Between is consistent if the regressors $\bar x_i$ are independent of the composite error $(\alpha_i - \alpha + \bar\varepsilon_i)$ in (7)
  - True for the pooled model (1) and the RE model
  - Between is inconsistent for the FE model
    - as $\alpha_i$ is then correlated with $x_{it}$ and hence with $\bar x_i$
- Between is not normally used in applications, as it throws away a lot of information
  - But it is didactically useful

Section: Within Estimator

Within Model

- Principle: individual-specific deviations of the dependent variable from its time-averaged value
  - are explained by
  - individual-specific deviations of the regressors from their time-averaged values
- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Average over time: $\bar y_i = \alpha_i + \bar x_i'\beta + \bar\varepsilon_i$
  - Subtract: the $\alpha_i$ terms cancel, which gives the within model

    $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i), \quad i = 1, \dots, N, \; t = 1, \dots, T$   (8)

Within / Fixed Effects Estimator

- Within estimator = OLS estimator of

  $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$

  - Consistent for $\beta$ in the FE model
- Called the fixed effects estimator by analogy with the FE model
  - This does not imply that the $\alpha_i$ are fixed
- Each $i$ must be observed at least twice in the sample
  - Otherwise $x_{it} - \bar x_i = 0$

Consistency of Fixed Effects Estimator

- FE treats the $\alpha_i$ as nuisance parameters
  - They can be ignored when interest lies in $\beta$
  - They do not need to be consistently estimated to obtain consistent estimates of the slope parameters $\beta$
    - This result need not carry over to nonlinear FE models
- Consistency further requires $E(\varepsilon_{it} - \bar\varepsilon_i \mid x_{it} - \bar x_i) = 0$ in the within model

  $y_{it} - \bar y_i = (x_{it} - \bar x_i)'\beta + (\varepsilon_{it} - \bar\varepsilon_i)$

  - Because of the averages, that requires more than $E(\varepsilon_{it} \mid x_{it}) = 0$
  - It requires the strict exogeneity assumption (4): $E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0$, $t = 1, \dots, T$

Fixed Effects Estimates

- If the fixed effects $\alpha_i$ are of interest, they can also be estimated as

  $\hat\alpha_i = \bar y_i - \bar x_i'\hat\beta$

  - An unbiased estimator of $\alpha_i$
  - In short (small $T$) panels the $\hat\alpha_i$ are always inconsistent, because information never accumulates for them
  - Their distribution, or their variation with a key variable, may be informative
- If $N$ is not too large, an alternative way to compute Within is least-squares dummy variable (LSDV) estimation
  - Directly estimates $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ by OLS of $y_{it}$ on $x_{it}$ and $N$ individual dummy variables
  - Yields the Within estimator for $\beta$, along with estimates of the $N$ fixed effects
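
The equivalence between LSDV and the within estimator is an exact algebraic identity, which the sketch below checks on simulated data. The data-generating values and variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, beta = 50, 4, 1.0
ids = np.repeat(np.arange(N), T)                  # individual index of each row
alpha = rng.normal(size=N)
x = alpha[ids] + rng.normal(size=N * T)           # regressor correlated with alpha_i
y = alpha[ids] + beta * x + rng.normal(size=N * T)

# Individual time averages
counts = np.bincount(ids, minlength=N)
xbar = np.bincount(ids, weights=x, minlength=N) / counts
ybar = np.bincount(ids, weights=y, minlength=N) / counts

# Within estimator: OLS on deviations from individual means, model (8)
xd, yd = x - xbar[ids], y - ybar[ids]
b_within = float(xd @ yd / (xd @ xd))
alpha_hat = ybar - xbar * b_within                # alpha_hat_i = ybar_i - xbar_i * b

# LSDV: OLS of y on x and N individual dummies (no common intercept)
D = (ids[:, None] == np.arange(N)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
b_lsdv, alpha_lsdv = coef[0], coef[1:]

print(b_within, b_lsdv)  # identical up to floating-point error
```

The dummy matrix has $N$ columns, which is why LSDV is only practical when $N$ is not too large; demeaning avoids building it.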

Time-Invariant Regressors

- Major limitation of Within
  - The coefficients of time-invariant regressors are not identified
  - Since if $x_{it} = x_i$ for all $t$, then $\bar x_i = x_i$, so $x_{it} - \bar x_i = 0$
- Many studies seek to estimate the effect of time-invariant regressors
  - For example, in panel wage regressions: the effect of gender or race
- For this reason many practitioners prefer not to use the within estimator
- Pooled OLS or RE estimators permit estimation of the coefficients of time-invariant regressors
  - but are inconsistent if the FE model is the correct model
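
The identification failure can be seen directly in the transformed design matrix. In the sketch below, `female` is a hypothetical time-invariant indicator and `wage` a hypothetical time-varying regressor; the values are made up for illustration.

```python
import numpy as np

N, T = 3, 4
ids = np.repeat(np.arange(N), T)
female = np.array([1.0, 0.0, 1.0])[ids]       # time-invariant regressor (hypothetical)
wage = np.arange(N * T, dtype=float)          # time-varying regressor (hypothetical)

def demean_by_individual(v, ids, n_ind):
    """Subtract each individual's time average from its observations."""
    means = np.bincount(ids, weights=v, minlength=n_ind) / np.bincount(ids, minlength=n_ind)
    return v - means[ids]

female_w = demean_by_individual(female, ids, N)
wage_w = demean_by_individual(wage, ids, N)
X_w = np.column_stack([wage_w, female_w])

print(female_w)                       # the within transformation wipes out this column
print(np.linalg.matrix_rank(X_w))     # rank deficiency: its coefficient is not identified
```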

Section: First-Differences Estimator

First-Differences Model

- Principle: individual-specific one-period changes in the dependent variable
  - are explained by
  - individual-specific one-period changes in the regressors
- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Lag one period: $y_{i,t-1} = \alpha_i + x_{i,t-1}'\beta + \varepsilon_{i,t-1}$
  - Subtract, which gives the first-differences model

    $y_{it} - y_{i,t-1} = (x_{it} - x_{i,t-1})'\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \quad i = 1, \dots, N, \; t = 2, \dots, T$   (9)

First-Differences Estimator

- The first-differences estimator is OLS in the first-differences model (9)
- It gives consistent estimates of $\beta$ in the FE model
  - The coefficients of time-invariant regressors are not identified
- First-differences is less efficient than the within estimator
  - if $\varepsilon_{it}$ is iid (for $T > 2$)
- However, it may safeguard against I(1) variables
  - which would otherwise lead to inconsistency
  - See time-series courses
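
As a minimal sketch of the first-differences estimator, the simulation below differences a balanced panel stored as $N \times T$ arrays and runs OLS without intercept on model (9). The design (regressor correlated with $\alpha_i$, true $\beta = 1$) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, beta = 5000, 3, 1.0
alpha = rng.normal(size=(N, 1))
x = alpha + rng.normal(size=(N, T))           # correlated with alpha_i (FE setting)
y = alpha + beta * x + rng.normal(size=(N, T))

# Differences are taken within each individual: rows are i, columns are t
dx = np.diff(x, axis=1).ravel()               # x_it - x_i,t-1 for t = 2, ..., T
dy = np.diff(y, axis=1).ravel()

# OLS without intercept on the differenced model (9): alpha_i has dropped out
b_fd = float(dx @ dy / (dx @ dx))
print(b_fd, dx.size)                          # slope near beta; N*(T-1) usable observations
```

One period per individual is lost to differencing, so the regression uses $N(T-1)$ rather than $NT$ observations.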

Section: Random Effects Estimator

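Random Effects Model

- Individual-specific effects model (3): $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$
  - Assume the RE model with iid $\alpha_i$ and $\varepsilon_{it}$, as in RE hypothesis (5):

    $\alpha_i \sim [\alpha, \sigma_\alpha^2]$
    $\varepsilon_{it} \sim [0, \sigma_\varepsilon^2]$

- Pooled OLS is consistent
  - But pooled GLS will be more efficient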

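Reminder: GLS in a cross-section

- When all the hypotheses of the linear model are satisfied but the error covariance matrix $\Sigma$ is not proportional to the identity, then
  - OLS is consistent
  - but it is not efficient if we know $\Sigma$
- Take the classical linear (cross-section) model $y = x'\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - Let $P'P = \Sigma^{-1}$
    - Such a $P$ exists for any real symmetric positive definite matrix (e.g. from a Cholesky decomposition) but is not unique
  - Premultiply the linear model by $P$: $Py = Px\beta + P\varepsilon$
    - $y^* = x^*\beta + \varepsilon^*$
  - Then $\mathrm{Var}(\varepsilon^*) = E(P\varepsilon\varepsilon'P') = P\,E(\varepsilon\varepsilon')\,P'$
    - $= P\Sigma P' = P(P'P)^{-1}P' = PP^{-1}(P')^{-1}P' = I$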

Reminder: GLS in a cross-section

- So the transformed model has spherical disturbances
  - Applying OLS to the transformed data is an efficient estimator
  - That is GLS
- Since $\Sigma$ is unknown in practice, we need an estimate
  - Any consistent estimate $\hat\Sigma$ of $\Sigma$ yields the feasible (consistent) GLS estimator
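
The whitening argument can be verified numerically. The sketch below assumes a known diagonal (heteroskedastic) $\Sigma$ with made-up values, builds $P$ from the Cholesky factor of $\Sigma^{-1}$, and checks that OLS on the transformed data coincides with the closed-form GLS estimator $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Known non-spherical covariance: heteroskedastic diagonal (illustrative choice)
variances = rng.uniform(0.5, 4.0, size=n)
Sigma = np.diag(variances)
y = X @ beta + np.sqrt(variances) * rng.normal(size=n)

# P with P'P = Sigma^{-1}: Cholesky gives Sigma^{-1} = L L', so take P = L'
Sigma_inv = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma_inv)
P = L.T

# GLS = OLS on the transformed data (Py, PX)
b_gls, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)

# Closed form (X' Sigma^-1 X)^-1 X' Sigma^-1 y
b_closed = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(b_gls, b_closed)
```

In a feasible GLS setting `Sigma` would be replaced by a consistent estimate; here it is treated as known to isolate the transformation step.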

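Panel Block-Diagonal Var-Cov Matrix of the Errors $\Sigma$

$\Sigma = \begin{pmatrix} \Omega & 0 & \cdots & 0 \\ 0 & \Omega & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Omega \end{pmatrix}$, with the same $T \times T$ block for each of the $N$ individuals:

$\Omega = \begin{pmatrix} \sigma_\alpha^2 + \sigma_\varepsilon^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_\varepsilon^2 \end{pmatrix}$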

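Random Effects Estimator

- The feasible GLS estimator of the RE model can be calculated from OLS estimation of the transformed model:

  $y_{it} - \hat\lambda \bar y_i = (1 - \hat\lambda)\mu + (x_{it} - \hat\lambda \bar x_i)'\beta + \nu_{it}$   (10)

  where $\nu_{it} = (1 - \hat\lambda)\alpha_i + (\varepsilon_{it} - \hat\lambda \bar\varepsilon_i)$ is asymptotically iid, and
  - $\hat\lambda$ is consistent for

    $\lambda = 1 - \dfrac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_\alpha^2}}$   (11)

- Called the RE estimator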

Random Effects Estimator

- The nonrandom scalar intercept $\mu$ is added to normalize the random effects $\alpha_i$ to have zero mean
  - as in the RE hypothesis
- Cameron & Trivedi provide a derivation of (10) and ways to estimate $\sigma_\alpha^2$ and $\sigma_\varepsilon^2$, and hence to estimate $\lambda$
  - Not detailed here
- Note
  - $\hat\lambda = 0$ corresponds to pooled OLS
  - $\hat\lambda = 1$ corresponds to within estimation
  - $\hat\lambda \to 1$ as $T \to \infty$ (look at the formula)
- This is a two-step estimator of $\beta$
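
The role of $\lambda$ in (10) and (11) can be sketched as follows. The code assumes the variance components are known (a feasible version would first estimate them, which is the first step of the two-step procedure) and uses simulated data with made-up parameter values; the function names are hypothetical.

```python
import numpy as np

def re_lambda(sigma2_alpha, sigma2_eps, T):
    """lambda = 1 - sigma_eps / sqrt(sigma_eps^2 + T * sigma_alpha^2), formula (11)."""
    return 1.0 - np.sqrt(sigma2_eps) / np.sqrt(sigma2_eps + T * sigma2_alpha)

def quasi_demeaned_slope(x, y, lam):
    """OLS slope in the transformed model (10); x and y are N x T arrays."""
    xs = (x - lam * x.mean(axis=1, keepdims=True)).ravel()
    ys = (y - lam * y.mean(axis=1, keepdims=True)).ravel()
    xc = xs - xs.mean()                   # partial out the intercept (1 - lam) * mu
    return float(xc @ (ys - ys.mean()) / (xc @ xc))

rng = np.random.default_rng(4)
N, T = 500, 5
x = rng.normal(size=(N, 1)) + rng.normal(size=(N, T))
alpha = rng.normal(size=(N, 1))           # independent of x: the RE assumption holds
y = 1.0 + 0.5 * x + alpha + rng.normal(size=(N, T))

b_pooled = quasi_demeaned_slope(x, y, lam=0.0)    # lambda = 0: pooled OLS
b_within = quasi_demeaned_slope(x, y, lam=1.0)    # lambda = 1: within estimator
b_re = quasi_demeaned_slope(x, y, lam=re_lambda(1.0, 1.0, T))
print(b_pooled, b_re, b_within)
```

Calling `re_lambda` with a growing `T` shows the value approaching 1, which is why RE and within estimates get closer in long panels.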

Random Effects Estimator Properties

- The RE estimator is
  - Fully efficient under the RE model
    - The efficiency gain compared to pooled OLS (applied to the RE model) need not be great
  - Possibly still inefficient if the equicorrelation hypothesis is not true
    - In particular, under AR(1) processes
  - Inconsistent if the FE model is correct, since then $\alpha_i$ is correlated with $x_{it}$

RE Discussion

- Most disciplines in applied statistics other than microeconometrics treat any unobserved individual heterogeneity as being distributed independently of the regressors
  - Then the effects are random effects
  - or rather: purely random effects
- Compared to FE models, this stronger assumption has the advantage of permitting consistent estimation of all parameters
  - Including coefficients of time-invariant regressors
  - However, RE and pooled OLS are inconsistent if the true model is FE
- Economists often view the assumptions of the RE model as being unsupported by the data

Section: Fixed vs. Random Effects

Identification of the Individual-Specific Effects

- In $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$ the individual effect is a random variable (random coefficient) in both fixed and random effects models
  - Both models assume that $E[y_{it} \mid \alpha_i, x_{it}] = \alpha_i + x_{it}'\beta$
  - $\alpha_i$ is unknown and cannot be consistently estimated
    - Unless $T \to \infty$
  - So we cannot estimate $E[y_{it} \mid \alpha_i, x_{it}]$
    - Contrary to what we usually do with OLS
  - That is reasonable, as $\alpha_i$ includes unobserved individual characteristics
    - Possibly with a non-zero mean
- But take the expectation conditional on $x_{it}$ only:

  $E[y_{it} \mid x_{it}] = E[\alpha_i \mid x_{it}] + x_{it}'\beta$

  - That is, what is the (conditional) expected value of $\alpha_i$?
  - FE and RE have different takes on this expectation

- RE: it is assumed that $E[\alpha_i \mid x_{it}] = \alpha$, so $E[y_{it} \mid x_{it}] = \alpha + x_{it}'\beta$
  - Hence $E[y_{it} \mid x_{it}]$ is identified
    - Since we consistently estimate a single intercept as $NT \to \infty$
  - But the key RE assumption that $E[\alpha_i \mid x_{it}]$ is constant across $i$ might not hold in many microeconometrics applications
- FE: $E[\alpha_i \mid x_{it}]$ varies with $x_{it}$, and it is not known how it varies
  - So we cannot identify $E[y_{it} \mid x_{it}]$
  - Nonetheless, the within and first-differences estimators consistently estimate $\beta$ with short panels
  - Thus they identify the marginal effect $\beta = \partial E[y_{it} \mid \alpha_i, x_{it}] / \partial x_{it}$
    - e.g. the effect on earnings of 1 additional year of schooling
  - But only for time-varying regressors
    - so the marginal effect of race or gender, for example, is not identified
  - And not the expected individual $y_{it}$, as we do not know the individual effect $\alpha_i$

Random Effects vs. Fixed Effects

- The two models have different focuses
- RE
  - Time-series structure
  - Efficiency
- FE
  - Endogeneity of unobserved heterogeneity
  - Consistency

Summary Models & Estimators

Table: Linear Panel Model: Common Estimators and Models

                                   Model
Estimator of $\beta$           Pooled (1)    Random Effects (3) & (5)   Fixed Effects (3)
Pooled OLS (1)                 Consistent    Consistent                 Inconsistent
Between (7)                    Consistent    Consistent                 Inconsistent
Within (Fixed Effects) (8)     Consistent    Consistent                 Consistent
First Differences (9)          Consistent    Consistent                 Consistent
Random Effects (10)            Consistent    Consistent                 Inconsistent

This table considers only consistency of estimators of $\beta$. For correct computation of standard errors see the next section.

The only fully efficient estimator is RE under the RE model.

Section: Hours & Wages Example

Effect of Wage on Labor Supply

- Labor economics: responsiveness of labor supply to wages
- The standard textbook model of labor supply suggests that, for people already working, the effect of a wage increase on labor supply is ambiguous
  - An income effect pushing in the direction of less work offsets a (leisure-) substitution effect in the direction of more work

Cross-Section & Panel

- Cross-section analysis for adult males finds a relatively small positive response of hours worked to wages
  - However, it is possible that this association is spurious
  - It may reflect a greater unobserved desire to work being positively associated with higher wages
    - e.g. those who like to work get better/faster promotion (or similar): that is $\alpha_i$
- Panel data analysis can control for this
  - Under the assumption that the unobserved desire to work is time-invariant
  - For example, Within measures the extent to which an individual works above-average hours in periods with above-average wages

Data

- Data on 532 males for each of the 10 years from 1979 to 1988
  - File provided on Cameron & Trivedi's website
  - mom9.dta file on the course page
  - Balanced panel: don't do that in your report
- From the Panel Study of Income Dynamics (PSID)
  - Used by Ziliak (1997)
- 5,320 observations; sample means of lnhrs and lnwg: 7.66 and 2.61 respectively
  - Geometric means of 2,120 hours and $13.60 per hour (geometric mean because these are averages of logs)
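
The step from mean logs to geometric means is just exponentiation, since the mean of $\ln z$ is the log of the geometric mean of $z$. A quick check using the two sample means quoted on the slide:

```python
import math

mean_lnhrs, mean_lnwg = 7.66, 2.61     # sample means of the logs, from the slide

# exp(mean of logs) is the geometric mean of the original variable
geo_hours = math.exp(mean_lnhrs)
geo_wage = math.exp(mean_lnwg)
print(round(geo_hours), round(geo_wage, 2))   # roughly 2,120 hours and 13.60 dollars per hour
```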

Panel Study of Income Dynamics

- Begun in 1968, PSID is a longitudinal study of a representative sample of U.S. individuals (men, women, and children)
  - and of the family units in which they reside
  - Dynamic aspects of economic and demographic behavior
- Low attrition rates, success in following young adults as they form their own families, and recontact efforts (with those who declined an interview in prior years)
  - Sample size has grown from 4,800 families in 1968 to more than 7,000 families in 2001
- By the conclusion of the 2003 data collection, PSID had collected information about more than 65,000 individuals, spanning as much as 36 years of their lives

Model

$\ln hrs_{it} = \alpha_i + \beta \ln wg_{it} + \varepsilon_{it}$

- $\ln hrs$: natural logarithm of annual hours worked
- Single explanatory variable: $\ln wg$, natural logarithm of the hourly wage
- Individual-specific effect $\alpha_i$
  - Unobserved individual time-invariant characteristics
    - e.g. education, abilities
- $\beta$ measures the wage elasticity of labor supply
- $\varepsilon_{it}$ is assumed to be independent over $i$, but may be correlated over $t$ for given $i$

Model

- Ziliak (1997) additionally included age$^2$, number of children, an indicator for bad health, and year dummies
  - This makes little difference to the estimate of $\beta$ and its standard error
  - For simplicity, they are omitted here
- Ch. 2: more general models
  - Endogenous $\ln wg$
    - FE: $\alpha_i$ correlated with $\ln wg_{it}$
    - Endogeneity: $\varepsilon_{it}$ correlated with $\ln wg_{it}$
  - Lags of $\ln hrs$ as regressors
    - If you work more, you will earn a higher hourly wage

Stata

- Load the data
  - .dta: double-click
  - limited import capacity: .csv
- Declare the dataset to be a panel
  - Menu: longitudinal / panel data
  - ID = i
  - Time = t
- Menu
  - longitudinal / panel data → linear models → linear regressions
  - or linear models for OLS

Obtaining the Results in Stata 1/2

- Pooled OLS: $\hat\alpha$, $\hat\beta$ and other statistics appear directly in the output
- Between model (OLS regression on the averages per individual) is obtained similarly to POLS
- Within model (= fixed effects)
  - Individual $\hat\alpha_i$ estimates can be recovered after estimation (not consistent)
- Stata presents an intercept in the FE estimates
  - Rewrite model (3), $y_{it} = \alpha_i + x_{it}'\beta + \varepsilon_{it}$, as $y_{it} = \alpha + \alpha_i + x_{it}'\beta + \varepsilon_{it}$
    - This leads to perfect multicollinearity
    - We need to normalize
  - In theory, we chose $\alpha = 0$ for simplicity
  - Instead Stata has chosen $\sum_i \alpha_i = 0$, because of the analogous assumption in RE: $E(\alpha_i) = 0$
  - In all cases, this has no bearing on the estimates of $\beta$

Obtaining the Results in Stata 2/2

- The first-differences estimator is not readily available in Stata
  - In my version at least
  - Define the first differences first, then apply POLS
    - Lag 1 period in Stata (data sorted by i and t): by i: gen lnwgL1 = lnwg[_n-1]
      (_n indexes observations; by i indicates that the lag is taken within each individual)
    - Then, for the first difference: by i: gen lnwgD1 = lnwg - lnwgL1
    - Proceed in the same way for lnhrs
- RE: 2 versions
  - GLS (OLS estimation of the transformed model, as seen in the section on estimators)
  - ML (not detailed here)

Linear Panel Data Estimates

                  POLS    Between   Within   First Diff   RE-GLS   RE-MLE
$\alpha$          7.44    7.48      7.22     .001         7.35     7.35
$\beta$           .083    .067      .168     .109         .119     .120
$\lambda$         .000    –         .624     –            .585     .586
N                 5320    532       5320     4788         5320     5320

- $\lambda$ is the one from the RE estimator (10)

  $y_{it} - \hat\lambda \bar y_i = (1 - \hat\lambda)\mu + (x_{it} - \hat\lambda \bar x_i)'\beta + \nu_{it}$

  - It can be inferred for some of the other estimators

Slope Parameter Estimates

- The estimates of the slope parameter $\beta$ differ across the different estimation methods
- The between estimate, which uses only cross-section variation, is less than the pooled OLS estimate
- The within (= fixed effects) estimate of 0.168 is much higher than the pooled OLS estimate of 0.083
- The first-differences estimate of 0.109 is also higher than that of pooled OLS
  - but is considerably less than the within estimate
- The RE estimates of 0.119 or 0.120 lie between the between and within estimates
  - This is expected, as RE estimates can be shown to be a weighted average of between and within estimates
  - The two RE estimates are very close to each other

Which estimates are preferred?

- The within and first-differences estimators are consistent under all models (pooled, RE, and FE)
  - The other estimators are inconsistent under the FE model
- The most robust estimates are therefore the within or first-differences estimates of 0.168 or 0.109
- Efficiency loss in using these more robust estimators: next section
- Hausman test (following the next section): whether or not the FE model is appropriate
  - It turns out that the Hausman test rejects the null hypothesis of RE
  - That seems natural given the large difference between the coefficient estimates

First-difference vs. Fixed-effects

- Both are consistent under all models (pooled, RE, and FE)
- If $T = 2$ they are identical
- If $u_{it}$ has no serial correlation, FE is in principle better
  - Because it does not throw away one period of data
- If $u_{it}$ is a random walk, FD is in principle better
  - Because it transforms the series to order 0
- If there is correlation between $x_{it}$ and $u_{it}$ (endogeneity)
  - Both FE and FD become inconsistent
- Testing is complicated
  - More detail requires introducing time-series issues

Section: Panel Data Inference

Section: Panel-Robust Inference

Panel-Robust Statistical Inference

- The various panel models include error terms: $u_{it}$, $\varepsilon_{it}$, $\alpha_i$
- In many microeconometrics applications:
  - It is reasonable to assume independence over $i$
- The errors are potentially
  1. serially correlated (correlated over $t$ for given $i$)
  2. heteroskedastic (at least across $i$)
- Valid statistical inference requires controlling for both of these factors

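$\Sigma = \begin{pmatrix} \Omega_1 & 0 & \cdots & 0 \\ 0 & \Omega_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \Omega_N \end{pmatrix}$, with $T \times T$ blocks whose idiosyncratic variance differs across individuals:

$\Omega_i = \begin{pmatrix} \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 & \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \sigma_\alpha^2 \\ \sigma_\alpha^2 & \cdots & \sigma_\alpha^2 & \sigma_\alpha^2 + \sigma_{\varepsilon i}^2 \end{pmatrix}, \quad i = 1, \dots, N$

- The White heteroskedastic-consistent estimator can be extended to short panels
  - since for the $i$th individual the error variance block is of finite dimension $T \times T$ while $N \to \infty$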

Reminder: The White heteroskedastic-consistent estimator

- Classical linear model $y = x'\beta + \varepsilon$ with $E(\varepsilon\varepsilon') = \Sigma \neq \sigma^2 I$
  - OLS is unbiased and consistent
  - $\mathrm{Var}(\hat\beta_{OLS}) = (X'X)^{-1} X'\Sigma X (X'X)^{-1} \neq \sigma^2 (X'X)^{-1}$
- For pure heteroskedasticity, White (1980) shows that

  $S = \dfrac{1}{N} \sum_{i=1}^{N} \hat\varepsilon_i^2 X_i X_i'$

  - where $\hat\varepsilon_i$ is the OLS residual
  - is a consistent estimate of $\frac{1}{N} X'\Sigma X$ under general conditions
- The formula can be extended to autocorrelation
  - But autocorrelation often reveals time-series properties
  - that need to be investigated in more detail
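
A minimal sketch of the White variance estimate for a cross-section follows. It plugs $N \cdot S = \sum_i \hat\varepsilon_i^2 X_i X_i'$ into the sandwich formula and compares the result with the default iid formula. The simulated design, in which the error standard deviation is $|x|$, is an assumption chosen so that heteroskedasticity is strong; the function name is hypothetical.

```python
import numpy as np

def white_variance(X, resid):
    """(X'X)^-1 [sum_i e_i^2 x_i x_i'] (X'X)^-1, with no degrees-of-freedom correction."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * resid[:, None] ** 2).T @ X        # sum_i e_i^2 x_i x_i' = N * S
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)   # error variance depends on x

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b                                         # OLS residuals

V_white = white_variance(X, e)
V_iid = (e @ e / (n - 2)) * np.linalg.inv(X.T @ X)    # valid only under iid errors
print(np.sqrt(np.diag(V_white)), np.sqrt(np.diag(V_iid)))
```

In this design the iid formula understates the standard error of the slope, which is the pattern the panel-robust discussion warns about.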

Panel-Robust Statistical Inference

- Panel-robust standard errors can thus be obtained
  - without assuming specific functional forms for within-individual error correlation or heteroskedasticity
- So we use inefficient estimators
  - but at least we get their variance right
  - Only the RE estimator in the RE model is efficient
  - More efficient estimators using GMM: Chap. 2
- The panel commands in many computer packages calculate default standard errors assuming iid errors
  - This gives erroneous inference
  - Ignoring it can lead to underestimated standard errors and overestimated t-statistics
- FE or RE tend to reduce the serial correlation in errors, but not eliminate it

(73)

Derivation of the White heteroskedastic-consistent estimator

I Rewrite the panel estimators as OLS estimation of $\theta$ in

$$
\tilde y_{it} = \tilde w_{it}'\theta + \tilde u_{it} \qquad (12)
$$

  I $\tilde y_{it}$ is a known function of only $y_{i1}, \ldots, y_{iT}$; similarly for $\tilde w_{it}$ and $w_{it}' = [1 \;\; x_{it}']$, and for $\tilde u_{it}$ and $u_{it}$

  I Pooled OLS: no transformation, $\theta = [\alpha \;\; \beta']'$

  I Within: $\tilde y_{it} = y_{it} - \bar y_i$, $\tilde w_{it} = x_{it} - \bar x_i$, only time-varying regressors

    I $\theta$: coefficients of the time-varying regressors

  I ...

I !! Such transformations will induce serial correlation even if underlying errors are uncorrelated !!
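
As a worked illustration of this last point for the within transformation, suppose (an assumption made only for this example) that the $u_{it}$ are iid over $t$ with variance $\sigma_u^2$, and write $\bar u_i = T^{-1} \sum_t u_{it}$. Then for $s \neq t$:

$$
\begin{aligned}
Cov[\tilde u_{it}, \tilde u_{is}]
&= Cov[u_{it} - \bar u_i,\; u_{is} - \bar u_i] \\
&= Cov[u_{it}, u_{is}] - Cov[u_{it}, \bar u_i] - Cov[\bar u_i, u_{is}] + V[\bar u_i] \\
&= 0 - \frac{\sigma_u^2}{T} - \frac{\sigma_u^2}{T} + \frac{\sigma_u^2}{T}
= -\frac{\sigma_u^2}{T}
\end{aligned}
$$

Since $V[\tilde u_{it}] = \sigma_u^2 (1 - 1/T)$, the transformed errors have correlation $-1/(T-1)$ across periods, even though the original errors are uncorrelated.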

(74)

Notation

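I Stack observations over time periods for a given individual:

  I $\tilde y_i = \tilde W_i \theta + \tilde u_i$ where

    I $\tilde y_i$: $T \times 1$

      I for the first-differences model, $(T-1) \times 1$

    I $\tilde W_i$: $T \times q$

I OLS estimator

$$
\hat\theta_{OLS} = \left[ \sum_{i=1}^{N} \tilde W_i' \tilde W_i \right]^{-1} \sum_{i=1}^{N} \tilde W_i' \tilde y_i
$$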

(75)

OLS Variance

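I Asymptotic variance of $\hat\theta_{OLS}$ is

$$
V[\hat\theta_{OLS}] = \left[ \sum_{i=1}^{N} \tilde W_i' \tilde W_i \right]^{-1} \sum_{i=1}^{N} \tilde W_i' \, E[\tilde u_i \tilde u_i' \mid \tilde W_i] \, \tilde W_i \left[ \sum_{i=1}^{N} \tilde W_i' \tilde W_i \right]^{-1}
$$

  I This is the variance of the OLS estimates of the transformed (~) model

  I We need a consistent estimate of it to make classical inference, e.g. a t-test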

(76)

Panel-Robust “Sandwich” Variance

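I Consistent estimation of $V[\hat\theta_{OLS}]$ in this panel setting

  I Analogous to the cross-section problem of obtaining a consistent estimate of $V[\hat\theta_{OLS}]$ that is robust to heteroskedasticity of unknown form

  I The complication is that the error is now the vector $\tilde u_i$ rather than a scalar $u_i$

I Panel-robust estimate of $V[\hat\theta_{OLS}]$

  I Controlling for both serial correlation and heteroskedasticity

$$
\widehat{V}[\hat\theta_{OLS}] = \left[ \sum_{i=1}^{N} \tilde W_i' \tilde W_i \right]^{-1} \sum_{i=1}^{N} \tilde W_i' \, \hat{\tilde u}_i \hat{\tilde u}_i' \, \tilde W_i \left[ \sum_{i=1}^{N} \tilde W_i' \tilde W_i \right]^{-1} \qquad (13)
$$

where $\hat{\tilde u}_i = \tilde y_i - \tilde W_i \hat\theta$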
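
Estimator (13) can be sketched directly with matrix operations. The following is a minimal illustration under stated assumptions: the simulated balanced panel, the function name `panel_robust_variance` and the data-generating process (an individual effect left in the error of a pooled OLS regression) are hypothetical, and no finite-sample correction is applied, whereas statistical packages typically add one.

```python
import numpy as np

def panel_robust_variance(W, y, ids):
    """Pooled OLS of y on W with the panel-robust sandwich variance (13)."""
    A_inv = np.linalg.inv(W.T @ W)          # [sum_i W_i'W_i]^{-1}
    theta = A_inv @ W.T @ y
    resid = y - W @ theta
    q = W.shape[1]
    meat = np.zeros((q, q))
    for g in np.unique(ids):
        rows = ids == g
        s = W[rows].T @ resid[rows]         # W_i' u_i, a q-vector
        meat += np.outer(s, s)              # W_i' u_i u_i' W_i
    return theta, A_inv @ meat @ A_inv

# Hypothetical short panel: N individuals, T periods
rng = np.random.default_rng(1)
N, T = 300, 5
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[ids]             # individual effect, part of the error
x = rng.normal(size=N)[ids] + rng.normal(size=N * T)
y = 1.0 + 0.5 * x + alpha + rng.normal(size=N * T)
W = np.column_stack([np.ones(N * T), x])

theta, V_panel = panel_robust_variance(W, y, ids)
se_panel = np.sqrt(np.diag(V_panel))

# Default formula assuming iid errors, for comparison
resid = y - W @ theta
V_iid = (resid @ resid / (N * T - 2)) * np.linalg.inv(W.T @ W)
se_iid = np.sqrt(np.diag(V_iid))

print("theta:", theta)
print("iid se:         ", se_iid)
print("panel-robust se:", se_panel)
```

In this simulated design the errors of a given individual are positively correlated over time, so the iid formula understates the standard errors.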

(77)

Panel-Robust “Sandwich” Variance

I Estimator (13) assumes independence over $i$ and $N \to \infty$

  I but permits $V[u_{it}]$ and $Cov[u_{it}, u_{is}]$ to vary with $i$, $t$, and $s$

  I the case for short panels

I Panel-robust standard errors based on (13) can be computed by use of a regular OLS command

  I if the command has a cluster-robust standard error option

  I as in Stata, cluster on the individual $i$

I Common error: estimate OLS of $\tilde y_{it} = \tilde w_{it}'\theta + \tilde u_{it}$ using the standard robust se option

  I Only adjusts for heteroskedasticity

  I In practice in a panel: more important to correct for serial correlation


(79)

Reminder : The Bootstrap

I Bootstrap hypothesis: if we could resample the population in the same conditions, we would observe something similar to a resampling with replacement of the observed sample

  I “Mediocrity principle”

  I Not the same as representativity, as our sample might not be representative

I Principle

  I Sample the current sample of size $n$ with replacement

    I make $n$ draws, each with probability $1/n$

    I Called “bootstrap pair” as both $y$ and $X$ are sampled

  I Replicate that process $B$ times: $B$ different pseudo-samples

  I For each pseudo-sample $\langle Y_b, X_b \rangle$: one vector $\hat\theta_b$

(80)

Reminder : The Bootstrap 2

I To construct a confidence interval for one element $\theta_k$ from $\theta$

  I We have $B$ estimates $\hat\theta_{kb}$

  I Take $B = 10\,000$ and order those estimates from smallest to largest

  I then estimates number 250 and 9750 are the lower bound and upper bound, respectively, of the 95% confidence interval

Why is that interesting?

1. No distributional hypothesis

  1.1 Although there must be no correlation between observations

  1.2 Therefore, in panels, resampling is on $i$ only, using all $t$ for each $i$ in the new sample

2. Confidence intervals can be calculated

  2.1 for any function of the estimated parameters, including non-linear ones

  2.2 for parameters estimated from models without exact finite sample properties
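
The percentile interval and the panel resampling rule (draw individuals $i$, keep all their $t$) can be sketched as follows. This is an illustrative sketch only: the simulated balanced panel, the helper names `ols` and `panel_bootstrap`, and the choice $B = 2000$ (smaller than the $10\,000$ of the slide, to keep the run short) are assumptions made for the example.

```python
import numpy as np

def ols(W, y):
    """OLS coefficients of y on W."""
    return np.linalg.solve(W.T @ W, W.T @ y)

def panel_bootstrap(W, y, N, T, B, rng):
    """Pairs bootstrap over individuals; rows must be sorted by individual, balanced panel."""
    W3 = W.reshape(N, T, -1)                 # one T x q block per individual
    y2 = y.reshape(N, T)
    draws = np.empty((B, W.shape[1]))
    for b in range(B):
        pick = rng.integers(0, N, size=N)    # N individuals drawn with replacement
        draws[b] = ols(W3[pick].reshape(N * T, -1), y2[pick].reshape(N * T))
    return draws

# Hypothetical short panel
rng = np.random.default_rng(2)
N, T, B = 200, 4, 2000
alpha = np.repeat(rng.normal(size=N), T)
x = np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)
y = 1.0 + 0.5 * x + alpha + rng.normal(size=N * T)
W = np.column_stack([np.ones(N * T), x])

theta_hat = ols(W, y)
draws = panel_bootstrap(W, y, N, T, B, rng)

# Order the B slope estimates; take estimates number 0.025*B and 0.975*B
sorted_slope = np.sort(draws[:, 1])
lo = sorted_slope[int(round(0.025 * B)) - 1]
hi = sorted_slope[int(round(0.975 * B)) - 1]
print("slope estimate:", theta_hat[1])
print("95% percentile interval:", (lo, hi))
```

With $B = 10\,000$ the two index expressions select estimates number 250 and 9750, as on the slide.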

(81)

Panel Bootstrap Variance

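I For each of the $B$ pseudo-samples: OLS of $\tilde y_{it}$ on $\tilde w_{it}$

  I $B$ estimates $\hat\theta_b$, $b = 1, \ldots, B$

I Variance matrix panel bootstrap “empirical” estimate:

$$
\widehat{V}_{Boot}[\hat\theta] = \frac{1}{B-1} \sum_{b=1}^{B} \left( \hat\theta_b - \bar{\hat\theta} \right) \left( \hat\theta_b - \bar{\hat\theta} \right)' \qquad (14)
$$

where $\bar{\hat\theta} = B^{-1} \sum_b \hat\theta_b$

I May be slow; see e.g. Cameron & Trivedi

I Given independence over $i$

  I Consistent as $N \to \infty$

  I Asymptotically equivalent to the panel-robust “sandwich”

  I For any form of heteroskedasticity or autocorrelation (as White)

  I Can be applied to any panel estimator written in the transformed (~) form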
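
Equation (14) can be illustrated with the within estimator. This is a hedged sketch: the simulated data, the single-regressor design, the helper name `within_slope` and $B = 500$ are all assumptions for the example. Because the within transformation only uses an individual's own observations, demeaning once and then resampling whole individuals gives the same pseudo-samples as resampling first and demeaning afterwards.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, B = 150, 5, 500

# Hypothetical panel: regressor correlated with the individual effect
alpha = rng.normal(size=(N, 1))
x = 0.8 * alpha + rng.normal(size=(N, T))
y = alpha + 0.5 * x + rng.normal(size=(N, T))

# Within transformation: deviations from individual means
x_t = x - x.mean(axis=1, keepdims=True)
y_t = y - y.mean(axis=1, keepdims=True)

def within_slope(xt, yt):
    """OLS of transformed y on transformed x (single regressor, no intercept)."""
    return np.array([(xt * yt).sum() / (xt ** 2).sum()])

theta_hat = within_slope(x_t, y_t)

# B pseudo-samples, each drawing N individuals with all their periods
theta_b = np.empty((B, 1))
for b in range(B):
    pick = rng.integers(0, N, size=N)
    theta_b[b] = within_slope(x_t[pick], y_t[pick])

# Equation (14): empirical variance of the B estimates
theta_bar = theta_b.mean(axis=0)
dev = theta_b - theta_bar
V_boot = dev.T @ dev / (B - 1)
se_boot = np.sqrt(np.diag(V_boot))

# Panel-robust sandwich (13) for comparison, without finite-sample correction
resid = y_t - x_t * theta_hat[0]
scores = (x_t * resid).sum(axis=1)           # W_i' u_i for each individual
V_sand = (scores ** 2).sum() / (x_t ** 2).sum() ** 2

print("within estimate:", theta_hat[0])
print("bootstrap se:", se_boot[0])
print("sandwich se: ", np.sqrt(V_sand))
```

The two standard errors should be of similar size here, in line with the asymptotic equivalence stated above; they will not coincide exactly in a finite sample.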

