Lectures in Applied Econometrics
Amazonian Deforestation
Pr. Philippe Polomé, Université Lumière Lyon 2
M1 APE Analyse des Politiques Économiques
M1 RISE Gouvernance des Risques Environnementaux
2016 – 2017
Outline
Introduction
Time-series Theory
Deforestation Data & Analysis
References
Definition and scope
- The United States Environmental Protection Agency defines deforestation as the "permanent removal of standing forests."
- Amazonian deforestation has been monitored by Landsat since 1975
  - Google publishes some images
Why is this an important issue?
- Biodiversity reservoir
  - Habitat loss
- Carbon sink
  - Old-growth forests are (net) carbon sinks
  - In addition, deforesting emits carbon
    - By burning
    - Released from the soil
- Changes moisture in the air
  - Causes droughts further south
Why is this an important issue?
- Social issue 1: not development
  - Deforestation is mostly due to agriculture
    - Mostly cattle (about 80%), on planted pasture
    - The Amazon basin appears generally not well suited for crops, soybean in particular [9].
    - 70% of formerly forested land in the Amazon, and 91% of land deforested since 1970, is used for livestock pasture
  - This in turn causes soil erosion and flash floods
- Social issue 2: Indigenous people
Deforestation Time Profile
Source: Landsat images interpreted by the PRODES project of the Instituto Nacional de Pesquisas Espaciais (INPE) since 1975. Values for some years are linearly interpolated.
- This is the “Legal Amazon” deforestation
  - Why has it been declining since the mid-2000s?
Outline
Introduction
  Causes of Deforestation
  Environmental Kuznets Curves
Time-series Theory
  Trends
  I(0)
  Autoregressive Errors of Order 1 AR(1)
  Stationarity: integration of order 1 I(1)
  Deciding if a time-series is I(1)
  Cointegration
  Error correction models
  Johansen Test of Cointegration
Deforestation Data & Analysis
  Analysis: EKC, I(1) tests and cointegration
  The Estimated Relation
  Alternative Theory
  Other Econometric Issues
  Other Explanatory variables
  Discussion and conclusions
References
Causes of Deforestation
- Are certainly complex
  - but primarily driven by human action
  - Hence economics might be a factor
  - And thus deforestation may compete with other economic activities
- The deforestation of the 1970s and 1980s was induced by government policies and subsidies
  - Slash-and-burn agriculture appears much less prevalent than it was
World Bank 2004 [9]: Deforestation in the 1990s and early 2000s
- Attributed mainly to cattle ranching
  - Soybean to a much lesser extent
  - Grass does not deplete the soil so much
  - The 1995 peak was attributed to accidental forest fires
- Agriculture and cattle ranching may be more profitable in the Amazon due to
  - weak land titling, land grabbing, irregular labor contracts,
  - and the continuous process of opening up new forest areas
    - The latter is carried out at low cost by small farmers
    - who prepare the land for the medium- and large-scale cattle ranching that follows them
- Small farmers are less blamed than they once were
Causes of Deforestation
- Weinhold and Reis [12]
  - analyse the way road creation induces deforestation
  - it turns out that it does so only in areas that have not yet seen deforestation
  - but it reduces deforestation in areas where land is already cleared
- NASA Earth Observatory¹ states:
  "This pattern follows one of the most common deforestation trajectories in the Amazon. Legal and illegal roads penetrate a remote part of the forest, and small farmers migrate to the area. They claim land along the road and clear some of it for crops. Within a few years, heavy rains and erosion deplete the soil, and crop yields fall. Farmers then convert the degraded land to cattle pasture, and clear more forest for crops. Eventually the small land holders, having cleared much of their land, sell it or abandon it to large cattle holders, who consolidate the plots into large areas of pasture."
¹ Anonymous, 2012 data, accessed October 2015 at http://earthobservatory.nasa.gov/Features/WorldOfChange/deforestation.php
Causes of Deforestation
- “Geography”: Kauppi et al. 2006 [7]
  - Above a certain level of income, countries stop deforesting
  - The evidence is essentially a worldwide cross-section
- This points to an explanation economists are familiar with:
  - Deforestation in a worldwide cross-section follows an Environmental Kuznets Curve
Hypothesis: Environmental Kuznets Curves (EKC)
- S. Kuznets (1955) suggested an inverted U-shaped relationship between economic growth and income inequality
  - At first, economic development induces major inequalities between the rich and the poor
  - As income (per capita) rises, inequalities become less tolerable and disappear
    - Possibly because of money transfers, better opportunities or better education / health care / public goods
- Environmental KC suggested by Grossman & Krueger [4][5]
  - Environmental damage first worsens and then recovers as income per capita rises
    - Emissions, deforestation, ...
Formal EKC
- Beyond a turning point, larger levels of per capita income are associated with gradually lower levels of pollutants
  $y_t = \beta_0 + \beta_1 GDPh_t + \beta_2 GDPh_t^2 + \gamma x_t + \varepsilon_t$
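The quadratic specification can be illustrated on simulated data. The sketch below is only an illustration under assumed values (the coefficients, noise level and income range are hypothetical, and the extra regressors $x_t$ are left out): it fits the quadratic by OLS and computes the turning point $-\beta_1 / (2\beta_2)$, which is a maximum (inverted U) when $\beta_1 > 0$ and $\beta_2 < 0$.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients of y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ekc_turning_point(gdph, y):
    """Fit y = b0 + b1*gdph + b2*gdph^2 and return (coefficients, turning point)."""
    X = np.column_stack([np.ones_like(gdph), gdph, gdph**2])
    b = ols(X, y)
    return b, -b[1] / (2.0 * b[2])

# Hypothetical data: an inverted U that peaks at gdph = 5 (illustrative units)
rng = np.random.default_rng(0)
gdph = np.linspace(1.0, 10.0, 200)
y = 2.0 + 4.0 * gdph - 0.4 * gdph**2 + rng.normal(0.0, 0.5, gdph.size)

coefs, peak = ekc_turning_point(gdph, y)
print(coefs, peak)
```

Such a fit says nothing about whether the relation is causal or spurious, which is the question raised in the following slides.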
Evidence for EKC
- Cross-section or panel studies found an EKC on occasion
  - Is that causal or spurious?
- Stern 2004 [10]
  - The EKC for CO2 and CO2-equivalent emissions is an artefact (= spurious) of the analysis
  - Instead, the apparent EKC is a mixture of effects:
    1. Pollution increases roughly monotonically (linearly) with income
    2. But “time” reduces pollution, that is, income-independent policies
    3. In rapidly growing middle-income countries, the income effect (rising pollution) overwhelms the time effect
    4. In wealthy countries, growth is slower, and pollution reduction efforts can overcome the income effect
  - That is what causes an apparent EKC effect in cross-section or panel data sets
Time-series
- Stern 2004 [10] and others clearly identify the EKC as a time-series issue
  - as a cross-section forces all countries onto the same path
  - and a panel only allows a different starting point but the same curvature
- In other words, Kauppi et al. [7]
  - make the same mistake as earlier papers that identified an EKC in CO2 emissions
  - Their results could then be an artefact
Comparing Deforestation across Countries
- Barbier and Burgess (2001)
  - Survey of the economics of tropical deforestation
  - Indicate that even if countries might follow an EKC,
  - they are unlikely to all follow the same path
- Lambin & Meyfroidt 2011 [8]
  - using forest cover evidence in a more “geographical” study
  - indicate that “there is no default forest transition pathway”
- Both results argue against resorting to cross-sections or even panel studies to test the EKC
Meta-analyses
- Lack of an EKC is generally NOT clearly established for damages / emissions other than CO2
  - For deforestation, the evidence is mixed
- Choumert et al. [2]
  - Review 69 papers on the Environmental Kuznets Curve for deforestation
  - They find only one paper using time-series
    - Probably Shafik, N. & Bandyopadhyay, S., 1992
    - It does not use cointegration
  - The economics literature does not appear to supply an explanation for the current decreasing trend in deforestation
  - But a more “geography-oriented” literature does not hesitate to point to economic factors
Objectives
- This paper proposes to test the EKC for deforestation in Brazil
  - Because there is a well-documented and relatively long time-series
  - Currently 40 years
- Regression analysis of deforestation on $GDPh$ and its square?
  - A number of issues arise
  - But the cointegration issue appears both essential and untreated
EKC Econometric Issues in a Nutshell
- Deforestation is a time-series
  - Non-stationary, with a “stochastic trend”
    - No stable expectation or variance
    - Several well-known statistical tests
  - But no “deterministic trend” $y_t = \alpha + \beta t + \varepsilon_t$
    - So the decline in deforestation cannot be “only time”
- $GDPh$ is also a non-stationary series
  - Regression of non-stationary on non-stationary is spurious
  - Unless there is cointegration
    - A difference between the two series is stationary
    - Large literature in econometrics / finance
    - But not used in deforestation EKC studies (roughly 70 studies)
  - Could the EKC be the cointegration relation?
- The cointegration relation could be more complex
  - Other series should be considered
Time Series
- Are very common
  - Most macroeconomic data: GDP, inflation, unemployment...
  - Individual (or population aggregate) employment, wage, consumption...
  - Stock quotes: yearly, monthly, daily, real-time...
  - Exchange rate
  - Sales / purchases in a firm
- Time-series are often considered autocorrelated
  - The present is influenced by the past
- This section is mostly based on
  - Wooldridge [13]
  - the Gretl User’s Guide [3]
Time-series vs. cross-section
- Time-series observations are naturally ordered
  - Cross-section data has no natural order
    - except geo-localised data
- Time-series observations proceed from a stochastic process
  - Cross-section data proceed from a random sample
- Time-series models are usually indexed by $t$:
  $y_t = \beta_0 + \beta_1 x_{1t} + \dots + \beta_k x_{kt} + \varepsilon_t$
Distributed lags model
- The model $y_t = \beta_0 + \delta_0 x_t + \varepsilon_t$ is said to be static
  - Classical Phillips curve: $inflation_t = \beta_0 + \beta_1 unemployment_t + \varepsilon_t$
- Finite distributed lags (of the regressor) models
  - one or several $x$ impact $y$ with one or more lags
  - $gf_t = \beta_0 + \delta_0 te_t + \delta_1 te_{t-1} + \delta_2 te_{t-2} + \varepsilon_t$
    - $gf$: “general (average) fertility” (re-used later)
    - $te$: “tax exemption”
    - this is an “order 2” distributed lag
  - $\delta_0$ = immediate impact (= short term) of $x$ on $y$
  - The set $\delta_0, \delta_1, \dots, \delta_q$ describes the long-term relation between $x$ and $y$
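Shocks
- Order 2 model $y_t = \beta_0 + \delta_0 x_t + \delta_1 x_{t-1} + \delta_2 x_{t-2} + \varepsilon_t$
- Transient shock $\Delta$ (1 period) on a constant $x$ at time $t$
  - $y_t = \beta_0 + \delta_0 (x + \Delta) + \delta_1 x + \delta_2 x + \varepsilon_t$
  - $y_{t+1} = \beta_0 + \delta_0 x + \delta_1 (x + \Delta) + \delta_2 x + \varepsilon_{t+1}$
  - $y_{t+2} = \beta_0 + \delta_0 x + \delta_1 x + \delta_2 (x + \Delta) + \varepsilon_{t+2}$
- Permanent shock $\Delta$ (starting from time $t$) on a constant $x$
  - $y_t = \beta_0 + \delta_0 (x + \Delta) + \delta_1 x + \delta_2 x + \varepsilon_t$
  - $y_{t+1} = \beta_0 + \delta_0 (x + \Delta) + \delta_1 (x + \Delta) + \delta_2 x + \varepsilon_{t+1}$
  - $y_{t+2} = \beta_0 + \delta_0 (x + \Delta) + \delta_1 (x + \Delta) + \delta_2 (x + \Delta) + \varepsilon_{t+2}$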
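The two shock profiles can be traced numerically. This is a minimal sketch with hypothetical lag coefficients (0.5, 0.3, 0.2), a unit shock and no noise; `fdl_response` is an illustrative helper, not a library function. A transient shock reproduces the lag coefficients one after the other, while a permanent shock accumulates them up to their sum, the long-run effect.

```python
import numpy as np

def fdl_response(deltas, x_path, beta0=0.0):
    """Noise-free y_t = beta0 + sum_j deltas[j] * x_{t-j}, for t >= len(deltas) - 1."""
    q = len(deltas) - 1
    return np.array([beta0 + sum(d * x_path[t - j] for j, d in enumerate(deltas))
                     for t in range(q, len(x_path))])

deltas = [0.5, 0.3, 0.2]          # hypothetical delta_0, delta_1, delta_2
x_level, shock = 1.0, 1.0         # constant level of x and size of the shock
base = np.full(8, x_level)

transient = base.copy()
transient[3] += shock             # shock in period 3 only
permanent = base.copy()
permanent[3:] += shock            # shock from period 3 onwards

baseline = fdl_response(deltas, base)
dy_transient = fdl_response(deltas, transient) - baseline
dy_permanent = fdl_response(deltas, permanent) - baseline
print(dy_transient)   # effect on y in periods 2..7
print(dy_permanent)
```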
Trend models
- The model $y_t = \beta_0 + \delta_0 t + \varepsilon_t$ is a trend
  - $y_t$ “follows” the time flow with a stochastic noise $\varepsilon$
- Several specifications
  - Computer-simulated in Trend and RndWalk.ods on the website
  - Monte-Carlo
Linear trend: $y_t = \beta_0 + \beta_1 z_t + \beta_2 t + \varepsilon_t$, $t = 1, 2, \dots, T$
Quadratic trend: $y_t = \beta_0 + \beta_1 z_t + \beta_2 t + \beta_3 t^2 + \varepsilon_t$
- Not easy to spot or to distinguish from a log (ln) trend
Exponential trend: $y_t = \exp(\beta_0 + \beta_1 z_t + \beta_2 t + \varepsilon_t)$
- $\Delta \ln(y_t) = \ln(y_t) - \ln(y_{t-1}) \approx \frac{y_t - y_{t-1}}{y_{t-1}}$
  - The log-differential approximately equals the growth rate
  - For rather small rates
- An exponential trend without regressor is then $\ln y_t = \beta_0 + \beta_2 t + \varepsilon_t$
  - Happens when $y$ has the same growth rate every $t$
  - $\Delta \ln(y_t) = \beta_2 + \Delta \varepsilon_t$: constant growth rate + zero-expectation error
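A quick numerical check of the log-difference approximation, as a sketch with hypothetical growth rates (2% and 50%): the approximation is close for the small rate and poor for the large one.

```python
import numpy as np

g = 0.02                                   # hypothetical constant growth rate
y = 100.0 * (1.0 + g) ** np.arange(10)     # y_t grows by 2% every period
log_diff = np.diff(np.log(y))              # ln(y_t) - ln(y_{t-1})
growth = np.diff(y) / y[:-1]               # (y_t - y_{t-1}) / y_{t-1}
print(log_diff[0], growth[0])

# For a large rate the approximation deteriorates
big_log_diff = np.log(1.5)                 # log-difference for a 50% growth rate
print(big_log_diff, 0.5)
```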
Spurious Regression
- Economic time-series variables may have a temporal trend
  - Regressing a trend on a trend often seems like a good idea:
    - $R^2$ and the t-statistics are often high
  - However, variables unobserved (by the econometrician) may actually be causing the trends
    - 3 examples below
  - The unobserved variables may be controlled for by introducing a deterministic time trend
    - the significance of the other regressors might then be brought back to its correct level
    - the time trend may be only a proxy
    - So: it does not explain anything
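The mechanism can be reproduced on simulated data. In this sketch (all numbers are hypothetical) two series are generated independently, each around its own linear trend; `ols_tstats` is an illustrative helper computing conventional OLS t-statistics. Regressing one series on the other gives a large t-statistic, which shrinks once a time trend is added to the regression.

```python
import numpy as np

def ols_tstats(X, y):
    """OLS coefficients and conventional t-statistics."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, b / np.sqrt(s2 * np.diag(XtX_inv))

rng = np.random.default_rng(1)
T = 100
t = np.arange(1, T + 1, dtype=float)
x = 0.5 * t + rng.normal(0, 5, T)      # trending, generated independently of y
y = 0.3 * t + rng.normal(0, 5, T)      # trending, generated independently of x

const = np.ones(T)
_, t_no_trend = ols_tstats(np.column_stack([const, x]), y)
_, t_with_trend = ols_tstats(np.column_stack([const, x, t]), y)
print(t_no_trend[1], t_with_trend[1])  # t-stat of x without and with the trend
```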
Spurious Example 1: The storks and the babies
- Fisher, 1936, Copenhagen, post WWII decade
  - $B = \beta_0 + \beta_1 S + \varepsilon$
  - $\hat\beta_1 = .15$ with t-stat 5.98
  - What is going on?
Spurious Example 1: The storks and the babies
- Fisher, 1936, Copenhagen, post WWII decade
- More likely: reconstruction + rural migration to the city
  - Assuming migration and construction are linear: time trend
  - $B = \beta_0 + \beta_1 S + \beta_2 t + \varepsilon$
  - $\hat\beta_1 = .03$ with t-stat 0.34
  - However, few degrees of freedom
Note on storks and babies
- Storks are birds that leave Northern & Central Europe in autumn and come back in early April
  - That is about 9 months after the summer solstice (21 June / Saint John)
- The summer solstice was an important pagan (and later Christian) festival
  - In which many people would marry...
Spurious Example 2: property investment and prices
- Gretl
  - File → Open data → Sample file
  - Wooldridge tab
  - hseinv.gdt
  - data from [13]
- 1947-88 series
  - General info under Data → Dataset info
  - housing investment per capita
  - housing price
  - ...
- Data → Dataset structure
  - time-series
  - indicate
    - periodicity
    - start period
Spurious Example 2: property investment and prices
- Model menu: OLS regression
  $\widehat{\ln(invpc)} = -.55 + 1.24 \ln(price)$
- The elasticity of household property investment with respect to price is significantly different from zero but not from one
  - A change in price appears to be completely passed on to investment
  - But both series follow a trend
Spurious Example 2: property investment and prices
- Adding a trend (data menu)
  $\widehat{\ln(invpc)} = -.91 - .38 \ln(price) + .0098 t$
- Price is not significant any more
  - But (real) investment grows by about 1% yearly
  - Possibly, this might be due to omitted regressors
- The previous result was spurious
- If $y$ and $x$ have opposing trends
  - Introducing a trend may increase the significance of $x$
  - The t-stat of a trend is not necessarily correct, as we will see in the section on I(1)
Spurious Example 3: simulated data
- Trend and RndWalk.ods, tab “Trend on trend”
De-trending
- To purge the data of the trend (de-trend)
  - Instead of introducing a linear trend in the regression
    1. Regress each variable of the model on a trend
    2. Use the residuals from each equation as new variables
       - like a redefinition
- For example, for $y_t = \beta_0 + \beta_1 z_t + \varepsilon_t$
  1. Create the de-trended variables
     $y_t = \gamma_0 + \gamma_1 t + \zeta_t \rightarrow y_t^d = \hat\zeta_t$
     $z_t = \theta_0 + \theta_1 t + \xi_t \rightarrow z_t^d = \hat\xi_t$
  2. Regress the de-trended variables
     $y_t^d = \lambda z_t^d + \nu_t$
     - An intercept is no longer useful since $E(y_t^d) = E(z_t^d) = 0$
De-trending (2)
- Introducing a trend and using de-trended variables are in principle equivalent approaches
  - The OLS slope estimate is numerically the same in both (Frisch-Waugh-Lovell result)
- But de-trending is a 2-step method
  - it introduces an estimation error in the 2nd step
    - The de-trended variables are constructed on the basis of estimated parameters
    - $\hat\xi_t$ is a measurement of $z_t^d$ with error
  - Thus the reported standard errors and $R^2$ are not identical
- Why use de-trending?
  - Time-series regressions often have a high $R^2$
    - mostly because of the trend, which does not explain anything
    - So such an $R^2$ does not reflect the real explanatory power of the estimated model
  - The $R^2$ of the regression using de-trended variables is likely a better measure of the true explanatory power of the model
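The comparison between the two approaches can be checked on simulated data. The sketch below uses hypothetical coefficients and the illustrative helpers `ols`, `detrend` and `r2`. In this simulation the slope on $z$ is the same whether a trend is included or the variables are de-trended first (the Frisch-Waugh-Lovell result), while the $R^2$ of the de-trended regression is much lower than the $R^2$ in levels.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients of y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def detrend(v, t):
    """Residuals of v regressed on a constant and a linear trend."""
    X = np.column_stack([np.ones_like(t), t])
    return v - X @ ols(X, v)

def r2(X, y):
    """Usual R-squared of the regression of y on X (X contains a constant)."""
    resid = y - X @ ols(X, y)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
T = 80
t = np.arange(1, T + 1, dtype=float)
z = 0.2 * t + rng.normal(0, 1, T)                    # trending regressor
y = 1.0 + 0.5 * z + 0.1 * t + rng.normal(0, 1, T)    # hypothetical true slope 0.5

X_levels = np.column_stack([np.ones(T), z, t])
b_trend = ols(X_levels, y)[1]                        # slope on z, trend included
yd, zd = detrend(y, t), detrend(z, t)
b_detr = (zd @ yd) / (zd @ zd)                       # no intercept: both have mean 0
r2_levels = r2(X_levels, y)
r2_detr = 1.0 - np.sum((yd - b_detr * zd) ** 2) / np.sum(yd ** 2)
print(b_trend, b_detr, r2_levels, r2_detr)
```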
Stationarity
- A stochastic process is stationary
  - when its distribution does not change through time
    - parameters included
  - Stationarity is similar to “identically distributed”
- A trend is not stationary since its expectation changes with time
- A stochastic process is said to be covariance-stationary
  - if its expectation and its variance are constant through time
  - and if the covariance between 2 periods depends only on the number of periods between them
- Stationary processes are covariance-stationary
  - unless the covariance is infinite ($\infty$)
Integration
- A stationary process is integrated of order zero, I(0), if
  - $x_t$ and $x_{t+h}$ are “nearly independent” when $h \to \infty$
  - We also say weakly dependent for I(0)
- A similar definition exists for a non-stationary process
  - I(0) is similar to “independently distributed”
- A covariance-stationary series is I(0) if
  - the correlation between $x_t$ and $x_{t+h}$ $\to 0$ when $h \to \infty$
- I(0) implies that some law of large numbers and central limit theorem may be applied
  - It replaces the (simple) random sample hypothesis, that is “iid”
  - I(0) is a sufficient condition to use a time series in regression
MA(1): moving average process of order 1
- MA(1): $x_t = \varepsilon_t + \alpha \varepsilon_{t-1}$, $t = 1, 2, \dots$
  - $\{\varepsilon_t : t = 0, 1, \dots\}$ is an i.i.d. sequence with mean zero and variance $\sigma_\varepsilon^2$
  - $\varepsilon_t$ is a white noise
- An MA(1) is I(0)
  - Adjacent terms (in the sequence) are correlated
  - As soon as there are 2 periods between 2 terms of an MA(1), the correlation falls to zero since $\varepsilon_t$ is i.i.d.
- Since $\varepsilon_t$ is i.i.d., an MA(1) is stationary
  - Clearly an MA(1) is covariance-stationary
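As a worked check of these statements, writing the MA(1) as $x_t = \varepsilon_t + \alpha \varepsilon_{t-1}$ with i.i.d. $\varepsilon_t$ of variance $\sigma_\varepsilon^2$, the moments follow directly from the independence of the $\varepsilon_t$:

$$Var(x_t) = (1 + \alpha^2)\sigma_\varepsilon^2, \qquad Cov(x_t, x_{t+1}) = Cov(\varepsilon_t + \alpha\varepsilon_{t-1},\ \varepsilon_{t+1} + \alpha\varepsilon_t) = \alpha\sigma_\varepsilon^2, \qquad Cov(x_t, x_{t+h}) = 0 \ \text{for } h \geq 2$$

so that $Corr(x_t, x_{t+1}) = \frac{\alpha}{1 + \alpha^2}$. None of these moments depends on $t$, which is the covariance-stationarity property, and the correlation is exactly zero beyond one lag.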
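Autoregressive Errors of Order 1 AR(1)
- An AR(1) $\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t$ is said to be stable when $|\rho| < 1$ [vs. explosive]
  - $\mu_t \sim iid(0, \sigma_\mu^2)$ white noise
    - Expectation 0, constant variance and covariance 0
- We can write $\varepsilon_t = \mu_t + \rho \mu_{t-1} + \rho^2 \mu_{t-2} + \dots$
  - So $var(\varepsilon_t) = \sigma_\varepsilon^2 = \sigma_\mu^2 + \rho^2 \sigma_\mu^2 + \rho^4 \sigma_\mu^2 + \dots = \frac{\sigma_\mu^2}{1 - \rho^2}$
  - And $cov(\varepsilon_t, \varepsilon_{t-1}) = cov(\rho \varepsilon_{t-1} + \mu_t, \varepsilon_{t-1}) = \rho \sigma_\varepsilon^2 = \frac{\rho \sigma_\mu^2}{1 - \rho^2}$
- Substituting successively in the AR(1)
  $\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t = \rho(\rho \varepsilon_{t-2} + \mu_{t-1}) + \mu_t = \rho^2 \varepsilon_{t-2} + \rho \mu_{t-1} + \mu_t = \dots = \rho^s \varepsilon_{t-s} + \sum_{i=0}^{s-1} \rho^i \mu_{t-i}$
  Thus $cov(\varepsilon_t, \varepsilon_{t-s}) = \frac{\rho^s \sigma_\mu^2}{1 - \rho^2} = \rho^s \sigma_\varepsilon^2$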
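Matrix of var-cov of AR(1) errors
$$\Sigma_\varepsilon = \sigma_\varepsilon^2
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
  & 1 & \rho & \cdots & \rho^{T-2} \\
  &   & \ddots & & \vdots \\
  &   &   & 1 & \rho \\
\text{sym} & & & & 1
\end{pmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_\varepsilon^2
\begin{pmatrix}
0 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
  & 0 & \rho & \cdots & \rho^{T-2} \\
  &   & \ddots & & \vdots \\
  &   &   & 0 & \rho \\
\text{sym} & & & & 0
\end{pmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_\varepsilon^2 \Gamma$$
where $\Gamma$ denotes the symmetric matrix with zero diagonal and off-diagonal elements $\rho^{|i-j|}$.
A stable AR(1) is I(0)
- Stationary since $\mu_t$ is i.i.d.
- Cov $\to 0$ when the time between periods $\to \infty$
Var-cov matrix of the OLS coefficients with AR(1) errors
- $y = X\beta + \varepsilon$ with $\varepsilon_t = \rho \varepsilon_{t-1} + \mu_t$
$$\Sigma_{\hat\beta} = (X'X)^{-1} X' \Sigma_\varepsilon X (X'X)^{-1}
= (X'X)^{-1} X' \left[ \sigma_\varepsilon^2 I_T + \sigma_\varepsilon^2 \Gamma \right] X (X'X)^{-1}
= \sigma_\varepsilon^2 (X'X)^{-1} + \sigma_\varepsilon^2 (X'X)^{-1} X' \Gamma X (X'X)^{-1}$$
- In general, it cannot be shown whether it is larger than the usual $\Sigma_{\hat\beta} = \sigma_\varepsilon^2 (X'X)^{-1}$
  - Thus, it is not known whether the t-stats will be over- or under-evaluated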
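The two variance formulas can be compared numerically. The sketch below assumes a hypothetical design (a constant and a linear trend, 50 observations, autocorrelation 0.7) and builds the error var-cov matrix from its elements, the error variance times the autocorrelation raised to the power |i - j|. In this particular configuration the usual formula understates the variances; other regressors or a negative autocorrelation could give the opposite result.

```python
import numpy as np

def ar1_cov(T, rho, sigma_mu=1.0):
    """Var-cov matrix of stable AR(1) errors: sigma_e^2 * rho^|i-j|."""
    sigma_e2 = sigma_mu**2 / (1.0 - rho**2)
    idx = np.arange(T)
    return sigma_e2 * rho ** np.abs(idx[:, None] - idx[None, :])

T, rho = 50, 0.7                                   # hypothetical values
t = np.arange(1, T + 1, dtype=float)
X = np.column_stack([np.ones(T), t])               # assumed regressors: constant, trend
Sigma_e = ar1_cov(T, rho)
XtX_inv = np.linalg.inv(X.T @ X)

V_true = XtX_inv @ X.T @ Sigma_e @ X @ XtX_inv     # sandwich formula
V_naive = Sigma_e[0, 0] * XtX_inv                  # sigma_e^2 (X'X)^{-1}
ratio = np.diag(V_true) / np.diag(V_naive)
print(ratio)
```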
Random walk definition
- In an AR(1), the hypothesis $|\rho| < 1$ is crucial for the series to be I(0)
- Many economic time-series are better described with an AR(1) where $|\rho| = 1$:
  - $y_t = y_{t-1} + \varepsilon_t$: called a random walk
- Prediction
  - Since $E(\varepsilon_{t+j} | y_t) = 0 \ \forall j \geq 1$, we have $E(y_{t+h} | y_t) = y_t \ \forall h \geq 1$
  - So that whatever the time difference $h$, the best prediction for $y_{t+h}$ is $y_t$
Figure: $y_t = y_{t-1} + \varepsilon_t$ with $\varepsilon \sim N(0, 4)$ and $y_0 = 0$
Computer-simulated data, to show the random walk profile
- Trend and RndWalk.ods, tab “Rnd walk”
Random walk and OLS
- The variance of a random walk increases linearly with time (in theory)
- An AR(1) process with $|\rho| = 1$ is thus non-stationary
  - since its distribution changes with time
- It can be shown it is not I(0) either
  - $x_t$ and $x_{t+h}$ do not become nearly independent when $h \to \infty$
- So the OLS hypotheses for time-series (i.i.d. equivalent) are not satisfied
  - OLS has unknown properties
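The linear growth of the variance can be seen by simulating many random walks and computing the variance across paths at each date. This is a sketch: the number of paths, the horizon and the shock variance of 4 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, T, sigma = 5000, 100, 2.0                 # shock variance sigma^2 = 4
shocks = rng.normal(0.0, sigma, size=(n_paths, T))
paths = np.cumsum(shocks, axis=1)                  # y_t = y_{t-1} + e_t with y_0 = 0
var_by_t = paths.var(axis=0)                       # variance across paths at each t
print(var_by_t[[9, 49, 99]])                       # theory: t * sigma^2 = 40, 200, 400
```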
I(1)
- A random walk is one particular case of a unit root or I(1) process
- Such an I(1) process is “strongly persistent” or “long memory”
  - “Trend” $\neq$ “strongly persistent”
  - Series like interest rates, inflation or unemployment are often considered “long memory”
    - but have no clear trend
- But in many other cases, a long memory series also has a clear trend
  - e.g. a random walk with drift: $y_t = \Omega + y_{t-1} + \varepsilon_t$
    - $\Omega$ is the drift
    - See the plot on the next page
Figure: $y_t = \Omega + y_{t-1} + \varepsilon_t$ with $\varepsilon \sim N(0, 4)$, $y_0 = 0$ and $\Omega = .05$
Drift: $y_t = \Omega + y_{t-1} + \varepsilon_t = 2\Omega + y_{t-2} + \varepsilon_{t-1} + \varepsilon_t = \dots$
Computer-simulated data, to show the profile of a random walk with drift
- Trend and RndWalk.ods, tab “Rnd walk”
Regression between I(1)
- A simple regression between 2 independent I(1) series will often result in a significant t-stat
  - Even without a trend in any variable
- Let 2 random walks $y_t = y_{t-1} + \varepsilon_t$ and $x_t = x_{t-1} + a_t$
  - Specify $y_t = \beta_0 + \beta_1 x_t + \xi_t$,
  - Then $H_0: \beta_1 = 0$ is true,
  - but $\xi_t$ contains $y_{t-1}$, which is a random walk,
  - Then the t-stat associated with $\hat\beta_1$, $t_{\hat\beta_1} \to \infty$ when $T \to \infty$
  - The limit distribution of $t_{\hat\beta_1}$ is not normal
  - So we are led to think $x$ is a significant regressor for $y$
- Simulated example
  - Trend and RndWalk.ods, tab “Spurious I(1)”
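A small Monte-Carlo experiment shows how often the conventional t-test rejects a true null when the two series are independent random walks. This is a sketch with assumed settings (100 observations, 500 replications, the usual 1.96 threshold); `slope_tstat` and `rejection_rate` are illustrative helpers. The same experiment on independent white noise series serves as a benchmark.

```python
import numpy as np

def slope_tstat(x, y):
    """Conventional OLS t-statistic of b1 in y = b0 + b1 x + error."""
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - 2)
    return b[1] / np.sqrt(s2 * XtX_inv[1, 1])

def rejection_rate(T, n_rep, rng, random_walk=True):
    """Share of replications with |t| > 1.96 although x and y are independent."""
    count = 0
    for _ in range(n_rep):
        e, a = rng.normal(size=T), rng.normal(size=T)
        x, y = (np.cumsum(a), np.cumsum(e)) if random_walk else (a, e)
        count += int(abs(slope_tstat(x, y)) > 1.96)
    return count / n_rep

rng = np.random.default_rng(4)
rate_rw = rejection_rate(T=100, n_rep=500, rng=rng)                      # two random walks
rate_iid = rejection_rate(T=100, n_rep=500, rng=rng, random_walk=False)  # two white noises
print(rate_rw, rate_iid)
```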
Remark: the types of spurious regressions
1. In a cross-section
   - Spurious regression may be due to unobserved heterogeneity
     - 2 variables are unrelated, but are both correlated with a third
     - Regressing the 1st on the 2nd, the relation appears significant
     - but once the 3rd variable is inserted, the 2nd loses its significance
   - This phenomenon may also occur in time-series
     - e.g. storks & babies
2. A spurious relation also occurs between series that share a trend
   - Both series have a positive trend or a negative one
   - This issue may be solved by inserting a trend in the model
     - but not always
3. Two I(1) series often appear in a spurious relation
First Differences
- The first difference of a unit root process $y_t$: $\Delta y_t = y_t - y_{t-1}$
  - is I(0): $\Delta y_t$ and $\Delta y_{t+h}$ become nearly independent when $h \to \infty$
  - and is often stationary
    - its distribution does not change with time
  - The series is then said to be difference-stationary
- Many series $y_t$ that are $> 0 \ \forall t$ are such that $\ln(y_t)$ is I(1)
  - Then we can often use $\ln(y_t) - \ln(y_{t-1})$ in an OLS regression
  - Since $\ln(y_t) - \ln(y_{t-1}) \approx \frac{y_t - y_{t-1}}{y_{t-1}}$, the interpretation is in terms of growth rates
  - That is: growth rates are often I(0)
- Differencing a time-series also removes any linear trend
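Correlation
- Let $\rho_1 = Corr(y_t, y_{t-1})$
  - It is called the 1st order autocorrelation of $\{y_t\}$
  - $\rho_1$ can be estimated from the sample correlation between $y_t$ and $y_{t-1}$
  - With $\bar y = \frac{1}{T-1}\sum_{t=2}^{T} y_t$ and $\bar y_{-1} = \frac{1}{T-1}\sum_{t=2}^{T} y_{t-1}$,
    $\hat\rho_1 = \frac{\sum_{t=2}^{T} (y_t - \bar y)(y_{t-1} - \bar y_{-1}) / (T-2)}{s_y \, s_{y_{-1}}}$
    where $s_y$ and $s_{y_{-1}}$ are the sample standard deviations of $y_t$ and $y_{t-1}$
- However, the sampling distributions of $\hat\rho_1$ are very different when $\rho_1$ is close to 1 than when $\rho_1$ is far from 1
  - When $\rho_1$ is close to 1, $\hat\rho_1$ may have a large downward bias
  - Otherwise, the sample correlation is approximately unbiased and consistent
  - As a rule of thumb, to “counter” this downward bias, the series should be differenced as soon as $\hat\rho_1 > .8$, at worst $\hat\rho_1 > .9$
- When the series has a clear trend
  - it is first de-trended and then $\rho_1$ is estimated
  - Otherwise, $\hat\rho_1$ tends to be over-estimated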
Unit Root Test
- AR(1) model $y_t = \alpha + \rho y_{t-1} + \varepsilon_t$
- Dickey-Fuller (DF) test: $H_0: \rho = 1$ against $H_1: \rho < 1$
  - Subtract $y_{t-1}$ on each side
  - $\Delta y_t = \alpha + \theta y_{t-1} + \varepsilon_t$ with $\theta = \rho - 1$
  - Under $H_0: \theta = 0$ (so $\rho = 1$), $y_{t-1}$ is I(1)
  - So that the associated t-stat in an OLS regression does not converge to a normal
    - but to a Dickey-Fuller distribution
  - We test $\theta = 0$ (so $\rho = 1$) by calculating the usual t-stat
    - but compare it with the tabulated values of the Dickey-Fuller distribution
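The DF regression itself is an ordinary OLS regression; only the reference distribution changes. The sketch below computes the DF t-statistic by hand on two simulated series (a random walk and a stable AR(1) with a hypothetical autocorrelation of 0.5) and compares it with the asymptotic 5% critical value of -2.86 for the case with a constant and no trend. `df_tstat` is an illustrative helper, not a Gretl or library routine.

```python
import numpy as np

def df_tstat(y):
    """t-statistic of theta in  dy_t = alpha + theta * y_{t-1} + e_t  (no lags)."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.column_stack([np.ones_like(ylag), ylag])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ dy
    resid = dy - X @ b
    s2 = resid @ resid / (len(dy) - 2)
    return b[1] / np.sqrt(s2 * XtX_inv[1, 1])

rng = np.random.default_rng(5)
T = 500
e = rng.normal(size=T)
random_walk = np.cumsum(e)                   # rho = 1: unit root
stable = np.zeros(T)
for t in range(1, T):                        # rho = 0.5: stable AR(1)
    stable[t] = 0.5 * stable[t - 1] + e[t]

DF_CRIT_5PCT = -2.86    # asymptotic 5% critical value, constant and no trend
t_rw, t_stable = df_tstat(random_walk), df_tstat(stable)
print(t_rw, t_stable)   # reject the unit root when the statistic is below DF_CRIT_5PCT
```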
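Augmented DF Test
- Same test as DF for $\rho = 1$ but in the model
  - $\Delta y_t = \alpha + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \gamma_2 \Delta y_{t-2} + \dots + \gamma_p \Delta y_{t-p} + \varepsilon_t$
  - This is the one most often used: “ADF” test
- The test can be specified
  - Without constant: $\Delta y_t = \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + \varepsilon_t$
  - With a trend: $\Delta y_t = \alpha + \beta t + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + \varepsilon_t$
  - How to choose?
    - $\alpha = 0$ and $\beta = 0$: “pure” random walk
    - $\alpha \neq 0$ and $\beta = 0$: random walk with drift
    - $\alpha \neq 0$ and $\beta \neq 0$: random walk with drift and trend
    - These cases are discussed below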
Trend and I(1)
- For series that have clear time trends, the test is
  - $\Delta y_t = \alpha + \beta t + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + \dots + \gamma_p \Delta y_{t-p} + \varepsilon_t$
- A trend-stationary process
  - which has a linear trend in its mean but is I(0) about its trend
  - can be mistaken for a unit root process
  - if we do not control for a time trend in the test [Wooldridge [13]]
  - Cf. how a random walk with drift looks like a trended I(0) series
- The usual DF or ADF test on a trending but I(0) series
  - (that is, not including a trend term)
  - has little power for rejecting a unit root
    - power = probability of rejecting the null hypothesis of a unit root when there is not one
    - the trend makes us believe there is a unit root
- BUT, if we include an unneeded trend, we lose power
  - So avoid including the trend whenever possible
Notes on DF
- When we include a time trend in the regression, the critical values of the test change.
- Omitting the intercept $\alpha$ in the DF equation is rarely done because of the biases induced if $\alpha \neq 0$
- Allowing for more complicated time trends, such as quadratic ones, is possible but also seldom done.
How many lags?
- The inclusion of the lagged changes is intended to “clean up” serial correlation in $\Delta y_t$
- The more lags,
  - the more initial observations we lose
  - the smaller the power of the test
- Too few lags,
  - the size of the test will be incorrect, even asymptotically,
    - size = probability of rejecting the null hypothesis of a unit root when there is one
  - because the validity of the DF critical values relies on the dynamics being completely modeled
- Often,
  - annual data: one or two lags usually suffice [Wooldridge [13]]
  - monthly data: 12 lags may be used
  - large sample size: you may experiment
One application of the DF test
- $r3_t$: (annualised) interest rate (or yield) on 3-month treasury bills
  - “Bond equivalent yields”, in the financial pages
- In Gretl, data in INTQRT.gdt, using Wooldridge [13]
  - Change the structure of the dataset: monthly, initial date unknown
- Estimate $\Delta y_t = \alpha + \theta y_{t-1} + \varepsilon_t$
  - OLS of cr3 on 0 (the constant) and r3_1
  - The coefficient of r3_1 is −0.0907, so $\hat\rho = 0.9093$
  - The t-stat of r3_1 is −2.47, but it does not follow a t distribution
- On the r3 variable
  - Menu “variable” → “unit-root test” → “Augmented...”
  - No lag (so: simple DF test), with constant, without trend
  - This produces the same results as the regression, with a correct p-value of .12, so we do not reject $H_0$: there is a unit root
Motivation & Definition
- Taking first differences of I(1) series before regressing them is a “safe strategy”
  - but limits the analysis to short-term relations
    - That is: one-period changes explained by one-period changes
  - Cointegration may give back their meaning to regressions between I(1) series in levels (or logs)
- If $\{y_t\}$ and $\{x_t\}$ are I(1), then in general $y_t - \beta x_t$ is I(1) $\forall \beta$
  - However, it is possible, for some $\beta \neq 0$, for $y_t - \beta x_t$ to be
    - I(0): asymptotically uncorrelated with its own past
    - Stationary: constant expectation & variance
  - When such a $\beta$ exists, we say that $\{y_t\}$ and $\{x_t\}$ are cointegrated
    - $\beta$ is the cointegration parameter
- $\{y_t\}$ and $\{x_t\}$ cannot move far apart from each other in the long run
Example: Treasury bills Interest Rates
- $r6_t$: (annualised) interest rate series of the 6-month treasury bill (T-bill)
  - $r3_t$: idem but 3-month
- Data in INTQRT.gdt from Wooldridge [13]
  - We saw earlier that $r3_t$ had a unit root
  - That is also true of $r6_t$
- Let $Spr_t = r6_t - r3_t$ (spr for spread)
  - $\beta = 1$: we know the cointegration parameter
  - Test whether Spr has a unit root
    - DF stat −7.71 with a corresponding near-zero p-value
    - thus we reject $H_0$: spr has a unit root
    - so $r6_t$ and $r3_t$ are cointegrated with parameter 1
- Interpretation: if the rates moved apart, one of the two would become a relatively more attractive investment than the other
  - therefore, investors would pay more for it, and its price would rise
  - since the interest rate is the return of the bond divided by its price, it would decrease automatically
Cointegration test
- When we know the value of the cointegration coefficient $\beta$
  - then we test whether $y_t - \beta x_t$ has a unit root: DF or ADF
- Usually, we do not know $\beta$
  - If $y_t$ and $x_t$ are cointegrated
    - OLS is consistent for $\beta$ in $y_t = \alpha + \beta x_t + u_t$
    - otherwise, OLS yields spurious results and $\beta$ is falsely significant
  - Engle-Granger test = Dickey-Fuller on $\hat u_t = y_t - \hat\alpha - \hat\beta x_t$
    - Regress $\Delta \hat u_t$ on $\hat u_{t-1}$ with a constant, without lag: $\Delta \hat u_t = \delta + \gamma \hat u_{t-1} + \xi_t$
    - If $\hat u_{t-1}$ is significant (the unit root is rejected), then $\hat u_t$ is I(0)
    - Then $y_t$ and $x_t$ are cointegrated
    - Again, the test uses a special distribution, not a t
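The two steps of the Engle-Granger procedure can be sketched on simulated data. The example below is an illustration only: the series are generated with hypothetical parameters, `engle_granger_tstat` is an illustrative helper rather than a Gretl or library routine, and the statistic is compared with the asymptotic 5% critical value of -3.34 for one regressor, with a constant and no trend. One series is built to be cointegrated with $x$, the other is an independent random walk.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, conventional t-statistics and residuals."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, b / np.sqrt(s2 * np.diag(XtX_inv)), resid

def engle_granger_tstat(y, x):
    """Step 1: regress y on (1, x). Step 2: DF regression on the residuals, no lags."""
    _, _, u = ols(np.column_stack([np.ones_like(x), x]), y)
    du, ulag = np.diff(u), u[:-1]
    _, tstats, _ = ols(np.column_stack([np.ones_like(ulag), ulag]), du)
    return tstats[1]

rng = np.random.default_rng(6)
T = 500
x = np.cumsum(rng.normal(size=T))                 # I(1) regressor
y_coint = 1.0 + 2.0 * x + rng.normal(size=T)      # cointegrated with x (beta = 2)
y_indep = np.cumsum(rng.normal(size=T))           # independent random walk

EG_CRIT_5PCT = -3.34   # asymptotic 5% value: one regressor, constant, no trend
t_coint = engle_granger_tstat(y_coint, x)
t_indep = engle_granger_tstat(y_indep, x)
print(t_coint, t_indep)   # cointegration is accepted when the statistic is below EG_CRIT_5PCT
```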
Engle-Granger Test
- If the lag order, k, is greater than 0,
  - then k lags of the dependent variable are included on the right-hand side of the test regression
  - Gretl allows "test down from maximum lag"
    - From a selected lag order taken as a maximum,
    - the actual lag order used is obtained by testing down
    - AIC can be used to compare the different lag levels
- If $y_t$ or $x_t$ has a trend, it must be modeled
  - See Wooldridge 2012 p648 [13]
  - Where the trend is improperly called a drift
Engle and Granger 2003 Nobel Prize in Economics
- Robert F. Engle: “for methods of analyzing economic time series with time-varying volatility (ARCH)”
- Clive Granger: “for methods of analyzing economic time series with common trends (cointegration)”
Example: cointegration between fertility and fiscality
- In the USA, the “personal exemption” is a tax break on household income
  - In particular, the more children the household has, the bigger the tax break
  - The amount is relatively small, but changes arbitrarily through time
  - One can then imagine testing a link between the exemption and the number of births
Example: cointegration between fertility and fiscality
- Data in Gretl: Fertil3.gdt from Wooldridge [13]
  - Modify the dataset structure to a time-series, annual, beginning 19??
  - gfr: births per 1000 women aged 15-44
    - DF: p-value .80, so we do not reject $H_0$: unit root
  - pe: “personal exemption”, in real dollars
    - DF: p-value .45, so we do not reject $H_0$: unit root
- Regressions
  - In levels: $gfr_t = \alpha + \beta pe_t + u_t$
  - In first differences: $\Delta gfr_t = \alpha + \beta \Delta pe_t + \Delta u_t$
gfr and pe
Entries are coefficient (p-value).

Dependent variable: gfr_t (levels)
Regressor       (1)            (2)            (3)
Cst             99.4 (0)       92.9 (0)       108.6 (0)
pe_t            .05 (.40)      -.06 (.36)     .03 (.66)
pe_{t-1}                       -.02 (.83)     -.04 (.72)
pe_{t-2}                       .11 (.07)      .13 (.11)
pe_{t-3}                       -.005 (.93)    -.01 (.88)
pe_{t-4}                       .08 (.16)      .02 (.04)
Pill (63)       -27.8 (0)      -30.9 (0)      .38 (.97)
t                                             -1.17 (0)
DW              .12            .17            .25
T               72             68             68

Dependent variable: Δgfr_t (first differences)
Regressor       (1)            (2)            (3)
Cst             -.08 (.92)     -.32 (.68)     -3.45 (0)
Δpe_t           -.05 (.27)     -.05 (.17)     -.05 (.19)
Δpe_{t-1}                      -.01 (.69)     -.009 (.75)
Δpe_{t-2}                      .09 (0)        .09 (0)
Δpe_{t-3}                      .04 (.17)      .04 (.15)
Δpe_{t-4}                      -.04 (.04)     -.36 (.05)
Pill (63)       -2.23 (.07)    -1.78 (.14)    -5.43 (.005)
t                                             .11 (.01)
DW              1.44           1.34           1.57
T               71             67             67

The differences between the models in levels and in first differences suggest testing for cointegration, because if the series are not cointegrated, the regressions in levels are spurious.
gfr and pe
- Cointegration test
  - Gretl: “model” → “Time Series” → “Coint Test” → “Engle-Granger”
  - Variables: gfr and pe, without lag since we test $\Delta \hat u_t = \alpha + \beta \hat u_{t-1} + \varepsilon_t$
  - Complete output
    - DF for gfr and pe: each is I(1)
    - OLS of gfr on pe
    - OLS residuals: we do not reject $H_0: \beta = 0$
    - So we do not reject $H_0: \rho = 1 + \beta = 1$: the residuals are I(1)
    - Thus gfr and pe are NOT cointegrated
- Control for a possible common trend between gfr and pe
  - Same procedure, but select “constant and trend”
  - Same conclusion
- Thus, the relation in levels is spurious (Pill!)
  - The one in first differences reflects only the short run
Error correction models: definition
I If $y_t$ and $x_t$ are I(1)
  I One can only estimate a model in first differences
  I a “VAR”: Vector Autoregressive Model
  I e.g. $\Delta y_t = a_0 + a_1 \Delta y_{t-1} + g_0 \Delta x_t + g_1 \Delta x_{t-1} + u_t$
I But if $y_t$ and $x_t$ are cointegrated
  I We can introduce additional I(0) variables
  I Let $s_t = y_t - b x_t$, which is I(0)
  I For simplicity, assume $E(s_t) = 0$
  I In the simplest case, we insert a lag of $s_t$
  I $\Delta y_t = a_0 + a_1 \Delta y_{t-1} + g_0 \Delta x_t + g_1 \Delta x_{t-1} + d\,s_{t-1} + u_t$
  I The $d\,s_{t-1}$ term is called error correction
  I As is the whole model
Discussion
I An error correction model (ECM) allows us to analyse the short-run dynamics between $y_t$ and $x_t$
  I Usually, $b$ has to be estimated
  I OLS is consistent under cointegration
  I There are other models (Leads and Lags)
I For simplicity, a model without lags of $\Delta y_t$ or $\Delta x_t$
  I $\Delta y_t = a_0 + g_0 \Delta x_t + d\,s_{t-1} + u_t$
  I $\Delta y_t = a_0 + g_0 \Delta x_t + d\,(y_{t-1} - b x_{t-1}) + u_t$
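Since $b$ usually has to be estimated, a common route is two-step: estimate $b$ by OLS in levels, then plug the lagged residual into the equation in first differences. The sketch below illustrates this on data simulated from the simple ECM above. The parameter values, the seed and the helper `ols` (normal equations solved by Gauss-Jordan elimination) are assumptions made for the example, not part of the course material.

```python
import random


def ols(columns, y):
    """OLS coefficients of y on the listed regressor columns (normal equations)."""
    k = len(columns)
    # Augmented system [X'X | X'y]
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(a * v for a, v in zip(columns[i], y))] for i in range(k)]
    for c in range(k):                       # Gauss-Jordan with partial pivoting
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Assumed "true" parameters of the simulated ECM
A0, G0, D, B = 0.0, 0.3, -0.4, 2.0
rng = random.Random(7)      # arbitrary seed
T = 500
x, y = [0.0], [0.0]
for _ in range(1, T):
    dx_t = rng.gauss(0, 1)
    dy_t = A0 + G0 * dx_t + D * (y[-1] - B * x[-1]) + rng.gauss(0, 1)
    x.append(x[-1] + dx_t)
    y.append(y[-1] + dy_t)

ones = [1.0] * T
# Step 1: cointegrating regression in levels gives b_hat and the residual s_t
c_hat, b_hat = ols([ones, x], y)
s = [yt - c_hat - b_hat * xt for yt, xt in zip(y, x)]
# Step 2: regress Delta y_t on a constant, Delta x_t and s_{t-1}
dy = [y[t] - y[t - 1] for t in range(1, T)]
dx = [x[t] - x[t - 1] for t in range(1, T)]
a0_hat, g0_hat, d_hat = ols([ones[1:], dx, s[:-1]], dy)
print(f"b_hat = {b_hat:.3f} (true {B}), g0_hat = {g0_hat:.3f}, d_hat = {d_hat:.3f}")
```

With cointegrated series, the estimate of $d$ should come out negative and the estimate of $b$ close to its true value.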
Discussion
I Then it should be that $d < 0$
  I If $y_{t-1} - b x_{t-1} > 0$ then $y$ has overshot the equilibrium in $t-1$
  I Cointegration imposes that we return to the equilibrium
  I Since $d < 0$, the error correction tends to reduce $y_t$
  I Which brings us back to the equilibrium
  I Likewise when $y_{t-1} - b x_{t-1} < 0$
I However, an ECM can also be seen as a framework for estimating a cointegration relation
  I In which short-run terms in $\Delta y_t$ or $\Delta x_t$ are introduced to reduce the unexplained noise
  I That is, $y_t = p_0 + p_1 \Delta x_t + p_2 \Delta y_t + p_3 x_t + \xi_t$, where $\xi_t$ is the error term
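The sign argument for $d$ can be checked on a small deterministic example. The sketch below iterates $\Delta y_t = d\,(y_{t-1} - b x_{t-1})$ with $x$ held fixed and no shocks, so that the gap $y_t - b x_t$ is multiplied by $(1 + d)$ each period. The function name `gap_path` and the numbers are assumptions chosen for illustration.

```python
def gap_path(y0, x, b, d, steps):
    """Gap y_t - b*x over time when Delta y_t = d*(y_{t-1} - b*x), x fixed, no shocks."""
    y = y0
    gaps = [y - b * x]
    for _ in range(steps):
        y = y + d * (y - b * x)      # error correction applied to last period's gap
        gaps.append(y - b * x)
    return gaps


# y starts 4 units above the long-run value b*x = 10
correcting = gap_path(y0=14.0, x=5.0, b=2.0, d=-0.5, steps=4)
diverging = gap_path(y0=14.0, x=5.0, b=2.0, d=0.5, steps=4)
print(correcting)   # [4.0, 2.0, 1.0, 0.5, 0.25]
print(diverging)    # [4.0, 6.0, 9.0, 13.5, 20.25]
```

With $d = -0.5$ half of the gap is closed each period; with a positive $d$ the gap grows, which is incompatible with cointegration.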
Vector Error Correction Models
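I Consider an n-variate process of order p
  I $y_t = (y_{1t}, \dots, y_{nt})'$, that is, n endogenous variables
  I $y_t = \mu_t + A_1 y_{t-1} + \dots + A_p y_{t-p} + e_t$
  I In real life, we don’t know p
  I $\mu_t$ may include exogenous variables
I Rewrite
  I tautology: $y_{t-s} = y_{t-1} - (\Delta y_{t-1} + \Delta y_{t-2} + \dots + \Delta y_{t-s+1})$
  I so $\Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{s=1}^{p-1} \Gamma_s \Delta y_{t-s} + e_t$
  I with $\Pi = \sum_{s=1}^{p} A_s - I$ and $\Gamma_s = -\sum_{h=s+1}^{p} A_h$
  I called the VECM representation of $y_t$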
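Because the rewriting is a tautology, the VAR and its VECM representation must generate exactly the same path. The sketch below checks this numerically for $n = 2$ and $p = 2$, where $\Pi = A_1 + A_2 - I$ and $\Gamma_1 = -A_2$, with $\mu_t = 0$. The matrices, starting values and seed are arbitrary assumptions for the example.

```python
import random


def matvec(m, v):
    """Matrix-vector product for small list-based matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def add(*vectors):
    """Element-wise sum of any number of vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]


A1 = [[0.5, 0.1], [0.2, 0.3]]
A2 = [[0.2, 0.0], [0.1, 0.3]]
I2 = [[1.0, 0.0], [0.0, 1.0]]
Pi = [[A1[i][j] + A2[i][j] - I2[i][j] for j in range(2)] for i in range(2)]
Gamma1 = [[-A2[i][j] for j in range(2)] for i in range(2)]

rng = random.Random(3)                  # arbitrary seed
y_var = [[1.0, -1.0], [0.5, 0.0]]       # y_0 and y_1, shared starting values
y_vecm = [list(v) for v in y_var]
for _ in range(2, 50):
    e = [rng.gauss(0, 1), rng.gauss(0, 1)]          # same shock fed to both forms
    # VAR form: y_t = A1 y_{t-1} + A2 y_{t-2} + e_t
    y_var.append(add(matvec(A1, y_var[-1]), matvec(A2, y_var[-2]), e))
    # VECM form: Delta y_t = Pi y_{t-1} + Gamma1 Delta y_{t-1} + e_t
    dy_lag = sub(y_vecm[-1], y_vecm[-2])
    dy = add(matvec(Pi, y_vecm[-1]), matvec(Gamma1, dy_lag), e)
    y_vecm.append(add(y_vecm[-1], dy))

max_gap = max(abs(a - b) for u, v in zip(y_var, y_vecm) for a, b in zip(u, v))
print(f"largest difference between the two paths: {max_gap:.2e}")
```

The difference is zero up to floating-point rounding, which confirms that the VECM is only a reparametrisation of the VAR.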
Vector Error Correction Models
I The important things are
  I The term in $y_{t-1}$ looks like the expression for the Engle-Granger test
  I Plus terms $\Delta y_{t-s}$ that look like error corrections
I Interpretation of $\Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{s=1}^{p-1} \Gamma_s \Delta y_{t-s} + e_t$
  I depends on the rank of $\Pi$, called $r$
  I If $r = 0$: all the elements of $y_t$ are I(1)
    I and not cointegrated
  I If $r = n$: all the elements of $y_t$ are I(0)
  I $s$ is the lag order of the VECM
    I In Gretl, $s$ is the chosen lag order minus 1, because Gretl first computes a VAR of that lag order, while the VECM is in first differences, so one lag order less
  I Note $\Delta y_{t-s} = y_{t-s} - y_{t-s-1}$
Cointegration
I Occurs when $0 < r < n$
  I Then $\Pi$ can be written as $a b'$
  I $y_t$ is I(1)
  I but $z_t = a b' y_t$ is I(0)
I For example
  I Assume $b_1 = -1$ and $r = 1$
  I Then $\exists\, b$ such that $z_t = -y_{1t} + b_2 y_{2t} + \dots + b_n y_{nt}$ is I(0)
  I That is, $y_{1t} = b_2 y_{2t} + \dots + b_n y_{nt} - z_t$ is a long-run relation
  I $z_t$ may be non-zero but is stationary
I In practice
  I We do not know $b$
  I We estimate it first and then the rest
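A small numerical example may help to see what $\Pi = a b'$ implies when $r = 1$ and $n = 2$. In the sketch below, $b = (-1, 2)'$ (so the long-run relation is $y_{1t} = 2 y_{2t}$) and $a = (0.4, -0.1)'$; these values are assumptions picked so that $1 + b'a = 0.4$. The code checks that $\Pi$ is singular and iterates $\Delta y_t = \Pi y_{t-1}$ without shocks, so that $z_t = b' y_t$ shrinks by the factor $1 + b'a$ each period.

```python
a = [0.4, -0.1]     # adjustment coefficients (assumed values)
b = [-1.0, 2.0]     # cointegrating vector, normalised with b_1 = -1

# Pi = a b' is the outer product of a and b
Pi = [[ai * bj for bj in b] for ai in a]
det = Pi[0][0] * Pi[1][1] - Pi[0][1] * Pi[1][0]     # 0 for a rank-one 2x2 matrix


def z_of(y):
    """Deviation from the long-run relation: z = b'y = -y1 + 2*y2."""
    return b[0] * y[0] + b[1] * y[1]


y = [3.0, 4.0]                  # starting point off the long-run relation
z_path = [z_of(y)]
for _ in range(5):
    dy = [Pi[0][0] * y[0] + Pi[0][1] * y[1],
          Pi[1][0] * y[0] + Pi[1][1] * y[1]]        # Delta y_t = Pi y_{t-1}
    y = [y[0] + dy[0], y[1] + dy[1]]
    z_path.append(z_of(y))

print("det(Pi) =", det)
print("z path  =", [round(z, 4) for z in z_path])
print("final y =", [round(v, 4) for v in y])
```

The deviation $z_t$ dies out geometrically while $y$ settles on a point satisfying $y_1 = 2 y_2$; with shocks added, $y_t$ would wander as an I(1) process but $z_t$ would remain stationary.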
Johansen Test of Cointegration
I Works by computing the eigenvalues of a matrix closely related to $\Pi$
  I $\lambda$ is the vector of (real) eigenvalues of $\Pi$ if $\det(\Pi - \lambda I) = 0$
  I So that $\Pi \nu = \lambda \nu$ has a non-zero solution $\nu$
  I We can guess the relation with the VECM representation
I Count the number of eigenvalues different from zero
  I If all are significantly $\neq 0$
    I then all the processes are I(0) (stationary)
  I If there is at least one zero eigenvalue, but not all are zero
    I then $y_t$ is I(1)
    I but some linear combination $b' y_t$ is stationary
  I If no eigenvalues are significantly $\neq 0$
    I then $y_t$ is I(1)
    I and so is any linear combination $b' y_t$
    I SO: no cointegration
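The counting rule can be illustrated on $2 \times 2$ matrices whose eigenvalues follow from $\det(\Pi - \lambda I) = 0$ in closed form. The sketch below is not the Johansen test itself: the actual procedure works on a related matrix estimated from the data and uses test statistics to decide which eigenvalues differ significantly from zero. Here $\Pi$ is taken as known, and the function names, tolerance and example matrices are assumptions for the illustration.

```python
import math


def eig2(m):
    """Real eigenvalues of a 2x2 matrix, solving det(m - lambda*I) = 0."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = tr * tr - 4 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: not covered by this sketch")
    root = math.sqrt(disc)
    return [(tr - root) / 2, (tr + root) / 2]


def classify(pi, tol=1e-8):
    """Apply the counting rule to a known 2x2 matrix Pi."""
    nonzero = sum(abs(lam) > tol for lam in eig2(pi))
    if nonzero == 2:
        return "all series I(0)"
    if nonzero == 0:
        return "I(1), no cointegration"
    return "I(1), cointegrated"


full_rank = [[-0.5, 0.1], [0.2, -0.6]]      # eigenvalues about -0.7 and -0.4
rank_one = [[-0.4, 0.8], [0.1, -0.2]]       # a b' with a = (0.4, -0.1), b = (-1, 2)
zero = [[0.0, 0.0], [0.0, 0.0]]

for name, pi in (("full rank", full_rank), ("rank one", rank_one), ("zero", zero)):
    print(f"{name}: {classify(pi)}")
```

In applied work the eigenvalues are estimated, so "zero" always means "not significantly different from zero", which is what the Johansen statistics assess.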