Digitized
by
the
Internet
Archive
in
2011
with
funding
from
Boston
Library
Consortium
Member
Libraries
Mr"
%
working paper
department
of
economics
A
COMHOI KEROE
IH
THE TREATHEHT OF
TRENDING TIME SERIES
by
Danny Quah
Jeffrey
K.
Vooldridge
«La_- * 0,3CV«*hruarv
1988
massachusetts
institute
of
technology
50 memorial
drive
Cambridge, mass.
02139
A COHHOI
ERROR
II
TEE TREATMENT OF
TRENDING TIKE SERIES
by
Danny Quah
Jeffrey
K.
Vooldridge
A
Common
Error in theTreatment
ofTrending
Time
Series.by
Danny
Quah
and
JeffreyM.
Wooldridge 'February
1988.*
Both
are at theDepartment
ofEconomics
and
the Statistics Center,MIT.
Quah
is also affiliated withthe
NBER.
We
thank
Olivier Blanchard, Francis Diebold, Stanley Fischer, N.Gregory
Mankiw
and Angelo
Melino
forhelpfulcomments
on an earlier draft.A
Common
Error in theTreatment
ofTrending
Time
Series.by
Danny
Quah
and JeffreyM.
WooldridgeEconomics
Department,MIT.
February 1988.
Abstract
There
aretwo
common
misconceptionsin the analysisoftrending time series. First, ifa series isdifference-stationary,
removing
a linear time trend introducesspurious cyclically. Second,regard-lessof
whether
aseriesisdifference-stationary or trend-stationary, takingErst differencesproduces,in either case, a covariance stationary sequence,
and
bo isrecommended
econometricpractice.We
show
that the Bret statementisincorrectand
that thesecond can bemisleading.1.
Introduction.
A
number
ofrecentpapers haverecommended
taking first-differencesofobserved time series prior to econo-metric analysis (see forexample
Campbell and
Mankiw
(1988)and
others).The
reasoning is as follows. Ifa series is truly difference-stationary, then
removing
a lineartime-trend produces spurious cyclicality in theresiduals.
Under
thesame
condition, taking first differences produces a series that is covariance stationary,and
so is convenient for econometric analysis. If, on the other hand, the series is truly trend-stationary,taking first-differences nevertheless produces a covariance stationary series, albeit one with a zero in the spectral density at frequency zero. This isstill satisfactory
however
(the reasoning goes), as the unit root inthe
moving
average partproduced by
over-differencingwillmanifest in the finalestimates. Thus, ifone is toremain
agnostic as to the cyclicality ofthe observed time series, therecommended
practice is alwaysto takefirst-differences prior toeconometricanalysis.
Put
another way,therecommended
practice is not to detrendfor detrendingleads tospurious "cyclical* behaviorintheresiduals. This view has
become
quitewidelyheldby
many
macroeconomists. See forexample
Campbell
(1987),Deaton
(1986), Nelson (1987), Nelsonand
Kang
(1981, 1984),Mankiw
and
Shapiro (1985),Romer
(1987),and
Shapiro (1986)among
others.Nelson
and
Kang
(1981)and
Nelsonand
Plosser (1982) haveforcefullyargued that leastsquares2
OLS
estimators ofthe intercept and timetrend coefficientsconvergetotheir probability limits quiterapidly.Thus
there isgood
reason to believe that in this case, the detrended data appropriately reveal the trueunderlying
dynamics
about trend. In addition, Durlaufand
Phillips (1986) haveshown
thatwhen
the dataare difference-stationary, the
OLS
estimatorfor the time trend coefficient, while converging in probability,does so
much
more
slowlyrelative to that in the trend-stationary case. Further, theOLS
estimator for the intercept in this case actually diverges.Thus
is is not surprising that the "spurious cyclically" view is so prevalent.i
This
paper
demonstrates that this reasoning is fundamentally incorrect. First,we
demonstrate
thatremoving
a linear time trend actually preserves the true stochastic characteristics of the data.We
show
that data detrended
by
leastsquares regressionasymptotically provide the correct picture of the underlying dynamics,independent
ofwhether
the data are truJy trend- or difference-stationary.More
precisely,we
establish
two
results here: First, ifthe dataare trend-stationary, the covariogram estimator using detrended datais asymptoticallyindistinguishablefrom
that using the true unobserved fluctuationsabout
trend.Con-sequently, the
same
is true of the correlogram estimator in this case. Second,suppose
instead the data are difference-stationary,and
a researcher detrends the databy
least squares.We
show
then that thecorrelo-gram
estimator at each lag converges in probability to 1,which
correctly indicates the presence of a unit root. In other words, the researcher will appropriately conclude that the residuals are a unit root process,and
arenot cyclicalabout
trend.These
statementshowever
are asymptotic; itmay
still be the case that in finite samples,measures
ofdynamic
correlationsubsequent toleast squares detrending are severely misleading.To
analyze this,we
takeas starting point the findings of Kelson
and
Kang
(1981) on the finitesample
biasdue
to using detrendedresiduals
when
thedataare truly difference stationary.We
show
thatusingdetrendedresiduals, the exactbiasinthe covariogram estimatoris ofapproximately the
same
magnitude, regardless ofwhether
the underlying process is trend-stationary or difference-stationary.With
finite samples,any
reasonable transformation ofthe estimated covariogram, such as theestimatorforthe spectral density, will
have approximately
thesame
bias properties, againregardless ofwhether
the data are truly trend- or difference-stationary.Thus
the bias3
detrend trend-stationary data.
We
conclude that emphasizing biasdue todetrending a difference stationary processismisguided: there isasignificantfinitesample bias associatedwithleast squaresdetrending, period. Clearly,ifthe dataare difference-stationary, it isa misspecification toestimatea time trend,and
thentointerpret the residuals as being close to a stationary process.
We
agree that estimating a correctly specifiedmodel
isbetterthan estimating an incorrectlyspecifiedmodel. However, unlessitsproponents wish toextendthe case that least squares detrending produces misleading resultsto
when
the data are trend-stationary as well,we
do
not find convincing theargument
thatdetrending produces spurious cyclically.Next,
we show
that taking first-differences is treacherous for subsequent econometric analysis, shouldthe true data generating process actually betrend-stationary.
We
illustrate thisfor three different statisticalprocedures: estimating the
mean
ofsuch a transformed process, estimating themoving
average coefficient,and
finally, estimating causality patternswhen
one ofthe variablesis suchan over-differenced process.In sum, our analysis almost exactly overturns the conclusions indicated in the introductory paragraph.
We
contend with the suggestion thatmacroeconomists
should alwaysuse first-differenced series so asnot topre-judge the cyclicallyin the data:
we
strongly disagree with this view.The
remainder
of the paper is organized as follows. Section 2 considers the effects ofremoving
a fixedbut arbitrary linear time trend
from
a difference-stationary process.The
conclusion here is obvious: the deviationsfrom any
fixed linear time trend remain difference-stationary. Section 3 considers the effects of first-differencinga trend-stationary process.The
resultshere arelessbenign: thisintroduction ofa stochastic singularity manifests in the spectral densityvanishing at frequency zero. This isshown
to cause problemsfor a
number
of inferential questions of interest.In Sections 2
and
3,we
treat the artificial but usefully intuitive casewhere by
detrendingwe mean
removal
ofa fixedalthough completelyarbitrarylineartime trend. Section 4 considers the relevant practical situationwhen
detrendingisperformed by
leastsquaresregression.We
show
here that ourearlier conclusions are unmodified: regardless ofwhether
the true data generatingprocess is trend- or difference-stationary, the correlogram estimators using least squares detrended dataare consistent asymptotically,and
are "similarly''4
2.
Difference
Stationary
Data
Generating
Process.
Suppose
that the underlying data generatingmechanism
isdifference-stationary:(i)
Y
t=
fa+
Yt-i+
«t,«>1,
(ii) Yo a given
random
variable,and
(iii) u< covariance stationary with
mean
zeroand
spectral densitybounded
away
from
zero.Iterating
on
(i) ,we
have
that:3=1
A
lineartime trend specification isa pair ofrealnumbers
(qj,0x).Removing
this lineartime trendfrom
Y
tresults in:
X
t= Y
t—
<*i—
0\ •tt
«*.
X
t=
[Y
-
a,)+
(Po-Pi)-t
+ Yl
«i-]=i
The
"detrended residuals"X
t have the integratedform
of a difference-stationary sequence. In particular,X-t can be written as:
(a.)
X
t=
{0o~
0i)+
*t-i
+
«t, *>
1,(i.)
Xq
a givenrandom
variable, Yo—
Qi,(c.) the
same
as (iii) above.Thus, the detrendedresiduals are anotherdifference-stationarysequence,
where
the first-differencesequenceXt
—
Xt-i
has exactly thesame
probability properties asthefirst-difference ofthe originalsequence,Y
t—
Yt-i(except possibly in
mean).
Therefore, unless one supposes that the original series already has a tendency to
show
"spurious cycli-cally,'' there is absolutelyno
reason todraw
that conclusion forthe detrended series.We
returnin Section 4below
to this observation that detrending does not alter thedynamic
properties ofthe data.5
S.
Trend
Stationary
Data
Generating
Process.
Suppose
now
that the true data generatingmechanism
is trend-stationary:(i)
Y
t=
a
+
p
-t+u
u
(ii) u
t covariance stationary with
mean
zeroand
spectral densitybounded
away
from
zero.As
before, a linear time trend is a pair of realnumbers
(ai,5i).Removing
a time trend thereforeproduces:
X\
=Y
t-
ai-h-t
=>
X]
=
(oo-
01)+
(ft,-fa)-t
+
u
t.The
resulting process A',1 isseen to betrend-stationary,with exactly thesame
dynamic
stochastic propertiesas in the original data
Y
(in u). In the special case,where
qjand
ft are equal toqo and
ft), tn« result is covariance stationarity,which
is simply a special case of trend-stationarity.Comparing
this with theconclusion
from
Section 2,we
note:Removing
a fixed but arbitrary linear time trend preserves the true properties of the data, reg-ardJesso/wietiiertJjeunderlyingmodel
isdifferencestationary or trendstationary.Hence
linear de-trending actually has the very desirable property that it does not distort the true featuresofthe data, contrary to
numerous
assertions that the opposite is true.Next
consider taking first differences of trend-stationary data:A
t2
=
r
1
-r
t_1=
^o+
(u1-u
1_1 ).Since
u
t is covariance stationary,X?
is also covariance stationary, although it has a zero in its spectral density at frequency zero. This last characteristic has been notedby
macroeconomists, but its significancehas always been relegated tothe statement that "it will
show up
asa unit root in themoving
averagepart."We
now
pointout that instead its presence rendersX
7 treacherous to analyze econometrically.Inference
about
X?
requires that as a stochastic process,X
2 produces statistics thatadmit
nonde-generate asymptotic distributions.Then
the researcher can be confident that a givensample
is actuallyinformativeon a hypothesis ofinterest.
We
demonstrate thatX
7 beingan
over-differenced sequence in factdoes not have desirable properties. First,
we show
that in the leading case of expectation estimation, such6
In particular, it produces a statistic that is
tantamount
to using only the firstand
last data points ines-timation. Second,
we show
that the nonlinear least squares estimator for themoving
average coefficient inan over-differenced
X
2 has thesame
asymptotic properties as that for the autoregressive coefficient in thelevels of a difference stationary process. Thus, if
macroeconomists
are using first-differenced data because theydo
not wish tousethenonstandard
distribution theoryassociatedwith unitrootprocesses, they shouldrecognize that they are faced with precisely the
same
problem
when
they first-difference a process that istruly trend-stationary. Third,
we show
that usingan
over-differenced sequenceA
-2 in a bivariate causalitytest will
produce
spurious evidence of causality,when
in fact the data are actually notGranger
causally related. This last pointmay
befound
inSims
(1972) aswell, although itdoes notappear
tobe
well-known.To
begin, considertheproblem
ofestimating themean
ofX
7. For asample
ofsize n, thesample
mean
is:
n"
1J2
*?
=
Po+
n~
l(u„-
u
).t=i
Notice that only
two
(covariance stationary)random
variables u„and
u
enter the calculation of thesample
mean
statistic.Thus
thesample
mean
converges to po, but at a rate that does not allow anondegenerate
asymptoticdistributionfor(any normalizedversionof) thestatistic. Inwords, after first-differencinga
trend-stationary sequence, the
sample
mean
and
associated statistics are econometrically useless.Econometric
techniques that rely in part on a centrallimit propertyfor the
sample
mean
will consequently not allowany
validstatisticalinference. Thus,we
seethat takingfirst differences,when
the true data generating processisactuallytrendstationary; will
produce
datathatmay
not beparticularlyinformativeforstatistical inference.For
our second point, suppose for simplicity that /? is 0,and
that uq is 0. Consider estimating themodel:
X?
=
u
t --yout-i,r=l,
2,...«o
=
0, 7o=
I-For
parameter
7, define the residual function:Rod)
=
o,Rth)
=
X?
+
7^-1(7),
t-1,2,....the problem:
min^i
JZt(7
) 2.
i <-* 2
To
obtain theasymptoticpropertiesofsuchanestimator, itisconvenienttostudy theasymptoticdistribution of the score. For the t—
th term, the score is st{i)=
Rt{l)'
(& R*[~l)Id7).
But from
above, therandom
sequence
8R
t(^)/d'y at the true parameter value 70=
1 is itself a unit root process, with first differenceequaltoU(_i.
Then
itfollowsfromtheinitial conditiondRo(l)/di
=
0,that the score at the trueparameter
fo behavesas: i
*i(l)
=
ux0=
««(l)
=
u
t-^u
yt>2.
y=i
Notice that the score is the product ofa covariance stationary sequence
u
t with its accumulation 5H/»i UJ-Clearly, the resulting processwill not have the usual central limit properties; thus neither will the nonlinear
least squares estimator of the coefficient. Moreover, the hessian itself can be seen to have
nonstandard
asymptotic properties. In fact, the score here has exactly the
same
form
as the score in least squares estimation of the auto-regressive coefficient in a difference-stationary process.1Thus
(the score for) the
nonlinear leastsquares estimatorofthe
moving
averagecoefficient in an "over-differenced process" converges not to anormal
random
variable, but instead to now-familiar functional ofBrownian
motion. If the datawere
truly difference stationaryto begin, thenofcourse the usual asymptotic normality theorywould
apply.The
distribution theory for the non-standard (first-differenced) case isnow
known,
so it is not that aprocedure taking this into account is completely intractable.
(We
have not seen thatany
economist usingfirst-differenced data has actually used this however.)
Our
point instead is that the distribution theory thatapplies to first-differenced data is different across trend stationary
and
difference stationary models. Thus,the unifying
framework
that authors such asCampbell and
Mankiw
(1988)seem
to suggest for the use offirst-differenced data is simply non-existent.
Finally,
we
turn to the use of over-differenced processes inGranger
causality testswhen
the true laglengthis
unknown
and
instead ismade
a functionofsample size. This last conditionmay
seem
unusual, but 1This is related to a familiar result
from
Lagrange
Multiplier theory for standard models: for instance,Godfrey
(1978)shows
thatLagrange
Multiplier tests against stationary autoregressiveand moving
averagealternatives are identical
when
the null is white noise; theLagrange
Multiplier of course is just the score.That
literature ishowever
not particularly concerned with unit root models.8
the reader should
remember
that applied researchers oftendo
have to decide on a lag-length specification intime
domain
work.They
do
this basedon observed data (and henceitssample
size),and
ofcourse the "true"lag length isnever
known
forcertain. Considertwo
random
sequencesZ
and Y,
inwhose
causality relationswe
are interested.We
will use the notationX
2 toindicate avariable that istheoutcome
offirst-differencingY,
which
is already stationary: thusX
2 has a zero in its spectral density at frequency zero.Suppose
thatZ
is a covariance stationary process with spectral densitybounded
away
from
zero.From
Sims'sTheorem
2, the causality relations
between
Z
and
Y
can be studiedby
considering the two-sided projection of onevariable on current, lagged
and
future values of the other. Forsimplicityassume
thatY
isjust white noise,and
thatZ
and
Y
are in fact uncorrected at all leadsand
lags.Thus
neitherZ
norY
Granger
cause the other: the two-sided projections are identically leroand
thus are trivially one-sided.A
researcher usingZ
and
Y
will necessarily discoverthis for sufficiently largesample
sizes.However
consider instead the two-sidedprojection ofZ
on
X
2: OO
Z
t=
Y,
H3)X?-i
+
Vu
EvtX
2 .,-=
0, for allj. y=-ooThe
true projection coefficients b {j) again are zero. Equivalently, the true (optimal)R
2 is zero,and
leastsquares estimation will
attempt
to achieve this. Letn
denotesample
size,and
call the sequence of fittedprojections bn(j), j
= —
oo,.. .,oo,
n
>
1. CallSx{v)
the spectral density ofX
2,and
let b denote thefourier transform of a (square
summable)
sequence b.The
difference inmean
squared errorbetween
fittedprojections (implied by) bn
and
the true best fit (implied by) bo is givenby
Sims'sapproximation
errorformula as:
\d
x
[bn,b )}2
=
±-[
\ln[u)-
toM
S
x
{w)du>Z7T J^T
=
E
£
bn{j)X
2_
3.Now
consider the following family of candidate fitted projection coefficients: forn
>
1, bn(j]=
n~
\n—
j\for |y|
=
0,1,...,n
and
otherwise. For each n, this is a triangularshaped
two-sided lag distributionsymmetrir.
about
0: the distribution takes the value 1 atj
=
0,and
then declines as a straight line todistributions implies a deterioration in
mean
squared error given by:[d
x
{ba,bo)r=
E
=
E
H
K{j)[ut-j-
Vt-i-l]ij=-oo
bn
{-n)u
t+n+
^2
|fc„(y)-&„(/-
l)]ut_y-fc„(n)u
(_n_1j=-n+l
=
Var(u)£
(Mj)
"Mi-
I))2y=-n+i
=
2n
_1Var(u)-Oasn-oo.
As Bample
size increasestoinfinity, such afamily oflag distributionsfits arbitrarily wellin the senseofmean
squared error.However,
allmembers
of this family are two-sided: thusGranger
causality statisticsproduced
atany sample
sizewould
lead the researcher to conclude incorrectly thatZ
Granger
causesX
2.This rather surprisingresult can be understood in terms ofordinary least squares theory.
The
spectral density of the righthand
side variable in a distributed lag regression corresponds to theX'
X
matrix inordinary least squares models. If the spectral density is zero at
some
frequency, that is equivalent tosingularityin the
X'X.
This can be interpreted tomean
that the population regression isnot identified. In the case ofinterest here, this is exactlywhat
happens.The
lag distribution is not uniquely identified:more
than onedistributionwillfit thedataequallywell. Using first-differenced datatherefore renders particularly subtle the interpretation of
Granger
causality statistics.We
emphasize
that this is fundamentally differentfrom
the usual prefiltering effects. It iseasy toshow
that with covariance stationary data, prefilteringby
arbitrary one-sided filterswhose
fourier transforms arebounded
away
from
zero leaves unaltered patterns ofGranger
causality.The
result here is that prefiltering one of the seriesby
first-differencing does affectGranger
causality relations.Thus, contrary to the sanguine conclusion that over-differencing a trend-stationary process will simply
"show
up
as a unit root in themoving
average part,"we
conclude that thismay
insteadproduce
altogether unreliable econometricresults. Itistruethatin eithercase ofdifferencestationarity ortrendstationarity, the resulting process is always covariancestationary.However,
the resulting process is not necessarilyamenable
to econometricanalysis
when
the datawere trendstationary to begin.We
have presented threeexamples
thatseem
to us especially convincingofthe difficulties associatedwith using "over-differenced" data.At
thesame
10
time, these
examples
are ofparticularinteresttomacroeconomists.The
firstand
thirdexamples
(estimating themean
and
testing forGranger
causality) are related in that they are bothdue
directly to the spectraldensity of
an
over-differenced process vanishing at frequency zero. Information on such a process does notaccumulate
as the observedsample
size increases.Our
secondexample
cautions that first-differencing toproduce
stationarity,no
matterwhether
the originaldata are difference-or trend-stationary, isby no
means
a
panacea
for the econometric difficulties confronting researcherswhen
they analyze trending time Beries.In particular, if
they
wish toavoid the nonstandard distribution theory associated with analyzing differencestationary processes in levels, they should realize that they have simply re-created those difficulties
when
they take first-differences ofa trend-stationary sequence.
4.
The
Effectsof
Least
Squares Detrending.
In the previous
two
sections,we
used the convenient fiction that detrending involvedremoving
a fixedalthough arbitrary time trend.
We
now
show
that appropriate versions of the reasoningabove
apply evenwhen
the detrending line is estimatedby
regression.First,
we
establish that if the data have a unit root, the correlogram estimatedfrom
detrended data converges pointwise at each lag to the correct value of 1. Thismay
initiallyseem
surprising:when
the data generatingprocess has a unit root, theOLS
estimator for the interceptisknown
to diverge (see for instanceDurlauf and
Phillips (1986)).Thus
onemight
conjecture that the detrended residualshave
no
desirableproperties.
However,
recall that theDurbin-Watson
statisticin that case neverthelessconvergesin probabilitytoitscorrect valueofzero (again, see
Durlauf and
Phillips (1986)).On
further thought, thishappens
because theDurbin-
Watson
statistic involves the first difference ofthe detrended data.Thus
the ill-behaved estimatorforthe intercept is simply subtracted out,
and
itslack ofconvergence in probability is inconsequential.Next, the estimator for the time trend coefficient only convergesat rate
yn,
whereas
it gets multipliedby
a variable (time)growing
as n, so that the errorfrom
using estimated rather than true residuals grows with thesample
size.However,
again recall that theDurbin-Watson
statistic involves the first difference of11
isjust constant. This feature controls the rate at which the estimation error grows,
and
in particular, that error actually converges at rate y/n in probability.But
now
we
note that thisreasoningapplies not only atthe first lag, as forthe Durbin-Watson
statistic,butin factappliesatevery fixedlag.
Thus
evenin thedifferencestationary caseit isasymptotically irrelevant thatwe
use estimated rather than true residuals in estimating the correlogram.Forthe trend stationary case, the estimators for the intercept
and
time trend coefficients converge inprobability at rates
n
1/ 2and n
3/2 respectively.Thus
in this case, it is straightforward to establish that the correlogram estimator using estimated residuals converges in probability to the correlogram ofthe true residuals.From
the discussion above, and the formal results below,we
are able to establish that thisstatement extends to difference stationary data as well.
Note
thatby
contrast with trend-stationary data,in the difference-stationary case, theestimatorfor the intercept diverges at rate
n
',
and
the estimator forthe time trend coefficient converges only at the slower rate n1'2
.
Second,
we
turn to behavior in finite samples. Nelsonand
Kang
(1981)have
argued that the use of least squares detrended data,when
in fact the data are difference stationary, results in estimators for thecovariogram
and
the spectral density that are biased towards showing cyclically in finite samples.We
interpret this as saying that incorrect econometric specification willlead to an incorrect conclusion. In our
view, this
argument
carries weight only ifcorrect specification in this context does not lead to thatsame
erroneous conclusion.
Suppose
on the other hand, that the true data generatingmechanism
is stationaryabout
trend.Then,
the "best practice" procedure is to detrendby
least squares: it is irrelevant both asymptoticallyand
in finite sampleswhether
one estimates the residual serial correlation simultaneously with the trend line, or with a two-step procedure by first detrending. In this situation,would
a researcher following "best practice" similarly discoverspurious cyclically?The
answer
isyes.If the true data generating
mechanism
were white noiseabout
deterministic trend, the fitted residualswill necessarily be serially correlated. This follows directly
from
the properties ofBLUS
residuals (see forexample
introductorytextbooks such as Theil (1971)).We
show
below
that for every finite sample, atany
fixed lag, the exact bias in the covariogram estimator is "continuous" in the serial persistence of the data generating
mechanism,
inaneighborhood
containing unit root processes. Looselyspeaking, in finitesamples,12
there iscertainly anonzerobias in theestimated
dynamics
ofan inappropriately detrendedunitroot process;however, this bias is only of the
same
order ofmagnitude
as the biaswhen
trend stationary processes are correctly detrended.One
possibleconclusionfrom
thisisthat themessage
duetoNelsonand
Kang
(1981) applies to persistenttrend-stationary
models
as wellas to difference-stationarymodels: researchers willjust always find spuriouscyclicality, regardless of the true model.
We
thinkthisisalittleextreme: ourpreferred interpretation insteadis that finite
sample
theory does not EUggest that detrending isabad
thing todo. Putting thistogether withourresults fortheasymptotic behaviorofthedetrended residuals,
we
conclude that thereisno
evidence thatdetrending produces misleading results
when
the truemodel
has a unit root.Of
course using the correcteconometric specification is always better than using an incorrect specification, but
we
do
not think that isthe issue here.
Some
economists have suggested to us that "ifyou
just look at the picture of fitting a trend line to a unit root process, you'll see clearlywhy
there is spurious cyclicality; the end-points ofboth the trendand
the unit root process aremade
tocome
close to each other.Thus
the detrended data aremade
to lookcyclical.''
While
there is certainly something tothis graphical intuition, its effectsshow
up nowhere
in ourcalculations below. Consequently,
we
believe this intuition to be incorrect,and
we
do
not find these "look at the picture"-typearguments
persuasive.To
begin the formal analysis,we
impose
some
standard assumptions on the heterogeneityand
serialdependence
permitted in our data. Let{u^}^
be the disturbance identified above,and
define the partialsum
sequence: So=
,S
t=
2J,-=1u
j- ^'eimpose
the following conditions.Assumption
4.1(Regularity):
Let u satisfy:(a.)
£u
t=
for all J ;(b.)
Eu\
=
al
>
forall t;(c.)
sup
t£'|ut| 2+*<
oo forsome
6>
; (d.) a%=
lim^oo
E
(n~
1 S%) exists,<
c%<
00 ;13
(e.) {u,}^.j is strong mixing, with mixing coefficients
a
m
such that:oo
l-2/«
£«*
<
oo.m=l
We
use these conditions as they areby
now
relatively familiar in the literature: they are not the weakestconditions possible to obtain our results.
The
reader is referred to Phillips (1987)and
Phillipsand
Perron(1988) for further discussion of these conditions. Finite order covariance stationary
ARMA
process with Gaussian disturbances (where themoving
average part does not have a unit root) can beshown
to satisfythese assumptions.
We
will use the following result repeatedly in the subsequent discussion:Lemma
4.2(Asymptotic
Distributions):Assume
the conditionsofAssumption
4.1.Then
asn
—
oo:(a-)
n-
1/2X:
t ,LiUt^«ToA'(0,l);
(b.)n-^E.l^^ac/^'Mdr;
(e.)n-
3 /=E
t n =i*«t =* *o(W(l)
-
H
W(r)
dr);where
A'(0,1) is the standard normal,W
is standardBrownian
motion,and
=> denotesweak
convergence.Our
asymptotic results are expressed in the following:Theorem
4.8(Consistency):
Let{Y
t,t=
l,2,...,n} be an observed sample; let Ut(n) be the fittedresiduals
from
anOLS
regression on a constantand
time trend.(a..) IfY-t is trend-stationary; then for each fixed lagj, as
n
—
» oc:1 - 1 "
—
> uj(n)u-_,-(Ti) )u
t(n)ut-j[n)—
in probability.n
-—
' Jn
'—
' t=j+i t=y+i(b.) IfYt is difference stationary, then foreach fixed lagj, asn
—
> oo:E,"=3T
i"'(")"'-;(")in-
u k-r*v^t:
—
r~)—
r^ 1—
m
probability.Et=iUtL"
14
Thus
in either case, detrending does not affect convergence in probability of the correlogram estimator tothe correct values.
While
this result is reassuring,and
initially surprising for the reasons articulated above,it is nevertheless still
an
asymptotic statement. Itmay
be that in finite samples, detrending a unit rootprocess is severely misleading.
We
therefore establish a continuity proposition: Ifthe data are trend-stationary, with the truedistur-bances
about
trend highly correlated, then the finitesample
bias in using detrended data is of thesame
order of
magnitude
as the finitesample
biaswhen
the data are in fact difference stationary.We
redefine themodel
somewhat
for convenience: LetYt
=
Qo
+
A>*+
£«,where
etw
iU either be covariance stationary, or be generated as a unit rootsequence with zero drift.Theorem
4.4 (FiniteSample
Bias): Letu be arandom
variable,and
Jet{u
(}t>1 be uncorreiated withuo
and
satisfyAssumption
4.1; let the disturbance {ft}t>! be generatedby
two alternative
modeb:
(a.) ext
=
AeM
_j+
u
t, t>
1, |A|<
1, £A0=
"o;(b.) e
u
-
ei,t-i+
u
t,t>
1, £10=
u
.Given
asampJe
ofsizen
>
3, Jet ?At("), «it(n
), t=
l,2,...,n, denote the detrended data (fittedresiduals).
For
each j=
0, 1,...,n
—
1, the exact biases in the covariogram estimator are:Bx{j,-n).=
E
(»-j)
l13
ht[n)ix.t-:in)- [n-j)
1]P
eAt(n)e A,t_y(n) t=j+iand
*i(j»
=E
[n-j]
:Yi
eit[n)ei,t-j[n)-
[n-j)
1^
£"(
n
)ei-'-j(n
) t=j+i t=y+iThen
for each£xed n
>
3, foreach fixedj=
0, 1,...,
n
—
1:15
Case
(b.) in theTheorem
specifies a zero drift unit root process;however
since the trueparameter
/5q isarbitrary, our discussion certainly covers unit root processes with nonzero drift.
Theorem
4.4 states the following: the bias that arises from using estimated residuals (detrended byleast squares) is ofthe
same
orderofmagnitude, independentofwhether
the underlying data aredifference-stationary or trend-difference-stationary.
To
summarize
the results of this section, while there are clearly differences in the behavior of leastsquares estimators oftrend coefficientsacross trend-
and
difference-stationary data, the resulting detrended data have similarly revealing properties fortheir trueunderlying dynamics.5.
Conclusion.
The
specific technical contribution of this paper is two-fold: Firstwe
haveshown
that the correlogram estimators usingleastsquares detrended dataareconsistentforthe true valuesofthecorrelogram, regardlessof
whether
the data are actually trend- or difference-stationary. Second,we
haveestablished that the finitesample
bias in "cyclically" that results from detrending difference-stationary data is no worse than that inthe best practice case ofdetrending trend-stationary data.
Our
conclusionsdraw
on both exact as well as asymptotic arguments. This is in keeping with thereasoning that has motivated the erroneous conclusions
we
list in the Introduction.We
observe that quitea
number
ofapplied workers have adopted these incorrect statements in theirown
research.We
emphasizethat:
(i)
removing
arbitrary fixed linear time trends actually preserves the true propertiesofthe timeseries data, regardless ofwhether
the truemodel
is difference stationary or trend stationary,(ii) taking first differences, if the true
model
is in fact trend stationary, produces data that areeconometri-cally useless, and,
(iii) least squares detrending does not disguise the cyclical properties of the data.
In our view, the discussion here overturns the conventional
wisdom
among
many
applied researchers.Our
resultscontradict the observationthat incorrect detrending distorts the statistical propertiesofthedata16
differencing is not a procedure
we
would
recommend
to researchers confronted with trending time series17
References
Campbell, J. Y. (1987):
"Does
Saving Anticipate DecliningLabor
Income?
An
Alternative TestoftliePermanent Income
Hypothesis," Econometrica,November,
55 no. 6, 1249-1274.Campbell, J.Y.
and
N.G.Mankiw
(1988): "AreOutput
Fluctuations Transitory?" Quarterly Journai ofEconomics,
forthcoming.Deaton, A.S. (1986): "Life-Cycle
Models
ofConsumption:
IstheEvidence Consistent with theTheory?"
NBER
Working Paper
No. 1910,Cambridge.
Durlauf, S.N.
and
P.C.B. Phillips (1986): "Trends versusRandom
Walks
inTime
Series Analysis,"Cowles Foundation
DiscussionPaper
no. 788, Yale University.Godfrey, L.G. (1978): "Testing againstGeneralAutoregressive and
Moving
Average
ErrorModels
when
the Regressors includeLagged
Dependent
Variables," Econometrica., 46, 1293-1302.Mankiw, N.G. and
M.D.
Shapiro (1985): "Trends,Random
Walks,and
Tests ofthePermanent Income
Hypothesis," Journal of
Monetary
Economics, September, 16, 165-174.Nelson, C. (1987):
"A
Reappraisal of Recent Tests of thePermanent Income
Hypothesis," Journai ofPolitical
Economy,
95 No. 3, 641-646.Nelson, C.
and
H.Kang
(1981): "Spurious Periodicity in InappropriatelyDetrended
Time
Series,"Econometrica,
May,
49 no.3, 741-751.Nelson, C.
and
H.Kang
(1984): "Pitfalls in theUse
ofTime
asanExplanatory
Variable in Regression,"Journal of Business
and
Economic
Statistics, January, 2, 73-82.Nelson, C.
and
C. Plosser (1982): "Trendsand
Random
Walks
inMacroeconomic
Time
Series," Journal ofMonetary
Economics, 10, 139-162.Phillips, P.C.B. (1987):
"Time
Series Regression withA
Unit Root," Econometrica, 55, 277-301. Phillips, P.C.B.and
P.Perron (1988): "Testing fora UnitRoot
inTime
SeriesRegression," forthcomingBiomet.ri.fca.
Romer,
CD.
(1987):"Changes
in theCyclical BehaviorofIndividual ProductionSeries,"NBER
Work-ing
Paper
No. 2440,Cambridge.
Shapiro,
M.D.
(1986): "Investment, Output,and
the Cost of Capital,"Brooking Papers
onEconomic
Activity, 1, 111-152.Sims, C.A. (1972):
"The
Role ofApproximate
Prior Restrictions in DistributedLag
Estimation," Jour-nal of theAmerican
StatisticalAssociation, 67, 169-175.Theil, H. (1971): Principles ofEconometrics,
New
York:John
Wiley.Appendix.
Proof
of
Lemma
4.2: Part (a..) is the usual centrallimit result, seee.g.White
(1984)Theorem
5.19.The
remainder
is simplyLemma
2.3 in Phillipsand
Perron (1988). Q.E.D.To
prove themain
Theorems,
it is convenient to consideran alternative regression model:Yt
=
0oi+
#02(*—
n
+
1)
+
et= z
te+
£t,where
Zt=
(l t—
^y^). Thisaltersthe originalspecification of #oi to the extentthat #02 isdifferentfrom
zero, but it is the specification that
most
researchers have used, see forexample
Nelsonand
Kang
(1981),Durlauf
and
Phillips (1986), Phillipsand
Perron (1988),and
others. Also, it is extremely convenient,and
the modification is inconsequentialfor studying the regressionresiduals.
The
alternativeassumptions
made
in the textregarding the disturbance e are takento applyto thedisturbancesof this equation,
and
detrended data refers to the fitted residuals of this equation.We
abuse notationand
callprocessescovariancestationaryiftheysatisfyAssumption
4.1. Similarlycallprocesses difference stationary if their first differences satisfy
Assumption
4.1.We
first establish aLemma;
this comprises
known
results, but is easy,and
we
place it herefor completeness.Lemma
A.
Let 6n=
\6n i,6 n 2j denote theOLS
estimatorfor8 .(a.)
When
c is covariancestationary,n
3/2 (en2-
O2) =* 6aW(l)
-
2J
W(r)
dr .(b.)
When
e is difference stationary,n"
l/2 (§nl-
Soi)=>°oJ
W(r)dr,
n
1/2 {en7-
O2 ) => teo 2f
rW[r)
dr-J
W[r)
drProof
ofLemma
A.
By
the usualOLS
formula,L-e
=\Yl
Z' tZ
t JY^
Z'tt Let [Di{n]D
2 (n))=
Rewrite the
above
as:Gn
—
6o=
Di{n)
which
implies that:Di(n)
in
8. (n 1'2n
-3
'2 ) ife is covariancestationary;(n~
3/2n~
hl2 ) ife is difference stationary.^2(n)n
3 -i±D
2{n)n 3 J V / \-n -0;—
i r~\r\
V^ n Ij '1-1 , I }\D
2(n)JZ
=1(t-^)e
t)(a..)
When
e is covariance sta.tiona.ry:D
1(n)J2e
t=n
1^J2e
t^
a M(0,\); t=i !)• Thus:w
E
(*-
^
* -
« _3/2E
*
-r"
1/2X>
=*ff °(*w
-jf
w
n
1/2(Li-8oi) =*a
oJ/(0,l);n
3/2 (*„2-
O2)=
6aW(l)-it
W[r)dr
.(r)dr--W(l]
(b.)
When
e is difference stationary:D
l{n)f\,
=
n-3/ 2
^^.
=
*of
W(r)
dr;t=i t=i Jo
t=i V " / 1=1 z .=i LJo J -
Uo
Thus:
n-
1/2(Li-8m)^voJ
W{r)dr;
n
1/2 (f»a-
<?o2 ) => 6a 2/"
rW'(r)
dr-
/"W[r)
drThisestablishes the
Lemma.
Q.E.D.Part (b.) is
due
to Durlaufand
Phillips (1986);we
have not been able to find a reference for part (a.),although it seems to be relatively well-known.
Proof
of
Theorem
4.S(Consistency):
By
the definition,.;o that: et(n)£,_y(n)
=
<:,€,_.,+
Z
t (in-
8 j (§ n-
S j Z[-
{§„-
6 ) [Z[et-
3--
K-pt]
=
UU-3
-r tr Ip„-
o j \vn-
vojZ
tZ
t-
ivn-
u j [2t£t-j-
^t-/
€ »J(a.)
Suppose
e is covariancestationary. Rewritingthe above,y^£,(n)£
t_y(n)-£%
tew
=
tr (§n-
6 ) [8„-
6 Jn
^n
3But by
Lemma
A, 6nl-
8 01= O
r (n-a/2 ),and
6ni-
6Q7= O
r (n-3/2),
which
imply
that \6 nl-
6=
;1(l). Further,by
Lemma
4.1,£t
et=
O
r(n1/'2
)
and
£2ttet=
0,,(n3/2
), so that the second term on
the left
hand
side is again 0,,{\)+
0,,(l)=
0,,(l).Thus
the righthand
side satisfies:that
— 2
£t(i)?t-y(i)2^£
f£,_y—
» asn
—
» oo.(b.J
Suppose
t is difference stationary. Write:£?=/+!
M")<«-yW
_
^
Er=y+
i «t(»)l?«-y(
n
)
-
?iW]
Without
Jossofgenerality, suppose j>
0.Then:
£,_ y (n)
-
£«(n)=
(et-y
-
£t)+
(Z.-
£t-y) («»-
*o)=
-
£
u(_fc+
(0 j) [e„-
6 j .k=0
Therefore: £t(n) [et.^n)-
tt{n)}=
[ £l-
Z,(X
-
O )] y-i-E
U(-fc k=0=
- JT
m-xtt
+
(§n-
S )'(°) u
+
(§n-6o)'j2
Z'tut-t k=0 ' ^ ' fc=o-z
t (e n-So)
(L-6o)'
rX
which
implies:J2
U[n)
[?t-y(n]-
lt(n)}=
-
^E
U
<-fc£«+
{*»~
*")'Q)
5>
+
£
(*»"
6°)' fc=0 t v " ' t fc=0JVow
applyLemma
4.1and
Lemma
A
repeatedly:y^
u,_fc£t= y^
u,_fc £t-k-i+
22
Ut-')
t t \ 1=0 J
k
=
^
Ut-*«t-fc-lT X]
H
U(_fcUt_i=
Op(tl),('»
-
eo)'(°)
£
£ <=
»(»~
1/2 ) •°»(»
3/2 )=
°p(
n
). fa,-«o)Y
T
Ul-fc ")-
O.fn
1/2 ) •0„(n^)
+ O
p(n-V>)
O
p{n*")
=
0„(n),5582
093
and
finally,(L
-
Bo)'(f\
(§n-
h)'^2Zi
=
O
r{n~112) \0r {n-ll2)0
r{n)+
p{rT
ll*)0
9[n*=
0,,(n).Thus
thenumerator
£?
t(n)(ft_y(n)-?
t(n))=
0,,(n). tTo
complete the proof,we
now
establish the asymptotic probability order of the denominator.By
Lemma
4.1and
Lemma
A, -V2,
M
n)
2
converges weakly to Borne functional of
W.
In particular, the limitingrandom
variable takeson thevalue withprobability 0. Therefore (^j-^2,tt{n)7) is0,,(l).Consequently, as
n
—
» oo:EILy+i
«iW«t-yW
...^
n s 1—
» in probability.E,=i
<rThis establishes the
Theorem.
Q.E.D.As
remarked
in the text, the divergent first entry of 6n-
6 in part (b.) is multipliedby
zero atcrucial points in evaluating (§„
-
6j ( • )
£
t«tand
(§„-
6 J ( . J • (§ n-
6 j£
fZ[. Otherwise, thenumerator would
be Oj,(n7), and given the asymptotic probability order ofthe denominator, the estimatorwould
fail to converge in probability.Proof
of
Theorem
4.4 (FiniteSample
Bias): In eithercase, the cross product is-
z[ (e n-
fl)
u-i
-
z[_i (en-
e ) e t.Thus
in either case the exact bias in the covariogram estimatorisn r f
B{j,n)
=
{n-j)-
1Y, Z
tE
(§n-
O )(k
-
*o) i=i+iK
-(n-j)-
1E
(^[(«»-«o)«t-y]+^-y^[(^»-«o)«t]
t=J+l Considercase (a.j. B_v iterating, e^i=
Al£o
+
T^fc=0A^Ut—^, whicii implies/or aiis,t:[/
,-i \ / <-i^1
Ee^
rtx:-
E
I A't+
2J
^ fc "*-fc I^
£ c -r/J
A'LV
fc=o / V j=o U,_/ r-i t-i=
)' +,£
(2+
^^)
t+'r(vrt-,)
k=01=
Define the function:
def
/(s,t,A)
=
Ee
x ,e xt.Next
consider case (b.).By
iterating,en
—
eo+
12h=o u
t-k, so that for alls, t:»-i t-i
Ee
u
elt=Ec-
c^y2^TE
(u,_fcu
t_,).
g(s,t)
=
Ee
u
e lt.Define the function:
Then
in case fa. J:E[(L-6
)ext]= [JZZ'.ZA
[Tz'.E^.^
=
[j^KZA
(X>.'/M,A)
\t=i v,«=i \«=i
v«=i V«=lt=l v«=i Similarly, in case ^t>.^/
and
Therefore: [en-e
)e lt]=
\Tz'
tzA
\Tz',g{s,t)\
, V«=
l V«=
lt=l V«=
l -1 v«=l v«=
l \»=1Tiie expression /or
5
a is identicalwith g in placeoff.The
difference in bias is then:BxV,n)-Bi[j,n)
=
{n-3V
1E
U[EW
fEE^t(/(-,t,A)-p(«,t)))(f;z;z.)
z;,,.t=y+i [ \«=i /
V»=it=i
/ \»=i /-Z
t(f\Z',zA
[J2Z',(f{s,t-j,\)-g(s,t-j)))
-
z
<-:E
«
)E
z
.'c/(^£
>A)
-
-°(£>0)V-=i j V«=i /
For
eacn fixed s,t, as A—
» 1,we
nave
/(s,i, A)-
g[s,t)—
» 0. Further, Bx{j,n)-
J5i(j,n) is evidentJy continuousin /(s,t,A)—
p(s,t). Therefore,we
seeimmediately
that as A—
• 1,B\
[j,n)—
Bi
[j,n)—
0. TiiisMIT LIBRARIES