Montréal
February 2001
Scientific Series
2001s-11
Autoregression-Based
Estimators for ARFIMA Models
CIRANO
CIRANO is a private non-profit organization incorporated under the Québec Companies Act. Its infrastructure and
research activities are funded through fees paid by member organizations, an infrastructure grant from the
Ministère de la Recherche, de la Science et de la Technologie, and grants and research mandates obtained by its
research teams.
The Partner Organizations
•École des Hautes Études Commerciales
•École Polytechnique
•Université Concordia
•Université de Montréal
•Université du Québec à Montréal
•Université Laval
•Université McGill
•MEQ
•MRST
•Alcan Aluminium Ltée
•AXA Canada
•Banque du Canada
•Banque Laurentienne du Canada
•Banque Nationale du Canada
•Banque Royale du Canada
•Bell Québec
•Bombardier
•Bourse de Montréal
•Développement des ressources humaines Canada (DRHC)
•Fédération des caisses populaires Desjardins de Montréal et de l’Ouest-du-Québec
•Hydro-Québec
•Imasco
•Industrie Canada
•Pratt & Whitney Canada Inc.
•Raymond Chabot Grant Thornton
•Ville de Montréal
© 2001 John W. Galbraith and Victoria Zinde-Walsh. All rights reserved.
Short sections may be quoted without explicit permission, if full credit, including the © notice, is given to the source.
ISSN 1198-8177
This paper presents preliminary research carried out at CIRANO and aims at
encouraging discussion and comment. The observations and viewpoints expressed are the
sole responsibility of the authors. They do not necessarily represent positions of CIRANO
or its partners.
Autoregression-Based Estimators for ARFIMA Models*
John W. Galbraith†, Victoria Zinde-Walsh‡
Abstract
This paper describes a parameter estimation method for both stationary
and non-stationary ARFIMA (p,d,q) models, based on autoregressive
approximation. We demonstrate consistency of the estimator for -½ < d < 1, and
in the stationary case we provide a Normal approximation to the finite-sample
distribution which can be used for inference. The method provides good
finite-sample performance, comparable with that of ML, and stable performance across
a range of stationary and non-stationary values of the fractional differencing
parameter. In addition, it appears to be relatively robust to mis-specification of
the ARFIMA model to be estimated, and is computationally straightforward.
Keywords: ARFIMA model, autoregression, fractional integration, long memory
JEL: C12, C22
* Corresponding Author: John Galbraith, Department of Economics, McGill University, 888 Sherbrooke Street West, Montréal, Qc, Canada H3A 2T7. Tel.: (514) 985-4008; Fax: (514) 985-4039; email: galbraij@cirano.qc.ca
We thank Javier Hidalgo, Aman Ullah and seminar participants at Alberta, Meiji Gakuin, Montréal, Queen's and
York universities, and the 1997 Econometric Society European Meeting, for valuable comments. Thanks also to the
Fonds pour la formation de chercheurs et l'aide à la recherche (Québec) and the Social Sciences and Humanities
Research Council of Canada for financial support. The program for computing GPH estimates was written by Pedro
de Lima; the code for ML estimation of ARFIMA models was based on programs to compute the autocovariances of
ARFIMA models written by Ching-Fan Chung. Nigel Wilkins kindly provided additional code for ML estimation.
† McGill University and CIRANO
‡ McGill University
Long memory processes, and in particular models based on fractional integration as in Granger and Joyeux (1980) and Hosking (1981), have come to play an increasing role in time series analysis as longer time series have become available; financial time series have yielded an especially large number of applications. There is a correspondingly substantial literature on the estimation of such models, both in the frequency domain and the time domain; these contributions can be further sub-divided into those which estimate a fractional integration (FI) parameter jointly with the standard ARMA parameters of an ARFIMA model, and those which estimate the FI parameter alone, leaving any ARMA or other parameters for possible estimation in a second stage.
Important contributions to frequency-domain estimation of the FI parameter include those of Geweke and Porter-Hudak (1983) and Robinson (1995). In the time domain, Hosking (1981), Li and MacLeod (1986) and Haslett and Raftery (1989) suggested estimation strategies based on first-stage estimation of the long-memory parameter alone. A potential finite-sample problem arises in processes that have a short-memory component, which can lead to bias as the short-memory components project onto the long-memory parameter in the first stage; see Agiakloglou et al. (1992) for a discussion of this bias in the context of the Geweke and Porter-Hudak (hereafter GPH) estimator.
Sowell (1992a) and Tieslau et al. (1996) treat joint estimation of fractional integration and ARMA parameters, in the former case via Maximum Likelihood, and in the latter via a minimum-distance estimator based on estimated autocorrelations, which requires that a non-stationary process (given knowledge of the non-stationarity) be transformed to stationarity. Martin and Wilkins (1999) use the indirect inference estimator, which uses simulation to obtain the function to which distance is minimized.
The present paper offers an alternative estimation method for joint estimation of a set of ARFIMA parameters which is applicable to both stationary and non-stationary processes. The method is based on autoregressive approximation, as considered by Galbraith and Zinde-Walsh (1994, 1997), and has several advantages in addition to ease of computation. First, it offers stable performance across a range of stationary (including `anti-persistent') and non-stationary values of the long-memory parameter d, that is −1/2 < d < 1, and therefore does not require prior knowledge of the non-stationarity or transformation to the stationarity region. In general, the autoregressive method performs well in the finite-sample cases that we examine, yielding root mean squared errors comparable with those of exact time-domain ML in the cases for which that estimator is applicable. Perhaps most importantly, this method appears to be relatively robust to mis-specification, being based on AR approximations which can represent quite general processes. We offer simulation evidence on each of these points.
In Section 2 we briefly review estimation of ARFIMA models in the time domain and autoregressive approximation of ARFIMA processes, and present estimators based on this approximation. Section 3 considers both asymptotic and finite-sample properties of these estimators.
2.1 Assumptions and the autoregressive representation
Consider an ARFIMA(p,d,q) process, defined as

P(L)(1−L)^d y_t = Q(L) ε_t,   (2.1.1)

where {y_t}_{t=1}^T is a set of observations on the process of interest and {ε_t}_{t=1}^T forms a stationary innovation sequence such that, for the σ-algebra F_{t−1} generated by {ε_τ, τ ≤ t−1}: E(ε_t | F_{t−1}) = 0 a.s., E(ε_t² | F_{t−1}) = σ² > 0 a.s., and E(ε_t⁴) < ∞. Let P(L), Q(L) be polynomials of degrees p and q, such that P(L) = 1 − φ_1 L − ... − φ_p L^p and Q(L) = 1 + θ_1 L + ... + θ_q L^q. Assume that Q(L) has all roots outside the unit circle, so that the moving average part of the process is invertible; we can then write

[Q(L)]^{−1} P(L)(1−L)^d y_t = ε_t.   (2.1.2)
We will assume also that Q(L) and P(L) have no common factors, and that the roots of P(L) are outside the unit circle. However, we will consider non-stationarity arising through values of d greater than 1/2, and provide both analytic and simulation results for those cases. We treat the process as having zero mean, although we will note below the effect of using an estimated mean, and will use an estimated mean in simulation evaluation of the estimator.
The term (1−L)^d may be expanded as

(1−L)^d = Σ_{k=0}^∞ (d choose k) (−L)^k = Σ_{k=0}^∞ b_k L^k,   (2.1.3)

with b_0 = 1, b_1 = −d, b_2 = −d(1−d)/2, and b_j = (b_{j−1}/j)(j−1−d) for j ≥ 3. If |d| < 1/2, then Σ_{k=0}^∞ (d choose k)² < ∞, and (2.1.3) defines a stationary process. For d > −1/2, (1−L)^d is invertible, and expression (2.1.2) can therefore be used to obtain the coefficients of the infinite autoregressive representation of the ARFIMA process y:
(1 − Σ_{i=1}^∞ α_i L^i) y_t = ε_t,   (2.1.4)

with α_i = −b_i − Σ_{j=1}^q θ_j α_{i−j} + Σ_{j=1}^p φ_j b_{i−j}, and Σ α_i² < ∞ where |d| < 1/2.

2.2 Time-domain estimation
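Before turning to the estimators, note that the coefficients b_k of (2.1.3), and hence the AR(∞) coefficients of (2.1.4), can be generated recursively. A minimal Python sketch (function names are ours; the pure ARFIMA(0,d,0) case is assumed, for which matching (2.1.3) with (2.1.4) gives α_i = −b_i):

```python
def frac_diff_coeffs(d, m):
    """Coefficients b_0, ..., b_m of (1 - L)^d = sum_k b_k L^k,
    via the recursion b_0 = 1, b_j = b_{j-1} * (j - 1 - d) / j."""
    b = [1.0]
    for j in range(1, m + 1):
        b.append(b[-1] * (j - 1 - d) / j)
    return b

def ar_inf_coeffs(d, m):
    """alpha_1, ..., alpha_m of the AR(inf) form (2.1.4) for a pure
    ARFIMA(0,d,0): (1 - L)^d y_t = e_t implies alpha_i = -b_i."""
    return [-bk for bk in frac_diff_coeffs(d, m)[1:]]

d = 0.3
b = frac_diff_coeffs(d, 50)      # expansion of (1 - L)^d
binv = frac_diff_coeffs(-d, 50)  # expansion of (1 - L)^{-d}
# convolving the two expansions should recover the identity filter
conv = [sum(b[j] * binv[k - j] for j in range(k + 1)) for k in range(10)]
```

Here conv[0] equals 1 and the remaining terms vanish to rounding error, illustrating the invertibility of (1−L)^d for d > −1/2.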
Time-domain estimators have been proposed by, among others, Hosking (1981), Li and MacLeod (1986), Sowell (1992a) and Tieslau et al. (1996). The latter two allow joint estimation of all ARFIMA parameters.

Sowell (1992a) gives an exact Maximum Likelihood algorithm for stationary ARFIMA models with distinct roots in the AR polynomial. As Baillie (1996) notes, ML is computationally burdensome here, since substantial calculation (including T×T matrix inversion) is required at each iteration of the numerical optimization. Below we compare the properties of exact ML with those of an estimator based on the coefficients of an autoregressive approximation, analogous to the ARMA estimation methods of Saikkonen (1986) and Galbraith and Zinde-Walsh (1997). To obtain the estimator we will use estimators of the coefficients of a long autoregression, and the autoregressive expansion of the ARFIMA(p,d,q) model, to obtain the estimates of the long-memory parameter d together with parameters which characterize the "short memory" parts of the model. Below we will give three estimators for the coefficients of an autoregressive approximation, on which ARFIMA estimates can later be based. Each of the three has the same asymptotic limit in the stationarity region.¹ One of the three, the OLS estimator, can be used in the non-stationarity region as well; for that reason OLS would be used in applications, where stationarity is not normally known a priori to apply. However other estimators, in particular the Yule-Walker, are convenient for obtaining theoretical properties in the stationarity region.
The OLS estimator ã_OLS solves

ã_OLS = argmin_a (1/T) Σ_{t=k+1}^T (y_t − a_1 y_{t−1} − ... − a_k y_{t−k})².   (2.2.1)
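A minimal self-contained sketch of (2.2.1) (Python; `ols_long_ar` is our name, and we solve the normal equations by direct Gaussian elimination, adequate for the small k used here):

```python
import random

def ols_long_ar(y, k):
    """OLS estimator of (2.2.1): regress y_t on y_{t-1}, ..., y_{t-k}
    over t = k+1, ..., T, solving the normal equations X'X a = X'y."""
    T = len(y)
    XtX = [[sum(y[t - 1 - i] * y[t - 1 - j] for t in range(k, T))
            for j in range(k)] for i in range(k)]
    Xty = [sum(y[t - 1 - i] * y[t] for t in range(k, T)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented system
    A = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    a = [0.0] * k
    for r in range(k - 1, -1, -1):
        a[r] = (A[r][k] - sum(A[r][c] * a[c] for c in range(r + 1, k))) / A[r][r]
    return a

# illustration: a long AR fitted to AR(1) data with coefficient 0.5
random.seed(0)
y, prev = [], 0.0
for _ in range(5000):
    prev = 0.5 * prev + random.gauss(0, 1)
    y.append(prev)
a_hat = ols_long_ar(y, 4)
```

The first fitted coefficient is close to 0.5 and the remaining ones close to zero, as expected for this data-generating process.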
The second estimator is the spectral estimator, asymptotic properties of which were examined by Yajima (1992) for Gaussian errors. We denote it ã_sp; it follows from Yajima (1992) that, in the present notation,

ã_sp = argmin_a (1/T) [ y_1² + (y_2 − a_1 y_1)² + ... + (y_k − a_1 y_{k−1} − ... − a_{k−1} y_1)² + Σ_{t=k+1}^T (y_t − a_1 y_{t−1} − ... − a_k y_{t−k})² ].   (2.2.2)
The third is the Yule-Walker estimator. Define γ_j = E(y_t y_{t−j}) and γ̂_j = (1/(T−k)) Σ_{t=k+1}^T y_t y_{t−j}; Σ(k) as the k×k matrix with {Σ(k)}_{ij} = γ_{|i−j|}; and Σ̂(k) as the matrix with elements {Σ̂(k)}_{ij} = γ̂_{|i−j|}. The Yule-Walker estimator is then ã_YW, which solves¹

Σ̂(k) ã_YW = γ̂(k),   (2.2.3)

where γ̂(k) = (γ̂_1, γ̂_2, ..., γ̂_k)′. Consider also the population analogue of (2.2.3),

Σ(k) a(k) = γ(k),   (2.2.4)

where a(k) is the solution to the system.

¹ Each of course depends on the AR order parameter, k, but to simplify notation this dependence will not be indicated explicitly.
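Because Σ̂(k) is Toeplitz, (2.2.3) can be solved in O(k²) operations by the Levinson-Durbin recursion rather than a general k×k solve. A sketch (Python; the function name is ours), assuming the sample autocovariances have already been computed:

```python
def yule_walker(gammas):
    """Solve the Yule-Walker system (2.2.3)-(2.2.4), Sigma(k) a = gamma(k),
    given gammas = [gamma_0, gamma_1, ..., gamma_k], by the Levinson-Durbin
    recursion, which exploits the Toeplitz structure of Sigma(k)."""
    k = len(gammas) - 1
    a = [gammas[1] / gammas[0]]       # order-1 solution
    v = gammas[0] * (1 - a[0] ** 2)   # prediction-error variance
    for m in range(2, k + 1):
        ref = (gammas[m] - sum(a[j] * gammas[m - 1 - j]
                               for j in range(m - 1))) / v
        a = [a[j] - ref * a[m - 2 - j] for j in range(m - 1)] + [ref]
        v *= 1 - ref ** 2
    return a

# check: for an AR(1) with coefficient 0.6, gamma_j = 0.6**j * gamma_0,
# and the Yule-Walker solution at any order k is (0.6, 0, ..., 0)
g = [0.6 ** j for j in range(5)]
a = yule_walker(g)
```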
The terms in the first-order conditions that ã_OLS and ã_sp solve differ from (2.2.3), for ã_YW, by terms of order in probability at most O_p(kT^{−1}); it follows that all of the estimators above differ by at most O_p(k²T^{−1}).² Therefore each one has the same asymptotic limit a(k), and the choice of k as O(ln T) implies that all of the estimators have the same asymptotic distributions. We will therefore use the notation ã to denote any one of these estimators; no distinction among them need be made in considering asymptotic properties of ARFIMA parameter estimators based on such autoregressive estimates, for −1/2 < d < 1/2. Below we use the Yule-Walker estimator in derivations for the stationary case; we use OLS for derivations in non-stationary cases and in the Monte Carlo experiments reported in subsection 3.3.
The values a(k) of (2.2.4) are related to the coefficients of the infinite autoregressive representation of the stationary ARFIMA process. If we denote by α_{[1,∞)} the vector of coefficients of the infinite autoregression (2.1.4), then that vector solves

Σ(∞) α_{[1,∞)} = γ(∞).   (2.2.5)
² The only non-standard case here is that of −1/2 < d < 0, where the eigenvalues of Σ^{−1} could grow at a rate as high as k^{−2d}.
Partition α_{[1,∞)} as (α_{[1,k]} : α_{[k+1,∞)}), and denote the top-right sub-matrix of Σ(∞), partitioned conformably, by Σ_{[k+1,∞)}. Then Σ(k) α_{[1,k]} + Σ_{[k+1,∞)} α_{[k+1,∞)} = γ(k), and therefore

a(k) − α_{[1,k]} = (Σ(k))^{−1} Σ_{[k+1,∞)} α_{[k+1,∞)}.   (2.2.6)

Thus an autoregressive estimator of α_{[1,k]} will include a deterministic bias represented by (2.2.6). We demonstrate below that this bias goes to zero as k, T → ∞.
2.3 ARFIMA parameter estimation

Now that we have introduced estimators of autoregressive coefficients for the ARFIMA process, we can proceed to define estimators for the full set of ARFIMA parameters based on any of the above estimators. We define the vector of all the parameters of the ARFIMA model to be estimated by ω = (d, φ′, θ′), where φ′ = (φ_1, ..., φ_p) and θ′ = (θ_1, ..., θ_q). Let ã be any of the autoregressive coefficient estimators introduced above, and let α(ω) denote the vector containing the coefficients of the infinite autoregressive representation of the process, viewed as functions of ω.

We will examine a minimum-distance estimator of the form

ω̃ = argmin_ω (ã − α(ω))′ Λ (ã − α(ω)),   (2.2.7)

constructed using any of the estimators ã of (2.2.1)-(2.2.3), where α(ω) is given explicitly in (2.1.4); Λ represents a k×k weighting matrix. In all simulations below, Λ is chosen as the inverse of the covariance matrix of ã, which down-weights imprecisely estimated lags and contributes to the efficiency of the estimator.³
It is the fact that this estimator uses autoregressive parameters, instead of autocorrelations, that allows its use for non-stationary processes.⁴ Note finally that estimates can be obtained for multiple specifications of an ARFIMA model from a single autoregressive fit to the data, which is convenient in model selection (information criteria, for example, can be computed based on the residuals from each of several models).
3. Properties and performance of the estimators
In this section we consider asymptotic and finite-sample properties of ARFIMA parameter estimates based on (2.2.7), obtained using any of the various preliminary autoregressive estimators (OLS, spectral, Yule-Walker) discussed in Section 2. We will also note the corresponding properties of some of the existing ML, MDE and spectral estimators.

Maximum likelihood produces asymptotically Normal estimates of ARFIMA parameters which converge at the usual T^{1/2} rate for stationary Gaussian processes (Dahlhaus 1989). The Quasi-MLE has this property as well for a range of assumptions on the error process.
³ If Λ = I, the k×k identity matrix, we have an unweighted form of the estimator. In simulations we found substantial contributions to finite-sample efficiency from the use of Λ = cov(ã)^{−1}, and this form alone is used in the results presented below.
⁴ It is noteworthy that for the pure ARFIMA(0,d,0) case, an estimator can be based on the first coefficient of the approximating AR model, since that first coefficient converges to −d in this case, which corresponds to (2.1.3). This point has a parallel in the use of the same estimator by Galbraith and Zinde-Walsh (1994) for the pure MA(1) model (and also in the fact, noted by Tieslau et al., that a consistent estimator of d can be based on the first autocorrelation; that is, d̂ = ρ̂_1/(1 + ρ̂_1)). The fact that the same estimate can serve for either of two different models underlines, of course, the importance of model selection in determining the character of the estimated representation.
The minimum-distance estimator of Tieslau et al. (1996), based on sample autocorrelations, converges at the standard rate to an asymptotic normal distribution for d ∈ (−1/2, 1/4); at d = 1/4, convergence is to the normal distribution at rate (T/log(T))^{1/2}; and for d ∈ (1/4, 1/2) convergence is at rate T^{(1/2 − d)} and the limiting distribution is non-normal. For the Geweke and Porter-Hudak estimator, based on least squares estimates of a regression with dependent variable given by the harmonic ordinates of the periodogram, and for a generalized version which discards a number of lower frequencies, the asymptotic properties are obtained by Robinson (1995). Asymptotic properties of the indirect inference estimator for long-memory models (Martin and Wilkins 1999) have not been established.
Before considering properties of ARFIMA parameter estimates based on autoregression, we examine the estimates ã of the autoregressive parameters themselves. In subsection 3.1 we show consistency of ã in both stationary and non-stationary cases, and show that a Normal approximation to the asymptotic distribution can be used. Consistency and distributional results for the estimates ω̃ of the ARFIMA parameters are given in 3.2. Simulation results describing finite-sample performance appear in 3.3.
3.1 Asymptotic properties of the estimators ã
The first set of results concerns consistency of the autoregressive parameter estimates, from which parametric model parameter estimates are later deduced, in both stationary and non-stationary cases. As noted above, we use the Yule-Walker estimator for ã in the stationary case; the same properties then hold for OLS and spectral estimates. Theorem 1 establishes that ã is a consistent estimator of α_{[1,k]} as T → ∞, k → ∞ in the stationarity region; the cases 0 < d < 1/2 and −1/2 < d ≤ 0 receive separate treatment. Theorem 2 establishes the result, using OLS, in the non-stationary case 1/2 < d < 1. For d = 1, the process contains a unit root; see for example Phillips (1987) for asymptotic results. For brevity we omit the case where d = 1/2; a proof of consistency can be constructed similarly to that of Theorem 2, using the rates appropriate to the d = 1/2 case.

Theorem 1. For d ∈ (−1/2, 1/2) and a(k) as defined in (2.2.4), ‖a(k) − α_{[1,k]}‖ = O(k^{|d|−1/2}) as k → ∞. Under the assumptions in Section 2.1, for any δ, ε there exist k, T̃ such that Pr(‖ã(k) − α_{[1,k]}‖ > ε) < δ ∀ T > T̃.
Proof: See Appendix A.
Next consider the nonstationary case 1/2 < d < 1. Shimotsu and Phillips (1999) discuss two possible characterizations of non-stationary I(d) processes for 1/2 < d < 3/2: one defines y_t as a partial sum of a stationary fractionally integrated process z_t, so that y_t = y_0 + Σ_{i=1}^t z_i, while the other defines y_t via the expansion of the function of the lag operator,

(y_t − y_0) = Σ_{i=0}^{t−1} (1/i!) (Γ(d+i)/Γ(d)) ε_{t−i}.

Each leads to an expression of the form (1−L)^d (y_t − y_0) = ε_t; Shimotsu and Phillips show that the essential difference lies in the fact that the first definition involves the presence of pre-sample values. Here we treat y_t as a partial sum and assume that y_0 and all pre-sample values are zeroes; the same results would hold if we assumed that max{y_t : t ≤ 0} = O_p(1). Note that while Theorems 2 and 4 use the partial-sum representation, this estimator does not rely on differenced data in the non-stationary case, so that a priori knowledge of the range in which d lies is not necessary.
Theorem 2. For d ∈ (1/2, 1), let the differenced process z_t = y_t − y_{t−1}, t = 2, ..., T, be a stationary ARFIMA process satisfying the assumptions of Section 2.1, with d̄ = d − 1 < 0, and let y_{−i} = 0 ∀ i ≥ 0. Then as T → ∞, k → ∞, kT^{−1/2} → 0, we have

‖ã − α_{[1,k]}‖ = O_p(k^{−3d+3/2}) if 1/2 < d < 3/4;  O_p(k^{−d}) if d ≥ 3/4.

Proof: See Appendix B.
Next we characterize the asymptotic distributions of the estimators ã(k) of α in the stationarity region, as T, k → ∞.

For fixed k and −1/2 < d < 1/4, the estimator ã(k) has an asymptotic Normal distribution with the usual convergence rate of T^{1/2} and asymptotic mean of a(k) (Yajima 1992). For the coefficients of the infinite autoregression, the difference ã(k) − α_{[1,k]} can be represented as the sum of (ã(k) − a(k)) and (a(k) − α_{[1,k]}); the first of these terms has an asymptotic Normal distribution, and the second goes to zero, by Theorem 1, as k → ∞. The Normal distribution can therefore be used for inference in large samples, for −1/2 < d < 1/4.

For 1/4 < d < 1/2, the asymptotic distribution of ã(k) is not Normal, but is related to the Rosenblatt distribution; the convergence rate is non-standard (Yajima 1992), and if an estimated mean is subtracted the asymptotic distribution changes. Nonetheless, we can again represent ã(k) − α_{[1,k]} as a sum of two terms, one of which has an asymptotic Normal distribution, and the second of which is a `correction' term which can be made arbitrarily small in probability as T, k → ∞. This representation applies to all −1/2 < d < 1/2, and is based on Hosking (1996), where asymptotic normality is established for differences of sample covariances, for all stationary ARFIMA processes, under the conditions given in Section 2.1. The result below thus provides not an asymptotic distribution, but rather a Normal approximation to the finite-sample distribution, for which the covariance and closeness to the true distribution are governed by the choice of k for sufficiently large T. The result is therefore not in conflict with Yajima's asymptotic result, but nonetheless indicates that it is possible to conduct approximate inference using the Normal distribution, for d in this range.
Theorem 3. Under the conditions of Theorem 1, the difference ã(k) − α_{[1,k]} can be represented as the sum ξ(k) + η(k), where for any positive ε, δ there exists k sufficiently large that Pr(‖η(k)‖ < ε) > 1 − δ, and T^{1/2} ξ(k) has a limiting Normal distribution of the form N(0, W(k)).
Proof: See Appendix C. The asymptotic covariance matrix W(k) is given in the proof of Theorem C in Appendix C.
As noted earlier, we suggest choosing k = O(ln T); a particular rule is given below. Figure 1(a-d) illustrates the approximation provided by the Normal by depicting the simulated densities of the first and second autoregressive coefficients in estimated AR representations of ARFIMA(0,d,0) models, d = {0.4, 0.5, 0.7, 0.9}, T = 100 (k = 8). In each case the Normal approximation is very close to the true distribution of the AR coefficients; small but clearly discernible departures from the Normal are visible for larger values of d, particularly near the means of the distributions. Note that these simulations include values of d ≥ 0.5, to which Theorem 3 does not apply; nonetheless the Normal approximation remains close in these cases as well.
3.2 ARFIMA parameter estimates

In order to discuss consistency of the ARFIMA parameter estimates ω̃ given in (2.2.7) above, we need an additional condition.

Condition 3.2. There exists a non-stochastic infinite-dimensional matrix Π corresponding to a bounded norm operator on the space L₂ such that ‖Λ − Π_k‖ → 0, where Π_k is the k×k principal sub-matrix of Π (convergence is in probability if Λ is stochastic).

If the estimator is used in unweighted form (Λ = I), then all of the matrices involved are identity matrices and this condition is trivially satisfied. If the weighting matrix is an inverse covariance matrix (known or estimated) of a stationary and invertible process, then Π corresponds to the Toeplitz form of the inverse process.
The next theorem shows that ω̃ is a consistent estimator of the true vector of parameters ω₀, and Theorem 5 shows that the difference between the two can again be represented as a sum of two terms, one with an asymptotically Normal distribution, and one which goes to zero in probability as k, T → ∞. We denote the quadratic form (ã − α(ω))′ Λ (ã − α(ω)) by Q_{T,k}(ã, ω).

Theorem 4. Under the conditions of Theorems 1 and 2 and Condition 3.2, and where T → ∞, k → ∞ and kT^{−1/2} → 0, the minimum-distance estimator ω̃ is consistent for ω₀, the true parameter vector of a correctly-specified ARFIMA model. The corresponding distance function min_ω Q_{T,k}(ã, ω) goes to zero in probability.
The difference ω̃ − ω₀ can again be given an asymptotically Normal representation, as Theorem 5 indicates.
Theorem 5. Under the conditions of Theorem 1, the difference ω̃ − ω₀ can be represented as the sum ζ(k) + b(k), where for each k the correction term is

b(k) = [ (∂α(ω₀)′/∂ω) Λ (∂α(ω₀)/∂ω′) ]^{−1} (∂α(ω₀)′/∂ω) Λ η(k),

with η(k) as given in Theorem 3, and T^{1/2} ζ(k) is asymptotically distributed as N(0, V(k)) as T → ∞, for each k.
Proof: See Appendix C. The asymptotic covariance matrix V(k) is given in the proof.
3.3 Performance in finite samples
To investigate the performance of these estimators in small samples, we generate simulated samples from a selection of ARFIMA(0,d,0), (1,d,0) and (1,d,1) processes, and compare the results by parameter root mean squared error (RMSE). ML and autoregressive estimators, along with GPH in the (0,d,0) cases, are compared. In order to examine the impact of mis-specification we consider several cases of (0,d,0) models of (1,d,0) processes; in these cases performance is evaluated by the out-of-sample RMSE of 1-step forecasts.

In all of the following simulations, the mean is treated as unknown and the sample mean is removed from the process prior to ARFIMA parameter estimation. In the frequency domain, removal of an estimated mean is not required, since the zero frequency is excluded; time- and frequency-domain methods perform comparably in moderately large samples, and in fact time-domain ML shows lower MSE in small samples.
Simulated fractional noise is obtained by transformation of (Gaussian) white noise using the Cholesky decomposition of the exact covariance of the ARFIMA process, obtained using the results of Chung (1994); see also Diebold and Rudebusch (1989). In each case estimators are compared using the same sets of starting values (zero in (0,d,0) cases; the better of two alternatives is chosen in multiple-parameter cases). Optimization is performed using the Powell algorithm. Throughout, we use the rule of thumb that the AR lag length can be chosen as 8 + 3 ln(T/100), T ≥ 100, rounded to the nearest integer. This rule is approximately equivalent to 3 ln(T) − 6; we express it in the former way to emphasize T = 100 as the `base' sample size.
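The lag-length rule of thumb is trivial to implement; a sketch (Python; the function name is ours):

```python
import math

def ar_lag_length(T):
    """Rule of thumb from the text: k = 8 + 3*ln(T/100) for T >= 100,
    rounded to the nearest integer (approximately 3*ln(T) - 6)."""
    if T < 100:
        raise ValueError("rule stated for T >= 100")
    return round(8 + 3 * math.log(T / 100))

sizes = {T: ar_lag_length(T) for T in (100, 400, 1000, 10000)}
```

For the sample sizes used below this gives k = 8 at T = 100 and k = 12 at T = 400.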
At least 1000 replications are used in each case, more in the (0,d,0) cases. For multiple-parameter models we use a sample size of 100, in common with much of the literature containing results on exact and approximate ML; the computational cost of the repeated inversion of T×T covariance matrices associated with ML becomes prohibitive for simulation at large sample sizes. In the (0,d,0) cases we report results for T = {100, 400}.
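The simulation design above, Cholesky draws from the exact covariance of fractional noise, can be sketched as follows; the closed-form autocovariance and its recursion are standard for ARFIMA(0,d,0) (Hosking 1981), while the function names and the unit innovation variance are our choices:

```python
import math, random

def arfima0d0_autocov(d, n):
    """Exact autocovariances gamma_0, ..., gamma_{n-1} of ARFIMA(0,d,0)
    with unit innovation variance: gamma_0 = Gamma(1-2d)/Gamma(1-d)^2,
    and gamma_j = gamma_{j-1} * (j - 1 + d) / (j - d)."""
    g = [math.gamma(1 - 2 * d) / math.gamma(1 - d) ** 2]
    for j in range(1, n):
        g.append(g[-1] * (j - 1 + d) / (j - d))
    return g

def simulate_fractional_noise(d, n, seed=0):
    """One sample path: the Cholesky factor of the Toeplitz covariance
    matrix [gamma_|i-j|] applied to Gaussian white noise."""
    g = arfima0d0_autocov(d, n)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[abs(i - j)] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    return [sum(L[i][m] * z[m] for m in range(i + 1)) for i in range(n)]

y = simulate_fractional_noise(0.3, 100)
```

The implied first-order autocorrelation is d/(1−d), the quantity inverted by the ρ̂₁-based estimator mentioned in footnote 4.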
(i) ARFIMA(0,d,0)

We begin by comparing estimators in the pure fractionally-integrated case, which has been the most thoroughly examined in the literature to date. Figures 2a/b plot the RMSE's from ML, GPH and the AR estimator, on the grid of values from d = −0.4 to d = 0.9 at an interval of 0.1.⁵ Both ML and AR show lower RMSE than GPH throughout the stationarity region; non-stationary values of d are considered as well. The AR estimator has lower RMSE than ML at moderate absolute values of d, slightly higher RMSE at d = 0.3, 0.4, and substantially higher at d = −0.4. However, ML is constrained by the optimization algorithm to lie in d ∈ (−0.5, 0.5), whereas AR is not constrained in this way, being usable outside this interval; ML therefore benefits near −0.5 and 0.5 from being constrained to lie in a region around the true value which is fairly tightly bounded on one side.
Qualitative results do not differ greatly between the two sample sizes of Figures 2a and 2b. At T = 400, the RMSE of the AR estimator is almost completely insensitive to the value of d, except at −0.4 where it rises to almost the value produced by GPH.
(ii) ARFIMA(1,d,0) and ARFIMA(1,d,1)

The first multiple-parameter cases that we consider are ARFIMA(1,d,0), a simple model which allows short-run and long-run components. In these cases, one element of the estimation error arises through the difficulty of discriminating between these two components at small sample sizes, since AR and fractional integration parameters can imply similar patterns of autocorrelation for the first few lags. Table 1 gives results for a few cases similar to the ARFIMA(1,d,1) processes that we will address below: d = {−0.3, 0.3} and φ = {0.7, −0.2}.

⁵ ML estimation for non-stationary values of d is applied to differenced data, implying knowledge of the need for differencing. The maximum number of periodogram ordinates is included for GPH, which is optimal for this case in which there is no short-memory component.
Table 1
RMSE's of ARFIMA(1,d,0) parameter estimates
T = 100; 2000 replications (ML), 10000 replications (AR)

   d      φ     d̂(AR)   d̂(ML)   φ̂(AR)   φ̂(ML)
 -0.3    0.7    0.266    0.186    0.287    0.195
 -0.3   -0.2    0.110    0.120    0.137    0.139
  0.3    0.7    0.179    0.237    0.150    0.116
  0.3   -0.2    0.209    0.233    0.217    0.241
Neither estimator dominates in these examples; ML is markedly better at (−0.3, 0.7), and AR is better at (0.3, −0.2). The two are very similar at (−0.3, −0.2). For the case (0.3, 0.7), possibly the most interesting in combining positive d with positive short-memory autocorrelation, d is better estimated by the AR estimator, and φ by ML.
The ARFIMA(1,d,1) parameterizations examined in Table 2 are those used by Chung and Baillie (1993). Note that in each case the short-memory parameters φ and θ have the same sign, so that the corresponding terms (1−φL) and (1+θL) in the lag polynomials do not approach cancellation; at φ = 0.5, θ = −0.5, for example, cancellation would take place and the apparent ARFIMA(1,d,1) process would in fact be ARFIMA(0,d,0). In cases of near-cancellation, the process may be well approximated by a process with substantially different parameter values, but similar roots of the short-memory polynomials, leading to difficulty in evaluating the estimates. Such cases are ruled out by parameter values such as those considered here.

Table 2
RMSE's of ARFIMA(1,d,1) parameter estimates
T = 100; 1000 replications (ML), 5000 replications (AR)

 (d, φ, θ)            d̂(AR)   d̂(ML)   φ̂(AR)   φ̂(ML)   θ̂(AR)   θ̂(ML)
 (-0.3,  0.5,  0.2)   0.251    0.163    0.350    0.220    0.213    0.163
 (-0.3,  0.2,  0.5)   0.207    0.151    0.295    0.237    0.152    0.154
 (-0.3, -0.5, -0.2)   0.339    0.162    0.136    0.139    0.391    0.254
 (-0.3, -0.2, -0.5)   0.333    0.179    0.183    0.173    0.291    0.250
 ( 0.0,  0.5,  0.2)   0.285    0.298    0.307    0.235    0.198    0.154
 ( 0.0,  0.2,  0.5)   0.291    0.285    0.332    0.313    0.144    0.139
 ( 0.0, -0.5, -0.2)   0.228    0.187    0.154    0.142    0.328    0.265
 ( 0.0, -0.2, -0.5)   0.243    0.263    0.220    0.202    0.384    0.381
 ( 0.3,  0.5,  0.2)   0.375    0.408    0.285    0.271    0.178    0.152
 ( 0.3,  0.2,  0.5)   0.375    0.390    0.378    0.381    0.132    0.124
 ( 0.3, -0.5, -0.2)   0.231    0.209    0.156    0.149    0.324    0.285
 ( 0.3, -0.2, -0.5)   0.286    0.314    0.238    0.207    0.434    0.430
The RMSE's of ML estimates are generally somewhat better than those reported by Chung and Baillie for the approximate CSS estimator. The pattern of relative RMSE's for the AR and ML estimators is broadly similar to that observed in lower-order cases. Neither estimator dominates; ML gives lower RMSE for d in seven of the twelve cases, AR in five. Four of the cases in which ML is superior are the four cases with negative d, where ML's performance is markedly better. For non-negative values, performance of the two for d is similar. In estimation of the short-memory parameters, ML shows an advantage throughout the parameter space, although the differences in RMSE between the techniques are generally not large.
In the final set of simulations we consider ML and AR estimates of d in processes that are ARFIMA(1,d,1), but where the model used is ARFIMA(0,d,0). These results serve to illustrate the point that the ARFIMA model estimators need not be seen as purely `parametric' estimators, in the sense of depending crucially on a correct parameterization of the process; this is true in particular of the AR estimator, where the autoregressive structure which extracts statistical information from the data is capable of fitting a very general class of processes, and for some purposes useful estimates can also be obtained from mis-specified models. In this sense this estimator may be thought of as semi-parametric: the long-memory component is captured via the parametric fractionally-integrated model with parameter d, and the short-memory component via a variable number of AR (or ARMA) terms, which may be increased with sample size to detect increasingly subtle short-memory features as sample information accumulates.
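The autoregressive-approximation idea can be illustrated with a small numerical sketch. This is not the authors' exact estimator: the truncated (type-II) simulation scheme, the AR order k = 20, and the grid-search matching of fitted AR(k) coefficients to the theoretical AR(∞) coefficients of (1 - L)^d are all choices made here purely for illustration.

```python
import numpy as np

def frac_ar_coeffs(d, k):
    """First k AR(infinity) coefficients of (1-L)^d x_t = e_t, i.e.
    x_t = sum_j a_j x_{t-j} + e_t with a_j = -pi_j, where
    pi_0 = 1 and pi_j = pi_{j-1} * (j - 1 - d) / j."""
    pi = np.empty(k + 1)
    pi[0] = 1.0
    for j in range(1, k + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return -pi[1:]

def simulate_arfima0d0(d, T, rng):
    """Simulate ARFIMA(0,d,0) recursively from its truncated AR form."""
    a = frac_ar_coeffs(d, T)
    x = np.zeros(T)
    e = rng.standard_normal(T)
    for t in range(T):
        # sum_{j=1}^{t} a_j x_{t-j}: reverse the history so a_1 meets x_{t-1}
        x[t] = a[:t] @ x[:t][::-1] + e[t]
    return x

def fit_ar(x, k):
    """OLS fit of an AR(k): regress x_t on x_{t-1}, ..., x_{t-k}."""
    T = len(x)
    X = np.column_stack([x[k - j:T - j] for j in range(1, k + 1)])
    return np.linalg.lstsq(X, x[k:], rcond=None)[0]

def estimate_d(x, k):
    """Match fitted AR(k) coefficients to theoretical ones by grid search over d."""
    grid = np.linspace(-0.45, 0.45, 181)
    ahat = fit_ar(x, k)
    sse = [np.sum((ahat - frac_ar_coeffs(d, k)) ** 2) for d in grid]
    return grid[int(np.argmin(sse))]

rng = np.random.default_rng(0)
x = simulate_arfima0d0(0.3, 2000, rng)
dhat = estimate_d(x, 20)
print(dhat)
```

With T = 2000 the grid search recovers d near its true value of 0.3; adding fitted short-memory terms, as in the paper's estimator, would refine the match in ARFIMA(p,d,q) cases.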
In mis-specified cases the obvious criterion of evaluation, accuracy of parameter estimates, is not applicable; some parameters are missing, and in such cases it will typically be optimal to deviate from the `true' values of parameters for some purposes. For this exercise, we therefore evaluate the mis-specified models by the accuracy (RMSE) of 1-step out-of-sample forecasts of the true process. We consider d = {-0.3, 0, 0.3} and (φ, θ) = {(0.7, 0.5), (0, 0), (0.2, 0.2)}, for a total of nine cases (cases in which (φ, θ) = (0, 0) are of course correctly specified, and are included for comparison). We again use T = 100 in each case, and the noise variance is unity. Results are presented in Table 3. This small model may be of particular interest in cases where the true process form is unknown.
Table 3
RMS Forecast Errors in ARFIMA(0,d,0) Models of ARFIMA(1,d,1) Processes
T = 100; 5000 replications

 d     φ    θ     AR      ML
-0.3   0    0     1.0026  1.0016
 0     0    0     1.0028  1.0036
 0.3   0    0     1.0103  1.0103
-0.3   0.2  0.2   1.0263  1.0262
 0     0.2  0.2   1.0265  1.0256
 0.3   0.2  0.2   1.0280  1.0351
-0.3   0.7  0.5   1.1378  1.1566
 0     0.7  0.5   1.1382  1.2728
 0.3   0.7  0.5   1.1381  1.5823
In the correctly-specified comparison cases, these RMS forecast error results mirror the parameter RMSE results of Figure 2a; ML is superior at d = -0.3, AR at d = 0, and the two are very close at d = 0.3. Where (φ, θ) = (0.2, 0.2), ML produces slightly better results at d = -0.3 and 0. However, the AR forecasts are substantially better at d = 0.3, a phenomenon which appears more strongly at (φ, θ) = (0.7, 0.5): in all of the latter cases, AR estimates produce markedly better 1-step forecasts. Note that in cases such as these, where there is substantial autocorrelation from the short-memory parameters, the AR estimator is able to compensate for the lack of short-memory components in the estimated model. Comparable forecasts could also be produced by imposing a differenced representation; however, to do so it is again necessary to make a determination of an optimal degree of differencing, either a priori or based on features of the data, which the AR estimator does not require.
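The 1-step forecasts compared above take the usual linear-predictor form; given any fitted AR(k) coefficients (from either estimation method), the forecast is a weighted sum of the k most recent observations. A minimal sketch, with hypothetical coefficient values chosen only for the example:

```python
import numpy as np

def ar_one_step_forecast(x, a):
    """1-step forecast from an AR(k) fit: x_hat_{T+1} = sum_j a_j x_{T+1-j}.
    `a` holds a_1, ..., a_k, with a_1 applied to the most recent value."""
    k = len(a)
    return float(np.dot(a, x[-1:-k - 1:-1]))

# Toy series and hypothetical fitted AR(2) coefficients:
x = np.array([0.4, -0.1, 0.3, 0.2, 0.5])
a = np.array([0.6, -0.2])            # a_1 = 0.6, a_2 = -0.2
print(ar_one_step_forecast(x, a))    # 0.6*0.5 - 0.2*0.2
```

The tabulated RMS forecast errors are then the square root of the mean squared difference between such forecasts and the realized values, taken across replications.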
4. Concluding remarks

Estimation of long-memory models by autoregressive approximation is feasible even in quite small samples, and is consistent across a range of stationary and non-stationary values of the long-memory parameter, so that estimation does not require knowledge of an appropriate transformation to apply.
Autoregressive estimation, and model selection based on these estimates, is often convenient in choosing starting values for techniques such as ML; estimation based on this principle has been used for ARMA models as far back as Durbin (1960). While this is one potential use of autoregressive estimates, such estimates seem to have desirable features beyond their ability to produce good starting values quickly. Their relative insensitivity to the stationarity of the process, and their ability, for a wide class of processes, to provide an arbitrarily good statistical approximation as sample size and order increase, make them well suited to the treatment of processes of unknown form. This may be especially valuable in economic data, for which the ARFIMA class itself will typically be only an approximation.

References
Agiakloglou, C., P. Newbold and M. Wohar (1992) Bias in an Estimator of the Fractional Difference Parameter. Journal of Time Series Analysis 14, 235-246.
Amemiya, T. (1985) Advanced Econometrics. Harvard University Press, Cambridge, MA.
Berk, K.N. (1974) Consistent Autoregressive Spectral Estimates. Annals of Statistics 2, 489-502.
Cheung, Y.W. and F.X. Diebold (1994) On Maximum-Likelihood Estimation of the Differencing Parameter of Fractionally-Integrated Noise with Unknown Mean. Journal of Econometrics 62, 311-316.
Chung, C.-F. (1994) A Note on Calculating the Autocovariances of the Fractionally Integrated ARMA Models. Economics Letters 45, 293-297.
Chung, C.-F. and R.T. Baillie (1993) Small Sample Bias in Conditional Sum of Squares Estimators of Fractionally-Integrated ARMA Models. Empirical Economics 18, 791-806.
Dahlhaus, R. (1989) Efficient Parameter Estimation for Self-Similar Processes. Annals of Statistics 17, 1749-1766.
Davies, R.B. and D.S. Harte (1987) Tests for Hurst Effect. Biometrika 74, 95-101.
Diebold, F.X. and G.D. Rudebusch (1989) Long Memory and Persistence in Aggregate Output. Journal of Monetary Economics 24, 189-209.
Durbin, J. (1960) The Fitting of Time Series Models. Review of the International Statistical Institute 28, 233-243.
Galbraith, J.W. and V. Zinde-Walsh (1994) A Simple, Non-Iterative Estimator for Moving-Average Models. Biometrika 81, 143-155.
Galbraith, J.W. and V. Zinde-Walsh (1997) On Some Simple, Autoregression-Based Estimation and Identification Techniques for ARMA Models. Biometrika 84, 685-696.
Geweke, J. and S. Porter-Hudak (1983) The Estimation and Application of Long Memory Time Series Models. Journal of Time Series Analysis 4, 221-238.
Giraitis, L. and D. Surgailis (1990) A Central Limit Theorem for Quadratic Forms in Strongly Dependent Linear Variables and its Application to Asymptotic Normality of Whittle's Estimate. Probability Theory and Related Fields 86, 87-104.
Granger, C.W.J. (1980) Long Memory Relationships and the Aggregation of Dynamic Models. Journal of Econometrics 14, 227-238.
Granger, C.W.J. and R. Joyeux (1980) An Introduction to Long-Memory Time Series Models and Fractional Differencing. Journal of Time Series Analysis 1, 15-39.
Grenander, U. and G. Szegő (1958) Toeplitz Forms and Their Applications. University of California Press, Berkeley.
Hosking, J.R.M. (1981) Fractional Differencing. Biometrika 68, 165-176.
Hosking, J.R.M. (1996) Asymptotic Distributions of the Sample Mean, Autocovariances, and Autocorrelations of Long-Memory Time Series. Journal of Econometrics 73, 261-284.
Hurst, H.E. (1951) Long-Term Storage Capacity of Reservoirs. Transactions of the American Society of Civil Engineers 116, 770-799.
Janacek, G.J. (1982) Determining the Degree of Differencing for Time Series via the Long Spectrum. Journal of Time Series Analysis 3, 177-183.
Li, W.K. and A.I. McLeod (1986) Fractional Time Series Modelling. Biometrika 73, 217-221.
Martin, V.L. and N.P. Wilkins (1999) Indirect Estimation of ARFIMA and VARFIMA Models. Journal of Econometrics 93, 149-175.
Phillips, P.C.B. (1987) Time Series Regression with a Unit Root. Econometrica 55, 277-301.
Priestley, M.B. (1981) Spectral Analysis and Time Series. Academic Press, New York.
Robinson, P.M. (1991) Testing for Strong Serial Correlation and Dynamic Heteroskedasticity in Multiple Regression. Journal of Econometrics 47, 67-84.
Robinson, P.M. (1995) Log-Periodogram Regression of Time Series with Long-Range Dependence. Annals of Statistics 23, 1048-1072.
Saikkonen, P. (1986) Asymptotic Properties of Some Preliminary Estimators for Autoregressive Moving Average Time Series Models. Journal of Time Series Analysis 7, 133-155.
Shimotsu, K. and P.C.B. Phillips (1999) Modified Local Whittle Estimation of the Memory Parameter in the Nonstationary Case. Working paper, Yale University.
Sowell, F.B. (1992) Maximum Likelihood Estimation of Stationary Univariate Fractionally Integrated Time Series Models. Journal of Econometrics 53, 165-188.
Tanaka, K. (1999) Nonstationary Fractional Unit Roots. Econometric Theory 15, 549-582.
Tieslau, M.A., P. Schmidt and R.T. Baillie (1996) A Minimum-Distance Estimator for Long-Memory Processes. Journal of Econometrics 71, 249-264.
Yajima, Y. (1992) Asymptotic Properties of Estimates in Incorrect ARMA Models for Long-Memory Time Series. Proc. IMA, Springer-Verlag, 375-382.
We also consider estimators that allow us to use Hosking's (1996) results on asymptotic normality of the differences between sample autocovariances. An estimator with these properties that is similar in form to ã_YW can be constructed in the following way. We define the vector γ(k,0) = (γ_1 - γ_2, γ_2 - γ_3, ..., γ_k - γ_{k+1}) and its sample analogue γ̂(k,0), and the matrix Γ(k,0) with typical element {Γ(k,0)}_{ij} = γ_{|i-j|} - γ_{|i-j+1|}, and its sample analogue Γ̂(k,0). Define the estimator â(k,0) as the solution to

  Γ̂(k,0) â(k,0) = γ̂(k,0),   (2.2.5)

and let a(k,0) be the solution to

  Γ(k,0) a(k,0) = γ(k,0).   (2.2.6)

As plim(ã) is denoted by a(k), here a(k,0) = plim(â(k,0)).

An estimator with a simpler form was proposed by Hosking, who suggested that ratios of differences of autocorrelations can be used. Indeed, consider the vector â_H with components

  â_H(i) = (1 - ρ̂_i) / (1 - ρ̂_{k+1}),  i = 1, ..., k.

This vector converges at rate T^{-1/2} to its population analogue a_H and has an asymptotically normal distribution for all d ∈ (-1/2, 1/2).
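The ratio form above is straightforward to compute from sample autocorrelations. A minimal sketch, using the standard biased sample autocovariance (the choice of that estimator, and the illustrative white-noise input, are assumptions made here, not part of the paper's design):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelations rho_hat_1, ..., rho_hat_max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    g0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[lag:], x[:-lag]) / len(x) / g0
                     for lag in range(1, max_lag + 1)])

def hosking_ratio_vector(x, k):
    """a_hat_H(i) = (1 - rho_hat_i) / (1 - rho_hat_{k+1}), i = 1, ..., k."""
    rho = sample_acf(x, k + 1)
    return (1.0 - rho[:k]) / (1.0 - rho[k])

rng = np.random.default_rng(1)
x = rng.standard_normal(500)     # illustrative input only
ah = hosking_ratio_vector(x, 4)
print(ah.shape, np.all(np.isfinite(ah)))
```

In practice x would be the observed long-memory series, and the components of â_H would feed the subsequent estimation of d.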
Proof of Theorem 1.

Consider the expression in (2.2.6). For the norm of the left-hand-side vector we can write

  ‖a(k) - α_{[1,k]}‖ ≤ ‖Γ(k)^{-1}‖ · ‖Γ_{[k+1,∞)} α_{[k+1,∞)}‖.

We begin by evaluating the norm of Γ_{[k+1,∞)} α_{[k+1,∞)}. Denote the j-th component of this vector by s_j, where

  s_j = Σ_{i=1}^{∞} γ_{k+i-j} α_{k+i},  j = 1, 2, ..., k.

Recall that we can bound |γ_l| and |α_l| as follows: |γ_l| < c_γ l^{2d-1} and |α_l| < c_α l^{-d-1} for some c_γ, c_α > 0. Then for -1/2 < d < 0 we have

  s_j ≤ c_γ c_α Σ_{i=1}^{∞} (k - j + i)^{2d-1} (k + i)^{-d-1} ≤ c̃ k^{-d-1} (k - j)^{2d},

where c̃ > 0 is some constant. The last inequality is obtained by noting that Σ_{i=1}^{∞} (k - j + i)^{2d-1} is O((k - j)^{2d}). For 0 < d < 1/2,

  s_j ≤ c_γ c_α Σ_{i=1}^{∞} (k - j + i)^{2d-1} (k + i)^{-d-1} ≤ c̃ k^{-d} (k - j)^{2d-1},

where c̃ > 0 is some constant. Here the last inequality similarly follows from Σ_{i=1}^{∞} (k + i)^{-d-1} = O(k^{-d}).

The norm of the vector with components s_j is ‖s‖ = (Σ_{j=1}^{k} s_j²)^{1/2}, and thus

  ‖s‖ ≤ c̃ (k^{-2d-2} Σ_{j=1}^{k} j^{4d})^{1/2} = O(k^{(2d-1)/2})  if -1/2 < d < 0;
  ‖s‖ ≤ c̃ (k^{-2d} Σ_{j=1}^{k} j^{4d-2})^{1/2} = O(k^{(2d-1)/2})  if 0 ≤ d < 1/2.   (A.1)

Next consider ‖Γ(k)^{-1}‖. It is well known that if all the eigenvalues of Γ(k) are greater than some λ(k) > 0, then ‖Γ(k)^{-1}‖ < (λ(k))^{-1}. Consider the spectral density function associated with γ; denote by f(x) the spectral density multiplied by 2π. Here

  f(x) = [P(e^{ix}) P(e^{-ix})]^{-1} (2 - 2 cos x)^{-d} Q(e^{ix}) Q(e^{-ix}).

The polynomials P, Q, with none of their roots on the unit circle, are bounded there from above and below, providing bounds on f(x):

  B_L (2 - 2 cos x)^{-d} < f(x) < B_U (2 - 2 cos x)^{-d}.

By Grenander and Szegő (p. 64) this implies that the eigenvalues associated with Γ(k) are bounded from below by the eigenvalues associated with the function f_L(x) = B_L (2 - 2 cos x)^{-d}.

In the case 0 < d < 1/2 the lower bound of f_L(x) is B_L 4^{-d}; it follows that ‖a(k) - α_{[1,k]}‖ = O(k^{d-1/2}).

When d < 0 the lower bound of f_L(x) is zero. We use a different approach to obtain a tighter bound for the eigenvalues of Γ(k) from below, and evaluate the rate at which this lower bound goes to zero. To do this we use properties of eigenvalues of symmetric matrices and of circulant matrices.

Property 1 (Separation Theorem; Wilkinson (1965), pp. 103-104). If S_n is a symmetric matrix and is partitioned as

  S_n = [ S_{n-1}  s ; s′  σ_1 ],

then the eigenvalues λ_i(S_{n-1}) of S_{n-1}, numbered in order of magnitude, separate the eigenvalues λ_i(S_n) of S_n:

  λ_1(S_n) ≤ λ_1(S_{n-1}) ≤ λ_2(S_n) ≤ ... ≤ λ_n(S_n).

Property 2 (Priestley (1981), p. 261). If C_k is a circulant matrix (and also a Hermitian Toeplitz matrix) of order 2k+1 with first row (γ_0, γ_1, ..., γ_k, γ_k, ..., γ_1), then its eigenvalues (here not subscripted by order of magnitude) are given by

  λ_l(C_k) = Σ_{r=-k}^{k} γ_r exp(-i ω_l r),  where ω_l = 2πl/(2k+1).   (A.2)

We shall evaluate the eigenvalues of Γ(k) by embedding this matrix in a circulant C_k. By Property 1 the smallest eigenvalue of Γ(k) is bounded from below by the smallest eigenvalue of C_k, which will be given by λ_l(C_k) in (A.2) for an appropriate ω_l; moreover, by applying the separation theorem repeatedly we can see that the smallest eigenvalue of Γ(k) is bounded from below by any of the k smallest eigenvalues of C_k.

For large k, λ_l(C_k) is approximated by f(ω_l) = Σ_{r=-∞}^{∞} γ_r exp(-i ω_l r), which without loss of generality we can take to equal (1 - cos ω_l)^{-d}; moreover for l = 1 we have that

  λ_1(C_k) - f(ω_1) = -2 Σ_{r=k+1}^{∞} γ_r cos(ω_1 r) > 0

by comparing magnitudes of groups of positive and negative terms in the sum; thus

  λ_1(C_k) ≥ f(ω_1) = (1 - cos(2π/(2k+1)))^{-d} ≍ O(k^{2d}),

where ≍ indicates that λ_1(C_k) declines at exactly the rate O(k^{2d}). Next we show that λ_1(C_k) is among the k smallest eigenvalues. Indeed, for any l,

  |λ_l(C_k) - f(ω_l)| = |2 Σ_{r=k+1}^{∞} γ_r cos(ω_l r)| = O(k^{2d}).

On the other hand, if k/2 > l > k^β with β < 1,

  f(ω_l) = (1 - cos(2πl/(2k+1)))^{-d} = O(k^{2d(1-β)});

thus for large enough k the corresponding λ_l(C_k) > λ_1(C_k).

It follows that ‖Γ(k)^{-1}‖ = O(k^{-2d}). Combining this with (A.1) we get that

  ‖a(k) - α_{[1,k]}‖ = O(k^{-2d + d - 1/2}) = O(k^{-d - 1/2}).

Finally, for any δ, ε choose k such that ‖a(k) - α_{[1,k]}‖ < ε/2; then for that value of k, by consistency of ã for a(k) (Yajima, 1992), find T̃ such that Pr(‖ã - a(k)‖ > ε/2) < δ for samples of size T > T̃.
Appendix B.

We begin with a lemma which will be useful for the proof of Theorem 2.

Lemma B. Under the assumptions in Theorem 2,

  T^{-2d} Σ_{t=k}^{T} y_t² = O_p(1), but not o_p(1);   (B.1)

  E(z_i y_j) = O(1) if i ≤ j, and O((i - j)^{2d-2}) if i > j;   (B.2)

  E(z_i y_j)² = O(j);   (B.3)

  E(z_i y_{i-l} z_j y_{j-l}) = O(1) for i ≠ j;   (B.4)

  E(T^{-1} Σ_t z_{t-l} y_t)² = O(1);   (B.5)

  T^{-1} Σ_t z_{t-l} y_t = O_p(1).   (B.6)

Proof. For y_t defined here as the partial sum of a stationary fractionally integrated process, y_t = Σ_{i=1}^{t} z_i, it was shown by Shimotsu and Phillips (1999) that the order of magnitude of y_t is similar to that obtained when the process is defined directly via the expansion of the operator (1 - L)^{-d}. Thus, similarly, by Lemma 2.13 in Shimotsu and Phillips we have that E(y_T²) = O(T^{2d-1}), and from Tanaka (1999) T^{-2d} Σ_{t=k}^{T} y_{t-1}² converges in law to a functional of an integrated Brownian motion and is thus bounded in probability, which gives (B.1).

Since y_t is a partial sum of z, where E(z_i z_j) = γ_{i-j}, we have that E(z_i y_j) = Σ_{l=1}^{j} γ_{i-l}. Recall that γ_m for the process z is O(m^{2d-3}); (B.2) follows.

Next we consider the MA representation z_t = Σ_{l=0}^{∞} ψ_l e_{t-l}, and recall that ψ_l = O(l^{d-2}) and that e has bounded fourth moments by the assumptions in 2.1. We can express

  E(z_{i1} z_{i2} z_{i3} z_{i4}) = E(e⁴) Σ_{l=0}^{∞} ψ_l ψ_{l+i1-i2} ψ_{l+i1-i3} ψ_{l+i1-i4}
    + (E(e²))² [γ_{i1-i2} γ_{i3-i4} + γ_{i1-i3} γ_{i2-i4} + γ_{i1-i4} γ_{i2-i3}]
  = O(|i1-i2|^{d-2} |i1-i3|^{d-2} |i1-i4|^{d-2} + |i1-i2|^{2d-3} |i3-i4|^{2d-3}
    + |i1-i3|^{2d-3} |i2-i4|^{2d-3} + |i1-i4|^{2d-3} |i2-i3|^{2d-3})

for all different subscripts on z, and similarly E(z_i² z_l²) = O(1) and E(z_j² z_l z_m) = O(|j-l|^{3d-3} |j-m|^{2d-3}).

Then (B.3) is obtained by expressing

  E(z_i y_j)² = Σ_{l=1}^{j} E(z_i² z_l²) + 2 Σ_{l=1}^{j} Σ_{m=l+1}^{j} E(z_i² z_l z_m),

substituting the expectation for each term from the relations above and evaluating the orders. Note that it is the terms E(z_i² z_l²) that provide the largest contribution to the sum.

A similar substitution into the expression

  E(z_i y_{i-l} z_j y_{j-l}) = Σ_{n=1}^{i-l} Σ_{m=1}^{j-l} E(z_i z_n z_j z_m)

provides (B.4); note that this expression has no terms of the form E(z_i² z_j²). The expression in (B.5) immediately follows, and (B.6) then follows by Chebyshev's inequality.

Proof of Theorem 2.

Consider the AR(∞) representation and write

  y_t - α_1 y_{t-1} - ... - α_k y_{t-k} = Σ_{i=1}^{∞} α_{k+i} y_{t-k-i} + ε_t,  t = k+1, ..., T.   (B.7)

Recall that y with a non-positive subscript is zero; thus on the right-hand side we have

  Σ_{i=1}^{t-k} α_{k+i} y_{t-k-i} + ε_t;

denote this quantity, which represents the error in the k-th order AR representation, by u_t. Compare the OLS estimator â′ = (â_1, ..., â_k) of the model

  y_t = α_1 y_{t-1} + ... + α_k y_{t-k} + u_t   (B.8)

with the OLS estimator β̂′ = (β̂_1, ..., β̂_k) of

  y_t = X_t β + u_t   (B.9)

with X_t = (Z_t, y_{t-1}) = (z_{t-1}, ..., z_{t-k+1}, y_{t-1}) and β′ = (β_1, ..., β_k). We note that the one-to-one relation α_1 = β_k + β_1, α_l = β_{l-1} - β_l for l = 2, ..., k produces the same residuals in each model; thus we get â_1 = β̂_k + β̂_1 and â_l = β̂_{l-1} - β̂_l for k ≥ l > 1. If we can establish that β̂ is a consistent estimator of β, consistency of â as an estimator of α will follow.

Consider (B.9); we can write

  β̂ - β = [Σ_{t=k}^{T} X_t′ X_t]^{-1} [Σ_{t=k}^{T} X_t′ u_t].

Partition

  Γ = Σ_{t=k}^{T} X_t′ X_t = [ A  v ; v′  w ]

with A = Σ_{t=k}^{T} Z_t′ Z_t, v = Σ_{t=k}^{T} Z_t′ y_{t-1}, w = Σ_{t=k}^{T} y_{t-1}². Define a diagonal k × k normalizing matrix Υ = diag(T^{1/2}, ..., T^{1/2}, T^{d}); then

  Υ^{-1} Γ Υ^{-1} = [ T^{-1} A   T^{-1/2-d} v ; T^{-1/2-d} v′   T^{-2d} w ].

Since Z_t is stationary, and as was demonstrated for d₀ = d - 1 < 0 the covariance matrix of the process (z) is such that ‖Γ(z)^{-1}‖ = O(k^{-2d₀}), it can be shown from Lemma 3 of Berk (1974) that as T → ∞, for k such that k² T^{-1} → 0,

  ‖(T^{-1} A)^{-1} - Γ(z)^{-1}‖ = o_p(k^{-2d₀}).

From (B.6), T^{-1/2-d} v = o_p(1), and from (B.1), T^{-2d} w = O_p(1) and is bounded in probability away from zero. Thus the inverse of the partitioned matrix is

  (Υ^{-1} Γ Υ^{-1})^{-1} = [ Γ(z)^{-1} (1 + o_p(1))   o_p(1) ; o_p(1)   O_p(1) ].

Finally, evaluate the order of the terms in the suitably normalized vector Σ_{t=k}^{T} X_t′ u_t. We can ignore the ε_t part of u_t and need to concentrate on evaluating the terms T^{-1} Σ_{t=k}^{T} z_{t-l} Σ_{i=1}^{t-k-1} α_{k+i} y_{t-k-i} for the subvector T^{-1} Σ_{t=k}^{T} Z_t′ u_t of the first k - 1 components, and T^{-2d} Σ_{t=k}^{T} y_{t-1} Σ_{i=1}^{t-k-1} α_{k+i} y_{t-k-i} for the k-th component.

Consider first T^{-1} Σ_{t=k}^{T} z_{t-l} Σ_{i=1}^{t-k-1} α_{k+i} y_{t-k-i} for l = 1, ..., k-1; denote this by W(l). Then W(l) = T^{-1} Σ_{i=1}^{T-k} w_i with w_i = α_{k+i} Σ_{t=k+i}^{T} z_{t-l} y_{t-k-i}. From (B.2),

  E(w_i) = α_{k+i} Σ_{t=k+i}^{T} O((k + i - l)^{2d-2}) = O(k^{-d-1} T^{2d-1}).

Using (B.3), E(w_i²) = O(k^{-2d-2} T²), and from (B.4), for i ≠ j, E(w_i w_j) = O(k^{-2d-2} T). Thus

  E(T^{-1} Σ_{i=1}^{T-k} w_i)² = O(k^{-2d-2}),

and for each l, W(l) = T^{-1} Σ_{i=1}^{T-k} w_i = O_p(k^{-d-1}). Then

  ‖T^{-1} Σ_{t=k}^{T} Z_t′ u_t‖ = (Σ_l W(l)²)^{1/2} = O_p(k^{-d-1/2})

and

  ‖Γ(z)^{-1} T^{-1} Σ_{t=k}^{T} Z_t′ u_t‖ = O_p(k^{-3d+3/2}).

Finally,

  |T^{-2d} Σ_i α_{k+i} Σ_{t=k+i}^{T} y_{t-1} y_{t-k-i}| ≤ T^{-2d} Σ_i |α_{k+i}| Σ_{t=k+i}^{T} y_{t-1}² = O_p(k^{-d}),

and the statement of the Theorem obtains, since k^{-3d+3/2} < k^{-d} if d ≥ 3/4, and k^{-3d+3/2} exceeds k^{-d} for 1/2 < d < 3/4.
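The decay rate |α_l| = O(l^{-d-1}) invoked throughout the proofs can be illustrated numerically. The sketch below (an illustration made here, not part of the paper) uses the pure fractional filter (1 - L)^d, whose expansion coefficients π_j satisfy π_j = Γ(j-d)/(Γ(j+1)Γ(-d)) ~ j^{-d-1}/Γ(-d) by Stirling's approximation:

```python
import math

def frac_diff_coeffs(d, n):
    """Coefficients pi_j of (1-L)^d = sum_j pi_j L^j via the recursion
    pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j."""
    pi = [1.0]
    for j in range(1, n + 1):
        pi.append(pi[-1] * (j - 1 - d) / j)
    return pi

d = 0.6
pi = frac_diff_coeffs(d, 2000)
# Compare pi_j against its asymptotic form j^(-d-1)/Gamma(-d): the ratio tends to 1.
ratio = pi[2000] * 2000 ** (d + 1) * math.gamma(-d)
print(ratio)
```

The recursion avoids evaluating Γ(j - d) directly, which would overflow for large j even though π_j itself is tiny.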
Appendix C. Proof of Theorem 3.

Define for some 1 < n < T the matrix Γ̂(n,k) = Γ̂(k) - γ̂_n ιι′ and the vector γ̂(n,k) = γ̂(k) - γ̂_n ι, where ι is a vector of ones, and consider the estimator â(n,k) that solves

  Γ̂(n,k) â(n,k) = γ̂(n,k).

Similarly defined matrices and vectors for the population quantities will be denoted by the same symbols but without the hats. Also define

  D_l = T^{1/2} [(γ̂_0 - γ_0) - (γ̂_l - γ_l)].

To prove Theorem 3 we prove the following theorem, which provides an explicit description of the asymptotics for the terms δ(k) = â(n,k) - a(n,k) and a bound for ‖a(n,k) - α_{[1,k]}‖.

Theorem C. Under the conditions of Theorem 3, as T → ∞, for any fixed k, n the limiting distribution of T^{1/2}(â(n,k) - a(n,k)) is multivariate normal with mean 0 and covariance matrix

  W(k) = Γ(n,k)^{-1} A M A′ Γ(n,k)^{-1}.

The k × (k+1) matrix A has elements

  {A}_{ij} = a_{i-j} + a_{i+j}  for 1 ≤ i, j ≤ k, i ≠ j;
           = a_{2i} - 1         for i = j;
           = -a_i               for j = k + 1,   (C.1)

where the a_l are the components of the vector a(n,k) for 1 ≤ l ≤ k, and a_0 = 1, a_l = 0 if l < 0. The (k+1) × (k+1) matrix M is the limit covariance matrix of the vector G with l-th component G_l = D_l for l = 1, 2, ..., k and G_{k+1} =